> OK, can somebody tell me in non-techie English how cookies are put on one's
> machine? If I look at source code, what am I looking for?
One of the best things any web developer can do is spend some
time watching HTTP requests and responses between browser and server.
Doing this while troubleshooting a web application is particularly
useful. You learn a lot about the protocol and about how browsers
and servers interact.
Essentially, HTTP traffic looks a lot like raw email, i.e. text,
lines, and name/value pairs defined as "name: value\n".
There's a fairly useful article on this here:
http://hotwired.lycos.com/webmonkey/99/36/index3a.html?tw=backend
This article shows you the basic mechanisms and tools to use to
find this information, but it doesn't actually explain HTTP itself,
so...
In general, each HTTP request or response consists of:
- a start line: a request line (like GET / HTTP/1.1) for a request,
  or a status line (like HTTP/1.1 200 OK) for a response
- a header section
- a blank line to separate the header and body
- a body section.
The header lines generally pass back data that's meant for the
browser itself to use, while the body is meant to be displayed to the
user.
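To make that layout concrete, here's a rough Python sketch that pulls a raw response apart into those pieces. The sample text and the split_response name are made up for illustration, and I'm assuming the lines end with "\r\n", which is what HTTP uses on the wire.

```python
# A rough sketch: split a raw HTTP response into its parts.
# RAW_RESPONSE is a made-up, shortened sample, not real server output.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Set-Cookie: flavor=oatmeal; path=/\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)


def split_response(raw):
    """Return (status_line, headers, body) from a raw HTTP response string."""
    # The first blank line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_line = lines[0]
    headers = []
    for line in lines[1:]:
        # Each header is "name: value"; keep a list because names can repeat.
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return status_line, headers, body


status, headers, body = split_response(RAW_RESPONSE)
print(status)
for name, value in headers:
    print(name, "=>", value)
print(body)
```

Headers are kept as a list of pairs rather than a dictionary because the same header name (Set-Cookie, for one) can show up more than once in a single response.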
Below is the first example from that article. The first line is
the author running the lynx webbrowser with a command-line option to
have it show the raw response, instead of displaying it as a web page.
Notice the fifth, eighth and ninth lines, which are Set-Cookie header
lines.
[jay@host jay]$ lynx -mime_header www.hotbot.com
HTTP/1.1 200 OK
Server: Microsoft-IIS/4.0
Date: Thu, 26 Aug 1999 11:48:51 GMT
Set-Cookie: p_uniqid=3UcKfe5j2EQOz1nvoB; expires=Fri, 21-Dec-2012 08:00:00 GMT; domain=.hotbot.com; path=/
Connection: Keep-Alive
Content-Type: text/html
Set-Cookie: HB%5FSESSION=RH=4d7b557f6700175e0e644f3a7f2e685e43540d04; path=/
Set-Cookie: ASPSESSIONIDQQGQQRTM=ODODHAHCNPBNIELPCECBGLBA; path=/
Cache-control: private
Content-Length: 3173

<HTML>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
+"http://www.w3.org/TR/REC-html40/loose.dtd">
<HEAD><TITLE>HotBot</TITLE></HEAD>
<body>
<FORM ACTION="/text/default.asp">
<font face=Arial,Geneva,sans-serif><b>HotBot: Text-only version</b> |
<a href="/">Full graphics version</a>
<p>
<b>look for:</b> <SELECT name="SM" >
<OPTION value="MC" selected>all the words
<OPTION value="SC">any of the words
...etc.
Most headers only have an immediate, temporary effect, usually on
how the browser handles the data that's being returned in the body.
However, Set-Cookie headers have longer-term effects. They tell the
browser to remember that value, and to include that value in the
headers of each request the browser makes to the domain and path the
cookie came from.
Before I get into the anatomy of a cookie, I should point out
where this cookie header comes from. Back in the dawn of web time,
all such cookies were set by CGI scripts. The web server passed a
request off to a CGI script, the CGI script crafted the body response,
but could first print out additional HTTP headers. These days, there
are more ways to add programming behind a given web page than you can
shake a pointy stick at. Just about every conceivable way of adding
any complexity to a website (CGI scripts, embedded Perl, server-side
includes, PHP, ASP and JSP pages, Java servlets, all of them) has
some way to set headers, and especially some way to set cookie
headers.
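As a minimal sketch of the CGI approach, here's roughly what such a script writes to its standard output: header lines first, then a blank line, then the body. The build_cgi_output function and the cookie name and value are invented for illustration. A real script would also worry about escaping and about which other headers the web server expects.

```python
# Sketch of what a CGI-style script writes to standard output.
# The function name and the cookie name/value are invented for illustration.
def build_cgi_output(cookie_name, cookie_value, html):
    """Return the text a CGI script would print: headers, blank line, body."""
    lines = [
        "Content-Type: text/html",
        "Set-Cookie: {}={}; path=/".format(cookie_name, cookie_value),
        "",  # the blank line that ends the header section
        html,
    ]
    return "\n".join(lines)


output = build_cgi_output(
    "visitor", "abc123", "<html><body>Hi there</body></html>"
)
print(output)
```

The web server takes that output, adds the status line and its own headers, and passes the whole thing back to the browser.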
Now let's break down the first Set-Cookie header from the example
above:
Set-Cookie:
p_uniqid=3UcKfe5j2EQOz1nvoB;
expires=Fri, 21-Dec-2012 08:00:00 GMT;
domain=.hotbot.com;
path=/
The first pair is the payload of the cookie: a variable named
p_uniqid with the value '3UcKfe5j2EQOz1nvoB'.
The expires variable controls how long the cookie sticks around.
If it's left out, the default is "as long as the browser is kept running". In theory,
cookies were only supposed to be kept around for a limited amount of time,
to avoid unnecessarily cluttering up your hard drive, but these days, as
you can see in this example, most sites just set their cookies to expire
many years in the future.
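If you want to take a Set-Cookie value apart programmatically, the breakdown above can be sketched in a few lines of Python. This is a simplified illustration, not a full cookie parser. The parse_set_cookie name is my own, and it assumes the pieces are separated by semicolons with no quoting tricks.

```python
def parse_set_cookie(header_value):
    """Split a Set-Cookie value into (name, value, attributes)."""
    parts = [p.strip() for p in header_value.split(";") if p.strip()]
    # The first piece is the payload. Split on the first "=" only,
    # because the value itself may contain "=" characters.
    name, _, value = parts[0].partition("=")
    attributes = {}
    for part in parts[1:]:
        key, _, val = part.partition("=")
        attributes[key.strip().lower()] = val.strip()
    return name, value, attributes


name, value, attrs = parse_set_cookie(
    "p_uniqid=3UcKfe5j2EQOz1nvoB; expires=Fri, 21-Dec-2012 08:00:00 GMT; "
    "domain=.hotbot.com; path=/"
)
print(name, value)
for key, val in attrs.items():
    print(key, "=", val)
```

Splitting only on the first "=" matters for cookies like the second one in the example, where the value part (RH=4d7b...) has an equals sign of its own.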
I remember when I first started learning about cookies (and I
hadn't yet learned to sniff the HTTP requests and responses). I
assumed that the browser and web server would have some sort of
challenge-and-response mechanism, some way for the server to
deliberately ask the browser for the cookie data. I wasted a bit of
time and energy trying to figure that out :-).
It doesn't work that way, however. The browser simply includes a
copy of the cookie data with every single request it makes to the site
where it got the cookie to begin with (this can be fine-tuned with the
domain and path variables, see below). Each response from that domain
can contain a new version of the cookie, which the browser simply
copies over the old version that it already has.
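Here's a toy model of that behavior in Python, just to show how little is going on. The TinyCookieJar class is hypothetical, and it ignores expiry, domain and path entirely. It only remembers name/value pairs and hands them back in the Cookie request header that browsers send.

```python
class TinyCookieJar:
    """Toy model of a browser's cookie store; ignores expires/domain/path."""

    def __init__(self):
        self.cookies = {}

    def store(self, set_cookie_value):
        # Only the first "name=value" piece is the payload.
        payload = set_cookie_value.split(";", 1)[0]
        name, _, value = payload.partition("=")
        # A cookie with the same name simply overwrites the old value.
        self.cookies[name.strip()] = value.strip()

    def cookie_header(self):
        # What gets added to every later request to the same site.
        pairs = "; ".join(f"{n}={v}" for n, v in self.cookies.items())
        return "Cookie: " + pairs


jar = TinyCookieJar()
jar.store("session=first; path=/")
jar.store("theme=dark; path=/")
jar.store("session=second; path=/")  # replaces the earlier session value
print(jar.cookie_header())  # prints "Cookie: session=second; theme=dark"
```

Notice there's no negotiation anywhere: the server never asks for anything, the browser just volunteers whatever it has stored.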
The domain and path variables control when the browser includes
the cookie in the requests. The root domain has to match the domain
where the cookie came from, but as you can see in this example, where
the domain is ".hotbot.com", it can be a bit trickier than that. The
browser will include this cookie in any request to any domain that
ends with ".hotbot.com". Since hosts like bar.foo.com and baz.foo.com
don't actually have to live on the same webserver, machine, or even
network, web applications can actually use a cookie set for ".foo.com"
to communicate among different subdomains.
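The matching rule can be sketched roughly like this in Python. The function names are my own and the rules are simplified: I'm assuming the bare domain itself counts as a match, and I'm treating the path as a plain prefix test. Real browsers apply stricter rules than this.

```python
def domain_matches(host, cookie_domain):
    """True if a request to `host` should carry a cookie for `cookie_domain`."""
    host = host.lower()
    domain = cookie_domain.lower().lstrip(".")
    # Match the domain itself or any host that ends with ".<domain>".
    return host == domain or host.endswith("." + domain)


def path_matches(request_path, cookie_path):
    """Simplified check: the request path must begin with the cookie path."""
    return request_path.startswith(cookie_path)


print(domain_matches("www.hotbot.com", ".hotbot.com"))          # True
print(domain_matches("hotbot.com.example.org", ".hotbot.com"))  # False
print(path_matches("/text/default.asp", "/"))                   # True
```

Checking for a leading dot before the domain, rather than a bare "ends with" on the string, keeps a host like nothotbot.com from picking up hotbot.com's cookies.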
There are other variables, of course, but there are plenty of docs
out there that describe them, and that wasn't the point of this essay
anyway :-).
Steven J. Owens