by Steven J. Owens (unless otherwise attributed)
>OK, can somebody tell me in non-techie English how cookies are put on one's
>machine? If I look at source code, what am I looking for?
One of the best things any web developer can do is spend some time watching HTTP requests and responses between browser and server. Doing this while troubleshooting a web application is particularly useful. You learn a lot about the protocol and about how browsers and servers interact.
Essentially, HTTP traffic looks a lot like raw email, i.e. text, lines, and name/value pairs defined as "name: value\n".
There's a fairly useful article on this here:
http://hotwired.lycos.com/webmonkey/99/36/index3a.html?tw=backend
This article shows you the basic mechanisms and tools to use to find this information, but it doesn't actually explain HTTP itself, so...
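In general, each HTTP request or response consists of a first line (the request line or the status line), a series of header lines, a blank line, and an optional body.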
The header lines generally pass back data that's meant for the browser itself to use, while the body is meant to be displayed to the user.
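To make that layout concrete, here is a minimal sketch in Python that splits a raw HTTP message into its first line, its header lines, and its body. The function name split_http_message and the sample response are made up for illustration, and the parsing is deliberately naive (it ignores folded headers, encodings, and so on).

```python
def split_http_message(raw):
    """Split a raw HTTP message into (start_line, headers, body).

    Hypothetical helper for illustration only; not a full HTTP parser.
    """
    # Real HTTP uses CRLF line endings; normalise so LF-only examples also work.
    text = raw.replace("\r\n", "\n")
    # The first blank line separates the headers from the body.
    head, _, body = text.partition("\n\n")
    lines = head.split("\n")
    start_line = lines[0]
    # Keep headers as a list of pairs, because a header such as
    # Set-Cookie may legitimately appear more than once.
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


# A made-up response, shaped like the ones discussed in this essay.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Set-Cookie: a=1; path=/\r\n"
    "Set-Cookie: b=2; path=/\r\n"
    "\r\n"
    "<HTML>hello</HTML>"
)
start_line, headers, body = split_http_message(sample)
print(start_line)
for name, value in headers:
    print(name, "=>", value)
print(body)
```

Only the first colon on each header line is treated as the separator, so values that contain colons themselves (dates, for instance) survive intact.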
Below is the first example from that article. The first line is the author running the lynx web browser with a command-line option to have it show the raw response, instead of displaying it as a web page. Notice the three Set-Cookie header lines.
    [jay@host jay]$ lynx -mime_header www.hotbot.com
    HTTP/1.1 200 OK
    Server: Microsoft-IIS/4.0
    Date: Thu, 26 Aug 1999 11:48:51 GMT
    Set-Cookie: p_uniqid=3UcKfe5j2EQOz1nvoB; expires=Fri, 21-Dec-2012 08:00:00 GMT; domain=.hotbot.com; path=/
    Connection: Keep-Alive
    Content-Type: text/html
    Set-Cookie: HB%5FSESSION=RH=4d7b557f6700175e0e644f3a7f2e685e43540d04; path=/
    Set-Cookie: ASPSESSIONIDQQGQQRTM=ODODHAHCNPBNIELPCECBGLBA; path=/
    Cache-control: private
    Content-Length: 3173

    <HTML>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
    +"http://www.w3.org/TR/REC-html40/loose.dtd">
    <HEAD><TITLE>HotBot</TITLE></HEAD>
    <body>
    <FORM ACTION="/text/default.asp">
    <font face=Arial,Geneva,sans-serif><b>HotBot: Text-only version</b> |
    <a href="/">Full graphics version</a>
    <p>
    <b>look for:</b>
    <SELECT name="SM" >
    <OPTION value="MC" selected>all the words
    <OPTION value="SC">any of the words
    ...etc.
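If you don't have lynx handy, you can get a similar raw dump with a few lines of Python. The sketch below is only an illustration: it starts a throwaway web server on your own machine (the handler class, the cookie name "demo", and the helper fetch_raw are all made up for this example), sends it a bare-bones request over a plain socket, and prints exactly what comes back, headers and all.

```python
import http.server
import socket
import threading


class CookieHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with one cookie."""

    def do_GET(self):
        body = b"<HTML>hello</HTML>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Set-Cookie", "demo=abc123; path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence the default per-request logging on stderr.
        pass


def fetch_raw(host, port, path="/"):
    """Send a minimal HTTP/1.0 request and return the raw response text."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")


# Port 0 asks the operating system for any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
raw = fetch_raw("127.0.0.1", server.server_address[1])
server.shutdown()
server.server_close()
print(raw)
```

The point of going through a raw socket rather than a friendlier HTTP library is that nothing is hidden: what gets printed is the status line, the header lines (including Set-Cookie), a blank line, and then the body.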
Most headers only have an immediate, temporary effect, usually on how the browser handles the data that's being returned in the body. However, Set-Cookie headers have longer-term effects. They tell the browser to remember that value, and to include it in the headers of each request the browser makes to the domain and path the cookie came from.
Before I get into the anatomy of a cookie, I should point out where this cookie header comes from. Back in the dawn of web time, all such cookies were set by CGI scripts. The web server passed a request off to a CGI script, and the CGI script crafted the response body, but could first print out additional HTTP headers. These days, there are more ways to add programming behind a given web page than you can shake a pointy stick at. Just about every conceivable way of adding any complexity to a website - CGI scripts, embedded Perl, server-side includes, PHP, ASP and JSP pages, Java servlets, all of them - has some way to set headers, and especially some way to set cookie headers.
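As a rough sketch of the CGI approach, here is what such a script boils down to, written in Python. The function name cgi_response and the cookie name "visitor" are made up for illustration; the only real rule being shown is that the script prints its header lines first, then a blank line, then the body.

```python
def cgi_response(cookie_name, cookie_value, html):
    """Build the text a CGI-style script would print to standard output.

    Hypothetical helper: header lines first, then a blank line, then the body.
    """
    lines = [
        "Content-Type: text/html",
        # The extra header that asks the browser to remember a value.
        f"Set-Cookie: {cookie_name}={cookie_value}; path=/",
        "",  # the blank line that ends the headers
        html,
    ]
    return "\n".join(lines)


output = cgi_response("visitor", "abc123", "<HTML>hello</HTML>")
print(output)
```

The web server takes whatever the script prints, adds the status line and its own headers, and passes the result along to the browser.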
Now let's break down the first Set-Cookie header from the example above:
Set-Cookie: p_uniqid=3UcKfe5j2EQOz1nvoB; expires=Fri, 21-Dec-2012 08:00:00 GMT; domain=.hotbot.com; path=/
p_uniqid=3UcKfe5j2EQOz1nvoB is the payload of the cookie: a variable named p_uniqid with the value '3UcKfe5j2EQOz1nvoB'.
The expires variable controls how long the cookie sticks around. If it is left out, the default is "as long as the browser is kept running". In theory, cookies were only supposed to be kept around for a limited amount of time, to avoid unnecessarily cluttering up your hard drive, but these days, as you can see in this example, most sites just set their cookies to expire many years in the future.
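To see the pieces separately, here is a small sketch in Python that takes the value of a Set-Cookie header apart. The helper names parse_set_cookie and expiry_of are made up for this example, and the date format string is an assumption that matches the style used in the header above; real browsers accept several date formats.

```python
from datetime import datetime, timezone


def parse_set_cookie(header_value):
    """Split a Set-Cookie value into (name, value, attributes).

    Hypothetical helper for illustration; real cookie parsing has more rules.
    """
    parts = [p.strip() for p in header_value.split(";")]
    # The first name=value pair is the payload. Split on the first "=" only,
    # since the value itself may contain "=" characters.
    name, _, value = parts[0].partition("=")
    attributes = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing semicolon
        key, _, val = part.partition("=")
        attributes[key.strip().lower()] = val.strip()
    return name, value, attributes


def expiry_of(attributes):
    """Return the expiry as an aware datetime, or None for a session cookie."""
    raw = attributes.get("expires")
    if raw is None:
        return None
    # Assumed format, matching the example header: Fri, 21-Dec-2012 08:00:00 GMT
    parsed = datetime.strptime(raw, "%a, %d-%b-%Y %H:%M:%S GMT")
    return parsed.replace(tzinfo=timezone.utc)


header = ("p_uniqid=3UcKfe5j2EQOz1nvoB; expires=Fri, 21-Dec-2012 08:00:00 GMT; "
          "domain=.hotbot.com; path=/")
name, value, attributes = parse_set_cookie(header)
expires = expiry_of(attributes)
print(name, value)
print(attributes)
print(expires)
```

A cookie with no expires attribute, like the two session cookies in the example response, comes back from expiry_of as None, which corresponds to the "until the browser is closed" default.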
I remember when I first started learning about cookies (and I hadn't yet learned to sniff the HTTP requests and responses). I assumed that the browser and web server would have some sort of challenge-and-response mechanism, some way for the server to deliberately ask the browser for the cookie data. I wasted a bit of time and energy trying to figure that out :-).
It doesn't work that way, however. The browser simply includes a copy of the cookie data with every single request it makes to the site where it got the cookie to begin with (this can be fine-tuned with the domain and path variables, see below). Each response from that domain can contain a new version of the cookie, which the browser simply copies over the old version that it already has.
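The browser's side of this can be sketched in a few lines of Python. TinyCookieJar is a made-up class for illustration, not how any real browser is written, and it ignores domain, path, and expiry entirely; it only shows the "copy over the old version, send everything back" behavior.

```python
class TinyCookieJar:
    """Hypothetical, heavily simplified model of a browser's cookie store."""

    def __init__(self):
        self._cookies = {}

    def store(self, set_cookie_value):
        """Remember the payload from one Set-Cookie header value."""
        pair = set_cookie_value.split(";", 1)[0]
        name, _, value = pair.strip().partition("=")
        # A cookie with the same name simply replaces the old one.
        self._cookies[name] = value

    def cookie_header(self):
        """Return the Cookie header line to add to the next request, if any."""
        if not self._cookies:
            return None
        pairs = "; ".join(f"{n}={v}" for n, v in self._cookies.items())
        return "Cookie: " + pairs


jar = TinyCookieJar()
jar.store("visits=1; path=/")
jar.store("theme=dark; path=/")
jar.store("visits=2; path=/")  # a later response updates the same cookie
print(jar.cookie_header())  # prints "Cookie: visits=2; theme=dark"
```

Notice that the server never asks for anything: the jar just volunteers whatever it has, in a single Cookie header, on every request.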
The domain and path variables control when the browser includes the cookie in the requests. The root domain has to match the domain where the cookie came from, but as you can see in this example, where the domain is ".hotbot.com", it can be a bit trickier than that. The browser will include this cookie in any request to any domain that ends with ".hotbot.com". In the same way, a cookie set for ".foo.com" goes to both bar.foo.com and baz.foo.com. Since bar.foo.com and baz.foo.com don't actually have to live on the same webserver, machine, or even network, web applications can actually use this to communicate among different subdomains.
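Here is a rough Python sketch of that matching decision. The function names are made up for illustration, and the rules are a simplification loosely modelled on how modern browsers behave (for instance, treating ".hotbot.com" as also matching plain "hotbot.com"); real browsers apply extra checks that are left out here.

```python
def domain_matches(request_host, cookie_domain):
    """True if a cookie set for cookie_domain should go to request_host."""
    host = request_host.lower()
    bare = cookie_domain.lower().lstrip(".")
    # Match the domain itself, or any host that ends with "." + domain.
    # Requiring the dot stops "nothotbot.com" from matching "hotbot.com".
    return host == bare or host.endswith("." + bare)


def path_matches(request_path, cookie_path):
    """True if request_path falls under cookie_path."""
    if request_path == cookie_path:
        return True
    prefix = cookie_path if cookie_path.endswith("/") else cookie_path + "/"
    return request_path.startswith(prefix)


def should_send(request_host, request_path, cookie_domain, cookie_path):
    """Combine both checks: would the browser include this cookie?"""
    return (domain_matches(request_host, cookie_domain)
            and path_matches(request_path, cookie_path))


print(should_send("www.hotbot.com", "/text/default.asp", ".hotbot.com", "/"))
print(should_send("bar.foo.com", "/", ".foo.com", "/"))
print(should_send("baz.foo.com", "/", ".foo.com", "/"))
print(should_send("nothotbot.com", "/", ".hotbot.com", "/"))
```

The first three calls describe requests that would carry the cookie; the last one would not, because the host merely contains the cookie's domain as a substring rather than sitting underneath it.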
There are other variables, of course, but there are plenty of docs out there that describe them, and that wasn't the point of this essay anyway :-).
Steven J. Owens