Details
In an article on Internet privacy in a large, respected
newspaper, they defined cookies essentially as this:
You can see in the directory that each of these files is a simple, normal text file. You can see which web site placed the file on your machine by looking at the file name (the information is also stored inside the file). You can open each file up by clicking on it.
For example, I have visited goto.com, and the site has placed a cookie on my machine. The cookie file for goto.com contains the following information:
UserID A9A3BECE0563982D www.goto.com/
What goto.com has done is stored on my machine a single name-value pair. The name of the pair is UserID, and the value is A9A3BECE0563982D. The first time I visited goto.com, the site assigned me a unique ID value and stored it on my machine. [Note that there probably are several other values stored in the file after the three shown above. That is housekeeping information for the browser.]
Amazon.com stores a bit more information on my machine. When I look at the cookie file Amazon has created on my machine, it contains the following:
session-id-time 954242000 amazon.com/
session-id 002-4135256-7625846 amazon.com/
x-main eKQIfwnxuF7qtmX52x6VWAXh@Ih6Uo5H amazon.com/
ubid-main 077-9263437-9645324 amazon.com/
It appears that Amazon stores a main user ID, an ID for each session, and the time the session started on my machine (as well as an x-main value, which could be anything).
The vast majority of sites store just one piece of information -- a user ID -- on your machine. But there really is no limit -- a site can store as many name-value pairs as it likes.
A name-value pair is simply a named piece of data. It is not a program, and it cannot "do" anything. A web site can retrieve only the information that it has placed on your machine. It cannot retrieve information from other cookie files, nor any other information from your machine.
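To make the idea of "named pieces of data" concrete, here is a minimal Python sketch that reads the cookie text shown above. It assumes the file can be treated as whitespace-separated triples of name, value and site; the real file also carries housekeeping fields for the browser, which are ignored here. The function names parse_cookie_text and pairs_for_site are made up for this illustration.

```python
from typing import Dict, List, Tuple

# The Amazon cookie text quoted in the article, as one flat string.
AMAZON_TEXT = (
    "session-id-time 954242000 amazon.com/ "
    "session-id 002-4135256-7625846 amazon.com/ "
    "x-main eKQIfwnxuF7qtmX52x6VWAXh@Ih6Uo5H amazon.com/ "
    "ubid-main 077-9263437-9645324 amazon.com/"
)


def parse_cookie_text(text: str) -> List[Tuple[str, str, str]]:
    """Split cookie text into (name, value, site) triples.

    Assumption: every entry is exactly three whitespace-separated tokens.
    """
    tokens = text.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected name/value/site triples")
    return [
        (tokens[i], tokens[i + 1], tokens[i + 2])
        for i in range(0, len(tokens), 3)
    ]


def pairs_for_site(triples: List[Tuple[str, str, str]], site: str) -> Dict[str, str]:
    """Return only the name-value pairs that the given site stored."""
    return {name: value for name, value, owner in triples if owner == site}


if __name__ == "__main__":
    triples = parse_cookie_text(AMAZON_TEXT)
    for name, value in pairs_for_site(triples, "amazon.com/").items():
        print(name, "=", value)
```

Note that the result is nothing but a dictionary of strings. Asking for the pairs of a different site returns an empty dictionary, which mirrors the rule that a site gets back only what it stored.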
How Does Cookie Data Move?
As you just read, cookie data is
simply name-value pairs stored on your hard disk by a Web site. That is all that
cookie data is. The Web site can store the data, and later it receives it back.
A Web site can only receive the data it has stored on your machine. It cannot
look at any other cookie, nor can it look at anything else on your machine.
The data moves in the following manner:
If you type the URL of a Web site into your browser, your browser sends a request to the Web site for the page. For example, if you type the URL http://www.amazon.com into your browser, your browser will contact Amazon's server and request its home page.
When the browser does this, it will look on your machine for a cookie file that Amazon has set. If it finds an Amazon cookie file, your browser will send all of the name-value pairs in the file to Amazon's server along with the URL. If it finds no cookie file, it will send no cookie data.
Amazon's Web server receives the cookie data and the request for a page. If name-value pairs are received, Amazon can use them.
If no name-value pairs are received, Amazon knows that you have not visited before. The server creates a new ID for you in Amazon's database and then sends name-value pairs to your machine in the header for the Web page it sends. Your machine stores the name-value pairs on your hard disk.
The Web server can change name-value pairs or add new pairs whenever you visit the site and request a page.
There are other pieces of information that the server can send with the name-value pair. One of these is an expiration date. Another is a path (so that the site can associate different cookie values with different parts of the site).
You have control over this process. You can set an option in your browser so that the browser informs you every time a site sends name-value pairs to you. You can then accept or deny the values.
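The exchange described above can be sketched in a few lines of Python using the standard http.cookies module. This is only an illustration under stated assumptions, not how Amazon or any real site is implemented: the function name handle_request, the in-memory set standing in for the site's database, the cookie name UserID, and the expiration date are all made up for the example.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Dict, Optional, Set


def handle_request(cookie_header: str, database: Set[str]) -> Dict[str, Optional[str]]:
    """Simulate a server handling one page request.

    cookie_header is the name-value text the browser sent ("" if none).
    Returns the visitor's ID and, for new visitors, the Set-Cookie value
    the server would place in the response header.
    """
    received = SimpleCookie()
    received.load(cookie_header)

    # Returning visitor: the browser sent back an ID we issued earlier.
    if "UserID" in received and received["UserID"].value in database:
        return {"user_id": received["UserID"].value, "set_cookie": None}

    # New visitor: create an ID, record it, and send it in the header.
    new_id = uuid.uuid4().hex.upper()[:16]
    database.add(new_id)
    outgoing = SimpleCookie()
    outgoing["UserID"] = new_id
    outgoing["UserID"]["path"] = "/"  # applies to the whole site
    outgoing["UserID"]["expires"] = "Fri, 31 Dec 2010 23:59:59 GMT"  # arbitrary date
    return {"user_id": new_id, "set_cookie": outgoing["UserID"].OutputString()}


if __name__ == "__main__":
    db: Set[str] = set()
    first = handle_request("", db)  # no cookie file yet
    print("server sends:", first["set_cookie"])
    # The browser stores the pair and sends it back on the next request.
    second = handle_request("UserID=" + str(first["user_id"]), db)
    print("recognized returning visitor:", second["user_id"])
```

On the first request the server has nothing to go on, so it issues an ID together with a path and an expiration date. On the second request the browser returns the stored pair and the server recognizes the visitor without sending anything new.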
How Do Web Sites Use Cookies?
Cookies evolved because they
solve a big problem for the people who implement web sites. In the broadest
sense, a cookie allows a site to store state information on your machine. This
information lets a web site remember what state your browser is in. An ID is one
simple piece of state information -- if an ID exists on your machine, the site
knows that you have visited before. The state is, "Your browser has visited the
site at least one time", and the site knows your ID from that visit.
Web sites use cookies in many different ways. Here are some of the most common examples:
WEAT CC=RI%5FKingston&REGION= www.msn.com/
Since I live in Kingston, RI, this makes sense.
Most sites seem to store preferences like this in the site's database and store nothing but an ID as a cookie, but storing the actual values in name-value pairs is another way to do it (we'll discuss why this approach has lost favor below).
E-commerce sites can implement things like shopping carts and "quick checkout" options. The cookie contains an ID and lets the site keep track of you as you add different things to your cart. Each item you add to your shopping cart is stored in the site's database along with your ID value. When you check out, the site knows what is in your cart by retrieving all of your selections from the database. It would be impossible to implement a convenient shopping mechanism without cookies or something like them.
In all of these examples, note that what the database is able to store is things you have selected from the site, pages you have viewed from the site, information you give to the site in online forms, etc. All of the information is stored in the site's database, and a cookie containing your unique ID is all that is stored on your computer in most cases.
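The shopping cart arrangement can be sketched as follows. This is a minimal illustration, assuming an in-memory dictionary in place of the site's real database; the class name CartDatabase, its methods, and the sample items are hypothetical. The point is that the cookie carries only the ID, while the cart contents live on the server.

```python
from typing import Dict, List


class CartDatabase:
    """Stand-in for the site's database: cart items keyed by cookie ID."""

    def __init__(self) -> None:
        self._carts: Dict[str, List[str]] = {}

    def add_item(self, user_id: str, item: str) -> None:
        """Record an item against the ID the browser sent in its cookie."""
        self._carts.setdefault(user_id, []).append(item)

    def checkout(self, user_id: str) -> List[str]:
        """Return everything stored for this ID and empty the cart."""
        return self._carts.pop(user_id, [])


if __name__ == "__main__":
    db = CartDatabase()
    my_id = "A9A3BECE0563982D"  # the only thing stored on the user's machine
    db.add_item(my_id, "book")
    db.add_item(my_id, "lamp")
    print(db.checkout(my_id))
```

Two visitors with different IDs never see each other's carts, because every lookup goes through the ID value.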
Problems with Cookies
Cookies are not a perfect state
mechanism, but they certainly make a lot of things possible that would be
impossible otherwise. Here are several of the things that make cookies
imperfect.
People often share machines -- Any machine that is used in a public area, and many machines used in an office environment or at home, are shared by multiple people. Let's say that you use a public machine (in a library, for example) to purchase something from an on-line store. The store will leave a cookie on the machine, and someone could later try to purchase something from the store using your account. Stores usually post large warnings about this problem, and that is why. Even so, mistakes can happen. For example, I had once used my wife's machine to purchase something from Amazon. Later, she visited Amazon and clicked the "one-click" button, not realizing that it really does allow the purchase of a book in exactly one click.
On something like a Windows NT machine or a UNIX machine that uses accounts properly, this is not a problem, because the accounts keep each user's cookies separate. Other operating systems are much more relaxed about accounts, and on those machines it is a problem.
Cookies get erased -- If you have a problem with your browser and call tech support, probably the first thing that tech support will ask you to do is to erase all of the temporary Internet files on your machine. When you do that you lose all of your cookie files. Now when you visit a site again, that site will think you are a new user and assign you a new cookie. This tends to skew the site's record of new versus return visitors, and it also can make it hard for you to recover previously stored preferences. This is why sites ask you to register in some cases -- if you register with a user name and a password, you can re-login even if you lose your cookie file and restore your preferences. If preference values are stored directly on the machine (as in the MSN weather example above), then recovery is impossible. That is why many sites now store all user information in a central database and store only an ID value on the user's machine.
Multiple machines -- People often use more than one machine during the day. For example, I have a machine in the office, a machine at home and a laptop for the road. Unless the site is specifically engineered to solve the problem, I will have a separate cookie file on each of the three machines. Any site that I visit from all three machines will track me as three separate users. It can be annoying to set preferences three times. Again, a site that allows registration and stores preferences centrally may make it easy for me to have the same account on three machines, but the site developers must plan for this when designing the site.
There are probably not any easy solutions to these problems, short of asking users to register and storing everything in a central database.
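Here is a minimal sketch of that registration approach, assuming the site keeps a table linking each cookie ID to an account once the user logs in. The class name CentralPreferences, its methods, and the sample IDs are invented for the illustration, and checking the user name and password is assumed to happen elsewhere.

```python
from typing import Dict


class CentralPreferences:
    """Preferences stored per account; cookie IDs only point at an account."""

    def __init__(self) -> None:
        self._account_for_cookie: Dict[str, str] = {}
        self._prefs_for_account: Dict[str, Dict[str, str]] = {}

    def link(self, cookie_id: str, username: str) -> None:
        """Call after a successful login from the machine holding cookie_id."""
        self._account_for_cookie[cookie_id] = username
        self._prefs_for_account.setdefault(username, {})

    def set_preference(self, cookie_id: str, name: str, value: str) -> None:
        """Store a preference for the account this cookie ID is linked to."""
        username = self._account_for_cookie[cookie_id]
        self._prefs_for_account[username][name] = value

    def preferences(self, cookie_id: str) -> Dict[str, str]:
        """Return a copy of the stored preferences; empty if never logged in."""
        username = self._account_for_cookie.get(cookie_id)
        if username is None:
            return {}
        return dict(self._prefs_for_account[username])


if __name__ == "__main__":
    site = CentralPreferences()
    site.link("office-cookie", "alice")
    site.set_preference("office-cookie", "weather", "Kingston, RI")
    site.link("home-cookie", "alice")  # same person, second machine
    print(site.preferences("home-cookie"))
```

The same mechanism covers an erased cookie: the site issues a fresh ID, the user logs in again, and the new ID is linked to the old account with its preferences intact.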
Why the Fury around Cookies?
If you have read the article to
this point, you may be wondering why there has been such an uproar in the
media about cookies and Internet privacy. You have seen in this article that
cookies are benign text files, and you have also seen that they provide lots
of useful capabilities on the web.
There are two things that have caused the strong reaction around cookies:
The first is something that has plagued consumers for decades but is now getting out of hand. Let's say that you purchase something from a traditional mail order catalog. The catalog company has your name, address and phone number from your order, and it also knows what items you have purchased. It can sell your information to others who might want to sell similar products to you. That is the fuel that makes telemarketing and junk mail possible. On a web site, the site can track not only your purchases, but also the pages that you read, the ads that you click on, etc. If you then purchase something and enter your name and address, the site potentially knows much more about you than a traditional mail order company does. This makes targeting much more precise, and that makes a lot of people uncomfortable.
Different sites have different policies. Most reputable sites have a strict privacy policy about not selling or sharing any personal information about their users with any third party, except in cases where you specifically tell them to do so (for example, in an opt-in email program). They do combine information from many users into aggregate statistics and distribute those.
The second is new. There are certain infrastructure providers that can actually create cookies that are visible on multiple sites. DoubleClick is the most famous example of this. Many companies use DoubleClick to serve ad banners on their sites. DoubleClick can place small (1x1 pixels) GIF files on the site that allow DoubleClick to load cookies on your machine. DoubleClick can then track your movements across multiple sites. It can potentially see the search strings that you type into search engines (due more to the way some search engines implement their systems, not because anything sinister is intended). Because it can gather so much information about you from multiple sites, DoubleClick can form very rich profiles. These are still anonymous, but they are rich. DoubleClick then went one step further. By acquiring a company, DoubleClick threatened to link these rich anonymous profiles back to name and address information -- it threatened to personalize them, and then sell the data. That began to look very much like spying to most people, and that is what caused the uproar.
DoubleClick and companies like it are in a unique position to do this sort of thing, because they serve ads on so many sites. Cross-site profiling is not a capability available to individual sites, because cookies are site specific.
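The difference between an individual site and an ad provider can be sketched like this. It is a simplified model, not a description of DoubleClick's actual system: the class name AdServer, the ID format, and the example site names are made up, and the sketch simply assumes the ad server learns which site is hosting the banner with each request.

```python
from collections import defaultdict
from itertools import count
from typing import Dict, List, Optional


class AdServer:
    """One ad provider whose banners appear on many different sites."""

    def __init__(self) -> None:
        self._next_number = count(1)
        # Anonymous profile: cookie ID -> sites where a banner was loaded.
        self.profiles: Dict[str, List[str]] = defaultdict(list)

    def serve_banner(self, cookie_id: Optional[str], hosting_site: str) -> str:
        """Serve a banner; return the cookie ID the browser should keep.

        cookie_id is None when the browser has no cookie from the ad
        provider yet. hosting_site is an assumed input for this sketch.
        """
        if cookie_id is None:
            cookie_id = f"visitor-{next(self._next_number)}"
        self.profiles[cookie_id].append(hosting_site)
        return cookie_id


if __name__ == "__main__":
    ads = AdServer()
    my_cookie = ads.serve_banner(None, "news.example")
    my_cookie = ads.serve_banner(my_cookie, "shop.example")
    print(ads.profiles[my_cookie])
```

Because the banner always comes from the ad provider's own server, the browser treats the cookie as belonging to that one provider and sends it back no matter which site displays the banner. Each hosting site, by contrast, still sees only its own cookie.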