Do Not Track means Do Not Track

I’ve been giving some thought to proposed “Do Not Track” legislation. The proposals, currently being considered by the FTC and the legislature, seek to protect user privacy by giving us a way, with teeth, to tell online services not to track us. Whatever approach is adopted would establish some way for users to communicate their preference not to be tracked, and oblige service providers to honour that instruction.

[Image: Do Not Track. CC-NC by Peter_Schauer on Flickr]

Although the name evokes the FTC’s Do Not Call list, the appropriate implementation would be somewhat different. It is difficult or impossible to implement a list like Do Not Call, since there is no fundamental, persistent online identifier like a phone number. The best candidates, IP addresses, change frequently and are often shared between several users. There have been various suggestions, but a commonly accepted approach is the x-do-not-track HTTP header. Without going into too much detail: when a browser accesses a website, it sends certain headers, letting the site know what language it wants, what sort of encoding to use, and so on. x-do-not-track would just be another optional header that some browsers send, indicating a binding request not to be tracked.
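As a rough sketch of what honouring the header might look like on the server side, consider the Python below. It is only an illustration under assumptions of my own: the header value "1", and the names wants_no_tracking and handle_request, are hypothetical and not taken from any proposal or real server.

```python
def wants_no_tracking(headers: dict) -> bool:
    """Return True if the request carries x-do-not-track with the value "1".

    HTTP header names are case-insensitive, so they are lower-cased first.
    The value "1" is an assumption made for this sketch.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("x-do-not-track") == "1"


def handle_request(headers: dict, visit_log: list) -> str:
    """Serve the same content either way; only record the visit if permitted."""
    if not wants_no_tracking(headers):
        # Hypothetical stand-in for whatever record a site keeps per visit.
        visit_log.append(headers.get("User-Agent", "unknown"))
    return "article body"


visit_log = []
handle_request({"User-Agent": "BrowserA", "X-Do-Not-Track": "1"}, visit_log)
handle_request({"User-Agent": "BrowserB"}, visit_log)
print(visit_log)
```

Both requests get the same response; only the second leaves a record.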

This is actually a pretty robust approach to this problem, though there remain a few unanswered concerns. Other commentators like Harlan Yu at Freedom to Tinker, and Arvind Narayanan at 33bits have suggested that this would result in a two-tiered web. That is: some services would refuse to provide users with content unless they disable x-do-not-track. I don’t find this to be the most compelling of possible concerns, since it can be solved legislatively with a provision like:

It shall be an offence under this act to refuse service on the basis of the instruction not to track. Any service, or part thereof, which can be provided to an untracked user, and is provided to trackable users, must be provided to an untracked user on the same terms as it is provided to trackable users.

I see a greater issue in the provision itself, that is: Do Not Track. My concern is reminiscent of Justice Black’s famous statement that “‘no law’ means no law”. If there are some users whom one cannot track, then one cannot keep any meaningful record of their use of the service. That means no accurate count of how many users access a service, nor even an estimate of what fraction of users request not to be tracked. For non-interactive content sites, this presents something of a concern. The New York Times, for instance, does not need to track users in order to show them articles. How, then, should the Grey Lady bill its advertisers, since it can certainly no longer use the number of impressions?

The above paragraph does make one slight assumption. Although it may not be possible to determine what fraction of a site’s visitors are untrackable through automated means, it is still possible to get this information other ways – such as by asking nicely. It only takes one daring social scientist or market research firm to survey users, in order to produce reliable data about various demographics’ use of x-do-not-track. Then it just takes a little statistical analysis for a service to infer the size of its untrackable population on the basis of its tracked population.
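To make that inference concrete, here is a minimal sketch of the arithmetic. It assumes a survey has produced, for each demographic group, the fraction p of users who send the header. If a site counts T tracked visitors in a group, those are the remaining (1 - p) of the group, so the untracked visitors number roughly T × p / (1 - p). The function names, group labels and figures below are all invented for illustration.

```python
def estimate_untracked(tracked_count: int, dnt_rate: float) -> float:
    """Estimate untracked visitors from tracked ones.

    tracked_count: visitors the site was allowed to count.
    dnt_rate: surveyed fraction of this group sending the header (0 <= rate < 1).
    """
    if not 0 <= dnt_rate < 1:
        raise ValueError("dnt_rate must be in the range [0, 1)")
    return tracked_count * dnt_rate / (1 - dnt_rate)


def estimate_total(tracked_by_group: dict, dnt_rate_by_group: dict) -> float:
    """Sum tracked plus estimated untracked visitors over all groups."""
    return sum(
        count + estimate_untracked(count, dnt_rate_by_group[group])
        for group, count in tracked_by_group.items()
    )


# Invented figures: 800 tracked visitors in a group where 20% send the header,
# and 900 in a group where 10% do.
tracked = {"group_a": 800, "group_b": 900}
rates = {"group_a": 0.2, "group_b": 0.1}
print(round(estimate_total(tracked, rates)))
```

With these made-up numbers, each group works out to about 1,000 visitors in total. The estimate is only as good as the survey, and it assumes that users who send the header visit the site at the same rate as those in their group who do not.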

This actually has the potential to be good news for such services. If users can now use their sites confident in their anonymity, they are less likely to block their number one source of tracking: advertising. Surely a world where the privacy-conscious see and click on ads is better than their current habit of disabling them altogether?

About flamsmark

I do privacy at Mozilla. Years of security have left me incurably paranoid. Tech, policy, security, privacy, & anonymity are good. Open is better. GPG: 80AF07D3
This entry was posted in Commentary, Essays, Responses.

2 Responses to Do Not Track means Do Not Track

  1. Sajid says:

    Disclaimer: Since I mention Google several times in this comment (and it is public), I figure I’ll just preface this with the fact that (a) I’m writing this from my personal viewpoint, and it doesn’t represent the view of Google, and (b) none of this reflects any internal knowledge of Google I have. I actually know depressingly little about how Google handles a lot of the things I talk about (so much to learn and I’m only 2 months in!)

    I really don’t follow you on this one Tom. I think there are a lot of impracticalities you’re brushing over.

    I’m confused as to what kind of behavior you expect from a web server receiving an HTTP request with the X-Do-Not-Track header. I’ve always thought of it as a signal that means “Do not actively connect this request with any other request I make.” As in: “don’t serve me a particular ad as a response to request X because I visited page Y a few hours ago.”

    However, it seems like you have a stricter definition of the header, something along the lines of: “don’t make any record of a request containing the X-Do-Not-Track header,” since you say that sites would not be able to directly measure the number of untrackable users they have. If this is the definition you propose, I think this header is impossible to have in the modern web.

    For example, some web servers might log some information before they even fully parse HTTP headers. After all, IP address information is all the way down in L3 (handled by the OS), while HTTP headers are at the top of the OSI stack at L7. Are you OK with this bill outlawing all web server implementations that have a particular architecture?

    You could make the argument that not logging IP addresses is the right thing to do, since after all, some amount of tracking could be done by correlating requests across IP addresses (which, even if dynamic, can persist for months at a time). But, if you can’t log IP addresses, how do you expect applications to load-balance across geographically distributed data centers? If Google didn’t know that you generally access Gmail from Eastern US IP addresses, your inbox could end up in Japan and be awfully slow.

    Just use my IP to get me to the right data center, and throw it away, you say? Well then, it’d really suck for you if you opened your Gmail account while visiting Russia, and then went back to the US. If Google didn’t log your IP address, it’d never figure out that it needs to move your inbox to a US data center, and every inbox-related action you took would be routed through Siberia.

    Even supposing you did mean the less-restrictive definition of “don’t let the response to request X depend on request Y if Y contained the do-not-track header, for all requests X,” there are still tons of problems. It seems like the provision you write solves one problem (the two-tiered web), and causes another one that’s orders of magnitude more severe.

    Consider this (incomplete) list of features that are impossible to provide to a “do not track” user under this definition:

    + Amazon’s “products you might like” feature
    + Amazon’s “selling” feature (shopping carts are tracked using unique identifiers. Sure, a complete re-write of their e-commerce platform might allow them to sell things using a system that keeps all state on the browser and sends it with each request, but this is (a) enormously impractical for an operation the size of Amazon (b) much more vulnerable to exploits)
    + Hulu’s “pick up where you left off in this video” feature.

    For every web application on the web that does anything more than serve up static pages, you’ve just doubled the size of their test suite, since they have to now make sure that everything works with and without the header (and in many cases, it could be worse than double: for various sequences of requests with/without the header). It’d really suck if Hulu’s whole video player crashed because the module that tried to figure out if it should try to pick up where you left off was buggy and mishandled the do-not-track header.

    And, note that because of your proposed provision, *every* site has to do this extra work, whether or not they have a single user that uses this new header.

    Oh and in response to your last comment, the added revenue from privacy-conscious users who disable AdBlock (a tiny fraction of AdBlock users, mind you; most disable ads because they either (a) annoy them (b) slow down browsing (c) expose them to malware) is dwarfed by the increased click-through-rate (and therefore revenue) generated by the increased relevance behavioral tracking brings. (It is, after all, one of the main reasons DoubleClick was worth $3.1 billion to Google).

  2. Raj says:

    First let me begin by saying I’m far from an expert in this field, I only have my own common sense to apply, or what I assume to be common sense.

    Sajid made a point I was wondering about myself. How would a host treat the request if accompanied with the DNT header? I am inclined to go with the expected meaning of not storing anything related to my browsing habit within the site/network and which ads I’ve seen, beyond the scope of my current visit. And if anything needs to be stored to guarantee functionality during the visit, I prefer it to be stored on my end of the line, either in a cookie or whatever other storage mechanisms exist. And again, only for the duration of my visit.

    The reason I am thinking this way is that this information does not necessarily need to be stored server side to be of the same use, much if not all could also be stored on the client side. “People who bought” type of information can also be inferred from analyzing sales history to see which articles got sold most in the same sale in conjunction with the currently selected article. In the case of things such as audio and video and games it’s even easier…you only need look at the genre the current article falls in to make a suggestion that might be closer to home than by looking at “Who bought what”. And like flamsmark said, if you know through research the number of people who would actively use the DNT versus those who do not care/know.

    Likewise, GMail does not need to know my exact IP in order to perform load balancing or to determine the most logical data center to host my inbox. It only needs to look back a couple of hops to determine from which continent I hail or even from which country. Just by looking at the route my request took to the data center I am currently using. It does not take that many hops to figure I’m no longer in Kansas…

    Personally speaking, as long as the information required to give me a pleasant browsing experience, gets stored on my own box instead of in some statistical goldmine, I would have a lot less issues with a site using this info for the duration of the visit.

    In short, I expect it to be treated as a “do not permanently store information related to my visit and person, but as long as I am here, you may use what I tell you”. As long as the site/network forgets I ever existed the moment I step outside.

    I do believe that a “DNC” equivalent for the internet is coming. If not voluntary then by law, and even if started voluntary, it might still end up as law just like is already happening with DNC registration in the country I live in.

    In the end I still am left wondering if it will really change something. It may change the face of the more ‘direct’ tracking mechanisms, but what about the indirect methods? Lately I am seeing social network sites being represented on more and more sites by some type of “Share” button, which in a whole lot of cases is retrieving external resources from these same networks. Link that with the fact most folks don’t seem to have a helluva lot of objections to shouting out to the world what they just did/bought/saw/liked and whatever other bit of private info you can think of.

    These days you can almost not open a page and not encounter a Share on Facebook/Twitter button.
    I can imagine that this provides Facebook with an invaluable amount of information related to browsing habits…and we all know privacy policies aren’t set in stone…
