      User Privacy and the Evolution of Third-party Tracking Mechanisms on the World Wide Web

                      Sonal Mittal

                      May 18, 2010
                                          Abstract

Third-party tracking refers to tracking done by websites that a user never navigates to
explicitly. Many Internet users are vaguely aware that their information may be collected
online. However, survey data suggest that users know relatively little about third-party tracking
and its associated privacy risks. The FoxTracks software tool attempts to address this lack of
knowledge about third-party online tracking for the benefit of interested users with varying
levels of technical knowledge. FoxTracks is a Firefox add-on program that browses the
web along with the user and collects information about three types of trackers that may
be monitoring the user: HTTP cookies, Local Shared Flash Objects, and DOM Storage
entries. The interface to FoxTracks displays the user’s information as it has been collected
by the trackers; the highly personalized view of third-party tracking is uniquely accessible
and informative for end-users. Beyond the development of FoxTracks, the analysis presented
in this thesis discusses the history, key players, and motivations of third-party tracking, and
how each influenced the design choices made in the software. In particular, the motivations
of third-party entities, who are frequently online advertisers, are examined at length. A
computer security rubric is then applied to the behavior and tracking methodologies of third
parties in order to show their adversarial qualities in matters of user privacy.
Contents

1 Introduction

2 Third-Party Tracking in the Literature and Code

3 HTTP Cookies and Web bugs
  3.1 The Introduction of State Management Mechanisms
  3.2 Advertisers and Personally Identifiable Information
  3.3 HTTP Cookies and Web Bugs in FoxTracks

4 Flash Local Shared Objects
  4.1 A History of LSOs
  4.2 Corporations and Market Incentives
  4.3 Flash Cookies in FoxTracks

5 DOM Storage
  5.1 Web Storage in the W3C Standard
  5.2 Case Study: Gmail Mobile Privacy
  5.3 Community Approach to the Study of DOM Storage

6 Results
  6.1 The FoxTracks Implementation
  6.2 Third Parties as Privacy Adversaries

7 Conclusions

Acknowledgements

Bibliography




Chapter 1

Introduction

Protection of online privacy refers to freedom from unwanted interferences with an Internet
user’s digitally stored, personal data. This includes data residing on a user’s local computer,
data transmitted by a user to remote servers, and data that is generated in the process of
browsing websites: mouse clicks, searches, history, and other page inputs. Evolving Internet
protocols and standards have contributed to a strong emphasis on user self-management of
online privacy rather than legal or regulatory restrictions on how remote servers can collect
and use user information. Users’ general lack of knowledge about the transmission and uses
of personal data, combined with limited privacy regulation on the Internet, has left many
paths open for the aggregation and use of individuals’ web data without their consent.
Many non-expert Internet users are vaguely aware that their information and data may be
collected online. However, surveys of users such as Internet-using college students suggest
that individuals know little about the pervasiveness of online tracking and the kinds of
personal information that can be recorded [9][3]. The surveys show there is particularly little
awareness about the data collection and activity logging undertaken by third-party websites.
Third-party websites are entities that a user never visits explicitly; they are contrasted with
first-party websites, which correspond to URLs that a user enters in a browser address bar.
Because third-party tracking is undertaken surreptitiously by unfamiliar entities, it is less
transparent than tracking done by first-party websites. Users are able to consult the stated
privacy policies of first-party websites in order to understand how their data and activity
will be monitored. However, without knowing the identities of third-party websites, a user
cannot examine their privacy policies or opt out of their tracking. Personal information
collected in an opaque manner by remote entities poses an especially serious privacy risk,
since digital data can be easily copied and distributed to additional parties.
   To increase awareness about third-party tracking and survey common online tracking
methodologies, I created the FoxTracks software tool. FoxTracks is designed as a Firefox
web browser add-on program and was created using JavaScript and XUL utilities. The ultimate
goal of the software is to educate regular Internet users about the different kinds of
tracking technologies employed by third parties and to demonstrate the great extent to which
their activity and data are monitored by unfamiliar entities. To achieve this aim, I designed
FoxTracks to show how three different tracking technologies personally affect the user as she
browses the web. FoxTracks contains three tab panels, one for each of HTTP cookies, Local
Shared Objects, and DOM Storage. Each panel aims to show the user which third parties
are using the technology to track her online activities and what personal information they
have collected. Each panel also provides links to web pages answering potential questions
about the tracking technology, the types of user information at risk, and available opt-out
mechanisms for the technology. The web content associated with each panel was generated
from my research and reviewed by privacy experts at the Center for Democracy and
Technology (CDT).1 The CDT also kindly hosts the FoxTracks web content on their servers.2
Beyond the FoxTracks development cycle, the analysis presented in this thesis discusses the
history, key players, and motivations of third-party tracking, and how each influenced the
design choices made in the software. In the analysis, I also explore the following research
questions: Does the evolution of tracking technologies suggest an adversarial relationship
between users and third parties, as in the computer security paradigm? To what extent does
the Internet infrastructure (e.g., HTML standards) facilitate third-party tracking? Should
Internet users be comfortable with opaque information collection, and if not, what kinds of
responses are effective?

   1 CDT is a non-profit public interest organization working at the intersection of law,
technology, and policy. It is headquartered in Washington, D.C. More information can be
found on their website: http://www.cdt.org/
   2 See http://www.cdt.org/foxtracks/. Because of CDT’s contribution, their logo appears on
the FoxTracks tab panels.
   For Internet users who are interested in learning more about online privacy and the risks
posed by third-party tracking, FoxTracks is an accessible, all-in-one resource. By showing
users the information that specific third parties have collected about them, FoxTracks demon-
strates privacy risks in a novel, highly personalized way. With a better understanding of
third-party tracking, users are able to make more informed decisions about how they browse
the web. As such, FoxTracks has the ability to synchronize user beliefs about digital privacy
with online behavior in a way that closes the information gap suggested by Internet user
surveys. For instance, FoxTracks adopters may adjust their browser settings and browsing
habits based on how they find their information is collected and used by third parties. At
a minimum, users who learn about third-party tracking will continue their current browsing
patterns in a more transparent environment—one in which third parties have the “informed
consent” of users to track their online activities. In this way, FoxTracks plays an important
public role in spreading information about third-party tracking to online content consumers.
The accompanying written analysis of the history and identity of third parties, and of the
potential uses of collected user data, also contributes to the public discourse on privacy.
Through my research, I find that third parties have developed increasingly advanced tech-
nologies to combat user efforts to restrict access to personal data. Additionally, they have
significant economic motives to track users and are heavily aided by new HTML standards.
Taken together, these findings suggest that third parties act like adversaries to individual
users under the computer security paradigm. It follows that users should take active, and
perhaps organized, steps to restrict third-party activities online.


   This thesis is organized as follows: Chapter 2 contextualizes my software and analysis
within the existing body of computer science literature on third-party tracking. The following
three chapters explore the three types of third-party tracking technologies included in the
FoxTracks tool. Chapter 3 is an overview of HTTP cookies and web bugs in relation to third-party
tracking, and Chapter 4 examines how third parties make use of Flash Local Shared Objects.
Chapter 5 explores the new DOM storage feature of the HTML5 standard. Each of these
chapters describes the basic technology, how the technology is modified to serve information
collection purposes, types and potential uses of the information collected by the technology,
and how these facts influenced the design of FoxTracks. Results related to the FoxTracks
software and the proposed research questions are given in Chapter 6. Chapter 6 also reviews
the results by considering some of the limitations of the software. Chapter 7 concludes the
thesis and discusses further directions for research.




Chapter 2

Third-Party Tracking in the Literature and Code

HTTP cookies have been specified by an IETF standard since 1997 (RFC 2109, revised as RFC 2965
in 2000). Since their introduction, cookies and web bugs, a functionally similar tracking
technique, have generated a great deal of academic interest. Kristol (2001), co-author of the
original cookie standard, was among the first to give a general overview of how cookie
mismanagement could result in serious security breaches involving information leakage across
domains [6]. Kristol acknowledges that cookies can be used for third-party profiling even in
correctly implemented cookie management systems and questions whether users are aware of the
tracking potential of cookies. Similar
explanations and concerns have been reiterated as tangential points in numerous academic
computer security papers since 2001. Other researchers have taken a code-based approach
to the analysis of HTTP cookies; motivated by a desire for web transparency, they have de-
veloped software programs for viewing and managing first-party and third-party cookies. A
great deal of this research has come from the public sector with individual programmers and
non-profit advocacy groups undertaking software development and publicizing their work
and findings. The extent of this research is evidenced by the myriad cookie management
programs available for download today.
   Ghostery, a popular Firefox add-on, is one of the few programs designed exclusively to
identify and block third-party HTTP cookies and the web bugs associated with them. Unlike
regular cookie managers, Ghostery strives to give users information about the third parties
attempting to set cookies through first-party websites that users visit. Specifically, it alerts
users about the identities and first-party associations of third-party trackers in real time
using a menu located in the Firefox status bar at the bottom of the window. The menu also links to more
information about the identified trackers. This informational quality of Ghostery addresses
a deficiency of popular cookie managers—an average Internet user without a precise under-
standing of third-party cookies and web bugs can easily use Ghostery and learn about online
tracking in the process. Though Ghostery is the closest content analog to FoxTracks, it does
not provide a user with a complete picture of how third parties log user activity on the web.
FoxTracks attempts to address this informational deficiency by compiling a history of all
first-party websites on which a given third-party tracker has been found in order to show the
browsing history profile that the tracker compiles.
   Local Shared Objects (LSOs) refer to client-side, remotely-accessible storage. Adobe
makes use of LSOs in its popular Flash video player. Much of the interest in LSO im-
plementations has come from the private sector, with various web company white papers
outlining the potential of Adobe’s Flash LSOs as client-side storage bins. LSOs in general
have been examined extensively in systems and networking literature as persistent storage
bins for communication between machines. They have been explored as a mechanism for
executing malicious attacks on host computers, and their third-party tracking potential is
frequently noted in privacy papers. Soltani et al. (2009) were the first to present a holistic
picture of Flash LSO usage and web practices [13]. They conclude that Flash LSOs present
a substantial privacy threat because of their third-party capabilities and because they are
hidden from users, which makes them especially persistent.
   As with HTTP cookies, a great deal of LSO privacy analysis has come from the public
sector. Organizations such as the Electronic Privacy Information Center (EPIC) have compiled
public fact sheets on how third parties use LSOs to track users, and many individual
developers have created stand-alone and browser add-on programs for Flash LSO management.
In the Firefox add-on tradition, BetterPrivacy is the most widely used LSO removal
and editing tool. BetterPrivacy lists all the LSOs that currently exist on a user’s machine
and allows a user to select the frequency and timing of LSO deletion. Despite a user-friendly
design, BetterPrivacy adopts the “install and forget” browser add-on model and thus pri-
marily benefits users with an understanding of LSOs and their risks. The add-on may be
less useful for users who do not have a good understanding of first-party and third-party
LSOs, or the nature of the privacy threats that they present. With FoxTracks, my goal was
to build on BetterPrivacy’s control over obfuscated LSOs by providing a visual explanation
of how third parties may set and use LSOs. By illuminating actual LSO tracking of a user’s
web activities, I extend LSO control to a wider, non-technical audience.
   Unlike HTTP cookies and LSOs, DOM Storage remains relatively unexplored in the aca-
demic and private sector literature. Where it does appear, it is studied for its efficiency
properties as a remote-access storage space. Neither public sector organizations nor individual
web developers have built software tools that specifically examine DOM Storage contents or
explain them to users. Some web pages hosted by privacy interest groups like the Electronic
Frontier Foundation (EFF) briefly describe where DOM Storage data is located, without more
specific information or examples of DOM Storage use in
user tracking. FoxTracks aims to bring DOM Storage more fully into the privacy discourse
by exposing its contents to users and beginning to clarify its role in third-party tracking.
FoxTracks is thus a unique development in the body of work on DOM Storage.
   In sum, there is a substantial amount of literature on HTTP cookies, significant work on
LSOs, and relatively little information about the uses of DOM Storage in tracking. While the
existing literature may inform interested users about these technologies, it provides high-level
explanations rather than a personally relevant demonstration of privacy invasion. The latter
is a more accessible and tangible educational experience for users with little background in
online privacy. Because current software tools that do show how users are personally affected
by tracking require intermediate technological knowledge, I designed FoxTracks to be
relevant and accessible to all Internet users. Moving beyond the traditional storage management
model to a personalized approach allows non-expert users to see how tracking figures
actively in their browsing experience. The accessibility of FoxTracks comes from its interface
and its all-in-one nature. Including all three tracking technologies in a single privacy man-
agement tool provides a novel, holistic survey of third-party tracking on the web. Moreover,
each technology represents an advance in the capabilities and persistence of trackers, which
informs users about the evolution and growth of third-party trackers. FoxTracks aims to
increase general knowledge and awareness of third-party tracking through these educational
and accessible qualities.




Chapter 3

HTTP Cookies and Web bugs

3.1         The Introduction of State Management Mechanisms

As the complexity of web applications grew in the late 1990s, the Internet Engineering Task
Force (IETF)1 recognized the value of adopting an HTTP state management mechanism.2
The IETF believed such a mechanism could support virtual shopping carts for e-commerce
and improve the user browsing experience by “remembering” preferences for websites [7].
The state management mechanism adopted was the HTTP cookie (cookie). Cookies are
small pieces of text that servers can set and read from a client computer in order to register
its “state.” They have strictly specified structures, and browsers typically limit each cookie to
about 4 KB of data. When a user navigates to a particular domain, the server may set a cookie
on the user’s machine, either through a Set-Cookie header in its HTTP response or through a
script on the page. The browser will send this cookie in all subsequent communication between
the client and the server until the cookie expires or is reset by the server.
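The set-and-return cycle described above can be sketched with Python's standard `http.cookies` module. This is a minimal illustration under stated assumptions, not how any particular browser is implemented: the cookie name `prefs`, its value, and the domain `shop.example` are hypothetical, and the toy cookie jar ignores expiry, path, and domain matching.

```python
from http.cookies import SimpleCookie

# Server side: the first response carries a Set-Cookie header.
# The cookie name "prefs" and its value are hypothetical.
response_cookie = SimpleCookie()
response_cookie["prefs"] = "lang-en"
response_cookie["prefs"]["path"] = "/"
response_cookie["prefs"]["max-age"] = 86400  # lifetime in seconds
set_cookie_value = response_cookie["prefs"].OutputString()
print("Set-Cookie:", set_cookie_value)

# Client side: the browser parses the header and stores the name/value pair
# under the server's domain (expiry handling is omitted in this sketch).
jar = {}
parsed = SimpleCookie()
parsed.load(set_cookie_value)
for name, morsel in parsed.items():
    jar.setdefault("shop.example", {})[name] = morsel.value

# On every later request to the same domain, only name=value pairs go back.
cookie_header = "; ".join(f"{n}={v}" for n, v in jar["shop.example"].items())
print("Cookie:", cookie_header)
```

Note that attributes such as `Max-Age` and `Path` travel only from server to browser; the browser returns just the name and value.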
   As predicted by the IETF, cookies have been used to improve the functionality of many
websites. For example, they have been used to implement online shopping carts, cache form
values, personalize website views, and transmit user authentication credentials [11].
Such use of cookies improves the user browsing experience and in turn benefits websites, which
receive more visitors. However, cookies can also compromise user privacy in many ways. At
the time of adoption, the IETF described the cookie’s potential for cross-domain information
exchange, a particularly serious threat to user privacy. The following text appears under the
header “Unexpected Cookie Sharing” in the IETF’s Request for Comments (RFC) 2965, the
document specifying the new cookie standard:


        A user agent should make every attempt to prevent the sharing of session infor-
        mation between hosts that are in different domains. Embedded or inlined objects
        may cause particularly severe privacy problems if they can be used to share cook-
        ies between disparate hosts. For example, a malicious server could embed cookie
        information for host a.com in a URI for a CGI on host b.com.3 User agent im-
        plementors are strongly encouraged to prevent this sort of exchange whenever
        possible.

   1 The IETF is an open standards organization that works with similar groups to propose and
review Internet standards.
   2 HTTP refers to the HyperText Transfer Protocol, which governs how requests are sent over
the Internet. HTTP is stateless: each request is handled independently, and the protocol itself
keeps no record of earlier requests between the same client and server.

   Users can navigate to webpages that load content such as images or advertisements from
third-party servers. Because the user’s browser requests this content directly from the
third-party server when a first-party page loads, the third party is able to set a cookie on the
user’s machine. Cookies set by these third parties have the potential to track a user’s
browsing habits. To see how this is possible, consider an image that is stored on a.com’s
servers and loaded on two websites: b.com and c.com. If a user navigates to b.com, a.com
can set a cookie containing a unique alphanumeric string on the user’s machine and associate
b.com with that string somewhere on its own servers (the browser typically reports the address
of the embedding page in the Referer header of the image request). When the user next
navigates to c.com, the browser requests the image from a.com again and sends along the
cookie a.com previously set. a.com can then recognize the unique string contained in the
cookie and associate c.com with the string. a.com now has a small profile of the user’s
browsing habits and can extend this profile to every website that hosts its content. Thus,
agents interested in tracking users are able to exploit the cookie state management mechanism
to capture users’ browsing habits without their knowledge or consent.

   3 A URI (Uniform Resource Identifier) identifies a resource on the web, such as a page or an
image; a CGI (Common Gateway Interface) program is a server-side script that generates
content in response to a request.
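The a.com / b.com / c.com scenario can be simulated in a few lines of Python. This is a minimal sketch, not a model of any real tracker: the classes `TrackerServer` and `ToyBrowser` are hypothetical, and the sketch assumes the browser accepts third-party cookies and reports the embedding page in the Referer header, while ignoring expiry, paths, and network details.

```python
import secrets
from collections import defaultdict


class TrackerServer:
    """Hypothetical third-party server (a.com) whose image is embedded on other sites."""

    def __init__(self, domain):
        self.domain = domain
        self.profiles = defaultdict(list)  # unique ID -> first-party pages seen

    def serve_image(self, cookies, referer):
        """Handle one image request; return any cookies to set on the client."""
        uid = cookies.get("uid")
        to_set = {}
        if uid is None:
            uid = secrets.token_hex(8)  # new unique alphanumeric string
            to_set["uid"] = uid
        self.profiles[uid].append(referer)
        return to_set


class ToyBrowser:
    """Cookie jar keyed by domain; ignores expiry, paths, and blocking settings."""

    def __init__(self):
        self.jar = defaultdict(dict)

    def visit(self, first_party, embedded_servers):
        page = f"http://{first_party}/"
        for server in embedded_servers:
            # When fetching the embedded image, the browser sends the third party
            # its own cookies plus the page URL (as the Referer header).
            sent = dict(self.jar[server.domain])
            self.jar[server.domain].update(server.serve_image(sent, referer=page))


a_com = TrackerServer("a.com")
browser = ToyBrowser()
browser.visit("b.com", [a_com])
browser.visit("c.com", [a_com])
for uid, pages in a_com.profiles.items():
    print(uid, pages)  # one random ID -> ['http://b.com/', 'http://c.com/']
```

The user never navigates to a.com, yet a.com ends up holding a single identifier linked to both first-party visits.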
   Web bugs are functionally similar to cookies set by third parties. Web bugs are embedded
objects, often transparent single-pixel images, that are loaded from third-party servers and
are invisible to users. Unlike third-party cookies, they do not set any data on a user’s computer.
Rather, they collect a user’s IP address, browser type, the current first-party URL, and read
any unique-string cookies that have been set by the third party in the past. If such a cookie
is found, the server is able to augment its profile of the user with the current first-party
URL. In this way, web bugs can be used in conjunction with third-party cookies to facilitate
user tracking.
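The data a web bug request exposes can be sketched as a server-side logging function. The function name `log_web_bug_request`, the cookie name `uid`, and all header values are hypothetical; the sketch assumes the tracker previously set a `uid` cookie and that the browser sends a Referer header, and it omits the invisible image that would be returned.

```python
from http.cookies import SimpleCookie


def log_web_bug_request(client_ip, headers, profiles):
    """Record what a (hypothetical) tracker learns from one invisible-image request."""
    entry = {
        "ip": client_ip,
        "browser": headers.get("User-Agent"),
        "page": headers.get("Referer"),  # current first-party URL, if the browser sends it
    }
    cookie = SimpleCookie(headers.get("Cookie", ""))
    if "uid" in cookie:
        # A previously set unique-string cookie links this hit to an existing profile.
        profiles.setdefault(cookie["uid"].value, []).append(entry["page"])
    return entry


profiles = {"3f2a9c": ["http://b.com/"]}
entry = log_web_bug_request(
    "203.0.113.7",
    {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 6.1) Firefox/3.6",
     "Referer": "http://c.com/article",
     "Cookie": "uid=3f2a9c"},
    profiles,
)
print(profiles)  # {'3f2a9c': ['http://b.com/', 'http://c.com/article']}
```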


3.2     Advertisers and Personally Identifiable Information

The previous section explains how third-party cookies can be used to track a user’s browsing
habits. However, understanding the full privacy consequences of third-party cookie tracking
requires an examination of third parties and the anonymity of user browsing profiles. I used
the Ghostery Firefox add-on to examine which organizations are carrying out third-party
tracking using cookies and web bugs. Ghostery identifies third-party trackers on a given web
page by first searching the underlying HTML of a web page for script tags. Tracking code is
typically loaded through such script tags from a domain different from that of the primary
URL. Once Ghostery has acquired all objects associated with script tags, it compares the
objects to a database of known third-party trackers in order to determine whether any of the
scripts loaded third-party trackers. At the time of writing, this database was among the most
comprehensive lists of known third-party trackers that make use of cookies and web bugs. It
includes over 200 trackers, a small sample of which is given in Table 3.1.

         Table 3.1: Some third-party trackers contained in the Ghostery database.

                                       Tracker Name
                                       ----------------
                                       Google Analytics
                                       Quantcast
                                       SiteMeter
                                       Omniture
                                       Facebook Connect
                                       Google Adsense
                                       Doubleclick
                                       Tacoda
                                       WebTrends
                                       AddThis
                                       Revenue Science

   The tracking agents listed in Table 3.1 are primarily ad networks and behavioral data
providers. Ad networks connect advertisers who want to reach potential customers with
sites that want to sell advertisement space. This business model allows ad networks to reach
a wide spectrum of small and medium-sized websites interested in taking on advertisements.
In 2009, 30% of the $8 billion spent on online advertising (roughly $2.4 billion) went to ad
networks rather than directly to websites selling advertising space [5]. One advantage ad
networks have over individual websites selling advertising space is the ability to display
the same ad across multiple websites that a single user visits. This advantage attracts
advertisers who desire multiple ad impressions per user [10]. Thus, being able to accurately
track user browsing habits has significant business consequences for ad networks. It follows
that ad networks have a strong incentive to use third-party cookies and web bugs to track
users across as many sites as possible. Like ad networks, behavioral data providers aim to
track and organize user browsing patterns. However, behavioral data providers are solely in
the business of segmenting users for targeted advertising and are not involved in the
advertising process itself. Such companies work with websites or ad networks to suggest
relevant ads for different users based on browsing history data collected through third-party
cookies and web bugs. For both types of companies, more user tracking yields more user data
points, which translate into improved business.
   Targeted advertising datasets containing users’ browsing habits can be augmented by
precise demographic data. Though the primary use of third-party tracking cookies and
web bugs is the aggregation of a user’s browsing history, cookies can also be used to determine
demographic information about the user. Cookies and web bugs have access to primary page
URLs, which may leak pieces of personal data such as login names or form data.
Third parties may process this information and associate it with the browsing profile and
unique string for the user [12]. This association is a serious threat to user privacy because
it may de-anonymize the browsing history profile that was otherwise only connected to an
alphanumeric string. The browsing profile newly associated with a specific person and/ or
her demographic information might be sold or publicized at the discretion of a tracking
company, resulting in a serious breach of user privacy.
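How a first-party URL can leak personal data to a tracker can be sketched with `urllib.parse`. The set of "sensitive" parameter names, the URL, and the profile are all hypothetical; real trackers and sites use many different naming schemes, and this sketch only inspects the query string.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical query-parameter names that often carry personal data.
SENSITIVE_KEYS = {"user", "username", "login", "email", "name", "zip"}


def leaked_fields(page_url):
    """Return query-string fields of a first-party URL that look like personal data."""
    query = parse_qs(urlparse(page_url).query)
    return {key: values[0] for key, values in query.items()
            if key.lower() in SENSITIVE_KEYS}


# A hypothetical first-party URL, as a tracker might see it in a Referer header.
referer = "http://shop.example/account?user=jdoe&email=jdoe%40example.com&zip=08544&page=2"
leaked = leaked_fields(referer)
print(leaked)  # {'user': 'jdoe', 'email': 'jdoe@example.com', 'zip': '08544'}

# Joining the leaked fields to the pseudonymous profile de-anonymizes it.
profile = {"uid": "3f2a9c", "pages": ["http://b.com/", "http://c.com/"]}
profile.update(leaked)
```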



3.3     HTTP Cookies and Web Bugs in FoxTracks

Following the Ghostery add-on model, FoxTracks works by reading the underlying HTML of a
web page as it loads and searching for script tags, the usual vehicle for loading tracking
code from another domain. Each script is then examined to see whether it has an external
source and whether that source can be identified as a third-party tracker. Specifically, the
domain name of the source is checked against the Ghostery database, which contains information
about over 200 known trackers. The Ghostery database is included as a raw file in the FoxTracks
add-on. I have kept the original Ghostery database format and methodology to maximize
compatibility with simultaneous use of Ghostery. Each “entry” in the file is specified as a
{tracker name, tracker search pattern} pair. The search patterns were determined by the
Ghostery development team, which collects new third-party tracker submissions from Ghostery
users at large.
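The scan-and-match step can be sketched in Python with the standard `html.parser` module. FoxTracks itself is written in JavaScript, and the real Ghostery file format and patterns are not reproduced here: the three `{tracker name, search pattern}` pairs below are hypothetical regular expressions, and treating any script whose host differs from the page's host as external is a simplification (a subdomain of the first party would count as external).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical {tracker name, search pattern} pairs, not the real Ghostery entries.
TRACKER_PATTERNS = [
    ("Google Analytics", re.compile(r"google-analytics\.com/(urchin|ga)\.js")),
    ("Quantcast", re.compile(r"quantserve\.com/quant\.js")),
    ("Doubleclick", re.compile(r"\.doubleclick\.net/")),
]


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag on a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_trackers(page_url, html):
    """Return names of known trackers loaded from a host other than the page's."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    first_party = urlparse(page_url).hostname
    found = []
    for src in collector.sources:
        host = urlparse(src).hostname
        if host is None or host == first_party:
            continue  # relative or same-host script: not external
        for name, pattern in TRACKER_PATTERNS:
            if pattern.search(src) and name not in found:
                found.append(name)
    return found


page = """<html><head>
<script src="http://www.google-analytics.com/ga.js"></script>
<script src="/static/app.js"></script>
</head><body>
<script src="http://ad.doubleclick.net/adj/site;sz=300x250"></script>
</body></html>"""

trackers = find_trackers("http://example.com/news", page)
print(trackers)  # ['Google Analytics', 'Doubleclick']
```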
   After verifying the presence of a third-party tracker, FoxTracks returns the name of
the tracker and associates it with the first-party website on which it was found. FoxTracks
manages a SQLite database to hold these associations. This database, trackerBase.sqlite,
is updated with a {third-party tracker, first-party website} pair whenever a third party is
identified on a page. The HTTP cookies and web bugs tab in FoxTracks provides a table
view of trackerBase.sqlite with two columns, “Tracker” and “Origin” (see Figure
3.1). An end-user can re-sort the table by either column. The tracker-based sort displays all
entries containing the same tracker consecutively; this view gives the user a snapshot of the
personal browsing profiles that different trackers have compiled. These profiles approximate
the browsing profiles that each tracker can store on its own servers. The origin-based sort
provides a history of all the third-party cookies and web bugs that have ever tracked the user
on a particular website that the user has visited. This information is useful for users who may
want to adjust their website usage based on concerns about third-party tracking.
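The two table views can be sketched with Python's built-in `sqlite3` module. The in-memory database stands in for trackerBase.sqlite; the table name `trackers`, its column names, and the `UNIQUE` constraint that stores each pair only once are assumptions for illustration, not the actual FoxTracks schema.

```python
import sqlite3

# In-memory stand-in for trackerBase.sqlite; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trackers (tracker TEXT NOT NULL, origin TEXT NOT NULL, "
    "UNIQUE (tracker, origin))"
)


def record_sighting(conn, tracker, origin):
    """Store one {third-party tracker, first-party website} pair, ignoring repeats."""
    conn.execute(
        "INSERT OR IGNORE INTO trackers (tracker, origin) VALUES (?, ?)",
        (tracker, origin),
    )


for tracker, origin in [("Google Analytics", "b.com"), ("Doubleclick", "b.com"),
                        ("Google Analytics", "c.com"), ("Google Analytics", "b.com")]:
    record_sighting(conn, tracker, origin)

# Tracker-based sort: each tracker's rows appear together, i.e. its browsing profile.
by_tracker = conn.execute(
    "SELECT tracker, origin FROM trackers ORDER BY tracker, origin").fetchall()
# Origin-based sort: every tracker ever seen on each first-party site.
by_origin = conn.execute(
    "SELECT origin, tracker FROM trackers ORDER BY origin, tracker").fetchall()
print(by_tracker)
print(by_origin)
```

Keeping a single two-column table means both views are plain `ORDER BY` queries over the same rows.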
   To understand the FoxTracks interface, users require contextual information about cook-
ies and web bugs. Thus, FoxTracks provides links to web content addressing general questions
about cookies and web bugs, which third parties use these technologies, and what kinds of
information third parties can collect. The software also provides links to information about
opting out of third-party tracking with cookies and web bugs. Specifically, FoxTracks links to
instructions for blocking third-party cookies using built-in browser settings. It also links to
the latest version of Ghostery, which provides a mechanism for blocking web bug activity. By
providing informational links as well as a personalized demonstration of third-party cookie
and web bug tracking, FoxTracks engages and informs users about online privacy risks.




Figure 3.1: Screenshot of the FoxTracks HTTP Cookies and Web Bugs panel.




Chapter 4

Flash Local Shared Objects

4.1     A History of LSOs

Local Shared Objects are a class of remotely-accessible, client-side storage bins. Flash LSOs
were first used to store settings preferences in Macromedia’s Flash Player 6 in 2002. They
have been included in every subsequent version of the flash player, from Macromedia Flash
Player 7 to Adobe Flash Player 10 (Macromedia was acquired by Adobe in 2005). When
a Flash application is loaded on a page, a website is able to set an associated Flash LSO
without prompting the user for permission. These LSOs are stored as .sol files and, under
Flash Player’s default settings, can hold up to 100 KB of data per domain. Additionally, they
do not have an expiration date and are stored in a single location that every browser on the
machine shares. These characteristics suggest that LSOs are more persistent data stores than
HTTP cookies. Greater persistence and storage size offer a number of benefits as users consume
more data-intensive web content like streaming music and video. LSOs are able to improve media
playback by storing video preferences or caching large amounts of data that would otherwise
have to be repeatedly retrieved from servers.
   However, this technology can also be detrimental to user privacy. Adobe Flash is a
standalone program that is independent from the browser. Most browsers, including Firefox,
do not provide any control mechanisms over the setting and accessing of Flash LSOs, nor do
they prompt the user for permission to interact with Flash LSOs. Furthermore, Flash-based
applications on a given web page may not be visible to the user. It follows that users who are
unaware of LSOs have no control over the setting of LSOs on their machine. These concerns
are aggravated by the wide variety of information that can be stored in LSOs. According to
a Macromedia whitepaper on LSOs, the type of information that can be contained in a .sol file
is limited only by the information to which the Flash application has access. This includes
any content in the Flash application file, information that the user provides to the website or
the Flash application, configuration information about the user’s machine for video content
playback, and other LSOs associated with the same domain [4].
   Flash LSOs can also be used for third-party tracking purposes. In 2005, United Virtualities,
an online advertising company, published a statement promoting the use of LSOs in an
online environment of increasing user awareness and deletion of third-party HTTP cookies
[15]. Like third-party HTTP cookies, third-party LSOs with unique identifying strings can
be loaded through first-party websites. These third-party LSOs can then be used to compile
an enhanced browsing profile of an individual who navigates to multiple websites that load
content from the third party. Because this tracking methodology is very similar to that of
third-party HTTP cookies, LSOs are also known as “Flash cookies.” There has been little
work on identifying how and when Flash cookies are set on a user’s machine. Without this
kind of information, it is difficult to discern first-party Flash cookies from third-party Flash
cookies set by companies for tracking purposes.



4.2     Corporations and Market Incentives

Soltani et al. addressed the lack of Flash cookie data by using survey techniques to find out
which websites regularly employed first-party and third-party Flash cookies. They surveyed
the 100 most-visited websites (as of July 2009) and found that 54 sites set a total of 157
Local Shared Objects that produced 281 Flash cookies. Of these sites, 31 also marked their
Flash cookies with a unique identifying string that matched a unique identifier contained in
an HTTP cookie set by the same site. Upon investigation, Soltani et al. found that when
the corresponding HTTP cookies were deleted, a new HTTP cookie set by the website would
contain the same identifier. This behavior suggests that Flash cookies are used to “respawn”
deleted HTTP cookies [13]. Although their research does not explore the possibility that unique
browser settings allowed the websites to re-identify the user, further evidence of cookie
respawning comes from the United Virtualities statement on using Flash cookies to
defend against cookie deletion. In a March 2005 statement, the company wrote,

     All advertisers, websites and networks use [HTTP] cookies for targeted advertis-
     ing, but cookies are under attack. . . . [We] developed a backup ID system for
     cookies set by web sites, ad networks and advertisers, but increasingly deleted
     by users. UV’s ‘Persistent Identification Element’ (PIE) is tagged to the user’s
     browser, providing each with a unique ID just like traditional cookie coding.
     However, PIEs cannot be deleted by any commercially available anti-spyware,
     mal-ware, or adware removal program. They will even function at the default
     security setting for Internet Explorer.
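The respawning behavior Soltani et al. observed can be captured in a toy model. The Python sketch below is hypothetical and is not taken from any advertiser’s code: the class `RespawningTracker`, the domain `tracker.example`, and the use of two dictionaries to stand in for the browser’s HTTP cookie jar and the machine-wide LSO folder are all assumptions made for illustration.

```python
import uuid


class RespawningTracker:
    """Toy model of a third party that mirrors its ID into a Flash LSO.

    Both stores are plain dicts standing in (by assumption) for the
    browser's HTTP cookie jar and the machine-wide Flash LSO folder.
    """

    def __init__(self, domain: str):
        self.domain = domain

    def identify(self, cookie_jar: dict, lso_store: dict) -> str:
        # Prefer the HTTP cookie, then fall back to the Flash cookie.
        uid = cookie_jar.get(self.domain) or lso_store.get(self.domain)
        if uid is None:
            # First visit: mint a fresh unique identifier.
            uid = uuid.uuid4().hex
        # Write the same ID to both stores, so deleting only one of
        # them leaves the other able to "respawn" it.
        cookie_jar[self.domain] = uid
        lso_store[self.domain] = uid
        return uid


tracker = RespawningTracker("tracker.example")
cookies, lsos = {}, {}
first_id = tracker.identify(cookies, lsos)
cookies.clear()  # the user deletes HTTP cookies but not Flash cookies
second_id = tracker.identify(cookies, lsos)
```

In this model, deleting HTTP cookies alone has no lasting effect: `second_id` equals `first_id`, and the identifier disappears only when both stores are cleared together.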

   Of the 31 domains with Flash cookies that respawned HTTP cookies, Soltani et al. iden-
tified eight as advertising companies and four as first-party domains. The eight advertisers
in Table 4.1 constitute the only definitive list of third parties known to use Flash cookies in
a way that intentionally circumvents user efforts to delete HTTP cookies. Others may be
found by searching personal LSO collections, but this approach to identifying third parties
is subject to scrutiny by the web community at large. Of the companies listed in Table 4.1,
many publicly disclose their ability to collect large quantities of highly specific user infor-
mation such as zip code and income bracket. VideoEgg alone has a 100 million-person user
base through its distribution across 500 websites [2]. These advertisers have incentives to
override user steps to protect privacy as outlined in Section 3.2.

          Table 4.1: Companies using Flash cookies that respawn HTTP cookies.

                                      Company Name
                                      ClearSpring
                                      Iesnare
                                      InterClick
                                      ScanScout
                                      SpecificClick
                                      QuantCast
                                      VideoEgg
                                      Vizu

   The tractable number of advertisers known to use third-party Flash cookies also allowed me
to examine more specific industry incentives to ignore concerns about user privacy on the web.
Public records of venture funding show that three of the private advertisers in Table 4.1
(ClearSpring Technologies, QuantCast, and VideoEgg) have received over $110 million in venture capital funding
from 2005 to 2010 [2]. Other advertisers have been recently honored with accolades such as
“a top 10 most innovative company.” This kind of monetary and industry support suggests
that these companies are rewarded for intrusions into user privacy. It also suggests they face
little to no opposition from organized web users or other interest groups that could weaken
their business model by preventing tracking or inducing concern among venture funders. The
lack of concern for user privacy demonstrated by funders and industry reinforces the need
for an educational tool that increases awareness of tracking with Flash cookies.



4.3     Flash Cookies in FoxTracks

While third-party tracking is of particular research interest due to its intentionally obfuscated
nature, the difficulty in determining when Flash cookies are set and accessed prevented me
from focusing solely on third-party Flash cookies in FoxTracks. The current method of Flash
cookie access detection is an examination of the local LSO folder for new LSOs and changes
in last-access timestamps on every page load [8]. This method results in noticeable browser
slow-down when the folder size is large and when significant numbers of Flash cookies are
being accessed on a single page. Significant browser latency is a disincentive to add-on usage,
so I chose not to use this method. I have concluded that an ideal model for third-party
Flash cookie detection would parallel the Ghostery method of finding third-party HTTP
cookies and web bugs: scanning the HTML of a page for script tags and comparing the
commands contained within them to strings naming known third-party trackers. While the
eight companies identified by Soltani et al. constitute the beginnings of a database of known
third-party Flash cookie trackers, I intend to use the Ghostery model of community-based
input and review to compile a larger database for inclusion in later development. Once this
database is sufficiently comprehensive, FoxTracks can display companies that use third-party Flash cookies and
how they have personally tracked the user over time.
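A minimal sketch of the Ghostery-style detection described above, using Python’s standard `html.parser` module. The pattern table `KNOWN_TRACKERS` and its example domains are hypothetical placeholders for the community-reviewed database the paragraph proposes, and Ghostery’s actual matching logic may differ.

```python
from html.parser import HTMLParser

# Hypothetical tracker database: company name -> substrings to look for.
KNOWN_TRACKERS = {
    "ExampleAds": ["ads.example.com"],
    "ExampleMetrics": ["metrics.example.net", "trackPageview"],
}


class ScriptCollector(HTMLParser):
    """Collects script src URLs and inline script text from a page."""

    def __init__(self):
        super().__init__()
        self.snippets = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            src = dict(attrs).get("src")
            if src:
                self.snippets.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only text inside <script> elements is considered.
        if self._in_script and data.strip():
            self.snippets.append(data)


def find_trackers(html: str) -> set:
    """Return names of known trackers whose patterns appear in script tags."""
    collector = ScriptCollector()
    collector.feed(html)
    found = set()
    for snippet in collector.snippets:
        for name, patterns in KNOWN_TRACKERS.items():
            if any(p in snippet for p in patterns):
                found.add(name)
    return found


page = """<html><head>
<script src="http://ads.example.com/show.js"></script>
<script>var t = 1; trackPageview();</script>
</head><body><p>ads.example.com in body text is ignored</p></body></html>"""
print(sorted(find_trackers(page)))  # ['ExampleAds', 'ExampleMetrics']
```

Unlike the folder-scanning method, this check costs one parse of the page rather than a walk over the whole LSO folder on every load.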
   In order to gather community input, FoxTracks must first be adopted by a user base. In
this version of FoxTracks, I have opted to include a “view and delete” interface into a machine’s
LSO folders (see Figure 4.1). My interface accesses and lists all Flash cookies on a user’s
machine in table format. Information displayed about each Flash cookie includes origin, i.e.,
with which domain the object is affiliated; name, e.g., “settings.sol;” size in bytes; and the
date and time a specific cookie was last accessed by a website the user visited. The interface
also includes information about the location of the LSO folder on the user’s machine and
buttons to delete the listed Flash cookies individually or altogether. The origin and name
information is generally sufficient to understand the owner and purpose of a particular Flash
cookie. When a user is aiming to delete tracking Flash cookies and maintain preferences for
various websites stored in other Flash cookies, the origin and name can be used to decide
whether a particular cookie should be deleted or not. The size and latest access time might
also provide insight into the quantity and frequency of information collection by websites
the user has never visited explicitly. The view generated in FoxTracks resembles a simplified
version of the functionality in the most popular Firefox Flash cookie add-on, BetterPrivacy.
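The listing behind the Flash objects panel can be approximated in a few lines. FoxTracks itself is written in JavaScript; this Python sketch assumes the LSO folder is passed in as `root`, a directory whose immediate subfolders are named after origin domains (the real location varies by operating system and Flash Player version). The names `FlashCookie`, `list_flash_cookies`, and `delete_flash_cookie` are hypothetical.

```python
import os
from datetime import datetime
from typing import List, NamedTuple


class FlashCookie(NamedTuple):
    origin: str            # domain folder the .sol file sits under
    name: str              # file name, e.g. "settings.sol"
    size: int              # size in bytes
    last_access: datetime  # last access time reported by the filesystem
    path: str


def list_flash_cookies(root: str) -> List[FlashCookie]:
    """List every .sol file below `root`, assuming its immediate
    subfolders are named after the origin domain."""
    cookies = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".sol"):
                continue
            path = os.path.join(dirpath, filename)
            rel_parts = os.path.relpath(path, root).split(os.sep)
            origin = rel_parts[0] if len(rel_parts) > 1 else "(unknown)"
            info = os.stat(path)
            cookies.append(FlashCookie(
                origin=origin,
                name=filename,
                size=info.st_size,
                # Access times depend on OS and filesystem settings.
                last_access=datetime.fromtimestamp(info.st_atime),
                path=path,
            ))
    return sorted(cookies, key=lambda c: (c.origin, c.name))


def delete_flash_cookie(cookie: FlashCookie) -> None:
    """Delete a single listed Flash cookie."""
    os.remove(cookie.path)
```

Deleting all listed cookies is then a loop over the returned list.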




               Figure 4.1: Screenshot of the FoxTracks Flash Objects panel.


   A version of BetterPrivacy’s automatic deletion feature is included in the advanced op-
tions pane of FoxTracks, which is shown in Figure 4.2. These options have been separated
from the main Flash objects tab in order to simplify the tool and thereby further its edu-
cational and informational goals. The options for automatic Flash cookie deletion are more
restricted than those offered by BetterPrivacy, which functions as an “install and forget”
add-on for users with an intermediate understanding of Flash cookies. FoxTracks allows
the user to select among complete deletion at the end of every session, timer-based deletion of
infrequently accessed Flash cookies, and an added option to clear Flash cookies from the built-in
Firefox “Clear Recent History” dialog box. Other advanced options include clearing the
Adobe Flash Player settings LSO, which contains playback preferences in addition to a history
of all visited websites that use Flash, and clearing empty folders left over from deleted .sol
files. By leaving out some BetterPrivacy functionality such as a “white-list” of perpetually
allowable Flash cookies, and limiting options for automatic Flash cookie deletion, FoxTracks
aims to focus the user’s attention on the origins and purposes of the LSOs that have been
set on her machine.
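Timer-based deletion and the cleanup of empty folders can be combined in one pass over the LSO folder. The sketch below assumes that a file’s last-access time is a reasonable proxy for how recently a website used the Flash cookie, which depends on filesystem settings; the function name and the `max_idle_days` parameter are hypothetical.

```python
import os
import time
from typing import List, Optional


def delete_stale_flash_cookies(root: str, max_idle_days: float,
                               now: Optional[float] = None) -> List[str]:
    """Delete .sol files under `root` not accessed within `max_idle_days`,
    then remove any folders left empty. Returns the deleted file paths."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    deleted = []
    # Walk bottom-up so that emptied subfolders are removed before
    # their parents are checked.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if filename.endswith(".sol") and os.stat(path).st_atime < cutoff:
                os.remove(path)
                deleted.append(path)
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)  # clear folders left over from deleted .sol files
    return deleted
```

Accepting `now` as a parameter keeps the function deterministic for testing; a scheduled caller would simply omit it.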




                Figure 4.2: Screenshot of the FoxTracks options dialog box.


   Like the HTTP cookies and web bugs tab, the Flash objects tab also includes a sidebar
with informational links to relevant web content. These include descriptions of what Flash
cookies are, which known organizations are using them to track user behavior on their own
websites or across other websites, the types of information that can be gleaned about a user
through the use of Flash cookies, and how to opt out of being tracked by Flash cookies, either
by deleting them or managing them centrally through the Adobe website. Instead of the
deletion-based approach to controlling Flash cookies, users may use a formal blocking and
storage limitation scheme to stop tracking. Adobe’s website provides a Global Settings panel
that allows users to block all third-party Flash cookies and/or set storage size capacities
for all Flash cookies. Research done by the Electronic Frontier Foundation on surveillance
technologies suggests the former option may seriously impair some websites’ functionality
and recommends the latter approach, setting all storage capacities to zero. However, this
sacrifices the settings preferences stored by first-party websites in exchange for eliminating
all possibility of tracking with Flash cookies. The FoxTracks web reference for using Adobe’s
central LSO manager describes the options available to users in full.

Chapter 5

DOM Storage

5.1        Web Storage in the W3C Standard

The third and final storage-based tracking technology I examined in the course of my research
was DOM Storage. DOM Storage is proposed as an improved state management mechanism
in working drafts of the HTML5 standard that is set to be adopted by the World Wide
Web Consortium (W3C), a standards organization like the IETF, in late 2010. As of
December 2009, DOM Storage specifications have been spun off into a distinct working
document entitled “Web Storage” for independent review and adoption. Although DOM
Storage is only now being considered as a formal Internet standard, popular web browsers
have included DOM Storage capabilities since 2006. Notably, DOM Storage was first
included in Firefox 2.0 and has been supported through the current version, Firefox 3.6 [1].
Although W3C adoption is still pending, DOM Storage is also supported in the latest versions
of Safari, Internet Explorer, Chrome, and Opera, all of which were released between 2008
and 2009.
   “DOM” stands for “Document Object Model,” a legacy term that does little to describe
this browser storage space. Like HTTP cookies, DOM Storage is a mechanism
for maintaining a user’s state with a particular website. It is designed as a large storage
bin that exists locally on a client’s machine. According to the W3C working draft on DOM
Storage, the mechanism offers two benefits over regular cookies. First, it prevents race
conditions that can occur during simultaneous browsing sessions. For instance, when two
browser windows navigate to the same site, cookie data that is transmitted in each session
may get overwritten or aggregated in a way that results in unexpected behavior. The W3C
specification solves this problem by providing a single session storage space for each brows-
ing session. This space will only ever be accessed by one window and thus prevents state
confusion from multiple connections to the same domain. Additionally, all session-only data
will be discarded on window close or browser exit, so no conflicts will manifest under this
model. The second advantage of DOM Storage is its much larger capacity than regular cookies.
Allowing megabytes of persistent client-side storage lets websites improve performance by
using that space as a large cache.
   While it offers some advantages over HTTP cookies, DOM Storage presents the same
third-party tracking risks as regular cookies. Additionally, the collection of highly specific
user data kept in DOM Storage increases the seriousness of any privacy intrusions by third
parties. The W3C is conscious of these user privacy concerns posed by DOM Storage adop-
tion:


        A third-party advertiser (or any entity capable of getting content distributed to
        multiple sites) could use a unique identifier stored in its local storage area to track
        a user across multiple sessions, building a profile of the user’s interests to allow
        for highly targeted advertising. In conjunction with a site that is aware of the
        user’s real identity (for example an e-commerce site that requires authenticated
        credentials), this could allow oppressive groups to target individuals with greater
        accuracy than in a world with purely anonymous Web usage.


   Like RFC 2965, the DOM Storage standard emphasizes the user agent’s role in protecting
privacy. User agents are given the following suggestions: blocking third-party storage,
expiring stored data, treating persistent storage like regular cookies, tracking the origins of
stored data and creating a blacklist or whitelist of websites accordingly. These suggested
approaches to user privacy are unsatisfying for several reasons. First, engaging in any of
these defenses requires substantial knowledge of session-only and persistent data stores. A
user would need an intermediate understanding of state management mechanisms both at
a high level and on a per-website basis in order to determine whether DOM Storage was
being used benignly or maliciously. Many users browse the web unaware of DOM Storage
and other state mechanisms with tracking potential, so they lack the knowledge
to manage them effectively. Second, even presuming user understanding of DOM Storage, the
standard does not propose an API or technical implementation of the suggested defenses.
Rather, a user would need the technical expertise to implement a DOM Storage settings
controller in order to realize many of these defenses. Finally, the document partially dismisses
concerns about user privacy by pointing to the limits of any privacy protection. It
suggests that a first-party domain may track user activity and later sell it to a third party, or
that session-identifying data passed through URLs may be analyzed for user data regardless
of any privacy protections that are in place. Thus DOM Storage poses unaddressed risks to
user privacy.
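The user-agent defenses listed above are simple to state but leave every decision to the implementer. As a hypothetical illustration of what a DOM Storage settings controller would have to decide, the sketch below combines three of them: blocking third-party storage, a per-site blocklist and allowlist, and expiry of old data. `StoragePolicy` and its fields are invented for this example and do not correspond to any browser API; origins are simplified to host names.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class StoragePolicy:
    """Hypothetical user-agent policy for DOM Storage access."""
    block_third_party: bool = True
    blocklist: Set[str] = field(default_factory=set)
    allowlist: Set[str] = field(default_factory=set)
    max_age_days: float = 30.0

    def may_access(self, script_host: str, top_level_host: str) -> bool:
        """Decide whether a script from `script_host`, running on a page
        the user navigated to at `top_level_host`, may use storage."""
        if script_host in self.blocklist:
            return False
        if script_host in self.allowlist:
            return True
        # Exact host comparison stands in for a real same-site check.
        if self.block_third_party and script_host != top_level_host:
            return False
        return True

    def is_expired(self, age_days: float) -> bool:
        """Expire stored data older than the configured limit."""
        return age_days > self.max_age_days


policy = StoragePolicy(blocklist={"tracker.example"})
print(policy.may_access("news.example", "news.example"))        # True
print(policy.may_access("ads.example", "news.example"))         # False
print(policy.may_access("tracker.example", "tracker.example"))  # False
```

Even this toy version requires choices (which hosts count as the same site, how long data may live) that an ordinary user is poorly placed to make.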



5.2         Case Study: Gmail Mobile Privacy

Mobile versions of the major web browsers also support the HTML5 standard for local
database storage. Persistent offline client-side storage is especially advantageous for mobile
websites (that is, mobile versions of regular websites), which frequently face limited
bandwidth and inconsistent network connectivity. Keeping large amounts of data on the
client device reduces the number of requests for bandwidth-intensive data over a sporadic
network connection. As a result, many mobile websites have been implemented using the
HTML5 standard and local database storage. The Gmail website for the Apple iPhone is one
such mobile website, and it provides an interesting case study in the risks that DOM/local
storage poses to user privacy.
   To see how the Gmail mobile website makes use of local database storage, I needed to
examine the underlying program folders of the iPhone web browser. However, the Safari for
iPhone folder contents cannot be examined on the iPhone itself because the device’s system
folders are locked to users. Thus, I chose to mimic Safari for iPhone using Safari for Mac on
the standard Mac OS. This required only a simple change in Safari’s developer tools to use
the iPhone user agent. Logging into gmail.com in this iPhone mode had the following result:
a folder titled “Databases” was silently created within the Safari program folder on
the Mac OS. Within this folder, a management database called “Databases.db” was created
along with a second folder containing storage databases for the domain “mail.google.com.”
In this simulation, though Gmail accessed and wrote to storage on the mobile device, the user was never
prompted for permission or notified of this activity. Along with the privacy concerns de-
scribed below, this local storage creation underscores the failure to achieve informed consent
for tracking under the current privacy paradigm.
   The database created within the mail.google.com folder corresponded to my Google pro-
file, and was populated entirely in plain text. Without any kind of encryption or access
security, the database could be opened with a regular SQLite browsing tool. After logging
out of gmail.com and locally opening the database associated with my profile, I was able
to read highly detailed information about the contents of my Gmail account. In particular,
the cached messages and cached conversation headers tables exposed an alarming amount
of personal information (see Figure 5.1). Together, these tables provide information about
frequent contacts, contacts’ addresses, email subject lines, and message content snippets.
These data may be mined for further information such as site login names and passwords.
As an example, I was able to retrieve a site password from a cached message snippet
associated with a password reset email in my inbox.




        Figure 5.1: The cached conversation headers table in my profile database.
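Because these files are ordinary, unencrypted SQLite databases, reading them requires nothing beyond file access and a generic client. The sketch below uses Python’s built-in `sqlite3` module to list every table in a database file and its first few rows; it assumes no Gmail-specific table or column names, and `dump_database` is a hypothetical helper name.

```python
import sqlite3


def dump_database(path: str, rows_per_table: int = 5) -> dict:
    """Return {table_name: [first few rows]} for an SQLite file."""
    conn = sqlite3.connect(path)
    try:
        tables = [name for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        contents = {}
        for table in tables:
            # Quote the identifier, since table names come from the file itself.
            quoted = '"' + table.replace('"', '""') + '"'
            contents[table] = conn.execute(
                f"SELECT * FROM {quoted} LIMIT ?", (rows_per_table,)).fetchall()
        return contents
    finally:
        conn.close()
```

Calling `dump_database` on any of the files in the “Databases” folder prints their plain-text contents without any credentials.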


   This storage mechanism presents a host of privacy concerns. Though the database files are
not visible to other end-users through the iPhone interface, other mobile websites and third-
party advertisers may use and exploit the same local storage area. The W3C working draft
on web storage suggests that a user should restrict access to local storage databases to only
scripts originating from the top-level website to which they navigate. However, where users
lack knowledge about DOM storage, this defense is difficult to implement. Domains may
take steps to privatize their local storage databases by using encryption or other techniques.
However, Gmail’s mobile website suggests that at least some websites storing highly personal
data do not obscure that data from third parties. Moreover, the W3C document warns
that third-party hosts may use spoofed domain names in order to gain access to the local storage
databases set by the genuine domain. Without any kind of host authentication, this could lead
to information leakage or information spoofing, which compromise the confidentiality and the
integrity of user data, respectively. In this example, information leakage might occur if an advertiser
read and saved any of the mail.google.com database information available in the Databases
folder. Information spoofing refers to the writing of data in another domain’s local storage.
Here, a third party might set a user’s Gmail mobile session identifier to a known value and
use this to track the user’s interaction with Gmail.
   Though this example illustrates the use of DOM Storage by mobile websites, the same
features of HTML5 are available for use by regular web browsers. Non-mobile websites
may choose to make use of local storage in a similar manner to Gmail’s mobile website as
DOM Storage is adopted as an Internet standard. Should websites and users fail to protect
access to locally stored databases, third parties may be able to use DOM Storage to connect
browsing history with many kinds of personally identifiable information.


5.3     Community Approach to the Study of DOM Storage

FoxTracks aims to be informational with regard to user privacy threats posed by each third-
party tracking technology. For HTTP cookies and LSOs, I designed informative interfaces that
display databases and trackers in a way that minimizes confusion. DOM
Storage tracking potential is substantially more difficult to convey using a Firefox extension.
Despite the inclusion of DOM Storage in Firefox since version 2.0, no add-ons have been developed to
explore its session-only or persistent storage. BetterPrivacy features a boolean option for
clearing DOM Storage contents on browser exit but does not provide a comprehensive view of the
contents or explain how DOM Storage is used by websites.
   Though no add-ons have been developed exclusively for viewing DOM Storage contents,
Mozilla’s developer pages highlight that all persistent data resides in “webappsstore.sqlite,”
a single database inside the Firefox user’s profile folder. The FoxTracks interface loads this
database into a table view. However, the database entries are frequently obscure and only
in specific instances will the originating website and other information be intelligible. In
particular, each entry in the database consists of the following fields: scope, key, value,
secure, and owner. Secure is simply a boolean value related to the accessibility of the database
entry. Scope and owner refer to the originating website, which may be masked or
non-obvious. The key is scope-specific and its significance is not always immediately clear to the
user. The value field is the main storage space of the database entry and may contain user
data in human-readable form. It may also store scripts that can be accessed and
run from the originating websites. Risks to user privacy can only be demonstrated when
entries’ originating websites and values are understandable. Thus, presenting an entire
database view of webappsstore.sqlite is not the most effective demonstration of risks to user
privacy posed by DOM Storage. It has the potential to confuse users who may recognize
only some originating websites and certain pieces of data contained in entries. Moreover, the
database view says nothing about the information leakage and information spoofing potential
of DOM Storage contents (see Figure 5.2).
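A raw view of webappsstore.sqlite can be reproduced with a short script, and a small decoding step already makes some scopes more legible. This sketch assumes the Firefox 3.x layout described above (a table whose name begins with `webappsstore`, with columns scope, key, value, secure, and owner) and assumes that a scope is stored as a reversed host name optionally followed by `:scheme:port`, as in `moc.elpmaxe.www.:http:80`. Both the table name and the scope format have varied across Firefox versions, so they should be verified against a real profile; `decode_scope` and `read_dom_storage` are hypothetical names.

```python
import sqlite3
from typing import List, NamedTuple


class StorageEntry(NamedTuple):
    origin: str
    key: str
    value: str
    secure: bool
    owner: str  # left exactly as stored


def decode_scope(scope: str) -> str:
    """Turn an (assumed) reversed-host scope such as
    'moc.elpmaxe.www.:http:80' into 'http://www.example.com:80'."""
    host_part, _, rest = scope.partition(":")
    host = host_part[::-1].lstrip(".")
    if rest:
        scheme, _, port = rest.partition(":")
        return f"{scheme}://{host}:{port}" if port else f"{scheme}://{host}"
    return host


def read_dom_storage(path: str) -> List[StorageEntry]:
    conn = sqlite3.connect(path)
    try:
        # The table name varies by version; prefer e.g. 'webappsstore2'.
        row = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name LIKE 'webappsstore%' ORDER BY name DESC").fetchone()
        if row is None:
            return []
        table = '"' + row[0].replace('"', '""') + '"'
        entries = []
        for scope, key, value, secure, owner in conn.execute(
                f"SELECT scope, key, value, secure, owner FROM {table}"):
            entries.append(StorageEntry(decode_scope(scope or ""), key, value,
                                        bool(secure), owner or ""))
        return entries
    finally:
        conn.close()
```

Decoding the scope only helps when the origin is a recognizable host; opaque keys and values remain as obscure as in the raw database view.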
   If database entries could be linked to known third-party companies and augmented with
this information, the FoxTracks DOM Storage tab might be more effective. As with third-
party LSO discovery, this improvement requires a reliable, substantial resource for associating
third parties with the names of their DOM entries and the scripts they use to set DOM
entries. To this end, I intend to work with the technologists at the Center for
Democracy and Technology to begin a DOM contents exploratory project. Following the
Ghostery model for third-party cookie discovery, we intend to uncover third-party DOM
Storage usage by applying a community-based approach. Interested users will be able to
anonymously submit their DOM content for review. This DOM content can be analyzed for
obscure-origin database entries that occur most frequently, and the first-party website DOM
entries with which they tend to appear. Additionally, users will be able to submit comments
about perceived uses of key and value fields for particular websites’ entries. A critical mass
of comments can then be peer-reviewed and facts about popular websites’ uses of DOM
Storage can be posted in a central location. A link to this location will eventually be
accessible through the DOM Storage tab in FoxTracks. With support from the privacy
experts and technologists at CDT, this community-based solution to acquiring, analyzing,
and spreading information about DOM Storage and its role in third-party tracking will lead
to more effective interface design in future versions of the FoxTracks tool.
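The frequency and co-occurrence analysis proposed for submitted DOM content could begin as simply as the following sketch. It assumes each anonymous submission has already been reduced to the set of origins found in a user’s DOM Storage, and that reviewers maintain a set of recognizable first-party sites (`known_first_parties`); both inputs and the function name are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def analyze_submissions(
    submissions: Iterable[Set[str]],
    known_first_parties: Set[str],
    top_n: int = 10,
) -> List[Tuple[str, int, List[Tuple[str, int]]]]:
    """For each obscure origin (one not in `known_first_parties`), report how
    many submissions contain it and which first-party origins co-occur most."""
    frequency: Counter = Counter()
    co_occurrence: Dict[str, Counter] = defaultdict(Counter)
    for origins in submissions:
        first_parties = origins & known_first_parties
        for origin in origins - known_first_parties:
            frequency[origin] += 1
            co_occurrence[origin].update(first_parties)
    return [
        (origin, count, co_occurrence[origin].most_common(3))
        for origin, count in frequency.most_common(top_n)
    ]
```

An obscure origin that appears across many submissions alongside many unrelated first-party sites is a natural candidate for peer review as a possible third-party tracker.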




               Figure 5.2: Screenshot of the FoxTracks DOM Storage panel.

Chapter 6

Results


6.1      The FoxTracks Implementation

FoxTracks demonstrates how third-party HTTP cookies, Flash cookies, and DOM Storage
contents can adversely affect the privacy of an end-user. FoxTracks relies on a Ghostery
database of known trackers to identify third-party HTTP cookies loaded through the HTML
of a web page. As a Firefox add-on, FoxTracks has access to the first-party domain a user is
visiting when a third-party script attempts to get or set data on the user’s machine. Every
time third-party HTTP cookie activity is recognized on a page, FoxTracks keeps a record of
the third party and the website on which it appeared. When a user opens the tool, a XUL-generated
interface populates a table with all of these records, and the user is given insight into
the profiles different trackers have assembled from her browsing activity. These “snapshots”
of partial browsing history approximate the browsing profiles kept on the servers of third
parties. They are completely independent from the user-controlled browser history and, as
such, demonstrate a loss of privacy control to the end-user. In this way, FoxTracks achieves
its aim to inform users about the privacy risks of third-party HTTP cookies.
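The record keeping described above amounts to grouping sightings by tracker. FoxTracks implements it in JavaScript; the Python class `TrackerLog` below is only a hypothetical model of that bookkeeping, with tracker names assumed to come from a pattern database such as Ghostery’s.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


class TrackerLog:
    """Records which first-party sites each recognized third party appeared on."""

    def __init__(self):
        self._events: List[Tuple[datetime, str, str]] = []

    def record(self, when: datetime, tracker: str, first_party: str) -> None:
        self._events.append((when, tracker, first_party))

    def profiles(self) -> Dict[str, List[str]]:
        """Per-tracker list of distinct sites, in order of first sighting."""
        result: Dict[str, List[str]] = defaultdict(list)
        for _when, tracker, site in sorted(self._events):
            if site not in result[tracker]:
                result[tracker].append(site)
        return dict(result)


log = TrackerLog()
log.record(datetime(2010, 5, 1, 9, 0), "ExampleAds", "news.example")
log.record(datetime(2010, 5, 1, 9, 5), "ExampleAds", "shop.example")
log.record(datetime(2010, 5, 1, 9, 7), "ExampleMetrics", "news.example")
log.record(datetime(2010, 5, 1, 9, 9), "ExampleAds", "news.example")
print(log.profiles())
# {'ExampleAds': ['news.example', 'shop.example'], 'ExampleMetrics': ['news.example']}
```

Each list is the partial browsing history that a single third party could reconstruct, independent of whether the user later clears the browser history.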
   The identities of third parties that use HTTP cookies, the information collected by cook-
ies, and even the scripts used to set cookies are well-documented in the public domain. On
the other hand, the third-party risks of Flash cookies and DOM Storage exist largely as
hypothetical information leakages that are occasionally confirmed by specific instances of privacy
invasion. It was difficult to show how third parties use Flash cookies and DOM Storage
in FoxTracks without resources like the Ghostery advertiser database for these technologies.
For this reason, I chose to implement more general Flash cookie and DOM Storage interfaces
and place greater emphasis on the accessibility of these interfaces. FoxTracks provides a file
view of all Flash cookies on a user’s machine. In most cases a Flash cookie’s origin and
purpose are discernible from its metadata fields. While these fields do not distinguish first-party
Flash cookies from third-party Flash cookies, users are likely to recognize origin domains
they have never explicitly visited. In this way, even a window into all Flash cookies files can
expose third-party tracking with Flash objects in a manner that is personally relevant to the
user. FoxTracks also provides single-object and all-object deletion options with the aim of
encouraging users to browse the informational web links embedded in the interface prior to
use. The user-friendliness of the Flash objects interface is also increased by the extraction
of advanced deletion methods to a separate options menu.
   The DOM Storage tab of FoxTracks also places an emphasis on user accessibility. How-
ever, because DOM Storage takes the form of a SQLite database in the Firefox web browser,
a display of its contents is only partially telling for users. When origin fields are readable,
users may find database entries that have been set by third parties. When storage con-
tents and origin names are only readable by remote servers, FoxTracks is not as effective
in informing users about third-party tracking with DOM Storage or the associated privacy
risks. Nonetheless, an overview of DOM Storage is a valuable addition to the software. By
including all three tracking technologies, FoxTracks offers an all-in-one overview of tracking
practices on the web. For interested users, an all-in-one resource provides a holistic,
straightforward introduction to online privacy.
   While FoxTracks succeeds in being educational and informative, it stands to benefit from
a number of code-based improvements. The FoxTracks interface was implemented in XUL
and program functionality was added through JavaScript functions included in the standard
Mozilla Firefox development API. Many of the functions called in the software have
single-threaded and multi-threaded implementations. To avoid increases in program complexity,
multi-threaded functions were not used in the initial development of FoxTracks. However, use
of multi-threaded database functions would likely improve the performance of both the
HTTP cookie and DOM Storage tabs by allowing the SQLite queries that load database entries
into tables to run in the background rather than on the thread that drives the browser
interface. While this optimization is secondary to the goals
of FoxTracks, slow program execution impacts the user’s browsing experience in a negative
manner. If FoxTracks suffers from serious latency or prevents the user from browsing the
web at regular pace, users are unlikely to keep or use the add-on. It follows that future
development of FoxTracks should consider performance improvements.
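The proposed optimization amounts to moving database reads off the thread that handles the interface. Inside Firefox this would use Mozilla’s asynchronous storage functions from JavaScript; the Python sketch below is only an analogy built on the standard `concurrent.futures` and `sqlite3` modules, and `load_rows` and `load_table_async` are hypothetical names.

```python
import sqlite3
from concurrent.futures import Future, ThreadPoolExecutor


def load_rows(path: str, query: str) -> list:
    # A sqlite3 connection should stay on the thread that created it,
    # so each worker opens its own connection.
    conn = sqlite3.connect(path)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


executor = ThreadPoolExecutor(max_workers=2)


def load_table_async(path: str, query: str, on_done) -> Future:
    """Run `query` off the calling thread and hand the rows to `on_done`."""
    future = executor.submit(load_rows, path, query)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future
```

The callback may run on a worker thread, so a real interface would still need to pass the rows back to its own thread before filling in the table.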
   The most prominent limitation of FoxTracks is its inability to isolate third-party Flash
cookies and DOM Storage contents from first-party ones. As discussed in Section 5.3, FoxTracks
would benefit enormously from community-based input and research. Information about third
parties known to use these technologies, and about the scripts that set them, would provide a
basis for identifying additional third-party trackers on the web. Though I plan to work with
CDT to address the lack of information about DOM Storage, such research might also be carried
out in an academic setting or by other public interest organizations. Interested users who
analyze their DOM Storage contents for third-party activity and share their results can also
further the development of informative software tools.
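Community-contributed knowledge of this kind could be consumed as a simple lookup table. The sketch below assumes a hypothetical mapping from tracker domains to company names and technologies. The entries are invented placeholders, not data from Ghostery or CDT. The sketch flags any script whose host is a listed domain or a subdomain of one.

```python
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical community-maintained list: domain -> (company, technologies observed).
KNOWN_TRACKERS: Dict[str, Tuple[str, List[str]]] = {
    "tracker-example.net": ("Example Ad Network", ["http-cookie", "flash-lso"]),
    "analytics-example.org": ("Example Analytics", ["dom-storage"]),
}


def match_tracker(host: str) -> Optional[str]:
    """Return the listed domain that `host` equals or is a subdomain of, if any."""
    host = host.lower().rstrip(".")
    for domain in KNOWN_TRACKERS:
        # The leading dot prevents 'eviltracker-example.net' from matching 'tracker-example.net'.
        if host == domain or host.endswith("." + domain):
            return domain
    return None


def flag_scripts(script_urls: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (url, company) for every script served from a listed tracker domain."""
    flagged = []
    for url in script_urls:
        host = urlsplit(url).hostname or ""
        domain = match_tracker(host)
        if domain is not None:
            flagged.append((url, KNOWN_TRACKERS[domain][0]))
    return flagged


if __name__ == "__main__":
    scripts = [
        "http://www.news-example.com/js/site.js",
        "http://cdn.tracker-example.net/lso.js",
        "http://analytics-example.org/collect.js",
    ]
    for url, company in flag_scripts(scripts):
        print(company, "<-", url)
```

A list like this only finds trackers someone has already reported, which is why shared research is needed to extend it.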



6.2     Third Parties as Privacy Adversaries

Third parties that set and use HTTP cookies and Flash cookies for tracking purposes are
primarily players in the online advertising business. Many of these companies aggregate user
browsing data to serve relevant advertisements to users. Within the privacy discourse, there
is debate about the merits of this “behavioral advertising.” Some claim that behavioral
advertisers provide useful content for online consumers. Others cite the privacy-eroding
tracking methodologies of behavioral advertisers. I apply a computer security framework to
the three tracking technologies discussed in this thesis to show how third-party advertisers
might be considered “adversaries” to Internet users.
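The aggregation behind behavioral advertising can be illustrated with a toy model. When pages on different sites embed a resource from the same third party, the browser sends that party’s cookie with each request, and the request typically identifies the embedding page, or at least its origin, in the `Referer` header. The Python sketch below simulates a hypothetical tracker server that files each referring page under its cookie ID. The domain names, the single-ID cookie, and the request format are illustrative assumptions.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ToyTracker:
    """Simulates a third-party server embedded (e.g. as a web bug) on many sites."""

    def __init__(self) -> None:
        self.profiles: Dict[str, List[str]] = defaultdict(list)

    def handle_request(self, cookie_uid: Optional[str], referer: str) -> Tuple[str, bool]:
        """Record the referring page under the user's ID.

        Returns (uid, set_cookie); set_cookie is True when a new ID is issued.
        """
        set_cookie = cookie_uid is None
        uid = cookie_uid or uuid.uuid4().hex
        self.profiles[uid].append(referer)
        return uid, set_cookie


class ToyBrowser:
    """Stores the tracker's cookie and sends it with every request to the tracker."""

    def __init__(self) -> None:
        self.tracker_cookie: Optional[str] = None

    def visit(self, page_url: str, tracker: ToyTracker) -> None:
        # The page embeds a tracker resource; that request carries cookie + Referer.
        uid, set_cookie = tracker.handle_request(self.tracker_cookie, page_url)
        if set_cookie:
            self.tracker_cookie = uid


if __name__ == "__main__":
    tracker, browser = ToyTracker(), ToyBrowser()
    for page in ["http://news-example.com/politics",
                 "http://shop-example.com/running-shoes",
                 "http://health-example.org/allergies"]:
        browser.visit(page, tracker)
    # A single ID now links the user's visits to three unrelated first-party sites.
    print(dict(tracker.profiles))
```

The user never navigates to the tracker, yet it ends up holding a cross-site browsing history.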
      In the computer security literature, an adversary is an entity whose aim is to prevent
users of a cryptosystem¹ from achieving a goal such as data confidentiality or integrity.
Adversaries’ actions typically include attempts to uncover secret data, corrupt data, spoof
communication messages and message sender identities, and force system failures [14]. The
concept of an adversary is used to reason about cryptosystems as “games” between users and
coordinated attackers. Web browsing can be considered a game between an Internet user
and the websites she visits, where data passed to these websites is intended to be private.
In this game, HTTP cookies used by third parties behave like passive adversaries in formal
cryptosystems. Specifically, third-party HTTP cookies let a third party observe and record
sessions between a user and a first-party website and use this information to glean facts
about the user. Third parties using Flash cookies and DOM Storage may also behave like
passive adversaries, but they have immense potential to act as active adversaries that spoof,
corrupt, and divert communication between users and first-party websites. Flash cookies in
particular have been found to respawn HTTP cookies, which constitutes a type of message
spoofing: the user’s machine establishes an HTTP cookie communication channel with a
third-party server where none should exist. Both third-party Flash cookies and DOM Storage
contents can intercept and fabricate users’ communications with first-party websites,
resulting in the information leakage and information spoofing described in the W3C web
storage standard. Such actions typify active adversaries as they are described in the
computer security paradigm.
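Respawning, the behavior described above as a form of message spoofing, can be modeled in a few lines. The sketch assumes that clearing the browser’s HTTP cookies leaves separately stored Flash LSOs in place. The class names, the `tracker_script` function, and the single `uid` identifier are hypothetical. The point is only that an identifier the user deleted comes back unchanged.

```python
import uuid
from typing import Dict


class ToyClient:
    """A user's machine with two independent stores used by the same tracker."""

    def __init__(self) -> None:
        self.http_cookies: Dict[str, str] = {}  # cleared by the browser's "delete cookies"
        self.flash_lsos: Dict[str, str] = {}    # assumed NOT cleared by that action

    def clear_http_cookies(self) -> None:
        self.http_cookies.clear()


def tracker_script(client: ToyClient) -> str:
    """Hypothetical third-party script run on each page load; returns the uid in use."""
    uid = client.http_cookies.get("uid") or client.flash_lsos.get("uid")
    if uid is None:
        uid = uuid.uuid4().hex           # first visit: mint a new identifier
    client.http_cookies["uid"] = uid     # (re)spawn the HTTP cookie from the backup
    client.flash_lsos["uid"] = uid       # keep the Flash backup in sync
    return uid


if __name__ == "__main__":
    client = ToyClient()
    before = tracker_script(client)
    client.clear_http_cookies()          # the user tries to reset tracking
    after = tracker_script(client)
    print("same identifier after deletion:", before == after)
```

From the user’s point of view, deleting cookies appears to work until the next page load silently restores the old identifier.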
      In sum, the use of HTTP cookies, Flash cookies, and DOM Storage by third parties can
be translated precisely into a computer security context. Within this context, tracking
technologies represent means by which a third party attempts to break data confidentiality
between a user and the first-party websites she visits. This allows me to characterize third
parties as adversaries in the scheme of online privacy. Moreover, my research provides
peripheral evidence of the actively invasive nature of third-party advertisers. Public
comments, such as the United Virtualities statement on the tracking potential of Flash LSOs
[15], demonstrate how advertisers as a whole have tried to circumvent user attempts to control
privacy. The business successes of advertisers and data aggregators that use Flash cookies
also demonstrate the economic incentives for third parties to work against user control of
privacy. The characterization of third parties as adversaries to individual users reaffirms
the need for greater user awareness of online tracking practices and a privacy baseline of
informed consent.

¹ A cryptosystem is any computer system that involves cryptography techniques.

Chapter 7

Conclusions

I implemented the FoxTracks software tool to increase awareness about third-party tracking
and to survey common tracking methodologies. FoxTracks was designed to be accessible and
informative for average Internet users, who, according to survey data, have incomplete
knowledge of third-party tracking. FoxTracks examines the roles of HTTP cookies, LSOs, and DOM
Storage in third-party tracking activities. For each technology, the tool provides information
about the identities of third parties, how they use the technology to track users, what kinds
of personal data can be exposed, and how users can opt out of tracking. Drawing on the
existing literature and code for each tracking technology, FoxTracks presents this information
through a different interface for each technology. The HTTP cookies and web bugs panel
demonstrates how a third party tracks a user across multiple websites to compile a profile of
the user’s browsing history. The Flash objects panel displays both first-party and third-party
LSOs and encourages users to learn about Flash cookies before using the FoxTracks deletion
options. The DOM Storage panel provides a basic database view into the user’s DOM Storage and
strongly emphasizes the informational links included in the interface. Together, the three
panels provide a novel, holistic survey of third-party tracking on the web.
   The design choices in FoxTracks were informed by my analysis of the history, key players,
and motivations of third-party trackers. A number of significant patterns emerged from this
analysis. In both the HTTP cookie standard and the DOM Storage working document, standards
organizations highlighted the serious privacy risks posed by third-party use of the
technologies. Nonetheless, both documents place a strong emphasis on active user
self-management of privacy. The current self-management privacy default, combined with a lack
of user awareness about privacy risks, provides the basis for substantial tracking by third
parties. Judging from the Ghostery database of known third parties and an examination of the
companies using Flash cookies, the majority of third parties appear to be advertising
companies and behavioral data aggregators. Because their business models rely on large
amounts of accurate user profiling, these companies have economic incentives to circumvent
user attempts to control privacy. Moreover, venture funding and industry recognition show that
they are encouraged to continue privacy-eroding practices such as HTTP cookie respawning.
Applying a computer security rubric to the sum of this analysis yields qualitative results
about the nature of third parties. In particular, the tracking technologies and methods
employed by third parties parallel the actions of adversaries in a computer security model.
This suggests that third parties intentionally circumvent user efforts to control privacy. If
so, third parties and third-party tracking should be considered a serious privacy risk to
users.
   By applying my analysis to the FoxTracks tool, I was able to make the software conceptually
true to its goals of accessibility, informational quality, and exposure of third-party
tracking. However, there is substantial room for revision of the software, and its interface
still needs testing. Further research directions that would also enhance FoxTracks include an
examination of DOM Storage contents and a focused study of Flash cookies. FoxTracks might
separately inspire similar privacy-enhancing tools that explore different web technologies or
convey information in creative ways. Alternatively, work might be undertaken on a holistic
survey of security-enhancing technologies, especially as standards promote user
self-management of online privacy. Further research in any of these directions would be
supported by FoxTracks and the accompanying analysis, and would in turn further the tool’s
goal of achieving a privacy standard of informed consent.

Acknowledgements

I would like to express my sincere thanks to Professor John Mitchell and Professor David
Dill for their guidance on this research project.
My deepest thanks also go to the team at CDT Labs, who advised me on technical
matters and provided online support for the project.

Bibliography

[1] DOM Storage. https://developer-stage.mozilla.org/en/DOM/Storage, April 2010.

[2] VideoEgg. http://www.crunchbase.com/company/videoegg, March 2010.

[3] M. Ackerman, L. Cranor, and J. Reagle. Privacy in e-commerce: examining user scenarios
   and privacy preferences. Proceedings of the 1st ACM conference on Electronic
   commerce, pages 1–8, May 1999.

[4] M. Chambers. Macromedia Flash MX Security. Macromedia whitepaper describing the
   information accessible to LSOs, March 2002.

[5] R. Hof. Ad networks are transforming online advertising. BusinessWeek, March 2009.

[6] D. Kristol. HTTP cookies: Standards, privacy, and politics. ACM Transactions on
   Internet Technology, 1(2):151–198, 2001.

[7] D. Kristol and L. Montulli. RFC 2109: HTTP state management mechanism. Internet
   Engineering Task Force, Network Working Group, February 1997.

[8] W. Maes, T. Heyman, L. Desmet, and W. Joosen. Browser protection against cross-site
   request forgery. Proceedings of the first ACM workshop on Secure execution of untrusted
   code, pages 3–10, November 2009.

 [9] Jonathan R. Mayer. “Any person... a pamphleteer”: Internet Anonymity in the Age of
    Web 2.0. Woodrow Wilson School undergraduate thesis containing relevant survey data
    about perceptions of third-party tracking on the web, April 2009.

[10] G. Nowak and J. Phelps. Understanding privacy concerns: An assessment of consumers’
    information-related knowledge and beliefs. Journal of Direct Marketing, 6(4):28–39,
    August 2006.

[11] W. Peng and J. Cisna. HTTP cookies: A promising technology. Online Information
    Review, 24(1):150–153, April 2000.

[12] B. Pfitzmann and M. Waidner. Privacy in browser-based attribute exchange. Proceedings
    of the 1st ACM conference on Electronic commerce, 154(3):52–62, November 2002.

[13] A. Soltani, S. Canty, Q. Mayo, L. Thomas, and C. J. Hoofnagle. Flash Cookies and
    Privacy. UC Berkeley survey of Flash cookie adoption on the web and related privacy
    concerns, August 2009.

[14] Douglas R. Stinson. Cryptography Theory and Practice, pages 355–363. Chapman &
    Hall/CRC, third edition, 2006.

[15] United Virtualities. United Virtualities develops ID backup to cookies; browser-based
    persistent identification element will also restore erased cookie. March 2005.



