Peer-to-Peer Networking

Protecting Free Expression Online with Freenet

Freenet uses a decentralized P2P architecture to create an uncensorable and secure global information storage system.

Ian Clarke and Scott G. Miller, Uprizer
Theodore W. Hong, Imperial College of Science, Technology, and Medicine
Oskar Sandberg and Brandon Wiley, Freenet Project Inc.

The growth of censorship and erosion of privacy on the Internet increasingly threatens freedom of expression in the digital age. Personal information flows are becoming subject to pervasive monitoring and surveillance, and various state and corporate actors are trying to block access to controversial information and even destroy certain materials altogether. Recent incidents such as the publication of Monica Lewinsky’s deleted personal e-mails in a U.S. congressional report further point to an unprecedented level of intrusion into private life.1 These trends cause concern not only to whistleblowers and political dissidents, but to anyone disturbed by the thought of others reading their e-mail or following their Web activities.

Fortunately, concurrent advances in the power of personal computers have made it possible to develop peer-to-peer technologies to respond to these challenges. Our project, Freenet, is a distributed information storage system designed to address information privacy and survivability concerns.2 A beta version of the software is currently available under open source.

In simulations of up to 200,000 nodes, Freenet has proved scalable and fault tolerant. It operates as a self-organizing P2P network that pools unused disk space across potentially hundreds of thousands of desktop computers to create a collaborative virtual file system. To increase network robustness and eliminate single points of failure, Freenet employs a completely decentralized architecture. Given that the P2P environment is inherently untrustworthy and unreliable, we must assume that participants could operate maliciously or fail without warning at any time. Therefore, Freenet implements strategies to protect data integrity and prevent privacy leaks in the former instance, and provide for graceful degradation and redundant data availability in the latter. The system is also designed to adapt to usage patterns, automatically replicating and deleting files to make the most effective use of available storage in response to demand.

40    JANUARY • FEBRUARY 2002    1089-7801/02/$17.00 ©2002 IEEE    IEEE INTERNET COMPUTING

Design Motivation

As documented by Human Rights Watch and the Global Internet Liberty Campaign, governments around the world have undertaken efforts to force Internet service providers to block access to content deemed unsuitable or subversive, or to make them liable for such material hosted on their servers. The Electronic Privacy Information Center has also raised privacy and civil liberties questions about developments like the Federal Bureau of Investigation’s Carnivore electronic monitoring system and the European Union’s new “Convention on Cybercrime,” which gives authorities broad powers to intercept and record digital communications.

Though seemingly separate, the prevention of censorship and the maintenance of privacy are both fundamental to free expression in a potentially hostile world. Preserving the availability of controversial information is only half the problem; individuals can often be subject to adverse personal consequences for writing or reading such information and might need to conceal their activity in order to protect themselves. Indeed, the U.S. Supreme Court, among others, has long recognized the important role of anonymous speech in political dissent.

A common objection to mechanisms for secure communication is that criminals might use them to evade law enforcement. Freenet is not particularly attractive for such purposes, as it is designed to broadcast content to the world, which is not so useful for secret criminal plots. In any case, however, anonymous electronic communication is simply a tool, like payphones or postal mail, to be used for good or bad. A terrorist might use it to plan an attack, or an informant could use it to turn the terrorist in to the authorities. Most importantly, the freedom to communicate is a fundamental value in a democratic society. There is no way to deny it to the “bad guys” without also denying freedom to the “good guys”: civil rights activists, minority religious groups, or ordinary citizens who simply wish to keep their affairs private.

In designing Freenet, we focused on

- privacy for information producers, consumers, and holders;
- resistance to information censorship;
- high availability and reliability through decentralization; and
- efficient, scalable, and adaptive storage and routing.

Maintaining privacy for creating and retrieving files means little without also protecting the files themselves, in particular keeping their holders hidden from attack. We have thus made it hard to discover exactly which computers store which files. Together with redundant replication of data, holder privacy makes it extremely difficult for censors to block or destroy files on the network.

Freenet does not, however, explicitly try to guarantee permanent data storage. Because disk space is finite, a tradeoff exists between publishing new documents and preserving old ones. Many systems solve this problem by requiring payment (in disk space or money, for example), but we would rather encourage publishing than keep out authors who can’t run peer nodes themselves or are too poor to pay for storage. To keep junk documents from filling all available space or overwriting existing data, we implement a probabilistic storage policy. We hope, however, that Freenet will attract sufficient resources from participants to preserve most files indefinitely.

Related Work in P2P

The best-known systems similar to Freenet are Napster and Gnutella, which both implement large-scale pooling of disk space among individual users. The major difference is that whereas Freenet provides a file-storage service, these systems provide a file-sharing service. That is, participants make files available to others but do not push files to other nodes for storage. This architecture means that data is not persistent in the network; rather, files are available only when their originators (or subsequent requesters) are online. Another difference is that neither system attempts to provide anonymity. Gnutella is also extremely inefficient, broadcasting thousands of messages per request.

Freenet more closely resembles the Eternity service, which was described in a proposal for a highly survivable network for permanently and anonymously archiving information.1 However, the proposal lacked specifics on how to efficiently implement such a service. Free Haven is an Eternity-like anonymous P2P publication system that uses trust mechanisms and file trading to enforce server accountability and user anonymity.2 Unfortunately, it can take a very long time, even days, to retrieve files from it.

Security Issues

Several recently developed P2P file-storage systems focus on efficient data location rather than privacy and security against malicious participants. Systems such as OceanStore,3 Cooperative File System (CFS),4 and PAST5 are all based on routing models in which each node is assigned a fixed identity and maintains some knowledge of nodes whose identities vary in specified ways from its own. These systems deterministically place data on nodes that most closely match the data’s globally unique identifier (GUID). A user can thus locate data by progressively visiting nodes whose identities match more and more bits of the desired GUID. The main advantage to these systems is that they can provide strong guarantees that data will be located within certain time bounds (generally logarithmic) if it exists. Thus, they can provide better handling of issues like storage management.

The main disadvantage of these systems relative to Freenet is that they are more difficult to secure against attack. It is easier for a malicious node to manipulate its identity to gain responsibility for a particular piece of data and suppress it. Links and routing are also more visible and deterministically structured, making it easier to trace messages and harder to route around malicious nodes that sabotage requests (for example, by pretending data could not be found). PAST, as currently constituted, also requires users to trust external smart cards.

Privacy Issues

Systems focusing on privacy for information consumers include browser proxy services such as the Anonymizer and SafeWeb/Triangle Boy. Both provide anonymity by proxying requests for Web content on the user’s behalf, although users are vulnerable to logging by the services themselves. Crowds6 improves anonymity over simple proxying through a request-chaining technique similar to the one we use. None of these systems directly stores information; they only provide anonymized access to information available on the Web.

On the producer-holder side, the Rewebber provides some privacy for information holders with an encrypted URL service that is the inverse of a browser proxy, but is similarly vulnerable to logging by the service operator. TAZ (temporary anonymous zone) servers7 extend this idea with chains of nested encrypted URLs that point to successive Rewebber-like servers to be contacted. Neither system protects information producers or provides redundant information storage. Publius8 enhances robustness and protects producer anonymity by distributing files as redundant partial shares among many holders; however, because the identity of the holders is not anonymized, an adversary could still destroy information by attacking a sufficient number of shares. None of these systems protects information consumers, although Rewebber also operates a separate browser proxy service.

References

1. R.J. Anderson, “The Eternity Service,” Proc. 1st Int’l Conf. Theory and Applications of Cryptology, CTU Publishing House, Prague, Czech Republic, 1996, pp. 242-252.
2. R. Dingledine, M.J. Freedman, and D. Molnar, “The Free Haven Project: Distributed Anonymous Storage Service,” Designing Privacy Enhancing Technologies, Lecture Notes in Computer Science 2009, Springer-Verlag, Berlin, H. Federrath, ed., 2001, pp. 67-95.
3. S. Rhea et al., “Maintenance-Free Global Data Storage,” IEEE Internet Computing, vol. 5, no. 5, Sep./Oct. 2001, pp. 40-49.
4. F. Dabek et al., “Wide-Area Cooperative Storage with CFS,” Proc. 18th ACM Symp. Operating System Principles (SOSP 2001), ACM Press, New York, 2001.
5. A. Rowstron and P. Druschel, “Storage Management and Caching in PAST, a Large-Scale, Persistent Peer-to-Peer Storage Utility,” Proc. 18th ACM Symp. Operating System Principles (SOSP 2001), ACM Press, New York, 2001.
6. M.K. Reiter and A.D. Rubin, “Anonymous Web Transactions with Crowds,” Comm. ACM, vol. 42, no. 2, 1999, pp. 32-38.
7. I. Goldberg and D. Wagner, “TAZ Servers and the Rewebber Network: Enabling Anonymous Publishing on the World Wide Web,” First Monday, vol. 3, no. 4, 1998; available at issues/issue3_4/goldberg/.
8. M. Waldman, A.D. Rubin, and L.F. Cranor, “Publius: A Robust, Tamper-Evident, Censorship-Resistant, Web Publishing System,” Proc. 9th Usenix Security Symp., Usenix Assoc., Berkeley, Calif., 2000, pp. 59-72.

Freenet Architecture

Freenet participants each run a node that provides the network some storage space. To add a new file, a user sends the network an insert message containing the file and its assigned location-independent globally unique identifier (GUID), which causes the file to be stored on some set of nodes. During a file’s lifetime, it might migrate to or be replicated on other nodes. To retrieve a file, a user sends out a request message containing the GUID key. When the request reaches one of the nodes where the file is stored, that node passes the data back to the request’s originator.

GUID Keys

Freenet GUID keys are calculated using SHA-1 secure hashes. The network employs two main types of keys: content-hash keys, used for primary data storage, and signed-subspace keys, intended for higher-level human use. The two are analogous to inodes and filenames in a conventional file system.

Content-hash keys. The content-hash key (CHK) is the low-level data-storage key and is generated by hashing the contents of the file to be stored. This process gives every file a unique absolute identifier (SHA-1 collisions are considered nearly impossible) that can be verified quickly. Unlike with URLs, you can be certain that a CHK reference will point to the exact file intended. CHKs also permit identical copies of a file inserted by different people to be automatically coalesced because every user will calculate the same key for the file.

Signed-subspace keys. The signed-subspace key (SSK) sets up a personal namespace that anyone can read but only its owner can write to. You could create a subspace for an archive on the Vietnam War, for example, by first generating a random public-private key pair to identify it. To add a file you first choose a short text description, such as politics/us/pentagon-papers. You would then calculate the file’s SSK by hashing the public half of the subspace key and the descriptive string independently before concatenating them and hashing again. Signing the file with the private half of the key provides an integrity check as every node that handles a signed-subspace file verifies its signature before accepting it.

To retrieve a file from a subspace, you need only the subspace’s public key (perhaps stored on your “keyring”) and the descriptive string, from which you can recreate the SSK. Adding or updating a file, on the other hand, requires the private key in order to generate a valid signature. SSKs thus facilitate trust by guaranteeing that the same pseudonymous person created all files in the subspace, even though the subspace is not tied to a real-world identity. For example, you can use SSKs to send out a newsletter, to publish a Web site, or (operated in reverse) to receive e-mail.

Typically, SSKs are used to store indirect files containing pointers to CHKs rather than to store data files directly. Indirect files combine the human readability and publisher authentication of SSKs with the fast verification of CHKs. They also allow data to be updated while preserving referential integrity. To perform an update, the data’s owner first inserts a new version of the data, which will get a new CHK because the file contents are different. The owner then updates the SSK to point to the new version. The new version will be available by the original SSK, and the old version will remain accessible by the old CHK. Indirect files can also be used to split large files into multiple pieces by inserting each part under a separate CHK and creating an indirect file that points to all the parts.
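As a rough illustration of the GUID key scheme, and not the actual Freenet implementation, the CHK and SSK derivations and the indirect-file update can be sketched in Python. The function names (chk, ssk, publish, fetch), the in-memory dictionary standing in for the network, and the placeholder public-key bytes are all assumptions made for this sketch. The exact byte encoding of the concatenated hashes is likewise assumed, and the signing step is left out because the standard library has no public-key signatures.

```python
import hashlib

network = {}  # toy key -> value store standing in for the whole network


def chk(data: bytes) -> str:
    """Content-hash key: SHA-1 digest of the file contents."""
    return hashlib.sha1(data).hexdigest()


def ssk(public_key: bytes, description: str) -> str:
    """Signed-subspace key: hash the public key and the description
    independently, concatenate the digests, then hash again."""
    key_digest = hashlib.sha1(public_key).digest()
    text_digest = hashlib.sha1(description.encode("utf-8")).digest()
    return hashlib.sha1(key_digest + text_digest).hexdigest()


def publish(public_key: bytes, description: str, data: bytes) -> str:
    """Store data under its CHK and point the SSK at it (an indirect file).
    Signing with the private key is omitted in this sketch."""
    content_key = chk(data)
    network[content_key] = data
    network[ssk(public_key, description)] = content_key
    return content_key


def fetch(public_key: bytes, description: str) -> bytes:
    """Follow the SSK pointer to a CHK, then verify the content hash."""
    content_key = network[ssk(public_key, description)]
    data = network[content_key]
    assert chk(data) == content_key, "content does not match its CHK"
    return data


# Hypothetical placeholder for the public half of a subspace key pair.
PUBLIC_KEY = b"example-subspace-public-key"
NAME = "politics/us/pentagon-papers"

old_key = publish(PUBLIC_KEY, NAME, b"first edition")
new_key = publish(PUBLIC_KEY, NAME, b"second edition")  # an update
print(fetch(PUBLIC_KEY, NAME))  # latest version via the unchanged SSK
print(network[old_key])         # old version still reachable by its CHK
```

In this sketch the SSK never changes across updates, while each version of the data gets its own CHK, which is the referential-integrity property described for indirect files.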

Finally, you can use indirect files to create hierar-              encrypted, until the message finally reaches its
chical namespaces from directory files that point                  recipient.
to other files and directories.                                        Because each node in the chain knows only
   SSKs can also be used to implement an alterna-                 about its immediate neighbors, the end points
tive domain name system for nodes that change                     could be anywhere among the network’s hundreds
address frequently. Each such node would have its                 of thousands of nodes, which are continually
own subspace, and you could contact it by look-                   exchanging indecipherable messages. Not even
ing up its public key — its address-resolution key                the node immediately after the sender can tell
— to retrieve the current address.                                whether its predecessor was the message’s origi-
                                                                  nator or was merely forwarding a message from
Messaging and Privacy                                             another node. Similarly, the node immediately
Freenet was designed from the beginning under the                 before the receiver can’t tell whether its successor
assumption of hostile attack from both inside and                 is the true recipient or will continue to forward it.
out. Therefore, it intentionally makes it difficult for            This arrangement is intended to protect not only
nodes to direct data toward themselves and keeps                  information producers and consumers (at the
its routing topology dynamic and concealed.                       beginning of chains), but also information holders
Unfortunately, these considerations have had the                  (at the end of chains). By protecting the latter, we
side effect of hampering changes that might                       can prevent an adversary from destroying a file
improve Freenet’s routing characteristics. To date,               by attacking all of its holders. Of course, ensuring
we have not discovered a way to guarantee better                  privacy is not enough; queries must be able to
data locatability without compromising security.                  locate data as well.
   Privacy in Freenet is maintained using a varia-
tion of Chaum’s mix-net scheme for anonymous                      Routing
communication.3 Rather than move directly from                    Routing queries to data is the most important
sender to recipient, messages travel through node-                element of the Freenet system. The simplest rout-
to-node chains, in which each link is individually                ing method, used by services like Napster, is to

IEEE INTERNET COMPUTING                                                                      JANUARY • FEBRUARY 2002              43
Peer-to-Peer Networking
                                                          c                   = Data request

                                                   2 3                        = Data reply       cessful, each node in the chain passes the file back
     Requester                                                                                   upstream and creates a new entry in its routing
         a          1                     b                                   = Request failed
                         12                                                                      table associating the data holder with the request-
                                                                                                 ed key. Depending on its distance from the holder,
                                                           d Data holder                         each node might also cache a copy locally.
                                  7           11                                                    To conceal the identity of the data holder, nodes
                              6                     10
                                                                                                 will occasionally alter reply messages, setting the
                                                                                                 holder tags to point to themselves before passing
                                      5       e                                                  them back up the chain. Later requests will still
                     f            8
                                                                                                 locate the data because the node retains the true
Figure 1.Typical request sequence.The request moves through the                                  data holder’s identity in its own routing table and
network from node to node, backing out of a dead-end (step 3) and                                forwards queries to the correct holder. Routing
a loop (step 7) before locating the desired file.                                                 tables are never revealed to other nodes.
                                                                                                    To limit resource usage, the requester gives each
                                                                                                 query a time-to-live limit that is decremented at
                         maintain a central index of files, so that users                        each node. If the TTL expires, the query fails,
                         can send requests directly to information hold-                         although the user can try again with a higher TTL
                         ers. Unfortunately, centralization creates a sin-                       (up to some maximum). Because the TTL can give
                         gle point of failure that is easy to attack. For                        clues about where in the chain the requester is,
                         example, if you were trying to phone Michael                            Freenet offers the option of enhancing security by
                         Jordan, the simplest way to get his number                              adding an initial mix-net route before normal
                         would ordinarily be to call directory assistance.                       routing. This effectively repositions the start of the
                         However, because directory assistance is central-                       chain away from the requester.
                         ized, your access can be easily blocked if Jordan                          If a node sends a query to a recipient that is
                         or someone else decides to remove his directory                         already in the chain, the message is bounced back
                         entry, or if the service goes down.                                     and the node tries to use the next-closest key
                            Systems like Gnutella broadcast queries to every                     instead. If a node runs out of candidates to try, it
                         connected node within some radius. Using this                           reports failure back to its predecessor in the chain,
                         method, you would ask all of your friends if any                        which then tries its second choice, and so on.
                         of them knew Jordan’s number, get them to ask                              Figure 1 depicts a typical request sequence. The
                         their friends, and so on. Within a few steps, thou-                     user initiates a request at node A and forwards the
                         sands of people could be looking for his number.                        request to B, which forwards it to C. Node C is
                         Although this process would eventually find your                         unable to contact any other nodes and returns a
                         answer, it is clearly wasteful and unscalable.                          “request failed” message to B. Node B then tries
                            Freenet avoids both problems by using a                              its second choice, E, which forwards the request
                         steepest-ascent hill-climbing search: Each node                         to F. Node F forwards the request to B, which
                         forwards queries to the node that it thinks is                          detects a loop and bounces the message back.
                         closest to the target. You might start searching                        Unable to contact any additional nodes, node F
                         for Jordan by asking a friend who once played                           backtracks one step to E, which forwards the
college basketball, for example, who might pass your request on to a
former coach, who could pass it to a talent scout, who might pass it to
Jordan’s agent, who could put you in touch with the man himself.

Requesting files. Every node maintains a routing table that lists the
addresses of other nodes and the GUID keys it thinks they hold. When a
node receives a query, it first checks its own store, and if it finds
the file, returns it with a tag identifying itself as the data holder.
Otherwise, the node forwards the request to the node in its table with
the closest key to the one requested. That node then checks its store,
and so on. If the request is suc-

request to its second choice, D, and locates the file. D returns the
file via E and B back to A, which sends it to the user. Along the way,
E, B, and A might also cache the file.
   With this approach, the request homes in closer with each hop until
the key is found. A subsequent query for this key will tend to approach
the first request’s path, and a locally cached copy can satisfy the
query after the two paths converge. Subsequent queries for similar keys
will also jump over intermediate nodes to one that has previously
supplied similar data. Nodes that reliably answer queries will be added
to more routing tables, and hence, will be contacted more often than
nodes that do not.
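
The request-routing behavior described above can be sketched in a few
lines of Python. This is a minimal illustration, not Freenet's
implementation: the Node class, the integer keys, the absolute-difference
measure of key closeness, the hops_left limit, and the choice to record
the data holder in each routing table on the way back are all
assumptions introduced for this example.

```python
class Node:
    """Hypothetical node: a local data store plus a routing table."""

    def __init__(self, name):
        self.name = name
        self.store = {}    # key -> file data held or cached locally
        self.routing = {}  # key -> Node believed to hold that key

    def request(self, key, hops_left, visited=None):
        """Return (data, holder) on success, or None if this branch fails."""
        visited = set() if visited is None else visited
        if self.name in visited:
            return None            # loop: the caller tries its next choice
        visited.add(self.name)
        if key in self.store:
            return self.store[key], self   # tag ourselves as the data holder
        if hops_left <= 0:
            return None
        # Try known keys from closest to farthest; a failure backtracks here.
        for known in sorted(self.routing, key=lambda k: abs(k - key)):
            result = self.routing[known].request(key, hops_left - 1, visited)
            if result is not None:
                data, holder = result
                self.store[key] = data      # cache on the return path
                self.routing[key] = holder  # remember who supplied the data
                return data, holder
        return None


# Toy network: B prefers C (a dead end), then falls back to E, which knows D.
a, b, c, d, e = (Node(n) for n in "ABCDE")
d.store[50] = "file contents"
a.routing[40] = b
b.routing[48] = c
b.routing[60] = e
e.routing[52] = d

data, holder = a.request(50, hops_left=10)
print(holder.name, sorted(n.name for n in (a, b, c, e) if 50 in n.store))
```

In this toy network the request reaches D only after B's first choice,
C, turns out to be a dead end. Afterwards A, B, and E each hold a cached
copy, so a later request for the same key entering at any of them would
be answered locally.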

44      JANUARY • FEBRUARY 2002                                                             IEEE INTERNET COMPUTING

Inserting files. An insert message follows the same path that a request
for the same key would take, sets the routing table entries in the same
way, and stores the file on the same nodes. Thus, new files are placed
where queries would look for them.
   To insert a file, a user assigns it a GUID key and sends an insert
message to the user’s own node containing the new key with a TTL value
that represents the number of copies to store. Upon receiving an insert,
a node checks its data store to see if the key already exists. If so,
the insert fails, either because the file is already in the network (for
CHKs) or the user has already inserted another file with the same
description (for SSKs). In the latter case, the user should choose a
different description or perform an update rather than an insert. (Note
that we have not yet implemented updates because we are still working on
a mechanism to ensure that all old copies get replaced.)
   If the key does not already exist in the node’s data store, the node
looks up the closest key and forwards the message to the corresponding
node as it would for a query. If the TTL expires without collision, the
final node returns an “all clear” message. The user then sends the data
down the path established by the initial insert message. Each node along
the path verifies the data against its GUID, stores it, and creates a
routing table entry that lists the data holder as the final node in this
chain. As with requests, if the insert encounters a loop or a dead end,
it backtracks to the second-nearest key, then the third-nearest, and so
on, until it succeeds.

Data Encryption
For political or legal reasons, node operators might wish to remain
ignorant of the contents of their data stores. To this end, we encourage
publishers to encrypt all data before insertion. The network proper
knows nothing about this level of encryption because it just ships
already encrypted bits.
   Data encryption keys are not used in routing or included in network
messages. Inserters distribute them directly to end users at the same
time as the corresponding GUIDs. Thus, node operators cannot read their
own files, but users can decrypt them after retrieval. Node operators
cannot gain any information by looking at GUIDs, either, because the
hashes used to generate them scramble any identifying characteristics.
From a node operator’s point of view, the data store consists only of
random GUIDs attached to opaque data.

Network Evolution
The network evolves over time as new nodes join and existing nodes
create new connections after handling queries. As more requests are
handled, local knowledge about other nodes in the network improves, and
routes adapt to become more accurate without requiring global
directories.

Adding Nodes
To join the network, a new node first generates a public-private key
pair for itself. This pair serves to logically identify the node and is
used to sign a physical address reference. Note that public keys are not
certified. We don’t need to link them to real-world identities because
the node’s public key is its identity, even if it changes physical
addresses. Certification might be useful in the future for deciding
whether to trust a new node, but for now Freenet uses no trust
mechanism.
   Next, the node sends an announcement message including the public key
and physical address to an existing node, located through some
out-of-band means such as personal communication or lists of nodes
posted on the Web, with a user-specified TTL. The receiving node notes
the new node’s identifying information and forwards the announcement to
another node chosen randomly from its routing table. The announcement
continues to propagate until its TTL runs out. At that point, the nodes
in the chain collectively assign the new node a random GUID in the
keyspace using a cryptographic protocol for shared random number
generation that prevents any participant from biasing the result. This
procedure assigns the new node responsibility for a region of keyspace
that all participants agree on while guaranteeing that a malicious node
cannot influence the assignment for a specific key that it might want to
attack.

Training Routes
As more requests are processed, the network’s routing should become
better trained. Nodes’ routing tables should specialize in handling
clusters of similar keys because each node will mostly receive requests
for keys that are similar to the keys it is associated with in other
nodes’ routing tables. When those requests succeed, the node learns
about previously unknown nodes that can supply such keys and creates new
routing entries for them. As the node gains more experience in handling
queries for those keys, it will successfully answer them more often and,
in a positive feedback loop, get asked about them more often.
   Nodes’ data stores should also specialize in storing clusters of
files with similar keys. Because inserts follow the same paths as
requests, similar keys tend to cluster in the nodes along those paths.
Nodes should similarly cluster files cached after requests because most
requests will be for similar keys.
   Taken together, the twin effects of clustering in routing tables and
data stores should improve the effectiveness of future queries in a
self-reinforcing cycle. While we do not yet have a good mathematical
model to analyze the training and convergence of the Freenet algorithm,
the simulations described later show that the network can, in practice,
locate files quickly, with a median path length of just 8 hops in a
10,000-node network.

Key Clustering
Because GUID keys are derived from hashes, the closeness of keys in a
data store is unrelated to the corresponding files’ contents. This lack
of semantic closeness is unimportant, however, because the routing
algorithm is based on the locations of particular keys, rather than
particular topics.
   Suppose, for example, a descriptive string such as
politics/us/pentagon-papers yields the key AF5EC2. Requests for this
file could be satisfied by creating clusters containing the keys AF5EC1,
AF5EC2, and AF5EC3, rather than clusters containing works about U.S.
politics. In fact, hashes are useful because they ensure that similar
works will be scattered throughout the network, lessening the chances
that a single node’s failure will make an entire category of files
unavailable. Similarly, the contents of any given subspace will be
scattered across different nodes, which increases robustness.

Searching
One open issue is how users can search the network for relevant keys.
This is similar to the problem of searching the Web, and similar
solutions are possible: Freenet can be spidered, or individuals can
publish lists of bookmarks. However, these approaches are not entirely
satisfactory in terms of Freenet’s design goals.
   One simple approach for a true Freenet search would be to create a
special public subspace for indirect keyword files. When authors insert
files, they could also insert several indirect files corresponding to
search keywords for the original file. The “Pentagon Papers” file might
have indirect files named keyword:politics and keyword:united-states
pointing to it, for example.
   The system would allow multiple keyword files with the same key to
coexist (unlike with normal files), and requests for such keys could
return multiple matches. Thus, a search for “politics” might return a
pointer to the Tiananmen Papers as well as one to the Pentagon Papers.
Managing a large number of indirect files for common keywords would be
difficult, however, because all the files with the same name would be
attracted to the same nodes. A more sophisticated approach might use
some type of distributed search over detailed metadata descriptors
inserted along with the original files, but we have not yet devised a
way to route such a search efficiently.

Managing Storage
To encourage participation, Freenet does not require payment for inserts
or impose restrictions on the amount of data that publishers can insert.
Given finite disk space, however, the system must sometimes decide which
files to keep. It currently prioritizes space allocation by popularity,
as measured by the frequency of requests per file. Each node orders the
files in its data store by time of last request, and when a new file
arrives that cannot fit in the space available, the node deletes the
least recently requested files until there is room.
   Because routing table entries are smaller, they can be kept around
longer than files. Evicted files don’t necessarily disappear right away
because the node can respond to a later request for the file using its
routing table to contact the original data holder, which might be able
to supply another copy. Why would the original holder be more likely to
have the file? Freenet’s data holder pointers have a treelike structure.
Nodes at the leaves might see only a few local requests for a file, but
those higher up the tree receive requests from a larger part of the
network, which makes their copies more popular.
   File distribution is therefore determined by two competing forces:
tree growth and pruning. The query-routing mechanism automatically
creates more copies in an area of the network where a file is requested,
and the tree grows in that direction. This improves response time and
prevents overloading when the popularity of a file increases suddenly.
Files that go unrequested in another part of the network are subject to
deletion. As that part of the tree shrinks, space is freed up for other
files. The net effect is that the number and location of copies adjust
to the demand for each file.
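
The storage policy described in this section (order files by time of
last request, delete the least recently requested files until a new one
fits, and keep the smaller routing entries for longer) can be sketched
as follows. The class name DataStore, the byte-based capacity, and the
choice to keep every data holder pointer indefinitely are assumptions
made for this illustration, not details of the Freenet implementation.

```python
from collections import OrderedDict


class DataStore:
    """Hypothetical node store with least-recently-requested eviction."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()  # key -> data, oldest request first
        self.pointers = {}          # key -> data holder, outlives the file

    def get(self, key):
        """Return the file and mark it as just requested, or None."""
        if key not in self.files:
            return None
        self.files.move_to_end(key)
        return self.files[key]

    def put(self, key, data, holder):
        """Store a file, evicting least recently requested files to fit."""
        self.pointers[key] = holder
        if len(data) > self.capacity:
            return False  # can never fit; only the pointer is kept
        if key in self.files:
            self.used -= len(self.files.pop(key))
        while self.used + len(data) > self.capacity:
            _evicted_key, evicted = self.files.popitem(last=False)
            self.used -= len(evicted)
        self.files[key] = data
        self.used += len(data)
        return True


store = DataStore(capacity_bytes=10)
store.put("AF5EC1", b"aaaa", holder="node-D")
store.put("AF5EC2", b"bbbb", holder="node-E")
store.get("AF5EC1")                            # AF5EC1 becomes most recent
store.put("AF5EC3", b"cccc", holder="node-F")  # forces out AF5EC2
```

After the last call the store holds AF5EC1 and AF5EC3, while the pointer
for AF5EC2 survives, so a later request for it could still be forwarded
to the recorded data holder.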

Performance Analysis
We have tested Freenet’s performance using simulations. We have
described more extended results elsewhere [4], but we will summarize the
most important results here. Freenet demonstrates good scalability and
fault-tolerance characteristics that can be explained in terms of a
small-world network model [5]. Small-world networks are characterized by
a power-law distribution of graph degree (here, the number of routing
table entries) of the general form p(x) ∼ x^{-t}, where t is a constant,
x is the graph degree, and p(x) is the probability that a node has
degree x. In such a distribution, the majority of nodes have relatively
few local connections to other nodes, but a small yet significant number
of nodes have large, wide-ranging sets of connections. Even in very
large networks, the small-world topology enables efficient short paths
because these well-connected nodes provide shortcuts.
   Figure 2 shows the graph degree distribution in a simulation of a
10,000-node trained network. The distribution closely approximates a
power law with t = 1.5, except for an outlier resulting from the maximum
routing table size (250 in this simulation). This is not surprising, as
power-law distributions tend to arise naturally when networks grow by
preferential attachment (that is, new nodes prefer to connect to nodes
that already have many links) [6]. The new-node announcement protocol
initially creates a preferential attachment effect because following
random links gives a higher probability of arriving at nodes that have
more links. During normal operation, the effect continues because
well-known nodes tend to see more requests and become even better
connected (“the rich get richer”).

[Figure 2. Degree distribution among Freenet nodes. The network shows a
close fit to a power-law distribution. Axes: number of links
(horizontal), fraction of nodes (vertical).]

Scalability
To test Freenet’s scalability, we created a simulated network of 20
nodes initially connected in a ring topology. We sent inserts of
randomly generated files to random nodes in the network, interspersed
with random requests for files that had already been inserted (all with
TTL = 20). After every five inserts and requests, we created a new node,
which announced itself to a random existing node with TTL = 10. We
measured the network’s performance after every hundred inserts and
requests by issuing a set of test requests for previously inserted files
and recording the resulting path length distribution (the number of hops
actually required to find the data). This continued until the network
reached 200,000 nodes.

[Figure 3. Request path length versus network size. The median path
length in the network scales as N^{0.28}. Axes: network size in nodes
(horizontal), request path length in hops (vertical). Series: first
quartile, median, third quartile.]

   Figure 3 shows the evolution of the first, second, and third
quartiles of the request path length versus network size, averaged over
10 trials. We
can see that the median path length scales sublinearly with network size
as N^{0.28}, which agrees with recent results in mathematical modeling
of peer-to-peer networks [7]. By extrapolation, it appears that Freenet
should be capable of scaling to one million nodes with a median path
length of just 30.

Fault Tolerance
After repeating the previous training procedure to 10,000 nodes, we
progressively removed random nodes from the network to simulate node
failures. Figure 4 shows the resulting evolution of the request path
length, averaged over 10 trials. The network is surprisingly robust
against quite large failures: the median path length remained below 20
even when up to 30 percent of nodes failed. (Note that requests were
capped at 500 hops before giving up.)

[Figure 4. Request path length under random failure. Performance
remained reasonable even up to a 30 percent failure rate in our
simulation. Axes: fraction of nodes failing (horizontal), request path
length in hops (vertical). Series: first quartile, median, third
quartile.]

   The power-law distribution gives small-world networks a high degree
of fault tolerance [6] because random failures are most likely to
eliminate nodes from the poorly connected majority. Routing performance
is noticeably affected only after there are enough failures to knock out
a significant number of well-connected nodes. A small-world network
falls apart much more quickly, however, if the well-connected nodes are
targeted first. This is evident in Figure 5, which shows the size of the
largest connected component in a 10,000-node network as nodes were
removed, both randomly and in order from most connected to least
connected. Under random failure, the vast majority of the network
remained connected until almost the very end. Under targeted attack, the
network underwent a “percolation transition” near 60 percent removal, at
which point it abruptly broke into disconnected fragments.

[Figure 5. Connectivity under random failure and targeted attack. The
network falls apart quickly when the well-connected nodes are targeted
first. Axes: fraction of nodes removed (horizontal), size of largest
connected component (vertical). Series: random failure, targeted
attack.]

Future Work
Initial beta deployment of Freenet is under way, and users have
downloaded hundreds of thousands of copies of the software so far. The
system’s anonymous nature makes it impossible to tell exactly how many
users there are or how well inserts and requests are working, but
anecdotal evidence is positive. We are working on a simulation and
visualization suite to enable more rigorous tests of the protocol and
routing algorithm. More realistic simulation and formal modeling are
needed to explore the effects of nodes joining and leaving, variations
in node capacity and bandwidth, and larger network sizes.
   We still need to develop search mechanisms and provide more
protection against denial-of-service attacks that flood the system with
junk data. Although the eviction mechanism works to eliminate files that
are never requested, important files could be pushed out if it did not
act quickly enough under attack. On the other hand, reducing the
priority of new data could result in files being deleted before they
have had a chance to be requested. We are exploring various
modifications to the caching policy, such as caching less aggressively
farther down the data holder pointer tree, to balance these
considerations.
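
As a rough consistency check on the extrapolation to larger network
sizes reported in the Scalability discussion, the quoted scaling law can
be applied to figures given in the text. The sketch below assumes that
the median path length follows c * N^0.28 exactly and anchors the
constant at the 8-hop median reported for the 10,000-node network. Both
are simplifications made here for illustration, not the authors' fitting
procedure, and the function name is hypothetical.

```python
def extrapolate_median_hops(n_target, n_ref=10_000, hops_ref=8.0, exponent=0.28):
    """Scale a reference median path length by (n_target / n_ref) ** exponent."""
    return hops_ref * (n_target / n_ref) ** exponent


estimate = extrapolate_median_hops(1_000_000)
print(f"Estimated median path length at 1,000,000 nodes: {estimate:.1f} hops")
```

Under these assumptions the estimate comes out at roughly 29 hops, which
is in line with the figure of about 30 quoted for a one-million-node
network.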

Acknowledgments
The third author thanks the Marshall Aid Commemoration Commission for
their support. This material is partly based on work supported under a
U.S. National Science Foundation graduate research fellowship.

References
 1. J. Rosen, The Unwanted Gaze: The Destruction of Privacy in America,
    Vintage Books, New York, 2001.
 2. I. Clarke et al., “Freenet: A Distributed Anonymous Information
    Storage and Retrieval System,” in Designing Privacy Enhancing
    Technologies, Lecture Notes in Computer Science 2009, H. Federrath,
    ed., Springer-Verlag, Berlin, 2001, pp. 46-66.
 3. D.L. Chaum, “Untraceable Electronic Mail, Return Addresses, and
    Digital Pseudonyms,” Comm. ACM, vol. 24, no. 2, 1981, pp. 84-90.
 4. T. Hong, “Performance,” in Peer-to-Peer: Harnessing the Power of
    Disruptive Technologies, A. Oram, ed., O’Reilly and Assoc.,
    Sebastopol, Calif., 2001, pp. 203-241.
 5. D. Watts and S. Strogatz, “Collective Dynamics of ‘Small-World’
    Networks,” Nature, no. 393, June 1998, pp. 440-442.
 6. R. Albert, H. Jeong, and A. Barabási, “Error and Attack Tolerance of
    Complex Networks,” Nature, no. 406, July 2000, pp. 378-381.
 7. L.A. Adamic et al., “Search in Power-Law Networks,” Physical Rev. E,
    vol. 64, no. 4, 2001, article no. 046135.

Ian Clarke is vice president and chief technology officer of Uprizer
Inc. He received a BS with honors in computer science and artificial
intelligence from the University of Edinburgh, Scotland. He is the
original architect and coordinator of the Freenet Project.

Theodore W. Hong is a PhD student at Imperial College, London. His
research interests include the dynamics of complex networks, multiagent
systems and game theory, and grammatical inference for information
extraction. He has appeared as a BBC commentator on digital rights
issues. He received an AB in chemistry and physics and mathematics from
Harvard, an MSc in microwaves and optoelectronics from University
College, London, and was a 1995 Marshall scholar.

Scott G. Miller is a senior software engineer for Uprizer. He received a
BS with honors in computer science from Indiana University. His research
interests include cryptography and self-organizing systems, which
converged in the Freenet Project.

Oskar Sandberg is a core programmer who has been with Freenet since
1999. He is on the board of Freenet Project, Inc. Sandberg is in the
final year of his master’s degree in mathematics at the University of
Stockholm, Sweden.

Readers can contact Clarke at
