Peer-to-Peer Networking
Protecting Free Expression Online with Freenet
Freenet uses a decentralized P2P architecture to create an
uncensorable and secure global information storage system.
Ian Clarke and Scott G. Miller, Uprizer
Theodore W. Hong, Imperial College of Science, Technology, and Medicine
Oskar Sandberg and Brandon Wiley, Freenet Project Inc.

The growth of censorship and erosion of privacy on the Internet increasingly threatens freedom of expression in the digital age. Personal information flows are becoming subject to pervasive monitoring and surveillance, and various state and corporate actors are trying to block access to controversial information and even destroy certain materials altogether. Recent incidents such as the publication of Monica Lewinsky’s deleted personal e-mails in a U.S. congressional report further point to an unprecedented level of intrusion into private life [1]. These trends cause concern not only to whistleblowers and political dissidents, but to anyone disturbed by the thought of others reading their e-mail or following their Web activities.

Fortunately, concurrent advances in the power of personal computers have made it possible to develop peer-to-peer technologies to respond to these challenges. Our project, Freenet, is a distributed information storage system designed to address information privacy and survivability concerns [2]. A beta version of the software is currently available under open source at http://www.freenetproject.org/.

In simulations of up to 200,000 nodes, Freenet has proved scalable and fault tolerant. It operates as a self-organizing P2P network that pools unused disk space across potentially hundreds of thousands of desktop computers to create a collaborative virtual file system. To increase network robustness and eliminate single points of failure, Freenet employs a completely decentralized architecture. Given that the P2P environment is inherently untrustworthy and unreliable, we must assume that participants could operate maliciously or fail without warning at any time. Therefore, Freenet implements strategies to protect data integrity and prevent privacy leaks in the former instance, and provide for graceful degradation and redundant data availability in the latter. The system is also designed to adapt to usage patterns, automatically replicating and deleting files to make the most effective use of available storage in response to demand.
40 JANUARY • FEBRUARY 2002 http://computer.org/internet/ 1089 -7801/02/$17.00 ©2002 IEEE IEEE INTERNET COMPUTING
Design Motivation
As documented by Human Rights Watch (http://www.hrw.org/advocacy/internet/) and the Global Internet Liberty Campaign (http://www.gilc.org/), governments around the world have undertaken efforts to force Internet service providers to block access to content deemed unsuitable or subversive, or to make them liable for such material hosted on their servers. The Electronic Privacy Information Center (http://www.epic.org/) has also raised privacy and civil liberties questions about developments like the Federal Bureau of Investigation’s Carnivore electronic monitoring system and the European Union’s new “Convention on Cybercrime,” which gives authorities broad powers to intercept and record digital communications.

Though seemingly separate, the prevention of censorship and the maintenance of privacy are both fundamental to free expression in a potentially hostile world. Preserving the availability of controversial information is only half the problem; individuals can often be subject to adverse personal consequences for writing or reading such information and might need to conceal their activity in order to protect themselves. Indeed, the U.S. Supreme Court, among others, has long recognized the important role of anonymous speech in political dissent.

A common objection to mechanisms for secure communication is that criminals might use them to evade law enforcement. Freenet is not particularly attractive for such purposes, as it is designed to broadcast content to the world, which is not so useful for secret criminal plots. In any case, however, anonymous electronic communication is simply a tool, like payphones or postal mail, to be used for good or bad. A terrorist might use it to plan an attack, or an informant could use it to turn the terrorist in to the authorities. Most importantly, the freedom to communicate is a fundamental value in a democratic society. There is no way to deny it to the “bad guys” without also denying freedom to the “good guys”: civil rights activists, minority religious groups, or ordinary citizens who simply wish to keep their affairs private.

In designing Freenet, we focused on

- privacy for information producers, consumers, and holders;
- resistance to information censorship;
- high availability and reliability through decentralization; and
- efficient, scalable, and adaptive storage and routing.

Maintaining privacy for creating and retrieving files means little without also protecting the files themselves, in particular keeping their holders hidden from attack. We have thus made it hard to discover exactly which computers store which files. Together with redundant replication of data, holder privacy makes it extremely difficult for censors to block or destroy files on the network.

Freenet does not, however, explicitly try to guarantee permanent data storage. Because disk space is finite, a tradeoff exists between publishing new documents and preserving old ones. Many systems solve this problem by requiring payment (in disk space or money, for example), but we would rather encourage publishing than keep out authors who can’t run peer nodes themselves or are too poor to pay for storage. To keep junk documents from filling all available space or overwriting existing data, we implement a probabilistic storage policy. We hope, however, that Freenet will attract sufficient resources from participants to preserve most files indefinitely.

Freenet Architecture
Freenet participants each run a node that provides the network some storage space. To add a new file, a user sends the network an insert message containing the file and its assigned location-independent globally unique identifier (GUID), which causes the file to be stored on some set of nodes. During a file’s lifetime, it might migrate to or be replicated on other nodes. To retrieve a file, a user sends out a request message containing the GUID key. When the request reaches one of the nodes where the file is stored, that node passes the data back to the request’s originator.

GUID Keys
Freenet GUID keys are calculated using SHA-1 secure hashes. The network employs two main types of keys: content-hash keys, used for primary data storage, and signed-subspace keys, intended for higher-level human use. The two are analogous to inodes and filenames in a conventional file system.

Content-hash keys. The content-hash key (CHK) is the low-level data-storage key and is generated by hashing the contents of the file to be stored. This
Related Work in P2P
The best-known systems similar to Freenet are Napster (http://www.napster.com/) and Gnutella (http://gnutella.wego.com/), which both implement large-scale pooling of disk space among individual users. The major difference is that whereas Freenet provides a file-storage service, these systems provide a file-sharing service. That is, participants make files available to others but do not push files to other nodes for storage. This architecture means that data is not persistent in the network; rather, files are available only when their originators (or subsequent requesters) are online. Another difference is that neither system attempts to provide anonymity. Gnutella is also extremely inefficient, broadcasting thousands of messages per request.

Freenet more closely resembles the Eternity service, which was described in a proposal for a highly survivable network for permanently and anonymously archiving information [1]. However, the proposal lacked specifics on how to efficiently implement such a service. Free Haven is an Eternity-like anonymous P2P publication system that uses trust mechanisms and file trading to enforce server accountability and user anonymity [2]. Unfortunately, it can take a very long time, even days, to retrieve files from it.

Security Issues
Several recently developed P2P file-storage systems focus on efficient data location rather than privacy and security against malicious participants. Systems such as OceanStore [3], Cooperative File System (CFS) [4], and PAST [5] are all based on routing models in which each node is assigned a fixed identity and maintains some knowledge of nodes whose identities vary in specified ways from its own. These systems deterministically place data on nodes that most closely match the data’s globally unique identifier (GUID). A user can thus locate data by progressively visiting nodes whose identities match more and more bits of the desired GUID. The main advantage to these systems is that they can provide strong guarantees that data will be located within certain time bounds (generally logarithmic) if it exists. Thus, they can provide better handling of issues like storage management.

The main disadvantage of these systems relative to Freenet is that they are more difficult to secure against attack. It is easier for a malicious node to manipulate its identity to gain responsibility for a particular piece of data and suppress it. Links and routing are also more visible and deterministically structured, making it easier to trace messages and harder to route around malicious nodes that sabotage requests (for example, by pretending data could not be found). PAST, as currently constituted, also requires users to trust external smart cards.

Privacy Issues
Systems focusing on privacy for information consumers include browser proxy services such as the Anonymizer (http://www.anonymizer.com/) and SafeWeb/Triangle Boy (http://www.safeweb.com/). Both provide anonymity by proxying requests for Web content on the user’s behalf, although users are vulnerable to logging by the services themselves. Crowds [6] improves anonymity over simple proxying through a request-chaining technique similar to the one we use. None of these systems directly stores information; they only provide anonymized access to information available on the Web.

On the producer-holder side, the Rewebber (http://www.rewebber.de/) provides some privacy for information holders with an encrypted URL service that is the inverse of a browser proxy, but is similarly vulnerable to logging by the service operator. TAZ (temporary anonymous zone) servers [7] extend this idea with chains of nested encrypted URLs that point to successive Rewebber-like servers to be contacted. Neither system protects information producers or provides redundant information storage. Publius [8] enhances robustness and protects producer anonymity by distributing files as redundant partial shares among many holders; however, because the identity of the holders is not anonymized, an adversary could still destroy information by attacking a sufficient number of shares. None of these systems protects information consumers, although Rewebber also operates a separate browser proxy service.

References
1. R.J. Anderson, “The Eternity Service,” Proc. 1st Int’l Conf. Theory and Applications of Cryptology, CTU Publishing House, Prague, Czech Republic, 1996, pp. 242-252.
2. R. Dingledine, M.J. Freedman, and D. Molnar, “The Free Haven Project: Distributed Anonymous Storage Service,” Designing Privacy Enhancing Technologies, Lecture Notes in Computer Science 2009, Springer-Verlag, Berlin, H. Federrath, ed., 2001, pp. 67-95.
3. S. Rhea et al., “Maintenance-Free Global Data Storage,” IEEE Internet Computing, vol. 5, no. 5, Sep./Oct. 2001, pp. 40-49.
4. F. Dabek et al., “Wide-Area Cooperative Storage with CFS,” Proc. 18th ACM Symp. Operating System Principles (SOSP 2001), ACM Press, New York, 2001.
5. A. Rowstron and P. Druschel, “Storage Management and Caching in PAST, a Large-Scale, Persistent Peer-to-Peer Storage Utility,” Proc. 18th ACM Symp. Operating System Principles (SOSP 2001), ACM Press, New York, 2001.
6. M.K. Reiter and A.D. Rubin, “Anonymous Web Transactions with Crowds,” Comm. ACM, vol. 42, no. 2, 1999, pp. 32-38.
7. I. Goldberg and D. Wagner, “TAZ Servers and the Rewebber Network: Enabling Anonymous Publishing on the World Wide Web,” First Monday, vol. 3, no. 4, 1998; available at http://www.firstmonday.dk/issues/issue3_4/goldberg/.
8. M. Waldman, A.D. Rubin, and L.F. Cranor, “Publius: A Robust, Tamper-Evident, Censorship-Resistant, Web Publishing System,” Proc. 9th Usenix Security Symp., Usenix Assoc., Berkeley, Calif., 2000, pp. 59-72.

process gives every file a unique absolute identifier (SHA-1 collisions are considered nearly impossible) that can be verified quickly. Unlike with URLs, you can be certain that a CHK reference will point to the exact file intended. CHKs also permit identical copies of a file inserted by different people to be automatically coalesced because every user will calculate the same key for the file.

Signed-subspace keys. The signed-subspace key (SSK) sets up a personal namespace that anyone can read but only its owner can write to. You could create a subspace for an archive on the Vietnam War, for example, by first generating a random public-private key pair to identify it. To add a file, you first choose a short text description, such as politics/us/pentagon-papers. You would then calculate the file’s SSK by hashing the public half of the subspace key and the descriptive string independently before concatenating them and hashing again. Signing the file with the private half of the key provides an integrity check, as every node that handles a signed-subspace file verifies its signature before accepting it.

To retrieve a file from a subspace, you need only the subspace’s public key (perhaps stored on your “keyring”) and the descriptive string, from which you can recreate the SSK. Adding or updating a file, on the other hand, requires the private key in order to generate a valid signature. SSKs thus facilitate trust by guaranteeing that the same pseudonymous person created all files in the subspace, even though the subspace is not tied to a real-world identity. For example, you can use SSKs to send out a newsletter, to publish a Web site, or (operated in reverse) to receive e-mail.

Typically, SSKs are used to store indirect files containing pointers to CHKs rather than to store data files directly. Indirect files combine the human readability and publisher authentication of SSKs with the fast verification of CHKs. They also allow data to be updated while preserving referential integrity. To perform an update, the data’s owner first inserts a new version of the data, which will get a new CHK because the file contents are different. The owner then updates the SSK to point to the new version. The new version will be available by the original SSK, and the old version will remain accessible by the old CHK. Indirect files can also be used to split large files into multiple pieces by inserting each part under a separate CHK and creating an indirect file that points to all the parts.
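The CHK and SSK derivations described in the GUID Keys section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Freenet implementation: the function names, the order in which the two digests are concatenated, the use of raw 20-byte digests, the placeholder public-key bytes, and the dictionary standing in for the network are all hypothetical. Signature generation and verification are omitted.

```python
import hashlib


def content_hash_key(data: bytes) -> str:
    """CHK sketch: the SHA-1 digest of the file contents, as hex."""
    return hashlib.sha1(data).hexdigest()


def signed_subspace_key(public_key: bytes, description: str) -> str:
    """SSK sketch: hash the public key and the descriptive string
    separately, concatenate the two digests, and hash again."""
    key_digest = hashlib.sha1(public_key).digest()
    description_digest = hashlib.sha1(description.encode("utf-8")).digest()
    # Assumption: key digest first, raw digests concatenated.
    return hashlib.sha1(key_digest + description_digest).hexdigest()


# A plain dict stands in for the network (hypothetical).
network = {}
public_key = b"example-subspace-public-key"  # placeholder bytes, not a real key
ssk = signed_subspace_key(public_key, "politics/us/pentagon-papers")

# Insert version 1 under its CHK and point an indirect file (under the SSK) at it.
v1 = b"first version of the document"
network[content_hash_key(v1)] = v1
network[ssk] = [content_hash_key(v1)]

# Update: the new contents get a new CHK; the SSK is repointed.
v2 = b"second version of the document"
network[content_hash_key(v2)] = v2
network[ssk] = [content_hash_key(v2)]
```

After the update, looking up the SSK leads to the second version, while the first version is still reachable through its old CHK, which is the referential-integrity property the text describes.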
Finally, you can use indirect files to create hierarchical namespaces from directory files that point to other files and directories.

SSKs can also be used to implement an alternative domain name system for nodes that change address frequently. Each such node would have its own subspace, and you could contact it by looking up its public key (its address-resolution key) to retrieve the current address.

Messaging and Privacy
Freenet was designed from the beginning under the assumption of hostile attack from both inside and out. Therefore, it intentionally makes it difficult for nodes to direct data toward themselves and keeps its routing topology dynamic and concealed. Unfortunately, these considerations have had the side effect of hampering changes that might improve Freenet’s routing characteristics. To date, we have not discovered a way to guarantee better data locatability without compromising security.

Privacy in Freenet is maintained using a variation of Chaum’s mix-net scheme for anonymous communication [3]. Rather than move directly from sender to recipient, messages travel through node-to-node chains, in which each link is individually encrypted, until the message finally reaches its recipient.

Because each node in the chain knows only about its immediate neighbors, the end points could be anywhere among the network’s hundreds of thousands of nodes, which are continually exchanging indecipherable messages. Not even the node immediately after the sender can tell whether its predecessor was the message’s originator or was merely forwarding a message from another node. Similarly, the node immediately before the receiver can’t tell whether its successor is the true recipient or will continue to forward it. This arrangement is intended to protect not only information producers and consumers (at the beginning of chains), but also information holders (at the end of chains). By protecting the latter, we can prevent an adversary from destroying a file by attacking all of its holders. Of course, ensuring privacy is not enough; queries must be able to locate data as well.

Routing
Routing queries to data is the most important element of the Freenet system. The simplest routing method, used by services like Napster, is to
maintain a central index of files, so that users can send requests directly to information holders. Unfortunately, centralization creates a single point of failure that is easy to attack. For example, if you were trying to phone Michael Jordan, the simplest way to get his number would ordinarily be to call directory assistance. However, because directory assistance is centralized, your access can be easily blocked if Jordan or someone else decides to remove his directory entry, or if the service goes down.

Systems like Gnutella broadcast queries to every connected node within some radius. Using this method, you would ask all of your friends if any of them knew Jordan’s number, get them to ask their friends, and so on. Within a few steps, thousands of people could be looking for his number. Although this process would eventually find your answer, it is clearly wasteful and unscalable.

Freenet avoids both problems by using a steepest-ascent hill-climbing search: Each node forwards queries to the node that it thinks is closest to the target. You might start searching for Jordan by asking a friend who once played college basketball, for example, who might pass your request on to a former coach, who could pass it to a talent scout, who might pass it to Jordan’s agent, who could put you in touch with the man himself.

Requesting files. Every node maintains a routing table that lists the addresses of other nodes and the GUID keys it thinks they hold. When a node receives a query, it first checks its own store, and if it finds the file, returns it with a tag identifying itself as the data holder. Otherwise, the node forwards the request to the node in its table with the closest key to the one requested. That node then checks its store, and so on. If the request is successful, each node in the chain passes the file back upstream and creates a new entry in its routing table associating the data holder with the requested key. Depending on its distance from the holder, each node might also cache a copy locally.

To conceal the identity of the data holder, nodes will occasionally alter reply messages, setting the holder tags to point to themselves before passing them back up the chain. Later requests will still locate the data because the node retains the true data holder’s identity in its own routing table and forwards queries to the correct holder. Routing tables are never revealed to other nodes.

To limit resource usage, the requester gives each query a time-to-live (TTL) limit that is decremented at each node. If the TTL expires, the query fails, although the user can try again with a higher TTL (up to some maximum). Because the TTL can give clues about where in the chain the requester is, Freenet offers the option of enhancing security by adding an initial mix-net route before normal routing. This effectively repositions the start of the chain away from the requester.

If a node sends a query to a recipient that is already in the chain, the message is bounced back and the node tries to use the next-closest key instead. If a node runs out of candidates to try, it reports failure back to its predecessor in the chain, which then tries its second choice, and so on.

Figure 1 depicts a typical request sequence. The user initiates a request at node A and forwards the request to B, which forwards it to C. Node C is unable to contact any other nodes and returns a “request failed” message to B. Node B then tries its second choice, E, which forwards the request to F. Node F forwards the request to B, which detects a loop and bounces the message back. Unable to contact any additional nodes, node F backtracks one step to E, which forwards the request to its second choice, D, and locates the file. D returns the file via E and B back to A, which sends it to the user. Along the way, E, B, and A might also cache the file.

Figure 1. Typical request sequence. The request moves through the network from node to node, backing out of a dead end (step 3) and a loop (step 7) before locating the desired file. (Diagram: nodes a through f, with the requester at a and the data holder at d; message steps numbered 1 through 12; legend: data request, data reply, request failed.)

With this approach, the request homes in closer with each hop until the key is found. A subsequent query for this key will tend to approach the first request’s path, and a locally cached copy can satisfy the query after the two paths converge. Subsequent queries for similar keys will also jump over intermediate nodes to one that has previously supplied similar data. Nodes that reliably answer queries will be added to more routing tables, and hence, will be contacted more often than nodes that do not.
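The request procedure and the Figure 1 walk-through can be illustrated with a small simulation. The sketch below is a simplification under several assumptions that are not taken from Freenet itself: keys are small integers, closeness is the absolute difference between keys, the TTL is treated as a depth limit, every node on the return path caches the file, and holder-tag rewriting is left out. The names Node, request, and figure1_network, and the routing-table contents, are hypothetical and chosen only to reproduce the sequence A, B, C, E, F, D.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    store: dict = field(default_factory=dict)  # key -> data held or cached here
    table: dict = field(default_factory=dict)  # key -> node believed to hold it


def request(network, name, key, ttl, visited=None, trace=None):
    """Depth-first, closest-key-first search with backtracking.

    Returns (holder, data, trace); holder and data are None on failure.
    """
    visited = set() if visited is None else visited
    trace = [] if trace is None else trace
    node = network[name]
    visited.add(name)
    trace.append(name)
    if key in node.store:               # found: tag this node as the holder
        return name, node.store[key], trace
    if ttl <= 0:                        # TTL expired: report failure upstream
        return None, None, trace
    # Try routing entries in order of closeness to the requested key.
    for known_key in sorted(node.table, key=lambda k: abs(k - key)):
        peer = node.table[known_key]
        if peer in visited:             # loop or exhausted peer: try next-closest
            continue
        holder, data, _ = request(network, peer, key, ttl - 1, visited, trace)
        if data is not None:
            node.table[key] = holder    # learn who supplied this key
            node.store[key] = data      # cache a copy (always, in this sketch)
            return holder, data, trace
    return None, None, trace            # out of candidates: back up the chain


def figure1_network():
    """Hypothetical tables that reproduce the Figure 1 sequence for key 50."""
    nodes = {n: Node(n) for n in "ABCDEF"}
    nodes["A"].table = {40: "B"}
    nodes["B"].table = {48: "C", 45: "E"}   # C looks closer, so it is tried first
    nodes["C"].table = {10: "B"}            # dead end: only knows B
    nodes["E"].table = {52: "F", 30: "D"}   # F looks closer than D
    nodes["F"].table = {51: "B"}            # leads back into the chain
    nodes["D"].store = {50: b"file"}
    return nodes


net = figure1_network()
holder, data, trace = request(net, "A", 50, ttl=10)
print(holder, trace)
```

Running the same request a second time from A is answered from A’s own cache, which mirrors the observation that later queries are satisfied once they meet a cached copy.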
Inserting files. An insert message follows the same path that a request for the same key would take, sets the routing table entries in the same way, and stores the file on the same nodes. Thus, new files are placed where queries would look for them.

To insert a file, a user assigns it a GUID key and sends an insert message to the user’s own node containing the new key with a TTL value that represents the number of copies to store. Upon receiving an insert, a node checks its data store to see if the key already exists. If so, the insert fails, either because the file is already in the network (for CHKs) or the user has already inserted another file with the same description (for SSKs). In the latter case, the user should choose a different description or perform an update rather than an insert. (Note that we have not yet implemented updates because we are still working on a mechanism to ensure that all old copies get replaced.)

If the key does not already exist in the node’s data store, the node looks up the closest key and forwards the message to the corresponding node as it would for a query. If the TTL expires without collision, the final node returns an “all clear” message. The user then sends the data down the path established by the initial insert message. Each node along the path verifies the data against its GUID, stores it, and creates a routing table entry that lists the data holder as the final node in this chain. As with requests, if the insert encounters a loop or a dead end, it backtracks to the second-nearest key, then the third-nearest, and so on, until it succeeds.

Data Encryption
For political or legal reasons, node operators might wish to remain ignorant of the contents of their data stores. To this end, we encourage publishers to encrypt all data before insertion. The network proper knows nothing about this level of encryption because it just ships already encrypted bits.

Data encryption keys are not used in routing or included in network messages. Inserters distribute them directly to end users at the same time as the corresponding GUIDs. Thus, node operators cannot read their own files, but users can decrypt them after retrieval. Node operators cannot gain any information by looking at GUIDs, either, because the hashes used to generate them scramble any identifying characteristics. From a node operator’s point of view, the data store consists only of random GUIDs attached to opaque data.

Network Evolution
The network evolves over time as new nodes join and existing nodes create new connections after handling queries. As more requests are handled, local knowledge about other nodes in the network improves, and routes adapt to become more accurate without requiring global directories.

Adding Nodes
To join the network, a new node first generates a public-private key pair for itself. This pair serves to logically identify the node and is used to sign a physical address reference. Note that public keys are not certified. We don’t need to link them to real-world identities because the node’s public key is its identity, even if it changes physical addresses. Certification might be useful in the future for deciding whether to trust a new node, but for now Freenet uses no trust mechanism.

Next, the node sends an announcement message including the public key and physical address to an existing node, located through some out-of-band means such as personal communication or lists of nodes posted on the Web, with a user-specified TTL. The receiving node notes the new node’s identifying information and forwards the announcement to another node chosen randomly from its routing table. The announcement continues to propagate until its TTL runs out. At that point, the nodes in the chain collectively assign the new node a random GUID in the keyspace using a cryptographic protocol for shared random number generation that prevents any participant from biasing the result. This procedure assigns the new node responsibility for a region of keyspace that all participants agree on while guaranteeing that a malicious node cannot influence the assignment for a specific key that it might want to attack.

Training Routes
As more requests are processed, the network’s routing should become better trained. Nodes’ routing tables should specialize in handling clusters of similar keys because each node will mostly receive requests for keys that are similar to the keys it is associated with in other nodes’ routing tables. When those requests succeed, the node learns
about previously unknown nodes that can supply work for relevant keys. This is similar to the prob-
such keys and creates new routing entries for lem of searching the Web, and similar solutions are
them. As the node gains more experience in han- possible: Freenet can be spidered, or individuals
dling queries for those keys, it will successfully can publish lists of bookmarks. However, these
answer them more often and, in a positive feed- approaches are not entirely satisfactory in terms
back loop, get asked about them more often. of Freenet’s design goals.
Nodes’ data stores should also specialize in One simple approach for a true Freenet search
storing clusters of files with similar keys. Because would be to create a special public subspace for
inserts follow the same paths as requests, similar indirect keyword files. When authors insert files,
keys tend to cluster in the nodes along those they could also insert several indirect files corre-
paths. Nodes should similarly cluster files cached sponding to search keywords for the original file.
after requests because most requests will be for The “Pentagon Papers” file might have indirect files
similar keys. named keyword:politics and keyword:united-
Taken together, the twin effects of clustering states pointing to it, for example.
in routing tables and data stores should improve The system would allow multiple keyword files
the effectiveness of future queries in a self-rein- with the same key to coexist (unlike with normal
forcing cycle. While we do not yet have a good files), and requests for such keys could return mul-
mathematical model to analyze the training and tiple matches. Thus, a search for “politics” might
convergence of the return a pointer to the Tiananmen Papers as well
Freenet algorithm, as one to the Pentagon Papers. Managing a large
Well-known nodes the simulations number of indirect files for common keywords
described later show would be difficult, however, because all the files
tend to see more that the network with the same name would be attracted to the
can, in practice, same nodes. A more sophisticated approach might
requests and become locate files quickly use some type of distributed search over detailed
— with a median metadata descriptors inserted along with the orig-
even better connected.
path length of just 8 hops in a 10,000-node network.

Key Clustering
Because GUID keys are derived from hashes, the closeness of keys in a data store is unrelated to the corresponding files’ contents. This lack of semantic closeness is unimportant, however, because the routing algorithm is based on the locations of particular keys, rather than particular topics.
Suppose, for example, a descriptive string such as politics/us/pentagon-papers yields the key AF5EC2. Requests for this file could be satisfied by creating clusters containing the keys AF5EC1, AF5EC2, and AF5EC3, rather than clusters containing works about U.S. politics. In fact, hashes are useful because they ensure that similar works will be scattered throughout the network, lessening the chances that a single node’s failure will make an entire category of files unavailable. Similarly, the contents of any given subspace will be scattered across different nodes, which increases robustness.

Searching
One open issue is how users can search the net-
[...]
inal files, but we have not yet devised a way to route such a search efficiently.

Managing Storage
To encourage participation, Freenet does not require payment for inserts or impose restrictions on the amount of data that publishers can insert. Given finite disk space, however, the system must sometimes decide which files to keep. It currently prioritizes space allocation by popularity, as measured by the frequency of requests per file. Each node orders the files in its data store by time of last request, and when a new file arrives that cannot fit in the space available, the node deletes the least recently requested files until there is room.
Because routing table entries are smaller, they can be kept around longer than files. Evicted files don’t necessarily disappear right away because the node can respond to a later request for the file using its routing table to contact the original data holder, which might be able to supply another copy. Why would the original holder be more likely to have the file? Freenet’s data holder pointers have a treelike structure. Nodes at the leaves might see only a few local requests for a file, but those higher up the tree receive requests from a larger part of the network, which makes their copies more popular.
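The eviction rule described under Managing Storage can be illustrated with a minimal sketch. It assumes a byte-counted capacity and a routing table that is never pruned; the class and method names (DataStore, insert, request) are hypothetical and are not taken from the Freenet code base.

```python
from collections import OrderedDict


class DataStore:
    """Toy model of one node's store with least-recently-requested eviction."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Keys ordered from least recently requested (front) to most recent (back).
        self.files = OrderedDict()
        # Routing table: key -> node believed to hold the data.
        # Entries are small, so this sketch (an assumption) never evicts them.
        self.routing_table = {}

    def insert(self, key, data, holder):
        """Store a file, deleting least recently requested files until it fits."""
        if len(data) > self.capacity_bytes:
            raise ValueError("file is larger than the whole store")
        if key in self.files:
            self.used_bytes -= len(self.files.pop(key))
        while self.used_bytes + len(data) > self.capacity_bytes:
            _, evicted = self.files.popitem(last=False)
            self.used_bytes -= len(evicted)
        self.files[key] = data
        self.used_bytes += len(data)
        self.routing_table[key] = holder

    def request(self, key):
        """Return ("data", bytes) on a hit, ("forward", holder) if only the
        routing pointer survives, or ("miss", None) otherwise."""
        if key in self.files:
            self.files.move_to_end(key)  # mark as most recently requested
            return ("data", self.files[key])
        if key in self.routing_table:
            return ("forward", self.routing_table[key])
        return ("miss", None)
```

In this sketch, a file that has been evicted still answers with a "forward" result, which mirrors the point that the routing table entry outlives the file and lets the node ask the original data holder for another copy.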
File distribution is therefore determined by two competing forces: tree growth and pruning. The query-routing mechanism automatically creates more copies in an area of the network where a file is requested, and the tree grows in that direction. This improves response time and prevents overloading when the popularity of a file increases suddenly. Files that go unrequested in another part of the network are subject to deletion. As that part of the tree shrinks, space is freed up for other files. The net effect is that the number and location of copies adjust to the demand for each file.
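The growth half of this balance can be sketched in a few lines. The sketch assumes, as a simplification, that a request follows a fixed path of nodes and that every node on the way back to the requester keeps a copy; the function name fetch and the dictionary-based stores are illustrative only, and real routing decisions are not modeled.

```python
def fetch(path, key, stores):
    """Walk `path` from the requester toward the source; cache on the way back.

    path   -- list of node names, requester first
    stores -- dict: node name -> dict of key -> data held by that node
    Returns (data, hops), or (None, len(path)) if no node on the path has the key.
    """
    for hops, node in enumerate(path):
        if key in stores[node]:
            data = stores[node][key]
            # The reply retraces the path; each node it passes keeps a copy
            # (an assumption of this sketch).
            for earlier in path[:hops]:
                stores[earlier][key] = data
            return data, hops
    return None, len(path)


stores = {name: {} for name in "ABCDE"}
stores["D"]["AF5EC2"] = b"payload"
first = fetch(["A", "B", "C", "D"], "AF5EC2", stores)   # found 3 hops away
second = fetch(["E", "B", "C", "D"], "AF5EC2", stores)  # found 1 hop away
```

After the first request, copies exist closer to where the file was asked for, so a later request from a neighboring node is answered in fewer hops. Pruning is the opposite effect: copies in regions that stop requesting the file are eventually evicted.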
Performance Analysis
We have tested Freenet’s performance using simulations. We have described more extended results elsewhere [4], but we will summarize the most important results here. Freenet demonstrates good scalability and fault-tolerance characteristics that can be explained in terms of a small-world network model [5]. Small-world networks are characterized by a power-law distribution of graph degree (here, the number of routing table entries) of the general form $p(x) \sim x^{-t}$, where t is a constant, x is the graph degree, and p(x) is the probability that a node has degree x. In such a distribution, the majority of nodes have relatively few local connections to other nodes, but a small yet significant number of nodes have large, wide-ranging sets of connections. Even in very large networks, the small-world topology enables efficient short paths because these well-connected nodes provide shortcuts.

Figure 2. Degree distribution among Freenet nodes. The network shows a close fit to a power-law distribution. (Axes: fraction of nodes versus number of links.)

Figure 2 shows the graph degree distribution in a simulation of a 10,000-node trained network. The distribution closely approximates a power law with t = 1.5, except for an outlier resulting from the maximum routing table size (250 in this simulation). This is not surprising, as power-law distributions tend to arise naturally when networks grow by preferential attachment (that is, new nodes prefer to connect to nodes that already have many links) [6]. The new-node announcement protocol initially creates a preferential attachment effect because following random links gives a higher probability of arriving at nodes that have more links. During normal operation, the effect continues because well-known nodes tend to see more requests and become even better connected (“the rich get richer”).

Scalability
To test Freenet’s scalability, we created a simulated network of 20 nodes initially connected in a ring topology. We sent inserts of randomly generated files to random nodes in the network, interspersed with random requests for files that had already been inserted (all with TTL = 20). After every five inserts and requests, we created a new node, which announced itself to a random existing node with TTL = 10. We measured the network’s performance after every hundred inserts and requests by issuing a set of test requests for previously inserted files and recording the resulting path length distribution (the number of hops actually required to find the data). This continued until the network reached 200,000 nodes.

Figure 3. Request path length versus network size. The median path length in the network scales as $N^{0.28}$. (Axes: request path length in hops versus network size in nodes; curves: first quartile, median, third quartile.)

Figure 3 shows the evolution of the first, second, and third quartiles of the request path length versus network size, averaged over 10 trials. We
can see that the median path length scales sublinearly with network size as $N^{0.28}$, which agrees with recent results in mathematical modeling of peer-to-peer networks [7]. By extrapolation, it appears that Freenet should be capable of scaling to one million nodes with a median path length of just 30.

Fault Tolerance
After repeating the previous training procedure until the network reached 10,000 nodes, we progressively removed random nodes from the network to simulate node failures. Figure 4 plots the resulting evolution of the request path length, averaged over 10 trials; the network is surprisingly robust against quite large failures. The median path length remained below 20 even when up to 30 percent of nodes failed. (Note that requests were capped at 500 hops before giving up.)

Figure 4. Request path length under random failure. Performance remained reasonable even up to a 30 percent failure rate in our simulation. (Axes: request path length in hops versus fraction of nodes failing; curves: first quartile, median, third quartile.)

The power-law distribution gives small-world networks a high degree of fault tolerance [6] because random failures are most likely to eliminate nodes from the poorly connected majority. Routing performance is noticeably affected only after there are enough failures to knock out a significant number of well-connected nodes. A small-world network falls apart much more quickly, however, if the well-connected nodes are targeted first. This is evident in Figure 5, which shows the size of the largest connected component in a 10,000-node network as nodes were removed, both randomly and in order from most connected to least connected. Under random failure, the vast majority of the network remained connected until almost the very end. Under targeted attack, the network underwent a “percolation transition” near 60 percent removal, at which point it abruptly broke into disconnected fragments.

Figure 5. Connectivity under random failure and targeted attack. The network falls apart quickly when the well-connected nodes are targeted first. (Axes: size of largest connected component versus fraction of nodes removed; curves: random failure, targeted attack.)

Future Work
Initial beta deployment of Freenet is under way, and users have downloaded hundreds of thousands of copies of the software so far. The system’s anonymous nature makes it impossible to tell exactly how many users there are or how well inserts and requests are working, but anecdotal evidence is positive. We are working on a simulation and visualization suite to enable more rigorous tests of the protocol and routing algorithm. More realistic simulation and formal modeling are needed to explore the effects of nodes joining and leaving, variations in node capacity and bandwidth, and larger network sizes.
We still need to develop search mechanisms and provide more protection against denial-of-service attacks that flood the system with junk data. Although the eviction mechanism works to eliminate files that are never requested, important files could be pushed out if it did not act quickly enough under attack. On the other hand, reducing the priority of new data could result in files being deleted before they have had a chance to be requested. We are exploring various modifications to the caching policy, such as caching less aggressively farther down the data holder pointer tree, to balance these considerations.
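As a rough illustration of the contrast between random failure and targeted attack discussed under Fault Tolerance, the following toy experiment grows a generic preferential-attachment graph and measures the largest connected component after removing nodes. It is a sketch under stated assumptions, not the simulator used for the figures: the graph is not a trained Freenet network, and the function names and the parameters n, m, fraction, and seed are all hypothetical, so the numbers it produces should not be expected to match the reported thresholds.

```python
import random


def preferential_attachment_graph(n, m, rng):
    """Grow an undirected graph in which each new node links to m existing
    nodes chosen with probability proportional to their current degree."""
    adj = {i: set() for i in range(n)}
    # Start from a small fully connected core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `stubs` once per link, so a uniform draw is degree-weighted.
    stubs = [i for i in range(m + 1) for _ in adj[i]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend([new, t])
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def compare(n=2000, m=2, fraction=0.3, seed=1):
    """Return (largest component after random removal, after targeted removal)."""
    rng = random.Random(seed)
    adj = preferential_attachment_graph(n, m, rng)
    k = int(n * fraction)
    random_victims = rng.sample(sorted(adj), k)
    # Targeted attack: remove nodes in order from most connected to least connected.
    targeted_victims = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
    return (largest_component(adj, random_victims),
            largest_component(adj, targeted_victims))
```

Calling compare() returns the two component sizes for the same graph, which makes it easy to vary the removed fraction and see how much sooner the targeted case fragments.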
Acknowledgments
The third author thanks the Marshall Aid Commemoration Commission for their support. This material is partly based on work supported under a U.S. National Science Foundation graduate research fellowship.

References
1. J. Rosen, The Unwanted Gaze: The Destruction of Privacy in America, Vintage Books, New York, 2001.
2. I. Clarke, et al., “Freenet: A Distributed Anonymous Information Storage and Retrieval System,” in Designing Privacy Enhancing Technologies, Lecture Notes in Computer Science 2009, H. Federrath, ed., Springer-Verlag, Berlin, 2001, pp. 46-66.
3. D.L. Chaum, “Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms,” Comm. ACM, vol. 24, no. 2, 1981, pp. 84-90.
4. T. Hong, “Performance,” in Peer-to-Peer: Harnessing the Power of Disruptive Technologies, A. Oram, ed., O’Reilly and Assoc., Sebastopol, Calif., 2001, pp. 203-241.
5. D. Watts and S. Strogatz, “Collective Dynamics of ‘Small-World’ Networks,” Nature, no. 393, June 1998, pp. 440-442.
6. R. Albert, H. Jeong, and A. Barabási, “Error and Attack Tolerance of Complex Networks,” Nature, no. 406, July 2000, pp. 378-381.
7. L.A. Adamic et al., “Search in Power-Law Networks,” Physical Rev. E, vol. 64, no. 4, 2001, article no. 046135.

Ian Clarke is vice president and chief technology officer of Uprizer Inc. He received a BS with honors in computer science and artificial intelligence from the University of Edinburgh, Scotland. He is the original architect and coordinator of the Freenet Project.

Theodore W. Hong is a PhD student at Imperial College, London. His research interests include the dynamics of complex networks, multiagent systems and game theory, and grammatical inference for information extraction. He has appeared as a BBC commentator on digital rights issues. He received an AB in chemistry and physics and mathematics from Harvard, an MSc in microwaves and optoelectronics from University College, London, and was a 1995 Marshall scholar.

Scott G. Miller is a senior software engineer for Uprizer. He received a BS with honors in computer science from Indiana University. His research interests include cryptography and self-organizing systems, which converged in the Freenet Project.

Oskar Sandberg is a core programmer who has been with Freenet since 1999. He is on the board of Freenet Project, Inc. Sandberg is in the final year of his master’s degree in mathematics at the University of Stockholm, Sweden.

Readers can contact Clarke at ian@freenetproject.org.