Peer-to-Peer Networks
Yuh-Jzer Joung

What is a Peer-to-Peer Network?
A network in which all nodes cooperate with each other to provide specific functionality with minimal
No distinction between client and server.
All nodes are potential users of a service AND potential providers of a service.
From a functional perspective, it is about sharing storage, cycles, bandwidth, contents, etc.
P2P has rewritten the history of the music industry (Napster) and of communication (ICQ)
P2P has helped in the search for a cancer cure, the search for aliens, the cracking of DES, and more!
P2P is more than a file-swapping network
potential sharing of resources: files, storage, CPUs, bandwidth, information

Facts about P2P
More than 200M hosts are now connected to the Internet
ICQ now handles more than 100M accounts
Napster hosted multiple terabytes of mp3s with thousands of users online
SETI@Home now analyzes space with 4.7M hosts around the world
The Morpheus network now has more than 1400TB of data and 1M users at any time!
2003/12/31 3 2003/12/31 4
Massively parallel computation
Content delivery network
Application services platform
Network storage services, etc.
Ad hoc mobile network

Important Social Implication
When we share with others, others share with us
We can actually form up something MUCH MUCH BIGGER than we thought in the beginning!
[Figure: Arecibo Radio Telescope, SETI@home]
Napster: The P2P epic
Founded in June 1999 by Shawn Fanning, a 19-year-old Northeastern University dropout
Success due to ability to create and foster an online community
Had 640,000 users at any given moment in November 2000
More than 60 million people, including an estimated 73% of US students, had downloaded the software and songs in less than one year.
At the peak, 1 million new users per week
Universities begin to ban Napster, due to overwhelming bandwidth consumption
Battles with RIAA
RIAA sues Napster in Dec. 1999, asking $100K per song copied
Court rules Napster to shut down in July 2001
Napster files for bankruptcy in June 2002
Technically not interesting

P2P brief history
June 1999: Shawn Fanning starts Napster.
Dec. 1999: RIAA sues Napster for copyright infringement.
July 2001: Napster is shut down!
[Timeline figure labels: Napster; Morpheus, MojoNation, ……; CAN, Chord, Pastry, Tapestry, …… (Academic Research); Hypercub ……; Distributed computing ……]
Classification of P2P
Centralized Systems
central server to assist administration
Napster, Seti@Home
Pure Distributed Systems
CFS/Chord, Tapestry, Pastry

P2P Challenge: Locating and Routing
How to get to another node to find a particular piece of data, considering the following:
Dynamics: Node can join and leave any time
Scalability: There may be millions of hosts
P2P with a centralized server
[Figure: peers register with the Napster server ("I have “love.mp3”"); the server index lists "Peer 1 has …", "Peer 2 has …", "Peer 3 has …"]
Centralized design, with bottlenecks and vulnerability
Low scalability

Purpose:
Client uses idle CPU cycles on ordinary PCs for massively parallel analysis of extraterrestrial radio signals
central SETI@home server distributes data, similar to
analyses done locally by SETI@home screen saver
[Figure: Arecibo Radio Telescope, SETI@home]
P2P Structure: KaZaA, Morpheus
Has a centralized server that maintains user registrations, logs users into the system to keep statistics, provides downloads of client software, and bootstraps the peer discovery process.
Two types of peers:
Supernodes (fast CPU + high bandwidth connections)
Nodes (slower CPU and/or connections)
Supernode addresses are provided in the initial download. Supernodes also maintain searchable indexes and proxy search requests for users.
Architecture
[Figure: peers attached to supernodes. A supernode index lists "Peer 1: File 1, 2, 3", "Peer 2: File 4, 5", "Peer 3: File 6, 7, 8". Messages shown: "Search File 4", "Search Response", "Get File 4". Peer 1 holds Files 1–3, peer 2 holds Files 4–5, peer 3 holds Files 6–8.]
Supernodes act as regional index server
They communicate by broadcasting

On initial registration, the client may be provided with a list of more than one supernode.
Supernodes are “elected” by the central server – users can decline.
Supernodes can come and go so links may fail over
File transfers use the HTTP protocol and port 1214 (the
P2P – Consequences
Huge bandwidth hog
Potential for original client and/or files download to be a Trojan.
KaZaA clients come complete with a Trojan from Brilliant Digital Entertainment.
3D advertising technology + node software that can be controlled by Brilliant Digital.
Intent is to use the massed horsepower to host and distribute content belonging to other companies for a fee.
With the user’s permission of course – opt out basis (not opt in!).
Content to include advertising, music, video – anything digital.
Also have mentioned tapping unused cycles to do compute work.

P2P Survives By Legal Maneuvering
March 2001, Kazaa is founded by Niklas Zennstrom and Janus Friis in a company called Consumer Empowerment
The software is based upon their FastTrack P2P Stack, a proprietary algorithm for peer-to-peer communication
Kazaa licenses FastTrack to Morpheus and Grokster
Oct. 2001, MPAA and RIAA sue Kazaa, Morpheus and Grokster
Nov. 2001, Consumer Empowerment is sued in the Netherlands by the Dutch music publishing body, Buma/Stemra. The court orders KaZaA to take steps to prevent its users from violating copyrights or else pay a heavy fine.
Jan. 2002, Zennstrom & Friis sell Kazaa software and website to Sharman Networks, based in Vanuatu, an island in the Pacific, but operating out of Australia
Feb. 2002, Kazaa cuts off Morpheus clients from FastTrack
April 2002, Sharman Networks agrees to let Brilliant Digital bundle their own stealth P2P application called AltNet within KaZaA. This network would be remotely switched on, allowing KaZaA users to trade Brilliant Digital content throughout FastTrack
Industry Countermeasures
At any time, 3 million people are using KaZaA, sharing 500 million files (October 2002)
MPAA and RIAA are using 3 “countermeasures” to stop this:
Sue the network operators/software creators out of
Napster, Scour, Aimster, Audio Galaxy, Grokster/Morpheus/KaZaA, Replay TV…
Berman Bill Style “Self help”
Denial of service attacks against network
Insert bogus files (Universal and Overpeer)
Falsify replies and misdirect queries
Denial of Service attacks against users
Overload particular nodes

Industry Countermeasures (cont.)
Sue end users and ISPs
Danish Anti Pirat Gruppen issued fines up to $14,000 to approximately 150 KaZaA and eDonkey users for uploading copyrighted material.
In the US, civil penalties are min $750 per song, with criminal penalties as high as $250,000 and five years in prison under the NET Act.
ISPs can be sued under DMCA (the Digital Millennium Copyright Act), unless they notify offending subscribers and take down infringing material.
MPAA sent 54,000 cease and desist letters to ISPs in 2001, should be 100,000 in 2002. 90% result in ISPs taking action.
P2P Music May Be Watching You
MPAA uses services like Ranger Online to:
By collecting IP addresses of uploaders and downloaders
Identify the content users are sharing
Collect evidence for notice and takedown
By downloading file, logging time and location

Fully Distributed P2P Systems
Case Study
Freenet
Gnutella
CAN

Freenet: A Distributed Anonymous Information Storage and Retrieval System
A project led by Ian Clarke in his 4th Year Project at University of Edinburgh
Philosophy of information
One should be able to publish and obtain information on the Internet without fear of

Anonymity for both producers and consumers
Deniability for storers of information
Resistance to attempts by third parties to deny access to information
Efficient dynamic storage and routing of
Decentralization of all network functions
Protocol Detail: Retrieving Data
[Figure: a Data Request starts at one node and is forwarded hop by hop (steps 1–12) through nodes including B, C, D and F; dead ends answer Request Failed; B detects a loop and so replies failed; the Data Reply travels back along the request path]
Basically a depth-first search!

Protocol Detail: Messages
Randomly-generated 64-bit
IP address and port number
Hops-to-live
Decremented at each hop to prevent indefinite routing
Messages will still be forwarded with some probability even though hops-to-live value has reached 1, so as to prevent an attacker from tracing the requester’s location.
Depth counter
Incremented at each hop to let a replying node set hops-to-live high enough to reach the requester.
Requesters should initialize it to a small random value to obscure their location.
Neighbor selection in routing
A description of the file to be requested (e.g., its file name) is hashed to obtain a file key that is attached in the request message.
A node receiving a request for a file key checks if its own store has the file. If so, it returns the file. Otherwise, it forwards the request to the node in its routing table that has the most ‘similar’ key to the requested one.
If that node cannot successfully (and recursively!) find the requested file, then a second (again, according to key similarity) candidate from the routing table is chosen, and so on, until either the file is found, or a request failed message is returned.

Neighbor selection in routing (cont.)
Once a copy of the requested file is found, it will be sent to the requester along the search path. Each node in the path caches a copy of the file, and creates a new entry in its routing table associating the actual data source with the requested key.
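The request handling described above (check the local store, forward to the routing-table entry with the most similar key, fall back to the next candidate, cache on the way back) can be sketched roughly as follows. This is an illustration only, not Freenet's actual code: the `Node` class, integer keys, numeric distance as the "similarity" measure, and recording the next hop (rather than the actual data source) in the routing table are all simplifying assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    store: dict = field(default_factory=dict)     # key -> data held locally
    routing: dict = field(default_factory=dict)   # key -> Node believed to be a source

def request(node, key, hops_to_live, visited=None):
    """Depth-first search for `key`; returns the data, or None for 'request failed'."""
    visited = set() if visited is None else visited
    if node.name in visited or hops_to_live <= 0:
        return None                               # loop detected or hops-to-live exhausted
    visited.add(node.name)
    if key in node.store:
        return node.store[key]
    # Try routing-table entries in order of key similarity (assumed: numeric distance).
    for known_key in sorted(node.routing, key=lambda k: abs(k - key)):
        next_node = node.routing[known_key]
        data = request(next_node, key, hops_to_live - 1, visited)
        if data is not None:
            node.store[key] = data                # cache a copy on the return path
            node.routing[key] = next_node         # simplification: remember the next hop
            return data
    return None                                   # every candidate failed
```

For example, with `a.routing = {10: b, 40: c}` and the file stored under key 42 on a node reachable through `c`, `request(a, 42, 5)` tries `c` first because 40 is closer to 42 than 10 is.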
Quality of routing should improve over time
Nodes should come to specialize in locating similar keys.
Nodes should become similarly specialized in storing files having similar keys.
Popular data will be replicated and mirrored closer to requesters.
Connectivity increases as requests are processed.
Note that files with similar hashed keys are not necessarily similar in content.
A crucial node failure cannot cause a particular subject to be extinguished.

Newly inserted files are selectively placed on nodes already possessing files with similar keys
Insert basically follows the same depth-first-like search to see if there is a key collision. If so, the colliding node returns the pre-existing file as if a request had been made. If there is no collision, then the file will be placed on each node in the search path.
The nodes will also update their routing tables, and associate the inserter as the data source with the new key.
For security reasons, any node along the way can unilaterally decide to change the insert message and claim itself or another arbitrary node as the data source.
Newly inserted files are selectively placed on nodes already possessing files with similar keys.
New nodes can use inserts as a supplementary means of announcing their existence to the rest of the network.
An attempt to supply spurious files will likely simply spread the real file further, as the original file is propagated back on collision.

Node storage is managed as an LRU (Least Recently Used) cache.
No file is eternal! A file could disappear from the network if it has not been requested for a long period of time.
Inserted files are encrypted so that node operators can ‘claim’ innocence of possessing some controversial files.
The requester who knows the actual file name can use that information to decrypt the encrypted file.
Vulnerable to dictionary attack
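The LRU policy mentioned above explains why an unrequested file eventually vanishes. A minimal sketch of such a store, assuming a fixed capacity counted in files (the class name `LRUStore` and the capacity unit are illustrative choices, not taken from Freenet):

```python
from collections import OrderedDict

class LRUStore:
    """Node-local file store that evicts the least recently used file when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()            # oldest entry first

    def get(self, key):
        if key not in self.files:
            return None
        self.files.move_to_end(key)           # a request keeps the file alive
        return self.files[key]

    def put(self, key, data):
        self.files[key] = data
        self.files.move_to_end(key)
        while len(self.files) > self.capacity:
            self.files.popitem(last=False)    # drop the least recently used file
```

With a capacity of two, storing `a` and `b`, requesting `a`, and then storing `c` evicts `b`, the file that went longest without a request.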
Hierarchical name system
Files are identified by the hash of their filenames
Cannot have multiple files with the same name
Global single-level namespace is not desirable, since malicious users can engage in “key-squatting”
Two-level namespace
Each user has their own directory

Freenet File Export
Consider exporting file with name “My life.mp3”
Compute a public/private key pair from name using a
File is encrypted with the hash of the public key
Goal is not to protect data – the file contents should be visible to anyone who knows the original keyword
Goal is to protect site operators – if a file is stored on your system, you have no way of decrypting its contents
File is signed with the private key
Integrity check (though not a very strong one)
Freenet Directories
Two-level directories
Users can create a signed-subspace
Akin to creating a top-level directory per user
Subspace creation involves generating a public/private key pair for the user
The user’s public key is hashed, XORed and then rehashed with the file public key to yield the file encryption key
For retrieval, you need to know the user’s public key and the file’s original name

Node connecting to network must obtain existing node’s address through out-of-band means
Once connected, a new node message is propagated to randomly selected, connected nodes so existing nodes learn of new node’s existence
Advantages
Totally decentralized architecture
robust and scalable
Does not always guarantee that a file is found, even if the file is in the network

Fully Distributed P2P Networks are essentially an overlay over the Internet
Neighbors in the overlay may not necessarily
Case Study: GNUTELLA
Originally conceived of by Justin Frankel, 21 year old founder of Nullsoft
March 2000, Nullsoft posts Gnutella to the web
A day later AOL removes Gnutella at the behest of Time Warner
there are multiple open source implementations
the Gnutella protocol has been widely analyzed

Peer-to-Peer Storage Space Sharing System
Gnutella was developed in early 2000 by Justin Frankel’s Nullsoft. Nullsoft had been acquired by AOL, which has now stopped this
But this application had already been distributed as an open source application, and so different versions still exist.
Gnutella has become a protocol now.

Ability to operate in a dynamic environment
Performance and Scalability
Reliability
message broadcasting for node discovery and search requests (Call-and-Response protocol)
forming of overlay network; connecting: join the “several known hosts”
user data transfer: store and forward using

A new node (called a servent in Gnutella) joins a system by connecting to a known host.
Communication between servents:
A set of descriptors used for communicating data between servents
A set of rules governing the inter-servent exchange of descriptors
Ping: Used to actively discover hosts on the network
Pong: The response to a Ping
Query: The primary mechanism for searching the distributed network
QueryHit: The response to a Query
Push: A mechanism that allows a firewalled servent to contribute file-based data to the network

Node Join/Discovery
[Figure: G-Nodes exchanging Ping messages]
A new node connects to one of several known hosts.
flooding of PING/PONG messages; broadcasting range limited by TTL counter.
short time memory of messages already seen; prevents re-broadcasting; GUIDs to distinguish msg
To cope with the dynamic environment, a node periodically PINGs its neighbors to discover other
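The two mechanisms listed above, a TTL that bounds how far a broadcast travels and a short memory of GUIDs that stops re-broadcasting, can be illustrated with a small simulation. This is a sketch over an in-memory adjacency list, not an implementation of the Gnutella wire protocol; the function name `flood` and the graph representation are assumptions.

```python
import uuid
from collections import deque

def flood(graph, origin, ttl):
    """Flood one message from `origin`; returns (nodes reached, messages sent)."""
    guid = uuid.uuid4().hex                      # GUID that identifies this message
    seen = {origin: {guid}}                      # per-node memory of GUIDs already seen
    queue = deque([(origin, None, ttl)])         # (node, node it came from, remaining TTL)
    messages = 0
    while queue:
        node, sender, remaining = queue.popleft()
        if remaining == 0:
            continue                             # TTL exhausted: do not forward
        for neighbor in graph[node]:
            if neighbor == sender:
                continue                         # never send back to the sender
            messages += 1
            if guid in seen.setdefault(neighbor, set()):
                continue                         # duplicate: dropped, not re-broadcast
            seen[neighbor].add(guid)
            queue.append((neighbor, node, remaining - 1))
    reached = {n for n, guids in seen.items() if guid in guids} - {origin}
    return reached, messages
```

On a five-node graph where E hangs off D, which is two hops from A, a flood from A with TTL 2 reaches B, C and D but not E; raising the TTL to 3 reaches E as well.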
[Figure: search among G-Nodes, Node A querying Node B]
1) Node A asks Node B
2) B keeps a record that A initiated the request
3) B forwards the request to its neighbors.
4) These return any matching info.
5) B looks up source of
6) B returns matching info
7) A may initiate
search: Query/Query-Response (flooding/breadth-first search!)
download: GET/PUSH. (direct transmission)

unstable/loose connectivity of the servents
performance management difficult
scalability: e.g. TTL=10, every node broadcasts msg to six others; problem in huge networks
low TTL, low search horizon
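To get a feel for the scalability concern (TTL = 10, six neighbors per node), one can bound the number of messages a single query generates. The sketch below assumes a loop-free overlay in which the originator sends to `degree` neighbors and every later node forwards to its `degree - 1` other neighbors; that reading of "broadcasts to six others" is an assumption, and real overlays contain cycles, so this is an upper bound rather than a measurement.

```python
def flood_upper_bound(ttl, degree):
    """Upper bound on messages generated by one query in a loop-free overlay."""
    first_hop = degree
    later_fanout = degree - 1       # a node does not send the query back to its sender
    return sum(first_hop * later_fanout ** (hop - 1) for hop in range(1, ttl + 1))
```

Under these assumptions `flood_upper_bound(10, 6)` is 14,648,436 messages for a single query, while lowering the TTL to 4 gives 936, which is the trade-off between traffic and search horizon.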
Free riding on Gnutella
The Top   Share       As percent of the whole
1%        1,142,645   37%
5%        2,182,087   70%
10%       2,692,082   87%
20%       3,037,232   98%
25%       3,082,572   99%
Source: E. Adar, B. Huberman, Free riding on Gnutella, First Monday, Vol. 5-10, October 2, 2000.

Map of the Gnutella Network
Source: http://dss.clip2.com
Source: http://www.limewire.com
Gnutella Host Count - 2003
Gnutella Host Count - 2001
Gnutella Network Analysis
Reference: Peer-to-Peer Architecture Case Study: Gnutella Network, Matei Ripeanu, in proceedings of IEEE 1st International Conference on Peer-to-peer Computing.
Research conducted in 2000-2001.

More than 40,000 hosts were found.
The number of connected components is relatively small
The largest connected component includes more than 95% of the active nodes.
The 2nd largest connected component usually has fewer than 10 nodes.
The dynamic graph structure
40% of the nodes leave the network in less than 4 hrs
25% are alive for more than 24 hrs.
Node Connectivity (Source: [Ripeanu2001])
Connectivity distribution (Source: [Ripeanu2001])
Power-Law Networks
In a power-law network, the fraction of nodes with L links is proportional to $L^{-k}$, where k is a network dependent constant.
most nodes have few links and a tiny number of hubs have a large number of links.
Power-law networks are generally highly stable and resilient, yet prone to occasional
extremely robust when facing random node failures, but vulnerable to well-planned attacks.
The power law distribution appears in Gnutella networks, in Internet routers, in the Web, in call graphs, ecosystems, as well as in sociology.
[Figure: power-law graph of nodes found]
AT&T Call Graph
[Figure: # of telephone numbers from which calls were made, plotted against # of telephone numbers called. Aiello et al. STOC ‘00]

Gnutella and the bandwidth barrier
queries broadcast to every node within radius ttl
⇒ as network grows, encounter a bandwidth barrier (dial up modems cannot keep up with query traffic, fragmenting the network)
Clip 2 report
Gnutella: To the Bandwidth Barrier and Beyond
Content-Addressable Network (CAN)
A lookup protocol (indexing system) that maps a desired key to a value
insert (key, value)
value = retrieve (key)
Example: key can be a hashed value of a file name, and value can be the IP address of the host storing the file.
A storage protocol layered on top of the lookup protocol then takes care of storing, replicating, caching, retrieving, and authenticating the files.
Uses virtual d-dimensional coordinate space on a d-torus to store (key,value) pairs, and a uniform hash function to map keys to points in the d-torus
At any instant the entire coordinate space is partitioned among all the peers in the system; each peer “owns” one individual, distinct zone
(key, value) pair of a file is stored at the peer that owns the zone containing the corresponding point of the key.
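The insert/retrieve interface and the rule "store the pair at the peer whose zone contains the key's point" can be sketched as follows. Everything concrete here is an assumption made for illustration: a 2-D space with 4-bit coordinates, SHA-1 bytes reduced modulo the side length as the hash, four equal zones with made-up peer names, and a single in-process table standing in for the peers. Routing between peers is deliberately left out.

```python
import hashlib

SIDE = 16  # 4-bit coordinates per axis (assumption for this sketch)

def key_to_point(name, d=2):
    """Hash a file name to a point in the d-dimensional coordinate space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return tuple(digest[i] % SIDE for i in range(d))

def owner(zones, point):
    """Peer whose half-open zone [low, high) contains the point."""
    for peer, (low, high) in zones.items():
        if all(l <= p < h for l, p, h in zip(low, point, high)):
            return peer
    raise LookupError("zones do not cover the space")

# Hypothetical partition of the space into four equal zones.
ZONES = {
    "sw": ((0, 0), (8, 8)), "se": ((8, 0), (16, 8)),
    "nw": ((0, 8), (8, 16)), "ne": ((8, 8), (16, 16)),
}
STORE = {peer: {} for peer in ZONES}   # per-peer (key, value) tables

def insert(name, value):
    point = key_to_point(name)
    STORE[owner(ZONES, point)][point] = value

def retrieve(name):
    point = key_to_point(name)
    return STORE[owner(ZONES, point)].get(point)
```

Because insert and retrieve apply the same hash, both arrive at the same zone owner; two names that hash to the same point would overwrite each other in this sketch.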
Example: 2-D CAN
When host 184.108.40.206 wishes to insert a file “love.mp3” into the system, it hashes the name to obtain a key, say (0100,1100).
Then the entry ( (0100,1100), 184.108.40.206 ) is added to node y’s index: the index information for file “love.mp3”.
To find file “love.mp3”, one uses the same hash function to find the key (0100,1100); then, routes to the peer handling the point/key (0100,1100), which is y, and then obtains the IP 184.108.40.206 from y.
So, inserting and finding a file becomes a routing problem: how to find a peer handling a given point in the coordinate space?
[Figure: 2-D coordinate space with axis ticks 0000, 0011, 0111, 1111. x handles zone [(0000,0000), (0011,0111)); w handles zone [(0111,0000), (1111,0111)); y handles zone [(0000,0111), (0111,1111)).]

Routing in CAN
Intuitively – following the straight path through the Cartesian space from source to destination
Node maintains coordinate routing table that holds IP addresses and zones’ coordinates of its neighbors in the space
Two nodes are neighbors if their coordinate spans overlap along d−1 dimensions and abut along one
CAN message contains destination coordinate.
Node greedily forwards it to the neighbor with coordinates closest to the destination coordinate
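Greedy forwarding toward the destination coordinate can be sketched on the simplest possible layout: a torus cut into equal unit zones, so that each zone is identified by its integer coordinates and has 2d neighbors. Real CAN zones have unequal sizes and real nodes only know their own neighbor table; the grid layout and the function names below are assumptions made to keep the example short.

```python
def torus_distance(a, b, side):
    """Sum over dimensions of the shorter way around a ring of length `side`."""
    return sum(min(abs(x - y), side - abs(x - y)) for x, y in zip(a, b))

def neighbors(zone, side):
    """Zones that abut `zone` along exactly one dimension (2d of them)."""
    result = []
    for dim in range(len(zone)):
        for step in (-1, 1):
            n = list(zone)
            n[dim] = (n[dim] + step) % side
            result.append(tuple(n))
    return result

def route(source, destination, side):
    """Greedy routing: always forward to the neighbor closest to the destination."""
    path = [source]
    while path[-1] != destination:
        path.append(min(neighbors(path[-1], side),
                        key=lambda z: torus_distance(z, destination, side)))
    return path
```

On an 8 × 8 torus, `route((0, 0), (7, 7), 8)` takes only two hops because the space wraps around, whereas `route((0, 0), (2, 3), 8)` takes five.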
Routing in CAN
For the d-dimensional space equally partitioned into n nodes the average routing path is $(d/4)\,n^{1/d}$
Individual nodes maintain 2d neighbors
The path length grows proportionally to $O(n^{1/d})$
Many different routes between two points
[Figure: grid of zones; neighbors of node 12: 8, 10, 11, 13, 16]
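The average path length quoted above can be checked by brute force on a small torus. The sketch assumes n equal zones arranged as a grid with `side` zones per dimension (so n = side^d) and counts one hop per zone boundary crossed; by symmetry it is enough to average the hop count from one fixed zone to all zones.

```python
from itertools import product

def formula(n, d):
    """Average routing path length (d/4) * n**(1/d) for n equal zones in d dimensions."""
    return (d / 4) * n ** (1 / d)

def measured(side, d):
    """Average torus hop count from the origin zone to every zone of a side**d grid."""
    total = 0
    for zone in product(range(side), repeat=d):
        total += sum(min(c, side - c) for c in zone)   # shorter way around each ring
    return total / side ** d
```

For an 8 × 8 grid (n = 64, d = 2) both `measured(8, 2)` and `formula(64, 2)` give 4 hops; the agreement is exact only when the side length is even.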
operations of CAN
Inserting, updating, deleting of (key,value) pairs
Retrieving value associated with a given key
Adding new nodes to CAN
Handling departing nodes
Dealing with node failures

New node is allocated its own portion of the coordinate space in three steps:
Find a node already in the CAN – look up the CAN domain name in DNS
Pick zone to join to and route request to its owner using CAN mechanisms
Split the zone between old and new node
The neighbors of the split zone must be notified so the routing can include the new node
Finding a zone
The new peer X finds a node E to join
X then finds a peer whose zone will be split.
X randomly chooses a joining point J for load balancing
X sends a JOIN request message destined for point J (handled by P) via E using CAN routing mechanisms
[Figure: grid of zones 1–19, with P owning zone 9]

Splitting a zone
P splits its zone in half and assigns one half to X (J), assuming certain ordering of the dimensions, i.e. first X then Y
Transfer (key, value) pairs from the half of the zone to the new
[Figure: the same grid after the split, with zone 9 (P) halved and the new half labelled 21]
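The split step can be sketched as two small functions: one that halves a zone along the next dimension in a fixed order (first X, then Y, and so on), and one that hands over the stored pairs that now fall in the new node's half. Zones are represented as half-open boxes `(low, high)`; the `depth` argument (how many times the zone has been split before) and the function names are assumptions of this sketch.

```python
def split_zone(low, high, depth):
    """Halve the zone [low, high) along dimension depth % d; returns (kept, given)."""
    dim = depth % len(low)
    mid = (low[dim] + high[dim]) / 2
    kept_high = list(high)
    kept_high[dim] = mid
    given_low = list(low)
    given_low[dim] = mid
    return (low, tuple(kept_high)), (tuple(given_low), high)

def hand_over(pairs, given):
    """Split stored {point: value} pairs between the old owner and the new node."""
    low, high = given

    def inside(point):
        return all(l <= c < h for l, c, h in zip(low, point, high))

    moved = {p: v for p, v in pairs.items() if inside(p)}
    stays = {p: v for p, v in pairs.items() if not inside(p)}
    return stays, moved
```

Splitting the zone `((0, 0), (8, 8))` at depth 0 cuts along X at 4; a pair stored at point (6, 2) moves to the new node while one at (1, 1) stays.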
Joining the Routing
the neighbors of the split zone must be notified so that routing can include the new node.
O sends its routing table to N (IPs and zones of O’s neighbors)
Having O’s routing table, N may figure out its routing table.
O re-computes its routing table.
Both O’s and N’s neighbors must be informed of this reallocation of space.
All of their neighbors update their own routing table.

Joining the Routing: Illustration
[Figure: grid of zones 1–19 before the join, and the same grid with the new zone 21 next to zone 9 after the join]
Node   Before 21 joins   After 21 joins
9      4,8,10,13         4,8,21,13
4      3,5,9             3,5,9,21
8      2,3,7,9,12        2,3,7,9,12
10     5,9,14            5,21,14
13     9,12,14,17        9,12,14,17,21
Node departure and recovery
Normal procedure – explicit hand over of (key,value) database to one of the neighbors
Node failure – immediate takeover procedure:
Failure detected as a lack of update messages
Each neighbor starts a timer in proportion to its own zone size
After the timer expires the node extends its own zone to contain the failed neighbor’s zone and sends a TAKEOVER message to all of the failed node’s neighbors
On receipt of a TAKEOVER message a node cancels its timer if the sender’s zone size is smaller than its own. Otherwise it sends its own TAKEOVER message.

CAN state may become inconsistent if multiple adjacent nodes fail simultaneously
In such cases perform an expanding ring search for any nodes residing beyond the failure region (?)
If it fails initiate repair mechanism (?)
If a node holds more than one zone initiate the background zone-reassignment algorithm
Reassignment (cont.)
Case 1: when zone 2 is to be merged
search for 2’s sibling: node 3
node 3 takes over zone 2.
Case 2: when zone 9 is to be merged
Step 1: Search for the sibling of 9, but fail
Step 2: Use DFS until two sibling leaves are reached
Step 3: Merge zone 10 with zone 11, taken over by node 11
Step 4: node 11 now takes over zone x, DONE !!
[Figure: zone layouts and the corresponding binary partition trees for both cases]
MIT Chord
A Scalable Peer-to-peer Lookup Protocol for Internet Applications

Chord provides support for just one operation: given a key, it maps the key onto a node.
Applications can be easily implemented on top
Cooperative File System (CFS)

Chord-based distributed storage
[Figure: peers, each running a Block Store layer on top of a CHORD layer]

A CFS File Structure example
The root-block is identified by a public-key and signed by corresponding private key
Other blocks are identified by cryptographic hashes of their contents
Efficient: O(Log N) messages per lookup
N is the total number of servers
Scalable: O(Log N) state per node
Robust: survives massive changes in membership

Hashing is generally used to distribute objects evenly into a set of servers
E.g., the linear congruential function h(x) = ax+b (mod p)
SHA-1
When the number of servers changes (p in the above case), then almost every item would be hashed to a new location
Cached objects become useless in each server when a server is removed or introduced to the system.
[Figure: objects hashed mod 5 into buckets 0–4; add two new buckets (now mod 7)]
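The relocation problem with plain modular hashing is easy to reproduce. The sketch below uses the identity hash (a = 1, b = 0), which is an assumption chosen so the result can be checked by hand, and counts how many keys land in a different bucket when the number of servers grows from 5 to 7.

```python
def bucket(key, servers):
    """Server index for a key under h(x) = x mod p, with p = number of servers."""
    return key % servers

def moved_fraction(keys, before, after):
    """Fraction of keys whose bucket changes when the server count changes."""
    moved = sum(1 for k in keys if bucket(k, before) != bucket(k, after))
    return moved / len(keys)
```

For the keys 0 to 34, only 0 through 4 keep their bucket, so `moved_fraction(range(35), 5, 7)` is 30/35, roughly 86% of all cached objects.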
Load is balanced
Relocation is minimum
When an Nth server joins/leaves the system, with high probability only an O(1/N) fraction of the data objects need to be relocated

A possible implementation
Objects and servers are first mapped (hashed) to points in the same interval
Then objects are actually placed into the servers that are closest to them w.r.t. the mapped points in the interval.
E.g., D001→S0, D012→S1, D313→S3
[Figure: objects 001, 012, 103, … and servers mapped onto some interval]
server 4 joins
[Figure: objects and servers on some interval, with server 4 added]
Only D103 needs to be moved from S3 to S4. The rest remains unchanged.

server 3 leaves
[Figure: objects and servers on some interval, with server 3 removed]
Only D313 and D044 need to be moved from S3 to S4.
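A small ring makes the "only neighboring objects move" property concrete. The sketch below places each object on the first server clockwise from its point, which is the successor rule Chord uses rather than the "closest server" wording above; SHA-1 reduced to 32 bits, the `Ring` class, and the server and object names are all illustrative assumptions.

```python
import bisect
import hashlib

def point(name, bits=32):
    """Map a name to a point on a ring of size 2**bits using SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

class Ring:
    def __init__(self, servers):
        self.points = sorted((point(s), s) for s in servers)

    def lookup(self, obj):
        """Server at the first point clockwise from the object's point."""
        i = bisect.bisect_left(self.points, (point(obj), ""))
        return self.points[i % len(self.points)][1]   # wrap around past the top

    def add(self, server):
        bisect.insort(self.points, (point(server), server))
```

After `ring.add("S4")`, every object whose server changed is now on S4 and nothing moves between the old servers, in contrast to the modular scheme where almost everything is reshuffled.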
Consistent Hashing in Chord
Node’s ID = SHA-1 (IP address)
Key’s ID = SHA-1 (object’s key/name)
Chord views the IDs as occupying a circular identifier space
Keys are placed at the node whose IDs are the closest to the (IDs of) the keys in the clockwise direction
successor(k): the first node clockwise from k.
Place object k to successor(k).

An ID Ring of length $2^6 - 1$
[Figure: circular ID space with nodes N1, N14, N21, N48, N56 and keys k10, k24, k30]
Simple Lookup
[Figure: lookup(k54) passed from successor to successor around the ring N1, N8, N14, …, N32, N38, N48, N56]
Lookup correct if successors are correct
Average of n/2 message exchanges

Scalable Lookup
[Figure: fingers of N8 at +4, +8, +32, with finger table entries N8+16 → N32 and N8+32 → N42]
The ith entry in the finger table points to successor($(n + 2^{i-1}) \bmod 2^6$)
Scalable Lookup
Finger table at N8
N8+2    N14
N8+4    N14
N8+8    N21
N8+16   N32
N8+32   N42
[Figure: lookup(k54) issued at N8]

Scalable Lookup
Finger table at N42
N42+2    N48
N42+4    N48
N42+8    N51
N42+16   N1
N42+32   N14
[Figure: lookup(k54) forwarded from N8 to N42]
Look in local finger table for the largest n s.t. my_id < n < key_id
If n exists, call n.lookup(key_id), else return successor(my_id)
Scalable Lookup
Finger table at N51
N51+4    N56
N51+8    N1
N51+16   N8
N51+32   N21
[Figure: lookup(k54) reaching N56]
Each node can forward a query at least halfway along the remaining distance between the node and the target identifier.
Lookup takes O(log N) steps.

Node joins
When a node i joins the system from any existing node j:
Node j finds successor(i) for i, say k
i sets its successor to k, and informs k to set its predecessor to i.
k’s old predecessor learns the existence of i by running, periodically, a stabilization algorithm to check if k’s predecessor is still it.
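The finger tables and the lookup rule from the preceding slides can be reproduced in a few lines. The node set below is read off the ring figures and finger tables in this deck (6-bit identifiers, nodes 1, 8, 14, 21, 32, 38, 42, 48, 51, 56); the global `NODES` list stands in for what would really be distributed state, so this is a simulation of the rule, not a Chord implementation.

```python
M = 6                                            # identifier bits, ring size 2**6 = 64
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])

def successor(ident):
    """First node clockwise from ident (inclusive), wrapping around the ring."""
    ident %= 2 ** M
    return next((n for n in NODES if n >= ident), NODES[0])

def finger_table(n):
    """Entry i (1-based) points to successor((n + 2**(i-1)) mod 2**M)."""
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]

def between(x, a, b):
    """True if x lies strictly inside the clockwise interval (a, b)."""
    return (a < x < b) if a < b else (x > a or x < b)

def lookup(start, key):
    """Return (node responsible for key, nodes that handled the query)."""
    node, path = start, [start]
    while True:
        candidates = [f for f in finger_table(node) if between(f, node, key)]
        if not candidates:
            return successor(node + 1), path     # no closer finger: answer is my successor
        # "largest n" between me and the key = the finger farthest clockwise from me
        node = max(candidates, key=lambda f: (f - node) % 2 ** M)
        path.append(node)
```

`lookup(8, 54)` visits N8, N42 and N51 and returns N56, matching the three finger-table slides: each hop covers at least half of the remaining distance to the key.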
Node joins (cont.)
[Figure: ring with finger table entries N8+4 → N14, N8+16 → N32, N8+32 → N42 and key k30; N1 via N8]
aggressive mechanisms require too many messages and updates

Node Fails
Can be handled simply as the inverse of node joins; i.e., by running the stabilization algorithm.
Handling Failures
Use successor list
Each node knows r immediate successors
After failure, will know first live successor
Correct successors guarantee correct lookups
Guarantee is with some probability
Can choose r to make probability of lookup failure arbitrarily small

NOT that simple (compared to CAN)
Member joining is complicated
aggressive mechanisms require too many messages and updates
no analysis of convergence in lazy finger mechanism
Key management mechanism mixed between layers
upper layer does insertion and handles node failures
Chord transfers keys when a node joins (no leave mechanism!)
Routing table grows with # of members in group
Worst case lookup can be slow
File guaranteed to be found in O(log(N)) steps
Routing table size O(log(N))
Robust, handles large number of concurrent joins and leaves
Performance: routing in the overlay network can be more expensive than in the underlying network
No correlation between node ids and their locality; a query can repeatedly jump from Taiwan to America, though both the initiator and the node that stores the item are in Taiwan!
Partial solution: Weight neighbor nodes by Round Trip Time (RTT)
when routing, choose neighbor who is closer to destination with lowest RTT from me » reduces path latency

Related Work: OceanStore
A Global Scale Persistent Storage Utility Infrastructure

Ubiquitous Computing
Computing everywhere,
Connectivity everywhere
But, are data just out there?
OceanStore: An architecture for global-scale persistent storage

Assume $10^{10}$ people in world, say 10,000 files/person (very conservative?), then $10^{14}$ files in total!
If 1MB/file, then $10^{20}$ bytes of storage is needed
Surely, this must be maintained cooperatively by many ISPs.
Persistent
Geographic independence for availability, durability, and freedom to adapt to circumstances
Security
Encryption for privacy, signatures for authenticity, and Byzantine commitment for integrity
Robust
Redundancy with continuous repair and redistribution for long-term durability
Management
Automatic optimization, diagnosis and repair

Untrusted Infrastructure:
The OceanStore is comprised of untrusted components
Only ciphertext within the infrastructure
Nomadic Data: data are allowed to flow freely
Promiscuous Caching: Data may be cached anywhere, anytime
continuous introspective monitoring is used to discover tacit relationships between objects.
Optimistic Concurrency via Conflict Resolution
Users pay monthly fee to access their data
OceanStore system
The core of the system is composed of a multitude of highly connected “pools”, among which data is allowed to “flow” freely. Clients connect to one or more pools,

Unique, location independent identifiers:
Every version of every unique entity has a permanent, Globally Unique ID (GUID)
160-bit SHA-1 hashes
$2^{80}$ names before name collision
Users map from names to GUIDs via hierarchy of OceanStore objects (à la SDSI)
Requires set of “root keys” to be acquired by user
Location and Routing
a two-tiered approach
Attenuated bloom filters
fast, probabilistic
Wide-scale distributed data location
slower, reliable, hierarchical

Bloom Filter (BF)
A probabilistic algorithm to quickly test membership in a large set using multiple hash functions into a single array of bits.
[Figure: a key is first checked against the BF; "no" means the key is not in the database, "yes" means check the DB to see if the key is there]
Main Idea:
h(x) ≠ h(y) ⇒ x ≠ y
h(x) = h(y) does not imply x = y
Bloom Filter Design
Use an m bits vector, initialized to 0’s, for the BF.
Larger m => fewer filter errors.
Choose h > 0 hash functions: f1(), f2(), …, fh().
When key k inserted into DB, set bits f1(k), f2(k), …, and fh(k) in the BF to 1.
f1(k), f2(k), …, fh(k) is the signature of key k.

m = 11 (normally, m would be much much larger).
h = 2 (2 hash functions).
f1(k) = k mod m.
f2(k) = (2k) mod m.
K=15 inserted, then K=17 inserted:
bit   10  9  8  7  6  5  4  3  2  1  0
BF     0  0  1  0  1  0  1  0  0  1  0
Example: DB has k = 15 and k = 17.
Bit 10 9 8 7 6 5 4 3 2 1 0
BF   0 0 1 0 1 0 1 0 0 1 0
Search for k.
Bit f1(k) = 0 or bit f2(k) = 0 => k not in DB.
Bit f1(k) = 1 and bit f2(k) = 1 => k may be in DB.
k = 25 => not in DB (f1(25) = 3, and bit 3 is 0).
k = 6 => filter error (bits f1(6) = 6 and f2(6) = 1 are both set, yet 6 was never inserted).
Bloom Filter Design
Choose m (filter size in bits).
Use as much memory as is available.
Pick h (number of hash functions).
h too small => probability of different keys having the same signature is high.
h too large => filter becomes filled with ones too soon.
Select the h hash functions.
Hash functions should be relatively independent.
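The worked example with m = 11, f1(k) = k mod m and f2(k) = (2k) mod m can be reproduced with a few lines of Python. This is a minimal sketch for illustration; the class and method names are not from the slides, and a real filter would use a much larger m and better hash functions.

```python
class BloomFilter:
    """Tiny Bloom filter: an m-bit vector plus a list of hash functions."""

    def __init__(self, m, hash_functions):
        self.m = m
        self.bits = [0] * m              # all bits start at 0
        self.hash_functions = hash_functions

    def signature(self, key):
        """Bit positions f1(k), ..., fh(k) for this key."""
        return [f(key) % self.m for f in self.hash_functions]

    def insert(self, key):
        for pos in self.signature(key):
            self.bits[pos] = 1

    def may_contain(self, key):
        """False means definitely absent; True means 'check the DB'."""
        return all(self.bits[pos] == 1 for pos in self.signature(key))


m = 11
bf = BloomFilter(m, [lambda k: k % m, lambda k: (2 * k) % m])
bf.insert(15)   # sets bits 4 and 8
bf.insert(17)   # sets bits 6 and 1

for key in (15, 17, 25, 6):
    print(key, bf.may_contain(key))
```

Key 25 is rejected because bit 3 is still 0, while key 6 passes the filter even though it was never inserted, which is the filter error from the slide.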
Attenuated Bloom filters
An attenuated Bloom filter of depth D can be viewed as an array of D normal Bloom filters.
The first Bloom filter is a record of the objects contained locally on the current node.
The ith Bloom filter is the union of all of the Bloom filters for all of the nodes a distance i through any path from the current node.
Query is routed along the edges whose filters
indicate the presence of the object at the
P2P probabilistic query process in
[Figure: a lookup issued at n1 for a key with signature (0,1,3); nodes n1, n2, n3 are shown with Bloom filter bit vectors 11010, 11100, 11011, 00011, 10101, 11100, and one level is labeled "1st BF".]
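One way to picture the routing decision is the sketch below. It assumes each outgoing edge carries its own list of Bloom filters (level 0 for the neighbor itself, level i for nodes i hops further along that edge) and that the query prefers the edge whose match occurs at the lowest level. Both the per-edge layout and the tie-breaking rule are assumptions for illustration, as are the hash functions; filters are modeled as sets of bit positions.

```python
def signature(key, m=5):
    """Bit positions for a key; hypothetical hash functions, for illustration only."""
    return {key % m, (2 * key) % m}


def matches(bloom_bits, key):
    """A filter matches when every signature bit of the key is set."""
    return signature(key) <= bloom_bits


def first_matching_level(attenuated, key):
    """Smallest level whose Bloom filter matches the key, or None."""
    for level, bloom_bits in enumerate(attenuated):
        if matches(bloom_bits, key):
            return level
    return None


def choose_edge(edge_filters, key):
    """Pick the neighbor whose attenuated filter matches at the lowest level."""
    best = None
    for neighbor, attenuated in edge_filters.items():
        level = first_matching_level(attenuated, key)
        if level is not None and (best is None or level < best[1]):
            best = (neighbor, level)
    return best


# Filters seen from n1: one attenuated filter (depth 2) per outgoing edge.
edges_at_n1 = {
    "n2": [{0, 1}, {1, 2, 4}],   # key 7 (bits 2 and 4) matches at level 1
    "n3": [{2, 4}, {0}],         # key 7 matches at level 0
}
print(choose_edge(edges_at_n1, 7))
```

A result of None means no edge advertises the object, in which case the slower, reliable location mechanism would have to take over.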
Routing and Location
A prototype of a decentralized, fault-tolerant, adaptive overlay infrastructure
Network substrate of OceanStore
Routing: suffix-based hypercube
Similar to Plaxton, Rajaraman, Richa (SPAA97)
Decentralized location:
Virtual hierarchy per object with cached location
Object location:
Map GUIDs to root node IDs
Each object has its own hierarchy rooted at Root
Root responsible for storing object’s location
Suffix routing from A to B
At the hth hop, arrive at the nearest node hop(h) such that:
hop(h) shares a suffix with B of length h digits
Example: 5324 routes to 0729 via
5324 → 7349 → 3429 → 4729 → 0729
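The hop rule can be sketched as follows. This is a simplified illustration, not the Tapestry implementation: every node is assumed to know the whole node list, and the order of that list stands in for network distance when picking the "nearest" candidate.

```python
def shared_suffix_len(a, b):
    """Number of trailing digits that a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def suffix_route(source, dest, nodes):
    """Route from source to dest, matching one more suffix digit per hop.

    Assumption: `nodes` is ordered by proximity, so the first candidate
    plays the role of the nearest node.
    """
    path = [source]
    current = source
    for h in range(1, len(dest) + 1):
        if shared_suffix_len(current, dest) >= h:
            continue  # current node already matches h digits
        candidates = [n for n in nodes if shared_suffix_len(n, dest) >= h]
        if not candidates:
            raise LookupError(f"no node shares {h} suffix digits with {dest}")
        current = candidates[0]
        path.append(current)
    return path


nodes = ["7349", "3429", "4729", "0729"]
print(" -> ".join(suffix_route("5324", "0729", nodes)))
```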
P2P Basic Plaxton Mesh
Incremental suffix-based routing
[Figure: mesh of nodes with IDs 0x79FE, 0x23FE, 0x993E, 0x43FE, 0xF990, 0x035E, 0x04FE, 0x13FE, 0x555E, 0xABFE, 0x9990, 0x73FF, 0x239E, 0x1290; edges are labeled with routing levels 1 to 4.]
Neighbor table at node 3120 (base 4); x = any digit
Next digit        0     1     2     3
Shared suffix 120 0120  1120  2120  3120
Shared suffix 20  x020  x120  x220  x320
Shared suffix 0   xx00  xx10  xx20  xx30
No shared suffix  xxx0  xxx1  xxx2  xxx3
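The pattern of the neighbor table for node 3120 in base 4 can be generated mechanically. The sketch below only produces the entry patterns (with x as a wildcard digit), not actual neighbor nodes; the function name is hypothetical and bases above 10 are not handled.

```python
def neighbor_table(node_id, base):
    """Return rows of entry patterns, from level 1 (no shared suffix) upward.

    Row `level` lists, for each digit d, the pattern of IDs that share the
    last (level - 1) digits with node_id and have d as the next digit.
    """
    digits = len(node_id)
    table = []
    for level in range(1, digits + 1):
        shared = node_id[digits - level + 1:]   # suffix of length level - 1
        wildcard = "x" * (digits - level)
        table.append([wildcard + str(d) + shared for d in range(base)])
    return table


for row in reversed(neighbor_table("3120", 4)):
    print("  ".join(row))
```

Printed top-down, the rows match the table on the slide, with the node's own ID 3120 appearing in the most specific row.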
Neighbor table will have holes
Must be able to find unique root for every object
Tapestry’s solution: try next highest.
A little bit complicated
Try to think about the problem by yourself
[Figure: P2P products and companies placed between Research and Commercial: .NET, Groove, Jabber, Jibe, Open Cola, Endeavors, Folders, Entropia, DataSynapse. Source: Burton Group]
P2P vs. Copyright
P2P disrupts traditional distribution
Notions of copyright and intellectual property need to be put in a digital-age context (and new business models will need to be developed and implemented)
Content Delivery Network
Increasing Internet traffic
Congestion in the Internet.
Web Servers sometimes become overloaded due to too
many people trying to access their content over a
short period of time.
Internet 911 Attack
The 8-Second Rule
If your webpage hasn't loaded within 8 seconds, chances
are your viewers are history.
Complexity and Cost of managing and operating a
Increasing demand of rich content delivery
What is a Content Delivery Network?
Network of content servers deployed throughout the Internet, available on a subscription basis to publishers.
Web publishers use these to store their high-demand or rich content (i.e., certain portions of their web site).
Support for delivery of many content types (e.g., HTML, graphics, streaming media, etc.)
Brings content closer to end-users but no changes required at end-hosts.
P2P How does it work?
Web publishers decide on the portions of their web site they want to be served by the CDNs.
Use CDNs for images or rich content.
Most web pages: 70% objects
CDN companies provide web content distributors with the software tools to modify their HTML code.
CDN (e.g., Akamai) creates new domain names for each client content
The URLs pointing to these objects on the publisher's server are then modified so that the content can now be served from the CDN servers.
• http://www.cnn.com/image-of-the-day.gif becomes
Using multiple domain names for each client allows the CDN to further subdivide the content into groups.
DNS sees only the requested domain name, but it can route requests for different
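The URL modification step might look like the sketch below. The slides do not show the rewritten form of the URL, so the host cdn.example.net, the path layout, and the list of "rich content" extensions are all hypothetical, chosen only to illustrate the idea of pointing object references at CDN servers while leaving the HTML page itself on the publisher's server.

```python
import re
from urllib.parse import urlsplit

CDN_HOST = "cdn.example.net"                       # hypothetical CDN domain
RICH_EXTENSIONS = (".gif", ".jpg", ".png", ".mpg")  # assumed rich content types


def to_cdn_url(url):
    """Point a rich-content URL at the CDN; leave other URLs untouched."""
    parts = urlsplit(url)
    if not parts.path.lower().endswith(RICH_EXTENSIONS):
        return url
    # Keep the publisher's host in the path so the CDN can tell clients apart.
    return f"http://{CDN_HOST}/{parts.netloc}{parts.path}"


def rewrite_html(html):
    """Rewrite src/href attributes that reference rich content."""
    def replace(match):
        return f'{match.group(1)}="{to_cdn_url(match.group(2))}"'
    return re.sub(r'(src|href)="([^"]+)"', replace, html)


page = '<a href="http://www.cnn.com/index.html"><img src="http://www.cnn.com/image-of-the-day.gif"></a>'
print(rewrite_html(page))
```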
Akamai with DNS hooks
[Figure labels: www.cnn.com “Akamaizes” its content; Akamai servers; akamai.net; content for “Akamaized” services; DNS server; lookup for cnn.com; steps b and c.]
“Akamaized” response object has inline URLs for secondary content at a128.g.akamai.net and other Akamai-managed DNS names.
Source: Jeff Chase
P2P How does it work?
Monitoring/Routing:
Some kind of probing algorithms used to monitor state of network - traffic conditions, load on servers, and location of
generate network map incorporating this information - maps updated frequently to ensure the most current view of the
CDN develops its own “routing tables to direct the user to the fastest location.”
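A toy version of the "network map" idea is sketched below: probe results are stored per server, and a request is directed to the server with the best score. The map layout, the field names latency_ms and load, and the scoring formula are assumptions made up for illustration; the slides do not describe how a real CDN weighs these measurements.

```python
def score(entry):
    """Lower is better: latency inflated by current server load (assumed formula)."""
    return entry["latency_ms"] * (1.0 + entry["load"])


def pick_server(network_map):
    """Return the server name with the lowest score in the current map."""
    return min(network_map, key=lambda name: score(network_map[name]))


# Hypothetical probe results for one client region.
network_map = {
    "edge-a": {"latency_ms": 40, "load": 0.9},   # close but busy: score 76.0
    "edge-b": {"latency_ms": 55, "load": 0.1},   # farther but idle: score 60.5
}
print(pick_server(network_map))

# Maps are refreshed as new probes arrive; the choice can then change.
network_map["edge-a"]["load"] = 0.2
print(pick_server(network_map))
```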
P2P How does it work?
Delivery:
Data to be served by CDNs is pre-loaded onto the servers.
CDNs take care of migration of data to the appropriate servers.
Users retrieve modified HTML pages from the original server, with references to objects pointing to the CDN.
Content is served from the best server.
Highly scalable:
As the demand for a document increases, the number of servers serving that document also increases.
Ensure that no content server is overloaded by requests.
Fault tolerant: guarantee 100% uptime
High-speed connections from content servers to the Internet.
P2P and Layer 4 Switching:
What is Layer 4 switching?
Switch employs the information contained in the transport header to assist in switching traffic.
Layer 4 info - port numbers to identify applications (port 80 for HTTP, 20/21 for FTP, etc.)
Switch keeps track of established sessions to individual servers
use destination IP address + destination port + source IP address + source port for session identification
P2P and Layer 4 Switching (cont.):
Switch performs load balancing:
Multiple servers assigned the same virtual IP address.
Switch maintains information on server loads.
Traffic load-balancing done based on specified criteria (e.g., least connections, round robin, etc.)
Maintain session management information:
ensure that all packets within a session are forwarded to the same server
Ex: eShopping sessions: 2 connections - persistent HTTP for shopping cart and SSL for purchases within cart.
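The session table and the load-balancing rule can be sketched together. The example below uses the least-connections criterion and identifies a session by the four fields named above; the class name, server names and addresses are illustrative, and a real switch would also expire idle sessions.

```python
class Layer4Switch:
    """Toy layer 4 switch: one virtual IP in front of several real servers."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}  # open sessions per server
        self.sessions = {}                                # 4-tuple -> server

    def forward(self, src_ip, src_port, dst_ip, dst_port):
        """Return the server that should receive this packet."""
        key = (src_ip, src_port, dst_ip, dst_port)
        if key not in self.sessions:
            # New session: pick the server with the fewest active sessions.
            server = min(self.active, key=self.active.get)
            self.sessions[key] = server
            self.active[server] += 1
        return self.sessions[key]

    def close(self, src_ip, src_port, dst_ip, dst_port):
        """Forget a finished session and release its slot."""
        server = self.sessions.pop((src_ip, src_port, dst_ip, dst_port), None)
        if server is not None:
            self.active[server] -= 1


switch = Layer4Switch(["server-1", "server-2"])
first = switch.forward("10.0.0.1", 5000, "192.0.2.10", 80)
second = switch.forward("10.0.0.2", 5001, "192.0.2.10", 80)
again = switch.forward("10.0.0.1", 5000, "192.0.2.10", 80)
print(first, second, again)
```

Packets of an existing session always reach the same server, while new sessions are spread according to the current load.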
Akamai: Service Operator Model
Background
Founded by a group of MIT students and professors in 1998
Customer is content provider
Value: quality of service
Given contents provided by clients, Akamai deploys the contents into its worldwide network to provide some “quality of service” browsing
More than 15,000 servers deployed in over 1,100 networks in 66+
P2P Theoretical Foundations
Consistent Hashing and Random Trees [Karger et al., STOC 1997]
Random Trees
Use a tree of caches to coalesce requests
Balance load by using a different tree for each page and assigning tree nodes to caches via a random hash function.
Random Trees (cont.)
Every request for a page is a 4-tuple:
Requester’s ID
Name of the desired page
A routing path
A sequence of caches that should act as the nodes in the path
The root of the tree is always the server for the requested page
All other nodes are mapped to the caches by a hash function
h: P × [1..C] → 𝒞, where
P: the set of all pages
C: the number of caches
𝒞: the set of caches
A cache stores a copy of a requested page only after it has seen q requests for the page.
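The request 4-tuple and the q-request rule can be sketched as below. This is a simplified model under several assumptions: the tree is binary with heap-style node numbers (root = 1), a SHA-1 based function stands in for the random hash h, and all caches live in one Python object instead of separate machines. Names such as SERVER and CacheNetwork are hypothetical.

```python
import hashlib
import random

SERVER = "origin-server"  # hypothetical name for the page's home server


def h(page, node, caches):
    """Map (page, tree node) to a cache; a plain hash standing in for h."""
    digest = hashlib.sha1(f"{page}:{node}".encode()).digest()
    return caches[int.from_bytes(digest[:8], "big") % len(caches)]


def leaf_to_root_path(leaf):
    """Tree nodes from a leaf up to the root (node 1) of a binary tree."""
    path = []
    node = leaf
    while node >= 1:
        path.append(node)
        node //= 2
    return path


def random_leaf(depth):
    """Pick a random leaf of a complete binary tree of the given depth."""
    return random.randrange(2 ** depth, 2 ** (depth + 1))


def build_request(requester, page, leaf, caches):
    """The 4-tuple: requester, page name, routing path, machines on the path."""
    path = leaf_to_root_path(leaf)
    machines = [SERVER if node == 1 else h(page, node, caches) for node in path]
    return (requester, page, path, machines)


class CacheNetwork:
    def __init__(self, q):
        self.q = q
        self.seen = {}       # (cache, page, node) -> requests seen
        self.stored = set()  # (cache, page) pairs that hold a copy

    def handle(self, request):
        """Walk the path; return the machine that serves the page."""
        _, page, path, machines = request
        for node, machine in zip(path, machines):
            if machine == SERVER or (machine, page) in self.stored:
                return machine
            key = (machine, page, node)
            self.seen[key] = self.seen.get(key, 0) + 1
            if self.seen[key] >= self.q:
                self.stored.add((machine, page))  # copy kept after q requests
        return SERVER
```

With q = 2, the first request along a given path reaches the origin server, and once the leaf's cache has seen two requests it answers later ones itself.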
Random Trees (cont.)
Browsing:
A browser picks a random leaf-to-root path, maps the nodes to machines with h, and asks the leaf node for the page.
When a cache receives a request, it returns a copy of the page if it has it, or forwards the request to the next node. It also increments a counter for the page and the node it is acting as.
Random Trees in an Inconsistent World
In the above scheme, a consistent hash function can be used for h to cope with the dynamic nature of the cache servers.
A node can also know only about a 1/t fraction of the cache servers without a risk (in reasonable probability) of swamping or load unbalancing.
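A common way to build a consistent hash function is a hash ring, sketched below. This is one textbook construction under stated assumptions (SHA-1 for ring positions, 50 points per cache), not necessarily the exact scheme analyzed by Karger et al.; the class name and parameters are illustrative.

```python
import bisect
import hashlib


def point(value):
    """Position of a string on the ring (first 64 bits of its SHA-1 hash)."""
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest[:8], "big")


class ConsistentHashRing:
    def __init__(self, caches=(), replicas=50):
        self.replicas = replicas   # ring points per cache (assumed value)
        self.points = []           # sorted ring positions
        self.owner = {}            # ring position -> cache name
        for cache in caches:
            self.add(cache)

    def add(self, cache):
        for i in range(self.replicas):
            p = point(f"{cache}#{i}")
            self.owner[p] = cache
            bisect.insort(self.points, p)

    def remove(self, cache):
        self.points = [p for p in self.points if self.owner[p] != cache]
        self.owner = {p: c for p, c in self.owner.items() if c != cache}

    def lookup(self, key):
        """The cache owning the first ring point at or after the key's position."""
        i = bisect.bisect_right(self.points, point(key)) % len(self.points)
        return self.owner[self.points[i]]


ring = ConsistentHashRing(["c1", "c2", "c3"])
pages = [f"page{i}" for i in range(200)]
before = {page: ring.lookup(page) for page in pages}
ring.add("c4")
after = {page: ring.lookup(page) for page in pages}
moved = sum(1 for page in pages if before[page] != after[page])
print(moved, "of", len(pages), "pages moved to the new cache")
```

When a cache joins, a page either keeps its old cache or moves to the new one, so most cached copies stay valid as the set of cache servers changes.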
Akamai’s Stock Price
Deploying and maintaining a CDN network is VERY expensive