Part II: Infrastructure of P2P
Professor Shiao-Li Tsao, Dept. of Computer Science, National Chiao Tung University

Content
Centralized P2P, Unstructured P2P, Structured P2P, Hybrid P2P, Hierarchical P2P

Major Operations of P2P
- Connect: how does a peer join a P2P overlay network?
- Search: how does a peer search for an object in the P2P network?
- Download: how does a peer download the object?

Part II-1: Centralized P2P
Professor Shiao-Li Tsao, Dept. of Computer Science, National Chiao Tung University

Outline
Centralized index model; case study - Napster

Centralized Index Model (1/3)
Utilize a central directory for object location, ID assignment, etc. In file-sharing P2P, objects are located by querying the central servers and then downloaded directly from peers.

Centralized Index Model (2/3)
[Figure: peer A (1) uploads its index to the centralized repository R when it connects; peer B (2) queries R to search; B then (3) downloads the file directly from A.]

Centralized Index Model (3/3)
Benefits: simplicity; efficient search; limited bandwidth usage.
Drawbacks: unreliable (single point of failure); performance bottleneck; limited scalability (the central directory must be scaled); vulnerable to DoS attacks; copyright infringement.
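To make the centralized index model concrete, the following is a minimal Python sketch of an in-memory directory server. It is illustrative only and not Napster's actual protocol; the class and method names (CentralIndex, publish, search) and the peer addresses are invented for this example.

class CentralIndex:
    """Toy central directory: peers publish file lists, queries return peers."""
    def __init__(self):
        self.index = {}                       # filename -> set of peer addresses

    def publish(self, peer_addr, filenames):
        """A peer uploads its shared-file list when it connects."""
        for name in filenames:
            self.index.setdefault(name, set()).add(peer_addr)

    def unpublish(self, peer_addr):
        """Remove a departing peer from every index entry."""
        for peers in self.index.values():
            peers.discard(peer_addr)

    def search(self, keyword):
        """Return peers holding files whose name contains the keyword."""
        return {name: sorted(peers)
                for name, peers in self.index.items()
                if keyword.lower() in name.lower() and peers}

if __name__ == "__main__":
    server = CentralIndex()
    server.publish("10.0.0.1:6699", ["metallica-one.mp3", "lecture.pdf"])
    server.publish("10.0.0.2:6699", ["metallica-one.mp3"])
    print(server.search("metallica"))
    # {'metallica-one.mp3': ['10.0.0.1:6699', '10.0.0.2:6699']}
    # The actual file transfer then happens peer-to-peer, not through the server.

Note that only the index lives on the server; this is exactly why the model is simple and searches are efficient, but also why the directory is a single point of failure.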
Case Study - Napster
When and who: January 1999; Shawn Fanning, a freshman at Northeastern University.
Why: it was difficult to find and download music over networks, and he wanted to share music with friends.
How: a program that allowed computer users to share and swap files, specifically music, through a centralized file server.
Napster was the first popular P2P file-sharing application.

Napster: A Brief History (1/2)
May 1999: Shawn Fanning (a Northeastern University freshman, later a dropout) founds Napster Inc.
Success of Napster: the first massively popular peer-to-peer file-sharing system; 640,000 simultaneous users (November 2000); more than 60 million users downloaded the software; universities began to block Napster because of its overwhelming bandwidth consumption.

Napster: A Brief History (2/2)
Law cases against Napster:
- Dec. 7, 1999: the Recording Industry Association of America (RIAA) sues Napster for copyright infringement.
- April 13, 2000: heavy-metal band Metallica sues Napster for copyright infringement.
- April 27, 2000: rapper Dr. Dre sues Napster.
- July 2000: a court orders Napster to shut down.
- Oct. 2000: German media firm Bertelsmann becomes a partner and drops its lawsuit.
- July 2001: Napster's servers shut down completely.
- May 17, 2002: Napster announces that its assets will be acquired by Bertelsmann for $8 million.
Rebirth of Napster:
- Nov. 2002: Napster's brand and logos are acquired at a bankruptcy auction by Roxio, Inc.
- May 2003: Roxio acquires pressplay for the re-launch of the Napster music pay service.
- Oct. 2003: Napster 2.0 is announced.
- 2005: the second most popular legal music service overall (behind Apple iTunes); revenue of $94.69M in 2005 and $108.73M in 2006.

Napster: System Overview
A large cluster of dedicated central servers maintains an index of shared files. The central servers also monitor the state of each peer and keep track of metadata, which is returned with the results of a query. Each peer maintains a connection to one of the central servers.

Napster Operation: Connect (1/4)
The peer connects to the napster.com server; its file list and IP address are uploaded.

Napster Operation: Search (2/4)
The user sends a search query to the server and receives the result list.

Napster Operation: Download (3/4)
The user pings the hosts that apparently have the data, looking for the best transfer rate.

Napster Operation: Download (4/4)
The user chooses a host and initiates the file exchange directly with that peer.

Napster: Summary
Napster is not a pure P2P system, but it was the first one that raised important issues for the P2P community. It is a hybrid decentralized unstructured system: file transfer is decentralized, but locating content is centralized, a combination of client/server and P2P approaches. The Napster protocol is proprietary: Stanford University senior David Weekly posted the protocol in 2000; Napster requested that he remove it, but Weekly created the OpenNap project instead. Napster exposed two major problems: it is unreliable, because the central indexing server is a single point of failure, and it carries legal responsibility for music file sharing.

Part II-2: Unstructured P2P
Professor Shiao-Li Tsao, Dept. of Computer Science, National Chiao Tung University

Outline
- Introduction
- Flooded requests model: Gnutella; Structella (Gnutella on a structured overlay)
- Supernode model: FastTrack and Kazaa

Introduction
Without a central index, a peer may blindly flood a query through the network, either among all peers or among supernodes. This gives two models: the flooded requests model and the (hierarchical) supernode model.

Flooded Requests Model (1/3)
A pure P2P model. Each request is flooded (broadcast) to directly connected peers, which in turn flood their neighbors, until the request is answered or a certain scope (TTL limit) is exceeded. This model is used by Gnutella.

Flooded Requests Model (2/3)
[Figure: a request propagates hop by hop through the overlay until a peer holding the object answers "found".]

Flooded Requests Model (3/3)
Benefits: highly decentralized; reliability and fault tolerance.
Drawbacks: excessive query traffic; not scalable; may fail to find content that is actually in the system.
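The following Python sketch simulates TTL-limited flooding over a small overlay graph; it is illustrative only, and the function and peer names are invented. Each query is forwarded to all neighbors, the TTL is decremented at each hop, and duplicate deliveries are suppressed with a visited set.

def flood_query(overlay, start, have_object, ttl):
    """overlay: dict peer -> list of neighbor peers.
    have_object: set of peers that store the requested object.
    Returns the set of peers that answer the query within the TTL scope."""
    visited = {start}
    frontier = [start]
    hits = set()
    while frontier and ttl > 0:
        next_frontier = []
        for peer in frontier:
            for neighbor in overlay[peer]:
                if neighbor in visited:
                    continue                  # duplicate: this peer already saw the query
                visited.add(neighbor)
                if neighbor in have_object:
                    hits.add(neighbor)        # a QueryHit would travel back the reverse path
                next_frontier.append(neighbor)
        frontier = next_frontier
        ttl -= 1                              # one more overlay hop consumed
    return hits

if __name__ == "__main__":
    overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
               "D": ["B", "C", "E"], "E": ["D"]}
    print(flood_query(overlay, "A", have_object={"E"}, ttl=2))   # set(): E is out of scope
    print(flood_query(overlay, "A", have_object={"E"}, ttl=3))   # {'E'}

The second call illustrates the model's drawback: reaching content that sits just outside the TTL horizon requires enlarging the scope, which multiplies the query traffic.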
Case Study - Gnutella
Gnutella is an open, decentralized P2P search protocol. It uses a fully distributed architecture without any central entity and relies on flooding to route messages through the Gnutella network.

Gnutella: A Brief History
Developed by Justin Frankel and Tom Pepper of Nullsoft (the makers of Winamp); Gnutella = GNU + Nutella. It was released on March 14, 2000, and thousands downloaded it that day; the next day AOL (the owner of Nullsoft) withdrew it. Within a few days the protocol had been reverse engineered by Bryan Mayland and became open source. Development of the Gnutella protocol is currently led by the GDF (Gnutella Developer Forum). Several open-source Gnutella clients exist: LimeWire, Morpheus, Gnucleus, etc.

Gnutella System Overview
Gnutella is built at the application level: peers self-organize into an application-level mesh, and each peer maintains connections with a few other peers (e.g., 4). Search: each peer initiates a controlled flooding through the network by sending a query packet to all of its neighbors; the TTL is decremented on each hop.

Gnutella Protocol (1/4)
Discovery of peers and searching for files are implemented by passing five descriptors (message types) between nodes: group membership messages (Ping and Pong), search messages (Query and QueryHit), and the file transfer message (Push). Each message has a randomly generated identifier. Messages can be broadcast or simply back-propagated; to prevent re-broadcasting and to implement back-propagation, each node keeps a short memory of recently routed messages. Messages are flagged with time-to-live (TTL) and "hops passed" fields.

Gnutella Protocol Message Types
- Ping: announces availability and probes for other servents. Contains no payload.
- Pong: response to a Ping. Contains the IP address and port number of the responding servent, and the number and total KB of files shared.
- Query: search request. Contains the minimum network bandwidth required of a responding servent and the search criteria.
- QueryHit: returned by servents that have the requested file. Contains the IP address, port number, and network bandwidth of the responding servent, the number of results, and the result set.
- Push: file download request for firewalled servents. Contains the servent identifier, the index of the requested file, and the IP address and port to send the file to.

Gnutella Protocol: Connect (2/4)
Peer discovery: IRC (Internet Relay Chat), web pages, Ping-Pong messages. Send GNUTELLA CONNECT to one node with a known address, then wait for GNUTELLA OK.

Gnutella Connect Operation
Each peer connects to any peer already on the Gnutella network. Upon connecting, the peer announces its presence to the neighboring peers, and the neighboring peers propagate the announcement until it reaches all peers on the network. Upon receiving the announcement, a contacted peer responds with a bit of information about itself, for example the number of files and the amount of disk space it shares with the network.

Gnutella Discovery Operation
[Figure: Ping messages are flooded through the overlay and each reached peer returns a Pong along the reverse path.]

Ping/Pong Routing Example
Source: http://rfc-gnutella.sourceforge.net/developer/stable/index.html

Gnutella Protocol: Search (3/4)
Query messages containing a user-specified search string are broadcast. The neighboring peers propagate the search query and also match it against locally stored file names. QueryHit messages are back-propagated replies to Query messages and include the information necessary to download a file.

Gnutella Search & Transfer Operations
[Figure: a Query floods outward; QueryHits travel back along the reverse path; the file is then transferred directly between the two peers.]

Gnutella Protocol: Download (4/4)
The file download protocol is HTTP, i.e., an HTTP GET request; each Gnutella peer has basic HTTP client and server functions built in. A Push request is sent to the file provider when that peer is behind a firewall.

Query/QueryHit/Push Routing Example
Source: http://rfc-gnutella.sourceforge.net/developer/stable/index.html
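The sketch below shows, in Python, how a servent might parse the 23-byte Gnutella descriptor header and apply the forwarding rules described above (duplicate suppression, TTL decrement, hops increment). The field layout follows the v0.4 specification as summarized here; the code is an illustrative fragment, not a complete servent.

import struct

# Descriptor header per the Gnutella v0.4 spec (as summarized in this section):
# 16-byte descriptor ID, 1-byte payload descriptor (0x00 Ping, 0x01 Pong,
# 0x40 Push, 0x80 Query, 0x81 QueryHit), 1-byte TTL, 1-byte hops,
# 4-byte little-endian payload length.
HEADER = struct.Struct("<16sBBBI")

def parse_header(data):
    descriptor_id, ptype, ttl, hops, length = HEADER.unpack(data[:HEADER.size])
    return {"id": descriptor_id, "type": ptype, "ttl": ttl,
            "hops": hops, "payload_len": length}

seen = set()                     # short memory of recently routed descriptor IDs

def should_forward(header):
    """Drop duplicates and descriptors whose TTL would expire."""
    if header["id"] in seen:
        return False
    seen.add(header["id"])
    return header["ttl"] > 1

def next_hop_header(header):
    """Decrement TTL and increment hops before re-broadcasting."""
    return HEADER.pack(header["id"], header["type"],
                       header["ttl"] - 1, header["hops"] + 1,
                       header["payload_len"])

Responses (Pong, QueryHit) are not re-flooded; a servent back-propagates them on the connection from which it received the matching Ping or Query, which is exactly why the descriptor-ID memory above is needed.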
Gnutella Measurement (1/3)
In November 2000, 36% of traffic was user-generated (QUERY messages) and 55% was overhead traffic (PING and PONG messages). By June 2001 these problems had largely been solved by newer Gnutella implementations: 92% QUERY messages and 8% PING messages.

Gnutella Measurement (2/3)
In power-law networks, most nodes have few links and a tiny number of hubs have a large number of links. Gnutella is similar to a power-law network and is thus able to operate in highly dynamic environments.

Gnutella Measurement (3/3)
Jovanović et al. developed a Gnutella network crawler called gnutcrawl and obtained several instances of the Gnutella network topology between Nov. 13 and Dec. 28, 2000. One topology instance consists of 1,026 nodes and 3,752 edges, and the diameter of the graph is 8. Results: the Gnutella network topology exhibits strong small-world characteristics, and the node degrees follow a power-law distribution.

Free Riding on Gnutella
Peers that free ride on Gnutella are those that only download files for themselves without ever providing files for download by others. Most Gnutella users are free riders: 22,084 peers (66%) share no files, and 24,347 (73%) share ten or fewer files.

Gnutella Summary
Gnutella is a fully distributed peer-to-peer protocol with reliability and fault-tolerance properties. Flooding, however, raises questions of cost and scalability: the current Gnutella protocol cannot scale beyond a network size of a few thousand nodes without becoming fragmented.

References
- The Gnutella Protocol Specification v0.4.
- M. Ripeanu, "Peer-to-Peer Architecture Case Study: Gnutella Network," University of Chicago Technical Report TR-2001-26.
- M. Jovanović, F. Annexstein, and K. Berman, "Scalability Issues in Large Peer-to-Peer Networks - A Case Study of Gnutella," Technical Report, University of Cincinnati, 2001.
- E. Adar and B. Huberman, "Free Riding on Gnutella," First Monday 5, October 2000.
- M. Portmann, P. Sookavatana, S. Ardon, and A. Seneviratne, "The Cost of Peer Discovery and Searching in the Gnutella Peer-to-Peer File Sharing Protocol," in Proc. of ICON'01, Vol. 1, pp. 263-268, 2001.

Part II-3: Structured P2P
Professor Shiao-Li Tsao, Dept. of Computer Science, National Chiao Tung University

Outline
Introduction; document routing model; CAN; Chord; Pastry

Introduction
P2P file-sharing systems built around a central index, such as Napster, are expensive and vulnerable, while Gnutella is not scalable and may fail to find content. How can we build a scalable peer-to-peer file distribution system? Phase 1: find the peer from whom to retrieve the file; this requires scalable indexing mechanisms. Phase 2: the peer-to-peer file transfer itself, which is inherently scalable.

Document Routing Model (1/5)
Each peer is assigned a random or hashed ID and knows a given number of peers. An ID is also assigned to every shared document based on a hash function. A document is published (shared) to the peer whose ID is most similar to the document ID, and a request for it is routed to the peer whose ID is most similar to the document ID.

Document Routing Model (2/5)
[Figure: the peer with ID 5000 shares a file whose hashed ID is 0008 and registers "0008 is at ID 5000" with the peer whose ID is closest to 0008 (ID 0005); a lookup for file ID 0008 issued by peer 0100 is routed via peers 0200 and 1000 to peer 0005, the response points back to peer 5000, and the file is then transferred directly.]

Document Routing Model (3/5)
Benefits: scalability and more efficient searching, with logarithmic bounds on locating a document; fault tolerance.
Drawbacks: routing-table maintenance; network partitioning may cause an islanding problem.

Document Routing Model (4/5)
P2P systems that implement the document routing model:
- CAN: S. Ratnasamy et al., UC Berkeley, 2001
- Chord: I. Stoica et al., MIT and Berkeley, 2001
- Pastry: A. Rowstron and P. Druschel, Microsoft Research, UK, 2001
- Tapestry: Ben Y. Zhao et al., UC Berkeley, 2001
These systems are also known as DHT-based P2P systems.

Document Routing Model (5/5)
A hash table is a data structure that efficiently maps keys onto values. The distributed hash table (DHT) is a distributed, Internet-scale hash table supporting lookup, insertion, and deletion of (key, value) pairs. It supports only exact-match search, not keyword search.

Content-Addressable Network (CAN)
CAN resembles a hash table and performs operations such as insertion, lookup, and deletion of (key, value) pairs. The CAN space is a virtual d-dimensional Cartesian coordinate space on a d-torus. This virtual coordinate space is used to store (key, value) pairs, and the entire coordinate space is dynamically partitioned among all the nodes in the system.

CAN System Overview
Every CAN node owns its own distinct zone. Every CAN node learns and maintains a coordinate routing table that holds the IP address and virtual coordinate zone of each of its immediate neighbors. A key is mapped onto a point P in the coordinate space using a uniform hash function, and the (key, value) pair is stored at the node that owns the zone within which the point P lies.

CAN Example (1/4)
Every CAN node owns a zone in the overall space. Example: a 2-dimensional [0,1]x[0,1] space with 5 nodes joining in succession. For a 2-d space, split along the X-axis first, then Y, then X again, followed by Y, and so forth.

CAN Example (2/4)
[Figure: node 2 joins and the space is split along the X-axis; node 3 then joins and one half is split along the Y-axis.]

CAN Example (3/4)
[Figure: after nodes 4 and 5 have joined, the space is divided into five zones: (0.0-0.5, 0.0-0.5), (0.0-0.5, 0.5-1.0), (0.5-1.0, 0.0-0.5), (0.5-0.75, 0.5-1.0), and (0.75-1.0, 0.5-1.0).]

CAN Example (4/4)
[Figure: the same five CAN nodes and the corresponding binary partition tree; the splits alternate between the X and Y axes.] Think of each existing zone as a leaf of the binary partition tree.

CAN Construction (1/4)
When a new node joins, an existing node splits its allocated zone in half, retains one half, and hands the other half to the new node. First, the new node finds a node already in the CAN; next, it finds the node whose zone will be split; finally, the neighbors of the split zone are notified.

CAN Construction (2/4)
Bootstrapping: a new node looks up the CAN domain name in DNS to retrieve a bootstrap node's IP address. The bootstrap node then supplies the IP addresses of nodes currently in the system.

CAN Construction (3/4)
Finding a zone: the new node randomly chooses a point P and sends a JOIN request destined for point P. Each existing CAN node uses the CAN routing mechanism to forward the request. The current occupant node splits its zone in half and assigns one half to the new node. The (key, value) pairs belonging to the half zone being handed over are transferred to the new node.

CAN Construction (4/4)
Joining the routing: the new node learns the IP addresses of its coordinate neighbor set from the previous occupant, and the previous occupant updates its own neighbor set. Both the new and old nodes' neighbors are informed of this reallocation of space through immediate and periodic update messages.
CAN Routing
Data stored in the CAN is addressed by name (i.e., key), not location (i.e., IP address). A node routes a message towards its destination by greedy forwarding to the neighbor with the closest coordinates. If nodes crash, many alternative paths exist (routing fault tolerance); if a node loses all of its neighbors and the repair mechanisms have not yet rebuilt the neighbor state, it performs an expanding ring search to locate a closer node and forwards the message to it. For d dimensions partitioned into n equal zones, each node maintains 2d neighbors and the average routing path length is (d/4)(n^(1/d)) hops.

CAN Routing Example
A CAN message includes the destination coordinates. Using the neighbor coordinate set, a node routes the message towards its destination by greedy forwarding to the neighbor with the closest coordinates. [Figure: an example in a 2-dimensional space.]

CAN Node Departure
When a node leaves, it explicitly hands over its zone and the associated (key, value) database to one of its neighbors: a neighbor whose zone can be merged with the departing node's zone into a single valid zone, or otherwise the neighbor with the smallest zone volume. A node holding more than one zone results in fragmentation.

CAN Maintenance (1/3)
The (key, value) pairs are periodically refreshed by the holders of the data. Each node sends periodic update messages to its neighbors giving its zone coordinates and a list of its neighbors and their zone coordinates. The prolonged absence of an update message signals the failure of a neighbor.

CAN Maintenance (2/3)
When node or network failures happen, the takeover mechanism ensures that one of the failed node's neighbors takes over the zone: a neighboring node is chosen that is still alive and has a small zone volume. However, the (key, value) pairs held by the failed node are lost until the state is refreshed. When multiple adjacent nodes fail simultaneously, a node first performs an expanding ring search to rebuild sufficient neighbor state and then initiates the takeover mechanism.

CAN Maintenance (3/3)
A background zone-reassignment algorithm retains the one-to-one node-to-zone assignment and prevents fragmentation. When a leaf x is removed, find a leaf node y that is either x's sibling, in which case zones x and y merge into a single zone, or a descendant of x's sibling whose own sibling is also a leaf; y takes over x's zone and y's sibling takes over y's previous zone.

Zone Reassignment Example
[Figure: zone reassignment on the binary partition tree. The takeover algorithm first lets node 11 take over the failed node's zone as a combined zone because it has the smaller zone volume. A depth-first search then finds two sibling leaves: leaf 5 is node 4's sibling, so zones 4 and 5 can merge; node 10 is a descendant of node 9 and its sibling is also a leaf; node 6 discovers sibling nodes 10 and 11 through the background reassignment.]

CAN Design Improvements (1/5)
The hops in a CAN path are application-level hops, not IP-level hops, and the latency of each hop might be substantial. The average latency of a lookup is the average number of CAN hops times the average latency of each CAN hop. The improvements aim to reduce the latency of CAN routing by reducing either the path length or the per-CAN-hop latency.

CAN Design Improvements (2/5)
Multiple dimensions: increasing the number of dimensions implies that a node has more neighbors. Multiple coordinate spaces: each coordinate space is called a reality; increasing the number of realities implies that distant portions of the coordinate space may be reached in a single hop.

CAN Design Improvements (3/5)
Multiple dimensions vs. realities: both yield shorter path lengths, but at the cost of more per-node neighbor state and maintenance traffic. With r realities, a single node is assigned r zones, one in every coordinate space. The path length scales as O(d(n^(1/d))): increasing d results in shorter path lengths than increasing r for the same number of neighbors maintained per node, but multiple realities do more to improve data availability and routing fault tolerance.

CAN Design Improvements (4/5)
Better CAN routing metrics: RTT-weighted routing. Overloading coordinate zones: multiple nodes share the same zone.

CAN Design Improvements (5/5)
Multiple hash functions: use k hash functions to map a single key onto k points. Topologically-sensitive construction of the CAN overlay network: distributed binning based on distances from landmarks, and more uniform partitioning. Caching and replication techniques for "hot spot" management.

CAN Summary
CAN is completely distributed, scalable, and fault-tolerant. For d dimensions and n nodes: per-node neighbor state is O(d); a node insertion affects O(d) existing nodes, independent of the number of nodes in the system; the routing path length is O(d n^(1/d)) hops. When d = log n, CAN resembles the other DHT algorithms (O(log n)).
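As a rough illustration of CAN-style routing, the Python sketch below hashes a key onto a point in the unit d-dimensional space and forwards greedily to the neighbor whose zone centre is closest to that point. The zone representation (a single "centre" point per node) and the neighbor lists are invented simplifications; a real CAN node tracks zone boundaries and handles wrap-around on the torus.

import hashlib

def key_to_point(key, d=2):
    """Hash a key onto a point in the unit d-dimensional space."""
    digest = hashlib.sha1(key.encode()).digest()
    return tuple(int.from_bytes(digest[4*i:4*(i+1)], "big") / 2**32 for i in range(d))

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def greedy_route(nodes, start, target):
    """nodes: dict node_id -> {"centre": point, "neighbours": [node_ids]}.
    At each step, forward to the strictly closer neighbor, if any."""
    current, path = start, [start]
    while True:
        candidates = [current] + nodes[current]["neighbours"]
        best = min(candidates, key=lambda n: dist2(nodes[n]["centre"], target))
        if best == current:          # no neighbor is closer: current node owns the point
            return path
        current = best
        path.append(current)

if __name__ == "__main__":
    nodes = {
        "n1": {"centre": (0.25, 0.50), "neighbours": ["n2"]},
        "n2": {"centre": (0.75, 0.25), "neighbours": ["n1", "n3"]},
        "n3": {"centre": (0.75, 0.75), "neighbours": ["n2"]},
    }
    target = key_to_point("song.mp3")
    print(target, greedy_route(nodes, "n1", target))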
Chord
Chord is a distributed lookup protocol that efficiently finds the location of the node that stores a desired data item. The Chord protocol supports just one operation: given a key, it maps the key onto a node. Chord maps its nodes onto a one-dimensional space. In an N-node network, each node maintains information about only O(log N) other nodes, and a lookup requires only O(log N) messages.

Chord System Overview
m-bit key and node identifiers are produced with the SHA-1 hash function (m must be large enough to make collisions unlikely). These identifiers are ordered on an identifier circle modulo 2^m: the Chord ring, a one-dimensional circular key space. Key k is assigned to its successor, the first node whose identifier is equal to or follows k in the identifier space. Each node maintains a routing table with (at most) m entries, called the finger table, and a pointer to the previous node on the identifier circle, called the predecessor.

Example: Chord Ring with m=6
[Figure: an identifier circle of 10 nodes storing 5 keys; successor(K10) is N14, successor(K38) is N38, successor(K54) is N56, and predecessor(N14) is N8.]

The Finger Table
The i-th entry at node n contains the identity of the first node s that succeeds n by at least 2^(i-1) on the identifier circle, where 1 <= i <= m: the i-th finger is s = successor((n + 2^(i-1)) mod 2^m). A table entry includes both the Chord identifier and the IP address of the relevant node. The first finger of node n is its immediate successor, also simply called the successor.

Example: The Finger Table
Finger table entries point to the first node at distance greater than or equal to 2^(i-1) from the node, for 1 <= i <= m, modulo 2^m. For node N8: N14 is the first node that succeeds (8 + 2^0) mod 2^6 = 9, and N42 is the first node that succeeds (8 + 2^5) mod 2^6 = 40.

Chord Ring Construction
Decide the value of m; map identifiers (0 to 2^m - 1) to nodes and keys; identifiers are ordered on an identifier circle modulo 2^m; the first node then creates the Chord ring.

Chord Node Join
When node n first starts, it calls n.join(n'), which asks n' to find the immediate successor of n, or n.create() to create a new Chord network.

Node Join Steps
Node n joins an existing Chord ring as follows: s = lookup(n); copy the keys in (s.predecessor, n) from s to n; set n.predecessor = s.predecessor and n.successor = s; set s.predecessor.successor = n and s.predecessor = n; update the finger tables.

Example: Chord Node Join
[Figure: a ring with m = 4 and nodes N3, N5, and N10; node N14 joins and takes over the keys less than 14 (those between N10 and N14). Finger tables after the join:
N3: +1 [4,5) -> N5; +2 [5,7) -> N5; +4 [7,11) -> N10; +8 [11,3) -> N14 (was N3).
N5: +1 [6,7) -> N10; +2 [7,9) -> N10; +4 [9,13) -> N10; +8 [13,5) -> N14 (was N3).
N10: +1 [11,12) -> N14 (was N3); +2 [12,14) -> N14 (was N3); +4 [14,2) -> N14 (was N3); +8 [2,10) -> N3.
N14: +1 [15,0) -> N3; +2 [0,2) -> N3; +4 [2,6) -> N3; +8 [6,14) -> N10.]

Simple Key Location
A key lookup determines the successor of the key. If each node only knows how to contact its current successor on the identifier circle, a lookup uses a number of messages linear in the number of nodes. [Figure: the path taken by a query from node N8 for key K54 when only successor pointers are used.]

Scalable Key Location
Lookup using the finger table: node n calls find_successor(id) to find the successor node of an identifier id. If id falls between n and its successor, the successor is the answer; otherwise n searches its finger table for the node n' whose ID most immediately precedes id and asks n' to resolve the query. The closer n' is to id, the more it knows about the identifier circle in the region of id. Theorem: the number of nodes that must be contacted to find a successor in an N-node network is O(log N).

Example: Lookup for Key 54
[Figure: lookup for K54 starting at N8 on the m = 6 ring. N8's fingers: +1 -> N14, +2 -> N14, +4 -> N14, +8 -> N21, +16 -> N32, +32 -> N42. N42's fingers: +1 -> N48, +2 -> N48, +4 -> N48, +8 -> N51, +16 -> N1, +32 -> N14. N51's fingers: +1 -> N56, +2 -> N56, +4 -> N56, +8 -> N1, +16 -> N8, +32 -> N21. N8 forwards to its closest preceding finger of K54, closest_preceding_node(K54) = N42; N42 forwards to N51; N51's successor N56 is returned.] The distance covered is 54 - 8 = 46 = 101110 in binary = 2^5 + 0*2^4 + 2^3 + 2^2 + 2^1 + 0*2^0, so each hop uses the largest finger that does not overshoot the key.

Chord Node Departure
A node transfers its keys to its successor before it departs, and it also notifies its predecessor and successor before leaving. Assume node n has sent its predecessor to its successor s, and the last node of its successor list to its predecessor p: p removes n from its successor list and adds the last node of n's successor list to its own list; s replaces its predecessor with n's predecessor.

Chord Stabilization
Chord must ensure that each node's successor pointer is up to date. Each node periodically runs a stabilization protocol in the background, which updates Chord's finger tables and successor pointers. If the predecessor has failed, n accepts a new predecessor in notify().

Chord Stabilization when a Node Joins
[Figure: N26 joins between N21 and N32. Initially N21's successor is N32 and N32's predecessor is N21. N26 joins with successor N32 and notifies it, so N32's predecessor becomes N26. When N21 next runs stabilize(), it learns x = N32.predecessor = N26, sets its successor to N26, and calls N26.notify(N21), so N26's predecessor becomes N21.]

Chord Failure and Replication
To increase robustness, each node maintains a successor list containing its first r successors; if the successor fails, the node substitutes the first live entry in its list. By the theorems in the paper, r = Omega(log N) suffices. If a node fails during the find_successor procedure, the lookup proceeds, after a timeout, by trying the next best predecessor among the nodes in the finger table and the successor list. Replicas of the data associated with a key are stored at the k nodes succeeding the key.

Chord Summary
Chord is efficient (O(log n) messages per lookup), scalable (O(log n) state per node), and robust (it survives massive failures). Chord consists of consistent hashing, small routing tables of size O(log n), and a fast join/leave protocol.
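The following Python sketch follows the finger-table and lookup rules above (i-th finger = successor(n + 2^(i-1) mod 2^m); jump to the closest preceding finger of the key). It is a simplified illustration in which the whole node set is known locally; a real Chord node would instead learn its fingers through the join and stabilization protocols.

M = 6
RING = 2 ** M

def successor(nodes, ident):
    """First node whose identifier is equal to or follows ident (mod 2^m)."""
    return min(nodes, key=lambda n: (n - ident) % RING)

def finger_table(nodes, n):
    """The i-th entry (1 <= i <= m) is successor(n + 2^(i-1))."""
    return [successor(nodes, (n + 2 ** (i - 1)) % RING) for i in range(1, M + 1)]

def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b)."""
    return (x - a) % RING < (b - a) % RING and x != a

def find_successor(nodes, start, key):
    n, hops = start, 0
    while True:
        succ = finger_table(nodes, n)[0]
        if in_interval(key, n, succ) or key == succ:
            return succ, hops
        # jump to the closest preceding finger of the key
        n = max((f for f in finger_table(nodes, n) if in_interval(f, n, key)),
                key=lambda f: (f - n) % RING, default=succ)
        hops += 1

if __name__ == "__main__":
    nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]     # the 10-node example ring
    print(finger_table(nodes, 8))        # [14, 14, 14, 21, 32, 42], as in the slide
    print(find_successor(nodes, 8, 54))  # (56, 2): N8 -> N42 -> N51 -> answer N56

Running the example reproduces the K54 lookup path shown above, and the hop count stays logarithmic in the ring size because each jump roughly halves the remaining identifier distance.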
Pastry
Pastry is a peer-to-peer content location and routing system based on a self-organizing overlay network of nodes. It uses a circular nodeId space, each node maintains a small amount of routing state, and Pastry takes network locality into account to reduce routing latency.

Pastry System Overview
A 128-bit circular nodeId space ranges from 0 to 2^128 - 1. Each node is randomly assigned a unique identifier (nodeId). NodeIds and keys are treated as sequences of digits with base 2^b. Each node maintains a routing table, a neighborhood set, and a leaf set. A node routes a message to the node with the nodeId that is numerically closest to the given key.

The Routing Table R
The routing table has log base 2^b of N rows with 2^b - 1 entries each; the top row is row zero. The entries in row n refer to nodes whose nodeId shares the present node's id in the first n digits but whose (n+1)-th digit differs from the (n+1)-th digit of the present node's id. Each entry contains the IP address of one node with the appropriate prefix. Example: 16-bit nodeIds with base 4 (b = 2) and a node with nodeId 10233102; the routing table format is: matched digits, column number, rest of the ID.

The Neighborhood Set M
The neighborhood set contains the nodeIds and IP addresses of the |M| closest nodes according to the proximity metric; |M| is typically 2^b or 2x2^b. The neighborhood set is not used in routing, but it is used to maintain locality properties. Example: a node with nodeId 10233102 and |M| = 8.

The Leaf Set L
The leaf set contains the |L|/2 nodes with numerically closest larger nodeIds and the |L|/2 nodes with numerically closest smaller nodeIds; |L| is typically 2^b or 2x2^b. The leaf set serves as a fallback for the routing table. Example: a node with nodeId 10233102 and |L| = 8.

Pastry Routing
When a message with key D arrives at a node with nodeId A, there are three cases. (1) If key D is within the range of the leaf set, the message is forwarded directly to the numerically closest node in the leaf set. (2) Otherwise, the node searches the routing table and forwards the message to a node whose nodeId shares a prefix with the key that is at least one digit longer. (3) If no such node is known, the message is forwarded to a node whose nodeId shares a prefix with the key as long as the current node's but is numerically closer to the key than the present node's id. The expected number of routing steps is O(log N), where N is the number of Pastry nodes.

Pseudo Code for Pastry Routing
(1) The key is in the leaf set: deliver to the closest leaf-set node. (2) Forward the message to a closer node from the routing table (a better prefix match). (3) Forward towards a numerically closer node (not a better match). Notation: D is the message key; L_i is the i-th closest nodeId in the leaf set; shl(A, B) is the length of the prefix shared by nodes A and B; R_j^i is the (j, i)-th entry of the routing table.

Pastry Routing Example
[Figure: a lookup message is routed through nodes whose nodeIds match the key in progressively longer prefixes until it reaches the node numerically closest to the key.]
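The Python sketch below illustrates the prefix-matching step of Pastry routing with base-4 digits (b = 2), matching the nodeId 10233102 example. It is an illustrative simplification: the leaf set, the numerically-closer fallback, and network-proximity tie-breaking are omitted, and the node ids and helper names are invented.

def shl(a, b):
    """Length of the prefix (in digits) shared by nodeIds a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def build_routing_table(node, all_nodes):
    """Cell (row, column) holds a node sharing `row` prefix digits with `node`
    and whose next digit is `column`; keep the first candidate seen per cell."""
    table = {}
    for other in all_nodes:
        if other == node:
            continue
        row = shl(node, other)
        table.setdefault((row, int(other[row])), other)
    return table

def route(routing_tables, start, key):
    """Each hop forwards to a node sharing at least one more digit with the key."""
    current, path = start, [start]
    while current != key:
        row = shl(current, key)                  # digits already matched
        nxt = routing_tables[current].get((row, int(key[row])))
        if nxt is None:
            break                                # a real node would fall back to its leaf set
        current = nxt
        path.append(current)
    return path

if __name__ == "__main__":
    nodes = ["10233102", "10231131", "31203203", "10233001", "32100210"]
    tables = {n: build_routing_table(n, nodes) for n in nodes}
    print(route(tables, "31203203", "10233001"))
    # ['31203203', '10233102', '10233001']: each hop lengthens the shared prefix

Because every hop resolves at least one more digit of the key, the number of hops is bounded by the number of digits, i.e., O(log N) with base 2^b.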
Pastry Node Join
A nearby Pastry node A (according to the proximity metric) is located by using expanding-ring IP multicast or out-of-band channels. The new node, with nodeId X, asks A to route a join message towards X; the message reaches the existing node Z whose id is numerically closest to X, being forwarded by A via B, C, ..., to Z. X then initializes its own state tables: it obtains the i-th row of its routing table from the i-th node encountered along the route from A to Z (X's row 0 = A's row 0, X's row 1 = B's row 1, and so on); A's neighborhood set is the basis for X's neighborhood set; Z's leaf set is the basis for X's leaf set. Finally, X transmits its resulting state to the nodes that need to be aware of its arrival, and the state in all affected nodes is updated.

Pastry Maintenance (1/2)
Nodes in the Pastry network may fail or depart without warning. A Pastry node is considered failed when its immediate neighbors can no longer communicate with it. To replace a failed node in the leaf set, a node contacts the live node with the largest index on the side of the failed node and asks for that node's leaf table.

Pastry Maintenance (2/2)
To replace a failed node in the neighborhood set, a node asks other members for their neighborhood tables, checks the distance of each discovered node, and updates its neighborhood set. To repair a failed routing table entry, it contacts the node referred to by another entry in the same row and asks for that node's entry; if no live node is found in the same row, it contacts an entry in the next row.

Repair a Failed Routing Table Entry
[Figure: illustration of the routing-table repair procedure described above.]

Routing Performance
[Figures (source: Rowstron & Druschel, 2001; reproduced in the CMPT 880: P2P Systems slides, SFU): Pastry routing performance, routing with failures, and Pastry locality, measured with |L| = 16, b = 4, |M| = 32, and 200,000 lookups.]

Pastry Summary
Pastry routes to any node in the overlay network in O(log N) steps and maintains routing tables with O(log N) entries. Several applications have been built on top of Pastry: PAST uses the fileId as the key, and SCRIBE uses the topicId as the key.

References
- S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, "A Scalable Content-Addressable Network," in Proc. SIGCOMM, San Diego, CA, Aug. 2001, pp. 161-172.
- I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications," in Proc. SIGCOMM, San Diego, CA, Aug. 2001, pp. 149-160.
- A. Rowstron and P. Druschel, "Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems," in Proc. Middleware, Heidelberg, Germany, Nov. 2001, pp. 329-350.

Part II-4: Hybrid P2P
Professor Shiao-Li Tsao, Dept. of Computer Science, National Chiao Tung University

Structella (1/2)
There are two types of peer-to-peer overlays: unstructured overlays support complex queries but are inefficient, while structured overlays provide efficient discovery but do not support complex queries. Castro et al.
proposed Structella to build Gnutella on a structured overlay: it uses Pastry's overlay and maintenance algorithms but retains the content placement and discovery mechanisms of Gnutella in order to support complex queries.

Structella (2/2)
Maintenance overhead: comparing Structella with an optimized Gnutella, it is commonly believed that unstructured overlays have lower maintenance overhead than structured overlays; in contrast, Structella exploits the overlay structure to reduce overheads.

Supernode Model (1/3)
Each peer is either designated as a supernode or assigned to a supernode. A supernode acts both as a local central index for the files shared by its local peers and as an equal in a network of supernodes. Peers connect to their local supernode to upload information about the files they share and to perform searches. Supernodes are equals in search, while all peers are equals in download. Examples: FastTrack and Kazaa.

Supernode Model (2/3)
[Figure: supernodes form the upper tier of the overlay; ordinary peer nodes attach to a supernode.]

Supernode Model (3/3)
Benefits: no single point of failure.
Drawbacks: a supernode may become overloaded or be attacked; copyright infringement.

FastTrack
A proprietary and encrypted protocol used by the Kazaa, Grokster, and iMesh file-sharing programs. Presumed architecture: supernodes index data, provide search capabilities for a set of peers, and forward queries to other superpeers, with a central super-superpeer above them. FastTrack employs the UUHash hashing algorithm to allow downloading from multiple sources; the RIAA took advantage of UUHash (which hashes files very quickly but only sparsely) to spread false files on the network. Reverse-engineered clients such as giFT-FastTrack connect to the decentralized network alongside iMesh, Grokster, Kazaa, etc.

UUHash
Hash the first 300 kilobytes of the file using MD5. Then apply a smallhash function (identical to the CRC32 checksum used by PNG) to 300 KB blocks at file offsets of 2^n MB: the 300 KB at offset 1 MB, at offset 2 MB, at offset 4 MB, at offset 8 MB, and so on. Finally, the last 300 KB of the file are hashed; if the last 300 KB of the file overlap with the last block of the 2^n sequence, that block is ignored in favor of the file end block. The actual hash used on the FastTrack network is the concatenation of the 128-bit MD5 of the first 300 KB of the file and the sparse 32-bit smallhash calculated as described above; the resulting 160 bits, encoded using Base64, become the UUHash.
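The Python sketch below follows the UUHash recipe just described: MD5 over the first 300 KB, a CRC32-based smallhash over 300 KB blocks at 2^n MB offsets and over the file end, then Base64 over the concatenation. Details such as the CRC seeding, how the blocks are chained, and byte order are glossed over, so this is an illustrative sketch rather than a wire-compatible implementation.

import base64, hashlib, zlib

CHUNK = 300 * 1024                         # 300 KB

def uuhash_sketch(path):
    """UUHash-style digest of a file (illustrative, not guaranteed compatible)."""
    with open(path, "rb") as f:
        md5 = hashlib.md5(f.read(CHUNK)).digest()     # MD5 of the first 300 KB

        f.seek(0, 2)
        size = f.tell()
        tail_start = max(size - CHUNK, 0)

        smallhash = 0
        offset = 1 << 20                   # 1 MB, then 2 MB, 4 MB, ...
        while offset < size:
            if offset + CHUNK > tail_start:
                break                      # overlaps the end block: skip in its favour
            f.seek(offset)
            smallhash = zlib.crc32(f.read(CHUNK), smallhash)
            offset *= 2

        f.seek(tail_start)                 # always hash the last 300 KB of the file
        smallhash = zlib.crc32(f.read(CHUNK), smallhash)

    # 128-bit MD5 + 32-bit smallhash = 160 bits, Base64-encoded.
    return base64.b64encode(md5 + smallhash.to_bytes(4, "little")).decode()

The sparseness is visible here: for a 700 MB file only a few megabytes are ever read, which makes hashing fast but also makes it easy to craft a different file with the same UUHash, the weakness exploited to seed false files.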
Kazaa: A Brief History
Both Kazaa and the FastTrack protocol are the brainchild of Niklas Zennström and Janus Friis and were introduced in March 2001 by their Dutch company Consumer Empowerment. When the owners were sued in the Netherlands in 2001, Consumer Empowerment sold the Kazaa application to Sharman Networks, headquartered in Australia and incorporated in Vanuatu. Kazaa Media Desktop, http://www.kazaa.com/

Kazaa System Overview
Kazaa is proprietary, and few details are publicly available. Traffic on the control channel is encrypted; traffic on the download channel is not encrypted (HTTP transfers). It is a two-tier hierarchical system with two classes of peers: Ordinary Nodes (ONs) and Super Nodes (SNs), each SN acting as a mini Napster-like hub. Each supernode knows where the other supernodes are, and a list of potential supernodes is included within the software download. Query: a node first sends its query to its supernode; the supernode may forward the query to a subset of supernodes.

Kazaa Measurement
A measurement study in 2003 found roughly 30,000 SNs. The SNs form the backbone of the Kazaa network, which is self-organizing and managed with a distributed but proprietary gossip algorithm. The average supernode lifetime is about 2.5 hours. Each supernode maintains a list of SNs, and the SNs frequently exchange (possibly subsets of) these lists with each other, establishing both short-lived and long-lived TCP connections among themselves. Each SN has about 40-60 connections to other SNs and about 100 to 200 child ONs at any given time. Each SN maintains a database storing the metadata of the files its children are sharing; SNs do not exchange metadata with each other.

Kazaa Summary
The supernodes form the backbone of the Kazaa network. Kazaa combines the efficiency of a centralized search with the autonomy, load balancing, and robustness to attacks of a decentralized system; there is no central point of failure, but copyright infringement remains an issue.

References
- J. Liang, R. Kumar, and K. W. Ross, "Understanding KaZaA," submitted, 2004.
- M. Castro, M. Costa, and A. Rowstron, "Should We Build Gnutella on a Structured Overlay?" 2nd Workshop on Hot Topics in Networks (HotNets-II), November 2003.

Part II-5: Hierarchical P2P
Professor Shiao-Li Tsao, Dept. of Computer Science, National Chiao Tung University

Outline
Introduction; superpeer network; HIERAS; hierarchical P2P system

Introduction
Motivation: traditional P2P systems organize peers into a flat overlay network, yet peers differ in uptime, bandwidth connectivity, and CPU power. Large-scale P2P systems should exploit the heterogeneity of peers, and to do so, peers should be organized in a hierarchy. We introduce several hierarchical P2P systems: the superpeer network, HIERAS (a DHT-based hierarchical P2P routing algorithm), and a hierarchical peer-to-peer system.

Superpeer Network

Superpeer Network (1/3)
A P2P network consisting of superpeers and their clients. A superpeer is a node that acts both as a server to a set of clients and as an equal in a network of superpeers, keeping an index over its clients' data.

Superpeer Network (2/3)
Superpeer redundancy: a superpeer is k-redundant if there are k nodes sharing the superpeer load. Superpeer redundancy improves both the reliability and the performance of the superpeer network and reduces the load on each individual node. [Figure: no redundancy vs. 2-redundancy.]

Superpeer Network (3/3)
Layer management: superpeer P2P systems divide peers into two layers, a superlayer and a leaf layer, but they lack an appropriate mechanism for maintaining the layer-size ratio, i.e., the ratio of the number of leaf peers to the number of superpeers.
Xiao et al. propose a dynamic layer management algorithm (DLM) to adaptively adjust peers between the superlayer and the leaf layer. Step 1: information collection. Step 2: maintain an appropriate layer-size ratio. Step 3: scaled comparisons of the metric values of peers. Step 4: promotion or demotion.

HIERAS: A DHT-Based Hierarchical P2P Routing Algorithm

HIERAS
In current DHT-based routing algorithms, a routing hop can happen between two peers with a long network link latency. HIERAS is a multi-layer DHT-based P2P routing algorithm that combines a hierarchical structure with current DHT-based routing algorithms while keeping their scalability property, i.e., routing still finishes within O(log N) steps.

HIERAS Hierarchical Architecture (1/2)
Hierarchical P2P layers: many P2P rings coexist in different layers. In each P2P layer, all the peers are grouped into several disjoint P2P rings. The biggest P2P ring is in the highest layer and contains all the peers; topologically adjacent peers are grouped into other P2P rings in lower layers. A peer must belong to one P2P ring in each layer: if the hierarchy depth is k, a peer belongs to k P2P rings, one in each layer. The average link latency between two peers in a lower-level ring is much smaller than in a higher-level ring.

HIERAS System Example
[Figure: three layer-2 P2P rings P1, P2, and P3; P is the biggest layer-1 ring; node A is a member of P3 and also belongs to P.]

HIERAS Hierarchical Architecture (2/2)
The distributed binning scheme is used to determine which rings a new node should be added to. The scheme is a topology measurement mechanism: a well-known set of machines serves as landmark nodes, and nodes partition themselves into bins such that nodes falling within a given bin are relatively close to each other in terms of network latency (ping).

Example
[Figure: the order information is created according to the measured latencies to the four landmarks L1, L2, L3, and L4, using level 0 for latencies in [0, 20] ms, level 1 for latencies in (20, 100] ms, and level 2 for latencies larger than 100 ms.]

HIERAS System Design (1/4)
HIERAS is a multi-layer DHT-based routing algorithm built on top of an existing DHT routing algorithm. Each node or file is given a unique identifier generated with a hash algorithm; Chord is used as the underlying routing algorithm. Assume the identifier length is k bits and the hierarchy depth is m.

HIERAS System Design (2/4)
Finger tables: each node has a finger table with at most k entries. The highest-layer finger table in HIERAS is the same as the Chord finger table. Each node in HIERAS creates m-1 additional finger tables, one for each lower-layer P2P ring it belongs to; only the peers within the corresponding lower-layer ring can be put into such a finger table. Example: a two-layer system with 3 landmarks and a 2^8 identifier namespace.

HIERAS System Design (3/4)
[Figure: the layer-1 successor nodes can be chosen from all system peers, whereas all layer-2 successor nodes belong to the layer-2 P2P ring "012", the ring of node 121.]

HIERAS System Design (4/4)
Landmark table: records the IP addresses of all landmark nodes. Ring table: used to maintain information about the different P2P rings and to find a node in a particular P2P ring when a new node is added to the system. A ring table is defined by the landmark order information (such as "012"), stored on the node whose nodeid is numerically closest to the ringid (generated by hashing the ring name), and duplicated on several nodes for fault tolerance.
HIERAS Routing Algorithm
The higher the layer, the more nodes are included and the closer the routing can get to the destination. In an m-layer HIERAS system, a routing procedure has m loops: at the lowest layer, route to the closest node within the ring in which the request originator is located; then move up one layer, do the same thing, and repeat. The routing eventually reaches the biggest P2P ring and the destination node, and the location information of the requested file is returned to the originator.

HIERAS Routing Example
[Figure: a layer-1 ring with 100 ms latency per hop and three layer-2 rings with 25 ms latency per hop; the request originator and the destination node are in different layer-2 rings. Assuming 4 routing hops, original Chord needs 4 x 100 ms, whereas HIERAS needs 25 ms + 3 x 100 ms.]

HIERAS Node Join (1/2)
When a new node n joins, it sends a join message to a nearby node n'. Node n gets information about the landmarks from n' and creates its landmark table, then uses the distributed binning scheme to determine the suitable P2P rings (ring names) it should join. It creates the highest-layer finger table by asking n' to look up the fingers in the whole P2P overlay network.

HIERAS Node Join (2/2)
Creating the finger tables in lower layers: assume node c stores the ring table of a specific ring and node p is a node in that ring. Node n calculates the ringid of this ring and sends a ring-table request message to c using the highest-layer finger table; c responds with the stored ring table, so n learns several nodes in this ring. n then sends a finger-table creation request to p; p updates its own table, creates the finger table for n, and sends it back to n. n compares its nodeid with the nodeids in the ring table, and if n should replace one of them, it sends a ring-table modification message back to c, which modifies the ring table. These procedures are repeated m times in an m-layer HIERAS system.

HIERAS Maintenance
A node may leave or fail silently. As in the Chord algorithm, a node keeps a successor list of its r nearest successors in each layer. The maintenance overhead is affordable because the nodes within the low-layer rings are topologically close.

HIERAS Performance Evaluation (1/3)
[Figure: both Chord and HIERAS show good scalability.]

HIERAS Performance Evaluation (2/3)
[Figure: the average routing latency per hop in HIERAS is much smaller than in Chord.]

HIERAS Performance Evaluation (3/3)
HIERAS takes 1.55% more hops per request than Chord on average, and only 1.887% of the hops per request are taken in the higher layer, yet the average routing latency in HIERAS is only 54.07% of Chord's.

HIERAS Summary
HIERAS creates many small P2P rings in different layers by grouping topologically adjacent nodes together. The hierarchical structure can improve the routing performance of current DHT routing algorithms.
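The distributed binning step used at node join can be sketched in a few lines of Python: a node measures its latency to each landmark, maps each latency to a level, and the resulting string of levels is the name of the lower-layer ring it joins. The latency thresholds follow the example above; the function names and the sample latencies are invented for illustration.

LEVELS = [(20, "0"), (100, "1")]        # [0,20] ms -> level 0, (20,100] ms -> level 1

def latency_level(latency_ms):
    for bound, level in LEVELS:
        if latency_ms <= bound:
            return level
    return "2"                          # anything above 100 ms -> level 2

def ring_name(latencies_ms):
    """latencies_ms: measured latency to each landmark, in landmark order."""
    return "".join(latency_level(t) for t in latencies_ms)

if __name__ == "__main__":
    # A node 15 ms from L1, 40 ms from L2, 230 ms from L3, and 95 ms from L4
    print(ring_name([15, 40, 230, 95]))   # "0121": it joins the layer-2 ring "0121"

Because two nodes with similar latencies to all landmarks get the same ring name, the scheme groups topologically close peers into the same low-layer ring without any global coordination.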
Hierarchical P2P System

Hierarchical P2P System (1/2)
The Internet uses hierarchical routing and gains several benefits from it, such as scalability and administrative autonomy. In contrast, Chord, CAN, Pastry, and Tapestry are all flat DHT designs without hierarchical routing. Garces-Erice et al. present a framework for hierarchical DHTs: peers are organized into groups, each group has its own autonomous intra-group overlay network and lookup service, and a top-level overlay is defined among the groups.

Hierarchical P2P System (2/2)
Several advantages compared to flat overlay networks: exploiting heterogeneous peers (the more stable peers go into the top-level overlay); transparency (keys can be moved and the intra-group lookup algorithm changed without affecting other groups); faster lookup time (the number of groups is much smaller than the total number of peers); and fewer messages in the wide area (most overlay reconstruction messages happen inside groups).

Hierarchical Framework (1/6)
The framework focuses on a two-tier hierarchy. The peers are organized into groups, and peers in the same group are topologically close. The groups are organized into a top-level overlay network. Each group has one or more superpeers, which act like gateways and are used for inter-group query propagation. Within each group there is an overlay network used for query communication among the peers of the group, and each group operates autonomously from the other groups. The peers send lookup query messages to each other using this hierarchical overlay network.

Hierarchical Framework (2/6)
[Figure: groups of peers, each with superpeers, interconnected by a top-level overlay.]

Hierarchical Framework (3/6)
Hierarchy and group management: a joining peer p that belongs to group g contacts another peer p' to look up p's group using key g. If the group id of the returned superpeer is g, p joins the group and notifies the superpeers of its capability; if it is not g, a new group g is created and p is its only superpeer. Superpeers are the most stable and powerful nodes of a group. The superpeers keep an ordered list of superpeer candidates, and this list is sent periodically to the regular peers of the group. When a superpeer fails or disconnects, the first regular peer in the list becomes a superpeer, joins the top-level overlay, and informs all peers in its group and the superpeers of the neighboring groups.

Hierarchical Framework (4/6)
Hierarchical lookup service: the querying peer sends its query message to one of the superpeers in its group; the top-level overlay first determines the group responsible for the key; the responsible group then uses its intra-group overlay to determine the specific peer that is responsible for the key.

Hierarchical Framework (5/6)
Intra-group lookup: at the intra-group level, different groups can use different overlays. For a small group, the intra-group lookup can take O(1) steps when CARP is used within the group; for a large group, it can take O(log M) steps when a DHT overlay structure is used within the group, where M is the number of peers in the group.

Hierarchical Framework (6/6)
Cooperative caching: peers in the same group are expected to be topologically close and interconnected by high-speed links. To reduce file transfer delays, files are cached in the groups where they have previously been requested. When peer p wants to obtain the file with key k, it first uses the intra-group lookup to find the peer p' that would be responsible for k; if p' has a local copy, it returns the file to p; otherwise p' uses the hierarchical DHT to obtain the file, caches a copy, and forwards the file to p.
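The two-step lookup can be sketched as follows in Python. Plain dictionaries stand in for the top-level and intra-group overlays, and the group names, peer names, and helper functions are invented; in a real deployment each tier would run Chord, CAN, Pastry, or CARP as described above, with the query relayed through one of the group's superpeers.

import hashlib

def h(value, bits=16):
    """Map a string onto a small identifier space."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:4], "big") % (2 ** bits)

def clockwise_successor(ids, key):
    """Chord-style assignment: first identifier equal to or following the key."""
    return min(ids, key=lambda i: (i - key) % (2 ** 16))

def hierarchical_lookup(groups, key):
    """groups: {group_id: {"superpeers": [...], "peers": {peer_id: peer_name}}}.
    Step 1: the top-level overlay picks the group responsible for the key.
    Step 2: that group's intra-group overlay picks the responsible peer."""
    k = h(key)
    group_id = clockwise_successor(groups.keys(), k)          # top-level lookup
    group = groups[group_id]                                   # reached via a superpeer
    peer_id = clockwise_successor(group["peers"].keys(), k)   # intra-group lookup
    return group_id, group["peers"][peer_id]

if __name__ == "__main__":
    groups = {
        h("isp-east"): {"superpeers": ["p3"], "peers": {h("p1"): "p1", h("p3"): "p3"}},
        h("isp-west"): {"superpeers": ["p7"], "peers": {h("p7"): "p7", h("p9"): "p9"}},
    }
    print(hierarchical_lookup(groups, "some-file.mp3"))

The top-level step touches only as many nodes as there are groups, which is why the hierarchical lookup is faster and sends fewer wide-area messages than a flat DHT over all peers.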
Chord Instantiation (1/3)
A particular instantiation of the two-tier hierarchy uses Chord for the top-level overlay. It is a regular Chord ring: each peer and each key has an m-bit id, the ids are ordered on a circle modulo 2^m, and a key is assigned to its successor. The successor, the predecessor, and the fingers make up the finger table.

Chord Instantiation (2/3)
Inter-group Chord ring: in the top-level overlay network, each node is a group of peers. Each node in the top-level Chord ring has a predecessor vector and a successor vector, which hold the IP addresses of superpeers; each finger is also a vector. When the superpeers of a group change, the group immediately updates the vectors held by its predecessor and successor groups, whereas the fingers are updated lazily, when invalid references are detected. To route a request to a group, a node chooses a random superpeer from the vector and forwards the request to it.

Chord Instantiation (3/3)
[Figure: analysis with P peers and I groups, where a stable peer fails with probability p_s and an unstable peer fails with probability p_r.]

Hierarchical P2P System: Summary
The hierarchical organization improves overall system scalability and offers various advantages over a flat organization. By gathering peers into groups based on topological proximity, it generates fewer messages in the wide area and can significantly improve lookup performance.

References
- B. Yang and H. Garcia-Molina, "Designing a Super-Peer Network," in Proceedings of the 19th IEEE International Conference on Data Engineering (ICDE'03), pp. 49-60, 5-8 March 2003.
- L. Xiao, Z. Zhuang, and Y. Liu, "Dynamic Layer Management in Superpeer Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 11, pp. 1078-1091, Nov. 2005.
- Z. Xu, R. Min, and Y. Hu, "HIERAS: A DHT Based Hierarchical P2P Routing Algorithm," in Proceedings of the 2003 International Conference on Parallel Processing (ICPP'03), pp. 187-194.
- L. Garces-Erice, E. W. Biersack, P. A. Felber, K. W. Ross, and G. Urvoy-Keller, "Hierarchical Peer-to-Peer Systems," in Proceedings of the ACM/IFIP International Conference on Parallel and Distributed Computing (Euro-Par 2003), August 26-29, 2003.