Peer-to-Peer Networks
    Yuh-Jzer Joung
    2003/12/31

What is a Peer-to-Peer Network?
    A network in which all nodes cooperate with each other to provide specific functionality with minimal centralized services
        No distinction between client and server.
        All nodes are potential users of a service AND potential providers of a service.
    From a functional perspective, it is about sharing storage, cycles, bandwidth, contents, etc.
    [Figure: Client/Server vs. Peer-to-Peer]

    P2P has re-invented the history of the music industry (Napster) and communication (ICQ)
    P2P has helped in the search for a cancer cure, the search for aliens, the cracking of DES, and more!
    P2P is more than a file-swapping network
    Potential sharing of resources:
        files, storage, CPUs, bandwidth, information

Facts about P2P
    More than 200M hosts are now connected to the Internet
    ICQ now handles more than 100M accounts
    Napster hosted multiple terabytes of MP3s with thousands of users online
    SETI@home now analyzes signals from space with 4.7M hosts around the world
    The Morpheus network now has more than 1400 TB of data and 1M users at any time!

Potential Applications
    Massively parallel computation
    Content delivery network
    Application services platform
        Web services
        Network storage services, etc.
    Ad hoc mobile network

Important Social Implication
    When we share with others, others share with us
    We can actually form something MUCH MUCH BIGGER than we thought in the beginning!
    [Figure: Arecibo Radio Telescope; SETI@home]
Napster: The P2P epic
    Founded in June 1999 by Shawn Fanning, a 19-year-old Northeastern University dropout
    Immensely successful
        Success due to its ability to create and foster an online community
        Had 640,000 users at any given moment in November 2000
        More than 60 million people, including an estimated 73% of US students, had downloaded the software and songs in less than one year.
        At the peak, 1 million new users per week
        Universities began to ban Napster because of its overwhelming bandwidth consumption
    Battles with RIAA
        RIAA sues Napster in Dec. 1999, asking $100K per song copied
        Court orders Napster to shut down in July 2001
        Napster files for bankruptcy in June 2002
    Technically not interesting

P2P brief history
    June 1999: Shawn Fanning starts Napster.
    Dec. 1999: RIAA sues Napster for copyright infringement.
    July 2001: Napster is shut down!
    Three branches grow out of Napster:
        Napster clones: KaZaA, Gnutella, Morpheus, MojoNation, ……
        Academic research: CAN, Chord, Pastry, Tapestry, Hypercub, Skip graphs, ……
        Distributed computing: SETI@home, folding@home, ……

Classification of P2P
    Centralized Systems
        A central server assists administration
            Napster, SETI@home
    Pure Distributed Systems
            Gnutella, Freenet
            CFS/Chord, Tapestry, Pastry
    Hybrid Systems
            Morpheus, KaZaA

P2P Challenge: Locating and Routing
    How to get to another node to find a particular piece of data, considering the following:
        Dynamics: Nodes can join and leave at any time
        Scalability: There may be millions of hosts

P2P with a centralized server
    [Figure: each peer registers with the Napster server (I have “love.mp3”); the server keeps an index table (Peer 1 has …, Peer 2 has …, Peer 3 has …); a client asks the server “Who has …”]
    Centralized design, with bottlenecks and vulnerability
    Simple implementation
    Low scalability

SETI@home
    Purpose:
        uses idle CPU cycles on ordinary PCs for massively parallel analysis of extraterrestrial radio signals
        central SETI@home server distributes data, similar to analyses done locally by SETI@home screen saver
    [Figure: Arecibo Radio Telescope; SETI@home]
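The index table on the centralized-server slide can be illustrated with a small in-memory sketch. This is not Napster's actual protocol; the class name `CentralIndex`, its methods, and the peer address strings are all hypothetical and chosen only to mirror the "register" and "Who has" arrows in the figure.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for the central index server (hypothetical class)."""

    def __init__(self):
        # file name -> set of peer addresses that registered it
        self._holders = defaultdict(set)

    def register(self, peer, file_names):
        """A peer announces the files it is willing to share."""
        for name in file_names:
            self._holders[name].add(peer)

    def unregister(self, peer):
        """Drop every entry for a peer that has gone offline."""
        for holders in self._holders.values():
            holders.discard(peer)

    def who_has(self, file_name):
        """Answer a 'Who has ...?' query with a sorted list of peers."""
        return sorted(self._holders.get(file_name, ()))


index = CentralIndex()
index.register("peer1", ["love.mp3", "blue.mp3"])
index.register("peer2", ["love.mp3"])
print(index.who_has("love.mp3"))
```

Only the lookup goes through the server; the file transfer itself would happen directly between peers. The single `CentralIndex` object is also the bottleneck and single point of failure that the slide warns about.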
P2P Structure: KaZaA, Morpheus
    Has a centralized server that maintains user registrations, logs users into the system to keep statistics, provides downloads of client software, and bootstraps the peer discovery process.
    Two types of peers:
        Supernodes (fast CPU + high-bandwidth connections)
        Nodes (slower CPU and/or connections)
    Supernode addresses are provided in the initial download. Supernodes also maintain searchable indexes and proxy search requests for users.

Architecture
    [Figure: peer 1 (File 1, File 2, File 3) performs its initial download from the server, sends “Search File 4” to a supernode, receives a search response, and then sends “Get File 4” directly to peer 2 (File 4, File 5). Peer 3 holds File 6, File 7, File 8. Further peers attach to other supernodes.]
    Index Table (held by the supernode):
        Peer 1: File 1, 2, 3
        Peer 2: File 4, 5
        Peer 3: File 6, 7, 8
    Supernodes act as regional index servers
    They communicate by broadcasting

P2P Details
    On initial registration, the client may be provided with a list of more than one supernode.
    Supernodes are “elected” by the central server – users can decline.
    Supernodes can come and go, so links may fail over
    File transfers use the HTTP protocol and port 1214 (the KaZaA port).

P2P – Consequences
    Huge bandwidth hog
    Potential for the original client and/or downloaded files to be a Trojan.
        KaZaA clients come complete with a Trojan from Brilliant Digital Entertainment.
        3D advertising technology + node software that can be controlled by Brilliant Digital.
        Intent is to use the massed horsepower to host and distribute content belonging to other companies for a fee.
        With the user’s permission of course – opt out basis (not opt in!).
        Content to include advertising, music, video – anything digital.
        Also have mentioned tapping unused cycles to do compute work.

P2P Survives By Legal Maneuvering
    March 2001: Kazaa is founded by Niklas Zennstrom and Janus Friis in a Dutch company called Consumer Empowerment
    The software is based upon their FastTrack P2P Stack, a proprietary algorithm for peer-to-peer communication
    Kazaa licenses FastTrack to Morpheus and Grokster
    Oct. 2001: MPAA and RIAA sue Kazaa, Morpheus and Grokster
    Nov. 2001: Consumer Empowerment is sued in the Netherlands by the Dutch music publishing body, Buma/Stemra. The court orders KaZaA to take steps to prevent its users from violating copyrights or else pay a heavy fine.
    Jan. 2002: Zennstrom and Friis sell the Kazaa software and website to Sharman Networks, based in Vanuatu, an island nation in the Pacific, but operating out of Australia
    Feb. 2002: Kazaa cuts off Morpheus clients from FastTrack
    April 2002: Sharman Networks agrees to let Brilliant Digital bundle its own stealth P2P application, called AltNet, within KaZaA. This network would be remotely switched on, allowing KaZaA users to trade Brilliant Digital content throughout FastTrack
Industry Countermeasures
    At any time, 3 million people are using KaZaA, sharing 500 million files (October 2002)
    MPAA and RIAA are using 3 “countermeasures” to stop this:
        Sue the network operators/software creators out of
            Napster, Scour, Aimster, Audio Galaxy, Grokster/Morpheus/KaZaA, Replay TV…
        Berman Bill Style “Self help”
            Denial of service attacks against the network
                Insert bogus files (Universal and Overpeer)
                Falsify replies and misdirect queries
            Denial of service attacks against users
                Overload particular nodes

Industry Countermeasures (cont.)
    Sue end users and ISPs
        Danish Anti Pirat Gruppen issued fines of up to $14,000 to approximately 150 KaZaA and eDonkey users for uploading copyrighted material.
        In the US, civil penalties are a minimum of $750 per song, with criminal penalties as high as $250,000 and five years in prison under the NET Act.
        ISPs can be sued under the DMCA (the Digital Millennium Copyright Act) unless they notify offending subscribers and take down infringing material.
            MPAA sent 54,000 cease and desist letters to ISPs in 2001, with 100,000 expected in 2002. 90% result in ISPs taking action.

P2P Music May Be Watching You
    MPAA uses services like Ranger Online to:
        Identify users
            By collecting IP addresses of uploaders and downloaders
        Identify the content users are sharing
        Collect evidence for notice and takedown
            By downloading the file and logging the time and location

Fully Distributed P2P Systems
    Case Study
        Freenet
        Gnutella
        CAN

Freenet
    A project led by Ian Clarke, originating in his 4th-year project at the University of Edinburgh
    Philosophy
        One should be able to publish and obtain information on the Internet without fear of
    The result
        A Distributed Anonymous Information Storage and Retrieval System

Design Goals
    Anonymity for both producers and consumers of information
    Deniability for stores of information
    Resistance to attempts by third parties to deny access to information
    Efficient dynamic storage and routing of
    Decentralization of all network functions
Protocol Detail: Retrieving Data
    [Figure: a request starts at node A and travels among nodes B, C, D and F in twelve numbered steps. Labels in the figure: “Data Request”, “3 Request Failed”, “B detects a loop and so replies failed”, “10 Data Reply”.]
    Basically a depth-first search!

Protocol Detail: Messages
    Transaction ID
        Randomly generated 64-bit identifier
    IP address and port number
    Hops-to-live
        Decremented at each hop to prevent indefinite routing
        Messages will still be forwarded with some probability even though the hops-to-live value has reached 1, so as to prevent an attacker from tracing the requester’s location.
    Depth counter
        Incremented at each hop to let a replying node set hops-to-live high enough to reach the requester.
        Requesters should initialize it to a small random value to obscure their location.
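The per-hop handling of the hops-to-live and depth fields can be sketched as follows. This is a minimal illustration, not Freenet's wire format: the `Message` class, the `next_hop` function, the forwarding probability of 0.25, and the initial depth range of 1 to 4 are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Message:
    hops_to_live: int
    # Requesters start depth at a small random value (assumed range 1..4).
    depth: int = field(default_factory=lambda: random.randint(1, 4))
    # Randomly generated 64-bit transaction ID.
    transaction_id: int = field(default_factory=lambda: random.getrandbits(64))


def next_hop(msg, keep_alive_probability=0.25, rng=random):
    """Return the message as it should be forwarded, or None to drop it."""
    if msg.hops_to_live > 1:
        # Normal case: one hop consumed, one hop further from the requester.
        return replace(msg, hops_to_live=msg.hops_to_live - 1, depth=msg.depth + 1)
    # Hops-to-live has reached 1: still forward with some probability,
    # so an observer cannot infer how far away the requester is.
    if rng.random() < keep_alive_probability:
        return replace(msg, depth=msg.depth + 1)
    return None


request = Message(hops_to_live=3)
forwarded = next_hop(request)
print(forwarded.hops_to_live, forwarded.depth - request.depth)
```

A replying node could then use the accumulated `depth` as the hops-to-live of its reply, which is the purpose of the depth counter described on the slide.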

Neighbor selection in routing
    A description of the file to be requested (e.g., its file name) is hashed to obtain a file key that is attached to the request message.
    A node receiving a request for a file key checks whether its own store has the file. If so, it returns the file. Otherwise, it forwards the request to the node in its routing table that has the most ‘similar’ key to the requested one.
    If that node cannot successfully (and recursively!) find the requested file, then a second candidate (again, according to key similarity) from the routing table is chosen, and so on, until either the file is found or a request-failed message is returned.

Neighbor selection in routing (cont.)
    Once a copy of the requested file is found, it is sent to the requester along the search path. Each node on the path caches a copy of the file and creates a new entry in its routing table associating the actual data source with the requested key.
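The routing rule on the two slides above amounts to a depth-first search ordered by key similarity, with caching on the way back. The sketch below makes several assumptions that are not in the slides: key similarity is taken to be numeric distance between SHA-1 values, loop detection uses a shared `visited` set instead of transaction IDs, and the `Node` class and `file_key` helper are hypothetical names.

```python
import hashlib


def file_key(description):
    """Hash a file description (e.g. its name) into an integer key."""
    return int.from_bytes(hashlib.sha1(description.encode("utf-8")).digest(), "big")


class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}      # key -> file contents held locally
        self.routing = {}    # key -> Node believed to be a source for that key

    def request(self, key, hops_to_live, visited=None):
        """Depth-first search; returns (data, source) or None on failure."""
        visited = set() if visited is None else visited
        if self.name in visited or hops_to_live <= 0:
            return None                      # loop detected or out of hops
        visited.add(self.name)
        if key in self.store:
            return self.store[key], self
        # Try neighbours in order of key similarity (assumed: numeric distance).
        for known_key in sorted(self.routing, key=lambda k: abs(k - key)):
            neighbour = self.routing[known_key]
            found = neighbour.request(key, hops_to_live - 1, visited)
            if found is not None:
                data, source = found
                self.store[key] = data       # cache a copy on the return path
                self.routing[key] = source   # remember the actual data source
                return found
        return None                          # every candidate failed


a, b, c = Node("A"), Node("B"), Node("C")
key = file_key("love.mp3")
c.store[key] = b"song bytes"
a.routing[file_key("some other file")] = b
b.routing[file_key("yet another file")] = c
data, source = a.request(key, hops_to_live=5)
print(source.name, key in a.store, key in b.store)
```

After the request succeeds, both A and B hold a cached copy and a routing entry pointing at C, which is how popular files move closer to requesters.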

P2P Observations
    Quality of routing should improve over time
        Nodes should come to specialize in locating similar keys.
        Nodes should become similarly specialized in storing files having similar keys.
    Popular data will be replicated and mirrored closer to requesters.
    Connectivity increases as requests are processed.
    Note that files with similar hashed keys are not necessarily similar in content.
        The failure of a crucial node therefore cannot cause a particular subject to become extinct.

Storing Data
    Newly inserted files are selectively placed on nodes already possessing files with similar keys
    An insert basically follows the same depth-first-like search to see whether there is a key collision. If so, the colliding node returns the pre-existing file as if a request had been made. If there is no collision, the file is placed on each node in the search path.
        The nodes also update their routing tables and associate the inserter as the data source with the new key.
        For security reasons, any node along the way can unilaterally decide to change the insert message and claim itself or another arbitrary node as the data source.
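The insert procedure under Storing Data can be sketched in a few lines. To keep the example short, each node here has a single onward link instead of a routing table ordered by key similarity, and the optional rewriting of the data source by intermediate nodes is left out; the `Node` class and `insert` function are hypothetical names, not Freenet code.

```python
class Node:
    """Toy node with a single onward link (a simplification of a routing table)."""

    def __init__(self, name, next_node=None):
        self.name = name
        self.next_node = next_node
        self.store = {}     # key -> file contents
        self.routing = {}   # key -> node recorded as the data source


def insert(start, key, data, hops_to_live):
    """Try to insert data under key; return the file that ends up stored."""
    path = []
    node = start
    while node is not None and hops_to_live > 0:
        if key in node.store:
            # Collision: the pre-existing file travels back along the path,
            # as if it had been requested.
            existing = node.store[key]
            for visited in path:
                visited.store[key] = existing
                visited.routing[key] = node
            return existing
        path.append(node)
        node = node.next_node
        hops_to_live -= 1
    # No collision within hops-to-live: place the file on every node visited.
    for visited in path:
        visited.store[key] = data
        visited.routing[key] = start
    return data


c = Node("C")
b = Node("B", c)
a = Node("A", b)
insert(a, 42, b"real file", hops_to_live=5)
d = Node("D", a)
print(insert(d, 42, b"spurious file", hops_to_live=5))
```

The second call shows the collision case: the spurious insert fails, and the original file ends up on one more node than before.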
Storing Data (cont.)
    Newly inserted files are selectively placed on nodes already possessing files with similar keys.
    New nodes can use inserts as a supplementary means of announcing their existence to the rest of the network.
    An attempt to supply spurious files is likely to simply spread the real file further, as the original file is propagated back on collision.

Managing Data
    Node storage is managed as an LRU (Least Recently Used) cache.
    No file is eternal! A file could disappear from the network if it has not been requested for a long period of time.
    Inserted files are encrypted so that node operators can ‘claim’ innocence of possessing some controversial files.
        A requester who knows the actual file name can use that information to decrypt the encrypted file.
            Vulnerable to dictionary attack
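The LRU policy described under Managing Data can be sketched with Python's `OrderedDict`. The class name `LRUStore` and the choice to measure capacity in number of files (not bytes) are assumptions made for the example.

```python
from collections import OrderedDict


class LRUStore:
    """Fixed-capacity node store that evicts the least recently used file."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._files = OrderedDict()   # key -> data, least recently used first

    def get(self, key):
        """Return the file for key, or None if this node no longer has it."""
        if key not in self._files:
            return None
        self._files.move_to_end(key)  # a request refreshes the file
        return self._files[key]

    def put(self, key, data):
        """Store a file, evicting the least recently used ones if full."""
        self._files[key] = data
        self._files.move_to_end(key)
        while len(self._files) > self.capacity:
            self._files.popitem(last=False)


store = LRUStore(capacity=2)
store.put("a", b"1")
store.put("b", b"2")
store.get("a")          # "a" is now more recent than "b"
store.put("c", b"3")    # evicts "b"
print(store.get("b"))
```

A file that is never requested drifts to the front of the ordering and is eventually evicted, which is why no file is eternal.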

Freenet Naming
    Hierarchical name system
        Files are identified by the hash of their filenames
        Cannot have multiple files with the same name
        A global single-level namespace is not desirable, since malicious users can engage in “key-squatting”
    Two-level namespace
        Each user has their own directory

Freenet File Export
    Consider exporting a file with the name “My life.mp3”
        Compute a public/private key pair from the name using a deterministic algorithm
    File is encrypted with the hash of the public key
        Goal is not to protect data – the file contents should be visible to anyone who knows the original keyword
        Goal is to protect site operators – if a file is stored on your system, you have no way of decrypting its contents
    File is signed with the private key
        Integrity check (though not a very strong one)

Freenet Directories
    Two-level directories
        Users can create a signed subspace
        Akin to creating a top-level directory per user
        Subspace creation involves generating a public/private key pair for the user
        The hash of the user’s public key is XORed with the hash of the file public key, and the result is hashed again to yield the file encryption key
    For retrieval, you need to know the user’s public key and the file’s original name

Adding Nodes
    A node connecting to the network must obtain an existing node’s address through out-of-band means
    Once connected, a new-node message is propagated to randomly selected, connected nodes so that existing nodes learn of the new node’s existence
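The hash, XOR, rehash step described under Freenet Directories can be illustrated with `hashlib`. This is one reading of the slide, not a reference implementation: the use of SHA-1, the function name `subspace_file_key`, and the byte strings standing in for real public keys are all assumptions, and how the file public key is derived from the file name is outside the sketch.

```python
import hashlib


def _sha1(data):
    """Return the raw 20-byte SHA-1 digest of data."""
    return hashlib.sha1(data).digest()


def subspace_file_key(user_public_key, file_public_key):
    """Hash both keys, XOR the two digests, then hash the result again."""
    user_hash = _sha1(user_public_key)
    file_hash = _sha1(file_public_key)
    mixed = bytes(u ^ f for u, f in zip(user_hash, file_hash))
    return _sha1(mixed).hex()


# Placeholder byte strings stand in for real public keys.
print(subspace_file_key(b"alice-public-key", b"key-derived-from-My life.mp3"))
```

Because the user's key is mixed into the result, two users can publish files with the same name without colliding, which is the point of the two-level namespace.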
Freenet Summary
    Advantages
        Totally decentralized architecture
            robust and scalable
    Disadvantages
        Does not always guarantee that a file is found, even if the file is in the network

    Fully distributed P2P networks are essentially an overlay over the Internet
    Neighbors in the overlay may not necessarily reflect proximity

Case Study: GNUTELLA
    Peer-to-Peer Storage Space Sharing System

Gnutella History
    Originally conceived by Justin Frankel, the 21-year-old founder of Nullsoft
    March 2000: Nullsoft posts Gnutella to the web
    A day later, AOL removes Gnutella at the behest of Time Warner
    There are multiple open source implementations
    The Gnutella protocol has been widely analyzed

    Gnutella was developed in early 2000 by Justin Frankel's Nullsoft. Nullsoft was acquired by AOL, which stopped the project.
    However, the application had already been distributed as open source, so different versions still exist.
    Gnutella has now become a protocol.

Design Goals
    Ability to operate in a dynamic environment
    Performance and scalability
    Reliability
    Message broadcasting for node discovery and search requests (call-and-response protocol)
    Forming the overlay network; connecting: join via one of the “several known hosts”
    User data transfer: store and forward using HTTP

Protocol Definition
    A new node (called a servent in Gnutella) joins a system by connecting to a known host.
    Communication between servents:
        A set of descriptors used for communicating data between servents
        A set of rules governing the inter-servent exchange of descriptors

Defined Descriptors

    Descriptor   Description
    Ping         Used to actively discover hosts on the network
    Pong         The response to a Ping
    Query        The primary mechanism for searching the distributed network
    QueryHit     The response to a Query
    Push         A mechanism that allows a firewalled servent to contribute file-based data to the network

Node Join/Discovery
    [Figure: G-Nodes exchanging Ping messages]
    A new node connects to one of several known hosts.
    PING/PONG messages are flooded; the broadcasting range is limited by a TTL counter.
    Each node keeps a short-term memory of messages already seen, which prevents re-broadcasting; GUIDs distinguish messages.
    To cope with the dynamic environment, a node periodically PINGs its neighbors to discover other participating nodes.
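The flooding rules above (TTL-limited broadcast, plus a short-term memory of GUIDs to suppress duplicates) can be sketched as follows. This is a minimal simulation under assumptions, not the Gnutella wire protocol: the overlay is a plain adjacency dictionary, the function name `flood_ping` and the example graph are hypothetical, and a TTL of 1 is taken to mean that the message reaches direct neighbors only.

```python
import uuid
from collections import deque


def flood_ping(graph, origin, ttl):
    """Flood one Ping from `origin`; return the set of nodes that saw it."""
    guid = uuid.uuid4().hex               # GUID that identifies this message
    seen = {origin: {guid}}               # per-node memory of GUIDs already seen
    queue = deque([(origin, ttl, None)])  # (node, remaining TTL, previous hop)
    while queue:
        node, hops_left, came_from = queue.popleft()
        if hops_left == 0:
            continue                      # TTL exhausted: do not forward further
        for neighbor in graph[node]:
            if neighbor == came_from:
                continue                  # never send a message straight back
            if guid in seen.setdefault(neighbor, set()):
                continue                  # duplicate: drop instead of re-broadcasting
            seen[neighbor].add(guid)
            queue.append((neighbor, hops_left - 1, node))
    return {name for name, guids in seen.items() if guid in guids}


# Hypothetical five-node overlay (assumption, not taken from the slides).
overlay = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b", "e"],
    "e": ["d"],
}
reached = flood_ping(overlay, "a", ttl=2)
```

With `ttl=2` the Ping from `a` stops at `d`, so `e` stays undiscovered until a later Ping with a larger horizon.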

Search Query/Download
    [Figure: Node A, Node B, and G-Nodes exchanging a query]
    1) Node A asks Node B for data.
    2) B keeps a record that A initiated the request.
    3) B forwards the request to its neighbors.
    4) These return any matching info.
    5) B looks up the source of the request.
    6) B returns the matching info to A.
    7) A may then initiate the download.

    Search: Query/Query-Response (flooding, i.e. breadth-first search)
    Download: GET/PUSH (direct transmission)

    Unstable/loose connectivity of the servents
        Performance management is difficult
    Scalability: e.g. TTL=10, and every node broadcasts each message to six others
        A problem in huge networks
    Low TTL means a low search horizon
    Denial-of-service attacks
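To put a number on the scalability example (TTL=10, each node forwarding to six others), the sketch below computes an upper bound on the messages generated by one query. It assumes a tree-like overlay in which every forward reaches a node that has not seen the query yet; real overlays contain cycles, so duplicate suppression keeps the actual count lower. The function name is hypothetical.

```python
def flood_message_bound(fanout, ttl):
    """Upper bound on messages for one flooded query.

    Hop h contributes fanout**h messages when no node is reached twice.
    """
    return sum(fanout ** hop for hop in range(1, ttl + 1))


print(flood_message_bound(6, 10))  # 72559410
```

Under these assumptions a single query can trigger roughly 72 million messages, which is why the slide flags flooding as a problem in huge networks.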
Free riding on Gnutella

    The top (hosts)   Share (files)   As percent of the whole
    1%                1,142,645       37%
    5%                2,182,087       70%
    10%               2,692,082       87%
    20%               3,037,232       98%
    25%               3,082,572       99%

    Source: E. Adar, B. Huberman, Free riding on Gnutella, First Monday, Vol. 5-10, October 2, 2000.

Map of the Gnutella Network
    [Figure: map of the Gnutella network]

Gnutella Host Count - 2001
    [Figure: Gnutella host count, 2001]

Gnutella Host Count - 2003
    [Figure: Gnutella host count, 2003]

Gnutella Network Analysis
    Reference: Matei Ripeanu, Peer-to-Peer Architecture Case Study: Gnutella Network, in Proceedings of the IEEE 1st International Conference on Peer-to-Peer Computing.
    Research conducted in 2000-2001.

Findings
    More than 40,000 hosts were found.
    The number of connected components is relatively small
        The largest connected component includes more than 95% of the active nodes.
        The 2nd largest connected component usually has fewer than 10 nodes.
    The dynamic graph structure
        40% of the nodes leave the network in less than 4 hrs
        25% are alive for more than 24 hrs.
Node Connectivity
    [Figure: node connectivity] Source: [Ripeanu2001]

Connectivity distribution
    [Figure: connectivity distribution] Source: [Ripeanu2001]

Power-Law Networks
    In a power-law network, the fraction of nodes with L links is proportional to $L^{-k}$, where k is a network-dependent constant.
        Most nodes have few links, and a tiny number of hubs have a large number of links.
    Power-law networks are generally highly stable and resilient, yet prone to occasional catastrophic collapse.
        Extremely robust when facing random node failures, but vulnerable to well-planned attacks.
    The power-law distribution appears in Gnutella networks, in Internet routers, in the Web, in call graphs, ecosystems, as well as in sociology.

    [Figure: power-law graph; axis label: number of nodes found]
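As a small numeric illustration of "most nodes have few links and a tiny number of hubs have many", the sketch below normalizes the weights $L^{-k}$ over a finite range of link counts. The exponent k = 2.3 and the cap of 100 links are arbitrary illustrative values, not measurements from the slides, and the function name is hypothetical.

```python
def power_law_fractions(k, max_links):
    """Fraction of nodes with L links for L = 1..max_links, proportional to L**-k."""
    weights = [links ** -k for links in range(1, max_links + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


fractions = power_law_fractions(k=2.3, max_links=100)
few_links = sum(fractions[:3])    # share of nodes with 1 to 3 links
hubs = sum(fractions[49:])        # share of nodes with 50 or more links
```

Here `few_links` comes out far larger than `hubs`: the heavy tail holds a tiny share of the nodes, which is what makes targeted attacks on hubs so damaging.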

AT&T Call Graph
    [Figure: number of telephone numbers from which calls were made, plotted against number of telephone numbers called] Source: Aiello et al., STOC ‘00

Gnutella and the bandwidth barrier
    Queries are broadcast to every node within radius TTL
    ⇒ as the network grows, it encounters a bandwidth barrier (dial-up modems cannot keep up with query traffic, fragmenting the network)
    Source: Clip 2 report, Gnutella: To the Bandwidth Barrier and Beyond
Content-Addressable Network (CAN)

Overview
    A lookup protocol (indexing system) that maps a desired key to a value
        insert (key, value)
        value = retrieve (key)
            Example: the key can be a hashed value of a file name, and the value can be the IP address of the host storing the file.
    A storage protocol layered on top of the lookup protocol then takes care of storing, replicating, caching, retrieving, and authenticating the files.
    Uses a virtual d-dimensional coordinate space on a d-torus to store (key, value) pairs, and a uniform hash function to map keys to points in the d-torus
    At any instant the entire coordinate space is partitioned among all the peers in the system; each peer “owns” one individual, distinct zone.
        The (key, value) pair of a file is stored at the peer that owns the zone containing the point that corresponds to the key.
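The insert/retrieve interface above can be sketched in a few lines. This is a toy model under stated assumptions: SHA-1 stands in for the "uniform hash function", the coordinate space is the unit square split into four fixed quadrants, all zones are visible in one list (no routing), and the names `key_to_point`, `Zone`, `owner_of`, `insert`, `retrieve`, and the example address are hypothetical.

```python
import hashlib


def key_to_point(key, dims=2):
    """Hash a key to a point in [0, 1)^dims, using 4 digest bytes per dimension."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return tuple(
        int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2 ** 32
        for i in range(dims)
    )


class Zone:
    """One peer's zone: a half-open box [lo, hi) plus its (key, value) store."""

    def __init__(self, owner, lo, hi):
        self.owner, self.lo, self.hi = owner, lo, hi
        self.store = {}

    def contains(self, point):
        return all(l <= p < h for l, p, h in zip(self.lo, point, self.hi))


def owner_of(zones, point):
    """Return the single zone whose box contains the point."""
    return next(zone for zone in zones if zone.contains(point))


def insert(zones, key, value):
    owner_of(zones, key_to_point(key)).store[key] = value


def retrieve(zones, key):
    return owner_of(zones, key_to_point(key)).store.get(key)


# Four equal quadrants of the unit square (an assumed partition).
zones = [
    Zone("p0", (0.0, 0.0), (0.5, 0.5)),
    Zone("p1", (0.5, 0.0), (1.0, 0.5)),
    Zone("p2", (0.0, 0.5), (0.5, 1.0)),
    Zone("p3", (0.5, 0.5), (1.0, 1.0)),
]
insert(zones, "love.mp3", "192.0.2.7")   # value: address of the host storing the file
```

Because every peer applies the same hash, `retrieve(zones, "love.mp3")` lands in the same zone that `insert` used and returns the stored address.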

Example: 2-D CAN
    When a host wishes to insert a file “love.mp3” into the system, it hashes the name to obtain a key, say (0100,1100).
    Then the entry ((0100,1100), file information), i.e. the index information for file “love.mp3”, is added to node y’s index table.
    To find file “love.mp3”, one uses the same hash function to find the key (0100,1100), then routes to the peer handling the point/key (0100,1100), which is y, and then obtains the IP address from y.
    So, inserting and finding a file becomes a routing problem: how to find a peer handling a given point in the coordinate space?

    [Figure: 2-D coordinate space with axis ticks 0000, 0011, 0111, 1111, showing these zones]
        x handles zone [(0000,0000), (0011,0111))
        y handles zone [(0000,0111), (0111,1111))
        w handles zone [(0111,0000), (1111,0111))

Routing in CAN
    Intuitively: follow the straight path through the Cartesian space from source to destination
    Each node maintains a coordinate routing table that holds the IP addresses and zone coordinates of its neighbors in the space
    Two nodes are neighbors if their coordinate spans overlap along d−1 dimensions and abut along one
    A CAN message contains the destination coordinate. A node greedily forwards it to the neighbor with coordinates closest to the destination coordinate
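Greedy forwarding can be sketched on a simplified layout. The assumptions here are deliberate simplifications: every node owns one unit cell of a `size` by `size` grid on a torus, so each node has exactly 2d neighbors, and "closest" is measured with wrap-around Manhattan distance between cell coordinates rather than distance to a zone. The function names are hypothetical.

```python
def torus_distance(a, b, size):
    """Manhattan distance on a torus: each dimension may wrap around."""
    return sum(min(abs(x - y), size - abs(x - y)) for x, y in zip(a, b))


def greedy_route(source, dest, size):
    """Return the list of cells visited when each hop picks the closest neighbor."""
    path = [source]
    current = source
    while current != dest:
        neighbors = []
        for dim in range(len(current)):
            for step in (-1, 1):
                nxt = list(current)
                nxt[dim] = (nxt[dim] + step) % size   # wrap at the torus edge
                neighbors.append(tuple(nxt))
        # Forward to the neighbor whose coordinates are closest to the destination.
        current = min(neighbors, key=lambda n: torus_distance(n, dest, size))
        path.append(current)
    return path


route = greedy_route((0, 0), (3, 2), size=8)
```

Each hop shortens the remaining distance by one cell, so the route from (0, 0) to (3, 2) takes five hops, and a destination at (7, 0) is reached in one hop through the wrap-around edge.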

Routing in CAN
    [Figure: zones in the coordinate space; neighbors of node 12: 8, 10, 11, 13, 16]

Routing Complexity
    For a d-dimensional space equally partitioned among n nodes, the average routing path length is $(d/4)\,n^{1/d}$
    Individual nodes maintain 2d neighbors
    The path length grows as $O(n^{1/d})$
    Many different routes exist between two points
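A quick calculation shows how the two formulas trade off: more dimensions shorten paths but enlarge the routing table. The network size n = 4096 is an arbitrary example value, and the function names are hypothetical.

```python
def can_average_path(n, d):
    """Average routing path length (d/4) * n**(1/d) for n equal zones in d dimensions."""
    return (d / 4) * n ** (1 / d)


def can_neighbor_count(d):
    """Neighbors kept per node when the space is equally partitioned."""
    return 2 * d


for d in (2, 3, 4, 6):
    hops = can_average_path(4096, d)
    print(f"d={d}: about {hops:.1f} hops, {can_neighbor_count(d)} neighbors")
```

For n = 4096, going from d = 2 to d = 4 cuts the average path from about 32 hops to about 8, at the cost of 8 neighbors per node instead of 4.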
Operations of CAN
    Inserting, updating, and deleting (key, value) pairs
    Retrieving the value associated with a given key
    Adding new nodes to the CAN
    Handling departing nodes
    Dealing with node failures

CAN construction
    A new node is allocated its own portion of the coordinate space in three steps:
        Find a node already in the CAN: look up the CAN domain name in DNS
        Pick a zone to join and route the request to its owner using CAN mechanisms
        Split the zone between the old and the new node
    The neighbors of the split zone must be notified so that routing can include the new node

Finding a zone
    [Figure: zones 1-20 in the coordinate space, with entry node E, joining point J, and node P owning zone 9]
    The new peer X finds a node E to join the network
    X then finds a peer whose zone will be split.
        X randomly chooses a joining point J, for load balancing
        X sends a JOIN request message destined for point J (handled by P) via E, using CAN routing mechanisms

Splitting a zone
    [Figure: the same layout after the split, with new zone 21 (X/J) next to zone 9 (P)]
    P splits its zone in half and assigns one half to X (J), assuming a certain ordering of the dimensions, i.e. first the x dimension, then y
    P transfers the (key, value) pairs from that half of the zone to the new node
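The split step can be sketched as follows. This is a minimal model, assuming zones are half-open boxes, stored pairs are indexed by their point in the coordinate space, and the new node receives the upper half along the chosen dimension (which half goes to the newcomer is an assumption here). `Zone` and `split_zone` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    owner: str
    lo: tuple
    hi: tuple
    store: dict = field(default_factory=dict)   # point -> value

    def contains(self, point):
        return all(l <= p < h for l, p, h in zip(self.lo, point, self.hi))


def split_zone(old, new_owner, dim):
    """Halve `old` along `dim`; return the new zone holding the upper half."""
    mid = (old.lo[dim] + old.hi[dim]) / 2
    new_lo = old.lo[:dim] + (mid,) + old.lo[dim + 1:]
    old_hi = old.hi[:dim] + (mid,) + old.hi[dim + 1:]
    new = Zone(new_owner, new_lo, old.hi)
    old.hi = old_hi                              # the old owner keeps the lower half
    # Hand over every (key, value) pair whose point now lies in the new zone.
    for point in list(old.store):
        if new.contains(point):
            new.store[point] = old.store.pop(point)
    return new


p = Zone("P", (0.0, 0.0), (1.0, 1.0), {(0.2, 0.5): "a", (0.8, 0.5): "b"})
x = split_zone(p, "X", dim=0)                    # split along the first dimension
```

After the call, P keeps the pair at (0.2, 0.5) and X owns the pair at (0.8, 0.5), so each stored point still lies in exactly one zone.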

Joining the Routing
    The neighbors of the split zone must be notified so that routing can include the new node.
        O (the old owner of the split zone) sends its routing table to N (the new node), i.e. the IPs and zones of O’s neighbors
        Having O’s routing table, N can work out its own routing table.
        O re-computes its routing table.
        Both O’s and N’s neighbors must be informed of this reallocation of space.
        All of their neighbors update their own routing tables.

Joining the Routing: Illustration
    [Figure: zone layout before and after node 21 joins by splitting the zone of node 9]

    Node   Neighbors before 21 joins   Neighbors after 21 joins
    9      4, 8, 10, 13                4, 8, 21, 13
    4      3, 5, 9                     3, 5, 9, 21
    8      2, 3, 7, 9, 12              2, 3, 7, 9, 12
    10     5, 9, 14                    5, 21, 14
    13     9, 12, 14, 17               9, 12, 14, 17, 21
    21     -                           4, 9, 10, 13
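The routing-table update amounts to re-applying the neighbor rule (spans overlap along d−1 dimensions and abut along one) after the split. The sketch below does this for a hypothetical layout of four quadrants, not the layout in the illustration; it ignores torus wrap-around for brevity, and all names are assumptions.

```python
def are_neighbors(a, b):
    """a and b are (lo, hi) corner pairs of half-open boxes."""
    (alo, ahi), (blo, bhi) = a, b
    dims = range(len(alo))
    abut = [i for i in dims if ahi[i] == blo[i] or bhi[i] == alo[i]]
    overlap = [i for i in dims if alo[i] < bhi[i] and blo[i] < ahi[i]]
    # Neighbors abut in exactly one dimension and overlap in all the others.
    return len(abut) == 1 and len(overlap) == len(alo) - 1


def routing_tables(zones):
    """Map each zone name to the sorted names of its neighbors."""
    return {
        name: sorted(other for other in zones
                     if other != name and are_neighbors(zones[name], zones[other]))
        for name in zones
    }


zones = {
    "A": ((0.0, 0.0), (0.5, 0.5)),
    "B": ((0.5, 0.0), (1.0, 0.5)),
    "C": ((0.0, 0.5), (0.5, 1.0)),
    "O": ((0.5, 0.5), (1.0, 1.0)),
}
before = routing_tables(zones)

# N joins by taking the right half of O's zone.
zones["O"] = ((0.5, 0.5), (0.75, 1.0))
zones["N"] = ((0.75, 0.5), (1.0, 1.0))
after = routing_tables(zones)
```

In this layout only O, N, and O's former neighbors see any change: B gains N, C keeps O alone, and A is untouched, which mirrors the local nature of the update in the table above.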
Node departure and recovery
    Normal procedure: explicit handover of the (key, value) database to one of the neighbors
    Node failure: immediate takeover procedure
        A failure is detected as a lack of update messages
        Each neighbor starts a timer proportional to its own zone size
        After its timer expires, a node extends its own zone to contain the failed neighbor’s zone and sends a TAKEOVER message to all of the failed node’s neighbors
        On receipt of a TAKEOVER message, a node cancels its timer if the sender’s zone size is smaller than its own. Otherwise it sends its own TAKEOVER message.

Recovery (cont.)
    CAN state may become inconsistent if multiple adjacent nodes fail simultaneously
    In such cases, perform an expanding ring search for any nodes residing beyond the failure region (?)
    If it fails, initiate the repair mechanism (?)
    If a node holds more than one zone, initiate the background zone-reassignment algorithm
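The takeover rule can be traced with a small simulation. It assumes that each neighbor's timer is proportional to that neighbor's own zone volume and that messages arrive instantly; ties are broken by node name purely for determinism. The function name, the `delay_per_volume` parameter, and the example volumes are hypothetical.

```python
def simulate_takeover(zone_volumes, delay_per_volume=1.0):
    """Return (sender, cancelled, contested) for one failed node.

    zone_volumes maps each surviving neighbor to the volume of its own zone.
    """
    timers = {node: volume * delay_per_volume
              for node, volume in zone_volumes.items()}
    # The neighbor with the smallest zone has the shortest timer and fires first.
    sender = min(timers, key=lambda node: (timers[node], node))
    cancelled, contested = [], []
    for node, volume in zone_volumes.items():
        if node == sender:
            continue
        if zone_volumes[sender] < volume:
            cancelled.append(node)    # sender's zone is smaller: cancel own timer
        else:
            contested.append(node)    # otherwise reply with its own TAKEOVER
    return sender, cancelled, contested


result = simulate_takeover({"n4": 0.25, "n8": 0.125, "n13": 0.5})
```

The effect of the rule is that the neighbor with the smallest zone absorbs the failed zone, which keeps zone sizes, and therefore load, more even.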

Reassignment
    Case 1: when zone 2 is to be merged
        Search for 2’s sibling: node 3
        Node 3 takes over zone 2.
    [Figure: zone layout and binary partition tree for Case 1]

Reassignment (cont.)
    Case 2: when zone 9 is to be merged
        Step 1: Search for a sibling of 9; the search fails
        Step 2: Use DFS until two sibling leaves are reached
        Step 3: Merge zone 10 with zone 11; the merged zone is taken over by node 11
        Step 4: Node 11 now takes over zone x. Done.
    [Figure: zone layout and binary partition tree for Case 2]

MIT Chord
A Scalable Peer-to-peer Lookup Protocol for Internet Applications

    Chord provides support for just one operation: given a key, it maps the key onto a node.
    Applications can be easily implemented on top of Chord.
        Cooperative File System (CFS)
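Chord's single operation can be sketched with a global view of the ring. In Chord, a key is assigned to its successor: the first node whose identifier is equal to or follows the key's identifier on the circle. The sketch assumes every node identifier is known in one sorted list, so it shows the mapping only, not the distributed lookup that reaches the node in O(log N) messages. `chord_id`, `lookup`, the 16-bit identifier space, and the example node identifiers are assumptions.

```python
import hashlib
from bisect import bisect_left


def chord_id(name, bits=16):
    """Hash a name onto the identifier circle [0, 2**bits)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def lookup(node_ids, key_id):
    """Return the successor of key_id among the sorted node identifiers."""
    index = bisect_left(node_ids, key_id)
    return node_ids[index % len(node_ids)]   # wrap around past the largest id


ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # example node identifiers
owner = lookup(ring, 54)
```

Here key 54 maps to node 56, and a key such as 57, which is larger than every node identifier, wraps around to node 1.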
  Chord-based distributed storage                                      P2P Block Store Layer
    system: CFS


             Block              Block              Block
             Store              Store              Store

                                                                              A CFS File Structure example
                                                                              The root-block is identified by a public-key and signed by
           CHORD               CHORD             CHORD                        corresponding private key
                                                                              Other blocks are identified by cryptographic hashes of their

             peer              peer                peer

P2P properties
  Chord                                                                P2P
       Efficient: O(Log N) messages per lookup                             Hashing is generally used to distribute objects evenly
                                                                           into a set of servers
            N is the total number of servers
                                                                                E.g., the linear congruential function h(x)= ax+b (mod p)
       Scalable: O(Log N) state per node                                        SHA-1
       Robust: survives massive changes in                                 When the number of servers changes (p in the above case),
       membership                                                          then almost every item would be hashed to a new location
                                                                                Cached objects become useless in each server when a server
                                                                                is removed or introduced to the system.
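The relocation problem described above can be checked with a small Python sketch. It assumes the simplest hash h(x) = x mod p (that is, a = 1 and b = 0) and the move from 5 to 7 buckets; the function name `relocated` is hypothetical and only for illustration.

```python
def relocated(keys, old_p, new_p):
    """Return the keys whose bucket changes when the modulus changes."""
    return [x for x in keys if x % old_p != x % new_p]

keys = range(3500)
moved = relocated(keys, 5, 7)
print(len(moved) / len(keys))   # fraction of keys that change bucket
print(103 % 5, 103 % 7)         # bucket of object 103 before and after
```

Under these assumptions a key keeps its bucket only when x mod 35 is below 5, so roughly six out of seven keys are relocated.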
                                                                           [Figure: objects 103 and 303 are hashed into buckets 0-4 (mod 5); after adding two new buckets (now mod 7, buckets 0-6), both hash to different buckets.]

  Consistent Hashing
                                                                       P2P possible implementation
       Load is balanced                                                                               Some interval
                                                                              objects                                           servers
       Relocation is minimum
                                                                                  001                                               0
            When an N th server joins/leaves the system, with high
            probability only an O(1/N) fractions of the data objects              012                                               1
            need to be relocated                                                  103

                                                                            Objects and servers are first mapped (hashed) to points in the
                                                                            same interval
                                                                            Then objects are actually placed into the servers that are
                                                                            closest to them w.r.t. the mapped points in the interval.
                                                                                E.g., D001→S0, D012→S1, D313→S3
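A minimal Python sketch of this placement scheme follows. It is one possible reading, not the slide's exact rule: it assumes SHA-1 as the mapping to the interval, a 32-bit interval, and "closest" interpreted as the first server point at or after the object's point (wrapping around). The names `point` and `owner` are hypothetical.

```python
import bisect
import hashlib

def point(name, bits=32):
    """Hash a name to a point in the interval [0, 2**bits)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def owner(obj, servers):
    """Return the server whose point is the first at or after the object's point."""
    ring = sorted((point(s), s) for s in servers)
    points = [p for p, _ in ring]
    i = bisect.bisect_left(points, point(obj)) % len(ring)  # wrap around the interval
    return ring[i][1]

objects = [f"D{i:03d}" for i in range(1000)]
before = {o: owner(o, ["S0", "S1", "S2", "S3"]) for o in objects}
after = {o: owner(o, ["S0", "S1", "S2", "S3", "S4"]) for o in objects}
moved = [o for o in objects if before[o] != after[o]]
print(len(moved), "of", len(objects), "objects moved")
```

With this rule, every object that changes owner moves to the newly added server; all other objects stay where they were.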
P2P server 4 joins
                                                                                P2P server 3 leaves
       objects                        Some interval          servers                    objects                           Some interval                         servers
           001                                                   0                          001                                                                       0
           012                                                   1                          012                                                                       1
           103                                                                              103
                                                                 2                                                                                                    2
           303                                                                              303
                                                                 3                                                                                                    3
           637                                                                              637
           044                                                   4                          044                                                                       4

     Only D103 needs to be moved from S3 to S4. The rest remains
                                                                                      Only D313 and D044 need to be moved from S3 to S4.


  Consistent Hashing in Chord
                                                                                P2P               An ID Ring of length 2^6 − 1
       Node’s ID = SHA-1 (IP address)                                                                                                   N1

       Key’s ID = SHA-1 (object’s key/name)                                                                   N56
                                                                                                    k54                                                    N8
       Chord views the ID’s as                                                                                                                                             k10
            Uniformly distributed
            occupying a circular identifier space                                                   N48                    Circular                             N14
                                                                                                                          ID Space
       Keys are placed at the node whose ID is the
       closest to the key's ID in the clockwise                                                                                                    N21

       direction.                                                                                         N42
            successor(k): the first node clockwise from k.                                                                                                  k24
                                                                                                              k38 N38
            Place object k to successor(k).                                                                                                            k30
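The successor rule can be sketched in a few lines of Python. This is only an illustration: it hard-codes small node IDs in a 6-bit identifier space instead of computing SHA-1 hashes, and it assumes global knowledge of all nodes, which a real Chord node does not have.

```python
import bisect

M = 6                                   # identifier bits, so IDs lie in [0, 2**M)
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])

def successor(k):
    """First node met when moving clockwise from identifier k (k itself included)."""
    i = bisect.bisect_left(NODES, k % 2 ** M)
    return NODES[i % len(NODES)]        # wrap past the largest ID to the smallest

for key in (10, 24, 30, 38, 54, 60):
    print(f"k{key} -> N{successor(key)}")
```

A key whose ID equals a node ID is stored at that node, and a key beyond the largest node ID wraps around to the smallest one.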


P2P              Simple Lookup
                                                                                P2P               Scalable Lookup
                                                                                                                                                                      Finger table
                                                                                                                                                                      N8+1      N14
                                                                                                                                                                      N8+2      N14
                                                                                                                                                                      N8+4      N14
                                                 N1          lookup(k54)                                                              N1
                                                                                                                                                                      N8+8      N21
                         N56                                                                              N56                                                         N8+16 N32
                 k54                                  N8                                          k54                                                 N8              N8+32 N42
                   N51                                                                              N51
                                                                                                                                                +2                        k10
                                                                                                                    +32            +8      +4
                 N48                  Circular             N14
                                                                                                  N48                                                      N14
                                     ID Space
                                                       N21                                                                                             N21

                       N42                                                                              N42
                               N38                                                                        k38 N38
                                         N32                                                                                 N32                     k30

        Lookup correct if successors are correct                                  The ith entry in the finger table points to successor((n + 2^(i−1)) mod 2^6)
        Average of N/2 message exchanges (N = number of nodes)
P2P            Scalable Lookup
                                                                               Finger table at N8
                                                                               N8+1       N14
                                                                                                    P2P             Scalable Lookup
                                                                                                                                                               Finger table at N42
                                                                                                                                                               N42+1       N48
                                                                               N8+2       N14                                                                  N42+2       N48
                                                          lookup(k54)          N8+4       N14                                                      lookup(k54) N42+4       N48
                                               N1                                                                                             N1
                                                                               N8+8       N21                                                                  N42+8       N51
                       N56                                                     N8+16      N32                               N56                                N42+16      N1
                k54                                            N8              N8+32      N42                       k54                              N8        N42+32      N14
                 N51                                                                                                  N51
                              +32             +8    +4
               N48                                               N14                                                N48                                 N14

                                                                N21                                                                                   N21

                     N42                                                                                                  N42

                         N38                                                                                                      N38
                                   N32                                                                                                  N32
    Look in local finger table for the largest n s.t.
      my_id < n < key_id
 If n exists, call n.lookup(key_id), else return successor(my_id)
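The finger-table rule and the lookup step above can be sketched as follows. This is a simulation under stated assumptions, not Chord's actual implementation: all node IDs are known globally so that finger tables can be computed on the fly, the identifier space has 6 bits, and the helper names (`finger_table`, `in_open`, `lookup`) are hypothetical.

```python
import bisect

M = 6
SIZE = 2 ** M
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])

def successor(k):
    """First node clockwise from identifier k (k itself included)."""
    i = bisect.bisect_left(NODES, k % SIZE)
    return NODES[i % len(NODES)]

def finger_table(n):
    """Entry i (1-based) is successor((n + 2**(i-1)) mod 2**M)."""
    return [successor((n + 2 ** (i - 1)) % SIZE) for i in range(1, M + 1)]

def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise on the ring."""
    return (a < x < b) if a < b else (x > a or x < b)

def lookup(n, key):
    """Return (node responsible for key, list of nodes visited)."""
    path = [n]
    while True:
        candidates = [f for f in finger_table(n) if in_open(f, n, key)]
        if not candidates:
            # no finger precedes the key: the answer is this node's successor
            return successor((n + 1) % SIZE), path
        # take the finger farthest clockwise from n that still precedes the key
        n = max(candidates, key=lambda f: (f - n) % SIZE)
        path.append(n)

print(finger_table(8))
print(lookup(8, 54))
```

Starting at N8, the query for k54 is forwarded to N42, then to N51, and N51 answers with its successor N56.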

P2P            Scalable Lookup
                                                                      Finger table at N51
                                                                      N51+1       N56
                                                                                                    P2P joins
                                                                      N51+2       N56
                                                          lookup(k54) N51+4       N56
                                                                      N51+8       N1                      When a node i joins the system from any
                       N56                                            N51+16      N8                      existing node j:
                k54                                         N8        N51+32      N21                          Node j finds successor(i) for i, say k
                                                                                                               i sets its successor to k, and informs k to set its
                                                                                                               predecessor to i.
                                                                                                               k’s old predecessor learns the existence of i by running,
                                                                                                               periodically, a stabilization algorithm to check if k’s
                                                                                                               predecessor is still it.
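The join steps above can be sketched in Python. The sketch is an assumption-laden simplification: nodes are in-memory objects rather than networked peers, successor search walks the ring one node at a time instead of using fingers, and the function names (`join`, `stabilize`, `find_successor`) are chosen for illustration.

```python
class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise."""
    return (a < x < b) if a < b else (x > a or x < b)

def find_successor(start, ident):
    """Walk successor pointers from start until ident falls in (node, successor]."""
    node = start
    while not (in_open(ident, node.id, node.successor.id)
               or ident == node.successor.id):
        node = node.successor
    return node.successor

def join(i, j):
    """Node i joins via existing node j."""
    k = find_successor(j, i.id)   # j finds successor(i), say k
    i.successor = k               # i sets its successor to k
    k.predecessor = i             # and informs k to set its predecessor to i

def stabilize(n):
    """Periodic check: is n still the predecessor of its successor?"""
    x = n.successor.predecessor
    if x is not None and in_open(x.id, n.id, n.successor.id):
        n.successor = x           # a new node has joined in between
    s = n.successor
    if s.predecessor is None or in_open(n.id, s.predecessor.id, s.id):
        s.predecessor = n         # tell the (possibly new) successor about n

# Build the ring N8 -> N14 -> N21 -> N32 -> N8, then let N25 join via N8.
ring = [Node(i) for i in (8, 14, 21, 32)]
for a, b in zip(ring, ring[1:] + ring[:1]):
    a.successor, b.predecessor = b, a
n8, n14, n21, n32 = ring
n25 = Node(25)
join(n25, n8)
stabilize(n21)                    # N21 is the old predecessor of N32
print(n21.successor.id, n25.predecessor.id)
```

After the join, N32 already points back to N25; N21 only learns about N25 when its stabilization round runs.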

    Each node can forward a query at least halfway along the
    remaining distance between the node and the target identifier.
    Lookup takes O(log N) steps.

P2P            Node joins (cont.)
                                                                      Finger table
                                                                      N8+1     N14
                                                                                                    P2P             Node Fails
                                                                      N8+2     N14
                                                   N25 joins
                                        N1         via N8             N8+4     N14
                                                                      N8+8     N21
               N56                                                    N8+16    N32                       Can be handled simply as the inverse of node
                                                                      N8+32    N42                       joins; i.e., by running the stabilization algorithm.

    N48                      Circular                    N14
                            ID Space


                             N32                    N25

                               k30                   aggressive mechanisms require
                                                     too many messages and updates
P2P                       Handling Failures
       Use successor list                                                                                  NOT that simple (compared to CAN)
            Each node knows r immediate successors                                                         Member joining is complicated
            After failure, will know first live successor                                                     aggressive mechanisms require too many messages and updates
            Correct successors guarantee correct lookups
                                                                                                              no analysis of convergence in lazy finger mechanism
       Guarantee is with some probability                                                                  Key management mechanism mixed between layers
            Can choose r to make probability of lookup failure                                                upper layer does insertion and handles node failures
            arbitrarily small                                                                                 Chord transfers keys when a node joins (no leave mechanism!)
                                                                                                           Routing table grows with # of members in group
                                                                                                           Worst case lookup can be slow
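To illustrate why the successor-list length r controls the failure probability, here is a small Python sketch. It rests on an assumption that is not stated in the slides: each node fails independently with the same probability p, so a node loses its successor only if all r list entries are down at once, which happens with probability p^r. The function names are hypothetical.

```python
def all_successors_down(p, r):
    """Probability that all r entries of the successor list have failed,
    assuming independent failures with probability p each."""
    return p ** r

def successors_needed(p, target):
    """Smallest r for which p**r does not exceed the target probability."""
    r = 1
    while p ** r > target:
        r += 1
    return r

print(successors_needed(0.5, 1e-6))
print(all_successors_down(0.5, 20))
```

Because p^r shrinks geometrically in r, even a high per-node failure rate needs only a modest list length under this independence assumption.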


P2P Summary
          Files are guaranteed to be found in O(log(N)) steps
          Routing table size O(log(N))
          Robust: handles a large number of concurrent joins and leaves
          Performance: routing in the overlay network can be more
                                                                                                                     Related Work: OceanStore
          expensive than in the underlying network
                 No correlation between node ids and their locality; a query can
                 repeatedly jump from Taiwan to America, though both the initiator
                 and the node that stores the item are in Taiwan!
                                                                                                                     A Global Scale Persistent Storage Utility
            Partial solution: Weight neighbor nodes by Round Trip                                                                 Infrastructure
            Time (RTT)
                 when routing, choose neighbor who is closer to destination with
                 lowest RTT from me » reduces path latency


  OceanStore: Motivation
    Ubiquitous Computing                                                                                        Assume 10^10 people in world, say 10,000 files/person
          Computing everywhere,                                                                                 (very conservative?), then 10^14 files in total!
          Connectivity everywhere                                                                               If 1MB/file, then 10^20 bytes of storage are needed
    But, are data just out there?                                                                               Surely, this must be maintained cooperatively by many
    OceanStore: An architecture for global-scale                                                                ISPs.
    persistent storage                                                                                     Persistent
                                                                                                                Geographic independence for availability, durability,
                                                                                                                and freedom to adapt to circumstances

  Challenges (cont.)
                                                                                                         Design Goals
       Security                                                                                               Untrusted Infrastructure:
            Encryption for privacy, signatures for authenticity, and                                               The OceanStore is comprised of untrusted components
            Byzantine commitment for integrity                                                                     Only ciphertext within the infrastructure
       Robust                                                                                                 Nomadic Data: data are allowed to flow freely
            Redundancy with continuous repair and redistribution                                                   Promiscuous Caching: Data may be cached anywhere,
            for long-term durability                                                                               anytime
       Management                                                                                                       continuous introspective monitoring is used to discover tacit
            Automatic optimization, diagnosis and repair                                                                relationships between objects.
                                                                                                                   Optimistic Concurrency via Conflict Resolution
            Utility Infrastructure
            Users pay monthly fee to access their data


P2P OceanStore system
                                                                                                         Secure Naming
                                                                                                              Unique, location independent identifiers:
                                                                                                                   Every version of every unique entity has a permanent,
                                                                                                                   Globally Unique ID (GUID)
                                                                                                                   160 bits SHA-1 hashes
                                                                                                                        2^80 names before name collision
                                                                                                              Naming hierarchy:
                                                                                                                   Users map from names to GUIDs via hierarchy of
                                                                                                                   OceanStore objects (ala SDSI)
                                                                                                                   Requires set of “root keys” to be acquired by user
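The collision figure for 160-bit identifiers follows from the standard birthday bound. As a short worked estimate, assuming GUIDs behave like uniformly random 160-bit values (an idealization of SHA-1), the probability that n names contain at least one collision is approximately:

```latex
\[
  P_{\text{collision}}(n) \;\approx\; 1 - \exp\!\left(-\frac{n^{2}}{2 \cdot 2^{160}}\right),
  \qquad
  n = \sqrt{2^{160}} = 2^{80}
  \;\Rightarrow\;
  P_{\text{collision}} \approx 1 - e^{-1/2} \approx 0.39 .
\]
```

So a collision becomes likely only once the number of names approaches 2^80, far beyond the 10^14 files estimated earlier.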

      The core of the system is composed of a multitude of
      highly connected “pools”, among which data is allowed
      to “flow” freely. Clients connect to one or more pools,
      perhaps intermittently.

P2P Location and Routing
                                                                                                       Bloom Filter (BF)
       a two-tiered approach                                                                                  A probabilistic algorithm to quickly test
            Attenuated bloom filters                                                                          membership in a large set using multiple hash
                 fast, probabilistic                                                                          functions into a single array of bits.
            Wide-scale distributed data location
                 slower, reliable, hierarchical

                                                                                                                          BF           Key not in database

                                                                                                          Check DB to see if Key is there              Main Idea:
                                                                                                                  yes           no                     h(x) ≠ h(y) ⇒ x ≠ y
                                                                                                                                                       h(x) = h(y) ⇒ x = y (with high probability; otherwise a filter error)
                                                                                                                               Filter error

P2P Filter Design
                                                                                   m = 11 (normally, m would be much much
   Use an m-bit vector, initialized to 0’s, for the BF.                            larger).
      Larger m => fewer filter errors.
                                                                                   h = 2 (2 hash functions).
   Choose h > 0 hash functions: f1(), f2(), …, fh().
                                                                                   f1(k) = k mod m.
   When key k inserted into DB, set bits f1(k), f2(k), …,
                                                                                   f2(k) = (2k) mod m.
   and fh(k) in the BF to 1.
      f1(k), f2(k), …, fh(k) is the signature of key k.
                                                                                  BF (initially)                   0 0 0 0 0 0 0 0 0 0 0
                                                                                  BF (K=15 and K=17 inserted)      0 0 1 0 1 0 1 0 0 1 0
                                                                                  Bit index                       10 9 8 7 6 5 4 3 2 1 0


                                                                           P2P Filter Design

                  BF 0 0 0 0 0 0 0 0 0 0 0
                         1   1   1     1
                                                                                  Choose m (filter size in bits).
                                                                                        Use as much memory as is available.
                     10 9 8 7 6 5 4 3 2 1 0
                                                                                  Pick h (number of hash functions).
        DB has k = 15 and k = 17.                                                       h too small => probability of different keys having same
        Search for k.                                                                   signature is high.
             BF[f1(k)] = 0 or BF[f2(k)] = 0 => k not in DB.                           h too large => filter becomes filled with ones too soon.
             BF[f1(k)] = 1 and BF[f2(k)] = 1 => k may be in DB.                 Select the h hash functions.
                                                                                        Hash functions should be relatively independent.
        k = 25 => not in DB
        k = 6 => filter error.
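The example above can be reproduced with a short Python sketch. It uses the slide's parameters (m = 11, f1(k) = k mod m, f2(k) = 2k mod m); the class name `BloomFilter` and its method names are hypothetical.

```python
M_BITS = 11                       # filter size m from the example

def f1(k):
    return k % M_BITS

def f2(k):
    return (2 * k) % M_BITS

class BloomFilter:
    def __init__(self):
        self.bits = [0] * M_BITS  # bits[i] is bit i of the filter

    def insert(self, k):
        self.bits[f1(k)] = 1
        self.bits[f2(k)] = 1

    def may_contain(self, k):
        """False means k is definitely absent; True means k may be present."""
        return self.bits[f1(k)] == 1 and self.bits[f2(k)] == 1

db = {15, 17}
bf = BloomFilter()
for k in db:
    bf.insert(k)

for k in (15, 25, 6):
    if not bf.may_contain(k):
        verdict = "not in DB"
    elif k in db:
        verdict = "in DB"
    else:
        verdict = "filter error (false positive)"
    print(k, verdict)
```

Key 15 sets bits 4 and 8, key 17 sets bits 6 and 1; key 6 maps to bits 6 and 1 as well, which is why it passes the filter without being in the DB.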


  Attenuated bloom filters
                                                                           P2P probabilistic query process in
        An attenuated bloom filter of depth D can be
        viewed as an array of D normal Bloom filters                                                                           11010

             The first Bloom filter is a record of the objects                                   11100
                                                                                                         1st BF                 n3      Key
             contained locally on the current node.                          Lookup at n1        11011
                                                                                                         2nd BF
             The ith Bloom filter is the union of all of the Bloom                 Key
                                                                                  (0,1,3)      n1             n2
             filters for all of the nodes a distance i through any path                                             00011
             from the current node.                                                          10101          11100
        Query is routed along the edges whose filters
        indicate the presence of the object at the
        smallest distance.
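The routing rule can be sketched in Python as follows. This is a simplified reading rather than OceanStore's implementation: each Bloom filter is modelled as the set of its 1-bit positions, each outgoing edge carries a list of filters where level d is assumed to summarize the nodes d hops away through that edge, and the node names, filter contents, and function names are all hypothetical.

```python
def matches(bloom, signature):
    """A filter matches if every bit of the key's signature is set in it."""
    return signature <= bloom

def first_match(levels, signature):
    """Smallest 1-based depth whose filter matches, or None if no level matches."""
    for depth, bloom in enumerate(levels, start=1):
        if matches(bloom, signature):
            return depth
    return None

def next_hop(edges, signature):
    """Pick the neighbor whose filter reports the object at the smallest distance."""
    best = None
    for neighbor, levels in edges.items():
        depth = first_match(levels, signature)
        if depth is not None and (best is None or depth < best[1]):
            best = (neighbor, depth)
    return best

# Hypothetical data: node n1 has two neighbors, each edge with a depth-2 filter.
edges_at_n1 = {
    "n2": [{0, 1, 2}, {0, 1, 3, 4}],   # level 1 lacks bit 3, level 2 matches
    "n4": [{2, 4}, {0, 2, 4}],         # no level matches
}
key_signature = {0, 1, 3}              # bits set by the key's hash functions
print(next_hop(edges_at_n1, key_signature))
```

When no edge matches at any depth, the sketch returns None; in the two-tiered design, that is the point where the slower, reliable location mechanism would take over.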
                                                                                   Local BF

                                                                                   ith BF

  Tapestry
        A prototype of a decentralized, fault-tolerant,
        adaptive overlay infrastructure
        Network substrate of OceanStore
           Routing: Suffix-based hypercube
           Similar to Plaxton, Rajaraman, Richa (SPAA 97)
           Decentralized location:
           Virtual hierarchy per object with cached location

  Routing and Location
        Object location:
           Map GUIDs to root node IDs
           Each object has its own hierarchy rooted at Root
           Root responsible for storing object’s location
        Suffix routing from A to B
           At the hth hop, arrive at nearest node hop(h) such that
           hop(h) shares a suffix with B of length h digits
           Example: 5324 routes to 0729 via
           5324 → 7349 → 3429 → 4729 → 0729
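The suffix-routing rule can be checked against the slide's own example with a small Python sketch. It assumes a global list of known node IDs and breaks ties by node ID rather than by network distance, so it illustrates only the "one more matching suffix digit per hop" idea, not Tapestry's actual neighbor selection. The function names are hypothetical.

```python
def shared_suffix_len(a, b):
    """Number of trailing digits that a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def suffix_route(source, dest, nodes):
    """Route from source to dest, extending the matched suffix at every hop.

    nodes is the set of all node IDs (a simplification: a real node only
    knows the entries in its own neighbor table).
    """
    path = [source]
    current = source
    matched = shared_suffix_len(current, dest)
    while current != dest:
        wanted = matched + 1
        candidates = [n for n in nodes if shared_suffix_len(n, dest) >= wanted]
        if not candidates:
            raise LookupError(f"no node shares {wanted} suffix digits with {dest}")
        # Prefer the candidate that resolves the fewest extra digits, then the lowest ID.
        current = min(candidates, key=lambda n: (shared_suffix_len(n, dest), n))
        path.append(current)
        matched = shared_suffix_len(current, dest)
    return path


nodes = ["5324", "7349", "3429", "4729", "0729"]
print(" -> ".join(suffix_route("5324", "0729", nodes)))
```

With the five node IDs from the slide, the route resolves one more digit of 0729 at each hop: 9, then 29, then 729, then 0729.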


  Basic Plaxton Mesh
        Incremental suffix-based routing
        [Figure: mesh of nodes with NodeIDs 0x79FE, 0x23FE, 0x993E, 0x43FE,
         0x44FE, 0x73FE, 0xF990, 0x035E, 0x04FE, 0x13FE, 0x555E, 0xABFE,
         0x9990, 0x73FF, 0x239E, 0x1290; links are labeled with routing
         levels 1 to 4]

  Neighbor Table
        Neighbor table at node 3120 (base 4)

        Level       0         1         2         3
          3       0120      1120      2120      3120
          2       x020      x120      x220      x320
          1       xx00      xx10      xx20      xx30
          0       xxx0      xxx1      xxx2      xxx3

  Surrogate Routing
        Neighbor table will have holes
           Must be able to find a unique root for every object
           Tapestry’s solution: when the wanted entry is empty, try the
           next-highest entry at the same level.

  Joins
        A little bit complicated
           Try to think about the problem by yourself
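For the surrogate routing slide, the "try next highest" rule can be sketched as a scan over one row of the neighbor table. The row layout (one slot per digit, `None` for a hole) and the wrap-around to digit 0 are assumptions made for this example. The slide only states that the next-highest entry is tried.

```python
def surrogate_entry(row, wanted_digit):
    """Return (digit, node) for the first non-empty slot at or after wanted_digit.

    row holds one slot per digit of the ID base, with None marking a hole.
    The scan wraps around to digit 0 (an assumption), so every node that
    sees the same row picks the same surrogate.
    """
    base = len(row)
    for offset in range(base):
        digit = (wanted_digit + offset) % base
        if row[digit] is not None:
            return digit, row[digit]
    return None  # the whole level is empty


# Level-3 row of a base-4 table with holes at digits 1 and 2.
row = ["0120", None, None, "3120"]
print(surrogate_entry(row, 1))
```

Because the choice is deterministic, every query for the same object ID ends at the same surrogate root even though the table has holes.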

  Classifying p2p…
        [Figure: P2P systems arranged under "Research" and "Commercial".
         Source: Burton Group]
        Research:    JXTA, .NET, Jabber, Open Cola Folders, Genome Seq,
                     Free Haven, Mojo Nation
        Commercial:  WorldStreet, Groove, Jibe, Endeavors, Entropia,
                     DataSynapse, Kazaa, Morpheus

  P2P vs. Copyright
        P2P disrupts traditional distribution
        Notions of copyright and intellectual property
        need to be put in a digital-age context (and
        new business models will need to be developed
        and implemented)

  Content Delivery Network
                                                                                                Increasing Internet traffic
                                                                                                Congestion in the Internet.
     content provider
                                                                                                Web Servers sometimes become overloaded due to too
                                                                                                many people trying to access their content over a
                                                                                                short period of time.
                                        Internet                                                   e.g., the 9/11 attacks
                                                                                                The 8-Second Rule
                                                                                                   If your webpage hasn't loaded within 8 seconds, chances
                                                                                                   are your viewers are history.
                                                                                                Complexity and Cost of managing and operating a
                                                                                                global network
                                                                                                Increasing demand of rich content delivery


  What is a Content Delivery Network?
        Network of content servers deployed throughout the
        Internet, available on a subscription basis to publishers.
        Web publishers use these to store their high-demand or rich
        content (i.e., certain portions of their web site).
        Support for delivery of many content types (e.g., HTML,
        graphics, streaming media, etc.)
        Brings content closer to end-users, but no changes are required
        at end-hosts.

  How does it work?
        Web publishers decide on the portions of their web site they want to be
        served by the CDNs.
           Use CDNs for images or rich content.
           Most web pages: 70% objects
        CDN companies provide web content distributors with the software
        tools to modify their HTML code.
        CDN (e.g., Akamai) creates new domain names for each client content
        The URLs pointing to these objects on the publisher’s server are then
        modified so that the content can now be served from the CDN servers.
           “Akamaize” content
             • becomes
        Using multiple domain names for each client allows the CDN to further
        subdivide the content into groups.
           DNS sees only the requested domain name, but it can route requests for different
           domains independently.
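The URL rewriting step can be illustrated with a short Python sketch. It is not Akamai's tool: the host names (`www.example.com`, `a1.cdn.example.net`, `a2.cdn.example.net`), the rewritten URL layout, and the function name `akamaize` are hypothetical. The sketch rewrites only embedded objects (`src="..."`), leaves ordinary page links on the origin server, and spreads objects over several CDN domain names so that DNS can route each group independently.

```python
import re
import zlib


def akamaize(html, origin_host, cdn_hosts):
    """Point embedded-object URLs on origin_host at one of the CDN host names."""
    pattern = re.compile(r'(src=")http://' + re.escape(origin_host) + r'(/[^"]*)"')

    def rewrite(match):
        path = match.group(2)
        # A stable hash of the path selects the CDN domain name (content group).
        host = cdn_hosts[zlib.crc32(path.encode()) % len(cdn_hosts)]
        return f'{match.group(1)}http://{host}/{origin_host}{path}"'

    return pattern.sub(rewrite, html)


page = (
    '<a href="http://www.example.com/about.html">About</a>'
    '<img src="http://www.example.com/img/logo.gif">'
)
print(akamaize(page, "www.example.com", ["a1.cdn.example.net", "a2.cdn.example.net"]))
```

The HTML page itself is still fetched from the publisher's server. Only the references inside it change, which matches the statement that no changes are required at end-hosts.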

  Akamai with DNS hooks
        [Figure (Source: Jeff Chase). Labels: “Akamaizes” its content;
         DNS server for; lookup; DNS servers; DNS server; steps a, b, c;
         Akamai servers store/cache secondary content for “Akamaized” services.]
        “Akamaized” response object has inline URLs for
        secondary content at and other
        Akamai-managed DNS names.

  How does it work?
        CDN:
        Monitoring/Routing:
           Some kind of probing algorithms used to monitor state of
           network - traffic conditions, load on servers, and location of
           users.
           Generate network map incorporating this information - maps
           updated frequently to ensure the most current view of the
           CDN develops its own “routing tables to direct the user to
           the fastest location.”


  How does it work?
        Delivery:
           Data to be served by CDNs is pre-loaded onto the servers.
           CDNs take care of migration of data to the appropriate
           servers.
           Users retrieve modified HTML pages from the original
           server, with references to objects pointing to the CDN.
           Content is served from the best server.

  Benefits:
        Highly scalable:
           As the demand for a document increases, the number of
           servers serving that document also increases.
           Ensure that no content server is overloaded by requests.
        Fault Tolerant: guarantee 100% uptime
        High speed connections from content servers to the
        Internet.


  P2P and Layer 4 Switching:
        What is Layer 4 switching?
           Switch employs the information contained in the transport
           header to assist in switching traffic.
           Layer 4 info - port numbers to identify applications (port 80
           for HTTP, 20/21 for FTP, etc.)
        Switch keeps track of established sessions to individual
        servers
           Use destination IP address + destination port + source IP
           address + source port for session identification

  P2P and Layer 4 Switching:
        Switch performs Load Balancing:
           Multiple servers assigned the same virtual IP address.
           Switch maintains information on server loads.
           Traffic load-balancing done based on specified criteria (e.g.,
           least connections, round robin, etc.)
           Maintain session management information:
              Ensure that all packets within a session are forwarded to the same
              server
              Ex: eShopping sessions: 2 connections - persistent HTTP for
              shopping cart and SSL for purchases within cart.
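The two ideas on these slides, session identification by the 4-tuple and load balancing across servers behind one virtual IP, can be combined in a small Python sketch. The class name `Layer4Switch`, the server names, and the choice of the least-connections criterion are assumptions for illustration. A real switch works on packets in hardware and also ages out idle sessions.

```python
class Layer4Switch:
    """Toy model of a Layer 4 switch that fronts several real servers."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}  # open sessions per server
        self.sessions = {}                               # 4-tuple -> chosen server

    def forward(self, src_ip, src_port, dst_ip, dst_port):
        """Return the real server that should receive this packet."""
        key = (src_ip, src_port, dst_ip, dst_port)
        if key not in self.sessions:
            # New session: pick the server with the fewest open sessions.
            server = min(self.active, key=self.active.get)
            self.sessions[key] = server
            self.active[server] += 1
        # Known session: keep sending packets to the same server.
        return self.sessions[key]

    def close(self, src_ip, src_port, dst_ip, dst_port):
        """Forget a finished session and release its load."""
        server = self.sessions.pop((src_ip, src_port, dst_ip, dst_port), None)
        if server is not None:
            self.active[server] -= 1


switch = Layer4Switch(["s1", "s2"])
print(switch.forward("10.0.0.1", 40000, "192.0.2.10", 80))
print(switch.forward("10.0.0.2", 40001, "192.0.2.10", 80))
```

This sketch treats each TCP connection as its own session. The eShopping case on the slide, where an HTTP and an SSL connection belong to one shopper, would need an extra rule (for example, keying on the source IP alone) to keep both connections on the same server.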

  Akamai: Service Operator Model
        Background
           Founded by a group of MIT students and professors in 1998
           Customer is content provider
           Value: quality of service
              Given contents provided by clients, Akamai deploys the contents into
              its worldwide network to provide some “quality of service” browsing
        Network Deployment
           More than 15,000 servers deployed in over 1,100 networks in 66+

  Theoretical Foundations
        Consistent Hashing and Random Trees [Karger,
        et al., STOC 1997]

  Random Trees
        Use a tree of caches to coalesce requests
        Balance load by using a different tree for each
        page and assigning tree nodes to caches via a
        random hash function.

  Random Trees (cont.)
        Each request for a page is a 4-tuple:
           Requester’s Id
           Name of the desired page
           A routing path
           A sequence of caches that should act as the nodes in the path
              The root of the tree is always the server for the requested page
              All other nodes are mapped to the caches by a hash function
                 h: P × [1..C] → 𝒞, where
                    P: the set of all pages
                    C: the number of caches
                    𝒞: the set of caches
        A cache stores a copy of a requested page only after it
        has seen q requests for the page.
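The mapping h: P × [1..C] → 𝒞 can be sketched as follows. The use of SHA-256 as the "random" hash, the name `tree_assignment`, and the `"origin-server"` placeholder for the root are assumptions for illustration. The sketch shows that each page gets its own assignment of tree nodes to caches, which is what spreads the load of different pages over different machines.

```python
import hashlib


def h(page, tree_node, caches):
    """Map (page, tree node number in 1..C) to one of the caches."""
    digest = hashlib.sha256(f"{page}#{tree_node}".encode()).digest()
    return caches[int.from_bytes(digest[:8], "big") % len(caches)]


def tree_assignment(page, caches, origin="origin-server"):
    """Assign every node of the page's abstract tree to a machine.

    Node 1 is the root and is always the server for the page;
    nodes 2..C are placed on caches by h.
    """
    assignment = {1: origin}
    for node in range(2, len(caches) + 1):
        assignment[node] = h(page, node, caches)
    return assignment


caches = ["c0", "c1", "c2", "c3"]
print(tree_assignment("index.html", caches))
print(tree_assignment("news.html", caches))
```

The assignment is deterministic, so every browser computes the same tree for a given page without any coordination.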


  Random Trees (cont.)
        Browsing:
           A browser picks a random leaf-to-root path, maps the
           nodes to machines with h, and asks the leaf node for
           the page.
           When a cache receives a request, it returns a copy of the
           page if it has it, or forwards the request to the next
           node. It also increments a counter for the page and the
           node it is acting as.

  Random Trees in an Inconsistent World
        In the above scheme, a consistent hash
        function can be used for h to cope with the
        dynamic nature of the cache servers.
        A node may also know about only a 1/t fraction
        of the cache servers without a risk (with
        reasonable probability) of swamping or load
        imbalance.
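The browsing steps from the Random Trees slide can be simulated with a minimal Python sketch. Several details are assumptions: the abstract tree is binary with heap numbering (node 1 is the root, the parent of node i is i // 2), h is a SHA-256 based hash, and a cache obtains its copy while forwarding the q-th request it sees for a page in a given tree-node role. All class and function names are hypothetical.

```python
import hashlib
import random


def h(page, node, cache_names):
    """Map (page, tree node) to a cache name."""
    digest = hashlib.sha256(f"{page}#{node}".encode()).digest()
    return cache_names[int.from_bytes(digest[:8], "big") % len(cache_names)]


def leaf_to_root_path(leaf):
    """Tree nodes from a leaf up to the root (node 1) in heap numbering."""
    path = []
    while leaf >= 1:
        path.append(leaf)
        leaf //= 2
    return path


class Cache:
    def __init__(self, q):
        self.q = q          # requests to see before keeping a copy
        self.counts = {}    # (page, tree node) -> requests seen
        self.store = set()  # pages this cache holds

    def handle(self, page, node):
        """Return True if the page is served here, False if forwarded."""
        key = (page, node)
        self.counts[key] = self.counts.get(key, 0) + 1
        if page in self.store:
            return True
        if self.counts[key] >= self.q:
            self.store.add(page)  # keep the copy that comes back for this request
        return False


def browse(page, num_nodes, caches, leaf=None):
    """Walk a leaf-to-root path; return the name of the machine that served the page."""
    names = sorted(caches)
    if leaf is None:
        leaf = random.randint(num_nodes // 2 + 1, num_nodes)  # random leaf
    for node in leaf_to_root_path(leaf)[:-1]:  # the root is the origin server
        name = h(page, node, names)
        if caches[name].handle(page, node):
            return name
    return "origin-server"
```

For example, with `caches = {"c0": Cache(q=2), "c1": Cache(q=2)}` and a tree of 7 nodes, repeated calls to `browse("index.html", 7, caches)` are answered by the origin server at first and by caches once enough requests have passed through them.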

  Akamai’s Stock Price

    Deploying and maintaining a CDN network is VERY expensive

