Peer-to-Peer (P2P) networks and applications




                              1
What is P2P?
 “the sharing of computer resources and
 services by direct exchange of
 information”




                                           2
What is P2P?
 “P2P is a class of applications that take
  advantage of resources – storage, cycles,
  content, human presence – available at the
  edges of the Internet. Because accessing
  these decentralized resources means
  operating in an environment of unstable and
  unpredictable IP addresses P2P nodes must
  operate outside the DNS system and have
  significant, or total autonomy from central
  servers”


                                                3
What is P2P?
 “A distributed network architecture may
 be called a P2P network if the participants
 share a part of their own resources. These
 shared resources are necessary to provide
 the service offered by the network. The
 participants of such a network are both
 resource providers and resource
 consumers”



                                               4
What is P2P?
 Various definitions seem to agree on
   sharing of resources
   direct communication between equals (peers)
   no centralized control




                                                  5
What is a peer?


 “…an entity with capabilities similar
 to other entities in the system.”




                                          6
Client/Server Architecture
 Well known, powerful, reliable server is a data source
 Clients request data from server
 Very successful model
       WWW (HTTP), FTP, Web services, etc.
[Figure: a Server and four Clients connected through the Internet]

* Figure from http://project-iris.net/talks/dht-toronto-03.ppt
                                                                                7
Client/Server Limitations
 Scalability is hard to achieve
 Presents a single point of failure
 Requires administration
 Unused resources at the network edge


 P2P systems try to address these
  limitations



                                         8
P2P Architecture
 All nodes are both clients and servers
       Provide and consume data
       Any node can initiate a connection
 No centralized data source
       “The ultimate form of democracy on the Internet”
       “The ultimate threat to copyright protection on the Internet”
[Figure: five Nodes connected to one another through the Internet]

* Content from http://project-iris.net/talks/dht-toronto-03.ppt
                                                                            9
P2P Network Characteristics
 Clients are also servers and routers
    Nodes contribute content, storage, memory, CPU

 Nodes are autonomous (no administrative
  authority)
 Network is dynamic: nodes enter and leave the
  network “frequently”
 Nodes collaborate directly with each other (not
  through well-known servers)
 Nodes have widely varying capabilities




                                                      10
P2P Goals and Benefits
   Efficient use of resources
        Unused bandwidth, storage, processing power at the “edge of the network”
   Scalability
        No central information, communication and computation bottleneck
        Aggregate resources grow naturally with utilization
   Reliability
        Replicas
        Geographic distribution
        No single point of failure
   Ease of administration
        Nodes self-organize
        Built-in fault tolerance, replication, and load balancing
        Increased autonomy
   Anonymity – Privacy
        not easy in a centralized system
   Dynamism
        highly dynamic environment
        ad-hoc communication and collaboration



                                                                                    11
What is Peer-to-Peer (P2P)?




                              12
P2P Applications
 File sharing (Napster, Gnutella, Kazaa)


 Multiplayer games (Unreal Tournament, DOOM)


 Collaborative applications (ICQ, shared whiteboard)


 Distributed computation (SETI@home)


 Ad-hoc networks




                                                        13
P2P System Taxonomy
 Historic
 Data-centric
 Computation-centric
 User-centric
 Network-centric
 Platforms




                        14
P2P Goals/Benefits
 Cost sharing
 Resource aggregation
 Improved scalability/reliability
 Increased autonomy
 Anonymity/privacy
 Dynamism
 Ad-hoc communication



                                     15
P2P Challenges
 Decentralization
 Scalability and Performance
 Anonymity
 Fairness
 Dynamism
 Security
 Transparency
 Fault Resilience and Robustness




                                    16
Peer-to-Peer Content Sharing




                               17
Popular file sharing P2P Systems

 Napster, Gnutella, Kazaa, Freenet


 Large scale sharing of files.
    User A makes files (music, video, etc.) on their
     computer available to others
    User B connects to the network, searches for
     files and downloads files directly from user A


 Issues of copyright infringement



                                                        18
Research Areas
 Peer discovery and group management
 Data placement and searching
 Reliable and efficient file exchange
 Security/privacy/anonymity/trust




                                         19
Design Concerns
 Group Management
    Per-node state
    Load balancing
    Fault tolerance/resiliency

 Search
    Bandwidth usage
    Time to locate item
    Success rate
    Fault tolerance/resiliency




                                  20
Approaches
 Centralized
 Unstructured
 Structured (Distributed Hash Tables)




                                         21
Centralized index
original “Napster” design
1) when peer connects, it informs central server:
      IP address
      content
2) Alice queries for “Hey Jude”
3) Alice requests file from Bob
[Figure: a centralized directory server connected to peers, including Alice and Bob; arrows labeled 1, 2 and 3 mark the steps above]

                                                                              22
Centralized model
[Figure: peers Bob, Alice, Judy and Jane]
 file transfer is decentralized, but locating content is highly centralized

                                            23
 Centralized
 Benefits:
    Low per-node state
    Limited bandwidth usage
    Short location time
    High success rate
    Fault tolerant
 Drawbacks:
   Single point of failure
   Limited scale
   Possibly unbalanced load
 copyright infringement
[Figure: peers Bob, Alice, Judy and Jane]

                                              24
 Napster
 program for sharing files over the Internet
 a “disruptive” application/technology?
 history:
      5/99: Shawn Fanning (freshman, Northeastern U.) founds
       Napster Online music service
      12/99: first lawsuit
      3/00: 25% UWisc traffic Napster
      2000: est. 60M users
      2/01: US Circuit Court of
        Appeals: Napster knew users
        violating copyright laws
      7/01: # simultaneous online users:
        Napster 160K, Gnutella: 40K, Morpheus: 300K



                                                               25
 Napster: how does it work?

Application-level, client-server protocol over point-to-
  point TCP

Four steps:
 Connect to Napster server
 Upload your list of files (push) to server.
 Give server keywords to search the full list with.
 Select “best” of correct answers. (pings)




                                                           26
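The four steps can be sketched as a tiny in-memory directory server. This is a minimal sketch under our own assumptions, not Napster's wire protocol: `CentralIndex`, `pick_best`, the made-up addresses, and the idea of treating "pings" as measured round-trip times in milliseconds are all hypothetical. Note that the server only answers "who has it"; the file itself is then fetched directly from the chosen peer.

```python
from collections import defaultdict

Peer = tuple[str, int]  # (IP address, port)

class CentralIndex:
    """Hypothetical Napster-style directory server (sketch only)."""

    def __init__(self) -> None:
        # filename -> set of peers currently sharing it
        self.files: dict[str, set[Peer]] = defaultdict(set)

    def connect(self, peer: Peer, filenames: list[str]) -> None:
        # Steps 1-2: the peer connects and pushes its file list to the server
        for name in filenames:
            self.files[name].add(peer)

    def disconnect(self, peer: Peer) -> None:
        # A departing peer disappears from every entry
        for holders in self.files.values():
            holders.discard(peer)

    def search(self, keyword: str) -> dict[str, list[Peer]]:
        # Step 3: case-insensitive keyword match over the full list
        kw = keyword.lower()
        return {name: sorted(holders)
                for name, holders in self.files.items()
                if kw in name.lower() and holders}

def pick_best(holders: list[Peer], rtt_ms: dict[Peer, float]) -> Peer:
    # Step 4: "ping" each candidate and keep the one with the lowest round-trip time
    return min(holders, key=lambda p: rtt_ms[p])

index = CentralIndex()
bob, jane = ("10.0.0.2", 6699), ("10.0.0.3", 6699)   # made-up addresses
index.connect(bob, ["Hey Jude.mp3", "Let It Be.mp3"])
index.connect(jane, ["hey jude (live).mp3"])
hits = index.search("hey jude")
print(hits)
print(pick_best([bob, jane], {bob: 80.0, jane: 25.0}))
```

The single dictionary is exactly what makes search fast and correct, and also what makes the server a bottleneck and a single point of failure.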
Napster
1. File list is uploaded
[Figure: users upload their file lists to napster.com]

                                27
Napster

2. User requests search at server.
[Figure: the user sends a request to napster.com and receives results]

                                    28
Napster

3. User pings hosts that apparently have data.
  Looks for best transfer rate.
[Figure: the user pings the hosts returned by napster.com]

                                                  29
Napster

4. User retrieves file
[Figure: the user retrieves the file directly from another user, not from napster.com]

                                               30
 Napster

 Central Napster server
      Can ensure correct results
      Fast search

      Bottleneck for scalability
      Single point of failure
      Susceptible to denial of service
        • Malicious users
        • Lawsuits, legislation

 Hybrid P2P system – “all peers are equal but some are more equal
  than others”
      Search is centralized
      File transfer is direct (peer-to-peer)



                                                                     31
Unstructured
 fully distributed
    no central server
 used by Gnutella
 Each peer indexes the files it makes available for sharing (and no other files)
 overlay network: graph
    edge between peer X and Y if there’s a TCP connection
    all active peers and edges form overlay net
    edge: virtual (not physical) link
    given peer typically connected with < 10 overlay neighbors

                                                         32
Gnutella: Query flooding
 Query message sent over existing TCP connections
 peers forward Query message
 QueryHit sent over reverse path
 File transfer: HTTP
[Figure: Query messages flood outward across the overlay; QueryHit messages return along the reverse path]

                                                33
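A single-process simulation of query flooding with a TTL and reverse-path QueryHits. Assumptions (ours): the overlay is a dict of neighbour lists, the TTL counts remaining hops, and a peer that has already seen a query does not forward it again. Real Gnutella recognizes a repeat by its message ID and drops it on arrival; this simulation simply never sends the duplicate. `flood_query` and the peer names are hypothetical.

```python
from collections import deque

def flood_query(graph, files, origin, wanted, ttl):
    """Simulate Gnutella-style query flooding (sketch, not the real wire format).

    graph: dict peer -> list of overlay neighbours (TCP connections)
    files: dict peer -> set of filenames that peer shares
    Returns (hit_peer, reverse_path) pairs; reverse_path lists the peers a
    QueryHit passes through on its way back to the origin.
    """
    came_from = {origin: None}       # neighbour that first delivered the query
    queue = deque([(origin, ttl)])   # (peer, remaining TTL)
    hits = []
    while queue:
        peer, remaining = queue.popleft()
        if peer != origin and wanted in files.get(peer, set()):
            # Follow the remembered predecessors: the QueryHit's reverse path
            path, p = [], peer
            while p is not None:
                path.append(p)
                p = came_from[p]
            hits.append((peer, path))
        if remaining == 0:
            continue                  # TTL exhausted: stop forwarding
        for nb in graph[peer]:
            if nb not in came_from:   # each peer handles a given query only once
                came_from[nb] = peer
                queue.append((nb, remaining - 1))
    return hits

overlay = {
    "alice": ["bob", "carl"], "bob": ["alice", "jane"],
    "carl": ["alice", "judy"], "jane": ["bob", "x"],
    "judy": ["carl"], "x": ["jane"],
}
shared = {"jane": {"hey_jude.mp3"}, "x": {"hey_jude.mp3"}}
print(flood_query(overlay, shared, "alice", "hey_jude.mp3", ttl=2))
# [('jane', ['jane', 'bob', 'alice'])]; peer x is 3 hops away, beyond the TTL
```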
Gnutella: Peer joining
1. joining peer Alice must find another peer in
   Gnutella network: use list of candidate peers
2. Alice sequentially attempts TCP connections with
   candidate peers until connection setup with Bob
3. Flooding: Alice sends Ping message to Bob; Bob
   forwards Ping message to his overlay neighbors
   (who then forward to their neighbors….)
    peers receiving Ping message respond to Alice
      with Pong message
4. Alice receives many Pong messages, and can then
   set up additional TCP connections

                                                      34
Unstructured
 Scalability: limited scope flooding
[Figure: overlay network of peers Carl, Jane, Bob, Alice and Judy]

                                               35
 Unstructured
 Gnutella model
 Benefits:
    Limited per-node state
    Fault tolerant
 Drawbacks:
    High bandwidth usage
    Long time to locate item
    No guarantee on success rate
    Possibly unbalanced load
[Figure: overlay network of peers Carl, Jane, Bob, Alice and Judy]

                                                                 36
Gnutella
Searching by flooding:
 If you don’t have the file
   you want, query 7 of your
   neighbors.
 If they don’t have it, they
   contact 7 of their
   neighbors, for a maximum
   hop count of 10.
 Requests are flooded, but
   there is no tree structure.
 No looping but packets may
   be received twice.
 Reverse path forwarding




 * Figure from http://computer.howstuffworks.com/file-sharing.htm
                                                                     37
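Taking the slide's figures literally (7 neighbours per hop, at most 10 hops), a quick count shows why flooding scales poorly: hop k can reach at most 7^k new peers, so a single query can generate at most the sum of 7^k for k = 1..10 messages. This is an upper bound under the assumption that neighbourhoods never overlap; real overlays produce duplicates and reach fewer distinct peers. `max_query_messages` is a hypothetical helper.

```python
def max_query_messages(fanout: int, max_hops: int) -> int:
    """Upper bound on messages when every peer forwards to `fanout` new peers.

    Assumes no overlap between neighbourhoods, so it ignores duplicates;
    treat it as a ceiling, not a measurement.
    """
    return sum(fanout ** hop for hop in range(1, max_hops + 1))

print(max_query_messages(7, 10))  # 329554456
```

Roughly a third of a billion messages for one search, which is why real deployments cut the TTL and introduced peer hierarchies.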
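 Gnutella
[Figure: a peer floods the query “fool.* ?” with TTL = 2]

                     38
Gnutella
[Figure: peer X, which shares fool.her, answers with the hit “IPX:fool.her”; the query continues to further peers with TTL = 1]

                                  39
Gnutella
[Figure: peer Y, which shares fool.me and fool.you, answers with the hit “IPY:fool.me, fool.you”]

                                40
Gnutella
[Figure: the hit “IPY:fool.me, fool.you” is passed back toward the querying peer]

                  41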
Gnutella: strengths and weaknesses

 pros:
    flexibility in query processing
    complete decentralization
    simplicity
    fault tolerance/self-organization
 cons:
    severe scalability problems
    susceptible to attacks
 Pure P2P system



                                         42
Gnutella: initial problems and fixes

 2000: avg size of reachable network only 400-800
  hosts. Why so small?
      modem users: not enough bandwidth to provide search
       routing capabilities: routing black holes
 Fix: create peer hierarchy based on capabilities
    previously: all peers identical, most modem users were black holes
    preferential connection:
        • favors routing to well-connected peers
        • favors reply to clients that themselves serve large number
          of files: prevent freeloading




                                                                       43
 Structured
 FreeNet, Chord, CAN, Tapestry, Pastry model
[Figure: a lookup for key 212 (“212 ?”) is forwarded between nodes with IDs such as 001, 012, 305 and 332 until it reaches node 212]

                                                                44
 Structured
 FreeNet, Chord, CAN, Tapestry, Pastry model
 Benefits:
      Manageable per-node state
      Manageable bandwidth usage and time to locate item
      Guaranteed success
 Drawbacks:
    Possibly unbalanced load
    Harder to support fault tolerance
[Figure: the same lookup for key 212 routed among nodes 001, 012, 305 and 332]

                                                                         45
Unstructured vs Structured P2P
 The systems we described do not offer any
  guarantees about their performance (or even
  correctness)
 Structured P2P
   Scalable guarantees on the number of hops to answer
    a query
   Maintain all other P2P properties (load balance,
    self-organization, dynamic nature)

 Approach: Distributed Hash Tables (DHT)


                                                       46
Distributed Hash Tables (DHT)
 Distributed version of a hash table data structure
 Stores (key, value) pairs
      The key is like a filename
      The value can be file contents, or pointer to location
 Goal: Efficiently insert/lookup/delete (key, value) pairs

 Each peer stores a subset of (key, value) pairs in the system
 Core operation: Find node responsible for a key
      Map key to node
      Efficiently route insert/lookup/delete request to this node
 Allow for frequent node arrivals/departures




                                                                     47
DHT Desirable Properties
 Keys should be mapped evenly to all nodes
  in the network (load balance)
 Each node should maintain information
  about only a few other nodes (scalability,
  low update cost)
 Messages should be routed to a node
  efficiently (small number of hops)
 Node arrival/departures should only affect
  a few nodes


                                               48
DHT Routing Protocols

 DHT is a generic interface


 There are several implementations of this interface
      Chord [MIT]
      Pastry [Microsoft Research UK, Rice University]
      Tapestry [UC Berkeley]
      Content Addressable Network (CAN) [UC Berkeley]

      SkipNet [Microsoft Research US, Univ. of Washington]
      Kademlia [New York University]
      Viceroy [Israel, UC Berkeley]
      P-Grid [EPFL Switzerland]
      Freenet [Ian Clarke]


                                                              49
Basic Approach
In all approaches:
 keys are associated with globally unique IDs
      m-bit integers (for large m)
 key ID space (search space) is uniformly populated
  - mapping of keys to IDs using (consistent) hashing
 a node is responsible for indexing all the keys in a
  certain subspace (zone) of the ID space
 nodes have only partial knowledge of other nodes’
  responsibilities




                                                         50
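The basic approach can be made concrete with a toy, single-process sketch. Assumptions (ours, not the slides'): IDs are SHA-1 digests reduced to m = 32 bits, and a key's zone owner is the first node clockwise from the key's ID (the successor rule Chord uses). `RingDHT`, `ident` and the node names are hypothetical. Unlike a real DHT, this object knows every node, so it skips the multi-hop routing that Chord, Pastry, CAN and the others provide.

```python
import bisect
import hashlib

M = 32  # ID size in bits (illustrative; SHA-1 itself yields 160)

def ident(name: str) -> int:
    """Consistent hashing: map a key or a node name onto the m-bit ID ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

class RingDHT:
    """Toy DHT: a node owns every key ID from its predecessor (exclusive) up to its own ID."""

    def __init__(self, node_names):
        self.ids = sorted(ident(n) for n in node_names)
        self.name_of = {ident(n): n for n in node_names}
        self.store = {n: {} for n in node_names}   # each node's subset of (key, value) pairs

    def responsible(self, key: str) -> str:
        # First node ID >= key ID, wrapping past the top of the ring to the smallest ID
        i = bisect.bisect_left(self.ids, ident(key))
        return self.name_of[self.ids[i % len(self.ids)]]

    def put(self, key, value):
        self.store[self.responsible(key)][key] = value

    def get(self, key):
        return self.store[self.responsible(key)].get(key)

    def delete(self, key):
        self.store[self.responsible(key)].pop(key, None)

dht = RingDHT(["node-a", "node-b", "node-c", "node-d"])
dht.put("hey_jude.mp3", "10.0.0.2:6699")   # value: a pointer to the file's location
print(dht.responsible("hey_jude.mp3"), dht.get("hey_jude.mp3"))
```

With this zone rule, a joining node takes over only the keys between its predecessor and itself, all of which previously belonged to a single node, so arrivals and departures disturb very little of the system.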
Improvements: SuperPeers
 KaZaA model
 Hybrid centralized and unstructured
 Advantages and disadvantages?




                                        51
Hierarchical Overlay
 between centralized index and query flooding approaches
 each peer is either a super node or assigned to a super node
      TCP connection between peer and its super node.
      TCP connections between some pairs of super nodes.
 Super node tracks content in its children
[Figure legend: ordinary peer; group-leader peer; neighboring relationships in overlay network]

                                                                52
Kazaa (FastTrack network)
 Hybrid of centralized Napster and decentralized Gnutella
      hybrid P2P system

 Super-peers act as local search hubs
      Each super-peer is similar to a Napster server for a small
       portion of the network
      Super-peers are automatically chosen by the system based on
       their capacities (storage, bandwidth, etc.) and availability
       (connection time)

 Users upload their list of files to a super-peer
 Super-peers periodically exchange file lists
 You send queries to a super-peer for files of interest




                                                                      53
File Distribution: Server-Client vs P2P
  Question: How much time does it take to distribute a file of size F
   from one server to N peers?
      u_s: server upload bandwidth
      u_i: peer i upload bandwidth
      d_i: peer i download bandwidth
[Figure: a server holding a file of size F uploads at rate u_s to N peers (upload u_i, download d_i) across a network with abundant bandwidth]

                                                                       54
File distribution time: server-client
 server sequentially sends N copies: NF/u_s time
 client i takes F/d_i time to download
[Figure: the same server and N peers across a network with abundant bandwidth]

Time to distribute F to N clients using the client/server approach:

$$d_{cs} = \max\left\{ \frac{NF}{u_s},\ \frac{F}{\min_i d_i} \right\}$$

 increases linearly in N (for large N)
                                                                      55
File distribution time: P2P
 server must send one copy: F/u_s time
 client i takes F/d_i time to download
 NF bits must be downloaded (aggregate)
    fastest possible upload rate: u_s + Σ_i u_i
[Figure: the same server and N peers across a network with abundant bandwidth]

$$d_{P2P} = \max\left\{ \frac{F}{u_s},\ \frac{F}{\min_i d_i},\ \frac{NF}{u_s + \sum_{i=1}^{N} u_i} \right\}$$

                                                                56
Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, u_s = 10u, d_min ≥ u_s
[Figure: Minimum Distribution Time vs. N (0 to 35) for P2P and Client-Server]

                                                                                           57
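Plugging the slide's example into the two formulas reproduces the shape of the plot. The sketch assumes every peer uploads at the same rate u, so the sum of the u_i is N·u, and it measures time in units where u = 1 and F = 1 (so F/u = 1 hour). The function names are hypothetical.

```python
def d_cs(N, F, u_s, d_min):
    """Client-server: the server sends N copies; the slowest client needs F/d_min."""
    return max(N * F / u_s, F / d_min)

def d_p2p(N, F, u_s, d_min, u):
    """P2P lower bound, assuming all N peers share the same upload rate u."""
    return max(F / u_s, F / d_min, N * F / (u_s + N * u))

# The slide's example: u = 1 and F = 1, so F/u = 1 hour
u, F = 1.0, 1.0
u_s = 10 * u
d_min = u_s   # any d_min >= u_s gives the same results here
for N in (5, 10, 20, 30):
    print(f"N={N:2d}  client-server={d_cs(N, F, u_s, d_min):.2f} h  "
          f"P2P={d_p2p(N, F, u_s, d_min, u):.2f} h")
```

The client-server time is N/10 hours and keeps growing linearly, while the P2P bound is N/(10 + N) hours, which stays below 1 hour however large N gets, because every new peer also adds upload capacity.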
File distribution: BitTorrent
 Efficient content distribution system using file
  swarming. It usually does not perform all the
  functions of a typical P2P system, such as searching.
 CacheLogic estimated (around 2003 or so) that
  BitTorrent traffic accounted for roughly 35% of all
  traffic on the Internet.
 Author: Bram Cohen




                                                       58
File distribution: BitTorrent
 P2P file distribution
     tracker: tracks peers participating in torrent
     torrent: group of peers exchanging chunks of a file
[Figure: a peer obtains a list of peers from the tracker, then trades chunks with them]

                                                         59
BitTorrent
 file divided into 256KB chunks.
 peer joining torrent:
    has no chunks, but will accumulate them over time
    registers with tracker to get list of peers,
     connects to subset of peers (“neighbors”)
 while downloading, peer uploads chunks to other
  peers.
 peers may come and go
 once peer has entire file, it may (selfishly) leave or
  (altruistically) remain
                                                           60
BT: File sharing
 To share a file or group of files, a peer first creates
  a .torrent file, a small file that contains:
      metadata about the files to be shared, and
      information about the tracker, the computer that
       coordinates the file distribution.


 Peers first obtain a .torrent file, and then connect to
  the specified tracker, which tells them from which
  other peers to download the pieces of the file.




                                                            61
Overall Architecture
[Figure: a Web Server and a Tracker; peer A (“US”, the downloader) is a leech, peer B is a leech, and peer C is a seed]

                                                       62
BT: The .torrent file
 The URL of the tracker
 Pieces <hash1, hash 2,…, hash n>
 Piece length
 Name of the file
 Length of the file




                                     68
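The Pieces list is what lets a downloader trust chunks received from strangers. In the BitTorrent metainfo format (BEP 3) each entry is the 20-byte SHA-1 digest of one piece, and the last piece may be shorter than the piece length. A sketch of computing and checking those hashes; bencoding and the other .torrent fields are omitted, and the helper names and sample data are hypothetical.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # 256 KB, the chunk size quoted earlier; real torrents choose it per file

def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """SHA-1 digest (20 bytes) of each piece; the last piece may be shorter."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]

def verify_piece(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """A downloader checks each received piece against the .torrent's hash list."""
    return hashlib.sha1(piece).digest() == hashes[index]

data = bytes(range(256)) * 2049   # 524,544 bytes of sample content
hashes = piece_hashes(data)
print(len(hashes), "pieces")       # 3: two full pieces plus a 256-byte tail
```

A piece that fails verification is simply discarded and requested again, so a bad or malicious peer can waste bandwidth but cannot corrupt the file.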
BT: The Tracker

 IP address, port, peer id
 State information (Completed or Downloading)
 Returns a random list of peers




                                                 69
BT: Pulling Chunks

 at any given time, different peers have different
  subsets of file chunks
 periodically, a peer (Alice) asks each neighbor for
  list of chunks that they have.
 Alice sends requests for her missing chunks




                                                        70
BT: Pieces and Sub-Pieces
 A piece is broken into sub-pieces ...
  typically 16KB in size
 Until a piece is assembled, download only
  the sub-pieces of that piece
 This policy lets pieces assemble quickly




                                              71
BT: Pipelining

 When transferring data over TCP, always have several

  requests pending at once, to avoid a delay between
  pieces being sent. At any point in time, some number,
  typically 5, are requested simultaneously.

 Every time a piece or a sub-piece arrives, a new

  request is sent out.




                                                       72
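A small simulation of the pipelining rule. It assumes a single connection whose remote peer answers requests strictly in the order they were sent, and it performs no real I/O; the depth of 5 comes from the slide, and `pipelined_download` is a hypothetical name.

```python
from collections import deque

PIPELINE_DEPTH = 5  # "typically 5" requests outstanding at once

def pipelined_download(blocks, depth=PIPELINE_DEPTH):
    """Simulate request pipelining on one connection; returns an event log.

    Assumes the remote peer answers requests in the order they were sent.
    """
    todo = deque(blocks)
    pending = deque()
    log = []

    def send_next():
        block = todo.popleft()
        pending.append(block)
        log.append(("request", block))

    # Fill the pipeline before anything has arrived
    while todo and len(pending) < depth:
        send_next()
    # Every arriving block immediately triggers one new request
    while pending:
        log.append(("arrive", pending.popleft()))
        if todo:
            send_next()
    return log

for event in pipelined_download(range(7)):
    print(event)
```

Because a new request goes out as each block arrives, the sender always has queued work and the TCP connection never sits idle for a full round trip between blocks.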
BT: Piece Selection

 The order in which pieces are selected by
  different peers is critical for good performance
 If an inefficient policy is used, then peers may
  end up in a situation where all of them have the same
  set of easily available pieces, and none of the
  missing ones.
 If the original seed is prematurely taken down,
  then the file cannot be completely downloaded!
  What are “good policies”?



                                                       73
  BT: Chunk Selection
 Strict Priority
      First Priority
 Rarest First
      General rule
 Random First Piece
      Special case, at the beginning
 Endgame Mode
      Special case




                                        74
Random First Piece
 Initially, a peer has nothing to trade
 Important to get a complete piece ASAP
 Select a random piece of the file and
  download it




                                           75
Rarest Piece First

 Determine the pieces that are most rare
 among your peers, and download those first.
 This ensures that the most commonly
 available pieces are left till the end to
 download.




                                               76
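A sketch of how the Random First Piece and Rarest First rules might combine in a client. Assumptions: the peer already knows which pieces each neighbour has (from the periodic chunk lists), ties between equally rare pieces are broken at random, and "first piece" means "no complete piece yet". `choose_piece` is a hypothetical helper; strict priority and endgame mode are left out.

```python
import random
from collections import Counter

def choose_piece(have, neighbour_haves, rng=random):
    """Pick the next piece to request, or None if no neighbour has anything new.

    have: set of piece indices we already hold in full
    neighbour_haves: list of sets, one per neighbour, of the pieces it holds
    """
    available = set().union(*neighbour_haves) - have
    if not available:
        return None
    if not have:
        # Random First Piece: with nothing to trade yet, any complete piece helps
        return rng.choice(sorted(available))
    # Rarest First: count how many neighbours hold each piece we still need
    counts = Counter(p for pieces in neighbour_haves for p in pieces if p in available)
    fewest = min(counts.values())
    return rng.choice(sorted(p for p, c in counts.items() if c == fewest))

neighbours = [{0, 1, 2}, {0, 1}, {0, 3}]
print(choose_piece({0}, neighbours))   # 2 or 3: each is held by only one neighbour
```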
Endgame Mode

 Near the end, missing pieces are requested from
  every peer containing them. When the piece
  arrives, the pending requests for that piece are
  cancelled.
 This ensures that a download is not prevented
  from completion due to a single peer with a slow
  transfer rate.
 Some bandwidth is wasted, but in practice, this is
  not too much.




                                                       77
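The endgame bookkeeping can be sketched in two small functions: fan every missing block out to all peers that have it, then, when a copy arrives, report which duplicate requests to cancel (BitTorrent has a cancel message for this). Block IDs, peer names and both helpers are hypothetical.

```python
def endgame_requests(missing_blocks, peers_having):
    """Endgame: request every still-missing block from every peer that has it.

    peers_having: dict peer -> set of block ids that peer can supply
    Returns dict block -> list of peers the block was requested from.
    """
    return {b: sorted(p for p, blocks in peers_having.items() if b in blocks)
            for b in missing_blocks}

def on_block_arrival(block, from_peer, outstanding):
    """Record an arrival and return the peers whose duplicate requests to cancel."""
    return [p for p in outstanding.pop(block, []) if p != from_peer]

outstanding = endgame_requests(
    ["b7", "b9"], {"bob": {"b7", "b9"}, "carl": {"b7"}, "jane": {"b9"}})
print(outstanding)                                  # {'b7': ['bob', 'carl'], 'b9': ['bob', 'jane']}
print(on_block_arrival("b7", "carl", outstanding))  # ['bob']: cancel the copy still pending at bob
```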
BT: Sending Chunks
tit-for-tat
 Alice sends chunks to four neighbors currently
  sending her chunks at the highest rate
    re-evaluate top 4 every 10 secs
 every 30 secs: randomly select another peer, start
  sending chunks to it
    newly chosen peer may join top 4
    “optimistically unchoke”




                                                        78
BT: Choking
 Choking is a temporary refusal to upload. It is one
    of BitTorrent’s most powerful ideas for dealing with
    free riders (those who only download but never
    upload).

   Tit-for-tat strategy is based on game-theoretic
    concepts.




                                                          79
BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers




                                 With higher upload rate,
                                 can find better trading
                                 partners & get file faster!
                                                                80
Optimistic unchoking
 A BitTorrent peer has a single “optimistic unchoke”
  to which it uploads regardless of the current
  download rate from it. This peer rotates every
  30s

 Reasons:
    To discover whether currently unused connections are
     better than the ones being used
    To provide minimal service to new peers




                                                           81
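The tit-for-tat and optimistic-unchoke rules fit in a small class. This is a hedged sketch, not a real client's choker: `Choker` and `rechoke` are hypothetical names, it assumes the caller measures each peer's current download rate and calls `rechoke` every 10 s, and it ignores other details real clients handle.

```python
import random

REGULAR_SLOTS = 4          # "top 4" reciprocating peers
RECHOKE_INTERVAL = 10      # seconds between calls to rechoke (caller's responsibility)
OPTIMISTIC_INTERVAL = 30   # seconds between optimistic-unchoke rotations

class Choker:
    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.optimistic = None      # current optimistic-unchoke peer
        self.last_rotation = None   # time of the last rotation

    def rechoke(self, now, download_rate, interested):
        """Return the set of peers to unchoke.

        download_rate: dict peer -> rate at which that peer currently sends to us
        interested: set of peers that want pieces we have
        """
        # Tit-for-tat: unchoke the peers that upload to us fastest
        ranked = sorted(interested, key=lambda p: download_rate.get(p, 0), reverse=True)
        regular = set(ranked[:REGULAR_SLOTS])
        # Optimistic unchoke: rotate to a random other peer every 30 s
        due = (self.last_rotation is None
               or now - self.last_rotation >= OPTIMISTIC_INTERVAL
               or self.optimistic not in interested)
        if due:
            candidates = sorted(set(interested) - regular)
            self.optimistic = self.rng.choice(candidates) if candidates else None
            self.last_rotation = now
        unchoked = set(regular)
        if self.optimistic is not None:
            unchoked.add(self.optimistic)
        return unchoked

rates = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10, "f": 5}
choker = Choker(random.Random(0))
print(sorted(choker.rechoke(0, rates, set(rates))))   # a, b, c, d plus one of e or f
```

If the optimistically unchoked peer turns out to upload quickly, it enters the top four at a later rechoke, which is how Alice and Bob become each other's providers in the tit-for-tat example.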
Upload-Only mode

 Once download is complete, a peer has no
  download rates to use for comparison nor
  has any need to use them. The question is,
  which nodes to upload to?

 Policy: Upload to those with the best
  upload rate. This ensures that pieces get
  replicated faster, and new seeders are
  created fast


                                               82
   Questions about BT

 Which features contribute to the efficiency of BitTorrent?

 What is the effect of bandwidth constraints?

 Is the Rarest First policy really necessary?

 Must nodes perform seeding after downloading is complete?

 How serious is the Last Piece Problem?

 Does the incentive mechanism affect the performance much?




                                                               83
Trackerless torrents

 BitTorrent also supports "trackerless" torrents,
 featuring a DHT implementation that allows the
 client to download torrents that have been
 created without using a BitTorrent tracker.




                                                    84
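For trackerless torrents, BitTorrent's DHT (specified in BEP 5) is based on Kademlia, one of the DHTs listed earlier: peers and torrents share a 160-bit ID space, and "closeness" is the XOR of two IDs read as an integer. A minimal sketch of that distance and of picking the closest known nodes; the node names and the stand-in info hash are made up, k = 8 is an assumed default, and the iterative lookup and network messages are not shown.

```python
import hashlib

def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia's metric: XOR of two IDs, read as an unsigned integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest_nodes(target: bytes, node_ids: list[bytes], k: int = 8) -> list[bytes]:
    """Return the k known node IDs closest to target by XOR distance.

    In a real lookup, a peer would ask these nodes for even closer ones and
    repeat; that loop is not shown here.
    """
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]

# 160-bit IDs; the names hashed here are made up for illustration
nodes = [hashlib.sha1(f"node-{i}".encode()).digest() for i in range(50)]
info_hash = hashlib.sha1(b"example torrent").digest()   # stand-in for a real info hash
nearest = closest_nodes(info_hash, nodes, k=3)
print([n.hex()[:8] for n in nearest])
```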
P2P: searching for information
Index in P2P system: maps information to peer location
(location = IP address & port number)
 File sharing (e.g. eMule)
    Index dynamically tracks the locations of files that peers share.
    Peers need to tell index what they have.
    Peers search index to determine where files can be found.
 Instant messaging
    Index maps user names to locations.
    When user starts IM application, it needs to inform index of its location.
    Peers search index to determine IP address of user.

                                                            85
P2P Case study: Skype
 inherently P2P: pairs of users communicate.
 proprietary application-layer protocol (inferred via reverse engineering)
 hierarchical overlay with SNs
 Index maps usernames to IP addresses; distributed over SNs
[Figure: Skype clients (SC) attached to supernodes (SN), with a Skype login server]

                                                        86
Peers as relays
 Problem when both
  Alice and Bob are
  behind “NATs”.
      NAT prevents an outside
       peer from initiating a call
       to insider peer
 Solution:
    Using Alice’s and Bob’s
     SNs, a relay is chosen
    Each peer initiates
     session with relay.
    Peers can now
     communicate through
     NATs via relay

                                     87
Anonymity
 Napster, Gnutella, Kazaa don’t provide
  anonymity
    Users know who they are downloading from
    Others know who sent a query



 Freenet
    Designed to provide anonymity among other
     features



                                                 88
P2P Review
 Two key functions of P2P systems
      Sharing content
      Finding content

 Sharing content
      Direct transfer between peers
        • All systems do this
      Structured vs. unstructured placement of data
      Automatic replication of data

 Finding content
      Centralized (Napster)
      Decentralized (Gnutella)
      Probabilistic guarantees (DHTs)


                                                       89
Issues with P2P
 Free Riding (Free Loading)
    Two types of free riding
       • Downloading but not sharing any data
       • Not sharing any interesting data
     On Gnutella
       • 15% of users contribute 94% of content
       • 63% of users never responded to a query
           – Didn’t have “interesting” data
 No ranking: what is a trusted source?




                                                   90

						