

        The BitTorrent content distribution system

           Nikitas Liogkas
       CS219 – P2P Systems
University of California, Los Angeles

 flash crowd (aka slashdot) effect
     many clients, few servers

 Problem: servers cannot handle load

 Solution: swarming
     clients download pieces of the file
      from each other
     has been proven to have good scaling and
      performance properties
            Presentation outline

 Joining the system
 Encoding / metadata file
 Tracker protocol
 Peer wire protocol
 Piece selection
 Peer selection
 Client implementations
 Resources
                           Joining a torrent

[Figure: a new leecher gets the metadata file from a website (1), joins via the tracker (2), receives a peer list (3), and sends data requests to a seed/leecher (4)]

Peers divided into:
    seeds: have the entire file
    leechers: still downloading
1. obtain the metadata file (out of band)
2. contact the tracker
3. obtain a peer list (contains seeds & leechers)
4. contact peers from that list for data
                    Exchanging data

[Figure: leechers A, B, and C exchange pieces; leecher A advertises a piece it has just received ("I have")]

● Verify pieces using hashes
● Download sub-pieces (blocks) in parallel
● Advertise received pieces to the entire peer list
● interested: need pieces that a given peer has
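
The "interested" test above can be sketched as a set difference. This is a minimal illustration, assuming each peer's holdings are tracked as a set of piece indices; the function name is hypothetical and not part of any client.

```python
def is_interested(our_pieces, peer_pieces):
    """Return True if the peer holds at least one piece we still lack."""
    return bool(set(peer_pieces) - set(our_pieces))


ours = {0, 1, 2}
print(is_interested(ours, {1, 2}))  # False: the peer has nothing new for us
print(is_interested(ours, {2, 3}))  # True: the peer has piece 3, which we lack
```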
                 Encoding (bencoding)
 encoding format of the metadata file and tracker responses
 four types
     byte strings
     integers
     lists
     dictionaries (mapping keys to values)
 examples
     4:spam represents the string “spam”
     i10e represents the integer 10
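
The four types and the two examples above can be reproduced with a short encoder. This is a sketch only: the function name `bencode` is chosen for illustration, and the list and dictionary delimiters (`l…e`, `d…e`, with dictionary keys sorted) follow the public BitTorrent specification rather than these slides.

```python
def bencode(value):
    """Encode an int, str/bytes, list, or dict into bencoded bytes."""
    if isinstance(value, bool):
        raise TypeError("booleans are not a bencoding type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Byte strings are prefixed with their length in bytes.
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode("spam"))          # b'4:spam'
print(bencode(10))              # b'i10e'
print(bencode(["spam", 10]))    # b'l4:spami10ee'
print(bencode({"cow": "moo"}))  # b'd3:cow3:mooe'
```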
             Metadata file structure

 contains information necessary to contact the
  tracker and describes the files in the torrent
     announce URL of tracker
     file name
     file length
     piece length (typically 256KB)
     SHA-1 hashes of pieces for verification
     and creation date, comment, creator, …
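
How the per-piece SHA-1 hashes are used for verification can be sketched as follows. The helper names and the sample data are hypothetical; the 256 KB piece length is the typical value mentioned above.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # typical piece length from the slide


def piece_hashes(data, piece_length=PIECE_LENGTH):
    """Split the file contents into pieces and return one SHA-1 digest per piece."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]


def verify_piece(index, piece, hashes):
    """Check a downloaded piece against the digest listed in the metadata."""
    return hashlib.sha1(piece).digest() == hashes[index]


data = b"x" * 600_000               # stand-in for the shared file
hashes = piece_hashes(data)
print(len(hashes))                  # 3 (two full pieces plus a shorter last one)
print(verify_piece(0, data[:PIECE_LENGTH], hashes))          # True
print(verify_piece(0, b"corrupted" * 10, hashes))            # False
```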
                    Tracker protocol
 communicates with clients via HTTP/HTTPS
 client GET request
     info_hash: uniquely identifies the file
     peer_id: uniquely identifies the client
     client IP and port (typically 6881-6889)
     numwant: how many peers to return (defaults to 50)
     stats: bytes uploaded, downloaded, left
 tracker GET response
     interval: how often to contact the tracker
     list of peers, containing peer id, IP and port
     stats: complete, incomplete
 tracker-less mode; based on the Kademlia DHT
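
The client GET request can be sketched by assembling the query string from the parameters listed above. The tracker URL, peer id, and function name below are made up for illustration; the parameter names `port`, `uploaded`, `downloaded`, and `left` follow the public specification, and the sketch only builds the URL without sending it.

```python
from urllib.parse import quote, urlencode


def build_announce_url(announce, info_hash, peer_id, port,
                       uploaded, downloaded, left, numwant=50):
    """Build the tracker GET URL; info_hash and peer_id are 20 raw bytes each."""
    params = {
        "info_hash": info_hash,    # identifies the file
        "peer_id": peer_id,        # identifies the client
        "port": port,              # port the client listens on
        "uploaded": uploaded,      # stats: bytes uploaded
        "downloaded": downloaded,  # stats: bytes downloaded
        "left": left,              # stats: bytes still needed
        "numwant": numwant,        # how many peers to return
    }
    # quote_via=quote percent-encodes raw bytes (and spaces as %20).
    return announce + "?" + urlencode(params, quote_via=quote)


url = build_announce_url(
    "http://tracker.example/announce",   # hypothetical tracker
    info_hash=bytes(range(20)),          # placeholder 20-byte hash
    peer_id=b"-XX0001-123456789012",     # hypothetical 20-byte peer id
    port=6881, uploaded=0, downloaded=0, left=600_000,
)
print(url)
```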
                Peer wire protocol
 implemented on top of TCP
 messages
     handshake (optionally followed by a bitfield message)
     keep-alive
     choke / unchoke
     interested / not interested
     have (advertisement of a newly-acquired piece)
     request / piece
     cancel (only used in “endgame mode”)
     port (used in tracker-less mode)
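
The messages listed above can be sketched as byte strings. The framing (4-byte big-endian length prefix, 1-byte message id) and the numeric ids are taken from the public BitTorrent specification rather than from these slides, so treat them as assumptions to check against the specification; the helper names are hypothetical.

```python
import struct

PSTR = b"BitTorrent protocol"
MSG_IDS = {"choke": 0, "unchoke": 1, "interested": 2, "not_interested": 3,
           "have": 4, "bitfield": 5, "request": 6, "piece": 7,
           "cancel": 8, "port": 9}

# A keep-alive is a message with length zero and no id.
KEEP_ALIVE = struct.pack(">I", 0)


def handshake(info_hash, peer_id):
    """Length-prefixed protocol string, 8 reserved bytes, info_hash, peer_id."""
    return bytes([len(PSTR)]) + PSTR + bytes(8) + info_hash + peer_id


def message(name, payload=b""):
    """Frame a message as <length><id><payload>."""
    return struct.pack(">IB", 1 + len(payload), MSG_IDS[name]) + payload


def have(index):
    return message("have", struct.pack(">I", index))


def request(index, begin, length):
    """Ask for one block (sub-piece) of a piece."""
    return message("request", struct.pack(">III", index, begin, length))


print(len(handshake(bytes(20), b"-XX0001-123456789012")))  # 68
print(have(5))  # b'\x00\x00\x00\x05\x04\x00\x00\x00\x05'
```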
                       Piece selection
 when downloading starts: choose at random
      get complete pieces as quickly as possible
      obtain something to trade
 after we have 4 pieces: (local) rarest first
      achieves the fastest replication of rare pieces
      obtain something of value to trade
      get unique pieces from the seed
 endgame mode
      defense against the “last-block problem”
      send requests for missing sub-pieces to all
       peers in our peer list
      send cancel messages upon receipt of a sub-piece
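
The random-first and (local) rarest-first policies above can be sketched as follows. This is a simplified illustration, not any client's code: it assumes peers' holdings are kept as sets of piece indices, ignores endgame mode and block-level requests, and uses hypothetical names throughout.

```python
import random
from collections import Counter

RANDOM_FIRST_THRESHOLD = 4  # switch to rarest first once we hold 4 pieces


def pick_piece(our_pieces, peers, peer, rng=random):
    """Choose the next piece to request from `peer`.

    peers maps each peer in our peer list to the set of pieces it has.
    """
    candidates = sorted(peers[peer] - set(our_pieces))
    if not candidates:
        return None  # we are not interested in this peer
    if len(our_pieces) < RANDOM_FIRST_THRESHOLD:
        return rng.choice(candidates)  # random first piece(s)
    # Local rarest first: count copies among the peers we know about.
    availability = Counter(p for held in peers.values() for p in held)
    rarest = min(availability[p] for p in candidates)
    return rng.choice([p for p in candidates if availability[p] == rarest])


peers = {"A": {4, 5}, "B": {4}, "C": {4, 5, 6}}
print(pick_piece({0, 1, 2, 3}, peers, "C"))  # 6: only one peer holds it
```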
               Last-block problem
 at the end of the download, a peer may have
  trouble finding the missing pieces
 based on anecdotal evidence
 other proposals
     network coding [Gkantsidis et al., Infocom’05]
     prefer to upload to peers with similar file
      completeness; unfair for the peers with most
      of the pieces [Tian et al., Infocom’06]
        Last-block problem – a myth?

 is it a problem after all?
 figure from [Legout et al., INRIA-TR-2006], with permission
           Peer selection - unchoking

[Figure: leechers A, B, and C]
• calculate data-receiving rates
• upload to (unchoke) the fastest
• rate calculation is performed periodically
  (a round occurs typically every 10 seconds)
• constant number of unchoking slots
• attempt to achieve Pareto efficiency
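
One round of the unchoking decision above can be sketched as a sort by measured rate. The slot count of 3 and all names are assumptions for illustration; the slides only say the number of slots is constant.

```python
def choose_unchoked(receive_rates, interested, slots=3):
    """Return the peers to unchoke this round.

    receive_rates maps each peer to the rate at which we receive data
    from it; only peers that are interested in our pieces compete.
    """
    ranked = sorted((p for p in receive_rates if p in interested),
                    key=lambda p: receive_rates[p], reverse=True)
    return ranked[:slots]


rates = {"B": 120.0, "C": 40.0, "D": 300.0, "E": 5.0}  # e.g. KB/s
print(choose_unchoked(rates, interested={"B", "C", "D", "E"}))  # ['D', 'B', 'C']
```

A client would rerun this once per round (typically every 10 seconds) with freshly measured rates.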
              Optimistic unchoking

 periodically select a peer at random
  and upload to it
     typically performed every 3 rounds (30 seconds)

 multi-purpose mechanism
     allow bootstrapping of new clients
     continuously look for the fastest partners
     keep the network connected; every peer has a
      non-zero chance of interacting with any other peer
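
The rotation of the optimistic slot can be sketched as below, assuming rounds are numbered from zero and the candidates are the currently choked, interested peers. The function and constant names are hypothetical.

```python
import random

ROUNDS_PER_OPTIMISTIC = 3  # every 3 rounds, i.e. about 30 seconds


def optimistic_unchoke(round_number, candidates, current=None, rng=random):
    """Pick a random peer every third round; otherwise keep the current one."""
    if round_number % ROUNDS_PER_OPTIMISTIC == 0 or current is None:
        # sorted() only makes the choice reproducible with a seeded rng.
        return rng.choice(sorted(candidates)) if candidates else None
    return current


print(optimistic_unchoke(1, {"X", "Y"}, current="X"))  # X: not a rotation round
print(optimistic_unchoke(3, {"Z"}, current="X"))       # Z: rotation round
```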
                     Seed unchoking

 old algorithm
      unchoke the fastest downloaders
      problem: fastest peers may monopolize seeds

 new algorithm
      periodically sort all leechers according to
       their last unchoke time
       prefer the most recently unchoked leechers;
        on a tie, prefer the fastest
       (presumably) achieves equal spread of seed bandwidth
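
The ordering step of the new algorithm can be sketched as a two-key sort. This only illustrates the ranking as stated on the slide (most recently unchoked first, ties broken by speed); how a real seed rotates peers through its slots over time is not shown, and the tuple layout is an assumption.

```python
def seed_unchoke_order(leechers):
    """Rank leechers for a seed's unchoke slots.

    leechers is a list of (peer, last_unchoke_time, download_rate) tuples.
    """
    ranked = sorted(leechers, key=lambda entry: (-entry[1], -entry[2]))
    return [peer for peer, _, _ in ranked]


leechers = [("A", 100, 10.0), ("B", 100, 50.0), ("C", 40, 90.0)]
print(seed_unchoke_order(leechers))  # ['B', 'A', 'C']
```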
       Downloading only from seeds

[Figure: leecher A repeatedly asks the tracker for a new peer list; leechers B and C are shown as peers]

● Repeatedly query the tracker for peer lists
● Distinguish the seeds, and receive data from them
● Violates fairness model; may be harmful to honest peers
          Evaluation in private torrents

[Figure: download rates for all peers (legend: min, 25%ile, median)]

 Limit bandwidth of leechers 1 to 6, no limit on seed.
 Modest fairness violation (22% better rate)
  when selfish peer is fast
 Robustness does not suffer: most honest peers slower by <15%
         Evaluation with modified seed

[Figure: download rates for all peers]

 Seed only unchokes one leecher at a time
 Considerable fairness violation: selfish peer faster by 155%
 Robustness suffers: honest peers slower by at least 32%
    Rate- vs. volume-based selection

 Proponents of rate-based decision metrics:
  [Cohen, P2PECON’03] and
  [INRIA TR’2006]
 Proponents of volume-based metrics:
  [Bharambe et al., MSR-TR-2005],
  [Gkantsidis et al., Infocom’05],
  [Jun et al., P2PECON’05], and
  eDonkey file-sharing system
 No clear winner yet!
               Client implementations
 mainline: written in Python; right now, the only
  one employing the new seed unchoking algorithm
 Azureus: the most popular, written in Java;
  implements a special protocol between clients
  (e.g. peers can exchange peer lists)
 other popular clients: ABC, BitComet, BitLord,
  BitTornado, μTorrent, Opera browser
 various non-standard extensions
      retaliation mode: detect compromised/malicious peers
      anti-snubbing: ignore a peer who ignores us
      super seeding: the seed masquerades as a leecher and reveals pieces selectively
                 Resources #1

 Basic BitTorrent mechanisms
  [Cohen, P2PECON’03]
 BitTorrent specification Wiki
 Measurement studies
  [Izal et al., PAM’04],
  [Pouwelse et al., Delft TR 2004 and IPTPS’05],
  [Guo et al., IMC’05], and
  [Legout et al., INRIA-TR-2006]
                Resources #2

 Theoretical analysis and modeling
  [Qiu et al., SIGCOMM’04], and
  [Tian et al., Infocom’06]
 Simulations
  [Bharambe et al., MSR-TR-2005]
 Incentives and exploiting them
  [Shneidman et al., PINS’04],
  [Jun et al., P2PECON’05], and
  [Liogkas et al., IPTPS’06]
      Conclusion and food for thought

 BitTorrent is fast and robust

 Yet, many parameters are arbitrarily set
     number of unchoking slots
     round duration
     size of pieces/sub-pieces

 What can we learn from BitTorrent for
  the design of future P2P content
  distribution protocols?
