BitTorrent network

BitTorrent network
   On the itinerary:
     Introduction to BitTorrent
     Basics & properties

     3 Interesting analysis results
Publishing
   How to publish a (usually large) file ?
     Dedicated server:
       Easy to manage, easy to find, persistent service
       Nevertheless…

     BitTorrent:
       Organizes multiple clients that share the same file
       Leverages the upload bandwidth of the participants
       Self-scaling, resilient, operates well during “flash crowd” periods
       Accounts for 50%-60% of all P2P traffic
The Basics of BitTorrent
   A content provider (everyday people) wants to publish a file (acting as the initial seed):
       Creates a meta file (*.torrent)
       Publishes this (light) *.torrent file on a web server
       The file is broken into small blocks (32-256 KB)
       Uploads blocks to other peers
       The goal: to publish the file to many nodes with the help of other peers, with minimum load on the seeder
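
A minimal sketch of the publishing step, assuming the meta file records one hash per block so that downloaders can verify what they receive. The field names, the 256 KB default and the tracker URL are illustrative assumptions, not the actual *.torrent format.

```python
import hashlib

def split_into_blocks(data, block_size=256 * 1024):
    """Split the file contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def build_metadata(name, data, tracker_url, block_size=256 * 1024):
    """Build a small dictionary describing the file (hypothetical layout)."""
    blocks = split_into_blocks(data, block_size)
    return {
        "name": name,
        "tracker": tracker_url,          # where peers ask for neighbors
        "length": len(data),
        "block_size": block_size,
        # One SHA-1 digest per block, so each block can be checked on arrival.
        "block_hashes": [hashlib.sha1(b).hexdigest() for b in blocks],
    }

# Placeholder content and a made-up tracker address, for illustration only.
meta = build_metadata("example.bin", b"x" * 600_000,
                      "http://tracker.example/announce")
```

The metadata stays small however large the file is, which is why it can be served cheaply from an ordinary web server.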
The Basics of BitTorrent
   Third party (the tracker):
       A tracker site keeps track of the active participants, plus extra statistics
       Upon requests from nodes, supplies a random subset of active nodes
       Receives updates from the active nodes
       Keeps track of new nodes joining the ‘torrent’ (or ‘swarm’) and of nodes that have left it
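
The tracker's role can be sketched as a small in-memory registry. This is an illustration under assumptions: the class name, the `announce` method and the `"stopped"` event are hypothetical and do not reproduce the real tracker protocol.

```python
import random

class Tracker:
    """Toy tracker: remembers active peers per torrent, hands out random subsets."""

    def __init__(self, subset_size=40, rng=None):
        self.subset_size = subset_size
        self.swarms = {}                      # torrent id -> set of active peers
        self.rng = rng or random.Random()

    def announce(self, torrent_id, peer, event="update"):
        peers = self.swarms.setdefault(torrent_id, set())
        if event == "stopped":                # the peer left the swarm
            peers.discard(peer)
            return []
        peers.add(peer)                       # new peer, or a periodic update
        others = sorted(peers - {peer})       # never return the caller itself
        k = min(self.subset_size, len(others))
        return self.rng.sample(others, k)     # random subset of active nodes
```

A peer would call `announce` when it joins, periodically while active, and once more with `"stopped"` when it leaves.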
The Basics of BitTorrent
   Peer (leecher) who is interested:
       Obtains the public *.torrent file
       Is directed to the tracker
       Obtains a list of random neighbors (~40)
       Downloads blocks from, and uploads blocks to, its ‘best’ neighbors (choking and unchoking)
       Upon download completion, becomes a seed
BitTorrent demo (Wikipedia)
BitTorrent basic schemes
   (Immediate) problems arise:
       Last block problem: assuming nodes depart upon completion, how would leechers obtain the last block?
       Free riding problem: peers willing to download but unwilling to serve others
   Simple and effective solutions:
       Last block problem: nodes employ a Local Rarest First policy
       Free riding problem: nodes employ a “Tit For Tat” policy, i.e. give more to those from whom you receive more
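
Both policies can be sketched in a few lines. This is a simplified illustration: the function names and the four-slot default are assumptions, ties are broken arbitrarily by block index or peer name, and refinements found in real clients (such as occasionally unchoking a random peer) are left out.

```python
from collections import Counter

def rarest_first(my_blocks, neighbor_blocks):
    """Pick the missing block held by the fewest neighbors (local rarest first)."""
    counts = Counter()
    for blocks in neighbor_blocks.values():
        counts.update(blocks - my_blocks)     # only blocks we still need
    if not counts:
        return None                           # nothing useful on offer
    return min(counts, key=lambda b: (counts[b], b))

def tit_for_tat(download_rates, slots=4):
    """Unchoke the neighbors that currently give us the most."""
    ranked = sorted(download_rates, key=lambda p: (-download_rates[p], p))
    return ranked[:slots]

neighbors = {"a": {0, 1, 2}, "b": {1, 2}, "c": {2, 3}}
next_block = rarest_first({2}, neighbors)
unchoked = tit_for_tat({"a": 10.0, "b": 50.0, "c": 30.0}, slots=2)
```

Fetching rare blocks first keeps every block replicated across the swarm, so a departing node is less likely to take the only copy with it.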
On the itinerary
   One interesting limitation of BitTorrent networks
       BitTorrent provides poor service availability
       Shown via analysis of tracker logs over a long period

   A proposed peer selection approach
       Lowers costs and resource usage for ISPs
       Does not require cooperation between ISPs and peers
       Implemented as part of the Azureus client, codename ‘Ono’
   Tradeoffs
       Performance (avg. download time) vs. fairness (avg. share ratio)
BitTorrent limitations
   Taken from the paper “Measurements, Analysis and
    Modeling of BitTorrent-like systems” [1]
       Inspect overall performance in the lifetime of a torrent
       Analysis is based on traces
       A model is derived which is used to draw conclusions
       Verify that the derived conclusions match the observed
        behavior
   Limitations were found:
       Poor service availability (coming next)
       Fluctuating download performance
       Unfair service to peers
Service availability - Analysis
   Analysis is based on traces
       Tracker logs (~1500 torrents, sampled every 30 sec)
       Traces from servers that publish *.torrent files



   Extracted data
       Identify peers
       Birth time and size of the torrent
       For each peer in the same torrent: arrival time, download & upload bandwidth, cumulative bytes downloaded & uploaded
Service availability - Analysis

   Y-axis at time t: the total number of requests for all torrents in the trace, minus the cumulative number of requests those torrents have received by time t after their birth.
   Similar observations are seen in the *.torrent metadata traces.
Service availability - Analysis
   This suggests that the request rate decreases exponentially after a torrent is born
   Notice that this is a cumulative measure
   Does a specific torrent behave the same way?
       Use the least-squares method to measure how much a specific torrent deviates from this logarithmic fit
       Analyze the distribution of the average relative deviation (which is mostly small, 6% on average)
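
The least-squares check can be sketched as follows, assuming the remaining-request count of a torrent decays roughly exponentially, so that its logarithm is a straight line in time. The function names and the data are made up for illustration; they are not the trace values.

```python
import math

def fit_exponential_decay(times, remaining):
    """Least-squares line through (t, ln remaining); returns (scale, tau)."""
    ys = [math.log(r) for r in remaining]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -1.0 / slope

def mean_relative_deviation(times, remaining, scale, tau):
    """Average of |observed - fitted| / fitted over all sample points."""
    fitted = [scale * math.exp(-t / tau) for t in times]
    return sum(abs(r - f) / f for r, f in zip(remaining, fitted)) / len(times)

# Synthetic, noise-free data (an assumption for the demo).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
remaining = [1000.0 * math.exp(-t / 2.5) for t in times]
scale, tau = fit_exponential_decay(times, remaining)
deviation = mean_relative_deviation(times, remaining, scale, tau)
```

On real traces the deviation would be nonzero; the slide reports it as mostly small.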
Service availability – A model
   Based on the observations, define the torrent popularity at time t as the peer arrival rate, which is the derivative of the peer arrival time distribution for that torrent.
   Arrival rate: $\lambda(t) = \lambda_0 e^{-t/\tau}$
       Where $\lambda_0$ is the initial arrival rate when the torrent starts
       And $\tau$ is the attenuation parameter
       Both are evaluated from the observations
Service availability – A model
   Define
       Torrent lifespan: duration from the birth of the torrent until no complete copy remains (so new leechers cannot complete the download)
       The inter-arrival time between two successive arriving peers can be approximated as $1/\lambda(t)$, the reciprocal of the peer arrival rate
       Assume seeds leave the system at rate $\gamma$; then the average service time of a seed is approximately $1/\gamma$
Service availability – A model
   Look at consecutive peers that join the torrent
       Peers n and n+1 join the torrent at times t(n) and t(n+1) respectively
       The inter-arrival time between them is approximately $1/\lambda(t(n))$, the reciprocal of the arrival rate at that time
       Peer n downloads the file with speed u(n) and stays in the
        torrent for time duration
       When peer arrival rate is small enough (n is large), peer
        n+1, with speed u(n+1) <= u(n) could only be served by
        peer n
Service availability – A model
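    Thus, when the inter-arrival time exceeds the seed service time, $1/\lambda(t) > 1/\gamma$ (with $\gamma$ the rate at which seeds leave), peer n+1 can’t complete the download and the torrent is dead
    Using the definition of the arrival rate, $\lambda(t) = \lambda_0 e^{-t/\tau}$, we get the torrent lifespan:
        $T_{life} = \tau \ln(\lambda_0 / \gamma)$
    Both $\lambda_0$ and $\tau$ are extracted from the trace (using linear regression), as well as $\gamma$
    Compare results from the model to what the trace holds
         Results match very well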
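
A small numeric sketch of the lifespan computation, under two assumptions stated here because the slide's formulas did not survive: the arrival rate decays as lambda0 * exp(-t / tau), and the torrent dies once the peer inter-arrival time exceeds the average seed service time 1 / gamma. The parameter values are arbitrary, not taken from the trace.

```python
import math

def arrival_rate(t, lambda0, tau):
    """Assumed exponentially decaying peer arrival rate."""
    return lambda0 * math.exp(-t / tau)

def torrent_lifespan(lambda0, tau, gamma):
    """Time at which arrival_rate(t) drops to gamma (seed departure rate)."""
    if lambda0 <= gamma:
        return 0.0                      # arrivals are too sparse from the start
    return tau * math.log(lambda0 / gamma)

# Arbitrary illustrative parameters, in consistent time units.
life = torrent_lifespan(lambda0=100.0, tau=2.0, gamma=1.0)
rate_at_death = arrival_rate(life, lambda0=100.0, tau=2.0)
```

By construction, the arrival rate at the computed lifespan equals gamma, which is the break-even point between peer arrivals and seed departures.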
Model vs. Observations
   Comparison of torrent lifespans: the average lifespan is 8.89 according to the trace and 8.34 based on the model
Service availability - Summary
   Conclusions were obtained from extensive trace analysis and modeling
       Existing BitTorrent systems provide poor service availability
       This is due to the exponentially decreasing peer arrival rate
       This provides strong motivation for inter-torrent collaboration
Next on the itinerary
   One interesting limitation of BitTorrent networks
       BitTorrent provides poor service availability
       Shown via analysis of tracker logs over a long period

   A proposed peer selection approach
       Lowers costs and resource usage for ISPs
       Does not require cooperation between ISPs and peers
       Implemented as part of the Azureus client, codename ‘Ono’
   Tradeoffs
       Performance (avg. download time) vs. fairness (avg. share ratio)
Reducing cross-ISP traffic
   Taken from the paper “Taming the torrent” [2]
     Motivation: the overwhelming popularity of P2P (70% of internet traffic worldwide) has yielded significant revenues for ISPs
     However, P2P traffic has significantly increased ISPs’ costs, particularly in terms of cross-ISP traffic
     This has driven ISPs to try to forcefully reduce P2P traffic
       Blocking specific ports, tricking clients into closing connections
       Deep packet inspection
       Caching
Reducing cross-ISP traffic
   One approach to alleviate this pain is to use an
    Oracle that provides knowledge about which peers
    are in the same ISP
     This would benefit both ISPs and p2p community
     But this requires P2P users and ISPs to collaborate and to trust each other
     Not likely to be adopted
Reducing cross-ISP traffic
   Another approach is to recycle data that is already being collected by Content Distribution Networks (CDNs)
     CDNs attempt to improve web performance by redirecting requests to replica servers
     The goal is to help content providers (e.g. CNN) distribute content by redirecting requests to replica servers that are:
       Topologically proximate
       Lower in latency
CDNs as oracles
   Hypothesis: when peers exhibit similar redirection
    behavior, they are likely to be close to the replica
    server, and thus to each other
     Represent  redirection behavior using ratio-maps
     Each ratio represents the frequency of redirecting to a
      specific replica
     Number of replicas is usually small (max 31)

     Keep a time window (~ a day)
CDNs redirections as ratio-maps
   The ratio map of a peer is a set of (replica server, ratio) pairs; for peer a:
     $\mu_a = \langle (r_k, f_k), (r_l, f_l), \ldots, (r_m, f_m) \rangle$
     Specifically, if peer a is redirected toward replica server r1 75% of the time window, and toward replica server r2 25% of the time window, then the corresponding ratio-map is $\mu_a = \langle (r_1, 0.75), (r_2, 0.25) \rangle$
     The sum of all ratios $f_i$ in a given ratio map equals one
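
Building a ratio map from a log of observed redirections can be sketched as below. The function name and the list-of-names input are assumptions made for illustration; how a client actually samples redirections is not shown here.

```python
from collections import Counter

def ratio_map(redirections):
    """Map each replica server to the fraction of redirections it received."""
    counts = Counter(redirections)
    total = sum(counts.values())
    return {replica: n / total for replica, n in counts.items()}

# Four lookups within the time window: three answered with r1, one with r2.
observed = ["r1", "r1", "r2", "r1"]
print(ratio_map(observed))   # prints {'r1': 0.75, 'r2': 0.25}
```

The ratios sum to one by construction, matching the property stated on the slide.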
Similarity via ratio-maps
   Define a metric that, given two peers, produces a value describing the similarity between the peers’ redirection behavior
     We are looking for overlap between the redirection frequency maps of each pair of peers
     Use cosine-similarity between two peers a and b:
Cosine-similarity
   Similarity of peers a and b:
     $cos\_sim(a,b) = \frac{\sum_{i \in I_a} \mu_{a,i} \, \mu_{b,i}}{\sqrt{\sum_{i \in I_a} \mu_{a,i}^2 \cdot \sum_{i \in I_b} \mu_{b,i}^2}}$
     $I_a$ ($I_b$) is the set of replica servers to which peer a (b) has been redirected over the time window
     $\mu_{a,i}$ is the ratio of time that peer a has been redirected to replica server i
     Cosine-similarity is analogous to a dot product
       When maps are identical, it equals 1
       When maps are orthogonal (no common replica), it equals 0
       Values lie in [0,1]
       Pick a similarity threshold (0.15)
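
A minimal sketch of the similarity test, with ratio maps represented as plain dictionaries from replica name to ratio. The `recommend` helper and the strict `>` comparison against the 0.15 threshold are assumptions for illustration, not a description of the Ono code.

```python
import math

def cosine_similarity(map_a, map_b):
    """Cosine of the angle between two ratio maps viewed as sparse vectors."""
    dot = sum(ratio * map_b.get(replica, 0.0) for replica, ratio in map_a.items())
    norm_a = math.sqrt(sum(r * r for r in map_a.values()))
    norm_b = math.sqrt(sum(r * r for r in map_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0                       # an empty map is similar to nothing
    return dot / (norm_a * norm_b)

THRESHOLD = 0.15                         # value quoted on the slide

def recommend(my_map, candidate_maps, threshold=THRESHOLD):
    """Keep the candidates whose redirection behavior resembles ours."""
    return [peer for peer, m in candidate_maps.items()
            if cosine_similarity(my_map, m) > threshold]

a = {"r1": 0.75, "r2": 0.25}
b = {"r1": 0.5, "r3": 0.5}               # shares replica r1 with a
c = {"r3": 1.0}                          # no replica in common with a
nearby = recommend(a, {"b": b, "c": c})
```

Because the maps hold at most a few dozen entries, the comparison is cheap enough to run on every handshake.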
CDNs implementation
   Ono, an extension to the Azureus client
     Upon handshake, two peers exchange ratio-maps
     This enables Ono to perform biased peer selection
       Performs a DNS lookup for each CDN name to determine redirection behavior and encodes it in ratio-maps
       Periodically updates the ratio-maps

     Overhead is extremely small
       18 KB upstream, 36 KB downstream per day
       Computing cosine-similarity is cheap
Ono-recommended empirical results
   Over 120,000 peers use Ono
   Ono collects extra network data
     Ping and traceroute to replica servers and peers
     Obtain feedback on the biased peer selection
       Not easy to determine cross-ISP hops
       IP hops are easy to count and give some measure

   Compare Ono-recommended peer selection to random peer selection
Ono-recommended empirical results
   Cumulative distribution function of the number of IP hops along paths between an Ono client and its peers.
   Each value represents the average number of hops to all peers seen by a particular Ono client during a 6-hour interval
        Ono finds shorter paths
        Median is less than half that of random selection
        More than 20% are only one hop away, vs. less than 2% for random selection
Ono-recommended empirical results
   Each IP address was mapped to the corresponding Autonomous System (AS) ID
   Similar to the previous graph
        Over 33% of paths found
         by Ono do not leave
         the origin AS
        Median AS hops is one
         vs. less than 10% in
         the random case
CDNs as oracles - Summary
   Recycling network views collected by CDNs
     Good internet citizenship in terms of reducing cross-ISP traffic
     Performance of peers is not affected

     Scalable (the more clients adopt it, the more accurate
      the bias would get)
     Available easily and freely
Last on the itinerary
   One interesting limitation of BitTorrent networks
       BitTorrent provides poor service availability
       Shown via analysis of tracker logs over a long period

   A proposed peer selection approach
       Lowers costs and resource usage for ISPs
       Does not require cooperation between ISPs and peers
       Implemented as part of the Azureus client, codename ‘Ono’
   Tradeoffs
       Performance (avg. download time) vs. fairness (avg. share ratio)
BitTorrent Tradeoffs
   Taken from the paper “The delicate tradeoffs in
    BitTorrent-like file sharing protocol design” [3]
     Peers that participate in BT are heterogeneous with
      regard to download and upload capacities
     Taking a system approach
       The  system throughput depends critically on the “fat” peers
       However, this might result in unfairness towards those who
        contribute more
       This in turn would encourage peers to supply low upload
        rate to others
BitTorrent Tradeoffs
   A user expects the download rate it gets to be proportional to the upload rate it supplies
     Assuming peers take the system overview
     Long-lasting, steady state, rational

   Two parameters and their interrelations are explored
     Performance: minimum average download time
     Fairness: ratio between give and take
Model
   W.l.o.g. the file is of size 1
   Assume an average peer arrival rate of $\lambda$
     Assume peers do not abort
     Upon completion, a peer leaves the torrent (BT provides no incentive for seeding)
   Assume n classes (types) of peers
     Each newly arriving peer belongs to type i with probability $\alpha_i$
     Thus, the average arrival rate for class i is $\lambda_i = \alpha_i \lambda$
     Class i has $U_i$ and $D_i$ as upload and download capacities
     Assume $U_1 > U_2 > \ldots > U_n$ (type 1 are the “fat” ones…)
Model
   Visualization of the model for n=2
Model – measuring performance
   Assume (quite natural) that the bottleneck is the
    upload capacity
     i.e.   no network bottleneck such as server saturation
   The file uploading capacity of the entire system
    is
   Consider the steady state:
     Define     as the average number of type i peers
     Approximation:

     Substitute the latter into the former, in steady state:
Model – measuring performance
   In steady state, the capacity should be equal to the
    arrival rate (as the file size is 1)
     Obtain

     Define    “share-ratio”          and rewrite as
       In   a steady and balanced system, share-ratio should be 1
     Average     system download time
   Those two equations define the solution space, as
    well as the resulting performance
   Feasible solutions are the set
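
The formulas on these two slides did not survive, so the following is only one plausible way to write the steady-state relations they describe. All symbols are assumptions introduced here: $\lambda$ is the total arrival rate, $\alpha_i$ the fraction of type-i arrivals, $N_i$ the average number of type-i peers, and $u_i \le U_i$, $d_i \le D_i$ the average upload and download rates of a type-i peer.

```latex
\begin{align}
  N_i &\approx \frac{\alpha_i \lambda}{d_i}
      && \text{(Little's law: a peer stays for } 1/d_i \text{, file size 1)} \\
  \sum_{i=1}^{n} N_i u_i &= \lambda
      && \text{(total upload capacity equals the arrival rate)} \\
  \sum_{i=1}^{n} \alpha_i \theta_i &= 1,
      \quad \theta_i = \frac{u_i}{d_i}
      && \text{(the same balance, written with share ratios)} \\
  T &= \sum_{i=1}^{n} \frac{\alpha_i}{d_i}
      && \text{(average download time over all arrivals)}
\end{align}
```

Substituting the first line into the second and dividing by $\lambda$ gives the third. Under these assumptions the feasible set consists of the rates satisfying the third line together with $0 \le u_i \le U_i$ and $0 < d_i \le D_i$.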
Model – measuring fairness
   Consider share-ratio as a good and natural
    measure of fairness
   Define fairness index:
     Measures   how equal the ratios are (if all are the same,
      it equals 1)
   (after some work) Obtain:
   Also expressed in terms of upload and download of
    each class
Rate strategies
    General assumption: all peers use their maximum upload capacity
      Based on experiments
   We want the optimal (minimum) average download time T
      Solve the constrained optimization problem




       Use the Lagrange multiplier method
Rate strategies
   Optimal average download time T that is obtained
    is:


     We  get an assignment for
     The system gives the “thin” peers (other than type-1)
      maximum upload capacity
     “Thin” peers get more than they contribute
     Calculate the fairness index under this solution (shown
      to be quite low)
Rate strategies
   Now, apply the same strategy to achieve optimal
    fairness
     Then   check what is the resulting performance measure
   Optimal fairness is achieved when
   We get different assignments for
   Compare the two:
     In terms of system performance, we have:
     In terms of system fairness, we have:
Entire design space
   Actually, those are only two of infinitely many possible assignments
   The solution space lies on a curve
     Example: a system of two types with specific capacities
Simulation results
   Experiments with two-types system (with the same
    characteristics as the last system)
     Average     downloading time:




     Fairness:
Simulation results
   Fundamental tradeoffs:
     Taken   for the extreme values of the two strategies:



   Summary:
     Cannot have the best of both worlds
     Current BitTorrent implementations lie somewhere on the curve
Think about …
   With regard to the first paper (service availability), how was the ~8.3 average torrent lifespan deduced?



     Thank   you (those who are still awake…)
Papers
   [1] Measurements, Analysis, and Modeling of BitTorrent-like
    Systems
       Lei Guo, Songqing Chen, Zhen Xiao, Enhua Tan, Xiaoning Ding, and
        Xiaodong Zhang
   [2] Taming the Torrent - A Practical Approach to Reducing Cross-ISP
    Traffic in Peer-to-Peer Systems
       David R. Choffnes and Fabián E. Bustamante
   [3] The Delicate Tradeoffs in BitTorrent-like File Sharing Protocol
    Design
       Bin Fan, Dah-Ming Chiu, John C.S. Lui

				