Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload


K. P. Gummadi, R. J. Dunn, et al.
SOSP’03
Presented by Lu-chuan Kung
kung@uiuc.edu
Outline
   Trace methodology and analysis
    – User characteristics
    – Client activities
    – Object dynamics
 Analysis of why the Kazaa workload is not Zipf
 A model of P2P file-sharing workloads
 A study of bandwidth-saving techniques
 Conclusion
Trace Methodology
 Passively collect Kazaa traffic at the border between
  the campus network and the Internet
 Query traffic was not captured because of encryption.
  File transfers are HTTP transfers with Kazaa-specific
  headers
 Summary statistics of the trace:
Kazaa Users Are Patient




 – Transfer time: the difference between the start
   time and the end time of a request
 – Small objects: <10MB (mostly audio files)
 – Large objects: >100MB (typically video files)
Users Slow Down As They Age
   Do people become hungrier for content as they
    gain experience with Kazaa?
   Older clients requested fewer bytes because of:
    1. Attrition: population declines as clients age
    2. Slowing down: older clients ask for less
Client Activity
 It’s difficult to quantify the availability of clients in
  a P2P system
 Client activity includes:
    – Activity fraction: time spent in transfers divided by
      the client’s lifetime; a lower bound on availability
    – Average session length: the typical duration of a
      client session
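The two metrics above can be computed from per-client transfer records. The following is a minimal sketch, not the authors' tooling. It assumes each transfer is a (start, end) pair in seconds, that a session is an unbroken period with at least one transfer in progress, and that a client's lifetime runs from its first transfer start to its last transfer end. The function names are hypothetical.

```python
def merge_sessions(transfers):
    """Merge overlapping (start, end) transfers into disjoint sessions."""
    sessions = []
    for start, end in sorted(transfers):
        if sessions and start <= sessions[-1][1]:
            # Transfer overlaps the current session: extend it.
            sessions[-1][1] = max(sessions[-1][1], end)
        else:
            sessions.append([start, end])
    return sessions


def client_activity(transfers):
    """Return (activity_fraction, average_session_length) for one client."""
    if not transfers:
        raise ValueError("client has no transfers")
    sessions = merge_sessions(transfers)
    busy = sum(end - start for start, end in sessions)
    # Assumed lifetime: first transfer start to last transfer end.
    lifetime = sessions[-1][1] - sessions[0][0]
    if lifetime <= 0:
        raise ValueError("lifetime must be positive")
    return busy / lifetime, busy / len(sessions)


if __name__ == "__main__":
    fraction, avg_session = client_activity([(0, 10), (5, 20), (50, 60), (90, 100)])
    print(fraction, avg_session)
```

Because a client may be online without transferring anything, the fraction computed this way can only underestimate availability, which is why the slide calls it a lower bound.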
Object Characteristics
   Kazaa is not one workload
    – Kazaa is a blend of workloads of different
      properties
    – 3 ranges of objects: small (<10MB), medium
      (10MB to 100MB), and large (>100MB)
    – Majority of requests are for smaller objects
    – Most bytes transferred are due to large objects
Kazaa Object Dynamics
   Multimedia objects are immutable, which
    affects object dynamics
    – Kazaa clients fetch objects at most once
       • A Kazaa client requests a given object at most once: 94% of the time
       • A Kazaa client requests a given object at most twice: 99% of the time
    – Most requests are for old (repeated) objects
       • An object is old if at least one month has passed
         since the first request for the object
       • 72% of requests for large objects are for old objects
       • 52% of requests for small objects are for old objects
Kazaa Object Dynamics
   The popularity of Kazaa objects is often short-lived
    – On the Web, the most popular pages remain stable
    – Popularity is fleeting in Kazaa
    – Audio files lose popularity faster than popular video files
   The most popular Kazaa objects tend to be recently born
    objects
    – Newly born objects: did not receive any requests during the
      first month of the trace
Kazaa Is Not Zipf
   Zipf’s law:
    – The popularity of the ith-most popular object is
      proportional to $i^{-\alpha}$, where $\alpha$ is the Zipf coefficient
   Kazaa is not Zipf
    – Most popular objects are less popular than Zipf
      would predict
Why Kazaa Is Not Zipf
 Fetch-repeatedly vs. fetch-at-most-once
 Simulate the two cases based on the same Zipf
  distribution
 The result of fetch-at-most-once is similar to
  Kazaa.
 Non-Zipf workloads are also observed in web
  proxy caches and VoD servers
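The comparison described above can be sketched as a small simulation. This is an illustrative sketch, not the authors' simulator: the client count, object count, and requests per client are made-up values chosen to run quickly, and `simulate_popularity` is a hypothetical name. Both modes draw from the same Zipf distribution; in fetch-at-most-once mode a client redraws whenever it picks an object it already has.

```python
import random
from collections import Counter
from itertools import accumulate


def simulate_popularity(at_most_once, num_clients=200, num_objects=500,
                        requests_per_client=50, alpha=1.0, seed=0):
    """Return per-object request counts, sorted from most to least requested."""
    if requests_per_client > num_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    # Cumulative Zipf weights: the object of rank i has weight i^(-alpha).
    cum_weights = list(accumulate(rank ** -alpha
                                  for rank in range(1, num_objects + 1)))
    objects = range(num_objects)
    counts = Counter()
    for _ in range(num_clients):
        fetched = set()
        for _ in range(requests_per_client):
            obj = rng.choices(objects, cum_weights=cum_weights)[0]
            # Fetch-at-most-once: redraw until the object is new to this client.
            while at_most_once and obj in fetched:
                obj = rng.choices(objects, cum_weights=cum_weights)[0]
            fetched.add(obj)
            counts[obj] += 1
    return sorted(counts.values(), reverse=True)


if __name__ == "__main__":
    once = simulate_popularity(at_most_once=True)
    repeated = simulate_popularity(at_most_once=False)
    print("top-5 counts, fetch-at-most-once:", once[:5])
    print("top-5 counts, fetch-repeatedly:  ", repeated[:5])
```

In fetch-at-most-once mode no object can receive more requests than there are clients, so the head of the popularity curve is flattened relative to the Zipf line, which is the shape the slide attributes to Kazaa.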
A Model of P2P File-Sharing Workloads
 Hypothesis: underlying popularity of objects in a
  fetch-at-most-once system is driven by Zipf’s law
 Each client requests 2 objects per day, choosing
  which object to fetch from Zipf(1)
 Objects are born at rate λo; each new object’s
  popularity rank is drawn from Zipf(1)
 The total object population cannot be observed from
  the trace. Use back-inference: given that 18,000
  distinct objects are requested in the trace, what is
  the total number of objects? Answer: 40,000
Model Structure and Notation
   Parameter values are chosen to reflect the
    measured data from the trace
File-Sharing Effectiveness
 How should an organization expect bandwidth
  demand to change over time, given a shared
  proxy server?
 Hit rate of the proxy cache decreases in the
  fetch-at-most-once case
 Fetch-at-most-once clients consume the most
  popular objects early
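The effect described above can be illustrated with a toy simulation of a shared proxy cache. This is a sketch under stated assumptions, not the authors' experiment: the cache is a finite LRU cache whose size is an arbitrary choice made here, the object and client counts are scaled down to run quickly, there are no new object or client arrivals, and `simulate_hit_rate` is a hypothetical name. Only the request rate (2 objects per client per day) and the Zipf(1) popularity come from the slides.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def simulate_hit_rate(at_most_once, num_objects=2000, num_clients=50,
                      requests_per_day=2, days=150, cache_size=200, seed=1):
    """Return the per-day request hit rate of a shared LRU proxy cache."""
    if at_most_once and days * requests_per_day > num_objects:
        raise ValueError("clients would run out of unfetched objects")
    rng = random.Random(seed)
    cum_weights = list(accumulate(1.0 / rank
                                  for rank in range(1, num_objects + 1)))
    objects = range(num_objects)
    cache = OrderedDict()                      # LRU order: oldest first
    fetched = [set() for _ in range(num_clients)]
    daily_hit_rate = []
    for _day in range(days):
        hits = total = 0
        for client in range(num_clients):
            for _ in range(requests_per_day):
                obj = rng.choices(objects, cum_weights=cum_weights)[0]
                # Fetch-at-most-once clients never re-request an object.
                while at_most_once and obj in fetched[client]:
                    obj = rng.choices(objects, cum_weights=cum_weights)[0]
                fetched[client].add(obj)
                total += 1
                if obj in cache:
                    hits += 1
                    cache.move_to_end(obj)     # mark as most recently used
                else:
                    cache[obj] = True
                    if len(cache) > cache_size:
                        cache.popitem(last=False)  # evict least recently used
        daily_hit_rate.append(hits / total)
    return daily_hit_rate


if __name__ == "__main__":
    once = simulate_hit_rate(at_most_once=True)
    repeated = simulate_hit_rate(at_most_once=False)
    print("last-30-day hit rate, fetch-at-most-once:", sum(once[-30:]) / 30)
    print("last-30-day hit rate, fetch-repeatedly:  ", sum(repeated[-30:]) / 30)
```

As clients age they exhaust the popular objects and draw from the tail, so their requests overlap less with what the cache holds; fetch-repeatedly clients keep returning to the same popular objects.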
New Object Arrivals Improve Hit Rate
 Object updates in the Web lower the hit rate
 New object arrivals are beneficial in P2P systems
    – Arrivals of popular objects increase hit rate
    – If no arrivals, clients are forced to choose from the
      remaining unpopular objects
New Clients Cannot Stabilize Performance
 The infusion of new clients at a constant rate
  cannot compensate for the increasing number of
  old clients
 To keep the hit rate constant, the client arrival
  rate would need to grow exponentially
Model Validation
 Underlying Zipf assumption cannot be validated
  directly.
 Use the proposed model to replicate the object
  popularity distribution in the trace
    – Estimate various parameters
    – Arrival rate of new objects is chosen to fit the
      measured data. λo = 5,475 objects per year
Exploring Locality-aware Request Routing
 A significant fraction of Internet bandwidth is
  consumed by Kazaa
 How would exploitation of locality help to save
  bandwidth?
 Different ways to exploit locality:
    – A centralized proxy cache placed at the organization’s
      border
    – Request redirection: favor organization-internal
      peers
       • Centralized request redirection
       • Decentralized request redirection
An Ideal Proxy Cache
 Assume an ideal proxy: infinite capacity and
  bandwidth
 86% of external bandwidth would be saved
 However, some may not want to store P2P file-
  sharing content in a proxy server due to legal
  issues
Benefits of Locality-Awareness
   Trace-based simulation
    –   Infinite storage capacity
    –   At most 12 concurrent downloads
    –   Upload bandwidth 500 Kb/s
    –   External bandwidth 100 Kb/s
    –   Clients are available only when they’re transferring
        (a very conservative assumption)
 Cold misses: the object cannot be found at any peer
 Busy misses: the object is found, but the peer is
  unavailable due to concurrent transfers
Benefits of Locality-Awareness
 Locality awareness obtained 68% byte hit rate for
  large objects and 37% byte hit rate for small
  objects
 A substantial fraction of missed bytes (62% for large
  objects, 43% for small objects) is due to
  unavailable clients
Benefits of Increased Availability
 Most of the bytes served and consumed come from
  highly available peers
 Adding availability to the most available hosts
  earns a higher hit rate than adding it to the least
  available hosts
Conclusion
   P2P file-sharing workloads differ from Web
    workloads
    – Users are patient
    – Aged clients demand less
    – Fetch-at-most-once
 The proposed model suggests that client births
  and object births are the fundamental forces
  driving P2P workloads
 There’s significant locality in the Kazaa workload
    – Locality-aware peers would save 63% of external
      transfers even under conservative assumptions
Comments
 Some of the observed characteristics may be
  related to the design of Kazaa and the measurement
  methodology and thus cannot be generalized
 The lack of portal sites in P2P systems may also
  be a reason that the most popular objects in P2P are
  less popular than Zipf’s law would predict
Assessing the Quality of Voice Communications Over Internet Backbones


A. P. Markopoulou, F. A. Tobagi, M. J. Karam
IEEE/ACM Trans. on Networking, vol. 11, no. 5, Oct. 2003
Presented by Lu-chuan Kung
Outline
   VoIP System
    – Playout schemes
 Voice Impairment in Networks
 Internet measurements
 Numerical results
 Discussion
VoIP System
   Speech signal
    – Talkspurts have mean ~ 352ms
    – Silence periods have mean ~ 650ms
 Encoding schemes
 Packetizer: adds headers for the different protocols
 Playout buffer: packets are held for a later
  playout time in order to smooth playout
 Decoder: reconstructs the speech signal
Playout Schemes
 Two types: fixed and adaptive
 Fixed playout scheme:
    – End-to-end delay p is the same for all packets
    – Large delay decreases packet loss due to late
      arrivals, but also decreases interactivity
   Adaptive playout scheme:
    – Estimate p from the average delay $d_{av}$ and the
      delay variation $v$
    – $p = d_{av} + 4v$
    – Update the estimate of p either
       • Talkspurt by talkspurt, or
       • Packet by packet
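The two playout schemes above can be sketched in a few lines. This is an illustration under assumptions, not the estimator used in the paper: the slide gives only the rule p = dav + 4v, so the exponentially weighted moving averages for dav and v, the smoothing weight `alpha`, and both function names are choices made here. For simplicity the sketch treats p as a bound on network delay and ignores the other delay components.

```python
def late_loss_rate(network_delays_ms, playout_delay_ms):
    """Fixed scheme: fraction of packets arriving after the playout deadline."""
    late = sum(1 for d in network_delays_ms if d > playout_delay_ms)
    return late / len(network_delays_ms)


def adaptive_playout_delays(network_delays_ms, alpha=0.998):
    """Adaptive scheme: packet-by-packet estimate of p = d_av + 4 * v.

    d_av and v are tracked with exponentially weighted moving averages
    (an assumption; alpha is the weight given to the previous estimate).
    """
    d_av = float(network_delays_ms[0])
    v = 0.0
    estimates = []
    for delay in network_delays_ms:
        d_av = alpha * d_av + (1 - alpha) * delay
        v = alpha * v + (1 - alpha) * abs(d_av - delay)
        estimates.append(d_av + 4 * v)
    return estimates


if __name__ == "__main__":
    delays = [50.0] * 10 + [150.0]          # made-up trace with one delay spike
    print(late_loss_rate(delays, 100.0))    # larger p means fewer late packets
    print(adaptive_playout_delays(delays, alpha=0.9)[-1])
```

A talkspurt-by-talkspurt variant would keep updating the estimates on every packet but only apply a new value of p at the start of each talkspurt, so the delay changes fall inside silence periods.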
Voice Impairment in Networks
   Quality of voice is affected by
    –   Encoding
    –   Packet loss
    –   Network delay jitter
    –   End-to-end delay
    –   Echo
   End-to-end delay consists of
    –   Encoding delay
    –   Packetization delay
    –   Network delay
    –   Playout buffering delay
    –   Decoding delay
Assessment of Voice Communication in Packet Networks
 Mean Opinion Score (MOS): a subjective rating
  given by listeners on a scale of 1 to 5
 Intrinsic quality $MOS_{intr}$: quality after compression
Degradation Due to Loss




 PLC: Packet Loss Concealment
 Convert loss rate to MOS
Loss of Interactivity
 Loss of interactivity due to large end-to-end delay
 NTT study
    – 6 conversation modes (tasks), task 1 is the
      hardest, task 6 is the most relaxed type
Echo Impairment
 Echo can cause major quality degradation
 The effect of echo is a function of delay and echo
  losses
E-model
 Published by ITU-T. Provides formulas to predict
  the MOS of voice quality
 $R = (R_0 - I_s) - I_d - I_e + A$
    –   $R_0$: basic signal-to-noise ratio
    –   $I_s$: impairments simultaneous with the voice signal, e.g. sidetone and PCM quantization
    –   $I_d$: impairment due to delay (echo + loss of interactivity)
    –   $I_e$: impairment due to distortion (loss)
    –   $A$: advantage factor (lenient users)
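The rating R is turned into a predicted MOS by a fixed mapping. The sketch below combines the impairment terms exactly as in the formula above and then applies the R-to-MOS conversion given in ITU-T Recommendation G.107, which the slide does not show. The function names and the numeric inputs in the usage example are hypothetical, chosen only to illustrate the arithmetic.

```python
def r_factor(r0, i_s, i_d, i_e, a=0.0):
    """E-model rating: R = (R0 - Is) - Id - Ie + A."""
    return (r0 - i_s) - i_d - i_e + a


def r_to_mos(r):
    """Map an E-model rating R to an estimated MOS (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    # Hypothetical impairment values, for illustration only.
    r = r_factor(r0=100, i_s=6, i_d=10, i_e=14)
    print(r, round(r_to_mos(r), 2))
```

With these made-up inputs R is 70, which maps to a MOS of about 3.6.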
Internet Measurements
   Probe measurement
    –   5 major U.S. cities
    –   43 paths in total
    –   7 providers: P1,P2,…,P7
    –   50-byte probes sent every 10 ms
Observations on the Traces
 Duration of the trace: 3 days
 Network loss
    – 6 out of 7 providers have outages
    – Outages happened at least once per day
   Delay characteristics
    – Delay spikes
    – Alternation between high and low states
    – Periodic clustered delay spikes
Delay Characteristics
Consistent Characteristics Per Provider
One Example Call
 Apply the E-model to the traces using different playout
  buffer schemes
 Example of a 15-min call
One Example Call
   Fixed playout incurs many losses in the last 5
    mins
How to Choose p for Fixed Scheme
 Tradeoff between loss and delay
 There is an optimal value of delay that achieves
  the maximum MOS value
Example Path – Many Calls
 Random calls uniformly spread over an hour
 150 short (3.5-min) and 50 long (10-min) calls
 Plot CDF vs. MOS

[Figure: CDF of MOS; panels: Fixed Playout, Adaptive Playout]
Discussion
   Backbone networks have a wide range of
    performance
    – Some are already able to support high quality
      voice communications
    – Some are barely able to provide acceptable VoIP
      service (MOS >3.6)
    – Reliability problems are a more serious concern than
      the lack of QoS mechanisms
Comments
   How representative are the chosen paths of
    typical paths on the Internet?
    – End-host-to-end-host paths have larger delays


								