YouTube Traffic Characterization:
A View From the Edge
Phillipa Gill¹, Martin Arlitt²,¹, Zongpeng Li¹, Anirban Mahanti³

 ¹Dept. of Computer Science, University of Calgary, Canada
    ²Enterprise Systems & Software Lab, HP Labs, USA
³Dept. of Computer Science and Engineering, IIT Delhi, India

Introduction
• The way people use the Web is changing.
• Creation and sharing of media:
  - Fast, easy, cheap!
• Volume of data associated with extremely popular online media.


What is Web 2.0?
• User-generated content
  - Text: WordPress, Blogspot
  - Photos: Flickr, Facebook
  - Video: YouTube, MySpace
• Social networking
  - Facebook, MySpace
• Tagging
  - Flickr, YouTube

YouTube: Facts and Figures
• Founded in February 2005
  - Enabled users to easily share videos by converting them to Flash
• Largest video-sharing website on the Internet [Alexa2007]
• Sold to Google for $1.65 billion in November 2006

How YouTube Works (1/2)
• GET /watch?v=wQVEPFzkhaM → OK (text/html)
• GET /vi/fNaYQ4kM4FE/2.jpg → OK (image/jpeg)




How YouTube Works (2/2)
• GET swfobject.js → OK (application/x-javascript)
• GET /p.swf → OK (application/x-shockwave-flash)
• GET /get_video?video_id=wQVEPFzkhaM → OK (video/flv)

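The request sequence on these two slides can be replayed offline by classifying each request URL. A minimal Python sketch, assuming the 2007-era URL patterns shown above (YouTube's current URLs differ); the function name classify_request and the category labels are hypothetical, not part of the authors' tooling:

```python
from urllib.parse import parse_qs, urlsplit


def classify_request(url):
    """Return (category, video_id) for a request URL.

    Patterns follow the example requests on the slides; the category
    labels are illustrative. video_id is None when the URL carries none.
    """
    parts = urlsplit(url)
    path, query = parts.path, parse_qs(parts.query)
    if path == "/watch":                       # HTML page for one video
        return "html_page", query.get("v", [None])[0]
    if path == "/get_video":                   # the FLV video itself
        return "video", query.get("video_id", [None])[0]
    if path.startswith("/vi/") and path.endswith(".jpg"):
        # thumbnail path: /vi/<video id>/<n>.jpg
        return "thumbnail", path.split("/")[2]
    if path.endswith(".swf"):                  # Flash player
        return "flash_player", None
    if path.endswith(".js"):                   # e.g. swfobject.js
        return "javascript", None
    return "other", None


if __name__ == "__main__":
    for url in ["/watch?v=wQVEPFzkhaM", "/vi/fNaYQ4kM4FE/2.jpg",
                "swfobject.js", "/p.swf",
                "/get_video?video_id=wQVEPFzkhaM"]:
        print(url, "->", classify_request(url))
```

Keying requests by video ID in this way is one way to group the page, thumbnail, and FLV transfers that mention a given video.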
Our Contributions
• Efficient measurement framework
• One of the first extensive characterizations of Web 2.0 traffic
  - File properties
  - File access patterns
  - Transfer properties
• Implications for network and content providers

Outline
• Introduction & Background
• Contributions
• Methodology
• Results
• Implications
• Conclusions


Our Viewpoints
• Edge (University Campus)
  - 28,000 students
  - 5,300 faculty & staff
  - /16 address space
  - 300 Mb/s full-duplex network link
• Global
  - Most popular videos

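A /16 prefix contains 2^(32−16) = 65,536 IPv4 addresses, and restricting trace records to campus clients reduces to a prefix-membership test. A minimal Python sketch using the standard ipaddress module, where the prefix 10.20.0.0/16 and the helper name is_campus_client are hypothetical placeholders (the slides do not give the campus prefix):

```python
import ipaddress

# Hypothetical placeholder; the slides only state that the campus has a /16.
CAMPUS_NET = ipaddress.ip_network("10.20.0.0/16")


def is_campus_client(addr):
    """True if the IPv4 address string falls inside the campus prefix."""
    return ipaddress.ip_address(addr) in CAMPUS_NET


print(CAMPUS_NET.num_addresses)            # 65536
print(is_campus_client("10.20.7.42"))      # True
print(is_campus_client("192.0.2.1"))       # False
```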
Campus Data Collection
• Goals:
  - Collect data on all campus YouTube usage
  - Gather data for an extended period of time
  - Protect user privacy
• Challenges:
  - YouTube’s popularity
  - Monitor limitations
  - Volume of campus Internet usage

Our Methodology
• Identify servers providing YouTube content
• Use Bro to summarize each HTTP transaction in real time
• Restart Bro daily and compress the daily log
• Map each visitor identifier to a unique ID

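The privacy step in the last bullet can be as simple as a table that maps each raw visitor identifier to an opaque sequential ID before anything is written to disk. A minimal Python sketch, assuming records arrive as (visitor, method, url, status) tuples; the class and function names and the record layout are hypothetical, and the sketch does not reproduce Bro's own log format:

```python
import gzip
import itertools


class VisitorAnonymizer:
    """Replace each distinct visitor identifier with a sequential integer ID.

    The same identifier always maps to the same ID for the lifetime of the
    object, so per-visitor analysis remains possible without writing the
    raw identifier to the log.
    """

    def __init__(self):
        self._ids = {}
        self._counter = itertools.count(1)

    def __call__(self, visitor):
        if visitor not in self._ids:
            self._ids[visitor] = next(self._counter)
        return self._ids[visitor]


def write_daily_log(records, path, anonymize):
    """Write one gzip-compressed, tab-separated line per HTTP transaction.

    records: iterable of (visitor, method, url, status) tuples
    (hypothetical layout).
    """
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for visitor, method, url, status in records:
            out.write(f"{anonymize(visitor)}\t{method}\t{url}\t{status}\n")
```

Because the mapping lives only in memory, a daily restart would renumber visitors unless the table is persisted (and itself protected) between runs.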
Categories of Transactions
• Complete – the entire transaction was parsed successfully
• Interrupted – TCP connection was reset
• Gap – monitor missed a packet
• Failure – transaction could not be parsed


Categories of Transactions (2)
Status         % of Total   % of Video
Complete            90.82        24.66
Interrupted          1.88        24.25
Gap                  1.56        51.09
Failure              5.75            -

Our Traces
Start Date:                 Jan. 14, 2007
End Date:                    Apr. 8, 2007
Total Valid Transactions:     23,250,438
Total Bytes:                      6.54 TB
Total Video Requests:             625,593
Total Video Bytes:                6.45 TB
Unique Video Requests:            323,677
Unique Video Bytes:               3.26 TB
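A few derived figures follow directly from this table. A short Python check, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the ratios do not depend on that choice, only the mean transfer size does. The variable names are illustrative:

```python
TB, MB = 10**12, 10**6  # decimal units assumed

total_bytes = 6.54 * TB
video_requests = 625_593
video_bytes = 6.45 * TB
unique_videos = 323_677
unique_video_bytes = 3.26 * TB

stats = {
    # share of all traced bytes that were video
    "video_byte_share": video_bytes / total_bytes,
    # average bytes transferred per video request, in MB
    "mean_mb_per_video_request": video_bytes / video_requests / MB,
    # how many times, on average, each distinct video was requested
    "requests_per_unique_video": video_requests / unique_videos,
    # approximate share of video bytes attributable to repeat requests
    "repeat_video_byte_share": 1 - unique_video_bytes / video_bytes,
}

for name, value in stats.items():
    print(f"{name:28s} {value:.3f}")
```

Under these assumptions about 98.6% of traced bytes were video, each video request averaged roughly 10.3 MB, each distinct video was requested about 1.93 times, and roughly half of the video bytes came from repeat requests.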
HTTP Response Codes
Code                     % of Responses   % of Bytes
200 (OK)                          75.80        89.78
206 (Partial Content)              1.29        10.22
302 (Found)                        0.05         0.00
303 (See Other)                    5.33         0.00
304 (Not Modified)                17.34         0.00
4xx (Client Error)                 0.19         0.00
5xx (Server Error)                 0.01         0.00
Global Data Collection
• Crawling all videos is infeasible
• Focus on the top 100 most popular videos
  - Four time frames: daily, weekly, monthly, and all time
• Two-step data collection:
  - Retrieve pages of the most popular videos
  - Use the YouTube API to get details on these videos

Results
• Campus Usage Patterns
• File Properties
• File Access Patterns
• Transfer Properties




Campus Usage Patterns
[Figure: campus YouTube usage patterns, annotated "Reading Break"]
Unique File Sizes
• Video files are significantly larger than files of other content types
Time Since Modification
• Videos and images are rarely modified
• Text and application data are modified more frequently
Video Durations
• Spike around 3 minutes, likely due to music videos
• Campus videos are relatively short: mean μ = 3.3 min
Summary of File Properties
• Video content is much larger than other content types
• Image and video content is more static than application and text content
• Video durations are relatively short
• Videos viewed on campus tend to be more than 1 month old

Relative Popularity of Videos
• Video popularity follows a weak Zipf distribution (β = 0.56)
• Possibly due to the edge-network point of view

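The slide reports β but not how it was estimated. One common approach, assumed here, fits a straight line to log(request count) against log(rank): a Zipf-like distribution with count ∝ rank^(−β) is linear on a log-log plot with slope −β. A minimal Python sketch; the function names are hypothetical:

```python
import math
from collections import Counter


def fit_zipf_beta(counts):
    """Least-squares estimate of beta in count ~ rank**(-beta).

    counts: per-video request counts, in any order (needs >= 2 videos).
    """
    ranked = sorted(counts, reverse=True)          # rank 1 = most popular
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx                              # fitted slope is -beta


def beta_from_requests(video_ids):
    """Estimate beta from a raw list of requested video IDs."""
    return fit_zipf_beta(Counter(video_ids).values())


# Exactly Zipf-like synthetic counts recover the exponent used to build them.
synthetic = [1000 * r ** -0.56 for r in range(1, 501)]
print(round(fit_zipf_beta(synthetic), 3))
```

The example prints 0.56. On real traces the fit is sensitive to the long tail of videos requested only once, so fitting over a restricted rank range or using a maximum-likelihood estimator are common alternatives.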
Commonality of Videos
• ~10% commonality between consecutive days during the week
• ~5% commonality between consecutive days on the weekend

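The slides do not define commonality precisely. One plausible definition, assumed here, is the overlap between the sets of distinct videos requested on two consecutive days, measured as intersection over union (Jaccard similarity); dividing by the size of only one day's set would be another reasonable choice. A minimal Python sketch with hypothetical function names:

```python
def commonality(day_a, day_b):
    """Jaccard overlap of the distinct videos requested on two days (0..1)."""
    a, b = set(day_a), set(day_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consecutive_commonality(days):
    """Commonality for each pair of consecutive days.

    days: list of per-day collections of requested video IDs, in date order.
    """
    return [commonality(prev, cur) for prev, cur in zip(days, days[1:])]


week = [["v1", "v2", "v3"], ["v2", "v3", "v4"], ["v5"]]
print(consecutive_commonality(week))   # [0.5, 0.0]
```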
Summary of File Referencing
• The Zipf distribution is weak when observed from the edge of the network
• There is some overlap between videos viewed on consecutive days
• A significant amount of content viewed on campus is non-unique


Transfer Sizes
[Figure: transfer sizes by content type, annotated: Flash player (p.swf, player2.swf); JavaScript files]

Transfer Durations
• Video transfers have significantly longer durations than other content types




Summary of Transfer Properties
• JavaScript and Flash objects noticeably affect the sizes of transferred files
• Video transfers have significantly larger sizes and durations




Implications for Network Providers
• Web 2.0 poses challenges to caching:
  - Larger multimedia files
  - More diversity in content
• Metadata may be used to improve caching efficiency




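One way to see why large video files and diverse content strain a cache is to replay a request trace through a simple cache and compare the hit rate (share of requests served from cache) with the byte hit rate (share of bytes served from cache). A minimal Python sketch of a byte-capacity LRU cache; this is an illustrative model assumed here, not the authors' evaluation, and the function name and (object_id, size) trace format are hypothetical:

```python
from collections import OrderedDict


def simulate_lru(requests, capacity_bytes):
    """Replay (object_id, size_bytes) requests through an LRU cache.

    Assumes each object always has the same size. Objects larger than the
    whole cache are never stored. Returns (hit_rate, byte_hit_rate).
    """
    cache = OrderedDict()          # object_id -> size, least recent first
    used = hits = hit_bytes = total_bytes = 0
    for obj, size in requests:
        total_bytes += size
        if obj in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(obj)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue                # too large to admit
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict LRU entry
            used -= evicted_size
        cache[obj] = size
        used += size
    n = len(requests)
    return (hits / n if n else 0.0,
            hit_bytes / total_bytes if total_bytes else 0.0)


trace = [("a", 5), ("b", 5), ("a", 5), ("c", 5), ("b", 5)]
print(simulate_lru(trace, capacity_bytes=10))
```

Skipping objects larger than the whole cache is only one possible admission policy; metadata such as view counts could instead drive admission or eviction decisions.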
Implications for Content Providers
• Multimedia content is large!
• 65,000 videos/day × 10 MB/video × 30 days = 19.5 TB/month
• Long-tail effect: much of the content will be unpopular
  - Cheap storage solutions
• Longer transfer durations for video files
  - More CPU cycles required for transfers

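The storage estimate above works out as follows, assuming a 30-day month and decimal units (1 TB = 10^6 MB): 65,000 × 10 MB = 650 GB of new video per day, or 19.5 TB over 30 days. The same arithmetic in Python (variable names illustrative):

```python
videos_per_day = 65_000
mb_per_video = 10
days_per_month = 30                        # assumed month length

mb_per_month = videos_per_day * mb_per_video * days_per_month
tb_per_month = mb_per_month / 1_000_000    # decimal units: 1 TB = 10**6 MB
print(f"{tb_per_month} TB/month")          # 19.5 TB/month
```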
Conclusions
• Multimedia content has much larger transfer sizes and durations than other content types
• From the edge of the network, video popularity follows a weak Zipf distribution
• Web 2.0 facilitates diversity in content, which poses challenges to caching
• New approaches are needed to efficiently handle the resource demands of Web 2.0 sites

Questions?

Contact: psessini@ucalgary.ca
