Shared by: ipx46851
Posted: 5/18/2010
        I Tube, You Tube, Everyone Tubes:
        Analyzing the World’s Largest User
        Generated Content Video System


                 Presented By:
               Anirban Banerjee,
   Dept. Of Computer Science and Engineering,
             UC Riverside, CA 92521
              Anirban@cs.ucr.edu
     Problem Addressed
• We don’t know much about how the
  popularity of content on YouTube-like
  sites changes.
              Motivation
• Understanding how users access
  content can help to
  – target ads
  – predict which videos will become popular
  – allocate resources
            Contribution
• Extensive trace-driven analysis
• Extract interesting features about
  content popularity
• Understand shift in popularity over time
• Effect of duplicate content
                    Outline
• Methodology
• UGC Popularity
• Popularity Evolution
• Caching issues
• Duplicate and Illegal content
• Conclusions
• Comments
        Methodology (UGC)
• YouTube and Daum (Korea)
• YouTube: 2 categories
   – Entertainment
   – Science and Technology
• Daum: all categories
• Video info: uploader, upload time, length, views,
  ratings
   Methodology (non UGC)
• Netflix, Lovefilm, Yahoo Movies
• Brief observations:
  – It takes YouTube 15 days to produce as many videos as
    there are movies listed in IMDb's database
  – # of publishers is massive for UGC
  – # of movies/publisher is roughly the same for UGC and
    non-UGC
  – Popularity and ratings show strong correlation
  – User participation levels are low
          UGC Popularity
• It is not easy to conclude that popularity
  follows a power law.
  – Unpopular items in Netflix don’t follow a
    power law
  – 10% of videos get 80% of hits in UGC
          UGC Popularity
• Why is this interesting?
  – This behavior differs from other VOD
    systems, e.g. PowerInfo (China)
  – Caching a small # of videos will satisfy a large
    # of requests.
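The skew can be checked on any list of per-video view counts. A minimal sketch, assuming a hypothetical `views` list; the numbers are made up for illustration and are not taken from the trace. The same ratio is also the hit ratio of a cache that holds only the top fraction of videos.

```python
def top_share(views, fraction):
    """Return the share of all views captured by the top `fraction` of videos."""
    if not views:
        raise ValueError("views must not be empty")
    ranked = sorted(views, reverse=True)
    # Always keep at least one video so tiny inputs still give an answer.
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Made-up view counts for ten videos (illustration only).
views = [800, 60, 40, 30, 20, 15, 15, 10, 5, 5]
share = top_share(views, 0.10)
print(f"top 10% of videos get {share:.0%} of views")
```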
             UGC Popularity
• Popular content analysis
  – Exhibits power-law behavior
  – Sharp decay for popular content
  – Exact popularity distribution is category dependent
  – Tail is truncated with an exponential cutoff
      • Reason: users hit a video only once (P2P concept, hit-once users)
              UGC Popularity
• Popular content analysis
   – UGC has fetch-at-most-once behavior
   – Extend simulation from Gummadi et al. paper (U: # of users
     in the system, R: # of requests per user, V: # of videos)
      • All HitOnce scenarios show a truncated tail
      • Increasing R or reducing V amplifies the truncation;
        increasing U has no effect
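The fetch-at-most-once effect can be reproduced with a small simulation. This is a sketch under stated assumptions, not the setup from the Gummadi et al. paper: the Zipf exponent of 1, the seed, and the values of U, R and V below are all hypothetical. With fetch-at-most-once, no video can collect more than U requests, which is what caps the popular end of the curve.

```python
import itertools
import random
from collections import Counter


def simulate(num_users, requests_per_user, num_videos, fetch_once, seed=0):
    """Return per-video request counts under a Zipf popularity model."""
    if fetch_once and requests_per_user > num_videos:
        raise ValueError("a user cannot fetch more distinct videos than exist")
    rng = random.Random(seed)
    videos = list(range(num_videos))
    # Zipf with exponent 1 (an assumption): weight of rank r is 1/r.
    cum_weights = list(itertools.accumulate(1.0 / (v + 1) for v in videos))
    counts = Counter()
    for _ in range(num_users):
        seen = set()
        for _ in range(requests_per_user):
            video = rng.choices(videos, cum_weights=cum_weights)[0]
            if fetch_once:
                # Redraw until the user picks a video not yet watched.
                while video in seen:
                    video = rng.choices(videos, cum_weights=cum_weights)[0]
                seen.add(video)
            counts[video] += 1
    return counts


U, R, V = 200, 50, 100  # hypothetical sizes
once = simulate(U, R, V, fetch_once=True)
repeat = simulate(U, R, V, fetch_once=False)
print("top video, fetch-at-most-once:", once[0], "(cannot exceed U =", U, ")")
print("top video, repeated fetches:  ", repeat[0])
```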
              UGC Popularity
• Not-so-popular content analysis
   – Questions
      • What is the distribution of these items?
      • What affects the distribution?
• Sci (Science and Technology) dataset follows Zipf
• Result filtering causes sharp drop-off
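Whether a dataset "follows Zipf" is usually judged from the slope of views against rank on log-log axes. A rough sketch, assuming a least-squares fit on the log-log points; this is a common quick check, not necessarily the fitting method used in the study, and `zipf_exponent` is a hypothetical helper name.

```python
import math


def zipf_exponent(views):
    """Estimate the Zipf exponent as the negated log-log slope of views vs. rank."""
    ranked = sorted((v for v in views if v > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two videos with views")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


# Synthetic views that follow views = 1000 / rank exactly (illustration only).
synthetic = [1000.0 / rank for rank in range(1, 201)]
print(f"estimated exponent: {zipf_exponent(synthetic):.3f}")
```

A tail that drops off faster than the fitted line on real data would be the sharp drop-off the slide attributes to result filtering.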
            UGC Popularity
• Not-so-popular content analysis
   – What will be the result of removing result
     filters?
      • The videos in the tail will receive views
      Popularity Evolution
• Questions
  – Do requests concentrate on young or old videos?
  – How fast does popularity change?
• Findings
  – For really young items (< 1 month), a slight increase
    in avg. requests is observed.
  – 80% of videos requested on a day are older than 1
    month (72% of traffic)
  – Except for very new videos, user preference seems
    insensitive to age
      Popularity Evolution
• Out of top 20 videos requested on a day, 50%
  are new
[Figure annotations: insensitivity; 50% point]
      Popularity Evolution
• After 1 day, 90% of items will be watched at
  least once, 40% over 10 times
• Prob. of video being requested decreases
  over time
• If a video does not receive enough hits early,
  it will probably not receive hits later
[Figure annotation: dips at predictable intervals]
       Popularity Evolution
• Predicting future popularity
  – Analyzing 2-3 days' worth of popularity data is
    good enough for prediction
  – Young videos can make rapid changes in rank
  – Revival of the dead does not happen
  – Ranks of older videos don’t fluctuate as much
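One simple way to see how well early popularity predicts later popularity is to compare the top-k sets at two points in time. This is a swapped-in illustration, not the measure used in the study; the video ids and view counts are hypothetical.

```python
def top_k_overlap(early_views, later_views, k):
    """Fraction of the top-k videos by early views that are also top-k later."""
    if k <= 0:
        raise ValueError("k must be positive")

    def top_k(views):
        ranked = sorted(views, key=views.get, reverse=True)
        return set(ranked[:k])

    return len(top_k(early_views) & top_k(later_views)) / k


# Hypothetical views after 2 days and after a longer period.
early = {"a": 500, "b": 300, "c": 40, "d": 10}
later = {"a": 9000, "b": 4000, "c": 100, "d": 5000}
overlap = top_k_overlap(early, later, k=2)
print(f"top-2 overlap: {overlap:.0%}")
```

A high overlap on real data would support the claim that a few days of history suffice; video "d" in the toy data is the kind of late riser the slide says is rare.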
               Caching Issues
• 3 scenarios
  – Static finite cache (long-term popular videos, 90% of
    traffic)
  – Dynamic infinite cache
  – Hybrid finite cache (static + 10k videos per day)
• Replay a 6-day trace under each scheme
  and calculate hit and miss ratios
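The three schemes can be compared by replaying a request trace. A minimal sketch on a toy trace: the trace, the static set, and the choice to refresh the hybrid cache with the previous day's most requested videos are assumptions made for illustration, since the slides do not give those details.

```python
from collections import Counter


def hit_ratio(trace, static_set=frozenset(), dynamic=False, daily_top=0):
    """Replay `trace` (a list of days, each a list of video ids); return the hit ratio."""
    hits = total = 0
    seen = set()   # every video requested so far (dynamic infinite cache)
    daily = set()  # previous day's most requested videos (hybrid cache)
    for day in trace:
        for video in day:
            total += 1
            if video in static_set or video in daily or (dynamic and video in seen):
                hits += 1
            seen.add(video)
        if daily_top:
            daily = {v for v, _ in Counter(day).most_common(daily_top)}
    return hits / total if total else 0.0


# Toy 3-day trace (illustration only).
trace = [["a", "a", "b", "c"], ["a", "c", "c", "d"], ["c", "d", "d", "e"]]
static = hit_ratio(trace, static_set={"a"})
dynamic = hit_ratio(trace, dynamic=True)
hybrid = hit_ratio(trace, static_set={"a"}, daily_top=1)
print(f"static {static:.2f}, dynamic {dynamic:.2f}, hybrid {hybrid:.2f}")
```

The miss ratio is one minus the returned value.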
            Caching Issues
• Cache efficiency
  – Hybrid model is best, better than static by 10%
• Can P2P help?
  – 95% of videos are requested at intervals of 10 mins
    or longer, so only a small fraction of files will benefit
          Caching Issues
• Expected number of concurrent users in
  the system, assuming:
    • Users watch the full video
    • Users start to share as soon as streaming starts
    • Users stay on site for about 28 mins
[Figure panels: very few videos helped; load is decreased]
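A back-of-the-envelope way to estimate concurrent viewers of one video is Little's law (mean number in system = arrival rate × mean time in system). The slides do not name the model that was used, so this is only a sketch; the function name and the 3-minute viewing time are hypothetical.

```python
def expected_concurrent_viewers(requests_per_minute, minutes_present):
    """Little's law: mean viewers present = arrival rate * mean time present."""
    if requests_per_minute < 0 or minutes_present < 0:
        raise ValueError("rate and duration must be non-negative")
    return requests_per_minute * minutes_present


# A video requested once every 10 minutes:
while_watching = expected_concurrent_viewers(1 / 10, 3)   # peers leave after a 3-min video
while_on_site = expected_concurrent_viewers(1 / 10, 28)   # peers keep sharing for 28 mins
print(f"{while_watching:.1f} peers if sharing only while watching")
print(f"{while_on_site:.1f} peers if sharing for the whole site visit")
```

With fewer than one expected peer there is usually nobody to download from, which is consistent with only a small fraction of videos benefiting.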
   Duplicate and Illegal content

• Duplicates (aliases): sampled 216 videos
  from 10K and used 51 volunteers
• Most videos have 1-4 aliases
• Duplicates cause popularity dilution
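Popularity dilution can be illustrated by folding alias views back into the original video and re-ranking. A small sketch; the video ids, view counts, and alias mapping are hypothetical.

```python
def merge_aliases(views, alias_of):
    """Sum the views of each alias into the video it duplicates."""
    merged = {}
    for video, count in views.items():
        original = alias_of.get(video, video)  # non-aliases map to themselves
        merged[original] = merged.get(original, 0) + count
    return merged


# Hypothetical views: "cat" has two aliases uploaded by other users.
views = {"cat": 500, "cat_copy1": 300, "cat_copy2": 250, "dog": 600}
alias_of = {"cat_copy1": "cat", "cat_copy2": "cat"}

merged = merge_aliases(views, alias_of)
print("top video before merging:", max(views, key=views.get))
print("top video after merging: ", max(merged, key=merged.get))
```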
    Duplicate and Illegal content

• Aliases are uploaded on the same day or
  within a week
• Possibly responsible for flattened tail
• Aliases are mostly uploaded by one-time
  uploaders
• Only 0.4% of all videos have been deleted by
  YouTube due to “concerns”
  – Of these, 5% are copyright violations
            Conclusions
• Extensive study of the YouTube UGC
  portal.
  – What affects the popularity of videos
  – Analyzed long-tail behavior of popularity
  – Simple caching policies can help
  – Aliases dilute popularity rankings
            My Comments
• Wait for it waaaiiiit for it!!

						