I Tube, You Tube, Everyone Tubes:
Analyzing the World’s Largest User
Generated Content Video System.
Presented By:
Anirban Banerjee,
Dept. Of Computer Science and Engineering,
UC Riverside, CA 92521
Anirban@cs.ucr.edu
Problem Addressed
• We don’t know much about how the
popularity of content on YouTube-like
sites changes.
Motivation
• Understanding how users access
content can help to
– target ads
– predict which videos will become popular
– allocate resources.
Contribution
• Extensive trace-driven analysis
• Extract interesting features about
content popularity
• Understand shift in popularity over time
• Effect of duplicate content
Outline
• Methodology
• UGC Popularity
• Popularity Evolution
• Caching issues
• Duplicate and Illegal content
• Conclusions
• Comments
Methodology (UGC)
• Youtube and Daum (Korea)
• Youtube - 2 categories
– Entertainment
– Science and Technology
• Daum - all categories
• Video info: uploader, upload time, length, views,
ratings
Methodology (non UGC)
• Netflix, Lovefilm, Yahoo Movies.
• Brief Observations:
– YouTube takes 15 days to accumulate as many videos as
there are movies listed in IMDb’s database
– # of publishers is massive for UGC
– # of movies/publisher is more or less the same for UGC and
non-UGC
– Popularity and ratings show strong correlation
– User participation levels are low
UGC Popularity
• Not easy to conclude that popularity
follows a power law.
– Unpopular items in Netflix don’t follow
a power law
– 10% of videos get 80% of hits in UGC
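The "10% of videos get 80% of hits" figure is a skew measurement that can be reproduced on any list of per-video view counts. The following is a minimal sketch, not the authors' analysis code; the function name `top_share` and the sample view counts are hypothetical.

```python
def top_share(view_counts, top_fraction=0.10):
    """Return the fraction of all views captured by the most-viewed videos."""
    if not view_counts:
        raise ValueError("view_counts must not be empty")
    ranked = sorted(view_counts, reverse=True)
    # Always keep at least one video in the "top" group.
    k = max(1, int(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical view counts for ten videos (not taken from the trace).
views = [800, 50, 40, 30, 25, 20, 15, 10, 6, 4]
share = top_share(views)  # share of views held by the top 10% of videos
```

On a real trace, plotting this share against `top_fraction` would show how concentrated the requests are.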
UGC Popularity
• Why is this interesting:
– This behavior is different from other VOD
systems: PowerInfo (China)
– Caching a small # of videos will satisfy a large
# of requests.
UGC Popularity
• Popular content analysis
– Exhibits power-law behavior
– Sharp decay for popular content
– Exact popularity distribution is category dependent
– Truncation at the tail with an exponential cutoff
• Reason: users hit a video only once (P2P concept, hit-once users)
UGC Popularity
• Popular content analysis
– UGC has fetch-at-most-once behavior
– Extend the simulation from the Gummadi et al. paper (U: # of users
in the system, R: # of requests per user, V: # of videos)
• All HitOnce scenarios show a truncated tail
• Increasing R or reducing V amplifies the truncation; increasing U has no
effect
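The fetch-at-most-once effect can be illustrated with a small simulation using the same parameters U, R and V. This is a rough sketch under stated assumptions, not the simulation from Gummadi et al. or from the paper: the Zipf-like intrinsic popularity, the `alpha` exponent, the rejection-sampling redraw and all function names are assumptions made for illustration.

```python
import random
from itertools import accumulate


def simulate_requests(num_users, requests_per_user, num_videos,
                      fetch_at_most_once=True, alpha=1.0, seed=0):
    """Return per-video request counts, indexed by intrinsic popularity rank."""
    if fetch_at_most_once and requests_per_user > num_videos:
        raise ValueError("R cannot exceed V when each video is fetched once")
    rng = random.Random(seed)
    videos = range(num_videos)
    # Assumed Zipf-like intrinsic popularity: weight of rank i is 1 / i**alpha.
    cum_weights = list(accumulate(1.0 / (rank ** alpha)
                                  for rank in range(1, num_videos + 1)))
    counts = [0] * num_videos
    for _ in range(num_users):
        seen = set()
        served = 0
        while served < requests_per_user:
            video = rng.choices(videos, cum_weights=cum_weights, k=1)[0]
            if fetch_at_most_once and video in seen:
                continue  # this user already fetched it; draw again
            seen.add(video)
            counts[video] += 1
            served += 1
    return counts


U, R, V = 200, 10, 100  # hypothetical sizes, far smaller than the real system
hits_once = simulate_requests(U, R, V, fetch_at_most_once=True)
hits_free = simulate_requests(U, R, V, fetch_at_most_once=False)
```

Under fetch-at-most-once no video can collect more than U requests, which caps the most popular items and truncates the distribution compared with the unrestricted run.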
UGC Popularity
• Not so Popular content analysis
– Questions
• What is the distribution of these items?
• What affects the distribution?
• Sci dataset follows Zipf
• Result filtering causes a sharp drop-off
UGC Popularity
• Not so Popular content analysis
– What would happen if result filters were
removed?
• The videos in the tail would receive views
Popularity Evolution
• Question
– Do requests concentrate on young or old videos?
– How fast does popularity change?
• Findings
– For really young items (< 1 month) slight increase
in avg. requests observed.
– 80% of videos requested on a day are older than 1
month (72% of traffic)
– Except very new videos, user preference seems
insensitive to age
Popularity Evolution
• Out of top 20 videos requested on a day, 50%
are new
[Figure annotations: insensitivity; 50% point]
Popularity Evolution
• After 1 day, 90% of items will be watched at
least once, 40% over 10 times
• Prob. of video being requested decreases
over time
• If a video does not receive enough hits early,
it will probably not receive hits later
[Figure annotation: dips at predictable intervals]
Popularity Evolution
• Predicting future popularity
– Analyzing 2-3 days worth of popularity data is
good enough for prediction
– Young videos can make rapid changes in rank
– Revival of the dead does not happen
– Ranks of older videos don’t fluctuate as much
Caching Issues
• 3 scenarios
– Static finite cache (long-term popular vids, 90% of
traffic)
– Dynamic infinite cache
– Hybrid finite cache (static + 10k vids per day)
• Replay a 6-day trace under the various schemes
and calculate hit and miss ratios
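Replaying a trace against a cache policy amounts to counting how many requests find their video already cached. The sketch below compares a static cache with a hybrid one. It is an illustration only: the slides do not say how the hybrid cache picks its daily videos, so refreshing it with the previous day's most-requested uncached videos is an assumption, as are the function names and the toy trace.

```python
from collections import Counter


def hit_ratio_static(trace_by_day, cached):
    """Hit ratio when a fixed set of video IDs stays cached for the whole trace."""
    requests = [video for day in trace_by_day for video in day]
    hits = sum(1 for video in requests if video in cached)
    return hits / len(requests)


def hit_ratio_hybrid(trace_by_day, cached, daily_slots):
    """Static set plus `daily_slots` extra videos refreshed once per day.

    Assumed refresh rule: the extra slots hold the previous day's
    most-requested videos that are not already in the static set.
    """
    hits = total = 0
    extra = set()
    for day in trace_by_day:
        for video in day:
            total += 1
            if video in cached or video in extra:
                hits += 1
        counts = Counter(video for video in day if video not in cached)
        extra = {video for video, _ in counts.most_common(daily_slots)}
    return hits / total


# Hypothetical three-day trace of video IDs (not from the paper's data).
trace = [
    ["a", "a", "c", "c", "b"],
    ["a", "c", "c", "d"],
    ["c", "d", "d", "a"],
]
static_ratio = hit_ratio_static(trace, cached={"a"})
hybrid_ratio = hit_ratio_hybrid(trace, cached={"a"}, daily_slots=1)
```

The miss ratio is one minus the hit ratio in both cases.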
Caching Issues
• Cache efficiency
– Hybrid model is best, better than static by 10%
• Can P2P help?
– 95% of videos are requested after 10 mins or longer,
so only a small fraction of files will benefit
Caching Issues
• Expected number of concurrent users in
the system
• Users watch full video
• Start to share as soon as streaming starts
• Stay on site for about 28 mins
[Figure annotations: very few videos are helped; load is decreased]
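One simple way to estimate the expected number of concurrent users for a video is Little's law: the average number of users present equals the arrival rate multiplied by the time each user stays. The sketch below applies it with the 28-minute stay from the slide. Whether the paper used exactly this calculation is not stated here, and the function name and the request rates are hypothetical.

```python
def expected_concurrent_viewers(requests_per_day, minutes_online):
    """Little's law: average users present = arrival rate * time spent online."""
    rate_per_minute = requests_per_day / (24 * 60)
    return rate_per_minute * minutes_online


# Hypothetical request rates; 28 minutes is the stay time quoted on the slide.
popular = expected_concurrent_viewers(requests_per_day=1440, minutes_online=28)
unpopular = expected_concurrent_viewers(requests_per_day=10, minutes_online=28)
```

A video needs at least two users online at the same time before peers can share it, so videos with an estimate well below 1 gain little from P2P.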
Duplicate and Illegal content
• Duplicates (aliases): sampled 216 videos
from 10K and used 51 volunteers to identify aliases
• Most videos have 1-4 aliases
• Duplicates cause popularity dilution
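Popularity dilution means that the views of one piece of content are split across its aliases, so each copy ranks lower than the content would as a whole. A minimal sketch of undoing that split, assuming alias groups are already known (for example from manual labeling); the function name, the video IDs and the view counts are hypothetical.

```python
def merge_alias_views(views, alias_groups):
    """Fold the views of each alias group into the group's first video ID."""
    merged = dict(views)
    for group in alias_groups:
        original = group[0]
        total = sum(views[video] for video in group)
        for video in group:
            merged.pop(video, None)
        merged[original] = total
    return merged


# Hypothetical view counts: "orig" has two aliases that split its audience.
views = {"orig": 500, "copy1": 300, "copy2": 250, "other": 700}
merged = merge_alias_views(views, [["orig", "copy1", "copy2"]])
top_before = max(views, key=views.get)    # most-viewed ID with aliases split
top_after = max(merged, key=merged.get)   # most-viewed ID after merging
```

In this toy example the diluted content ranks below "other" until its aliases are merged.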
Duplicate and Illegal content
• Aliases are uploaded on the same day or
within a week
• Possibly responsible for flattened tail
• Aliases mostly uploaded by one-time
uploaders
• Only 0.4% of all videos have been deleted by
YouTube due to “concerns”
– Of these, 5% are copyright violations
Conclusions
• Extensive study of the Youtube UGC
portal.
– What affects the popularity of videos
– Analyzed long tail behavior of popularity
– Simple caching policies can help
– Aliases dilute popularity rankings
My Comments
• Wait for it waaaiiiit for it!!