Characteristics of User-Generated Video Content

Haewoon Kwak†, Meeyoung Cha†§∗, Pablo Rodriguez§, Sue Moon†

† Division of Computer Science, KAIST, Korea
§ Telefonica Research Lab, Barcelona, Spain

{haewoon@an,sbmoon@cs}.kaist.ac.kr
{mia,pablo.rodriguez}@tid.es

Abstract
Streaming of user-generated video content (UGC) has become an extremely popular Internet application. In this paper, we collect traces from two popular UGC services, YouTube1 and Daum Movies2, and study their video characteristics.

1 Introduction

Video-on-demand (VoD) service has become extremely popular on the Internet. Having started primarily as a download-based service within restricted network domains (e.g., broadband access networks), VoD has evolved into a web-based live streaming distribution service reaching hundreds of millions of Internet users. Content diversity in VoD has also grown massively, as a virtually unlimited number of videos are becoming available. In particular, the demand for user-generated content (UGC) has grown explosively. For instance, YouTube, a digital video repository for sharing videos, was the fastest-growing web service in 2006. High demand for UGC has brought new business opportunities in the user-driven video market, sparking an explosion of YouTube clone sites. AOL’s UnCut Video3 aims right at this niche market, and WeWin Video4 entices users with gift coupons to watch the best videos. Wired magazine recently reported on this new culture [11], referring to it as “bitesize bits for high-speed munching.” Despite the excitement, no study has yet examined how these UGC services fundamentally differ from traditional, well-explored video distribution services. In this paper we study the characteristics of videos in UGC services and how they differ from traditional professionally generated content.

We begin our study by pointing out distinctive differences between UGC and traditional VoD services, which we simply call VoD services from now on.

• Content Producers and Consumers: Videos in VoD services have historically been created and supplied by production companies, TV networks, and cable networks. They are produced by professionals and consumed by the general public. In UGC services, every single user is a potential producer. User-generated clips are often made for a very small group of viewers (e.g., sharing baby videos with family), possibly leading to an abundance of unpopular videos within the system.
• Video Length: Traditional VoD services serve high-quality, lengthy videos. As service subscription consumes both time and money, users tend to select movies carefully before watching. User-generated videos are very short (e.g., a few minutes), and viewers are less burdened by their choices, even if they later find the videos uninteresting.
• Production Process: Producing movies or TV series for the general public requires sophisticated infrastructure and professionals. By contrast, UGC requires less production effort. Any user can readily rip TV programs or DVD movies, or make short videos with a camcorder. The fast production rate and the vast number of producers together create the massive scale of UGC.
• Revenue Model: In standard VoD services, content is often pay-per-view. This allows VoD services to provide a better end-user experience as well as a much more personalized experience, with targeted advertisements and personal recommendations. Online UGC systems are monetized through advertising and are open to the general public. Thus, user behavior is harder to predict.
• Content Duplication: Trained experts handle videos in VoD services. Videos are categorized and made accessible by the genre, the director, the year, and so on. Multiple copies of a video may exist to support diverse networking conditions. Users readily choose the best rate they can get by streaming. By contrast, UGC services often hold identical or very similar copies of videos from a single popular event, such as a popular soccer game. Consumers of UGCs are faced with different versions of one event and choose based on the title, the view count, the length, the uploader, etc.

With these differences in mind, we discuss key features that distinguish UGC services from traditional VoD services. We use traces from two representative UGC services, YouTube and Daum Movies, for analysis. More specifically, we focus on content producers and consumers, the production process, and video lengths. We capture the differences in user behavior between UGC services and traditional VoD services, and expect them to inform the design of distribution systems for future UGC services.

∗ Meeyoung is supported by the Brain Korea 21 Project through the School of Information Technology at KAIST. This work was done while Meeyoung was an intern at Telefonica.
1 http://www.youtube.com
2 http://tvpot.daum.net
3 http://uncutvideo.aol.com
4 http://wewin.com

[Figure 1 diagram: number of producers (few or many) versus number of consumers (few or many). Many producers, few consumers: personal, unpopular UGC. Many producers, many consumers: popular UGC. Few producers, few consumers: niche film industry. Few producers, many consumers: traditional VoD content (blockbusters).]
Figure 1: Content producers and consumers

2 UGC Service Overview

Before we look into the detailed characteristics of UGC services, we summarize the differences between UGC and VoD services in Figure 1. Traditional VoD systems serve legitimate, lengthy content made for the general public. There exists a clear distinction in role and scale between content producers and consumers. There also exist videos produced not for wide distribution but for niche markets, such as art cinema and documentary films. In UGC services, all users are potential content producers. The number of targeted viewers may be significantly smaller than for traditional VoD systems, especially if the content is personal to its producer (e.g., sharing videos with friends). Nonetheless, many UGCs become highly popular and are enjoyed by an enormous number of viewers. The following are brief introductions to YouTube and Daum Movies, the two services we have analyzed.

1. YouTube is credited with jump-starting the UGC boom with its simple upload and viewing interface. Launched in early 2004, YouTube is now known to serve over 100 million distinct videos daily. Roughly 65,000 new videos are uploaded every day. YouTube provides an easy-to-use platform for uploading and sharing. Users may watch videos without logging in to YouTube. No additional client programs are required. YouTube streams videos through Adobe’s Flash Player plug-in in web browsers, which is reported to be installed on 90% of Internet-connected computers [1]. To upload, rate, or comment on a video clip, users must log in to YouTube. From January to February 2007, we crawled all the videos within the ‘Howto & DIY’ category (previously called ‘Science & Technology’). We obtained 252,255 videos, along with information about the owner, upload time, length, and views.

2. Daum Movies, launched in late 2006, is the most popular video sharing service in Korea. Unlike the stand-alone YouTube service, Daum Movies is an add-on service of the main Daum portal site, a major provider of e-mail, blog, and web search services in Korea. Like YouTube, Daum Movies streams videos via Adobe’s Flash Player. However, its choice of codec allows users to upload higher-quality videos5, streamed at 800 kb/s. We crawled all videos in the ‘Music’ category in March 2007 and obtained 69,489 videos, along with detailed video information.

While there are numerous real-world VoD systems, only a handful of large-scale analyses are publicly known. One of the most extensive analyses of a traditional VoD system was done by Yu et al. [14]. Their work is based on PowerInfo, a major video streaming service in China. It was launched in early 2004 and provided over 6,700 titles of movies, TV series, and cartoons during the first year of its service. For comparison purposes, we cite some of the results in [14], as it is the only available large-scale public data.

To have further comparison points between UGC and professionally generated content, we also use results from several movie databases, including IMDb [2], Netflix, Yahoo! Movies, and LoveFilm.com.

3 Content Production

The content volume in traditional video production is inherently constrained by the high production effort and resources required. IMDb, the largest online movie database, gives us some hints on this scale [2]. IMDb carries 963,309 titles of movies, TV episodes, and mini-series produced from 1888 up until now. The PowerInfo VoD system introduced on average nine new videos daily in its first year of service [14]. The number of newly added videos in UGC services is strikingly different. In YouTube, 65,000 new videos are uploaded daily, which means that it takes only 15 days for YouTube to produce the same number of videos as listed in IMDb. The Daum service showed 2,000 daily uploads during the first few months of its service. Now 15,000 new videos are uploaded daily, according to its reports. In this section we take a detailed look at UGC production. We focus on the UGC producers (or uploaders), video length distributions, and upload patterns.

5 YouTube uses the Sorenson Spark H.263 video codec, while Daum Movies uses On2’s VP6 codec.

[Figure 2 plot: CDF versus number of posts per uploader (movies per director), log-scale x-axis; series: YouTube, LoveFilm, Daum.]
Figure 2: The number of videos posted by uploaders

[Figure 3 plot: median views across videos (log scale) versus uploaders ranked by the number of uploads, from 1300 uploads down to 100 uploads; series: power uploaders, overall median.]
Figure 3: Video popularity of heavy-uploaders in YouTube
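The production-rate comparison in this section can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the function name `days_to_match` and the constant names are ours and purely illustrative.

```python
def days_to_match(catalog_size: int, uploads_per_day: int) -> float:
    """Return how many days of uploads equal a catalog of the given size."""
    if uploads_per_day <= 0:
        raise ValueError("uploads_per_day must be positive")
    return catalog_size / uploads_per_day


# Figures quoted in the text.
IMDB_TITLES = 963_309            # titles carried by IMDb [2]
YOUTUBE_DAILY_UPLOADS = 65_000   # new YouTube videos per day
POWERINFO_DAILY_UPLOADS = 9      # new PowerInfo videos per day [14]

youtube_days = days_to_match(IMDB_TITLES, YOUTUBE_DAILY_UPLOADS)
rate_ratio = YOUTUBE_DAILY_UPLOADS / POWERINFO_DAILY_UPLOADS

print(f"YouTube needs about {youtube_days:.1f} days to match IMDb")
print(f"YouTube adds roughly {rate_ratio:.0f} times as many videos per day as PowerInfo")
```

With these inputs the first figure comes to a little under 15 days, in line with the estimate in the text.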
[Figure 4 plot: CDF versus video length in seconds, log-scale x-axis; series: YouTube, Daum, LoveFilm.]

Figure 4: Distribution of video lengths

3.1 Content Producers

UGC services have a huge number of producers. In our trace, YouTube and Daum show 122,782 and 16,461 distinct uploaders, respectively. We plot the cumulative distribution function (CDF) of the number of videos posted per uploader in Figure 2. For comparison purposes, we also show the number of movies per director on the LoveFilm site. LoveFilm6 is Europe’s most popular online DVD rental service. For UGC, we observe that over 60% of the uploaders in YouTube post only a single video. The average number of videos per uploader is two in YouTube and four in Daum, similar to the mean number of movies per director in LoveFilm. This means that the majority of UGCs are posted by one-time producers, who more likely enjoy the role of consumer. We also find 35 power uploaders who posted more than 100 videos. In comparison, the LoveFilm results show that no director has made more than 60 movies. This is an interesting point, since understanding heavy uploaders’ behavior can help in designing better UGC systems. To this end, we rank the power uploaders by the number of video posts and plot the total number of views for their video clips in Figure 3. The horizontal dashed line represents the median views across all the videos in YouTube. Note that the y-axis is in log scale. We observe that nearly half of the power uploaders post very popular videos. Digging further, we find that videos posted by the top two power uploaders serve 2% of all the requests of YouTube.

Unlike in traditional VoD services, there is little or no monetary incentive to produce videos in UGC services. Nonetheless, power uploaders exist. Here we speculate on their motivations. One possibility is reputation: UGC uploaders may enjoy fame on the Internet. Another possibility is that uploaders use UGC services as personal archives. As many video sharing websites provide free storage, one may use such a service to store one’s multimedia files. In fact, we have manually confirmed that some power uploaders in Figure 3 use YouTube as a personal video archive.

Next, focusing on all the uploaders, we divide them into three groups: light, if one posts fewer than 10 videos; heavy, if one posts 10 to 100 videos; and power, if one posts more than 100 videos. The groups contain 120,430, 2,290, and 35 uploaders, respectively. In YouTube, 98% of all uploaders are light, while a few upload from 10 to 1300 videos. Then, for each uploader group, we count how many uploaders post popular videos, that is, videos with more views than the average. The average number of views across all videos is 2,586, and the median is 275. We find that 12.5% of light uploaders (15,056) post popular videos. Among heavy uploaders, 21.5% (492) post popular videos, and among power uploaders, 22.9% (8) post popular ones. We see a clear trend that active uploaders post more popular videos than less active uploaders. This probably indicates that most of the one-time uploaders post unpopular videos.
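The uploader grouping described above can be sketched as follows. This is a minimal illustration on toy data, not the analysis code used for the trace: the `(uploader_id, views)` record layout and the function names are assumptions, and we assume an uploader counts as posting popular videos when at least one of their videos exceeds the average view count.

```python
from collections import Counter


def classify(upload_count: int) -> str:
    """Group an uploader by number of posted videos (thresholds from the text)."""
    if upload_count < 10:
        return "light"
    if upload_count <= 100:
        return "heavy"
    return "power"


def popular_share_by_group(videos, threshold):
    """Return {group: fraction of uploaders with a video above threshold}.

    videos is a list of (uploader_id, views) pairs (hypothetical layout).
    """
    uploads = Counter(uploader for uploader, _ in videos)
    has_popular = {uploader for uploader, views in videos if views > threshold}
    totals, popular = Counter(), Counter()
    for uploader, count in uploads.items():
        group = classify(count)
        totals[group] += 1
        if uploader in has_popular:
            popular[group] += 1
    return {group: popular[group] / totals[group] for group in totals}


# Toy data: "a" and "b" are light uploaders, "c" is a heavy uploader.
toy_videos = [("a", 5000), ("a", 10), ("b", 3)] + [("c", 100)] * 12
shares = popular_share_by_group(toy_videos, threshold=2586)
print(shares)
```

In the toy data only uploader "a" has a video above the threshold, so half of the light uploaders and none of the heavy uploaders count as posting popular videos.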
6 http://www.lovefilm.com

3.2 Video Length

Now let us focus on video lengths. In Figure 4 we plot the distribution of video lengths for Daum and YouTube. YouTube has few videos that are longer than an hour (e.g., recordings of lectures). For convenience of illustration, we ignore videos longer than 1,000 seconds, which account for only 0.72% of all videos. Daum videos are typically longer than YouTube ones; the median video length is 70 seconds for YouTube and 200 seconds for Daum. This gap is due to the difference in video genres. We have observed that video length distributions vary across genres (e.g., Sports, Music, and Humor). Our YouTube trace is from the ‘Science & Technology’ category, and our Daum trace is from ‘Music’. This suggests that optimizations such as file placement policies need to differ by genre. Overall, 80% of videos are shorter than 5 minutes in both services. Compared to these relatively short UGCs, the PowerInfo system carries videos with a wide range of lengths, from 30- to 60-minute TV series to 90- to 120-minute movies [14].

On a separate note, we do not see an evident correlation between a video’s popularity (e.g., the number of views) and its length. The correlation coefficient between the two is very small: −0.0001 for Daum and 0.0190 for YouTube.

The video length distribution, along with the session time, gives us interesting insights into the content usage pattern. It is known that the average YouTube viewer uses the service for 17 minutes [5]. Considering the median video length, we estimate that typical users enjoy about 10 video clips per visit. By contrast, PowerInfo users watch roughly 0.8633 videos on average [14], leaving the system before even a single video stream completes.
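As an illustration of the correlation check between video length and views mentioned above, the sketch below computes a correlation coefficient on toy data. The text does not say which coefficient was used; we assume the Pearson coefficient here, and the function name and sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy data (not from the trace): views unrelated to length give r = 0,
# views proportional to length give r = 1.
lengths = [1, 2, 3, 4]
unrelated_views = [1, -1, -1, 1]
proportional_views = [2 * x for x in lengths]

print(pearson(lengths, unrelated_views))
print(pearson(lengths, proportional_views))
```

A value near zero, like those reported for Daum and YouTube, means that knowing a video's length says almost nothing about its view count under a linear model.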

3.3 Content Upload Patterns

Finally we focus on content upload patterns. We first measure the hourly upload rate of new videos to analyze time-of-day uploader behavior. Figure 5(a) shows the distribution of the number of new videos uploaded by the hour in our Daum trace. Videos are uploaded around the clock, but we observe a peak between 8 PM and 2 AM, which accounts for 50% of the total uploads. This is in sharp contrast to the peak usage hours of business applications. We omit the same analysis for YouTube because YouTube showed only the upload date and not the time. However, we conjecture that the time-of-day effect might be less apparent in the YouTube trace, as its service is enjoyed by geographically distributed users.

Now we look into uploader behavior by the week in Figures 5(b) and (c). The y-axis represents the aggregate number of videos uploaded over 1 year for Daum and over 5 random weeks for YouTube. We make the following observations. First, new videos are uploaded relatively evenly throughout the week. This daily upload behavior is consistent with the PowerInfo VoD system [14]. However, in another study, Tang et al. showed that the inter-content introduction time in HP corporate media servers follows a Pareto distribution [12]. Next, we see a subtle increase from Monday through Wednesday for YouTube, and from Sunday through Tuesday for Daum. Sunday is the heaviest day in Daum, while Wednesday is the heaviest in YouTube. Monday sees moderately heavy uploads in both services, as upload traffic builds up on Monday in YouTube and tapers off after Sunday in Daum. We reason that cultural differences may cause Daum uploaders to be more active on Sundays, while Sunday is a moderately off-peak day for YouTube users. Also, high broadband penetration in Korea is likely to help users enjoy video sharing as a major pastime.

4 Content Consumption

Amongst the massive volume of UGCs, it is crucial to understand how videos are consumed by users. In this section we focus on the content consumption patterns in UGC. We look into the popularity distribution across videos and request distribution over a short period of time.

4.1 Video Popularity Distribution

One consequence of the massive number of new and existing UGCs is a long list of unpopular videos. Figure 6 shows the views of three services: YouTube, Netflix, and Yahoo.

[Figure 6 plot: CDF versus number of views, log-scale x-axis; series: YouTube (Sci), Netflix, Yahoo.]

Figure 6: CDF on video popularity

[Figure 5 panels: (a) hourly uploads in Daum, fraction of videos (PDF) versus time of day; (b) daily uploads in Daum, number of videos versus day of week; (c) daily uploads in YouTube, number of videos versus day of week.]
Figure 5: Time-of-day and day-of-week trends in uploads

For YouTube, we plot the CDF of the number of views per video across videos in the ‘Science’ category. Netflix is a popular online video rental store. We use customer ratings of 17,770 videos that are publicly available7. We expect the actual number of rentals to be significantly larger than the number of ratings per movie; however, we have empirically determined from other data sets that there is a linear relationship between ratings and viewings. The Netflix plot is, therefore, a lower bound on the number of views per movie. Lastly, we use the movie charts on the Yahoo! Movies website8. The Yahoo dataset includes all the movies that appeared in the top-ten daily Box Office Chart within the United States from January 2004 to March 2007. We obtained box office earnings for 361 videos. We obtain an approximate number of viewers per movie by dividing the box office earnings by the movie ticket price (e.g., 10 dollars). We note that the Yahoo data contains only extremely popular movies, and that consumers in Yahoo and Netflix are regionally limited to the US, while YouTube is enjoyed by international users.

We make several observations distinguishing UGC services from standard VoD services. First, 1,782 videos within YouTube have zero views, while all the videos in Netflix and Yahoo have positive views. Second, the median number of views is much smaller for YouTube than for the other two datasets: the median is 182, 561, and 3,843,300 for YouTube, Netflix, and Yahoo, respectively. The many unpopular UGCs give YouTube the lowest median. Finally, the most popular YouTube video had up to 2.5 million viewers, while the most popular movies in Yahoo had approximately 43 million viewers. In contrast to the massive number of videos, we observe here a difference in the scale of consumers per video between UGC and standard VoD services. In addition, we conjecture that the movies in Yahoo remain popular for a much longer period than videos in UGC services.

7 http://www.netflixprize.com
8 http://movies.yahoo.com/
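The two basic computations behind the popularity comparison, estimating viewers from box office earnings and reading a point off an empirical CDF, can be sketched as below. The function names and all input values are hypothetical toy examples, not figures from the Yahoo, Netflix, or YouTube datasets; only the 10 dollar ticket price comes from the text.

```python
import statistics


def estimated_viewers(box_office_usd: float, ticket_price_usd: float = 10.0) -> float:
    """Approximate viewers as earnings divided by an assumed ticket price."""
    if ticket_price_usd <= 0:
        raise ValueError("ticket_price_usd must be positive")
    return box_office_usd / ticket_price_usd


def fraction_at_most(values, x):
    """Empirical CDF at x: the share of values less than or equal to x."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v <= x) / len(values)


# Toy inputs, chosen only to illustrate the calculation.
toy_earnings = [1_000_000, 20_000_000, 430_000_000]
toy_viewers = [estimated_viewers(e) for e in toy_earnings]
toy_views = [0, 0, 5, 100]

print(statistics.median(toy_viewers))   # median estimated viewers
print(fraction_at_most(toy_views, 0))   # share of videos with zero views
```

The estimate is only as good as the assumed ticket price, which is why the text presents it as an approximation.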
[Figure 7 plot: CDF versus increase in views, log-scale x-axis from 1 to 10^6; series: Daum, YouTube.]

Figure 7: Distribution of requests over videos

4.2 Content Consumption Patterns

Here we focus on the video request trend over time, that is, the aging process. To identify content consumption patterns, we take two snapshots of the videos a short time apart and treat the increment in the number of views as the requests for each video. For all the videos in our dataset, we obtain the increment in views over one week for Daum and YouTube. Figure 7 shows the distribution of these increments. We observe that 55% and 60% of videos do not receive any request in YouTube and Daum, respectively. It is surprising to see that over half of all UGCs are absolutely unpopular in both services. From the request volumes, we observe that requests are highly skewed in YouTube. In fact, in a closer analysis that relates popularity to video age, we noted that the videos uploaded within the most recent week are 0.8% of all videos in February 2007, while their aggregate requests account for 7.9% of all requests. This shows that users strongly prefer recent videos to old ones. We have also noted that 97% of Daum videos have fewer than 10 requests and 60% of YouTube videos have fewer than 100 requests over one week, as most clips receive a very small number of views.
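The snapshot-difference method described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: each snapshot is taken to be a mapping from a video identifier to its cumulative view count, and the function names and toy values are ours.

```python
def view_increments(before, after):
    """Per-video increase in views between two snapshots.

    before and after map video_id -> cumulative views (hypothetical layout).
    Only videos present in both snapshots are compared.
    """
    return {vid: after[vid] - before[vid] for vid in before if vid in after}


def share_without_requests(increments):
    """Fraction of videos whose view count did not change."""
    if not increments:
        raise ValueError("increments must not be empty")
    return sum(1 for d in increments.values() if d == 0) / len(increments)


def request_share(increments, video_ids):
    """Fraction of all requests that went to the given videos."""
    total = sum(increments.values())
    if total == 0:
        raise ValueError("no requests recorded")
    return sum(increments[v] for v in video_ids if v in increments) / total


# Toy snapshots taken one week apart (values are made up).
week_start = {"v1": 100, "v2": 50, "v3": 0, "v4": 7}
week_end = {"v1": 100, "v2": 50, "v3": 90, "v4": 17}

increments = view_increments(week_start, week_end)
print(share_without_requests(increments))   # share of videos with no requests
print(request_share(increments, ["v3"]))    # share of requests to recent video v3
```

In the toy data, half of the videos receive no requests and the single recent video "v3" attracts most of them, which mirrors the kind of skew reported for the real traces.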

[Figure 8 plot: increase in views over a week (log scale) versus video age in days (0 to 600); series: sum across videos, average across videos.]

Figure 8: CDF on video popularity

Next, we examine closely how the age of a video relates to its popularity. We group videos uploaded on the same day and calculate the aggregate requests for those videos. In Figure 8, we plot the sum and the average of requests made towards videos in each age group for YouTube. (Daum results are omitted; however, they show a similar trend.) Note that the y-axis is in log scale. The top line shows the overall number of requests for all videos of a given age. We clearly see that most requests are geared towards young videos. This could be due to two factors: a) a larger fraction of younger videos in the system, and b) users preferring to watch younger videos rather than old ones. To separate these factors, the bottom line in Figure 8 shows the average number of requests per video of a given age. Looking at the average, we see a clear bias of requests towards younger videos, although it is not as pronounced as for the aggregate views over all videos of that age. This indicates that the number of young videos is significantly larger than the number of old videos. Both values become noisy for video ages greater than 450 days, due to the small number of videos.
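The age-group aggregation above can be sketched as below, to show why the sum and the per-video average can tell different stories. The `(age_days, requests)` record layout, the function name, and the toy values are assumptions made for illustration.

```python
from collections import defaultdict


def requests_by_age(videos):
    """Return {age_days: (sum of requests, average requests per video)}.

    videos is an iterable of (age_days, requests) pairs (hypothetical layout).
    """
    groups = defaultdict(list)
    for age_days, requests in videos:
        groups[age_days].append(requests)
    return {
        age: (sum(reqs), sum(reqs) / len(reqs))
        for age, reqs in sorted(groups.items())
    }


# Toy data: three one-day-old videos and a single 300-day-old video.
toy = [(1, 40), (1, 0), (1, 20), (300, 10)]
by_age = requests_by_age(toy)
print(by_age)
```

In the toy data the young group receives six times the total requests of the old group but only twice the average per video, because it simply contains more videos. This is the effect the two lines in Figure 8 are meant to separate.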

5 Related Work

Interest in YouTube-like services has risen dramatically in recent years. However, relatively few attempts have been made to analyze those services [3] [9]. These reports roughly show YouTube’s scale and the characteristics of its users, such as the oldest active viewer and the most devoted uploaders. By contrast, much has been written about traditional VoD services. Yu et al. provide an exhaustive analysis of user behavior in VoD services [15]. In contrast to many earlier studies, which relied mainly on video rental services or Internet streaming services, they study a large deployed VoD system through detailed server logs. They focus on user behavior and content access patterns, and introduce a user-arrival model. Other researchers offer good case studies of video streaming services. Almeida et al. present an analysis of the workloads of educational media servers [7]. They model the distribution of file access frequencies by the concatenation of two Zipf-like distributions, as in Costa et al.’s work [13]. They argue that the traditional cache-on-first-miss strategy needs to be reevaluated for video streaming services because of crowd effects. High temporal locality is also reported in Acharya et al.’s work [6]. Such locality makes peer-to-peer (p2p) delivery attractive for VoD services. Peer-assisted delivery for VoD services has been much studied [8] [10]. These analyses rely mainly on simulation. Likewise, the feasibility of distributing UGCs over peer-to-peer systems can be evaluated.

6 Concluding Remarks

In this paper, we have compared UGC with traditional VoD services from various aspects. We have shown that a fast content production rate and an enormous number of existing and new videos are distinctive features of UGC services. As a result, most videos get a very short exposure time in which to be consumed, producing a long list of unpopular videos. Even for these videos, popularity more or less lasts for a day. This highlights the importance of having a smart recommendation engine within the system. We hope that this analysis will provide useful input in designing next-generation UGC services such as YouTube mashups and others (e.g., [4], Rimo9, DARAO10, Dogga11, Oreseg12). In particular, we intend to extend this analysis to better understand the popularity distribution, why and how videos become popular, how videos are found, and what the potential is for better recommendation engines.

9 http://rimo.tv/
10 http://saqoosha.net/darao/
11 http://www.dogga.jp/wii/
12 http://oreseg.com/

References
[1] Adobe Flash Player version penetration. http://www.adobe.com/products/player_census/flashplayer/version_penetration.html.
[2] IMDb statistics. http://www.imdb.com/database_statistics.
[3] Now starring on the web: YouTube. http://wired-vig.wired.com/techbiz/media/news/2006/04/70627.
[4] Web 2.0 mashup listing for YouTube. http://www.programmableweb.com/api/YouTube/mashups.
[5] YouTube fact sheet. http://www.youtube.com/press_room.
[6] S. Acharya, B. Smith, and P. Parnes. Characterizing user access to videos on the world wide web. In Proc. of ACM/SPIE Multimedia Computing and Networking (MMCN), San Jose, CA, USA, January 2000.
[7] J. M. Almeida, J. Krueger, D. L. Eager, and M. K. Vernon. Analysis of educational media server workloads. In ACM NOSSDAV, pages 21–30, New York, NY, USA, 2001. ACM Press.
[8] T. Do, K. A. Hua, and M. Tantaoui. P2VoD: Providing fault tolerant video-on-demand streaming in peer-to-peer environment. In Proc. of the 2004 IEEE International Conference on Communications, volume 3, pages 1467–1472, 2004.
[9] L. Gomes. Will all of us get our 15 minutes on a YouTube video?, 2006.
[10] Y. Guo, K. Suh, J. Kurose, and D. Towsley. A peer-to-peer on-demand streaming service and its performance evaluation. In Proc. of the International Conference on Multimedia and Expo, pages 649–652, Washington, DC, USA, 2003. IEEE Computer Society.
[11] N. Miller. Manifesto for a new age. In Wired Mag., March 2007.
[12] W. Tang, Y. Fu, L. Cherkasova, and A. Vahdat. MediSyn: Synthetic streaming media service workload generator. In ACM NOSSDAV, 2003.
[13] C. P. Costa et al. Analyzing client interactivity in streaming media. In Proc. of 13th WWW, pages 534–543, New York, NY, USA, 2004. ACM Press.
[14] H. Yu, D. Zheng, B. Y. Zhao, and W. Zheng. Understanding user behavior in large-scale video-on-demand systems. In Proceedings of the 2006 ACM Eurosys Conference, April 2006.
[15] H. Yu, D. Zheng, B. Y. Zhao, and W. Zheng. Understanding user behavior in large-scale video-on-demand systems. ACM SIGOPS, 40(4):333–344, 2006.


				