

Analyzing Patterns of User Content
Generation in Online Social Networks
Lei Guo, Yahoo!
Enhua Tan, Ohio State University
Songqing Chen, George Mason University
Xiaodong Zhang, Ohio State University
Yihong (Eric) Zhao, Yahoo!
Online social networks: platforms for social
connections and content sharing
• Networking-oriented OSNs (user network: social connections)
– Knowledge sharing mainly among friends
• Knowledge-sharing-oriented OSNs (content network: common-interest topics)
– Content sharing among all users
UGC content in online social networks
• User generated content (UGC)
– Users are basic elements of OSNs
– OSNs are driven by user contributions
[Figure: cycle in which users create new content and the content attracts new users; also labeled "advertisement"]
• Understanding UGC content generation patterns is important
– Business success: attract new users and clients
– Identify and distinguish active users from spamming users
– Predict hot spots and the trends of topics in user communities
– Perform efficient resource management in the underlying supporting system
Existing studies about user contributions in
online social networks
• Wikipedia
– Power law: core users contribute most articles (ISSI’05)
  • Number of articles a user edited
  • Number of co-authors of a Wiki article
  • Heavy tailed, scale free: highly skewed towards top users
– User contribution shifts from “elite” users to common users (CHI’07)
  • Log analysis from 2001 to 2006
  • Power law or not: no conclusion
• Delicious social bookmark (CHI’07)
– Similar shift in user contribution as in Wikipedia
– Power law or not: no conclusion
[Figure: power-law rank distribution, log y vs. log i: a straight line with slope −a and a heavy tail, $y_i \propto i^{-a}$, where i is the contribution rank of a user and y_i is the contribution of that user]
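A power-law rank plot is a straight line of slope −a in log-log scale. The sketch below shows one common way to check that: an ordinary least-squares fit of log y_i against log i. It is an illustration only, not the estimator used by the cited studies; the function name `fit_power_law_rank` is hypothetical, and least squares on log-log plots is a rough estimator for real data.

```python
import math

def fit_power_law_rank(contributions):
    """Least-squares fit of log y_i = log C - a * log i on rank-ordered data.

    Returns (a, C). Assumes every contribution is positive.
    """
    ys = sorted(contributions, reverse=True)            # y_1 >= y_2 >= ...
    xs = [math.log(i) for i in range(1, len(ys) + 1)]   # log i
    ls = [math.log(y) for y in ys]                      # log y_i
    n = len(xs)
    mx, ml = sum(xs) / n, sum(ls) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxl = sum((x - mx) * (l - ml) for x, l in zip(xs, ls))
    slope = sxl / sxx                  # slope in log-log scale is -a
    intercept = ml - slope * mx        # intercept is log C
    return -slope, math.exp(intercept)

# Synthetic contributions drawn exactly from y_i = 1000 * i^(-0.9)
data = [1000 * i ** -0.9 for i in range(1, 501)]
a, C = fit_power_law_rank(data)
```

On real traces the points bend away from the line at both ends, which is exactly why the studies above could not conclude whether the data were a power law.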
Our study
• UGC content in three large online social networks
– Blog, social bookmark, and question answering
• User posting over time
• Distribution of user contributions
• Implications of UGC generation patterns
• Concluding remarks
UGC creation traffic overview
[Figure: weekly pattern (posts (%) over Mon to Sun) and daily pattern (posts (%) by hour of day, 0 to 23) for Blog article (Asia), Blog photo (Asia), Bookmark (US), and Answer (US)]
• Weekly patterns
– Blog (article/photo): weekday and weekend post volumes are similar, consistent with daily web journaling
– Bookmark/Answer: fewer posts on weekends than on weekdays
• Daily patterns
– Peak times are all at 11:00 PM local time
– Bottom (lowest-activity) times differ between the US and Asia, reflecting different cultures
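The weekly and daily curves above are histograms of post timestamps. A minimal sketch of how such percentages can be computed, assuming timestamps are already converted to the authors' local time and assuming the weekly curve is binned by hour of the week (168 bins); the function name and the sample posts are hypothetical.

```python
from collections import Counter
from datetime import datetime

def weekly_and_daily_pattern(timestamps):
    """Return (weekly, daily) post shares in percent.

    weekly: 168 bins, one per hour of the week (Mon 00:00 .. Sun 23:00)
    daily:  24 bins, one per hour of the day
    """
    total = len(timestamps)
    by_week_hour = Counter(t.weekday() * 24 + t.hour for t in timestamps)  # Monday == 0
    by_hour = Counter(t.hour for t in timestamps)
    weekly = [100.0 * by_week_hour[b] / total for b in range(7 * 24)]
    daily = [100.0 * by_hour[h] / total for h in range(24)]
    return weekly, daily

# Hypothetical posts, already in the authors' local time
posts = [
    datetime(2008, 1, 7, 23, 5),   # Monday, 11 PM
    datetime(2008, 1, 7, 23, 40),  # Monday, 11 PM
    datetime(2008, 1, 12, 9, 0),   # Saturday, 9 AM
]
weekly, daily = weekly_and_daily_pattern(posts)
peak_hour = max(range(24), key=daily.__getitem__)  # hour with the most posts
```

Converting to local time first matters here: the slide compares peak and bottom hours across US and Asian sites, which only makes sense in each site's own clock.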
Dynamics of user joining and posting in OSNs
[Figure: Blog and Bookmark time series of new users per day, user increase rate, and post increase rate]
• User join rate (new users per day)
– Increases with time
– Bursty at large time scales
• User increase rate = (new users per day) / (all users)
– Decreases with time
– Bursty at large time scales
• Post increase rate = (new posts per day) / (all posts)
– Decreases with time
– Less bursty than the user increase rate
• Implications
– Total user population and content do not increase exponentially
– During user join bursts, post increase rate < user increase rate
– Bursts and dynamics need to be considered in data analysis
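The increase rates above divide each day's new items by the running total. The sketch below assumes the denominator includes the current day (the slide does not say); `increase_rates` is a hypothetical helper. It shows why an increasing join rate can coexist with a decreasing increase rate: exponential growth would need a constant increase rate, so a falling rate means sub-exponential growth.

```python
def increase_rates(new_per_day):
    """Daily increase rate: items added on day t / all items up to and including day t."""
    rates, total = [], 0
    for new in new_per_day:
        total += new
        rates.append(new / total if total else 0.0)
    return rates

# Hypothetical join counts that grow every day (join rate increases with time)
new_users = [10, 20, 30, 40]
rates = increase_rates(new_users)
print(rates)  # [1.0, 0.6666666666666666, 0.5, 0.4]: the increase rate still falls
```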
User activity over time
[Figure: Author’s OSN age of posts; Author’s OSN lifetime]
• User’s posting frequency over time
– Measured by the user’s age in the OSN when a UGC object is posted
– Bookmark: almost uniform distribution
– Blog: slightly skewed towards small ages
– Answer: more strongly skewed towards small ages
• User’s lifetime (active duration) in OSNs
– Previously assumed to follow an exponential distribution
– User posting behavior instead shows:
  • Long-lifetime users
  • Short-lifetime users
  • Other users with a wide range of lifetimes
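Both quantities on this slide come from per-post records. A minimal sketch, assuming each post is a (user, timestamp) pair and each user has a known join time; lifetime is taken here as the span between a user's first and last post, which is an assumed definition of "active duration". All names and sample data are hypothetical.

```python
from datetime import datetime

def osn_ages(posts, join_time):
    """Author's OSN age (days) at the time of each post.

    posts: list of (user_id, post_datetime); join_time: user_id -> join datetime.
    """
    return [(t - join_time[u]).total_seconds() / 86400 for u, t in posts]

def osn_lifetimes(posts):
    """Assumed definition of lifetime: days between a user's first and last post."""
    first, last = {}, {}
    for u, t in posts:
        first[u] = min(first.get(u, t), t)
        last[u] = max(last.get(u, t), t)
    return {u: (last[u] - first[u]).total_seconds() / 86400 for u in first}

join_time = {"alice": datetime(2008, 1, 1), "bob": datetime(2008, 3, 1)}
posts = [
    ("alice", datetime(2008, 1, 1)),
    ("alice", datetime(2008, 1, 11)),
    ("alice", datetime(2008, 2, 10)),
    ("bob", datetime(2008, 3, 2)),
]
ages = osn_ages(posts, join_time)    # [0.0, 10.0, 40.0, 1.0]
lifetimes = osn_lifetimes(posts)     # {'alice': 40.0, 'bob': 0.0}
```

A histogram of `ages` gives the "OSN age of posts" curve; a histogram of `lifetimes.values()` is what would be compared against an exponential fit.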
Outline
• User generated content
• User posting over time
• Distribution of user contributions
• Implications of UGC generation patterns
• Concluding remarks
Original and non-original UGC content
• Three kinds of UGC objects
– Original UGC objects
– Cut-and-paste objects
– Spam and advertisement
• Spam: filtered out with a machine-learning model
• Cut-and-paste objects in Blog
– Posted by a small number of users
– No clear posting peak time
– Focused on the recreation and social event categories
• Spam users and cut-and-paste users are removed in our analysis
[Figure: Cut-and-paste posts in Blog: weekly and diurnal patterns (posts (%)), and number of posts (×10^4) per category: Environment/Health, Organization, Photography, Religion/Philosophy, Technology Products, Relationship, Leisure Habits, Social Events, Travel, Computer/Internet, Movie, Music, Family, Study, Creation, Recreation, Pop Culture, Business, Other, Game, Working, Art Design, Sport, Food, Life]
Stretched exponential distribution
• User contribution in a social network follows the stretched exponential (SE) distribution
• Rank-order distribution:
– Fat head and thin tail in log-log scale
– Straight line in log x vs. y^c scale (SE scale)
[Figure: log y vs. log i shows a fat head and a thin tail; y^c vs. log i is a straight line with slope −a and intercept b]
$y_i^c = -a \log i + b \quad (1 \le i \le N)$
$b = 1 + a \log N$ (assuming $y_N = 1$)
where i is the rank of a user (N users in total), y_i is the number of objects created by the user of rank i, and c is the stretch factor.
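For a fixed stretch factor c, the SE model is linear in log i, so a and b can be read off a least-squares line through (log i, y_i^c). A minimal sketch under that assumption, using the natural logarithm and a hypothetical helper `fit_se_line`; the synthetic data are generated exactly from the model with y_N = 1, so the fitted intercept should satisfy b = 1 + a log N.

```python
import math

def fit_se_line(contributions, c):
    """For a fixed stretch factor c, least-squares fit of y_i^c = -a * log(i) + b."""
    ys = sorted(contributions, reverse=True)
    xs = [math.log(i) for i in range(1, len(ys) + 1)]
    zs = [y ** c for y in ys]                 # SE scale: y^c
    n = len(ys)
    mx, mz = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    a = -sxz / sxx                            # slope is -a
    b = mz + a * mx                           # intercept
    return a, b

# Hypothetical parameters: N users, stretch factor c, slope a
N, c, a_true = 1000, 0.4, 1.5
b_true = 1 + a_true * math.log(N)             # makes y_N = 1
ys = [(b_true - a_true * math.log(i)) ** (1 / c) for i in range(1, N + 1)]
a_fit, b_fit = fit_se_line(ys, c)
```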
UGC creation patterns of Blog
[Figure: rank distributions of Blog article and Blog photo posts; fat head and thin tail in log scale, straight line in powered scale (y^c); fit annotation c = 0.418, R² = 0.997]
• x: contribution rank of a user (log scale)
• y: number of original posts by the user (left axis: y^c scale; right axis: log scale)
• Parameters: estimated with the maximum likelihood method
• R²: coefficient of determination (1 means a perfect fit)
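The slides estimate parameters by maximum likelihood and report R² as a goodness-of-fit measure. The sketch below does something simpler and different: it scans a grid of candidate c values and keeps the one whose y^c vs. log i line has the highest R². It is a hypothetical illustration of what "straight line in SE scale" means, not the authors' estimator; computing R² on the y^c scale is also an assumption.

```python
import math

def r_squared_for_c(ys, c):
    """R^2 of the straight-line fit of y_i^c against log i (ys sorted descending)."""
    xs = [math.log(i) for i in range(1, len(ys) + 1)]
    zs = [y ** c for y in ys]
    n = len(ys)
    mx, mz = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = mz - slope * mx
    ss_res = sum((z - (intercept + slope * x)) ** 2 for x, z in zip(xs, zs))
    ss_tot = sum((z - mz) ** 2 for z in zs)
    return 1.0 - ss_res / ss_tot              # 1 means a perfect fit

def best_c(contributions, grid):
    """Grid search (not maximum likelihood): the c whose SE-scale line fits best."""
    ys = sorted(contributions, reverse=True)
    return max(grid, key=lambda c: r_squared_for_c(ys, c))

# Synthetic data generated from an SE model with c = 0.4 (hypothetical)
N, a = 1000, 1.5
b = 1 + a * math.log(N)
data = [(b - a * math.log(i)) ** (1 / 0.4) for i in range(1, N + 1)]
grid = [round(0.05 * k, 2) for k in range(1, 20)]
c_hat = best_c(data, grid)
```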
UGC creation patterns of Bookmark
[Figure: rank distributions of Bookmark (imports) and Bookmark (all posts); fat head and thin tail in log scale, straight line in powered scale (y^c)]
• x: contribution rank of a user (log scale); y: number of bookmark posts by the user
• Bookmark imports: bookmarks imported from the user’s Web browser when joining the system
• Bookmark posts: bookmarks posted to the system through the Web browser’s bookmark plug-in
UGC creation patterns of Answer
[Figure: rank distributions of Answer (all posts) and Answer (best); fat head and thin tail in log scale, straight line in powered scale (y^c)]
• x: contribution rank of a user (log scale); y: number of answer posts by the user
• Best answer: the asker can select a best answer from all received answers. Best answers are high-quality UGC posts, since they are judged by the askers themselves.
Model validation
• Chi-square test
$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
k: number of bins; O_i: observed number of posts in bin i; E_i: expected number of posts in bin i
The model is rejected by the test if $\chi^2 > \chi^2_{(\alpha,\,k-c)}$.

Chi-square test results (α = 0.05)
Data set                k    χ²       χ²(α, k−c)   Result
Blog article            11   11.403   14.067       pass
Blog photo              12   14.072   15.507       pass
Bookmark (all posts)    10   11.486   12.592       pass
Bookmark (imports)      11    9.367   14.067       pass
Answer (all posts)      11   13.340   14.067       pass
Answer (best answers)   10    7.001   12.592       pass

• Validation on users who joined the system simultaneously
– User join rate increases with time
– Some users may become inactive
• Validation on different parts of workloads
– They follow the SE distribution with the same c
– Parameter c is the shape factor and does not change across different parts of a workload
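The test statistic above is Pearson's chi-square. A minimal sketch of the statistic and the pass/reject decision; the bin counts are hypothetical, and the critical value is passed in rather than computed (for example, 14.067 is the standard α = 0.05 critical value for 7 degrees of freedom, the value used in the Blog article row).

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square: sum over the k bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def passes(observed, expected, critical_value):
    """The model is rejected when chi^2 exceeds the critical value chi^2_(alpha, k-c)."""
    return chi_square_statistic(observed, expected) <= critical_value

# Hypothetical bins: posts observed vs. posts expected under a fitted SE model
observed = [52, 48, 30, 20]
expected = [50, 50, 30, 20]
stat = chi_square_statistic(observed, expected)   # 4/50 + 4/50 = 0.16
```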
Outline
• User generated content
• User posting over time
• Distribution of user contributions
• Implications of UGC generation patterns
• Conclusion and future work
The “80-20” rule
• 80-20 rule of power law distributions
– Pareto principle: 20% of people own 80% of social wealth
– Internet systems: 20% of web pages account for 80% of requests
– …
• In social networks
– Blog: 20% of users contribute 80% of posts
– Bookmark: 17% of users contribute 83% of posts
– Answer: 13% of users contribute 87% of posts
• Roughly follows the 80-20 rule, yet user contribution is stretched exponential
• What is the difference between the user contribution distribution in online social networks and the user income distribution in a real society?
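Splits such as "17% of users for 83% of posts" are balance points where the top p fraction of users holds a 1 − p share of all posts. A minimal sketch of finding that point from per-user post counts; the helper name and the ten-user example are hypothetical.

```python
def balance_point(contributions):
    """Smallest top-user fraction p whose share of all posts is at least 1 - p.

    Returns (p, share); (0.2, 0.8) would be the classic 80-20 split.
    """
    ys = sorted(contributions, reverse=True)
    n, total = len(ys), sum(ys)
    running = 0
    for k, y in enumerate(ys, start=1):
        running += y
        p, share = k / n, running / total
        if share >= 1 - p:
            return p, share
    return 1.0, 1.0   # only reached for an empty input

# Hypothetical post counts of ten users (100 posts in total)
p, share = balance_point([80, 5, 5, 2, 2, 2, 1, 1, 1, 1])
```

Here the top 20% of users hold 85% of the posts, so this toy population is slightly more skewed than the 80-20 rule.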
Asymptotic properties of top users
[Figure, left: log y vs. log i; a power law is highly skewed towards top users, while a stretched exponential is less skewed towards top users. Right: "Contributions of top users", cumulative contribution ratio (0 to 1) vs. fraction of users (10^-5 to 10^0) for SE (blog article) and power law (a = 0.9)]
• T: the cumulative contribution ratio of the top-k users among all n users in an OSN
$\frac{k}{n} \to 0 \;\Rightarrow\; \frac{T_{se}}{T_{pow}} \to 0$
• A small number of top users cannot dominate the content in an OSN
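The limit T_se / T_pow → 0 can be checked numerically. A minimal sketch with hypothetical SE parameters (a = 2.0, c = 0.4, n = 100,000, y_n = 1) and a power law with exponent 0.9 as in the figure legend; note that the power-law exponent and the SE slope a are different parameters that share a letter on the slides.

```python
import math

def top_share(ys, frac):
    """Cumulative contribution ratio T of the top `frac` of users (ys sorted descending)."""
    k = max(1, round(len(ys) * frac))
    return sum(ys[:k]) / sum(ys)

n = 100_000
a_se, c = 2.0, 0.4                    # hypothetical SE parameters
b = 1 + a_se * math.log(n)            # chosen so that y_n = 1
se = [(b - a_se * math.log(i)) ** (1 / c) for i in range(1, n + 1)]
pw = [i ** -0.9 for i in range(1, n + 1)]   # power law y_i = i^(-0.9)

fracs = (1e-1, 1e-2, 1e-3, 1e-4, 1e-5)
ratios = []
for frac in fracs:
    t_se, t_pow = top_share(se, frac), top_share(pw, frac)
    ratios.append(t_se / t_pow)
    print(f"k/n={frac:g}  T_se={t_se:.4f}  T_pow={t_pow:.4f}  T_se/T_pow={t_se / t_pow:.3f}")
```

The ratio shrinks as k/n gets smaller: the very top users of an SE population hold a much smaller share than the top users of a comparable power law.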
The “core” users in social networks
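• Looking for a threshold to identify the most important users
• Power law distribution: hard threshold
– By the number or fraction of users
– By a predefined user contribution threshold
• Stretched exponential distribution: a general threshold for all systems
[Figure: number of posts vs. user rank in log-log scale (both axes 10^0 to 10^5), with points A and B and the threshold point (X0, Y0)]
• Let X = log i, Y = log y(i)
– dy/y: decrease rate of user contribution along rank
– di/i: increase rate of user contribution rank
• Threshold: the point where the two rates are equal, $-\frac{dy}{y} = \frac{di}{i}$, i.e., $dX + dY = 0$ (slope −1 in log-log scale)
• Under the SE model $y_i^c = -a\log i + b$ with $b = 1 + a\log n$:
$X_0 = \log k,\quad Y_0 = \log y_k,\quad \frac{k}{n} = \exp\left(\frac{1}{a} - \frac{1}{c}\right),\quad y_k = \left(\frac{a}{c}\right)^{1/c}$

Data set       y_k   k/n     cumsum   n
Blog article   47    14.8%   73.3%    348 K
Blog photo     209   7.7%    64.0%    269 K
Bookmark       248   8.3%    67.6%    1.7 M
Answer         287   4.7%    63.7%    10.3 M
(cumsum: cumulative contribution ratio of the top-k users; n: number of users)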
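The threshold formulas follow from differentiating the SE model: dY/dX = −a / (c·y^c), which equals −1 exactly when y^c = a/c. A minimal sketch that evaluates the closed form and confirms the log-log slope is −1 there by a central finite difference; the parameter values and helper names are hypothetical, and the natural logarithm is assumed.

```python
import math

def core_threshold(a, c):
    """Threshold of the SE model y_i^c = -a*log(i) + b with b = 1 + a*log(n).

    Returns (k/n, y_k) at the point where -dy/y = di/i (slope -1 in log-log scale).
    """
    k_over_n = math.exp(1 / a - 1 / c)
    y_k = (a / c) ** (1 / c)
    return k_over_n, y_k

def log_log_slope(a, c, n, i, h=1e-6):
    """Central-difference estimate of dY/dX, with X = log i and Y = log y(i)."""
    b = 1 + a * math.log(n)
    def Y(X):
        return math.log(b - a * X) / c   # log y = (1/c) * log(y^c)
    X = math.log(i)
    return (Y(X + h) - Y(X - h)) / (2 * h)

a, c, n = 2.0, 0.4, 100_000              # hypothetical parameters
k_over_n, y_k = core_threshold(a, c)     # exp(-2) ≈ 0.135 and 5 ** 2.5 ≈ 55.9
slope = log_log_slope(a, c, n, k_over_n * n)   # ≈ -1 at the threshold rank
```

Because k/n depends only on a and c, the same rule applies to any system whose contributions fit an SE distribution, which is the sense in which the threshold is general.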
Creation patterns of different types of UGC
• Blog article
– all posts: c = 0.42
– > 1 KB: c = 0.39
– > 2 KB: c = 0.31
– with tags: c = 0.30
– More effort, smaller c: longer articles need more effort to compose, and adding tags needs extra effort
• Blog photo
– c = 0.32
– Taking photo, transferring, editing, uploading, writing descriptions, …: more effort than a short blog
• Bookmark
– imports: c = 0.33
– all posts: c = 0.32
– No difference in effort; user participating effort is even smaller
• Answer
– all posts: c = 0.25
– best answers: c = 0.19
– Higher quality, smaller c (more effort to compose)
– Content with higher quality and more effort than best answers would have a much smaller c
• With $y^c = -a\log i + b$, a small c gives $y^c \approx 1 + c\log y$, so $\log y$ becomes linear in $\log i$: a power law!
• Our conjecture: the larger c, the flatter the user contribution distribution
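The small-c limit can be made explicit: y^c = 1 + c·(y^c − 1)/c, and (y^c − 1)/c → log y as c → 0 (the Box-Cox form). Substituting into y^c = −a log i + b gives log y ≈ −(a/c) log i + (b − 1)/c, a power law with exponent a/c. A minimal numeric check, with a hypothetical helper name and sample value y = 100:

```python
import math

def box_cox(y, c):
    """(y^c - 1) / c: an affine rescaling of y^c that tends to log(y) as c -> 0."""
    return (y ** c - 1) / c

y = 100.0
cs = (0.4, 0.1, 0.01, 0.001)
errors = [abs(box_cox(y, c) - math.log(y)) for c in cs]
print(errors)  # shrinks toward 0 as c decreases
```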
Discussion: UGC production vs. UGC
consumption
• Internet media access patterns (PODC’08)
– The number of requests to a media object is stretched exponential for different kinds of media systems
• Media request is content consumption
– Stretch factor increases with file length (duration a user views)
• UGC creation is content production
– Stretch factor decreases with the effort to create a UGC object
• UGC social networks rely on user contribution to attract traffic
– Relationship between UGC creation and consumption
– More general model for both UGC creation and consumption
• Understand the driving force of a social network
• Design effective participation mechanisms for social applications
• Provide efficient data management for underlying supporting systems
Outline
• User generated content
• User posting over time
• Distribution of user contributions
• Implications of UGC generation patterns
• Concluding remarks
Conclusion
• User activities and contributions are critical for knowledge-sharing
social networks
• We have analyzed three large OSNs, and found
– User lifetime in OSNs does not follow exponential
distribution
– User contribution distribution is stretched exponential
– Different types of UGC content generation patterns can be
modeled with different parameters in SE
• User contribution model: distribution of individual user behaviors
– Building block to understand more complex social network phenomena
– Foundation to guide design, modeling and simulation of OSNs
Thank you!