					 Analyzing Patterns of User Content
Generation in Online Social Networks

                Lei Guo, Yahoo!
           Enhua Tan, Ohio State University
       Songqing Chen, George Mason University
        Xiaodong Zhang, Ohio State University
             Yihong (Eric) Zhao, Yahoo!


      Online social networks: platforms for social
           connections and content sharing
•   Networking-oriented OSNs
     [Figure: user network, labeled "social connections"]
     – Knowledge sharing is mainly among friends
•   Knowledge-sharing-oriented OSNs
     [Figure: content network, labeled "common interest topics"]
     – Content sharing is among all users

         UGC in online social networks

•   User generated content (UGC)
     – Users are the basic elements of OSNs
     – OSNs are driven by user contributions
     [Figure: feedback loop between users and content: users create new content, and content attracts new users; also labeled "advertisement"]

•   Understanding UGC generation patterns is important
     –   Business success: attract new users and clients
     –   Identify active users and distinguish them from spamming users
     –   Predict hot spots and topic trends in user communities
     –   Perform efficient resource management in the underlying supporting system

        Existing studies about user contributions in
                   online social networks

•   Wikipedia
     – Power law: core users contribute most articles (ISSI’05)
          • Number of articles a user edited
          • Number of co-authors of a Wiki article
          • Heavy tailed, scale free: highly skewed towards top users
     – User contribution shifts from “elite” users to common users (CHI’07)
          • Log analysis from 2001 to 2006
     – Power law or not: no conclusion

•   Delicious social bookmark (CHI’07)
     – Similar shifts in user contribution as in Wikipedia
     – Power law or not: no conclusion

•   Power law in rank order: $y_i \propto i^{-a}$
     – $i$: contribution rank of a user
     – $y_i$: contribution of the user
     [Figure: power-law rank plot; heavy tail in linear scale (y vs. i), straight line with slope -a in log-log scale (log y vs. log i)]

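The power-law check described on this slide amounts to asking whether log y against log i is a straight line. A minimal sketch of estimating that slope by ordinary least squares, under the assumption that per-user contribution counts are available as a plain list; the function name and the synthetic data are hypothetical, and least squares on log-log data is only a rough diagnostic, not the method of the cited studies:

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(y_i) against log(i), where i is the
    contribution rank (1 = top user) and y_i the contribution at that rank."""
    ranked = sorted((y for y in counts if y > 0), reverse=True)
    xs = [math.log(i) for i in range(1, len(ranked) + 1)]
    ys = [math.log(y) for y in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, hypothetical data drawn exactly from y_i = 1000 * i^(-0.9).
counts = [1000.0 * i ** -0.9 for i in range(1, 201)]
slope = loglog_slope(counts)  # close to -0.9, i.e. a = 0.9
```

For real data, a slope that changes noticeably between the head and the tail of the ranking is a sign that a single power law does not describe the whole distribution.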
                      Our study
• UGC content in three large online social networks

   – Blog, social bookmark, question answer

• User posting over time

• Distribution of user contributions

• Implications of UGC generation patterns

• Concluding remarks

                                 UGC creation traffic overview
[Figure: weekly pattern (Mon to Sun) and daily pattern (hour of day, 0 to 21) of Posts (%) for four workloads: Blog article (Asia), Blog photo (Asia), Bookmark (US), Answer (US)]

•   Weekly patterns
     – Blog (article/photo): weekday and weekend posts are similar, consistent with daily web journaling
     – Bookmark/Answer: fewer posts on weekends than on weekdays
•   Daily patterns
     – Peak times are all 11:00 PM local time
     – Bottom times differ between the US and Asia, attributed to different cultures

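The weekly and daily patterns on this slide are shares of posts per weekday and per hour of day. A minimal sketch of that tabulation, assuming the post timestamps have already been converted to the poster's local time; the function name and the sample timestamps are hypothetical and not taken from the study:

```python
from collections import Counter
from datetime import datetime


def posting_patterns(timestamps):
    """Return (weekly, daily) posting shares in percent.

    weekly maps weekday 0..6 (Mon..Sun) to the share of posts on that day;
    daily maps hour 0..23 to the share of posts in that hour.
    Assumes naive datetimes expressed in the poster's local time.
    """
    total = len(timestamps)
    if total == 0:
        raise ValueError("need at least one timestamp")
    by_weekday = Counter(ts.weekday() for ts in timestamps)
    by_hour = Counter(ts.hour for ts in timestamps)
    weekly = {d: 100.0 * by_weekday.get(d, 0) / total for d in range(7)}
    daily = {h: 100.0 * by_hour.get(h, 0) / total for h in range(24)}
    return weekly, daily


# Hypothetical sample: three posts on a Monday evening, one on a Saturday morning.
sample = [
    datetime(2008, 6, 2, 23, 5),
    datetime(2008, 6, 2, 23, 40),
    datetime(2008, 6, 2, 21, 15),
    datetime(2008, 6, 7, 9, 30),
]
weekly, daily = posting_patterns(sample)
```

Comparing the Saturday and Sunday entries of `weekly` against the weekday entries gives the weekday/weekend contrast discussed above; the largest entry of `daily` gives the peak hour.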
           Dynamics of user joining and posting in OSNs
[Figure: user joining and posting over time for Blog and Bookmark]

•   Definitions
     – User increase rate = new users per day / all users
     – Post increase rate = new posts per day / all posts
•   User join rate (new users per day)
     – Increases with time
     – Bursty at large time scales
•   User increase rate
     – Decreases with time
     – Bursty at large time scales
•   Post increase rate
     – Decreases with time
     – Less bursty than the user increase rate
•   Implications
     – Total user population and content do not increase exponentially
     – During user join bursts, the post increase rate is lower than the user increase rate
     – Bursts and dynamics need to be considered in data analysis

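The increase rates on this slide are ratios of daily arrivals to the running total. A minimal sketch, assuming daily counts of new users (or new posts) are given as a list and that "all users" means the cumulative total at the end of the same day; both the function name and that convention are assumptions for illustration:

```python
def increase_rates(new_per_day):
    """Daily increase rate: new items on a day divided by the cumulative
    total at the end of that day (the same formula serves users and posts)."""
    rates = []
    total = 0
    for new in new_per_day:
        total += new
        rates.append(new / total if total else 0.0)
    return rates


# Hypothetical series: the join rate grows every day...
growing = increase_rates([100, 110, 120, 130])
# ...versus a population that doubles every day (exponential growth).
doubling = increase_rates([100, 100, 200, 400])
```

In the first series the join rate increases while the increase rate falls, which is the combination reported above. In the doubling series the increase rate settles at a constant, which is why a decreasing increase rate indicates growth that is slower than exponential.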
                   User activity over time
[Figures: author’s OSN age of posts; author’s OSN lifetime]

•   User’s posting frequency over time
     – Measured by the age of the user in the OSN when a UGC object is posted
     – Bookmark: almost uniform distribution
     – Blog: slightly skewed towards small ages
     – Answer: more skewed towards small ages
•   User’s lifetime (active duration) in OSNs
     – Previously assumed to follow an exponential distribution
     – For user posting behavior, we observe
          • Long-lifetime users
          • Short-lifetime users
          • Other users: a wide range of lifetimes

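The two quantities on this slide can be derived from a user's join date and post dates. A minimal sketch, assuming the lifetime is measured from joining to the last observed post (the slides only say "active duration", so this definition, like the function name, is an assumption):

```python
from datetime import date


def osn_ages_and_lifetime(join_date, post_dates):
    """Return (ages, lifetime) in days.

    ages: the author's OSN age at each post, i.e. days since joining, sorted.
    lifetime: days from joining to the last post (0 if the user never posted).
    """
    ages = sorted((d - join_date).days for d in post_dates)
    lifetime = ages[-1] if ages else 0
    return ages, lifetime


# Hypothetical user who joined on 2007-01-01 and posted three times.
ages, lifetime = osn_ages_and_lifetime(
    date(2007, 1, 1),
    [date(2007, 1, 11), date(2007, 1, 1), date(2007, 3, 2)],
)
```

Pooling `ages` over all users gives the distribution of the author's OSN age of posts, and pooling `lifetime` gives the lifetime distribution that the slide compares against the exponential assumption.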
                       Outline
• User generated content

• User posting over time

• Distribution of user contributions

• Implications of UGC generation patterns

• Concluding remarks



    Original and non-original UGC content

•   Three kinds of UGC objects
     – Original UGC objects
     – Cut-and-paste objects
     – Spam and advertisement
•   Spam: filtered out with an ML model
•   Cut-and-paste objects in Blog
     – Posted by a small number of users
     – No clear posting peak time
     – Focused on the recreation and social event categories
•   Spam users and cut-and-paste users are removed in our analysis

[Figure: cut-and-paste posts: weekly pattern and diurnal pattern (Posts (%)), and number of posts (×10^4) per category. Categories shown: Environment/Health, Organization, Photography, Religion/Philosophy, Technology Products, Relationship, Leisure Habits, Social Events, Travel, Computer/Internet, Movie, Music, Family, Study, Creation, Recreation, Pop Culture, Business, Other, Game, Working, Art Design, Sport, Food, Life]

         Stretched exponential distribution

•   User contribution in a social network follows the stretched exponential (SE) distribution
•   Rank-order distribution
     – Fat head and thin tail in log-log scale
     – Straight line in log x, y^c scale (SE scale)
•   Notation
     – $i$: rank of a user (N users in total)
     – $y_i$: number of objects created by the user of rank $i$
     – $c$: stretch factor
•   Model
     $y_i^c = -a \log i + b \quad (1 \le i \le N)$
     $b = 1 + a \log N$ (assuming $y_N = 1$)

[Figure: in log-log scale (log y vs. log i) the curve has a fat head and a thin tail; in SE scale (y^c vs. log i) it is a straight line with intercept b and slope -a]

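The model on this slide says that y^c is linear in log i for the right stretch factor c. A minimal sketch of recovering a, b, and c from per-user counts by trying a grid of c values and keeping the one whose least-squares line has the highest R². This is a swapped-in, simpler technique for illustration: the slides state that the parameters were estimated with the maximum likelihood method, and the function names, the grid, and the synthetic data here are all assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for ys = slope * xs + intercept.
    Returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r2


def fit_stretched_exponential(counts, c_grid=None):
    """Fit y_i^c = -a * log(i) + b by grid search over c.
    Returns (a, b, c, r2) for the c with the highest R^2 in SE scale."""
    ranked = sorted(counts, reverse=True)
    xs = [math.log(i) for i in range(1, len(ranked) + 1)]
    if c_grid is None:
        c_grid = [k / 100 for k in range(5, 101)]  # 0.05, 0.06, ..., 1.00
    best = None
    for c in c_grid:
        ys = [y ** c for y in ranked]
        slope, intercept, r2 = fit_line(xs, ys)
        if best is None or r2 > best[3]:
            best = (-slope, intercept, c, r2)
    return best


# Synthetic check with hypothetical parameters: data generated from the model.
N, a_true, c_true = 1000, 2.0, 0.4
b_true = 1.0 + a_true * math.log(N)  # so that y_N = 1
synthetic = [(b_true - a_true * math.log(i)) ** (1.0 / c_true)
             for i in range(1, N + 1)]
a_hat, b_hat, c_hat, r2 = fit_stretched_exponential(synthetic)
```

On data generated from the model, the search returns the generating parameters. On real workloads the R² criterion is only a heuristic for choosing c, so the result should be treated as a starting point rather than as the reported fit.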
                               UGC creation patterns of Blog
[Figure: rank-order distributions for Blog article and Blog photo, each with a fat head and a thin tail in log scale; fit annotation legible in the extract: c = 0.418, R^2 = 0.997]

•   x: contribution rank of user (log scale on the x axis)
•   y: number of original posts by the user (left y axis: y^c powered scale; right y axis: log scale)
•   Parameters: maximum likelihood method
•   R^2: coefficient of determination (1 means a perfect fit)

                            UGC creation patterns of Bookmark
[Figure: rank-order distributions for Bookmark (imports) and Bookmark (all posts), each with a fat head and a thin tail in log scale]

•   x: contribution rank of user (log scale on the x axis)
•   y: number of bookmark posts by the user (y^c powered scale and log scale)
•   Bookmark imports: bookmarks imported from the user’s Web browser when joining the system
•   Bookmark posts: bookmarks posted to the system through the Web browser’s bookmark plug-in

                           UGC creation patterns of Answer
[Figure: rank-order distributions for Answer (all posts) and Answer (best), each with a fat head and a thin tail in log scale]

•   x: contribution rank of user (log scale on the x axis)
•   y: number of answer posts by the user (y^c powered scale and log scale)
•   Best answer: the asker can select a best answer from all received answers.
    Best answers are high-quality UGC posts since they are judged by the
    askers themselves.

                                     Model validation
•   Chi-square test
     $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
     – k: number of bins; O_i: total observed posts in bin i; E_i: expected number of posts in bin i
     – The fit is rejected by the test if $\chi^2 > \chi^2_{(\alpha,\,k-c)}$

     Chi-square test results (α = 0.05)
     Data set               k    χ²       χ²(α, k−c)   Result
     Blog article           11   11.403   14.067       pass
     Blog photo             12   14.072   15.507       pass
     Bookmark (all posts)   10   11.486   12.592       pass
     Bookmark (imports)     11   9.367    14.067       pass
     Answer (all posts)     11   13.340   14.067       pass
     Answer (best ans)      10   7.001    12.592       pass

•   Validation on users who joined the system at the same time
      –   User join rate increases with time
      –   Some users may become inactive

•   Validation on different parts of the workloads
      –   Each part follows the SE distribution with the same c
      –   Parameter c is the shape factor and does not change across
          different parts of a workload

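The chi-square statistic on this slide is a sum over bins of squared deviations scaled by the expected count. A minimal sketch of the statistic and the pass/fail comparison; the bin values are hypothetical, and the critical value is passed in rather than computed. As a side observation, the tabulated critical values 14.067, 15.507, and 12.592 coincide with the standard 0.05 upper-tail chi-square values for 7, 8, and 6 degrees of freedom, which is k − 4 in every row.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum over bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def passes_test(statistic, critical_value):
    """The fit is not rejected when the statistic does not exceed the critical value."""
    return statistic <= critical_value


# Hypothetical three-bin example (not data from the study).
stat = chi_square_statistic([18, 22, 20], [20, 20, 20])  # (4 + 4 + 0) / 20

# Reported Blog article row: statistic 11.403 against critical value 14.067.
blog_article_ok = passes_test(11.403, 14.067)
```

Binning matters in practice: bins with very small expected counts inflate the statistic, so sparse tail bins are usually merged before the test is applied.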
                       Outline
• User generated content

• User posting over time

• Distribution of user contributions

• Implications of UGC generation patterns

• Conclusion and future work



                          The “80-20” rule

•   80-20 rule of power law distributions
     – Pareto principle: 20% of people own 80% of social wealth
     – Internet systems: 20% of web pages account for 80% of requests
     – …
•   In social networks (roughly following the 80-20 rule)
     – Blog: 20% of users account for 80% of posts
     – Bookmark: 17% of users account for 83% of posts
     – Answer: 13% of users account for 87% of posts
•   Yet user contribution is stretched exponential, not power law
•   What is the difference between the user contribution distribution in online
    social networks and the user income distribution in a real society?

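The "x% of users account for y% of posts" figures on this slide are cumulative shares of the top-ranked users. A minimal sketch of that computation, assuming per-user post counts in a list; the function name and the sample counts are hypothetical:

```python
def top_share(counts, user_fraction):
    """Fraction of all posts contributed by the top `user_fraction` of users
    (at least one user is always included)."""
    ranked = sorted(counts, reverse=True)
    k = max(1, round(user_fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical community of 10 users with 100 posts in total.
counts = [50, 30, 5, 5, 2, 2, 2, 2, 1, 1]
share = top_share(counts, 0.2)  # the top 2 users hold 80 of 100 posts
```

A single operating point such as 20%/80% does not identify the distribution: both a power law and a stretched exponential can pass through it, which is why the following slides compare the behavior of the very top users.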
        Asymptotic properties of top users

•   Power law: highly skewed towards top users
•   Stretched exponential: less skewed towards top users
•   Cumulative contribution ratio T of the top-k users among all n users in an OSN
     – As $k/n \to 0$, $T_{se}/T_{pow} \to 0$, where $T_{se}$ and $T_{pow}$ are the ratios under the SE and power-law models
•   A small number of top users cannot dominate the content in an OSN

[Figure: contributions of top users; cumulative contribution ratio (0 to 1) vs. fraction of users (10^-5 to 10^0, log scale) for SE (blog article) and power law (a = 0.9); insets sketch log y vs. log i for the power law and the stretched exponential]

             The “core” users in social networks

•   Looking for a threshold to identify the most important users
•   Power law distribution: hard threshold
     – By number or fraction of users
     – By a predefined user contribution threshold
•   Stretched exponential distribution: general threshold for all systems
     – $dy/y$: decrease rate of user contribution along rank
     – $di/i$: increase rate of user contribution rank
     – Threshold: $\frac{dy}{y} = -\frac{di}{i}$, i.e. $dX + dY = 0$, where $X = \log i$, $Y = \log y(i)$
     – The threshold point $(X_0, Y_0)$ corresponds to rank $k$ and contribution $y_k$:
       $\frac{k}{n} = \exp\left(\frac{1}{a} - \frac{1}{c}\right), \quad y_k = \left(\frac{a}{c}\right)^{1/c}$

     Data set        y_k    k/n      cumsum    n
     Blog article    47     14.8%    73.3%     348 K
     Blog photo      209    7.7%     64.0%     269 K
     Bookmark        248    8.3%     67.6%     1.7 M
     Answer          287    4.7%     63.7%     10.3 M

[Figure: number of posts vs. user rank in log-log scale (10^0 to 10^5 on both axes), with end points A and B and the threshold point (X0, Y0)]

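The threshold formulas on this slide follow from differentiating the SE model $y^c = -a \log i + b$ with natural logarithms: $c\,y^{c-1}\,dy = -a\,di/i$, so $dy/y = -\frac{a}{c\,y^c}\,di/i$. Requiring $dy/y = -di/i$ gives $y_k^c = a/c$, and substituting $b = 1 + a \log n$ gives $\log(k/n) = 1/a - 1/c$. A minimal sketch that evaluates both quantities; the function name is hypothetical, and the value of a used below is back-computed from the reported c = 0.418 and y_k = 47 for Blog article, so it is an assumption rather than a figure from the slides:

```python
import math


def core_user_threshold(a, c):
    """Return (k_over_n, y_k) for the SE model y^c = -a*log(i) + b with
    b = 1 + a*log(n): the fraction of users above the threshold and the
    contribution of the user at the threshold rank."""
    if a <= 0 or c <= 0:
        raise ValueError("a and c must be positive")
    k_over_n = math.exp(1.0 / a - 1.0 / c)
    y_k = (a / c) ** (1.0 / c)
    return k_over_n, y_k


# Blog article: c is taken from the slides; a is chosen so that y_k = 47 (assumption).
c = 0.418
a = c * 47 ** c
k_over_n, y_k = core_user_threshold(a, c)
```

With these inputs the sketch gives a core-user fraction of roughly 15%, in line with the 14.8% reported for Blog article. Note that the threshold depends only on a and c, not on the number of users n.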
    Creation patterns of different types of UGC

•   Blog article
       type         c
       all posts    0.42
       > 1 KB       0.39
       > 2 KB       0.31
       with tags    0.30
     – More effort, smaller c: longer articles need more effort to compose, and adding tags needs extra effort
•   Blog photo: c = 0.32
     – More effort than a short blog article: taking the photo, transferring, editing, uploading, writing a description, …
•   Bookmark
       type         c
       imports      0.33
       all posts    0.32
     – No difference in effort
•   Answer
       type         c
       all posts    0.25
       best ans     0.19
     – Higher quality, smaller c (more effort to compose)
•   Further annotations
     – User participating effort is even smaller
     – Content of higher quality and more effort than best answers would have a much smaller c
     – $y^c = -a \log i + b$; for small c, $y^c \sim \log y$: power law!
•   Our conjecture: larger c, flatter user contribution distribution

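The remark that a small c leads to a power law can be made explicit with a first-order expansion. This is a sketch of the reasoning, not a derivation taken from the slides, and it assumes natural logarithms and $c \log y \ll 1$:

```latex
\begin{align*}
y^{c} &= e^{c \log y} \approx 1 + c \log y
  && (c \log y \ll 1) \\
1 + c \log y &\approx -a \log i + b
  && \text{(substituting into } y^{c} = -a \log i + b\text{)} \\
\log y &\approx -\frac{a}{c} \log i + \frac{b - 1}{c}
\end{align*}
```

The last line is a straight line in log-log scale, that is, a power law in rank with exponent $a/c$. Under this approximation the SE model with a very small stretch factor is hard to tell apart from a power law.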
     Discussion: UGC production vs. UGC
                 consumption
• Internet media access patterns (PODC’08)
    – The number of requests to a media object is stretched exponential for
      different kinds of media systems
• Media request is content consumption
    – Stretch factor increases with file length (duration a user views)
• UGC creation is content production
    – Stretch factor decreases with the effort to create a UGC object
• UGC social networks rely on user contribution to attract traffic
    – Relationship between UGC creation and consumption
    – More general model for both UGC creation and consumption
        • Understand the driving force of a social network
        • Design effective participation mechanisms for social applications
        • Provide efficient data management for underlying supporting systems
                       Outline
• User generated content

• User posting over time

• Distribution of user contributions

• Implications of UGC generation patterns

• Concluding remarks



                          Conclusion
• User activities and contributions are critical for knowledge-sharing
  social networks
• We have analyzed three large OSNs, and found
    – User lifetime in OSNs does not follow an exponential
      distribution
    – User contribution distribution is stretched exponential
    – Different types of UGC content generation patterns can be
      modeled with different parameters in SE
• User contribution model: distribution of individual user behaviors
    – Building block to understand more complex social network phenomena
    – Foundation to guide design, modeling and simulation of OSNs
Thank you!
