					 Analyzing Patterns of User Content
Generation in Online Social Networks

                Lei Guo, Yahoo!
           Enhua Tan, Ohio State University
       Songqing Chen, George Mason University
        Xiaodong Zhang, Ohio State University
             Yihong (Eric) Zhao, Yahoo!

      Online social networks: platforms for social
           connections and content sharing
•   Networking-oriented OSNs (user network: social connections)
     – Knowledge sharing is mainly among friends

•   Knowledge-sharing-oriented OSNs (content network: common-interest topics)
     – Content sharing is among all users
         UGC content in online social networks

•   User generated content (UGC)
     – Users are basic elements of OSNs
     – OSNs are driven by user contributions

     – Users create new content, and content attracts new users

•   Understanding UGC content generation patterns is important
     –   Business success: attract new users and clients
     –   Identify and distinguish active users from spamming users
     –   Predict hot spots and the trends of topics in user communities
     –   Perform efficient resource management in the underlying supporting system
        Existing studies about user contributions in
                   online social networks

•   Wikipedia
     – Power law: core users contribute most articles
          • Number of articles a user edited
          • Number of co-authors of a Wiki article
          • Heavy tailed, scale free: highly skewed towards top users
     – User contribution shifts from “elite” users to common users (CHI’07)
          • Log analysis from 2001 to 2006
     – Power law or not: no conclusion

•   Delicious social bookmark (CHI’07)
     – Similar shifts for user contribution as in Wikipedia
     – Power law or not: no conclusion

•   Power law model: $y_i \propto i^{-a}$
     – $i$: contribution rank of a user
     – $y_i$: contribution of the user
     – [Figure: rank distribution on log-log axes ($\log y$ vs. $\log i$): a straight line with slope $-a$ and a heavy tail]
                      Our study
• UGC content in three large online social networks

   – Blog, social bookmark, question answer

• User posting over time

• Distribution of user contributions

• Implications of UGC generation patterns

• Concluding remarks

                                 UGC creation traffic overview
[Figure: weekly pattern (Posts (%) by day, Mon to Sun) and daily pattern (Posts (%) by hour, 0 to 21) for four workloads: Blog article, Asia; Blog photo, Asia; Bookmark, US; Answer, US]

•   Weekly patterns
     – Blog (article/photo): weekday and weekend posts are similar (daily web journaling)
     – Bookmark/Answer: weekend posts are smaller than weekdays

•   Daily patterns
     – Peak times are all 11:00 PM local time
     – Bottom times are different for US and Asia: different …
           Dynamics of user joining and posting in OSNs
[Figure: user join rate, user increase rate, and post increase rate over time, for Blog and Bookmark]

•   Definitions
     – User increase rate = (new users per day) / (all users)
     – Post increase rate = (new posts per day) / (all posts)

•   User join rate (new users per day)
     – Increases with time
     – Bursty in large time scales

•   User increase rate
     – Decreases with time
     – Bursty in large time scales

•   Post increase rate
     – Decreases with time
     – Less bursty than the user increase rate
     – Total user population and content do not increase exponentially
     – During user join bursts: post increase rate < user increase rate
     – Bursts and dynamics need to be considered for data analysis
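
The two rate definitions on this slide can be made concrete with a short sketch. It assumes daily counts of new users (or posts) are available as a plain list; the function name `increase_rates` and the sample numbers are hypothetical, and "all users" is taken to mean the cumulative total up to and including the current day.

```python
from itertools import accumulate


def increase_rates(new_per_day):
    """Daily increase rate: new items on a day divided by all items so far.

    Assumption: "all" is the cumulative total including the current day.
    """
    totals = list(accumulate(new_per_day))
    return [new / total if total else 0.0
            for new, total in zip(new_per_day, totals)]


# Hypothetical daily counts in which the join rate grows over time.
new_users = [10, 20, 30, 40, 50]
rates = increase_rates(new_users)
print([round(r, 3) for r in rates])
```

With these made-up numbers the join rate rises every day while the increase rate falls, which is the combination the slide reports for the measured systems.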
                   User activity over time
• User’s posting frequency over time
    – Measured by the author’s OSN age: the age of the user in the OSN when a UGC
      object is posted
    – Bookmark: almost uniform distribution
    – Blog: a little skewed towards small ages
    – Answer: more skewed towards small ages
    [Figure: author’s OSN age of posts]

• User’s lifetime (active duration) in OSNs
    – Previously assumed to follow an exponential distribution
    – For user posting behavior
        • Long lifetime users
        • Short lifetime users
        • Other users: a wide range of lifetimes
    [Figure: author’s OSN lifetime]


    Original and non-original UGC content
• Three kinds of UGC objects
   – Original UGC objects
   – Cut-and-paste objects
   – Spam and advertisement

• Spam: filtered out with ML model

• Cut-and-paste objects in Blog
   – Posted by a small number of users
   – No clear posting peak time
   – Focused on recreation and social event categories

• Spam users and cut-and-paste users are removed in our analysis

[Figure: cut-and-paste posts: weekly pattern and diurnal pattern (Posts (%)), and number of posts by category (Technology Products, Leisure Habits, Social Events, Pop Culture, Art Design)]
         Stretched exponential distribution
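• User contribution in a social network follows the stretched
  exponential (SE) distribution

• Rank order distribution:
     – fat head and thin tail in log-log scale
     – straight line in log x - y^c scale (SE scale)

• Model:
     $y_i^c = -a \log i + b \quad (1 \le i \le N)$
     $b = 1 + a \log N$ (assuming $y_N = 1$)
     – $i$: rank of users (N users)
     – $y$: number of objects created by the user
     – $c$: stretch factor
     – In SE scale the line has slope $-a$ and intercept $b$

[Figure: rank distribution in log-log scale ($\log y$ vs. $\log i$, fat head and thin tail) and in SE scale ($y^c$ vs. $\log i$, straight line with slope $-a$)]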
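
The straight-line property in SE scale suggests a simple way to estimate the parameters. The sketch below is illustrative only: it scans a grid of candidate stretch factors and, for each, fits a least-squares line to $(\log i, y_i^c)$, keeping the $c$ with the highest $R^2$. This is a swapped-in technique, not the maximum likelihood method the slides report using. The function name `fit_se`, the grid, the use of natural logarithms, and the synthetic data are all assumptions.

```python
import math


def fit_se(counts, c_grid=None):
    """Grid search over c with a least-squares line y_i^c = -a*log(i) + b.

    counts: per-user post counts in any order. Returns (c, a, b, r2).
    Assumptions: natural logarithm; least squares instead of the
    maximum likelihood estimation mentioned in the slides.
    """
    ys = sorted(counts, reverse=True)           # rank 1 = top contributor
    xs = [math.log(i) for i in range(1, len(ys) + 1)]
    c_grid = c_grid or [k / 100 for k in range(5, 101)]
    mx = sum(xs) / len(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    best = None
    for c in c_grid:
        ts = [y ** c for y in ys]               # contributions in SE scale
        mt = sum(ts) / len(ts)
        stt = sum((t - mt) ** 2 for t in ts)
        if sxx == 0 or stt == 0:
            continue
        sxt = sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
        slope = sxt / sxx
        r2 = sxt ** 2 / (sxx * stt)             # squared correlation
        if best is None or r2 > best[3]:
            best = (c, -slope, mt - slope * mx, r2)
    return best


# Synthetic check: data generated exactly from the model with known parameters.
N, a_true, c_true = 1000, 2.0, 0.4
b_true = 1 + a_true * math.log(N)               # so that y_N = 1
synthetic = [(b_true - a_true * math.log(i)) ** (1 / c_true)
             for i in range(1, N + 1)]
c_hat, a_hat, b_hat, r2 = fit_se(synthetic)
print(c_hat, round(a_hat, 3), round(b_hat, 3), round(r2, 6))
```

On real traces the counts are integers and noisy, so the recovered parameters would only approximate a maximum likelihood fit.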
                               UGC creation patterns of Blog

[Figure: two panels, Blog article and Blog photo. Each shows the rank distribution with a fat head and thin tail in log scale and a straight line in powered scale $y^c$. Annotation shown: c = 0.418, $R^2$ = 0.997]

•   x axis: contribution rank of user (log scale)
•   y axis: number of original posts by the user (left: $y^c$ scale; right: log scale)
•   Parameters: maximum likelihood method
•   $R^2$: coefficient of determination (1 means a perfect fit)
                            UGC creation patterns of Bookmark
[Figure: two panels, Bookmark (imports) and Bookmark (all posts). Each shows the rank distribution with a fat head and thin tail in log scale and a straight line in powered scale $y^c$]

  •     x axis: contribution rank of user (log scale)
  •     y axis: number of bookmark posts by the user
  •     Bookmark imports: bookmarks imported from the user’s Web browser when
        joining the system
  •     Bookmark posts: bookmarks posted to the system by the bookmark plug-in
        of the Web browser
                           UGC creation patterns of Answer
[Figure: two panels, Answer (all posts) and Answer (best). Each shows the rank distribution with a fat head and thin tail in log scale and a straight line in powered scale $y^c$]

 •   x axis: contribution rank of user (log scale)
 •   y axis: number of answer posts by the user
 •   Best answer: the asker can select a best answer from all received answers.
     Best answers are high quality UGC posts since they are judged by the
     askers themselves.
                                     Model validation
•   Chi-square test
      $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
      –   $k$: number of bins; $O_i$: total observed posts; $E_i$: expected number of posts
      –   $\chi^2 > \chi^2_{(\alpha, k-c)}$: rejected by the test

    Chi-square test results ($\alpha$ = 0.05)
    Data set               k    $\chi^2$   $\chi^2_{(\alpha, k-c)}$   Result
    Blog article           11   11.403     14.067                     pass
    Blog photo             12   14.072     15.507                     pass
    Bookmark (all posts)   10   11.486     12.592                     pass
    Bookmark (imports)     11    9.367     14.067                     pass
    Answer (all posts)     11   13.340     14.067                     pass
    Answer (best ans)      10    7.001     12.592                     pass

•   Validation on users who joined the system
      –   User join rate increases with time
      –   Some users may become inactive

•   Validation on different parts of workloads
      –   They follow the SE distribution with the same c
      –   Parameter c is the shape factor; it does not
          change for different parts of a workload
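
As a rough illustration of the test on this slide, the sketch below computes the chi-square statistic for binned counts and applies the pass/reject rule to the reported values. The helper name `chi_square` and the three-bin example are hypothetical. The critical values are copied from the slide's table rather than computed.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum over bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical binned post counts (not from the slides).
example_stat = chi_square([48, 35, 17], [50, 30, 20])

# (data set, k, chi-square statistic, critical value) as reported on the slide.
reported = [
    ("Blog article", 11, 11.403, 14.067),
    ("Blog photo", 12, 14.072, 15.507),
    ("Bookmark (all posts)", 10, 11.486, 12.592),
    ("Bookmark (imports)", 11, 9.367, 14.067),
    ("Answer (all posts)", 11, 13.340, 14.067),
    ("Answer (best ans)", 10, 7.001, 12.592),
]

# The model is rejected only when the statistic exceeds the critical value.
results = {name: ("pass" if stat <= crit else "reject")
           for name, _k, stat, crit in reported}
for name, verdict in results.items():
    print(f"{name}: {verdict}")
```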

                          The “80-20” rule

•   80-20 rule of power law distributions
     – Pareto principle: 20% of people own 80% of social wealth
     – Internet systems: 20% of web pages account for 80% of requests
     – …
•   In social networks (roughly follows the 80-20 rule)
     – Blog: 20% of users account for 80% of posts
     – Bookmark: 17% of users account for 83% of posts
     – Answer: 13% of users account for 87% of posts

    Yet user contribution is stretched exponential, not power law.

    What is the difference between the user contribution distribution in online
    social networks and the user income distribution in a real society?
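
A share such as "20% of users account for 80% of posts" can be checked directly from per-user post counts. The following is a minimal sketch; the function name `top_share` and the ten-user sample are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def top_share(counts, user_fraction):
    """Fraction of all posts contributed by the top `user_fraction` of users."""
    ranked = sorted(counts, reverse=True)
    k = max(1, round(user_fraction * len(ranked)))   # at least one user
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical post counts for ten users (100 posts in total).
posts_per_user = [50, 30, 8, 4, 3, 2, 1, 1, 1, 0]
share = top_share(posts_per_user, 0.20)
print(f"top 20% of users contribute {share:.0%} of posts")
```

The same 80-20 share can arise from differently shaped distributions, which is why the slides go on to compare the behavior of the very top users.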
        Asymptotical properties of top users
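•   Power law: highly skewed towards top users
•   Stretched exponential: less skewed towards top users
•   The cumulative contribution ratio of top-k users among all n users in an OSN:
      as $\frac{k}{n} \to 0$, $\frac{T_{se}}{T_{pow}} \to 0$
     – $T_{se}$, $T_{pow}$: the ratio under the stretched exponential and power law models
•   A small number of top users cannot dominate the content in an OSN

[Figure: contributions of top users: cumulative contribution ratio vs. fraction of users (log scale, $10^{-5}$ to $10^{0}$), comparing SE (blog article) with power law ($a = 0.9$); sketches of the power law and stretched exponential rank distributions on log-log axes]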
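
The limit on this slide can be explored numerically. The sketch below builds one stretched exponential and one power law rank distribution and compares the cumulative contribution ratio of the top-k users as k/n shrinks. The power law exponent 0.9 and c = 0.418 appear in the slides; the population size `n`, the SE slope `a_se`, and the use of natural logarithms are assumptions made only for illustration.

```python
import math
from itertools import accumulate


def cumulative_ratio(values):
    """Cumulative share of the total held by the top-1, top-2, ... ranks."""
    prefix = list(accumulate(values))
    return [p / prefix[-1] for p in prefix]


n = 100_000                       # hypothetical number of users
c, a_se = 0.418, 2.09             # c from the slides; a_se is an assumed value
b = 1 + a_se * math.log(n)        # so that the last-ranked user has y_N = 1

se = [(b - a_se * math.log(i)) ** (1 / c) for i in range(1, n + 1)]
power = [i ** -0.9 for i in range(1, n + 1)]

t_se = cumulative_ratio(se)
t_pow = cumulative_ratio(power)

top_ks = [10, 100, 1_000, 10_000]  # k/n from 1e-4 up to 1e-1
ratios = [t_se[k - 1] / t_pow[k - 1] for k in top_ks]
for k, r in zip(top_ks, ratios):
    print(f"k/n = {k / n:.0e}: T_se / T_pow = {r:.3f}")
```

Under these assumed parameters the ratio falls steadily as k/n gets smaller, in line with the claim that a small group of top users dominates far less under a stretched exponential than under a power law.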
             The “core” users in social networks
    •   Looking for a threshold to identify the most important users
    •   Power law distribution: hard threshold
         –   By number of or fraction of users
         –   By a predefined user contribution threshold
    •   Stretched exponential distribution: general threshold for all systems
         –   $\frac{dy}{y}$: decrease rate of user contribution along rank
         –   $\frac{di}{i}$: increase rate of user contribution rank
         –   Threshold: the point where the two rates balance, $-\frac{dy}{y} = \frac{di}{i}$
         –   Let $X = \log i$, $Y = \log y(i)$; the condition becomes $dX + dY = 0$
         –   At the threshold point $(X_0, Y_0)$, with $X_0 = \log k$ and contribution $y_k$:
             $\frac{k}{n} = \exp\left(\frac{1}{a} - \frac{1}{c}\right)$, $y_k = \left(\frac{a}{c}\right)^{1/c}$

    [Figure: number of posts vs. user rank on log-log axes, with the threshold point (X0, Y0) marked between endpoints A and B]

    Data set        y_k    k/n      cumsum    n
    Blog article     47    14.8%    73.3%     348 K
    Blog photo      209     7.7%    64.0%     269 K
    Bookmark        248     8.3%    67.6%     1.7 M
    Answer          287     4.7%    63.7%     10.3 M
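
The threshold formulas can be checked with a few lines of arithmetic. The sketch below assumes natural logarithms and the model $y^c = -a \log i + b$ with $b = 1 + a \log n$. The slides give c = 0.418 for blog articles but not $a$, so $a$ is backed out here from the reported $y_k = 47$; that step, and the function names, are assumptions for illustration.

```python
import math


def se_threshold(a, c):
    """Threshold of an SE rank distribution: returns (y_k, k/n)."""
    y_k = (a / c) ** (1 / c)
    k_over_n = math.exp(1 / a - 1 / c)
    return y_k, k_over_n


def loglog_slope(x, a, b, c, h=1e-5):
    """Numerical slope dY/dX of Y = log y against X = log i, at X = x."""
    def big_y(v):
        return math.log(b - a * v) / c      # Y = (1/c) * log(b - a*X)
    return (big_y(x + h) - big_y(x - h)) / (2 * h)


c = 0.418                  # blog article stretch factor from the slides
a = c * 47 ** c            # assumed: inverted from the reported y_k = 47
n = 348_000                # blog article population from the table
b = 1 + a * math.log(n)

y_k, frac = se_threshold(a, c)
slope = loglog_slope(math.log(frac * n), a, b, c)
print(f"y_k = {y_k:.1f}, k/n = {frac:.1%}, log-log slope = {slope:.4f}")
```

The computed fraction comes out close to the 14.8% listed for blog articles, and the log-log slope at that rank is -1, which is the condition $dX + dY = 0$.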
    Creation patterns of different types of UGC
•   Blog article: more effort, smaller c
     – Longer articles need more effort to compose; adding tags needs extra effort
         type         c
         all posts    0.42
         > 1 KB       0.39
         > 2 KB       0.31
         with tags    0.30

•   Blog photo: c = 0.32
     – More effort than a short blog: taking the photo, transferring, editing,
       uploading, writing a description

•   Bookmark: no difference in effort
         type         c
         imports      0.33
         all posts    0.32

•   Answer: higher quality, smaller c (more effort to compose)
         type         c
         all posts    0.25
         best ans     0.19

•   Our conjecture: larger c, flatter user contribution distribution
     – User participating effort is even smaller
     – Content with higher quality and more effort than a best answer would have a much smaller c

•   Relation to power law
     – $y^c = -a \log i + b$
     – For small c, $y^c \sim \log y$ (since $y^c \approx 1 + c \log y$), so $\log y$ is linear in $\log i$: a power law
     Discussion: UGC production vs. UGC consumption
• Internet media access patterns (PODC’08)
    – The number of requests to a media object is stretched exponential for
      different kinds of media systems
• Media request is content consumption
    – Stretch factor increases with file length (duration a user views)
• UGC creation is content production
    – Stretch factor decreases with the effort to create a UGC object
• UGC social networks rely on user contribution to attract traffic
    – Relationship between UGC creation and consumption
    – More general model for both UGC creation and consumption
        • Understand the driving force of a social network
        • Design effective participation mechanisms for social applications
        • Provide efficient data management for underlying supporting systems
                    Concluding remarks

• User activities and contributions are critical for knowledge-sharing
  social networks
• We have analyzed three large OSNs, and found
    – User lifetime in OSNs does not follow an exponential distribution
    – User contribution distribution is stretched exponential
    – Different types of UGC content generation patterns can be
      modeled with different parameters in SE
• User contribution model: distribution of individual user behaviors
    – Building block to understand more complex social network phenomena
    – Foundation to guide design, modeling and simulation of OSNs
Thank you!

