					                               ELECTRICAL &
                               COMPUTER ENGINEERING




Peeking into Facebook Applications:
  - Activity Graphs, Network Footprints,
            & Phantom Profiles


                              Chen-Nee Chuah
                    Robust & Ubiquitous Networking Lab
                     http://www.ece.ucdavis.edu/rubinet




                       Outline

 Introduction and motivation
 Case study: Facebook
   – Measurement Methodology
   – Usage Characteristics of Third-Party Facebook
     Applications
   – Network-layer footprints and performance bottlenecks
 Ongoing Work: Phantom User Profiles




                           Motivations

   Online social networks (OSNs) reach more than 0.5 billion
    users collectively [TechCrunch, New York Times]
     – OSNs have reshaped user-user interaction
   Third-party social applications dominate user interactions
    in OSNs
     – Facebook, MySpace, Hi5, Bebo, Friendster, …
   Two major platforms:
     – Facebook Developer Platform (FDP)
     – OpenSocial








                      OSN Framework




 Viral growth in number of users, number & heterogeneity of applications, and
  traffic volume is changing the Internet landscape
  => Need a careful characterization of this new emerging workload
                      Open Questions

  (ISPs) What are the usage characteristics of OSNs and
   OSN-based applications? Does it call for a redesign of
   networking substrate?
  (Application developers) Do external developers of
   popular and viral applications need exorbitantly high
   resources to serve content to users?
  (Users) How much do Facebook request forwarding and
   response processing delays affect user experience?
  (OSN providers) What are the possible provisioning
   strategies at OSNs like Facebook? What is a suitable
   distributed data storage platform?






                        Related Work
 Graph theoretic properties of social networks:
   – Real-life Social Networks:
      • [Granovetter ‘73]: The Strength of Weak Ties
      • [Milgram ’67]: The Small World Problem
   – Online Social Networks:
      • [Mislove ‘07]: Measurement and Analysis of Online Social Networks
 Security and privacy issues:
   – [Gross ’05]: Information Revelation and Privacy in Online Social
     Networks
 User behavior:
   – [Gjoka ‘08]: Poking Facebook: Characterization of OSN Applications
   – [Golder ’07]: Rhythms of Social Interaction: Messaging within a
     Massive Online Network

Our Focus: Facebook 3rd Party Applications

A comprehensive framework for a large-scale measurement study of
  3rd party OSN applications

 [Nazir ‘08]: Unveiling Facebook: A Measurement Study of Social
  Network Based Applications
   – Global application usage characteristics
   – User-level behavior on social applications
   – Distinguishing features of social games
 [Nazir’09]: Network-level Footprints of Third-Party Applications
   – Characterize user experience (delays) interacting with 3rd party
     applications via Facebook
   – Gauging provisioning strategies at Facebook & Application
     Servers




             Measurement Methodology

   Passive & active measurement approach
   (Passive) Become one of the players: develop and launch our
    own Facebook applications
      – Nine applications, all within the top 1% of ~57K Facebook
        applications (as of Jan 2009)*, attracting 8+ million unique users.
      – Traces collected at the application servers (installation requests,
        interaction with other users via the applications)
   (Active)
      – Simulated clients from PlanetLab nodes
          Modify request types and parameters

  * Today, there are >95,000 apps.


[Nazir’08] Characterization of Facebook Apps.

  We consider three applications on Facebook:

    – Fighters’ Club (FC, 3.4M+, Jun 2007): Social Gaming
    – Got Love? (GL, 4M+, Nov 2007): Social Utility
    – Hugged (0.7M+, Feb 2008): Social Utility








         GL, Hugged: Social Utility Apps

   GL: friend-friend, one request per target friend

   Hugged: friend-friend, multiple requests per
    target friend

   Similar functionality:
       • User A hugs/loves (friend) User B
       • User B accepts/ignores hug/love

   [Figure: interaction flow with steps labeled Hug, Inform, View]


  Fighters’ Club: A Gaming Application

  Friend-friend, non-friend to non-friend interaction
  Number of blows limited through points system

  [Figure: game flow with steps Pick Fight, Hit/Attack, and Pick a Side;
   labels: Offender’s Supporters, Defender’s Supporters, Winner, More Damaging]








               Data Set Summary




    Inference on Global Characteristics

  Geographical distribution
   – Determined by underlying social graph
   [Figure: Fighters’ Club, No. of Active Users per country]
  Traffic pattern
   – Daily pattern is diurnal
   – Weekends are slow, Tuesday sees peak traffic
   – Affected by majority of user contribution from US, Canada
   [Figure: Bandwidth Used (in GBs) vs. Hour of the Day]
  User activity follows a power-law distribution as
   well (shown later)




  User Interaction on Social Applications

 Scalability of social applications is an issue
    – Storage, retrieval and processing of data is problematic
      for applications, especially social gaming
  Data segregation may achieve scalability
 Consider interaction graphs on social applications:
    – Users associating with one ‘data point’ (user/game)
      have an edge between them
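
A minimal sketch of how such an interaction graph could be built, assuming the trace is a list of (user, data point) records. The record layout, the sample names, and the function name build_interaction_graph are illustrative and not taken from the study.

```python
from collections import defaultdict
from itertools import combinations

def build_interaction_graph(records):
    """Return an adjacency map: users who touched the same data point share an edge."""
    users_by_point = defaultdict(set)
    for user, data_point in records:
        users_by_point[data_point].add(user)

    graph = defaultdict(set)
    for users in users_by_point.values():
        # Every pair of users associated with the same data point gets an edge.
        for u, v in combinations(sorted(users), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph

# Hypothetical records: (user, game or target user they interacted with).
records = [("alice", "fight1"), ("bob", "fight1"), ("carol", "fight2"), ("bob", "fight2")]
graph = build_interaction_graph(records)
```

Here bob is linked to both alice and carol, while alice and carol share no data point and therefore no edge.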




             User Interaction (1)

  Connected components?
   – Natural separation of interacting users
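
One way to test this idea is to extract the connected components and check what share of users falls into the largest one. The sketch below uses a plain breadth-first search over a hypothetical toy graph; none of the names or numbers come from the study.

```python
from collections import deque

def connected_components(graph):
    """Yield each connected component of an undirected adjacency map as a set."""
    seen = set()
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        yield component

# Hypothetical toy graph: one 4-user component and one 2-user component.
graph = {
    "a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"},
    "e": {"f"}, "f": {"e"},
}
components = sorted(connected_components(graph), key=len, reverse=True)
largest_share = len(components[0]) / len(graph)
```

A largest_share close to 1 would mean the components do not give a useful separation of users.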








Interaction Graphs: Data and Results Summary




 User Interaction: Community Structures

  Connected components?
   – Too biased: > 85% of users in largest component
   – Artifact of social networks (social graph)
   – Doesn’t work
  Community structures?
   – Find clusters of ‘more connected’ nodes in interaction
     graphs
   – Does structure exist in our applications’ interaction
     graphs?
      • Structure coefficient or ‘modularity’, a quality function
      • More than 0.3: significant structure







Interaction Graphs: Data and Results Summary




User Interaction: Community Structures (1)

   Community size distribution
     – More varied for GL, Hugged than for FC




   Can we predict community formation?
     – Through geographic locations
     – Through Facebook’s `networks’




User Interaction: Community Structures (2)

 Geographical diversity per community




 Facebook-network diversity per community




Interaction Graphs: Data and Results Summary

                 Actually Small World Networks!




 Social Gaming vs. Social Utility Apps. (1)

  Social Utility: Hugged, Got Love?
  Social Game: Fighters’ Club

  [Figure, top row: log-log node degree distributions for Fighters’ Club,
   Got Love?, and Hugged; axes labeled log(Node Degree) and
   log(Probability of Node Degree)]
  [Figure, bottom row: No. of Unique Users, Data Transferred (in GBs), and
   Average Time Spent (in secs) vs. Day No. in Trace, for FC, GL, and Hugged]
 Social Gaming vs. Social Utility Apps. (2)

   FC: A relatively small proportion of users is highly engaged
   Utility applications have little or no engagement
   User interaction is still a power-law
     – Some users interact with more friends through
       applications
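
A rough way to check a power-law claim like this is to estimate the exponent from per-user counts. The sketch below uses the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)); it is only an approximation for integer counts, and the sample data and the choice of x_min are hypothetical rather than the study's method.

```python
import math

def power_law_alpha(values, x_min):
    """Continuous MLE of alpha for p(x) ~ x^(-alpha), using only values >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1 + len(tail) / log_sum

# Hypothetical number of friends each user interacted with through an application.
friend_counts = [1, 1, 2, 2, 3, 5, 8, 13, 40, 120]
alpha = power_law_alpha(friend_counts, x_min=1)
```

A proper analysis would also test goodness of fit, since a straight line on a log-log plot alone does not establish a power law.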








 Social Gaming vs. Social Utility Apps. (3)

 Other differences:
   – Average number of activities higher on FC than on GL,
     Hugged
   – Average number of friends on application, total number of
     friends on Facebook, significantly higher for FC than GL,
     Hugged
   [Figure: bar chart of Average No. of Activities, Average No. of
    Subscribing Friends, and Average No. of Total Friends for FC, Hugged,
    and GL]



   [Nazir’08] Summary & Discussions

 Performed analysis that highlights:
   – General properties observed on Facebook applications
   – User-level behavior on top of the social graph
   – Differentiating properties of social games
 Activity levels are determined by number of
  interacting individuals
   – In FC, more playing friends means more activities made
 Users befriend (non-friend) players on FC
   – Form a deeply engaged cluster of players
   – User engagement is thus quite high
   – Distorts the actual social graph on Facebook
 Social games have higher warm-up time than social
  utility applications




  Activity Graph Evolution Over Time

 Track one social utility (Hugged) and one gaming
  (FC) application over 52 weeks
  Track a new social utility application (iHeart)
  since its launch (~26 weeks of data)
 Questions
   – How do graph properties evolve over time?
   – What about the heavy sub-graphs?
   – User churn? Heavy/persistent users?




   Graph Properties Remain Stable for Matured
                  Applications








             Heavy vs. Persistent Users




  Heavy users on FC: 10% of users contribute 80% of the activities; less
  so on Hugged
  Persistent users on FC: interaction time on FC is an order of magnitude
  larger than on Hugged (some users return almost daily!)
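
The heavy-user share can be computed directly from per-user activity counts. This is a minimal sketch with hypothetical counts chosen to reproduce a 10%/80% split; the function name top_share is illustrative.

```python
def top_share(activity_counts, top_fraction=0.10):
    """Fraction of all activities contributed by the most active `top_fraction` of users."""
    counts = sorted(activity_counts, reverse=True)
    top_n = max(1, round(len(counts) * top_fraction))
    return sum(counts[:top_n]) / sum(counts)

# Hypothetical per-user activity counts for ten users (not measured data).
counts = [800, 40, 30, 30, 25, 25, 20, 15, 10, 5]
share = top_share(counts)
```

With these counts the single most active user (10% of users) accounts for 800 of 1000 activities, so share is 0.8.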




                          Outline

 Introduction and motivation
 Case study: Facebook
   – Measurement Methodology
   – Usage Characteristics of Third-Party Facebook
     Applications
   – Network-layer footprints and performance bottlenecks
 Ongoing Work: Phantom User Profiles








         [Nazir’09] Applications Deployed




 Total third-party applications: ~57,000 (as of Jan 2009)
 Our applications are within the top 1% DAU/MAU-wise
  – MAU: Monthly Active Users
  – DAU: Daily Active Users
  – DAU/MAU: Engagement metric for comparison with global
    applications
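
A small sketch of the DAU/MAU ratio, assuming MAU is counted over the 30 days ending on the day of interest (the slide does not define the window) and that the trace is a list of (user, date) activity events. All names and dates are hypothetical.

```python
from datetime import date

def dau_mau(events, day):
    """DAU/MAU for `day`, with MAU taken over the 30 days ending on `day`."""
    dau = {user for user, d in events if d == day}
    mau = {user for user, d in events if 0 <= (day - d).days < 30}
    return len(dau) / len(mau) if mau else 0.0

# Hypothetical activity events: (user id, day the user was active).
events = [
    ("u1", date(2009, 1, 30)), ("u2", date(2009, 1, 30)),
    ("u1", date(2009, 1, 12)), ("u3", date(2009, 1, 5)),
    ("u4", date(2009, 1, 2)), ("u5", date(2008, 12, 20)),
]
ratio = dau_mau(events, date(2009, 1, 30))
```

Two of the four users active in the window were active on the day itself, so ratio is 0.5; a higher ratio indicates that users return more often.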


   Observed Loads at Application Server
 Request arrivals exhibit diurnal patterns
 Growth patterns depend on popularity and seasonality of content
 High correlation between installation requests and # of total
  requests
   – Preferential attachment is responsible for application growth
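
A correlation of this kind could be quantified with the Pearson coefficient over daily counts. The sketch below implements it directly; the daily figures are hypothetical and not measured values, and the slide does not state which correlation measure was used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily counts over one week.
installs = [120, 150, 90, 300, 260, 80, 200]
total_requests = [4100, 5200, 3000, 9800, 8700, 2900, 6600]
r = pearson(installs, total_requests)
```

Values of r near 1 indicate that days with more installation requests also see more total requests.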








       Methodology: Measurement Set up

  Active Measurements
  Passive Measurements
    -User activity traces
    -Pageview response sizes
    -API call delays
    -TCP dump logs

  [Figure: page view message flow. Steps: page view request sent, page view
   request forwarded, page view response sent, page view response forwarded.
   Delays marked: Request Forwarding Delay, Request Queuing Delay,
   Request Processing Delay, Response Processing Delay.]
          Application Server Delays (1)

 Application server delays:
   – Queuing delays: negligible!
   – Processing delays: dependent on load, reduced w/
     proper resource provisioning



                       Server Upgrade




             Resource provisioning is crucial,
              yet exorbitant resources are
                     unnecessary!





          Application Server Delays (2)

 Response sizes remain stable across time, independent of
  load. Typical response sizes:
       • 1.5-3Kb per request for The Streets
       • 4-5Kb per request for Hugged, Holiday Cheers
 API call response delays vary with API call type
   – Some calls are more expensive during peak hours



              Save resources: Make API
               calls in off-peak hours!



            OSN Request Forwarding Delays

 Vary linearly with increasing request sizes
   – Typical request sizes (0-1Kb): ~130ms delay
    Negligible fraction of total response times
 Do not vary with traffic load, application popularity
    Applications are treated equally by Facebook



                   OSN Request Forwarding Delays:
                 Negligible Impact on User-Experience!




           OSN Response Processing Delays

 Seem to be unaffected by application popularity
 HTML content takes significantly less time to process
  than Javascript
   – HTML: 0.01ms/byte, Javascript: 0.04ms/byte
 FBML tags:
   – Targeting non-user entities: Unaffected by entity’s
     popularity and type (groups and networks)
   – Targeting user content: Unaffected by entity’s
     popularity and network membership, but affected by
     content type
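
A back-of-the-envelope sketch of what the per-byte figures imply, assuming processing time scales linearly with payload size at the quoted HTML and Javascript rates and ignoring FBML tag costs. The payload sizes are hypothetical.

```python
# Per-byte processing costs quoted on the slide, in milliseconds.
MS_PER_BYTE = {"html": 0.01, "javascript": 0.04}

def processing_delay_ms(html_bytes, javascript_bytes):
    """Estimated OSN response processing delay under a linear per-byte model."""
    return (html_bytes * MS_PER_BYTE["html"]
            + javascript_bytes * MS_PER_BYTE["javascript"])

# Hypothetical response: 3000 bytes of HTML plus 1000 bytes of Javascript.
delay = processing_delay_ms(html_bytes=3000, javascript_bytes=1000)
```

Under this model the example response costs roughly 70 ms, with the smaller Javascript portion accounting for more than half of it.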




    OSN Response Delays: Effect of Caching

  User-Related FBML Tags: Delays depend on content
     – Longest: profile pictures; Shortest: users’ status




                     Developers can balance
                    content to improve delays!





              Overall Delay Breakdowns
 OSN response processing delays form significant fraction!
   – Synthetic workload




   – Actual workload
       • Through simulated average workload per request for our applications:
           – The Streets: dg = 44.4% of 1.30s total time
           – Hugged: dg = 68.8% of 2.21s total time
           – Pound Puppies: dg = 59.9% of 1.77s total time

                   OSN Response Processing
                     Delays are Significant!
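
Converting these shares into absolute times is simple arithmetic. The sketch assumes dg denotes the OSN response processing delay (the slide does not define the symbol) and takes the Pound Puppies figures as 59.9% of 1.77s.

```python
# (share of total time attributed to dg, total time in seconds)
breakdown = {
    "The Streets": (0.444, 1.30),
    "Hugged": (0.688, 2.21),
    "Pound Puppies": (0.599, 1.77),
}

# Absolute time per request spent in OSN response processing.
osn_seconds = {app: share * total for app, (share, total) in breakdown.items()}

for app, seconds in osn_seconds.items():
    print(f"{app}: {seconds:.2f}s of {breakdown[app][1]:.2f}s")
```

This gives roughly 0.58s, 1.52s, and 1.06s respectively.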

                 Discussion/Insights

 Suggestions for OSNs
   – Eliminate connection set-up delays using persistent HTTP
     connections
   – Distribute data center locations!
       • Improve end-to-end delay for global audience
       • Fact: Current Facebook architecture relies on two data centers in
         the U.S.
   – Parallelize per-request OSN-tag (e.g., FBML) processing
 Suggestions for Developers
   – Scale resources up/down according to high/low activity
     periods
       • Daily, weekly, yearly!





                            Outline

 Introduction and motivation
 Case study: Facebook
   – Measurement Methodology
   – Usage Characteristics of Third-Party Facebook
     Applications
   – Network-layer footprints and performance bottlenecks
 Ongoing Work: Phantom User Profiles




                                                                             42




                     Background

 Social gaming:
   – Major contributor to user engagement in OSNs
   – Already at least a billion-dollar industry in 2009
 The Facebook Platform is the primary focus for
  social games
 How do social games impact the underlying social
  graph?
   – Existing research: graph theoretic properties, user
     behavior and traffic patterns




                                                              43




            Phantom User Profiles

 Social games are highly engaging
   – Some users spend more than 10 hours per day playing
   – More social => more engaging => higher tendency to
     cheat
 Gamers create fake (Phantom) profiles to achieve
  higher rewards in social games
   – They are advertised to other gamers
   – Phantom profiles are heavily connected to gamers (real
     profiles)
   => Contaminated social graph, loss of revenue, negative
     impact on user experience


                                                              44




Significance of Phantom/Fake Profiles

 OSN business models depend on advertising
  revenues => mostly interested in ‘real’ users
 As more social interaction shifts to the Internet,
  new social phenomena and social problems may
  emerge.
 Crime in cyber space relies crucially on fake
  identities




                                                                 45




                 Our Contributions

 Problem definition:
   – How do we detect (and eliminate) the Phantom
     profiles?
   – Current Facebook methods rely primarily on user
     reports, manual inspection and flagging through rate
     limits [anecdotal]
 Our approach:
   – Characterize Phantom profiles on a popular social game
      • Phantom profiles have nearly identical OSN-resident
        properties as Genuine profiles
      • Phantom profile characteristics differ on social games
   – Investigate a supervised learning technique to classify
     Phantom profiles


                                                                 46




                     Related Work

 Phantom profiles in OSNs have received limited
  attention
 Yu et al.’s [ToN08] work on Sybil attacks is most
  relevant
   – Focuses on users on the OSN, but not in games
   – Focuses on Sybil attacks:
      • “Malicious user obtaining multiple fake identities and
        pretending to be multiple, distinct nodes in the system.”
      • Assumption: phantom profiles are less connected to genuine
        users than to other phantom profiles
 Phantom profiles in social games may be more
  connected to real users
   – Our problem is not a Sybil attack
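
The contrast drawn above can be made concrete by measuring, for each fake profile, what fraction of its friends are genuine. The sketch below is illustrative only: the graph, the profile names and the labels are entirely made up, and genuine_neighbor_fraction is a hypothetical helper, not a method from the cited work.

```python
def genuine_neighbor_fraction(friends, labels, profile):
    """Fraction of `profile`'s labeled friends that are genuine.

    friends: dict mapping a profile to a list of its friends.
    labels:  dict mapping a profile to "genuine" or "fake".
    Returns None when the profile has no labeled friends.
    """
    neighbors = [f for f in friends.get(profile, ()) if f in labels]
    if not neighbors:
        return None
    genuine = sum(1 for f in neighbors if labels[f] == "genuine")
    return genuine / len(neighbors)


# Toy friendship graph (hypothetical data, for illustration only).
FRIENDS = {
    "sybil_1": ["sybil_2", "sybil_3", "alice"],
    "sybil_2": ["sybil_1", "sybil_3"],
    "sybil_3": ["sybil_1", "sybil_2"],
    "phantom_1": ["alice", "bob", "carol", "phantom_2"],
    "phantom_2": ["alice", "bob", "phantom_1"],
    "alice": ["sybil_1", "phantom_1", "phantom_2", "bob"],
    "bob": ["phantom_1", "phantom_2", "alice", "carol"],
    "carol": ["phantom_1", "bob"],
}
LABELS = {
    "sybil_1": "fake", "sybil_2": "fake", "sybil_3": "fake",
    "phantom_1": "fake", "phantom_2": "fake",
    "alice": "genuine", "bob": "genuine", "carol": "genuine",
}
```

In this toy graph, sybil_1 has one genuine friend out of three, which matches the Sybil assumption, while phantom_1 has three genuine friends out of four, which is the situation described for social games.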

                                                                     47




     Case Study: Fighters’ Club (FC)

   [Figure: Fighters’ Club game flow. Labels: Pick Fight, Hit/Attack,
    Pick a Side, Offender’s Supporters, Defender’s Supporters,
    More Damaging, Winner]

             No. of Unique Users:            264,606
             No. of Active Users:             30,990
             No. of Known Genuine Profiles:      520
             No. of Known Phantom Profiles:      545

           We employ 13 different OSN- and
         game-based attributes for the analysis.




                                                                     48




       Case Study: Fighters’ Club (FC)

 Phantom users have incentive to “blend in”:
  – OSN-based properties such as # of friends, # of networks
    joined, etc. are similar on average to genuine users




  – Must consider game-specific properties instead
     • Special activities, special interactions?


                                                                                  49




   Characterization of Phantom Profiles

   Genuine users compete with friends, Phantom
    users only serve Genuine users as slaves
   Strategies for improving rank in FC:
     1) Fight weaker players to win fights and gain higher
        status in FC
         •    Phantom users instigate/defend more fights

     2) Support stronger players to accumulate more FC
        rewards
         •    Phantom users have larger opposing teams

     3) Allow ample time for friends to participate in fights
         •    Phantom users instigate/defend fights with shorter durations
                                                                                  50




          Supervised Learning: SVM
 Question: Does combining the attributes detect
  Phantom profiles better?
 Apply Support Vector Machines (SVMs) for
  classification of Phantom profiles
  – General idea: treat each profile's N features as a point in
    N-dimensional space and find the maximum-margin separating hyperplane
  – Question for Classifier: Is Profile X a Phantom Profile?
  – Considered 13 features, eliminated 3 to reduce noise
  – Performed parameter tuning to train classifier
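
To make the separating-hyperplane idea concrete, here is a minimal sketch of a linear soft-margin SVM trained by batch sub-gradient descent on the regularised hinge loss. It is a simplification chosen so the example needs no external library; it is not the classifier, kernel or parameter settings used in the study, and in practice a library implementation would be the usual choice. The toy data, the function names and the defaults for lam, lr and epochs are all assumptions.

```python
def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """Batch sub-gradient descent on the L2-regularised hinge loss.

    xs: list of feature vectors; ys: labels in {-1, +1}
    (+1 = Phantom, -1 = Genuine in this sketch).
    Returns (weights, bias) describing the hyperplane w.x + b = 0.
    """
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Made-up, already-centred two-feature data (hypothetical, not from the study).
TOY_X = [(2, 2), (3, 2), (2, 3), (3, 3), (-2, -2), (-3, -2), (-2, -3), (-3, -3)]
TOY_Y = [1, 1, 1, 1, -1, -1, -1, -1]
```

A call such as predict(train_linear_svm(TOY_X, TOY_Y), profile_features) then answers the classifier question from the slide for one profile.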


                                                                              51






                          Evaluation


 Tested the classifier on the training data, then applied it to the
  unknowns (~30,000 users)
 Divided our known profiles into 10 subsets with
  varying ratios of Phantom to Genuine profiles
 Employed K-fold (10-fold) Cross Validation to
  evaluate our classifier per input set
     • K-1 sets are used for training, 1 set is used as validation data for
       testing the model
       – Cross-validation process is repeated K times
       – The K results (1 per fold) are then averaged
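
The K-fold procedure described above can be sketched in a few lines. This is a generic illustration, not the evaluation code used in the study: k_fold_splits and cross_validate are hypothetical names, accuracy is assumed as the per-fold score, and the data are assumed to have been shuffled beforehand.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


def cross_validate(xs, ys, k, fit, predict):
    """Train on K-1 folds, score on the held-out fold, repeat K times.

    fit(train_xs, train_ys) -> model; predict(model, x) -> label.
    Returns (mean accuracy, list of per-fold accuracies).
    """
    scores = []
    for train_idx, val_idx in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(predict(model, xs[i]) == ys[i] for i in val_idx)
        scores.append(correct / len(val_idx))
    return sum(scores) / k, scores
```

Any classifier can be plugged in through the fit and predict arguments, so the same loop serves each input set with its own Phantom-to-Genuine ratio.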


                                                                                   52




              Cross-Validation Result




                     Successful application of
                    classifier depends on true
                   ratio of Phantom to Genuine
                   profiles – currently unknown
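
One way to see why the true ratio matters is to apply Bayes' rule to the flagged profiles. The sketch below is a worked example under stated assumptions: the detection rate of 0.9 and false-alarm rate of 0.1 used in the note that follows are hypothetical round numbers, not results from this study, and the function name precision is illustrative.

```python
def precision(tpr, fpr, phantom_fraction):
    """Share of flagged profiles that are truly Phantom.

    tpr: fraction of Phantom profiles the classifier flags.
    fpr: fraction of Genuine profiles the classifier flags by mistake.
    phantom_fraction: true share of Phantom profiles in the population.
    """
    true_pos = tpr * phantom_fraction
    false_pos = fpr * (1.0 - phantom_fraction)
    if true_pos + false_pos == 0:
        return 0.0
    return true_pos / (true_pos + false_pos)
```

With those hypothetical rates, precision(0.9, 0.1, 0.5) is 0.9, but precision(0.9, 0.1, 0.01) is only about 0.083: if Phantom profiles are rare among the ~30,000 unknown users, most flagged profiles would be Genuine even though the classifier itself has not changed.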




                                                                          53






                           Discussion

 Introduced the problem of Phantom profiles in social
  games
 How do we detect them?
  – Statistical classification just doesn’t work
  – Supervised learning fares better
      • Suffers from unknown ratio of actual Phantom to Genuine profiles in
        entire population
      • May suffice for flagging users, but not for automatic elimination
 Future work:
  –   Identify larger numbers of Phantom and Genuine profiles
  –   Incorporate behavioral and demographic data
  –   Investigate other supervised learning techniques
  –   Ultimate goal: identify Phantom profiles and their creators
                                                                               54




       Questions & Comments?




E-mail: chuah@ucdavis.edu
Project url: http://www.ece.ucdavis.edu/rubinet




                                                  55



