Social Networks and Collaborative Filtering

Document Sample
Social Networks and Collaborative Filtering Powered By Docstoc
					Social Networks and
Collaborative Filtering

          Qiang Yang
            HKUST

       Thanks: Sonny Chee
                            1
Motivation
   Question:
      A user bought some products already

      what other products to recommend to a user?

   Collaborative Filtering (CF)
      Automates “circle of advisors”.




                       +
                                                     2
Collaborative Filtering

“..people collaborate to help one another
   perform filtering by recording their
   reactions...” (Tapestry)
 Finds users whose taste is similar to you and

   uses them to make recommendations.
 Complimentary to IR/IF.

     IR/IF finds similar documents – CF finds similar
      users.


                                                         3
Example
       Which movie would Sammy watch next?
                  Ratings 1--5
                                   Titles
                   Starship Sleepless
                   Trooper in Seattle MI-2 Matrix Titanic
                     (A)       (R)      (A) (A)     (R)
        Sammy         3         4         3  ?       ?
        Beatrice      3         4         3  1       1
Users




        Dylan         3         4         3  3       4
        Mathew        4         2         3  2       5
        John          4         3         4  4       4
        Basil         5         1         5  ?       ?

              • If we just use the average of other users who voted on these movies, then we get
                    •Matrix= 3; Titanic= 14/4=3.5
                    •Recommend Titanic!
              •But, is this reasonable?                                                            4
    Types of Collaborative Filtering Algorithms

   Collaborative Filters
        Statistical Collaborative Filters
        Probabilistic Collaborative Filters [PHL00]
        Bayesian Filters [BP99][BHK98]
        Association Rules [Agrawal, Han]
   Open Problems
        Sparsity, First Rater, Scalability


                                                       5
Statistical Collaborative Filters
   Users annotate items with numeric
    ratings.
   Users who rate items “similarly” become
    mutual advisors.
                        Items                     Users
                     I1 I2 … Im              U1 U2 ..       Un
                  U1 .. .. ..             U1 ..   ..   ..
          Users




                                  Users
                  U2   ..    ..           U2 ..   ..   ..   ..
                  … ..    ..              .. ..   ..   ..   ..
                  Un   .. .. ..           Un      ..   ..   ..


   Recommendation computed by taking a
    weighted aggregate of advisor ratings.
                                                                 6
Basic Idea
   Nearest Neighbor Algorithm
   Given a user a and item i
       First, find the the most similar users to a,
            Let these be Y
       Second, find how these users (Y) ranked i,
       Then, calculate a predicted rating of a on i
        based on some average of all these users Y
            How to calculate the similarity and average?


                                                            7
Statistical Filters

   GroupLens [Resnick et al 94, MIT]
       Filters UseNet News postings
       Similarity: Pearson correlation
       Prediction: Weighted deviation from mean
                          1
          Pa ,i  r a         (ru ,i  r u )  wa ,u
                          



                                                         8
Pearson Correlation
           7
           6
           5
  Rating


           4
           3
           2
           1
           0
               Item 1         Item 2      Item 3    Item 4   Item 5
                                          Items

                                 User A    User B   User C


               Pearson Correlation
                             User
                           A B C
                         A 1 1 -1
                  User




                         B 1 1 -1
                         C -1 -1 1
                                                                      9
  Pearson Correlation

     Weight between users a and u
              Compute similarity matrix between users
                      Use Pearson Correlation (-1, 0, 1)
                      Let items be all items that users rated
                                                                           Pearson Correlation
                              (ra ,i  r a )(ru ,i  r u )                              User
wa ,u                                                                               A B C

                           (ra ,i  r a ) 2           (ru ,i  r u ) 2            A 1 1 -1




                                                                             User
          item s
                                                                                    B 1 1 -1
                   item s                      item s                               C -1 -1 1




                                                                                                10
Prediction Generation
   Predicts how much a user a likes an
    item i
       Generate predictions using weighted
        deviation from the mean
                         


                     1
     Pa ,i  r a 
                     
                        (r
                          u ,i    r u )  wa ,u       (1)


       : sum of all weights    | wa ,u |
                                                   Ya ,u




                                                             11
Error Estimation

   Mean Absolute Error (MAE) for user               a
                           N

                           | P   a ,i    ra ,i |
                  MAEa    i 1
                                  N
   Standard Deviation of the errors
                   K

                    ( MAEa  MAE ) 2
                 a 1
                            K


                                                         12
   Example
                                                      Correlation
                           wa ,i
                                           Sammy         Dylan           Mathew
                           Sammy             1             1              -0.87
                   Users

                           Dylan             1                 1          0.21
                           Mathew          -0.87          0.21             1


                            (rDylan,Matrix  r Dylan )  wSammy , Dylan  
                                                                                             1
PSammy,Matrix    r Sammy                                                 
                            (rMathew,Matrix  r Mathew )  wSammy ,Mathew  | wSammy, Dylan |  | wSammy,Mathew |
                                                                          
                                    3.3  {(3  3.4) 1  (2  3.2)  (0.87)} /(1  0.87)
                                    3.6

                             Prediction           Actual           MAE
                           Matrix Titanic Matrix Titanic                            MAE =0.83
       Users




               Sammy        3.6      2.8         3         4       0.9
               Basil        4.6      4.1         4         5       0.75

                                                                                                       13
    Statistical Collaborative Filters
   Ringo [Shardanand and Maes 95 (MIT)]
       Recommends music albums
            Each user buys certain music artists’ CDs
                                                             P      a, j
       Base case: weighted average                Pa, i 
                                                             j

                                                                 K
       Predictions
            Mean square difference
                  First compute dissimilarity between pairs of users
                  Then find all users Y with dissimilarity less than L
                  Compute the weighted average of ratings of these users
            Pearson correlation (Equation 1)
            Constrained Pearson correlation (Equation 1 with
             weighted average of similar users (corr > L))

                                                                            14
Open Problems in CF

   “Sparsity Problem”
       CFs have poor accuracy and coverage in
        comparison to population averages at low
        rating density [GSK+99].
   “First Rater Problem” (cold start prob)
       The first person to rate an item receives no
        benefit. CF depends upon altruism. [AZ97]


                                                 15
Open Problems in CF

   “Scalability Problem”
       CF is computationally expensive. Fastest
        published algorithms (nearest-neighbor)
        are n2.
            Any indexing method for speeding up?
       Has received relatively little attention.




                                                    16
References

   P. Domingos and M. Richardson, Mining
    the Network Value of Customers,
    Proceedings of the Seventh
    International Conference on Knowledge
    Discovery and Data Mining (pp. 57-66),
    2001. San Francisco, CA: ACM Press.



                                        17
Motivation
   Network value is ignored (Direct
    marketing).
   Examples:
    Market to          Affected (under the network effect)

Low
expected                                      High
profit                                        expected
                                              profit

           Marketed
                                        High
                                        expected
                                        profit
                                                      18
Some Successful Case

   Hotmail
       Grew from 0 to 12 million users in 18
        months
       Each email include a promotional URL of it.
   ICQ
       Expand quickly
       First appear, user addicted to it
       Depend it to contact with friend
                                                19