# Social Networks and Collaborative Filtering

```					Social Networks and
Collaborative Filtering

Qiang Yang
HKUST

Thanks: Sonny Chee
Motivation
   Question:
      A user has already bought some products.
      What other products should we recommend to that user?

   Collaborative Filtering (CF)

Collaborative Filtering

“..people collaborate to help one another
perform filtering by recording their
reactions...” (Tapestry)
   Finds users whose tastes are similar to yours and
   uses them to make recommendations.
   Complementary to IR/IF.
      IR/IF finds similar documents; CF finds similar users.

Example
   Which movie would Sammy watch next?
   Ratings 1-5

   Titles:    Starship   Sleepless
              Trooper    in Seattle   MI-2   Matrix   Titanic
   Users      (A)        (R)          (A)    (A)      (R)
   Sammy      3          4            3      ?        ?
   Beatrice   3          4            3      1        1
   Dylan      3          4            3      3        4
   Mathew     4          2            3      2        5
   John       4          3            4      4        4
   Basil      5          1            5      ?        ?

• If we just use the average of the other users who rated these movies, then we get
• Matrix = 10/4 = 2.5; Titanic = 14/4 = 3.5
• Recommend Titanic!
• But is this reasonable?

Types of Collaborative Filtering Algorithms

   Collaborative Filters
      Statistical Collaborative Filters
      Probabilistic Collaborative Filters [PHL00]
      Bayesian Filters [BP99][BHK98]
      Association Rules [Agrawal, Han]
   Open Problems
      Sparsity, First Rater, Scalability

Statistical Collaborative Filters
   Users annotate items with numeric ratings.
   Users who rate items “similarly” become
   [Figure: a users-by-items matrix (U1..Un by I1..Im) next to a users-by-users matrix (U1..Un by U1..Un)]
   Recommendation computed by taking a

Basic Idea
   Nearest Neighbor Algorithm
      Given a user a and an item i
      First, find the users most similar to a;
         let these be Y
      Second, find how these users (Y) rated i
      Then, calculate a predicted rating of a on i
         based on some average over all these users Y
   How to calculate the similarity and the average?

Statistical Filters

   GroupLens [Resnick et al 94, MIT]
   Filters UseNet News postings
   Similarity: Pearson correlation
   Prediction: Weighted deviation from mean
      P_{a,i} = \bar{r}_a + \frac{1}{\kappa} \sum_{u \in Y} (r_{u,i} - \bar{r}_u) \, w_{a,u}

Pearson Correlation
[Figure: ratings (axis 0 to 7) given by User A, User B, and User C to Items 1 to 5]

Pearson Correlation (user by user)
        A    B    C
   A    1    1   -1
   B    1    1   -1
   C   -1   -1    1

Pearson Correlation

   Weight between users a and u
      Compute the similarity matrix between users
   Use Pearson correlation, which ranges from -1 (opposite tastes)
      through 0 (no correlation) to 1 (identical tastes)
   Let items be all items that both users a and u rated

      w_{a,u} = \frac{\sum_{i \in items} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}
                     {\sqrt{\sum_{i \in items} (r_{a,i} - \bar{r}_a)^2 \, \sum_{i \in items} (r_{u,i} - \bar{r}_u)^2}}

Prediction Generation
   Predicts how much a user a likes an item i
   Generate predictions using weighted deviation from the mean

      P_{a,i} = \bar{r}_a + \frac{1}{\kappa} \sum_{u \in Y} (r_{u,i} - \bar{r}_u) \, w_{a,u}      (1)

   \kappa: sum of all weights, \kappa = \sum_{u \in Y} |w_{a,u}|

Error Estimation

   Mean Absolute Error (MAE) for user a

      MAE_a = \frac{1}{N} \sum_{i=1}^{N} |P_{a,i} - r_{a,i}|

   Standard Deviation of the errors

      \sqrt{\frac{1}{K} \sum_{a=1}^{K} (MAE_a - \overline{MAE})^2}

Example
Correlation w_{a,u}
             Sammy    Dylan    Mathew
   Sammy      1        1       -0.87
   Dylan      1        1        0.21
   Mathew    -0.87     0.21     1

P_{Sammy,Matrix} = \bar{r}_{Sammy}
      + \frac{(r_{Dylan,Matrix} - \bar{r}_{Dylan}) \, w_{Sammy,Dylan} + (r_{Mathew,Matrix} - \bar{r}_{Mathew}) \, w_{Sammy,Mathew}}
             {|w_{Sammy,Dylan}| + |w_{Sammy,Mathew}|}
   = 3.3 + {(3 - 3.4) × 1 + (2 - 3.2) × (-0.87)} / (1 + 0.87)
   = 3.6

             Prediction          Actual
   Users     Matrix  Titanic     Matrix  Titanic     MAE
   Sammy     3.6     2.8         3       4           0.9
   Basil     4.6     4.1         4       5           0.75

   Overall MAE = 0.83

Statistical Collaborative Filters
   Ringo [Shardanand and Maes 95 (MIT)]
      Recommends music albums
      Each user buys certain music artists’ CDs
   Base case: weighted average
      P_{a,i} = \frac{\sum_{j} P_{a,j}}{K}
   Predictions
      Mean square difference
         First compute dissimilarity between pairs of users
         Then find all users Y with dissimilarity less than L
         Compute the weighted average of ratings of these users
      Pearson correlation (Equation 1)
      Constrained Pearson correlation (Equation 1 with
         weighted average of similar users (corr > L))

Open Problems in CF

   “Sparsity Problem”
      CFs have poor accuracy and coverage in
      comparison to population averages at low
      rating density [GSK+99].
   “First Rater Problem” (cold-start problem)
      The first person to rate an item receives no
      benefit. CF depends upon altruism. [AZ97]

Open Problems in CF

   “Scalability Problem”
      CF is computationally expensive. The fastest
      published algorithms (nearest-neighbor)
      are O(n^2).
      Is there an indexing method for speeding it up?
      The problem has received relatively little attention.

References

   P. Domingos and M. Richardson. Mining the Network Value of Customers.
   Proceedings of the Seventh International Conference on Knowledge
   Discovery and Data Mining (pp. 57-66), 2001. San Francisco, CA: ACM Press.

Motivation
   Network value is ignored (direct marketing).
   Examples:
   [Figure: network-effect example. Labels: "Market to", "Marketed", and "Affected (under the network effect)"; customers are marked "Low expected profit" or "High expected profit"]
Some Successful Cases

   Hotmail
      Grew from 0 to 12 million users in 18 months
      Each email included a promotional URL for the service
   ICQ
      Expanded quickly
      When it first appeared, users became addicted to it
      Users depended on it to contact friends

```
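
The worked example in the slides (naive average, Pearson weights, Equation 1, and MAE) can be traced with a short Python sketch. This is an illustration and not code from the original deck. It makes three assumptions: the names `RATINGS`, `item_average`, `pearson`, `predict`, and `mae` are hypothetical; the correlation uses means taken over the co-rated items only (this choice reproduces the -0.87 and 0.21 in the correlation table); and the neighbor set is restricted to Dylan and Mathew, as in the slide's calculation.

```python
from math import sqrt

# Ratings copied from the example slide; None marks an unrated movie.
MOVIES = ["Starship Trooper", "Sleepless in Seattle", "MI-2", "Matrix", "Titanic"]
RATINGS = {
    "Sammy":    [3, 4, 3, None, None],
    "Beatrice": [3, 4, 3, 1, 1],
    "Dylan":    [3, 4, 3, 3, 4],
    "Mathew":   [4, 2, 3, 2, 5],
    "John":     [4, 3, 4, 4, 4],
    "Basil":    [5, 1, 5, None, None],
}


def item_average(item, exclude):
    """Naive baseline: plain average of an item's ratings, skipping `exclude`."""
    i = MOVIES.index(item)
    vals = [r[i] for u, r in RATINGS.items() if u != exclude and r[i] is not None]
    return sum(vals) / len(vals)


def mean_rating(user):
    """Mean over every movie the user has rated."""
    vals = [r for r in RATINGS[user] if r is not None]
    return sum(vals) / len(vals)


def pearson(a, u):
    """Pearson weight w_{a,u} over co-rated items (means over those items: an assumption)."""
    pairs = [(x, y) for x, y in zip(RATINGS[a], RATINGS[u])
             if x is not None and y is not None]
    mean_a = sum(x for x, _ in pairs) / len(pairs)
    mean_u = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - mean_a) * (y - mean_u) for x, y in pairs)
    den = sqrt(sum((x - mean_a) ** 2 for x, _ in pairs)
               * sum((y - mean_u) ** 2 for _, y in pairs))
    return num / den if den else 0.0


def predict(a, item, neighbors):
    """Equation 1: the user's mean plus the weighted deviation of the neighbors."""
    i = MOVIES.index(item)
    weights = {u: pearson(a, u) for u in neighbors if RATINGS[u][i] is not None}
    kappa = sum(abs(w) for w in weights.values())  # sum of absolute weights
    if kappa == 0:
        return mean_rating(a)  # no usable neighbors: fall back to the user's mean
    deviation = sum((RATINGS[u][i] - mean_rating(u)) * w for u, w in weights.items())
    return mean_rating(a) + deviation / kappa


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    for movie in ("Matrix", "Titanic"):
        print(movie, f"{predict('Sammy', movie, ['Dylan', 'Mathew']):.2f}")
```

Run as a script, this prints roughly 3.68 for Matrix and 2.82 for Titanic. The slide reports 3.6 and 2.8 because it rounds Sammy's mean to 3.3 and the weight to -0.87 before combining them. Calling `mae([3.6, 2.8], [3, 4])` with the slide's rounded predictions gives the 0.9 shown in the results table.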