					                                                            ICS-FORTH & CSD University of Crete
                                            3rd   Hellenic Data Management Symposium (HDMS’04)
                                                                     Manos Papagelis – 28/06/’04




“Qualitative Analysis of User-based and
 Item-based Prediction Algorithms for
     Recommendation Systems”


   by Manos Papagelis (1,2), Dimitris Plexousakis (1,2),
    Ioannis Rousidis (2) and Elias Theoharopoulos (1)

      3rd Hellenic Data Management Symposium
                 Athens, 28 June 2004
                    (1) ICS-FORTH
  (2) Computer Science Department, University of Crete

Outline




   Recommendation Systems
   Similarity Measures
   Prediction Algorithms
   Experimental Evaluation and Results
   Extensions
   Conclusions and Discussion




Recommendation Systems




 Approaches
   • Content-based
        String comparison, categorization, etc.
   • Collaborative Filtering (CF) based
        User similarities based on user models, rating behavior, etc.
   • Hybrid
 Challenges
   • Sparsity
        Even active users rate only a small fraction of the items in the database
   • Scalability
        Maintain accuracy while the number of users and items scales up to millions
   • Cold-start
        New and obscure items are rarely recommended


Collaborative Filtering




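 User-item rating matrix (excerpt; "-" marks an unrated item)

       i1   i2   i3   …    …    …
  u1   2    -    4    5    -    6
  u2   …    …    …    …    …    …
  u3   5    2    -    2    -    9
  …    …    …    …    …    …    …

 Ratings on the items co-rated by u1 and u3: u1 = (2, 5, 6), u3 = (5, 2, 9)

 How similar are they?
   • Cosine Vector Similarity
   • Spearman Correlation
   • Mean-squared Difference
   • Entropy-based Uncertainty
   • Pearson Correlation Coefficient

 [Figure: motivating illustration of user models based on rating activity;
  users u1 to u6 plotted as points in a space with axes x, y, z]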
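
To make the "How similar are they?" question concrete, here is a minimal sketch that extracts the co-rated items of u1 and u3 from the matrix excerpt and computes their cosine vector similarity. The helper names (`co_rated`, `cosine_similarity`) and the use of `None` for unrated items are illustrative assumptions, not part of the original slides.

```python
import math

# Rows u1 and u3 from the matrix excerpt; None marks an unrated item.
u1 = [2, None, 4, 5, None, 6]
u3 = [5, 2, None, 2, None, 9]

def co_rated(a, b):
    """Return (rating_a, rating_b) pairs for items that both users rated."""
    return [(x, y) for x, y in zip(a, b) if x is not None and y is not None]

def cosine_similarity(a, b):
    """Cosine of the angle between the two users' co-rated rating vectors."""
    pairs = co_rated(a, b)
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no overlap: treat as "no evidence of similarity"
    return dot / (norm_a * norm_b)

pairs = co_rated(u1, u3)
similarity = cosine_similarity(u1, u3)
print(pairs)                  # [(2, 5), (5, 2), (6, 9)]
print(round(similarity, 3))   # 0.875
```

Cosine similarity ignores each user's rating bias (one user may rate everything high), which is one reason mean-centered measures such as the Pearson correlation coefficient are often preferred.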

Similarity Measures (1/3)




 Distinctions
   • User-based vs. Item-based Similarity
   • Explicit Rating vs. Implicit Rating
 Definition of three matrices
   • User-Item, User-Category, Item-Category Matrices




Similarity Measures (2/3)




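 User-based Similarity derived from
   • Explicit Ratings

     $$\kappa_{x,y} = \mathrm{sim}(u_x,u_y) = \frac{\sum_{h=1}^{n} (r_{u_x,i_h} - \bar{r}_{u_x})(r_{u_y,i_h} - \bar{r}_{u_y})}{\sqrt{\sum_{h=1}^{n} (r_{u_x,i_h} - \bar{r}_{u_x})^2}\,\sqrt{\sum_{h=1}^{n} (r_{u_y,i_h} - \bar{r}_{u_y})^2}}$$

   • Implicit Ratings

     $$\lambda_{x,y} = \mathrm{sim}(u_x,u_y) = \frac{\sum_{h=1}^{p} (r_{u_x,c_h} - \bar{r}_{u_x})(r_{u_y,c_h} - \bar{r}_{u_y})}{\sqrt{\sum_{h=1}^{p} (r_{u_x,c_h} - \bar{r}_{u_x})^2}\,\sqrt{\sum_{h=1}^{p} (r_{u_y,c_h} - \bar{r}_{u_y})^2}}$$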
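
The user-based similarity from explicit ratings is a Pearson correlation over item ratings. A minimal sketch follows, assuming ratings are stored as one dictionary per user and that sums and means are taken over the items both users rated (the slide does not state which items the mean covers, so this is an assumption). The function name `pearson_user_similarity` is hypothetical.

```python
import math

def pearson_user_similarity(ratings_x, ratings_y):
    """Pearson correlation between two users over their co-rated items.

    ratings_x, ratings_y: dict mapping item id -> rating.
    Assumption: means are computed over the co-rated items only.
    """
    common = sorted(set(ratings_x) & set(ratings_y))
    if len(common) < 2:
        return 0.0  # too little overlap to estimate a correlation
    mean_x = sum(ratings_x[i] for i in common) / len(common)
    mean_y = sum(ratings_y[i] for i in common) / len(common)
    num = sum((ratings_x[i] - mean_x) * (ratings_y[i] - mean_y) for i in common)
    var_x = sum((ratings_x[i] - mean_x) ** 2 for i in common)
    var_y = sum((ratings_y[i] - mean_y) ** 2 for i in common)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return num / math.sqrt(var_x * var_y)

# Example users with three co-rated items (i1, i4, i6).
u1 = {"i1": 2, "i3": 4, "i4": 5, "i6": 6}
u3 = {"i1": 5, "i2": 2, "i4": 2, "i6": 9}
kappa = pearson_user_similarity(u1, u3)
print(round(kappa, 3))  # 0.319
```

The implicit-rating variant has the same shape: replace the item dictionaries with rows of the user-category matrix and sum over the p categories.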

Similarity Measures (3/3)




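 Item-based Similarity derived from
   • Explicit Ratings

     $$\mu_{x,y} = \mathrm{sim}(i_x,i_y) = \frac{\sum_{h=1}^{m} (r_{u_h,i_x} - \bar{r}_{i_x})(r_{u_h,i_y} - \bar{r}_{i_y})}{\sqrt{\sum_{h=1}^{m} (r_{u_h,i_x} - \bar{r}_{i_x})^2}\,\sqrt{\sum_{h=1}^{m} (r_{u_h,i_y} - \bar{r}_{i_y})^2}}$$

   • Item-category matrix

     $$\nu_{x,y} = \mathrm{sim}(i_x,i_y) = \frac{\sum_{h=1}^{p} (v_{c_h,i_x} - \bar{v}_{i_x})(v_{c_h,i_y} - \bar{v}_{i_y})}{\sqrt{\sum_{h=1}^{p} (v_{c_h,i_x} - \bar{v}_{i_x})^2}\,\sqrt{\sum_{h=1}^{p} (v_{c_h,i_y} - \bar{v}_{i_y})^2}}$$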
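
Item-based similarity applies the same correlation to the columns of the user-item matrix instead of its rows. The sketch below transposes a user-to-ratings mapping into item columns and correlates two columns over the users who rated both items. The data and the helper names (`item_columns`, `pearson`) are made up for illustration.

```python
import math

def item_columns(ratings):
    """Transpose user -> {item: rating} into item -> {user: rating}."""
    columns = {}
    for user, row in ratings.items():
        for item, value in row.items():
            columns.setdefault(item, {})[user] = value
    return columns

def pearson(col_x, col_y):
    """Pearson correlation over the keys present in both columns."""
    common = set(col_x) & set(col_y)
    if len(common) < 2:
        return 0.0
    mean_x = sum(col_x[k] for k in common) / len(common)
    mean_y = sum(col_y[k] for k in common) / len(common)
    num = sum((col_x[k] - mean_x) * (col_y[k] - mean_y) for k in common)
    den = math.sqrt(sum((col_x[k] - mean_x) ** 2 for k in common)
                    * sum((col_y[k] - mean_y) ** 2 for k in common))
    return num / den if den else 0.0

# Hypothetical ratings: i1 and i2 rise together, i1 and i3 move oppositely.
ratings = {
    "u1": {"i1": 2, "i2": 4},
    "u2": {"i1": 4, "i2": 8},
    "u3": {"i1": 6, "i2": 12, "i3": 1},
    "u4": {"i1": 3, "i3": 9},
}
cols = item_columns(ratings)
mu_12 = pearson(cols["i1"], cols["i2"])
mu_13 = pearson(cols["i1"], cols["i3"])
print(mu_12, mu_13)
```

The same `pearson` helper would also cover the item-category variant if each column held category values instead of user ratings.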

Prediction Algorithms (1/2)




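 User-based Prediction algorithms
   • Explicit Ratings (CFUB-ER)

     $$\text{CFUB-ER} = p_{u_a,i_a} = \bar{r}_{u_a} + \frac{\sum_{h=1}^{m'} \kappa_{a,h}\,(r_{u_h,i_a} - \bar{r}_{u_h})}{\sum_{h=1}^{m'} |\kappa_{a,h}|}$$

   • Explicit Ratings, Content Boosted (CFUB-ER-CB)

     $$\text{CFUB-ER-CB} = p_{u_a,i_a} = \bar{r}_{u_a} + \frac{\sum_{h=1}^{m} \kappa_{a,h}\,(r_{u_h,i_a} - \bar{r}_{u_h})}{\sum_{h=1}^{m} |\kappa_{a,h}|}$$

   • Implicit Ratings (CFUB-IR)

     $$\text{CFUB-IR} = p_{u_a,i_a} = \bar{r}_{u_a} + \frac{\sum_{h=1}^{m} \lambda_{a,h}\,(r_{u_h,i_a} - \bar{r}_{u_h})}{\sum_{h=1}^{m} |\lambda_{a,h}|}$$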
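
A user-based prediction starts from the active user's mean rating and adds a similarity-weighted average of the neighbours' deviations from their own means. The following sketch assumes the similarity weights are already available through a callable, and it normalizes by the sum of absolute weights, a common safeguard when correlations can be negative (treat that choice as an assumption). The name `predict_user_based` and the example data are hypothetical.

```python
def predict_user_based(active, item, ratings, sim):
    """Predict the rating of `active` for `item`.

    ratings: dict user -> dict item -> rating.
    sim:     callable (user_a, user_b) -> similarity weight.
    """
    own = ratings[active]
    mean_active = sum(own.values()) / len(own)
    num = den = 0.0
    for other, row in ratings.items():
        if other == active or item not in row:
            continue  # only neighbours who rated the target item contribute
        weight = sim(active, other)
        mean_other = sum(row.values()) / len(row)
        num += weight * (row[item] - mean_other)
        den += abs(weight)
    if den == 0:
        return mean_active  # no usable neighbours: fall back to the mean
    return mean_active + num / den

# Hypothetical data and precomputed similarity weights.
ratings = {
    "ua": {"i1": 4, "i2": 6},
    "ub": {"i1": 3, "i2": 5, "i3": 7},
    "uc": {"i1": 8, "i3": 6},
}
weights = {("ua", "ub"): 0.8, ("ua", "uc"): 0.2}
prediction = predict_user_based("ua", "i3", ratings,
                                lambda a, b: weights[(a, b)])
print(round(prediction, 2))  # 6.4
```

Swapping the weight source between explicit-rating and implicit-rating similarities switches between the CFUB-ER and CFUB-IR variants without touching the prediction code.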

Prediction Algorithms (2/2)




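 Item-based Prediction algorithms
   • Explicit Ratings (CFIB-ER)

     $$\text{CFIB-ER} = p_{u_a,i_a} = \bar{r}_{i_a} + \frac{\sum_{h=1}^{n} \mu_{a,h}\,(r_{u_a,i_h} - \bar{r}_{u_a})}{\sum_{h=1}^{n} |\mu_{a,h}|}$$

   • Implicit Ratings (CFIB-IR)

     $$\text{CFIB-IR} = p_{u_a,i_a} = \bar{r}_{i_a} + \frac{\sum_{h=1}^{n} \nu_{a,h}\,(r_{u_a,i_h} - \bar{r}_{u_a})}{\sum_{h=1}^{n} |\nu_{a,h}|}$$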
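
An item-based prediction starts from the target item's mean rating and adjusts it using the active user's own ratings on similar items. The sketch below follows that shape under stated assumptions: item similarities come from a callable, deviations are measured from the active user's mean rating, and the weights are normalized by their absolute sum. The name `predict_item_based` and the data are illustrative only.

```python
def predict_item_based(user, target, ratings, item_sim):
    """Predict the rating of `user` for the item `target`.

    ratings:  dict user -> dict item -> rating.
    item_sim: callable (item_a, item_b) -> similarity weight.
    """
    own = ratings[user]
    mean_user = sum(own.values()) / len(own)
    target_ratings = [row[target] for row in ratings.values() if target in row]
    # Fall back to the user's mean when nobody has rated the target item.
    mean_target = (sum(target_ratings) / len(target_ratings)
                   if target_ratings else mean_user)
    num = den = 0.0
    for item, rating in own.items():
        if item == target:
            continue
        weight = item_sim(target, item)
        num += weight * (rating - mean_user)
        den += abs(weight)
    if den == 0:
        return mean_target
    return mean_target + num / den

# Hypothetical data and precomputed item similarities.
ratings = {
    "ua": {"i1": 8, "i2": 4},
    "ub": {"i3": 7},
    "uc": {"i3": 5},
}
weights = {("i3", "i1"): 0.75, ("i3", "i2"): 0.25}
prediction = predict_item_based("ua", "i3", ratings,
                                lambda a, b: weights[(a, b)])
print(prediction)  # 7.0
```

Here the user rated the more similar item (i1) above their own average, so the prediction moves above the item's mean of 6.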




Experimental Evaluation and Results




 Data Set
   • 2100 ratings (on a scale from 1 to 10), 115 users, 650 items, 20 item
     categories
   • Sparsity ≈ 97.2% (only 2100 of the 115 × 650 = 74,750 possible ratings exist)
   • 300-item sample sets
 Metrics
   • Coverage
        ~99%
   • Accuracy
        Statistical Accuracy
            Mean Absolute Error (MAE)
        Decision-support Accuracy
            Receiver Operating Characteristic (ROC)
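
The sparsity level follows directly from the data set figures: it is the share of user-item pairs without a rating. A short check, using only the counts listed on this slide (variable names are illustrative):

```python
num_ratings = 2100
num_users = 115
num_items = 650

possible_ratings = num_users * num_items   # size of the user-item matrix
density = num_ratings / possible_ratings   # share of cells that hold a rating
sparsity = 1 - density                     # share of cells that are empty

print(possible_ratings)     # 74750
print(f"{sparsity:.1%}")    # 97.2%
```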

Mean Absolute Error (MAE)




          We plot MAE vs. Sparsity

          $$MAE = \frac{\sum_{h=1}^{n} |p_h - r_h|}{n}$$

Remarks
   • Item-based predictions result in higher
     accuracy than user-based predictions
   • Predictions based on explicit ratings
     result in higher accuracy than predictions
     based on implicit ratings
   • CFIB-ER is very sensitive to sparsity levels
   • CFIB-ER performs as much as 39.5% better
     than classic CF at 97.2% sparsity
   • Content-boosted algorithms slightly
     improve accuracy but require expensive
     computations
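
For reference, MAE averages the absolute differences between predicted ratings p_h and actual ratings r_h over n test ratings; lower is better. A minimal sketch with made-up numbers (the function name is an illustrative choice):

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)

# Hypothetical predictions against held-out ratings on the 1-10 scale.
predicted = [7.5, 3.0, 9.0, 5.0]
actual = [8, 2, 9, 6]
mae = mean_absolute_error(predicted, actual)
print(mae)  # 0.625
```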

Receiver Operating Characteristic (ROC) (1/2)




Notation:
   PR = Predicted Rating, AR = Actual Rating, QT = Quality Threshold

Four cases defined by the prediction process for one item:
   True Positive (TP) when PR ≥ QT and AR ≥ QT
   False Positive (FP) when PR ≥ QT and AR < QT
   True Negative (TN) when PR < QT and AR < QT
   False Negative (FN) when PR < QT and AR ≥ QT

We plot sensitivity vs. 1 - specificity for Quality Threshold values between 1 and 9, where:

   $$\text{sensitivity} = TPF = \frac{tp}{tp + fn} \qquad\qquad 1 - \text{specificity} = FPF = \frac{fp}{fp + tn}$$

   tp, fn, fp, tn are the total numbers of occurrences of TP, FN, FP, TN, respectively, over the set of items
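
Each quality threshold yields one point on the ROC plot. The sketch below classifies every prediction into one of the four cases and returns the (1 - specificity, sensitivity) pair for a given threshold; sweeping QT from 1 to 9 produces the curve. The data and the function name `roc_point` are hypothetical, and returning 0.0 for an undefined ratio is a simplifying assumption.

```python
def roc_point(predicted, actual, qt):
    """Return (fpf, sensitivity) for one quality threshold qt."""
    tp = fp = tn = fn = 0
    for pr, ar in zip(predicted, actual):
        if pr >= qt and ar >= qt:
            tp += 1
        elif pr >= qt and ar < qt:
            fp += 1
        elif pr < qt and ar < qt:
            tn += 1
        else:  # pr < qt and ar >= qt
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fpf = fp / (fp + tn) if (fp + tn) else 0.0
    return fpf, sensitivity

# Hypothetical predicted and actual ratings on the 1-10 scale.
predicted = [8, 6, 9, 3, 7, 5]
actual = [9, 8, 4, 2, 8, 3]

fpf_7, sensitivity_7 = roc_point(predicted, actual, qt=7)
curve = [roc_point(predicted, actual, qt) for qt in range(1, 10)]
print(fpf_7, sensitivity_7)
```

At QT = 7 this sample contains 2 true positives, 1 false negative, 1 false positive and 2 true negatives, so sensitivity is 2/3 and 1 - specificity is 1/3.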

Receiver Operating Characteristic (ROC) (2/2)




We plot sensitivity vs. 1 - specificity for
Quality Threshold values between 1 and 9

Remarks
   • Item-based predictions provide better
     accuracy than user-based predictions
   • Predictions based on explicit ratings
     perform better than those based
     on implicit ratings
   • CFIB-ER outperforms classic CF by 95%, 89%
     and 29% for ROC-9, ROC-8 and ROC-7,
     respectively
   • CFIB-ER outperforms all other
     algorithms when comparing the area under
     the ROC curve



Extensions (1/2)



 Incremental Calculation of Similarity Measures

     $$\mathrm{sim}(u_x,u_y) = \frac{\sum_{h=1}^{n'} (r_{u_x,i_h} - \bar{r}_{u_x})(r_{u_y,i_h} - \bar{r}_{u_y})}{\sqrt{\sum_{h=1}^{n'} (r_{u_x,i_h} - \bar{r}_{u_x})^2}\,\sqrt{\sum_{h=1}^{n'} (r_{u_y,i_h} - \bar{r}_{u_y})^2}}$$

     $$A = \frac{B}{\sqrt{C}\,\sqrt{D}} \qquad\qquad A' = \frac{B + e}{\sqrt{C + f}\,\sqrt{D + g}}$$

     (A is the similarity before a rating is submitted, A' the similarity after it)

     • Incremental calculation of the e, f, g factors after each single rating
     • Cases to be examined
          Submission of a new rating vs. update of an old rating?
          Co-rated item or not?
     • Caching scheme to support Incremental Collaborative Filtering (ICF)
     • ICF addresses the scalability problem (orders of magnitude better performance)
     • ICF does not sacrifice accuracy, as it is NOT based on any dimensionality
       reduction technique
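
The slides do not spell out how the e, f, g increments are computed, so the sketch below does not reproduce them. Instead it illustrates the same idea with a swapped-in, standard technique: caching running sums per user pair so that the numerator B and the two denominator terms C and D can be refreshed in constant time when a co-rated rating is added or changed. The class `RunningPearson` and its methods are hypothetical names, and the sketch covers only co-rated items with means taken over those items.

```python
import math

class RunningPearson:
    """Pearson correlation of one user pair, maintained from cached sums."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def add(self, x, y, sign=1):
        """Add (sign=1) or retract (sign=-1) one co-rated rating pair."""
        self.n += sign
        self.sx += sign * x
        self.sy += sign * y
        self.sxx += sign * x * x
        self.syy += sign * y * y
        self.sxy += sign * x * y

    def update(self, old, new):
        """Replace an existing co-rated pair after a rating was changed."""
        self.add(*old, sign=-1)
        self.add(*new)

    def value(self):
        if self.n < 2:
            return 0.0
        b = self.sxy - self.sx * self.sy / self.n   # plays the role of B
        c = self.sxx - self.sx ** 2 / self.n        # plays the role of C
        d = self.syy - self.sy ** 2 / self.n        # plays the role of D
        if c <= 0 or d <= 0:
            return 0.0
        return b / math.sqrt(c * d)

pair = RunningPearson()
for x, y in [(2, 5), (5, 2), (6, 9)]:
    pair.add(x, y)                    # submission of new co-rated ratings
before = pair.value()
pair.update(old=(5, 2), new=(5, 6))   # one user revises an old rating
after = pair.value()
print(round(before, 3), round(after, 3))
```

Each event touches six cached numbers instead of rescanning both users' rating histories, which is the kind of saving a caching scheme for incremental similarity aims at.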



Extensions (2/2)




 Formation of Communities
   • Self-organized, virtual, online communities
   • Introduction of the Community Coherence Grade (CCG)

     $$CCG_{u_a} = \frac{\sum_{h=1}^{m} \mathrm{sim}(u_a,u_h)}{m}$$


    • Sub-community detection based on filtering algorithms
    • Identification of “trust chains” to deal with sparsity in recommendation
      systems (trust propagation)
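
The Community Coherence Grade of a user is the average similarity between that user and m community members. A minimal sketch, assuming m counts the other members of the community (the slide does not say whether the user is included) and that similarities are supplied by a callable; all names and values here are illustrative.

```python
def community_coherence_grade(user, members, sim):
    """Average similarity between `user` and the other community members."""
    others = [u for u in members if u != user]
    if not others:
        return 0.0  # a community of one has no coherence to measure
    return sum(sim(user, u) for u in others) / len(others)

# Hypothetical similarities between user "a" and three other members.
similarities = {("a", "b"): 0.9, ("a", "c"): 0.6, ("a", "d"): 0.3}
ccg_a = community_coherence_grade(
    "a", ["a", "b", "c", "d"], lambda x, y: similarities[(x, y)]
)
print(round(ccg_a, 2))  # 0.6
```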




Conclusions and Discussion




 Evaluation of user-based and item-based prediction algorithms
   • Item-based algorithms perform better
   • Content-boosted predictions slightly improve prediction
     accuracy
   • Explicit ratings yield more accurate predictions than implicit ratings
 Incremental Collaborative Filtering (ICF) to deal with Scalability
 Trust Propagation to deal with Sparsity and Cold-start




                          Questions?





Thanks!



