Recommender Systems

Session B
Robin Burke
DePaul University
Chicago, IL
Roadmap
   Session A: Basic Techniques I
     –   Introduction
     –   Knowledge Sources
     –   Recommendation Types
     –   Collaborative Recommendation
   Session B: Basic Techniques II
     –   Content-based Recommendation
     –   Knowledge-based Recommendation
   Session C: Domains and Implementation I
     –   Recommendation domains
     –   Example Implementation
     –   Lab I
   Session D: Evaluation I
     –   Evaluation
   Session E: Applications
     –   User Interaction
     –   Web Personalization
   Session F: Implementation II
     –   Lab II
   Session G: Hybrid Recommendation
   Session H: Robustness
   Session I: Advanced Topics
     –   Dynamics
     –   Beyond accuracy
Content-Based
Recommendation
   Collaborative recommendation
    – requires only ratings
   Content-based recommendation
    – all techniques that use properties of the items
      themselves
    – usually refers to techniques that only use item
      features
   Knowledge-based recommendation
    – a sub-type of content-based
    – in which we apply knowledge
          about items and how they satisfy user needs
Content-Based Profiling

   Suppose we have no other users
    – but we know about the features of the items
      rated by the user
   We can imagine building a profile based
    on user preferences
    – here are the kinds of things the user likes
    – here are the ones he doesn't like
   Usually called content-based
    recommendation
Recommendation Knowledge Sources Taxonomy

[Figure: the knowledge-sources taxonomy. Collaborative knowledge:
opinion profiles and demographic profiles. User knowledge: opinions,
demographics, requirements, and the query (constraints, preferences,
context). Content knowledge: item features and domain knowledge
(means-ends knowledge, contextual knowledge, feature ontology,
domain constraints).]
Content-Based Profiling

[Figure: the content-based profiling loop. To find relevant items:
obtain the user's rated items, each described by features a1, a2,
..., ak; build a classifier from those examples; then have the
classifier predict liked (Y) vs. not liked (N) over the unrated
items and recommend the items classified Y.]
Origins

   Began with earliest forms of user
    models
    – Grundy (Rich, 1979)
   Elaborated in information filtering
    – Selecting news articles (Dumais, 1990)
   More recently spam filtering
Basic Idea

   Record user ratings for item
   Generate a model of user preferences
    over features
   Give as recommendations other items
    with similar content
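
To make this concrete, here is a minimal sketch (not from the slides;
all names are invented): items are sets of features, the profile
accumulates +1/-1 feature weights from rated items, and unseen items
are ranked by cosine similarity to the profile. Any of the classifiers
discussed later could replace the cosine ranking.

from math import sqrt

def build_profile(rated):
    # rated: iterable of (feature_set, rating) pairs, rating +1 or -1.
    profile = {}
    for features, rating in rated:
        for f in features:
            profile[f] = profile.get(f, 0.0) + rating
    return profile

def cosine(u, v):
    # Cosine similarity between two sparse feature-weight dicts.
    num = sum(u[f] * v.get(f, 0.0) for f in u)
    den = sqrt(sum(x * x for x in u.values())) * \
          sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def recommend(profile, unseen, n=5):
    # Rank unseen items (name -> feature set) by content similarity.
    scored = sorted(unseen.items(),
                    key=lambda kv: cosine(profile, {f: 1.0 for f in kv[1]}),
                    reverse=True)
    return [name for name, _ in scored[:n]]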
Movie Recommendation
   Predictions for unseen (target) items are
    computed based on their similarity (in terms of
    content) to items in the user profile.
   E.g., user profile Pu contains a set of liked movies.

   [Figure: movie posters. Based on content similarity to Pu, one
   new movie is recommended highly and another only “mildly”.]
Content-Based
Recommender
Systems
Personalized Search

   How can the search engine determine the
    “user’s context”?

   Query: “Madonna and Child”

   [Figure: two very different candidate results for the query, one
   from art history and one from pop music.]

   Need to “learn” the user profile:
     – User is an art historian?
     – User is a pop music fan?
Play List Generation
   Music recommendations
   Configuration problem
    – Must take into account other items already in
      list




               Example: Pandora
Algorithms

   kNN
   Naive Bayes
   Neural networks
   Any classification technique can be
    used
Naive Bayes
   p(A) = probability of event A
   p(A,B) = probability of event A and event B
     –   joint probability
   p(A|B) = probability of event A given event B
     –   we know B happened
     –   conditional probability
   Example
     –   A is a student getting an "A" grade
              p(A) = 20%
     –   B is the event of a student coming to less than 50% of meetings
              p(A|B) is much less than 20%
              p(A,B) would be the probability of both things
                  –   how many students are in this category?
   Recommender system question
     –   Li is the event that the user likes item i
     –   B is the set of features associated with item i
              Estimate p(Li|B)
Bayes Rule
   p(A|B) = p(B|A) p(A) / p(B)
   We can always restate a conditional probability in
    terms of
    – the reverse condition p(B|A)
    – and two prior probabilities
           p(A)
           p(B)
   Often the reverse condition is easier to know
    – we can count how often a feature appears in items the
      user liked
    – frequentist assumption
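
As a small illustration of that frequentist reading, this sketch
(with an invented data layout) estimates p(L|a) purely from counts
over a list of rated items:

def p_like_given_feature(items, feature):
    # Bayes' rule from raw counts: p(L|a) = p(a|L) p(L) / p(a).
    # items: list of dicts {"liked": bool, "features": set}.
    n = len(items)
    liked = [i for i in items if i["liked"]]
    n_a = sum(1 for i in items if feature in i["features"])
    if not liked or n_a == 0:
        return 0.0
    p_l = len(liked) / n
    p_a = n_a / n
    p_a_given_l = sum(1 for i in liked if feature in i["features"]) / len(liked)
    return p_a_given_l * p_l / p_a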
Naive Bayes

   Probability of liking an item given its
    features
    – p(Li | a1, a2, ..., ak)
    – think of Li as the class for item i
   By Bayes' theorem
    – p(Li | a1, ..., ak) = p(a1, ..., ak | Li) p(Li) / p(a1, ..., ak)
Naive Assumption
   Independence
     – the features a1, a2, ... , ak are independent
     – independent means
             p(A,B) = p(A)p(B)
   Example
     – two coin flips P(heads) = 0.5
     – P(heads,heads) = 0.25
   Anti-example
     – appearance of the word "Recommendation" and "Collaborative" in papers
       by Robin Burke
             P("Recommendation") = 0.6
             P("Collaborative") = 0.3
             P("Recommendation","Collaborative")=0.3 not 0.18
   In general
     – this assumption is false for items and their features
     – but pretending it is true works well
Naive Assumption

   For the joint probability
    – p(a1, a2, ..., ak) = p(a1) p(a2) ... p(ak)

   For the conditional probability
    – p(a1, a2, ..., ak | L) = p(a1|L) p(a2|L) ... p(ak|L)

   Bayes' Rule becomes
    – p(L | a1, ..., ak) = p(L) p(a1|L) ... p(ak|L) / p(a1, ..., ak)
Frequency Table

   Iterate through all examples
    – if the example is "liked"
          for each feature aj
            – add one to that feature's cell under L
    – similarly for ~L

   [Table: one row per feature a1, a2, ..., ak; two columns of
   counts, L and ~L]
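
The counting pass translates almost directly into code; this sketch
assumes the examples arrive as (features, liked) pairs:

from collections import defaultdict

def build_frequency_table(examples):
    # counts[liked][feature] = number of examples of that class
    # containing the feature; totals[liked] = class sizes.
    counts = {True: defaultdict(int), False: defaultdict(int)}
    totals = {True: 0, False: 0}
    for features, liked in examples:
        totals[liked] += 1
        for a in features:
            counts[liked][a] += 1
    return counts, totals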
Example

   Total # of movies 20
    – 10 liked
    – 10 not liked
Classification MAP
   Maximum a posteriori
    – Calculate the probabilities for each possible classification
    – pick the one with the highest probability
   Examples
    – "12 Monkeys" = Pitt && Willis
           p(L|12 Monkeys)=0.13
           p(~L|12 Monkeys)=1
           not liked
    – "Devil's Own" = Ford && Pitt
           p(L|Devil's Own)=0.67
           p(~L|Devil's Own)=0.53
           liked
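
A possible implementation of MAP classification over the frequency
table built above. Note how an unseen feature zeroes out a class
score, which is exactly the veto problem the Smoothing slide treats.
Given the counts behind the Example slide, classify_map({"Pitt",
"Willis"}, counts, totals) would reproduce the "12 Monkeys" decision.

def classify_map(features, counts, totals):
    # Maximum a posteriori: score each class by
    # p(class) * prod_j p(aj | class) and keep the larger.
    n = totals[True] + totals[False]
    best, best_score = None, -1.0
    for liked in (True, False):
        score = totals[liked] / n
        for a in features:
            score *= counts[liked][a] / totals[liked]
        if score > best_score:
            best, best_score = liked, score
    return best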
Classification LL
   Log likelihood
     –   For two possibilities
     –   Calculate probabilities
     –   Compute ln( p(Li | a1, ..., ak) / p(~Li | a1, ..., ak) )
     –   If > 0, then classify as liked
   Examples
     – "12 Monkeys" = Pitt && Willis
              ratio = 0.13
              ln = -2.1
              not liked
     – "Devil's Own" = Ford && Pitt
              p(L|Devil's Own)=0.67
              p(~L|Devil's Own)=0.53
              ratio = 1.25
              ln = 0.22
              liked
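
The log-likelihood version is a small change: sum logarithms instead
of multiplying probabilities, and compare the total to zero. This
sketch assumes the counts have already been smoothed so every
probability is positive:

from math import log

def classify_ll(features, counts, totals):
    # ln( p(L|a1..ak) / p(~L|a1..ak) ) > 0  =>  "liked".
    ll = log(totals[True] / totals[False])
    for a in features:
        ll += log(counts[True][a] / totals[True]) \
            - log(counts[False][a] / totals[False])
    return ll > 0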
Smoothing
   If a feature never appears in a class
    – p(aj|L)=0
    – that means that it will always veto the classification
   Example
    – new movie director
    – cannot be classified as "liked"
           because there are no liked instances in which he is a feature
   Solution
    – Laplace smoothing
           add a small constant to all counts before starting
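
A minimal add-alpha estimate replacing the raw count ratio; alpha = 1
gives classic Laplace smoothing, and n_features is the number of
distinct features:

def smoothed_p(count, total, n_features, alpha=1.0):
    # No feature probability is ever exactly zero, so a single
    # unseen feature can no longer veto a class.
    return (count + alpha) / (total + alpha * n_features)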
Naive Bayes
   Works surprisingly well
    – used in spam filtering
   Simple implementation
    – just counting and multiplying
    – requires O(F) space
           where F is the number of features used
    – easy to update the profile
    – classification is very fast
   Learned classifier can be hard-coded
    – used in voice recognition and computer games
   Try this first
Neural Networks
Biological inspiration

[Figure: a biological neuron, with the axon, dendrites, and synapses
labeled.]

The information transmission happens at the synapses.
How it works
   Source (pre-synaptic)
    – Tiny voltage spikes travel along the axon
    – At dendrites, neurotransmitter released in the
      synapse
   Destination (post-synaptic)
     – Neurotransmitter absorbed by dendrites
    – Causes excitation or inhibition
    – Signals integrated
          may produce spikes in the next neuron
   Connections
    – Synaptic connections can be strong or weak
Artificial neurons
Neurons work by processing information. They receive and
provide information in the form of voltage spikes.

[Figure: the McCulloch-Pitts model. Inputs x1, ..., xn arrive with
synaptic weights w1, ..., wn and are summed:

    z = sum(i=1..n) wi xi ;   y = H(z)

where H is a threshold (step) function and y is the output.]
Artificial neurons
Nonlinear generalization of the McCulloch-Pitts
neuron:

    y = f(x, w)

y is the neuron’s output, x is the vector of inputs, and w
is the vector of synaptic weights.
Examples:

    sigmoidal neuron:   y = 1 / (1 + exp(-w . x))

    Gaussian neuron:    y = exp(-||x - w||^2 / (2 sigma^2))
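
Both units are easy to write down; these illustrative versions take
plain Python lists for x and w:

from math import exp

def mp_neuron(x, w, threshold=0.0):
    # McCulloch-Pitts unit: z = sum_i wi xi, y = H(z) with H a step.
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z >= threshold else 0

def sigmoid_neuron(x, w):
    # Sigmoidal neuron: smooth, differentiable output in (0, 1).
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + exp(-z))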
Artificial neural networks

[Figure: a layered network; inputs on the left feed interconnected
neurons that produce the output.]

An artificial neural network is composed of many artificial
neurons that are linked together according to a specific
network architecture. The objective of the neural network
is to transform the inputs into meaningful outputs.
Learning with Back-
Propagation
   Biological system
    – seems to modify many synaptic connections
      simultaneously
    – we still don't totally understand this
   A simplification of the learning problem:
    – calculate first the changes for the synaptic weights of the
      output neuron
    – calculate the changes backward starting from layer p-1,
      and propagate backward the local error terms
   Still relatively complicated
    – much simpler than the original optimization problem
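
A compact sketch of that two-phase update for one hidden layer
(squared error, online updates, biases omitted for brevity;
sigmoid_neuron is the helper defined above, and the learning rate
is a placeholder):

def train_step(x, target, W1, W2, lr=0.5):
    # One back-propagation update for a single example.
    # W1: list of hidden-unit weight vectors; W2: list of
    # output-unit weight vectors over the hidden activations.
    h = [sigmoid_neuron(x, w) for w in W1]      # forward: hidden layer
    y = [sigmoid_neuron(h, w) for w in W2]      # forward: output layer
    # error terms for the output neurons first ...
    d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, target)]
    # ... then propagated backward to the hidden layer
    d_hid = [hj * (1 - hj) * sum(d_out[k] * W2[k][j] for k in range(len(W2)))
             for j, hj in enumerate(h)]
    # gradient-descent weight changes for both layers
    for k, w in enumerate(W2):
        for j in range(len(w)):
            w[j] -= lr * d_out[k] * h[j]
    for j, w in enumerate(W1):
        for i in range(len(w)):
            w[i] -= lr * d_hid[j] * x[i]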
Application to
Recommender Systems
   Inputs
    – features of products
    – binary features work best
          otherwise tricky encoding is required
   Output
    – liked / disliked neurons
NN Recommender

[Figure: item features feed the input layer of a network whose two
output neurons represent Liked and Disliked.]

   Calculate the recommendation score as y_liked - y_disliked
   (a sketch follows below)
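
A sketch of this scoring scheme, again reusing sigmoid_neuron; W1
and W2 stand for already-trained weight vectors, since training
itself is not shown here:

def forward(x, W1, W2):
    # One hidden layer; W1/W2 are lists of weight vectors.
    h = [sigmoid_neuron(x, w) for w in W1]
    return [sigmoid_neuron(h, w) for w in W2]   # [y_liked, y_disliked]

def recommendation_score(x, W1, W2):
    # The slide's scoring rule: y_liked - y_disliked.
    y_liked, y_disliked = forward(x, W1, W2)
    return y_liked - y_disliked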
Issues with ANN
   Often many iterations are needed
     – 1000s or even millions
   Overfitting can be a serious problem
   No way to diagnose or debug the network
     – must relearn
   Designing the network is an art
     – input and output coding
     – layering
     – often learning simply fails
            system never converges
   Stability vs plasticity
     – Learning is usually one-shot
     – Cannot easily restart learning with new data
     – (Actually many learning techniques have this problem)
Overfitting

   The problem of
    training a learner
    too much
    – the learner
      continues to
      improve on the
      training data
    – but gets worse
      on the real task
Other classification
techniques
   Lots of other classification techniques have
    been applied to this problem
    – support vector machines
    – fuzzy sets
    – decision trees
   Essentials are the same
    – learn a decision rule over the item features
    – apply the rule to new items
Content-Based
Recommendation
   Advantages:
    – useful for large information-based sites (e.g.,
      portals) or for domains where items have
      content-rich features
    – can be easily integrated with “content servers”
   Disadvantages
    – may miss important pragmatic relationships
      among items (based on usage)
           avant-garde jazz / classical
     – not effective in small, specialized sites or sites
       that are not content-oriented
    – cannot achieve serendipity – novel connections
Break

   10 minutes
Roadmap
   Session A: Basic Techniques I
     –   Introduction
     –   Knowledge Sources
     –   Recommendation Types
     –   Collaborative Recommendation
   Session B: Basic Techniques II
     –   Content-based Recommendation
     –   Knowledge-based Recommendation
   Session C: Domains and Implementation I
     –   Recommendation domains
     –   Example Implementation
     –   Lab I
   Session D: Evaluation I
     –   Evaluation
   Session E: Applications
     –   User Interaction
     –   Web Personalization
   Session F: Implementation II
     –   Lab II
   Session G: Hybrid Recommendation
   Session H: Robustness
   Session I: Advanced Topics
     –   Dynamics
     –   Beyond accuracy
Knowledge-Based
Recommendation
   Sub-type of content-based
    – we use the features of the items
   Covers other kinds of knowledge, too
    – means-ends knowledge
          how products satisfy user needs
    – ontological knowledge
          what counts as similar in the product domain
    – constraints
          what is possible in the domain and why
Recommendation Knowledge Sources Taxonomy

[Figure: the knowledge-sources taxonomy again. Collaborative
knowledge: opinion profiles and demographic profiles. User knowledge:
opinions, demographics, requirements, and the query (constraints,
preferences, context). Content knowledge: item features and domain
knowledge (means-ends knowledge, contextual knowledge, feature
ontology, domain constraints).]
Diverse Possibilities
   Utility
     – some systems concentrate on representing the user's
       constraints in the form utility functions
   Similarity
     – some systems focus on detailed knowledge-based
       similarity calculations
   Interactivity
     – some systems use knowledge to enhance the collection of
       requirement information
   For our purposes
     – concentrate on case-based recommendation and
       constraint-based recommendation
Case-Based
Recommendation
   Based on ideas from case-based
    reasoning (CBR)
    – An alternative to rule-based problem-
      solving
   “A case-based reasoner solves new
    problems by adapting solutions used
    to solve old problems”
                 -- Riesbeck & Schank 1987
CBR Solving Problems

[Figure: the CBR cycle. A new problem retrieves similar cases from
the case database; the retrieved solution is adapted, reviewed, and
retained back into the database, yielding the new solution.]
CBR System Components
   Case-base
    – database of previous cases (experience)
    – episodic memory
   Retrieval of relevant cases
    – index for cases in library
    – matching most similar case(s)
    – retrieving the solution(s) from these case(s)
   Adaptation of solution
    – alter the retrieved solution(s) to reflect
      differences between new case and retrieved
      case(s)
Retrieval knowledge

   Contents
    – features used to index cases
    – relative importance of features
    – what counts as “similar”
   Issues
    – “surface” vs “deep” similarity
Analogy to the catalog

   Problem
    – user need
   Case
    – product
   Retrieval
    – recommendation
Entree I-III

[Slides: screenshots of the Entree restaurant recommender.]
Critiquing Dialog

   Mixed-initiative interaction
    – user offers input
    – system responds with possibilities
    – user critiques or offers additional input
   Makes preference elicitation gradual
    – rather than all-at-once with a query
    – can guide user away from “empty” parts
      of the product space
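
A toy rendering of the critique step, with a hypothetical two-entry
critique vocabulary; a full system would also re-rank the surviving
candidates by similarity to the item currently shown:

# Each critique maps the currently shown item to a predicate
# over the remaining candidates.
CRITIQUES = {
    "cheaper": lambda shown: (lambda c: c["price"] < shown["price"]),
    "nicer":   lambda shown: (lambda c: c["quality"] > shown["quality"]),
}

def apply_critique(shown, candidates, critique):
    # Keep only candidates satisfying the critique relative to
    # the shown item.
    pred = CRITIQUES[critique](shown)
    return [c for c in candidates if pred(c)]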
CBR retrieval

   Knowledge-based nearest-neighbor
    – similarity metric defines distance between cases
    – usually on an attribute-by-attribute basis
   Entree
    –   cuisine
    –   quality
    –   price
    –   atmosphere
How do we measure
similarity?
   complex multi-level comparison
   goal sensitive
    – multiple goals
   retrieval strategies
    – non-similarity relationships
   Can be strictly numeric
    – weighted sum of similarities of features
    – “local similarities”
   May involve inference
    – reasoning about the similarity of items
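
For the strictly numeric case, a sketch with hand-written local
similarities combined as a weighted sum; the asymmetric price metric
anticipates the next slide, and the weights and tolerance value are
invented:

def price_sim(query_price, item_price, tolerance=20.0):
    # Asymmetric local metric: cheaper than asked is perfect;
    # more expensive decays linearly to zero over 'tolerance'.
    if item_price <= query_price:
        return 1.0
    return max(0.0, 1.0 - (item_price - query_price) / tolerance)

def global_sim(query, item, metrics, weights):
    # Global metric: weighted sum of per-attribute local similarities.
    total = sum(weights[a] for a in metrics)
    return sum(weights[a] * metrics[a](query[a], item[a])
               for a in metrics) / total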
Price metric

[Figure: local similarity for price. Similarity is plotted against
price and peaks at the query value, falling off as a candidate's
price moves away from it.]
Cuisine Metric

[Figure: a semantic network of cuisines (Asian: Chinese, Japanese,
Thai, Vietnamese; European: French, Nouvelle Cuisine; plus Pacific
New Wave), with links encoding which cuisines count as similar.]
Metrics

   Goal-specific comparison
    – How similar is target product to the
      source with respect to this goal?
   Asymmetric
    – directional effects
   A small # of general purpose types
Metrics
   If they generate a true metric space
    – approaches using space-partitioning techniques
          BSP trees, quad-trees, etc.
   Not always the case
   Hard to optimize
    – storing n^2 distances vs. recalculating at need
    – FindMe calculates similarity at retrieval time
Combining metrics
   Global metric
    – combination of attribute metrics
   Hierarchical combination
    – lower metrics break ties in upper
   Benefits
    – simple to acquire
    – easy to understand
   Somewhat inflexible
    – More typical would be a weighted sum
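
The hierarchical combination can be rendered as a tuple sort, so each
lower metric only matters when all higher-priority ones tie; the
metric functions and the priority order are assumptions:

def rank_hierarchical(query, items, metrics, priority):
    # Sort by a tuple of local similarities, most important
    # attribute first: lower metrics break ties in upper ones.
    def key(item):
        return tuple(metrics[a](query[a], item[a]) for a in priority)
    return sorted(items, key=key, reverse=True)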
Constraint-based
Recommendation
   Represent user’s needs as a set of
    constraints
   Try to satisfy those constraints with
    products
Example

   User needs a car
    – Gas mileage > 25 mpg
    – Capacity >= 5 people
    – Price < 18,000
   A solution would be a list of models
    satisfying these requirements
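
A minimal constraint filter for this car example, over a made-up
two-model catalog:

# Invented sample data for illustration.
catalog = [
    {"model": "Hatchback A", "mpg": 32, "capacity": 5, "price": 16500},
    {"model": "SUV B",       "mpg": 22, "capacity": 7, "price": 24000},
]

constraints = [
    lambda c: c["mpg"] > 25,       # gas mileage > 25 mpg
    lambda c: c["capacity"] >= 5,  # capacity >= 5 people
    lambda c: c["price"] < 18000,  # price < 18,000
]

# The solution set: every model satisfying all constraints
# (here only "Hatchback A" survives).
solutions = [c for c in catalog if all(check(c) for check in constraints)]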
Configurable Products
   Constraints important where products are
    configurable
    –   computers
    –   travel packages
    –   business services
    –   (cars)
   The relationships between configurable
    components need to be expressed as
    constraints anyway
    – a GT 6800 graphics card needs power supply >=
      300 W
Product Space

[Figure: products plotted by Weight against Screen Size. The
constraints Weight < x and Screen > y carve out a corner region of
possible recommendations.]
Utility

   In order to rank products
    – we need a measure of utility
    – can be “slack”
          how much the product exceeds the constraints
    – can be another measure
          price is typical
    – can be a utility calculation that is a function
      of product attributes
          but generally this is user-specific
             – value of weight vs screen size
Product Space

[Figure: the same Weight vs. Screen Size plot, now with three
candidate products A, B, and C inside the feasible region.]
Utility
   SlackA = (X - WeightA) + (SizeA - Y)
    – not really commensurate
   PriceA
    – ignores product differences
   UtilityA = α (X - WeightA) + β (SizeA - Y) +
    γ (X - WeightA)(SizeA - Y)
    – usually we ignore γ and treat utilities as
      independent
    – how do we know what α and β are?
          make assumptions
          infer from user behavior
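
The utility formula transcribed into code; the coefficient defaults
are placeholders, with gamma = 0 reflecting the usual independence
assumption:

def utility(product, x_weight, y_size, alpha=1.0, beta=1.0, gamma=0.0):
    # Slack-based utility over the two constraints on the slide.
    w_slack = x_weight - product["weight"]
    s_slack = product["size"] - y_size
    return alpha * w_slack + beta * s_slack + gamma * w_slack * s_slack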
Knowledge-Based
Recommendation
   Hard to generalize
   Advantages
    – no cold start issues
    – great precision possible
          very important in some domains
   Disadvantages
    – knowledge engineering required
          can be substantial
    – expert opinion may not match user preferences
Next

   Session C
   15:00
   Need laptops
   Install workspace
