Self-Selection Bias in Reputation Systems

Self-Selection and Information Role of Online Product Reviews
Xinxin Li, Lorin Hitt
The Wharton School, University of Pennsylvania
Workshop on Information Systems and Economics (WISE
   Data Collection
   Trend in Consumer Reviews
   Impact of Consumer Reviews on Book Sales
   Theory Model and Implications
   Word of mouth has long been recognized as
   a major driver of product sales.

   eBay-like online reputation systems : a large
   body of work

   product review websites : very little
   systematic research
Self-Selection Problem
   The efficacy of consumer-generated product reviews
   may be limited for at least two reasons.

     Firms may manipulate online rating services by paying
     individuals to provide high ratings.

     Reported ratings may be inconsistent with the
     preferences of the general population.

   Ratings of products may reflect both consumer taste
   and quality.
Major Research Questions
   Early adopters may have significantly
   different preferences than later adopters
   which will create trends in ratings as products

   We consider whether consumers account for
   these biases in ratings when making product
   purchase decisions.
Data Collection
   A random sample of 2651 hardback books was collected from “Books in Print”
   covering books published from 2000-2004 that also have reviews on Amazon.

   Book characteristic information
       publication date
       publication date for corresponding paperback editions
       consumer reviews

   Sales-related data (every Friday from March to July in 2004)
       sales rank
       the number of consumer reviews
       the average review
       shipping availability
Trend in Consumer Reviews
   The Box-Cox model :
      AvgRating_it : the average review for book i at time t,
      T : the time difference between the date the average review was posted
      and the date the book was released,
      u_i : the idiosyncratic characteristics of each individual book that
      remain constant over time.
Trend in Consumer Reviews (contd.)
Impact of Consumer Reviews on Book Sales
   Sales rank is a log-linear function of book
   sales with a negative slope.

    \log(SalesRank_i) = \beta_0 + \beta_1 AvgRating + \gamma_1 \log(P_i)
        + \gamma_2 \log(NumOfReview_i) + \gamma_3 \log(P_i^c) + \gamma_4 Promotion_i
        + \gamma_5 T + \gamma_6 CategoryDummies_i + \gamma_7 ShippingDummies_i + \varepsilon_i
Impact of Consumer Reviews on Book Sales (contd.)
   All estimates are significant and have the right sign.
          AvgRating_{it} = 3.90 + 0.45 \exp[-0.746 T] + \hat{u}_i + \hat{\varepsilon}_{it}
          R_i = 3.90 + \hat{u}_i + \hat{\varepsilon}_{it}
          R_T = 0.45 \exp[-0.746 T]

   With other demand-related factors controlled for, the
   time-variant component RT has a significant impact
   on book sales when consumers compare different
   books in the same time period.
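
   The split of the fitted average rating into a book-specific level and a
   time-variant component can be illustrated with a short sketch. It assumes
   the operators lost from the fitted equation are a plus sign and a negative
   exponent, i.e. 3.90 + 0.45 * exp(-0.746 * T), which keeps the fitted value
   inside the 1 to 5 star range. The function names are hypothetical, the book
   effect is set to zero, and the unit of T is not stated in the slides.

```python
import math

# Assumed reading of the fitted trend: 3.90 + 0.45 * exp(-0.746 * T).
# The signs are an assumption; the extracted slide lost its operators.
BASELINE = 3.90
SCALE = 0.45
DECAY = 0.746


def time_component(t):
    """Time-variant part R_T of the fitted average rating."""
    return SCALE * math.exp(-DECAY * t)


def fitted_average(t, book_effect=0.0):
    """Fitted average rating at elapsed time t for a given book effect."""
    return BASELINE + book_effect + time_component(t)


if __name__ == "__main__":
    for t in (0, 1, 2, 5, 10):
        print(f"T={t:>2}  R_T={time_component(t):.3f}  fitted={fitted_average(t):.3f}")
```

   Under this reading the early-review premium starts at 0.45 stars and decays
   toward zero, so the fitted average settles near 3.90.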
Theory Model and Implications
   An individual consumer’s preferences over the
   product can be characterized by two
   components (x_i, q_i).
     The element x_i is known by each consumer before
     purchasing and represents the consumer’s
     preference over product characteristics that can
     be inspected before purchase.
     The element q_i measures the quality of the
     product for consumer i.
     q^e: expected quality
   Our findings suggest the significance of
   product design and early period product
Self-Selection Bias in Reputation Systems

Mark Kramer
MITRE Corporation
   Expectation and Self-Selection
   Avoiding Bias in Reputation
   Management Systems
   Can a reputation system based on user
   ratings accurately signal the quality of a
   resource?
Ratings Bias
   Reputation systems appear to be
   inherently biased towards
   better-than-average ratings
     Amazon: 3.9 out of 5
     Netflix prize data set: 3.6 out of 5 stars
Ratings Bias (contd.)

                   87% of ratings are 3 or higher
Possible Reasons for Positive Bias
   People don’t like to be critical
   People don’t understand the rating
   system or cannot calibrate themselves
   Lake Wobegon Effect: Most movies are
   better than average
     Number of ratings for quality movies far
     exceeds number of ratings of poor movies
The SpongeBob Effect
   Oscar Winners 2000-2005 : Average
   Rating 3.7 Stars
   SpongeBob DVDs : Average Rating 4.1
   If SpongeBob effect is common, then
   ratings do not accurately signal the
   quality of the resource
What is Happening Here?
   People choose movies they think they
   will like, and often they are right
     Ratings only tell us that “fans of
     SpongeBob like SpongeBob”
   Oscar winners draw a wider audience
     Rating is much more representative of the
     general population
What is Happening Here? (contd.)
   There might be a tendency to downplay
   the problem of biased ratings
     you already "know" whether or not you
     would like the SpongeBob movie
     you could look at written reviews
     one could get personalized guidance from
     a recommendation engine
Importance of Self-Selection Bias
   Bizrate: 44% of consumers consult opinion sites
   before making online purchases
   High ratings are the norm, so they contain little information
   Written reviews also can be biased
   Discarding numerical (star) ratings would eliminate
   an important time-saver
   Consumers have no idea what “discount” to apply to
   ratings to get a true idea of quality
   No recommendation engine will ever totally replace
   browsing as a method of resource selection
Model of Self-Selection Bias
   Two groups:
      Evaluation group E
      Feedback group F where F ⊆ E
   Consider binary situation:
      E = Expect to be satisfied (T/F)
      S = Are satisfied
      R = Resource selected (and reviewed)
      P(S) = probability of satisfaction with resource in E
      P(S|R) = probability of satisfaction within F

    If   P(R|E) > P(R|~E)       Self-Selection
    And  P(S|E) > P(S|~E)       Realization of expectations
    Then P(S|R) > P(S)          Biased Rating
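
    The inequality can be checked numerically with a small sketch. It assumes
    that satisfaction depends on selection only through expectation, i.e.
    P(S|E,R) = P(S|E); the function name and the probabilities are
    illustrative and not taken from the slides.

```python
def satisfaction_rates(p_e, p_r_given_e, p_r_given_not_e,
                       p_s_given_e, p_s_given_not_e):
    """Return (P(S), P(S|R)) assuming S is independent of R given E."""
    p_not_e = 1.0 - p_e
    # Satisfaction rate over the whole evaluation group.
    p_s = p_s_given_e * p_e + p_s_given_not_e * p_not_e
    # Share of the feedback group that expected to be satisfied (Bayes' rule).
    p_r = p_r_given_e * p_e + p_r_given_not_e * p_not_e
    p_e_given_r = p_r_given_e * p_e / p_r
    # Satisfaction rate inside the feedback group.
    p_s_given_r = (p_s_given_e * p_e_given_r
                   + p_s_given_not_e * (1.0 - p_e_given_r))
    return p_s, p_s_given_r


# Illustrative numbers: 30% expect to like the resource, they select it far
# more often (0.9 vs 0.1), and they are usually right (0.8 vs 0.2).
p_s, p_s_given_r = satisfaction_rates(0.3, 0.9, 0.1, 0.8, 0.2)

if __name__ == "__main__":
    print(f"P(S)   = {p_s:.3f}")
    print(f"P(S|R) = {p_s_given_r:.3f}")
```

    With these inputs the feedback group reports about 68% satisfaction while
    only 38% of the evaluation group would be satisfied.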
Utility and Self-Selection
   Some distribution of expected utility in evaluation group E
   Resource will be selected only if expected utility is positive

   [Figure: distribution of expected utility in the evaluation group;
   x-axis: expected utility, y-axis: # people]

   Very high reviews can shift the expected utility curve to the right and
   increase the number of people selecting the resource
       “Swing” group has a greater chance of disappointment
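
   A rough simulation can illustrate why the swing group is more likely to be
   disappointed. All modelling choices here are assumptions made for
   illustration: expected utility is standard normal, realized utility is
   expected utility plus standard normal noise, high ratings add a fixed boost
   to perceived utility without changing realized utility, and disappointment
   means realized utility below zero.

```python
import random


def disappointment_rates(rating_boost=0.5, n_people=100_000, seed=0):
    """Return disappointment rates for (core selectors, swing selectors)."""
    rng = random.Random(seed)
    core = [0, 0]   # [number of people, number disappointed]
    swing = [0, 0]
    for _ in range(n_people):
        expected = rng.gauss(0.0, 1.0)
        realized = expected + rng.gauss(0.0, 1.0)
        if expected > 0:
            group = core    # would select the resource anyway
        elif expected + rating_boost > 0:
            group = swing   # selects only because of the high ratings
        else:
            continue        # does not select the resource
        group[0] += 1
        group[1] += realized < 0
    return core[1] / core[0], swing[1] / swing[0]


core_rate, swing_rate = disappointment_rates()

if __name__ == "__main__":
    print(f"core group disappointed:  {core_rate:.2%}")
    print(f"swing group disappointed: {swing_rate:.2%}")
```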
Effect of Biased Rating: Example
   10 people see SpongeBob’s 4-star ratings
      3 are already SpongeBob fans, rent movie, award 5 stars
      6 already know they don’t like SpongeBob, do not see movie
      Last person doesn’t know SpongeBob, impressed by high
      ratings, rents movie, rates it 1-star

   Average rating remains unchanged: (5+5+5+1)/4 = 4 stars
   9 of 10 consumers did not really need rating system
   Only consumer who actually used the rating system
   was misled
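
   The arithmetic of this example can be restated in a few lines. The variable
   names are illustrative; the numbers are the ones given on the slide.

```python
# Ratings submitted after the 10 consumers saw the 4-star average:
# three fans award 5 stars, the one newcomer awards 1 star.
ratings = [5, 5, 5, 1]
average_rating = sum(ratings) / len(ratings)

n_consumers = 10
n_relied_on_rating = 1   # only the newcomer acted on the rating
n_misled = 1             # and that consumer was disappointed

share_not_needing_ratings = (n_consumers - n_relied_on_rating) / n_consumers
share_of_reliers_misled = n_misled / n_relied_on_rating

if __name__ == "__main__":
    print(average_rating, share_not_needing_ratings, share_of_reliers_misled)
```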
Paradox of Subjective Reputation
 “Accurate ratings render ratings inaccurate”

   The purpose of reputation systems is to increase
   consumer satisfaction
      Do better than random selection
      The mechanism is self-selection
   If self-selection works, ratings will become positively biased
      In the limit, all ratings will be 5-star ratings
   Self-Selection bias (SpongeBob Effect) distorts the
   information needed for accurate self-selection
   Rating system defeats itself
Dynamics of Ratings Paradox

   Accurate, complete prior information -> Good self-selection
      -> Happy consumers -> Positively biased ratings

   Inaccurate or biased prior information -> Poor self-selection
      -> Mix of happy and unhappy consumers -> Unbiased ratings
Example of Reputation Dynamics
   Resource with uniformly distributed
   satisfaction between 0 – 100
   Successive groups decide whether to
   use the resource, based on rating
   # selecting resource is proportional to
   average rating
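
   The setup above can be sketched as a small simulation. The slides do not
   give the details, so the following are assumptions: each person's rating
   equals their satisfaction, each round offers the resource to a fixed-size
   group of which a share equal to (average rating / 100) selects it, the
   rating starts at 50, and "fans first" means people arrive in decreasing
   order of satisfaction. The function and parameter names are hypothetical.

```python
import random


def simulate(order, n_people=1000, rounds=10, group_size=50,
             initial_rating=50.0, seed=0):
    """Return the average rating after each round for a given arrival order."""
    rng = random.Random(seed)
    # Satisfaction is uniformly distributed between 0 and 100.
    pool = [rng.uniform(0, 100) for _ in range(n_people)]
    if order == "fans_first":
        pool.sort(reverse=True)
    else:
        rng.shuffle(pool)
    ratings = []
    average = initial_rating
    history = []
    for _ in range(rounds):
        # Number selecting the resource is proportional to the average rating.
        n_select = int(group_size * average / 100)
        chosen, pool = pool[:n_select], pool[n_select:]
        ratings.extend(chosen)
        if ratings:
            average = sum(ratings) / len(ratings)
        history.append(average)
    return history


fans_first = simulate("fans_first")
random_first = simulate("random_first")

if __name__ == "__main__":
    print("fans first:  ", [round(r, 1) for r in fans_first])
    print("random first:", [round(r, 1) for r in random_first])
```

   In this sketch the fans-first rating starts very high and drifts down as
   less enthusiastic people arrive, while the random-first rating stays near
   the population mean.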
Example of Reputation Dynamics (contd.)
   [Figure: two scenarios, "Fans first" and "Random people first"]
Ideas for Bias-Resistant Reputation
   Use more demographics
      Kids like SpongeBob, most adults do not
      Self-selection is still at work within demographic subgroup
      Demographics might not create useful groups with different

   Make personalized recommendations
      Yes, but people still like to browse
      Recommendations based on biased ratings might fail
      Netflix recommendation engine has large error

   Use written reviews
      Self-selection bias is still present
Bias-Resistant Reputation System
   Want P(S) but we collect data on P(S|R)
      S = Are satisfied with resource
      R = Resource selected (and reviewed)

   However, P(S|E,R) ≈ P(S|E)
     Likelihood of satisfaction depends primarily on expectation
     of satisfaction, not on the selection decision
     If we can collect prior expectation, the gap between
     evaluation group and feedback group disappears
      • whether you select the resource or not doesn’t matter
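
   One way to use collected prior expectations is to reweight the feedback
   group by the expectation mix of the whole evaluation group. The sketch below
   is an illustration under assumptions, not the author's implementation: it
   uses a binary expectation, made-up probabilities, and takes
   P(S|E,R) = P(S|E) to hold exactly.

```python
import random


def estimate_satisfaction(n_people=50_000, seed=0):
    """Return (naive, corrected) estimates of P(S) from simulated feedback."""
    rng = random.Random(seed)
    n_expect = {True: 0, False: 0}      # prior expectations, everyone asked
    n_rated = {True: 0, False: 0}       # feedback group, by expectation
    n_satisfied = {True: 0, False: 0}
    for _ in range(n_people):
        expects = rng.random() < 0.3
        n_expect[expects] += 1
        selects = rng.random() < (0.9 if expects else 0.1)
        if not selects:
            continue
        # Satisfaction depends on expectation only, not on selection.
        satisfied = rng.random() < (0.8 if expects else 0.2)
        n_rated[expects] += 1
        n_satisfied[expects] += satisfied
    # Naive estimate: satisfaction rate in the feedback group, P(S|R).
    naive = sum(n_satisfied.values()) / sum(n_rated.values())
    # Corrected estimate: sum over e of P(S|E=e, R) * P(E=e).
    corrected = sum((n_satisfied[e] / n_rated[e]) * (n_expect[e] / n_people)
                    for e in (True, False))
    return naive, corrected


naive, corrected = estimate_satisfaction()
TRUE_P_S = 0.8 * 0.3 + 0.2 * 0.7  # 0.38 for the assumed probabilities

if __name__ == "__main__":
    print(f"naive={naive:.3f} corrected={corrected:.3f} true={TRUE_P_S:.3f}")
```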
Bias-Resistant Reputation System
     Before viewing:
        I think I will:
          •    Love this movie
          •    Like this movie
          •    It will be just OK
          •    Somewhat dislike this movie
          •    Hate this movie

     After viewing:
        I liked this movie:
          •    Much more than expected
          •    More than expected
          •    About the same as I expected
          •    Less than I expected
          •    Much less than I expected


   [Figure: groups labeled "Everyone else" and "Big fans"]

   Self-selection bias exists in most cases of consumer choice

   Bias means that user ratings do not reflect the distribution of
   satisfaction in the evaluation group
      Consumers have no idea what “discount” to apply to ratings to get
      a true idea of quality

   Many current rating systems may be self-defeating
      Accurate ratings promote self-selection, which leads to inaccurate
      ratings

   Collecting prior expectations may help address this problem
