
      Modeling User Interactions in Social Media
      Eugene Agichtein
      Emory University
                    Outline
•   Online information access landscape
•   User-generated content
•   Modeling, searching and mining social media
•   Open problems
                        Social Media Today
Published:
   4Gb/day
Social Media:
   10Gb/day
Page views:
   180-200Gb/day

Technorati:
   112M blogs
   1.6M posts/day
Blogpulse:
   78M blogs
   750K posts/day
Twitter: since 11/07:
   ~2M users
   ~3M msgs/day

Facebook/Myspace:
   200-300M users
   Average 19 min/day

Yahoo Answers:
   90M active users, ~20M questions, ~200M answers
                        [From Andrew Tomkins/Yahoo!, SSM2008 Keynote]
 Trends in search and social media
• Search in the East:
   – Heavily influenced by social media: Naver, Baidu Knows, TaskCn, ..


• Search in the West:
   – Content licensing industry, but typically around traditional media
   – Social media mostly crawlable, integrated in search repositories


• Two opposite trends in social media search:
   – Moving towards point relevance (answers, knowledge search)
   – Moving towards browsing experience, subscription/push model


• How to integrate “active” engagement and contribution with
  “passive” search/browse?
Where is the nearest car rental
to Carnegie Mellon University?
       Winning Search Strategy
• Look up CMU address/zip code
• Google Maps
• Query: “car rental near:5000 Forbes Avenue
  Pittsburgh, PA 15213”




Total time: 7-10 minutes,
active “work”




Someone must have done this
        before…
+0 minutes : 11pm
+1 minute
+36 minutes
+7 hours:
perfect answer
      Why would one wait hours?
•   Effective use of time
•   Unique information need
•   Subjective/normative question
•   Complex
•   Human contact/community
•   Multiple viewpoints
http://answers.yahoo.com/question/index;_ylt=3?qid=20071008115118AAh1HdO




Challenges in ____ing Social Media
•   Estimating contributor expertise
•   Estimating content quality
•   Inferring user intent
•   Predicting satisfaction: general, personalized
•   Matching askers with answerers
•   Searching archives
•   Detecting spam

Work done in collaboration with:
   Abulimiti Aji, Qi Guo, Yandong Liu, Pawel Jurczyk,
   Jiang Bian, Prof. Hongyuan Zha

Thanks to:
   Yahoo! Research: ChaTo Castillo, Gilad Mishne,
   Aris Gionis, Debora Donato, Ravi Kumar
             Estimating Contributor Authority
             P. Jurczyk and E. Agichtein, Discovering Authorities in Question Answer
             Communities Using Link Analysis (poster), CIKM 2007
[Figure: left, a question-answer graph in which askers (User 1, User 2) post Questions 1-3 and answerers (Users 3-6) post Answers 1-6; right, the derived user graph linking askers to the users who answered them. Hub = asker, Authority = answerer.]

  A(j) = \sum_{i=0}^{M} H(i)

  H(i) = \sum_{j=0}^{K} A(j)
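
A minimal sketch of the hub/authority iteration above, assuming each (asker, answerer) pair is one directed edge and that scores are L2-normalised after every pass. The function name `hits`, the fixed iteration count, and the toy edge list are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for (asker, answerer) edges."""
    edges = list(edges)
    hub = {asker: 1.0 for asker, _ in edges}
    auth = {answerer: 1.0 for _, answerer in edges}
    for _ in range(iterations):
        # A(j): sum of hub scores of the askers whose questions j answered.
        new_auth = defaultdict(float)
        for asker, answerer in edges:
            new_auth[answerer] += hub[asker]
        # H(i): sum of authority scores of the users who answered i.
        new_hub = defaultdict(float)
        for asker, answerer in edges:
            new_hub[asker] += new_auth[answerer]
        # L2-normalise so the scores stay bounded (an assumption here).
        a_norm = sum(v * v for v in new_auth.values()) ** 0.5
        h_norm = sum(v * v for v in new_hub.values()) ** 0.5
        auth = {u: v / a_norm for u, v in new_auth.items()}
        hub = {u: v / h_norm for u, v in new_hub.items()}
    return hub, auth


# Hypothetical toy graph: two askers, four answerers.
toy_edges = [("user1", "user3"), ("user1", "user4"),
             ("user2", "user4"), ("user2", "user5"), ("user2", "user6")]
hub_scores, auth_scores = hits(toy_edges)
best_answerer = max(auth_scores, key=auth_scores.get)
```

On this toy graph, user4 answers both askers and therefore ends up with the highest authority score.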
Finding Authorities: Results




        Qualitative Observations
HITS effective 




                       HITS ineffective
Trolls




Estimating Content Quality
    E. Agichtein, C. Castillo, D. Donato, A. Gionis, G. Mishne,
    Finding High Quality Content in Social Media, WSDM 2008




Community




Editorial Quality != Popularity != Usefulness




Yahoo! Answers: Time to Fulfillment
[Figure: bar chart of time to close a question (hours, axis 0-40) for ten sample question categories]

   1. 2006 FIFA World Cup             6. Medicine
   2. Optical                         7. Winter Sports
   3. Poetry                          8. Special Education
   4. Football (American)             9. General Health Care
   5. Scottish Football (Soccer)      10. Outdoor Recreation
                             Predicting Asker Satisfaction
                             Y. Liu, J. Bian, and E. Agichtein, Predicting Information
                             Seeker Satisfaction in Community Question Answering,
                             in SIGIR 2008
Yandong Liu     Jiang Bian



          Given a question submitted by an asker in CQA,
          predict whether the user will be satisfied with the
          answers contributed by the community.

              – “Satisfied” :
                  • The asker has closed the question AND
                  • Selected the best answer AND
                  • Rated best answer >= 3 “stars”
              – Else, “Unsatisfied”
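
The labeling rule above can be sketched as a small predicate. The function and argument names are hypothetical; only the three conditions come from the slide.

```python
from typing import Optional


def is_satisfied(closed_by_asker: bool,
                 best_answer_selected: bool,
                 best_answer_stars: Optional[int]) -> bool:
    """Label a question "satisfied" only if all three conditions hold."""
    return (
        closed_by_asker
        and best_answer_selected
        and best_answer_stars is not None
        and best_answer_stars >= 3
    )
```

Any question that fails one of the three conditions, including one with no rating at all, falls into the "unsatisfied" class.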
                   Motivation
•   Save time: don’t bother to post
•   Suggest a good forum for information need
•   Notify user when satisfactory answer contributed
•   From “relevance” to information need fulfillment
•   Explicit ratings from asker & community




      ASP: Asker Satisfaction Prediction
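[Diagram: feature groups (Question, Answer, Asker History, Answerer History, Category, Text), with Wikipedia and News shown as additional inputs, feed a Classifier that outputs either “asker is satisfied” or “asker is not satisfied”.]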
                         Datasets
Crawled from Yahoo! Answers in early 2008 (Thanks, Yahoo!)



  Questions   Answers     Askers    Categories   % Satisfied
  216,170     1,963,615   158,515   100          50.7%


       Available at
       http://ir.mathcs.emory.edu/shared




                Dataset Statistics
Category                  #Q     #A      #A per Q   Satisfied   Avg asker rating   Time to close by asker
2006 FIFA World Cup(TM)   1194   35659   29.86      55.4%       2.63               47 minutes
Mental Health             151    1159    7.68       70.9%       4.30               1 day and 13 hours
Mathematics               651    2329    3.58       44.5%       4.48               33 minutes
Diet & Fitness            450    2436    5.41       68.4%       4.30               1.5 days


 Asker satisfaction varies by category
 #Q, #A, Time to close… -> Asker Satisfaction


  Satisfaction Prediction: Human Judges

• Truth: asker’s rating
• A random sample of 130 questions
• Researchers
  – Agreement: 0.82
  – F1: 0.45
• Amazon’s Mechanical Turk
  – Five workers per question
  – Agreement: 0.9; F1: 0.61
  – Best when at least 4 out of 5 raters agree

                   ASP vs. Humans (F1)

Classifier          With Text   Without Text   Selected Features
ASP_SVM             0.69        0.72           0.62
ASP_C4.5            0.75        0.76           0.77
ASP_RandomForest    0.70        0.74           0.68
ASP_Boosting        0.67        0.67           0.67
ASP_NB              0.61        0.65           0.58
Best Human Perf     0.61
Baseline (naïve)    0.66

ASP is significantly more effective than humans
Human F1 is lower than the naïve baseline!
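
To see why a naïve baseline can score this high, here is a small worked sketch. It assumes the baseline labels every question "satisfied" (the backup slides describe it as guessing the majority class), so recall is 1 and precision equals the share of satisfied askers. The function names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def majority_baseline_f1(positive_rate: float) -> float:
    """F1 of a classifier that labels every example as positive."""
    # Every positive is found (recall 1); precision is the positive rate.
    return f1_score(positive_rate, 1.0)


# 50.7% of askers in the crawled dataset were satisfied.
baseline = majority_baseline_f1(0.507)
```

With the dataset-wide rate of 50.7% this gives roughly 0.67, in line with the 0.66 on the slide; the small gap presumably reflects the class balance of the evaluation sample.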
            Features by Information Gain
•   0.14219 Q: Asker’s previous rating
•   0.13965 Q: Average past rating by asker
•   0.10237 UH: Member since (interval)
•   0.04878 UH: Average # answers for past Q
•   0.04878 UH: Previous Q resolved for the asker
•   0.04381 CA: Average rating for the category
•   0.04306 UH: Total number of answers received
•   0.03274 CA: Average voter rating
•   0.03159 Q: Question posting time
•   0.02840 CA: Average # answers per Q
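
A minimal sketch of how an information-gain score like those above can be computed for one feature. It assumes the feature has already been discretised into a few values (the slides do not say how continuous features were binned); the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy given the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature that splits satisfied from unsatisfied askers perfectly scores the full label entropy, while a feature unrelated to the label scores 0.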


   “Offline” vs. “Online” Prediction
• Offline prediction:
  – All features (question, answer, asker & category)
  – F1: 0.77
• Online prediction:
  – NO answer features
  – Only asker history and question features (stars,
    #comments, sum of votes…)
  – F1: 0.74


                       Feature Ablation
                              Precision   Recall    F1
Selected features             0.80        0.73      0.77
No question-answer features   0.76        0.74      0.75
No answerer features          0.76        0.75      0.75
No category features          0.75        0.76      0.75

No asker features             0.72        0.69      0.71
No question features          0.68        0.72      0.70

Asker & Question features are most important.
Answer quality/Answerer expertise/Category characteristics:
    may not be important
    caring or supportive answers often preferred
   Satisfaction: varying by asker experience




Group together questions from askers with the same
number of previous questions.
Prediction accuracy increases dramatically,
reaching F1 of 0.9 for askers with >= 5 questions.
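
The grouping step can be sketched as follows, assuming each question record carries an asker id and a timestamp. The record layout and the function name are hypothetical; a separate model could then be trained or evaluated per group.

```python
from collections import defaultdict


def group_by_history(questions):
    """Group question ids by how many questions the asker posted before.

    questions: iterable of (asker_id, timestamp, question_id) tuples.
    Returns {number_of_previous_questions: [question_id, ...]}.
    """
    seen = defaultdict(int)      # asker_id -> questions posted so far
    groups = defaultdict(list)
    for asker_id, _, question_id in sorted(questions, key=lambda q: q[1]):
        groups[seen[asker_id]].append(question_id)
        seen[asker_id] += 1
    return dict(groups)
```

For example, an asker's third question lands in group 2, alongside the third questions of all other askers.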
    Personalized Prediction of Asker
         Satisfaction with info
• Same information != same usefulness for different users!

• Personalized classifier achieves surprisingly good
  accuracy (even with just 1 previous question!)

• Simple strategy of grouping users by number of previous
  questions is more effective than other methods for users
  with moderate amount of history

• For users with >= 20 questions, textual features are
  more significant


Grouping Users by “Age”




Some Personalized Models




                         Summary
• Asker satisfaction is predictable
   – Can achieve higher than human accuracy by
     exploiting interaction history
• User’s experience is important
• General model: one-size-fits-all
   – 2000 questions for training model are enough
• Personalized satisfaction prediction:
   – Helps with sufficient data (>= 1 prev. interaction; can
     observe text patterns with >= 20 prev. interactions)


           Subjectivity in CQA
           B. Li, Y. Liu, and E. Agichtein, CoCQA: Co-Training Over Questions
           and Answers with an Application to Predicting Question
           Subjectivity Orientation, in EMNLP 2008

• How can we exploit structure of CQA to
  improve question classification?

• Case Study: Question Subjectivity Prediction
   – Subjective: Has anyone got one of those
     home blood pressure monitors? and if so
     what make is it and do you think they are
     worth getting?
   – Objective: What is the difference between
     chemotherapy and radiation treatments?
Dataset Statistics (~1000 questions)
               http://ir.mathcs.emory.edu/shared/
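[Figure: pie charts of the Objective vs. Subjective share of questions by category (Arts, Science, Education, Health, Sports). Percentage labels on the charts: 30%/70%, 48%/52%, 36%/64%, 34%/66%, 21%/79%, 36%/64%.]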
               Key Observations
• Analysis of real questions in CQA is challenging:
   – Typically complex and subjective
   – Can be ill-phrased and vague
   – Not enough annotated data


• Idea:
   – Can we utilize the inherent structure of the CQA
     interactions, and use unlabeled CQA data to
     improve classification performance?

    Natural Approach: Co-Training

• Introduced in:
  – Combining labeled and unlabeled data with co-training,
    Blum and Mitchell, 1998
• Two views of the data
  – E.g.: content and hyperlinks in web pages
• Provide complementary information
• Iteratively construct additional labeled data


Questions and Answers: Two Views
• Example:
  – Q: Has anyone got one of those home blood
    pressure monitors? and if so what make is it and
    do you think they are worth getting?
  – A: My mom has one as she is diabetic so its
    important for her to monitor it she finds it useful.
• Answers usually match/fit question
  – My mom… she finds…
• Askers can usually identify matching answers
  by selecting the “best answer”
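CoCQA: A Co-Training Framework over Questions and Answers
[Diagram: labeled data trains two classifiers, CQ on the question view (Q) and CA on the answer view (A). Both classify the unlabeled data, and the newly labeled (+/-) examples are added to the labeled data. The loop repeats until validation on holdout training data signals Stop.]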
Semi Supervised Learning: Adding unlabeled data


     Method        Question        Question + Best Answer
     Supervised    0.717           0.695
     GE            0.712 (-0.7%)   0.717 (+3.2%)
     CoCQA         0.731 (+1.9%)   0.745 (+7.2%)




CoCQA for varying amount of labeled data
[Figure: F1 (axis 0.52-0.72) vs. # of labeled data used (50-400) for two series: CoCQA (Question + Best Answer) and Supervised Q_Best Ans]
                Summary
• User-generated Content
  – Growing
  – Important: impact on mainstream media,
    scholarly publishing, …
  – Can provide insight into information seeking and
    social processes
  – “Training” data for IR, machine learning, NLP, ….
  – Need to re-think quality, impact, usefulness

              Current work
• Intelligently route a question to “good”
  answerers
• Improve web search ranking by incorporating
  CQA data
• “Cost” models for CQA-based question
  processing vs. other methods
• Rating dynamics
• Discourse analysis
• Cross-cultural comparisons
                   Takeaways
• People specify their information need fully when
  they know humans are on the other end

• Next generation of search must be able to cope with
  complex, subjective, and personal information needs

• To move beyond relevance, must be able to model
  user satisfaction

• CQA provides insights into community “health”,
  growth, engagement, quality of experience.

                 Thank you!
     http://www.mathcs.emory.edu/~eugene

•   Estimating contributor expertise [CIKM 2007]
•   Estimating content quality [WSDM 2008]
•   Inferring asker intent [EMNLP 2008]
•   Predicting satisfaction [SIGIR 2008, ACL 2008]
•   Matching askers with answerers
•   Searching CQA archives [WWW 2008]
•   Detecting spam [AIRWeb 2008]
Backup Slides
Question-Answer Features
• Q: length, posting time…
• Q: Terms
• QA: length, KL divergence
• Q: Votes
User Features

• U: Member since
• U: Total points
• U: #Questions
• U: #Answers
                       Category Features
• CA: Average time to close a question
• CA: Average # answers per question
• CA: Average asker rating
• CA: Average voter rating
• CA: Average # questions per hour
• CA: Average # answers per hour

Category         #Q    #A    #A per Q   Satisfied   Avg asker rating   Time to close by asker
General Health   134   737   5.46       70.4%       4.49               1 day and 13 hours
Backup slides
Self-Selection: First Experience Crucial




[Figures: days as member vs. rating; # prev questions vs. rating]
      Prediction Methods
• Heuristic: # answers
• Baseline: guess the majority class (satisfied).
• ASP: (our system)
  • ASP_SVM: Our system with the SVM classifier
  • ASP_C4.5: with the C4.5 classifier
  • ASP_RandomForest: with the RandomForest classifier
  • ASP_Boosting: with the AdaBoost algorithm combining
    weak learners
  • ASP_NaiveBayes: with the Naive Bayes classifier
  • …


Satisfaction Prediction: Human Perf (Cont’d): Amazon
                   Mechanical Turk
  • Methodology
     – Used the same 130 questions
     – For each question, list the best answer as well as
       four other answers, ordered by votes
     – Five independent raters for each question
     – Agreement: 0.9; F1: 0.61
     – Best accuracy achieved when at least 4 out of 5
       raters predicted asker to be “satisfied” (otherwise,
       labeled as “unsatisfied”)
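
The vote-aggregation rule in the last bullet can be sketched as a threshold on the number of "satisfied" votes. The function name and the boolean encoding of votes are assumptions for illustration.

```python
def aggregate_votes(votes, min_satisfied=4):
    """Combine rater votes into one label.

    votes: list of booleans, True meaning the rater predicted "satisfied".
    The question is labeled satisfied only if at least min_satisfied
    raters said so; otherwise it is labeled unsatisfied.
    """
    return sum(1 for v in votes if v) >= min_satisfied
```

With five raters and the default threshold, a 3-to-2 majority for "satisfied" is still labeled unsatisfied.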

Some Results
 Details of CoCQA implementation

• Base classifier
  – LibSVM
• Term Frequency as Term Weight
  – Also tried Binary, TF*IDF
• Select top K examples with highest
  confidence
  – Margin value in SVM
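
The three term-weighting options mentioned above can be sketched as follows. The TF*IDF variant shown, tf * log(N / df), is one common formulation and an assumption here; the slides do not state which one was tried. The function name is hypothetical.

```python
import math
from collections import Counter


def term_weights(docs, scheme="tf"):
    """Weight the terms of each tokenised document.

    docs: list of token lists. scheme: "tf", "binary" or "tfidf".
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        if scheme == "tf":
            weights = dict(tf)
        elif scheme == "binary":
            weights = {term: 1 for term in tf}
        elif scheme == "tfidf":
            weights = {term: count * math.log(n_docs / doc_freq[term])
                       for term, count in tf.items()}
        else:
            raise ValueError("unknown scheme: " + scheme)
        weighted.append(weights)
    return weighted
```

Under this TF*IDF variant a term that occurs in every document gets weight 0, whereas plain term frequency keeps it.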


                    Feature Set
• Character 3-grams
   – has, any, nyo, yon, one…
• Words
   – Has, anyone, got, mom, she, finds…
• Word with Character 3-grams
• Word n-grams (n<=3, i.e. W_i, W_i W_{i+1},
     W_i W_{i+1} W_{i+2})
   – Has anyone got, anyone got one, she finds it…
• Word and POS n-grams (n<=3, i.e. W_i, W_i W_{i+1}, W_i POS_{i+1},
  POS_i W_{i+1}, POS_i POS_{i+1}, etc.)
   – NP VBP, She PRP, VBP finds…
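
A minimal sketch of the character 3-gram and word n-gram features listed above. It assumes the text is already tokenised into words, and that character 3-grams are taken within each lowercased word, which is inferred from the slide's example (has, any, nyo, yon, one). The POS-based features are omitted because they need a tagger. Function names are hypothetical.

```python
def char_ngrams(words, n=3):
    """Character n-grams taken inside each lowercased word."""
    return [word.lower()[i:i + n]
            for word in words
            for i in range(len(word) - n + 1)]


def word_ngrams(words, max_n=3):
    """All word n-grams for n = 1..max_n, joined with single spaces."""
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]
```

For the tokens of "Has anyone got one", `word_ngrams` yields the single words plus "Has anyone", "anyone got", "got one", "Has anyone got" and "anyone got one".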

						