Modeling User Interactions
in Social Media
Eugene Agichtein
Emory University
Outline
• Online information access landscape
• User-generated content
• Modeling, searching and mining social media
• Open problems
Social Media Today
Published:
4Gb/day
Social Media:
10Gb/Day
Page views:
180-200Gb/day
Technorati:
112M blogs
1.6M posts/day
Blogpulse:
78M blogs
750K posts/day
Twitter: since 11/07:
~2M users
~3M msgs/day
Facebook/Myspace:
200-300M users
Average 19 min/day
Yahoo Answers:
90M active users, ~20M questions, ~200M answers
[From Andrew Tomkins/Yahoo!, SSM2008 Keynote]
Trends in search and social media
• Search in the East:
– Heavily influenced by social media: Naver, Baidu Knows, TaskCn, ..
• Search in the West:
– Content licensing industry, but typically around traditional media
– Social media mostly crawlable, integrated in search repositories
• Two opposite trends in social media search:
– Moving towards point relevance (answers, knowledge search)
– Moving towards browsing experience, subscription/push model
• How to integrate “active” engagement and contribution with
“passive” search/browse?
Where is the nearest car rental
to Carnegie Mellon University?
Winning Search Strategy
• Lookup CMU address/zipcode
• Google maps
• Query: “car rental near:5000 Forbes Avenue
Pittsburgh, PA 15213”
Total time: 7-10 minutes,
active “work”
Someone must have done this
before…
+0 minutes : 11pm
+1 minute
+36 minutes
+7 hours:
perfect answer
Why would one wait hours?
• Effective use of time
• Unique information need
• Subjective/normative question
• Complex
• Human contact/community
• Multiple viewpoints
http://answers.yahoo.com/question/index;_ylt=3?qid=20071008115118AAh1HdO
Challenges in ____ing Social Media
• Estimating contributor expertise
• Estimating content quality
• Inferring user intent
• Predicting satisfaction: general, personalized
• Matching askers with answerers
• Searching archives
• Detecting spam
Work done in collaboration with:
Abulimiti Aji, Qi Guo, Yandong Liu, Pawel Jurczyk,
Jiang Bian, Prof. Hongyuan Zha
Yahoo! Research: ChaTo Castillo, Gilad Mishne,
Aris Gionis, Debora Donato, Ravi Kumar
Thanks to:
Estimating Contributor Authority
P. Jurczyk and E. Agichtein, Discovering Authorities in Question Answer
Communities Using Link Analysis (poster), CIKM 2007
[Figure: question-answer graph (Questions 1-3, Answers 1-6) and the derived user graph over Users 1-6, with askers as hubs and answerers as authorities]
Authority (answerer): $A(j) = \sum_{i=0}^{M} H(i)$
Hub (asker): $H(i) = \sum_{j=0}^{K} A(j)$
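The hub and authority updates can be sketched as a short power iteration. This is a minimal illustration, not the authors' implementation: it assumes each edge is an (asker, answerer) pair meaning the answerer replied to one of the asker's questions, and the function name `hits`, the L2 normalization, and the fixed iteration count are all assumptions.

```python
from collections import defaultdict
from math import sqrt

def hits(edges, iterations=50):
    """Return (hub scores for askers, authority scores for answerers).

    edges: iterable of (asker, answerer) pairs (hypothetical input format).
    """
    edges = list(edges)
    hubs = {asker: 1.0 for asker, _ in edges}
    auths = {answerer: 1.0 for _, answerer in edges}
    for _ in range(iterations):
        # A(j): sum of H(i) over askers i whose questions j answered
        new_auths = defaultdict(float)
        for asker, answerer in edges:
            new_auths[answerer] += hubs[asker]
        # H(i): sum of A(j) over answerers j who answered i's questions
        new_hubs = defaultdict(float)
        for asker, answerer in edges:
            new_hubs[asker] += new_auths[answerer]
        # L2-normalize so the scores stay bounded (an assumed choice)
        a_norm = sqrt(sum(v * v for v in new_auths.values())) or 1.0
        h_norm = sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        auths = {k: v / a_norm for k, v in new_auths.items()}
        hubs = {k: v / h_norm for k, v in new_hubs.items()}
    return hubs, auths

# Toy graph: u1 and u2 ask, u3..u6 answer (made-up data)
edges = [("u1", "u3"), ("u1", "u4"), ("u2", "u4"), ("u2", "u5"), ("u2", "u6")]
hubs, auths = hits(edges)
```

In this toy graph the answerer who replies to both askers ends up with the highest authority score, and the asker who attracts more answerers gets the higher hub score.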
Finding Authorities: Results
Qualitative Observations
HITS effective
HITS ineffective
Trolls
Estimating Content Quality
E. Agichtein, C. Castillo, D. Donato, A. Gionis, G. Mishne,
Finding High Quality Content in Social Media, WSDM 2008
Community
Editorial Quality != Popularity != Usefulness
Yahoo! Answers: Time to Fulfillment
[Figure: bar chart of time to close a question (hours, y-axis 0-40) for sample question categories 1-10]
1. 2006 FIFA World Cup
2. Optical
3. Poetry
4. Football (American)
5. Scottish Football (Soccer)
6. Medicine
7. Winter Sports
8. Special Education
9. General Health Care
10. Outdoor Recreation
Predicting Asker Satisfaction
Y. Liu, J. Bian, and E. Agichtein, Predicting Information
Seeker Satisfaction in Community Question Answering,
in SIGIR 2008
Yandong Liu, Jiang Bian
Given a question submitted by an asker in CQA,
predict whether the user will be satisfied with the
answers contributed by the community.
– “Satisfied” :
• The asker has closed the question AND
• Selected the best answer AND
• Rated best answer >= 3 “stars”
– Else, “Unsatisfied”
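The labeling rule above can be written as a small predicate. This is only a sketch: the function and parameter names are hypothetical, and the rating is assumed to be an integer number of stars.

```python
def is_satisfied(closed: bool, best_answer_selected: bool, best_answer_rating: int) -> bool:
    """Label a question "satisfied" only when all three conditions hold."""
    return closed and best_answer_selected and best_answer_rating >= 3
```

Any question that fails one of the three conditions, including one that was never closed by the asker, falls into the "unsatisfied" class.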
Motivation
• Save time: don’t bother to post
• Suggest a good forum for information need
• Notify user when satisfactory answer contributed
• From “relevance” to information need fulfillment
• Explicit ratings from asker & community
ASP: Asker Satisfaction Prediction
[Diagram: inputs labeled Question, Answer, Asker History, Answerer History, Category, and Text (with Wikipedia and News) feed a Classifier that outputs “asker is satisfied” or “asker is not satisfied”]
Datasets
Crawled from Yahoo! Answers in early 2008 (Thanks, Yahoo!)
Questions   Answers     Askers    Categories   % Satisfied
216,170     1,963,615   158,515   100          50.7%
Available at
http://ir.mathcs.emory.edu/shared
Dataset Statistics
Category                  #Q     #A      #A per Q   Satisfied   Avg asker rating   Time to close by asker
2006 FIFA World Cup(TM)   1194   35659   29.86      55.4%       2.63               47 minutes
Mental Health             151    1159    7.68       70.9%       4.30               1 day and 13 hours
Mathematics               651    2329    3.58       44.5%       4.48               33 minutes
Diet & Fitness            450    2436    5.41       68.4%       4.30               1.5 days
Asker satisfaction varies by category
#Q, #A, Time to close… -> Asker Satisfaction
Satisfaction Prediction: Human Judges
• Truth: asker’s rating
• A random sample of 130 questions
• Researchers
– Agreement: 0.82
– F1: 0.45
• Amazon’s Mechanical Turk
– Five workers per question.
– Agreement: 0.9; F1: 0.61
– Best when at least 4 out of 5 raters agree
ASP vs. Humans (F1)
Classifier          With Text   Without Text   Selected Features
ASP_SVM             0.69        0.72           0.62
ASP_C4.5            0.75        0.76           0.77
ASP_RandomForest    0.70        0.74           0.68
ASP_Boosting        0.67        0.67           0.67
ASP_NB              0.61        0.65           0.58
Best Human Perf     0.61
Baseline (naïve)    0.66
ASP is significantly more effective than humans
Human F1 is lower than the naïve baseline!
Features by Information Gain
• 0.14219 Q: Asker’s previous rating
• 0.13965 Q: Average past rating by asker
• 0.10237 UH: Member since (interval)
• 0.04878 UH: Average # answers for past Q
• 0.04878 UH: Previous Q resolved for the asker
• 0.04381 CA: Average rating for the category
• 0.04306 UH: Total number of answers received
• 0.03274 CA: Average voter rating
• 0.03159 Q: Question posting time
• 0.02840 CA: Average # answers per Q
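The scores in this ranking are information-gain values: the reduction in label entropy once a feature's value is known. A minimal sketch of that computation follows, assuming discrete feature values (continuous features such as ratings would first need to be binned); the function names are illustrative and not taken from the ASP system.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Made-up toy data: one feature that separates the classes, one that does not
labels = ["sat", "sat", "unsat", "unsat"]
informative = ["high", "high", "low", "low"]
uninformative = ["a", "b", "a", "b"]
```

A feature that separates the two classes perfectly recovers the full label entropy, while a feature that is independent of the label scores zero.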
“Offline” vs. “Online” Prediction
• Offline prediction:
– All features (question, answer, asker & category)
– F1: 0.77
• Online prediction:
– NO answer features
– Only asker history and question features (stars,
#comments, sum of votes…)
– F1: 0.74
Feature Ablation
Feature set                    Precision   Recall   F1
Selected features              0.80        0.73     0.77
No question-answer features    0.76        0.74     0.75
No answerer features           0.76        0.75     0.75
No category features           0.75        0.76     0.75
No asker features              0.72        0.69     0.71
No question features           0.68        0.72     0.70
Asker & Question features are most important.
Answer quality, answerer expertise, and category characteristics may not be important:
caring or supportive answers are often preferred.
Satisfaction: varying by asker experience
Group together questions from askers with the same
number of previous questions
Accuracy of prediction increases dramatically,
reaching F1 of 0.9 for askers with >= 5 questions
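The grouping step can be sketched as follows. This is a hedged illustration only: the tuple layout `(asker_id, timestamp, satisfied)` and the function name are assumptions, and the per-group training of a classifier is left out.

```python
from collections import defaultdict

def group_by_history(questions):
    """Bucket questions by how many questions the asker had posted before.

    questions: list of (asker_id, timestamp, satisfied) tuples (hypothetical format).
    Returns {number_of_previous_questions: [question, ...]}.
    """
    seen = defaultdict(int)      # questions already posted, per asker
    groups = defaultdict(list)
    for asker_id, timestamp, satisfied in sorted(questions, key=lambda q: q[1]):
        groups[seen[asker_id]].append((asker_id, timestamp, satisfied))
        seen[asker_id] += 1
    return dict(groups)

# Made-up sample: asker "a" posts three questions, asker "b" posts two
sample = [("a", 1, True), ("b", 2, False), ("a", 3, True), ("a", 5, False), ("b", 4, True)]
groups = group_by_history(sample)
```

Each bucket can then be used to train and evaluate a separate model, so that a question from an asker with five earlier questions is scored by a model built from similarly experienced askers.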
Personalized Prediction of Asker
Satisfaction with info
• Same information != same usefulness for different users!
• Personalized classifier achieves surprisingly good
accuracy (even with just 1 previous question!)
• Simple strategy of grouping users by number of previous
questions is more effective than other methods for users
with moderate amount of history
• For users with >= 20 questions, textual features are
more significant
Grouping Users by “Age”
Some Personalized Models
Summary
• Asker satisfaction is predictable
– Can achieve higher than human accuracy by
exploiting interaction history
• User’s experience is important
• General model: one-size-fits-all
– 2000 questions for training model are enough
• Personalized satisfaction prediction:
– Helps with sufficient data (>= 1 prev interactions, can
observe text patterns with >=20 prev. interactions)
Subjectivity in CQA
B. Li, Y. Liu, and E. Agichtein, CoCQA: Co-Training Over Questions
and Answers with an Application to Predicting Question
Subjectivity Orientation, in EMNLP 2008
• How can we exploit structure of CQA to
improve question classification?
• Case Study: Question Subjectivity Prediction
– Subjective: Has anyone got one of those
home blood pressure monitors? and if so
what make is it and do you think they are
worth getting?
– Objective: What is the difference between
chemotherapy and radiation treatments?
Dataset Statistics (~1000 questions)
http://ir.mathcs.emory.edu/shared/
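[Figure: pie charts of the Objective vs. Subjective share of questions in the Arts, Science, Education, Sports, and Health categories. Slice values as extracted, without their category labels: 30%, 48%, 36%, 70%, 52%, 64%, 34%, 21%, 36%, 66%, 64%, 79%]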
Key Observations
• Analysis of real questions in CQA is challenging:
– Typically complex and subjective
– Can be ill-phrased and vague
– Not enough annotated data
• Idea:
– Can we utilize the inherent structure of the CQA
interactions, and use unlabeled CQA data to
improve classification performance?
Natural Approach: Co-Training
• Introduced in:
– Combining labeled and unlabeled data with co-training, Blum and Mitchell, 1998
• Two views of the data
– E.g.: content and hyperlinks in web pages
• Provide complementary information
• Iteratively construct additional labeled data
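The co-training loop outlined above can be sketched in a few lines. This is an illustration under stated assumptions rather than any published implementation: a toy signed token-count scorer stands in for a real base classifier, a fixed number of rounds replaces validation-based stopping, and the names `train`, `score`, and `co_train` are hypothetical.

```python
from collections import Counter

def train(examples):
    """examples: list of (tokens, label), label in {+1, -1}. Returns token weights."""
    weights = Counter()
    for tokens, label in examples:
        for tok in set(tokens):
            weights[tok] += label
    return weights

def score(weights, tokens):
    """Signed confidence: the sign is the predicted label, the size the confidence."""
    return sum(weights[tok] for tok in set(tokens))

def co_train(labeled, unlabeled, rounds=5, top_k=1):
    """labeled: (view0_tokens, view1_tokens, label); unlabeled: (view0_tokens, view1_tokens)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        models = (train([(v0, y) for v0, v1, y in labeled]),
                  train([(v1, y) for v0, v1, y in labeled]))
        picked = {}
        for view, model in enumerate(models):
            # Each view labels the pool examples it is most confident about
            ranked = sorted(range(len(pool)),
                            key=lambda i: abs(score(model, pool[i][view])),
                            reverse=True)
            for i in ranked[:top_k]:
                s = score(model, pool[i][view])
                if s != 0 and i not in picked:
                    picked[i] = 1 if s > 0 else -1
        if not picked:
            break
        labeled += [(pool[i][0], pool[i][1], y) for i, y in picked.items()]
        pool = [ex for i, ex in enumerate(pool) if i not in picked]
    view0_model = train([(v0, y) for v0, v1, y in labeled])
    view1_model = train([(v1, y) for v0, v1, y in labeled])
    return view0_model, view1_model, labeled

# Made-up two-view data: +1 and -1 are arbitrary class labels
seed = [(["do", "you", "think"], ["i", "think", "so"], 1),
        (["what", "is", "the", "difference"], ["the", "difference", "is"], -1)]
pool = [(["do", "you", "think", "they", "are", "worth"], ["my", "mom", "finds", "it", "useful"]),
        (["what", "is", "chemotherapy"], ["it", "is", "a", "treatment"])]
view0_model, view1_model, grown = co_train(seed, pool)
```

The point of the two views is visible even in this toy run: the first view labels an example whose second view shares no tokens with the seed data, and the second-view model then learns from tokens it could not have labeled on its own.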
Questions and Answers: Two Views
• Example:
– Q: Has anyone got one of those home blood
pressure monitors? and if so what make is it and
do you think they are worth getting?
– A: My mom has one as she is diabetic so its
important for her to monitor it she finds it useful.
• Answers usually match/fit question
– My mom… she finds…
• Askers can usually identify matching answers
by selecting the “best answer”
CoCQA: A Co-Training Framework over Questions and Answers
[Diagram: classifiers C_Q (question view) and C_A (answer view) are trained on labeled data and classify unlabeled data; newly labeled (+/-) examples are added back to the labeled data; validation on holdout training data decides when to stop]
Semi-Supervised Learning: Adding unlabeled data
Method       Features: Question   Features: Question + Best Answer
Supervised   0.717                0.695
GE           0.712 (-0.7%)        0.717 (+3.2%)
CoCQA        0.731 (+1.9%)        0.745 (+7.2%)
CoCQA for varying amount of labeled data
[Figure: F1 (y-axis, 0.52-0.72) vs. # of labeled data used (x-axis, 50-400) for CoCQA (Question + Best Answer) and Supervised Q_Best Ans]
Summary
• User-generated Content
– Growing
– Important: impact on main-stream media,
scholarly publishing, …
– Can provide insight into information seeking and
social processes
– “Training” data for IR, machine learning, NLP, ….
– Need to re-think quality, impact, usefulness
Current work
• Intelligently route a question to “good” answerers
• Improve web search ranking by incorporating
CQA data
• “Cost” models for CQA-based question processing vs. other methods
• Rating dynamics
• Discourse analysis
• Cross-cultural comparisons
Takeaways
• People specify their information need fully when
they know humans are on the other end
• Next generation of search must be able to cope with
complex, subjective, and personal information needs
• To move beyond relevance, must be able to model
user satisfaction
• CQA provides insights into community “health”,
growth, engagement, quality of experience.
Thank you!
http://www.mathcs.emory.edu/~eugene
• Estimating contributor expertise [CIKM 2007]
• Estimating content quality [WSDM 2008]
• Inferring asker intent [EMNLP 2008]
• Predicting satisfaction [SIGIR 2008, ACL 2008]
• Matching askers with answerers
• Searching CQA archives [WWW 2008]
• Detecting spam [AIRWeb 2008]
Backup Slides
Question-Answer Features
• Q: length, posting time…
• Q: Terms
• QA: length, KL divergence
• Q: Votes
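The KL-divergence feature listed for question-answer pairs can be illustrated with a small sketch. The direction of the divergence, the add-alpha smoothing, and the whitespace-style token lists are all assumptions made for this example; the slides do not specify them.

```python
from collections import Counter
from math import log

def kl_divergence(p_tokens, q_tokens, alpha=1.0):
    """KL(P || Q) between smoothed term distributions of two token lists."""
    vocab = set(p_tokens) | set(q_tokens)
    p_counts, q_counts = Counter(p_tokens), Counter(q_tokens)
    # Add-alpha smoothing keeps every probability non-zero (assumed choice)
    p_total = len(p_tokens) + alpha * len(vocab)
    q_total = len(q_tokens) + alpha * len(vocab)
    kl = 0.0
    for term in vocab:
        p = (p_counts[term] + alpha) / p_total
        q = (q_counts[term] + alpha) / q_total
        kl += p * log(p / q)
    return kl
```

Identical token lists give a divergence of zero, and the value grows as the answer's vocabulary drifts away from the question's.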
User Features
U: Member since
U: Total points
U: #Questions
U: #Answers
Category Features
• CA: Average time to close a question
• CA: Average # answers per question
• CA: Average asker rating
• CA: Average voter rating
• CA: Average # questions per hour
• CA: Average # answers per hour
Category         #Q    #A    #A per Q   Satisfied   Avg asker rating   Time to close by asker
General Health   134   737   5.46       70.4%       4.49               1 day and 13 hours
Self-Selection: First Experience Crucial
Days as member vs. rating
# prev questions vs. rating
Prediction Methods
• Heuristic: # answers
• Baseline: guess the majority class (satisfied).
• ASP: (our system)
• ASP_SVM: Our system with the SVM classifier
• ASP_C4.5: with the C4.5 classifier
• ASP_RandomForest: with the RandomForest classifier
• ASP_Boosting: with the AdaBoost algorithm combining
weak learners
• ASP_NaiveBayes: with the Naive Bayes classifier
• …
Satisfaction Prediction: Human Perf (Cont’d): Amazon
Mechanical Turk
• Methodology
– Used the same 130 questions
– For each question, list the best answer, as well as
other four answers ordered by votes
– Five independent raters for each question.
– Agreement: 0.9; F1: 0.61
– Best accuracy achieved when at least 4 out of 5
raters predicted asker to be ‘satisfied’ (otherwise,
labeled as “unsatisfied”).
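The aggregation rule for the five raters can be sketched as a simple threshold vote. The function name and the boolean encoding of each rater's prediction are assumptions for illustration.

```python
def aggregate_votes(votes, min_agree=4):
    """votes: list of booleans, True when a rater predicts the asker is satisfied.

    Returns True ("satisfied") only if at least min_agree raters said so.
    """
    return sum(1 for v in votes if v) >= min_agree
```

With five raters and a threshold of four, a 3-to-2 split in favor of "satisfied" is still labeled "unsatisfied".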
Some Results
Details of CoCQA implementation
• Base classifier
– LibSVM
• Term Frequency as Term Weight
– Also tried Binary, TF*IDF
• Select top K examples with highest
confidence
– Margin value in SVM
Feature Set
• Character 3-grams
– has, any, nyo, yon, one…
• Words
– Has, anyone, got, mom, she, finds…
• Word with Character 3-grams
• Word n-grams (n<=3, i.e. W_i, W_i W_{i+1}, W_i W_{i+1} W_{i+2})
– Has anyone got, anyone got one, she finds it…
• Word and POS n-grams (n<=3, i.e. W_i, W_i W_{i+1}, W_i POS_{i+1}, POS_i W_{i+1}, POS_i POS_{i+1}, etc.)
– NP VBP, She PRP, VBP finds…
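Two of the feature types above, character 3-grams and word n-grams, can be sketched as follows. This is a hedged example: lowercasing and dropping spaces and punctuation before taking character n-grams is an assumption chosen to match the slide's sample grams, and the mixed word/POS n-grams are not covered because they would need a part-of-speech tagger.

```python
def char_ngrams(text, n=3):
    """Character n-grams over the lowercased text with non-alphanumerics removed."""
    chars = "".join(ch.lower() for ch in text if ch.isalnum())
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]

def word_ngrams(tokens, max_n=3):
    """All word n-grams for n = 1..max_n, joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

grams = char_ngrams("Has anyone")
phrases = word_ngrams(["Has", "anyone", "got", "one"])
```

Each resulting string becomes one feature, for example weighted by its term frequency in the question or answer.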