Analyzing the Political Blogosphere
19 March 2007
Social Media
• “Social media describes the online tools and platforms that people use to share opinions, insights, experiences, and perspectives” - Wikipedia
• Level of user participation and thought sharing across varied topics
6/1/2008
Page 2
Blogs – Essence of Social Media
Blogs spread new ideas and information rapidly
Knowing & Influencing your Audience
• Your goal is to campaign for a presidential candidate
• How can you track the buzz about him/her?
• What are the relevant communities and blogs?
• Which communities are supporters, which are skeptical, which are put off by the hype?
• Is your campaign having an effect? The desired effect?
• Which bloggers are influential with political audiences? Of these, which are already on board and which are lost causes?
• To whom should you send details, and with whom should you talk?
Influence Detection
• Often voters are influenced by opinions and reviews on blogs
• Detecting influential nodes and their role in how people perceive a political party could be an important tool during campaigning
• Using topic, social structure, opinions, biases, and temporal information, we can develop an accurate model for influence
Influence in Communities
http://michellemalkin.com/
http://instapundit.com
http://dailykos.com
http://volokh.com
http://crooksandliars.com
http://rightwingnews.com
Communities detected using “Fast algorithm for detecting community structure in networks”, M. E. J. Newman
Influence of MSM
Citation count alone is not an indicator of influence; who cites is a factor.
Using a list of 130 Democratic and 140 Republican blogs
Computing Influence of MSM
For Democratic Citations
Score(i) = Pd(i) · log(Pd(i) / Pr(i)) · Nd(i), where
• i is the MSM source
• Pd(i): probability that a Democratic blog links to MSM i
• Pr(i): probability that a Republican blog links to MSM i
• Nd(i): number of distinct Democratic blogs linking to i
A similar ranking is computed for Republican blogs
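A minimal sketch of the score above, assuming Pd(i) and Pr(i) are estimated as the fraction of blogs in each list that link to source i. The function name, the input layout (blog name mapped to the set of MSM sources it links to), and the `eps` floor that avoids division by zero are illustrative assumptions, not part of the original slides.

```python
import math


def msm_scores(own_links, other_links, eps=1e-6):
    """Score MSM sources for one camp (e.g. Democratic blogs).

    own_links / other_links: dict mapping blog name -> set of MSM sources
    that the blog links to (hypothetical input layout).
    """
    n_own = max(len(own_links), 1)
    n_other = max(len(other_links), 1)
    sources = set().union(*own_links.values())
    scores = {}
    for src in sources:
        # Nd(i): number of distinct blogs in our camp linking to the source
        n_d = sum(1 for linked in own_links.values() if src in linked)
        n_r = sum(1 for linked in other_links.values() if src in linked)
        p_d = n_d / n_own
        # Assumed smoothing: floor Pr(i) so the ratio stays finite
        p_r = max(n_r / n_other, eps)
        scores[src] = p_d * math.log(p_d / p_r) * n_d
    return scores


# Toy data, invented for illustration only
dem = {"d1": {"nyt", "wapo"}, "d2": {"nyt"}, "d3": {"nyt", "fox"}}
rep = {"r1": {"fox"}, "r2": {"fox", "wapo"}, "r3": {"nyt"}}
dem_scores = msm_scores(dem, rep)
```

Swapping the two arguments gives the corresponding ranking for the other camp. A source linked equally often by both camps scores zero, and a source favored by the other camp scores negative.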
Opinions in Social Media
“I went to school early so I would have time to grab some lunch. Which ended up consisting of a crappy sandwich from starbucks and a chai latte. Lacey came into Starbucks while I was there so we chatted for a little bit and she thought that I might be in her class. After I finished eating I headed to school and checked the board…” [1]
[1] http://annamay13x.livejournal.com/7061.html
TREC 06: Finding opinionated posts, either positive or negative, about a query
2006 TREC Blog corpus:
• 80K blogs
• 300K posts
• 50 test queries
[Figure labels: Reader’s Perspective; Narrative; Expressed Opinions; “Starbucks Sandwiches are bad!”]
Challenges: open-domain sentiment words, slang, subject
Opinions can affect customers’ buying decisions
Finding Feeds That Matter
Analysis of Bloglines Feeds
• 83K publicly listed subscribers
• 2.8M feeds, of which 500K are unique
• 26K users (35%) use folders to organize subscriptions
• Data collected in May 2006
[Figure panels: Before Merge; After Merge]
http://ftm.umbc.edu
Finding Feeds That Matter
Top Feeds for “Politics” (Merging: “political”, “political blogs”)
• Talking Points Memo: by Joshua Micah Marshall
• Daily Kos: State of the Nation
• Eschaton
• The Washington Monthly
• Wonkette, Politics for People with Dirty Minds
• http://instapundit.com/
• Informed Comment
• Power Line
• AMERICAblog: Because a great nation deserves the truth
• Crooks and Liars
Finding Feeds That Matter
Tag-Based Feed Recommender: feeds under similar folder names
http://www.dailykos.com
• Recommended Feeds
http://www.andrewsullivan.com/index.php
http://www.talkingpointsmemo.com/
http://atrios.blogspot.com
http://jameswolcott.com/
http://mediamatters.org/
http://yglesias.typepad.com/matthew/
http://billmon.org/
http://digbysblog.blogspot.com
http://instapundit.com/
http://www.washingtonmonthly.com/
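One way such a recommender could work, sketched under assumptions: each feed is described by the folder names subscribers file it under, and feeds are ranked by cosine similarity of those folder-name counts. The similarity measure, function names, and toy subscription data are illustrative choices, not the method confirmed by the slides.

```python
import math
from collections import Counter, defaultdict


def folder_profiles(subscriptions):
    """Map each feed to a Counter of the (normalized) folder names it appears under."""
    profiles = defaultdict(Counter)
    for folder, feed in subscriptions:
        profiles[feed][folder.strip().lower()] += 1
    return profiles


def cosine(a, b):
    """Cosine similarity between two Counters of folder names."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(seed, subscriptions, top_n=10):
    """Return feeds filed under folder names similar to the seed feed's folders."""
    profiles = folder_profiles(subscriptions)
    if seed not in profiles:
        return []
    scored = [
        (cosine(profiles[seed], profile), feed)
        for feed, profile in profiles.items()
        if feed != seed
    ]
    scored = [(score, feed) for score, feed in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [feed for _, feed in scored[:top_n]]


# Toy (folder, feed) pairs, invented for illustration
subs = [
    ("politics", "http://www.dailykos.com"),
    ("liberal", "http://www.dailykos.com"),
    ("politics", "http://www.talkingpointsmemo.com/"),
    ("liberal", "http://www.talkingpointsmemo.com/"),
    ("Politics", "http://instapundit.com/"),
    ("tech", "http://example.org/gadgets"),
]
recommended = recommend("http://www.dailykos.com", subs)
```

Feeds that share no folder name with the seed are dropped rather than ranked last, so unrelated feeds never appear in the list.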
Finding Influential Feeds using “Co-Citations”
Feed recommendations
www.dailykos.com
Blogs influenced by seed set
Leading blogs about “Politics”. The seed set is the top blogs in “politics” from Bloglines, and the blog graph is from the Blogpulse dataset.
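A rough sketch of co-citation scoring, assuming a candidate blog gains weight each time some blog links to both the candidate and a member of the seed set. The slides do not spell out the exact scoring, so the weighting, function name, and toy edge list are assumptions.

```python
from collections import Counter


def cocitation_scores(edges, seeds):
    """Score non-seed blogs by how often they are cited alongside seed blogs.

    edges: iterable of (citing_blog, cited_blog) pairs from the blog graph.
    seeds: blogs already known to be leading in the topic.
    """
    seeds = set(seeds)
    cited_by = {}
    for citing, cited in edges:
        cited_by.setdefault(citing, set()).add(cited)

    scores = Counter()
    for cited in cited_by.values():
        seed_hits = len(cited & seeds)
        if seed_hits == 0:
            continue  # this blog cites no seed, so it carries no co-citation signal
        for target in cited - seeds:
            # Assumed weighting: one point per seed cited together with the target
            scores[target] += seed_hits
    return scores


# Toy graph, invented for illustration
edges = [
    ("a", "kos"), ("a", "x"),
    ("b", "kos"), ("b", "tpm"), ("b", "x"),
    ("c", "x"), ("c", "y"),
    ("d", "tpm"), ("d", "y"),
]
scores = cocitation_scores(edges, seeds={"kos", "tpm"})
```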
Link Polarity / Bias
• Linking alone is not an indicator of influence
• Polarity can indicate the type of influence
• Consistent negative / positive opinion over a period of time can indicate bias
• Link polarity/citation signal can also be helpful in determining trust
Democrat Blog
Republican Blog
Modeling Influence Using Link Polarity
Motivation
• Growing interest in exploring the role of communities in social media
• Better community detection algorithms using sentiment associated with links
• Convert a sparsely connected blog graph into a densely connected one with a sentiment weight attached to every link
Approach
• Link Polarity: Analyze post text surrounding links to determine the bias of bloggers about each other
• Trust Propagation: Use trust propagation models to spread the polarity from a small subset of “connected” bloggers to all bloggers
Experiments
• Study the political blogosphere with the goal of classifying blogs as left- or right-leaning
• Bias detection using positive/neutral/negative scores from influential bloggers (high in-link blogs) in both communities
• Validation with a hand-labeled dataset indicates ~60% correct classification
Bird’s Eye View – Step 1
[Figure: graph with nodes A, B, C, D, E, F and foo]
Bird’s Eye View – Step 2
[Figure: graph with nodes A, B, C, D, E, F and foo; link annotations: C “He is great”, D “I like him”, “What crap!”, “ridiculous”; legend: -ve bias, +ve bias]
Bird’s Eye View – Step 3
[Figure: graph with nodes A, B, C, D, E, F and foo; legend: -ve bias, +ve bias]
Bird’s Eye View – Step 4
[Figure: graph with nodes A, B, C, D, E, F and foo; legend: -ve bias, +ve bias]
Link Polarity Example
• “Stephen Colbert's performance at the White House Correspondents' Association dinner has garnered him huge applause in the blogosphere and also on C-Span where it was shown more than once. Those of us who have been angry with Bush for quite some time because of his arrogant and feckless corruption of our country were even more thrilled to see and know that he had no recourse but to sit there and watch his aspirations for greatness be destroyed by a master of irony. This will be his legacy: I stand by this man. I stand by this man because he stands for things. Not only for things, he stands on things. Things like aircraft carriers and rubble and recently flooded city squares. And that sends a strong message, that no matter what happens to America, she will always rebound -- with the most powerfully staged photo ops in the world. We who have been watching Stephen Colbert eviscerate politicians that have come on his show knew he was a gifted comedian. But it took Saturday's dinner to demonstrate how incredibly effective the art form Colbert has chosen is for exposing the Potemkin Regime Bush and his henchmen have created. Rove and the right wing machine have no answer to the performance but to say "it bombed", "it wasn't funny", and to hope that by ignoring it, the caustic cleansing agent it has lobbed into their camp can be contained. Yet, the Republican spinmeisters are the masters of spin.”[2]
This - http://dailykos.com/storyonly/2006/4/30/1441/59811
Np = 8, Nn = 4; Polarity = (Np - Nn) / (Np + Nn) = 0.33
[2]http://www.pacificviews.org/weblog/archives/001989.html
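A small sketch of the polarity computation, assuming polarity is the normalized difference (Np - Nn) / (Np + Nn), which reproduces the 0.33 reported for Np = 8 and Nn = 4. The tiny word lists are hypothetical stand-ins for a real sentiment lexicon, and the function names are illustrative.

```python
import re

# Hypothetical mini-lexicons; a real system would use a full sentiment lexicon
POSITIVE = {"great", "gifted", "effective", "thrilled", "applause"}
NEGATIVE = {"arrogant", "feckless", "corruption", "bombed"}


def polarity_from_counts(n_pos, n_neg):
    """Normalized polarity in [-1, 1]; 0.0 when no opinion words were found."""
    total = n_pos + n_neg
    return (n_pos - n_neg) / total if total else 0.0


def link_polarity(text, positive=POSITIVE, negative=NEGATIVE):
    """Count positive (Np) and negative (Nn) words in the text around a link."""
    words = re.findall(r"[a-z']+", text.lower())
    n_pos = sum(word in positive for word in words)
    n_neg = sum(word in negative for word in words)
    return polarity_from_counts(n_pos, n_neg)


example = polarity_from_counts(8, 4)  # the counts from the example above
```

Returning 0.0 for text with no opinion words treats such links as neutral rather than raising an error; that is a design choice of this sketch.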
Trust Propagation
• Based on Guha et al.’s work on propagating trust and distrust [1]
• Mij represents bias from user i to j (0 <= Mij <= 1)
• Belief matrix M represents the initial set of known beliefs
• Mij can be based on the trust matrix (T), the distrust matrix (D), or a combination of trust and distrust (T-D) from i to j
• T = positive polarities and D = negative polarities
• Goal is to compute all unknown values in M
• Results from validation on the “Epinions” dataset are impressive
[1] Guha R, Kumar R, Raghavan P, Tomkins A. Propagation of trust and distrust. In: Proc. 13th Int. World Wide Web Conf., New York, NY, USA, May 2004. ACM Press, 2004.
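A compact sketch in the spirit of Guha et al. [1], assuming the combined propagation operator C = a1·T + a2·TᵀT + a3·Tᵀ + a4·TTᵀ (direct propagation, co-citation, transpose trust, trust coupling) applied repeatedly to the initial beliefs T - D. The weights, the step count, and the plain-list matrix helpers are illustrative assumptions, not the settings used in the experiments.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def weighted_sum(mats, weights):
    """Element-wise weighted sum of equally sized square matrices."""
    n = len(mats[0])
    return [
        [sum(w * m[i][j] for m, w in zip(mats, weights)) for j in range(n)]
        for i in range(n)
    ]


def propagate(trust, distrust, alpha=(0.4, 0.4, 0.1, 0.1), steps=3):
    """Spread known trust/distrust to unknown pairs; returns C^steps (T - D)."""
    t_t = transpose(trust)
    operator = weighted_sum(
        [trust, matmul(t_t, trust), t_t, matmul(trust, t_t)], alpha
    )
    belief = [[t - d for t, d in zip(t_row, d_row)]
              for t_row, d_row in zip(trust, distrust)]
    for _ in range(steps):
        belief = matmul(operator, belief)
    return belief


# Toy example: A trusts B and B trusts C; nothing is known about A -> C
T = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
D = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
after_one_step = propagate(T, D, steps=1)
```

In the toy example the entry for A -> C becomes positive after one step, which is the intended effect: polarity is inferred for a link that does not exist in the original graph.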
Experiments
Domain
• Political Blogosphere
• Dataset from Buzzmetrics [2] provides post-post links over 1.5M posts
• Few off-the-topic posts help aggregation
• Potential business value
Reference Dataset
• Adamic’s [3] hand-labeled dataset classifies blogs as right- or left-leaning
• Timeframe: 2004 presidential elections, over 1500 blogs analyzed
• Overlap of 300 blogs between the Buzzmetrics and reference datasets
Goal
• Classify the blogs in the Buzzmetrics dataset as Democratic or Republican and compare with the reference dataset
[2] Buzzmetrics – www.buzzmetrics.com
[3] Lada A. Adamic and Natalie Glance, "The political blogosphere and the 2004 US Election", in Proceedings of the WWW-2005 Workshop, May 2005.
Republican blogs classified more correctly
Effect of Link Polarity
Trust propagation on polar links more effective than on non-polar.
Link Polarity yields ~30% classification improvement
Convergence after 20 propagations
Effect of text window size
• Optimal window size is 750 characters for our experiments
• Small window size: non-opinionated phrases
• Large window size: analysis of unrelated text
• Specific to our experiments; the numbers may not generalize
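A minimal sketch of extracting the text window around a link. It assumes the 750 characters are split evenly before and after the link and that the link's character offsets in the post are already known; the slides do not say whether the window is total or per side, so both the split and the function name are assumptions.

```python
def link_window(post_text, link_start, link_end, window=750):
    """Return the link plus up to window/2 characters on each side of it.

    link_start / link_end: character offsets of the link within post_text.
    """
    half = window // 2
    start = max(0, link_start - half)            # clamp at the start of the post
    end = min(len(post_text), link_end + half)   # clamp at the end of the post
    return post_text[start:end]


# Toy post, invented for illustration
post = "a" * 1000 + "LINK" + "b" * 1000
snippet = link_window(post, 1000, 1004)
```

A smaller `window` keeps only the words nearest the link, while a larger one pulls in more surrounding text, which is the trade-off between too little opinionated text and too much unrelated text.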
Sample Data
• Trust propagation compensates for initial incorrect polarity (DK–AT)
• Trust propagation doesn’t change correct polarity (AT–DK)
• Trust propagation assigns correct polarity for nonexistent links (AT–IP)
• Numbers in italics are problematic (AT–MM)
• Make polarities below a threshold zero?
• Improve sentiment detection?
Future Work
• Bias and trustworthiness of MSM sources
• Trend extraction and meme tracking for political blogs
• Real-time classification of positive and negative opinions about presidential candidates
• Determining genres in political opinions using content analysis