					Analyzing the Political Blogosphere

19 March 2007

Social Media
• “Social media describes the online tools and platforms that people use to share opinions, insights, experiences, and perspectives” - Wikipedia
• Level of user participation and thought sharing across varied topics



Blogs – Essence of Social Media
Blogs spread new ideas and information rapidly



Knowing & Influencing your Audience
• Your goal is to campaign for a presidential candidate
• How can you track the buzz about him/her?
• What are the relevant communities and blogs?
• Which communities are supporters, which are skeptical, which are put off by the hype?
• Is your campaign having an effect? The desired effect?
• Which bloggers are influential with a political audience? Of these, which are already on board and which are lost causes?
• To whom should you send details or talk?
8/22/2008 Page 4

Influence Detection
• Voters are often influenced by opinions and reviews on blogs
• Detecting influential nodes and their role in how people perceive a political party could be an important tool during campaigning
• Using topic, social structure, opinions, biases, and temporal information, we can develop an accurate model of influence



Influence in Communities

Communities detected using “Fast algorithm for detecting community structure in networks”, M. E. J. Newman

Influence of MSM
Citation count alone is not an indicator of influence; who cites is also a factor. Using a list of 130 Democratic and 140 Republican blogs.



Computing Influence of MSM
For Democratic Citations
Score(i) = Pd(i)•log(Pd(i)/Pr(i))•Nd(i) where
• i is the MSM (mainstream media) source
• Pd(i): probability that a Democratic blog links to MSM i
• Pr(i): probability that a Republican blog links to MSM i
• Nd(i): number of distinct Democratic blogs linking to i

Similar ranking for Republican blogs
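
A minimal Python sketch of the score above. It assumes Pd(i) and Pr(i) are estimated as the fraction of Democratic and Republican blogs that link to source i, that the logarithm is natural, and that a small floor `eps` (not in the slides) avoids log(0) and division by zero. The function and argument names are hypothetical.

```python
import math

def msm_scores(links_dem, links_rep, n_dem, n_rep, eps=1e-6):
    """Score each MSM source from the Democratic side.

    links_dem / links_rep map an MSM source to the set of distinct
    Democratic / Republican blogs that link to it (assumed input format).
    n_dem / n_rep are the total numbers of blogs in each list.
    """
    scores = {}
    for source in set(links_dem) | set(links_rep):
        nd = len(links_dem.get(source, ()))   # Nd(i)
        nr = len(links_rep.get(source, ()))
        pd = max(nd / n_dem, eps)             # Pd(i), floored (assumption)
        pr = max(nr / n_rep, eps)             # Pr(i), floored (assumption)
        scores[source] = pd * math.log(pd / pr) * nd
    return scores

# Toy data, not from the slides.
example = msm_scores(
    links_dem={"source_a": {"d1", "d2"}, "source_b": {"d1"}},
    links_rep={"source_a": {"r1"}, "source_b": {"r1", "r2", "r3"}},
    n_dem=4,
    n_rep=4,
)
```

Swapping the two link maps (and the two totals) gives the corresponding ranking for Republican blogs.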



Opinions in Social Media
“I went to school early so I would have time to grab some lunch. Which ended up consisting of a crappy sandwich from starbucks and a chai latte. Lacey came into Starbucks while I was there so we chatted for a little bit and she thought that I might be in her class. After I finished eating I headed to school and checked the board……..”1

TREC 06: Finding opinionated posts, either positive or negative, about a query
2006 TREC Blog corpus:
• 80K blogs
• 300K posts
• 50 test queries
Challenges: open domain sentiment words, slang, subject
[Figure labels: Narrative; Reader’s Perspective: “Starbucks Sandwiches are bad!”; Expressed Opinions]

Opinions can affect the buying decisions of customers

Finding Feeds That Matter
Analysis of Bloglines Feeds
• 83K publicly listed subscribers
• 2.8M feeds, 500K are unique
• 26K users (35%) use folders to organize subscriptions
• Data collected in May 2006

Before Merge

After Merge



Finding Feeds That Matter
Top Feeds for “Politics” (Merging: “political”, “political blogs”)
• Talking Points Memo: by Joshua Micah Marshall
• Daily Kos: State of the Nation
• Eschaton
• The Washington Monthly
• Wonkette, Politics for People with Dirty Minds
• Informed Comment
• Power Line
• AMERICAblog: Because a great nation deserves the truth
• Crooks and Liars

Finding Feeds That Matter
Tag Based Feed Recommender: Feeds under similar folder names
• Recommended Feeds

Finding Influential Feeds using “Co-Citations”
Feed recommendations

Blogs influenced by seed set

Leading blogs about “Politics”. The seed set is the top blogs in “politics” from Bloglines; the blog graph used is from the Blogpulse dataset.

Link Polarity / Bias
• Linking alone is not an indicator of influence
• Polarity can indicate the type of influence
• Consistent negative / positive opinion over a period of time can indicate bias
• Link polarity/citation signal can also be helpful in determining trust

Democrat Blog

Republican Blog



Modeling Influence Using Link Polarity
• Growing interest in exploring the role of communities in social media
• Better community detection algorithms using sentiment associated with links
• Convert a sparsely connected blog graph into a densely connected one with a sentiment weight attached to every link

• Link Polarity: Analyze post text surrounding links to determine the bias of bloggers about each other
• Trust Propagation: Use trust propagation models to spread the polarity from a small subset of “connected” bloggers to all bloggers

• Study the political blogosphere with the goal of classifying blogs as left/right leaning
• Bias detection using positive/neutral/negative scores from influential bloggers (high in-link blogs) in both communities
• Validation with a hand-labeled dataset indicates ~60% correct classification

Bird’s Eye View – Step 1






Bird’s Eye View – Step 2
[Figure: blog nodes A, B, C, D, E; link text “He is great”, “I like him”, “What crap!”; legend: -ve bias, +ve bias]

Bird’s Eye View – Step 3


[Figure: node A; legend: -ve bias, +ve bias]



Bird’s Eye View – Step 4


[Figure: node A; legend: -ve bias, +ve bias]



Bird’s Eye View – Step 4


[Figure: node A; legend: -ve bias, +ve bias]



Link Polarity Example
• “Stephen Colbert's performance at the White House Correspondents' Association dinner has garnered him huge applause in the blogosphere and also on C-Span where it was shown more than once. Those of us who have been angry with Bush for quite some time because of his arrogant and feckless corruption of our country were even more thrilled to see and know that he had no recourse but to sit there and watch his aspirations for greatness be destroyed by a master of irony. This will be his legacy: I stand by this man. I stand by this man because he stands for things. Not only for things, he stands on things. Things like aircraft carriers and rubble and recently flooded city squares. And that sends a strong message, that no matter what happens to America, she will always rebound -- with the most powerfully staged photo ops in the world. We who have been watching Stephen Colbert eviscerate politicians that have come on his show knew he was a gifted comedian. But it took Saturday's dinner to demonstrate how incredibly effective the art form Colbert has chosen is for exposing the Potemkin Regime Bush and his henchmen have created. Rove and the right wing machine have no answer to the performance but to say "it bombed", "it wasn't funny", and to hope that by ignoring it, the caustic cleansing agent it has lobbed into their camp can be contained. Yet, the Republican spinmeisters are the masters of spin.”[2]

For this post: Np = 8, Nn = 4; Polarity = 0.33
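
The slide does not state the polarity formula. Its numbers are consistent with Polarity = (Np - Nn) / (Np + Nn), where Np and Nn are assumed to be the counts of positive and negative words in the text around the link. A minimal Python sketch under that assumption (the function name and the zero-count fallback are hypothetical):

```python
def link_polarity(n_pos, n_neg):
    """Polarity in [-1, 1] from positive and negative word counts.

    Assumed formula: (Np - Nn) / (Np + Nn). Returns 0.0 when no
    sentiment words were found (a fallback chosen for this sketch).
    """
    total = n_pos + n_neg
    if total == 0:
        return 0.0
    return (n_pos - n_neg) / total

# The example post: Np = 8, Nn = 4
print(round(link_polarity(8, 4), 2))  # 0.33
```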

Trust Propagation
• Based on Guha’s work on propagating trust and distrust
• Mij represents bias from user i to j (0 <= Mij <= 1)
• Belief matrix M represents the initial set of known beliefs
• Mij can be based on the trust matrix (T), the distrust matrix (D), or a combination of trust and distrust (T-D) from i to j
• T = Positive Polarities and D = Negative Polarities
• Goal is to compute all unknown values in M

• Results from validations on dataset from “epinions” are impressive
[1] Guha R, Kumar R, Raghavan P, Tomkins A. Propagation of trust and distrust. In: Proc. 13th Int. World Wide Web Conf., New York, NY, USA, May 2004. ACM Press, 2004.
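
A rough Python sketch of one propagation scheme described by Guha et al. [1], not necessarily the exact configuration used for these slides. It assumes the belief matrix is T - D, combines the four atomic propagations (direct, co-citation, transpose, coupling) into one operator C, and applies C repeatedly. The weights in `alphas`, the function names, and the absence of any normalization or rounding step are assumptions of this sketch.

```python
def matmul(a, b):
    """Plain-Python matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def combined_operator(belief, alphas):
    """C = a1*B + a2*(B^T B) + a3*B^T + a4*(B B^T)."""
    bt = transpose(belief)
    terms = (belief, matmul(bt, belief), bt, matmul(belief, bt))
    n = len(belief)
    return [[sum(a * t[i][j] for a, t in zip(alphas, terms))
             for j in range(n)] for i in range(n)]

def propagate(trust, distrust, steps=20, alphas=(0.4, 0.4, 0.1, 0.1)):
    """Return C^steps, with C built from the belief matrix T - D."""
    n = len(trust)
    belief = [[trust[i][j] - distrust[i][j] for j in range(n)]
              for i in range(n)]
    c = combined_operator(belief, alphas)
    result = c
    for _ in range(steps - 1):
        result = matmul(result, c)
    return result

# Toy graph: blogger 0 links positively to 1, and 1 links negatively to 2.
T = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
D = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
two_step = propagate(T, D, steps=2, alphas=(1, 0, 0, 0))
```

With only direct propagation enabled, two steps fill in the previously unknown entry from blogger 0 to blogger 2 with a negative value, which is the kind of inferred polarity the slides aim for.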

• Political Blogosphere
• Dataset from Buzzmetrics[2] provides post-post links over 1.5M posts
• Few off-the-topic posts help aggregation
• Potential business value

Reference Dataset
• Adamic’s [3] hand-labeled dataset classifies blogs as right or left leaning
• Timeframe: 2004 presidential elections, over 1500 blogs analyzed
• Overlap of 300 blogs between Buzzmetrics and reference dataset

• Classify the blogs in the Buzzmetrics dataset as Democratic or Republican and compare with the reference dataset
[2] Lada A. Adamic and Natalie Glance, "The political blogosphere and the 2004 US Election", in Proceedings of the WWW-2005 Workshop, May 2005. Buzzmetrics –

Republican blogs classified more correctly

Effect of Link Polarity

Trust propagation on polar links more effective than on non-polar.

Link Polarity yields ~30% classification improvement


Convergence after 20 propagations

Effect of text window size

• Optimal window size is 750 characters for our experiments
• Small window size – Non-opinionated phrases
• Large window size – Analysis of unrelated text
• Specific to our experiments; numbers may not generalize
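
A small Python sketch of the windowing step, to make the trade-off concrete. The slides do not say whether 750 characters is the total window or the amount taken on each side of the link; this sketch assumes each side. The two word lists are tiny hypothetical stand-ins for a real sentiment lexicon, and the function names are illustrative.

```python
import re

# Tiny hypothetical lexicons; a real system would use a full sentiment word list.
POSITIVE = {"great", "like", "gifted", "effective"}
NEGATIVE = {"crap", "arrogant", "feckless", "corruption"}

def text_window(post, link_start, link_end, size=750):
    """Return the link text plus up to `size` characters on each side."""
    lo = max(0, link_start - size)
    hi = min(len(post), link_end + size)
    return post[lo:hi]

def count_sentiment(text):
    """Count positive and negative lexicon words in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    n_pos = sum(word in POSITIVE for word in words)
    n_neg = sum(word in NEGATIVE for word in words)
    return n_pos, n_neg

# Toy post with a link placeholder in the middle.
post = "a" * 1000 + "LINK" + "b" * 1000
window = text_window(post, 1000, 1004)
```

A smaller `size` keeps fewer words and may miss the opinionated phrases; a larger one pulls in text about other topics, matching the two failure modes listed above.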

Sample Data
• Trust propagation compensates for initial incorrect polarity (DK-AT)
• Trust propagation doesn’t change correct polarity (AT-DK)
• Trust propagation assigns correct polarity for nonexistent links (AT-IP)
• Numbers in italics are problematic (AT-MM)
• Make polarities below threshold zero?
• Improve sentiment detection?

Future Work
• Bias and trustworthiness of MSM sources
• Trend extraction and meme tracking for political blogs
• Real-time classification of positive and negative opinions about presidential candidates
• Determining genres in political opinions using content analysis


