

Analyzing the Effect of Real World Events on Feelings
Using We Feel Fine
Rajat Mittal
Arizona State University, School of Computing and Informatics
699 S. Mill Ave, Tempe, AZ 85282
mittal.rajat@asu.edu

Shawn Nikkila
Arizona State University, School of Computing and Informatics
699 S. Mill Ave, Tempe, AZ 85282
Shawn.Nikkila@asu.edu

ABSTRACT
This project aims to show how certain human emotions changed over time from January 2005 through November/December 2008 and analyzes whether certain external factors are closely correlated with these observed changes in human sentiment. The feelings are gathered using WeFeelFine and visualized using the Processing API. Four feelings (happy, sad, angry, and alone) and several external factors (national and regional unemployment rates, presidential approval ratings, and the NASDAQ index) were used to show these dynamics. The results show little to no correlation between these external factors and the feelings chosen for this experiment.

Author Keywords
Visualizations, Feelings

INTRODUCTION
Blogs have emerged as a personal account of people's feelings, allowing users to write and share online what they used to write in a personal diary or elsewhere. As a result, blogs have become a relevant source of unstructured information on the thoughts of the human population as a whole. It is therefore possible that useful information can be mined from this data to indicate the feelings of the world in general.

In 2006, Jonathan Harris, a computer science graduate from Princeton University, and Sep Kamvar, a consulting professor of Computational Mathematics at Stanford University, created WeFeelFine¹. This project explored human emotions on a massive scale. They extracted feelings from numerous blogs and presented six distinct representations of these feelings, each highlighting different aspects of human emotions across the world.

This project attempts to extend their effort of analyzing human emotions by trying to correlate these emotions with external factors such as the NASDAQ stock market index. Is it possible to see that more bloggers express sadness in their writings when the stock market falls? This is the type of question that we hope to analyze and answer.

This leads to a more intriguing question about blogs: are blogs more like the personal diary of an individual, or are they more of an opinion piece on issues and major events that affect the global population (e.g., the stock market or unemployment rates)?

We start with the assumption that blogs are a more personal affair and that there should be little correlation between the feelings that individuals express on their blogs and these external factors. For the purpose of this experiment, the data collected is restricted to the United States.

IMPLEMENTATION TOOLS AND APPROACH
In this implementation, two visualizations were created: a line graph, which helps in quantitatively measuring the correlation between the feelings and the external events, and a map showing the geographic distribution of the feelings to help see the results.

To accomplish this goal, the WeFeelFine¹ API was used in conjunction with the Processing² API to create the visualizations.

We Feel Fine API
The WeFeelFine API uses a REST-based framework to query the feelings that the WeFeelFine system has gathered by crawling numerous blogs, filtered by time, location, type, and so on. The project's API was officially launched in 2006; queries against the API in this project were limited to 2005 through 2008. The API allows queries for nearly 100 different human emotions, but for the purpose of this project, the selection was narrowed down to four: sad, happy, alone, and angry.
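
The kind of REST query described in the We Feel Fine API section can be sketched as follows. This is a minimal illustration only: the base URL and every parameter name (feeling, postyear, postmonth, country, limit) are hypothetical placeholders rather than the documented WeFeelFine interface, and the default limit of 1500 simply mirrors the result cap noted later in this report. The sketch builds one query URL per feeling and month over the study period; it does not send any requests.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
# The real WeFeelFine API may use different names and a different host.
BASE_URL = "http://api.example.org/ShowFeelings"
FEELINGS = ("sad", "happy", "alone", "angry")
MAX_RESULTS = 1500  # assumed to mirror the result cap noted in the report


def build_query(feeling, year, month, country="united states", limit=MAX_RESULTS):
    """Return a query URL for one feeling in one month (hypothetical schema)."""
    if feeling not in FEELINGS:
        raise ValueError(f"unsupported feeling: {feeling!r}")
    if not 2005 <= year <= 2008:
        raise ValueError("year must be between 2005 and 2008")
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    params = {
        "feeling": feeling,
        "postyear": year,
        "postmonth": month,
        "country": country,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)


def monthly_queries(feeling, start=(2005, 1), end=(2008, 11)):
    """Return one query URL per month from start to end, inclusive."""
    year, month = start
    urls = []
    while (year, month) <= end:
        urls.append(build_query(feeling, year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return urls
```

Issuing one query per month keeps each result set small and yields the monthly counts that the line graph and correlation analysis rely on, although any month that hits the result cap would be undercounted.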
¹ http://www.wefeelfine.org
Processing
Processing² was chosen to display these statistics. Processing is an open
source, Java-based API that is used for visualizations.
EXTERNAL VARIABLES
The following sections will go over the external variables
that were used and the reasons behind using them.
NASDAQ Stock Index
Considering the current economic downturn, the stock index was chosen because it would be interesting to see how bloggers react when the stock market is in decline and, conversely, when it is doing well. One would expect negative feelings to rise when the stock market performs poorly for a sustained period, and vice versa.

Presidential Approval Ratings
Another external variable was the approval rating of President Bush from January 2005 to November 2008. It was chosen because it declined steadily over a long period and would therefore be a good variable for determining whether it correlates with any of the feelings. It is expected to be negatively correlated with negative feelings and positively correlated with more upbeat feelings.

Regional and National Unemployment Rates
High unemployment rates usually precede overly pessimistic outlooks among many people, especially when layoffs are just around the corner. Therefore, given the current economic climate, regional and national unemployment rates were used to map against feelings.

Apple iPhone Quarterly Sales
The Apple iPhone sales from April 2007 through July 2008 were also used. Considering the mass media attention that the device has received since its reveal, it was thought that it could be a fun variable to use.

INTERFACE
The interface was constructed with usability and visual appeal in mind. Figure 1 is a screenshot of the interface. A retro-futuristic look was desired, as found in science fiction movies such as Tron, Star Trek, or WarGames. Therefore, a vector drawing of the map was used, with traditional green colors making up the bulk of the interface.

Figure 1 - The complete user interface

The controls for the map and grid are found on the lower sides of the interface, and the playback controls that let a user play/pause the animation and skip to a desired time are located at the bottom of the interface.

United States Map
Although WeFeelFine collects data from all over the world, it was decided to use only data from the United States for two reasons.
1. Most of the data collected from WeFeelFine is from the United States.
2. The external variables used are exclusive to the United States.

WeFeelFine also collects the latitude and longitude of each feeling that was found. Therefore, for each data point returned for a particular feeling, the latitude and longitude are mapped to an [x, y] pixel location on the map and plotted accordingly.

Line Graph
The external variables discussed earlier were plotted on an animated line graph synced with the feelings being plotted on the map. This allows a user to easily see the dynamics between these variables and the feelings plotted over time. A line graph was used because most people are familiar with it, and it makes sharp changes in the data easy to read. The graph that was implemented can show the data from four external variables at one time.

EXPERIMENTAL RESULTS
The following sections will present the experimental results
that were obtained from the visualization and data.
Peak Values
The following table lists the month and year that each
feeling peaked. In addition, the values of the different
external variables at that time instance are shown as well.
² http://www.processing.org
Feeling   Peak Date   Peak Value   Nat. Unemp. (%)   AZ Unemp. (%)   CA Unemp. (%)   NY Unemp. (%)   NASDAQ    Presidential Ratings (%)
Sad       Nov. 2006   1343         4.5               3.9             4.8             4.4             2431.77   33.0
Happy     Nov. 2006   1500         4.5               3.9             4.8             4.4             2431.77   33.0
Alone     Jan. 2006   1500         4.7               4.4             5.1             4.8             2305.82   43.0
Angry     Nov. 2006   292          4.5               3.9             4.8             4.4             2431.77   33.0

Table 1 - Peak value of feelings

The Apple iPhone sales are not included because the iPhone was not released until 2007. Also, the WeFeelFine API returns at most 1500 results, which explains the peak values for the happy and alone feelings.

Correlation
The correlation coefficients between the different feelings and external variables were calculated in order to understand the relations between them. These computations were done in Minitab, which uses the Pearson product-moment correlation coefficient [2], defined as:

\[ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y} \]

where x and y are the two data sets, \bar{x} and \bar{y} are their sample means, s_x and s_y are their sample standard deviations, and n is the number of elements in each data set.

This technique computes a correlation coefficient in the range [-1, 1]. A negative value indicates that the two data sets are negatively correlated, with one increasing while the other decreases. A positive value indicates that the two data sets increase together. A value close to -1 or 1 indicates strong correlation, whereas a value close to 0 indicates little or no linear relationship between the variables. The correlation coefficients computed for the feelings against the external variables are shown in Table 2.

Feeling   NASDAQ Index (NI)   Presidential Approval Ratings (PA)   National Unemployment Rates (NU)
Sad       -0.107              -0.072                               -0.324
Happy     -0.069              -0.090                               -0.314
Alone      0.032              -0.116                               -0.449
Angry     -0.032              -0.150                               -0.418

Table 2 - Correlation coefficients of feelings and external variables

It should be noted that the coefficients in Table 2 were computed using the data from August 2005 (the first occurrence of data points from WeFeelFine) through June 2007, which encompasses 23 data points. Data beyond that end date is not included because the number of feelings drops dramatically, which might skew the computations. This phenomenon is discussed in more detail in later sections.

The regional data and iPhone data sets were not included in the correlation analysis. Because of time constraints, the specific number of feelings in a given region was not calculated; therefore, there was no feelings variable to correlate with the regional unemployment data sets. The iPhone data was not included because it contained only quarterly data instead of monthly data.

In addition to correlating feelings with these external variables, the correlation between the external variables themselves was examined. These coefficients were computed using the data sets from January 2005 through November 2008 and are shown in Table 3.

Variable Pair                                     Correlation
NASDAQ <> Presidential Approval                   -0.291
NASDAQ <> National Unemployment                   -0.612
Presidential Approval <> National Unemployment    -0.258

Table 3 - Correlation between different external variables

ANALYSIS AND DISCUSSION
The following is an analysis and discussion of the various results that were obtained.

Peaks
The results show peaks for most of the individual feelings around November 2006. It is peculiar that sad, happy, and angry each hit their individual maximum over the last three years at around this same period of time.

It seems that some major event happened in November 2006 which caused people to blog heavily about it and express mixed emotions. A review of the major events of 2006 showed that on November 5, 2006, Saddam Hussein, the deposed president of Iraq, was sentenced to death by hanging. This might be the controversial event that people blogged about heavily in the United States. Another event that might have sparked a surge of feelings is the release of Windows Vista.

For the alone feeling, the data shows that expressions of loneliness were very frequent in 2006 and that the count dropped only after April 2007.

From the map, one could also visually observe that there were more blog posts from the East Coast than from the West Coast of the United States over the past three years for all feelings. This suggests that people on the East Coast may be more expressive about how they feel.

Correlation
The correlation coefficients for the feelings and external variables shown in Table 2 indicate that there is little or no correlation between them. What is strange is that all four feelings paired with the national unemployment rate showed a moderately negative correlation. While this makes sense for the happy feeling, as one would speculate that people are more content when the unemployment rate falls, it does not make sense for the more negative feelings. The correlation coefficients for the sad, alone, and angry feelings indicate that they would tend to increase as the national unemployment rate decreases. This is counterintuitive and suggests that there is no real relationship here. The alone and angry feelings show a small negative correlation with the presidential approval ratings, which intuitively makes sense. If the president is doing a bad job, then it is reasonable to assume that more people would be angry or have negative feelings. The NASDAQ index and the sad feeling also share a very small negative correlation, which is also reasonable. The remaining pairings show essentially no correlation, indicating no linear relationship between those variables.

This lack of correlation is not very surprising considering that one does not know the context in which these feelings were used in the blog posts. This is because WeFeelFine uses a simple keyword/phrase search to gather feelings from the blogosphere. It is very likely that bloggers are writing about more personal experiences that are not related to any of the external variables used. The primary problem of this experiment is that so many external events could be affecting these feelings that using only three variables and four feelings is, in hindsight, not very effective.

Some of the more interesting results came from calculating the correlation coefficients between the external variables themselves. The results are common sense for the most part. From the data used, the NASDAQ index and the national unemployment rate show a strong negative correlation with each other at -0.612. This makes sense, as a falling stock market usually results in companies scaling back their budgets, which in turn results in job cuts. The most puzzling result was that the NASDAQ index and presidential approval ratings are negatively correlated. However, this could easily be explained by external events other than the ones used in this project affecting these variables.

Drop Off of Blog Posts
An interesting observation made with this visualization was that the number of feelings extracted from the blog posts dropped dramatically starting in 2007. This is not isolated to one or two feelings; it seems to affect all the feelings that were used for this experiment. This could be indicative of a few factors. The sudden downturn could be because blog authors are no longer as open about their feelings. However, this seems too convenient an assumption, and there could be other factors that explain this phenomenon.

Another reason for this sudden downturn could be that the blogs that WeFeelFine checks are no longer updated. This line of reasoning could also imply that blog writers are moving on to other forms of communication, such as microblogging, which has become popular. However, all of this is simply speculation, and future work should analyze these aspects in more detail.

FUTURE WORK
Potential future work is as follows:
• Provide additional types of graphs to view the data.
• Visualize and analyze the data from individual days, as months may be too coarse a measurement.
• Allow the user to view graphs of different time scales simultaneously. This would allow the user to quickly see what is happening at increasingly finer levels of granularity at the same time.
• Cluster similar feelings from WeFeelFine together and aggregate them to a single value. This would make more sense than using just one feeling at a time.
• Create a custom application that mines feelings from domain-specific blogs. For example, mine feelings from a financial blog to see if there is any correlation between feelings found there and the NASDAQ index.

CONCLUSION
In conclusion, it has been shown that there is little to no correlation between the external factors and feelings that were chosen. This could be because many more external variables could be affecting the feelings, or because blogging is indeed a more personal experience that is less concerned with external factors. However, more analysis and data are needed for any definitive study.

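
The correlation coefficients in Tables 2 and 3 were computed in Minitab. As a supplementary illustration, the same sample Pearson product-moment coefficient can be sketched in plain Python. This is a minimal sketch, not the procedure Minitab uses internally: the function names are hypothetical, and the toy values at the bottom are made up purely to exercise the code. They are not taken from the project's data sets.

```python
import math


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length data sets."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of elements")
    n = len(x)
    if n < 2:
        raise ValueError("at least two data points are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_x = sample_std(x)
    s_y = sample_std(y)
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined for a constant data set")
    # Sum of products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


# Made-up monthly values, for illustration only (not the project's data).
toy_feeling_counts = [900, 950, 1100, 1000, 1200, 1150]
toy_unemployment = [5.0, 4.9, 4.7, 4.8, 4.6, 4.6]

r = pearson_r(toy_feeling_counts, toy_unemployment)
print(f"toy correlation: {r:.3f}")
```

With only 23 monthly data points per pairing, as in Table 2, a coefficient computed this way is sensitive to a few unusual months, which is one more reason to read the reported values cautiously.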
REFERENCES
1. Apple iPhone Sales from April 2007 to July 2008.
http://iphonefan.co.za/2008/10/15/apple-bites-into-the-cellphone-market/
2. Pearson Product-Moment Correlation Coefficient.
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
3. Presidential Job Approval in Depth.
http://www.gallup.com/poll/1723/Presidential-Job-Approval-Depth.aspx
4. United States Bureau of Labor Statistics, Unemployment Data.
http://www.bls.gov
5. WeFeelFine Methodology.
http://www.wefeelfine.org/methodology.html
APPENDIX A – SCREENSHOTS