Analyzing the Effect of Real World Events on Feelings Using We Feel Fine

Rajat Mittal
Arizona State University, School of Computing and Informatics
699 S. Mill Ave, Tempe, AZ 85282

Shawn Nikkila
Arizona State University, School of Computing and Informatics
699 S. Mill Ave, Tempe, AZ 85282

ABSTRACT
This project shows how certain human emotions changed over time from January 2005 through November/December 2008 and analyzes whether certain external factors are closely correlated with these observed changes in human sentiment. The feelings are gathered using WeFeelFine and visualized using the Processing API. Four feelings (happy, sad, angry, and alone) and several external factors (national and regional unemployment rates, presidential approval ratings, and the NASDAQ index) were used to show these dynamics. The results show little to no correlation between these external factors and the feelings chosen for this experiment.

Author Keywords
Visualizations, Feelings

INTRODUCTION
Blogs have emerged as a personal account of people’s feelings, allowing users to write and share online what they used to write in a personal diary or elsewhere. As a result, blogs have become a relevant source of unstructured information on the thoughts of the human population as a whole. It is therefore possible that information can be mined from this data to indicate the feelings of the world in general.

In 2006, Jonathan Harris, a computer science graduate from Princeton University, and Sep Kamvar, a consulting professor of Computational Mathematics at Stanford University, created WeFeelFine. This project explored human emotions on a massive scale. They extracted feelings from numerous blogs and presented six unique and fascinating representations of these feelings, each highlighting different aspects of human emotions across the world.

This project attempts to extend their effort of analyzing human emotions by trying to correlate these emotions with external factors such as the NASDAQ stock market index. Is it possible to see that more bloggers express sadness in their writings when the stock market falls? This is the type of question that we hope to analyze and answer.

This leads to a more intriguing question about blogs: are they more like an individual's personal diary, or are they more of an opinion piece on issues and major events that affect the global population (e.g., the stock market or unemployment rates)?

We start with the assumption that blogs are a more personal affair, and that there should be little correlation between the feelings that individuals express on their blogs and these external factors. For the purpose of this experiment, the data collected is restricted to the United States only.

IMPLEMENTATION TOOLS AND APPROACH
In this implementation, two visualizations were created: a line graph to help quantitatively measure the correlation between the feelings and the external events, and a map showing the geographic distribution of the feelings to help see the results.

To accomplish this goal, the WeFeelFine API was used in conjunction with the Processing API to create the visualizations.

We Feel Fine API
The WeFeelFine API uses a REST-based framework to query a particular feeling by time, location, type, etc. from the data that the WeFeelFine system has gathered by crawling numerous blogs. Because the project’s API was officially launched in 2006, queries against the API are limited to the period from 2005 through 2008. The API allows queries for nearly 100 different human emotions, but for the purpose of this project, the selection was narrowed down to four: sad, happy, alone, and angry.
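As a rough illustration of querying a REST-style feelings service for each of the four feelings in a given month, the sketch below only builds the request URLs. The endpoint and every parameter name (`feeling`, `postdate`, `country`, `limit`) are placeholders chosen for this example and are not taken from the WeFeelFine documentation; only the four feelings, the restriction to the United States, and the 1500-result cap come from this report.

```python
from urllib.parse import urlencode

# Placeholder endpoint for illustration only; NOT the real WeFeelFine URL.
BASE_URL = "https://example.invalid/feelings"

# The four feelings used in this project.
FEELINGS = ["sad", "happy", "alone", "angry"]


def build_feeling_query(feeling, year, month, country="united states", limit=1500):
    """Build a query URL for one feeling in one month.

    All parameter names are hypothetical stand-ins for whatever the
    real service expects.
    """
    params = {
        "feeling": feeling,
        "postdate": f"{year:04d}-{month:02d}",
        "country": country,
        "limit": limit,  # this report notes that at most 1500 results are returned
    }
    return f"{BASE_URL}?{urlencode(params)}"


# One URL per feeling for November 2006.
urls = [build_feeling_query(feeling, 2006, 11) for feeling in FEELINGS]
```

Building the URLs separately from sending them keeps the month-by-month loop easy to inspect before any network requests are made.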


In order to display these various statistics, Processing was chosen. Processing is an open source, robust Java API that is used for visualizations.

The following sections go over the external variables that were used and the reasons for using them.

NASDAQ Stock Index
Considering the current economic downturn, the stock index was chosen because it would be interesting to see how bloggers react when the stock market is on the decline and, conversely, when it is doing well. One would expect negative feelings to be on the rise when the stock market is doing poorly for a sustained period, and vice versa.

Presidential Approval Ratings
Another external variable used was the approval rating of President Bush from January 2005 to November 2008. It was chosen because it had been on a steady decline for a long period of time and would therefore be a good variable for determining whether there was any correlation between it and any of the feelings. It is expected to be negatively correlated with negative feelings and positively correlated with more upbeat feelings.

Regional and National Unemployment Rates
High unemployment rates are usually a precursor to overly pessimistic outlooks from many people, especially when layoffs are just around the corner. Therefore, with the current economic climate in mind, regional and national unemployment rates were used to map against feelings.

Apple iPhone Quarterly Sales
The Apple iPhone sales from April 2007 through July 2008 were also used. Considering the mass media attention that the device has received since its reveal, it was thought that it could be a fun variable to use.

INTERFACE
The interface was constructed with usability and visual appeal in mind. The following figure is a screenshot of the interface that was constructed. A retro-futuristic look was desired for the application, as found in science fiction movies such as Tron, Star Trek, or WarGames. Therefore, a vector drawing of the map was used, with traditional green colors making up the bulk of the interface.

Figure 1 - The complete user interface

The controls for the map and grid are found on the lower sides of the interface, and the playback controls that let a user play/pause the animation and skip to a desired time are located at the bottom of the interface.

United States Map
Although WeFeelFine collects data from all over the world, it was decided to use only data from the United States, for two reasons.
    1.   Most of the data collected from WeFeelFine is from the United States.
    2.   The external variables used are exclusive to the United States.

WeFeelFine also collects the latitude and longitude of the feeling that was found. Therefore, for each data point returned for a particular feeling, the latitude and longitude are mapped to the [x, y] pixel location on the map and then plotted accordingly.

Line Graph
The external variables discussed earlier were plotted on an animated line graph synced with the feelings being plotted on the map. This allows a user to easily see the dynamics between these variables and the feelings plotted over time. A line graph was used because most people are familiar with it, and it makes sharp changes in the data easy to read. The graph that was implemented can show the data from four external variables at one time.

EXPERIMENTAL RESULTS
The following sections present the experimental results that were obtained from the visualization and data.

Peak Values
The following table lists the month and year in which each feeling peaked. The values of the external variables at that time are shown as well.
 Feeling   Peak Date    Peak Value   Associated Var. Values {Nat. Unemp., AZ Unemp., CA Unemp., NY Unemp., NASDAQ, Presidential Ratings}
 Sad       Nov. 2006    1343         {4.5, 3.9, 4.8, 4.4, 2431.77, 33.0}
 Happy     Nov. 2006    1500         {4.5, 3.9, 4.8, 4.4, 2431.77, 33.0}
 Alone     Jan. 2006    1500         {4.7, 4.4, 5.1, 4.8, 2305.82, 43.0}
 Angry     Nov. 2006    292          {4.5, 3.9, 4.8, 4.4, 2431.77, 33.0}

               Table 1 - Peak value of feelings

Apple iPhone sales are not included because the device was not released until early 2007. Also, the WeFeelFine API returns at most 1500 results, which explains the peak values for the happy and alone feelings.

Correlation
The correlation coefficients between the different feelings and external variables were calculated in order to understand the relations between them. These computations were done in Minitab, which uses the Pearson product-moment correlation coefficient [2], defined as:

    r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}

where x and y are the two data sets, \bar{x} and \bar{y} are their sample means, s_x and s_y are their sample standard deviations, and n is the number of elements in each data set.

This technique computes a correlation coefficient in the range [-1, 1]. A negative value indicates that the two data sets are negatively correlated, with one increasing while the other decreases. A positive value indicates that the two data sets increase together. A value close to -1 or 1 indicates strong correlation, whereas a value close to 0 indicates little or no linear relationship between the variables. The correlation coefficients computed for the feelings against the external variables are shown in the table below.

           NASDAQ         Presidential      National
           Index (NI)     Approval          Unemployment
                          Ratings (PA)      Rates (NU)
 Sad       -0.107         -0.072            -0.324
 Happy     -0.069         -0.090            -0.314
 Alone     0.032          -0.116            -0.449
 Angry     -0.032         -0.150            -0.418

   Table 2 - Correlation coefficients of feelings and external variables

It should be noted that the coefficients above were computed using the data from August 2005 (the first occurrence of data points from WeFeelFine) through June 2007, which encompasses 23 data points. Data beyond that end date is not included because the number of feelings drops dramatically, which might skew the computations. This phenomenon is discussed in more detail in later sections.

The regional data and iPhone data sets were not included in the correlation analysis. Because of time constraints, the specific number of feelings in a given region was not calculated; therefore, there was no feelings variable to correlate with the regional unemployment data sets. The iPhone data was not included because it only contained quarterly data instead of monthly data.

In addition to correlating feelings with these external variables, the correlation between the external variables themselves was examined. These coefficients are computed using the datasets from January 2005 through November 2008, as shown below.

 NASDAQ <>              NASDAQ <>               Presidential Approval <>
 Presidential Approval  National Unemployment   National Unemployment
 -0.291                 -0.612                  -0.258

   Table 3 - Correlation between different external variables

ANALYSIS AND DISCUSSION
The following is an analysis and discussion of the various results that were obtained.

Peaks
The results show peaks for three of the four feelings around November 2006. It is peculiar that sad, happy, and angry each hit their individual maximum over the last three years at around this same time.

It seems some major event happened in November 2006 which caused people to blog heavily about it and express mixed emotions. After looking into the major events of 2006, it was found that on November 5, 2006, Saddam Hussein, the deposed president of Iraq, was sentenced to death by hanging. This might be the controversial event that people blogged about heavily in the United States. Another event that might have sparked a surge of feelings is the release of Windows Vista.
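The Pearson product-moment coefficient used for the correlation tables can be sketched in a few lines of Python. This is a minimal illustration of the standard formula, not the Minitab procedure itself, and the two monthly series at the bottom are made-up numbers rather than data from this project.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    # Sum of products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


# Made-up monthly values, for illustration only.
feeling_counts = [120, 135, 150, 160, 180]
index_values = [10.0, 9.0, 8.5, 7.0, 6.0]
r = pearson_r(feeling_counts, index_values)
```

Here `r` is negative because the hypothetical feeling counts rise while the hypothetical index falls; a perfectly linear pair of series would give exactly 1 or -1 up to floating-point rounding.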

For the feeling alone, the data shows that people were extremely alone in 2006, and the loneliness count dropped only after April 2007.

From the map, one could also visually observe that there were more blog posts from the East Coast than from the West Coast of the United States over the past three years for all feelings. This suggests that people on the East Coast are more expressive about how they feel.

The correlation coefficients for the feelings and external variables shown in Table 2 indicate that there is little or no correlation between them. What is strange is that all four feelings paired with the national unemployment rate showed a moderately negative correlation. While this makes sense for the happy feeling, as one would speculate that people would be more content as the unemployment rate lowers, it does not make sense for the more negative feelings. The correlation coefficients for the sad, alone, and angry feelings indicate that they would tend to increase if the national unemployment rate decreases. This seems very counterintuitive and implies that there is no real correlation happening here. The alone and angry feelings show a small negative correlation with the presidential approval ratings, which intuitively makes sense. If the president is doing a bad job, then it is reasonable to assume that more people would be angry or have negative feelings. The NASDAQ index and the sad feeling also share a very small negative correlation, which is also reasonable. The rest of the pairings show no meaningful correlation.

This lack of correlation is not very surprising considering that one does not know the context in which these feelings were being used in the blog posts. This is because WeFeelFine uses a simple keyword/phrase search to gather feelings from the blogosphere. It is very likely that bloggers are writing about more personal experiences that are not related to any of the external variables used. The primary problem of this experiment is that there are so many external events that could be affecting these feelings that using only three variables and four feelings is, in hindsight, not very effective.

Some of the more interesting results came from calculating the correlation coefficients between the external variables themselves. The results are common sense for the most part. From the data used, the NASDAQ index and the national unemployment rates show a strong negative correlation with each other at -0.612. This makes sense, as a falling stock market usually results in companies scaling back their budgets and thus in job cuts. The most puzzling result was that the NASDAQ index and presidential approval ratings are negatively correlated. However, this could easily be explained by external events other than the ones used for this project affecting these variables.

Drop Off of Blog Posts
An interesting observation made with this visualization was that the number of feelings extracted from the blog posts dropped dramatically starting in the year 2007. This is not isolated to one or two feelings; it seems to affect all the feelings that were used for this experiment. This could be indicative of a few factors. This sudden downturn could be because the blog authors are not being as outward about their feelings. However, this seems to be too convenient an assumption, and there could be other factors to explain this phenomenon.

Another reason for this sudden downturn could be the simple fact that the blogs that WeFeelFine checks are no longer updated. This line of reasoning could also imply that blog writers are now moving on to other forms of communication, such as microblogging, which has become popular. However, it should be noted that all of this is simply speculation, and future work should analyze these aspects in more detail.

FUTURE WORK
Potential future work that could be done is as follows:
    •   Provide additional types of graphs to view the data.
    •   Visualize and analyze the data from individual days, as months may be too coarse a measurement.
    •   Let the user view graphs of different time scales simultaneously. This would allow the user to quickly see what is happening at increasingly finer levels of granularity at the same time.
    •   Cluster similar feelings from WeFeelFine together and aggregate them to a single value. This would make more sense than using just one feeling at a time.
    •   Create a custom application that mines feelings from domain-specific blogs. For example, mine feelings from a financial blog to see if there is any correlation between feelings found there and the NASDAQ index.

CONCLUSION
In conclusion, it has been shown that there is little to no correlation between the external factors and feelings that were chosen. This could be because there are many more external variables that could be affecting the feelings, or because blogging is indeed a more personal experience that is less concerned with external factors. However, more analysis and data are needed for any definitive study.
REFERENCES
1.   Apple iPhone Sales from April 2007 to July 2008.
2.   Pearson Product-Moment Correlation Coefficient.
3.   Presidential Job Approval in Depth.
4.   United States Bureau of Labor Statistics, Unemployment Data.
5.   WeFeelFine Methodology.

