

2012 Presidential Election Poll
Philip Garland, Ph.D., VP Methodology
Dave Goldberg, CEO
Liana Epstein, Ph.D., Senior Methodologist
Annabell Suh
SURVEYMONKEY 2012 PRESIDENTIAL ELECTION POLL
SurveyMonkey has surveyed roughly 1.2 million people from August 17th to November 2nd. Still,
skeptics will ask, “Can an internet poll really be successful at approximating voter turnout?”
Here is the map of actual voter turnout in 2008 by county…
This is the map of respondents to SurveyMonkey’s 2012 presidential election poll…
This report contains our newest wave of data from the 600,000 people who responded to our
presidential election poll from October 3rd through November 2nd. Results will be displayed in
two different ways: first, as popular vote percentages and second as Electoral College
distributions. With this data, we seek to show that internet data is as good as phone data (if not
better) at assessing public opinion.
a few notes about our data
Understand the data that we’re reporting.
Why does all the data begin on 10/10 rather than 10/3?
The data reported below begins at 10/10 because we chose to use a seven-day
trailing sum. This was done for three main reasons. First, other publicly available
polls also report data using trailing sums. Matching their methodology in this way
facilitates comparisons between SurveyMonkey and other polling firms, which provides
a reality check on how well SurveyMonkey is measuring public opinion. Second,
using a trailing sum, rather than a daily measure, provides a statistic that is less swayed
by any single day’s events. Essentially, averaging over a week’s worth of data smooths
out an otherwise jagged curve. Lastly, for analyses at the state level, using more than
one day of data gives us a larger sample that increases the power and accuracy of our
analyses.
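A minimal sketch of a seven-day trailing sum, to illustrate the smoothing described above. The function name, the tuple layout, and the counts are assumptions made for this example and are not taken from SurveyMonkey's actual pipeline.

```python
from collections import deque

def trailing_sum_share(daily_counts, window=7):
    """Return one (obama_share, romney_share) pair per day once a full window exists.

    daily_counts: list of (obama_responses, romney_responses) tuples, one per
    reporting day, oldest first. This layout is an assumption for the sketch.
    """
    recent = deque(maxlen=window)   # keeps only the last `window` days
    shares = []
    for day in daily_counts:
        recent.append(day)
        if len(recent) < window:
            continue                # not enough history yet, so nothing to report
        obama = sum(d[0] for d in recent)
        romney = sum(d[1] for d in recent)
        total = obama + romney
        shares.append((obama / total, romney / total))
    return shares

# Hypothetical counts: seven even days followed by a one-day swing to Romney.
counts = [(50, 50)] * 7 + [(20, 80)]
smoothed = trailing_sum_share(counts)
```

In this made-up series the final day on its own would read 20% for Obama, while the trailing figure moves only from 50% to about 46%, which is the sense in which a single day's events sway the statistic less.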
Why is only weekday data reported?
All results reported below also exclude weekend data. We observed that the graphs of
our raw, daily data showed spikes every weekend that departed from the trend line and
from publicly available polling data. We speculate that this is due to two main problems.
First, our traffic volume is much lower on weekends, with traffic sinking as low as 15% of
typical weekday traffic. This lower volume makes our results more susceptible to
outliers. Second, we have found in prior studies of our SurveyMonkey traffic that the
people who take surveys on weekends are often not representative of the general U.S.
population and, consequently, are qualitatively different from those who take surveys on
weekdays.
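The weekend exclusion can be sketched as a simple date filter. The mapping layout and the response counts below are hypothetical; only the calendar dates are real.

```python
from datetime import date

def weekdays_only(responses_by_day):
    """Drop Saturday and Sunday entries from a {date: count} mapping."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return {d: n for d, n in responses_by_day.items() if d.weekday() < 5}

# Hypothetical daily response counts around the first weekend of the wave.
sample = {
    date(2012, 10, 5): 20000,   # Friday
    date(2012, 10, 6): 3000,    # Saturday
    date(2012, 10, 7): 3500,    # Sunday
    date(2012, 10, 8): 21000,   # Monday
}
kept = weekdays_only(sample)
```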
model 1: a RAW look
This model provides a transparent view of the data.
RATIONALE: We have included this model not because we think it will accurately predict what
happens on Election Day, but because we want to be as transparent as possible
about our methodology.
WEIGHTS: None. Other than excluding weekends and using a 7-day trailing sum, this is
purely raw data. No corrections. No weighting.
RESULTS
As can be seen in the graph below, the raw results from our survey suggest that the two
candidates’ standing in the Electoral College has flipped back and forth almost daily. This is
strikingly different from all other polls, which have had Obama consistently ahead in the
Electoral College for October. This inconsistency in Electoral College projections was the main
reason that we pursued weighted models rather than merely reporting our raw data. As of Friday
(11/2), Model #1 predicts: Obama, 266; Romney, 272.
The above graph was created through a forced choice for each state between the candidates.
Separating toss-up states provides a glimpse into why SurveyMonkey’s numbers show a tighter
race. RealClearPolitics (RCP) uses a 5% margin of error to determine if a state is a clear win for
either candidate. SurveyMonkey, on the other hand, uses a slimmer 3% margin of error. Overall,
the graphs below show that SurveyMonkey has roughly half the number of toss-up states that RCP
does, with more of these going to Romney than Obama. This accounts for why this model estimates
a much higher number of electoral votes for Romney than other polls.
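To make the difference between the two thresholds and the forced-choice tally concrete, here is a small sketch. The state percentages are hypothetical and the function names are invented for this example; the electoral vote counts are the 2012 allocations.

```python
def classify(obama_pct, romney_pct, threshold):
    """Label a state as a clear win or a toss-up for a given margin threshold."""
    margin = obama_pct - romney_pct
    if abs(margin) < threshold:
        return "tossup"
    return "obama" if margin > 0 else "romney"

def forced_choice_tally(states):
    """Give every state's electoral votes to whoever leads, however narrowly."""
    tally = {"obama": 0, "romney": 0}
    for obama_pct, romney_pct, votes in states.values():
        # Exact ties are not handled here; a real analysis would need a rule.
        leader = "obama" if obama_pct > romney_pct else "romney"
        tally[leader] += votes
    return tally

# Hypothetical percentages: (obama_pct, romney_pct, electoral_votes).
states = {
    "OH": (48.0, 47.0, 18),
    "FL": (46.0, 50.0, 29),
    "NV": (50.5, 46.0, 6),
}
labels_3pt = {s: classify(o, r, 3.0) for s, (o, r, _) in states.items()}
labels_5pt = {s: classify(o, r, 5.0) for s, (o, r, _) in states.items()}
tally = forced_choice_tally(states)
```

With these made-up numbers the 3-point rule leaves one toss-up while the 5-point rule leaves three, which shows how a narrower threshold alone reduces the number of toss-up states.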
Although the Electoral College decides the election, the popular vote is also of interest. Because
we oversampled swing states to be able to conduct analyses at the state level, the proportions of
states in our sample relative to their representation in the population of American voters varied
wildly. Additionally, due to low traffic, some states were under-represented in our sample. For
example, the percentage of voters from Ohio was inflated because we directed more respondents
to our survey there, and the percentage of voters from North Dakota was lower because we directed
less traffic there. Thus, publicly available statistics were used to adjust the weights of the state
popular vote totals so that they accurately reflected the proportions of U.S. voter turnout by state
in 2008.
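One way to picture this adjustment is to weight each state's candidate shares by that state's share of 2008 turnout. The sketch below assumes that approach; the shares and the rounded turnout figures are illustrative placeholders, not the statistics actually used.

```python
def national_share(state_results, turnout_2008):
    """Combine state-level candidate shares, weighting by 2008 turnout.

    state_results: {state: (obama_share, romney_share)} from the poll sample.
    turnout_2008:  {state: ballots cast in 2008}.
    """
    total_turnout = sum(turnout_2008[s] for s in state_results)
    obama = romney = 0.0
    for state, (o, r) in state_results.items():
        weight = turnout_2008[state] / total_turnout   # state's share of voters
        obama += weight * o
        romney += weight * r
    return obama, romney

# Hypothetical shares and rounded, illustrative turnout figures.
results = {"OH": (0.50, 0.46), "ND": (0.40, 0.56)}
turnout = {"OH": 5_700_000, "ND": 300_000}
obama, romney = national_share(results, turnout)
```

Because the weights come from turnout rather than from the number of respondents, oversampling Ohio or undersampling North Dakota no longer changes each state's influence on the national figure.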
Unsurprisingly, given that SurveyMonkey’s Electoral College projection shows a less consistent
lead for Obama than other polls do, the SurveyMonkey popular vote total shows a smaller
margin for Obama than other polls have.
model 2: the “HOW” correction
This model corrects for sampling method.
RATIONALE: The anonymity of internet polling is a blessing and a curse. Because the person
being polled has anonymity, he or she is free to respond without feeling self-conscious.
This minimizes the demand characteristics of phone polls, in which respondents change
their answers in response to what they think the phone pollster wants to hear.
When people answer surveys online, as opposed to on the phone, they are
“talking” to a computer instead of a real, live person. This matters because
research has shown that when speaking with a real, live person, respondents are
more concerned about what that person thinks of them. This makes phone respondents
less willing to say “I don’t know” when asked who they would vote for, because
it would suggest that they haven’t thought about the election much.
Unfortunately, this anonymity can also artificially inflate “don’t know” responses,
making accurate predictions tougher to make. Moreover, anonymity can also lead
to people not taking the survey seriously enough, randomly clicking responses or
not thinking through the questions sufficiently.
WEIGHTS:
• Leaning voters: The “don’t know” response percentage in the SurveyMonkey
dataset was much higher than that of the average phone poll (9% versus 5%).
Consequently, we used a question that asked what candidate voters were “leaning
towards” to add a small subset of otherwise undecided voters to the results.
• Volatility: Each day was compared to the previous day to compute a “volatility”
index. This weight was applied to the day’s average so that more consistent days
were weighted more heavily. This makes our averages less susceptible to random
error and “satisficers” (people who don’t take online surveys seriously).
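The two weights above can be sketched as follows. The report does not give the exact volatility formula, so the weighting rule here (one divided by one plus the absolute day-over-day change) is purely an assumption for illustration, as are the function names and the counts.

```python
def add_leaners(first_choice, leaning):
    """Fold leaners into the candidate totals.

    first_choice: counts for "obama", "romney", "dont_know" on the main question.
    leaning: counts for "obama" and "romney" among the "dont_know" group.
    """
    moved = leaning["obama"] + leaning["romney"]
    return {
        "obama": first_choice["obama"] + leaning["obama"],
        "romney": first_choice["romney"] + leaning["romney"],
        "dont_know": first_choice["dont_know"] - moved,
    }

def volatility_weighted_mean(daily_shares):
    """Average daily shares, down-weighting days that jump from the day before.

    ASSUMPTION: weight = 1 / (1 + |change from previous day|). The report
    does not state the formula it used.
    """
    weights = [1.0]                         # the first day has no predecessor
    for prev, cur in zip(daily_shares, daily_shares[1:]):
        weights.append(1.0 / (1.0 + abs(cur - prev)))
    return sum(w * s for w, s in zip(weights, daily_shares)) / sum(weights)

# Hypothetical sample of 1,000: 9% "don't know" before leaners are added.
combined = add_leaners(
    {"obama": 450, "romney": 460, "dont_know": 90},
    {"obama": 25, "romney": 15},
)
# Hypothetical daily Obama percentages with a one-day spike.
weighted = volatility_weighted_mean([48.0, 48.0, 52.0, 48.0])
```

Note that this simple rule also down-weights the day after a spike, since that day differs from its predecessor as well; a production index would likely need something more careful.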
RESULTS
Although RCP and Nate Silver’s “fivethirtyeight” blog have consistently predicted an Obama
victory in the Electoral College by a fairly wide margin, Model #2 shows a much tighter race.
As can be seen in the graph below, SurveyMonkey results suggest that if the election had been
held at any time between 10/10 and 10/18, Mitt Romney would have won. From 10/18 through
Friday, however, Barack Obama has held the edge in the Electoral College. As of Friday (11/2),
Model #2 predicts: Obama, 272; Romney, 266.
Again, the above graph was created through a forced choice for each state between the
candidates. Separating toss-up states provides a glimpse into why SurveyMonkey’s numbers
show a tighter race. Overall, the graphs below show that Model #2 has roughly half the number of
toss-up states that RCP does, split evenly between Obama and Romney.
Despite the fact that SurveyMonkey’s electoral college shows a thinner margin of victory for
Obama than other polls do, the SurveyMonkey popular vote total shows a greater margin of
Obama supporters than other polls have. Thus, while other polls indicate that Romney is ahead in
the popular vote, SurveyMonkey data indicates that Obama is actually in the lead. Model #2’s
estimation of the popular vote mirrors Nate Silver’s popular vote estimation more closely than
RCP’s estimation.
model 3: the “WHO” correction
This model corrects for sampling frame.
RATIONALE: Whether you’re reaching people through their computer or their phone, having
them answer your survey does not guarantee that they are going to show up at the
polls on Election Day. The people who respond to surveys (whether on the
internet or on the phone) and the people who show up to vote are not exactly the
same set of people.
WEIGHTS:
• Party ID: Using voter turnout statistics from 2008, we adjusted the proportions of
Democrats, Republicans, and Independents in our sample. A state’s sample was coded as too
“blue” or too “red”, and the votes of Republicans or Democrats, respectively, were weighted
more heavily to even out the percentages. This correction was applied within a 5% margin of
error, as this is the typical polling error.
• Education: Having adjusted for party identification, we then performed a mathematical
correction for the representation of educational level (see Appendix for the question
options) in the population of U.S. voters.
• Undecideds: Finally, we eliminated any voters who responded “don’t know” twice
when asked who they would vote for. If a voter is not leaning towards any candidate only
a few days before the election, chances are low that they will vote at all, and if they do,
they should be equally split between the two candidates. Eliminating these truly
undecided voters from our sample allowed for a more realistic estimate of the popular
vote.
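A rough sketch of the party ID and undecided steps follows. It reads the 5% margin as a tolerance inside which no correction is made, which is an interpretation of the description above and not a confirmed detail; the function names, field names, and proportions are all hypothetical.

```python
def party_weights(sample_share, target_share, tolerance=0.05):
    """Per-party weights that pull the sample toward target proportions.

    ASSUMPTION: parties already within `tolerance` of the target keep a
    weight of 1; others are scaled by target / observed.
    """
    weights = {}
    for party, target in target_share.items():
        observed = sample_share[party]
        if abs(observed - target) <= tolerance:
            weights[party] = 1.0
        else:
            weights[party] = target / observed
    return weights

def drop_double_undecided(respondents):
    """Remove respondents who answered "dont_know" to both vote questions."""
    return [r for r in respondents
            if not (r["vote"] == "dont_know" and r["leaning"] == "dont_know")]

# Hypothetical state sample that is too "blue" relative to 2008 turnout.
weights = party_weights(
    {"D": 0.45, "R": 0.25, "I": 0.30},
    {"D": 0.38, "R": 0.32, "I": 0.30},
)
kept = drop_double_undecided([
    {"vote": "obama", "leaning": None},
    {"vote": "dont_know", "leaning": "romney"},
    {"vote": "dont_know", "leaning": "dont_know"},
])
```

The same weighting function could in principle be reused for the education correction by passing education categories in place of parties, though the report does not say whether the two corrections were computed the same way.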
RESULTS
Model #3 predicts a largely consistent victory for Obama over the past month, even when he was
trailing in the popular vote. Unlike Model #2, which is more conservative in its Electoral College
estimations than both RCP and Nate Silver, Model #3 predicts a wider margin of victory than
either. The electoral vote estimations of Model #3 mirror Nate Silver’s estimations more closely
than RCP’s. Nevertheless, there is a striking difference in our graph for 10/22-10/25,
which shows Romney briefly ahead in the Electoral College. As of Friday (11/2), Model #3
predicts: Obama, 305; Romney, 233.
Again, the above graph was created through a forced choice for each state between the
candidates. Separating toss up states provides a glimpse into why SurveyMonkey’s numbers
show a bigger lead for Obama. Overall the graphs below show that SurveyMonkey has roughly
half the number of toss up states that RCP does, but the majority of these tossup states tend to be
attributed to Obama in a forced-choice scenario, creating a wide lead for Obama.
In line with Model #3’s wider Electoral College margin of victory for Obama than RCP’s,
the SurveyMonkey popular vote total also shows a greater margin of
Obama supporters than RCP polls have. Thus, while RCP polls indicate that Romney is ahead in
the popular vote, SurveyMonkey data indicates that Obama is actually in the lead.
calling the race
Ultimately, each model is only as good as the calls it makes on the Electoral College and the
overall popular vote percentages. Below are the electoral map predictions for each model and the
estimations of the popular vote for each. Key differences in swing states are highlighted.
MODEL #1: RAW
TALLY: OBAMA 266, ROMNEY 272
KEY TOSSUPS: CO, IA, NH, FL, NC, NV, OH, VA
POPULAR VOTE: OBAMA 47.38%, ROMNEY 46.24%

MODEL #2: HOW
TALLY: OBAMA 272, ROMNEY 266
KEY TOSSUPS: CO, IA, NH, NV, FL, NC, OH, VA
POPULAR VOTE: OBAMA 48.33%, ROMNEY 47.11%

MODEL #3: WHO
TALLY: OBAMA 305, ROMNEY 233
KEY TOSSUPS: CO, IA, NC, NH, NV, OH, FL, VA
POPULAR VOTE: OBAMA 49.46%, ROMNEY 47.51%
MODEL SUMMARY: TOSSUPS
To provide the best possible prediction, we looked at our three models to determine which states
should be labeled definitively as “tossups”. If a state was predicted differently in different
models, or if the difference in Obama and Romney votes was less than 2% in any given state, we
determined that it was too close to call. This led to the following overall prediction…
ELECTORATE: OBAMA 250, ROMNEY 220, TOSSUPS 68
KEY TOSSUPS: IA, NC, NV, OH, VA, WI
POPULAR VOTE: OBAMA 48.90%, ROMNEY 47.31% (average of Model #2 & Model #3)
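As a quick arithmetic check on the summary figures, the sketch below recomputes the popular vote averages from the Model #2 and Model #3 percentages and the toss-up total from the 2012 electoral vote allocations of the six listed states. Decimal arithmetic and half-up rounding are choices made for this example; the report does not say how it rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

def average_pct(a, b):
    """Average two percentages given as strings, rounded half-up to 2 places."""
    mean = (Decimal(a) + Decimal(b)) / 2
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

obama = average_pct("48.33", "49.46")     # Model #2 and Model #3
romney = average_pct("47.11", "47.51")

# 2012 electoral votes for the six toss-up states in the summary.
tossup_states = {"IA": 6, "NC": 15, "NV": 6, "OH": 18, "VA": 13, "WI": 10}
tossup_votes = sum(tossup_states.values())
unassigned = 538 - 250 - 220              # votes given to neither candidate
```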
A final note on swing states: It is important to note that we do not consider Colorado, Florida,
New Hampshire, and Pennsylvania to be swing states. Our data has shown consistent advantages for
Obama in Colorado, New Hampshire, and Pennsylvania and a consistent advantage for Romney
in Florida. We have only six toss-up states, roughly half the number that RCP has. Among our three
previous models, only three states vary, and these account for the differences in electoral
votes. Thus, regardless of which model is used, 48 of the 51 contests (the 50 states plus the
District of Columbia) stay consistent.
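The toss-up rule used for the model summary (a state is too close to call if the models disagree or if the margin is under 2%) could be sketched like this. Applying the 2% test to every model is an assumption, since the report does not say which model's margin was checked; the function name and percentages are hypothetical.

```python
def consensus_call(model_results, min_margin=2.0):
    """Call a state only if every model agrees and every margin is large enough.

    model_results: list of (obama_pct, romney_pct) pairs, one per model.
    ASSUMPTION: the margin test is applied to each model separately.
    """
    winners = set()
    for obama_pct, romney_pct in model_results:
        if abs(obama_pct - romney_pct) < min_margin:
            return "tossup"                 # too close in at least one model
        winners.add("obama" if obama_pct > romney_pct else "romney")
    return winners.pop() if len(winners) == 1 else "tossup"

# Hypothetical percentages for three states across the three models.
clear = consensus_call([(52.0, 45.0), (51.0, 46.0), (53.0, 44.0)])
narrow = consensus_call([(49.0, 48.0), (52.0, 45.0), (53.0, 44.0)])
split = consensus_call([(47.0, 50.0), (51.0, 48.0), (52.0, 45.0)])
```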
OUR PICK? MODEL #3
RATIONALE: Model #3 accounts for the difference between polled and actual voters without getting
caught up in the pros and cons of an internet sample in particular. It is similar, but not identical,
to what other pollsters are saying and has shown itself to be consistently ahead of the curve of
other polls for the past month.
Appendix – Questionnaire
Voting Registration.
• Are you currently a registered and eligible voter, or not?
    Yes
    No

Zip Code.
• What is the five-digit zip code for the address you registered to vote from, or if you’re not registered to vote, what is the zip code you would use?
    [open-ended]

Voting Likelihood.
1. How much thought have you given to the upcoming election for president?
    Quite a lot
    Some
    Only a little
    None
    Don’t know
2. Do you happen to know where people who live in your neighborhood go to vote?
    Yes
    No
3. Have you ever voted in your precinct or election district?
    Yes
    No
    Don’t know
4. Do you, yourself, plan to vote in the election this November, or not?
    Yes
    No
    Don’t know
5. How certain are you that you will vote?
    Absolutely certain
    Fairly certain
    Not certain
    Don’t know
6. How likely are you to vote in November’s presidential election?
    Extremely likely
    Very likely
    Somewhat likely
    Slightly likely
    Not at all likely
7. Thinking back to the elections held for Congress in November 2010, did things come up that kept you from voting, or did you happen to vote?
    Yes, voted
    No, did not vote
    Don’t know
8. How important is the presidential election to you?
    Extremely important
    Very important
    Somewhat important
    Slightly important
    Not at all important
9. If the election were held tomorrow, would you know where to go vote?
    Yes
    No
10. How often would you say you vote – always, nearly always, part of the time, or seldom?
    Always
    Nearly always
    Part of the time
    Seldom
    Never
    Don’t know
11. Thinking back to the elections held for Congress in November 2010, did you vote?
    Yes, voted
    No, did not vote
    Don’t know

Voting Preference.
• Suppose the presidential election were held today. Who would you be likely to vote for?
    Barack Obama
    Mitt Romney
    Don’t know / Other
• Which candidate are you leaning towards?
    Barack Obama
    Mitt Romney
    Other
    Don’t know

Demographics.
• Generally speaking do you usually think of yourself as a Republican, a Democrat, an Independent or something else?
    Democrat
    Republican
    Independent
    Something else
• What is the highest level of school you have completed or the highest degree you have received?
    Less than high school degree
    High school degree or equivalent
    Some college but no degree
    Associate degree
    Bachelor degree
    Graduate degree