An Introduction to Statistics and SPSS

```					          PSYM021
Introduction to Methods & Statistics

Week Four: Statistical techniques II

Cris Burgess
Last week’s assignment
   All designs were between-subjects (non-repeated measures)
   Contrast coefficients?
   Questions ?
This week

   Polynomial contrasts – trends in the data
   Two-way ANOVA – interpreting contrasts
   Repeated-measures ANOVA – “Sphericity”
   Correlation
   Simple Regression
Polynomial contrasts

• “Does recall fall with delay, but then recover?”

[Figure: Recall performance plotted against Increasing Delay]
Polynomial contrasts

• “Does recall fall with delay, but only once a critical point is reached?”
• Cubic trend

[Figure: Recall performance plotted against Increasing Delay]
Polynomial trends

[Figure: four panels of trend shapes; the surviving panel titles are Linear, Cubic and Quartic]
Two-way ANOVA: Driving study

[Figure: two bar charts of mean accuracy, one by condition and one by factor level]
Condition means:  Day-clear 28   Night-clear 24   Day-foggy 16   Night-foggy 4
Factor means:     Day 22   Night 14   Clear 26   Foggy 10

• Average performance in Day driving (mean = 22) is better than that for
Night driving (mean = 14)
• Average performance in Clear driving (mean = 26) is better than that
for Foggy driving (mean = 10)
• The drop in performance produced by fogginess is greater at Night
(20-point drop) than it is for Day driving (12-point drop).

How can we provide statistical support for these observations?
Results of Two-way ANOVA: Driving study

[SPSS output table: contents not recoverable. Slide annotations mark the contrast F-ratios and the significance levels.]
Implications

   This means that you can use ANOVA to examine the
independent effects on your data of 2, 3, 4 or more different
independent variables
   In the SPSS output from such analyses, the so-called main
effects tell you whether each variable has an independent effect
(by itself) on the dependent variable
   The interactions tell you how the effect of each independent
variable is itself altered by the influence of the others
Interpreting main effects and interactions
   What does it actually mean to say that
   “the TIME main effect was significant”,
   “the WEATHER main effect was significant”
   and that “the TIME by WEATHER interaction was significant”?
   To answer this, it helps to plot the results...
[Figure: line plot of Distance estimate (0 to 10) for Day and Night driving in Clear and Foggy conditions; the same plot is repeated beside each point below]

TIME main effect
   Means average Day performance is reliably different from average Night performance

WEATHER main effect
   Means average performance in Clear is reliably different from average performance in Foggy conditions

TIME by WEATHER interaction
   Means the Day/Night difference in Clear conditions is reliably different from that in Foggy conditions
   (figure annotation: “These differ”)
Graphs of possible non-interactions
(parallel graphs)

[Figure: four panels, each plotting Day and Night lines across Clear and Foggy conditions]
Graphs of possible interactions
(non-parallel graphs)

[Figure: four panels, each plotting Day and Night lines across Clear and Foggy conditions]
Plots in ANOVA

[Figure: SPSS plot “Estimated Marginal Means of psychotic”; y-axis: Estimated Marginal Means (5.00 to 8.00); x-axis: deprivation (normal sleep, 30 hours, 70 hours, 100 hours); separate lines for environment (friendly, neutral, hostile)]
Describing main effects and interactions

(1) Distance estimates are affected by whether people are driving
during day or at night (the TIME main effect)
(2) Distance estimates are affected by whether people are driving
in foggy or clear conditions (the WEATHER main effect)
(3) The average difference in Day compared with Night estimates
is itself affected by whether people are driving in foggy or
clear conditions (the TIME*WEATHER interaction)
Be clear!

   For example, it is wrong to describe the interaction as
showing that:
“distance estimates are affected by both WEATHER and
TIME of day at which the test is done”
   This could easily be a description of a completely different
experimental outcome:
(1) “estimates are affected by WEATHER”;
(2) “estimates are affected by TIME of day” but
(3) No interaction.
Summary

   Main effects tell you something about the differences in
performance that occur when an independent variable is
manipulated (e.g. effect of day vs. night, or of foggy vs.
clear)
   The interaction tells you about differences between
differences (or more generally between the profiles of the
effects of an independent variable).
Repeated-measures ANOVA

• Data for each new experimental condition is provided by
testing a completely new and independent set of subjects
• “Non-repeated measures” or “Between subjects” design
• Independent or Non-repeated Measures ANOVA

• When subjects are tested on two or more occasions
• “Repeated measures” or “Within subjects” design
• Repeated Measures ANOVA
• But, …Houston, we have a problem….
• “Sphericity”
Problems with Sphericity

   Only relevant to repeated measures
   Not necessary for contrasts (not even contrasts using repeated
measures variables)
Rough Definition...
           Delay                       Differences
       D1    D2    D3            D1-D2   D1-D3   D2-D3

S1     10     9     8      S1       1       2       1
S2     10     8    10      S2       2       0      -2
S3      9     9     8      S3       0       1       1
S4     10     5     1      S4       5       9       4
S5      9     6     0      S5       3       9       6
S6      8     5     0      S6       3       8       5
S7      9     1     0      S7       8       9       1
S8     10     0     1      S8      10       9      -1
S9     10     2     0      S9       8      10       2

In other words, we assume that the effect of the manipulation (in this case ‘delay’)
is approximately the same for all participants

For Sphericity, standard deviations of the three difference columns must be equal
Assessing Sphericity

   One approach is to use “Mauchly’s test of Sphericity”
   A significant Mauchly W indicates that the sphericity
assumption has been violated
   “Significant W = trouble”
   But… Mauchly’s test is inaccurate
   Routinely given as the first step of SPSS output
   Ignore Mauchly’s test table because it is not accurate
   What do we do instead?
   Greenhouse-Geisser or “lower bound” test
Dealing with departures from Sphericity
assumptions

   Worst Case Scenario:
   This assumes that the violation of sphericity is as bad as it
could possibly be
   In other words, each participant is affected entirely differently by
the manipulation
   This is known as the “Lower Bound” test or Greenhouse-
Geisser Conservative test
   For ANOVA procedures with Repeated-measures IVs, four
different F-ratios and p-values are reported.
Dealing with departures from Sphericity assumptions
SPSS printout: [table not recoverable]

Result is highly significant if it is
safe to assume there is no Sphericity violation
But only just significant if we assume
the worst possible violation of the assumption
G-G and H-F significance levels are intermediate
Dealing with departures from Sphericity assumptions

[SPSS printout: table not recoverable]

Notice that the only difference between the four tests lies in the degrees of freedom
...and there is no difference in the F-ratios themselves

   Sphericity will be covered in more detail in PSYM022
   But …
   Undergraduate lectures on ‘Sphericity’ take place on:
   Tuesday Nov 14th and 21st
   Location: Newman E, 2:00-3:00pm
   You are advised to attend these !
Summary
   Non-parametric tests are limited in their ability to provide the
experimenter with grounds for drawing conclusions - parametric
tests provide more detailed information
   ‘Tests of difference’ use a statistic that reflects a ‘signal to noise’
ratio, or how much variance in the DV is accounted for by the IV,
compared with what is left
   The only fundamental difference between a t-test and ANOVA is
the number of levels in the Independent Variable (IV)
   T-tests: IV has two levels; ANOVA: IV has three or more levels (or
two or more IVs with 2+ levels)
   We can combine a number of IVs together in the same ANOVA
procedure (two-way, three-way etc.), identifying their individual
and combined (interaction) effects on the DV
Break
   If I needed a drink last week, today I need a swim…
   Five minutes – please be prompt
Test of association - Correlation

   A correlation measures the “degree of association” between
two variables (interval or ordinal)
   Associations can be positive (an increase in one variable is
associated with an increase in the other) or negative (an
increase in one variable is associated with a decrease in the
other)
   Correlation is measured in “r” (parametric, Pearson’s) or “ρ”
(non-parametric, Spearman’s)
Test of association - Correlation

    Compare two continuous variables in terms of degree of
association
 e.g. attitude scale vs behavioural frequency

[Figure: two scatterplots, labelled Positive and Negative]
Test of association - Correlation
    Test statistic is “r” (parametric) or “ρ” (non-parametric)
    0 (random distribution, zero correlation)
    1 (perfect correlation)

[Figure: two scatterplots, labelled High and Low]
Test of association - Correlation
    Test statistic is “r” (parametric) or “ρ” (non-parametric)
    0 (random distribution, zero correlation)
    1 (perfect correlation)

[Figure: two scatterplots, labelled High and Zero]
Correlation: Height vs Weight

[Figure: Graph One: Relationship between Height and Weight; Weight (kgs) against Height (cms)]

   Strong positive correlation between height and weight
   Can see how the relationship works, but cannot predict one from the other
   If 120cm tall, then how heavy?
Example: Symptom Index vs Drug A

[Figure: Graph Two: Relationship between Symptom Index and Drug A; Symptom Index against Drug A (dose in mg)]

   Strong negative correlation
   Can see how relationship works, but cannot make predictions
   What Symptom Index might we predict for a standard dose of 150mg?
Example: Symptom Index vs Drug A

[Figure: Graph Three: Relationship between Symptom Index and Drug A (with best-fit line); Symptom Index against Drug A (dose in mg)]

   “Best fit line”
   Allows us to describe relationship between variables more accurately.
   We can now predict specific values of one variable from knowledge of the other
   All points are close to the line
Example: Symptom Index vs Drug B

[Figure: Graph Four: Relationship between Symptom Index and Drug B (with best-fit line); Symptom Index against Drug B (dose in mg)]

   We can still predict specific values of one variable from knowledge of the other
   Will predictions be as accurate?
   Why not?
   “Residuals”
Simple Regression
How best to summarise the data?

[Figure: two scatterplots of Symptom Index against Drug A (dose in mg)]

Adding a best-fit line allows us to describe data simply
General Linear Model (GLM)
How best to summarise the data?

   Establish equation for the best-fit line:
y = bx + a

Where: a = y intercept (constant)
       b = slope of best-fit line
       y = dependent variable
       x = independent variable

[Figure: plot accompanying the equation]
Simple Regression
Terminology

   Establish equation for the best-fit line:
y = bx + a

   “Best-fit” line same as “Regression” line
   b is the “regression coefficient” for x
   x is the “predictor” or “regressor” variable for y
Simple Regression
R2 - “Goodness of fit”

   For simple regression, R2 is the square of the correlation
coefficient
   Reflects variance accounted for in data by the best-fit line
   Takes values between 0 (0%) and 1 (100%)
   Frequently expressed as percentage, rather than decimal
   High values show good fit, low values show poor fit
Simple Regression
Low values of R2

[Figure: scatterplot of DV against IV (regressor, predictor)]

   R2 = 0 (0% - randomly scattered points, no apparent relationship between X and Y)
   Implies that a best-fit line will be a very poor description of data
Simple Regression
High values of R2

[Figure: two scatterplots of DV against IV]

   R2 = 1 (100% - points lie directly on the line - perfect relationship between X and Y)
   Implies that a best-fit line will be a very good description of data
Simple Regression
R2 - “Goodness of fit”

[Figure: two scatterplots of Symptom Index, one against Drug A (dose in mg) and one against Drug B (dose in mg)]

Drug A: Good fit, R2 high, high variance explained
Drug B: Moderate fit, R2 lower, less variance explained
Simple Regression
Significance test

   Simple regression uses a t-test to establish whether or not
the model describes a significant proportion of the variance
in the data
   This test is reported in the SPSS output
Simple Regression
R2 - “Goodness of fit”

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .721(a)   .520       .399                17.70134
a. Predictors: (Constant), AGE, GENDER, INCOME

   R2 is reported in the first table in the SPSS output
   Expressed as a decimal, but can be reported as a percentage
   0.520 = 52%
How to establish the equation for the best-fit line?
Coefficients(a)

                 Unstandardized Coefficients   Standardized Coefficients
Model            B            Std. Error       Beta        t         Sig.
1   (Constant)   68.285       15.444                       4.421     .001
    INCOME       -9.34E-02    .029             -.682       -3.178    .008
    GENDER       3.306        8.942            .075        .370      .718
    AGE          -.162        .344             -.101       -.470     .646
a. Dependent Variable: DEPRESS

   SPSS output table entitled Coefficients
   Column headed Unstandardised coefficients - B
   Gives regression coefficient for each regressor variable (IV)
   Coefficient for AGE = -0.162
   Constant = 68.285
   DEPRESS = -0.162 AGE + 68.285

```
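
The driving-study slides describe the two main effects and the interaction in words. The sketch below reproduces that arithmetic in Python. The four cell means are not quoted directly in the slide text; they are inferred here from the quoted factor means (Day 22, Night 14, Clear 26, Foggy 10) and the quoted fog drops (12 and 20 points), so treat them as an assumption. The helper name `marginal_mean` is hypothetical. This only shows what the effects describe; it is not the ANOVA significance test itself.

```python
# Cell means for mean accuracy, inferred from the figures quoted on the slide
# (an assumption: the slide text gives only the factor means and the drops).
cell_means = {
    ("Day", "Clear"): 28,
    ("Night", "Clear"): 24,
    ("Day", "Foggy"): 16,
    ("Night", "Foggy"): 4,
}


def marginal_mean(factor_index, level):
    """Average the cell means that share one level of one factor."""
    values = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(values) / len(values)


# Main effects compare these averages: Day vs Night, Clear vs Foggy.
day, night = marginal_mean(0, "Day"), marginal_mean(0, "Night")
clear, foggy = marginal_mean(1, "Clear"), marginal_mean(1, "Foggy")

# Effect of fog at each time of day.
fog_drop_day = cell_means[("Day", "Clear")] - cell_means[("Day", "Foggy")]
fog_drop_night = cell_means[("Night", "Clear")] - cell_means[("Night", "Foggy")]

# The interaction is a difference between differences.
interaction = fog_drop_night - fog_drop_day

print("Day, Night, Clear, Foggy:", day, night, clear, foggy)
print("Fog drop by Day and by Night:", fog_drop_day, fog_drop_night)
print("Difference between the drops:", interaction)
```

If the two drops were equal, the lines in the plot would be parallel and the difference between differences would be zero, which is the "no interaction" case.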
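
The "Rough Definition" slide says that, for sphericity, the spread of the three difference columns should be equal. As a minimal sketch, the code below recomputes the difference columns from the nine participants' scores on that slide and reports the sample variance of each. It uses only the standard library; the variable names are illustrative. Comparing variances by eye like this is a rough check only, not a formal test.

```python
from statistics import variance

# Scores at the three delays (D1, D2, D3), copied from the slide.
scores = {
    "S1": (10, 9, 8),
    "S2": (10, 8, 10),
    "S3": (9, 9, 8),
    "S4": (10, 5, 1),
    "S5": (9, 6, 0),
    "S6": (8, 5, 0),
    "S7": (9, 1, 0),
    "S8": (10, 0, 1),
    "S9": (10, 2, 0),
}

# Every pair of conditions gives one column of difference scores.
pairs = [(0, 1), (0, 2), (1, 2)]

difference_variances = {}
for i, j in pairs:
    differences = [row[i] - row[j] for row in scores.values()]
    label = f"D{i + 1}-D{j + 1}"
    # Sample variance (n - 1 denominator) of this difference column.
    difference_variances[label] = variance(differences)

for label, value in difference_variances.items():
    print(f"{label}: variance of differences = {value:.2f}")
```

The more these variances diverge, the less plausible the sphericity assumption is for the data.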
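
The correlation and simple regression slides link three quantities: Pearson's r, the best-fit line y = bx + a, and R2 as the square of r. The sketch below computes all three from first principles using least squares. The dose and symptom values are invented purely for illustration (the slides do not give the raw data), and the variable names are hypothetical.

```python
from math import sqrt

# Hypothetical data, invented for illustration only.
dose = [50, 100, 150, 200, 250]      # x: Drug A dose in mg
symptom = [140, 115, 95, 70, 40]     # y: Symptom Index

n = len(dose)
mean_x = sum(dose) / n
mean_y = sum(symptom) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in dose)
syy = sum((y - mean_y) ** 2 for y in symptom)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dose, symptom))

# Pearson correlation coefficient.
r = sxy / sqrt(sxx * syy)

# Least-squares best-fit line: y = b*x + a.
b = sxy / sxx             # slope (regression coefficient)
a = mean_y - b * mean_x   # y intercept (constant)

# R2: proportion of variance in y accounted for by the line.
residual_ss = sum((y - (b * x + a)) ** 2 for x, y in zip(dose, symptom))
r_squared = 1 - residual_ss / syy

# Predict the Symptom Index for a dose of 150 mg.
predicted_at_150 = b * 150 + a

print(f"r = {r:.3f}, R2 = {r_squared:.3f}")
print(f"best-fit line: y = {b:.3f}x + {a:.3f}")
print(f"predicted Symptom Index at 150 mg: {predicted_at_150:.1f}")
```

The residuals, the gaps between each observed y and the line, are what make predictions less accurate when points lie further from the line, as in the Drug B example.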
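
The final slide reads a regression equation off the B column of the SPSS Coefficients table. As a sketch, the code below writes that equation as a function. The slide's equation keeps only the AGE term and the constant. Because the table lists three regressors, a second function shows the full form with all three B values, on the assumption that every regressor is kept in the model. The function names and the input value of 40 are hypothetical, and the slides do not state the units or coding of the variables.

```python
# Unstandardised coefficients (column B) copied from the Coefficients table.
CONSTANT = 68.285
B = {"INCOME": -9.34e-02, "GENDER": 3.306, "AGE": -0.162}


def depress_from_age(age):
    """The equation as written on the slide: DEPRESS = -0.162 AGE + 68.285."""
    return B["AGE"] * age + CONSTANT


def depress_from_all(income, gender, age):
    """Full form, assuming all three regressors stay in the model."""
    return (
        CONSTANT
        + B["INCOME"] * income
        + B["GENDER"] * gender
        + B["AGE"] * age
    )


# Hypothetical input: the value 40 is illustrative only.
print(depress_from_age(40))
print(depress_from_all(income=0, gender=0, age=40))
```

With INCOME and GENDER set to zero the two functions agree, which shows that the slide's equation is the full equation with the other two terms dropped.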