An Introduction to Statistics and SPSS

					          PSYM021
Introduction to Methods & Statistics


Week Four: Statistical techniques II

            Cris Burgess
              Last week’s assignment
   All designs were between-subjects (non-repeated measures)
   Contrast coefficients?
   Questions ?
                         This week

   Polynomial contrasts – trends in the data
   Two-way ANOVA – interpreting contrasts
   Repeated-measures ANOVA – “Sphericity”
   Correlation
   Simple Regression
                    Polynomial contrasts

• “Does recall fall with delay, but then recover?”
• Quadratic trend

[Figure: recall performance (y-axis, 0 to 18) plotted against increasing delay (x-axis, 0 to 5)]
                    Polynomial contrasts

• “Does recall fall with delay, but only once a critical point is reached?”
• Cubic trend

[Figure: recall performance (y-axis, 0 to 16) plotted against increasing delay (x-axis, 0 to 8)]
                                         Polynomial trends
[Figure: four panels illustrating Linear, Quadratic, Cubic and Quartic trends]
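
A trend contrast can be sketched numerically by weighting the condition means with orthogonal polynomial coefficients. The example below is a minimal illustration, assuming five equally spaced delay conditions. The recall means are hypothetical (chosen to fall and then recover), and the names `COEFFICIENTS` and `contrast_value` are illustrative rather than SPSS terms.

```python
# Standard orthogonal polynomial contrast coefficients for five
# equally spaced levels (each set of weights sums to zero).
COEFFICIENTS = {
    "linear":    [-2, -1, 0, 1, 2],
    "quadratic": [2, -1, -2, -1, 2],
    "cubic":     [-1, 2, 0, -2, 1],
    "quartic":   [1, -4, 6, -4, 1],
}

# Hypothetical mean recall at five increasing delays: falls, then recovers.
recall_means = [14, 8, 6, 8, 14]


def contrast_value(weights, means):
    """Weighted sum of the condition means for one contrast."""
    return sum(w * m for w, m in zip(weights, means))


trend_values = {
    name: contrast_value(weights, recall_means)
    for name, weights in COEFFICIENTS.items()
}

for name, value in trend_values.items():
    print(f"{name:>9}: {value}")
```

With these made-up means only the quadratic contrast is non-zero, which matches the "falls, then recovers" pattern. Whether a non-zero contrast is reliable still depends on the F-ratio and significance level reported for it.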
                           Two-way ANOVA: Driving study
[Figure, left panel: accuracy by condition. Day-clear = 28, Night-clear = 24, Day-foggy = 16, Night-foggy = 4]
[Figure, right panel: mean accuracy by factor level. Day = 22, Night = 14, Clear = 26, Foggy = 10]

• Average performance in Day driving (mean = 22) is better than that for
  Night driving (mean = 14)
• Average performance in Clear driving (mean = 26) is better than that
  for Foggy driving (mean = 10)
• The drop in performance produced by fogginess is greater at Night
  (20-point drop) than it is for Day driving (12-point drop).
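
The three observations above are simple arithmetic on the four cell means shown in the chart. The sketch below reproduces that arithmetic only; it is not a significance test, and the helper name `marginal_mean` is an illustrative choice.

```python
# Cell means read from the bar chart (accuracy scores).
cell_means = {
    ("Day", "Clear"): 28,
    ("Night", "Clear"): 24,
    ("Day", "Foggy"): 16,
    ("Night", "Foggy"): 4,
}


def marginal_mean(factor_index, level):
    """Average the cell means that share one level of one factor.

    factor_index 0 selects TIME (Day/Night), 1 selects WEATHER (Clear/Foggy).
    """
    values = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(values) / len(values)


# Main effects: compare the marginal means of each factor.
day_mean = marginal_mean(0, "Day")
night_mean = marginal_mean(0, "Night")
clear_mean = marginal_mean(1, "Clear")
foggy_mean = marginal_mean(1, "Foggy")

# Interaction: compare the effect of fog at each time of day.
fog_drop_day = cell_means[("Day", "Clear")] - cell_means[("Day", "Foggy")]
fog_drop_night = cell_means[("Night", "Clear")] - cell_means[("Night", "Foggy")]
difference_of_differences = fog_drop_night - fog_drop_day

print("Day vs Night:", day_mean, night_mean)
print("Clear vs Foggy:", clear_mean, foggy_mean)
print("Fog drop (Day, Night):", fog_drop_day, fog_drop_night)
print("Difference of differences:", difference_of_differences)
```

The two main effects correspond to the differences between marginal means, and the interaction corresponds to the non-zero difference of differences.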

How can we provide statistical support for these observations?
  Results of Two-way ANOVA: Driving study

[SPSS output table: contents not recoverable from the source. Callouts on the slide mark the contrast F-ratios and the significance levels.]
                         Implications

   This means that you can use ANOVA to examine the
    independent effects on your data of 2, 3, 4 or more different
    independent variables
   In the SPSS output from such analyses, the so-called main
    effects tell you whether each variable has an independent effect
    (by itself) on the dependent variable
   The interactions tell you how the effect of each independent
    variable is itself altered by the influence of the others
         Interpreting main effects and interactions
   What does it actually mean to say that
        “the TIME main effect was significant”,
        “the WEATHER main effect was significant”
        And “that the TIME by WEATHER interaction was
         significant”?
   To answer this, it helps to plot the results...
[Figure: distance estimate (y-axis, 0 to 10) in Clear and Foggy conditions, with separate lines for Day and Night]
TIME main effect

Means average Day performance is reliably different from average Night performance

[Figure: distance estimate (y-axis, 0 to 10) in Clear and Foggy conditions, with separate lines for Day and Night]
WEATHER main effect

Means average performance in Clear conditions is reliably different from average performance in Foggy conditions

[Figure: distance estimate (y-axis, 0 to 10) in Clear and Foggy conditions, with separate lines for Day and Night]
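TIME by WEATHER interaction

Means the Day/Night difference in Clear conditions is reliably different from that in Foggy conditions

[Figure: distance estimate (y-axis, 0 to 10) in Clear and Foggy conditions, with separate lines for Day and Night; a callout “These differ” marks the two Day/Night gaps]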
         Graphs of possible non-interactions
                 (parallel graphs)

[Figure: four example plots of Day and Night lines across Clear and Foggy conditions; in each plot the two lines are parallel]
         Graphs of possible interactions
             (non-parallel graphs)

[Figure: four example plots of Day and Night lines across Clear and Foggy conditions; in each plot the two lines are not parallel]
                                                     Plots in ANOVA

[Figure: SPSS profile plot, “Estimated Marginal Means of psychotic”. x-axis: deprivation (normal sleep, 30 hours, 70 hours, 100 hours); y-axis: Estimated Marginal Means (5.00 to 8.00); separate lines for environment (friendly, neutral, hostile)]
       Describing main effects and interactions

(1) Distance estimates are affected by whether people are driving
    during day or at night (the TIME main effect)
(2) Distance estimates are affected by whether people are driving
    in foggy or clear conditions (the WEATHER main effect)
(3) The average difference in Day compared with Night estimates
    is itself affected by whether people are driving in foggy or
    clear conditions (the TIME*WEATHER interaction)
                             Be clear!

   For example, it is wrong to describe the interaction as
    showing that:
      “distance estimates are affected by both WEATHER and
       TIME of day at which the test is done”
   This could easily be a description of a completely different
    experimental outcome:
       (1) “estimates are affected by WEATHER”;
       (2) “estimates are affected by TIME of day” but
       (3) No interaction.
                          Summary


   Main effects tell you something about the differences in
    performance that occur when an independent variable is
    manipulated (e.g. effect of day vs. night, or of foggy vs.
    clear)
   The interaction tells you about differences between
    differences (or more generally between the profiles of the
    effects of an independent variable).
            Repeated-measures ANOVA

• When data for each new experimental condition are provided by
  testing a completely new and independent set of subjects
• “Non-repeated measures” or “Between subjects” design
• Independent or Non-repeated Measures ANOVA

• When subjects are tested on two or more occasions
• “Repeated measures” or “Within subjects” design
• Repeated Measures ANOVA
• But, …Houston, we have a problem….
• “Sphericity”
                   Problems with Sphericity


   Only relevant to repeated measures
   Not necessary for contrasts (not even contrasts using repeated
    measures variables)
                        Rough Definition...
                Delay                                Differences
           D1    D2      D3                  D1-D2      D1-D3      D2-D3

      S1   10     9      8            S1        1         2         1
      S2   10     8     10            S2        2         0        -2
      S3    9     9      8            S3        0         1         1
      S4   10     5      1            S4        5         9         4
      S5    9     6      0            S5        3         9         6
      S6    8     5      0            S6        3         8         5
      S7    9     1      0            S7        8         9         1
      S8   10     0      1            S8        10        9        -1
      S9   10     2      0            S9        8        10         2


For Sphericity, standard deviations of these three difference columns must be equal

In other words, we assume that the effect of the manipulation (in this case ‘delay’)
is approximately the same for all participants
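
The rough definition can be checked directly: rebuild the three difference columns from the raw scores and compare their standard deviations. This is a minimal sketch using the table's data; the names `difference_columns` and `spread` are illustrative, and eyeballing the standard deviations is only an informal check, not a formal test.

```python
from statistics import stdev

# Recall scores copied from the table above: (D1, D2, D3) for each subject.
scores = {
    "S1": (10, 9, 8),
    "S2": (10, 8, 10),
    "S3": (9, 9, 8),
    "S4": (10, 5, 1),
    "S5": (9, 6, 0),
    "S6": (8, 5, 0),
    "S7": (9, 1, 0),
    "S8": (10, 0, 1),
    "S9": (10, 2, 0),
}


def difference_columns(table):
    """Return the pairwise difference scores, one list per pair of delays."""
    columns = {"D1-D2": [], "D1-D3": [], "D2-D3": []}
    for d1, d2, d3 in table.values():
        columns["D1-D2"].append(d1 - d2)
        columns["D1-D3"].append(d1 - d3)
        columns["D2-D3"].append(d2 - d3)
    return columns


differences = difference_columns(scores)

# Sample standard deviation of each difference column.
spread = {name: stdev(column) for name, column in differences.items()}

for name, sd in spread.items():
    print(f"{name}: SD = {sd:.2f}")
```

Markedly unequal values would suggest that the sphericity assumption is doubtful for these data.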
                      Assessing Sphericity

   One approach is to use “Mauchly’s test of Sphericity”
   A significant Mauchly W indicates that the sphericity
    assumption has been violated
   “Significant W = trouble”
   But… Mauchly’s test is inaccurate
   Routinely given as the first step of SPSS output
   Ignore Mauchly’s test table because it is not accurate
   What do we do instead?
   Greenhouse-Geisser or “lower bound” test
           Dealing with departures from Sphericity
                         assumptions

   Worst Case Scenario:
       This assumes that the violation of sphericity is as bad as it
        could possibly be
   In other words, each participant is affected entirely differently by
    the manipulation
   This is known as the “Lower Bound” test or Greenhouse-
    Geisser Conservative test
   For ANOVA procedures with Repeated-measures IVs, four
    versions of the F-test, each with its own p-value, are reported.
   Dealing with departures from Sphericity assumptions
SPSS printout: [table not recoverable from the source]

         Result is highly significant if it is
         safe to assume there is no Sphericity violation
          But only just significant if we assume
          the worst possible violation of the assumption
          G-G and H-F significance levels are intermediate
 Dealing with departures from Sphericity assumptions


[SPSS printout: table not recoverable from the source]

Notice that the only difference between the four tests lies in the degrees of freedom...
...and there is no difference in the F-ratios themselves
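
The point about degrees of freedom can be sketched as follows. Each corrected test multiplies both degrees of freedom by a factor, usually written epsilon, and leaves the F-ratio alone. The sketch assumes a single repeated-measures factor; the function names are illustrative, and the example sizes (three levels, nine subjects) are borrowed from the delay table earlier in the lecture rather than from the printout.

```python
def adjusted_df(levels, subjects, epsilon):
    """Degrees of freedom for a one-factor repeated-measures F-test
    after multiplying both by the correction factor epsilon."""
    df_effect = epsilon * (levels - 1)
    df_error = epsilon * (levels - 1) * (subjects - 1)
    return df_effect, df_error


def lower_bound_epsilon(levels):
    """Smallest possible epsilon: the worst-case sphericity violation."""
    return 1 / (levels - 1)


levels, subjects = 3, 9

# Epsilon = 1 leaves the degrees of freedom untouched (sphericity assumed).
sphericity_assumed = adjusted_df(levels, subjects, 1.0)

# The lower-bound test uses the smallest epsilon, so the smallest df.
lower_bound = adjusted_df(levels, subjects, lower_bound_epsilon(levels))

print("Sphericity assumed df:", sphericity_assumed)
print("Lower-bound df:", lower_bound)
```

The Greenhouse-Geisser and Huynh-Feldt tests use epsilon values estimated from the data, lying between these two extremes. The same F-ratio is evaluated against fewer degrees of freedom, which is why their significance levels are intermediate.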
                  Undergraduate lectures

   Sphericity will be covered in more detail in PSYM022
   But …
   Undergraduate lectures on ‘Sphericity’ take place on:
   Tuesday Nov 14th and 21st
         Location: Newman E, 2:00-3:00pm
   You are advised to attend these !
                              Summary
   Non-parametric tests are limited in their ability to provide the
    experimenter with grounds for drawing conclusions - parametric
    tests provide more detailed information
   ‘Tests of difference’ use a statistic that reflects a ‘signal to noise’
    ratio, or how much variance in the DV is accounted for by the IV,
    compared with what is left
   The only fundamental difference between a t-test and ANOVA is
    the number of levels in the Independent Variable (IV)
   T-tests: IV has two levels; ANOVA: IV has three or more levels (or
    two or more IVs with 2+ levels)
   We can combine a number of IVs together in the same ANOVA
    procedure (two-way, three-way etc.), identifying their individual
    and combined (interaction) effects on the DV
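
The link between the t-test and ANOVA can be illustrated with a small calculation. With two independent groups, the one-way ANOVA F-ratio equals the square of the pooled-variance t statistic. The scores below are hypothetical, and `one_way_f` and `independent_t` are illustrative helpers rather than SPSS procedures.

```python
from statistics import mean


def one_way_f(groups):
    """F = between-groups variance ("signal") / within-groups variance ("noise")."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def independent_t(group_a, group_b):
    """Independent-samples t statistic using the pooled variance."""
    ss_a = sum((score - mean(group_a)) ** 2 for score in group_a)
    ss_b = sum((score - mean(group_b)) ** 2 for score in group_b)
    pooled_variance = (ss_a + ss_b) / (len(group_a) + len(group_b) - 2)
    standard_error = (pooled_variance * (1 / len(group_a) + 1 / len(group_b))) ** 0.5
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical scores for a two-level IV.
condition_a = [4, 5, 6, 7, 8]
condition_b = [1, 2, 3, 4, 5]

t_value = independent_t(condition_a, condition_b)
f_value = one_way_f([condition_a, condition_b])

print(f"t = {t_value:.3f}, t squared = {t_value ** 2:.3f}, F = {f_value:.3f}")
```

Passing three or more groups to `one_way_f` gives the ANOVA case that a single t-test cannot handle.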
                           Break
   If I needed a drink last week, today I need a swim…
   Five minutes – please be prompt
             Test of association - Correlation

   A correlation measures the “degree of association” between
    two variables (interval or ordinal)
   Associations can be positive (an increase in one variable is
    associated with an increase in the other) or negative (an
    increase in one variable is associated with a decrease in the
    other)
   Correlation is measured in “r” (parametric, Pearson’s) or “ρ”
    (non-parametric, Spearman’s)
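
Both coefficients can be computed by hand, as in the minimal sketch below. Spearman's coefficient is calculated here as Pearson's r applied to the ranks of the data, with tied values sharing their average rank. The attitude and frequency scores are hypothetical, and the function names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: frequency always rises with attitude, but not linearly.
attitude = [1, 2, 3, 4, 5, 6]
frequency = [1, 4, 9, 16, 25, 36]

r = pearson_r(attitude, frequency)
rho = spearman_rho(attitude, frequency)
rho_reversed = spearman_rho(attitude, frequency[::-1])

print(f"Pearson r = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
print(f"Spearman rho with frequency reversed = {rho_reversed:.3f}")
```

Because the made-up relationship is perfectly ordered but curved, the rank-based coefficient reaches its maximum while Pearson's r falls short of it. Reversing one variable flips the sign, giving a negative association.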
                         Test of association - Correlation

    Compare two continuous variables in terms of degree of
     association
       e.g. attitude scale vs behavioural frequency




[Figure: two scatterplots illustrating a Positive and a Negative association]
                    Test of association - Correlation
    Test statistic is “r” (parametric) or “ρ” (non-parametric)
      0 (random distribution, zero correlation)
      1 (perfect correlation)
[Figure: two scatterplots illustrating a High and a Low correlation]
                    Test of association - Correlation
    Test statistic is “r” (parametric) or “ρ” (non-parametric)
      0 (random distribution, zero correlation)
      1 (perfect correlation)

[Figure: two scatterplots illustrating a High and a Zero correlation]
                                       Correlation: Height vs Weight


[Figure: Graph One: Relationship between Height and Weight. Scatterplot of Weight (kgs) against Height (cms)]

   Strong positive correlation between height and weight
   Can see how the relationship works, but cannot predict one from the other
   If 120cm tall, then how heavy?
                             Example: Symptom Index vs Drug A


[Figure: Graph Two: Relationship between Symptom Index and Drug A. Scatterplot of Symptom Index against Drug A (dose in mg)]

   Strong negative correlation
   Can see how relationship works, but cannot make predictions
   What Symptom Index might we predict for a standard dose of 150mg?
                              Example: Symptom Index vs Drug A


[Figure: Graph Three: Relationship between Symptom Index and Drug A (with best-fit line). Scatterplot of Symptom Index against Drug A (dose in mg)]

   “Best fit line”
   Allows us to describe relationship between variables more accurately.
   We can now predict specific values of one variable from knowledge of the other
   All points are close to the line
                              Example: Symptom Index vs Drug B


[Figure: Graph Four: Relationship between Symptom Index and Drug B (with best-fit line). Scatterplot of Symptom Index against Drug B (dose in mg)]

   We can still predict specific values of one variable from knowledge of the other
   Will predictions be as accurate?
   Why not?
   “Residuals”
                                    Simple Regression
                               How best to summarise the data?

[Figure: two scatterplots of Symptom Index against Drug A (dose in mg)]

                 Adding a best-fit line allows us to describe data simply
             General Linear Model (GLM)
             How best to summarise the data?

   Establish equation for the best-fit line:
                       y = bx + a

Where: a = y intercept (constant)
       b = slope of best-fit line
       y = dependent variable
       x = independent variable

[Figure: scatterplot with best-fit line]
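
One common way to establish b and a is ordinary least squares, sketched below. The dose and Symptom Index values are hypothetical stand-ins for the Drug A scatterplot, and `fit_line` is an illustrative helper; SPSS reports the same two quantities in its Coefficients table.

```python
def fit_line(x, y):
    """Least-squares slope b and intercept a for the line y = bx + a."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return b, a


# Hypothetical data: Symptom Index falls as the dose of Drug A rises.
dose_mg = [50, 100, 150, 200, 250]
symptom_index = [140, 115, 95, 70, 45]

b, a = fit_line(dose_mg, symptom_index)

# Use the fitted equation to predict the Symptom Index for a 150 mg dose.
predicted_at_150 = b * 150 + a

print(f"y = {b:.2f}x + {a:.1f}")
print(f"Predicted Symptom Index at 150 mg: {predicted_at_150:.1f}")
```

Once b and a are known, the question "what Symptom Index might we predict for a standard dose?" becomes a substitution into the equation.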
                   Simple Regression
                      Terminology


   Establish equation for the best-fit line:
                           y = bx + a



   “Best-fit” line same as “Regression” line
   b is the “regression coefficient” for x
   x is the “predictor” or “regressor” variable for y
                      Simple Regression
                     R² - “Goodness of fit”

   For simple regression, R² is the square of the correlation
    coefficient
   Reflects variance accounted for in data by the best-fit line
   Takes values between 0 (0%) and 1 (100%)
   Frequently expressed as percentage, rather than decimal
   High values show good fit, low values show poor fit
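
The two descriptions of R² above (the squared correlation, and the variance accounted for by the best-fit line) give the same number in simple regression. The sketch below checks this on hypothetical dose and Symptom Index values; all names are illustrative.

```python
from math import sqrt

# Hypothetical data: dose of a drug (mg) and Symptom Index.
dose_mg = [50, 100, 150, 200, 250]
symptom_index = [140, 115, 95, 70, 45]

n = len(dose_mg)
mean_x = sum(dose_mg) / n
mean_y = sum(symptom_index) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dose_mg, symptom_index))
sxx = sum((x - mean_x) ** 2 for x in dose_mg)
syy = sum((y - mean_y) ** 2 for y in symptom_index)

# Route 1: square the correlation coefficient.
r = sxy / sqrt(sxx * syy)
r_squared_from_r = r ** 2

# Route 2: proportion of variance in y accounted for by the best-fit line.
b = sxy / sxx
a = mean_y - b * mean_x
residuals = [y - (b * x + a) for x, y in zip(dose_mg, symptom_index)]
ss_residual = sum(e ** 2 for e in residuals)
r_squared_from_fit = 1 - ss_residual / syy

print(f"r = {r:.4f}")
print(f"R squared from r:   {r_squared_from_r:.4f}")
print(f"R squared from fit: {r_squared_from_fit:.4f}")
print(f"As a percentage:    {100 * r_squared_from_fit:.1f}%")
```

The residuals are the vertical distances of the points from the line; the larger they are relative to the total spread in y, the lower R² becomes.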
                              Simple Regression
                               Low values of R²

   R² = 0 (0% - randomly scattered points, no apparent relationship between X and Y)
   Implies that a best-fit line will be a very poor description of data

[Figure: scatterplot of DV against IV (regressor, predictor)]
                                            Simple Regression
                                             High values of R²

   R² = 1 (100% - points lie directly on the line - perfect relationship between X and Y)
   Implies that a best-fit line will be a very good description of data

[Figure: two scatterplots of DV against IV]
                                                    Simple Regression
                                                   R² - “Goodness of fit”

[Figure: two scatterplots of Symptom Index against dose in mg, one for Drug A and one for Drug B]

   Drug A: Good fit, R² high, High variance explained
   Drug B: Moderate fit, R² lower, Less variance explained
                     Simple Regression
                       Significance test


   Simple regression uses a t-test to establish whether or not
    the model describes a significant proportion of the variance
    in the data
   This test is reported in the SPSS output
                          Simple Regression
                         R² - “Goodness of fit”

                                    Model Summary

   Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
   1       .721(a)   .520       .399                17.70134
   a. Predictors: (Constant), AGE, GENDER, INCOME

   R² is reported in the first table in the SPSS output
   Expressed as a decimal, but can be reported as a percentage
   0.520 = 52%
    How to establish the equation for the best-fit line?
                                  Coefficients(a)

                        Unstandardized Coefficients   Standardized Coefficients
   Model                B            Std. Error       Beta                        t        Sig.
   1     (Constant)     68.285       15.444                                       4.421    .001
         INCOME         -9.34E-02    .029             -.682                       -3.178   .008
         GENDER         3.306        8.942            .075                        .370     .718
         AGE            -.162        .344             -.101                       -.470    .646
   a. Dependent Variable: DEPRESS



   SPSS output table entitled Coefficients
   Column headed Unstandardised coefficients - B
   Gives regression coefficient for each regressor variable (IV)
       Coefficient for AGE = -0.162
       Constant = 68.285
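   DEPRESS = -0.162 AGE + 68.285 (taking AGE alone; the full fitted equation also includes the INCOME and GENDER terms from the table)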
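
Because the Coefficients table lists three regressors, a prediction from this output would normally use all three B values plus the constant. The sketch below assembles that equation from the table. The function name and the example inputs are hypothetical, since the slides do not state the units of INCOME or how GENDER is coded.

```python
# Unstandardized coefficients (column B) copied from the Coefficients table.
CONSTANT = 68.285
B = {
    "INCOME": -9.34e-02,
    "GENDER": 3.306,
    "AGE": -0.162,
}


def predict_depress(income, gender, age):
    """Predicted DEPRESS score: constant plus each B times its regressor."""
    return (
        CONSTANT
        + B["INCOME"] * income
        + B["GENDER"] * gender
        + B["AGE"] * age
    )


# With every regressor at zero the prediction is just the constant.
baseline = predict_depress(income=0, gender=0, age=0)

# Holding the others at zero, each extra year of AGE changes the
# prediction by the AGE coefficient (hypothetical input values).
ten_years_older = predict_depress(income=0, gender=0, age=10)

print(f"Baseline prediction: {baseline:.3f}")
print(f"Change for 10 more years of AGE: {ten_years_older - baseline:.3f}")
```

The AGE-only equation on the slide corresponds to holding INCOME and GENDER at zero in this function.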

				