

What if your predictors are categorical?
   Or
   ANOVA as Regression

   Created by Amy Schweinle, PhD
   November 28, 2007
   For EDER 860
A note
My lecture on analyzing models with categorical predictors and
continuous dependent variables has been converted to PowerPoint.

I have added some audio comments on many (most, all?) slides.
Wherever you see an icon like this one, click on it to hear what I
have to say. You get to turn on and off my voice as you wish. No, this
feature does not extend to class! Try it with the icon in the bottom
corner. Take a moment to adjust your volume.

Rather than setting this up to automatically play and progress through
slides, you can move through at your own pace and move back and
forth as you need.

Go make some popcorn, grab a pencil and some paper, sit back and
enjoy the show!
Getting started
   First, let’s review the format of the linear models
    we’ve used so far:
                   ˆ
                   Y  b0  b1 X
       Our goal was to find optimal weights for the intercept and
        slope to best predict Y.
       We used ordinary least squares methods to generate b0 and
        b1 so that the sum of squared residuals was minimized
        (least squared residuals – aha).
       This is the heart of the General Linear Model (GLM) – set
        up a linear model and find the best weights to predict a DV
        or set of DVs from an IV or set of IVs.
Explore the GLM
   Notice I didn’t make restrictions on the measurement
    scales for the IVs and DVs.
   You can use a GLM for continuous or categorical
    variables.
   We have only addressed continuous IVs and DVs.
   In logistic regression, we’ll address dichotomous DVs.
   In EDER 862, multivariate, we address categorical DVs
    specifically with the GLM as well as categorical IVs and
    combinations of categorical/continuous IVs and DVs.
    Exciting stuff! I know that your hands are tensing and feet
    are shaking with the excitement of such a class – but be
    patient and finish this class first.
Play with categorical IVs
             Male                     69”
             Female                   64”
             All adults               66.5”

   $\hat{Y} = b_0 + b_1 X_{sex}$

   With male = 0 and female = 1:   $\hat{Y} = 69 + (-5) X_{sex}$
   With female = 0 and male = 1:   $\hat{Y} = 64 + 5 X_{sex}$
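The two codings of sex can be checked with a few lines of Python. This is only a sketch: the function name `predict_height` and the labels "coding A" and "coding B" are illustrative and do not come from the slides.

```python
def predict_height(b0, b1, x_sex):
    """Predicted height in inches from a linear model with one 0/1 predictor."""
    return b0 + b1 * x_sex

# Coding A (assumed labels): male = 0, female = 1.
# The intercept is the male mean; the slope is female mean minus male mean.
male_a = predict_height(69, -5, 0)
female_a = predict_height(69, -5, 1)

# Coding B (assumed labels): female = 0, male = 1.
# The intercept is the female mean; the slope is male mean minus female mean.
female_b = predict_height(64, 5, 0)
male_b = predict_height(64, 5, 1)

print(male_a, female_a)  # 69 64
print(male_b, female_b)  # 69 64
```

Either coding returns the same two group means; only the meaning of the intercept and the sign of the slope change.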
Recall if you will
   Think back to EDER 762 when you addressed
    ANOVA: analysis of variance
   We calculated the variance accounted for by group
    (e.g., sex) as a function of variance not accounted for
    (i.e., error or residual)
   Sound familiar? MSregression and MSerror!
   We just used different terms: MSbetween and MSwithin.
   So, ANOVA = Regression, the IVs are just
    categorical.
ANOVA = Regression
   ANOVA                               Regression
   Categorical IVs                     Continuous IVs
   SS between groups                   SS regression
   SS within groups or error           SS residual or error
   Interactions                        Interactions
   Non-linear relationships            Non-linear relationships
   $\eta^2$                            $R^2$
   $F = \dfrac{\sigma^2_{\text{between groups}}}{\sigma^2_{\text{within groups}}}$          $F = \dfrac{\sigma^2_{\text{due to IV}}}{\sigma^2_{\text{not due to IV}}}$

   In both cases:  $F = \dfrac{\sigma^2_{\text{due to IV}}}{\sigma^2_{\text{not due to IV}}}$
How do you do it?
   Dummy Coding!


   Or effect coding
   Or contrast coding
   Or (I could go on)
   We’ll just address dummy coding, but know
    that others exist and serve different purposes.
Dummy Coding
   Use a system of 0s and 1s to identify group
    membership
   0 = male and 1 = female
   1 = experimental and 0 = control
   The numbers are symbols or codes for the
    group. These are nominal scale and do not
    reflect quantities or rank orders.
Dummy Coding Example
                Experimental    Control
                     20            10
                     18            12
                     17            11
                     17            15
                     13            17
    Sum =            85            65
    Mean =           17            13
    Sum sq =         26            34

    $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum x_1^2 + \sum x_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim t_{n_1 + n_2 - 2}$

    $t = \dfrac{17 - 13}{\sqrt{\dfrac{26 + 34}{5 + 5 - 2}\left(\dfrac{1}{5} + \dfrac{1}{5}\right)}} = 2.31$

A hint: write down the means of each group – they’ll show up later.
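As a check on the arithmetic, the pooled-variance t test can be reproduced with the Python standard library. This is a minimal sketch; the helper names `mean` and `sum_sq_dev` are illustrative and not part of the slides.

```python
import math

experimental = [20, 18, 17, 17, 13]
control = [10, 12, 11, 15, 17]

def mean(values):
    return sum(values) / len(values)

def sum_sq_dev(values):
    """Sum of squared deviations from the group mean ("Sum sq" on the slide)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

n1, n2 = len(experimental), len(control)
df = n1 + n2 - 2

# Pooled variance: both groups' sums of squares over the shared degrees of freedom
pooled_var = (sum_sq_dev(experimental) + sum_sq_dev(control)) / df
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean(experimental) - mean(control)) / std_err

print(round(t_stat, 2), df)  # 2.31 8
```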
Dummy Coding
                             X2           X3
     Subject       Y     Experimental   Control
       1           20         1           0
       2           18         1           0
       3           17         1           0
       4           17         1           0
       5           13         1           0
       6           10         0           1
       7           12         0           1
       8           11         0           1
       9           15         0           1
       10          17         0           1

  Mean =            15       0.5          0.5
  SS deviation =   100       2.5          2.5
   Use the regression formulas with either the
    X2 or X3 column as the IV and the Y column
    as the DV.
   See the next pages for the formulas.
Regression Equation
   $\hat{Y} = a + bX_2$                   $\hat{Y} = a + bX_3$

   $\hat{Y} = 13 + 4X_2$                  $\hat{Y} = 17 - 4X_3$

   $\hat{Y} = 13 + 4(1) = 17$             $\hat{Y} = 17 - 4(1) = 13$

   $\hat{Y} = 13 + 4(0) = 13$             $\hat{Y} = 17 - 4(0) = 17$
If Experimental is coded as 1 and control as 0, this is the regression equation. Notice the positive slope and that the line goes through the mean of each group (the predicted values at those levels of X).

[Figure: Score (vertical axis, 5 to 21) plotted against Condition (horizontal axis, -0.5 to 1) for the Experimental and Control groups.]
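The intercepts and slopes for the two dummy columns can be reproduced with the simple-regression formulas (slope = sum of cross-products over sum of squares of X, intercept = mean of Y minus slope times mean of X). The sketch below uses only the slide's data; the function name `fit_line` is illustrative.

```python
y = [20, 18, 17, 17, 13, 10, 12, 11, 15, 17]
x2 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = experimental, 0 = control
x3 = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 1 = control, 0 = experimental

def fit_line(x, y):
    """Return (intercept a, slope b) for the least-squares line of y on x."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum_xy / sum_xx
    a = y_bar - b * x_bar
    return a, b

a2, b2 = fit_line(x2, y)
a3, b3 = fit_line(x3, y)

print(a2, b2)  # 13.0 4.0
print(a3, b3)  # 17.0 -4.0
```

In each case the intercept is the mean of the group coded 0, and the slope is the difference between the two group means.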
Significance Tests
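   (N = number of subjects, p = number of predictors)

   Regression Coefficient
      $t = \dfrac{b}{s_b} \sim t_{N-p-1}$

   Sums of Squares
      $SS_{reg} = b \sum xy$
      $SS_{res} = \sum y^2 - SS_{reg}$
R-squared
      $R^2 = \dfrac{SS_{reg}}{\sum y^2} = \dfrac{SS_{reg}}{SS_{total}}$

      $F = \dfrac{R^2 / p}{(1 - R^2) / (N - p - 1)} \sim F_{p,\,N-p-1}$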
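Applied to the experimental/control data, these formulas can be worked through in Python. This is a sketch under one stated assumption: the standard error of the slope, `s_b`, is computed with the usual single-predictor formula (square root of the residual mean square over the sum of squares of X), which the slides do not spell out.

```python
import math

y = [20, 18, 17, 17, 13, 10, 12, 11, 15, 17]
x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = experimental, 0 = control
n, p = len(y), 1                    # N subjects, p predictors

x_bar = sum(x) / n
y_bar = sum(y) / n
sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sum_xx = sum((xi - x_bar) ** 2 for xi in x)
sum_yy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares

b = sum_xy / sum_xx
ss_reg = b * sum_xy
ss_res = sum_yy - ss_reg
r_squared = ss_reg / sum_yy

# Assumed formula for the standard error of b with one predictor
ms_res = ss_res / (n - p - 1)
s_b = math.sqrt(ms_res / sum_xx)

t_stat = b / s_b
f_stat = (r_squared / p) / ((1 - r_squared) / (n - p - 1))

print(ss_reg, ss_res, r_squared)           # 40.0 60.0 0.4
print(round(t_stat, 2), round(f_stat, 2))  # 2.31 5.33
```

The t value matches the independent-groups t test on the same data, and F equals t squared, as expected with a single predictor.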
What if you have >2 groups?
   You are restricted to using only 0s and 1s, so
    one column isn’t enough.
   You need more columns.
Multiple Categories
               A1     A2     A3
                4      7      1
                5      8      2
                6      9      3
                7     10      4
                8     11      5
   Sum =       30     45     15
   Mean =       6      9      3
Dummy Codes
   Subject     Y    D1    D2    D3
      1        4     1     0     0
      2        5     1     0     0
      3        6     1     0     0
      4        7     1     0     0
      5        8     1     0     0
      6        7     0     1     0
      7        8     0     1     0
      8        9     0     1     0
      9       10     0     1     0
     10       11     0     1     0
     11        1     0     0     1
     12        2     0     0     1
     13        3     0     0     1
     14        4     0     0     1
     15        5     0     0     1
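The dummy-code columns can be generated from the group data rather than typed by hand. The sketch below is one way to do it in plain Python; the variable names are illustrative.

```python
groups = {
    "A1": [4, 5, 6, 7, 8],
    "A2": [7, 8, 9, 10, 11],
    "A3": [1, 2, 3, 4, 5],
}
group_names = list(groups)

rows = []  # each row: (subject, Y, D1, D2, D3)
subject = 1
for name, scores in groups.items():
    for score in scores:
        # One column per group: 1 if the subject belongs to it, otherwise 0
        codes = [1 if name == g else 0 for g in group_names]
        rows.append((subject, score, *codes))
        subject += 1

print("Subject Y D1 D2 D3")
for row in rows:
    print(*row)
```

Each subject has exactly one 1 across D1, D2 and D3, so any two of the three columns are enough to identify group membership.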
Dummy Codes
   It’s your turn – try it.
   Run a regression with these data:
       Regress Y on D1 and D2
       Regress Y on D1 and D3
       Regress Y on D2 and D3
   Look for:
       Regression equation, slopes and intercept
       Calculate predicted values for people in each group
       (See the slides 26, 28 and 31)
   SPSS information follows.
SPSS
Analyze → Regression → Linear
 $\hat{Y} = 3.0 + 3.0 D_1 + 6.0 D_2$

If in group 1, not 2 or 3, then:
 $\hat{Y} = 3.0 + 3.0(1) + 6.0(0) = 6.0$

If in group 2, not 1 or 3, then:
 $\hat{Y} = 3.0 + 3.0(0) + 6.0(1) = 9.0$

If in group 3, not 1 or 2, then:
 $\hat{Y} = 3.0 + 3.0(0) + 6.0(0) = 3.0$
Notice, these are group means!
What if we used D1 and D3 instead?
 $\hat{Y} = 9.0 - 3.0 D_1 - 6.0 D_3$

If in group 1, not 2 or 3, then:
 $\hat{Y} = 9.0 - 3.0(1) - 6.0(0) = 6.0$

If in group 2, not 1 or 3, then:
 $\hat{Y} = 9.0 - 3.0(0) - 6.0(0) = 9.0$

If in group 3, not 1 or 2, then:
 $\hat{Y} = 9.0 - 3.0(0) - 6.0(1) = 3.0$
Notice, these are group means!
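For readers without SPSS, the three regressions from the exercise can be run in plain Python by solving the normal equations. This is a hedged sketch, not the SPSS procedure: the helpers `solve` and `regress` are illustrative names written for this example, and a statistics package would normally do this work.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot row
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def regress(y, *predictors):
    """Least-squares weights [b0, b1, ...] from the normal equations X'X b = X'y."""
    rows = [[1, *xs] for xs in zip(*predictors)]  # leading 1 is the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

y = [4, 5, 6, 7, 8, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5]
d1 = [1] * 5 + [0] * 10
d2 = [0] * 5 + [1] * 5 + [0] * 5
d3 = [0] * 10 + [1] * 5

pairs = {"D1, D2": (d1, d2), "D1, D3": (d1, d3), "D2, D3": (d2, d3)}
results = {label: regress(y, xa, xb) for label, (xa, xb) in pairs.items()}

for label, weights in results.items():
    print(label, [round(w, 2) for w in weights])
```

In every version the intercept is the mean of the group left out of the equation (the one coded 0 on both columns), and each slope is the difference between a coded group's mean and that reference mean.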

								