# What if your predictors are categorical?

```
What if your predictors are categorical?
Or
ANOVA as Regression

Created by Amy Schweinle, PhD
November 28, 2007
For EDER 860
A note
My lecture on analyzing models with categorical predictors and
continuous dependent variables has been converted to PowerPoint.

Wherever you see an icon like this one, click on it to hear what I
have to say. You can turn my voice on and off as you wish. No, this
feature does not extend to class! Try it with the icon in the bottom

Rather than setting this up to automatically play and progress through
slides, you can move through at your own pace and move back and
forth as you need.

Go make some popcorn, grab a pencil and some paper, sit back and
enjoy the show!
Getting started
   First, let’s review the format of the linear models
we’ve used so far:
      Ŷ = b0 + b1 X
   Our goal was to find optimal weights for the intercept and
slope to best predict Y.
   We used ordinary least squares methods to generate b0 and
b1 so that the sum of squared residuals was minimized
(least squared residuals – aha).
   This is the heart of the General Linear Model (GLM) – set
up a linear model and find the best weights to predict a DV
or set of DVs from an IV or set of IVs.
Explore the GLM
   Notice I didn’t make restrictions on the measurement
scales for the IVs and DVs.
   You can use a GLM for continuous or categorical
variables.
   We have only addressed continuous IVs and DVs.
   In logistic regression, we’ll address dichotomous DVs.
   In EDER 862, multivariate, we address categorical DVs
specifically with the GLM as well as categorical IVs and
combinations of categorical/continuous IVs and DVs.
Exciting stuff! I know that your hands are tensing and feet
are shaking with the excitement of such a class – but be
patient and finish this class first.
Play with categorical IVs
         Male                     69”
         Female                   64”

   Ŷ = b0 + b1 X_sex

   With male coded 0 and female coded 1:   Ŷ = 69 + (−5) X_sex
   With female coded 0 and male coded 1:   Ŷ = 64 + 5 X_sex
Recall if you will
   Think back to EDER 762 when you addressed
ANOVA: analysis of variance
   We calculated the variance accounted for by group
(e.g., sex) relative to the variance not accounted for
(i.e., error or residual).
   Sound familiar? MSregression and MSerror!
   We just used different terms: MSbetween and MSwithin.
   So, ANOVA = Regression; the IVs are just
categorical.
ANOVA = Regression
ANOVA                           Regression
   Categorical IVs                    Continuous IVs
   SS between groups                  SS regression
   SS within groups or error          SS residual or error
   Interactions                       Interactions
   Non-linear relationships           Non-linear relationships
   η²                                 R²
   F = σ² between groups /            F = σ² due to IV /
       σ² within groups                   σ² not due to IV

In both cases:  F = σ² due to IV / σ² not due to IV
How do you do it?
   Dummy Coding!

   Or effect coding
   Or contrast coding
   Or (I could go on)
   We’ll just address dummy coding, but know
that others exist and serve different purposes.
Dummy Coding
   Use a system of 0s and 1s to identify group
membership
   0 = male and 1 = female
   1 = experimental and 0 = control
   The numbers are symbols or codes for the
group. These are on a nominal scale and do not
reflect quantities or rank orders.
Dummy Coding Example
            Experimental     Control
                 20             10
                 18             12
                 17             11
                 17             15
                 13             17
Sum =            85             65
Mean =           17             13
Sum sq =         26             34

t = (X̄1 − X̄2) / sqrt{ [(Σx1² + Σx2²) / (n1 + n2 − 2)] × (1/n1 + 1/n2) },   df = n1 + n2 − 2

t = (17 − 13) / sqrt{ [(26 + 34) / (5 + 5 − 2)] × (1/5 + 1/5) } = 2.31

A hint: write down the means of each group. They’ll show
up later.
Dummy Coding
Subject          Y     X2 (Experimental)   X3 (Control)
1               20            1                 0
2               18            1                 0
3               17            1                 0
4               17            1                 0
5               13            1                 0
6               10            0                 1
7               12            0                 1
8               11            0                 1
9               15            0                 1
10              17            0                 1

Mean =          15           0.5               0.5
SS deviation = 100           2.5               2.5
   Use the regression formulas with either the
X2 or X3 column as the IV and the Y column
as the DV.
   See the next pages for the formulas.
[Two click-to-reveal panels: the results of these formulas if you used the X2 column, and if you used the X3 column.]
Regression Equation
   Using X2 (Experimental = 1)        Using X3 (Control = 1)
   Ŷ = a + b X2                       Ŷ = a + b X3
   Ŷ = 13 + 4 X2                      Ŷ = 17 − 4 X3
   Ŷ = 13 + 4(1) = 17                 Ŷ = 17 − 4(1) = 13
   Ŷ = 13 + 4(0) = 13                 Ŷ = 17 − 4(0) = 17

If Experimental is coded as 1 and Control as 0, this is the regression line. Notice
the positive slope and that the line goes through the mean of each group (the predicted
values at those levels of X).

[Figure: Score (y-axis, 5 to 21) plotted against Condition (x-axis, -0.5 to 1), with the Experimental and Control groups marked.]
Significance Tests
Regression Coefficient
   t = b / s_b,   df = N − p − 1
Sums of Squares
   SSreg = b Σxy
   SSres = Σy² − SSreg
R-squared
   R² = SSreg / Σy² = SSreg / SStotal
   F = (R² / p) / [(1 − R²) / (N − p − 1)],   df = (p, N − p − 1)
   (p = number of predictors)
What if you have >2 groups?
   You are restricted to using only 0s and 1s, so
one column isn’t enough.
   You need more columns.
Multiple Categories
          A1    A2    A3
           4     7     1
           5     8     2
           6     9     3
           7    10     4
           8    11     5
Sum =     30    45    15
Mean =     6     9     3
Dummy Codes
Subject    Y   D1   D2   D3
1          4    1    0    0
2          5    1    0    0
3          6    1    0    0
4          7    1    0    0
5          8    1    0    0
6          7    0    1    0
7          8    0    1    0
8          9    0    1    0
9         10    0    1    0
10        11    0    1    0
11         1    0    0    1
12         2    0    0    1
13         3    0    0    1
14         4    0    0    1
15         5    0    0    1
Dummy Codes
   It’s your turn – try it.
   Run a regression with these data:
   Regress Y on D1 and D2
   Regress Y on D1 and D3
   Regress Y on D2 and D3
   Look for:
   Regression equation, slopes and intercept
   Calculate predicted values for people in each group
   (See the slides 26, 28 and 31)
   SPSS information follows.
SPSS
Analyze → Regression → Linear
Ŷ = 3.0 + 3.0 D1 + 6.0 D2

If in group 1, not 2 or 3, then:
Ŷ = 3.0 + 3.0(1) + 6.0(0) = 6.0

If in group 2, not 1 or 3, then:
Ŷ = 3.0 + 3.0(0) + 6.0(1) = 9.0

If in group 3, not 1 or 2, then:
Ŷ = 3.0 + 3.0(0) + 6.0(0) = 3.0
Notice, these are group means!
What if we used D1 and D3?
Ŷ = 9.0 − 3.0 D1 − 6.0 D3

If in group 1, not 2 or 3, then:
Ŷ = 9.0 − 3.0(1) − 6.0(0) = 6.0

If in group 2, not 1 or 3, then:
Ŷ = 9.0 − 3.0(0) − 6.0(0) = 9.0

If in group 3, not 1 or 2, then:
Ŷ = 9.0 − 3.0(0) − 6.0(1) = 3.0
Notice, these are group means!

```
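The dummy-coded regressions from the slides can be checked numerically. What follows is a minimal sketch in Python with NumPy, which is an assumption on my part, since the lecture itself uses SPSS. The helper `fit_ols` and all variable names are hypothetical and chosen only for illustration. The data are the two-group (Experimental/Control) and three-group (A1, A2, A3) examples from the slides.

```python
import numpy as np

def fit_ols(y, *predictors):
    """Hypothetical helper: least-squares fit of y on an intercept plus the given columns.
    Returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

# Two-group example: first five scores are Experimental, last five are Control.
y2 = np.array([20, 18, 17, 17, 13, 10, 12, 11, 15, 17], dtype=float)
x2 = np.array([1] * 5 + [0] * 5, dtype=float)   # Experimental = 1, Control = 0
x3 = 1.0 - x2                                    # Control = 1, Experimental = 0

a, b = fit_ols(y2, x2)                           # intercept about 13, slope about 4
print("Y on X2:", np.round([a, b], 2))
print("Y on X3:", np.round(fit_ols(y2, x3), 2))  # intercept about 17, slope about -4

# Significance tests for the X2 model, with p = 1 predictor.
n, p = len(y2), 1
residuals = y2 - (a + b * x2)
ss_res = np.sum(residuals ** 2)
ss_total = np.sum((y2 - y2.mean()) ** 2)
s_b = np.sqrt((ss_res / (n - p - 1)) / np.sum((x2 - x2.mean()) ** 2))
t_stat = b / s_b                                 # matches the independent-groups t
r2 = 1.0 - ss_res / ss_total
f_stat = (r2 / p) / ((1.0 - r2) / (n - p - 1))   # equals t squared when p = 1
print("t:", round(t_stat, 2), "R2:", round(r2, 2), "F:", round(f_stat, 2))

# Three-group example: groups A1, A2, A3 with five subjects each.
y3 = np.array([4, 5, 6, 7, 8, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5], dtype=float)
group = np.repeat([1, 2, 3], 5)
d1 = (group == 1).astype(float)
d2 = (group == 2).astype(float)
d3 = (group == 3).astype(float)

# The omitted dummy defines the reference group; its mean becomes the intercept.
print("Y on D1, D2:", np.round(fit_ols(y3, d1, d2), 2))
print("Y on D1, D3:", np.round(fit_ols(y3, d1, d3), 2))
print("Y on D2, D3:", np.round(fit_ols(y3, d2, d3), 2))
```

In each fit the intercept is the mean of the group coded 0 on every dummy, and each slope is the difference between a group mean and that reference mean, so the predicted values are always the group means.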