# Multiple Regression Model Building

Lec 9, 1 May 2004 (ref. §11.3)

• Transformation of Variables
• Collinearity
• Dropping Regressors
## Example

The following (hypothetical) data give stopping distances Y (ft) for n = 62 trials of various automobiles traveling at speed X (mph). (Ezekiel and Fox, 1959)
```
 X   Y                     X   Y
 4   4                    20   48
 5   2, 4, 8, 8           21   39, 42, 55
 7   7, 7                 24   56
 8   8, 9, 11, 13         25   33, 48, 56, 59
 9   5, 5, 13             26   39, 41
10   8, 14, 17            27   57, 78
12   11, 19, 21           28   64, 84
13   15, 18, 27           29   54, 68
14   14, 16               30   60, 67, 101
15   16                   31   77
16   14, 19, 34           35   85, 107
17   22, 29               36   79
18   29, 34, 47           37   138
19   30                   40   110, 134
```
```
. regress dist speed
. predict yhat
. graph twoway scatter dist yhat speed, connect(i L) clpattern(solid dash) msymbol(O i)
```

[Figure: dist and fitted values vs. speed]

```
. regress dist speed

      Source |        SS     df         MS         Number of obs =     62
-------------+------------------------------       F(1, 60)      = 403.49
       Model | 59153.0082     1  59153.0082        Prob > F      = 0.0000
    Residual | 8796.16921    60   146.60282        R-squared     = 0.8705
       Total | 67949.1774    61  1113.92094        Root MSE      = 12.108

------------------------------------------------------------------------
        dist |     Coef.   Std. Err.      t   P>|t|  [95% Conf. Interval]
-------------+----------------------------------------------------------
       speed |  3.148757   .1567552   20.09  0.000     2.8352   3.462314
       _cons | -20.16443   3.336167   -6.04  0.000  -26.83776   -13.4911
------------------------------------------------------------------------
```
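As a quick sanity check on the fitted line, the reported coefficients can be used to predict a stopping distance by hand. This is a minimal Python sketch (not part of the original Stata session), using the Coef. column above:

```python
# Fitted line from the Stata output: dist-hat = -20.16443 + 3.148757 * speed
b0, b1 = -20.16443, 3.148757

def predict(speed):
    """Predicted stopping distance (ft) at a given speed (mph)."""
    return b0 + b1 * speed

yhat20 = predict(20)  # roughly 42.8 ft at 20 mph
```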
```
. regress dist speed
. predict residuals, resid
. gen zero=0*yhat
. graph twoway scatter residuals zero yhat, connect(i L) clpattern(solid dash) msymbol(O i)
```

[Figure: residuals vs. fitted values, with a horizontal line at zero]

The curvature and increasing variance of
the residuals indicate a departure from the
model assumptions. How might we
remedy this?

Maybe the true relationship is exponential. That is,

log(Y) = β0 + β1 X + ε
```
. gen logdist=log(dist)
. regress logdist speed
. predict logyhat
. graph twoway scatter logdist logyhat speed, connect(i L) clpattern(solid dash) msymbol(O i)
```

[Figure: logdist and fitted values vs. speed]

```
. regress logdist speed

      Source |        SS     df         MS         Number of obs =     62
-------------+------------------------------       F(1, 60)      = 396.07
       Model | 52.8906939     1  52.8906939        Prob > F      = 0.0000
    Residual | 8.01225299    60   .13353755        R-squared     = 0.8684
       Total | 60.9029469    61  .998408965        Root MSE      = .36543

------------------------------------------------------------------------
     logdist |     Coef.   Std. Err.      t   P>|t|  [95% Conf. Interval]
-------------+----------------------------------------------------------
       speed |  .0941543    .004731   19.90  0.000   .0846909   .1036177
       _cons |  1.477977   .1006881   14.68  0.000    1.27657   1.679383
------------------------------------------------------------------------
```
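Since this model is fitted on the log scale, predicting dist itself requires back-transforming. A hedged Python sketch (names are illustrative, not from the original session); note that exponentiating the fitted log is a naive back-transform, which under log-normal errors estimates the conditional median of dist rather than the mean:

```python
import math

# Coefficients from the logdist-on-speed output above
b0, b1 = 1.477977, 0.0941543

def predict_dist(speed):
    # exp of the fitted log-distance: a naive back-transform
    # (median-type prediction under log-normal errors)
    return math.exp(b0 + b1 * speed)

d20 = predict_dist(20)  # just under 29 ft at 20 mph
```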

```
. predict logresiduals, resid
. graph twoway scatter logresiduals zero logyhat, connect(i L) clpattern(solid dash) msymbol(O i)
```

[Figure: residuals vs. fitted values for the exponential model, with a horizontal line at zero]
It looks like we overcompensated for the
curvature by taking log(Y ). Perhaps we
should take log(X) as well.

log(Y) = β0 + β1 log(X) + ε
```
. gen logspeed=log(speed)
. regress logdist logspeed
. predict newyhat
. graph twoway scatter logdist newyhat logspeed, connect(i L) clpattern(solid dash) msymbol(O i)
```

[Figure: logdist and fitted values vs. logspeed]
```
. regress logdist logspeed

      Source |        SS     df         MS         Number of obs =     62
-------------+------------------------------       F(1, 60)      = 545.17
       Model | 54.8646965     1  54.8646965        Prob > F      = 0.0000
    Residual | 6.03825036    60  .100637506        R-squared     = 0.9009
       Total | 60.9029469    61  .998408965        Root MSE      = .31723

------------------------------------------------------------------------
     logdist |     Coef.   Std. Err.      t   P>|t|  [95% Conf. Interval]
-------------+----------------------------------------------------------
    logspeed |  1.570475   .0672612   23.35  0.000   1.435933   1.705018
       _cons | -1.107584   .1911911   -5.79  0.000  -1.490023  -.7251444
------------------------------------------------------------------------
```
[Figure: residuals vs. fitted values for the log-log model, with a horizontal line at zero]
That model looks pretty good. As an alternative, we might have tried fitting a quadratic in the original variables. That is,

Y = β0 + β1 X + β2 X² + ε
```
. gen speedsq=speed^2
. regress dist speed speedsq

      Source |        SS     df         MS         Number of obs =     62
-------------+------------------------------       F(2, 59)      = 285.34
       Model |  61582.362     2   30791.181        Prob > F      = 0.0000
    Residual |  6366.8154    59  107.912125        R-squared     = 0.9063
       Total | 67949.1774    61  1113.92094        Root MSE      = 10.388

------------------------------------------------------------------------
        dist |     Coef.   Std. Err.      t   P>|t|  [95% Conf. Interval]
-------------+----------------------------------------------------------
       speed |  .4125886   .5921516    0.70  0.489  -.7723039   1.597481
     speedsq |  .0662677   .0139666    4.74  0.000   .0383205   .0942148
       _cons |  1.497801   5.388586    0.28  0.782  -9.284735   12.28034
------------------------------------------------------------------------
```
[Figure: residuals vs. fitted values for the quadratic model, with a horizontal line at zero]

This isn't too bad, but there is definitely some heteroskedasticity. How does this model compare to the linear model

Y = β0 + β1 X + ε ?

```
. regress dist speed

      Source |        SS     df         MS         Number of obs =     62
-------------+------------------------------       F(1, 60)      = 403.49
       Model | 59153.0082     1  59153.0082        Prob > F      = 0.0000
    Residual | 8796.16921    60   146.60282        R-squared     = 0.8705
       Total | 67949.1774    61  1113.92094        Root MSE      = 12.108

. regress dist speed speedsq

      Source |        SS     df         MS         Number of obs =     62
-------------+------------------------------       F(2, 59)      = 285.34
       Model |  61582.362     2   30791.181        Prob > F      = 0.0000
    Residual |  6366.8154    59  107.912125        R-squared     = 0.9063
       Total | 67949.1774    61  1113.92094        Root MSE      = 10.388
```

F = [(8796.16921 − 6366.8154)/1] / [6366.8154/59] = 22.512

P(F(1,59) > 22.512) = .00001368

The improvement is significant. Nonetheless, it looks like the model

log(Y) = β0 + β1 log(X) + ε

fits best.
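The partial F statistic above can be reproduced directly from the two residual sums of squares. A small Python check (values copied from the Stata output; not part of the original session):

```python
# Residual SS from the two fits above
sse_linear    = 8796.16921  # regress dist speed         (60 residual df)
sse_quadratic = 6366.8154   # regress dist speed speedsq (59 residual df)

extra_df = 1  # one extra regressor (speedsq)
f_stat = ((sse_linear - sse_quadratic) / extra_df) / (sse_quadratic / 59)
# f_stat comes out near 22.512, matching the slide
```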

## Collinearity

Two predictors, X1 and X2, are exactly collinear if they satisfy a linear equation such as

c1 X1 + c2 X2 = c0

For example
• height in feet and meters
• temperature in Celsius and Fahrenheit
It’s rare that a regression would include
two exactly collinear predictors, but there
may be approximately collinear
predictors, that is, predictors whose
correlation is close to one in absolute
value.
For more than two predictors, the set X1, X2, ..., Xp is collinear if, for some constants c0, c1, ..., cp (not all 0),

c1 X1 + c2 X2 + ... + cp Xp = c0

In the case of collinearity, the estimating procedure is very unstable; it becomes very sensitive to random errors. Collinear predictors will typically lead to large variances for estimated coefficients.
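To see the instability concretely: with exactly collinear columns, the matrix X'X is singular, so the normal equations have no unique solution. A minimal pure-Python sketch with hypothetical data (one predictor exactly twice the other):

```python
# Hypothetical predictors: x2 = 2 * x1, so the columns (1, x1, x2) are collinear
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0 * v for v in x1]
n = len(x1)

# Build X'X for the design matrix with columns (intercept, x1, x2)
s1, s2 = sum(x1), sum(x2)
s11 = sum(v * v for v in x1)
s22 = sum(v * v for v in x2)
s12 = sum(a * b for a, b in zip(x1, x2))
xtx = [[n,  s1,  s2],
       [s1, s11, s12],
       [s2, s12, s22]]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

d = det3(xtx)  # zero: X'X is singular, no unique least-squares solution
```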

## Example

Suppose in the stopping distance example we create a second predictor

W = −X + δ,   δ ∼ N(0, 1).

The exact correlation between X and W
depends on the realization of δ, but for one
simulation it came out to r = −0.9957.
What happens if we fit the model

Y = β0 + β1 X + β2 W + ε ?
```
. gen delta=invnorm(uniform())
. gen W=delta-speed
. cor W speed
. regress dist speed W

      Source |        SS     df         MS         Number of obs =     62
-------------+------------------------------       F(2, 59)      = 199.10
       Model | 59180.6006     2  29590.3003        Prob > F      = 0.0000
    Residual | 8768.57683    59  148.619946        R-squared     = 0.8710
       Total | 67949.1774    61  1113.92094        Root MSE      = 12.191

------------------------------------------------------------------------
        dist |     Coef.   Std. Err.      t   P>|t|  [95% Conf. Interval]
-------------+----------------------------------------------------------
       speed |  2.418932   1.701139    1.42  0.160  -.9850383   5.822903
           W | -.7273081   1.687961   -0.43  0.668   -4.10491   2.650294
       _cons | -20.18897   3.359523   -6.01  0.000  -26.91137  -13.46658
------------------------------------------------------------------------
```
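One standard way to quantify this inflation (not shown on the slide) is the variance inflation factor: with two predictors whose correlation is r, each coefficient's variance is multiplied by VIF = 1/(1 − r²). Plugging in the simulated correlation from above:

```python
# Correlation between speed and W from the simulation above
r = -0.9957

# Variance inflation factor for either coefficient in a two-predictor regression
vif = 1.0 / (1.0 - r ** 2)
# vif is roughly 117, so standard errors inflate by about sqrt(117), i.e. ~11x,
# consistent with the Std. Err. for speed jumping from .157 to 1.70
```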
## Dropping a Regressor

Suppose we regress y on x1, x2, ..., xp. If the t value for x2, say, is small (or, equivalently, if the CI for β2 contains 0), do we leave x2 out of the regression?

If our prior belief is that the regressor has an effect on the response ...

If our prior belief is that the regressor has little/no effect on the response ...
## Dummy Variables

A certain drug is suspected of having the unfortunate side effect of raising blood pressure. To test this suspicion, 10 women were randomly sampled, 6 of whom took the drug once a day, and 4 of whom took no drug and hence served as the control group. Define

D = 1 if she took the drug
D = 0 if she did not

We want to investigate how this drug affects blood pressure Y, keeping constant other confounding factors such as age X. Say the fitted equation is

ŷ = 70 + 5D + 0.44x

Graph the relation when D = 0 and when D = 1.
Suppose D is a 0-1 variable in the regression model

ŷ = b0 + b1 D + b2 x

Relative to the reference line where D = 0, the line where D = 1 is parallel and b1 units higher. The standard confidence intervals and tests can be computed using b1 and b2 along with their standard errors.
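The parallel-lines point can be sketched in a few lines of Python, using the hypothetical fitted equation ŷ = 70 + 5D + 0.44x from the blood-pressure example:

```python
def yhat(D, x):
    # Hypothetical fitted equation from the blood-pressure example
    return 70 + 5 * D + 0.44 * x

# At every age x, the D = 1 line sits exactly b1 = 5 units above the D = 0 line
gaps = [yhat(1, x) - yhat(0, x) for x in (20, 40, 60)]
```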
## Several Categories

Suppose we have two drugs, A and B. Define dummy variables

DA = 1 if drug A given; 0 otherwise
DB = 1 if drug B given; 0 otherwise

Say the regression computed from the data yields the equation

ŷ = 70 + 5DA + 8DB + 0.44x

This assumes the effects of drug A and drug B are additive.

For the control group, set DA = 0, DB = 0:

ŷ = 70 + 0.44x

For the group on drug A, set DA = 1, DB = 0:

ŷ = 75 + 0.44x
For the group on drug B, set DA = 0, DB = 1:

ŷ = 78 + 0.44x

What if there is interaction between drug A and drug B when they are given to the patients together? Say the regression computed from the data yields the equation

ŷ = 70 + 5DA + 8DB − 16 DA×DB + 0.44x

|        | DB = 0     | DB = 1     |
|--------|------------|------------|
| DA = 0 | 70 + 0.44x | 78 + 0.44x |
| DA = 1 | 75 + 0.44x | 67 + 0.44x |
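The four group equations in the table can be checked mechanically. A short Python sketch of the hypothetical interaction model above:

```python
def yhat(DA, DB, x):
    # Hypothetical fitted equation with a drug-by-drug interaction
    return 70 + 5 * DA + 8 * DB - 16 * DA * DB + 0.44 * x

# Intercepts for the four groups (set x = 0); the interaction term pulls the
# both-drugs group down to 67 rather than the additive 70 + 5 + 8 = 83
intercepts = {(DA, DB): yhat(DA, DB, 0) for DA in (0, 1) for DB in (0, 1)}
```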
There might be interaction between the drug and the patients' age, so we include an interaction effect of age (x) and the drug effect. Say we get the regression line

ŷ = 70 + 5D + 0.44x + 0.21Dx

For the control group, set D = 0:

ŷ = 70 + 0.44x

For the treatment group, set D = 1:

ŷ = 75 + 0.65x
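Unlike the parallel-lines model, a drug-by-age interaction changes the slope as well as the intercept. A short Python check of the two group equations (coefficients are the hypothetical ones from the slide):

```python
def yhat(D, x):
    # Hypothetical fitted equation with a drug-by-age interaction
    return 70 + 5 * D + 0.44 * x + 0.21 * D * x

# Control slope is 0.44; treatment slope is 0.44 + 0.21 = 0.65
control_slope = yhat(0, 11) - yhat(0, 10)
treat_slope = yhat(1, 11) - yhat(1, 10)
```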
