# Regression by ert554898

## Multiple Regression - II (KNN Ch. 7)

## Extra Sum of Squares (ESS)

The extra sum of squares is the marginal reduction in SSE when one or several predictor variables are added to the regression model, given that the other variables are already in the model. (In what other, equivalent manner can you state this?)

The word "extra" is used because we want to know the marginal contribution (or extra contribution) of a variable, or a set of variables, when it is added to the regression model as an explanatory variable.

## Decomposition of SSR into Extra Sums of Squares

A pictorial representation is also possible; see Fig. 7.1, p. 261 of KNN.

[Fig. 7.1: decomposition of SSTO into SSR(X2) and SSE(X2), with SSE(X2) split further into SSR(X1|X2) and SSE(X1, X2), so that SSR(X1, X2) = SSR(X2) + SSR(X1|X2).]

## Decomposition of SSR into Extra Sums of Squares

For two or three explanatory variables the formulae are quite simple. With two variables we have

SSR(X2 | X1) = SSE(X1) − SSE(X1, X2) = SSR(X1, X2) − SSR(X1)

and with three variables,

SSR(X3 | X1, X2) = SSE(X1, X2) − SSE(X1, X2, X3) = SSR(X1, X2, X3) − SSR(X1, X2)

Interpretation: consider the regression of Y adjusted for X1 and X2 (as the response variable) on X3 adjusted for X1 and X2 (as the predictor). In that regression, SSE(X1, X2) plays the role of the SSTO, SSE(X1, X2, X3) plays the role of the SSE, and SSR(X3 | X1, X2) is the SSR.
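The identities above are easy to check numerically. The sketch below uses simulated data (the variable values and the `sse` helper are illustrative, not from KNN): it fits the relevant models by ordinary least squares and confirms that both expressions give the same extra sum of squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2.0 + 1.5 * X1 + 0.8 * X2 + rng.normal(scale=0.5, size=n)

def sse(y, *xs):
    """SSE from an OLS fit of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y))] + list(xs))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ssto = np.sum((Y - Y.mean()) ** 2)

def ssr(*xs):
    return ssto - sse(Y, *xs)          # SSR = SSTO - SSE

# Extra sum of squares for X2 given X1, computed two equivalent ways
ess_via_sse = sse(Y, X1) - sse(Y, X1, X2)
ess_via_ssr = ssr(X1, X2) - ssr(X1)
print(ess_via_sse, ess_via_ssr)        # the two values agree
```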
## Decomposition of SSR into Extra Sums of Squares

Note that with three variables we may also have

SSR(X2, X3 | X1) = SSE(X1) − SSE(X1, X2, X3)

To test the hypothesis H0: β3 = 0 vs Ha: β3 ≠ 0, the test statistic is

F* = [SSR(X3 | X1, X2) / 1] / [SSE(X1, X2, X3) / (n − 4)]

To test, say, H0: β2 = β3 = 0 vs Ha: β2, β3 not both 0, the test statistic is

F* = [SSR(X2, X3 | X1) / 2] / [SSE(X1, X2, X3) / (n − 4)]
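The second test can be carried out directly from two fitted models. A minimal sketch on simulated data (all names and values are mine, not from KNN); n − 4 is the error df for a model with three predictors plus an intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X1, X2, X3 = rng.normal(size=(3, n))
Y = 1.0 + 0.5 * X1 + 1.2 * X2 + 0.7 * X3 + rng.normal(scale=0.8, size=n)

def sse(y, *xs):
    X = np.column_stack([np.ones(len(y))] + list(xs))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

# H0: beta2 = beta3 = 0.  SSR(X2, X3 | X1) = SSE(X1) - SSE(X1, X2, X3)
ess = sse(Y, X1) - sse(Y, X1, X2, X3)
f_star = (ess / 2) / (sse(Y, X1, X2, X3) / (n - 4))
print(f_star)   # compare with the F(2, n - 4) distribution
```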
## Decomposition of SSR into Extra Sums of Squares

In general, however, we can write

F* = [(R²_F − R²_R) / (df_R − df_F)] / [(1 − R²_F) / df_F]

where the subscripts F and R denote the full and reduced models and df is the error degrees of freedom.

This form is very convenient, since we do not have to keep track of the individual sums of squares. It also minimizes errors due to subtraction when calculating the SSRs. The next slide shows the ANOVA table with the decomposition of SSR for three variables.
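The R² form and the sums-of-squares form are algebraically the same statistic, which is easy to verify. A sketch on simulated data (the `fit` helper and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X1, X2, X3 = rng.normal(size=(3, n))
Y = 1.0 + 0.5 * X1 + 1.2 * X2 + 0.7 * X3 + rng.normal(scale=0.8, size=n)

def fit(y, *xs):
    """Return (SSE, error df, R-squared) for OLS of y on an intercept + predictors."""
    X = np.column_stack([np.ones(len(y))] + list(xs))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    sse = r @ r
    ssto = np.sum((y - y.mean()) ** 2)
    return sse, len(y) - X.shape[1], 1 - sse / ssto

sse_F, df_F, r2_F = fit(Y, X1, X2, X3)   # full model
sse_R, df_R, r2_R = fit(Y, X1)           # reduced model (X2, X3 dropped)

f_ss = ((sse_R - sse_F) / (df_R - df_F)) / (sse_F / df_F)
f_r2 = ((r2_F - r2_R) / (df_R - df_F)) / ((1 - r2_F) / df_F)
print(f_ss, f_r2)   # identical: dividing both R-squared terms by SSTO changes nothing
```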
## The ANOVA Table

| Source of variation | Sum of squares | df | Mean squares |
|---|---|---|---|
| Regression | SSR(X1, X2, X3) | 3 | MSR(X1, X2, X3) |
| X1 | SSR(X1) | 1 | MSR(X1) |
| X2 \| X1 | SSR(X2 \| X1) | 1 | MSR(X2 \| X1) |
| X3 \| X1, X2 | SSR(X3 \| X1, X2) | 1 | MSR(X3 \| X1, X2) |
| Error | SSE(X1, X2, X3) | n − 4 | MSE(X1, X2, X3) |
| Total | SSTO | n − 1 | |
## Another ANOVA Table (what's the difference?)

| Source of variation | Sum of squares | df | Mean squares |
|---|---|---|---|
| Regression | SSR(X1, X2, X3) | 3 | MSR(X1, X2, X3) |
| X3 | SSR(X3) | 1 | MSR(X3) |
| X2 \| X3 | SSR(X2 \| X3) | 1 | MSR(X2 \| X3) |
| X1 \| X2, X3 | SSR(X1 \| X2, X3) | 1 | MSR(X1 \| X2, X3) |
| Error | SSE(X1, X2, X3) | n − 4 | MSE(X1, X2, X3) |
| Total | SSTO | n − 1 | |
## An Example

    The regression equation is
    Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

    Predictor        Coef       StDev        T        P
    Constant        236.1       254.5     0.93    0.355
    X1           -0.20286     0.05894    -3.44    0.001
    X2              9.090       1.718     5.29    0.000
    X3            -0.3303      0.2229    -1.48    0.141

    S = 1802    R-Sq = 95.7%    R-Sq(adj) = 95.6%

    Analysis of Variance
    Source           DF           SS           MS         F       P
    Regression        3   9833046236   3277682079   1009.04   0.000
    Error           137    445017478      3248303
    Total           140  10278063714

    Source      DF       Seq SS
    X1           1     80601012
    X2           1   9745311037
    X3           1      7134188

    Source      DF       Seq SS
    X3           1   9733071257
    X2           1     61498868
    X1           1     38476111
## Test for a single βk = 0 in a general model

Full model, with all variables:

Yi = β0 + β1 Xi1 + … + βk−1 Xi,k−1 + βk Xik + βk+1 Xi,k+1 + … + βp−1 Xi,p−1 + εi

Compute SSR(X1, …, Xk−1, Xk, Xk+1, …, Xp−1).

Reduced model, without Xk:

Yi = β0 + β1 Xi1 + … + βk−1 Xi,k−1 + βk+1 Xi,k+1 + … + βp−1 Xi,p−1 + εi

Compute

SSR(Xk | X1, …, Xk−1, Xk+1, …, Xp−1) = SSR(X1, …, Xk−1, Xk, Xk+1, …, Xp−1) − SSR(X1, …, Xk−1, Xk+1, …, Xp−1)

The test statistic is

F* = [SSR(Xk | X1, …, Xk−1, Xk+1, …, Xp−1) / 1] / [SSE(X1, …, Xk−1, Xk, Xk+1, …, Xp−1) / (n − p)]
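For a single coefficient, this F* coincides with the square of the usual t statistic for that coefficient. A sketch on simulated data (names and values are mine) that checks the equivalence:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
X1, X2, X3 = rng.normal(size=(3, n))
Y = 1.0 + 0.5 * X1 + 1.2 * X2 + 0.7 * X3 + rng.normal(scale=0.8, size=n)

X_full = np.column_stack([np.ones(n), X1, X2, X3])
X_red  = np.column_stack([np.ones(n), X1, X2])     # X3 dropped

def sse(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r, beta

sse_F, beta_F = sse(X_full, Y)
sse_R, _      = sse(X_red, Y)
p = X_full.shape[1]                                 # number of parameters

# F* for H0: beta3 = 0, via the extra sum of squares
f_star = (sse_R - sse_F) / (sse_F / (n - p))

# equivalent t statistic: b3 / s(b3), with s^2{b} = MSE * (X'X)^-1
mse = sse_F / (n - p)
cov = mse * np.linalg.inv(X_full.T @ X_full)
t = beta_F[3] / np.sqrt(cov[3, 3])
print(f_star, t ** 2)   # F* equals t^2 for a single-coefficient test
```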
## An Example

    The regression equation is
    Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

    Predictor        Coef       StDev        T        P
    Constant        236.1       254.5     0.93    0.355
    X1           -0.20286     0.05894    -3.44    0.001
    X2              9.090       1.718     5.29    0.000
    X3            -0.3303      0.2229    -1.48    0.141

    S = 1802    R-Sq = 95.7%    R-Sq(adj) = 95.6%

    Analysis of Variance
    Source           DF           SS           MS         F       P
    Regression        3   9833046236   3277682079   1009.04   0.000
    Error           137    445017478      3248303
    Total           140  10278063714

    The regression equation is
    Y = 881 - 0.0918 X1 + 0.846 X3

    Predictor        Coef       StDev        T        P
    Constant        881.4       244.2     3.61    0.000
    X1           -0.09185     0.06023    -1.52    0.130
    X3            0.84614     0.01696    49.88    0.000

    S = 1971    R-Sq = 94.8%    R-Sq(adj) = 94.7%

    Analysis of Variance
    Source           DF           SS           MS         F       P
    Regression        2   9742103306   4871051653   1254.21   0.000
    Error           138    535960409      3883771
    Total           140  10278063714
## Test for several βk = 0 in a general model

See (7.26), p. 267 of KNN.

Full model, with all variables:

Yi = β0 + β1 Xi1 + … + βq−1 Xi,q−1 + βq Xiq + βq+1 Xi,q+1 + … + βp−1 Xi,p−1 + εi

Compute SSR(X1, …, Xq−1, Xq, …, Xp−1).

Reduced model, without the "vector" (Xq, …, Xp−1):

Yi = β0 + β1 Xi1 + … + βq−1 Xi,q−1 + εi

Compute

SSR(Xq, …, Xp−1 | X1, …, Xq−1) = SSR(X1, …, Xq−1, Xq, …, Xp−1) − SSR(X1, …, Xq−1)

or, equivalently,

SSR(Xq, …, Xp−1 | X1, …, Xq−1) = SSR(Xq | X1, …, Xq−1) + SSR(Xq+1 | X1, …, Xq) + … + SSR(Xp−1 | X1, …, Xp−2)

The test statistic is

F* = [SSR(Xq, …, Xp−1 | X1, …, Xq−1) / (p − q)] / [SSE(X1, …, Xp−1) / (n − p)]

or, in terms of coefficients of determination,

F* = [(R²_{Y.1…p−1} − R²_{Y.1…q−1}) / (p − q)] / [(1 − R²_{Y.1…p−1}) / (n − p)]
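The "equivalently" step above says a block extra sum of squares telescopes into a sum of sequential one-variable extra sums of squares. A quick numerical check on simulated data (names and values are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
X1, X2, X3 = rng.normal(size=(3, n))
Y = 1.0 + 0.5 * X1 + 1.2 * X2 + 0.7 * X3 + rng.normal(scale=0.8, size=n)

def sse(y, *xs):
    X = np.column_stack([np.ones(len(y))] + list(xs))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

# block ESS: SSR(X2, X3 | X1) = SSE(X1) - SSE(X1, X2, X3)
block = sse(Y, X1) - sse(Y, X1, X2, X3)

# sum of sequential ESSs: SSR(X2 | X1) + SSR(X3 | X1, X2)
seq = (sse(Y, X1) - sse(Y, X1, X2)) + (sse(Y, X1, X2) - sse(Y, X1, X2, X3))
print(block, seq)   # identical: the intermediate SSE terms cancel
```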
## An Example

    The regression equation is
    Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

    Predictor        Coef       StDev        T        P
    Constant        236.1       254.5     0.93    0.355
    X1           -0.20286     0.05894    -3.44    0.001
    X2              9.090       1.718     5.29    0.000
    X3            -0.3303      0.2229    -1.48    0.141

    S = 1802    R-Sq = 95.7%    R-Sq(adj) = 95.6%

    Analysis of Variance
    Source           DF           SS           MS         F       P
    Regression        3   9833046236   3277682079   1009.04   0.000
    Error           137    445017478      3248303
    Total           140  10278063714

    The regression equation is
    Y = 14 + 6.50 X2

    Predictor        Coef       StDev        T        P
    Constant         14.4       194.9     0.07    0.941
    X2             6.4957      0.1225    53.05    0.000

    S = 1866    R-Sq = 95.3%    R-Sq(adj) = 95.3%

    Analysis of Variance
    Source           DF           SS           MS         F       P
    Regression        1   9794265737   9794265737   2813.99   0.000
    Residual Error  139    483797978      3480561
    Total           140  10278063714
## Test for βk = βq in a general model

Full model, with all variables:

Yi = β0 + β1 Xi1 + … + βk Xik + … + βq Xiq + … + βp−1 Xi,p−1 + εi

Compute SSR(X1, …, Xk, …, Xq, …, Xp−1).

Reduced model, with a common coefficient on Xk + Xq:

Yi = β0 + β1 Xi1 + … + βk (Xik + Xiq) + … + βp−1 Xi,p−1 + εi

Compute SSR(X1, …, Xk + Xq, …, Xp−1). Then

F* = {[SSR(X1, …, Xk, …, Xq, …, Xp−1) − SSR(X1, …, Xk + Xq, …, Xp−1)] / 1} / [SSE(X1, …, Xk, …, Xq, …, Xp−1) / (n − p)]

Also, when testing, say, βk and βq against specified constants, or even the above hypothesis, i.e., βk = βq, one can use the General Linear Test approach outlined in KNN.
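The reduced model here is just an ordinary regression with the sum Xk + Xq as a single predictor. A sketch on simulated data where β1 = β3 holds by construction (names and values are mine), so F* should be small:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
X1, X2, X3 = rng.normal(size=(3, n))
# true model has beta1 = beta3, so H0: beta1 = beta3 is true
Y = 2.0 + 0.4 * X1 + 1.0 * X2 + 0.4 * X3 + rng.normal(scale=0.5, size=n)

def sse(y, *xs):
    X = np.column_stack([np.ones(len(y))] + list(xs))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

sse_full = sse(Y, X1, X2, X3)      # separate coefficients for X1 and X3
sse_red  = sse(Y, X1 + X3, X2)     # common coefficient on X1 + X3
p = 4                              # parameters in the full model
f_star = (sse_red - sse_full) / (sse_full / (n - p))
print(f_star)   # compare with F(1, n - p); a small value supports beta1 = beta3
```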
## An Example

    The regression equation is
    Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

    Predictor        Coef       StDev        T        P
    Constant        236.1       254.5     0.93    0.355
    X1           -0.20286     0.05894    -3.44    0.001
    X2              9.090       1.718     5.29    0.000
    X3            -0.3303      0.2229    -1.48    0.141

    S = 1802    R-Sq = 95.7%    R-Sq(adj) = 95.6%

    Analysis of Variance
    Source           DF           SS           MS         F       P
    Regression        3   9833046236   3277682079   1009.04   0.000
    Error           137    445017478      3248303
    Total           140  10278063714

    The regression equation is
    Y = 324 - 0.200 (X1+X3) + 8.09 X2

    Predictor        Coef       StDev        T        P
    Constant        324.2       208.7     1.55    0.123
    (X1+X3)      -0.19971     0.05858    -3.41    0.001
    X2             8.0891      0.4820    16.78    0.000

    S = 1798    R-Sq = 95.7%    R-Sq(adj) = 95.6%

    Analysis of Variance
    Source           DF           SS           MS         F       P
    Regression        2   9831847860   4915923930   1520.33   0.000
    Residual Error  138    446215854      3233448
    Total           140  10278063714
## Coefficients of Partial Determination

Recall the definition of the coefficient of (multiple) determination: R² is the proportionate reduction in the variation of Y achieved when the set of X variables is included in the model.

Now consider a coefficient of partial determination: the R² for a predictor, given the presence of a set of other predictors in the model. It measures the marginal contribution of that variable, given that the others are already in the model.

A graphical representation of the strength of the relationship between Y and X1, adjusted for X2, is provided by partial regression plots (see HW6).
## Coefficients of Partial Determination

For a model with two independent variables (interpret these):

r²_{Y1.2} = SSR(X1 | X2) / SSE(X2),    r²_{Y2.1} = SSR(X2 | X1) / SSE(X1)

Generalization is easy, e.g.,

r²_{Y1.23} = SSR(X1 | X2, X3) / SSE(X2, X3)

r²_{Y3.124} = SSR(X3 | X1, X2, X4) / SSE(X1, X2, X4)

etc. Is there an alternate interpretation of these partial coefficients? What is, say, r²_{12.Y3}?
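One alternate interpretation: r²_{Y1.2} is the squared correlation between the residuals of Y regressed on X2 and the residuals of X1 regressed on X2 — exactly the two quantities plotted in a partial regression plot. A sketch on simulated data (names and values are mine) checking the two computations agree:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 80
X2 = rng.normal(size=n)
X1 = 0.6 * X2 + rng.normal(size=n)     # X1 correlated with X2
Y = 1.0 + 0.8 * X1 + 1.5 * X2 + rng.normal(size=n)

def resid(y, *xs):
    X = np.column_stack([np.ones(len(y))] + list(xs))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def sse(y, *xs):
    r = resid(y, *xs)
    return r @ r

# definition: r^2_{Y1.2} = SSR(X1 | X2) / SSE(X2)
r2_def = (sse(Y, X2) - sse(Y, X1, X2)) / sse(Y, X2)

# alternate view: squared correlation of Y and X1, each adjusted for X2
e_y = resid(Y, X2)      # Y adjusted for X2
e_x = resid(X1, X2)     # X1 adjusted for X2
r2_resid = np.corrcoef(e_y, e_x)[0, 1] ** 2
print(r2_def, r2_resid)   # the two agree
```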
## An Example

    The regression equation is
    Y = - 4.9 + 1.12 X1

    Predictor        Coef      StDev        T        P
    Constant        -4.92      51.52    -0.10    0.924
    X1             1.1209     0.9349     1.20    0.233

    S = 87.46    R-Sq = 1.0%    R-Sq(adj) = 0.3%

    Analysis of Variance
    Source           DF        SS       MS      F       P
    Regression        1     10995    10995   1.44   0.233
    Residual Error  139   1063300     7650
    Total           140   1074295

    The regression equation is
    Y = - 6.17 + 0.144 X2

    Predictor        Coef       StDev        T        P
    Constant       -6.167       2.075    -2.97    0.003
    X2           0.144481    0.002842    50.83    0.000

    S = 19.86    R-Sq = 94.9%    R-Sq(adj) = 94.9%

    Analysis of Variance
    Source           DF        SS        MS         F       P
    Regression        1   1019453   1019453   2583.84   0.000
    Residual Error  139     54842       395
    Total           140   1074295
## Another Example

    The regression equation is
    Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

    Predictor        Coef       StDev        T        P
    Constant        236.1       254.5     0.93    0.355
    X1           -0.20286     0.05894    -3.44    0.001
    X2              9.090       1.718     5.29    0.000
    X3            -0.3303      0.2229    -1.48    0.141

    S = 1802    R-Sq = 95.7%    R-Sq(adj) = 95.6%

    Analysis of Variance
    Source           DF           SS           MS         F       P
    Regression        3   9833046236   3277682079   1009.04   0.000
    Residual Error  137    445017478      3248303
    Total           140  10278063714

    Source      DF       Seq SS
    X1           1     80601012
    X2           1   9745311037
    X3           1      7134188

    The regression equation is
    Y = 408 - 0.173 X1 + 6.55 X2

    Predictor        Coef       StDev        T        P
    Constant        407.8       227.6     1.79    0.075
    X1           -0.17253     0.05551    -3.11    0.002
    X2             6.5506      0.1201    54.54    0.000

    S = 1810    R-Sq = 95.6%    R-Sq(adj) = 95.5%

    Analysis of Variance
    Source           DF           SS           MS         F       P
    Regression        2   9825912049   4912956024   1499.47   0.000
    Residual Error  138    452151666      3276461
    Total           140  10278063714
## The Standardized Multiple Regression Model

Why is it necessary?

- Round-off errors in solving the normal equations, especially when inverting a large X′X matrix. (What is the size of this inverse for, say, Y = b0 + b1X1 + … + b7X7?)
- Lack of comparability of the coefficients in regression models, because of differences in the units involved.
- It is especially important in the presence of multicollinearity, where the determinant of the X′X matrix is close to zero (the matrix is nearly singular).

OK, so we have a problem. How do we take care of it? With the Correlation Transformation:

- Centering: take the difference between each observation and the average, AND
- Scaling: divide the centered observation by the standard deviation of the variable.

You must have noticed that this is nothing but regular "standardization". What's the twist? See the next slide.
## The Standardized Multiple Regression Model

Standardization:

(Yi − Ȳ) / sY,    (Xik − X̄k) / sk,   (k = 1, …, p − 1)

Correlation transformation:

Yi′ = (1 / √(n − 1)) · (Yi − Ȳ) / sY

Xik′ = (1 / √(n − 1)) · (Xik − X̄k) / sk,   (k = 1, …, p − 1)
## The Standardized Multiple Regression Model

Once we have performed the correlation transformation, all that remains is to obtain the new regression parameters. The standardized regression model is

Yi′ = β1′ Xi1′ + … + βp−1′ Xi,p−1′ + εi′

and the original parameters can be recovered from the transformation

βk = (sY / sk) βk′,  k = 1, …, p − 1,   and   β0 = Ȳ − β1 X̄1 − … − βp−1 X̄p−1

In matrix notation we have some interesting relationships. For the correlation-transformed design matrix X (without the intercept column):

X′X = r_XX, the (p − 1) × (p − 1) correlation matrix of the (untransformed) X variables. WHY?

X′Y = r_YX, the (p − 1) × 1 vector of correlations between the (untransformed) Y and the X variables. Is this surprising?
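These relationships are easy to see numerically. A sketch on simulated data (the `corr_transform` helper and all values are mine): after the correlation transformation, X′X reproduces the correlation matrix of the predictors, and the original slopes can be recovered from the standardized ones via βk = (sY/sk)βk′.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
X = rng.normal(size=(n, 3))
X[:, 1] += 0.5 * X[:, 0]                 # give the predictors some correlation
Y = 1.0 + X @ np.array([0.5, 1.2, -0.7]) + rng.normal(scale=0.6, size=n)

def corr_transform(v):
    """Correlation transformation: center, then scale by sqrt(n - 1) times the sd."""
    return (v - v.mean(axis=0)) / (np.sqrt(len(v) - 1) * v.std(axis=0, ddof=1))

Xt = corr_transform(X)
Yt = corr_transform(Y)

# X'X is the correlation matrix of the X variables
print(Xt.T @ Xt)                         # equals np.corrcoef of the columns

# fit the standardized (no-intercept) model, then recover the original slopes
beta_std, *_ = np.linalg.lstsq(Xt, Yt, rcond=None)
beta_orig = (Y.std(ddof=1) / X.std(axis=0, ddof=1)) * beta_std
print(beta_orig)                         # matches an ordinary OLS fit with intercept
```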
## An Example

Part of the original (unstandardized) data set:

      X1      X2      X3        Y
    1384    9387   72100    69678
    4069    7031   52737    39699
    3719    7017   54542    43292
    3553    4794   33216    33731
    3916    4370   32906    24167
    2480    3182   26573    16751
    2815    3033   25663    16941

    The regression equation is
    Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

    Predictor        Coef       StDev        T        P
    Constant        236.1       254.5     0.93    0.355
    X1           -0.20286     0.05894    -3.44    0.001
    X2              9.090       1.718     5.29    0.000
    X3            -0.3303      0.2229    -1.48    0.141

    S = 1802    R-Sq = 95.7%    R-Sq(adj) = 95.6%

    Analysis of Variance
    Source           DF           SS           MS         F       P
    Regression        3   9833046236   3277682079   1009.04   0.000
    Residual Error  137    445017478      3248303
    Total           140  10278063714
## An Example (continued)

Standardized (left four columns) … and then correlation transformed (right four columns):

          X1        X2        X3         Y |         X1         X2         X3          Y
    -0.42915   6.55841   7.41621   6.63363 |  -0.036269   0.554287   0.626784   0.560644
     0.53462   4.72871   3.91736   4.67596 |   0.045183   0.399649   0.331077   0.395190
     0.40899   4.71784   4.33670   4.85845 |   0.034566   0.398730   0.366518   0.410614
     0.34940   2.99143   3.22083   2.70231 |   0.029530   0.252822   0.272210   0.228387
     0.47970   2.66214   2.10462   2.67097 |   0.040542   0.224992   0.177873   0.225738
    -0.03574   1.73953   1.23910   2.03068 |  -0.003021   0.147017   0.104723   0.171624
     0.08450   1.62381   1.26127   1.93867 |   0.007142   0.137237   0.106597   0.163848
    -0.48873   1.35588   1.86770   1.52021 |  -0.041305   0.114593   0.157849   0.128481
## An Example (continued)

    The regression equation is
    Y' = - 0.00000 - 0.0660 X1' + 1.37 X2' - 0.381 X3'

    Predictor        Coef       StDev        T        P
    Constant    -0.000000    0.001497    -0.00    1.000
    X1'          -0.06596     0.01916    -3.44    0.001
    X2'            1.3661      0.2582     5.29    0.000
    X3'           -0.3813      0.2573    -1.48    0.141

    S = 0.01778    R-Sq = 95.7%    R-Sq(adj) = 95.6%

    Analysis of Variance
    Source           DF        SS        MS         F       P
    Regression        3   0.95670   0.31890   1009.04   0.000
    Residual Error  137   0.04330   0.00032
    Total           140   1.00000

Compared to the regression model obtained from the untransformed variables, what can we say about the two models? Is there a difference in predictive power, or is there a difference in "ease of interpretation"? And why is b0 = 0: just by chance?
## Multicollinearity

One of the assumptions of the OLS model is that the predictor variables are uncorrelated. When this assumption is not satisfied, multicollinearity is said to exist. (Think about Venn diagrams for this.)

- Note that multicollinearity is strictly a sample phenomenon.
- We may try to avoid it by doing controlled experiments, but in most social-science research this is very difficult to do.

Let us first consider the case of uncorrelated predictor variables, i.e., no multicollinearity:

- This usually occurs in controlled experiments.
- In this case the R² between each pair of predictor variables is zero.
- The extra sum of squares for each variable is the same as the SSR obtained when the response is regressed on that variable alone.
## An Example

Data:

    X1   X2   Y
     1    2   1
     2    2   5
     3    3   7
     4    3   8
     5    3   4
     6    3   9
     7    2   5
     8    2   2

    The regression equation is
    Y = - 4.73 + 0.107 X1 + 3.75 X2

    Predictor        Coef      StDev        T        P
    Constant       -4.732      4.428    -1.07    0.334
    X1             0.1071     0.3537     0.30    0.774
    X2              3.750      1.621     2.31    0.069

    S = 2.292    R-Sq = 52.1%    R-Sq(adj) = 33.0%

    Analysis of Variance
    Source           DF        SS        MS      F       P
    Regression        2    28.607    14.304   2.72   0.159
    Residual Error    5    26.268     5.254
    Total             7    54.875

    Source   DF   Seq SS
    X1        1    0.482
    X2        1   28.125

    Source   DF   Seq SS
    X2        1   28.125
    X1        1    0.482
## An Example (continued)

    The regression equation is
    Y = 4.64 + 0.107 X1

    Predictor        Coef      StDev        T        P
    Constant        4.643      2.346     1.98    0.095
    X1             0.1071     0.4646     0.23    0.825

    S = 3.011    R-Sq = 0.9%    R-Sq(adj) = 0.0%

    Analysis of Variance
    Source           DF        SS       MS      F       P
    Regression        1     0.482    0.482   0.05   0.825
    Residual Error    6    54.393    9.065
    Total             7    54.875

    The regression equation is
    Y = - 4.25 + 3.75 X2

    Predictor        Coef      StDev        T        P
    Constant       -4.250      3.807    -1.12    0.307
    X2              3.750      1.493     2.51    0.046

    S = 2.111    R-Sq = 51.3%    R-Sq(adj) = 43.1%

    Analysis of Variance
    Source           DF        SS        MS      F       P
    Regression        1    28.125    28.125   6.31   0.046
    Residual Error    6    26.750     4.458
    Total             7    54.875

Sequential sums of squares (from the previous slide):

    Source   DF   Seq SS
    X1        1    0.482
    X2        1   28.125

    Source   DF   Seq SS
    X2        1   28.125
    X1        1    0.482
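Because X1 and X2 in this small data set are exactly uncorrelated, the extra sum of squares for X1 does not depend on whether X2 is already in the model. A quick check using the data from the slide (the `sse` helper is mine):

```python
import numpy as np

X1 = np.arange(1, 9, dtype=float)
X2 = np.array([2, 2, 3, 3, 3, 3, 2, 2], dtype=float)
Y = np.array([1, 5, 7, 8, 4, 9, 5, 2], dtype=float)

def sse(y, *xs):
    X = np.column_stack([np.ones(len(y))] + list(xs))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

ssto = np.sum((Y - Y.mean()) ** 2)

print(np.corrcoef(X1, X2)[0, 1])                 # 0: the predictors are uncorrelated
ssr_x1_alone = ssto - sse(Y, X1)                 # SSR(X1), the slide's 0.482
ssr_x1_given_x2 = sse(Y, X2) - sse(Y, X1, X2)    # SSR(X1 | X2): same value
print(ssr_x1_alone, ssr_x1_given_x2)
```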
## Multicollinearity (Effects of)

- The regression coefficient of any independent variable can no longer be interpreted as usual; one has to take into account which other correlated variables are included in the model.
- The predictive ability of the overall model is usually unaffected.
- The extra sums of squares are usually greatly reduced.
- The variability of the OLS regression parameter estimates is inflated. Here is an intuitive reason, based on the standardized model with p − 1 = 2:

σ²{b} = σ² (X′X)⁻¹ = (σ² / (1 − r12²)) [ 1   −r12 ;  −r12   1 ]

Note that the standardized regression coefficients have equal standard deviations. Will this be the case even when p − 1 = 3, or is this just a special-case scenario?
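The diagonal entry 1/(1 − r12²) of that inverse is the variance inflation factor, and it blows up as |r12| approaches 1. A minimal numeric check of the closed-form inverse:

```python
import numpy as np

# (X'X)^-1 for the correlation-transformed model with two predictors
for r12 in [0.0, 0.5, 0.9, 0.99]:
    XtX = np.array([[1.0, r12], [r12, 1.0]])
    inv = np.linalg.inv(XtX)
    # closed form: (1 / (1 - r12^2)) * [[1, -r12], [-r12, 1]]
    closed = np.array([[1.0, -r12], [-r12, 1.0]]) / (1 - r12 ** 2)
    assert np.allclose(inv, closed)
    # the diagonal entry is the variance inflation factor
    print(r12, inv[0, 0])   # grows without bound as |r12| -> 1
```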
## Multicollinearity (Effects of)

- High R², but few significant t-ratios. (By now you should be able to guess the reason for this.)
- Wider individual confidence intervals for the regression parameters. (This is obvious based on what we discussed on the earlier slide.)

[Figure: individual confidence intervals for β1 and β2 plotted in the (β1, β2) plane. What would you conclude based on this picture?]
## Multicollinearity (How to detect it?)

- High R² (> 0.8), but few significant t-ratios.
  Caveat: there is a particular situation where this occurs without any multicollinearity; thankfully, that situation never arises in practice.
- High pairwise correlation (> 0.8) between independent variables.
  Caveat: this is a sufficient, but not a necessary, condition. For example, consider the case where r_{X1X2} = 0.5, r_{X1X3} = 0.5 and r_{X2X3} = −0.5. We might conclude there is no multicollinearity; however, we find that R² = 1 when we regress X1 on X2 and X3 together. This means X1 is a perfect linear combination of the two other independent variables. In fact, the formula for this R² is

  R²_{X1.X2X3} = (r²_{X1X2} + r²_{X1X3} − 2 r_{X1X2} r_{X1X3} r_{X2X3}) / (1 − r²_{X2X3})

  and one can readily verify that the numbers above satisfy this equation.
- Due to the above caveat, always examine the partial correlation coefficients.
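Plugging the slide's numbers into the formula does indeed give R² = 1, and equivalently the 3×3 correlation matrix is singular:

```python
import numpy as np

# the slide's example: pairwise correlations look harmless ...
r12, r13, r23 = 0.5, 0.5, -0.5

# ... yet X1 is a perfect linear combination of X2 and X3
r2 = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
print(r2)                  # 1.0

# equivalently, the 3x3 correlation matrix is singular
R = np.array([[1, r12, r13], [r12, 1, r23], [r13, r23, 1]], dtype=float)
print(np.linalg.det(R))    # ~0 up to rounding
```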
## Multicollinearity (How to detect it?)

Run auxiliary regressions, i.e., regress each of the independent variables on the other independent variables taken together, and conclude whether it is correlated with the others based on the R². The test statistic is

F_{Xi} = [R²_{Xi.X1,…,Xi−1,Xi+1,…,Xp−1} / (p − 2)] / [(1 − R²_{Xi.X1,…,Xi−1,Xi+1,…,Xp−1}) / (n − p + 1)]

The Condition Index (CI):

CI = √(maximum eigenvalue / minimum eigenvalue)

If 10 ≤ CI ≤ 30, there is moderate to strong multicollinearity; CI > 30 means severe multicollinearity.
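The condition index is a one-liner once the predictors' correlation matrix is in hand. A sketch on simulated data (the `condition_index` helper, variable names, and the 0.01 noise scale are illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)               # unrelated predictor

def condition_index(*cols):
    # eigenvalues of the predictors' correlation matrix
    R = np.corrcoef(np.column_stack(cols), rowvar=False)
    ev = np.linalg.eigvalsh(R)
    return np.sqrt(ev.max() / ev.min())

print(condition_index(x1, x3))   # near 1: no collinearity problem
print(condition_index(x1, x2))   # far above 30: severe multicollinearity
```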
## Multicollinearity (What is the remedy?)

- Rely on joint confidence intervals rather than individual ones.

[Figure: joint confidence region for β1 and β2.]

- A priori information about the relationship between some independent variables? Then include it!
  For example, suppose b1 = b2 is known. Then use this in the regression model, which becomes Y = b0 + b2X, where X = X2 + X1.
- Data pooling. (Usually done by combining cross-sectional and time-series data; time-series data are notorious for multicollinearity.)
## Multicollinearity (What is the remedy?)

- Delete a variable that is causing problems.
  Caveat: beware of specification bias, which arises when a model is incorrectly specified. For example, in order to explain consumption expenditure we might include only income and drop wealth, since it is highly correlated with income; however, economic theory may postulate that both variables belong in the model.
- First-difference transformation of variables from time-series data.
  The regression is run on differences between successive values of the variables rather than on the original variables: (X_{i,1} − X_{i+1,1}), (X_{i,2} − X_{i+1,2}), etc. The logic is that even if X1 and X2 are correlated, there is no reason for their first differences to be correlated too.
  Caveat: beware of the autocorrelation that usually arises from this procedure. Also, we lose one degree of freedom to the differencing.
- Correlation transformation.
- Getting a new sample (why?) and/or increasing the sample size (why?).
- Factor analysis, principal components analysis, ridge regression.
## An Example

[Figure: scatter plot of population (X1, vertical axis, 0 to 10000) against income (X2, horizontal axis, 0 to 70000); the points lie almost on a line.]

r_{X1X2} = 0.997
## An Example (continued)

    The regression equation is
    Y = - 0.032 + 6.99 X1 - 0.064 X2

    Predictor        Coef      StDev        T        P
    Constant      -0.0322     0.2516    -0.13    0.898
    X1              6.986      1.667     4.19    0.000
    X2            -0.0640     0.2171    -0.29    0.769

    S = 1.872    R-Sq = 95.3%    R-Sq(adj) = 95.2%

    Analysis of Variance
    Source           DF        SS       MS         F       P
    Regression        2    9794.6   4897.3   1397.80   0.000
    Residual Error  138     483.5      3.5
    Total           140   10278.1

    Source   DF   Seq SS
    X1        1   9794.3
    X2        1      0.3

    Source   DF   Seq SS
    X2        1   9733.1
    X1        1     61.5

    Predicted Values
    Fit      StDev Fit       95.0% CI            95.0% PI
    12.020       3.351   (5.394, 18.646)     (4.431, 19.609)

Observations:

- High R², but a low t-value for b2, and a low extra sum of squares for X2 (i.e., SSR(X2|X1)).
- Clearly, X2 contributes little to the model. Really? Look at SSR(X2): it's humungous!
- A clear case of multicollinearity. Of course, we knew that r_{X1X2} = 0.997; this should have made us suspect that something was amiss.
## Multicollinearity (Specification Bias)

Types of specification errors:

- Omitting a relevant variable
- Including an unnecessary or irrelevant variable
- Incorrect functional form
- Errors of measurement bias
- Incorrect specification of the stochastic error term (this is a model mis-specification error)

More on omitting a relevant variable (under-fitting):

True model:   Yi = β0 + β1 Xi1 + β2 Xi2 + ui
Fitted model: Yi = α0 + α1 Xi1 + νi

Consequences of the omission:

1. If r12 is non-zero, then the estimators of β0 and β1 are biased and inconsistent.
2. The variance of the estimator of α1 is a biased estimate of the variance of the estimator of β1.
3. σ² is incorrectly estimated, and confidence intervals and hypothesis tests are misleading.
4. E(α̂1) = β1 + β2 b21, where b21 is the slope from regressing X2 on X1.
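Consequence 4 is easy to illustrate; in fact, within a given sample the OLS slope from the short regression equals the full-model slope plus b̂2 times the auxiliary slope b21 exactly. A sketch on simulated data (names and values are mine):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + rng.normal(scale=0.5, size=n)   # r12 != 0, so omission biases b1
Y = 1.0 + 2.0 * X1 + 1.5 * X2 + rng.normal(size=n)

def slopes(y, x):
    """OLS slope(s) of y on an intercept plus x (x may be 1 or 2 columns)."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

b_short = slopes(Y, X1)[0]                       # fitted model, X2 omitted
b1_long, b2_long = slopes(Y, np.column_stack([X1, X2]))
b21 = slopes(X2, X1)[0]                          # auxiliary regression of X2 on X1

print(b_short, b1_long + b2_long * b21)          # identical: omitted-variable identity
```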
