# How to remedy Heteroskedasticity

```
Heteroske...what?
O.L.S. is B.L.U.E.
• BLUE means “Best Linear Unbiased
Estimator”.
• What does that mean?
• We need to define…
– Unbiased: The mean of the sampling
distribution is the true population parameter.
– What is a sampling distribution? …
• Imagine taking a sample and finding b, taking another
sample and finding b again, and repeating over and over.
The sampling distribution describes the possible values
b can take on in repeated sampling
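We hope that…
[Figure: sampling distribution of $\hat{\beta}$ centered on the true $\beta$]
If the sampling distribution centers on the true
population parameter, our estimates will be right
“on average”. We get this with the 10 assumptions.
If some assumptions don’t hold…
[Figure: sampling distribution whose average $\hat{\beta}$ lies away from the true $\beta$]
• We can get a “biased” estimate. That is,
$E(\hat{\beta}) \neq \beta$
• If your parameter estimates are biased, your answers
are wrong. They do not describe the true
relationship.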
Efficiency / Inefficiency
• What makes one “unbiased” estimator
better than another?
Efficiency
• Sampling Distributions with less variance
(smaller standard errors) are more efficient
• OLS is the “Best” linear unbiased
estimator because its sampling distribution
has less variance than other estimators.

[Figure: sampling distributions labeled “OLS Regression” and “LAV Regression”]
Under the 10 regression
assumptions and assuming
normally distributed errors…
• We will get estimates using OLS
• Those estimates will be unbiased
• Those estimates will be efficient (the
“best”)
• They will be the “Best Unbiased Estimator”
out of all possible estimators
If we violate…
• No Perfect Collinearity, or n > k
– We cannot get any estimates, and there is nothing we
can do to fix it
• Normal Error Term assumption
– OLS is BLUE, but not BUE.
• Heteroskedasticity or Serial Correlation
– OLS is still unbiased, but not efficient
• Everything else (omitted variables,
endogeneity, linearity)
– OLS is biased
What do Bias and Efficiency Mean?

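[Figure: four sampling distributions of $\hat{\beta}$ relative to the true $\beta$]
– Biased, but very efficient: $E(\hat{\beta}) \neq \beta$
– Unbiased, but inefficient: $E(\hat{\beta}) = \beta$
– Biased and inefficient: $E(\hat{\beta}) \neq \beta$
– Unbiased and efficient: $E(\hat{\beta}) = \beta$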
Today: Heteroskedasticity
• Consequence: OLS is still Unbiased, but it
is not efficient (and std. errors are wrong)
• Today we will learn:
– How to diagnose Heteroskedasticity
– How to remedy Heteroskedasticity
• New Estimator for coefficients and std. errs.
• Keep OLS estimator but fix std. errs.
What is heteroskedasticity?
• Heteroskedasticity occurs when the size of
the errors varies across observations.
This arises generally in two ways.
– When increases in an independent variable
are associated with changes in error in
prediction.
[Figure: scatter plot of sales against Salespersons]
What is Heteroskedasticity?
• Heteroskedasticity occurs when the size of
the errors varies across observations.
This arises generally in two ways.
– When you have “subgroups” or clusters in your data.
• We might try to predict presidential popularity. We
measure average popularity in each year. Of
course, there are “clusters” of years where the
same president is in office. Because each
president is unique, the errors in predicting Bush’s
popularity are likely to be a bit different from the
errors predicting Clinton’s.
How do we recognize this beast?
• Three Methods
– Consider the two ways heteroskedasticity can strike.
– Graphical Analysis
– Formal statistical test
Graphical Analysis
• Plot residuals against $\hat{y}$ and independent
variables.
• Expect to see residuals randomly
clustered around zero
• However, you might see a pattern. This is
evidence of heteroskedasticity.
• Examples…
[Figure: Residuals against x (scatter resid x), two panels. Left: as x increases, so does the error variance. Right: as x increases, the error variance decreases.]
[Figure: Residuals against Fitted values, rvfplot (or scatter resid yhat). As the predicted value of y increases, so does the error variance.]
Good Examples…
[Figure: y against x (scatter y x) and Residuals against x (scatter resid x)]
[Figure: Residuals against Fitted values, rvfplot (scatter resid yhat)]
Formal Statistical Tests
• White’s Test
– Heteroskedasticity occurs when the size of
the errors is correlated with one or more
independent variables.
– We can run OLS, get the residuals, and then
see if they are correlated with the
independent variables
More Formally,
$turnout_i = a + 1101.4 \cdot diplomau_i + 1.1 \cdot mdnincm_i + e_i$
$\widehat{turnout}_i = a + 1101.4 \cdot diplomau_i + 1.1 \cdot mdnincm_i$
$turnout_i - \widehat{turnout}_i = e_i$

state  district   turnout  diplomau   mdnincm  pred_turnout   residual
AL            1   151,188      14.7  \$27,360     200,757.4  -49,569.4
AL            2   216,788      16.7  \$29,492       205,330  11,457.96
AL            3   147,317      12.3  \$26,800     197,491.7  -50,174.7
AL            4   226,409       8.1  \$25,401     191,310.8  35,098.16
AL            5   186,059      20.4  \$33,189     213,514.6  -27,455.6

$e_i^2 = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_1^2 + \alpha_4 x_2^2 + \alpha_5 x_1 x_2 + error$
So, if error increases with x, we
violate homoskedasticity
•If we can predict error with a regression line,
we have heteroskedasticity.
•To make this prediction, we need to
make everything positive (square it)
[Figure: Residuals against x]
So, if error increases with x, we
violate homoskedasticity
$e_i^2 = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_1^2 + \alpha_4 x_2^2 + \alpha_5 x_1 x_2 + error$
•Finally, we use these squared residuals as
the dependent variable in a new regression.
•If we can predict increases/decreases in
the size of the residual, we have found
evidence of heteroskedasticity
[Figure: squared_residual against x]
•For Ind. Vars., we use the same ones as in the original
regression plus their squares and their cross-products.
The Result…
• Take the $R^2$ from this regression and multiply it
by n.
• This test statistic is distributed $\chi^2$ with degrees of
freedom equal to the number of independent
variables in the 2nd regression
• In other words, $nR^2$ is the $\chi^2$ you calculate from
your data. Compare it to a critical value $\chi^{2*}$ from a
$\chi^2$ table. If your $\chi^2$ is greater than $\chi^{2*}$, then you
reject the null hypothesis (of homoskedasticity).
A Sigh of Relief…
• Stata will calculate this for you
• After running the regression, type
imtest, white
. imtest, white

White's test for Ho: homoskedasticity
against Ha: unrestricted heteroskedasticity

chi2(5)       =     9.97
Prob > chi2   =   0.0762

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |       9.97      5    0.0762
            Skewness |       3.96      2    0.1378
            Kurtosis |  -28247.96      1    1.0000
---------------------+-----------------------------
               Total |  -28234.03      8    1.0000
---------------------------------------------------
An Alternative Test: Breusch/Pagan

•     Based on similar logic
•     Three changes:
1. Instead of using $e_i^2$ as the D.V. in the 2nd
regression, use $e_i^2 / \hat{\sigma}_e^2$, where
$\hat{\sigma}_e^2 = \sum e_i^2 / n$
2. Instead of using every variable (plus
squares and cross-products), you specify
the variables you think are causing the
heteroskedasticity
– Alternatively, use only $\hat{y}$ as a “catch all”
3. Test Statistic is RegSS from 2nd regression
divided by 2. It is distributed χ2 with degrees
of freedom equal to the number of
independent variables in the 2nd regression.
Stata Command: hettest
. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of turnout

chi2(1)       =     8.76
Prob > chi2   =   0.0031

. hettest senate

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: senate

chi2(1)       =     4.59
Prob > chi2   =   0.0321

. hettest , rhs

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: diplomau mdnincm senate guber

chi2(4)       =    11.33
Prob > chi2   =   0.0231
What are you gonna do about it?
• Two Remedies
– We might need to try a different estimator.
This will be the “Generalized Least Squares”
estimator. This “GLS” Estimator can be
applied to data with heteroskedasticity and
serial correlation.
– OLS is still consistent (just inefficient) and
Standard Errors are wrong. We could fix the
standard errors and stick with OLS.
Generalized Least Squares
• When used to correct heteroskedasticity, we refer
to GLS as “Weighted Least Squares” or WLS.
• Intuition:
– Some data points have better quality
information about the regression line
than others because they have less error.
– We should give those observations more
weight.
[Figure: Fitted values and y plotted against x]
Non-Constant Variance
• We want constant error variance for all
observations,
$E(e_i^2) = \sigma^2$, with $\sigma$ estimated by RMSE
• However, with Heteroskedasticity, error
variance ($\sigma_i^2$) is not constant
$E(e_i^2) = \sigma_i^2$, not constant (indexed by i)
• If we know what $\sigma_i^2$ is, we can re-weight
the equation to make the error variance
constant
Re-weighting the regression
$y_i = a + b x_i + e_i$
Begin with the formula

$y_i = a x_{0i} + b x_i + e_i$
Add $x_{0i}$, a variable that is always 1

$\frac{y_i}{\sigma_i} = a \frac{x_{0i}}{\sigma_i} + b \frac{x_i}{\sigma_i} + \frac{e_i}{\sigma_i}$
Divide through by $\sigma_i$ to weight it

$y_i^* = a x_{0i}^* + b x_i^* + e_i^*$
We can simplify notation and show it’s really just a regression
with transformed variables.

Last, we just need to show that the transformation makes the
variance of the new error term, $e_i^*$, constant:
$var(e_i^*) = E(e_i^{*2}) = E\left[\left(\frac{e_i}{\sigma_i}\right)^2\right] = E\left(\frac{e_i^2}{\sigma_i^2}\right) = \frac{1}{\sigma_i^2} E(e_i^2) = \frac{1}{\sigma_i^2} \sigma_i^2 = 1$
GLS vs. OLS
• In OLS, we minimize the sum of the squared
errors:
$\sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - a - b x_i)^2$
• In GLS, we minimize a weighted sum of the
squared errors. Let $w_i = 1/\sigma_i^2$:
$\sum \left( \frac{y_i}{\sigma_i} - a \frac{x_{0i}}{\sigma_i} - b \frac{x_i}{\sigma_i} \right)^2 = \sum \frac{1}{\sigma_i^2} (y_i - a x_{0i} - b x_i)^2 = \sum w_i (y_i - a x_{0i} - b x_i)^2$
Set the partial derivatives with respect to a and b to 0
and solve to get the estimating equations.
GLS vs. OLS
• Minimize Errors: $\sum (y_i - a - b x_i)^2$
• Minimize Weighted Errors: $\sum (\sqrt{w_i} y_i - a \sqrt{w_i} x_{0i} - b \sqrt{w_i} x_i)^2$, where $w_i = 1/\sigma_i^2$

• GLS (WLS) is just doing OLS with
transformed variables.
• In the same way that we “transformed”
non-linear data to fit the assumptions of
OLS, we can “transform” the data with
weights to help heteroskedastic data meet
the assumptions of OLS
GLS vs. OLS
• In Matrix form,
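– OLS: $b = (X'X)^{-1} X'y$

– GLS: $b^* = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}y$

• Weights are included in a matrix, $\Omega^{-1}$
$\Omega = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$, $\Omega^{-1} = \begin{pmatrix} 1/\sigma_1^2 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n^2 \end{pmatrix}$
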
Problem:
• We rarely know exactly how to weight our
data
• Solutions:
– Plan A: If heteroskedasticity comes from one
specific variable, we can use that variable as
the “weight”
– Alternatively, we could run OLS and use the
residuals to estimate the weights
(observations with large OLS residuals get
little weight in the WLS estimates)
Plan A: A Single, Known, Villain
• Example: Household income
• Households that earn little must spend it all on
necessities. When income is low, there is little
variance in spending.
• Households that earn a great deal can either
spend it all or buy just essentials and save the
rest. More error variance as income increases
$y_i = a + b_1 x_1 + b_2 x_2 + e_i$
$\frac{y_i}{x_1} = a \frac{1}{x_1} + b_1 \frac{x_1}{x_1} + b_2 \frac{x_2}{x_1} + \frac{e_i}{x_1}$
• Note the changes in interpretation
Plan B: Estimate the weights
• Running OLS, get an estimate of the
residuals
• Regress those residuals (squared) on the
set of independent variables and get
predicted values
• Use those predicted values as the weights
• Because this is GLS that is “doable”, it is
called “Feasible GLS” or FGLS
• FGLS is asymptotically equal to GLS as
sample size goes to infinity
I don’t want to do GLS
• I don’t blame you
• Usually best if we know something about
the nature of the heteroskedasticity
• OLS was unbiased, why can’t we just use
that?
– Inefficient (but only problematic with very
severe heteroskedasticity)
– Incorrect Standard Errors (formula changes)
• What if we could just fix standard errors?
White Standard Errors
• We can use OLS and just fix the Standard
Errors. There are a number of ways to do
this, but the classic is “White Standard
Errors”
• Number of names for this
– White Std. Errs.
– Huber-White Std. Errs.
– Robust Std. Errs.
– Heteroskedasticity-Consistent Std. Errs.
The big idea…
• In OLS, Standard Errors come from the
Variance-Covariance Matrix.
– Std. Err. is the Std. Dev. Of a Sampling Distribution
– Variance is the square of the Standard Deviation
(Std. Dev. is the square root of variance)
• Variance Covariance matrix for OLS is given
by: $\sigma_e^2 (X'X)^{-1}$
. vce
             | diplomau  mdnincm    _cons
-------------+---------------------------
    diplomau |   254467
     mdnincm | -178.899  .187128
       _cons |  1.4e+06 -3172.43  9.3e+07
Variances (on the diagonal)
With Heteroskedasticity
• Variance Covariance matrix for OLS under
homoskedasticity is given by: $\sigma_e^2 (X'X)^{-1}$
• Variance Covariance matrix under
heteroskedasticity is given by:
$(X'X)^{-1} (X'\Omega X) (X'X)^{-1}$
• Problem: We still don’t know the $\sigma_i^2$ in $\Omega$
• Solution: We can estimate $\frac{1}{n} X'\Omega X$ quite
well using OLS residuals by $\frac{1}{n} \sum_{i=1}^{n} e_i^2 x_i x_i'$,
where $x_i'$ is the row of X for obs. i
In Stata…
• Specify the “robust” option after regression
. regress turnout diplomau mdnincm, robust

Regression with robust standard errors                 Number of obs =     426
                                                       F(  2,   423) =   33.93
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1291
                                                       Root MSE      =   47766

------------------------------------------------------------------------------
             |               Robust
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    diplomau |   1101.359   548.7361     2.01   0.045     22.77008    2179.948
     mdnincm |   1.111589   .4638605     2.40   0.017       .19983    2.023347
       _cons |   154154.4   9903.283    15.57   0.000     134688.6    173620.1
------------------------------------------------------------------------------
Drawbacks
• OLS is still inefficient (though this is not
much of a problem unless the
heteroskedasticity is very severe)
• Requires larger sample sizes to give good
estimates of Std. Errs. (which means t
tests are only OK asymptotically)
• If there is no heteroskedasticity and you
use robust SE’s, you do slightly worse
than regular Std. Errs.
Moral of the Story
• If you know something about the nature of
the heteroskedasticity, WLS is good: it is
BLUE
• If you don’t, use OLS with robust Std. Errs.
• Now, Group heteroskedasticity…
Group Heteroskedasticity
• No GLS/WLS option
• There is a Robust Std. Err. Option
– Essentially Stacks “clusters” into their own
kind of mini-White correction

```
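The steps of White's test described in the slides (run OLS, square the residuals, regress them on the regressors plus squares and cross-products, then compare n times R-squared to a chi-squared critical value) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Stata `imtest, white` implementation: the data are made up, there is a single regressor (so the auxiliary regression uses only x and x squared, giving 2 degrees of freedom), and the helper names `solve`, `ols`, and `white_test` are hypothetical.

```python
# White's test by hand for a one-regressor model (illustrative data, not the slides' data).

def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def ols(X, y):
    """Return OLS coefficients, residuals, and R-squared; rows of X include the constant."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ybar = sum(y) / len(y)
    tss = sum((yi - ybar) ** 2 for yi in y)
    rss = sum(e ** 2 for e in resid)
    return beta, resid, 1.0 - rss / tss

def white_test(x, y):
    """Return (n * R^2, degrees of freedom) from the regression of e^2 on x and x^2."""
    _, resid, _ = ols([[1.0, xi] for xi in x], y)
    e2 = [e ** 2 for e in resid]
    _, _, r2_aux = ols([[1.0, xi, xi ** 2] for xi in x], e2)
    return len(x) * r2_aux, 2

# Made-up data: the error alternates in sign and grows in size with x.
n = 100
x = [i / n for i in range(1, n + 1)]
y = [1.0 + 2.0 * xi + (-1) ** i * xi for i, xi in enumerate(x, start=1)]

stat, df = white_test(x, y)
CHI2_CRIT_DF2_5PCT = 5.99  # 5% critical value of chi-squared with 2 d.f.
reject = stat > CHI2_CRIT_DF2_5PCT
print(f"n*R^2 = {stat:.2f} on {df} d.f.; reject homoskedasticity: {reject}")
```

With two regressors, as in the turnout example, the auxiliary regression would also include the second variable, its square, and the cross-product, which is where the 5 degrees of freedom in the `imtest, white` output come from.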
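The two remedies from the slides can also be sketched in plain Python for a one-regressor model. The first part checks that WLS computed as OLS on variables divided through by the error standard deviation matches WLS computed from the weighted sum of squares with weights equal to one over the error variance. The second part computes White standard errors from the sandwich formula. Everything here is an assumption made for illustration: the data are made up, the error standard deviation is taken as known and proportional to x (the "single, known villain" case), the function names are hypothetical, and no small-sample scaling is applied, so the numbers would not exactly match Stata's `robust` option.

```python
import math

def solve2(A, b):
    """Solve a 2x2 linear system A z = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def fit_no_constant(z0, z1, y):
    """Regress y on the two columns z0 and z1 (no extra constant); return [a, b]."""
    s01 = sum(p * q for p, q in zip(z0, z1))
    A = [[sum(p * p for p in z0), s01], [s01, sum(q * q for q in z1)]]
    rhs = [sum(p * yi for p, yi in zip(z0, y)), sum(q * yi for q, yi in zip(z1, y))]
    return solve2(A, rhs)

def wls_transformed(x, y, sigma):
    """WLS as OLS on variables divided by sigma_i (x0 = 1 becomes 1 / sigma_i)."""
    x0_star = [1.0 / s for s in sigma]
    x_star = [xi / s for xi, s in zip(x, sigma)]
    y_star = [yi / s for yi, s in zip(y, sigma)]
    return fit_no_constant(x0_star, x_star, y_star)

def wls_weighted(x, y, sigma):
    """WLS from the weighted sum of squares with w_i = 1 / sigma_i^2."""
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = (sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
         / sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)))
    return [ybar - b * xbar, b]

def white_se(x, y):
    """OLS coefficients with White std. errs.: (X'X)^-1 [sum e_i^2 x_i x_i'] (X'X)^-1."""
    n = len(x)
    a, b = fit_no_constant([1.0] * n, x, y)
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    det = n * sxx - sx * sx
    bread = [[sxx / det, -sx / det], [-sx / det, n / det]]
    m01 = sum(e * xi for e, xi in zip(e2, x))
    meat = [[sum(e2), m01], [m01, sum(e * xi * xi for e, xi in zip(e2, x))]]

    def matmul(P, Q):
        return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    V = matmul(matmul(bread, meat), bread)
    return [a, b], [math.sqrt(V[0][0]), math.sqrt(V[1][1])]

# Made-up data: the error alternates in sign and its size is proportional to x.
n = 100
x = [i / n for i in range(1, n + 1)]
y = [1.0 + 2.0 * xi + (-1) ** i * xi for i, xi in enumerate(x, start=1)]
sigma = x[:]  # assumed known up to scale

coef_t = wls_transformed(x, y, sigma)
coef_w = wls_weighted(x, y, sigma)
coef_ols, se = white_se(x, y)
print("WLS (transformed variables):", coef_t)
print("WLS (weighted sum of squares):", coef_w)
print("OLS coefficients:", coef_ols, "White std. errs.:", se)
```

The two WLS routes agree because dividing each observation by its error standard deviation and then squaring is the same as weighting each squared error by one over its variance.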