How to Remedy Heteroskedasticity

Heteroske...what?
           O.L.S. is B.L.U.E.
• BLUE means “Best Linear Unbiased
  Estimator”.
• What does that mean?
• We need to define…
  – Unbiased: The mean of the sampling
    distribution is the true population parameter.
  – What is a sampling distribution? …
     • Imagine taking a sample, finding b, taking another
       sample, finding b again, and repeating over and over.
       The sampling distribution describes the possible
       values b can take on in repeated sampling.
               We hope that…

[Figure: a sampling distribution of b centered on β]

If the sampling distribution centers on the true
population parameter (β), our estimates will, "on average," be
right. We get this with the 10 assumptions.
 If some assumptions don’t hold…

[Figure: a sampling distribution whose average β̂ falls away from β]

• We can get a “biased” estimate. That is, E(β̂) ≠ β.
               Bias is Bad
• If your parameter estimates are biased,
  your answers (coefficients) relating x and y
  are wrong. They do not describe the true
  relationship.
      Efficiency / Inefficiency
• What makes one “unbiased” estimator
  better than another?
                Efficiency
• Sampling Distributions with less variance
  (smaller standard errors) are more efficient
• OLS is the “Best” linear unbiased
  estimator because its sampling distribution
  has less variance than that of any other
  linear unbiased estimator.

[Figure: sampling distributions of OLS Regression and LAV Regression; the OLS distribution has the smaller variance]
      Under the 10 regression
    assumptions and assuming
    normally distributed errors…
• We will get estimates using OLS
• Those estimates will be unbiased
• Those estimates will be efficient (the
  “best”)
• They will be the “Best Unbiased Estimator”
  out of all possible estimators
              If we violate…
• No perfect collinearity, or n > k
  – We cannot get any estimates—nothing we
    can do to fix it
• Normal Error Term assumption
  – OLS is BLUE, but not BUE.
• Heteroskedasticity or Serial Correlation
  – OLS is still unbiased, but not efficient
• Everything else (omitted variables,
  endogeneity, linearity)
  – OLS is biased
   What do Bias and Efficiency Mean?

[Figure: four sampling distributions compared against β.
 Top left: biased, but very efficient (narrow, but E(β̂) ≠ β).
 Top right: unbiased, but inefficient (wide, with β = E(β̂)).
 Bottom left: biased and inefficient.
 Bottom right: unbiased and efficient.]
    Today: Heteroskedasticity
• Consequence: OLS is still Unbiased, but it
  is not efficient (and std. errors are wrong)
• Today we will learn:
  – How to diagnose Heteroskedasticity
  – How to remedy Heteroskedasticity
     • New Estimator for coefficients and std. errs.
     • Keep OLS estimator but fix std. errs.
   What is heteroskedasticity?
• Heteroskedasticity occurs when the size of
  the errors varies across observations.
  This arises generally in two ways.
  – When increases in an independent variable
    are associated with changes in the error in
    prediction.

[Figure: scatter of sales against Salespersons, illustrating error variance that changes with the independent variable]
   What is Heteroskedasticity?
• Heteroskedasticity occurs when the size of
  the errors varies across observations.
  This arises generally in two ways.
  – When you have “subgroups” or clusters in
    your data.
     • We might try to predict presidential popularity. We
       measure average popularity in each year. Of
       course, there are “clusters” of years where the
       same president is in office. Because each
       president is unique, the errors in predicting Bush’s
       popularity are likely to be a bit different from the
       errors predicting Clinton’s.
 How do we recognize this beast?
• Three Methods
  – Think about your data—look for analogs of
    the two ways heteroskedasticity can strike.
  – Graphical Analysis
  – Formal statistical test
         Graphical Analysis
• Plot residuals against ŷ (the fitted values) and
  the independent variables.
• Expect to see residuals randomly
  clustered around zero
• However, you might see a pattern. This is
  bad.
• Examples…
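A minimal Stata sketch of how these diagnostic plots can be produced (y and x here are placeholder variable names, not from the slides):

   * fit the model, then save residuals and fitted values
   regress y x
   predict double resid, residuals
   predict double yhat, xb

   * residuals against an independent variable, and against the fitted values
   scatter resid x
   rvfplot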
[Figure: residuals plotted against x (scatter resid x).
 Left panel: as x increases, so does the error variance.
 Right panel: as x increases, the error variance decreases.]

[Figure: residuals plotted against the fitted values (rvfplot, or scatter resid yhat).
 As the predicted value of y increases, so does the error variance.]
                  Good Examples…

[Figure: the same data shown three ways: scatter y x, scatter resid x, and rvfplot (scatter resid yhat)]
      Formal Statistical Tests
• White’s Test
  – Heteroskedasticity occurs when the size of
    the errors is correlated with one or more
    independent variables.
  – We can run OLS, get the residuals, and then
    see if they are correlated with the
    independent variables
                        More Formally,

  turnout_i = a + 1101.4·diplomau_i + 1.1·mdnincm_i + e_i
  turnout-hat_i = a + 1101.4·diplomau_i + 1.1·mdnincm_i
  e_i = turnout_i − turnout-hat_i

state   district   turnout   diplomau   mdnincm   pred_turnout    residual
 AL        1       151,188     14.7     $27,360     200,757.4    -49,569.4
 AL        2       216,788     16.7     $29,492     205,330       11,457.96
 AL        3       147,317     12.3     $26,800     197,491.7    -50,174.7
 AL        4       226,409      8.1     $25,401     191,310.8     35,098.16
 AL        5       186,059     20.4     $33,189     213,514.6    -27,455.6
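A small Stata sketch of how the last two columns of this table could be generated, assuming the district-level dataset from the slides is in memory:

   * fit the model, then save predicted turnout and the residuals
   regress turnout diplomau mdnincm
   predict double pred_turnout, xb
   predict double residual, residuals
   list state district turnout pred_turnout residual in 1/5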

     So, if error increases with x, we
        violate homoskedasticity

  e_i² = α₀ + α₁·x₁ + α₂·x₂ + α₃·x₁² + α₄·x₂² + α₅·x₁·x₂ + error

• If we can predict the error with a regression
  line, we have heteroskedasticity.
• To make this prediction, we need to make
  everything positive (square it).

[Figure: residuals plotted against x]
     So, if error increases with x, we
        violate homoskedasticity

  e_i² = α₀ + α₁·x₁ + α₂·x₂ + α₃·x₁² + α₄·x₂² + α₅·x₁·x₂ + error

• Finally, we use these squared residuals as the
  dependent variable in a new regression.
• If we can predict increases/decreases in the
  size of the residual, we have found evidence of
  heteroskedasticity.
• For the independent variables, we use the same ones as in the
  original regression plus their squares and their cross-products.

[Figure: squared residuals plotted against x]
                The Result…
• Take the R² from this regression and multiply it
  by n.
• This test statistic is distributed χ² with degrees of
  freedom equal to the number of independent
  variables in the 2nd regression.
• In other words, R²·n is the χ² you calculate from
  your data; compare it to a critical χ²* from a χ²
  table. If your χ² is greater than χ²*, then you
  reject the null hypothesis (of homoskedasticity).
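A rough Stata sketch of the by-hand version for a two-regressor model (y, x1, x2 are placeholders; in practice the built-in command on the next slide is easier, and its heteroskedasticity row should roughly match this n·R² statistic):

   * step 1: original regression, keep the squared residuals
   regress y x1 x2
   predict double ehat, residuals
   gen double e2 = ehat^2

   * step 2: auxiliary regression on the regressors, their squares, and cross-product
   gen double x1sq = x1^2
   gen double x2sq = x2^2
   gen double x1x2 = x1*x2
   regress e2 x1 x2 x1sq x2sq x1x2

   * step 3: test statistic n*R^2, compared to a chi-squared with 5 df here
   display "chi2(5) = " e(N)*e(r2) "   p = " chi2tail(5, e(N)*e(r2))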
                  A Sigh of Relief…
• Stata will calculate this for you
• After running the regression, type
  imtest, white
. imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(5)       =     9.97
         Prob > chi2   =   0.0762

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |       9.97      5    0.0762
            Skewness |       3.96      2    0.1378
            Kurtosis | -28247.96       1    1.0000
---------------------+-----------------------------
               Total | -28234.03       8    1.0000
---------------------------------------------------
    An Alternative Test: Breusch/Pagan

•     Based on similar logic
•     Three changes:
     1. Instead of using e_i² as the D.V. in the 2nd
        regression, use e_i²/σ̂², where σ̂² = Σ e_i² / n
     2. Instead of using every variable (plus
        squares and cross-products), you specify
        the variables you think are causing the
        heteroskedasticity
      – Alternatively, use only ŷ as a “catch-all”
An Alternative Test: Breusch/Pagan
 3. The test statistic is the RegSS (model sum of squares)
    from the 2nd regression divided by 2. It is distributed χ²
    with degrees of freedom equal to the number of
    independent variables in the 2nd regression.
             Stata Command: hettest
. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of turnout

            chi2(1)       =     8.76
            Prob > chi2   =   0.0031

. hettest senate

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: senate

            chi2(1)       =     4.59
            Prob > chi2   =   0.0321

. hettest , rhs

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: diplomau mdnincm senate guber

            chi2(4)       =    11.33
            Prob > chi2   =   0.0231
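For intuition, a sketch of roughly what hettest does under the hood, using only the fitted values as the “catch-all” (this follows the scaling and RegSS/2 statistic from the slides; it may not reproduce the output above to the last digit):

   * after the original regression, save residuals and fitted values
   regress turnout diplomau mdnincm
   predict double ehat, residuals
   predict double yhat, xb

   * scaled squared residuals: e_i^2 divided by (sum of e_i^2)/n
   gen double g = ehat^2 / (e(rss)/e(N))

   * 2nd regression on the fitted values; test statistic = RegSS / 2
   regress g yhat
   display "chi2(1) = " e(mss)/2 "   p = " chi2tail(1, e(mss)/2)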
 What are you gonna do about it?
• Two Remedies
  – We might need to try a different estimator.
    This will be the “Generalized Least Squares”
    estimator. This “GLS” Estimator can be
    applied to data with heteroskedasticity and
    serial correlation.
  – OLS is still consistent (just inefficient) and
    Standard Errors are wrong. We could fix the
    standard errors and stick with OLS.
     Generalized Least Squares
• When used to correct heteroskedasticity, we refer
  to GLS as “Weighted Least Squares” or WLS.
• Intuition:
  – Some data points have better quality
    information about the regression line than
    others because they have less error.
  – We should give those observations more
    weight.

[Figure: scatter of y against x with the fitted values overlaid]
      Non-Constant Variance
• We want constant error variance for all
  observations:
      E(e_i²) = σ², estimated by the RMSE (squared)
• However, with heteroskedasticity, the error
  variance is not constant:
      E(e_i²) = σ_i², not constant (indexed by i)
• If we know what σ_i² is, we can re-weight
  the equation to make the error variance
  constant
       Re-weighting the regression

  y_i = a + b·x_i + e_i                        Begin with the formula

  y_i = a·x_0i + b·x_i + e_i                   Add x_0i, a variable that is always 1

  y_i/σ_i = a·(x_0i/σ_i) + b·(x_i/σ_i) + e_i/σ_i    Divide through by σ_i to weight it

  y_i* = a·x_0i* + b·x_i* + e_i*               We can simplify notation and show it's
                                               really just a regression with
                                               transformed variables.

  var(e_i*) = E(e_i*²) = E[(e_i/σ_i)²]         Last, we just need to show that the
                                               transformation makes the variance of
                                               the new error term, e_i*, constant:

  E[(e_i/σ_i)²] = (1/σ_i²)·E(e_i²) = (1/σ_i²)·σ_i² = 1
                        GLS vs. OLS
• In OLS, we minimize the sum of the squared
  errors:

  Σ e_i² = Σ (y − ŷ)² = Σ (y − a − bx)²

• In GLS, we minimize a weighted sum of the
  squared errors:

  Σ (y/σ_i − a·x_0i/σ_i − b·x/σ_i)²
    = Σ (1/σ_i)²·(y − a·x_0i − bx)²
    = Σ (1/σ_i²)·(y − a·x_0i − bx)²
    = Σ w·(y − a·x_0i − bx)²,   letting w = 1/σ_i²

  Set the partial derivatives to 0 and solve for a and b to get the
  estimating equations.
               GLS vs. OLS
• Minimize Errors:             Σ (y − a − bx)²
• Minimize Weighted Errors:    Σ w·(y − a·x_0i − bx)²

• GLS (WLS) is just doing OLS with
  transformed variables.
• In the same way that we “transformed”
  non-linear data to fit the assumptions of
  OLS, we can “transform” the data with
  weights to help heteroskedastic data meet
  the assumptions of OLS
                      GLS vs. OLS
• In Matrix form,
  – OLS:  b  = (X'X)⁻¹ X'y
  – GLS:  b* = (X'Ω⁻¹X)⁻¹ X'Ω⁻¹y

• Weights are included in a matrix, Ω⁻¹

  Ω   = diag(σ_1², σ_2², …, σ_n²)
  Ω⁻¹ = diag(1/σ_1², 1/σ_2², …, 1/σ_n²)
                  Problem:
• We rarely know exactly how to weight our
  data
• Solutions:
  – Plan A: If heteroskedasticity comes from one
    specific variable, we can use that variable as
    the “weight”
  – Alternatively, we could run OLS and use the
    residuals to estimate the weights
    (observations with large OLS residuals get
    little weight in the WLS estimates)
  Plan A: A Single, Known, Villain
• Example: Household income
• Households that earn little must spend it all on
  necessities. When income is low, there is little
  variance in spending.
• Households that earn a great deal can either
  spend it all or buy just essentials and save the
  rest. More error variance as income increases
     y_i = a + b_1·x_1 + b_2·x_2 + e_i

     y_i/x_1 = a·(1/x_1) + b_1·(x_1/x_1) + b_2·(x_2/x_1) + e_i/x_1
• Note the changes in interpretation
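A hedged Stata sketch of this “known villain” weighting, with y, x1, x2 as placeholder names and x1 playing the role of household income:

   * manual version: divide every term through by x1
   gen double y_w    = y/x1
   gen double inv_x1 = 1/x1
   gen double x2_w   = x2/x1
   regress y_w inv_x1 x2_w        // constant of this model is b1; coef. on inv_x1 is a

   * equivalent shortcut: analytic weights proportional to 1/variance = 1/x1^2
   regress y x1 x2 [aweight=1/(x1^2)]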
  Plan B: Estimate the weights
• Running OLS, get an estimate of the
  residuals
• Regress those residuals (squared) on the
  set of independent variables and get
  predicted values
• Use those predicted values as the weights
• Because this is GLS that is “doable”, it is
  called “Feasible GLS” or FGLS
• FGLS is asymptotically equal to GLS as
  sample size goes to infinity
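One common FGLS recipe, sketched in Stata; the log step (which keeps the predicted variances positive) is an extra assumption on top of the slides, and y, x1, x2 are placeholder names:

   * step 1: OLS, then the squared residuals (logged)
   regress y x1 x2
   predict double ehat, residuals
   gen double ln_e2 = ln(ehat^2)

   * step 2: model the squared residuals and recover predicted variances
   regress ln_e2 x1 x2
   predict double ln_h, xb
   gen double h = exp(ln_h)

   * step 3: WLS, weighting each observation by the inverse of its estimated variance
   regress y x1 x2 [aweight=1/h]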
       I don’t want to do GLS
• I don’t blame you
• Usually best if we know something about
  the nature of the heteroskedasticity
• OLS was unbiased, why can’t we just use
  that?
  – Inefficient (but only problematic with very
    severe heteroskedasticity)
  – Incorrect Standard Errors (formula changes)
• What if we could just fix standard errors?
       White Standard Errors
• We can use OLS and just fix the Standard
  Errors. There are a number of ways to do
  this, but the classic is “White Standard
  Errors”
• Number of names for this
  – White Std. Errs.
  – Huber-White Std. Errs.
  – Robust Std. Errs.
  – Heteroskedastic Consistent Std. Errs.
                The big idea…
• In OLS, Standard Errors come from the
  Variance-Covariance Matrix.
  – Std. Err. is the Std. Dev. Of a Sampling Distribution
  – Variance is the square of the Standard Deviation
    (Std. Dev. is the square root of variance)
• The Variance-Covariance matrix for OLS is given
  by: σ_e²(X'X)⁻¹

. vce
                  |  diplomau   mdnincm     _cons
     -------------+--------------------------------
         diplomau |    254467
          mdnincm |  -178.899   .187128
            _cons |   1.4e+06  -3172.43   9.3e+07

  (the diagonal entries are the variances)
        With Heteroskedasticity
• The Variance-Covariance matrix for OLS is
  given by: σ_e²(X'X)⁻¹
• The Variance-Covariance matrix under
  heteroskedasticity is given by:
           (X'X)⁻¹ (X'ΩX) (X'X)⁻¹
• Problem: We still don’t know Ω (the σ_i²)
• Solution: We can estimate (X'ΩX) quite well
  using the OLS residuals, by Σ e_i²·x_i·x_i',
  where x_i' is the row of X for obs. i
                       In Stata…
   • Specify the “robust” option after regression
. regress turnout diplomau mdnincm, robust

Regression with robust standard errors    Number of obs    =      426
                                          F( 2,     423)   =    33.93
                                          Prob > F         =   0.0000
                                          R-squared        =   0.1291
                                          Root MSE         =    47766

-----------------------------------------------------------------
         |            Robust
 turnout |    Coef. Std. Err.     t   P>|t| [95% Conf. Interval]
---------+-------------------------------------------------------
diplomau | 1101.359 548.7361    2.01 0.045    22.77008   2179.948
 mdnincm | 1.111589 .4638605    2.40 0.017      .19983   2.023347
   _cons | 154154.4 9903.283 15.57 0.000      134688.6   173620.1
-----------------------------------------------------------------
               Drawbacks
• OLS is still inefficient (though this is not
  much of a problem unless
  heteroskedasticity is really bad)
• Requires larger sample sizes to give good
  estimates of Std. Errs. (which means t
  tests are only OK asymptotically)
• If there is no heteroskedasticity and you
  use robust SE’s, you do slightly worse
  than regular Std. Errs.
          Moral of the Story
• If you know something about the nature of
  the heteroskedasticity, WLS is good—
  BLUE
• If you don’t, use OLS with robust Std. Errs.
• Now, Group heteroskedasticity…
    Group Heteroskedasticity
• No GLS/WLS option
• There is a Robust Std. Err. Option
  – Essentially stacks the “clusters” into their own
    kind of mini-White correction, as in the sketch below
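A minimal Stata sketch of the cluster-robust option, assuming a grouping variable such as state identifies the clusters (the choice of cluster here is illustrative, not from the slides):

   * robust SEs that also allow errors to be correlated within each state
   regress turnout diplomau mdnincm, robust cluster(state)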

								