Posted on: 2/7/2012 (Public Domain)
Heteroske...what?

O.L.S. is B.L.U.E.
• BLUE means "Best Linear Unbiased Estimator".
• What does that mean? We need to define…
– Unbiased: The mean of the sampling distribution is the true population parameter.
– What is a sampling distribution? Imagine taking a sample, finding b, taking another sample, finding b again, and repeating over and over. The sampling distribution describes the possible values b can take on in repeated sampling.

We hope that…
• If the sampling distribution centers on the true population parameter, our estimates will "on average" be right: E[β̂] = β.
• We get this with the 10 assumptions.

If some assumptions don't hold…
• We can get a "biased" estimate. That is, E[β̂] ≠ β.

Bias is Bad
• If your parameter estimates are biased, your answers (coefficients) relating x and y are wrong. They do not describe the true relationship.

Efficiency / Inefficiency
• What makes one "unbiased" estimator better than another? Efficiency.
• Sampling distributions with less variance (smaller standard errors) are more efficient.
• OLS is the "Best" linear unbiased estimator because its sampling distribution has less variance than other linear unbiased estimators.
[Figure: sampling distributions of the OLS and LAV (least absolute value) regression estimators; the OLS distribution is narrower.]

Under the 10 regression assumptions and assuming normally distributed errors…
• We will get estimates using OLS
• Those estimates will be unbiased
• Those estimates will be efficient (the "best")
• They will be the "Best Unbiased Estimator" out of all possible estimators

If we violate…
• Perfect collinearity, or k > n (more parameters than observations)
– We cannot get any estimates at all; nothing we can do to fix it
• Normal error term assumption
– OLS is still BLUE, but not BUE
• Heteroskedasticity or serial correlation
– OLS is still unbiased, but not efficient
• Everything else (omitted variables, endogeneity, non-linearity)
– OLS is biased

What do Bias and Efficiency Mean?
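The repeated-sampling idea above can be checked with a quick simulation. This is a hypothetical sketch in Python (not part of the slides); the model, sample size, and parameter values are made up for illustration: draw many samples from a model where the assumptions hold, estimate b each time, and see that the estimates center on the true slope.

```python
import numpy as np

rng = np.random.default_rng(42)
true_a, true_b = 2.0, 3.0    # assumed population parameters for the demo
n, reps = 100, 2000

slopes = []
for _ in range(reps):
    x = rng.uniform(0, 1, n)
    y = true_a + true_b * x + rng.normal(0, 1, n)   # assumptions all hold
    X = np.column_stack([np.ones(n), x])
    a_hat, b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    slopes.append(b_hat)

# Unbiasedness: the mean of the sampling distribution sits on the true beta
print(np.mean(slopes))   # close to 3.0
```

Each individual b̂ misses the truth, but the average over repeated samples recovers it: that is exactly what E[β̂] = β says.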
[Figure: four sampling distributions illustrating the combinations: E[β̂] ≠ β with a narrow distribution (biased, but very efficient); E[β̂] = β with a wide distribution (unbiased, but inefficient); E[β̂] ≠ β with a wide distribution (biased and inefficient); E[β̂] = β with a narrow distribution (unbiased and efficient).]

Today: Heteroskedasticity
• Consequence: OLS is still unbiased, but it is not efficient (and the std. errors are wrong)
• Today we will learn:
– How to diagnose heteroskedasticity
– How to remedy heteroskedasticity
• A new estimator for coefficients and std. errs.
• Keeping the OLS estimator but fixing the std. errs.

What is heteroskedasticity?
• Heteroskedasticity occurs when the size of the errors varies across observations. This arises generally in two ways.
– When increases in an independent variable are associated with changes in the error in prediction.
[Figure: scatterplot of sales against salespersons; the spread of sales around the fit widens as the number of salespersons increases.]
– When you have "subgroups" or clusters in your data.
• We might try to predict presidential popularity, measuring average popularity in each year. Of course, there are "clusters" of years where the same president is in office. Because each president is unique, the errors in predicting Bush's popularity are likely to be a bit different from the errors predicting Clinton's.

How do we recognize this beast?
• Three methods
– Think about your data; look for analogs of the two ways heteroskedasticity can strike.
– Graphical analysis
– Formal statistical tests

Graphical Analysis
• Plot the residuals against ŷ and against the independent variables.
• Expect to see residuals randomly clustered around zero.
• However, you might see a pattern. This is bad.
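What those residual plots are showing can also be reproduced numerically. Here is a hypothetical numpy sketch (simulated data, not from the slides): generate data whose error variance grows with x, run OLS, and compare the residual spread in the low-x and high-x halves of the sample, which is the numeric analog of the fanning-out pattern in a scatter of resid against x.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 1, n)
e = rng.normal(0, 1 + 4 * x)     # error std. dev. increases with x
y = 2 + 3 * x + e

# OLS via the normal equations: b = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b

lo = resid[x < 0.5].std()
hi = resid[x >= 0.5].std()
print(lo, hi)   # the high-x residuals are clearly more spread out
```

In a residual-vs-x plot of this data you would see the classic fan shape; the two standard deviations just summarize it in numbers.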
Examples…
[Figure: two residual-vs-x scatterplots (Stata: scatter resid x). In one, the error variance increases as x increases; in the other, the error variance decreases as x increases.]
[Figure: residual-vs-fitted plot (Stata: rvfplot, or scatter resid yhat). As the predicted value of y increases, so does the error variance.]

Good Examples…
[Figure: scatter y x and scatter resid x for well-behaved data; the residuals are evenly spread around zero across x.]
[Figure: rvfplot (scatter resid yhat) for the same data; the residuals are evenly spread around zero across the fitted values.]

Formal Statistical Tests
• White's Test
– Heteroskedasticity occurs when the size of the errors is correlated with one or more independent variables.
– We can run OLS, get the residuals, and then see if they are correlated with the independent variables.

More formally, start from the fitted regression:

    turnoutᵢ = a + 1101.4·diplomauᵢ + 1.1·mdnincmᵢ + eᵢ
    pred_turnoutᵢ = a + 1101.4·diplomauᵢ + 1.1·mdnincmᵢ
    residualᵢ = eᵢ = turnoutᵢ − pred_turnoutᵢ

    state  district   turnout  diplomau  mdnincm  pred_turnout   residual
    AL     1          151,188      14.7  $27,360     200,757.4  -49,569.4
    AL     2          216,788      16.7  $29,492     205,330.0   11,457.96
    AL     3          147,317      12.3  $26,800     197,491.7  -50,174.7
    AL     4          226,409       8.1  $25,401     191,310.8   35,098.16
    AL     5          186,059      20.4  $33,189     213,514.6  -27,455.6

So, if the error increases with x, we violate homoskedasticity. The test regression is:

    eᵢ² = α₀ + α₁x₁ + α₂x₂ + α₃x₁² + α₄x₂² + α₅x₁x₂ + error

• If we can predict the error with a regression line, we have heteroskedasticity.
• To make this prediction, we first need to make every residual positive, so we square it.
[Figure: raw residuals plotted against x; their spread grows with x.]
• Then we use these squared residuals as the dependent variable in a new regression. If we can predict increases/decreases in the size of the residual, we have found evidence of heteroskedasticity.
[Figure: squared residuals plotted against x; they trend upward with x.]
• For the independent variables, we use the same ones as in the original regression plus their squares and their cross-products.
The Result…
• Take the R² from this second regression and multiply it by n.
• This test statistic is distributed χ² with degrees of freedom equal to the number of independent variables in the 2nd regression.
• In other words, n·R² is the χ² you calculate from your data; compare it to a critical value χ²* from a χ² table. If your χ² is greater than χ²*, then you reject the null hypothesis (of homoskedasticity).

A Sigh of Relief…
• Stata will calculate this for you
• After running the regression, type imtest, white

    . imtest, white

    White's test for Ho: homoskedasticity
             against Ha: unrestricted heteroskedasticity

             chi2(5)      =      9.97
             Prob > chi2  =    0.0762

    Cameron & Trivedi's decomposition of IM-test

    ---------------------------------------------------
                  Source |       chi2     df         p
    ---------------------+-----------------------------
      Heteroskedasticity |       9.97      5    0.0762
                Skewness |       3.96      2    0.1378
                Kurtosis |  -28247.96      1    1.0000
    ---------------------+-----------------------------
                   Total |  -28234.03      8    1.0000
    ---------------------------------------------------

An Alternative Test: Breusch/Pagan
• Based on similar logic
• Three changes:
1. Instead of using eᵢ² as the D.V. in the 2nd regression, use eᵢ²/σ̂², where σ̂² = Σeᵢ²/n
2. Instead of using every variable (plus squares and cross-products), you specify the variables you think are causing the heteroskedasticity
– Alternatively, use only ŷ as a "catch-all"
3. The test statistic is the RegSS from the 2nd regression divided by 2. It is distributed χ² with degrees of freedom equal to the number of independent variables in the 2nd regression.

Stata Command: hettest

    . hettest

    Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
             Ho: Constant variance
             Variables: fitted values of turnout

             chi2(1)      =     8.76
             Prob > chi2  =   0.0031

    . hettest senate

    Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
             Ho: Constant variance
             Variables: senate

             chi2(1)      =     4.59
             Prob > chi2  =   0.0321
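Both statistics are easy to compute by hand, which is a useful way to demystify what imtest, white and hettest are doing. The following is a hypothetical numpy sketch on simulated data (the variable names and the data-generating model are my own, not the slides'). With two regressors, the White auxiliary regression has 5 terms, matching the chi2(5) in the imtest output above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x1 = rng.uniform(0, 1, n)
x2 = rng.uniform(0, 1, n)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(0, 1 + 3 * x1)  # variance grows with x1

def ols(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return b, y - X @ b                 # coefficients and residuals

X = np.column_stack([np.ones(n), x1, x2])
_, e = ols(X, y)

# White's test: regress e^2 on x1, x2, their squares, and the cross-product.
# The statistic is n * R^2, chi-square with 5 df here.
Z = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
_, u = ols(Z, e**2)
r2 = 1 - u.var() / (e**2).var()
white_stat = n * r2

# Breusch-Pagan: regress e^2 / sigma-hat^2 on the suspect variables.
# The statistic is RegSS / 2, chi-square with 2 df here.
g = e**2 / (e**2).mean()
_, v = ols(X, g)
reg_ss = ((g - g.mean())**2).sum() - (v**2).sum()   # RegSS = TSS - RSS
bp_stat = reg_ss / 2

print(white_stat, bp_stat)   # both well above the 5% critical values
```

Because the simulated error variance really does grow with x1, both statistics land far beyond their critical values (about 11.07 for χ² with 5 df, 5.99 with 2 df), so the null of homoskedasticity is rejected.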
    . hettest, rhs

    Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
             Ho: Constant variance
             Variables: diplomau mdnincm senate guber

             chi2(4)      =    11.33
             Prob > chi2  =   0.0231

What are you gonna do about it?
• Two remedies
– We might need to try a different estimator: the "Generalized Least Squares" (GLS) estimator. GLS can be applied to data with heteroskedasticity and serial correlation.
– OLS is still consistent (just inefficient), but its standard errors are wrong. We could fix the standard errors and stick with OLS.

Generalized Least Squares
• When used to correct heteroskedasticity, we refer to GLS as "Weighted Least Squares" or WLS.
• Intuition: Some data points have better-quality information about the regression line than others because they have less error. We should give those observations more weight.
[Figure: scatterplot with fitted line; low-x points hug the line while high-x points scatter widely around it.]

Non-Constant Variance
• We want constant error variance for all observations: E(eᵢ²) = σ², estimated by the RMSE.
• However, with heteroskedasticity the error variance is not constant: E(eᵢ²) = σᵢ² (indexed by i).
• If we know what σᵢ² is, we can re-weight the equation to make the error variance constant.

Re-weighting the regression
• Begin with the formula: yᵢ = a + b·xᵢ + eᵢ
• Add x₀ᵢ, a variable that is always 1: yᵢ = a·x₀ᵢ + b·xᵢ + eᵢ
• Divide through by σᵢ to weight it: yᵢ/σᵢ = a·(x₀ᵢ/σᵢ) + b·(xᵢ/σᵢ) + eᵢ/σᵢ
• We can simplify notation and show it's really just a regression with transformed variables: yᵢ* = a·x₀ᵢ* + b·xᵢ* + eᵢ*
• Last, we just need to show that the transformation makes the variance of the new error term, eᵢ*, constant:

    var(eᵢ*) = E[(eᵢ*)²] = E[(eᵢ/σᵢ)²] = (1/σᵢ²)·E(eᵢ²) = σᵢ²/σᵢ² = 1

GLS vs. OLS
• In OLS, we minimize the sum of the squared errors:

    Σ eᵢ² = Σ (yᵢ − ŷᵢ)² = Σ (yᵢ − a − b·xᵢ)²

• In GLS, we minimize a weighted sum of the squared errors. Let wᵢ = 1/σᵢ²:

    Σ (1/σᵢ²)·(yᵢ − a·x₀ᵢ − b·xᵢ)² = Σ wᵢ·(yᵢ − a·x₀ᵢ − b·xᵢ)²

• Set the partial derivatives with respect to a and b to 0 and solve the resulting two equations for a and b.
GLS vs. OLS
• Minimize errors (OLS): Σ (yᵢ − a − b·xᵢ)²
• Minimize weighted errors (GLS): Σ wᵢ·(yᵢ − a·x₀ᵢ − b·xᵢ)²
• GLS (WLS) is just doing OLS with transformed variables.
• In the same way that we "transformed" non-linear data to fit the assumptions of OLS, we can "transform" the data with weights to help heteroskedastic data meet the assumptions of OLS.

GLS vs. OLS
• In matrix form,
– OLS: b = (X'X)⁻¹X'y
– GLS: b* = (X'Ω⁻¹X)⁻¹X'Ω⁻¹y
• The weights are included in a matrix, Ω⁻¹:

    Ω = diag(σ₁², σ₂², …, σₙ²),    Ω⁻¹ = diag(1/σ₁², 1/σ₂², …, 1/σₙ²)

Problem:
• We rarely know exactly how to weight our data
• Solutions:
– Plan A: If the heteroskedasticity comes from one specific variable, we can use that variable as the "weight"
– Plan B: Alternatively, we could run OLS and use the residuals to estimate the weights (observations with large OLS residuals get little weight in the WLS estimates)

Plan A: A Single, Known Villain
• Example: household income
• Households that earn little must spend it all on necessities. When income is low, there is little variance in spending.
• Households that earn a great deal can either spend it all or buy just the essentials and save the rest. There is more error variance as income increases.
• So weight by the offending variable, x₁:

    yᵢ = a + b₁x₁ + b₂x₂ + eᵢ   becomes   yᵢ/x₁ = a·(1/x₁) + b₁ + b₂·(x₂/x₁) + eᵢ/x₁

• Note the changes in interpretation: in the transformed regression the intercept is b₁ and the coefficient on 1/x₁ is the original intercept a.

Plan B: Estimate the weights
• Run OLS and get an estimate of the residuals
• Regress those residuals (squared) on the set of independent variables and get the predicted values
• Use those predicted values as the weights
• Because this is GLS that is "doable", it is called "Feasible GLS" or FGLS
• FGLS is asymptotically equal to GLS as the sample size goes to infinity

I don't want to do GLS
• I don't blame you
• It is usually best if we know something about the nature of the heteroskedasticity
• OLS was unbiased, so why can't we just use that?
– It is inefficient (but this is only problematic with very severe heteroskedasticity)
– It has incorrect standard errors (the formula changes)
• What if we could just fix the standard errors?
White Standard Errors
• We can use OLS and just fix the standard errors. There are a number of ways to do this, but the classic is "White Standard Errors".
• This correction goes by a number of names:
– White Std. Errs.
– Huber-White Std. Errs.
– Robust Std. Errs.
– Heteroskedasticity-Consistent Std. Errs.

The big idea…
• In OLS, standard errors come from the variance-covariance matrix.
– A Std. Err. is the Std. Dev. of a sampling distribution.
– Variance is the square of the standard deviation (the Std. Dev. is the square root of the variance).
• The variance-covariance matrix for OLS is given by: σₑ²(X'X)⁻¹

    . vce

    Variances |  diplomau   mdnincm     _cons
    -------------+---------------------------
     diplomau |    254467
      mdnincm |  -178.899   .187128
        _cons |   1.4e+06  -3172.43   9.3e+07

With Heteroskedasticity
• The variance-covariance matrix for OLS is given by: σₑ²(X'X)⁻¹
• The variance-covariance matrix under heteroskedasticity is given by the "sandwich": (X'X)⁻¹(X'ΩX)(X'X)⁻¹
• Problem: We still don't know Ω (the σᵢ²)
• Solution: We can estimate X'ΩX quite well using the OLS residuals:

    X'ΩX ≈ (n/(n−1)) · Σᵢ eᵢ²·xᵢxᵢ'

where xᵢ' is the row of X for observation i.

In Stata…
• Specify the "robust" option after the regression

    . regress turnout diplomau mdnincm, robust

    Regression with robust standard errors       Number of obs =     426
                                                 F(  2,   423) =   33.93
                                                 Prob > F      =  0.0000
                                                 R-squared     =  0.1291
                                                 Root MSE      =   47766

    ------------------------------------------------------------------------
             |               Robust
     turnout |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
    ---------+--------------------------------------------------------------
    diplomau |   1101.359   548.7361     2.01   0.045    22.77008    2179.948
     mdnincm |   1.111589   .4638605     2.40   0.017      .19983    2.023347
       _cons |   154154.4   9903.283    15.57   0.000    134688.6    173620.1
    ------------------------------------------------------------------------

Drawbacks
• OLS is still inefficient (though this is not much of a problem unless the heteroskedasticity is really bad)
• It requires larger sample sizes to give good estimates of the Std. Errs.
– (which means t-tests are only OK asymptotically)
• If there is no heteroskedasticity and you use robust SEs, you do slightly worse than with the regular Std. Errs.

Moral of the Story
• If you know something about the nature of the heteroskedasticity, WLS is good: it is BLUE
• If you don't, use OLS with robust Std. Errs.
• Now, group heteroskedasticity…

Group Heteroskedasticity
• There is no GLS/WLS option
• There is a robust Std. Err. option
– It essentially stacks the "clusters" into their own kind of mini-White correction
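The White sandwich and its clustered cousin can both be computed directly from the formulas. This is a hypothetical numpy sketch (the simulated data are my own, and I use the simple n/(n−1) scaling from the White slide; Stata's finite-sample factors differ slightly). The cluster version replaces the per-observation eᵢ²xᵢxᵢ' terms with one block per cluster, (X_g'e_g)(X_g'e_g)', which is the "mini-White correction" per cluster.

```python
import numpy as np

rng = np.random.default_rng(5)
G, per = 40, 50                      # 40 clusters of 50 observations
n = G * per
cluster = np.repeat(np.arange(G), per)
x = rng.uniform(0, 1, n)
shared = rng.normal(0, 1, G)[cluster]   # common within-cluster shock
y = 1 + 2 * x + shared + rng.normal(0, 0.5 + 3 * (x - 0.5)**2)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

# Classical OLS variance: s^2 (X'X)^{-1}
s2 = (e @ e) / (n - X.shape[1])
vcov_ols = s2 * XtX_inv

# White sandwich: (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1}
meat_white = (X * (e**2)[:, None]).T @ X
vcov_white = n / (n - 1) * XtX_inv @ meat_white @ XtX_inv

# Cluster version: sum over clusters g of (X_g' e_g)(X_g' e_g)'
meat_cl = np.zeros((2, 2))
for g in range(G):
    s = X[cluster == g].T @ e[cluster == g]
    meat_cl += np.outer(s, s)
vcov_cluster = XtX_inv @ meat_cl @ XtX_inv

print(np.sqrt(np.diag(vcov_ols)))       # classical std. errs.
print(np.sqrt(np.diag(vcov_white)))     # White (robust) std. errs.
print(np.sqrt(np.diag(vcov_cluster)))   # cluster-robust std. errs.
```

Because the simulated data contain a shared shock within each cluster, the cluster-robust standard errors come out noticeably larger than both the classical and the plain White versions, which only account for observation-level heteroskedasticity.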