Posted on: 4/29/2012 (Public Domain)
Multiple Regression II (KNN Ch. 7)

Extra Sum of Squares (ESS)

The extra sum of squares is the marginal reduction in SSE when one or several predictor variables are added to the regression model, given that the other variables are already in the model. In what other, equivalent manner can you state the above? The word "extra" is used since we would like to know the marginal (or extra) contribution of a variable, or a set of variables, when added as explanatory variables to the regression model.

Decomposition of SSR into ESS

A pictorial representation is also possible; see Fig. 7.1, p. 261 of KNN, which partitions SSTO into SSR(X2), SSR(X1 | X2) and SSE(X1, X2), with SSR(X1, X2) = SSR(X2) + SSR(X1 | X2).

For two or three explanatory variables the formulae are quite easy. With two variables we have

SSR(X2 | X1) = SSE(X1) - SSE(X1, X2) = SSR(X1, X2) - SSR(X1)

and with three variables,

SSR(X3 | X1, X2) = SSE(X1, X2) - SSE(X1, X2, X3) = SSR(X1, X2, X3) - SSR(X1, X2)

In the regression of Y adjusted for X1 and X2 (as the response variable) on X3 adjusted for X1 and X2 (as the predictor): SSR(X3 | X1, X2) plays the role of the SSR, SSE(X1, X2, X3) plays the role of the SSE, and SSE(X1, X2) plays the role of the SSTO.

Note that with three variables we may also have

SSR(X2, X3 | X1) = SSE(X1) - SSE(X1, X2, X3)

To test the hypothesis H0: β3 = 0 vs. Ha: β3 ≠ 0, the test statistic is

F* = [SSR(X3 | X1, X2) / 1] / [SSE(X1, X2, X3) / (n - 4)]

To test, say, H0: β2 = β3 = 0 vs. Ha: β2, β3 not both 0, the test statistic is

F* = [SSR(X2, X3 | X1) / 2] / [SSE(X1, X2, X3) / (n - 4)]

In general, however, we can write

F* = [(R²F - R²R) / (dfR - dfF)] / [(1 - R²F) / dfF]

where the subscripts F and R denote the full and reduced models. This form is very convenient to use, since we do not have to keep track of the individual sums of squares; it also minimizes any errors due to subtraction when calculating the SSRs.
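These identities are easy to check numerically. Below is a minimal sketch on synthetic data (the variables and coefficients are illustrative, not the textbook's); it verifies the two-variable ESS identity and checks that the R² form of F* agrees with the sum-of-squares form for a test of H0: β2 = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X1 = rng.normal(size=n)
X2 = 0.6 * X1 + rng.normal(size=n)                  # correlated predictors
Y = 2 + 1.5 * X1 + 0.8 * X2 + rng.normal(size=n)

def sse(y, *xs):
    """SSE from an OLS fit of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones_like(y)] + list(xs))
    r = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return float(r @ r)

ssto = float(np.sum((Y - Y.mean()) ** 2))
ssr = lambda *xs: ssto - sse(Y, *xs)                # SSR = SSTO - SSE

# The identity SSR(X2|X1) = SSE(X1) - SSE(X1,X2) = SSR(X1,X2) - SSR(X1):
ess_a = sse(Y, X1) - sse(Y, X1, X2)
ess_b = ssr(X1, X2) - ssr(X1)
print(abs(ess_a - ess_b) < 1e-6)                    # True

# F* for H0: beta2 = 0, in sum-of-squares form and in R-squared form:
f_ss = (ess_a / 1) / (sse(Y, X1, X2) / (n - 3))
r2_full, r2_red = ssr(X1, X2) / ssto, ssr(X1) / ssto
f_r2 = ((r2_full - r2_red) / 1) / ((1 - r2_full) / (n - 3))
print(abs(f_ss - f_r2) < 1e-6)                      # True
```

Because SSR = SSTO - SSE, either bookkeeping gives the same extra sum of squares; the R² form simply divides every term by SSTO.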
On the next slide we see the ANOVA table with the decomposition of SSR for three variables.

The ANOVA Table

Source of variation   Sum of squares     df    Mean squares
Regression            SSR(X1, X2, X3)    3     MSR(X1, X2, X3)
  X1                  SSR(X1)            1     MSR(X1)
  X2 | X1             SSR(X2 | X1)       1     MSR(X2 | X1)
  X3 | X1, X2         SSR(X3 | X1, X2)   1     MSR(X3 | X1, X2)
Error                 SSE(X1, X2, X3)    n-4   MSE(X1, X2, X3)
Total                 SSTO               n-1

Another ANOVA Table (what's the difference?)

Source of variation   Sum of squares     df    Mean squares
Regression            SSR(X1, X2, X3)    3     MSR(X1, X2, X3)
  X3                  SSR(X3)            1     MSR(X3)
  X2 | X3             SSR(X2 | X3)       1     MSR(X2 | X3)
  X1 | X2, X3         SSR(X1 | X2, X3)   1     MSR(X1 | X2, X3)
Error                 SSE(X1, X2, X3)    n-4   MSE(X1, X2, X3)
Total                 SSTO               n-1

An Example

The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor   Coef       StDev     T       P
Constant    236.1      254.5     0.93    0.355
X1          -0.20286   0.05894   -3.44   0.001
X2          9.090      1.718     5.29    0.000
X3          -0.3303    0.2229    -1.48   0.141

S = 1802   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source       DF    SS            MS           F         P
Regression   3     9833046236    3277682079   1009.04   0.000
Error        137   445017478     3248303
Total        140   10278063714

Source   DF   Seq SS
X1       1    80601012
X2       1    9745311037
X3       1    7134188

Source   DF   Seq SS
X3       1    9733071257
X2       1    61498868
X1       1    38476111

Test for a single βk = 0 in a general model

Full model with all variables:

Yi = β0 + β1 Xi1 + ... + βk-1 Xi,k-1 + βk Xik + βk+1 Xi,k+1 + ... + βp-1 Xi,p-1 + εi

Compute SSR(X1, ..., Xk-1, Xk, Xk+1, ..., Xp-1).

Reduced model without Xk:

Yi = β0 + β1 Xi1 + ... + βk-1 Xi,k-1 + βk+1 Xi,k+1 + ...
+ βp-1 Xi,p-1 + εi

Compute

SSR(Xk | X1, ..., Xk-1, Xk+1, ..., Xp-1) = SSR(X1, ..., Xk-1, Xk, Xk+1, ..., Xp-1) - SSR(X1, ..., Xk-1, Xk+1, ..., Xp-1)

The test statistic is

F* = [SSR(Xk | X1, ..., Xk-1, Xk+1, ..., Xp-1) / 1] / [SSE(X1, ..., Xk-1, Xk, Xk+1, ..., Xp-1) / (n - p)]

An Example

The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor   Coef       StDev     T       P
Constant    236.1      254.5     0.93    0.355
X1          -0.20286   0.05894   -3.44   0.001
X2          9.090      1.718     5.29    0.000
X3          -0.3303    0.2229    -1.48   0.141

S = 1802   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source       DF    SS            MS           F         P
Regression   3     9833046236    3277682079   1009.04   0.000
Error        137   445017478     3248303
Total        140   10278063714

The regression equation is
Y = 881 - 0.0918 X1 + 0.846 X3

Predictor   Coef       StDev     T       P
Constant    881.4      244.2     3.61    0.000
X1          -0.09185   0.06023   -1.52   0.130
X3          0.84614    0.01696   49.88   0.000

S = 1971   R-Sq = 94.8%   R-Sq(adj) = 94.7%

Analysis of Variance
Source       DF    SS            MS           F         P
Regression   2     9742103306    4871051653   1254.21   0.000
Error        138   535960409     3883771
Total        140   10278063714

Test for some βk = 0 in a general model (see (7.26), p. 267 of KNN)

Full model with all variables:

Yi = β0 + β1 Xi1 + ... + βq-1 Xi,q-1 + βq Xiq + βq+1 Xi,q+1 + ... + βp-1 Xi,p-1 + εi

Compute SSR(X1, ..., Xq-1, Xq, ..., Xp-1).

Reduced model without the "vector" (Xq, ..., Xp-1):

Yi = β0 + β1 Xi1 + ... + βq-1 Xi,q-1 + εi

Compute

SSR(Xq, ..., Xp-1 | X1, ..., Xq-1) = SSR(X1, ..., Xq-1, Xq, ..., Xp-1) - SSR(X1, ..., Xq-1)

or, equivalently,

SSR(Xq, ..., Xp-1 | X1, ..., Xq-1) = SSR(Xq | X1, ..., Xq-1) + SSR(Xq+1 | X1, ..., Xq) + ...
+ SSR(Xp-1 | X1, ..., Xp-2)

The test statistic is

F* = [SSR(Xq, ..., Xp-1 | X1, ..., Xq-1) / (p - q)] / [SSE(X1, ..., Xp-1) / (n - p)]

or, in terms of coefficients of multiple determination,

F* = [(R²Y·1...p-1 - R²Y·1...q-1) / (p - q)] / [(1 - R²Y·1...p-1) / (n - p)]

An Example

The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor   Coef       StDev     T       P
Constant    236.1      254.5     0.93    0.355
X1          -0.20286   0.05894   -3.44   0.001
X2          9.090      1.718     5.29    0.000
X3          -0.3303    0.2229    -1.48   0.141

S = 1802   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source       DF    SS            MS           F         P
Regression   3     9833046236    3277682079   1009.04   0.000
Error        137   445017478     3248303
Total        140   10278063714

The regression equation is
Y = 14 + 6.50 X2

Predictor   Coef      StDev     T       P
Constant    14.4      194.9     0.07    0.941
X2          6.4957    0.1225    53.05   0.000

S = 1866   R-Sq = 95.3%   R-Sq(adj) = 95.3%

Analysis of Variance
Source           DF    SS            MS           F         P
Regression       1     9794265737    9794265737   2813.99   0.000
Residual Error   139   483797978     3480561
Total            140   10278063714

Test for βk = βq in a general model

Full model with all variables:

Yi = β0 + β1 Xi1 + ... + βk Xik + ... + βq Xiq + ... + βp-1 Xi,p-1 + εi

Compute SSR(X1, ..., Xk, ..., Xq, ..., Xp-1).

Reduced model with a common coefficient on Xk + Xq:

Yi = β0 + β1 Xi1 + ... + βk (Xik + Xiq) + ... + βp-1 Xi,p-1 + εi

Compute SSR(X1, ..., Xk + Xq, ..., Xp-1). The test statistic (the numerator has 1 df) is

F* = [SSR(X1, ..., Xk, ..., Xq, ..., Xp-1) - SSR(X1, ..., Xk + Xq, ..., Xp-1)] / [SSE(X1, ..., Xk, ..., Xq, ..., Xp-1) / (n - p)]

Also, when testing hypotheses such as these, including the hypothesis βk = βq above, one can use the General Linear Test approach outlined in KNN.
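Both kinds of test can be reproduced arithmetically from the Minitab output printed in the examples above; the only inputs are the printed sums of squares.

```python
# Numbers copied from the Minitab tables above; no new data are assumed.
sse_full = 445_017_478      # SSE(X1, X2, X3), 137 df
mse_full = sse_full / 137

# Single coefficient (H0: beta2 = 0): reduced model is Y on X1, X3.
sse_x1x3 = 535_960_409      # SSE(X1, X3), 138 df
f_single = ((sse_x1x3 - sse_full) / 1) / mse_full
print(round(f_single, 2))   # ~28.0, matching t^2 = 5.29^2 = 27.98

# Several coefficients (H0: beta1 = beta3 = 0): reduced model is Y on X2 alone.
sse_x2 = 483_797_978        # SSE(X2), 139 df
f_joint = ((sse_x2 - sse_full) / 2) / mse_full
print(round(f_joint, 2))    # 5.97
```

The single-coefficient F* matches the square of the printed t statistic for X2, as it must.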
An Example

The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor   Coef       StDev     T       P
Constant    236.1      254.5     0.93    0.355
X1          -0.20286   0.05894   -3.44   0.001
X2          9.090      1.718     5.29    0.000
X3          -0.3303    0.2229    -1.48   0.141

S = 1802   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source       DF    SS            MS           F         P
Regression   3     9833046236    3277682079   1009.04   0.000
Error        137   445017478     3248303
Total        140   10278063714

The regression equation is
Y = 324 - 0.200 (X1+X3) + 8.09 X2

Predictor   Coef       StDev     T       P
Constant    324.2      208.7     1.55    0.123
(X1+X3)     -0.19971   0.05858   -3.41   0.001
X2          8.0891     0.4820    16.78   0.000

S = 1798   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source           DF    SS            MS           F         P
Regression       2     9831847860    4915923930   1520.33   0.000
Residual Error   138   446215854     3233448
Total            140   10278063714

Coefficients of Partial Determination

Recall the definition of the coefficient of (multiple) determination: R² is the proportionate reduction in Y variation when the set of X variables is considered in the model. Now consider a coefficient of partial determination: the R² for a predictor, given the presence of a set of predictors in the model, measures the marginal contribution of that variable given that the others are already in the model. A graphical representation of the strength of the relationship between Y and X1, adjusted for X2, is provided by partial regression plots (see HW6).

For a model with two independent variables (interpret these):

r²Y1·2 = SSR(X1 | X2) / SSE(X2),   r²Y2·1 = SSR(X2 | X1) / SSE(X1)

Generalization is easy; for example,

r²Y1·23 = SSR(X1 | X2, X3) / SSE(X2, X3)
r²Y3·124 = SSR(X3 | X1, X2, X4) / SSE(X1, X2, X4)

etc. Is there an alternate interpretation of the above partial coefficients? What is, say, r²12·Y3?
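On the alternate interpretation asked about above: a standard answer (not spelled out on the slide, so treat it as an aside) is that r²Y1·2 equals the squared partial correlation, i.e., the squared correlation between the residuals of Y on X2 and the residuals of X1 on X2. A quick check on illustrative data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
X1 = rng.normal(size=n)
X2 = 0.5 * X1 + rng.normal(size=n)
Y = 1 + X1 + 2 * X2 + rng.normal(size=n)

def resid(y, *xs):
    """Residuals from an OLS fit of y on an intercept plus the predictors."""
    X = np.column_stack([np.ones_like(y)] + list(xs))
    return y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Route 1: r^2_{Y1.2} = SSR(X1 | X2) / SSE(X2)
e2 = resid(Y, X2)
e12 = resid(Y, X1, X2)
r2_route1 = (float(e2 @ e2) - float(e12 @ e12)) / float(e2 @ e2)

# Route 2: squared correlation between residuals of Y|X2 and X1|X2
r2_route2 = float(np.corrcoef(resid(Y, X2), resid(X1, X2))[0, 1]) ** 2
print(abs(r2_route1 - r2_route2) < 1e-6)           # True
```

The equivalence is a consequence of the Frisch-Waugh-Lovell result: adjusting both Y and X1 for X2 and then regressing one residual on the other reproduces the multiple-regression fit.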
An Example

The regression equation is
Y = - 4.9 + 1.12 X1

Predictor   Coef     StDev    T       P
Constant    -4.92    51.52    -0.10   0.924
X1          1.1209   0.9349   1.20    0.233

S = 87.46   R-Sq = 1.0%   R-Sq(adj) = 0.3%

Analysis of Variance
Source           DF    SS        MS      F      P
Regression       1     10995     10995   1.44   0.233
Residual Error   139   1063300   7650
Total            140   1074295

The regression equation is
Y = - 6.17 + 0.144 X2

Predictor   Coef       StDev      T       P
Constant    -6.167     2.075      -2.97   0.003
X2          0.144481   0.002842   50.83   0.000

S = 19.86   R-Sq = 94.9%   R-Sq(adj) = 94.9%

Analysis of Variance
Source           DF    SS        MS        F         P
Regression       1     1019453   1019453   2583.84   0.000
Residual Error   139   54842     395
Total            140   1074295

Another Example

The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor   Coef       StDev     T       P
Constant    236.1      254.5     0.93    0.355
X1          -0.20286   0.05894   -3.44   0.001
X2          9.090      1.718     5.29    0.000
X3          -0.3303    0.2229    -1.48   0.141

S = 1802   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source           DF    SS            MS           F         P
Regression       3     9833046236    3277682079   1009.04   0.000
Residual Error   137   445017478     3248303
Total            140   10278063714

Source   DF   Seq SS
X1       1    80601012
X2       1    9745311037
X3       1    7134188

The regression equation is
Y = 408 - 0.173 X1 + 6.55 X2

Predictor   Coef       StDev     T       P
Constant    407.8      227.6     1.79    0.075
X1          -0.17253   0.05551   -3.11   0.002
X2          6.5506     0.1201    54.54   0.000

S = 1810   R-Sq = 95.6%   R-Sq(adj) = 95.5%

Analysis of Variance
Source           DF    SS            MS           F         P
Regression       2     9825912049    4912956024   1499.47   0.000
Residual Error   138   452151666     3276461
Total            140   10278063714

The Standardized Multiple Regression Model

Why is it necessary?
- Round-off errors in the normal-equations calculations, especially when inverting a large X'X matrix. (What is the size of this inverse for, say, Y = b0 + b1X1 + ... + b7X7?)
- Lack of comparability of coefficients in regression models, owing to differences in the units involved.
- It is especially important in the presence of multicollinearity, where the determinant of X'X is close to zero.

OK, so we have a problem. How do we take care of it?
The Correlation Transformation:
- Centering: take the difference between each observation and the variable's mean, AND...
- Scaling: divide the centered observation by the standard deviation of the variable.

You must have noticed that this is nothing but regular "standardization". What's the twist? See the next slide.

Standardization:

(Yi - Ȳ) / sY,   (Xik - X̄k) / sk,   k = 1, ..., p-1

Correlation transformation:

Yi' = [1/√(n-1)] (Yi - Ȳ) / sY
Xik' = [1/√(n-1)] (Xik - X̄k) / sk,   k = 1, ..., p-1

Once we have performed the correlation transformation, all that remains is to obtain the new regression parameters. The standardized regression model is

Yi' = β1' Xi1' + ... + βp-1' Xi,p-1' + εi'

where the original parameters can be recovered from the transformations

βk = (sY / sk) βk',   k = 1, ..., p-1,   and   β0 = Ȳ - β1 X̄1 - ... - βp-1 X̄p-1

In matrix notation we have some interesting relationships:

X'X = rXX, the (p-1)x(p-1) correlation matrix of the (untransformed) X variables
X'Y = rYX, the (p-1)x1 vector of correlations between the (untransformed) Y and X variables

WHY? Is this surprising?
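It should not be surprising: each correlation-transformed variable is centered and scaled so that its vector has unit length, which makes the cross-products X'X and X'Y exactly sample correlations. A sketch on illustrative data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
X = rng.normal(size=(n, 2))
Y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)

def corr_transform(v):
    """Center, scale by the sample standard deviation, divide by sqrt(n-1)."""
    return (v - v.mean(axis=0)) / (v.std(axis=0, ddof=1) * np.sqrt(n - 1))

Xs, Ys = corr_transform(X), corr_transform(Y)
print(np.allclose(Xs.T @ Xs, np.corrcoef(X, rowvar=False)))            # True
print(np.allclose(Xs.T @ Ys,
                  [np.corrcoef(X[:, k], Y)[0, 1] for k in range(2)]))  # True
```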
An Example

Part of the original (unstandardized) data set:

X1     X2     X3      Y
1384   9387   72100   69678
4069   7031   52737   39699
3719   7017   54542   43292
3553   4794   33216   33731
3916   4370   32906   24167
2480   3182   26573   16751
2815   3033   25663   16941

The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor   Coef       StDev     T       P
Constant    236.1      254.5     0.93    0.355
X1          -0.20286   0.05894   -3.44   0.001
X2          9.090      1.718     5.29    0.000
X3          -0.3303    0.2229    -1.48   0.141

S = 1802   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source           DF    SS            MS           F         P
Regression       3     9833046236    3277682079   1009.04   0.000
Residual Error   137   445017478     3248303
Total            140   10278063714

An Example (continued)

Standardized ... and then correlation transformed:

Standardized (X1, X2, X3, Y)                Correlation transformed (X1', X2', X3', Y')
-0.42915   6.55841   7.41621   6.63363      -0.036269   0.554287   0.626784   0.560644
 0.53462   4.72871   3.91736   4.67596       0.045183   0.399649   0.331077   0.395190
 0.40899   4.71784   4.33670   4.85845       0.034566   0.398730   0.366518   0.410614
 0.34940   2.99143   3.22083   2.70231       0.029530   0.252822   0.272210   0.228387
 0.47970   2.66214   2.10462   2.67097       0.040542   0.224992   0.177873   0.225738
-0.03574   1.73953   1.23910   2.03068      -0.003021   0.147017   0.104723   0.171624
 0.08450   1.62381   1.26127   1.93867       0.007142   0.137237   0.106597   0.163848
-0.48873   1.35588   1.86770   1.52021      -0.041305   0.114593   0.157849   0.128481

An Example (continued)

The regression equation is
Y' = - 0.00000 - 0.0660 X1' + 1.37 X2' - 0.381 X3'

Predictor   Coef        StDev      T       P
Constant    -0.000000   0.001497   -0.00   1.000
X1'         -0.06596    0.01916    -3.44   0.001
X2'         1.3661      0.2582     5.29    0.000
X3'         -0.3813     0.2573     -1.48   0.141

S = 0.01778   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source           DF    SS        MS        F         P
Regression       3     0.95670   0.31890   1009.04   0.000
Residual Error   137   0.04330   0.00032
Total            140   1.00000

Compared to the regression model obtained from the untransformed variables, what can we say about the two models? Is there a difference in predictive power, or is there a difference in "ease of interpretation"? Why is b0 = 0? Just by chance?
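The intercept is not zero by chance: every correlation-transformed variable has mean zero, so the fitted intercept b0 = Ȳ' - Σ bk' X̄k' is forced to zero. A sketch on illustrative data, which also checks the back-transformation bk = (sY / sk) bk':

```python
import numpy as np

rng = np.random.default_rng(7)
n = 30
X = rng.normal(size=(n, 2)) * [5.0, 50.0] + [10.0, 100.0]   # different units
Y = 3 + X @ np.array([1.5, -0.2]) + rng.normal(size=n)

def corr_transform(v):
    return (v - v.mean(axis=0)) / (v.std(axis=0, ddof=1) * np.sqrt(n - 1))

Xs, Ys = corr_transform(X), corr_transform(Y)

# Fit the transformed model WITH an intercept: the intercept comes out 0,
# because every transformed variable has mean zero.
b_std = np.linalg.lstsq(np.column_stack([np.ones(n), Xs]), Ys, rcond=None)[0]
print(abs(float(b_std[0])) < 1e-10)                # True

# Back-transform: b_k = (s_Y / s_k) * b_k' recovers the original coefficients.
b_back = b_std[1:] * Y.std(ddof=1) / X.std(axis=0, ddof=1)
b_orig = np.linalg.lstsq(np.column_stack([np.ones(n), X]), Y, rcond=None)[0]
print(np.allclose(b_back, b_orig[1:]))             # True
```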
Multicollinearity

One of the assumptions of the OLS model is that the predictor variables are uncorrelated. When this assumption is not satisfied, multicollinearity is said to exist. (Think about Venn diagrams for this.) Note that multicollinearity is strictly a sample phenomenon. We may try to avoid it by doing controlled experiments, but in most social-science research this is very difficult to do.

Let us first consider the case of uncorrelated predictor variables, i.e., no multicollinearity:
- This usually occurs in controlled experiments.
- In this case the R² between each pair of predictor variables is zero.
- The ESS for each variable is the same as when the response is regressed on that variable alone.

An Example

X1   X2   Y
1    2    1
2    2    5
3    3    7
4    3    8
5    3    4
6    3    9
7    2    5
8    2    2

The regression equation is
Y = - 4.73 + 0.107 X1 + 3.75 X2

Predictor   Coef     StDev    T       P
Constant    -4.732   4.428    -1.07   0.334
X1          0.1071   0.3537   0.30    0.774
X2          3.750    1.621    2.31    0.069

S = 2.292   R-Sq = 52.1%   R-Sq(adj) = 33.0%

Analysis of Variance
Source           DF   SS       MS       F      P
Regression       2    28.607   14.304   2.72   0.159
Residual Error   5    26.268   5.254
Total            7    54.875

Source   DF   Seq SS        Source   DF   Seq SS
X1       1    0.482         X2       1    28.125
X2       1    28.125        X1       1    0.482

An Example (continued)

The regression equation is
Y = 4.64 + 0.107 X1

Predictor   Coef     StDev    T      P
Constant    4.643    2.346    1.98   0.095
X1          0.1071   0.4646   0.23   0.825

S = 3.011   R-Sq = 0.9%   R-Sq(adj) = 0.0%

Analysis of Variance
Source           DF   SS       MS      F      P
Regression       1    0.482    0.482   0.05   0.825
Residual Error   6    54.393   9.065
Total            7    54.875

The regression equation is
Y = - 4.25 + 3.75 X2

Predictor   Coef     StDev    T       P
Constant    -4.250   3.807    -1.12   0.307
X2          3.750    1.493    2.51    0.046

S = 2.111   R-Sq = 51.3%   R-Sq(adj) = 43.1%

Analysis of Variance
Source           DF   SS       MS       F      P
Regression       1    28.125   28.125   6.31   0.046
Residual Error   6    26.750   4.458
Total            7    54.875

(From the previous slide:
Source   DF   Seq SS        Source   DF   Seq SS
X1       1    0.482         X2       1    28.125
X2       1    28.125        X1       1    0.482)

Multicollinearity (Effects of)

The regression coefficient of any independent variable cannot be interpreted as
usual. One has to take into account which other correlated variables are included in the model. The predictive ability of the overall model is usually unaffected, but the ESS are usually reduced to a great extent, and the variability of the OLS regression parameter estimates is inflated. An intuitive reason for this, based on a standardized model with p - 1 = 2:

σ²{b} = σ² (X'X)^-1 = [σ² / (1 - r12²)] [  1     -r12 ]
                                        [ -r12    1   ]

Note that the standardized regression coefficients have equal standard deviations. Will this be the case even when p - 1 = 3, or is this just a special-case scenario?

Multicollinearity (Effects of)

- High R², but few significant t-ratios. (By now, you should be able to guess the reason for this.)
- Wider individual confidence intervals for the regression parameters. (This is obvious based on what we discussed on the earlier slide.)

[Figure: joint confidence region for the regression parameters.] What would you conclude based on the above picture?

Multicollinearity (How to detect it?)

- High R² (> 0.8), but few significant t-ratios. Caveat: there is a particular situation when the above is caused without any multicollinearity; thankfully, this situation never arises in practice.
- High pairwise correlation (> 0.8) between independent variables. Caveat: this is a sufficient, but not necessary, condition. For example, consider the case where rX1X2 = 0.5, rX1X3 = 0.5 and rX2X3 = -0.5. We might conclude there is no multicollinearity; however, we find that R² = 1 when we regress X1 on X2 and X3 together. This means that X1 is a perfect linear combination of the two other independent variables. In fact, this R² is given by

R²X1·X2X3 = (r²X1X2 + r²X1X3 - 2 rX1X2 rX1X3 rX2X3) / (1 - r²X2X3)

and one can readily verify that these numbers satisfy the equation. Due to this caveat, always examine the partial correlation coefficients.
- Run auxiliary regressions, i.e., regress each of the independent variables on the other independent variables taken together, and conclude whether it is correlated with the others based on the R².
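Two of the claims above can be checked numerically: with the small uncorrelated data set from the earlier example, each predictor's extra sum of squares is the same whichever order it enters, and the caveat's correlation pattern really does give R² = 1. A sketch, using only the slides' own numbers:

```python
import numpy as np

def ssr(y, *xs):
    """Regression sum of squares from an OLS fit with an intercept."""
    X = np.column_stack([np.ones_like(y)] + list(xs))
    fit = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return float(np.sum((fit - y.mean()) ** 2))

# (i) The slides' small data set: X1 and X2 are exactly uncorrelated, so each
# predictor's extra sum of squares does not depend on the order of entry.
X1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
X2 = np.array([2, 2, 3, 3, 3, 3, 2, 2], dtype=float)
Y  = np.array([1, 5, 7, 8, 4, 9, 5, 2], dtype=float)
print(round(ssr(Y, X1), 3), round(ssr(Y, X1, X2) - ssr(Y, X2), 3))  # 0.482 0.482
print(round(ssr(Y, X2), 3), round(ssr(Y, X1, X2) - ssr(Y, X1), 3))  # 28.125 28.125

# (ii) The caveat: pairwise correlations of 0.5, 0.5 and -0.5 (all modest)
# can coexist with X1 being an exact linear combination of X2 and X3.
r12, r13, r23 = 0.5, 0.5, -0.5
r2 = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)
print(r2)                                           # 1.0
```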
The test statistic for the auxiliary regression of Xi on the remaining predictors is

F(Xi) = [R²i / (p - 2)] / [(1 - R²i) / (n - p + 1)]

where R²i denotes R²Xi·X1,...,Xi-1,Xi+1,...,Xp-1.

The Condition Index (CI):

CI = sqrt(maximum eigenvalue / minimum eigenvalue)

If 10 ≤ CI ≤ 30, there is moderate to strong multicollinearity; CI > 30 means severe multicollinearity.

Multicollinearity (What is the remedy?)

- Rely on joint confidence intervals rather than individual ones. [Figure: joint confidence region for the regression parameters.]
- A priori information about a relationship between some independent variables? Then include it! For example, suppose b1 = b2 is known. Then use this in the regression model, which becomes Y = b0 + b2 X, where X = X2 + X1.
- Data pooling (usually done by combining cross-sectional and time-series data; time-series data are notorious for multicollinearity).
- Delete a variable that is causing problems. Caveat: beware of specification bias, which arises when a model is incorrectly specified. For example, in order to explain consumption expenditure we may include only income and drop wealth, since wealth is highly correlated with income; however, economic theory may postulate that both variables be used.
- First-difference transformation of variables from time-series data: the regression is run on differences between successive values of the variables, i.e., (Xi,1 - Xi+1,1), (Xi,2 - Xi+1,2), etc., rather than on the original variables. The logic is that even if X1 and X2 are correlated, there is no reason for their first differences to be correlated too. Caveat: beware of the autocorrelation that usually arises from this procedure; also, we lose one degree of freedom due to the differencing.
- Correlation transformation.
- Getting a new sample (why?) and/or increasing the sample size (why?).
- Factor analysis, principal components analysis, ridge regression.

An Example
[Scatter plot: Population (X1) against Income (X2); the points fall nearly on a straight line, rX1X2 = .997]

An Example (continued)

The regression equation is
Y = - 0.032 + 6.99 X1 - 0.064 X2

Predictor   Coef      StDev    T       P
Constant    -0.0322   0.2516   -0.13   0.898
X1          6.986     1.667    4.19    0.000
X2          -0.0640   0.2171   -0.29   0.769

S = 1.872   R-Sq = 95.3%   R-Sq(adj) = 95.2%

Analysis of Variance
Source           DF    SS        MS       F         P
Regression       2     9794.6    4897.3   1397.80   0.000
Residual Error   138   483.5     3.5
Total            140   10278.1

Source   DF   Seq SS
X1       1    9794.3
X2       1    0.3

Source   DF   Seq SS
X2       1    9733.1
X1       1    61.5

Predicted Values
Fit      StDev Fit   95.0% CI           95.0% PI
12.020   3.351       (5.394, 18.646)    (4.431, 19.609)

High R², a low t-value for b2, and a low ESS for X2 (i.e., SSR(X2|X1)): clearly, X2 contributes little to the model. Really? Look at SSR(X2): it is humongous! This is a clear case of multicollinearity. Of course, we knew that rX1X2 = 0.997; this alone should have made us suspect that something was amiss.

Multicollinearity (Specification Bias)

Types of specification errors:
- Omitting a relevant variable
- Including an unnecessary or irrelevant variable
- Incorrect functional form
- Errors-of-measurement bias
- Incorrect specification of the stochastic error term (this is a model mis-specification error)

More on omitting a relevant variable (under-fitting):

True model:   Yi = β0 + β1 Xi1 + β2 Xi2 + ui
Fitted model: Yi = β0 + β1 Xi1 + νi

Consequences of the omission:
1. If r12 is non-zero, then the estimators of β0 and β1 are biased and inconsistent.
2. The estimated variance of the estimator of β1 is a biased estimate of the variance of that estimator.
3. σ² is incorrectly estimated, and confidence intervals and hypothesis tests are misleading.
4. E(estimator of β1) = β1 + β2 b21, where b21 is the slope of the regression of X2 on X1.
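Two numerical footnotes, sketched under stated assumptions: (i) the condition index for the Population/Income example, taking CI = sqrt(λmax / λmin) as the convention (consistent with the 10-30 thresholds quoted earlier); (ii) a small simulation of the omitted-variable result E(b1) = β1 + β2 b21, with illustrative parameter values.

```python
import numpy as np

# (i) Condition index for the Population/Income example: with r = .997 the
# correlation matrix of the two predictors is nearly singular.
R = np.array([[1.0, 0.997],
              [0.997, 1.0]])
lam = np.linalg.eigvalsh(R)
ci = float(np.sqrt(lam.max() / lam.min()))
print(round(ci, 1))                                 # 25.8: moderate to strong

# (ii) Simulation of the underfitting result above: omitting X2 biases b1.
rng = np.random.default_rng(4)
n, reps = 200, 2000
beta0, beta1, beta2 = 1.0, 2.0, 3.0                 # illustrative values
X1 = rng.normal(size=n)
X2 = 0.6 * X1 + rng.normal(scale=0.5, size=n)       # r12 != 0
b21 = np.polyfit(X1, X2, 1)[0]                      # slope of X2 on X1

b1_draws = [np.polyfit(X1, beta0 + beta1 * X1 + beta2 * X2
                       + rng.normal(size=n), 1)[0] for _ in range(reps)]
print(round(float(np.mean(b1_draws)), 2), round(float(beta1 + beta2 * b21), 2))
```

In the simulation, the average of the underfitted slope estimates agrees with β1 + β2 b21 to well within simulation error.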