POSTED ON: 6/25/2012 Public Domain
Multiple Regression 7.1

• More than one explanatory/independent variable:

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt} + e_t$$

$$y_t = E(y_t) + e_t$$

• This makes a slight change to the interpretation of the coefficients
• This changes the measure of degrees of freedom
• We need to modify one of the assumptions

EXAMPLE: $tr_t = \beta_1 + \beta_2 p_t + \beta_3 a_t + e_t$
EXAMPLE: $qd_t = \beta_1 + \beta_2 p_t + \beta_3 inc_t + e_t$
EXAMPLE: $gpa_t = \beta_1 + \beta_2 SAT_t + \beta_3 STUDY_t + e_t$

Interpretation of Coefficient 7.2

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + e_t$$

$$\beta_2 = \left.\frac{dy}{dx_2}\right|_{x_3} \qquad \beta_3 = \left.\frac{dy}{dx_3}\right|_{x_2}$$

• $\beta_2$ measures the change in Y from a change in X2, holding X3 constant.
• $\beta_3$ measures the change in Y from a change in X3, holding X2 constant.

Assumptions of the Multiple Regression Model 7.3

1. The regression model is linear in the parameters and error term: $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt} + e_t$
2. Error term has a mean of zero: $E(e) = 0$, so $E(y_t) = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt}$
3. Error term has constant variance: $Var(e) = E(e^2) = \sigma^2$
4. Error term is not correlated with itself (no serial correlation): $Cov(e_i, e_j) = E(e_i e_j) = 0$ for $i \neq j$
5. Data on the x's are not random (and thus are uncorrelated with the error term: $Cov(X, e) = E(Xe) = 0$), and they are NOT exact linear functions of other explanatory variables.
6. (Optional) Error term has a normal distribution: $e \sim N(0, \sigma^2)$

Estimation of the Multiple Regression Model 7.4

• Let's use a model with 2 independent variables:

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + e_t$$

• A scatterplot of points is now a scatter "cloud". We want to fit the best "line" through these points. In 3 dimensions, the line becomes a plane.
• The estimated "line" and a residual are defined as before:

$$\hat{y}_t = b_1 + b_2 x_{2t} + b_3 x_{3t} \qquad \hat{e}_t = y_t - \hat{y}_t$$

• The idea is to choose values for b1, b2, and b3 such that the sum of squared residuals is minimized.

7.5

$$\sum \hat{e}_t^2 = \sum (y_t - \hat{y}_t)^2 = \sum (y_t - b_1 - b_2 x_{2t} - b_3 x_{3t})^2$$

From here, we minimize this expression with respect to b1, b2, and b3. We set these three derivatives equal to zero and solve for b1, b2, and b3.
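Written out, the three first-order conditions referred to above take the following form. This is a sketch of the standard derivation; the symbol S for the sum of squared residuals is introduced here for convenience and does not appear on the slides.

```latex
\begin{align*}
S(b_1, b_2, b_3) &= \sum (y_t - b_1 - b_2 x_{2t} - b_3 x_{3t})^2 \\
\frac{\partial S}{\partial b_1} &= -2 \sum (y_t - b_1 - b_2 x_{2t} - b_3 x_{3t}) = 0 \\
\frac{\partial S}{\partial b_2} &= -2 \sum x_{2t}\,(y_t - b_1 - b_2 x_{2t} - b_3 x_{3t}) = 0 \\
\frac{\partial S}{\partial b_3} &= -2 \sum x_{3t}\,(y_t - b_1 - b_2 x_{2t} - b_3 x_{3t}) = 0
\end{align*}
```

These are three linear equations in the three unknowns b1, b2, and b3.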
We get the following formulas:

$$b_2 = \frac{\sum y_t^* x_{2t}^* \sum x_{3t}^{*2} - \sum y_t^* x_{3t}^* \sum x_{2t}^* x_{3t}^*}{\sum x_{2t}^{*2} \sum x_{3t}^{*2} - \left(\sum x_{2t}^* x_{3t}^*\right)^2}$$

$$b_3 = \frac{\sum y_t^* x_{3t}^* \sum x_{2t}^{*2} - \sum y_t^* x_{2t}^* \sum x_{2t}^* x_{3t}^*}{\sum x_{2t}^{*2} \sum x_{3t}^{*2} - \left(\sum x_{2t}^* x_{3t}^*\right)^2}$$

$$b_1 = \bar{y} - b_2 \bar{x}_2 - b_3 \bar{x}_3$$

Where:

$$y_t^* = (y_t - \bar{y}) \qquad x_{2t}^* = (x_{2t} - \bar{x}_2) \qquad x_{3t}^* = (x_{3t} - \bar{x}_3)$$

7.6

[What is going on here? In the formula for b2, notice that if x3 were omitted from the model, the formula reduces to the familiar formula from Chapter 3.]

You may wonder why the multiple regression formulas on slide 7.5 aren't equal to:

$$b_2 = \frac{\sum (y_t - \bar{y})(x_{2t} - \bar{x}_2)}{\sum (x_{2t} - \bar{x}_2)^2} \qquad b_3 = \frac{\sum (y_t - \bar{y})(x_{3t} - \bar{x}_3)}{\sum (x_{3t} - \bar{x}_3)^2}$$

7.7

We can use a Venn diagram to illustrate the idea of Regression as Analysis of Variance.

[Venn diagram: For Bivariate (Simple) Regression, sets y and x]
[Venn diagram: For Multiple Regression, sets y, x2, and x3]

Example of Multiple Regression 7.8

Suppose we want to estimate a model of home prices using data on the size of the house (sqft), the number of bedrooms (bed), and the number of bathrooms (bath). We get the following results:

$$\widehat{price}_t = 129.062 + 0.1548\, sqft_t - 21.588\, bed_t - 12.193\, bath_t$$

How does a negative coefficient estimate on bed and bath make sense?

7.9

Expected Value

$$E(b_1) = \beta_1 \qquad E(b_2) = \beta_2 \qquad E(b_3) = \beta_3$$

We will omit the proofs. The Least Squares estimator for multiple regression is unbiased, regardless of the number of independent variables.

Variance Formulas With 2 Independent Variables

$$Var(b_2) = \frac{\sigma^2}{(1 - r_{23}^2) \sum (x_{2t} - \bar{x}_2)^2} \qquad Var(b_3) = \frac{\sigma^2}{(1 - r_{23}^2) \sum (x_{3t} - \bar{x}_3)^2}$$

Where $r_{23}$ is the correlation between x2 and x3 and the parameter $\sigma^2$ is the variance of the error term.

We need to estimate $\sigma^2$ using the formula

$$\hat{\sigma}^2 = \frac{\sum \hat{e}_t^2}{T - k}$$

This estimate has T-k degrees of freedom.

Gauss Markov Theorem 7.10

Under assumptions 1-5 of the linear regression model (the 6th assumption isn't needed for the theorem to be true), the least squares estimators $b_1, b_2, \dots, b_k$ have the smallest variance of all linear and unbiased estimators of $\beta_1, \beta_2, \dots, \beta_k$.
They are BLUE (Best Linear Unbiased Estimators).

Confidence Intervals and Hypothesis Testing 7.11

• The methods for constructing confidence intervals and conducting hypothesis tests are the same as they were for simple regression.
• The format for a confidence interval is:

$$b_i \pm t_c \, se(b_i)$$

Where $t_c$ depends on the level of confidence and has T-k degrees of freedom. T is the number of observations and k is the number of independent variables plus one for the intercept.

• Hypothesis Tests:

$$H_0: \beta_i = c \qquad H_1: \beta_i \neq c \qquad t = \frac{b_i - \beta_i}{se(b_i)}$$

Use the value of c for $\beta_i$ when calculating t. If $t > t_c$ or $t < -t_c$, reject $H_0$. If c is 0, then we call it a test of significance.

Goodness of Fit 7.12

• $R^2$ measures the proportion of the variance in the dependent variable that is explained by the independent variables. Recall that

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = 1 - \frac{\sum \hat{e}_t^2}{\sum (y_t - \bar{y})^2}$$

• Least Squares chooses the line that produces the smallest sum of squared residuals, so it also produces the line with the largest $R^2$. The inclusion of additional independent variables will never increase, and will often lower, the sum of squared residuals. This means that $R^2$ will never fall and will often increase when new independent variables are added, even if the variables have no economic justification.
• Adjusted $R^2$: adjust $R^2$ for degrees of freedom

$$\bar{R}^2 = 1 - \frac{\sum \hat{e}_t^2 / (T - k)}{\sum (y_t - \bar{y})^2 / (T - 1)}$$

Example: Grades at JMU 7.13

A sample of 55 JMU students was taken in Fall 2002, with data on:
• GPA
• SAT scores
• Credit Hours Completed
• Hours of Study per Week
• Hours at a Job per Week
• Hours at Extracurricular Activities

Three models were estimated:

$$gpa_t = \beta_1 + \beta_2 SAT_t + e_t$$

$$gpa_t = \beta_1 + \beta_2 SAT_t + \beta_3 CREDITS_t + \beta_4 STUDY_t + \beta_5 JOB_t + \beta_6 EC_t + e_t$$

$$gpa_t = \beta_1 + \beta_2 SAT_t + \beta_3 CREDITS_t + \beta_4 STUDY_t + \beta_5 JOB_t + e_t$$

7.14

Here is our simple regression model.

Regression Statistics
Multiple R            0.270081672
R Square              0.072944109
Adjusted R Square     0.055452489
Standard Error        0.455494651
Observations          55

                            Coefficients   Standard Error   t Stat         P-value
Intercept                   1.799734629    0.631013633      2.852132721    0.006178434
SAT Total                   0.001057599    0.000517894      2.042114498    0.046129915

Here is our multiple regression model. Both R2 and Adjusted R2 have increased with the inclusion of 4 additional independent variables.

Regression Statistics
Multiple R            0.465045986
R Square              0.216267769
Adjusted R Square     0.136295092
Standard Error        0.4355661
Observations          55

                            Coefficients   Standard Error   t Stat         P-value
Intercept                   1.538048132    0.686920438      2.239048435    0.029726041
SAT Total                   0.000943615    0.000503684      1.873427614    0.066980023
Credit Hours (completed)    0.003382201    0.002664489      1.269361823    0.210308107
Study (Hrs/wk)              0.010762866    0.006782587      1.586837918    0.118982276
Job (Hrs/wk)                -0.00795843    0.006452798      -1.23333021    0.22333641
EC (Hrs/wk)                 0.002606617    0.009216993      0.282805582    0.778517157

7.15

Notice that the exclusion of EC increases adjusted R2 but reduces R2.

Regression Statistics
Multiple R            0.463668569
R Square              0.214988542
Adjusted R Square     0.152187625
Standard Error        0.431540195
Observations          55

                            Coefficients   Standard Error   t Stat         P-value
Intercept                   1.53616637     0.680539354      2.257277792    0.028390614
SAT Total                   0.000951057    0.000498346      1.908425924    0.062085531
Credit Hours (completed)    0.003441283    0.002631734      1.307610208    0.196986336
Study (Hrs/wk)              0.011267319    0.006483348      1.737885925    0.088386942
Job (Hrs/wk)                -0.0080306     0.006388155      -1.25710723    0.214555464
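The adjusted R2 values in the three regression outputs can be reproduced from the reported R2, the sample size T = 55, and the number of estimated coefficients k. The sketch below is a minimal check in plain Python, not part of the original slides. The function names `adjusted_r2` and `t_ratio` are hypothetical helpers introduced for illustration. It relies on the identity SSE/SST = 1 - R2 to rewrite the adjusted R2 formula from slide 7.12, and it also recomputes one t statistic as the coefficient divided by its standard error.

```python
def adjusted_r2(r2: float, T: int, k: int) -> float:
    """Adjusted R^2 = 1 - [SSE/(T-k)] / [SST/(T-1)], using SSE/SST = 1 - R^2."""
    return 1.0 - (1.0 - r2) * (T - 1) / (T - k)


def t_ratio(b: float, se: float, c: float = 0.0) -> float:
    """t statistic for H0: beta = c, computed as (b - c) / se(b)."""
    return (b - c) / se


# (model, R^2 as printed, k, adjusted R^2 as printed); T = 55 in every case.
REPORTED = [
    ("SAT only", 0.072944109, 2, 0.055452489),
    ("SAT, CREDITS, STUDY, JOB, EC", 0.216267769, 6, 0.136295092),
    ("SAT, CREDITS, STUDY, JOB", 0.214988542, 5, 0.152187625),
]

if __name__ == "__main__":
    for label, r2, k, printed in REPORTED:
        computed = adjusted_r2(r2, 55, k)
        print(f"{label}: computed {computed:.9f}, printed {printed:.9f}")
    # Test of significance (c = 0) for the SAT coefficient in the simple model.
    print(f"t for SAT Total: {t_ratio(0.001057599, 0.000517894):.4f}")
```

Dropping EC lowers k from 6 to 5, which raises the penalty ratio's denominator T - k from 49 to 50. R2 falls only slightly, so the degrees-of-freedom gain dominates and adjusted R2 rises, as the note on slide 7.15 observes.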