Economics 3640-001
Instructor: Sanghoon Lee
Lecture 31

* Reading Assignment: Ch.9 Simple Linear Regression (9.4 ~ 9.5)

9.4 An Estimation of $\sigma^2$

The greater the variability of the random error $\varepsilon$ (measured by its variance $\sigma^2$), the greater the errors in estimating the model parameters $\beta_0$ and $\beta_1$, and the greater the error of prediction when $\hat{y}$ is used to predict $y$ for some value of $x$. In most practical situations, $\sigma^2$ is unknown and we must use our data to estimate its value.

* $s^2 = SSE / (n-2)$, with $(n-2)$ df because two parameters, $\beta_0$ and $\beta_1$, are estimated.

Ex. 9.32 Visually compare the scattergrams. If a least squares line were determined for each data set, which do you think would have the smallest variance, $s^2$?

The graph in b would have the smallest $s^2$ because the spread of the data points about the line is the smallest.

Ex. 9.34 Suppose you fit a least squares line to 12 data points and the calculated value of SSE is .429.

a. Find $s^2$, the estimator of $\sigma^2$ (the variance of the random error term $\varepsilon$).
$s^2 = SSE / (n-2) = .429 / 10 = .0429$.

b. Find $s$, the estimator of $\sigma$.
$s = (s^2)^{1/2} = .2071$.

c. What is the largest deviation that you might expect between any one of the 12 points and the least squares line?
We would expect most (approximately 95%) of the observations to lie within $2s$ of the least squares line, so the largest deviation we might expect is about $2s = .414$.

Ex. 9.42 To improve the quality of the output of any production process, it is necessary first to understand the capabilities of the process. In a particular manufacturing process, the useful life of a cutting tool is linearly related to the speed at which the tool is operated. The data in the table were derived from life tests for the two different brands of cutting tools currently used in the production process. For which brand would you feel more confident in using the least squares line to predict useful life for a given cutting speed?

Scatterplots of the two sets of data are:

- Since the data points for B are not spread apart as much as those for A, it appears that B would be a better predictor of useful life than A.
- For A, $\hat{y} = 6.62 - .0727x$, $s = 1.211$
- For B, $\hat{y} = 9.31 - .1077x$, $s = .610$
- Since the standard deviation for B ($s = .610$) is smaller than the standard deviation for A ($s = 1.211$), B would be a better predictor of useful life for a given cutting speed.

Eg. Consumption and Income: Econ.3640.Spring.2005.Simple.Linear.Regression.01.xls, Lecture31(1)

9.5 Assessing the Utility of the Model: Making Inferences about the Slope $\beta_1$

Simple linear regression model: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$

$H_0: \beta_1 = 0$ : the model contributes no information for the prediction of $y$
$H_a: \beta_1 \neq 0$ : the linear model is useful for predicting $y$

We usually don't know $\sigma$, and thus the appropriate test statistic is a t-statistic with $(n-2)$ df:

$t = \hat{\beta}_1 / (s / \sqrt{SS_{xx}})$

Ex. 9.48 Consider the following pairs of observations:

x   1   5   3   2   6   6   0
y   1   3   3   1   4   5   1

a. Construct a scattergram for the data.
b. Use the method of least squares to fit a straight line to the seven data points in the table.
c. Plot the least squares line on your scattergram of part a.

$\sum x = 23$, $\sum x^2 = 111$, $\sum xy = 81$, $\sum y = 18$, $\sum y^2 = 62$

$SS_{xy} = \sum xy - (\sum x \sum y)/n = 21.85714286$
$SS_{xx} = \sum x^2 - (\sum x)^2/n = 35.42857143$
$SS_{yy} = \sum y^2 - (\sum y)^2/n = 15.71428571$

$\hat{\beta}_1 = SS_{xy} / SS_{xx} = .616935483 \approx .617$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = .544354838 \approx .544$

The least squares line is $\hat{y} = .544 + .617x$.

d. Specify the null and alternative hypotheses you would use to test whether the data provide sufficient evidence to indicate that x contributes information for the (linear) prediction of y.
$H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$.

e. What is the test statistic that should be used in conducting the hypothesis test of part d? Specify the degrees of freedom associated with the test statistic.
$t = \hat{\beta}_1 / \{s / (SS_{xx})^{1/2}\} = 5.50$, with $n - 2 = 5$ degrees of freedom,
where $SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 2.22983872$, $s^2 = SSE/(n-2) = .44596774$, $s = .6678$.

f. Conduct the hypothesis test of part d using $\alpha = .05$.
The rejection region requires $\alpha/2 = .025$ in each tail of the t distribution with df = 5: $t_{.025} = 2.571$ (cf. Table IV, Appendix A).
The rejection region: $t < -2.571$ or $t > 2.571$. Since the observed value of the test statistic falls in the rejection region ($t = 5.50 > 2.571$), $H_0$ is rejected. There is sufficient evidence to indicate that x contributes information for the linear prediction of y at $\alpha = .05$.

Eg. Consumption and Income: Econ.3640.Spring.2005.Simple.Linear.Regression.01.xls, Lecture31(2)
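
The arithmetic in Ex. 9.34 and Ex. 9.48 can be checked with a short script. What follows is a minimal Python sketch and is not part of the original lecture: the function names `fit_line`, `error_std`, and `slope_t` are illustrative, and the critical value 2.571 is copied from the table lookup in part f rather than computed.

```python
import math


def fit_line(x, y):
    """Fit y-hat = b0 + b1*x by least squares using the SS shortcut formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(a * a for a in x) - sum_x ** 2 / n
    ss_yy = sum(b * b for b in y) - sum_y ** 2 / n
    b1 = ss_xy / ss_xx                 # slope estimate
    b0 = sum_y / n - b1 * sum_x / n    # intercept estimate
    sse = ss_yy - b1 * ss_xy           # sum of squared errors
    return {"n": n, "b0": b0, "b1": b1, "ss_xx": ss_xx, "sse": sse}


def error_std(sse, n):
    """s = sqrt(SSE / (n - 2)), the estimator of sigma."""
    return math.sqrt(sse / (n - 2))


def slope_t(b1, s, ss_xx):
    """t statistic for H0: beta1 = 0, with n - 2 degrees of freedom."""
    return b1 / (s / math.sqrt(ss_xx))


# Ex. 9.34: 12 data points, SSE = .429
s_934 = error_std(0.429, 12)

# Ex. 9.48: the seven (x, y) pairs from the table
x = [1, 5, 3, 2, 6, 6, 0]
y = [1, 3, 3, 1, 4, 5, 1]
fit = fit_line(x, y)
s = error_std(fit["sse"], fit["n"])
t = slope_t(fit["b1"], s, fit["ss_xx"])

T_CRIT = 2.571  # t_.025 with df = 5, taken from the table value quoted in part f
reject_h0 = abs(t) > T_CRIT  # two-tailed test at alpha = .05

print(f"Ex. 9.34: s = {s_934:.4f}, 2s = {2 * s_934:.3f}")
print(f"Ex. 9.48: b0 = {fit['b0']:.3f}, b1 = {fit['b1']:.3f}, s = {s:.4f}, t = {t:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The sketch hard-codes the critical value so that it needs only the standard library; for other sample sizes or significance levels the value would have to be looked up again in the t table.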