
Economics 3640-001                                                                              Instructor: Sanghoon Lee

                                                               Lecture 31

* Reading Assignment: Ch.9 Simple Linear Regression (9.4 ~ 9.5)

9.4 An Estimation of σ²

The greater the variability of the random error ε (which is measured by
its variance σ²), the greater will be the errors in the estimation of the
model parameters β0 and β1, and the greater the error of prediction when
ŷ is used to predict y for some value of x.

In most practical situations, σ² is unknown and we must use our data to
estimate its value. The estimator is s² = SSE / (n-2); it has (n-2) df
because two parameters, β0 and β1, are estimated from the data.

Ex. 9.32 Visually compare the scattergrams. If a least squares line were determined for each data set,
which do you think would have the smallest variance, s²?

The graph in b would have the smallest s² because its data points are the least spread out about the line.

Ex. 9.34 Suppose you fit a least squares line to 12 data points and the calculated value of SSE is .429.
a. Find s², the estimator of σ² (the variance of the random error term ε). s² = SSE / (n-2) = .429 / 10 = .0429.
b. Find s, the estimator of σ. s = √s² = √.0429 = .2071.
c. What is the largest deviation that you might expect between any one of the 12 points and the least squares
line? We would expect most (95%) of the observations to be within 2s of the least squares line. 2s = .414
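
The arithmetic in Ex. 9.34 can be checked with a short Python sketch. Only n = 12 and SSE = .429 come from the exercise; the variable names are illustrative choices.

```python
import math

n = 12        # number of data points (from Ex. 9.34)
sse = 0.429   # sum of squared errors, SSE (from Ex. 9.34)

s2 = sse / (n - 2)   # s^2: estimator of the error variance, with n - 2 df
s = math.sqrt(s2)    # s: estimator of the error standard deviation
two_s = 2 * s        # most (about 95%) of the points should lie within 2s of the line

print(f"s2 = {s2:.4f}, s = {s:.4f}, 2s = {two_s:.3f}")
```

The printed values should agree with parts a to c (.0429, .2071, and .414).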

Ex. 9.42 To improve the quality of the output of any production process,
it is necessary first to understand the capabilities of the process. In a
particular manufacturing process, the useful life of a cutting tool is
linearly related to the speed at which the tool is operated. The data in the
table were derived from life tests for the two different brands of cutting
tools currently used in the production process. For which brand would
you feel more confident in using the least squares line to predict useful
life for a given cutting speed?
Scatterplots of the two sets of data are:
- Since the data points for B are not spread apart as much as those for A, it appears that B would be a
  better predictor of useful life than A.
- For A, ŷ = 6.62 – .0727x, s = 1.211
- For B, ŷ = 9.31 – .1077x, s = .610
- Since the standard deviation for B (s = .610) is smaller than the standard deviation for A (s = 1.211),
  B would be the better predictor of useful life for a given cutting speed.

Eg. Consumption and Income: Econ.3640.Spring.2005.Simple.Linear.Regression.01.xls, Lecture31(1)

9.5 Assessing the Utility of the Model: Making Inferences about the Slope β1
Simple linear regression model:    ŷ = β̂0 + β̂1x

H0: β1 = 0         : the model contributes no information for the prediction of y
Ha: β1 ≠ 0         : the linear model is useful for predicting y

We usually don't know σ, and thus the appropriate test statistic is a t-statistic with (n-2) df:

t = β̂1 / (s / √SSxx)

Ex. 9.48 Consider the following pairs of observations:
  x         1        5         3         2         6          6       0
  y         1        3         3         1         4          5       1

a. Construct a scattergram for the data. b. Use the method of least squares to fit a straight line to the
seven data points in the table. c. Plot the least squares line on your scattergram of part a.
∑x = 23, ∑x² = 111, ∑xy = 81, ∑y = 18, ∑y² = 62
SSxy = ∑xy – (∑x)(∑y)/n = 21.85714286
SSxx = ∑x² – (∑x)²/n = 35.42857143
SSyy = ∑y² – (∑y)²/n = 15.71428571
β̂1 = SSxy / SSxx = .616935483 ≈ .617
β̂0 = ȳ – β̂1x̄ = .544354838 ≈ .544

The least squares line is ŷ = .544 + .617x.
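
As a check on part b, the least squares estimates can be reproduced from the seven data pairs with a minimal Python sketch. It follows the SSxy / SSxx formulas used in this exercise; the variable names (ss_xy, b1, and so on) are illustrative and not from the text.

```python
# Data pairs from Ex. 9.48
x = [1, 5, 3, 2, 6, 6, 0]
y = [1, 3, 3, 1, 4, 5, 1]
n = len(x)

# Sums of squares and cross-products, as defined in the exercise
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n

b1 = ss_xy / ss_xx                    # slope estimate
b0 = sum(y) / n - b1 * sum(x) / n     # intercept estimate: y-bar minus b1 times x-bar

print(f"y-hat = {b0:.3f} + {b1:.3f}x")
```

The rounded coefficients should match the fitted line given in the solution to part b.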
d. Specify the null and alternative hypotheses you would use to test whether the data provide sufficient
evidence to indicate that x contributes information for the (linear) prediction of y. H0: β1 = 0, Ha: β1 ≠ 0.
e. What is the test statistic that should be used in conducting the hypothesis test of part d? Specify the
degrees of freedom associated with the test statistic.
t = β̂1 / (s / √SSxx) = 5.50, with df = n-2 = 5,
where SSE = SSyy – β̂1·SSxy = 2.22983872, s² = SSE/(n-2) = .44596774, s = .6678.
f. Conduct the hypothesis test of part d using α = .05. The rejection region requires α/2 = .025 in each tail of
the t distribution with df = 5: t.025 = 2.571 (cf. Table IV, Appendix A). The rejection region: t < -2.571, t > 2.571.
Since the observed value of the test statistic falls in the rejection region (t = 5.50 > 2.571), H0 is rejected.
There is sufficient evidence to indicate x contributes information for the linear prediction of y at α = .05.
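
The test in parts e and f can be traced step by step in a short Python sketch that starts from the sums of squares computed in this exercise. The critical value 2.571 is hard-coded from the text (Table IV) rather than computed, and the variable names are illustrative assumptions.

```python
import math

# Summary quantities from Ex. 9.48
n = 7
ss_xx = 35.42857143
ss_xy = 21.85714286
ss_yy = 15.71428571
t_crit = 2.571                       # t.025 with n - 2 = 5 df, taken from the text

b1 = ss_xy / ss_xx                   # slope estimate
sse = ss_yy - b1 * ss_xy             # SSE = SSyy - b1 * SSxy
s = math.sqrt(sse / (n - 2))         # estimated standard deviation of the error term
t = b1 / (s / math.sqrt(ss_xx))      # test statistic for H0: beta1 = 0

reject_h0 = abs(t) > t_crit          # two-tailed test at alpha = .05

print(f"SSE = {sse:.4f}, s = {s:.4f}, t = {t:.2f}, reject H0: {reject_h0}")
```

Comparing |t| with the critical value covers both tails of the rejection region (t < -2.571 or t > 2.571) in one condition.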

Eg. Consumption and Income: Econ.3640.Spring.2005.Simple.Linear.Regression.01.xls, Lecture31(2)

