Shared by: linzhengnd
Posted: 11/27/2011
Language: English
Pages: 4
Demonstration





Prediction Error



How much can we trust a prediction made for an individual using regression

analysis? Compare the regression model with the prediction equation, and

you'll see that there are two potential sources of error in a prediction:

(1) The model coefficients might have been misestimated (due to sampling

error), or (2) the true residual of the individual for whom the prediction is

being made might be different from zero.



$$Y = \alpha + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon$$

$$Y_{\text{pred}} = a + b_1 X_1 + \cdots + b_k X_k + 0$$

The standard error of the regression measures your exposure to errors

of the second type. But the impact of errors of the first type depends on the

values of the independent variables for which the prediction is being made.

To see this, we'll run an experiment.



We'll take a population for which we know the true model coefficients.

(Actually, we'll create such a population via simulation.) Then we'll draw a

sample from that population, and compare the estimated regression equation

to the true one.



The relationship we'll study will involve only one independent variable. That

way, we'll be able to "see" what's happening on a chart. The true regression

equation is Y = 100 + 4X. The true mean value of X is 20, and the

standard deviation of X, as well as the standard deviation of the residual

term, are given below:



Standard deviation of X: 2

Standard deviation of residual: 6



Now, imagine 10 investigators, each collecting a sample of size 25 from this

population and estimating the true relationship. The results of their ten

studies are listed in the table below, and the corresponding estimated

regression lines are plotted in the chart, overlaid by the true (red) regression

line.



True coefficients:   alpha = 100.00   beta = 4.00

Estimates from the ten studies:

      a        b
    82.71    4.93
   105.51    3.68
    97.75    4.10
    99.95    4.04
   113.14    3.35
   119.39    2.98
    86.26    4.66
    95.88    4.20
   125.61    2.69
   117.73    3.08

[Chart: "The True Regression Line, and Ten Estimates"; Y (140 to 210) plotted against X (15 to 25)]









Some of the estimated lines are too steep. Some are too level. But the

important thing to note is that the estimated lines are usually closest to the

true line near X = 20, and move further away as the values of X move further

from 20. That is, a + bX is typically a less-reliable estimate of the true

height of the regression line for larger values of |X - 20|. Press the

"Resample" button a few times to redraw the ten separate samples and see

how the results of additional studies compare to the true situation.
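For readers without the spreadsheet at hand, the resampling experiment can be sketched in code. This is a minimal sketch, not the workbook's own implementation: it assumes X and the residual are both normally distributed (the demonstration does not say which distributions it uses), and the names `fit_line`, `one_study` and `spread_at` are ours.

```python
import random
import statistics

TRUE_ALPHA, TRUE_BETA = 100.0, 4.0        # true line from the text: Y = 100 + 4X
MEAN_X, SD_X, SD_RESID = 20.0, 2.0, 6.0   # population values given in the text
N, STUDIES = 25, 10                       # sample size and number of investigators


def fit_line(xs, ys):
    """Ordinary least squares with one independent variable; returns (a, b)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def one_study(rng):
    """Draw one sample (normality is an assumption) and estimate the line."""
    xs = [rng.gauss(MEAN_X, SD_X) for _ in range(N)]
    ys = [TRUE_ALPHA + TRUE_BETA * x + rng.gauss(0.0, SD_RESID) for x in xs]
    return fit_line(xs, ys)


def spread_at(x, lines):
    """Root-mean-square gap between the estimated and true line heights at x."""
    true_height = TRUE_ALPHA + TRUE_BETA * x
    gaps = [(a + b * x - true_height) ** 2 for a, b in lines]
    return (sum(gaps) / len(gaps)) ** 0.5


rng = random.Random(0)   # change the seed to "resample"
lines = [one_study(rng) for _ in range(STUDIES)]
for a, b in lines:
    print(f"a = {a:7.2f}   b = {b:5.2f}")
for x in (15, 20, 25):
    print(f"X = {x}: typical gap from the true line = {spread_at(x, lines):.2f}")
```

With only ten studies the numbers bounce around from seed to seed, just as they do when we press the button. Raising `STUDIES` to a few thousand makes the pattern steadier: the gap is smallest near X = 20 and grows toward either edge of the chart.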



The prediction equation can be viewed in two somewhat-different ways. For

any specific values of the independent variables, it provides an estimate of

the mean value of the dependent variable across the subpopulation of

individuals for whom the independent variables take those values. And, of

course, it also provides a prediction for any one such individual.



The standard error of the estimated mean measures uncertainty due to

sampling error in the "mean value" estimate for the subpopulation. It is this

uncertainty we see in the chart - uncertainty about the true value of

$$\alpha + \beta_1 X_1 + \cdots + \beta_k X_k.$$

In a simple linear regression, the standard error of the estimated mean takes

the value

$$s_e \times \sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{(n - 1)\, s_X^2}}$$



(the first factor is the standard error of the regression). The formula itself

isn't all that important, since it only applies to a simple linear regression (when

there are two or more independent variables, the corresponding formula is

quite ugly), and since any decent regression-analysis software will compute

it for us.
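Even though software will do the work, it can help to see the formula turned into a few lines of code. The sketch below is ours, not part of the demonstration. For illustration it treats the population values from the experiment (standard error of the regression 6, sample size 25, mean of X 20, standard deviation of X 2) as if they were the statistics of one sample, which is an assumption. A real sample would give slightly different numbers.

```python
import math


def se_estimated_mean(se_regression, n, x, x_bar, s_x):
    """Standard error of the estimated mean at x (simple linear regression only)."""
    distance_term = (x - x_bar) ** 2 / ((n - 1) * s_x ** 2)
    return se_regression * math.sqrt(1.0 / n + distance_term)


# Illustrative inputs (assumed, see the note above): se = 6, n = 25, x_bar = 20, s_x = 2.
for x in (20, 22, 25):
    value = se_estimated_mean(6.0, 25, x, 20.0, 2.0)
    print(f"X = {x}: standard error of the estimated mean = {value:.3f}")
```

At X = 20 the second term under the square root vanishes, so the value reduces to the standard error of the regression divided by the square root of n. It grows as X moves away from 20 in either direction.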



But the formula does serve to illustrate two important points, both of which

remain true even when there is more than one independent variable:



1. For any given sample size, the standard error of the estimated mean

grows as the independent variables take values further from the most-

typically-observed (combination of) values.



When there is more than one independent variable, it's not just a matter of

the distance from the mean values of the independent variables. Hidden

extrapolation - when each independent variable takes a not-atypical value,

but the combination of values is atypical - can also increase the standard

error of the estimated mean.



2. For any given values of the independent variables, the standard

error of the estimated mean decreases as we increase our sample

size. This is as it should be, since it measures our exposure to

sampling error in making our estimate.
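Both points, and hidden extrapolation in particular, can be checked numerically for two independent variables. The sketch below is ours and rests on stated assumptions. It uses the standard result that the standard error of the estimated mean equals the standard error of the regression times sqrt(1/n + d' S^-1 d / (n - 1)), where d is the distance of the chosen point from the sample means and S is the sample covariance matrix of the independent variables. With one variable this reduces to the simple formula above. The data are hypothetical: X2 is built to track X1 closely.

```python
import math
import random
import statistics


def mean_se_multiplier(x1s, x2s, point):
    """Factor that multiplies the standard error of the regression at `point`."""
    n = len(x1s)
    m1, m2 = statistics.fmean(x1s), statistics.fmean(x2s)
    s11, s22 = statistics.variance(x1s), statistics.variance(x2s)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1s, x2s)) / (n - 1)
    d1, d2 = point[0] - m1, point[1] - m2
    det = s11 * s22 - s12 ** 2
    # d' S^-1 d for a 2x2 covariance matrix, written out explicitly
    distance_sq = (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    return math.sqrt(1.0 / n + distance_sq / (n - 1))


def draw_sample(n, rng):
    """Hypothetical data: X2 follows X1 closely, so the two are strongly correlated."""
    x1s = [rng.gauss(20.0, 2.0) for _ in range(n)]
    x2s = [x1 + rng.gauss(0.0, 0.5) for x1 in x1s]
    return x1s, x2s


def compare(n, rng):
    """Multiplier at a typical and an atypical combination, each one SD from the means."""
    x1s, x2s = draw_sample(n, rng)
    m1, m2 = statistics.fmean(x1s), statistics.fmean(x2s)
    s1, s2 = statistics.stdev(x1s), statistics.stdev(x2s)
    typical = mean_se_multiplier(x1s, x2s, (m1 + s1, m2 + s2))    # both high together
    atypical = mean_se_multiplier(x1s, x2s, (m1 + s1, m2 - s2))   # one high, one low
    return typical, atypical


rng = random.Random(0)
for n in (25, 100):
    typical, atypical = compare(n, rng)
    print(f"n = {n:3d}: typical combination {typical:.3f}, atypical combination {atypical:.3f}")
```

Neither value in the atypical point is unusual on its own (each is one standard deviation from its mean), yet the multiplier there is much larger, because high X1 paired with low X2 almost never occurs in the sample. Increasing n from 25 to 100 shrinks the multiplier at the typical combination, in line with the second point.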












In order to construct confidence intervals for the mean value of the

dependent variable, given values for the independent variables, we simply

use the prediction equation to make the estimate, and the standard error of

the estimated mean to compute the margin of error in the estimate.
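As a minimal sketch of that recipe, assuming the usual t-based margin of error: the multiplier 2.069 is the two-sided 95% critical value of the t distribution with n - 2 = 23 degrees of freedom, looked up from a standard table. The inputs are hypothetical. The fitted line is one of the ten estimates in the table (a = 97.75, b = 4.10), and the standard error of the estimated mean at X = 22 is assumed to be 1.71 rather than computed from that study's data, which we don't have.

```python
def confidence_interval(estimate, se_mean, t_critical):
    """Confidence interval for the mean of Y at given values of the independent variables."""
    margin = t_critical * se_mean          # margin of error in the estimate
    return estimate - margin, estimate + margin


a, b = 97.75, 4.10                         # one estimated line from the table
x = 22.0
estimate = a + b * x                       # prediction equation
low, high = confidence_interval(estimate, se_mean=1.71, t_critical=2.069)
print(f"estimated mean of Y at X = {x:g}: {estimate:.2f}")
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```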



When we use the prediction equation to make a prediction for an individual,

we must combine the standard error of the estimated mean with the standard

error of the regression. (The method of combination, since they are

independent sources of potential error, is to convert each to a variance by

squaring, add the variances, and then take a square root to get back to a

standard deviation again.) The result is the standard error of the prediction.
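The square, add, and take-the-root recipe is short enough to write out. This is an illustrative sketch with a function name of our own choosing. The example inputs reuse the demonstration's residual standard deviation of 6 as the standard error of the regression, together with two assumed values for the standard error of the estimated mean.

```python
import math


def se_prediction(se_regression, se_mean):
    """Combine two independent sources of error: square, add, take the square root."""
    return math.sqrt(se_regression ** 2 + se_mean ** 2)


# Assumed inputs: se_mean = 1.2 near the middle of the data, 3.29 further out.
for se_mean in (1.2, 3.29):
    print(f"se_mean = {se_mean}: standard error of the prediction = "
          f"{se_prediction(6.0, se_mean):.3f}")
```

Because the variances add, the result is always at least as large as the standard error of the regression, and it is dominated by whichever component is larger.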



The standard error of the prediction therefore consists of two components.

One (the standard error of the estimated mean) can be reduced by

increasing the sample size. The other (the standard error of the regression)

can be reduced only by including new (and relevant) independent variables

in our model.
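To see how differently the two components respond to sample size, we can hold the standard error of the regression fixed and let n grow. The sketch below makes that simplifying assumption (in practice the standard error of the regression is re-estimated from each sample), and it evaluates the prediction at the mean of X, where the standard error of the estimated mean reduces to the standard error of the regression divided by the square root of n.

```python
import math

SE_REGRESSION = 6.0   # standard deviation of the residual in the demonstration


def se_prediction_at_mean(se_regression, n):
    """Standard error of the prediction when X equals the sample mean of X."""
    se_mean = se_regression / math.sqrt(n)   # shrinks as n grows
    return math.sqrt(se_regression ** 2 + se_mean ** 2)


for n in (25, 100, 400, 10_000):
    value = se_prediction_at_mean(SE_REGRESSION, n)
    print(f"n = {n:>6}: standard error of the prediction = {value:.3f}")
```

However large the sample, the result never drops below 6. Only a model with a smaller standard error of the regression can lower that floor.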



We're done! But if you'd like to test your intuition, go back up to the chart,

and ask yourself whether an increase in the standard deviation of X, or in

the standard error of the regression, would increase or decrease the spread

of the estimated regression lines around the true line. Then change either of

the two numbers in the yellow cells, and hit the resample button a few times

to see the effect. An explanation of what you see is given below (to keep

from giving away the answer, I've placed it down about 40 lines).


















Increasing the standard deviation of X will tighten up the estimated lines

around the true line. By spreading out the possible values of X, we make

values near the left and right sides of the chart less atypical. (This can also

be seen directly from the formula for the standard error of the estimated

mean: We're increasing the denominator of the second term inside the

square root.)



Increasing the standard error of the regression will widen the spread of the

estimated lines. With more "noise" in the model (and the same sample

size), our estimates of the true coefficients will become less reliable. (This

can also be seen from the formula, which has the standard error of the

regression as its leading factor outside the square root.)








