Chapter 15 - Inference for Regression

```
Chapter 15
Inference for Regression
AP Statistics
Hamilton

HW: 15.1, 15.2, 15.5, 15.8, 15.10, 15.11,
15.14, 15.26
The Regression Model
• We have learned that when a scatterplot shows a
linear relationship between a quantitative
explanatory variable x and a quantitative response
variable y, we can use the least-squares regression
line fitted to the data to predict y for a given value
of x.
• Now, we want to do tests and confidence intervals
in this setting.
Crying and IQ
• Infants who cry easily may be more easily
stimulated than others. This may be a sign of
higher IQ. Child development researchers explored
the relationship between the crying of infants four
to ten days old and their later IQ test scores. A snap
of a rubber band on the sole of the foot caused the
infants to cry. The researchers recorded the crying
and measured its intensity by the number of peaks
in the most active 20 seconds. They later measured
the children’s IQ at age three years using the
Stanford-Binet IQ test. The data for 38 infants are
given on the next slide.
Crying and IQ
Crying   IQ    Crying      IQ    Crying   IQ    Crying   IQ
10      87     20         90     17      94     12      94
12      97     16         100    19      103    12      103
9      103    23         103    13      104    14      106
16      106    27         108    18      109    10      109
18      109    15         112    18      112    23      113
15      114    21         114    16      118     9      119
12      119    12         120    19      120    16      124
20      132    15         133    22      135    31      135
16      136    17         141    30      155    22      157
33      159    13         162
Crying and IQ
• Who? – All we know is that the individuals are 38
infants who were studied when they were 4 to 10
days old and then again when they were 3 years
old.
• What? – The explanatory variable is crying intensity
and the response variable is children’s IQ.
• Why? – Researchers wanted to see if there is an
association between crying activity in early infancy
and IQ at age 3 years.
• When, where, how, and by whom? – The data come
from an experiment described in 1964 in the journal
Child Development.
Crying and IQ
• As always, we begin by looking at a graph of the
data, in this case a scatterplot.
• What are the form, direction and strength of the
relationship as well as any deviations from the
pattern (outliers, influential observations) in the
scatterplot?
• From the scatterplot, there appears to be a
moderate positive linear relationship with no
extreme values or influential observations.
• To get a better idea of the strength, we find that
r = 0.455 for our data. This confirms a moderate
positive association. What is $r^2$ and what does it
tell us?
Crying and IQ
• The least-squares regression line of IQ scores (y) on
crying intensity (x) is given by the equation
$\hat{y} = 91.27 + 1.493x$
• Since $r^2 = 0.455^2 \approx 0.207$, only about 21% of the
variation in IQ scores is explained by crying intensity.
• Therefore, prediction of IQ scores from crying
intensity will not be very accurate.
Conditions for the Regression Model
• The slope b and y-intercept a of the least-squares
line are statistics. That is, we calculate them from
sample data.
• To do formal inference, we think of a and b as
estimates of unknown parameters.
• The conditions for performing inference are on the
next slide.
• The basic idea, shown in the graph on the next slide,
is that for each value of x the y values are centered
on the true regression line for the population and
vary according to a Normal distribution.
Checking Conditions
• The observations must be independent. In particular,
we cannot use repeated observations on the same
individual.
• The true relationship is linear. We can’t observe the
true regression line, so we will almost never see a
perfectly straight line. So we look at the scatterplot
and a residual plot to make sure that a line appears to
be a good fit.
• The standard deviation of the response variable about
the true line is the same everywhere. Looking at the
scatterplot again, the scatter of the data points about
the line should be roughly the same over the range of
all the data. Looking at a residual plot is another way
to do this.
Checking Conditions
• The response varies Normally about the true regression
line. We can’t observe the true regression line. What
we can observe is the least-squares regression line and
the residuals. The residuals estimate the deviations of
the response from the true regression line, so they
should follow a Normal distribution. Make a histogram
or stemplot of the residuals and check for clear
skewness or other major departures from Normality. It
turns out that inference for regression is not very
sensitive to a minor lack of Normality, especially when
we have many observations. Do beware of influential
points which move the regression line and can greatly
affect the results of inference.
Checking Conditions
• Fortunately, it is not hard to check for gross
violations of these conditions for regression
inference.
• Since checking conditions uses residuals, most
regression software will calculate and save the
residuals for you.
• We will talk about how to have the calculator find
the residuals for you on the next slide.
Calculating Residuals
1. Hit STAT and choose Edit to open the list editor.
2. Go up to where the listname is (L1, L2, etc.).
3. Now, scroll to your right until you get a list with no
name, just dashes.
4. Now, hit 2nd and then Stat to go to all lists.
5. Scroll down and highlight beside RESID and hit
enter.
6. Hit enter one more time and it is now there.
7. Now, every time you do a regression, the residuals
will automatically be listed here for you.
Crying and IQ
• Let’s find the residuals for our data.
• To do this, we just need to run a linear regression on
our data and the residuals will automatically be stored
in RESID for us.
• The 38 residuals are listed below.
-19.20   -31.13   -22.65    -15.18   -12.18   -15.15   -16.63   -6.18
-1.70   -22.60   -6.68      -6.17   -9.15    -23.58   -9.14    2.80
-9.14   -1.66    -6.14     -12.60   0.34     -8.62    2.85     14.30
9.82    10.82    0.37       8.85    10.87    19.34    10.89    -2.55
20.85   24.35    18.94      32.89   18.47    51.32

• Now, let’s create a residual plot. Plot RESID (vertical
axis) against L1 (horizontal axis).
• Now, let’s see if the residuals appear Normally
distributed. To do this, look at a boxplot and a
histogram. The book used a stemplot.
Crying and IQ
• Based on what we looked at, the relationship appears
to be fairly linear, since the residual plot shows no
pattern.
• The residuals also appear to be approximately
Normally distributed. There is some slight right-
skewness, but we see no serious violations of our
conditions.
Estimating the Parameters
• The first step in inference is to estimate the
unknown parameters α, β, and σ.
• When the regression model describes our data and
we calculate the least-squares regression line, the
slope of the regression line, b, is an unbiased
estimator of the true slope β and the intercept a is
an unbiased estimator of the true intercept α.
Estimating the Parameters
• The remaining parameter of the model is the
standard deviation σ, which describes the variability
of the response y about the true regression line.
• The least-squares regression line estimates the true
regression line. So the residuals estimate how
much y varies about the true line.
• There are n residuals, one for each data point.
Because σ is the standard deviation of responses
about the true regression line, we estimate it by a
sample standard deviation of the residuals.
Estimating the Parameters
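• We call this sample standard deviation a standard
error to emphasize that it is estimated from data.
$s = \sqrt{\frac{1}{n-2}\sum \text{residual}^2} = \sqrt{\frac{1}{n-2}\sum (y - \hat{y})^2}$
• Notice that we divide by n – 2 rather than n – 1.
This is because we estimate two parameters, α and β,
which leaves n – 2 degrees of freedom.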
Crying and IQ
• For our data, we get an LSRL of $\hat{y} = 91.27 + 1.493x$
• The true slope β tells us how much higher the mean IQ
is when the number of peaks in the crying
measurement increases by 1.
• For our example, we estimate the slope β to be
1.493. In other words, IQ is about 1.5 points
higher, on average, for each additional crying peak.
• We also estimate the y-intercept α to be 91.27. This
has no practical meaning, though, because x = 0 is
outside the range of our x values. Our smallest x
value is 9. Also, it is reasonable to believe that all
babies would cry if snapped with a rubber band.
Crying and IQ
• Now we want to find s.
• To do this, we need to know the residuals. They
should be in the list RESID.
• Since $s = \sqrt{\frac{1}{n-2}\sum \text{residual}^2}$, we need to find the
sum of the squares of the residuals.
• We can do this by typing an expression such as
√(sum(LRESID²)/36) in our calculator, where
36 = n – 2. We get 17.499 for s.
• We can also find s by going to TESTS and selecting
F: LinRegTTest. Scroll down and find s. Notice we
also get 17.499 for s.
Confidence Intervals for the Regression Slope
• The slope β of the true regression line is usually
the most important parameter in a regression
problem.
• The slope is the rate of change of the mean
response as the explanatory variable increases.
• We often want to estimate β. The slope b of the
LSRL is an unbiased estimator of β.
• A confidence interval for β is useful because it
shows how accurate the estimate b is likely to be.
• This confidence interval will have the familiar
form: $\text{estimate} \pm t^* \, SE_{\text{estimate}}$
Confidence Intervals for the Regression Slope
• Because our estimate is b, the confidence interval
becomes $b \pm t^* \, SE_b$

• Here are the details.
Crying and IQ

• We can create a 95% confidence interval using the
printout above.

• We can find t* from the table or on the calculator. I
used the calculator.
Crying and IQ
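• We are going to learn in a little while that $t = \frac{b}{SE_b}$
• We need to know this to calculate the interval by
hand.
• Going to LinRegTTest, we can find t and b. This
allows us to find SEb.
• From the calculator, t = 3.0655 and b = 1.4929.
• So $SE_b = \frac{b}{t} = \frac{1.4929}{3.0655} \approx 0.4870$

• Therefore, with df = 38 – 2 = 36 and t* ≈ 2.028, the
95% C.I. is $1.4929 \pm (2.028)(0.4870) \approx (0.51, 2.48)$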
Let’s Try 15.9
Testing the Hypothesis of No Linear Relationship
• The most common hypothesis about the slope is
$H_0: \beta = 0$
• A regression line with slope 0 is horizontal. That is,
the mean of y does not change at all when x
changes. So this H0 says that there is no true linear
relationship between x and y.
• Put another way, H0 says that there is no correlation
between x and y in the population from which we
drew our data.
• You can use the test for zero slope to test the
hypothesis of zero correlation between any two
quantitative variables.
Testing the Hypothesis of No Linear Relationship
• Notice that testing correlation makes sense only if
the observations are a random sample. This is
often not the case in regression settings, where
researchers may fix in advance the values of x they
want to study.
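• The test statistic again takes the form
$t = \frac{\text{estimate} - \text{hypothesized value}}{\text{standard error of the estimate}}$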
• The test statistic is just the standardized version of
the least-squares regression slope b.
• The details are on the next slide.
• Notice that the numerator is just b because we
usually test that the parameter is equal to 0.
Crying and IQ
•   What are our t value and p-value?
•   How could we find these on the calculator?
•   What would we have to show on the AP Exam?

• Where did these numbers come from?
• The calculator gives us b and t, so we use that to
find SEb.
Beer and Blood Alcohol Content
• Let’s return to the beer and blood alcohol
content example from Chapter 3.
• The number of beers a volunteer drank and their
recorded BAC are given in the table below.
Student:    1      2      3      4      5       6       7      8
Beers:      5      2      9      8      3       7       3      5
BAC:       0.10   0.03   0.19   0.12   0.04    0.095   0.07   0.06
Student:    9     10     11     12      13      14     15     16
Beers:      3      5      4      6      5       7       1      4
BAC:       0.02   0.05   0.07   0.10   0.085   0.09    0.01   0.05

• We want to conduct a significance test. We believe
that drinking more beer will increase the BAC.
Beer and Blood Alcohol Content
• Step 1: Hypotheses
$H_0: \beta = 0$ versus $H_a: \beta > 0$, where β is the true slope of
BAC on number of beers.
• Step 2: Conditions for a Linear Regression t Test
– Each observation is independent of the others.
– The scatterplot is reasonably linear and the residual plot
shows no curved pattern, so it is reasonable to treat the
true relationship as linear.
– The residual plot gives no reason to believe that the
standard deviation of the responses about the true line
is not the same everywhere.
– Looking at a histogram or boxplot of the residuals, we can
see that the residuals are skewed right, but there are no
major departures from Normality for a sample this small.
Beer and Blood Alcohol Content
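• Step 3: Calculations
$t = \frac{b}{SE_b} \approx \frac{0.01796}{0.0024} \approx 7.48$ with df = 16 – 2 = 14,
which gives the one-sided p-value used in Step 4.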
• Step 4: Interpretation
– Since our p-value of 0.000001 is smaller than any
standard significance level, we reject H0. We therefore
conclude that there is very strong evidence that
increasing the number of beers does increase BAC.
Beer and Blood Alcohol Content
• Let’s create a 99% C.I. just to review. We have
already done Steps 1 and 2.
• t* with df = 14 would be 2.977.
• Since the calculator gives us t and b, we can find
SEb.
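• So $SE_b = \frac{b}{t} \approx \frac{0.017964}{7.48} \approx 0.0024$

• Hence, the 99% C.I. would be
$0.017964 \pm (2.977)(0.0024) \approx (0.0108, 0.0251)$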
Let’s Try 15.13 and 15.15
Testing other than Slope of 0
• Suppose we want to test $H_0: \beta = 1$ versus $H_a: \beta \neq 1$ for the
data below.

x:   28    46     75   90   24   50   72    73   54    86
y:   24    48     79   86   25   48   70    73   56    81

x:   37    42     39   81   65   62   71    61   58    54
y:   32    45     38   84   70   61   73    66   54    56

• The calculations are the only part that is different.
They appear on the next slide.
Testing other than Slope of 0
• Calculations
– Now $t = \frac{b - 1}{SE_b} = 0.1819$
– Where did SEb come from?
– Since the calculator gives us t and b when we do a
LinRegTTest, we still use this to find SEb. Then we just
plug into the formula and find our own t value for the
slope other than 0.
– P-value = 0.8577. How did we find it?
– 2 × tcdf(0.1819, 10000000000, 18)
– Why did we multiply by 2?
– So we fail to reject H0. There is not enough evidence
to conclude that the true slope differs from 1.
Crying and IQ
[Figures: three image-only slides for the Crying and IQ example, not reproduced]
Minitab Output of Beers vs. BAC

• A word of warning. Computer output usually gives a
two-sided p-value. So if you are finding the p-value
for a one-sided test, and the sample slope falls on the
side that Ha predicts, you need to divide the p-value
by 2.

[Figure: Scatterplot of Beers vs. BAC]
[Figure: Residual Plot of Beers vs. BAC]
[Figure: Histogram of the Beers vs. BAC Residuals]
[Figure: Boxplot of the Beers vs. BAC Residuals]

```
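For readers without a TI calculator, the Crying and IQ estimates (a, b, r, s, SEb, and t) can be reproduced from the data table with a short script. This is only a sketch: the function name `fit_line` is made up for illustration, the pairs are read row by row from the table, and SEb is computed with the standard formula s / sqrt(Σ(x − x̄)²), which the slides do not show (they recover SEb as b / t instead).

```python
import math

# Crying peaks (x) and IQ at age three (y), read row by row from the data table.
crying = [10, 20, 17, 12, 12, 16, 19, 12, 9, 23, 13, 14, 16, 27, 18, 10,
          18, 15, 18, 23, 15, 21, 16, 9, 12, 12, 19, 16, 20, 15, 22, 31,
          16, 17, 30, 22, 33, 13]
iq = [87, 90, 94, 94, 97, 100, 103, 103, 103, 103, 104, 106, 106, 108, 109, 109,
      109, 112, 112, 113, 114, 114, 118, 119, 119, 120, 120, 124, 132, 133, 135, 135,
      136, 141, 155, 157, 159, 162]


def fit_line(x, y):
    """Least-squares fit plus the quantities used for inference on the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # intercept
    r = sxy / math.sqrt(sxx * syy)     # correlation
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Standard error about the line: divide by n - 2, not n - 1.
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    se_b = s / math.sqrt(sxx)          # standard error of the slope
    return {"a": a, "b": b, "r": r, "s": s, "se_b": se_b,
            "t": b / se_b, "df": n - 2, "residuals": residuals}


fit = fit_line(crying, iq)
for name in ("a", "b", "r", "s", "se_b", "t", "df"):
    print(name, round(fit[name], 4))
```

The printed values should agree, up to rounding, with the figures quoted in the slides (b = 1.4929, a = 91.27, r = 0.455, s = 17.499, t = 3.0655).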
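The Beer and Blood Alcohol Content calculations can be checked the same way. The sketch below computes b, SEb, and the t statistic from the table, then builds the 99% confidence interval using t* = 2.977 (df = 14) as given in the slides. The helper name `slope_and_se` is hypothetical, and the SEb formula s / sqrt(Σ(x − x̄)²) is the standard one rather than something stated in the slides.

```python
import math

# Number of beers (x) and measured BAC (y) for the 16 students in the table.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]


def slope_and_se(x, y):
    """Return the least-squares slope, its standard error, and df = n - 2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return b, s / math.sqrt(sxx), n - 2


b, se_b, df = slope_and_se(beers, bac)
t_stat = b / se_b          # test statistic for H0: beta = 0

t_star = 2.977             # 99% critical value for df = 14, taken from the slides
ci = (b - t_star * se_b, b + t_star * se_b)

print(f"b = {b:.6f}, SE_b = {se_b:.6f}, t = {t_stat:.2f}, df = {df}")
print(f"99% CI for the slope: ({ci[0]:.4f}, {ci[1]:.4f})")
```

Because the whole interval lies above 0, it is consistent with the conclusion of the one-sided test that BAC rises with the number of beers.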
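The test of a slope other than 0 can also be sketched in code, including the step that `2 × tcdf(...)` performs on the calculator. Several things here are assumptions: the first row of each block in the data table is treated as x and the second as y (the slides do not label them, but this reading gives a t statistic close to the 0.1819 quoted), the function names are invented, and the tail area is approximated by Simpson's rule on the t density rather than by a library routine.

```python
import math

# Assumed orientation: first row of each block is x, second row is y.
x = [28, 46, 75, 90, 24, 50, 72, 73, 54, 86, 37, 42, 39, 81, 65, 62, 71, 61, 58, 54]
y = [24, 48, 79, 86, 25, 48, 70, 73, 56, 81, 32, 45, 38, 84, 70, 61, 73, 66, 54, 56]


def slope_and_se(xs, ys):
    """Return the least-squares slope, its standard error, and df = n - 2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))
    return b, math.sqrt(sse / (n - 2)) / math.sqrt(sxx), n - 2


def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 with Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3


b, se_b, df = slope_and_se(x, y)
hypothesized_slope = 1.0
t_stat = (b - hypothesized_slope) / se_b

# Two-sided alternative: count both tails, hence the factor of 2.
p_value = 2 * upper_tail(abs(t_stat), df)

print(f"b = {b:.4f}, SE_b = {se_b:.4f}, t = {t_stat:.4f}, df = {df}")
print(f"two-sided p-value = {p_value:.4f}")
```

The factor of 2 answers the question posed in the slides: Ha is two-sided, so a slope estimate this far from 1 in either direction counts as evidence against H0. Small differences from the slide's t = 0.1819 and p = 0.8577 come from rounding b and SEb before dividing.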