# STA 3024 Introduction to Statistics 2, Chapter 5: Simple Linear Regression, by cty88181

STA 3024 Introduction to Statistics 2
Chapter 5: Simple Linear Regression Analysis

As stated in chapter 3 and chapter 4, the table below summarizes the major material
that we need to cover.

Table 1: Methods to Investigate the Association between Variables
                  Explanatory Variable(s)   Response Variable   Method
Chapter 3         Categorical               Categorical         Contingency Tables
Chapter 4         Categorical               Quantitative        Analysis of Variance (ANOVA)
Chapter 5 and 6   Quantitative              Quantitative        Regression Analysis
                  Quantitative              Categorical         (not discussed)

This chapter deals with cases where both the explanatory and response variables are
quantitative, where we’ll use regression analysis to study the association between the two
variables. The regression methods that we’re studying in this chapter are restricted to the
linear regression family (as opposed to nonlinear regression analysis).
If there’s only one quantitative explanatory variable, then we’ll study simple linear
regression. If there is more than one explanatory variable, then we’ll introduce multiple
linear regression.

This chapter corresponds to chapter 12 in our textbook.

PART I - BACKGROUND

1.1 Background and Remarks

Given two quantitative variables, we’d like to find the association between them. For
instance, we might wonder if height and weight are associated in such a way that knowing
one helps predict the other. (Certainly, both height and weight are quantitative variables.)
Now, we go out and collect data. Each person/subject in our experiment should be
identified by the information that we are interested in; that is, each subject is described by
a pair of data (weight, height).

In fact, all data in simple linear regression are described by pairs of numbers (X, Y),
which we can think of as points on a 2D plane. A usual data set for two quantitative
variables looks like

Table 2: Data for Weights and Heights
X (= Weight) Y (= Height)
x1 = 130        y1 = 63
x2 = 133        y2 = 65
x3 = 157        y3 = 70
x4 = 180        y4 = 72
x5 = 160        y5 = 74
x6 = 177        y6 = 73
x7 = 193        y7 = 73
x8 = 126        y8 = 61
x9 = 179        y9 = 63
x10 = 143       y10 = 77
x11 = 157       y11 = 68
x12 = 122       y12 = 65
x13 = 145       y13 = 68
x14 = 135       y14 = 67
x15 = 117       y15 = 61
x16 = 186       y16 = 73
x17 = 170       y17 = 71
x18 = 173       y18 = 71
x19 = 192       y19 = 74
x20 = 155       y20 = 64
x21 = 177       y21 = 77
x22 = 107       y22 = 59
x23 = 138       y23 = 69
x24 = 155       y24 = 70
x25 = 201       y25 = 73

Figure 1: A Scatterplot for a Sample of People where Each Individual is Identified by a
Pair of Numbers (X = Weight, Y = Height).

Recall: a 2D line can be mathematically described by the equation y = m + n ∗ x

Figure 2: A Straight Line y = m + n ∗ x where m Indicates the y-Intercept and n
Indicates the Slope

Let X be the quantitative explanatory variable and let Y be the quantitative response
variable. If the relationship between X and Y can be described by a straight line

µY (X) = α + βX      (where µY (X) is the mean for Y at X)

then simple linear regression is the statistical inference method that analyzes the
association between the two variables. “Simple” because there’s only one explanatory variable.

Important Remark 1:

The formula µY (X) = α + βX describes the relationship between X and Y through the
mean/average of Y , and that’s the best we can do.

We can only say, “The average height for people who weigh 165 lbs is 5ft10.” However,
we can NOT predict the EXACT value of Y based on the explanatory variable X. The following
statement is incorrect: “(blank) weighs 140 lbs, so he/she must be 5ft7.”

We have to take into account that even at the same X value (weight), the observed
Y’s (heights) are not the same most of the time. There is always some amount of variability
in the values of Y with the same value of X. Thus, it is NOT correct to write Y = α + βX.

Important Remark 2:

In the case where X and Y are linearly related by µY (X) = α + βX, the population
parameters α and β are both unknown (why?). We have to use data from samples to compute
the statistics a and b as the estimates for the parameters α and β respectively.

There are a number of methods to find a and b. Each method depends on the criterion
chosen before conducting the experiment. We are only interested in the least squares method
which estimates α and β by (the criteria) minimizing the sum of squares of errors. (We will

From the sample data, we obtain the estimates a and b, which describe the regression
equation between Y and X as

Ŷ(X) = a + bX

where Ŷ(X) indicates the predicted value for Y at the value X using the estimates a and b
obtained by the least squares method from the sample data.

1.2 Making Predictions

The most basic use of the regression equation is to make predictions. We can predict
the value of Y at a given value of X by simply plugging that value of X into the regression
equation and seeing what we get for Ŷ. Of course, we would not think that the actual Y value
would be exactly equal to our prediction (as discussed above). The prediction is our
single-number best guess for a Y value at that particular X value.

Existing observations
If we plug in the X value corresponding to an observation that we actually have in the
sample, we can write the predicted value as Ŷi as an abbreviation, where i is a number from
1 to n that represents which observation we’re talking about. So if we plug each observation’s
X value into the regression equation, we get a list of predicted values.

Example Weights and Heights (cont.) Say, out of thin air, I declare the regression
equation for Weight and Height is Ŷi = 123 − 0.4 ∗ Xi or Predicted Height = 123 − 0.4 ∗ Weight.
Now what is the predicted height when the weight is 155? (Note that X20 = 155.)

Solution:

New observations
We could instead plug in a value of X that doesn’t correspond to any observation that
we actually have. Then we would be predicting the Y value for a new observation. However,
we also need to be careful to avoid making predictions for X values that are outside the
range of X values for which we actually have data in the sample. Making a prediction for
an X value outside the range of the X values in the data is called extrapolation, and it
can lead to predictions that are unreliable or even useless.

Example Weights and Heights (cont.) Using the same regression equation, let’s do some
extrapolation. What’s the predicted height when the weight is 300lbs? (Note that X = 300
is out of our range for weight) Does the answer make sense?

Solution:

Note that the further out of the range for X, the worse the extrapolation becomes.
However, sometimes we still use extrapolation when the value X is reasonably close to the
range. For example, people often extrapolate by using historical data to predict the future.
If we have data from 1984 to 2008, making a prediction for 2009 would technically be
extrapolation, but it might be okay in some circumstances. However, we probably wouldn’t
want to use that data to make a prediction for, say, the year 3000.
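
The two predictions asked for in this section can be sketched in a few lines of Python. This is only an illustration: the helper names `predict_height` and `is_extrapolation` are made up here, and the equation 123 − 0.4 ∗ Weight is the "out of thin air" equation from the example, not a fitted line.

```python
# Hypothetical helpers for illustration; the coefficients are the made-up
# ones from the example (Predicted Height = 123 - 0.4 * Weight).
def predict_height(weight, a=123.0, b=-0.4):
    """Return the predicted height a + b * weight."""
    return a + b * weight


def is_extrapolation(x, x_values):
    """Return True when x lies outside the range of the observed X values."""
    return x < min(x_values) or x > max(x_values)


# Weights (X) from Table 2.
weights = [130, 133, 157, 180, 160, 177, 193, 126, 179, 143, 157, 122, 145,
           135, 117, 186, 170, 173, 192, 155, 177, 107, 138, 155, 201]

for w in (155, 300):
    label = "extrapolation" if is_extrapolation(w, weights) else "within range"
    print(f"weight {w}: predicted height {predict_height(w):.1f} ({label})")
```

With this equation the prediction at 155 is about 61, while the prediction at 300 is about 3, a height that clearly makes no sense. The observed weights only run from 107 to 201, so 300 is far outside the range of the data.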

1.3 Regression toward the mean

“Regression” may seem like a strange name for the process used to analyze the
association between quantitative variables. (After all, what is it that’s regressing?) The
procedure itself is actually named after a specific phenomenon that the procedure often
uncovered in a variety of situations in the early days of statistics. That phenomenon is called
regression toward the mean, and it’s best explained with an example.

A class of students takes two editions of the same test on two successive days. It has
frequently been observed that the worst performers on the ﬁrst day will tend to improve
their scores on the second day, and the best performers on the ﬁrst day will tend to do
worse on the second day. The phenomenon occurs because student scores are determined
in part by underlying ability and in part by chance. For the ﬁrst test, some will be lucky,
and score more than their ability, and some will be unlucky and score less than their ability.
Some of the lucky students on the ﬁrst test will be lucky again on the second test, but more
of them will have (for them) average or below average scores. Therefore a student who was
lucky on the ﬁrst test is more likely to have a worse score on the second test than a better
score. Similarly, students who score less than the mean on the ﬁrst test will tend to see
their scores increase on the second test.

Regression toward the mean is a very common phenomenon. We see it almost any time
that subjects are measured once and then measured again. It’s also extremely important to
remember, because many things that might otherwise be considered real effects can actually
be explained simply as regression toward the mean.
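
The two-test story above can be imitated with a small simulation. The sketch below is an assumption-laden toy model, not data from the course: each score is modeled as a fixed "ability" plus independent normally distributed "luck" on each day, and the numbers (1000 students, means, spreads, seed) are arbitrary choices.

```python
import random


def simulate_two_tests(n_students=1000, seed=0):
    """Simulate two scores per student as ability plus independent luck."""
    rng = random.Random(seed)
    ability = [rng.gauss(70, 10) for _ in range(n_students)]
    day1 = [a + rng.gauss(0, 10) for a in ability]  # luck on day 1
    day2 = [a + rng.gauss(0, 10) for a in ability]  # fresh luck on day 2
    return day1, day2


def mean(values):
    return sum(values) / len(values)


day1, day2 = simulate_two_tests()

# Rank students by their day-1 score and pick the extremes.
order = sorted(range(len(day1)), key=lambda i: day1[i])
bottom, top = order[:100], order[-100:]

bottom_change = mean([day2[i] - day1[i] for i in bottom])
top_change = mean([day2[i] - day1[i] for i in top])
print(f"worst 100 on day 1, average change on day 2: {bottom_change:+.1f}")
print(f"best 100 on day 1, average change on day 2:  {top_change:+.1f}")
```

Nobody's ability changes in this model, yet the worst day-1 performers improve on average and the best ones decline on average. That is regression toward the mean produced by chance alone.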

Suppose the 15 students with the lowest scores on the ﬁrst exam all hire tutors before
the second exam. Of those 15 students, 11 improve their score on the second exam. This
might be because the tutoring helped, but it might also be explained simply as regression
toward the mean. Further investigation would be needed to determine whether the tutoring

The term “regression” was coined by Francis Galton, a cousin of Charles Darwin, in
the nineteenth century to describe a biological phenomenon. The phenomenon was that
the heights of descendants of tall ancestors tend to regress down towards a normal average.
For Galton, regression had only this biological meaning, but his work was later extended
by Udny Yule and Karl Pearson to a more general statistical context.

PART II - LEAST SQUARES METHOD

Let X and Y have a linear relation µY (Xi) = α + βXi.
We’ve established that α and β are the population parameters and that they are unknown
constants. Based on the sample data, we’ll then try to come up with the estimates for α
and β, called a and b respectively.
The predicted value for Yi is denoted by Ŷi and it has the following formula

Ŷi = a + bXi

Since Ŷi is the predicted value for the true Yi, the two values are almost always different,
which results in some error terms. The errors/residuals are defined to be

ei = Yi − Ŷi

The least squares method is a method of finding the estimates a and b so that the
sum of squares of the residuals is minimized. That is, by solving the criterion of minimizing

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \left[Y_i - (a + bX_i)\right]^2 \qquad (*)$$

we’ll be able to find a and b from the sample data (using differentiation).

In fact, by solving the criterion of minimizing (*), we obtain the following formulas for the
two coefficients

b = r (sY / sX)   and   a = Ȳ − bX̄

where:

• r is the sample correlation between X and Y (see Part III).

• sY is the standard deviation of the Y data (either given or can be calculated by your calculator).

• sX is the standard deviation of the X data (either given or can be calculated by your calculator).

• Ȳ is the sample average of the Y data.

• X̄ is the sample average of the X data.
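
As a rough sketch of how these formulas turn raw data into a and b, the Python below computes the sample means, sample standard deviations, and sample correlation by hand and then applies b = r (sY / sX) and a = Ȳ − bX̄. The function names and the five-point data set are made up for illustration only.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def sample_correlation(xs, ys):
    """Average product of z-scores, using n - 1 in the denominator."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def least_squares(xs, ys):
    """Return (a, b) with b = r * sY / sX and a = Ybar - b * Xbar."""
    r = sample_correlation(xs, ys)
    b = r * sample_sd(ys) / sample_sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Small made-up data set, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"sum of squared residuals = {sum_squared_residuals(xs, ys, a, b):.3f}")
```

Nudging either a or b away from the returned values and recomputing `sum_squared_residuals` gives a larger sum, which is what the least squares criterion (*) promises.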

Interpret the Slope b

The slope b tells us how much the predicted value Ŷi changes when the corresponding
Xi increases by one unit.

Moreover, the sign of the slope also indicates the direction of the association between
the two variables X and Y:

• If b > 0 then X and Y have a positive relationship; that is, if X increases then Y
tends to increase.

• If b < 0 then X and Y have a negative relationship; that is, if X increases then Y
tends to decrease.

Note: we didn’t mention the possibility where b = 0 (why?)

Interpret the Intercept a

The intercept a is simply the predicted value Ŷ where X = 0.
Be careful though, sometimes the interpretation for the intercept a does not make logical
sense. Still, we accept it as the reasonable estimate in the “least squares” context (that is,
if it meets the least squares criterion, it’s good enough).

Example Weight vs. Height (cont.) Given the data in Table 2, MINITAB gives the
following descriptive output

Variable         N        Mean        Median        TrMean         StDev
weight          25      156.32        157.00        156.52         26.07
height          25       68.84         70.00         68.91          5.10

• Given that r = 0.713, ﬁnd the least squares regression equation.

• Interpret the slope and the intercept.

Solution:
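
One way to carry out the arithmetic for this example is sketched below. It only plugs the MINITAB summary statistics and the given r = 0.713 into b = r (sY / sX) and a = Ȳ − bX̄; the function name `slope_and_intercept` is made up for illustration.

```python
def slope_and_intercept(r, s_x, s_y, mean_x, mean_y):
    """Least squares coefficients from summary statistics."""
    b = r * s_y / s_x
    a = mean_y - b * mean_x
    return a, b


# Summary statistics from the MINITAB output (weight = X, height = Y).
a, b = slope_and_intercept(r=0.713, s_x=26.07, s_y=5.10,
                           mean_x=156.32, mean_y=68.84)
print(f"Predicted Height = {a:.2f} + {b:.4f} * Weight")
```

This gives a slope of roughly 0.14 and an intercept of roughly 47.04. Read this way, each additional pound of weight goes with about 0.14 more units of predicted height, while the intercept is the predicted height at a weight of 0, which has no sensible real-world meaning.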

PART III - THE CORRELATION

3.1 The Correlation r

When quantitative variables X and Y have an approximately linear relationship, we
can measure the strength of that relationship with a quantity called the correlation r. The
correlation summarizes the direction of the association between two quantitative variables
and the strength of its straight-line trend. If we have a sample drawn from a population,
then there are two different quantities we can talk about:

• The population correlation, ρ, measures the strength of the association in the
population.

• The sample correlation, r, measures the strength of the association in the sample.

We seldom know the value of ρ, so we typically estimate it with the value of r.

Note: The formula for the sample correlation r is

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{s_X}\right) \left(\frac{Y_i - \bar{Y}}{s_Y}\right)$$

but we will never actually calculate r using this formula.
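
Although we will not compute r by hand, the formula is easy to sketch in code. The function below is a direct, unoptimized transcription of it; the function name and the three tiny data sets are made up for illustration.

```python
from math import sqrt


def sample_correlation(xs, ys):
    """Sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    products = (((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


xs = [-2, -1, 0, 1, 2]
rising_line = [2 * x + 1 for x in xs]   # exact straight line, positive slope
falling_line = [-x for x in xs]         # exact straight line, negative slope
parabola = [x * x for x in xs]          # clear relationship, but not linear

print(sample_correlation(xs, rising_line))
print(sample_correlation(xs, falling_line))
print(sample_correlation(xs, parabola))
```

The two exact lines give correlations of +1 and −1 (up to floating-point rounding), and the symmetric parabola gives a correlation of essentially 0 even though Y is completely determined by X.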

Properties of the correlation r

• The correlation r has the same sign as the slope b. Therefore, just like b, r indicates
the direction that X and Y are related. If r > 0 then there’s a positive relationship.
If r < 0 then there’s a negative relationship.

• The correlation r can only take values between -1 and +1; that is −1 ≤ r ≤ +1. The
closer r is to ±1, the closer the data points fall to a straight line, and the stronger
the association. The closer r is to zero, the weaker the association.

Figure 3. Some Scatterplots and Their Correlation

Figure 4. An Illustration of a Data Set that has Correlation r = 0

Note: The correlation is only useful for measuring relationships that are linear.

Figure 5. The Data Set above Clearly Shows There Is a Relationship between X and Y,
but the Correlation r Is Still Equal to 0 because the Relationship Is Not Linear. Thus, It Is
Always Wise to Look at a Scatterplot of the Data First, to See if the Correlation Will

3.2 The Roles and Limitations of Correlation r

Why do we need the correlation? Why can’t we use the slope b to describe the strength
of the association?
The reason is that the slope’s numerical value depends on the units of measurement.
Recall that the slope b tells us how much the predicted Y value changes when the
corresponding X value increases by 1 unit. The slope b is therefore measured in units of Y
per unit of X.
The correlation r, on the other hand, can be thought of as the standardized version
of the slope, which does not depend on any units of measurement. The standardization
adjusts the slope b for the way it depends on the standard deviations of X and Y. Since
the correlation r and the slope b are related by b = r (sY / sX), equivalently,

r = b (sX / sY).

What Factors Aﬀect the Correlation?

1. Just as outliers can greatly inﬂuence the regression equation, they can also greatly
inﬂuence the value of r. Accordingly, the correlation may not be particularly useful
when outliers are present.

2. If the subjects are grouped for the observations, the correlation r tends to increase in
magnitude. This can be deceptive, because it can make X and Y appear to be more
strongly linearly related than they actually are.
For example, suppose we have a sample with 50 observations. Instead of treating
our data in the usual way, we could instead make 10 groups of ﬁve, average the data
within each group, and then treat our group averages like a new data set with just
10 observations. If we did this, we would probably increase the magnitude of the
correlation.

Figure 6. The Scatterplot Displays the 50 Individual Data Points and Their
Correlation r = 0.627

Figure 7. The Scatterplot Displays the Group Means for 10 Groups of 5 from the
Data Used in Figure 6. The Correlation Now Becomes r = 0.901

3. The size of the correlation r also depends on the range of X values sampled: the
correlation tends to be smaller when we sample only a restricted range of X values
than when we use the entire range.

Figure 8. If We Look at the Complete Data Set Presented in Figure 6, then the
Correlation r = 0.627. However, If We Restrict the Data and Only Look at the Data
where X Runs from 148 to 168, then the Correlation for the Data in that Range Is
r = 0.596.

PART IV - SUM OF SQUARES, ANOVA TABLE, AND MORE ON
CORRELATION

4.1 Sum of Squares and ANOVA Table for Regression

We already saw that in one-way and two-way ANOVA, we can construct something called
the ANOVA table, which breaks down the variability in the data into diﬀerent sources. It
turns out that we can also construct an ANOVA table for simple linear regression to analyze
the variability in the response variable (Y) values. The regression ANOVA table, like the
ANOVA tables we’ve seen before, includes sums of squares, degrees of freedom, and mean
squares.

The prediction error is the difference between the observed value Yi and its predicted
value. There are two types of prediction errors:
• The error using the regression line to make a prediction is Yi − Ŷi. This type of error
gives rise to the residual sum of squares.
• The error using the sample mean Ȳ to make a prediction is Yi − Ȳ. This type of error
gives rise to the total sum of squares.
There is also one more type of “error”, that is, the difference between the predicted value
Ŷi and the sample mean Ȳ, which gives rise to the regression sum of squares.

From the three types of error mentioned above, it follows that the ANOVA table for
regression involves three different sums of squares:

• The regression sum of squares, or SSRegr, measures the variability due to the
regression equation. The regression equation gives us a predicted value, Ŷi, for each
observation, so we measure the variability of those around the sample mean Ȳ using
the formula

$$SS_{Regr} = \sum_{i=1}^{n} \left(\hat{Y}_i - \bar{Y}\right)^2$$

• The residual sum of squares, or SSRes, measures the variability of the actual
observed Yi values around their predicted values Ŷi. Its formula is

$$SS_{Res} = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2$$

• The total sum of squares, or SSTotal, measures the total variability of the Yi values
around the sample mean Ȳ. Since it explains the overall/total variability in the data,
the total sum of squares is the sum of all of the other SS’s. Its formula is

$$SS_{Regr} + SS_{Res} = SS_{Total} = \sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2$$

As we’ve seen before, there are quantities called degrees of freedom that are associated
with each sum of squares. Their formulas are

dfRegr = 1
dfRes = n − 2
dfTotal = n − 1 = dfRegr + dfRes

where n is the usual notation for the sample size.

The mean squares are just the sums of squares divided by their degrees of freedom:

MSRegr = SSRegr / dfRegr   and   MSRes = SSRes / dfRes

It turns out that MSRes makes a good estimate of σ², the variance of the population of
Y values at each X value.

We typically summarize all this information in the regression ANOVA table, which is
laid out as shown

Source       df                SS        MS
Regression   dfRegr = 1        SSRegr    MSRegr = SSRegr / dfRegr
Error        dfRes = n − 2     SSRes     MSRes = SSRes / dfRes
Total        dfTotal = n − 1   SSTotal
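
The pieces of this table can be computed directly from the definitions. The sketch below is one possible way to do it; the function name `regression_anova` and the five-point data set are made up, and the slope is computed as Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)², which is an algebraically equivalent form of b = r (sY / sX).

```python
def regression_anova(xs, ys):
    """Return the entries of the simple linear regression ANOVA table."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx                 # equivalent to r * sY / sX
    a = mean_y - b * mean_x
    fitted = [a + b * x for x in xs]

    ss_regr = sum((f - mean_y) ** 2 for f in fitted)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    df_regr, df_res = 1, n - 2
    return {
        "df": (df_regr, df_res, n - 1),
        "SS": (ss_regr, ss_res, ss_total),
        "MS": (ss_regr / df_regr, ss_res / df_res),
    }


# Small made-up data set, for illustration only.
table = regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for source, df, ss in zip(("Regression", "Error", "Total"),
                          table["df"], table["SS"]):
    print(f"{source:<12}{df:>4}{ss:>10.3f}")
print("MSRegr, MSRes =", tuple(round(ms, 3) for ms in table["MS"]))
```

For this toy data set the three sums of squares come out as 3.6, 2.4, and 6.0, so the identity SSRegr + SSRes = SSTotal can be checked by eye.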

4.2 Coeﬃcient of Determination

The better our regression equation is at making predictions (hence, better predictive
power), the closer the observed values will be to their predicted values, and hence the
smaller the residuals will be.

We can quantify this predictive power using the coefficient of determination, R²,
which we define as

R² = SSRegr / SSTotal

Recall that SSTotal = SSRegr + SSRes, so R² is a number that tells us what proportion
of SSTotal comes from SSRegr.

• If our regression equation has high predictive power, SSRes is small. Then most of
SSTotal comes from SSRegr, so R² is close to 1.

• If our regression equation has poor predictive power, SSRes is large. Then most of
SSTotal comes from SSRes instead of SSRegr, so R² is close to 0.

It turns out that R² is equal to the square of the correlation r.
Both the correlation r and the coefficient of determination R² describe the strength of
the association. However, their interpretations are a bit different. The correlation r falls
between -1 and +1, and it governs the extent of “regression toward the mean.” The R²
measure falls between 0 and +1 (or equivalently, 0% to 100%), and it summarizes the
reduction in sum of squared errors in predicting Y using the regression line instead of using
the mean of Y.
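
The claim that R² equals the square of r can be checked numerically. The sketch below computes R² from the sums of squares and r from its own formula on a small made-up data set; the function name `r_squared_and_r` is hypothetical.

```python
from math import sqrt


def r_squared_and_r(xs, ys):
    """Return (R^2 from the sums of squares, sample correlation r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)      # this is SSTotal
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = s_xy / s_xx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_regr = s_yy - ss_res

    r_squared = ss_regr / s_yy
    r = s_xy / sqrt(s_xx * s_yy)
    return r_squared, r


# Small made-up data set, for illustration only.
r_squared, r = r_squared_and_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"R^2 = {r_squared:.4f}, r = {r:.4f}, r^2 = {r * r:.4f}")
```

Here R² is 0.6, meaning that using the regression line instead of Ȳ removes 60% of the squared prediction error, and r is the positive square root of that value because the slope is positive.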

Least Squares Regression Equation As “Best Fit”

Notice from the formulas that SSRes depends on what the predicted values actually are.
Here’s what we mean by the least squares regression equation being the “best fit” for the
data:
The least squares regression equation picks the values of a and b that make SSRes minimal.

SSRes is the residual sum of squares. The “best fit” is the one where the predicted
values are as close as possible to their observed values, in the overall sense of SSRes. In
other words, changing the value of either regression coefficient would yield a larger
SSRes.

Note: We mean the regression equation is the “best fit” when using this particular X
variable. We might be able to get a smaller SSRes by using a completely different X variable
or by using multiple X variables, but that’s not what we’re talking about here.

Example Smoking and Nicotine The following table gives the level of continine for a
person who smokes a certain number of cigarettes per day

Table 3: Data for Smoking and Nicotine
X (= Cigarettes per day) Y (= Level of continine)
60                       179
10                       283
4                        76
15                       174
10                       209
1                        10
20                       350
MINITAB gives the following descriptive output for the data

Variable         N         Mean         StDev
cigarett         7        17.14         19.94
continin         7        183.0         115.5

Given that r = 0.263, find the least squares regression equation for the data. Interpret
the slope and the intercept.

Solution:

Using the regression equation found above, find the best predicted level of
continine for a person who smokes 40 cigarettes per day. Interpret your prediction.

Solution:

MINITAB gives us the following ANOVA Table

Analysis of Variance

Source                DF            SS              MS
Regression                        5507
Residual Error
Total                            80040

Fill in all of the missing information to complete the ANOVA Table.

The scatterplot for the data is presented below

Figure 9. The Scatterplot for the Data from Table 3.

It’s suspected that the first observation is an outlier. In practice, it’s forbidden to
deliberately censor data in order to make the data “prettier”. However, just for this problem,
let’s delete the first observation and see what happens. The new data and new scatterplot
are presented below

Table 4: New Data for Smoking and Nicotine After We Delete the First Observation
X (= Cigarettes per day) Y (= Level of continine)
10                         283
4                          76
15                         174
10                         209
1                          10
20                         350

Figure 10. The New Scatterplot for the Data from Table 4 After We Delete the First
Observation.

The new regression equation and ANOVA Table are found to be

The regression equation is
new continine = 25.7 + 15.8 new cigarettes

Predictor          Coef        SE Coef             T         P
Constant          25.65          53.30          0.48     0.655
new ciga         15.802          4.499          3.51     0.025

Analysis of Variance

Source               DF            SS             MS
Regression
Residual Error                  19596
Total                           80021

Fill in the missing information to complete the ANOVA Table.

Interpret the new slope and the new intercept. Based on the given information,
calculate and interpret the coefficient of determination R², and in turn, find the correlation r.
Compare the new r value to the r value given from Table 3.

Solution:
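
The arithmetic for the ANOVA table, R², and r can be sketched as follows. It uses only the numbers printed in the MINITAB output above (SSRes = 19596, SSTotal = 80021, slope 15.802) together with n = 6 from Table 4; the variable names are made up for illustration.

```python
from math import sqrt

n = 6                                  # observations left in Table 4
ss_res, ss_total = 19596, 80021        # from the MINITAB ANOVA output
slope = 15.802                         # from the MINITAB coefficient table

# Fill in the missing ANOVA entries.
ss_regr = ss_total - ss_res
df_regr, df_res, df_total = 1, n - 2, n - 1
ms_regr = ss_regr / df_regr
ms_res = ss_res / df_res

# Coefficient of determination, and r with the same sign as the slope.
r_squared = ss_regr / ss_total
r = sqrt(r_squared) if slope > 0 else -sqrt(r_squared)

print(f"Regression      {df_regr}  {ss_regr}  {ms_regr:.0f}")
print(f"Residual Error  {df_res}  {ss_res}  {ms_res:.0f}")
print(f"Total           {df_total}  {ss_total}")
print(f"R^2 = {r_squared:.3f}, r = {r:.3f}")
```

R² comes out near 0.755 and r near 0.869, compared with r = 0.263 for the full data in Table 3. Dropping a single suspected outlier changes the apparent strength of the linear association dramatically, which illustrates how sensitive r is to outliers.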

PART V - INFERENCE WITH REGRESSION FOR THE SLOPE β

We have been spending a good deal of time on studying the strength of the relationship
between two quantitative variables X and Y. However, we must also ask whether or
not the response variable Y depends on the explanatory variable X to begin with.
Recall that this is the same as asking whether or not the slope β = 0 (or, sometimes we
can also use the criterion ρ = 0). In this section, we’ll see how to answer that question.

So far, we’ve familiarized ourselves with various hypothesis tests and CI methods, and
those are the inferential tools that we’ll be using in this section. First, we’ll go through two
types of hypothesis tests for β = 0 (or ρ = 0): the t-test and the F-test. Then, we’ll
look at the CI’s for the slope β.

5.1 The t-tests for β

1. Assumptions
• Quantitative variables where the population means of Y at diﬀerent values of X
have a straight line relationship with X; that is, µY (X) = α + βX.
• Simple random sample.
• Population distribution for Y is approximately normal with the same standard
deviation at each X value.
• The data contains no extreme outliers.
2. Hypotheses
• Null: H0 : β = 0.
• Alternative: There are three possible alternative hypotheses.
– H1 : β ≠ 0 which results in a two-sided test.
– H1 : β < 0 which results in a one-sided test.
– H1 : β > 0 which results in a one-sided test.
3. Test statistic

$$t_0 = \frac{b - 0}{se_b}$$

where se_b is the standard error for the coefficient b (provided by statistical software;
if not, the formula is given below).

$$se_b = \sqrt{\frac{MS_{Res}}{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}}$$

4. P-value Use the t-distribution with df = n − 2. Be cautious, the P-value for the one-sided
test is different from the P-value for the two-sided test. Thus, it’s important to be
able to identify which test it is to find the correct P-value.
5. Conclusion If we can find the corresponding P-value for our test statistic, then the
conclusion based on the P-value is straightforward. Smaller P-values give stronger
evidence against H0. If a decision is needed, reject H0 if P-value ≤ α, the significance
level. However, most of the time, the P-value can only be found using statistical software.
If we cannot find the P-value when solving problems by hand, we can still make a
conclusion based solely on the test statistic. Given a significance level α,

• For the one-sided test with H1 : β > 0, if t0 > tα, then we can reject the null H0.
(For H1 : β < 0, reject H0 if t0 < −tα.)
• For the two-sided test, if |t0| > tα/2, then we can reject the null H0.

Example Smoking and Nicotine (cont.) MINITAB gives the following output for the
data obtained from Table 3.
The regression equation is
Continine = 157 + 1.52 Cigarettes

Predictor          Coef        SE Coef            P
Constant         156.95          62.98        0.055
Cigarett          1.520          2.500        0.570

S = 122.1          R-Sq = 6.9%          R-Sq(adj) = 0.0%
Test whether β = 0 using the one-sided test with the H1 : β > 0.

Solution:
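
A sketch of the calculation for this one-sided test is given below. The example does not state a significance level, so α = 0.05 is an assumption made here, and the critical value 2.015 is the usual t-table entry for an upper-tail area of 0.05 with 5 degrees of freedom. The second half cross-checks MINITAB's SE Coef against se_b = √(MSRes / Σ(Xi − X̄)²), using the cigarette counts from Table 3 and MSRes = 14907 from the ANOVA table in section 5.3.

```python
from math import sqrt

# Values read from the MINITAB output above.
b, se_b, n = 1.520, 2.500, 7

t0 = (b - 0) / se_b
df = n - 2

# Assumption: alpha = 0.05. 2.015 is the t-table value for an
# upper-tail area of 0.05 with 5 degrees of freedom.
t_crit = 2.015
reject_h0 = t0 > t_crit
print(f"t0 = {t0:.3f}, df = {df}, reject H0: {reject_h0}")

# Cross-check of SE Coef: sqrt(MSRes / sum of squared deviations of X).
xs = [60, 10, 4, 15, 10, 1, 20]
mean_x = sum(xs) / len(xs)
se_b_check = sqrt(14907 / sum((x - mean_x) ** 2 for x in xs))
print(f"se_b recomputed from the data: {se_b_check:.3f}")
```

Since t0 is about 0.61, far below 2.015, we fail to reject H0 at the assumed 5% level: the data give no convincing evidence that β > 0. The P column in the output (0.570) is the two-sided P-value; because t0 is positive, the P-value for H1 : β > 0 is half of that, about 0.285.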

5.2 The t-tests for ρ

1. Assumptions

• Quantitative variables where the population means of Y at diﬀerent values of X
have a straight line relationship with X; that is, µY (X) = α + βX.
• Simple random sample.
• Population distribution for Y is approximately normal with the same standard
deviation at each X value.
• The data contains no extreme outliers.

2. Hypotheses

• Null: H0 : ρ = 0.
• Alternative: There are three possible alternative hypotheses.
– H1 : ρ ≠ 0 which results in a two-sided test.
– H1 : ρ < 0 which results in a one-sided test.
– H1 : ρ > 0 which results in a one-sided test.

3. Test statistic

$$t_0 = \frac{r - 0}{\sqrt{\dfrac{1 - R^2}{n - 2}}}$$

4. P-value Use the t-distribution with df = n − 2. Be cautious, the P-value for the one-sided
test is different from the P-value for the two-sided test. Thus, it’s important to be
able to identify which test it is to find the correct P-value.

5. Conclusion If we can find the corresponding P-value for our test statistic, then the
conclusion based on the P-value is straightforward. Smaller P-values give stronger
evidence against H0. If a decision is needed, reject H0 if P-value ≤ α, the significance
level. However, most of the time, the P-value can only be found using statistical software.
If we cannot find the P-value when solving problems by hand, we can still make a
conclusion based solely on the test statistic. Given a significance level α,

• For the one-sided test with H1 : ρ > 0, if t0 > tα, then we can reject the null H0.
(For H1 : ρ < 0, reject H0 if t0 < −tα.)
• For the two-sided test, if |t0| > tα/2, then we can reject the null H0.

Example Smoking and Nicotine (cont.) MINITAB gives the following output for the
data obtained from Table 3.
Test whether ρ = 0 using the two-sided test.

Solution:

5.3 The F-test for β

1. Assumptions
• Quantitative variables where the population means of Y at diﬀerent values of X
have a straight line relationship with X; that is, µY (X) = α + βX.
• Simple random sample.
• Population distribution for Y is approximately normal with the same standard
deviation at each X value.
• The data contains no extreme outliers.
2. Hypotheses
• Null: H0 : β = 0.
• Alternative: H1 : β ≠ 0.
3. Test statistic

$$F_0 = \frac{MS_{Regr}}{MS_{Res}}$$

The F0 sampling distribution has df1 = 1 and df2 = n − 2 (why?).
4. P-value
Recall the definition of the P-value: The P-value is the probability of getting a test
statistic value at least as extreme as the one observed, if H0 is true. The P-value is a
tail probability from the F distribution the test statistic has when H0 is true.
Remember that the larger the value of F, the more evidence there is against H0. So
the P-value is the probability of getting an F value larger than the one we actually
got, if H0 is true. To calculate this probability exactly, we typically need statistical
software.
5. Conclusion If we can find the corresponding P-value for our test statistic, then the
conclusion based on the P-value is straightforward. Smaller P-values give stronger
evidence against H0. If a decision is needed, reject H0 if P-value ≤ α, the significance
level. However, most of the time, the P-value can only be found using statistical software.

The F-test for β is equivalent to the two-sided t-test for β. In fact, the three tests
(the two-sided t-test for β, the two-sided t-test for ρ, and the F-test) are equivalent to each
other. Depending on what type of information is given, we can always choose the test that
makes our lives easier.
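
The equivalence can be checked numerically. The sketch below recomputes all three test statistics from the raw Table 3 data rather than from rounded output, so that the agreement is exact up to floating-point error; the variable names are made up for illustration.

```python
from math import sqrt

# Table 3 data (cigarettes per day, level of continine).
xs = [60, 10, 4, 15, 10, 1, 20]
ys = [179, 283, 76, 174, 209, 10, 350]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
s_xx = sum((x - mean_x) ** 2 for x in xs)
s_yy = sum((y - mean_y) ** 2 for y in ys)            # SSTotal
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

b = s_xy / s_xx                      # least squares slope
ss_regr = b * s_xy                   # equals the sum of (Yhat_i - Ybar)^2
ss_res = s_yy - ss_regr
ms_res = ss_res / (n - 2)

t_beta = b / sqrt(ms_res / s_xx)                     # t statistic for beta
r = s_xy / sqrt(s_xx * s_yy)                         # sample correlation
t_rho = r / sqrt((1 - r ** 2) / (n - 2))             # t statistic for rho
f0 = (ss_regr / 1) / ms_res                          # F statistic

print(f"t for beta = {t_beta:.4f}")
print(f"t for rho  = {t_rho:.4f}")
print(f"F = {f0:.4f}, (t for beta)^2 = {t_beta ** 2:.4f}")
```

The two t statistics coincide and their square equals F0, which is why the tests always lead to the same two-sided conclusion.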

Example Smoking and Nicotine (cont.) MINITAB gives the following output for the
data obtained from Table 3.

Analysis of Variance

Source                DF            SS               MS           P
Regression             1          5507             5507       0.570
Residual Error         5         74533            14907
Total                  6         80040

Test whether β = 0 using the F-test.

Solution:

5.4 Conﬁdence Intervals for the Slope

The regression F and t tests simply ask whether it’s reasonable that β = 0. Instead, we
might be interested in figuring out what all the reasonable values are for β. We do this by
making a confidence interval. The formula for the 100(1 − α)% CI for β is

$$b \pm t_{\alpha/2}\, se_b$$

where tα/2 comes from the t-distribution with df = n − 2 and se_b is the standard error for
the coefficient b (provided by statistical software; if not, the formula is given below).

$$se_b = \sqrt{\frac{MS_{Res}}{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}}$$

The standard interpretation of a 100(1 − α)% confidence interval for β is as follows:
We are 100(1 − α)% confident that the true value of β is between (lower bound) and (upper
bound).

More specifically, remember that by “100(1 − α)% confident,” we mean that if we could take
millions of random samples from this same population, do a regression for each one, and
calculate a 100(1 − α)% confidence interval for β each time, then 100(1 − α)% of those
intervals would contain the true value of β.

Example Smoking and Nicotine (cont.) Find and interpret the 99% CI for β based on
our data from Table 3 (use the MINITAB output given from section 5.1)
Solution:
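
The interval can be sketched numerically as follows, using b = 1.520 and SE Coef = 2.500 from the MINITAB output in section 5.1. The critical value 4.032 is the usual t-table entry for an upper-tail area of 0.005 with df = n − 2 = 5.

```python
# Values read from the MINITAB output in section 5.1.
b, se_b, n = 1.520, 2.500, 7

# t-table value for an upper-tail area of 0.005 with n - 2 = 5 df.
t_crit = 4.032

margin = t_crit * se_b
lower, upper = b - margin, b + margin
print(f"99% CI for beta: ({lower:.2f}, {upper:.2f})")
```

The interval runs from roughly −8.56 to 11.60. It contains 0, which agrees with the tests above: on this data we cannot rule out β = 0.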

PART VI - INFERENCE WITH REGRESSION AT A SPECIFIC X
VALUE

For a straight-line regression model, we estimate µY, the population mean of Y at a
given value of X, by the least squares regression equation Ŷi = a + bXi. How good is this
estimate? We can find the 100(1 − α)% CI for the unknown population parameter µY.

Furthermore, the estimate Ŷi = a + bXi for the mean of Y at a fixed value of X is also
a prediction for the outcome of Y for a particular subject at that value. How good is the
prediction? We can find the 100(1 − α)% prediction interval (PI) for the unknown value of
the subject at a particular value X.

So what is the diﬀerence between the PI and the CI?
The prediction interval for Y is an inference about where individual observations fall,
whereas the conﬁdence interval for µ is an inference about where a population mean falls.
Use a prediction interval for Y if we want to predict where a single observation on Y
will fall.
Use a conﬁdence interval for µ if we want to estimate the mean of Y for every subject
having a particular X value.

• For large samples with an X value equal to or close to the mean X̄, the 100(1 − α)% PI
for Y is approximately

$$\hat{Y} \pm t_{\alpha/2} \sqrt{MS_{Res}}$$

Keep in mind that this is just an approximate formula for large samples. The exact
formula is actually

$$\hat{Y} \pm t_{\alpha/2} \sqrt{MS_{Res}\left[1 + \frac{1}{n} + \frac{n\left(X - \bar{X}\right)^2}{n\sum X_i^2 - \left(\sum X_i\right)^2}\right]}$$

• The 100(1 − α)% CI for µY is

$$\bar{Y} \pm t_{\alpha/2} \sqrt{\frac{MS_{Res}}{n}}$$

Example Smoking and Nicotine (cont.) Recall that the following table gives the level of
continine for a person who smokes a certain number of cigarettes per day

Table 5: Data for Smoking and Nicotine
X (= Cigarettes per day) Y (= Level of continine)
60                       179
10                       283
4                        76
15                       174
10                       209
1                        10
20                       350

Given that ΣXi = 120, ΣXi² = 4442 and X̄ = 17.14. Moreover, from the ANOVA
table, we know that MSRes = 14907.
• Find the predicted value Ŷ for X = 34.
• Find and interpret the 95% PI for Y at X = 34.
• Find and interpret the 95% CI for the population mean µY.

Solution:
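
The first two parts can be sketched numerically as below. The sketch assumes the fitted equation Continine = 156.95 + 1.520 Cigarettes from the MINITAB output in section 5.1, uses the exact prediction interval formula (the sample is small, n = 7), and takes 2.571 as the t-table value for an upper-tail area of 0.025 with 5 degrees of freedom. The function names are made up for illustration.

```python
from math import sqrt

# Quantities given in the example.
n, sum_x, sum_x2 = 7, 120, 4442
ms_res = 14907

# Assumed from the MINITAB output in section 5.1.
a, b = 156.95, 1.520

# t-table value for an upper-tail area of 0.025 with n - 2 = 5 df.
t_crit = 2.571


def predict(x):
    """Predicted continine level a + b * x."""
    return a + b * x


def prediction_interval(x):
    """Exact 95% prediction interval for Y at the given x."""
    mean_x = sum_x / n
    extra = n * (x - mean_x) ** 2 / (n * sum_x2 - sum_x ** 2)
    margin = t_crit * sqrt(ms_res * (1 + 1 / n + extra))
    y_hat = predict(x)
    return y_hat - margin, y_hat + margin


y_hat = predict(34)
lower, upper = prediction_interval(34)
print(f"predicted value at X = 34: {y_hat:.2f}")
print(f"95% PI: ({lower:.1f}, {upper:.1f})")
```

The predicted value is about 208.6, and the interval is roughly (−144, 561). The lower end is negative, which is impossible for a continine level, and the interval is extremely wide. Both are symptoms of the weak linear fit (R² is only 6.9%) and the very small sample.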

Suggested Problems
Chapter 12: 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 15, 16, 17, 18, 19, 20, 23, 24, 26, 32, 33, 34,
35, 36, 37, 38, 39, 43, 44, 45, 46, 47, 49, 50, 51, 52, 53, 63, 64, 65, 66, 67, 68, 69, 71, 72, 73,
74, 75, 76, 81, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106.

This is the end of chapter 5. Cheers.
© Quan Tran - Summer 2009
