
                      STA 3024 Introduction to Statistics 2
                   Chapter 5: Simple Linear Regression Analysis



   As stated in Chapters 3 and 4, the table below summarizes the major material that we
need to cover.

           Table 1: Methods to Investigate the Association between Variables
                   Explanatory Variable(s) Response Variable Method
 Chapter 3         Categorical               Categorical         Contingency Tables
 Chapter 4         Categorical               Quantitative        Analysis of Variance (ANOVA)
 Chapter 5 and 6 Quantitative                Quantitative        Regression Analysis
                   Quantitative              Categorical         (not discussed)

    This chapter deals with cases where both the explanatory and the response variables are
quantitative; we’ll use regression analysis to study the association between the two variables.
The regression methods that we’re studying in this chapter are restricted to the linear
regression family (as opposed to nonlinear regression analysis).
    If there is only one quantitative explanatory variable, then we’ll study simple linear
regression. If there is more than one explanatory variable, then we’ll introduce multiple
linear regression.

   This chapter corresponds to chapter 12 in our textbook.




                             PART I - BACKGROUND


   1.1 Background and Remarks

    Given two quantitative variables, we’d like to find the association between them. For
instance, we might wonder whether height and weight have any kind of association, such
that knowing one helps predict the other. (Certainly, both height and weight are quantitative
variables.)
    Now, we go out and collect data. Each person/subject in our experiment is identified by
the information that we are interested in; that is, each subject is described by a pair of data
values (weight, height).

    In fact, all data in simple linear regression are described by pairs of numbers (X, Y ),
which we can think of as points on a 2D plane. A typical data set for two quantitative
variables looks like

                          Table 2: Data for Weights and Heights
                              X (= Weight) Y (= Height)
                              x1 = 130        y1 = 63
                              x2 = 133        y2 = 65
                              x3 = 157        y3 = 70
                              x4 = 180        y4 = 72
                              x5 = 160        y5 = 74
                              x6 = 177        y6 = 73
                              x7 = 193        y7 = 73
                              x8 = 126        y8 = 61
                              x9 = 179        y9 = 63
                              x10 = 143       y10 = 77
                              x11 = 157       y11 = 68
                              x12 = 122       y12 = 65
                              x13 = 145       y13 = 68
                              x14 = 135       y14 = 67
                              x15 = 117       y15 = 61
                              x16 = 186       y16 = 73
                              x17 = 170       y17 = 71
                              x18 = 173       y18 = 71
                              x19 = 192       y19 = 74
                              x20 = 155       y20 = 64
                              x21 = 177       y21 = 77
                              x22 = 107       y22 = 59
                              x23 = 138       y23 = 69
                              x24 = 155       y24 = 70
                              x25 = 201       y25 = 73




 Figure 1: A Scatterplot for a Sample of People where Each Individual is Identified by a
                       Pair of Numbers (X = Weight, Y = Height).

   Recall: a 2D line can be mathematically described by the equation y = m + n ∗ x




  Figure 2: A Straight Line y = m + n ∗ x where m Indicates the y-Intercept and n
                                 Indicates the Slope

    Let X be the quantitative explanatory variable and let Y be the quantitative response
variable. If the relationship between X and Y can be described by a straight line

               µY (X) = α + βX      (where µY (X) is the mean for Y at X)

then simple linear regression is the statistical inference method that analyses the associa-
tion between the two variables. “Simple” because there’s only one explanatory variable.

   Important Remark 1:

   The formula µY (X) = α + βX describes the relationship between X and Y through the
mean/average of Y , and that’s the best we can do.

    We can only say, “The average height for a person who weighs 165 lbs is 5ft10.” However,
we can NOT predict the EXACT value of Y based on the explanatory X. The following
statement is incorrect: “(blank) weighs 140 lbs, so he/she must be 5ft7.”

    We have to take into account that even at the same X value (weight), the observed
Y ’s (heights) are usually not all the same. There is always some amount of variability
in the values of Y at the same value of X. Thus, it is NOT correct to write Y = α + βX.

   Important Remark 2:

   In the case where X and Y are linearly related by µY (X) = α + βX, the population
parameters α and β are both unknown (why?). We have to use data from samples to cook
up the statistics a and b as the estimates for the parameters α and β respectively.

    There are a number of methods to find a and b. Each method depends on the criterion
chosen before conducting the experiment. We are only interested in the least squares method,
which estimates α and β by (the criterion of) minimizing the sum of squared errors. (We
will talk more about this later.)

   From the sample data, we obtain the estimates a and b, which describe the regression
equation between Y and X as

                                  Ŷ (X) = a + bX

where Ŷ (X) indicates the predicted value for Y at the value X using the estimates a and b
obtained from the least squares method from the sample data.



   1.2 Making Predictions

    The most basic use of the regression equation is to make predictions. We can predict
the value of Y by simply plugging a value of X into the regression equation and seeing
what we get for Ŷ . Of course, we would not think that the actual Y value would be exactly
equal to our prediction (as discussed above). The prediction is our single-number best guess
for a Y value at that particular X value.
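
    To make the plug-in step concrete, here is a minimal Python sketch (illustrative only;
the course software is MINITAB). It evaluates Ŷ = a + bX, using the made-up equation
from the example that follows (a = 123, b = −0.4); the helper name predict is just a
convenience for this sketch.

# Illustrative sketch: making a prediction by plugging an X value into a
# fitted regression equation y_hat = a + b*x.

def predict(a, b, x):
    """Return the predicted value y_hat = a + b * x."""
    return a + b * x

# a and b from the made-up equation in the example below: y_hat = 123 - 0.4*x
a, b = 123.0, -0.4
print(predict(a, b, 150))   # predicted height for a made-up weight of 150 lbs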

   Existing observations
   If we plug in the X value corresponding to an observation that we actually have in the
sample, we can write the predicted value as Ŷi for short, where i is a number that indicates
which observation we’re talking about. So if we plug each observation’s X value into the
regression equation, we get a list of predicted values.

    Example Weights and Heights (cont.) Say, out of thin air, I declare the regression equa-
tion for Weight and Height is Ŷi = 123 − 0.4 ∗ Xi or Predicted Height = 123 − 0.4 ∗ Weight.
Now what is the predicted height when the weight is 155? (Note that X20 = 155.)

   Solution:




    New observations
    We could instead plug in a value of X that doesn’t correspond to any observation that
we actually have. Then we would be predicting the Y value for a new observation. However,
we also need to be careful to avoid making predictions for X values that are outside the
range of X values for which we actually have data in the sample. Making a prediction for
an X value outside the range of the X values in the data is called extrapolation, and it
leads to predictions that are useless.

    Example Weights and Heights (cont.) Using the same regression equation, let’s do some
extrapolation. What’s the predicted height when the weight is 300lbs? (Note that X = 300
is out of our range for weight) Does the answer make sense?

   Solution:




    Note that the further X falls outside the observed range, the worse the extrapolation
becomes. However, sometimes we still extrapolate when the X value is reasonably close to
the range. For example, people often extrapolate by using historical data to predict the future.
If we have data from 1984 to 2008, making a prediction for 2009 would technically be ex-
trapolation, but it might be okay in some circumstances. However, we probably wouldn’t
want to use that data to make a prediction for, say, the year 3000.



   1.3 Regression toward the mean

    “Regression” may seem like a strange name for the process used to analyze the associa-
tion between quantitative variables. (After all, what is it that’s regressing?) The procedure
itself is actually named after a specific phenomenon that the procedure often uncovered in
a variety of situations in the early days of statistics. That phenomenon is called regression
toward the mean, and it’s best explained with an example.

    A class of students takes two editions of the same test on two successive days. It has
frequently been observed that the worst performers on the first day will tend to improve
their scores on the second day, and the best performers on the first day will tend to do
worse on the second day. The phenomenon occurs because student scores are determined
in part by underlying ability and in part by chance. For the first test, some will be lucky,
and score more than their ability, and some will be unlucky and score less than their ability.
Some of the lucky students on the first test will be lucky again on the second test, but more
of them will have (for them) average or below average scores. Therefore a student who was
lucky on the first test is more likely to have a worse score on the second test than a better
score. Similarly, students who score less than the mean on the first test will tend to see
their scores increase on the second test.

    Regression toward the mean is a very common phenomenon. We see it almost any time
that subjects are measured once and then measured again. It’s also extremely important to
remember, because many things that might otherwise be considered real effects can actually
be explained simply as regression toward the mean.

    Suppose the 15 students with the lowest scores on the first exam all hire tutors before
the second exam. Of those 15 students, 11 improve their score on the second exam. This
might be because the tutoring helped, but it might also be explained simply as regression
toward the mean. Further investigation would be needed to determine whether the tutoring
truly had any effect.

    The term “regression” was coined by Francis Galton, a cousin of Charles Darwin, in
the nineteenth century to describe a biological phenomenon. The phenomenon was that
the heights of descendants of tall ancestors tend to regress down towards a normal average.
For Galton, regression had only this biological meaning, but his work was later extended
by Udny Yule and Karl Pearson to a more general statistical context.




                    PART II - LEAST SQUARES METHOD


   Let X and Y have a linear relation µY (Xi ) = α + βXi .
   We’ve established that α and β are the population parameters and that they are unknown
constants. Based on the sample data, we’ll then try to come up with the estimates for α
and β, called a and b respectively.
   The predicted value for Yi is denoted by Ŷi and it has the following formula

                                  Ŷi = a + bXi

   Since Ŷi is the predicted value for the true Yi , the two values are almost always different,
which results in some error terms. The errors/residuals are defined to be

                                  ei = Yi − Ŷi

   The least squares method is a method of finding the estimates a and b so that the
sum of squares of the residuals is minimized. That is, by solving the criterion of minimizing

          Σ ei² = Σ (Yi − Ŷi )² = Σ [Yi − (a + bXi )]²        (sums over i = 1, . . . , n)    (∗)

we’ll be able to find a and b from the sample data (using calculus/differentiation).

    In fact, by solving the criterion of minimizing (∗), we obtain the following formulas for
the two coefficients

                         b = r (sY /sX )     and     a = Ȳ − b X̄
   where:

   • sY is the sample standard deviation of the Y data (either given or computed with your calculator).

   • sX is the sample standard deviation of the X data (either given or computed with your calculator).

   • r is the correlation (which we’ll learn more about shortly).

   • Ȳ is the sample average of the Y data.

   • X̄ is the sample average of the X data.
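
   The course relies on MINITAB for these computations, but the two formulas above are
easy to apply directly. Here is a minimal Python sketch (illustrative only, using the first
five rows of Table 2) that computes b and a from the summary statistics and checks them
against a routine that minimizes the sum of squared residuals directly.

# Illustrative sketch: least squares slope and intercept from summary statistics,
# b = r * (sY / sX)  and  a = Ybar - b * Xbar.
import numpy as np

x = np.array([130, 133, 157, 180, 160], dtype=float)   # first five weights from Table 2
y = np.array([63, 65, 70, 72, 74], dtype=float)        # corresponding heights

r = np.corrcoef(x, y)[0, 1]                 # sample correlation
b = r * (y.std(ddof=1) / x.std(ddof=1))     # slope estimate
a = y.mean() - b * x.mean()                 # intercept estimate

# np.polyfit minimizes the sum of squared residuals directly,
# so its slope and intercept should agree with the formulas above.
b_check, a_check = np.polyfit(x, y, deg=1)
print(a, b)
print(a_check, b_check)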


   Interpret the Slope b

   The slope b tells us how much the predicted value Ŷi changes when the corresponding
Xi increases by one unit.

   Moreover, the sign of the slope also indicates the direction of the association between
the two variables X and Y

   • If b > 0 then X and Y have a positive relationship; that is, if X increases then Y
     increases.


   • If b < 0 then X and Y have a negative relationship; that is, if X increases then Y
     decreases.

   Note: we didn’t mention the possibility where b = 0 (why?)

   Interpret the Intercept a

     The intercept a is simply the predicted value Ŷ when X = 0.
     Be careful though: sometimes the interpretation of the intercept a does not make logical
sense (for instance, a weight of 0 lbs). Still, we accept it as a reasonable estimate in the
“least squares” context (that is, if it meets the least squares criterion, it’s good enough).

    Example Weight vs. Height (cont.) Given the data in Table 2, MINITAB gives the
following descriptive output

Variable         N        Mean        Median        TrMean         StDev
weight          25      156.32        157.00        156.52         26.07
height          25       68.84         70.00         68.91          5.10

   • Given that r = 0.713, find the least squares regression equation.

   • Interpret the slope and the intercept.

   Solution:




                        PART III - THE CORRELATION


   3.1 The Correlation r

    When quantitative variables X and Y have an approximately linear relationship, we
can measure the strength of that relationship with a quantity called the correlation r. The
correlation summarizes the direction of the association between two quantitative variables
and the strength of its straight-line trend. If we have a sample drawn from a population,
then there are two different quantities we can talk about:

   • The population correlation, ρ, measures the strength of the association in the
     population.

   • The sample correlation, r, measures the strength of the association in the sample.

   We seldom know the value of ρ, so we typically estimate it with the value of r.

   Note: The formula for the sample correlation r is

              r = [1/(n − 1)] Σ [(Xi − X̄)/sX ] [(Yi − Ȳ )/sY ]        (sum over i = 1, . . . , n)

but we will never actually calculate r using this formula.
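
   Even though we never compute r by hand with this formula, it is reassuring to see that it
agrees with what software reports. The short Python sketch below (illustrative only, with
made-up numbers) evaluates the standardized-products formula and compares it with a
library correlation.

# Illustrative check: the formula for r above matches a library correlation.
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 7.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 8.0])

n = len(x)
zx = (x - x.mean()) / x.std(ddof=1)    # standardized X values
zy = (y - y.mean()) / y.std(ddof=1)    # standardized Y values
r_formula = (zx * zy).sum() / (n - 1)

print(r_formula, np.corrcoef(x, y)[0, 1])   # the two values should match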

   Properties of the correlation r

   • The correlation r has the same sign as the slope b. Therefore, just like b, r indicates
     the direction that X and Y are related. If r > 0 then there’s a positive relationship.
     If r < 0 then there’s a negative relationship.

   • The correlation r can only take values between -1 and +1; that is −1 ≤ r ≤ +1. The
     closer r is to ±1, the closer the data points fall to a straight line, and the stronger
     the association. The closer r is to zero, the weaker the association.




                    Figure 3. Some Scatterplots and Their Correlation




            Figure 4. An Illustration of a Data Set that has Correlation r = 0

   Note: The correlation is only useful for measuring relationships that are linear.




 Figure 5. The Data Set above Clearly Shows There Is a Relationship between X and Y ,
 but the Correlation r Is Still Equal to 0 because the Relationship Is Not Linear. Thus, It
  Is Always Wise to Look at a Scatterplot of the Data First, to See if the Correlation Will
                                Even Be Worth Talking About.


   3.2 The Roles and Limitations of Correlation r

    Why do we need the correlation? Why can’t we use the slope b to describe the strength
of the association?
    The reason is that the slope’s numerical value depends on the units of measurement.
Recall that the slope b tells us how much the Y value changes when the corresponding X
value increases by 1 unit. The slope b always has the same units of measurement as the Y
value.
    The correlation r, on the other hand, can be thought of as the standardized version
of the slope, one which does not depend on any units of measurement. The standardization
adjusts the slope b for the way it depends on the standard deviations of X and Y . Since
the correlation r and the slope b are related by b = r (sY /sX ), equivalently,

                                  r = b (sX /sY ).
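
   To see concretely that the slope depends on the units of measurement while r does not,
consider the following small Python sketch (illustrative only, with made-up weights and
heights): converting X from pounds to kilograms rescales b but leaves r unchanged.

# Illustrative sketch: changing the units of X rescales the slope but not r.
import numpy as np

weight_lb = np.array([130.0, 150.0, 160.0, 180.0, 200.0])   # made-up weights (lbs)
height_in = np.array([63.0, 66.0, 68.0, 71.0, 74.0])        # made-up heights (inches)
weight_kg = weight_lb * 0.453592                             # the same data in kilograms

slope_lb, _ = np.polyfit(weight_lb, height_in, deg=1)
slope_kg, _ = np.polyfit(weight_kg, height_in, deg=1)

r_lb = np.corrcoef(weight_lb, height_in)[0, 1]
r_kg = np.corrcoef(weight_kg, height_in)[0, 1]

print(slope_lb, slope_kg)   # the slopes differ by the unit-conversion factor
print(r_lb, r_kg)           # the correlations are identical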



   What Factors Affect the Correlation?


  1. Just as outliers can greatly influence the regression equation, they can also greatly
     influence the value of r. Accordingly, the correlation may not be particularly useful
     when outliers are present.

  2. If the subjects are grouped for the observations, the correlation r tends to increase in
     magnitude. This can be deceptive, because it can make X and Y appear to be more
     strongly linearly related than they actually are. (A small simulation sketch after
     Figure 8 below illustrates this effect, together with the restricted-range effect described
     in item 3.)
     For example, suppose we have a sample with 50 observations. Instead of treating
     our data in the usual way, we could instead make 10 groups of five, average the data
     within each group, and then treat our group averages like a new data set with just
     10 observations. If we did this, we would probably increase the magnitude of the
     correlation.




         Figure 6. The Scatterplot Displays the 50 Individual Data Points and Their
                                   Correlation r = 0.627




       Figure 7. The Scatterplot Displays the Group Means for 10 Groups of 5 from the
               Data Used in Figure 6. The Correlation Now Becomes r = 0.901

3. The size of the correlation r also depends on the range of X values sampled: the
   correlation tends to be smaller when we sample only a restricted range of X values
   than when we use the entire range.




    Figure 8. If We Look at the Complete Data Set Presented in Figure 6, then the
  Correlation r = 0.627. However, If We Restrict the Data and Only Look at the Data
  where X Runs from 148 to 168, then the Correlation for the Data in that Range Is
                                      r = 0.596.
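
   Both effects are easy to reproduce in a quick simulation. The Python sketch below
(illustrative only; the data are randomly generated, not the data behind Figures 6-8)
computes r for 50 simulated points, then for 10 group means, and then for only the points
whose X values fall in a restricted band; grouping typically inflates the correlation while
restricting the range typically shrinks it.

# Illustrative simulation: grouping observations tends to increase |r|,
# while restricting the range of X tends to decrease it.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(120, 200, size=50)              # 50 simulated X values
y = 30 + 0.25 * x + rng.normal(0, 5, size=50)   # Y linearly related to X, plus noise

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("all 50 points:      r =", round(corr(x, y), 3))

# Ten groups of five: correlate the group means instead of the raw data.
order = np.argsort(x)
x_means = x[order].reshape(10, 5).mean(axis=1)
y_means = y[order].reshape(10, 5).mean(axis=1)
print("10 group means:     r =", round(corr(x_means, y_means), 3))

# Restricted range: keep only the observations with X between 148 and 168.
keep = (x > 148) & (x < 168)
print("restricted X range: r =", round(corr(x[keep], y[keep]), 3))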




  PART IV - SUM OF SQUARES, ANOVA TABLE, AND MORE ON
                      CORRELATION


   4.1 Sum of Squares and ANOVA Table for Regression

   We already saw that in one-way and two-way ANOVA, we can construct something called
the ANOVA table, which breaks down the variability in the data into different sources. It
turns out that we can also construct an ANOVA table for simple linear regression to analyze
the variability in the response variable (Y ) values. The regression ANOVA table, like the
ANOVA tables we’ve seen before, includes sums of squares, degrees of freedom, and mean
squares.

    The prediction error is the difference between the observed value Yi and a predicted
value. There are two types of prediction errors:

   • The error using the regression line to make a prediction is Yi − Ŷi . This type of error
     gives rise to the variability in the residual sum of squares.

   • The error using the sample mean Ȳ to make a prediction is Yi − Ȳ . This type of error
     gives rise to the variability in the total sum of squares.

There is also one more type of “error”: the difference between the predicted values Ŷi and
the sample mean Ȳ , which gives rise to the variability in the regression sum of squares.

    From those three types of error mentioned above, it follows that the ANOVA table for
regression involves three different sums of squares:

   • The regression sum of squares, or SSRegr , measures the variability due to the
     regression equation. The regression equation gives us a predicted value, Ŷi , for each
     observation, so we measure the variability of those predicted values around the sample
     mean Ȳ using the formula

                              SSRegr = Σ (Ŷi − Ȳ )²        (sum over i = 1, . . . , n)

   • The residual sum of squares, or SSRes , measures the variability of the actual
     observed Yi values around their predicted values Ŷi . Its formula is

                              SSRes = Σ (Yi − Ŷi )²

   • The total sum of squares, or SSTotal , measures the total variability of the Yi values
     around the sample mean Ȳ . Since it describes the overall/total variability in the data,
     the total sum of squares is the sum of the other two SS’s. Its formula is

                              SSRegr + SSRes = SSTotal = Σ (Yi − Ȳ )²
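
   Since these three sums of squares drive everything else in the ANOVA table, here is a
minimal Python sketch (illustrative only, with made-up data) that computes them for a
fitted line, confirms the identity SSRegr + SSRes = SSTotal, and forms the mean squares
discussed below.

# Illustrative sketch: the regression sums of squares and mean squares.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

n = len(x)
b, a = np.polyfit(x, y, deg=1)      # least squares slope and intercept
y_hat = a + b * x                   # predicted values
y_bar = y.mean()                    # sample mean of Y

ss_regr = np.sum((y_hat - y_bar) ** 2)
ss_res = np.sum((y - y_hat) ** 2)
ss_total = np.sum((y - y_bar) ** 2)
print(ss_regr + ss_res, ss_total)   # the two numbers agree (up to rounding)

ms_regr = ss_regr / 1               # df_Regr = 1
ms_res = ss_res / (n - 2)           # df_Res = n - 2; this estimates sigma^2
print(ms_regr, ms_res)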




   As we’ve seen before, there are quantities called degrees of freedom that are associated
with each sum of squares. Their formulas are

                              dfRegr = 1
                              dfRes = n − 2
                              dfTotal = n − 1 = dfRegr + dfRes

where n is the usual notation for the sample size.

   The mean squares are just the sums of squares divided by their degrees of freedom:

                    MSRegr = SSRegr / dfRegr     and     MSRes = SSRes / dfRes

   It turns out that M SRes makes a good estimate of σ 2 , the variance of the population of
Y values at each X value.

    We typically summarize all this information in the regression ANOVA table, which is
laid out as shown

                 Source        df                 SS         MS
                 Regression    dfRegr = 1         SSRegr     MSRegr = SSRegr /dfRegr
                 Error         dfRes = n − 2      SSRes      MSRes = SSRes /dfRes
                 Total         dfTotal = n − 1    SSTotal




   4.2 Coefficient of Determination

   The better our regression equation is at making predictions (hence, better predictive
power), the closer the observed values will be to their predicted values, and hence the
smaller the residuals will be.

    We can quantify this predictive power using the coefficient of determination, R²,
which we define as

                                  R² = SSRegr /SSTotal

    Recall that SSTotal = SSRegr + SSRes , so R² is a number that tells us what proportion
of SSTotal comes from SSRegr .

   • If our regression equation has high predictive power, SSRes is small. Then most of
     SST otal comes from SSRegr , so R2 is close to 1.

   • If our regression equation has poor predictive power, SSRes is large. Then most of
     SST otal comes from SSRes instead of SSRegr , so R2 is close to 0.




   It turns out that R² is equal to the square of the correlation r.
   Both the correlation r and the coefficient of determination R² describe the strength of
the association. However, their interpretations are a bit different. The correlation r falls
between −1 and +1, and it governs the extent of “regression toward the mean.” The R²
measure falls between 0 and +1 (or equivalently, 0% and 100%), and it summarizes the
proportional reduction in the sum of squared errors when predicting Y using the regression
line instead of using the mean of Y .
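
   As a quick numerical confirmation that R² is the square of the correlation, the short
Python sketch below (illustrative only, reusing the made-up data from the earlier sums-of-
squares sketch) computes R² from the sums of squares and compares it with r².

# Illustrative check: R^2 computed from the sums of squares equals r squared.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

b, a = np.polyfit(x, y, deg=1)
y_hat = a + b * x

r = np.corrcoef(x, y)[0, 1]
r2_from_ss = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

print(r ** 2, r2_from_ss)   # the two values should match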

   Least Squares Regression Equation As “Best Fit”

   Notice from the formulas that SSRes depends on what the predicted values actually are.
Here’s what we mean by the least squares regression equation being the “best fit” for the
data:
   The least squares regression equation picks the values of a and b that make SSRes
minimal.

    SSRes is the residual sum of squares. The “best fit” is the one where the predicted
values are as close as possible to their observed values, in the overall sense of SSRes . In
other words, changing the values of any of the regression coefficients would yield a larger
SSRes .

    Note: We mean the regression equation is the “best fit” when using this particular X
variable. We might be able to get a smaller SSRes by using a completely different X variable
or by using multiple X variables, but that’s not what we’re talking about here.

   Example Smoking and Nicotine The following table gives the level of continine for a
person who smokes a certain number of cigarettes per day

                          Table 3: Data for Smoking and Nicotine
                     X (= Cigarettes per day) Y (= Level of continine)
                     60                       179
                     10                       283
                     4                        76
                     15                       174
                     10                       209
                     1                        10
                     20                       350
   MINITAB gives the following descriptive output for the data

Variable         N         Mean         StDev
cigarett         7        17.14         19.94
continin         7        183.0         115.5

   Given that r = 0.263, find the least squares regression equation for the data. Interpret
the slope and the intercept.

   Solution:



    Using the regression equation found above, find the best predicted level of continine
for a person who smokes 40 cigarettes per day. Interpret your prediction.

   Solution:




   MINITAB gives us the following ANOVA Table


Analysis of Variance

Source                DF            SS              MS
Regression                        5507
Residual Error
Total                            80040

Fill in all of the missing information to complete the ANOVA Table.

   The scatterplot for the data is presented below




                   Figure 9. The Scatterplot for the Data from Table 3.

     It’s suspected that the first observation is an outlier. In practice, it’s forbidden to
deliberately censor data in order to make the data “prettier”. However, just for this problem,
let’s delete the first observation and see what happens. The new data and new scatterplot
are presented below



  Table 4: New Data for Smoking and Nicotine After We Delete the First Observation
                X (= Cigarettes per day) Y (= Level of continine)
                10                         283
                4                          76
                15                         174
                10                         209
                1                          10
                20                         350




  Figure 10. The New Scatterplot for the Data from Table 4 After We Delete the First
                                     Observation.

   The new regression equation and ANOVA Table are found to be

The regression equation is
new continine = 25.7 + 15.8 new cigarettes

Predictor          Coef        SE Coef             T         P
Constant          25.65          53.30          0.48     0.655
new ciga         15.802          4.499          3.51     0.025

Analysis of Variance

Source               DF            SS             MS
Regression
Residual Error                  19596
Total                           80021

Fill in the missing information to complete the ANOVA Table.

    Interpret the new slope and the new intercept. Based on the given information, calcu-
late and interpret the coefficient of determination R2 , and in turn, find the correlation r.
Compare the new r value to the r value given from Table 3.


Solution:




  PART V - INFERENCE WITH REGRESSION FOR THE SLOPE β

    We have been spending a good deal of time studying the strength of the relationship
between two quantitative variables X and Y . However, we must also ask whether or not
the response variable Y depends on the explanatory variable X to begin with.
    Recall that this is the same as asking whether or not the slope β = 0 (or, sometimes, we
can use the criterion ρ = 0 instead). In this section, we’ll see how to answer that question.

    So far, we’ve familiarized ourselves with various hypothesis tests and CI methods, and
those are the inferential tools that we’ll be using in this section. First, we’ll go through two
types of hypothesis tests for β = 0 (or ρ = 0): the t-test and the F-test. Then, we’ll
look at the CI’s for the slope β.

   5.1 The t-tests for β

  1. Assumptions
        • Quantitative variables where the population means of Y at different values of X
          have a straight line relationship with X; that is, µY (X) = α + βX.
        • Simple random sample.
        • Population distribution for Y is approximately normal with the same standard
          deviation at each X value.
        • The data contains no extreme outliers.
  2. Hypotheses
        • Null: H0 : β = 0.
        • Alternative: There are three possible alternative hypotheses.
            – H1 : β ≠ 0 which results in a two-sided test.
           – H1 : β < 0 which results in a one-sided test.
           – H1 : β > 0 which results in a one-sided test.
  3. Test statistic
                                  t0 = (b − 0)/seb .

     where seb is the standard error of the coefficient b (provided by statistical software;
     if not, the formula is seb = √( MSRes / Σ (Xi − X̄)² ), with the sum over i = 1, . . . , n).
     A short computational sketch follows step 5 below.

  4. P-value Use the t-distribution with df = n − 2. Be cautious, the P-value for one-sided
     test is different from the P-value from the two-sided test. Thus, it’s important to be
     able to identify which test it is to find the correct P-value.
  5. Conclusion If we can find the corresponding P-value for our test statistic, then the
     conclusion based on the P-value is straightforward. Smaller P-values give stronger
     evidence against H0 . If a decision is needed, reject H0 if P-value ≤ α, the significance
     level. However, most of the time, the P-value can only be found using statistical software.
     If we cannot find the P-value when solving problems by hand, we can still make a
     conclusion based solely on the test statistic. Given a significance level α,

        • For a one-sided test (H1 : β > 0), reject the null H0 if t0 > tα . (For H1 : β < 0,
          reject H0 if t0 < −tα .)
        • For the two-sided test, reject the null H0 if |t0 | > tα/2 .
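
   The arithmetic of this test is simple enough to script. Below is a minimal Python sketch
(illustrative only, with made-up data; in the course the numbers come from MINITAB
output) that computes b, seb , the t statistic, and a two-sided P-value, then checks the result
against scipy’s built-in simple regression. For a one-sided test whose direction matches the
sign of b, the P-value is half of the two-sided value.

# Illustrative sketch: t-test for H0: beta = 0 in simple linear regression.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 13.9])

n = len(x)
b, a = np.polyfit(x, y, deg=1)                      # least squares estimates
y_hat = a + b * x
ms_res = np.sum((y - y_hat) ** 2) / (n - 2)         # MS_Res with df = n - 2
se_b = np.sqrt(ms_res / np.sum((x - x.mean()) ** 2))

t0 = (b - 0) / se_b
p_two_sided = 2 * stats.t.sf(abs(t0), df=n - 2)
print(b, se_b, t0, p_two_sided)

# scipy's built-in routine reports the same slope, standard error,
# and two-sided P-value.
res = stats.linregress(x, y)
print(res.slope, res.stderr, res.pvalue)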




   Example Smoking and Nicotine (cont.) MINITAB gives the following output for the
data obtained from Table 3.
The regression equation is
Continine = 157 + 1.52 Cigarettes

Predictor          Coef        SE Coef            P
Constant         156.95          62.98        0.055
Cigarett          1.520          2.500        0.570

S = 122.1          R-Sq = 6.9%          R-Sq(adj) = 0.0%
   Test whether β = 0 using the one-sided test with the H1 : β > 0.

   Solution:




   5.2 The t-tests for ρ


  1. Assumptions

       • Quantitative variables where the population means of Y at different values of X
         have a straight line relationship with X; that is, µY (X) = α + βX.
       • Simple random sample.
       • Population distribution for Y is approximately normal with the same standard
         deviation at each X value.
       • The data contains no extreme outliers.

  2. Hypotheses

        • Null: H0 : ρ = 0.
        • Alternative: There are three possible alternative hypotheses.
               – H1 : ρ ≠ 0 which results in a two-sided test.
               – H1 : ρ < 0 which results in a one-sided test.
               – H1 : ρ > 0 which results in a one-sided test.

  3. Test statistic
                                  t0 = (r − 0) / √( (1 − R²)/(n − 2) ) .

  4. P-value Use the t-distribution with df = n − 2. Be cautious, the P-value for one-sided
     test is different from the P-value from the two-sided test. Thus, it’s important to be
     able to identify which test it is to find the correct P-value.

  5. Conclusion If we can find the corresponding P-value for our test statistic, then the
     conclusion based on the P-value is straightforward. Smaller P-values give stronger
     evidence against H0 . If a decision is needed, reject H0 if P-value ≤ α, the significance
     level. However, most of the time, the P-value can only be found using statistical software.
     If we cannot find the P-value when solving problems by hand, we can still make a
     conclusion based solely on the test statistic. Given a significance level α,

        • For a one-sided test (H1 : ρ > 0), reject the null H0 if t0 > tα . (For H1 : ρ < 0,
          reject H0 if t0 < −tα .)
        • For the two-sided test, reject the null H0 if |t0 | > tα/2 .




   Example Smoking and Nicotine (cont.) Using the MINITAB output given in section 5.1
for the data obtained from Table 3, test whether ρ = 0 using the two-sided test.

   Solution:




   5.3 The F-test for β

  1. Assumptions
        • Quantitative variables where the population means of Y at different values of X
          have a straight line relationship with X; that is, µY (X) = α + βX.
        • Simple random sample.
        • Population distribution for Y is approximately normal with the same standard
          deviation at each X value.
        • The data contains no extreme outliers.
  2. Hypotheses
        • Null: H0 : β = 0.
         • Alternative: H1 : β ≠ 0.
  3. Test statistic
                                  F0 = MSRegr /MSRes .

     The sampling distribution of F0 has df1 = 1 and df2 = n − 2 (why?).
  4. P-value
     Recall the definition of the P-value: The P-value is the probability of getting a test
     statistic value at least as extreme as the one observed, if H0 is true. The P-value is a
     tail probability from the F distribution the test statistic has when H0 is true.
     Remember that we said the larger the value of F is, the more evidence against H0 . So
     the P-value is the probability of getting an F value larger than the one we actually
     got, if H0 is true. To calculate this probability exactly, we typically need statistical
     software.
  5. Conclusion If we can find the corresponding P-value for our test statistic, then the
     conclusion based on the P-value is straightforward. Smaller P-values give stronger
     evidence against H0 . If a decision is needed, reject H0 if P-value ≤ α, the significance
     level. However, most of the time, the P-value can only be found using statistical software.


    The F-test for β is equivalent to the two-sided t-test for β. In fact, the three tests (the
t-test for β, the t-test for ρ, and the F-test) are equivalent to each other. Depending on
what type of information is given, we can always choose the test that makes our lives easier.
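
   This equivalence is easy to verify numerically. The Python sketch below (illustrative
only, with made-up data) computes the t statistic for β, the t statistic for ρ, and the F
statistic on the same small data set: the two t values coincide, F equals their square, and
the F-test P-value equals the two-sided t-test P-value.

# Illustrative check: the t-test for beta, the t-test for rho, and the F-test
# give equivalent results on the same data.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.3, 3.1, 5.0, 4.4, 6.2, 6.1, 7.9])

n = len(x)
b, a = np.polyfit(x, y, deg=1)
y_hat = a + b * x

ss_regr = np.sum((y_hat - y.mean()) ** 2)
ss_res = np.sum((y - y_hat) ** 2)
ms_regr = ss_regr / 1
ms_res = ss_res / (n - 2)

se_b = np.sqrt(ms_res / np.sum((x - x.mean()) ** 2))
t_beta = b / se_b                                    # t statistic for beta

r = np.corrcoef(x, y)[0, 1]
t_rho = r / np.sqrt((1 - r ** 2) / (n - 2))          # t statistic for rho

f0 = ms_regr / ms_res                                # F statistic

print(t_beta, t_rho)            # identical (up to rounding)
print(t_beta ** 2, f0)          # F equals t squared
print(2 * stats.t.sf(abs(t_beta), n - 2), stats.f.sf(f0, 1, n - 2))   # same P-value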

   Example Smoking and Nicotine (cont.) MINITAB gives the following output for the
data obtained from Table 3.

Analysis of Variance

Source                DF            SS               MS           P
Regression             1          5507             5507       0.570
Residual Error         5         74533            14907
Total                  6         80040

   Test whether β = 0 using the F-test.

   Solution:




   5.4 Confidence Intervals for the Slope

    The regression F and t tests simply ask whether it’s reasonable that β = 0. Instead, we
might be interested in figuring out what all the reasonable values are for β. We do this by
making a confidence interval. The formula for the 100(1 − α)% CI for β is

                                  b ± tα/2 · seb


where seb is the standard error of the coefficient b (provided by statistical software; if not,
the formula is seb = √( MSRes / Σ (Xi − X̄)² )).


   The standard interpretation of a 100(1 − α)% confidence interval for β is as follows:
   We are 100(1 − α)% confident that the true value of β is between (lower bound) and
(upper bound).

    More specifically, remember that by “100(1 − α)% confident,” we mean that if we could
take millions of random samples from this same population, do a regression for each one,
and calculate a 100(1 − α)% confidence interval for β each time, then 100(1 − α)% of those
intervals would contain the true value of β.
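
   Here is a minimal Python sketch (illustrative only, with made-up data) that builds the
interval from b, seb , and the t critical value with n − 2 degrees of freedom.

# Illustrative sketch: confidence interval for the slope beta.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.3, 3.1, 5.0, 4.4, 6.2, 6.1, 7.9])

n = len(x)
b, a = np.polyfit(x, y, deg=1)
ms_res = np.sum((y - (a + b * x)) ** 2) / (n - 2)
se_b = np.sqrt(ms_res / np.sum((x - x.mean()) ** 2))

alpha = 0.05                                     # for a 95% interval
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)    # t critical value, df = n - 2
print(b - t_crit * se_b, b + t_crit * se_b)      # lower and upper bounds for beta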

   Example Smoking and Nicotine (cont.) Find and interpret the 99% CI for β based on
our data from Table 3 (use the MINITAB output given from section 5.1)
   Solution:




        PART VI - INFERENCE WITH REGRESSION AT A SPECIFIC X VALUE


     For a straight-line regression model, we estimate µY , the population mean of Y at a
given value of X, by the least squares regression equation Ŷi = a + bXi . How good is this
estimate? We can find the 100(1 − α)% CI for the unknown population parameter µY .

   Furthermore, the estimate Ŷi = a + bXi for the mean of Y at a fixed value of X is also
a prediction for the outcome of Y for a particular subject at that value. How good is the
prediction? We can find the 100(1 − α)% prediction interval (PI) for the unknown value of
Y for a subject at a particular value of X.

    So what is the difference between the PI and the CI?
    The prediction interval for Y is an inference about where individual observations fall,
whereas the confidence interval for µ is an inference about where a population mean falls.
    Use a prediction interval for Y if we want to predict where a single observation on Y
will fall.
    Use a confidence interval for µ if we want to estimate the mean of Y for every subject
having a particular X value.


   • For large samples with an X value equal to or close to the mean X̄, the 100(1 − α)%
     PI for Y is approximately

                                  Ŷ ± tα/2 · √MSRes .

     Keep in mind that this is just an approximate formula for large samples. The exact
     formula is actually

              Ŷ ± tα/2 · √( MSRes [ 1 + 1/n + n(X − X̄)² / ( n Σ Xi² − (Σ Xi )² ) ] )

   • The 100(1 − α)% CI for µY is approximately

                                  Ŷ ± tα/2 · √( MSRes /n ).
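
   The Python sketch below (illustrative only, with made-up data) computes the prediction,
the prediction interval, and the confidence interval for the mean response at a chosen X
value. It uses the exact standard-error expressions written with (X − X̄)²/Σ(Xi − X̄)², which
is algebraically the same quantity as the n(X − X̄)²/(n Σ Xi² − (Σ Xi)²) term above.

# Illustrative sketch: prediction interval for Y and confidence interval for
# the mean of Y at a chosen value x0, using the exact formulas.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.3, 3.1, 5.0, 4.4, 6.2, 6.1, 7.9])

n = len(x)
b, a = np.polyfit(x, y, deg=1)
ms_res = np.sum((y - (a + b * x)) ** 2) / (n - 2)
sxx = np.sum((x - x.mean()) ** 2)

x0 = 4.5                                    # X value at which we predict
y0_hat = a + b * x0                         # predicted value / estimated mean
t_crit = stats.t.ppf(0.975, df=n - 2)       # for 95% intervals

se_mean = np.sqrt(ms_res * (1 / n + (x0 - x.mean()) ** 2 / sxx))       # for the CI
se_pred = np.sqrt(ms_res * (1 + 1 / n + (x0 - x.mean()) ** 2 / sxx))   # for the PI

print("prediction:          ", y0_hat)
print("95% CI for the mean: ", y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
print("95% PI for Y:        ", y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)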



   Example Smoking and Nicotine (cont.) Recall that the following table gives the level of
continine for a person who smokes a certain number of cigarettes per day

                       Table 5: Data for Smoking and Nicotine
                  X (= Cigarettes per day) Y (= Level of continine)
                  60                       179
                  10                       283
                  4                        76
                  15                       174
                  10                       209
                  1                        10
                  20                       350

   We are given that Σ Xi = 120, Σ Xi² = 4442, and X̄ = 17.14. Moreover, from the ANOVA
table, we know that MSRes = 14907.
   • Find the predicted value Ŷ for X = 34.

   • Find and interpret the 95% PI for Ŷ (X = 34).

   • Find and interpret the 95% CI for the population mean µY .


   Solution:




   Suggested Problems
    Chapter 12: 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 15, 16, 17, 18, 19, 20, 23, 24, 26, 32, 33, 34,
35, 36, 37, 38, 39, 43, 44, 45, 46, 47, 49, 50, 51, 52, 53, 63, 64, 65, 66, 67, 68, 69, 71, 72, 73,
74, 75, 76, 81, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106.




   This is the end of chapter 5. Cheers.
   © Quan Tran - Summer 2009
