STA 3024 Introduction to Statistics 2
Chapter 5: Simple Linear Regression Analysis

As stated in Chapters 3 and 4, the table below summarizes the major material that we need to cover.

Table 1: Methods to Investigate the Association between Variables

                      Explanatory Variable(s)   Response Variable   Method
  Chapter 3           Categorical               Categorical         Contingency Tables
  Chapter 4           Categorical               Quantitative        Analysis of Variance (ANOVA)
  Chapters 5 and 6    Quantitative              Quantitative        Regression Analysis
                      Quantitative              Categorical         (not discussed)

This chapter deals with the case where both the explanatory and the response variables are quantitative; we use regression analysis to study the association between the two variables. The regression methods studied in this chapter are restricted to the linear regression family (as opposed to nonlinear regression analysis).

If there is only one quantitative explanatory variable, we study simple linear regression. If there is more than one explanatory variable, we introduce multiple linear regression. This chapter corresponds to Chapter 12 in our textbook.

PART I - BACKGROUND

1.1 Background and Remarks

Given two quantitative variables, we would like to find the association between them. For instance, we might wonder whether height and weight have any kind of association, so that knowing one might help predict the other. (Certainly, both height and weight are quantitative variables.)

Now we go out and collect data. Each person/subject in our experiment is identified by the information we are interested in; that is, each subject is described by a pair of data (weight, height). In fact, all data in simple linear regression are described by a pair of numbers (X, Y), which we can think of as points on a 2D plane. A typical data set for two quantitative variables looks like the following.

Table 2: Data for Weights and Heights

   i   X (= Weight)   Y (= Height)
   1       130             63
   2       133             65
   3       157             70
   4       180             72
   5       160             74
   6       177             73
   7       193             73
   8       126             61
   9       179             63
  10       143             77
  11       157             68
  12       122             65
  13       145             68
  14       135             67
  15       117             61
  16       186             73
  17       170             71
  18       173             71
  19       192             74
  20       155             64
  21       177             77
  22       107             59
  23       138             69
  24       155             70
  25       201             73

Figure 1: A scatterplot for a sample of people, where each individual is identified by a pair of numbers (X = weight, Y = height).
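Before fitting any line, it is worth reproducing a scatterplot like Figure 1 yourself. The following is a minimal sketch (not part of the original notes) using Python with matplotlib; the lists `weights` and `heights` are simply the two columns of Table 2.

```python
import matplotlib.pyplot as plt

# The two columns of Table 2 (X = weight in lbs, Y = height in inches).
weights = [130, 133, 157, 180, 160, 177, 193, 126, 179, 143, 157, 122, 145,
           135, 117, 186, 170, 173, 192, 155, 177, 107, 138, 155, 201]
heights = [63, 65, 70, 72, 74, 73, 73, 61, 63, 77, 68, 65, 68, 67, 61, 73,
           71, 71, 74, 64, 77, 59, 69, 70, 73]

# Scatterplot: each subject is one point (X, Y), as in Figure 1.
plt.scatter(weights, heights)
plt.xlabel("Weight (X)")
plt.ylabel("Height (Y)")
plt.title("Scatterplot of Height vs. Weight")
plt.show()
```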
Recall: a 2D line can be described mathematically by the equation

    y = m + n·x

Figure 2: A straight line y = m + n·x, where m indicates the y-intercept and n indicates the slope.

Let X be the quantitative explanatory variable and let Y be the quantitative response variable. If the relationship between X and Y can be described by a straight line

    µ_Y(X) = α + βX

(where µ_Y(X) is the mean of Y at the value X), then simple linear regression is the statistical inference method that analyzes the association between the two variables. It is called "simple" because there is only one explanatory variable.

Important Remark 1: The formula µ_Y(X) = α + βX describes the relationship between X and Y through the mean/average of Y, and that is the best we can do. We can only say, "The average height for a person who weighs 165 lbs is 5 ft 10." However, we can NOT predict the EXACT value of Y from the explanatory value X. The following statement is incorrect: "(blank) weighs 140 lbs, so he/she must be 5 ft 7." We have to take into account that, even at the same X value (weight), the observed Y's (heights) are usually not the same. There is always some amount of variability in the values of Y at the same value of X. Thus, it is NOT correct to write Y = α + βX.

Important Remark 2: When X and Y are linearly related by µ_Y(X) = α + βX, the population parameters α and β are both unknown (why?). We have to use sample data to produce the statistics a and b as estimates of the parameters α and β, respectively. There are a number of methods for finding a and b; each depends on the criterion chosen before conducting the experiment. We are only interested in the least squares method, which estimates α and β by minimizing the sum of squared errors. (We will talk more about this later.) From the sample data we obtain the estimates a and b, which give the regression equation

    Ŷ(X) = a + bX

where Ŷ(X) is the predicted value of Y at the value X, using the estimates a and b obtained by the least squares method from the sample data.

1.2 Making Predictions

The most basic use of the regression equation is to make predictions. We predict the value of Y at a given X by simply plugging that value of X into the regression equation and seeing what we get for Ŷ. Of course, we do not expect the actual Y value to be exactly equal to our prediction (as discussed above). The prediction is our single-number best guess for a Y value at that particular X value.

Existing observations

If we plug in the X value corresponding to an observation that we actually have in the sample, we write the predicted value as Ŷ_i, where i represents which observation we are talking about. So if we plug each observation's X value into the regression equation, we get a list of predicted values.

Example (Weights and Heights, cont.) Say, out of thin air, I declare that the regression equation for weight and height is

    Ŷ_i = 123 − 0.4·X_i,    or    Predicted Height = 123 − 0.4·Weight.

Now, what is the predicted height when the weight is 155? (Note that X_20 = 155.)

Solution:

New observations

We could instead plug in a value of X that does not correspond to any observation we actually have. Then we are predicting the Y value for a new observation. However, we also need to be careful to avoid making predictions for X values outside the range of X values for which we actually have data in the sample. Making a prediction for an X value outside the range of the X values in the data is called extrapolation, and it leads to predictions that can be useless.

Example (Weights and Heights, cont.) Using the same regression equation, let's do some extrapolation. What is the predicted height when the weight is 300 lbs? (Note that X = 300 is outside our range of weights.) Does the answer make sense?

Solution:

Note that the further X is outside the range, the worse the extrapolation becomes. However, sometimes we still use extrapolation when the value of X is reasonably close to the range. For example, people often extrapolate by using historical data to predict the future. If we have data from 1984 to 2008, making a prediction for 2009 would technically be extrapolation, but it might be okay in some circumstances. However, we probably would not want to use that data to make a prediction for, say, the year 3000.
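Section 1.2 amounts to evaluating Ŷ = a + bX and refusing to trust the answer outside the observed X range. Below is a minimal sketch in Python; the made-up coefficients 123 and −0.4 come from the example above, while the range check is my own illustration of how one might flag extrapolation, not part of the notes.

```python
def predict_height(weight, a=123.0, b=-0.4, x_min=107, x_max=201):
    """Predicted Y value from the regression equation Y-hat = a + b*X.

    x_min and x_max are the smallest and largest weights in Table 2;
    predictions outside that range are extrapolations.
    """
    y_hat = a + b * weight
    if weight < x_min or weight > x_max:
        print(f"Warning: X = {weight} is outside [{x_min}, {x_max}]: extrapolation.")
    return y_hat

print(predict_height(155))   # an existing X value (X_20 = 155)
print(predict_height(300))   # extrapolation: the answer is not trustworthy
```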
1.3 Regression toward the Mean

"Regression" may seem like a strange name for the procedure used to analyze the association between quantitative variables. (After all, what is it that's regressing?) The procedure is actually named after a specific phenomenon that it often uncovered in a variety of situations in the early days of statistics. That phenomenon is called regression toward the mean, and it's best explained with an example.

A class of students takes two editions of the same test on two successive days. It has frequently been observed that the worst performers on the first day tend to improve their scores on the second day, and the best performers on the first day tend to do worse on the second day. The phenomenon occurs because student scores are determined in part by underlying ability and in part by chance. On the first test, some students will be lucky and score more than their ability, and some will be unlucky and score less than their ability. Some of the lucky students on the first test will be lucky again on the second test, but more of them will have (for them) average or below-average scores. Therefore a student who was lucky on the first test is more likely to have a worse score on the second test than a better one. Similarly, students who score less than the mean on the first test will tend to see their scores increase on the second test.

Regression toward the mean is a very common phenomenon. We see it almost any time that subjects are measured once and then measured again. It's also extremely important to remember, because many things that might otherwise be considered real effects can actually be explained simply as regression toward the mean. Suppose the 15 students with the lowest scores on the first exam all hire tutors before the second exam. Of those 15 students, 11 improve their score on the second exam. This might be because the tutoring helped, but it might also be explained simply as regression toward the mean. Further investigation would be needed to determine whether the tutoring truly had any effect.

The term "regression" was coined by Francis Galton, a cousin of Charles Darwin, in the nineteenth century to describe a biological phenomenon: the heights of descendants of tall ancestors tend to regress down toward a normal average. For Galton, regression had only this biological meaning, but his work was later extended by Udny Yule and Karl Pearson to a more general statistical context.

PART II - LEAST SQUARES METHOD

Let X and Y have the linear relation µ_Y(X_i) = α + βX_i. We have established that α and β are population parameters and that they are unknown constants. Based on the sample data, we try to come up with estimates of α and β, called a and b respectively. The predicted value of Y_i is denoted by Ŷ_i and has the formula

    Ŷ_i = a + bX_i

Since Ŷ_i is the predicted value of the true Y_i, the two values are almost always different, which results in error terms. The errors/residuals are defined to be

    e_i = Y_i − Ŷ_i

The least squares method finds the estimates a and b so that the sum of the squared residuals is minimized. That is, by minimizing

    Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} (Y_i − Ŷ_i)² = Σ_{i=1}^{n} [Y_i − (a + bX_i)]²        (*)

we can find a and b from the sample data (using calculus).
In fact, by solving the minimization criterion (*), we obtain the following formulas for the two coefficients:

    b = r (s_Y / s_X)    and    a = Ȳ − b X̄

where:

• s_Y is the sample standard deviation of the Y data (either given or computed with your calculator).
• s_X is the sample standard deviation of the X data (either given or computed with your calculator).
• r is the correlation (which we will learn more about in just a bit).
• Ȳ is the sample average of the Y data.
• X̄ is the sample average of the X data.

Interpret the Slope b

The slope b tells us how much the predicted value Ŷ_i changes when the corresponding X_i increases by one unit. Moreover, the sign of the slope indicates the direction of the association between the two variables X and Y:

• If b > 0, then X and Y have a positive relationship; that is, as X increases, Y tends to increase.
• If b < 0, then X and Y have a negative relationship; that is, as X increases, Y tends to decrease.

Note: we did not mention the possibility that b = 0 (why?).

Interpret the Intercept a

The intercept a is simply the predicted value Ŷ where X = 0. Be careful, though: sometimes the interpretation of the intercept a does not make logical sense. Still, we accept it as a reasonable estimate in the "least squares" context (that is, if it meets the least squares criterion, it is good enough).

Example (Weight vs. Height, cont.) Given the data in Table 2, MINITAB gives the following descriptive output:

  Variable   N    Mean     Median   TrMean   StDev
  weight     25   156.32   157.00   156.52   26.07
  height     25   68.84    70.00    68.91    5.10

• Given that r = 0.713, find the least squares regression equation.
• Interpret the slope and the intercept.

Solution:
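A quick way to check the arithmetic in the example above is to apply the formulas b = r(s_Y/s_X) and a = Ȳ − bX̄ directly. The sketch below is not from the notes; it assumes the Table 2 data, computes r from the raw data, and then computes the least squares coefficients. The summary statistics it prints should agree with the MINITAB output and the quoted r = 0.713.

```python
import numpy as np

# Raw data from Table 2.
weight = np.array([130, 133, 157, 180, 160, 177, 193, 126, 179, 143, 157, 122,
                   145, 135, 117, 186, 170, 173, 192, 155, 177, 107, 138, 155, 201])
height = np.array([63, 65, 70, 72, 74, 73, 73, 61, 63, 77, 68, 65, 68, 67, 61,
                   73, 71, 71, 74, 64, 77, 59, 69, 70, 73])

# Sample statistics (ddof=1 gives the sample standard deviation).
x_bar, y_bar = weight.mean(), height.mean()
s_x, s_y = weight.std(ddof=1), height.std(ddof=1)
r = np.corrcoef(weight, height)[0, 1]        # sample correlation

# Least squares coefficients via the chapter's formulas.
b = r * s_y / s_x                            # slope
a = y_bar - b * x_bar                        # intercept

print(f"x_bar={x_bar:.2f}, y_bar={y_bar:.2f}, s_x={s_x:.2f}, s_y={s_y:.2f}")
print(f"r = {r:.3f}, slope b = {b:.4f}, intercept a = {a:.2f}")
```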
PART III - THE CORRELATION

3.1 The Correlation r

When quantitative variables X and Y have an approximately linear relationship, we can measure the strength of that relationship with a quantity called the correlation r. The correlation summarizes the direction of the association between two quantitative variables and the strength of its straight-line trend.

If we have a sample drawn from a population, then there are two different quantities we can talk about:

• The population correlation, ρ, measures the strength of the association in the population.
• The sample correlation, r, measures the strength of the association in the sample.

We seldom know the value of ρ, so we typically estimate it with the value of r.

Note: The formula for the sample correlation r is

    r = (1 / (n − 1)) Σ_{i=1}^{n} [ (X_i − X̄) / s_X ] [ (Y_i − Ȳ) / s_Y ]

but we will never actually calculate r using this formula.

Properties of the correlation r

• The correlation r has the same sign as the slope b. Therefore, just like b, r indicates the direction in which X and Y are related: if r > 0 there is a positive relationship, and if r < 0 there is a negative relationship.
• The correlation r can only take values between −1 and +1; that is, −1 ≤ r ≤ +1. The closer r is to ±1, the closer the data points fall to a straight line, and the stronger the association. The closer r is to zero, the weaker the association.

Figure 3: Some scatterplots and their correlations.

Figure 4: An illustration of a data set that has correlation r = 0.

Note: The correlation is only useful for measuring relationships that are linear.

Figure 5: The data set above clearly shows that there is a relationship between X and Y, but the correlation r is still equal to 0 because the relationship is not linear. Thus, it is always wise to look at a scatterplot of the data first, to see whether the correlation will even be worth talking about.

3.2 The Roles and Limitations of the Correlation r

Why do we need the correlation? Why can't we use the slope b to describe the strength of the association? The reason is that the slope's numerical value depends on the units of measurement. Recall that the slope b tells us how much the Y value changes when the corresponding X value increases by 1 unit; the slope b always has the same units of measurement as the Y value. The correlation r, on the other hand, can be thought of as a standardized version of the slope which does not depend on any units of measurement. The standardization adjusts the slope b for the way it depends on the standard deviations of X and Y. Since the correlation r and the slope b are related by b = r (s_Y / s_X), equivalently,

    r = b (s_X / s_Y).

What Factors Affect the Correlation?

1. Just as outliers can greatly influence the regression equation, they can also greatly influence the value of r. Accordingly, the correlation may not be particularly useful when outliers are present.

2. If the subjects are grouped for the observations, the correlation r tends to increase in magnitude. This can be deceptive, because it can make X and Y appear to be more strongly linearly related than they actually are. For example, suppose we have a sample with 50 observations. Instead of treating our data in the usual way, we could instead make 10 groups of five, average the data within each group, and then treat our group averages like a new data set with just 10 observations. If we did this, we would probably increase the magnitude of the correlation (see the simulated sketch after this list).

Figure 6: The scatterplot displays the 50 individual data points; their correlation is r = 0.627.

Figure 7: The scatterplot displays the group means for 10 groups of 5 from the data used in Figure 6. The correlation now becomes r = 0.901.

3. The size of the correlation r also depends on the range of X values sampled: the correlation tends to be smaller when we sample only a restricted range of X values than when we use the entire range.

Figure 8: If we look at the complete data set presented in Figure 6, the correlation is r = 0.627. However, if we restrict attention to the data where X runs from 148 to 168, the correlation in that range is r = 0.596.
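Item 2 in the list above (grouping inflates the correlation) is easy to demonstrate with simulated data. The sketch below is purely illustrative and uses made-up data, not the data behind Figures 6 and 7; observations are sorted by X and averaged in consecutive groups of five, mimicking the 10-groups-of-5 construction described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 50 observations with a moderate linear relationship plus noise.
x = rng.uniform(140, 175, size=50)
y = 0.8 * x + rng.normal(0, 8, size=50)

r_individual = np.corrcoef(x, y)[0, 1]

# Sort by X, then average within 10 consecutive groups of 5 observations each.
order = np.argsort(x)
x_groups = x[order].reshape(10, 5).mean(axis=1)
y_groups = y[order].reshape(10, 5).mean(axis=1)

r_grouped = np.corrcoef(x_groups, y_groups)[0, 1]

# Averaging smooths out much of the vertical scatter within each group,
# so the grouped correlation is typically larger in magnitude.
print(f"correlation of individual points: {r_individual:.3f}")
print(f"correlation of group means:       {r_grouped:.3f}")
```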
PART IV - SUM OF SQUARES, ANOVA TABLE, AND MORE ON CORRELATION

4.1 Sum of Squares and ANOVA Table for Regression

We already saw that in one-way and two-way ANOVA we can construct something called the ANOVA table, which breaks down the variability in the data into different sources. It turns out that we can also construct an ANOVA table for simple linear regression to analyze the variability in the response variable (Y) values. The regression ANOVA table, like the ANOVA tables we've seen before, includes sums of squares, degrees of freedom, and mean squares.

A prediction error is the difference between an observed value Y_i and its predicted value. There are two types of prediction errors:

• The error from using the regression line to make a prediction is Y_i − Ŷ_i. This type of error leads to the variability in the residual sum of squares.
• The error from using the sample mean Ȳ to make a prediction is Y_i − Ȳ. This type of error leads to the variability in the total sum of squares.

There is also one more type of "error": the difference between the predicted value Ŷ_i and the sample mean Ȳ, which leads to the variability in the regression sum of squares.

From the three types of error mentioned above, it follows that the ANOVA table for regression involves three different sums of squares:

• The regression sum of squares, SS_Regr, measures the variability due to the regression equation. The regression equation gives us a predicted value Ŷ_i for each observation, so we measure the variability of those predicted values around the sample mean Ȳ using the formula

      SS_Regr = Σ_{i=1}^{n} (Ŷ_i − Ȳ)²

• The residual sum of squares, SS_Res, measures the variability of the actual observed Y_i values around their predicted values Ŷ_i. Its formula is

      SS_Res = Σ_{i=1}^{n} (Y_i − Ŷ_i)²

• The total sum of squares, SS_Total, measures the total variability of the Y_i values around the sample mean Ȳ. Since it describes the overall/total variability in the data, the total sum of squares is the sum of the other SS's. Its formula is

      SS_Total = Σ_{i=1}^{n} (Y_i − Ȳ)² = SS_Regr + SS_Res

As we have seen before, there are quantities called degrees of freedom associated with each sum of squares. Their formulas are

    df_Regr = 1,    df_Res = n − 2,    df_Total = n − 1 = df_Regr + df_Res

where n is the usual notation for the sample size. The mean squares are just the sums of squares divided by their degrees of freedom:

    MS_Regr = SS_Regr / df_Regr    and    MS_Res = SS_Res / df_Res

It turns out that MS_Res is a good estimate of σ², the variance of the population of Y values at each X value. We typically summarize all this information in the regression ANOVA table, which is laid out as follows:

  Source        df                 SS         MS
  Regression    df_Regr = 1        SS_Regr    MS_Regr = SS_Regr / df_Regr
  Error         df_Res = n − 2     SS_Res     MS_Res = SS_Res / df_Res
  Total         df_Total = n − 1   SS_Total

4.2 Coefficient of Determination

The better our regression equation is at making predictions (that is, the better its predictive power), the closer the observed values will be to their predicted values, and hence the smaller the residuals will be. We can quantify this predictive power using the coefficient of determination, R², defined as

    R² = SS_Regr / SS_Total

Recall that SS_Total = SS_Regr + SS_Res, so R² tells us what proportion of SS_Total comes from SS_Regr.

• If our regression equation has high predictive power, SS_Res is small. Then most of SS_Total comes from SS_Regr, so R² is close to 1.
• If our regression equation has poor predictive power, SS_Res is large. Then most of SS_Total comes from SS_Res instead of SS_Regr, so R² is close to 0.

It turns out that R² is equal to the square of the correlation r. Both the correlation r and the coefficient of determination R² describe the strength of the association, but their interpretations are a bit different. The correlation r falls between −1 and +1, and it governs the extent of "regression toward the mean." The R² measure falls between 0 and +1 (or equivalently, 0% to 100%), and it summarizes the reduction in the sum of squared errors from predicting Y using the regression line instead of using the mean of Y.

Least Squares Regression Equation as "Best Fit"

Notice from the formulas that SS_Res depends on what the predicted values actually are. Here is what we mean by the least squares regression equation being the "best fit" for the data: the least squares regression equation picks the values of a and b that make SS_Res, the residual sum of squares, minimal. The "best fit" is the one where the predicted values are as close as possible to the observed values, in the overall sense of SS_Res. In other words, changing the values of any of the regression coefficients would yield a larger SS_Res.

Note: We mean the regression equation is the "best fit" when using this particular X variable. We might be able to get a smaller SS_Res by using a completely different X variable or by using multiple X variables, but that is not what we are talking about here.
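The three sums of squares and R² above can be computed directly from the observed and predicted values. Here is a small sketch (my own illustration, assuming the Table 2 weight/height data; np.polyfit is just one convenient way to obtain a and b):

```python
import numpy as np

weight = np.array([130, 133, 157, 180, 160, 177, 193, 126, 179, 143, 157, 122,
                   145, 135, 117, 186, 170, 173, 192, 155, 177, 107, 138, 155, 201])
height = np.array([63, 65, 70, 72, 74, 73, 73, 61, 63, 77, 68, 65, 68, 67, 61,
                   73, 71, 71, 74, 64, 77, 59, 69, 70, 73])
n = len(height)

b, a = np.polyfit(weight, height, deg=1)   # least squares slope and intercept
y_hat = a + b * weight                     # predicted values
y_bar = height.mean()

ss_regr  = np.sum((y_hat - y_bar) ** 2)    # regression sum of squares
ss_res   = np.sum((height - y_hat) ** 2)   # residual sum of squares
ss_total = np.sum((height - y_bar) ** 2)   # total SS = SS_Regr + SS_Res

ms_regr = ss_regr / 1                      # df_Regr = 1
ms_res  = ss_res / (n - 2)                 # df_Res = n - 2; estimates sigma^2

r_squared = ss_regr / ss_total             # coefficient of determination
print(f"SS_Regr={ss_regr:.1f}, SS_Res={ss_res:.1f}, SS_Total={ss_total:.1f}")
print(f"MS_Regr={ms_regr:.1f}, MS_Res={ms_res:.2f}, R^2={r_squared:.3f}")
```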
Example (Smoking and Nicotine) The following table gives the cotinine level for people who smoke a certain number of cigarettes per day.

Table 3: Data for Smoking and Nicotine

  X (= Cigarettes per day)   Y (= Cotinine level)
            60                       179
            10                       283
             4                        76
            15                       174
            10                       209
             1                        10
            20                       350

MINITAB gives the following descriptive output for the data:

  Variable   N   Mean    StDev
  cigarett   7   17.14   19.94
  continin   7   183.0   115.5

Given that r = 0.263, find the least squares regression equation for the data. Interpret the slope and the intercept.

Solution:

Using the regression equation found above, find the best predicted cotinine level for a person who smokes 40 cigarettes per day. Interpret your prediction.

Solution:

MINITAB gives us the following ANOVA table:

  Analysis of Variance
  Source           DF   SS      MS
  Regression            5507
  Residual Error
  Total                 80040

Fill in all of the missing information to complete the ANOVA table.

The scatterplot for the data is presented below.

Figure 9: The scatterplot for the data from Table 3.

It is suspected that the first observation is an outlier. In practice, it is forbidden to deliberately censor data in order to make the data "prettier." However, just for this problem, let's delete the first observation and see what happens. The new data and the new scatterplot are presented below.

Table 4: New Data for Smoking and Nicotine After We Delete the First Observation

  X (= Cigarettes per day)   Y (= Cotinine level)
            10                       283
             4                        76
            15                       174
            10                       209
             1                        10
            20                       350

Figure 10: The new scatterplot for the data from Table 4, after we delete the first observation.

The new regression equation and ANOVA table are found to be:

  The regression equation is
  new continine = 25.7 + 15.8 new cigarettes

  Predictor   Coef     SE Coef   T      P
  Constant    25.65    53.30     0.48   0.655
  new ciga    15.802   4.499     3.51   0.025

  Analysis of Variance
  Source           DF   SS      MS
  Regression
  Residual Error        19596
  Total                 80021

Fill in the missing information to complete the ANOVA table. Interpret the new slope and the new intercept. Based on the given information, calculate and interpret the coefficient of determination R², and in turn find the correlation r. Compare the new r value to the r value given for Table 3.

Solution:
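One way to see the effect of the suspected outlier is simply to refit the line with and without the first observation. The sketch below is illustrative (it uses the Table 3 and Table 4 data) and should reproduce, approximately, the two MINITAB equations quoted above: roughly 157 + 1.52x with the outlier and 25.7 + 15.8x without it.

```python
import numpy as np

# Table 3: cigarettes per day (X) and cotinine level (Y).
# The first pair (60, 179) is the suspected outlier.
cigs = np.array([60, 10, 4, 15, 10, 1, 20])
cot  = np.array([179, 283, 76, 174, 209, 10, 350])

def fit(x, y):
    """Least squares slope and intercept, plus the correlation r."""
    b, a = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return a, b, r

a_all, b_all, r_all = fit(cigs, cot)           # full data (Table 3)
a_new, b_new, r_new = fit(cigs[1:], cot[1:])   # first observation removed (Table 4)

print(f"with outlier:    Y-hat = {a_all:.1f} + {b_all:.2f} X,  r = {r_all:.3f}")
print(f"without outlier: Y-hat = {a_new:.1f} + {b_new:.2f} X,  r = {r_new:.3f}")
```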
PART V - INFERENCE WITH REGRESSION FOR THE SLOPE β

We have spent a good deal of time studying the strength of the relationship between two quantitative variables X and Y. However, we must also ask whether the response variable Y depends on the explanatory variable X to begin with. Recall that this is the same as asking whether the slope β = 0 (or, sometimes, we can use the criterion ρ = 0 instead). In this section, we will see how to answer that question. So far we have familiarized ourselves with various hypothesis tests and CI methods, and those are the inferential tools we will be using here. First, we go through two types of hypothesis tests for β = 0 (or ρ = 0): the t-test and the F-test. Then we look at CIs for the slope β.

5.1 The t-test for β

1. Assumptions
   • Quantitative variables where the population means of Y at different values of X have a straight-line relationship with X; that is, µ_Y(X) = α + βX.
   • Simple random sample.
   • The population distribution of Y is approximately normal, with the same standard deviation at each X value.
   • The data contain no extreme outliers.

2. Hypotheses
   • Null: H0: β = 0.
   • Alternative: there are three possible alternative hypotheses.
     - H1: β ≠ 0, which gives a two-sided test.
     - H1: β < 0, which gives a one-sided test.
     - H1: β > 0, which gives a one-sided test.

3. Test statistic

       t0 = (b − 0) / se_b

   where se_b is the standard error of the coefficient b (provided by statistical software; if not, the formula is se_b = √( MS_Res / Σ(X_i − X̄)² )).

4. P-value
   Use the t-distribution with df = n − 2. Be careful: the P-value for a one-sided test is different from the P-value for the two-sided test, so it is important to identify which test is being performed in order to find the correct P-value.

5. Conclusion
   If we can find the P-value corresponding to our test statistic, then the conclusion based on the P-value is straightforward: smaller P-values give stronger evidence against H0. If a decision is needed, reject H0 if the P-value ≤ α, the significance level. However, most of the time the P-value can only be found using statistical software. If we cannot find the P-value when solving problems by hand, we can still draw a conclusion based solely on the test statistic. Given a significance level α:
   • For a one-sided test, reject H0 if t0 falls beyond the critical value in the direction of H1 (t0 > t_α for H1: β > 0, or t0 < −t_α for H1: β < 0).
   • For the two-sided test, reject H0 if |t0| > t_{α/2}.

Example (Smoking and Nicotine, cont.) MINITAB gives the following output for the data from Table 3.

  The regression equation is
  Continine = 157 + 1.52 Cigarettes

  Predictor   Coef     SE Coef   P
  Constant    156.95   62.98     0.055
  Cigarett    1.520    2.500     0.570

  S = 122.1   R-Sq = 6.9%   R-Sq(adj) = 0.0%

Test whether β = 0 using the one-sided test with H1: β > 0.

Solution:

5.2 The t-test for ρ

1. Assumptions
   • Quantitative variables where the population means of Y at different values of X have a straight-line relationship with X; that is, µ_Y(X) = α + βX.
   • Simple random sample.
   • The population distribution of Y is approximately normal, with the same standard deviation at each X value.
   • The data contain no extreme outliers.

2. Hypotheses
   • Null: H0: ρ = 0.
   • Alternative: there are three possible alternative hypotheses.
     - H1: ρ ≠ 0, which gives a two-sided test.
     - H1: ρ < 0, which gives a one-sided test.
     - H1: ρ > 0, which gives a one-sided test.

3. Test statistic

       t0 = (r − 0) / √( (1 − r²) / (n − 2) )

4. P-value
   Use the t-distribution with df = n − 2. Again, be careful to identify whether the test is one-sided or two-sided in order to find the correct P-value.

5. Conclusion
   As before, smaller P-values give stronger evidence against H0; if a decision is needed, reject H0 if the P-value ≤ α, the significance level. If we cannot find the P-value by hand, we can draw a conclusion from the test statistic: for a one-sided test, reject H0 if t0 falls beyond t_α in the direction of H1; for the two-sided test, reject H0 if |t0| > t_{α/2}.

Example (Smoking and Nicotine, cont.) Using the MINITAB output for the data from Table 3 given in Section 5.1, test whether ρ = 0 using the two-sided test.

Solution:
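The t-test in Section 5.1 only needs b, se_b, and the t-distribution with n − 2 degrees of freedom. Here is a sketch using the Table 3 numbers from the MINITAB output above (b = 1.520, se_b = 2.500, n = 7); scipy's t-distribution supplies the P-value. This is my own illustration of the calculation, not part of the notes.

```python
from scipy import stats

n, b, se_b = 7, 1.520, 2.500          # from the MINITAB output for Table 3
t0 = (b - 0) / se_b                   # test statistic
df = n - 2

# One-sided P-value for H1: beta > 0 (right tail); the two-sided P-value doubles it.
p_one_sided = stats.t.sf(t0, df)
p_two_sided = 2 * stats.t.sf(abs(t0), df)

print(f"t0 = {t0:.3f}, df = {df}")
print(f"one-sided P-value (H1: beta > 0): {p_one_sided:.3f}")
print(f"two-sided P-value (H1: beta != 0): {p_two_sided:.3f}")
```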
5.3 The F-test for β

1. Assumptions
   • Quantitative variables where the population means of Y at different values of X have a straight-line relationship with X; that is, µ_Y(X) = α + βX.
   • Simple random sample.
   • The population distribution of Y is approximately normal, with the same standard deviation at each X value.
   • The data contain no extreme outliers.

2. Hypotheses
   • Null: H0: β = 0.
   • Alternative: H1: β ≠ 0.

3. Test statistic

       F0 = MS_Regr / MS_Res

   Under H0, the sampling distribution of F0 is the F distribution with df1 = 1 and df2 = n − 2 (why?).

4. P-value
   Recall the definition of the P-value: the probability of getting a test statistic value at least as extreme as the one observed, if H0 is true. Here the P-value is a tail probability from the F distribution that the test statistic has when H0 is true. Remember that larger values of F give more evidence against H0, so the P-value is the probability of getting an F value larger than the one we actually got, if H0 is true. To calculate this probability exactly, we typically need statistical software.

5. Conclusion
   If we can find the P-value corresponding to our test statistic, the conclusion is straightforward: smaller P-values give stronger evidence against H0. If a decision is needed, reject H0 if the P-value ≤ α, the significance level. However, most of the time the P-value can only be found using statistical software.

The F-test for β is equivalent to the two-sided t-test for β. In fact, the three types of tests are equivalent to each other. Depending on what type of information is given, we can always choose the test that makes our lives easier.

Example (Smoking and Nicotine, cont.) MINITAB gives the following output for the data from Table 3.

  Analysis of Variance
  Source           DF   SS      MS      P
  Regression       1    5507    5507    0.570
  Residual Error   5    74533   14907
  Total            6    80040

Test whether β = 0 using the F-test.

Solution:

5.4 Confidence Intervals for the Slope

The regression F and t tests simply ask whether it is reasonable that β = 0. Instead, we might be interested in figuring out what all the reasonable values for β are. We do this by making a confidence interval. The formula for a 100(1 − α)% CI for β is

    b ± t_{α/2} · se_b

where se_b is the standard error of the coefficient b (provided by statistical software; if not, se_b = √( MS_Res / Σ(X_i − X̄)² )).

The standard interpretation of a 100(1 − α)% confidence interval for β is as follows: we are 100(1 − α)% confident that the true value of β is between (lower bound) and (upper bound). More specifically, remember that by "100(1 − α)% confident" we mean that if we could take millions of random samples from this same population, do a regression for each one, and calculate a 100(1 − α)% confidence interval for β each time, then 100(1 − α)% of those intervals would contain the true value of β.

Example (Smoking and Nicotine, cont.) Find and interpret the 99% CI for β based on our data from Table 3 (use the MINITAB output given in Section 5.1).

Solution:
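The confidence interval for β is just b ± t_{α/2}·se_b with df = n − 2. Below is a sketch using the same Table 3 output (b = 1.520, se_b = 2.500, n = 7) for the 99% interval asked about in the example; it is an illustration of the formula, not the worked solution.

```python
from scipy import stats

n, b, se_b = 7, 1.520, 2.500      # from the MINITAB output for Table 3
alpha = 0.01                      # 99% confidence level
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

lower = b - t_crit * se_b
upper = b + t_crit * se_b
print(f"t critical value = {t_crit:.3f}")
print(f"99% CI for beta: ({lower:.2f}, {upper:.2f})")
```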
PART VI - INFERENCE WITH REGRESSION AT A SPECIFIC X VALUE

For a straight-line regression model, we estimate µ_Y, the population mean of Y at a given value of X, by the least squares regression equation Ŷ = a + bX. How good is this estimate? We can find a 100(1 − α)% CI for the unknown population parameter µ_Y.

Furthermore, the estimate Ŷ = a + bX of the mean of Y at a fixed value of X is also a prediction of the outcome of Y for a particular subject at that value. How good is the prediction? We can find a 100(1 − α)% prediction interval (PI) for the unknown Y value of a subject at a particular value of X.

So what is the difference between the PI and the CI? The prediction interval for Y is an inference about where individual observations fall, whereas the confidence interval for µ_Y is an inference about where a population mean falls. Use a prediction interval for Y if we want to predict where a single observation of Y will fall. Use a confidence interval for µ_Y if we want to estimate the mean of Y for all subjects having a particular X value.

• For large samples with an X value equal to or close to the mean X̄, the 100(1 − α)% PI for Y is approximately

      Ŷ ± t_{α/2} · √(MS_Res)

  Keep in mind that this is just an approximate formula for large samples. The exact formula is

      Ŷ ± t_{α/2} · √( MS_Res · [ 1 + 1/n + n(X − X̄)² / ( n·ΣX_i² − (ΣX_i)² ) ] )

• The 100(1 − α)% CI for µ_Y (again an approximation, for large samples with X at or close to X̄) is

      Ŷ ± t_{α/2} · √( MS_Res / n )

Example (Smoking and Nicotine, cont.) Recall the following table, which gives the cotinine level for people who smoke a certain number of cigarettes per day.

Table 5: Data for Smoking and Nicotine

  X (= Cigarettes per day)   Y (= Cotinine level)
            60                       179
            10                       283
             4                        76
            15                       174
            10                       209
             1                        10
            20                       350

We are given that ΣX_i = 120, ΣX_i² = 4442, and X̄ = 17.14. Moreover, from the ANOVA table we know that MS_Res = 14907.

• Find the predicted value Ŷ for X = 34.
• Find and interpret the 95% PI for Y at X = 34.
• Find and interpret the 95% CI for the population mean µ_Y.

Solution:

Suggested Problems
Chapter 12: 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 15, 16, 17, 18, 19, 20, 23, 24, 26, 32, 33, 34, 35, 36, 37, 38, 39, 43, 44, 45, 46, 47, 49, 50, 51, 52, 53, 63, 64, 65, 66, 67, 68, 69, 71, 72, 73, 74, 75, 76, 81, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106.

This is the end of chapter 5. Cheers.

© Quan Tran - Summer 2009