Chapter 13: Simple Linear Regression
Solutions to End-of-Section and Chapter Review Problems

13.73 (h) [Figure: E.R.A. residual plot (residuals versus E.R.A.)]
Based on a visual inspection of the distribution of studentized residuals and the plot of residuals versus E.R.A., there is no pattern. The model appears to be adequate.
(i) p-value = 7.28222E-05 < 0.05. Reject \(H_0\). There is evidence that the fitted linear regression model is useful.
(j) \(81.5457 < \mu_{Y|X} < 88.2396\)
(k) \(68.8964 < Y_I < 100.8889\)
(l) \(-21.7459 < \beta_1 < -8.4395\)
(m) The "population" might be considered to be all the recent years in which baseball has been played.
(n) Other independent variables that might be considered for inclusion in the model are (i) runs scored, (ii) hits allowed, (iii) walks allowed, (iv) number of errors, etc.

13.74 (a) [Figure: scatter diagram of price per person versus sum of ratings]
(b) \(\hat{Y} = -24.2468 + 1.0464X\)
(c) Since no restaurant will receive a summated rating of 0, it is not meaningful to interpret \(b_0\). For each additional unit of summated rating, the estimated average price per person increases by $1.0464.
(d) \(\hat{Y} = -24.2468 + 1.0464(50) = 28.07\), that is, $28.07.
(e) \(S_{YX} = 6.0352\)
(f) \(r^2 = 0.6584\). 65.84% of the variation in price per person can be explained by the variation in summated rating.
(g) \(r = +\sqrt{r^2} = \sqrt{0.6584} = 0.8114\)
(h) [Figure: summated rating residual plot (residuals versus summated rating)]
Based on a visual inspection of the residual plot, there may be a nonlinear relationship between price per person and summated rating. A quadratic model may be more appropriate for the data.
(i) The p-value is virtually 0. Reject \(H_0\). There is very strong evidence to conclude that there is a linear relationship between price per person and summated rating.
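The arithmetic behind parts (d) and (g) of 13.74 can be checked with a short sketch. The coefficients and \(r^2\) are the values reported above; the function and variable names are illustrative assumptions, not part of the original solution.

```python
import math

# Coefficients reported in 13.74 (b); names are illustrative only.
B0 = -24.2468
B1 = 1.0464


def predict_price(summated_rating: float) -> float:
    """Estimated average price per person for a given summated rating."""
    return B0 + B1 * summated_rating


def correlation_from_r2(r_squared: float, slope: float) -> float:
    """r has magnitude sqrt(r^2) and takes the sign of the fitted slope."""
    return math.copysign(math.sqrt(r_squared), slope)


price_at_50 = predict_price(50)          # part (d)
r = correlation_from_r2(0.6584, B1)      # part (g)
print(round(price_at_50, 2), round(r, 4))
```

Taking the sign of r from the slope matters when the fitted line slopes downward, since the square root alone is always non-negative.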
(j) \(\$26.55 < \mu_{Y|X} < \$29.59\)
(k) \(\$16.00 < Y_I < \$40.15\)
(l) \(0.8953 < \beta_1 < 1.1975\)
(m) The linear regression model appears to have provided an adequate fit and shown a significant linear relationship between price per person and summated rating. Since 65.84% of the variation in price per person can be explained by the variation in summated rating, summated rating is moderately useful in predicting price per person. Given the parabolic pattern in the summated rating residual plot, a quadratic regression model may perform better in predicting price per person from summated rating.

13.75 (a) [Figure: scatter plot of sales versus income]
(b) \(b_0 = 299876.8059\), \(b_1 = 39.1698\), \(\hat{Y} = 299876.8059 + 39.1698X\)
(c) Since the median family income of the customer base cannot be 0, \(b_0\) just captures the portion of the latest one-month sales total that varies with factors other than income. \(b_1 = 39.1698\) means that as the median family income of the customer base increases by one dollar, the estimated average latest one-month sales total increases by $39.17.
(d) \(S_{YX} = 849860.17\)
(e) \(r^2 = 0.1472\). 14.72% of the total variation in the franchise's latest one-month sales total can be explained by the median family income of the customer base.
(f) \(r = +\sqrt{r^2} = 0.3837\). There is a positive but not very strong linear relationship between latest one-month sales total and median family income of the customer base.
(g) [Figure: income residual plot (residuals versus income)]
There is a slight increase in the variance of the residuals at the higher end of median family income. In general, however, the assumption of homoscedasticity seems to be intact.
[Figure: histogram of residuals with cumulative percentage]
The histogram does not suggest severe asymmetry or abnormally extreme observations, so the normality assumption also seems to be intact.
(h) \(H_0: \rho = 0\), \(H_1: \rho \ne 0\)
Test statistic: \(t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = 2.4926\)
Decision rule: Reject \(H_0\) when \(|t| > 2.0281\).
Decision: Since t = 2.4926 is above the upper critical bound 2.0281, reject \(H_0\). There is enough evidence to conclude that there is a linear relationship between one-month sales total and median income of the customer base.
(i) \(b_1 \pm t_{n-2} S_{b_1} = 39.1697 \pm 2.0281(15.7143)\), so \(7.2995 < \beta_1 < 71.0400\)

13.75 (j) (a) [Figure: scatter plot of sales versus age]
(b) \(\hat{Y} = 931626.16 + 21782.76X\)
(c) Since the median age of the customer base cannot be 0, \(b_0\) just captures the portion of the latest one-month sales total that varies with factors other than age. \(b_1 = 21782.76\) means that as the median age of the customer base increases by one year, the estimated average latest one-month sales total increases by $21782.76.
(d) \(S_{YX} = 919492.84\)
(e) \(r^2 = 0.0017\). Only 0.17% of the total variation in the franchise's latest one-month sales total can be explained by the median age of the customer base.
(f) \(r = +\sqrt{r^2} = 0.0413\). There is essentially no linear relationship between latest one-month sales total and median age of the customer base.
(g) [Figure: age residual plot (residuals versus age)]
The residuals are very evenly spread out across the range of median age.
(h) \(H_0: \rho = 0\), \(H_1: \rho \ne 0\)
Test statistic: \(t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = 0.2482\)
Decision rule: Reject \(H_0\) when \(|t| > 2.0281\).
Decision: Since t = 0.2482 is below the upper critical bound 2.0281, do not reject \(H_0\).
There is not enough evidence to conclude that there is a linear relationship between one-month sales total and median age of the customer base.
(i) \(b_1 \pm t_{n-2} S_{b_1} = 21782.76354 \pm 2.0281(87749.63)\), so \(-156181.50 < \beta_1 < 199747.02\)

(k) (a) [Figure: scatter diagram of sales versus HS (percentage of customer base with a high school diploma)]
There appears to be some positive linear relationship between total sales and percentage of customer base with a high school diploma.
(b) \(\hat{Y} = -2969741.23 + 59660.09X\)
(c) Since the percentage of the customer base with a high school diploma cannot be 0, \(b_0\) just captures the portion of the latest one-month sales total that varies with factors other than HS. \(b_1 = 59660.09\) means that for each additional percentage point of the customer base with a high school diploma, the estimated average latest one-month sales total increases by $59660.09.
(d) \(S_{YX} = 802003.81\)
(e) \(r^2 = 0.2405\). 24.05% of the total variation in the franchise's latest one-month sales total can be explained by the percentage of customer base with a high school diploma.
(f) \(r = +\sqrt{r^2} = 0.4904\). There is some positive linear relationship between latest one-month sales total and percentage of customer base with a high school diploma.
(g) [Figure: HS residual plot (residuals versus HS)]
The residual plot suggests there might be a violation of the homoscedasticity assumption, since the variance of the residuals increases as the percentage of customer base with a high school diploma increases.
(h) \(H_0: \rho = 0\), \(H_1: \rho \ne 0\)
Test statistic: \(t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = 3.3766\)
Decision rule: Reject \(H_0\) when \(|t| > 2.0281\).
Decision: Since t = 3.3766 is above the upper critical bound 2.0281, reject \(H_0\). There is enough evidence to conclude that there is a linear relationship between one-month sales total and percentage of customer base with a high school diploma.
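The part (h) tests in 13.75 all use the same statistic, so a small helper can reproduce them from the reported \(r^2\) values. This is a minimal sketch: the degrees of freedom (36) are an assumption inferred from the 2.0281 multiplier in the part (i) intervals, since n is not stated in the solutions, and the function name is illustrative. Results differ slightly from the printed t values because \(r^2\) is rounded to four decimals.

```python
import math


def t_from_r(r: float, df: int) -> float:
    """t statistic for H0: rho = 0, where df = n - 2."""
    return r / math.sqrt((1.0 - r * r) / df)


DF = 36          # assumption: inferred from the 2.0281 multiplier, not stated in the text
T_CRIT = 2.0281  # two-tail critical value used in the part (i) intervals

# High school diploma model: r^2 = 0.2405, slope positive so r is positive.
t_hs = t_from_r(math.sqrt(0.2405), DF)
# Age model: r^2 = 0.0017, slope positive.
t_age = t_from_r(math.sqrt(0.0017), DF)

reject_hs = abs(t_hs) > T_CRIT
reject_age = abs(t_age) > T_CRIT
print(round(t_hs, 2), reject_hs, round(t_age, 2), reject_age)
```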
(i) \(b_1 \pm t_{n-2} S_{b_1} = 59660.09 \pm 2.0281(17668.885)\), so \(23825.98 < \beta_1 < 95494.21\)

(l) (a) [Figure: scatter diagram of sales versus College (percentage of customer base with a college diploma)]
There is a positive linear relationship between total sales and percentage of customer base with a college diploma.
(b) \(\hat{Y} = 789847.38 + 35854.15X\)
(c) Since the percentage of the customer base with a college diploma cannot be 0, \(b_0\) just captures the portion of the latest one-month sales total that varies with factors other than College. \(b_1 = 35854.15\) means that for each additional percentage point of the customer base with a college diploma, the estimated average latest one-month sales total increases by $35854.15.
(d) \(S_{YX} = 871329.93\)
(e) \(r^2 = 0.1036\). 10.36% of the total variation in the franchise's latest one-month sales total can be explained by the percentage of customer base with a college diploma.
(f) \(r = +\sqrt{r^2} = 0.3218\). There is some positive linear relationship between latest one-month sales total and percentage of customer base with a college diploma.
(g) [Figure: College residual plot (residuals versus College)]
The residuals are quite evenly spread out around zero, even though there might be a slight tendency for the variance to increase as the percentage of customer base with a college diploma increases.
(h) \(H_0: \rho = 0\), \(H_1: \rho \ne 0\)
Test statistic: \(t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = 2.0392\)
Decision rule: Reject \(H_0\) when \(|t| > 2.0281\).
Decision: Since t = 2.0392 is above the upper critical bound 2.0281, reject \(H_0\). There is enough evidence to conclude that there is a linear relationship between one-month sales total and percentage of customer base with a college diploma.
(i) \(b_1 \pm t_{n-2} S_{b_1} = 35854.15 \pm 2.0281(17582.269)\), so \(195.75 < \beta_1 < 71512.60\)

13.75 (m) (a)
[Figure: scatter diagram of sales versus Growth (annual population growth rate of customer base)]
It is not obvious that there is any linear relationship between total sales and annual population growth rate of the customer base over the past 10 years.
(b) \(\hat{Y} = 1595571.48 + 26833.54X\)
(c) \(b_0 = 1595571\) means the estimated average latest one-month sales total is $1595571 when the annual population growth rate of the customer base over the past 10 years is zero. \(b_1 = 26833.54\) means that as the annual population growth rate increases by 1 percentage point, the estimated average latest one-month sales total increases by $26833.54.
(d) \(S_{YX} = 914466.58\)
(e) \(r^2 = 0.0126\). Only 1.26% of the total variation in the franchise's latest one-month sales total can be explained by the annual population growth rate of the customer base over the past 10 years.
(f) \(r = +\sqrt{r^2} = 0.1122\). If there is any linear relationship between latest one-month sales total and annual population growth rate of the customer base over the past 10 years, it is a very weak positive relationship.
(g) [Figure: Growth residual plot (residuals versus Growth)]
There seems to be a diamond-shaped pattern in the residuals and, hence, a violation of the homoscedasticity assumption. The variance is larger the closer the growth rate is to zero.
(h) \(H_0: \rho = 0\), \(H_1: \rho \ne 0\)
Test statistic: \(t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = 0.6776\)
Decision rule: Reject \(H_0\) when \(|t| > 2.0281\).
Decision: Since t = 0.6776 is below the upper critical bound 2.0281, do not reject \(H_0\). There is not enough evidence to conclude that there is a linear relationship between one-month sales total and the annual population growth rate of the customer base over the past 10 years.
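The slope test and the interval estimate of the slope are two views of the same calculation: the test fails to reject exactly when the interval contains zero. The sketch below illustrates this with the growth-rate model, using the slope 26833.54, the standard error 39601.427 and the multiplier 2.0281 reported for this problem. The function name is an illustrative assumption, and the bounds differ from the printed ones in the last digits because the multiplier is rounded.

```python
def slope_inference(b1: float, s_b1: float, t_crit: float):
    """Return the t statistic b1 / s_b1 and the interval b1 +/- t_crit * s_b1."""
    t_stat = b1 / s_b1
    half_width = t_crit * s_b1
    return t_stat, (b1 - half_width, b1 + half_width)


# Growth-rate model values as reported in 13.75 (m).
t_stat, (lower, upper) = slope_inference(26833.54, 39601.427, 2.0281)

# An interval that straddles zero agrees with |t| below the critical value.
contains_zero = lower < 0.0 < upper
print(round(t_stat, 4), round(lower, 2), round(upper, 2), contains_zero)
```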
(i) \(b_1 \pm t_{n-2} S_{b_1} = 26833.54 \pm 2.0281(39601.427)\), so \(-53481.77 < \beta_1 < 107148.86\)
(n) Percentage of customer base with a high school diploma is the best predictor of sales at an individual store location, since it provides the highest explanatory power (24.05%) among the five models.

13.76 (a) [Figure: scatter diagram of % passing versus % attendance]
There is a very obvious positive relationship between the percentage of students passing the proficiency test and the daily average of the percentage of students attending class.
(b) \(b_0 = -771.5868\), \(b_1 = 8.8447\), \(\hat{Y} = -771.5868 + 8.8447X\)
(c) Since it does not make sense for % attendance to be zero, \(b_0 = -771.5868\) should be interpreted as the portion of % passing the proficiency exam that varies with factors other than % attendance. \(b_1 = 8.8447\) implies that as the daily average percentage of students attending class increases by 1 percentage point, the estimated average percentage of students passing the ninth-grade proficiency test increases by 8.8447 percentage points.
(d) \(S_{YX} = 10.5787\)
(e) \(r^2 = 0.6024\). 60.24% of the total variation in % passing the proficiency test can be explained by % attendance.
(f) \(r = +\sqrt{r^2} = 0.7762\). There is a rather strong positive linear relationship between the percentage of students passing the proficiency test and the daily average of the percentage of students attending class.
(g) [Figure: % attendance residual plot (residuals versus % attendance)]
The residuals are evenly distributed across the range of % attendance. There is no obvious violation of the homoscedasticity assumption.
[Figure: histogram of residuals with cumulative percentage]
The distribution of the residuals is left skewed. However, with the exception of 2 extremely negative residuals, the histogram is not too badly skewed.
(h) \(H_0: \rho = 0\), \(H_1: \rho \ne 0\)
Test statistic: \(t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = 8.2578\)
Decision rule: Reject \(H_0\) when \(|t| > 2.014\).
Decision: Since t = 8.2578 is above the upper critical bound 2.014, reject \(H_0\). There is enough evidence to conclude that there is a linear relationship between % passing and % attendance.
(i) \(b_1 \pm t_{n-2} S_{b_1} = 8.8447 \pm 2.0141(1.0711)\), so \(6.6874 < \beta_1 < 11.0020\)

13.76 (j) (a) [Figure: scatter diagram of % passing versus salary]
There seems to be a slightly positive relationship between the percentage of students passing the proficiency test and average teacher salary.
(b) \(\hat{Y} = 23.065 + 0.0011X\)
(c) Since it does not make sense for teacher salary to be zero, \(b_0 = 23.065\) should be interpreted as the portion of % passing the proficiency exam that varies with factors other than teacher salary. \(b_1 = 0.0011\) implies that as average teacher salary increases by $1, the estimated average percentage of students passing the ninth-grade proficiency test increases by 0.0011 percentage points.
(d) \(S_{YX} = 16.3755\)
(e) \(r^2 = 0.0474\). Only 4.74% of the total variation in % passing the proficiency test can be explained by average teacher salary.
(f) \(r = +\sqrt{r^2} = 0.2177\). There seems to be a rather weak positive linear relationship between the percentage of students passing the proficiency test and average teacher salary.
(g) [Figure: salary residual plot (residuals versus salary)]
The residuals are evenly distributed across the range of average teacher salary. There is no obvious violation of the homoscedasticity assumption.
[Figure: histogram of residuals with cumulative percentage]
The distribution of the residuals is slightly left skewed but not too far from a normal distribution.
(h) \(H_0: \rho = 0\), \(H_1: \rho \ne 0\)
Test statistic: \(t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = 1.496\)
Decision rule: Reject \(H_0\) when \(|t| > 2.014\).
Decision: Since t = 1.496 is below the upper critical bound 2.014, do not reject \(H_0\). There is not enough evidence to conclude that there is a linear relationship between % passing and average teacher salary.
(i) \(b_1 \pm t_{n-2} S_{b_1} = 0.0011 \pm 2.0141(0.00073)\), so \(-0.000375 < \beta_1 < 0.002542\)

13.76 (k) (a) [Figure: scatter diagram of % passing versus spending]
There seems to be a slightly positive relationship between the percentage of students passing the proficiency test and spending per pupil.
(b) \(\hat{Y} = 35.7843 + 0.0109X\)
(c) Since it does not make sense for spending per pupil to be zero, \(b_0 = 35.7843\) should be interpreted as the portion of % passing the proficiency exam that varies with factors other than spending. \(b_1 = 0.0109\) implies that as spending per pupil increases by $1, the estimated average percentage of students passing the ninth-grade proficiency test increases by 0.0109 percentage points.
(d) \(S_{YX} = 15.9984\)
(e) \(r^2 = 0.0907\). Only 9.07% of the total variation in % passing the proficiency test can be explained by spending per pupil.
(f) \(r = +\sqrt{r^2} = 0.3012\). There seems to be a rather weak positive linear relationship between the percentage of students passing the proficiency test and spending per pupil.
(g) [Figure: spending residual plot (residuals versus spending)]
The residuals are evenly distributed across the range of spending per pupil. There is no obvious violation of the homoscedasticity assumption.
[Figure: histogram of residuals with cumulative percentage]
The distribution of the residuals is slightly left skewed. Excluding the largest negative residual, the histogram is quite symmetric.
(h) \(H_0: \rho = 0\), \(H_1: \rho \ne 0\)
Test statistic: \(t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = 2.1192\)
Decision rule: Reject \(H_0\) when \(|t| > 2.014\).
Decision: Since t = 2.1192 is above the upper critical bound 2.014, reject \(H_0\).
There is enough evidence to conclude that there is a linear relationship between % passing and spending.
(i) \(b_1 \pm t_{n-2} S_{b_1} = 0.0109 \pm 2.0141(0.0052)\), so \(0.00054 < \beta_1 < 0.02129\)
(l) The model with % attendance is the best model to use for predicting % passing, since it has the highest \(r^2\), 60.24%.

13.77 (a) \(\hat{Y} = 0.2141 + 0.4738X\)
(b) For Exxon, the estimated value of its stock will increase by 0.47% on average when the S&P 500 index increases by 1%.
(c) (a) \(\hat{Y} = 0.6064 + 0.3607X\)
(b) For Mobil Oil, the estimated value of its stock will increase by 0.36% on average when the S&P 500 index increases by 1%.
(d) (a) \(\hat{Y} = -0.2101 + 0.2068X\)
(b) For International Aluminum, the estimated value of its stock will increase by 0.21% on average when the S&P 500 index increases by 1%.
(e) (a) \(\hat{Y} = -0.4182 + 0.5131X\)
(b) For Sears, the estimated value of its stock will increase by 0.51% on average when the S&P 500 index increases by 1%.
(f) (a) \(\hat{Y} = -0.4471 + 1.4286X\)
(b) For BancOne Corporation, the estimated value of its stock will increase by 1.43% on average when the S&P 500 index increases by 1%.
(g) (a) \(\hat{Y} = 0.0361 + 0.9430X\)
(b) For General Motors, the estimated value of its stock will increase by 0.94% on average when the S&P 500 index increases by 1%.

13.78 (a)
        GM         Ford       IAL        HCR
GM      1
Ford    0.856954   1
IAL     -0.06409   0.194917   1
HCR     -0.4403    -0.08868   0.598569   1
(b) There is a strong positive correlation of r = 0.8570 between the stock prices of GM and Ford, a relatively strong positive correlation of r = 0.5986 between the prices of IAL and HCR, a moderately negative correlation of r = -0.4403 between the prices of GM and HCR, a weak positive correlation of r = 0.1949 between the prices of Ford and IAL, a very weak negative correlation of r = -0.0887 between the prices of Ford and HCR, and a very weak negative correlation of r = -0.0641 between the prices of GM and IAL.
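The ordering used in 13.78 (b), from the strongest to the weakest relationship, amounts to sorting the pairs by the absolute value of r. A minimal sketch, using the six coefficients from the matrix in part (a); the dictionary layout and variable names are illustrative assumptions.

```python
# Pairwise correlations from the matrix in 13.78 (a).
CORR = {
    ("GM", "Ford"): 0.856954,
    ("GM", "IAL"): -0.06409,
    ("Ford", "IAL"): 0.194917,
    ("GM", "HCR"): -0.4403,
    ("Ford", "HCR"): -0.08868,
    ("IAL", "HCR"): 0.598569,
}

# Strength of a linear relationship is judged by |r|; the sign gives the direction.
ranked = sorted(CORR.items(), key=lambda item: abs(item[1]), reverse=True)

for (stock_a, stock_b), r in ranked:
    direction = "positive" if r > 0 else "negative"
    print(f"{stock_a}-{stock_b}: r = {r:+.4f} ({direction})")
```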
(c) It is not a good idea to have all the stocks in an individual's portfolio be strongly positively correlated, because that will increase the variance of the portfolio, which is a measure of its risk. Some negatively correlated stock prices in a portfolio can reduce the combined variance, though at the price of a reduced combined expected return.

13.79 (a) \(r = \dfrac{SSXY}{\sqrt{SSX \cdot SSY}} = -0.2196\)
(b) \(H_0: \rho = 0\), \(H_1: \rho \ne 0\)
Test statistic: \(t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = -1.7146\)
Decision rule: Reject \(H_0\) if the p-value is less than the level of significance \(\alpha = 0.05\).
Decision: Since the p-value of 0.0918 is greater than the level of significance, do not reject the null hypothesis. There is not enough evidence to conclude that there is a linear relationship between the two variables.
(c) The 10% level of significance is greater than the p-value of 0.0918 and, hence, the null hypothesis will be rejected. There is enough evidence at the 10% level of significance to conclude that there is a linear relationship between the two variables.

The Springville Herald Case

SH13.1 The method placed too much emphasis on the last two time periods instead of considering the large amount of data that was available. This overemphasis would tend to lead to large fluctuations in forecast accuracy whenever the trend differed from what had occurred in the last two months.

SH13.2 There are many factors that might be considered other than the number of outlets. Among them are:
1. The amount of newspaper advertising for new subscriptions.
2. The amount of radio and television advertising for new subscriptions.
3. Whether or not special promotional campaigns were being used in a particular month.
4. The average experience of the telemarketers used in a given month.
5. The training programs provided to the telemarketers in a given month.
SH13.3 (a) The statistical model initially fit to the data is a simple linear model that predicts
New Subscriptions = -413.82 + 4.40795 Telemarketing Hours
The Y intercept (\(b_0\)), equal to -413.82, has no direct interpretation since a negative number of new subscriptions is not feasible. We can say, however, that -413.82 represents a constant portion of the predicted average new subscriptions that varies with factors other than the number of telemarketing hours. The slope (\(b_1\)), equal to 4.40795, can be interpreted to mean that for each increase of one telemarketing hour, average new subscriptions are predicted to increase by 4.40795 per month. The \(r^2\) value of 0.8617 can be interpreted to mean that 86.17% of the variation in new subscriptions can be explained by variation in the number of telemarketing hours from month to month.
Since the data were collected for 24 consecutive months, it is important to determine whether there is any autocorrelation among the residuals. The Durbin-Watson D statistic of 1.7009 is greater than 1.45, the upper critical value for n = 24 and \(\alpha = 0.05\). This leads us to believe that there is no positive autocorrelation among the residuals. We could also examine the plot of the standardized residuals over time to see whether a pattern exists. In this case, the plot does not seem to indicate any evidence of positive association among consecutive residuals.
Before moving further with any predictions based on the model, we need to determine whether the simple linear model is appropriate for these data. The residual plot of the standardized residuals versus the fitted Y values (or the X values) indicates no evidence of any pattern. Thus, we may conclude that the simple linear model is appropriate for these data.
We may examine the validity of the assumption of normality among the residuals by studying the normal probability plot.
If the points in this plot fall on an approximately straight line, there is no reason to suspect a serious departure from normality. Since that seems to be the case here, the normality assumption does not appear to have been seriously violated.
(b) For X = 1,000, the predicted value of New Subscriptions = -413.82 + 4.40795(1,000) = 3,994.1.
(c) We note that the value of 2,000 for X is beyond the range of our X values. Thus, we would be assuming that the regression model fit for a range of 704 to 1,498 telemarketing hours would be valid for a month in which the telemarketing hours were 2,000. This represents a situation that is well beyond the range of the number of telemarketing hours that we have used in the past 24 months. Many things could change with this increase in telemarketing hours, and the type of increase in new subscriptions that we have had with expanding telemarketing hours may not continue when we further expand by another 25% above the maximum amount that we have used in the past.
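The contrast between parts (b) and (c) can be made explicit by having the prediction routine report whether X lies inside the observed range. This is a minimal sketch using the fitted equation and the 704 to 1,498 hour range quoted above; the function and constant names are illustrative assumptions.

```python
# Fitted model and observed X range as reported in SH13.3.
B0 = -413.82
B1 = 4.40795
X_MIN, X_MAX = 704, 1498


def predict_new_subscriptions(hours: float):
    """Return (predicted new subscriptions, True if hours is within the observed range)."""
    prediction = B0 + B1 * hours
    within_range = X_MIN <= hours <= X_MAX
    return prediction, within_range


pred_1000, ok_1000 = predict_new_subscriptions(1000)   # interpolation, part (b)
pred_2000, ok_2000 = predict_new_subscriptions(2000)   # extrapolation, part (c)

print(round(pred_1000, 1), ok_1000)
print(round(pred_2000, 1), ok_2000)
```

The model still returns a number for 2,000 hours; the flag only signals that the fitted line has no data to support it there.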