
                                        CHAPTER 16
                           MULTIPLE REGRESSION AND CORRELATION

SECTION EXERCISES
16.1 d/p/e Simple linear regression involves only one independent variable; multiple regression involves
two or more independent variables. Multiple regression analysis is preferred whenever two or more
independent variables affect the dependent variable.

16.2 d/p/e As with simple regression analysis, multiple regression analysis is used in determining and
interpreting the linear relationship between the dependent and independent variables. Correlation analysis
measures the strength of the relationship.

16.3 d/p/m Many variables could affect the annual household expenditure for auto maintenance and
repair: the number of cars owned, the number of miles driven each year, the age(s) of the car(s), the
make(s) of the car(s). These are just a few of the many variables that could have a notable effect.

16.4 d/p/m The director may wish to examine the personnel file for the following variables: the number of
vacation days taken last year, the number of personal days taken last year, the times late to work last year,
the number of conferences scheduled with the employee's superior, and the number of days called in sick
the previous year.

16.5 d/p/e The multiple regression model is:
yi = β0 + β1x1i + β2x2i + ... + βkxki + εi   where
yi = a value of the dependent variable, y
β0 = a constant
x1i, x2i, ..., xki = values of the independent variables x1, x2, ..., xk
β1, β2, ..., βk = partial regression coefficients for independent variables x1, x2, ..., xk
εi = random error, or residual

16.6 d/p/m In terms of the residual component of the model, the assumptions underlying multiple
regression are:
1. For any given set of values for the independent variables, the population of residuals will be normally
   distributed with a mean of zero and a standard deviation of σε.
2. The standard deviation of the error terms is the same regardless of the combination of values taken on
   by the independent variables.
3. The error terms are statistically independent from each other.

16.7 d/p/m When there are two independent variables, the regression equation can be thought of in terms
of a geometric plane. When there are three or more independent variables, the regression equation
becomes a mathematical entity called a hyperplane; it is impossible to visually summarize a regression
with three or more independent variables because it will be in four or more dimensions.

16.8 c/a/e
a. The y-intercept, or constant term, is 100. The partial regression coefficient for x1 is 20; for x2, -3; and,
   for x3, 120.
b. The estimated value of y is ŷ = 100 + 20(12) - 3(5) + 120(10) = 1525.
c. If x3 were to increase by 4, the value of ŷ would increase by 120(4) = 480. To offset this increase, x2
   would have to increase by 480/3 = 160.
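
A quick numerical check of parts (b) and (c) can be scripted; the following Python sketch (not part of
the original solution) simply evaluates the estimated equation from part (a).

def y_hat(x1, x2, x3):
    # Estimated regression equation from part (a)
    return 100 + 20 * x1 - 3 * x2 + 120 * x3

print(y_hat(12, 5, 10))                 # 1525, as in part (b)

# Part (c): raising x3 by 4 adds 120(4) = 480 to the estimate;
# raising x2 by 480/3 = 160 removes 3(160) = 480, leaving the estimate unchanged.
print(y_hat(12, 5 + 160, 10 + 4))       # 1525 again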




16.9 p/a/e
a. The y-intercept is 300; the partial regression coefficients are 7 for x1 and 13 for x2.
b. If 3 people live in a 6-room home, the estimated bill is ŷ = 300 + 7(3) + 13(6) = 399.

16.10 p/a/e
a. The y-intercept or constant term is -0.1; this is the estimated total operating cost (in millions of dollars)
   when there is no labor cost and no power cost. (Note: it is very unlikely that a plant ever operates
   without incurring either labor or power costs; this estimate is very suspect. We must be careful when
   making estimates based on x values that lie beyond the range of the underlying data.) The partial
   regression coefficient for the labor cost is 1.1; this indicates that, for a given level of electric cost,
   the estimated operating cost will increase by $1.10 for each additional $1 incurred in labor costs.
   The partial regression coefficient for the electric power cost is 2.8; this indicates that, for a given level
   of labor cost, the estimated operating cost will increase by $2.80 for each additional $1 increase in
   electric power cost.
b. If labor costs $6 million and electric power costs $0.3 million, the estimated annual cost to operate the
   plant is: ŷ = -0.1 + 1.1(6) + 2.8(0.3) = $7.34 million.

16.11 p/c/m The Minitab printout is shown below.
Regression Analysis: Visitors versus AdSize, Discount

The regression equation is Visitors = 10.7 + 2.16 AdSize + 0.0416 Discount

Predictor          Coef      SE Coef             T         P
Constant         10.687        3.875          2.76     0.040
AdSize           2.1569       0.6281          3.43     0.019
Discount        0.04157      0.04380          0.95     0.386

S = 3.375         R-Sq = 71.6%        R-Sq(adj) = 60.3%

Analysis of Variance
Source            DF             SS            MS           F         P
Regression         2         143.92         71.96        6.32     0.043
Residual Error     5          56.95         11.39
Total              7         200.87

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                      95.0% PI
1         24.59       1.74   (   20.12,   29.06)         (     14.83,   34.35)

Values of Predictors for New Observations
New Obs    AdSize Discount
1            5.00      75.0


a. The regression equation is Visitors = 10.687 + 2.1569*AdSize + 0.04157*Discount.
b. The y-intercept indicates that about 10 or 11 visitors (10.687) would come to the clubs if there were
   neither ads nor discounts. The partial regression coefficient for the ad data indicates that, holding the
   level of the discount constant, increasing the ad size by one column inch will bring in about 2 new
   visitors (2.1569). Finally, the partial regression coefficient for the discount data indicates that, holding
   the size of the ad constant, an additional $1 discount will add 0.04157 to the number of visitors.
c. If the size of the ad is 5 column-inches and a $75 discount is offered, the estimated number of new
   visitors to the club is 24.59. See the "Fit" column in the printout.
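
As a cross-check (not part of the original solution), the same coefficients can be reproduced with an
ordinary least-squares fit in Python/NumPy, using the eight observations shown in the Excel printout that
follows; the results should agree with the Minitab output up to rounding.

import numpy as np

# Data from the Excel printout below (Visitors, Col-Inches, Discount).
visitors = np.array([23, 30, 20, 26, 20, 18, 17, 31], dtype=float)
ad_size  = np.array([ 4,  7,  3,  6,  2,  5,  4,  8], dtype=float)
discount = np.array([100, 20, 40, 25, 50, 30, 25, 80], dtype=float)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(ad_size), ad_size, discount])
coef, _, _, _ = np.linalg.lstsq(X, visitors, rcond=None)
print(coef)                                  # roughly [10.687, 2.157, 0.0416]

# Estimated visitors for a 5 column-inch ad with a $75 discount (part c).
print(coef @ np.array([1.0, 5.0, 75.0]))     # roughly 24.59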




The corresponding Excel multiple regression printout is shown below.
                A                 B              C              D            E             F            G
 14   SUMMARY OUTPUT                                      Visitors     Col-Inches Discount
 15          Regression Statistics                              23           4            100
 16   Multiple R                   0.8465                       30           7             20
 17   R Square                     0.7165                       20           3             40
 18   Adjusted R Square            0.6031                       26           6             25
 19   Standard Error               3.3749                       20           2             50
 20   Observations                      8                       18           5             30
 21                                                             17           4             25
 22   ANOVA                                                     31           8             80
 23                              df             SS             MS           F       Significance F
 24   Regression                        2        143.924        71.962        6.318           0.043
 25   Residual                          5          56.951       11.390
 26   Total                             7        200.875
 27
 28                         Coefficients Standard Error       t Stat      P-value     Lower 95%     Upper 95%
 29   Intercept                   10.687          3.875           2.758       0.040           0.726      20.648
 30   Col-Inches                   2.157          0.628           3.434       0.019           0.542       3.771
 31   Discount                     0.042          0.044           0.949       0.386          -0.071       0.154


16.12 p/c/m The Minitab printout is shown below.
Regression Analysis: Overall versus Ride, Handling, Comfort

The regression equation is Overall = 35.6 + 3.68 Ride + 2.89 Handling - 0.11 Comfort

Predictor              Coef           SE Coef             T            P
Constant              35.63             13.42          2.66        0.045
Ride                  3.675             1.639          2.24        0.075
Handling              2.892             1.055          2.74        0.041
Comfort              -0.110             1.625         -0.07        0.949

S = 2.858              R-Sq = 75.6%             R-Sq(adj) = 61.0%

Analysis of Variance
Source            DF                       SS            MS              F          P
Regression         3                  126.714        42.238           5.17      0.054
Residual Error     5                   40.842         8.168
Total              8                  167.556

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                                  95.0% PI
1        82.937      2.493   ( 76.529, 89.345)                        (   73.188, 92.686)

Values of Predictors for New Observations
New Obs      Ride Handling    Comfort
1            6.00      9.00      7.00


a. The regression equation is:
    Overall = 35.63 + 3.675*Ride + 2.892*Handling - 0.110*Comfort
b. The y-intercept indicates that a car that scores 0 on all three of the independent variables will receive
   an overall rating of 35.63. (This result should be considered cautiously since there were no 0 scores in
   the data used to estimate the regression.) The partial regression coefficient for Ride indicates that,
   holding the other two scores constant, an additional point in Ride will result in an overall rating that is
   3.675 points higher. The partial regression coefficient for Handling indicates that, holding the other
   two scores constant, an additional point in Handling will result in an overall rating that is 2.892 points
   higher. The partial regression coefficient for Comfort indicates that, holding the other two scores
   constant, an additional point in Comfort will result in an overall rating that is 0.110 points lower.
c. The estimated overall rating for a vehicle that scores 6 on Ride, 9 on Handling, and 7 on Comfort is
   82.937. This can be calculated as 35.63 + 3.675(6) + 2.892(9) - 0.110(7). In the Minitab printout, refer
   to the "Fit" column.

The corresponding Excel multiple regression printout is shown below.
                  A             B               C                D            E              F            G
 13                                           Rating            Ride       Handling        Comfort
 14                                            83                8            7              7
 15   SUMMARY OUTPUT                           86                8            8              8
 16          Regression Statistics             83                6            8              7
 17   Multiple R                 0.86963       83                8            7              9
 18   R Square                   0.75625       95                9            9              9
 19   Adjusted R Square          0.61000       84                8            8              9
 20   Standard Error             2.85803       88                9            6              9
 21   Observations                     9       82                7            8              7
 22                                            92                8            9              8
 23   ANOVA
 24                             df             SS               MS             F        Significance F
 25   Regression                      3        126.7139         42.2380        5.1709            0.0543
 26   Residual                        5         40.8416          8.1683
 27   Total                           8        167.5556
 28
 29                        Coefficients Standard Error          t Stat     P-value       Lower 95%    Upper 95%
 30   Intercept              35.62642         13.41832           2.65506     0.04515          1.13360   70.11924
 31   Ride                     3.67543         1.63891           2.24260     0.07497         -0.53752    7.88838
 32   Handling                 2.89205         1.05540           2.74024     0.04078          0.17907    5.60502
 33   Comfort                 -0.11009         1.62469          -0.06776     0.94860         -4.28648    4.06631


16.13 p/c/m The Minitab printout is shown below.
Regression Analysis: Crispness versus OvenTime, Temp

The regression equation is Crispness = - 127 + 7.61 OvenTime + 0.357 Temp

Predictor                Coef        SE Coef                T              P
Constant              -127.19          61.33            -2.07          0.072
OvenTime                7.611          3.873             1.97          0.085
Temp                   0.3567         0.1177             3.03          0.016

S = 15.44              R-Sq = 58.6%           R-Sq(adj) = 48.2%

Analysis of Variance
Source            DF                     SS                MS              F          P
Regression         2                 2696.4            1348.2           5.65      0.029
Residual Error     8                 1907.3             238.4
Total             10                 4603.6

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                                   95.0% PI
1        -17.79      29.01   ( -84.69,    49.11) (                         -93.57,   57.99) XX
X denotes a row with X values away from the center
XX denotes a row with very extreme X values

Values of Predictors for New Observations
New Obs OvenTime       Temp
1            5.00       200

a. The regression equation is Crispness = -127.19 + 7.611*OvenTime + 0.3567*Temp.
b. The y-intercept indicates that a crust that is not cooked will receive a crispness rating of -127.19.
   (Caution should be used in interpreting this value since there were no such extreme values in the data
   used to estimate the regression.) The partial regression coefficient for OvenTime indicates that, for a
   given temperature, an additional minute in the oven will add 7.611 points to the crispness rating.
   Likewise, the partial regression coefficient for Temp indicates that, for a given cooking time,
   a one-degree increase in the oven temperature will result in a 0.3567 increase in the crispness rating.
c. The estimated crispness rating for a pie that is cooked 5 minutes at 200 degrees is -17.79. See the "Fit"
   column of the Minitab printout or substitute OvenTime = 5 and Temp = 200 into the regression
   equation. This estimate should be viewed cautiously since the oven temperature is well beyond the
   limits of the data used to estimate the regression.

The corresponding Excel multiple regression printout is shown below.
                  A              B               C              D            E               F           G
 12                                                         Crispness      Time           Temp.
 13                                                             68          6.0             460
 14                                                             76          8.9             430
 15   SUMMARY OUTPUT                                            49          8.8             360
 16          Regression Statistics                              99          7.8             460
 17   Multiple R                   0.7653                       90          7.3             390
 18   R Square                     0.5857                       32          5.3             360
 19   Adjusted R Square            0.4821                       96          8.8             420
 20   Standard Error             15.4405                        77          9.0             350
 21   Observations                     11                       94          8.0             450
 22                                                             82          8.2             400
 23   ANOVA                                                     97          6.4             450
 24                              df             SS             MS            F        Significance F
 25   Regression                       2        2696.3635    1348.1817       5.6549            0.0295
 26   Residual                         8        1907.2729     238.4091
 27   Total                           10        4603.6364
 28
 29                         Coefficients Standard Error       t Stat      P-value      Lower 95%     Upper 95%
 30   Intercept               -127.1896         61.3267         -2.0740      0.0718       -268.6093     14.2300
 31   Time                        7.6111         3.8732          1.9651      0.0850          -1.3205    16.5428
 32   Temp.                       0.3567         0.1177          3.0315      0.0163           0.0854      0.6281


16.14 p/c/m The Minitab printout is shown below.
Regression Analysis: Budget versus Attend, Acres, Species

The regression equation is Budget = - 0.68 + 12.0 Attend + 0.0612 Acres - 0.0154 Species

Predictor                 Coef        SE Coef             T            P
Constant                -0.681          6.600         -0.10        0.921
Attend                  11.956          4.142          2.89        0.028
Acres                  0.06115        0.03343          1.83        0.117
Species               -0.01538        0.01562         -0.98        0.363

S = 4.914               R-Sq = 77.7%           R-Sq(adj) = 66.6%

Analysis of Variance
Source            DF                      SS             MS              F          P
Regression         3                  506.06         168.69           6.99      0.022
Residual Error     6                  144.88          24.15
Total              9                  650.94

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                                   95.0% PI
1         23.18       2.97   (   15.90,   30.45)                      (      9.12,   37.23)

Values of Predictors for New Observations
New Obs    Attend     Acres   Species
1            2.00       150       600


a. The regression equation is
   Budget = -0.681 + 11.956*Attend + 0.06115*Acres - 0.01538*Species.
b. The y-intercept indicates that a city zoo that has 0 attendance, occupies 0 acres and features 0 species
   will have an annual budget of -0.681 million dollars. Naturally, there is no such zoo, and this result
   should be considered cautiously since there were no 0 scores in the data used to estimate the regression
   equation. The partial regression coefficient for Attend indicates that, holding the other two
   independent variables constant, an additional 1 million in attendance will raise the estimated budget by
   $11.956 million. The partial regression coefficient for Acres indicates that, holding the other two
   independent variables constant, a 1-acre increase in space will increase the estimated budget by
   $0.06115 million. The partial regression coefficient for Species indicates that, holding the other two
   independent variables constant, bringing 1 additional species of animal into the park will decrease the
   estimated budget by $0.01538 million.

c. The estimated annual budget for a zoo that has 2.0 million annual attendance, occupies 150 acres, and
   has 600 animal species is $23.18 million. See the "Fit" column in the Minitab printout or substitute
   Attend = 2, Acres = 150, and Species = 600 into the regression equation.

The corresponding Excel multiple regression printout is shown below.
                  A              B             C            D           E                F           G
 13                                         Budget        Attend      Acres          Species
 14                                          14.5          0.6         210              271
 15   SUMMARY OUTPUT                         35.0          2.0         216              400
 16          Regression Statistics            6.9          0.4          70              377
 17   Multiple R                   0.8817     9.0          1.0         125              277
 18   R Square                     0.7774     6.6          1.5          55              721
 19   Adjusted R Square            0.6662    17.2          1.3          80              400
 20   Standard Error               4.9139    15.5          1.3          42              437
 21   Observations                     10    21.0          2.5          91              759
 22                                          12.0          0.9         125              270
 23   ANOVA                                   9.6          1.1          92              260
 24                              df          SS            MS           F         Significance F
 25   Regression                       3      506.0640    168.6880      6.9861             0.0220
 26   Residual                         6      144.8770     24.1462
 27   Total                            9      650.9410
 28
 29                         Coefficients Standard Error   t Stat      P-value      Lower 95%     Upper 95%
 30   Intercept                -0.68145         6.60013    -0.10325      0.9211        -16.8314     15.4685
 31   Attend                  11.95568          4.14185     2.88655      0.0278           1.8209    22.0904
 32   Acres                     0.06115         0.03343     1.82928      0.1171          -0.0206      0.1429
 33   Species                  -0.01538         0.01562    -0.98430      0.3630          -0.0536      0.0229


16.15 p/a/m
a. The multiple regression equation is
   TEST04 = 11.98 + 0.2745*TEST01 + 0.37619*TEST02 + 0.32648*TEST03
   The y-intercept indicates that an individual unit scoring 0 on the first three tests can expect to score
   11.98 on the fourth test. (However, this is meaningless, since the test scores range from 200 to 800.)
   The partial regression coefficient for TEST01 indicates that, for a given set of scores on TEST02 and
   TEST03, a unit will gain 0.2745 points on TEST04 for an additional point on TEST01.
   Similarly, the partial regression coefficient for TEST02 implies, for a given set of scores on TEST01
   and TEST03, a unit will gain 0.37619 points on TEST04 for an additional point on TEST02.
   Likewise, the partial regression coefficient for TEST03 indicates an improvement of 0.32648 points
   on TEST04 for each additional point on TEST03, given a set of scores for TEST01 and TEST02.
b. If an individual unit has scored 350, 400, and 600 on the first three tests, its estimated score on the
   fourth test is: TEST04 = 11.98 + 0.2745(350) + 0.37619(400) + 0.32648(600) = 454.419.

16.16 p/a/m
a. First, we must determine the midpoint of the approximate 90% confidence interval:
    ŷ = 11.98 + 0.2745(300) + 0.37619(500) + 0.32648(400) = 413.017
   From the printout, we see that the multiple standard error of the estimate is 52.72, and we know that
   n = 12. With d.f. = 12 - 3 - 1 = 8, the appropriate t-value is 1.860. The approximate 90% confidence
   interval for the mean rating on test four for units that have been rated at 300, 500, and 400 on the first
   three tests is:
    ŷ ± t(se/√n) = 413.017 ± 1.860(52.72/√12) = 413.017 ± 28.307 = (384.710, 441.324)
b. The approximate 90% prediction interval is:
    ŷ ± t·se = 413.017 ± 1.860(52.72) = 413.017 ± 98.059 = (314.958, 511.076)
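
A short Python sketch of these approximate-interval calculations is shown below (an illustration using the
values stated above and scipy's t quantile; it is not part of the printed solution).

from math import sqrt
from scipy.stats import t

n, k, se = 12, 3, 52.72                                   # observations, predictors, std. error of estimate
y_hat = 11.98 + 0.2745*300 + 0.37619*500 + 0.32648*400    # midpoint, 413.017
t_val = t.ppf(0.95, df=n - k - 1)                         # about 1.860 with 8 d.f. (90% two-sided)

ci_half = t_val * se / sqrt(n)                            # about 28.3
pi_half = t_val * se                                      # about 98.1
print(y_hat - ci_half, y_hat + ci_half)                   # roughly (384.7, 441.3)
print(y_hat - pi_half, y_hat + pi_half)                   # roughly (315.0, 511.1)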




16.17 c/a/m
a. The mean of y is ŷ = 5.0 + 1.0(25) + 2.5(40) = 130.0.
b. The multiple standard error of the estimate is se = √(173.5/(20 - 2 - 1)) = 3.195.
c. The approximate 95% confidence interval for the mean of y whenever x1 = 20 and x2 = 30 can be
   found in several steps. First, we must find the midpoint of the approximate confidence interval.
   This will be ŷ = 5.0 + 1.0(20) + 2.5(30) = 100.
   The degrees of freedom are 20 - 2 - 1 = 17. The appropriate t-value is 2.110. The approximate
   confidence interval for the mean of y is:
    ŷ ± t(se/√n) = 100 ± 2.110(3.195/√20) = 100 ± 1.507 = (98.493, 101.507)
d. The approximate 95% prediction interval for an individual y value when x1 = 20 and x2 = 30 is:
    ŷ ± t·se = 100 ± 2.110(3.195) = 100 ± 6.741 = (93.259, 106.741)
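
The arithmetic in parts b through d can be verified with a few lines of Python; this sketch assumes the
SSE, n, and k values given in the exercise and is not part of the original solution.

from math import sqrt
from scipy.stats import t

sse, n, k = 173.5, 20, 2
se = sqrt(sse / (n - k - 1))                   # multiple standard error of the estimate, about 3.195
y_hat = 5.0 + 1.0*20 + 2.5*30                  # midpoint for x1 = 20, x2 = 30
t_val = t.ppf(0.975, df=n - k - 1)             # about 2.110 with 17 d.f. (95% two-sided)

print(se)
print(y_hat - t_val*se/sqrt(n), y_hat + t_val*se/sqrt(n))   # roughly (98.49, 101.51)
print(y_hat - t_val*se, y_hat + t_val*se)                   # roughly (93.26, 106.74)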

16.18 p/a/m The solution can be obtained with formulas and calculator, but we will use Minitab and the
printout below:
Regression Analysis: Rating versus Price, Perform, BattLife

The regression equation is Rating = 65.0 - 0.00606 Price + 0.160 Perform + 1.25 BattLife

Predictor         Coef       SE Coef            T         P
Constant         64.98         19.54         3.33     0.029
Price        -0.006056      0.003189        -1.90     0.130
Perform         0.1601        0.1711         0.94     0.402
BattLife         1.250         2.277         0.55     0.612

S = 2.629         R-Sq = 59.3%        R-Sq(adj) = 28.8%

Analysis of Variance
Source            DF             SS            MS           F        P
Regression         3         40.347        13.449        1.95    0.264
Residual Error     4         27.653         6.913
Total              7         68.000

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                     95.0% PI
1        78.058      1.795   ( 73.073, 83.042)           (   69.218, 86.897)

Values of Predictors for New Observations
New Obs     Price   Perform BattLife
1            1000       100      2.50

a. The regression equation is Rating = 65.0 - 0.00606*Price + 0.160*Perform + 1.25*BattLife.
   For the population of computers that have a $1000 street price, a performance score of 100, and a
   2.50-hour battery life, we are 95% confident that the mean rating of such computers will be within the
   interval from 73.073 to 83.042.
b. For an individual computer with a $1000 street price, a performance score of 100, and a 2.50-hour
   battery life, we are 95% confident that the rating for this particular computer will be within the interval
   from 69.218 to 86.897.




16.19 p/c/m The solution can be obtained with formulas and calculator, but we will use Minitab and the
printout below:
Regression Analysis: CalcFin versus MathPro, SATQ

The regression equation is CalcFin = - 26.6 + 0.776 MathPro + 0.0820 SATQ

Predictor         Coef      SE Coef            T         P
Constant        -26.62        17.18        -1.55     0.172
MathPro         0.7763       0.1465         5.30     0.002
SATQ           0.08202      0.02699         3.04     0.023

S = 4.027        R-Sq = 88.5%        R-Sq(adj) = 84.7%

Analysis of Variance
Source            DF            SS            MS          F         P
Regression         2        751.57        375.78      23.17     0.002
Residual Error     6         97.32         16.22
Total              8        848.89

Predicted Values for New Observations
New Obs     Fit     SE Fit         90.0% CI                    90.0% PI
1         68.74       2.43   (   64.01,   73.46)       (     59.59,   77.88)

Values of Predictors for New Observations
New Obs   MathPro      SATQ
1            70.0       500

a. The regression equation is CalcFin = -26.6 + 0.776*MathPro + 0.0820*SATQ. For the population of
   entering freshmen who scored 70 on the math proficiency test and 500 on the quantitative portion of
   the SAT exam, we are 90% confident that their mean calculus final exam score will be within the
   interval from 64.01 to 73.46.
b. For an individual entering freshman who scored 70 on the math proficiency test and 500 on the
   quantitative portion of the SAT exam, we are 90% confident that his or her calculus final exam score
   will be within the interval from 59.59 to 77.88.

16.20 p/a/m
a. Using only the regression equation and summary information obtained in exercise 16.11, we can
   determine the approximate 95% confidence interval for the mean number of new visitors for clubs
   using 5 column-inches ads and offering an $80 discount. First, the midpoint of the interval will be
    ŷ = 10.687 + 2.1569(5) + 0.04157(80) = 24.797.
   Eight observations were used to estimate the regression, so d.f. = 8 - 2 - 1 = 5. The appropriate t-value
   is 2.571, the multiple standard error of the estimate is 3.375, and the approximate 95% confidence
   interval is:
    ŷ ± t(se/√n) = 24.797 ± 2.571(3.375/√8) = 24.797 ± 3.068 = (21.729, 27.865)
b. The corresponding approximate 95% prediction interval is:
    ŷ ± t·se = 24.797 ± 2.571(3.375) = 24.797 ± 8.677 = (16.120, 33.474)
   The preceding are the approximate intervals that could be calculated based only on the information
   shown in the printouts for exercise 16.11. As discussed in the text, the exact intervals will tend to be
   wider than the approximate intervals. This is because the exact intervals take into account that the
   specified values for x1 and x2 may differ from their respective means. The exact Minitab intervals
   corresponding to parts a and b of this exercise are:
   95% confidence interval, (19.91, 29.69); 95% prediction interval, (14.84, 34.76).




16.21 p/a/m
a. Using only the regression equation and summary information obtained in exercise 16.12, we can
   determine the approximate 95% confidence interval for the mean overall rating of cars that receive
   ratings of 8 on ride, 7 on handling, and 9 on driver comfort. First, the midpoint is:
    ŷ = 35.63 + 3.675(8) + 2.892(7) - 0.110(9) = 84.284.
   There were nine observations used to estimate the regression, so d.f. = 9 - 3 - 1 = 5. The appropriate
   t-value is 2.571, the multiple standard error of the estimate is 2.858, and the approximate 95%
   confidence interval is:
    ŷ ± t(se/√n) = 84.284 ± 2.571(2.858/√9) = 84.284 ± 2.449 = (81.835, 86.733)
b. The corresponding approximate 95% prediction interval is:
    ŷ ± t·se = 84.284 ± 2.571(2.858) = 84.284 ± 7.348 = (76.936, 91.632)
   The preceding are the approximate intervals that could be calculated based only on the information
   shown in the printouts for exercise 16.12. As discussed in the text, the exact intervals will tend to be
   wider than the approximate intervals. This is because the exact intervals take into account that the
   specified values for x1, x2, and x3 may differ from their respective means. The exact Minitab intervals
   corresponding to parts a and b of this exercise are:
   95% confidence interval, (79.587, 88.980); 95% prediction interval, (75.562, 93.005).

16.22 p/a/m
a. Using only the regression equation and summary information obtained in exercise 16.13, we can
   determine the approximate 95% confidence interval for the mean crispness rating for pies that are
   cooked 5.0 minutes at 300 degrees. First, the midpoint is:
    ŷ = -127.19 + 7.611(5) + 0.3567(300) = 17.875
   There were eleven observations used to estimate the regression, so d.f. = 11 - 2 - 1 = 8.
   The appropriate t-value is 2.306, the multiple standard error of the estimate is 15.44, and the
   approximate 95% confidence interval is:
    ŷ ± t(se/√n) = 17.875 ± 2.306(15.44/√11) = 17.875 ± 10.735 = (7.140, 28.610)
b. The corresponding approximate 95% prediction interval is:
    ŷ ± t·se = 17.875 ± 2.306(15.44) = 17.875 ± 35.605 = (-17.730, 53.480)
   The preceding are the approximate intervals that could be calculated based only on the information
   shown in the printouts for exercise 16.13. As discussed in the text, the exact intervals will tend to be
   wider than the approximate intervals. This is because the exact intervals take into account that the
   specified values for x1 and x2 may differ from their respective means. The exact Minitab intervals
   corresponding to parts a and b of this exercise are:
   95% confidence interval, (-25.31, 61.07); 95% prediction interval, (-38.10, 73.86).

16.23 d/p/e The coefficient of multiple determination (R²) is analogous to the coefficient of determination
in simple linear regression. It is the proportion of variation in y that is explained by the multiple
regression equation.

16.24 d/p/m SST is the total variation in the y values, SSR is the variation in the y values that is explained
by the regression, and SSE is the variation in the y values that is not explained by the regression.
The coefficient of multiple determination is equal to 1 - (SSE/SST), or SSR/SST.
If SSE is small compared to SST, SSR will be large compared to SST, and the multiple regression
equation will explain a large portion of the variation in y. Recall that SST = SSR + SSE.
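
As a numerical illustration of this identity (not part of the original solution), the ANOVA sums of squares
from the exercise 16.11 printout give:

ssr, sse = 143.92, 56.95
sst = ssr + sse                  # SST = SSR + SSE = 200.87
print(1 - sse / sst)             # about 0.716, i.e. R-Sq = 71.6%
print(ssr / sst)                 # the same value, computed as SSR/SST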




16.25 d/p/e The coefficient of multiple determination for exercise 16.15 is 0.872. This means that 87.2%
of the variation in scores on the fourth test can be explained by variations in scores on the first three tests.

16.26 p/c/m The coefficient of multiple determination for the regression equation obtained in exercise
16.12 is 0.756. This indicates that 75.6% of the variation in overall ratings is explained by the regression
equation.

16.27 p/c/m The coefficient of multiple determination for the regression equation obtained in exercise
16.11 is 0.716. This indicates that 71.6% of the variation in the number of new visitors to the club is
explained by the regression equation.

16.28 d/p/d Both of these tests will reach the same conclusion. If the confidence interval for β3 does not
include zero, the hypothesis test will reject the null hypothesis. On the other hand, if the confidence
interval for β3 does contain zero, the hypothesis test will not reject the null hypothesis.

16.29 p/c/m We will base much of our discussion on the Minitab printout for exercise 16.11. The results
will be similar if you refer to the Excel printout.
a. The appropriate null and alternative hypotheses are:
   H0: β1 = β2 = 0   and   H1: βj ≠ 0 for at least one j (j = 1 or 2)
   From the ANOVA portion of the Minitab printout, we have:
Analysis of Variance
Source            DF              SS            MS          F          P
Regression         2          143.92         71.96       6.32      0.043
Residual Error     5           56.95         11.39
Total              7          200.87

   The p-value for the ANOVA test of the overall significance of the regression equation is 0.043.
   Since p-value = 0.043 is < the α = 0.05 level of significance for the test, we reject H0. At this level, there
   is evidence to suggest that the regression equation is significant.
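
   The F statistic and its p-value can be reproduced from the mean squares shown above; this short Python
   sketch (not part of the printout) uses scipy's F distribution.

   from scipy.stats import f

   msr, mse = 71.96, 11.39              # mean squares from the ANOVA table above
   df_reg, df_err = 2, 5
   F = msr / mse                        # about 6.32
   print(F, f.sf(F, df_reg, df_err))    # upper-tail p-value, about 0.043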

b. From the upper portion of the Minitab printout:
The regression equation is Visitors = 10.7 + 2.16 AdSize + 0.0416 Discount

Predictor          Coef       SE Coef            T         P
Constant         10.687         3.875         2.76     0.040
AdSize           2.1569        0.6281         3.43     0.019
Discount        0.04157       0.04380         0.95     0.386

S = 3.375         R-Sq = 71.6%         R-Sq(adj) = 60.3%

   Here we are asked to conduct two hypothesis tests. We will not test the y-intercept since this test is
   generally not of practical importance. The appropriate null and alternative hypotheses are:
       Test for β1: H0: β1 = 0 and H1: β1 ≠ 0
       Test for β2: H0: β2 = 0 and H1: β2 ≠ 0
   The p-value for the test of β1 is 0.019. Since p-value = 0.019 is < the α = 0.05 level of significance for the
   test, we reject H0. At this level, there is evidence to suggest that β1 is nonzero.
   The p-value for the test of β2 is 0.386. Since p-value = 0.386 is not < the α = 0.05 level of significance for
   the test, we do not reject H0. At this level, there is no evidence to suggest that β2 is nonzero.
c. The ANOVA test for the overall regression indicates that the regression explains a significant
   proportion of the variation in the number of new visitors to the club. The tests for the individual partial
   regression coefficients indicate that the size of the ad contributes to the explanatory power of the
   model, while the discount offered does not.



d. With d.f. = 8 - 2 - 1 = 5, the appropriate t-value for the 95% confidence interval will be 2.571.
   The 95% confidence interval for population partial regression coefficient β1 is:
       b1 ± t·sb1 = 2.1569 ± 2.571(0.6281) = 2.1569 ± 1.6148 = (0.5421, 3.7717)
   The 95% confidence interval for population partial regression coefficient β2 is:
       b2 ± t·sb2 = 0.04157 ± 2.571(0.04380) = 0.04157 ± 0.1126 = (-0.0710, 0.1542)
With Excel, we can obtain confidence intervals for the population regression coefficients along with the
standard regression output. Excel will provide 95% confidence intervals, but we can also specify the
inclusion of 90% or any other confidence levels we wish to see. The Excel printout for exercise 16.11
included 95% confidence intervals for β1 and β2.
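
The interval arithmetic in part d is easy to script; the Python sketch below (an illustration, not Minitab or
Excel output) repeats it using the coefficients and standard errors from the printout.

from scipy.stats import t

t_val = t.ppf(0.975, df=5)                       # about 2.571
for b, s_b in [(2.1569, 0.6281), (0.04157, 0.04380)]:
    half = t_val * s_b
    print(round(b - half, 4), round(b + half, 4))
# prints roughly (0.5421, 3.7717) and (-0.0710, 0.1542)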

16.30 p/c/m We will base much of our discussion on the Minitab printout for exercise 16.12. The results
will be similar if you refer to the Excel printout.
a. The appropriate null and alternative hypotheses are:
   H0: β1 = β2 = β3 = 0   and   H1: βj ≠ 0 for at least one j (j = 1, 2, or 3)
   From the ANOVA portion of the Minitab printout, we have:

Analysis of Variance
Source            DF             SS           MS           F         P
Regression         3        126.714       42.238        5.17     0.054
Residual Error     5         40.842        8.168
Total              8        167.556


   The p-value for the ANOVA test of the overall significance of the regression equation is 0.054.
   Since p-value = 0.054 is not < the α = 0.05 level of significance for the test, we do not reject H0. At this
   level, there is no evidence to suggest that the regression equation is significant.

b. From the upper portion of the Minitab printout:

The regression equation is Overall = 35.6 + 3.68 Ride + 2.89 Handling - 0.11 Comfort

Predictor         Coef       SE Coef            T         P
Constant         35.63         13.42         2.66     0.045
Ride             3.675         1.639         2.24     0.075
Handling         2.892         1.055         2.74     0.041
Comfort         -0.110         1.625        -0.07     0.949

S = 2.858         R-Sq = 75.6%        R-Sq(adj) = 61.0%


   Here we are asked to conduct three hypothesis tests. We will not test the y-intercept since this test is
   generally not of practical importance. The appropriate null and alternative hypotheses are:
       Test for β1: H0: β1 = 0 and H1: β1 ≠ 0
       Test for β2: H0: β2 = 0 and H1: β2 ≠ 0
       Test for β3: H0: β3 = 0 and H1: β3 ≠ 0
   The p-value for the test of β1 is 0.075. Since p-value = 0.075 is not < the α = 0.05 level of significance for
   the test, we do not reject H0. At this level, there is no evidence to suggest that β1 is nonzero.
   The p-value for the test of β2 is 0.041. Since p-value = 0.041 is < the α = 0.05 level of significance for the
   test, we reject H0. At this level, there is evidence to suggest that β2 is nonzero.
   The p-value for the test of β3 is 0.949. Since p-value = 0.949 is not < the α = 0.05 level of significance for
   the test, we do not reject H0. At this level, there is no evidence to suggest that β3 is nonzero.
c. The ANOVA test for the overall regression indicates that the regression does not explain a significant
   (at the 0.05 level) proportion of the variation in the overall ratings. In only one case, that for β2



   (associated with handling), does an individual hypothesis test indicate that a population regression
   coefficient could be nonzero.
d. With d.f. = 9 - 3 - 1 = 5, the appropriate t-value for the 95% confidence interval will be 2.571.
   The 95% confidence interval for population partial regression coefficient β1 is:
       b1 ± t·sb1 = 3.675 ± 2.571(1.639) = 3.675 ± 4.214 = (-0.54, 7.89)
   The 95% confidence interval for population partial regression coefficient β2 is:
       b2 ± t·sb2 = 2.892 ± 2.571(1.055) = 2.892 ± 2.712 = (0.18, 5.60)
   The 95% confidence interval for population partial regression coefficient β3 is:
       b3 ± t·sb3 = -0.110 ± 2.571(1.625) = -0.110 ± 4.178 = (-4.29, 4.07)
With Excel, we can obtain confidence intervals for the population regression coefficients along with the
standard regression output. Excel will provide 95% confidence intervals, but we can also specify the
inclusion of 90% or any other confidence levels we wish to see. The Excel printout for exercise 16.12
already included 95% confidence intervals for β1, β2, and β3. Here is a repeat of the lower portion of that
Excel printout:
                  A        B             C            D           E               F           G
 23   ANOVA
 24                        df            SS          MS           F         Significance F
 25   Regression                 3       126.7139    42.2380      5.1709             0.0543
 26   Residual                   5        40.8416     8.1683
 27   Total                      8       167.5556
 28
 29                    Coefficients Standard Error   t Stat     P-value      Lower 95%    Upper 95%
 30   Intercept          35.62642         13.41832    2.65506     0.04515         1.13360   70.11924
 31   Ride                 3.67543         1.63891    2.24260     0.07497        -0.53752    7.88838
 32   Handling             2.89205         1.05540    2.74024     0.04078         0.17907    5.60502
 33   Comfort             -0.11009         1.62469   -0.06776     0.94860        -4.28648    4.06631


16.31 p/a/m To determine the 90% confidence interval for each partial regression coefficient in exercise
16.15, we must determine the appropriate t. There are (12 - 3 - 1) = 8 degrees of freedom, so the
appropriate t is t = 1.860.
The 90% confidence interval for population partial regression coefficient β1 is:
       b1 ± t·sb1 = 0.2745 ± 1.860(0.1111) = 0.2745 ± 0.2066 = (0.0679, 0.4811)
       This confidence interval does not contain zero, so it is likely that the variation in the scores on test 1
       does contribute significantly to the explanation of the variation in the scores on test 4.
The 90% confidence interval for population partial regression coefficient β2 is:
       b2 ± t·sb2 = 0.37619 ± 1.860(0.09858) = 0.37619 ± 0.18336 = (0.1928, 0.5596)
       This confidence interval does not contain zero, so it is likely that the variation in the scores on test 2
       does contribute significantly to the explanation of the variation in the scores on test 4.
The 90% confidence interval for population partial regression coefficient β3 is:
       b3 ± t·sb3 = 0.32648 ± 1.860(0.08084) = 0.32648 ± 0.15036 = (0.1761, 0.4768)
       This confidence interval does not contain zero, so it is likely that the variation in the scores on test 3
       does contribute significantly to the explanation of the variation in the scores on test 4.




16.32 p/c/m Referring to the Minitab printout in the solution to exercise 16.18:
a. In the ANOVA test for overall significance, p-value = 0.264 is not < 0.10 level of significance, so we
   conclude that the overall regression is not significant. At this level, all of the population partial
   regression coefficients could be zero.
b. In testing the partial regression coefficients for price, performance, and battery life, the p-values are
   0.130, 0.402, and 0.612, respectively. None of these is less than the 0.10 level of significance being
   used to reach a conclusion. None of the three partial regression coefficients differs significantly from
   zero.

16.33 p/c/m Referring to the Minitab printout in the solution to exercise 16.19:
a. In the ANOVA test for overall significance, p-value = 0.002 is < 0.05 level of significance, so we
   conclude that the overall regression is significant.
b. In testing the partial regression coefficients for math proficiency test score and SAT quantitative score,
   the p-values are 0.002 and 0.023, respectively. Each p-value is < 0.05 level of significance, and each of
   the partial regression coefficients differs significantly from zero.

16.34 p/c/m The Minitab printout is shown below.
Regression Analysis: Est P/E Rati versus Revenue%Grow, Earn/Share %

The regression equation is
Est P/E Ratio = 51.7 - 0.103 Revenue%Growth + 0.0143 Earn/Share %Growth

96 cases used 4 cases contain missing values

Predictor         Coef       SE Coef            T         P
Constant         51.73         15.66         3.30     0.001
Revenue%       -0.1027        0.2171        -0.47     0.637
Earn/Sha       0.01431       0.07316         0.20     0.845

S = 64.30         R-Sq = 0.3%         R-Sq(adj) = 0.0%

Analysis of Variance
Source            DF             SS            MS           F        P
Regression         2            986           493        0.12    0.888
Residual Error    93         384502          4134
Total             95         385488

a. The regression equation is
   Est P/E Ratio = 51.7 - 0.103*Revenue%Growth + 0.0143*Earn/Share %Growth.
   The partial regression coefficient for revenue growth percentage is -0.103. On average, with
   earnings/share growth percentage fixed, a one percentage point increase in revenue growth percentage
   will be accompanied by a decrease of 0.103 in the estimated price/earnings ratio.
   The partial regression coefficient for earnings/share growth percentage is 0.0143. On average, with
   revenue growth percentage fixed, a one percentage point increase in earnings/share growth percentage
   will be accompanied by an increase of 0.0143 in the estimated price/earnings ratio.
b. The p-value in the ANOVA section of the printout is 0.888. This is not less than the 0.05 level of
   significance. At this level, the overall regression equation is not significant.
c. The p-values for the tests of the two partial regression coefficients are 0.637 and 0.845, respectively.
   Neither p-value is less than the 0.05 level of significance, and we conclude that neither partial
   regression coefficient differs significantly from zero.
d. The 95% confidence interval for each partial regression coefficient could be calculated using formulas
   and pocket calculator, as was demonstrated in the solution to exercise 16.31. We will rely on the Excel
   printout, shown below. The 95% confidence interval for population partial regression coefficient β1 is
   from -0.5338 to 0.3285. The 95% confidence interval for population partial regression coefficient β2 is

      from -0.1310 to 0.1596. (Note: In applying Excel, it is necessary to delete the four cases that have
      missing data for one or more of these variables.)
    SUMMARY OUTPUT

            Regression Statistics
    Multiple R                     0.0506
    R Square                       0.0026
    Adjusted R Square             -0.0189
    Standard Error               64.2995
    Observations                       96

    ANOVA
                                 df             SS         MS            F      Significance F
    Regression                         2        986.0007 493.0004        0.1192          0.8877
    Residual                          93     384501.9576 4134.4297
    Total                             95     385487.9583

                            Coefficients Standard Error    t Stat      P-value    Lower 95%    Upper 95%
    Intercept                   51.7305         15.6650       3.3023     0.0014       20.6230     82.8380
    Revenue%Growth               -0.1027         0.2171      -0.4729     0.6374        -0.5338     0.3285
    Earn/Share %Growth            0.0143         0.0732       0.1955     0.8454        -0.1310     0.1596


16.35 p/c/m The Minitab printout is shown below.

The regression equation is
$GroupRevenue = -40855482 + 44282 RetailUnits + 152760 NumDealrs

Predictor             Coef        SE Coef            T        P
Constant         -40855482       20217627        -2.02    0.046
RetailUn             44282           1290        34.33    0.000
NumDealr            152760        1943687         0.08    0.938

S = 176197662         R-Sq = 99.3%          R-Sq(adj) = 99.3%

Analysis of Variance
Source            DF          SS          MS                    F          P
Regression         2 4.46877E+20 2.23439E+20              7197.11      0.000
Residual Error    95 2.94933E+18 3.10456E+16
Total             97 4.49827E+20

a. The regression equation is
   $GroupRevenue = -40,855,482 + 44,282*RetailUnits + 152,760*NumDealrs
   The partial regression coefficient for RetailUnits is 44,282. On average, with the number of dealers
   fixed, an increase of 1 in retail units sold is accompanied by an increase of $44,282 in revenue for the
   dealer group.
   The partial regression coefficient for NumDealrs is 152,760. On average, with the number of retail
   units fixed, an increase of 1 in the number of dealers will be accompanied by an increase of $152,760
   in revenue for the dealer group.
b. The p-value in the ANOVA section of the printout is (to three decimal places) 0.000. This is less than
   the 0.02 level of significance. At this level, the overall regression equation is significant.
c. The p-values for the tests of the two partial regression coefficients are 0.000 and 0.938, respectively.
   Using the 0.02 level of significance, the partial regression coefficient for the first independent variable
   (retail units) is significantly different from zero, but the partial regression coefficient for the second
   independent variable (number of dealers) does not differ significantly from zero.
d. The 98% confidence interval for each partial regression coefficient could be calculated using formulas
   and pocket calculator, as was demonstrated in the solution to exercise 16.31. We will rely on the Excel
   printout, shown below. The 98% confidence interval for population partial regression coefficient β1 is


   from 41,229.4 to 47,333.8. The 98% confidence interval for population partial regression coefficient β2
   is from -4,446,472 to 4,751,992.
            J                   K             L            M             N            O          P
 1    SUMMARY OUTPUT
 2
 3           Regression Statistics
 4    Multiple R                  0.9967
 5    R Square                    0.9934
 6    Adjusted R Square           0.9933
 7    Standard Error         176197662
 8    Observations                    98
 9
 10   ANOVA
 11                             df           SS             MS           F      Significance F
 12   Regression                      2     4.469E+20    2.234E+20    7.197E+03    1.960E-104
 13   Residual                       95     2.949E+18    3.105E+16
 14   Total                          97     4.498E+20
 15
 16                        Coefficients Standard Error    t Stat      P-value   Lower 98.0% Upper 98.0%
 17   Intercept            -40855482.0     20217627.3        -2.021       0.046  -88695270.0  6984305.9
 18   RetailUnits              44281.6        1289.89        34.330       0.000      41229.4    47333.8
 19   NumDealrs               152760.2     1943686.79         0.079       0.938     -4446472    4751992


16.36 d/p/m The normal probability plot is used to examine whether the residuals could have come from a
normally distributed population. One of the assumptions underlying multiple regression analysis is that
the residuals are normally distributed with a mean of zero.

16.37 d/p/m Residual analysis can be used to examine the residuals with respect to the assumptions
underlying multiple regression analysis. We can do many things with residual analysis, including:
(1) constructing a histogram of the residuals as a rough check to see if they are approximately normally
distributed, (2) constructing a normal probability plot or other normality test to examine whether the
residuals could have come from a normally-distributed population, (3) plotting the residuals versus each
of the independent variables to see if they exhibit some cycle or pattern with respect to that variable, and
(4) plotting the residuals versus the order in which the observations were recorded to look for
autocorrelation.
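
The sketch below illustrates how checks (1) through (3) might be carried out in Python with matplotlib
and scipy; the data and variable names are placeholders for illustration, not values from any exercise in
this chapter.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import probplot

rng = np.random.default_rng(0)                     # placeholder data for illustration only
x1 = rng.uniform(2, 8, size=30)
fits = 10 + 2 * x1
y = fits + rng.normal(0, 3, size=30)
residuals = y - fits

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
axes[0].hist(residuals, bins=8)                    # (1) rough check for approximate normality
axes[0].set_title("Histogram of residuals")
probplot(residuals, dist="norm", plot=axes[1])     # (2) normal probability plot
axes[2].scatter(x1, residuals)                     # (3) look for a pattern versus an x variable
axes[2].axhline(0, linestyle="--")
axes[2].set_title("Residuals vs. x1")
plt.tight_layout()
plt.show()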

16.38 p/p/m Referring to the printout given in exercise 16.15, we can determine the following:
a. The partial regression coefficient for TEST01 is 0.2745. This implies that, holding the scores on tests 2
   and 3 constant, a one point increase in the score on test 1 will result in a 0.2745 point increase in the
   score on test 4. The partial regression coefficient for TEST02 is 0.37619. This implies that, for a given
   level of scores on tests 1 and 3, a one point increase in the score on test 2 will result in a 0.37619 point
   increase in test 4. Finally, the partial regression coefficient for TEST03 is 0.32648. This implies that,
   holding scores on tests 1 and 2 constant, a one point increase in the score on test 3 will result in a
   0.32648 point increase in the score on test 4.
b. 87.2% of the variation in y is explained by the equation.
c. The overall regression is significant at the 0.001 level.
d. The p-value for the partial regression coefficient for TEST01 is 0.039; for TEST02, 0.005; and, for
   TEST03, 0.004. This would indicate that TEST03 contributes the most to the explanation of the
   variation in scores on test 4; however, TEST02 is almost as useful. TEST01 appears to be least useful,
   but it is still significant at the 0.05 level.




16.39 p/c/m
a. The histogram does not reveal any radical departures from a symmetric distribution. Of course, it is
   difficult to determine this with only eight data points.
   [Figure: Histogram of the Residuals (response is Visitors); frequency vs. residual]




b. In this Minitab test for normality, the points in the normal probability plot don't appear to deviate
   excessively from a straight line and the approximate p-value is shown as >0.15. There is nothing here
   to suggest that the residuals may not have come from a normally distributed population.
   [Figure: Normal Probability Plot of RESI1; Mean = -5.77316E-15, StDev = 2.852, N = 8, KS = 0.184, P-Value > 0.150]




c. Plots of residuals versus the independent variables.
   Plot of residuals versus ad size.
   [Figure: Residuals Versus AdSize (response is Visitors)]




        Plot of residuals versus discount size.
        [Residuals Versus Discount (response is Visitors)]




The plots above do not reveal any alarming problems. Overall, there is no strong evidence to indicate that
any underlying assumptions of the multiple regression model have been violated.




16.40 p/c/m
a. The histogram does not reveal any radical departures from a symmetric distribution. Of course, there
   are only 9 points.
        [Histogram of the Residuals (response is Overall)]




b. In this Minitab test for normality, the points in the normal probability plot seem to deviate somewhat
   from the straight line, and the approximate p-value is shown as 0.056. This would seem to raise some
   suspicions -- however, at the 0.05 level of significance, we would conclude that the residuals could
   have come from a normally distributed population.
        [Probability Plot of RESI1 (Normal): Mean = -2.68427E-14, StDev = 2.259, N = 9, KS = 0.271, P-Value = 0.056]
c. Plots of residuals versus the independent variables. Considering the small n, the plots below do not
   reveal any alarming problems. Overall, there is no strong evidence to indicate that any underlying
   assumptions of the multiple regression model have been violated.

       Plot of residuals versus Ride.
        [Residuals Versus Ride (response is Overall)]




        Plot of residuals versus Handling.
        [Residuals Versus Handling (response is Overall)]




        Plot of residuals versus Comfort.




        [Residuals Versus Comfort (response is Overall)]


16.41 p/c/m
a. This histogram does not appear to reveal any radical departures from a symmetric distribution,
   although there are relatively few data points.
        [Histogram of the Residuals (response is Crispness)]




b. In this Minitab test for normality, the points in the normal probability plot don't appear to deviate
   excessively from a straight line and the approximate p-value is shown as >0.15. There is nothing here
   to suggest that the residuals may not have come from a normally distributed population.
        [Probability Plot of RESI1 (Normal): Mean = 2.583792E-15, StDev = 13.81, N = 11, KS = 0.130, P-Value > 0.150]
c. Plots of residuals versus the independent variables. The plots below do not reveal any alarming
   problems. Overall, there is no strong evidence to indicate that any underlying assumptions of the
   multiple regression model have been violated.

       Plot of residuals versus time in oven.
        [Residuals Versus OvenTime (response is Crispness)]




       Plot of residuals versus oven temperature.
        [Residuals Versus Temp (response is Crispness)]
16.42 p/c/m
a. This histogram does not appear to reveal any radical departures from a symmetric distribution,
   although there are relatively few data points.
        [Histogram of the Residuals (response is Rating)]




b. In this Minitab test for normality, the points in the normal probability plot don't appear to deviate
   excessively from a straight line and the approximate p-value is shown as >0.15. There is nothing here
   to suggest that the residuals may not have come from a normally distributed population.
        [Probability Plot of RESI1 (Normal): Mean = -3.55271E-15, StDev = 1.988, N = 8, KS = 0.170, P-Value > 0.150]
c. Plots of residuals versus the independent variables. The plots below do not reveal any alarming
   problems. Overall, there is no strong evidence to indicate that any underlying assumptions of the
   multiple regression model have been violated.

       Plot of residuals versus price.
        [Residuals Versus Price (response is Rating)]




       Plot of residuals versus performance.
        [Residuals Versus Perform (response is Rating)]




       Plot of residuals versus battery life.



        [Residuals Versus BattLife (response is Rating)]


16.43 p/c/m
a. This histogram does not appear to reveal any radical departures from a symmetric distribution,
   although there are relatively few data points.
        [Histogram of the Residuals (response is CalcFin)]




b. In this Minitab test for normality, the points in the normal probability plot don't appear to deviate
   excessively from a straight line and the approximate p-value is shown as >0.15. There is nothing here
   to suggest that the residuals may not have come from a normally distributed population.
        [Probability Plot of RESI1 (Normal): Mean = 0, StDev = 3.488, N = 9, KS = 0.197, P-Value > 0.150]
c. Plots of residuals versus the independent variables. The plots below do not reveal any alarming
   problems. Overall, there is no strong evidence to indicate that any underlying assumptions of the
   multiple regression model have been violated.

       Plot of residuals versus math proficiency test.
        [Residuals Versus MathPro (response is CalcFin)]




        Plot of residuals versus SAT quantitative.
        [Residuals Versus SATQ (response is CalcFin)]
16.44 p/c/m The Minitab printout is shown below.
Regression Analysis: Time versus Years, Score

The regression equation is Time = 104 - 0.288 Years - 0.679 Score

Predictor         Coef      SE Coef            T         P
Constant        103.85        16.69         6.22     0.000
Years          -0.2884       0.3216        -0.90     0.391
Score          -0.6792       0.2218        -3.06     0.012

S = 2.862        R-Sq = 53.9%        R-Sq(adj) = 44.7%

Analysis of Variance
Source            DF            SS            MS          F         P
Regression         2        95.757        47.879       5.84     0.021
Residual Error    10        81.935         8.194
Total             12       177.692


a. The multiple regression equation is Time = 103.85 - 0.2884*Years - 0.6792*Score.
   The partial regression coefficient for years on the job indicates that, for a given score on the aptitude
   test, the time it takes to perform the standard task decreases by 0.2884 seconds for each additional year
   on the job. The partial regression coefficient for the test score indicates that, given a set number of
   years on the job, a one point increase in the test score will result in a 0.6792 second decrease in the
   amount of time required to perform the required task.
b. The appropriate number of degrees of freedom for this problem will be d.f. = 13 - 2 - 1, or 10, and the
   appropriate t-value for a 95% confidence interval is t = 2.228.
      The 95% confidence interval for population partial regression coefficient β1 is:
          b1 ± t s_b1 = -0.2884 ± 2.228(0.3216) = -0.2884 ± 0.7165 = (-1.0049, 0.4281)
      The 95% confidence interval for population partial regression coefficient β2 is:
          b2 ± t s_b2 = -0.6792 ± 2.228(0.2218) = -0.6792 ± 0.4942 = (-1.1734, -0.1850)
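   As an added cross-check (not part of the original solution), these intervals can be reproduced from the
   printed coefficients and standard errors; the sketch below assumes Python with scipy available.

   from scipy import stats

   n, k = 13, 2
   t_crit = stats.t.ppf(0.975, df=n - k - 1)             # approximately 2.228 with 10 d.f.
   for b, se in [(-0.2884, 0.3216), (-0.6792, 0.2218)]:  # Years and Score coefficients from the printout
       half = t_crit * se
       print(f"{b:.4f} +/- {half:.4f} -> ({b - half:.4f}, {b + half:.4f})")

   The printed intervals agree with the ones above up to rounding of the t-value.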
c. The coefficient of multiple determination is 0.539. This indicates that 53.9% of the variation in the
   time required to complete the task is explained by the regression equation. The partial regression
   coefficient for Years is significantly different from zero at the 0.391 level. The partial regression
   coefficient for Score is significantly different from zero at the 0.012 level, and the overall regression
   equation is significant at the 0.021 level.
d. The residual analyses follow. First, the histogram of the residuals is examined to see if it is symmetric
   about zero. Next the normal probability plot is graphed and the p-value interpreted to examine whether
   the residuals could have come from a normal population. Finally, the residuals are plotted against each
   of the independent variables to check for cyclical or other patterns.
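   The same three diagnostics can be produced in Python; the helper below is only a sketch, assuming
   matplotlib and scipy are installed, that result is a fitted statsmodels regression (as in the earlier sketch),
   and that predictors maps each independent variable's name to its data values.

   import matplotlib.pyplot as plt
   from scipy import stats

   def residual_diagnostics(result, predictors):
       """Histogram, normal probability plot, and residual-versus-predictor plots for an OLS fit."""
       resid = result.resid
       plt.hist(resid, bins=6)                        # check for rough symmetry about zero
       plt.xlabel("Residual")
       plt.title("Histogram of the residuals")
       plt.show()
       stats.probplot(resid, dist="norm", plot=plt)   # points near a straight line suggest normality
       plt.show()
       for name, x in predictors.items():             # look for curvature or changing spread
           plt.scatter(x, resid)
           plt.axhline(0, linestyle="--")
           plt.xlabel(name)
           plt.ylabel("Residual")
           plt.show()

   A call such as residual_diagnostics(result, {"Years": years, "Score": scores}) would mirror the Minitab
   plots that follow.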



In the histogram, there does not appear to be any alarming deviation from a symmetric distribution.
        [Histogram of the Residuals (response is Time)]




In this Minitab test for normality, the points in the normal probability plot don't appear to deviate
excessively from a straight line, and the approximate p-value is shown as >0.15. There is nothing here to
suggest that the residuals may not have come from a normally distributed population.
        [Probability Plot of RESI1 (Normal): Mean = -1.31177E-14, StDev = 2.613, N = 13, KS = 0.136, P-Value > 0.150]




The plots of residuals versus the independent variables do not present any alarming patterns. Overall, the
residual analysis does not suggest that any of the assumptions underlying multiple regression analysis
have been violated.
Plot of residuals versus years on job.


        [Residuals Versus Years (response is Time)]


Plot of residuals versus test score.
        [Residuals Versus Score (response is Time)]




The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
already provides 95% confidence intervals for the population regression coefficients. When generating
this printout, we can also specify a normal probability plot and plots of the residuals against the
independent variables. Their appearance would be essentially similar to those of Minitab.
                      A                 B                          C                     D            E               F            G
 16         SUMMARY OUTPUT
 17                Regression Statistics
 18         Multiple R                   0.7341
 19         R Square                     0.5389
 20         Adjusted R Square            0.4467
 21         Standard Error               2.8624
 22         Observations                     13
 23
 24         ANOVA
 25                                           df                   SS                   MS            F         Significance F
 26         Regression                                   2           95.757              47.879        5.843              0.021
 27         Residual                                    10           81.935               8.194
 28         Total                                       12          177.692
 29
 30                                   Coefficients Standard Error                   t Stat          P-value      Lower 95%     Upper 95%
 31         Intercept                   103.8529          16.6919                      6.2217           0.000           66.661    141.045
 32         Years                          -0.2884         0.3216                     -0.8966           0.391           -1.005       0.428
 33         Score                          -0.6792         0.2218                     -3.0623           0.012           -1.173      -0.185




16.45 p/c/m The Minitab printout is shown below.
Regression Analysis: Distance versus Price, Sensitiv, Weight

The regression equation is
Distance = - 0.562 +0.000355 Price + 0.0112 Sensitiv - 0.0212 Weight

Predictor          Coef       SE Coef            T          P
Constant        -0.5617        0.8656        -0.65      0.545
Price         0.0003550     0.0005601         0.63      0.554
Sensitiv       0.011248      0.007605         1.48      0.199
Weight         -0.02116       0.02471        -0.86      0.431

S = 0.05167       R-Sq = 46.5%         R-Sq(adj) = 14.4%

Analysis of Variance
Source            DF              SS            MS           F          P
Regression         3        0.011590      0.003863        1.45      0.334
Residual Error     5        0.013349      0.002670
Total              8        0.024939

a. The estimated regression equation is:
   Distance = -0.5617 + 0.0003550*Price + 0.011248*Sensitiv - 0.02116*Weight.
   The partial regression coefficient for the price indicates that, holding the weight and sensitivity
   constant, a $1 increase in price will result in a 0.0003550 mile increase in the warning distance.
   The partial regression coefficient for the sensitivity indicates that, holding the price and weight
   constant, a one unit increase in sensitivity will result in a 0.011248 mile increase in the warning
   distance. Finally, the partial regression coefficient for the weight indicates that, holding the price and
   sensitivity constant, a one ounce increase in weight will result in a 0.02116 mile decrease in the
   warning distance.
b. The appropriate degrees of freedom for this problem will be d.f. = 9 - 3 - 1 = 5. The t-value for a 95%
   confidence interval with 5 degrees of freedom is t = 2.571.
      The 95% confidence interval for population partial regression coefficient β1 is:
          b1 ± t s_b1 = 0.000355 ± 2.571(0.0005601) = 0.000355 ± 0.001440 = (-0.0011, 0.0018)
      The 95% confidence interval for population partial regression coefficient β2 is:
          b2 ± t s_b2 = 0.011248 ± 2.571(0.007605) = 0.011248 ± 0.019552 = (-0.0083, 0.0308)
      The 95% confidence interval for population partial regression coefficient β3 is:
          b3 ± t s_b3 = -0.02116 ± 2.571(0.02471) = -0.02116 ± 0.06353 = (-0.0847, 0.0424)
c. The coefficient of multiple determination is 0.465. This indicates that 46.5% of the variation in the
   warning distance is explained by the regression equation. However, none of the partial regression
   coefficients is significant at the 0.10 level. (The coefficient for price is significant at the 0.554 level;
   for sensitivity, at the 0.199 level; and, for weight, at the 0.431 level.) The overall regression is only
   significant at the 0.334 level. The adjusted R-square is 0.144. Recall that this has been adjusted for the
   degrees of freedom. Thus, there are no significant relationships in this regression. Apparently the
   coefficient of multiple determination is as large as it is because of the limited size of the data set.
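   The adjusted R-square quoted here can be recomputed directly from R-square, n = 9, and k = 3; a quick
   check (added to the original solution) in Python:

   r_sq, n, k = 0.465, 9, 3
   adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)   # penalizes R-square for the number of predictors
   print(round(adj_r_sq, 3))                           # 0.144, matching R-Sq(adj) = 14.4% in the printout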
d. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
   about zero. Next the normal probability plot is graphed to examine whether the residuals could have
   come from a normally distributed population. Finally, the residuals are plotted against each of the
   independent variables to check for cyclical patterns.




In the following histogram of residuals, there seems to be a slight deviation from a symmetric
distribution, but the number of data values is relatively small.
        [Histogram of the Residuals (response is Distance)]




In this Minitab test for normality, the points in the normal probability plot appear to deviate excessively
from a straight line and the approximate p-value is shown as 0.048. At the 0.05 level of significance,
we would conclude that the residuals did not come from a normally distributed population.
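As an added cross-check (not part of the original solution), a normality test can also be run on the
residuals in Python. The sketch below uses the Shapiro-Wilk test from scipy rather than Minitab's
Kolmogorov-Smirnov statistic, so its p-value will differ somewhat, but the decision is read the same way.

from scipy import stats

def normality_check(residuals, alpha=0.05):
    # Shapiro-Wilk test; a small p-value casts doubt on the normality of the residuals.
    stat, p = stats.shapiro(residuals)
    decision = "reject normality" if p < alpha else "do not reject normality"
    return stat, p, decision

Applied to the residuals from the warning-distance fit, a p-value below 0.05 would support the conclusion
reached above.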
        [Probability Plot of RESI1 (Normal): Mean = 9.868649E-17, StDev = 0.04085, N = 9, KS = 0.276, P-Value = 0.048]




The plots of residuals versus the independent variables are shown below. Given the relatively small
number of data points, none of the three plots shows any alarming patterns. In the third plot, most of the
unusual pattern is due to the underlying data, with most of the weights clustered about the six ounce level,
while one of the detectors weighs only 3.8 ounces.
        [Residuals Versus Price (response is Distance)]




        [Residuals Versus Sensitiv (response is Distance)]




        [Residuals Versus Weight (response is Distance)]
Overall, the residual analysis suggests that the residuals may not have come from a normally distributed
population. If this is true, then one of the underlying assumptions has been violated and the multiple
regression analysis may not be valid.



The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
already provides 95% confidence intervals for the population regression coefficients.
                            A                     B                  C             D          E                                F                       G
 11                                                               Distance        Price    Sensitivity                     Weight
 12   SUMMARY OUTPUT                                               0.675          289        108                              3.8
 13          Regression Statistics                                 0.660          295        110                              6.1
 14   Multiple R                   0.6817                          0.640          240        108                              5.8
 15   R Square                     0.4647                          0.560          249        103                              6.6
 16   Adjusted R Square            0.1436                          0.540          260        107                              6.0
 17   Standard Error               0.0517                          0.640          200        108                              5.8
 18   Observations                      9                          0.540          199        109                              5.9
 19                                                                0.645          220        108                              5.8
 20   ANOVA                                                        0.670          250        112                              6.2
 21                                            df                   SS            MS          F                         Significance F
 22   Regression                                         3              0.0116      0.0039    1.4471                              0.3342
 23   Residual                                           5              0.0133      0.0027
 24   Total                                              8              0.0249
 25
 26                                        Coefficients Standard Error            t Stat                P-value          Lower 95%     Upper 95%
 27   Intercept                               -0.56174          0.8656              -0.6490                0.5450              -2.7868      1.6633
 28   Price                                    0.00035          0.0006               0.6338                0.5541              -0.0011      0.0018
 29   Sensitivity                              0.01125          0.0076               1.4790                0.1992              -0.0083      0.0308
 30   Weight                                  -0.02116          0.0247              -0.8564                0.4309              -0.0847      0.0424


When generating the Excel printout, we can also specify a normal probability plot and plots of the
residuals against the independent variables. Their appearance is essentially similar to those of Minitab.
        [Excel charts: Normal Probability Plot; residual plots versus Price, Sensitivity, and Weight]


16.46 d/p/e A dummy variable is a variable that takes on a value of one or zero to indicate the presence or
absence of an attribute. Dummy variables can help explain some of the variation in y due to the presence
or absence of a characteristic. Three dummy variables that can be used to describe one town versus
another are URBAN (1 if urban, 0 otherwise), MANUF (1 if durable goods manufacturing is the major
industry, 0 otherwise), and POPMIL (1 if the population is 1 million or more, 0 otherwise). Other dummy
variables could include the presence of a major university, a major medical center, a major research
institution, and many more.
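
A small illustration of how such dummy variables might be coded, assuming Python with pandas; the
towns and figures below are entirely hypothetical.

import pandas as pd

towns = pd.DataFrame({
    "town":       ["Ashton", "Brookfield", "Carson"],   # hypothetical towns
    "setting":    ["urban", "rural", "urban"],
    "population": [1_250_000, 40_000, 600_000],
})
towns["URBAN"]  = (towns["setting"] == "urban").astype(int)        # 1 if urban, 0 otherwise
towns["POPMIL"] = (towns["population"] >= 1_000_000).astype(int)   # 1 if population is 1 million or more
print(towns[["town", "URBAN", "POPMIL"]])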



16.47 p/p/e The partial regression coefficient for x1 implies that, holding the day of the week constant,
a one degree Fahrenheit increase in the temperature will result in an increase of 8 persons in attendance.
The partial regression coefficient for x2 implies that the attendance increases by 150 people on Saturdays
and Sundays (assuming a constant temperature).

16.48 p/p/m The estimate of 100 persons swimming on a zero-degree weekday is made well beyond the
limits of the underlying temperature data. It is always dangerous to extrapolate beyond the bounds of the
data used to estimate an equation.

16.49 d/p/m Multicollinearity is a situation in which two or more of the independent variables in a
multiple regression are highly correlated with each other. When this happens, the two correlated x
variables are really not saying different things about y. The standard errors for the partial regression
coefficients become very large and the coefficients are statistically unreliable and difficult to interpret.
Multicollinearity is a problem when we are trying to interpret the partial regression coefficients.
There are several clues to the presence of multicollinearity: (1) an independent variable known to be an
important predictor ends up having a partial regression coefficient that is not significant;
(2) a partial regression coefficient exhibits the wrong sign; and/or, (3) when an independent variable is
added or deleted, the partial regression coefficients for the other variables change dramatically.
A more practical way to identify multicollinearity is through the examination of a correlation matrix,
which is a matrix that shows the correlation of each variable with each of the other variables.
A high correlation between two independent variables is an indication of multicollinearity.
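
A brief sketch of how a correlation matrix exposes a nearly collinear pair of predictors, using simulated
data and assuming Python with numpy:

import numpy as np

rng = np.random.default_rng(7)
x1 = rng.normal(size=50)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=50)   # x2 is almost a copy of x1
x3 = rng.normal(size=50)

corr = np.corrcoef([x1, x2, x3])   # 3 x 3 correlation matrix of the predictors
print(np.round(corr, 3))           # the x1-x2 entry will be near 1, a warning sign of multicollinearity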

16.50 p/c/m The Minitab printout is shown below.
Regression Analysis: Pounds versus Months, Session, Gender

The regression equation is Pounds = 2.24 + 3.36 Months + 1.54 Session + 3.02 Gender

Predictor          Coef      SE Coef            T         P
Constant          2.243        8.876         0.25     0.807
Months            3.356        1.271         2.64     0.030
Session           1.538        6.791         0.23     0.826
Gender            3.018        6.671         0.45     0.663

S = 11.39         R-Sq = 48.5%        R-Sq(adj) = 29.1%

Analysis of Variance
Source            DF             SS            MS           F         P
Regression         3          975.0         325.0        2.51     0.133
Residual Error     8         1037.3         129.7
Total             11         2012.2

The partial regression coefficient for Months implies that, holding session and gender constant, an
additional month at the weight-loss clinic results in an additional weight loss of 3.356 pounds. The partial
regression coefficient for Session implies that persons attending the day sessions, holding months and
gender constant, lose 1.538 more pounds than those attending the night sessions. The partial regression
coefficient for Gender implies that, holding months and session constant, men lose 3.018 more pounds
than women. Of course, the partial regression coefficients for Session and Gender have p-values of 0.826
and 0.663, respectively. This indicates that the true coefficients are likely not different from zero.
Therefore, Months (p-value of 0.030) contributes the most to the explanatory power of this regression
equation.
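
For readers working in Python, the same fit can be reproduced with statsmodels; the data values below
are transcribed from the worksheet shown after this note, and the coding of Session (1 = day) and
Gender (1 = male) follows the interpretation given above. The coefficients should agree with the Minitab
output up to rounding.

import pandas as pd
import statsmodels.formula.api as smf

weight_loss = pd.DataFrame({
    "Pounds":  [31, 49, 12, 26, 34, 11, 4, 27, 12, 28, 41, 16],
    "Months":  [5, 8, 3, 9, 8, 2, 1, 8, 6, 9, 6, 6],
    "Session": [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0],   # 1 = day session, 0 = night session
    "Gender":  [1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0],   # 1 = male, 0 = female
})
fit = smf.ols("Pounds ~ Months + Session + Gender", data=weight_loss).fit()
print(fit.params)    # should be close to 2.24, 3.36, 1.54, and 3.02
print(fit.pvalues)   # Months is the only predictor with a small p-value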




The data and Excel multiple regression solution for this exercise are shown below.
                  A              B                 C            D           E                F           G
  1                                            Pounds Lost    Months      Session        Gender
  2                                                31           5            1               1
  3                                                49           8            1               1
  4                                                12           3            1               0
  5   SUMMARY OUTPUT                               26           9            0               0
  6          Regression Statistics                 34           8            0               1
  7   Multiple R                   0.6961          11           2            0               0
  8   R Square                     0.4845           4           1            0               1
  9   Adjusted R Square            0.2912          27           8            0               1
 10   Standard Error             11.3867           12           6            1               1
 11   Observations                     12          28           9            1               0
 12                                                41           6            0               0
 13   ANOVA                                        16           6            0               0
 14                              df               SS           MS           F         Significance F
 15   Regression                         3         974.9973   324.9991      2.5066             0.1329
 16   Residual                           8        1037.2527   129.6566
 17   Total                             11        2012.2500
 18
 19                         Coefficients Standard Error       t Stat      P-value      Lower 95%     Upper 95%
 20   Intercept                   2.2435         8.8760          0.2528      0.8068        -18.2246     22.7115
 21   Months                      3.3561         1.2714          2.6396      0.0297           0.4241      6.2880
 22   Session                     1.5383         6.7911          0.2265      0.8265        -14.1220     17.1987
 23   Gender                      3.0176         6.6710          0.4523      0.6630        -12.3658     18.4010


16.51 p/c/m The Minitab printout is shown below.
Regression Analysis: Speed versus Occupnts, SeatBelt

The regression equation is Speed = 67.6 - 3.21 Occupnts - 6.63 SeatBelt

Predictor               Coef          SE Coef            T         P
Constant              67.629            5.017        13.48     0.000
Occupnts              -3.214            2.191        -1.47     0.170
SeatBelt              -6.629            3.200        -2.07     0.063

S = 5.465             R-Sq = 31.5%             R-Sq(adj) = 19.1%

Analysis of Variance
Source            DF                      SS            MS          F          P
Regression         2                  151.20         75.60       2.53      0.125
Residual Error    11                  328.51         29.86
Total             13                  479.71

The partial regression coefficient for Occupnts implies that, holding seat belt usage constant, the speed
decreases by 3.214 miles per hour for each additional occupant in the car. The partial regression
coefficient for SeatBelt implies that, for a given number of occupants, drivers who wear seat belts travel
6.629 miles per hour slower than those who do not. The p-value for Occupnts is 0.170; this implies that
the partial regression coefficient for this variable is not significantly different from zero. The p-value for
SeatBelt is 0.063; this implies that the partial regression coefficient for this variable is significantly
different from zero at the 0.063 level. It appears that seat belt usage provides a much stronger explanation
for the variation in speeds driven by various drivers than does the number of occupants in the car.
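
The summary measures in the ANOVA table above can be reproduced arithmetically; a supplementary
check in Python:

ss_reg, df_reg = 151.20, 2
ss_res, df_res = 328.51, 11
ss_total = ss_reg + ss_res                 # 479.71, the total sum of squares

ms_reg = ss_reg / df_reg                   # 75.60
ms_res = ss_res / df_res                   # 29.86
print(round(ms_reg / ms_res, 2))           # F = 2.53
print(round(ss_reg / ss_total, 3))         # R-squared = 0.315, i.e., 31.5%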

The Excel multiple regression solution for this exercise is shown below.
               D                 E              F             G           H            I           J
 2
 3           Regression Statistics
 4    Multiple R                   0.5614
 5    R Square                     0.3152
 6    Adjusted R Square            0.1907
 7    Standard Error               5.4649
 8    Observations                     14
 9
 10   ANOVA
 11                              df            SS            MS           F      Significance F
 12   Regression                       2       151.2000      75.6000      2.5314         0.1246
 13   Residual                        11       328.5143      29.8649
 14   Total                           13       479.7143
 15
 16                          Coefficients Standard Error    t Stat      P-value    Lower 95%   Upper 95%
 17   Intercept                  67.6286          5.0172     13.4795      0.0000       56.5859    78.6713
 18   Occupnts                    -3.2143         2.1908      -1.4672     0.1703       -8.0363     1.6077
 19   SeatBelt                    -6.6286         3.1999      -2.0715     0.0626      -13.6715     0.4144



CHAPTER EXERCISES
16.52 p/c/m The Minitab printout is shown below.
Regression Analysis: Tip versus Check, Diners

The regression equation is Tip = - 1.92 + 0.223 Check - 0.184 Diners

Predictor             Coef        SE Coef               T       P
Constant            -1.915          1.598           -1.20   0.284
Check              0.22275        0.04608            4.83   0.005
Diners             -0.1845         0.4133           -0.45   0.674

S = 1.524            R-Sq = 83.4%           R-Sq(adj) = 76.8%

Analysis of Variance
Source            DF                  SS            MS           F           P
Regression         2              58.389        29.194       12.57       0.011
Residual Error     5              11.611         2.322
Total              7              70.000

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                           95.0% PI
1         6.441      0.800   (   4.385,   8.498)              (     2.017, 10.866)

Values of Predictors for New Observations
New Obs     Check    Diners
1            40.0      3.00


a. The regression equation is: Tip = -1.915 + 0.22275*Check - 0.1845*Diners
   The partial regression coefficient for the check indicates that, holding the number of diners constant, a $1
   increase in the check will result in a $0.22275 increase in the tip. The partial regression coefficient for
   the number of diners indicates that, holding the size of the check constant, an additional diner will
   result in a tip that is $0.1845 smaller.
b.          The estimated tip amount for three diners who have a $40 check is $6.441.
c.          The 95% prediction interval for the tip left by a dining party like the one in part b is $2.017 to $10.866.
d.          The 95% confidence interval for the mean tip left by all dining parties like the one in part b is $4.385
            to $8.498.
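   The interval estimates in parts c and d can also be reproduced in Python with statsmodels; this is an
   illustrative addition, with the data values transcribed from the Excel worksheet shown near the end of
   this solution.

   import pandas as pd
   import statsmodels.formula.api as smf

   tips = pd.DataFrame({
       "Tip":    [7.5, 0.5, 2.0, 3.5, 9.5, 2.5, 3.5, 1.0],
       "Check":  [40, 15, 30, 25, 50, 20, 35, 10],
       "Diners": [2, 1, 3, 4, 4, 5, 5, 2],
   })
   fit = smf.ols("Tip ~ Check + Diners", data=tips).fit()

   new_party = pd.DataFrame({"Check": [40], "Diners": [3]})
   pred = fit.get_prediction(new_party)
   print(pred.summary_frame(alpha=0.05))
   # mean_ci_lower/upper give the confidence interval for the mean tip (part d);
   # obs_ci_lower/upper give the prediction interval for an individual tip (part c).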
e. The appropriate value for d.f. will be d.f. = 8 - 2 - 1 = 5. The t-value for a 95% confidence interval
   with 5 degrees of freedom is t = 2.571.
      The 95% confidence interval for population partial regression coefficient β1 is:
          b1 ± t s_b1 = 0.22275 ± 2.571(0.04608) = 0.22275 ± 0.11847 = (0.1043, 0.3412)
      The 95% confidence interval for population partial regression coefficient β2 is:
          b2 ± t s_b2 = -0.1845 ± 2.571(0.4133) = -0.1845 ± 1.0626 = (-1.2471, 0.8781)
f. The significance tests for the partial regression coefficients show that the partial regression coefficient
   for the size of the check is significant at the 0.005 level, while the partial regression coefficient for the
   number of diners is significant at the 0.674 level. Thus, the size of the check is much more useful in
   predicting the size of the tip than the number of diners. The overall regression is significant at the
   0.011 level. The coefficient of multiple determination indicates that 83.4% of the variation in the size
   of the tip is explained by the regression.
g. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
   about zero. Next the normal probability plot is graphed to examine whether the residuals could have
   come from a normally distributed population. Finally, the residuals are plotted against each of the
   independent variables to check for cyclical patterns.

In the following histogram of the residuals, many of the values fall in the class centered at 1.0. This is
some cause for concern, even though there are relatively few observations in the data set.
        [Histogram of the Residuals (response is Tip)]




In this Minitab test for normality, the points in the normal probability plot seem to deviate excessively
from a straight line and the approximate p-value is shown as 0.040. At the 0.05 level of significance,
we would conclude that the residuals did not come from a normally distributed population.




        [Probability Plot of RESI1 (Normal): Mean = 2.498002E-16, StDev = 1.288, N = 8, KS = 0.300, P-Value = 0.040]


The plots for residuals versus the independent variables are shown below. No alarming patterns seem to
be present in the two charts that follow. However, overall, the residual analysis provides some evidence to
suggest that the residuals may have come from a non-normally distributed population.

Residuals versus size of check.
        [Residuals Versus Check (response is Tip)]




Residuals versus number of diners.
        [Residuals Versus Diners (response is Tip)]
The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
already provides 95% confidence intervals for the population regression coefficients.
                A                 B                    C           D           E                F           G
 12   SUMMARY OUTPUT                                              Tip        Check           Diners
 13          Regression Statistics                                7.5          40               2
 14   Multiple R                   0.9133                         0.5          15               1
 15   R Square                     0.8341                         2.0          30               3
 16   Adjusted R Square            0.7678                         3.5          25               4
 17   Standard Error               1.5239                         9.5          50               4
 18   Observations                      8                         2.5          20               5
 19                                                               3.5          35               5
 20   ANOVA                                                       1.0          10               2
 21                                df              SS             MS           F         Significance F
 22   Regression                          2         58.3886       29.1943     12.5714             0.0112
 23   Residual                            5         11.6114        2.3223
 24   Total                               7         70.0000
 25
 26                           Coefficients Standard Error        t Stat      P-value      Lower 95%     Upper 95%
 27   Intercept                    -1.9154         1.5980          -1.1986      0.2844          -6.0231      2.1923
 28   Check                         0.2228         0.0461           4.8337      0.0047           0.1043      0.3412
 29   Diners                       -0.1845         0.4133          -0.4463      0.6741          -1.2469      0.8780


Excel has generated the optional normal probability plot and plots of the residuals against the independent
variables. Their appearance is essentially similar to those of Minitab.
        [Excel charts: Normal Probability Plot; residual plots versus Check and Diners]




16.53 p/c/m The Minitab printout is shown below.
Regression Analysis: AllFruit versus Apples, Grapes

The regression equation is AllFruit = 99.9 + 1.24 Apples + 0.822 Grapes

Predictor                  Coef   SE Coef           T        P
Constant                 99.865     2.952       33.83    0.000
Apples                  1.23640   0.09971       12.40    0.001
Grapes                   0.8221    0.2307        3.56    0.038

S = 0.269451               R-Sq = 98.1%         R-Sq(adj) = 96.9%

Analysis of Variance
Source          DF       SS                         MS        F        P
Regression       2 11.3555                      5.6778    78.20    0.003
Residual Error   3   0.2178                     0.0726
Total            5 11.5733

Predicted Values for New Observations
New
Obs      Fit SE Fit         95% CI              95% PI
  1 125.816    0.433 (124.439, 127.193) (124.194, 127.438)XX
XX denotes a point that is an extreme outlier in the predictors.

Values of Predictors for New Observations
New
Obs Apples Grapes
  1    17.0    6.00


a. The regression equation is AllFruit = 99.865 + 1.2364*Apples + 0.8221*Grapes.
   The partial regression coefficient for apples implies that, holding the consumption of grapes constant,
   a one pound increase in the consumption of apples will result in a 1.2364 pound increase in the
   consumption of all fresh fruits. The partial regression coefficient for grapes implies that, holding apple
   consumption constant, a one pound increase in the consumption of grapes will result in a 0.8221
   pound increase in the consumption of all fresh fruits.
b. The estimated per capita consumption of all fresh fruits during a year when 17 pounds of apples and 6
   pounds of grapes are consumed is 125.816 pounds.
c. The 95% prediction interval for per capita consumption during a year like the one in part b is
   124.194 to 127.438 pounds.
d. The 95% confidence interval for mean per capita consumption during all years like the one in part b is
   124.439 to 127.193 pounds.
e. For this problem, the appropriate d.f. = 6 - 2 - 1 = 3. The t-value for a 95% confidence interval with 3
   degrees of freedom is 3.182.
   The 95% confidence interval for population partial regression coefficient β1 is:
      b1 ± t·s_b1 = 1.2364 ± 3.182(0.09971) = 1.2364 ± 0.3173 = (0.92, 1.55)
   The 95% confidence interval for population partial regression coefficient β2 is:
      b2 ± t·s_b2 = 0.8221 ± 3.182(0.2307) = 0.8221 ± 0.7341 = (0.09, 1.56)
   (These intervals can also be verified with the short computational sketch that follows part g below.)
f. Both of the partial regression coefficients are statistically significant (apples p-value, 0.001; grapes
   p-value, 0.038). The overall regression is also highly significant (p-value = 0.003). The coefficient of
   multiple determination is 0.981. This regression appears to do a very good job of explaining the variation
   in fresh fruit consumption.
g. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
   about zero. Next the normal probability plot is graphed to examine whether the residuals could have
   come from a normally distributed population. Finally, the residuals are plotted against each of the
   independent variables to check for cyclical patterns.
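The interval arithmetic in part e can be checked with a few lines of Python. This is only an optional
verification sketch (not part of the original solution); it assumes the scipy package is available and simply
reuses the coefficient estimates and standard errors from the Minitab printout above.

from scipy import stats

# Degrees of freedom: n - k - 1 = 6 - 2 - 1 = 3
df = 3
t_crit = stats.t.ppf(0.975, df)   # two-sided 95% critical value, about 3.182

# (name, coefficient, standard error) taken from the Minitab printout
for name, b, se in [("Apples", 1.23640, 0.09971), ("Grapes", 0.8221, 0.2307)]:
    half_width = t_crit * se
    print(f"{name}: {b:.4f} +/- {half_width:.4f} = ({b - half_width:.2f}, {b + half_width:.2f})")

The printed intervals, roughly (0.92, 1.55) for Apples and (0.09, 1.56) for Grapes, match the hand
calculations in part e.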


The histogram of the residuals is shown below. This histogram offers no reason to believe that the
residuals may not have come from a normally distributed population.
[Minitab histogram of the residuals (response is AllFruit): Frequency versus Residual]




In this Minitab test for normality, the points in the normal probability plot do not deviate excessively
from a straight line and the approximate p-value is shown as > 0.150. Our conclusion is that the residuals
could have come from a normally distributed population.




[Minitab normal probability plot of residuals (RESI1): Mean = -2.13163E-14, StDev = 0.2087, N = 6, KS = 0.125, P-Value > 0.150]




The plots for residuals versus the independent variables are shown below. For this small data set, no
alarming patterns seem to be present. Overall, the residual analysis provides no evidence to suggest that
the assumptions for the multiple regression model have not been satisfied.

Residuals versus per-capita apple consumption.
[Minitab plot: Residuals Versus Apples (response is AllFruit)]




Residuals versus per-capita grape consumption.

[Minitab plot: Residuals Versus Grapes (response is AllFruit)]




The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
already provides 95% confidence intervals for the population regression coefficients.
                  E                            F                    G                H            I             J           K
 1          SUMMARY OUTPUT
 2
 3                 Regression Statistics
 4          Multiple R                   0.9905
 5          R Square                     0.9812
 6          Adjusted R Square            0.9686
 7          Standard Error               0.2695
 8          Observations                      6
 9
 10         ANOVA
 11                                           df                   SS               MS           F       Significance F
 12         Regression                                   2          11.3555         5.6778       78.2017          0.0026
 13         Residual                                     3           0.2178         0.0726
 14         Total                                        5          11.5733
 15
 16                                     Coefficients Standard Error                 t Stat     P-value     Lower 95%    Upper 95%
 17         Intercept                        99.8648         2.9516                 33.8343       0.0001       90.4715    109.2581
 18         Apples                            1.2364         0.0997                 12.4001       0.0011         0.9191     1.5537
 19         Grapes                            0.8221         0.2307                   3.5632      0.0377         0.0878     1.5563


Excel has generated the optional normal probability plot and plots of the residuals against the independent
variables. Their appearance is essentially similar to those of Minitab.


[Excel charts: normal probability plot (AllFruit versus Sample Percentile) and residual plots (residuals versus Apples, residuals versus Grapes)]




16.54 p/c/m The Minitab printout is shown below.
Regression Analysis: Salary versus GPA, Activities

The regression equation is Salary = 24.3 + 3.84 GPA + 1.68 Activities

Predictor                       Coef            SE Coef               T        P
Constant                      24.309              3.192            7.62    0.000
GPA                            3.842              1.234            3.11    0.017
Activiti                      1.6810             0.5291            3.18    0.016

S = 1.448                     R-Sq = 82.4%               R-Sq(adj) = 77.4%

Analysis of Variance
Source            DF                                SS                MS            F                       P
Regression         2                            68.924            34.462        16.44                   0.002
Residual Error     7                            14.676             2.097
Total              9                            83.600

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                                                95.0% PI
1        43.182      1.131   ( 40.506, 45.858)                                  (       38.834, 47.530)

Values of Predictors for New Observations
New Obs       GPA Activiti
1            3.60      3.00


a. The regression equation is: Salary = 24.309 + 3.842*GPA + 1.6810*Activities.


            The partial regression coefficient for the GPA indicates that, holding the number of activities constant,
            a one point increase in GPA will result in a starting salary that is $3842 higher. The partial regression
            coefficient for the number of activities indicates that, holding the GPA constant, an additional activity
            will result in a starting salary that is $1681 higher.
b.          The estimated starting salary for Dave (3.6 grade point average and 3 activities) is $43,182.
c.          The 95% prediction interval for the starting salary for Dave is between $38,834 and $47,530.
d.          The 95% confidence interval for the mean starting salary for all persons like Dave (i.e., 3.6 GPA and 3
            activities) is between $40,506 and $45,858.
e. For this problem, the appropriate d.f. = 10 - 2 - 1 = 7. The t-value for a 95% confidence interval with 7
   degrees of freedom is 2.365.
   The 95% confidence interval for population partial regression coefficient β1 is:
      b1 ± t·s_b1 = 3.842 ± 2.365(1.234) = 3.842 ± 2.918 = (0.924, 6.760)
   The 95% confidence interval for population partial regression coefficient β2 is:
      b2 ± t·s_b2 = 1.6810 ± 2.365(0.5291) = 1.6810 ± 1.2513 = (0.4297, 2.9323)
f. The partial regression coefficients for grade point average and activities are both significant at the 0.05
   level (GPA p-value, 0.017; Activities p-value, 0.016). The overall regression is significant at the 0.002
   level. The coefficient of multiple determination indicates that 82.4% of the variation in starting salaries
   is explained by the regression.
g. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
   about zero. Next the normal probability plot is graphed to examine whether the residuals could have
   come from a normally distributed population. Finally, the residuals are plotted against each of the
   independent variables to check for cyclical patterns.
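The fitting and diagnostic work in this exercise can also be reproduced outside Minitab and Excel. The
sketch below is optional and assumes the pandas, statsmodels, scipy, and matplotlib packages are available;
it uses the ten observations listed in the Excel worksheet shown later in this exercise, so its coefficients,
t statistics, and R-squared should agree with the printouts apart from rounding.

import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy import stats

# Data as listed in the Excel worksheet (Salary in $000s)
data = pd.DataFrame({
    "Salary":     [40, 46, 38, 39, 37, 38, 42, 37, 44, 41],
    "GPA":        [3.2, 3.6, 2.8, 2.4, 2.5, 2.1, 2.7, 2.6, 3.0, 2.9],
    "Activities": [2, 5, 3, 4, 2, 3, 3, 2, 4, 3],
})

X = sm.add_constant(data[["GPA", "Activities"]])   # adds the intercept column
model = sm.OLS(data["Salary"], X).fit()

print(model.summary())        # coefficients, t statistics, p-values, R-squared
print(model.conf_int(0.05))   # 95% confidence intervals for the coefficients

# Diagnostic displays analogous to those discussed in part g
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].hist(model.resid)                               # histogram of the residuals
stats.probplot(model.resid, dist="norm", plot=axes[0, 1])  # normal probability plot
axes[1, 0].scatter(data["GPA"], model.resid)               # residuals versus GPA
axes[1, 1].scatter(data["Activities"], model.resid)        # residuals versus Activities
plt.tight_layout()
plt.show()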




Shown below, the histogram is fairly symmetric and there is no evidence to suggest that the residuals may
not have come from a normal distribution.
[Minitab histogram of the residuals (response is Salary): Frequency versus Residual]




In this Minitab test for normality, the points in the normal probability plot do not deviate excessively
from a straight line and the approximate p-value is shown as > 0.150. Our conclusion is that the residuals
could have come from a normally distributed population.

[Minitab normal probability plot of residuals (RESI1): Mean = -1.42109E-15, StDev = 1.277, N = 10, KS = 0.125, P-Value > 0.150]




The plots for residuals versus the independent variables are shown below. For this small data set, no
alarming patterns seem to be present. Overall, the residual analysis provides no evidence to suggest that
the assumptions for the multiple regression model have not been satisfied.

Residuals versus grade point average.
[Minitab plot: Residuals Versus GPA (response is Salary)]




Residuals versus number of activities.

[Minitab plot: Residuals Versus Activities (response is Salary)]




The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
already provides 95% confidence intervals for the population regression coefficients.
                         E                   F                   G            H              I                 J         K
  1                                                               Salary           GPA      Activities
  2         SUMMARY OUTPUT                                            40            3.2              2
  3                                                                   46            3.6              5
  4                Regression Statistics                              38            2.8              3
  5         Multiple R                   0.9080                       39            2.4              4
  6         R Square                     0.8245                       37            2.5              2
  7         Adjusted R Square            0.7743                       38            2.1              3
  8         Standard Error               1.4479                       42            2.7              3
  9         Observations                     10                       37            2.6              2
 10                                                                   44            3.0              4
 11         ANOVA                                                     41            2.9              3
 12                                          df                 SS           MS             F          Significance F
 13         Regression                                  2        68.9244      34.4622       16.4379             0.0023
 14         Residual                                    7        14.6756       2.0965
 15         Total                                       9        83.6000
 16
 17                                    Coefficients Standard Error          t Stat        P-value       Lower 95%   Upper 95%
 18         Intercept                      24.3092         3.1919               7.6159        0.0001        16.7615    31.8569
 19         GPA                              3.8416        1.2342               3.1127        0.0170         0.9233     6.7600
 20         Activities                       1.6810        0.5291               3.1768        0.0156         0.4298     2.9322




Excel has generated the optional normal probability plot and plots of the residuals against the independent
variables. Their appearance is essentially similar to those of Minitab.
[Excel charts: normal probability plot (Salary versus Sample Percentile) and residual plots (residuals versus GPA, residuals versus Activities)]




16.55 p/c/m The Minitab printout is shown below.
Regression Analysis: FrGPA versus SAT, HSRank

The regression equation is FrGPA = - 1.98 + 0.00372 SAT + 0.00658 HSRank

Predictor                       Coef        SE Coef                   T       P
Constant                      -1.984          1.532               -1.30   0.218
SAT                         0.003719       0.001562                2.38   0.033
HSRank                      0.006585       0.008023                0.82   0.427

S = 0.4651                    R-Sq = 45.2%             R-Sq(adj) = 36.8%

Analysis of Variance
Source            DF                              SS                 MS         F                    P
Regression         2                          2.3244             1.1622      5.37                0.020
Residual Error    13                          2.8125             0.2163
Total             15                          5.1370

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                                                    95.0% PI
1         2.634      0.125   (   2.365,   2.904)                             (               1.594,   3.674)

Values of Predictors for New Observations
New Obs       SAT    HSRank
1            1100      80.0


a. The regression equation is: FrGPA = -1.984 + 0.003719*SAT + 0.006585*HSRank.

           The partial regression coefficient for the SAT score indicates that, holding the rank constant,
           a 1 point increase in the SAT score will result in a 0.003719 point increase in the freshman GPA.
           The coefficient for the high school rank indicates that, holding the SAT score constant, a 1 point
           increase in the high school rank will result in a 0.006585 point increase in freshman GPA.
b.         The estimated freshman GPA for a student who scored 1100 on the SAT and had a class rank of 80%
           is 2.634.
c.         The 95% prediction interval for the GPA for a student like the one in part b is between 1.594 and
           3.674.
d.         The 95% confidence interval for the mean GPA for all students like the one in part b is 2.365 to 2.904.
e. For this problem, the appropriate d.f. = 16 - 2 - 1 = 13. The t-value for a 95% interval with 13 degrees
   of freedom is 2.160.
   The 95% confidence interval for population partial regression coefficient β1 is:
      b1 ± t·s_b1 = 0.003719 ± 2.160(0.001562) = 0.003719 ± 0.003374 = (0.000345, 0.007093)
   The 95% confidence interval for population partial regression coefficient β2 is:
      b2 ± t·s_b2 = 0.006585 ± 2.160(0.008023) = 0.006585 ± 0.017330 = (-0.010745, 0.023915)
f. The partial regression coefficient for the SAT score is significantly different from zero at the 0.033
   level. The partial regression coefficient for the high school rank is not significantly different from zero
   (p-value = 0.427). The overall regression is significant at the 0.020 level.
g. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
   about zero. Next the normal probability plot is graphed to examine whether the residuals could have
   come from a normally distributed population. Finally, the residuals are plotted against each of the
   independent variables to check for cyclical patterns.
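The confidence and prediction intervals quoted in parts c and d can be reconstructed directly from the Fit,
SE Fit, and s values on the Minitab printout. A minimal verification sketch (assuming scipy is available);
the prediction interval uses the usual relationship PI half-width = t*sqrt(s^2 + (SE Fit)^2):

from math import sqrt
from scipy import stats

fit, se_fit, s, df = 2.634, 0.125, 0.4651, 13   # values from the Minitab printout
t = stats.t.ppf(0.975, df)                      # about 2.160

ci_half = t * se_fit                   # half-width of the CI for the mean response
pi_half = t * sqrt(s**2 + se_fit**2)   # half-width of the PI for an individual response

print(f"95% CI: ({fit - ci_half:.3f}, {fit + ci_half:.3f})")   # about (2.364, 2.904)
print(f"95% PI: ({fit - pi_half:.3f}, {fit + pi_half:.3f})")   # about (1.594, 3.674)

Apart from rounding in the reported Fit, these reproduce the intervals given in parts c and d.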




The histogram below seems to be fairly symmetric about zero.
[Minitab histogram of the residuals (response is FrGPA): Frequency versus Residual]




In this Minitab test for normality, the points in the normal probability plot do not deviate excessively
from a straight line and the approximate p-value is shown as > 0.150. Our conclusion is that the residuals
could have come from a normally distributed population.


[Minitab normal probability plot of residuals (RESI1): Mean = -9.99201E-16, StDev = 0.4330, N = 16, KS = 0.167, P-Value > 0.150]




The plots for residuals versus the independent variables are shown below. Neither plot reveals any alarming
patterns, and the residual analysis as a whole provides no evidence that the assumptions underlying the
multiple regression analysis have been violated.

Residuals versus SAT score.
[Minitab plot: Residuals Versus SAT (response is FrGPA)]




Residuals versus high school rank.
[Minitab plot: Residuals Versus HSRank (response is FrGPA)]




The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
already provides 95% confidence intervals for the population regression coefficients.
                      A                 B                 C               D           E               F           G
 20         SUMMARY OUTPUT
 21                Regression Statistics
 22         Multiple R                   0.6727
 23         R Square                     0.4525
 24         Adjusted R Square            0.3683
 25         Standard Error               0.4651
 26         Observations                     16
 27
 28         ANOVA
 29                                     df               SS              MS           F         Significance F
 30         Regression                         2              2.3244      1.1622      5.3719             0.0199
 31         Residual                          13              2.8125      0.2163
 32         Total                             15              5.1370
 33
 34                               Coefficients Standard Error           t Stat      P-value      Lower 95%    Upper 95%
 35         Intercept               -1.983878         1.53190           -1.29505      0.21783        -5.29334    1.32558
 36         SAT                      0.003719         0.00156            2.38050      0.03328         0.00034    0.00709
 37         HS Rank                  0.006585         0.00802            0.82074      0.42659        -0.01075    0.02392


Excel has generated the optional normal probability plot and plots of the residuals against the independent
variables. Their appearance is essentially similar to those of Minitab.

[Excel charts: normal probability plot (Fr. GPA versus Sample Percentile) and residual plots (residuals versus SAT, residuals versus HS Rank)]




16.56 p/c/m The Minitab printout is shown below.
Regression Analysis: Price versus Acres, SqFeet, CentralAir

The regression equation is Price = 36045 + 15663 Acres + 10.9 SqFeet + 4181 CentralAir

Predictor                     Coef           SE Coef              T        P
Constant                     36045             14539           2.48    0.025
Acres                        15663              5716           2.74    0.015
SqFeet                      10.875             4.959           2.19    0.043
CentralA                      4181              5652           0.74    0.470

S = 12321                   R-Sq = 41.8%             R-Sq(adj) = 30.9%

Analysis of Variance
Source            DF                         SS                   MS          F                     P
Regression         3                 1745571591            581857197       3.83                 0.030
Residual Error    16                 2428953909            151809619
Total             19                 4174525500

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                                              95.0% PI
1         73898       5236   (   62799,   84998)                           (           45518, 102278)

Values of Predictors for New Observations
New Obs     Acres    SqFeet CentralA
1           0.900      1800      1.00




a. The regression equation is:
   Price = 36045 + 15663*Acres + 10.875*SqFeet + 4181*CentralAir.
   The partial regression coefficient for the lot size indicates that, all other variables held constant, an
   additional acre of land will add $15,663 to the selling price. The partial regression coefficient for the
   size of the living area indicates that, all other variables held constant, an additional square foot of
   living area will add $10.875 to the selling price. Finally, the partial regression coefficient for the
   presence of central air conditioning indicates that, all other variables held constant, the presence of
   central air will increase the selling price by $4181.
b. The estimated selling price for a house sitting on a 0.9 acre lot with 1800 square feet of living area
   with central air conditioning is $73,898.
c. The 95% prediction interval for the selling price of the house described in part b is between $45,518
   and $102,278.
d. The 95% confidence interval for the mean selling price of all houses like the one in part b is between
   $62,799 and $84,998.
e. For this problem, the appropriate d.f. = 20 - 3 - 1 = 16. The t-value for a 95% interval with 16 degrees
   of freedom is 2.120.
   The 95% confidence interval for population partial regression coefficient β1 is:
      b1 ± t·s_b1 = 15,663 ± 2.120(5716) = 15,663 ± 12,117.92 = (3545.08, 27,780.92)
   The 95% confidence interval for population partial regression coefficient β2 is:
      b2 ± t·s_b2 = 10.875 ± 2.120(4.959) = 10.875 ± 10.513 = (0.362, 21.388)
   The 95% confidence interval for population partial regression coefficient β3 is:
      b3 ± t·s_b3 = 4181 ± 2.120(5652) = 4181 ± 11,982.24 = (-7801.24, 16,163.24)
f. The partial regression coefficient for Acres is significantly different from zero at the 0.015 level, and
   the coefficient for SqFeet significantly differs from zero at the 0.043 level. However, the coefficient
   for CentralAir does not differ from zero significantly (p-value = 0.470). The overall regression is
   significant at the 0.030 level.
g. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
   about zero. Next the normal probability plot is graphed to examine whether the residuals could have
   come from a normally distributed population. Finally, the residuals are plotted against each of the
   independent variables to check for cyclical patterns.
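The p-values cited in part f can be recovered from the printed t statistics and the 16 error degrees of
freedom. A short optional sketch (assuming scipy is available) using the two-sided p-value 2*P(T > |t|):

from scipy import stats

df = 16   # n - k - 1 = 20 - 3 - 1
for name, t_stat in [("Acres", 2.74), ("SqFeet", 2.19), ("CentralAir", 0.74)]:
    p = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed p-value
    print(f"{name}: t = {t_stat}, p-value = {p:.3f}")

Small differences from the printout (e.g., 0.044 versus 0.043 for SqFeet) are due to rounding of the
t statistics.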

The histogram below appears to be relatively symmetric.
[Minitab histogram of the residuals (response is Price): Frequency versus Residual]




In this Minitab test for normality, the points in the normal probability plot do not deviate excessively
from a straight line and the approximate p-value is shown as > 0.150. Our conclusion is that the residuals
could have come from a normally distributed population.
[Minitab normal probability plot of residuals (RESI1): Mean = -2.03727E-11, StDev = 11307, N = 20, KS = 0.124, P-Value > 0.150]




The plots for residuals versus the independent variables are shown below. None of them reveals any alarming
patterns, and the residual analysis as a whole provides no evidence that the assumptions underlying the
multiple regression analysis have been violated.




Residuals versus lot size.
[Minitab plot: Residuals Versus Acres (response is Price)]




Residuals versus living area.




[Minitab plot: Residuals Versus SqFeet (response is Price)]




Residuals versus central air conditioning.
[Minitab plot: Residuals Versus CentralAir (response is Price)]




The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
already provides 95% confidence intervals for the population regression coefficients.




            E                                   F                G                       H               I                 J              K
 1    SUMMARY OUTPUT
 2
 3           Regression Statistics
 4    Multiple R                  0.6466
 5    R Square                    0.4181
 6    Adjusted R Square           0.3091
 7    Standard Error             12321.1
 8    Observations                    20
 9
 10   ANOVA
 11                                             df              SS          MS                           F     Significance F
 12   Regression                                      3       1745571591 5.82E+08                       3.8328          0.0304
 13   Residual                                       16       2428953909 1.52E+08
 14   Total                                          19       4174525500
 15
 16                                    Coefficients Standard Error              t Stat             P-value           Lower 95%    Upper 95%
 17   Intercept                            36045.0       14539.258                 2.479              0.025             5223.179 66866.864
 18   Acres                                15662.9        5715.898                 2.740              0.015             3545.743 27780.064
 19   SqFeet                                 10.875          4.959                 2.193              0.043                 0.362     21.388
 20   CentralAir                             4181.1       5652.117                 0.740              0.470            -7800.861 16163.040


Excel has generated the optional normal probability plot and plots of the residuals against the independent
variables. Their appearance is essentially similar to those of Minitab.
[Excel charts: normal probability plot (Price versus Sample Percentile) and residual plots (residuals versus Acres, residuals versus SqFeet, residuals versus CentralAir)]




16.57 p/c/m The estimated selling price of a house occupying a 0.1 acre lot with 100 square feet of living
area and no central air conditioning is $38,699. This selling price does not seem reasonable. The problem
with this estimate arises because the regression equation has been extrapolated far beyond the limits of the
underlying data used to estimate it.
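A quick calculation with the fitted equation from Exercise 16.56 confirms the $38,699 figure and makes the
extrapolation problem concrete: the predictor values (0.1 acre, 100 square feet) lie far below the range of
lot sizes and living areas in the data used to fit the model. A small arithmetic sketch:

# Fitted equation from Exercise 16.56:
#   Price = 36045 + 15663*Acres + 10.875*SqFeet + 4181*CentralAir
acres, sqfeet, central_air = 0.1, 100, 0

price = 36045 + 15663 * acres + 10.875 * sqfeet + 4181 * central_air
print(round(price))   # about 38699 -- an extrapolation far beyond the observed data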

16.58 p/c/m The Minitab printout is shown below.
Regression Analysis: Time versus Age, Gender

The regression equation is Time = 69.5 + 0.110 Age - 12.2 Gender

Predictor         Coef       SE Coef           T         P
Constant         69.49         11.48        6.06     0.000
Age             0.1101        0.2257        0.49     0.635
Gender         -12.186         5.312       -2.29     0.042

S = 8.720        R-Sq = 43.7%        R-Sq(adj) = 33.5%

Analysis of Variance
Source            DF            SS            MS          F         P
Regression         2        649.25        324.63       4.27     0.042
Residual Error    11        836.46         76.04
Total             13       1485.71

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                    95.0% PI
1         74.45       3.40   (   66.96,   81.93)       (     53.85,   95.05)

Values of Predictors for New Observations
New Obs       Age    Gender
1            45.0 0.000000


a. The regression equation is: Time = 69.49 + 0.1101*Age - 12.186*Gender.
   The partial regression coefficient for Age indicates that, holding the gender constant, an increase of
   one year in age will result in an increase of 0.1101 seconds to complete the transaction.
   The partial regression coefficient for Gender indicates that, holding the age constant, a male takes
   12.186 seconds less to complete his transaction than a female.
b. The estimated time required to complete a transaction by a female customer who is 45 years of age is
   74.45 seconds.
c. The 95% prediction interval for the time required by the customer described in part b is 53.85 to 95.05
   seconds.
d. The 95% confidence interval for the mean time required by all customers like the one in part b is 66.96
   to 81.93 seconds.
e. For this problem, the appropriate d.f. = 14 - 2 - 1 = 11. The t-value for a 95% interval with 11 degrees
   of freedom is 2.201.
   The 95% confidence interval for population partial regression coefficient β1 is:
      b1 ± t·s_b1 = 0.1101 ± 2.201(0.2257) = 0.1101 ± 0.4968 = (-0.3867, 0.6069)
   The 95% confidence interval for population partial regression coefficient β2 is:
      b2 ± t·s_b2 = -12.186 ± 2.201(5.312) = -12.186 ± 11.692 = (-23.878, -0.494)
f. The partial regression coefficient for age does not differ from zero significantly (p-value = 0.635).
   However, the coefficient for gender differs significantly from zero at the 0.042 level. The overall
   regression is significant at the 0.042 level.
g. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
   about zero. Next the normal probability plot is graphed to examine whether the residuals could have
   come from a normally distributed population. Finally, the residuals are plotted against each of the
   independent variables to check for cyclical patterns.
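As a brief illustration of how the Gender dummy variable interpreted in part a works: with Gender coded 1 for
male and 0 for female, the fitted equation amounts to two parallel lines in Age that differ only by the
gender coefficient. The sketch below (optional, plain Python) evaluates both lines at age 45.

# Fitted equation: Time = 69.49 + 0.1101*Age - 12.186*Gender   (Gender: 1 = male, 0 = female)
def predicted_time(age, gender):
    return 69.49 + 0.1101 * age - 12.186 * gender

print(round(predicted_time(45, 0), 2))   # female, age 45: about 74.44 seconds (part b)
print(round(predicted_time(45, 1), 2))   # male, age 45: about 62.26 seconds
print(round(predicted_time(45, 0) - predicted_time(45, 1), 3))   # difference = 12.186 seconds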

The histogram shown below does not seem very symmetrical, but the small number of observations could
lead to erroneous conclusions.

[Minitab histogram of the residuals (response is Time): Frequency versus Residual]




In this Minitab test for normality, the points in the normal probability plot do not deviate excessively
from a straight line and the approximate p-value is shown as > 0.150. Our conclusion is that the residuals
could have come from a normally distributed population.
[Minitab normal probability plot of residuals (RESI1): Mean = -3.55271E-14, StDev = 8.021, N = 14, KS = 0.137, P-Value > 0.150]




The plots for residuals versus the independent variables are shown below. Although the first plot seems to
show more positive residuals for persons in the 40-50 age range, neither plot reveals any alarming patterns,
and the residual analysis as a whole provides no evidence that the assumptions underlying the multiple
regression analysis have been violated.

Residuals versus age of customer.
[Minitab plot: Residuals Versus Age (response is Time)]




Residuals versus gender of customer.
[Minitab plot: Residuals Versus Gender (response is Time)]




The Excel multiple regression solution for the data in this exercise is shown below.



            A                                B                    C            D                      E              F            G
 18   SUMMARY OUTPUT
 19
 20          Regression Statistics
 21   Multiple R                   0.6611
 22   R Square                     0.4370
 23   Adjusted R Square            0.3346
 24   Standard Error               8.7202
 25   Observations                     14
 26
 27   ANOVA
 28                                          df                  SS           MS                      F        Significance F
 29   Regression                                    2             649.2501   324.6251                 4.2690            0.0424
 30   Residual                                     11             836.4641    76.0422
 31   Total                                        13            1485.7143
 32
 33                                      Coefficients Standard Error         t Stat               P-value       Lower 95%     Upper 95%
 34   Intercept                              69.4926         11.4769            6.0550               0.0001          44.2322     94.7530
 35   Age                                      0.1101         0.2257            0.4880               0.6351           -0.3866      0.6068
 36   Gender                                -12.1858          5.3116           -2.2942               0.0425         -23.8765      -0.4951


Excel has generated the optional normal probability plot and plots of the residuals against the independent
variables. Their appearance is essentially similar to those of Minitab.
[Excel charts: normal probability plot (Time versus Sample Percentile) and residual plots (residuals versus Age, residuals versus Gender)]




                                                                  INTEGRATED CASES

                                THORNDIKE SPORTS EQUIPMENT

Ted uses Minitab to generate the printout shown below.
Regression Analysis: Skiers versus Weekend, SnowInch, Temperat

The regression equation is Skiers = 560 + 147 Weekend + 1.42 SnowInch - 1.60 Temperat

Predictor     Coef   SE Coef       T       P
Constant    559.87     76.78    7.29   0.000
Weekend     147.35     51.86    2.84   0.009
SnowInch     1.424     2.696    0.53   0.602
Temperat    -1.604     2.771   -0.58   0.568

S = 125.061    R-Sq = 25.4%     R-Sq(adj) = 16.8%

Analysis of Variance
Source          DF     SS         MS      F        P
Regression       3 138705      46235   2.96    0.051
Residual Error 26 406650       15640
Total           29 545354

Examining the printout, Ted sees that the coefficient for Weekend is significantly different from zero at
the 0.009 level, and the overall regression is significant at the 0.051 level. Overall, only 25.4% of the
variation in daily ski patronage is explained by these independent variables. Perhaps some of the
remaining variation could be at least partially explained by some other variables (e.g., live music or
entertainment, conference attendance, or type of group staging a conference) that Ted has not included in
his analysis.
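The borderline significance Ted observes can be verified directly from the ANOVA table: with F = 2.96 on
3 and 26 degrees of freedom, the upper-tail area of the F distribution is the reported p-value of about
0.051. A minimal sketch, assuming scipy is available:

from scipy import stats

f_stat, df_num, df_den = 2.96, 3, 26          # from the Minitab ANOVA table
p_value = stats.f.sf(f_stat, df_num, df_den)  # upper-tail (right-tail) area
print(round(p_value, 3))                      # about 0.051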



                                 SPRINGDALE SHOPPING SURVEY

Caution: These exercises include the recoding of two of the variables. If you save the revised data file, do
so using a different filename.

If you are using Minitab, recode as follows:
1. Click Data. Select Code. Click Numeric to Numeric.
2. Enter C26 C28 into the Code data from columns box. Enter C26 C28 into the Into columns box.
    Enter 2 into the Original values box. Enter 0 into the New box. Click OK.

If you are using Excel, recode as follows:
1. Click and drag to select cells Z1:Z151. (This highlights the variable name, RESPGEND, and the 150
    data values below.) Click Edit. Click Replace.
2. Enter 2 into the Find what box. Enter 0 into the Replace with box. Click Replace All.
3. Repeat steps 1 and 2 for cells AB1:AB151, which contain the variable name, RESPMARI, and the 150
    data values below.
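If the survey data are instead loaded into Python, the same 2-to-0 recoding can be done with pandas. This is
only a sketch; it assumes the data have been read into a DataFrame containing the columns RESPGEND and
RESPMARI, and the file names shown are placeholders.

import pandas as pd

# Placeholder file name -- substitute the actual survey data file.
survey = pd.read_csv("springdale_survey.csv")

# Recode 2 -> 0 for gender and marital status so each becomes a 0/1 dummy variable.
for col in ["RESPGEND", "RESPMARI"]:
    survey[col] = survey[col].replace(2, 0)

# As cautioned above, save the revised data under a different file name.
survey.to_csv("springdale_survey_recoded.csv", index=False)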




1a through 1e, with dependent variable 7, Attitude toward Springdale Mall.


Regression Analysis: SPRILIKE versus IMPVARIE, IMPHELP, ...

The regression equation is
SPRILIKE = 2.90 + 0.188 IMPVARIE + 0.0043 IMPHELP + 0.034 RESPGEND
           + 0.191 RESPMARI

Predictor               Coef   SE Coef          T            P
Constant              2.9009    0.2963       9.79        0.000
IMPVARIE             0.18839   0.05383       3.50        0.001
IMPHELP              0.00432   0.04203       0.10        0.918
RESPGEND              0.0341    0.1306       0.26        0.794
RESPMARI              0.1909    0.1232       1.55        0.123

S = 0.738875            R-Sq = 11.9%           R-Sq(adj) = 9.5%

Analysis of Variance
Source           DF      SS                        MS           F           P
Regression        4 10.7125                    2.6781        4.91       0.001
Residual Error 145 79.1608                     0.5459
Total           149 89.8733


1a. The partial regression coefficient for RESPGEND is 0.034. With the variables coded so that
    1 = male and 0 = female, this implies that males tend to have an attitude toward Springdale Mall that
    is 0.034 points higher than the attitude displayed by females toward this shopping area. However, the
    p-value for the test of this partial regression coefficient is 0.794, which is not less than α = 0.05, and
    the coefficient does not differ significantly from zero at the 0.05 level of significance. The partial
    regression coefficient for IMPVARIE (test p-value = 0.001) is the only one that differs significantly
    at the 0.05 level of significance.
1b. The p-value for the strength of the overall relationship is 0.001. This is less than the 0.05 level
    specified, so the overall regression equation is significant at the 0.05 level.
1c. The percentage of the variation in y that is explained by the regression equation is 11.9%
    (unadjusted). In the ANOVA portion of the printout, this is the Regression sum of squares (10.7125)
    divided by the Total sum of squares (89.8733).
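The 11.9% figure in 1c (and the 9.5% adjusted value on the printout) can be verified from the sums of
squares. A small arithmetic sketch, using the standard adjustment formula with n = 150 observations and
k = 4 independent variables:

ss_regression, ss_total = 10.7125, 89.8733   # from the ANOVA table
n, k = 150, 4

r_sq = ss_regression / ss_total
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)

print(round(r_sq, 3))      # about 0.119
print(round(r_sq_adj, 3))  # about 0.095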
1d. The residuals are plotted below versus each of the independent variables. In each plot, the residuals
    seem to be unrelated to the independent variable, thus supporting the validity of the model.
[Minitab plot: Residuals Versus IMPVARIE (response is SPRILIKE)]




[Minitab plot: Residuals Versus IMPHELP (response is SPRILIKE)]




[Figure: residual plot -- Residuals Versus RESPGEND (response is SPRILIKE)]




[Figure: residual plot -- Residuals Versus RESPMARI (response is SPRILIKE)]




1e. In this Minitab test for normality, the points in the normal probability plot appear to deviate
    excessively from a straight line and the approximate p-value is shown as < 0.01. At the 0.05 level of
    significance, we would conclude that the residuals could not have come from a normally distributed
    population. For this regression analysis, it appears that the assumption of normality of residuals may
    have been violated.
[Figure: Probability Plot of RESI1 (Normal) -- Mean = -3.12639E-15, StDev = 0.7289, N = 150, KS = 0.129, P-Value < 0.010]
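A minimal Python sketch of how this regression and the diagnostics in parts 1d and 1e could be reproduced
is given below. The file name carries over the assumption from the recoding step; note that scipy's plain
Kolmogorov-Smirnov test is not identical to the Lilliefors-corrected KS test Minitab reports, so the
p-values need not match exactly.

import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy import stats

df = pd.read_csv("shopping_recoded.csv")        # assumed file name
predictors = ["IMPVARIE", "IMPHELP", "RESPGEND", "RESPMARI"]

# Fit the multiple regression of SPRILIKE on the four predictors.
X = sm.add_constant(df[predictors])
model = sm.OLS(df["SPRILIKE"], X).fit()
print(model.summary())                          # coefficients, t tests, R-sq, overall F

# Part 1d: residuals versus each independent variable.
for col in predictors:
    plt.figure()
    plt.scatter(df[col], model.resid)
    plt.axhline(0, linestyle="--")
    plt.xlabel(col)
    plt.ylabel("Residual")
plt.show()

# Part 1e: rough normality check on the standardized residuals.
z = (model.resid - model.resid.mean()) / model.resid.std()
print(stats.kstest(z, "norm"))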




2. Repeating 1a through 1e, with dependent variable 8, Attitude toward Downtown.
Regression Analysis: DOWNLIKE versus IMPVARIE, IMPHELP, ...

The regression equation is
DOWNLIKE = 3.72 + 0.0251 IMPVARIE - 0.0671 IMPHELP + 0.015 RESPGEND
           - 0.006 RESPMARI

Predictor                  Coef     SE Coef            T         P
Constant                 3.7211      0.3796         9.80     0.000
IMPVARIE                0.02512     0.06896         0.36     0.716
IMPHELP                -0.06710     0.05384        -1.25     0.215
RESPGEND                 0.0148      0.1673         0.09     0.929
RESPMARI                -0.0057      0.1578        -0.04     0.971

S = 0.946571              R-Sq = 1.2%          R-Sq(adj) = 0.0%

Analysis of Variance
Source           DF       SS                           MS           F          P
Regression        4   1.5205                       0.3801        0.42      0.791
Residual Error 145 129.9195                        0.8960
Total           149 131.4400

2a. The partial regression coefficient for RESPGEND is 0.015. With the variables coded so that
    1 = male and 0 = female, this implies that males tend to have an attitude toward Downtown that is
    0.015 points higher than the attitude displayed by females toward this shopping area. However, the
    p-value for the test of this partial regression coefficient is 0.929, which is not less than α = 0.05, and
    the coefficient does not differ significantly from zero at the 0.05 level of significance. In this
    regression, none of the partial regression coefficients is significantly different from zero at the 0.05
    level of significance.
2b. The p-value for the strength of the overall relationship is 0.791. This is not less than the 0.05 level
    specified, so the overall regression equation is not significant at the 0.05 level.
2c. The percentage of the variation in y that is explained by the regression equation is only 1.2%
    (unadjusted). In the ANOVA portion of the printout, this is the Regression sum of squares (1.5205)
    divided by the Total sum of squares (131.4400).



2d. Here we plot the residuals versus each of the independent variables. In each plot, the residuals appear
    to be unrelated to the independent variable, thus supporting the validity of the model.
[Figure: residual plot -- Residuals Versus IMPVARIE (response is DOWNLIKE)]




[Figure: residual plot -- Residuals Versus IMPHELP (response is DOWNLIKE)]




[Figure: residual plot -- Residuals Versus RESPGEND (response is DOWNLIKE)]




[Figure: residual plot -- Residuals Versus RESPMARI (response is DOWNLIKE)]




2e. In this Minitab test for normality, the points in the normal probability plot appear to deviate
    excessively from a straight line and the approximate p-value is shown as < 0.01. At the 0.05 level of
    significance, we would conclude that the residuals could not have come from a normally distributed
    population. For this regression analysis, it appears that the assumption of normality of residuals may
    have been violated.
[Figure: Probability Plot of RESI1 (Normal) -- Mean = -5.62513E-17, StDev = 0.9338, N = 150, KS = 0.148, P-Value < 0.010]




3. Repeating 1a through 1e, with dependent variable 9, Attitude toward West Mall.
Regression Analysis: WESTLIKE versus IMPVARIE, IMPHELP, ...

The regression equation is
WESTLIKE = 3.54 - 0.0906 IMPVARIE + 0.0341 IMPHELP - 0.201 RESPGEND + 0.270 RESPMARI

Predictor                      Coef          SE Coef               T       P
Constant                     3.5398           0.4162            8.51   0.000
IMPVARIE                   -0.09060          0.07560           -1.20   0.233
IMPHELP                     0.03413          0.05903            0.58   0.564
RESPGEND                    -0.2013           0.1834           -1.10   0.274
RESPMARI                     0.2704           0.1730            1.56   0.120

S = 1.03772                      R-Sq = 3.5%              R-Sq(adj) = 0.9%

Analysis of Variance
Source           DF      SS                                       MS      F            P
Regression        4   5.729                                    1.432   1.33        0.262
Residual Error 145 156.144                                     1.077
Total           149 161.873


3a. The partial regression coefficient for RESPGEND is -0.2013. With the variables coded as 1 = male
    and 0 = female, males tend to have an attitude toward West Mall that is 0.2013 points lower than that
    displayed by females. However, the p-value of 0.274 is not less than α = 0.05, and the coefficient does
    not differ significantly from zero at the 0.05 level of significance. In this regression, none of the
    partial regression coefficients differs significantly from zero at the 0.05 level.
3b. The p-value for the strength of the overall relationship is 0.262. This is not less than the 0.05 level
    specified, so the overall regression equation is not significant at the 0.05 level.
3c. The percentage of the variation in y that is explained by the regression equation is only 3.5%
    (unadjusted). In the ANOVA portion of the printout, this is the Regression sum of squares (5.729)
    divided by the Total sum of squares (161.873).
3d. Here we plot the residuals versus each of the independent variables. In each plot, the residuals appear
    to be unrelated to the independent variable, thus supporting the validity of the model.

[Figure: residual plot -- Residuals Versus IMPVARIE (response is WESTLIKE)]




[Figure: residual plot -- Residuals Versus IMPHELP (response is WESTLIKE)]




[Figure: residual plot -- Residuals Versus RESPGEND (response is WESTLIKE)]




[Figure: residual plot -- Residuals Versus RESPMARI (response is WESTLIKE)]




3e. In this Minitab test for normality, the points in the normal probability plot appear to deviate
    excessively from a straight line and the approximate p-value is shown as < 0.01. At the 0.05 level of
    significance, we would conclude that the residuals could not have come from a normally distributed
    population. For this regression analysis, it appears that the assumption of normality of residuals may
    have been violated.
[Figure: Probability Plot of RESI1 (Normal) -- Mean = -1.89478E-16, StDev = 1.024, N = 150, KS = 0.097, P-Value < 0.010]




4. The four independent variables -- IMPVARIE, IMPHELP, RESPGEND, and RESPMARI -- do a
   better job of predicting attitude toward Springdale Mall (R-sq = 11.9%, overall p-value = 0.001) than
   for either Downtown (R-sq = 1.2%, p-value = 0.791) or West Mall (R-sq = 3.5%, p-value = 0.262).


                                           BUSINESS CASES

                                 EASTON REALTY COMPANY (A)

1. With regard to the two parties claiming their homes were not sold for fair market price by Easton:
   a. The selling price of the first home, not located in the Dallas portion of the metroplex, four years
      old, and with 2190 square feet, was $88,500. The selling price for the second home, not located in
      the Dallas portion of the metroplex, nine years old, and with 1848 square feet, was $79,500.
      Using Minitab and the EASTON data file, we identify the average selling price for all homes in
      the most recent three-month period as well as the average selling price for all homes sold during
      each of these three months:
        For all homes sold during the most recent three-month period:
Descriptive Statistics: Price
Variable    N   Mean SE Mean      StDev   Minimum      Q1   Median       Q3   Maximum
Price     378 91367       895     17394     51800   78475    89400   102850    137100

        For homes sold during each of the most recent three months:

Descriptive Statistics: Price
Variable  Month    N   Mean  SE Mean  StDev  Minimum     Q1  Median      Q3  Maximum
Price         4  131  95649     1456  16661    60400  82200   96200  107000   137100
              5  127  90972     1570  17696    58100  78200   88900  102700   134100
              6  120  87112     1541  16883    51800  75650   85700   99075   131900
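        A minimal Python sketch of the same summaries, assuming the EASTON data file has been
        exported to a CSV with columns Price and Month (the file name is an assumption):

import pandas as pd

homes = pd.read_csv("easton.csv")                   # assumed file name

print(homes["Price"].describe())                    # all 378 sales in the three-month period
print(homes.groupby("Month")["Price"].describe())   # months 4, 5, and 6 separately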

       The prices of the two homes in question ($88,500 and $79,500) are below the mean price for all
       homes sold during the most recent three-month period ($91,367). However, the homes that are
       the subject of the controversy were sold in the most recent month (June, or month code 6), during
       which the mean price was just $87,112 in a declining market -- note the declining mean selling
        prices from month 4 through month 6. On this basis, the prices of the homes in question do not
        appear to be very different from the mean price for all homes sold during the most recent month,
        and one of them ($88,500) even sold for more than that month's mean of $87,112.
    b. There are a number of pricing factors that could make the comparison in part (a) unfair.
       In considering only the selling price, we are not taking into consideration other factors that could
       affect the price of a home. Such factors could include variables such as location, age, size,
       number of bedrooms, and many more variables of which real estate agents are well aware.
       Regarding location, we will see in our regression in part 2a that homes in Dallas sell for a rather
       large premium versus comparable homes sold elsewhere.
    c. In making their argument, the complaining sellers are relying heavily on the average selling price
       ($104,250) stated in the article for all homes sold in the area during the previous twelve months
        in a weakening housing market. Therein lies the weakest component of their argument: they sold
        their houses during the 12th month of a 1-year period in which housing prices in the area had been
        decreasing.




2. Using multiple regression to estimate Price as a function of SqFeet, Bedrooms, Age, Dallas, and
   Easton, we obtain the following Minitab printout for the most recent three months of home sales.
    a. Interpreting the partial regression coefficients: On average, the price tends to increase by $38.64
        for each additional square foot of living space, by $358 for each additional bedroom, and by $48
       for each additional year in age. Also, the price tends to be $21,282 higher if located in Dallas and
       $132 higher if sold by Easton rather than another realtor. The positive $132 coefficient for the
       Easton variable would appear to undermine accusations by the claimants that Easton has been
       engaging in a practice of underpricing its residential properties relative to other real estate
       companies. Especially noteworthy is the $21,282 premium for a home in Dallas versus elsewhere
       in the metroplex area, because neither of the disputed homes is located in Dallas.
Regression Analysis: Price versus SqFeet, Bedrooms, Age, Dallas, Easton

The regression equation is
Price = 8309 + 38.6 SqFeet + 358 Bedrooms + 48 Age + 21282 Dallas + 132 Easton

Predictor      Coef   SE Coef       T       P
Constant       8309      2082    3.99   0.000
SqFeet       38.640     1.257   30.73   0.000
Bedrooms      357.8     664.8    0.54   0.591
Age            47.8     152.9    0.31   0.755
Dallas      21281.8     647.4   32.87   0.000
Easton          132      1060    0.12   0.901

S = 6069.24    R-Sq = 88.0%     R-Sq(adj) = 87.8%

Analysis of Variance
Source           DF          SS               MS          F       P
Regression        5 1.00353E+11      20070528157     544.87   0.000
Residual Error 372 13702868976          36835669
Total           377 1.14056E+11

    b. Because each of the homes in dispute was sold during the most recent month, the printout below
       includes only data for that month (June, or month code 6). For each of the two homes that are the
       subject of complaints, the printout provides a point estimate as well as 95% confidence and
       prediction intervals for a home having comparable characteristics and sold by a realtor other than
       Easton. In the printout below, note that the Dallas predictor variable has been specified as 0 for
       each of the disputed homes, because neither is located in Dallas.
Regression Analysis: Price versus SqFeet, Bedrooms, Age, Dallas, Easton

The regression equation is
Price = 3046 + 36.4 SqFeet + 1474 Bedrooms + 445 Age + 21456 Dallas + 624 Easton

Predictor     Coef SE Coef          T      P
Constant      3046     3116      0.98 0.330
SqFeet      36.388    2.085     17.45 0.000
Bedrooms      1474     1091      1.35 0.179
Age          445.0    218.5      2.04 0.044
Dallas     21455.6    991.0     21.65 0.000
Easton         624     1491      0.42 0.677
S = 5044.68   R-Sq = 91.4%      R-Sq(adj) = 91.1%

Analysis of Variance
Source           DF          SS              MS         F         P
Regression        5 31017720744      6203544149    243.77     0.000
Residual Error 114   2901162922        25448798
Total           119 33918883667

Predicted Values for New Observations
New
Obs      Fit  SE Fit          95% CI            95% PI
  1    88938    1277  (86408, 91469)  (78630, 99247)
  2    78718     905  (76926, 80510)  (68565, 88871)

Values of Predictors for New Observations
New
Obs SqFeet Bedrooms     Age    Dallas     Easton
  1    2190      3.00 4.00 0.000000 0.000000
  2    1848      3.00 9.00 0.000000 0.000000

          For the home that sold for $88,500, the point estimate is $88,938 for the selling price of a
          comparable home sold by another realtor. Also, referring to the prediction interval, we have 95%
          confidence that a comparable home sold by another realtor would have brought a price within the
          interval from $78,630 to $99,247. The price for which Easton sold the home is very close to the
          point estimate and well within the prediction interval. The point estimate and prediction interval
          provide no evidence that would tend to support the complaint being made by this seller.
             For the home that sold for $79,500, the point estimate is $78,718 for the selling price of a
          comparable home sold by another realtor. Also, referring to the prediction interval, we have 95%
          confidence that a comparable home sold by another realtor would have brought a price within the
           interval from $68,565 to $88,871. The price for which Easton sold the home is actually slightly
          more than the point estimate and is well within the prediction interval. The point estimate and
          prediction interval provide no evidence that would tend to support the complaint being made by
          this seller.
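           A minimal Python sketch of how the June-only regression and the intervals above could be
           reproduced ("easton.csv" is the same assumed file name as before; column names follow the
           printouts):

import pandas as pd
import statsmodels.api as sm

homes = pd.read_csv("easton.csv")                   # assumed file name
june = homes[homes["Month"] == 6]                   # most recent month only

predictors = ["SqFeet", "Bedrooms", "Age", "Dallas", "Easton"]
X = sm.add_constant(june[predictors])
fit = sm.OLS(june["Price"], X).fit()

# The two disputed homes, coded as non-Dallas sales by a realtor other than Easton.
new_homes = pd.DataFrame({
    "const":    [1.0, 1.0],
    "SqFeet":   [2190, 1848],
    "Bedrooms": [3, 3],
    "Age":      [4, 9],
    "Dallas":   [0, 0],
    "Easton":   [0, 0],
})

# summary_frame() reports the fitted value ("mean"), the 95% confidence interval
# for the mean, and the 95% prediction interval ("obs_ci") for each new home.
pred = fit.get_prediction(new_homes)
print(pred.summary_frame(alpha=0.05))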
   c.     In addition to the points made in item 2b, above, it should be noted that the regression equation
          that includes only June data shows a partial regression coefficient of +$624 for the Easton
          variable. On average, a home sold during June by Easton sold for $624 more than a comparable
           home sold by another realtor. This is yet another point that refutes the arguments of the
          disgruntled sellers of the two homes in question. Based on the evidence presented above, it would
          not seem that Easton is underpricing its residential properties.



                                       CIRCUIT SYSTEMS, INC. (C)

In Chapters 11 and 14, we visited Circuit Systems, Inc., a company concerned about the effectiveness of
its new program for reducing the cost of absenteeism among hourly workers. In this chapter, we take a
different approach to analyzing the company's data.

1. We will first use a multiple regression model to estimate the number of days of sick leave this year as
   a function of two variables: days of sick leave taken last year and whether the employee is a
   participant in the exercise program. The Minitab printout is shown below.
Regression Analysis: Sick_ThisYr versus Sick_LastYr, Exercise?

The regression equation is Sick_ThisYr = 1.53 + 0.566 Sick_LastYr - 0.955 Exercise?

Predictor          Coef   SE Coef       T        P
Constant         1.5325    0.3529    4.34    0.000
Sick_LastYr     0.56577   0.02439   23.19    0.000
Exercise?       -0.9549    0.2643   -3.61    0.000

S = 1.86447      R-Sq = 70.5%    R-Sq(adj) = 70.3%

Analysis of Variance
Source           DF      SS             MS        F       P
Regression        2 1913.93         956.97   275.29   0.000
Residual Error 230   799.54           3.48
Total           232 2713.47




    The significance of the overall regression is quite strong, with the p-value displayed as 0.000.
    Interpreting the partial regression coefficients in this model: On average, for a 1-day increase in the
    number of sick days a person took last year, the model predicts a 0.566-day increase in the number of
    sick days taken this year. Also, on average, a person participating in the exercise program would tend
    to have 0.955 fewer sick days this year than a person not participating in the exercise program. Both
    signs are as we would have expected, and the negative coefficient for program participation indicates
    that the program is working in terms of reducing the number of sick days taken. On the basis of this
    regression analysis, the exercise program is worthy of continuation. However, keep in mind that we are
    considering only days of absence, not the total cost associated with absence, which includes the $200
    subsidy for persons participating in the exercise program.
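    To make the size of the exercise effect concrete, the fitted equation can be applied to two hypothetical
    employees who each took 5 sick days last year (an arbitrary illustrative value), one in the program and
    one not:

# Predicted sick days this year, from the fitted equation reported above.
def predicted_sick_days(sick_last_year, exercises):
    return 1.53 + 0.566 * sick_last_year - 0.955 * exercises

print(predicted_sick_days(5, 1))   # participant:     about 3.4 days
print(predicted_sick_days(5, 0))   # non-participant: about 4.4 days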

2. The regression model explains 70.5% of the variation in days of sick leave this year, so 29.5% of the
   variation in the number of sick days taken this year is not explained. Some variables that could
   probably help explain some of the as-yet unexplained variation are associated with the incentive
   package implemented by the company. Possible variables that are not in the database could include
   the employee’s level of work satisfaction, age, gender, family size, and length of commute.





								