# ch16 by stariya

CHAPTER 16
MULTIPLE REGRESSION AND CORRELATION

SECTION EXERCISES
16.1 d/p/e Simple linear regression involves only one independent variable; multiple regression involves
two or more independent variables. Multiple regression analysis is preferred whenever two or more
variables impact upon the dependent variable.

16.2 d/p/e As with simple regression analysis, multiple regression analysis is used in determining and
interpreting the linear relationship between the dependent and independent variables. Correlation analysis
measures the strength of the relationship.

16.3 d/p/m Many variables could affect the annual household expenditure for auto maintenance and
repair: the number of cars owned, the number of miles driven each year, the age(s) of the car(s), the
make(s) of the car(s). These are just a few of the many variables that could have a notable effect.

16.4 d/p/m The director may wish to examine the personnel file for the following variables: the number of
vacation days taken last year, the number of personal days taken last year, the times late to work last year,
the number of conferences scheduled with the employee's superior, and the number of days called in sick
the previous year.

16.5 d/p/e The multiple regression model is:
yi = 0 + 1x1i + 2x2i + ... + kxki + i where
yi = a value of the dependent variable, y
0 = a constant
x1i, x2i, ... , xki = values of the independent variables x1, x2, ... , xk
1, 2, ... , k = partial regression coefficients for independent variables x1, x2, ... , xk
i = random error, or residual

16.6 d/p/m In terms of the residual component of the model, the assumptions underlying multiple
regression are:
1. For any given set of values for the independent variables, the population of residuals will be normally
distributed with a mean of zero and a standard deviation of σε.
2. The standard deviation of the error terms is the same regardless of the combination of values taken on
by the independent variables.
3. The error terms are statistically independent from each other.

16.7 d/p/m When there are two independent variables, the regression equation can be thought of in terms
of a geometric plane. When there are three or more independent variables, the regression equation
becomes a mathematical entity called a hyperplane; it is impossible to visually summarize a regression
with three or more independent variables because it will be in four or more dimensions.

16.8 c/a/e
a. The y-intercept, or constant term, is 100. The partial regression coefficient for x1 is 20; for x2, -3; and,
for x3, 120.
b. The estimated value of y is ŷ = 100 + 20(12) - 3(5) + 120(10) = 1525.
c. If x3 were to increase by 4, the value of ŷ would increase by 120(4) = 480. To offset this increase,
x2 would have to increase by 480/3 = 160.
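
The substitution and offset arithmetic in 16.8 can be sketched in a few lines of Python (the code and function name are illustrative, not part of the original solution):

```python
# Sketch: evaluating the estimated regression equation from exercise 16.8,
# y-hat = 100 + 20*x1 - 3*x2 + 120*x3.

def y_hat(x1, x2, x3):
    """Estimated value of y for given x1, x2, x3."""
    return 100 + 20 * x1 - 3 * x2 + 120 * x3

# Part b: x1 = 12, x2 = 5, x3 = 10
print(y_hat(12, 5, 10))  # 1525

# Part c: an increase of 4 in x3 raises y-hat by 120(4) = 480; since the
# coefficient of x2 is -3, x2 must rise by 480/3 = 160 to offset it.
print(y_hat(12, 5 + 160, 10 + 4))  # 1525 again, so the increase is offset
```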

16.9 p/a/e
a. The y-intercept is 300, the partial regression coefficients are 7 for x1 and 13 for x2.
b. If 3 people live in a 6-room home, the estimated bill is ŷ = 300 + 7(3) + 13(6) = 399.

16.10 p/a/e
a. The y-intercept or constant term is -0.1; this is the estimated total operating cost (in millions of dollars)
when there is no labor cost and no power cost. (Note: it is very unlikely that a plant ever operates
without incurring either labor or power costs; this estimate is very suspect. We must be careful when
making estimates based on x values that lie beyond the range of the underlying data.) The partial
regression coefficient for the labor cost is 1.1; this indicates that, for a given level of electric cost,
the estimated operating cost will increase by \$1.10 for each additional \$1 incurred in labor costs.
The partial regression coefficient for the electric power cost is 2.8; this indicates that, for a given level
of labor cost, the estimated operating cost will increase by \$2.80 for each additional \$1 increase in
electric power cost.
b. If labor costs \$6 million and electric power costs \$0.3 million, the estimated annual cost to operate the
plant is: ŷ = -0.1 + 1.1(6) + 2.8(0.3) = \$7.34 million.

16.11 p/c/m The Minitab printout is shown below.
Regression Analysis: Visitors versus AdSize, Discount

The regression equation is Visitors = 10.7 + 2.16 AdSize + 0.0416 Discount

Predictor          Coef      SE Coef             T         P
Constant         10.687        3.875          2.76     0.040
AdSize           2.1569       0.6281          3.43     0.019
Discount        0.04157      0.04380          0.95     0.386

S = 3.375         R-Sq = 71.6%        R-Sq(adj) = 60.3%

Analysis of Variance
Source            DF             SS            MS           F         P
Regression         2         143.92         71.96        6.32     0.043
Residual Error     5          56.95         11.39
Total              7         200.87

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                      95.0% PI
1         24.59       1.74   (   20.12,   29.06)         (     14.83,   34.35)

Values of Predictors for New Observations
New Obs    AdSize  Discount
1            5.00      75.0

a. The regression equation is Visitors = 10.687 + 2.1569*AdSize + 0.04157*Discount.
b. The y-intercept indicates that about 10 or 11 visitors (10.687) would come to the clubs if there were
neither ads nor discounts. The partial regression coefficient for the ad data indicates that, holding the
level of the discount constant, increasing the ad size by one column inch will bring in about 2 new
visitors (2.1569). Finally, the partial regression coefficient for the discount data indicates that, holding
the size of the ad constant, an additional \$1 discount will add 0.04157 to the number of visitors.
c. If the size of the ad is 5 column-inches and a \$75 discount is offered, the estimated number of new
visitors to the club is 24.59. See the "Fit" column in the printout.

The corresponding Excel multiple regression printout is shown below.
A                 B              C              D            E             F            G
14   SUMMARY OUTPUT                                      Visitors     Col-Inches Discount
15          Regression Statistics                              23           4            100
16   Multiple R                   0.8465                       30           7             20
17   R Square                     0.7165                       20           3             40
18   Adjusted R Square            0.6031                       26           6             25
19   Standard Error               3.3749                       20           2             50
20   Observations                      8                       18           5             30
21                                                             17           4             25
22   ANOVA                                                     31           8             80
23                              df             SS             MS           F       Significance F
24   Regression                        2        143.924        71.962        6.318           0.043
25   Residual                          5          56.951       11.390
26   Total                             7        200.875
27
28                         Coefficients Standard Error       t Stat      P-value     Lower 95%     Upper 95%
29   Intercept                   10.687          3.875           2.758       0.040           0.726      20.648
30   Col-Inches                   2.157          0.628           3.434       0.019           0.542       3.771
31   Discount                     0.042          0.044           0.949       0.386          -0.071       0.154
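
The coefficients on these printouts can be reproduced by ordinary least squares on the eight observations listed in the Excel sheet. A minimal NumPy sketch (the code and array names are mine, not part of the original solution):

```python
import numpy as np

# Data from the exercise 16.11 printout.
visitors = np.array([23, 30, 20, 26, 20, 18, 17, 31], dtype=float)
ad_size  = np.array([4, 7, 3, 6, 2, 5, 4, 8], dtype=float)       # column-inches
discount = np.array([100, 20, 40, 25, 50, 30, 25, 80], dtype=float)  # dollars

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(ad_size), ad_size, discount])

# Least-squares estimates b0, b1, b2.
b, *_ = np.linalg.lstsq(X, visitors, rcond=None)
print(np.round(b, 4))  # approximately 10.687, 2.1569, 0.0416, matching the printout
```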

16.12 p/c/m The Minitab printout is shown below.
Regression Analysis: Overall versus Ride, Handling, Comfort

The regression equation is Overall = 35.6 + 3.68 Ride + 2.89 Handling - 0.11 Comfort

Predictor              Coef           SE Coef             T            P
Constant              35.63             13.42          2.66        0.045
Ride                  3.675             1.639          2.24        0.075
Handling              2.892             1.055          2.74        0.041
Comfort              -0.110             1.625         -0.07        0.949

S = 2.858              R-Sq = 75.6%             R-Sq(adj) = 61.0%

Analysis of Variance
Source            DF                       SS            MS              F          P
Regression         3                  126.714        42.238           5.17      0.054
Residual Error     5                   40.842         8.168
Total              8                  167.556

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                                  95.0% PI
1        82.937      2.493   ( 76.529, 89.345)                        (   73.188, 92.686)

Values of Predictors for New Observations
New Obs      Ride Handling    Comfort
1            6.00      9.00      7.00

a. The regression equation is:
Overall = 35.63 + 3.675*Ride + 2.892*Handling - 0.110*Comfort
b. The y-intercept indicates that a car that scores 0 on all three of the independent variables will receive
an overall rating of 35.63. (This result should be considered cautiously since there were no 0 scores in
the data used to estimate the regression.) The partial regression coefficient for Ride indicates that,
holding the other two scores constant, an additional point in Ride will result in an overall rating that is
3.675 points higher. The partial regression coefficient for Handling indicates that, holding the other
two scores constant, an additional point in Handling will result in an overall rating that is 2.892 points
higher. The partial regression coefficient for Comfort indicates that, holding the other two scores
constant, an additional point in Comfort will result in an overall rating that is 0.110 points lower.
c. The estimated overall rating for a vehicle that scores 6 on Ride, 9 on Handling, and 7 on Comfort is
82.937. This can be calculated as 35.63 + 3.675(6) + 2.892(9) - 0.110(7). In the Minitab printout, refer
to the "Fit" column.

The corresponding Excel multiple regression printout is shown below.
A             B               C                D            E              F            G
13                                           Rating            Ride       Handling        Comfort
14                                            83                8            7              7
15   SUMMARY OUTPUT                           86                8            8              8
16          Regression Statistics             83                6            8              7
17   Multiple R                 0.86963       83                8            7              9
18   R Square                   0.75625       95                9            9              9
19   Adjusted R Square          0.61000       84                8            8              9
20   Standard Error             2.85803       88                9            6              9
21   Observations                     9       82                7            8              7
22                                            92                8            9              8
23   ANOVA
24                             df             SS               MS             F        Significance F
25   Regression                      3        126.7139         42.2380        5.1709            0.0543
26   Residual                        5         40.8416          8.1683
27   Total                           8        167.5556
28
29                        Coefficients Standard Error          t Stat     P-value       Lower 95%    Upper 95%
30   Intercept              35.62642         13.41832           2.65506     0.04515          1.13360   70.11924
31   Ride                     3.67543         1.63891           2.24260     0.07497         -0.53752    7.88838
32   Handling                 2.89205         1.05540           2.74024     0.04078          0.17907    5.60502
33   Comfort                 -0.11009         1.62469          -0.06776     0.94860         -4.28648    4.06631

16.13 p/c/m The Minitab printout is shown below.
Regression Analysis: Crispness versus OvenTime, Temp

The regression equation is Crispness = - 127 + 7.61 OvenTime + 0.357 Temp

Predictor                Coef        SE Coef                T              P
Constant              -127.19          61.33            -2.07          0.072
OvenTime                7.611          3.873             1.97          0.085
Temp                   0.3567         0.1177             3.03          0.016

S = 15.44              R-Sq = 58.6%           R-Sq(adj) = 48.2%

Analysis of Variance
Source            DF                     SS                MS              F          P
Regression         2                 2696.4            1348.2           5.65      0.029
Residual Error     8                 1907.3             238.4
Total             10                 4603.6

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                                   95.0% PI
1        -17.79      29.01   ( -84.69,   49.11)         ( -93.57,   57.99) XX
X denotes a row with X values away from the center
XX denotes a row with very extreme X values

Values of Predictors for New Observations
New Obs OvenTime       Temp
1            5.00       200

a. The regression equation is Crispness = -127.19 + 7.611*OvenTime + 0.3567*Temp.
b. The y-intercept indicates that a crust that is not cooked will receive a crispness rating of -127.19.
(Caution should be used in interpreting this value since there were no such extreme values in the data
used to estimate the regression.) The partial regression coefficient for OvenTime indicates that, for a
given temperature, an additional minute in the oven will add 7.611 points to the crispness rating.
Likewise, the partial regression coefficient for Temp indicates that, for a given cooking time,
a one-degree increase in the oven temperature will result in a 0.3567 increase in the crispness rating.
c. The estimated crispness rating for a pie that is cooked 5 minutes at 200 degrees is -17.79. See the "Fit"
column of the Minitab printout or substitute OvenTime = 5 and Temp = 200 into the regression
equation. This estimate should be viewed cautiously since the oven temperature is well beyond the
limits of the data used to estimate the regression.

The corresponding Excel multiple regression printout is shown below.
A              B               C              D            E               F           G
12                                                         Crispness      Time           Temp.
13                                                             68          6.0             460
14                                                             76          8.9             430
15   SUMMARY OUTPUT                                            49          8.8             360
16          Regression Statistics                              99          7.8             460
17   Multiple R                   0.7653                       90          7.3             390
18   R Square                     0.5857                       32          5.3             360
19   Adjusted R Square            0.4821                       96          8.8             420
20   Standard Error             15.4405                        77          9.0             350
21   Observations                     11                       94          8.0             450
22                                                             82          8.2             400
23   ANOVA                                                     97          6.4             450
24                              df             SS             MS            F        Significance F
25   Regression                       2        2696.3635    1348.1817       5.6549            0.0295
26   Residual                         8        1907.2729     238.4091
27   Total                           10        4603.6364
28
29                         Coefficients Standard Error       t Stat      P-value      Lower 95%     Upper 95%
30   Intercept               -127.1896         61.3267         -2.0740      0.0718       -268.6093     14.2300
31   Time                        7.6111         3.8732          1.9651      0.0850          -1.3205    16.5428
32   Temp.                       0.3567         0.1177          3.0315      0.0163           0.0854      0.6281

16.14 p/c/m The Minitab printout is shown below.
Regression Analysis: Budget versus Attend, Acres, Species

The regression equation is Budget = - 0.68 + 12.0 Attend + 0.0612 Acres - 0.0154 Species

Predictor                 Coef        SE Coef             T            P
Constant                -0.681          6.600         -0.10        0.921
Attend                  11.956          4.142          2.89        0.028
Acres                  0.06115        0.03343          1.83        0.117
Species               -0.01538        0.01562         -0.98        0.363

S = 4.914               R-Sq = 77.7%           R-Sq(adj) = 66.6%

Analysis of Variance
Source            DF                      SS             MS              F          P
Regression         3                  506.06         168.69           6.99      0.022
Residual Error     6                  144.88          24.15
Total              9                  650.94

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                                   95.0% PI
1         23.18       2.97   (   15.90,   30.45)                      (      9.12,   37.23)

Values of Predictors for New Observations
New Obs    Attend     Acres   Species
1            2.00       150       600

a. The regression equation is
Budget = -0.681 + 11.956*Attend + 0.06115*Acres - 0.01538*Species.
b. The y-intercept indicates that a city zoo that has 0 attendance, occupies 0 acres and features 0 species
will have an annual budget of -0.681 million dollars. Naturally, there is no such zoo, and this result
should be considered cautiously since there were no 0 scores in the data used to estimate the regression
equation. The partial regression coefficient for Attend indicates that, holding the other two
independent variables constant, an additional 1 million in attendance will raise the estimated budget by
\$11.956 million. The partial regression coefficient for Acres indicates that, holding the other two
independent variables constant, a 1-acre increase in space will increase the estimated budget by
\$0.06115 million. The partial regression coefficient for Species indicates that, holding the other two
independent variables constant, bringing 1 additional species of animal into the park will decrease the
estimated budget by \$0.01538 million.

c. The estimated annual budget for a zoo that has 2.0 million annual attendance, occupies 150 acres, and
has 600 animal species is \$23.18 million. See the "Fit" column in the Minitab printout or substitute
Attend = 2, Acres = 150, and Species = 600 into the regression equation.

The corresponding Excel multiple regression printout is shown below.
A              B             C            D           E                F           G
13                                         Budget        Attend      Acres          Species
14                                          14.5          0.6         210              271
15   SUMMARY OUTPUT                         35.0          2.0         216              400
16          Regression Statistics            6.9          0.4          70              377
17   Multiple R                   0.8817     9.0          1.0         125              277
18   R Square                     0.7774     6.6          1.5          55              721
19   Adjusted R Square            0.6662    17.2          1.3          80              400
20   Standard Error               4.9139    15.5          1.3          42              437
21   Observations                     10    21.0          2.5          91              759
22                                          12.0          0.9         125              270
23   ANOVA                                   9.6          1.1          92              260
24                              df          SS            MS           F         Significance F
25   Regression                       3      506.0640    168.6880      6.9861             0.0220
26   Residual                         6      144.8770     24.1462
27   Total                            9      650.9410
28
29                         Coefficients Standard Error   t Stat      P-value      Lower 95%     Upper 95%
30   Intercept                -0.68145         6.60013    -0.10325      0.9211        -16.8314     15.4685
31   Attend                  11.95568          4.14185     2.88655      0.0278           1.8209    22.0904
32   Acres                     0.06115         0.03343     1.82928      0.1171          -0.0206      0.1429
33   Species                  -0.01538         0.01562    -0.98430      0.3630          -0.0536      0.0229

16.15 p/a/m
a. The multiple regression equation is
TEST04 = 11.98 + 0.2745*TEST01 + 0.37619*TEST02 + 0.32648*TEST03
The y-intercept indicates that an individual unit scoring 0 on the first three tests can expect to score
11.98 on the fourth test. (However, this is meaningless, since the test scores range from 200 to 800.)
The partial regression coefficient for TEST01 indicates that, for a given set of scores on TEST02 and
TEST03, a unit will gain 0.2745 points on TEST04 for an additional point on TEST01.
Similarly, the partial regression coefficient for TEST02 implies, for a given set of scores on TEST01
and TEST03, a unit will gain 0.37619 points on TEST04 for an additional point on TEST02.
Likewise, the partial regression coefficient for TEST03 indicates an improvement of 0.32648 points
on TEST04 for each additional point on TEST03, given a set of scores for TEST01 and TEST02.
b. If an individual unit has scored 350, 400, and 600 on the first three tests, its estimated score on the
fourth test is: TEST04 = 11.98 + 0.2745(350) + 0.37619(400) + 0.32648(600) = 454.419.

16.16 p/a/m
a. First, we must determine the midpoint of the approximate 90% confidence interval:
ŷ = 11.98 + 0.2745(300) + 0.37619(500) + 0.32648(400) = 413.017
From the printout, we see that the multiple standard error of the estimate is 52.72, and we know that
n = 12. With d.f. = 12 - 3 - 1 = 8, the appropriate t-value is 1.860. The approximate 90% confidence
interval for the mean rating on test four for units that have been rated at 300, 500, and 400 on the first
three tests is:
ŷ ± t(se/√n) = 413.017 ± 1.860(52.72/√12) = 413.017 ± 28.307 = (384.710, 441.324)
b. The approximate 90% prediction interval is:
ŷ ± t·se = 413.017 ± 1.860(52.72) = 413.017 ± 98.059 = (314.958, 511.076)
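
The interval arithmetic in 16.16 is easy to verify directly from the summary numbers (midpoint 413.017, se = 52.72, n = 12, t = 1.860). A quick Python sketch, not part of the original solution:

```python
import math

# Summary values from exercise 16.16: midpoint, standard error, n, t for 8 d.f.
y_hat, se, n, t = 413.017, 52.72, 12, 1.860

# Approximate 90% confidence interval for the mean: y-hat +/- t * se / sqrt(n)
half_ci = t * se / math.sqrt(n)
ci = (y_hat - half_ci, y_hat + half_ci)
print(tuple(round(v, 3) for v in ci))  # approximately (384.710, 441.324)

# Approximate 90% prediction interval for an individual y: y-hat +/- t * se
half_pi = t * se
pi = (y_hat - half_pi, y_hat + half_pi)
print(tuple(round(v, 3) for v in pi))  # approximately (314.958, 511.076)
```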

16.17 c/a/m
a. The mean of y is ŷ = 5.0 + 1.0(25) + 2.5(40) = 130.0.
b. The multiple standard error of the estimate is se = √(173.5/(20 - 2 - 1)) = 3.195.
c. The approximate 95% confidence interval for the mean of y whenever x1 = 20 and x2 = 30 can be
found in several steps. First, we must find the midpoint of the approximate confidence interval.
This will be ŷ = 5.0 + 1.0(20) + 2.5(30) = 100.
The degrees of freedom are 20 - 2 - 1 = 17. The appropriate t-value is 2.110. The approximate
confidence interval for the mean of y is:
ŷ ± t(se/√n) = 100 ± 2.110(3.195/√20) = 100 ± 1.507 = (98.493, 101.507)
d. The approximate 95% prediction interval for an individual y value when x1 = 20 and x2 = 30 is:
ŷ ± t·se = 100 ± 2.110(3.195) = 100 ± 6.741 = (93.259, 106.741)

16.18 p/a/m The solution can be obtained with formulas and calculator, but we will use Minitab and the
printout below:
Regression Analysis: Rating versus Price, Perform, BattLife

The regression equation is Rating = 65.0 - 0.00606 Price + 0.160 Perform + 1.25 BattLife

Predictor         Coef       SE Coef            T         P
Constant         64.98         19.54         3.33     0.029
Price        -0.006056      0.003189        -1.90     0.130
Perform         0.1601        0.1711         0.94     0.402
BattLife         1.250         2.277         0.55     0.612

S = 2.629         R-Sq = 59.3%        R-Sq(adj) = 28.8%

Analysis of Variance
Source            DF             SS            MS           F        P
Regression         3         40.347        13.449        1.95    0.264
Residual Error     4         27.653         6.913
Total              7         68.000

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                     95.0% PI
1        78.058      1.795   ( 73.073, 83.042)           (   69.218, 86.897)

Values of Predictors for New Observations
New Obs     Price   Perform BattLife
1            1000       100      2.50

a. The regression equation is Rating = 65.0 - 0.00606*Price + 0.160*Perform + 1.25*BattLife.
For the population of computers that have a \$1000 street price, a performance score of 100, and a
2.50-hour battery life, we are 95% confident that the mean rating of such computers will be within the
interval from 73.073 to 83.042.
b. For an individual computer with a \$1000 street price, a performance score of 100, and a 2.50-hour
battery life, we are 95% confident that the rating for this particular computer will be within the interval
from 69.218 to 86.897.

16.19 p/c/m The solution can be obtained with formulas and calculator, but we will use Minitab and the
printout below:
Regression Analysis: CalcFin versus MathPro, SATQ

The regression equation is CalcFin = - 26.6 + 0.776 MathPro + 0.0820 SATQ

Predictor         Coef      SE Coef            T         P
Constant        -26.62        17.18        -1.55     0.172
MathPro         0.7763       0.1465         5.30     0.002
SATQ           0.08202      0.02699         3.04     0.023

S = 4.027        R-Sq = 88.5%        R-Sq(adj) = 84.7%

Analysis of Variance
Source            DF            SS            MS          F         P
Regression         2        751.57        375.78      23.17     0.002
Residual Error     6         97.32         16.22
Total              8        848.89

Predicted Values for New Observations
New Obs     Fit     SE Fit         90.0% CI                    90.0% PI
1         68.74       2.43   (   64.01,   73.46)       (     59.59,   77.88)

Values of Predictors for New Observations
New Obs   MathPro      SATQ
1            70.0       500

a. The regression equation is CalcFin = -26.6 + 0.776*MathPro + 0.0820*SATQ. For the population of
entering freshmen who scored 70 on the math proficiency test and 500 on the quantitative portion of
the SAT exam, we are 90% confident that their mean calculus final exam score will be within the
interval from 64.01 to 73.46.
b. For an individual entering freshman who scored 70 on the math proficiency test and 500 on the
quantitative portion of the SAT exam, we are 90% confident that his or her calculus final exam score
will be within the interval from 59.59 to 77.88.

16.20 p/a/m
a. Using only the regression equation and summary information obtained in exercise 16.11, we can
determine the approximate 95% confidence interval for the mean number of new visitors for clubs
using 5 column-inches ads and offering an \$80 discount. First, the midpoint of the interval will be
ŷ = 10.687 + 2.1569(5) + 0.04157(80) = 24.797.
Eight observations were used to estimate the regression, so d.f. = 8 - 2 - 1 = 5. The appropriate t-value
is 2.571, the multiple standard error of the estimate is 3.375, and the approximate 95% confidence
interval is:
ŷ ± t(se/√n) = 24.797 ± 2.571(3.375/√8) = 24.797 ± 3.068 = (21.729, 27.865)
b. The corresponding approximate 95% prediction interval is:
y  tse = 24.797  2.571(3.375) = 24.797  8.677 = (16.120, 33.474)
ˆ
The preceding are the approximate intervals that could be calculated based only on the information
shown in the printouts for exercise 16.11. As discussed in the text, the exact intervals will tend to be
wider than the approximate intervals. This is because the exact intervals take into account that the
specified values for x1 and x2 may differ from their respective means. The exact Minitab intervals
corresponding to parts a and b of this exercise are:
95% confidence interval, (19.91, 29.69); 95% prediction interval, (14.84, 34.76).

16.21 p/a/m
a. Using only the regression equation and summary information obtained in exercise 16.12, we can
determine the approximate 95% confidence interval for the mean overall rating of cars that receive
ratings of 8 on ride, 7 on handling, and 9 on driver comfort. First, the midpoint is:
ŷ = 35.63 + 3.675(8) + 2.892(7) - 0.110(9) = 84.284.
There were nine observations used to estimate the regression, so d.f. = 9 - 3 - 1 = 5. The appropriate
t-value is 2.571, the multiple standard error of the estimate is 2.858, and the approximate 95%
confidence interval is:
ŷ ± t(se/√n) = 84.284 ± 2.571(2.858/√9) = 84.284 ± 2.449 = (81.835, 86.733)
b. The corresponding approximate 95% prediction interval is:
y  tse = 84.284  2.571(2.858) = 84.284  7.348 = (76.936, 91.632)
ˆ
The preceding are the approximate intervals that could be calculated based only on the information
shown in the printouts for exercise 16.12. As discussed in the text, the exact intervals will tend to be
wider than the approximate intervals. This is because the exact intervals take into account that the
specified values for x1, x2, and x3 may differ from their respective means. The exact Minitab intervals
corresponding to parts a and b of this exercise are:
95% confidence interval, (79.587, 88.980); 95% prediction interval, (75.562, 93.005).

16.22 p/a/m
a. Using only the regression equation and summary information obtained in exercise 16.13, we can
determine the approximate 95% confidence interval for the mean crispness rating for pies that are
cooked 5.0 minutes at 300 degrees. First, the midpoint is:
ŷ = -127.19 + 7.611(5) + 0.3567(300) = 17.875
There were eleven observations used to estimate the regression, so d.f. = 11 - 2 - 1 = 8.
The appropriate t-value is 2.306, the multiple standard error of the estimate is 15.44, and the
approximate 95% confidence interval is:
ŷ ± t(se/√n) = 17.875 ± 2.306(15.44/√11) = 17.875 ± 10.735 = (7.140, 28.610)
b. The corresponding approximate 95% prediction interval is:
y  tse = 17.875  2.306(15.44) = 17.875  35.605 = (-17.730, 53.480)
ˆ
The preceding are the approximate intervals that could be calculated based only on the information
shown in the printouts for exercise 16.13. As discussed in the text, the exact intervals will tend to be
wider than the approximate intervals. This is because the exact intervals take into account that the
specified values for x1 and x2 may differ from their respective means. The exact Minitab intervals
corresponding to parts a and b of this exercise are:
95% confidence interval, (-25.31, 61.07); 95% prediction interval, (-38.10, 73.86).

16.23 d/p/e The coefficient of multiple determination (R2) is analogous to the coefficient of determination
in simple linear regression. It is the proportion of variation in y that is explained by the multiple
regression equation.

16.24 d/p/m SST is the total variation in the y values, SSR is the variation in the y values that is explained
by the regression, and SSE is the variation in the y values that is not explained by the regression.
The coefficient of multiple determination is equal to 1 - (SSE/SST), or SSR/SST.
If SSE is small compared to SST, SSR will be large compared to SST, and the multiple regression
equation will explain a large portion of the variation in y. Recall that SST = SSR + SSE.
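
The identity R² = SSR/SST = 1 - (SSE/SST) can be checked with the ANOVA sums of squares from the exercise 16.11 printout (SSR = 143.92, SSE = 56.95). A brief sketch, not part of the original solution:

```python
# Coefficient of multiple determination from the ANOVA sums of squares
# (values taken from the exercise 16.11 printout).
ssr, sse = 143.92, 56.95
sst = ssr + sse                # SST = SSR + SSE = 200.87

r_sq_from_ssr = ssr / sst
r_sq_from_sse = 1 - sse / sst
print(round(r_sq_from_ssr, 3))  # 0.716, matching R-Sq = 71.6% on the printout

# The two forms agree up to floating-point rounding.
assert abs(r_sq_from_ssr - r_sq_from_sse) < 1e-12
```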

16.25 d/p/e The coefficient of multiple determination for exercise 16.15 is 0.872. This means that 87.2%
of the variation in scores on the fourth test can be explained by variations in scores on the first three tests.

16.26 p/c/m The coefficient of multiple determination for the regression equation obtained in exercise
16.12 is 0.756. This indicates that 75.6% of the variation in overall ratings is explained by the regression
equation.

16.27 p/c/m The coefficient of multiple determination for the regression equation obtained in exercise
16.11 is 0.716. This indicates that 71.6% of the variation in the number of new visitors to the club is
explained by the regression equation.

16.28 d/p/d Both of these tests will reach the same conclusion. If the confidence interval for β3 does not
include zero, the hypothesis test will reject the null hypothesis. On the other hand, if the confidence
interval for β3 does contain zero, the hypothesis test will not reject the null hypothesis.
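The equivalence can be illustrated numerically with a short Python sketch (our addition; the coefficient and standard-error values appear in the printout quoted in the solution to exercise 16.29, with t = 2.571 for d.f. = 5):

```python
# Equivalence of the two decision rules for a coefficient beta_j:
# the CI  b ± t*s_b  contains zero  <=>  the two-tail t-test fails to reject.
def ci_contains_zero(b, s_b, t_crit):
    return (b - t_crit * s_b) <= 0 <= (b + t_crit * s_b)

def t_test_rejects(b, s_b, t_crit):
    return abs(b / s_b) > t_crit

# (Coef, SE Coef) pairs from the exercise 16.29 printout
for b, s_b in [(2.1569, 0.6281), (0.04157, 0.04380)]:
    # The two rules always agree (ignoring the exact-boundary case)
    assert ci_contains_zero(b, s_b, 2.571) != t_test_rejects(b, s_b, 2.571)
```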

16.29 p/c/m We will base much of our discussion on the Minitab printout for exercise 16.11. The results
will be similar if you refer to the Excel printout.
a. The appropriate null and alternative hypotheses are:
H0: β1 = β2 = 0 and H1: βj ≠ 0, for j = 1 or 2
From the ANOVA portion of the Minitab printout, we have:
Analysis of Variance
Source            DF              SS            MS          F          P
Regression         2          143.92         71.96       6.32      0.043
Residual Error     5           56.95         11.39
Total              7          200.87

The p-value for the ANOVA test of the overall significance of the regression equation is 0.043.
Since p-value = 0.043 is < α = 0.05 level of significance for the test, we reject H0. At this level, there
is evidence to suggest that the regression equation is significant.

b. From the upper portion of the Minitab printout:
The regression equation is Visitors = 10.7 + 2.16 AdSize + 0.0416 Discount

Predictor          Coef       SE Coef            T         P
Constant         10.687         3.875         2.76     0.040
AdSize           2.1569        0.6281         3.43     0.019
Discount        0.04157       0.04380         0.95     0.386

S = 3.375         R-Sq = 71.6%         R-Sq(adj) = 60.3%

Here we are asked to conduct two hypothesis tests. We will not test the y-intercept since this test is
generally not of practical importance. The appropriate null and alternative hypotheses are:
Test for β1: H0: β1 = 0 and H1: β1 ≠ 0
Test for β2: H0: β2 = 0 and H1: β2 ≠ 0
The p-value for the test of β1 is 0.019. Since p-value = 0.019 is < α = 0.05 level of significance for the
test, we reject H0. At this level, there is evidence to suggest that β1 is nonzero.
The p-value for the test of β2 is 0.386. Since p-value = 0.386 is not < α = 0.05 level of significance for
the test, we do not reject H0. At this level, there is no evidence to suggest that β2 is nonzero.
c. The ANOVA test for the overall regression indicates that the regression explains a significant
proportion of the variation in the number of new visitors to the club. The tests for the individual partial
regression coefficients indicate that the size of the ad contributes to the explanatory power of the
model, while the discount offered does not.

d. With d.f. = 8 - 2 - 1 = 5, the appropriate t-value for the 95% confidence interval will be 2.571.
The 95% confidence interval for population partial regression coefficient β1 is:
b1 ± t·sb1 = 2.1569 ± 2.571(0.6281) = 2.1569 ± 1.6148 = (0.5421, 3.7717)
The 95% confidence interval for population partial regression coefficient β2 is:
b2 ± t·sb2 = 0.04157 ± 2.571(0.04380) = 0.04157 ± 0.1126 = (-0.0710, 0.1542)
With Excel, we can obtain confidence intervals for the population regression coefficients along with the
standard regression output. Excel will provide 95% confidence intervals, but we can also specify the
inclusion of 90% or any other confidence levels we wish to see. The Excel printout for exercise 16.11
included 95% confidence intervals for β1 and β2.
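The interval formula b ± t·sb is easily mechanized. A short Python sketch (our addition, using the values from this part d):

```python
# Confidence interval for a partial regression coefficient: b ± t * s_b
def coef_ci(b, s_b, t_crit):
    half = t_crit * s_b
    return (b - half, b + half)

# t = 2.571 for d.f. = 8 - 2 - 1 = 5
ci1 = coef_ci(2.1569, 0.6281, 2.571)    # AdSize coefficient
ci2 = coef_ci(0.04157, 0.04380, 2.571)  # Discount coefficient
```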

16.30 p/c/m We will base much of our discussion on the Minitab printout for exercise 16.12. The results
will be similar if you refer to the Excel printout.
a. The appropriate null and alternative hypotheses are:
H0: β1 = β2 = β3 = 0 and H1: βj ≠ 0, for j = 1, 2, or 3
From the ANOVA portion of the Minitab printout, we have:

Analysis of Variance
Source            DF             SS           MS           F         P
Regression         3        126.714       42.238        5.17     0.054
Residual Error     5         40.842        8.168
Total              8        167.556

The p-value for the ANOVA test of the overall significance of the regression equation is 0.054.
Since p-value = 0.054 is not < α = 0.05 level of significance for the test, we do not reject H0. At this
level, there is no evidence to suggest that the regression equation is significant.

b. From the upper portion of the Minitab printout:

The regression equation is Overall = 35.6 + 3.68 Ride + 2.89 Handling - 0.11 Comfort

Predictor         Coef       SE Coef            T         P
Constant         35.63         13.42         2.66     0.045
Ride             3.675         1.639         2.24     0.075
Handling         2.892         1.055         2.74     0.041
Comfort         -0.110         1.625        -0.07     0.949

S = 2.858         R-Sq = 75.6%        R-Sq(adj) = 61.0%

Here we are asked to conduct three hypothesis tests. We will not test the y-intercept since this test is
generally not of practical importance. The appropriate null and alternative hypotheses are:
Test for β1: H0: β1 = 0 and H1: β1 ≠ 0
Test for β2: H0: β2 = 0 and H1: β2 ≠ 0
Test for β3: H0: β3 = 0 and H1: β3 ≠ 0
The p-value for the test of β1 is 0.075. Since p-value = 0.075 is not < α = 0.05 level of significance for
the test, we do not reject H0. At this level, there is no evidence to suggest that β1 is nonzero.
The p-value for the test of β2 is 0.041. Since p-value = 0.041 is < α = 0.05 level of significance for the
test, we reject H0. At this level, there is evidence to suggest that β2 is nonzero.
The p-value for the test of β3 is 0.949. Since p-value = 0.949 is not < α = 0.05 level of significance for
the test, we do not reject H0. At this level, there is no evidence to suggest that β3 is nonzero.
c. The ANOVA test for the overall regression indicates that the regression does not explain a significant
(at the 0.05 level) proportion of the variation in the overall ratings. In only one case, that for β2

(associated with handling), does an individual hypothesis test indicate that a population regression
coefficient could be nonzero.
d. With d.f. = 9 - 3 - 1 = 5, the appropriate t-value for the 95% confidence interval will be 2.571.
The 95% confidence interval for population partial regression coefficient β1 is:
b1 ± t·sb1 = 3.675 ± 2.571(1.639) = 3.675 ± 4.214 = (-0.54, 7.89)
The 95% confidence interval for population partial regression coefficient β2 is:
b2 ± t·sb2 = 2.892 ± 2.571(1.055) = 2.892 ± 2.712 = (0.18, 5.60)
The 95% confidence interval for population partial regression coefficient β3 is:
b3 ± t·sb3 = -0.110 ± 2.571(1.625) = -0.110 ± 4.178 = (-4.29, 4.07)
With Excel, we can obtain confidence intervals for the population regression coefficients along with the
standard regression output. Excel will provide 95% confidence intervals, but we can also specify the
inclusion of 90% or any other confidence levels we wish to see. The Excel printout for exercise 16.12
already included 95% confidence intervals for β1, β2, and β3. Here is a repeat of the lower portion of that
Excel printout:
A        B             C            D           E               F           G
23   ANOVA
24                        df            SS          MS           F         Significance F
25   Regression                 3       126.7139    42.2380      5.1709             0.0543
26   Residual                   5        40.8416     8.1683
27   Total                      8       167.5556
28
29                    Coefficients Standard Error   t Stat     P-value      Lower 95%    Upper 95%
30   Intercept          35.62642         13.41832    2.65506     0.04515         1.13360   70.11924
31   Ride                 3.67543         1.63891    2.24260     0.07497        -0.53752    7.88838
32   Handling             2.89205         1.05540    2.74024     0.04078         0.17907    5.60502
33   Comfort             -0.11009         1.62469   -0.06776     0.94860        -4.28648    4.06631
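Each T value in the printouts above is simply Coef divided by SE Coef. A short Python sketch (our addition, with the coefficient values copied from the exercise 16.30 printout) reproduces the individual-coefficient decisions at the 0.05 level, using the critical value t = 2.571 for d.f. = 5:

```python
# t statistic for each partial regression coefficient: t = Coef / SE Coef
rows = {
    "Ride":     (3.675, 1.639),
    "Handling": (2.892, 1.055),
    "Comfort":  (-0.110, 1.625),
}
t_crit = 2.571  # two-tail, alpha = 0.05, d.f. = 9 - 3 - 1 = 5

# A coefficient differs significantly from zero when |t| exceeds t_crit
significant = {name: abs(b / se) > t_crit for name, (b, se) in rows.items()}
```

Only Handling exceeds the critical value, in agreement with the p-value comparisons above.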

16.31 p/a/m To determine the 90% confidence interval for each partial regression coefficient in exercise
16.15, we must determine the appropriate t-value. There are (12 - 3 - 1) = 8 degrees of freedom, so the
appropriate t-value is t = 1.860.
The 90% confidence interval for population partial regression coefficient β1 is:
b1 ± t·sb1 = 0.2745 ± 1.860(0.1111) = 0.2745 ± 0.2066 = (0.0679, 0.4811)
This confidence interval does not contain zero, so it is likely that the variation in the scores on test 1
does contribute significantly to the explanation of the variation in the scores on test 4.
The 90% confidence interval for population partial regression coefficient β2 is:
b2 ± t·sb2 = 0.37619 ± 1.860(0.09858) = 0.37619 ± 0.18336 = (0.1928, 0.5596)
This confidence interval does not contain zero, so it is likely that the variation in the scores on test 2
does contribute significantly to the explanation of the variation in the scores on test 4.
The 90% confidence interval for population partial regression coefficient β3 is:
b3 ± t·sb3 = 0.32648 ± 1.860(0.08084) = 0.32648 ± 0.15036 = (0.1761, 0.4768)
This confidence interval does not contain zero, so it is likely that the variation in the scores on test 3
does contribute significantly to the explanation of the variation in the scores on test 4.

16.32 p/c/m Referring to the Minitab printout in the solution to exercise 16.18:
a. In the ANOVA test for overall significance, p-value = 0.264 is not < 0.10 level of significance, so we
conclude that the overall regression is not significant. At this level, all of the population partial
regression coefficients could be zero.
b. In testing the partial regression coefficients for price, performance, and battery life, the p-values are
0.130, 0.402, and 0.612, respectively. None of these is less than the 0.10 level of significance being
used to reach a conclusion. None of the three partial regression coefficients differs significantly from
zero.

16.33 p/c/m Referring to the Minitab printout in the solution to exercise 16.19:
a. In the ANOVA test for overall significance, p-value = 0.002 is < 0.05 level of significance, so we
conclude that the overall regression is significant.
b. In testing the partial regression coefficients for math proficiency test score and SAT quantitative score,
the p-values are 0.002 and 0.023, respectively. Each p-value is < 0.05 level of significance, and each of
the partial regression coefficients differs significantly from zero.

16.34 p/c/m The Minitab printout is shown below.
Regression Analysis: Est P/E Rati versus Revenue%Grow, Earn/Share %

The regression equation is
Est P/E Ratio = 51.7 - 0.103 Revenue%Growth + 0.0143 Earn/Share %Growth

96 cases used 4 cases contain missing values

Predictor         Coef       SE Coef            T         P
Constant         51.73         15.66         3.30     0.001
Revenue%       -0.1027        0.2171        -0.47     0.637
Earn/Sha       0.01431       0.07316         0.20     0.845

S = 64.30         R-Sq = 0.3%         R-Sq(adj) = 0.0%

Analysis of Variance
Source            DF             SS            MS           F        P
Regression         2            986           493        0.12    0.888
Residual Error    93         384502          4134
Total             95         385488

a. The regression equation is
Est P/E Ratio = 51.7 - 0.103*Revenue%Growth + 0.0143*Earn/Share %Growth.
The partial regression coefficient for revenue growth percentage is -0.103. On average, with
earnings/share growth percentage fixed, a one percentage point increase in revenue growth percentage
will be accompanied by a decrease of 0.103 in the estimated price/earnings ratio.
The partial regression coefficient for earnings/share growth percentage is 0.0143. On average, with
revenue growth percentage fixed, a one percentage point increase in earnings/share growth percentage
will be accompanied by an increase of 0.0143 in the estimated price/earnings ratio.
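The interpretation of the partial regression coefficients can be illustrated with a short Python sketch (our addition, using the rounded coefficients from the printout; the inputs are arbitrary illustrative values):

```python
# Fitted equation from exercise 16.34 (rounded coefficients from the printout)
def est_pe(revenue_growth, eps_growth):
    return 51.73 - 0.1027 * revenue_growth + 0.01431 * eps_growth

# Holding EPS growth fixed, a one-point rise in revenue growth lowers the
# estimated P/E ratio by the partial regression coefficient, 0.1027:
drop = est_pe(10, 20) - est_pe(11, 20)
```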
b. The p-value in the ANOVA section of the printout is 0.888. This is not less than the 0.05 level of
significance. At this level, the overall regression equation is not significant.
c. The p-values for the tests of the two partial regression coefficients are 0.637 and 0.845, respectively.
Neither p-value is less than the 0.05 level of significance, and we conclude that neither partial
regression coefficient differs significantly from zero.
d. The 95% confidence interval for each partial regression coefficient could be calculated using formulas
and pocket calculator, as was demonstrated in the solution to exercise 16.31. We will rely on the Excel
printout, shown below. The 95% confidence interval for population partial regression coefficient 1 is
from -0.5338 to 0.3285. The 95% confidence interval for population partial regression coefficient 2 is

from -0.1310 to 0.1596. (Note: In applying Excel, it is necessary to delete the four cases that have
missing data for one or more of these variables.)
SUMMARY OUTPUT

Regression Statistics
Multiple R                     0.0506
R Square                       0.0026
Standard Error               64.2995
Observations                       96

ANOVA
df             SS         MS            F      Significance F
Regression                         2        986.0007 493.0004        0.1192          0.8877
Residual                          93     384501.9576 4134.4297
Total                             95     385487.9583

Coefficients Standard Error    t Stat      P-value    Lower 95%    Upper 95%
Intercept                   51.7305         15.6650       3.3023     0.0014       20.6230     82.8380
Revenue%Growth               -0.1027         0.2171      -0.4729     0.6374        -0.5338     0.3285
Earn/Share %Growth            0.0143         0.0732       0.1955     0.8454        -0.1310     0.1596

16.35 p/c/m The Minitab printout is shown below.

The regression equation is
$GroupRevenue = -40855482 + 44282 RetailUnits + 152760 NumDealrs

Predictor             Coef        SE Coef            T        P
Constant         -40855482       20217627        -2.02    0.046
RetailUn             44282           1290        34.33    0.000
NumDealr            152760        1943687         0.08    0.938

S = 176197662         R-Sq = 99.3%          R-Sq(adj) = 99.3%

Analysis of Variance
Source            DF          SS          MS                    F          P
Regression         2 4.46877E+20 2.23439E+20              7197.11      0.000
Residual Error    95 2.94933E+18 3.10456E+16
Total             97 4.49827E+20

a. The regression equation is
$GroupRevenue = -40,855,482 + 44,282*RetailUnits + 152,760*NumDealrs
The partial regression coefficient for RetailUnits is 44,282. On average, with the number of dealers
fixed, an increase of 1 in retail units sold is accompanied by an increase of \$44,282 in revenue for the
dealer group.
The partial regression coefficient for NumDealrs is 152,760. On average, with the number of retail
units fixed, an increase of 1 in the number of dealers will be accompanied by an increase of \$152,760
in revenue for the dealer group.
b. The p-value in the ANOVA section of the printout is (to three decimal places) 0.000. This is less than
the 0.02 level of significance. At this level, the overall regression equation is significant.
c. The p-values for the tests of the two partial regression coefficients are 0.000 and 0.938, respectively.
Using the 0.02 level of significance, the partial regression coefficient for the first independent variable
(retail units) is significantly different from zero, but the partial regression coefficient for the second
independent variable (number of dealers) does not differ significantly from zero.
d. The 98% confidence interval for each partial regression coefficient could be calculated using formulas
and pocket calculator, as was demonstrated in the solution to exercise 16.31. We will rely on the Excel
printout, shown below. The 98% confidence interval for population partial regression coefficient 1 is

from 41,229.4 to 47,333.8. The 98% confidence interval for population partial regression coefficient 2
is from -4,446,472 to 4,751,992.
J                   K             L            M             N            O          P
1    SUMMARY OUTPUT
2
3           Regression Statistics
4    Multiple R                  0.9967
5    R Square                    0.9934
7    Standard Error         176197662
8    Observations                    98
9
10   ANOVA
11                             df           SS             MS           F      Significance F
12   Regression                      2     4.469E+20    2.234E+20    7.197E+03    1.960E-104
13   Residual                       95     2.949E+18    3.105E+16
14   Total                          97     4.498E+20
15
16                        Coefficients Standard Error    t Stat      P-value   Lower 98.0% Upper 98.0%
17   Intercept            -40855482.0     20217627.3        -2.021       0.046  -88695270.0  6984305.9
18   RetailUnits              44281.6        1289.89        34.330       0.000      41229.4    47333.8
19   NumDealrs               152760.2     1943686.79         0.079       0.938     -4446472    4751992

16.36 d/p/m The normal probability plot is used to examine whether the residuals could have come from a
normally distributed population. One of the assumptions underlying multiple regression analysis is that
the residuals are normally distributed with a mean of zero.

16.37 d/p/m Residual analysis can be used to examine the residuals with respect to the assumptions
underlying multiple regression analysis. We can do many things with residual analysis, including:
(1) constructing a histogram of the residuals as a rough check to see if they are approximately normally
distributed, (2) constructing a normal probability plot or other normality test to examine whether the
residuals could have come from a normally-distributed population, (3) plotting the residuals versus each
of the independent variables to see if they exhibit some cycle or pattern with respect to that variable, and
(4) plotting the residuals versus the order in which the observations were recorded to look for
autocorrelation.
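As a purely illustrative sketch (our addition, with hypothetical data and a simple one-variable fit for brevity; the same checks apply to multiple regression residuals), the numerical part of such a residual analysis might look like:

```python
import statistics

# Hypothetical illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares slope and intercept (closed form for one predictor)
xbar, ybar = statistics.mean(x), statistics.mean(y)
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Least-squares residuals sum to (essentially) zero
assert abs(sum(residuals)) < 1e-9
# Checks (1)-(4) above would then plot these residuals: a histogram, a normal
# probability plot, residuals vs. each x, and residuals vs. observation order.
```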

16.38 p/p/m Referring to the printout given in exercise 16.15, we can determine the following:
a. The partial regression coefficient for TEST01 is 0.2745. This implies that, holding the scores on tests 2
and 3 constant, a one point increase in the score on test 1 will result in a 0.2745 point increase in the
score on test 4. The partial regression coefficient for TEST02 is 0.37619. This implies that, for a given
level of scores on tests 1 and 3, a one point increase in the score on test 2 will result in a 0.37619 point
increase in test 4. Finally, the partial regression coefficient for TEST03 is 0.32648. This implies that,
holding scores on tests 1 and 2 constant, a one point increase in the score on test 3 will result in a
0.32648 point increase in the score on test 4.
b. 87.2% of the variation in y is explained by the equation.
c. The overall regression is significant at the 0.001 level.
d. The p-value for the partial regression coefficient for TEST01 is 0.039; for TEST02, 0.005; and, for
TEST03, 0.004. This would indicate that TEST03 contributes the most to the explanation of the
variation in scores on test 4; however, TEST02 is almost as useful. TEST01 appears to be least useful,
but it is still significant at the 0.05 level.

16.39 p/c/m
a. The histogram does not reveal any radical departures from a symmetric distribution. Of course, it is
difficult to determine this with only eight data points.
[Histogram of the Residuals (response is Visitors); frequency vs. residual]

b. In this Minitab test for normality, the points in the normal probability plot don't appear to deviate
excessively from a straight line and the approximate p-value is shown as >0.15. There is nothing here
to suggest that the residuals may not have come from a normally distributed population.
[Probability Plot of RESI1 (Normal): Mean = -5.77316E-15, StDev = 2.852, N = 8, KS = 0.184, P-Value > 0.150]

c. Plots of residuals versus the independent variables.
Plot of residuals versus ad size.
[Residual plot: residual vs. ad size (response is Visitors)]

Plot of residuals versus discount size.
[Residuals Versus Discount (response is Visitors)]

The plots above do not reveal any alarming problems. Overall, there is no strong evidence to indicate that
any underlying assumptions of the multiple regression model have been violated.

16.40 p/c/m
a. The histogram does not reveal any radical departures from a symmetric distribution. Of course, there
are only 9 points.
[Histogram of the Residuals (response is Overall); frequency vs. residual]

b. In this Minitab test for normality, the points in the normal probability plot seem to deviate somewhat
from the straight line, and the approximate p-value is shown as 0.056. This would seem to raise some
suspicions -- however, at the 0.05 level of significance, we would conclude that the residuals could
have come from a normally distributed population.
[Probability Plot of RESI1 (Normal): Mean = -2.68427E-14, StDev = 2.259, N = 9, KS = 0.271, P-Value = 0.056]

c. Plots of residuals versus the independent variables. Considering the small n, the plots below do not
reveal any alarming problems. Overall, there is no strong evidence to indicate that any underlying
assumptions of the multiple regression model have been violated.

Plot of residuals versus Ride.
[Residuals Versus Ride (response is Overall)]

Plot of residuals versus Handling.
[Residuals Versus Handling (response is Overall)]

Plot of residuals versus Comfort.

[Residuals Versus Comfort (response is Overall)]

16.41 p/c/m
a. This histogram does not appear to reveal any radical departures from a symmetric distribution,
although there are relatively few data points.
[Histogram of the Residuals (response is Crispness); frequency vs. residual]

b. In this Minitab test for normality, the points in the normal probability plot don't appear to deviate
excessively from a straight line and the approximate p-value is shown as >0.15. There is nothing here
to suggest that the residuals may not have come from a normally distributed population.
[Probability Plot of RESI1 (Normal): Mean = 2.583792E-15, StDev = 13.81, N = 11, KS = 0.130, P-Value > 0.150]

c. Plots of residuals versus the independent variables. The plots below do not reveal any alarming
problems. Overall, there is no strong evidence to indicate that any underlying assumptions of the
multiple regression model have been violated.

Plot of residuals versus time in oven.
[Residuals Versus OvenTime (response is Crispness)]

Plot of residuals versus oven temperature.
[Residuals Versus Temp (response is Crispness)]

16.42 p/c/m
a. This histogram does not appear to reveal any radical departures from a symmetric distribution,
although there are relatively few data points.
[Histogram of the Residuals (response is Rating); frequency vs. residual]

b. In this Minitab test for normality, the points in the normal probability plot don't appear to deviate
excessively from a straight line and the approximate p-value is shown as >0.15. There is nothing here
to suggest that the residuals may not have come from a normally distributed population.
[Probability Plot of RESI1 (Normal): Mean = -3.55271E-15, StDev = 1.988, N = 8, KS = 0.170, P-Value > 0.150]

c. Plots of residuals versus the independent variables. The plots below do not reveal any alarming
problems. Overall, there is no strong evidence to indicate that any underlying assumptions of the
multiple regression model have been violated.

Plot of residuals versus price.
[Residuals Versus Price (response is Rating)]

Plot of residuals versus performance.
[Residuals Versus Perform (response is Rating)]

Plot of residuals versus battery life.

[Residuals Versus BattLife (response is Rating)]

16.43 p/c/m
a. This histogram does not appear to reveal any radical departures from a symmetric distribution,
although there are relatively few data points.
[Histogram of the Residuals (response is CalcFin); frequency vs. residual]

b. In this Minitab test for normality, the points in the normal probability plot don't appear to deviate
excessively from a straight line and the approximate p-value is shown as >0.15. There is nothing here
to suggest that the residuals may not have come from a normally distributed population.
[Probability Plot of RESI1 (Normal): Mean = 0, StDev = 3.488, N = 9, KS = 0.197, P-Value > 0.150]

c. Plots of residuals versus the independent variables. The plots below do not reveal any alarming
problems. Overall, there is no strong evidence to indicate that any underlying assumptions of the
multiple regression model have been violated.

Plot of residuals versus math proficiency test.
[Residuals Versus MathPro (response is CalcFin)]

Plot of residuals versus SAT quantitative.
[Residuals Versus SATQ (response is CalcFin)]

16.44 p/c/m The Minitab printout is shown below.
Regression Analysis: Time versus Years, Score

The regression equation is Time = 104 - 0.288 Years - 0.679 Score

Predictor         Coef      SE Coef            T         P
Constant        103.85        16.69         6.22     0.000
Years          -0.2884       0.3216        -0.90     0.391
Score          -0.6792       0.2218        -3.06     0.012

S = 2.862        R-Sq = 53.9%        R-Sq(adj) = 44.7%

Analysis of Variance
Source            DF            SS            MS          F         P
Regression         2        95.757        47.879       5.84     0.021
Residual Error    10        81.935         8.194
Total             12       177.692

a. The multiple regression equation is Time = 103.85 - 0.2884*Years - 0.6792*Score.
The partial regression coefficient for years on the job indicates that, for a given score on the aptitude
test, the time it takes to perform the standard task decreases by 0.2884 seconds for each additional year
on the job. The partial regression coefficient for the test score indicates that, given a set number of
years on the job, a one point increase in the test score will result in a 0.6792 second decrease in the
amount of time required to perform the required task.
b. The appropriate number of degrees of freedom for this problem will be d.f. = 13 - 2 - 1, or 10, and the
appropriate t-value for a 95% confidence interval is t = 2.228.
The 95% confidence interval for population partial regression coefficient β1 is:
b1 ± t·sb1 = -0.2884 ± 2.228(0.3216) = -0.2884 ± 0.7165 = (-1.0049, 0.4281)
The 95% confidence interval for population partial regression coefficient β2 is:
b2 ± t·sb2 = -0.6792 ± 2.228(0.2218) = -0.6792 ± 0.4942 = (-1.1734, -0.1850)
c. The coefficient of multiple determination is 0.539. This indicates that 53.9% of the variation in the
time required to complete the task is explained by the regression equation. The partial regression
coefficient for Years is not significantly different from zero (p-value = 0.391). The partial regression
coefficient for Score is significantly different from zero (p-value = 0.012), and the overall regression
equation is significant at the 0.021 level.
d. The residual analyses follow. First, the histogram of the residuals is examined to see if it is symmetric
about zero. Next the normal probability plot is graphed and the p-value interpreted to examine whether
the residuals could have come from a normal population. Finally, the residuals are plotted against each
of the independent variables to check for cyclical or other patterns.

In the histogram, there does not appear to be any alarming deviation from a symmetric distribution.
[Histogram of the Residuals (response is Time); frequency vs. residual]

In this Minitab test for normality, the points in the normal probability plot don't appear to deviate
excessively from a straight line and the approximate p-value is shown as >0.15. There is nothing here to
suggest that the residuals may not have come from a normally distributed population.
[Probability Plot of RESI1 (Normal): Mean = -1.31177E-14, StDev = 2.613, N = 13, KS = 0.136, P-Value > 0.150]

The plots of residuals versus the independent variables do not present any alarming patterns. Overall, the
residual analysis does not suggest that any of the assumptions underlying multiple regression analysis
have been violated.
Plot of residuals versus years on job.

Residuals Versus Years (response is Time)

Plot of residuals versus test score.
Residuals Versus Score (response is Time)

The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
automatically provides 95% confidence intervals for the population regression coefficients. When generating
this printout, we can also specify a normal probability plot and plots of the residuals against the
independent variables. Their appearance would be essentially similar to that of the Minitab plots.
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.7341
R Square            0.5389
Standard Error      2.8624
Observations            13

ANOVA
                 df        SS        MS        F    Significance F
Regression        2    95.757    47.879    5.843             0.021
Residual         10    81.935     8.194
Total            12   177.692

              Coefficients  Standard Error    t Stat   P-value  Lower 95%  Upper 95%
Intercept         103.8529         16.6919    6.2217     0.000     66.661    141.045
Years              -0.2884          0.3216   -0.8966     0.391     -1.005      0.428
Score              -0.6792          0.2218   -3.0623     0.012     -1.173     -0.185

16.45 p/c/m The Minitab printout is shown below.
Regression Analysis: Distance versus Price, Sensitiv, Weight

The regression equation is
Distance = - 0.562 +0.000355 Price + 0.0112 Sensitiv - 0.0212 Weight

Predictor          Coef       SE Coef            T          P
Constant        -0.5617        0.8656        -0.65      0.545
Price         0.0003550     0.0005601         0.63      0.554
Sensitiv       0.011248      0.007605         1.48      0.199
Weight         -0.02116       0.02471        -0.86      0.431

S = 0.05167       R-Sq = 46.5%         R-Sq(adj) = 14.4%

Analysis of Variance
Source            DF              SS            MS           F          P
Regression         3        0.011590      0.003863        1.45      0.334
Residual Error     5        0.013349      0.002670
Total              8        0.024939

a. The estimated regression equation is:
Distance = -0.5617 + 0.0003550*Price + 0.011248*Sensitiv - 0.02116*Weight.
The partial regression coefficient for the price indicates that, holding the weight and sensitivity
constant, a \$1 increase in price will result in a 0.0003550 mile increase in the warning distance.
The partial regression coefficient for the sensitivity indicates that, holding the price and weight
constant, a one unit increase in sensitivity will result in a 0.011248 mile increase in the warning
distance. Finally, the partial regression coefficient for the weight indicates that, holding the price and
sensitivity constant, a one ounce increase in weight will result in a 0.02116 mile decrease in the
warning distance.
b. The appropriate degrees of freedom for this problem will be d.f. = 9 - 3 - 1 = 5. The t-value for a 95%
confidence interval with 5 degrees of freedom is t = 2.571.
The 95% confidence interval for population partial regression coefficient 1 is:
b1 ± t·sb1 = 0.000355 ± 2.571(0.0005601) = 0.000355 ± 0.001440 = (-0.0011, 0.0018)
The 95% confidence interval for population partial regression coefficient 2 is:
b2 ± t·sb2 = 0.011248 ± 2.571(0.007605) = 0.011248 ± 0.019552 = (-0.0083, 0.0308)
The 95% confidence interval for population partial regression coefficient 3 is:
b3 ± t·sb3 = -0.02116 ± 2.571(0.02471) = -0.02116 ± 0.06353 = (-0.0847, 0.0424)
c. The coefficient of multiple determination is 0.465. This indicates that 46.5% of the variation in the
warning distance is explained by the regression equation. However, none of the partial regression
coefficients is significant at the 0.10 level. (The coefficient for price is significant at the 0.554 level;
for sensitivity, at the 0.199 level; and, for weight, at the 0.431 level.) The overall regression is only
significant at the 0.334 level. The adjusted R-square is 0.144. Recall that this has been adjusted for the
degrees of freedom. Thus, there are no significant relationships in this regression. Apparently the
coefficient of multiple determination is as large as it is because of the limited size of the data set.
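The adjusted R-square cited above can be verified from R-square, n, and k:

```python
# Adjusted R-square penalizes R-square for the number of predictors:
# R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)
r2, n, k = 0.465, 9, 3   # values from the Minitab printout above
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2_adj, 3))  # 0.144
```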
d. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
about zero. Next the normal probability plot is graphed to examine whether the residuals could have
come from a normally distributed population. Finally, the residuals are plotted against each of the
independent variables to check for cyclical patterns.

In the following histogram of residuals, there seems to be a slight deviation from a symmetric
distribution, but the number of data values is relatively small.
Histogram of the Residuals (response is Distance)

In this Minitab test for normality, the points in the normal probability plot appear to deviate excessively
from a straight line and the approximate p-value is shown as 0.048. At the 0.05 level of significance,
we would conclude that the residuals did not come from a normally distributed population.
Probability Plot of RESI1 (Normal): Mean = 9.868649E-17, StDev = 0.04085, N = 9, KS = 0.276, P-Value = 0.048

The plots of residuals versus the independent variables are shown below. Given the relatively small
number of data points, none of the three plots shows any alarming patterns. In the third plot, most of the
unusual pattern is due to the underlying data, with most of the weights clustered about the six ounce level,
while one of the detectors weighs only 3.8 ounces.
Residuals Versus Price (response is Distance)

Residuals Versus Sensitiv (response is Distance)

Residuals Versus Weight (response is Distance)

Overall, the residual analysis suggests that the residuals may not have come from a normally distributed
population. If this is true, then one of the underlying assumptions has been violated and the multiple
regression analysis may not be valid.

The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
already provides 95% confidence intervals for the population regression coefficients.
Distance   Price   Sensitivity   Weight
 0.675      289        108         3.8
 0.660      295        110         6.1
 0.640      240        108         5.8
 0.560      249        103         6.6
 0.540      260        107         6.0
 0.640      200        108         5.8
 0.540      199        109         5.9
 0.645      220        108         5.8
 0.670      250        112         6.2

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.6817
R Square             0.4647
Adjusted R Square    0.1436
Standard Error       0.0517
Observations              9

ANOVA
                 df        SS        MS        F    Significance F
Regression        3    0.0116    0.0039    1.4471            0.3342
Residual          5    0.0133    0.0027
Total             8    0.0249

              Coefficients  Standard Error   t Stat   P-value  Lower 95%  Upper 95%
Intercept         -0.56174          0.8656  -0.6490    0.5450    -2.7868     1.6633
Price              0.00035          0.0006   0.6338    0.5541    -0.0011     0.0018
Sensitivity        0.01125          0.0076   1.4790    0.1992    -0.0083     0.0308
Weight            -0.02116          0.0247  -0.8564    0.4309    -0.0847     0.0424

When generating the Excel printout, we can also specify a normal probability plot and plots of the
residuals against the independent variables. Their appearance is essentially similar to those of Minitab.
Normal Probability Plot
Price Residual Plot
Sensitivity Residual Plot
Weight Residual Plot

16.46 d/p/e A dummy variable is a variable that takes on a value of one or zero to indicate the presence or
absence of an attribute. Dummy variables can help explain some of the variation in y due to the presence
or absence of a characteristic. Three dummy variables that can be used to describe one town versus
another are URBAN (1 if urban, 0 otherwise), MANUF (1 if durable goods manufacturing is the major
industry, 0 otherwise), and POPMIL (1 if the population is 1 million or more, 0 otherwise). Other dummy
variables could include the presence of a major university, a major medical center, a major research
institution, and many more.
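A minimal sketch of how such dummy variables might be coded; the town records here are hypothetical, not data from the exercise:

```python
# Hypothetical town records; URBAN, MANUF, and POPMIL become 0/1 dummy variables.
towns = [
    {"name": "A", "urban": True,  "major_industry": "durable goods", "population": 2_400_000},
    {"name": "B", "urban": False, "major_industry": "agriculture",   "population": 180_000},
]

rows = []
for t in towns:
    rows.append({
        "name": t["name"],
        "URBAN": 1 if t["urban"] else 0,
        "MANUF": 1 if t["major_industry"] == "durable goods" else 0,
        "POPMIL": 1 if t["population"] >= 1_000_000 else 0,
    })
print(rows)
```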

16.47 p/p/e The partial regression coefficient for x1 implies that, holding the day of the week constant,
a one degree Fahrenheit increase in the temperature will result in an increase of 8 in attendance.
The partial regression coefficient for x2 implies that the attendance increases by 150 people on Saturdays
and Sundays (assuming a constant temperature).
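Using the intercept of 100 implied by exercise 16.48 (an estimate of 100 swimmers at zero degrees on a weekday), the interpretation can be checked numerically:

```python
# Fitted model for this exercise: attendance = 100 + 8*temperature + 150*weekend,
# where weekend is a 0/1 dummy (intercept of 100 inferred from exercise 16.48).
def attendance(temp_f, weekend):
    return 100 + 8 * temp_f + 150 * weekend

# Same 80-degree day: a weekend day draws 150 more people than a weekday.
print(attendance(80, 1) - attendance(80, 0))  # 150
# One extra degree, day of week held constant: 8 more people.
print(attendance(81, 0) - attendance(80, 0))  # 8
```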

16.48 p/p/m The estimate of 100 persons swimming on a zero-degree weekday is made well beyond the
limits of the underlying temperature data. It is always dangerous to extrapolate beyond the bounds of the
data used to estimate an equation.

16.49 d/p/m Multicollinearity is a situation in which two or more of the independent variables in a
multiple regression are highly correlated with each other. When this happens, the two correlated x
variables are really not saying different things about y. The standard errors for the partial regression
coefficients become very large and the coefficients are statistically unreliable and difficult to interpret.
Multicollinearity is a problem when we are trying to interpret the partial regression coefficients.
There are several clues to the presence of multicollinearity: (1) an independent variable known to be an
important predictor ends up having a partial regression coefficient that is not significant;
(2) a partial regression coefficient exhibits the wrong sign; and/or, (3) when an independent variable is
added or deleted, the partial regression coefficients for the other variables change dramatically.
A more practical way to identify multicollinearity is through the examination of a correlation matrix,
which is a matrix that shows the correlation of each variable with each of the other variables.
A high correlation between two independent variables is an indication of multicollinearity.
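A small sketch of using a correlation matrix to spot multicollinearity; the data here are simulated purely for illustration:

```python
import numpy as np

# Illustrative (made-up) data: x2 is nearly a linear function of x1,
# so the two predictors are highly correlated -- a multicollinearity warning sign.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = 2 * x1 + rng.normal(scale=0.1, size=50)   # almost collinear with x1
x3 = rng.normal(size=50)                       # unrelated predictor

corr = np.corrcoef(np.vstack([x1, x2, x3]))
print(np.round(corr, 2))   # the (x1, x2) off-diagonal entry will be close to 1
```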

16.50 p/c/m The Minitab printout is shown below.
Regression Analysis: Pounds versus Months, Session, Gender

The regression equation is Pounds = 2.24 + 3.36 Months + 1.54 Session + 3.02 Gender

Predictor          Coef      SE Coef            T         P
Constant          2.243        8.876         0.25     0.807
Months            3.356        1.271         2.64     0.030
Session           1.538        6.791         0.23     0.826
Gender            3.018        6.671         0.45     0.663

S = 11.39         R-Sq = 48.5%        R-Sq(adj) = 29.1%

Analysis of Variance
Source            DF             SS            MS           F         P
Regression         3          975.0         325.0        2.51     0.133
Residual Error     8         1037.3         129.7
Total             11         2012.2

The partial regression coefficient for Months implies that, holding session and gender constant, an
additional month at the weight-loss clinic results in an additional weight loss of 3.356 pounds. The partial
regression coefficient for Session implies that persons attending the day sessions, holding months and
gender constant, lose 1.538 more pounds than those attending the night sessions. The partial regression
coefficient for Gender implies that, holding months and session constant, men lose 3.018 more pounds
than women. Of course, the partial regression coefficients for Session and Gender have p-values of 0.826
and 0.663, respectively. This indicates that the true coefficients are likely not different from zero.
Therefore, Months (p-value of 0.030) contributes the most to the explanatory power of this regression
equation.

The data and Excel multiple regression solution for this exercise are shown below.
Pounds Lost   Months   Session   Gender
     31          5        1         1
     49          8        1         1
     12          3        1         0
     26          9        0         0
     34          8        0         1
     11          2        0         0
      4          1        0         1
     27          8        0         1
     12          6        1         1
     28          9        1         0
     41          6        0         0
     16          6        0         0

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.6961
R Square             0.4845
Adjusted R Square    0.2912
Standard Error      11.3867
Observations             12

ANOVA
                 df          SS         MS        F    Significance F
Regression        3    974.9973   324.9991   2.5066            0.1329
Residual          8   1037.2527   129.6566
Total            11   2012.2500

              Coefficients  Standard Error   t Stat   P-value  Lower 95%  Upper 95%
Intercept           2.2435          8.8760   0.2528    0.8068   -18.2246    22.7115
Months              3.3561          1.2714   2.6396    0.0297     0.4241     6.2880
Session             1.5383          6.7911   0.2265    0.8265   -14.1220    17.1987
Gender              3.0176          6.6710   0.4523    0.6630   -12.3658    18.4010
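As a check, refitting the model from the data above by ordinary least squares should reproduce the printed coefficients (a sketch using NumPy's lstsq, not part of the original solution):

```python
import numpy as np

# Weight-loss data from the table above: Pounds Lost, Months, Session (1 = day),
# Gender (1 = male). Least squares should reproduce the printout's coefficients.
data = np.array([
    [31, 5, 1, 1], [49, 8, 1, 1], [12, 3, 1, 0], [26, 9, 0, 0],
    [34, 8, 0, 1], [11, 2, 0, 0], [ 4, 1, 0, 1], [27, 8, 0, 1],
    [12, 6, 1, 1], [28, 9, 1, 0], [41, 6, 0, 0], [16, 6, 0, 0],
], dtype=float)
y = data[:, 0]
X = np.column_stack([np.ones(len(data)), data[:, 1:]])  # intercept column first
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 4))   # intercept, Months, Session, Gender
```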

16.51 p/c/m The Minitab printout is shown below.
Regression Analysis: Speed versus Occupnts, SeatBelt

The regression equation is Speed = 67.6 - 3.21 Occupnts - 6.63 SeatBelt

Predictor               Coef          SE Coef            T         P
Constant              67.629            5.017        13.48     0.000
Occupnts              -3.214            2.191        -1.47     0.170
SeatBelt              -6.629            3.200        -2.07     0.063

S = 5.465             R-Sq = 31.5%             R-Sq(adj) = 19.1%

Analysis of Variance
Source            DF                      SS            MS          F          P
Regression         2                  151.20         75.60       2.53      0.125
Residual Error    11                  328.51         29.86
Total             13                  479.71

The partial regression coefficient for Occupnts implies that, holding seat belt usage constant, the speed
decreases by 3.214 miles per hour for each additional occupant in the car. The partial regression
coefficient for SeatBelt implies that, for a given number of occupants, drivers who wear seat belts travel
6.629 miles per hour slower than those who do not. The p-value for Occupnts is 0.170; this implies that
the partial regression coefficient for this variable is not significantly different from zero. The p-value for
SeatBelt is 0.063; this implies that the partial regression coefficient for this variable is significantly
different from zero at the 0.063 level. It appears that seat belt usage provides a much stronger explanation
for the variation in speeds driven by various drivers than does the number of occupants in the car.
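The p-values in the printout can be reproduced from the t statistics; for example, for SeatBelt (a sketch, assuming SciPy is available):

```python
from scipy import stats

# Two-tailed p-value for the SeatBelt coefficient: t = coef / SE,
# with d.f. = n - k - 1 = 14 - 2 - 1 = 11 (values from the printout above).
t_stat = -6.6286 / 3.1999            # ≈ -2.07
p_value = 2 * stats.t.sf(abs(t_stat), df=11)
print(round(p_value, 3))  # ≈ 0.063
```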

The Excel multiple regression solution for this exercise is shown below.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.5614
R Square             0.3152
Standard Error       5.4649
Observations             14

ANOVA
                 df         SS        MS        F    Significance F
Regression        2   151.2000   75.6000   2.5314            0.1246
Residual         11   328.5143   29.8649
Total            13   479.7143

              Coefficients  Standard Error    t Stat   P-value  Lower 95%  Upper 95%
Intercept          67.6286          5.0172   13.4795    0.0000    56.5859    78.6713
Occupnts           -3.2143          2.1908   -1.4672    0.1703    -8.0363     1.6077
SeatBelt           -6.6286          3.1999   -2.0715    0.0626   -13.6715     0.4144

CHAPTER EXERCISES
16.52 p/c/m The Minitab printout is shown below.
Regression Analysis: Tip versus Check, Diners

The regression equation is Tip = - 1.92 + 0.223 Check - 0.184 Diners

Predictor             Coef        SE Coef               T       P
Constant            -1.915          1.598           -1.20   0.284
Check              0.22275        0.04608            4.83   0.005
Diners             -0.1845         0.4133           -0.45   0.674

S = 1.524            R-Sq = 83.4%           R-Sq(adj) = 76.8%

Analysis of Variance
Source            DF                  SS            MS           F           P
Regression         2              58.389        29.194       12.57       0.011
Residual Error     5              11.611         2.322
Total              7              70.000

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                           95.0% PI
1         6.441      0.800   (   4.385,   8.498)              (     2.017, 10.866)

Values of Predictors for New Observations
New Obs     Check    Diners
1            40.0      3.00

a. The regression equation is: Tip = -1.915 + 0.22275*Check - 0.1845*Diners
The partial regression coefficient for the check indicates that, holding the number of diners constant, a $1
increase in the check will result in a $0.22275 increase in the tip. The partial regression coefficient for
the number of diners indicates that, holding the size of the check constant, an additional diner will
result in a tip that is $0.1845 smaller.
b. The estimated tip amount for three diners who have a $40 check is $6.441.
c. The 95% prediction interval for the tip left by a dining party like the one in part b is $2.017 to $10.866.
d. The 95% confidence interval for the mean tip left by all dining parties like the one in part b is $4.385
to $8.498.
e. The appropriate value for d.f. will be d.f. = 8 - 2 - 1 = 5. The t-value for a 95% confidence interval
with 5 degrees of freedom is t = 2.571.
The 95% confidence interval for population partial regression coefficient 1 is:
b1 ± t·sb1 = 0.22275 ± 2.571(0.04608) = 0.22275 ± 0.11847 = (0.1043, 0.3412)
The 95% confidence interval for population partial regression coefficient 2 is:
b2 ± t·sb2 = -0.1845 ± 2.571(0.4133) = -0.1845 ± 1.0626 = (-1.2471, 0.8781)
f. The significance tests for the partial regression coefficients show that the partial regression coefficient
for the size of the check is significant at the 0.005 level, while the partial regression coefficient for the
number of diners is significant at the 0.674 level. Thus, the size of the check is much more useful in
predicting the size of the tip than the number of diners. The overall regression is significant at the
0.011 level. The coefficient of multiple determination indicates that 83.4% of the variation in the size
of the tip is explained by the regression.
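The coefficient of multiple determination can be verified directly from the ANOVA table's sums of squares:

```python
# Coefficient of multiple determination from the ANOVA table above:
# R^2 = SS(regression) / SS(total).
ss_regression, ss_total = 58.389, 70.000
r_squared = ss_regression / ss_total
print(round(r_squared, 3))  # 0.834
```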
g. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
about zero. Next the normal probability plot is graphed to examine whether the residuals could have
come from a normally distributed population. Finally, the residuals are plotted against each of the
independent variables to check for cyclical patterns.

In the following histogram of the residuals, there are a lot of values in the category with 1.0 as the
midpoint. This is some cause for concern, even though there are relatively few observations in the data
set.
Histogram of the Residuals (response is Tip)

In this Minitab test for normality, the points in the normal probability plot seem to deviate excessively
from a straight line and the approximate p-value is shown as 0.040. At the 0.05 level of significance,
we would conclude that the residuals did not come from a normally distributed population.

Probability Plot of RESI1 (Normal): Mean = 2.498002E-16, StDev = 1.288, N = 8, KS = 0.300, P-Value = 0.040

The plots for residuals versus the independent variables are shown below. No alarming patterns seem to
be present in the two charts that follow. However, overall, the residual analysis provides some evidence to
suggest that the residuals may have come from a non-normally distributed population.

Residuals versus size of check.
Residuals Versus Check (response is Tip)

Residuals versus number of diners.
Residuals Versus Diners (response is Tip)

The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
already provides 95% confidence intervals for the population regression coefficients.
Tip   Check   Diners
7.5     40      2
0.5     15      1
2.0     30      3
3.5     25      4
9.5     50      4
2.5     20      5
3.5     35      5
1.0     10      2

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9133
R Square             0.8341
Adjusted R Square    0.7678
Standard Error       1.5239
Observations              8

ANOVA
                 df        SS        MS         F    Significance F
Regression        2   58.3886   29.1943   12.5714            0.0112
Residual          5   11.6114    2.3223
Total             7   70.0000

              Coefficients  Standard Error   t Stat   P-value  Lower 95%  Upper 95%
Intercept          -1.9154          1.5980  -1.1986    0.2844    -6.0231     2.1923
Check               0.2228          0.0461   4.8337    0.0047     0.1043     0.3412
Diners             -0.1845          0.4133  -0.4463    0.6741    -1.2469     0.8780

Excel has generated the optional normal probability plot and plots of the residuals against the independent
variables. Their appearance is essentially similar to those of Minitab.
Normal Probability Plot

Check Residual Plot
Diners Residual Plot

16.53 p/c/m The Minitab printout is shown below.
Regression Analysis: AllFruit versus Apples, Grapes

The regression equation is AllFruit = 99.9 + 1.24 Apples + 0.822 Grapes

Predictor                  Coef   SE Coef           T        P
Constant                 99.865     2.952       33.83    0.000
Apples                  1.23640   0.09971       12.40    0.001
Grapes                   0.8221    0.2307        3.56    0.038

S = 0.269451               R-Sq = 98.1%         R-Sq(adj) = 96.9%

Analysis of Variance
Source          DF       SS                         MS        F        P
Regression       2 11.3555                      5.6778    78.20    0.003
Residual Error   3   0.2178                     0.0726
Total            5 11.5733

Predicted Values for New Observations
New
Obs      Fit SE Fit         95% CI              95% PI
1 125.816    0.433 (124.439, 127.193) (124.194, 127.438)XX
XX denotes a point that is an extreme outlier in the predictors.

Values of Predictors for New Observations
New
Obs Apples Grapes
1    17.0    6.00

a. The regression equation is AllFruit = 99.865 + 1.2364*Apples + 0.8221*Grapes.
The partial regression coefficient for apples implies that, holding the consumption of grapes constant,
a one pound increase in the consumption of apples will result in a 1.2364 pound increase in the
consumption of all fresh fruits. The partial regression coefficient for grapes implies that, holding apple
consumption constant, a one pound increase in the consumption of grapes will result in a 0.8221
pound increase in the consumption of all fresh fruits.
b. The estimated per capita consumption of all fresh fruits during a year when 17 pounds of apples and 6
pounds of grapes are consumed is 125.816 pounds.
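The point estimate in part b is simply the fitted equation evaluated at Apples = 17 and Grapes = 6:

```python
# Point estimate from the fitted equation:
# AllFruit = 99.865 + 1.2364*Apples + 0.8221*Grapes
apples, grapes = 17, 6
all_fruit = 99.865 + 1.2364 * apples + 0.8221 * grapes
print(round(all_fruit, 3))  # 125.816
```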
c. The 95% prediction interval for per capita consumption during a year like the one in part b is
124.194 to 127.438 pounds.
d. The 95% confidence interval for mean per capita consumption during all years like the one in part b is
124.439 to 127.193 pounds.
e. For this problem, the appropriate d.f. = 6 – 2 – 1 = 3. The t-value for a 95% confidence interval with 3
degrees of freedom is 3.182.
The 95% confidence interval for population partial regression coefficient β1 is:
b1 ± t·sb1 = 1.2364 ± 3.182(0.09971) = 1.2364 ± 0.3173 = (0.92, 1.55)
The 95% confidence interval for population partial regression coefficient β2 is:
b2 ± t·sb2 = 0.8221 ± 3.182(0.2307) = 0.8221 ± 0.7341 = (0.09, 1.56)
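The interval arithmetic above can be reproduced with a small helper; a sketch in Python, using the t-value of 3.182 quoted in part e (the helper name coef_ci is ours):

```python
# 95% confidence interval for a partial regression coefficient: b ± t * s_b,
# with t = 3.182 for d.f. = 6 - 2 - 1 = 3.
def coef_ci(b, s_b, t=3.182):
    margin = t * s_b
    return (round(b - margin, 2), round(b + margin, 2))

print(coef_ci(1.2364, 0.09971))  # Apples: (0.92, 1.55)
print(coef_ci(0.8221, 0.2307))   # Grapes: (0.09, 1.56)
```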
f. Both partial regression coefficients are significant (apples p-value, 0.001; grapes p-value, 0.038).
Also, the overall regression is highly significant (p-value = 0.003). The coefficient of multiple
determination is 0.981. This regression appears to do a very good job of explaining the variation in
fresh fruit consumption.
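The R-Sq and R-Sq(adj) values in the printout can be recovered from the ANOVA table; a short Python sketch of the arithmetic:

```python
# Coefficient of determination and its adjusted version, from the ANOVA table:
# R-Sq = 1 - SSE/SST and R-Sq(adj) = 1 - (SSE/SST) * (n - 1)/(n - k - 1).
sse, sst = 0.2178, 11.5733   # Residual Error SS and Total SS
n, k = 6, 2                  # observations and independent variables

r_sq = 1 - sse / sst
r_sq_adj = 1 - (sse / sst) * (n - 1) / (n - k - 1)
print(round(r_sq, 3), round(r_sq_adj, 3))  # 0.981 0.969
```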
g. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
about zero. Next the normal probability plot is graphed to examine whether the residuals could have
come from a normally distributed population. Finally, the residuals are plotted against each of the
independent variables to check for cyclical patterns.

The histogram of the residuals is shown below. This histogram offers no reason to believe that the
residuals may not have come from a normally distributed population.
[Figure: Histogram of the Residuals (response is AllFruit); Frequency vs. Residual]

In this Minitab test for normality, the points in the normal probability plot do not deviate excessively
from a straight line and the approximate p-value is shown as > 0.150. Our conclusion is that the residuals
could have come from a normally distributed population.

[Figure: Normal Probability Plot of RESI1 — Mean ≈ 0, StDev = 0.2087, N = 6, KS = 0.125, P-Value > 0.150]

The plots for residuals versus the independent variables are shown below. For this small data set, no
alarming patterns seem to be present. Overall, the residual analysis provides no evidence to suggest that
the assumptions for the multiple regression model have not been satisfied.

Residuals versus per-capita apple consumption.
[Figure: Residuals Versus Apples (response is AllFruit)]

Residuals versus per-capita grape consumption.

[Figure: Residuals Versus Grapes (response is AllFruit)]

The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
already provides 95% confidence intervals for the population regression coefficients.
E                            F                    G                H            I             J           K
1          SUMMARY OUTPUT
2
3                 Regression Statistics
4          Multiple R                   0.9905
5          R Square                     0.9812
7          Standard Error               0.2695
8          Observations                      6
9
10         ANOVA
11                                           df                   SS               MS           F       Significance F
12         Regression                                   2          11.3555         5.6778       78.2017          0.0026
13         Residual                                     3           0.2178         0.0726
14         Total                                        5          11.5733
15
16                                     Coefficients Standard Error                 t Stat     P-value     Lower 95%    Upper 95%
17         Intercept                        99.8648         2.9516                 33.8343       0.0001       90.4715    109.2581
18         Apples                            1.2364         0.0997                 12.4001       0.0011         0.9191     1.5537
19         Grapes                            0.8221         0.2307                   3.5632      0.0377         0.0878     1.5563

Excel has generated the optional normal probability plot and plots of the residuals against the independent
variables. Their appearance is essentially similar to those of Minitab.

[Figures: Excel Normal Probability Plot, Apples Residual Plot, Grapes Residual Plot]

16.54 p/c/m The Minitab printout is shown below.
Regression Analysis: Salary versus GPA, Activities

The regression equation is Salary = 24.3 + 3.84 GPA + 1.68 Activities

Predictor                       Coef            SE Coef               T        P
Constant                      24.309              3.192            7.62    0.000
GPA                            3.842              1.234            3.11    0.017
Activiti                      1.6810             0.5291            3.18    0.016

S = 1.448                     R-Sq = 82.4%               R-Sq(adj) = 77.4%

Analysis of Variance
Source            DF                                SS                MS            F                       P
Regression         2                            68.924            34.462        16.44                   0.002
Residual Error     7                            14.676             2.097
Total              9                            83.600

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                                                95.0% PI
1        43.182      1.131   ( 40.506, 45.858)                                  (       38.834, 47.530)

Values of Predictors for New Observations
New Obs       GPA Activiti
1            3.60      3.00

a. The regression equation is: Salary = 24.309 + 3.842*GPA + 1.6810*Activities.

The partial regression coefficient for the GPA indicates that, holding the number of activities constant,
a one point increase in GPA will result in a starting salary that is $3842 higher. The partial regression
coefficient for the number of activities indicates that, holding the GPA constant, an additional activity
will result in a starting salary that is $1681 higher.
b. The estimated starting salary for Dave (3.6 grade point average and 3 activities) is $43,182.
c. The 95% prediction interval for the starting salary for Dave is between $38,834 and $47,530.
d. The 95% confidence interval for the mean starting salary for all persons like Dave (i.e., 3.6 GPA and 3
activities) is between $40,506 and $45,858.
e.          For this problem, the appropriate d.f. = 10 - 2 - 1 = 7. The t-value for a 95% confidence interval with 7
degrees of freedom is 2.365.
The 95% confidence interval for population partial regression coefficient β1 is:
b1 ± t·sb1 = 3.842 ± 2.365(1.234) = 3.842 ± 2.918 = (0.924, 6.760)
The 95% confidence interval for population partial regression coefficient β2 is:
b2 ± t·sb2 = 1.6810 ± 2.365(0.5291) = 1.6810 ± 1.2513 = (0.4297, 2.9323)
f. The partial regression coefficients for grade point average and activities are both significant at the 0.05
level (GPA p-value, 0.017; Activities p-value, 0.016). The overall regression is significant at the 0.002
level. The coefficient of multiple determination indicates that 82.4% of the variation in starting salaries
is explained by the regression.
g. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
about zero. Next the normal probability plot is graphed to examine whether the residuals could have
come from a normally distributed population. Finally, the residuals are plotted against each of the
independent variables to check for cyclical patterns.

Shown below, the histogram is fairly symmetric and there is no evidence to suggest that the residuals may
not have come from a normal distribution.
[Figure: Histogram of the Residuals (response is Salary); Frequency vs. Residual]

In this Minitab test for normality, the points in the normal probability plot do not deviate excessively
from a straight line and the approximate p-value is shown as > 0.150. Our conclusion is that the residuals
could have come from a normally distributed population.

[Figure: Normal Probability Plot of RESI1 — Mean ≈ 0, StDev = 1.277, N = 10, KS = 0.125, P-Value > 0.150]

The plots for residuals versus the independent variables are shown below. For this small data set, no
alarming patterns seem to be present. Overall, the residual analysis provides no evidence to suggest that
the assumptions for the multiple regression model have not been satisfied.

[Figure: Residuals Versus GPA (response is Salary)]

Residuals versus number of activities.

[Figure: Residuals Versus Activities (response is Salary)]

The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
already provides 95% confidence intervals for the population regression coefficients.
E                   F                   G            H              I                 J         K
1                                                               Salary           GPA      Activities
2         SUMMARY OUTPUT                                            40            3.2              2
3                                                                   46            3.6              5
4                Regression Statistics                              38            2.8              3
5         Multiple R                   0.9080                       39            2.4              4
6         R Square                     0.8245                       37            2.5              2
7         Adjusted R Square            0.7743                       38            2.1              3
8         Standard Error               1.4479                       42            2.7              3
9         Observations                     10                       37            2.6              2
10                                                                   44            3.0              4
11         ANOVA                                                     41            2.9              3
12                                          df                 SS           MS             F          Significance F
13         Regression                                  2        68.9244      34.4622       16.4379             0.0023
14         Residual                                    7        14.6756       2.0965
15         Total                                       9        83.6000
16
17                                    Coefficients Standard Error          t Stat        P-value       Lower 95%   Upper 95%
18         Intercept                      24.3092         3.1919               7.6159        0.0001        16.7615    31.8569
19         GPA                              3.8416        1.2342               3.1127        0.0170         0.9233     6.7600
20         Activities                       1.6810        0.5291               3.1768        0.0156         0.4298     2.9322
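As a cross-check, the fitted coefficients in the printouts above can be reproduced by ordinary least squares on the ten observations listed in the worksheet; a minimal Python sketch, assuming numpy is available:

```python
import numpy as np

# Data from the Excel worksheet above: Salary (thousands), GPA, Activities.
salary     = np.array([40, 46, 38, 39, 37, 38, 42, 37, 44, 41], dtype=float)
gpa        = np.array([3.2, 3.6, 2.8, 2.4, 2.5, 2.1, 2.7, 2.6, 3.0, 2.9])
activities = np.array([2, 5, 3, 4, 2, 3, 3, 2, 4, 3], dtype=float)

# Design matrix with an intercept column, then solve by least squares.
X = np.column_stack([np.ones_like(salary), gpa, activities])
b0, b1, b2 = np.linalg.lstsq(X, salary, rcond=None)[0]
print(round(b0, 3), round(b1, 3), round(b2, 3))  # 24.309 3.842 1.681
```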

Excel has generated the optional normal probability plot and plots of the residuals against the independent
variables. Their appearance is essentially similar to those of Minitab.
[Figures: Excel Normal Probability Plot, GPA Residual Plot, Activities Residual Plot]

16.55 p/c/m The Minitab printout is shown below.
Regression Analysis: FrGPA versus SAT, HSRank

The regression equation is FrGPA = - 1.98 + 0.00372 SAT + 0.00658 HSRank

Predictor                       Coef        SE Coef                   T       P
Constant                      -1.984          1.532               -1.30   0.218
SAT                         0.003719       0.001562                2.38   0.033
HSRank                      0.006585       0.008023                0.82   0.427

S = 0.4651                    R-Sq = 45.2%             R-Sq(adj) = 36.8%

Analysis of Variance
Source            DF                              SS                 MS         F                    P
Regression         2                          2.3244             1.1622      5.37                0.020
Residual Error    13                          2.8125             0.2163
Total             15                          5.1370

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                                                    95.0% PI
1         2.634      0.125   (   2.365,   2.904)                             (               1.594,   3.674)

Values of Predictors for New Observations
New Obs       SAT    HSRank
1            1100      80.0

a. The regression equation is: FrGPA = -1.984 + 0.003719*SAT + 0.006585*HSRank.

The partial regression coefficient for the SAT score indicates that, holding the rank constant,
a 1 point increase in the SAT score will result in a 0.003719 point increase in the freshman GPA.
The coefficient for the high school rank indicates that, holding the SAT score constant, a 1 point
increase in the high school rank will result in a 0.006585 point increase in freshman GPA.
b. The estimated freshman GPA for a student who scored 1100 on the SAT and had a class rank of 80%
is 2.634.
c. The 95% prediction interval for the GPA for a student like the one in part b is between 1.594 and
3.674.
d. The 95% confidence interval for the mean GPA for all students like the one in part b is 2.365 to 2.904.
e. For this problem, the appropriate d.f. = 16 - 2 - 1 = 13. The t-value for a 95% interval with 13 degrees
of freedom is 2.160.
The 95% confidence interval for population partial regression coefficient β1 is:
b1 ± t·sb1 = 0.003719 ± 2.160(0.001562) = 0.003719 ± 0.003374 = (0.000345, 0.007093)
The 95% confidence interval for population partial regression coefficient β2 is:
b2 ± t·sb2 = 0.006585 ± 2.160(0.008023) = 0.006585 ± 0.017330 = (-0.010745, 0.023915)
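The point estimate from part b and the two intervals above follow directly from the printed values; a small Python sketch (the helper names predict_gpa and coef_ci are ours):

```python
# 16.55: point estimate and 95% coefficient intervals from the printed values.
def predict_gpa(sat, hs_rank):
    # FrGPA = -1.984 + 0.003719*SAT + 0.006585*HSRank
    return -1.984 + 0.003719 * sat + 0.006585 * hs_rank

def coef_ci(b, s_b, t=2.160):  # t for d.f. = 16 - 2 - 1 = 13
    return (round(b - t * s_b, 6), round(b + t * s_b, 6))

print(round(predict_gpa(1100, 80.0), 3))  # 2.634
print(coef_ci(0.003719, 0.001562))        # SAT: (0.000345, 0.007093)
print(coef_ci(0.006585, 0.008023))        # HSRank: (-0.010745, 0.023915)
```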
f. The partial regression coefficient for the SAT score is significantly different from zero at the 0.033
level. The partial regression coefficient for the high school rank is not significantly different from zero
(p-value = 0.427). The overall regression is significant at the 0.020 level.
g. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
about zero. Next the normal probability plot is graphed to examine whether the residuals could have
come from a normally distributed population. Finally, the residuals are plotted against each of the
independent variables to check for cyclical patterns.

The histogram below seems to be fairly symmetric about zero.
[Figure: Histogram of the Residuals (response is FrGPA); Frequency vs. Residual]

In this Minitab test for normality, the points in the normal probability plot do not deviate excessively
from a straight line and the approximate p-value is shown as > 0.150. Our conclusion is that the residuals
could have come from a normally distributed population.

[Figure: Normal Probability Plot of RESI1 — Mean ≈ 0, StDev = 0.4330, N = 16, KS = 0.167, P-Value > 0.150]

The plots for residuals versus the independent variables are shown below. Neither plot reveals any
alarming patterns. Overall, the residual analysis provides no evidence to suggest that the assumptions
underlying the multiple regression analysis have been violated.

Residuals versus SAT score.
[Figure: Residuals Versus SAT (response is FrGPA)]

Residuals versus high school rank.
[Figure: Residuals Versus HSRank (response is FrGPA)]

The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
already provides 95% confidence intervals for the population regression coefficients.
A                 B                 C               D           E               F           G
20         SUMMARY OUTPUT
21                Regression Statistics
22         Multiple R                   0.6727
23         R Square                     0.4525
25         Standard Error               0.4651
26         Observations                     16
27
28         ANOVA
29                                     df               SS              MS           F         Significance F
30         Regression                         2              2.3244      1.1622      5.3719             0.0199
31         Residual                          13              2.8125      0.2163
32         Total                             15              5.1370
33
34                               Coefficients Standard Error           t Stat      P-value      Lower 95%    Upper 95%
35         Intercept               -1.983878         1.53190           -1.29505      0.21783        -5.29334    1.32558
36         SAT                      0.003719         0.00156            2.38050      0.03328         0.00034    0.00709
37         HS Rank                  0.006585         0.00802            0.82074      0.42659        -0.01075    0.02392

Excel has generated the optional normal probability plot and plots of the residuals against the independent
variables. Their appearance is essentially similar to those of Minitab.

[Figures: Excel Normal Probability Plot, SAT Residual Plot, HS Rank Residual Plot]

16.56 p/c/m The Minitab printout is shown below.
Regression Analysis: Price versus Acres, SqFeet, CentralAir

The regression equation is Price = 36045 + 15663 Acres + 10.9 SqFeet + 4181 CentralAir

Predictor                     Coef           SE Coef              T        P
Constant                     36045             14539           2.48    0.025
Acres                        15663              5716           2.74    0.015
SqFeet                      10.875             4.959           2.19    0.043
CentralA                      4181              5652           0.74    0.470

S = 12321                   R-Sq = 41.8%             R-Sq(adj) = 30.9%

Analysis of Variance
Source            DF                         SS                   MS          F                     P
Regression         3                 1745571591            581857197       3.83                 0.030
Residual Error    16                 2428953909            151809619
Total             19                 4174525500

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                                              95.0% PI
1         73898       5236   (   62799,   84998)                           (           45518, 102278)

Values of Predictors for New Observations
New Obs     Acres    SqFeet CentralA
1           0.900      1800      1.00

a. The regression equation is:
Price = 36045 + 15663*Acres + 10.875*SqFeet + 4181*CentralAir.
The partial regression coefficient for the lot size indicates that, all other variables held constant, an
additional acre of land will add $15,663 to the selling price. The partial regression coefficient for the
size of the living area indicates that, all other variables held constant, an additional square foot of
living area will add $10.875 to the selling price. Finally, the partial regression coefficient for the
presence of central air conditioning indicates that, all other variables held constant, the presence of
central air will increase the selling price by $4181.
b. The estimated selling price for a house sitting on a 0.9 acre lot with 1800 square feet of living area
with central air conditioning is $73,898.
c. The 95% prediction interval for the selling price of the house described in part b is between $45,518
and $102,278.
d. The 95% confidence interval for the mean selling price of all houses like the one in part b is between
$62,799 and $84,998.
e. For this problem, the appropriate d.f. = 20 - 3 - 1 = 16. The t-value for a 95% interval with 16 degrees
of freedom is 2.120.
The 95% confidence interval for population partial regression coefficient β1 is:
b1 ± t·sb1 = 15,663 ± 2.120(5716) = 15,663 ± 12,117.92 = (3545.08, 27,780.92)
The 95% confidence interval for population partial regression coefficient β2 is:
b2 ± t·sb2 = 10.875 ± 2.120(4.959) = 10.875 ± 10.513 = (0.362, 21.388)
The 95% confidence interval for population partial regression coefficient β3 is:
b3 ± t·sb3 = 4181 ± 2.120(5652) = 4181 ± 11,982.24 = (-7801.24, 16,163.24)
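The point estimate in part b can be verified from the fitted equation; a minimal Python sketch (the helper name predict_price is ours, the coefficients come from the printout above):

```python
# 16.56(b): estimated selling price from the fitted equation
# Price = 36045 + 15663*Acres + 10.875*SqFeet + 4181*CentralAir.
def predict_price(acres, sq_feet, central_air):
    return 36045 + 15663 * acres + 10.875 * sq_feet + 4181 * central_air

print(round(predict_price(0.9, 1800, 1)))  # 73898
```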
f. The partial regression coefficient for Acres is significantly different from zero at the 0.015 level, and
the coefficient for SqFeet significantly differs from zero at the 0.043 level. However, the coefficient
for CentralAir does not differ from zero significantly (p-value = 0.470). The overall regression is
significant at the 0.030 level.
g. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
about zero. Next the normal probability plot is graphed to examine whether the residuals could have
come from a normally distributed population. Finally, the residuals are plotted against each of the
independent variables to check for cyclical patterns.

The histogram below appears to be relatively symmetric.
[Figure: Histogram of the Residuals (response is Price); Frequency vs. Residual]

In this Minitab test for normality, the points in the normal probability plot do not deviate excessively
from a straight line and the approximate p-value is shown as > 0.150. Our conclusion is that the residuals
could have come from a normally distributed population.
[Figure: Normal Probability Plot of RESI1 — Mean ≈ 0, StDev = 11307, N = 20, KS = 0.124, P-Value > 0.150]

The plots for residuals versus the independent variables are shown below. None of them reveals any
alarming patterns. Overall, the residual analysis provides no evidence to suggest that the assumptions
underlying the multiple regression analysis have been violated.

Residuals versus lot size.
[Figure: Residuals Versus Acres (response is Price)]

Residuals versus living area.

[Figure: Residuals Versus SqFeet (response is Price)]

Residuals versus central air conditioning.
[Figure: Residuals Versus CentralAir (response is Price)]

The Excel multiple regression solution for the data in this exercise is shown below. Note that Excel
already provides 95% confidence intervals for the population regression coefficients.

E                                   F                G                       H               I                 J              K
1    SUMMARY OUTPUT
2
3           Regression Statistics
4    Multiple R                  0.6466
5    R Square                    0.4181
7    Standard Error             12321.1
8    Observations                    20
9
10   ANOVA
11                                             df              SS          MS                           F     Significance F
12   Regression                                      3       1745571591 5.82E+08                       3.8328          0.0304
13   Residual                                       16       2428953909 1.52E+08
14   Total                                          19       4174525500
15
16                                    Coefficients Standard Error              t Stat             P-value           Lower 95%    Upper 95%
17   Intercept                            36045.0       14539.258                 2.479              0.025             5223.179 66866.864
18   Acres                                15662.9        5715.898                 2.740              0.015             3545.743 27780.064
19   SqFeet                                 10.875          4.959                 2.193              0.043                 0.362     21.388
20   CentralAir                             4181.1       5652.117                 0.740              0.470            -7800.861 16163.040

Excel has generated the optional normal probability plot and plots of the residuals against the independent
variables. Their appearance is essentially similar to those of Minitab.
[Figures: Excel Normal Probability Plot, Acres Residual Plot, SqFeet Residual Plot, CentralAir Residual Plot]

16.57 p/c/m The estimated selling price of a house occupying a 0.1 acre lot with 100 square feet of living
area and no central air conditioning is $38,699. This selling price does not seem reasonable. The problem
with this estimate arises because the regression equation has been extrapolated far beyond the limits of the
underlying data used to estimate it.

16.58 p/c/m The Minitab printout is shown below.
Regression Analysis: Time versus Age, Gender

The regression equation is Time = 69.5 + 0.110 Age - 12.2 Gender

Predictor         Coef       SE Coef           T         P
Constant         69.49         11.48        6.06     0.000
Age             0.1101        0.2257        0.49     0.635
Gender         -12.186         5.312       -2.29     0.042

S = 8.720        R-Sq = 43.7%        R-Sq(adj) = 33.5%

Analysis of Variance
Source            DF            SS            MS          F         P
Regression         2        649.25        324.63       4.27     0.042
Residual Error    11        836.46         76.04
Total             13       1485.71

Predicted Values for New Observations
New Obs     Fit     SE Fit         95.0% CI                    95.0% PI
1         74.45       3.40   (   66.96,   81.93)       (     53.85,   95.05)

Values of Predictors for New Observations
New Obs       Age    Gender
1            45.0 0.000000

a. The regression equation is: Time = 69.49 + 0.1101*Age - 12.186*Gender.
The partial regression coefficient for Age indicates that, holding the gender constant, an increase of
one year in age will result in an increase of 0.1101 seconds to complete the transaction.
The partial regression coefficient for Gender indicates that, holding the age constant, a male takes
12.186 seconds less to complete his transaction than a female.
b. The estimated time required to complete a transaction by a female customer who is 45 years of age is
74.45 seconds.
c. The 95% prediction interval for the time required by the customer described in part b is 53.85 to 95.05
seconds.
d. The 95% confidence interval for the mean time required by all customers like the one in part b is 66.96
to 81.93 seconds.
e. For this problem, the appropriate d.f. = 14 - 2 - 1 = 11. The t-value for a 95% interval with 11 degrees
of freedom is 2.201.
The 95% confidence interval for population partial regression coefficient β1 is:
b1 ± t(s_b1) = 0.1101 ± 2.201(0.2257) = 0.1101 ± 0.4968 = (-0.3867, 0.6069)
The 95% confidence interval for population partial regression coefficient β2 is:
b2 ± t(s_b2) = -12.186 ± 2.201(5.312) = -12.186 ± 11.692 = (-23.878, -0.494)
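The point estimate from part (b) and the coefficient intervals above can be reproduced with a short Python sketch. The t critical value (2.201 for 11 d.f.) and all coefficient values are taken from the text and the Minitab printout:

```python
# Confidence intervals for the partial regression coefficients in Exercise 16.58,
# computed as b +/- t * SE(b), with t = 2.201 for d.f. = 14 - 2 - 1 = 11.
t_crit = 2.201

coefficients = {
    "Age":    (0.1101, 0.2257),   # (coefficient, standard error) from Minitab
    "Gender": (-12.186, 5.312),
}

intervals = {}
for name, (b, se) in coefficients.items():
    margin = t_crit * se
    intervals[name] = (b - margin, b + margin)

# Point estimate for a 45-year-old female (Gender = 0):
fit = 69.4926 + 0.1101 * 45 - 12.1858 * 0

print(intervals)
print(round(fit, 2))  # 74.45 seconds, matching the Minitab Fit
```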
f. The partial regression coefficient for age does not differ from zero significantly (p-value = 0.635).
However, the coefficient for gender differs significantly from zero at the 0.042 level. The overall
regression is significant at the 0.042 level.
g. The residual analyses follow. First the histogram of the residuals is examined to see if it is symmetric
about zero. Next the normal probability plot is graphed to examine whether the residuals could have
come from a normally distributed population. Finally, the residuals are plotted against each of the
independent variables to check for cyclical patterns.

The histogram shown below does not seem very symmetrical, but the small number of observations could
account for this apparent asymmetry.

[Minitab histogram of the residuals (response is Time)]

In this Minitab test for normality, the points in the normal probability plot do not deviate excessively
from a straight line and the approximate p-value is shown as > 0.150. Our conclusion is that the residuals
could have come from a normally distributed population.
[Minitab normal probability plot of RESI1: Mean = -3.55271E-14, StDev = 8.021, N = 14, KS = 0.137, P-Value > 0.150]

The plots for residuals versus the independent variables are shown below. Although the first plot seems to
show more positive residuals for persons in the 40-50 age range, neither plot reveals any alarming
patterns. Overall, the residual analysis does not suggest that the assumptions underlying the multiple
regression analysis have been violated.

Residuals versus age of customer.
[Minitab plot: Residuals Versus Age (response is Time)]

Residuals versus gender of customer.
[Minitab plot: Residuals Versus Gender (response is Time)]

The Excel multiple regression solution for the data in this exercise is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.6611
R Square              0.4370
Adjusted R Square     0.3350
Standard Error        8.7202
Observations              14

ANOVA
                  df           SS          MS        F   Significance F
Regression         2     649.2501    324.6251   4.2690           0.0424
Residual          11     836.4641     76.0422
Total             13    1485.7143

             Coefficients   Standard Error    t Stat   P-value   Lower 95%   Upper 95%
Intercept         69.4926          11.4769    6.0550    0.0001     44.2322     94.7530
Age                0.1101           0.2257    0.4880    0.6351     -0.3866      0.6068
Gender           -12.1858           5.3116   -2.2942    0.0425    -23.8765     -0.4951

Excel has generated the optional normal probability plot and plots of the residuals against the independent
variables. Their appearance is essentially similar to those of Minitab.
[Excel charts: normal probability plot for Time, and residual plots versus Age and Gender]

INTEGRATED CASES

THORNDIKE SPORTS EQUIPMENT

Ted uses Minitab to generate the printout shown below.
Regression Analysis: Skiers versus Weekend, SnowInch, Temperat

The regression equation is Skiers = 560 + 147 Weekend + 1.42 SnowInch - 1.60 Temperat

Predictor     Coef   SE Coef       T       P
Constant    559.87     76.78    7.29   0.000
Weekend     147.35     51.86    2.84   0.009
SnowInch     1.424     2.696    0.53   0.602
Temperat    -1.604     2.771   -0.58   0.568

S = 125.061    R-Sq = 25.4%     R-Sq(adj) = 16.8%

Analysis of Variance
Source          DF     SS         MS      F        P
Regression       3 138705      46235   2.96    0.051
Residual Error 26 406650       15640
Total           29 545354

Examining the printout, Ted sees that the coefficient for Weekend is significantly different from zero at
the 0.009 level, and the overall regression is significant at the 0.051 level. Overall, only 25.4% of the
variation in daily ski patronage is explained by these independent variables. Perhaps some of the
remaining variation could be at least partially explained by some other variables (e.g., live music or
entertainment, conference attendance, or type of group staging a conference) that Ted has not included in
his analysis.
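The R-Sq and R-Sq(adj) figures on Ted's printout can be verified from the ANOVA sums of squares. A minimal sketch, using the values from the printout above:

```python
# R-squared and adjusted R-squared from the ANOVA table in Ted's printout.
ss_regression = 138705.0
ss_total = 545354.0
n, k = 30, 3  # 30 observations (total d.f. = 29), 3 predictors

r_sq = ss_regression / ss_total
adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)

print(round(r_sq * 100, 1))      # 25.4, matching R-Sq = 25.4%
print(round(adj_r_sq * 100, 1))  # 16.8, matching R-Sq(adj) = 16.8%
```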

SPRINGDALE SHOPPING SURVEY

Caution: These exercises include the recoding of two of the variables. If you save the revised data file, do
so using a different filename.

If you are using Minitab, recode as follows:
1. Click Data. Select Code. Click Numeric to Numeric.
2. Enter C26 C28 into the Code data from columns box. Enter C26 C28 into the Into columns box.
Enter 2 into the Original values box. Enter 0 into the New box. Click OK.

If you are using Excel, recode as follows:
1. Click and drag to select cells Z1:Z151. (This highlights the variable name, RESPGEND, and the 150
data values below.) Click Edit. Click Replace.
2. Enter 2 into the Find what box. Enter 0 into the Replace with box. Click Replace All.
3. Repeat steps 1 and 2 for cells AB1:AB151, which contain the variable name, RESPMARI, and the 150
data values below.
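If you are working outside Minitab or Excel, the same 2-to-0 recode takes only a few lines of Python. A sketch, with hypothetical sample values standing in for the survey data:

```python
# Recode survey values of 2 to 0, leaving values of 1 unchanged,
# so RESPGEND and RESPMARI become 0/1 dummy variables.
def recode_two_to_zero(values):
    return [0 if v == 2 else v for v in values]

respgend = [1, 2, 2, 1, 2]  # hypothetical raw survey codes
print(recode_two_to_zero(respgend))  # [1, 0, 0, 1, 0]
```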

1a through 1e, with dependent variable 7, Attitude toward Springdale Mall.

Regression Analysis: SPRILIKE versus IMPVARIE, IMPHELP, ...

The regression equation is
SPRILIKE = 2.90 + 0.188 IMPVARIE + 0.0043 IMPHELP + 0.034 RESPGEND
+ 0.191 RESPMARI

Predictor               Coef   SE Coef          T            P
Constant              2.9009    0.2963       9.79        0.000
IMPVARIE             0.18839   0.05383       3.50        0.001
IMPHELP              0.00432   0.04203       0.10        0.918
RESPGEND              0.0341    0.1306       0.26        0.794
RESPMARI              0.1909    0.1232       1.55        0.123

S = 0.738875            R-Sq = 11.9%           R-Sq(adj) = 9.5%

Analysis of Variance
Source           DF      SS                        MS           F           P
Regression        4 10.7125                    2.6781        4.91       0.001
Residual Error 145 79.1608                     0.5459
Total           149 89.8733

1a. The partial regression coefficient for RESPGEND is 0.034. With the variables coded so that
1 = male and 0 = female, this implies that males tend to have an attitude toward Springdale Mall that
is 0.034 points higher than the attitude displayed by females toward this shopping area. However, the
p-value for the test of this partial regression coefficient is 0.794, which is not less than α = 0.05, and
the coefficient does not differ significantly from zero at the 0.05 level of significance. The partial
regression coefficient for IMPVARIE (test p-value = 0.001) is the only one that differs significantly
at the 0.05 level of significance.
1b. The p-value for the strength of the overall relationship is 0.001. This is less than the 0.05 level
specified, so the overall regression equation is significant at the 0.05 level.
1c. The percentage of the variation in y that is explained by the regression equation is 11.9%
(unadjusted). In the ANOVA portion of the printout, this is the Regression sum of squares (10.7125)
divided by the Total sum of squares (89.8733).
1d. The residuals are plotted versus each of the independent variables below. In each plot, the residuals
seem to be unrelated to the independent variable, thus supporting the validity of the model.
[Minitab plots: residuals versus IMPVARIE, IMPHELP, RESPGEND, and RESPMARI (response is SPRILIKE)]

1e. In this Minitab test for normality, the points in the normal probability plot appear to deviate
excessively from a straight line and the approximate p-value is shown as < 0.01. At the 0.05 level of
significance, we would conclude that the residuals could not have come from a normally distributed
population. For this regression analysis, it appears that the assumption of normality of residuals may
have been violated.
[Minitab normal probability plot of RESI1: Mean = -3.12639E-15, StDev = 0.7289, N = 150, KS = 0.129, P-Value < 0.010]

2. Repeating 1a through 1e, with dependent variable 8, Attitude toward Downtown.
Regression Analysis: DOWNLIKE versus IMPVARIE, IMPHELP, ...

The regression equation is
DOWNLIKE = 3.72 + 0.0251 IMPVARIE - 0.0671 IMPHELP + 0.015 RESPGEND
- 0.006 RESPMARI

Predictor                  Coef     SE Coef            T         P
Constant                 3.7211      0.3796         9.80     0.000
IMPVARIE                0.02512     0.06896         0.36     0.716
IMPHELP                -0.06710     0.05384        -1.25     0.215
RESPGEND                 0.0148      0.1673         0.09     0.929
RESPMARI                -0.0057      0.1578        -0.04     0.971

S = 0.946571              R-Sq = 1.2%          R-Sq(adj) = 0.0%

Analysis of Variance
Source           DF       SS                           MS           F          P
Regression        4   1.5205                       0.3801        0.42      0.791
Residual Error 145 129.9195                        0.8960
Total           149 131.4400

2a. The partial regression coefficient for RESPGEND is 0.015. With the variables coded so that
1 = male and 0 = female, this implies that males tend to have an attitude toward Downtown that is
0.015 points higher than the attitude displayed by females toward this shopping area. However, the
p-value for the test of this partial regression coefficient is 0.929, which is not less than α = 0.05, and
the coefficient does not differ significantly from zero at the 0.05 level of significance. In this
regression, none of the partial regression coefficients is significantly different from zero at the 0.05
level of significance.
2b. The p-value for the strength of the overall relationship is 0.791. This is not less than the 0.05 level
specified, so the overall regression equation is not significant at the 0.05 level.
2c. The percentage of the variation in y that is explained by the regression equation is only 1.2%
(unadjusted). In the ANOVA portion of the printout, this is the Regression sum of squares (1.5205)
divided by the Total sum of squares (131.4400).

2d. The residuals are plotted versus each of the independent variables below. In each plot, the residuals
seem to be unrelated to the independent variable, thus supporting the validity of the model.
[Minitab plots: residuals versus IMPVARIE, IMPHELP, RESPGEND, and RESPMARI (response is DOWNLIKE)]

2e. In this Minitab test for normality, the points in the normal probability plot appear to deviate
excessively from a straight line and the approximate p-value is shown as < 0.01. At the 0.05 level of
significance, we would conclude that the residuals could not have come from a normally distributed
population. For this regression analysis, it appears that the assumption of normality of residuals may
have been violated.
[Minitab normal probability plot of RESI1: Mean = -5.62513E-17, StDev = 0.9338, N = 150, KS = 0.148, P-Value < 0.010]

3. Repeating 1a through 1e, with dependent variable 9, Attitude toward West Mall.
Regression Analysis: WESTLIKE versus IMPVARIE, IMPHELP, ...

The regression equation is
WESTLIKE = 3.54 - 0.0906 IMPVARIE + 0.0341 IMPHELP - 0.201 RESPGEND + 0.270 RESPMARI

Predictor                      Coef          SE Coef               T       P
Constant                     3.5398           0.4162            8.51   0.000
IMPVARIE                   -0.09060          0.07560           -1.20   0.233
IMPHELP                     0.03413          0.05903            0.58   0.564
RESPGEND                    -0.2013           0.1834           -1.10   0.274
RESPMARI                     0.2704           0.1730            1.56   0.120

S = 1.03772                      R-Sq = 3.5%              R-Sq(adj) = 0.9%

Analysis of Variance
Source           DF      SS                                       MS      F            P
Regression        4   5.729                                    1.432   1.33        0.262
Residual Error 145 156.144                                     1.077
Total           149 161.873

3a. The partial regression coefficient for RESPGEND is -0.2013. With the variables coded as 1 = male
and 0 = female, males tend to have an attitude toward West Mall that is 0.2013 points lower than that
displayed by females. However, p-value = 0.274 is not less than α = 0.05, and the coefficient does
not differ significantly from zero at the 0.05 level of significance. In this regression, none of the
partial regression coefficients differs significantly from zero at the 0.05 level.
3b. The p-value for the strength of the overall relationship is 0.262. This is not less than the 0.05 level
specified, so the overall regression equation is not significant at the 0.05 level.
3c. The percentage of the variation in y that is explained by the regression equation is only 3.5%
(unadjusted). In the ANOVA portion of the printout, this is the Regression sum of squares (5.729)
divided by the Total sum of squares (161.873).
3d. The residuals are plotted versus each of the independent variables below. In each plot, the residuals
seem to be unrelated to the independent variable, thus supporting the validity of the model.

[Minitab plots: residuals versus IMPVARIE, IMPHELP, RESPGEND, and RESPMARI (response is WESTLIKE)]

3e. In this Minitab test for normality, the points in the normal probability plot appear to deviate
excessively from a straight line and the approximate p-value is shown as < 0.01. At the 0.05 level of
significance, we would conclude that the residuals could not have come from a normally distributed
population. For this regression analysis, it appears that the assumption of normality of residuals may
have been violated.
[Minitab normal probability plot of RESI1: Mean = -1.89478E-16, StDev = 1.024, N = 150, KS = 0.097, P-Value < 0.010]

4. The four independent variables -- IMPVARIE, IMPHELP, RESPGEND, and RESPMARI -- do a
better job of predicting attitude toward Springdale Mall (R-sq = 11.9%, overall p-value = 0.001) than
for either Downtown (R-sq = 1.2%, p-value = 0.791) or West Mall (R-sq = 3.5%, p-value = 0.262).

EASTON REALTY COMPANY (A)

1. With regard to the two parties claiming their homes were not sold for fair market price by Easton:
a. The selling price of the first home, not located in the Dallas portion of the metroplex, four years
old, and with 2190 square feet, was $88,500. The selling price for the second home, not located in
the Dallas portion of the metroplex, nine years old, and with 1848 square feet, was $79,500.
Using Minitab and the EASTON data file, we identify the average selling price for all homes in
the most recent three-month period as well as the average selling price for all homes sold during
each of these three months:
For all homes sold during the most recent three-month period:
Descriptive Statistics: Price
Variable    N   Mean SE Mean      StDev   Minimum      Q1   Median       Q3   Maximum
Price     378 91367       895     17394     51800   78475    89400   102850    137100

For homes sold during each of the most recent three months:

Descriptive Statistics: Price
Variable  Month    N    Mean  SE Mean   StDev  Minimum     Q1  Median      Q3  Maximum
Price     4      131   95649     1456   16661    60400  82200   96200  107000   137100
          5      127   90972     1570   17696    58100  78200   88900  102700   134100
          6      120   87112     1541   16883    51800  75650   85700   99075   131900

The prices of the two homes in question ($88,500 and $79,500) are below the mean price for all
homes sold during the most recent three-month period ($91,367). However, the homes that are
the subject of the controversy were sold in the most recent month (June, or month code 6), during
which the mean price was just $87,112 in a declining market -- note the declining mean selling
prices from month 4 through month 6. On this basis, the prices of the homes in question do not appear
to be very much out of line with the mean price for all homes sold during the most recent month, and
one of them even sold for a higher price than that mean.
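The month-by-month comparison above can be sketched in plain Python. The mini-dataset here is hypothetical; the real EASTON file contains 378 sales with Price and Month values:

```python
# Group selling prices by month and compute each month's mean price,
# mimicking the Minitab descriptive statistics shown above.
sales = [(4, 95000), (4, 99000), (5, 90000),
         (5, 92000), (6, 86000), (6, 88000)]  # hypothetical (month, price) pairs

by_month = {}
for month, price in sales:
    by_month.setdefault(month, []).append(price)

monthly_means = {m: sum(p) / len(p) for m, p in sorted(by_month.items())}
print(monthly_means)  # {4: 97000.0, 5: 91000.0, 6: 87000.0}
```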
b. There are a number of pricing factors that could make the comparison in part (a) unfair.
In considering only the selling price, we are not taking into consideration other factors that could
affect the price of a home. Such factors could include variables such as location, age, size,
number of bedrooms, and many more variables of which real estate agents are well aware.
Regarding location, we will see in our regression in part 2a that homes in Dallas sell for a rather
large premium versus comparable homes sold elsewhere.
c. In making their argument, the complaining sellers are relying heavily on the average selling price
($104,250) stated in the article for all homes sold in the area over the previous twelve months of a
weakening housing market. Therein lies the weakest component of their argument. They sold their
houses during the 12th month of a 1-year period during which housing prices in the area had been
steadily weakening.
2. Using multiple regression to estimate Price as a function of SqFeet, Bedrooms, Age, Dallas, and
Easton, we obtain the following Minitab printout for the most recent three months of home sales.
a. Interpreting the partial regression coefficients: On average, the price tends to increase by $38.64
for each additional square foot of living space, by $358 for each additional bedroom, and by $48
for each additional year in age. Also, the price tends to be $21,282 higher if located in Dallas and
$132 higher if sold by Easton rather than another realtor. The positive $132 coefficient for the
Easton variable would appear to undermine accusations by the claimants that Easton has been
engaging in a practice of underpricing its residential properties relative to other real estate
companies. Especially noteworthy is the $21,282 premium for a home in Dallas versus elsewhere
in the metroplex area, because neither of the disputed homes is located in Dallas.
Regression Analysis: Price versus SqFeet, Bedrooms, Age, Dallas, Easton

The regression equation is
Price = 8309 + 38.6 SqFeet + 358 Bedrooms + 48 Age + 21282 Dallas + 132 Easton

Predictor      Coef   SE Coef       T       P
Constant       8309      2082    3.99   0.000
SqFeet       38.640     1.257   30.73   0.000
Bedrooms      357.8     664.8    0.54   0.591
Age            47.8     152.9    0.31   0.755
Dallas      21281.8     647.4   32.87   0.000
Easton          132      1060    0.12   0.901

S = 6069.24    R-Sq = 88.0%     R-Sq(adj) = 87.8%

Analysis of Variance
Source           DF          SS               MS          F       P
Regression        5 1.00353E+11      20070528157     544.87   0.000
Residual Error 372 13702868976          36835669
Total           377 1.14056E+11

b. In this case, we will consider the fact that each of the homes in dispute was sold during the most
recent month. Thus, the printout below will include only data for the most recent month (June, or
month code 6). For each of the two homes that are the subject of complaints, the printout includes
a point estimate as well as 95% confidence and prediction intervals for homes
having comparable characteristics and being sold by a realtor other than Easton. In the printout
below, note that the “Dallas” predictive variable has been specified as 0 for each of the disputed
homes, because neither is located in Dallas.
Regression Analysis: Price versus SqFeet, Bedrooms, Age, Dallas, Easton

The regression equation is
Price = 3046 + 36.4 SqFeet + 1474 Bedrooms + 445 Age + 21456 Dallas + 624 Easton

Predictor     Coef SE Coef          T      P
Constant      3046     3116      0.98 0.330
SqFeet      36.388    2.085     17.45 0.000
Bedrooms      1474     1091      1.35 0.179
Age          445.0    218.5      2.04 0.044
Dallas     21455.6    991.0     21.65 0.000
Easton         624     1491      0.42 0.677
S = 5044.68   R-Sq = 91.4%      R-Sq(adj) = 91.1%

Analysis of Variance
Source           DF          SS              MS         F         P
Regression        5 31017720744      6203544149    243.77     0.000
Residual Error 114   2901162922        25448798
Total           119 33918883667

Predicted Values for New Observations
New Obs      Fit   SE Fit           95% CI            95% PI
1          88938     1277   (86408, 91469)    (78630, 99247)
2          78718      905   (76926, 80510)    (68565, 88871)

Values of Predictors for New Observations
New Obs   SqFeet   Bedrooms    Age   Dallas   Easton
1           2190       3.00   4.00        0        0
2           1848       3.00   9.00        0        0

For the home that sold for $88,500, the point estimate is $88,938 for the selling price of a
comparable home sold by another realtor. Also, referring to the prediction interval, we have 95%
confidence that a comparable home sold by another realtor would have brought a price within the
interval from $78,630 to $99,247. The price for which Easton sold the home is very close to the
point estimate and well within the prediction interval. The point estimate and prediction interval
provide no evidence that would tend to support the complaint being made by this seller.
For the home that sold for $79,500, the point estimate is $78,718 for the selling price of a
comparable home sold by another realtor. Also, referring to the prediction interval, we have 95%
confidence that a comparable home sold by another realtor would have brought a price within the
interval from $68,565 to $88,871. The price for which Easton sold the home is actually slightly
more than the point estimate and is well within the prediction interval. The point estimate and
prediction interval provide no evidence that would tend to support the complaint being made by
this seller.
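The two point estimates can be reproduced directly from the June-only regression equation. A sketch using the coefficients from the printout above; Dallas and Easton are 0 for both disputed homes:

```python
# Fitted selling prices for the two disputed homes, using the June-only model.
def predicted_price(sqfeet, bedrooms, age, dallas=0, easton=0):
    return (3046 + 36.388 * sqfeet + 1474 * bedrooms + 445.0 * age
            + 21455.6 * dallas + 624 * easton)

home1 = predicted_price(sqfeet=2190, bedrooms=3, age=4)
home2 = predicted_price(sqfeet=1848, bedrooms=3, age=9)
print(round(home1))  # 88938, matching the Minitab Fit
print(round(home2))  # 78718, matching the Minitab Fit
```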
c. In addition to the points made in item 2b, above, it should be noted that the regression equation
that includes only June data shows a partial regression coefficient of +$624 for the Easton
variable. On average, a home sold during June by Easton sold for $624 more than a comparable
home sold by another realtor. This is yet another point that refutes the arguments of the
disgruntled sellers of the two homes in question. Based on the evidence presented above, it would
not seem that Easton is underpricing its residential properties.

CIRCUIT SYSTEMS, INC. (C)

In Chapters 11 and 14, we visited Circuit Systems, Inc., a company that was concerned about the
effectiveness of their new program for reducing the cost of absenteeism among hourly workers.
In this chapter, we will be taking a different approach to analyzing their data.

1. We will first use a multiple regression model to estimate the number of days of sick leave this year as
a function of two variables: days of sick leave taken last year and whether the employee is a
participant in the exercise program. The Minitab printout is shown below.
Regression Analysis: Sick_ThisYr versus Sick_LastYr, Exercise?

The regression equation is Sick_ThisYr = 1.53 + 0.566 Sick_LastYr - 0.955 Exercise?

Predictor          Coef   SE Coef       T        P
Constant         1.5325    0.3529    4.34    0.000
Sick_LastYr     0.56577   0.02439   23.19    0.000
Exercise?       -0.9549    0.2643   -3.61    0.000

S = 1.86447      R-Sq = 70.5%    R-Sq(adj) = 70.3%

Analysis of Variance
Source           DF      SS             MS        F       P
Regression        2 1913.93         956.97   275.29   0.000
Residual Error 230   799.54           3.48
Total           232 2713.47

The significance of the overall regression is quite strong, with the p-value displayed as 0.000.
Interpreting the partial regression coefficients in this model: On average, for a 1-day increase in the
number of sick days a person took last year, the model predicts a 0.566-day increase in the number
of sick days taken this year. On average, a person participating in the exercise program would tend
to have 0.955 fewer sick days this year than a person not participating; this would indicate that the
program is working in terms of reducing the number of sick days taken. Both signs are as we would
have expected. On the basis of this regression analysis, the
exercise program is worthy of continuation. However, keep in mind that we are only considering days
of absence, not the total cost associated with absence, which includes the $200 subsidy for persons
participating in the exercise program.
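The fitted model can be turned into a simple prediction function. A sketch using the coefficients from the Minitab printout above; the example worker is hypothetical:

```python
# Predicted sick days this year from last year's sick days and program participation.
def predicted_sick_days(sick_last_year, in_exercise_program):
    return 1.5325 + 0.56577 * sick_last_year - 0.9549 * (1 if in_exercise_program else 0)

# A worker who took 5 sick days last year:
print(round(predicted_sick_days(5, True), 2))   # 3.41 days if participating
print(round(predicted_sick_days(5, False), 2))  # 4.36 days if not participating
```

The 0.95-day gap between the two predictions is exactly the Exercise? coefficient, since participation is a 0/1 dummy variable.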

2. The regression model explains 70.5% of the variation in days of sick leave this year, so 29.5% of the
variation in the number of sick days taken this year is not explained. Some variables that could
probably help explain some of the as-yet unexplained variation are associated with the incentive
package implemented by the company. Possible variables that are not in the database could include
the employee’s level of work satisfaction, age, gender, family size, and length of commute.


```