Chapter 12 Homework Solutions
7. a. More.
b. The regression coefficient for Weight, 73.17, indicates the (positive) additional
average cost associated with an increase in weight of one unit. The
measurement units are dollars per unit of weight. Yes, it is significant (t = 4.76,
p < 0.0005).
c. No. We had expected that, for tents of a given area, the lighter tents would cost
more (and heavier tents less). Instead we find that heavier tents cost more on
average than lighter ones with the same area.
There may be factors involved other than just area and weight. If heavier tents
are heavier because they have more useful features, then we would expect to see
higher cost for heavier tents. Instead of campers paying extra for weight (which
doesn’t make a lot of sense) they might be paying for extra features.
e. The regression coefficient for Area, –7.517, indicates the (negative) additional
average cost associated with an increase in area of one unit. The measurement
units are dollars per unit of area. Yes, it is significant (t = –2.95, p = 0.006).
f. No. We had expected larger tents to cost more, all else equal. Instead, we find
that larger tents cost less on average than smaller ones that weigh the same.
There may be factors involved other than just area and weight. To keep weight
constant, it may be necessary to decrease area in order to add useful features
to the tent. Instead of campers paying extra for a smaller tent (which doesn’t make a lot
of sense) they might be paying for extra features.
8. a. [Three scatterplots: Time (Seconds) vs. Number of Users; Time (Seconds) vs.
Load (Percent of Time on Other Tasks); and Number of Users vs. Load (Percent of
Time on Other Tasks).]
All three scatterplots show positive association. These are reasonable
relationships: the more users on a system, the greater the load on the system and
this strain would reasonably lead to slower response time.
The first graph, response time against users, shows some nonlinearity. The
relationship is steeper at the right than at the left. An additional user has a
greater effect on response time when the system is already heavily used than if
there are few users.
The other two graphs show linear structure.
b. The correlation matrix is
         Time       Users      Load
Time     1          0.891624   0.932755
Users    0.891624   1          0.772094
Load     0.932755   0.772094   1
The correlations are all positive and relatively large, reflecting the tilted linear
relationships seen in all three graphs. The correlation between the number of
users and load on the system is the smallest, reflecting the greater random
scatter you see in the graph.
c. Predicted Time = –1.55871 + 0.088416 Users + 0.063974 Load
d. The standard error of estimate, Se, indicates that the response time can be
predicted to within 0.461 seconds.
e. Yes, the F test is significant. The R2 = 0.943 exceeds the table value (0.575) for
10 cases and 2 variables. The F statistic is 57.71 with 2 and 7 degrees of
freedom. (The p-value is 0.000045).
This tells you the prediction equation does explain a significant proportion of
the variation in response time. You can reject the null hypothesis and accept the
research hypothesis that at least one of the X variables has an effect on the
response time, on average, in the population.
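The F statistic can be recomputed directly from R² with n = 10 cases and k = 2 explanatory variables; a minimal sketch (the small discrepancy from the quoted 57.71 comes from using the rounded R² = 0.943):

```python
# F statistic from the coefficient of determination:
# F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
n, k = 10, 2          # cases and explanatory variables
r_squared = 0.943     # rounded R^2 from the regression output

f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
print(round(f_stat, 1))   # close to the quoted 57.71
```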
f. Yes, both are significant. The t statistics are 2.98 for Users and 4.25 for Load.
Each additional user increases the response time by 0.0884 seconds, on average,
when the load is held steady. Each one percentage point increase in the load
increases the response time by 0.0640 seconds, on average, for a fixed number
of users. (The p-values are 0.020 and 0.004 respectively).
g. Standardized regression coefficients:
Users: (0.088416)(8.168367/1.701270) = 0.4245.
Load: (0.063974)(16.08858/1.701270) = 0.6050.
The standardized regression coefficients suggest that the load has a larger effect
on response time than does the number of users, since the standardized
regression coefficient for load (0.6050) is greater than that for users (0.4245). It
is interesting to note that the raw, unstandardized regression coefficients suggest
the opposite, but they should not be compared directly because they are in different
measurement units.
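These standardized coefficients follow the usual recipe of multiplying each raw coefficient by the ratio of that X variable's standard deviation to Y's standard deviation; a quick check using the values above:

```python
# Standardized coefficient = raw coefficient * (SD of X) / (SD of Y)
sd_time = 1.701270                        # SD of Y (response time)
coefs = {"Users": (0.088416, 8.168367),   # (raw coefficient, SD of X)
         "Load":  (0.063974, 16.08858)}

std_coefs = {name: b * sd_x / sd_time for name, (b, sd_x) in coefs.items()}
print({name: round(v, 4) for name, v in std_coefs.items()})
```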
12. a. The coefficient of determination is R2 = 22.4%, the percentage of the
variability in salary that is explained by company sales and ROE.
b. The regression coefficient for sales (in a multiple regression to predict salary
from sales and ROE) is 38.17, indicating that an extra $100 million in sales
(since sales are reported in millions of dollars) is associated with an increase of
$3,817 in salary. This is statistically significant (the 95% confidence interval for
the regression coefficient, from 15.01 to 61.33 does not include 0; the t statistic
is 3.32, and the p value is 0.0018).
c. The regression coefficient for ROE (in a multiple regression to predict salary
from sales and ROE) is 560,067, indicating that an extra percentage point of
ROE (since ROE is represented in Excel as a proportion from 0 to 1, where a
percentage point is 0.01) is associated with an increase of $5,601 in salary. This
is not statistically significant (the 95% confidence interval for the regression
coefficient, from –477,359 to 1,597,494, includes 0; the t statistic is 1.09, and
the p value is 0.282).
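The per-unit scaling in parts b and c is just a change of measurement units; a minimal sketch:

```python
# Sales coefficient: salary dollars per million dollars of sales
sales_coef = 38.17
print(round(sales_coef * 100))    # effect of an extra $100 million in sales

# ROE coefficient: salary dollars per unit of ROE (a proportion, 0 to 1)
roe_coef = 560_067
print(round(roe_coef * 0.01))     # effect of one extra percentage point of ROE
```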
d. The largest salary is $3,461,000 for Kerry Killinger of Washington Mutual. The
largest predicted salary is $2,934,278 for Philip Condit of Boeing (whose actual
salary is $3,337,431). The largest residual is $1,887,197 for Steven Appleton of
Micron Technology (whose actual salary is $3,198,620).
Writing is an exercise for the student; here are some observations. The largest
residual is for a highly paid CEO (at the 95th percentile, overall, for salary) for a
firm with sales at the 79th percentile. It is reasonable that the largest predicted
salary would be at Boeing, which is the company with the highest sales, because
the regression equation predicts salary from sales (and also from ROE, which is
not a statistically significant predictor here).
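The residual arithmetic in part d can be checked directly (residual = actual − predicted), using only the figures quoted above:

```python
# residual = actual salary - predicted salary
condit_actual, condit_predicted = 3_337_431, 2_934_278
print(condit_actual - condit_predicted)      # Condit's residual

# a residual and actual salary imply the predicted salary
appleton_actual, appleton_residual = 3_198_620, 1_887_197
print(appleton_actual - appleton_residual)   # Appleton's predicted salary
```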
22. a. Predicted Circuit Miles = 101.8127+(0.986053)(Investment).
b. [Diagnostic plot: residual values vs. predicted circuit miles (billions).]
This diagnostic plot shows extreme unequal variability. A transformation may
fix the problem.
c. Investments and circuit miles with their natural logarithms:
Company                      Investment   ln(Investment)   Circuit Miles   ln(Circuit Miles)
AT&T                         $1,300       7.1701           1,700           7.4384
MCI                          $500         6.2146           650             6.4770
GTE                          $130         4.8675           110             4.7005
United Telecommunications    $2,000       7.6009           1,200           7.0901
Fibertrak                    $1,200       7.0901           2,400           7.7832
LDX Net                      $110         4.7005           165             5.1059
Electra Communications       $40          3.6889           72              4.2767
Microtel                     $60          4.0943           45              3.8067
Litel Telecommunications     $57          4.0431           85              4.4427
Lightnet                     $500         6.2146           650             6.4770
SoutherNet                   $90          4.4998           50              3.9120
RCI                          $90          4.4998           87              4.4659
d. Predicted Log of Circuit Miles = 0.068032 + 1.007349 Log of Investment.
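The fitted log-log equation in part d can be reproduced from the raw investment and circuit-mile figures with an ordinary least-squares fit; a pure-Python sketch:

```python
import math

# (investment, circuit miles) for the twelve companies, as tabulated above
data = [(1300, 1700), (500, 650), (130, 110), (2000, 1200),
        (1200, 2400), (110, 165), (40, 72), (60, 45),
        (57, 85), (500, 650), (90, 50), (90, 87)]

x = [math.log(inv) for inv, _ in data]    # natural log of investment
y = [math.log(mi) for _, mi in data]      # natural log of circuit miles

# least-squares slope and intercept
n = len(data)
x_bar, y_bar = sum(x) / n, sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
print(round(intercept, 4), round(slope, 4))
```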
e. [Diagnostic plot: residual values vs. predicted logarithm of circuit miles.]
No further corrective action is needed. The graph shows no remaining relationship
in the data, only random untilted scatter with equal variability.
f. Since both circuit miles and investment are now in natural logs, the regression
coefficient gives the elasticity of Y with respect to X. This is the expected
percentage change in Y associated with a 1% increase in X. The regression
coefficient tells you that for every 1% increase in investment you will expect to
get a 1.007349% increase in circuit miles, on average.
g. The 95% confidence interval extends from 0.7935 to 1.2212, based on the
regression coefficient (1.007349), its standard error (0.095997), and the t table
value (2.228) with n–k–1 = 12–1–1 = 10 degrees of freedom.
h. Yes, because the regression coefficient is positive (1.0073) and its confidence
interval (0.7935 to 1.2212) does not include 0. Alternatively, the t statistic
(10.49) exceeds the t table value (2.228) with 10 degrees of freedom.
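The interval in part g and the t statistic in part h follow directly from the quoted coefficient, standard error, and t table value; a minimal sketch:

```python
coef, se = 1.007349, 0.095997
t_table = 2.228                  # two-sided 95%, 10 degrees of freedom

ci_low = coef - t_table * se     # lower 95% confidence limit
ci_high = coef + t_table * se    # upper 95% confidence limit
t_stat = coef / se               # t statistic for testing against 0
print(round(ci_low, 4), round(ci_high, 4), round(t_stat, 2))
# -> 0.7935 1.2212 10.49
```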
i. There are extra expenses per circuit mile estimated for larger projects because
the regression coefficient (1.0073) is larger than 1. There is no indication of
economies of scale.
These estimated extra expenses are not statistically significant because the
reference value of 1 is found in the confidence interval which extends from
0.7935 to 1.2212. We accept the null hypothesis. The regression coefficient is
not significantly different from 1. This is a weak conclusion. We have not
proved that there are no extra expenses, merely that we do not have convincing
evidence of such extra expenses.
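The comparison against the reference value 1 in part i amounts to a t test centered at 1 rather than at 0; a quick sketch:

```python
coef, se = 1.007349, 0.095997
t_table = 2.228                  # two-sided 95%, 10 degrees of freedom

t_vs_one = (coef - 1) / se       # test against the reference value 1
print(round(t_vs_one, 3), abs(t_vs_one) > t_table)   # far from significant
```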
26. a. Yes, the F test (tested using R2) is significant; the X variables (sales and industry
group) explain a significant proportion of the variability in CEO salaries. The
coefficient of determination 0.423 exceeds the critical value 0.190 in the R2
table for n = 49 cases and k = 4 explanatory variables.
b. On average, CEO salary increases $14.93 for each million dollars of additional
sales, all else equal. This is the regression coefficient for Sales, which gives
this effect in thousands of CEO salary dollars per million dollars in sales, times
1,000 to convert to dollars.
c. Yes, the t test for the regression coefficient for sales is statistically significant (t
= 4.90). The confidence interval (from 0.008958 to 0.020902) does not include
0 (based on the coefficient 0.014930, its standard error 0.003047, and the t table
value 1.960 with n–k–1 = 49–4–1 = 44 degrees of freedom). Alternatively, the t
statistic is 4.90, which exceeds the t table value (1.960).
In practical terms you conclude that for these types of industries, the chief
executive officers who are helping to generate larger sales tend to receive larger
salaries, on average.
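Part c's confidence interval and t statistic can be reproduced from the quoted coefficient, standard error, and t table value; a minimal sketch:

```python
coef, se = 0.014930, 0.003047
t_table = 1.960                  # two-sided 95%, 44 degrees of freedom

ci = (coef - t_table * se, coef + t_table * se)
t_stat = coef / se
print(tuple(round(v, 6) for v in ci), round(t_stat, 2))
# -> (0.008958, 0.020902) 4.9
```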
d. We estimate that the CEO of a bank receives $135,550 less (because the
coefficient is negative) than does the CEO of an automotive firm, on average.
This is the proper interpretation of the regression coefficient (–135.550, in
thousands of dollars) for an
indicator variable (bank) compared to the baseline (automotive).
e. No (t = 0.77). On average the salary of a banker is not significantly different
from an automotive CEO, adjusting for sales. The confidence interval extends
from –483 to 212 and includes 0, indicating that there is no significant
difference. The t statistic is 0.77 which is smaller than the 1.960 which would be
needed to establish significance.
In practical terms, this says that there is no significant “industry difference”
between automotive firms and banks in the structure of CEO compensation.
There is no evidence here that an automotive firm and a bank, with the same
level of sales, will have any systematic difference in CEO salary. Of course, if
one industry group tends to have higher sales than the other, we would expect
its CEOs to receive higher salaries. But after adjusting for sales, we do not find
an average salary difference.
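As a consistency check on part e, the standard error can be backed out of the quoted confidence interval (–483 to 212) and the t statistic recomputed; the small gap from the quoted 0.77 comes from rounding the interval endpoints:

```python
ci_low, ci_high = -483, 212
t_table = 1.960                          # two-sided 95% t value
coef = -135.550                          # coefficient for the bank indicator

se = (ci_high - ci_low) / 2 / t_table    # half-width divided by t value
t_stat = coef / se
print(round(se, 1), round(t_stat, 2))    # |t| close to the quoted 0.77
```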