# chap13_part2 by dharma1001


```									                                                            Solutions to End-of-Section and Chapter Review Problems     325

13.73   (h)
[Figure: E.R.A. Residual Plot (residuals versus E.R.A.)]

Based on a visual inspection of the graphs of the distribution of studentized residuals
and the residuals versus E.R.A., there is no pattern. The model appears to be adequate.
(i)     p-value = 7.28222E-05 < 0.05. Reject H0. There is evidence that the fitted linear
regression model is useful.
(j)     81.5457 ≤ μY|X ≤ 88.2396
(k)     68.8964 ≤ YI ≤ 100.8889
(l)     -21.7459 ≤ β1 ≤ -8.4395
(m)                       The “population” might be considered to be all the recent years in which baseball has
been played.
(n)     Other independent variables that might be considered for inclusion in the model are (i)
runs scored, (ii) hits allowed, (iii) walks allowed, (iv) number of errors, etc.

13.74   (a)
[Figure: Scatter Diagram (price per person versus sum of ratings)]
326     Chapter 13: Simple Linear Regression

13.74    (b)     Ŷ = -24.2468 + 1.0464X
(c)     Since no restaurant will receive a summated rating of 0, it is not meaningful to
interpret b0. For each additional unit of increase in summated rating, the estimated
average price per person will increase by \$1.0464.
(d)     Ŷ = -24.2468 + 1.0464(50) = \$28.07
(e)     SYX = 6.0352.
(f)     r² = 0.6584. 65.84% of the variation in price per person can be explained by the
variation in summated rating.
(g)     r = √r² = √0.6584 = 0.8114.
(h)
[Figure: Summated Rating Residual Plot (residuals versus summated rating)]

Based on a visual inspection of the residual plot of summated rating, there may be a
nonlinear relationship between price per person and summated rating. A quadratic
model may be more appropriate for the data.
(i)     The p-value is virtually 0. Reject H0. There is very strong evidence to conclude that there
is a linear relationship between price per person and summated rating.
(j)     \$26.55 ≤ μY|X ≤ \$29.59
(k)     \$16.00 ≤ YI ≤ \$40.15
(l)     0.8953 ≤ β1 ≤ 1.1975
(m)     The linear regression model appears to have provided an adequate fit and shown a
significant linear relationship between price per person and summated rating. Since
65.84% of the variation in price per person can be explained by the variation in
summated rating, summated rating is moderately useful in predicting price per person. Given
the parabolic pattern in the summated rating residual plot, a quadratic regression
model may perform better in predicting price per person using summated rating.

13.75   (a)
[Figure: Scatter Plot (sales versus income)]

(b) b0 = 299876.8059, b1 = 39.1698
Ŷ = 299876.8059 + 39.1698X
(c) Since median family income of customer base cannot be 0, b0 just captures the portion of
the latest one-month sales total that varies with factors other than income.
b1 = 39.1698 means that as the median family income of customer base increases by
one dollar, the estimated average latest one-month sales total will increase by \$39.17.
(d) SYX = 849860.17
(e) r² = 0.1472. 14.72% of the total variation in the franchise's latest one-month sales total
can be explained by using the median family income of customer base.
(f) r = √r² = 0.3837. There is not a very strong positive linear relationship between latest
one-month sales total and median family income of customer base.

13.75    (g)
[Figure: Income Residual Plot (residuals versus income)]

There is a slight increase in the variance of the residuals at the higher end of the
median family income. In general, however, the assumption of homoscedasticity
seems to be intact.
[Figure: Histogram of residuals (frequency and cumulative %)]


The histogram suggests neither severe asymmetry nor abnormally extreme
observations, so the normality assumption also seems to be intact.
(h)     H0: ρ = 0          H1: ρ ≠ 0
Test statistic: t = r / √((1 - r²)/(n - 2)) = 2.4926
Decision rule: Reject H0 when |t| > 2.0289.
Decision: Since t = 2.4926 is above the upper critical bound 2.0289, reject H0.
There is enough evidence to conclude that there is a linear relationship between
one-month sales total and median income of customer base.
(i)     b1 ± t(n-2) Sb1 = 39.1697 ± 2.0281(15.7143)          7.2995 < β1 < 71.0400

13.75   (j)   (a)
[Figure: Scatter Plot (sales versus age)]

(b)     Ŷ = 931626.16 + 21782.76X
(c)     Since median age of customer base cannot be 0, b0 just captures the portion of
the latest one-month sales total that varies with factors other than age.
b1 = 21782.76 means that as the median age of customer base increases by
one year, the estimated average latest one-month sales total will increase by
\$21782.76.
(d)     SYX = 919492.84
(e)     r² = 0.0017. Only 0.17% of the total variation in the franchise's latest
one-month sales total can be explained by using the median age of customer base.
(f)     r = √r² = 0.0413. There is essentially no linear relationship between latest
one-month sales total and median age of customer base.
(g)

[Figure: Age Residual Plot (residuals versus age)]

The residuals are very evenly spread out across the range of median
age.

13.75    (j)    (h)     H0: ρ = 0          H1: ρ ≠ 0
Test statistic: t = r / √((1 - r²)/(n - 2)) = 0.2482
Decision rule: Reject H0 when |t| > 2.0289.
Decision: Since t = 0.2482 is below the upper critical bound 2.0289, do not
reject H0. There is not enough evidence to conclude that there is a linear
relationship between one-month sales total and median age of customer base.
(i)     b1 ± t(n-2) Sb1 = 21782.76354 ± 2.0281(87749.63)
-156181.50 < β1 < 199747.02
(k)    (a)
[Figure: Scatter Diagram (sales versus HS, the percentage of customer base with a high school diploma)]

There appears to be some positive linear relationship between total sales and
percentage of customer base with high school diploma.
(b)     Ŷ = -2969741.23 + 59660.09X
(c)     Since the percent of customer base with high school diploma cannot be 0, b0
just captures the portion of the latest one-month sales total that varies with
factors other than HS. b1 = 59660.09 means that for each additional percentage
point of the customer base with a high school diploma, the estimated average
latest one-month sales total will increase by \$59660.09.
(d)     SYX = 802003.81
(e)     r² = 0.2405. 24.05% of the total variation in the franchise's latest
one-month sales total can be explained by the percentage of customer base with a
high school diploma.
(f)     r = √r² = 0.4904. There is some positive linear relationship between latest
one-month sales total and percentage of customer base with a high school
diploma.

13.75   (k)   (g)
[Figure: HS Residual Plot (residuals versus HS)]

The residual plot suggests there might be a violation of the homoscedasticity
assumption since the variance of the residuals increases as the percentage of
customer base with a high school diploma increases.
(h)     H0: ρ = 0          H1: ρ ≠ 0
Test statistic: t = r / √((1 - r²)/(n - 2)) = 3.3766
Decision rule: Reject H0 when |t| > 2.0289.
Decision: Since t = 3.3766 is above the upper critical bound 2.0289, reject
H0. There is enough evidence to conclude that there is a linear relationship
between one-month sales total and percentage of customer base with a high
school diploma.
(i)     b1 ± t(n-2) Sb1 = 59660.09 ± 2.0281(17668.885)          23825.98 < β1 < 95494.21
(l)   (a)
[Figure: Scatter Diagram (sales versus College, the percentage of customer base with a college diploma)]

There is a positive linear relationship between total sales and percentage of
customer base with a college diploma.
(b)     Ŷ = 789847.38 + 35854.15X

13.75    (l)    (c)     Since the percent of customer base with college diploma cannot be 0, b0 just
captures the portion of the latest one-month sales total that varies with factors
other than College. b1 = 35854.15 means that for each additional percentage point
of the customer base with a college diploma, the estimated average latest
one-month sales total will increase by \$35854.15.
(d)     SYX = 871329.93
(e)     r² = 0.1036. 10.36% of the total variation in the franchise's latest
one-month sales total can be explained by the percentage of customer base with a
college diploma.
(f)     r = √r² = 0.3218. There is some positive linear relationship between latest
one-month sales total and percentage of customer base with a college
diploma.
(g)

[Figure: College Residual Plot (residuals versus College)]

The residuals are quite evenly spread out around zero even though there
might be a slight tendency for the variance to increase as the percentage of
customer base with a college diploma increases.
(h)     H0: ρ = 0          H1: ρ ≠ 0
Test statistic: t = r / √((1 - r²)/(n - 2)) = 2.0392
Decision rule: Reject H0 when |t| > 2.0289.
Decision: Since t = 2.0392 is above the upper critical bound 2.0289, reject
H0. There is enough evidence to conclude that there is a linear relationship
between one-month sales total and percentage of customer base with a
college diploma.
(i)     b1 ± t(n-2) Sb1 = 35854.15 ± 2.0281(17582.269)          195.75 < β1 < 71512.60

13.75   (m)   (a)
[Figure: Scatter Diagram (sales versus Growth, the annual population growth rate)]

It is not obvious that there is any linear relationship between total sales and
annual population growth rate of customer base over the past 10 years.
(b)     Ŷ = 1595571.48 + 26833.54X
(c)     b0 = 1595571 means the estimated average latest one-month sales total is
\$1595571 when the annual population growth rate of customer base over the
past 10 years is zero. b1 = 26833.54 means that as the annual population
growth rate increases by 1%, the estimated average latest one-month sales
total will increase by \$26833.54.
(d)     SYX = 914466.58
(e)     r² = 0.0126. Only 1.26% of the total variation in the franchise's latest
one-month sales total can be explained by the annual population growth rate of
customer base over the past 10 years.
(f)     r = √r² = 0.1122. If there is any linear relationship between latest
one-month sales total and annual population growth rate of customer base over
the past 10 years, it will be a very weak positive relationship.

13.75    (m)    (g)

[Figure: Growth Residual Plot (residuals versus Growth)]

There seems to be a diamond-shaped pattern in the residuals and,
hence, a violation of the homoscedasticity assumption. The variance is larger
the closer the growth rate is to zero.
(h)     H0: ρ = 0          H1: ρ ≠ 0
Test statistic: t = r / √((1 - r²)/(n - 2)) = 0.6776
Decision rule: Reject H0 when |t| > 2.0289.
Decision: Since t = 0.6776 is below the upper critical bound 2.0289, do not
reject H0. There is not enough evidence to conclude that there is a linear
relationship between one-month sales total and the annual population growth
rate of customer base over the past 10 years.
(i)     b1 ± t(n-2) Sb1 = 26833.54 ± 2.0281(39601.427)
-53481.77 < β1 < 107148.86
(n)    Percentage of customer base with a high school diploma is the best single predictor of
sales at an individual store location, since its model has the highest r² (24.05%)
among the models considered.

13.76   (a)
[Figure: Scatter Diagram (% passing versus % attendance)]

There is a very obvious positive relationship between % of students passing the
proficiency test and the daily average of the percentage of students attending class.
(b)   b0 = -771.5868, b1 = 8.8447
Ŷ = -771.5868 + 8.8447X
(c)   Since it does not make sense for % attendance to be zero, b0 = -771.5868 should be
interpreted as the portion of % passing the proficiency exam that varies with
factors other than % attendance. b1 = 8.8447 implies that as daily average
percentage of students attending class increases by 1%, the estimated average
percentage of students passing the ninth-grade proficiency test will increase by
8.8447%.
(d)   SYX = 10.5787
(e)   r² = 0.6024. 60.24% of the total variation in % passing the proficiency test can be
explained by % attendance.
(f)   r = √r² = 0.7762. There is a rather strong positive linear relationship between %
of students passing the proficiency test and daily average of the % of students
attending class.

13.76    (g)
[Figure: % Attendance Residual Plot (residuals versus % attendance)]

The residuals are evenly distributed across the range of % attendance. There
is no obvious violation of the homoscedasticity assumption.

[Figure: Histogram of Residuals (frequency and cumulative % by bin)]

The distribution of the residuals is left skewed. However, with the exception of 2
extremely negative residuals, the histogram is not too badly skewed.

(h)     H0: ρ = 0          H1: ρ ≠ 0
Test statistic: t = r / √((1 - r²)/(n - 2)) = 8.2578
Decision rule: Reject H0 when |t| > 2.014.
Decision: Since t = 8.2578 is above the upper critical bound 2.014, reject H0. There
is enough evidence to conclude that there is a linear relationship between % passing
and % attendance.

(i)     b1 ± t(n-2) Sb1 = 8.8447 ± 2.0141(1.0711)          6.6874 < β1 < 11.0020

13.76   (j)   (a)
[Figure: Scatter Diagram (% passing versus salary)]

There seems to be a slightly positive relationship between % of students
passing the proficiency test and average teacher salary.
(b)     Ŷ = 23.065 + 0.0011X
(c)     Since it does not make sense for teacher salary to be zero, b0 = 23.065 should
be interpreted as the portion of % passing the proficiency exam that
varies with factors other than teacher salary. b1 = 0.0011 implies that as
average teacher salary increases by \$1, the estimated average percentage of
students passing the ninth-grade proficiency test will increase by 0.0011%.
(d)     SYX = 16.3755
(e)     r² = 0.0474. Only 4.74% of the total variation in % passing the proficiency
test can be explained by average teacher salary.
(f)     r = √r² = 0.2177. There seems to be a rather weak positive linear
relationship between % of students passing the proficiency test and average
teacher salary.

13.76    (j)    (g)
[Figure: Salary Residual Plot (residuals versus salary)]

The residuals are evenly distributed across the range of average teacher salary. There
is no obvious violation of the homoscedasticity assumption.

[Figure: Histogram of Residuals (frequency and cumulative % by bin)]

The distribution of the residuals is slightly left skewed but not too far from a normal
distribution.

(h)     H0: ρ = 0          H1: ρ ≠ 0
Test statistic: t = r / √((1 - r²)/(n - 2)) = 1.496
Decision rule: Reject H0 when |t| > 2.014.
Decision: Since t = 1.496 is below the upper critical bound 2.014, do not
reject H0. There is not enough evidence to conclude that there is a linear
relationship between % passing and average teacher salary.
(i)     b1 ± t(n-2) Sb1 = 0.0011 ± 2.0141(0.00073)          -0.000375 < β1 < 0.002542

13.76   (k)   (a)
[Figure: Scatter Diagram (% passing versus spending)]

There seems to be a slightly positive relationship between % of students
passing the proficiency test and spending per pupil.
(b)     Ŷ = 35.7843 + 0.0109X
(c)     Since it does not make sense for spending per pupil to be zero, b0 = 35.7843
should be interpreted as the portion of % passing the proficiency exam that
varies with factors other than spending. b1 = 0.0109 implies that as
spending per pupil increases by \$1, the estimated average percentage of
students passing the ninth-grade proficiency test will increase by 0.0109%.
(d)     SYX = 15.9984
(e)     r² = 0.0907. Only 9.07% of the total variation in % passing the proficiency
test can be explained by spending per pupil.
(f)     r = √r² = 0.3012. There seems to be a rather weak positive linear
relationship between % of students passing the proficiency test and spending
per pupil.

[Figure: Spending Residual Plot (residuals versus spending)]

The residuals are evenly distributed across the range of spending per pupil.
There is no obvious violation of the homoscedasticity assumption.

13.76    (k)    (g)
[Figure: Histogram of Residuals (frequency and cumulative % by bin)]

The distribution of the residuals is slightly left skewed. Excluding the largest
negative residual, the histogram is quite symmetric.
(h)     H0: ρ = 0          H1: ρ ≠ 0
Test statistic: t = r / √((1 - r²)/(n - 2)) = 2.1192
Decision rule: Reject H0 when |t| > 2.014.
Decision: Since t = 2.1192 is above the upper critical bound 2.014, reject
H0. There is enough evidence to conclude that there is a linear relationship
between % passing and spending.
(i)     b1 ± t(n-2) Sb1 = 0.0109 ± 2.0141(0.0052)          0.00054 < β1 < 0.02129
(l)    The model with % attendance is the best model to use for predicting % passing since
it has the highest R-square of 60.24%.

13.77    (a)     Ŷ = 0.2141 + 0.4738X
(b)    For Exxon, the estimated value of its stock will increase by 0.47% on average when the
S&P 500 index increases by 1%.
(c)    (a)     Ŷ = 0.6064 + 0.3607X
(b)     For Mobil Oil, the estimated value of its stock will increase by 0.36% on
average when the S&P 500 index increases by 1%.
(d)    (a)     Ŷ = -0.2101 + 0.2068X
(b)     For International Aluminum, the estimated value of its stock will increase by
0.21% on average when the S&P 500 index increases by 1%.
(e)    (a)     Ŷ = -0.4182 + 0.5131X
(b)     For Sears, the estimated value of its stock will increase by 0.51% on average
when the S&P 500 index increases by 1%.
(f)    (a)     Ŷ = -0.4471 + 1.4286X
(b)     For BancOne Corporation, the estimated value of its stock will increase by
1.43% on average when the S&P 500 index increases by 1%.
(g)    (a)     Ŷ = 0.0361 + 0.9430X
(b)     For General Motors, the estimated value of its stock will increase by 0.94%
on average when the S&P 500 index increases by 1%.

13.78   (a)

          GM          Ford        IAL         HCR
GM        1
Ford      0.856954    1
IAL       -0.06409    0.194917    1
HCR       -0.4403     -0.08868    0.598569    1
(b)    There is a strong positive correlation of r = 0.8570 between the stock prices of GM
and Ford, a relatively strong positive correlation of r = 0.5986 between the
prices of IAL and HCR, a moderately negative correlation of r = -0.4403 between the
prices of GM and HCR, a weak positive correlation of r = 0.1949 between the prices
of Ford and IAL, a very weak negative correlation of r = -0.0887 between the prices
of Ford and HCR, and a very weak negative correlation of r = -0.0641 between the
prices of GM and IAL.
(c)    It is not a good idea to have all the stocks in an individual's portfolio be strongly
positively correlated, because that increases the variance (a measure of risk) of the
portfolio. Including some negatively correlated stocks in a portfolio can reduce the
combined variance, though at the price of a reduced combined expected return.

13.79   (a)    r = SSXY / √(SSX · SSY) = -0.2196
(b)    H0: ρ = 0          H1: ρ ≠ 0
Test statistic: t = r / √((1 - r²)/(n - 2)) = -1.7146
Decision rule: Reject H0 if the p-value is less than the level of significance
α = 0.05.
Decision: Since the p-value of 0.0918 is greater than the level of significance, do not
reject the null hypothesis. There is not enough evidence to conclude that there is a
linear relationship between the two variables.
(c)    The 10% level of significance is greater than the p-value of 0.0918 and, hence, the
null hypothesis will be rejected. There is enough evidence at the 10% level of
significance to conclude that there is a linear relationship between the two variables.

The Springville Herald Case

SH13.1        The method was placing too much emphasis on the last two time periods instead of
considering a large amount of data that was available. This overemphasis would tend
to lead to large fluctuations in the forecast accuracy when the trend differed from
what had occurred in the last two months.

SH13.2        There are many factors that might be considered other than the number of outlets.
Among them are:
1. The amount of newspaper advertising for new subscriptions.
3. Whether or not special promotional campaigns were being used in a particular
month.
4. The average experience of the telemarketers used in a given month.
5. The training programs provided to the telemarketers in a given month.

SH13.3 (a)    The statistical model initially fit to the data is a simple linear model that predicts
new subscriptions from telemarketing hours:
New Subscriptions = -413.82 + 4.40795 (Telemarketing Hours)

The Y intercept (b0), equal to -413.82, has no direct interpretation since a negative
number of new subscriptions is not feasible. We can say, however, that -413.82
represents the portion of the predicted average new subscriptions that varies
with factors other than the number of telemarketing hours.

The slope (b1), equal to 4.40795, can be interpreted to mean that for each increase of
one telemarketing hour, average new subscriptions are predicted to increase by
4.40795 per month.

The r² value of 0.8617 can be interpreted to mean that 86.17% of the variation in new
subscriptions can be explained by variation in the number of telemarketing hours
from month to month.

Since the data were collected for 24 consecutive months, it is important for us to
determine whether there is any autocorrelation among the residuals. The Durbin-Watson
D statistic of 1.7009 is greater than 1.45, the upper critical value for n = 24 and α = 0.05. This
leads us to believe that there isn't any positive autocorrelation among the residuals. We
could also examine the plot of the standardized residuals over time to see whether a
pattern existed. In this case this plot does not seem to indicate any evidence of positive
association among consecutive residuals.

Before moving further with any predictions based on the model, we need to
determine whether the simple linear model was appropriate for these data. The
residual plot of the standardized residuals versus the fitted Y values (or the X values)
indicates no evidence of any pattern. Thus, we may conclude that the simple linear
model was appropriate for these data.

We may examine the validity of the assumption of normality among the residuals by
studying the normal probability plot. If the points in this plot fall on an approximate
straight line, there would be no reason to suspect serious departure from normality.
Since that seems to be the case here, the normality assumption does not appear to
have been seriously violated.
(b)   For X = 1,000, the predicted value of
New Subscriptions = -413.82 + 4.40795 (1000) = 3,994.1

(c)   We note that the value of 2,000 for X is beyond the range of our X values. Thus, we
would be assuming that the regression model fit for a range of 704 to 1,498
telemarketing hours would be valid for a month in which the telemarketing hours was
2,000. This represents a situation which is well beyond the range of the number of
telemarketing hours that we have used in the past 24 months. Many things could
change with this increase of telemarketing hours, and the type of increase in new
subscriptions that we have had with expanding telemarketing hours may not
continue when we further expand by roughly a third (2,000 versus 1,498 hours) above
the maximum amount that we have used in the past.

```
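
Several of the figures quoted in the solutions above can be rechecked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original solutions: the function names are hypothetical, and every number passed in is copied from the text (13.74 (g), 13.75 (i), SH13.3 (b) and (c)). It relies on the fact that, in simple linear regression, the t statistic for the slope, b1 / Sb1, equals the t statistic for the test of ρ = 0.

```python
import math


def slope_confidence_interval(b1, t_crit, s_b1):
    """Return (lower, upper) for the interval b1 +/- t_crit * s_b1."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin


def slope_t_statistic(b1, s_b1):
    """t statistic for H0: beta1 = 0; in simple regression this equals
    r / sqrt((1 - r^2) / (n - 2)), the statistic for H0: rho = 0."""
    return b1 / s_b1


def r_from_r2(r2, b1):
    """Correlation coefficient: sqrt(r^2), carrying the sign of the slope b1."""
    return math.copysign(math.sqrt(r2), b1)


def predict(b0, b1, x, x_min, x_max):
    """Point prediction b0 + b1 * x, plus a flag that is False when x lies
    outside the fitted range [x_min, x_max] (i.e. the prediction extrapolates)."""
    in_range = x_min <= x <= x_max
    return b0 + b1 * x, in_range


# 13.75 (i), income model: b1, critical t and Sb1 as quoted in the text
lower, upper = slope_confidence_interval(39.1697, 2.0281, 15.7143)
t_income = slope_t_statistic(39.1697, 15.7143)

# 13.74 (g): r from r^2, with the sign taken from the positive slope 1.0464
r_rating = r_from_r2(0.6584, 1.0464)

# SH13.3 (b) and (c): the model was fit for 704 to 1,498 telemarketing hours
subs_1000, ok_1000 = predict(-413.82, 4.40795, 1000, 704, 1498)
subs_2000, ok_2000 = predict(-413.82, 4.40795, 2000, 704, 1498)

print(round(lower, 4), round(upper, 4), round(t_income, 4))
print(round(r_rating, 4))
print(round(subs_1000, 2), ok_1000)
print(round(subs_2000, 2), ok_2000)
```

The range flag returned by `predict` mirrors the caution in SH13.3 (c): the formula still produces a number for 2,000 hours, but the flag marks it as an extrapolation beyond the data used to fit the model.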