
# Stat Homework Solution by liaoqinmei


```
Stat220E Homework 4 Solution

January 20, 2005

1    Problem 2.40, p.125 (15 pts)

(a) 5 pts

regress MD poverty

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(  1,    49) =    1.68
       Model |  12248.6108     1  12248.6108           Prob > F      =  0.2013
    Residual |  357773.311    49  7301.49614           R-squared     =  0.0331
       Total |  370021.922    50  7400.43843           Root MSE      =  85.449

------------------------------------------------------------------------------
          MD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |   4.583547   3.538867     1.30   0.201    -2.528072    11.69517
       _cons |   180.7915   45.48397     3.97   0.000       89.388    272.1949
------------------------------------------------------------------------------

predict yhat

graph twoway scatter MD yhat poverty, connect(i L) clpattern(solid solid) msymbol(O i)

tabstat MD, stats(mean sd min p25 p50 p75 max iqr)

    variable |      mean        sd       min       p25       p50       p75       max       iqr
-------------+--------------------------------------------------------------------------------
          MD |  237.6275   86.0258       150       196       221       244       702        48
----------------------------------------------------------------------------------------------

[Figure: scatter plot of M.D.'s per 100,000 against Percent poverty, with the fitted regression line (Fitted values).]
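From the output above, the least-squares regression line is \widehat{MD} = 180.79 + 4.58\,\text{poverty}. The slope
is positive, which is surprising: we would expect states with a higher percentage of people in poverty to have
fewer doctors per 100,000 people, not more.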
(b) 5 pts
Circle the point whose MD value is near 700 (D.C., with MD = 702).
This point has an extremely high MD value compared with the other points: the median MD is 221 and 75% of
the values are at or below 244, but this point is over 700. Since least-squares regression minimizes the sum of
squared residuals, a point with such a large residual pulls the line strongly toward itself, so it can change the
fitted line substantially.
(c) 5 pts

regress MD poverty if MD!=702

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =    2.85
       Model |  8421.53931     1  8421.53931           Prob > F      =  0.0976
    Residual |  141645.681    48  2950.95168           R-squared     =  0.0561
       Total |   150067.22    49  3062.59633           Root MSE      =  54.323

------------------------------------------------------------------------------
          MD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |  -4.175416   2.471639    -1.69   0.098     -9.14498    .7941475
       _cons |   279.2884   31.12203     8.97   0.000     216.7134    341.8635
------------------------------------------------------------------------------

predict yhat_DC

graph twoway scatter MD yhat yhat_DC poverty, connect(i L L) clpattern(solid shortdash solid) msymbol(O i i)
[Figure: scatter plot of M.D.'s per 100,000 against Percent poverty with two fitted lines: the part (a) line using all points (dashed) and the line fitted without D.C. (solid).]

After omitting D.C., the regression line is \widehat{MD} = 279.29 - 4.18\,\text{poverty}. This is a very big change:
the intercept rises from 180.79 to 279.29, and the slope changes sign from positive (4.58) to negative (-4.18),
as expected. The D.C. point was highly influential.

2    Problem 2.64, p.137 (14 pts)

Let's generate some data for drawing this plot. Students don't have to generate this kind of data; it is enough
to sketch a scatter plot that meets the requirements and explain it.

clear
set obs 60
gen year1 = exp(1.5 + 0.35*invnorm(uniform()))
gen year2 = exp(2.2 + 0.16*invnorm(uniform()))
* (the gen commands that create business and academic are not preserved in this copy)
twoway scatter business year1, msymbol(plus) xtitle("Education (in years)") ytitle("Income (\$)") || scatter academic year2, msymbol(Oh)

(7 pts)
[Figure: sketch of Income ($) against Education (in years); business economists are plotted with plus signs and academic economists with hollow circles.]

A scatter plot might look like this. Economists employed by business firms form the upper cluster, and those
employed by colleges and universities form the lower cluster. Within each group there is a strong positive
correlation (0.9026 for business, 0.9472 for academic), yet the overall correlation is negative (-0.2771). So
data with a scatter plot like this can have a negative overall correlation even though each group has an
obvious positive correlation.

Reading the plot from left to right along the x axis, we can divide it into three parts. In the left part, all
the points are from business firms; in the right part, all the points are from universities; in the center, the
two groups appear in about equal numbers. The mean income in the left part is clearly higher than in the right
part. Since the least-squares regression line minimizes the sum of squared errors over all the points, it
follows this overall downward shift and can have a negative slope. In terms of income and education:
businesses pay high salaries and employ mostly economists with bachelor's degrees, while colleges pay lower
salaries and employ mostly economists with doctorates, as illustrated in the plot above. (7 pts)

3     Problem 2.69 and 2.70, pp.141-142 (14 pts)

3.1    2.69

7 pts
Intelligence or support from parents may be a lurking variable. More generally, factors that lead students to
take more math in high school may also lead to more success in college. If such lurking variables are present,
then requiring students to take algebra and geometry may have little effect on success in college.

3.2    2.70

7 pts
A more plausible explanation is reverse causation: people use artificial sweeteners because they are already
heavier than most people. To control their weight, they use artificial sweeteners instead of sugar, so being
heavier leads to using sweeteners rather than the other way around.

4     Problem 3.16, p.186 (15 pts)
(a) 5 pts
The sample size = 13147 + 15182 + 1448 = 29777
(b) 5 pts
This is a voluntary response sample. People choose whether to vote in the poll. Therefore, the sample
isn’t randomly chosen. The result might be systematically biased even when the sample size is large.
(c) 5 pts
More men than women watch male-dominated sports, so men are probably overrepresented among the people who
respond. To pay female and male athletes equally, male athletes might have to be paid less, which could
reduce the quality of male athletes. Therefore, men will tend to vote "No".
5     Problem 3.55 and 3.56, p.205 (14 pts)

5.1   3.55

7 pts
There are 25 patients and three treatments, so the groups cannot all be the same size. First, we randomly
choose one group to receive 9 patients (here, Group 3); the other two groups receive 8 patients each.

The unbalanced randomized design

Random assignment
    Group 1 (8 patients) -> Treatment 1: Level 1
    Group 2 (8 patients) -> Treatment 2: Level 2
    Group 3 (9 patients) -> Treatment 3: Level 3
Compare the concentration of the drug in patients' blood

5.2    3.56

7 pts
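There are 25 patients and nine treatments (3 levels × 3 methods of administration). First, we randomly choose
two treatment groups to receive 2 patients each (here, {Level 1, Injection} and {Level 2, Injection}); the
other seven groups receive 3 patients each.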

The unbalanced randomized design

Random assignment
    Group 1 (2 patients) -> Treatment 1: Level 1, Injection
    Group 2 (3 patients) -> Treatment 2: Level 1, Skin patch
    Group 3 (3 patients) -> Treatment 3: Level 1, Intravenous drip
    Group 4 (2 patients) -> Treatment 4: Level 2, Injection
    Group 5 (3 patients) -> Treatment 5: Level 2, Skin patch
    Group 6 (3 patients) -> Treatment 6: Level 2, Intravenous drip
    Group 7 (3 patients) -> Treatment 7: Level 3, Injection
    Group 8 (3 patients) -> Treatment 8: Level 3, Skin patch
    Group 9 (3 patients) -> Treatment 9: Level 3, Intravenous drip
Compare the concentration of the drug in patients' blood

6   Problem 4.12, p.240 (14 pts)
(a) 2 pts
(b) 3 pts
S = {x days | x is a nonnegative integer}
(c) 3 pts
S = {A, B, C, D, F}
(d) 3 pts
Denote Acceptable by A and Unacceptable by U. Then, listing the outcomes by the number of U's,
S = {AAAA,
     UAAA, AUAA, AAUA, AAAU,
     AAUU, UUAA, AUAU, UAUA, AUUA, UAAU,
     AUUU, UAUU, UUAU, UUUA,
     UUUU}
which contains 16 outcomes.
(e) 3 pts
S = {0, 1, 2, 3, 4}

7   Problem 4.18, p.245 (14 pts)
(a) 4 pts
Each probability is between 0 and 1, and
0.130 + 0.147 + 0.059 + 0.042 + 0.002 + 0.419 + 0.159 + 0.042 = 1.
Thus, this is a legitimate probability model.
(b) 5 pts
P = 1 − 0.130 = 0.87

(c) 5 pts
P = 0.147 + 0.042 + 0.002 = 0.191

```
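The fit statistics that Stata prints next to each ANOVA table in Problem 2.40 can be recomputed from the sums of squares: R-squared = Model SS / Total SS, F = Model MS / Residual MS, and Root MSE = sqrt(Residual MS). The short Python sketch below is not part of the original Stata session (the helper name `anova_summary` is hypothetical); it simply plugs in the numbers copied from the two `regress` outputs, with and without D.C., as an arithmetic check.

```python
import math


def anova_summary(ss_model, df_model, ss_resid, df_resid):
    """Recompute R-squared, F and Root MSE from a regression ANOVA table."""
    ms_model = ss_model / df_model      # mean square for the model
    ms_resid = ss_resid / df_resid      # mean squared error
    r_squared = ss_model / (ss_model + ss_resid)
    f_stat = ms_model / ms_resid
    root_mse = math.sqrt(ms_resid)
    return round(r_squared, 4), round(f_stat, 2), round(root_mse, 3)


# Values copied from the Stata output in parts (a) and (c).
full = anova_summary(12248.6108, 1, 357773.311, 49)   # all 51 observations
no_dc = anova_summary(8421.53931, 1, 141645.681, 48)  # D.C. omitted
print(full)   # (0.0331, 1.68, 85.449)
print(no_dc)  # (0.0561, 2.85, 54.323)
```

The recomputed values match the Stata output, including the rise in R-squared from 0.0331 to 0.0561 once D.C. is dropped.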
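Part (b) of Problem 2.40 rests on the fact that least squares minimizes the sum of squared residuals, so a single point far above the others can pull the line toward itself. The Python sketch below illustrates this with small hypothetical toy data (the 51-state data set is not reproduced here): four made-up points lying exactly on a line with slope -2, plus one made-up outlier with a very large y value. The helper `least_squares` is a plain closed-form simple-regression fit written only for this illustration.

```python
def least_squares(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical toy data, not the textbook data set.
x = [5, 10, 15, 20]
y = [260, 250, 240, 230]
b0, b1 = least_squares(x, y)
print(f"without outlier: intercept {b0:.2f}, slope {b1:.2f}")  # -> 270.00, -2.00

# Add one made-up point with an extreme y value.
b0_out, b1_out = least_squares(x + [20], y + [700])
print(f"with outlier: intercept {b0_out:.2f}, slope {b1_out:.2f}")  # -> 131.76, 14.59
```

Dropping the outlier is what part (c) does with `if MD!=702`; in the toy data, as in the homework, the slope changes sign.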
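The point of Problem 2.64 is that the correlation within each group can be strongly positive while the correlation of the pooled data is negative. The Python sketch below checks this with hypothetical toy numbers (education in years, income in thousands of dollars); all values are made up and are not the simulated data behind the correlations 0.9026, 0.9472 and -0.2771 quoted in the solution. The lower-education, higher-income cluster stands in for business economists and the higher-education, lower-income cluster for academic economists.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (education in years, income in $1000s) values for two groups.
business_educ, business_income = [2, 3, 4, 5], [80, 86, 90, 100]
academic_educ, academic_income = [9, 10, 11, 12], [40, 46, 50, 60]

r_business = pearson_r(business_educ, business_income)
r_academic = pearson_r(academic_educ, academic_income)
r_overall = pearson_r(business_educ + academic_educ,
                      business_income + academic_income)
print(f"{r_business:.3f} {r_academic:.3f} {r_overall:.3f}")  # 0.983 0.983 -0.793
```

As in the solution's explanation, the negative pooled value comes from the between-group pattern: the higher-education cluster sits lower on the income axis.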
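The random assignment in Problems 3.55 and 3.56 can be carried out with a table of random digits or with software. As a sketch in Python, the patients below get hypothetical labels 1 to 25 (the group sizes 8 + 8 + 9 and 2 + 3 + 3 + 2 + 3 + 3 + 3 + 3 + 3 from the two diagrams both add up to 25), are shuffled once with a seeded generator, and are cut into consecutive groups of the required sizes. The helper name `randomize` and the seed value are arbitrary choices for illustration.

```python
import random


def randomize(patient_ids, group_sizes, seed=None):
    """Randomly split patient_ids into consecutive groups of the given sizes."""
    if sum(group_sizes) != len(patient_ids):
        raise ValueError("group sizes must add up to the number of patients")
    shuffled = list(patient_ids)
    random.Random(seed).shuffle(shuffled)  # one random permutation of all patients
    groups, start = [], 0
    for size in group_sizes:
        groups.append(sorted(shuffled[start:start + size]))
        start += size
    return groups


patients = list(range(1, 26))  # hypothetical labels for the 25 patients
design_355 = randomize(patients, [8, 8, 9], seed=1)
design_356 = randomize(patients, [2, 3, 3, 2, 3, 3, 3, 3, 3], seed=1)

for number, group in enumerate(design_355, start=1):
    print(f"Group {number} ({len(group)} patients): {group}")
```

Shuffling the whole list once and slicing it guarantees that every patient lands in exactly one group and, up to the quality of the pseudo-random generator, that every split into groups of these sizes is equally likely.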
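The 16 outcomes listed in Problem 4.12(d) are all ordered sequences of A and U of length 4, so they can be generated mechanically rather than by hand. A short Python sketch using `itertools.product`, with a count by number of U's as a check on the hand-made list:

```python
from collections import Counter
from itertools import product

# Every ordered sequence of four ratings, A (Acceptable) or U (Unacceptable).
sample_space = ["".join(outcome) for outcome in product("AU", repeat=4)]

# How many outcomes contain 0, 1, 2, 3 or 4 U's.
by_number_of_u = dict(sorted(Counter(s.count("U") for s in sample_space).items()))

print(len(sample_space))   # 16
print(by_number_of_u)      # {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}
```

The counts 1, 4, 6, 4, 1 match the rows of the listing in part (d).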
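The checks in Problem 4.18 are plain arithmetic, but decimal probabilities are a classic place for floating-point round-off to creep in (in binary floating point, 0.1 + 0.2 != 0.3). The Python sketch below builds `fractions.Fraction` values from the decimal strings so that the sums are exact. The category labels from the textbook table are not reproduced in this solution, so the values are simply listed in the order the solution adds them.

```python
from fractions import Fraction

# Probabilities in the order they are added in part (a).
probs = [Fraction(p) for p in
         ["0.130", "0.147", "0.059", "0.042", "0.002", "0.419", "0.159", "0.042"]]

# A legitimate model needs every probability in [0, 1] and a total of exactly 1.
is_legitimate = all(0 <= p <= 1 for p in probs) and sum(probs) == 1

part_b = 1 - Fraction("0.130")                                      # complement rule
part_c = Fraction("0.147") + Fraction("0.042") + Fraction("0.002")  # disjoint outcomes

print(is_legitimate, float(part_b), float(part_c))  # True 0.87 0.191
```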