Stat Homework Solution by liaoqinmei

									                          Stat220E Homework 4 Solution



                                        January 20, 2005



1    Problem 2.40, p.125 (15 pts)

(a) 5 pts


regress MD poverty

      Source |       SS       df       MS              Number of obs   =       51
-------------+------------------------------           F( 1,     49)   =     1.68
       Model | 12248.6108      1 12248.6108            Prob > F        =   0.2013
    Residual | 357773.311     49 7301.49614            R-squared       =   0.0331
-------------+------------------------------           Adj R-squared   =   0.0134
       Total | 370021.922     50 7400.43843            Root MSE        =   85.449

------------------------------------------------------------------------------
          MD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |   4.583547   3.538867     1.30   0.201    -2.528072    11.69517
       _cons |   180.7915   45.48397     3.97   0.000       89.388    272.1949
------------------------------------------------------------------------------

predict yhat

graph twoway scatter MD yhat poverty, connect(i L) clpattern(solid solid) msymbol(O i)

. tabstat MD, stats (mean, sd, min, p25, p50, p75, max, iqr)

    variable |      mean        sd       min       p25       p50       p75       max       iqr
-------------+--------------------------------------------------------------------------------
          MD | 237.6275    86.0258       150       196       221       244       702        48
----------------------------------------------------------------------------------------------




[Figure: M.D.’s per 100,000 and fitted values plotted against percent poverty.]




From the output above, the regression line is $\widehat{MD} = 180.79 + 4.58\,\text{poverty}$. Surprisingly, the
slope is positive.
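As a quick consistency check, the summary statistics in the output can be recomputed from the sums of squares and the coefficient table. The following is a minimal Python sketch, not part of the original solution; the variable names are illustrative, and the only inputs are the numbers printed by Stata above.

```python
import math

# Values copied from the Stata output for `regress MD poverty`.
ss_model, df_model = 12248.6108, 1
ss_resid, df_resid = 357773.311, 49
ss_total = 370021.922
coef, std_err = 4.583547, 3.538867   # slope estimate and its standard error

ms_model = ss_model / df_model       # mean square for the model
ms_resid = ss_resid / df_resid       # mean square for the residuals

r_squared = ss_model / ss_total      # share of the variation in MD explained
f_stat = ms_model / ms_resid         # overall F statistic
t_stat = coef / std_err              # t statistic for the slope
root_mse = math.sqrt(ms_resid)       # residual standard deviation

print(f"R-squared = {r_squared:.4f}")
print(f"F(1, 49)  = {f_stat:.2f}")
print(f"t (slope) = {t_stat:.2f}")
print(f"Root MSE  = {root_mse:.3f}")
```

The recomputed values agree with the printed output up to rounding. The small R-squared and the p-value of 0.201 for the slope indicate that the positive slope is only weakly supported by the data.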
(b) 5 pts
Circle the point whose MD value is near 700.
This point has an extremely high MD value compared with the other points: most MD values are around 200
to 250, but this one exceeds 700. Since least-squares regression minimizes the sum of squared residuals,
a point this far from the rest can change the fitted line substantially.
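The effect of a single extreme point on a fitted line can be illustrated with a small Python sketch. The data below are synthetic and chosen only for illustration (they are NOT the homework data), and the helper `least_squares` is a hypothetical name introduced here.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the simple least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic illustration only: y falls gently as x increases.
x = [8, 10, 12, 14, 16, 18]
y = [250, 240, 235, 225, 215, 210]

# The same data plus one point with a very large y value.
x_out = x + [20]
y_out = y + [700]

_, slope_without = least_squares(x, y)
_, slope_with = least_squares(x_out, y_out)
print(f"slope without the extreme point: {slope_without:.2f}")
print(f"slope with the extreme point:    {slope_with:.2f}")
```

In this made-up example one point is enough to turn a negative slope into a positive one, because its squared residual dominates the quantity being minimized.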
(c) 5 pts


 regress MD poverty if MD!=702

      Source |       SS       df       MS                           Number of obs        =       50
-------------+------------------------------                        F( 1,     48)        =     2.85
       Model | 8421.53931      1 8421.53931                         Prob > F             =   0.0976
    Residual | 141645.681     48 2950.95168                         R-squared            =   0.0561
-------------+------------------------------                        Adj R-squared        =   0.0365
       Total |   150067.22    49 3062.59633                         Root MSE             =   54.323

------------------------------------------------------------------------------
          MD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty | -4.175416    2.471639    -1.69   0.098     -9.14498    .7941475
       _cons |   279.2884   31.12203     8.97   0.000     216.7134    341.8635
------------------------------------------------------------------------------

predict yhat_DC

graph twoway scatter MD yhat yhat_DC poverty, connect(i L L) clpattern(solid shortdash solid) msymbol(O i i)
[Figure: M.D.’s per 100,000 plotted against percent poverty, with fitted values from both regressions.]




After omitting D.C., the regression line becomes $\widehat{MD} = 279.29 - 4.18\,\text{poverty}$. This is a very big
change, so the point was indeed influential. Compared with the first result, the sign of the slope has changed
to negative, as expected.
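To see what the change in slope means in practice, the two fitted lines can be evaluated at the same poverty rates. This is a small Python sketch using the coefficients from the two Stata outputs; the function names and the poverty rates of 10 and 20 percent are chosen here only for illustration.

```python
def predict_with_dc(poverty):
    # Fitted line from `regress MD poverty` (all 51 observations).
    return 180.7915 + 4.583547 * poverty

def predict_without_dc(poverty):
    # Fitted line from `regress MD poverty if MD!=702` (50 observations).
    return 279.2884 - 4.175416 * poverty

for poverty in (10, 20):  # illustrative poverty rates, in percent
    with_dc = predict_with_dc(poverty)
    without_dc = predict_without_dc(poverty)
    print(f"poverty = {poverty}%: with D.C. {with_dc:.1f}, without D.C. {without_dc:.1f}")
```

With D.C. included, the predicted number of M.D.'s per 100,000 rises as poverty increases; with D.C. omitted, it falls.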



2    Problem 2.64, p.137 (14 pts)

Let’s generate some data for drawing this plot. Students don’t have to generate this kind of data; just sketch
a scatter plot that meets the requirements and explain it.


clear
set obs 60
gen year1=exp(1.5+0.35*invnorm(uniform()))
gen business=70000+8000*((year1-5)+0.85*invnorm(uniform()))
gen year2=exp(2.2+0.16*invnorm(uniform()))
gen academic=30000+4000*((year2-7)+0.6*invnorm(uniform()))
graph twoway scatter business year1, msymbol(plus) xtitle("Education (in years)") ytitle("Income ($)") || scatter academic year2, msymbol(Oh)

(7 pts)
[Figure: income ($) plotted against education (in years) for the business and academic groups.]

    A scatter plot may look like this. Economists employed by business firms form the higher cluster, and
those employed by colleges and universities form the lower cluster. There is a strong positive correlation
within each group (0.9026 for business, 0.9472 for academic) but a negative overall correlation (-0.2771).
Data with a scatter plot like this can have a negative overall correlation even though each group shows an
obvious positive correlation. To see why, divide the plot along the x axis into three parts: left, center,
and right. In the left part, all the points come from business firms, while in the right part all the points
come from universities, and the mean income in the left part is clearly higher than in the right part. In the
center, both groups contribute about equally many points. Since the least-squares regression line minimizes
the sum of squared errors, its slope can be negative in this situation. In terms of income and education,
businesses pay high salaries and employ mostly economists with bachelor’s degrees, while colleges pay lower
salaries and employ mostly economists with doctorates, as illustrated in the plot above. (7 pts)
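The same pattern can be reproduced with a handful of numbers. The following Python sketch uses hypothetical values (NOT the simulated Stata data above), with education in years and income in thousands of dollars; `pearson_r` is a helper written here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: fewer years of education but higher pay in business.
business_years = [2, 3, 4, 5, 6]
business_income = [62, 68, 81, 88, 101]
# Hypothetical values: more years of education but lower pay in academia.
academic_years = [8, 9, 10, 11, 12]
academic_income = [21, 24, 31, 34, 40]

r_business = pearson_r(business_years, business_income)
r_academic = pearson_r(academic_years, academic_income)
r_overall = pearson_r(business_years + academic_years,
                      business_income + academic_income)

print(f"business: {r_business:.3f}")
print(f"academic: {r_academic:.3f}")
print(f"overall:  {r_overall:.3f}")
```

Both within-group correlations are strongly positive, yet the pooled correlation is negative, because the group with more education sits lower on the income axis.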




3     Problem 2.69 and 2.70, pp.141-142 (14 pts)

3.1    2.69

7 pts
Intelligence or support from parents may be a lurking variable. More generally, factors that lead students
to take more math in high school may also lead to more success in college. If lurking variables are
present, then requiring students to take algebra and geometry may have little effect on success in college.


3.2    2.70

7 pts
A more plausible explanation is that people use artificial sweeteners because they are already heavier than
most people. To control their weight, they use artificial sweeteners instead of sugar.




4     Problem 3.16, p.186 (15 pts)
 (a) 5 pts
     The sample size = 13147 + 15182 + 1448 = 29777
 (b) 5 pts
     This is a voluntary response sample. People choose whether to vote in the poll. Therefore, the sample
     isn’t randomly chosen. The result might be systematically biased even when the sample size is large.
 (c) 5 pts
     More men than women watch male-dominated sports. Paying athletes equally would mean paying male
     athletes less, which might reduce the quality of male athletes. Therefore, men will tend to
     vote "No".
5     Problem 3.55 and 3.56, p.205 (14 pts)

5.1   3.55

7 pts
Since 25 patients cannot be divided evenly among three groups, we first randomly choose one group to have 9
patients (here, group 3); the other 2 groups have 8 patients each.


                                  The unbalanced randomized design

                                        Group 1        Treatment 1
                                       8 patients        Level 1


                                                                              Compare
                                                                          the concentration
                                        Group 2        Treatment 2           of the drug
             Random assignment         8 patients        Level 2          in patients’ blood


                                        Group 3        Treatment 3
                                       9 patients        Level 3
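
The random assignment in this design can be carried out by shuffling the patient labels and cutting the shuffled list into groups of 8, 8, and 9. This is a minimal Python sketch under stated assumptions: the patient labels 1 to 25, the seed, and the function name `assign_groups` are all hypothetical.

```python
import random

def assign_groups(patient_ids, group_sizes, seed=None):
    """Shuffle the patients and cut the shuffled list into groups of the given sizes."""
    if sum(group_sizes) != len(patient_ids):
        raise ValueError("group sizes must add up to the number of patients")
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    groups, start = [], 0
    for size in group_sizes:
        groups.append(shuffled[start:start + size])
        start += size
    return groups

patients = list(range(1, 26))  # hypothetical labels 1..25
groups = assign_groups(patients, [8, 8, 9], seed=2005)  # arbitrary seed
for number, members in enumerate(groups, start=1):
    print(f"Group {number} ({len(members)} patients): {sorted(members)}")
```

Fixing the seed only makes the sketch reproducible; in an actual experiment the assignment would be drawn once and recorded.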
5.2    3.56

7 pts
First, we randomly choose 2 groups to have 2 patients each, here {level 1, injection} and {level 2, injection},
and the other 7 groups have 3 patients each.


                                    The unbalanced randomized design

                                   Group 1                Treatment 1
                                  2 patients            Level 1, Injection


                                   Group 2                Treatment 2
                                  3 patients           Level 1, Skin patch


                                   Group 3                Treatment 3
                                  3 patients        Level 1, Intravenous drip


                                   Group 4                Treatment 4
                                  2 patients            Level 2, Injection


                                                                                        Compare
                                                                                    the concentration
                                   Group 5                Treatment 5                  of the drug
       Random assignment          3 patients           Level 2, Skin patch          in patients’ blood


                                   Group 6                Treatment 6
                                  3 patients        Level 2, Intravenous drip


                                   Group 7                Treatment 7
                                  3 patients            Level 3, Injection


                                   Group 8                Treatment 8
                                  3 patients           Level 3, Skin patch


                                   Group 9                Treatment 9
                                  3 patients        Level 3, Intravenous drip
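
The nine treatments are the combinations of three dosage levels with three delivery methods. As a rough Python sketch, they can be enumerated and filled at random. The patient labels 1 to 25, the seed, and the random choice of which two groups receive only 2 patients are assumptions made here for illustration.

```python
import random
from itertools import product

levels = ["Level 1", "Level 2", "Level 3"]
methods = ["Injection", "Skin patch", "Intravenous drip"]
treatments = list(product(levels, methods))  # 9 level/method combinations

rng = random.Random(2005)  # arbitrary seed, for reproducibility only
# Randomly pick the two treatments that receive 2 patients; the rest receive 3.
small = set(rng.sample(range(len(treatments)), 2))
sizes = [2 if i in small else 3 for i in range(len(treatments))]

patients = list(range(1, 26))  # hypothetical labels 1..25
rng.shuffle(patients)

assignment, start = {}, 0
for treatment, size in zip(treatments, sizes):
    assignment[treatment] = sorted(patients[start:start + size])
    start += size

for (level, method), members in assignment.items():
    print(f"{level}, {method}: {members}")
```

`itertools.product` lists the combinations in the same order as Treatments 1 to 9 in the diagram.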
6   Problem 4.12, p.240 (14 pts)
(a) 2 pts
    S ={In business, Closed}
(b) 3 pts
    S ={x days | x is a nonnegative integer}
(c) 3 pts
    S ={A, B, C, D, F}
(d) 3 pts
    Let Acceptable be denoted as A, and Unacceptable be denoted as U.
    S=
    {AAAA, UUUU,
    UAAA, AUAA, AAUA, AAAU,
    AAUU, UUAA, AUAU, UAUA, AUUA, UAAU,
    AUUU, UAUU, UUAU, UUUA}
(e) 3 pts
    S ={0, 1, 2, 3, 4}
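
The sample space in part (d) can be checked by enumeration: four ratings with two possible values each give 2^4 = 16 outcomes. The short Python sketch below generates them and compares the result with the list written out above; it is an illustration, not part of the required answer.

```python
from itertools import product

# Every sequence of four ratings, each A (acceptable) or U (unacceptable).
sample_space = ["".join(outcome) for outcome in product("AU", repeat=4)]

# The outcomes as listed in the solution to part (d).
listed = """
AAAA UUUU
UAAA AUAA AAUA AAAU
AAUU UUAA AUAU UAUA AUUA UAAU
AUUU UAUU UUAU UUUA
""".split()

print(len(sample_space))                 # number of generated outcomes
print(set(sample_space) == set(listed))  # does the written list match?
```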



7   Problem 4.18, p.245 (14 pts)
(a) 4 pts
    0.130 + 0.147 + 0.059 + 0.042 + 0.002 + 0.419 + 0.159 + 0.042 = 1
    Every probability lies between 0 and 1, and together they sum to 1. Thus, this is a legitimate
    probability model.
(b) 5 pts
    P = 1 − 0.130 = 0.87

(c) 5 pts
    P = 0.147 + 0.042 + 0.002 = 0.191
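
The three calculations above can be verified with a few lines of Python. This is a sketch only: the probabilities are copied from part (a) without their outcome labels, and the variable names are illustrative.

```python
import math

# Probabilities as listed in part (a).
probs = [0.130, 0.147, 0.059, 0.042, 0.002, 0.419, 0.159, 0.042]

# (a) Legitimate model: each value in [0, 1] and the total equal to 1.
is_legitimate = all(0 <= p <= 1 for p in probs) and math.isclose(sum(probs), 1.0)

# (b) Complement rule.
p_b = 1 - 0.130

# (c) Addition rule for disjoint outcomes.
p_c = 0.147 + 0.042 + 0.002

print(is_legitimate)
print(f"{p_b:.2f}")
print(f"{p_c:.3f}")
```

`math.isclose` is used instead of `==` because sums of decimal fractions are not represented exactly in floating point.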

								