
									                                                    Name:




Statistics                                                                    Davidson College
Economics 105, Jan-May 2004                                                      Mark C. Foley




                                      Review # 2
                                      Suggested Solutions




Directions:    This review is closed-book, closed-notes (except for your formula sheet) to be
taken in one sitting not to exceed 4 hours. You may use a calculator and/or Excel. Perform
your calculations to 3 decimal places, where necessary.


There are 100 points on the exam. Each problem is worth 25 points.

You must show all your work to receive full credit.        Any assumptions you make and
intermediate steps should be clearly indicated. Do not simply write down a final answer to the
problems without an explanation.

Please turn in your formula sheet with your exam.

Think clearly and work efficiently.




Honor Pledge

Start time

End time
Problem 1

The government of Japan claims that the average life expectancy of all its citizens is 83 years. You
travel to Japan, visit cemeteries and randomly find the records for eight deceased individuals,
and collect the following data on age at death: 82 77 85 76 81 91 70 82
The government is worried about advertising a life expectancy that is too high, and so wonders
if the data provide evidence that the population life expectancy is less than 83.

(a) Write down the null hypothesis and appropriate alternative hypothesis.
X = life expectancy
$\mu_X$ = population mean life expectancy

$H_0: \mu_X = 83$
$H_1: \mu_X < 83$


(b) Stating any assumptions, at the 5% significance level, perform the hypothesis test on the
sample statistic scale.

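[Figure: sampling distribution of $\bar{x}$ under the null, centered at 83. The rejection region is the lower tail of area .05 to the left of the critical value $\bar{x}_a$; the p-value is the area to the left of $\bar{x} = 80.5$. On the $t_7$ scale the corresponding points are -1.895, -1.122, and 0.]

$$s_X^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1} = \frac{52120 - 8\left(\frac{644}{8}\right)^2}{7} = 39.714, \quad s_X = 6.302$$

We know that $-1.895 = \frac{\bar{x}_a - \mu_X}{s_X/\sqrt{n}}$. So $-1.895 = \frac{\bar{x}_a - 83}{6.302/\sqrt{8}} \Rightarrow \bar{x}_a = 78.777$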

So using the sample statistic scale, we fail to reject the null hypothesis because 80.5 is greater
than 78.777 (i.e., the sample mean is not in the rejection region, < 78.777).

Finally, to use the t-distribution, we must assume that X is distributed normally in the population.
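
The arithmetic in parts (b) and (c) can be checked with a short script. This is only a sketch using Python's standard library; the variable names are illustrative, and the critical value -1.895 is typed in from the t-table rather than computed.

```python
import math
import statistics

ages = [82, 77, 85, 76, 81, 91, 70, 82]   # ages at death from the problem
mu_0 = 83                                  # hypothesized population mean
t_crit = -1.895                            # lower 5% point of t with 7 d.f. (from the tables)

n = len(ages)
x_bar = statistics.mean(ages)              # sample mean
s = statistics.stdev(ages)                 # sample standard deviation (n - 1 divisor)
std_err = s / math.sqrt(n)                 # estimated standard error of the mean

x_crit = mu_0 + t_crit * std_err           # critical value on the sample statistic scale
t_stat = (x_bar - mu_0) / std_err          # the same comparison on the test statistic scale

reject = x_bar < x_crit                    # lower-tail test: reject only if x_bar is below x_crit
print(f"x_bar={x_bar:.3f}, s={s:.3f}, x_crit={x_crit:.3f}, t={t_stat:.3f}, reject={reject}")
```

The critical value printed here may differ from 78.777 in the last decimal place because the script does not round $s_X$ before using it.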



(c) At the 5% significance level, perform the hypothesis test on the test statistic scale. Maintain
your assumptions from (b).

On the test statistic scale, we compare the t-statistic to the critical value of -1.895.

$t_7 = \frac{80.5 - 83}{6.302/\sqrt{8}} = -1.122$. Since -1.122 > -1.895, we again fail to reject the null at the 5% level.

(d) Place a range on the p-value using the statistical tables and, using the 5% significance level,
perform the hypothesis test using the p-value. Maintain your assumptions from (b).

The p-value is greater than .05 since we failed to reject the null at the 5% level. We also know
that it is greater than .10 since $t_{7,.10} = 1.415$ and thus 10% of the area under a $t_7$ distribution is
to the left of -1.415. So because our test statistic of -1.122 > -1.415, the p-value is greater
than .10.

Excel tells me that the exact p-value is .1494. We again fail to reject the null hypothesis since the
p-value exceeds the significance level (.05).
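
For readers without Excel, the exact p-value can be approximated by integrating the t density numerically. The sketch below uses only the standard library; `t_pdf` and `t_cdf` are helper names made up for this illustration, and Simpson's rule is just one convenient way to do the integration.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] and symmetry about zero."""
    a = abs(t)
    h = a / steps                          # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                   # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

t_stat = -1.122                            # test statistic from part (c)
p_value = t_cdf(t_stat, df=7)              # lower-tail p-value for H1: mu < 83
print(f"p-value = {p_value:.4f}")
```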
Problem 2

(a) Let W be a linear combination of random variables X and Y where $W = gX + hY$ and
g and h are constants. Prove that $\sigma_W^2 = g^2\sigma_X^2 + h^2\sigma_Y^2 + 2gh\rho_{XY}\sigma_X\sigma_Y$, where $\rho_{XY}$ is the
correlation between X and Y.

$$\sigma_W^2 = E[(W - \mu_W)^2] = E[(gX + hY - (g\mu_X + h\mu_Y))^2] = E[(g(X - \mu_X) + h(Y - \mu_Y))^2]$$
$$= E[g^2(X - \mu_X)^2 + h^2(Y - \mu_Y)^2 + 2gh(X - \mu_X)(Y - \mu_Y)]$$
Now bring the expectation operator through, since E[sum of stuff] = sum of the expected values
of each term individually,
$$= E[g^2(X - \mu_X)^2] + E[h^2(Y - \mu_Y)^2] + E[2gh(X - \mu_X)(Y - \mu_Y)]$$
Bring out the constants,
$$= g^2 E[(X - \mu_X)^2] + h^2 E[(Y - \mu_Y)^2] + 2gh\,E[(X - \mu_X)(Y - \mu_Y)]$$
Rewriting the definitions,
$$= g^2\sigma_X^2 + h^2\sigma_Y^2 + 2gh\,Cov(X, Y)$$
And finally, since $\rho_{XY} = Cov(X, Y)/(\sigma_X\sigma_Y)$ by definition, $Cov(X, Y) = \rho_{XY}\sigma_X\sigma_Y$ and
$$= g^2\sigma_X^2 + h^2\sigma_Y^2 + 2gh\rho_{XY}\sigma_X\sigma_Y$$







(b) A researcher suspected that the number of between-meal snacks eaten by students in a day
during final exams might depend on the number of tests a student had to take on that day. The
table below shows the joint probabilities, estimated from a survey. Calculate Var[2X - 3Y].

                                                    Number of Tests (X)
  Number of                  0              1          2           3         Marginal Prob (Y)
  Snacks (Y)
       0                    .07           .06             0             0           .13
       1                    .07            0              0            .03          .10
       2                    .06           .20            .14           .07          .47
       3                    .02            0             .16           .12          .30
Marginal prob (X)           .22           .26            .30           .22           1

E[X] = 0(.22) + 1(.26) + 2(.30) + 3(.22) = 1.52
E[Y] = 0(.13) + 1(.10) + 2(.47) + 3(.30) = 1.94
Cov(X,Y) = E[XY] - E[X]*E[Y] = 1*3*(.03) + 2*1*(.20) + 2*2*(.14) + 2*3*(.07) + 3*2*(.16) +
3*3*(.12) - (1.52*1.94) = 3.51 - 2.9488 = .5612
Var(Y) = E[Y^2] - (E[Y])^2 = 1^2(.10) + 2^2(.47) + 3^2(.30) - 1.94^2 = .9164
Var(X) = E[X^2] - (E[X])^2 = 1^2(.26) + 2^2(.30) + 3^2(.22) - 1.52^2 = 1.1296



Var[2X - 3Y]     = 2^2*Var[X] + 3^2*Var[Y] - 2*2*3*Cov(X,Y)
                 = 4*1.1296 + 9*.9164 - 12*.5612
                 = 6.0316
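
As a cross-check, the same quantities can be computed straight from the joint table. The sketch below also evaluates Var[2X - 3Y] directly from its definition and compares it with the formula from part (a). The dictionary layout and the helper `expect` are choices made for this illustration only.

```python
# Joint probabilities keyed by (x, y): x = number of tests, y = number of snacks.
joint = {
    (0, 0): .07, (1, 0): .06, (2, 0): 0,   (3, 0): 0,
    (0, 1): .07, (1, 1): 0,   (2, 1): 0,   (3, 1): .03,
    (0, 2): .06, (1, 2): .20, (2, 2): .14, (3, 2): .07,
    (0, 3): .02, (1, 3): 0,   (2, 3): .16, (3, 3): .12,
}

def expect(f):
    """Expected value of f(X, Y) under the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - mean_x ** 2
var_y = expect(lambda x, y: y * y) - mean_y ** 2
cov_xy = expect(lambda x, y: x * y) - mean_x * mean_y

g, h = 2, -3                                   # W = 2X - 3Y
via_formula = g**2 * var_x + h**2 * var_y + 2 * g * h * cov_xy
mean_w = g * mean_x + h * mean_y
direct = expect(lambda x, y: (g * x + h * y - mean_w) ** 2)   # E[(W - mu_W)^2]

print(round(cov_xy, 4), round(via_formula, 4), round(direct, 4))
```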
Problem 3

(a) You are told that a 95% confidence interval for the population mean is 17.3 to 24.5. If the
population standard deviation is 18.2, how large was the sample?

$$\bar{X} \pm z_{1-\alpha/2}\frac{\sigma_X}{\sqrt{n}}$$

$$z_{1-\alpha/2}\frac{\sigma_X}{\sqrt{n}} = (24.5 - 17.3)/2 \;\Rightarrow\; \frac{1.96(18.2)}{\sqrt{n}} = 3.6 \;\Rightarrow\; n = 98$$
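
The same back-solving can be done in a few lines. This is a sketch that assumes Python's `statistics.NormalDist` is available (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

lower, upper = 17.3, 24.5          # reported 95% confidence interval
sigma = 18.2                       # known population standard deviation
conf_level = 0.95

z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # about 1.96
margin = (upper - lower) / 2                         # half-width of the interval
n_exact = (z * sigma / margin) ** 2                  # solve z * sigma / sqrt(n) = margin
n = round(n_exact)                                   # sample sizes are whole numbers

print(f"z={z:.3f}, margin={margin:.1f}, n={n_exact:.2f} -> {n}")
```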


(b) Interpret the confidence interval in part (a). Be specific.

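In repeated samples of the same size, we expect 95% of the intervals constructed this way to contain
the true population mean. Any single interval, such as 17.3 to 24.5, either contains the mean or it does not.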


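(c) Using a properly labeled diagram (and without doing more calculations), explain how and
why the confidence interval in part (a) would change if the significance level changed to 1%.

[Figure: sampling distribution $f(\bar{x})$ centered at $\mu_X$, with the z-axis marked at -2.57, -1.96, 0, 1.96, and 2.57. The upper-tail area beyond 1.96 is .025 and the upper-tail area beyond 2.57 is .005.]
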

If the significance level changed to 1%, then it would be a 99% confidence interval, and it
would be wider than the one in part (a) because the z-value would be bigger (2.57 vs.
1.96). That is, more of the sampling distribution would be within the 99% CI.
(d) Using a properly labeled diagram (and without doing more calculations), explain how and
why the confidence interval in part (a) would change if the sample size were now 36.

[Figure: two sampling distributions $f(\bar{x})$ centered at the same mean, the darker one more spread out. Each has area .025 in each tail, and each has its own z-axis marked at -1.96, 0, and 1.96.]

The standard error, $\sigma_X/\sqrt{n}$, increases as n decreases, so the sampling distribution has
greater variance, as in the darker p.d.f. above. With a 95% confidence interval still, the
value above which 2.5% of the probability lies is still 1.96 (and on the left side, it's -1.96),
but with a larger standard error, a 95% confidence interval is now wider.
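
Although parts (c) and (d) ask for a diagram rather than a calculation, a quick numerical sketch makes both effects concrete. It assumes the values from part (a) (population standard deviation 18.2, n = 98), and `half_width` is a helper name invented for this illustration.

```python
from statistics import NormalDist

sigma = 18.2                                   # population standard deviation from part (a)

def half_width(conf_level, n):
    """Half-width z * sigma / sqrt(n) of a two-sided confidence interval."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z * sigma / n ** 0.5

base = half_width(0.95, 98)                    # the interval in part (a)
wider_conf = half_width(0.99, 98)              # part (c): higher confidence, same n
smaller_n = half_width(0.95, 36)               # part (d): same confidence, smaller n

print(f"95%, n=98: +/- {base:.2f}")
print(f"99%, n=98: +/- {wider_conf:.2f}")
print(f"95%, n=36: +/- {smaller_n:.2f}")
```

Both changes widen the interval, but for different reasons: the first raises the z-value, the second raises the standard error.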
Problem 4

Let W ~ N (2,1) and Z ~ N (0,1) .

(a) Calculate the probability that W is greater than -1 and also the probability that Z is greater
than -1. Do two separate calculations.

$P(W > -1) = 1 - F_W(-1) = 1 - F_Z\left(\frac{-1 - 2}{1}\right) = 1 - F_Z(-3) = 1 - (1 - F_Z(3)) = .9987$

$P(Z > -1) = 1 - F_Z(-1) = 1 - (1 - F_Z(1)) = .8413$

(b) Using a properly-labeled diagram and the appropriate z-values, clearly explain why the
answer to part (a) is different for Z than for W.

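[Figure: the p.d.f.s f(z) and f(w) drawn on a common w, z axis, centered at 0 and 2 respectively, with -1 marked. Below it are two standardized scales: z for Z, on which -1 and 0 are marked, and $z_w$ for W, on which -1 corresponds to -3 and the mean 2 corresponds to 0.]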

-1 is farther from the mean of W than from the mean of Z, both absolutely and relatively
(3 standard deviations compared to 1 standard deviation). Relative distances
determine the probabilities.
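
The two probabilities in part (a) and the z-values in part (b) can be reproduced as follows. This is a sketch using `statistics.NormalDist` from the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

W = NormalDist(mu=2, sigma=1)
Z = NormalDist(mu=0, sigma=1)

p_w = 1 - W.cdf(-1)                  # P(W > -1)
p_z = 1 - Z.cdf(-1)                  # P(Z > -1)

# Standardized distance of -1 from each mean (the "relative distance").
z_for_w = (-1 - W.mean) / W.stdev    # -3 standard deviations
z_for_z = (-1 - Z.mean) / Z.stdev    # -1 standard deviation

print(f"P(W > -1) = {p_w:.4f} (z = {z_for_w:.0f})")
print(f"P(Z > -1) = {p_z:.4f} (z = {z_for_z:.0f})")
```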

(c) The number of households ordering the pay-per-view movie Finding Nemo is normally
distributed. Twenty percent of the time fewer than 20,000 households order the movie. Only
ten percent of the time more than 28,000 households order. What are the mean and standard
deviation of the number of households ordering Finding Nemo?

P(X<20000) = .2                  and     P(X>28000) = .10
P(z<(20000-m)/k) = .2            and P(z>(28000-m)/k) = .10
(20000-m)/k = -.84 and (28000-m)/k = 1.28
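Two equations, two unknowns (m is the mean and k is the standard deviation).
Subtracting the first equation from the second: 2.12 k = 8000 or k = 3774
then m = 20000 + .84(3774) = 23,170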
If you linearly interpolated (I didn't make it clear to do so this time):
$F_Z(.84) = .7995$ and $F_Z(.85) = .8023$: 28 "steps" and we want to take 5 of them, so .84 + (5/28)(.85 - .84) = .84179 ≈ .842
$F_Z(1.28) = .8997$ and $F_Z(1.29) = .9015$: 18 "steps" and we want to take 3 of them, so 1.28 + (3/18)(1.29 - 1.28) = 1.28167 ≈ 1.282
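
A sketch of the same two-equation solution with exact z-values from `statistics.NormalDist` instead of the table is shown below. Because the z-values are not rounded to two decimals, the results will differ slightly from k = 3774 and m = 23,170.

```python
from statistics import NormalDist

std_normal = NormalDist()
z_low = std_normal.inv_cdf(0.20)     # z with 20% of the area below it, about -0.842
z_high = std_normal.inv_cdf(0.90)    # z with 10% of the area above it, about 1.282

x_low, x_high = 20_000, 28_000
# Subtracting (x_low - m)/k = z_low from (x_high - m)/k = z_high eliminates m.
k = (x_high - x_low) / (z_high - z_low)   # standard deviation
m = x_low - z_low * k                     # mean

print(f"z_low={z_low:.3f}, z_high={z_high:.3f}, k={k:.0f}, m={m:.0f}")
```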
Problem 5

(a) State, precisely, the Central Limit Theorem.

Roughly speaking it says that sample means are eventually distributed normally. That is, the
sampling distribution of the sample mean follows a normal distribution if the sample size is big
enough (>30).

More precisely, if X1, X2, …, Xn are i.i.d. observations from a population with ANY distribution
having mean $\mu_X$ and variance $\sigma_X^2$, then

$$\frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0,1)$$

approximately if n is large (> 30), or equivalently, $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$ approximately if n is large.
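
A small simulation can illustrate the theorem. The sketch below draws repeated samples of size 36 from an exponential population (a strongly skewed distribution with mean 1 and variance 1, chosen here only as an example) and checks that the sample means behave roughly like a normal distribution with mean 1 and standard deviation 1/6. The seed, sample size, and number of repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(105)                       # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n i.i.d. draws from an exponential population (mean 1, variance 1)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n = 36
means = [sample_mean(n) for _ in range(5000)]

center = statistics.fmean(means)       # should be near the population mean, 1
spread = statistics.stdev(means)       # should be near sigma / sqrt(n) = 1/6
# Share of sample means within 1.96 standard errors of the population mean;
# under a normal sampling distribution this would be about 95%.
within = sum(abs(m - 1) <= 1.96 / n ** 0.5 for m in means) / len(means)

print(f"center={center:.3f}, spread={spread:.3f}, within 1.96 s.e.={within:.3f}")
```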



(b) Using a graph for the population distribution and at least one graph for the sampling
distribution, illustrate and explain the central limit theorem. Label all axes and curves.

								