					                         Business Statistics 41000
                         Autumn 2006 Final Exam


            Name ____________Solutions_____________________
                                (please print)


    DO NOT TURN THIS PAGE OVER UNTIL YOU ARE TOLD TO DO SO.


      You have 3 hours to complete the exam. When time is called please stop
      writing immediately.

      The layout of the exam, including the number of questions and the point
      value of each question, is on the next page. Unless otherwise indicated,
      each part of each question is worth 2 points.

      You may use a calculator and two 8.5 by 11 inch “cheat sheets”. No other
      reference materials are allowed.

      Please show your work and clearly indicate your answer in the space
      provided. You may be awarded partial credit in case of arithmetic errors or
      incomplete answers, but only if your work is legible. Unsupported answers
      (e.g., just writing “fail to reject”) receive zero credit.



Students in my class are required to adhere to the standards of conduct in the GSB
Honor Code and the GSB Standards of Scholarship. The GSB Honor Code also
requires students to sign the following GSB Honor pledge:

I pledge my honor that I have not violated the Honor Code during this examination. I
further understand that discussing the contents of this exam with anyone prior to all
students completing the exam is a violation of the Honor Code.



Sign here to acknowledge: _____________________________________
There are 8 questions.




Question 1, 10 parts, 20 points                         _____

Question 2, 9 parts, 18 points                          _____

Question 3, 6 parts, 12 points                          _____

Question 4, 4 parts, 8 points                           _____

Question 5, 10 T/F questions, 10 points                 _____

Question 6, 10 parts, 21 points                         _____

Question 7, 10 parts, 19 points                         _____

Question 8, 6 parts, 12 points                          _____




Total        120 points




  Summary statistics:

                Mean             83
                Median           86
                Std. Dev.        16
                75th %tile       95
                25th %tile       71

  Note to Autumn 2006 students: Because this was a challenging exam, the graders
  asked (and I approved) granting of ½-point partial credit for certain questions.
  Actual credit assigned may therefore deviate slightly from the point allocations
  indicated in these solutions.
Question 1


Below is a scatter plot.

Each observation corresponds to an NFL football game. Before each game, one
team is considered the favorite (the team considered more likely to win) and the
other the underdog.

Before each game, oddsmakers set a number called the point spread. Suppose
you place a bet that the favorite will win the game. To win your bet, the favorite
must “beat the spread”. That is, they must beat the underdog by more points than
the spread.

On the horizontal axis below is the spread, set before the game. On the vertical
axis is diff, which is the points scored by the favorite minus points scored by the
underdog during the actual game (a positive value for diff means the favorite won).
      [Scatter plot of diff (vertical axis) against spread (horizontal axis), one point
      per game, with the following solution annotations:

         - If I had to “draw a line” through this scatter plot by hand, it would look
           something like this (slope about 1, intercept about zero). About 95% of
           points would be within +/- 2*(14) points of the line.

         - Only TWO teams that had spread > 10 before the game actually lost!

         - About nine favorites lost by 30 or more points.]
(a)   In this sample, about how many times did the favorite lose by 30 or more points?

             (i) 1          (ii) 5       (iii) 9         (iv) 21


(b)   In this sample, about how many teams favored by 10 or more points (spread > 10)
      lost the actual game?

             (i) 0          (ii) 2       (iii) 7         (iv) 17


(c)   The sample mean of diff is:

             (i) positive                (ii) negative                (iii) about zero


(d)   Which of the two variables has a larger sample variance?

             (i) spread                  (ii) diff                    (iii) their variances
                                                                          are roughly equal

(e)   The sample correlation between spread and diff is closest to

             (i) -0.65      (ii) -0.10   (iii) 0.25      (iv) 0.89



(f)   In a regression of diff on spread, the intercept estimate, a, is closest to

             (i) -40        (ii) 0       (iii) 30        (iv) 50



(g)   In a regression of diff on spread, the slope estimate, b, is closest to

             (i) -3         (ii) 0       (iii) 1         (iv) 3


(h)   In a regression of diff on spread, the estimated standard deviation of the errors,
      se , is approximately

             (i) -7         (ii) 1       (iii) 14        (iv) 28
Now suppose we believe this data is representative of the “true” relationship between
point spreads and actual scores in NFL football games (the “population”). Also suppose
we are willing to assume the errors are iid Normal.

Suppose the Chicago Bears are favored by 14 points in this week’s game (spread = 14).

(i)   What is the 95% plug-in predictive interval for the score difference in the actual
      game (Bears’ points minus opponent’s points)?



      a + b*(spread) +/- 2*se = 14 +/- 2*(14) = (-14, 42)




(j)   If we believe our estimates for a, b, and se are correct and the errors are iid
      Normal, what is the (approximate) probability the Bears win the game?


      If we believe our model, diff ~ N( 14 , 14² )

      So Prob( diff > 0 )
            = Prob( a normal RV falls above one SD below its mean)

             =   .84
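
      Aside: Parts (i) and (j) are easy to double-check with a few lines of Python (not
      something you would do during the exam). The values a = 0, b = 1, and s_e = 14 are
      the eyeballed estimates from the scatter plot above.

          from math import erf, sqrt

          a, b, s_e = 0.0, 1.0, 14.0          # eyeballed regression estimates from the plot
          spread = 14

          pred = a + b * spread               # predicted diff for the Bears game
          print("predictive interval:", (pred - 2 * s_e, pred + 2 * s_e))   # (-14, 42)

          # P(diff > 0) when diff ~ N(14, 14^2), using the standard normal CDF via erf
          z = (0 - pred) / s_e
          p_win = 1 - 0.5 * (1 + erf(z / sqrt(2)))
          print("P(Bears win) =", round(p_win, 2))                          # about 0.84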
Question 2

The country returns dataset we’ve used this quarter consists of monthly returns on
portfolios of assets traded on major stock exchanges in various countries. Below are
the summary statistics for the Germany portfolio.

                       Summary measures for selected variables
                                          germany
                           Count                107
                           Mean              0.0129
                           Median            0.0100
                           Variance          0.0031
                           Skewness         -0.1116

(a)   Construct a 95% confidence interval for the “true” expected return on the
      Germany portfolio.

             0.0129 +/- 2*[ sqrt( .0031 / 107 ) ]
                                       =                 ( .002135, .02367 )




(b)   Construct a 95% plug-in predictive interval for the next monthly German
      return.

             0.0129 +/- 2*sqrt(.0031) =                  ( -0.09846 , 0.1243 )




(c)   Suppose we want to test the claim that:
           “In any given month, there is a 50% chance that the Germany portfolio
           has a higher return than the France portfolio.”

      During our 107 month sample, there were 48 months in which the Germany
      portfolio had a higher return than the France portfolio. Test the appropriate null
      hypothesis at the 5% level.


             po = .5       phat = 48/107 = .4486

             z = [ (.4486 - .5) / sqrt( .5*(1-.5)/ 107 ) ] =   -1.063

                                                         FAIL TO REJECT
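
      Aside: Parts (a) through (c) can be reproduced with a short Python check; the
      variable names below are just illustrative.

          from math import sqrt

          n, xbar, var = 107, 0.0129, 0.0031

          # (a) 95% confidence interval for the mean return
          half_ci = 2 * sqrt(var / n)
          print((round(xbar - half_ci, 5), round(xbar + half_ci, 5)))

          # (b) 95% plug-in predictive interval for one month's return
          half_pi = 2 * sqrt(var)
          print((round(xbar - half_pi, 4), round(xbar + half_pi, 4)))

          # (c) test H0: p = .5 using the 48 "Germany beats France" months out of 107
          p0, phat = 0.5, 48 / 107
          z = (phat - p0) / sqrt(p0 * (1 - p0) / n)
          print(round(z, 3), "fail to reject" if abs(z) < 2 else "reject")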
Below are the results from two statistical tests run on the German data in StatPro.
The first is the “Runs Test for Randomness”. In the Runs Test, the null hypothesis is
that the data are iid.
The second is the “Chi-Square Test of Normality”. In this test, the null hypothesis is
that the data are normally distributed.

 Runs Test Results for germany

        Number of obs                  107
        Number above cutoff             60
        Number below cutoff             47
        Number of runs                  59
        E(R)                        53.710
        Stdev(R)                     5.071
        Z-value                      1.043
        p-value (2-tailed)           0.297

 Test of normal fit

        Chi-square statistic        18.131
        p-value                      0.034


In class we’ve talked about the iid Normal model for stock returns. Naturally, this
entails two assumptions: [1] returns are iid,
              and          [2] returns are normally distributed.

Using the tests above is one way to check that these assumptions are reasonable
based on the returns we have observed.

(d)    For the Runs Test, do you reject the null hypothesis at the 5% level? What
       does this test tell you about our assumptions ([1] and/or [2])? Give a brief
       explanation.

       The p-value for the runs test is >.05 so we FAIL TO REJECT.             (1 point)

       This tells you that based on the data, we don’t have definitive evidence
       that the returns are NOT iid.               (1 point)

(e)    For the Chi-Square Test of Normality, do you reject the null hypothesis at the
       5% level? What does this test tell you about our assumptions ([1] and/or [2])?
       Give a brief explanation.

       The p-value for the normality test is <.05 so we REJECT.            (1 point)

       This tells you that based on the data, we have evidence that the
       Germany returns are NOT normally distributed.
              (1 point)
     (f)   Which of the time series plots below shows the Germany returns?                        C

                                                                                           (Answer A, B, or C)

     [Three time series plots, labeled A, B, and C, appeared here.]


A.         (The results of the runs test suggest the data should look iid.
            This series is not iid.)


B.         (All of the stock return data we’ve studied this quarter has been
            continuous. This series is discrete.)


C.
 Each of the statistical tools we’ve developed this quarter depends on a set of
 assumptions we make about the data. If those assumptions are violated, the results
 you get can be very misleading.


 (g)    Does the confidence interval you constructed in part (a) require assumption
        [1], [2], both, or neither? Based on your answers to parts (d) and (e), is this
        confidence interval valid? Briefly explain.


        The confidence interval requires [1], but not [2]. (1 point)

        Since we can’t reject that the data are iid, the confidence interval is
        probably ok. (1 point)

        (Note: The reason we don’t need [2] for the confidence interval is the
        Central Limit Theorem, which tells us that if we have a reasonably-sized
        sample and the data are iid, xbar will be normal regardless of whether
        the individual observations in our data are normal.)


(h)    Does the plug-in predictive interval you constructed in part (b) require
       assumption [1], [2], both, or neither? Based on your answers to parts (d) and
       (e), is this predictive interval valid? Briefly explain.


       The plug-in predictive interval requires BOTH [1] and [2]. (1 point)

       Since in part (e), we rejected the null hypothesis that the data are
       normally distributed, our predictive interval is likely NOT valid. (1 point)

       (Note: The reason you actually NEED normality for the predictive
       interval is that we’re trying to predict one single outcome. So there’s no
       “averaging” going on, and the CLT doesn’t save you!)

(i)    Would your answers to (g) and/or (h) change if we have 7 observations
       instead of 107? Again give a brief explanation.


       Yes: The confidence interval in part (g) would no longer be valid.
       (1 point)

       This is because we need a reasonably large sample to apply the Central
       Limit Theorem, and n=7 observations is not enough.
       (1 point, to get credit you must mention the Central Limit Theorem)
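
       Aside: If you want to see why the sample size matters, the little simulation below
       compares the shape of xbar's sampling distribution for n = 7 versus n = 107. The
       exponential draws are just an illustrative stand-in for skewed, non-normal data;
       none of this was required on the exam.

           import random, statistics as st

           random.seed(0)

           def xbars(n, reps=10_000):
               # sampling distribution of the mean of n skewed (exponential) draws
               return [st.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

           for n in (7, 107):
               draws = xbars(n)
               m, s = st.fmean(draws), st.pstdev(draws)
               skew = st.fmean(((x - m) / s) ** 3 for x in draws)
               print(n, round(skew, 2))
           # The skewness for n = 107 is much smaller (the means look roughly normal),
           # while the n = 7 means are still clearly skewed, so the +/- 2 SE interval
           # would be unreliable.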
 Question 3

 On April 2, 2007, the Chicago Cubs will open their season with a three game series
 against the Cincinnati Reds.

 Like a lot of baseball fans, I am not sure how good the Cubs will be next year.
 Suppose I think there are three possibilities:
                C = -1         if the Cubs are a BAD team
                  = 0          if the Cubs are an AVERAGE team
                  = 1          if the Cubs are a GOOD team

 Based on what I know right now, I assign the following probability
 distribution for C:

                                        c            p(c)
                                       -1            0.25
                                        0            0.40
                                        1            0.35

 Suppose I’m sure the Reds will be an average team next season. If the Cubs are
 also an average team, they have a 50% chance to win each game the two teams
 play. If the Cubs are good this goes up to 65%, while if they are bad it is only 35%.
 Note that these probabilities are for EACH game the two teams play.

 Also suppose I am willing to assume that outcomes in different games are iid.


 (a)    Let S=1 if the Cubs sweep their season-opening series against the Reds
        (meaning they win all three games). If we assumed the Cubs are a good
        team, what is the probability they sweep the series, p(S=1|C=1) ?

                .65³ =         .274625

           Explanation: Games are iid, and given the Cubs are good they have a .65 probability
           of winning each game, so the probability of three wins in a row is .65³.


          (Note: Please don’t report answers to six decimal places when you’re actually taking the exam!! I’m
          using Excel to write these solutions and am reporting all six decimal places to avoid rounding errors.
          In practice, you will not be counted off for reasonable rounding errors when your exam is graded.)



(b)    What is the probability the Cubs are a bad team AND they sweep the series,
       p(S=1,C=-1)?


               Similar to part (a), p(S=1|C=-1) = .35³ =                                .042875


               So      p(S=1, C=-1) = p(S=1|C=-1)*p(C=-1) = .010719
(c)   What is the marginal probability the Cubs sweep the series, p(S=1)?
      [Hint: It may help you to write out the joint distribution of C and S in our
      usual two-way table format on a separate sheet, but you don’t have to.]

      Here’s the relevant row of the joint distribution:

                                                        C
                                              -1             0              1
                   S            0
                                1          0.010719        0.05         0.096119           pS(1) = 0.156838

      Explanation: I filled in the “S=1 row” of the joint table similarly to part (b),
      then added across the row to find the marginal probability of S=1.


(d)   Suppose the Cubs do sweep the series with the Reds. What is the
      probability they are a good team, p(C=1|S=1)?
      By definition, “Conditional = Joint/Marginal”, or in this case
                        p(C=1|S=1) = p(C=1,S=1) / p(S=1)
      From the table above,                        p(C=1|S=1) = (0.096119/.156838)
                                                                      =       0.612856
       Intuition: One way to interpret conditional probability is how the probabilities we assign would change
       based on observed outcomes. Going into the series, we thought there was a 35% chance the Cubs
       were good. Sweeping the series is a favorable indicator of the Cubs’ ability, since it’s much more likely a
       good team would sweep than a bad team. So we now think there’s a 61% chance the Cubs are good!
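
       Aside: Parts (a) through (d) are easy to verify with a few lines of Python. The
       dictionaries below just encode the prior p(c) and the per-game win probabilities
       from the problem.

           prior = {-1: 0.25, 0: 0.40, 1: 0.35}        # p(c)
           win_prob = {-1: 0.35, 0: 0.50, 1: 0.65}     # P(win one game | C = c)

           p_sweep_given_c = {c: win_prob[c] ** 3 for c in prior}       # (a): three iid wins
           joint = {c: p_sweep_given_c[c] * prior[c] for c in prior}    # (b): p(S=1, C=c)
           p_sweep = sum(joint.values())                                # (c): marginal p(S=1)
           p_good_given_sweep = joint[1] / p_sweep                      # (d): Bayes' rule

           print(round(p_sweep_given_c[1], 4))      # about 0.2746
           print(round(joint[-1], 4))               # about 0.0107
           print(round(p_sweep, 4))                 # about 0.1568
           print(round(p_good_given_sweep, 4))      # about 0.6129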

(e)   Major League Baseball teams play a total of 162 games each season. Let G
      be the number of games the Cubs win next year. Suppose (unrealistically) we
      believe games are iid and that the Cubs will have a 60% chance to win every
      game. What is the distribution of G?


              Binomial( 162, .6 )


              (1 point for saying “Binomial”, 1 point for correct n and p)



(f)   Using our “empirical rule” approximation and under the same assumptions
      as part (e), give an interval that is (approximately) 95% likely to contain the
      number of wins the Cubs have next season.
      We know that if Y ~ Binomial(n,p), then E(Y) = np and Var(Y) = np(1-p)
       So      E(G) = 162*.6 = 97.2                 and                Var(G) = 162*.6*.4 = 38.88
              97.2 +/- 2*sqrt(38.88) =                 ( 84.73 , 109.67 )
                         Obviously this is an approximation, since you can’t win .73 or .67 of a game!!
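
              Aside: A quick Python check of parts (e) and (f):

                  from math import sqrt

                  n, p = 162, 0.6
                  mean, var = n * p, n * p * (1 - p)       # Binomial(162, .6) mean and variance
                  half = 2 * sqrt(var)
                  print(round(mean, 1), round(var, 2))                     # 97.2 and 38.88
                  print((round(mean - half, 2), round(mean + half, 2)))    # (84.73, 109.67)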
Question 4

After acing your business statistics course and reading about how to count cards
online, you decide to move to Las Vegas to gamble for a living.

Let’s suppose that you’ve gotten very good at one particular card game. Let W be a
random variable equal to your net winnings (winnings minus your original bet, in
dollars) for each hand that you play when you place a $1 bet.

Suppose that E(W) = .0125 and Var(W) = .25

If you place a bigger bet, your net winnings for that hand are b*W, where b is the
size of your bet.

If you play n times, your total net winnings are
                             T = W1 + W2 + … + Wn
where Wi is your winnings on the ith hand. Assume that your winnings on different
hands are iid, and that each Wi has the same distribution as W (defined above).


(a)   Suppose you bet $60 per hand. For each hand, what is the expected
      value and variance of your winnings?


      X = winnings when you bet $60 = 60*W
      By our linear formulas, E(X) = 60*E(W) =        0.75
                            Var(X) = (60)²*Var(W) = 900
      (1 point each)




(b)   Based on the time it takes to deal the cards and play out each hand,
      suppose that you play 50 hands per hour. Assuming you bet $60 on
      each hand, give an interval which is 95% likely to contain your total net
      winnings after one hour.
      When you play 50 hands, T = X1 + X2 + … + X50 where each Xi is iid
      and has the mean and variance from part (a).
      Therefore, E(T) = E(X1 + … + X50) = E(X1) + E(X2) + … + E(X50)
            = 50*(.75) = $37.50
      Var(T) = Var(X1 + … + X50) = Var(X1) + … + Var(X50)
            = 50*900 = 45000
      So the 95% interval is 37.50 +/- 2*sqrt(45000) = ( -386.76 , 461.76 )
(c)   Now suppose you play this game 40 hours per week for one year. (Whew,
      this is starting to sound like work). There are 52 weeks in a year. What is
      your expected income per year from gambling? [1 point]


      50 hands/hour * 40 hours/week * 52 weeks/year = you play 104,000
      hands per year
      By the same calculation as in part (b), your expected annual income is


                                 104,000*(.75) = $78,000
      [Full credit if you got 104,000 hands per year but multiplied by
      E(W)=.0125 instead of E(X)=.75]


(d)   Your friend (also a GSB student, but he hasn’t taken my class) tells you,
      “Gambling for a living sounds like fun, but doing the same thing for 40 hours a
      week is too much like working.”
      Instead of playing for 40 hours each week, he says you should play 8 hours
      per week and bet $300 on each hand. He claims this would result in the
      same expected annual income, and you’d have a lot more time to party!
      Is he right? What would change if you followed your friend’s advice? Provide
      calculations to back up your answer. [3 points]


      Your friend is correct that your expected income is the same. However,
      since you are now betting $300 on each hand, the standard deviation
      (or variance) of your income is much higher!
      [1 point for recognizing standard deviation changes, 1 point each for
      calculating the standard deviation of your income in each case.]


       In part (c), SD( annual income ) = sqrt[ 104,000*Var(X) ] = $ 9,674.71
       Y = winnings per hand when you bet $300 = 300*W
       Var(Y) = (300)²*Var(W) = 22500
       If you only play 8 hours/week, it’s now 50*8*52 = 20,800 hands per year,
       but since the variance of your winnings on each hand is much larger,
       SD( annual income ) = sqrt[ 20,800*Var(Y) ] = $ 21,633.31 !!
       Aside: What’s going on here? Well, when you bet $10 on a single hand, the variance of your winnings is
       (10)²*Var(W) = 100*Var(W), while if you play ten iid hands betting $1 each time, the variance is only 10*Var(W).
       Intuitively, when you play ten iid hands, your wins and losses will tend to “cancel out”, so variance is smaller!
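
      Aside: The comparison in parts (c) and (d) can be reproduced with the short Python
      sketch below; the function name yearly is just for illustration.

          from math import sqrt

          E_W, Var_W = 0.0125, 0.25            # per-hand net winnings on a $1 bet
          hands_per_hour = 50

          def yearly(bet, hours_per_week, weeks=52):
              # expected value and SD of annual winnings, assuming iid hands
              n = hands_per_hour * hours_per_week * weeks
              mean = n * bet * E_W             # E(b*W) = b*E(W)
              sd = sqrt(n * bet ** 2 * Var_W)  # Var(b*W) = b^2*Var(W); variances of iid hands add
              return mean, sd

          for bet, hours in [(60, 40), (300, 8)]:
              mean, sd = yearly(bet, hours)
              print(f"${bet}/hand, {hours} hrs/week: E = ${mean:,.0f}, SD = ${sd:,.2f}")
          # Same $78,000 expected income either way, but the SD more than doubles
          # with the $300 bets.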
Question 5

True or False. Clearly print either T or F in the slot ___ before each statement.
Each correct answer is worth ONE POINT.


(a) __F__                  (Adding a constant leaves sample variance unchanged!)
If we add 7 to each value of a variable in our sample, the sample variance is increased
by 49.



(b) __F__                   (Sample correlation has NO units.)
Suppose we observe a sample of people in the workforce. For each person, if x is age
in years and y is income in dollars, then the sample correlation between x and y is
measured in year-dollars.



(c) __F__                   (E(X) is a “weighted average” of outcomes, weighted by
                            probabilities.)
For a discrete random variable X, the expected value E(X) is the outcome with the
highest probability of occurring.



(d) __T__                   (Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y), while
                             Var(X-Y) = Var(X) + Var(Y) - 2Cov(X,Y))
If X and Y are random variables and have negative correlation, then the variance of
their sum, X+Y, is smaller than the variance of their difference, X-Y.



(e) __T__                 ( 1 – P(both success) – P(both failure) = 1 – .5² – .5² = .5 )
If we conduct two independent Bernoulli trials and each has a .5 probability of success,
the probability that EXACTLY ONE out of the two trials is a success is .5.



(f) __F__                  ( The error is definitely NOT independent of Y!! )
In the simple regression model, Y = α + βX + ε, the error ε is assumed independent of
both the regressor X and the dependent variable, Y.
(g) __F__                   (Prediction concerns a single outcome, so there is no
                            averaging and the CLT does not help; the plug-in predictive
                            interval still requires normal errors. See Question 2, part (h).)
In the simple regression model, suppose we are NOT willing to assume the errors ε
are normal. The 95% plug-in predictive interval, a + bX +/- 2se , should still be valid
provided we have a large enough sample.




(h) __F__     (The sampling distribution is the probability distribution of our
              estimator, not the parameter.)
In statistical inference, the sampling distribution is the probability distribution of the
unknown parameter we are trying to estimate.




(i) __T__     (Probability of being MORE than 2 SD’s from mean is < .05 )
For an unbiased estimator with a normal sampling distribution, a p-value of less than
.05 means the estimate was more than TWO standard errors away from the
hypothesized value.



(j) __T__
Suppose we just conducted a statistical test and that we rejected the null hypothesis
at the 5% level. Assuming the assumptions underlying the test were correct (for
example, the data were i.i.d.), one way to interpret the phrase “at the 5% level” is as
the probability we were wrong: that is, we were willing to admit there is a 5%
probability that we rejected the null when it was actually true.
              (The “level” of a test is often referred to as the probability of a
              “Type I error”; that is, rejecting the null when it is actually true.
              Think of it this way, we said we’d reject at the 5% level if |z|>2,
              but there IS a 5% probability that a normal r.v. could be more
              than 2 s.d.’s away from its mean by pure chance!)
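
Aside: Statements (d) and (e) are easy to check numerically; the variances and covariance
used for (d) are made-up values chosen only so that the covariance is negative.

    from itertools import product

    # (e): two independent Bernoulli(.5) trials; P(exactly one success)
    p_exactly_one = sum(0.5 * 0.5 for a, b in product([0, 1], repeat=2) if a + b == 1)
    print(p_exactly_one)                      # 0.5

    # (d): with negative covariance, Var(X+Y) < Var(X-Y)
    var_x, var_y, cov_xy = 1.0, 2.0, -0.5     # illustrative numbers only
    print(var_x + var_y + 2 * cov_xy)         # Var(X+Y) = 2.0
    print(var_x + var_y - 2 * cov_xy)         # Var(X-Y) = 4.0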
 Question 6

 Capital punishment (the practice of executing people convicted of crimes, usually
 murder) is highly controversial but still practiced in most of the United States. We are
 interested in investigating the relationship between capital punishment and violent crime.

 Suppose we have the following data:
            mrdratei = Murders per 100,000 population during a particular year
                          in a given state
            execi = Number of executions performed in the state in that year.
            unempi = Percentage unemployment rate in the state during that year

 We have data for 50 states plus the District of Columbia in three years (1987, 1990, and
 1993), for a total of 153 observations. The following table shows regression results from
 StatPro:

  Results of multiple regression for mrdrate

  Summary measures
      Multiple R                ??
      R-Square                  ??
      Adj R-Square            0.05
      StErr of Est            8.96

  ANOVA Table
      Source                    df          SS           MS             F          p-value
      Explained                  2        799.8        399.9       4.9798          0.0081
      Unexplained              150      12045.5         80.3

  Regression coefficients
                        Coefficient      Std Err
        Constant              0.35         2.69
        exec                  0.17         0.19
        unemp                 1.26         0.44


(a)   Using the regression results above, test the null hypothesis that “controlling for
      state-wide economic conditions, the presence of capital punishment has no
      impact on the murder rate”. Do you reject the null hypothesis at the 5% level?
      The statement can be translated as Ho: β1 = 0.
      So we are comparing an estimate, b1=.17, to the hypothesized value, β1o = 0
      Therefore            z = (.17 – 0)/.19 =       .895
                                                   FAIL TO REJECT
 (b)      The p-value associated with your test from part (a) is :

                   (i) .954            (ii) .396          (iii) .171      (iv) .032

          Explanation: Since the z-value is a little less than one, the p-value must be
          a little greater than .32.

 (c)      What is the R-square of this regression?
                   R2 = (Explained SS)/(Total SS) = Explained/(Explained + Unexplained)
                                       = 799.8 / ( 799.8 + 12045.5 ) = .0622

 (d)      In a given year, suppose the state of Texas performs 22 executions and
          has a statewide unemployment rate of 8.4%. Construct a 95% plug-in
          predictive interval for the number of murders per 100,000 population.
                   a + b1(22) + b2(8.4) +/- 2*se = 14.674 +/- 2*8.96
                                                            = ( -3.246 , 32.594 )
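
       Aside: A short Python check of parts (a) through (d), using the coefficients and
       ANOVA numbers from the StatPro output above (the normal approximation is used for
       the p-value).

           from math import erf, sqrt

           b_const, b_exec, b_unemp = 0.35, 0.17, 1.26
           se_exec, s_e = 0.19, 8.96

           # (a)/(b): test H0: beta_exec = 0 and its two-sided p-value
           z = (b_exec - 0) / se_exec
           p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
           print(round(z, 3), round(p_value, 3))        # z about 0.895, p-value about 0.37

           # (c): R-square from the ANOVA table
           explained, unexplained = 799.8, 12045.5
           print(round(explained / (explained + unexplained), 4))     # about 0.062

           # (d): 95% plug-in predictive interval for Texas (22 executions, 8.4% unemployment)
           fit = b_const + b_exec * 22 + b_unemp * 8.4
           print((round(fit - 2 * s_e, 3), round(fit + 2 * s_e, 3)))  # (-3.246, 32.594)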




 When I ran this regression, I also asked StatPro to output columns of Fitted Values
 and Residuals. Below are the three observations corresponding to the District of
 Columbia in 1987, 1990, and 1993 (‘9’ is the state code for D.C. in this dataset).

        state             year      mrdrate          exec         unemp       Fitted Values     Residuals
            9               87         36.2             0            6.3
            9               90         77.8             0            6.6
            9               93         78.5             0            8.5                ??               ??

 (e)      For the 1993 observation (last row), what numbers should appear in the “Fitted
          Values” and “Residuals” columns?

          Fitted value = .35 + .17(0) + 1.26(8.5) = 11.06
          Residual     = 78.5 – 11.06             = 67.44

 (f)      Our usual assumption for regression models is that the errors satisfy
                                       εi ~ iid N( 0 , σ² )
          If we believe this assumption here, about how many standard deviations
          is the residual you calculated in part (e) away from its mean?
          Residuals have mean zero, and se is our estimate of σ, so the above
          residual is approximately:
                   (67.44 – 0)/8.96 =              7.53                   s.d.’s above the mean!!
Early in the class we talked about how outliers can affect sample means and
variances. They can also have a HUGE impact in regression analysis! Below is
the scatter plot of murder rates versus unemployment. The three District of
Columbia observations we saw on the previous page are circled.


                [Scatter plot of mrdrate (vertical axis) against unemp (horizontal axis).
                The three circled D.C. observations are labeled “DC, 1987”, “DC, 1990”,
                and “DC, 1993”. Correlation = 0.240.]


 [ Note: If you’ve been listening to me all quarter, this plot would have been one of
 the FIRST things you looked at!! ]
The table below shows the same regression, but with the three D.C. observations
omitted from the sample.

  Results of multiple regression for mrdrate

  Summary measures
      Multiple R                   ??
      R-Square                     ??
      Adj R-Square             0.2086
      StErr of Est                 ??

  ANOVA Table
      Source                       df             SS              MS               F         p-value
      Explained                     2           447.8           223.9         20.634         0.0000
      Unexplained                 147          1595.0            10.9

  Regression coefficients
                        Coefficient           Std Err
        Constant              2.56              0.99
        exec                  0.30              0.07
        unemp                 0.67              0.16



(g)   What is se for the new regression?
      [Note: With the three D.C. observations omitted, there are now 150 observations.]


      se = sqrt[ Unexplained SS / (n-k-1) ] = sqrt( 1595 / 147 )                     =     3.294



(h)   Based on the new results (with D.C. omitted), construct a 95%
      confidence interval for β1, the coefficient on exec in this multiple
      regression model.


      b1 +/- 2*se(b1) = .30 +/- 2*(.07) =               ( .16, .44 )



              Note: Recognize that zero is OUTSIDE this 95% CI, so if we did the hypothesis test
              from part (a) again, we would REJECT the null that executions have no relationship
              with murder rates.
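
              Aside: A quick Python check of parts (g) and (h):

                  from math import sqrt

                  # (g): s_e from the ANOVA table with the D.C. rows dropped (n = 150, k = 2)
                  unexplained_ss, n, k = 1595.0, 150, 2
                  print(round(sqrt(unexplained_ss / (n - k - 1)), 3))           # 3.294

                  # (h): 95% confidence interval for the exec coefficient
                  b1, se_b1 = 0.30, 0.07
                  print((round(b1 - 2 * se_b1, 2), round(b1 + 2 * se_b1, 2)))   # (0.16, 0.44)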
 (i)    Now compare two sets of regression results. Remember, the only difference
        is that the three District of Columbia observations were included in the table
        at the beginning of the problem, while the regression on the previous page
        has the D.C. observations omitted.
        Suppose you went back and re-did part (a) after throwing out the D.C.
        observations. How would your answer change?
        In particular, how does your conclusion about the relationship between
        murder rates and capital punishment change when these three data points
        are excluded? Why does this happen? Briefly explain.
        Your answer to (i) is worth 3 points.


        •          If you throw out the D.C. observations and re-run the regression,
                   you would now conclude the effect of capital punishment is
                   statistically significant. Why?...
        •          Notice that D.C. has ZERO executions (no capital punishment).
                   D.C. also has an insanely LARGE murder rate in all three years,
                   particularly ’90 and ’93 (see part f).
        •          Therefore, when you discard the D.C. observations, the
                   relationship between executions and murder rate looks MORE
                   POSITIVE, (and in this case also turns out to be statistically
                   significant).
        [1 point for each statement]
            Note: BE VERY CAREFUL how you interpret this regression. First off, obviously the DC observations are
            outliers. In general, there is no clear cut answer as to whether you should throw them out, but either way
            you must be aware of the influence they have on your results (this is why I teach you to LOOK AT YOUR
            DATA)!! It’s also misleading to say in part (j) that the model “fits better”, because obviously throwing out
            extreme points will make your model look like a better fit!

            Also, remember that correlation is not causation: even if you throw DC out, you should NOT say “capital
            punishment causes murders”. It could be the case that, over time, certain states had higher violent crime
            rates to begin with, and adopted capital punishment as a way to address the problem.


(j)    How do R-square and se change when we throw out the D.C. observations,
       and why? Briefly explain.


       se is the sample standard deviation of the residuals. When you throw
       out the three large residuals, se decreases.
       R-square is (Explained SS)/(Explained SS + Unexplained SS). When
       you throw out the three large residuals, “Unexplained SS” goes down,
       and thus R-square increases.
       [1 point each; good intuitive explanations are ok.]
Question 7 (Simpson’s Paradox)


Suppose a certain university has two programs: Engineering, and Arts & Sciences.

Students who wish to attend the university must choose which program to apply to
(they cannot apply to both programs). They then are either accepted or rejected.

For each applicant, define the following random variables:
             E = 1 if the person applied to Engineering and
                            0 if they applied to Arts & Sciences
             A = 1 if the person is accepted and 0 if they are rejected
             G = 1 if the applicant is female, and 0 if male

The admissions office has supplied us with some data, which we have used to
construct the following probability model.



For female applicants (G=1),               For male applicants (G=0),
we have:                                   we have:

                        E                                          E

                   0         1                                0         1

           0      .08       .48                         0    .32       .12
      A                                             A
           1      .12       .32                         1    .48       .08


Half of all applicants are women, p( G = 1 ) = .5

(a)   Without knowing what program she applied to, what is the probability that
      a female applicant is accepted, p( A=1 | G=1 )? [Hint: The table on the
      left gives you joint probabilities for E and A given that G=1.]

      .44 (the marginal prob. of A=1 from the left-hand table)




(b)   For a male applicant (G=0), without knowing which program he applied to,
      what is the distribution of A?

      Bernoulli(.56) (the marginal prob. of A=1 from the right-hand table
      is .56; you could also have written out values (0,1) and probabilities.)
A recent study published in a major news magazine found that male applicants are more
likely to be accepted at this university than female applicants. This results in some very
unpleasant publicity for the university.

(c)   Based only on your answers to (a) and (b), could this study be correct?

      Yes, it looks like female applicants have a lower probability of being
      accepted than males.



The deans of the two programs consult with each other. Each assures the other that a
female applicant is just as likely to be accepted as a male applicant. They also both feel that the
quality of female and male applicants is comparable. They think something must be
wrong with the study.

(d)   Given that an applicant is female and applied to the engineering program, what
      is the probability she is accepted, p( A=1 | E=1, G=1 ) ?

      .32 / (.32 + .48)   = .4

      (This is just p(A=1|E=1) from the left-hand table)




(e)   Given that an applicant is male and applied to the engineering program, what
      is the probability he is accepted, p( A=1 | E=1, G=0 ) ?

      .08/(.08 + .12) = .4

      (This is just p(A=1|E=1) from the right-hand table)



(f)   Without knowing the gender of an applicant, what is the probability s/he is
      accepted into the engineering program?

      .4

      (You don’t have to do any math, since if the conditional probabilities
      are equal for both genders, the marginal must be the same.)
(g)   For the arts & sciences program, does the probability of being accepted depend
      on whether an applicant is male or female?
      (That is, are p( A=1 | E=0, G=1 ) and p( A=1 | E=0, G=0 ) the same? )

      No. Similar to parts (d) and (e):

      For women, .12/(.12 + .08) = .6
      For men, .48/(.48 + .32) = .6



(h)   Given that an applicant is female, what is the probability she applies to the
      engineering program? [1 point]

      .8   (This is just the marginal probability of E=1 from the left-hand table)



(i)   Given that an applicant is male, what is the probability he applies to the
      engineering program? [1 point]

      .2   (This is just the marginal probability of E=1 from the right-hand table)



(j)   Does this university discriminate against women? Explain. [3 points]


      No. Even though we observe in part (c) that female applicants are less
      likely to be admitted (if we don’t control for which program they applied to),
      we saw in parts (d)-(g) that each program is actually equally likely to admit
      male and female applicants!

      The reason that female applicants look less likely to be admitted is that,
      according to parts (h)-(i), a higher fraction of female applicants choose to
      apply to the Engineering program, which is less likely to accept applicants
      of both genders. Once we look at department-level admissions, we find that
      neither department discriminates.

      (1 point for saying “No” because each department is equally likely to admit
      males versus females; 2 points for saying that the difference arises because
      more female applicants choose to apply to the department that’s harder to
      get into.)
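
      Aside: The whole paradox can be reproduced by typing the two joint tables into Python
      and computing the conditional probabilities; the helper function p_accept below is just
      for illustration.

          # Joint tables as dictionaries keyed by (A, E), one per gender.
          female = {(0, 0): 0.08, (0, 1): 0.48, (1, 0): 0.12, (1, 1): 0.32}   # G = 1
          male   = {(0, 0): 0.32, (0, 1): 0.12, (1, 0): 0.48, (1, 1): 0.08}   # G = 0

          def p_accept(table, program=None):
              # P(A=1), optionally conditioning on the program E
              if program is None:
                  return sum(p for (a, e), p in table.items() if a == 1)
              num = sum(p for (a, e), p in table.items() if a == 1 and e == program)
              den = sum(p for (a, e), p in table.items() if e == program)
              return num / den

          print(round(p_accept(female), 2), round(p_accept(male), 2))        # 0.44 vs 0.56 overall
          print(round(p_accept(female, 1), 2), round(p_accept(male, 1), 2))  # 0.4 vs 0.4 in Engineering
          print(round(p_accept(female, 0), 2), round(p_accept(male, 0), 2))  # 0.6 vs 0.6 in Arts & Sciences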
Question 8                         (Test-taking tip: Do this question LAST.)


Suppose we are estimating the simple linear regression model:

                                     Yi = α + β xi + ε i
Assume that the errors are distributed               εi ~ iid N( 0 , σ² )

Suppose we are going to estimate this model using TWO observations. We have
two KNOWN x-values, x1 and x2, and we are about to observe two Y-values, Y1
and Y2. So based on what we know now, the x-values are known constants, and
the Y-values are random variables.
Define:

                                  B = ( Y2 − Y1 ) / ( x2 − x1 )

We are thinking about using B as an estimator of the slope, β. We are interested in
asking, what is the sampling distribution of this estimator?


Suppose I make the following claim:

                “B is an unbiased estimator of β with a normal sampling distribution.”


(a)   If my claim is true, what is P( B > β ) ?               [Hint: It may help you to draw a picture!]

      .5

      “Unbiased estimator with a normal sampling distribution” means that
      E(B) = β and that B is normal; in other words, the different values we could
      see for our estimator B look like a bell curve centered at β. And the
      probability a normal r.v. is bigger than its mean is .5!


(b)   Let’s say you knew that Var(B) = σB², where σB² is some number. Assuming my
      claim is true and you know what σB² is, construct a 95% confidence interval for β.

      B +/- 2*sqrt(σB²) , or equivalently                               B +/- 2*σB


           Note: I do not expect most b-stats students (even “A” students) to be able to do parts (c)-(e) in a timed
           exam situation. However, notice you can get (a) and (b) just by knowing what it means for an estimator
           to be unbiased, the definition of a sampling distribution, and how the sampling distribution is used to
           build a confidence interval!
Now let’s see if we can verify my claim about B (you shouldn’t just believe
everything somebody tells you about a strange estimator!!).


[Hint: The rest of this question is actually much easier if you do a little algebra up front.
I’ll help get you started. Since B is an estimator of β, see if you can rewrite B as
                                  B = β + “error”
where “error” depends on ε1, ε2, x1, and x2. The easiest way to do this is to start with
the formula above for B, and substitute α + βx1 + ε1 in for Y1 and α + βx2 + ε2 in for Y2.
You get:
                B = [ (α + βx2 + ε2) − (α + βx1 + ε1) ] / (x2 − x1)
Now see if you can cancel some terms and get β by itself. Also remember, when you’re
doing the problems below, x1 and x2 are known constants, while ε1 and ε2 are iid N(0, σ²)
random variables. ]


(c)    What is E(B)?

       B = [ (α + βx2 + ε2) − (α + βx1 + ε1) ] / (x2 − x1)
         = ( βx2 + ε2 − βx1 − ε1 ) / (x2 − x1)
         = [ β(x2 − x1) + (ε2 − ε1) ] / (x2 − x1)
         = β + (ε2 − ε1) / (x2 − x1)

       Therefore,      E(B) = β + E[ (ε2 − ε1) / (x2 − x1) ] = β

       Note: This verifies that “B is unbiased”.
(d)    What is Var(B)?

       Using the above expression for B,

       Var(B) = Var( β + (ε2 − ε1)/(x2 − x1) ) = Var(ε2 − ε1) / (x2 − x1)² = 2σ² / (x2 − x1)²


          Explanation: Key steps include (i) recognizing that β is a constant, so it doesn’t affect
          the variance; (ii) we know that Var(a*X) = a²*Var(X), so 1/(x2 − x1) gets squared when you
          factor it out; and (iii) Var(ε2 − ε1) = Var(ε2) + Var(ε1) − 2*Cov(ε2,ε1) = σ² + σ² = 2σ²,
          since the errors are independent (Cov = 0).
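
          Aside: If you don’t trust the algebra, a small Monte Carlo simulation confirms that
          E(B) ≈ β and Var(B) ≈ 2σ²/(x2 − x1)². The values of α, β, σ, x1, and x2 below are
          made up purely for illustration.

              import random, statistics as st

              random.seed(0)
              alpha, beta, sigma = 1.0, 2.0, 3.0
              x1, x2 = 0.2, 0.9

              def draw_B():
                  # simulate Y1 and Y2 from the regression model, then form the estimator B
                  y1 = alpha + beta * x1 + random.gauss(0, sigma)
                  y2 = alpha + beta * x2 + random.gauss(0, sigma)
                  return (y2 - y1) / (x2 - x1)

              draws = [draw_B() for _ in range(200_000)]
              print(round(st.fmean(draws), 2))             # close to beta = 2
              print(round(st.pvariance(draws), 1))         # close to 2*sigma^2/(x2-x1)^2, about 36.7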
(e)   Suppose that 0 ≤ xi ≤ 1; that is, both values x1 and x2 must be between zero
      and one. However, you get to pick the x-values in advance (then you get a
      Y-value for each x, and plug them into our estimator, B).

      If your goal is to estimate β as accurately as possible, what values should
      you choose for x1 and x2 ?

      Since B is unbiased, “estimate β as accurately as possible” means we
      want Var(B) to be as small as possible.

      Therefore, from part (d), since (x2-x1)2 shows up in the denominator of
      the variance, we want x2 and x1 as far apart as we can get. So you
      should choose x1=0 and x2=1 (or x1=1, x2=0).

        (Of course, we’d also like σ2 to be as small as possible, but that’s usually not something we
        have control over!)


(f)   Would my claim still be correct if we were NOT willing to assume the errors εi
      were normally distributed? Briefly explain.

      [Note: It is possible to get full credit for part (f) even if you don’t get parts (c)
      through (e). ]


      No. E(B) and Var(B) would still be the same (in particular, B would still be
      unbiased), but the sampling distribution would NOT be normal. When the
      errors are normal, B is normal because it is a linear combination of two
      normal r.v’s ε1 and ε2. If the errors aren’t normal, B will not be normal (the
      Central Limit Theorem won’t save us here, because there are only TWO
      observations!).