Business Statistics 41000
Autumn 2006 Final Exam
DO NOT TURN THIS PAGE OVER UNTIL YOU ARE TOLD TO DO SO.
You have 3 hours to complete the exam. When time is called, please stop writing.
The layout of the exam, including the number of questions and the point
value of each question, is on the next page. Unless otherwise indicated,
each part of each question is worth 2 points.
You may use a calculator and two 8.5 by 11 inch “cheat sheets”. No other
reference materials are allowed.
Please show your work and clearly indicate your answer in the space
provided. You may be awarded partial credit in case of arithmetic errors or
incomplete answers, but only if your work is legible. Unsupported answers
(e.g., just writing “fail to reject”) receive zero credit.
Students in my class are required to adhere to the standards of conduct in the GSB
Honor Code and the GSB Standards of Scholarship. The GSB Honor Code also
requires students to sign the following GSB Honor pledge:
I pledge my honor that I have not violated the Honor Code during this examination. I
further understand that discussing the contents of this exam with anyone prior to all
students completing the exam is a violation of the Honor Code.
Sign here to acknowledge: _____________________________________
There are 8 questions.
Question 1, 10 parts, 20 points _____
Question 2, 9 parts, 18 points _____
Question 3, 6 parts, 12 points _____
Question 4, 4 parts, 8 points _____
Question 5, 10 T/F questions, 10 points _____
Question 6, 10 parts, 21 points _____
Question 7, 10 parts, 19 points _____
Question 8, 6 parts, 12 points _____
Total 120 points
Mean 83    Median 86    Std. Dev. 16    75th %tile 95    25th %tile 71

Note to Autumn 2006 students: Because this was a challenging exam, the graders
asked (and I approved) granting of ½-point partial credit for certain questions.
Actual credit assigned may therefore deviate slightly from the point allocations
indicated in these solutions.
Below is a scatter plot.
Each observation corresponds to an NFL football game. Before each game, one
team is considered the favorite (the team considered more likely to win) and the
other the underdog.
Before each game, oddsmakers set a number called the point spread. Suppose
you place a bet that the favorite will win the game. To win your bet, the favorite
must “beat the spread”. That is, they must beat the underdog by more points than the spread.
On the horizontal axis below is the spread, set before the game. On the vertical
axis is diff, which is the points scored by the favorite minus points scored by the
underdog during the actual game (a positive value for diff means the favorite won).
[Scatter plot of diff versus spread, with solution annotations: a line drawn
through the plot by hand would have slope about 1 and intercept about zero,
with about 95% of points within +/- 2*(14) points of the line. Only TWO teams
had spread > 10 before the game. About nine favorites lost by 30 or more points.]
(a) In this sample, about how many times did the favorite lose by 30 or more points?
(i) 1 (ii) 5 (iii) 9 (iv) 21
(b) In this sample, about how many teams favored by 10 or more points (spread > 10)
lost the actual game?
(i) 0 (ii) 2 (iii) 7 (iv) 17
(c) The sample mean of diff is:
(i) positive (ii) negative (iii) about zero
(d) Which of the two variables has a larger sample variance?
(i) spread (ii) diff (iii) their variances
are roughly equal
(e) The sample correlation between spread and diff is closest to
(i) -0.65 (ii) -0.10 (iii) 0.25 (iv) 0.89
(f) In a regression of diff on spread, the intercept estimate, a, is closest to
(i) -40 (ii) 0 (iii) 30 (iv) 50
(g) In a regression of diff on spread, the slope estimate, b, is closest to
(i) -3 (ii) 0 (iii) 1 (iv) 3
(h) In a regression of diff on spread, the estimated standard deviation of the errors,
se , is approximately
(i) -7 (ii) 1 (iii) 14 (iv) 28
Now suppose we believe this data is representative of the “true” relationship between
point spreads and actual scores in NFL football games (the “population”). Also suppose
we are willing to assume the errors are iid Normal.
Suppose the Chicago Bears are favored by 14 points in this week’s game (spread = 14).
(i) What is the 95% plug-in predictive interval for the score difference in the actual
game (Bears’ points minus opponent’s points)?
a + b*(spread) +/- 2*se = 14 +/- 2*(14) = (-14, 42)
(j) If we believe our estimates for a, b, and se are correct and the errors are iid
Normal, what is the (approximate) probability the Bears win the game?
If we believe our model, diff ~ N( 14 , 14² )
So Prob( diff > 0 )
= Prob( a normal RV falls above one SD below its mean ) ≈ .84
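As a quick sanity check, the interval and win probability above can be reproduced with Python’s standard library; a = 0, b = 1, and se = 14 are the rough plug-in estimates read off the scatter plot:

```python
from statistics import NormalDist

# Rough plug-in estimates read off the scatter plot: intercept a, slope b, se
a, b, se = 0.0, 1.0, 14.0
spread = 14

mean_diff = a + b * spread                        # predicted diff for the Bears
lo, hi = mean_diff - 2 * se, mean_diff + 2 * se   # 95% plug-in predictive interval
print((lo, hi))                                   # (-14.0, 42.0)

# (j) P(Bears win) = P(diff > 0) under diff ~ N(14, 14^2)
p_win = 1 - NormalDist(mean_diff, se).cdf(0)
print(round(p_win, 2))                            # 0.84
```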
The country returns dataset we’ve used this quarter consists of monthly returns on
portfolios of assets traded on major stock exchanges in various countries. Below are
the summary statistics for the Germany portfolio.
Summary measures for selected variables (Germany: mean = 0.0129, variance = 0.0031, n = 107)
(a) Construct a 95% confidence interval for the “true” expected return on the Germany portfolio.
0.0129 +/- 2*[ sqrt( .0031 / 107 ) ]
= ( .002135, .02367 )
(b) Construct a 95% plug-in predictive interval for the next monthly German return.
0.0129 +/- 2*sqrt(.0031) = ( -0.09846 , 0.1243 )
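Both intervals are simple arithmetic on the summary statistics; a sketch in Python, using the mean 0.0129, variance 0.0031, and n = 107 from the table:

```python
from math import sqrt

# Germany summary statistics used above: sample mean, sample variance, n
xbar, s2, n = 0.0129, 0.0031, 107

# (a) 95% confidence interval for the true expected return: xbar +/- 2*s/sqrt(n)
ci = (xbar - 2 * sqrt(s2 / n), xbar + 2 * sqrt(s2 / n))
print(ci)   # ≈ (0.00213, 0.02367)

# (b) 95% plug-in predictive interval for one future month: xbar +/- 2*s
pi = (xbar - 2 * sqrt(s2), xbar + 2 * sqrt(s2))
print(pi)   # ≈ (-0.0985, 0.1243)
```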
(c) Suppose we want to test the claim that:
“In any given month, there is a 50% chance that the Germany portfolio
has a higher return than the France portfolio.”
During our 107 month sample, there were 48 months in which the Germany
portfolio had a higher return than the France portfolio. Test the appropriate null
hypothesis at the 5% level.
po = .5 phat = 48/107 = .4486
z = [ (.4486 - .5) / sqrt( .5*(1-.5)/ 107 ) ] = -1.063
FAIL TO REJECT
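The same one-proportion z-test, sketched in Python:

```python
from math import sqrt

# H0: p = .5, tested against the sample above: 48 of 107 months
p0, n, successes = 0.5, 107, 48
phat = successes / n
z = (phat - p0) / sqrt(p0 * (1 - p0) / n)
print(round(z, 3))      # -1.063

reject = abs(z) > 2     # empirical-rule cutoff for a 5%-level two-sided test
print(reject)           # False -> FAIL TO REJECT
```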
Below are the results from two statistical tests run on the German data in StatPro.
On the left is the “Runs Test for Randomness”. In the Runs Test, the null
hypothesis is that the data are iid.
On the right is the “Chi-Square Test of Normality”. In this test, the null hypothesis is
that the data are normally distributed.
Runs Test Results for germany          Test of normal fit
Number of obs          107             Chi-square statistic   18.131
Number above cutoff     60             p-value                 0.034
Number below cutoff     47
Number of runs          59
p-value (2-tailed)   0.297
In class we’ve talked about the iid Normal model for stock returns. Naturally, this
entails two assumptions: (1) returns are iid,
and (2) returns are normally distributed.
Using the tests above is one way to check that these assumptions are reasonable
based on the returns we have observed.
(d) For the Runs Test, do you reject the null hypothesis at the 5% level? What
does this test tell you about our assumptions ((1) and/or (2))? Give a brief
explanation.
The p-value for the runs test is >.05 so we FAIL TO REJECT. (1 point)
This tells you that based on the data, we don’t have definitive evidence
that the returns are NOT iid. (1 point)
(e) For the Chi-Square Test of Normality, do you reject the null hypothesis at the
5% level? What does this test tell you about our assumptions ((1) and/or (2))?
Give a brief explanation.
The p-value for the normality test is <.05 so we REJECT. (1 point)
This tells you that based on the data, we have evidence that the
Germany returns are NOT normally distributed.
(f) Which of the time series plots below shows the Germany returns? C
(Answer A, B, or C)
(The results of the runs test suggest the data should look iid; one of the
other two series is clearly not iid. All of the stock return data we’ve
studied this quarter has been continuous; the other series is discrete.)
Each of the statistical tools we’ve developed this quarter depends on a set of
assumptions we make about the data. If those assumptions are violated, the results
you get can be very misleading.
(g) Does the confidence interval you constructed in part (a) require assumption
(1), (2), both, or neither? Based on your answers to parts (d) and (e), is this
confidence interval valid? Briefly explain.
The confidence interval requires (1), but not (2). (1 point)
Since we can’t reject that the data are iid, the confidence interval is
probably ok. (1 point)
(Note: The reason we don’t need (2) for the confidence interval is the
Central Limit Theorem, which tells us that if we have a reasonably-sized
sample and the data are iid, xbar will be normal regardless of whether
the individual observations in our data are normal.)
(h) Does the plug-in predictive interval you constructed in part (b) require
assumption (1), (2), both, or neither? Based on your answers to parts (d) and
(e), is this predictive interval valid? Briefly explain.
The plug-in predictive interval requires BOTH (1) and (2). (1 point)
Since in part (e), we rejected the null hypothesis that the data are
normally distributed, our predictive interval is likely NOT valid. (1 point)
(Note: The reason you actually NEED normality for the predictive
interval is that we’re trying to predict one single outcome. So there’s no
“averaging” going on, and the CLT doesn’t save you!)
(i) Would your answers to (g) and/or (h) change if we have 7 observations
instead of 107? Again give a brief explanation.
Yes: The confidence interval in part (g) would no longer be valid.
This is because we need a reasonably large sample to apply the Central
Limit Theorem, and n=7 observations is not enough.
(1 point, to get credit you must mention the Central Limit Theorem)
On April 2, 2007, the Chicago Cubs will open their season with a three game series
against the Cincinnati Reds.
Like a lot of baseball fans, I am not sure how good the Cubs will be next year.
Suppose I think there are three possibilities:
C = -1 if the Cubs are a BAD team
= 0 if the Cubs are an AVERAGE team
= 1 if the Cubs are a GOOD team
Based on what I know right now, I assign
the following probability distribution for C:

     c      -1      0      1
   p(c)    0.25   0.40   0.35
Suppose I’m sure the Reds will be an average team next season. If the Cubs are
also an average team, they have a 50% chance to win each game the two teams
play. If the Cubs are good this goes up to 65%, while if they are bad it is only 35%.
Note that these probabilities are for EACH game the two teams play.
Also suppose I am willing to assume that outcomes in different games are iid.
(a) Let S=1 if the Cubs sweep their season opening-series against the Reds
(meaning they win all three games). If we assumed the Cubs are a good
team, what is the probability they sweep the series, p(S=1|C=1) ?
.65³ = .274625
Explanation: Since games are iid and given the Cubs are good, they have a
.65 probability of winning each game, so the probability of three wins in a
row is .65³.
(Note: Please don’t report answers to six decimal places when you’re actually taking the exam!! I’m
using Excel to write these solutions and am reporting all six decimal places to avoid rounding errors.
In practice, you will not be counted off for reasonable rounding errors when your exam is graded.)
(b) What is the probability the Cubs are a bad team AND they sweep the series,
p(S=1, C=-1)?
Similar to part (a), p(S=1|C=-1) = .35³ = .042875
So p(S=1, C=-1) = p(S=1|C=-1)*p(C=-1) = .042875*.25 = .010719
(c) What is the marginal probability the Cubs sweep the series, p(S=1)?
[Hint: It may help you to write out the joint distribution of C and S in our
usual two-way table format on a separate sheet, but you don’t have to.]
Here’s the relevant row of the joint distribution:

              C = -1      C = 0      C = 1
   S = 1     0.010719     0.05      0.096119       pS(1) = 0.156838

Explanation: I filled in the “S=1 row” of the joint table similarly to
part (b), then added to find the marginal probability of S=1.
(d) Suppose the Cubs do sweep the series with the Reds. What is the
probability they are a good team, p(C=1|S=1)?
By definition, “Conditional = Joint/Marginal”, or in this case
p(C=1|S=1) = p(C=1,S=1) / p(S=1)
From the table above, p(C=1|S=1) = (0.096119/.156838) = .6129
Intuition: One way to interpret conditional probability is how the probabilities we assign would change
based on observed outcomes. Going into the series, we thought there was a 35% chance the Cubs
were good. Sweeping the series is a favorable indicator of the Cubs’ ability, since it’s much more likely a
good team would sweep than a bad team. So we now think there’s a 61% chance the Cubs are good!
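The whole calculation in parts (a)-(d) is one pass of “joint = conditional × prior, then normalize”; a sketch in Python, using the prior for C given in the problem (p(-1)=.25, p(0)=.40, p(1)=.35) and the per-game win probabilities:

```python
# Prior for Cubs quality C and per-game win probabilities, from the problem
prior = {-1: 0.25, 0: 0.40, 1: 0.35}
p_game = {-1: 0.35, 0: 0.50, 1: 0.65}

# Joint p(S=1, C=c): three iid wins given C, times the prior on C
joint = {c: p_game[c] ** 3 * prior[c] for c in prior}

p_sweep = sum(joint.values())         # marginal p(S=1)
posterior_good = joint[1] / p_sweep   # p(C=1 | S=1): "Conditional = Joint/Marginal"

print(p_sweep)          # ≈ 0.156838, matching pS(1) above
print(posterior_good)   # ≈ 0.6129
```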
(e) Major League Baseball teams play a total of 162 games each season. Let G
be the number of games the Cubs win next year. Suppose (unrealistically) we
believe games are iid and that the Cubs will have a 60% chance to win every
game. What is the distribution of G?
Binomial( 162, .6 )
(1 point for saying “Binomial”, 1 point for correct n and p)
(f) Using our “empirical rule” approximation and under the same assumptions
as part (e), give an interval that is (approximately) 95% likely to contain the
number of wins the Cubs have next season.
We know that if Y ~ Binomial(n,p), then E(Y) = np and Var(Y) = np(1-p)
So E(G) = 162*.6 = 97.2 and Var(G) = 162*.6*.4 = 38.88
97.2 +/- 2*sqrt(38.88) = ( 84.73 , 109.67 )
Obviously this is an approximation, since you can’t win .73 or .67 of a game!!
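A quick check of the binomial mean, variance, and empirical-rule interval in Python:

```python
from math import sqrt

# G ~ Binomial(162, .6): mean, variance, and empirical-rule 95% interval
n, p = 162, 0.6
mean, var = n * p, n * p * (1 - p)
lo, hi = mean - 2 * sqrt(var), mean + 2 * sqrt(var)

print(round(mean, 1), round(var, 2))   # 97.2 38.88
print(round(lo, 2), round(hi, 2))      # 84.73 109.67
```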
After acing your business statistics course and reading about how to count cards
online, you decide to move to Las Vegas to gamble for a living.
Let’s suppose that you’ve gotten very good at one particular card game. Let W be a
random variable equal to your net winnings (winnings minus your original bet, in
dollars) for each hand that you play when you place a $1 bet.
Suppose that E(W) = .0125 and Var(W) = .25
If you place a bigger bet, your net winnings for that hand are b*W, where b is the
size of your bet.
If you play n times, your total net winnings are
T = W1 + W2 + … + Wn
where Wi is your winnings on the ith hand. Assume that your winnings on different
hands are iid, and that each Wi has the same distribution as W (defined above).
(a) Suppose you bet $60 per hand. For each hand, what is the expected
value and variance of your winnings?
X = winnings when you bet $60 = 60*W
By our linear formulas, E(X) = 60*E(W) = 0.75
Var(X) = (60)²Var(W) = 3600*(.25) = 900
(1 point each)
(b) Based on the time it takes to deal the cards and play out each hand,
suppose that you play 50 hands per hour. Assuming you bet $60 on
each hand, give an interval which is 95% likely to contain your total net
winnings after one hour.
When you play 50 hands, T = X1 + X2 + … + X50 where each Xi is iid
and has the mean and variance from part (a).
Therefore, E(T) = E(X1 + … + X50) = E(X1) + E(X2) + … + E(X50)
= 50*(.75) = $37.50
Var(T) = Var(X1 + … + X50) = Var(X1) + … + Var(X50)
= 50*900 = 45000
So the 95% interval is 37.50 +/- 2*sqrt(45000) = ( -386.76 , 461.76 )
(c) Now suppose you play this game 40 hours per week for one year. (Whew,
this is starting to sound like work). There are 52 weeks in a year. What is
your expected income per year from gambling? [1 point]
50 hands/hour * 40 hours/week * 52 weeks/year = you play 104,000
hands per year
By the same calculation as in part (b), your expected annual income is
104,000*(.75) = $78,000
[Full credit if you got 104,000 hands per year but multiplied by
E(W)=.0125 instead of E(X)=.75]
(d) Your friend (also a GSB student, but he hasn’t taken my class) tells you,
“Gambling for a living sounds like fun, but doing the same thing for 40 hours a
week is too much like working.”
Instead of playing for 40 hours each week, he says you should play 8 hours
per week and bet $300 on each hand. He claims this would result in the
same expected annual income, and you’d have a lot more time to party!
Is he right? What would change if you followed your friend’s advice? Provide
calculations to back up your answer. [3 points]
Your friend is correct that your expected income is the same. However,
since you are now betting $300 on each hand, the standard deviation
(or variance) of your income is much higher!
[1 point for recognizing standard deviation changes, 1 point each for
calculating the standard deviation of your income in each case.]
In part (c), SD( annual income ) = sqrt[ 104,000*Var(X) ] = $ 9,674.71
Y = winnings per hand when you bet $300 = 300*W
Var(Y) = (300)²Var(W) = 90,000*(.25) = 22,500
If you only play 8 hours/week, it’s now 50*8*52 = 20,800 hands per year,
but since the variance of your winnings on each hand is much larger,
SD( annual income ) = sqrt[ 20,800*Var(Y) ] = $ 21,633.31 !!
Aside: What’s going on here? Well, when you bet $10 on a single hand, the variance of your winnings is
(10)²Var(W) = 100*Var(W), while if you play ten iid hands betting $1 each time, the variance is only 10*Var(W).
Intuitively, when you play ten iid hands, your wins and losses will tend to “cancel out”, so variance is smaller!
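The comparison in parts (c) and (d) can be organized in a few lines; a sketch in Python (the helper name `annual` is just for illustration):

```python
from math import sqrt

# Per-$1-bet net winnings W, from the problem
E_W, Var_W = 0.0125, 0.25
hands_per_hour = 50

def annual(bet, hours_per_week, weeks=52):
    """Mean and SD of total annual winnings for iid hands at a fixed bet size."""
    n = hands_per_hour * hours_per_week * weeks
    mean = n * bet * E_W               # E(b*W) = b*E(W), summed over n hands
    sd = sqrt(n * bet ** 2 * Var_W)    # Var(b*W) = b^2*Var(W), summed over n hands
    return mean, sd

m1, s1 = annual(60, 40)    # part (c): $60 bets, 40 hours/week
m2, s2 = annual(300, 8)    # friend's plan: $300 bets, 8 hours/week
print(round(m1), round(s1, 2))   # 78000 9674.71
print(round(m2), round(s2, 2))   # 78000 21633.31
```

Same expected income, but the SD more than doubles under the friend’s plan.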
True or False. Clearly print either T or F in the slot ___ before each statement.
Each correct answer is worth ONE POINT.
(a) __F__ (Adding a constant leaves sample variance unchanged!)
If we add 7 to each value of a variable in our sample, the sample variance is increased
(b) __F__ (Sample correlation has NO units.)
Suppose we observe a sample of people in the workforce. For each person, if x is age
in years and y is income in dollars, then the sample correlation between x and y is
measured in year-dollars.
(c) __F__ (E(X) is a “weighted average” of outcomes, weighted by their probabilities.)
For a discrete random variable X, the expected value E(X) is the outcome with the
highest probability of occurring.
(d) __T__ (Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y), while
Var(X-Y) = Var(X) + Var(Y) - 2Cov(X,Y))
If X and Y are random variables and have negative correlation, then the variance of
their sum, X+Y, is smaller than the variance of their difference, X-Y.
(e) __T__ ( 1 – P(both success) – P(both failure) = 1 - .5² - .5² = .5 )
If we conduct two independent Bernoulli trials and each has a .5 probability of success,
the probability that EXACTLY ONE out of the two trials is a success is .5.
(f) __F__ ( The error is definitely NOT independent of Y!! )
In the simple regression model, Y = α + βX + ε, the error ε is assumed independent of
both the regressor X and the dependent variable, Y.
(g) __F__ (Predicting a single outcome requires normality; the CLT does not apply.)
In the simple regression model, suppose we are NOT willing to assume the errors ε
are normal. The 95% plug-in predictive interval, a + bX +/- 2se , should still be valid
provided we have a large enough sample.
(h) __F__ (The sampling distribution is the probability distribution of our
estimator, not the parameter.)
In statistical inference, the sampling distribution is the probability distribution of the
unknown parameter we are trying to estimate.
(i) __T__ (Probability of being MORE than 2 SD’s from mean is < .05 )
For an unbiased estimator with a normal sampling distribution, a p-value of less than
.05 means the estimate was more than TWO standard errors away from the
hypothesized value.
(j) __T__
Suppose we just conducted a statistical test and that we rejected the null hypothesis
at the 5% level. Assuming the assumptions underlying the test were correct (for
example, the data were i.i.d.), one way to interpret the phrase “at the 5% level” is as
the probability we were wrong: that is, we were willing to admit there is a 5%
probability that we rejected the null when it was actually true.
(The “level” of a test is often referred to as the probability of a
“Type I error”; that is, rejecting the null when it is actually true.
Think of it this way, we said we’d reject at the 5% level if |z|>2,
but there IS a 5% probability that a normal r.v. could be more
than 2 s.d.’s away from its mean by pure chance!)
Capital punishment (the practice of executing people convicted of crimes, usually
murder) is highly controversial but still practiced in most of the United States. We are
interested in investigating the relationship between capital punishment and violent crime.
Suppose we have the following data:
mrdrate_i = Murders per 100,000 population during a particular year
in a given state
exec_i = Number of executions performed in the state in that year
unemp_i = Percentage unemployment rate in the state during that year
We have data for 50 states plus the District of Columbia in three years (1987, 1990, and
1993), for a total of 153 observations. The following table shows regression results from StatPro.
Results of multiple regression for mrdrate

Multiple R        ??
Adj R-Square      0.05
StErr of Est      8.96

Source         df        SS        MS        F      p-value
Explained       2       799.8     399.9    4.9798   0.0081
Unexplained   150     12045.5      80.3

              Coefficient   Std Err
Constant          0.35       2.69
exec              0.17       0.19
unemp             1.26       0.44
(a) Using the regression results above, test the null hypothesis that “controlling for
state-wide economic conditions, the presence of capital punishment has no
impact on the murder rate”. Do you reject the null hypothesis at the 5% level?
The statement can be translated as Ho: β1 = 0.
So we are comparing an estimate, b1=.17, to the hypothesized value, β1o = 0
Therefore z = (.17 – 0)/.19 = .895
FAIL TO REJECT
(b) The p-value associated with your test from part (a) is:

(i) .954    (ii) .396    (iii) .171    (iv) .032

Answer: (ii). Explanation: Since the z-value is a little less than one, the
p-value must be a little greater than the two-tailed probability for |z| = 1,
which is about .32.
(c) What is the R-square of this regression?
R2 = (Explained SS)/(Total SS) = Explained/(Explained + Unexplained)
= 799.8 / ( 799.8 + 12045.5 ) = .0622
(d) In a given year, suppose the state of Texas performs 22 executions and
has a statewide unemployment rate of 8.4%. Construct a 95% plug-in
predictive interval for the number of murders per 100,000 population.
a + b1(22) + b2(8.4) +/- 2*se = 14.674 +/- 2*8.96
= ( -3.246 , 32.594 )
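The prediction arithmetic, sketched in Python using the coefficients from the table:

```python
# Coefficients and se from the regression table above
a, b1, b2, se = 0.35, 0.17, 1.26, 8.96

# Texas: 22 executions, 8.4% unemployment
pred = a + b1 * 22 + b2 * 8.4
lo, hi = pred - 2 * se, pred + 2 * se

print(round(pred, 3))               # 14.674
print(round(lo, 3), round(hi, 3))   # -3.246 32.594
```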
When I ran this regression, I also asked StatPro to output columns of Fitted Values
and Residuals. Below are the three observations corresponding to the District of
Columbia in 1987, 1990, and 1993 (‘9’ is the state code for D.C. in this dataset).
state   year   mrdrate   exec   unemp   Fitted Values   Residuals
  9      87      36.2      0     6.3
  9      90      77.8      0     6.6
  9      93      78.5      0     8.5          ??            ??
(e) For the 1993 observation (last row), what numbers should appear in the “Fitted
Values” and “Residuals” columns?
Fitted value = .35 + .17(0) + 1.26(8.5) = 11.06
Residual = 78.5 – 11.06 = 67.44
(f) Our usual assumption for regression models is that the errors satisfy
εi ~ iid N( 0, σ² )
If we believe this assumption here, about how many standard deviations
is the residual you calculated in part (e) away from its mean?
Residuals have mean zero, and se is our estimate of σ, so the above
residual is approximately:
(67.44 – 0)/8.96 = 7.53 s.d.’s above the mean!!
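Fitted value, residual, and the standardized residual for the D.C. 1993 row, sketched in Python:

```python
# D.C. 1993 row: exec = 0, unemp = 8.5, actual mrdrate = 78.5
a, b1, b2, se = 0.35, 0.17, 1.26, 8.96

fitted = a + b1 * 0 + b2 * 8.5
residual = 78.5 - fitted
print(round(fitted, 2), round(residual, 2))   # 11.06 67.44

# Under eps_i ~ iid N(0, sigma^2), residuals have mean 0 and SD estimated by se
print(round(residual / se, 2))                # 7.53 SDs above the mean
```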
Early in the class we talked about how outliers can affect sample means and
variances. They can also have a HUGE impact in regression analysis! Below is
the scatter plot of murder rates versus unemployment. The three District of
Columbia observations we saw on the previous page are circled.
[Scatter plot of mrdrate versus unemp, correlation = 0.240. The three circled
points far above the rest of the data are the D.C. observations (labeled “DC,
1990” and “DC, 1987” in the plot).]
[ Note: If you’ve been listening to me all quarter, this plot would have been one of
the FIRST things you looked at!! ]
The table below shows the same regression, but with the three D.C. observations
omitted from the sample.
Results of multiple regression for mrdrate (D.C. observations omitted)

Multiple R        ??
Adj R-Square      0.2086
StErr of Est      ??

Source         df        SS        MS        F      p-value
Explained       2       447.8     223.9    20.634   0.0000
Unexplained   147      1595.0      10.9

              Coefficient   Std Err
Constant          2.56       0.99
exec              0.30       0.07
unemp             0.67       0.16
(g) What is se for the new regression?
[Note: With the three D.C. observations omitted, there are now 150 observations.]
se = sqrt[ Unexplained SS / (n-k-1) ] = sqrt( 1595 / 147 ) = 3.294
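Both (g) and the confidence interval in part (h), sketched in Python from the table values:

```python
from math import sqrt

# (g) se from the ANOVA table: n = 150 observations, k = 2 regressors
unexplained_ss, n, k = 1595.0, 150, 2
se = sqrt(unexplained_ss / (n - k - 1))
print(round(se, 3))   # 3.294

# (h) 95% CI for the exec coefficient: b1 +/- 2*SE(b1)
b1, se_b1 = 0.30, 0.07
print(round(b1 - 2 * se_b1, 2), round(b1 + 2 * se_b1, 2))   # 0.16 0.44
```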
(h) Based on the new results (with D.C. omitted), construct a 95%
confidence interval for β1, the coefficient on exec in this multiple
regression.
b1 +/- 2*se(b1) = .30 +/- 2*(.07) = ( .16, .44 )
Note: Recognize that zero is OUTSIDE this 95% CI, so if we did the hypothesis test
from part (a) again, we would REJECT the null that executions have no relationship
with murder rates.
(i) Now compare two sets of regression results. Remember, the only difference
is that the three District of Columbia observations were included in the table
at the beginning of the problem, while the regression on the previous page
has the D.C. observations omitted.
Suppose you went back and re-did part (a) after throwing out the D.C.
observations. How would your answer change?
In particular, how does your conclusion about the relationship between
murder rates and capital punishment change when these three data points
are excluded? Why does this happen? Briefly explain.
Your answer to (i) is worth 3 points.
• If you throw out the D.C. observations and re-run the regression,
you would now conclude the effect of capital punishment is
statistically significant. Why?...
• Notice that D.C. has ZERO executions (no capital punishment).
D.C. also has an insanely LARGE murder rate in all three years,
particularly ’90 and ’93 (see part f).
• Therefore, when you discard the D.C. observations, the
relationship between executions and murder rate looks MORE
POSITIVE (and in this case also turns out to be statistically
significant).
[1 point for each statement]
Note: BE VERY CAREFUL how you interpret this regression. First off, obviously the DC observations are
outliers. In general, there is no clear cut answer as to whether you should throw them out, but either way
you must be aware of the influence they have on your results (this is why I teach you to LOOK AT YOUR
DATA)!! It’s also misleading to say in part (j) that the model “fits better”, because obviously throwing out
extreme points will make your model look like a better fit!
Also, remember that correlation is not causation: even if you throw DC out, you should NOT say “capital
punishment causes murders”. It could be the case that, over time, certain states had higher violent crime
rates to begin with, and adopted capital punishment as a way to address the problem.
(j) How do R-square and se change when we throw out the D.C. observations,
and why? Briefly explain.
se is the sample standard deviation of the residuals. When you throw
out the three large residuals, se decreases.
R-square is (Explained SS)/(Explained SS + Unexplained SS). When
you throw out the three large residuals, “Unexplained SS” goes down,
and thus R-square increases.
[1 point each; good intuitive explanations are ok.]
Question 7 (Simpson’s Paradox)
Suppose a certain university has two programs: Engineering, and Arts & Sciences.
Students who wish to attend the university must choose which program to apply to
(they cannot apply to both programs). They then are either accepted or rejected.
For each applicant, define the following random variables:
E = 1 if the person applied to Engineering and
0 if they applied to Arts & Sciences
A = 1 if the person is accepted and 0 if they are rejected
G = 1 if the applicant is female, and 0 if male
The admissions office has supplied us with some data, which we have used to
construct the following probability model.
For female applicants (G=1), we have:       For male applicants (G=0), we have:

           E = 0    E = 1                              E = 0    E = 1
  A = 0     .08      .48                     A = 0      .32      .12
  A = 1     .12      .32                     A = 1      .48      .08
Half of all applicants are women, p( G = 1 ) = .5
(a) Without knowing what program she applied to, what is the probability that
a female applicant is accepted, p( A=1 | G=1 )? [Hint: The table on the
left gives you joint probabilities for E and A given that G=1.]
.44 (the marginal prob. of A=1 from the left-hand table)
(b) For a male applicant (G=0), without knowing which program he applied to,
what is the distribution of A?
Bernoulli(.56) (the marginal prob. of A=1 from the right-hand table
is .56; you could also have written out values (0,1) and probabilities.)
A recent study published in a major news magazine found that male applicants are more
likely to be accepted at this university than female applicants. This results in some very
unpleasant publicity for the university.
(c) Based only on your answers to (a) and (b), could this study be correct?
Yes, it looks like female applicants have a lower probability of being
accepted than males.
The deans of the two programs consult with each other. Each assures the other that a
female applicant is just as likely to be accepted as a male. They also both feel that the
quality of female and male applicants is comparable. They think something must be
wrong with the study.
(d) Given that an applicant is female and applied to the engineering program, what
is the probability she is accepted, p( A=1 | E=1, G=1 ) ?
.32 / (.32 + .48) = .4
(This is just p(A=1|E=1) from the left-hand table)
(e) Given that an applicant is male and applied to the engineering program, what
is the probability he is accepted, p( A=1 | E=1, G=0 ) ?
.08/(.08 + .12) = .4
(This is just p(A=1|E=1) from the right-hand table)
(f) Without knowing the gender of an applicant, what is the probability s/he is
accepted into the engineering program?
.4
(You don’t have to do any math, since if the conditional probabilities
are equal for both genders, the marginal must be the same.)
(g) For the arts & sciences program, does the probability of being accepted depend
on whether an applicant is male or female?
(That is, are p( A=1 | E=0, G=1 ) and p( A=1 | E=0, G=0 ) the same? )
No. Similar to parts (d) and (e):
For women, .12/(.12 + .08) = .6
For men, .48/(.48 + .32) = .6
(h) Given that an applicant is female, what is the probability she applies to the
engineering program? [1 point]
.8 (This is just the marginal probability of E=1 from the left-hand table)
(i) Given that an applicant is male, what is the probability he applies to the
engineering program? [1 point]
.2 (This is just the marginal probability of E=1 from the right-hand table)
(j) Does this university discriminate against women? Explain. [3 points]
No. Even though we observe in part (c) that female applicants are less
likely to be admitted (if we don’t control for which program they applied to),
we saw in parts (d)-(g) that each program is actually equally likely to admit
male and female applicants!
The reason that female applicants look less likely to be admitted is that,
according to parts (h)-(i), a higher fraction of female applicants choose to
apply to the Engineering program, which is less likely to accept applicants
of both genders. Once we look at department-level admissions, we find that
neither department discriminates.
(1 point for saying “No” because each department is equally likely to admit
males versus females; 2 points for saying that the difference arises because
more female applicants choose to apply to the department that’s harder to
get into.)
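The paradox is easy to verify mechanically; a Python sketch that recomputes parts (a), (b), (d), (e), and (g) from the two joint tables (`p_accept` is a hypothetical helper name):

```python
# Joint distributions p(A, E | G) from the two tables:
# table[a][e] = p(A=a, E=e | G), rows indexed by A, columns by E
female = {0: {0: 0.08, 1: 0.48}, 1: {0: 0.12, 1: 0.32}}   # G = 1
male   = {0: {0: 0.32, 1: 0.12}, 1: {0: 0.48, 1: 0.08}}   # G = 0

def p_accept(table, e=None):
    """Marginal p(A=1), or p(A=1 | E=e) when a program is specified."""
    if e is None:
        return sum(table[1].values())
    return table[1][e] / (table[0][e] + table[1][e])

print(round(p_accept(female), 2), round(p_accept(male), 2))        # 0.44 0.56
print(round(p_accept(female, 1), 2), round(p_accept(male, 1), 2))  # 0.4 0.4
print(round(p_accept(female, 0), 2), round(p_accept(male, 0), 2))  # 0.6 0.6
```

Overall acceptance rates differ by gender, yet within each program they are identical: Simpson’s paradox.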
Question 8 (Test-taking tip: Do this question LAST.)
Suppose we are estimating the simple linear regression model:
Yi = α + βxi + εi
Assume that the errors are distributed εi ~ iid N(0, σ²)
Suppose we are going to estimate this model using TWO observations. We have
two KNOWN x-values, x1 and x2, and we are about to observe two Y-values, Y1
and Y2. So based on what we know now, the x-values are known constants, and
the Y-values are random variables.
Consider the estimator

B = ( Y2 − Y1 ) / ( x2 − x1 )

We are thinking about using B as an estimator of the slope, β. We are interested in
asking, what is the sampling distribution of this estimator?
Suppose I make the following claim:
“B is an unbiased estimator of β with a normal sampling distribution.”
(a) If my claim is true, what is P( B > β ) ? [Hint: It may help you to draw a picture!]
P( B > β ) = .5
“Unbiased estimator with a normal sampling distribution” means that
E(B) = β and that B is normal; in other words, the different values we could
see for our estimator B look like a bell curve centered above β. And the
probability a normal r.v. is bigger than its mean is .5!
(b) Let’s say you knew that Var(B) = σB², where σB² is some number. Assuming my
claim is true and you know what σB² is, construct a 95% confidence interval for β.
B +/- 2*sqrt(σB²) , or equivalently B +/- 2*σB
Note: I do not expect most b-stats students (even “A” students) to be able to do parts (c)-(e) in a timed
exam situation. However, notice you can get (a) and (b) just by knowing what it means for an estimator
to be unbiased, the definition of a sampling distribution, and how the sampling distribution is used to
build a confidence interval!
Now let’s see if we can verify my claim about B (you shouldn’t just believe
everything somebody tells you about a strange estimator!!).
[Hint: The rest of this question is actually much easier if you do a little algebra up front.
I’ll help get you started. Since B is an estimator of β, see if you can rewrite B as
B = β + “error”
where “error” depends on ε1, ε2, x1, and x2. The easiest way to do this is to start with
the formula above for B, and substitute α + βx1 + ε1 in for Y1 and α + βx2 + ε2 in for Y2.
B = [ (α + βx2 + ε2) − (α + βx1 + ε1) ] / ( x2 − x1 )
Now see if you can cancel some terms and get β by itself. Also remember, when you’re
doing the problems below, x1 and x2 are known constants, while ε1 and ε2 are iid N(0, σ²)
random variables. ]
(c) What is E(B)?

B = [ (α + βx2 + ε2) − (α + βx1 + ε1) ] / ( x2 − x1 )
  = ( βx2 + ε2 − βx1 − ε1 ) / ( x2 − x1 )
  = [ β( x2 − x1 ) + ε2 − ε1 ] / ( x2 − x1 )
  = β + ( ε2 − ε1 ) / ( x2 − x1 )

Therefore, E(B) = β + E[ ( ε2 − ε1 ) / ( x2 − x1 ) ] = β
(Note: This verifies that “B is unbiased.”)
(d) What is Var(B)?

Using the above expression for B,

Var(B) = Var[ β + ( ε2 − ε1 ) / ( x2 − x1 ) ] = Var( ε2 − ε1 ) / ( x2 − x1 )²
       = 2σ² / ( x2 − x1 )²

Explanation: Key steps include (i) recognize that β is a constant, so it doesn’t affect
variance; (ii) We know that Var(a*X) = a²Var(X), so 1/(x2 − x1) gets squared when you
factor it out; and (iii) Var(ε2 − ε1) = Var(ε2) + Var(ε1) − 2*Cov(ε2,ε1) = σ² + σ² − 0 = 2σ².
(e) Suppose that 0 ≤ xi ≤ 1; that is, both values x1 and x2 must be between zero
and one. However, you get to pick the x-values in advance (then you get a
Y-value for each x, and plug them into our estimator, B).
If your goal is to estimate β as accurately as possible, what values should
you choose for x1 and x2 ?
Since B is unbiased, “estimate β as accurately as possible” means we
want Var(B) to be as small as possible.
Therefore, from part (d), since (x2-x1)2 shows up in the denominator of
the variance, we want x2 and x1 as far apart as we can get. So you
should choose x1=0 and x2=1 (or x1=1, x2=0).
(Of course, we’d also like σ2 to be as small as possible, but that’s usually not something we
have control over!)
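A small simulation confirms the variance formula from part (d) and the conclusion that spreading the x-values apart helps; α, β, and σ here are arbitrary illustration values (assumptions, not from the exam):

```python
import random
from statistics import variance

# Hypothetical illustration values for alpha, beta, sigma (not from the exam)
random.seed(0)
alpha, beta, sigma = 1.0, 2.0, 1.0

def simulate_B(x1, x2, reps=100_000):
    """Sample variance of B = (Y2-Y1)/(x2-x1) over many simulated datasets."""
    draws = []
    for _ in range(reps):
        y1 = alpha + beta * x1 + random.gauss(0, sigma)
        y2 = alpha + beta * x2 + random.gauss(0, sigma)
        draws.append((y2 - y1) / (x2 - x1))
    return variance(draws)

v_far = simulate_B(0.0, 1.0)    # theory: 2*sigma^2 / 1^2   = 2
v_near = simulate_B(0.4, 0.6)   # theory: 2*sigma^2 / 0.2^2 = 50, i.e. 25x noisier
print(v_far, v_near)
```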
(f) Would my claim still be correct if we were NOT willing to assume the errors εi
were normally distributed? Briefly explain.
[Note: It is possible to get full credit for part (f) even if you don’t get parts (c)
through (e). ]
No. E(B) and Var(B) would still be the same (in particular, B would still be
unbiased), but the sampling distribution would NOT be normal. When the
errors are normal, B is normal because it is a linear combination of two
normal r.v’s ε1 and ε2. If the errors aren’t normal, B will not be normal (the
Central Limit Theorem won’t save us here, because there are only TWO
observations).