# STAT 210 Study Guide for Exam #2 by fjzhangxiaoquan

```
STAT 210: Study Guide for Exam #2

The following are relevant questions for Exam #2.

1. A vaccine was designed to eliminate a particular strain of the HIV virus, called the “MN
strain.” A study consisted of 7 AIDS patients vaccinated with the new drug and 31 AIDS
patients who were treated with a placebo (no vaccination). The table below shows the
number of patients who tested positive and negative for the MN strain during the follow-up
period of the study.

Research Question: Is the new vaccine effective in preventing the “MN strain” of the
HIV virus?

a. We have statistical evidence (using the usual error rate of 0.05) that the proportion of
patients who test positive for the MN strain is greater for the unvaccinated group. (3
pts)
TRUE               FALSE

b. Consider your answer from part a. The relative risk value from this analysis is likely
to be near 1, say between 0.95 and 1.05. (3 pts)

TRUE               FALSE

c. Create a rough sketch of the mosaic plot for this data. (4 pts)

I live in Rushford and drive to Winona each day to get to work. It takes about 30 minutes
to get to work, but the actual time varies due to: when I leave the house, conditions of the
road, road construction, whether or not I have to wait for a train, availability of parking
spots, etc. (4 pts each)

2. In terms of my driving, which of the following would influence only the center (or location)
of the distribution for “time it takes to get to work”.

a. Drive faster on some days and drive slower on other days.
b. Drive faster every day or drive slower every day.
c. Changing your speed will not change the variability of this distribution.

3. In terms of my driving, which of the following would influence only the variability (or
spread) of the distribution for “time it takes to get to work”.

a. Drive faster on some days and drive slower on other days.
b. Drive faster every day or drive slower every day.
c. Changing your speed will not change the center of this distribution.

4. Understanding variation in data without numbers

Circle the most correct answer for each.
a.     Ranges…

i.   The range of each dataset is the same.
ii.   The range of Dataset E is the largest.
iii.   The range will help differentiate the amount of spread in these datasets.

b.     Standard Deviation, Take I…

i.   The standard deviation of Datasets B and D are the same.
ii.   The standard deviation of Datasets B and C are the same.
iii.   The standard deviation of Datasets A and E are the same.

c.     Standard Deviation, Take II…

i.   Dataset D has the smallest standard deviation.
ii.   Dataset D has the largest standard deviation.
iii.   Dataset D has neither the smallest nor the largest standard deviation.

d.     Last One…

i.   Datasets B and D will have the same standard deviation because each has
three data values at the mean.
ii.   Dataset B will have a smaller standard deviation than Dataset A because
it has more data values at the mean.
iii.   Datasets A and E will have the smallest standard deviation because they
are the most equally (or uniformly) spread out.

5. Consider the following research question.

Research Question: Does age have an influence on the type of cell phone usage of drivers
involved in a car accident?

Answer the following using the above JMP output.

a.     What is the p-value for this test? ___________

b.     Which of the following is the best conclusion for this research question?
a. The data supports the research question because the p-value is less than 0.05.
b. We have evidence to suggest that Age Group influences the type of cell phone
usage of drivers involved in a car accident because the p-value is less than
0.05.
c. We are 95% certain that Age Group influences the type of cell phone usage
of drivers involved in a car accident (p-value < 0.0001).
d. The patterns in the graph are different which implies that Age Group
influences cell phone usage.

c.     Sketch a different mosaic plot that would provide even more evidence that Age
Group influences cell phone usage. Sketch your graph carefully, using the
same color scheme as above (Text = Black, Talk = White, and None = Gray). (4
pts)

Easter has recently passed and, as we all know, the days after Easter are a prime time to buy
discounted candy. Suppose you wanted to know the true average number of M&M’s in the 1.69 oz
snack-sized bags. You go down to the store the day after Easter, go a little crazy, and buy
several packages so that you end up with 50 snack-sized bags. You construct a 95% confidence
interval for the average number of M&M’s in a 1.69 oz snack-sized bag and find that your
interval goes from 25 up to 37.

6. Suppose your roommate does the same exact thing you’ve done here and computes a 95%
confidence interval using their 50 snack size bags. Which of the following is the most
correct statement?

a. The 95% confidence interval will be exactly the same as mine, 25 up to 37, 100% of
the time.
b. The 95% confidence interval will be exactly the same as mine, 25 up to 37, about
95% of the time.
c. The 95% confidence interval should be close to mine.

7. Which of the following is an expected change in your interval if you doubled the number of
bags bought?

a. The lower and upper endpoints of the confidence interval should stay the same as our
M&M’s all came from the same manufacturer.
b. The width of the confidence interval will be reduced which will allow you to have a
better idea about the average number of M&M’s in a bag.
c. Your interval endpoints will become less useful because you are considering twice as
many bags.

8. Your roommate is really into statistics and asks how buying 100 bags instead of 50 will
change the variability in the observed sample mean. Having nearly completed Stat 110 you
confidently respond with one of the following. Which is most correct?

a. “Buying twice as many bags will decrease the variability in the observed sample mean
because the average is based on twice as much information.”
b. “Buying twice as many bags will increase the variability of the observed sample
mean, but this will give a better approximation to the true average.”
c. “The variability of the observed sample mean is not influenced by the number of bags
we consider because M&M’s are mass produced from a set population (i.e. the
M&M factory).”

9. Scandal & Statistics
At times there is controversy around the scoring of competitions in the Olympic Games.
Consider the following situation surrounding pairs figure skating from a past Olympic
Games. The controversy here is centered on a judge from France and the possible bias she
had in her judging of the competition.

Suppose we have two judges: Judge 1, who is thought not to be biased (i.e. fair), and Judge 2,
whose scoring is being questioned. To investigate, I collected the judges’ combined score
(Technical Merit + Presentation) for the Free Skate portion of the competition. The results
are given in the following table.

a. Which of the following is most true?

i.   You cannot determine whether or not Judge 2 is biased because you are only
comparing this judge to one other fair judge. You must have several fair
judges to make this study statistically valid.
ii.   The best approach for an analysis to determine possible bias is to summarize
the Judge 1 scores (i.e. get the mean, median, standard deviation, graph for
Judge 1 scores) and compare them against the analogous summary statistics
from Judge 2.
iii.   The column of differences should be used in this analysis because this ensures
comparisons are being made within each skating pair which in turn allows a
statistical analysis to detect possible bias more concretely.
iv.    It is statistically impossible to determine fairness in scoring because different
judges have different expectations.

b. Suppose that one of the most controversial scores was the score given to the USA by
Judge 2. Judge 1 gave the USA a score of 10.3, whereas Judge 2 gave a score of 9.9,
which is a difference of 0.4. Consider the following Z-score calculation for the USA;
the mean difference is -0.015 and the standard deviation of the differences is 0.2739.

Data Point  Mean 0.4  0.015
Z  Score                                  1.41
Standard Deviation   0.2739

Which of the following is most true?

i.   This Z-score is positive, which suggests that Judge 2 is biased against the USA.
ii.   This Z-score suggests that the score given by Judge 2 is likely biased against the
USA.
iii.   This Z-score suggests that the score given by Judge 2 is likely not biased against
the USA.
iv.    This Z-score uses the average difference and standard deviation of the
differences which is incorrect; the average and standard deviation should be
computed from only the Judge 2 scores.

c. One confounding factor in determining possible biases in scoring is that judges could
have different expectations. That is, Judge 2 could have consistently lower marks
than Judge 1 without being biased (i.e. Judge 2 could just have higher expectations than
Judge 1). Consider the following research question.

Research Question: Do these two judges have, on average, differences in their
expectations on their combined scores?

Answer the following TRUE/FALSE questions assuming the p-value from our
analysis is 0.40 and that a 5% error rate is being used. (2 pts each)

i.   The p-value is less than 0.50, thus we have enough statistical evidence to
conclude these two judges have different expectations in their judging.
TRUE                          FALSE

ii.   The p-value is greater than 0.05; thus, we lack statistical evidence to say that
these two judges have different expectations in their judging.

TRUE                           FALSE

iii.   The average difference for our data is different from 0; thus, these two judges
have differences in their expectations in their judging.

TRUE                           FALSE

iv.    The p-value is greater than 0.05; thus, we lack statistical evidence to say that
any two judges will have different expectations in their judging.

TRUE                           FALSE

10. A research article reports the results of a new drug test. The drug is to be used to decrease
vision loss in people with Macular Degeneration. The article gives a p-value of 0.04 in the
analysis section. Indicate whether each interpretation is valid or invalid.

a. The probability of getting results as extreme as or more extreme than the
outcomes observed in this study under the assumption that the drug is not
effective (i.e. has no impact).

Valid           Invalid
b. The probability of getting results as extreme as or more extreme than the
outcomes observed in this study under the assumption that the drug is effective in
decreasing vision loss.

Valid           Invalid
c. The probability that the drug is not effective (i.e. has no impact).

Valid           Invalid
d. The probability that the drug is effective in decreasing vision loss.

Valid           Invalid
e. The probability that the outcomes of this study will vary from sample to
sample.

Valid           Invalid


```
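For review, here is a minimal Python sketch of two calculations that sit behind Questions 7 through 9. It is an illustration only, not part of the exam. The function names `z_score` and `width_after_scaling` are made up for this sketch. The width calculation assumes the sample standard deviation and the critical value stay about the same when the sample size changes, so it is an approximation.

```python
import math


def z_score(value, mean, sd):
    """Standardize a value: how many standard deviations it lies from the mean."""
    return (value - mean) / sd


# Question 9b: USA difference of 0.4, standard deviation of the differences 0.2739.
# The worked figure of 1.41 corresponds to subtracting 0.015.
z_worked = z_score(0.4, 0.015, 0.2739)
# For comparison, using a mean difference of -0.015 as written in the question text.
z_stated = z_score(0.4, -0.015, 0.2739)


def width_after_scaling(width, factor):
    """Approximate confidence interval width after multiplying the sample size
    by `factor` (assumes the sample SD and critical value are unchanged)."""
    return width / math.sqrt(factor)


# Questions 7 and 8: the interval from 25 to 37 has width 12 with 50 bags.
# Doubling to 100 bags shrinks the standard error by a factor of sqrt(2).
new_width = width_after_scaling(37 - 25, 2)

print(round(z_worked, 2), round(z_stated, 2), round(new_width, 2))
```

With either sign for the mean difference, the Z-score stays below 2 in absolute value, so under the common rule of thumb that values beyond about 2 standard deviations are unusual, the USA difference is not an outlier. The width calculation shows why a larger sample narrows the interval: the variability of the sample mean falls as the sample size grows.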