STAT 210: Study Guide for Exam #2

The following are relevant questions for Exam #2.

1. A vaccine was designed to eliminate a particular strain of HIV, called the “MN strain.” A study consisted of 7 AIDS patients vaccinated with the new drug and 31 AIDS patients who were treated with a placebo (no vaccination). The table below shows the number of patients who tested positive and negative for the MN strain during the follow-up period of the study.

Research Question: Is the new vaccine effective in preventing the “MN strain” of HIV?

   a. We have statistical evidence (using the usual error rate of 0.05) that the proportion of patients who test positive for the MN strain is greater for the unvaccinated group. (3 pts)
      TRUE   FALSE
   b. Consider your answer from part a. The relative risk value from this analysis is likely to be near 1, say between 0.95 and 1.05. (3 pts)
      TRUE   FALSE
   c. Create a rough sketch of the mosaic plot for these data. (4 pts)

I live in Rushford and drive to Winona each day to get to work. It takes about 30 minutes to get to work, but the actual time varies due to when I leave the house, conditions of the road, road construction, whether or not I have to wait for a train, availability of parking spots, etc. (4 pts each)

2. In terms of my driving, which of the following would influence only the center (or location) of the distribution for “time it takes to get to work”?
   a. Drive faster on some days and drive slower on other days.
   b. Drive faster every day or drive slower every day.
   c. Changing your speed will not change the variability of this distribution.

3. In terms of my driving, which of the following would influence only the variability (or spread) of the distribution for “time it takes to get to work”?
   a. Drive faster on some days and drive slower on other days.
   b. Drive faster every day or drive slower every day.
   c. Changing your speed will not change the center of this distribution.

4.
Understanding variation in data without numbers

Circle the most correct answer for each.

   a. Ranges…
      i. The range of each dataset is the same.
      ii. The range of Dataset E is the largest.
      iii. The range will help differentiate the amount of spread in these datasets.
   b. Standard Deviation, Take I…
      i. The standard deviations of Datasets B and D are the same.
      ii. The standard deviations of Datasets B and C are the same.
      iii. The standard deviations of Datasets A and E are the same.
   c. Standard Deviation, Take II…
      i. Dataset D has the smallest standard deviation.
      ii. Dataset D has the largest standard deviation.
      iii. Dataset D has neither the smallest nor the largest standard deviation.
   d. Last One…
      i. Datasets B and D will have the same standard deviation because each has three data values at the mean.
      ii. Dataset B will have a smaller standard deviation than Dataset A because it has more data values at the mean.
      iii. Datasets A and E will have the smallest standard deviation because they are the most equally (or uniformly) spread out.

5. Consider the following research question.

Research Question: Does age have an influence on the type of cell phone usage of drivers involved in a car accident?

Answer the following using the above JMP output.

   a. What is the p-value for this test? ___________
   b. Which of the following is the best conclusion for this research question?
      a. The data support the research question because the p-value is less than 0.05.
      b. We have evidence to suggest that Age Group influences the type of cell phone usage of drivers involved in a car accident because the p-value is less than 0.05.
      c. We are 95% certain that Age Group influences the type of cell phone usage of drivers involved in a car accident (p-value < 0.0001).
      d. The patterns in the graph are different, which implies that Age Group influences cell phone usage.
   c. Sketch a different mosaic plot that would provide even more evidence that Age Group influences cell phone usage.
      Sketch your graph carefully, using the same color scheme as above (Text = Black, Talk = White, and None = Gray). (4 pts)

Easter has recently passed and, as we all know, the time right after Easter is a prime time to buy discounted candy. Suppose you wanted to know the true average number of M&M’s in the 1.69 oz snack-sized bags. You go down to the store the day after Easter, go a little crazy, and buy several packages so that you end up with 50 snack-sized bags. You construct a 95% confidence interval for the average number of M&M’s in a 1.69 oz snack-sized bag and find that your interval goes from 25 up to 37.

6. Suppose your roommate does the exact same thing you’ve done here and computes a 95% confidence interval using their 50 snack-sized bags. Which of the following is the most correct statement?
   a. The 95% confidence interval will be exactly the same as mine, 25 up to 37, 100% of the time.
   b. The 95% confidence interval will be exactly the same as mine, 25 up to 37, about 95% of the time.
   c. The 95% confidence interval should be close to mine.

7. Which of the following is an expected change in your interval if you doubled the number of bags bought?
   a. The lower and upper endpoints of the confidence interval should stay the same, as our M&M’s all came from the same manufacturer.
   b. The width of the confidence interval will be reduced, which will allow you to have a better idea about the average number of M&M’s in a bag.
   c. Your interval endpoints will become less useful because you are considering twice as many bags.

8. Your roommate is really into statistics and asks how buying 100 instead of 50 bags will change the variability in the observed sample mean. Having nearly completed Stat 110, you confidently respond with one of the following. Which is most correct?
   a. “Buying twice as many bags will decrease the variability in the observed sample mean because the average is based on twice as much information.”
   b. “Buying twice as many bags will increase the variability of the observed sample mean, but this will give a better approximation to the true average.”
   c. “The variability of the observed sample mean is not influenced by the number of bags we consider because M&M’s are mass produced from a set population (i.e. the M&M factory).”

9. Scandal & Statistics

At times there is controversy around the scoring of competitions in the Olympic Games. Consider the following situation surrounding pairs figure skating from a past Olympic Games. The controversy here is centered on a judge from France and the possible bias she had in her judging of the competition. Suppose we have two judges: Judge 1, who is thought not to be biased (i.e. fair), and Judge 2, whose scoring is being questioned. To investigate, I collected the judges’ combined score (Technical Merit + Presentation) for the Free Skate portion of the competition. The results are given in the following table.

   a. Which of the following is most true?
      i. You cannot determine whether or not Judge 2 is biased because you are only comparing this judge to one other fair judge. You must have several fair judges to make this study statistically valid.
      ii. The best approach for an analysis to determine possible bias is to summarize the Judge 1 scores (i.e. get the mean, median, standard deviation, graph for Judge 1 scores) and compare them against the analogous summary statistics from Judge 2.
      iii. The column of differences should be used in this analysis because this ensures comparisons are being made within each skating pair, which in turn allows a statistical analysis to detect possible bias more concretely.
      iv. It is statistically impossible to determine fairness in scoring because different judges have different expectations.
   b. Suppose that one of the most controversial scores was the score given to the USA by Judge 2.
      Judge 1 gave the USA a score of 10.3, whereas Judge 2 gave a score of 9.9, which is a difference of 0.4. Consider the following Z-score calculation for the USA; the mean difference is 0.015 and the standard deviation of the differences is 0.2739.

         Z-Score = (Data Point - Mean) / Standard Deviation = (0.4 - 0.015) / 0.2739 = 1.41

      Which of the following is most true?
      i. This Z-score is positive, which suggests that Judge 2 is biased against the USA.
      ii. This Z-score suggests that the score given by Judge 2 is likely biased against the USA.
      iii. This Z-score suggests that the score given by Judge 2 is likely not biased against the USA.
      iv. This Z-score uses the average difference and standard deviation of the differences, which is incorrect; the average and standard deviation should be computed from only the Judge 2 scores.
   c. One confounding factor in determining possible biases in scoring is that judges could have different expectations. That is, Judge 2 could have consistently lower marks than Judge 1 without being biased (i.e. Judge 2 could just have higher expectations than Judge 1). Consider the following research question.

      Research Question: Do these two judges have, on average, differences in their expectations on their combined scores?

      Answer the following TRUE/FALSE questions assuming the p-value from our analysis is 0.40 and that a 5% error rate is being used. (2 pts each)
      i. The p-value is less than 0.50; thus, we have enough statistical evidence to conclude these two judges have different expectations in their judging.
         TRUE   FALSE
      ii. The p-value is greater than 0.05; thus, we lack statistical evidence to say that these two judges have different expectations in their judging.
         TRUE   FALSE
      iii. The average difference for our data is different from 0; thus, these two judges have differences in their expectations in their judging.
         TRUE   FALSE
      iv. The p-value is greater than 0.05; thus, we lack statistical evidence to say that any two judges will have different expectations in their judging.
         TRUE   FALSE

10. A research article reports the results of a new drug test. The drug is to be used to decrease vision loss in people with Macular Degeneration. The article gives a p-value of 0.04 in the analysis section. Indicate whether each interpretation is valid or invalid.
   a. The probability of getting results as extreme as or more extreme than the outcomes observed in this study under the assumption that the drug is not effective (i.e. has no impact).
      Valid   Invalid
   b. The probability of getting results as extreme as or more extreme than the outcomes observed in this study under the assumption that the drug is effective in decreasing vision loss.
      Valid   Invalid
   c. The probability that the drug is not effective (i.e. has no impact).
      Valid   Invalid
   d. The probability that the drug is effective in decreasing vision loss.
      Valid   Invalid
   e. The probability that the outcomes of this study will vary from sample to sample.
      Valid   Invalid
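
For review, the arithmetic behind questions 8 and 9b can be checked with a short Python sketch. This is only an illustration under stated assumptions, not part of the exam: the function names `z_score` and `standard_error` are hypothetical, the mean of 0.015 is the value used in the guide's own Z-score calculation, and the per-bag standard deviation of 1.0 is a placeholder (the ratio of standard errors does not depend on it).

```python
import math


def z_score(data_point, mean, standard_deviation):
    """Number of standard deviations a data point lies from the mean."""
    return (data_point - mean) / standard_deviation


def standard_error(standard_deviation, n):
    """Standard deviation of the sample mean for a sample of size n."""
    return standard_deviation / math.sqrt(n)


# Question 9b: USA difference of 0.4, mean difference 0.015 (as used in the
# guide's calculation), standard deviation of the differences 0.2739.
z_usa = z_score(0.4, 0.015, 0.2739)
print(round(z_usa, 2))  # 1.41

# Question 8: compare the variability of the sample mean for 50 vs 100 bags.
# The per-bag standard deviation is a hypothetical placeholder value.
placeholder_sd = 1.0
se_ratio = standard_error(placeholder_sd, 100) / standard_error(placeholder_sd, 50)
print(round(se_ratio, 3))  # 0.707
```

The ratio printed last is sqrt(50/100), so under the usual formula the standard error of the sample mean for 100 bags is about 71% of what it is for 50 bags.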