STAT 210 Study Guide for Exam _2 by fjzhangxiaoquan


									                       STAT 210: Study Guide for Exam #2

The following are relevant questions for Exam #2.

1. A vaccine was designed to eliminate a particular strain of the HIV virus, called the “MN
   strain.” A study consisted of 7 AIDS patients vaccinated with the new drug and 31 AIDS
   patients who were treated with a placebo (no vaccination). The table below shows the
   number of patients who tested positive and negative for the MN strain during the follow-up
   period of the study.




       Research Question: Is the new vaccine effective in preventing the “MN strain” of the
       HIV virus?




       a. We have statistical evidence (using the usual error rate of 0.05) that the proportion of
          patients who test positive for the MN strain is greater for the unvaccinated group. (3
          pts)
                                     TRUE               FALSE

       b. Consider your answer from part a. The relative risk value from this analysis is likely
          to be near 1, say between 0.95 and 1.05. (3 pts)

                                     TRUE               FALSE

       c. Create a rough sketch of the mosaic plot for this data. (4 pts)
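The relative risk mentioned in part b can be computed from a two-way table as the ratio of two proportions. The sketch below is only an illustration: the group sizes (31 placebo, 7 vaccinated) come from the problem statement, but the positive counts are hypothetical placeholders, and the function name relative_risk is made up for this example.

```python
def relative_risk(pos_first, n_first, pos_reference, n_reference):
    """Risk in the first group divided by risk in the reference group."""
    risk_first = pos_first / n_first
    risk_reference = pos_reference / n_reference
    return risk_first / risk_reference


# Group sizes are from the problem statement.
n_placebo, n_vaccine = 31, 7

# HYPOTHETICAL positive counts, used only to show the arithmetic.
# They are NOT the study's results.
pos_placebo, pos_vaccine = 20, 2

rr = relative_risk(pos_placebo, n_placebo, pos_vaccine, n_vaccine)
print(f"relative risk (placebo vs. vaccine): {rr:.2f}")

# Equal proportions in the two groups give a relative risk of exactly 1.
print(relative_risk(5, 10, 5, 10))
```

A relative risk near 1 means the two groups test positive at about the same rate; values far from 1 mean the rates differ.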




       I live in Rushford and drive to Winona each day to get to work. It takes about 30 minutes
       to get to work, but the actual time varies due to: when I leave the house, conditions of the
       road, road construction, whether or not I have to wait for a train, availability of parking
       spots, etc. (4 pts each)


2. In terms of my driving, which of the following would influence only the center (or location)
   of the distribution for “time it takes to get to work”?


       a. Drive faster on some days and drive slower on other days.
       b. Drive faster every day or drive slower every day.
       c. Changing your speed will not change the variability of this distribution.




3. In terms of my driving, which of the following would influence only the variability (or
   spread) of the distribution for “time it takes to get to work”?


       a. Drive faster on some days and drive slower on other days.
       b. Drive faster every day or drive slower every day.
       c. Changing your speed will not change the center of this distribution.
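One way to explore the difference between center and spread in questions 2 and 3 is to compare summary statistics before and after a change. The sketch below uses hypothetical commute times (not data from this guide) and Python's standard statistics module. It contrasts a change applied equally to every day with a change that differs from day to day.

```python
import statistics

# HYPOTHETICAL commute times in minutes; not data from the guide.
baseline = [27, 29, 30, 30, 31, 33]

# The same change every day: each commute is 3 minutes shorter.
shifted = [t - 3 for t in baseline]

# A change that differs by day: faster on some days, slower on others,
# with the adjustments cancelling out on average.
adjustments = [-4, 4, -4, 4, -4, 4]
varied = [t + a for t, a in zip(baseline, adjustments)]

for name, times in [("baseline", baseline), ("shifted", shifted), ("varied", varied)]:
    mean = statistics.mean(times)
    sd = statistics.stdev(times)
    print(f"{name:9s} mean={mean:5.2f}  sd={sd:5.2f}")
```

Comparing the printed means and standard deviations shows which kind of change moves the center and which moves the spread.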




4. Understanding variation in data without numbers




       Circle the most correct answer for each.
       a.     Ranges…

                 i.   The range of each dataset is the same.
                ii.   The range of Dataset E is the largest.
               iii.   The range will help differentiate the amount of spread in these datasets.

       b.     Standard Deviation, Take I…

                 i.   The standard deviation of Datasets B and D are the same.
                ii.   The standard deviation of Datasets B and C are the same.
               iii.   The standard deviation of Datasets A and E are the same.


       c.     Standard Deviation, Take II…

                 i.   Dataset D has the smallest standard deviation.
                ii.   Dataset D has the largest standard deviation.
               iii.   Dataset D has neither the smallest nor the largest standard deviation.


       d.     Last One…

                 i.   Datasets B and D will have the same standard deviation because each has
                      three data values at the mean.
                ii.   Dataset B will have a smaller standard deviation than Dataset A because
                      it has more data values at the mean.
               iii.   Datasets A and E will have the smallest standard deviation because they
                      are the most equally (or uniformly) spread out.




5. Consider the following research question.

   Research Question: Does age have an influence on the type of cell phone usage of drivers
   involved in a car accident?




       Answer the following using the above JMP output.


       a.     What is the p-value for this test? ___________

       b.     Which of the following is the best conclusion for this research question?
              a. The data supports the research question because the p-value is less than 0.05.
              b. We have evidence to suggest that Age Group influences the type of cell phone
                 usage of drivers involved in a car accident because the p-value is less than
                 0.05.
              c. We are 95% certain that Age Group influences the type of cell phone usage
                 of drivers involved in a car accident (p-value < 0.0001).
              d. The patterns in the graph are different which implies that Age Group
                 influences cell phone usage.

       c.     Sketch a different mosaic plot that would provide even more evidence that Age
              Group influences cell phone usage. Sketch your graph carefully, using the
              same color scheme as above (Text = Black, Talk = White, and None = Gray). (4
              pts)




Easter has recently passed and, as we all know, the days after Easter are a prime time to buy
discounted candy. Suppose you wanted to know the true average number of M&M’s in the 1.69 oz
snack-sized bags. You go down to the store the day after Easter, go a little crazy, and buy
several packages so that you end up with 50 snack-sized bags. You construct a 95% confidence
interval for the average number of M&M’s in a 1.69 oz snack-sized bag and find that your
interval goes from 25 up to 37.
6. Suppose your roommate does the same exact thing you’ve done here and computes a 95%
   confidence interval using their 50 snack size bags. Which of the following is the most
   correct statement?

       a. The 95% confidence interval will be exactly the same as mine, 25 up to 37, 100% of
          the time.
       b. The 95% confidence interval will be exactly the same as mine, 25 up to 37, about
          95% of the time.
       c. The 95% confidence interval should be close to mine.




7. Which of the following is an expected change in your interval if you doubled the number of
   bags bought?


       a. The lower and upper endpoints of the confidence interval should stay the same as our
          M&M’s all came from the same manufacturer.
       b. The width of the confidence interval will be reduced which will allow you to have a
          better idea about the average number of M&M’s in a bag.
       c. Your interval endpoints will become less useful because you are considering twice as
          many bags.


8. Your roommate is really into statistics and asks how buying 100 bags instead of 50 will
   change the variability in the observed sample mean. Having nearly completed Stat 110, you
   confidently respond with one of the following. Which is most correct?


       a. “Buying twice as many bags will decrease the variability in observed sample mean
          because the average is based on twice as much information.”
       b. “Buying twice as many bags will increase the variability of the observed sample
          mean, but this will give a better approximation to the true average.”
       c. “The variability of the observed sample mean is not influenced by the number of bags
          we consider because M&M’s are mass produced from a set population (i.e. the
          M&M factory).”
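Questions 7 and 8 both turn on how sample size enters the margin of error, which is roughly (critical value) x (sample standard deviation) / sqrt(n). The sketch below starts from the interval in the passage (25 to 37, n = 50) and rescales the margin for a different sample size. It assumes the sample standard deviation and the critical value stay the same, which is only approximately true in practice. The helper name scaled_margin is made up for this example.

```python
import math

# Values taken from the passage.
lower, upper, n = 25, 37, 50

center = (lower + upper) / 2   # midpoint of the interval (the sample mean)
margin = (upper - lower) / 2   # half-width of the interval


def scaled_margin(margin, n_old, n_new):
    """Margin of error after changing the sample size.

    ASSUMPTION: the sample standard deviation and the critical value are
    held fixed, so the margin scales with 1 / sqrt(n).
    """
    return margin * math.sqrt(n_old / n_new)


new_margin = scaled_margin(margin, n, 2 * n)
print(f"n = {n}:  center = {center}, margin = {margin:.2f}")
print(f"n = {2 * n}: margin is roughly {new_margin:.2f}")
```

A new sample would also have its own sample mean, so the midpoint of the new interval would not be expected to match the old one exactly.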




9. Scandal & Statistics
   At times there is controversy around the scoring of competitions in the Olympic Games.
   Consider the following situation surrounding pairs figure skating from a past Olympic
   Games. The controversy here is centered on a judge from France and the possible bias she
   had in her judging of the competition.

   Suppose we have two judges: Judge 1, who is thought not to be biased (i.e. fair), and Judge 2,
   whose scoring is being questioned. To investigate, I collected the judges’ combined score
   (Technical Merit + Presentation) for the Free Skate portion of the competition. The results
   are given in the following table.




       a. Which of the following is most true?


             i.   You cannot determine whether or not Judge 2 is biased because you are only
                  comparing this judge to one other fair judge. You must have several fair
                  judges to make this study statistically valid.
            ii.   The best approach for an analysis to determine possible bias is to summarize
                  the Judge 1 scores (i.e. get the mean, median, standard deviation, graph for
                  Judge 1 scores) and compare them against the analogous summary statistics
                  from Judge 2.
           iii.   The column of differences should be used in this analysis because this ensures
                  comparisons are being made within each skating pair which in turn allows a
                  statistical analysis to detect possible bias more concretely.
           iv.    It is statistically impossible to determine fairness in scoring because different
                  judges have different expectations.




b. Suppose that one of the most controversial scores was the score given to the USA by
   Judge 2. Judge 1 gave the USA a score of 10.3, whereas Judge 2 gave a score of 9.9,
   which is a difference of 0.4. Consider the following Z-score calculation for the USA,
   where the mean difference is -0.015 and the standard deviation of the differences is 0.2739.



                Z-Score = (Data Point − Mean) / Standard Deviation = (0.4 − 0.015) / 0.2739 = 1.41


   Which of the following is most true?


      i.   This Z-score is positive, which suggests that Judge 2 is biased against the USA.
     ii.   This Z-score suggests that the score given by Judge 2 is likely biased against the
           USA.
    iii.   This Z-score suggests that the score given by Judge 2 is likely not biased against
           the USA.
    iv.    This Z-score uses the average difference and standard deviation of the
           differences which is incorrect; the average and standard deviation should be
           computed from only the Judge 2 scores.
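The Z-score arithmetic above can be reproduced in a few lines. The sketch below uses the three numbers exactly as they appear in the worked calculation (0.4, 0.015, and 0.2739). The function name z_score is made up for this example.

```python
def z_score(data_point, mean, standard_deviation):
    """Number of standard deviations a data point lies from the mean."""
    return (data_point - mean) / standard_deviation


# Values as they appear in the worked calculation in the guide.
z = z_score(0.4, 0.015, 0.2739)
print(round(z, 2))  # 1.41
```

A common rule of thumb, which is an assumption here and not something the guide states, treats values more than about 2 standard deviations from the mean as unusual.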


c. One confounding factor in determining possible biases in scoring is that judges could
   have different expectations. That is, Judge 2 could have consistently lower marks
   than Judge 1 without being biased (i.e. Judge 2 could just have higher expectations than
   Judge 1). Consider the following research question.

   Research Question: Do these two judges have, on average, differences in their
   expectations on their combined scores?


   Answer the following TRUE/FALSE questions assuming the p-value from our
   analysis is 0.40 and that a 5% error rate is being used. (2 pts each)


      i.   The p-value is less than 0.50, thus we have enough statistical evidence to
           conclude these two judges have different expectations in their judging.
                                          TRUE                          FALSE

     ii.   The p-value is greater than 0.05; thus, we lack statistical evidence to say that
           these two judges have different expectations in their judging.

                                          TRUE                           FALSE


             iii.   The average difference for our data is different than 0; thus, these two judges
                    have differences in their expectations in their judging.

                                                   TRUE                           FALSE

             iv.    The p-value is greater than 0.05; thus, we lack statistical evidence to say that
                    any two judges will have different expectations in their judging.

                                                   TRUE                           FALSE



10. A research article reports the results of a new drug test. The drug is to be used to decrease
    vision loss in people with Macular Degeneration. The article gives a p-value of 0.04 in the
    analysis section. Indicate whether each interpretation of this p-value is valid or invalid.

           a. The probability of getting results as extreme as or more extreme than the
              outcomes observed in this study under the assumption that the drug is not
              effective (i.e. has no impact).

                                       Valid           Invalid
           b. The probability of getting results as extreme as or more extreme than the
              outcomes observed in this study under the assumption that the drug is effective in
              decreasing vision loss.

                                       Valid           Invalid
           c. The probability that the drug is not effective (i.e. has no impact).

                                       Valid           Invalid
           d. The probability that the drug is effective in decreasing vision loss.

                                       Valid           Invalid
           e. The probability that the outcomes of this study will vary from sample to
              sample.

                                       Valid           Invalid
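A simulation can make the idea of "results as extreme as or more extreme than those observed" concrete. The sketch below estimates a one-sided p-value by repeatedly shuffling group labels, which mimics a world where the treatment has no impact. Everything in it is hypothetical: the outcome data, the function name permutation_p_value, and the choice of a permutation approach are assumptions for illustration, not the method used in the article described above.

```python
import random


def permutation_p_value(treated, control, n_shuffles=10_000, seed=0):
    """Estimate a one-sided p-value by shuffling group labels.

    Each shuffle mimics 'the treatment has no impact'. The p-value is the
    share of shuffles whose difference in means is at least as large as
    the difference actually observed.
    """
    rng = random.Random(seed)
    n_treated = len(treated)
    observed = sum(treated) / n_treated - sum(control) / len(control)
    pooled = list(treated) + list(control)
    n_control = len(pooled) - n_treated

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_treated]) / n_treated - sum(pooled[n_treated:]) / n_control
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# HYPOTHETICAL outcomes (1 = vision preserved, 0 = vision loss).
treated = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
control = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]

p = permutation_p_value(treated, control)
print(f"estimated one-sided p-value: {p:.3f}")
```

The estimate is a probability computed under the "no impact" assumption. It is not the probability that the assumption itself is true.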



