Review for Exam

					Final Exam & Review Parts I & II

                             Samuel Clark

       Department of Sociology, University of Washington
        Institute of Behavioral Science, University of Colorado at Boulder
      Agincourt Health and Population Unit, University of the Witwatersrand
                            Final Exam
 Tuesday June 6, 10:30-12:20 in Savery 239 (same as class)
 In-class, closed book
 Covers chapters 1-15
 About 50 multiple choice questions - similar to quizzes and


 To review you should
     – Carefully read the review sections pages 169-171, 334-337
     – Work through the review exercises on pages 171-175, 337-345
2009-05-29                                                           1
             PART I

The first and most important question:
 “Where do the data come from?”
     – See chapter 1
A key part of the answer is the distinction between
 observational and experimental data
Good statistics starts with good designs for producing data
Sampling is choosing part of the population to represent
 the whole population:
     – See chapters 2-4

Experiments are studies that impose some treatment in
 order to observe and learn about the response
     – See chapters 5-6
The Big Idea is the randomized comparative experiment

Random sampling and randomized comparative
 experiments are two of the most important statistical
 inventions of the 20th century

Both random sampling and randomized comparative
 experiments involve the deliberate use of chance to
 eliminate bias and produce a regular pattern of outcomes

The regular pattern allows us to:
     – Give margins of error
     – Make confidence statements
     – Assess statistical significance
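The "regular pattern" can be seen in a small simulation. The sketch below is only an illustration: the population proportion (0.5), the sample size (400), and the number of repeated samples (1000) are made-up values, and the margin 1/sqrt(n) is assumed to be the quick method for 95% confidence. It draws many random samples and counts how often the sample proportion lands within that margin of the truth.

```python
import random


def simulate_sample_proportions(true_p, n, num_samples, seed=0):
    """Draw many random samples and return the sample proportion from each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each draw is a "yes" with probability true_p (a very large population).
        yes_count = sum(1 for _ in range(n) if rng.random() < true_p)
        proportions.append(yes_count / n)
    return proportions


true_p = 0.5  # hypothetical population proportion
n = 400       # hypothetical sample size
props = simulate_sample_proportions(true_p, n, num_samples=1000)

margin = 1 / n ** 0.5  # assumed quick-method margin of error (95% confidence)
share_within = sum(abs(p - true_p) <= margin for p in props) / len(props)
print(f"Share of samples within {margin:.3f} of the truth: {share_within:.1%}")
```

Individual samples vary, but the share that lands within the margin should come out close to 95%, which is what a confidence statement describes.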

When we collect data on human beings ethical issues
 become important
     – See chapter 7
After knowing where the data come from, the next big
 question is:
 “Do the numbers make sense?”
     – Measurement is very important: validity and reliability
     – See chapter 8
Finally, it is important to evaluate numbers skeptically
 and be sure that they “make sense”
     – See chapter 9


1. Recognize the individuals and variables in a statistical study
2. Distinguish observational from experimental studies
3. Identify sample surveys, censuses, and experiments

                    SAMPLING

1. Identify the population in a sampling situation
2. Recognize bias due to voluntary response samples and
   other inferior sampling methods
3. Use Table A of random digits to select a simple random
   sample (SRS) from a population
4. Explain how sample surveys deal with bias and
   variability in their conclusions. Explain in simple
   language what the margin of error for a sample survey
   result tells us and what “95% confidence” means.
5. Use the quick method to get an approximate margin of
   error for 95% confidence
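Items 3 and 5 can be practiced with a short sketch. It assumes that individuals are labeled 01 to N, that labels are read as equal-width groups from a row of random digits, and that the quick method means a margin of error of about 1/sqrt(n). The digit string below is just an illustrative row of digits, and the function names are hypothetical.

```python
import math


def srs_from_digits(digits, population_size, sample_size):
    """Pick a simple random sample by reading labels from a row of random digits."""
    width = len(str(population_size))  # digits per label, e.g. 2 when N = 30
    digits = "".join(ch for ch in digits if ch.isdigit())  # ignore the spaces
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Skip labels outside 1..N and labels that were already chosen.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                return chosen
    raise ValueError("ran out of digits before the sample was complete")


def quick_margin_of_error(n):
    """Approximate 95% margin of error for a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


# Groups read: 19 22 39 50 34 05 75 62 87 13; only 19, 22, 05, 13 are in 01..30.
print(srs_from_digits("19223 95034 05756 28713", population_size=30, sample_size=4))
# → [19, 22, 5, 13]
print(quick_margin_of_error(1600))  # → 0.025, i.e. about 2.5 percentage points
```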

                    SAMPLING …

6. Understand the distinction between sampling errors
   and nonsampling errors. Recognize the presence of
   undercoverage and nonresponse as sources of error in
   a sample survey. Recognize the effect of the wording
   of questions on the responses.
7. Use random digits to select a stratified random sample
   from a population when the strata are identified
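A stratified random sample is a separate simple random sample taken within each stratum. The sketch below uses Python's random number generator in place of Table A, and the strata names, labels, and sample sizes are invented for illustration.

```python
import random


def stratified_sample(strata, sizes, seed=0):
    """Take a separate simple random sample inside each stratum."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    sample = {}
    for name, members in strata.items():
        # rng.sample draws without replacement, so no one is chosen twice.
        sample[name] = rng.sample(members, sizes[name])
    return sample


# Hypothetical population split into two strata.
strata = {
    "undergraduate": [f"U{i:02d}" for i in range(1, 41)],
    "graduate": [f"G{i:02d}" for i in range(1, 11)],
}
sizes = {"undergraduate": 4, "graduate": 2}

chosen = stratified_sample(strata, sizes)
for name, members in chosen.items():
    print(name, members)
```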

                    Experiments

1. Identify the explanatory variables, treatments, response
   variables, and the subjects in an experiment
2. Recognize bias due to confounding of explanatory
   variables with lurking variables in either an
   observational study or an experiment
3. Outline the design of a completely randomized
   experiment using a diagram (similar to above). Such
   a diagram should show the sizes of the groups, the
   specific treatments, and the response variable.

                    Experiments …

4. Use Table A of random digits to carry out the random
   assignment of subjects to groups in a completely
   randomized experiment
5. Make use of matched pairs or block designs when appropriate
6. Recognize the placebo effect. Recognize when the
   double-blind technique should be used. Be aware of
   weaknesses in an experiment, especially in the ability
   to generalize its conclusions.
7. Explain why a randomized comparative experiment can
   give good evidence for cause-and-effect relationships
8. Explain the meaning of statistical significance
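Random assignment in a completely randomized experiment (item 4) can be sketched as a shuffle followed by a split. This version uses Python's random number generator rather than Table A, and the subject names and group sizes are hypothetical.

```python
import random


def randomly_assign(subjects, group_sizes, seed=0):
    """Shuffle the subjects, then deal them into groups of the requested sizes."""
    if sum(group_sizes.values()) != len(subjects):
        raise ValueError("group sizes must add up to the number of subjects")
    rng = random.Random(seed)  # fixed seed so the assignment is repeatable
    shuffled = list(subjects)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    groups = {}
    start = 0
    for name, size in group_sizes.items():
        groups[name] = shuffled[start:start + size]
        start += size
    return groups


# Hypothetical experiment: 20 subjects, two groups of 10.
subjects = [f"subject-{i:02d}" for i in range(1, 21)]
groups = randomly_assign(subjects, {"treatment": 10, "placebo": 10})
for name, members in groups.items():
    print(name, members)
```

Because chance alone decides who lands in each group, the groups should be similar before treatment, which is what makes the later comparison fair.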
                    OTHER TOPICS

1. Explain the three first principles of data ethics. Discuss
   how they might apply in specific settings.
2. Explain how measuring leads to clearly defined
   variables in specific settings
3. Evaluate the validity of a variable as a measure of a
   given characteristic, including predictive validity
4. Explain how to reduce bias and improve reliability in measurement
5. Recognize inconsistent numbers, implausible numbers,
   numbers so good they are suspicious, and arithmetic mistakes
6. Calculate percent increase and decrease correctly
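Percent change is the amount of change divided by the starting value, times 100. The small sketch below uses made-up numbers and shows why an increase followed by a decrease of the same size in percent does not return to the starting point.

```python
def percent_change(old, new):
    """Percent change from old to new: (new - old) / old * 100."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (new - old) / old * 100


print(percent_change(100, 150))  # → 50.0 (a 50% increase)
print(percent_change(150, 100))  # about -33.3 (a 33.3% decrease, not 50%)
```

The base of the percentage is always the starting value, so going from 100 to 150 and back again gives two different percentages.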
             PART II

 Data analysis is the art of describing data with graphs
  and numerical summaries
     – Chapter 10 presented basic graphs – pie charts, bar
       charts etc.
 Chapters 11 – 13 presented the idea of distributions to
  help us describe a variable
 The steps are:
     1. Plot the data using a graph of some kind, and think about
        what you see
     2. Interpret what you see:
             –   Shape
             –   Center
             –   Spread
             –   Outliers

     – Possibly create a numerical summary
             –   Five-number summary, or
             –   Mean and standard deviation
     – Possibly define a compact model such as the normal distribution

Chapters 14 and 15 apply the same ideas to the
 relationships that may exist between two variables

Relationships often raise the concept of causation; do
 changes in one variable cause changes in the other?
We know that randomized, controlled experiments are
 the gold standard for evidence that changes in one
 variable cause changes in another
Chapter 15 showed that a strong association between two
 variables can be observed even when there is no causal
 relationship between them
     – Remember lurking variables!

                Displaying distributions

1. Recognize categorical and quantitative variables
2. Recognize when a pie chart can and cannot be used
3. Make a bar graph of the distribution of a categorical
   variable, or in general to compare related quantities
4. Interpret pie charts and bar graphs
5. Make a line graph of a quantitative variable over time
6. Recognize patterns such as trends and seasonal
   variation in line graphs
7. Be aware of graphical abuses, especially pictograms
   and distorted scales in line graphs

              Displaying distributions …

8. Make a histogram of the distribution of a quantitative variable
9. Make a stemplot of the distribution of a small set of
   observations. Round data as needed to make an
   effective stemplot.

     Describing distributions (quantitative variable)

1. Look for the overall pattern of a histogram or stemplot
   and for major deviations from the pattern
2. Assess from a histogram or stemplot whether the
   shape of a distribution is roughly symmetric, distinctly
   skewed, or neither. Assess whether the distribution
   has one or more major peaks.
3. Describe the overall pattern by giving numerical
   measures of the center and spread in addition to a
   verbal description of shape

  Describing distributions (quantitative variable) …

4. Decide which measures of center and spread are more
   appropriate: the mean and standard deviation
   (especially for symmetric distributions) or the five-
   number summary (especially for skewed distributions)
5. Recognize outliers and give plausible explanations for them

             Numerical summaries of distributions

1. Find the median M and the quartiles Q1 and Q3 for a set
   of observations
2. Give the five-number summary and draw a boxplot;
   assess center, spread, symmetry, and skewness from a boxplot
3. Find the mean x-bar and (using a calculator) the
   standard deviation s for a small set of observations
4. Understand that the median is less affected by extreme
   observations than the mean. Recognize that skewness
   in a distribution moves the mean away from the median
   toward the long tail.
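The five-number summary can be checked by hand against a short sketch. It assumes the quartile rule in which Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when the number of observations is odd. Other software may use a slightly different rule. The data values are invented.

```python
from statistics import mean, median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]
    # With an odd count, the middle observation belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


print(five_number_summary([1, 2, 3, 4, 5, 6, 7]))  # → (1, 2, 4, 6, 7)

# One extreme observation pulls the mean toward the long tail,
# while the median stays where it was.
typical = [20, 25, 30, 35, 40]
with_outlier = [20, 25, 30, 35, 400]
print(mean(typical), median(typical))            # mean 30, median 30
print(mean(with_outlier), median(with_outlier))  # mean 102, median 30
```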

             Numerical summaries of distributions …

5. Know the basic properties of the standard deviation: s
   >= 0 always; s = 0 only when all observations are
   identical; s increases as the spread increases; s has
   the same units as the original measurements; s is
   greatly increased by outliers or skewness
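These properties are easy to confirm on small made-up data sets. The sketch below uses `statistics.stdev`, which computes the sample standard deviation with an n - 1 divisor. The four data sets are illustrative only.

```python
from statistics import stdev

identical = [5, 5, 5, 5]      # no spread at all
tight = [4, 5, 5, 6]          # small spread around 5
wide = [1, 3, 7, 9]           # larger spread around 5
with_outlier = [4, 5, 5, 60]  # same as tight, but with one extreme value

for name, data in [("identical", identical), ("tight", tight),
                   ("wide", wide), ("with outlier", with_outlier)]:
    # s is in the same units as the data and is never negative.
    print(f"{name:>12}: s = {stdev(data):.2f}")
```

The output should show s equal to 0 for the identical values, a larger s for the wider data, and a much larger s once a single outlier is present.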

                  Normal distributions

1. Interpret a density curve as a description of the
   distribution of a quantitative variable
2. Recognize the shape of normal curves, and estimate
   by eye both the mean and the standard deviation from
   such a curve
3. Use the 68-95-99.7 rule and symmetry to state what
   percentage of the observations from a normal
   distribution fall between two points when the points lie
   at the mean or one, two, or three standard deviations
   on either side of the mean
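The 68-95-99.7 rule plus symmetry gives the percent of observations below each whole number of standard deviations from the mean. The sketch below stores those values in a lookup table and subtracts them. The function name and the example of scores with mean 500 and standard deviation 100 are hypothetical.

```python
# Percent of a normal distribution lying below (mean + k standard deviations),
# worked out from the 68-95-99.7 rule and symmetry.
PERCENT_BELOW = {-3: 0.15, -2: 2.5, -1: 16.0, 0: 50.0, 1: 84.0, 2: 97.5, 3: 99.85}


def percent_between(low_sd, high_sd):
    """Percent of observations between two points given in standard deviations."""
    return round(PERCENT_BELOW[high_sd] - PERCENT_BELOW[low_sd], 2)


print(percent_between(-1, 1))  # → 68.0
print(percent_between(-2, 2))  # → 95.0
print(percent_between(0, 2))   # → 47.5

# Hypothetical scores with mean 500 and standard deviation 100:
# 400 is one standard deviation below the mean, 700 is two above.
print(percent_between(-1, 2))  # → 81.5
```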

                Normal distributions …

4. Find and interpret the standard score of an observation
5. Use Table B to find the percentile of a value from any
   normal distribution and the value that corresponds to a
   given percentile
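A standard score is (observation - mean) / standard deviation. The sketch below computes it and then uses Python's `statistics.NormalDist` as a stand-in for Table B, so its answers may differ slightly from the rounded table entries. The mean of 500 and standard deviation of 100 are made-up values.

```python
from statistics import NormalDist


def standard_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def percentile_of(x, mean, sd):
    """Percent of a normal distribution that falls below x."""
    return NormalDist(mean, sd).cdf(x) * 100


def value_at_percentile(pct, mean, sd):
    """The value with pct percent of a normal distribution below it."""
    return NormalDist(mean, sd).inv_cdf(pct / 100)


# Hypothetical test scores: mean 500, standard deviation 100.
print(standard_score(600, 500, 100))               # → 1.0
print(round(percentile_of(600, 500, 100), 1))      # about the 84th percentile
print(round(value_at_percentile(90, 500, 100)))    # score at the 90th percentile
```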

              Scatterplots and correlation

1. Make a scatterplot to display the relationship between
   two quantitative variables measured on the same
   subjects. Place the explanatory variable (if any) on the
   horizontal scale of the plot.
2. Describe the form, direction, and strength of the overall
   pattern of a scatterplot. In particular, recognize positive
   or negative association and straight-line patterns.
   Recognize outliers in a scatterplot.
3. Judge whether it is appropriate to use correlation to
   describe the relationship between two quantitative
   variables. Use a calculator to find the correlation r.

             Scatterplots and correlation …

4. Know the basic properties of correlation: r measures
   the strength and direction of only straight-line
   relationships; r is always a number between -1 and 1;
   r = +/-1 only for perfect straight-line relations; r moves
   away from 0 toward +/- 1 as the straight-line relation
   gets stronger
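These properties can be checked numerically. The sketch below assumes the usual definition of r as the average product of standard scores (with an n - 1 divisor), and the data sets are invented. Note that the curved example has a perfect relationship between x and y but r = 0, because r only measures straight-line association.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r: average product of the standard scores of x and y."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


xs = [1, 2, 3, 4, 5]
print(correlation(xs, [3, 5, 7, 9, 11]))   # perfect upward line: r is 1 (up to rounding)
print(correlation(xs, [10, 8, 6, 4, 2]))   # perfect downward line: r is -1 (up to rounding)
print(correlation(xs, [2, 4, 5, 4, 5]))    # scattered upward pattern: r between 0 and 1

# A perfect curve (y = x squared) with no straight-line trend: r is 0.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [4, 1, 0, 1, 4]
print(correlation(curve_x, curve_y))
```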

                    Regression lines

1. Explain what the slope b and the intercept a mean in
   the equation y = a + bx of a straight line
2. Draw a graph of the straight line when you are given its equation
3. Use a regression line, given on a graph or as an
   equation, to predict y for a given x. Recognize the
   danger of prediction outside the range of the available data
4. Use r^2, the square of the correlation, to describe how
   much of the variation in one variable can be accounted
   for by a straight-line relationship with another variable
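The sketch below ties these items together on a small invented data set. It assumes the regression line is the least-squares line, computes the intercept a and slope b of y = a + bx, refuses to predict outside the observed range of x, and reports r^2. The function names are hypothetical.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (a, b, r_squared) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx          # slope: change in predicted y per unit change in x
    a = y_bar - b * x_bar  # intercept: predicted y when x = 0
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


def predict(a, b, x, x_min, x_max):
    """Predict y from the line, but only inside the range of the observed x."""
    if not x_min <= x <= x_max:
        raise ValueError("x is outside the range of the data (extrapolation)")
    return a + b * x


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r_squared = least_squares_line(xs, ys)
print(f"y = {a:.1f} + {b:.1f}x, r^2 = {r_squared:.2f}")  # y = 2.2 + 0.6x, r^2 = 0.60
print(round(predict(a, b, 3.5, min(xs), max(xs)), 2))    # → 4.3
```

Here r^2 = 0.60 says that about 60% of the variation in y is accounted for by the straight-line relationship with x.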

                Statistics and causation

1. Give plausible explanations for an observed
   association between two variables: direct cause and
   effect, the influence of lurking variables, or both
2. Assess the strength of statistical evidence for a claim of
   causation, especially when experiments are not possible
