HSCE: S3.1.1 Know the meanings of a sample from a population and a
census of a population, and distinguish between sample
statistics and population parameters
Clarification statements: Notation often tells whether you are dealing with
population parameters versus sample statistics: population mean μ versus
the sample mean x-bar, population standard deviation σ versus the sample
standard deviation s_x, or population proportion p versus the sample
proportion p-hat. Students should learn the different symbols and recognize
what it implies when a value comes from a census of a population rather
than from a sample, which is used when data on the whole population are
either not available or difficult to collect.
An example of an applied situation would be finding what percentage of
1,000 students would be in favor of a change in school policy, either by
conducting a survey of every student’s opinion (census – find p) or
randomly selecting students from the school and surveying them (a sample
– find p-hat).
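For a quick classroom demonstration, the school-policy situation can be sketched in Python. This is only an illustration: the count of 620 students in favor and the sample size of 50 are made-up values, not part of the standard.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical school of 1,000 students; True means "in favor".
# The figure of 620 in favor is invented for illustration.
population = [True] * 620 + [False] * 380

# Census: ask every student, which gives the parameter p exactly.
p = sum(population) / len(population)

# Sample: ask 50 randomly selected students, which gives the statistic p-hat.
sample = rng.sample(population, 50)
p_hat = sum(sample) / len(sample)

print(f"census  p     = {p:.3f}")
print(f"sample  p-hat = {p_hat:.3f}")
```

Changing the seed gives a different p-hat each time, while p stays fixed, which is the distinction between a statistic and a parameter.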
Clarifying Examples and Activities:
Example 1:
Internet project: Have students find the results of either a survey or a
census on the Internet. (Note: The U.S. Census does not actually include every
single individual, and some of the published data is actually the result of a
survey in which some individuals complete more detailed questionnaires than
others. However, we call it a census since the vast majority of the
population is represented.)
The important point is that population parameters are based on data from
the entire population, and are often difficult to obtain, so samples are used to
provide estimates.
Example 2:
Brief class activity: A simple class activity might be to collect information on
the height of the students in the class. The mean and standard deviation for
the whole class could be computed. Then, a smaller random sample could be
created (perhaps using a random number table or a random number
generator on a TI calculator), and the sample statistics could be computed
and compared to the population data for the whole class. The variability in
results obtained from sampling could be highlighted by a discussion of what
would happen if specific individuals, such as very tall or very short students,
are included or excluded from the random sample.
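Where a computer is available, the height activity can also be sketched in Python. The 24 heights below are hypothetical, and the sample size of 6 is an arbitrary choice; a real class would substitute its own data.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical heights (inches) for a class of 24 students.
heights = [61, 62, 63, 63, 64, 64, 65, 65, 66, 66, 66, 67,
           67, 67, 68, 68, 69, 69, 70, 70, 71, 72, 74, 76]

# The whole class is the population, so these are parameters.
mu = statistics.mean(heights)
sigma = statistics.pstdev(heights)   # population SD (divides by N)

# One random sample of 6 students gives sample statistics.
sample = rng.sample(heights, 6)
x_bar = statistics.mean(sample)
s_x = statistics.stdev(sample)       # sample SD (divides by n - 1)

print(f"population: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"one sample: x-bar = {x_bar:.2f}, s_x = {s_x:.2f}")

# Repeating the draw shows how much x-bar varies from sample to sample.
for trial in range(5):
    s = rng.sample(heights, 6)
    print(f"sample {trial + 1}: x-bar = {statistics.mean(s):.2f}")
```

Samples that happen to include the 76-inch student or the 61-inch student will tend to have a noticeably higher or lower x-bar, which can start the discussion of sampling variability.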
HSCE: S3.1.2 Identify possible sources of bias in data collection and
sampling methods and simple experiments; describe how such
bias can be reduced and controlled by random sampling; explain
the impact of such bias on conclusions made from analysis of
the data; and know the effect of replication on the precision of
estimates.
Clarification statements:
NOTE: the information presented here is at a level of detail that
might be found in a statistics class. In selecting the level
appropriate for an Algebra 2 class with time constraints, the most
important topics would focus on sources of bias, random selection,
sample size, and steps to be taken in designing surveys and
experiments.
A biased estimate is one that consistently over- or under-estimates the
underlying population parameter, or “true” value.
Examples of common sources of bias:
1. voluntary response can create bias in surveys because respondents
generally hold strong positive or negative opinions, while the group in the
middle may not respond at all (internet surveys are a good example)
2. convenience sampling creates bias because the sample is not likely to
represent the entire population
3. nonresponse bias occurs when a certain part of the population or
sample cannot be contacted or refuses to participate in a survey
4. undercoverage results from leaving out certain groups in creating
samples
Web resource:
http://stattrek.com/AP-Statistics-2/Survey-Sampling-Bias.aspx?Tutorial=AP
(also has a sample assessment item)
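The effect of voluntary response can be illustrated with a small simulation. The sketch below is in Python and rests entirely on assumed numbers: the opinion counts and the response rates are invented to show how a voluntary poll can miss the true proportion while a simple random sample does not.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 opinions on a scale from -2 to 2.
# All counts and response rates below are made up for illustration.
population = ([-2] * 1000 + [-1] * 2000 + [0] * 3000 +
              [1] * 3000 + [2] * 1000)

def share_in_favor(opinions):
    """Proportion of opinions that are positive (1 or 2)."""
    return sum(o > 0 for o in opinions) / len(opinions)

# Assumed chance that a person answers a voluntary poll: strong
# opponents respond most often, the middle group almost never.
response_rate = {-2: 0.50, -1: 0.02, 0: 0.02, 1: 0.02, 2: 0.10}

p = share_in_favor(population)  # the population parameter
volunteers = [o for o in population if rng.random() < response_rate[o]]
p_hat_voluntary = share_in_favor(volunteers)
p_hat_random = share_in_favor(rng.sample(population, 500))

print(f"true p               = {p:.3f}")
print(f"voluntary response   = {p_hat_voluntary:.3f}")
print(f"simple random sample = {p_hat_random:.3f}")
```

Under these assumptions the voluntary poll lands well below the true proportion on every run, which matches the definition of a biased estimate given above.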
An overview of random sampling techniques is normally included in any
introductory statistics textbook. A summary of random sampling techniques
can be found at
http://stattrek.com/AP-Statistics-2/Survey-Sampling-Methods.aspx?Tutorial=AP
In experiments, the possibility exists for “experimenter bias” (where results
are either consciously or subconsciously interpreted to favor expected
outcomes). One way to combat this is through the use of double-blind
studies, where neither the experimenters nor the subjects know which
treatment each subject is receiving (this requires both a treatment and a
placebo). Also, in experiments, there may be lurking variables: factors
other than the independent variable that affect the dependent variable and
so make it appear that the independent variable caused an effect when it
did not. It is often necessary to “control” for these lurking variables in
constructing the experimental groups, for example, by separating
participants by age if age could possibly affect the outcome. One overview
of considerations in designing experiments can be found at
http://stattrek.com/AP-Statistics-2/Experiment.aspx?Tutorial=AP
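One way to control for age when forming groups is to split participants into age groups first and then assign treatments at random within each group. The Python sketch below shows the idea; the 20 participants, the two age groups, and the "treatment"/"placebo" labels are all hypothetical.

```python
import random

rng = random.Random(4)  # fixed seed so the run is repeatable

# Hypothetical participants: (id, age group). Even ids are placed in
# one age group and odd ids in the other, purely for illustration.
participants = [(i, "under 40" if i % 2 == 0 else "40 and over")
                for i in range(20)]

# Step 1: separate participants by age group.
blocks = {}
for pid, age_group in participants:
    blocks.setdefault(age_group, []).append(pid)

# Step 2: within each age group, shuffle and split in half, so each
# treatment group receives the same number of people of each age.
assignment = {}
for age_group, ids in blocks.items():
    rng.shuffle(ids)
    half = len(ids) // 2
    for pid in ids[:half]:
        assignment[pid] = "treatment"
    for pid in ids[half:]:
        assignment[pid] = "placebo"

for age_group, ids in blocks.items():
    treated = sorted(pid for pid in ids if assignment[pid] == "treatment")
    print(f"{age_group}: treatment ids {treated}")
```

Because both groups end up with the same age mix, a difference in outcomes between them cannot be explained by age.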
Replication in an experimental setting generally means assigning the same
treatment to many subjects so that chance variation among individuals has
less influence on the results. Covering the topic at this level is probably
sufficient for an Algebra 2 class, but because questions may arise, more
information on this topic is included below.
Replication processes are often used to adjust standard errors of estimates,
especially when dealing with survey data. The design of a survey has an
impact on its standard error, especially because of non-response errors or
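stratification in the sampling process. One simple example: suppose a
survey is stratified based on gender. The “design effect” compares the
variance of an estimate under the actual design with the variance under a
simple random sample of the same size. A design effect greater than 1
(which can occur, for example, when the groups are sampled
disproportionately) implies that you need more data (larger groups of each
gender) to get the same precision you would have obtained by using a
simple random sample with no stratification. Still, this goes back to the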
general principle: the more subjects in an experiment or survey, the more
precise the estimate, assuming some type of random selection process has
been used.
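The general principle can be checked with a short simulation. The Python sketch below assumes a true proportion of 0.62 (a made-up value) and a population large enough that each response can be treated as an independent draw. It compares the spread of p-hat across many samples with the usual formula for the standard error of a proportion, sqrt(p(1 - p)/n).

```python
import random
import statistics

rng = random.Random(5)  # fixed seed so the run is repeatable
p = 0.62                # hypothetical true proportion

def spread_of_p_hat(n, trials=2000):
    """Standard deviation of p-hat over many simulated samples of size n."""
    p_hats = []
    for _ in range(trials):
        hits = sum(rng.random() < p for _ in range(n))
        p_hats.append(hits / n)
    return statistics.stdev(p_hats)

for n in (25, 100, 400):
    simulated = spread_of_p_hat(n)
    formula = (p * (1 - p) / n) ** 0.5
    print(f"n = {n:3d}: simulated {simulated:.4f}, formula {formula:.4f}")
```

Each fourfold increase in sample size should cut the spread roughly in half, since the standard error shrinks in proportion to the square root of n.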
* A note on replication in survey data: There are replication processes that
can simulate conducting many similar studies by selecting sub-samples from
the overall sample and analyzing the standard errors. These calculations can
be performed in different ways by statistical analysis software. A number of
concepts are related to this idea, including standard error of estimates and
confidence intervals. For more clarification on this, try these links:
http://www.napier.ac.uk/depts/fhls/peas/errors.asp
http://www.westat.com/wesvar/techpapers/ACS-Replication.pdf
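One replication method of this general kind is the delete-one jackknife, in which each sub-sample leaves out a single observation. It is shown here only as an illustration in Python; the software described in the links may use other replication schemes, and the ten data values are hypothetical.

```python
import math
import statistics

# Hypothetical sample of 10 survey responses (hours of homework per week).
sample = [4, 7, 5, 9, 6, 3, 8, 5, 6, 7]
n = len(sample)

# Each replicate leaves out one observation and recomputes the mean.
replicates = [statistics.mean(sample[:i] + sample[i + 1:]) for i in range(n)]
rep_mean = statistics.mean(replicates)

# Jackknife standard error of the mean, built from the replicates.
jk_se = math.sqrt((n - 1) / n * sum((r - rep_mean) ** 2 for r in replicates))

# For the mean, this agrees with the textbook formula s / sqrt(n).
textbook_se = statistics.stdev(sample) / math.sqrt(n)

print(f"jackknife SE = {jk_se:.4f}")
print(f"s / sqrt(n)  = {textbook_se:.4f}")
```

For a simple mean the two numbers agree, so nothing is gained. The replication approach becomes useful for statistics and survey designs where no simple formula for the standard error is available.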
Clarifying Examples and Activities:
Have students design an experiment or survey. This would be a good group
activity. They could then either write up or present their survey or
experiment designs, identifying potential problems with data collection
methods or bias and the steps they have taken to address these problems.
HSCE: S3.1.3 Distinguish between an observational study and an
experimental study, and identify, in context, the conclusions
that can be drawn from each.
Include this topic with S3.1.1 and S3.1.2. The results of many
observational studies can be found on the Internet. Students could search
for them and then identify potential issues associated with lack of
randomization, especially where there might be lurking variables.
An observational study is a study that attempts to identify cause and
effect in which the person conducting the study has no control over
treatments or participants. It is different from an experimental study,
where the researcher selects participants and assigns treatments to
groups. The most important element lacking in an observational study
is randomization, so an observational study can show an association
but cannot, by itself, establish cause and effect.
See this website for further discussion:
http://stattrek.com/AP-Statistics-2/Data-Collection-Methods.aspx?Tutorial=AP
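The difference between the two kinds of study can be made concrete with a simulation. In the Python sketch below, everything is invented for illustration: the treatment has no effect at all, age is the lurking variable, and older people are assumed to choose the treatment more often when left to decide for themselves.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

def outcome(is_older):
    # In this made-up world the treatment does nothing; only age
    # changes the average outcome (60 for older, 50 for younger).
    return rng.gauss(60 if is_older else 50, 5)

# 4,000 hypothetical people; True means "older".
people = [rng.random() < 0.5 for _ in range(4000)]

def difference(gets_treatment):
    """Mean outcome of the treated group minus that of the untreated group."""
    treated, untreated = [], []
    for is_older in people:
        group = treated if gets_treatment(is_older) else untreated
        group.append(outcome(is_older))
    return statistics.mean(treated) - statistics.mean(untreated)

# Observational study: people choose for themselves, and older
# people opt in far more often (assumed rates of 80% versus 20%).
obs_diff = difference(lambda is_older: rng.random() < (0.8 if is_older else 0.2))

# Experiment: a coin flip decides, regardless of age.
exp_diff = difference(lambda is_older: rng.random() < 0.5)

print(f"observational study difference: {obs_diff:.2f}")
print(f"randomized experiment difference: {exp_diff:.2f}")
```

The observational comparison shows a sizable difference that is entirely due to age, while the randomized comparison stays close to zero, which is the correct answer in this setup.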
Clarifying Examples and Activities:
Example 1:
Overview: PRINCIPLES OF STUDY DESIGN
There are two purposes for analyzing data: to search for patterns, and to
provide clear answers to specific questions. Exploratory data analysis (EDA)
may be followed by formal statistical inference, which answers specific
questions and provides measures of the uncertainty associated with the
answers.
Applied situations:
http://www.anu.edu.au/nceph/surfstat/surfstat-home/2-1-2.html