# AP_Course_Audit_AP_Statistics by ajizai

```
AP Course Audit: Manlius Pebble Hill (AP Statistics)

We will cover how to use the graphing calculator each time we encounter a feature that it can handle.
These features include: calculating the mean, standard deviation, and median; creating scatter plots
and box plots; linear and non-linear regression; 1-sample and 2-sample t and z tests; z and t confidence
intervals; and chi-squared tests for goodness of fit, homogeneity, and independence.

Assignments are designed so that each student uses a unique topic or ends up with a unique data set
on which they do an independent calculation. This fosters independence of thought and confidence, and
it keeps students honest about what they personally understand.

Each day we cover an AP problem in class from one of the released exams. The main goal is a complete
understanding of the problem and how it relates to the day’s topic.

All graphical displays that a student creates should be made with Excel or a similar graphing tool.

Projects:

Each student must write an article for the school newspaper. Before submitting the article, they must
design a survey or experiment, create a sampling plan, gather the data, analyze the data, and draw a
conclusion from the data set. They then write an article summarizing what they found, with at least one
graphical aid for readers.

Daily schedule (Day, Topic, Description, Activity, Assignment, Textbook HW)
Quarter I (QI): The Data Analysis Process, Collecting Data & Methods for Describing Data
Day 1: Variability in Inferential Statistics
  Description: We will discuss the concept of a ‘typical’ range of values for a measurable quantity.
    Then we will discuss how that range of values, and how ‘frequently’ those measurements occur, can
    help us make a decision.
  Activity: We will measure the salinity of normal drinking water at our school and then measure the
    salinity of that same water after a ‘toxic spill’. Then we will try to determine whether the
    drinking water is contaminated.
  Assignment: Students will measure the lengths of two different types of leaves from two different
    trees. Then they will try to give a criterion for what range of values distinguishes one tree from
    the other.
Day 2: Types of Data and Simple Graphical Displays
  Description: Frequency distributions for categorical data
    - Frequency
    - Relative frequency
    - Bar charts
    - Dot plots
  Activity: We create a survey that we could use to collect frequency information so that we can
    practice displaying bar charts and dot plots.
  Assignment: Students pick two of the top commodities of the United States (or of their home country)
    from the FAO and track production and revenue for the past 6 years. Then they describe what they
    found with a bar chart.
  Textbook HW: Sec 1.4 # 1.9, 1.11, 1.19
Day 3: Sampling Methods and Bias
  Description:
    - Why sample?
    - Sample sizes
    - Selection bias
    - Measurement or response bias
    - Non-response bias
    - Conceptual bias
    - Simple random samples
    - Stratified random sampling
    - Cluster sampling
    - Systematic sampling
    - Why not convenience sampling?
    - Why not volunteer sampling?
    - How important sampling biases are for researchers when designing experiments
  Activity: Designing a survey to determine how many hours upper school students at our school spend on
    homework, along with a discussion of how to carry out the actual sampling.
  Assignment: Exploring sampling methods in the context of farming (plant wilt) and how to get a good
    sample given some uncontrolled variables.
  Textbook HW: Sec 2.2 # 2.5, 2.11, 2.13, 2.25
Day 4: Statistical Studies: Observation and Experiment
  Description:
    - Why do statistical studies?
    - The difference between observational studies and experiments
    - When you can draw cause-and-effect relationships between measured quantities
    - Confounding variables
  Activity: In groups, students pretend that they have just unearthed a new archeological find. Then
    they try to list the things they may want to learn from the site, along with the types of
    information from the site that would influence a study.
  Assignment: Given four different types of statistical studies, each student must determine the goal
    of each study and what would be enough information to draw a cause-and-effect relationship between
    any measured quantities.
  Textbook HW: Sec 2.3 # 2.27, 2.33, 2.35, 2.37
Day 5: Simple Comparative Experiments
  Description:
    - The design of a good experiment
    - An example experiment
    - Randomization
    - Blocking
    - Direct control
  Activity: We will discuss the Stroop effect, and each group will design and perform an experiment
    testing the Stroop effect.
  Assignment: Each student will find and evaluate several famous experiments based on the criteria that
    we discussed in class.
  Textbook HW: Sec 2.4 # 2.39, 2.41, 2.43, 2.45
Day 6: More on Experimental Design
  Description:
    - Control groups
    - Placebo
    - Single-blind experiments
    - Double-blind experiments
Day 7: Survey Design
  Description: The different … respondent.
    - Comprehension
    - Retrieval from memory
    - Answering the questions
    - Common stumbling blocks in responding
  Activity: We will discuss the … that I give to the students and how to improve the questions in the
    survey.
  Assignment: Each student … National Geographic article ‘Opium Wars’ in order to refine their
    understanding of the concept ‘survey’. They will explore the depth of understanding that the
    author has of the subject, but compare how small the sample size is in an article like that with
    what … sampling.
  Textbook HW: Sec 2.6 # …
Day 8: Review: Chapters 1 & 2
Day 9: Exam: Chapters 1 & 2
Day 10: Displaying Categorical Data: Comparative Bar Charts and Pie Charts
  Description:
    - Comparative bar charts
    - Pie charts for categorical data
    - Stacked bar charts
  Activity: Students explore the comparative hunting success of egrets and herons via comparative bar
    charts and frequency data.
  Assignment: Each student will compare the top 20 commodities produced by two different countries using
    comparative bar charts and pie charts.
  Textbook HW: Sec 3.1 # 3.3, 3.5, 3.11, 3.15 + Test Corrections
Day 11: Displaying Numerical Data: Stem-and-Leaf Plots
  Description:
    - How to construct stem-and-leaf plots
    - Outliers
    - Spread
  Activity: We explore different aspects of stem-and-leaf plots in order to clarify how these plots are
    constructed.
  Assignment: Each student will locate a different real-life example of a stem-and-leaf plot.
  Textbook HW: Sec 3.2 # 3.17, 3.21, 3.23
Day 12: Displaying Numerical Data: Frequency Distributions and Histograms
  Description:
    - Histograms for discrete numerical data
    - Histograms for continuous numerical data (with the aid of stem-and-leaf plots)
    - Frequency and relative frequency distributions
    - Examples
  Activity: We revisit the data from measuring the salinity of the school’s drinking water and use
    histograms to make our arguments clearer and more visually appealing.
  Assignment: Students revisit their measurements of leaf lengths from the two different types of trees
    and describe visually what the differences are.
  Textbook HW: Sec 3.3 # 3.25, 3.27, 3.33, 3.35
Day 13: Displaying Bivariate Numerical Data
  Description:
    - How to construct and label a scatter plot
    - Time series plots
    - Trends (linear and non-linear)
  Activity: We will create an example of a scatter plot using raw data from the FAO and explore the
    meaning of the trend in the data. Then we will discuss the implications of the data for the leaders
    of a given country.
  Assignment: Each student will make several new scatter plots using raw data from the FAO. They will
    also describe the trends that they see, along with any implications of those trends.
  Textbook HW: Sec 3.4 # 3.41, 3.43, 3.49, 3.53
Day 14: Describing the Center of a Data Set Numerically
  Description:
    - The difference between the words ‘population’ and ‘sample’
    - Mean
    - Median
    - Proportion of successes
    - Trimming data
  Activity: “Stringing Students Along” is an activity that explores how to sample objects like bank
    queues to determine center and variability. We look at two different sampling methods for strings
    of varying length in a bag and try to determine whether either method shows any sampling bias.
  Assignment: Using raw data from the FAO, each student will make several estimates of the center of a
    data set. They will also have to find a data set for which an average does not make sense.
  Textbook HW: Sec 4.1 # 4.5, 4.9, 4.13, 4.15
Day 15: Describing the Variability in a Data Set
  Description:
    - The importance of variability and spread
    - Standard deviation
    - Interquartile range
  Activity: We will revisit the water salinity data and describe the data’s center and variability
    numerically, using the concepts from the past couple of days.
  Assignment: Students will describe the variability of the commodities that they chose last time.
  Textbook HW: Sec 4.2 # 4.21, 4.23, 4.25, 4.29
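  For reference, the two spread measures above can be computed as follows. The standard deviation uses
  the usual sample divisor n − 1, and calculators and textbooks use slightly different rules for the
  quartiles. The four data values in the example are made up for illustration.

  $$ s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}, \qquad \mathrm{IQR} = Q_3 - Q_1 $$
  $$ x = 2, 4, 4, 6: \quad \bar{x} = 4, \quad s = \sqrt{\frac{(-2)^2 + 0^2 + 0^2 + 2^2}{4 - 1}} = \sqrt{\tfrac{8}{3}} \approx 1.63 $$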
Day 16: Summarizing a Data Set: Box Plots
  Description:
    - How box plots can summarize data
    - Skeletal box plots
    - Modified box plots
    - Outliers
    - Extreme outliers
    - Cost-to-charge ratio
  Activity: “Capture–Recapture” is an activity that demonstrates a method used by naturalists to
    estimate the size of populations that are hard to count directly. We will simulate the process with
    Pepperidge Farm Goldfish crackers.
  Assignment: Activity 4.2 (SADA) explores the possible shapes of box plots given different data sets.
  Textbook HW: Sec 4.3 # 4.31, 4.33, 4.35, 4.37
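  Here is a worked version of the capture–recapture estimate. It assumes the activity uses the standard
  Lincoln–Petersen estimate, since the formula is not named here, and the counts are made up. We mark
  M fish, return them, then draw a second sample of C fish and count the R marked ones. If the marked
  fish mix well, the marked fraction in the sample should be close to the marked fraction in the whole
  population of size N:

  $$ \frac{R}{C} \approx \frac{M}{N} \quad\Longrightarrow\quad \hat{N} = \frac{M C}{R}, \qquad M = 50,\ C = 40,\ R = 8:\ \ \hat{N} = \frac{50 \cdot 40}{8} = 250 $$

  For the modified box plots, a common convention flags points as outliers when they lie more than
  1.5 IQR beyond the nearer quartile, and as extreme outliers beyond 3 IQR. We assume this matches the
  textbook's rule.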
Day 17: Interpreting Center and Variability: Chebyshev’s Rule, the Empirical Rule, and z-Scores
  Description:
    - How to measure ‘distance from the center’ in terms of standard deviations
    - Chebyshev’s rule
    - The empirical rule
    - z-scores
    - Percentiles
  Activity: “Sampling Pennies” is an activity that acts as an introduction to the concept of a
    distribution. It also uses calculations that estimate the center and variability of a data set. We
    can then check empirical results against predictions of how many data points should fall in a given
    range.
  Assignment: Each student will go back to their ERB scores, find the mean and standard deviation of the
    population, and then compare their own score to these. Then they will calculate what score they
    would have needed to reach a certain percentile.
  Textbook HW: Sec 4.4 # 4.39, 4.41, 4.43, 4.45
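  The rules above in symbols, with μ and σ the population mean and standard deviation. The ERB numbers
  in the example are hypothetical. Chebyshev’s rule holds for any data set (with k > 1), while the
  empirical rule applies only to roughly bell-shaped (normal) data.

  $$ z = \frac{x - \mu}{\sigma}, \qquad x = 85,\ \mu = 70,\ \sigma = 10 \ \Rightarrow\ z = 1.5 $$
  $$ \text{Chebyshev: at least } 1 - \frac{1}{k^2} \text{ of the values lie within } k \text{ SDs of the mean} \quad (k = 2: 75\%,\ \ k = 3: \approx 88.9\%) $$
  $$ \text{Empirical rule: about } 68\%,\ 95\%,\ 99.7\% \text{ of the values lie within } 1,\ 2,\ 3 \text{ SDs of the mean} $$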
Day 18: Extra Day
Day 19: Review: Chapters 3 & 4
Quarter II (QII): Bivariate Data, Probability and Distributions
Day 20: Exam: Chapters 3 & 4
Day 21: Correlation
  Description:
    - How to calculate correlation
    - What correlation means
    - When a set of bivariate numerical data has a good correlation
    - What the formula for correlation means
    - What correlation² means
  Activity: In class we explore the concept of correlation by looking at GPA scores (for 9th, 10th, and
    11th grade and the first semester of senior year) along with SAT scores and ERB scores, to see which
    pair of numerical data sets yields the strongest correlation.
  Assignment: Each student must find a linear relationship in a scholarly scientific article and
    summarize what the linear relationship shows.
  Textbook HW: Sec 5.1 # 5.1, 5.5, 5.9, 5.11
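  For reference, this is the usual formula for Pearson’s sample correlation coefficient r. It is
  written in terms of the z-scores introduced earlier, which is one way to read "what the formula for
  correlation means":

  $$ r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right), \qquad -1 \le r \le 1 $$

  For a least-squares line, r² (correlation²) is the fraction of the variation in y that the line
  explains.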
Day 22: Linear Regression: Fitting a Line to Bivariate Data
  Description:
    - Formula for the y-intercept of the regression line
    - Formula for the slope of the regression line
    - Formula for the slope of a regression line that goes through the origin
    - Examples showing the difference between lines that are known to go through the origin and lines
      that might not go through the origin
    - The fact that the regression line goes through the point (average x value, average y value)
    - Dependent versus independent variable
    - Danger of extrapolation
    - Absolute zero
  Activity: We generate several data sets for temperature and try to estimate absolute zero.
  Assignment: Students will measure (or ask for) the height and weight of 10 family members and
    calculate the equation of the regression line for their data set. They will make a scatter plot of
    their data and include the regression line. Then they will try to predict the height and weight of
    future members of their family while avoiding the danger of extrapolation.
  Textbook HW: Sec 5.2 # 5.17, 5.19, 5.21, 5.25
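  These are the least-squares formulas behind this list, written for a fitted line ŷ = a + bx. We
  assume this notation; some books write b₀ + b₁x instead.

  $$ b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x} $$

  Substituting $x = \bar{x}$ gives $\hat{y} = a + b\bar{x} = \bar{y}$, which is why the line passes
  through $(\bar{x}, \bar{y})$. For a line forced through the origin ($\hat{y} = bx$), minimizing
  $\sum (y_i - b x_i)^2$ gives a different slope:

  $$ b_{\text{origin}} = \frac{\sum x_i y_i}{\sum x_i^2} $$

  Suppose the absolute-zero activity fits, say, gas pressure against temperature (an assumption about
  the setup). The estimate of absolute zero is then the temperature where the fitted pressure is zero,
  $T = -a/b$, and this is itself an extrapolation.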
Day 23: Assessing the Fit of a Line
  Description:
    - Residuals
    - Predicted values
    - Residual plots
    - Coefficient of determination
    - How residual plots can uncover curvature in a data set that was previously thought to be straight
  Activity: Students match equations of regression lines to scatter plots that are similar to each
    other. The scatter plots are created so that only one point changes: that point is either far away
    but on the regression line, or far away and off the line in a direction perpendicular to it. We also
    compare the correlations in these instances.
  Assignment: Each student looks on data.gov for two numerical data sets that they think will have a
    linear relationship. They then create a (clearly labeled) scatter plot of the data, calculate and
    graph the regression line, calculate and interpret the correlation, and calculate and interpret the
    standard deviation line.
  Textbook HW: Sec 5.3 # 5.33, 5.35, 5.37, 5.39
Day 24: Non-Linear Relationships and Transformations
  Description: We try fitting a straight line to non-linear data. Then we try changing the regression
    line to a regression ‘curve’. We revisit the challenge of noticing when data that looks straight is
    not straight. Then we explore the concept of ‘linearizing’ data. Finally, we make a list of
    traditional linearizations.
  Activity: We look at NOAA data on monthly averages of CO2 over a few decades and try to fit the curve
    as accurately as possible. Then we plot the actual carbon level versus the predicted carbon level
    and calculate the correlation between the two, in order to see how good a fit our model is.
  Assignment: Each student must go home and keep track of the temperature of a cooling liquid (hot
    chocolate or tea that they can drink afterwards). Then they will try to fit the data with a line
    while looking for clues that the data might not be linear. Next they have to find a good non-linear
    model. Finally they check that their non-linear model is a good fit by plotting actual versus
    predicted values.
  Textbook HW: Sec 5.4 # 5.47, 5.49, 5.51, 5.53
Day 25: Chance Experiments and Events
  Description:
    - Chance experiment
    - Sample space
    - Event
    - Simple event
    - Tree
    - Sample space tree
    - Complement of A
    - ‘or’ versus ‘and’
    - Disjoint
  Activity: We write out the sample space for the sum of the top faces after rolling two dice. Then we
    check this prediction against reality by rolling two dice and adding the numbers on the top faces.
    We compare the relative frequencies of the results from actually rolling the dice with the
    predicted probabilities.
  Assignment: Each student then performs a similar (but simpler) experiment at home by flipping a coin.
    First, they make a predicted sample space. Then they check actual experimental values against that
    predicted sample space. They must create a relative frequency histogram of the predicted and actual
    data sets.
  Textbook HW: Sec 6.1 # 6.1, 6.3, 6.5, 6.7
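  Two fair dice have 36 equally likely ordered outcomes. The number of those outcomes with a sum of s
  is 6 − |s − 7|, so the predicted probabilities are:

  $$ P(S = s) = \frac{6 - |s - 7|}{36}, \qquad s = 2, 3, \ldots, 12 $$

  For example, P(S = 7) = 6/36 = 1/6 and P(S = 2) = P(S = 12) = 1/36. These are the values to compare
  with the relative frequencies from the class rolls.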
Day 26: Definition of Probability
  Description:
    - Classical definition of probability
    - Relative frequency definition of probability
    - Subjective / weighted definition of probability
    - Then we discuss the main differences between the definitions of probability by checking the
      predictions of each one against the others
  Activity: We explore the difference between the classical and relative frequency definitions of
    probability by writing out the sample space for the result of flipping a plastic bottle cap. Then we
    compare that prediction to the actual results of flipping a bottle cap.
  Assignment: Each student performs an experiment similar to the bottle cap experiment at home, but with
    Hershey Kisses.
  Textbook HW: No textbook homework.
Day 27: Basic Properties of Probability
  Description:
    - Probabilities are between …
    - The probability of the whole sample space is …
    - What property do disjoint events have in the context of probability?
    - What is the relationship between an event and its complement in probability?
    - The law of large numbers
  Activity: In this activity we encounter the ‘law of averages’ in its popular but false form. We
    classify various statements that use the ‘law of averages’ and try to find what is correct and what
    is wrong. We use this to gain a stronger grasp of a more correct property found in probability, the
    ‘law of large numbers’.
  Assignment: Students then design a test for the false version of the ‘law of averages’ and look at the
    results of those tests to decide whether they think the ‘law of averages’ is true or false. This also
    helps introduce the concept of hypothesis testing.
  Textbook HW: Sec 6.3 # 6.15, 6.17, 6.19, 6.21
Day 28: Conditional Probability
  Description:
    - Definition of conditional probability
    - Why conditional probability is needed
    - How to use two-way tables to help calculate conditional probabilities
    - When you can use conditional probability
  Activity: In this activity students explore the ‘Monty Hall’ problem. We introduce the problem, solve
    it, and then try the problem empirically with cards.
  Assignment: Each student will solve the following problem theoretically and then try it as an
    experiment. Three cards are put into a box. One card is red on both sides, one card is green on both
    sides, and one card is red on one side and green on the other. You randomly pick a card from the box
    and see the color of one side. If you get a prize for correctly guessing the color on the other side
    of the card, should you always guess the same color, a different color, or are the two strategies
    the same?
  Textbook HW: Sec 6.4 # 6.29, 6.33, 6.35, 6.37
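  Here is one way to work the three-card problem. It assumes the card is drawn at random and that the
  visible side is equally likely to be either of the card's two faces. Suppose a red side is showing
  (the green case is symmetric), and call the cards RR, GG, and RG:

  $$ P(\text{red showing}) = \tfrac{1}{3}(1) + \tfrac{1}{3}(0) + \tfrac{1}{3}\left(\tfrac{1}{2}\right) = \tfrac{1}{2} $$
  $$ P(RR \mid \text{red showing}) = \frac{\tfrac{1}{3} \cdot 1}{\tfrac{1}{2}} = \tfrac{2}{3} $$

  So guessing the same color as the visible side wins with probability 2/3, and guessing the other
  color wins with probability 1/3. The two strategies are not equally good.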
Day 29: Independence
  Description:
    - Formula for independence
    - Why we need a concept like independence
    - When we can use the concept of independence
    - Examples
  Activity: In this activity students investigate how often push pins fall point down. They do so in
    two different ways: first by dropping push pins one at a time, and second by dropping 10 push pins at
    a time. The goal is to see whether, in the second method, the push pins still fall independently.
  Assignment: Each student will do some research on diffraction and comment on whether each electron
    acts independently of the other electrons in the diffraction experiment.
  Textbook HW: Sec 6.5 # 6.41, 6.47, 6.51, 6.57
Day 30: General Probability Rules
  Description:
    - General addition rule
    - General multiplication rule
    - Law of total probability
    - Bayes’ theorem
  Activity: In this activity we look at the concept of conditional probability in the context of
    defective and non-defective parts. Students are given two types of bolts from ‘two different
    machines’ that produce bolts. Each machine has a different success rate for producing non-defective
    parts. Students then take samples from these bolt collections and compare the theoretical
    probabilities they calculated first with the actual frequencies with which a specific type of bolt
    showed up.
  Assignment: Students explore the concept (and formula) for conditional probability in the context of
    medical tests for a disease. A medical test can show a positive when the individual does NOT have
    the disease, and it can show a negative when the individual DOES have the disease. For that reason,
    conditional probability is one of the appropriate tools for dealing with medical tests.
  Textbook HW: Sec 6.6 # 6.59, 6.61, 6.63, 6.69
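  The medical-test reasoning above is Bayes’ theorem, with the law of total probability in the
  denominator. Here D means the person has the disease and + means a positive test. The prevalence and
  error rates in the example are hypothetical.

  $$ P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid D^c)\,P(D^c)} $$
  $$ P(D) = 0.01,\ P(+ \mid D) = 0.95,\ P(+ \mid D^c) = 0.05:\quad P(D \mid +) = \frac{0.0095}{0.0095 + 0.0495} \approx 0.16 $$

  Even with a fairly accurate test, most positives in this example come from people without the
  disease, because the disease is rare.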
Day 31: Review: Chapters 5 & 6
Day 32: Exam: Chapters 5 & 6
Day 33: Random Variables
  Description:
    - Random variable
    - Discrete random variables
    - Continuous random variables
    - The difference between discrete and continuous
  Activity: In this activity we examine the concept of ‘streaky behavior’ and what constitutes streaky
    behavior. First, as a class we look at a real sequence of coin flips and a made-up sequence of coin
    flips, and we try to figure out which is which based on the ‘streakiness’ of each sequence. Then
    students construct their own real sequence of coin flips and analyze it for streakiness.
  Assignment: Each student has to find 20 statistics, 10 of which are from a discrete random variable
    and 10 of which are from a continuous random variable.
  Textbook HW: Sec 7 # 7.1, 7.3, 7.5, 7.7
Day 34: Probability Distributions for Discrete Random Variables
  Description:
    - Definition of a probability distribution for a discrete random variable
    - Properties of a probability distribution
  Activity: In this activity we create a probability distribution for the machine bolt activity that
    we did before, where two different machines create bolts that are defective or non-defective at
    different rates.
  Assignment: Each student must watch a basketball game and keep track of the sequence of shots and
    whether each shot was made or not. Then each student will create a probability distribution
    function for the random variable ‘number of successful shots in a row’.
  Textbook HW: Sec 7.4 # 7.27, 7.29, 7.31, 7.37
Day 35: Probability Distributions for Continuous Random Variables
  Description:
    - Definition of a probability density function for continuous random variables
    - Relationship and difference between continuous and discrete probability distributions
    - Calculating probabilities using a table for a probability density function of a continuous
      random variable
    - Why the area represents probability
  Activity: In this activity we create a probability density function for the continuous random
    variable pH in different liquids.
  Assignment: Each student must make a probability density function for the temperature readings of
    their house. The goal is to be able to distinguish one student’s house from another based only on
    the data, calculations, and graphics that they make.
  Textbook HW: Sec 7.3 # 7.21, 7.23
Day 36: Mean and Standard Deviation of a Random Variable
  Description:
    - Mean of a random variable
    - Standard deviation of a random variable
    - Why probability shows up in the formula for the mean
    - Some example calculations using raw data
  Activity: In this activity we try to measure the ‘length of a mechanical pencil’. The challenge is to
    measure it with respect to a specific individual, and so get a sense of how long a particular person
    likes the graphite to be when they write.
  Assignment: Each student will gather a range of gas station information, including number of gallons,
    total price, price per gallon, and time of day. They will have to make a sampling plan, get
    permission to gather the data, gather the data, and then analyze it. They will make a probability
    distribution function for their data and create a visual aid for their function.
  Textbook HW: Sec 7.4 # 7.27, 7.29, 7.31, 7.37
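  For a discrete random variable, the probabilities act as weights, which is why they appear in the
  formula for the mean. A fair six-sided die gives a simple check:

  $$ \mu_X = \sum x\,p(x), \qquad \sigma_X^2 = \sum (x - \mu_X)^2\,p(x) $$
  $$ \text{Fair die: } \mu_X = \frac{1 + 2 + \cdots + 6}{6} = 3.5, \qquad \sigma_X^2 = \frac{2\,(2.5^2 + 1.5^2 + 0.5^2)}{6} = \frac{35}{12} \approx 2.92, \qquad \sigma_X \approx 1.71 $$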
Day 37: Binomial and Geometric Distributions
  Description:
    - When to use the binomial distribution
    - How to calculate the probabilities associated with the binomial distribution
    - How the geometric distribution relates to the binomial distribution
    - When to use the geometric distribution
    - How to do calculations with the geometric distribution
    - Examples
  Activity: In this activity we explore the geometric distribution in the context of prizes like
    ‘Cracker Jack’ prizes, where you have to buy a certain number of boxes before you get the prize that
    you want. The students make a calculation predicting how many tries it will take and then test that
    prediction empirically.
  Assignment: At a parking lot, students will count how many cars it takes until they reach, say, a
    Toyota. After repeating this count several times, each student will comment on whether they think
    their distribution is geometric and what they think the actual proportion of Toyotas in the parking
    lot was.
  Textbook HW: Sec 7.5 # 7.45, 7.51, 7.59, 7.61
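  Here are the two distributions in symbols, with p the success probability on each independent trial.
  For the geometric distribution, X counts trials up to and including the first success. We assume
  this is the convention the textbook uses.

  $$ \text{Binomial: } P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n $$
  $$ \text{Geometric: } P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, \ldots, \qquad \mu_X = \frac{1}{p} $$

  Because the geometric mean is 1/p, the parking-lot counts give a simple estimate of the proportion of
  Toyotas, $\hat{p} = 1/\bar{x}$. This assumes each car is a Toyota independently, with the same
  probability. For hypothetical counts of 6, 10, and 8 cars, $\bar{x} = 8$ and $\hat{p} = 1/8 = 0.125$.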
Day 38: Normal Distributions
  Description:
    - The general shape of the normal distribution
    - How to calculate areas using a table
    - How the z-score relates to calculating areas
    - How probability notation works with normal distributions
    - Upper-tailed, lower-tailed, and two-tailed calculations
    - Symmetry of the normal distribution
  Activity: In this activity we try to measure the length of our classroom. From the data that we
    collect we calculate the mean and standard deviation. We also calculate a probability distribution
    function from the data.
  Assignment: Each student will go home and sample the electric meter readings during some time frame.
    They will have to think about how they will sample the meter. After they have gathered the data they
    need to calculate the mean and standard deviation. They also need to think about whether their data
    looks normal. Assuming that the data is normal, they will also make some guesses as to how many
    measurements will fall in a certain range of values.
  Textbook HW: Sec 7.6 # 7.67, 7.69, 7.71, 7.73
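  The table calculations listed above all reduce to standard normal areas. In the notation below,
  z = (x − μ)/σ and Z is a standard normal variable. The two-tailed identity assumes z > 0, and the
  numbers use z = 1 as an example:

  $$ P(X < x) = P(Z < z), \qquad P(X > x) = 1 - P(Z < z), \qquad P(|Z| > z) = 2\,P(Z > z) $$
  $$ z = 1:\quad P(Z < 1) \approx 0.8413, \qquad P(Z > 1) \approx 0.1587, \qquad P(|Z| > 1) \approx 0.317 $$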
Day 39: Checking for Normality
  Description:
    - What does it mean for a data set to look normal?
    - What do you compare to see if a data set is normal?
    - Using the correlation between theoretical cumulative probability and actual cumulative
      probability to determine whether data appears to be normal
    - How to use the correlation table to determine whether data appears to be normal or not
  Activity: We check the data from our measurement of the length of our classroom for normality.
  Assignment: Each student checks their data from the electric meter readings for normality.
  Textbook HW: Sec 7.7 # 7.81, 7.83, 7.85, 7.89
QIII                 Quarter III: Distributions, Confidence Intervals and Hypothesis Testing
40   Approximating Discrete Distributions
     Description:
     • Sometimes the shape of a discrete distribution is similar to the shape of a continuous distribution.
     • When can we interchange the two distributions?
     • Binomial versus normal distributions.
     • Examples of checking binomial data for normality.
     • Noting that the two still produce slightly different probabilities.
     • What other distributions are similar to each other?
     Activity: In this activity we explore the concept of using a continuous distribution to approximate a discrete distribution in the context of ‘process control’. Each group will try to measure out 1 cup of ‘ice cream’ (pasta shells) in two different ways. First, each group will try to measure one cup with an opaque measuring cup and ‘intuition’. Then each group will use a transparent measuring cup and try to get an exact cup. Then we will average the results from each group and try to determine if either measurement shows a significant improvement over the other. The random variable ‘the number of pasta shells in the measuring cup’ is a discrete random variable. We will compare the results from this random variable with a continuous one.
     Assignment: Each student continues the experiment at home by trying to measure out 1 cup of pasta shells by weight to see if that data produces better results than either measuring cup did.
     Textbook HW: Sec 7.8 # 7.93, 7.95, 7.97, 7.99
41   Statistics and Sampling Variability
     Description:
     • Comparing one measurement from one sample with the mean of one sample out of several samples.
     • Example using the random variable X = ‘# of cars in one family’ compared with Y = ‘average number of cars between two families’.
     • Comparing the distribution for a random variable with the sampling distribution of the sample mean.
     • How the shapes of X and Y do NOT have to be the same.
     Activity: In this activity we do a similar activity as in the introduction, except we use the random variable X = ‘number of children that a trustee from our school has’.
     Assignment: Each student then does an extension to what we covered that day by taking a look at the random variable X = ‘number of students in one of my classes’. They must create a probability distribution (along with a visual representation for that distribution) for the random variable X and then compare it to the probability distribution for Y = ‘the average number of students in two of my classes’.
     Textbook HW: Sec 8.1 # 8.1, 8.3, 8.7, 8.11
42   The Sampling Distribution of the Sample Mean
     Description:
     • Comparing the distribution of single measurements of a random variable X with the distribution of the averages from samples of size N.
     • How the distribution of the averages from several samples of size N can have a different shape than the original distribution.
     • How the distribution of the averages from different samples of size N becomes more ‘Gaussian’ as the size of the sample increases.
     • How the mean of the distribution of the averages from samples of size N centers on the actual population mean from which the samples come.
     • How the standard deviation of the averages from samples of size N gets smaller as N increases.
     • The Central Limit Theorem.
     • Using simulation to demonstrate that the Central Limit Theorem works.
     Activity: In ‘Cents and the Central Limit Theorem’ we explore how the Central Limit Theorem works. The distribution that we take samples from is the distribution of dates of 100 pennies.
     Assignment: Each student will then look up the number of coins minted in different years to help explain why the distribution of dates from pennies that we saw in class looked the way it did.
     Textbook HW: Sec 8.2 # 8.17, 8.19, 8.21, 8.23
43   The Sampling Distribution of the Sample Proportion
     Description:
     • Revisiting the definition of the sample proportion of successes.
     • How the distribution of the sample proportion of successes is related to the distribution of the sample mean.
     • An example of a rope-making company that makes ropes for two different groups of people. One group uses rope decoratively and the other uses rope to haul cargo. The second group needs the rope to withstand a certain level of force before the rope breaks. How does the rope company determine how well their ropes satisfy the second customer?
     • A calculation of the probability that at least 110 of 120 ropes will be able to withstand the required amount of force.
     Activity: In this activity we look at the proportion of non-Caucasian students at our school to understand the concept of the sampling distribution of the sample proportion. Each group creates a plan to sample students in the hallway for their ethnicity. Then we look at the distribution of the proportion of non-Caucasian students in those samples and compare that distribution to the known proportion of non-Caucasian students.
     Assignment: Students then continue this study by looking up the proportion of different ethnic groups in our city and developing a sampling plan for determining ethnicity from a sample of people without having to ask them their ethnicity (i.e. simply by watching people). Then they compare their sample to the known or estimated size of the different ethnic groups.
     Textbook HW: Sec 8.3 # 8.27, 8.29, 8.31, 8.33
44   Review: Chapters 7 & 8
45   Exam: Chapters 7 & 8
46   Point Estimation
     Description:
     • The definition of a point estimate.
     • True value of a population characteristic.
     • Unbiased statistics versus biased statistics.
     • Precision versus accuracy.
     Activity: In this activity we try to estimate the value of the gravitational constant using several different methods. We try to determine if the method is biased and if the method is valid based on the data.
     Assignment: Each student has to compare statistics from both sides of a controversial issue and try to determine if the statistics are consistent or inconsistent with each other.
     Textbook HW: Sec 9.1 # 9.1, 9.3, 9.4, 9.7
47   Large Sample Confidence Interval for a Population Proportion
     Description:
     • The definition of a confidence interval.
     • Confidence level.
     • 95% confidence interval.
     • Large sample confidence interval for the population proportion.
     • Standard error.
     • Bound on the error of estimation, B, associated with a 95% confidence interval.
     • Sample size requirements.
     Activity: In a large jar with pennies and quarters, students find an appropriate sample size in order to get at least 10 quarters with 95% confidence in a random sample of coins from the jar. Then they generalize their calculation to predict the sample sizes needed in order to get N quarters with 95% confidence in a random sample of coins from a jar.
     Assignment: Each student has to find an example of a confidence interval in the news and explain what the statistic is measuring and whether they think the interval is a good one.
     Textbook HW: Sec 9.2 # 9.11, 9.13, 9.15, 9.17
48   Confidence Interval for a Population Mean
     Description:
     • Assumptions before using a one-sample z confidence interval for a population mean.
     • Sample size requirements before using a one-sample confidence interval for a population mean.
     • Student’s t distributions versus z distributions.
     • One-sample t confidence intervals for a population mean.
     Activity: In this activity we consider the mean salary of executives and determine the sample size needed to estimate the true population mean salary of executives. Then we look up the salaries of a random sample of executives and compare the sample mean with the reported mean.
     Assignment: Each student extends this activity by finding a statistic whose mean is reported. They then must calculate a sample size to estimate the true population mean for that statistic. After getting a random sample they must then compare the mean that they calculated with the reported mean.
     Textbook HW: Sec 9.3 # 9.29, 9.31, 9.33, 9.35
49   Hypotheses and Test Procedures
     Description:
     • A test of hypotheses or test procedure.
     • Null hypothesis.
     • Alternative hypothesis.
     • The different possible alternative hypotheses.
     Activity: In this activity we take several experiments and determine what the null and alternative hypotheses are for each experiment. We also explore why the researchers did not choose different alternative hypotheses.
     Assignment: Each student must then find one experiment and describe the null hypothesis and alternative hypothesis.
     Textbook HW: Sec 10.1 # 10.1, 10.3, 10.5, 10.7
50   Errors in Hypothesis Testing
     Description:
     • Test procedures.
     • Type I error.
     • Type II error.
     • Level of significance.
     • How to choose an alpha level and why we should not make the alpha level smaller than it needs to be.
     Activity: In this activity we revisit the ‘cards in box’ problem and calculate the known probabilities of type I and type II errors. Then we test those predicted values against experimental results that we make in class.
     Assignment: Each student must find an example of a treatment with known type I and type II error probabilities.
     Textbook HW: Sec 10.2 # 10.11, 10.13, 10.15, 10.17
51   Large Sample Hypothesis Tests for a Population Proportion
     Description:
     • Test statistics.
     • P-value.
     • Observed significance level.
     • What the P-value means.
     • How to phrase a response to a given P-value (accept versus fail to reject).
     • Upper tailed tests and lower tailed tests versus two tailed tests.
     • An outline of the steps in a hypothesis testing analysis.
     Activity: We make a hypothesis regarding the proportion of boys and girls at our school. Then we decide if our student body is large enough to perform a large-sample hypothesis test for the population proportion of boys (or girls) in the school. Then we take a random sample to estimate the number of boys (or girls) at our school. Then we compare the estimate with the actual number. (Or we can do a Skittles/M&M’s related activity.)
     Assignment: Each student then has to make a hypothesis regarding the number of students with black hair at our school. Then they need to decide if our student body is large enough to perform a large-sample hypothesis test for the population proportion of students with black hair. Then they need to take a random sample to estimate the number of students with black hair at our school. Then each student will compare results with other students in class.
     Textbook HW: Sec 10.3 # 10.23, 10.25, 10.27, 10.29
52   Hypothesis Tests for a Population Mean
     Description:
     • z and t confidence intervals when the population standard deviation is known and not known.
     • The definition of degrees of freedom and how to calculate the degrees of freedom in the basic sense.
     • Upper tailed and lower tailed tests versus two tailed tests.
     • The definition of statistically significant.
     Activity: We make a hypothesis for the mean SAT score in our school over the past few years. Then we determine how many years and students we need in order to use a hypothesis test for a population mean. Then we take a random sample of SAT scores from the given years and compare it with the reported school mean SAT score.
     Assignment: Each student must make a hypothesis for the mean sunrise time in our city. Then they need to determine how many days they would need in order to use a hypothesis test for a population mean. Then they need to design a sampling plan for getting a random sample of days. Then after gathering the data and calculating a sample mean they need to compare their results with one year’s worth of sunrise times. They need to make a box plot of each to show their results visually.
     Textbook HW: Sec 10.4 # 10.41, 10.43, 10.45, 10.47
53   Power and Probability of Type II Error
     Description:
     • The definition of the power of a test.
     • Visually how to think about the power of a test.
     • What factors have an effect on the power of a test?
     • When the null hypothesis is true versus when the null hypothesis is false.
     Activity: In this activity we compare two test procedures. First, we calculate the probabilities of type I and type II errors in each test procedure. Then we check those probabilities empirically in class.
     Assignment: Each student needs to find two test procedures with known type I and type II errors. Then they need to compare the power of each test and describe under what circumstances their conclusion is valid.
     Textbook HW: Sec 10.5 # 10.59, 10.61, 10.63, 10.65
54   Review: Chapters 9 & 10
55   Exam: Chapters 9 & 10
56   Inferences Concerning the Difference Between Two Population or Treatment Means Using Independent Samples
     Description:
     • When you might need to use a difference of means.
     • Comparing treatments.
     • Formulas for the difference between sample means using independent samples.
     • Assumptions for using the above formulas.
     Activity: In this activity we return to our data from the activity where we measured the constant of gravity using different methods. We use this data to determine if the methods differ significantly from each other and from the accepted value of the gravitational constant.
     Assignment: Each student must clearly state a null hypothesis about the difference between the mean electrical usage in their house when everyone is awake and the mean electrical usage in their house after everyone has gone to bed. They need to create a sampling plan to get a random sample during those times. Then after gathering their data and making their calculations they need to determine if the data shows any significant difference from their prediction. They should speculate about any causes given their conclusion.
     Textbook HW: Sec 11.1 # 11.1, 11.5, 11.9, 11.13
57   Inferences Concerning the Difference Between Two Population or Treatment Means Using Paired Samples
     Description:
     • The definition of ‘paired’.
     • Examples of situations that require paired values.
     • Assumptions before making inferences about the difference between means when using paired samples.
     • Paired t confidence intervals.
     Activity: In this activity students compare the mean listed weight of candies of the same size with the mean measured weight of those candies.
     Assignment: Each student must clearly state a null hypothesis for the difference between the mean temperature of one floor of his or her family’s house and the mean temperature of another floor of the same house. The temperature readings should be taken at the same time. The students should then comment on how well paired the data sets are. They should also speculate as to any causes given their conclusion.
     Textbook HW: Sec 11.2 # 11.31, 11.35, 11.37, 11.39
58   Large Sample Inferences Concerning a Difference Between Two Population or Treatment Proportions
     Description:
     • Assumptions before making inferences about the difference between two population (or treatment) proportions.
     • Formulas for population (or treatment) proportions.
     Activity: In this activity we compare the proportion of basketball shots made by team A with the proportion made by team B. Two teams from class make a series of shots and keep track of successful shots and misses. They decide what they need to do in order to satisfy the assumptions of the test. They also make a clear statement of what the null hypothesis is in this context. After they make enough shots we compare the actual proportions of successes between the two teams.
     Assignment: Each student must challenge a member of their family and try the day’s activity at home.
     Textbook HW: Sec 11.3 # 11.41, 11.43, 11.45, 11.47
59   Chi-Squared Tests for Univariate Categorical Data
     Description:
     • What the null hypothesis looks like for univariate categorical data.
     • How to create the alternative hypothesis and how to recognize the alternative hypothesis.
     • Expected versus observed counts.
     • Chi-squared value.
     • How to use the chi-squared values to make inferences.
     • Chi-squared tables.
     • Assumptions needed in order to make inferences using the chi-squared value.
     Activity: In this activity we look at data from drosophila fruit flies and compare predicted ratios of inherited traits with actual ratios of inherited traits. This is done in conjunction with an AP Biology lab.
     Assignment: Each student must find data on www.data.gov upon which they can perform a chi-squared test for univariate categorical data. They must make sure the data satisfies the assumptions needed in order to perform the test. After they do the calculation they need to explain what the resulting chi-squared value means and whether the data differs significantly from the hypothesized proportions.
     Textbook HW: Sec 12.1 # 12.1, 12.3, 12.5, 12.7
QIV                      Quarter IV: Chi-Squared Tests, AP Exam and Topics
60   Tests for Homogeneity and Independence in a Two-Way Table
     Description:
     • Two-way tables.
     • Marginal totals.
     • How to calculate expected values for a two-way table.
     • What the null hypothesis is when using chi-squared values and two-way tables.
     • Assumptions needed in order to make inferences using a chi-squared value for two-way tables.
     Activity: In this activity we compare several different famous authors against the following: how many books got to the NY Times best-seller list, how many books became movies, how many books had sequels and how many books they have published.
     Assignment: Each student must compare several basketball teams against at least four different characteristics (like rebounds, shot success proportion, etc.) to see if their collection of characteristics shows significant differences between the teams. They need to look for which characteristic or team contributes most to the chi-squared value and in which direction.
     Textbook HW: Sec 12.2 # 12.17, 12.19, 12.21, 12.23
61   Review: Chapters 11 & 12
62   Exam: Chapters 11 & 12
63    AP Review
64    AP Review
65    AP Review
66    AP Review
67    AP Review
68    AP Exams
69    AP Exams
70    AP Exams
71    AP Exams
72    AP Exams
73    Discrimination
74    Discrimination
75   Chapter 13
76   Chapter 13
77   ANOVA
78   ANOVA
79   ANOVA
80   ANOVA
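Day 42 mentions using simulation to demonstrate the Central Limit Theorem. A minimal Python sketch of that idea follows, assuming a made-up right-skewed population (it is not the class's penny-date data) and sampling with replacement. As the sample size n grows, the mean of the sample means stays near the population mean μ while their standard deviation shrinks roughly like σ/√n. A histogram of `means` for each n would show the shape becoming more Gaussian; that plotting step is left out.

```python
import random
from statistics import mean, pstdev


def simulate_sample_means(population, n, reps, seed=0):
    """Return the means of `reps` random samples of size n drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the demonstration is repeatable
    return [mean(rng.choices(population, k=n)) for _ in range(reps)]


# Hypothetical right-skewed population: many small values, a few large ones.
population = [1] * 50 + [2] * 25 + [5] * 15 + [20] * 10
mu = mean(population)        # population mean (3.75 for these values)
sigma = pstdev(population)   # population standard deviation

for n in (1, 4, 16, 64):
    means = simulate_sample_means(population, n, reps=5000)
    print(f"n={n:>2}  mean of means={mean(means):.3f}  "
          f"SD of means={pstdev(means):.3f}  sigma/sqrt(n)={sigma / n ** 0.5:.3f}")
```

Comparing the last two printed columns row by row is the numerical version of the bullet "How the standard deviation of the averages from samples of size N gets smaller as N increases."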
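Day 47 covers the large-sample confidence interval for a population proportion, its standard error, the bound on the error of estimation B, and sample-size requirements. The sketch below writes those formulas in Python, assuming the usual large-sample conditions hold (roughly, n·p̂ and n(1 − p̂) both at least 10). The function names and the example numbers are hypothetical, and the sketch does not solve the pennies-and-quarters jar problem itself.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(confidence=0.95):
    """Two-sided critical value z* (about 1.96 for 95% confidence)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def proportion_interval(successes, n, confidence=0.95):
    """Large-sample interval: p_hat plus or minus z* times the standard error."""
    p_hat = successes / n
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    bound = z_critical(confidence) * standard_error  # bound on the error of estimation, B
    return p_hat - bound, p_hat + bound


def required_sample_size(bound, p_guess=0.5, confidence=0.95):
    """Smallest n with z* * sqrt(p(1 - p) / n) <= bound; p_guess = 0.5 is the conservative choice."""
    z = z_critical(confidence)
    return ceil(p_guess * (1 - p_guess) * (z / bound) ** 2)


# Hypothetical numbers: 30 "successes" in a random sample of 120.
low, high = proportion_interval(30, 120)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
print("n needed for B = 0.03:", required_sample_size(0.03))
```

Rounding the sample size up with `ceil` matters: rounding down would give a bound slightly larger than the one requested.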
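Day 59 compares predicted inheritance ratios with observed counts using a chi-squared goodness-of-fit test. A minimal Python sketch of the statistic follows. The counts are invented for illustration (they are not lab data), the 9:3:3:1 ratio is only an example hypothesis, and 7.815 is the standard chi-squared table value for 3 degrees of freedom at α = 0.05.

```python
def chi_squared_gof(observed, hypothesized_proportions):
    """Chi-squared goodness-of-fit statistic, expected counts and degrees of freedom."""
    if abs(sum(hypothesized_proportions) - 1) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in hypothesized_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected, len(observed) - 1


# Invented counts for four phenotype classes and a 9:3:3:1 hypothesis.
observed = [90, 32, 28, 10]
proportions = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

stat, expected, df = chi_squared_gof(observed, proportions)
large_counts_ok = all(e >= 5 for e in expected)  # usual rule: every expected count at least 5
critical_value = 7.815  # chi-squared table, df = 3, alpha = 0.05

print(expected, round(stat, 3), df, large_counts_ok)
print("reject H0" if stat > critical_value else "fail to reject H0")
```

The `large_counts_ok` line mirrors the Day 59 bullet on assumptions: the table comparison is only trustworthy when every expected count is large enough.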

Exploring Data
Activity: Food and Agriculture Organization (AP Statistics)

Materials: Laptop

Statistics is a tool that is meant to analyze and help us understand data. To this end we will
need several sources of data. The first source that we will make use of is the Food and
Agriculture Organization of the United Nations.

Click on ‘want to register?’

Register yourself at FAO by filling in the information for the following:

• Name
• Manlius Pebble Hill for the organization
• Educational institution for the type of organization
• USA for the country
• Check the first column of boxes (and any others that interest you)
We will use data from this website today and throughout the year.

Go to http://www.fao.org/economic/ess/en/ and find the current agricultural yearbook.

Find the spreadsheets for the following:

• Total and Agricultural Population (including forestry and fisheries) (A1)
• Human Development Index and Poverty (G4)

Find the definitions for total population, agricultural population, human development index and
poverty. Find the units for each of the categories.

Copy the 2009 columns for each of the following categories into a new Excel worksheet titled
Excel Practice 1 AP Statistics. Make sure the data in each row matches its country.

• Name of country
• Total population
• Agricultural population
• Human development index
• Poverty prevalence
• Year poverty prevalence was recorded

In a new column you are going to calculate the ratio of agricultural population to total population.
Label the column as such and then in the first row (lining up with the first country) enter a
formula such as ‘=E11/D11’, which should represent the ratio of agricultural population to
total population for the first country. In this formula the agricultural population for Afghanistan was in
column E, row 11, and the total population for Afghanistan was in column D, row 11.

Then copy the formula in that cell and paste it into the rest of the cells in that column, all the way down
to the last country. The numbers should differ from row to row, each representing that country’s ratio.

Note: every Excel formula must begin with an equal sign.
Excel has a list of statistical functions that you can use; these are listed under ‘statistical
functions’ in the help search menu. You will be using several of these functions from Excel.
Some of these include the following:

• =AVERAGE(range of cells) This produces the average of all the numbers that you
highlighted.
• =MEDIAN(range of cells) This finds the median, or middle number, of the cells that you
highlighted.
• =SUM(range of cells) This adds up the numbers in the cells that you highlighted.
• =STDEV(range of cells) This produces the sample standard deviation for the cells that
you highlighted.
• =CORREL(first range of cells, second range of cells) This produces the correlation
between the variables represented in the given two ranges of cells (usually two columns
or two rows).

Do the following calculations with the data that you copied from the FAO statistical yearbook:

• Find the average agricultural population.
• Find the median agricultural population.
• Find the sum of the agricultural population and compare it to the world agricultural
population. What should be true about these numbers?
• Find the standard deviation of the agricultural population.
• Find the correlation between the column labeled Human Development Index and the
column that should represent the ratio ‘agricultural population : total population’.

Make the following scatter plots, labeling the axes and giving each scatter plot a title:

• Human Development Index versus Proportion of Population that Farms
• Poverty Prevalence versus Proportion of Population that Farms

What is the definition of the term ‘human development index’?
What is the definition of the word ‘poverty’?

What do you notice about the overall trends in each scatter plot? Does it look like there is any
relationship between the different variables that you plotted?

What do the points on the x axis mean?

What would this suggest about what a nation should do to improve its human development
index?

Would your solution in the previous question automatically reduce poverty in a given country?
Why or why not?

What is considered the typical trend with respect to the percentage of people that farm?

Why does USA’s poverty prevalence not show up in the table?
Important aspects of this activity:

• You should always be able to analyze a data set using Excel even if you don’t remember
all the formulas. The key is that you must remember what the formulas mean, when you
can use them, and what they can (and can’t) do.
• Taking a course in statistics allows you to become statistically literate, which will allow
you to be intelligently informed about the information that you see around you. You will
see statistical information almost anywhere you go and in many of the documents that
you read.
• Statistical information can often help guide decisions that you have to make.
• Statistical information can also show that your intuition is not always correct. To this
end, knowing what a statistic means can help you make life choices.
• This activity demonstrates the process of collecting, displaying, describing, analyzing
and drawing conclusions from data. This process is the main process of statistics.
• The charts that we made and the descriptions of trends that we found in the charts are
examples of descriptive statistics.
• The column marked total population is an example of the population of interest.
• The column called total agricultural population is an example of a sample population.
• The question that asked you to make a decision based on the trends that you saw in the
data is an example of inferential statistics.

(The above activity shows that students interpret statistical results in context)

(This example also makes use of graphical exploration of data)
Assignment:

Each student must make a hypothesis for the mean sunrise time in our city. Then they need to
determine how many days they would need in order to use a hypothesis test for a population mean.
Then they need to design a sampling plan for getting a random sample of days. Then after gathering
the data and calculating a sample mean they need to compare their results with one year’s worth of
sunrise times. They need to make a box plot of each to show their results visually.

(Here the box plots combine median-based statistics with mean-based analysis.)

(This assignment also shows statistical methods of exploring data)
Activity: Non-Linear Relationships and Transformation (AP Statistics)

Go to the Global Monitoring Division of the National Oceanic and Atmospheric Administration.

http://www.esrl.noaa.gov/gmd/index.html

Choose the ‘products’ tab and select search for data.

Restrict the search to ‘Carbon Dioxide’ and monthly averages. Then select the data from Ascension
Island (a UK overseas territory in the South Atlantic).

Copy the data in this file and paste it into a spreadsheet.

You will have to separate the data in the column by highlighting the data, choosing the ‘Data’
tab and selecting the ‘Text to Columns’ option. Then choose a space as the delimiter.

Once you have done this, create a scatter plot of carbon dioxide level versus month/year.

Then set an appropriately sized viewing window so that you can see the detail of each month.

Does the data look linear?

What function do you think might help straighten this data set?

When you include the regression line in the scatter plot what sorts of curviness do you notice? Describe
two ways that your scatter plot is curvy.

Even with the curviness, would you still feel that you could predict the carbon levels at Ascension
Island?

What would be a good rule for predicting the carbon levels?

Create a column called ‘predicted’ using the rule =337.42+(8/60)*I+cos(π*(I-2)/6)+2*cos(π*I/200), where I stands for that row’s time (month) value. In Excel, write cos as COS and π as PI().

Create a scatter plot of carbon level versus predicted.

Find the correlation between ‘carbon level’ and ‘predicted’.
(This assignment shows how students must interpret data in context, and it shows graphical exploration
of data as well as numerical approximation of data.)

Sampling and Experimentation
The following is from the syllabus:

Sampling Methods and Bias
     Description:
     • Why sample?
     • Sample sizes
     • Selection bias
     • Measurement or response bias
     • Non-response bias
     • Conceptual bias
     • Simple random samples
     • Stratified random sampling
     • Cluster sampling
     • Systematic sampling
     • Why not convenience sampling?
     • Why not volunteer sampling?
     • How important sampling biases are for researchers when designing experiments.
     Activity: Designing a survey to determine how many hours students in the upper school at our school spend on homework, along with a discussion about how to do the actual sampling.
     Assignment: Exploring sampling methods in the context of farming (plant wilt) and how to get a good sample given some uncontrolled variables.
Assignment: Sampling (AP Statistics)

An experimenter requires prior knowledge of a subject before they can conduct a test of any significance.
Each experiment is designed to measure some quantity. When sampling a population to make the
desired measurement, the experimenter needs to know what variables affect the quantity that they
want to measure.

Consider the following passage from the 1957 Yearbook of Agriculture on soils (p. 44):

“Water is the medium that disperses the protoplasm in the cell. It is a medium by which physical
force is effected on the cell wall to bring about expansion and growth.

Only a small part of the water taken up by roots from the soil is retained in the cells of the plants.
Most of the water that is absorbed is conducted to the leaves, where it is lost by evaporation or
transpiration. Since the evaporation of 1 gram of water requires 539 calories, the high rate of water loss
that takes place from leaves on hot summer days acts as an evaporative cooler. One mature tomato
plant in a warm arid climate will transpire a gallon of water in a day. As much as 700 tons of water may
be needed to produce 1 ton of alfalfa hay. The water that is transpired by a cornfield in Iowa in a
growing season is enough to cover the field to a depth of 13 to 15 inches.

The loss of water from plants is controlled by incident light energy, relative humidity,
temperature, wind, opening of pores (called stomata) in leaves, and supply of water in soil.

Incident light energy is the most important factor because the evaporation of water requires a
source of energy. Relative humidity is also important because evaporation takes place much more
rapidly in a dry atmosphere than in a humid one. The other factors I mentioned are of a relatively minor
consequence.

If water loss by transpiration exceeds water intake by the roots, a water deficit develops in the
plant, expansion of growing cells ceases, and the plant stops growing. If the water deficit continues the
plant wilts. If it becomes too severe, the plant tissues wither and die. By what means can plant cells
absorb and retain water when the atmosphere is evaporating it from the leaves and the soil is impeding
its entry into the roots? An illustration:

When salt is applied to shredded cabbage, the tissue fluids diffuse out of the leaf slices and
dissolve the salt making a brine. The cabbage leaves become limp, or flaccid. If the limp leaves are
washed free of brine and placed in pure water, they again become stiff or turgid. This exemplifies one of
the most fundamental characteristics of the water relationships of plants. It is the diffusion of water
through a semipermeable membrane more commonly called osmosis. When two solutions differing in
concentration are separated by a membrane impermeable to the dissolved substance, water moves
from the solution of lower concentration to the one of higher concentration.”

Suppose you wanted to measure the average number of plants that showed wilt during a day under the
then choose 1 out of every k plants by rows starting at 4 p.m. until you reach 20% of the plants. You will
do this each day for a week in order to obtain a relatively random sample of plants from the fields. From
this sample you would count the number of plants that showed any wilting, find the average for each
day, find the percentage of plants that showed wilt, and then generalize the result to the whole field.
Then after looking for any trends you would make a suggestion to either keep the current farming
method or modify the farming method.

Situation # 1: A significant portion of the field is shaded by larger trees in such a way that the shade would
influence the incident light hitting 40% of the plants at 4 p.m.

Situation # 2: The farmer forgot to put out the watering system two of the nights during the week.

Situation # 3: You happened to choose a week where the weather was hitting record highs. (Would the
extra wilting that you probably saw be cause to change the farming system?)

Situation # 4: You happened to choose a week of record high winds so that the water transpired by the
plants did not stay under the plants, causing a higher percentage of plants to wilt.

Situation # 5: The field contains two similar looking plants that have different preferred growing
temperatures. One of the plants wilts more easily under the normal temperatures for the week that you
chose to take the sample.

Situation # 6: The farmer has managed to water the plants in such a way that they are not wilting, but
they have also stopped growing.

For two of the above situations do the following:

   Describe the problem with the sampling method.
   Make sure to include why the situation would skew the results from a 1 in k systematic sampling
method.
   Classify each type of bias that shows up.
   Also make sure to include which sampling method could correct the bias and how.

   Describe the difference between a sampling bias and a cause of wilting.
   If you found a significant portion of plants that wilted (say over 10% of the sample) what might
be some causes for the wilting?
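Stratified random sampling is one of the methods listed in the syllabus. It offers one possible way to handle a known source of variation, such as the shaded plants in Situation # 1. Below is a minimal sketch that assumes each plant can be labeled with its stratum, for example ‘shaded’ or ‘sunny’. The function name and the labels are hypothetical. This is one option students might consider, not the answer to the assignment.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, rng=None):
    """Take a simple random sample of about `fraction` of each stratum separately.

    units      -- the sampling units (e.g. plant IDs)
    stratum_of -- function giving a unit's stratum label (e.g. 'shaded' or 'sunny')
    """
    rng = rng or random.Random()
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        n = round(fraction * len(members))      # proportional allocation
        sample.extend(rng.sample(members, n))   # simple random sample within the stratum
    return sample
```

For example, suppose a field has 100 plants and plants 0 to 39 are shaded. Then `stratified_sample(range(100), lambda p: 'shaded' if p < 40 else 'sunny', 0.20)` always returns 8 shaded and 12 sunny plants, so the shade cannot be over- or under-represented in the sample.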

(These examples show how we explore sampling in an actual experiment. Students have to be able to
decide which sampling methods work best for a given farming situation.)

AP Problem: Blocking (AP Statistics)
(This is an AP problem that we go over to explore blocking and random assignment.)

Assignment:
Each student must clearly state a null hypothesis for the difference between the mean electrical
usage in their house when everyone is awake and the mean electrical usage in their house after
everyone has gone to bed. They need to create a sampling plan to get a random sample during those times.
Then, after gathering their data and making their calculations, they need to determine whether the data
shows any significant difference from their prediction. They should speculate about possible causes given their
conclusion.
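Once the awake and asleep readings are collected, the two-sample t statistic compares the two means. Below is a minimal Python sketch that assumes the two sets of readings are independent random samples. It uses the unpooled (Welch) standard error and degrees of freedom, the approach most AP Statistics texts use for two means. The function name is hypothetical, and `d0` is the difference stated in the student's null hypothesis.

```python
import math
from statistics import mean, variance


def two_sample_t(sample1, sample2, d0=0.0):
    """t statistic and Welch degrees of freedom for H0: mu1 - mu2 = d0."""
    n1, n2 = len(sample1), len(sample2)
    se1_sq = variance(sample1) / n1   # squared standard error of mean 1
    se2_sq = variance(sample2) / n2   # squared standard error of mean 2
    t = (mean(sample1) - mean(sample2) - d0) / math.sqrt(se1_sq + se2_sq)
    df = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t, df
```

Students would then compare t with a critical value from a t table for that many degrees of freedom. Alternatively, they can read the p-value from the graphing calculator's two-sample t test.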

(This is an example of how students get involved in designing experiments on their own. Here they have
to create their own sampling plan and decide how they will measure electrical usage at the different
times.)

Assignment:

Each student must clearly state a null hypothesis for the difference between the mean temperature of one
floor of his or her family’s house and the mean temperature of another floor of the same
house. The temperature readings should happen at the same time. The students should then
comment on how well paired the data sets are. They should also speculate as to any causes given
their conclusion.
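The two floors are read at the same times, so the readings form pairs, and a paired t procedure works with the differences. Below is a minimal Python sketch that assumes the readings are stored in two equal-length lists in time order. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(floor_a, floor_b, d0=0.0):
    """Paired t statistic and degrees of freedom for H0: mean of (a - b) = d0.

    floor_a[i] and floor_b[i] must be readings taken at the same time.
    """
    if len(floor_a) != len(floor_b):
        raise ValueError("paired data must have equal lengths")
    diffs = [a - b for a, b in zip(floor_a, floor_b)]   # one difference per time
    n = len(diffs)
    t = (mean(diffs) - d0) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```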

(This is another example of how students get involved in designing experiments on their own.)
Anticipating Patterns

Handout: Probability – Things to do in the face of a problem in probability (AP Statistics)

The first thing to look for when doing a problem in probability is to decide which definition of probability
the problem requires. The classical definition of probability often follows theoretical predictions. The
relative frequency definition of probability often follows strings of events from which an experimenter
records the frequency of successes.

Once you know which definition to use, the next big question to always ask is:

‘What is the sample space?’

Right after answering this question, write out all the outcomes in the sample space whenever time
permits. Then determine the size of the sample space.

Sample spaces for the classical definition of probability look like a finite set listing all the potential
outcomes based on the situation. For rolling two dice the sample space is the following set of 36 outcomes: { (1,1); (1,2); (1,3);
(1,4); (1,5); (1,6); (2,1); (2,2); (2,3); (2,4); (2,5); (2,6); (3,1); (3,2); (3,3); (3,4); (3,5); (3,6); (4,1); (4,2); (4,3);
(4,4); (4,5); (4,6); (5,1); (5,2); (5,3); (5,4); (5,5); (5,6); (6,1); (6,2); (6,3); (6,4); (6,5); (6,6) }. But the size of
the sample space for the classical definition of probability DOES NOT CHANGE.

Sample spaces for the relative frequency definition of probability look like strings of experiment results.
In rolling two dice the sample space might look like { (1,4); (1,6); (5,2); (6,1) }, which has only four rolls, or
it might have a string of 200 rolls. But with the relative frequency definition of probability the size of the
sample space CAN CHANGE.
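A short simulation shows a relative-frequency sample space growing as more rolls are recorded. Below is a minimal Python sketch. The seed, the event ‘sum is 7’ and the function names are illustrative assumptions.

```python
import random


def roll_string(n_rolls, rng):
    """Record n_rolls rolls of two dice; this list is the relative-frequency sample space."""
    return [(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(n_rolls)]


def relative_frequency(rolls, event):
    """Proportion of the recorded rolls in which `event` happened."""
    return sum(1 for roll in rolls if event(roll)) / len(rolls)


def sum_is_7(roll):
    return roll[0] + roll[1] == 7


rng = random.Random(2024)
for n in (4, 200, 20_000):   # the sample space grows as more rolls are recorded
    print(n, relative_frequency(roll_string(n, rng), sum_is_7))
```

The printed proportions change from run to run of the experiment. For long strings, they tend to settle near the classical value of 1/6.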

Once you have done this, describe as clearly as possible, in terms of the outcomes, which event the
problem focuses on.

The last goal is to determine the size of the event. To do this look at the outcomes in the sample space
and circle all the outcomes that belong to the event E for the problem.

Once you have done this, you can find the quotient

P(E) = \frac{\text{size of event space } E}{\text{size of sample space } S}

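The classical recipe (list the sample space, circle the outcomes in E, divide) can be carried out directly in Python. Below is a minimal sketch using exact fractions. The example event ‘the sum is 7’ is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# Classical sample space for two dice: all 36 ordered pairs (first die, second die).
S = list(product(range(1, 7), repeat=2))


def classical_probability(event, sample_space=S):
    """P(E) = (size of event E) / (size of sample space S), as an exact fraction."""
    E = [outcome for outcome in sample_space if event(outcome)]   # 'circle' the outcomes in E
    return Fraction(len(E), len(sample_space))


print(len(S))                                              # 36
print(classical_probability(lambda o: o[0] + o[1] == 7))   # 1/6
```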
IMPORTANT:

Make sure to remember that the classical definition of probability and a previous relative frequency
measurement act like the prediction or theory, and that a new relative frequency measurement is like
the experiment that tests the theory. If the theory (either from the classical definition or a previous
relative frequency measurement) is a good one then the results from the new relative frequency string
should agree with the predictions.

This process is one of the major activities of science, and it belongs to any discipline that makes
measurements. That is how important statistical analysis is in our society.

Do NOT try to guess the probabilities in a given problem. Instead ALWAYS use the formulas to calculate
a probability.

For disjoint events OR means ADD the probabilities

For independent events AND means MULTIPLY the probabilities
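Both rules can be checked by counting outcomes in the two-dice sample space. Below is a minimal Python sketch. The particular events are illustrative choices.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two dice, 36 equally likely outcomes


def P(event):
    return Fraction(sum(1 for o in S if event(o)), len(S))


# Disjoint events: 'sum is 2' and 'sum is 12' cannot both happen, so OR means ADD.
sum2 = lambda o: o[0] + o[1] == 2
sum12 = lambda o: o[0] + o[1] == 12
assert P(lambda o: sum2(o) or sum12(o)) == P(sum2) + P(sum12)                # 1/18

# Independent events: the first die does not affect the second, so AND means MULTIPLY.
first_even = lambda o: o[0] % 2 == 0
second_six = lambda o: o[1] == 6
assert P(lambda o: first_even(o) and second_six(o)) == P(first_even) * P(second_six)  # 1/12
```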

One clue that can help: if the problem uses the words ‘find the probability of E given that…’, you
should use the formula for conditional probability.

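P(E \mid F) = \frac{P(E \cap F)}{P(F)}

Notice that the formula for conditional probability still looks like \frac{\text{size of event space } E}{\text{size of sample space } S}, with the only
difference that the sample space is now F instead of S (and the event is now the part of E that lies inside F, namely E \cap F).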
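The ‘reduced sample space’ view and the formula give the same answer. Below is a minimal Python sketch using exact fractions. The events ‘doubles’ and ‘sum is 8’ are illustrative choices.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two dice, 36 equally likely outcomes


def P(event, space=S):
    return Fraction(sum(1 for o in space if event(o)), len(space))


doubles = lambda o: o[0] == o[1]
sum_is_8 = lambda o: o[0] + o[1] == 8

# Formula: P(E | F) = P(E and F) / P(F)
by_formula = P(lambda o: doubles(o) and sum_is_8(o)) / P(sum_is_8)

# Reduced sample space: keep only the outcomes in F, then count as usual
F = [o for o in S if sum_is_8(o)]
by_counting = P(doubles, space=F)

print(by_formula, by_counting)   # 1/5 1/5
```

Note that P(doubles) alone is 1/6. Knowing that the sum is 8 changes the probability, because the sample space has shrunk to the five outcomes in F.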

Another clue is whether you can divide the sample space into a collection of disjoint sets
whose union is the whole sample space. Often a problem will have options that divide the sample space
into clear disjoint sets. This often indicates that you should use either the total probability rule or Bayes’
Theorem.
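The total probability rule and Bayes’ Theorem can be checked with exact fractions. Below is a minimal Python sketch. The setup is a hypothetical example, not from the source: one of two coins is chosen with probability 1/2 each, one coin fair and one landing heads with probability 4/5.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """P(A) = sum of P(B_i) * P(A | B_i) over a partition B_1, ..., B_k of the sample space."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def bayes(priors, likelihoods, i):
    """P(B_i | A) = P(B_i) * P(A | B_i) / P(A)."""
    return priors[i] * likelihoods[i] / total_probability(priors, likelihoods)


# Hypothetical example: the two coins partition the sample space.
priors = [Fraction(1, 2), Fraction(1, 2)]         # P(fair), P(biased)
heads_given = [Fraction(1, 2), Fraction(4, 5)]    # P(heads | fair), P(heads | biased)
print(total_probability(priors, heads_given))     # 13/20
print(bayes(priors, heads_given, 1))              # 8/13
```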

The law of large numbers states that the relative frequency of an event in a long string of independent trials will
get close to the actual probability of that event. It does not mean that the actual frequencies will ‘level
out’, however. When flipping a fair coin, the percentage of heads gets closer to 50% as
you increase the number of flips, but the actual number of heads minus the actual number of tails can
grow to be quite large (53,000 heads and 49,000 tails yields a percentage very close to 52% heads, but
the difference between the number of heads and tails is 4,000).
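A quick simulation of fair coin flips illustrates the point. The proportion of heads settles near 50%, while the difference between heads and tails does not have to shrink. Below is a minimal Python sketch. The seed and the numbers of flips are arbitrary choices.

```python
import random


def flip_summary(n_flips, rng):
    """Flip a fair coin n_flips times; return (proportion of heads, heads minus tails)."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips, heads - (n_flips - heads)


rng = random.Random(7)
for n in (100, 10_000, 1_000_000):
    prop, diff = flip_summary(n, rng)
    print(f"{n:>9} flips: proportion of heads {prop:.4f}, heads - tails = {diff}")
```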

When estimating probabilities empirically…

It is fairly common practice to use observed long – run proportions to estimate probabilities. The
process of estimating probabilities is simple:

   Observe a very large number of chance outcomes under controlled circumstances.
   Estimate the probability of an event by using the observed proportion of occurrence and by
appealing to the interpretation of probability as a long run relative frequency and the law of
large numbers.
   Two-way tables can help keep track of the information concisely.
   Keep in mind the concept of independence and conditional probability when looking at the
results.

You have to be careful with statements that use the ‘law of averages’, which is different from the law of
large numbers.

A common but incorrect version of the law of averages: For every occurrence in favor of an event E there must be an occurrence that is not in favor of event E.

Law of Averages (Okay version)

Eventually even unlikely events are bound to happen.

Independence and the Law of Averages

Notice that the law of averages still cannot say the following: ‘If you have flipped 10 tails in a row, then it
is more likely that the next one will be heads.’

Independence of flips guarantees that each flip of a fair coin has a 50% probability of showing heads, no matter what the previous flips showed.
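Independence can be checked by listing every equally likely sequence of 11 fair-coin flips (2^11 = 2,048 sequences). Below is a minimal Python sketch. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 2**11 equally likely sequences of 11 fair-coin flips.
seqs = list(product("HT", repeat=11))

ten_tails_first = [s for s in seqs if s[:10] == ("T",) * 10]
then_heads = [s for s in ten_tails_first if s[10] == "H"]

# P(11th flip is heads | first 10 flips were tails) is still 1/2.
print(Fraction(len(then_heads), len(ten_tails_first)))   # 1/2

# Any one particular string of 10 flips has the same probability, (1/2)**10.
print(Fraction(1, 2) ** 10)   # 1/1024
```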

What is unlikely is the particular string of 10 tosses that you specifically got (10 tails is just as unlikely as

(This is an example of a handout I give my students on probability. It includes the basic rules of
probability.)
AP Problem: Variability in Inferential Statistics (AP Statistics)

Example 1.2 from Statistics and Data Analysis Second Edition (p 7).

[Histogram: Contaminant Concentration (in parts per million in well water). Horizontal axis: average contamination (the average of five measurements, in parts per million), ticks from 10 to 19. Vertical axis: frequency (averages taken over 200 days), from 0 to 45.]

As part of its regular water quality monitoring efforts, an environmental control board selects five
water specimens from a particular well each day. The concentration of contaminants in parts per
million (ppm) is measured for each of the five specimens, and then the average of the five
measurements is calculated. The histogram above summarizes the average contamination
values for 200 days.

Now suppose that a chemical spill has occurred at a manufacturing plant about 1 mile from the
well. It is not known whether a spill of this nature would contaminate groundwater in the area of
the spill and, if so, whether a spill this distance from the well would affect the quality of well
water. One month after the spill, five water specimens are collected from the well. Which of the
following average measurements would be convincing evidence that the well water
was affected by the spill?

(a) 10        (b) 12   (c) 16     (d) 18        (e) 20
Type of Problem – Bar Charts and Inferential Statistics

Focus 1 – What is a ‘normal’ contaminant level for the well water?

Before the spill, the average contaminant concentration varied from day to day. An average of
16 ppm would not have been an unusual value, so seeing an average of 16 ppm after the
spill isn’t necessarily an indication that contamination has increased. On the other hand, an
average as large as 18 ppm is less common, and an average of 20 ppm is not at all typical of
the pre-spill values. Therefore, 20 ppm makes sense as an answer.

(This is an AP problem that we cover on the first day of school that includes variability.)
Topic: Normal Distributions
Description: The general shape of the normal distribution.
   How to calculate areas using a table.
   How the z-score relates to calculating areas.
   How probability notation works with normal distributions.
   Upper-tailed, lower-tailed and two-tailed calculations.
   Symmetry of the normal distribution.
Activity: In this activity we try to measure the length of our classroom. From the data that we collect, we calculate the mean and standard deviation. We also calculate a probability distribution function from the data.
Assignment: Each student will have to go home and sample the electric meter readings during some time frame. They will have to think about how they will sample the meter. Then, after they have gathered the data, they need to calculate the mean and standard deviation. They also need to think about whether the data that they have looks normal. Assuming that the data is normal, they will also have to make some guesses as to how many measurements they think will fall in a certain range of values.
Textbook HW: Sec 7.6 # 7.67, 7.69, 7.71, 7.73

(This is on the syllabus.)
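The table lookups described above can be checked in Python. Below is a minimal sketch that assumes the measurements are approximately normal with mean `mu` and standard deviation `sigma`. It computes areas from z-scores with `math.erf` instead of a printed z table; the standard library's `statistics.NormalDist(mu, sigma).cdf` gives the same values. The function names are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Lower-tail area P(X <= x) for a normal distribution, via the z-score."""
    z = (x - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def upper_tail(x, mu, sigma):
    """Upper-tail area P(X >= x)."""
    return 1 - normal_cdf(x, mu, sigma)


def two_tailed(x, mu, sigma):
    """Two-tailed area P(|X - mu| >= |x - mu|), using the symmetry of the normal curve."""
    return 2 * upper_tail(mu + abs(x - mu), mu, sigma)


def expected_count_between(a, b, mu, sigma, n):
    """Expected number of n measurements falling between a and b."""
    return n * (normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma))
```

For example, `expected_count_between(-1, 1, 0, 1, 100)` is about 68.27, matching the familiar rule that about 68% of normal values lie within one standard deviation of the mean.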
Topic: The Sampling Distribution of the Sample Mean
Description: Comparing the distribution of single measurements of a random variable X with the distribution of the averages from samples of size N.
   How the distribution of the averages from several samples of size N can have a different shape than the original distribution.
   How the distribution of the averages from different samples of size N becomes more ‘Gaussian’ as the size of the sample increases.
   How the mean of the distribution of the averages from samples of size N gets closer to the actual population mean from which the samples come.
   How the standard deviation of the averages from samples of size N gets smaller as N increases.
   The Central Limit Theorem.
Activity: In ‘Cents and the Central Limit Theorem’ we explore how the Central Limit Theorem works. The distribution that we take samples from is the distribution of the dates of 100 pennies.
Assignment: Each student will then look up the number of coins minted in different years to help explain why the distribution of dates from pennies that we saw in class looked the way it did.
Textbook HW: Sec 8.2 # 8.17, 8.19, 8.21, 8.23

(This is on the syllabus.)
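The syllabus points above about the center and spread of the averages can be checked exactly for a small population. Below is a minimal Python sketch. It uses the faces of a fair die as a stand-in population, which is an assumption; the class activity uses penny dates. It lists every sample of size N drawn with replacement. The check confirms that the averages have the same mean as the population and a variance equal to the population variance divided by N, so their standard deviation shrinks like σ/√N.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sampling_distribution_of_mean(population, n):
    """Exact distribution of the sample mean over all equally likely samples of
    size n drawn with replacement from `population`."""
    means = Counter(Fraction(sum(s), n) for s in product(population, repeat=n))
    total = len(population) ** n
    return {m: Fraction(c, total) for m, c in sorted(means.items())}


def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var


die = [1, 2, 3, 4, 5, 6]
mu, var = mean_and_variance({x: Fraction(1, 6) for x in die})   # 7/2 and 35/12
mu2, var2 = mean_and_variance(sampling_distribution_of_mean(die, 2))
print(mu2 == mu, var2 == var / 2)                               # True True
```

The single rolls are flat (uniform). The averages of two rolls already pile up in the middle (triangular), which is a first step toward the ‘Gaussian’ shape.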
(This is an AP problem that we go over that explores combining independent random variables.)
Statistical Inference
The syllabus includes detailed coverage of chapters on confidence intervals for a proportion,
the difference between two proportions, the mean, the difference between two means, and
the slope of the regression line. The syllabus also covers hypothesis testing and chi-squared
tests: goodness of fit and tests for homogeneity/independence. See chapters 5 and 9–12.
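As one example from these inference topics, here is a large-sample confidence interval for a population proportion, sketched in Python. It assumes a random sample and the usual large-sample conditions (at least 10 successes and 10 failures). `statistics.NormalDist().inv_cdf` supplies the critical value z*. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Large-sample z interval: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    if successes < 10 or n - successes < 10:
        raise ValueError("large-sample conditions not met")
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin
```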
The course draws connections between all aspects of the
statistical process including design, analysis, and
conclusion.
Projects:

Each student must write an article for the school newspaper. Before they can submit an article they
must design a survey or experiment, create a sampling plan, gather the data, analyze the data and come
to a conclusion given the data set. Then they have to write an article summarizing what they found
along with at least one graphical aid for any reader of that article.
The course teaches students how to communicate
methods, results and interpretations using the
vocabulary of statistics.

Assignment: Correlation (AP Statistics)

Look up “linear relationships in science” in Google Scholar and find a pair of quantities that
have a linear relationship. Make sure that you can identify the raw data that the scholars used to
demonstrate the linear relationship between the variables.

Read the article and summarize it, including the following information:

   Describe which quantities have a linear relationship
   Include a scatter plot demonstrating the linear relationship
   Calculate the correlation for the raw data the scholars used to demonstrate the linear
relationship.
   Describe how your calculation corresponds to the results in the paper that you read.

(This is an example of an assignment in which each student must look up an existing study that
demonstrates a particular statistical relationship between variables. Students have to
summarize the methodology and statistical analysis, and they have to interpret and explain
the relationship using statistical vocabulary [here, correlation].)
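For the correlation step, it can help to compute r directly from its definition and compare it with a calculator or Excel's CORREL. Below is a minimal Python sketch using r = (sum of z_x * z_y) / (n - 1) with sample standard deviations. The function name is hypothetical.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Correlation r = sum(z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 values each")
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)
```

For the small made-up data set x = [1, 2, 3, 4, 5] and y = [2, 1, 4, 3, 5], this gives r = 0.8.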

The course teaches students how to use graphing
calculators to enhance the development of statistical
understanding through exploring data, assessing
models, and/or analyzing data.

We will cover how to use the graphing calculators each time we encounter a feature that the graphing
calculator can accommodate. These features include the following: calculating the mean, calculating the
standard deviation, calculating the median, creating scatter plots, creating box plots, linear regression,
nonlinear regression, one-sample t and z tests, two-sample t and z tests, z confidence intervals, t
confidence intervals, chi-squared tests for goodness of fit, and chi-squared tests for homogeneity and
independence.
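As a check on the calculator's goodness-of-fit test, the chi-squared statistic itself is short to compute. Below is a minimal Python sketch that assumes the observed counts and the hypothesized proportions are given in the same category order. The function name is hypothetical, and the p-value would still come from a chi-squared table or the calculator.

```python
def chi_square_gof(observed, proportions):
    """Chi-squared goodness-of-fit statistic sum((O - E)^2 / E) and degrees of freedom."""
    if abs(sum(proportions) - 1) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in proportions]            # expected counts under H0
    if min(expected) < 5:
        raise ValueError("some expected counts are below 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1
```

Consider 60 rolls of a die with the hypothetical observed counts 8, 9, 12, 11, 10, 10 and a proportion of 1/6 for each face. Every expected count is 10, so the statistic is (4 + 1 + 4 + 1 + 0 + 0) / 10 = 1.0, with 5 degrees of freedom.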
The course teaches students how to use graphing
calculators, tables, or computer software to enhance
the development of statistical understanding through
performing simulations.
We use simulations to make the central limit theorem clearer and to show that it works
regardless of the shape of the starting distribution.
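Below is a minimal Python simulation of this idea. It assumes a strongly right-skewed starting population of exponential values, an illustrative choice; the class activity uses penny dates. As N grows, the averages center on the population mean and their standard deviation tracks σ/√N. Their skewness also falls toward 0, the value for a symmetric, normal-looking distribution.

```python
import random
from statistics import mean, pstdev, stdev


def sample_means(population, n, reps, rng):
    """Averages of `reps` random samples of size n, drawn with replacement."""
    return [sum(rng.choices(population, k=n)) / n for _ in range(reps)]


def skewness(values):
    """Average cubed z-score: 0 for a symmetric distribution, positive for right skew."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


rng = random.Random(11)
population = [rng.expovariate(1.0) for _ in range(10_000)]   # strongly right-skewed
sigma = pstdev(population)
for n in (1, 4, 25):
    means = sample_means(population, n, 2_000, rng)
    print(f"N={n:>2}  mean of averages={mean(means):.3f}  "
          f"SD of averages={stdev(means):.3f}  sigma/sqrt(N)={sigma / n ** 0.5:.3f}  "
          f"skewness={skewness(means):.2f}")
```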
The course demonstrates the use of computers
and/or computer output to enhance the
development of statistical understanding
through exploring data, analyzing data, and/or
assessing models.
Activity: Food and Agriculture Organization (AP Statistics)

Materials: Laptop

Statistics is a tool that is meant to analyze and help us understand data. To this end we will
need several sources of data. The first source that we will make use of is the Food and
Agriculture Organization of the United Nations.

Click on ‘want to register?’

Register yourself at FAO by filling in the information for the following:

   Name
   Manlius Pebble Hill for the organization
   Educational institution for the type of organization
   USA for the country
   Check the first column of boxes (and any others that interest you)

We will use data from this website today and throughout the year.

Go to http://www.fao.org/economic/ess/en/ and find the current agricultural yearbook.
Find the spreadsheets for the following:

   Total and Agricultural Population (including forestry and fisheries) (A1)
   Human Development Index and Poverty (G4)

Find the definitions for total population, agricultural population, human development index and
poverty. Find the units for each of the categories.

Copy the columns for 2009 in each of the following categories into a new Excel worksheet titled
‘Excel Practice 1 AP Statistics’. Make sure the countries match the data in each row.

   Name of Country
   total population
   agricultural population
   human development index
   Poverty Prevalence
   Year Poverty Prevalence was recorded

In a new column you are going to calculate the ratio of agricultural population to total population.
Label the column as such, and then in the first row (lining up with the first country) place a
formula that looks like ‘=E11/D11’, which should represent the ratio of agricultural population to
total population of the first country. In this formula, the agricultural population for Afghanistan was in
column E, row 11, and the total population for Afghanistan was in column D, row 11.

Then copy the formula in that cell and paste it into the rest of the cells in that column, all the way down
to the last country. The numbers should all be different and represent each country’s ratio.

Note: all formulas in a cell for Excel should be preceded by an equal sign.

Excel has a list of statistical functions that you can use; these are listed under ‘statistical
functions’ in the help search menu. You will be using several of these functions from Excel.
Some of these include the following:
   =AVERAGE( range of cells ) This produces the average of all the numbers that you
highlighted.
   =MEDIAN( range of cells ) This finds the median or middle number of the cells that you
highlighted.
   =SUM( range of cells ) This adds up the numbers in the cells that you highlighted.
   =STDEV( range of cells ) This produces the sample standard deviation for the cells that
you highlighted.
   =CORREL( first range of cells, second range of cells ) This produces the correlation
between the variables represented in the given two ranges of cells [usually two columns
or two rows].

Do the following calculations with the data that you copied from the FAO statistical yearbook:

   Find the average agricultural population
   Find the median agricultural population
   Find the sum of the agricultural population and compare it to the world agricultural
population. What should be true about these numbers?
   Find the standard deviation of the agricultural population.
   Find the correlation between the column labeled Human Development Index and the
column that should represent the ratio ‘agricultural population : total population’

Make the following scatter plots and label the axes and each scatter plot:

   Human Development Index versus Proportion of Population that Farms
   Poverty Prevalence versus Proportion of Population that Farms

What is the definition of the term ‘human development index’?

What is the definition of the word ‘poverty’?
What do you notice about the overall trends in each scatter plot? Does it look like there is any
relationship between the different variables that you plotted?

What do the points on the x axis mean?

What would this suggest about what a nation should do to improve its human development
index?

Would your solution in the previous question automatically reduce poverty in a given country?
Why or why not?

What is considered the typical trend with respect to the percentage of people that farm?

Why does the USA’s poverty prevalence not show up in the table?

```