
Ch. 9 Sampling Distribution

Recall the big picture of
statistics—we have a question
about a group that can be
answered with a number…but
the group is too large to
measure entirely; so we
measure a small part instead.
The answer to our question—the
number that we can't directly measure,
since the group is too large—is called a
parameter. A parameter measures
some feature of a population (the
large group). A parameter has one (and
only one) value—we just don't know
what it is. The most important
parameters for us are the population
mean (μ) and population proportion (p
or π).
The measurement that we took from the
small part is called a statistic. A statistic
measures some feature of a sample (the
small part of the population). Since there are
many, many possible samples that could
have been chosen, there are many, many
different possible values of a statistic. The
most important statistics for us are the
sample mean (x-bar) and sample proportion
(p-hat).
                   Compare
• parameter                     • statistic
  – mean: μ                        – mean: x-bar
  – standard deviation: σ          – standard deviation: s
  – proportion: p                  – proportion: p-hat
• Sometimes we call             • Sometimes we call
  the parameters “true”;          the statistics
  true mean, true                 “sample”; sample
  proportion, etc.                mean, sample
                                  proportion, etc.
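
To make the contrast concrete, here is a minimal Python sketch. The population is simulated (a hypothetical set of 10,000 values), which is the only reason its parameter can be computed at all; the names `population`, `mu`, and `sample_means` are illustrative assumptions and not part of the course material.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 measurements (an assumption for illustration).
population = [random.gauss(50, 8) for _ in range(10_000)]

# The parameter has one value. We can compute it here only because the
# population is simulated; in practice it is unknown.
mu = statistics.fmean(population)

# Each random sample of 25 produces its own value of the statistic x-bar.
sample_means = [
    statistics.fmean(random.sample(population, 25)) for _ in range(5)
]

print(f"parameter mu = {mu:.2f}")
print("x-bar from five samples:", [round(m, 2) for m in sample_means])
```

The five x-bar values differ from one another, while mu stays fixed.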


A phone-in poll conducted by a
newspaper reported that 73% of
those who called in liked business
tycoon Donald Trump. The number
73% is a

A) Statistic
B) Sample
C) Parameter
D) Population
A phone-in poll conducted by a
newspaper reported that 73% of those
who called in liked business tycoon
Donald Trump. The unknown true
percentage of American citizens that
like Donald Trump is a

A) Statistic
B) Sample
C) Parameter
D) Population
A Statistic as a Random Variable
Since a statistic can take on many different
values, it is a variable. Since the statistic is
measured from a random sample, a statistic
is a random variable. As with all variables,
we are interested in the distribution of the
variable (in this case, a statistic). The
distribution of a statistic is called a Sampling
Distribution (for that statistic). Thus, we will
be concerned with the Sampling Distribution
of the Sample Mean, and the Sampling
Distribution of the Sample Proportion.
The sampling distribution of a statistic is

A) the probability that we obtain the statistic
in repeated random samples.
B) the mechanism that determines whether
randomization was effective.
C) the distribution of values taken by a
statistic in all possible samples of the same
size from the same population.
D) the extent to which the sample results
differ systematically from the truth.
I flip a coin 10 times and record the
proportion of heads I obtain. I then repeat
this process of flipping the coin 10 times and
recording the proportion of heads obtained
many, many times. When done, I make a
histogram of my results. This histogram
represents

A) the bias, if any, that is present.
B) the true population parameter.
C) simple random sampling.
D) the sampling distribution of the proportion
of heads in 10 flips of the coin.
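
The repeated coin-flipping procedure described above can be simulated. This is a rough sketch, assuming a fair coin and an arbitrary choice of 5,000 repetitions; the helper name `proportion_heads` is made up for illustration.

```python
import random
from collections import Counter

random.seed(2)  # fixed seed so the sketch is repeatable

def proportion_heads(flips=10):
    """Flip a fair coin `flips` times and return the proportion of heads."""
    return sum(random.random() < 0.5 for _ in range(flips)) / flips

# Repeat the 10-flip experiment many times (5,000 is an arbitrary choice).
results = [proportion_heads() for _ in range(5_000)]

# Text histogram of the simulated values of the proportion of heads.
counts = Counter(results)
for value in sorted(counts):
    print(f"{value:.1f} {'#' * (counts[value] // 25)}")
```

Each row of the printout is one possible value of the proportion, and the bar length shows how often that value occurred.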
Unbiased Statistics
Perhaps you are wondering why we care about the
distribution of the statistic, when what we really
want is the value of the parameter—a good
question! It turns out that statistics have very
special (and useful) relationships with the
parameters that they are estimating—provided
some conditions are met. The most important
condition is that the statistic is unbiased—the
mean of the sampling distribution is the same
as the parameter that the statistic is
estimating. So, for example, if x-bar is unbiased,
Mu-xbar = Mu-x. It turns out that we can use our
statistic (say, x-bar) to make an estimate of the
center.
Center

If x-bar is unbiased, then
Mu-xbar = Mu-x. For now, let's just
say that a random sample is
your ticket to an unbiased
statistic. It's actually more
complicated than that, but let's
just leave it there for now.
Spread

$\sigma_{\bar{x}} = \sigma_X / \sqrt{n}$. Actually, this is only
(approximately) true if the size of
the sample is quite small (less than
10%) compared to the population.
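
The center and spread claims can be checked by simulation. The sketch below assumes a hypothetical normal population (mean 100, standard deviation 15) and samples of size 9; these numbers are illustrative and not from the slides. Because each observation is an independent draw, the population is effectively infinite and the 10% condition is not a concern here.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

MU, SIGMA, N = 100.0, 15.0, 9   # hypothetical population values and sample size

def sample_mean(n):
    """Mean of n independent draws from a normal(MU, SIGMA) population."""
    return statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))

# Simulated sampling distribution of x-bar (20,000 samples is arbitrary).
means = [sample_mean(N) for _ in range(20_000)]

center = statistics.fmean(means)   # should be close to MU
spread = statistics.stdev(means)   # should be close to SIGMA / sqrt(N)

print(f"mean of x-bar: {center:.2f} (population mean {MU})")
print(f"sd of x-bar:   {spread:.2f} (sigma / sqrt(n) = {SIGMA / math.sqrt(N):.2f})")
```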
Shape
If the distribution of the population
is normal, then the distribution of
the sample mean is also normal. If
the distribution of the population
isn't normal, or if we don't know,
then we might be in trouble—we
can't calculate probabilities unless
we know the shape of the
distribution.
Fortunately, the Central Limit
Theorem comes to the rescue! It says
that as the sample size increases, the
shape of the distribution of x-bar
becomes more normal. Of course, that
brings another question—how big does
the sample need to be in order for the
distribution of the sample mean to be
approximately normal? The answer is,
it depends…
The shape of the population is the key. If it
has a shape that is approximately normal,
then you don't need a very large sample—
maybe as few as 15 would do. If the
population has a slightly skewed shape, then
maybe you only need 30 in the sample. If
the population is severely skewed, perhaps
you might need 45 or more. Again, the
shape of the population is the key. You need
some idea about the shape of the population
in order to know how big of a sample you'll
need in order to get that normal shape for
the distribution of x-bar.
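
One way to see the Central Limit Theorem at work is to track how lopsided the distribution of x-bar is as n grows. The sketch below assumes a strongly right-skewed population (exponential with mean 1, chosen only for illustration) and uses skewness, the average cubed z-score, as a rough measure of departure from a symmetric shape. The function names are hypothetical.

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is repeatable

def skewness(values):
    """Average cubed z-score; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def xbar_skewness(n, reps=10_000):
    """Skewness of the simulated sampling distribution of x-bar for size n."""
    means = [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]
    return skewness(means)

skew_by_n = {n: xbar_skewness(n) for n in (2, 15, 45)}
for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of x-bar is about {skew:.2f}")
```

The skewness shrinks toward 0 as n increases, which is the "becomes more normal" behavior described above. Even at n = 45 it has not vanished for a population this skewed.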
But how do you get an idea about the shape of the
population? From the distribution of the sample
(NOT the sampling distribution). The sample is
your best guess as to the nature of the population.
If the distribution of the sample is approximately
normal, then that's good enough to assume that
the population has a distribution which is
approximately normal, in which case you don't
need a very large sample size to claim that the
shape of the distribution of x-bar is approximately
normal. On the other hand, if the shape of the
sample is terribly skewed, then you need a large
sample in order to make an approximately normal
claim about the distribution of x-bar.
A random sample of size 25 is to be taken from a
population that is normally distributed with mean 60
and standard deviation 10. The average x-bar of the
observations in our sample is to be computed. The
sampling distribution of x-bar is

A) normal with mean 60 and standard deviation 10.
B) normal with mean 60 and standard deviation 2.
C) normal with mean 60 and standard deviation 0.4.
D) normal with mean 12 and standard deviation 2.
An automobile insurer has found that repair claims
have a mean of $920 and a standard deviation of
$870. Suppose that the next 100 claims can be
regarded as a random sample from the long-run
claims process. The mean and standard deviation
of the average x-bar of the next 100 claims are

A) mean = $920 and standard deviation = $87.
B) mean = $920 and standard deviation = $8.70.
C) mean = $92 and standard deviation = $87.
D) mean = $92 and standard deviation = $870.
For humans, gestation periods are
approximately normally distributed, with
mean 266 days and standard deviation
16 days. What is the probability that a
single child gestates for at least 270
days?

What is the probability that a (random)
sample of 5 children gestate for an
average of at least 270 days?
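
The two gestation questions differ only in which distribution is used: the population distribution for a single child, and the sampling distribution of x-bar for the sample of 5. A minimal sketch using Python's standard-library `NormalDist`, assuming the normal model stated in the problem:

```python
import math
from statistics import NormalDist

MU, SIGMA = 266, 16   # gestation mean and standard deviation in days (from the problem)

# One child: X is normal with mean MU and standard deviation SIGMA.
p_single = 1 - NormalDist(MU, SIGMA).cdf(270)

# Mean of n = 5 children: x-bar is normal with standard deviation SIGMA / sqrt(n).
n = 5
p_mean = 1 - NormalDist(MU, SIGMA / math.sqrt(n)).cdf(270)

print(f"P(X >= 270)     = {p_single:.4f}")
print(f"P(x-bar >= 270) = {p_mean:.4f}")
```

The second probability is smaller because averages vary less than individual observations.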
The average weight of great white sharks is 4000 lbs
(with standard deviation 800 lbs). Use this information
for all questions on this page.

1) Identify the parameter of interest and the statistic that
estimates it.
2) Researchers wonder if the average weight is really
lower, and plan on taking a sample of 100 sharks. Each
such sample would produce a new mean weight, xbar.
Find the values of mu-sub-xbar and sigma-sub-xbar.
3) Describe the shape of the sampling distribution of x-bar.
Justify your answer.
4) Regardless of your answer to [3], assume that the
sampling distribution of xbar is approximately normally
distributed with the mean and standard deviation you gave
in [2].
What is the probability that a sample of 100 sharks will
have a mean weight of less than 3600 lbs?
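
Parts [2] and [4] can be checked numerically. This sketch assumes, as part [4] instructs, that x-bar is approximately normal; the variable names are illustrative.

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 4000, 800, 100   # values given in the problem (lbs, lbs, sharks)

mu_xbar = MU                        # center: x-bar is unbiased for mu
sigma_xbar = SIGMA / math.sqrt(N)   # spread: sigma / sqrt(n)

# Part 4: P(x-bar < 3600), treating x-bar as approximately normal.
p_below = NormalDist(mu_xbar, sigma_xbar).cdf(3600)

print(f"mu-sub-xbar = {mu_xbar}, sigma-sub-xbar = {sigma_xbar}")
print(f"P(x-bar < 3600) = {p_below:.2e}")
```

A sample mean of 3600 lbs sits five standard deviations below the center, so the probability is extremely small.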
Distribution of the Sample
Proportion

Center
If p-hat is unbiased, then $\mu_{\hat{p}} = p$.
Again, a random sample is your
best bet that this condition is
met.
Suppose you are going to roll a die 60
times and record p-hat, the proportion of
times that an even number (2, 4, or 6)
is showing. The sampling distribution of
p-hat should be centered about

A) 1/6
B) 1/3
C) ½
D) 30
Spread

$\sigma_{\hat{p}} = \sqrt{p(1 - p) / n}$. Again, this is actually
only true (close enough) if the
size of the sample is relatively
small.
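
The spread formula can be compared against a simulation. The sketch below assumes a hypothetical population proportion of 0.5 and samples of size 100 (illustrative numbers, not from the slides); `sd_p_hat` and `one_p_hat` are made-up helper names.

```python
import math
import random
import statistics

random.seed(5)  # fixed seed so the sketch is repeatable

def sd_p_hat(p, n):
    """Standard deviation of p-hat from the formula sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

P, N = 0.5, 100   # hypothetical population proportion and sample size

def one_p_hat():
    """Proportion of successes in one simulated sample of size N."""
    return sum(random.random() < P for _ in range(N)) / N

# Standard deviation of 10,000 simulated p-hat values.
simulated = statistics.stdev([one_p_hat() for _ in range(10_000)])

print(f"formula:    {sd_p_hat(P, N):.4f}")
print(f"simulation: {simulated:.4f}")
```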
A survey asks a random sample of 1500
adults in Ohio if they support an increase in
the state sales tax from 5% to 6%, with the
additional revenue going to education. Let p-hat
denote the proportion in the sample that say
they support the increase. Suppose that 40%
of all adults in Ohio support the increase.
The standard deviation of p-hat is

A) .4
B) .24
C) .0126
D) .000126
A fair coin (one for which both the
probability of heads and the probability
of tails are 0.5) is tossed 60 times. The
probability that less than 1/3 of the
tosses are heads is

A) .33
B) .109
C) .09
D) 0.0043
Shape
Since p-hat is (ultimately) measuring a
qualitative variable, the population cannot
have a normal distribution. However, p-hat
itself is quantitative, so p-hat can have
a distribution that is approximately
normal. In particular, p-hat will have an
approximately normal distribution if np
and n(1 - p) are each at least 10.
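
The rule of thumb above is easy to express as a check. This is a small sketch; the function name `normal_approx_ok` and the example values are assumptions chosen for illustration.

```python
def normal_approx_ok(n, p, threshold=10):
    """True when both np and n(1 - p) meet the rule-of-thumb threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approx_ok(200, 0.30))  # np = 60 and n(1 - p) = 140
print(normal_approx_ok(200, 0.02))  # np = 4, below the threshold
```

When p is close to 0 or 1, a much larger n is needed before both counts reach 10.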
               Example 9.5
• Television executives and companies who
  advertise on TV are interested in how many
  viewers watch particular television shows.
  According to 2001 Nielsen ratings, Survivor II
  was one of the most watched television shows in
  the US during every week that it aired.
• Suppose that the true proportion of US adults who
  watched Survivor II is p=.37.
• Suppose we did a survey with n=100.
• Suppose we did this survey 1000 times.


               Example 9.5
• Television executives and companies who
  advertise on TV are interested in how many
  viewers watch particular television shows.
  According to 2001 Nielsen ratings, Survivor II
  was one of the most watched television shows in
  the US during every week that it aired.
• Suppose that the true proportion of US adults who
  watched Survivor II is p=.37.
• Suppose we did a survey with n=1000.
• Suppose we did this survey 1000 times.
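
The repeated-survey setup in Example 9.5 can be simulated for both sample sizes. This is a sketch under the example's assumption that p = .37; each respondent is modeled as an independent draw, and the function name `simulate_surveys` is hypothetical.

```python
import math
import random
import statistics

random.seed(6)  # fixed seed so the sketch is repeatable

P = 0.37   # true proportion assumed in Example 9.5

def simulate_surveys(n, surveys=1000):
    """Return the p-hat values from `surveys` simulated surveys of size n."""
    return [
        sum(random.random() < P for _ in range(n)) / n
        for _ in range(surveys)
    ]

summary = {}
for n in (100, 1000):
    p_hats = simulate_surveys(n)
    summary[n] = (statistics.fmean(p_hats), statistics.stdev(p_hats))
    theory = math.sqrt(P * (1 - P) / n)
    print(f"n = {n:4d}: mean {summary[n][0]:.3f}, sd {summary[n][1]:.4f} "
          f"(formula {theory:.4f})")
```

Both sets of p-hat values center near .37, but the values from the larger surveys cluster much more tightly around it.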


          Different question
• An SRS of 1500 first-year college students
  were asked whether they applied for
  admission to any other college. In fact,
  35% of all first-year students applied to
  colleges besides the one they are
  attending.
• What is the probability that the poll will be
  within 2 percentage points of the true p?


$\mu_{\hat{p}} = p = .35$

$\sigma_{\hat{p}} = \sqrt{\dfrac{(.35)(.65)}{1500}} = .0123153021$

$z = \dfrac{.33 - .35}{.0123} = -1.626$

$z = \dfrac{.37 - .35}{.0123} = 1.626$

$P(-1.626 < Z < 1.626) = .9484 - .0516 = .8968$
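
The same calculation can be done without a z-table. The sketch below uses the normal approximation for p-hat with the problem's values; carrying full precision gives about .896, slightly different from the hand calculation in the third decimal place because the hand calculation rounds along the way.

```python
import math
from statistics import NormalDist

P, N = 0.35, 1500   # true proportion and sample size from the problem

sigma_p_hat = math.sqrt(P * (1 - P) / N)
p_hat = NormalDist(P, sigma_p_hat)   # normal approximation to p-hat

# P(0.33 < p-hat < 0.37): within 2 percentage points of the true p.
prob = p_hat.cdf(0.37) - p_hat.cdf(0.33)

print(f"sigma of p-hat = {sigma_p_hat:.7f}")
print(f"P(.33 < p-hat < .37) = {prob:.4f}")
```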
In 2010, Mars candy company reported that 35% of the
M&M's produced were brown M&M's. Use this information
for all questions on this slide.

1) What is the parameter of interest, and what statistic
estimates it?
2) The student newspaper wants to see if this figure has
increased, and plans to check 45 M&M's. Each such sample
of 45 M&M's would result in a new value of p-hat. Find the
values of mu-sub-p-hat and sigma-sub-p-hat.
3) Describe the shape of the sampling distribution of p-hat.
Justify your answer.
4) Regardless of your answer to [3], assume that the shape
of the sampling distribution of p-hat is approximately
normal with the mean and standard deviation you gave in
[2]. What is the probability that a sample of 45 M&M's will
result in more than 26 being brown?
Suppose we select an SRS of size n =
100 from a large population having
proportion p of successes. Let X be the
number of successes in the sample.
For which value of p would it be safe to
assume the sampling distribution of X
is approximately normal?

A) .01
B) 1/9
C) .975
D) .9999
According to USA Today, 56% of
the residents of Alaska own cell
phones. What is the probability that
a random sample of 500 Alaskans
will contain fewer than 275 that
own cell phones?

*Solve as binomial and proportion.
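
Both routes can be sketched in a few lines. The binomial route sums exact probabilities for the count X; the proportion route uses the normal approximation for p-hat. This assumes respondents are independent, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

N, P = 500, 0.56   # sample size and population proportion from the problem

# Binomial route: X = number of cell-phone owners, P(X < 275) = P(X <= 274).
exact = sum(
    math.comb(N, k) * P**k * (1 - P) ** (N - k) for k in range(275)
)

# Proportion route: p-hat = X / N is approximately normal, and
# "fewer than 275" means p-hat < 275 / 500 = 0.55.
sigma_p_hat = math.sqrt(P * (1 - P) / N)
approx = NormalDist(P, sigma_p_hat).cdf(275 / N)

print(f"binomial (exact):       {exact:.4f}")
print(f"proportion (normal):    {approx:.4f}")
```

The two answers agree to within a couple of percentage points; the gap comes from approximating a discrete count with a continuous curve.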
In a test of ESP (extrasensory perception), the
experimenter looks at cards that are hidden from
the subject. Each card contains either a star, a
circle, a wavy line, or a square. An experimenter
looks at each of 100 cards in turn, and the subject
tries to read the experimenter's mind and name
the shape on each. What is the probability that the
subject gets more than 30 correct if the subject
does not have ESP and is just guessing?

A) 0.310.
B) 0.250.
C) 0.123.
D) 0.043.
If a statistic used to estimate a
parameter is such that the mean of its
sampling distribution is equal to the
true value of the parameter being
estimated, the statistic is said to be

A) random
B) biased
C) a proportion
D) unbiased
The variability of a statistic is described
by

A) the spread of its sampling
distribution.
B) the amount of bias present.
C) the vagueness in the wording of the
question used to collect the sample
data.
D) the stability of the population it
describes.
A random variable X has mean μ_X and
standard deviation σ_X. Suppose n
independent observations of X are taken
and the average x-bar of these n observations is
computed. We can assert that if n is very
large, the sampling distribution of x-bar is
approximately normal. This assertion follows
from

A) the law of large numbers
B) the central limit theorem
C) the definition of sampling distribution
D) the bell curve
A researcher initially plans to take an SRS of
size n from a population that has mean 80
and standard deviation 20. If he were to
double his sample size (to 2n), the standard
deviation of the sampling distribution of x-bar
would change by a factor of

A) √2
B) 1/√2
C) 2
D) 1/2
The weights of extra-large eggs have a
normal distribution with a mean of 1
ounce and a standard deviation of 0.1
ounces. The probability that a dozen
eggs weighs more than 13 ounces is
closest to

A) 0.0000.
B) 0.0020.
C) 0.1814.
D) 0.2033.
The distribution of actual weights of 8-ounce
chocolate bars produced by a certain
machine is normal with mean 8.1 ounces
and standard deviation 0.1 ounces. If a
sample of five of these chocolate bars is
selected, the probability that their average
weight is less than 8 ounces is

A) 0.0125.
B) 0.1853.
C) 0.4871.
D) 0.9873.
The distribution of actual weights of 8-ounce
chocolate bars produced by a certain
machine is normal with mean 8.1 ounces
and standard deviation 0.1 ounces. If a
sample of five of these chocolate bars is
selected, there is only a 5% chance that the
average weight of the sample of five of the
chocolate bars will be below

A) 7.94 ounces.
B) 8.03 ounces.
C) 8.08 ounces.
D) 8.20 ounces.
In a large population of adults, the mean IQ
is 112 with a standard deviation of 20.
Suppose 200 adults are randomly selected
for a market research campaign. The
probability that the sample mean IQ is
greater than 110 is approximately

A) 0.079.
B) 0.421.
C) 0.921.
D) 0.579.

								