Sampling distributions


Variable         N     Mean    Median   TrMean    StDev   SE Mean
Drink           50   17.250    17.050   17.168    2.998     0.424
Risky           50   40.984    40.900   40.848    4.380     0.619
South           50   0.2200    0.0000   0.1818   0.4185    0.0592
Income98        50    25677     25447    25455     3681       521
MADDtota        50    6.540     7.000    6.523    1.971     0.279
ENFORCE         50    6.400     6.000    6.409    2.339     0.331
YouthPEE        50    6.740     7.000    6.750    2.732     0.386
LAWS            50    5.960     6.000    5.955    2.338     0.331

Variable   Minimum   Maximum       Q1       Q3
Drink       10.300    24.700   15.500   19.275
Risky       33.900    51.500   37.525   43.900
South       0.0000    1.0000   0.0000   0.0000
Income98     19635     37108    22567    28112
MADDtota     3.000    11.000    5.000    8.000
ENFORCE      1.000    12.000    5.000    8.000
YouthPEE     1.000    12.000    4.750    9.000
LAWS         1.000    11.000    4.750    7.000
“Describe the univariate characteristics of your variables:”

I operationalized underage drinking by using data available at:
http://www.samhsa.gov/oas/NHSDA/99YouthState/appb.htm#b1b
These data were estimates of the percentage of 12-17 year olds who
reported using alcohol in the past month during 1999.
The average per state is 17.25% with a standard deviation of 2.998. The
states with the lowest percentages were Utah (10.3%) and Virginia (12.8%).
Utah’s value is particularly low, nearly a full standard deviation below the
next lowest score, indicating that youths in Utah are quite different from the
rest of the country (or very reluctant to admit they are the same!).
States with the highest incidence of youth drinking were North Dakota
(24.7%) and Montana (23.6%). Seeing these numbers along with the high
values for South Dakota (21.2%), Wyoming (22.1%) and Colorado (20.8%) makes
it appear that underage drinking may be a particular problem in rural states.
[Histogram: “State % of 12-17 Year Olds Who Report Using Alcohol in Past Month, 1999.” X-axis: Drink (10.0 to 25.0); y-axis: Frequency (0 to 15).]
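For readers who want to reproduce this kind of univariate summary and histogram, here is a minimal sketch in Python with pandas. The file name "states.csv" and its column layout (one row per state with a "Drink" column) are assumptions for illustration only; the original analysis used Minitab with the SAMHSA data linked above.

    # Minimal sketch (not the original Minitab session): univariate summary and
    # histogram for a hypothetical file "states.csv" with a "Drink" column.
    import pandas as pd
    import matplotlib.pyplot as plt

    states = pd.read_csv("states.csv")       # hypothetical file name
    print(states["Drink"].describe())        # n, mean, sd, quartiles, min, max

    states["Drink"].plot(kind="hist", bins=11, edgecolor="black")
    plt.xlabel("Drink (% reporting past-month alcohol use)")
    plt.ylabel("Frequency")
    plt.show()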
Steps of Hypothesis Testing
1. State the research hypothesis (HR) and the null hypothesis (H0).
   Choose a p (probability) value, most likely .05, weighing the chance of a Type I error against a Type II error.
2. Choose the appropriate test.
3. Compute the test statistic.
4. Get the critical value.
5. Compare the test statistic with the critical value.
6., 7., 8. Make your conclusion with a probability level.
   If test statistic > critical value:
   “Reject the null hypothesis and temporarily accept the research hypothesis at the (.__) level.” The blank (.__) is given by p.
   If test statistic < critical value:
   “Fail to reject the null hypothesis at the (.__) level.”
z test

Z = (X̄ − μ) / (σ / √n)
When is the z test appropriate?
When we have population parameters for one
group and sample statistics for another.
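A minimal sketch of the hypothesis-testing steps applied with this z test. The population parameters, sample values, and the .05 two-tailed critical value below are made up for illustration; they are not from the slides.

    # Sketch of the z test: compute Z = (X-bar - mu) / (sigma / sqrt(n)) and
    # compare it with the critical value. Numbers are illustrative only.
    import math

    mu, sigma = 100.0, 15.0        # known population parameters (assumed)
    x_bar, n = 104.0, 36           # sample mean and sample size (assumed)
    critical = 1.96                # two-tailed critical value at the .05 level

    z = (x_bar - mu) / (sigma / math.sqrt(n))

    if abs(z) > critical:
        print(f"z = {z:.2f}: reject H0 at the .05 level")
    else:
        print(f"z = {z:.2f}: fail to reject H0 at the .05 level")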

What is the independent variable and what is
the dependent variable?

The INDEPENDENT VARIABLE is whatever
defines the groups you are comparing.
If you compare the mean of ideology for
a sample of Republicans to the mean of
all Democrats, your hypothesis must be:
Party Affiliation affects ideology.

Be sure to take time to figure out what the DV and IV are for these tests.
Directional vs. Non-Directional
         Hypotheses
It is better to specify what kind of relationship we expect: positive or negative.

A non-directional hypothesis doesn’t specify a direction.
H1: μ1 ≠ μ2

This could mean that μ1 > μ2 or that μ1 < μ2.

A non-directional hypothesis is called a “two-tailed” hypothesis.

It looks in two directions, above and below, and we will reject the null if we find compelling evidence in either.
So, a directional hypothesis, such as:

H1: μ1 > μ2

is referred to as a “one-tail” test.

What is the null hypothesis here?
H0: μ1 ≤ μ2

So we will reject the null if and only if the sample mean for group 1 is demonstrably above the mean for group 2.

The critical values are different:

Level of Significance                  One Tail        Two-Tail
.05                                    1.65            1.96
.01                                    2.33            2.58
.001                                   3.09            3.29
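These critical values come from the standard normal distribution. A short sketch that reproduces the table, assuming scipy is available:

    # Sketch: critical z values for one- and two-tailed tests from the
    # standard normal quantile function.
    from scipy.stats import norm

    for alpha in (0.05, 0.01, 0.001):
        one_tail = norm.ppf(1 - alpha)        # e.g. 1.65 for alpha = .05
        two_tail = norm.ppf(1 - alpha / 2)    # e.g. 1.96 for alpha = .05
        print(f"alpha = {alpha}: one-tail {one_tail:.2f}, two-tail {two_tail:.2f}")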
SAMPLING DISTRIBUTION

The sampling distribution is the distribution of all possible sample means that could be drawn from the population.
    Key Point: how does the z-score relate to
       hypothesis testing with the z test?

 The Z-test statistic is the Z-score of a particular
 distribution: the sampling distribution of sample
                       means.

  This is the frequency distribution that would be
     obtained from calculating the means of all
theoretically possible samples of a designated size
   that could be drawn from a given population.


                       Huh?
                        We have a population.

          We take a sample of size n and compute the mean.

    Keep track by placing the mean on a frequency distribution – or
                      graphing it in a histogram.

Then we do this again and place the new mean value on the frequency
                 distribution and on the histogram.

Then do this again and again until we have taken every possible sample.
    We will end up with a distribution that begins to look normally
                              distributed.

  The distribution of these means from samples is called the sampling
                      distribution of sample means.
A sample of 3 students from a class (a population) of 6 students, measuring each student’s GPA
     Student               GPA
      Susan                2.1
      Karen                 2.6
       Bill                 2.3
      Calvin                1.2
      Rose                  3.0
      David                 2.4
Draw each possible sample from
       this ‘population’:


[Diagram: the six students and their GPAs: Susan 2.1, Karen 2.6, Bill 2.3, Calvin 1.2, Rose 3.0, David 2.4]
 With samples of n = 3 from
this population of N = 6 there
    are 20 different sample
         possibilities:

(N choose n) = N! / (n!(N − n)!) = 6! / (3! × 3!) = 720 / 36 = 20

Note that different samples will typically produce different means and standard deviations.
ONE SAMPLE: X̄ = (Susan + Karen + Bill) / 3
              = (2.1 + 2.6 + 2.3) / 3
              = 7.0 / 3 = 2.3
Standard deviation:
(2.1 − 2.3)² = (−.2)² = .04
(2.6 − 2.3)² = (.3)² = .09
(2.3 − 2.3)² = 0² = 0
s² = .13 / 3 = .043 and s = √.043 = .21
So this one sample of 3 has a mean of 2.3 and an sd of .21.
What about other samples?
► A SECOND SAMPLE
    X̄ = (Susan + Karen + Calvin) / 3
       = (2.1 + 2.6 + 1.2) / 3
       = 1.97
  SD = .58

► 20th SAMPLE
    X̄ = (Karen + Rose + David) / 3
       = (2.6 + 3.0 + 2.4) / 3
       = 2.67
  SD = .25
   SIMPLE EXAMPLE OF A
  SAMPLING DISTRIBUTION

► Assume the true mean of the population is known. In this simple case of 6 people it can be calculated as 13.6 / 6 = μ = 2.27.

► The mean of the sampling distribution (i.e., the mean of all 20 sample means) is also 2.27, exactly the population mean.
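A small sketch that enumerates every possible sample of n = 3 from this six-student “population” and checks that the mean of the 20 sample means equals the population mean of 2.27:

    # Sketch: all 20 samples of size 3 from the 6 GPAs, and the mean of the
    # resulting sampling distribution of sample means.
    from itertools import combinations
    from statistics import mean

    gpas = {"Susan": 2.1, "Karen": 2.6, "Bill": 2.3,
            "Calvin": 1.2, "Rose": 3.0, "David": 2.4}

    sample_means = [mean(s) for s in combinations(gpas.values(), 3)]
    print(len(sample_means))                 # 20 possible samples
    print(round(mean(gpas.values()), 2))     # population mean: 2.27
    print(round(mean(sample_means), 2))      # mean of the sampling distribution: 2.27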
 What is a Sampling Distribution?
►A  distribution made up of every conceivable
  sample drawn from a population.
► A sampling distribution is almost always a
  hypothetical distribution because typically you
  do not have and cannot calculate every
  conceivable sample mean.
► The mean of the sampling distribution is an
  unbiased estimator of the population mean
  with a computable standard deviation.
                      Second Example from Text
We have a population that contains only 5 individuals.
X1 = 1
X2 = 2
X3 = 3
X4 = 4
X5 = 5
Since this is the population, we know that μ = 3.
We are going to draw a sample of 3.
There are 60 ways this could be done if we pay attention to order, but only 10 if we do not.
Sample 1: X5, X4, X3, so X̄ = 4.00
Sample 2: X5, X4, X2, so X̄ = 3.67
Sample 3: X5, X4, X1, so X̄ = 3.33
Sample 4: X5, X3, X2, so X̄ = 3.33
Sample 5: X5, X3, X1, so X̄ = 3.00
Sample 6: X5, X2, X1, so X̄ = 2.67
Sample 7: X4, X3, X2, so X̄ = 3.00
Sample 8: X4, X3, X1, so X̄ = 2.67
Sample 9: X4, X2, X1, so X̄ = 2.33
Sample 10: X3, X2, X1, so X̄ = 2.00
         Frequency Distribution
So we can make a frequency distribution of all possible samples:
 X̄       f
4.00     1
3.67     1
3.33     2
3.00     2
2.67     2
2.33     1
2.00     1
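A small sketch verifying this frequency distribution by enumerating the 10 unordered samples of size 3 and counting their means; it uses only the five population values given above.

    # Sketch: the 10 samples of size 3 from {1, 2, 3, 4, 5}, their means, and
    # the frequency distribution of those means.
    from itertools import combinations
    from statistics import mean
    from collections import Counter

    population = [1, 2, 3, 4, 5]
    means = [round(mean(s), 2) for s in combinations(population, 3)]
    print(sorted(Counter(means).items(), reverse=True))
    # [(4.0, 1), (3.67, 1), (3.33, 2), (3.0, 2), (2.67, 2), (2.33, 1), (2.0, 1)]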

Or we can make a histogram of all possible samples:
[Histogram: “Sampling Distribution of Sample Means.” X-axis: sample mean (2.00 to 4.00); y-axis: Percent (0 to 20).]
      Central Limit Theorem
If all possible random samples, each the size of
   your sample, were taken from any population
   then the sampling distribution of sample
   means will have:
► a mean equal to the population mean μ
► a standard deviation equal to σ / √n

The sampling distribution will be normally distributed IF EITHER:
► the parent population from which you are sampling is normally distributed, OR
► the sample size is greater than n = 30.
        Sampling Distribution is a
         Probability Distribution
► The mean of each sample is a random variable
  …with each mean varying according to the laws of
  probability.
► The CLT says that if we have a sample size greater than 30 (n > 30), the sampling distribution of means will be normally distributed.
► The sampling distribution has a standard deviation, called the standard error, equal to σ / √n.
      The standard error is the margin of error
          either side of the sample mean
[Diagram: normal curve of the sampling distribution, with the mean and +1 SE, +2 SE, +3 SE marked.]

Note: as the sample size increases the margin
of error gets smaller, that is, the sampling
distribution gets more peaked, thus your
estimate gets more precise.
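A quick numeric illustration of that note, using an arbitrary σ = 10 (an assumption for illustration only):

    # Sketch: the standard error sigma / sqrt(n) shrinks as n increases.
    import math

    sigma = 10.0
    for n in (25, 100, 400):
        print(f"n = {n:3d}: SE = {sigma / math.sqrt(n):.2f}")
    # n =  25: SE = 2.00
    # n = 100: SE = 1.00
    # n = 400: SE = 0.50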
Sampling Distribution → NDC

Given a large enough sample size, n > 30, the
sampling distribution from which you are drawing
your one sample will be normally distributed
regardless of the shape of the population’s
characteristic.

Thus, you can legitimately compute a sample mean
and sample standard deviation and make inferences
about a population characteristic.
Illustration: A NDC from a Uniform Distribution

When and how the sampling distribution approaches normality as the sample size increases, and thereby lets us use the 68-95-99.7 rule when making inferences about the population.

In this example the mean would be 3.5:

μ = ΣX / N = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 21 / 6 = 3.5
You could also compute the sd of this distribution:

σ = √( Σ(X − μ)² / N )

X     X − μ               (X − μ)²
1     1 − 3.5 = −2.5        6.25
2     2 − 3.5 = −1.5        2.25
3     3 − 3.5 = −0.5        0.25
4     4 − 3.5 =  0.5        0.25
5     5 − 3.5 =  1.5        2.25
6     6 − 3.5 =  2.5        6.25
Σ = 21                  Σ = 17.5

μ = 21 / 6 = 3.5        σ² = 17.5 / 6 = 2.9        σ = √2.9 ≈ 1.7
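A quick check of this computation:

    # Sketch: verifying mu = 3.5 and sigma ≈ 1.7 for the uniform die population.
    import math

    faces = [1, 2, 3, 4, 5, 6]
    mu = sum(faces) / len(faces)                                       # 21 / 6 = 3.5
    sigma = math.sqrt(sum((x - mu) ** 2 for x in faces) / len(faces))  # sqrt(17.5 / 6)
    print(mu, round(sigma, 2))                                         # 3.5 1.71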
ILLUSTRATION OF SAMPLING DISTRIBUTIONS

Slide 13: A uniform distribution of the population we would get by repeatedly rolling one die and recording the number. Here the numbers 1 through 6 are equally probable, so the distribution is flat, not a normally distributed variable.

In each of the ensuing figures we draw 500 different SRSs. The figures show graphically what happens to the shape of the sampling distribution as the size of the sample increases, from averaging over 2 throws of the die (i.e., n = 2) repeated 500 times, up to n = 20 (20 dice thrown, 500 times). In all cases we record the mean and sd of the 500 samples.
500 Samples of n = 2

We threw 2 dice, added up the total number of points on the 2 dice, and divided by 2 to obtain the mean. We repeated this process 500 times, each time recording the mean of the 2 dice, and used these outcomes to build the histogram.
500 Samples of n = 4
500 Samples of n = 6
500 Samples of n = 10
500 Samples of n = 20
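The figures themselves are not reproduced here, but the following sketch simulates the same exercise: 500 samples of n dice for each sample size. The seed value is an arbitrary assumption so the run is repeatable; as n grows, the mean of the 500 sample means stays near μ = 3.5 and their spread shrinks toward σ/√n.

    # Sketch: 500 samples of n dice, recording each sample mean, for
    # increasing n, to show the sampling distribution tightening around 3.5.
    import random
    from statistics import mean, pstdev

    random.seed(1)  # arbitrary seed for repeatability
    for n in (2, 4, 6, 10, 20):
        sample_means = [mean(random.randint(1, 6) for _ in range(n))
                        for _ in range(500)]
        print(f"n = {n:2d}: mean of means = {mean(sample_means):.2f}, "
              f"sd of means = {pstdev(sample_means):.2f}")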
Key Observations

► As the sample size increases, the mean of the sampling distribution comes to more closely approximate the true population mean, here known to be μ = 3.5.

► AND, this is critical, the standard error (that is, the standard deviation of the sampling distribution) gets systematically narrower.
Three main points about sampling distributions

► Probabilistically, as the sample size gets bigger, the sampling distribution better approximates a normal distribution.
► The mean of the sampling distribution will more closely estimate the population parameter as the sample size increases.
► The standard error (SE) gets narrower and narrower as the sample size increases. Thus, we will be able to make more precise estimates of the whereabouts of the unknown population mean.
THE MEAN OF THE SAMPLING DISTRIBUTION

M_X̄ = μ

A sampling distribution is made up of all possible SRSs of the same size as your sample. Its mean (M_X̄) will equal the population mean μ from which the samples were drawn.

The distribution of sample means will be normally distributed, centered at the population mean μ, with a standard deviation of the sampling distribution called the standard error (SE).
 ESTIMATING THE POPULATION
           MEAN
We are unlikely to ever see a sampling distribution
 because it is often impossible to draw every
 conceivable sample from a population and we never
 know the actual mean of the sampling distribution
 or the actual standard deviation of the sampling
 distribution. But, here is the good news:
We can estimate the whereabouts of the population
 mean from the sample mean and use the sample’s
 standard deviation to calculate the standard error.
 The formula for computing the standard error
 changes, depending on the statistic you are using,
 but essentially you divide the sample’s standard
 deviation by the square root of the sample size.
Don’t be confused

The standard deviation of your sample:

s = √( Σ(X − X̄)² / n )

The standard deviation of the sampling distribution (the standard error):

SE = σ / √n
What we want to do now is take the next step: to learn how to substantiate our conclusions, backing them up with analyses that reflect how much confidence we should have that our estimate of, say, the population mean (which is being estimated from our sample) is at or close to the true population mean.
Note that we rarely know the standard deviation of the population or the standard deviation of the sampling distribution.
The standard error must therefore be estimated, by taking the standard deviation of your sample and dividing it by the square root of N − 1.
The Standard Error For Samples:

SE = √( Σ(X − X̄)² / (N(N − 1)) )

or, same thing,

SE = s / √(N − 1)
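A small sketch checking that the two forms above give the same number, under the slides’ convention that s is computed with N in the denominator. The data values are made up for illustration (they reuse the six GPAs from the earlier example).

    # Sketch: two equivalent ways of writing the estimated standard error.
    import math

    x = [2.1, 2.6, 2.3, 1.2, 3.0, 2.4]     # illustrative data
    N = len(x)
    x_bar = sum(x) / N

    s = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / N)   # sd with N in denominator
    se_1 = s / math.sqrt(N - 1)                             # SE = s / sqrt(N - 1)

    s_hat = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (N - 1))  # sd with N - 1
    se_2 = s_hat / math.sqrt(N)                             # SE = s-hat / sqrt(N)

    print(round(se_1, 4), round(se_2, 4))                   # identical values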
What we are trying to do is locate the unknown whereabouts of the population mean. Probabilistically speaking, μ is at or somewhere on either side of the sample mean.
NDC as Sampling Distribution




[Diagram: normal distribution curve as the sampling distribution, showing a sample mean of X̄ = 97 and a population mean of μ = 99.75.]
Two Steps in the Statistical Inference Process

1. Calculate “confidence intervals” from the sample mean and sample standard deviation, within which we can place the unknown population mean with some degree of probabilistic confidence.
2. Compute a “test of statistical significance” (risk statement), which is designed to assess the probabilistic chance that the true but unknown population mean lies within the confidence interval you just computed from the sample mean.
So, first we calculate confidence limits and then test for statistical significance, which is the probability of μ being within the CIs we computed.

Both these steps are required when making inferences about the whereabouts of the unknown population mean. Both the calculation of confidence intervals and the calculation of a measure of statistical likelihood are based on the probabilistic patterns of a sampling distribution.

Together, the confidence limits and the statistical test tell us the probability of what would happen IF we sampled the population not once but an infinite number of times. That is, we are sampling from a sampling distribution. This kind of inferencing is the hallmark of statistics.
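A minimal sketch of step 1, computing a 95% confidence interval from a sample mean and an estimated standard error. The numbers (including X̄ = 97, echoing the diagram above, and the SE value) are illustrative assumptions, not from the slides.

    # Sketch: a 95% confidence interval for mu from a sample mean and its
    # estimated standard error, using the two-tailed .05 critical value.
    x_bar = 97.0        # sample mean (assumed)
    se = 1.25           # estimated standard error (assumed)
    z_crit = 1.96       # two-tailed critical value at the .05 level

    lower, upper = x_bar - z_crit * se, x_bar + z_crit * se
    print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")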
In Summary
