Measures of Central Tendency


“to be or not to be
•   Normal Distributions
•   Skewness & Kurtosis
•   Normal Curves and Probability
•   Z-scores
•   Confidence Intervals
•   Hypothesis Testing
•   The t-distribution
Is this normal?






[Histogram of the six observations below: Mean = 178.3, Std. Dev. = 160.68, N = 6]



        Value     Frequency   Percent   Valid Percent   Cumulative Percent
        70.00         1         16.7        16.7              16.7
        100.00        2         33.3        33.3              50.0
        150.00        2         33.3        33.3              83.3
        500.00        1         16.7        16.7             100.0
        Total         6        100.0       100.0


           N (valid)                          6
           N (missing)                        0
           Mean                        178.3333
           Skewness                       2.242
           Std. Error of Skewness          .845
           Kurtosis                       5.219
           Std. Error of Kurtosis         1.741
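The statistics above can be reproduced with a short Python sketch. It assumes SPSS uses the adjusted Fisher-Pearson formulas (G1 for skewness, G2 for excess kurtosis) and the usual standard-error formulas shown below; with the six values from the frequency table they give the same numbers. Note that this kurtosis is excess kurtosis, so a normal curve would score 0 on it.

```python
from math import sqrt
from statistics import mean, stdev

# The six observations from the frequency table above.
values = [70, 100, 100, 150, 150, 500]

def skewness_g1(xs):
    """Adjusted sample skewness (assumed to match the SPSS formula)."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)  # stdev uses the n - 1 denominator
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)

def kurtosis_g2(xs):
    """Adjusted sample excess kurtosis (normal curve = 0)."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    fourth = sum(((x - m) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * sqrt((n * n - 1) / ((n - 3) * (n + 5)))

n = len(values)
print(f"Mean     {mean(values):.4f}")
print(f"Skewness {skewness_g1(values):.3f} (SE {se_skewness(n):.3f})")
print(f"Kurtosis {kurtosis_g2(values):.3f} (SE {se_kurtosis(n):.3f})")
```

A skewness of 2.242 is more than twice its standard error, which points to a clearly right-skewed sample: the single value of 500 pulls the tail out to the right.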
        Normal Distributions
• Are your curves normal?
• Why do we care about normal curves?
• What do normal curves tell us?
The curves tell us something about the distribution
  of the population
The curves allow us to make statistical inferences
  regarding the probability of some outcomes
  within some margin of error
The normal distribution
            • A distribution is easily depicted in a graph
              where the height of the line is determined by
              the frequency of cases for the values beneath it.
            • Most cases cluster near the middle of a
              distribution if it is close to normal.
          The Normal Curve
• Bell-shaped distribution or curve
• Perfectly symmetrical about the mean.
      Mean = median = mode
• Tails are asymptotic: closer and closer to
  horizontal axis but never reach it.
Skewness and Sample Distributions
Not all curves are normal, even if still bell-shaped
• Formula for skewness (Pearson's coefficient):

   $$\text{Skewness} = \frac{3(\bar{X} - \text{median})}{s}$$
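As a quick check of the formula, this sketch applies Pearson's coefficient to the six observations from the first slide, using the sample standard deviation for s (an assumption; some texts use the population form).

```python
from statistics import mean, median, stdev

def pearson_skewness(xs):
    """Pearson's skewness coefficient: 3 * (mean - median) / s."""
    return 3 * (mean(xs) - median(xs)) / stdev(xs)

values = [70, 100, 100, 150, 150, 500]
# mean = 178.33, median = 125, s = 160.68
print(round(pearson_skewness(values), 3))  # → 0.996
```

The result is positive, so the sample is skewed to the right: the mean sits above the median. This coefficient is on a different scale from the SPSS skewness (2.242), so the two numbers should not be compared directly.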
  Kurtosis (It’s not a disease)
• Beyond skewness, kurtosis tells us how peaked
  or flat a distribution is, and how heavy its
  tails are, compared with a normal curve.

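• The kurtosis value for a normal distribution
  equals 3. Anything above this is leptokurtic
  (more peaked, with heavier tails) and anything
  below is platykurtic (flatter, with lighter tails).
  SPSS and many other packages report excess
  kurtosis (kurtosis minus 3), on which a normal
  curve scores 0.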
   Back to normal distributions
• The power of normal distributions, or those
  close to them, is that we can predict where
  cases will fall within a distribution.

• For example, what are the odds, given the
  population parameter of human height, that
  someone will grow to more than eight feet?
• Answer: likely less than a .025 probability.
Sample Distribution
          • What does Andre the Giant do to the
            sample distribution?
          • What is the probability of finding someone
            like Andre in the
          • Are you ready for more inferential
          • Answer: Oh boy, yes!!
 Normal Curves and probability
• We have answered the question of what
  Andre and the sumo wrestler would do to
  the distribution.
• But what about the probability of finding
  someone the same height as Andre in the
• What is the probability of finding someone
  the same height as Dr. Peña or Dr.
       More on normal curves and

[Figure: normal curve marking where Dr. Boehmer would be and where Andre would be]
       Z-Scores (no sleeping!!)
• With z-scores, we can standardize how far
  observations lie from the mean, so that these
  distances can be compared across different samples.

•    The basic unit of the z-score is the standard deviation:

     $$Z = \frac{X_i - \bar{X}}{s}$$
We can use the z-score to score each
observation as a distance from the mean.
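A minimal sketch of the z-score formula, applied to the six observations from the first slide. It uses the sample standard deviation s, matching the formula above.

```python
from statistics import mean, stdev

def z_scores(xs):
    """Standardize each observation: z = (x_i - mean) / s."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

values = [70, 100, 100, 150, 150, 500]
for x, z in zip(values, z_scores(values)):
    print(f"{x:>4}  z = {z:+.2f}")  # distance from the mean in SD units
```

The value 500 lands at about z = +2.00, two standard deviations above the mean, while the other five values all fall below the mean. Z-scores always average to zero, because they measure signed distances from the mean.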

How far is a given observation from the
mean when its z-score = 2?

Answer: 2 standard deviations.

A case with a z-score of 2 is higher than
approximately what percentage of cases?

Answer: about 97.7%
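The answer comes from the area under the standard normal curve to the left of z = 2. A sketch using Python's standard-library `statistics.NormalDist`, assuming the cases follow a normal distribution:

```python
from statistics import NormalDist

z = 2
share_below = NormalDist(mu=0, sigma=1).cdf(z)  # area under the curve left of z
print(f"A case with z = {z} is higher than about {share_below:.1%} of cases.")
```

This prints about 97.7%; the remaining 2.3% of cases lie above z = 2.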
     Random Sampling Error
• Ever hear a poll report a margin of error? What
  is that?
  Random sampling error = standard deviation /
   square root of the sample size:

   $$\text{Random sampling error} = \frac{\sigma}{\sqrt{N}}$$

  As the variance of the population increases, so
  does the chance that a sample will not reflect the
  population parameters.
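The formula can be tabulated directly. This sketch assumes a hypothetical population standard deviation of 15 and shows how the error shrinks as the sample grows and grows as the population spread increases.

```python
from math import sqrt

def random_sampling_error(sigma, n):
    """Random sampling error of the mean: sigma / sqrt(N)."""
    return sigma / sqrt(n)

# Hypothetical population SD of 15: quadrupling N halves the error.
for n in (25, 100, 400):
    print(f"sigma = 15, N = {n:>3}: error = {random_sampling_error(15, n):.2f}")

# Doubling the population SD doubles the error for the same N.
print(f"sigma = 30, N = 100: error = {random_sampling_error(30, 100):.2f}")
```

Because N sits under a square root, halving the error requires four times as many cases.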
            Standard Error
• We often use one term, the standard error,
  for both the random sampling error (the
  chance of erring when sampling) and the
  error of a specific sample statistic, such
  as the mean.

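• The standard error of a sample statistic
  describes the typical difference between the
  mean of a sample and the mean of the
  population from which it is drawn: it is the
  standard deviation of the sample means we
  would get across repeated samples.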
             Standard Error
Example: What if most humans weighed 200
 pounds and only 1 million globally weighed 250 pounds?

The random sampling error would be low,
 since collecting a sample consisting
 heavily of those heavier humans would be
 unlikely. There would not be much error
 from sampling in general, because the
 variance is low.
             Standard Error
• Example continued. Now, when we take a
  sample, each sample has a mean. If a
  population has low variance, so should the
  samples. We should see this reflected in a
  low standard error of the sample mean (the
  sample statistic).

• Of course, higher variance in the
  population also causes higher error in
  samples taken from it.
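A small simulation can make the weight example concrete. It is a sketch under stated assumptions: two hypothetical normal populations with a mean of 200 pounds, one with a low SD (5) and one with a high SD (50), each sampled 2,000 times with n = 100. The spread of the sample means is the simulated standard error, compared with the theoretical value sd / sqrt(n).

```python
import random
from math import sqrt
from statistics import stdev

def simulated_standard_error(population_sd, n=100, reps=2000, seed=1):
    """Draw `reps` samples of size n from a hypothetical normal population
    (mean 200 lb, given SD) and return the SD of their sample means."""
    rng = random.Random(seed)
    sample_means = [
        sum(rng.gauss(200, population_sd) for _ in range(n)) / n
        for _ in range(reps)
    ]
    return stdev(sample_means)

for sd in (5, 50):  # low-variance vs. high-variance population
    print(f"population SD {sd:>2}: simulated SE = "
          f"{simulated_standard_error(sd):.2f}, "
          f"theory sd/sqrt(n) = {sd / sqrt(100):.2f}")
```

The low-variance population should give a standard error near 0.5 and the high-variance population one near 5.0, which matches the point above: more variance in the population means more error in its samples.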
           Some more notation
  Distribution               Mean        Standard Dev.
  Sample of observed data    $\bar{X}$   $s$
  Population                 $\mu$       $\sigma$
  Repeated samples           $\mu$       $\sigma/\sqrt{N}$ (random sampling error)

Error in a sample's mean is the standard error: $s/\sqrt{n}$
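For example, the standard error s/√n can be computed for the six observations from the first slide with a short Python sketch:

```python
from math import sqrt
from statistics import stdev

values = [70, 100, 100, 150, 150, 500]
s = stdev(values)           # sample standard deviation (n - 1 denominator)
se = s / sqrt(len(values))  # standard error of the sample mean
print(f"s = {s:.2f}, standard error = {se:.2f}")
```

With only six cases, the standard error (about 65.60) is large relative to the mean of 178.3, so this sample mean is a rough estimate of the population mean.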
     Central Limit Theorem

Remember that if we took an infinite number
 of samples from a population, the means
 of these samples would be normally
 distributed, even if the population itself is
 not normal, provided each sample is large enough.

Hence, the larger the sample, the more
 likely the sample mean is to fall close to
 the population mean.
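A small simulation can illustrate both points above. It is a sketch under stated assumptions: the population is a hypothetical exponential distribution (mean 1, SD 1), chosen because it is strongly right-skewed, and the sample sizes (10 and 100) and the 5,000 repetitions are arbitrary choices.

```python
import random
from statistics import fmean, stdev

def sample_means(n, reps=5000, seed=42):
    """Return the means of `reps` random samples of size n, drawn from a
    hypothetical right-skewed (exponential) population with mean 1 and SD 1."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

for n in (10, 100):
    means = sample_means(n)
    # Theory: the means center on 1 with spread 1 / sqrt(n).
    print(f"n = {n:>3}: mean of sample means = {fmean(means):.3f}, "
          f"SD of sample means = {stdev(means):.3f}")
```

The sample means should center on the population mean of 1 in both cases, but their spread shrinks from about 0.32 to about 0.10 as n grows. A histogram of `means` would also look increasingly bell-shaped, even though the population itself is skewed.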
         Confidence Intervals
• We can use what we know about standard
  deviations from the mean to calculate a
  range of values around a sample mean that
  is likely to contain the population mean.

• This range is built so that, across repeated
  samples, it would capture the population
  mean with a probability of .95, or a 5%
  chance of error.
    How Confident Are You?
• Are you 100% sure?
• Social scientists use 95% as a threshold
  to test whether or not results are a
  product of chance.
• That is, we accept a 1 in 20 chance of being wrong.
• What do you MEAN?
We build a 95% confidence interval so that we
  can be 95% confident the population mean lies within that interval.
     Confidence Interval (CI)

   $$\bar{Y} \pm Z_{\alpha/2}\,\sigma_{\bar{Y}}$$

   where $\bar{Y}$ = sample mean
         $Z_{\alpha/2}$ = z-score associated with a 95% CI
         $\sigma_{\bar{Y}}$ = standard error

   sample mean ± 1.96 (or 2) × standard error
                Building a CI
• Assume the following:

   $\bar{Y} = 100$, $\sigma_Y = 15$, $N = 400$

   $$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}} = \frac{15}{\sqrt{400}} = 0.750$$

   $$100 \pm (1.96)(0.750)$$

   Upper = 101.47
   Lower = 98.53
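The same calculation as a small Python sketch. The function name `confidence_interval` is illustrative, not from any library; it takes the mean, standard deviation, and sample size from the example above and defaults to z = 1.96 for a 95% CI.

```python
from math import sqrt

def confidence_interval(mean, sd, n, z=1.96):
    """Mean +/- z * standard error, where standard error = sd / sqrt(n)."""
    se = sd / sqrt(n)
    return mean - z * se, mean + z * se

lower, upper = confidence_interval(mean=100, sd=15, n=400)
print(f"95% CI: {lower:.2f} to {upper:.2f}")  # → 95% CI: 98.53 to 101.47
```

Passing z=2 instead reproduces the "or 2" rule of thumb, giving a slightly wider interval (98.50 to 101.50).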
Why do we use 1.96?
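One way to see where 1.96 comes from: it is the z-score that leaves 2.5% of the standard normal curve in each tail, so the middle 95% lies within ±1.96 standard deviations. A sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
confidence = 0.95
alpha = 1 - confidence
z_crit = std_normal.inv_cdf(1 - alpha / 2)  # leaves alpha/2 in each tail
middle = std_normal.cdf(z_crit) - std_normal.cdf(-z_crit)
print(round(z_crit, 2), round(middle, 3))  # → 1.96 0.95
```

Changing `confidence` to 0.99 gives the critical value for a 99% interval (about 2.58).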
       Calculating a 95% CI
1. Let’s look at the class population
   distribution of height
2. Is it a normal or skewed distribution?
3. Let’s build a 95% CI around the mean
   height of the class
    Why do we care about CI?
• We use CIs for hypothesis testing.
• For instance, we may want to know whether
  there is an income difference between El Paso
  and Boston.
• Or we may want to know whether taking a
  class at Kaplan makes a difference in our
  GRE scores.
    Mean Difference testing
[Figure: income-level distributions for El Paso, Las Cruces, and Boston, compared with the U.S. mean]
