					Measures of Central Tendency

         By Rahul Jain
            The Motivation
• Measures of central tendency are used to
  describe the typical member of a
  population.
• Depending on the type of data, typical could
  have a variety of “best” meanings.
• We will discuss four of these possible
  choices.
 4 Measures of Central Tendency
• Mean – the arithmetic average. This is used for continuous
  data.
• Median – a value that splits the data into two halves, that
  is, one half of the data is smaller than that number, the
  other half larger. May be used for continuous or ordinal
  data.
• Mode – this is the category that has the most data. As the
  description implies it is used for categorical data.
• Midrange – not used as often as the other three, it is found
  by taking the average of the lowest and highest number in
  the data set. Also primarily used for continuous data.
   Measures of Central Tendency
• The central tendency is measured by averages.
  These describe the point about which the
  various observed values cluster.

• In mathematics, an average, or central
  tendency of a data set refers to a measure of
  the "middle" or "expected" value of the data
  set.
                   Mean
• To find the mean, add all of the values, then divide
  by the number of values.
• The lower case Greek letter mu is used for the
  population mean:
      \[ \mu = \frac{\sum x}{N} \]   (Population)
• An “x” with a bar over it, read x-bar, is used for the
  sample mean:
      \[ \bar{x} = \frac{\sum x}{n} \]   (Sample)
            Mean Example
    listing       X
     1            14
     2            17
     3            31
     4            28
     5            42
     6            43
     7            51
     8            51
     9            66
    10            70
    11            67
    12            70
    13            78
    14            62
    15            47
    total        737    (n = 15)

    x-bar = 737/15 = 49.13333
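The calculation in this example can be checked with a few lines of Python. This is only a minimal sketch; the function name `sample_mean` is chosen here for illustration and does not come from the slides.

```python
# Sample values from the Mean Example table (n = 15).
values = [14, 17, 31, 28, 42, 43, 51, 51, 66, 70, 67, 70, 78, 62, 47]

def sample_mean(xs):
    """Add all of the values, then divide by the number of values."""
    return sum(xs) / len(xs)

x_bar = sample_mean(values)
print(f"total = {sum(values)}, n = {len(values)}, x-bar = {x_bar:.5f}")
```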
 Arithmetic Mean of Grouped Data
• If z_1, z_2, z_3, ..., z_k are the mid-values and
  f_1, f_2, f_3, ..., f_k are the corresponding
  frequencies, where the subscript ‘k’ stands
  for the number of classes, then the mean is

      \[ \bar{z} = \frac{\sum f_i z_i}{\sum f_i} \]
Exercise-1: Find the Arithmetic Mean
Class   Frequency (f)   Mid-value (x)   fx
20-29        3              24.5        73.5
30-39        5              34.5        172.5
40-49       20              44.5        890
50-59       10              54.5        545
60-69        5              64.5        322.5
Sum       N = 43                        2003.5
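As a rough sketch of how the grouped-mean formula applies to this table, the snippet below recomputes the mid-values and the sum of f·x, then divides by the total frequency. The helper name `grouped_mean` is an illustrative choice, not something defined in the slides.

```python
# (lower limit, upper limit, frequency) for each class in Exercise-1.
classes = [(20, 29, 3), (30, 39, 5), (40, 49, 20), (50, 59, 10), (60, 69, 5)]

def grouped_mean(rows):
    """Mean of grouped data: sum(f * z) / sum(f), with z the class mid-value."""
    total_f = sum(f for _, _, f in rows)
    total_fz = sum(f * (lo + hi) / 2 for lo, hi, f in rows)
    return total_fz / total_f

mean_ex1 = grouped_mean(classes)
print(round(mean_ex1, 2))
```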
                      Median
• The median is a number chosen so that half of the
  values in the data set are smaller than that number,
  and the other half are larger.
• To find the median
   – List the numbers in ascending order
   – If there is a number in the middle (odd number of
     values) that is the median
   – If there is not a middle number (even number of values)
     take the two in the middle, their average is the median
               Median Example
  n = 15 (odd)           n = 16 (even)
listing   X            listing   X
    1     14               1     14
    2     17               2     17
    3     28               3     28
    4     31               4     31
    5     42               5     42
    6     43               6     43
    7     47               7     47
    8     51               8     51
    9     51               9     53
   10     62              10     57
   11     66              11     62
   12     67              12     66
   13     70              13     67
   14     70              14     70
   15     78              15     70
                          16     78

Median = 8th value = 51    Median = (51 + 53)/2 = 52
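The procedure for finding a median (sort, then take the middle value, or average the two middle values when the count is even) can be sketched as follows. The function below is an illustrative implementation, applied to the two lists from this example.

```python
def median(xs):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

odd_list = [14, 17, 28, 31, 42, 43, 47, 51, 51, 62, 66, 67, 70, 70, 78]
even_list = [14, 17, 28, 31, 42, 43, 47, 51, 53, 57, 62, 66, 67, 70, 70, 78]
print(median(odd_list), median(even_list))
```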
                        Median
• The implication of this definition is that a
  median is the middle value of the observations
  such that the number of observations above it
  is equal to the number of observations below
  it.

   If “n” is odd:
      \[ M_e = X_{\frac{n+1}{2}} \]
   If “n” is even:
      \[ M_e = \frac{1}{2}\left[ X_{\frac{n}{2}} + X_{\frac{n}{2}+1} \right] \]
        Median of Grouped Data
       \[ M_e = L_0 + \frac{h}{f_0}\left(\frac{n}{2} - F\right) \]

• L0 = Lower class boundary of the median
       class
• h = Width of the median class
• f0 = Frequency of the median class
• F = Cumulative frequency of the pre-
       median class
     Steps to find Median of grouped data
1.   Compute the less than type cumulative frequencies.
2.   Determine N/2 , one-half of the total number of cases.
3.   Locate the median class: the first class whose cumulative
     frequency is at least N/2 .
4.   Determine the lower class boundary of the median class. This is L0.
5.   Sum the frequencies of all classes prior to the median class.
     This is F.
6.   Determine the frequency of the median class. This is f0.
7.   Determine the class width of the median class. This is h.
         Example: Find Median
Age in years   Number of births   Cumulative number of births
 14.5-19.5           677                  677
 19.5-24.5          1908                 2585
 24.5-29.5          1737                 4322
 29.5-34.5          1040                 5362
 34.5-39.5           294                 5656
 39.5-44.5            91                 5747
 44.5-49.5            16                 5763
  All ages          5763                   -
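A minimal sketch of the grouped-median steps applied to the births table is shown below. The function name `grouped_median` is an assumption made for illustration; the median class is taken to be the first class whose cumulative frequency reaches half of the total.

```python
# (lower boundary, upper boundary, frequency) from the births table.
classes = [
    (14.5, 19.5, 677), (19.5, 24.5, 1908), (24.5, 29.5, 1737),
    (29.5, 34.5, 1040), (34.5, 39.5, 294), (39.5, 44.5, 91), (44.5, 49.5, 16),
]

def grouped_median(rows):
    """Me = L0 + (h / f0) * (n / 2 - F) for the median class."""
    n = sum(f for _, _, f in rows)
    cumulative = 0  # F: cumulative frequency of the classes before the current one
    for lower, upper, f in rows:
        if cumulative + f >= n / 2:  # first class reaching n/2 is the median class
            return lower + (upper - lower) / f * (n / 2 - cumulative)
        cumulative += f
    raise ValueError("no data")

me = grouped_median(classes)
print(round(me, 2))
```

Here the median class is 24.5-29.5, so L0 = 24.5, h = 5, f0 = 1737 and F = 2585.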
                      Mode
• The mode is simply the category or value which
  occurs the most in a data set.
• If a category has radically more than the others, it
  is a mode.
• Generally speaking we do not consider more than
  two modes in a data set.
• No clear guideline exists for deciding how many
  more entries a category must have than the others
  to constitute a mode.
           Obvious Example
• There is obviously more yellow than red or blue.
• Yellow is the mode.
• The mode is the class, not the frequency.

[Bar chart: Beach Ball Production, in thousands (axis 0 to 80), for blue, red and yellow]
                   Bimodal
[Bar chart: Geometry Scores For TASP; categories very bad, bad, neutral, good, very good; vertical axis 0 to 120]
                               No Mode
Category            Frequency
1                   51
2                   51
3                   66
4                   62
5                   65
6                   57
7                   47
8                   43
9                   64

[Bar chart: frequency (axis 0 to 70) for categories 1 to 9]

•    Although the third category is the
     largest, it is not sufficiently
     different to be called the mode.
Example-2: Find Mean, Median and
     Mode of Ungrouped Data

The weekly pocket money for 9 first year pupils was
found to be:

            3 , 12 , 4 , 6 , 1 , 4 , 2 , 5 , 8


            Mean          Median        Mode
             5              4            4
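These three answers can be verified with Python's standard `statistics` module. This is offered only as a quick check; the variable name `pocket_money` is an illustrative choice.

```python
import statistics

pocket_money = [3, 12, 4, 6, 1, 4, 2, 5, 8]

mean_value = statistics.mean(pocket_money)      # arithmetic average
median_value = statistics.median(pocket_money)  # middle value of the sorted data
mode_value = statistics.mode(pocket_money)      # most frequent value

print(mean_value, median_value, mode_value)
```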
         Mode of Grouped Data
        \[ M_0 = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times h \]

• L1 = Lower boundary of modal class
• Δ1 = difference of frequency between
       modal class and class before it
• Δ2 = difference of frequency between
       modal class and class after it
• h = class interval
         Steps of Finding Mode
• Find the modal class which has highest frequency
• L1 = Lower class boundary of modal class
• h = Interval of modal class
• Δ1 = difference of frequency of modal
      class and class before modal class
• Δ2 = difference of frequency of modal class and
      class after modal class
          Example -4: Find Mode
Slope Angle (°)   Midpoint (x)   Frequency (f)   Midpoint × frequency (fx)
    0-4                2               6                  12
    5-9                7              12                  84
   10-14              12               7                  84
   15-19              17               5                  85
   20-24              22               0                   0
   Total                           n = 30           ∑(fx) = 265
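One way to apply the grouped-mode formula to this table is sketched below. Two assumptions are made that the slides do not state: class boundaries lie halfway between the stated limits (so the class 5-9 runs from 4.5 to 9.5), and a missing neighbouring class is treated as having frequency 0. The function name `grouped_mode` is illustrative.

```python
# (lower limit, upper limit, frequency) from the slope-angle table.
classes = [(0, 4, 6), (5, 9, 12), (10, 14, 7), (15, 19, 5), (20, 24, 0)]

def grouped_mode(rows):
    """Mo = L1 + d1 / (d1 + d2) * h for the class with the highest frequency."""
    freqs = [f for _, _, f in rows]
    i = freqs.index(max(freqs))              # position of the modal class
    lo, hi, f = rows[i]
    gap = rows[1][0] - rows[0][1]            # gap between stated limits (assumed constant)
    lower_boundary = lo - gap / 2            # L1 (assumed boundary convention)
    h = (hi - lo) + gap                      # class interval
    before = freqs[i - 1] if i > 0 else 0    # assumption: 0 if there is no class before
    after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1, d2 = f - before, f - after
    return lower_boundary + d1 / (d1 + d2) * h

mo = grouped_mode(classes)
print(round(mo, 2))
```

With these assumptions the modal class is 5-9, Δ1 = 12 - 6 = 6, Δ2 = 12 - 7 = 5 and h = 5.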
                Midrange
• The midrange is the average of the lowest
  and highest value in the data set.
• This measure is not often used since it is
  based strictly on the two extreme values in
  the data.
      Midrange Example
       X
min   14
      17
      28
      31
      42
      43
      47
      51
      51
      62
      66
      67
      70
      70
max   78

midrange = (14 + 78)/2 = 46
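The same calculation as a short Python check, using the values from this example:

```python
values = [14, 17, 28, 31, 42, 43, 47, 51, 51, 62, 66, 67, 70, 70, 78]

# Midrange: average of the lowest and highest value.
midrange = (min(values) + max(values)) / 2
print(midrange)
```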
                Measures of Variation
[Chart: frequencies of x and y (vertical axis 0 to 200) over values from about -6.3 to 7.0]
Same mean, but y varies more than x.
   Three Measures of Variation
• While there are other measures, we will look at
  only three:
   – Variance
   – Standard deviation
   – Coefficient of variation
• Population mean and sample mean use an identical
  formula for calculation.
• There is a minor difference in the formulas for
  variation.
               Population Variance
• The population variance, σ², is found using either of the
  following formulas:
      \[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
      \[ \sigma^2 = \frac{\sum x^2}{N} - \mu^2 \]
• The differences are squared to prevent the sum from being zero
  for all cases.
• N is the size of the population, μ is the population mean.
• Note that variance is always positive if x can take on more
  than one value.
  Population Standard Deviation
• The standard deviation can be thought of as
  the average amount we could expect the x’s
  in the population to differ from the mean
  value of the population.
• To get the standard deviation, simply take
  the square root of the variance.
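The sketch below computes a population variance in both the definitional form (average squared difference from the mean) and the computational form (mean of the squares minus the square of the mean), then takes the square root. As an assumption made purely for illustration, the 15 values from the Mean Example are treated as a complete population.

```python
import math

# Assumption: these 15 values are treated as a whole population.
population = [14, 17, 31, 28, 42, 43, 51, 51, 66, 70, 67, 70, 78, 62, 47]
N = len(population)
mu = sum(population) / N

# Definitional form: average squared difference from the mean.
var_def = sum((x - mu) ** 2 for x in population) / N
# Computational form: mean of the squares minus the square of the mean.
var_comp = sum(x * x for x in population) / N - mu ** 2

sigma = math.sqrt(var_def)  # population standard deviation
print(round(var_def, 4), round(var_comp, 4), round(sigma, 4))
```

Both forms should agree up to floating-point rounding.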
                  Sample Variance
• The sample variance, s², is
  found using either of the
  formulas to the right.
• The differences are squared to
  prevent the sum from being zero
  for all cases.
• The sample size is n, x-bar is
  the sample mean.
• Note that n-1 is used rather than
  n. This adjustment prevents bias
  in the estimate.
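For reference, the usual textbook forms of the sample variance are given below. They are stated here as the standard definitional and computational versions, which is an assumption about what the slide displayed.

```latex
s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}
\qquad\text{or}\qquad
s^2 = \frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}
```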
    Sample Standard Deviation
• Just like the standard deviation of a
  population, to find the standard deviation of
  a sample, take the square root of the sample
  variance.
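A small sketch of the sample calculation follows, using the 15 values from the Mean Example as the sample. It divides by n - 1 and compares the result with `statistics.variance` and `statistics.stdev` from the Python standard library, which use the same divisor.

```python
import math
import statistics

sample = [14, 17, 31, 28, 42, 43, 51, 51, 66, 70, 67, 70, 78, 62, 47]
n = len(sample)
x_bar = sum(sample) / n

s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # note n - 1, not n
s = math.sqrt(s2)                                     # sample standard deviation

# Cross-check against the standard library (same n - 1 divisor).
matches_library = (math.isclose(s2, statistics.variance(sample))
                   and math.isclose(s, statistics.stdev(sample)))
print(round(s2, 2), round(s, 2), matches_library)
```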
      Coefficient of Variation
• The measures discussed so far are primarily
  useful when comparing members from the
  same population, or comparing similar
  populations.
• When looking at two or more dissimilar
  populations, it doesn’t make any more sense
  to compare standard deviations than it does
  to compare means.
   Coefficient of Variation Cont.
• Example 1: Weight loss programs A and B.
• Two different programs with the same goal and target
  population.
• While program B averages more weight loss, it also has
  less consistent results.

                                   A    B
  Mean (weight loss per month)     20   25
  Standard deviation               15   30
      Coefficient of Variation Cont.
• Example 2: Weight loss program A and tax refund B.
• Two different programs with different goals and different
  target populations.
• We know that average weight loss and average tax refund
  are not comparable. Are the standard deviations
  comparable?

                          A    B
  Mean                    20   650
  Standard deviation      15   30
   Coefficient of Variation Cont.
• In the last example we can see an argument that
  standard deviation does not give the complete
  picture.
• The coefficient of variation addresses this issue
  by establishing a ratio of the standard deviation
  to the mean. This ratio is expressed as a
  percentage.

     \[ CV = \frac{100\,s}{\bar{x}} \text{ (sample)} \quad\text{or}\quad CV = \frac{100\,\sigma}{\mu} \text{ (population)} \]
   Coefficient of Variation Cont.
• Looking at the two examples, we see that in both cases the
  standard deviation for B is twice that of A.
• In the first example, B has 1.6 times the relative
  variation of A.
• In the second example, A has a little over 16 times as much
  relative variation as B.

                   A      B
  CV Example 1     75%    120%
  CV Example 2     75%    4.6%
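The percentages for the two examples can be reproduced with the short sketch below. The function name `coefficient_of_variation` and the dictionary layout are illustrative choices; the means and standard deviations are those given in Examples 1 and 2.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: 100 * standard deviation / mean."""
    return 100 * std_dev / mean

# (mean, standard deviation) for A and B in each example.
examples = {
    "Example 1": {"A": (20, 15), "B": (25, 30)},
    "Example 2": {"A": (20, 15), "B": (650, 30)},
}

results = {
    name: {k: round(coefficient_of_variation(sd, m), 1) for k, (m, sd) in programs.items()}
    for name, programs in examples.items()
}
for name, cvs in results.items():
    print(name, cvs)
```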
          Measures of Position




The dot on the left is at about -1, the dot on the right is at
approximately 0.8. But where are they relative to the rest
of the values in this distribution?
   Quartiles, Percentiles and Other
               Fractiles
• We will only consider the quartile, but the same
  concept is often extended to percentiles or other
  fractiles.
• The median is a good starting point for finding the
  quartiles.
• Recall that to find the median, we wanted to locate
  a point so that half of the data was smaller, and the
  other half larger than that point.
                      Quartile
• For quartiles, we want to divide our data
  into 4 equal pieces.

  Suppose we had the following data set (already in order)

          2 3 7 8 8 8 9 13 17 20 21 21


  Choosing the numbers 7.5, 8.5, and 18.5 as markers would
  divide the data into 4 groups, each with three elements.
  These numbers would be the three quartiles for this data set.
             Quartiles Continued
• Conceptually, this is easy: find the median, then
  treat the left hand side as if it were a data set and find its
  median; then do the same to the right hand side.
• This is not always simple. Consider the following data set.
• 3 3 3 3 3 5 6 8 8 8 8 8 9
• The first difficulty is that the data set does not divide
  nicely.
• Using the rules for finding a median, we would get
  quartiles of 3, 6 and 8.
• The second difficulty is how many of the 3’s are in the first
  quartile, and how many in the second?
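The "median of each half" idea can be sketched as follows. One assumption is made that the slides leave open: when the number of values is odd, the middle value is left out of both halves. Other conventions exist and give slightly different quartiles. The function names are illustrative.

```python
def median(sorted_xs):
    """Median of an already sorted list."""
    n = len(sorted_xs)
    mid = n // 2
    return sorted_xs[mid] if n % 2 else (sorted_xs[mid - 1] + sorted_xs[mid]) / 2

def quartiles(xs):
    """Q1, Q2, Q3 by the median-of-halves rule (middle value excluded when n is odd)."""
    s = sorted(xs)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(s), median(upper)

print(quartiles([2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21]))
print(quartiles([3, 3, 3, 3, 3, 5, 6, 8, 8, 8, 8, 8, 9]))
```

Under this convention the two data sets from the slides give 7.5, 8.5, 18.5 and 3, 6, 8 respectively.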
          Quartiles Continued
• For this course, let’s pretend that this is not
  an issue.
• I will give you the quartiles.
• I will not ask how many are in a quartile.
         Interquartile Range
• One method for identifying outliers
  involves the use of quartiles.
• The interquartile range (IQR) is Q3 – Q1.
• All numbers less than Q1 – 1.5(IQR) are
  probably too small.
• All numbers greater than Q3 + 1.5(IQR) are
  probably too large.
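A minimal sketch of the 1.5(IQR) rule is given below. The quartiles 7.5 and 18.5 are the ones quoted earlier for the 12-value data set; the observation 40 is a hypothetical extra value added only to show a point being flagged. The function names are illustrative.

```python
def outlier_fences(q1, q3):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag_outliers(xs, q1, q3):
    """Values below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in xs if x < low or x > high]

# 40 is a hypothetical value appended for illustration.
data = [2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21, 40]
fences = outlier_fences(7.5, 18.5)
flagged = flag_outliers(data, 7.5, 18.5)
print(fences, flagged)
```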
              Measures of Variation:
          Variance & Standard Deviation
              for GROUPED DATA
• The grouped variance is
  \[ s^2 = \frac{n \sum f \cdot X_m^2 - \left(\sum f \cdot X_m\right)^2}{n(n-1)}
     \quad\text{or}\quad
     s^2 = \frac{\sum f \cdot (X_m - \bar{X})^2}{n - 1} \]

• The grouped standard deviation is
  \[ s = \sqrt{s^2} \]



      Example 3-24: Miles Run per Week (p130)

  Find the variance and the standard deviation for the frequency distribution
  below. The data represent the number of miles that 20 runners ran during
  one week.

      Class          f    Xm    f·Xm           f·(Xm – X̄)²
     5.5 – 10.5      1     8    1·8 = 8        1(8 – 24.5)² = 272.25
    10.5 – 15.5      2    13    2·13 = 26      2(13 – 24.5)² = 264.50
    15.5 – 20.5      3    18    3·18 = 54      3(18 – 24.5)² = 126.75
    20.5 – 25.5      5    23    5·23 = 115     5(23 – 24.5)² = 11.25
    25.5 – 30.5      4    28    4·28 = 112     4(28 – 24.5)² = 49.00
    30.5 – 35.5      3    33    3·33 = 99      3(33 – 24.5)² = 216.75
    35.5 – 40.5      2    38    2·38 = 76      2(38 – 24.5)² = 364.50
                 n = 20         Σf·Xm = 490    Σf·(Xm – X̄)² = 1305.00

  \[ \bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{490}{20} = 24.5 \]
  \[ s^2 = \frac{\sum f \cdot (X_m - \bar{X})^2}{n - 1} = \frac{1305.00}{20 - 1} \approx 68.684211 \]
  \[ s = \sqrt{s^2} = \sqrt{68.684211} \approx 8.2876 \approx 8.3 \]
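The grouped variance for this table can be recomputed from the class midpoints and frequencies. The sketch below evaluates both forms of the grouped formula (definitional and computational) so that they can be compared; the variable names are illustrative.

```python
import math

# (class midpoint Xm, frequency f) for the miles-run table.
rows = [(8, 1), (13, 2), (18, 3), (23, 5), (28, 4), (33, 3), (38, 2)]

n = sum(f for _, f in rows)
sum_fx = sum(f * xm for xm, f in rows)
sum_fx2 = sum(f * xm ** 2 for xm, f in rows)
mean = sum_fx / n

# Definitional form: sum of f * (Xm - mean)^2 over n - 1.
s2_def = sum(f * (xm - mean) ** 2 for xm, f in rows) / (n - 1)
# Computational form: [n * sum(f * Xm^2) - (sum(f * Xm))^2] / [n * (n - 1)].
s2_comp = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

s = math.sqrt(s2_def)
print(n, sum_fx, mean, round(s2_def, 2), round(s, 1))
```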
                Mean Deviation
• The mean deviation is an average of the absolute
  deviations of individual observations from the central
  value of a series. The average deviation about the mean is

      \[ MD_{\bar{x}} = \frac{\sum_{i=1}^{k} f_i \, |x_i - \bar{x}|}{n} \]
• k = Number of classes
• xi= Mid point of the i-th class
• fi= frequency of the i-th class
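As an illustration of the formula, the sketch below computes the mean deviation about the mean for the Exercise-1 classes (mid-points 24.5 to 64.5). Reusing that table here is a choice made for this example; the function name `mean_deviation` is illustrative.

```python
# (class mid-point x_i, frequency f_i), taken from the Exercise-1 table.
rows = [(24.5, 3), (34.5, 5), (44.5, 20), (54.5, 10), (64.5, 5)]

def mean_deviation(rows):
    """Mean deviation about the mean: sum(f_i * |x_i - x_bar|) / n."""
    n = sum(f for _, f in rows)
    x_bar = sum(f * x for x, f in rows) / n
    return sum(f * abs(x - x_bar) for x, f in rows) / n

md = mean_deviation(rows)
print(round(md, 2))
```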
     Coefficient of Mean Deviation

• The third relative measure is the coefficient of mean
  deviation. As the mean deviation can be computed from
  mean, median, mode, or from any arbitrary value, a general
  formula for computing coefficient of mean deviation may
  be put as follows:


    \[ \text{Coefficient of mean deviation} = \frac{\text{Mean deviation}}{\text{Mean}} \times 100 \]
           Coefficient of Range
• The coefficient of range is a relative measure
  corresponding to range and is obtained by the
  following formula:

         \[ \text{Coefficient of range} = \frac{L - S}{L + S} \times 100 \]

• where, “L” and “S” are respectively the largest and
  the smallest observations in the data set.
  Coefficient of Quartile Deviation

• The coefficient of quartile deviation is
  computed from the first and the third
  quartiles using the following formula:

    \[ \text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100 \]
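The coefficient of range and the coefficient of quartile deviation share the same "difference over sum" shape, which the sketch below makes explicit. The inputs are borrowed from earlier slides purely for illustration: the midrange example (smallest 14, largest 78) and the quartile example (Q1 = 7.5, Q3 = 18.5). The function names are illustrative.

```python
def coefficient_of_range(largest, smallest):
    """(L - S) / (L + S) * 100."""
    return (largest - smallest) / (largest + smallest) * 100

def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1) * 100."""
    return (q3 - q1) / (q3 + q1) * 100

coef_range = coefficient_of_range(78, 14)
coef_qd = coefficient_of_quartile_deviation(7.5, 18.5)
print(round(coef_range, 2), round(coef_qd, 2))
```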
                 Assignment-1
• Find the following measures of dispersion
  from the data set given on the next page:

   – Range, Percentile range, Quartile Range
   – Quartile deviation, Mean deviation, Standard deviation
   – Coefficient of variation, Coefficient of mean deviation,
     Coefficient of range, Coefficient of quartile deviation
    Data for Assignment-1
 Marks    No. of students   Cumulative
                            frequencies
40-50           6               6
50-60           11              17
60-70           19              36
70-80           17              53
80-90           13              66
90-100          4               70
Total           70

				