# PowerPoint Presentation

Measures of Central Tendency

By Rahul Jain
The Motivation
• Measures of central tendency are used to
describe the typical member of a
population.
• Depending on the type of data, typical could
have a variety of “best” meanings.
• We will discuss four of these possible
choices.
4 Measures of Central Tendency
• Mean – the arithmetic average. This is used for continuous
data.
• Median – a value that splits the data into two halves, that
is, one half of the data is smaller than that number, the
other half larger. May be used for continuous or ordinal
data.
• Mode – this is the category that has the most data. As the
description implies it is used for categorical data.
• Midrange – not used as often as the other three, it is found
by taking the average of the lowest and highest number in
the data set. Also primarily used for continuous data.
Measures of Central Tendency
• The central tendency is measured by averages.
These describe the point about which the
various observed values cluster.

• In mathematics, an average, or central
tendency of a data set refers to a measure of
the "middle" or "expected" value of the data
set.
Mean
• To find the mean, add all of the values, then divide by the number of values.
Population
• The lower case Greek letter mu is used for population mean.
$$\mu = \frac{\sum x}{N}$$
Sample
• An “x” with a bar over it, read x-bar, is used for sample mean.
$$\bar{x} = \frac{\sum x}{n}$$
Mean Example
listing    X
1          14
2          17
3          31
4          28
5          42
6          43
7          51
8          51
9          66
10         70
11         67
12         70
13         78
14         62
15         47
total      737

n = 15
x-bar = 737/15 = 49.13333
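The calculation in this example can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is chosen here for illustration and does not come from the slides.

```python
# The 15 values from the "Mean Example" slide.
values = [14, 17, 31, 28, 42, 43, 51, 51, 66, 70, 67, 70, 78, 62, 47]


def arithmetic_mean(xs):
    """Add all of the values, then divide by the number of values."""
    return sum(xs) / len(xs)


total = sum(values)
x_bar = arithmetic_mean(values)
print(total, len(values), round(x_bar, 5))
```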
Arithmetic Mean of Grouped Data
• If z_1, z_2, z_3, ..., z_k are the mid-values and
f_1, f_2, f_3, ..., f_k are the corresponding
frequencies, where the subscript ‘k’ stands
for the number of classes, then the mean is
$$\bar{z} = \frac{\sum f_i z_i}{\sum f_i}$$
Exercise-1: Find the Arithmetic Mean
Class    Frequency (f)    x       fx
20-29    3                24.5    73.5
30-39    5                34.5    172.5
40-49    20               44.5    890
50-59    10               54.5    545
60-69    5                64.5    322.5
Sum      N=43                     2003.5
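To finish the exercise, the sum of the fx column is divided by the sum of the frequencies. The sketch below applies the grouped-mean formula to the table; the function and variable names are illustrative assumptions.

```python
# Mid-values (x) and frequencies (f) from the Exercise-1 table.
mid_values = [24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [3, 5, 20, 10, 5]


def grouped_mean(mids, freqs):
    """Mean of grouped data: sum(f * z) / sum(f)."""
    sum_fz = sum(f * z for f, z in zip(freqs, mids))
    return sum_fz / sum(freqs)


sum_fx = sum(f * z for f, z in zip(frequencies, mid_values))
mean = grouped_mean(mid_values, frequencies)
print(sum_fx, sum(frequencies), round(mean, 2))
```

With these figures the mean is 2003.5/43, or roughly 46.59.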
Median
• The median is a number chosen so that half of the
values in the data set are smaller than that number,
and the other half are larger.
• To find the median
– List the numbers in ascending order
– If there is a number in the middle (odd number of
values) that is the median
– If there is not a middle number (even number of values)
take the two in the middle, their average is the median
Median Example
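listing   X        listing   X
1         14       1         14
2         17       2         17
3         28       3         28
4         31       4         31
5         42       5         42
6         43       6         43
7         47       7         47
8         51       8         51
9         51       9         53
10        62       10        57
11        66       11        62
12        67       12        66
13        70       13        67
14        70       14        70
15        78       15        70
                   16        78

Left list (15 values): the middle (8th) value is the median, 51.
Right list (16 values): the median is the average of the 8th and 9th values, (51 + 53)/2 = 52.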
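The two cases (odd and even number of values) can be sketched in Python as follows. The names `median`, `odd_list` and `even_list` are chosen here for illustration; the data are the two lists from the example.

```python
def median(xs):
    """Sort the values, then take the middle one (odd count)
    or the average of the two middle ones (even count)."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_list = [14, 17, 28, 31, 42, 43, 47, 51, 51, 62, 66, 67, 70, 70, 78]
even_list = [14, 17, 28, 31, 42, 43, 47, 51, 53, 57, 62, 66, 67, 70, 70, 78]

print(median(odd_list), median(even_list))
```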
Median
• The implication of this definition is that a
median is the middle value of the observations
such that the number of observations above it
is equal to the number of observations below
it.

If “n” is odd:
$$M_e = X_{\frac{n+1}{2}}$$
If “n” is even:
$$M_e = \frac{1}{2}\left( X_{\frac{n}{2}} + X_{\frac{n}{2}+1} \right)$$
Median of Grouped Data
$$M_e = L_0 + \frac{h}{f_0}\left(\frac{n}{2} - F\right)$$

• L0 = Lower class boundary of the median class
• h = Width of the median class
• f0 = Frequency of the median class
• F = Cumulative frequency of the pre-median class
Steps to find Median of grouped data
1.   Compute the less-than type cumulative frequencies.
2.   Determine n/2, one-half of the total number of cases.
3.   Locate the median class: the first class whose cumulative frequency is
at least n/2.
4.   Determine the lower class boundary of the median class. This is L0.
5.   Sum the frequencies of all classes prior to the median class.
This is F.
6.   Determine the frequency of the median class. This is f0.
7.   Determine the class width of the median class. This is h.
Example: Find Median
Age in years   Number of births   Cumulative number of births
14.5-19.5      677                677
19.5-24.5      1908               2585
24.5-29.5      1737               4322
29.5-34.5      1040               5362
34.5-39.5      294                5656
39.5-44.5      91                 5747
44.5-49.5      16                 5763
All ages       5763               -
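The grouped-median formula can be applied to this table with a short Python sketch. The function name `grouped_median` and the way the table is encoded are assumptions made here for illustration. The cumulative frequency is rebuilt from the class frequencies rather than read from the table.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data: L0 + (h / f0) * (n/2 - F).

    boundaries: list of (lower, upper) class boundaries
    freqs: frequency of each class, in the same order
    """
    n = sum(freqs)
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= half:
            h = upper - lower  # width of the median class
            return lower + h / f * (half - cumulative)
        cumulative += f
    raise ValueError("frequency table is empty")


age_classes = [(14.5, 19.5), (19.5, 24.5), (24.5, 29.5), (29.5, 34.5),
               (34.5, 39.5), (39.5, 44.5), (44.5, 49.5)]
births = [677, 1908, 1737, 1040, 294, 91, 16]

median_age = grouped_median(age_classes, births)
print(round(median_age, 2))
```

Here n/2 = 2881.5, so the median class is 24.5-29.5 with F = 2585 and f0 = 1737, which gives a median age of about 25.35 years.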
Mode
• The mode is simply the category or value which
occurs the most in a data set.
• If a category has radically more than the others, it
is a mode.
• Generally speaking we do not consider more than
two modes in a data set.
• No clear guideline exists for deciding how many
more entries a category must have than the others
to constitute a mode.
Obvious Example
• There is obviously more yellow than red or blue.
• Yellow is the mode.
• The mode is the class, not the frequency.
[Figure: bar chart "Beach Ball Production" (thousands) for blue, red and yellow]
Bimodal
[Figure: frequency chart "Geometry Scores For TASP"]
No Mode
Category    Frequency
1           51
2           51
3           66
4           62
5           65
6           57
7           47
8           43
9           64
[Figure: bar chart of the frequencies for categories 1 to 9]
• Although the third category is the largest, it is not sufficiently
different to be called the mode.
Example-2: Find Mean, Median and Mode of Ungrouped Data

The weekly pocket money for 9 first year pupils was
found to be:

3 , 12 , 4 , 6 , 1 , 4 , 2 , 5 , 8

Mean          Median        Mode
5              4            4
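These three results can be checked with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are illustrative.

```python
import statistics

# Weekly pocket money for the 9 first year pupils.
pocket_money = [3, 12, 4, 6, 1, 4, 2, 5, 8]

mean = statistics.mean(pocket_money)      # sum of values / number of values
median = statistics.median(pocket_money)  # middle value of the sorted list
mode = statistics.mode(pocket_money)      # most frequent value

print(mean, median, mode)
```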
Mode of Grouped Data
$$M_0 = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times h$$

• L1 = Lower boundary of modal class
• Δ1 = difference of frequency between
modal class and class before it
• Δ2 = difference of frequency between
modal class and class after it
• h = class interval
Steps of Finding Mode
• Find the modal class, which has the highest frequency
• L1 = Lower class boundary of modal class
• h = Interval of modal class
• Δ1 = difference of frequency of modal
class and class before modal class
• Δ2 = difference of frequency of modal class and
class after modal class
Example -4: Find Mode
Slope Angle (°)   Midpoint (x)   Frequency (f)   Midpoint × frequency (fx)
0-4               2              6               12
5-9               7              12              84
10-14             12             7               84
15-19             17             5               85
20-24             22             0               0
Total                            n = 30          ∑(fx) = 265
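A sketch of the grouped-mode calculation for this table follows. One assumption is made that the slides do not state: because the class limits are whole degrees (0-4, 5-9, ...), the class boundaries are taken half a unit below each lower limit (-0.5, 4.5, 9.5, ...), giving a class interval of 5. The function name `grouped_mode` is also illustrative.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode of grouped data: L1 + d1 / (d1 + d2) * h."""
    # Index of the modal class (the class with the highest frequency).
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - before  # modal class vs. class before it
    d2 = freqs[i] - after   # modal class vs. class after it
    return lower_boundaries[i] + d1 / (d1 + d2) * width


# Assumed class boundaries for the classes 0-4, 5-9, 10-14, 15-19, 20-24.
lower_boundaries = [-0.5, 4.5, 9.5, 14.5, 19.5]
frequencies = [6, 12, 7, 5, 0]

mode = grouped_mode(lower_boundaries, frequencies, width=5)
print(round(mode, 2))
```

Under that assumption the modal class is 5-9, with Δ1 = 12 - 6 = 6 and Δ2 = 12 - 7 = 5, so the mode is 4.5 + (6/11) × 5, roughly 7.23°.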
Midrange
• The midrange is the average of the lowest
and highest value in the data set.
• This measure is not often used since it is
based strictly on the two extreme values in
the data.
Midrange Example
X
14   (min)
17
28
31
42
43
47
51
51
62
66
67
70
70
78   (max)

midrange = (14 + 78)/2 = 46
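In code, the midrange needs only the smallest and largest values. A minimal sketch, with an illustrative function name:

```python
def midrange(xs):
    """Average of the lowest and highest value in the data set."""
    return (min(xs) + max(xs)) / 2


data = [14, 17, 28, 31, 42, 43, 47, 51, 51, 62, 66, 67, 70, 70, 78]
print(midrange(data))
```

Because only two values enter the calculation, changing any of the other thirteen values leaves the result unchanged, which is why the measure is rarely used.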
Measures of Variation
[Figure: distributions of x and y plotted on the same axis]
Same mean, but y varies more than x.
Three Measures of Variation
• While there are other measures, we will look at
only three:
– Variance
– Standard deviation
– Coefficient of variation
• Population mean and sample mean use an identical
formula for calculation.
• There is a minor difference in the formulas for
variation.
Population Variance
• The population variance, σ², is found using either of the following formulas.
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
$$\sigma^2 = \frac{\sum x^2}{N} - \mu^2$$
• The differences are squared to prevent the sum from being zero for all cases.
• N is the size of the population, μ is the population mean.
• Note that variance is always positive if x can take on more than one value.
Population Standard Deviation
• The standard deviation can be thought of as
the average amount we could expect the x’s
in the population to differ from the mean
value of the population.
• To get the standard deviation, simply take
the square root of the variance.
Sample Variance
• The sample variance, s², is found using either of the following formulas.
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
$$s^2 = \frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$$
• The differences are squared to prevent the sum from being zero for all cases.
• The sample size is n, x-bar is the sample mean.
• Note that n − 1 is used rather than n in the estimate.
Sample Standard Deviation
• Just like the standard deviation of a
population, to find the standard deviation of
a sample, take the square root of the sample
variance.
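The difference between the population and sample versions comes down to the divisor. The sketch below computes both for the 15 values from the earlier mean example; treating the same list first as a whole population and then as a sample is an assumption made purely for illustration. The results are cross-checked against Python's standard `statistics` module.

```python
import math
import statistics

values = [14, 17, 31, 28, 42, 43, 51, 51, 66, 70, 67, 70, 78, 62, 47]


def population_variance(xs):
    """Squared differences from the mean, divided by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """Squared differences from the mean, divided by n - 1."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)


# Standard deviation is the square root of the variance.
sigma = math.sqrt(population_variance(values))
s = math.sqrt(sample_variance(values))

print(round(sigma, 3), round(s, 3))
print(math.isclose(sigma, statistics.pstdev(values)),
      math.isclose(s, statistics.stdev(values)))
```

The sample value is always slightly larger than the population value for the same data, because the same sum of squares is divided by a smaller number.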
Coefficient of Variation
• The measures discussed so far are primarily
useful when comparing members from the
same population, or comparing similar
populations.
• When looking at two or more dissimilar
populations, it doesn’t make any more sense
to compare standard deviations than it does
to compare means.
Coefficient of Variation Cont.
• Example 1: Weight loss programs A and B.
• Two different programs with the same goal and target population.
• While program B averages more weight loss, it also has less consistent results.

                                 A    B
Mean (weight loss per month)     20   25
Standard deviation               15   30
Coefficient of Variation Cont.
• Example 2: Weight loss program A and tax refund B.
• Two different programs with different goals and different target populations.
• We know that average weight loss and average tax refund are not comparable.
Are the standard deviations comparable?

                       A    B
Mean                   20   650
Standard deviation     15   30
Coefficient of Variation Cont.
• In the last example we can see an argument that
standard deviation does not give the complete
picture.
• The coefficient of variation addresses this issue
by establishing a ratio of the standard deviation
to the mean. This ratio is expressed as a
percentage.

$$CV = \frac{100\,s}{\bar{x}} \text{ (sample)} \quad \text{or} \quad CV = \frac{100\,\sigma}{\mu} \text{ (population)}$$
Coefficient of Variation Cont.
• Looking at the two examples, we see that in both cases the standard
deviation for B is twice that of A.
• In the first example, B has 1.6 times the relative variation of A.
• In the second example, A has a little over 16 times as much relative
variation as B.

                A      B
CV Example 1    75%    120%
CV Example 2    75%    4.6%
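The CV figures for both examples can be reproduced with a small helper. The function name `coefficient_of_variation` is chosen here for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Ratio of the standard deviation to the mean, as a percentage."""
    return 100 * std_dev / mean


# Example 1: weight loss programs A and B.
cv_a = coefficient_of_variation(15, 20)
cv_b1 = coefficient_of_variation(30, 25)

# Example 2: weight loss program A and tax refund B.
cv_b2 = coefficient_of_variation(30, 650)

print(cv_a, cv_b1, round(cv_b2, 1))
print(round(cv_b1 / cv_a, 2), round(cv_a / cv_b2, 2))
```

The last line prints the two ratios of relative variation: B to A in the first example and A to B in the second.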
Measures of Position

The dot on the left is at about -1, the dot on the right is at
approximately 0.8. But where are they relative to the rest
of the values in this distribution?
Quartiles, Percentiles and Other Fractiles
• We will only consider the quartile, but the same
concept is often extended to percentages or other
fractions.
• The median is a good starting point for finding the
quartiles.
• Recall that to find the median, we wanted to locate
a point so that half of the data was smaller, and the
other half larger than that point.
Quartile
• For quartiles, we want to divide our data
into 4 equal pieces.

2 3 7 8 8 8 9 13 17 20 21 21

Choosing the numbers 7.5, 8.5, and 18.5 as markers would
divide the data into 4 groups, each with three elements.
These numbers would be the three quartiles for this data set.
Quartiles Continued
• Conceptually, this is easy, simply find the median, then
treat the left hand side as if it were a data set, and find its
median; then do the same to the right hand side.
• This is not always simple. Consider the following data set.
• 3 3 3 3 3 5 6 8 8 8 8 8 9
• The first difficulty is that the data set does not divide
nicely.
• Using the rules for finding a median, we would get
quartiles of 3, 6 and 8.
• The second difficulty is how many of the 3’s are in the first
quartile, and how many in the second?
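The "median of each half" procedure described above can be sketched in Python. One convention is assumed here that the slides leave open: when the number of values is odd, the middle value is left out of both halves. Other conventions exist and can give slightly different quartiles. The function names are illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(xs):
    """Q1, Q2, Q3 by taking the median, then the median of each half.

    Assumption: for an odd count, the middle value is excluded from both halves.
    """
    ordered = sorted(xs)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


first_set = [2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21]
second_set = [3, 3, 3, 3, 3, 5, 6, 8, 8, 8, 8, 8, 9]

print(quartiles(first_set))
print(quartiles(second_set))
```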
Quartiles Continued
• For this course, let’s pretend that this is not
an issue.
• I will give you the quartiles.
• I will not ask how many are in a quartile.
Interquartile Range
• One method for identifying outliers (values that are unusually small
or unusually large) involves the use of quartiles.
• The interquartile range (IQR) is Q3 – Q1.
• All numbers less than Q1 – 1.5(IQR) are
probably too small.
• All numbers greater than Q3 + 1.5(IQR) are
probably too large.
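The rule above translates directly into code. In this sketch the quartiles are supplied rather than computed, in keeping with the course convention that they are given. The data list reuses the earlier 12-value quartile example with one hypothetical extra value (40) added only to show a flagged point; the function names are also illustrative.

```python
def outlier_fences(q1, q3):
    """Lower and upper limits: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def flag_outliers(xs, q1, q3):
    """Values below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in xs if x < low or x > high]


# 40 is a hypothetical value added for illustration only.
data = [2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21, 40]
q1, q3 = 7.5, 18.5  # quartiles taken as given

print(outlier_fences(q1, q3))
print(flag_outliers(data, q1, q3))
```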
Measures of Variation: Variance & Standard Deviation for GROUPED DATA
• The grouped variance is
$$s^2 = \frac{n \sum f \cdot X_m^2 - \left(\sum f \cdot X_m\right)^2}{n(n-1)} \quad \text{or} \quad s^2 = \frac{\sum f \cdot (X_m - \bar{X})^2}{n-1}$$

• The grouped standard deviation is
$$s = \sqrt{s^2}$$
Example 3-24: Miles Run per Week (p130)

Find the variance and the standard deviation for the frequency distribution
below. The data represents the number of miles that 20 runners ran during
one week.

Class          f    Xm    f·Xm           f·(Xm – X̄)²
5.5 – 10.5     1    8     1·8 = 8        1(8 – 24.5)² = 272.25
10.5 – 15.5    2    13    2·13 = 26      2(13 – 24.5)² = 264.50
15.5 – 20.5    3    18    3·18 = 54      3(18 – 24.5)² = 126.75
20.5 – 25.5    5    23    5·23 = 115     5(23 – 24.5)² = 11.25
25.5 – 30.5    4    28    4·28 = 112     4(28 – 24.5)² = 49.00
30.5 – 35.5    3    33    3·33 = 99      3(33 – 24.5)² = 216.75
35.5 – 40.5    2    38    2·38 = 76      2(38 – 24.5)² = 364.50
Total          20         Σf·Xm = 490    Σf·(Xm – X̄)² = 1305.00

$$\bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{490}{20} = 24.5 \qquad s^2 = \frac{1305.00}{20 - 1} \approx 68.68$$
$$s = \sqrt{s^2} = \sqrt{68.68} \approx 8.29 \approx 8.3$$
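The arithmetic in this example is easy to get wrong by hand, so a short Python sketch is a useful check. It computes the grouped variance from the class midpoints and frequencies using both forms of the grouped variance formula (the shortcut form and the definition form), which should agree. Variable names are illustrative.

```python
import math

# Class midpoints (Xm) and frequencies (f) for the 20 runners.
midpoints = [8, 13, 18, 23, 28, 33, 38]
frequencies = [1, 2, 3, 5, 4, 3, 2]

n = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))
mean = sum_fx / n

# Shortcut form: [n * sum(f*Xm^2) - (sum(f*Xm))^2] / [n(n - 1)]
var_shortcut = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

# Definition form: sum(f * (Xm - mean)^2) / (n - 1)
var_definition = sum(
    f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)
) / (n - 1)

std_dev = math.sqrt(var_definition)
print(n, sum_fx, mean)
print(round(var_shortcut, 3), round(var_definition, 3), round(std_dev, 1))
```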
Mean Deviation
• The mean deviation is an average of absolute
deviations of individual observations from the central
value of a series. Average deviation about mean:
$$MD_{\bar{x}} = \frac{\sum_{i=1}^{k} f_i \left| x_i - \bar{x} \right|}{n}$$
• k = Number of classes
• xi = Mid point of the i-th class
• fi = frequency of the i-th class
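A minimal sketch of the mean deviation about the mean for grouped data follows. The data are the mid-points and frequencies from the earlier Exercise-1 table, reused here only for illustration; the function name is an assumption.

```python
def grouped_mean_deviation(mids, freqs):
    """Mean deviation about the mean: sum(f * |x - mean|) / n."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    return sum(f * abs(x - mean) for f, x in zip(freqs, mids)) / n


mid_points = [24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [3, 5, 20, 10, 5]

md = grouped_mean_deviation(mid_points, frequencies)
print(round(md, 2))
```

Absolute values are used instead of squares, so the result is in the same units as the data without taking a square root.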
Coefficient of Mean Deviation

• The third relative measure is the coefficient of mean
deviation. As the mean deviation can be computed from
mean, median, mode, or from any arbitrary value, a general
formula for computing coefficient of mean deviation may
be put as follows:

$$\text{Coefficient of mean deviation} = \frac{\text{Mean deviation}}{\text{Mean}} \times 100$$
Coefficient of Range
• The coefficient of range is a relative measure
corresponding to range and is obtained by the
following formula:

$$\text{Coefficient of range} = \frac{L - S}{L + S} \times 100$$

• where, “L” and “S” are respectively the largest and
the smallest observations in the data set.
Coefficient of Quartile Deviation

• The coefficient of quartile deviation is
computed from the first and the third
quartiles using the following formula:

$$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100$$
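The coefficient of range and the coefficient of quartile deviation share the same shape: a difference divided by a sum, times 100. The sketch below assumes the usual forms (L − S)/(L + S) and (Q3 − Q1)/(Q3 + Q1). The inputs are borrowed from earlier slides purely for illustration: the smallest and largest values from the midrange example (14 and 78) and the quartiles from the first quartile example (7.5 and 18.5).

```python
def coefficient_of_range(largest, smallest):
    """(L - S) / (L + S) * 100."""
    return (largest - smallest) / (largest + smallest) * 100


def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1) * 100."""
    return (q3 - q1) / (q3 + q1) * 100


cr = coefficient_of_range(78, 14)
cqd = coefficient_of_quartile_deviation(7.5, 18.5)
print(round(cr, 2), round(cqd, 2))
```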
Assignment-1
• Find the following measures of dispersion
from the data set given on the next page:

– Range, Percentile range, Quartile Range
– Quartile deviation, Mean deviation, Standard deviation
– Coefficient of variation, Coefficient of mean deviation,
Coefficient of range, Coefficient of quartile deviation
Data for Assignment-1
Marks     No. of students   Cumulative frequencies
40-50     6                 6
50-60     11                17
60-70     19                36
70-80     17                53
80-90     13                66
90-100    4                 70
Total     70
