# Numerical Measures

```					Numerical Measures
Numerical Measures
•   Measures of Central Tendency (Location)
•   Measures of Non-Central Location
•   Measures of Variability (Dispersion)
•   Measures of Shape
Measures of Central Tendency (Location)
• Mean
• Median
• Mode
[Figure: distribution curve illustrating central location; horizontal axis 0 to 25, vertical axis 0 to 0.14]
Measures of Non-Central Location
• Quartiles, Mid-Hinges
• Percentiles
[Figure: distribution curve illustrating non-central location; horizontal axis 0 to 25, vertical axis 0 to 0.14]
Measures of Variability
• Variance, standard deviation
• Range
• Inter-Quartile Range
[Figure: distribution curve illustrating variability; horizontal axis 0 to 25, vertical axis 0 to 0.14]
Measures of Shape
• Skewness
[Figure: three distribution curves illustrating skewness; horizontal axes 0 to 25]
• Kurtosis
[Figure: three distribution curves illustrating kurtosis; horizontal axes -3 to 3, 0 to 25, and -3 to 3]
Summation Notation
Let x1, x2, x3, …, xn denote a set of n numbers.
Then the symbol
    \sum_{i=1}^{n} x_i
denotes the sum of these n numbers:
    x1 + x2 + x3 + … + xn
Example
Let x1, x2, x3, x4, x5 denote the set of 5 numbers
in the following table.

i     1    2    3    4    5
xi   10   15   21    7   13

Then the symbol
    \sum_{i=1}^{5} x_i
denotes the sum of these 5 numbers:
    x1 + x2 + x3 + x4 + x5
    = 10 + 15 + 21 + 7 + 13
    = 66
Meaning of parts of summation notation

    \sum_{i=m}^{n} (expression in i)

• i: the quantity changing in each term of the sum
• m: the starting value for i
• n: the final value for i
• expression in i: each term of the sum
Example
Again let x1, x2, x3, x4, x5 denote the set of 5 numbers
in the following table.

i     1    2    3    4    5
xi   10   15   21    7   13

Then the symbol
    \sum_{i=2}^{4} x_i^3
denotes the sum of these 3 numbers:
    x_2^3 + x_3^3 + x_4^3
    = 15^3 + 21^3 + 7^3
    = 3375 + 9261 + 343
    = 12979
Measures of Central Location (Mean)
Let x1, x2, x3, …, xn denote a set of n numbers.
Then the mean of the n numbers is defined as:
    \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n}{n}
Example
Again let x1, x2, x3, x4, x5 denote the set of 5 numbers
in the following table.

i     1    2    3    4    5
xi   10   15   21    7   13

Then the mean of the 5 numbers is:
    \bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5}
            = \frac{10 + 15 + 21 + 7 + 13}{5} = \frac{66}{5} = 13.2
Interpretation of the Mean
Let x1, x2, x3, …, xn denote a set of n numbers.
Then the mean, \bar{x}, is the centre of gravity of
those n numbers.

That is, if we drew a horizontal line and placed
a weight of one at each value of xi, then the
balancing point of that system of masses is at
the point \bar{x}.
[Figure: weights placed at x1, x3, x4, x2, …, xn along a horizontal line, balancing at \bar{x}]
In the Example
[Figure: weights at 7, 10, 13, 15 and 21 on an axis marked 0, 10, 20, balancing at \bar{x} = 13.2]
The mean, \bar{x}, is also approximately the
centre of gravity of a histogram
[Figure: histogram with classes 60 - 70 through 140 - 150 and frequency axis 0 to 30, with \bar{x} marked at the balancing point]
Measures of Central Location (Median)
The Median
Let x1, x2, x3, …, xn denote a set of n numbers.
Then the median of the n numbers is defined
as the number that splits the numbers into two
equal parts.
To evaluate the median we arrange the
numbers in increasing order.
If the number of observations is odd, there will
be one observation in the middle.
This number is the median.

If the number of observations is even, there
will be two middle observations.
The median is the average of these two
observations.
Example
Again let x1, x2, x3, x4, x5 denote the set of 5 numbers
in the following table.

i     1    2    3    4    5
xi   10   15   21    7   13

The numbers arranged in order are:
7   10   13   15   21

The unique “middle” observation, 13, is the median.
Example 2
Let x1, x2, x3, x4, x5, x6 denote the 6 numbers:
23   41   12   19   64   8
Arranged in increasing order these
observations would be:
8   12   19   23   41   64

There are two “middle” observations, 19 and 23.
Median = average of the two “middle” observations
       = \frac{19 + 23}{2} = \frac{42}{2} = 21
Example
The data on N = 23 students
Variables
• Verbal IQ
• Math IQ
Data Set #3
The following table gives data on Verbal IQ, Math IQ, Initial Achievement and Final Achievement
for 23 students who have recently completed a reading improvement program

            Verbal      Math       Initial         Final
Student       IQ         IQ      Achievement    Achievement

  1           86         94          1.1            1.7
  2          104        103          1.5            1.7
  3           86         92          1.5            1.9
  4          105        100          2.0            2.0
  5          118        115          1.9            3.5
  6           96        102          1.4            2.4
  7           90         87          1.5            1.8
  9          105         96          1.7            1.7
 10           84         80          1.6            1.7
 11           94         87          1.6            1.7
 12          119        116          1.7            3.1
 13           82         91          1.2            1.8
 14           80         93          1.0            1.7
 15          109        124          1.8            2.5
 16          111        119          1.4            3.0
 17           89         94          1.6            1.8
 18           99        117          1.6            2.6
 19           94         93          1.4            1.4
 20           99        110          1.4            2.0
 21           95         97          1.5            1.3
 22          102        104          1.7            3.1
 23          102         93          1.6            1.9

Total       2244       2307         35.1           48.3
Means      97.57     100.30        1.526          2.100
Computing the Median
[Figure: Stem-leaf diagrams of the data]

Median = middle observation = 12th observation
Summary

            Verbal      Math       Initial         Final
              IQ         IQ      Achievement    Achievement
Means      97.57     100.30        1.526          2.100
Median     96         97           1.5            1.9
• The mean is the centre of gravity of a set of
observations. The balancing point.
• The median splits the observations into
two parts of approximately 50% each
• The median splits the area under a
histogram in two parts of 50%
• The mean is the balancing point of a
histogram
[Figure: distribution curve with the area split into two 50% parts at the median, and \bar{x} marked to the right of the median; horizontal axis 0 to 25]
• For symmetric distributions the mean and
the median will be approximately the same
value
[Figure: symmetric distribution curve with 50% of the area on each side of the centre, where the median and \bar{x} coincide; horizontal axis 0 to 25]
• For Positively skewed distributions the
mean exceeds the median
• For Negatively skewed distributions the
median exceeds the mean
[Figure: distribution curve with the area split into two 50% parts at the median, and \bar{x} marked to the right of the median; horizontal axis 0 to 25]
• An outlier is a “wild” observation in the
data
• Outliers occur because
– of errors (typographical and computational)
– Extreme cases in the population
• The mean is altered to a significant degree
by the presence of outliers
• Outliers have little effect on the value of the
median
• This is a reason for using the median in
place of the mean as a measure of central
location
• Alternatively the mean is the best measure
of central location when the data is
Normally distributed (Bell-shaped)
Review
Summarizing Data

Graphical Methods

Histogram
[Figure: histogram over the classes 70 to 80 through 120 to 130; frequency axis 0 to 8]

Grouped Freq Table
Class          Verbal IQ   Math IQ
70 to 80           1          1
80 to 90           6          2
90 to 100          7         11
100 to 110         6          4
110 to 120         3          4
120 to 130         0          1

Stem-Leaf Diagram
 8           024669
 9           04455699
10           224559
11           189
12
Numerical Measures
•   Measures of Central Tendency (Location)
•   Measures of Non-Central Location
•   Measures of Variability (Dispersion)
•   Measures of Shape

The objective is to reduce the data to a small
number of values that completely describe the
data and certain aspects of the data.
Measures of Non-Central
Location
•   Percentiles
•   Quartiles (Hinges, Mid-hinges)
Definition
The P×100 Percentile is a point, xP,
underneath a distribution that has a fixed
proportion P of the population (or sample)
below that value.
[Figure: distribution curve with P×100 % of the area to the left of xP; horizontal axis 0 to 25]
Definition (Quartiles)
The first Quartile, Q1, is the 25th Percentile,
x0.25
[Figure: distribution curve with 25 % of the area to the left of x0.25; horizontal axis 0 to 25]
The second Quartile, Q2, is the 50th Percentile,
x0.50
[Figure: distribution curve with 50 % of the area to the left of x0.50; horizontal axis 0 to 25]
• The second Quartile , Q2 , is also the
median and the 50th percentile
The third Quartile, Q3, is the 75th Percentile,
x0.75
[Figure: distribution curve with 75 % of the area to the left of x0.75; horizontal axis 0 to 25]
The Quartiles Q1, Q2, Q3
divide the population into 4 equal parts of 25%.
[Figure: distribution curve divided at Q1, Q2 and Q3 into four areas of 25 % each; horizontal axis 0 to 25]
Computing
Percentiles and Quartiles
• There are several methods used to compute
percentiles and quartiles. Different
computer packages will use different
methods
• Sometimes for small samples these
methods will agree (but not always)
• For large samples the methods will agree
within a certain level of accuracy
Computing Percentiles and
Quartiles – Method 1
• The first step is to order the observations in
increasing order.
• We then compute the position, k, of the
P×100 Percentile.
k = P × (n+1)
Where n = the number of observations
Example
The data on n = 23 students
Variables
• Verbal IQ
• Math IQ
We want to compute the 75th percentile and
the 90th percentile
The position, k, of the 75th Percentile.
k = P × (n+1) = .75 × (23+1) = 18
The position, k, of the 90th Percentile.
k = P × (n+1) = .90 × (23+1) = 21.6
When the position k is an integer the
percentile is the kth observation (in order of
magnitude) in the data set.
For example the 75th percentile is the 18th (in
size) observation
When the position k is not an integer but an
integer (m) + a fraction (f),
i.e.    k = m + f
then the percentile is
xP = (1 - f) × (mth observation in size)
     + f × ((m+1)st observation in size)
In the example the position of the 90th percentile is:
k = 21.6
Then
x.90 = 0.4(21st observation in size)
+ 0.6(22nd observation in size)
When the position k is not an integer but an
integer (m) + a fraction (f),
i.e.    k = m + f
then the percentile is
    xP = (1 - f)(mth obs) + f[(m+1)st obs]
which lies between the mth obs and the (m+1)st obs.
Its relative position within that interval is

    \frac{x_P - \text{mth obs}}{\text{(m+1)st obs} - \text{mth obs}}
      = \frac{(1 - f)(\text{mth obs}) + f[\text{(m+1)st obs}] - \text{mth obs}}{\text{(m+1)st obs} - \text{mth obs}}
      = \frac{f[\text{(m+1)st obs}] - f(\text{mth obs})}{\text{(m+1)st obs} - \text{mth obs}}
      = f

Thus the position of xP is 100f% through the
interval between the mth observation and the
(m+1)st observation
Example
The data Verbal IQ on n = 23 students
arranged in increasing order is:
80 82 84 86 86 89 90 94
94 95 95 96 99 99 102 102
104 105 105 109 111 118 119
x0.75 = 75th percentile = 18th observation
in size =105
(position k = 18)
x0.90 = 90th percentile
= 0.4(21st observation in size)
+ 0.6(22nd observation in size)
= 0.4(111)+ 0.6(118) = 115.2
(position k = 21.6)
An Alternative method for
computing Quartiles – Method 2
• Sometimes this method will result in the
same values for the quartiles as Method 1.
• Sometimes this method will result in
different values for the quartiles.
• For large samples the two methods will
result in approximately the same answer.
Let x1, x2, x3, … xn denote a set of n numbers.

The first step in Method 2 is to arrange the
numbers in increasing order.

From the arranged numbers we compute the
median.
This is also called the Hinge
Example
Consider the 5 numbers:
10   15   21   7   13
Arranged in increasing order:
7   10   13   15   21

The median (Hinge) is 13.
The median (or Hinge) splits the observations
in half
The lower mid-hinge (the first quartile) is the
“median” of the lower half of the observations
(excluding the median).

The upper mid-hinge (the third quartile) is the
“median” of the upper half of the observations
(excluding the median).
Consider the five numbers in increasing order:

Lower Half      Median      Upper Half
 7    10          13         15    21

Lower Mid-Hinge (First Quartile) = (7+10)/2 = 8.5
Median (Hinge) = 13
Upper Mid-Hinge (Third Quartile) = (15+21)/2 = 18
Computing the median and the quartiles using
the first method:
Position of the median: k = 0.5(5+1) = 3
Position of the first Quartile: k = 0.25(5+1) = 1.5
Position of the third Quartile: k = 0.75(5+1) = 4.5
7   10   13   15   21

Q1 = 0.5(7) + 0.5(10) = 8.5
Q2 = 13
Q3 = 0.5(15) + 0.5(21) = 18
• Both methods result in the same value
• This is not always true.
Example
The data Verbal IQ on n = 23 students
arranged in increasing order is:

80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119

Lower Mid-Hinge (First Quartile) = 89
Median (Hinge) = 96
Upper Mid-Hinge (Third Quartile) = 105
Computing the median and the quartile using
the first method:
Position of the median: k = 0.5(23+1) = 12
Position of the first Quartile: k = 0.25(23+1) = 6
Position of the third Quartile: k = 0.75(23+1) = 18

80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119

Q1 = 89         Q2 = 96               Q3 = 105
• Many programs compute percentiles,
quartiles etc.
• Each may use different methods.
• It is important to know which method is
being used.
• The different methods result in answers that
are close when the sample size is large.
Box-Plots
Box-Whisker Plots

• A graphical method of displaying
data
• An alternative to the histogram
and stem-leaf diagram
To Draw a Box Plot
• Compute the Hinge (Median, Q2) and the
Mid-hinges (first & third quartiles – Q1
and Q3 )
• We also compute the largest and smallest
of the observations – the max and the
min
• The five number summary
min, Q1, Q2, Q3, max
Example
The data Verbal IQ on n = 23 students
arranged in increasing order is:

80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119

min = 80      Q1 = 89          Q2 = 96               Q3 = 105           max = 119
The Box Plot is then drawn by:
• Drawing above an axis a “box” from Q1
to Q3.
• Drawing a vertical line in the box at the
median, Q2.
• Drawing whiskers at the lower and upper
ends of the box going down to the min
and up to the max.
[Figure: box plot with the lower whisker from min to Q1, the box from Q1 to Q3 with a line at Q2, and the upper whisker from Q3 to max]
Example
For the data Verbal IQ on n = 23 students:
min = 80
Q1 = 89
Q2 = 96
Q3 = 105
max = 119
This is sometimes called the five-number summary.
Box Plot of Verbal IQ
[Figure: horizontal box plot of Verbal IQ; axis 70 to 130]
The Box Plot can also be drawn vertically.
[Figure: vertical box plot of Verbal IQ; axis 70 to 130]
[Figure: Box-Whisker plots (Verbal IQ, Math IQ)]
[Figure: Box-Whisker plots (Initial RA, Final RA)]
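Summary
Information contained in the box plot
[Figure: box plot in which each whisker and each half of the box covers 25% of the population; the box covers the middle 50% of the population]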
• An outlier is a “wild” observation in the
data

• Outliers occur because
– of errors (typographical and computational)
– Extreme cases in the population

• We will now consider the drawing of box-
plots where outliers are identified
To Draw a Box Plot we need to:
• Compute the Hinge (Median, Q2) and the
Mid-hinges (first & third quartiles – Q1
and Q3 )
• The difference Q3– Q1 is called the inter-
quartile range (denoted by IQR)
• To identify outliers we will compute the
inner and outer fences
The fences are like the fences at a prison. We
expect the entire population to be within both
sets of fences.

If a member of the population is between the
inner and outer fences it is a mild outlier.

If a member of the population is outside of the
outer fences it is an extreme outlier.
Inner fences
Lower inner fence
f1 = Q1 - (1.5)IQR

Upper inner fence
f2 = Q3 + (1.5)IQR
Outer fences
Lower outer fence
F1 = Q1 - (3)IQR

Upper outer fence
F2 = Q3 + (3)IQR
• Observations that are between the lower and
upper inner fences are considered to be
non-outliers.
• Observations that are outside the inner
fences but not outside the outer fences are
considered to be mild outliers.
• Observations that are outside outer fences
are considered to be extreme outliers.
• mild outliers are plotted individually in a
box-plot using the symbol
• extreme outliers are plotted individually in
a box-plot using the symbol
• non-outliers are represented with the box
and whiskers with
– Max = largest observation within the fences
– Min = smallest observation within the fences
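[Figure: Box-Whisker plot representing the data that are not outliers, with the inner and outer fences marked, mild outliers plotted between the inner and outer fences, and an extreme outlier plotted beyond the outer fence]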
Example

Data collected on n = 109 countries in 1995.
Data collected on k = 25 variables.
The variables
1. Population Size (in 1000s)
2. Density = Number of people/Sq kilometer
3. Urban = percentage of population living in
cities
4. Religion
5. lifeexpf = Average female life expectancy
6. lifeexpm = Average male life expectancy
7. literacy = % of population who read
8. pop_inc = % increase in population size (1995)
9. babymort = Infant mortality (deaths per
1000)
10. gdp_cap = Gross domestic product/capita
11. Region = Region or economic group
12. calories = Daily calorie intake.
13. aids = Number of aids cases
14. birth_rt = Birth rate per 1000 people
15. death_rt = death rate per 1000 people
16. aids_rt = Number of aids cases/100000
people
17. log_gdp = log10(gdp_cap)
18. log_aidsr = log10(aids_rt)
19. b_to_d =birth to death ratio
20. fertility = average number of children in
family
21. log_pop = log10(population)
22. cropgrow = ??
23. lit_male = % of males who can read
24. lit_fema = % of females who can read
25. Climate = predominant climate
[Figure: the data file as it appears in SPSS]
Consider the data on infant mortality
Stem-Leaf diagram stem = 10s, leaf = unit digit

0   4455555666666666777778888899
1   0122223467799
2   0001123555577788
3   45567999
4   135679
5   011222347
6   03678
7   4556679
8   5
9   4
10   1569
11   0022378
12   46
13   7
14
15
16   8
Summary Statistics

median = Q2 = 27

Quartiles
Lower quartile = Q1 = the median of lower half
Upper quartile = Q3 = the median of upper half
    Q1 = \frac{12 + 12}{2} = 12,    Q3 = \frac{66 + 67}{2} = 66.5
Interquartile range (IQR)
IQR = Q3 - Q1 = 66.5 - 12 = 54.5
The Outer Fences
lower = Q1 - 3(IQR) = 12 - 3(54.5) = -151.5
upper = Q3 + 3(IQR) = 66.5 + 3(54.5) = 230.0
No observations are outside of the outer fences

The Inner Fences
lower = Q1 - 1.5(IQR) = 12 - 1.5(54.5) = -69.75
upper = Q3 + 1.5(IQR) = 66.5 + 1.5(54.5) = 148.25

Only one observation (168, Afghanistan) is
outside of the inner fences (mild outlier)
Box-Whisker Plot of Infant Mortality
[Figure: box-whisker plot of Infant Mortality; axis 0 to 200]
Example 2
In this example we are looking at the weight
gains (grams) for rats under six diets differing
in level of protein (High or Low) and source
of protein (Beef, Cereal, or Pork).

– Ten test animals for each diet
Table
Gains in weight (grams) for rats under six diets
differing in level of protein (High or Low)
and source of protein (Beef, Cereal, or Pork)

Level              High Protein                   Low Protein
Source        Beef    Cereal      Pork      Beef    Cereal      Pork
Diet             1         2         3         4         5         6

                73        98        94        90       107        49
               102        74        79        76        95        82
               118        56        96        90        97        73
               104       111        98        64        80        86
                81        95       102        86        98        81
               107        88       102        51        74        97
               100        82       108        72        74       106
                87        77        91        90        67        70
               117        86       120        95        89        61
               111        92       105        78        58        82

Median       103.0      87.0     100.0      82.0      84.5      81.5
Mean         100.0      85.9      99.5      79.2      83.9      78.7
IQR           24.0      18.0      11.0      18.0      23.0      16.0
PSD          17.78     13.33      8.15     13.33     17.04     11.05
Variance    229.11    225.66    119.17    192.84    246.77    273.79
Std. Dev.    15.14     15.02     10.92     13.89     15.71     16.55
Box Plots: Weight Gains for Six Diets
[Figure: box plots of Weight Gain (axis 40 to 130) by Diet 1 to 6, grouped as High Protein (Beef, Cereal, Pork) and Low Protein (Beef, Cereal, Pork); legend: Non-Outlier Max, Non-Outlier Min, 75%, 25%, Median]
Conclusions
• Weight gain is higher for the high protein
meat diets
• Increasing the level of protein increases
weight gain, but only if the source of protein
is a meat source
Next topic:
Numerical Measures of
Variability

```
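
The computations in the slides above (the mean and median, percentiles by Method 1, quartiles by Method 2, and the inner and outer fences) can be sketched in Python as follows. This is only an illustrative sketch: the function names (`percentile_method1`, `quartiles_method2`, `fences`, `classify`) are hypothetical and do not come from the slides, and three behaviours are assumptions the slides do not specify: positions beyond either end of the data are clamped to the min or max, for an even number of observations Method 2 uses the two halves as they are, and a value lying exactly on a fence is counted as inside it.

```python
import math
from statistics import mean, median

# Verbal IQ scores of the 23 students, as listed in the slides.
VERBAL_IQ = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96,
             99, 99, 102, 102, 104, 105, 105, 109, 111, 118, 119]


def percentile_method1(data, p):
    """Method 1: position k = p * (n + 1), then interpolate."""
    xs = sorted(data)
    n = len(xs)
    k = p * (n + 1)
    m = math.floor(k)   # integer part of the position
    f = k - m           # fractional part of the position
    # Assumption (not in the slides): clamp positions beyond either end.
    if m < 1:
        return xs[0]
    if m >= n:
        return xs[-1]
    # xs is 0-indexed, so the mth observation in size is xs[m - 1].
    return (1 - f) * xs[m - 1] + f * xs[m]


def quartiles_method2(data):
    """Method 2: hinge (median) and mid-hinges (medians of the halves)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # For odd n the median itself is excluded from both halves.
    # Assumption: for even n the two halves are used unchanged.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)


def fences(q1, q3):
    """Inner fences at 1.5 * IQR and outer fences at 3 * IQR."""
    iqr = q3 - q1
    return {"inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
            "outer": (q1 - 3 * iqr, q3 + 3 * iqr)}


def classify(x, q1, q3):
    """Label x as a non-outlier, a mild outlier or an extreme outlier."""
    f = fences(q1, q3)
    lo_in, hi_in = f["inner"]
    lo_out, hi_out = f["outer"]
    # Assumption: a value exactly on a fence counts as inside it.
    if x < lo_out or x > hi_out:
        return "extreme outlier"
    if x < lo_in or x > hi_in:
        return "mild outlier"
    return "non-outlier"


if __name__ == "__main__":
    print("mean:", round(mean(VERBAL_IQ), 2), "median:", median(VERBAL_IQ))
    print("75th percentile:", percentile_method1(VERBAL_IQ, 0.75))
    print("90th percentile:", percentile_method1(VERBAL_IQ, 0.90))
    print("Method 2 quartiles:", quartiles_method2(VERBAL_IQ))
    # Q1 and Q3 for infant mortality are taken from the slides.
    print("fences:", fences(12, 66.5))
    print("168 is a", classify(168, 12, 66.5))
```

Applied to the Verbal IQ data, both methods give Q1 = 89, Q2 = 96 and Q3 = 105, in agreement with the slides. Statistical packages may use other percentile conventions, so their results can differ slightly for small samples.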