# AAEC 4302 ADVANCED STATISTICAL METHODS IN AGRICULTURAL RESEARCH


```
AAEC 4302
STATISTICAL METHODS IN
AGRICULTURAL RESEARCH

Descriptive Statistics: Chapter 3
Univariate Statistics of Central Tendency

• Univariate statistics focus on a single variable
with n available observations; for example, deer
weight in the biological data set

• Measures of central tendency attempt to
measure the typical value taken by a given
variable

• There are three alternative statistics (i.e.
formulas) to measure the central tendency of
a variable:
* The Mean
* The Median
* The Mode

• The mean (or average) is the most common
and useful measure of central tendency

* Mean of X:  \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i

* To calculate the mean, all of the observations
(values) of X are added and the result is divided by
the number of observations (mean deer weight =
61.77 kg)

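• Proof: The sum of the deviations of the
observations from the mean is always equal to
zero, where d_i = X_i - \bar{X}:

    \sum d_i = \sum (X_i - \bar{X})
             = \sum X_i - \sum \bar{X}
             = \sum X_i - n\bar{X}
             = n\bar{X} - n\bar{X} = 0

(the last step uses \bar{X} = \frac{1}{n} \sum X_i, so \sum X_i = n\bar{X})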

• The median value of X (Xmed) is simply the value
taken by the middle observation on X after the
observations have been ordered.
* If there is an odd number of observations the
median is unambiguous
* If there is an even number of observations, there is
no single middle observation
* In the latter case, by convention, the median is
calculated by averaging the values of the two
middle observations on X:
(median deer weight = (64+64)/2 = 64 kg)

• The mode is the most frequently occurring
value of X, which may not be unique
• Mode of X is 66

• In statistics, the mean is the most common
measure of the central tendency or typical value
taken by a given variable, while the median and
the mode are mostly neglected.

• However, the median can sometimes be more
useful to describe the typical value of X, since the
mean is very sensitive to extreme values of X.

• For example, if the 15 smallest deer weights
are ignored, the mean increases markedly
from 61.77 kg to 64.0 kg while the median
only goes from 64 kg to 65 kg

• The mode may be a useful statistic in the
case of a discrete variable, but not for
continuous variables because each
observation value is likely to be unique
Univariate Statistics of Dispersion
p 45

• A measure of dispersion is a statistic (formula) that
indicates how spread out (i.e. dispersed) the values of a
given variable are

• The range is a measure of dispersion given by the
difference between the greatest and the smallest
value of X in the n observations available

For example, in the Deer Data Set, the range is
61 kg, the difference between the maximum weight of
93 kg and the minimum weight of 32 kg.

• As demonstrated before, the mean or
average deviation of X from its mean,

    \frac{\sum d_i}{n} = \frac{\sum (X_i - \bar{X})}{n},

is always zero (the positive and
negative deviations cancel out in the
summation), which makes it a useless
measure of dispersion.

• The mean absolute deviation (MAD),
calculated by:

    MAD = \frac{\sum |d_i|}{n} = \frac{\sum |X_i - \bar{X}|}{n}

solves the “canceling out” problem.

MAD in deer weight = 9.00 kg;
the largest positive deer weight deviation is
93 kg - 61.77 kg = 31.23 kg
the largest negative deer weight deviation is
32 kg - 61.77 kg = -29.77 kg
(29.77 kg in absolute value)

The MAD has an intuitive appeal since it represents the
“typical deviation without regard to sign”

• An alternative way to address the
canceling out problem is by squaring the
deviations from the mean to obtain the
mean squared deviation (MSD):

    MSD = \frac{\sum d_i^2}{n} = \frac{\sum (X_i - \bar{X})^2}{n} = 143.54

• The problem created by squaring (the MSD is
expressed in squared units, here kg^2) can be solved
by taking the square root of the MSD to obtain the root
mean squared deviation (RMSD):

    RMSD = \sqrt{MSD} = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}} = 11.98

• When calculating the RMSD, the squaring of
the deviations gives a greater importance to
the deviations that are larger in absolute value,
which may or may not be desirable

• For statistical reasons, it turns out that a
slight variation of the RMSD, known as the
standard deviation (S or SX), is more
desirable as a measure of dispersion:

    S_X = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = 12.01    (3.6)
Univariate Statistics of Dispersion
p 46

• n-1 is known as the degrees of freedom
in calculating SX. Intuitively, once the mean
\bar{X} is known, only n-1 observation values are
free to vary; the remaining one is predetermined by \bar{X}

• When a sample of data is taken to learn
about the population from which it is
drawn, SX is often the best estimate of
the degree of dispersion of the data in
the population

```
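
The statistics defined in the slides can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `describe` and the eight sample values are hypothetical (the deer data set itself is not reproduced in these slides), and only the Python standard library is assumed.

```python
import math
import statistics


def describe(values):
    """Compute the central-tendency and dispersion statistics from the slides."""
    n = len(values)
    mean = sum(values) / n
    # Deviations from the mean: d_i = X_i - mean
    deviations = [x - mean for x in values]
    sum_sq = sum(d ** 2 for d in deviations)
    msd = sum_sq / n
    return {
        "mean": mean,
        "median": statistics.median(values),      # averages the two middle values if n is even
        "modes": statistics.multimode(values),    # a list, since the mode may not be unique
        "range": max(values) - min(values),
        "sum_of_deviations": sum(deviations),     # zero, up to floating-point rounding
        "mad": sum(abs(d) for d in deviations) / n,
        "msd": msd,
        "rmsd": math.sqrt(msd),
        "sd": math.sqrt(sum_sq / (n - 1)),        # divides by n - 1 degrees of freedom
    }


# Hypothetical sample values, NOT the deer weights used in the slides.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
stats = describe(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

For this sample the mean is 5, the median is 4.5, the MAD is 1.5, the MSD is 4 and the RMSD is 2, while the standard deviation is slightly larger than the RMSD because it divides by n - 1 instead of n. Replacing the largest value 9 with 90 would raise the mean to 15.125 while leaving the median at 4.5, which illustrates the sensitivity of the mean to extreme values.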
Stats:
 views: 0 posted: 4/12/2013 language: English pages: 17