Statistics
David Cole
UCE
Statistics
• Descriptive Statistics
• Inferential Statistics
– Estimation
– Hypothesis Testing
Descriptive Statistics
• The simplest form of statistics.
• They provide summaries about the
sample and the measurements
• Plus simple graphical analysis such
as bar charts.
Descriptive Statistics
• Analysis of one variable at a time.
• Major characteristics (parameters)
– the distribution
– the central tendency
– the dispersion
– proportion
The distribution
• A summary of the frequency of
individual values
Frequency Distribution
class interval   class mark   absolute frequency
 0.00- 9.99       5            1
10.00-19.99      15            3
20.00-29.99      25            8
30.00-39.99      35           18
40.00-49.99      45           24
50.00-59.99      55           22
60.00-69.99      65           15
70.00-79.99      75            8
80.00-89.99      85            0
90.00-99.99      95            1
Frequency Distribution: Bar Chart
[Figure: bar chart of the frequency distribution]
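As a rough illustration of how the table turns into a bar chart, the sketch below copies the class marks and absolute frequencies from the table, computes relative frequencies, and prints a text bar chart. The variable names are chosen for this example only.

```python
# Class marks and absolute frequencies copied from the frequency table.
class_marks = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
frequencies = [1, 3, 8, 18, 24, 22, 15, 8, 0, 1]

total = sum(frequencies)

# Relative frequency: the proportion of all observations in each class.
relative = [f / total for f in frequencies]

# Text bar chart: one '#' per observation in the class.
for mark, freq, rel in zip(class_marks, frequencies, relative):
    print(f"{mark:>3} | {'#' * freq} {freq} ({rel:.0%})")

print("total observations:", total)
```

The longest bar marks the class with the highest frequency (class mark 45).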
Normal Distribution
• Normal distributions are a family of
distributions that have the same
general shape.
• These are symmetric, with scores more
concentrated in the middle than in the tails.
Normal Distribution
[Figure: normal curve, FREQUENCY plotted against VALUE]
Normal Distribution
• Many physiological and psychological
variables are distributed approximately
normally.
• Measures of blood pressures, heights and
memory are among the many variables
approximately normally distributed.
• It is easy for mathematical statisticians to
work with, and many kinds of statistical tests
can be derived for normal distributions.
Geometric Distribution
[Figure: geometric distribution]
Uniform Distribution
[Figure: uniform distribution]
Measures of Central Tendency
• Mean
• Median
• Mode
Mean
• consider the test score values:
• 15, 20, 21, 20, 36, 15, 25, 15
• The sum of these 8 values is 167, so
the mean is 167/8 = 20.875
• Problem: the mean is sensitive to outliers
(here the single score of 36 pulls it upward)
Median
• 15, 20, 21, 20, 36, 15, 25, 15
• If we order the 8 scores shown
above, we would get:
• 15,15,15,20,20,21,25,36
• There are 8 scores and score #4 and
#5 represent the halfway point.
• Since both of these scores are 20,
the median is 20
Mode
• 15, 20, 21, 20, 36, 15, 25, 15
• The mode is the most frequently
occurring value in the set of scores.
• Order the scores as shown below
• 15,15,15,20,20,21,25,36
• The value 15 occurs three times and
is the mode
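The three measures can be checked with Python's standard `statistics` module. This is a minimal sketch using the eight test scores from the slides; the second half drops the score of 36 to illustrate how an outlier moves the mean but not the median.

```python
import statistics

# The eight test scores used on the slides.
scores = [15, 20, 21, 20, 36, 15, 25, 15]

mean = statistics.mean(scores)      # sum of the scores divided by their count
median = statistics.median(scores)  # average of the two middle scores
mode = statistics.mode(scores)      # most frequently occurring score

print("mean:", mean, "median:", median, "mode:", mode)

# Effect of the outlier: remove the score of 36 and recompute.
without_outlier = [s for s in scores if s != 36]
mean_without = statistics.mean(without_outlier)
median_without = statistics.median(without_outlier)

print("without 36 -> mean:", round(mean_without, 3), "median:", median_without)
```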
Normal Distribution
• In a normal distribution the three
parameters, the mean, the median
and the mode, are equal.
[Figure: normal curve with mean = median = mode]
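A small sketch of this property, using `statistics.NormalDist` from the Python standard library. The parameters (mean 100, standard deviation 15) are hypothetical values chosen for illustration, not taken from the slides.

```python
from statistics import NormalDist

# Hypothetical normal distribution; mu and sigma are illustrative only.
dist = NormalDist(mu=100, sigma=15)

# For a normal distribution the three measures coincide.
print("mean:", dist.mean, "median:", dist.median, "mode:", dist.mode)

# Symmetry: the density is the same at equal distances either side of the mean.
offset = 10
left = dist.pdf(dist.mean - offset)
right = dist.pdf(dist.mean + offset)
print("density 10 below the mean:", left)
print("density 10 above the mean:", right)
```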
Question
• Find the mean, median and mode for
the following distribution
12 15 3 9 9 12 15 17 21 7 12
• Is this a normal distribution?
Dispersion
• Refers to the spread of the values
around the central tendency.
– Range
– Standard Deviation
Range
• 15, 20, 21, 20, 36, 15, 25, 15
• The range is simply the highest value
minus the lowest value.
• In the above distribution, the high value is
36 and the low is 15, so the range is
36 - 15 = 21.
• An outlier can greatly exaggerate the
range
Standard Deviation
• Measures how far, on average, the
observations lie from the mean
• Deviations of the eight scores from the mean:
15 - 20.875 = -5.875
20 - 20.875 = -0.875
21 - 20.875 = +0.125
20 - 20.875 = -0.875
36 - 20.875 = 15.125
15 - 20.875 = -5.875
25 - 20.875 = +4.125
15 - 20.875 = -5.875
Standard Deviation
Variance
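The formulas on these slides did not survive in the text. As a sketch, the usual sample definitions are given below, assuming the n - 1 denominator (an assumption, chosen because it reproduces the SPSS values reported for the eight scores).

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s = \sqrt{s^2}

% Worked for the eight scores (n = 8, \bar{x} = 20.875):
\sum_{i=1}^{8} (x_i - \bar{x})^2 = 350.875, \qquad
s^2 = \frac{350.875}{7} = 50.125, \qquad
s = \sqrt{50.125} \approx 7.08
```

Squaring the deviations stops positive and negative differences from cancelling; taking the square root returns the result to the original units.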
Computer Package SPSS
• Input the eight scores to SPSS:
N                8
Mean             20.8750
Median           20.0000
Mode             15.00
Std. Deviation   7.0799
Variance         50.1250
Range            21.00
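The same summary can be reproduced without SPSS. This is a minimal sketch using Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) form and so agree with the output above.

```python
import statistics

# The eight test scores used on the slides.
scores = [15, 20, 21, 20, 36, 15, 25, 15]

n = len(scores)
value_range = max(scores) - min(scores)   # highest value minus lowest value
variance = statistics.variance(scores)    # sample variance (n - 1 denominator)
std_dev = statistics.stdev(scores)        # square root of the sample variance

print(f"N              {n}")
print(f"Mean           {statistics.mean(scores):.4f}")
print(f"Median         {statistics.median(scores):.4f}")
print(f"Mode           {statistics.mode(scores):.2f}")
print(f"Std. Deviation {std_dev:.4f}")
print(f"Variance       {variance:.4f}")
print(f"Range          {value_range:.2f}")
```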
Standard Deviation
[Figures: standard deviation diagrams]
Quantiles
• Values that divide a distribution into
proportions
• Quartiles divide the distribution into
quarters
• Percentiles divide the distribution into
hundredths
• The 50th percentile divides the distribution
into two halves
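As an illustration, quartiles and percentiles can be computed with `statistics.quantiles` (Python 3.8 or later). The sketch reuses the eight test scores; with so few values the percentiles are interpolated and only approximate, and other packages may use slightly different interpolation rules.

```python
import statistics

# The eight test scores used on the slides.
scores = [15, 20, 21, 20, 36, 15, 25, 15]

# n=4 returns the three cut points that split the data into quarters.
quartiles = statistics.quantiles(scores, n=4)

# n=100 returns the 99 cut points that split the data into hundredths.
percentiles = statistics.quantiles(scores, n=100)

print("quartiles:", quartiles)
print("5th percentile:", percentiles[4])    # index 4 is the 5th cut point
print("50th percentile:", percentiles[49])  # index 49 is the 50th cut point
print("95th percentile:", percentiles[94])  # index 94 is the 95th cut point
```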
Quantiles
[Figure: 5th and 95th percentiles]
Questions
• What are the major parameters in
descriptive statistics?
• Define the median in terms of
quantiles.
• Approximately what % of a
population lies within 2 SD of the mean?
Inferential Statistics
• With inferential statistics you are
trying to reach conclusions which
extend beyond the immediate data
• Inferential statistics are used to draw
inferences about a population from a
sample
Inferential Statistics
• Two main methods
– Estimation
– Hypothesis Testing
Estimation
• In estimation the sample is used to
estimate a parameter (e.g. the mean)
of the population
• In addition a confidence interval
about this estimate is constructed
Confidence Interval
• Normally any parameter of the
population such as mean, standard
deviation or proportion is estimated
from a single sample
• The confidence interval is calculated
from the sample standard deviation
and the sample size
Confidence Interval
• A range of values that has a
specified probability of containing
the parameter being estimated
• The usual probabilities are 95% and
99%
Confidence Interval - example
• Random sample of 40 people from a
population of 10000
• 32% of this sample are hypertensive
• 95% confidence interval = 27% - 37%
• This means that if 100 samples of 40 were
taken from the population and an interval
calculated from each
• about 95 of these intervals would contain the
true proportion of hypertensive subjects
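The slides do not say how the interval was calculated. One common method is the normal-approximation (Wald) interval for a proportion, sketched below. The function name and the multiplier z = 1.96 for 95% confidence are assumptions made for this example.

```python
import math


def proportion_confidence_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion.

    p_hat is the sample proportion, n the sample size, and z the
    standard normal multiplier (1.96 is the usual value for 95%).
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Sample proportion and sample size from the example.
low, high = proportion_confidence_interval(0.32, 40)
print(f"95% interval for n = 40: {low:.1%} to {high:.1%}")

# The interval narrows as the sample size grows.
for n in (40, 100, 400):
    lo_n, hi_n = proportion_confidence_interval(0.32, n)
    print(f"n = {n:>3}: width = {hi_n - lo_n:.1%}")
```

Under this approximation a sample of 40 gives a noticeably wider interval than 27% to 37%, so the figures in the example are best treated as illustrative.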
Questions
• You are undertaking research on systolic
blood pressure
• Your sample gives a mean systolic BP of
140 mm Hg with a 95% confidence interval
of 125 – 166 mm Hg
• If you take 100 samples from the
population and calculate a 95% confidence
interval from each, how many intervals are
likely to miss the population mean?
Hypothesis Testing
• A hypothesis is a prediction about
the outcome of research
Hypothesis Testing
• The researcher starts with a
hypothesis about the population –
called the alternative hypothesis HA.
This is the prediction to be evaluated
– e.g. drug A controls arthritis pain
• The null hypothesis H0 is stated:
this is the logical opposite of HA.
Null = “no change”, “no difference”
Hypothesis Testing
• The significance level is set (usually
0.05) – This is the probability of wrongly
rejecting the null hypothesis
• Collect data from group on drug A
(experimental group) and group on
placebo (control group)
• Undertake statistical test on the data (test
of significance)
Hypothesis Testing
• The statistical test results in a “p” value:
the probability of obtaining data at least
as extreme as those observed if the null
hypothesis is true
• p ≤ 0.05: reject H0
• p > 0.05: retain H0
• If we reject H0 we can accept the
alternative hypothesis HA that drug A
controls arthritis pain
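To make the decision rule concrete, here is a minimal sketch of an exact permutation test on the difference in group means. This test is swapped in for illustration because it needs only the standard library; it is not one of the tests named on the slides. The pain scores are invented, hypothetical data, and the function name is made up for this example.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Under H0 (no difference) the group labels are interchangeable, so
    every way of splitting the pooled scores is equally likely. The
    p-value is the share of splits whose mean difference is at least
    as large as the one observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_difference(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_difference(sum(group_a)))
    extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        splits += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean_difference(sum_a)) >= observed - 1e-12:
            extreme += 1
    return extreme / splits


# Hypothetical pain scores (lower = less pain); NOT real data.
drug_a = [2, 3, 1, 4, 2]    # experimental group
placebo = [5, 6, 4, 7, 5]   # control group

alpha = 0.05  # significance level
p = permutation_p_value(drug_a, placebo)
decision = "reject H0" if p <= alpha else "retain H0"
print(f"p = {p:.4f}: {decision}")
```

With these invented scores only 4 of the 252 possible splits are as extreme as the observed one, so p falls below 0.05 and H0 is rejected.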
Tests of Significance
• t-test
• Wilcoxon rank sum test
• Mann-Whitney U test
• chi-squared test
• The type of test used depends on the
type of data, the distribution and the
number of groups being compared