Overview of Data and Decisions so far
Where are we now?
If you have a group of data, and calculate
what its mean and standard deviation are
then you can characterise any item in that
sample by a single “standardised” value that
describes how many standard deviations it
is away from the mean.
Calculating standardised values
The z value is the number of standard deviations (σ)
that an item in a sample (x) is either below (-)
or above (+) the mean (μ):
z = (x - μ) / σ
i.e. z is the standardised value.
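A minimal sketch of the calculation in Python, assuming a small made-up set of daily sales figures (the numbers and the function name `standardise` are illustrative only, not from the lecture data):

```python
from statistics import mean, pstdev

def standardise(values):
    """Return the z value of each item: (x - mean) / standard deviation."""
    mu = mean(values)
    # pstdev treats the data as the whole population;
    # statistics.stdev would give the sample version instead.
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

# Hypothetical daily sales (kg), for illustration only
sales = [14, 20, 26, 17, 23]
z_values = standardise(sales)

for x, z in zip(sales, z_values):
    print(f"x = {x:>2} kg  ->  z = {z:+.2f}")
```

An item equal to the mean gets z = 0, items below the mean get negative z values, and items above it get positive ones.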
How to think about standardisation
Like converting to a different set of units
(Celsius ↔ Fahrenheit or AUS$ ↔ US$)
Two ways of expressing the same reality
There’s a formula to convert backwards and
forwards between the original data and the
standardised values
Useful for knowing “where” a piece of data
lies in a distribution
Useful for comparing different samples
Use for calculating probabilities if you
know that the sample comes from a
particular distribution like the normal
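The two-way conversion and the comparison of different samples can be sketched as below. The helper names `to_z` and `from_z` and the exam-score figures are hypothetical, chosen only to show the idea:

```python
def to_z(x, mu, sigma):
    """Original units -> standardised value."""
    return (x - mu) / sigma

def from_z(z, mu, sigma):
    """Standardised value -> original units."""
    return mu + z * sigma

# Hypothetical example: two students from classes marked on different scales
z_a = to_z(72, mu=60, sigma=8)    # class A: mean 60, standard deviation 8
z_b = to_z(80, mu=75, sigma=10)   # class B: mean 75, standard deviation 10

print(f"Student A: z = {z_a:.2f}")
print(f"Student B: z = {z_b:.2f}")

# Converting back recovers the original score
print(from_z(z_a, mu=60, sigma=8))
```

Although student B has the higher raw score, student A sits further above their own class mean, which is the kind of comparison standardised values make possible.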
Discrete vs continuous
So far we have looked at a discrete distribution
Now we will talk about a continuous distribution
What is the difference?
The normal distribution is of special interest
for two main reasons.
1) it is an approximation to many probability
distributions, especially when the number of
observations is high.
2) it plays a key role in market research and
Features of Normal probability
Each normal distribution can be differentiated
and represented by its mean and standard
deviation.
The highest point on the curve is the mean,
median, and mode.
Can theoretically have values from negative
infinity to positive infinity.
The Normal curve
The probability density function which defines
the normal curve is:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}$
where μ is the mean, σ is the standard
deviation, π = 3.14159, and e = 2.71828.
We can use calculus to calculate the area
under the curve.
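To make "area under the curve" concrete, here is a rough sketch that evaluates the normal density and adds up the area numerically. The trapezoid rule is used here as a simple stand-in for the calculus, and the function names are illustrative assumptions:

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

def area_under_curve(a, b, mu, sigma, steps=10_000):
    """Approximate the area between a and b with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * width, mu, sigma)
    return total * width

# Area within one standard deviation of the mean (standard normal)
area_within_one_sd = area_under_curve(-1.0, 1.0, mu=0.0, sigma=1.0)
print(f"{area_within_one_sd:.4f}")
```

In practice you would not integrate by hand; spreadsheet and library functions give the same areas directly.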
“Standard normal” distribution
If you “standardise” any normally distributed data
set then you get exactly the same distribution:
Lots of work has been done to figure out as much
information as possible about the standard normal
So if you know the “standardised” value of any
piece of data in a normal distribution, you know
exactly the probability of being above or below it.
e.g. z = 1 has about a 16% probability of being above and
about an 84% probability of being below.
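The probabilities for z = 1 can be checked with Python's standard library. This is only a quick sketch; `NormalDist` is the standard-library class, while the variable names are arbitrary:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

p_below = standard_normal.cdf(1.0)   # area to the left of z = 1
p_above = 1.0 - p_below              # area to the right of z = 1

print(f"P(Z < 1) = {p_below:.4f}")   # 0.8413
print(f"P(Z > 1) = {p_above:.4f}")   # 0.1587
```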
Sales of cheese at a roadside stall is known
to have a mean of 20 kg per day and a
standard deviation of 6 kg per day.
Since cheese is relatively expensive and it
is important to sell fresh cheese each day,
the owner of the roadside stall wants to
stock the right amount. He wants the chance
of running out to be small but also doesn’t
want there to be much left over.
To get a feel for how much to stock, he
wants to know:
What is the probability of selling less than
16 kg of cheese?
What is the probability of selling between 8
and 32 kg of cheese?
How much cheese would you need to have
on hand at the start of the day to have less
than 10% chance of running out?
Graphing normal probabilities
[Figure: four normal curves with shaded regions showing
P(Sales < 34), P(6 < Sales < 34),
P(Sales > 11.5) and P(25 < Sales < 30)]
Excel - Normal distribution
The function =normdist(x, mean,
stddev, 1) gives you the
probability for the
region up to x in a
normal distribution with
the given mean and stddev.
Using Excel for normal dist
We want P(Sales < 16)
Use Excel formula =normdist(16,20,6,1) to
get P(Sales < 16) = 0.2525
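For anyone working outside Excel, the same calculation can be sketched in Python. `NormalDist(...).cdf(x)` plays the role of =normdist(x, mean, stddev, 1). Note that a printed z table with z rounded to -0.67 gives 0.2514, slightly different from the unrounded value.

```python
from statistics import NormalDist

# Cheese sales: mean 20 kg per day, standard deviation 6 kg per day
sales = NormalDist(mu=20, sigma=6)

p_less_than_16 = sales.cdf(16)   # area under the curve up to 16 kg
print(f"P(Sales < 16) = {p_less_than_16:.4f}")   # 0.2525
```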
Determining the probability
of sales between 8 and 32
P(8 < Sales < 32) = P(-2 < z < 2)
Area up to a z-score of -2 is 0.0228
Area up to a z-score of 2 is 0.9772
P(8 < Sales < 32) = 0.9772 - 0.0228 = 0.9544
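The "area up to the upper value minus area up to the lower value" idea, sketched in Python under the same assumed mean and standard deviation as the cheese example:

```python
from statistics import NormalDist

sales = NormalDist(mu=20, sigma=6)

area_up_to_32 = sales.cdf(32)   # z = +2
area_up_to_8 = sales.cdf(8)     # z = -2
p_between = area_up_to_32 - area_up_to_8

print(f"Area up to 32 kg: {area_up_to_32:.4f}")
print(f"Area up to  8 kg: {area_up_to_8:.4f}")
print(f"P(8 < Sales < 32) = {p_between:.4f}")
```

The unrounded answer is 0.9545; subtracting the four-decimal table values gives 0.9544, so the small difference is only rounding.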
Excel - Normal distribution
converting from probabilities
The function =norminv(p,
mean, stddev) gives you the
point on the distribution that
has a probability of p below
it (in a normal distribution
with the given mean and stddev).
Area in upper tail of distribution (shaded area) = 0.1
What is x?
P(Sales < x) = 0.9
Use Excel formula =norminv(0.9,20,6). Result you get is 27.69, so you
need to stock about 28 kg of cheese each day.
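Going from a probability back to a sales figure can be sketched in Python too. `inv_cdf` is the standard-library counterpart of =norminv. Rounding up to whole kilograms is an assumption about how the stall would stock:

```python
import math
from statistics import NormalDist

sales = NormalDist(mu=20, sigma=6)

# We want the point x with 90% of the distribution below it,
# i.e. only a 10% chance that demand exceeds the stock.
stock_needed = sales.inv_cdf(0.9)
stock_rounded = math.ceil(stock_needed)

print(f"x = {stock_needed:.2f} kg, so stock about {stock_rounded} kg")
```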
Necessary assumptions for
using the Normal distribution
Bell-shaped frequency histogram
Mean of sample data must be good
estimate of population mean.
Standard deviation of sample data must be
good estimate of population standard
deviation.
Does the situation meet the necessary
assumptions for a particular model?
Use collected data to evaluate the
validity of the model.
What is a QQ plot?
Informal method of detecting whether a
data set is normally distributed
The closer the line is to 45 degrees, the
more normally distributed the data set is
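A rough sketch of what a QQ plot compares, in plain Python. This is not Statpro's implementation: the plotting position (i + 0.5) / n is one common convention assumed here, and the sample values are made up.

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Pair each sorted observation with the value a normal
    distribution (same mean and stddev) would expect at that rank."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # (i + 0.5) / n is an assumed plotting position, not the only choice
    expected = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, ordered))

# Hypothetical sample, for illustration only
sample = [12, 15, 17, 19, 20, 21, 23, 25, 28, 20]

for expected, observed in qq_points(sample):
    print(f"expected {expected:6.2f}   observed {observed:6.2f}")
```

Plotting observed against expected gives the QQ plot; if the data are close to normal, the points fall near the 45-degree line where observed equals expected.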
The height of Australian women is
normally distributed with a mean of 164 cm
and a standard deviation of 8 cm.
If we chose a woman at random in this
class, what is the probability that she would
be taller than 174 cm? (Are you completely
happy with your answer? Why or why not?)
If I’m interested in recruiting the tallest 1%
of women for a basketball team, at what
height will I accept people?
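One way to check your numerical answers to the two height questions, sketched in Python. It takes the stated mean and standard deviation at face value and does not address whether this class is a fair sample:

```python
from statistics import NormalDist

# Heights of Australian women as stated: mean 164 cm, stddev 8 cm
heights = NormalDist(mu=164, sigma=8)

# Probability of being taller than 174 cm (upper tail)
p_taller = 1.0 - heights.cdf(174)

# Height with 99% of women below it, i.e. the cut-off for the tallest 1%
cutoff_top_1pct = heights.inv_cdf(0.99)

print(f"P(height > 174 cm) = {p_taller:.3f}")
print(f"Tallest 1% are above about {cutoff_top_1pct:.1f} cm")
```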
Estimates of rates of learning disabilities
vary but according to some estimates, about
4% of children are thought to have dyslexia.
A school with 130 children is considering
whether to run a special program for these
children. It decides that it will do it if more
than 5 children in their school qualify.
What is the probability that the school will
run a special program?
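A sketch of two ways to approach this question in Python, assuming each of the 130 children independently has a 4% chance of having dyslexia (a binomial model, which is an assumption about the school). The first is the exact binomial sum; the second is the normal approximation with a continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 130, 0.04   # number of children, assumed rate of dyslexia

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exact: P(X > 5) = 1 - P(X <= 5)
p_exact = 1.0 - sum(binomial_pmf(k, n, p) for k in range(0, 6))

# Normal approximation with the binomial's mean and standard deviation;
# "more than 5" becomes "above 5.5" (continuity correction)
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
p_approx = 1.0 - approx.cdf(5.5)

print(f"Exact binomial:       {p_exact:.3f}")
print(f"Normal approximation: {p_approx:.3f}")
```

The two answers are close but not identical, which is worth keeping in mind when the expected count (here 5.2) is small.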
More on reading (dis)abilities
In a 1992 study, 415 children were tested on their
reading ability. Their scores were then compared
against a “norm-referenced group” (a large
random sample with normally distributed reading
abilities). Since 63 (or 15%) of the children
performed below the 25th percentile from the
norm-referenced group, the study concluded that
about 15% of all children are learning disabled.
(Study conducted by US National Institute for
Child Health and Human Development)
Discuss in small groups
Solve using Excel
Discuss as a class – applications to other
Quick review: Binomial
Only two outcomes, success or failure, on each trial.
Probability of outcome remains constant
from trial to trial -- statistical independence.
Quick review: Binomial
The number of students getting financial aid out
of a sample of 10. On average, 10% of students
get financial aid.
The number of Australian-made cars in a
company car-park with 40 spaces. On average,
80% of cars at this company are Australian-made
The number of defective items in a sample of 50.
Process produces about 5% defectives usually.
Normal distribution examples
The number of hours to complete a project.
The weight of a Mars bar in grams.
The number of kilos lost during the first
month of a weight loss program.
What did we do?
Calculated probabilities for the normal distribution
Discussed the assumptions needed to use the normal distribution
Talked about the normal approximation to the binomial
Excel functions covered today
=normdist(x, mean, stddev, 1)
=norminv(p, mean, stddev)
QQ plot in Statpro for determining whether
a data set is normally distributed
What did you learn today that makes a
difference to the way you manage?
What are the three most important things to
remember from today’s lecture?
Read supplementary readings on Principles
of Variation, Testing the Mean and
Try webpage demonstrations of Central