# Normal

```					The Normal Distribution, Hypothesis Testing, and t tests
• Chapter(s) in basic textbook
• Wild & Seber (2000). Chance encounters: A first course in data analysis and inference. John Wiley & Sons.

• Howell (Chapters 3, 4, and 7)

The Normal distribution

    f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)

The Normal distribution
• Chest measurements of 5738 Scottish soldiers by Belgian scholar Lambert Quetelet (1796-1874)
• First application of the Normal distribution to human data

CHANCE ENCOUNTERS

[Figure 6.2.1: Two standardized histograms with approximating Normal density curves.
 (a) Chest measurements of Quetelet's Scottish soldiers (in.); Normal density curve has μ = 39.8 in., σ = 2.05 in.
 (b) Heights of the 4294 men in the workforce database (cm); Normal density curve has μ = 174 cm, σ = 6.57 cm.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

Wild & Seber © Wiley 2000

The Normal distribution density curve
• Is symmetric about the mean
• Mean = Median

[Figure 6.2.2: Normal density curve with 50% of the area on each side of the mean.]

Effects of μ and σ

(a) Changing μ shifts the curve along the axis
    σ1 = σ2 = 6; μ1 = 160, μ2 = 174

(b) Increasing σ increases the spread and flattens the curve
    μ1 = μ2 = 170; σ1 = 6, σ2 = 12

[Figure: two panels of Normal density curves illustrating (a) and (b); horizontal axis 140 to 200.]

Understanding the standard deviation σ

(c) Probabilities and numbers of standard deviations
• Shaded area = 0.683: 68% chance of falling between μ − σ and μ + σ
• Shaded area = 0.954: 95% chance of falling between μ − 2σ and μ + 2σ
• Shaded area = 0.997: 99.7% chance of falling between μ − 3σ and μ + 3σ

Standardizing
• For any Xi value from a Normal population with mean μ and standard deviation σ, the value

    Z = \frac{X_i - \mu}{\sigma}

  tells us how many standard deviations from the mean the value Xi is located
• Z ~ N(0,1): standard Normal distribution
• Z score
• Normal deviate
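
Worked example (a sketch only: the 43.9 in. measurement is a made-up value for illustration; the mean of 39.8 in. and standard deviation of 2.05 in. are the soldiers' figures quoted in these slides):

    Z = \frac{43.9 - 39.8}{2.05} = \frac{4.1}{2.05} = 2.0

Such a chest measurement would lie two standard deviations above the mean.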

The sample mean has a sampling distribution
Sampling batches of Scottish soldiers and taking chest measurements. Pop mean = 39.8 in, Pop sd = 2.05 in

[Figure 7.2.1 (a): 12 samples of size n = 6. Axes: sample number (1 to 12) against chest measurement (in.), 34 to 46.]

Twelve samples of size 24

[Figure 7.2.1 (b): 12 samples of size n = 24. Axes: sample number (1 to 12) against chest measurement (in.), 34 to 46.]

Histograms from 100,000 samples

[Figure 7.2.2: Standardised histograms of the sample means from 100,000 samples of soldiers (n soldiers per sample). Panels: (a) n = 6, (b) n = 24, (c) n = 100. Horizontal axis: sample mean of chest measurements (in.), 37 to 42.]

Mean and std dev of the sampling distribution
E(sample mean) = Population mean

sd(sample mean) = Population standard deviation / √(Sample size)
Standard error of the mean (SEM)

    E(\bar{X}) = E(X) = \mu, \qquad \mathrm{sd}(\bar{X}) = \frac{\mathrm{sd}(X)}{\sqrt{n}} = \frac{\sigma}{\sqrt{n}}
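
Worked example (a sketch using the soldiers' population sd of 2.05 in. and the three sample sizes shown in the histograms of sample means):

    n = 6:    sd(\bar{X}) = 2.05 / \sqrt{6}   \approx 0.84 in.
    n = 24:   sd(\bar{X}) = 2.05 / \sqrt{24}  \approx 0.42 in.
    n = 100:  sd(\bar{X}) = 2.05 / \sqrt{100} = 0.205 in.

Quadrupling the sample size from 6 to 24 halves the spread of the sample means.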

Sampling distribution of the mean
• If random samples of size n are drawn from a Normal population, the means of these samples will conform to a Normal distribution
• The Normal deviate of means

    Z = \frac{\bar{X} - \mu}{\mathrm{sd}(\bar{X})}, \qquad \mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}}

  allows us to make probability statements about means
• What is the probability of obtaining a random sample of nine measurements with a mean larger than 50 mm from a population having a mean of 47 mm and an SD of 12 mm? (If the answer is not obvious, try it at home…)
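
One way to work the question above (a sketch, assuming the population is Normal so that the sample mean is exactly Normal):

    sd(\bar{X}) = 12 / \sqrt{9} = 4 mm
    Z = (50 - 47) / 4 = 0.75
    P(\bar{X} > 50) = P(Z > 0.75) \approx 0.227

So roughly 23% of random samples of nine would have a mean above 50 mm.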

Sampling distribution of the mean
• The distribution of means from a non-Normal population will not be exactly Normal

• Is this a problem?

Central Limit Effect -- Histograms of sample means

[Figure: (a) Triangular parent distribution. Histograms of sample means for n = 1, 2, 4 and 10; horizontal axis 0.0 to 1.0.]

[Figure: (b) Uniform parent distribution. Histograms of sample means for n = 1, 2, 4 and 10; horizontal axis 0.0 to 1.0.]

[Figure: (a) Exponential parent distribution. Histograms of sample means for n = 1, 2, 4 and 10; horizontal axis starting at 0.]

[Figure: histograms of sample means for n = 1, 2, 4 and 10; horizontal axis 0.0 to 1.0.]

Central Limit Theorem:
When sampling from almost any distribution, X̄ is approximately Normally distributed in large samples.

Population and sample

    E(\bar{X}) = E(X) = \mu, \qquad \mathrm{sd}(\bar{X}) = \frac{\mathrm{sd}(X)}{\sqrt{n}} = \frac{\sigma}{\sqrt{n}}

• The population σ is unknown, so sd(X̄) cannot be computed exactly; a good estimate of it is the sample standard error of the mean

The standard error of the mean
The standard error of the sample mean is
• an estimate of the std dev of the sample mean
• a measure of the precision of the sample mean as an estimate of the population mean
• given by se(x̄) = Sample standard deviation / √(Sample size)

    \mathrm{se}(\bar{x}) = \frac{s_x}{\sqrt{n}}

Hypothesis testing
• H0: μ = μ0 (null hypothesis)
• Ha: μ ≠ μ0 (alternate hypothesis)
• Type I error: rejecting the null hypothesis when it is in fact true; its probability is α
  - Finding a difference that is not there…
• Type II error: not rejecting the null hypothesis when it is in fact false; its probability is β
  - Not finding a difference when it is there…
• Power of a statistical test: 1 – β, the probability of rejecting the null hypothesis when it is in fact false and should be rejected

Type I and Type II errors
• Type I: α is the specified significance level
• Type II: β is generally unspecified and unknown
• Both types of errors may be reduced simultaneously by increasing n

t distribution

    Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}

• σ is a property of the population: always unknown…
• Solution: use s(X̄) = s/√n as an approximation of σ/√n
  - Good estimate only if n is VERY large!
• Better strategy: utilize t

    t = \frac{\bar{X} - \mu}{s(\bar{X})}

  which is distributed according to a t distribution
  - For hypotheses concerning the mean: df = n – 1
  - As df → ∞, t converges to the Normal distribution

t distribution

[Figure: t density curves for df = 1, df = 3 and df = ∞.]

Student’s t-distribution
• For random samples from a Normal distribution,

    T = \frac{\bar{X} - \mu}{\mathrm{se}(\bar{X})}

  is exactly distributed as Student(df = n - 1)
• but methods we shall base upon this distribution for T work well even for small samples sampled from distributions which are quite non-Normal.

Robustness
• Fortunately, the t test is robust: validity is not seriously affected by moderate deviations from the normality assumption
  - The effect of non-normality is greater for smaller α, and it decreases as n increases
  - For symmetric distributions there is little effect of departure from normality
  - The effect is much smaller for two-tailed testing than for one-tailed testing

Reporting variability in data
• Describing the population being sampled: X̄ and SD
• Describing the precision of estimation of the population mean: SE
• SE < SD (whenever n > 1)

2 x SE intervals

    \bar{X} \pm 2\, s(\bar{X})

• Examination of a t table will show that this interval approximates the 95% confidence interval of the mean
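
For reference, the standard two-tailed 5% critical values of t (as found in any t table) show how the exact multiplier approaches 2 as df grows:

    df      t (two-tailed, α = 0.05)
    5       2.571
    10      2.228
    20      2.086
    30      2.042
    60      2.000
    ∞       1.960

The multiplier 2 is therefore a close approximation once df is about 20 or more, and it gives an interval that is too narrow for very small samples.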

```
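
The slides on the sampling distribution of the mean and the Central Limit Effect can be checked with a short simulation. The sketch below is not from the original deck: it assumes an exponential population with rate 1 (so mean 1 and standard deviation 1), and the function names, seed and number of samples are arbitrary choices for illustration. It compares the standard deviation of the simulated sample means with σ/√n for the sample sizes used in the slides.

```python
import random
import statistics
from math import sqrt


def simulate_sample_means(draw, n, n_samples, rng):
    """Return n_samples sample means, each computed from n calls to draw(rng)."""
    return [
        statistics.fmean(draw(rng) for _ in range(n))
        for _ in range(n_samples)
    ]


def exponential_draw(rng):
    # Assumed population: exponential with rate 1 (mean 1, standard deviation 1).
    return rng.expovariate(1.0)


rng = random.Random(2000)  # fixed seed so the run is repeatable
results = {}
for n in (1, 2, 4, 10):
    means = simulate_sample_means(exponential_draw, n, 20_000, rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(
        f"n={n:2d}  mean of means={results[n][0]:.3f}  "
        f"sd of means={results[n][1]:.3f}  sigma/sqrt(n)={1 / sqrt(n):.3f}"
    )
```

The mean of the sample means should stay close to the population mean for every n, while their standard deviation should shrink roughly as 1/√n. Plotting a histogram of `means` for each n would reproduce the gradual move toward a Normal shape shown in the Central Limit Effect figures.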
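
The one-sample t statistic from the slides, t = (X̄ − μ0) / (s/√n) with df = n − 1, can be computed with the standard library alone. This is a minimal sketch under stated assumptions: the ten chest measurements are invented for illustration, the function name `one_sample_t` is hypothetical, and the critical value 2.262 is the usual tabled two-tailed 5% value for df = 9.

```python
import statistics
from math import sqrt


def one_sample_t(data, mu0):
    """Return (t, df) for H0: mu = mu0, with t = (xbar - mu0) / (s / sqrt(n))."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)   # sample SD, with n - 1 in the denominator
    se = s / sqrt(n)             # standard error of the mean
    return (xbar - mu0) / se, n - 1


# Hypothetical chest measurements (in.), invented for illustration only.
sample = [38.5, 41.2, 40.1, 39.0, 42.3, 40.8, 37.9, 41.5, 39.6, 40.4]
t_stat, df = one_sample_t(sample, mu0=39.8)

T_CRIT_DF9 = 2.262  # two-tailed 5% critical value for df = 9, from a t table
reject = abs(t_stat) > T_CRIT_DF9
print(f"t = {t_stat:.3f}, df = {df}, reject H0 at alpha = 0.05: {reject}")
```

Comparing |t| with a tabled critical value mirrors the table-based approach of the slides; a statistics package would normally report an exact p-value instead.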