Module on SAMPLING DISTRIBUTIONS Stanley L. Sclove, PhD
CONTENTS

EXERCISES
Illustration with N = 4 and n = 2
  POPULATION, N = 4: mean, variance, standard deviation
  DEFINITION OF SAMPLING DISTRIBUTION
  SAMPLING WITH REPLACEMENT, n = 2: distribution of sample mean: mean, variance and standard deviation
  SAMPLING WITHOUT REPLACEMENT, n = 2: distribution of sample mean: mean, variance and standard deviation
SUMMARY
Exercises to accompany PBS Chapter 3 (Producing Data)

Consider a population of size N = 4. The individuals are A, B, C, D, and their weights are 145, 150, 155, and 170 pounds, respectively.

1.
  (a) Compute the population mean.
  (b) Compute the population variance.
  (c) Compute the population standard deviation.

2. List all possible samples of size n = 2, taken without replacement from this population of size N = 4.
  (a) How many samples are there?
  (b) List the samples, showing the individuals and their weights.
  (c) Compute the mean of each sample.
  (d) Compute the mean of these means. Verify that it is equal to the population mean.
POPULATION
Computation of population parameters
Four individuals, heights in cm. N = 4

Individual   Height (cm.)   Deviation   Distance to mean   Squared deviation
A            145            -10         10                 100
B            150             -5          5                  25
C            155              0          0                   0
D            170             15         15                 225
Sum          620                                           350

Divide each sum by N = 4:
  mean μ = 620 / 4 = 155 cm.
  variance σ² = 350 / 4 = 87.5 cm²
  SD σ = square root of 87.5 = 9.35 cm.
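The population computation can be checked with a few lines of Python. This is a minimal sketch; the variable names (heights, mu, sigma2, sigma) are illustrative choices and do not come from the original worksheet.

```python
import math

# Heights (cm) of the four individuals in the population.
heights = {"A": 145, "B": 150, "C": 155, "D": 170}
N = len(heights)

# Population mean: sum of the heights divided by N.
mu = sum(heights.values()) / N

# Deviations from the mean and their squares, as in the worksheet columns.
deviations = {name: h - mu for name, h in heights.items()}
squared = {name: d ** 2 for name, d in deviations.items()}

# Population variance divides by N (not N - 1); the SD is its square root.
sigma2 = sum(squared.values()) / N
sigma = math.sqrt(sigma2)

print(f"mean = {mu}, variance = {sigma2}, SD = {sigma:.2f}")
```

Note that the divisor is N, because the four individuals are the whole population rather than a sample from it.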
DEFINITION The sampling distribution of a statistic
Think of all possible samples from the population. Imagine that the statistic is computed for each of these samples and that the resulting values are formed into a histogram. This histogram is the sampling distribution of that statistic: its distribution across all possible samples.
Sampling WITH Replacement
N = 4, n = 2

The ordered samples are listed first, because they are equiprobable.

Ordered samples      Unordered sample   16 x prob   Sample heights   Mean
( A, A )             { A, A }           1           { 145, 145 }     145.0
( A, B ) ( B, A )    { A, B }           2           { 145, 150 }     147.5
( A, C ) ( C, A )    { A, C }           2           { 145, 155 }     150.0
( A, D ) ( D, A )    { A, D }           2           { 145, 170 }     157.5
( B, B )             { B, B }           1           { 150, 150 }     150.0
( B, C ) ( C, B )    { B, C }           2           { 150, 155 }     152.5
( B, D ) ( D, B )    { B, D }           2           { 150, 170 }     160.0
( C, C )             { C, C }           1           { 155, 155 }     155.0
( C, D ) ( D, C )    { C, D }           2           { 155, 170 }     162.5
( D, D )             { D, D }           1           { 170, 170 }     170.0
Total                                   16

Sampling Distribution
Computation of mean and variance of sample mean

Value v   16 x prob   v x (16 x prob)   (v - μ)²
145.0     1            145.0            100.00
147.5     2            295.0             56.25
150.0     3            450.0             25.00
152.5     2            305.0              6.25
155.0     1            155.0              0.00
157.5     2            315.0              6.25
160.0     2            320.0             25.00
162.5     2            325.0             56.25
165.0     0              0.0            100.00
167.5     0              0.0            156.25
170.0     1            170.0            225.00
Total     16          2480.0            700.00 (each squared deviation weighted by 16 x prob)

Divide the totals by 16:
  mean of the sample mean = 2480.0 / 16 = 155.0 = μ
  variance of the sample mean = 700.00 / 16 = 43.75, or 87.5/2, or σ²/n
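The enumeration for sampling with replacement can be reproduced in code. The sketch below assumes the heights given for the population; the names ordered_samples, mean_counts, mean_of_means and var_of_means are illustrative, and exact fractions are used so that the comparisons involve no rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

heights = {"A": 145, "B": 150, "C": 155, "D": 170}
n = 2

# All N**n = 16 ordered samples; each has probability 1/16.
ordered_samples = list(product(heights, repeat=n))
total = len(ordered_samples)

# Tally how many ordered samples give each value of the sample mean.
mean_counts = Counter(
    Fraction(sum(heights[i] for i in sample), n) for sample in ordered_samples
)

# Mean and variance of the sampling distribution of the sample mean.
mean_of_means = sum(v * c for v, c in mean_counts.items()) / total
var_of_means = sum(
    c * (v - mean_of_means) ** 2 for v, c in mean_counts.items()
) / total

for v in sorted(mean_counts):
    print(f"{float(v):6.1f}   {mean_counts[v]}/{total}")
print("mean:", float(mean_of_means), "variance:", float(var_of_means))
```

Values such as 165.0 and 167.5 never occur as a sample mean, so they simply do not appear in the tally.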
Sampling WITHOUT Replacement
N = 4, n = 2

Ordered samples      Unordered sample   Prob   Sample heights   Mean
( A, B ) ( B, A )    { A, B }           1/6    { 145, 150 }     147.5
( A, C ) ( C, A )    { A, C }           1/6    { 145, 155 }     150.0
( A, D ) ( D, A )    { A, D }           1/6    { 145, 170 }     157.5
( B, C ) ( C, B )    { B, C }           1/6    { 150, 155 }     152.5
( B, D ) ( D, B )    { B, D }           1/6    { 150, 170 }     160.0
( C, D ) ( D, C )    { C, D }           1/6    { 155, 170 }     162.5
Total                                   1

Sampling Distribution

Value v   6 x prob   (v - μ)²
147.5     1           56.25
150.0     1           25.00
152.5     1            6.25
157.5     1            6.25
160.0     1           25.00
162.5     1           56.25
Total     6          175.00
Sum of values v = 930.0

Divide the totals by 6:
  mean of the sample mean = 930.0 / 6 = 155.0 = μ
  variance of the sample mean = 175.00 / 6 = 29.17, which equals (σ²/n)(FPC), where FPC = (N - n)/(N - 1)
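The without-replacement case, including the finite population correction, can be verified the same way. This is a sketch under the same assumptions as the worksheet (N = 4, n = 2); the names unordered_samples, fpc and expected are illustrative.

```python
from fractions import Fraction
from itertools import combinations

heights = {"A": 145, "B": 150, "C": 155, "D": 170}
N, n = len(heights), 2

# The 6 unordered samples of size 2; each has probability 1/6.
unordered_samples = list(combinations(heights, n))
means = [Fraction(sum(heights[i] for i in s), n) for s in unordered_samples]

# Mean and variance of the sample mean across all equally likely samples.
mean_of_means = sum(means) / len(means)
var_of_means = sum((v - mean_of_means) ** 2 for v in means) / len(means)

# Population variance and the finite population correction (FPC).
mu = Fraction(sum(heights.values()), N)
sigma2 = sum((h - mu) ** 2 for h in heights.values()) / N
fpc = Fraction(N - n, N - 1)
expected = sigma2 / n * fpc

print("mean:", float(mean_of_means))
print("variance:", float(var_of_means), "vs (sigma^2/n) * FPC:", float(expected))
```

Here FPC = 2/3, so the variance of the sample mean drops from 43.75 (with replacement) to about 29.17.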
SUMMARY
SAMPLING DISTRIBUTIONS
The sampling distribution is the bridge between probability and statistics.

POPULATION
  mean: μ
  variance: σ²
  standard deviation: σ

SAMPLING DISTRIBUTION OF THE MEAN (distribution of the sample mean)
  mean: equal to the mean of the parent population
  variance: depends upon the sampling method
    SAMPLING WITH REPLACEMENT: variance of the parent population, divided by n, i.e., σ²/n
    SAMPLING WITHOUT REPLACEMENT: (σ²/n) x FPC, where FPC = (N - n)/(N - 1)
    Usually n << N, so FPC is close to 1 and can be ignored.

CENTRAL LIMIT THEOREM
  Even if the parent distribution is very skewed, the distribution of the sample mean is approximately Normal, unless n is small.
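The Central Limit Theorem statement can be illustrated by simulation. The sketch below is not part of the original module: it assumes an exponential parent distribution with mean 1 as the "very skewed" population, and the sample sizes (2 and 50), the number of replications (5000), the seed, and the helper names skewness and sample_means are all arbitrary illustrative choices.

```python
import random
from statistics import fmean, pstdev


def skewness(xs):
    """Moment-based skewness: average cubed standardized deviation."""
    m, s = fmean(xs), pstdev(xs)
    return fmean([((x - m) / s) ** 3 for x in xs])


def sample_means(n, reps, rng):
    """Means of `reps` independent samples of size n from an exponential parent."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


rng = random.Random(0)  # fixed seed so the run is repeatable

means_small = sample_means(2, 5000, rng)   # small n: still clearly skewed
means_large = sample_means(50, 5000, rng)  # larger n: closer to Normal

skew_small = skewness(means_small)
skew_large = skewness(means_large)

print(f"skewness of sample mean, n = 2:  {skew_small:.2f}")
print(f"skewness of sample mean, n = 50: {skew_large:.2f}")
```

A Normal distribution has skewness 0, so the drop in skewness as n grows is one rough indication that the distribution of the sample mean is approaching Normal; a histogram of each list would show the same thing visually.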