
The Normal Distribution, Hypothesis Testing, and t tests

Chapter(s) in basic textbooks:
Wild, C.J. & Seber, G.A.F. (2000). Chance Encounters: A First Course in Data Analysis and Inference. John Wiley & Sons.
Howell (Chapters 3, 4, and 7)

The Normal distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where $\mu$ is the mean and $\sigma$ is the standard deviation.

The Normal distribution
The Belgian scholar Lambert Quetelet (1796-1874) recorded the chest measurements of 5738 Scottish soldiers. This was the first application of the Normal distribution to human data.

[Figure 6.2.1: Two standardized histograms with approximating Normal density curves. (a) Chest measurements of Quetelet's Scottish soldiers (in.); the Normal density curve has $\mu = 39.8$ in., $\sigma = 2.05$ in. (b) Heights of the 4294 men in the workforce database (cm); the Normal density curve has $\mu = 174$ cm, $\sigma = 6.57$ cm.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

The Normal distribution density curve
The curve is symmetric about the mean, so Mean = Median, with 50% of the distribution on each side of the mean (Figure 6.2.2).

Effects of $\mu$ and $\sigma$
(a) Changing $\mu$ shifts the curve along the axis ($\mu_1 = 160$, $\mu_2 = 174$, with $\sigma_1 = \sigma_2 = 6$).
(b) Increasing $\sigma$ increases the spread and flattens the curve ($\sigma_1 = 6$, $\sigma_2 = 12$, with $\mu_1 = \mu_2 = 170$).

Understanding the standard deviation
Probabilities and numbers of standard deviations:
68.3% chance of falling between $\mu - \sigma$ and $\mu + \sigma$
95.4% chance of falling between $\mu - 2\sigma$ and $\mu + 2\sigma$
99.7% chance of falling between $\mu - 3\sigma$ and $\mu + 3\sigma$

Standardizing
Take any value $X_i$ from a Normal population with mean $\mu$ and standard deviation $\sigma$. The value

$$Z = \frac{X_i - \mu}{\sigma}$$

tells us how many standard deviations $X_i$ lies from the mean.
$Z \sim N(0, 1)$ is the standard Normal distribution. $Z$ is also called a Z score or Normal deviate.

The sample mean has a sampling distribution
Consider sampling batches of Scottish soldiers and taking their chest measurements. The population mean is 39.8 in. and the population SD is 2.05 in.

[Figure 7.2.1: (a) 12 samples of size n = 6; (b) 12 samples of size n = 24. Horizontal axis: chest measurement (in.), 34 to 46.]

[Figure 7.2.2: Standardised histograms of the sample means from 100,000 samples of soldiers (n soldiers per sample), for (a) n = 6, (b) n = 24, (c) n = 100. Horizontal axis: sample mean of chest measurements (in.), 37 to 42.]

Mean and standard deviation of the sampling distribution
E(sample mean) = population mean
sd(sample mean) = population standard deviation / $\sqrt{\text{sample size}}$, the standard error of the mean (SEM)

$$E(\bar{X}) = E(X) = \mu, \qquad \mathrm{sd}(\bar{X}) = \frac{\mathrm{sd}(X)}{\sqrt{n}} = \frac{\sigma}{\sqrt{n}}$$

Sampling distribution of the mean
If random samples of size n are drawn from a Normal population, the means of these samples follow a Normal distribution. The Normal deviate of means,

$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}},$$

allows us to make probability statements about means.
For example, what is the probability of obtaining a random sample of nine measurements with a mean larger than 50 mm? The population has a mean of 47 mm and an SD of 12 mm. (If the answer is not obvious, try it at home…)

Sampling distribution of the mean
The distribution of means from a non-Normal population will not be exactly Normal. Is this a problem?
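The Normal density, standardizing, and the nine-measurement exercise can all be checked numerically. This is a minimal sketch that assumes Python 3.8+ and uses only the standard library's `statistics.NormalDist`. The helper names (`normal_pdf`, `z_score`, `prob_within`, `prob_mean_exceeds`) are hypothetical and chosen for illustration. The sketch reproduces the 68.3/95.4/99.7% areas. It works the exercise by converting the 50 mm threshold into a Normal deviate of the mean.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Z ~ N(0, 1)


def normal_pdf(x, mu, sigma):
    """Normal density f(x) with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any Normal X."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)


def prob_mean_exceeds(threshold, mu, sigma, n):
    """P(sample mean > threshold) for random samples of size n from N(mu, sigma^2)."""
    sem = sigma / math.sqrt(n)   # sd of the sampling distribution of the mean
    z = (threshold - mu) / sem   # Normal deviate of the mean
    return 1 - STD_NORMAL.cdf(z)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(prob_within(k), 3))
    # Quetelet's chest data: mu = 39.8 in., sigma = 2.05 in.
    print(round(z_score(43.9, 39.8, 2.05), 3))
    # Exercise: n = 9, mu = 47 mm, sigma = 12 mm, sample mean > 50 mm
    print(round(prob_mean_exceeds(50, 47, 12, 9), 4))
```

For the exercise, $\sigma_{\bar{X}} = 12/\sqrt{9} = 4$ mm, so $Z = (50 - 47)/4 = 0.75$ and $P(Z > 0.75) \approx 0.2266$.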
Central limit effect: histograms of sample means
[Figures: Histograms of sample means for samples of size n = 1, 2, 4, and 10, drawn from four populations: triangular, uniform, exponential, and quadratic U-shaped.]

Central Limit Theorem: When sampling from almost any distribution (any with a finite variance), $\bar{X}$ is approximately Normally distributed in large samples.
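The central limit effect seen in the exponential panels can be reproduced by simulation. This is a minimal sketch that assumes an Exponential population with rate 1, which has mean 1, SD 1, and skewness 2. The function names, the seed, and the 20,000 replications are arbitrary choices for illustration. For means of n observations, theory gives SD $1/\sqrt{n}$ and skewness $2/\sqrt{n}$. The printed skewness should therefore shrink toward 0, the Normal value, as n grows.

```python
import random
import statistics


def sample_means(n, reps, rng):
    """Means of `reps` random samples of size n from an Exponential(rate = 1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(values):
    """Moment coefficient of skewness (0 for a symmetric distribution such as the Normal)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, m)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


if __name__ == "__main__":
    rng = random.Random(2000)  # fixed seed so the run is reproducible
    for n in (1, 2, 4, 10):
        means = sample_means(n, 20_000, rng)
        print(f"n = {n:2d}: mean = {statistics.fmean(means):.3f}, "
              f"sd = {statistics.pstdev(means):.3f}, skewness = {skewness(means):.2f}")
```

Swapping `expovariate` for another generator, such as `rng.uniform(0, 1)`, gives the corresponding uniform panel.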
Population and sample
$E(\bar{X}) = \mu$ and $\mathrm{sd}(\bar{X}) = \sigma/\sqrt{n}$. However, the population $\sigma$ is usually unknown. A good estimate of $\mathrm{sd}(\bar{X})$ is the sample standard error of the mean.

The standard error of the mean
The standard error of the sample mean is an estimate of the standard deviation of the sample mean. It is therefore a measure of the precision of the sample mean as an estimate of the population mean. It is given by se($\bar{x}$) = sample standard deviation / $\sqrt{\text{sample size}}$:

$$\mathrm{se}(\bar{x}) = \frac{s_x}{\sqrt{n}}$$

Hypothesis testing
$H_0: \mu = \mu_0$ (null hypothesis)
$H_a: \mu \neq \mu_0$ (alternative hypothesis)
Type I error: rejecting the null hypothesis when it is in fact true, i.e. finding a difference that is not there. Its probability is $\alpha$.
Type II error: failing to reject the null hypothesis when it is in fact false, i.e. not finding a difference that is there. Its probability is $\beta$.
Power of a statistical test: $1 - \beta$, the probability of rejecting the null hypothesis when it is in fact false and should be rejected.

Type I and Type II errors
Type I: $\alpha$ is the specified significance level.
Type II: $\beta$ is generally unspecified and unknown.
Both types of error may be reduced simultaneously by increasing n.

The t distribution

$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

$\sigma_{\bar{X}}$ depends on the population parameter $\sigma$, which is almost always unknown.
Solution: use $s_{\bar{X}} = s/\sqrt{n}$ as an approximation to $\sigma_{\bar{X}}$. This is a good estimate only if n is VERY large!
Better strategy: use

$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$$

which is distributed according to a t distribution. For hypotheses concerning the mean, df = n - 1. As df → ∞, t converges to the Normal distribution.
[Figure: t densities for df = 1, df = 3, and df = ∞ (the standard Normal).]

Student's t-distribution
For random samples from a Normal distribution,

$$T = \frac{\bar{X} - \mu}{\mathrm{se}(\bar{X})}$$

is exactly distributed as Student(df = n - 1). However, methods based on this distribution for T work well even for small samples drawn from distributions that are quite non-Normal.
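A one-sample t test of $H_0: \mu = \mu_0$ can be sketched in a few lines. This sketch assumes only Python's standard library, so it does not compute t quantiles. Instead, it looks up two-tailed 5% critical values in a small hard-coded excerpt of a standard t table. The names `T_CRIT_975`, `one_sample_t`, and `reject_h0` are hypothetical, and the table covers only the df values listed. A statistics package would normally supply these quantiles. The sample values are hypothetical.

```python
import math
import statistics

# Two-tailed 5% critical values t_{0.975, df}, copied from a standard t table.
# Only these df values are included in this sketch.
T_CRIT_975 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}


def one_sample_t(sample, mu0):
    """Return (mean, se, t, df) for testing H0: mu = mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)   # sample SD with divisor n - 1
    se = s / math.sqrt(n)          # standard error of the mean, s / sqrt(n)
    t = (mean - mu0) / se
    return mean, se, t, n - 1


def reject_h0(t, df):
    """Two-tailed test at alpha = 0.05; raises KeyError if df is not in the table excerpt."""
    return abs(t) > T_CRIT_975[df]


if __name__ == "__main__":
    sample = [48, 52, 50, 54, 46]  # hypothetical measurements, n = 5
    mean, se, t, df = one_sample_t(sample, mu0=47)
    print(mean, round(se, 3), round(t, 3), df, reject_h0(t, df))
```

For these values, $\bar{x} = 50$, $s = \sqrt{10}$, and se $= \sqrt{10}/\sqrt{5} = \sqrt{2} \approx 1.414$. This gives $t = 3/\sqrt{2} \approx 2.12$, which is below the df = 4 critical value of 2.776. So $H_0: \mu = 47$ is not rejected at $\alpha = 0.05$.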
Robustness
Fortunately, the t test is robust: its validity is not seriously affected by moderate deviations from the normality assumption.
The effect of non-normality is greater for smaller $\alpha$, and the effect decreases as n increases.
For symmetric distributions, departure from normality has little effect.
The effect is much smaller for two-tailed testing than for one-tailed testing.

Variability about the mean
Reporting variability in data:
To describe the population being sampled, report $\bar{X}$ and SD.
To describe the precision of estimation of the population mean, report SE.
SE < SD (for n > 1), because $\mathrm{SE} = \mathrm{SD}/\sqrt{n}$.
2 × SE intervals: $\bar{X} \pm 2 s_{\bar{X}}$
A t table shows that this interval approximates the 95% confidence interval of the mean when n is moderately large, because the two-tailed 5% critical value of t is 2.000 at df = 60. For small samples the interval is too narrow; for example, the critical value is 2.776 at df = 4.
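This sketch contrasts SD, SE, and the $\bar{X} \pm 2 s_{\bar{X}}$ interval. It sets that interval against the t-based 95% confidence interval $\bar{X} \pm t_{0.975,\,n-1}\, s_{\bar{X}}$. It uses only the standard library, so the critical values come from a hard-coded excerpt of a t table (`T_CRIT_975`, a hypothetical name). The ten data values are also hypothetical.

```python
import math
import statistics

# Two-tailed 5% critical values of t (excerpt of a standard t table).
T_CRIT_975 = {4: 2.776, 9: 2.262, 29: 2.045, 60: 2.000}


def describe(sample):
    """Return (mean, SD, SE) of a sample; SE = SD / sqrt(n)."""
    sd = statistics.stdev(sample)
    return statistics.fmean(sample), sd, sd / math.sqrt(len(sample))


def two_se_interval(mean, se):
    """The rough mean +/- 2 SE interval."""
    return mean - 2 * se, mean + 2 * se


def t_interval_95(mean, se, df):
    """95% confidence interval for the mean using the tabulated t critical value."""
    t = T_CRIT_975[df]
    return mean - t * se, mean + t * se


if __name__ == "__main__":
    sample = [10, 12, 9, 11, 13, 8, 10, 12, 11, 14]  # hypothetical data, n = 10
    mean, sd, se = describe(sample)
    print(f"mean = {mean:.2f}, SD = {sd:.3f}, SE = {se:.3f}")
    print("mean +/- 2 SE:", [round(v, 2) for v in two_se_interval(mean, se)])
    print("95% t interval:", [round(v, 2) for v in t_interval_95(mean, se, len(sample) - 1)])
```

For these data, SE $= 1/\sqrt{3} \approx 0.577$, so the 2 × SE interval runs from about 9.85 to 12.15. The t interval, with $t_{0.975,9} = 2.262$, runs from about 9.69 to 12.31. With only n = 10 observations, the 2 × SE interval is noticeably narrower than the 95% confidence interval.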

Posted: 10/16/2009. Language: English. Pages: 29.
