Statistical Inference and The Normal Distribution
STA 570 401-402, Spring 2006

Review of Inference
The group of all individuals we are interested in is called the population. We rarely actually observe the entire population. If our question is "will extending the school year by 5 days increase student learning?" then we are interested in ALL students. We are never going to design an experiment involving ALL students.

Parameters
Numerical aspects of the population are called parameters. If our population is all people who drive to work, one parameter is their average drive time each morning. Because we rarely see the entire population, parameters are typically unknown. The goal of inference is to estimate these unknown parameters.

Samples and Statistics
We typically observe a small fraction of the population (we'd prefer to see all of it, but that just typically isn't practical). The group we observe is called the sample. We see them, we can measure them, etc. Any numerical aspect of the sample is called a statistic. Suppose again we are interested in the drive time of all drivers, and we send out a survey. The people who respond are the sample. Their average drive time is called the sample mean.

Statistics to Parameters
Fortunately, probability theory tells us that if our sample is drawn correctly (i.e. randomly), then our statistic will be close to our parameter, allowing us to make educated guesses about the parameter of interest. Drawing a random sample is sometimes easy, and sometimes difficult (stay tuned, we'll cover this more as we go). For now, we're going to assume we have a good sample.

Remember the main idea
We do NOT see the parameter; we DO see the statistic. Probability theory says there is a little "tether" connecting the two. Imagine seeing a hot air balloon (the statistic) on a tether over some treetops. You can't see where on the ground it is tethered (the parameter), but you can make a good guess.
Some limitations of the tether idea
I like the tether idea, but there are limitations on how far it applies. The "tether" is only probabilistic. It says things like "there is a 95% chance the statistic will be within (some number) of the parameter" and "there is a 99% chance the statistic will be within (some other number) of the parameter", and so on.

More on tethers, continued
To get a larger probability, you have to increase the length of the tether. This, I hope, is intuitive. To be more sure of the result, you have to give the statistic more room to move. If you're aiming at a dartboard, there is a small chance you'll hit the little circle in the middle. There is a larger chance you'll hit the dartboard (it's bigger). There is a great chance you'll hit the wall. The bigger the target, the better the chance of hitting it. Hence, the longer the tether, the better the chance of finding the parameter.

Binomial distribution review
Recall a binomial setting consists of a set of
1) dichotomous (two-valued) responses
2) an equal chance of success for each response
3) independence (responses do not influence each other)

Inference with Binomial distributions
Under the binomial setting, if p is the population proportion, then the sample proportion phat has a 95% chance of being within the region p ± 1.96 sqrt(p(1-p)/n). In practice, p is unknown, so we use phat to construct our tether length as well. The length of the tether (really called the "margin of error") is 1.96 sqrt(phat(1-phat)/n).

Binomial Confidence intervals
In practice, suppose we have n observations in a binomial setting. We can use those to compute phat (p remains unknown). A 95% confidence interval for p is
phat ± 1.96 sqrt(phat(1-phat)/n)
To get a 90% confidence interval, replace 1.96 with 1.645. To get a 99% confidence interval, replace 1.96 with 2.576.
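The confidence-interval recipe above is easy to mechanize. A minimal sketch in Python (the course itself uses SAS, so this is just an illustration; the function name `binomial_ci` is my own):

```python
from math import sqrt

def binomial_ci(successes, n, z=1.96):
    """Approximate confidence interval for a population proportion p.

    z is the normal coefficient: 1.645 for 90%, 1.96 for 95%,
    2.576 for 99% confidence.
    """
    phat = successes / n
    margin = z * sqrt(phat * (1 - phat) / n)  # the "tether length"
    return phat - margin, phat + margin
```

Note that the interval is built entirely from the sample (phat and n); the unknown p never appears, which is exactly why the interval is usable in practice.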
Typically large values are used, but you could in theory find a 50% confidence interval, where the coefficient is 0.674.

Another example
Does a personal phone call make students more likely to enroll? Suppose you sample 200 admitted students at random and make a personal phone call encouraging them to attend your university. Of those 200, 127 eventually enroll. Construct a 90% confidence interval for the proportion of called students who enroll.

Another example, continued
Population = all students who may receive a phone call. Sample = the students you actually called (the 200). phat = 127/200 = 63.5%. For 90% confidence, the margin of error is 1.645 sqrt(phat(1-phat)/n) = 1.645 sqrt(0.635 × 0.365/200) = 1.645 × 0.034 = 0.056. The 90% confidence interval is 0.635 ± 0.056, or between 57.9% and 69.1%.

To repeat, because it's important
If you want more confidence (a better chance of your interval containing the parameter), you have to increase the width of your interval (that's why the coefficients increase, from 1.645 for 90% to 2.576 for 99%). Larger sample sizes produce more accuracy than smaller sample sizes.

Normal Distributions
So where did the 1.96, the 1.645, and the 2.576 come from? Answer: the normal distribution, also known as the Gaussian distribution, the error function, the "bell curve", and probably others. In any case, the normal distribution is your friend. You've probably all seen a bell curve...

The Normal distribution is common
Lots of real data follows a normal shape. For example:
1) Many/most biometric measurements (heights, femur lengths, skull diameters, etc.)
2) Scores on many standardized exams (IQ tests) are forced into a normal shape before reporting
3) Many quality control measurements, if you take the log first, have a normal shape.
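The trade-off just described (more confidence requires a wider interval) can be checked numerically. A short Python sketch using the phone-call numbers from the example:

```python
from math import sqrt

phat, n = 127 / 200, 200  # the phone-call example: 127 of 200 enrolled

# Normal coefficients for common confidence levels: as the
# confidence level rises, so does z, and so does the margin.
for conf, z in [(90, 1.645), (95, 1.96), (99, 2.576)]:
    margin = z * sqrt(phat * (1 - phat) / n)
    print(f"{conf}% confidence: {phat:.3f} +/- {margin:.3f}")
```

Running this reproduces the 0.635 ± 0.056 interval at 90% and shows the margin widening as the confidence level climbs.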
When sampling from a normal
Normal distributions are typically characterized by two numbers: their mean or "expected value", which corresponds to the peak, and their "standard deviation", which is the distance from the mean to the inflection point. Large standard deviations result in "spread out" normals. Small standard deviations result in "strongly peaked" distributions.

[Figure: two normals corresponding to different standard deviations; one with mean = 100, std. dev. = 16, and one with mean = 100, std. dev. = 4.]

Probabilities from a Normal distribution
Normal distributions have a nice property that, knowing the mean (μ) and standard deviation (σ), we can tell how much data will fall in any region. Example: the normal distribution is symmetric, so 50% of the data is smaller than μ and 50% is larger than μ.

More Normal Probabilities
It is always true that about 68% of the data appears within 1 standard deviation of the mean (so about 68% of the data appears in the region μ ± σ).

Yet more normal probabilities
It is also true that about 95% of the data appears within 2 standard deviations of the mean, and about 99.7% of the data appears within 3 standard deviations of the mean (so it's VERY rare to go beyond 3 standard deviations). Preview of coming attractions: the EXACT number is that 95% of the data is within 1.96 standard deviations of the mean. That's where the 1.96 comes from.

Computing more general probabilities
Suppose you want to know how much data appears within 1.5 standard deviations of the mean, or how much data appears between 1.3 and 1.7 standard deviations of the mean. Real answer: use SAS or any of several other programs.

Another way
There is another way of computing normal probabilities that is 1) the way it used to be done, back in pre-handy-computer days, and 2) useful for understanding more about the normal distribution. The number of standard deviations an observation is from the mean is called the Z-score for that observation.
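The 68/95/99.7 percentages above can be verified directly. A sketch in Python, with the standard library's `statistics.NormalDist` standing in for the SAS lookup mentioned in the text:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)  # the IQ example: mean 100, sd 16

# Fraction of the data within k standard deviations of the mean.
# These fractions are the same for every normal distribution,
# regardless of its mean and standard deviation.
for k in (1, 1.96, 2, 3):
    frac = iq.cdf(100 + k * 16) - iq.cdf(100 - k * 16)
    print(f"within {k} sd: {frac:.4f}")
```

The k = 1.96 row comes out to almost exactly 0.9500, which is where the 1.96 coefficient in the confidence intervals comes from.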
Z-score examples
If μ = 100 and σ = 16 (this is true of IQ scores in the U.S.), then an observation X = 125 is 25 points above the mean, which corresponds to 25/16 = 1.5625 standard deviations above the mean. In general, the Z-score for an observation X is
Z = (X - μ)/σ
Observations above the mean get positive Z-scores; observations below the mean get negative Z-scores.

Computing probabilities with Z-scores
Fortunately, the Z-score is all you need to know to compute probabilities from a normal distribution. The reason is that Z-scores map directly to percentiles. For each Z-score, SAS can provide the percentile (to be shown in lab). For example, if the Z-score is 1, the percentile is 84.13%. If the Z-score is 2.3, the percentile is 98.93%.

Probabilities between Z-scores
Again, IQ scores are normally distributed with mean 100 and standard deviation 16. How many people have IQ scores between 90 and 120? Compute the corresponding Z-scores. For 90, the Z-score is (90-100)/16 = -0.625. For 120, the Z-score is (120-100)/16 = 1.25. Find the corresponding percentiles (SAS). The percentile for Z = 1.25 is 89.43%. The percentile for Z = -0.625 is 26.60%. The amount between these is 89.43 - 26.60 = 62.83%.

Comparing observations from different normal distributions
The central idea is that a Z-score corresponds to a percentile for the observation. If you have observations from multiple normal distributions, you can compute the Z-score for each observation and compare which has the "better" score.

Example
Suppose you have two students, one with a 23 on the ACT (mean 22 and standard deviation 3) and another with a 1220 on the SAT (mean 900 and standard deviation 250). The Z-score for the ACT student is (23-22)/3 = 0.33, while the Z-score for the SAT student is (1220-900)/250 = 1.28. The SAT student performed much better (relative to peers on the exam).
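Both worked examples in this section can be reproduced with a short Python sketch (again, `statistics.NormalDist` stands in for the SAS percentile lookup; the helper `z_score` is my own name for the formula above):

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (or below) the mean."""
    return (x - mu) / sigma

# IQ scores between 90 and 120 (mean 100, sd 16):
z_lo = z_score(90, 100, 16)    # -0.625
z_hi = z_score(120, 100, 16)   # 1.25
between = std.cdf(z_hi) - std.cdf(z_lo)
print(f"between 90 and 120: {between:.4f}")  # about 0.6283

# Comparing exams via Z-scores:
act = z_score(23, 22, 3)        # about 0.33
sat = z_score(1220, 900, 250)   # 1.28
print(f"ACT percentile: {std.cdf(act):.4f}")
print(f"SAT percentile: {std.cdf(sat):.4f}")
```

The percentile gap (roughly the 63rd percentile for the ACT student versus roughly the 90th for the SAT student) is what justifies the conclusion that the SAT student did better relative to peers.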