Variable    N    Mean     Median   TrMean   StDev    SE Mean
Drink       50   17.250   17.050   17.168   2.998    0.424
Risky       50   40.984   40.900   40.848   4.380    0.619
South       50   0.2200   0.0000   0.1818   0.4185   0.0592
Income98    50   25677    25447    25455    3681     521
MADDtota    50   6.540    7.000    6.523    1.971    0.279
ENFORCE     50   6.400    6.000    6.409    2.339    0.331
YouthPEE    50   6.740    7.000    6.750    2.732    0.386
LAWS        50   5.960    6.000    5.955    2.338    0.331

Variable    Minimum   Maximum   Q1       Q3
Drink       10.300    24.700    15.500   19.275
Risky       33.900    51.500    37.525   43.900
South       0.0000    1.0000    0.0000   0.0000
Income98    19635     37108     22567    28112
MADDtota    3.000     11.000    5.000    8.000
ENFORCE     1.000     12.000    5.000    8.000
YouthPEE    1.000     12.000    4.750    9.000
LAWS        1.000     11.000    4.750    7.000

Describe the univariate characteristics of your variables:

I operationalized underage drinking using data available at:
http://www.samhsa.gov/oas/NHSDA/99YouthState/appb.htm#b1b
These data are estimates of the percentage of 12-17 year olds who reported using alcohol in the past month during 1999. The average per state is 17.250, with a standard deviation of 2.998. The states with the lowest percentages were Utah (10.3%) and Virginia (12.8%). Utah's value is particularly low, nearly a full standard deviation below the next lowest score, indicating that youths in Utah are quite different from the rest of the country (or very reluctant to admit they are the same!). The states with the highest incidence of youth drinking were North Dakota (24.7%) and Montana (23.6%). These numbers, along with high values for South Dakota (21.2), Wyoming (22.1), and Colorado (20.8), make it appear that underage drinking may be a particular problem in rural states.

[Histogram: "State % of 12-17 Year Olds Who Report Using Alcohol in Past Month, 1999" -- frequency distribution of Drink, x-axis from 10.0 to 25.0]

Steps of Hypothesis Testing

1. State the research hypothesis (HR) and the null hypothesis (H0). Choose a p (probability) value, most likely .05, weighing the chance of a Type I error against a Type II error.
2. Choose the appropriate test.
3. Compute the test statistic.
4. Get the critical value.
5. Compare the test statistic with the critical value.
6.-8. Make your conclusion with a probability level:
   If test statistic > critical value: "Reject the null hypothesis and temporarily accept the research hypothesis at the (.__) level," where .__ is given by p.
   If test statistic < critical value: "Fail to reject the null hypothesis at the (.__) level."

The z test:

Z = (X̄ − μ) / (σ / √n)

When is the z test appropriate? When we have population parameters for one group and sample statistics for another.

What is the independent variable and what is the dependent variable? The INDEPENDENT VARIABLE is whatever defines the groups you are comparing. If you compare the mean ideology of a sample of Republicans to the mean ideology of all Democrats, your hypothesis must be: party affiliation affects ideology. Be sure to take time to figure out what the DV and IV are for these tests.

Directional vs. Non-Directional Hypotheses

It is better to specify what kind of relationship we expect, positive or negative. A non-directional hypothesis doesn't specify a direction:

H1: μ1 ≠ μ2

This could mean that μ1 > μ2 or that μ1 < μ2. A non-directional hypothesis is called a "two-tailed" hypothesis: it looks in two directions, above and below, and we will reject the null if we find compelling evidence of either. A directional hypothesis, such as:

H1: μ1 > μ2

is referred to as a "one-tailed" test. What is the null hypothesis here? H0: μ1 ≤ μ2. So we will reject the null if and only if the sample mean of group 1 is demonstrably above the population mean of group 2. The critical values are different:

Level of Significance   One-Tail   Two-Tail
.05                     1.65       1.96
.01                     2.33       2.58
.001                    3.09       3.29

SAMPLING DISTRIBUTION

The sampling distribution is the distribution of all possible sample means that could be drawn from the population. Key point: how does the z-score relate to hypothesis testing with the z test?
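The z-test arithmetic just described can be sketched in a few lines of Python. This is an illustration only, not part of the original notes; the function name and the sample numbers are invented for the example.

```python
import math

def z_test(sample_mean, pop_mean, pop_sd, n):
    """One-sample z test: how many standard errors the sample
    mean lies from the hypothesized population mean."""
    se = pop_sd / math.sqrt(n)          # standard error of the mean
    return (sample_mean - pop_mean) / se

# Hypothetical numbers: a sample of 100 with mean 52, against a
# population with mean 50 and standard deviation 10.
z = z_test(52, 50, 10, 100)
print(round(z, 2))                      # 2.0
# One-tailed test at .05: 2.0 > 1.65, so reject H0.
# Two-tailed test at .05: 2.0 > 1.96, so reject H0 here as well.
```

Comparing the computed z against the critical values in the table above completes steps 4 and 5 of the procedure.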
The Z-test statistic is the Z-score of a particular distribution: the sampling distribution of sample means. This is the frequency distribution that would be obtained by calculating the means of all theoretically possible samples of a designated size that could be drawn from a given population. Huh? We have a population. We take a sample of size n and compute the mean. We keep track by placing the mean on a frequency distribution, or graphing it in a histogram. Then we do this again and place the new mean value on the frequency distribution and on the histogram. Then we do this again and again until we have taken every possible sample. We end up with a distribution that begins to look normally distributed. The distribution of these means from samples is called the sampling distribution of sample means.

Take a sample of 3 students from a class, a 'population' of 6 students, and measure the students' GPAs:

Student   GPA
Susan     2.1
Karen     2.6
Bill      2.3
Calvin    1.2
Rose      3.0
David     2.4

Draw each possible sample from this 'population'. With samples of n = 3 from this population of N = 6 there are 20 different sample possibilities:

C(N, n) = N! / [n!(N − n)!] = (6·5·4·3·2·1) / [(3·2·1)(3·2·1)] = 720/36 = 20

Note that different samples would produce different means and standard deviations.

ONE SAMPLE (Susan, Karen, Bill):
X̄ = (2.1 + 2.6 + 2.3) / 3 = 7.0 / 3 ≈ 2.3
Standard deviation:
(2.1 − 2.3)² = (−.2)² = .04
(2.6 − 2.3)² = (.3)² = .09
(2.3 − 2.3)² = 0² = 0
s² = .13/3 = .043 and s = √.043 = .21
So this one sample of 3 has a mean of 2.3 and an sd of .21. What about other samples?

► A SECOND SAMPLE: X̄ = (Susan + Karen + Calvin) / 3 = (2.1 + 2.6 + 1.2) / 3 = 1.97, sd = .58
► THE 20th SAMPLE: X̄ = (Karen + Rose + David) / 3 = (2.6 + 3.0 + 2.4) / 3 = 2.67, sd = .25

SIMPLE EXAMPLE OF A SAMPLING DISTRIBUTION
► Assume the true mean of the population is known; in this simple case of 6 people it can be calculated as 13.6/6 = 2.27.
► The mean of the sampling distribution (i.e., the mean of all 20 sample means) is also 2.27: it equals the population mean exactly.
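The 20-sample GPA example above can be verified by brute-force enumeration. The following Python sketch (not part of the original notes) lists every possible sample of 3 and confirms that the mean of the sampling distribution equals the population mean:

```python
from itertools import combinations

# The six-student 'population' of GPAs from the example above.
gpas = {"Susan": 2.1, "Karen": 2.6, "Bill": 2.3,
        "Calvin": 1.2, "Rose": 3.0, "David": 2.4}

# Every possible sample of n = 3, ignoring order.
samples = list(combinations(gpas.values(), 3))
print(len(samples))                     # 20 possible samples

# Mean of the sampling distribution = mean of the 20 sample means.
sample_means = [sum(s) / 3 for s in samples]
grand_mean = sum(sample_means) / len(sample_means)
pop_mean = sum(gpas.values()) / 6
print(round(grand_mean, 2), round(pop_mean, 2))   # both print 2.27
```

This is the unbiasedness property stated below: averaging over all possible samples recovers the population mean.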
What is a Sampling Distribution?
► A distribution made up of every conceivable sample drawn from a population.
► A sampling distribution is almost always a hypothetical distribution, because typically you do not have, and cannot calculate, every conceivable sample mean.
► The mean of the sampling distribution is an unbiased estimator of the population mean, with a computable standard deviation.

Second Example from the Text

We have a population that contains only 5 individuals:
X1 = 1, X2 = 2, X3 = 3, X4 = 4, X5 = 5
Since this is the population, we know that μ = 3. We are going to draw a sample of 3. There are 60 ways this could be done if we pay attention to order, but only 10 if we ignore order:

Sample 1:  X5, X4, X3, so X̄ = 4.00
Sample 2:  X5, X4, X2, so X̄ = 3.67
Sample 3:  X5, X4, X1, so X̄ = 3.33
Sample 4:  X5, X3, X2, so X̄ = 3.33
Sample 5:  X5, X3, X1, so X̄ = 3.00
Sample 6:  X5, X2, X1, so X̄ = 2.67
Sample 7:  X4, X3, X2, so X̄ = 3.00
Sample 8:  X4, X3, X1, so X̄ = 2.67
Sample 9:  X4, X2, X1, so X̄ = 2.33
Sample 10: X3, X2, X1, so X̄ = 2.00

Frequency Distribution

So we can make a frequency distribution of all possible samples:

X̄      f
4.00   1
3.67   1
3.33   2
3.00   2
2.67   2
2.33   1
2.00   1

[Histogram: "Sampling Distribution of Sample Means", percent vs. sample mean, from 2.00 to 4.00]

Central Limit Theorem

If all possible random samples, each the size of your sample, were taken from any population, then the sampling distribution of sample means will have:
► a mean equal to the population mean μ
► a standard deviation equal to σ/√n

The sampling distribution will be normally distributed IF EITHER:
► the parent population from which you are sampling is normally distributed, OR
► the sample size is greater than n = 30.

The Sampling Distribution is a Probability Distribution
► The mean of each sample is a random variable, with each mean varying according to the laws of probability.
► The CLT says that if we have a sample size greater than 30, the sampling distribution of means will be normally distributed.
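The ten samples and the frequency table above can be checked by enumeration. This Python sketch (an illustration, not part of the original text) reproduces the frequency distribution:

```python
from itertools import combinations
from collections import Counter

population = [1, 2, 3, 4, 5]            # mu = 3

# All 10 unordered samples of size 3, and each sample's mean.
means = [round(sum(s) / 3, 2) for s in combinations(population, 3)]
freq = Counter(means)

# Print the frequency distribution, highest mean first;
# this matches the X-bar / f table above.
for m in sorted(freq, reverse=True):
    print(m, freq[m])
```

Notice the symmetric, peaked shape already emerging with only 10 possible samples, which is what the histogram of sample means displays.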
► The sampling distribution has a standard deviation, called the standard error, equal to σ/√n. The standard error is the margin of error on either side of the sample mean:

[Diagram: normal curve centered at the mean, marked at ±1 SE, ±2 SE, and ±3 SE]

Note: as the sample size increases, the margin of error gets smaller; that is, the sampling distribution gets more peaked, and thus your estimate gets more precise.

The Sampling Distribution as a Normal Distribution Curve (NDC)

Given a large enough sample size (n > 30), the sampling distribution from which you are drawing your one sample will be normally distributed regardless of the shape of the population's characteristic. Thus, you can legitimately compute a sample mean and sample standard deviation and make inferences about a population characteristic.

Illustration: an NDC from a uniform distribution. This shows when and how the sampling distribution approaches normality as the sample size increases, thereby letting us use the 68-95-99.7 rule when making inferences about the population. Consider the uniform distribution of a single die. In this example the mean would be 3.5:

μ = ΣX / N = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 21/6 = 3.5

You could also compute the sd of this distribution:

X    X − μ           (X − μ)²
1    1 − 3.5 = −2.5    6.25
2    2 − 3.5 = −1.5    2.25
3    3 − 3.5 = −0.5    0.25
4    4 − 3.5 = 0.5     0.25
5    5 − 3.5 = 1.5     2.25
6    6 − 3.5 = 2.5     6.25
Σ = 21                 17.5

σ² = Σ(X − μ)² / N = 17.5/6 = 2.9, so σ = √2.9 = 1.7

ILLUSTRATION OF SAMPLING DISTRIBUTIONS

Slide 13: A uniform distribution of the population we would get by repeatedly rolling one die and recording the number. Here the numbers 1 through 6 are equally probable, so the distribution is flat, not a normally distributed variable. In each of the ensuing figures we draw 500 different SRSs. The figures show graphically what happens to the shape of the sampling distribution as the size of each sample increases, from averaging over 2 throws of the die (n = 2) up to n = 20, with each sample size repeated 500 times. In all cases we record the mean and sd of the 500 samples.
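The single-die mean and standard deviation computed above (μ = 3.5, σ ≈ 1.7) can be checked directly. A short sketch, not part of the original notes:

```python
import math

faces = [1, 2, 3, 4, 5, 6]

# Population mean of one die: sum of faces over N.
mu = sum(faces) / len(faces)
# Population variance: mean squared deviation from mu.
var = sum((x - mu) ** 2 for x in faces) / len(faces)
sigma = math.sqrt(var)

print(mu, round(var, 2), round(sigma, 2))   # 3.5 2.92 1.71
```

These are the population parameters that the dice-rolling figures below repeatedly sample from.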
500 Samples of n = 2: We threw 2 dice, added up the total number of points on the 2 dice, and divided by 2 to obtain the mean. We repeated this process 500 times, each time recording the mean of the 2 dice, and used these outcomes to build the histogram.

[Histograms: 500 samples each of n = 2, n = 4, n = 6, n = 10, and n = 20]

Key Observations
► As the sample size increases, the mean of the sampling distribution comes to more closely approximate the true population mean, here known to be μ = 3.5.
► AND, this is critical: the standard error, that is, the standard deviation of the sampling distribution, gets systematically narrower.

Three main points about sampling distributions
► Probabilistically, as the sample size gets bigger, the sampling distribution better approximates a normal distribution.
► The mean of the sampling distribution will more closely estimate the population parameter as the sample size increases.
► The standard error (SE) gets narrower and narrower as the sample size increases. Thus, we will be able to make more precise estimates of the whereabouts of the unknown population mean.

THE MEAN OF THE SAMPLING DISTRIBUTION

A sampling distribution is made up of the means of all possible SRSs of the same size as your sample. Its mean (μ_X̄) will equal the population mean from which it was drawn. The distribution of sample means will be normally distributed, centered at the population mean, with a standard deviation of the sampling distribution called the standard error (SE).

ESTIMATING THE POPULATION MEAN

We are unlikely to ever see a sampling distribution, because it is often impossible to draw every conceivable sample from a population, and we never know the actual mean or the actual standard deviation of the sampling distribution. But here is the good news: we can estimate the whereabouts of the population mean from the sample mean, and use the sample's standard deviation to calculate the standard error.
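The dice experiment described above is easy to reproduce. The following Python sketch (an illustration, not part of the original notes; the seed is arbitrary) draws 500 samples at each sample size and reports the mean and spread of the resulting sampling distribution:

```python
import random
import statistics

random.seed(1)   # arbitrary seed, for repeatability

def sampling_distribution(n, reps=500):
    """Means of `reps` samples of `n` die rolls each."""
    return [statistics.mean(random.randint(1, 6) for _ in range(n))
            for _ in range(reps)]

for n in (2, 4, 6, 10, 20):
    means = sampling_distribution(n)
    print(n, round(statistics.mean(means), 2),
          round(statistics.stdev(means), 2))
# The mean column hovers near mu = 3.5 at every n, while the spread
# (the standard error) shrinks toward sigma/sqrt(n) = 1.71/sqrt(n).
```

Running it shows exactly the two key observations above: the center stays put and the spread narrows as n grows.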
The formula for computing the standard error changes depending on the statistic you are using, but essentially you divide the sample's standard deviation by the square root of the sample size. Don't confuse the standard deviation of your sample,

s = √[Σ(X − X̄)² / n]

with the standard deviation of the sampling distribution,

SE = σ / √n

What we want to do now is take the next step: to learn how to substantiate our conclusions, to back them up with analyses that reflect how much confidence we should have that our estimate of, say, the population mean, which is being estimated from our sample, is at or close to the true population mean.

Note that we rarely know the standard deviation of the population or the standard deviation of the sampling distribution. The standard error must therefore be estimated by using the standard deviation of your sample and dividing by the square root of N − 1.

The Standard Error for Samples:

SE = √[Σ(X − X̄)² / N] / √(N − 1),  or, same thing,  SE = s / √(N − 1)

What we are trying to do is locate the unknown whereabouts of the population mean. Probabilistically speaking, μ is at or somewhere on either side of the sample mean.

[Figure: normal distribution curve as a sampling distribution, centered at μ = X̄]

Two Steps in the Statistical Inferencing Process

1. Calculate "confidence intervals" from the sample mean and sample standard deviation, within which we can place the unknown population mean with some degree of probabilistic confidence.
2. Compute a "test of statistical significance" (risk statement), which is designed to assess the probabilistic chance that the true but unknown population mean lies within the confidence interval you just computed from the sample mean.

So, first we calculate confidence limits, and then we test for statistical significance, which is the probability of μ being within the CIs we computed. Both steps are required when making inferences about the whereabouts of the unknown population mean.
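The standard error estimate just given can be sketched in Python. This is an illustration only; the helper name and the sample values are invented, and the formula follows the notes' convention (sample sd computed with N in the denominator, then divided by √(N − 1)):

```python
import math

def standard_error(sample):
    """Estimate the SE of the mean from one sample, using
    SE = s / sqrt(N - 1), where s is the sample standard
    deviation computed with N in the denominator."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return s / math.sqrt(n - 1)

# Hypothetical three-observation sample (the GPA trio from earlier).
print(round(standard_error([2.1, 2.6, 2.3]), 3))   # 0.145
```

This value is the yardstick for building confidence intervals around the sample mean in the two-step process described next.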
Both the calculation of confidence intervals and the calculation of a measure of statistical likelihood are based on the probabilistic patterns of a sampling distribution. Together, the confidence limits and the statistical test tell us the probability of what would happen IF we sampled the population not once but an infinite number of times. That is, we are sampling from a sampling distribution. This kind of inferencing is the hallmark of statistics.

In Summary