Hypothesis Testing and p-values
Sporiš Goran, PhD.
http://kif.hr/predmet/mki
http://www.science4performance.com/

Overview
We will discuss two approaches to hypothesis testing:
1) Using confidence intervals
2) Using critical t or z values, or p-values

Hypothesis Test
A statistical hypothesis is simply a claim about a population that can be put to the test by drawing a random sample.

Elements of a Hypothesis Test
1. The null hypothesis, $H_0$: specifies hypothesized values for one or more of the population parameters.
2. The alternative hypothesis, $H_A$: a statement that the population parameter is something other than the value specified by the null hypothesis.
3. The error level, or $\alpha$, which is just 1 minus the confidence level (expressed as a probability). For 95% confidence, $\alpha = 0.05$.
4. One-tailed versus two-tailed tests.

Example #1
Suppose we want to know whether there is a difference in the salaries of male and female professors. We might take two samples, one of men and one of women, to determine their respective mean salary levels. The calculated sample means $M_1$ and $M_2$ are estimates of the population means $\mu_1$ and $\mu_2$.
$H_0: \mu_1 - \mu_2 = 0$, or $H_0: \mu_1 = \mu_2$
$H_A: \mu_1 - \mu_2 \neq 0$, or $H_A: \mu_1 \neq \mu_2$

Example #1 (continued)
This is stated as a two-tailed test. If you believe that women make less than men (with group 1 being men), then the alternative hypothesis might be one-tailed:
$H_A: \mu_1 - \mu_2 > 0$

Example #2
As an employee of the Federal Trade Commission, you are vigilant in your stand against false or misleading advertising. A manufacturer of razor blades claims that its new blades give, on average, 15 good shaves. You conduct a small test by asking 10 randomly chosen men to each try one of these new razor blades. The average number of good shaves reported is 13, and the standard deviation is 3.62.
The manufacturer claims that the true number of shaves (the population value) is 15:
$H_0: \mu = 15$

Example #2 (continued)
If we want to challenge the manufacturer's claim, we might employ a one-tailed test, where the alternative hypothesis would be:
$H_A: \mu < 15$
Or, if we were agnostic about the direction, we could use a two-tailed test:
$H_A: \mu \neq 15$

Example #3
A more general test, which we will see when we get to regression, is one where the hypothesized value is zero and we want to know whether our parameters have a statistically significant effect, that is, whether they are different from zero. For example, suppose a researcher wants to determine whether electoral rules are related to voter turnout. Call the impact of electoral rules on voter turnout $\beta$. A typical hypothesis test would be:
$H_0$: Electoral rules have no impact on voter turnout, or $\beta = 0$
$H_A$: Electoral rules affect voter turnout, or $\beta \neq 0$

Testing Hypotheses: Confidence Intervals
Let's start with the first example, where we want to see whether there is a difference in the mean salary level (in thousands of dollars) of male and female professors. Suppose that for men $M_1 = 16$, $n_1 = 10$, $\sum (X_1 - M_1)^2 = 106$, and for women $M_2 = 11$, $n_2 = 5$, $\sum (X_2 - M_2)^2 = 40$.

We calculate the confidence interval as:
$$(\mu_1 - \mu_2) = (M_1 - M_2) \pm t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
We need the pooled standard deviation $s_p$, which comes from:
$$s_p^2 = \frac{\sum (X_1 - M_1)^2 + \sum (X_2 - M_2)^2}{(n_1 - 1) + (n_2 - 1)} = \frac{106 + 40}{(10 - 1) + (5 - 1)} = \frac{146}{13} = 11.2$$
thus $s_p = \sqrt{11.2} = 3.35$.

We plug this back into the confidence interval to obtain:
$$(\mu_1 - \mu_2) = (16 - 11) \pm 2.16\,(3.35)\sqrt{\frac{1}{10} + \frac{1}{5}} = 5 \pm 4$$
Note: 2.16 is the critical value of t for 95% confidence and $n_1 + n_2 - 2 = 13$ df in a two-tailed test, i.e. $t_{0.025,\,13}$.
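The pooled-variance interval above can be checked with a few lines of Python. This is a minimal sketch: the helper names `pooled_sd` and `diff_ci` are illustrative, and the critical value 2.16 is taken from the slide rather than computed.

```python
import math

def pooled_sd(ss1, n1, ss2, n2):
    """Pooled standard deviation from the two within-group sums of squares."""
    return math.sqrt((ss1 + ss2) / ((n1 - 1) + (n2 - 1)))

def diff_ci(m1, m2, ss1, n1, ss2, n2, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled-variance formula."""
    sp = pooled_sd(ss1, n1, ss2, n2)
    margin = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) - margin, (m1 - m2) + margin

# Salary example (thousands of dollars). 2.16 is the slide's two-tailed
# 95% critical t for 13 df; it is hard-coded here, not computed.
low, high = diff_ci(16, 11, 106, 10, 40, 5, t_crit=2.16)
print(f"s_p = {pooled_sd(106, 10, 40, 5):.2f}")
print(f"95% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
print("0 inside interval:", low <= 0 <= high)
```

The interval comes out at roughly (1.0, 9.0), which is the slide's 5 ± 4, and it excludes zero.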
With 95% confidence, the difference between the means is estimated to be between 1 and 9 (thousand dollars). Zero is not contained in the interval, so the claim that there is no difference cannot be accepted; that is, we can reject the null hypothesis. In general, any hypothesized value that lies outside the confidence interval may be rejected. Thus the confidence interval may be regarded as the set of acceptable (not rejected) hypotheses.

Testing Hypotheses: p-values
Let's go back to the one-tailed test by the FTC employee, who wants to determine whether the razor blade manufacturer's claim of 15 good shaves is valid.
$H_0: \mu = 15$
$H_A: \mu < 15$

A p-value is the probability, computed assuming $H_0$ is true, of obtaining a sample value at least as extreme as the one actually observed, in the direction specified by $H_A$. Here $H_A: \mu < 15$, so "as extreme" means as small as or smaller than the observed mean of 13. In other words, the p-value summarizes how much agreement there is between the data and the null hypothesis: the smaller it is, the less the data agree with $H_0$. In this case, the null is that the razors give 15 good shaves.

We start by calculating the t or z statistic associated with our observed value. We use t in this problem because the sample size is small (N = 10) and the population standard deviation is unknown (when the sample size is large, t and z are approximately equivalent).
$$t = \frac{M - \mu_0}{s/\sqrt{N}} = \frac{13 - 15}{3.62/\sqrt{10}} \approx -1.75$$
We can think of t as:
$$t = \frac{\text{estimate} - \text{null hypothesis}}{\text{standard error}}$$
If the null hypothesis is zero, then t = estimate / standard error. In this case, the t ratio simply measures the size of the estimate relative to its standard error.

Now we want to find the area beyond that value of t, which gives us the p-value. In this problem, $t \approx -1.75$. To find the p-value, we need to take into account our degrees of freedom, df = n - 1, which is 9 here. We go to the t-table and see where our calculated t falls relative to the cutoff values for various probability levels. Because the t distribution is symmetric and the table lists positive values, we compare |t| = 1.75 with the table: it lies between the t values of 1.38 (p = .10) and 1.83 (p = .05).
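The table only brackets the p-value. As a sketch, the exact left-tail area can be computed directly. In practice a library routine such as SciPy's `scipy.stats.t.cdf` would do this; to stay dependency-free, the code below instead integrates the Student t density numerically with Simpson's rule, which is a swapped-in technique, not the table method used on the slides. The helper names `t_pdf`, `t_cdf` and `t_upper_cutoff` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on the density over [0, |x|], plus symmetry."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_upper_cutoff(p, df):
    """Value c with P(T > c) = p (a one-tailed table entry), by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > p:
            lo = mid  # tail area still too big: the cutoff lies further out
        else:
            hi = mid
    return (lo + hi) / 2

# Razor-blade example: M = 13, mu0 = 15, s = 3.62, N = 10
M, mu0, s, N = 13, 15, 3.62, 10
df = N - 1
t = (M - mu0) / (s / math.sqrt(N))
p_value = t_cdf(t, df)  # left tail, because H_A: mu < 15

# Reproduce the two table entries that bracket |t|
cut_10 = t_upper_cutoff(0.10, df)
cut_05 = t_upper_cutoff(0.05, df)
print(f"t = {t:.3f}, df = {df}, one-tailed p = {p_value:.3f}")
print(f"table cutoffs: {cut_10:.2f} (p=.10), {cut_05:.2f} (p=.05)")
```

Run as is, it prints t close to -1.75, recovers the table cutoffs 1.38 and 1.83, and gives a one-tailed p-value that falls between .05 and .10, consistent with the table lookup.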
Thus we can say that our one-tailed p-value is between .05 and .10.

Comparing our p-value or t/z statistic with a critical p-value or t/z
A classical hypothesis test consists of setting a critical value, which gives us the rejection and acceptance regions. For example, for a one-tailed test with 95% confidence ($\alpha = .05$), we use z = 1.64 as our critical value; for a two-tailed test, we use z = 1.96.

Comparison (continued)
We reject the null hypothesis if our calculated t or z is beyond the critical t or z, or equivalently if the p-value is less than or equal to $\alpha$. In the example above, the one-tailed 95% critical t value for 9 df is 1.83, so the rejection region is t < -1.83. Since our calculated t does not fall beyond -1.83 (it is not in the rejection region), we cannot reject the null hypothesis, so we "accept" the manufacturer's claim of 15 good shaves in the sense that it lies in the acceptance region. Likewise, our p-value is larger than $\alpha = .05$.

The Critical Region
A calculated value in the critical region can be interpreted in one of two ways:
1) $H_0$ is true, but we have been exceedingly unlucky and drawn a very improbable sample.
2) $H_0$ is not true after all, so it is no surprise that our observed value was so high or low.

When we calculated the difference between male and female professors' salaries: if the true difference is really large, we would expect our observed difference to fall in the tails, very far from the center of the distribution, where the difference is zero.

Type I and Type II Errors
Choosing an alpha level is tricky because it sets the level at which we will reject the null hypothesis. Alpha is the probability of falsely rejecting a true $H_0$, so the higher alpha is, the greater the chance of that error.

| State of the world | $H_0$ accepted | $H_0$ rejected |
|---|---|---|
| If $H_0$ is true | Correct decision, Pr = $1 - \alpha$ | Type I error, Pr = $\alpha$ |
| If $H_0$ is false | Type II error, Pr = $\beta$ | Correct decision, Pr = $1 - \beta$ = power of the test |

To give you an analogy, in a court of law we assume people are innocent (the null hypothesis) until proven guilty.
A Type I error would be finding an innocent man guilty. A Type II error would be letting a guilty man go free. Which is worse?

By decreasing our error or alpha level, we increase the chance of a Type II error (accepting the null when it is really false), because we make the criteria for rejection more stringent. The only way the alpha error can be reduced without increasing the probability of a Type II error is by increasing the sample size.
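The trade-off between $\alpha$ and $\beta$, and the effect of sample size, can be made concrete with a power calculation. This sketch makes two labeled assumptions: it uses a one-tailed z test (treating $\sigma = 3.62$ as known) instead of the t test, and it assumes a hypothetical true mean of 13 shaves for the razor example. The function name `power_left_tailed` is illustrative; `statistics.NormalDist` is from the Python standard library.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_left_tailed(mu0, mu1, sigma, n, alpha):
    """Power of a left-tailed z test of H0: mu = mu0 when the true mean is mu1.

    H0 is rejected when z = (xbar - mu0) / (sigma / sqrt(n)) < -z_alpha.
    Under the true mean mu1, z is normal with mean (mu1 - mu0) / (sigma / sqrt(n)).
    """
    z_alpha = Z.inv_cdf(1 - alpha)
    shift = (mu0 - mu1) / (sigma / n ** 0.5)
    return Z.cdf(shift - z_alpha)

# Hypothetical scenario: true mean 13 shaves, sigma = 3.62 treated as known.
for alpha in (0.05, 0.01):
    for n in (10, 40):
        power = power_left_tailed(15, 13, 3.62, n, alpha)
        print(f"alpha = {alpha:.2f}, n = {n:2d}: "
              f"beta = {1 - power:.3f}, power = {power:.3f}")
```

For a fixed n, lowering $\alpha$ from .05 to .01 lowers the power, i.e. raises $\beta$; raising n from 10 to 40 increases the power at either $\alpha$. When the true mean equals 15, the function returns exactly $\alpha$, the Type I error rate.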