
Introduction to Hypothesis Testing

Everyday phenomenon

We ask ourselves questions every day. Here are a few examples:
– Is it going to rain today?
– Will the coin turn up Heads?
– Will this movie be good?
– Uh-oh! Is this exam going to be hard?
– Can I jump off the top of Miriam Hall and survive? (Actually, you may not want to test this one, at least not every day.)

Clearly, we do make decisions based on our answers to these questions. How do we make our decisions? Usually based on a priori probabilities or a posteriori probabilities.

A priori and a posteriori probabilities

An a priori probability is one that can be mathematically computed based on the possible outcomes. An a posteriori probability is one that is estimated based on experience or an understanding of the factors that affect the phenomenon.

The probability of the outcome of a coin-tossing experiment can be determined prior to tossing the coin. Hence, a priori. The probability of precipitation on a given day cannot be determined a priori. The probability of rain on a particular day can only be determined after studying the various phenomena associated with rain. Hence, a posteriori.

Questions lead to hypotheses

The earlier questions can also be rephrased as hypotheses. We then decide to either accept the hypotheses or reject them and take action accordingly. For example, “Is it going to rain today?” can be rephrased as “It is not going to rain today” or “It is going to rain today”.

We test such hypotheses every day

Typically, we form hypotheses based on some observation, result, expectation, or suspicion. Examples of such observations or expectations:
– The suspect’s hat was found near the victim’s body.
– The biochemistry of a new drug suggests that it must be more effective in reducing cancerous cells than the current drug.
– This sample of herring looks somewhat bigger than the herring we are used to (the Atlantic herring, Clupea harengus); they probably belong to another species. (I have to include at least one fish example!)
– There are more mosquito larvae in this stream than in the one upstream.

The initial hypothesis

Our first hypothesis is conservative: we deny our expectation or observation. For the examples given, our hypotheses would be:
– The suspect is not guilty.
– This new drug is no more effective in reducing cancerous cells than the current drug.
– Although this sample of herring looks somewhat bigger than Clupea harengus, they do not belong to another species.
– The mosquito larval density in this stream is no different from that of the one upstream.

The hypothesis of no difference

Note the language in forming the hypotheses. We have used terms such as not, no more, do not, no different from. That is, we are stating that the observation or result in question could easily have come from the “regular” or “original” population. In other words, it is no different from the values in the original or regular population.

The reference population is not the expected population

The regular or original population in our examples would be:
– The population of innocent people
– The population of results when cancerous cells are treated with the current drug
– The population of Clupea harengus
– The mosquito larval population in the upstream stretch of the river

The null hypothesis

Because we are conservative in this hypothesis and state that the observed result (e.g., sample mean) is no different from the regular population mean:
– We term this initial hypothesis the null hypothesis, or hypothesis of no difference.
– The null hypothesis is symbolized by H0.

The logic

OK. We have formed the null hypothesis. What next? Well, we then test it. What does that entail? We compute the probability of obtaining evidence at least as extreme as ours if the null hypothesis were true.
(How we do this depends on the variable we are dealing with, to be discussed at length during the semester.) If this probability is very low, we reject the null hypothesis; if the probability is high, we accept it.

The alternative hypothesis

Obviously, if we reject the null hypothesis, we need to accept some other hypothesis: an alternative hypothesis. So we really begin by making two hypotheses, the null hypothesis and an alternative hypothesis (symbolized H1). We test the null hypothesis. If we accept it, that’s the end of that story. If we reject it, we are then forced to accept the alternative hypothesis.

Note that only the null hypothesis (H0) is tested, and if H0 is rejected, the alternative hypothesis (H1) is automatically accepted (without further testing). It is therefore very important that we formulate the hypotheses carefully. The two hypotheses must also complement each other (i.e., there should be no third alternative).

Example

The saola (Pseudoryx nghetinhensis) is a species of ruminant from Vietnam that was unknown to science until 1993. (I’m making this part up:) 17 specimens were found, of which 14 were males and 3 were females. Because of this disparity the question might arise: is the sex-ratio in this species really 1:1?

How do we test this? The possibilities are the following:
– The sex-ratio of this species is really 1:1 but it just happened by chance that this sample had a skewed ratio.
– The sex-ratio of this species is biased in favor of males.
– The two sexes are unequal in frequency.

The question is:
– Which of these scenarios is true?
– And how do we find out?

One way of resolving the issue, of course, is to go out into the jungles of Vietnam and find more saolas. An easier and cheaper way (if less fun) is to use probability theory and what we have learnt about known distributions (in this case, the binomial distribution).
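
As a rough sketch of what using the binomial distribution looks like for this sample, the following Python snippet computes the probability of seeing 3 or fewer females among 17 animals when the true sex-ratio is 1:1. The function and variable names (`binom_pmf`, `lower_tail`, `one_tailed`, `two_tailed`) are illustrative choices, not part of the lecture, and doubling the tail for the "either direction" case relies on the symmetry of the distribution when p = 0.5.

```python
from math import comb


def binom_pmf(y, n, p):
    """Probability of exactly y successes in n independent trials."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def lower_tail(y_max, n, p):
    """P(Y <= y_max) for a binomial variable Y with parameters n and p."""
    return sum(binom_pmf(y, n, p) for y in range(y_max + 1))


n, p = 17, 0.5                      # 17 saolas, null hypothesis of a 1:1 ratio
one_tailed = lower_tail(3, n, p)    # P(females <= 3): bias in favor of males
two_tailed = 2 * one_tailed         # bias in either direction (symmetric at p = 0.5)

print(f"P(Y <= 3)          = {one_tailed:.6f}")
print(f"two-tailed P-value = {two_tailed:.6f}")
```

Computed exactly, the one-tailed probability is about 0.0064 and the two-tailed probability about 0.0127, so the first falls below 0.01 while the second lies between 0.01 and 0.05.
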
Of course, we have to assume that the group of saolas was randomly sampled.

It seems reasonable to assume the binomial distribution (remember the conditions?). We now approach the problem thus: our null hypothesis is that the population sex-ratio is truly 1:1, and that any observed difference is just by chance.

How do we test if the null hypothesis is true? We determine the probability of obtaining a sex-ratio of 14:3 or worse (i.e., more biased) in a sample of 17 randomly chosen animals, when the population sex-ratio of the saolas is truly 1:1. In other words, what is the probability of observing a female proportion of 3/17 = 0.1765 or less in a random sample of 17 animals if the true proportion is 0.5? We can answer this question by computing binomial probabilities with n = 17 and p = q = 0.5.

n = 17, p = q = 0.5

$P(Y) = \binom{n}{Y} p^{Y} q^{n-Y}$

Y (# of females)    n – Y (# of males)    Relative expected frequency (f)
0                   17                    0.000,007,63
1                   16                    0.000,129,71
2                   15                    0.001,037,68
3                   14                    0.005,188,40
4                   13                    0.018,157,91

P(Y ≤ 3) = 0.006,363,42 (the sum of the rows Y = 0 to Y = 3)

Therefore, P(Y ≤ 3) = 0.006,363,42. Typically, the probability of an outcome must be at least 5% (P ≥ 0.05) before we accept it as not unlikely. This means that if a result at least this extreme would turn up 5% of the time or more when the null hypothesis is correct, we do not want to take the risk of going wrong by rejecting it. Does 5% look like it’s too small? Do you want to be more certain that the null hypothesis is true before you accept it?

Why only 5%

Think about the examples we saw earlier.
– Before you condemn a person to jail or worse, don’t you want to be as certain of his guilt as possible before you reject the null hypothesis of not guilty?
– Don’t you want to be as sure as possible that drug #2 is better than the current drug before you spend $$ on it? After all, the current drug has been known to work.
– You don’t want to look like a chump, so don’t you want to be as certain as you can be before announcing to the world that you’ve found a new species of herring?
– Let us assume that tons of $$ and time have already been spent on researching the rshI gene. So now, if you want people to shift focus from rshI to lasI, then you want to be very sure that lasI is indeed more involved in biofilm formation than rshI.

By this standard, the probability of getting a sample sex-ratio of 14:3 or worse from a population with true ratio 1:1 would be considered very low. So we can reasonably conclude that the sample is unlikely to have come from a distribution whose mean sex-ratio is 1:1.

So we have decided to reject the null hypothesis that the sex-ratio of the saola population is 1:1. Now, if the sample did not come from a population with sex-ratio 1:1, what kind of a population did it come from? That is, what is our alternative hypothesis? The way we approach this question depends upon what we suspect. For example, in the present problem, we suspect that the sex-ratio is biased in favor of males. In such cases, the alternative hypothesis can be just that: the sex-ratio is biased in favor of males.

If our alternative hypothesis is that males outnumber females in this species, then because we have rejected the null hypothesis of 1:1 ratio, we have to accept the alternative hypothesis that males outnumber females. Note that we did not test the alternative hypothesis, although this was the hypothesis of interest. Instead, we tested the null hypothesis, and decided, based on the test, whether to accept it or accept the alternative hypothesis.

The area of interest is in only one tail of the distribution.

[Figure: distribution centered on a sex ratio of 1:1, with sex-ratios in favor of males on one side and in favor of females on the other; the tail P(Y ≤ 3) is marked on the side favoring males.]

If, on the other hand, we have reason to believe that the sex-ratio could be biased in either direction, then because we have rejected the hypothesis of 1:1 ratio, the alternative that we have to accept is that the two sexes are different in frequency: the males could be more numerous than the females or vice versa.

The alternative scenario being considered here is that a sample of 3 females and 14 males is as likely as a sample of 3 males and 14 females. That is, if Y is the number of females, P(Y ≤ 3) and P(Y ≥ 14) should be the same. Therefore, the probability of either one happening is given by the sum of the two probabilities.

P(females ≤ 3): n = 17, p = q = 0.5, $P(Y) = \binom{n}{Y} p^{Y} q^{n-Y}$

Y (# of females)    n – Y (# of males)    Relative expected frequency (f)
0                   17                    0.000,007,63
1                   16                    0.000,129,71
2                   15                    0.001,037,68
3                   14                    0.005,188,40
4                   13                    0.018,157,91

P(Y ≤ 3) = 0.006,363,42 (the sum of the rows Y = 0 to Y = 3)

P(females ≥ 14): n = 17, p = q = 0.5, $P(Y) = \binom{n}{Y} p^{Y} q^{n-Y}$

Y (# of females)    n – Y (# of males)    Relative expected frequency (f)
13                  4                     0.018,157,91
14                  3                     0.005,188,40
15                  2                     0.001,037,68
16                  1                     0.000,129,71
17                  0                     0.000,007,63

P(Y ≥ 14) = 0.006,363,42 (the sum of the rows Y = 14 to Y = 17)

Therefore, P(#females ≤ 3) = 0.006,363,42 and P(#females ≥ 14) = 0.006,363,42.

The probability of interest is now in both tails of the distribution.

[Figure: distribution centered on a sex ratio of 1:1, with the tail P(#females ≤ 3) marked on the side favoring males and the tail P(#females ≥ 14) marked on the side favoring females.]

Therefore, P(#females or #males ≤ 3) = 0.006,363,42 + 0.006,363,42 = 0.012,726,84.

Therefore, if our alternative hypothesis is simply a biased sex-ratio, we need to look at this probability, and decide if it is large enough to accept the hypothesis of 1:1 ratio, or small enough to reject it and accept the hypothesis of a biased sex-ratio.

The risk of accepting the wrong hypothesis

Of course, since we do not know the truth, we accept or reject the null hypothesis at some risk. There are two kinds of risk.

The Hypotheses…contd.
(The risks)

               Actually innocent    Actually guilty
Acquitted      Correct decision     Wrong decision
Convicted      Wrong decision       Correct decision

               H0 true              H0 false
H0 accepted    Correct decision     Wrong decision
H0 rejected    Wrong decision       Correct decision

The probability of rejecting a true null hypothesis is termed the Type I Error (also “Level of Significance”). It is symbolized by α and is typically sought to be minimized because it is considered the more serious error.

The probability of accepting a false null hypothesis is termed the Type II Error. It is symbolized by β and is typically considered the less serious error.

Let us now formally answer the question of the sex-ratio in the saola.

The saola (Pseudoryx nghetinhensis) is a species of ruminant from Vietnam that was unknown to science until 1993. Again, as before, 17 specimens were found, of which 14 were males and 3 were females.

There are two possible questions, each leading to a different alternative hypothesis: Are there more males in the population than females? Or: Is the sex-ratio different from 1:1? Let us test these two one at a time.

H0: There are no more males than females in the saola (Pseudoryx nghetinhensis) population (i.e., the sex ratio is 1:1).
H1: There are more males than females in the saola. (Therefore, this is a one-tailed hypothesis.)
Level of significance: α = 0.05 and 0.01. (This is the level of risk one is willing to take in rejecting the null hypothesis when it is actually true; the so-called Type I error.)
Test statistic: T = P(Y), where Y, the number of females, follows the binomial distribution.
Decision criteria: If P(Y ≤ 3) ≥ α, then accept H0. If P(Y ≤ 3) < α, then reject H0.
Computation of the test statistic: P(Y ≤ 3) = 0.006,363,42
Decision: Since P(Y ≤ 3) < α (0.05 or 0.01), we reject H0. Therefore, we accept H1.
Conclusion: From the sample evidence we conclude that there are more males than females in the saola (P < 0.01).

What if we had made the other alternative hypothesis?

H0: The sex-ratio in the saola (Pseudoryx nghetinhensis) is 1:1.
H1: The sex-ratio in the saola (Pseudoryx nghetinhensis) is not 1:1. (Therefore, this is a two-tailed hypothesis.)
Level of significance: α = 0.05 and 0.01
Test statistic: T = P(Y), where Y follows the binomial distribution.
Computation of the test statistic: P(#males ≤ 3) + P(#females ≤ 3) = 2(0.006,363,42) = 0.012,726,84
Decision: Since T is itself a probability here, we can compare it directly to α.
– In this case, T < 0.05, but not ≤ 0.01.
– Therefore, we reject H0 at a risk ≤ 5% (confidence ≥ 95%).
– But since T > 0.01, we cannot be > 99% confident that we are correct in rejecting the null hypothesis.
– Therefore, depending on the level of confidence we are looking for, we either reject H0 or accept it.
Conclusion: From the sample evidence we conclude that we can be > 95% confident that the sex-ratio of the saola is significantly different from 1:1. We cannot, however, be more than 99% confident that the sex-ratio is different from 1:1.

Example 2

You are a commercial fish farmer who has grown rainbow trout (Oncorhynchus mykiss) for many years. You have spent a good deal of money and time in training your staff to produce good yields year after year.

By the end of eight months, your fish average 15 inches, with a standard deviation of 4 inches. Along comes a smart-aleck who claims to know more about growing rainbow trout than you do.

He promises that at the end of eight months, your fish would average 18 inches if you invested in his method. What should you do?

On the one hand you do not want to change anything because you are doing fine, and any change means re-training: more expense, more time, etc.
On the other hand, an average of 18 inches at the end of eight months is really good, and you are tempted.

So you conduct a test: a test of hypothesis. The test involves using his procedure on a sample of your fish for eight months and coming to a decision.

You use his procedure on a sample of your fish (say, 50 fish). At the end of eight months you find that the mean length of the 50 fish grown using the new procedure is 16.9 inches.

Clearly, the mean length of the fish is not 18 inches. But then, not all of your fish are 15 inches either. Some are small and some are large. They are 15 inches on average.

Maybe it was just this sample that turned out to be less than 18 inches. Maybe another sample would be 18 inches or more.

On the other hand, what if this was actually a good sample, and another one is worse?! So should you show him the door or should you adopt his procedure? How do you decide?

Do a test of hypothesis!

H0: The sample is drawn from a population of rainbow trout whose mean length is 15”. (This means that it is essentially no different from your fish.)
H1: The sample is drawn from a population of rainbow trout whose mean length is 18”. This is a one-tailed hypothesis.
Level of significance: α = 0.01. You want to make really sure (≥ 99%) that the new procedure works.

[Figure: fish length axis from 12” to 19”, marking your fish (15”), the sample mean (16.9”), and his claim (18”).]

Test statistic: Mean length follows a normal distribution, since n > 30. Therefore, the test statistic is

$T = \dfrac{\bar{Y} - \mu}{\sigma / \sqrt{n}} \sim N(0,1)$

Here the standard error is σ/√n = 4/√50 = 0.566, so
T = (16.9 – 15)/0.566 = 3.3569
P(Z ≥ 3.36) = 1 – (0.5 + 0.4996) = 0.0004

Decision: Since P(T) < α, we reject H0 and accept H1.
Conclusion: Based on the sample, we accept the salesman’s claim that his procedure increases the growth of rainbow trout to 18 inches (p < 0.001), because we could not accept the null hypothesis.

Uh-oh

Do you see a problem with the test that we just performed?
Our H1 was that his procedure resulted in fish with an average size of 18 inches at the end of eight months.

And we were forced to accept it because we rejected H0. If he had claimed that the fish would grow to 19 inches, we would have had to accept that as well! Or 20, or whatever his claim was, simply because we rejected the null hypothesis.

Clearly there is something wrong. We have to concede one thing, however: the fish did grow to be significantly larger than your fish. [“Significantly” means “more than just by chance”.]

How do we come to this conclusion of significant difference? We found that a sample mean this large would be very unlikely (P < 0.001) if the sample came from the same population as the fish with mean length of 15”. We therefore had to reject H0. However, that meant that we had to automatically accept H1, whatever it was!

But, but, but, 18 inches is probably too much! It doesn’t look like his claim of 18 inches is correct. (And the guy looked kinda sleazy too!)

But we must admit that from the sample evidence, it certainly looks like his procedure significantly increased the growth of the fish, but probably not as much as he claimed. Can we test this?

Of course we can! Let’s formulate the test in the following manner:

H0: The true mean length of the treated fish is 18 inches.
– What have we just done? To give him the benefit of the doubt, we have decided to use his claim as the null hypothesis. If you do this, then H0 is rejected only when the test probability falls below α; even at a probability of just 5%, you will have to accept his claim. So how we formulate the hypotheses is very important.
H1: The true mean length of the treated fish is less than 18 inches.
Level of significance: α = 0.05 and 0.01
Test statistic: Mean length follows a normal distribution, since n > 30.
Therefore, the test statistic is
T = Z = (16.9 – 18)/0.566 = -1.94
P(Z ≤ -1.94) = 1 – (0.5 + 0.4738) = 0.0262

Decision: Since P(T) < 0.05 (but not < 0.01), we reject H0 at the 5% level of significance, but not at the 1% level of significance.
Conclusion: The evidence, based on the sample, does not support the salesman’s claim of the fish growing to 18 inches in eight months if his procedure is followed (p < 0.05). However, the evidence does not permit us to be ≥ 99% certain of our conclusion.

Statistical Power

[Figure, built up over several slides: the distribution under H0 centered at 15” and the distribution under H1 (red curve) centered at 18”, with the critical value between them. Alpha (Type I error) is the area under the H0 curve beyond the critical value; Beta (Type II error) is the area under the H1 curve on the H0 side of the critical value; Power = (1 – Beta) is the green-shaded remainder of the area under the H1 curve.]

Type I error is defined as the error in rejecting a true null hypothesis. Type II error is defined as the error in accepting a false null hypothesis.

Statistical power is defined as the probability of rejecting the null hypothesis when it is actually false.

Clearly, we want to minimize both Type I and Type II errors. In other words, we want to accept the null hypothesis when it is true and reject it when it is false (the latter refers to increasing the power of the test).

In a test of hypothesis, we specify the extent of Type I error as the level of significance, Alpha. However, we do not specify the Type II error (Beta). In other words, we do not specify the power of the test. However, it is possible to compute the power of a test for a given alternative hypothesis.

For example, it was hypothesized (H0) above that the sample of fish with the new treatment actually belonged to the population with mean 15”.
The question is: how powerful is the test in rejecting this hypothesis if the sample really belonged to a population with mean = 18”?

SEM = 0.566 (given)
If Alpha = 0.05, Z = 1.645 for a one-tailed test.
That is, Y-bar for (Z = 1.645) = (0.566)(1.645) + 15 = 15.93

That is, the critical value in terms of Y-bar is 15.93. The power (the area in green) is the area under the H1 curve beyond this critical value, and is computed the following way:

Z = [15.93 – 18]/0.566 = -3.66
P(Z ≥ -3.66) = 0.9999
Therefore, power = 99.99%

This means that if (H1: population mean = 18”) is actually true, then the probability of rejecting the null hypothesis (of μ = 15) is 99.99%.

[Figure: distribution under H0 centered at 15” and distribution under H1 centered at 18”, with the critical value at 15.93. Alpha (Type I error) = 0.05; Beta (Type II error) = 0.0001; Power = (1 – Beta) = 0.9999.]

Power Calculators

Many are available online.

How to increase Statistical Power?
– Increasing alpha: clearly not a good solution!
– Using 1-tailed rather than 2-tailed tests: possible, but not always.
– Increasing the sample size: another reason to use large sample sizes!

Sample size calculator

By the same token, we can calculate the sample size required for a given level of statistical power. Many calculators are available online.
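
The Example 2 calculations and the power computation can be reproduced with a short Python sketch. It uses only the standard library; the helper name `z_test_power` and the variable names are illustrative assumptions, and σ = 4 is treated as known, as in the slides. Because the code uses unrounded z values rather than a printed table, its results differ slightly from the table-based figures (for instance, about 0.026 rather than 0.0262 for the test against 18”).

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution, N(0, 1)


def z_test_power(mu0, mu1, sigma, n, alpha):
    """Power of a one-tailed (upper) z-test of H0: mean = mu0
    when the true mean is mu1 (> mu0) and sigma is known."""
    sem = sigma / sqrt(n)
    # Smallest sample mean that leads to rejecting H0 at level alpha.
    critical_mean = mu0 + std_normal.inv_cdf(1 - alpha) * sem
    # Probability of exceeding that critical mean when H1 is true.
    power = 1 - std_normal.cdf((critical_mean - mu1) / sem)
    return critical_mean, power


sigma, n, sample_mean = 4, 50, 16.9
sem = sigma / sqrt(n)  # about 0.566

# Test 1: H0 mean = 15, upper tail. Test 2: H0 mean = 18, lower tail.
p_vs_15 = 1 - std_normal.cdf((sample_mean - 15) / sem)
p_vs_18 = std_normal.cdf((sample_mean - 18) / sem)

critical_mean, power = z_test_power(mu0=15, mu1=18, sigma=sigma, n=n, alpha=0.05)

print(f"P-value against 15 inches: {p_vs_15:.4f}")
print(f"P-value against 18 inches: {p_vs_18:.4f}")
print(f"critical mean: {critical_mean:.2f}, power: {power:.4f}")
```

Calling `z_test_power` with a smaller `n` or a smaller `alpha` shows how the power drops, which is the trade-off described in the list above.
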

Posted: 3/28/2012 | Pages: 92
