Introduction to Hypothesis Testing
Everyday phenomenon

    We ask ourselves questions every day. Here
    are a few examples:

   Is it going to rain today?
   Will the coin turn up heads?
   Will this movie be good?
   Uh-oh! Is this exam going to be hard?
   Can I jump off the top of Miriam Hall and survive?
    (Actually, you may not want to test this one, at least
    not every day.)
Everyday phenomenon

   Clearly, we do make decisions based on our answers
    to these questions.

   How do we make our decisions?

   Usually on the basis of a priori or a posteriori
    probabilities.
A priori and a posteriori probabilities

   An a priori probability is one that can be
    mathematically computed from the possible
    outcomes.

   An a posteriori probability is one that is estimated
    from experience or from an understanding of the
    factors that affect the phenomenon.
A priori and a posteriori probabilities

   The probability of the outcome of a coin-tossing
    experiment can be determined before the coin is
    tossed. Hence, it is a priori.

   The probability of precipitation on a given day
    cannot be determined a priori.
     – The probability of rain on a particular day can
       only be determined after studying the various
       phenomena associated with rain. Hence, it is
       a posteriori.
Questions lead to hypotheses

   The earlier questions can also be re-phrased
    as hypotheses.

   We then decide to either accept the
    hypotheses or reject them and take action
    accordingly.
Questions lead to hypotheses

   For example,

   “Is it going to rain today?” can be rephrased as
    “It is not going to rain today” or “It is going to
    rain today.”

   etc.
We test such hypotheses every day

   Typically, we form hypotheses based on some observation,
    result, expectation, or suspicion.
   Examples of such observations or expectations:
    –   The suspect’s hat was found near the victim’s body.
    –   The biochemistry of a new drug suggests that it must be more
        effective in reducing cancerous cells than the current drug.
    –   This sample of herring looks somewhat bigger than the herring we
        are used to (the Atlantic herring, Clupea harengus) – they
        probably belong to another species. (I have to include at least one
        fish example!)
    –   There are more mosquito larvae in this stream when compared to
        the one upstream.
            The initial hypothesis

   Our first hypothesis is conservative: in it, we deny
    our expectation or observation.
   For the examples given, our hypotheses would be:
    –   The suspect is not guilty.
    –   This new drug is no more effective in reducing cancerous
        cells than the current drug.
    –   Although this sample of herring looks somewhat bigger than
        Clupea harengus, they do not belong to another species.
    –   The mosquito larval density in this stream is no different
        from that of the one upstream.
The hypothesis of no difference

   Note the language in forming the hypotheses.
    We have used terms such as not, no more, do
    not, no different from.

   That is, we are stating that the observation or
    result in question could easily have come
    from the “regular” or “original” population.
    In other words, it is no different from the
    values in the original or regular population.
The reference population is not the expected population

   The regular or original population in our
    examples would be
    –   The population of innocent people
    –   The population of results when cancerous cells are
        treated with the current drug
    –   The population of Clupea harengus
    –   The mosquito larval population in the upstream
        stretch of the river.
             The null hypothesis

   Because we are conservative in this
    hypothesis and state that the observed result
    (e.g., sample mean) is no different from the
    regular population mean,
    –   We term this initial hypothesis the null
        hypothesis, or hypothesis of no difference.
    –   The null hypothesis is symbolized by H0.
                       The logic

   OK. We have formed the null hypothesis. What
    next?

   Well, we then test it. What does that entail?

   This is done by computing the probability of obtaining
    evidence at least as extreme as ours if the null
    hypothesis were true. (How we do this depends on the
    variable we are dealing with; this will be discussed at
    length during the semester.)

   If this probability is very low, we reject the null
    hypothesis; if it is high, we accept it (strictly, we
    fail to reject it).
      The alternative hypothesis

   Obviously, if we reject the null hypothesis, we need
    to accept some other hypothesis – an alternative
    hypothesis.

   So we really begin by making two hypotheses – the
    null hypothesis and an alternative hypothesis
    (symbolized H1).

   We test the null hypothesis. If we accept it, that’s the
    end of that story. If we reject it, we then are forced to
    accept the alternative hypothesis.
      The alternative hypothesis

   Note that only the null hypothesis (H0) is tested, and
    if H0 is rejected, the alternative hypothesis (H1) is
    automatically accepted (without further testing).

   It is therefore very important that we make the
    hypotheses carefully.

   The two hypotheses must also complement each
    other (i.e., there should be no third alternative).
    Example

   The saola (Pseudoryx nghetinhensis) is a species of
    ruminant from Vietnam that was unknown to science
    until 1993.

   (I’m making this part up): 17 specimens were found, and
    14 were males and 3 were females.

   Because of this disparity the question might arise: is the
    sex-ratio in this species really 1:1?
    Example…contd.


   How do we test this?

   The possibilities are the following:
    –   The sex-ratio of this species is really 1:1 but it just
        happened by chance that this sample had a skewed
        ratio.
    –   The sex-ratio of this species is biased in favor of
        males.
    –   The two sexes are unequal in frequency.
    Example…contd.


   The question is:

    –   Which of these scenarios is true?


    –   And how do we find out?
    Example…contd.


   One way of resolving the issue, of course, is to go out into
    the jungles of Vietnam and find more saolas.

   An easier and cheaper way (if less fun) is to use
    probability theory and what we have learnt about known
    distributions (in this case, the binomial distribution).

   Of course, we have to assume that the group of saolas
    was randomly sampled.
    Example…contd.

   It seems reasonable to assume the binomial
    distribution (remember the conditions?).

   We now approach the problem thus:

   Our null hypothesis is that the population
    sex-ratio is truly 1:1, and that any observed
    difference is just by chance.
    Example…contd.

   How do we test if the null hypothesis is true?

   We determine the probability of obtaining a sex-ratio of 14:3 or
    worse (i.e., more biased) in a sample of 17 randomly chosen animals,
    when the population sex-ratio of the saolas is truly 1:1.

   In other words, what is the probability of observing a female
    proportion of 3/17 = 0.1765 or less in a random sample of 17
    animals if the true proportion is 0.5?

   We can answer this question by computing binomial probabilities
    with n = 17 and p = q = 0.5.
    Example…contd.

   n = 17, p = q = 0.5

   $P(Y) = \binom{n}{Y} p^{Y} q^{n-Y}$

     Y                n − Y            Relative expected
     (# of females)   (# of males)     frequency (f)
      0               17               0.0000076
      1               16               0.0001297
      2               15               0.0010376
      3               14               0.0051880
      4               13               0.0181580

     P(Y ≤ 3) = 0.0000076 + 0.0001297 + 0.0010376 + 0.0051880
              = 0.0063629 ≈ 0.006363
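The binomial probabilities in the table can be checked with a few lines of Python. This is a minimal sketch that assumes Python 3.8 or later (for `math.comb`); `binomial_pmf` is a helper name of our own, not part of any library.

```python
from math import comb

def binomial_pmf(y: int, n: int, p: float) -> float:
    """Probability of exactly y females among n animals when P(female) = p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 17, 0.5  # 17 saolas; p = 0.5 under the null hypothesis of a 1:1 ratio

# Rows of the table: number of females, number of males, relative expected frequency
for y in range(5):
    print(y, n - y, f"{binomial_pmf(y, n, p):.7f}")

# One-tailed probability of observing 3 or fewer females
p_lower = sum(binomial_pmf(y, n, p) for y in range(0, 4))
print(f"P(Y <= 3) = {p_lower:.6f}")
```

Because every term is an integer times a power of 0.5, the sum equals 834/2^17 exactly, which prints as 0.006363.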
Example…contd.

   Therefore, P(Y ≤ 3) = 0.006363.

   Conventionally, a result is regarded as not unlikely
    under H0 if the probability of obtaining it (or something
    more extreme) is at least 5% (P ≥ 0.05). That is, if chance
    alone would produce such a result at least 5% of the time,
    we do not want to take the risk of going wrong by
    rejecting the null hypothesis.

   Does 5% look like it’s too small? Do you want to be
    more certain that the null hypothesis is true before you
    accept it?
                          Why only 5%
   Think about the examples we saw earlier.

     –   Before you condemn a person to jail or worse, don’t you want to be as certain
         of his guilt as possible before you reject the null hypothesis of not guilty?

     –   Don’t you want to be as sure as possible that drug #2 is better than the
         current drug before you spend $$ on it? After all, the current drug has been
         known to work.

     –   You don’t want to look like a chump, so don’t you want to be as certain as you
         can be before announcing to the world that you’ve found a new species of
         herring?

     –   Let us assume that tons of $$ and time have already been spent on researching
         the rshI gene. So now, if you want people to shift focus from rshI to lasI, then
         you want to be very sure that lasI is indeed more involved in biofilm
         formation than rshI.
Example…contd.

   By this standard, the probability of our
    result under the null hypothesis (getting a sample
    sex-ratio of 14:3 or more extreme from a population
    with true ratio 1:1) would be considered very low.

   So we can reasonably conclude that the sample
    is unlikely to have come from a population
    whose sex-ratio is 1:1.
Example…contd.
   So we have decided to reject the null hypothesis that the
    sex-ratio of the saola population is 1:1.

   Now, if the sample did not come from a population with sex-
    ratio 1:1, what kind of a population did it come from? That is,
    what is our alternative hypothesis?

   The way we approach this question depends upon what we
    suspect.

   For example, in the present problem, we suspect that the sex-
    ratio is biased in favor of males. In such cases, the alternative
    hypothesis can be just that: the sex-ratio is biased in favor of
    males.
Example…contd.
   If our alternative hypothesis is that males outnumber females
    in this species,

   then because we have rejected the null hypothesis of 1:1 ratio,

   we have to accept the alternative hypothesis that males
    outnumber females.

   Note that we did not test the alternative hypothesis, although
    this was the hypothesis of interest.

   Instead, we tested the null hypothesis, and decided, based on
    the test, whether to accept it or accept the alternative
    hypothesis.
Example…contd.



[Figure: binomial distribution of the number of females (n = 17, p = 0.5), centered at a sex ratio of 1:1. The area of interest, P(Y ≤ 3), is in only one tail of the distribution: the tail where the sex ratio favors males.]
Example…contd.

   If, on the other hand, we have reason to believe that
    the sex-ratio could be biased in either direction,

   then because we have rejected the hypothesis of 1:1
    ratio,

   the alternative that we have to accept is that the
    two sexes differ in frequency: there could be more
    males than females or vice versa.
Example…contd.

   The alternative scenario being considered here is
    that the sex-ratio could be biased toward either sex.
    Under H0 (p = q = 0.5), a sample of 3 females and
    14 males is exactly as likely as a sample of 3 males
    and 14 females.

   That is, if Y is the number of females,
     P(Y ≤ 3) and P(Y ≥ 14) are the same.

   Therefore, the probability of a result at least this
    extreme in either direction is the sum of the two
    probabilities.
    Example…contd. [P(females ≥ 14)]

   n = 17, p = q = 0.5

   $P(Y) = \binom{n}{Y} p^{Y} q^{n-Y}$

     Y                n − Y            Relative expected
     (# of females)   (# of males)     frequency (f)
     13                4               0.0181580
     14                3               0.0051880
     15                2               0.0010376
     16                1               0.0001297
     17                0               0.0000076

     P(Y ≥ 14) = 0.0051880 + 0.0010376 + 0.0001297 + 0.0000076
               = 0.0063629 ≈ 0.006363
Example…contd.

   Therefore,
    P(#females ≤ 3) = 0.006363,
    P(#females ≥ 14) = 0.006363
Example…contd.

[Figure: the same binomial distribution, centered at a sex ratio of 1:1, with the probability now in both tails: P(#females ≤ 3), where the sex ratio favors males, and P(#females ≥ 14), where it favors females.]
Example…contd.

   Therefore,
     P(#females ≤ 3 or #males ≤ 3)
     = 0.006363 + 0.006363
     = 0.012726

   Therefore, if our alternative hypothesis is simply a biased
    sex-ratio, we need to look at this probability

   and decide whether it is large enough to accept the hypothesis
    of a 1:1 ratio, or small enough to reject it and accept the
    hypothesis of a biased sex-ratio.
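Because p = q = 0.5, all 2^17 possible sequences of females and males are equally likely, so each tail probability is simply a count of outcomes divided by 2^17. The sketch below (standard library only; the variable names are our own) checks that the two tails are equal and adds them.

```python
from math import comb

n = 17
total = 2 ** n  # number of equally likely outcomes when p = q = 0.5

# Count outcomes in each tail (Y = number of females)
lower = sum(comb(n, y) for y in range(0, 4))       # Y <= 3
upper = sum(comb(n, y) for y in range(14, n + 1))  # Y >= 14, i.e. males <= 3

p_lower = lower / total
p_upper = upper / total
p_two_tailed = p_lower + p_upper

print(f"P(Y <= 3)  = {p_lower:.6f}")
print(f"P(Y >= 14) = {p_upper:.6f}")
print(f"two-tailed = {p_two_tailed:.6f}")
```

Both tails contain 834 outcomes, and the two-tailed probability prints as 0.012726.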
     The risk of accepting the wrong hypothesis

   Of course, since we do not know the truth, we
    accept or reject the null hypothesis at some
    risk.

   There are two kinds of risk.
The Hypotheses…contd. (The risks)


                 Actually innocent   Actually guilty

   Acquitted     Correct decision    Wrong decision

   Convicted     Wrong decision      Correct decision
The Hypotheses…contd. (The risks)


                 H0 true             H0 false

   H0 accepted   Correct decision    Wrong decision
                                     (Type II error)

   H0 rejected   Wrong decision      Correct decision
                 (Type I error)
The Hypotheses…contd. (The risks)

   Rejecting a true null hypothesis is termed a
    Type I error; the probability of making it is the
    “level of significance”.

   It is symbolized by α and is typically kept
    small because a Type I error is considered the
    more serious error.
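Because Y takes only whole-number values, a rule of the form “reject H0 if Y ≤ c” usually cannot make the Type I error exactly equal to α. The sketch below is an illustration we have added, assuming the one-tailed saola test against a male-biased ratio; `lower_tail` and `critical_count` are hypothetical helper names. It finds the largest critical count c whose tail probability does not exceed α, and reports the exact probability of rejecting a true H0 under that rule.

```python
from math import comb

def lower_tail(c: int, n: int = 17) -> float:
    """P(Y <= c) for Y ~ Binomial(n, 0.5)."""
    return sum(comb(n, y) for y in range(c + 1)) / 2 ** n

def critical_count(alpha: float, n: int = 17) -> int:
    """Largest c with P(Y <= c) <= alpha under H0 (returns -1 if none exists)."""
    c = -1
    while lower_tail(c + 1, n) <= alpha:
        c += 1
    return c

for alpha in (0.05, 0.01):
    c = critical_count(alpha)
    # Rule: reject H0 if at most c females are observed.
    # Its exact Type I error is P(Y <= c) under H0, which can fall below alpha.
    print(alpha, c, round(lower_tail(c), 4))
```

For α = 0.05 the rule is “reject if Y ≤ 4”, with an actual Type I error of about 0.0245; for α = 0.01 it is “reject if Y ≤ 3”, with about 0.0064.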
The Hypotheses…contd. (The risks)

   Accepting a false null hypothesis is termed a
    Type II error; the probability of making it is
    symbolized by β,

   and it is typically considered the less serious
    error.

   Let us now formally answer the question of
    the sex-ratio in the saola.
The Hypotheses…contd.
   The saola (Pseudoryx nghetinhensis) is a species of ruminant
    from Vietnam that was unknown to science until 1993.

   Again, as before, 17 specimens were found, and 14 were males
    and 3 were females.

Two questions, leading to two different alternative hypotheses:
 Are there more males in the population than females?
 Or the question could have been: is the sex-ratio different
  from 1:1?

   Let us test these two one at a time.
The Hypotheses…contd.

   H0: There are no more males than females in the
    saola (Pseudoryx nghetinhensis) population (i.e., the
    sex ratio is 1:1).

   H1: There are more males than females in the saola.
    (Therefore, this is a one-tailed hypothesis.)

   Level of significance: α = 0.05 and 0.01
    (This is the level of risk one is willing to take in
    rejecting the null hypothesis when it is actually true;
    the so-called Type I error.)
The Hypotheses…contd.

   Test statistic: T = P(Y ≤ 3), where Y, the number of
    females, follows the binomial distribution (n = 17,
    p = 0.5) under H0.

   Decision criteria:
    If P(Y ≤ 3) ≥ α then accept H0
    If P(Y ≤ 3) < α then reject H0


   Computation of the test statistic:
    P(Y ≤ 3) = 0.006363
The Hypotheses…contd.

   Decision: Since P(Y ≤ 3) < α (0.05 or 0.01),
    we reject H0
    –   Therefore, we accept H1.


   Conclusion: From the sample evidence we
    conclude that there are more males than
    females in the saola (P < 0.01)
The Hypotheses…contd.

   What if we had made the other alternative
    hypothesis?
   H0: The sex-ratio in the saola (Pseudoryx
    nghetinhensis) is 1:1.
   H1: The sex-ratio in the saola (Pseudoryx
    nghetinhensis) is not 1:1. (Therefore, this is a
    two-tailed hypothesis.)
   Level of significance: α = 0.05 and 0.01
The Hypotheses…contd.

   Test statistic: T = P(#males ≤ 3) + P(#females ≤ 3),
    computed from the binomial distribution (n = 17,
    p = 0.5) under H0.

   Computation of the test statistic:
    P(#males ≤ 3) + P(#females ≤ 3) = 2(0.006363)
      = 0.012726
The Hypotheses…contd.

   Decision: Since T is itself a probability here, we can
    compare it directly to α.
    –   In this case, T < 0.05, but T > 0.01.
    –   Therefore, we reject H0 at the 5% level of significance
        (a risk of Type I error of at most 5%).
    –   But since T > 0.01, we cannot reject H0 at the 1% level;
        that is, we cannot keep the risk of wrongly rejecting H0
        at or below 1%.
    –   Therefore, depending on the level of significance we
        choose, we either reject H0 (α = 0.05) or accept it (α = 0.01).
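The decision rules for the two saola tests, one-tailed and two-tailed, can be sketched as follows. This is a minimal illustration rather than a general-purpose test: `binomial_lower_tail` and `decide` are our own names, and doubling the one-tailed probability is valid here only because p = 0.5 makes the distribution symmetric.

```python
from math import comb

def binomial_lower_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(k + 1))

def decide(p_value: float, alpha: float) -> str:
    """Decision criterion from the slides: reject H0 when the probability is below alpha."""
    return "reject H0" if p_value < alpha else "accept H0"

p_one = binomial_lower_tail(3, 17)  # one-tailed: H1 = more males than females
p_two = 2 * p_one                   # two-tailed: H1 = sex-ratio is not 1:1 (symmetric case only)

for alpha in (0.05, 0.01):
    print(alpha, "one-tailed:", decide(p_one, alpha), "| two-tailed:", decide(p_two, alpha))
```

Run as written, it rejects H0 in the one-tailed test at both levels, and in the two-tailed test rejects H0 at α = 0.05 but not at α = 0.01, matching the decisions above.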
The Hypotheses…contd.

   Conclusion: From the sample evidence we
    conclude that the sex-ratio of the saola differs
    significantly from 1:1 at the 5% level
    (P ≈ 0.013).

   The difference is not, however, significant at
    the 1% level.
    Example 2


   You are a commercial fish farmer who has grown
    rainbow trout (Oncorhynchus mykiss) for many
    years.

   You have spent a good deal of money and time in
    training your staff to produce good yields year
    after year.
    Example 2

   By the end of eight months, your fish average 15
    inches, with a standard deviation of 4 inches.

   Along comes a smart-aleck who claims to know
    more about growing rainbow trout than you do.
    Example 2


   He promises that at the end of eight months,
    your fish would average 18 inches if you invested
    in his method.

   What should you do?
    Example 2

   On the one hand you do not want to change
    anything because you are doing fine, and any
    change means re-training – more expense, more
    time, etc.

   On the other hand, an average of 18 inches at the
    end of eight months is really good, and you are
    tempted.
    Example 2


   So you conduct a test – a test of hypothesis.

   The test involves using his procedure on a
    sample of your fish for eight months and coming
    to a decision.
    Example 2

   You use his procedure on a sample of your fish
    (say, 50 fish).

   At the end of eight months you find that the
    mean length of the 50 fish grown using the new
    procedure is 16.9 inches.
    Example 2


   Clearly, the mean length of the fish is not 18
    inches.

   But then, not all of your fish are 15 inches either.
    Some are small and some are large. They are 15
    inches on average.
    Example 2


   Maybe it was just this sample that turned out to
    be less than 18 inches.

   Maybe another sample would be 18 inches or
    more.
    Example 2

   On the other hand, what if this was actually a
    good sample, and another one is worse?!

   So should you show him the door or should you
    adopt his procedure?

   How do you decide?
    Example 2

   Do a test of hypothesis!

   H0: The sample is drawn from a population of rainbow trout whose
    mean length is 15”. (This means that it is essentially no different
    from your fish)

   H1: The sample is drawn from a population of rainbow trout whose
    mean length is 18”. This is a one-tailed hypothesis.

   Level of significance: α = 0.01. You want to make really sure (≥ 99%)
    that the new procedure works.
Example - 2



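[Figure: length distributions for your fish (mean 15”) and for the fish promised by the salesman (mean 18”), with the observed sample mean of 16.9” marked; horizontal axis: fish length, 12” to 19”.]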
Example - 2

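   Test statistic: The sample mean is approximately normally
    distributed, since n > 30 (central limit theorem).
   Therefore, the test statistic is

     $T = \dfrac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$

    where σ/√n = 4/√50 = 0.566 is the standard error of the mean.
   T = (16.9 − 15)/0.566 = 3.357

   P(Z ≥ 3.36) = 1 − (0.5 + 0.4996)
                = 0.0004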
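The same computation can be sketched in Python with `statistics.NormalDist` from the standard library (Python 3.8+). It uses the unrounded standard error 4/√50 ≈ 0.5657 rather than 0.566, so z comes out as about 3.36 and the one-tailed probability as about 0.0004.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 15.0, 4.0  # your fish after eight months: mean and SD (inches)
n, ybar = 50, 16.9      # sample grown with the new procedure

sem = sigma / sqrt(n)                # standard error of the mean, about 0.566
z = (ybar - mu0) / sem               # test statistic T
p_upper = 1 - NormalDist().cdf(z)    # one-tailed probability P(Z >= z)

print(f"SEM = {sem:.3f}, z = {z:.2f}, P(Z >= z) = {p_upper:.4f}")
```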
Example - 2

   Decision: Since P(Z ≥ 3.36) < α (0.0004 < 0.01), we
    reject H0 and accept H1.

   Conclusion: Based on the sample, we accept
    the salesman’s claim that his procedure
    increases the growth rate of rainbow trout to
    18 inches (p < 0.001), because we could not
    accept the null hypothesis.
Uh-oh

   Do you see a problem with the test that we
    just performed?

   Our H1 was that his procedure resulted in fish
    with an average size of 18 inches at the end
    of eight months.
Uh-oh

   And we were forced to accept it because we
    rejected H0

   If he had claimed that the fish would grow to
    19 inches, we would have had to accept that
    as well! Or 20 or whatever his claim was,
    simply because we rejected the null
    hypothesis.
Uh-oh

   Clearly there is something wrong.

   We have to concede one thing, however: The
    fish did grow to be significantly larger than
    your fish [“significantly” means “more than
    just by chance”]
Uh-oh

   How do we come to this conclusion of significant
    difference?

   We found that, if the sample had come from the
    same population as your fish (mean length 15”),
    the probability of a sample mean as large as 16.9”
    would be very low (< 0.001). We therefore had to
    reject H0.

   However, that meant that we had to automatically
    accept H1, whatever it was!
Uh-oh

   But, but, but, 18 inches is probably too much!

   It doesn’t look like his claim of 18 inches is
    correct.

   (And the guy looked kinda sleazy too!)
Uh-oh

   But we must admit that from the sample
    evidence, it certainly looks like his procedure
    significantly increased the growth rate of the
    fish, but probably not as much as he claimed.

   Can we test this?
Uh-oh

   Of course we can!

   Let’s formulate the test in the following
    manner:
Uh-oh

   H0: The true mean length of the treated fish is 18 inches.

     –   What have we just done? To give him the benefit of the doubt, we
         have decided to use his claim as the null hypothesis. If you do this,
         then as long as a sample like ours has at least a 5% chance of arising
         under his claim, you will have to accept his claim. So how we
         formulate the hypotheses is very important.

   H1: The true mean length of the treated fish is less than 18 inches.

   Level of significance: α = 0.05 and 0.01
Uh-oh

   Test statistic: The sample mean is approximately normally
    distributed, since n > 30.

   Therefore, the test statistic is

   T = Z = (16.9 − 18)/0.566 = −1.94

   P(Z ≤ −1.94) = 1 − (0.5 + 0.4738)
                = 0.0262
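A matching sketch for this second test, again using `statistics.NormalDist` and the unrounded standard error. With SEM ≈ 0.5657 the lower-tail probability is about 0.0259, which agrees with the table-based 0.0262 when both are rounded to 0.026.

```python
from math import sqrt
from statistics import NormalDist

mu_claim, sigma, n, ybar = 18.0, 4.0, 50, 16.9  # H0 now uses the salesman's claim

sem = sigma / sqrt(n)
z = (ybar - mu_claim) / sem
p_lower = NormalDist().cdf(z)  # one-tailed: H1 says the true mean is less than 18"

print(f"z = {z:.2f}, P(Z <= z) = {p_lower:.3f}")
for alpha in (0.05, 0.01):
    print(alpha, "reject H0" if p_lower < alpha else "accept H0")
```

It rejects H0 at α = 0.05 but not at α = 0.01, as in the decision that follows.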
Example - 2

   Decision: Since P(Z ≤ −1.94) < 0.05 (but not < 0.01), we
    reject H0 at the 5% level of significance, but not at
    the 1% level of significance.

   Conclusion: The evidence, based on the sample,
    does not support the salesman’s claim of the fish
    growing to 18 inches in eight months if his procedure is
    followed (p < 0.05). However, the result is not
    significant at the 1% level.
Statistical Power
[Figure, built up over several slides: the sampling distribution of the mean fish length under H0 (centered at 15”) and under H1 (centered at 18”), with the critical value marked between them. The area under the H0 curve beyond the critical value is alpha (Type I error); the area under the H1 curve below the critical value is beta (Type II error); the rest of the area under the H1 curve (shaded green) is the power, 1 − beta.]
Statistical Power

   Type I error is defined as the error in
    rejecting a true null hypothesis.

   Type II error is defined as the error in
    accepting a false null hypothesis.
Statistical Power

   Statistical Power is defined as the probability
    of rejecting the null hypothesis when it is
    actually false.
Statistical Power

   Clearly, we want to minimize both Type I and
    Type II errors.

   In other words we want to accept the null
    hypothesis when it is true and reject it when it
    is false (the latter refers to increasing the
    power of the test).
Statistical Power

   In a test of hypothesis, we specify the extent
    of Type I error as the level of significance,
    Alpha.

   However, we do not specify the Type II error
    (Beta).
Statistical Power

   In other words, we do not specify the power
    of the test.

   However, it is possible to compute the power
    of a test for a given alternative hypothesis.
    Statistical Power

   For example, it was hypothesized (H0) above that the
    sample of fish with the new treatment actually
    belonged to the population with mean 15”.

   The question is: how powerful is the test in rejecting
    this hypothesis if the sample really belonged to a
    population with mean = 18”?
Statistical Power

   SEM = σ/√n = 4/√50 = 0.566
   If Alpha = 0.05, Z = 1.645 for a one-tailed
    test.

   That is, the value of Y-bar at which Z = 1.645 is
    (0.566)(1.645) + 15 = 15.93
Statistical Power

   That is, the critical value in terms of Y-bar =
    15.93
   Therefore, the area in green is computed the
    following way:
[Figure: as above, with the critical value at Y-bar = 15.93 and alpha (Type I error) = 0.05; the green area under the H1 curve is the power, 1 − beta.]
Statistical Power

   Z = [15.93 - 18]/0.566 = -3.66
   P(Z ≥ -3.66) = 0.9999
   Therefore, power = 99.99%

   This means that if (H1: pop mean = 18”) is
    actually true, then the probability of rejecting
    the null hypothesis (of μ=15) is 99.99%.
[Figure: as above, now labeled with beta (Type II error) = 0.0001, alpha (Type I error) = 0.05, and power = 1 − beta = 0.9999.]
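The power calculation can be sketched in the same way. This assumes, as the slides do, a one-tailed test with α = 0.05, σ = 4”, n = 50 and a true mean of 18” under H1; `statistics.NormalDist` (Python 3.8+) supplies both the critical Z and the area under the H1 curve.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1 = 15.0, 18.0  # population means under H0 and under H1 (inches)
sigma, n, alpha = 4.0, 50, 0.05

std = NormalDist()
sem = sigma / sqrt(n)
critical = mu0 + std.inv_cdf(1 - alpha) * sem  # reject H0 when the sample mean exceeds this
beta = NormalDist(mu1, sem).cdf(critical)      # P(sample mean below critical value | H1 true)
power = 1 - beta

print(f"critical value = {critical:.2f}, beta = {beta:.4f}, power = {power:.4f}")
```

It reports a critical value of about 15.93, beta ≈ 0.0001 and power ≈ 0.9999, in line with the hand calculation.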
Power Calculators

   Many available online:

   Example
How to increase Statistical Power?

   Increasing alpha – clearly not a good
    solution!
   Using 1-tailed rather than 2-tailed tests –
    possible, but not always appropriate.
   Increasing the sample size – another reason
    to use large sample sizes!
Sample size calculator

   By the same token, we can calculate the
    sample size required for a given level of
    statistical power.

   Many available online

   Example.
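A sample size calculation can be sketched with the usual normal-approximation formula for a one-sided, one-sample Z-test with known σ: n = ((z₁₋α + z₁₋β)σ/(μ₁ − μ₀))², rounded up. The function name, and the 80% power target in the first call, are our own assumptions for illustration; online calculators may use other formulas (for example, ones based on the t distribution).

```python
from math import ceil
from statistics import NormalDist

def sample_size_one_sided(mu0: float, mu1: float, sigma: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Smallest n giving the requested power for a one-sided, one-sample Z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # critical Z under H0
    z_beta = z.inv_cdf(power)       # Z that leaves beta = 1 - power in the lower tail of H1
    n = ((z_alpha + z_beta) * sigma / abs(mu1 - mu0)) ** 2
    return ceil(n)

# Trout example: detect a shift from 15" to 18" when sigma = 4"
print(sample_size_one_sided(15, 18, 4))
print(sample_size_one_sided(15, 18, 4, alpha=0.01, power=0.99))
```

For the trout example, the first call returns 11 and the second, with α = 0.01 and 99% power, returns 39.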

				
posted: 3/28/2012