Hypothesis_Testing

```
Introduction to Hypothesis Testing
Everyday phenomenon

We ask ourselves questions every day. Here
are a few examples:

   Is it going to rain today?
   Will the coin turn up Heads?
   Will this movie be good?
   Uh-oh! Is this exam going to be hard?
   Can I jump off the top of Miriam Hall and survive?
(Actually, you may not want to test this one, at least
not every day.)
Everyday phenomenon

   Clearly, we do make decisions based on our answers
to these questions.

   How do we make our decisions?

   Usually based on a priori probabilities or a posteriori
probabilities
A priori and a posteriori probabilities

   An a priori probability is one that can be
mathematically computed based on the possible
outcomes.

   An a posteriori probability is one that is estimated based
on experience or an understanding of the factors that
affect the phenomenon.

   The probability of the outcome of a coin-tossing
experiment can be determined prior to tossing the
coin. Hence, a priori.

   The probability of precipitation on a given day
cannot be determined a priori.
–   The probability of rain on a particular day can
only be determined after studying the various
phenomena associated with rain. Hence,
a posteriori.

   The earlier questions can also be re-phrased
as hypotheses.

   We then decide to either accept the
hypotheses or reject them and take action
accordingly.

   For example,

   Is it going to rain today? can be rephrased as
“It is not going to rain today” or “It is going to
rain today”

   etc.
We test such hypotheses every day

   Typically, we form hypotheses based on some observation,
result, expectation, or suspicion.
   Examples of such observations or expectations:
–   The suspect’s hat was found near the victim’s body.
–   The biochemistry of a new drug suggests that it must be more
effective in reducing cancerous cells than the current drug.
–   This sample of herring looks somewhat bigger than the herring we
are used to (the Atlantic herring, Clupea harengus) – they
probably belong to another species. (I have to include at least one
fish example!)
–   There are more mosquito larvae in this stream when compared to
the one upstream.
The initial hypothesis

   Our first hypothesis is conservative, where we deny
our expectation or observation
   For the examples given, our hypotheses would be:
–   The suspect is not guilty.
–   This new drug is no more effective in reducing cancerous
cells than the current drug.
–   Although this sample of herring looks somewhat bigger than
Clupea harengus, they do not belong to another species.
–   The mosquito larval density in this stream is no different
from that of the one upstream.
The hypothesis of no difference

   Note the language in forming the hypotheses.
We have used terms such as not, no more, do
not, no different from.

   That is, we are stating that the observation or
result in question could easily have come
from the “regular” or “original” population.
In other words, it is no different from the
values in the original or regular population.
The reference population is not the
expected population

   The regular or original population in our
examples would be
–   The population of innocent people
–   The population of results when cancerous cells are
treated with the current drug
–   The population of Clupea harengus
–   The mosquito larval population in the upstream
stretch of the river.
The null hypothesis

   Because we are conservative in this
hypothesis and state that the observed result
(e.g., sample mean) is no different from the
regular population mean,
–   We term this initial hypothesis the null
hypothesis, or hypothesis of no difference.
–   The null hypothesis is symbolized by H0.
The logic

   OK. We have formed the null hypothesis. What
next?

   Well, we then test it. What does that entail?

   This is done by computing the probability of obtaining
evidence at least as extreme as what we observed, assuming
the null hypothesis is true. (How we do this depends on the
variable we are dealing with; to be discussed at length
during the semester.)

   If this probability is very low then we reject the null
hypothesis; if the probability is high, we accept it (more
precisely, we fail to reject it).
The alternative hypothesis

   Obviously, if we reject the null hypothesis, we need
to accept some other hypothesis – an alternative
hypothesis.

   So we really begin by making two hypotheses – the
null hypothesis and an alternative hypothesis
(symbolized H1).

   We test the null hypothesis. If we accept it, that’s the
end of that story. If we reject it, we then are forced to
accept the alternative hypothesis.
The alternative hypothesis

   Note that only the null hypothesis (H0) is tested, and
if H0 is rejected, the alternative hypothesis (H1) is
automatically accepted (without further testing).

   It is, therefore, very important that we formulate the
hypotheses carefully.

   The two hypotheses must also complement each
other (i.e., there should be no third alternative)
Example

   The saola (Pseudoryx nghetinhensis) is a species of
ruminant from Vietnam that was unknown to science
until 1993.

   (I’m making this part up): 17 specimens were found, and
14 were males and 3 were females.

   Because of this disparity the question might arise: is the
sex-ratio in this species really 1:1?
Example…contd.

   How do we test this?

   The possibilities are the following:
–   The sex-ratio of this species is really 1:1 but it just
happened by chance that this sample had a skewed
ratio.
–   The sex-ratio of this species is biased in favor of
males.
–   The two sexes are unequal in frequency.
Example…contd.

   The question is:

–   Which of these scenarios is true?

–   And how do we find out?
Example…contd.

   One way of resolving the issue of course, is to go out into
the jungles of Vietnam and find more saolas.

   An easier and cheaper way (if less fun) is to use
probability theory and what we have learnt about known
distributions (in this case, the binomial distribution).

   Of course, we have to assume that the group of saolas
was randomly sampled.
Example…contd.

   It seems reasonable to assume the binomial
distribution (remember the conditions?).

   We now approach the problem thus:

   Our null hypothesis is that the population
sex-ratio is truly 1:1, and that any observed
difference is just by chance.
Example…contd.

   How do we test if the null hypothesis is true?

   We determine the probability of obtaining a sex-ratio of 14:3 or
worse (i.e., more biased) in a sample of 17 randomly chosen animals,
when the population sex-ratio of the saolas is truly 1:1.

   In other words, what is the probability of observing a female
proportion of 3/17 = 0.1765 or less in a random sample of 17
animals if the true proportion is 0.5?

   We can answer this question by computing binomial probabilities
with n = 17 and p = q = 0.5.
Example…contd.

   n = 17, p = q = 0.5

   P(Y) = nCY × p^Y × q^(n – Y)

   Y (# of females)   n – Y (# of males)   Relative expected frequency (f)
   0                  17                   0.000,007,63
   1                  16                   0.000,129,71
   2                  15                   0.001,037,68
   3                  14                   0.005,188,40
   4                  13                   0.018,157,91

   P(Y ≤ 3) = 0.006,363,42 (the sum of the rows for Y = 0 to 3)
Example…contd.

   Therefore, P(Y ≤ 3) = 0.006,363,42.

   Typically, the probability of an outcome at least this
extreme must be at least 5% (P ≥ 0.05) before we accept it as
not unlikely. This means that if a result like ours would turn
up at least 5% of the time when the null hypothesis is correct,
we do not want to take the risk of going wrong by rejecting it.

   Does 5% look like it’s too small? Do you want to be
more certain that the null hypothesis is true before you
accept it?
Why only 5%
   Think about the examples we saw earlier.

–   Before you condemn a person to jail or worse, don’t you want to be as certain
of his guilt as possible before you reject the null hypothesis of not guilty?

–   Don’t you want to be as sure as possible that drug #2 is better than the
current drug before you spend \$\$ on it? After all, the current drug has been
known to work.

–   You don’t want to look like a chump, so don’t you want to be as certain as you
can be before announcing to the world that you’ve found a new species of
herring?

–   Let us assume that tons of \$\$ and time have already been spent on researching
the rshI gene. So now, if you want people to shift focus from rshI to lasI, then
you want to be very sure that lasI is indeed more involved in biofilm
formation than rshI.
Example…contd.

   By this standard, the probability of our observed
result under the null hypothesis (getting a sample
sex-ratio of 14:3 or more biased from a population with
true ratio 1:1) would be considered very low.

   So we can reasonably conclude that the sample
is unlikely to have come from a population
whose sex-ratio is 1:1.
Example…contd.
   So we have decided to reject the null hypothesis that the
sex-ratio of the saola population is 1:1.

   Now, if the sample did not come from a population with sex-
ratio 1:1, what kind of a population did it come from? That is,
what is our alternative hypothesis?

   The way we approach this question depends upon what we
suspect.

   For example, in the present problem, we suspect that the sex-
ratio is biased in favor of males. In such cases, the alternative
hypothesis can be just that: the sex-ratio is biased in favor of
males.
Example…contd.
   If our alternative hypothesis is that males outnumber females
in this species,

   then because we have rejected the null hypothesis of 1:1 ratio,

   we have to accept the alternative hypothesis that males
outnumber females.

   Note that we did not test the alternative hypothesis, although
this was the hypothesis of interest.

   Instead, we tested the null hypothesis, and decided, based on
the test, whether to accept it or accept the alternative
hypothesis.
Example…contd.

[Figure: binomial distribution centered on a sex ratio of 1:1, with
one tail labeled "sex-ratio in favor of males" and the other
"sex-ratio in favor of females". The area of interest, P(Y ≤ 3),
is in only one tail of the distribution.]
Example…contd.

   If, on the other hand, we have reason to believe that
the sex-ratio could be biased in either direction,

   then because we have rejected the hypothesis of 1:1
ratio,

   the alternative that we have to accept is that the
two sexes differ in frequency: males could
outnumber females or vice versa.
Example…contd.

   The scenario being considered here is that, under a 1:1
sex-ratio, a sample of 3 females and 14 males is as likely as
a sample of 3 males and 14 females.

   That is, if Y is the number of females,
P(Y ≤ 3) and P(Y ≥ 14) should be the same.

   Therefore, the probability of either one happening is
given by the sum of the two probabilities.
Example…contd. [P(females ≤ 3)]

   n = 17, p = q = 0.5

   P(Y) = nCY × p^Y × q^(n – Y)

   Y (# of females)   n – Y (# of males)   Relative expected frequency (f)
   0                  17                   0.000,007,63
   1                  16                   0.000,129,71
   2                  15                   0.001,037,68
   3                  14                   0.005,188,40
   4                  13                   0.018,157,91

   P(Y ≤ 3) = 0.006,363,42 (the sum of the rows for Y = 0 to 3)
Example…contd. [P(females ≥ 14)]

   n = 17, p = q = 0.5

   P(Y) = nCY × p^Y × q^(n – Y)

   Y (# of females)   n – Y (# of males)   Relative expected frequency (f)
   13                 4                    0.018,157,91
   14                 3                    0.005,188,40
   15                 2                    0.001,037,68
   16                 1                    0.000,129,71
   17                 0                    0.000,007,63

   P(Y ≥ 14) = 0.006,363,42 (the sum of the rows for Y = 14 to 17)
Example…contd.

   Therefore,
P(#females ≤ 3) = 0.006,363,42,
P(#females ≥ 14) = 0.006,363,42
Example…contd.

[Figure: binomial distribution centered on a sex ratio of 1:1, with
the probability now in both tails: P(#females ≤ 3) in the tail
favoring males and P(#females ≥ 14) in the tail favoring females.]
Example…contd.

   Therefore,
P(#females ≤ 3 or #males ≤ 3)
= 0.006,363,42 + 0.006,363,42
= 0.012,726,84

   Therefore, if our alternative hypothesis is simply a biased sex-
ratio, we need to look at this probability,

   and decide if it is large enough to accept the hypothesis of 1:1
ratio, or small enough to reject it and accept the hypothesis of a
biased sex-ratio
The risk of accepting the wrong hypothesis

   Of course, since we do not know the truth, we
accept or reject the null hypothesis at some
risk.

   There are two kinds of risk.
The Hypotheses…contd. (The risks)

               Actually Innocent    Actually Guilty
   Acquitted   Correct Decision     Wrong Decision
   Convicted   Wrong Decision       Correct Decision
The Hypotheses…contd. (The risks)

                 H0 True             H0 False
   H0 Accepted   Correct Decision    Wrong Decision
   H0 Rejected   Wrong Decision      Correct Decision
The Hypotheses…contd. (The risks)

   Rejecting a true null hypothesis is termed the
Type I Error; its probability is the “Level of
Significance”.

   It is symbolized by α and is typically sought
to be minimized because it is considered the
more serious error.
The Hypotheses…contd. (The risks)

   Accepting a false null hypothesis is termed the
Type II Error, and its probability is
symbolized by β,

   and is typically considered the less serious
error.

   Let us now formally answer the question of
the sex-ratio in the saola.
The Hypotheses…contd.
   The saola (Pseudoryx nghetinhensis) is a species of ruminant
from Vietnam that was unknown to science until 1993.

   Again, as before, 17 specimens were found, and 14 were males
and 3 were females.

Two alternative hypothesis questions:
–   Are there more males in the population than females?
–   The question could also have been: Is the sex-ratio different
from 1:1?

   Let us test these two one at a time.
The Hypotheses…contd.

   H0: There are no more males than females in the
saola (Pseudoryx nghetinhensis) population (i.e., the
sex ratio is 1:1).

   H1: There are more males than females in the saola.
(Therefore, this is a one-tailed hypothesis.)

   Level of significance: α = 0.05 and 0.01
(This is the level of risk one is willing to take in
rejecting the null hypothesis when it is actually true;
the so-called Type I error.)
The Hypotheses…contd.

   Test statistic: Y, the number of females, follows the
binomial distribution; the test uses the tail probability
T = P(Y ≤ 3).

   Decision criteria:
If P(Y ≤ 3) ≥ α then accept H0
If P(Y ≤ 3) < α then reject H0

   Computation of the test statistic:
P(Y ≤ 3) = 0.006,363,42
The Hypotheses…contd.

   Decision: Since P(Y ≤ 3) < α (0.05 or 0.01),
we reject H0
–   Therefore, we accept H1.

   Conclusion: From the sample evidence we
conclude that there are more males than
females in the saola (P < 0.01)
The Hypotheses…contd.

hypothesis?
   H0: The sex-ratio in the saola (Pseudoryx
nghetinhensis) is 1:1.
   H1: The sex-ratio in the saola (Pseudoryx
nghetinhensis) is not 1:1. (Therefore, this is a
two-tailed hypothesis.)
   Level of significance: α = 0.05 and 0.01
The Hypotheses…contd.

   Test statistic: the count of either sex follows the
binomial distribution; the test uses the two-tailed
probability T.

   Computation of the test statistic:
T = P(#males ≤ 3) + P(#females ≤ 3) = 2(0.006,363,42)
= 0.012,726,84
The Hypotheses…contd.

   Decision: Since T is itself a probability here, we can
compare it directly to α.
–   In this case, T < 0.05, but not ≤ 0.01.
–   Therefore, we reject H0 at a risk ≤ 5% (confidence ≥ 95%)
–   But since T > 0.01, we cannot be > 99% confident that we
are correct in rejecting the null hypothesis.
–   Therefore, depending on the level of confidence we are
looking for, we either reject H0 or accept it.
The Hypotheses…contd.

   Conclusion: From the sample evidence we
conclude that we can be > 95% confident that
the sex-ratio of the saola is significantly
different from 1:1.

   We cannot, however, be more than 99%
confident that the sex-ratio is different from
1:1.
Example 2

   You are a commercial fish farmer who has grown
rainbow trout (Oncorhynchus mykiss) for many
years.

   You have spent a good deal of money and time in
training your staff to produce good yields year
after year.
Example 2

   By the end of eight months, your fish average 15
inches, with a standard deviation of 4 inches.

   Along comes a smart-aleck who claims to know
more about growing rainbow trout than you do.
Example 2

   He promises that at the end of eight months,
your fish would average 18 inches if you invested
in his method.

   What should you do?
Example 2

   On the one hand you do not want to change
anything because you are doing fine, and any
change means re-training – more expense, more
time, etc.

   On the other hand, an average of 18 inches at the
end of eight months is really good, and you are
tempted.
Example 2

   So you conduct a test – a test of hypothesis.

   The test involves using his procedure on a
sample of your fish for eight months and coming
to a decision.
Example 2

   You use his procedure on a sample of your fish
(say, 50 fish).

   At the end of eight months you find that the
mean length of the 50 fish grown using the new
procedure is 16.9 inches.
Example 2

   Clearly, the mean length of the fish is not 18
inches.

   But then, not all of your fish are 15 inches either.
Some are small and some are large. They are 15
inches on average.
Example 2

   Maybe it was just this sample that turned out to
be less than 18 inches.

   Maybe another sample would be 18 inches or
more.
Example 2

   On the other hand, what if this was actually a
good sample, and another one is worse?!

   So should you show him the door or should you

   How do you decide?
Example 2

   Do a test of hypothesis!

   H0: The sample is drawn from a population of rainbow trout whose
mean length is 15”. (This means that it is essentially no different

   H1: The sample is drawn from a population of rainbow trout whose
mean length is 18”. This is a one-tailed hypothesis.

   Level of significance: α = 0.01. You want to make really sure (≥99%)
that the new procedure works.
Example - 2

[Figure: distribution of fish length, with the axis marked from 12” to
19” and the sample mean of 16.9 inches indicated.]
Example - 2

   Test statistic: Mean length follows a normal distribution, since
n > 30.
   Therefore, the test statistic is

   T = (Y-bar – μ) / (σ/√n) ~ N(0,1)

   where σ/√n = 4/√50 = 0.566

   T = (16.9 – 15)/0.566 = 3.3569

   P(Z ≥ 3.36) = 1 – (0.5 + 0.4996)
= 0.0004
Example - 2

   Decision: Since P(T) < α we reject H0 and
accept H1.

   Conclusion: Based on the sample, we accept
the salesman’s claim that his procedure
increases the growth rate of rainbow trout to
18 inches (p < 0.001), because we could not
accept the null hypothesis.
Uh-oh

   Do you see a problem with the test that we
just performed?

   Our H1 was that his procedure resulted in fish
with the average size of 18 inches at the end
of eight months.
Uh-oh

   And we were forced to accept it because we
rejected H0

   If he had claimed that the fish would grow to
19 inches, we would have had to accept that
as well! Or 20 or whatever his claim was,
simply because we rejected the null
hypothesis.
Uh-oh

   Clearly there is something wrong.

   We have to concede one thing, however: The
fish did grow to be significantly larger than
your fish [“significantly” means “more than
just by chance”]
Uh-oh

   How do we come to this conclusion of significant
difference?

   We found that the probability that the sample came
from the same population as the fish with mean
length of 15” was very low (< 0.001). We therefore
rejected H0.

   However, that meant that we had to automatically
accept H1, whatever it was!
Uh-oh

   But, but, but, 18 inches is probably too much!

   It doesn’t look like his claim of 18 inches is
correct.

   (And the guy looked kinda sleazy too!)
Uh-oh

   But we must admit that from the sample
evidence, it certainly looks like his procedure
significantly increased the growth rate of the
fish, but probably not as much as he claimed.

   Can we test this?
Uh-oh

   Of course we can!

   Let’s formulate the test in the following
manner:
Uh-oh

   H0: The true length of the treated fish is 18 inches.

–   What have we just done? To give him the benefit of doubt, we
have decided to use his claim as the null hypothesis. If you do this,
then as long as the sample result is not too unlikely under H0
(P ≥ 0.05), you will have to accept his claim. So how we formulate
the hypotheses is very important.

   H1: The true length of the treated fish is less than 18 inches.

   Level of significance: α = 0.05 and 0.01
Uh-oh

   Test statistic: Mean length follows a normal
distribution, since n >30.

   Therefore, the test statistic is

   T = Z = (16.9 – 18)/0.566 = -1.94

   P(Z ≤ -1.94) = 1 – (0.5 + 0.4738)
= 0.0262
Example - 2

   Decision: Since P(T) < 0.05 (but not < 0.01) we
reject H0 at the 5% level of significance, but not at
the 1% level of significance.

   Conclusion: The evidence, based on the sample,
does not support the salesman’s claim of the fish
growing to 18 inches in eight months if his procedure is
followed (p < 0.05). However, the evidence does not
permit us to be ≥ 99% certain of our conclusion.
Statistical Power
[Figure sequence, built up over several slides: the sampling
distribution under H0 (centered at 15”) and the distribution under
H1 (centered at 18”), with a critical value between them. Alpha
(Type I Error) is the area under the H0 curve beyond the critical
value; Beta (Type II Error) is the area under the H1 curve below the
critical value; Power = (1 – Beta) is the area under the H1 curve
beyond the critical value.]

   Type I error is defined as the error in
rejecting a true null hypothesis.

   Type II error is defined as the error in
accepting a false null hypothesis.
Statistical Power

   Statistical Power is defined as the probability
of rejecting the null hypothesis when it is
actually false.
Statistical Power

   Clearly, we want to minimize both Type I and
Type II errors.

   In other words we want to accept the null
hypothesis when it is true and reject it when it
is false (the latter refers to increasing the
power of the test).
Statistical Power

   In a test of hypothesis, we specify the extent
of Type I error as the level of significance,
Alpha.

   However, we do not specify the Type II error
(Beta).
Statistical Power

   In other words, we do not specify the power
of the test.

   However, it is possible to compute the power
of a test for a given alternative hypothesis.
Statistical Power

   For example, it was hypothesized (H0) above that the
sample of fish with the new treatment actually
belonged to the population with mean 15”.

   The question is: How powerful is the test in rejecting
this hypothesis if the sample really belonged to a
population with mean = 18”.
[Figure: distributions under H0 (centered at 15”) and under H1
(centered at 18”), with the critical value, Alpha (Type I Error),
Beta (Type II Error), and Power = (1 – Beta) marked.]
Statistical Power

   SEM = 0.566 (given)
   If Alpha = 0.05, Z = 1.645 for a one-tailed
test.

   That is, Y-bar for (Z = 1.645) =
(0.566)(1.645) + 15 = 15.93
Statistical Power

   That is, the critical value in terms of Y-bar =
15.93
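   Therefore, the power (the area under the H1 curve beyond
the critical value, shown in green in the figure) is computed
the following way:
[Figure: distributions under H0 (centered at 15”) and under H1
(centered at 18”), with the critical value at 15.93, Alpha
(Type I Error) = 0.05, Beta (Type II Error), and
Power = (1 – Beta) marked.]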
Statistical Power

   Z = [15.93 - 18]/0.566 = -3.66
   P(Z ≥ -3.66) = 0.9999
   Therefore, power = 99.99%

   This means that if (H1: pop mean = 18”) is
actually true, then the probability of rejecting
the null hypothesis (of μ=15) is 99.99%.
[Figure: distributions under H0 (centered at 15”) and under H1
(centered at 18”), with the critical value at 15.93, Alpha
(Type I Error) = 0.05, Beta (Type II Error) = 0.0001, and
Power = (1 – Beta) = 0.9999 marked.]
Power Calculators

   Many available online:

   Example
How to increase Statistical Power?

   Increasing alpha – clearly not a good
solution!
   Using 1-tailed rather than 2-tailed tests –
possible, but not always.
   Increasing the sample size – another reason
to use large sample sizes!
Sample size calculator

   By the same token, we can calculate the
sample size required for a given level of
statistical power.

   Many available online

   Example.

```
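
The three calculations in the slides (the binomial tail probabilities for the saola, the Z tests for the trout, and the power of the trout test) can be checked with a few lines of Python. This is only a sketch using the standard library: the function and variable names are made up for illustration and do not come from the slides. It uses σ/√n = 4/√50 directly rather than the rounded 0.566, and the critical Z of 1.645 is taken from the slides, so the results should agree with the slide values to about three significant figures rather than digit for digit.

```python
import math

def binom_pmf(n, y, p=0.5):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)

def binom_lower_tail(n, k, p=0.5):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(binom_pmf(n, y, p) for y in range(k + 1))

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# --- Saola: 3 females out of 17, H0: sex-ratio is 1:1 (p = 0.5) ---
n_animals, n_females = 17, 3
p_one_tailed = binom_lower_tail(n_animals, n_females)
# With p = 0.5 the distribution is symmetric, so P(Y >= 14) = P(Y <= 3)
# and the two-tailed probability is simply double the one-tailed one.
p_two_tailed = 2 * p_one_tailed

# --- Trout: sample mean 16.9 inches, sigma = 4 inches, n = 50 ---
sigma, n_fish, sample_mean = 4.0, 50, 16.9
sem = sigma / math.sqrt(n_fish)  # standard error of the mean

# H0: mu = 15, one-tailed (upper) test
z_vs_15 = (sample_mean - 15) / sem
p_vs_15 = normal_upper_tail(z_vs_15)

# H0: mu = 18, one-tailed (lower) test; P(Z <= z) equals P(Z >= -z)
z_vs_18 = (sample_mean - 18) / sem
p_vs_18 = normal_upper_tail(-z_vs_18)

# --- Power of the test of H0: mu = 15 when the true mean is 18 ---
z_crit = 1.645  # one-tailed critical Z for alpha = 0.05, as in the slides
critical_mean = 15 + z_crit * sem
power = normal_upper_tail((critical_mean - 18) / sem)

print(f"Saola one-tailed P = {p_one_tailed:.5f}")
print(f"Saola two-tailed P = {p_two_tailed:.5f}")
print(f"Trout vs 15 inches: Z = {z_vs_15:.2f}, P = {p_vs_15:.4f}")
print(f"Trout vs 18 inches: Z = {z_vs_18:.2f}, P = {p_vs_18:.4f}")
print(f"Critical mean = {critical_mean:.2f}, power = {power:.4f}")
```

Changing `n_fish` or `z_crit` in the sketch shows how sample size and alpha affect the power, which is the point made on the "How to increase Statistical Power?" slide.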