# Hypothesis_Testing by fanzhongqing

```
A Paradigm for Hypothesis Testing

Statistical hypothesis testing always follows the same series of steps:

A    Formulate the null hypothesis (the statement "on trial"). As the following
discussion will show, if your ultimate goal is to conclude that you have
evidence supporting a claim, you must take the opposite of that claim as
your null hypothesis.

B    Look at your data, and find the version of the null hypothesis which comes
closest to fitting your data. (Some null hypotheses have only one version, but
others may be "true" in many different specific ways. This step is analogous
to "giving the accused the benefit of the doubt" in a criminal trial.)

C    Determine what you would have "expected" your study to yield, had it been
performed in a world where this "fitted" version of the null hypothesis was true.

D    Measure how "different" your actual study result is from this expectation.

E    Compute the probability that a study such as yours, conducted in a world
where the fitted version of the null hypothesis really is true, would - just by
chance (i.e., due to sampling error) - yield a difference this large or larger,
in a direction that contradicts the null hypothesis. This probability is the
significance level of your data, with respect to the null hypothesis.

F1   If the significance level is a "large" percentage, then you conclude that the
data does not provide meaningful evidence against the null hypothesis. You've
found that, in a world where the statement is true, you'd frequently see
evidence "like" what you're seeing. Note that you don't conclude that the data
supports the null hypothesis - only that it doesn't provide much of a
case against it.

F2   If, on the other hand, the significance level is near 0, then you know that
either (1) you were very unlucky when you collected your data, and obtained
a very misrepresentative sample, or (2) the null hypothesis is false. Since
you don't expect to be very unlucky on a regular basis, your data, all by
itself, makes you very suspicious. You are entitled to say that your data
strongly contradicts the null hypothesis, and strongly supports the alternative
(the opposite of the null hypothesis).

You can develop a personal translation of the numerical significance level into
words expressing the "strength" of the evidence by playing through our
"coin-flipping" exercise.

A Paradigm for Hypothesis Testing, Applied

Consider our example involving the commercial loan officer of a bank. The credit
manager claimed that the mean balance due on customer accounts was at least
\$300. We took a sample of 64 customers, and found a sample mean of \$280,
with a sample standard deviation of \$120.

A   Formulate the null hypothesis (the statement "on trial"). As the following
discussion will show, if your ultimate goal is to conclude that you have
evidence supporting a claim, you must take the opposite of that claim as
your null hypothesis.

We're not out to "prove" a statement of our own here. The credit manager of
the firm seeking a loan has made a claim: "The mean balance due (and
soon to be paid) on customer credit accounts is at least \$300." This is our
null hypothesis.

B   Look at your data, and find the version of the null hypothesis which comes
closest to fitting your data. (Some null hypotheses have only one version, but
others may be "true" in many different specific ways. This step is analogous
to "giving the accused the benefit of the doubt" in a criminal trial.)

Our study yielded a sample mean of \$280. The precise version of the null
hypothesis that comes closest to matching this is that the mean balance
due is exactly \$300.

C   Determine what you would have "expected" your study to yield, had it been
performed in a world where this "fitted" version of the null hypothesis was true.

We would have expected a sample mean of \$300.

D   Measure how "different" your actual study result is from this expectation.

Our actual result is below this expectation by \$20.

E   Compute the probability that a study such as yours, conducted in a world
where the fitted version of the null hypothesis really is true, would - just by
chance (i.e., due to sampling error) - yield a difference this large or larger,
in a direction that contradicts the null hypothesis. This probability is the
significance level of your data, with respect to the null hypothesis.

To contradict the null hypothesis by this much or more, we would need to have
obtained a sample mean of \$280 or less.

One standard-deviation's-worth of "fuzz" (more precisely, "exposure to
sampling error") in our estimation procedure is

\$15      = \$120 / sqrt(64)   (sample standard deviation / square root of sample size)

If the true population mean is \$300, and a standard-deviation's-worth of
uncertainty in our estimate is \$15, then the chance that - just due to bad
luck - we'd get an estimate of \$280 or less is

9.12%      (normal approximation)
9.36%      (t-distribution with 63 degrees of freedom)

This is the significance level of our data, with respect to the null hypothesis.

F1   If the significance level is a "large" percentage, then you conclude that the
data does not provide meaningful evidence against the null hypothesis. You've
found that, in a world where the statement is true, you'd frequently see
evidence "like" what you're seeing. Note that you don't conclude that the data
supports the null hypothesis - only that it doesn't provide much of a
case against it.

F2   If, on the other hand, the significance level is near 0, then you know that
either (1) you were very unlucky when you collected your data, and obtained
a very misrepresentative sample, or (2) the null hypothesis is false. Since
you don't expect to be very unlucky on a regular basis, your data, all by
itself, makes you very suspicious. You are entitled to say that your data
strongly contradicts the null hypothesis, and strongly supports the alternative
(the opposite of the null hypothesis).

You can develop a personal translation of the numerical significance level into
words expressing the "strength" of the evidence by playing through our
"coin-flipping" exercise.

Personally, I'd interpret this as "a bit of evidence against the null hypothesis."

A Paradigm for Hypothesis Testing (Yttrium.xls, 13 and 14)

A      Formulate the null hypothesis (the statement "on trial"). As the following
discussion will show, if your ultimate goal is to conclude that you have
evidence supporting a claim, you must take the opposite of that claim as
your null hypothesis.

13 average time spent online by subscribers is ≤ 800 minutes/month
   μ_time ≤ 800

14 average increase in time online associated with an additional \$1000 in monthly salary is ≤ 70 minutes
   1000*coef_income ≤ 70 (in the most complete model, since we're dealing with an "effect" here)

In both cases, we wish to make an affirmative assertion, so we need to take the
opposite as our null hypothesis.

B      Look at your data, and find the version of the null hypothesis which comes
closest to fitting your data. (Some null hypotheses have only one version, but
others may be "true" in many different specific ways. This step is analogous
to "giving the accused the benefit of the doubt" in a criminal trial.)

13 μ_time = 800

14 1000*coef_income = 70

C      Determine what you would have "expected" your study to yield, had it been
performed in a world where this "fitted" version of the null hypothesis was true.

13 a sample mean of 800

14 an estimated coefficient of 0.07

D      Measure how "different" your actual study result is from this expectation.

13 22.67 above

14 0.007889 above

E      Compute the probability that a study such as yours, conducted in a world
where the fitted version of the null hypothesis really is true, would - just by
chance (i.e., due to sampling error) - yield a difference this large or larger,
in a direction that contradicts the null hypothesis. This probability is the
significance level of your data, with respect to the null hypothesis.

13 11.195%        (normal approximation)
   11.339%        (t-distribution with 99 degrees of freedom)

Only sample means above 822.67 would be at least this contradictory, so we
use the upper tail of the normal distribution.

14    3.268%      (normal approximation)
      3.422%      (t-distribution with 96 degrees of freedom)

F1    If the significance level is a "large" percentage, then you conclude that the
data does not provide meaningful evidence against the null hypothesis. You've
found that, even in a world where the statement is true, you'd frequently see
evidence "like" what you're seeing. Note that you don't conclude that the data
supports the null hypothesis - only that it doesn't provide much of a
case against it.

F2    If, on the other hand, the significance level is near 0, then you know that
either (1) you were very unlucky when you collected your data, and obtained
a very misrepresentative sample, or (2) the null hypothesis is false. Since
you don't expect to be very unlucky on a regular basis, your data, all by
itself, makes you very suspicious. You are entitled to say that your data
strongly contradicts the null hypothesis, and strongly supports the alternative
(the opposite of the null hypothesis).

You can develop a personal translation of the numerical significance level into
words expressing the "strength" of the evidence by playing through our
"coin-flipping" exercise.

13 only a little bit of evidence supporting the desired claim

14 strong (but not extremely strong) evidence supporting the desired claim

Univariate statistics
                                time
mean                            822.67
standard deviation              186.3936
standard error of the mean      18.63936

Regression: time
                      constant    sex         age         income
coefficient           65.259      32.4438     6.8591      0.077889
std error of coef     25.01844    11.23912    0.699278    0.004281

```
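Steps C through E can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the spreadsheet's own formulas, which the text does not show. The function names `significance_level` and `t_upper_tail` are made up for this sketch. The sample size of 100 for the Yttrium data is inferred from the ratio of the standard deviation to the standard error of the mean, and the 96 degrees of freedom for the regression assume four estimated coefficients. Each percentage is computed twice, once from the normal distribution and once from the t-distribution, which appears to be what the paired figures in the handout correspond to.

```python
import math
from statistics import NormalDist


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for Student's t with df degrees of freedom, t >= 0.

    Integrates the t density from 0 to t with Simpson's rule (steps must be
    even) and subtracts the result from the 0.5 of probability above zero.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = density(0.0) + density(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return 0.5 - total * h / 3


def significance_level(estimate, null_value, std_error, df=None):
    """One-sided significance level, following steps C through E.

    Assumes the estimate already lies on the side that contradicts the null
    hypothesis, as it does in all three examples. With df=None the normal
    approximation is used; otherwise Student's t with df degrees of freedom.
    """
    distance = abs(estimate - null_value) / std_error  # step D, in standard errors
    if df is None:
        return 1.0 - NormalDist().cdf(distance)
    return t_upper_tail(distance, df)


# Loan example: n = 64, sample mean 280, sample standard deviation 120.
loan_se = 120 / math.sqrt(64)
loan_normal = significance_level(280, 300, loan_se)
loan_t = significance_level(280, 300, loan_se, df=63)

# Yttrium 13: sample mean 822.67, standard error of the mean 18.63936.
# Assumption: n = 100, inferred from 186.3936 / 18.63936 = sqrt(n), so df = 99.
q13_normal = significance_level(822.67, 800, 18.63936)
q13_t = significance_level(822.67, 800, 18.63936, df=99)

# Yttrium 14: income coefficient 0.077889, standard error 0.004281.
# Assumption: df = 100 - 4 = 96 (constant, sex, age, income are estimated).
q14_normal = significance_level(0.077889, 0.07, 0.004281)
q14_t = significance_level(0.077889, 0.07, 0.004281, df=96)

for label, p in [
    ("loan (normal)", loan_normal), ("loan (t)", loan_t),
    ("13 (normal)", q13_normal), ("13 (t)", q13_t),
    ("14 (normal)", q14_normal), ("14 (t)", q14_t),
]:
    print(f"{label}: {p:.3%}")
```

The printed values can be compared against the percentages quoted under step E in each worked example. Passing `df` only matters when the sample is small; with 64 or 100 observations the two versions differ by a fraction of a percentage point, which is why the verbal interpretation in step F does not change.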