
Hypothesis Testing and p-values


Sporiš Goran, PhD.
http://kif.hr/predmet/mki
http://www.science4performance.com/
Overview

We will discuss two approaches to hypothesis testing:
1) Using confidence intervals
2) Using critical t or z values, or p-values
Hypothesis Test


 A statistical hypothesis is simply a claim
 about a population that can be put to
 the test by drawing a random sample.
Elements of a Hypothesis Test

 1. The null hypothesis, Ho: Specifies
  hypothesized values for one or more of
  the population parameters
 2. The alternative hypothesis, HA: A
  statement which says that the
  population parameter is something
  other than the value specified by the
  null hypothesis
Elements of a Hypothesis Test

3. The error level, α, which is just 1 minus the confidence level (expressed as a probability)
4. One-tailed versus two-tailed tests
Example #1

Suppose we want to know if there is a difference in the salaries for male and female professors. We might take two samples, one of men and one of women, to determine their respective mean salary levels. The calculated M1 and M2 are estimates of the population means, μ1 and μ2.

Ho: μ1 - μ2 = 0, or Ho: μ1 = μ2
HA: μ1 - μ2 ≠ 0, or HA: μ1 ≠ μ2
Example #1 (continued)

This is stated as a two-tailed test. If you believe that women make less than men, then the alternative hypothesis might be something like:

HA: μ1 - μ2 > 0
Example #2

As an employee of the Federal Trade Commission, you are vigilant in your stand against false or misleading advertising. A manufacturer of razor blades claims that its new blades give, on average, 15 good shaves. You conduct a small test by asking 10 randomly chosen men to each try one of these new razor blades. The average number of good shaves reported is 13 and the standard deviation is 3.62. The manufacturer claims that the true number of shaves (the population value) is 15, or:

Ho: μ = 15
Example #2 (continued)

If we want to challenge the manufacturer's claim, we might employ a one-tailed test, where the alternative hypothesis would be:
HA: μ < 15

Or, if we were agnostic, we could use a two-tailed test:
HA: μ ≠ 15
Example #3

A more general test, which we will see when we get to regression, is one where the null hypothesis sets a parameter equal to zero, and we want to know whether our parameters have a statistically significant effect, i.e., are different from zero.
For example, suppose a researcher wants to determine whether electoral rules are related to voter turnout. Call the impact of electoral rules on voter turnout β. A typical hypothesis test will be something like the following:

Ho: Electoral rules have no impact on voter turnout, or β = 0
HA: Electoral rules affect voter turnout, or β ≠ 0
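As a minimal sketch of this kind of test (not part of the original slides), the example below regresses a simulated turnout variable on a made-up electoral-rules variable and reports the p-value for Ho: β = 0. The variable names and data are hypothetical, and SciPy's linregress is assumed to be available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rules = rng.integers(1, 10, size=100).astype(float)      # hypothetical predictor
turnout = 50 + 2.0 * rules + rng.normal(0, 5, size=100)  # hypothetical outcome

# Simple regression of turnout on rules; the reported p-value tests
# Ho: beta = 0 against HA: beta != 0 (a two-tailed test).
result = stats.linregress(rules, turnout)
print(result.slope, result.pvalue)
```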
Testing Hypotheses: Confidence
Intervals
Let's start with the first example I gave, where we want to see if there is a difference in the mean salary level (in thousands of dollars) of male and female professors. Suppose that for men M1 = 16, n1 = 10, Σ(X1 - M1)² = 106, and for women M2 = 11, n2 = 5, Σ(X2 - M2)² = 40.
Testing Hypotheses: Confidence
Intervals
We calculate the confidence interval as:

(μ1 - μ2) = (M1 - M2) ± t(α/2) · sp · √(1/n1 + 1/n2)

We need to calculate the value of sp, the pooled standard deviation, where:

sp² = [Σ(X1 - M1)² + Σ(X2 - M2)²] / [(n1 - 1) + (n2 - 1)]
    = (106 + 40) / [(10 - 1) + (5 - 1)] = 146/13 = 11.2

thus sp = √11.2 = 3.35
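A quick check of this arithmetic in Python (a minimal sketch, not part of the slides; the numbers are the ones given above):

```python
# Pooled variance and standard deviation for the salary example.
ss_men, ss_women = 106, 40     # sums of squared deviations, Σ(X - M)²
n1, n2 = 10, 5
sp2 = (ss_men + ss_women) / ((n1 - 1) + (n2 - 1))
sp = sp2 ** 0.5
print(sp2, sp)                 # about 11.2 and 3.35
```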
Testing Hypotheses: Confidence
Intervals
We plug this back into our confidence interval to obtain:

(μ1 - μ2) = (16 - 11) ± 2.16 · 3.35 · √(1/10 + 1/5)
          = 5 ± 4

Note: 2.16 is the critical value of t for 95% confidence with 13 df (for a two-tailed test).
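The same interval can be reproduced with SciPy (a minimal sketch under the assumptions above; stats.t.ppf supplies the 2.16 critical value):

```python
from scipy import stats

M1, M2, n1, n2, sp = 16, 11, 10, 5, 3.35
se = sp * (1 / n1 + 1 / n2) ** 0.5           # standard error of the difference
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)  # about 2.16 for 13 df, two-tailed 95%
lower = (M1 - M2) - t_crit * se
upper = (M1 - M2) + t_crit * se
print(lower, upper)                          # roughly 1 to 9; zero is not inside
```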
Testing Hypotheses: Confidence
Intervals
With 95% confidence, the difference between our means is estimated to be between 1 and 9. Since zero is not contained in the interval, the claim that there is no difference cannot be accepted, i.e., we can reject the null hypothesis.
In general, any hypothesis that lies outside the confidence interval may be rejected. Thus the confidence interval may be regarded as the set of acceptable hypotheses.
Testing Hypotheses: p-values

Let's go back to our one-tailed test by the FTC employee, who wants to determine if the razor blade manufacturer's claim of 15 good shaves is valid.

Ho: μ = 15
HA: μ < 15
Testing Hypotheses: p-values

A p-value is just the probability, if Ho is true, of observing a sample value at least as extreme as the one actually observed. In other words, the p-value summarizes how much agreement there is between the data and the null hypothesis. In this case, the null is that the razors give 15 good shaves.
Testing Hypotheses: p-values

We start by calculating the t or z statistic associated with our observed value. We would use t in this problem because the sample size is small (N = 10) and the population standard deviation σ is unknown (when the sample size is large, t and z are nearly equivalent).
Testing Hypotheses: p-values
t = (M - μo) / (s/√N) = (13 - 15) / (3.62/√10) = -1.74

We can think of t as:

t = (estimate - null hypothesis value) / standard error

If the null hypothesis value is zero, then t = estimate / standard error. In this case, the t ratio simply measures the size of the estimate relative to its standard error.
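A minimal sketch of that calculation in Python, using the summary numbers from the razor-blade example:

```python
M, mu0 = 13, 15        # sample mean and hypothesized population mean
s, n = 3.62, 10        # sample standard deviation and sample size
se = s / n ** 0.5      # standard error of the mean
t = (M - mu0) / se
print(t)               # about -1.75 (rounded to -1.74 on the slide)
```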
Testing Hypotheses: p-values
Now we want to find the area beyond that value of t, which gives us the p-value. In this problem, t = -1.74.
To find the p-value, we need to take into account our degrees of freedom, which here is df = n - 1 = 9.
We go to the t-table and look to see where our calculated t falls relative to the cutoff values for various probability levels. Our value of t (1.74 in absolute value) falls between the t values of 1.38 (p = .10) and 1.83 (p = .05). Thus we can say that our p-value is between .05 and .10.
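Statistical software can give the exact area rather than the table's bracket. A minimal sketch with SciPy, assuming the t and df computed above:

```python
from scipy import stats

t_stat, df = -1.74, 9
p = stats.t.cdf(t_stat, df)   # one-tailed area below t = -1.74 with 9 df
print(p)                      # roughly 0.06, i.e. between .05 and .10
```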
Comparing Our t/z Statistic or p-value with a Critical Value

A classical hypothesis test consists of setting a critical value, which gives us the reject and accept regions. For example, for a one-tailed test with 95% confidence (α = .05), we use a value of z = 1.64 as our critical value, or for a two-tailed test, we use z = 1.96.
Comparison (continued)
We reject the null hypothesis if our calculated t or z is beyond the critical t or z, or if the p-value is ≤ α.
In the above example, the 95% critical t value for 9 df (one-tailed) is 1.83. Since our calculated t does not exceed the critical t (i.e., does not fall in the reject region), we must accept the null hypothesis, the manufacturer's claim of 15 good shaves. Also, our p-value is larger than α, which is .05.
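The sketch below ties these pieces together for the razor-blade example, using SciPy to look up the critical values quoted above (the t statistic and df are those assumed earlier):

```python
from scipy import stats

alpha = 0.05
# Critical z values from the previous slide
z_one_tailed = stats.norm.ppf(1 - alpha)      # about 1.64
z_two_tailed = stats.norm.ppf(1 - alpha / 2)  # about 1.96

# Decision for the razor-blade example: one-tailed t test with 9 df
t_stat, df = -1.74, 9
t_crit = stats.t.ppf(alpha, df)               # about -1.83
p = stats.t.cdf(t_stat, df)                   # about 0.06
reject = (t_stat <= t_crit) or (p <= alpha)
print(reject)                                 # False: we cannot reject Ho
```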
The Critical Region

If our calculated value falls in the critical region, there are two ways to think about it:
1) Ho is true, but we have been exceedingly unlucky and drawn a very improbable sample; or
2) Ho is not true after all, so it is no surprise that our observed value was so high or low.
The Critical Region

Consider the difference we calculated between male and female professors' salaries: if the true difference is really large, then we would expect our statistic to land in the tails, very far away from the center of the distribution, where the difference is zero.
Type I and Type II Errors

Choosing an alpha level is tricky because it sets the threshold at which we will reject the null hypothesis: the higher this value is, the greater the chance that we will falsely reject a true Ho.
Type I and Type II Errors
State of the World    Ho Accepted           Ho Rejected
Ho is true            Correct decision      Type I error
                      Pr = 1 - α            Pr = α
Ho is false           Type II error         Correct decision
                      Pr = β                Pr = 1 - β (power of the test)
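As an illustration of the first row (not part of the original slides), the simulation below repeatedly samples from a population where Ho really is true and counts how often a two-sided t test at α = .05 rejects; the rejection rate comes out close to α. The population values are borrowed from the razor-blade example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, reps, n = 0.05, 10_000, 10
rejections = 0
for _ in range(reps):
    sample = rng.normal(loc=15, scale=3.62, size=n)    # Ho is true: mu = 15
    t_stat, p = stats.ttest_1samp(sample, popmean=15)  # two-sided test
    rejections += p <= alpha
print(rejections / reps)   # close to alpha, the Type I error rate
```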
Type I and Type II Errors

To give you an analogy, in a court of law
  we assume people are innocent (the
  null hypothesis) until proven guilty.
A Type I error would be finding an
  innocent man guilty.
A Type II error would be letting a guilty
  man go free.
Which is worse?
Type I and Type II Errors

By decreasing our alpha level, we will increase the chance of a Type II error (accepting the null when it is really false), because we make the criterion for rejection more stringent.
The only way that the Type I error rate can be reduced without increasing the probability of a Type II error is by increasing the sample size, as the sketch below illustrates.
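A minimal simulation sketch of that trade-off (not from the slides; it reuses the razor-blade numbers and assumes the true mean is 13 while Ho claims 15, with a two-sided test for simplicity): holding α at .05, the estimated Type II error rate β falls as the sample size grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, reps = 0.05, 5_000

def type2_rate(n, true_mu=13.0, mu0=15.0, sd=3.62):
    """Share of samples in which a false Ho (mu = mu0) is NOT rejected, i.e. beta."""
    misses = 0
    for _ in range(reps):
        sample = rng.normal(true_mu, sd, size=n)
        _, p = stats.ttest_1samp(sample, popmean=mu0)
        misses += p > alpha
    return misses / reps

# With alpha fixed at .05, beta shrinks (power grows) as n increases.
print(type2_rate(10), type2_rate(40))
```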

								