Two Independent Samples

Statistics 371, Fall 2004

Pooled Standard Error
If we wish to assume that the two population standard deviations
are equal, σ1 = σ2, then it makes sense to use data from both
samples to estimate the common population standard deviation.

We estimate the common population variance with a weighted
average of the sample variances, weighted by the degrees of
freedom.

$$ s_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$

The pooled standard error is then as below.

$$ SE_{pooled} = s_{pooled} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$
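The two pooled formulas can be tried out on summary statistics. The sketch below is only an illustration: the helper name pooledSE is hypothetical, and the numbers are the summary statistics from the Wisconsin Fast Plants example in this handout (sd 4.8 with n = 8, sd 4.7 with n = 7).

```r
# Pooled standard error from summary statistics (illustrative sketch).
# pooledSE is a hypothetical helper name, not part of the course code.
pooledSE = function(s1, n1, s2, n2) {
    # weighted average of the sample variances, weights = degrees of freedom
    s2pooled = ((n1 - 1) * s1^2 + (n2 - 1) * s2^2)/(n1 + n2 - 2)
    sqrt(s2pooled) * sqrt(1/n1 + 1/n2)
}

# Summary statistics taken from the Fast Plants example
sePooled = pooledSE(4.8, 8, 4.7, 7)
# Unpooled standard error for the same data, for comparison
seUnpooled = sqrt(4.8^2/8 + 4.7^2/7)
```

Both values come out near 2.46 cm here because the two sample standard deviations and the two sample sizes are nearly equal. When the sample sizes are exactly equal, the pooled and unpooled standard errors agree.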

Two Independent Samples
Bret Larget
Department of Statistics
University of Wisconsin - Madison
October 18, 2004

Comparing Two Groups
  • Chapter 7 describes two ways to compare two populations
    on the basis of independent samples: a confidence interval
    for the difference in population means and a hypothesis test.
  • The basic structure of the confidence interval is the same
    as in the previous chapter: an estimate plus or minus a
    multiple of a standard error.
  • Hypothesis testing will introduce several new concepts.

Sampling Distributions
The sampling distribution of the difference in sample means has
these characteristics.

  • Mean: µ1 − µ2
  • SD: $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
  • Shape: Exactly normal if both populations are normal,
    approximately normal if populations are not normal but both
    sample sizes are sufficiently large.
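A small simulation can make the mean and SD of this sampling distribution concrete. This is a sketch under stated assumptions: the population means, standard deviations, and sample sizes below are made up for illustration and do not come from the course data.

```r
# Simulation sketch: sampling distribution of the difference in sample means.
# All population values below are hypothetical, chosen only for illustration.
set.seed(371)
mu1 = 10; sigma1 = 3; n1 = 12
mu2 = 8;  sigma2 = 2; n2 = 15

# Draw many pairs of independent samples and record the difference in means
diffs = replicate(10000,
    mean(rnorm(n1, mu1, sigma1)) - mean(rnorm(n2, mu2, sigma2)))

simMean = mean(diffs)                        # should be near mu1 - mu2
simSD = sd(diffs)                            # should be near theorySD
theorySD = sqrt(sigma1^2/n1 + sigma2^2/n2)   # SD formula from the slide
```

With normal populations, a histogram of diffs (for example, hist(diffs)) should also look bell-shaped, in line with the shape statement.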

Setting
  • Model two populations as buckets of numbered balls.
  • The population means are µ1 and µ2, respectively.
  • The population standard deviations are σ1 and σ2, respectively.
  • We are interested in estimating µ1 − µ2 and in testing the
    hypothesis that µ1 = µ2.

[Figure: two populations. Population 1 has mean µ1 and sd σ1; its sample $y_1^{(1)}, \ldots, y_{n_1}^{(1)}$ has mean ȳ1 and sd s1. Population 2 has mean µ2 and sd σ2; its sample $y_1^{(2)}, \ldots, y_{n_2}^{(2)}$ has mean ȳ2 and sd s2.]

Standard Error of ȳ1 − ȳ2
The standard error of the difference in two sample means is an
empirical measure of how far the difference in sample means
will typically be from the difference in the respective population
means.

$$ SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

An alternative formula is

$$ SE(\bar{y}_1 - \bar{y}_2) = \sqrt{(SE(\bar{y}_1))^2 + (SE(\bar{y}_2))^2} $$

This formula reminds us of how to find the length of the
hypotenuse of a right triangle.

(Variances add, but standard deviations don't.)

Theory for Confidence Interval
The recipe for constructing a confidence interval for a single
population mean is based on facts about the sampling distribution
of the statistic

$$ T = \frac{\bar{Y} - \mu}{SE(\bar{Y})}. $$

Similarly, the theory for confidence intervals for µ1 − µ2 is based
on the sampling distribution of the statistic

$$ T = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{SE(\bar{Y}_1 - \bar{Y}_2)} $$

where we standardize by subtracting the mean and dividing by
the standard deviation of the sampling distribution.

If both populations are normal and if we know the population
standard deviations, then

$$ \Pr\left\{ -1.96 \le \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \le 1.96 \right\} = 0.95 $$

where we can choose z other than 1.96 for different confidence
levels. This statement is true because the expression in the
middle has a standard normal distribution.

But in practice, we don't know the population standard deviations.
If we substitute in sample estimates instead, we get this.

$$ \Pr\left\{ -t \le \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \le t \right\} = 0.95 $$

We need to choose different end points to account for the
additional randomness in the denominator.
Theory for Confidence Interval
It turns out that the sampling distribution of the statistic above
is approximately a t distribution where the degrees of freedom
should be estimated from the data as well.

Algebraic manipulation leads to the following expression.

$$ \Pr\left\{ (\bar{Y}_1 - \bar{Y}_2) - t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{Y}_1 - \bar{Y}_2) + t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \right\} = 0.95 $$

We use a t multiplier so that the area between −t and t under
a t distribution with the estimated degrees of freedom will be
0.95.

Example Using R
Exercise 7.21

This exercise examines the growth of bean plants under red and
green light. A 95% confidence interval is part of the output.

> ex7.21 = read.table("lights.txt", header = TRUE)
> str(ex7.21)
'data.frame':        42 obs. of 2 variables:
 $ height: num 8.4 8.4 10 8.8 7.1 9.4 8.8 4.3 9 8.4 ...
 $ color : Factor w/ 2 levels "green","red": 2 2 2 2 2 2 2 2 2 2 ...
> attach(ex7.21)
> t.test(height ~ color)

        Welch Two Sample t-test

data: height by color
t = 1.1432, df = 38.019, p-value = 0.2601
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4479687 1.6103216
sample estimates:
mean in group green   mean in group red
           8.940000            8.358824
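To see where the interval reported by t.test() comes from, it can be rebuilt by hand from the group means, standard deviations, and sample sizes. The sketch below is illustrative only: the data are simulated stand-ins (the lights.txt file is not reproduced here), the group sizes and population values are assumptions, and the degrees of freedom use the df formula given in this handout.

```r
# Sketch: rebuild the Welch interval by hand and compare with t.test().
# The data are simulated stand-ins; they are NOT the lights.txt measurements.
set.seed(721)
plants = data.frame(
    height = c(rnorm(25, mean = 9, sd = 1.5), rnorm(17, mean = 8.4, sd = 1.8)),
    color = factor(rep(c("green", "red"), times = c(25, 17)))
)

fit = t.test(height ~ color, data = plants)

# Standard error of each group mean
g = plants$height[plants$color == "green"]
r = plants$height[plants$color == "red"]
se1 = sd(g)/sqrt(length(g))
se2 = sd(r)/sqrt(length(r))

# Standard error of the difference, estimated df, and t multiplier
se = sqrt(se1^2 + se2^2)
df = (se1^2 + se2^2)^2/(se1^4/(length(g) - 1) + se2^4/(length(r) - 1))
tmult = qt(0.975, df)

# (Estimate) +/- (t Multiplier) x SE
byHand = (mean(g) - mean(r)) + c(-1, 1) * tmult * se
```

Comparing byHand with fit$conf.int, and df with fit$parameter, should show agreement, since t.test() without var.equal = TRUE uses the same unpooled standard error and estimated degrees of freedom.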

Example Assuming Equal Variances
For the same data, if we assume that the population variances
are equal, then the degrees of freedom, the standard error,
and the confidence interval are all slightly different.

> t.test(height ~ color, var.equal = TRUE)

        Two Sample t-test

data: height by color
t = 1.1064, df = 40, p-value = 0.2752
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4804523 1.6428053
sample estimates:
mean in group green   mean in group red
           8.940000            8.358824

Confidence Interval for µ1 − µ2
The confidence interval for differences in population means has
the same structure as that for a single population mean.

(Estimate) ± (t Multiplier) × SE

The only difference is that for this more complicated setting, we
have more complicated formulas for the standard error and the
degrees of freedom.

Here is the df formula.

$$ df = \frac{(SE_1^2 + SE_2^2)^2}{SE_1^4/(n_1 - 1) + SE_2^4/(n_2 - 1)} $$

where $SE_i = s_i/\sqrt{n_i}$ for i = 1, 2.

As a check, the value is often close to n1 + n2 − 2. (This will
be exact if s1 = s2 and if n1 = n2.) The value from the messy
formula will always be between the smaller of n1 − 1 and n2 − 1
and n1 + n2 − 2.
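The two checks on the df formula can be verified numerically. This is a minimal sketch: welchDF is a hypothetical helper name, and the standard deviations and sample sizes are made-up values chosen to show the two extremes.

```r
# Sketch of the df formula and the checks stated above.
# welchDF is a hypothetical helper name used only for this illustration.
welchDF = function(s1, n1, s2, n2) {
    se1 = s1/sqrt(n1)
    se2 = s2/sqrt(n2)
    (se1^2 + se2^2)^2/(se1^4/(n1 - 1) + se2^4/(n2 - 1))
}

# Equal sample SDs and equal sample sizes: df equals n1 + n2 - 2 = 18
dfEqual = welchDF(2, 10, 2, 10)

# Made-up unequal case: the small sample has the much larger SD,
# so df lands close to the lower bound min(n1, n2) - 1 = 4
dfUnequal = welchDF(6, 5, 1, 30)
```

The unequal case shows why the estimated degrees of freedom matter: when one small, highly variable sample dominates the standard error, the t multiplier should be taken from a distribution with few degrees of freedom.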

Example
Exercise 7.12

In this example, subjects with high blood pressure are randomly
allocated to two treatments. The biofeedback group receives
relaxation training aided by biofeedback and meditation over
eight weeks. The control group does not. Reduction in systolic
blood pressure is tabulated here.

          Biofeedback   Control
    n              99        93
    ȳ            13.8       4.0
    SE           1.34      1.30

For 190 degrees of freedom (which come from both the simple
and messy formulas) the table says to use 1.977 (rounding 190
down to 140 df in the table) whereas with R you find 1.973.

A calculator or R can compute the margin of error.

> se = sqrt(1.34^2 + 1.3^2)
> tmult = qt(0.975, 190)
> me = round(tmult * se, 1)
> se
[1] 1.866976
> tmult
[1] 1.972528
> me
[1] 3.7

     We are 95% confident that the mean reduction in systolic
     blood pressure due to the biofeedback treatment in a
     population of similar individuals to those in this study
     would be between 6.1 and 13.5 mm more than the mean
     reduction in the same population undergoing the control
     treatment.

Hypothesis Tests
  • Hypothesis tests are an alternative approach to statistical
    inference.
  • Unlike confidence intervals where the goal is estimation with
    assessment of likely precision of the estimate, the goal of
    hypothesis testing is to ascertain whether or not data is
    consistent with what we might expect to see assuming that
    a hypothesis is true.
  • The logic of hypothesis testing is a probabilistic form of proof
    by contradiction.
  • In logic, if we can say that a proposition H leads to a
    contradiction, then we have proved H false and have proved
    not H to be true.
  • In hypothesis testing, if observed data is highly unlikely under
    an assumed hypothesis H, then there is strong (but not
    definitive) evidence that the hypothesis is false.

Logic of Hypothesis Tests
All of the hypothesis tests we will see this semester fall into this
general framework.

 1. State a null hypothesis and an alternative hypothesis.

 2. Gather data and compute a test statistic.

 3. Consider the sampling distribution of the test statistic
    assuming that the null hypothesis is true.

 4. Compute a p-value, a measure of how consistent the data
    is with the null hypothesis in consideration of a specific
    alternative hypothesis.
Logic of Hypothesis Tests
 5. Assess the strength of the evidence against the null hypothesis
    in the context of the problem.

We will introduce all of these concepts in the setting of testing
the equality of two population means, but the general ideas
will reappear in many settings throughout the remainder of the
course.

Wisconsin Fast Plants Example
  • In an experiment, seven Wisconsin Fast Plants (Brassica
    campestris) were grown with a treatment of Ancymidol
    (ancy) and eight control plants were given ordinary water.
  • The null hypothesis is that the treatment has no effect on
    plant growth (as measured by the height of the plant after
    14 days of growth).
  • The alternative hypothesis is that the treatment has an effect
    which would result in different mean growth amounts.
  • A summary of the sample data is as follows. The eight
    control plants had a mean growth of 15.9 cm and standard
    deviation 4.8 cm. The seven ancy plants had a mean growth
    of 11.0 cm and standard deviation 4.7 cm.
  • The question is, is it reasonable to think that the observed
    difference in sample means of 4.9 cm is due to chance
    variation alone, or is there evidence that some of the
    difference is due to the ancy treatment?

Example: State Hypotheses
Let µ1 be the population mean growth with the control conditions
and let µ2 be the population mean with ancy.

The null and alternative hypotheses are expressed as

H0 : µ1 = µ2      HA : µ1 ≠ µ2

We state statistical hypotheses as statements about population
parameters.

Example: Calculate a Test Statistic
In the setting of a difference between two independent sample
means, our test statistic is

$$ t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

(Your book adds a subscript, ts, to remind you that this is
computed from the sample.)

For the data, we find this.

> se = sqrt(4.8^2/8 + 4.7^2/7)
> se
[1] 2.456769
> ts = (15.9 - 11)/se
> ts
[1] 1.994489

The standard error tells us that we would expect that the
observed difference in sample means would typically differ from
the difference in population means by about 2.5 cm.

If the population means are equal, their difference is zero. This
test statistic tells us that the actual observed difference in sample
means is 1.99 standard errors away from zero.

Example: Find the Sampling Distribution
The sampling distribution of the test statistic is a t distribution
with degrees of freedom calculated by the messy formula. This
useful R code computes it. If you type this in and save your
workspace at the end of a session, you can use it again in the future.

> getDF = function(s1, n1, s2, n2) {
+     se1 = s1/sqrt(n1)
+     se2 = s2/sqrt(n2)
+     return((se1^2 + se2^2)^2/(se1^4/(n1 - 1) + se2^4/(n2 - 1)))
+ }
> getDF(4.8, 8, 4.7, 7)
[1] 12.80635

Example: Compute a P-Value
To describe how likely it is to see such a test statistic, we can
ask what is the probability that chance alone would result in a
test statistic at least this far from zero? The answer is the area
below −1.99 and above 1.99 under a t density curve with 12.8
degrees of freedom.

With the t-table, we can only calculate this p-value within a
range. If we round down to 12 df, the t statistic is bracketed
between 1.912 and 2.076 in the table. Thus, the area to the
right of 1.99 is between 0.03 and 0.04. The p-value in this
problem is twice as large because we need to include as well the
area to the left of −1.99. So, 0.06 < p < 0.08.

With R, we can be more precise.

> p = 2 * pt(-ts, getDF(4.8, 8, 4.7, 7))
> p
[1] 0.06783269

Example: Interpreting a P-Value
The smaller the p-value, the more inconsistent the data is with
the null hypothesis and the stronger the evidence is against the
null hypothesis in favor of the alternative.

Traditionally, people have measured statistical significance by
comparing a p-value with arbitrary significance levels such as
α = 0.05. The phrase “statistically significant at the 5% level”
means that the p-value is smaller than 0.05.

In reporting results, it is best to report an actual p-value and
not simply a statement about whether or not it is “statistically
significant”.
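The steps of the Fast Plants example (standard error, test statistic, degrees of freedom, two-sided p-value) can be collected into one helper that works from summary statistics. This is a sketch, not code from the course: the name welchFromSummary is hypothetical, and the last line simply re-derives the table bracket using the 12 df table values quoted in this handout.

```r
# Sketch: the Fast Plants test steps collected into one helper.
# welchFromSummary is a hypothetical name, not a function from R or the course.
welchFromSummary = function(m1, s1, n1, m2, s2, n2) {
    se1 = s1/sqrt(n1)
    se2 = s2/sqrt(n2)
    se = sqrt(se1^2 + se2^2)
    ts = (m1 - m2)/se                  # test statistic when H0: mu1 = mu2
    df = (se1^2 + se2^2)^2/(se1^4/(n1 - 1) + se2^4/(n2 - 1))
    p = 2 * pt(-abs(ts), df)           # two-sided p-value
    list(ts = ts, df = df, p = p)
}

# Control: mean 15.9, sd 4.8, n = 8; ancy: mean 11.0, sd 4.7, n = 7
res = welchFromSummary(15.9, 4.8, 8, 11.0, 4.7, 7)

# Two-sided areas for the table values 2.076 and 1.912 with 12 df
bracket = 2 * pt(-c(2.076, 1.912), 12)
```

Here res$ts, res$df, and res$p should match the values 1.994489, 12.80635, and 0.06783269 computed step by step in the example, and res$p should fall inside the range given by bracket.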
Example: Summarizing the Results
For this example, I might summarize the results as follows.

     There is slight evidence (p = 0.068, two-sided independent
     sample t-test) that there is a difference in the mean
     height at 14 days between Wisconsin Fast Plants grown
     with ordinary water and those grown with Ancymidol.

Generally speaking, a confidence interval is more informative
than a p-value because it estimates a difference in the units of
the problem, which allows the reader with background knowledge
in the subject area to assess both the statistical significance and
the practical importance of the observed difference. In contrast,
a hypothesis test examines statistical significance alone.

Type I and Type II Errors
There are two possible decision errors.

  • Rejecting a true null hypothesis is a Type I error.
  • You can interpret α = Pr {rejecting H0 | H0 is true}, so α is
    the probability of a Type I error. (You cannot make a Type I
    error when the null hypothesis is false.)
  • Not rejecting a false null hypothesis is a Type II error.
  • It is convention to use β as the probability of a Type II
    error, or β = Pr {not rejecting H0 | H0 is false}. If the null
    hypothesis is false, one of the many possible alternative
    hypotheses is true. It is typical to calculate β separately
    for each possible alternative. (In this setting, for each value
    of µ1 − µ2.)
  • Power is the probability of rejecting a false null hypothesis.
    Power = 1 − β.

Rejection Regions
Suppose that we were asked to make a decision about a
hypothesis based on data. We may decide, for example, to reject
the null hypothesis if the p-value were smaller than 0.05 and to
not reject the null hypothesis if the p-value were larger than 0.05.

This procedure has a significance level of 0.05, which means that
if we follow the rule, there is a probability of 0.05 of rejecting
a true null hypothesis. (We would need further assumptions to
calculate the probability of not rejecting a false null hypothesis.)

Rejecting the null hypothesis occurs precisely when the test
statistic falls into a rejection region, in this case either the upper
or lower 2.5% tail of the sampling distribution.

More on P-Values
Another way to think about p-values is to recognize that they
depend on the values of the data, and so are random variables.
Let P be the p-value from a test.

  • If the null hypothesis is true, then P is a random variable
    distributed uniformly between 0 and 1.
  • In other words, the probability density of P is a flat rectangle.
  • Notice that this implies that Pr {P < c} = c for any number c
    between 0 and 1. If the null is true, there is a 5% probability
    that P is less than 0.05, a 1% probability that P is less than
    0.01, and so on.
  • On the other hand, if the alternative hypothesis is true, then
    the distribution of P will not be uniform and instead will
    be shifted toward zero.
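The claims about the distribution of P can be checked with a short simulation. The following is only a sketch under stated assumptions: normal populations, sample sizes 8 and 7 (as in the Fast Plants example), a common standard deviation of 4.8, and a difference in means of 0 (null) or 5 (alternative). The function name sim_p and the seed are illustrative choices, not part of the course materials.

```r
set.seed(371)  # arbitrary seed so that the run is reproducible

# Simulate one two-sample experiment and return the two-sided p-value.
# n1, n2, and sigma are assumptions modeled on the Fast Plants example.
sim_p <- function(delta, n1 = 8, n2 = 7, sigma = 4.8) {
  x <- rnorm(n1, mean = delta, sd = sigma)
  y <- rnorm(n2, mean = 0, sd = sigma)
  t.test(x, y)$p.value  # Welch t-test: does not assume equal SDs
}

B <- 5000
p_null <- replicate(B, sim_p(delta = 0))
p_alt <- replicate(B, sim_p(delta = 5))

# Under the null, Pr{P < c} should be roughly c.
cutoffs <- c(0.01, 0.05, 0.25, 0.50)
null_rates <- sapply(cutoffs, function(c0) mean(p_null < c0))
print(rbind(cutoffs, null_rates))

# Under the alternative, the share of p-values below 0.05 estimates the power.
type1_rate <- mean(p_null < 0.05)
power_est <- mean(p_alt < 0.05)
print(c(type1_rate = type1_rate, power_est = power_est))
```

Calling hist(p_null) and hist(p_alt) afterwards should show a roughly flat histogram under the null and one piled up near zero under the alternative. Because the t distribution is only an approximation for the Welch statistic, the null rates are expected to be close to, but not exactly, the cutoffs.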

We can explore these statements with a simulation based on the
Wisconsin Fast Plants example. The first histogram shows p-values
from 10,000 samples where µ1 − µ2 = 0 while the second
assumes that µ1 − µ2 = 5. Both simulations use σ1 = σ2 = 4.8
but the calculation of the p-value does not use this information.

[Figure: two histograms of simulated p-values, titled "Sampling Dist of P under Null" and "Sampling Dist of P under Alt."; horizontal axis: P-value, from 0.0 to 1.0]

Relationship between t tests and confidence intervals
The rejection region corresponds exactly to the test statistics for
which a 95% confidence interval does not contain 0.

     We would reject the null hypothesis H0 : µ1 − µ2 = 0
     versus the two-sided alternative at the α = 0.05 level of
     significance if and only if a 95% confidence interval for
     µ1 − µ2 does not contain 0.

We could make similar statements for general α and a (1 − α) ×
100% confidence interval.
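The equivalence between the two-sided test and the confidence interval can be illustrated with the summary statistics used in the Fast Plants calculation (means 15.9 and 11, standard deviations 4.8 and 4.7, sample sizes 8 and 7). This is a sketch that assumes the Welch-Satterthwaite degrees of freedom; all variable names are illustrative.

```r
# Summary statistics from the Fast Plants example.
diff_means <- 15.9 - 11
v1 <- 4.8^2 / 8  # estimated variance of the first sample mean
v2 <- 4.7^2 / 7  # estimated variance of the second sample mean
se <- sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (an assumption about the method used).
degf <- (v1 + v2)^2 / (v1^2 / (8 - 1) + v2^2 / (7 - 1))

alpha <- 0.05
ts <- diff_means / se
p_value <- 2 * (1 - pt(abs(ts), degf))  # two-sided p-value

# (1 - alpha) x 100% confidence interval for mu1 - mu2.
t_crit <- qt(1 - alpha / 2, degf)
ci <- diff_means + c(-1, 1) * t_crit * se

reject <- p_value < alpha
ci_excludes_zero <- ci[1] > 0 || ci[2] < 0

print(ci)
print(c(p_value = p_value, reject = reject, ci_excludes_zero = ci_excludes_zero))
```

Here the interval contains 0 and the test does not reject at the 5% level, so the two procedures agree. Changing alpha changes both the decision rule and the interval width in step, which is why the agreement holds for any α.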

Comparing α and P-values
  • In this setting, the significance level α and p-values are both
    areas under t curves, but they are not the same thing.
  • The significance level is a prespecified, arbitrary value that
    does not depend on the data.
  • The p-value depends on the data.
  • If a decision rule is to reject the null hypothesis when the
    test statistic is in a rejection region, this is equivalent to
    rejecting the null hypothesis when the p-value is less than
    the significance level α.

More P-value Interpretations
A verbal definition of a p-value is as follows.

     The p-value of the data is the probability, calculated
     assuming that the null hypothesis is true, of obtaining a
     test statistic that deviates from what is expected under
     the null (in the direction of the alternative hypothesis)
     at least as much as the actual data does.

The p-value is not the probability that the null hypothesis is true.
Interpreting the p-value in this way will mislead you!

Example for P-value Interpretation
In a medical testing setting, we may want a procedure that
indicates when a subject has a disease. We can think of the
decision healthy as corresponding to a null hypothesis and the
decision ill as corresponding to the alternative hypothesis.

Consider now a situation where 1% of a population has a disease.
Suppose that a test has an 80% chance of detecting the disease
when a person has the disease (so the power of the test is 80%)
and that the test has a 95% chance of correctly saying the person
does not have the disease when the person does not (so there is a
5% chance of a false positive, or falsely rejecting the null).

Example (cont.)
Here is a table of the results in a hypothetical population of
100,000 people.

                                        True Situation
                                    Healthy          Ill
  Test Result                     (H0 is true)  (H0 is false)     Total
  Negative (do not reject H0)        94,050          200         94,250
  Positive (reject H0)                4,950          800          5,750
  Total                              99,000        1,000        100,000

Notice that of the 5,750 times H0 was rejected (so that the
test indicated illness), the person was actually healthy
4950/5750 = 86% of the time!

A rule that rejects H0 when the p-value is less than 5% only
rejects 5% of the true null hypotheses, but this can be a large
proportion of the total number of rejected hypotheses when the
false null hypotheses occur rarely.

Exercise 7.54
Calculate a test statistic.

> ts = (31.96 - 25.32)/sqrt(12.05^2/25 + 13.78^2/25)
> ts
[1] 1.813664

Find the null sampling distribution.

The book reports a t distribution with 47.2 degrees of freedom.
We can check this.

> degf = getDF(12.05, 25, 13.78, 25)
> degf
[1] 47.16131

Compute a (one-sided) p-value.

> p = 1 - pt(ts, degf)
> p
[1] 0.03804753

Exercise 7.54
Summarize the results.

     There is fairly strong evidence that the drug would
     provide more pain relief than the placebo on average
     for a population of women similar to those in this study
     (p = 0.038, one-sided independent sample t-test).

Notice that this result is “statistically significant at the 5% level”
because the p-value is less than 0.05.

For a two-sided test, the p-value would be twice as large, and
not significant at the 5% level.
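The function getDF used in Exercise 7.54 is not part of base R and is presumably a helper supplied for the course. A minimal stand-in, assuming it implements the Welch-Satterthwaite approximation with arguments in the order (s1, n1, s2, n2), might look like the sketch below. The name welch_df is hypothetical.

```r
# Hypothetical stand-in for the course helper getDF(); the argument order
# (s1, n1, s2, n2) is assumed from the call getDF(12.05, 25, 13.78, 25).
welch_df <- function(s1, n1, s2, n2) {
  v1 <- s1^2 / n1  # estimated variance of the first sample mean
  v2 <- s2^2 / n2  # estimated variance of the second sample mean
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

ts <- (31.96 - 25.32) / sqrt(12.05^2 / 25 + 13.78^2 / 25)
degf <- welch_df(12.05, 25, 13.78, 25)

p_one_sided <- 1 - pt(ts, degf)             # area to the right of ts
p_two_sided <- 2 * (1 - pt(abs(ts), degf))  # area in both tails

print(c(degf = degf, one_sided = p_one_sided, two_sided = p_two_sided))
```

Under these assumptions the sketch should reproduce the degrees of freedom and the one-sided p-value shown in the exercise, and it makes explicit that the two-sided p-value is twice the one-sided value when the test statistic falls in the direction of the alternative.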

One-tailed Tests
  • Often, we are interested not only in demonstrating that two
    population means are different, but in demonstrating that
    the difference is in a particular direction.
  • Instead of the two-sided alternative µ1 ≠ µ2, we would
    choose one of two possible one-sided alternatives, µ1 < µ2 or
    µ1 > µ2.
  • For the alternative hypothesis HA : µ1 < µ2, the p-value is
    the area to the left of the test statistic.
  • For the alternative hypothesis HA : µ1 > µ2, the p-value is
    the area to the right of the test statistic.
  • If the test statistic is in the direction of the alternative
    hypothesis, the p-value from a one-sided test will be half
    the p-value of a two-sided test.

Exercise 7.54
The following data come from an experiment to test the efficacy
of a drug to reduce pain in women after childbirth. Possible pain
relief scores vary from 0 (no relief) to 56 (complete relief).

                    Pain Relief Score
  Treatment      n      mean       sd
  Drug          25     31.96    12.05
  Placebo       25     25.32    13.78

State hypotheses.

Let µ1 be the population mean score for the drug and µ2 be
the population mean score for the placebo.

     H0 : µ1 = µ2       HA : µ1 > µ2

Validity of t Methods
  • All of the methods seen so far are formally based on the
    assumption that populations are normal.
  • In practice, they are valid as long as the sampling distribution
    of the difference in sample means is approximately normal,
    which occurs when the sample sizes are large enough
    (justified by the Central Limit Theorem).
  • Specifically, we need the sampling distribution of the test
    statistic to have an approximate t distribution.
  • But what if the sample sizes are small and the samples
    indicate non-normality in the populations?
  • One approach is to transform the data, often by taking
    logarithms, so that the transformed distribution is approximately
    normal.
  • The textbook suggests a nonparametric method called the
    Wilcoxon-Mann-Whitney test that is based on converting the
    data to ranks.
  • I will show an alternative called a permutation test.

Permutation Tests
  • The idea of a permutation test in this setting is quite
    straightforward.
  • We begin by computing the difference in sample means for
    the two samples of sizes n1 and n2.
  • Now, imagine taking the group labels and mixing them up
    (permuting them) and then assigning them at random to the
    observations. We could then again calculate a difference in
    sample means.
  • Next, imagine doing this process over and over and collecting
    the permutation sampling distribution of the difference in
    sample means.
  • If the difference in sample means for the actual grouping
    of the data is atypical as compared to the differences from
    random groupings, this indicates evidence that the actual
    grouping is associated with the measured variable.
  • The p-value would be the proportion of random relabellings
    with sample mean differences at least as extreme as that
    from the original groups.

Permutation Tests
  • With very small samples, it is possible to enumerate all
    possible ways to divide the n1 + n2 total observations into
    groups of size n1 and n2.
  • An R function can carry out a permutation test.


Soil cores were taken from two areas: an area under an opening in a forest
canopy (the gap) and a nearby area under heavy tree growth (the growth).
The amount of carbon dioxide given off by each soil core was measured
(in mol CO2/g soil/hr).

> growth = c(17, 20, 170, 315, 22, 190, 64)
> gap = c(22, 29, 13, 16, 15, 18, 14, 6)
> boxplot(list(growth = growth, gap = gap))


[Figure: side-by-side boxplots of the growth and gap samples]


Example Permutation Test in R
> library(exactRankTests)
> perm.test(growth, gap)
        2-sample Permutation Test

data: growth and gap
T = 798, p-value = 0.006371
alternative hypothesis: true mu is not equal to 0

     There is very strong evidence (p = 0.0064, two sample
     permutation test) that the soil respiration rates are
     different in the gap and growth areas.
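
With samples this small, the enumeration described earlier can be carried out directly in base R. The sketch below is one possible implementation, not the algorithm used inside perm.test; it assumes the difference in sample means as the test statistic and a two-sided p-value based on absolute differences. The helper name mean_diff is illustrative.

```r
growth <- c(17, 20, 170, 315, 22, 190, 64)
gap <- c(22, 29, 13, 16, 15, 18, 14, 6)

all_obs <- c(growth, gap)
n1 <- length(growth)
n_total <- length(all_obs)

# Difference in sample means when the positions in idx are labelled "growth".
mean_diff <- function(idx) mean(all_obs[idx]) - mean(all_obs[-idx])

observed <- mean_diff(seq_len(n1))

# Enumerate every way to choose n1 of the n_total observations as group 1.
perm_diffs <- combn(n_total, n1, FUN = mean_diff)

# Two-sided p-value: proportion of relabellings at least as extreme as the
# observed grouping. The small tolerance guards against floating-point ties.
p_value <- mean(abs(perm_diffs) >= abs(observed) - 1e-9)

print(length(perm_diffs))  # number of distinct relabellings, choose(15, 7)
print(p_value)
```

For these data the exact enumeration should agree with the p-value reported by perm.test above. For larger samples the number of relabellings grows too quickly to enumerate, and a random sample of relabellings is typically used instead.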
