# Two Independent Samples


Statistics 371, Fall 2004

## Pooled Standard Error

If we wish to assume that the two population standard deviations are equal, $\sigma_1 = \sigma_2$, then it makes sense to use data from both samples to estimate the common population standard deviation.

We estimate the common population variance with a weighted average of the sample variances, weighted by the degrees of freedom:

$$s_{\text{pooled}}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

The pooled standard error is then

$$\mathrm{SE}_{\text{pooled}} = s_{\text{pooled}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
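As a numerical sketch of these formulas (in Python rather than the notes' R; the inputs below are the Wisconsin Fast Plants sample summaries that appear later in these notes):

```python
import math

def pooled_se(s1, n1, s2, n2):
    """Pooled standard error for a two-sample comparison, assuming sigma1 = sigma2."""
    # Weighted average of the sample variances, weighted by degrees of freedom.
    s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(s2_pooled) * math.sqrt(1 / n1 + 1 / n2)

# Sample SDs 4.8 (n = 8) and 4.7 (n = 7), from the Fast Plants example.
print(round(pooled_se(4.8, 8, 4.7, 7), 4))  # ≈ 2.4605
```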

Bret Larget, Department of Statistics. October 18, 2004.

## Comparing Two Groups

- Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means and a hypothesis test.
- The basic structure of the confidence interval is the same as in the previous chapter: an estimate plus or minus a multiple of a standard error.
- Hypothesis testing will introduce several new concepts.

## Sampling Distributions

The sampling distribution of the difference in sample means has these characteristics.

- Mean: $\mu_1 - \mu_2$
- SD: $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
- Shape: exactly normal if both populations are normal; approximately normal if the populations are not normal but both sample sizes are sufficiently large.

## Setting

- Model two populations as buckets of numbered balls.
- The population means are $\mu_1$ and $\mu_2$, respectively.
- The population standard deviations are $\sigma_1$ and $\sigma_2$, respectively.
- We are interested in estimating $\mu_1 - \mu_2$ and in testing the hypothesis that $\mu_1 = \mu_2$.

*(Figure: two population buckets, one with mean $\mu_1$ and SD $\sigma_1$ yielding a sample $y_1^{(1)}, \dots, y_{n_1}^{(1)}$ with mean $\bar y_1$ and SD $s_1$, the other with mean $\mu_2$ and SD $\sigma_2$ yielding a sample $y_1^{(2)}, \dots, y_{n_2}^{(2)}$ with mean $\bar y_2$ and SD $s_2$.)*

## Theory for Confidence Interval

The recipe for constructing a confidence interval for a single population mean is based on facts about the sampling distribution of the statistic

$$T = \frac{\bar Y - \mu}{\mathrm{SE}(\bar Y)}.$$

Similarly, the theory for confidence intervals for $\mu_1 - \mu_2$ is based on the sampling distribution of the statistic

$$T = \frac{(\bar Y_1 - \bar Y_2) - (\mu_1 - \mu_2)}{\mathrm{SE}(\bar Y_1 - \bar Y_2)}$$

where we standardize by subtracting the mean and dividing by the standard deviation of the sampling distribution.

## Standard Error of $\bar y_1 - \bar y_2$

The standard error of the difference in two sample means is an empirical measure of how far the difference in sample means will typically be from the difference in the respective population means:

$$\mathrm{SE}(\bar y_1 - \bar y_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

An alternative formula is

$$\mathrm{SE}(\bar y_1 - \bar y_2) = \sqrt{\left(\mathrm{SE}(\bar y_1)\right)^2 + \left(\mathrm{SE}(\bar y_2)\right)^2}$$

This formula reminds us of how to find the length of the hypotenuse of a triangle. (Variances add, but standard deviations don't.)

## Theory for Confidence Interval

If both populations are normal and we know the population standard deviations, then

$$\Pr\left(-1.96 \le \frac{(\bar Y_1 - \bar Y_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \le 1.96\right) = 0.95$$

where we can choose a $z$ other than 1.96 for different confidence levels. This statement is true because the expression in the middle has a standard normal distribution.

But in practice, we don't know the population standard deviations. If we substitute in sample estimates instead, we get this:

$$\Pr\left(-t \le \frac{(\bar Y_1 - \bar Y_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \le t\right) = 0.95$$

We need to choose different end points $\pm t$ to account for the additional uncertainty from estimating the two standard deviations.
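The two standard-error formulas above are algebraically identical. A quick Python check, using the Fast Plants summary values that appear later in these notes ($s_1 = 4.8$, $n_1 = 8$; $s_2 = 4.7$, $n_2 = 7$):

```python
import math

s1, n1 = 4.8, 8   # control plants (Fast Plants example later in the notes)
s2, n2 = 4.7, 7   # ancy plants

# Direct formula for the SE of the difference in sample means
se_direct = math.sqrt(s1**2 / n1 + s2**2 / n2)

# "Hypotenuse" formula built from the two individual standard errors
se1, se2 = s1 / math.sqrt(n1), s2 / math.sqrt(n2)
se_hyp = math.sqrt(se1**2 + se2**2)

print(round(se_direct, 6))  # ≈ 2.456769, and se_hyp agrees
```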
## Theory for Confidence Interval

It turns out that the sampling distribution of the statistic above is approximately a t distribution, where the degrees of freedom should be estimated from the data as well.

Algebraic manipulation leads to the following expression:

$$\Pr\left((\bar Y_1 - \bar Y_2) - t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar Y_1 - \bar Y_2) + t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right) = 0.95$$

We use a t multiplier so that the area between $-t$ and $t$ under a t distribution with the estimated degrees of freedom will be 0.95.

## Example Using R

Exercise 7.21. This exercise examines the growth of bean plants under red and green light. A 95% confidence interval is part of the output below.

```
> str(ex7.21)
'data.frame':   42 obs. of  2 variables:
 $ height: num  8.4 8.4 10 8.8 7.1 9.4 8.8 4.3 9 8.4 ...
 $ color : Factor w/ 2 levels "green","red": 2 2 2 2 2 2 2 2 2 2 ...
> attach(ex7.21)
> t.test(height ~ color)

        Welch Two Sample t-test

data:  height by color
t = 1.1432, df = 38.019, p-value = 0.2601
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4479687  1.6103216
sample estimates:
mean in group green   mean in group red
           8.940000            8.358824
```

## Example Assuming Equal Variances

For the same data, were we to assume that the population variances were equal, the degrees of freedom, the standard error, and the confidence interval would all be slightly different.

```
> t.test(height ~ color, var.equal = T)

        Two Sample t-test

data:  height by color
t = 1.1064, df = 40, p-value = 0.2752
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4804523  1.6428053
sample estimates:
mean in group green   mean in group red
           8.940000            8.358824
```

## Confidence Interval for $\mu_1 - \mu_2$

The confidence interval for differences in population means has the same structure as that for a single population mean:

(Estimate) ± (t Multiplier) × SE

The only difference is that for this more complicated setting, we have more complicated formulas for the standard error and the degrees of freedom. Here is the df formula:

$$df = \frac{(\mathrm{SE}_1^2 + \mathrm{SE}_2^2)^2}{\mathrm{SE}_1^4/(n_1 - 1) + \mathrm{SE}_2^4/(n_2 - 1)}$$

where $\mathrm{SE}_i = s_i/\sqrt{n_i}$ for $i = 1, 2$.

As a check, the value is often close to $n_1 + n_2 - 2$. (This will be exact if $s_1 = s_2$ and $n_1 = n_2$.) The value from the messy formula will always be between the smaller of $n_1 - 1$ and $n_2 - 1$ and $n_1 + n_2 - 2$.
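The df formula translates directly into Python (a sketch mirroring the `getDF` R function defined later in these notes):

```python
import math

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom ("the messy formula")."""
    se1sq = s1**2 / n1   # squared SE of the first sample mean
    se2sq = s2**2 / n2   # squared SE of the second sample mean
    return (se1sq + se2sq)**2 / (se1sq**2 / (n1 - 1) + se2sq**2 / (n2 - 1))

# Fast Plants summaries; getDF(4.8, 8, 4.7, 7) in the notes' R gives 12.80635.
print(round(welch_df(4.8, 8, 4.7, 7), 5))
```

Note that the result lands between min(n1 − 1, n2 − 1) = 6 and n1 + n2 − 2 = 13, as the check above promises.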

## Example

Exercise 7.12. In this example, subjects with high blood pressure are randomly allocated to two treatments. The biofeedback group receives relaxation training aided by biofeedback and meditation over eight weeks. The control group does not. Reduction in systolic blood pressure is tabulated here:

|     | Biofeedback | Control |
|-----|-------------|---------|
| n   | 99          | 93      |
| ȳ   | 13.8        | 4.0     |
| SE  | 1.34        | 1.30    |

For 190 degrees of freedom (which come from both the simple and the messy formulas), the t table says to use 1.977 (190 is rounded down to the table's 140 row), whereas with R you find 1.973.

## Hypothesis Tests

- Hypothesis tests are an alternative approach to statistical inference.
- Unlike confidence intervals, where the goal is estimation with an assessment of the likely precision of the estimate, the goal of hypothesis testing is to ascertain whether or not the data are consistent with what we might expect to see assuming that a hypothesis is true.
- The logic of hypothesis testing is a probabilistic form of proof by contradiction.
- In logic, if we can show that a proposition H leads to a contradiction, then we have proved H false and have proved not-H to be true.
- In hypothesis testing, if the observed data are highly unlikely under an assumed hypothesis H, then there is strong (but not definitive) evidence that the hypothesis is false.

## Example (cont.)

A calculator or R can compute the margin of error.

```
> se = sqrt(1.34^2 + 1.3^2)
> tmult = qt(0.975, 190)
> me = round(tmult * se, 1)
> se
[1] 1.866976
> tmult
[1] 1.972528
> me
[1] 3.7
```

> We are 95% confident that the mean reduction in systolic blood pressure due to the biofeedback treatment in a population of individuals similar to those in this study would be between 6.1 and 13.5 mm more than the mean reduction in the same population undergoing the control treatment.

## Logic of Hypothesis Tests

All of the hypothesis tests we will see this semester fall into this general framework.

1. State a null hypothesis and an alternative hypothesis.
2. Gather data and compute a test statistic.
3. Consider the sampling distribution of the test statistic assuming that the null hypothesis is true.
4. Compute a p-value, a measure of how consistent the data are with the null hypothesis in consideration of a specific alternative hypothesis.
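The margin-of-error arithmetic from the blood-pressure example above can be replayed in Python (the t multiplier 1.972528 is taken from the R output rather than recomputed):

```python
import math

# Summary values from Exercise 7.12 (biofeedback vs. control)
se = math.sqrt(1.34**2 + 1.30**2)   # SE of the difference in sample means
tmult = 1.972528                    # qt(0.975, 190), from the R output above
me = round(tmult * se, 1)           # margin of error

diff = 13.8 - 4.0                   # difference in mean reductions
lo, hi = round(diff - me, 1), round(diff + me, 1)
print(me, (lo, hi))                 # 3.7 and the interval (6.1, 13.5)
```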
5. Assess the strength of the evidence against the null hypothesis in the context of the problem.

We will introduce all of these concepts in the setting of testing the equality of two population means, but the general ideas will reappear in many settings throughout the remainder of the semester.

## Example: Calculate a Test Statistic

If the population means are equal, their difference is zero. This test statistic tells us that the actual observed difference in sample means is 1.99 standard errors away from zero.

## Wisconsin Fast Plants Example

- In an experiment, seven Wisconsin Fast Plants (Brassica campestris) were grown with a treatment of Ancymidol (ancy) and eight control plants were given ordinary water.
- The null hypothesis is that the treatment has no effect on plant growth (as measured by the height of the plant after 14 days of growth).
- The alternative hypothesis is that the treatment has an effect which would result in different mean growth amounts.
- A summary of the sample data is as follows. The eight control plants had a mean growth of 15.9 cm and standard deviation 4.8 cm. The seven ancy plants had a mean growth of 11.0 cm and standard deviation 4.7 cm.
- The question is: is it reasonable to think that the observed difference in sample means of 4.9 cm is due to chance variation alone, or is there evidence that some of the difference is due to the ancy treatment?

## Example: Find the Sampling Distribution

The sampling distribution of the test statistic is a t distribution with degrees of freedom calculated by the messy formula. This useful R code computes it. If you type this in and save your workspace at the end of a session, you can use it again in the future.

```
> getDF = function(s1, n1, s2, n2) {
+     se1 = s1/sqrt(n1)
+     se2 = s2/sqrt(n2)
+     return((se1^2 + se2^2)^2/(se1^4/(n1 - 1) + se2^4/(n2 - 1)))
+ }
> getDF(4.8, 8, 4.7, 7)
[1] 12.80635
```

## Example: State Hypotheses

Let $\mu_1$ be the population mean growth with the control conditions and let $\mu_2$ be the population mean growth with ancy.

The null and alternative hypotheses are expressed as

$$H_0: \mu_1 = \mu_2 \qquad H_A: \mu_1 \ne \mu_2$$

We state statistical hypotheses as statements about population parameters.

## Example: Compute a P-Value

To describe how likely it is to see such a test statistic, we can ask: what is the probability that chance alone would result in a test statistic at least this far from zero? The answer is the area below $-1.99$ and above $1.99$ under a t density curve with 12.8 degrees of freedom.

With the t table, we can only bracket this p-value within a range. If we round down to 12 df, the t statistic falls between 1.912 and 2.076 in the table, so the area to the right of 1.99 is between 0.03 and 0.04. The p-value in this problem is twice as large because we also need to include the area to the left of $-1.99$. So $0.06 < p < 0.08$.

With R, we can be more precise.

```
> p = 2 * pt(-ts, getDF(4.8, 8, 4.7, 7))
> p
[1] 0.06783269
```
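Without R's `pt`, the same two-sided p-value can be approximated by numerically integrating the t density; this Python sketch supports the fractional degrees of freedom from the messy formula:

```python
import math

def t_density(df):
    """Return the t density function for (possibly fractional) df."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(ts, df, hi=60.0, n=100_000):
    """P(|T| >= ts) via the trapezoid rule on [ts, hi]; the tail past hi is negligible."""
    f = t_density(df)
    h = (hi - ts) / n
    area = 0.5 * (f(ts) + f(hi)) + sum(f(ts + i * h) for i in range(1, n))
    return 2 * area * h

print(round(two_sided_p(1.994489, 12.80635), 4))  # ≈ 0.0678, matching R's 2 * pt(-ts, 12.80635)
```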

## Example: Calculate a Test Statistic

In the setting of a difference between two independent sample means, our test statistic is

$$t = \frac{(\bar y_1 - \bar y_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

(Your book adds a subscript, $t_s$, to remind you that this is computed from the sample.)

For the data, we find this.

```
> se = sqrt(4.8^2/8 + 4.7^2/7)
> se
[1] 2.456769
> ts = (15.9 - 11)/se
> ts
[1] 1.994489
```

The standard error tells us that we would expect the observed difference in sample means to typically differ from the difference in population means by about 2.5 cm.

## Example: Interpreting a P-Value

The smaller the p-value, the more inconsistent the data are with the null hypothesis, and the stronger the evidence is against the null hypothesis in favor of the alternative.

Traditionally, people have measured statistical significance by comparing a p-value with arbitrary significance levels such as $\alpha = 0.05$. The phrase "statistically significant at the 5% level" means that the p-value is smaller than 0.05.

In reporting results, it is best to report an actual p-value and not simply a statement about whether or not it is "statistically significant".
## Example: Summarizing the Results

For this example, I might summarize the results as follows.

> There is slight evidence (p = 0.068, two-sided independent sample t-test) that there is a difference in the mean height at 14 days between Wisconsin Fast Plants grown with ordinary water and those grown with Ancymidol.

Generally speaking, a confidence interval is more informative than a p-value because it estimates a difference in the units of the problem, which allows a reader with background knowledge in the subject area to assess both the statistical significance and the practical importance of the observed difference. In contrast, a hypothesis test examines statistical significance alone.

## Type I and Type II Errors

There are two possible decision errors.

- Rejecting a true null hypothesis is a Type I error.
- You can interpret $\alpha = \Pr\{\text{rejecting } H_0 \mid H_0 \text{ is true}\}$, so $\alpha$ is the probability of a Type I error. (You cannot make a Type I error when the null hypothesis is false.)
- Not rejecting a false null hypothesis is a Type II error.
- It is conventional to use $\beta$ for the probability of a Type II error: $\beta = \Pr\{\text{not rejecting } H_0 \mid H_0 \text{ is false}\}$. If the null hypothesis is false, one of the many possible alternative hypotheses is true, so it is typical to calculate $\beta$ separately for each possible alternative (in this setting, for each value of $\mu_1 - \mu_2$).
- Power is the probability of rejecting a false null hypothesis: Power $= 1 - \beta$.
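As an illustration of power, here is a sketch that computes the power of a two-sided test under a z-based simplification (known SDs, so the normal CDF applies); the course's t-based calculation would differ slightly. The inputs reuse the Fast Plants observed difference (4.9 cm) and SE (about 2.46) as a hypothetical true state of the world:

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_sided_z(delta, se, zcrit=1.959964):
    """Power of a two-sided z test: P(|estimate/SE| > zcrit) when the
    true difference in means is delta. A known-SD simplification."""
    shift = delta / se
    return (1 - phi(zcrit - shift)) + phi(-zcrit - shift)

# If the true difference really were 4.9 cm with SE 2.456769,
# the chance of rejecting at the 5% level is only about one half.
print(round(power_two_sided_z(4.9, 2.456769), 3))
```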

## Rejection Regions

Suppose that we were asked to make a decision about a hypothesis based on data. We may decide, for example, to reject the null hypothesis if the p-value is smaller than 0.05 and not to reject the null hypothesis if the p-value is larger than 0.05.

This procedure has a significance level of 0.05, which means that if we follow the rule, there is a probability of 0.05 of rejecting a true null hypothesis. (We would need further assumptions to calculate the probability of not rejecting a false null hypothesis.)

Rejecting the null hypothesis occurs precisely when the test statistic falls into a rejection region, in this case either the upper or lower 2.5% tail of the sampling distribution.

## More on P-Values

Another way to think about p-values is to recognize that they depend on the values of the data, and so are random variables. Let $P$ be the p-value from a test.

- If the null hypothesis is true, then $P$ is a random variable distributed uniformly between 0 and 1.
- In other words, the probability density of $P$ is a flat rectangle.
- Notice that this implies that $\Pr\{P < c\} = c$ for any number $c$ between 0 and 1. If the null is true, there is a 5% probability that $P$ is less than 0.05, a 1% probability that $P$ is less than 0.01, and so on.
- On the other hand, if the alternative hypothesis is true, then the distribution of $P$ will not be uniform and will instead be shifted toward zero.
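The uniform-distribution claim is easy to check by simulation. This sketch uses a one-sample z test with known sigma (so the p-value has a closed form via the normal CDF) rather than the course's two-sample t test, but the principle is the same:

```python
import math, random

def z_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mu = mu0 from a z test with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
# 5000 samples of size 20 drawn with the null hypothesis true
pvals = [z_p_value([random.gauss(0, 1) for _ in range(20)]) for _ in range(5000)]

# Under a true null, P is uniform, so about 5% of p-values fall below 0.05.
frac = sum(p < 0.05 for p in pvals) / len(pvals)
print(round(frac, 3))
```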

## Relationship between t Tests and Confidence Intervals

The rejection region corresponds exactly to the test statistics for which a 95% confidence interval does not contain 0.

> We would reject the null hypothesis $H_0: \mu_1 - \mu_2 = 0$ versus the two-sided alternative at the $\alpha = 0.05$ level of significance if and only if a 95% confidence interval for $\mu_1 - \mu_2$ does not contain 0.

We could make similar statements for general $\alpha$ and a $(1 - \alpha) \times 100\%$ confidence interval.

## Simulation

We can explore these statements with a simulation based on the Wisconsin Fast Plants example. The first histogram shows p-values from 10,000 samples where $\mu_1 - \mu_2 = 0$, while the second assumes that $\mu_1 - \mu_2 = 5$. Both simulations use $\sigma_1 = \sigma_2 = 4.8$, but the calculation of the p-value does not.

*(Figure: two histograms of simulated p-values, "Sampling Dist of P under Null" and "Sampling Dist of P under Alt.", each with the p-value on the horizontal axis from 0.0 to 1.0.)*

## Comparing α and P-Values

- In this setting, the significance level $\alpha$ and p-values are both areas under t curves, but they are not the same thing.
- The significance level is a prespecified, arbitrary value that does not depend on the data.
- The p-value depends on the data.
- If a decision rule is to reject the null hypothesis when the test statistic is in a rejection region, this is equivalent to rejecting the null hypothesis when the p-value is less than the significance level $\alpha$.

## More P-Value Interpretations

A verbal definition of a p-value is as follows.

> The p-value of the data is the probability, calculated assuming that the null hypothesis is true, of obtaining a test statistic that deviates from what is expected under the null (in the direction of the alternative hypothesis) at least as much as the actual data does.

The p-value is not the probability that the null hypothesis is true. Interpreting the p-value in this way will mislead you!
## Example for P-Value Interpretation

In a medical testing setting, we may want a procedure that indicates when a subject has a disease. We can think of the decision "healthy" as corresponding to a null hypothesis and the decision "ill" as corresponding to the alternative hypothesis.

Consider now a situation where 1% of a population has a disease. Suppose that a test has an 80% chance of detecting the disease when a person has the disease (so the power of the test is 80%), and that the test has a 95% chance of correctly saying the person does not have the disease when the person does not (so there is a 5% chance of a false positive, or falsely rejecting the null).

## Exercise 7.54

Calculate a test statistic.

```
> ts = (31.96 - 25.32)/sqrt(12.05^2/25 + 13.78^2/25)
> ts
[1] 1.813664
```

Find the null sampling distribution. The book reports a t distribution with 47.2 degrees of freedom; we can check this.

```
> degf = getDF(12.05, 25, 13.78, 25)
> degf
[1] 47.16131
```

Compute a (one-sided) p-value.

```
> p = 1 - pt(ts, degf)
> p
[1] 0.03804753
```
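A Python replay of the same arithmetic (the one-sided p-value itself needs a t CDF, so only the statistic and degrees of freedom are checked here):

```python
import math

# Exercise 7.54 summaries: drug (n=25, mean 31.96, sd 12.05),
# placebo (n=25, mean 25.32, sd 13.78)
se1sq = 12.05**2 / 25
se2sq = 13.78**2 / 25
ts = (31.96 - 25.32) / math.sqrt(se1sq + se2sq)
df = (se1sq + se2sq)**2 / (se1sq**2 / 24 + se2sq**2 / 24)
print(round(ts, 6), round(df, 5))  # ≈ 1.813664 and ≈ 47.16131, matching the R output
```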

## Example (cont.)

Here is a table of the results in a hypothetical population of 100,000 people.

|                                  | Healthy ($H_0$ is true) | Ill ($H_0$ is false) | Total   |
|----------------------------------|-------------------------|----------------------|---------|
| Test negative (do not reject $H_0$) | 94,050               | 200                  | 94,250  |
| Test positive (reject $H_0$)     | 4,950                   | 800                  | 5,750   |
| Total                            | 99,000                  | 1,000                | 100,000 |

Notice that of the 5,750 times $H_0$ was rejected (so that the test indicated illness), the person was actually healthy 4950/5750 = 86% of the time!

A rule that rejects $H_0$ when the p-value is less than 5% only rejects 5% of the true null hypotheses, but this can be a large proportion of the total number of rejected hypotheses when false null hypotheses occur rarely.

## Exercise 7.54

Summarize the results.

> There is fairly strong evidence that the drug would provide more pain relief than the placebo on average for a population of women similar to those in this study (p = 0.038, one-sided independent sample t-test).

Notice that this result is "statistically significant at the 5% level" because the p-value is less than 0.05.

For a two-sided test, the p-value would be twice as large, and not significant at the 5% level.
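The counts in the disease-screening table above follow directly from the stated rates; a quick Python check:

```python
N = 100_000        # hypothetical population size
prevalence = 0.01  # fraction of the population who are ill
power = 0.80       # P(test positive | ill)
alpha = 0.05       # P(test positive | healthy), the false-positive rate

ill = N * prevalence
healthy = N - ill
true_pos = power * ill        # ill people correctly flagged
false_pos = alpha * healthy   # healthy people incorrectly flagged

# Fraction of positive results that come from healthy people
print(true_pos, false_pos, round(false_pos / (false_pos + true_pos), 3))
```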

## One-Tailed Tests

- Often, we are interested not only in demonstrating that two population means are different, but in demonstrating that the difference is in a particular direction.
- Instead of the two-sided alternative $\mu_1 \ne \mu_2$, we would choose one of two possible one-sided alternatives, $\mu_1 < \mu_2$ or $\mu_1 > \mu_2$.
- For the alternative hypothesis $H_A: \mu_1 < \mu_2$, the p-value is the area to the left of the test statistic.
- For the alternative hypothesis $H_A: \mu_1 > \mu_2$, the p-value is the area to the right of the test statistic.
- If the test statistic is in the direction of the alternative hypothesis, the p-value from a one-sided test will be half the p-value of a two-sided test.

## Validity of t Methods

- All of the methods seen so far are formally based on the assumption that the populations are normal.
- In practice, they are valid as long as the sampling distribution of the difference in sample means is approximately normal, which occurs when the sample sizes are large enough (justified by the Central Limit Theorem).
- Specifically, we need the sampling distribution of the test statistic to have an approximate t distribution.
- But what if the sample sizes are small and the samples indicate non-normality in the populations?
- One approach is to transform the data, often by taking logarithms, so that the transformed distribution is approximately normal.
- The textbook suggests a nonparametric method called the Wilcoxon-Mann-Whitney test that is based on converting the data to ranks.
- I will show an alternative called a permutation test.

## Exercise 7.54

The following data come from an experiment to test the efficacy of a drug to reduce pain in women after childbirth. Possible pain relief scores vary from 0 (no relief) to 56 (complete relief).

| Treatment | n  | mean  | sd    |
|-----------|----|-------|-------|
| Drug      | 25 | 31.96 | 12.05 |
| Placebo   | 25 | 25.32 | 13.78 |

State hypotheses. Let $\mu_1$ be the population mean score for the drug and $\mu_2$ be the population mean score for the placebo.

$$H_0: \mu_1 = \mu_2 \qquad H_A: \mu_1 > \mu_2$$

## Permutation Tests

- The idea of a permutation test in this setting is quite straightforward.
- We begin by computing the difference in sample means for the two samples of sizes $n_1$ and $n_2$.
- Now, imagine taking the group labels, mixing them up (permuting them), and then assigning them at random to the observations. We could then again calculate a difference in sample means.
- Next, imagine doing this process over and over, collecting the permutation sampling distribution of the difference in sample means.
- If the difference in sample means for the actual grouping of the data is atypical compared to the differences from random groupings, this indicates evidence that the actual grouping is associated with the measured variable.
- The p-value is the proportion of random relabellings with sample mean differences at least as extreme as that from the original groups.
## Permutation Tests

- With very small samples, it is possible to enumerate all possible ways to divide the $n_1 + n_2$ total observations into groups of sizes $n_1$ and $n_2$.
- An R function can carry out a permutation test.

## Example

Soil cores were taken from two areas: an area under an opening in a forest canopy (the gap) and a nearby area under heavy tree growth (the growth). The amount of carbon dioxide given off by each soil core was measured (in mol CO2/g soil/hr).

```
> growth = c(17, 20, 170, 315, 22, 190, 64)
> gap = c(22, 29, 13, 16, 15, 18, 14, 6)
> boxplot(list(growth = growth, gap = gap))
```

*(Figure: side-by-side boxplots of the growth and gap samples; the vertical axis runs from 0 to about 300.)*

## Example Permutation Test in R

```
> library(exactRankTests)
> perm.test(growth, gap)

        2-sample Permutation Test

data:  growth and gap
T = 798, p-value = 0.006371
alternative hypothesis: true mu is not equal to 0
```

> There is very strong evidence (p = 0.0064, two sample permutation test) that the soil respiration rates are different in the gap and growth areas.
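The enumeration idea can also be carried out from scratch: with 7 + 8 = 15 observations there are only C(15, 7) = 6435 relabelings. This sketch counts relabelings whose absolute difference in means is at least the observed one; the statistic and tie-handling conventions may differ slightly from `perm.test`, so the p-value need not match exactly:

```python
from itertools import combinations

growth = [17, 20, 170, 315, 22, 190, 64]
gap = [22, 29, 13, 16, 15, 18, 14, 6]
pooled = growth + gap
n1 = len(growth)
total = sum(pooled)

obs = abs(sum(growth) / n1 - sum(gap) / len(gap))  # observed |difference in means|

# Enumerate every way to label n1 of the 15 observations as "growth".
count = trials = 0
for combo in combinations(pooled, n1):
    s = sum(combo)
    diff = abs(s / n1 - (total - s) / (len(pooled) - n1))
    count += diff >= obs
    trials += 1

p = count / trials
print(count, trials, p)  # p-value = extreme relabelings / all relabelings
```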
