# Ex by gegeshandong

```
Biom 109: Lab 8
Comparison of 2 Means: Independent vs Dependent Samples

Lab 8 Goals:
• Use Minitab to run hypothesis tests comparing 2 populations.
• Determine whether the 2 samples being compared are dependent or independent.

Open the Lab 8 Minitab file at the following website:

www.humboldt.edu/~tsp1 → Biom 109 → Data files → Lab_8

Ex. 1) Clones are genetically identical cells descended from the
same individual. Researchers have identified a single poplar clone
that yields fast-growing, hardy trees. These trees may one day serve
as an alternative to conventional fuel as an energy resource.
Researchers at The Pennsylvania State University planted Poplar
Clone 252 on two different sites: one, a rich site by a creek (Site 1),
and the other, a dry, sandy site on a ridge (Site 2). Let’s examine whether
there is a significant difference in the mean tree diameter in inches
between the two sites. Run a hypothesis test and build a confidence
interval. Interpret the result.

Step 1: Preliminary testing of samples:
Whether building a confidence interval or performing a hypothesis test, when faced
with 2 samples the first step will be to determine independence and normality.

1a)    Are the two samples independent?
Consider whether the measurement of a sample from one group could possibly be interpreted
as a measurement from the second group. If the data sets are paired in a “before versus after”
or a “method A versus method B” scenario, then we have a dependent data set and must
apply a one-sample t-test on the column of differences. If, however, the two populations are
separate from one another (different populations measured in different locations), then we
have independent data sets and must apply a two-sample t-test.

 Independent data sets: 2 different populations of trees, measured at 2 different sites. 1pt

1b)    Are the samples drawn from normal populations?
We must address normality for any hypothesis involving a population mean.
If we are given summarized data then we have no choice but to proclaim “We
assume normality.” But if we are given raw data, as in this case, we use Minitab to
determine whether our data sets are normally distributed.

Step 1: Preliminary testing of samples Continued:

Then:
MTB > norm c1
MTB > norm c2

[Figure: normal probability plots (Percent vs. diameter) for Creek Site 1 and Sandy Site 2]
Creek Site 1:  Mean 2.598   StDev 0.9158   N 10   P-Value 0.164
Sandy Site 2:  Mean 3.028   StDev 1.284    N 10   P-Value 0.379

Minitab answers with normality test p-values, which we incorporate into
a brief answer using correct notation:

Both data sets are normally distributed because:                  2pt
p1 > α: 0.164 > 0.01        p2 > α: 0.379 > 0.01

Step 2: Declare the Parameters of Interest:                       2pt

μ1 = Mean Poplar tree diameter in inches for trees grown at the creek site.
μ2 = Mean Poplar tree diameter in inches for trees grown at the ridge site.

Step 3: State the Hypothesis and the LOS:                         3pts

H0: μ1 = μ2
HA: μ1 ≠ μ2
α = 0.05

Steps 4-6: Use Minitab to run a hypothesis test and calculate the confidence interval

MTB > twos 95 c1 c2;                                              2pts
SUBC> Alte 0.

TWOSAMPLE T FOR Diameter
Site   N    MEAN   STDEV  SE MEAN
1     10   2.598   0.916     0.29
2     10    3.03    1.28     0.41

95 PCT CI FOR MU 1 - MU 2: ( -1.49, 0.63)

TTEST MU 1 = MU 2 (VS NE): T= -0.86  P=0.40  DF= 16

Step 7: Conclusion.

Stat: At the 5% LOS we do not reject H0: μ1 = μ2 because:         3pts
t-sample = -0.86, p > α: (p = 0.40) > 0.05

English:                                                          1pt
There is not a statistically significant difference in mean tree diameter between the two sites.

Find a 95% confidence interval on the mean difference of the tree widths.
(From Minitab output)

“The 95% confidence interval on the difference between the mean tree diameters of
the Creek site - Sandy ridge site is (-1.49, 0.63) inches. There is not a significant
difference in mean diameter between the two sites.”               2pts

Interpreting the CI: Note that as long as the confidence interval straddles zero our
conclusion is brief. It is when the confidence interval does not contain zero that we must
specify which population is larger or smaller than the other and by how much.

Recall from lab 9 that the inequality sign used in the alternate hypothesis is
indicated with a sub-command of “alternative” with the number -1, 0, or 1, as
shown in the table below.

Note: Only the 2-tailed hypothesis will yield a confidence interval.
The one-tailed hypothesis test will return only an upper or a lower bound.

Minitab code:                    Interpretation:

MTB > twos 95 c1 c2;             H0: μ1 = μ2
SUBC> alternative -1.            HA: μ1 < μ2

MTB > twos 95 c1 c2;             H0: μ1 = μ2
SUBC> alte 0.                    HA: μ1 ≠ μ2

MTB > twos 95 c1 c2;             H0: μ1 = μ2
SUBC> alte 1.                    HA: μ1 > μ2

Ex. 2) Medical researchers recorded blood cholesterol levels
in mg/dL of 28 heart-attack victims 2 days following the
attack. The cholesterol levels of 30 individuals who had not had
a heart attack were taken as a control. Test the hypothesis at the
10% LOS that heart-attack victims suffer from a higher
cholesterol count than those without heart attacks. Build and
interpret a 90% confidence interval about the mean
difference in cholesterol levels between the 2 populations.

Step 1a) Are the two samples independent?

Independent data sets: 2 different populations of patients,
one with heart attacks, the other without heart attacks.          1pt

Step 1b)         Are the samples drawn from normal populations?

Then:
MTB > norm c1
MTB > norm c2

[Figure: normal probability plots (Percent vs. cholesterol level) for 2 day and control]
2 day:    Mean 253.9   StDev 47.71   N 28   P-Value 0.259
control:  Mean 193.1   StDev 22.30   N 30   P-Value 0.099

Ex. 2) Preliminary Checks:
Minitab answers with normality test p-values, which we incorporate into
a brief answer using correct notation:

Both data sets are normally distributed because:                  2pts
p1 > α: 0.259 > 0.01        p2 > α: 0.099 > 0.01

Step 2: Declare the Parameters of Interest:                       2pts

μ1 = Mean cholesterol level in mg/dL for heart-attack victims.
μ2 = Mean cholesterol level in mg/dL for controls.

Step 3: State the Hypothesis and the LOS:                         3pts

H0: μ1 = μ2
HA: μ1 > μ2
α = 0.10

Steps 4-6: Use Minitab to run a hypothesis test and calculate the confidence interval

MTB > twos 90 c4 c5;                                              2pts
SUBC> Alte 1.

Two-Sample T-Test and CI: Heart Attack, Controls

               N    Mean   StDev   SE Mean
Heart Attack  28   253.9    47.7       9.0
Controls      30   193.1    22.3       4.1

Difference = mu (Heart Attack) - mu (Controls)
Estimate for difference: 60.7952
90% lower bound for difference: 48.5037
T-Test of difference = 0 (vs >):
T-Value = 6.15 P-Value = 0.000 DF = 37

Note that only the lower bound, not a confidence interval, was reported because
this was a one-tailed test. Only 2-tailed hypothesis tests will report confidence intervals.

The p-value can never be zero; it is very small and rounded to zero at the 3rd decimal.
We must conclude that p < 0.001.

Step 7: Conclusion.

Stat: At the 10% LOS we reject H0: μ1 = μ2 because:               3pts
t-sample = 6.15, p < α: (p < 0.001) < 0.10

English:                                                          1pt
Heart attack patients have a statistically significantly higher mean cholesterol level than controls.

Find a 90% confidence interval on the mean cholesterol difference between the 2 populations.
Interpret the interval.

MTB > twos 90 c4 c5

Two-Sample T-Test and CI: Heart Attack, Controls

               N    Mean   StDev   SE Mean
Heart Attack  28   253.9    47.7       9.0
Controls      30   193.1    22.3       4.1

Difference = mu (Heart Attack) - mu (Controls)
Estimate for difference: 60.7952
90% CI for difference: (44.1047, 77.4857)
T-Test of difference = 0 (vs not =):
T-Value = 6.15 P-Value = 0.000 DF = 37

“The 90% confidence interval in the difference between the cholesterol levels of heart attack
patients and controls is (44.1047, 77.4857) mg/dL. The cholesterol level of heart attack
patients is higher than that of the controls by at least 44.1 mg/dL but not more than
77.5 mg/dL.”                                                      3pts

Dependent Samples: Recognize the wording for a paired data set.

Ex 3. An experiment was designed to estimate the mean difference in weight gain in pounds for pigs
fed ration A as compared to those fed ration B. Eight pairs of pigs were used. The pigs within each pair
were littermates. The rations were assigned at random to the two animals within each pair. The gains (in
pounds) after 45 days are shown in the following table. Note the pairing of the littermates. The
designers of this experiment are coupling the littermates for their similarities in an attempt to reveal any
differences in weight gained between similar littermates consuming two different rations. We can treat this
as one population subjected to two measurements where we will focus on the differences found between
each pair of littermates. This is paired data.

[Figure: plot of Differences in Weight Gain: Ration A - Ration B (vertical axis: Difference in Pounds)]

ROW  RationA  RationB
======================
1       65       58
2       37       39
3       40       31
4       47       45
5       49       47
6       65       55
7       53       59
8       59       51

MTB > Let C3 = C1 - C2

For paired data we must create a third column of differences for hypothesis testing.
The difference for our class will always be:

Difference = First data set - Second data set

ROW  RationA  RationB  (diff.)
==============================
1       65       58       7
2       37       39      -2
3       40       31       9
4       47       45       2
5       49       47       2
6       65       55      10
7       53       59      -6
8       59       51       8

Perform a complete hypothesis test using Minitab to determine if the pigs that are fed
ration B gain less weight than those that are fed ration A. Include a 95% confidence
interval on the mean difference of gained weight.

Step 1: Preliminary Checks:

Are the data sets independent?                                    1pt
No, the littermates are paired; this is dependent data.

Is the data set normally distributed?                             2pts
The column of differences is normally distributed because: p > α
(p = 0.40) > α

Step 2: Declare the parameter: 2pts

μd = Mean difference in weight gained in pounds between pig littermates
consuming Ration A - Ration B.

Step 3: State the Hypothesis and the LOS: 3pt

H0: μd = 0
HA: μd > 0
LOS: α = 0.05

Caution! A common mistake is choosing the wrong inequality for the alternate hypothesis.
Ask yourself: is this a Large - Small scenario or a Small - Large scenario? Consider this
case: “determine if the pigs that are fed ration B gain less weight than those that are fed
ration A.” This scenario, Ration A - Ration B, fits Large - Small > 0 for HA.

Steps 4-6: Calculate the test statistic 2pts

MTB > ttest C3;
SUBC> Alte 1.

One-Sample T: C3

Test of mu = 0 vs > 0
                                            95% Lower
Variable  N     Mean    StDev  SE Mean          Bound     T      P
C3        8  3.75000  5.72588  2.02440       -0.08539  1.85  0.053

Step 7: State the conclusion with Statistics and English 4pts

Statistics: At the 5% LOS we do not reject H0: μd = 0, because
t-sample = 1.85, p > α: (p = 0.053) > 0.05

English: There is not a significant difference in weight gained between littermates on either
ration.

Find a (1 - α)100% CI on the mean difference of weight gained between littermates on Ration
A - Ration B. Write a summary statistical sentence, and then an English sentence for the CI.

MTB > tint 95 c3

One-Sample T: C3

Variable  N     Mean    StDev  SE Mean         95% CI
C3        8  3.75000  5.72588  2.02440  (-1.03696, 8.53696)

Stat: The 95% CI on the mean difference in weight gained between littermates on either
ration is (-1.03696, 8.53696) pounds. 2pt

English: There is no significant difference in weight gained between littermates fed on either
Ration. 2pt

Note: If the confidence interval does not bracket zero, there will be a significant difference and
the bounds of the CI must be used to explicitly state what that difference is. For example, suppose
the 95% CI on μd was (1.23, 3.04). The English conclusion would read: Pigs consuming Ration
A gained more weight than their littermates consuming Ration B by at least 1.23 pounds but not
more than 3.04 pounds.

Lab 8 Exercises.
Use Minitab to run an appropriate t-test on the given data sets. Keep in mind that you will
have to distinguish between dependent and independent scenarios in order to choose the
correct test. After each hypothesis test write a (1 - α) x 100% CI (the complement of α). The
confidence interval should be reported with a statistical statement and an English sentence
interpreting the bounds of the interval. See the example problems if you need help.

Lab 8.1)          Data Columns C8 & C9:
Researchers studied the effect of fertilizer on radish growth as measured in sprout height in
inches. They randomly selected some radish seeds to serve as controls in Group 1, while
Group 2 radish seeds were planted in planters to which fertilizer was added. Other conditions
were held constant between the two groups. Use an appropriate t-test to test the hypothesis
that fertilizer enhances radish growth. Use α = 0.05. Determine a 95% CI on the mean.

Lab 8.2)        Data Columns C11 & C12
A plant physiologist conducted an experiment to determine whether mechanical stress can
retard the growth of soybean plants. Young plants were randomly allocated to two groups of
13 plants each. Plants in group 1 were mechanically agitated by shaking for 20 minutes twice
daily, while plants in group 2 were not agitated. After 16 days of growth, the total stem
length in cm of each plant was measured, with the results given in the Minitab file. Use an
appropriate t-test to test the hypothesis that stress tends to retard plant growth. Use α = 0.01.
Determine a 99% CI on the mean.

Lab 8.3)        Data Columns C14 & C15
Androstenedione is a steroid that is thought by some athletes to increase strength. Researchers
investigated this claim by giving androstenedione to 10 men in Group 1 and a placebo to
another 10 men in Group 2. One of the variables measured in the experiment was the
increase in “lat pulldown” strength in pounds of each subject after 4 weeks. Use an
appropriate t-test to test the hypothesis that taking androstenedione improves an athlete's
strength. Use α = 0.03. Determine a 97% CI on the mean.

Lab 8.4)       Data Columns C17 & C18
Octane is a measure of how much fuel can be compressed before it spontaneously ignites.
Researchers randomly selected 11 individuals to drive their cars and record the difference in
mileage gained by each car run on 10 gallons of gas at 2 different octane ratings. 87 octane
mileages were assigned to Group 1 while 92 octane mileages were assigned to Group 2. Use
an appropriate t-test to test the hypothesis that higher octane levels in gas result in a higher
gas mileage. Use α = 0.01. Determine a 99% CI on the mean.

Lab 8.5)       Data Columns C20 & C21
An automotive researcher wanted to estimate the braking distance in feet while traveling at
40 miles per hour on wet versus dry pavement. The researcher used 8 different cars with the
same driver and tires. Wet braking distances were assigned to Group 1 while dry braking
distances were assigned to Group 2. Use an appropriate t-test to test the hypothesis that wet
pavement results in a longer braking distance. Use α = 0.20. Determine an 80% CI on the
mean.

```
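The two-sample results in Ex. 1 and Ex. 2 can be cross-checked by hand from the summary statistics on the probability plots. The Python sketch below is not part of the lab and is not Minitab code. It assumes the `twos` output was produced with the unpooled (Welch) formula and that the degrees of freedom are truncated to an integer, which reproduces the DF values printed in the examples. The function name `welch_t` and the hard-coded critical value (read from a standard t-table) are illustrative choices.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic from summary statistics.

    Returns (t, df, se), where df is the Welch-Satterthwaite
    approximation truncated to an integer (an assumption made here
    to match the DF values shown in the lab output).
    """
    v1 = sd1 ** 2 / n1  # squared standard error of sample 1
    v2 = sd2 ** 2 / n2  # squared standard error of sample 2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    df = math.floor((v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)))
    return t, df, se


# Ex. 1: Creek Site 1 vs. Sandy Site 2 (two-tailed test, alpha = 0.05)
t1, df1, se1 = welch_t(2.598, 0.9158, 10, 3.028, 1.284, 10)
T_CRIT_95_DF16 = 2.1199  # two-sided 95% critical value for 16 df (t-table)
diff1 = 2.598 - 3.028
ci1 = (diff1 - T_CRIT_95_DF16 * se1, diff1 + T_CRIT_95_DF16 * se1)

# Ex. 2: heart-attack victims vs. controls (test statistic and df only)
t2, df2, se2 = welch_t(253.9, 47.71, 28, 193.1, 22.30, 30)

print(f"Ex. 1: t = {t1:.2f}, df = {df1}, 95% CI = ({ci1[0]:.2f}, {ci1[1]:.2f})")
print(f"Ex. 2: t = {t2:.2f}, df = {df2}")
```

Comparing the printed values with the T, DF, and CI lines in the Minitab output is a quick way to confirm that the correct columns and the correct test were used.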
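The paired analysis in Ex. 3 can be reproduced the same way from the raw table of weight gains. This Python sketch is again an illustration rather than part of the lab. It builds the column of differences (Ration A minus Ration B), then computes the one-sample t statistic on that column. The two critical values for 7 degrees of freedom are taken from a standard t-table and hard-coded, and all variable names are hypothetical.

```python
import math
import statistics

# Weight gains in pounds for the eight littermate pairs in Ex. 3
ration_a = [65, 37, 40, 47, 49, 65, 53, 59]
ration_b = [58, 39, 31, 45, 47, 55, 59, 51]

# Difference = first data set - second data set
diffs = [a - b for a, b in zip(ration_a, ration_b)]

n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
se_d = sd_d / math.sqrt(n)
t_stat = mean_d / se_d  # tests H0: mu_d = 0

# Critical values for df = n - 1 = 7, read from a t-table
T_CRIT_TWO_SIDED_95 = 2.3646  # for the 95% confidence interval
T_CRIT_ONE_SIDED_95 = 1.8946  # for the 95% lower bound

ci = (mean_d - T_CRIT_TWO_SIDED_95 * se_d, mean_d + T_CRIT_TWO_SIDED_95 * se_d)
lower_bound = mean_d - T_CRIT_ONE_SIDED_95 * se_d

print(f"mean = {mean_d:.2f}, sd = {sd_d:.3f}, se = {se_d:.4f}, t = {t_stat:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), 95% lower bound = {lower_bound:.2f}")
```

Because the interval brackets zero and the lower bound is negative, the sketch agrees with the lab's conclusion of no significant difference at the 5% LOS.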