Biom 109: Lab 8
Comparison of 2 Means: Independent vs Dependent Samples

 Lab 8 Goals:
 Use Minitab to run hypothesis tests comparing 2 populations.
 Determine if the comparison between the 2 populations is
   dependent or independent.

Open the Lab 8 Minitab file at the following website:

www.humboldt.edu/~tsp1 → Biom 109 → Data files → Lab_8

Ex. 1) Clones are genetically identical cells descended from the
same individual. Researchers have identified a single poplar clone
that yields fast-growing, hardy trees. These trees may one day serve
as an alternative to conventional fuel as an energy resource.
Researchers at The Pennsylvania State University planted Poplar
Clone 252 on two different sites: one, a rich site by a creek (Site 1),
and the other, a dry, sandy site on a ridge (Site 2). Let’s examine whether
there is a significant difference in the mean tree diameter in inches
between the two sites. Run a hypothesis test and build a confidence
interval. Interpret the result.


   Step 1: Preliminary testing of samples:
           Whether building a confidence interval or performing a hypothesis test, when faced
   with 2 samples the first step will be to determine independence and normality.

   1a)    Are the two samples independent?
   Consider whether the measurement of a sample from one group could possibly be interpreted
   as a measurement from the second group. If the data sets are paired in a “before versus after”
   or a “method A versus method B” scenario, then we have a dependent data set and must
   apply a one sample t-test on the column of differences. If, however, the two populations are
   separate from one another (different populations measured in different locations), then we
   have independent data sets and must apply a two sample t-test.

    Independent data sets: 2 different populations of trees, measured at 2 different sites. 1pt

   1b)    Are the samples drawn from normal populations?
          We must address normality for any hypothesis involving a population mean.
          If we are given summarized data then we have no choice but to proclaim “We
          assume normality.” But if we are given raw data, as in this case, we use Minitab to
          determine whether our data sets are normally distributed.

                   Step 1: Preliminary testing of samples Continued:

   Then: MTB > norm c1    and    MTB > norm c2

   [Figure: two normal probability plots, Percent versus tree diameter]
   Probability Plot of Creek Site 1 (Normal): Mean 2.598, StDev 0.9158, N 10, AD 0.494, P-Value 0.164
   Probability Plot of Sandy Site 2 (Normal): Mean 3.028, StDev 1.284, N 10, AD 0.357, P-Value 0.379

   Minitab answers with normality test p-values, which we incorporate into
   a brief answer using correct notation:

   Both data sets are normally distributed because:                      2pt
        p1 > α                p2 > α
        0.164 > 0.01          0.379 > 0.01
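
The P-Values printed on the probability plots can be approximately recovered from the AD statistic and N. The Python sketch below is only an illustration of that step, not a description of Minitab's internals: it assumes the commonly published D'Agostino and Stephens small-sample adjustment and piecewise approximation for the Anderson-Darling normality test, and the function name `ad_p_value` is hypothetical.

```python
import math

def ad_p_value(ad, n):
    """Approximate p-value for an Anderson-Darling normality statistic.

    Assumes the D'Agostino-Stephens adjustment and piecewise formulas;
    this is an approximation, not a statement of Minitab's algorithm.
    """
    a = ad * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if a >= 0.6:
        return math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)
    if a >= 0.34:
        return math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    if a > 0.2:
        return 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    return 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)

ALPHA = 0.01  # level used for the normality check in this lab
for label, ad, n in [("Creek Site 1", 0.494, 10), ("Sandy Site 2", 0.357, 10)]:
    p = ad_p_value(ad, n)
    verdict = "normal" if p > ALPHA else "not normal"
    print(f"{label}: p = {p:.3f} -> treat as {verdict}")
```

Under these assumptions the two AD values from the plots give p-values that agree with the reported 0.164 and 0.379 to three decimals, so both samples pass the check at the 0.01 level.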

   Step 2: Declare the Parameters of Interest:                           2pt

        μ1 = Mean Poplar tree diameter in inches for trees grown at the creek site.
        μ2 = Mean Poplar tree diameter in inches for trees grown at the ridge site.

   Step 3: State the Hypothesis and the LOS:                             3pts

        H0: μ1 = μ2
        HA: μ1 ≠ μ2
        α = 0.05

                   Steps 4-6: Use Minitab to run a hypothesis test and calculate the confidence interval


   MTB > twos 95 c1 c2;                                                  2pts
   SUBC> Alte 0.

   TWOSAMPLE T FOR Diameter
   Site    N     MEAN    STDEV   SE MEAN
   1      10    2.598    0.916      0.29
   2      10    3.03     1.28       0.41

   95 PCT CI FOR MU 1 - MU 2: ( -1.49, 0.63)

   TTEST MU 1 = MU 2 (VS NE): T= -0.86   P=0.40   DF= 16
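
The numbers in this output can be checked by hand from the summary statistics on the probability plots. The following Python sketch assumes the unpooled (Welch) two-sample t procedure with the degrees of freedom truncated to an integer, which is what the reported DF of 16 appears to reflect. The helper name `welch_from_summary` is hypothetical, and the critical value 2.120 is read from a t-table for 16 degrees of freedom rather than computed.

```python
import math

def welch_from_summary(m1, s1, n1, m2, s2, n2, t_crit):
    """Two-sample t (unequal variances) from summary statistics."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2      # squared standard errors
    se = math.sqrt(v1 + v2)                  # SE of the difference
    t = (m1 - m2) / se
    # Welch-Satterthwaite degrees of freedom, truncated to an integer
    df = int((v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)))
    half = t_crit * se                       # half-width of the interval
    return t, df, (m1 - m2 - half, m1 - m2 + half)

# t_crit = 2.120 is the two-tailed 5% critical value for 16 df,
# taken from a t-table (an assumed input, not computed here).
t, df, ci = welch_from_summary(2.598, 0.9158, 10, 3.028, 1.284, 10, 2.120)
print(f"T = {t:.2f}, DF = {df}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the critical value depends on the degrees of freedom, in practice one would compute DF first and then look up the matching table entry.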

Step 7: Conclusion.

      Stat: At the 5% LOS we do not reject H0: μ1 = μ2 because:          3pts
            t-sample = -0.86 and p > α
            (p = 0.40) > 0.05

      English:                                                           1pt
       There is not a statistically significant difference in mean tree diameter
       between the two sites.


Find a 95% confidence interval on the difference in mean tree diameter.
(From Minitab output)

      “The 95% confidence interval on the difference in mean tree diameter,
       Creek site - Sandy ridge site, is (-1.49, 0.63) inches. There is not a significant
       difference in mean tree diameter between the two sites.”          2pts

Interpreting the CI: Note that as long as the confidence interval straddles zero our
conclusion is brief. It is when the confidence interval does not contain zero that we must
specify which population is larger or smaller than the other and by how much.




Recall from lab 9 that the inequality sign used in the alternate hypothesis is
indicated with a sub-command of “alternative” with the number -1, 0, or 1, as
shown in the table below.

Note: Only the 2 tailed hypothesis will yield a confidence interval. The one
tailed hypothesis test will return only an upper or a lower bound.

   Minitab code:                   Interpretation:

   MTB > twos 95 c1 c2;            H0: μ1 = μ2
   SUBC> alternative -1.           HA: μ1 < μ2

   MTB > twos 95 c1 c2;            H0: μ1 = μ2
   SUBC> alte 0.                   HA: μ1 ≠ μ2

   MTB > twos 95 c1 c2;            H0: μ1 = μ2
   SUBC> alte 1.                   HA: μ1 > μ2
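
To see how the choice of -1, 0, or 1 changes the p-value for the same test statistic, here is a minimal Python sketch. It is an illustration only: the Student t distribution function is approximated by Simpson's rule using the standard library, and the names `t_cdf` and `p_value` are hypothetical helpers, not Minitab commands.

```python
import math

def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t, by Simpson's rule on the density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps                  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def p_value(t, df, alternative):
    """alternative: -1 for HA '<', 0 for HA 'not equal', 1 for HA '>'."""
    if alternative == -1:
        return t_cdf(t, df)
    if alternative == 1:
        return 1 - t_cdf(t, df)
    return 2 * (1 - t_cdf(abs(t), df))

# Example 1 values: T = -0.86 with DF = 16
for code in (-1, 0, 1):
    print(code, round(p_value(-0.86, 16, code), 2))
```

With the Example 1 statistic, the two tailed case (code 0) agrees with the reported P=0.40 to two decimals, and the two one tailed p-values always sum to 1.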

                Ex. 2) Medical researchers recorded blood cholesterol levels
                in mg/dL of 28 heart-attack victims 2 days following the
                heart-attack. The levels of 30 individuals who had not had an
                attack were taken as a control. Test the hypothesis at the
                10% LOS that heart-attack victims suffer from a higher
                cholesterol count than those without heart-attacks. Build and
                interpret a 90% confidence interval about the mean
                difference in cholesterol levels between the 2 populations.

                Step 1a)         Are the two samples independent?

                 Independent data sets: 2 different populations of patients,
                 one with heart attacks, the other without heart attacks.       1pt

                Step 1b)         Are the samples drawn from normal populations?

                Then: MTB > norm c1    and    MTB > norm c2

                [Figure: two normal probability plots, Percent versus cholesterol level]
                Probability Plot of 2 day (Normal): Mean 253.9, StDev 47.71, N 28, AD 0.448, P-Value 0.259
                Probability Plot of control (Normal): Mean 193.1, StDev 22.30, N 30, AD 0.616, P-Value 0.099

                Ex. 2) Preliminary Checks:
                Minitab answers with normality test p-values, which we incorporate into
                a brief answer using correct notation:

                Both data sets are normally distributed because:                2pts
                     p1 > α                p2 > α
                     0.259 > 0.01          0.099 > 0.01

                Step 2: Declare the Parameters of Interest:                     2pts

                     μ1 = Mean cholesterol level in mg/dL for heart-attack victims.
                     μ2 = Mean cholesterol level in mg/dL for controls.

Step 3: State the Hypothesis and the LOS:                             3pts

     H0: μ1 = μ2
     HA: μ1 > μ2
     α = 0.10

Steps 4-6: Use Minitab to run a hypothesis test and calculate the confidence interval

      MTB > twos 90 c4 c5;                                            2pts
      SUBC> Alte 1.

Two-Sample T-Test and CI: Heart Attack, Controls

                  N    Mean     StDev    SE Mean
Heart Attack     28   253.9      47.7        9.0
Controls         30   193.1      22.3        4.1

Difference = mu (Heart Attack) - mu (Controls)
Estimate for difference: 60.7952
90% lower bound for difference: 48.5037
T-Test of difference = 0 (vs >):
T-Value = 6.15 P-Value = 0.000 DF = 37

Note that only a lower bound, not a confidence interval, is reported here
because this was a one tailed test. Only 2 tailed hypothesis tests will
report confidence intervals.

The p-value can never be exactly zero; it is very small and has been rounded
to zero at the 3rd decimal. We must conclude that p < 0.001.


Step 7: Conclusion.

      Stat: At the 10% LOS we reject H0: μ1 = μ2 because:                3pts
            t-sample = 6.15 and p < α
            (p < 0.001) < 0.10

      English:                                                           1pt
       Heart attack patients have a significantly higher mean cholesterol level than controls.

Find a 90% confidence interval on the mean cholesterol difference between
the 2 populations. Interpret the interval.

      MTB > twos 90 c4 c5

Two-Sample T-Test and CI: Heart Attack, Controls

                  N    Mean     StDev    SE Mean
Heart Attack     28   253.9      47.7        9.0
Controls         30   193.1      22.3        4.1

Difference = mu (Heart Attack) - mu (Controls)
Estimate for difference: 60.7952
90% CI for difference: (44.1047, 77.4857)
T-Test of difference = 0 (vs not =):
T-Value = 6.15 P-Value = 0.000 DF = 37

      “The 90% confidence interval in the difference between the cholesterol
       levels of heart attack patients and controls is (44.1047, 77.4857) mg/dL.
       The cholesterol level of heart attack patients is higher than that of the
       controls by at least 44.1 mg/dL but not more than 77.5 mg/dL.”     3pts

        Dependent Samples: Recognize the wording for a paired data set.

Ex 3. An experiment was designed to estimate the mean difference in weight gain in pounds for pigs
fed ration A as compared to those fed ration B. Eight pairs of pigs were used. The pigs within each pair
were littermates. The rations were assigned at random to the two animals within each pair. The gains (in
pounds) after 45 days are shown in the following table. Note the pairing of the littermates. The
designers of this experiment are coupling the littermates for their similarities in an attempt to reveal any
differences in weight gained between similar littermates consuming two different rations. We can treat this
as one population subjected to two measurements where we will focus on the differences found between
each pair of littermates. This is paired data.

[Figure: Differences in Weight Gain: Ration A - Ration B (vertical axis: Difference in Pounds)]

         ROW   RationA   RationB
         =======================
           1        65        58
           2        37        39
           3        40        31
           4        47        45
           5        49        47
           6        65        55
           7        53        59
           8        59        51

 For paired data we must create a third column of differences for hypothesis
 testing. The difference for our class will always be:

         Difference = First data set – Second data set

         MTB > Let C3 = C1 - C2

         ROW   RationA   RationB   (diff.)
         =================================
           1        65        58         7
           2        37        39        -2
           3        40        31         9
           4        47        45         2
           5        49        47         2
           6        65        55        10
           7        53        59        -6
           8        59        51         8

 Perform a complete hypothesis test using Minitab to determine if the pigs that
 are fed ration B gain less weight than those that are fed ration A. Include a
 95% confidence interval on the mean difference of gained weight.

  Step 1: Preliminary Checks:

     Are the data sets independent?                                1pt
        No, the littermates are paired; this is dependent data.

     Is the data set normally distributed?                         2pts
        The column of differences is normally distributed because: p > α
        (p = 0.40) > α

  Step 2: Declare the parameter: 2pts

        μd = Mean difference in weight gained in pounds between pig littermates
        consuming Ration A – Ration B.


  Step 3: State the Hypothesis and the LOS: 3pt

        H0: μd = 0
        HA: μd > 0
        LOS = 0.05

  Caution! A common mistake is choosing the wrong inequality for the alternate
  hypothesis. Ask yourself: is this a Large – Small scenario or a Small – Large
  scenario? Consider this case: “determine if the pigs that are fed ration B gain
  less weight than those that are fed ration A”. With the difference defined as
  Ration A – Ration B, this scenario fits Large – Small > 0 for HA.

    Step 4-6: Calculate the test statistic 2pts

    MTB > ttest C3;
    SUBC> Alte 1.

    One-Sample T: C3
    Test of mu = 0 vs > 0
                                                   95% Lower
    Variable   N      Mean     StDev   SE Mean       Bound      T       P
    C3         8   3.75000   5.72588   2.02440    -0.08539   1.85   0.053


    Step 7: State the conclusion with Statistics and English 4pts

    Statistics: At the 5% LOS we do not reject H0: μd = 0, because
                   t-sample = 1.85 and p > α
                   (p = 0.053) > 0.05

    English: There is not a significant difference in weight gained between littermates on either
             ration.

  Find a (1 - α)100% CI on the mean difference of weight gained between littermates on Ration
  A – Ration B. Write a summary statistical sentence, and then an English sentence for the CI.

    MTB > tint 95 c3

    One-Sample T: C3
    Variable   N      Mean     StDev    SE Mean           95% CI
    C3         8   3.75000   5.72588    2.02440    (-1.03696, 8.53696)


    Stat: The 95% CI on the mean difference in weight gained between littermates on either
           ration is (-1.03696, 8.53696) pounds. 2pt

    English: There is no significant difference in weight gained between littermates fed on either
           Ration. 2pt
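
The paired analysis above can be reproduced from the raw table as a check. This is a minimal Python sketch using only the standard library. The two critical values for 7 degrees of freedom are read from a t-table and are assumed inputs, so the bounds match the Minitab output only to about two decimals.

```python
import math
import statistics

ration_a = [65, 37, 40, 47, 49, 65, 53, 59]
ration_b = [58, 39, 31, 45, 47, 55, 59, 51]

# Column of differences, Ration A - Ration B (the C3 column)
diffs = [a - b for a, b in zip(ration_a, ration_b)]

n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)          # sample standard deviation
se_d = sd_d / math.sqrt(n)              # standard error of the mean difference
t_sample = mean_d / se_d                # tests H0: mean difference = 0

# Critical values for 7 df read from a t-table (assumed inputs)
T_ONE_TAIL_05 = 1.895                   # upper 5% point, for the lower bound
T_TWO_TAIL_05 = 2.365                   # upper 2.5% point, for the 95% CI

lower_bound = mean_d - T_ONE_TAIL_05 * se_d
ci = (mean_d - T_TWO_TAIL_05 * se_d, mean_d + T_TWO_TAIL_05 * se_d)

print(diffs)
print(f"mean={mean_d:.2f} sd={sd_d:.3f} se={se_d:.3f} t={t_sample:.2f}")
print(f"95% lower bound={lower_bound:.2f}  95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

Working on the single column of differences is what makes this a one sample t procedure, even though two rations were measured.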


Note: If the Confidence Interval does not bracket zero, there will be a significant difference and
the bounds of the CI must be used to explicitly state what that difference is. For example suppose
the 95% CI on μd was (1.23, 3.04). The English conclusion would read: Pigs consuming Ration
A gained more weight than their littermates consuming Ration B by at least 1.23 pounds but not
more than 3.04 pounds.
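
The rule in this note can be written out as a small decision function. The sketch below is illustrative only; the function name `interpret_ci` and its argument names are hypothetical, and the interval is assumed to be on (first group minus second group).

```python
def interpret_ci(low, high, first, second, units):
    """Turn a CI on (first - second) into the lab's English sentence."""
    if low <= 0 <= high:
        # Interval brackets zero: no significant difference
        return f"There is no significant difference between {first} and {second}."
    if low > 0:
        # Entire interval above zero: the first group is the larger one
        return (f"{first} exceeds {second} by at least {low:g} {units} "
                f"but not more than {high:g} {units}.")
    # Entire interval below zero: the second group is the larger one
    return (f"{second} exceeds {first} by at least {-high:g} {units} "
            f"but not more than {-low:g} {units}.")

print(interpret_ci(-1.04, 8.54, "Ration A gain", "Ration B gain", "pounds"))
print(interpret_ci(1.23, 3.04, "Ration A gain", "Ration B gain", "pounds"))
```

The third branch covers an interval that lies entirely below zero, where the bounds swap roles and change sign when the sentence is phrased in terms of the larger group.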

Lab 8 Exercises.
Use Minitab to run an appropriate t-test on the given data sets. Keep in mind that you will
have to distinguish between dependent and independent scenarios in order to choose the
correct test. After each hypothesis test, write a (1 – α) x 100% CI (the complement of α). The
confidence interval should be reported with a statistical statement and an English sentence
interpretation of the bounds of the interval. See the example problems if you need help.


Lab 8.1)          Data Columns C8 & C9:
Researchers studied the effect of fertilizer on radish growth as measured in sprout height in
inches. They randomly selected some radish seeds to serve as controls in Group 1, while
Group 2 radish seeds were planted in planters to which fertilizer was added. Other conditions
were held constant between the two groups. Use an appropriate t-test to test the hypothesis
that fertilizer enhances radish growth. Use α = 0.05. Determine a 95% CI on the mean.

Lab 8.2)        Data Columns C11 & C12
A plant physiologist conducted an experiment to determine whether mechanical stress can
retard the growth of soybean plants. Young plants were randomly allocated to two groups of
13 plants each. Plants in group 1 were mechanically agitated by shaking for 20 minutes twice
daily, while plants in group 2 were not agitated. After 16 days of growth, the total stem
length in cm of each plant was measured, with the results given in the Minitab file. Use an
appropriate t-test to test the hypothesis that stress tends to retard plant growth. Use α = 0.01.
Determine a 99% CI on the mean.

Lab 8.3)        Data Columns C14 & C15
Androstenedione is a steroid that is thought by some athletes to increase strength. Researchers
investigated this claim by giving androstenedione to 10 men in Group 1 and a placebo to
another 10 men in Group 2. One of the variables measured in the experiment was the
increase in “lat pulldown” strength in pounds of each subject after 4 weeks. Use an
appropriate t-test to test the hypothesis that taking androstenedione improves an athlete's
strength. Use α = 0.03. Determine a 97% CI on the mean.

Lab 8.4)       Data Columns C17 & C18
Octane is a measure of how much fuel can be compressed before it spontaneously ignites.
Researchers randomly selected 11 individuals to drive their cars and record the difference in
mileage gained by each car run on 10 gallons of gas at 2 different octane ratings. 87 octane
mileages were assigned to Group 1 while 92 octane mileages were assigned to Group 2. Use
an appropriate t-test to test the hypothesis that higher octane levels in gas result in a higher
gas mileage. Use α = 0.01. Determine a 99% CI on the mean.

Lab 8.5)       Data Columns C20 & C21
An automotive researcher wanted to estimate the braking distance in feet while traveling at
40 miles per hour on wet versus dry pavement. The researcher used 8 different cars with the
same driver and tires. Wet braking distances were assigned to Group 1 while dry braking
distances were assigned to Group 2. Use an appropriate t-test to test the hypothesis that wet
pavement results in a longer braking distance. Use α = 0.20. Determine an 80% CI on the
mean.

								