Biom 109: Lab 8
Comparison of 2 Means: Independent vs Dependent Samples

Lab 8 Goals:
Use Minitab to run hypothesis tests comparing 2 populations.
Determine if the comparison between the 2 populations is dependent or independent.

Open the Lab 8 Minitab file at the following website: www.humboldt.edu/~tsp1 (Biom 109, Data files, Lab_8)

Ex. 1) Clones are genetically identical cells descended from the same individual. Researchers have identified a single poplar clone that yields fast-growing, hardy trees. These trees may one day serve as an alternative to conventional fuel as an energy resource. Researchers at The Pennsylvania State University planted Poplar Clone 252 on two different sites: one, a rich site by a creek (Site 1), and the other, a dry, sandy site on a ridge (Site 2). Let’s examine whether there is a significant difference in the mean tree diameter in inches between the two sites. Run a hypothesis test and build a confidence interval. Interpret the result.

Step 1: Preliminary testing of samples:
Whether building a confidence interval or performing a hypothesis test, when faced with 2 samples the first step is to determine independence and normality.

1a) Are the two samples independent?
Consider whether the measurement of a sample from one group could possibly be interpreted as a measurement from the second group. If the data sets are paired in a “before versus after” or a “method A versus method B” scenario, then we have a dependent data set and must apply a one sample t-test on the column of differences. If, however, the two populations are separate from one another (different populations measured in different locations), then we have independent data sets and must apply a two sample t-test.
Independent data sets: 2 different populations of trees, measured at 2 different sites. (1pt)

1b) Are the samples drawn from normal populations?
We must address normality for any hypothesis involving a population mean.
If we are given summarized data then we have no choice but to state “We assume normality.” But if we are given raw data, as in this case, we use Minitab to determine whether our data sets are normally distributed.

Step 1: Preliminary testing of samples, continued:
Then:
MTB > norm c1
MTB > norm c2

[Figure: normal probability plots (Percent vs. diameter) for Creek Site 1 and Sandy Site 2]
Probability Plot of Creek Site 1: Mean 2.598, StDev 0.9158, N 10, AD 0.494, P-Value 0.164
Probability Plot of Sandy Site 2: Mean 3.028, StDev 1.284, N 10, AD 0.357, P-Value 0.379

Minitab answers with normality test p-values, which we incorporate into a brief answer using correct notation:
Both data sets are normally distributed because: (2pt)
$p_1 > \alpha$: 0.164 > 0.01
$p_2 > \alpha$: 0.379 > 0.01

Step 2: Declare the Parameters of Interest: (2pt)
$\mu_1$ = Mean Poplar tree diameter in inches for trees grown at the creek site.
$\mu_2$ = Mean Poplar tree diameter in inches for trees grown at the ridge site.

Step 3: State the Hypothesis and the LOS: (3pts)
$H_0: \mu_1 = \mu_2$
$H_A: \mu_1 \ne \mu_2$
$\alpha = 0.05$

Steps 4-6: Use Minitab to run a hypothesis test and calculate the confidence interval (2pts)
MTB > twos 95 c1 c2;
SUBC> Alte 0.

TWOSAMPLE T FOR Diameter
Site   N    MEAN   STDEV  SE MEAN
1     10   2.598   0.916     0.29
2     10    3.03    1.28     0.41

95 PCT CI FOR MU 1 - MU 2: ( -1.49, 0.63)
TTEST MU 1 = MU 2 (VS NE): T= -0.86 P=0.40 DF= 16

Step 7: Conclusion.
Stat: At the 5% LOS we do not reject $H_0: \mu_1 = \mu_2$ because $t_{Sample} = -0.86$ and $p > \alpha$: (p = 0.40) > 0.05 (3pts)
English: There is not a statistically significant difference in mean tree diameter between the two sites. (1pt)

Find a 95% confidence interval on the mean difference of the tree widths.
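The DF = 16 in the output above (rather than 10 + 10 - 2 = 18) suggests that Minitab used the unpooled (Welch) two-sample procedure. As a rough hand check under that assumption, using the means and standard deviations printed on the probability plots and a t-table critical value $t_{0.025,16} = 2.120$ (a value the handout does not show), the reported numbers can be approximately recovered:

```latex
\begin{align*}
SE &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
    = \sqrt{\frac{0.9158^2}{10} + \frac{1.284^2}{10}} \approx 0.499 \\[4pt]
t_{Sample} &= \frac{\bar{x}_1 - \bar{x}_2}{SE}
    = \frac{2.598 - 3.028}{0.499} \approx -0.86 \\[4pt]
df &= \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
    \approx 16.3 \quad (\text{truncated to } 16) \\[4pt]
(\bar{x}_1 - \bar{x}_2) \pm t_{0.025,16}\, SE
   &= -0.430 \pm 2.120\,(0.499) \approx (-1.49,\ 0.63)
\end{align*}
```

The interval contains zero, which agrees with the decision not to reject $H_0$ at the 5% LOS.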
(From Minitab output) “The 95% confidence interval on the difference in mean tree diameter, Creek site - Sandy ridge site, is (-1.49, 0.63) inches. There is not a significant difference in mean tree diameter between the two sites.” (2pts)

Interpreting the CI: Note that as long as the confidence interval straddles zero our conclusion is brief. It is when the confidence interval does not contain zero that we must specify which population is larger or smaller than the other and by how much.

Recall from lab 9 that the inequality sign used in the alternate hypothesis is indicated with a sub-command of “alternative” with the number -1, 0, or 1, as shown in the table below.

Minitab code:                  Interpretation:
MTB > twos 95 c1 c2;           $H_0: \mu_1 = \mu_2$
SUBC> alternative -1.          $H_A: \mu_1 < \mu_2$

MTB > twos 95 c1 c2;           $H_0: \mu_1 = \mu_2$
SUBC> alte 0.                  $H_A: \mu_1 \ne \mu_2$

MTB > twos 95 c1 c2;           $H_0: \mu_1 = \mu_2$
SUBC> alte 1.                  $H_A: \mu_1 > \mu_2$

Note: Only the 2 tailed hypothesis will yield a confidence interval. The one tailed hypothesis test will return only an upper or a lower bound.

Ex. 2) Medical researchers recorded blood cholesterol levels in mg/dL of 28 heart-attack victims 2 days following the heart attack. The levels of 30 individuals who had not had an attack were taken as a control. Test the hypothesis at the 10% LOS that heart-attack victims have a higher mean cholesterol level than those without heart attacks. Build and interpret a 90% confidence interval about the mean difference in cholesterol levels between the 2 populations.

Step 1a: Are the two samples independent?
Independent data sets: 2 different populations of patients, one with heart attacks, the other without heart attacks. (1pt)

Step 1b) Are the samples drawn from normal populations?
Then:
MTB > norm c1
MTB > norm c2

[Figure: normal probability plots (Percent vs. cholesterol level) for 2 day and control]
Probability Plot of 2 day: Mean 253.9, StDev 47.71, N 28, AD 0.448, P-Value 0.259
Probability Plot of control: Mean 193.1, StDev 22.30, N 30, AD 0.616, P-Value 0.099

Ex. 2) Preliminary Checks:
Minitab answers with normality test p-values, which we incorporate into a brief answer using correct notation:
Both data sets are normally distributed because: (2pts)
$p_1 > \alpha$: 0.259 > 0.01
$p_2 > \alpha$: 0.099 > 0.01

Step 2: Declare the Parameters of Interest: (2pts)
$\mu_1$ = Mean cholesterol level in mg/dL for heart-attack victims.
$\mu_2$ = Mean cholesterol level in mg/dL for controls.

Step 3: State the Hypothesis and the LOS: (3pts)
$H_0: \mu_1 = \mu_2$
$H_A: \mu_1 > \mu_2$
$\alpha = 0.10$

Steps 4-6: Use Minitab to run a hypothesis test and calculate the confidence interval (2pts)
MTB > twos 90 c4 c5;
SUBC> Alte 1.

Two-Sample T-Test and CI: Heart Attack, Controls
               N   Mean  StDev  SE Mean
Heart Attack  28  253.9   47.7      9.0
Controls      30  193.1   22.3      4.1

Difference = mu (Heart Attack) - mu (Controls)
Estimate for difference: 60.7952
90% lower bound for difference: 48.5037
T-Test of difference = 0 (vs >): T-Value = 6.15 P-Value = 0.000 DF = 37

Note that only a lower bound, not a confidence interval, is reported because this was a one tailed test. Only 2 tailed hypothesis tests will report confidence intervals.
The p-value can never be zero; it is very small and rounded to zero at the 3rd decimal. We must conclude that p < 0.001.

Step 7: Conclusion.
Stat: At the 10% LOS we reject $H_0: \mu_1 = \mu_2$ because $t_{Sample} = 6.15$ and $p < \alpha$: (p < 0.001) < 0.10 (3pts)
English: Heart attack patients have a statistically significantly greater mean cholesterol level than controls. (1pt)

Find a 90% confidence interval on the mean cholesterol difference between the 2 populations. Interpret the interval.
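Before reading the interval from Minitab, the T-Value and DF reported above can be cross-checked from the summary statistics alone. The sketch below is not part of the lab and makes three assumptions: that Minitab used the unpooled (Welch) procedure, that it truncates the degrees of freedom to an integer, and that the two-sided 90% critical value for 37 degrees of freedom is about 1.687 (from a t table). The function name `welch_from_summary` is hypothetical.

```python
import math

def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled (Welch) two-sample t statistic from summary statistics.

    Returns (t, df, se). The df is truncated to an integer, which is an
    assumption about how Minitab reports it.
    """
    v1 = sd1 ** 2 / n1          # squared standard error of sample mean 1
    v2 = sd2 ** 2 / n2          # squared standard error of sample mean 2
    se = math.sqrt(v1 + v2)     # standard error of the difference in means
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = math.floor((v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)))
    return t, df, se

# Summary statistics read from the probability plots in Ex. 2
t_sample, df, se = welch_from_summary(253.9, 47.71, 28, 193.1, 22.30, 30)

T_CRIT = 1.687                  # assumed t-table value for a two-sided 90% CI, df = 37
diff = 253.9 - 193.1            # rounded means, so slightly off Minitab's 60.7952
ci_lower = diff - T_CRIT * se
ci_upper = diff + T_CRIT * se

print(f"t = {t_sample:.2f}, df = {df}")
print(f"90% CI: ({ci_lower:.1f}, {ci_upper:.1f}) mg/dL")
```

Because the inputs are rounded, the bounds match Minitab's only to about one decimal place.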
MTB > twos 90 c4 c5

Two-Sample T-Test and CI: Heart Attack, Controls
               N   Mean  StDev  SE Mean
Heart Attack  28  253.9   47.7      9.0
Controls      30  193.1   22.3      4.1

Difference = mu (Heart Attack) - mu (Controls)
Estimate for difference: 60.7952
90% CI for difference: (44.1047, 77.4857)
T-Test of difference = 0 (vs not =): T-Value = 6.15 P-Value = 0.000 DF = 37

“The 90% confidence interval on the difference between the mean cholesterol levels of heart attack patients and controls is (44.1047, 77.4857) mg/dL. The mean cholesterol level of heart attack patients is higher than that of the controls by at least 44.1 mg/dL but not more than 77.5 mg/dL.” (3pts)

Dependent Samples: Recognize the wording for a paired data set.

Ex 3. An experiment was designed to estimate the mean difference in weight gain in pounds for pigs fed ration A as compared to those fed ration B. Eight pairs of pigs were used. The pigs within each pair were littermates. The rations were assigned at random to the two animals within each pair. The gains (in pounds) after 45 days are shown in the following table.

ROW  RationA  RationB
======================
1       65       58
2       37       39
3       40       31
4       47       45
5       49       47
6       65       55
7       53       59
8       59       51

Note the pairing of the littermates. The designers of this experiment are coupling the littermates for their similarities in an attempt to reveal any differences in weight gained between similar littermates consuming two different rations. We can treat this as one population subjected to two measurements, where we focus on the differences found between each pair of littermates. This is paired data.

[Figure: Differences in Weight Gain: Ration A - Ration B (vertical axis: Difference in Pounds)]

For paired data we must create a third column of differences for hypothesis testing. The difference for our class will always be:
Difference = First data set - Second data set

MTB > Let C3 = C1 - C2

ROW  RationA  RationB  (diff.)
==============================
1       65       58       7
2       37       39      -2
3       40       31       9
4       47       45       2
5       49       47       2
6       65       55      10
7       53       59      -6
8       59       51       8

Perform a complete hypothesis test using Minitab to determine if the pigs that are fed ration B gain less weight than those that are fed ration A. Include a 95% confidence interval on the mean difference of gained weight.

Step 1: Preliminary Checks:
Are the data sets independent? (1pt)
No, the littermates are paired; this is dependent data.
Is the data set normally distributed? (2pts)
The column of differences is normally distributed because: $p > \alpha$: (p = 0.40) > $\alpha$

Step 2: Declare the parameter: (2pts)
$\mu_d$ = Mean difference in weight gained in pounds between pig littermates consuming Ration A - Ration B.

Step 3: State the Hypothesis and the LOS: (3pt)
$H_0: \mu_d = 0$
$H_A: \mu_d > 0$
LOS: $\alpha = 0.05$

Caution! A common mistake is choosing the wrong inequality for the alternate hypothesis. Ask yourself: is this a Large - Small scenario or a Small - Large scenario? Consider this case: “determine if the pigs that are fed ration B gain less weight than those that are fed ration A.” This Ration A - Ration B scenario fits Large - Small > 0 for $H_A$.

Step 4-6: Calculate the test statistic (2pts)
MTB > ttest C3;
SUBC> Alte 1.

One-Sample T: C3
Test of mu = 0 vs > 0
                                         95% Lower
Variable  N     Mean    StDev  SE Mean      Bound     T      P
C3        8  3.75000  5.72588  2.02440   -0.08539  1.85  0.053

Step 7: State the conclusion with Statistics and English (4pts)
Statistics: At the 5% LOS we do not reject $H_0: \mu_d = 0$ because $t_{Sample} = 1.85$ and $p > \alpha$: (p = 0.053) > 0.05
English: There is not a significant difference in weight gained between littermates on either ration.

Find a $(1 - \alpha)100\%$ CI on the mean difference of weight gained between littermates on Ration A - Ration B. Write a summary statistical sentence, and then an English sentence for the CI.
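The paired procedure above reduces to a one-sample calculation on the column of differences, which can be sketched with the Python standard library. This is only an illustrative cross-check of the Minitab numbers, not part of the lab; the variable names are hypothetical, and the critical value 2.365 (two-sided 95%, 7 degrees of freedom) is an assumed t-table lookup.

```python
import math
import statistics

# Weight gains in pounds for the eight littermate pairs (from the table above)
ration_a = [65, 37, 40, 47, 49, 65, 53, 59]
ration_b = [58, 39, 31, 45, 47, 55, 59, 51]

# Difference = First data set - Second data set, pair by pair
diffs = [a - b for a, b in zip(ration_a, ration_b)]

n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)      # sample standard deviation (n - 1 divisor)
se_d = sd_d / math.sqrt(n)          # standard error of the mean difference
t_sample = mean_d / se_d            # test statistic for H0: mu_d = 0

T_CRIT = 2.365                      # assumed t-table value: two-sided 95%, df = n - 1 = 7
ci = (mean_d - T_CRIT * se_d, mean_d + T_CRIT * se_d)

print(f"mean = {mean_d:.2f}, sd = {sd_d:.5f}, se = {se_d:.5f}, t = {t_sample:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}) pounds")
```

The p-value is left to Minitab here, since the standard library has no t distribution.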
MTB > tint 95 c3

One-Sample T: C3
Variable  N     Mean    StDev  SE Mean               95% CI
C3        8  3.75000  5.72588  2.02440  (-1.03696, 8.53696)

Stat: The 95% CI on the mean difference in weight gained between littermates on Ration A - Ration B is (-1.03696, 8.53696) pounds. (2pt)
English: There is no significant difference in weight gained between littermates fed on either ration. (2pt)

Note: If the confidence interval does not bracket zero, there is a significant difference and the bounds of the CI must be used to explicitly state what that difference is. For example, suppose the 95% CI on $\mu_d$ was (1.23, 3.04). The English conclusion would read: Pigs consuming Ration A gained more weight than their littermates consuming Ration B by at least 1.23 pounds but not more than 3.04 pounds.

Lab 8 Exercises.
Use Minitab to run an appropriate t-test on the given data sets. Keep in mind that you will have to distinguish between dependent and independent scenarios in order to choose the correct test. After each hypothesis test write a $(1 - \alpha) \times 100\%$ CI (the complement of $\alpha$). The confidence interval should be reported with a statistical statement and an English sentence interpreting the bounds of the interval. See the example problems if you need help.

Lab 8.1) Data Columns C8 & C9: Researchers studied the effect of fertilizer on radish growth as measured in sprout height in inches. They randomly selected some radish seeds to serve as controls in Group 1, while Group 2 radish seeds were planted in planters to which fertilizer was added. Other conditions were held constant between the two groups. Use an appropriate t-test to test the hypothesis that fertilizer enhances radish growth. Use $\alpha = 0.05$. Determine a 95% CI on the mean.

Lab 8.2) Data Columns C11 & C12: A plant physiologist conducted an experiment to determine whether mechanical stress can retard the growth of soybean plants.
Young plants were randomly allocated to two groups of 13 plants each. Plants in Group 1 were mechanically agitated by shaking for 20 minutes twice daily, while plants in Group 2 were not agitated. After 16 days of growth, the total stem length in cm of each plant was measured, with the results given in the Minitab file. Use an appropriate t-test to test the hypothesis that stress tends to retard plant growth. Use $\alpha = 0.01$. Determine a 99% CI on the mean.

Lab 8.3) Data Columns C14 & C15: Androstenedione is a steroid that is thought by some athletes to increase strength. Researchers investigated this claim by giving androstenedione to 10 men in Group 1 and a placebo to another 10 men in Group 2. One of the variables measured in the experiment was the increase in “lat pulldown” strength in pounds of each subject after 4 weeks. Use an appropriate t-test to test the hypothesis that taking androstenedione improves an athlete's strength. Use $\alpha = 0.03$. Determine a 97% CI on the mean.

Lab 8.4) Data Columns C17 & C18: Octane is a measure of how much fuel can be compressed before it spontaneously ignites. Researchers randomly selected 11 individuals to drive their cars and record the difference in mileage gained by each car run on 10 gallons of gas at 2 different octane ratings. 87 octane mileages were assigned to Group 1 while 92 octane mileages were assigned to Group 2. Use an appropriate t-test to test the hypothesis that higher octane levels in gas result in higher gas mileage. Use $\alpha = 0.01$. Determine a 99% CI on the mean.

Lab 8.5) Data Columns C20 & C21: An automotive researcher wanted to estimate the braking distance in feet while traveling at 40 miles per hour on wet versus dry pavement. The researcher used 8 different cars with the same driver and tires. Wet braking distances were assigned to Group 1 while dry braking distances were assigned to Group 2. Use an appropriate t-test to test the hypothesis that wet pavement results in a longer braking distance. Use $\alpha = 0.20$.
Determine an 80% CI on the mean.
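When writing the English sentence for each exercise's CI, the rule used in the examples (an interval that brackets zero means no significant difference; otherwise state which group is larger and by how much) can be sketched as a small helper. The function name `interpret_ci`, its arguments, and the exact wording are illustrative assumptions, not part of the lab or of Minitab.

```python
def interpret_ci(lower, upper, first, second, units):
    """Word a CI on (first - second) following the handout's rule.

    lower, upper: CI bounds on the difference first - second.
    first, second: labels for the two groups; units: measurement units.
    """
    if lower <= 0 <= upper:
        # The interval brackets zero: the conclusion is brief.
        return f"There is no significant difference between {first} and {second}."
    if lower > 0:
        # Whole interval above zero: the first group is larger.
        return (f"{first} is larger than {second} by at least {lower:g} {units} "
                f"but not more than {upper:g} {units}.")
    # Whole interval below zero: the second group is larger.
    return (f"{second} is larger than {first} by at least {-upper:g} {units} "
            f"but not more than {-lower:g} {units}.")

# Ex. 1 interval (brackets zero) and the hypothetical interval from the Ex. 3 note
print(interpret_ci(-1.49, 0.63, "Creek site diameter", "Sandy site diameter", "inches"))
print(interpret_ci(1.23, 3.04, "Ration A gain", "Ration B gain", "pounds"))
```

The helper only formats the sentence; which group counts as first depends on the order of subtraction, which in this class is always First data set - Second data set.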