STAT 601 – Assignment #4 (Due 07/18/11) (80 pts.)
Review the following materials:
1) Narrated Powerpoint Lectures
Comparing Two Populations with a Numeric Response
a) independent samples
b) dependent samples
Comparing Two Populations with a Nominal/Categorical Response
a) independent samples (z-test, Fisher's Exact test, CI’s for RR & OR)
b) dependent samples (FYI ONLY – NOT ON ASSIGNMENT)
2) Non-narrated Powerpoint Lectures (same as above)
3) Lecture Handouts
9 – Statistical Inference sections 9.5 – 9.8 only.
4) JMP tutorials below by the Assignment 4 link.
JMP Demo - Two Sample Pooled t-Test
JMP Demo - Two Sample Non-pooled t-Test
JMP Demo - Paired t-Test
RESEARCH ARTICLE REVIEW PROBLEMS
1) Use Table VI in the paper entitled “Perceived Coping MI” to answer the following questions:
a) Were the groups in this study independent or dependent? Provide a
rationale for your answer.
b) Examine the t-ratios (i.e. t-statistics) in Table VI. Which t-ratio
indicates the largest difference between the males and females post
MI in this study? Is this ratio significant? Provide a rationale for your answer.
c) What is a Type I error? Is there a risk of Type I error in this study?
Provide a rationale for your answer.
d) The authors reported multiple df (degrees of freedom) values in Table
VI. Why were different df values reported for this study?
e) What does the t-value for the Physical Component Score tell you
about men and women post MI? If this result was consistent with
previous research, how might you use this knowledge in your practice?
2) Using the article labeled “Transfer Anxiety MI” answer the following questions:
a) The baseline anxiety and information scores were not significantly
different between the experimental and comparison groups. What
does this mean? Does this strengthen or weaken the results of the
study? Provide a rationale for your answer.
b) The results indicated a statistically significant difference between the
anxiety scores for the two groups of patients (t = 3.875, p < .0001). What
do these results mean?
c) The results indicated a statistically significant difference between the
anxiety scores for the two groups of patients (t = 3.875, p < .0001).
Are these results also significant at the .01 level? Provide a rationale
for your answer.
d) How might you use these study findings in practice?
3) Using the article titled “Effect of Health Promo CV Risk” answer the following questions:
a) What are the two groups whose results are reflected by the t-ratios in
Tables 2 & 3?
b) Which t-ratio in Table 2 represents the greatest relative or
standardized difference between the pretest and 3 months outcomes?
Is this t-ratio statistically significant? Provide a rationale for your answer.
c) Which t-ratio in Table 3 represents the smallest relative difference
between the pretest and 3 months? Is this t-ratio statistically
significant? What does this result mean?
d) Compare the 3 months and 6 months t-ratios for the variable Exercise
from Table 3. What is your conclusion about the long-term effect of
the health-promotion intervention on Exercise in this study?
f) Why are the largest t-ratios more likely to be statistically significant?
g) Did the health-promotion program have a statistical effect on Systolic
blood pressure (BP) in this study? Provide a rationale for your answer.
Problems 1 – 4 deal with comparing two population means using
independent and dependent samples.
1 - Preeclampsia and Gestational Age (this example is worked through in the
narrated Powerpoint, see if you can reproduce the results)
The goal of a study conducted by Baker et al. was to determine whether medical
deformation alters in vitro effects of plasma from patients with preeclampsia on
endothelial cell function to produce a paradigm similar to in vivo disease state. Subjects
were 24 nulliparous pregnant women before delivery, of whom 12 had preeclampsia and
12 were normal pregnant patients. The patients were independently sampled from these
populations and were not matched according to any criteria. Among the data collected
were the gestational ages (in weeks) at delivery.
Research Question: Is there evidence to suggest that the mean gestational age at delivery
for mothers with preeclampsia is lower than that for mothers with a normal pregnancy?
Use JMP to analyze these data. You can enter the data in JMP yourself. You will need
two columns, one to denote the group and the other to contain the response, in this case
gestational age at birth. Be sure to check assumptions and perform your analysis
Preeclampsia: 38, 32, 42, 30, 38, 35, 32, 38, 39, 29, 29, 32
Normal: 40, 41, 38, 40, 40, 39, 39, 41, 41, 40, 40, 40
a) Perform a hypothesis test to answer the question of interest and summarize your findings.
b) Find and report the 95% CI for the difference in the population means from the JMP
output. Discuss this interval in practical terms. (2 pts.)
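As a cross-check on the JMP output, the test statistic can be reproduced by hand. The following is a minimal sketch in Python (not part of the assigned JMP workflow) that computes the non-pooled (Welch) t statistic and its approximate degrees of freedom from the data listed above. The helper name `welch_t` is illustrative only. Because both groups have n = 12, the pooled t statistic has the same value; only the degrees of freedom differ (22 for the pooled test). The p-value and CI still have to come from a t distribution, e.g. from JMP.

```python
from math import sqrt
from statistics import mean, variance

# Gestational ages (weeks) copied from the problem statement.
preeclampsia = [38, 32, 42, 30, 38, 35, 32, 38, 39, 29, 29, 32]
normal = [40, 41, 38, 40, 40, 39, 39, 41, 41, 40, 40, 40]


def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite df) for mean(x) - mean(y)."""
    # Squared standard error contributed by each sample.
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t_stat, df = welch_t(preeclampsia, normal)
print(f"mean difference = {mean(preeclampsia) - mean(normal):.3f} weeks")
print(f"t = {t_stat:.3f}, approximate df = {df:.1f}")
```

A negative t statistic here corresponds to a lower sample mean in the preeclampsia group, which matches the direction of the research question.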
2 - DHEAS Levels in Asthmatics
Data File: Asthma.JMP
In a study to explore the possibility of hormonal alteration in asthma, Weinstein et al.
collected data on 22 postmenopausal women with asthma and 22 age-matched
postmenopausal women without asthma.
Perform the appropriate analysis of these data to answer the following research question:
Is there evidence to suggest that postmenopausal women with asthma have significantly
higher levels of dehydroepiandrosterone sulfate (DHEAS)?
a) Do you think that age-matching to create pairing is valid? Why or why not? (2 pts.)
b) Use an appropriate test and supporting CI to answer the research question. Summarize
your findings. (5 pts.)
3 – Comparisons of the Mean Infant Birth Weight for Different Populations of Mothers
Data File: NCBirth.JMP
In this problem you will use comparative methods to compare the actual mean birth
weights of different populations of mothers. The results of your comparisons will be
contained in the table below. For each situation be sure to check assumptions and briefly
summarize your findings in that regard.
Use appropriate statistical methods to make comparisons of mean birth weight across the
two populations defined by the variables below:
Sex of child (1 = male, 2 = female)
Marital status (1 = married, 2 = not married)
White? (Non-white vs. White)
Hispanic? (Hisp vs. non-Hisp)
Smokers vs. non-smokers
a) Use both hypothesis tests and confidence intervals to compare the mean birth weights
of the infants born to the two populations defined by the factors above. To organize your
results enter them into the table below. For the p-value and CI columns you will need to
enter the p-value from the appropriate test for comparing the two population means for
each factor and the confidence interval for the difference in those population means, thus
for each factor you will only have one p-value and confidence interval. Report the
sample size, sample mean, and sample standard deviation (SD) for each level of the
factor. (15 pts.)
Factor       | Sample Size (n) | Sample Mean | SD | p-value | CI for Difference in Population Means
Sex of Child |                 |             |    |         | (-100, 20)
b) Briefly comment on the assumptions required for the analyses you conducted in
completing the table. Are the assumptions satisfied for each factor? (2 pts.)
c) Summarize your findings from part (a) in a clearly written paragraph, citing p-values
and confidence intervals as needed. (5 pts.)
4 - Middle Ear Effusion in Breast-Fed and Bottle-Fed Infants
A common symptom of otitis media in young children is the prolonged presence of
fluid in the middle ear, known as middle-ear effusion. The presence of fluid may result in
temporary hearing loss and interfere with normal learning skills in the first two years of
life. One hypothesis is that babies who are breast-fed for at least 1 month build up some
immunity against the effects of the disease and have less prolonged effusion than do
bottle-fed babies. A small study of 24 pairs of babies is set up, where the babies are
matched on a one-to-one basis according to age, sex, socioeconomic status, and type of
medications taken. One member of the matched pair is a breast-fed baby, and the other
member is a bottle-fed baby. The outcome variable is the duration of middle-ear effusion
after the first episode of otitis media. The results are shown below.
Pair Number   Duration of effusion in breast-fed baby   Duration of effusion in bottle-fed baby   Difference
1 20 18
2 11 35
3 3 7
4 24 182
5 7 7
6 28 33
7 58 223
8 39 57
9 17 76
10 17 186
11 12 29
12 52 39
13 14 15
14 12 21
15 30 28
16 7 8
17 15 27
18 65 77
19 10 12
20 7 8
21 19 16
22 34 28
23 25 20
Do these data provide evidence that breast-fed babies have shorter durations of effusion
when compared to bottle-fed babies that are the same age, sex, socioeconomic status, and
on the same medications? Enter these data into JMP and conduct the appropriate
analysis. (6 pts.)
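To fill in the Difference column and check a paired analysis by hand, the sketch below computes the within-pair differences (breast-fed minus bottle-fed) and the paired t statistic. It is a rough Python illustration, not the required JMP analysis, and it uses only the 23 pairs actually listed in the table above. Printing the differences is also a quick way to look for skewness or outliers before relying on a t-based procedure.

```python
from math import sqrt
from statistics import mean, stdev

# (breast-fed, bottle-fed) durations for the 23 pairs listed in the table.
pairs = [
    (20, 18), (11, 35), (3, 7), (24, 182), (7, 7), (28, 33),
    (58, 223), (39, 57), (17, 76), (17, 186), (12, 29), (52, 39),
    (14, 15), (12, 21), (30, 28), (7, 8), (15, 27), (65, 77),
    (10, 12), (7, 8), (19, 16), (34, 28), (25, 20),
]

# Difference = breast-fed minus bottle-fed; negative values mean the
# breast-fed baby in the pair had the shorter effusion.
diffs = [breast - bottle for breast, bottle in pairs]

n = len(diffs)
mean_d = mean(diffs)
sd_d = stdev(diffs)
t_stat = mean_d / (sd_d / sqrt(n))  # paired t statistic, df = n - 1

print("differences:", diffs)
print(f"n = {n}, mean difference = {mean_d:.2f}, SD = {sd_d:.2f}")
print(f"paired t = {t_stat:.3f} on {n - 1} df")
```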
PROBLEMS 5 & 6 DEAL WITH COMPARING PROPORTIONS and
INFERENCE FOR ODDS RATIOS (OR) AND RELATIVE RISKS
5 - Prostate-Specific Antigen (PSA) Levels and Cancer Diagnosis
Babaian et al. "The Role of Prostate-Specific Antigen as Part of the Diagnostic Triad and
as a Guide When to Perform a Biopsy", Cancer, 68, (1991) state that prostate-specific
antigen (PSA), found in the ductal epithelial cells of the prostate, is specific for prostatic
tissue and is detectable in serum from men with normal prostates and men with either
benign or malignant diseases of this gland. They determined the PSA values in a sample of
124 men who underwent a prostate biopsy. Sixty-seven of the men had elevated PSA
values (> 4 ng/ml). Of these, 46 were diagnosed as having cancer. Ten of the 57 men
with PSA values < 4 ng/ml had cancer. On the basis of these data may we conclude that,
in general, men with elevated PSA values are more likely to have prostate cancer? Let
Use inferential methods based upon the standard normal distribution and Fisher's Exact Test
to test the hypothesis of interest.
Data File: PSA-Cancer.JMP
a) Standard normal test and CI for the difference in the population proportions. (4 pts.)
b) Fisher's Exact Test (2 pts.)
c) Summarize your findings. (2 pts.)
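The counts given in the problem (46 of 67 men with elevated PSA and 10 of 57 men with lower PSA diagnosed with cancer) are enough to check both tests by hand. The sketch below is one possible Python cross-check, assuming a one-sided alternative (elevated PSA implies a higher cancer proportion), a pooled standard error for the z-test, and an unpooled standard error for the 95% CI. JMP may report slightly different intervals if it uses another method.

```python
from math import comb, sqrt
from statistics import NormalDist

x1, n1 = 46, 67  # cancer cases among men with elevated PSA
x2, n2 = 10, 57  # cancer cases among men with lower PSA
p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Two-proportion z-test using the pooled proportion under H0: p1 = p2.
p_pool = (x1 + x2) / (n1 + n2)
z = diff / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_z = NormalDist().cdf(-z)  # upper-tail p-value, P(Z >= z)

# 95% CI for p1 - p2 using the unpooled standard error.
z_crit = NormalDist().inv_cdf(0.975)
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (diff - z_crit * se, diff + z_crit * se)

# One-sided Fisher's Exact Test: probability of 46 or more cancer cases in
# the elevated-PSA group, with all table margins held fixed.
cases, total = x1 + x2, n1 + n2
p_fisher = sum(
    comb(n1, k) * comb(n2, cases - k) for k in range(x1, min(n1, cases) + 1)
) / comb(total, cases)

print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, difference = {diff:.3f}")
print(f"z = {z:.2f}, one-sided p = {p_z:.2e}")
print(f"95% CI for difference: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Fisher one-sided p = {p_fisher:.2e}")
```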
6 – HIV Status & IV Drug Use History of Women in NY Prison System
In a study of HIV infection among women entering the New York State prison system,
475 inmates were cross-classified with respect to HIV seropositivity and their histories of
intravenous drug use. The variables you will be working with are coded as follows:
• IV Drug Use – indicator of previous intravenous drug use (Yes or No)
• HIV Status – results of HIV seropositivity test (positive or negative)
and the study results are contained in the data file: Prison HIV-Drug Use.JMP .
Research Question: Is there evidence that intravenous drug use is associated with HIV status?
a) Among women who have used drugs intravenously, what proportion are HIV-
positive? Among women who have not used drugs intravenously, what proportion are
HIV-positive? (2 pts.)
b) Use Fisher’s Exact Test to determine if being HIV-positive is positively associated
with a previous history of intravenous drug use for this population of women. State your
conclusion along with a supporting p-value. (2 pts.)
c) Find a 95% CI for the risk difference and interpret. This difference is also referred to
as the attributable risk (AR) = p_exposed - p_unexposed. (3 pts.)
d) Use your answers to calculate the relative risk (RR) for being HIV-positive associated
with intravenous drug use for this population of women. Also find a 95% CI for the RR.
Interpret. (4 pts.)
e) Compute the odds ratio (OR) for being HIV-positive associated with intravenous drug
use for this population of women. Also find a 95% CI for the OR. Interpret. (4 pts.)
f) Number Needed to Harm (NNH) – Go to the following website, which is the
first hit when you Google Search: Number Needed to Harm.
Read through the Wikipedia entry on this website and then find the Number Needed to
Harm for this study. (2 pts.)
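The measures in parts (c) through (f) all come from the same 2x2 table, so it can help to see them computed side by side. The sketch below is a hedged Python illustration, not the JMP procedure: the function name `two_by_two_measures` is made up for this example, the counts passed to it are hypothetical (they are NOT the prison study data, which are in the JMP file), and the intervals use the common large-sample (Wald and log-scale) formulas, which may differ slightly from what JMP reports.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def two_by_two_measures(a, b, c, d, level=0.95):
    """Risk difference, RR, OR (each with a CI) and NNH from a 2x2 table.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_exp, p_unexp = a / (a + b), c / (c + d)

    # Risk difference (attributable risk) with a Wald interval.
    rd = p_exp - p_unexp
    se_rd = sqrt(p_exp * (1 - p_exp) / (a + b) + p_unexp * (1 - p_unexp) / (c + d))

    # Relative risk and odds ratio with intervals built on the log scale.
    rr = p_exp / p_unexp
    se_log_rr = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    return {
        "RD": (rd, rd - z * se_rd, rd + z * se_rd),
        "RR": (rr, exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr)),
        "OR": (odds_ratio,
               exp(log(odds_ratio) - z * se_log_or),
               exp(log(odds_ratio) + z * se_log_or)),
        "NNH": 1 / rd,  # number needed to harm = 1 / risk difference
    }


# Hypothetical counts for illustration only: 30 of 100 exposed and
# 10 of 100 unexposed subjects have the outcome.
result = two_by_two_measures(30, 70, 10, 90)
for name, value in result.items():
    print(name, value)
```

With these made-up counts the risk difference is 0.20, so the NNH is 5: on average, one additional outcome for every five exposed subjects.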