# Analyze > Descriptive Statistics > Crosstabs

```
GCRC Data Analysis Workshop
Session 5
Notes for using SPSS

Log on to stargate     USER: realclas     PASSWORD: reallyreal

This is an illustration of one strategy for analyzing a dataset from start to finish.

LOW BIRTH WEIGHT Data

Move LBW.sav from s:\gcrc data analysis to the desktop. Open it in SPSS.

Code Sheet for the Variables in the Low Birth Weight Study
Described in Section 1.6.2, page 24, of Hosmer, D.W. and Lemeshow, S., Applied Logistic
Regression, Wiley, 2000

Low birth weight, defined as < 2500 grams, is an outcome that has been of concern to
physicians for years because infant mortality rates and birth defect rates are very high for
low birth weight babies. A woman’s behavior during pregnancy (including diet, smoking
habits, and receiving prenatal care) can alter the chances of carrying the baby to term and,
consequently, of delivering a baby of normal birth weight.

Data were collected as part of a larger study at Baystate Medical Center in Springfield,
Massachusetts. This dataset contains information on 189 births to women seen in the obstetrics
clinic. Fifty-nine of these births were low birth weight. The variables listed below have been
shown in the obstetrical literature to be associated with low birth weight. The goal of the
current study was to determine whether these variables were risk factors in the clinic
population served by Baystate Medical Center. Observed values have been modified somewhat to
protect subject confidentiality.

No.  Variable Description                                   Codes/Values                      Name
1    Identification Code                                    ID Number                         ID
2    Low Birth Weight                                       1 = BWT<=2500g, 0 = BWT>2500g     LOW
3    Age of Mother                                          Years                             AGE
4    Weight of Mother at Last Menstrual Period              Pounds                            LWT
5    Race                                                   1 = White, 2 = Black, 3 = Other   RACE
6    Smoking Status During Pregnancy                        0 = No, 1 = Yes                   SMOKE
7    History of Premature Labor                             0 = None, 1 = One, 2 = Two, etc.  PTL
8    History of Hypertension                                0 = No, 1 = Yes                   HTN
9    Presence of Uterine Irritability                       0 = No, 1 = Yes                   UI
10   Number of Physician Visits During the First Trimester  0 = None, 1 = One, 2 = Two, etc.  FTV
11   Birth Weight                                           Grams                             BWT
12   Log(LWT)                                                                                 lnLWT
13   Log(AGE)                                                                                 lnAGE
14   age*log(AGE)                                                                             ageXlnage
15   LWT*log(LWT)                                                                             lwtXlnlwt
16   SMOKE*AGE                                                                                smokeXage
17   LWT*AGE                                                                                  LWTXage
18   AGE-23.238                                                                               CNTage
19   LWT-129.81                                                                               CNTlwt

Examine the chi-square test for goodness of fit.

Analyze > Nonparametric Tests > Chi-Square
Select RACE into the test variable list and, under Expected Values, select VALUES and
add .8, .1, .1 in that order (to test the null hypothesis that this sample was drawn from a
population that is 80% White, 10% Black, and 10% Other).
NPAR TEST
/CHISQUARE=RACE
/EXPECTED=.8 .1 .1
/MISSING ANALYSIS.

The chi-square statistic (2 df) is 144.2, with p < 0.001, so we reject the null hypothesis.

Now, univariate exploration:
Analyze > Descriptive Statistics > Crosstabs
Add LOW to the column variables and RACE SMOKE PTL HTN FTV UI to the row variables.
Select STATISTICS and check chi-square, then Continue. Select CELLS: under Counts,
select Observed and Expected; under Percentages, select Column. Continue, PASTE, RUN.
CROSSTABS
/TABLES=RACE SMOKE PTL HTN FTV UI BY LOW
/FORMAT= AVALUE TABLES
/STATISTIC=CHISQ
/CELLS= COUNT EXPECTED COLUMN
/COUNT ROUND CELL .

The output contains several chi-square statistics, along with a warning when any
expected cell count is < 5 or there are empty cells:

Pearson’s Chi-Square is used to test the hypothesis that the row and column
variables are independent. It should not be used if any cell has an expected value less
than 1, or if more than 20% of the cells have expected values less than 5. For general
purposes, the significance value is more important than the actual value of the statistic.

Continuity Correction is a correction sometimes applied in the computation of
chi-square for 2 X 2 tables to improve the approximation. Corrected chi-square values are
always smaller than uncorrected values.

Likelihood Ratio is a goodness-of-fit statistic similar to Pearson's chi-square. For large
sample sizes, the two statistics are equivalent. The advantage of the likelihood-ratio
chi-square is that it can be subdivided into interpretable parts that add up to the total. For
smaller sample sizes, this is the statistic to report; since it approaches the Pearson as n
increases, it can be reported in either case.

Fisher’s Exact Test is a test for independence in a 2 X 2 table. It is most useful when the
total sample size and the expected values are small.

Linear-by-Linear Association, also known as the Mantel-Haenszel chi-square test for
trend, is a measure of linear association between the row and column variables in a
crosstabulation. It should not be used for nominal data.

Are there any variables for which we might want to report the Mantel-Haenszel
chi-square test?
Are there any candidates for recoding?
Do these variables all contribute to the risk for low birth weight?
How many EVENTS are there in this data set? What does this suggest about a reasonable
number of explanatory variables?

For the continuous variables, we can compute t tests (with LOW as the grouping
variable). Analyze > Compare Means > Independent-Samples T Test
Select LWT (weight at last menstrual period) and AGE into the test variables and LOW into
the grouping variable. Define Groups: 0 and 1, then Continue.
T-TEST
GROUPS = LOW(0 1)
/MISSING = ANALYSIS
/VARIABLES = LWT AGE
/CRITERIA = CI(.95) .

Do these variables contribute to the risk for low birth weight?

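There are clinical reasons to check for an interaction between age and low weight at the last
menstrual period (LWD). Regress LOW on AGE, LWD, and their product LWD*AGE (LWDXage).
Save the predicted probabilities (SPSS names them PRE_1) and plot them against age.
Note that LWD and LWDXage are not listed in the code sheet above.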
LOGISTIC REGRESSION LOW
/METHOD = ENTER AGE LWD LWDXage
/SAVE = PRED
/CLASSPLOT
/PRINT = GOODFIT SUMMARY CI(95)
/CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

GRAPH /SCATTERPLOT(BIVAR)=AGE WITH PRE_1 BY LWD /MISSING=LISTWISE .

How does age affect the OR for Low weight at last MP?

Now, how would you evaluate the risk factors for low birth weight?

NB: Be sure to SAVE the predicted probabilities, Cook’s distances, DFBETAs, and deviance
residuals; select the options for classification plots, Hosmer-Lemeshow goodness of fit, and
CI for exp(B); and display the LAST iteration.
The classification table gives a sensitivity (proportion of EVENTS correctly classified,
i.e. true positives) of 0.37 and a specificity (proportion of NO EVENTS correctly
classified, i.e. true negatives) of 0.86.

Classification Table(a)

                                        Predicted
                                        LOW                       Percentage
        Observed                        over2500    under2500     Correct
Step 1  LOW         over2500            112         18            86.2
                    under2500           37          22            37.3
        Overall Percentage                                        70.9

a. The cut value is .500

Is age linear in the logit?

LOGISTIC REGRESSION LOW
/METHOD = ENTER AGE LWD LWDXage RACE SMOKE UI HTN ageXlnage
/CONTRAST (RACE)=Indicator(1) /CONTRAST (SMOKE)=Indicator(1)
/SAVE = PRED COOK DFBETA DEV
/CLASSPLOT
/PRINT = GOODFIT SUMMARY CI(95)
/CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

How would you evaluate the low birth weight variable as a continuous measure?

```
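
As a quick cross-check outside SPSS, the sensitivity, specificity, and overall percentage quoted with the classification table can be recomputed from its four cell counts. The sketch below uses Python purely for illustration (the workshop itself uses SPSS). The function name `classification_summary` and the variable names are assumptions introduced here; the counts are copied from the table at the 0.5 cut value.

```python
# Counts copied from the classification table in the notes (cut value 0.5).
# Variable names are illustrative assumptions, not SPSS output labels.
true_neg = 112   # observed over2500, predicted over2500
false_pos = 18   # observed over2500, predicted under2500
false_neg = 37   # observed under2500, predicted over2500
true_pos = 22    # observed under2500, predicted under2500


def classification_summary(tn, fp, fn, tp):
    """Return (sensitivity, specificity, accuracy) as proportions."""
    sensitivity = tp / (tp + fn)               # events correctly classified
    specificity = tn / (tn + fp)               # non-events correctly classified
    accuracy = (tp + tn) / (tn + fp + fn + tp)  # overall proportion correct
    return sensitivity, specificity, accuracy


sens, spec, acc = classification_summary(true_neg, false_pos, false_neg, true_pos)

print(f"events (low birth weight): {true_pos + false_neg}")
print(f"sensitivity: {sens:.3f}")
print(f"specificity: {spec:.3f}")
print(f"overall:     {acc:.3f}")
```

The second row of the table sums to the 59 low birth weight births mentioned in the data description. Moving the cut value away from 0.5 would trade sensitivity against specificity, so these figures describe one classification rule and not the model as a whole.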