GCRC Data Analysis Workshop Session 5 Notes for using SPSS Log on to stargate USER: realclas PASS WORD: reallyreal This is an illustration of one strategy for analyzing a dataset from start to finish. LOW BIRTH WEIGHT Data Move LBW.sav from s:\gcrc data analysis to the desktop. Open it in SPSS. Code Sheet for the Variables in the Low Birth Weight Study Described in Section 1.6.2 Page 24 Hosmer, D.W. and Lemishow, S., Applied Logistic Regression, Wiley, 2000 Low birth weight, defined as < 2500 grams, is an outcome that has been of concern to physicians for years. This is due to the fact that infant mortality rates and birth defect rates are very high for low birth weight babies. A woman’s behavior during pregnancy (including diet, smoking habits, and receiving prenatal care) can generally alter the chances of carrying the baby to term and consequently, of delivering a baby of normal birth weight. Data were collected as part of a larger study at Baystate Medical Center in Springfield, Massachusetts. This dataset contains information on 189 births to women seen in the obstetrics clinic. Fifty-nine of these births were low birth weight. The variables have been shown to be associated with low birth weight in the obstetrical literature. The goal of the current study was to determine whether these variables were risk factors in the clinic population being served by Baystate Medical Center. Observed values have been modified somewhat to protect subject confidentiality. Variable Description Codes/Values Name 1 Identification Code ID Number ID 2 Low Birth Weight 1 = BWT<=2500g, LOW 0 = BWT>2500g 3 Age of Mother Years AGE 4 Weight of Mother at Pounds LWT Last Menstrual Period 5 Race 1 = White, 2 = Black RACE 3 = Other 6 Smoking Status 0 = No, 1 = Yes SMOKE During Pregnancy 7 History of Premature Labor 0 = None, 1 = One, PTL 2 = Two, etc. 8 History of Hypertension 0 = No, 1 = Yes HTN 9 Presence of Uterine 0 = No, 1 = Yes UI Irritability 10 Number of Physician Visits 0 = None, 1 = One FTV During the First Trimester 2 = Two,etc. 11 Birth Weight Grams BWT 12 Log(LWT) lnLWT 13 Log(AGE) lnAGE 14 age*log(AGE) ageXlnage 15 LWT*log(LWT) lwtXlnlwt 16 SMOKE*AGE smokeXage 17 LWT*AGE LWTXage 18 AGE-23.238 CNTage 19 LWT-129.81 CNTlwt Examine the chi square test for goodness of fit. Analyze>nonparametric tests>chi square Select RACE into the test variable list, and under Expected Values, select VALUES and add .8, .1, .1 (to test the null hypothesis that this sample was drawn from a population that’s 80% white, 10% Black, and 10% other) NPAR TEST /CHISQUARE=RACE /EXPECTED=.8 .1 .1 /MISSING ANALYSIS. The 2df chi square is 144.2, with p<0.001, so we reject the null hypothesis. Now, univariate exploration: Analyze> descriptive statistics>crosstabs Add LOW to the column variables and RACE SMOKE PTL HTN UI to the row variables. Select STATISTICS and check chi square, continue; select CELLS. Under COUNTS, select observed and expected; under percentages, select column. Continue, PASTE, RUN CROSSTABS /TABLES=RACE SMOKE PTL HTN FTV UI BY LOW /FORMAT= AVALUE TABLES /STATISTIC=CHISQ /CELLS= COUNT EXPECTED COLUMN /COUNT ROUND CELL . There are several chi square statistics in the output, along with a warning when any of the expected cell counts are <5 or there are empty cells: Pearson’s Chi Square Statistic used to test the hypothesis that the row and column variables are independent. This should not be used if any cell has an expected value less than 1, or if more than 20% of the cells have expected values less than 5. For general purposes, the significance value is more important than the actual value of the statistic. Continuity Correction is a correction sometimes applied in the computation of chi- square for 2 X 2 tables to improve the approximation. Corrected chi-square values are always smaller than uncorrected values. Likelihood Ratio is a goodness-of-fit statistic similar to Pearson's chi-square. For large sample sizes, the two statistics are equivalent. The advantage of the likelihood-ratio chi- square is that it can be subdivided into interpretable parts that add up to the total. For smaller sample sizes, this is the statistic to report; since it approaches the Pearson as n increases, it can be reported in either case. Fisher’s Exact Test is a test for independence in a 2 X 2 table. It is most useful when the total sample size and the expected values are small. Linear-by-linear Association is a measure of linear association between the row and column variables in a crosstabulation. This statistic should not be used for nominal data. Also known as the Mantel-Haenszel chi-square test for trend. Are there any variables for which we might want to report the Mantel-Haenszel chi- square test? Are there any candidates for recoding? Do these variables all contribute to the risk for low birth weight? How many EVENTS are there in this data set? What does this suggest about a reasonable number of explanatory variables? For the continuous variables, we can compute t tests (with LOW as the grouping variable). Analyze>compare means> independent samples Select LWT (Weight at last Menstrual Period) and AGE into test variables and LOW into the grouping variable. Define groups, 0, 1, continue. T-TEST GROUPS = LOW(0 1) /MISSING = ANALYSIS /VARIABLES = LWT AGE /CRITERIA = CI(.95) . Do these variables contribute to the risk for low birth weight? There are clinical reasons to check for an interaction between age and low weight at last MP. Regress LOW on age LWD and LWD*AGE. Save the predicted values and plot them. LOGISTIC REGRESSION LOW /METHOD = ENTER AGE LWD LWDXage /SAVE = PRED /CLASSPLOT /PRINT = GOODFIT SUMMARY CI(95) /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) . GRAPH /SCATTERPLOT(BIVAR)=AGE WITH PRE_1 BY LWD /MISSING=LISTWISE . How does age affect the OR for Low weight at last MP? Now, how would you evaluate the risk factors for low birth weight? NB: Be sure to SAVE predicted, Cook’s, dfbetas, deviation residuals; select options for classification plots, Hosmer-Lemishow goodness of fit, CI (exp(beta)); and display LAST iteration The classification table gives sensitivity (proportion EVENTS correctly classified – true positives) 0.37, and specificity (proportion NO EVENT correctly classified –true negatives) 0.86 Classification Table(a) Observed Predicted Percentage LOW Correct over2500 under2500 Step 1 LOW over2500 112 18 86.2 under2500 37 22 37.3 Overall Percentage 70.9 a The cut value is .500 Is age linear in the logit? LOGISTIC REGRESSION LOW /METHOD = ENTER AGE LWD LWDXage RACE SMOKE UI HTN ageXlnage /CONTRAST (RACE)=Indicator(1) /CONTRAST (SMOKE)=Indicator(1) /SAVE = PRED COOK DFBETA DEV /CLASSPLOT /PRINT = GOODFIT SUMMARY CI(95) /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) . How would you evaluate the low birth weight variable as a continuous measure?