Performing Sensitivity Analyses of Imputed Missing Values



Jenny H. Qin and Mike Singleton









Kentucky CODES

Kentucky Injury Prevention & Research Center

University of Kentucky



July 14th, 2003 www.kiprc.uky.edu 29th TRF 2003, Denver

Multiple Imputation Publications

• Multiple Imputation in Public Health Research
• Handling Missing Data in Nursing Research with Multiple Imputation
• Application of Multiple Imputation in Medical Studies: from AIDS to NHANES

Questions???

• May I use MI to deal with missing data problems for my data sets?

• How can I believe that the MI will give me better analysis results?

• What should I do to get good results from MI?

??? → Sensitivity Analyses on Imputed Values → Answers

A sensitivity analysis tests if our study results are sensitive to our assumptions (missing data mechanism), data conditions (missing data rate), and choices (imputation models or number of imputations) made for obtaining the results.

MI Process

Data Set of Interest → Proc MI (Imputation Model) → imputed data sets (Set 1, Set 2, Set 3, ..., Set n) → Analysis Model → Results 1, Results 2, Results 3, ..., Results n → Proc MIANALYZE → Results

Factors marked on the diagram:
1 Missing Data Mechanism
2 Missing Data Rate
3 Imputation Model
4 Proc MI Options

CODES Application

Research Question:
What was the relationship between driving under the influence of drugs and/or alcohol, and being killed or hospitalized in a crash, for motorcycle riders in Kentucky in 2001?

Outcome (Dependent Variable):
Killed or Hospitalized (K/H)

Risk Factor Candidates (Independent Variables):
Age, gender, suspected DUI, posted speed limit, helmet use, fixed object, head-on collision, collision time, rural vs. urban

Analysis Model

Logistic Regression Model:
K/H = β0 + β1*DUI + β2*Speed + β3*Fixed + β4*Head-On

Total records in our study data set: 1,226

Records with missing values: 14 (1.1%)

Results for the Gold Standard

Parameter   OR (95% CI)          Estimate   SE       P
DUI         2.51 (1.58, 3.98)    0.9189     0.2364   0.0001
Speed       1.58 (1.18, 2.10)    0.4546     0.1456   0.0018
Fixed       1.70 (1.24, 2.33)    0.5311     0.1599   0.0009
Head-on     1.70 (1.04, 2.77)    0.5316     0.2486   0.0380

This Gold Standard result is used to compare with all other results.

Conclusion: the odds of being killed or hospitalized for motorcyclists with DUI are 2.5 times the odds for motorcyclists without DUI, when other factors are controlled.
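The odds ratios in the Gold Standard table follow from the logistic regression estimates and standard errors. The sketch below is one way to check that relationship in Python; it assumes a Wald-type 95% interval with z = 1.96, and the function name `odds_ratio_ci` is ours, not part of the original analysis. Small last-digit differences from the slide values are possible because the inputs are already rounded.

```python
import math

def odds_ratio_ci(estimate, se, z=1.96):
    """Return (OR, lower, upper) for a logistic regression coefficient.

    Assumes a Wald interval: exp(estimate +/- z * se).
    """
    return (
        math.exp(estimate),
        math.exp(estimate - z * se),
        math.exp(estimate + z * se),
    )

# Estimates and standard errors copied from the Gold Standard table.
gold_standard = {
    "DUI": (0.9189, 0.2364),
    "Speed": (0.4546, 0.1456),
    "Fixed": (0.5311, 0.1599),
    "Head-on": (0.5316, 0.2486),
}

for name, (est, se) in gold_standard.items():
    or_, lower, upper = odds_ratio_ci(est, se)
    print(f"{name:8s} OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The same helper can be applied to any Estimate/SE pair in the later tables to put the imputed results on the odds ratio scale.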

Imputation Model

Analysis Model:
K/H = β0 + β1*DUI + β2*Speed + β3*Fixed + β4*Head-On

Imputation Model:
K/H DUI Speed Fixed Head-On

Note: The imputation model does not have to be identical to the analysis model, but it should at least include all of the analysis covariates. You can add any additional variables that are correlated with the variables that have missing values.

SA: 1 Missing Data Mechanism

[Diagram: the MI process (Study Data Set → Proc MI with Imputation Model → Analysis Model → Proc MIANALYZE → Results), with factor 1, Missing Data Mechanism, varied across MCAR, MAR, and NMAR]

• Missing Completely At Random (MCAR)
  – Definition: the missing data values are a simple random sample of all data values.
  – We simulated this condition by using SAS Proc SurveySelect to pick a random sample from the study data set, then set DUI = missing for those selected cases.

• Missing At Random (MAR)
  – Definition: the probability of missing values on one variable is unrelated to the values of this variable, after controlling for other variables in the analysis.
  – We simulated this condition by setting DUI = missing for riders aged 46 or older.

• Not Missing At Random (NMAR)
  – Definition: the probability of missing values on one variable is related to the values of this variable even if we control other variables in the analysis.
  – We simulated this condition by setting DUI = missing for uninjured riders who were not suspected of DUI (DUI=‘NO’).
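The three simulated mechanisms can be sketched in Python as follows. This is an illustration only: the study used SAS (Proc SurveySelect for the MCAR sample), and the record fields `age`, `dui`, and `injured`, as well as the synthetic riders, are hypothetical stand-ins for the study data set. Unlike the study, the sketch does not tune the MAR and NMAR rules to reach a fixed 25% missing rate.

```python
import random

def make_mcar(records, rate, rng):
    """MCAR: blank DUI for a simple random sample of the records."""
    out = [dict(r) for r in records]
    n_missing = round(rate * len(out))
    for i in rng.sample(range(len(out)), n_missing):
        out[i]["dui"] = None
    return out

def make_mar(records, age_cutoff=46):
    """MAR: missingness depends only on an observed covariate (age)."""
    return [dict(r, dui=None) if r["age"] >= age_cutoff else dict(r)
            for r in records]

def make_nmar(records):
    """NMAR: missingness depends on the value of DUI itself."""
    return [dict(r, dui=None)
            if r["dui"] == "NO" and not r["injured"] else dict(r)
            for r in records]

def missing_rate(records):
    """Fraction of records whose DUI value is missing."""
    return sum(r["dui"] is None for r in records) / len(records)

rng = random.Random(2003)
# Synthetic riders, NOT the study data; field names are assumptions.
riders = [
    {"age": rng.randint(16, 80),
     "dui": rng.choice(["YES", "NO", "NO", "NO"]),
     "injured": rng.random() < 0.6}
    for _ in range(1000)
]

mcar = make_mcar(riders, 0.25, rng)
mar = make_mar(riders)
nmar = make_nmar(riders)
for label, data in [("MCAR", mcar), ("MAR", mar), ("NMAR", nmar)]:
    print(label, f"{missing_rate(data):.1%} of DUI values missing")
```

Each function returns a copy, so the complete data set stays available as the reference for comparison.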

Created 3 data sets from the study data set with different missing data mechanisms, but with the same percent missing values for DUI (25%).

(E = estimate, SE = standard error, P = p-value)

             MCAR                       MAR                        NMAR
             25% missing on DUI         25% missing on DUI         25% missing on DUI
Parameter    E        SE      P         E        SE      P         E        SE      P
Intercept    -1.7336  0.1096  0.0001    -1.7259  0.1092  0.0001    -1.7204  0.1092  0.0001
DUI           0.8544  0.2664  0.0016     0.8286  0.2623  0.0018     0.5791  0.2223  0.0092
Speed         0.5018  0.1449  0.0005     0.4843  0.1448  0.0008     0.4812  0.1443  0.0009
Fixed         0.4927  0.1610  0.0022     0.5079  0.1597  0.0015     0.5400  0.1578  0.0006
Head-on       0.5133  0.2485  0.0388     0.5133  0.2486  0.0389     0.5103  0.2475  0.0393

Sensitivity analysis on missing data mechanism:

1 Missing Data Mechanism: Different
2 Missing Data Rate (25%): Same
3 Imputation Model: Same
4 Proc MI Options: Same

[Chart: Estimates for Parameters with Different Missing Data Mechanisms. Y-axis: Estimate; x-axis: DUI, Speed, Fixed, Head-on; series: GoldStd, MCAR, MAR, NMAR]

What is the result?

Conclusions of SA on Missing Data Mechanism

• Even if we used the simplest imputation model, MI was able to produce results that are consistent with the Gold Standard when the missing data mechanisms were MCAR or MAR, but not NMAR.

• We would predict the increased odds of death or hospitalization for riders suspected of DUI to be 1.78 (1.15, 2.76) for NMAR, while our Gold Standard predicts it to be 2.51 (1.58, 3.98).

[Chart: Point Estimate and 95% CI for DUI with Different Missing Data Mechanisms. Y-axis: Odds Ratio; x-axis: GoldStd, MCAR, MAR, NMAR]

SA: 2 Missing Data Rate

[Diagram: the MI process (Study Data Set → Proc MI with Imputation Model → Analysis Model → Proc MIANALYZE → Results), with factor 2, Missing Data Rate, varied across 6%, 25%, and 50%]

• Data sets with MCAR (test on percentage of values missing for DUI as 6%, 25%, 50% respectively)

• Data sets with MAR (test on percentage of values missing for DUI as 6%, 25%, 50% respectively)

Create 3 data sets with MCAR from the study data set having values missing for DUI as 6%, 25%, and 50% respectively.

             MCAR                       MCAR                       MCAR
             6% missing on DUI          25% missing on DUI         50% missing on DUI
Parameter    E        SE      P         E        SE      P         E        SE      P
Intercept    -1.7361  0.1094  0.0001    -1.7336  0.1096  0.0001    -1.7377  0.1119  0.0001
DUI           0.9447  0.2429  0.0001     0.8544  0.2664  0.0016     0.8457  0.2973  0.0065
Speed         0.4812  0.1446  0.0009     0.5018  0.1449  0.0005     0.4831  0.1460  0.0009
Fixed         0.5213  0.1584  0.0010     0.4927  0.1610  0.0022     0.5200  0.1617  0.0013
Head-on       0.5245  0.2489  0.0351     0.5133  0.2485  0.0388     0.4936  0.2508  0.0490

Create 3 data sets with MAR from the study data set having values missing for DUI as 6%, 25%, and 50% respectively.

             MAR                        MAR                        MAR
             6% missing on DUI          25% missing on DUI         50% missing on DUI
Parameter    E        SE      P         E        SE      P         E        SE      P
Intercept    -1.7382  0.1095  0.0001    -1.7259  0.1092  0.0001    -1.7502  0.1109  0.0001
DUI           0.9191  0.2334  0.0001     0.8286  0.2623  0.0018     1.2722  0.3298  0.0002
Speed         0.4836  0.1449  0.0008     0.4843  0.1448  0.0008     0.5063  0.1473  0.0006
Fixed         0.5076  0.1590  0.0014     0.5079  0.1597  0.0015     0.5234  0.1597  0.0010
Head-on       0.5174  0.2486  0.0374     0.5133  0.2486  0.0389     0.5371  0.2487  0.0308

Sensitivity analysis on Missing Data Rate?

1 Missing Data Mechanism (MCAR or MAR): Same
2 Missing Data Rate: Different
3 Imputation Model: Same
4 Proc MI Options: Same

[Chart: Estimates for Parameters with Different Missing Rates. Y-axis: Estimate; x-axis: DUI, Speed, Fixed, Head-on; series: GoldStd, MAR6%, MAR25%, MAR50%, MCAR6%, MCAR25%, MCAR50%]

What is the result?

Conclusions of SA on Missing Data Rate

• For both missing data mechanisms, the 50% missing case produced the DUI parameter estimate farthest from the Gold Standard estimate, as well as the widest 95% CI. However, for MCAR the difference from the Gold Standard estimate was -7%, whereas for MAR it was 42%. In addition, the 95% CI for 50%MCAR was 19% wider than the Gold Standard 95% CI, whereas for 50%MAR it was 106% wider.

• It shows that the simplest imputation model is not sufficient to handle very high missing data rates.

[Chart: Point Estimate and 95% CI for DUI with Different Missing Data Rates. Y-axis: Odds Ratio; categories: GoldStd, MCAR6%, MCAR25%, MCAR50%, MAR6%, MAR25%, MAR50%]
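The percentage comparisons against the Gold Standard can be approximately reproduced from the tabulated estimates and standard errors. The sketch below assumes the comparison is made on the odds ratio scale with Wald 95% intervals (z = 1.96); that reading matches the -7%, 42%, and 19% figures, while the last figure comes out near 105% rather than 106%, presumably because the slide worked from rounded odds ratios. The helper names are ours.

```python
import math

def or_point_and_width(estimate, se, z=1.96):
    """Return the odds ratio and the width of its Wald 95% CI."""
    point = math.exp(estimate)
    width = math.exp(estimate + z * se) - math.exp(estimate - z * se)
    return point, width

def pct_change(value, reference):
    """Percent difference of value relative to reference."""
    return 100.0 * (value - reference) / reference

# DUI estimate and SE from the Gold Standard and the 50%-missing tables.
gold_point, gold_width = or_point_and_width(0.9189, 0.2364)
cases = {
    "MCAR 50%": (0.8457, 0.2973),
    "MAR 50%": (1.2722, 0.3298),
}

for label, (est, se) in cases.items():
    point, width = or_point_and_width(est, se)
    print(f"{label}: OR differs by {pct_change(point, gold_point):+.0f}%, "
          f"CI width differs by {pct_change(width, gold_width):+.0f}%")
```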

SA: 3 Imputation Model

[Diagram: the MI process (Study Data Set → Proc MI with Imputation Model → Analysis Model → Proc MIANALYZE → Results), with factor 3, Imputation Model, varied across Model1, Model2, Model3, and Model4]

• Data set with MAR and values missing for DUI=50%

• Tests on the following 4 imputation models

  – Model1: K/H DUI Speed Fixed Head-on
    Model1 = Analysis model; it is the simplest imputation model
  – Model2: Model1 + age_group + colltime (Categorical)
  – Model3: Model1 + age_group + hour (Continuous)
  – Model4: Model1 + age_group + hour_normal (Continuous)

  We are adding age and collision time to help predict DUI in Model2, Model3, and Model4.

Use 4 different imputation models to do MI on the same data set with MAR, 50% missing on DUI.

             Model 2                    Model 3                    Model 4
             50% missing on DUI         50% missing on DUI         50% missing on DUI
Parameter    E        SE      P         E        SE      P         E        SE      P
Intercept    -1.8110  0.1222  0.0001    -1.8081  0.1235  0.0001    -1.8034  0.1238  0.0001
DUI           1.0127  0.2948  0.0016     0.9814  0.2966  0.0024     0.9563  0.2813  0.0015
Speed         0.5079  0.1466  0.0005     0.5021  0.1463  0.0006     0.5081  0.1469  0.0005
Fixed         0.5370  0.1604  0.0008     0.5404  0.1601  0.0007     0.5371  0.1598  0.0008
Head-on       0.5554  0.2537  0.0286     0.5477  0.2552  0.0320     0.5561  0.2521  0.0274

Sensitivity analysis on Imputation Model

1 Missing Data Mechanism (MAR): Same
2 Missing Data Rate (50%): Same
3 Imputation Models: Different
4 Proc MI Options: Same

[Chart: Estimates for Parameters with Different Imputation Models. Y-axis: Estimates; x-axis: DUI, Speed, Fixed, Head-on; series: GoldStd, NoMI, Model1, Model2, Model3, Model4]

What is the result?

Conclusions of SA on Imputation Models

• Models 2, 3, and 4 are all improvements over Model 1, and produced DUI parameter estimates and 95% CI widths close to those of the Gold Standard.

• So even with 50% missing values (MAR), we are able to get a good result by using a richer imputation model.

• The higher the percentage of missing values (MAR) in your data set, the more you need to include additional predictors in the imputation model.

[Chart: Point Estimate and 95% CI for DUI with Different Imputation Models. Y-axis: Odds Ratio; x-axis: NoMI, Model1, Model2, Model3, Model4, GoldStd]

Comparison of No MI and Model 4 to the Gold Standard

[Chart: Estimates for Parameters (Data set with 50% MAR on DUI). Y-axis: Estimates; x-axis: DUI, Speed, Fixed, Head-on; series: GoldStd, NoMI, Model4]

Comparison of No MI and Model 4 to the Gold Standard

[Charts: four panels showing Point Estimate and 95% CI for DUI, for Speed, for Fixed, and for Head-on. Y-axis: Odds Ratio; categories: G.S. (Gold Standard), MI, and No MI]

SA: 4 Proc MI: Number of Imputations

[Diagram: the MI process (Study Data Set → Proc MI with Imputation Model → Analysis Model → Proc MIANALYZE → Results), with factor 4, the number of imputations in Proc MI, varied across N=0, N=2, N=5, N=10, and N=20]

• Data set with MAR and values missing for DUI=50%, use Model4 to do MI

• Test on different numbers of imputations
  – N=0 (No MI)
  – N=2
  – N=5
  – N=10
  – N=20

Use the same imputation model (Model4), but different numbers of imputations, to do MI on the same data set with MAR, 50% missing on DUI.

             N=5                        N=10                       N=20
             50% missing on DUI         50% missing on DUI         50% missing on DUI
Parameter    E        SE      P         E        SE      P         E        SE      P
Intercept    -1.7975  0.1177  0.0001    -1.8034  0.1238  0.0001    -1.7898  0.1204  0.0001
DUI           0.8658  0.2537  0.0023     0.9563  0.2813  0.0015     0.9942  0.3176  0.0026
Speed         0.4971  0.1457  0.0006     0.5081  0.1469  0.0005     0.5016  0.1465  0.0006
Fixed         0.5448  0.1610  0.0007     0.5371  0.1598  0.0008     0.5286  0.1599  0.0010
Head-on       0.5652  0.2522  0.0251     0.5561  0.2521  0.0274     0.5506  0.2509  0.0282

Sensitivity analysis on Number of Imputations

1 Missing Data Mechanism (MAR): Same
2 Missing Data Rate (50%): Same
3 Imputation Model: Same
4 Number of Imputations: Different

[Chart: Estimates for Parameters with Different Number of Imputations. Y-axis: Estimates; x-axis: DUI, Speed, Fixed, Head-on; series: GoldStd, NoMI, MI N2, MI N5, MI N10, MI N20]

What is the result?

Conclusions of SA on Number of Imputations

• In our example, n=5 to 10 is enough to get good results for a data set with 50% MAR on DUI.

• With No MI (complete cases only), we would conclude that the odds of being killed or hospitalized for motorcyclists with DUI were 4.2 (2.1, 8.4) times the odds for motorcyclists without DUI. But from the Gold Standard, the OR is 2.5 (1.6, 4.0).

[Chart: Point Estimate and 95% CI for DUI with Different Imputation Numbers. Y-axis: Odds Ratio; x-axis: n=0, n=2, n=5, GoldStd, n=10, n=20]
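One common way to reason about how many imputations are enough is Rubin's approximate relative efficiency, 1 / (1 + γ/m), where m is the number of imputations and γ is the fraction of missing information. This formula is not taken from the slides, and γ is not the same as the missing data rate; the value 0.5 below is an assumption chosen purely for illustration.

```python
def relative_efficiency(gamma, m):
    """Rubin's approximate efficiency of m imputations relative to infinitely many.

    gamma: fraction of missing information (between 0 and 1)
    m:     number of imputations (positive integer)
    """
    return 1.0 / (1.0 + gamma / m)

gamma = 0.5  # assumed fraction of missing information, for illustration only
for m in (2, 5, 10, 20):
    print(f"m = {m:2d}: relative efficiency = {relative_efficiency(gamma, m):.3f}")
```

Under this assumption the gain from going beyond 5 to 10 imputations is small, which is in line with the conclusion drawn from the example.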

Summary: Answers?

• May I use MI to deal with missing data problems for my data sets?
  It seems a good idea to try MI, but the results depend on the missing data mechanisms of the variables with missing values in your data sets (however, even our results with MI for NMAR were better than No MI).

• How can I believe that the MI will give me better analysis results?
  We found that using MI on our example gave us much better analysis results than No MI (complete cases only).

• How can I get better analysis results by using MI?
  Understand the relationships between variables in your data sets;
  Know the missing data mechanisms of the variables;
  Determine the percent of missing information;
  Build a reasonable imputation model;
  Use Proc MI options wisely.

Poll Results: Missing Data Problems Everywhere

Q1. I like Denver.
Q2. I like TRF.
Q3. I liked the talk.
Q4. I will use the MI.

[Table: poll responses under the columns Like Denver, Like TRF, Liked the Talk, and Use MI. Entries are Y, N, or Missing; some Missing entries carry reasons: "left session early", "too nice to say “NO”", "not sure yet", "daydreaming", and "fell asleep"]

Acknowledgment

Special thanks to Dr. Mike McGlincy, who gave us helpful suggestions during our study of sensitivity analyses on imputed values and insightful comments on the analysis results.

Thank You








Questions?










Can We Improve Analysis Results for NMAR by Using a More Complex Imputation Model?

No MI = Complete cases only
Model1 = K/H + DUI + Speed + Fixed + Head-on
Model4 = Model1 + age + hour
Model5 = Model1 + age + hour + gender + safety

[Chart: Estimates for Parameters on 25% NMAR with Different Models. Y-axis: Estimates; x-axis: DUI, Speed, Fixed, Head-on; series: GoldStd, NoMI, Model1, Model4, Model5]

Multiple Imputation inference involves three distinct phases:

1. The missing data are filled in m times to generate m complete data sets (using imputation model)

2. The m complete data sets are analyzed by using standard procedures (using analysis model)

3. The results from the m complete data sets are combined for the inference
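The third phase, combining the m sets of results, is conventionally done with Rubin's rules. The sketch below shows that combination for a single parameter in Python; it is not the SAS implementation, the function name `pool_rubin` is ours, and the input numbers are invented for illustration rather than taken from the study.

```python
import math

def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, one variance each")
    pooled = sum(estimates) / m
    within = sum(variances) / m  # average within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Invented per-imputation results for one coefficient (m = 5).
estimates = [0.91, 0.97, 0.88, 1.02, 0.95]
standard_errors = [0.26, 0.27, 0.25, 0.28, 0.26]

pooled, pooled_se = pool_rubin(estimates, [se ** 2 for se in standard_errors])
print(f"pooled estimate = {pooled:.4f}, pooled SE = {pooled_se:.4f}")
```

The between-imputation term is what lets the pooled standard error reflect the extra uncertainty due to the missing values, which is why a single imputation tends to understate it.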

Statistical Assumptions for Multiple Imputation

1. The MI procedure assumes that the data are from a continuous multivariate distribution. It also assumes that the data are from a multivariate normal distribution when the MCMC method is used.
   According to Schafer’s MI FAQ page, MI tends to be quite forgiving of departures from the normality assumption. For example, when working with binary or ordered categorical variables, it is often acceptable to impute under a normality assumption and then round off the continuous imputed values to the nearest category. Variables whose distributions are heavily skewed may be transformed to approximate normality and then transformed back to their original scale after imputation.

2. Proc MI and Proc MIANALYZE assume that the missing data are Missing At Random (MAR).
   MCAR is unlikely for real world crash datasets.
   NMAR may be shifted to MAR by using a richer imputation model to help predict missing values, because crash datasets include many related variables that can help predict each other.
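The two workarounds mentioned under the first assumption, rounding continuous imputed values to the nearest category and transforming a skewed variable before imputation, can be sketched in Python as below. The helper names and the choice of a log transform are our assumptions for illustration; the slides do not say which transformation was used.

```python
import math

def round_to_category(value, categories=(0, 1)):
    """Map a continuous imputed value to the nearest allowed category.

    Ties go to the category listed first.
    """
    return min(categories, key=lambda c: abs(c - value))

def to_log_scale(value):
    """Transform a positive, right-skewed value before imputation."""
    return math.log(value)

def from_log_scale(value):
    """Transform an imputed value back to the original scale."""
    return math.exp(value)

# Continuous draws for a binary variable such as DUI (0 = NO, 1 = YES).
for draw in (-0.2, 0.31, 0.73, 1.4):
    print(f"imputed {draw:+.2f} -> category {round_to_category(draw)}")

# Round trip for a skewed variable imputed on the log scale.
print(from_log_scale(to_log_scale(12.5)))
```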

