Applications of G-estimation using a new Stata command



Jonathan Sterne

jonathan.sterne@bristol.ac.uk



Kate Tilling

kate.tilling@bristol.ac.uk



Department of Social Medicine,

University of Bristol UK

Outline



• Time-varying confounding and G-estimation

• G-estimation in Stata

• Applications

• Discussion and future plans

A covariate is a time-varying confounder for the

effect of exposure on outcome if:

1. past covariate values predict current exposure

2. current covariate value predicts outcome



Example:

1. people with low CD4 counts are more likely to get HAART (highly active antiretroviral therapy)

2. low CD4 count is a risk factor for AIDS and death



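If, in addition, past exposure predicts the current covariate value, then standard survival analyses with time-updated exposure effects will give biased exposure effect estimates: the covariate is then both a confounder and a step on the causal path from earlier exposure to outcome, so neither adjusting nor failing to adjust for it removes the bias.

For example, CD4 count predicts HAART, and HAART raises CD4 count.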

G-estimation (1)

• Assume that subject i has an underlying counterfactual failure

time Ui - the time to failure had they never been exposed. This

is unobservable for subjects who were exposed at any time

• Assume that, while exposed, exposure accelerates failure time by a factor exp(−ψ), the causal survival time ratio. So if ψ > 0, exposure decreases survival

• If we knew ψ, then for any subject who experienced the outcome event at time Ti, the counterfactual failure time could be derived by:

$$U_i = \int_0^{T_i} \exp\{\psi E(t)\}\,dt$$

where E(t) is the exposure at time t (1 if exposed, 0 if not)

• Example: if subject i experienced the outcome event at 5 years and was exposed for 3 years, then Ui = 3exp(ψ) + 2
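With exposure constant between visits, the integral reduces to a sum over intervals. A minimal Python sketch, assuming exposure is recorded as hypothetical (start, stop, exposed) tuples covering [0, Ti]; the function name is ours, not part of stgest:

```python
import math


def counterfactual_time(intervals, psi):
    """Ui(psi): integral from 0 to Ti of exp(psi * E(t)) dt for piecewise-constant E(t).

    intervals: list of (start, stop, exposed) tuples covering [0, Ti];
    exposed is the exposure level in that interval (typically 0 or 1).
    """
    u = 0.0
    for start, stop, exposed in intervals:
        # An exposed year counts as exp(psi) unexposed years; an unexposed year counts as 1.
        u += (stop - start) * math.exp(psi * exposed)
    return u


# The example above: outcome at 5 years, 3 of them exposed
# (the order of exposed and unexposed time does not affect Ui).
psi = 0.239
print(counterfactual_time([(0, 3, 1), (3, 5, 0)], psi))  # equals 3*exp(psi) + 2
```

With ψ > 0 the counterfactual time is longer than the observed 5 years, i.e. exposure shortened survival.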

G-estimation (2)

Assume that there are no unmeasured confounders

• conditional on measured history (past and present

confounders and past exposure) subjects’ present

exposure is independent of their counterfactual

failure time Ui

• e.g. for 2 individuals with identical histories, the

decision to quit smoking does not depend on

underlying survival time



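Use logistic regression of current exposure on measured history and U(ψ) to search for the value of ψ that satisfies this condition, i.e. the value at which U(ψ) no longer predicts exposure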
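The search can be illustrated on simulated data. This is a toy sketch, not the stgest implementation: it assumes a single follow-up interval with exposure fixed throughout (so U(ψ) = T·exp(ψA)), one binary confounder and no censoring, and all names and data are hypothetical. One deliberate swap: it enters log U(ψ) rather than U(ψ) in the logistic model for numerical stability; under no unmeasured confounding any function of U(ψ) is independent of exposure at the true ψ, so the test remains valid, though efficiency may differ.

```python
import numpy as np


def fit_logistic(X, y, iters=30):
    """Logistic regression by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def u_coefficient(psi, t, a, l):
    """Coefficient of log U(psi) when exposure a is regressed on confounder l and log U(psi)."""
    log_u = np.log(t) + psi * a  # exposure fixed over follow-up, so U = T * exp(psi * A)
    X = np.column_stack([np.ones_like(log_u), l, log_u])
    return fit_logistic(X, a)[2]


def g_estimate(t, a, l, lo=-2.0, hi=2.0, tol=1e-6):
    """Bisection for the psi at which U(psi) no longer predicts exposure."""
    f_lo = u_coefficient(lo, t, a, l)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = u_coefficient(mid, t, a, l)
        if np.sign(f_mid) == np.sign(f_lo):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical simulated data: confounder l raises the chance of exposure a
# and shortens the counterfactual (never-exposed) failure time u.
rng = np.random.default_rng(1)
n = 20000
l = rng.binomial(1, 0.5, n).astype(float)
a = rng.binomial(1, 0.3 + 0.4 * l).astype(float)
u = rng.exponential(1.0 / (1.0 + l))
true_psi = 0.5
t = u * np.exp(-true_psi * a)  # observed time: exposure multiplies survival by exp(-psi)

print(g_estimate(t, a, l))  # expected to be close to true_psi
```

Because l drives both exposure and failure time, a naive comparison of t between exposed and unexposed would be confounded; the logistic model conditions on l, as the no-unmeasured-confounders assumption requires.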

Censoring



• No competing risks

Replace U(ψ) with a variable indicating whether the individual would have been observed to fail both if they were exposed and if they were unexposed.

• Competing risks

Assume that, conditional on known covariates, censoring due to competing risks is independent of failure time.

Estimate the cumulative probability of being free from competing risks until the end of follow-up, and weight by the inverse of this probability.
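For the no-competing-risks case, a standard construction in the G-estimation literature maps the administrative end of follow-up C onto the U scale: it is C if never exposed and C·exp(ψ) if always exposed, so the earlier of the two, C(ψ) = min(C, C·exp(ψ)), does not depend on exposure. A failure counts only if U(ψ) < C(ψ). A minimal sketch under the binary-exposure model above; this is our illustration, and stgest may code the indicator differently:

```python
import math


def observed_in_all_worlds(u_psi, admin_end, psi):
    """Delta(psi): 1 if the failure would be observed whatever the exposure history.

    u_psi: counterfactual failure time U(psi) (math.inf if no failure was observed)
    admin_end: administrative end of follow-up C for this individual
    """
    # On the U scale, follow-up ends between C (never exposed) and C*exp(psi)
    # (always exposed); use the earlier of the two so it does not depend on exposure.
    c_psi = min(admin_end, admin_end * math.exp(psi))
    return 1 if u_psi < c_psi else 0


# Failure at 9.5 years after 3 exposed years, follow-up ending at 10 years.
print(observed_in_all_worlds(3 * math.exp(-0.5) + 6.5, 10.0, -0.5))
```

In the usage line, C(ψ) = 10·exp(−0.5) ≈ 6.07 while U(ψ) ≈ 8.32, so this failure is not counted at ψ = −0.5.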

The stgest command



• Written for Stata

• User specifies exposure, covariates (including

baseline and lagged covariates) and any censoring

variables

• Data set up in Stata survival analysis format (i.e.

start time, end time and failure indicator for each

interval for each individual)

• Uses the interval bisection method to search for the G-estimate and its 95% CI (or the user can specify a range and ‘step’ for a grid search)
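Interval bisection can be sketched as three root searches on a monotone test statistic z(ψ) for the U(ψ) coefficient: z = 0 gives the estimate and z = ±1.96 give the 95% limits, since the CI is the set of ψ values not rejected at the 5% level. This is our reading of the method, not stgest's code; the function names and the linear z(ψ) are hypothetical:

```python
def bisect(f, lo, hi, target=0.0, tol=1e-4):
    """Find x in [lo, hi] with f(x) == target, assuming f is monotone on the interval."""
    g_lo = f(lo) - target
    if g_lo * (f(hi) - target) > 0:
        raise ValueError("target not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = f(mid) - target
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return 0.5 * (lo + hi)


def estimate_and_ci(z, lo, hi, crit=1.96):
    """G-estimate (z = 0) and 95% CI limits (z = -crit and z = +crit)."""
    est = bisect(z, lo, hi)
    a = bisect(z, lo, hi, -crit)
    b = bisect(z, lo, hi, crit)
    return est, min(a, b), max(a, b)  # order limits whether z rises or falls with psi


# Hypothetical linear test statistic, searched over the range -2 to 2.
print(estimate_and_ci(lambda psi: (psi - 0.3) / 0.1, -2.0, 2.0))
```

A grid search over a user-specified range and step would evaluate z(ψ) at every grid point instead; bisection needs far fewer model fits when z(ψ) is monotone.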

Caerphilly study



– 2512 men first examined 1979 to 1983, mean

age at baseline 52 years

– Three further follow-up surveys with ascertainment of MI and deaths to August 2000

– Data from the first examination are used to provide baseline exposure measures, so follow-up starts from the second examination

– 1756 men included in analyses

– 244 had a first MI or died from CHD between the second examination and the end of follow-up

Data



• Baseline

smoking history, age, self-reported CHD,

gout, diabetes, high blood pressure



• Every visit

BP, BMI, smoking status, total cholesterol,

CHD, gout, diabetes, fibrinogen

Censoring



• Four possibilities:

– Not censored 1175 (66.9%)

– MI or MI death 244 (13.9%)

– Death from other cause 231 (13.2%)

– Lost 106 (6.0%)





• Multinomial logistic regression: estimate the probability that each individual remained uncensored (i.e. was not in either of the last two categories) as the product of the probabilities of remaining uncensored at each examination
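A minimal sketch of the weight calculation, assuming the multinomial model supplies, for each examination, predicted probabilities of death from other causes and of loss to follow-up (the inputs below are hypothetical). We assume this product is what the pnotcens() option of stgest expects, but that is an inference from the option name:

```python
from math import prod


def prob_uncensored(per_visit_probs):
    """Probability of remaining free of censoring over all examinations.

    per_visit_probs: list of (p_other_death, p_lost) pairs, one per examination.
    """
    # At each examination the chance of staying uncensored is 1 - P(other death) - P(lost).
    return prod(1.0 - p_death - p_lost for p_death, p_lost in per_visit_probs)


def ipc_weight(per_visit_probs):
    """Inverse-probability-of-censoring weight for one individual."""
    return 1.0 / prob_uncensored(per_visit_probs)


# Hypothetical predicted probabilities at three examinations for one man.
probs = [(0.02, 0.01), (0.03, 0.02), (0.05, 0.02)]
print(prob_uncensored(probs), ipc_weight(probs))
```

Men who, given their covariates, were unlikely to remain uncensored receive larger weights, so they stand in for similar men who were censored.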

. list id visit examdat exitdate mi examdat2 cursmok if touse

id visit examdat exitdate mi examdat2 cursmok
16. 1021 1 10sep1979 31jul1984 0 31jul1984 0
17. 1021 2 31jul1984 17mar1992 0 31jul1984 0
18. 1021 3 17mar1992 18jun1996 1 31jul1984 0
19. 1022 1 10sep1979 19sep1984 0 19sep1984 1
20. 1022 2 19sep1984 20nov1989 0 19sep1984 1
21. 1022 3 20nov1989 28oct1993 0 19sep1984 1
22. 1022 4 28oct1993 31dec1998 0 19sep1984 0
23. 1023 1 10sep1979 03oct1984 0 03oct1984 1
24. 1023 2 03oct1984 20nov1989 0 03oct1984 1
25. 1023 3 20nov1989 08nov1993 0 03oct1984 1
26. 1023 4 08nov1993 31dec1998 0 03oct1984 1

. stset exitdate, id(id) failure(mi) origin(time examdat2) scale(365.25)



id: id

failure event: mi ~= 0 & mi ~= .

obs. time interval: (exitdate[_n-1], exitdate]

exit on or before: failure

t for analysis: (time-origin)/365.25

origin: time examdat2



-----------------------------------------------------------------------

6377 total obs.

1756 obs. end on or before enter()

-----------------------------------------------------------------------

4621 obs. remaining, representing

1756 subjects

244 failures in single failure-per-subject data

18547.87 total analysis time at risk, at risk from t = 0

earliest observed entry t = 0

last observed exit t = 14.47502

. list id visit examdat exitdate mi _t0 _t _d _st if touse, noobs nodisp



id visit examdat exitdate mi _t0 _t _d _st
1021 1 10sep1979 31jul1984 0 . . . 0
1021 2 31jul1984 17mar1992 0 0.00 7.63 0 1
1021 3 17mar1992 18jun1996 1 7.63 11.88 1 1
1022 1 10sep1979 19sep1984 0 . . . 0
1022 2 19sep1984 20nov1989 0 0.00 5.17 0 1
1022 3 20nov1989 28oct1993 0 5.17 9.11 0 1
1022 4 28oct1993 31dec1998 0 9.11 14.28 0 1
1023 1 10sep1979 03oct1984 0 . . . 0
1023 2 03oct1984 20nov1989 0 0.00 5.13 0 1
1023 3 20nov1989 08nov1993 0 5.13 9.10 0 1
1023 4 08nov1993 31dec1998 0 9.10 14.24 0 1
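The _t0 and _t columns are days since the origin (the second examination, examdat2) divided by 365.25, and the visit-1 records, which end on or before the origin, drop out. A minimal Python check reproducing the rows for subject 1021, assuming each record is a (start date, exit date, failure flag) tuple; `analysis_times` is a hypothetical helper, not part of Stata:

```python
from datetime import date


def analysis_times(records, origin, scale=365.25):
    """Convert (start, end, failed) date intervals to (_t0, _t, _d) in years since origin.

    Intervals ending on or before the origin are dropped, as happens to visit 1 here.
    """
    out = []
    for start, end, failed in records:
        if end <= origin:
            continue  # ends on or before entry: not part of the analysis
        t0 = max((start - origin).days, 0) / scale
        t = (end - origin).days / scale
        out.append((t0, t, failed))
    return out


# Subject 1021 from the listing; origin is the second examination (31jul1984).
rows = [
    (date(1979, 9, 10), date(1984, 7, 31), 0),
    (date(1984, 7, 31), date(1992, 3, 17), 0),
    (date(1992, 3, 17), date(1996, 6, 18), 1),
]
for t0, t, d in analysis_times(rows, origin=date(1984, 7, 31)):
    print(f"{t0:.2f} {t:.2f} {d}")
# 0.00 7.63 0
# 7.63 11.88 1
```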

. makebase cursmok hearta gout highbp diabet fibrin chol cholsq /*

> */ bpsyst bpdias obese thin, firstvis(1) visit(visit)



Baseline confounders



storage display value
variable name type format label variable label
---------------------------------------------------------------------
Bcursmok byte %9.0g
Bhearta byte %9.0g
Bgout byte %9.0g
Bhighbp byte %9.0g
Bdiabet byte %9.0g
Bfibrin float %9.0g
Bchol float %9.0g
Bcholsq float %9.0g
Bbpsyst int %9.0g
Bbpdias int %9.0g
Bobese byte %9.0g
Bthin byte %9.0g

. makelag cursmok hearta gout highbp diabet fibrin chol cholsq /*

> */ bpsyst bpdias obese thin, firstvis(1) visit(visit)



Lagged confounders



storage display value
variable name type format label variable label
----------------------------------------------------------------------
Lcursmok byte %9.0g
Lhearta byte %9.0g
Lgout byte %9.0g
Lhighbp byte %9.0g
Ldiabet byte %9.0g
Lfibrin float %9.0g
Lchol float %9.0g
Lcholsq float %9.0g
Lbpsyst int %9.0g
Lbpdias int %9.0g
Lobese byte %9.0g
Lthin byte %9.0g
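Judging from the variable names, makebase and makelag create, for each covariate, its value at the first visit (B prefix) and at the previous visit (L prefix). A Python sketch of that construction, assuming records carry id and visit keys; this reflects our reading of the output above, not the commands' documented behaviour, and the function name is hypothetical:

```python
def add_baseline_and_lag(rows, var, firstvis=1):
    """Add B<var> (value at visit `firstvis`) and L<var> (value at the previous visit).

    rows: list of dicts with keys 'id', 'visit' and var. The lag is None at each
    person's earliest record, mirroring a missing value in Stata.
    """
    rows = sorted((dict(r) for r in rows), key=lambda r: (r["id"], r["visit"]))
    baseline, previous = {}, {}
    for r in rows:
        pid = r["id"]
        if r["visit"] == firstvis:
            baseline[pid] = r[var]
        r["B" + var] = baseline.get(pid)
        r["L" + var] = previous.get(pid)
        previous[pid] = r[var]  # becomes the lag for this person's next visit
    return rows


# Smoking status of subject 1022 at visits 1 to 4, from the listing above.
data = [{"id": 1022, "visit": v, "cursmok": s} for v, s in [(1, 1), (2, 1), (3, 1), (4, 0)]]
for r in add_baseline_and_lag(data, "cursmok"):
    print(r["visit"], r["cursmok"], r["Bcursmok"], r["Lcursmok"])
```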

. stcox cursmok Agegrp* hearta gout highbp diabet fibrin chol
> cholsq bpsyst bpdias obese thin B* L*



failure _d: mi

analysis time _t: (exitdate-origin)/365.25

origin: time examdat2

id: id



No. of subjects = 1756 Number of obs = 4621

No. of failures = 244

Time at risk = 18547.87132

LR chi2(41) = 178.92

Log likelihood = -1662.3478 Prob > chi2 = 0.0000



----------------------------------------------------------------------

_t |

_d | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]

---------+------------------------------------------------------------

cursmok | 1.014992 .2085446 0.07 0.942 .6785331 1.518288



(remaining output omitted)

. stgest cursmok Agegrp* fibrin hearta gout highbp diabet chol cholsq
> bpsyst bpdias obese thin,
> visit(visit) firstvis(2)
> lagconf(cursmok fibrin hearta gout highbp diabet chol cholsq bpsyst
> bpdias obese thin)
> baseconf(fibrin hearta gout highbp cursmok chol cholsq diabet bpsyst
> bpdias obese thin)
> lasttime(mienddat) range(-2 2) saveres(caergestsmoknocens) replace



causvar: cursmok

visit: visit

Range: -2 2, rnum: 2

Search method: interval bisection



-2.00 2.00 0.00 1.00 0.50 0.25 0.13 0.19 0.22 0.23 0.24 0.24 0.24 0.24

0.38 0.31 0.34 0.36 0.37 0.37 0.37 0.37 -1.00 -0.50 -0.25 -0.13 -0.06

-0.03 -0.02 -0.01 -0.00 -0.00 -0.00



savres: caergestsmoknocens



G estimate of psi for cursmok: 0.239 (95% CI -0.001 to 0.368)



Causal survival time ratio for cursmok: 0.787 (95% CI 0.692 to 1.001)

[Figure: z (vertical axis, −2 to 2) plotted against psi (horizontal axis, about −.2 to .4)]
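The causal survival time ratio is exp(−ψ), so the ratio and its limits reported by stgest follow directly from the G-estimate; the limits swap because exp(−ψ) decreases as ψ increases. A quick Python check against the output above (the function name is ours):

```python
import math


def survival_time_ratio(psi, lower, upper):
    """Causal survival time ratio exp(-psi) with CI; the limits swap since exp(-x) decreases."""
    return math.exp(-psi), math.exp(-upper), math.exp(-lower)


ratio, lo, hi = survival_time_ratio(0.239, -0.001, 0.368)
print(f"{ratio:.3f} ({lo:.3f} to {hi:.3f})")  # 0.787 (0.692 to 1.001), as reported
```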

. weibull _t cursmok Agegrp* hearta gout highbp diabet fibrin chol
> cholsq bpsyst bpdias obese thin B* L* if visit>=2, dead(_d) t0(_t0) hr



_t | Haz. Ratio Std Err z P>|z| [95% Conf. Interval]

--------+---------------------------------------------------------

cursmok | 1.01690 .2083929 0.08 0.935 .6805221 1.519549



(rest of output omitted)



. gesttowb



g-estimated hazard ratio 1.28 ( 1.00 to 1.47)
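We assume gesttowb puts the G-estimate on the hazard-ratio scale using the standard Weibull relation between time ratios and hazard ratios: with shape parameter p, HR = TR^(−p) = exp(pψ). The fitted shape is among the omitted Weibull output, so the value below is purely illustrative, and `gest_to_hr` is a hypothetical name, not the command's code:

```python
import math


def gest_to_hr(psi, lower, upper, shape):
    """Weibull hazard ratio implied by a causal survival time ratio exp(-psi).

    With Weibull shape p, HR = exp(-psi) ** (-p) = exp(p * psi). The map is
    increasing in psi for p > 0, so the CI limits keep their order.
    """
    return tuple(math.exp(shape * x) for x in (psi, lower, upper))


# Illustrative shape value only; the fitted shape is not shown in the output.
print(gest_to_hr(0.239, -0.001, 0.368, shape=1.0))
```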

. * allowing for censoring due to competing risks;



. stgest cursmok Agegrp* fibrin hearta gout highbp diabet chol cholsq
> bpsyst bpdias obese thin, visit(visit) firstvis(2)
> lagconf(fibrin hearta gout highbp diabet cursmok chol cholsq bpsyst
> bpdias obese thin) baseconf(fibrin hearta gout highbp cursmok chol
> cholsq diabet bpsyst bpdias obese thin)
> lasttime(mienddat) saveres(caergestsmok) replace
> idcens(idcrcens) range(-2 2) pnotcens(pnotcens)



G estimate of psi for cursmok: 0.290 (95% CI -0.190 to 0.773)



Causal survival time ratio for cursmok: 0.748 (95% CI 0.462 to 1.210)



. gesttowb

g-estimated hazard ratio 1.34 ( 0.82 to 2.19)

Atherosclerosis Risk in Communities (ARIC) study

• 15,792 members of 4 communities in the USA

• baseline exam between 1987 and 1989

• 3 follow-up exams at 3-year intervals

• followed up for death, CHD and stroke

ARIC data



• Baseline

smoking history, education level, age,

sex, ethnicity, self-reported stroke/CHD

• Every visit

BP, BMI, smoking status, total, HDL

and LDL cholesterol, diabetes status,

use of anti-hypertensive medication

ARIC data



13898 persons with data on visits 1 and 2

7699 (55%) female

Mean age 54 (min 45, max 65)

CHD present in 625 (5%)

9754 (70%) not on anti-hypertensive

medication at visits 1 or 2.

Methods

Weibull analysis and G-estimation

Outcomes - death, incident CHD

CHD as outcome - exclude those with CHD at baseline (1st visit), censor if they die of other causes

Exposures - BP, smoking, BMI, HDL, LDL

BP - exclude those on anti-hypertensives at baseline, censor at the start of anti-hypertensive use
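A minimal sketch of the BP-analysis rules for one participant, assuming visit-level records sorted by visit with a hypothetical `on_antihyp` flag, and assuming censoring at the first visit where use is recorded (the slide does not say how the start of use is timed between visits):

```python
def bp_analysis_records(visits):
    """Apply the BP-analysis rules to one person's visits (sorted by visit number).

    visits: list of dicts with a boolean 'on_antihyp' flag (hypothetical field name).
    Returns [] if treated at baseline; otherwise the visits before the first
    visit at which anti-hypertensive use is recorded.
    """
    if not visits or visits[0]["on_antihyp"]:
        return []  # excluded: on anti-hypertensives at baseline
    kept = []
    for v in visits:
        if v["on_antihyp"]:
            break  # censor follow-up once anti-hypertensive use starts
        kept.append(v)
    return kept


person = [
    {"visit": 1, "on_antihyp": False},
    {"visit": 2, "on_antihyp": False},
    {"visit": 3, "on_antihyp": True},
]
print([v["visit"] for v in bp_analysis_records(person)])
```

Censoring at treatment start keeps the BP analysis about untreated blood pressure; the CHD rules (exclude prevalent CHD, censor at death from other causes) follow the same pattern.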

Results



Published in the American Journal of Epidemiology, April 15th 2002.

Tilling K, Sterne JAC, Szklo M. G-estimation of the effects of cardiovascular risk factors on all-cause mortality and CHD: the ARIC study. Am J Epidemiol 2002; 155: 710-718





Summary: compared with G-estimation, Weibull regression tended to underestimate the effects.

Discussion - model specification



The model specified that exposure at a given visit multiplies survival from that moment by a fixed factor, exp(−ψ).

Alternatives:

• the effect on survival lasts only for a given period (e.g. use of anti-hypertensives)

• the effect on survival starts only after a given period (e.g. a possible lagged effect of smoking)

Future work and (we hope) collaboration



• Implement marginal structural models (MSMs) in Stata

• Effect of cardiovascular risk factors (e.g. smoking,

fibrinogen) and anti-hypertensives in Caerphilly

study

• Effect of treatments (e.g. anti-hypertensives, anti-platelet agents) on stroke recurrence using the South London Stroke Register




• Causal effect of HAART

– When to start

– Effect of different drug combinations

– Will require large collaborations between

cohorts

– Aim to build on an existing collaboration between 13 cohorts involving 12,500 patients starting HAART


