					Dynamic analysis of binary
    longitudinal data

           Ørnulf Borgan
       Department of Mathematics
           University of Oslo

        Based on joint work with
 Rosemeire L. Fiaccone, Robin Henderson
        and Mauricio L. Barreto
- An example of binary longitudinal data:
  The Blue Bay project
- Modelling missingness for longitudinal binary
  data (including the relation to independent
  censoring in event history analysis)

- An additive model for longitudinal binary data

- Dynamic covariates

- Martingale residual processes
- Concluding comments
Blue Bay project:
Bahia State, Brazil (size of France)
State capital Salvador (pop: 2.5 mill.)

Public works and education in the areas of
sanitation and environment executed by the
Bahia State Government since 1997
Cost: more than $1 billion

[Photographs: Belgica 1996 and Belgica 2002]

Daily data on diarrhoea for almost a thousand
children (one per family)
Collected at home visits Oct 2000 to Jan 2002
Children less than 3 years of age at entry
Diarrhoea: three or more fluid motions a day

Episode of diarrhoea: sequence of days with
diarrhoea until at least two consecutive clear days
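
The episode definition above can be turned into a daily "new episode" indicator. The following is a minimal sketch, not the authors' code: the function name `new_episode_indicator` is hypothetical, the input is assumed to be a complete 0/1 daily series (missing days are not handled), and the days before the first record are assumed to be clear.

```python
def new_episode_indicator(daily):
    """Return a 0/1 list marking the first day of each diarrhoea episode.

    `daily` is a list of 0/1 values (1 = diarrhoea on that day). An episode
    ends only after at least two consecutive clear days, so a single clear
    day between two diarrhoea days does not start a new episode.
    """
    starts = []
    clear_run = 2  # assumption: the time before the first record counts as clear
    for y in daily:
        if y == 1:
            # A diarrhoea day starts a new episode only if the previous
            # episode has ended (two or more clear days in a row).
            starts.append(1 if clear_run >= 2 else 0)
            clear_run = 0
        else:
            starts.append(0)
            clear_run += 1
    return starts


daily = [0, 1, 1, 0, 1, 0, 0, 1, 0]
print(new_episode_indicator(daily))  # [0, 1, 0, 0, 0, 0, 0, 1, 0]
```

In this toy series the diarrhoea day after a single clear day belongs to the ongoing episode, while the one after two clear days starts a new episode.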

The reduced prevalence/incidence over time may
reflect improved health over the study period, or
may be an artefact due to ageing of the cohort
Social, demographic and economic
characteristics collected at entry to the study:

Follow-up information on 10 children:

[Figure legend: under observation; new episode (X); ongoing episode (X); drop-out (O)]

Pattern of missing observations for all 926 children:

[Figure annotations: data collector, police strike, St. John's day, Christmas Day]

Three types of missingness:
- Late entries (16% of children)
- Drop-outs (21% of children)
- Intermittent missingness (20% of observations)
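
The three types of missingness can be read off a child's observation indicators. The sketch below assumes one 0/1 value per study day (1 = observed) and uses working definitions that are assumptions, not taken from the slides: late entry means the first observed day is after day 0, drop-out means the last observed day is before the final study day, and intermittent missingness counts unobserved days between the first and last observation. The name `missingness_summary` is hypothetical.

```python
def missingness_summary(r):
    """Classify the missingness pattern of one child.

    `r` is a list of 0/1 observation indicators, one per study day.
    """
    observed_days = [t for t, v in enumerate(r) if v == 1]
    if not observed_days:
        # Never observed: none of the three categories applies.
        return {"late_entry": False, "drop_out": False, "intermittent_days": 0}
    first, last = observed_days[0], observed_days[-1]
    return {
        "late_entry": first > 0,
        "drop_out": last < len(r) - 1,
        # Days inside the follow-up window on which the child was not observed.
        "intermittent_days": (last - first + 1) - len(observed_days),
    }


print(missingness_summary([0, 0, 1, 1, 0, 1, 0, 0]))
```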
Features of the data:
Longitudinal binary data
Four time scales: calendar, age, study, episode
Calendar time used as basic time scale
Study factors of importance for incidence and
prevalence of diarrhoea and how diarrhoea
incidence and prevalence vary over calendar time

Ignored (for this talk):
Spatial associations
Other non-independence
Modelling missingness:

- Model for binary data without missingness:
  parameters of interest are defined for this model
- Joint model for binary data and missingness:
  conditions on the missingness are defined for this model
- Model for observed data:
  statistical methods are derived and studied for this model

We need to relate the models for the three situations
(starting with models for one individual)
Model without missingness
The observations for child i form a binary time series

        $Y_{i1}, Y_{i2}, \ldots, Y_{iT}$
Here $Y_{it} = 1$ if the child starts a new episode of
diarrhoea at day t (or, for prevalence, has diarrhoea at day t)

Let $\mathcal{H}_{i0}$ be the $\sigma$-algebra generated by the fixed
and external time-varying covariates for child i

$\mathcal{H}_{it} = \mathcal{H}_{i0} \vee \sigma\{Y_{i1}, Y_{i2}, \ldots, Y_{it}\}$ is the information
that would have been available on child i by day t had
there been no missingness
Introduce the conditional probabilities
          $\alpha_{it} = P(Y_{it} = 1 \mid \mathcal{H}_{i,t-1})$
The aim of our analysis is to study how the $\alpha_{it}$
vary over time and how they depend on covariates,
including dynamic covariates that are functions of
$Y_{is}$ for s < t
This differs from the common approach in
longitudinal data analysis, where the focus is on
the marginal probabilities

           $\mu_{it} = P(Y_{it} = 1 \mid \mathcal{H}_{i0})$
Joint model for binary longitudinal
data and missingness
Introduce the observation process for individual i

$$R_{it} = \begin{cases} 1 & \text{observed at } t \\ 0 & \text{not observed at } t \end{cases}$$

We need to consider the larger filtration:

$$\mathcal{G}_{it} = \mathcal{G}_{i0} \vee \sigma\{R_{i1}, Y_{i1}, R_{i2}, Y_{i2}, \ldots, R_{it}, Y_{it}\}$$
where $\mathcal{G}_{i0}$ is generated by $\mathcal{H}_{i0}$ and external
aspects of the observation process for child i
We make two assumptions on the missingness:

(1) $P(Y_{it} = 1 \mid \mathcal{G}_{i,t-1}) = P(Y_{it} = 1 \mid \mathcal{H}_{i,t-1})$

(2) $Y_{it}$ and $R_{it}$ are independent, given $\mathcal{G}_{i,t-1}$

These assumptions correspond to:
• sequential MAR in longitudinal data analysis
• independent censoring in event history analysis

Modelling the observable data

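Binary observations for individual i: $\tilde{Y}_{it} = R_{it} Y_{it}$

Observed filtration:

$$\mathcal{F}_{it} = \mathcal{G}_{i0} \vee \sigma\{R_{i1}, \tilde{Y}_{i1}, R_{i2}, \tilde{Y}_{i2}, \ldots, R_{it}, \tilde{Y}_{it}, R_{i,t+1}\}$$

(Note that for convenience we have included
$R_{i,t+1}$ in the definition of $\mathcal{F}_{it}$)

Then:
$$\begin{aligned}
\lambda_{it} &= P(\tilde{Y}_{it} = 1 \mid \mathcal{F}_{i,t-1}) \\
 &= E\{E(R_{it} Y_{it} \mid \mathcal{G}_{i,t-1}, R_{it}) \mid \mathcal{F}_{i,t-1}\} \\
 &= R_{it}\, E\{P(Y_{it} = 1 \mid \mathcal{G}_{i,t-1}) \mid \mathcal{F}_{i,t-1}\} \\
 &= R_{it}\, E\{\alpha_{it} \mid \mathcal{F}_{i,t-1}\}
\end{aligned}$$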
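We will assume that $\alpha_{it}$ is $\mathcal{F}_{it}$-predictable,
implying that the time-dependent dynamic
covariates used for regression modelling
depend only on observables

Thus: $\lambda_{it} = R_{it} \alpha_{it}$
Introduce $\varepsilon_{it} = \tilde{Y}_{it} - \lambda_{it}$

The $\varepsilon_{it}$ are martingale differences

$M_{it} = \sum_{s \le t} \varepsilon_{is}$ is a discrete-time martingale

Predictable variation process:

$$\langle M_i \rangle_t = \sum_{s \le t} \mathrm{Var}(\varepsilon_{is} \mid \mathcal{F}_{i,s-1}) = \sum_{s \le t} \lambda_{is} (1 - \lambda_{is})$$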

Modelling the relation between individuals
Denote by $\mathcal{F}_t$ the information available to the
researcher on all children by day t
We impose the following assumptions:

(i)    $\lambda_{it} = P(\tilde{Y}_{it} = 1 \mid \mathcal{F}_{i,t-1}) = P(\tilde{Y}_{it} = 1 \mid \mathcal{F}_{t-1})$
(ii)   $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{jt} \mid \mathcal{F}_{t-1}) = 0$ for $i \ne j$
The assumptions are weaker than independence
Nevertheless they are debatable [(i) in particular]
for the diarrhoea data
Note that (ii) implies that the martingales
$M_{it}$ and $M_{jt}$ are orthogonal
An additive model for longitudinal
binary data
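We have the decomposition $\tilde{Y}_{it} = \lambda_{it} + \varepsilon_{it}$
Let $x_{i1t}, \ldots, x_{ipt}$ be predictable covariates for
child i at day t

Consider the model
$$\lambda_{it} = R_{it} \alpha_{it} = R_{it} \{\beta_{0t} + \beta_{1t} x_{i1t} + \cdots + \beta_{pt} x_{ipt}\}$$

Conditional on "the past" $\mathcal{F}_{t-1}$, at day t we have
$$\tilde{Y}_{it} = \lambda_{it} + \varepsilon_{it} = \beta_{0t} R_{it} + \beta_{1t} R_{it} x_{i1t} + \cdots + \beta_{pt} R_{it} x_{ipt} + \varepsilon_{it}$$
i.e. a linear regression model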
We may estimate the $\beta_{jt}$ by ordinary least
squares at each day t (quick!)

The estimates for each day will be quite
unstable, but they may be accumulated over
time to get stable estimates for the cumulative
regression coefficients
            $B_{jt} = \sum_{s \le t} \beta_{js}$
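
The day-by-day least squares fit and its accumulation can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `cumulative_coefficients` and the array layout are hypothetical, the simulated data are invented for the demonstration, and days on which the design matrix is rank deficient are assumed to contribute a zero increment.

```python
import numpy as np


def cumulative_coefficients(Y_obs, R, X):
    """Accumulate daily OLS estimates into cumulative regression coefficients.

    Y_obs : (T, n) observed responses (response times observation indicator)
    R     : (T, n) observation indicators (1 = observed)
    X     : (T, n, p) predictable covariates
    Returns an array of shape (T, p + 1); column 0 is the baseline.
    """
    T, n, p = X.shape
    increments = np.zeros((T, p + 1))
    for t in range(T):
        # Rows are R_it * (1, x_i1t, ..., x_ipt); unobserved children give zero rows.
        design = R[t][:, None] * np.column_stack([np.ones(n), X[t]])
        # Assumption: skip days where the design is not of full column rank.
        if np.linalg.matrix_rank(design) == p + 1:
            increments[t] = np.linalg.solve(design.T @ design, design.T @ Y_obs[t])
    return np.cumsum(increments, axis=0)


# Simulated toy data (invented): one binary covariate, about 20% missing at random.
rng = np.random.default_rng(0)
T, n = 200, 500
X = rng.integers(0, 2, size=(T, n, 1)).astype(float)
R = (rng.random((T, n)) < 0.8).astype(float)
prob = 0.02 + 0.03 * X[:, :, 0]
Y_obs = R * (rng.random((T, n)) < prob)
B_hat = cumulative_coefficients(Y_obs, R, X)
print(B_hat[-1])  # the true cumulative values in this simulation are (4, 6)
```

Each daily estimate is noisy, but the cumulative sums are much more stable, which is the point made above.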
Some estimated
cumulative regression
coefficients for a
model for incidence
with fixed covariates
(may be interpreted
as expected numbers)

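We have (using "obvious" matrix notation)

$$\hat{B}_t = \sum_{s \le t} \hat{\beta}_s = \sum_{s \le t} (X_s^T X_s)^{-1} X_s^T \tilde{Y}_s, \qquad \tilde{Y}_s = X_s \beta_s + \varepsilon_s$$

$$\hat{B}_t = B_t + \sum_{s \le t} (X_s^T X_s)^{-1} X_s^T \varepsilon_s$$

where the last sum is a martingale transformation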

Properties may be derived using martingale
methods as for Aalen's additive hazards model for
time-continuous event history data.
In particular $\hat{B}_t$ is approximately multivariate
normal with a covariance matrix that may be
estimated by

$$\sum_{s \le t} (X_s^T X_s)^{-1} X_s^T \, \mathrm{diag}\{\hat{\lambda}_s (1 - \hat{\lambda}_s)\} \, X_s (X_s^T X_s)^{-1}$$
Dynamic covariates

                     How can past
                     episodes of
                     diarrhoea be
                     used to predict
                     future episodes?

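Consider dynamic covariates of the form:

$$x_{it} = \frac{\sum_{s<t} w_{st} \tilde{Y}_{is}}{\sum_{s<t} w_{st} R_{is}}$$

with $\tilde{Y}_{is}$ the observed incidence (prevalence) of diarrhoea

$$w_{st} = \begin{cases} 1 & \text{for } t - s \le \nu \\ \exp\{-\rho (t - s - \nu)\} & \text{for } t - s > \nu \end{cases}$$

Use $\nu = 30$ days and $\rho = 0.01$ below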
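
A dynamic covariate of this weighted-average form can be computed directly from a child's past. The sketch below rests on assumptions: the weight is taken to be 1 for lags up to the 30-day threshold and to decay exponentially at rate 0.01 beyond it, and the covariate is set to 0 when the child has no observed past. The names `weight` and `dynamic_covariate` are hypothetical.

```python
import math


def weight(lag, window=30, rate=0.01):
    """Weight for an observation `lag` days in the past.

    Assumption: full weight inside the window, exponential decay beyond it.
    """
    return 1.0 if lag <= window else math.exp(-rate * (lag - window))


def dynamic_covariate(y_obs, r, t, window=30, rate=0.01):
    """Weighted average of past observed responses for one child at day t.

    `y_obs[s]` is the observed response and `r[s]` the observation
    indicator on day s; only days s < t are used, so the covariate is
    predictable.
    """
    num = sum(weight(t - s, window, rate) * y_obs[s] for s in range(t))
    den = sum(weight(t - s, window, rate) * r[s] for s in range(t))
    return num / den if den > 0 else 0.0  # assumption: 0 when there is no observed past


y_obs = [1, 0, 1, 0]
r = [1, 1, 1, 0]
print(dynamic_covariate(y_obs, r, t=4))  # 2 diarrhoea days out of 3 observed days
```

Dividing by the weighted number of observed days, rather than all days, keeps the covariate comparable between children with different amounts of missing data.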
A dynamic covariate may be on the causal pathway
between a fixed covariate and the event process

The inclusion of dynamic covariates in the analysis
may distort the estimation of the effects of the fixed covariates

To avoid such distortion we at each time t regress
the dynamic covariates on the fixed covariates and
use the residuals from these fits as new covariates

This procedure keeps the effect of the fixed
covariates the same as in the model without the
dynamic covariates
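
The residualisation step at a single day can be sketched as an ordinary least squares fit of the dynamic covariate on the fixed covariates. The sketch assumes the fit uses only the children observed on that day and that unobserved children get residual 0. The function name `residualize` is hypothetical.

```python
import numpy as np


def residualize(dynamic, fixed, observed):
    """Replace a dynamic covariate by its residual after regression on fixed covariates.

    dynamic  : (n,) dynamic covariate values at day t
    fixed    : (n, p) fixed covariates
    observed : (n,) boolean array, True if the child is observed at day t
    """
    resid = np.zeros(len(dynamic), dtype=float)
    # Intercept plus fixed covariates, restricted to observed children (assumption).
    design = np.column_stack([np.ones(int(observed.sum())), fixed[observed]])
    coef = np.linalg.lstsq(design, dynamic[observed], rcond=None)[0]
    resid[observed] = dynamic[observed] - design @ coef
    return resid


fixed = np.array([[0.0], [1.0], [0.0], [1.0], [1.0]])
dynamic = np.array([0.1, 0.4, 0.3, 0.2, 0.9])
observed = np.array([True, True, True, True, False])
print(residualize(dynamic, fixed, observed))
```

The residuals are orthogonal to the fixed covariates among the observed children, which is why adding them to the regression leaves the fixed-covariate estimates unchanged.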
Cumulative regression coefficients for incidence:

[Figure panels: average number of diarrhoea episodes; average number of days with diarrhoea]

Also: male, 3 or more per bedroom, contaminated water source,
open sewerage, rain affected accommodation, young mother
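Martingale residual processes

$$\hat{M}_{it} = \sum_{s \le t} (\tilde{Y}_{is} - \hat{\lambda}_{is})$$

or, in vector form,

$$\hat{M}_t = \sum_{s \le t} (\tilde{Y}_s - \hat{\lambda}_s) = \sum_{s \le t} \{I - X_s (X_s^T X_s)^{-1} X_s^T\} \varepsilon_s$$

where the last sum is a martingale transformation

[Figure: examples of martingale residual processes (standardized by model-based SDs)]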
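
Martingale residual processes are the accumulated daily least squares residuals. The following sketch assumes the daily design matrices have already been multiplied by the observation indicators and that rank-deficient days contribute nothing. The function name `martingale_residuals` is hypothetical.

```python
import numpy as np


def martingale_residuals(Y_obs, D):
    """Cumulative residuals, one process per child.

    Y_obs : (T, n) observed responses
    D     : (T, n, q) daily design matrices (rows are zero when unobserved)
    Returns an array of shape (T, n).
    """
    T, n, q = D.shape
    increments = np.zeros((T, n))
    for t in range(T):
        Xs = D[t]
        if np.linalg.matrix_rank(Xs) == q:  # assumption: skip deficient days
            beta_hat = np.linalg.solve(Xs.T @ Xs, Xs.T @ Y_obs[t])
            # Observed response minus fitted probability for every child.
            increments[t] = Y_obs[t] - Xs @ beta_hat
    return np.cumsum(increments, axis=0)


# Intercept-only toy example with four children observed on two days.
D = np.ones((2, 4, 1))
Y_obs = np.array([[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]])
print(martingale_residuals(Y_obs, D))
```

With an intercept in the model the residuals sum to zero over the observed children on each day, so systematic drift in the residual processes of a subgroup points to lack of fit for that subgroup.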

Empirical standard deviations of the
martingale residual processes:

Cumulative regression coefficients for prevalence:

[Figure panels: baseline; average number of days with diarrhoea; diarrhoea previous day (lag 1); lag 2 (residual effect); lag 3 (residual effect); lag 4 (residual effect)]

 Also: male, age, 3 or more per bedroom, poor street,
 contaminated water storage and source, standing water, open
 sewerage, rain affected accommodation, young mother
Prevalence: empirical standard deviations of
the martingale residual processes

Not Markovian!

Concluding comments:
A dynamic additive model provides a flexible
framework for analyzing longitudinal binary data
The method illustrates how ideas and approaches
from event history analysis may be useful for
analysis of longitudinal data
Advantage: method is computationally very quick

Drawback: incidence and prevalence are not
restricted to the range 0 to 1
Methodological work is needed, in particular on
methods for model selection and goodness-of-fit
