Document Sample

Page Contents 1 of 17 Print Regression Spline-Based Flexible Relative Survival Model: Estimation, Inference and Validation Michal Abrahamowicz1; Amel Mahboubi1; Jean Faivre2; Claire Bonithon-Kopp3; Catherine Quantin34 1 Department of Epidemiology and Biostatistics, McGill University, Montreal, Que., Canada H3A 1A1. 2 Inserm, U866, Dijon, F-21079, France; Univ Bourgogne, Registre Bourguignon des Cancers Digestifs, Dijon, F-21079, France. 3 Inserm, U866, Dijon, F-21079, France; Univ Bourgogne, Dijon, F-21079, France. 4 Department of Biostatistics and Medical Informatics. Dijon University Hospital, France. E-mail: michal.abrahamowicz@mcgill.ca; amel.mahboubi@mail.mcgill.ca; catherine.quantin@chu-dijon.fr IPMs Sessions Funding: A Mahboubi was supported by a research grant from the Regional Council of Burgundy and INSERM. M. Abrahamowicz is a James McGill Professor of Biostatistics at McGill University. This research was supported by the NSERC grant # 228203 and the CIHR grant # MOP-81275. 1. Introduction Relative survival is an increasingly popular approach to modelling censored survival data in applications where the individual causes of death remain unknown, such as cohort studies based on cancer registries (Dickman, Sloggett et al. 2004). Several relative survival regression models have been proposed to discriminate between the effects of covariates on disease-specific mortality from their effects on other-causes mortality (Hakulinen and Tenkanen 1987; Esteve, Benhamou et al. 1990; Giorgi, Abrahamowicz et al. 2003; Dickman, Sloggett et al. 2004; Stare, Henderson et al. 2005; Remontet, Bossard et al. 2007). B T Page Contents 2 of 17 Print The current study is motivated by our belief that, similar to crude survival, an accurate assessment of the effects of continuous prognostic factors on relative survival requires simultaneous modelling and testing of both time-dependent and non-loglinear effects. In particular, we expand the relative survival model originally proposed by Esteve et al (Esteve, Benhamou et al. 1990) in a similar way Abrahamowicz and MacKenzie (Abrahamowicz and MacKenzie 2007) recently expanded the Cox’s model for crude survival. In the flexible ‘product model’ developed by Abrahamowicz and MacKenzie (Abrahamowicz and MacKenzie 2007), the effect of a continuous covariate on the logarithm of the hazard is modelled as a product of two ‘marginal’ IPMs functions, which represent, respectively, its time-dependent effect and non-linear relationship with the log- Sessions hazard. 2. Joint Flexible Modelling of Time-Dependent and Non-Loglinear Effects in Relative Survival We propose a flexible extension of the parametric relative survival regression model proposed by Esteve et al. (Esteve, Benhamou et al. 1990). Their additive-hazards model assumes that the observed hazard for all- cause mortality, λt , at time t after diagnosis of an individual with age a at diagnosis and with a covariate vector z, is the sum of two hazards: λt (t z, a ) = λe (t + a z s ) + λc (t z ) (1) B T Page Contents 3 of 17 Print The first component, λ e , represents the expected hazard function for all-causes mortality in the underlying general population, stratified on a limited subset zs of covariates, typically age and sex, and is obtained from relevant mortality statistics using external sources. The second component, λ c , is the hazard function for disease-related mortality, which is estimated from the data at hand. Disease-specific hazard, conditional on the covariate vector z is expressed as (Esteve, Benhamou et al. 1990): p r λc ( t z ) = exp ( ∑ β i z i ) ∑τ k I k (t ) (2) th i =1 k =1 where τk is the baseline mortality hazard in the k time-segment for patients with z=0, Ik is an indicator IPMs Sessions function (Ik=1 if tk-1 ≤ t < tk, and 0 otherwise), and βi (i=1,…,p) are the log hazard ratios corresponding to each of the p covariates. Model (2) carries the assumption of proportionality of hazards (PH), as βi are constant, i.e. independent of time t. Moreover, for quantitative predictors, βi is multiplied by an un-transformed predictor value zi, which implies log-linearity (LG), i.e. linearity of the predictor effect on the log hazard. Our new model respects the assumption of the additivity of the hazards of (i) all-causes and (ii) disease- specific mortality, as postulated by equation (1). The difference between our new model and the model originally proposed by Esteve et al (Esteve, Benhamou et al. 1990) concerns equation (2). In particular, we B T Page Contents 4 of 17 Print replace the parametric modelling of both baseline disease-specific hazard, and of covariates effects on this hazard, by a more flexible approach. Firstly, similar to relative survival models proposed by Giorgi et al (Giorgi, Abrahamowicz et al. 2003) and Remontet et al (Remontet, Bossard et al. 2007), to avoid clinically implausible ‘jumps’ in the absolute risks, we assume the baseline hazard is a smooth function of the follow- up time γ(t) rather than a step function as in (2). Secondly, when modelling the effects of continuous covariates on the baseline log hazard, we simultaneously relax both the PH and the log-linearity assumptions. Specifically, following the flexible product model proposed for crude survival by Abrahamowicz and IPMs MacKenzie (Abrahamowicz and MacKenzie 2007), we model the logarithm of the hazard ratio associated Sessions with a value zi of a (continuous) ith covariate, at time t, as a product of two functions: α i ( z i ) and β i (t ) . Accordingly, we write the disease-specific hazard, conditional on the covariate vector z, as: λc (t z ) = exp(γ (t )) * exp(∑ α i ( z i ) * β i (t )) (3) i As in (Abrahamowicz and MacKenzie 2007), α i ( z i ) in (3) represents a possibly non-linear dose-response function (drf), i.e. describes how the log hazard ratio changes with increasing value of the ith covariate. Because in (3), this function does not depend on t, the shape of the drf is assumed to remain unchanged over time. However, the impact of these differences in the covariate values does vary over time because α i ( z i ) is multiplied by the time-dependent function (tdf) β i (t ) . In other words, higher values of β i (t ) indicate that at a corresponding time t the differences in the values of zi have bigger impact on the disease-specific hazard. B T Page Contents 5 of 17 Print Similar to Abrahamowicz and MacKenzie (Abrahamowicz and MacKenzie 2007) and Giorgi et al (Giorgi, Abrahamowicz et al. 2003), all three functions in (3) are estimated by low-dimension quadratic regression splines, i.e. smooth piecewise quadratic polynomials, whose pieces join each other at pre-specified argument values termed knots (Ramsay 1988). Using quadratic splines ensures the continuity of the estimated function and its first derivative, while limiting the number of the estimated coefficients, which enhances statistical power for testing non-linear and/or time-dependent effects (Abrahamowicz, MacKenzie et al. 1996). Specifically, following (Abrahamowicz and MacKenzie 2007), we used 1 knot, i.e. a 2-pieces quadratic spline to model the non-linear drf α i ( z i ) , and 2 knots, corresponding to a 3-pieces spline for both functions IPMs Sessions of time: the time-dependent function βi(t) and the baseline log hazard γ(t). Finally, knots were positioned so as to ensure optimal and equal data support in each of the resulting intervals (Ramsay 1988). This implied a single knot at the sample median value of a continuous covariate for α i ( z i ) , and two knots at the terciles of the observed distribution of un-censored event times for both β i (t ) and γ(t) (Abrahamowicz, Ciampi et al. 1992). Accordingly, in our model (3): 3 α i ( z i ) = ∑ α il * Bl ,3 ( z i ) (4a) l =1 5 β i (t ) = ∑ β i , j * B j ,3 (t ) (4b) = j5 1 γ (t ) = ∑ γ k * Bk ,3 (t ) (4c) B T k =1 Page Contents 6 of 17 Print where B.,3 are quadratic, i.e. order 3, B-splines (de Boor 1978). Estimation of model (3) poses an un-identifiability problem because the two functions of the same covariate α i ( z i ) and β i (t ) ‘share the scale’ (Abrahamowicz and MacKenzie 2007). Abrahamowicz and MacKenzie (Abrahamowicz and MacKenzie 2007) suggested to circumvent this problem by imposing a ‘scale’ constraint on α i ( z i ) , so that the range of its estimated values is a priori constrained to 1. This implies a constraint that a value of β i (t ) , at a given time t, represents the adjusted difference in log hazards between IPMs the values of z i corresponding to, respectively, the highest and the lowest risk (Abrahamowicz and Sessions MacKenzie 2007). However, in the joint estimation procedure employed by Abrahamowicz and MacKenzie, implementation of the scale constraint required a complex trial-and-error procedure and caused occasional non-convergence problems (Abrahamowicz and MacKenzie 2007). To achieve an efficient full maximum likelihood estimation of model (3), we use an iterative alternating conditional algorithm. Alternating conditional algorithms are especially useful for estimation of complex models, involving different types of coefficients, some of which are subject to specific constraints that do not apply to other estimable coefficients (Ramsay and Abrahamowicz 1989; Abrahamowicz and Ramsay 1992). The basic idea is to partition the estimation into consecutive steps, each of which involves estimation of only a subset of all coefficients of interest. Simulations indicated very good performance of the alternating B T Page Contents 7 of 17 Print conditional estimation of complex regression spline-based models with various constraints (Ramsay and Abrahamowicz 1989; Abrahamowicz and Ramsay 1992). To estimate model (3), we implement a 3-step alternating conditional full maximum likelihood estimation (mle) algorithm. The 3 steps involve flexible estimation of, respectively, (1) baseline log hazard function γ(t), defined in (4c); (2) non-linear dose-response functions α i ( z i ) in (4a), for each continuous covariate z i ; and (3) time-dependent effect β i (t ) in (4b). At each step, the estimation of respective coefficients is IPMs conditional on the most recent estimates of all the coefficients of the two other functions. The iterations stop Sessions after the total reduction in the negative log likelihood (LogL) relative to the previous iteration, i.e. the sum of the changes in LogL attained at each of the three steps, is below 0.1. 3. Application We re-assess the role of prognostic factors for mortality among 3367 patients diagnosed with colon cancer. Time 0 was the date of the cancer diagnosis. Time-to-event was defined as the time to death of any cause. The cause of death was not recorded. There were 1593 deaths during 5 years of follow-up (52.7% censoring rate). B T Page Contents 8 of 17 Print Available prognostic factors included two continuous covariates: age at diagnosis and calendar period of diagnosis (1976 to 2000), as well as gender, tumour location (right vs left colon), and cancer stage at diagnosis, categorized into 4 stages (I-IV). Period of diagnosis was included among ‘prognostic factors’ to investigate if survival of colon cancer patients did change between 1976 and 2000. Parametric multivariable regression models, which impose conventional PH and log-linearity assumptions included the Cox’s proportional hazards (PH) model (Cox 1972) and the relative survival model of Esteve et al (Esteve, Benhamou et al. 1990). In all relative survival analyses, the expected hazard mortality ( λ e ) was IPMs Sessions quantified based on published age-, gender- and year of death-specific mortality rates in the general population of Côte d’Or, region in France, in which the 3367 cancer patients were identified. In both models older age, male gender, and higher cancer stage at diagnosis were all associated with significantly higher mortality. For period, the adjusted HR was significantly lower than 1.0 and suggested that with each consecutive year between 1976 and 2000 the risk of death decreased by about 3%. In our flexible multivariable ‘product model’ (3) for relative survival, for the two continuous covariates, age at diagnosis and calendar period of diagnosis, we modelled both non-linear and time-dependent effects (equations (4a) and (4b)). For each of the dummy variables for stages II-IV, we estimated their time- dependent effects. In contrast, we a priori assumed that the effects of gender and cancer location were B T Page Contents 9 of 17 Print constant over time and represented these effects by a single parameter (adjusted HR). Finally, the baseline log hazard γ(t) was modelled by the regression spline expansion defined in (4c). Table 1 summarizes the results of non-parametric tests. Each continuous variable shows very significant violations of both log-linearity and PH assumption, and the latter is also rejected for the two more advanced cancer stages (Table 1). Accordingly, we include all these effects in our final model. Table 1: p-values for the non-parametric LRT tests of non-linear (NL) and time-dependent (TD) effects in the full multivariable flexible model. IPMs Sessions Variable LRT for Test of NL LRT for Test of TD Age 0,0007 0,0001 Period of diagnosis 0,017 0,0018 Stage II 0,2715 Stage III 0,0001 Stage IV 0,0001 Figure 1 shows the adjusted spline estimates of time-dependent HR, relative to stage I, for cancer stages II- IV, obtained from the full flexible multivariable model. For advanced stages III and IV, the relative risks vary considerably over time, increasing steeply in the first year and then decreasing gradually. Comparison between the estimates shown in Figure 1 and the Esteve et al’s model reported in the right part of Table 1 B T Page Contents 10 of 17 Print indicates that, by imposing the incorrect PH assumption, the latter model substantially under-estimates the impact of high cancer stages on disease-specific mortality around 1-2 years after diagnosis. IPMs Sessions Figure 1: Time-dependent effects of cancer stages, relative to stage I. B T Page Contents 11 of 17 Print Figure 2 describes the adjusted effect of age at diagnosis on the log hazard of colon cancer-specific mortality. The spline estimate of the dose-response function (drf) in Figure 2(a) is non-monotone, with the lowest risk for patients diagnosed around 50 years, and higher risks for both very young and older subjects, which explains why the log-linearity hypothesis was rejected (p=0.0007 in Table 2). However, the significant time-dependent effect (tdf) in Figure 2(b), indicates that the impact of age at diagnosis is mostly limited to IPMs the first 0.5 year after diagnosis. The 3D surface in Figure 2(c) shows how the log hazard (vertical axis) Sessions changes with increasing age at diagnosis (horizontal axis) and follow-up time (the 3rd axis). Figure 2(d) compares the drf’s for age corresponding to selected times after cancer diagnosis, i.e. shows the horizontal cross-sections of the 3D surface at selected follow-up times. Whereas all four curves in Figure 2(d) replicate the U-shaped drf from panel (a), the curve for 6 months after diagnosis shows much bigger differences in risk than the four curves for the later phases of disease. B T Page Contents 12 of 17 Print IPMs Sessions B T Figure 2: Flexible modeling of the effects of age at cancer diagnosis. Page Contents 13 of 17 Print Figure 3 focuses on the adjusted effect of the calendar period of cancer diagnosis. Even if excess mortality generally decreased between 1976 and 2000, the drf curves in Figure 3 suggest two distinct calendar periods, of improving survival (1976-1985 and 1995-2000). IPMs Sessions B T Figure 3: Non-linear effects of calendar year at diagnosis, at selected times since diagnosis. Page Contents 14 of 17 Print 4. Discussion In the context of ‘crude survival’, Abrahamowicz and MacKenzie (Abrahamowicz and MacKenzie 2007), incorporate flexible modelling of both non-linear and time-dependent effects of continuous covariates. The same paper demonstrates that accounting for both effects is essential to avoid biased estimates and/or incorrect conclusions (Abrahamowicz and MacKenzie 2007). Therefore, we have developed a flexible relative survival model, which generalizes the additive-hazards relative survival model of Esteve et al (Esteve, Benhamou et al. 1990) by implementing the ‘product model’ approach proposed by Abrahamowicz IPMs and MacKenzie (Abrahamowicz and MacKenzie 2007) to estimate the effects of continuous prognostic Sessions factors on the hazard of disease-specific mortality. Flexible modelling of a continuous age effect revealed a statistically significant non-monotonicity, with a clinically important decrease in disease-specific mortality between 30 and 50 years of age at diagnosis (Figure 2). For the period of cancer diagnosis, Figure 3 shows that clinically meaningful improvements in disease-specific survival occurred in two distinct periods: 1976- 1985 and 1995-2000. It is likely that the two periods reflect different changes in medical or surgical procedures. Additional insights were obtained from flexible estimates of time-dependent effects. Age at diagnosis was found to have very important impact on the risk of dying almost immediately after diagnosis but matter little B T Page Contents 15 of 17 Print after the first year of follow-up. In contrast, higher cancer stage had little impact on mostly post-surgical mortality in the first 6 months after diagnosis. Both our relative survival model (3) and the crude survival model in (Abrahamowicz and MacKenzie 2007) assume that the two functions of the same covariate α i ( z i ) and β i (t ) are multiplied by each other. This implies that the shape of the flexible drf α i ( z i ) does not change over time, even if the strength of the covariate impact may change, as reflected by the tdf estimate β i (t ) (Abrahamowicz and MacKenzie 2007). IPMs Future research should develop efficient tests of the shape-invariance assumption, as well as reliable and Sessions user-friendly regression diagnostic methods for complex relative survival models. The methods proposed by Stare et al (Stare, Pohar et al. 2005) may be useful in this context. We add another method to the growing toolbox for flexible modelling of relative survival (Giorgi, Abrahamowicz et al. 2003; Lambert, Smith et al. 2005; Remontet, Bossard et al. 2007). Our model (3) can be considered an extension of the spline-based time-dependent relative survival model developed by Giorgi et al (Giorgi, Abrahamowicz et al. 2003). Limited comparisons between the two models confirmed that, similar to crude survival (Abrahamowicz and MacKenzie 2007), accounting for the non-loglinear effect of a continuous covariate may substantially change the estimate of its time-dependent effect. Indeed, in our dataset the model of Giorgi et al did not detect any significant time-dependent effect of the period of B T Page Contents 16 of 17 Print diagnosis (data not shown), in contrast to our results (Table 1). It would be interesting to explore if and how inclusion of the possibly non-loglinear effects of continuous covariates affects the fractional polynomials estimates of time dependent effects in multiplicative and additive relative survival models proposed by Lambert et al (Lambert, Smith et al. 2005), and to compare it with the flexible spline-based relative survival model of Remontet et al (Remontet, Bossard et al. 2007). In spite of some differences in the formulation of the model, its estimation and inference, all these models enhance our understanding of disease progression, and may ultimately improve clinical prognosis and management of various cancers. IPMs Sessions REFERENCES [1] Abrahamowicz, M., A. Ciampi, et al. (1992). "Nonparametric density estimation for censored survival data: regression-spline approach." The Canadian Journal of Statistics 20(2): 171-185. [2] Abrahamowicz, M., T. MacKenzie, et al. (1996). "Time-dependent hazard ratio: modelling and hypothesis testing with application in lupus nephritis " Journal of the American Statistical Association 91(436): 1432-1439. [3] Abrahamowicz, M. and T. A. MacKenzie (2007). "Joint estimation of time-dependent and non-linear effects of continuous covariates on survival." Statistics in Medicine 26(2): 392-408. [4] Abrahamowicz, M. and J. O. Ramsay (1992). "Multicategorical spline model for time response theory." Psychometrika 57(1): 5-27. [5] Cox, D. R. (1972). "Regression models and life tables (with discussion)." Journal of the Royal Statistical Society, Series B 34: 187-220. B T [6] de Boor, C. (1978). A Practical guide to Splines. New York. Page Contents 17 of 17 Print [7] Dickman, P. W., A. Sloggett, et al. (2004). "Regression models for relative survival." Statistics in Medicine 23(1): 51-64. [8] Esteve, J., E. Benhamou, et al. (1990). "Relative survival and the estimation of net survival: elements for further discussion." Statistics in Medicine 9(5): 529-38. [9] Giorgi, R., M. Abrahamowicz, et al. (2003). "A relative survival regression model using B-spline functions to model non-proportional hazards." Statistics in Medicine 22(17): 2767-84. [10] Hakulinen, T. and L. Tenkanen (1987). "Regression analysis of survival rates." Journal of the Royal Statistical Society, Series C 36: 309-317. [11] Lambert, P. C., L. K. Smith, et al. (2005). "Additive and multiplicative covariate regression models for IPMs relative survival incorporating fractional polynomials for time-dependent effects." Statistics in Sessions Medicine 24(24): 3871-85. [12] Ramsay, J. O. (1988). "Monotone regression splines in action (with discussion)." Statistical Science 3(4): 425-461. [13] Ramsay, J. O. and M. Abrahamowicz (1989). "Binomial regression with monotone splines: A psychometric approach." Journal of the American Statistical Association 84: 906-915. [14] Remontet, L., N. Bossard, et al. (2007). "An overall strategy based on regression models to estimate relative survival and model the effects of prognostic factors in cancer survival studies." Statistics in Medicine 26(10): 2214-28. [15] Stare, J., R. Henderson, et al. (2005). "An individual measure of relative survival." Journal of the Royal Statistical Society, Series C 54: 115-126. [16] Stare, J., M. Pohar, et al. (2005). "Goodness of fit of relative survival models." Stat Med 24(24): 3911- 25. B T

DOCUMENT INFO

Shared By:

Categories:

Tags:
regression splines, impact of information technology, congenital toxoplasmosis, ocular lesions, regression models, semiparametric regression, productivity impact, covariance structure, Longitudinal Data, empirical study

Stats:

views: | 18 |

posted: | 2/12/2011 |

language: | English |

pages: | 17 |

OTHER DOCS BY sdsdfqw21

How are you planning on using Docstoc?
BUSINESS
PERSONAL

By registering with docstoc.com you agree to our
privacy policy and
terms of service, and to receive content and offer notifications.

Docstoc is the premier online destination to start and grow small businesses. It hosts the best quality and widest selection of professional documents (over 20 million) and resources including expert videos, articles and productivity tools to make every small business better.

Search or Browse for any specific document or resource you need for your business. Or explore our curated resources for Starting a Business, Growing a Business or for Professional Development.

Feel free to Contact Us with any questions you might have.