Introduction to survival analysis

Exercises, BEPAR summer school, August 2007
Lia Hemerik and Patsy Haccou
Exercises in survival analysis.
All commercially available software packages require a similar layout for survival data. Therefore it
is useful to practice data input during this course. Some basic analyses can be illustrated on sample
data sets with any statistical software package in which a module for survival analysis is included.
We chose R, because this package is freely available on the internet. You are advised to save all
your output from the computer practical on the hard disk of your computer, because we will
sometimes ask you to compare results from different analyses.

Introduction to survival analysis.
Exercise 1.1
We simulate failure times by ‘throwing a die’ with R. Random numbers drawn from a geometric
distribution represent the number of non-successes before the first success. Therefore, 100
failure times with success probability 1/6 are generated in R using the command
‘time <- 1 + rgeom(100, 1/6)’. From the resulting failure times we now construct censored
observations: find out where the failure times greater than 15 are located in the object ‘time’ and
put these positions in an object called ‘index’. All runs of length 16 or larger are considered
censored observations of length 16. Make an object ‘fc’ that contains a ‘1’ for each failure time
and a ‘0’ for each censored observation. This is easily done when you start with ‘fc’ containing
100 ‘1’s.
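
The steps described above can be sketched in R as follows. This is only one possible way of doing it; the seed is an arbitrary choice of ours (not part of the exercise) and only serves to make the run reproducible.

```r
set.seed(1)                      # arbitrary seed (our assumption), for reproducibility

# rgeom() counts the non-successes before the first success, so adding 1
# gives the number of throws up to and including the first success.
time <- 1 + rgeom(100, 1/6)

# Positions of the failure times greater than 15.
index <- which(time > 15)

# Start with 100 ones (all failures), then mark the long runs as censored
# and cut them back to length 16.
fc <- rep(1, 100)
fc[index] <- 0
time[index] <- 16

table(fc)                        # number of censored (0) and failed (1) observations
```

After running this, ‘time’ contains values between 1 and 16, and ‘fc’ is 0 exactly for the observations with time 16.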

a. What kind of hazard rate were you simulating with these die throws? And from what kind of
   theoretical distribution are the failure times drawn?
b. Suggest how other forms of a hazard rate can be simulated with a die.

Exercise 1.2
The data that you put in the objects ‘time’ and ‘fc’ in Exercise 1.1 can be used for survival analysis
with R. Decide how many variables you have to define in the R input when making a data set
that adequately summarises your simulated failure and censoring times.
a. Use the necessary objects in R and make a histogram of all data or of failure times only.
b. What do you expect the log survivor function to look like? Make a survivor plot, and a log
    survivor plot of the data.
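
A minimal sketch of the plots asked for in this exercise, assuming the survival package is installed (it ships with standard R distributions). The data are regenerated here so that the block runs on its own; the seed and the axis labels are our own choices.

```r
library(survival)

set.seed(1)                               # arbitrary seed (our assumption)
time <- 1 + rgeom(100, 1/6)
fc   <- as.numeric(time <= 15)            # 1 = failure, 0 = censored
time <- pmin(time, 16)                    # censored runs get length 16

# Histogram of the failure times only (censored observations left out).
hist(time[fc == 1], breaks = 0:16,
     main = "Failure times only", xlab = "number of throws")

# Kaplan-Meier estimate of the survivor function for the whole sample.
km <- survfit(Surv(time, fc) ~ 1)

# Survivor plot and log survivor plot.
plot(km, conf.int = FALSE, xlab = "number of throws", ylab = "S(t)")
plot(km, conf.int = FALSE, fun = "log",
     xlab = "number of throws", ylab = "S(t), log scale")
```

Replacing ‘time[fc == 1]’ by ‘time’ in the call to hist() gives the histogram of all data.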

Survivor functions: estimation and tests.
Exercise 2.1
This exercise is meant for exploring data with the help of life tables and the product limit estimator
(as defined by Kaplan and Meier). These are created in R using the function survfit() (look into the
help of this function).
a. The file remissionfc.txt contains all data from two treatment groups of leukaemia patients (from
    Kleinbaum 2005, p. 17). One group got a new treatment and the patients in the other group were
    treated in the standard way. Make a life table for the two groups of patients and find out
    what the output of R means. At this stage, we only use three columns: nr_weeks (failure times),
    group (treatment=0, control=1) and failcens (1=event happened, 0=censored observation).
b. Different kinds of plots can be made: survivor and log survivor. Make a product-limit estimator
    table and plot for both groups in remission in R. Try to follow the calculations by yourself,
    using the PowerPoint files as help. Details can also be found in Kleinbaum (2005, p. 51-53).
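
To follow the product-limit calculation by hand, it can help to redo it on a very small data set and compare the result with survfit(). The sketch below uses eight made-up observations (they are NOT taken from remissionfc.txt) and a single group; the column names nr_weeks and failcens are borrowed from the exercise.

```r
library(survival)

# Hypothetical toy data, invented for illustration only.
nr_weeks <- c(2, 3, 3, 5, 5, 8, 9, 12)
failcens <- c(1, 1, 0, 1, 1, 0, 1, 0)     # 1 = event, 0 = censored

# Product-limit estimator by hand: at each distinct event time,
# multiply by (1 - number of events / number at risk).
event_times <- sort(unique(nr_weeks[failcens == 1]))
n_risk  <- sapply(event_times, function(t) sum(nr_weeks >= t))
n_event <- sapply(event_times, function(t) sum(nr_weeks == t & failcens == 1))
S_hand  <- cumprod(1 - n_event / n_risk)

# The same estimate with survfit(); summary() lists the event times only.
km    <- survfit(Surv(nr_weeks, failcens) ~ 1)
S_fit <- summary(km)$surv

cbind(event_times, n_risk, n_event, S_hand, S_fit)
```

With the real data, a formula such as Surv(nr_weeks, failcens) ~ group would give one table per treatment group.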

Exercise 2.2
In this exercise we want you to get a feeling for the way in which censored observations influence
your conclusions.

a. We first ignore all censored data in the previously used data set remissionfc.txt and use the data
   set remission_allfail.txt. For these cases, make a plot of the log survivor function and carry out
   the log-rank test.
b. Now we include the censored observations (i.e. use data set remissionfc.txt). Make a plot of the
   log survivor function and calculate the log-rank test statistic. Compare your results with those
   obtained in part a of this exercise.
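
The log-rank test is available in the survival package as survdiff(). The sketch below runs it on simulated data (not the remission files) with and without the censoring information. Treating every recorded time as a failure is our assumption about how the ‘all fail’ version is meant; all rates and the seed are arbitrary.

```r
library(survival)

set.seed(2)                                   # arbitrary seed (our assumption)
n      <- 40
group  <- rep(c(0, 1), each = n / 2)
t_fail <- rexp(n, rate = ifelse(group == 1, 0.2, 0.1))   # simulated failure times
t_cens <- rexp(n, rate = 0.05)                           # simulated censoring times
time   <- pmin(t_fail, t_cens)
status <- as.numeric(t_fail <= t_cens)        # 1 = event, 0 = censored

# Log-rank test using the censoring information.
lr   <- survdiff(Surv(time, status) ~ group)
p_lr <- pchisq(lr$chisq, df = 1, lower.tail = FALSE)

# Same recorded times, but every observation treated as a failure.
all_fail <- rep(1, n)
lr_all   <- survdiff(Surv(time, all_fail) ~ group)
p_all    <- pchisq(lr_all$chisq, df = 1, lower.tail = FALSE)

c(chisq_censored = lr$chisq, p_censored = p_lr,
  chisq_all_fail = lr_all$chisq, p_all_fail = p_all)
```

Comparing the two test statistics shows how much the conclusion can depend on the way censored observations are handled.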

Exercise 2.3
You can find the extended remission data, including the variable log WBC (“logarithm of white blood
cell count”), in the file remission.txt. We can stratify on this covariate and compare survival curves
for different strata (stratification is dealt with later in this course).
a. Suppose we want to describe Kaplan-Meier curves for the variable log WBC. Because log WBC
    is continuous, we need to categorise this variable before we compute Kaplan-Meier curves.
    Suppose we categorise log WBC into three categories -low, medium, high- as follows:
    Low (0-2.30), n=11; Medium (2.31-3.00), n=14; High (>3.00), n=17.
    Based on this classification, compute and graph Kaplan-Meier curves for each of the three
    categories of log WBC.
b. Compare the three Kaplan-Meier curves you obtained in part a. How do they differ?
c. Calculate the log rank test for comparing these three groups.
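
Categorising a continuous covariate can be done with cut(). The sketch below applies the three categories given in part a to a few made-up log WBC values (not taken from remission.txt); the object names logwbc and wbc_cat are our own.

```r
# Hypothetical log WBC values, chosen to sit on and around the category limits.
logwbc <- c(1.45, 2.30, 2.31, 2.80, 3.00, 3.01, 4.43)

# Intervals are closed on the right: [0, 2.30], (2.30, 3.00], (3.00, Inf).
wbc_cat <- cut(logwbc,
               breaks = c(0, 2.30, 3.00, Inf),
               labels = c("low", "medium", "high"),
               include.lowest = TRUE)

data.frame(logwbc, wbc_cat)
table(wbc_cat)
```

The resulting factor can then be used on the right-hand side of the formula in survfit() for the Kaplan-Meier curves and in survdiff() for the log-rank test.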

Cox’ regression model with one or more covariates.
Exercise 3.1
First figure out how Cox’ regression model works in case only one covariate is considered: for the
remission data (remission.txt) we compared the survival curves for group status in Exercise 2.2.
The log rank test gave a significant difference between the two treatment groups. The size of the
effect is, however, as yet not known.
a. Analyse the remission data with Cox’ regression model. The only effect we currently want to
    quantify is the effect of treatment.
b. The table below contains part of the output of the analysis. Identify i. the regression coefficient,
    ii. the hazard ratio, iii. the standard error of the regression coefficient, iv. the test statistic, v.
    the degrees of freedom, vi. the effect and vii. the significance of the test. What can you conclude
    from this analysis?

        coef    exp(coef)   se(coef)   z      p
group   1.57    4.82        0.412      3.81   0.00014
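
The columns of this table are linked by simple arithmetic, which can be checked in R. The sketch below uses only the rounded values printed in the table, so small differences in the last digit are to be expected (for instance, exp(1.57) gives about 4.81 rather than 4.82).

```r
coef_hat <- 1.57      # regression coefficient, as printed in the table
se_hat   <- 0.412     # its standard error, as printed in the table

hr <- exp(coef_hat)           # exp(coef)
z  <- coef_hat / se_hat       # z = coef / se(coef)
p  <- 2 * pnorm(-abs(z))      # two-sided p-value from the standard normal

round(c(exp_coef = hr, z = z), 2)
signif(p, 2)
```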

Exercise 3.2
a. The files kidneyf.txt and kidneym.txt contain data on kidney transplant patients, females and
   males respectively. The covariate colosexe is a factor with two categories, ‘white’ and ‘black’.
   Explore these data for an effect of colour on the survival time.
b. Now we take the combined file kidney.txt (females and males together) containing data on
   kidney transplant patients. The covariate colosexe is a categorical factor with four categories.
   Analyse the data with Cox’ regression model. Give an interpretation of the resulting table.
c. One covariate codes for both colour and sex (‘wm = white male’, ‘bm = black male’, ‘wf =
   white female’, ‘bf = black female’). Code the data so that you have one column coding for
   colour and one column coding for sex. Analyse the data again with Cox’ regression model and
   draw conclusions on the effect of colour, sex, and their interaction. Are the results in accordance
   with what you would expect from the results of part b?
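
The recoding asked for in part c can be sketched as follows. The vector below is made up for illustration (it is not read from kidney.txt), and the new column names colour and sex are our own choice.

```r
# Hypothetical values of the combined covariate.
colosexe <- c("wm", "bm", "wf", "bf", "wm", "bf")

# First letter codes colour, second letter codes sex.
colour <- factor(ifelse(substr(colosexe, 1, 1) == "w", "white", "black"),
                 levels = c("white", "black"))
sex    <- factor(ifelse(substr(colosexe, 2, 2) == "m", "male", "female"),
                 levels = c("male", "female"))

# Cross-tabulation to check the recoding against the original codes.
table(colosexe, colour)
table(colosexe, sex)
```

In a Cox model formula, colour * sex on the right-hand side then fits both main effects and their interaction.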



Exercise 3.3
For the remission data (remission.txt) we quantified the effect of group status in Exercise
3.1. In Exercise 2.3 we looked at the effect of the covariate Log WBC. We now want to
examine the effects of both covariates at the same time and study their interaction.
a. Analyse the remission data with Cox’ regression model: (i) group status and Log WBC without
    their interaction and (ii) group status, Log WBC with their interaction.
b. Compare the resulting tables and draw your conclusions about the significance of the univariate
    effects and their interaction. Perform a simultaneous test with the help of the variance/covariance
    matrix (V) of the estimated coefficients. You can do the calculations in R with the matrix inverse
    function solve(V), matrix multiplication %*% and the estimates of beta. The test statistic you
    have to use is $\beta^{T} V^{-1} \beta$, where $\beta$ is a column vector.
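
The matrix calculation for the simultaneous test can be sketched as below. The data are simulated (not the remission data), and all rates, ranges and the seed are arbitrary assumptions of ours. Here all coefficients of the fitted model are tested together.

```r
library(survival)

set.seed(3)                                   # arbitrary seed (our assumption)
n      <- 100
group  <- rep(c(0, 1), each = n / 2)
logwbc <- runif(n, 1.5, 5)                    # hypothetical covariate values
t_fail <- rexp(n, rate = 0.05 * exp(1.0 * group + 0.8 * (logwbc - 3)))
t_cens <- rexp(n, rate = 0.02)
d <- data.frame(time   = pmin(t_fail, t_cens),
                status = as.numeric(t_fail <= t_cens),
                group  = group,
                logwbc = logwbc)

fit  <- coxph(Surv(time, status) ~ group + logwbc, data = d)
beta <- coef(fit)                             # estimated coefficients
V    <- vcov(fit)                             # their variance/covariance matrix

# Test statistic beta^T V^-1 beta, compared with a chi-square distribution
# with as many degrees of freedom as there are coefficients tested.
W <- as.numeric(t(beta) %*% solve(V) %*% beta)
p <- pchisq(W, df = length(beta), lower.tail = FALSE)
c(W = W, df = length(beta), p = p)
```

To test only some of the coefficients, take the corresponding elements of beta and the matching rows and columns of V before applying the same formula.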

Proportionality assumption and stratification.
Exercise 4.1
For the data set canclary.txt we want to check the proportionality assumption.
a. Compare the Kaplan-Meier survivor functions for each of the covariates with the adjusted
   survivor functions as estimated with the proportional hazards model.
b. Compare the log-minus-log plots of the Kaplan-Meier survivor functions with those obtained with
   Cox’ regression model.
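
A log-minus-log plot of Kaplan-Meier curves can be made directly with plot() on a survfit object. The sketch below uses simulated data with one binary covariate x (not the canclary.txt data); the names, rates and seed are our own assumptions.

```r
library(survival)

set.seed(4)                                   # arbitrary seed (our assumption)
n      <- 80
x      <- rep(c(0, 1), each = n / 2)
t_fail <- rexp(n, rate = 0.1 * exp(0.7 * x))
t_cens <- rexp(n, rate = 0.03)
d <- data.frame(time   = pmin(t_fail, t_cens),
                status = as.numeric(t_fail <= t_cens),
                x      = x)

km <- survfit(Surv(time, status) ~ x, data = d)

# fun = "cloglog" plots log(-log S(t)) against time on a log scale.
plot(km, fun = "cloglog", lty = 1:2,
     xlab = "time (log scale)", ylab = "log(-log S(t))")
legend("topleft", legend = c("x = 0", "x = 1"), lty = 1:2)
```

Under proportional hazards the curves for the two groups should be roughly parallel.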

Exercise 4.2
For the data set remission.txt we check the proportionality assumption.
a. Compare the Kaplan-Meier survivor functions for each of the covariates with the adjusted
   survivor functions as estimated with the proportional hazards model.
b. Compare the log-minus-log plots of the Kaplan-Meier survivor functions with those obtained with
   Cox’ regression model.
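
Survivor functions estimated under the proportional hazards model can be obtained by applying survfit() to a fitted coxph object and then drawn on top of the Kaplan-Meier curves. The sketch below again uses simulated data with one binary covariate x (not the remission data); names, rates and seed are our own assumptions.

```r
library(survival)

set.seed(5)                                   # arbitrary seed (our assumption)
n      <- 80
x      <- rep(c(0, 1), each = n / 2)
t_fail <- rexp(n, rate = 0.1 * exp(0.7 * x))
t_cens <- rexp(n, rate = 0.03)
d <- data.frame(time   = pmin(t_fail, t_cens),
                status = as.numeric(t_fail <= t_cens),
                x      = x)

# Kaplan-Meier curves per group.
km <- survfit(Surv(time, status) ~ x, data = d)

# Cox model and the survivor functions it implies for x = 0 and x = 1.
fit <- coxph(Surv(time, status) ~ x, data = d)
adj <- survfit(fit, newdata = data.frame(x = c(0, 1)))

plot(km, lty = 1:2, xlab = "time", ylab = "S(t)")            # Kaplan-Meier
lines(adj, lty = 1:2, col = "grey40", conf.int = FALSE)      # Cox model
legend("topright", lty = c(1, 2, 1, 2),
       col = c("black", "black", "grey40", "grey40"),
       legend = c("KM, x = 0", "KM, x = 1", "Cox, x = 0", "Cox, x = 1"))
```

Large systematic differences between the two sets of curves suggest that the proportionality assumption may not hold for this covariate.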



