
Exercises BEPAR summer school August 2007
Lia Hemerik and Patsy Haccou

Exercises in survival analysis

All commercially available software packages require a similar layout for survival data, so it is useful to practise data input during this course. Some basic analyses can be illustrated on sample data sets with any statistical software package that includes a module for survival analysis. We chose R because it is freely available on the internet. You are advised to save all output from the computer practical on the hard disk of your computer, because we will sometimes ask you to compare results from different analyses.

Introduction to survival analysis

Exercise 1.1
We simulate failure times by ‘throwing a die’ with R. The random numbers drawn from a geometric distribution represent the number of non-successes before the first success. Therefore, for example, 100 failure times with success probability 1/6 are generated in R with the command ‘time <- 1 + rgeom(100, 1/6)’. From the resulting failure times we now construct censored observations: find out where the failure times greater than 15 are located in the object ‘time’ and put these positions in an object called ‘index’. All runs of length 16 or longer are treated as censored observations of length 16. Make an object ‘fc’ that contains a ‘1’ for each failure time and a ‘0’ for each censored observation. This is easily done if you start with ‘fc’ containing 100 ‘1’s.
a. What kind of hazard rate were you simulating with these die throws? From what kind of theoretical distribution are the failure times drawn?
b. Suggest how other forms of hazard rate can be simulated with a die.

Exercise 1.2
The data that you put in the objects ‘time’ and ‘fc’ in Exercise 1.1 can be used for survival analysis with R. Decide how many variables you have to define to make a data set that adequately summarises your simulated failure and censoring times.
a. Use the necessary objects in R to make a histogram of all data or of the failure times only.
b. What do you expect the log survivor function to look like? Make a survivor plot and a log survivor plot of the data.

Survivor functions: estimation and tests

Exercise 2.1
This exercise is meant for exploring data with the help of life tables and the product-limit estimator (as defined by Kaplan and Meier). These are created in R with the function survfit() (see the help page of this function).
a. The file remissionfc.txt contains all data from two treatment groups of leukaemia patients (from Kleinbaum 2005, p. 17): one group received a new treatment and the other group was treated in the standard way. Make a life table for the two groups of patients and find out what the R output means. At this stage we only use three columns: nr_weeks (failure times), group (treatment = 0, control = 1) and failcens (1 = event happened, 0 = censored observation).
b. Different kinds of plots can be made, such as survivor and log survivor plots. Make a product-limit estimator table and plot for both groups in remission in R. Try to follow the calculations yourself, using the PowerPoint files as help. Details can also be found in Kleinbaum (2005, pp. 51-53).

Exercise 2.2
In this exercise we want you to get a feeling for the way in which censored observations influence your conclusions.
a. We first ignore all censored data in the previously used data set remissionfc.txt and use the data set remission_allfail.txt. For these cases we want a plot of the log survivor function and the outcome of the log-rank test.
b. Now include the censored observations (i.e. use the data set remissionfc.txt), make a plot of the log survivor function and calculate the log-rank test statistic. Compare your results with those obtained in part a of this exercise.
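The data construction of Exercise 1.1 can also be sketched outside R. The Python/NumPy version below is only an illustration, assuming NumPy is available; the function name simulate_die_failures is hypothetical. NumPy's geometric generator counts the trials up to and including the first success, so it already corresponds to R's 1 + rgeom(n, p).

```python
import numpy as np


def simulate_die_failures(n=100, p=1/6, cutoff=16, seed=None):
    """Simulate die-throw failure times with censoring (sketch of Exercise 1.1).

    Returns two arrays:
      time -- number of throws up to and including the first success,
              with every run of length >= cutoff recorded as cutoff;
      fc   -- 1 for an observed failure, 0 for a censored observation.
    """
    rng = np.random.default_rng(seed)
    # NumPy's geometric draws have support 1, 2, 3, ... (trials until the
    # first success), i.e. the same as 1 + rgeom(n, p) in R.
    time = rng.geometric(p, size=n)
    # Positions of failure times greater than cutoff - 1 (here: > 15).
    index = np.flatnonzero(time > cutoff - 1)
    fc = np.ones(n, dtype=int)   # start with n ones ...
    fc[index] = 0                # ... and mark the censored positions
    time[index] = cutoff         # censored runs are recorded as length 16
    return time, fc
```

For example, time, fc = simulate_die_failures(seed=1) returns 100 observations in the two-column layout (time, failure/censor indicator) that survival software expects.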
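The product-limit calculation of Exercise 2.1b can be followed step by step with a small sketch like the one below (Python/NumPy; the function name kaplan_meier is hypothetical and this is not the survfit() implementation). At each distinct failure time t_j it counts the number at risk n_j and the number of failures d_j, and multiplies the factors (1 - d_j/n_j). Observations censored at t_j are still counted as at risk at t_j.

```python
import numpy as np


def kaplan_meier(time, event):
    """Product-limit (Kaplan-Meier) estimate of the survivor function.

    time  -- observed times (failure or censoring)
    event -- 1 if the event happened, 0 if the observation is censored
    Returns the distinct failure times t_j, numbers at risk n_j, numbers of
    failures d_j and S(t_j) = product over i <= j of (1 - d_i / n_i).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    t_fail = np.unique(time[event == 1])            # sorted distinct failure times
    n_risk = np.array([(time >= t).sum() for t in t_fail])
    d = np.array([((time == t) & (event == 1)).sum() for t in t_fail])
    surv = np.cumprod(1.0 - d / n_risk)
    return t_fail, n_risk, d, surv
```

With the made-up data time = [1, 2, 2, 3, 4] and event = [1, 1, 0, 1, 0], the estimate steps down to 0.8, 0.6 and 0.3 at times 1, 2 and 3; a log survivor plot uses the logarithm of these values.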
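For Exercise 2.2, the two-group log-rank statistic can be computed by hand as (O - E)^2 / Var. At each distinct failure time the expected number of failures in the first group is d·n1/n, and the hypergeometric variance contribution is d·(n1/n)(1 - n1/n)(n - d)/(n - 1). The Python sketch below assumes exactly two groups; the function name logrank_two_groups is hypothetical, and the p-value uses the chi-square distribution with one degree of freedom, written via math.erfc.

```python
import numpy as np
from math import erfc, sqrt


def logrank_two_groups(time, event, group):
    """Two-group log-rank test (sketch): returns (chi-square statistic, p-value).

    At every distinct failure time the observed failures in the first group
    are compared with their expectation d * n1 / n; the variance uses the
    hypergeometric formula d * (n1/n) * (1 - n1/n) * (n - d) / (n - 1).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    group = np.asarray(group)
    labels = np.unique(group)
    if len(labels) != 2:
        raise ValueError("this sketch assumes exactly two groups")
    in_g1 = group == labels[0]

    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & in_g1).sum()
        fail_t = (time == t) & (event == 1)
        d = fail_t.sum()
        d1 = (fail_t & in_g1).sum()
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    stat = o_minus_e ** 2 / var
    p_value = erfc(sqrt(stat / 2))   # upper tail of chi-square with 1 df
    return stat, p_value
```

Running it once on the failure times only (as in part a) and once with the censored observations included (as in part b) shows directly how censoring changes the test.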
Exercise 2.3
The extended remission data, including the variable log WBC (logarithm of the white blood cell count), are in the file remission.txt. We can stratify on this covariate and compare survival curves for different strata (stratification is dealt with later in this course).
a. Suppose we want to describe Kaplan-Meier curves for the variable log WBC. Because log WBC is continuous, we need to categorise it before we compute Kaplan-Meier curves. Suppose we categorise log WBC into three categories as follows: low (0-2.30), n = 11; medium (2.31-3.00), n = 14; high (>3.00), n = 17. Based on this classification, compute and plot Kaplan-Meier curves for each of the three categories of log WBC.
b. Compare the three Kaplan-Meier curves you obtained in part a. How do they differ?
c. Calculate the log-rank test for comparing these three groups.

Cox’ regression model with one or more covariates

Exercise 3.1
First figure out how Cox’ regression model works when only one covariate is considered. For the remission data (remission.txt) we compared the survival curves for group status in Exercise 2.2. The log-rank test gave a significant difference between the two treatment groups; the size of the effect, however, is not yet known.
a. Analyse the remission data with Cox’ regression model. The only effect we currently want to quantify is the effect of treatment.
b. The table below contains part of the output of the analysis. Identify i. the regression coefficient, ii. the hazard ratio, iii. the standard error of the regression coefficient, iv. the test statistic, v. the degrees of freedom, vi. the effect and vii. the significance of the test. What can you conclude from this analysis?

        coef  exp(coef)  se(coef)     z        p
group   1.57       4.82     0.412  3.81  0.00014

Exercise 3.2
a. The files kidneyf.txt and kidneym.txt contain data on female and male kidney transplant patients, respectively.
The covariate colosexe is a factor with two categories, ‘white’ and ‘black’. We explore these data for an effect of colour on survival time.
b. Now we take the combined file kidney.txt (females and males together) containing data on kidney transplant patients. Here the covariate colosexe is a categorical factor with four categories. Analyse the data with Cox’ regression model and interpret the resulting table.
c. The covariate codes for both colour and sex (‘wm’ = white male, ‘bm’ = black male, ‘wf’ = white female, ‘bf’ = black female). Recode the data so that you have one column coding for colour and one column coding for sex. Analyse the data again with Cox’ regression model and draw conclusions on the effects of colour, sex and their interaction. Are the results in accordance with what you would expect from the results of part b?

Exercise 3.3
For the remission data (remission.txt) we compared the survival curves for group status in Exercise 3.1. In Exercise 2.3 we looked at the effect of the covariate log WBC. We now want to examine the effects of both covariates at the same time and study their interaction.
a. Analyse the remission data with Cox’ regression model including (i) group status and log WBC without their interaction and (ii) group status and log WBC with their interaction.
b. Compare the resulting tables and draw conclusions about the significance of the main effects and their interaction. Perform a simultaneous test with the help of the variance/covariance matrix V of the estimated regression coefficients. You can do the calculations in R with the matrix inverse function solve(V), matrix multiplication %*% and the estimates of beta. The test statistic is $\beta^{T} V^{-1} \beta$, where $\beta$ is the column vector of estimated coefficients; under the null hypothesis that all these coefficients are zero, it approximately follows a chi-square distribution with as many degrees of freedom as there are coefficients in $\beta$.

Proportionality assumption and stratification

Exercise 4.1
For the data set canclary.txt we want to check the proportionality assumption.
a. Compare the Kaplan-Meier survivor functions for each of the covariates with the adjusted survivor functions as estimated with the proportional hazards model.
b. Compare the log-minus-log plots of the Kaplan-Meier survivor functions with those obtained with Cox’ regression model.

Exercise 4.2
For the data set remission.txt we check the proportionality assumption.
a. Compare the Kaplan-Meier survivor functions for each of the covariates with the adjusted survivor functions as estimated with the proportional hazards model.
b. Compare the log-minus-log plots of the Kaplan-Meier survivor functions with those obtained with Cox’ regression model.
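The quantities in the Cox output table of Exercise 3.1 are linked by simple arithmetic: the hazard ratio is exp(coef), the Wald statistic is z = coef/se(coef), and p is the two-sided standard normal tail probability of z. The short Python check below (hypothetical helper name cox_row_summary) recomputes them from the printed coef and se(coef).

```python
from math import erfc, exp, sqrt


def cox_row_summary(coef, se):
    """Recompute exp(coef), z and the two-sided p-value of one Cox output row."""
    hazard_ratio = exp(coef)          # exp(coef) column
    z = coef / se                     # Wald statistic, z column
    p = erfc(abs(z) / sqrt(2))        # two-sided standard normal tail, p column
    return hazard_ratio, z, p


hr, z, p = cox_row_summary(1.57, 0.412)
```

With coef = 1.57 and se = 0.412 this gives z ≈ 3.81 and p ≈ 0.00014, as printed; exp(1.57) ≈ 4.81 rather than 4.82 only because the printed coefficient is itself rounded.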
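The simultaneous test of Exercise 3.3b, $\beta^{T} V^{-1} \beta$, can be sketched in Python as follows, assuming beta holds the estimated coefficients and V their estimated variance/covariance matrix (in R, for example, the var component of a coxph fit). The function names are hypothetical. np.linalg.solve(V, beta) computes V^{-1} beta without forming the inverse explicitly, playing the role of solve(V) %*% beta in R. The chi-square tail probability uses only the standard library, via the recurrence Q(s + 1, y) = Q(s, y) + y^s e^{-y} / Γ(s + 1) for the regularised upper incomplete gamma function.

```python
import numpy as np
from math import erfc, exp, lgamma, log, sqrt


def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df.

    Uses Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1) for the
    regularised upper incomplete gamma function Q, with y = x / 2, s = df / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, exp(-y)           # df = 2: Q(1, y) = exp(-y)
    else:
        s, q = 0.5, erfc(sqrt(y))     # df = 1: Q(1/2, y) = erfc(sqrt(y))
    while s < df / 2.0:
        q += exp(s * log(y) - y - lgamma(s + 1.0))
        s += 1.0
    return q


def joint_wald_test(beta, V):
    """Simultaneous test beta^T V^{-1} beta; returns (statistic, df, p-value)."""
    beta = np.asarray(beta, dtype=float).reshape(-1, 1)   # column vector
    V = np.asarray(V, dtype=float)
    # np.linalg.solve(V, beta) = V^{-1} beta, like solve(V) %*% beta in R
    stat = (beta.T @ np.linalg.solve(V, beta)).item()
    df = beta.shape[0]
    return stat, df, chi2_sf(stat, df)
```

For a single coefficient the statistic reduces to z^2 from the Cox output, so the p-value coincides with that of the two-sided Wald test.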
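For the log-minus-log plots in Exercises 4.1 and 4.2: under proportional hazards S1(t) = S0(t)^exp(β), so log(-log S1(t)) = β + log(-log S0(t)), and the curves for two groups should be roughly parallel, a constant vertical distance β apart. The small Python helper below (hypothetical name log_minus_log) applies this transformation to a vector of survivor estimates, leaving undefined values (S = 0 or S = 1) as NaN.

```python
import numpy as np


def log_minus_log(surv):
    """Return log(-log S(t)) for survivor estimates S(t).

    The transformation is only defined for 0 < S(t) < 1; other entries
    (e.g. S = 1 before the first failure, S = 0 after the last) become NaN.
    """
    s = np.asarray(surv, dtype=float)
    out = np.full(s.shape, np.nan)
    ok = (s > 0) & (s < 1)
    out[ok] = np.log(-np.log(s[ok]))
    return out
```

With an exponential baseline S0(t) = exp(-0.1 t) and hazard ratio 2, the two transformed curves differ by exactly log 2 at every t. Clearly crossing or converging log-minus-log curves suggest that the proportionality assumption is violated.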