Introduction to Biostatistics II Session 11 Survival analysis

Shared by: gregoria
Posted: 11/21/2008
We are using the data set hemophiliac_patients.sav. A listing of the data is as follows:

Case Summaries(a)

Case      age     survival   censor
1         <=40    2          1.00
2         <=40    3          1.00
3         <=40    6          1.00
4         <=40    6          1.00
5         <=40    7          1.00
6         <=40    10         1.00
7         <=40    15         1.00
8         <=40    15         1.00
9         <=40    16         1.00
10        <=40    27         1.00
11        <=40    30         1.00
12        <=40    32         1.00
13        >40     1          1.00
14        >40     1          1.00
15        >40     1          1.00
16        >40     1          1.00
17        >40     2          1.00
18        >40     3          1.00
19        >40     3          1.00
20        >40     9          1.00
21        >40     22         1.00
Total N   21      21         21

a. Limited to first 100 cases.

Here age is a dichotomized variable (40 years or younger versus over 40 years), and survival is the time (in months) until death (if censor=1) or until the end of follow-up (sometimes called the last time known alive, if censor=0). In the original version of the dataset all subjects died while on observation.

An estimate of survival is built from conditional probabilities: for each time interval, we estimate the probability that a person lives to the end of the interval given that they are alive at its beginning. But to be alive at that point they must have survived to the end of the previous interval, and so on. Here we concentrate on the Kaplan-Meier method of separating the time axis into intervals: each interval runs from one death to the next.

For example, consider the first group (<=40 years of age). There are 12 individuals alive at the beginning (t=0). At the time of the first death (t=2), 1 of the 12 individuals dies, so that 11/12 are still alive. So S(2)=0.9167. At the time of the second death (t=3) there is another death, so the survival rate in the interval (2, 3) is 10/11=0.9091. To survive past t=3, one must survive past t=2 (there is a 0.9167 chance of that) and then live through the interval (2, 3) (the chance of that is 0.9091). The chance of surviving both is S(3)=(0.9167)×(0.9091)=0.8333. Then, in the period until the third death time (t=6) there are two deaths, so the survival rate in the interval (3, 6) is 8/10.
To survive past t=6 you must survive up to t=3 (with probability about 0.8333, as we saw) and then make it through (3, 6) (with probability 0.8000). The survival past t=6 is then S(6)=(0.8333)×(0.8000)=0.6667, and so on.

We can generate all of this in SPSS. We first split the file by age group, then carry out the Kaplan-Meier procedure, defining the event (death) as censor=1. Only the <=40 years group is shown here:

age: <=40
Survival Analysis for survival

Time   Status   Cumulative   Standard   Cumulative   Number
                Survival     Error      Events       Remaining
2      1.00     .9167        .0798      1            11
3      1.00     .8333        .1076      2            10
6      1.00                             3            9
6      1.00     .6667        .1361      4            8
7      1.00     .5833        .1423      5            7
10     1.00     .5000        .1443      6            6
15     1.00                             7            5
15     1.00     .3333        .1361      8            4
16     1.00     .2500        .1250      9            3
27     1.00     .1667        .1076      10           2
30     1.00     .0833        .0798      11           1
32     1.00     .0000        .0000      12           0

Number of Cases: 12     Censored: 0 ( .00%)     Events: 12

Survival Time   Estimate   Standard Error   95% Confidence Interval
Mean:           14         3                ( 8, 20 )
Median:         10         5                ( 1, 19 )

(The cumulative survival is printed once per distinct death time, on the last of any tied rows.)

To obtain the K-M plot we choose the survival plot in the options of the Kaplan-Meier analysis. The plot is as follows:

[Figure: Survival Function, age: <=40. Vertical axis: Cum Survival (0.0 to 1.0); horizontal axis: survival (0 to 30 months).]

The survival function remains horizontal until there is a failure, then drops by the death rate in that interval (i.e., 1/12 in the first step, 1/11 in the second, 2/10 in the third, 1/8 in the fourth, and so on). Notice that there are 9 steps, since SPSS starts at the first death (most software packages start at t=0 with survival S(0)=1.0000). The two tied deaths at t=6 produce a single, larger step (2 deaths among the 10 subjects still at risk).

Inspection of the Kaplan-Meier plot can produce an estimate of the median survival (mean survival is not a good summary, since survival times are usually decidedly non-normal). To do this we draw a horizontal line from the 50% survival point on the y axis. Then, where the line intercepts the Kaplan-Meier survival curve, we draw a vertical line.
The median survival is the survival time at which this vertical line meets the x axis. See the annotation in the above plot. The median seems to be about 10 months (as shown in the output above).

Censoring

How is censoring (no observed failure) handled by the Kaplan-Meier procedure? For example, suppose we change the censoring variable to zero for the subject at time t=3 and for the subject at time t=10, so that these two subjects are censored rather than observed to die. Then the estimates of survival are as follows. At the end of the first interval (up to the first death at time t=2) the K-M estimate of survival is S(2)=11/12=0.9167, as before. No one has failed in the interval (2, 3) (the second subject was censored, so they are considered alive until the end of that interval), so the survival rate within the interval is 100%. So, at the end of the interval (2, 3), S(3)=0.9167 (still 11/12 alive up to t=3). At the end of that interval the censored subject leaves the risk set, and there are 10 subjects alive at the beginning of the interval (3, 6). Two persons die at t=6, so the survival rate within (3, 6) is 8/10=0.8000 and S(6)=0.9167×0.8000=0.7333. The SPSS output for the under-40 group is as follows:

age: <=40
Survival Analysis for survival

Time   Status   Cumulative   Standard   Cumulative   Number
                Survival     Error      Events       Remaining
2      1.00     .9167        .0798      1            11
3      .00                              1            10
6      1.00                             2            9
6      1.00     .7333        .1324      3            8
7      1.00     .6417        .1441      4            7
10     .00                              4            6
15     1.00                             5            5
15     1.00     .4278        .1565      6            4
16     1.00     .3208        .1495      7            3
27     1.00     .2139        .1325      8            2
30     1.00     .1069        .1005      9            1
32     1.00     .0000        .0000      10           0

Number of Cases: 12     Censored: 2 ( 16.67%)     Events: 10

Survival Time   Estimate   Standard Error   95% Confidence Interval
Mean:           16         3                ( 10, 23 )
Median:         15         6                ( 4, 26 )

(The cumulative survival is printed once per distinct death time, on the last of any tied rows, and not on censored rows.)

The new median is 15 months: survival is estimated to be longer, since these two individuals were not observed to die. The K-M plot is shown here.

[Figure: Survival Function, age: <=40, with censored observations marked. Vertical axis: Cum Survival (0.0 to 1.0); horizontal axis: survival (0 to 30 months).]

Compare this to the previous plot (especially at the points of the censored observations).
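The interval-by-interval arithmetic described above, with and without the two censored observations, can be checked with a short Python sketch. This is an illustration written for these notes, not the SPSS procedure: the function names kaplan_meier and median_survival are hypothetical, and exact fractions are an assumption made here so that a survival of exactly 0.5 is compared without rounding error.

```python
from collections import Counter
from fractions import Fraction

def kaplan_meier(times, events):
    """Return [(t, at_risk, deaths, S(t))] for each distinct death time.

    times  : follow-up time (months) for each subject
    events : 1 if death was observed at that time, 0 if censored
    (Hypothetical helper for illustration only.)
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = Fraction(1)
    table = []
    for t in sorted(deaths):
        # Everyone whose follow-up reaches t is at risk, censored or not.
        at_risk = sum(1 for u in times if u >= t)
        surv *= Fraction(at_risk - deaths[t], at_risk)
        table.append((t, at_risk, deaths[t], surv))
    return table

def median_survival(table):
    """Smallest death time at which estimated survival is 0.5 or less."""
    for t, _, _, s in table:
        if s <= Fraction(1, 2):
            return t
    return None

# Under-40 group from the listing above.
times = [2, 3, 6, 6, 7, 10, 15, 15, 16, 27, 30, 32]
all_died = [1] * 12                                # original data
censored = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]    # t=3 and t=10 censored

km_all = kaplan_meier(times, all_died)
km_cens = kaplan_meier(times, censored)

for t, n, d, s in km_cens:
    print(f"t={t:>2}  at risk={n:>2}  deaths={d}  S(t)={float(s):.4f}")
print("median, no censoring:  ", median_survival(km_all))
print("median, with censoring:", median_survival(km_cens))
```

Run on the under-40 data, the sketch should reproduce the cumulative survival column of the SPSS output (.9167, .7333, .6417 and so on) and the medians of 10 and 15 months. It does not compute the standard errors or confidence intervals.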

Comparing the survival between two groups

We can compare the survival experience between two groups using the log-rank test. This tests the null hypothesis H0: S1=S2 versus the alternative hypothesis Ha: S1≠S2. The log-rank test is based on comparing the number of deaths expected under the null hypothesis to the observed number of deaths. If these two numbers are very different, the test rejects the null hypothesis in favor of the alternative.

In each interval $i$, out of the $d_i$ total deaths observed in that interval, we would expect the share of deaths falling in each group to be close to that group's share of the individuals alive at the beginning of the interval. If $r_{1i}$ is the number of individuals in group 1 alive at the beginning of interval $i$ and $r_i$ the total number of individuals in both groups alive at the beginning of interval $i$, then the expected number of deaths in the first group in interval $i$ is

$$e_{1i} = d_i \left( \frac{r_{1i}}{r_i} \right).$$

The observed and expected deaths are summed over all the time intervals. For each group, the squared deviation of the observed from the expected number of deaths, divided by the expected number of deaths, is then added across the two groups to produce the log-rank statistic (statistical packages usually scale the squared deviation by its variance instead). Under the null hypothesis the statistic approximately follows a chi-square distribution with 1 degree of freedom.

The log-rank test is produced as follows (after we again "unsplit" the file to perform an overall analysis).
Survival Analysis for survival
Factor age = <=40

Time   Status   Cumulative   Standard   Cumulative   Number
                Survival     Error      Events       Remaining
2      1.00     .9167        .0798      1            11
3      .00                              1            10
6      1.00                             2            9
6      1.00     .7333        .1324      3            8
7      1.00     .6417        .1441      4            7
10     .00                              4            6
15     1.00                             5            5
15     1.00     .4278        .1565      6            4
16     1.00     .3208        .1495      7            3
27     1.00     .2139        .1325      8            2
30     1.00     .1069        .1005      9            1
32     1.00     .0000        .0000      10           0

Number of Cases: 12     Censored: 2 ( 16.67%)     Events: 10

Survival Time   Estimate   Standard Error   95% Confidence Interval
Mean:           16         3                ( 10, 23 )
Median:         15         6                ( 4, 26 )

Survival Analysis for survival
Factor age = >40

Time   Status   Cumulative   Standard   Cumulative   Number
                Survival     Error      Events       Remaining
1      1.00                             1            8
1      1.00                             2            7
1      1.00                             3            6
1      1.00     .5556        .1656      4            5
2      1.00     .4444        .1656      5            4
3      1.00                             6            3
3      1.00     .2222        .1386      7            2
9      1.00     .1111        .1048      8            1
22     1.00     .0000        .0000      9            0

Number of Cases: 9     Censored: 0 ( .00%)     Events: 9

Survival Time   Estimate   Standard Error   95% Confidence Interval
Mean:           5          2                ( 0, 9 )
Median:         2          1                ( 0, 5 )

Survival Analysis for survival

             Total   Number   Number     Percent
                     Events   Censored   Censored
age <=40     12      10       2          16.67
age >40      9       9        0          .00
Overall      21      19       2          9.52

Test Statistics for Equality of Survival Distributions for age

             Statistic   df   Significance
Log Rank     8.02        1    .0046

The log-rank test produces a statistic of 8.02>3.84 and a p value of p=0.0046<0.05, so we reject the null hypothesis. The survival of younger and older hemophiliac HIV-infected patients is not equal. Observing the figure, we conclude that the under-40 group enjoys longer survival than the older-than-40 group.

[Figure: Survival Functions by age (<=40 and >40, with censored observations marked for <=40). Vertical axis: Cum Survival (0.0 to 1.0); horizontal axis: survival (0 to 30 months).]
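The log-rank calculation can be sketched in Python as a check on the SPSS result. This is a minimal illustration under stated assumptions, not the SPSS implementation: the function name log_rank is hypothetical, the variance term is the usual hypergeometric variance (which the notes do not spell out), and the p value uses the identity that the upper tail of a chi-square with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math

def log_rank(times1, events1, times2, events2):
    """Two-group log-rank test (hypothetical helper for illustration).

    Returns (observed deaths in group 1, expected deaths in group 1,
             variance-based chi-square, p value,
             simple sum of (O - E)^2 / E over the two groups).
    """
    times = times1 + times2
    events = events1 + events2
    death_times = sorted({t for t, e in zip(times, events) if e == 1})

    obs1 = exp1 = var = 0.0
    total_deaths = 0
    for t in death_times:
        r1 = sum(1 for u in times1 if u >= t)      # at risk, group 1
        r2 = sum(1 for u in times2 if u >= t)      # at risk, group 2
        r = r1 + r2
        d1 = sum(1 for u, e in zip(times1, events1) if u == t and e == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        obs1 += d1
        exp1 += d * r1 / r                         # e_1i = d_i * (r_1i / r_i)
        total_deaths += d
        if r > 1:                                  # variance is 0 when r == 1
            var += d * (r1 / r) * (r2 / r) * (r - d) / (r - 1)

    chi2 = (obs1 - exp1) ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))             # chi-square, 1 df, upper tail

    obs2 = total_deaths - obs1
    exp2 = total_deaths - exp1
    simple = (obs1 - exp1) ** 2 / exp1 + (obs2 - exp2) ** 2 / exp2
    return obs1, exp1, chi2, p, simple

# Data from the listing, with t=3 and t=10 censored in the under-40 group.
young_t = [2, 3, 6, 6, 7, 10, 15, 15, 16, 27, 30, 32]
young_e = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
old_t = [1, 1, 1, 1, 2, 3, 3, 9, 22]
old_e = [1] * 9

obs1, exp1, chi2, p, simple = log_rank(young_t, young_e, old_t, old_e)
print(f"observed={obs1:.0f} expected={exp1:.2f} chi2={chi2:.2f} p={p:.4f}")
print(f"simple (O-E)^2/E form: {simple:.2f}")
```

On these data the variance-based statistic should come out at about 8.02 with p of about 0.0046, matching the SPSS output, while the simpler (O-E)^2/E form gives a smaller value of about 6.53. Both lead to rejecting the null hypothesis at the 0.05 level here.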
