
On ranking in survival analysis: Bounds on the concordance index

Vikas C. Raykar | Harald Steck | Balaji Krishnapuram
CAD & Knowledge Solutions (IKM CKS), Siemens Medical Solutions USA, Inc., Malvern, USA

Cary Dehing-Oberije | Philippe Lambin
Maastro clinic, University Hospital Maastricht, University Maastricht-GROW, The Netherlands

NIPS 2007




Organization

•   Motivation
•   Brief review of survival analysis
•   Concordance index
•   Our proposed ranking approach
•   Connections to survival analysis
•   Results


Motivation: Personalized medicine

Predict survival time of lung cancer patients.

Inputs:
• Different kinds of treatment (chemo/radiotherapy dosage)
• Different patient characteristics (age, gender, health)

Output: survival time

Dataset available from our collaborator, the MAASTRO hospital.

Why not use regression?


• Survival times are not amenable to standard statistical/
machine learning regression methods because of censored data.
• The problem is well studied in statistics as survival
analysis.



Review: Survival Analysis
Branch of statistics that deals with the
time until the occurrence of an event:
   When did a patient die?
   When did the disease manifest?
   When did the machine fail?


Widely used in medical statistics, epidemiology,
reliability engineering, economics, sociology,
marketing, insurance, etc.

What is censored data?

[Figure: timeline of the study along a TIME axis from 2001 (start of the study) to 2005 (end of the study). Labels in the figure: "Patient 1", "Death", "Censored Data", "Data collected at this time".]

• Some patients die during the study period.
• Some patients are unavailable for follow-up.
• At the end of the study a lot of patients may still survive.
The exact survival time may be longer than the observation period.
Censoring provides only partial information.
Typically a large portion of the data is censored.

[Figure: survival time per patient, with points labeled "Observed Data" and "Censored Data".]

Notation: Survival analysis




Proportional Hazard (PH) Model
• Has become a standard model for studying the effect of
covariates on survival time distributions.

[Equation not recovered. Its annotations mark the baseline hazard function, the relative hazard function, the covariate, and the unknown regression parameters.]

• Parameter estimates for the PH model are obtained by maximizing
Cox’s partial likelihood.

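For reference, the standard textbook form of the PH model and of Cox's partial likelihood can be written as below. This is a sketch in assumed notation: the symbols h, h_0, w, x, \delta_i and R(t_i) are not taken from the slide, whose own equation was not recovered. The slide's annotations map onto the terms as labeled.

```latex
% Proportional hazards model (assumed notation, not the slide's own symbols):
% x is the covariate vector, w the unknown regression parameters.
h(t \mid x) = \underbrace{h_0(t)}_{\text{baseline hazard}} \;
              \underbrace{\exp\!\left(w^\top x\right)}_{\text{relative hazard}}

% Cox's partial likelihood over the observed events (\delta_i = 1), where
% R(t_i) = \{ j : t_j \ge t_i \} is the set of subjects still at risk at time t_i.
L(w) = \prod_{i \,:\, \delta_i = 1}
       \frac{\exp\!\left(w^\top x_i\right)}
            {\sum_{j \in R(t_i)} \exp\!\left(w^\top x_j\right)}
```

The baseline hazard h_0(t) cancels in each ratio, which is why the parameters w can be estimated without specifying it.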
Concordance Index or c-index
• Standard performance measure for model
assessment in survival analysis.

• Generalization of the area under the ROC
curve to regression problems and censored
data.

• Fraction of all pairs of subjects whose
survival times can be ordered such that the
subject with the higher predicted survival is the
one who actually survived longer.

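The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name, the argument names, and the choice to count tied predictions as half-concordant are assumptions. A pair is treated as orderable only when the shorter of the two observed times is an actual death, which is one common way to handle censoring.

```python
from itertools import combinations


def concordance_index(times, events, scores):
    """Fraction of orderable pairs whose predicted order matches the observed order.

    times  : observed time (death or censoring) for each subject
    events : 1 if the death was observed, 0 if the subject was censored
    scores : predicted survival score; higher means longer predicted survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair can be ordered only if the two times differ and the
        # shorter one is an observed death rather than a censoring time.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] < scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5  # assumption: tied predictions count as half
    return concordant / comparable if comparable else float("nan")
```

Calling `concordance_index([1, 2, 3], [1, 0, 1], [1, 2, 3])` skips the pair whose shorter time is censored and scores the remaining two pairs.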
Concordance Index-no censoring
[Figure: ordering of five subjects (1 to 5) by survival time, shown next to a plot of survival time against the covariate.]

C=1: perfect prediction accuracy
C=0.5: as good as a random predictor



Concordance Index-with censoring
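[Figure: ordering of five subjects (1 to 5) by survival time, shown next to a plot of survival time against the covariate, with one point marked "Censored".]

No arrow can go above a censored point.

Proposed approach: Maximize CI directly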




 • While CI is widely used to evaluate a learnt
 model, it is not generally used as an objective
 function for training.

 • CI is invariant to monotone transformation of the
 survival times.

 • Hence the model learnt by maximizing the CI is a
 ranking function (an N-partite ranking problem).

Lower bounds on the CI

• Maximizing the CI directly is a discrete optimization problem.
• Use a differentiable concave lower bound instead.
• Related to the PH model.
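The equations on this slide were not recovered. As a sketch of the idea: the CI is an average of indicator functions over the orderable pairs, so any concave, differentiable function that stays below the indicator yields a lower bound that can be maximized. The notation (the pair set \mathcal{E}, the ranking function f) and the two specific bounds shown, a log-sigmoid and an exponential one, are assumptions for illustration and may differ from the slide's own formulas.

```latex
% CI as an average of indicators over the set \mathcal{E} of orderable pairs (i, j),
% where subject j is known to have survived longer than subject i.
c = \frac{1}{|\mathcal{E}|} \sum_{(i,j) \in \mathcal{E}}
    \mathbf{1}\!\left[ f(x_j) - f(x_i) > 0 \right]

% Two concave, differentiable lower bounds on the indicator \mathbf{1}[z > 0],
% with \sigma(z) = 1 / (1 + e^{-z}):
\mathbf{1}[z > 0] \;\ge\; 1 + \frac{\log \sigma(z)}{\log 2},
\qquad
\mathbf{1}[z > 0] \;\ge\; 1 - e^{-z}
```

Both bounds equal 0 at z = 0, stay below 1 for z > 0, and are negative for z < 0, so summing them over the pairs never overestimates the CI.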



Maximize lower bounds on the CI




[Equation not recovered: the objective is a lower bound on the CI for linear ranking functions, together with a regularization term.]

Use gradient-based methods to maximize this.
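A minimal sketch of this optimization is given below, assuming a log-sigmoid lower bound, an L2 penalty, and plain gradient ascent. The function names, the step size, the penalty weight `lam`, and the specific bound are illustrative assumptions; the slide's own objective was not recovered.

```python
import math


def orderable_pairs(times, events):
    """Pairs (i, j) where subject i's death was observed before times[j]."""
    n = len(times)
    return [(i, j) for i in range(n) for j in range(n)
            if events[i] and times[i] < times[j]]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ranker(X, times, events, lam=0.01, lr=0.5, steps=200):
    """Gradient ascent on a regularized log-sigmoid lower bound on the CI.

    The learnt score w . x is a ranking function: higher means longer
    predicted survival.
    """
    pairs = orderable_pairs(times, events)
    d = len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [-lam * wk for wk in w]  # gradient of -(lam / 2) * ||w||^2
        for i, j in pairs:
            diff = [X[j][k] - X[i][k] for k in range(d)]
            z = sum(wk * dk for wk, dk in zip(w, diff))
            # d/dz [1 + log(sigmoid(z)) / log 2] = (1 - sigmoid(z)) / log 2
            coef = (1.0 - sigmoid(z)) / (math.log(2.0) * len(pairs))
            for k in range(d):
                grad[k] += coef * diff[k]
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w
```

For example, `fit_ranker([[0.1], [0.4], [0.5], [0.9]], [1, 2, 3, 4], [1, 1, 0, 1])` returns a one-element weight vector. Because the objective is concave in w, plain gradient ascent is enough for a sketch; a quasi-Newton method would be a natural replacement on real data.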

Connection to the PH model
Log-likelihood for correct ranking




For a proportional hazard model we can
show that




This is a common assumption made in the ranking
literature. We have shown that under PH
models it holds exactly.

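The equations on this slide were not recovered. One way to see why a PH model gives a sigmoid-shaped pairwise probability is the short derivation below. It is a sketch in assumed notation: h_0 and S_0 are the baseline hazard and baseline survival function, \theta_i = \exp(w^\top x_i) is the relative hazard of subject i, the two survival times are taken to be independent, and S_0(t) is assumed to go to 0 as t grows. A higher hazard means shorter survival, so the sign convention may differ from the one used on the slide.

```latex
% Under the PH model, subject i has survival function S_i(t) = S_0(t)^{\theta_i}
% and density f_i(t) = \theta_i \, h_0(t) \, S_0(t)^{\theta_i}.
\Pr(T_i < T_j)
  = \int_0^\infty f_i(t) \, S_j(t) \, dt
  = \theta_i \int_0^\infty h_0(t) \, S_0(t)^{\theta_i + \theta_j} \, dt
  = \frac{\theta_i}{\theta_i + \theta_j}

% The last step uses \frac{d}{dt} S_0(t)^{a} = -a \, h_0(t) \, S_0(t)^{a}
% with S_0(0) = 1 and S_0(\infty) = 0.
% Substituting \theta_i = \exp(w^\top x_i) gives a logistic sigmoid of the score difference:
\Pr(T_i < T_j)
  = \frac{1}{1 + \exp\!\left(-w^\top (x_i - x_j)\right)}
  = \sigma\!\left(w^\top (x_i - x_j)\right)
```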
Penalized log-likelihood




 Compare this with the objective function
 using the lower bound approach




Cox partial likelihood


• Our proposed method explicitly maximizes
a lower bound on the CI.
• Cox’s method maximizes the partial likelihood.
• Experimental results indicate that both do
well.
• Conjecture: Is Cox’s partial likelihood also
a lower bound on the CI?
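To make the comparison concrete, the quantity that Cox's method maximizes can be computed as in the sketch below. It assumes a linear PH model with no correction for tied event times; the function and argument names are illustrative. Here the score w . x is a log relative hazard, so a higher score means shorter predicted survival, the opposite direction from a survival ranking score.

```python
import math


def cox_partial_log_likelihood(w, X, times, events):
    """Log of Cox's partial likelihood for a linear PH model (no tie correction)."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
    total = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored subjects contribute only through the risk sets
        # Risk set: every subject still under observation at time t_i.
        risk = [scores[j] for j in range(len(times)) if times[j] >= t_i]
        m = max(risk)  # shift for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(s - m) for s in risk))
        total += scores[i] - log_denominator
    return total
```

Each observed death contributes the log-probability that this subject, out of everyone still at risk, is the one who dies, which is a comparison of one subject against a set rather than a sum over pairs.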


Cox partial likelihood (cont.)




Results
          The proposed method is slightly
          better than Cox-PH.

          However, the differences are not
          significant.




Thank you! | Questions?



