Generalized pairwise comparisons of prioritized outcomes in the two-sample problem

Marc Buyse, ScD
IDDI, Louvain-la-Neuve, and I-BioStat, Hasselt University, Belgium
marc.buyse@iddi.com

Outline
• Key problems in clinical development
• An example in cancer
• A bit of theory
• Back to the example
• Another example in ophthalmology
• Conclusions

KEY PROBLEMS IN CLINICAL DEVELOPMENT

Development costs are too high…

Development times are too long…
[Figure: development time (Dev_Days) and approval time (Approv_Days), in days (axis from 0 to 20,000); annotations on the plot: 50% = 4500, 20% = 2400]
Ref: Steven Hirschfeld, FDA (personal communication)

Too few new drugs are approved…
Ref: Arthur D. Little's views on key Pharma trends, March 31, 2010

AN EXAMPLE IN CANCER

Advanced colorectal cancer
420 subjects with previously untreated metastatic colorectal cancer, randomized (R) to two arms of 210 subjects each:
• LV5FU2 + oxaliplatin: new combination of 5-fluorouracil, leucovorin and oxaliplatin
• LV5FU2: standard regimen of 5-fluorouracil and leucovorin
Treatment until disease progression, intolerance to treatment, or death

Progression-free survival
[Figure: progression-free survival (%) by month (0 to 25), LV5FU2 + oxaliplatin (n = 210) vs LV5FU2 (n = 210); HR = 0.66, P = 0.0003]

Survival
[Figure: overall survival (%) by month (0 to 35), LV5FU2 + oxaliplatin (n = 210) vs LV5FU2 (n = 210); HR = 0.83, P = 0.12]

Oxaliplatin approved for metastatic colorectal cancer
• In France (AFSSAPS) in 1996
• In Europe (EMEA) in 1999
• In the US (FDA) in 2002

Problems?
1. The two endpoints (OS and PFS) are analyzed separately. One endpoint (PFS) suggests a statistically significant benefit, the other (OS) does not. On balance, do we claim treatment to be better?
2. Neither endpoint is perfect:
   • PFS is not confounded by other treatments, is less affected by unrelated causes of death, and has more events
   • OS is clinically most relevant and is measured without bias or error
3. The PFS ignores the time between progression and death. The time to first event ignores subsequent events. Thus, LV5FU2 + oxaliplatin might prolong the PFS of some patients, but shorten their remaining survival afterwards.
4. Traditional methods of analysis cannot differentiate between a modest benefit in all patients and a large benefit in some patients.

A BIT OF THEORY

General Setup
Eligible subjects are randomized (R) to Treatment (T) or Control (C).
• Let $X_i$ be the outcome of the i th subject in T (i = 1, …, n)
• Let $Y_j$ be the outcome of the j th subject in C (j = 1, …, m)

Recall the Wilcoxon test
$X_i$ and $Y_j$ are realizations of a continuous or an ordered discrete variable.
Let $S_1, S_2, \ldots, S_n$ be the ordered ranks of the outcomes observed in T (ranks taken in the combined sample of n + m outcomes). Wilcoxon (1945) proposed the test statistic

$$W = \sum_{i=1}^{n} S_i$$

with expectation and variance (under the null hypothesis of no treatment difference, in the absence of ties)

$$E(W) = \frac{n(n+m+1)}{2}, \qquad \mathrm{Var}(W) = \frac{nm(n+m+1)}{12}$$

The Mann-Whitney form of the Wilcoxon test
The Wilcoxon test statistic can be derived from consideration of all possible pairs of subjects, one from each treatment group. Let

$$U_{ij} = \begin{cases} +1 & \text{if } X_i > Y_j \\ -1 & \text{if } X_i < Y_j \\ 0 & \text{if } X_i = Y_j \end{cases}$$

The Wilcoxon-Mann-Whitney test statistic W can be written as

$$W = \frac{n(n+m+1)}{2} + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} U_{ij}$$

Gehan generalized the Wilcoxon test
Gehan (1965) generalized the Wilcoxon test to the case of censored outcomes. Letting $\tilde{X}_i$ and $\tilde{Y}_j$ denote censored observations, the pairwise comparison indicator is now

$$U_{ij} = \begin{cases} +1 & \text{if } X_i > Y_j \text{ or } \tilde{X}_i \ge Y_j \\ -1 & \text{if } X_i < Y_j \text{ or } X_i \le \tilde{Y}_j \\ 0 & \text{otherwise} \end{cases}$$

First, generalize the test further for a single outcome measure
Now let $X_i$ and $Y_j$ be observed outcomes for any outcome measure (continuous, time to event, binary, categorical, …).
All we require is that the pairwise comparison of observed outcomes $X_i$ and $Y_j$ be able to classify the pair as favoring T, favoring C, or neither (if outcomes $X_i$ and $Y_j$ are tied or if either outcome is missing).
[Diagram: the pairwise comparison of $X_i$ and $Y_j$ has four possible results: favors T (favorable), favors C (unfavorable), neutral, uninformative]

Continuous outcome measure
Pairwise comparison                  Pair is
$X_i - Y_j > \delta$                 favorable
$|X_i - Y_j| \le \delta$             neutral
$X_i - Y_j < -\delta$                unfavorable
$X_i$ or $Y_j$ missing               uninformative
The threshold $\delta$ is chosen to reflect clinical relevance; $\delta = 0$ is the Wilcoxon test.

Time to event outcome measure
Pairwise comparison                                            Pair is
$X_i - Y_j > \delta$ or $\tilde{X}_i - Y_j > \delta$           favorable
$|X_i - Y_j| \le \delta$                                       neutral
$X_i - Y_j < -\delta$ or $X_i - \tilde{Y}_j < -\delta$         unfavorable
otherwise                                                      uninformative
The threshold $\delta$ is chosen to reflect clinical relevance; $\delta = 0$ is the Gehan test.

Binary outcome measure
Pairwise comparison                                            Pair is
$X_i = 1$, $Y_j = 0$                                           favorable
$X_i = 1$, $Y_j = 1$ or $X_i = 0$, $Y_j = 0$                   neutral
$X_i = 0$, $Y_j = 1$                                           unfavorable
$X_i$ or $Y_j$ missing                                         uninformative

Generalized pairwise comparisons
Let $X_i$ and $Y_j$ be vectors of observed outcomes for any number of occasions of a single outcome measure, or any number of outcome measures that can be prioritized.
All we require is that the pairwise comparison of prioritized outcomes $X_i$ and $Y_j$ be able to classify the pair as favorable, unfavorable, or neither.
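The classification rule for a single continuous outcome can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the author's implementation: the function names `classify_pair` and `tally_pairs`, the use of `None` for a missing outcome, and the example values are all hypothetical.

```python
from typing import Dict, Optional, Sequence


def classify_pair(x: Optional[float], y: Optional[float], delta: float = 0.0) -> str:
    """Classify one (treatment, control) pair for a continuous outcome.

    x is the outcome of a subject in T, y the outcome of a subject in C.
    None stands for a missing outcome (an assumption of this sketch).
    delta is the threshold of clinical relevance; delta = 0 corresponds
    to the comparison used by the Wilcoxon test.
    """
    if x is None or y is None:
        return "uninformative"
    diff = x - y
    if diff > delta:
        return "favorable"
    if diff < -delta:
        return "unfavorable"
    return "neutral"


def tally_pairs(
    treatment: Sequence[Optional[float]],
    control: Sequence[Optional[float]],
    delta: float = 0.0,
) -> Dict[str, int]:
    """Count how many of the n * m pairs fall in each category."""
    counts = {"favorable": 0, "unfavorable": 0, "neutral": 0, "uninformative": 0}
    for x in treatment:
        for y in control:
            counts[classify_pair(x, y, delta)] += 1
    return counts


# Made-up outcomes: three subjects in T (one missing), two subjects in C.
counts = tally_pairs([5.0, 1.0, None], [2.0, 1.0], delta=0.0)
print(counts)
```

Raising `delta` moves pairs with small differences from favorable or unfavorable to neutral, which is how the threshold encodes clinical relevance.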
Next, generalize the test to prioritized repeated observations of a single outcome measure…
Occasion with higher priority    Occasion with lower priority    Pair is
favorable                        ignored                         favorable
unfavorable                      ignored                         unfavorable
neutral                          ignored                         neutral
uninformative                    favorable                       favorable
uninformative                    unfavorable                     unfavorable
uninformative                    neutral                         neutral
uninformative                    uninformative                   uninformative

Last, generalize the test to several prioritized outcome measures…
Outcome with higher priority     Outcome with lower priority     Pair is
favorable                        ignored                         favorable
unfavorable                      ignored                         unfavorable
neutral                          ignored                         neutral
uninformative                    favorable                       favorable
uninformative                    unfavorable                     unfavorable
uninformative                    neutral                         neutral
uninformative                    uninformative                   uninformative

A general measure of treatment effect
Extend the previous definition of $U_{ij}$:

$$U_{ij} = \begin{cases} +1 & \text{if the pair } (X_i, Y_j) \text{ is favorable} \\ -1 & \text{if the pair } (X_i, Y_j) \text{ is unfavorable} \\ 0 & \text{otherwise} \end{cases}$$

$$U = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} U_{ij}$$

U is the difference between the proportion of favorable pairs and the proportion of unfavorable pairs.
We call this general measure of treatment effect the « proportion in favor of treatment » ($\Delta$).

The proportion in favor of treatment ($\Delta$) is a linear transformation of the probabilistic index, $P(X > Y)$:

$$\Delta = 2\,P(X > Y) - 1$$

Situation                    P(X > Y)    Δ
T uniformly worse than C     0           -1
T no different from C        0.5         0
T uniformly better than C    1           +1

The proportion in favor of treatment ($\Delta$)
• For a binary variable, $\Delta$ is equal to the difference in proportions
• For a continuous variable, $\Delta$ is related to the effect size d
• For a time-to-event variable, $\Delta$ is related to the hazard ratio and the proportion of informative pairs f

A re-randomization test for $\Delta$
The test statistic U (or $\Delta$) no longer has known expectation and variance.
An empirical distribution of $\Delta$ can be obtained through re-randomization.
Tests of significance and confidence intervals follow.
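The prioritization rule and the re-randomization test can be sketched together in a few lines. This is a minimal sketch under stated assumptions, not the author's software: each subject is represented by a tuple of continuous outcomes ordered from highest to lowest priority, `None` marks a missing outcome, a single threshold `delta` is used for every priority, and the test shuffles arm labels and reports a two-sided p-value. All function names are hypothetical.

```python
import random
from typing import Optional, Sequence, Tuple

Outcome = Tuple[Optional[float], ...]  # one subject, highest priority first


def score_pair(x: Outcome, y: Outcome, delta: float = 0.0) -> Optional[int]:
    """Return +1 (favorable), -1 (unfavorable), 0 (neutral) or None (uninformative).

    Following the tables above, a lower-priority outcome is consulted only
    when every higher-priority comparison is uninformative.
    """
    for xk, yk in zip(x, y):
        if xk is None or yk is None:
            continue  # uninformative at this priority: move to the next one
        if xk - yk > delta:
            return 1
        if xk - yk < -delta:
            return -1
        return 0  # neutral: lower priorities are ignored
    return None


def proportion_in_favor(treatment: Sequence[Outcome], control: Sequence[Outcome],
                        delta: float = 0.0) -> float:
    """Proportion of favorable pairs minus proportion of unfavorable pairs."""
    total = 0
    for x in treatment:
        for y in control:
            total += score_pair(x, y, delta) or 0  # None and 0 both add nothing
    return total / (len(treatment) * len(control))


def rerandomization_p_value(treatment: Sequence[Outcome], control: Sequence[Outcome],
                            delta: float = 0.0, n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided p-value obtained by reshuffling subjects between the two arms."""
    rng = random.Random(seed)
    observed = abs(proportion_in_favor(treatment, control, delta))
    pooled = list(treatment) + list(control)
    n = len(treatment)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(proportion_in_favor(pooled[:n], pooled[n:], delta))
        if stat >= observed - 1e-12:
            extreme += 1
    # Count the observed allocation itself so the p-value is never zero.
    return (extreme + 1) / (n_perm + 1)


# Made-up binary data (single priority): 3/4 responders in T, 1/4 in C.
t_arm = [(1.0,), (1.0,), (1.0,), (0.0,)]
c_arm = [(1.0,), (0.0,), (0.0,), (0.0,)]
print(proportion_in_favor(t_arm, c_arm))  # equals 3/4 - 1/4
```

With binary outcomes the statistic reduces to the difference in proportions, as stated above. The number of re-randomizations (`n_perm`) trades run time against the resolution of the p-value.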
BACK TO THE EXAMPLE

Prioritized outcomes for patients with metastatic colorectal cancer
Priority    Outcomes
1           Time to death with pairwise difference ≥ 12 months
2           Time to death with pairwise difference ≥ 6 but < 12 months
3           Time to death with pairwise difference < 6 months

Prioritized outcomes for patients with metastatic colorectal cancer
Priority    Outcomes
1           Time to death from any cause
2           Time to objective progression of disease

Prioritized outcomes for patients with early HER2/neu overexpressing breast cancer
Priority    Outcomes
1           Time to death from any cause
2           Occurrence of congestive heart failure
3           Time to distant metastases
4           Occurrence of second invasive cancer
5           Time to local recurrence

Progression-free survival
GENERALIZED PAIRWISE COMPARISONS (44,100 pairs)
Difference in PFS          Oxaliplatin better    Standard better    Cumulative Δ    P-value*
At least 12 months         2.6%                  1.1%               1.5%            0.090
Between 6 and 12 months    15.5%                 5.4%               11.6%           <0.0001
Less than 6 months         35.5%                 22.9%              24.2%           <0.0001
* Unadjusted for multiplicity

Overall survival
GENERALIZED PAIRWISE COMPARISONS (44,100 pairs)
Difference in OS           Oxaliplatin better    Standard better    Cumulative Δ    P-value*
At least 12 months         10.9%                 6.5%               4.4%            0.043
Between 6 and 12 months    14.7%                 10.8%              8.3%            0.038
Less than 6 months         17.0%                 15.2%              10.1%           0.050
* Unadjusted for multiplicity

Magnitude of benefits
[Figure: bar chart of the contribution of each difference category to Δ (axis from 0 to 0.15)]
Difference     OS       PFS
≥ 12 months    0.044    0.015
≥ 6 months     0.039    0.101
< 6 months     0.018    0.126

Prioritized outcomes
GENERALIZED PAIRWISE COMPARISONS (44,100 pairs)
Difference in          Oxaliplatin better    Standard better    Cumulative Δ    P-value*
Time to death          42.6%                 32.5%              10.1%           0.050
Time to progression    9.1%                  4.4%               14.8%           0.0054
* Unadjusted for multiplicity

ANOTHER EXAMPLE

Age-related Macular Degeneration
592 subjects with neovascular age-related macular degeneration, randomized (R) to two arms of 296 subjects each:
• Pegaptanib: intravitreous injections of 3 mg of pegaptanib (an anti-vascular endothelial growth factor)
• Sham: sham injections (with a syringe applied on the surface of the eye to simulate the pressure of an injection)
Injections every 6 weeks over a period of 54 weeks

Endpoints
[Figure: standardized visual acuity chart]
Measurement of visual acuity (number of letters of standardized chart correctly read) every 6 weeks

Mean visual acuity over time
[Figure: mean visual acuity (axis from 35 to 55) by week (0 to 54), pegaptanib 3 mg vs sham]

Endpoints
[Figure: standardized visual acuity chart]
“Clinically relevant loss”: 15 letters = 3 lines
Primary endpoint: loss of < 15 letters of visual acuity at one year (prevention of major vision loss)

The whole data, and nothing but the data: measurements of visual acuity
Wk      0    6    12   18   24   30   36   42   48   54
Pt 1    43   25   22   11   15   13   11   7    11   11
Pt 2    75   69   63   65   60   73   51        53
Pt 3    71   68   73   75   67
Pt 4    51   41   51   36   38   38   37   37
Pt 5    42   50   52   48   47   48   42   42   40   39
Pt 6    55   55   63   61   66   69   64   63   72   64
Pt 7    29   48   43   44        43   43   43   45   47

Measurements of visual acuity with last observation carried forward
Wk      0    6    12   18   24   30   36   42   48   54
Pt 1    43   25   22   11   15   13   11   7    11   11
Pt 2    75   69   63   65   60   73   51   51   53   53
Pt 3    71   68   73   75   67   67   67   67   67   67
Pt 4    51   41   51   36   38   38   37   37   37   37
Pt 5    42   50   52   48   47   48   42   42   40   39
Pt 6    55   55   63   61   66   69   64   63   72   64
Pt 7    29   48   43   44   44   43   43   43   45   47

Measurements of visual acuity at week 0 and week 54
Wk      0    54
Pt 1    43   11
Pt 2    75   53
Pt 3    71   67
Pt 4    51   37
Pt 5    42   39
Pt 6    55   64
Pt 7    29   47

Measurements of visual acuity changes from week 0 to week 54
Wk      0    54   diff
Pt 1    43   11   -32
Pt 2    75   53   -22
Pt 3    71   67   -4
Pt 4    51   37   -14
Pt 5    42   39   -3
Pt 6    55   64   +9
Pt 7    29   47   +18

Loss < 15 letters in visual acuity between weeks 0 and 54
Wk      0    54   diff   B
Pt 1    43   11   -32    0
Pt 2    75   53   -22    0
Pt 3    71   67   -4     1
Pt 4    51   37   -14    1
Pt 5    42   39   -3     1
Pt 6    55   64   +9     1
Pt 7    29   47   +18    1

Binary endpoint
STANDARD ANALYSIS
                                   Pegaptanib (N = 296)    Sham (N = 296)    Difference    P-value
Loss of < 15 letters at 1 year     65.2%                   55.4%             9.8%          0.0123
[Two figures, each labeled December 2004]

Problems?
1. Binary endpoint ignores gains in vision
   Changes in vision on a continuous scale would be more sensitive to any change in vision
2. Binary endpoint only considers one time point (1 year)
   Time to loss of 3 lines uses all time points
3. Binary endpoint is clinically relevant but insensitive
   Repeated measures models use all data and are very likely to be most sensitive
4. Binary endpoint requires data imputation
   Time to loss of 3 lines and repeated measures models do not require imputation

Ophthalmology - binary endpoint
STANDARD ANALYSIS
                                   Pegaptanib (N = 296)    Sham (N = 296)    Difference    P-value*
Loss of < 15 letters at 1 year     65.2%                   55.4%             9.8%          0.015
* χ² test
PAIRWISE COMPARISONS (87,616 pairs)
Pegaptanib = Sham    Pegaptanib > Sham    Pegaptanib < Sham    Δ       P-value
51.6%                29.1%                19.3%                9.8%    0.015

Ophthalmology - binary endpoint
STANDARD ANALYSIS (STRATIFIED)
                                   Pegaptanib (N = 296)    Sham (N = 296)    Difference    P-value*
Loss of < 15 letters at 1 year     65.2%                   55.4%             9.8%          0.012
* Cochran-Mantel-Haenszel test
PAIRWISE COMPARISONS (12,907 pairs)
Pegaptanib = Sham    Pegaptanib > Sham    Pegaptanib < Sham    Δ        P-value
51.9%                29.6%                18.5%                11.1%    0.0095

Ophthalmology - continuous endpoint
GENERALIZED PAIRWISE COMPARISONS (12,907 pairs)
Change in VA at 1 year    Pegaptanib > Sham    Pegaptanib < Sham    Cumulative Δ    P-value
At least 6 lines          11.4%                4.9%                 6.5%            0.0013
At least 5 lines          4.4%                 2.6%                 8.3%            0.0011
At least 4 lines          5.1%                 3.0%                 10.4%           0.0007
At least 3 lines          6.2%                 4.0%                 12.6%           0.0005
At least 2 lines          7.9%                 5.6%                 14.9%           0.0003
At least 1 line           8.6%                 7.2%                 16.3%           0.0005
Less than 1 line          9.6%                 8.6%                 17.1%           0.0007

Ophthalmology - continuous endpoint
GENERALIZED PAIRWISE COMPARISONS (12,907 pairs)
Change in visual acuity    Pegaptanib > Sham    Pegaptanib < Sham    Cumulative Δ    P-value
At week 54                 48.0%                35.0%                13.0%           0.0036
At week 48                 3.1%                 1.0%                 15.1%           0.0010
At week 42                 1.1%                 0.9%                 15.3%           0.00093
At week 36                 2.4%                 0.9%                 16.8%           0.0003
At week 30                 0.8%                 1.0%                 16.6%           0.0003
At week 24                 0.9%                 0.6%                 16.9%           0.0003
At week 18                 0.7%                 0.3%                 17.3%           0.0003
At week 12                 1.2%                 0.4%                 18.1%           0.0002
At week 6                  0.4%                 0.0%                 18.5%           0.0002

CONCLUSIONS

Generalized Pairwise Comparisons
1. are equivalent to well-known non-parametric tests in simple cases
2. allow testing for differences thought to be clinically relevant
3. allow any number of prioritized outcomes of any type to be analyzed simultaneously
4. naturally lead to a universal measure of treatment effect, Δ

The proportion in favor of treatment (Δ)
1. is a universal measure of treatment effect that can be calculated for any type of outcome measure
2. is directly related to classical measures of treatment effect (difference in proportions, effect size or hazard ratio)
3. for time-related outcomes such as survival, provides descriptive statistics on treatment effects in terms of differences in times to event

References
Buyse M. Generalized pairwise comparisons for prioritized outcomes in the two-sample problem. Statistics in Medicine, 2010. DOI: 10.1002/sim.3923.
Buyse M. Reformulating the hazard ratio to enhance communication with clinical investigators. Clinical Trials 5:641-2, 2008.
