SAMSI-SAMSEE, June 2008

Overcoming the scope and limitations of the literature: some examples of complex evidence synthesis

Julian Higgins
MRC Biostatistics Unit, Cambridge, UK
with thanks to Becky Turner, Georgia Salanti, Debbi Caldwell

Outline
• Complex synthesis: going beyond a simple pair-wise meta-analysis
• Some examples illustrating syntheses resulting from a limited evidence base to answer the question of interest
  1. Randomized trials
  2. Three-way interaction in observational epidemiology

What is the best use of available evidence?
• Suppose I need to make a decision about whether to use
  – intervention A or intervention B
  – in population P
  – to prevent bad outcome Y

What is the best use of available evidence?
[Diagram: the ideal evidence, a good RCT of A vs B in population P on outcome Y, surrounded by the kinds of evidence that may be available instead]
• Bias
  – Weak RCT of A vs B in population P on outcome Y
  – Observational study of A vs B in population P on outcome Y
• Diversity
  – Good RCT of A vs B in population Q on outcome Y
  – Good RCT of A vs B in population P on outcome Z
  – Good RCT of A' vs B' in population P on outcome Y
• Indirect comparisons
  – Good RCTs of A vs C and B vs C in population P on outcome Y

Tackling diversity
• Exchangeability?
  – Random-effects; borrow strength
• Regression model?
• For outcomes, might exploit correlation between Y and Z to use Z-studies to learn about Y

Tackling bias
• Genuine nuisance
• Assume they average around zero?
  – Probably not
• Almost certainly need external evidence on biases
  – From empirical research (e.g. meta-epidemiology)
  – From plausible ranges
  – From elicitation

Tackling diversity and bias
• Both bias and diversity can be addressed through elicitation
  – Rigour and relevance
  – Internal bias and external bias

Turner RM, Spiegelhalter DJ, Smith GCS, Thompson SG. Bias modelling in evidence synthesis.
Journal of the Royal Statistical Society Series A, in press

Example: Effectiveness of routine anti-D prophylaxis
Population: pregnant Rhesus negative women in UK
Control: anti-D immunoglobulin given only after risk events during pregnancy and after delivery
Intervention: anti-D immunoglobulin (500 IU) given antenatally (at 28 and 34 weeks’ gestation) + control care
Outcome: prevention of ‘sensitisation’ (anti-D antibodies) which would affect a subsequent pregnancy

NICE appraisal (2002) identified 8 comparative studies
• 1 randomised controlled trial
• 2 non-randomised studies with concurrent controls
• 5 non-randomised studies with historical controls

How to handle biases in the anti-D studies?
Internal bias (lack of rigour)
• Lack of randomisation
• Lack of blinding
• Confounding not addressed
• Exclusions, losses to follow-up
• etc…
External bias (lack of relevance)
• Varying doses of anti-D
• Different populations
• Varying control antenatal care
• Timing of outcome
• etc…

NICE appraisal on routine anti-D prophylaxis
Principal results based on two studies considered most relevant
Chilcott et al. Health Technology Assessment, 7(4), 2003

Quantifying bias
Distributions are needed for each bias in each study. Not enough empirical evidence at the moment, so they construct distributions from bias ranges elicited as follows:
(1) One assessor completes qualitative checklist for sources of bias (based on Downs & Black checklist, 1998)
(2) All assessors read original papers alongside bias checklists, and meet to resolve queries
(3) Assessors (independently) mark on elicitation scales 67% ranges for each bias

Adjusting for internal biases
Each study provides an effect estimate $y_i$, where $y_i \sim [\theta, s_i^2]$ if there are no internal or external biases.
Assume impact of internal bias $j$ in study $i$ is $\delta_{ij}^I \sim [\mu_{ij}^I, (\sigma_{ij}^I)^2]$
To allow for biases, assume additive model
$$y_i \sim [\theta + \mu_i^I,\; s_i^2 + (\sigma_i^I)^2], \quad \text{where } \mu_i^I = \sum_j \mu_{ij}^I \text{ and } (\sigma_i^I)^2 = \sum_j (\sigma_{ij}^I)^2$$
(Internal) bias-adjusted meta-analysis
$$\hat\theta = \frac{\sum_i (y_i - \mu_i^I) / \left(s_i^2 + (\sigma_i^I)^2\right)}{\sum_i \left(s_i^2 + (\sigma_i^I)^2\right)^{-1}}, \qquad SE(\hat\theta) = \frac{1}{\sqrt{\sum_i \left(s_i^2 + (\sigma_i^I)^2\right)^{-1}}}$$

Adjusting for external biases
Assume impact of external bias $j$ in study $i$ is $\delta_{ij}^E \sim [\mu_{ij}^E, (\sigma_{ij}^E)^2]$
To allow for biases, assume additive model, with heterogeneity
$$y_i \sim [\theta + \mu_i^E,\; s_i^2 + \tau^2 + (\sigma_i^E)^2], \quad \text{where } \mu_i^E = \sum_j \mu_{ij}^E \text{ and } (\sigma_i^E)^2 = \sum_j (\sigma_{ij}^E)^2$$

Adjusting for internal and external biases
To allow for both biases,
$$y_i \sim [\theta + \mu_i^I + \mu_i^E,\; s_i^2 + (\sigma_i^I)^2 + \tau^2 + (\sigma_i^E)^2]$$
Bias-adjusted meta-analysis
$$\hat\theta = \frac{\sum_i (y_i - \mu_i^I - \mu_i^E) / \left(s_i^2/\alpha_i + \hat\tau^2/\beta_i\right)}{\sum_i \left(s_i^2/\alpha_i + \hat\tau^2/\beta_i\right)^{-1}}, \qquad SE(\hat\theta) = \frac{1}{\sqrt{\sum_i \left(s_i^2/\alpha_i + \hat\tau^2/\beta_i\right)^{-1}}}$$
where
$$\alpha_i = \frac{s_i^2}{s_i^2 + (\sigma_i^I)^2}, \qquad \beta_i = \frac{\hat\tau^2}{\hat\tau^2 + (\sigma_i^E)^2}$$
can be interpreted as quality weights and relevance weights respectively

Adjusting for all biases in all 8 studies: odds ratios and 95% confidence intervals
[Forest plot: panels (a) Unadjusted and (c) Bias-adjusted; rows Bowman, Hermann, Huchet, Lee, MacKenzie, Mayne, Tovey, Trolle, Combined; x-axis: odds ratio on a log scale from 0.01 to 10]

Comparison of unadjusted and bias-adjusted results
                                       Odds ratio (95% CI)
All 8 studies (unadjusted)             0.28 (0.17 to 0.46)
All 8 studies (bias-adjusted)          0.25 (0.11 to 0.56)
MacKenzie and Mayne (unadjusted)       0.37 (0.21 to 0.66)
MacKenzie and Mayne (bias-adjusted)    0.23 (0.04 to 1.33)

Indirect comparisons
or
Network meta-analysis
or
Multiple treatments meta-analysis

What is the best topical fluoride?
• Toothpaste
• Gel
• Varnish
• Mouthrinse
• A series of seven Cochrane reviews tackles these four therapies and comparisons among them
Marinho, Higgins, Sheiham, Logan. CDSR 2002-2004

Fluoride data
[Table: number of studies for each comparison among Gel, Rinse, Varnish, Toothpaste, Placebo and Nothing. Numbers of studies, in row order: 9, 26, 3, 61, 9, 3, 4, 1, 4, 1, 1, 1, 3, 4, 1]

Indirect comparison
[Same table, with the gel vs varnish (G–V) comparison marked]

Indirect comparison
[Same table, with the gel vs placebo (G–P) and varnish vs placebo (V–P) comparisons marked alongside G–V]

Indirect comparisons
• Trials of A vs C
• Trials of B vs C
• Theoretical relationship (A – B) = (A – C) – (B – C)
• cancels out variation in C
[Diagram: A and B each compared with the common comparator C]

Performing indirect comparisons
Simple approach (Bucher 1997)
• Take $Y_{AC}$ and $Y_{BC}$, results of meta-analyses of available direct comparisons
• Estimate $Y'_{AB} = Y_{AC} - Y_{BC}$
• with variance $\mathrm{var}(Y'_{AB}) = \mathrm{var}(Y_{AC}) + \mathrm{var}(Y_{BC})$
• Can assume fixed or random effects for each direct comparison meta-analysis
• Can combine $Y'_{AB}$ from indirect analysis with $Y_{AB}$ from direct head-to-heads

Example 2
$Y_i^{TP}$: the study-specific SMD, with standard error $s_i^{TP}$
Within study: $\mathrm{SMD}_i^{TP} \sim N(\theta_i^{TP}, (s_i^{TP})^2)$
Random effects: $\theta_i^{TP} \sim N(\mu_{TP}, \tau^2)$
Indirect evidence: e.g.
$\mu_{TP} = \mu_{TG} - \mu_{GP}$
Priors: $\mu_{AB} \sim N(0, 1000)$, $\tau \sim U(0, 1)$
Relate the functional parameters to the basic ones (‘Coherence’ equations)
$\mu_{TP} = \mu_{TG} - \mu_{GP}$
$\mu_{GP} = \mu_{GV} - \mu_{VP}$
$\mu_{VP} = \mu_{VT} - \mu_{TP}$
…

With multi-arm trials
• We need to take into account the correlations between the estimates that come from the same study
[Diagram: a three-arm trial of A, B and C contributing the two estimates $y_i^{AC}$ and $y_i^{BC}$]
• The random effects $(\theta_i^{BC}, \theta_i^{AC})$ that refer to the same trial are correlated as well

Distributions of the observations
$y_i^{AC} \sim N(\theta_i^{AC}, s_i^2)$
$y_i^{BC} \sim N(\theta_i^{BC}, s_i^2)$
$(y_i^{AC}, y_i^{BC}) \sim MVN((\theta_i^{AC}, \theta_i^{BC}), S)$
$S$ is the variance-covariance matrix estimated from the data

Distributions of the random effects
$\theta_i^{AC} \sim N(\mu_{AC}, \tau^2)$
$\theta_i^{BC} \sim N(\mu_{BC}, \tau^2)$
$(\theta_i^{AC}, \theta_i^{BC}) \sim MVN((\mu_{AC}, \mu_{BC}), \Sigma)$
$\Sigma$ is the variance-covariance matrix of the random effects (involves $\tau$) which is unknown
$\mu_{AB} = \mu_{AC} - \mu_{BC}$

Treatments for first bleeding in cirrhosis
Higgins & Whitehead 1996, Stat Med
No. studies   Control       Sclerotherapy   Beta-blockers
17            $x_C/n_C$     $x_S/n_S$
7             $x_C/n_C$                     $x_B/n_B$
2             $x_C/n_C$     $x_S/n_S$       $x_B/n_B$

$x_{iC} \sim \mathrm{Bin}(\pi_{iC}, n_{iC})$, $\mathrm{logit}(\pi_{iC}) = u_i$
$x_{iS} \sim \mathrm{Bin}(\pi_{iS}, n_{iS})$, $\mathrm{logit}(\pi_{iS}) = u_i + \theta_i^{CS}$, $\theta_i^{CS} \sim N(\mu_{CS}, \tau^2)$
$x_{iB} \sim \mathrm{Bin}(\pi_{iB}, n_{iB})$, $\mathrm{logit}(\pi_{iB}) = u_i + \theta_i^{CB}$, $\theta_i^{CB} \sim N(\mu_{CB}, \tau^2)$
In the two 3-arm trials we only substitute $(\theta_i^{CS}, \theta_i^{CB}) \sim MVN((\mu_{CS}, \mu_{CB}), \Sigma)$
$\mu_{SB} = \mu_{CB} - \mu_{CS}$

Computational methods
Lu, Ades. Stat Med 2004; 23: 3105-24
Higgins, Whitehead.
Stat Med 1996; 15: 2733-2749
• Analyses can be performed within classical or Bayesian framework
• We choose a Bayesian framework using WinBUGS due to
  – ease and flexibility
  – ability to incorporate prior / external information
  – natural interpretations of results
• Analyses incorporate random-effects meta-analysis models
  – Accounting for correlated effect sizes from multi-arm studies

Fluoride synthesis results: extracts
• Single head-to-head trial of gel vs varnish: SMD = 0.12 (95% CI: –0.13 to 0.37)
• Extracted from multiple treatment synthesis: SMD = 0.05 (95% CrI: –0.10 to 0.21)
• 6 head-to-head trials of toothpaste vs rinse: SMD = –0.10 (95% CrI: –0.46 to 0.26)
• Extracted from multiple treatment synthesis: SMD = –0.04 (95% CrI: –0.12 to 0.05)
• A clear gain in precision

Ranking
Intervention    Probability it’s the best
Toothpaste      61%
Varnish         24%
Rinse           12%
Gel             2%
Placebo         0%
No treatment    0%

Incoherence = weighted difference between direct and indirect evidence
[Network diagram: Placebo, Toothpaste, Gel, Varnish and Rinse, with edges labelled by number of studies]
P – T = 0.34
P – G = 0.19
Indirect: T – G = –0.15
Direct: T – G = –0.09

Evaluation of incoherence within closed loops
[Figure: estimates with 95% confidence intervals for the closed loops NGV, NGR, NRV, PTG, PTV, PTR, TGV, TGR, TRV, PGV, PGR, PRV, GRV, AGRV, PTGV, PTGR, PTRV, TGRV, PGRV, PTGRV; x-axis: SMD, from –1.0 to 2.0]

[Table: study characteristics by comparison. Columns D, G, R, V, P identify the interventions compared in each row; those markers are not recoverable here]
No. studies   Fup   Baseline   Year   Water F (yes/no)
69            2.6   11.8       1968   0.2
13            2.3   3.8        1973   0.2
30            2.4   5.9        1973   0.1
3             2.3   2.7        1983   0
3             2.7   NA         1968   0.66
6             2.8   14.7       1969   0
1             2     0.9        1978   0
1             1     NA         1977   0
1             3     7.4        1991   NA
4             2.5   7.6        1981   0.33

Differences in year reflect differences in baseline
[Figure: effectiveness of A, B and C plotted against time, 1954 to 1994]

Meta-regression results
               Mult. treat. meta-analysis                  Mult. treat. meta-regression (slope = −0.04 (−0.08, −0.01))
Intervention   Effect size         Probability it’s the best   Effect size         Probability it’s the best
Placebo        0                   0%                          0                   0%
Dentifrice     0.31 (0.27, 0.36)   62%                         0.30 (0.25, 0.35)   31%
Gel            0.23 (0.13, 0.34)   6%                          0.24 (0.13, 0.35)   5%
Rinse          0.29 (0.22, 0.36)   21%                         0.30 (0.23, 0.36)   23%
Varnish        0.24 (0.09, 0.38)   11%                         0.30 (0.14, 0.45)   41%
Het variance   0.17 (0.14, 0.21)                               0.17 (0.14, 0.21)

Concluding remarks: randomized trials
• Is this the future of meta-analysis of clinical trials?
• Clinical and policy decisions (e.g. NICE) have an obligation to use all relevant evidence
  – sometimes only weak or indirect evidence is available
• Ability to tackle diversity and bias, and to incorporate indirect evidence in other ways
  – where do we draw the line?
  – perhaps we should synthesize the entire RCT literature!

What is the joint association of NAT1, NAT2 and smoking in predisposition to bladder cancer?
• NAT2 believed to enhance metabolism of toxic amines (e.g. in tobacco smoke)
• NAT1 believed to activate toxic amines
• we have a ‘rapid’ or a ‘slow’ version of each
• smoking known to predispose to bladder cancer

Systematic review of published studies on NAT2, NAT1, smoking joint effects

Single study of the gene-gene-environment joint effects
Smoking   NAT2    NAT1    Cases   Controls   OR
No        Rapid   Slow    6       13         1
No        Rapid   Rapid   8       16         1.08 (0.30, 3.9)
No        Slow    Slow    16      31         1.12 (0.36, 3.5)
No        Slow    Rapid   6       10         1.30 (0.32, 5.3)
Yes       Rapid   Slow    42      32         2.84 (0.97, 8.3)
Yes       Rapid   Rapid   41      26         3.42 (1.2, 10.1)
Yes       Slow    Slow    61      51         2.59 (0.92, 7.3)
Yes       Slow    Rapid   35      12         6.32 (2.0, 20.3)
Taylor et al. Cancer Research 1998; 58: 3603-10

What is the best use of available evidence?
[Diagram: available studies report on different subsets of the three factors: NAT1 only; smoking only; smoking and NAT1; NAT2 only; NAT1 and NAT2; smoking and NAT2; smoking, NAT1 and NAT2]

Experimental Bayesian synthesis with 41 extra studies (indirect evidence)

Study of NAT1 and smoking
Smoking   NAT1    NAT2   Disease risk   Latent disease risk   Proportions by NAT2
No        Slow    ?      $\pi_A$        $\theta_1$            $1 - \lambda_{NAT2}$
                                        $\theta_2$            $\lambda_{NAT2}$
No        Rapid   ?      $\pi_B$        $\theta_3$            $1 - \lambda_{NAT2}$
                                        $\theta_4$            $\lambda_{NAT2}$
Yes       Slow    ?      $\pi_C$        $\theta_5$            $1 - \lambda_{NAT2}$
                                        $\theta_6$            $\lambda_{NAT2}$
Yes       Rapid   ?      $\pi_D$        $\theta_7$            $1 - \lambda_{NAT2}$
                                        $\theta_8$            $\lambda_{NAT2}$
Decomposition of risk: $\pi_A = \theta_1 \times (1 - \lambda_{NAT2}) + \theta_2 \times \lambda_{NAT2}$
Assumption: The exposures are independent (a strong assumption, and not strictly necessary)

Borrowing strength across studies
Meta-analysis of ORs derived from $\theta_1, \ldots, \theta_8$

Sources of evidence on prevalence
• Prevalence of smoking
  – WHO statistics, by country → direct prior distributions
• Prevalence of genotypes, by ethnicity
  – ‘Internal’ evidence: other studies in the meta-analysis
  – ‘External’ evidence: other gene prevalence studies
• Modelling relevance
  – External evidence can be
    • assumed to be true
    • treated as exchangeable with internal evidence
    • used as prior distributions
    • excluded

Results (random-effects; WinBUGS)
Smoking   NAT1    NAT2    OR-Taylor            OR-synthesis
No        Slow    Rapid   1                    1
No        Slow    Slow    1.12 (0.36, 3.5)     0.98 (0.52, 1.6)
No        Rapid   Rapid   1.08 (0.30, 3.9)     0.83 (0.36, 1.8)
No        Rapid   Slow    1.30 (0.32, 5.3)     1.12 (0.52, 2.0)
Yes       Slow    Rapid   2.84 (0.97, 8.3)     1.71 (1.01, 2.8)
Yes       Slow    Slow    2.59 (0.92, 7.3)     2.36 (1.47, 3.7)
Yes       Rapid   Rapid   3.42 (1.15, 10.1)    1.36 (0.81, 2.1)
Yes       Rapid   Slow    6.32 (2.0, 20.3)     2.73 (1.7, 4.3)

Concluding remarks
• Syntheses often require multiple sources of evidence
• Bayesian framework is very convenient
  – beliefs about bias
  – external evidence on nuisance parameters
• Most analyses can be done in classical framework but appear to be hard to do
• Still much work to be done to refine methods
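
Worked sketch: simple indirect comparison
The simple approach attributed to Bucher (1997) in the indirect comparisons slides can be illustrated in a few lines of Python. This is only a sketch, not the analysis behind the talk. The function name bucher_indirect is made up for illustration. The point estimates (P – T = 0.34, P – G = 0.19) are taken from the fluoride incoherence slide, but the standard errors are hypothetical placeholders, because the slides do not report them.

```python
import math


def bucher_indirect(y_ac, se_ac, y_bc, se_bc):
    """Indirect A-vs-B estimate from A-vs-C and B-vs-C summaries.

    Implements Y'_AB = Y_AC - Y_BC with
    var(Y'_AB) = var(Y_AC) + var(Y_BC).
    """
    y_ab = y_ac - y_bc
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return y_ab, se_ab


# Point estimates from the incoherence slide: P - T = 0.34 and P - G = 0.19,
# i.e. T - P = -0.34 and G - P = -0.19, with placebo as common comparator C.
# The standard errors below are HYPOTHETICAL: the slides do not report them.
y_tg, se_tg = bucher_indirect(y_ac=-0.34, se_ac=0.03, y_bc=-0.19, se_bc=0.04)

# Approximate 95% interval under a normal approximation.
lower, upper = y_tg - 1.96 * se_tg, y_tg + 1.96 * se_tg
print(f"Indirect T - G = {y_tg:.2f} (SE {se_tg:.2f})")
```

With these inputs the indirect toothpaste vs gel estimate is –0.15, the indirect value shown on the incoherence slide. The interval depends entirely on the assumed standard errors, so it should not be read as a result.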