EPI Evidence Based Medicine

					EPI-820 Evidence-Based Medicine

     LECTURE 9: Meta-Analysis I

       Mat Reeves BVSc, PhD
  • Understand the rationale for quantitative synthesis
  • Describe the steps in performing a meta-analysis:
     – identification, selection, abstraction, and analysis.
  • Know the appropriate analytic approach for meta-
    analysis of key study designs:
     – Experimental (RCT’s)
     – Observational (cohort, case-control, diagnostic tests)
  • Other issues:
     –   Publication bias
     –   Quality assessment
     –   Random versus fixed effects models
     –   Meta-regression

• Facts:
   • For most clinical problems/public health issues there is an
     overwhelming amount of existing information, as well as new
     information produced every year
   • However, much of this information
       – isn't very good (= poor quality)
       – is derived from different methods & definitions (= poor
       – is often contradictory (= heterogeneity)
   • Very few single studies resolve an issue unequivocally (…..
     a “home run study”)

• So how should we go about summarizing medical information?
How do we summarize medical information?
• Traditional Approach
  • Expert Opinion
  • Narrative review articles
     – Validity? Unbiased? Reproducible?
     – Methods? (one study one vote?)
  • Consensus statements (group expert opinion)

• New Approach (Meta-analysis)
  • Explicit quantitative synthesis of ALL the evidence

Definition - Meta-analysis

• A technique for quantitatively combining the results of
  previous studies to:
   • Generate a summary estimate of effect, OR
   • Identify and explain heterogeneity

• Alternate definition: a study of studies, to help guide
  further research and identify reasons for
  heterogeneity between studies.

• Overview or Synthesis


 • Initially developed in social sciences in mid-1960’s
 • Adapted to medical studies in early 1980’s
 • Initially applied to RCT’s – esp. when indv. studies
   were small and under-powered
 • Also applied to observational epidemiologic
   studies – often with little fore-thought which
   generated much controversy
 • Explosion in the number of published meta-
   analyses in the last 10-15 years.


• Often the initial step of a cost-effectiveness
  analysis, decision analysis, or grant
  application (esp. for RCT’s).

• Are much cheaper than a big RCT!!!

• Results usually agree with those of later large randomized
  trials, but not always (from LeLorier, 1997):

Discrepancies between meta-analyses
and subsequent large RCT’s
(LeLorier NEJM 1997)

                          Results of RCT’s
   Results of M-A      Positive        Negative
   Positive               13               6
   Negative                7              14

   27/40 (68%) agreement
When is a meta-analysis appropriate?

• When several studies are known to exist
• When studies disagree (= heterogeneity) resulting in
  a lack of consensus
• When both exposures and outcomes are quantified
  and presented in a useable format.
• When existing individual studies are under-powered
   • M-A could then produce a precise estimate of effect
• When you want to identify reasons for heterogeneity
   • M-A could illustrate why and identify important sub-group
• When no one else has done it (yet!), or an update of
  an existing meta-analysis is justified.
Before you begin…… plan
• M-A’s appear easy to do but require careful planning,
  and adequate resources (time = $$$)
• Need to develop study protocol
   • Specify primary and secondary objectives
   • Methods
      – Describe search strategy (sources, published studies only?,
        fugitive lit?, blinding?, reliability checks?)
      – Define eligibility criteria
      – Type of quality assessment (if any)
   • Analysis
      – Type of model (fixed vs random, use of quality scores?)
      – Subgroup analyses?
      – Sensitivity analysis?

Estimating Time Required to do a M-A
• Meta-Works (Boston, MA), private company
   • Provided estimates based on 37 M-A’s
   • Size of the body of literature, quality, complexity, reviewer
     pool and support services all important

• Aver. total # hrs per study = 1139 (range 216 – 2516)
   •   Search, selection, abstraction = 588 hrs
   •   Stat Analysis = 144 hrs
   •   Write up = 206 hrs
   •   Other tasks = 201 hrs

• Size of body of literature before any deletions (x) is
  best single guide (Hrs = 721 + 0.243x – 0.0000123x^2)
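
As a rough illustration, the quadratic quoted above can be evaluated for a few literature sizes. This is a minimal sketch: the function name and the example values of x are assumptions for illustration, and the coefficients are simply those given on the slide.

```python
def estimated_hours(x: float) -> float:
    """Hours predicted by the slide's quadratic.

    x = number of citations in the body of literature before any deletions.
    """
    return 721 + 0.243 * x - 0.0000123 * x ** 2


# Hypothetical literature sizes, chosen only to show how the estimate scales.
for citations in (500, 1000, 2500):
    print(citations, "citations ->", round(estimated_hours(citations)), "hours")
```

The negative squared term means each additional citation adds slightly less time as the literature grows, so the formula should probably not be extrapolated far beyond the range of the 37 M-A's it was fitted to.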
Steps in a meta-analysis

• 1. Identification (Search)

• 2. Selection

• 3. Abstraction

• 4. Analysis

• 5. Write-up

1. Identification - Sources
• M-A’s use systematic, explicit search procedures (cf.
  qualitative literature review)

• MEDLINE
   • 4100 journals
   • 1966 - present
   • Web search at PubMed:
   • other search engines: BRS Colleague, WinSPIRs, etc

• EMBASE
   • similar to MEDLINE, European version
   • Expensive, not widely available in US

Identification - Sources
• Cochrane Collaboration Controlled Trials Register
   • Over 160,000 trials, including abstracts (+ translations)
   • by subscriptions….. MSU Electronic Library database
   • includes
           •   MEDLINE, EMBASE
           •   non-English publications
           •   non-indexed publications
           •   hand-search of journals

       – CancerLit, AIDSLINE, TOXLINE, Dissertation Abstracts Online

• Index Medicus
       – important if searching before 1966
       – hand-search only
Identification - Steps:
• 1. Search own personal files
• 2. Search electronic databases
   • Review titles and on-line abstracts to eliminate irrelevant
     articles
   • Retrieve remaining articles, review, and determine if they meet
     inclusion/exclusion criteria
• 3. Review reference lists of articles for missed studies
• 4. Consult experts/colleagues/companies
• 5. Conduct hand-searches of non-electronic
  databases and/or relevant journals
• 6. Consider consulting an expert (medical librarian)
  with training in MEDLINE and use of MeSH terms.

Limitations of electronic databases
• Electronic resources have been essential for growth
  of M-A, but they are far from perfect

• 1. Databases are incomplete
   • Medline contains only 1/3rd of all biomed journals

• 2. Indexing is never perfect
   • Want search to have high Se (include all relevant studies)
     and high Sp (but exclude the irrelevant!)
       – Ratio of retrieved articles : relevant articles can vary widely

Limitations of electronic databases
 2. Indexing is never perfect
 • Accuracy of indexing per se relies on:
    – authors understanding how studies are categorized
    – “database” assigning correct category to study

 • Indexing also depends on ability of search
   strategies (e.g., MeSH) to identify relevant articles

Limitations of electronic databases

3. Search Strategies are never perfect
- It's hard to find all the relevant studies
- Average Se of expert searchers using MEDLINE
(vs known Registries of studies) = 0.51

Example – National Perinatal RCT Registry
                              Perinatal   MEDLINE     MEDLINE
Topic                           RCT        (Expert    (Amateur
                              Registry    searcher)   searcher)

Neonatal hyperbilirubinemia      88          28          17

Intraventricular hemorrhage      29          19          11
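
The sensitivity of each search in the table above can be worked out directly. The sketch below assumes that every trial counted in the MEDLINE columns is also in the registry (i.e., the counts are true positives); the variable and function names are illustrative only.

```python
# Counts copied from the table: registry total, expert search, amateur search.
searches = {
    "Neonatal hyperbilirubinemia": {"registry": 88, "expert": 28, "amateur": 17},
    "Intraventricular hemorrhage": {"registry": 29, "expert": 19, "amateur": 11},
}


def search_sensitivity(found: int, known: int) -> float:
    """Se of a search = relevant studies retrieved / all known relevant studies."""
    return found / known


for topic, counts in searches.items():
    se_expert = search_sensitivity(counts["expert"], counts["registry"])
    se_amateur = search_sensitivity(counts["amateur"], counts["registry"])
    print(f"{topic}: expert Se = {se_expert:.2f}, amateur Se = {se_amateur:.2f}")
```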

Other search issues……
• Non-English Studies
       – Translation of title usually provided but abstracts often not. But
         N.B. that many non-English journals are not included anyway!

   • No a priori justification for excluding non-English studies
       – Quality is often equivalent or even better!
       – Excluding non-English studies can affect conclusions

   • But including means you need a translation just to determine

Fugitive Literature
      –   unpublished studies (… why are they unpublished?)
      –   dissertations
      –   drug company studies
      –   book chapters
      –   non-indexed studies and abstracts
      –   conference proceedings
      –   government reports
      –   pre-MEDLINE (1966)

• Sometimes important sources of information
• Hard to track down – contact experts/colleagues
• Need to decide whether to include or not - general
  consensus is that you should.

Publication bias

• Published studies are not representative of all studies
  that have been performed

• Articles with “positive findings” (P < 0.05) are more
  likely to be published

• Hence published studies are a biased sub-set

• Publication bias = systematic error of M-A that results
  from using only published studies
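
A small simulation can make the mechanism concrete. This is a sketch under stated assumptions, not a model of any real literature: all studies share the same hypothetical true effect (0.2) and standard error (0.2), and a study is "published" only if it is significant on a one-sided test (z > 1.645).

```python
import random
from statistics import mean


def simulate_publication_bias(n_studies=2000, true_effect=0.2, se=0.2,
                              z_crit=1.645, seed=1):
    """Return (mean of all estimates, mean of published estimates, n published)."""
    rng = random.Random(seed)
    # Each study's estimate is the true effect plus sampling error.
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # Assumed publication rule: only "positive" (significant) studies appear.
    published = [e for e in estimates if e / se > z_crit]
    return mean(estimates), mean(published), len(published)


all_mean, pub_mean, n_pub = simulate_publication_bias()
print(f"All studies:       mean effect = {all_mean:.2f}")
print(f"Published studies: mean effect = {pub_mean:.2f} (n = {n_pub})")
```

Under this rule every published estimate exceeds 1.645 x SE, so a summary based only on published studies overstates the true effect.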

  Evidence of Publication Bias
Easterbrook (1991): 285 analyzed studies reviewed by Oxford
Ethics Committee 1984-87

  Study Status                   N               % (P < 0.05)

  Published                     138                  67%

  Presented Only                69                   55%

  Neither                       78                   29%

  Total                         285                  54%

 Implications of Publication Bias
Simes (1986): Chemotherapy for Advanced Ovarian CA
Comparison of Published Trials vs Registered Trials

 Results                   Published            Registered

 N                             16                     13

 Median Survival Ratio        1.16                    1.06
 95% CI                   1.06 – 1.27          0.97 – 1.15

 P value                      0.02                    0.24
Publication Bias
• Probably results from a combination of author and
  editor practices and decisions (Ioannidis, 98)

• Emphasizes the importance of registries of trials
  (N.B. Similar registries of observational studies are
  probably not feasible, although in Social Sciences
  Campbell Collaboration is attempting to do this)

• Simple Solution:
   • Don’t base publication decisions on statistical significance!
   • Focus on interval estimation.
   • Yeah right……!

Publication bias – Approaches
 • 1. Attempt to Retrieve all Studies
    • Required for Cochrane Publications
    • Difficult to identify unpublished studies and then to find out
      details about them

 • Worst Case Adjustment
    • Number of unpublished negative studies to negate a
      “positive” meta-analysis:
    • X = [N x (ES) / 1.645]^2 - N
        – where: N = number of studies in meta-analysis,
        – ES = effect size
 • Example:
    • If N = 25, and ES = 0.6 then X = 58.2
    • Almost 60 unpublished negative studies would be required to
      negate the meta-analysis of 25 studies.
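
The worst-case formula is easy to check numerically. The sketch below simply codes the expression given above; the function name is an assumption, and 1.645 is kept as the critical value used on the slide.

```python
def unpublished_needed(n_studies: int, effect_size: float,
                       z_crit: float = 1.645) -> float:
    """X = [N x ES / z]^2 - N: unpublished negative studies needed to negate the M-A."""
    return (n_studies * effect_size / z_crit) ** 2 - n_studies


x = unpublished_needed(25, 0.6)
print(f"Unpublished negative studies required: about {x:.0f}")
```

With N = 25 and ES = 0.6 this gives a value of roughly 58, in line with the slide's example.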
  2. Graphical Approaches - Funnel plot
                          Missing studies = small studies with small
                          effect sizes and negative findings

                  X        X

Sample                X
                  X             X
Size                       X
(precision)       X                 X     X
              X             X
                  X         X     X       X          X

                  Effect Size

2. Selection
• Inclusion/eligibility criteria essential to:
   • Produce a more focused (valid) study
   • Ensure reproducibility and minimize bias

• Apply criteria systematically and rigorously

• Balance between highly restrictive versus non-
  restrictive criteria in terms of
   • face validity, homogeneity, power (N), generalizability

• Always develop in advance and include clinical
  expert(s) in the team

Typical inclusion criteria:
  •   study design (e.g., RCT’s?, DBPC?, Cohort & CCS?)
  •   setting (emergency department, outpatient, inpatient)
  •   age (adults only, > 60 only, etc)
  •   year of publication or conduct (esp. if technology or typical
      dosing changes)
  •   similarity of exposure or treatment (e.g., drug class, or
  •   similarity of outcomes (case definitions)
  •   minimum sample size or follow-up
  •   languages?
  •   complete vs incomplete (abstracts)
  •   published vs fugitive?
  •   pre-1966?

Selection – Other Issues

• multiple publications from same study?
   • Include only one! (double dipping is common!)

• report should provide enough information for analysis
  (i.e. point estimate and variability = SD or SE)

• Selection process should be done independently by
  at least 2 reviewers
   • Measure agreement (K) and resolve discrepancies
   • Document excluded studies and reasons for exclusion
   • Keep pertinent but excluded studies
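
Agreement between two reviewers (K) is usually reported as Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch follows; the ten include/exclude decisions are hypothetical and the function name is illustrative.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    labels = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical decisions on 10 candidate articles.
reviewer_1 = ["include"] * 4 + ["exclude"] * 6
reviewer_2 = ["include"] * 3 + ["exclude", "include"] + ["exclude"] * 5
print(f"kappa = {cohen_kappa(reviewer_1, reviewer_2):.2f}")
```

Here the reviewers agree on 8 of 10 articles (80%), but chance alone would give 52% agreement, so kappa is noticeably lower than the raw figure.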

Typical Searching and Selection
  • First pass, using title in computer search: 300 -
    500 articles
  • Second pass, using abstract in computer search:
    60 - 100 articles
  • Final pass, using copy of entire article: 30 - 60
  • Included in study: 30 articles

3. Abstraction
• Goal: to abstract reliable, valid and bias free
  information from all written sources

• Should expect a degree of unreliability
   • intra- and inter- rater reliability is rarely if ever 100%!!

• Many sources of potential error:
   • Article may be wrong due to typographical or copyediting errors
   • Reported results can be misinterpreted
   • Errors in data entry during abstraction process

• Ways to minimize error:
   • Develop and pilot test abstraction forms
   • Develop definitions, abstraction instructions, and rules
   • Train abstractors, pilot test, get feedback, and refine

• Abstraction Forms
   • Number each data item
   • Require a response for EVERY item
       – Distinguish between negative, missing, and not-applicable
   • Simple instructions/language
   • Clear skip and stop instructions
   • Items clearly linked to definitions and abstraction rules

• Typical process
  • 2 independent reviewers
  • Practice with 2 or 3 articles to “calibrate”
  • Use a 3rd reviewer or consensus meeting to
    resolve conflicts
  • Measure agreement (K) and resolve discrepancies

Other Issues - Abstraction
• Outcome measures of interest may have to be
  calculated from original data

   • For example, data to calculate relative risk may be present
     but not described as such.

• Multiple estimates from same study?
   • Exp: intention-to-treat vs not, adjusted for loss-to-follow up
   • Obs: crude vs age-adjusted vs multiple adjusted (model)
   • Include only one estimate per study, avoid over-fitted model
     estimates (as often more imprecise)
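
When an article reports only raw counts, the relative risk and its variability can be calculated during abstraction. The sketch below uses the standard large-sample standard error of log(RR); the counts are hypothetical and the function name is an assumption.

```python
import math


def relative_risk(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Return (RR, SE of log RR, 95% CI) from the counts in a 2x2 table."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Large-sample standard error of log(RR); requires non-zero event counts.
    se_log_rr = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, se_log_rr, (lower, upper)


# Hypothetical trial: 15/100 events on treatment vs 30/100 on control.
rr, se, (lower, upper) = relative_risk(15, 100, 30, 100)
print(f"RR = {rr:.2f}, SE(log RR) = {se:.3f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Recording log(RR) and its standard error, rather than the RR alone, gives the point estimate and variability that the analysis step needs.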

Investigator Bias:

• Abstractor may be biased in favor of (or against!) a
  particular outcome (positive or negative finding), or
  researcher/institution, or journal.

• prominent journals may be given greater weight or
  authority (rightly or wrongly)

• if this may be an issue, have research assistant
  eliminate identifiers from articles (= blind review)

Blind Review
• Remove study information that could affect inclusion
  or quality of abstraction, like:
   • author, title, journal, institution, country

• Berlin (‘97):
   • compared blinded vs non-blinded reviews
   • Found discrepancy in which studies to include but little
     difference in summary effect sizes

• Time consuming

• Probably can avoid esp. if use well defined
  abstraction procedures
Assessment of study quality
• Quality is an implicit measure of validity
• Poor quality studies have lower validity
• Using quality scoring should theoretically improve the
  validity of M-A’s

• Process
   • Develop criteria (…how?)
   • Develop scale (= scoring system)
   • Abstract information and score each study

• Example RCT scoring systems
   • Chalmers (1981) – 36 item scale! (see HWK #5)
   • Jadad (1996) – 5 point scale
Jadad Criteria for Scoring RCTs
(1996 Cont Clin Trials 17:1-12)

• 1. Randomization
        – Appropriate (= 1 point) if each patient had equal chance of
          receiving intervention and investigators could not predict
        – Add 1 point if mechanism described and appropriate
        – Deduct 1 point if mechanism described and inappropriate
• 2. Double blinding
        – Appropriate (= 1 point) if stated that neither the patient nor
          investigators could identify intervention, or if “active placebo”,
          “identical placebo” or “dummies” mentioned
        – Add 1 point if method described and appropriate
        – Deduct 1 point if mechanism described and inappropriate
• 3. Withdrawals and dropouts
        – Appropriate (= 1 point) if number and reasons for loss-to-FU in
          each group described.
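
The scoring rules above can be sketched as a small function. This is an illustrative reading of the slide, not an official implementation: the argument names are assumptions, and each "method" argument takes "appropriate", "inappropriate", or None (not described).

```python
def jadad_score(randomized, randomization_method,
                double_blind, blinding_method, withdrawals_described):
    """Score an RCT report from 0 to 5 following the three criteria above."""
    adjustment = {"appropriate": 1, "inappropriate": -1, None: 0}
    score = 0
    if randomized:
        # 1 point for randomization, +1 or -1 depending on the described method.
        score += 1 + adjustment[randomization_method]
    if double_blind:
        # 1 point for double blinding, +1 or -1 depending on the described method.
        score += 1 + adjustment[blinding_method]
    if withdrawals_described:
        # 1 point if numbers and reasons for loss-to-FU are given per group.
        score += 1
    return score


# Hypothetical report: randomized with an appropriate method, double blind
# with no method described, withdrawals reported.
print(jadad_score(True, "appropriate", True, None, True))
```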
Uses of Quality Scores

 • Threshold (minimum score for inclusion)
 • Categorize study quality
    • High, medium, low quality
    • Use as sub-group analyses
 • Sensitivity analysis
 • Combine study-specific scores with variance (based
   on N) to generate modified weights
    • Poorer studies “count less”
    • Generally not recommended
 • Meta-regression
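
To show what "modified weights" means, the sketch below multiplies each study's inverse-variance weight by a quality score scaled from 0 to 1. The effect sizes, variances, scores, and the choice of simple multiplication are all assumptions for illustration; as noted above, this approach is generally not recommended.

```python
def pooled_estimate(effects, variances, quality=None):
    """Weighted mean of study effects; weights = 1/variance, optionally x quality."""
    weights = [1 / v for v in variances]
    if quality is not None:
        # Poorer studies "count less": scale each weight by its quality score.
        weights = [w * q for w, q in zip(weights, quality)]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


# Two hypothetical studies with equal precision but different quality.
effects = [0.2, 0.6]
variances = [0.04, 0.04]
quality = [1.0, 0.5]

print(f"Inverse-variance only: {pooled_estimate(effects, variances):.2f}")
print(f"Quality-adjusted:      {pooled_estimate(effects, variances, quality):.2f}")
```

The pooled estimate shifts toward the higher-quality study, which illustrates why the result becomes sensitive to a score that is itself hard to measure reliably.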

Other Issues – Quality Scoring
• Quality is difficult to measure
• No consensus on method of scale development – not
  even for RCT’s
• Few reliability/validity studies of scoring systems
   • inter-rater reliability of quality assessment often poor
• Relies on quality of the reporting itself
   • sometimes study is blinded or randomized, but if not
     explicitly stated then it suffers in quality assessment
• Difficult to detect bias from publications
• More recent studies score higher – partly because
  they conform to recent standardized reporting
  protocols (e.g., RCT’s – CONSORT)

