Validity and reliability: screening and diagnostic tests

Lecture 3
Validity of screening and diagnostic tests
• Reliability: kappa coefficient
• Criterion validity:
  –   “Gold” or criterion/reference standard
  –   Sensitivity, specificity, predictive value
  –   Relationship to prevalence
  –   Likelihood ratio
  –   ROC curve
  –   Diagnostic odds ratio
Clinical/public health applications
• screening:
  – for asymptomatic disease (e.g., Pap test, mammography)
  – for risk (e.g., family history of breast cancer)
• case-finding: testing of patients for diseases unrelated to their presenting complaint
• diagnostic: to help make a diagnosis in symptomatic disease or to follow up on a screening test
Evaluation of screening and diagnostic tests
• Performance characteristics:
  – test alone
• Effectiveness (on disease outcomes):
  – test + intervention
        Criteria for test selection
•   Reliability
•   Validity
•   Feasibility
•   Simplicity
•   Cost
•   Acceptability

Measures of inter- and intra-rater reliability: categorical data
• Percent agreement
  – limitation: value is affected by prevalence (higher if prevalence is very low or very high)
• Kappa statistic
  – takes chance agreement into account
  – defined as the fraction of observed agreement not due to chance
Kappa statistic

Kappa = (p(obs) - p(exp)) / (1 - p(exp))

p(obs): proportion of observed agreement
p(exp): proportion of agreement expected by chance
                   Example of Computation of Kappa


Agreement between the First and the Second Readings to Identify Atherosclerosis Plaque
in the Left Carotid Bifurcation by B-Mode Ultrasound Examination in the
Atherosclerosis Risk in Communities (ARIC) Study


                                                               First Reading
                                         Plaque            Normal          Total
Second reading         Plaque             140                52            192
                       Normal              69               725            794
                       Total              209               777            986


Observed agreement = (140 + 725)/986 = 0.877

Expected chance agreement for the plaque-plaque cell = (209 x 192)/986 = 40.7

Expected chance agreement for the normal-normal cell = (777 x 794)/986 = 625.7

Total chance agreement = (40.7 + 625.7)/986 = 0.676

Kappa = (0.877 - 0.676)/(1 - 0.676) = 0.62
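The arithmetic above can be checked with a short Python sketch (the function name `cohens_kappa` is ours; the cell counts come from the ARIC table above):

```python
# Cohen's kappa from a 2x2 agreement table.
# a, d = agreement cells; b, c = disagreement cells (ARIC example counts).
def cohens_kappa(a, b, c, d):
    n = a + b + c + d
    p_obs = (a + d) / n  # observed agreement
    # chance agreement: product of the marginals for each agreement cell / n^2
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

print(round(cohens_kappa(140, 52, 69, 725), 2))  # 0.62
```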
       Interpretation of kappa
• Various suggested interpretations
• Example: Landis & Koch; Fleiss
    excellent:    over 0.75
    fair to good: 0.40 - 0.75
    poor:         less than 0.40




Validity (accuracy) of screening/diagnostic tests
• Face validity, content validity: judgment of the appropriateness of the content of the measurement
• Criterion validity
   – concurrent
   – predictive




          Normal vs abnormal
• Statistical definition
   – “Gaussian” or “normal” distribution
• Clinical definition
   – using criterion




Selection of criterion (“gold” or criterion standard)
• Concurrent
  – salivary screening test for HIV
  – history of cough more than 2 weeks (for TB)
• Predictive
  – APACHE (Acute Physiology and Chronic Health Evaluation) instrument for ICU patients
  – blood lipid level
  – maternal height
     Sensitivity and specificity
Assess correct classification of:
• People with the disease (sensitivity)
• People without the disease (specificity)




                                    "True" Disease Status
                                   Present             Absent

Screening       Positive      "True positives"    "False positives"
test results                          A                   B
                Negative      "False negatives"   "True negatives"
                                      C                   D

Sensitivity of screening test = A/(A + C)

Specificity of screening test = D/(B + D)

Predictive value of positive test = A/(A + B)

Predictive value of negative test = D/(C + D)
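The four formulas can be sketched directly (hypothetical counts; `screening_measures` is an illustrative name, not a standard function):

```python
# Screening-test measures from the 2x2 cell counts defined above:
# a = true positives, b = false positives, c = false negatives, d = true negatives.
def screening_measures(a, b, c, d):
    return {
        "sensitivity": a / (a + c),  # diseased correctly classified positive
        "specificity": d / (b + d),  # non-diseased correctly classified negative
        "ppv": a / (a + b),          # predictive value of a positive test
        "npv": d / (c + d),          # predictive value of a negative test
    }

# hypothetical sample: 80 TP, 90 FP, 20 FN, 810 TN
m = screening_measures(80, 90, 20, 810)
print(m["sensitivity"], m["specificity"])  # 0.8 0.9
```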
            Predictive value
• More relevant to clinicians and patients
• Affected by prevalence




          Choice of cut-point
If higher score increases probability of disease
• Lower cut-point:
  – increases sensitivity, reduces specificity
• Higher cut-point:
  – reduces sensitivity, increases specificity




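The trade-off can be illustrated with hypothetical test scores (assuming a higher score makes disease more likely):

```python
# Moving the cut-point trades sensitivity against specificity.
# Hypothetical scores; higher score = disease more likely.
diseased = [3.1, 2.7, 2.9, 3.5, 2.2]
healthy = [1.2, 2.4, 1.8, 2.1, 0.9]

def sens_spec(cut):
    sensitivity = sum(s >= cut for s in diseased) / len(diseased)
    specificity = sum(s < cut for s in healthy) / len(healthy)
    return sensitivity, specificity

print(sens_spec(2.0))  # (1.0, 0.6): lower cut-point, higher sensitivity
print(sens_spec(3.0))  # (0.4, 1.0): higher cut-point, higher specificity
```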
  Considerations in selection of
           cut-point
Implications of false positive results
• burden on follow-up services
• labelling effect
Implications of false negative results
• Failure to intervene



Receiver operating characteristic (ROC) curve
• Evaluates test over a range of cut-points
• Plot of sensitivity against 1 - specificity
• Area under curve (AUC) summarizes performance:
  – AUC of 0.5 = no better than chance
  – AUC of 1.0 = perfect discrimination
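The ROC construction can be sketched in pure Python (hypothetical scores; AUC via the trapezoidal rule):

```python
# ROC curve: sweep the cut-point and record (1 - specificity, sensitivity).
def roc_points(scores_diseased, scores_healthy):
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores_diseased + scores_healthy), reverse=True):
        sens = sum(s >= cut for s in scores_diseased) / len(scores_diseased)
        fpr = sum(s >= cut for s in scores_healthy) / len(scores_healthy)
        points.append((fpr, sens))
    points.append((1.0, 1.0))
    return points

def auc(points):
    # trapezoidal rule over the curve
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

diseased = [3.1, 2.7, 2.9, 3.5, 2.2]  # hypothetical scores
healthy = [1.2, 2.4, 1.8, 2.1, 0.9]
print(round(auc(roc_points(diseased, healthy)), 2))  # 0.96
```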
Likelihood ratio
• Likelihood ratio of a positive test (LR+) = sensitivity/(1 - specificity)
• Used to compute post-test odds of disease from pre-test odds:
     post-test odds = pre-test odds x LR
• pre-test odds derived from prevalence
• post-test odds can be converted to the predictive value of a positive test
Example of LR
• prevalence of disease in a population is 25%
• sensitivity is 80%
• specificity is 90%
• pre-test odds = 0.25/(1 - 0.25) = 1/3
• likelihood ratio = 0.80/(1 - 0.90) = 8
Example of LR (cont)
• If prevalence of disease in a population is 25%:
• pre-test odds = 0.25/(1 - 0.25) = 1/3
• post-test odds = 1/3 x 8 = 8/3
• predictive value of positive result = (8/3)/(1 + 8/3) = 8/11 = 73%
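The worked example can be reproduced in code (a sketch; `ppv_from_lr` is our name for the calculation):

```python
# Positive predictive value via the likelihood ratio, as in the example above.
def ppv_from_lr(prevalence, sensitivity, specificity):
    pre_test_odds = prevalence / (1 - prevalence)  # 0.25 -> 1/3
    lr_positive = sensitivity / (1 - specificity)  # 0.80/0.10 = 8
    post_test_odds = pre_test_odds * lr_positive   # 8/3
    return post_test_odds / (1 + post_test_odds)   # odds back to probability

print(round(ppv_from_lr(0.25, 0.80, 0.90), 2))  # 0.73
```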
Diagnostic odds ratio
• Ratio of the odds of a positive test in the diseased to the odds of a positive test in the non-diseased:
       (a x d)/(b x c)
• From the previous example (per 10 diseased and 30 non-diseased: a = 8, b = 3, c = 2, d = 27):
       OR = (8 x 27)/(2 x 3) = 36
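Equivalently, the DOR can be computed from sensitivity and specificity alone (a sketch; algebraically the same as (a x d)/(b x c)):

```python
# Diagnostic odds ratio: odds of a positive test in the diseased divided by
# the odds of a positive test in the non-diseased.
def diagnostic_odds_ratio(sensitivity, specificity):
    odds_pos_diseased = sensitivity / (1 - sensitivity)     # 0.8/0.2 = 4
    odds_pos_nondiseased = (1 - specificity) / specificity  # 0.1/0.9 = 1/9
    return odds_pos_diseased / odds_pos_nondiseased

print(round(diagnostic_odds_ratio(0.80, 0.90)))  # 36
```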
Summary: LR and DOR
• Values:
  – 1 indicates that test performs no better than
    chance
  – >1 indicates better than chance
  – <1 indicates worse than chance
• Relationship to prevalence?


   Applications of LR and DOR
• Likelihood ratio: Primarily in clinical
  context, when interest is in how much the
  likelihood of disease is increased by use of
  a particular test
• Diagnostic odds ratio: Primarily in research,
  when interest is in factors that are
  associated with test performance (e.g., using
  logistic regression)

				