The clinical value of diagnostic tests
A well-explored but underdeveloped continent

J. Hilden : March 2006
The clinical value of diagnostic tests
The diagnostic test and some neglected aspects of its statistical evaluation.

Some aspects were covered in my seminar, spring 2003
Plan of my talk

Historical & ”sociological” observations
Clinicometric framework
Displays and measures of diagnostic power
Appendix: math. peculiarities
Plan of my talk

Historical & ”sociological” observations
  - ”Skud & vildskud” (shots & stray shots)
  - Diagnostic vs. therapeutic research
  - 3 key innovations & some pitfalls
Clinicometric framework
Displays and measures of diagnostic power
Appendix: math. peculiarities
A quantitative framework for diagnostics is much
harder to devise than for therapeutic trials.

• Trials concern what happens observably
• …concern 1st-order entities (mean effects)

• Diagnostic activities aim at changing the doc's mind
• …concern 2nd-order entities (uncertainty / “entropy” change)

CONSORT (CC ~1993)  >>  10 yrs  >>  STARD (CC ~2003)
In the 1970s
   medical decision theory established itself
   – but few first-rate statisticians took notice.
Were they preoccupied with other topics …
   Cox, prognosis, … trial follow-up?
Sophisticated models became available for
   describing courses of disease conditionally
   on diagnostic data.
Fair to say that the diagnostic data themselves remained
   ”a vector of covariates”?
      Early history
Yerushalmy ~1947:
     studies of observer variation*
Vecchio:
 /:BLACK~WHITE:/ Model 1966
 - simplistic but indispensable
 - simple yet often misunderstood?!
Warner ~1960:
 congenital heart dis. via BFCI
       * important but not part of my topic today
  Other topics not mentioned

Location (anatomical diagnoses)
       and multiple lesions
Monitoring, repeated events, prognosis
Systematic reviews & meta-analyses
Interplay between diagnostic test data &
      knowledge from e.g. physiology
Tests with a therapeutic potential
Non-existence of ”prevalence-free”
      figures of merit
Patient involvement, consent
BFCI (Bayes' Formula w.
Conditional Independence**)
   ”based on the assumption of CI”:
     what does that mean?



   Do you see why it was misunderstood?




 ** Indicant variables independent cond’lly on pt’s true condition
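For concreteness, a minimal Python sketch of what a BFCI computation does – multiply the prior by the conditional marginal likelihood of each indicant, then normalise. All names and numbers below are illustrative, not taken from the talk.

# Minimal BFCI ("naive Bayes") sketch; illustrative numbers only.
def bfci_posterior(prior, likelihoods):
    """prior[d] = pretest probability of condition d;
    likelihoods[d] = P(observed value of each indicant | d).
    Under CI the conditional marginals are simply multiplied."""
    joint = {}
    for d, p in prior.items():
        prod = p
        for lk in likelihoods[d]:
            prod *= lk                      # CI: multiply marginal likelihoods
        joint[d] = prod
    total = sum(joint.values())
    return {d: v / total for d, v in joint.items()}   # normalise (Bayes)

# Hypothetical case: disease vs non-disease, two indicant results observed.
print(bfci_posterior(prior={"D": 0.3, "nonD": 0.7},
                     likelihoods={"D": [0.9, 0.8], "nonD": [0.2, 0.4]}))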
BFCI (Bayes' Formula w.
Conditional Independence)
”Bayes based on the assumption of CI”
   - what does that mean?
1) ”There is no ”Bayes Theorem” without CI”
2) ”The BFCI formulae presuppose CI
    (CI is a necessary condition for correctness)”
       No, CI is a sufficient condition; whether it is
       also necessary is a matter to be determined
       – and the answer is No.
                     Counterexample: next picture !
Joint conditional distribution of two tests* in two diseases (red, green)

                Disease ”red”                    Disease ”green”
            T2=1    T2=2    sum              T2=1    T2=2    sum
   T1=1    .0625   .0375    .1              .125    .075     .2
   T1=2    .25     .15      .4              .1875   .1125    .3
   T1=3    .4375   .0625    .5              .4375   .0625    .5
   sum     .75     .25      1               .75     .25      1

                *with 3 and 2 qualitative test outcomes
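A quick numerical check of the counterexample (values copied from the table above): within each disease the two tests are clearly not conditionally independent, yet the BFCI product of marginal likelihood ratios reproduces the exact joint likelihood ratio in every cell – CI is sufficient, not necessary.

# Counterexample check; the tables are P(T1=i, T2=j | disease).
red   = [[0.0625, 0.0375], [0.25,   0.15  ], [0.4375, 0.0625]]
green = [[0.125,  0.075 ], [0.1875, 0.1125], [0.4375, 0.0625]]

def marginals(tab):
    rows = [sum(r) for r in tab]                        # P(T1=i | disease)
    cols = [sum(r[j] for r in tab) for j in range(2)]   # P(T2=j | disease)
    return rows, cols

r_rows, r_cols = marginals(red)
g_rows, g_cols = marginals(green)

for i in range(3):
    for j in range(2):
        exact = red[i][j] / green[i][j]                          # true LR red:green
        bfci = (r_rows[i] / g_rows[i]) * (r_cols[j] / g_cols[j]) # BFCI likelihood ratio
        ci_red = abs(red[i][j] - r_rows[i] * r_cols[j]) < 1e-12  # does CI hold in red?
        print(i, j, round(exact, 4), round(bfci, 4), "CI holds in red:", ci_red)
# The exact and BFCI likelihood ratios agree in every cell, although CI fails.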
Vecchio's /:BLACK&WHITE:/ Model 1966

  Common misunderstandings:
  1) ”The sensitivity and specificity are
     properties of the diagnostic test
     [rather than of the patient population]”
  2) They are closely connected with the
     ability of the test to rule out & in
         True only when the ”prevalence” is intermediate
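A small illustration of misunderstanding 2, with hypothetical numbers: the same sens and spec give very different rule-in / rule-out performance once the prevalence is extreme.

# Hypothetical: sens = spec = 0.9 at three prevalences.
sens, spec = 0.9, 0.9
for prev in (0.01, 0.5, 0.99):
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    print(f"prevalence {prev:4.2f}:  PVpos = {ppv:.3f}   PVneg = {npv:.3f}")
# At prevalence 0.01 a positive result hardly rules in (PVpos ~ 0.08);
# at prevalence 0.99 a negative result hardly rules out (PVneg ~ 0.08).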
Plan of my talk

Historical & ”sociological” observations
Clinicometric framework
Displays and measures of diagnostic power
Appendix: math. peculiarities
You cannot discuss Diagnostic Tests without:

Some conceptual framework*
A Case, the unit of experience in the clinical disciplines,
  is a case of a Clinical Problem,
  defined by the who-how-where-why-what of a clinical encounter
  – or Decision Task.
We have a case population or:
  case stream (composition!) with a case flow (rate, intensity).

   *Clini[co]metrics, rationel klinik, …
Examples
Each time the doc sees the patient we have a new
  encounter / case, to be compared with suitable
  ”statistical” precedents – and physio- & pharmacology.
Prognostic outlook at discharge from hospital:
  a population of cases = discharges, not patients
  (CPR Nos. = Danish Citizen Nos.).
Diagnosis?
Serious diagnostic endeavours are always action-oriented
    – or at least counselling-oriented –
i.e., towards what should be done so as to influence
    the future (action-conditional prognosis).

The ”truth” is either
(i) a gold standard test (”facitliste”, an answer key), or
(ii) syndromatic (when tests define the ”disease”*,
     e.g. rheum. syndromes, diabetes)

        *in clinimetrics there is little need for that word!
                   Example
The acute abdomen:
 there is no need to discriminate between
 appendicitis and non-app. (though it is
 fun to run an ”appendicitis contest”)
What is actionwise relevant is the
 decision: open up or wait-and-see?

<This is frequently not recognized in the literature>
In clinical studies the choice of sample,
  and of the variables on which to base
                  one's prediction,
must match the clinical problem
       as it presents itself
        at the time of decision making.

In particular, one mustn't
  discard subgroups (impurities?)
   that did not become identifiable
 until later: prospective recognizability !
Purity vs. representativeness:
A meticulously filtered case stream
            ('proven infarctions')
may be needed for patho- and
      pharmaco-physiological research,
but is inappropriate as a basis
         for clinical decision rules

[incl. cost studies].
Consecutivity as a safeguard against
    selection bias.

Standardization:
     (Who examines the patient? Where?
     When? With access to clin. data?)

Gold standard … the big problem !!
     w. blinding, etc.

Safeguards against change of data after
     the fact.
If the outcome is FALSE negative or
    positive,
you apply an ”arbiter” test
”in order to resolve the discrepant finding,”

i.e. a 2nd, 3rd, … reference test.
If TRUE negative or positive, accept !

~ The defendant decides who shall be allowed
  to give testimony and when
Digression…


    Randomized trials of diagn. tests
                   …theory under development

    Purpose & design: many variants
    Sub(-set-)randomization, depending on the
       pt.'s data so far collected.
    ”Non-disclosure”: some data are kept under
       seal until analysis. No parallel in therapeutic trials!
    Main purposes…
…Randomized trials of diagn. tests
 1) when the diagnostic intervention is itself
    potentially therapeutic;
 2) when the new test is likely to redefine the
    disease(s) ( cutting the cake in a
    completely new way );
 3) when there is no obvious rule of
    translation from the outcomes of the new
    test to existing treatment guidelines;
 4) when clinician behaviour is part of the
    research question…

                        …end of digression
Plan of my talk

Historical & ”sociological” observations
Clinicometric framework
Displays and measures of diagnostic power
Appendix: math. peculiarities
Displays & measures of
diagnostic power
 1) The Schism – between:
 2) ROCography
 3) VOIography
        ROCography
~ classical discriminant analysis /
  pattern recognition

Focus on disease-conditional
 distribution of test results (e.g., ROC)

AuROC (the area under the ROC) is
 popular … despite 1991 paper
VOI (value of information)
~ decision theory.
VOI = increase in expected utility afforded
  by an information source such as a
  diagnostic test
Focus on the posttest conditional distribution of
  disorders, the range of actions and the
  associated expected utility
  – and its preposterior quantification.
Less concerned with math structure, more
  with medical realism.
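A minimal sketch of this definition for one binary test and two actions; the utilities, prevalence, sens and spec below are hypothetical, chosen only to make the arithmetic concrete.

# VOI sketch: two actions, two states; U[action][state] are hypothetical utilities.
U = {"treat":    {"D": 0.8, "nonD": 0.6},
     "no_treat": {"D": 0.2, "nonD": 1.0}}
prev, sens, spec = 0.3, 0.9, 0.8

def best_eu(p_D):
    """Expected utility of the best action at disease probability p_D."""
    return max(U[a]["D"] * p_D + U[a]["nonD"] * (1 - p_D) for a in U)

eu_pre = best_eu(prev)                          # act on the pretest probability alone

p_pos = sens * prev + (1 - spec) * (1 - prev)   # P(test positive)
post_pos = sens * prev / p_pos                  # posterior | positive
post_neg = (1 - sens) * prev / (1 - p_pos)      # posterior | negative
eu_post = p_pos * best_eu(post_pos) + (1 - p_pos) * best_eu(post_neg)

print("VOI =", eu_post - eu_pre)                # preposterior gain in expected utility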
                       VOI

Do we have a canonical guideline?
  1) UTILITY
  2) UTILITY / COST

Even if we don't have the utilities
  as actual numbers, we can use this
      paradigm as a filter:
  evaluation methods that violate it are
  wasteful of lives or resources.

Stylized utility (pseudo-regret functions) as a
  (math. convenient) substitute.
VOI
Def. diagnostic uncertainty as expected regret
(utility loss, relative to knowing what ails the pt.)

Diagnosticity measures (DMs):
   Diagnostic tests
      should be evaluated in terms of the
      pretest–posttest difference
      in diagnostic uncertainty.

Auxiliary quantities like sens and spec
  … go into the above.
                          …so much for VOI principles
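The same toy numbers restated in regret terms give a hedged sketch of such a diagnosticity measure: pretest expected regret minus the preposterior expectation of the posttest regret (which, with these definitions, equals the VOI computed above).

# Diagnosticity as the pretest-posttest drop in expected regret (same toy numbers).
U = {"treat": {"D": 0.8, "nonD": 0.6}, "no_treat": {"D": 0.2, "nonD": 1.0}}

def regret(p_D):
    informed = p_D * max(U[a]["D"] for a in U) + (1 - p_D) * max(U[a]["nonD"] for a in U)
    blind = max(U[a]["D"] * p_D + U[a]["nonD"] * (1 - p_D) for a in U)
    return informed - blind                     # loss relative to knowing the truth

prev, sens, spec = 0.3, 0.9, 0.8
p_pos = sens * prev + (1 - spec) * (1 - prev)
post_pos = sens * prev / p_pos
post_neg = (1 - sens) * prev / (1 - p_pos)

dm = regret(prev) - (p_pos * regret(post_pos) + (1 - p_pos) * regret(post_neg))
print("diagnosticity (drop in expected regret) =", dm)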
      NOT




Diagnosticity measures and auxiliary quantities
   Sens (TP), spec (TN): nosographic distributions
   PVpos, PVneg: diagnostic distribution | test result
   Youden's index: Y = sens + spec – 1 = 1 – (FN) – (FP)
     = det(nosographic 2×2) = (TP)(TN) – (FP)(FN)
     = 2(AuROC – ½)
   AuROC = [sens + spec] / 2
[ROC diagram of a single binary test; regions labelled FP, TN, FN, TP; the point Y = 1 marked]
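A quick numerical confirmation of these identities for an illustrative binary test (sens and spec chosen arbitrarily):

sens, spec = 0.8, 0.7
FP, FN = 1 - spec, 1 - sens
Y = sens + spec - 1                    # Youden's index
det = sens * spec - FP * FN            # determinant of the nosographic 2x2 table
auroc = (sens + spec) / 2              # area under the two-segment ROC
print(Y, det, 2 * (auroc - 0.5))       # all three coincide (0.5, up to rounding)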
Diagnosticity measures and auxiliary quantities
   Sens, spec: nosographic distributions
   LRpos, LRneg = slopes of the two ROC segments

The ”likelihood ratio” term is o.k. when diagnostic
   hypotheses are likened to scientific hypotheses
[Same ROC diagram as above]
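For the same illustrative binary test, the likelihood ratios read off as the slopes of the two ROC segments, from (0, 0) to (1 – spec, sens) and on to (1, 1):

sens, spec = 0.8, 0.7
LR_pos = sens / (1 - spec)             # slope of the lower-left segment
LR_neg = (1 - sens) / spec             # slope of the upper-right segment
print(LR_pos, LR_neg)                  # about 2.67 and 0.29 here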
Diagnosticity measures and auxiliary quantities
   «Utility index» = (sens) × Y
          … is nonsense
[Same ROC diagram as above]
Diagnosticity measures and auxiliary quantities
   DOR (diagnostic odds ratio) =
   [(TP)(TN)] / [(FP)(FN)]
   = infinity in this example
   even if TP is only 0.0001.   … careful!
[ROC diagram of a test with FP = 0; regions labelled FN, TP, TN]
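A one-line illustration of that warning, with hypothetical numbers:

sens, spec = 0.0001, 1.0               # TP tiny, FP exactly zero
TP, TN, FP, FN = sens, spec, 1 - spec, 1 - sens
dor = float("inf") if FP * FN == 0 else (TP * TN) / (FP * FN)
print(dor)                             # inf, although the test detects almost nobody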
Three test outcomes
[Figure: FREQUENCY-WEIGHTED ROC – implies constant misclassification]

Continuous test
[Figure: cutoff at x = c minimizes misclassification]
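A hedged sketch of the cutoff caption, assuming Gaussian disease-conditional distributions (an assumption of mine, not from the talk): choose c to minimize the expected misclassification rate for a "higher value = more diseased" test.

import math

def norm_cdf(x, mu, sd):
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

p = 0.3                                # pretest probability of D (hypothetical)
q = 1 - p
mu_D, mu_N, sd = 2.0, 0.0, 1.0         # hypothetical test distributions

def misclassification(c):
    fn = norm_cdf(c, mu_D, sd)         # diseased below the cutoff
    fp = 1 - norm_cdf(c, mu_N, sd)     # non-diseased at or above it
    return p * fn + q * fp

grid = [i / 100 for i in range(-200, 400)]
best_c = min(grid, key=misclassification)
print(best_c, misclassification(best_c))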
Two binary tests and their 6 most important
joint rules of interpretation

”Overhull” implies superiority

[Figure: essence of the proof that ”overhull” implies superiority]
Utility-based evaluation
in general

∫ (p dy + q dx) · min_a { (L_{a,D} p dy + L_{a,nonD} q dx) / (p dy + q dx) }

is how it looks when applied to the ROC
(which contains the required information about
the disease-conditional distributions).
Here p and q are the pretest probabilities of D and non-D,
(x, y) runs along the ROC, and L_{a,D}, L_{a,nonD} are the
losses (regrets) of action a when the patient is D / non-D.
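A hedged numerical sketch of this integral for a test with a handful of outcome categories (all numbers hypothetical): ordered by likelihood ratio, each ROC segment (dy, dx) contributes the loss of whichever action is best for results falling in that segment.

p, q = 0.3, 0.7                                  # pretest probabilities of D, nonD
# Regret ("loss") of each action in each state; 0 = the appropriate action.
L = {"treat":    {"D": 0.0, "nonD": 0.4},
     "no_treat": {"D": 1.0, "nonD": 0.0}}
# Disease-conditional probabilities of 4 test categories, steepest segment first,
# so cumulating (dy, dx) traces the ROC from (0, 0) to (1, 1).
dy = [0.55, 0.25, 0.15, 0.05]                    # P(category | D)
dx = [0.05, 0.15, 0.30, 0.50]                    # P(category | nonD)

expected_regret = sum(
    min(L[a]["D"] * p * dyi + L[a]["nonD"] * q * dxi for a in L)
    for dyi, dxi in zip(dy, dx)
)
print(expected_regret)          # posttest expected regret implied by the ROC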
                                            Utility-based evaluation
                                            in general
The area under the ROC
 (AuROC) is misleading
You have probably seen my
counterexample* before.

Assume D and non-D
equally frequent and also
utilitywise symmetric …

  *Medical Decision Making 1991; 11: 95-101
Two Investigations
[Figure: ”the tent graph” – expected regret (utility drop
relative to perfect diagnosis) plotted against the pretest
probability, for two investigations; labels B×sens and C×spec]
Good & bad pseudoregret functions
[Figure: two pseudoregret curves – one Shannon-like, one Brier-like]
Plan of my talk

Historical & ”sociological” observations
Clinicometric framework
Displays and measures of diagnostic power
Appendix: math. peculiarities
LRpos = LRneg = 1
End of my talk

Thank you!
Tak for i dag! (Thanks for today!)

				