					/* The "New plot" (Nyt plot) alias the BoC plot. -- Jørgen Hilden April ’04 */

NP.1 A decision-theoretic approach to binary tests
NP.1.1 Expected regret
         Assume one assigns a loss, B, to overlooking the disease (a false negative case) and a
loss, C, to incorrectly managing a patient without the disease (a false positive case).
As a mnemonic, B stands for treatment benefit and C for "cost." Let the pre-test disease
probability ("prevalence") be p. It is then preferable to "Treat as diseased" when pB > (1-p)C,
i.e., when p and/or B are high or C is low. In the opposite situation the expected loss is
minimized by choosing the "Don't treat" option. The expected loss incurred
by this policy is (1-p)C or pB, whichever is smaller, i.e., min((1-p)C, pB). The first term applies
when Treat, the second when Don't treat, is the better policy. In this setup, loss is measured
relative to managing everyone correctly, i.e., it is a "regret." That is,

         Expected per case regret without testing
                 = min((1 - p)C, pB),                                          (1)

alias the EVPI; see below.
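
Eq. (1) can be sketched in a few lines of Python. The function name and the numerical values below are illustrative assumptions, not part of the original note.

```python
def regret_without_test(p, B, C):
    """Eq. (1): expected per-case regret when no test is available."""
    regret_if_treat = (1 - p) * C     # non-diseased patients are treated needlessly
    regret_if_dont_treat = p * B      # diseased patients go untreated
    return min(regret_if_treat, regret_if_dont_treat)


# Hypothetical values: prevalence 10%, B = 4, C = 1 (arbitrary utility units).
# Here pB = 0.4 is smaller than (1-p)C = 0.9, so Don't treat is the better policy.
print(regret_without_test(0.10, 4.0, 1.0))
```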
          Still assuming that all cases are uniform in terms of prior data and clinical preferences,
we now ask: What happens when we perform the index test? A post-test disease probability
now replaces p. Thus, for each positive case the expected regret becomes min((1 - PVpos)C,
PVpos B). Let P{X} denote the probability that the next patient has characteristic X. Taking
negative cases into account, too, one gets the overall

         Expected per case regret with access to test =

                  P{positive}min((1 - PVpos)C, PVposB) +
                  P{negative}min(PVnegC, (1 - PVneg)B)

           =      min(P{false positive}C, P{true positive}B ) +
                  min(P{true negative}C, P{false negative}B).                 (2)

As to the latter version of this formula, recall the definitions: PVpos = P{true
positive}/P{positive}, etc.
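
As a minimal sketch, the second form of eq. (2) can be computed from the prevalence together with the test's sensitivity and specificity. Parameterizing the four joint probabilities by `sens` and `spec` is an assumption made here for convenience; the function name and numbers are likewise illustrative.

```python
def regret_with_test(p, B, C, sens, spec):
    """Eq. (2): expected per-case regret when the action after each
    test result is chosen optimally."""
    p_true_pos = p * sens
    p_false_neg = p * (1 - sens)
    p_false_pos = (1 - p) * (1 - spec)
    p_true_neg = (1 - p) * spec
    among_positives = min(p_false_pos * C, p_true_pos * B)   # Treat vs. Don't treat
    among_negatives = min(p_true_neg * C, p_false_neg * B)   # Treat vs. Don't treat
    return among_positives + among_negatives


# Hypothetical values: p = 0.10, B = 4, C = 1, sensitivity 0.9, specificity 0.8.
print(regret_with_test(0.10, 4.0, 1.0, 0.9, 0.8))
```

With a perfect test (`sens = spec = 1`) the function returns zero, and with an uninformative one (`sens = 1 - spec`) it returns the no-test regret of eq. (1).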
          Testing generally implies a drop in expected regret, i.e., an expected utility gain. A
perfect test, for instance, would lead to all patients being correctly managed, and the expected
regret would drop to zero (both PVs would be 1). Incidentally, this is why eq. (1) may also be
called the expected value of perfect information (EVPI). A good test is one that comes close
to this ideal.
          With a poor test, however, the gain will vanish: all patients will receive the
management they would have received anyhow because the post-test probabilities are not very
different from the pre-test one (PVpos and (1-PVneg) are both close to p; LRpos and LRneg
are both close to 1). This raises a what-if question:

NP.1.2 What would happen if test outcomes were taken literally?
         The preceding remarks concern optimized decisions. The decision maker is allowed
to override the test's verdict, thus effectively shunting a useless test out of the
decision-making process. For the graphical analysis it is expedient to examine also the policy of acting
according to the test's directives, treating positives as diseased and negatives as non-diseased.
One immediately sees that this policy would imply an

         Expected per case regret with action according to test outcome
                 = P{false positive}C + P{false negative}B,                           (3)

the contribution from true positives and true negatives being zero as these patients are handled
correctly. Compared with eq. (1) this may imply a negative gain in expected utility: it may be
better to bypass the test simply because the abundance of false outcomes would lead to a net
increase in clinical mismanagement.
          That, as a matter of fact, always happens when p is sufficiently low: it then pays to do
nothing (one forgoes testing and treatment) rather than risk a false positive result. It also
happens when p is sufficiently close to 1: it is then preferable to proceed to treatment
straightaway instead of risking a false negative result.
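
The behaviour at the extremes of p can be checked numerically. The sketch below compares eq. (3) with eq. (1) for a hypothetical test; the sensitivity/specificity parameterization, the function names and all numbers are assumptions chosen for illustration.

```python
def regret_without_test(p, B, C):
    """Eq. (1): the better of Treat and Don't treat."""
    return min((1 - p) * C, p * B)


def regret_acting_on_test(p, B, C, sens, spec):
    """Eq. (3): treat every positive, leave every negative untreated."""
    p_false_pos = (1 - p) * (1 - spec)
    p_false_neg = p * (1 - sens)
    return p_false_pos * C + p_false_neg * B


# Hypothetical test and preferences.
B, C, sens, spec = 4.0, 1.0, 0.9, 0.8
for p in (0.01, 0.10, 0.50, 0.99):
    gain = regret_without_test(p, B, C) - regret_acting_on_test(p, B, C, sens, spec)
    print(f"p = {p:.2f}: gain from acting on the test = {gain:+.4f}")
```

With these particular numbers the gain is negative at p = 0.01 and p = 0.99 and positive at p = 0.10 and p = 0.50.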
          [So, unless the sensitivity or specificity is perfect, which is never the case in practice,
or the test is completely uninformative (sensitivity = 1 - specificity) or has disqualified itself
for other reasons, it is in an intermediate p range that a test may, and a good test will,
offer a gain. This range extends from an easily calculated lower p threshold to an upper p
threshold (cf. Hilden & Glasziou 19xx). They are often called the test and the treatment
thresholds.]
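
One way to obtain the two p thresholds is to equate eq. (3) with each of the two terms of eq. (1) and solve for the pre-test odds. The sketch below does this; the derivation is made here from the equations in this note and is not quoted from the cited paper, and it assumes 0 < sensitivity < 1. Function name and numbers are illustrative.

```python
def p_thresholds(B, C, sens, spec):
    """Pre-test probabilities between which acting on the test beats both
    no-test policies.

    Lower limit: eq. (3) equals pB (Don't treat)
        -> odds = (1 - spec) * C / (sens * B)
    Upper limit: eq. (3) equals (1-p)C (Treat)
        -> odds = spec * C / ((1 - sens) * B)
    """
    lower_odds = (1 - spec) * C / (sens * B)
    upper_odds = spec * C / ((1 - sens) * B)
    lower_p = lower_odds / (1 + lower_odds)
    upper_p = upper_odds / (1 + upper_odds)
    return lower_p, upper_p


# Hypothetical values: B = 4, C = 1, sensitivity 0.9, specificity 0.8.
low, high = p_thresholds(4.0, 1.0, 0.9, 0.8)
print(f"acting on the test pays for {low:.4f} < p < {high:.4f}")
```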

NP.1.3 Confidence limits
         The probabilities and probability thresholds discussed, whether estimated from raw
data or by meta-analysis, come with an estimation uncertainty that can be expressed in terms
of SEs and CIs. As regards interpretation, note that a CI for the treatment threshold, say,
comprises a range of conceivable pre-test disease probabilities in which it is uncertain which is the
better policy, Test (and decide accordingly) or Treat (straightaway).

NP.2 Unknown Benefits and Costs; a graphical sensitivity analysis
NP.2.1 The problem
         So far it has been assumed that patients have no individualizing features. In actual
practice all components of the argument will depend on patient characteristics and prior
investigations. The graph to be described, however, keeps to the uniformity assumption.
         Assigning agreed numerical values to B and C, in terms of quality-adjusted life years
(QALYs) or any other relevant units, is difficult. The task is eased by noting that, insofar as
inexpensive and risk-free index tests are concerned, only the B/C ratio matters. Even so,
reviewers will rarely want to assign a fixed value to B/C, but they may still want to
describe how sensitive or insensitive their conclusion is to people's assessments of this ratio
(i.e., a "sensitivity analysis"). When the test is good, there will be a broad range of values that
allow the test to benefit patients.
           The CDR system offers a graph, the B/C or "B over C" (BoC) plot, that visualizes
this range and provides confidence limits for its endpoints.
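
For a fixed p, the endpoints of the B/C range that favours testing follow from comparing eq. (3) with each term of eq. (1). The sketch below is an illustration only: it is not the CDR system's code, the function name and numbers are assumptions, and it requires 0 < sensitivity < 1.

```python
import math


def bc_range_favouring_test(p, sens, spec):
    """B/C interval in which acting on the test has lower regret than
    both Don't treat and Treat (eq. (3) versus eq. (1))."""
    lower = (1 - p) * (1 - spec) / (p * sens)      # below this, Don't treat wins
    upper = (1 - p) * spec / (p * (1 - sens))      # above this, Treat wins
    return lower, upper


# Hypothetical values: p = 0.10, sensitivity 0.9, specificity 0.8.
lo, hi = bc_range_favouring_test(0.10, 0.9, 0.8)
print(f"B/C between {lo:.2f} and {hi:.2f}")
print(f"log10(B/C) between {math.log10(lo):.2f} and {math.log10(hi):.2f}")
```

The interval is non-empty exactly when sensitivity + specificity exceeds 1, i.e., when the test is informative in the expected direction.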

NP.2.2 The B/C plot
           In Fig. X(a) the horizontal axis is the base-10 log of the B/C ratio. Using the ratio itself
would clearly lead to an awkward and useless graph when B/C runs from, e.g., 1/100 to 100/1.
In the logarithmic version these values fall conveniently at -2 and +2. Another way to obtain a
nicely symmetric and interpretable configuration is to express B as a fraction of B+C. This is
what was done in Fig. X(b). For the ordinates we have arbitrarily fixed B+C at 1.
           Across the graph runs a curve which depicts the estimated performance of the
test-and-act-accordingly policy: it shows the expected regret as a function of B and C (eq. (3)),
surrounded by a confidence band. For comparison, the expected regret with no test
available (eq. (1)) is also shown. This reference curve forms the 'pagoda' in part (a) and a skewed triangle in
(b). Its left leg reflects the Don't treat option, which is preferable when B is sufficiently low,
and its right leg the Treat option. The apex is located where B/C = (1-p)/p, or B/(B+C) = 1-p,
this being the situation where the two options are equally poor and much stands to be gained
by testing.
           [The legs are straight in part (b); plotted against the log scale in part (a) they are
logistic functions. Incidentally, the test curve is also a straight line in part (b) and S-shaped in
part (a).]
           In the example chosen for illustration the test offers a wide range of intermediate B/C
values that favour testing, i.e., where testing and acting accordingly is regret-saving. Parts (c)
and (d) show configurations that might arise with a very poor and a nearly perfectly
discriminating test.
           For systematic reviews, the B/C graphs of several studies or subsets may be
superimposed on each other or plotted as an array of mini-diagrams.

NP.2.3 The limits
         The expected regret estimate inherits the statistical uncertainty of its
component probabilities. This is what is shown by the confidence bands. It follows that there
is some uncertainty as to the location and width of the B/C range that favours testing, a fact that is
clearly brought out by the diagrams. A pessimist, fearing that the performance of the test had
been overrated, would be comfortable with the testing option only if B/C fell in the narrow
range found by intersecting the reference triangle/pagoda with the upper confidence curve; an
optimist might hope the lower confidence curve and the resulting outer B/C range came close
to the truth.
          The narrow inner limits thus delimit a B/C range about which it can be confidently
concluded that:

                  if the patient's true B/C ratio is located within this range, the patient will
                  enjoy a gain [in expected utility] from the use of the test.

Similarly, outside the outer confidence limits are those values of B/C for which

                  the clinician can be confident that testing is not beneficial; it is better to
                  either Treat (on the right) or Do nothing (on the left).

When the usual 1.96SE limits (two-sided 95% CIs) are employed, both conclusions are made
with at least 97.5% confidence.
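
The inner and outer limits can be approximated by a grid search over x = B/(B+C). The note does not say how the SE of the regret estimate is computed, so the sketch below makes strong assumptions that should be read as such: p is treated as known (SE = 0), sensitivity and specificity are independent binomial estimates from `n_dis` diseased and `n_nondis` non-diseased subjects, and a normal approximation is used. All names and numbers are hypothetical.

```python
import math


def favourable_range(p, sens, spec, n_dis, n_nondis, z):
    """First and last grid values of x = B/(B+C) at which
    (estimated eq. (3) regret + z * SE) lies below the no-test regret.
    z = +1.96 gives the pessimist's inner range, z = 0 the point estimate,
    z = -1.96 the optimist's outer range."""
    var_sens = sens * (1 - sens) / n_dis       # assumed binomial variance
    var_spec = spec * (1 - spec) / n_nondis    # assumed binomial variance
    hits = []
    for i in range(1, 1000):
        x = i / 1000                           # B = x, C = 1 - x
        est = (1 - p) * (1 - spec) * (1 - x) + p * (1 - sens) * x
        se = math.sqrt(((1 - p) * (1 - x)) ** 2 * var_spec
                       + (p * x) ** 2 * var_sens)
        ref = min((1 - p) * (1 - x), p * x)
        if est + z * se < ref:
            hits.append(x)
    return (hits[0], hits[-1]) if hits else None


# Hypothetical study: p = 0.10, sensitivity 0.9, specificity 0.8, 100 subjects per group.
inner = favourable_range(0.10, 0.9, 0.8, 100, 100, z=1.96)
point = favourable_range(0.10, 0.9, 0.8, 100, 100, z=0.0)
outer = favourable_range(0.10, 0.9, 0.8, 100, 100, z=-1.96)
print("inner:", inner, "point:", point, "outer:", outer)
```

Because the upper band lies above the point estimate and the lower band below it, the inner range is always contained in the point-estimate range, which in turn is contained in the outer range.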

NP.3 The tent graph and an extension
          A related type of graph is the tent graph (Hilden & Glasziou, loc. cit.). It takes p as
variable and B and C as fixed. For this reason it is not suited for the present kind of sensitivity
analysis. However, there is nothing to prevent exploiting the shared underlying principle by
plotting expected regret against (some suitable function of) the (pB)-to-((1-p)C) ratio,
which is the key quantity that determines what must be required of a test in terms of
sensitivity and specificity for it to be clinically beneficial (as a rewriting of eq. (2) will show).
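
The rewriting of eq. (2) alluded to here can be spelled out as follows. The symbols se and sp (sensitivity and specificity) and the shorthand ρ for the (pB)-to-((1-p)C) ratio are notation introduced for this sketch.

```latex
\begin{align*}
\text{Expected regret with test}
 &= \min\bigl((1-p)(1-\mathrm{sp})\,C,\; p\,\mathrm{se}\,B\bigr)
  + \min\bigl((1-p)\,\mathrm{sp}\,C,\; p\,(1-\mathrm{se})\,B\bigr) \\
 &= (1-p)\,C\,\Bigl[\min\bigl(1-\mathrm{sp},\; \mathrm{se}\,\rho\bigr)
  + \min\bigl(\mathrm{sp},\; (1-\mathrm{se})\,\rho\bigr)\Bigr],
 \qquad \rho = \frac{pB}{(1-p)\,C}.
\end{align*}
```

Since the no-test regret of eq. (1) equals (1-p)C min(1, ρ), whether the test lowers the regret depends on p, B and C only through ρ, together with se and sp.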
          Such a plot is appropriate when B, C and p are assumed to be situation-dependent,
whereas the test's sensitivity and specificity, being the only objects of the systematic review,
are not. This may not be a typical review task, but the plot is easily obtained, simply by having
pB play the rôle of B, having (1-p)C play that of C, and replacing all specifications of p by 0.5
(with SE = 0). The vertical axis must then be labelled: Expected regret (when pB+(1-p)C =
