Analysis and Visualization of Classifier Performance Comparison

					F. Provost and T. Fawcett
Confusion Matrix

                          True class
                         p            n
    Predicted Y         TP           FP
    Predicted N         FN           TN

    TP rate = TP / (TP + FN)        FP rate = FP / (FP + TN)

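As a small illustration (the counts below are hypothetical, not from the slides), the two ROC axes follow directly from the four cells:

    # Minimal sketch with made-up confusion-matrix counts.
    tp, fp, fn, tn = 63, 28, 37, 72

    tp_rate = tp / (tp + fn)   # fraction of actual positives correctly flagged
    fp_rate = fp / (fp + tn)   # fraction of actual negatives incorrectly flagged

    print(f"TP rate = {tp_rate:.2f}, FP rate = {fp_rate:.2f}")
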
Introduction
   Data mining requires:
     Experimenting with a wide variety of learning algorithms
     Trying different algorithm parameters
     Varying output threshold values
     Using different training regimens
   Using accuracy alone is inadequate because:
     Class distributions are skewed
     Misclassification costs (FP vs. FN) are not uniform


Class Distributions - Problems with Accuracy
 Accuracy assumes that the class distribution among
  examples is constant and relatively balanced
  (which is rarely the case in real life)
 Classifiers are generally used to scan a large number of
  normal entities to find a small number of unusual ones
     Looking for defrauded customers
     Checking an assembly line
     Skews of 10^6 have been reported (Clearwater & Stern 1991)



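A quick numerical sketch (the 1000:1 skew is chosen only for illustration) of why accuracy can look excellent while the classifier is useless:

    # Hypothetical skewed domain: a classifier that labels everything
    # "normal" is ~99.9% accurate yet finds none of the unusual cases.
    n_normal, n_unusual = 1_000_000, 1_000

    accuracy = n_normal / (n_normal + n_unusual)   # ~0.999
    tp_rate = 0 / n_unusual                        # it catches nothing unusual

    print(f"accuracy = {accuracy:.4f}, TP rate = {tp_rate}")
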
Misclassification Costs - Problems with Accuracy
   The assumption of equal error costs rarely holds in
    real-life problems
      Disease tests, fraud detection, …
   Instead of maximizing accuracy, we need to minimize
    the expected error cost:

        Cost = FP • c(Y,n) + FN • c(N,p)

    where c(Y,n) is the cost of a false positive and
    c(N,p) is the cost of a false negative



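A sketch of the cost expression above with hypothetical error counts and costs (all numbers are made up):

    # Cost = FP * c(Y,n) + FN * c(N,p)
    fp, fn = 120, 15                 # example error counts
    c_false_positive = 1.0           # c(Y,n): cost of flagging a normal case
    c_false_negative = 50.0          # c(N,p): cost of missing a positive case

    cost = fp * c_false_positive + fn * c_false_negative
    print(f"total misclassification cost = {cost}")
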
ROC Plot and ROC Area
 Receiver Operating Characteristic
 Developed in WWII to statistically model
  “false positive” and “false negative”
  detections of radar operators
 Becoming more popular in ML; a standard
  measure in medicine and biology
 On its own, however, it does a poor job of
  guiding the choice among classifiers


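The slides treat each classifier as a single (FP rate, TP rate) point; for a scoring classifier, a whole ROC curve can be traced by sweeping the decision threshold. A minimal sketch with made-up scores and labels:

    # Trace ROC points for a scoring classifier (hypothetical data).
    def roc_points(scores, labels):
        pos = sum(labels)
        neg = len(labels) - pos
        points = [(0.0, 0.0)]
        for thresh in sorted(set(scores), reverse=True):
            tp = sum(1 for s, y in zip(scores, labels) if s >= thresh and y == 1)
            fp = sum(1 for s, y in zip(scores, labels) if s >= thresh and y == 0)
            points.append((fp / neg, tp / pos))   # (FP rate, TP rate)
        return points

    scores = [0.9, 0.8, 0.7, 0.55, 0.4, 0.3]   # hypothetical classifier scores
    labels = [1, 1, 0, 1, 0, 0]                # matching true classes (1 = positive)
    print(roc_points(scores, labels))
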
ROC graph of four classifiers




                                 Informally, one point in
                                 ROC space is better
                                 than another if it is to
                                 the northwest of it
                                 (lower FP rate, higher
                                 TP rate, or both).



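A tiny sketch of the "northwest" rule (the helper name is mine):

    # Point a = (FP rate, TP rate) dominates b if it is at least as far
    # left (lower FP rate) and at least as high (higher TP rate).
    def dominates(a, b):
        return a[0] <= b[0] and a[1] >= b[1] and a != b

    print(dominates((0.1, 0.8), (0.3, 0.6)))   # True: first point is to the northwest
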
Iso-performance Lines
   Expected cost of a classification by the classifier at (FP, TP):

       p(p) • (1 - TP) • c(N,p) + p(n) • FP • c(Y,n)

   Therefore, two points (FP1, TP1) and (FP2, TP2) have the
    same performance if

       (TP2 - TP1) / (FP2 - FP1) = c(Y,n) • p(n) / (c(N,p) • p(p))

   Iso-performance line: all classifiers corresponding to points
    on a line of this slope have the same expected cost.

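A small sketch of the slope computation; the helper name is mine, and the two calls reproduce the scenarios on the next slide as a sanity check:

    # Iso-performance slope m = (c(Y,n) * p(n)) / (c(N,p) * p(p)).
    def iso_perf_slope(p_neg, p_pos, cost_fp, cost_fn):
        return (cost_fp * p_neg) / (cost_fn * p_pos)

    print(iso_perf_slope(10, 1, cost_fp=1, cost_fn=1))     # Scenario A -> 10.0
    print(iso_perf_slope(10, 1, cost_fp=1, cost_fn=100))   # Scenario B -> 0.1
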
ROC Convex Hull
   If a point is not on the convex hull, the classifier
    represented by that point cannot be optimal.

   In this example, B and D cannot be optimal because
    none of their points are on the convex hull.



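A self-contained sketch of building the ROC convex hull: the upper boundary of the convex hull of the classifiers' (FP rate, TP rate) points together with the trivial corners (0,0) and (1,1). The classifier points used here are hypothetical:

    # Upper convex hull of ROC points (monotone-chain style).
    def roc_convex_hull(points):
        def cross(o, a, b):
            return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

        pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
        hull = []                          # built right to left along the top
        for p in reversed(pts):
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull[::-1]                  # left to right: (0,0) ... (1,1)

    # Hypothetical classifiers as (FP rate, TP rate) points.
    classifiers = {"A": (0.1, 0.5), "B": (0.3, 0.55),
                   "C": (0.35, 0.85), "D": (0.6, 0.9)}
    print(roc_convex_hull(classifiers.values()))
    # -> [(0.0, 0.0), (0.1, 0.5), (0.35, 0.85), (1.0, 1.0)]; B and D fall below
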
    How to use the ROC Convex Hull
   p(n):p(p) = 10:1
   Scenario A:
     c(N,p) = c(Y,n)
      m(iso_perf) = 10
   Scenario B:
     c(N,p) = 100 • c(Y,n)
      m(iso_perf) = 0.1




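A sketch of how a given slope picks an operating point: the optimal vertex maximizes TP rate - m * FP rate, i.e. it is where a line of slope m sweeping down from the upper left first touches the hull. The hull vertices below are hypothetical:

    def best_on_hull(hull, slope):
        # Vertex where an iso-performance line of this slope touches the hull.
        return max(hull, key=lambda pt: pt[1] - slope * pt[0])

    hull = [(0.0, 0.0), (0.02, 0.4), (0.1, 0.6),
            (0.35, 0.85), (0.6, 0.97), (1.0, 1.0)]
    print(best_on_hull(hull, slope=10))    # steep line (Scenario A) -> (0.02, 0.4)
    print(best_on_hull(hull, slope=0.1))   # shallow line (Scenario B) -> (0.6, 0.97)
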
Adding New Classifiers
   Adding new classifiers may or may not extend the
    existing hull.
   E may be optimal under some circumstances, since it
    extends the hull.
   F and G cannot be optimal.




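A self-contained sketch of the check; it assumes the hull vertices are listed left to right from (0,0) to (1,1), and all numbers are hypothetical:

    def extends_hull(hull, new_point):
        # A new classifier can be optimal under some conditions only if its
        # (FP rate, TP rate) point lies strictly above the current boundary.
        x, y = new_point
        for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
            if x1 <= x <= x2 and x2 > x1:
                y_on_hull = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
                return y > y_on_hull
        return False

    hull = [(0.0, 0.0), (0.1, 0.5), (0.35, 0.85), (1.0, 1.0)]
    print(extends_hull(hull, (0.2, 0.8)))   # True: above the old hull, may be optimal
    print(extends_hull(hull, (0.5, 0.7)))   # False: below it, never optimal
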
What if distributions & costs are unknown?

   With no information about class or cost distributions,
    the ROC convex hull identifies the set of classifiers
    that may be optimal under some conditions.
   With complete information, the method identifies the
    single optimal classifier.
   In between?



Sensitivity Analysis
   Imprecise distribution and cost information defines a
    range of slopes for the iso-performance lines.

     p(n):p(p) = 10:1
     Scenario C:
      ○ $5 < c(Y,n) < $10
      ○ $500 < c(N,p) < $1000
      ○ 0.05 < m(iso_perf) < 0.2




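A sketch reproducing Scenario C's slope range from the extreme cost ratios (variable names are mine):

    p_neg, p_pos = 10, 1            # p(n):p(p) = 10:1
    cfp_lo, cfp_hi = 5, 10          # $5   < c(Y,n) < $10
    cfn_lo, cfn_hi = 500, 1000      # $500 < c(N,p) < $1000

    # m = (c(Y,n) * p(n)) / (c(N,p) * p(p)); the extremes bracket the slope.
    m_min = (cfp_lo * p_neg) / (cfn_hi * p_pos)   # 0.05
    m_max = (cfp_hi * p_neg) / (cfn_lo * p_pos)   # 0.2
    print(m_min, m_max)
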
Sensitivity Analysis - 2
   Imprecise distribution and cost information defines a
    range of slopes for the iso-performance lines.

     p(n):p(p) = 10:1
     Scenario D:
      ○ 0.2 < m(iso_perf) < 2




Sensitivity Analysis - 3
   Can the “do nothing” strategy be better than any of
    the available classifiers?




Conclusion
 Accuracy alone is an inadequate performance
  metric, for the reasons above
 ROC plots give more accurate information
  about the performance of classifiers
 ROC convex hull method
     Is an efficient solution to the problem of comparing
      multiple classifiers in imprecise environments
     Allows us to incorporate new classifiers easily
     Allows us to select the classifiers that are
      potentially optimal

				