Rapid Object Detection
   using a Boosted Cascade of Simple Features

   Rapid: moving or acting with great speed
   Boost: to increase the strength or value of something

                           Original Author
                      Paul Viola & Michael Jones

 In: Proc. Conf. Computer Vision and Pattern Recognition, Volume 1,
                   Kauai, HI, USA (2001) 511–518

                                                     Speaker: Jing Ming Chiuan (井民全)
   Introduction
   The Boost algorithm for classifier learning
     Feature Selection
     Weak learner constructor

     The strong classifier

   A tremendously difficult problem
   Result
   Conclusion
               What have we done?
   A machine learning approach to visual object detection
       Capable of processing images extremely rapidly
       Achieving high detection rates

   Three key contributions
       A new image representation: the integral image
           Speeds up feature evaluation
       A learning algorithm (based on AdaBoost [5])
           Selects a small number of visual features from a larger set
           Yields efficient classifiers
       A method for combining classifiers: the cascade
           Discards the background regions of the image
    A demonstration on face detection
   A frontal face detection system processing 384 x 288 images on a 700 MHz Pentium III
       The detector runs at 15 frames per second
        without resorting to image differencing (differences between frames in video sequences)
        or skin color detection
       It works with only a single grey scale image
       Broad practical applications
                               for an extremely fast face detector

   User interfaces, image databases
   The system can be implemented on small, low-power
    devices.
       Compaq iPaq: 2 frames per second
     Training process for the classifier
   The attentional operator is trained to detect
    examples of a particular class: a supervised
    training process
         A face classifier is constructed
             In the domain of face detection it achieves
                 < 1% false negatives
                 < 40% false positives
         Cascaded detection process
   The sub-windows are processed by a sequence
    of classifiers
       Each classifier is slightly more complex than the last
       If any classifier rejects the sub-window,
        no further processing is performed

       The process is essentially that of a degenerate
        decision tree
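
The early-rejection behaviour described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the stage functions, their thresholds, and the `calls` log are hypothetical stand-ins chosen only to show that later stages are skipped once a stage rejects.

```python
calls = []  # hypothetical log, used only to show which stages were evaluated


def make_stage(name, threshold):
    """Build a toy stage that accepts a window when its pixel sum exceeds a threshold."""
    def stage(window):
        calls.append(name)
        return sum(window) > threshold
    return stage


def cascade_classify(window, stages):
    """Return True only if every stage accepts the window."""
    for stage in stages:
        if not stage(window):
            return False  # rejected: no further processing is performed
    return True


# Each stage is stricter than the one before it (toy thresholds).
stages = [make_stage("s1", 1), make_stage("s2", 5), make_stage("s3", 9)]

print(cascade_classify([1, 1, 1], stages), calls)  # rejected at the second stage
```

Because most sub-windows in an image are background, most of them leave the loop after the first one or two cheap stages.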
    Our object detection framework
   [Figure: pipeline of the framework]
       Original image -> integral image, computed so that features can be evaluated rapidly at many scales
       Integral image -> feature evaluation with Haar basis functions, giving a large number of features
       Large number of features -> feature selection with a modified AdaBoost procedure, giving a small set of critical features
       Small set of critical features -> cascaded classifier structure
                 Feature Selection
The detection process is based on features rather than on the pixels directly.

   Two reasons:
       Features can encode ad hoc domain knowledge that is difficult
        to learn using a finite quantity of training data.
       A feature-based system operates much faster than a pixel-based system.

    Simple features are used
       The Haar basis functions, which have been used by
        Papageorgiou et al. [9]
                 Three kinds of features
    Two-rectangle feature
        The difference between the sums of the pixels within two rectangular regions
        The regions have the same size and shape and are horizontally or vertically adjacent
    Three-rectangle feature
        The sum within two outside rectangles subtracted from the sum in a center rectangle
    Four-rectangle feature
        The difference between diagonal pairs of rectangles

    The base resolution is 24x24
    The exhaustive set of rectangle features is large, over 180,000.
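
The three feature types can be sketched directly on pixel sums. This is a minimal illustration under stated assumptions: the function names are hypothetical, the sign convention for the two- and four-rectangle features (which rectangle is subtracted from which) is an assumption, and the sums are computed naively rather than with the integral image.

```python
def region_sum(img, top, left, height, width):
    """Sum of the pixels in a rectangle of a row-major image (list of rows)."""
    return sum(img[y][x]
               for y in range(top, top + height)
               for x in range(left, left + width))


def two_rect_horizontal(img, top, left, height, width):
    """Left rectangle minus the horizontally adjacent right rectangle (assumed sign)."""
    return (region_sum(img, top, left, height, width)
            - region_sum(img, top, left + width, height, width))


def three_rect_horizontal(img, top, left, height, width):
    """Center rectangle minus the two outside rectangles."""
    outer = (region_sum(img, top, left, height, width)
             + region_sum(img, top, left + 2 * width, height, width))
    center = region_sum(img, top, left + width, height, width)
    return center - outer


def four_rect(img, top, left, height, width):
    """Difference between the two diagonal pairs of rectangles (assumed sign)."""
    main_diag = (region_sum(img, top, left, height, width)
                 + region_sum(img, top + height, left + width, height, width))
    anti_diag = (region_sum(img, top, left + width, height, width)
                 + region_sum(img, top + height, left, height, width))
    return main_diag - anti_diag


print(two_rect_horizontal([[1, 1, 9, 9], [1, 1, 9, 9]], 0, 0, 2, 2))
print(three_rect_horizontal([[2, 7, 2]], 0, 0, 1, 1))
print(four_rect([[1, 0], [0, 1]], 0, 0, 1, 1))
```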
         Integral Image
    An intermediate representation for rapidly computing the rectangle features

    The integral image at (x, y) is the sum of the pixels of the original image i
    above and to the left of (x, y), inclusive:

        $$ ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y') $$

    A pair of recurrences computes it in one pass over the image,
    where s(x, y) is the cumulative row sum:

        $$ s(x, y) = s(x, y - 1) + i(x, y) $$
        $$ ii(x, y) = ii(x - 1, y) + s(x, y) $$
        $$ s(x, -1) = 0, \qquad ii(-1, y) = 0 $$

    Example (rows listed top to bottom):

        i              s               ii
        1  2  5        1   2   5        1   3   8
        3  4  6        4   6  11        4  10  21
        7  8  9       11  14  20       11  25  45
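
The one-pass recurrences can be checked with a short sketch. It assumes a row-major list of rows indexed as `img[y][x]`; the function name is hypothetical. The input is the 3x3 example image from the slide.

```python
def integral_image(img):
    """Return ii with ii[y][x] = sum of img[y'][x'] for all y' <= y and x' <= x."""
    height, width = len(img), len(img[0])
    s = [[0] * width for _ in range(height)]   # cumulative sum s(x, y)
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # s(x, y) = s(x, y - 1) + i(x, y), with s(x, -1) = 0
            s[y][x] = (s[y - 1][x] if y > 0 else 0) + img[y][x]
            # ii(x, y) = ii(x - 1, y) + s(x, y), with ii(-1, y) = 0
            ii[y][x] = (ii[y][x - 1] if x > 0 else 0) + s[y][x]
    return ii


image = [[1, 2, 5],
         [3, 4, 6],
         [7, 8, 9]]
print(integral_image(image))  # [[1, 3, 8], [4, 10, 21], [11, 25, 45]]
```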
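Calculating any rectangle sum with the
          integral image

    Rectangle sum
        With 1, 2, 3 and 4 denoting the integral image values at the top-left,
        top-right, bottom-left and bottom-right corners of rectangle D:
        D = 4 - 3 - 2 + 1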
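
A sketch of the four-reference rectangle sum follows. To avoid special cases at the image border it pads the integral image with an extra zero row and column; that padding and the function names are assumptions of this sketch, not something the slides specify.

```python
def padded_integral(img):
    """Integral image with a leading zero row and column.

    P[y][x] holds the sum of all pixels in rows < y and columns < x.
    """
    height, width = len(img), len(img[0])
    P = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            P[y + 1][x + 1] = P[y][x + 1] + row_sum
    return P


def rect_sum(P, top, left, height, width):
    """Sum of the rectangle D using four array references: 4 - 3 - 2 + 1."""
    bottom, right = top + height, left + width
    return P[bottom][right] - P[bottom][left] - P[top][right] + P[top][left]


P = padded_integral([[1, 2, 5],
                     [3, 4, 6],
                     [7, 8, 9]])
print(rect_sum(P, 1, 1, 2, 2))  # 4 + 6 + 8 + 9 = 27
```

The cost is constant regardless of the rectangle's size, which is what makes evaluating features at any scale cheap.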
Learning Classification Functions

   [Figure: learning process]
       Inputs: the feature set and a training set of (1) positive and (2) negative examples
       A variant of the AdaBoost procedure combines weak learners (weak learner 1, weak learner 2, ...)
        into the final strong classifier
       Output of the final strong classifier: face or non-face

   The AdaBoost learning algorithm
       is used to perform the feature selection task
       Over 180,000 rectangle features are associated with each sub-image
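    The Boost algorithm for classifier learning

        Step 1: Given example images
            $$ (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) $$
            where x_i is an image, y_i = 1 for positive examples and y_i = 0 for negative examples

        Step 2: Initialize the weights
            $$ w_{1,i} = \frac{1}{2m}, \frac{1}{2l} \quad \text{for } y_i = 0, 1 \text{ respectively} $$
            where m and l are the numbers of negatives and positives.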
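
The weight initialization can be written out as a small sketch. The function name is hypothetical, and the code assumes the label list contains at least one positive and one negative example.

```python
def initial_weights(labels):
    """Give each negative weight 1/(2m) and each positive weight 1/(2l)."""
    l = sum(1 for y in labels if y == 1)   # number of positives
    m = len(labels) - l                    # number of negatives
    return [1.0 / (2 * l) if y == 1 else 1.0 / (2 * m) for y in labels]


weights = initial_weights([1, 0, 0, 0])
print(weights)
```

With this choice the positives and the negatives each carry half of the total weight, however unbalanced the training set is.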
 Weak learner constructor
 For t = 1, … , T
       1. Normalize the weights,
              $$ w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}} $$
          so that w_t is a probability distribution.
       2. For each feature j, train a classifier h_j which is restricted to using a single
          feature. The error is evaluated with respect to w_t:
              $$ \epsilon_j = \sum_i w_i \, |h_j(x_i) - y_i| $$
          Choose the classifier, h_t, with the lowest error \epsilon_t.
       3. Update the weights:
              $$ w_{t+1,i} = w_{t,i} \, \beta_t^{1 - e_i} $$
          where e_i = 0 if x_i is classified correctly and e_i = 1 otherwise, and
              $$ \beta_t = \frac{\epsilon_t}{1 - \epsilon_t} $$
          That is, the weight becomes w_{t,i} \beta_t if x_i is classified correctly
          and stays w_{t,i} otherwise.
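
One round of the loop above can be sketched as follows. To keep the example short, the candidate classifiers are represented only by their precomputed 0/1 outputs on the training set; that representation and the function name are assumptions of this sketch. It also assumes the best error lies strictly between 0 and 1.

```python
def boost_round(weights, predictions, labels):
    """Run one boosting round.

    predictions[j][i] is h_j(x_i) in {0, 1}; labels[i] is y_i in {0, 1}.
    Returns (index of chosen classifier, its error, beta, updated weights).
    """
    # Step 1: normalize the weights into a probability distribution.
    total = sum(weights)
    w = [wi / total for wi in weights]

    # Step 2: weighted error of every candidate, then pick the lowest.
    errors = [sum(wi * abs(h - y) for wi, h, y in zip(w, preds, labels))
              for preds in predictions]
    best = min(range(len(errors)), key=errors.__getitem__)
    eps = errors[best]

    # Step 3: shrink the weights of correctly classified examples by beta.
    beta = eps / (1.0 - eps)   # assumes 0 < eps < 1
    new_w = [wi * (beta if h == y else 1.0)
             for wi, h, y in zip(w, predictions[best], labels)]
    return best, eps, beta, new_w


labels = [1, 1, 0, 0]
predictions = [[1, 1, 1, 0],   # candidate 0: one mistake
               [1, 0, 1, 0]]   # candidate 1: two mistakes
print(boost_round([0.25] * 4, predictions, labels))
```

After the update only the misclassified example keeps its full weight, so it dominates the next round once the weights are normalized again.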
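         Weak learner constructor (illustrated)
   [Figure: one boosting round]
       The training set carries the normalized weights w_1, w_2, ..., w_n
       Each of the over 180,000 features f_j of a sub-image gives a classifier
        h_1, h_2, ..., h_{180,000} with error
           $$ \epsilon_j = \sum_i w_i \, |h_j(x_i) - y_i| $$
       Choose the classifier, h_t, with the lowest error \epsilon_t,
        the minimum over \epsilon_1, ..., \epsilon_{180,000}
       Update the weights: correctly classified examples are scaled by
        \beta_t = \epsilon_t / (1 - \epsilon_t); missed examples keep their weight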
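    Training the weak learner (illustrated)

   [Figure: values of the feature f_j(x) over the training set X, with face and
    non-face examples separated by a threshold \theta]
       If f_j(x) > \theta, x is classified as a face, e.g. h_j(x_i) = 1
        (the case p_j = -1 in the definition below)
       Non-face examples above the threshold are false positives;
        face examples below it are false negatives
       The weighted error is
           $$ \epsilon_j = \sum_i w_i \, |h_j(x_i) - y_i| $$

   The weak classifier is
       $$ h_j(x) = \begin{cases} 1 & \text{if } p_j f_j(x) < p_j \theta_j \\ 0 & \text{otherwise} \end{cases} $$
       where \theta_j is a threshold,
       p_j is a parity indicating the direction of the inequality sign,
       f_j is a feature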
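
Training one such single-feature classifier amounts to picking the threshold and parity with the lowest weighted error. The sketch below does this by brute force over candidate thresholds; the brute-force search, the choice of midpoints as candidates, and the function name are assumptions made for clarity, not the procedure used in the paper.

```python
def train_stump(values, labels, weights):
    """Find (error, theta, p) minimizing the weighted error of
    h(x) = 1 if p * f(x) < p * theta else 0, given feature values f(x_i)."""
    distinct = sorted(set(values))
    # Candidate thresholds: one below all values, midpoints, one above all values.
    thresholds = ([distinct[0] - 1.0]
                  + [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
                  + [distinct[-1] + 1.0])
    best = (float("inf"), None, None)
    for theta in thresholds:
        for p in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if (1 if p * v < p * theta else 0) != y)
            if err < best[0]:
                best = (err, theta, p)
    return best


# Toy data: faces (label 1) have large feature values.
print(train_stump([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1], [0.25] * 4))
```

For this toy data the search settles on a threshold between the two groups with parity -1, that is, "face if the feature value exceeds the threshold".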
   Place the most weight on the examples most
    often misclassified by the preceding weak rules
       This forces the base learner to focus its attention on the
        “hardest” examples
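          The Boost algorithm for classifier learning (summary)

           Step 1: Given example images
               $$ (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) $$

           Step 2: Initialize the weights
               $$ w_{1,i} = \frac{1}{2m}, \frac{1}{2l} \quad \text{for } y_i = 0, 1 \text{ respectively} $$
               where m and l are the numbers of negatives and positives.
      Weak learner constructor
     For t = 1, … , T
           1. Normalize the weights
           2. For each feature j, train a classifier h_j which is restricted to using a single
              feature, with error $\epsilon_j = \sum_i w_i \, |h_j(x_i) - y_i|$,
              and select the weak classifier h_t with the lowest error
           3. Update the weights using $\beta_t = \epsilon_t / (1 - \epsilon_t)$
Final strong classifier
      The weak classifiers selected in successive rounds are combined by a weighted vote:
          $$ h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases} \qquad \alpha_t = \log \frac{1}{\beta_t} $$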
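
The final strong classifier can be sketched as a thresholded weighted vote, assuming the form given in the original paper: output 1 when the sum of alpha_t * h_t(x) reaches half of the total alpha, with alpha_t = log(1 / beta_t). The toy weak classifiers and beta values below are hypothetical, and every beta is assumed to lie strictly between 0 and 1.

```python
import math


def strong_classify(x, weak_classifiers, betas):
    """Weighted vote of weak classifiers; each returns 0 or 1."""
    alphas = [math.log(1.0 / b) for b in betas]   # assumes 0 < beta < 1
    score = sum(a * h(x) for a, h in zip(alphas, weak_classifiers))
    return 1 if score >= 0.5 * sum(alphas) else 0


# Hypothetical weak classifiers acting on a single number.
weak = [lambda x: 1 if x > 2 else 0,
        lambda x: 1 if x > 5 else 0,
        lambda x: 1 if x < 10 else 0]
betas = [0.25, 0.5, 0.5]   # a lower beta means a more reliable classifier

print(strong_classify(3, weak, betas), strong_classify(1, weak, betas))
```

A classifier with a small error gets a small beta and therefore a large alpha, so it counts for more in the vote.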
   The big picture of the testing process
   [Figure: cascade of boosted stages built from the feature set]
       Stage 1: AdaBoost learner (feature selection and classifier) with one feature, h1,
        annotated with a 100% detection rate and a 50% false positive rate
       Stage 2: AdaBoost learner with ten features, h1, h2, ..., h10
       Stage 3: AdaBoost learner with more features, h1, h2, ...
       A sub-window that any stage classifies as false is rejected
       Reject as many negatives as possible while minimizing the false negatives
A tremendously difficult problem
   How to determine
     The number of classifier stages
     The number of features in each stage
     The threshold of each stage
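
One way to see why these three choices interact: if, as in the original paper's analysis, the overall detection and false positive rates of a cascade are taken to be the products of the per-stage rates, a small loss per stage compounds over many stages. The sketch below uses made-up per-stage rates and a hypothetical function name.

```python
def cascade_rates(stage_detection_rates, stage_false_positive_rates):
    """Overall rates of a cascade as products of the per-stage rates."""
    detection = 1.0
    for d in stage_detection_rates:
        detection *= d
    false_positive = 1.0
    for f in stage_false_positive_rates:
        false_positive *= f
    return detection, false_positive


# Ten hypothetical stages, each keeping 99% of faces and 50% of non-faces.
detection, false_positive = cascade_rates([0.99] * 10, [0.5] * 10)
print(detection, false_positive)
```

Ten stages that each pass half of the non-faces already cut false positives to about one in a thousand, while the detection rate drops to roughly 90%, which is why each stage's threshold has to favour a very high detection rate.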
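   [Figure: training the cascade]
       Training examples enter Stage 1, an AdaBoost learner (feature selection and classifier)
        with one feature, h1, set for a 100% detection rate and a 50% false positive rate
       Examples that pass go on to Stage 2, an AdaBoost learner with features h1, h2, ..., h10
       Examples classified as false at a stage are rejected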

   A 38-layer cascaded classifier was trained to
    detect frontal upright faces
       Training set:
          Faces: 4916 hand-labeled faces at a resolution of 24x24.
          Non-faces: 9544 images that contain no faces
                      (350 million sub-windows within these non-face images)
       Features
          The first five layers of the detector use 1, 10, 25, 25 and 50 features
          Total number of features in all layers: 6061
   Each classifier in the cascade was trained with
     Faces: 4916 + their vertical mirror images = 9832 images
     Non-face sub-windows: 10,000 (size 24x24)

   Speed of the final Detector
   Image Processing
   Scanning the Detector
   Integration of Multiple Detections
   Experiments on a Real-World Test Set
            Speed of the final Detector

   The speed is directly related to the number of
    features evaluated per scanned sub-window.
   On the MIT+CMU test set
        an average of 10 features out of a total of 6061 are
         evaluated per sub-window.
   On a 700 MHz Pentium III, a 384 x 288 pixel
    image is processed in about 0.067 seconds (using a starting scale of
    1.25 and a step size of 1.5)
                              Image Processing

      Minimize the effect of different lighting conditions
        Sub-windows are variance normalized
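
A minimal sketch of variance normalization for one sub-window follows. Subtracting the mean and dividing by the standard deviation is an assumption about what "normalized" means here, and the function name is hypothetical; a constant window is left at zero rather than divided by zero.

```python
import math


def variance_normalize(window):
    """Shift a window to zero mean and scale it to unit variance."""
    pixels = [p for row in window for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    # Variance as the mean of squares minus the squared mean.
    variance = sum(p * p for p in pixels) / n - mean * mean
    std = math.sqrt(variance) if variance > 0 else 1.0
    return [[(p - mean) / std for p in row] for row in window]


print(variance_normalize([[1, 3], [1, 3]]))
```

Both the sum of pixels and the sum of squared pixels are rectangle sums, so in a full detector they could be read from integral images instead of being recomputed per window.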

              Scanning the Detector

   The final detector is scanned across the image at
    multiple scales and locations
       Scaling is achieved by scaling the
        detector itself rather than the image
       Good results are obtained using a set of scales a
        factor of 1.25 apart
   Locations are obtained by shifting the window
    some number of pixels Δ
       If the current scale is s, the window is shifted by [sΔ],
        where [] is the rounding operation
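
The scan over scales and locations can be sketched as a generator of candidate windows. Several details are assumptions of this sketch: the parameter names, rounding half up, clamping the shift to at least one pixel, and stopping when the scaled detector no longer fits in the image.

```python
import math


def round_half_up(value):
    """The rounding operation [] (assumed here to round half up)."""
    return int(math.floor(value + 0.5))


def scan_windows(image_width, image_height, base_size=24,
                 start_scale=1.0, scale_factor=1.25, delta=1.0):
    """Yield (left, top, size) for every scanned sub-window."""
    scale = start_scale
    while True:
        size = round_half_up(base_size * scale)   # the detector itself is scaled
        if size > image_width or size > image_height:
            break
        shift = max(1, round_half_up(scale * delta))   # shift by [s * delta]
        for top in range(0, image_height - size + 1, shift):
            for left in range(0, image_width - size + 1, shift):
                yield (left, top, size)
        scale *= scale_factor


print(len(list(scan_windows(30, 30))))
```

Larger values of `delta` or `start_scale` reduce the number of windows and so speed up detection, at the cost of possibly missing faces that fall between scanned positions.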
     Integration of Multiple Detections

   Multiple detections will usually occur around
    each face and around some types of false positives.
   A post-processing step is applied to the detected sub-windows in
    order to combine overlapping detections into a
    single detection
       Two detections are in the same subset if their
        bounding regions overlap
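
The grouping step can be sketched with a small union-find over the detected boxes. Representing a box as (left, top, right, bottom), treating touching edges as non-overlapping, and reporting each group as the average of its members' corners are assumptions of this sketch; the function names are hypothetical.

```python
def overlaps(a, b):
    """True if two (left, top, right, bottom) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_detections(boxes):
    """Partition boxes into overlap-connected subsets; return one box per subset."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # One detection per subset: the average of the corners of its members.
    return [tuple(sum(c) / len(group) for c in zip(*group))
            for group in groups.values()]


print(merge_detections([(0, 0, 10, 10), (2, 2, 12, 12), (50, 50, 60, 60)]))
```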
Experiments on a Real-World Test Set

   The MIT+CMU frontal face test set consists
    of 130 images with 507 labeled frontal faces

   [Table: detection rates of our detector for various numbers of false positives
    on the MIT+CMU test set containing 130 images and 507 faces]

   [Figure: ROC curve for the face detector on the MIT+CMU test set (75,081,800 sub-windows);
    axes: false positives vs. correct detection rate]
       The detector was run using a step size of 1.0 and a starting scale of 1.0
    A simple voting scheme to further improve results

    Run three detectors
       The 38-layer one described above plus two similarly
        trained detectors
       Output the majority vote of the three detectors

    The improvement would be greater if the detectors were more independent.
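
The remark about independence can be made concrete with a simplified model. The sketch below votes on a single yes/no decision per window and computes the error of a three-way majority under the idealized assumption that the detectors err independently with the same probability; both the per-window voting and the independence model are assumptions, not the paper's procedure.

```python
def majority_vote(decisions):
    """True if more than half of the detectors report a detection."""
    return 2 * sum(1 for d in decisions if d) > len(decisions)


def majority_error_rate(p):
    """Error of a 3-detector majority if each errs independently with probability p."""
    # The majority is wrong when all three err or exactly two of the three err.
    return p ** 3 + 3 * p ** 2 * (1 - p)


print(majority_vote([True, False, True]))
print(majority_error_rate(0.1))
```

With fully independent errors a 10% individual error rate would fall to under 3%. Similarly trained detectors tend to make the same mistakes, so the actual gain is smaller.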
   [Figure: output of our face detector on images from the MIT+CMU test set]
   An object detection approach that minimizes
    computation time while achieving high detection accuracy
        The detector is approximately 15 times faster than any previous approach

   This paper brings together new algorithms,
    representations and insights which are quite generic
   The data set includes faces under a very wide
    range of conditions including illumination, scale,
    pose, and camera variation
