


									  Rapid Object Detection
using a Boosted Cascade
      of Simple Features

          Paul Viola and Michael Jones
     In CVPR01, pages I:511–518, 2001

           Presented by: Chang Jia
 Course: Advanced Computer Vision
    Instructor: Dr. Mircea Nicolescu

The Problem
   Main Goal – Develop an algorithm to learn a fast and accurate
    method for visual object (face) detection.
   Problem Definition

       Minimizes computation time
       Achieves high detection rates
       Minimizes hardware requirements
   Practical Application
       User interfaces
       Teleconferencing
       Security
       Image database
Major Contributions
   New Image Representation – “Integral Image”
       Very fast feature evaluation
       Integral image derived by a few operations per pixel
       Rectangular features can be computed in constant time

   Constructing a Classifier by Selecting a Small Number of
    Features using “AdaBoost”
       Fast classification
       Select critical features
       Effective learning algorithm

   Combining Complex Classifiers in “Cascade” Structure
       Dramatically increases the speed of the detector
       Focuses attention on promising regions of the image
Definition of Simple Features for
Object Detection

                                     Three rectangular feature types:
                                          • two-rectangle feature type
                                          • three-rectangle feature type
                                          • four-rectangle feature type

Using a 24x24 pixel base detection window, and taking all possible
combinations of horizontal and vertical location and scale of these feature
types, the full set contains over 180,000 features.
Rectangular features are used instead of more expressive steerable
filters because of their extreme computational efficiency.
    Integral Image

                            Def: The integral image at location (x,y), written
                            ii(x,y), is the sum of the pixel values above and to
                            the left of (x,y), inclusive.
                            Using the following two recurrences, where i(x,y) is
                            the pixel value of the original image at the given
                            location and s(x,y) is the cumulative column sum,
                            with s(x,-1) = 0 and ii(-1,y) = 0, we can calculate
                            the integral image representation of the image in a
                            single pass.
                                    s(x,y) = s(x,y-1) + i(x,y)
                                    ii(x,y) = ii(x-1,y) + s(x,y)
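
The two recurrences can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name integral_image and the list-of-rows image layout (img[y][x]) are assumptions made for the example.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img over rows <= y and columns <= x."""
    height, width = len(img), len(img[0])
    ii = [[0] * width for _ in range(height)]
    for x in range(width):
        s = 0  # cumulative column sum s(x, y), with s(x, -1) = 0
        for y in range(height):
            s += img[y][x]                       # s(x,y) = s(x,y-1) + i(x,y)
            left = ii[y][x - 1] if x > 0 else 0  # ii(-1,y) = 0
            ii[y][x] = left + s                  # ii(x,y) = ii(x-1,y) + s(x,y)
    return ii


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(image)
# table == [[1, 3, 6], [5, 12, 21], [12, 27, 45]]
```

Each pixel is visited once and costs two additions, which is the "few operations per pixel" property claimed for the representation.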

Rapid Evaluation of Rectangular Features

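                               Using the integral image
                               representation we can
                               compute the value of any
                               rectangular sum in constant
                               time, with four array references.
                               For example, the sum of the pixels
                               inside rectangle D, whose corners are
                               at locations 1 (top-left), 2 (top-right),
                               3 (bottom-left) and 4 (bottom-right),
                               can be computed as:
                                   ii(4) + ii(1) - ii(2) - ii(3)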

As a result, two-, three-, and four-rectangle features can be
computed with 6, 8 and 9 array references respectively.
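
A minimal sketch of the four-corner lookup, under some assumptions of this example: the integral image is padded with an extra zero row and column to avoid edge cases, rectangles are given as (x, y, width, height), and the sign convention of the two-rectangle feature (right half minus left half) is chosen arbitrarily. All function names are hypothetical.

```python
def padded_integral_image(img):
    """ii[y][x] = sum of img rows < y and columns < x (extra zero row and column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y): four lookups."""
    # ii(4) + ii(1) - ii(2) - ii(3): bottom-right, top-left, top-right, bottom-left
    return ii[y + h][x + w] + ii[y][x] - ii[y][x + w] - ii[y + h][x]


def two_rect_feature(ii, x, y, w, h):
    """Two side-by-side w-by-h rectangles: sum of the right one minus the left one."""
    return rect_sum(ii, x + w, y, w, h) - rect_sum(ii, x, y, w, h)


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
ii = padded_integral_image(image)
center = rect_sum(ii, 1, 1, 2, 2)           # 6 + 7 + 10 + 11 = 34
feature = two_rect_feature(ii, 0, 0, 2, 4)  # 76 - 60 = 16
```

This naive version spends eight lookups on a two-rectangle feature. The count of 6 comes from reusing the two corners that the adjacent rectangles share.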
Classification Function

   Important Definitions
       False Positive
           Reports a detection where no object is actually present
       False Negative
           Reports no detection where an object is actually present

   Classification Function Goal
       Select the single feature which best separates the positive
        and negative examples
Learning a Classification Function

   Given a feature set and a labeled training set of images, we can
    apply any number of machine learning techniques.
   Problem: Too many features.
   Hypothesis: A combination of only a small number of these
    features can yield an effective classifier. (learned by
   Challenge: Find these discriminative features.
       Use an aggressive approach that discards the vast
        majority of features
   Train Classifier – “AdaBoost”

   AdaBoost (Adaptive Boosting)
       Iterative learning algorithm
       Construct a “strong” classifier using
           A training set
           “Weak” learning algorithm.
       A “weak” classifier is selected at each iteration.
       Later classifiers are tuned in favor of the examples
        misclassified by previous classifiers.
AdaBoost Pros and Cons

   Pros
       Very simple to implement
       Fairly good generalization
       The prior error need not be known ahead of time
   Cons
       Suboptimal solution
       Can overfit in the presence of noise
AdaBoost for Aggressive Feature Selection
  Given example images $(x_1,y_1), \ldots, (x_n,y_n)$ where $y_i = 0, 1$ for negative and positive
    examples respectively.
  Initialize weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for $y_i = 0, 1$ respectively, where $m$ and $l$ are the
    number of negatives and positives respectively.
 For $t = 1, \ldots, T$:
         1) Normalize the weights, $w_{t,i} \leftarrow w_{t,i} / \sum_{j=1}^{n} w_{t,j}$, so that $w_t$ is a probability distribution.
         2) For each feature $j$, train a weak classifier $h_j$ and evaluate its error with respect to $w_t$:
                $\epsilon_j = \sum_i w_{t,i} \, |h_j(x_i) - y_i|$
         3) Choose the classifier $h_t$ with the lowest error $\epsilon_t$.
         4) Update the weights, which emphasizes the misclassified examples:
                $w_{t+1,i} = w_{t,i} \, \beta_t^{1 - e_i}$
            where $e_i = 0$ if $x_i$ is classified correctly, $e_i = 1$ otherwise, and $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$.
  The final strong classifier is:
                $h(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t$, and $h(x) = 0$ otherwise,
            where the classifier weight is $\alpha_t = \log \frac{1}{\beta_t}$.
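
The boosting loop can be sketched as below. This is a simplified illustration rather than the authors' implementation: it assumes the feature values have already been computed (features[j][i] is feature j on example i), it searches thresholds exhaustively, and the helper names stump_predict, best_stump, adaboost and classify are invented for the example. The clamp on the error is an added guard, not part of the algorithm as stated.

```python
import math


def stump_predict(value, theta, polarity):
    """Weak classifier on one feature value: 1 (positive) or 0 (negative)."""
    return 1 if polarity * value < polarity * theta else 0


def best_stump(values, labels, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = (float("inf"), 0, 1)
    candidates = sorted(set(values)) + [max(values) + 1]
    for theta in candidates:
        for polarity in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if stump_predict(v, theta, polarity) != y)
            if err < best[0]:
                best = (err, theta, polarity)
    return best


def adaboost(features, labels, rounds):
    """Return a list of (alpha, feature index, threshold, polarity) tuples."""
    m, l = labels.count(0), labels.count(1)
    weights = [1 / (2 * l) if y == 1 else 1 / (2 * m) for y in labels]
    strong = []
    for _ in range(rounds):
        total = sum(weights)
        weights = [w / total for w in weights]             # step 1: normalize
        err, theta, polarity, j = min(                      # steps 2 and 3
            best_stump(f, labels, weights) + (j,) for j, f in enumerate(features))
        err = max(err, 1e-10)  # added guard: avoids log(1/0) for a perfect stump
        beta = err / (1 - err)
        # step 4: shrink the weights of correctly classified examples by beta
        weights = [w * (beta if stump_predict(v, theta, polarity) == y else 1.0)
                   for w, v, y in zip(weights, features[j], labels)]
        strong.append((math.log(1 / beta), j, theta, polarity))
    return strong


def classify(strong, x):
    """Strong classifier: weighted vote compared with half the total weight."""
    score = sum(alpha * stump_predict(x[j], theta, polarity)
                for alpha, j, theta, polarity in strong)
    return 1 if score >= 0.5 * sum(s[0] for s in strong) else 0


features = [
    [5, 1, 4, 2, 3, 6],  # feature 0: does not separate the classes
    [1, 2, 3, 7, 8, 9],  # feature 1: low for positives, high for negatives
]
labels = [1, 1, 1, 0, 0, 0]
strong = adaboost(features, labels, rounds=3)
predictions = [classify(strong, [f[i] for f in features]) for i in range(len(labels))]
```

On this toy data every round selects feature 1, so the strong classifier reproduces the training labels. With more than 180,000 candidate features, an exhaustive search like this would be far too slow without further optimization.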
First Two Features Selected by
AdaBoost for Face Detection
   The Detection Process is Based on Features Rather than on
    Pixels Directly.
     Rich image representation
           Features can encode ad-hoc domain knowledge that is difficult to learn using a finite
            quantity of training data.
       The feature-based system operates much faster than a pixel-based system.

                                                 The first and second features
                                                 selected by AdaBoost
                                                 The two features are shown in
                                                 the top row and then overlaid
                                                 on a typical training face in the
                                                 bottom row.
    The Attentional Cascade
   Main Idea
       Start with simple classifiers with
           Low false negative rates
           High false positive rates
       A positive result from the first
        classifier triggers the evaluation
        of a second (more complex) classifier
       A negative outcome at any point leads to the immediate
        rejection of the sub-window
       A series of such simple classifiers can achieve good detection
        performance while eliminating the need for further processing of
        negative sub-windows.
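
The early-rejection control flow can be sketched as follows. The stages here are toy scoring functions on a list of numbers, chosen only to make the example runnable. In the real detector each stage would be a boosted classifier over rectangle features, and the name cascade_classify is hypothetical.

```python
def cascade_classify(stages, window):
    """Run a sub-window through the cascade.

    stages is a list of (score_function, threshold) pairs, cheapest first.
    Returns (accepted, number_of_stages_evaluated).
    """
    for n, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, n  # negative outcome: reject at once, skip later stages
    return True, len(stages)


# Toy stages (assumptions for illustration only).
stages = [
    (lambda w: sum(w), 10),          # stage 1: cheap test
    (lambda w: max(w) - min(w), 3),  # stage 2: stricter test
]

rejected_early = cascade_classify(stages, [1, 1, 1, 1])  # (False, 1)
rejected_late = cascade_classify(stages, [5, 5, 5, 5])   # (False, 2)
accepted = cascade_classify(stages, [2, 8, 3, 7])        # (True, 2)
```

Because most sub-windows in an image are negatives, most of them exit after the first stage or two, which is where the speedup comes from.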
The Attentional Cascade

   To build an optimal attentional-cascade classifier, we
    need to choose these parameters:
       the number of classifier stages
       the number of features of each stage
       the threshold of each stage
   Design Tradeoffs
       More features achieve
         Higher detection rates

         Lower false positive rates

       More features require
         More time to compute
Training an Attentional Cascade

   User Constraints
     Maximum acceptable false positive rate per layer

     Minimum acceptable detection rate per layer

     Target
         Overall false positive rate
         Overall detection rate
   User Input
     A set of positive and negative examples

   Stages are Added until the Overall Targets are Met
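
The per-layer constraints and the overall targets are linked by multiplication: the overall false positive rate of a cascade is the product of the per-layer false positive rates, and likewise for the detection rate. The sketch below assumes, for simplicity, that every layer has the same rates, which a trained cascade need not have. The function names and the example numbers are illustrative only.

```python
def overall_rate(per_stage_rate, num_stages):
    """Overall rate of a cascade whose stages all share the same per-stage rate."""
    return per_stage_rate ** num_stages


def stages_needed(per_stage_fp, target_fp):
    """Smallest number of stages whose combined false positive rate meets the target."""
    n, rate = 0, 1.0
    while rate > target_fp:
        rate *= per_stage_fp
        n += 1
    return n


n = stages_needed(0.5, 0.001)       # 10 stages, since 0.5**10 is about 0.00098
detection = overall_rate(0.99, 10)  # about 0.904
false_pos = overall_rate(0.3, 10)   # about 5.9e-06
```

The asymmetry is the point: a layer can let through a large fraction of negatives as long as it keeps almost all positives, because the false positive rate shrinks geometrically with depth while the detection rate decays only slowly.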
Experiments
   Cascade Length
       A 38 layer cascaded classifier was trained to detect frontal upright faces
       Training set:
           Face: 4916 hand labeled faces with resolution 24x24.
           Non-face: 9544 images containing no faces (about 350 million sub-windows)
       Training task:
           Detect frontal upright faces
   Cascade Characteristic (# of Features)
       The first five layers of the detector: 1, 10, 25, 25 and 50 features
       Total # of features in all layers: 6061
   Each classifier in the cascade was trained with
       Face: 4916 faces + their vertical mirror images = 9832 images
       Non-face sub-windows: 10,000 (size=24x24)
Experiments (Cont.)

Dataset for Training:

    4916     positive  training
     example were hand labeled
     aligned, normalized, and
     scaled to a base resolution
     of 24x24 pixels
    10,000 negative examples
     were selected by randomly
     picking sub-windows from
     9544 images which did not
     contain faces (manually
Testing of the final face detector was performed using the
MIT+CMU frontal face test set, which consists of:
• 130 images
• 507 labeled frontal faces
Results in the table compare the performance of the detector to
the best face detectors known.
Results (Cont.)

• The speed of the detector is directly related to the number of
features evaluated per scanned sub-window.
• On the MIT+CMU test set the average number of features
evaluated per sub-window is 10 (out of 6061).
• Processing a 384 by 288 pixel image on a
conventional computer (a 700 MHz Pentium III processor)
takes about 0.067 seconds.
• Processing time should scale linearly with image size.
Results (Cont.)

  ROC curve for the proposed face detector on the MIT+CMU test set.
  The detector was run using a step size of 1.0 and starting scale of 1.0
  (75,081,800 sub-windows scanned).
  Results (Cont.)

Output of our face detector on a number of test images from the MIT+CMU test set.
   Main Contributions:
       “Integral Image” for fast rectangle feature evaluation.
       Each weak classifier depends on a single feature.
       Combining complex classifiers in a “cascade”.
   Issues:
       AdaBoost minimizes a quantity related to classification error, but does
        not directly minimize the number of false negatives.
       The selected features are not optimal for the task of
        rejecting negative examples.

             Thank you.
