Rapid Object Detection using a Boosted Cascade of Simple Features
Paul Viola and Michael Jones
In CVPR'01, pages I:511–518, 2001

Presented by: Chang Jia
For: Advanced Computer Vision
Instructor: Dr. Mircea Nicolescu
10/26/2006

The Problem

Main Goal
• Develop an algorithm that learns a fast and accurate method for visual object (face) detection.
The method should:
• minimize computation time
• achieve high detection rates
• minimize hardware requirements
Practical Applications
• User interfaces
• Teleconferencing
• Security
• Image databases

Major Contributions

New image representation – the "Integral Image"
• Very fast feature evaluation
• The integral image is derived with a few operations per pixel
• Rectangular features can then be computed in constant time
Constructing a classifier by selecting a small number of features using AdaBoost
• Fast classification
• Selects critical features
• Effective learning algorithm
Combining complex classifiers in a "cascade" structure
• Dramatically increases the speed of the detector
• Focuses attention on promising sub-windows

Definition of Simple Features for Object Detection

Three rectangular feature types:
• two-rectangle features (horizontal/vertical)
• three-rectangle features
• four-rectangle features
Using a 24x24 pixel base detection window, with all possible combinations of horizontal and vertical location and scale of these feature types, the full set contains over 180,000 features. The motivation for using rectangular features, as opposed to more expressive steerable filters, is their extreme computational efficiency.

Integral Image

Definition: The integral image at location (x,y) is the sum of the pixel values above and to the left of (x,y), inclusive. Using the following two recurrences, where i(x,y) is the pixel value of the original image at the given location and s(x,y) is the cumulative column sum, we can calculate the integral image in a single pass over the image.
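As an illustration, the single-pass construction and the constant-time rectangle sum it enables might be sketched as follows in NumPy (function names and indexing conventions are my own, not the paper's):

```python
import numpy as np

def integral_image(img):
    """Build the integral image in one pass using the two recurrences:
    a running sum down each column, then a running sum across each row."""
    h, w = img.shape
    ii = np.zeros((h, w), dtype=np.int64)
    s = np.zeros((h, w), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            s[y, x] = (s[y - 1, x] if y > 0 else 0) + img[y, x]   # s(x,y) = s(x,y-1) + i(x,y)
            ii[y, x] = (ii[y, x - 1] if x > 0 else 0) + s[y, x]   # ii(x,y) = ii(x-1,y) + s(x,y)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle using at most 4 array references
    (the ii(4) + ii(1) - ii(2) - ii(3) pattern)."""
    b, r = top + height - 1, left + width - 1
    total = ii[b, r]
    if top > 0:
        total -= ii[top - 1, r]
    if left > 0:
        total -= ii[b, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total
```

In practice `np.cumsum(np.cumsum(img, axis=0), axis=1)` produces the same table; the explicit loop just mirrors the two recurrences.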
s(x,y) = s(x,y-1) + i(x,y)
ii(x,y) = ii(x-1,y) + s(x,y)
with s(x,-1) = 0 and ii(-1,y) = 0.

Rapid Evaluation of Rectangular Features

Using the integral image representation, the value of any rectangular sum can be computed in constant time. For example, the sum inside rectangle D is
ii(4) + ii(1) – ii(2) – ii(3)
where 1–4 are the integral-image values at the four corners bounding D. As a result, two-, three-, and four-rectangle features can be computed with 6, 8, and 9 array references respectively.

Classification Function

Important definitions:
• False positive: a detection is reported where none actually exists.
• False negative: a detection is not reported although one is really present.
Goal: select the single feature that best separates the examples.

Learning a Classification Function

Given a feature set and a labeled training set of images, we can apply a number of machine learning techniques.
• Problem: too many features.
• Hypothesis: a combination of only a small number of these features can yield an effective classifier (confirmed by experiment).
• Challenge: find these discriminative features.
• Approach: aggressively discard the vast majority of features while training the classifier with AdaBoost.

AdaBoost

AdaBoost (Adaptive Boosting)
• Iterative learning algorithm
• Constructs a "strong" classifier from a training set and a "weak" learning algorithm
• A "weak" classifier is selected at each iteration
• Later classifiers are tuned in favor of the regions misclassified by previous classifiers

AdaBoost Pros and Cons

Pros
• Very simple to implement
• Fairly good generalization
• The prior error need not be known ahead of time
Cons
• Suboptimal solution
• Can overfit in the presence of noise

AdaBoost for Aggressive Feature Selection

Given example images (x1,y1), …, (xn,yn), where yi = 0, 1 for negative and positive examples respectively.
Initialize weights w1,i = 1/(2m) for negative and 1/(2l) for positive example i, where m and l are the number of negatives and positives respectively.
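A minimal sketch of this boosting procedure (the initialization above, plus the selection loop and final strong classifier that follow), assuming feature values have been precomputed into a matrix F; the decision-stump search and all names here are illustrative, not the paper's code:

```python
import numpy as np

def best_stump(fvals, y, w):
    """Pick the threshold/direction minimizing weighted error for one
    feature: predict 1 at-or-below the threshold ("below") or strictly
    above it ("above")."""
    order = np.argsort(fvals)
    fv, ys, ws = fvals[order], y[order], w[order]
    tot_pos, tot_neg = ws[ys == 1].sum(), ws[ys == 0].sum()
    pos_below = np.cumsum(ws * ys)        # positive weight at or below each value
    neg_below = np.cumsum(ws * (1 - ys))  # negative weight at or below each value
    err_below = neg_below + (tot_pos - pos_below)
    err_above = pos_below + (tot_neg - neg_below)
    i = int(np.argmin(np.minimum(err_below, err_above)))
    if err_below[i] <= err_above[i]:
        return err_below[i], fv[i], "below"
    return err_above[i], fv[i], "above"

def stump_predict(fvals, theta, direction):
    hit = fvals <= theta if direction == "below" else fvals > theta
    return hit.astype(int)

def train_adaboost(F, y, T):
    """F: (n_examples, n_features) precomputed feature values, y in {0,1}."""
    m, l = np.sum(y == 0), np.sum(y == 1)
    w = np.where(y == 1, 1.0 / (2 * l), 1.0 / (2 * m))   # initial weights
    chosen = []
    for _ in range(T):
        w = w / w.sum()                                  # 1) normalize weights
        stumps = [best_stump(F[:, j], y, w) for j in range(F.shape[1])]
        j = int(np.argmin([s[0] for s in stumps]))       # 3) lowest-error feature
        eps, theta, direction = stumps[j]
        beta = eps / (1.0 - eps) if eps < 1.0 else 1.0
        e = (stump_predict(F[:, j], theta, direction) != y).astype(int)
        w = w * beta ** (1 - e)                          # 4) w_{t+1,i} = w_{t,i} * beta^(1-e_i)
        chosen.append((j, theta, direction, np.log(1.0 / max(beta, 1e-12))))
    return chosen

def strong_classify(chosen, fvec):
    """h(x) = 1 iff sum_t alpha_t * h_t(x) >= 0.5 * sum_t alpha_t."""
    score = sum(a * stump_predict(np.array([fvec[j]]), th, d)[0]
                for j, th, d, a in chosen)
    return int(score >= 0.5 * sum(a for *_, a in chosen))
```

Each round therefore commits to one feature, so T boosting rounds yield a classifier that evaluates only T rectangle features out of the 180,000+ candidates.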
Weak classifier selection, for t = 1 … T:
1) Normalize the weights so that wt is a probability distribution.
2) For each feature j, train a classifier hj and evaluate its error εj with respect to wt.
3) Choose the classifier ht with the lowest error εt.
4) Update the weights:
wt+1,i = wt,i · βt^(1−ei)
where ei = 0 if xi is classified correctly, 1 otherwise, and βt = εt / (1 − εt).

The final strong classifier is:
h(x) = 1 if Σt=1..T αt·ht(x) ≥ (1/2)·Σt=1..T αt, and 0 otherwise,
where αt = log(1/βt).

First Two Features Selected by AdaBoost for Face Detection

The detection process is based on features rather than on pixels directly:
• a rich image representation – the ad-hoc domain knowledge would be difficult to learn from a finite quantity of training data;
• a feature-based system operates much faster.
The first and second features selected by AdaBoost are shown in the top row and then overlaid on a typical training face in the bottom row.

The Attentional Cascade

Main idea:
• Start with simple classifiers that have low false negative rates but high false positive rates.
• A positive result from the first classifier triggers the evaluation of a second, more complex classifier, and so on.
• A negative outcome at any point leads to the immediate rejection of the sub-window.
• A series of such simple classifiers can achieve good detection performance while eliminating the need for further processing of most negative sub-windows.

To build an optimal attentional cascade, we need to choose:
• the number of classifier stages
• the number of features in each stage
• the threshold of each stage
Design tradeoff: more features achieve
• higher detection rates
• lower false positive rates
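The early-rejection scheme just described can be sketched as follows; the stage representation (a score function plus threshold per stage) is an assumption for illustration:

```python
def cascade_classify(stages, window):
    """Evaluate an attentional cascade on one sub-window.

    stages: list of (score_fn, threshold) pairs, ordered cheapest first.
    A window must pass every stage to be accepted; a negative outcome
    at any stage rejects it immediately, so later (more expensive)
    stages never run on most negative sub-windows."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False   # immediate rejection
    return True            # survived all stages: report a detection
```

This early exit is why the average cost per window can stay near a handful of feature evaluations even though the full cascade contains thousands of features.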
More features require more time to compute.

Training an Attentional Cascade

User constraints:
• maximum acceptable false positive rate per layer
• minimum acceptable detection rate per layer
Targets:
• overall false positive rate
• overall detection rate
User input:
• a set of positive and negative examples
Stages are added until the overall target rates are met.

Experiments

Cascade Length
A 38-layer cascaded classifier was trained to detect frontal upright faces.
Training set:
• Faces: 4916 hand-labeled faces with resolution 24x24.
• Non-faces: 9544 images containing no faces (350 million sub-windows).

Cascade Characteristics (Number of Features)
• The first five layers of the detector use 1, 10, 25, 25 and 50 features.
• Total number of features across all layers = 6061.
Each classifier in the cascade was trained on:
• Faces: 4916 + their vertical mirror images = 9832 images.
• Non-face sub-windows: 10,000 (size 24x24).

Experiments (Cont.)

Dataset for training:
• 4916 positive training examples were hand labeled, aligned, normalized, and scaled to a base resolution of 24x24 pixels.
• 10,000 negative examples were selected by randomly picking sub-windows from 9544 images which did not contain faces (manually inspected).

Results

Testing of the final face detector was performed on the MIT+CMU frontal face test set, which consists of:
• 130 images
• 507 labeled frontal faces
The results table compares the performance of this detector with the best known face detectors.

Results (Cont.)
• The speed of the detector is proportional to the number of features evaluated per scanned sub-window.
• On the MIT+CMU test set, the average number of features evaluated per sub-window is 10 (out of 6061).
• Processing a 384 by 288 pixel image on a conventional computer (a 700 MHz Pentium III) takes about 0.067 seconds.
• Processing time scales roughly linearly with image size.

Results (Cont.)
ROC curve for the proposed face detector on the MIT+CMU test set.
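The sub-window scanning behind these timing numbers can be sketched as a hypothetical enumeration (the scale factor and step handling here are illustrative assumptions, not the paper's exact scheme):

```python
def scan_windows(width, height, base=24, scale_factor=1.25, step=1.0):
    """Enumerate (x, y, size) detection sub-windows over an image.

    The base-size window is scaled up by scale_factor until it no longer
    fits; the pixel shift grows with the scale, so the number of windows
    (and hence runtime) grows roughly linearly with image area."""
    windows = []
    scale = 1.0
    while base * scale <= min(width, height):
        size = int(round(base * scale))
        shift = max(1, int(round(step * scale)))
        for y in range(0, height - size + 1, shift):
            for x in range(0, width - size + 1, shift):
                windows.append((x, y, size))
        scale *= scale_factor
    return windows
```

Running the cascade over every such window, with early rejection at each stage, is what keeps the whole scan under a tenth of a second per frame.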
The detector was run using a step size of 1.0 and a starting scale of 1.0 (75,081,800 sub-windows scanned).

Results (Cont.)
Output of the face detector on a number of test images from the MIT+CMU test set.

Discussion

Main contributions:
• the "Integral Image" for fast rectangle-feature evaluation;
• weak classifiers that each depend on a single feature;
• combining complex classifiers in a "cascade".
Issues:
• AdaBoost minimizes a quantity related to the classification error, but does not directly minimize the number of false negatives.
• The selected features are therefore not optimal for the task of rejecting negative examples.

Questions?
Thank you.