Shared Features and Joint Boosting
"Sharing visual features for multiclass and multiview object detection"
A. Torralba, K. P. Murphy and W. T. Freeman, PAMI, vol. 29, no. 5, pp. 854-869, May 2007.
Presented by Yuandong Tian

Outline
- Motivation to choose this paper
- Motivation of this paper
- Basic ideas in boosting
- Joint Boost
- Features used in the paper
- My results on face recognition

Motivation to choose this paper
- Axiom: computer vision is hard.
- Assumption (smart-stationarity): equally smart people are equally distributed over time.
- Conjecture: if computer vision has not been solved in 30 years, it will never be solved.
- Wrong! Because we are standing on the shoulders of giants.
- Where are the giants? More computing resources? Lots of data? New algorithms? What I believe in: machine learning.
- Cruel reality: why does ML seem to help so little in CV (at least for now)? My answer: CV and ML are weakly coupled.

A typical question in CV
- Q: Why do we use feature A instead of feature B?
- A1: Feature A gives better performance.
- A2: Feature A has some fancy properties.
- A3: The following step requires the feature to have a certain property that only A has. (A strongly coupled answer.)

Typical CV pipeline
- Preprocessing steps ("computer vision") -> feature/similarity -> ML black box.
- The preprocessing side has domain-specific structure; the ML box is designed for generic structure.

Contribution of this paper
- Tunes the ML algorithm in a CV context.
- A good attempt to break open the black box and integrate the two.

Motivation of this paper
- Object recognition problem: many object categories, few images per category.
- Solution: feature sharing. Find common features that distinguish a subset of classes from the rest.

Typical behavior of feature sharing
- Template-like features: 100% accuracy on a single object, but too specific to share.
- Wavelet-like features: weaker discriminative power, but shared by many classes.
- (Concept and result of feature sharing are illustrated in the original slides.)

Why feature sharing?
- ML view: regularization that avoids overfitting; sharing effectively gives more positive samples per feature and reuses the data.
- CV view: exploits the intrinsic structure of object categories; uses a domain-specific prior to bias the learning algorithm.

Basic idea in boosting
- Setting: binary classification with samples v_i and labels z_i in {+1, -1}.
- Goal: find a classifier H that maps positive samples to positive values.
- Optimization: minimize the exponential loss J = sum_i exp(-z_i H(v_i)) with respect to H.

Basic idea in boosting (2)
- Boosting assumes H is additive: H(v) = sum_m h_m(v).
- Each h_m is a "weak" learner: barely, but uniformly, better than random.
- Example: a single-feature classifier (decision stump) that makes its decision on one dimension of v only.
- Key point: the sum of weak classifiers gives a strong classifier.

Basic idea in boosting (3)
- How to minimize? Greedy approach: fix the current H and add one weak learner h in each iteration.
- Sample weighting: after each iteration, wrongly classified (difficult) samples get higher weights w_i = exp(-z_i H(v_i)).
- Technically, the greedy step takes a second-order Taylor expansion of the loss in each iteration, so the new weak learner is fit to the labels under the current weights and solved by (weighted) least squares.
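To make the boosting update above concrete, here is a minimal sketch of the GentleBoost-style loop with single-feature regression stumps fit by weighted least squares; the stump form h(v) = a*[v_f > theta] + b, the threshold grid, the number of rounds, and all function names are my own choices, not the paper's code.

```python
import numpy as np

def fit_stump(X, z, w):
    """Fit a regression stump h(v) = a*[v_f > theta] + b by weighted least squares.

    X: (N, D) feature matrix, z: (N,) labels in {-1, +1}, w: (N,) sample weights.
    The optimum has a closed form: b is the weighted mean of z below the
    threshold, and a + b is the weighted mean of z above it.
    """
    best = None
    for f in range(X.shape[1]):
        for theta in np.percentile(X[:, f], np.linspace(5, 95, 10)):
            above = X[:, f] > theta
            w_above, w_below = w[above].sum(), w[~above].sum()
            if w_above == 0 or w_below == 0:
                continue
            b = np.dot(w[~above], z[~above]) / w_below
            a = np.dot(w[above], z[above]) / w_above - b
            err = np.dot(w, (z - (a * above + b)) ** 2)
            if best is None or err < best[0]:
                best = (err, f, theta, a, b)
    return best[1:]

def gentle_boost(X, z, n_rounds=50):
    """Greedy additive model H(v) = sum_m h_m(v) for binary labels z."""
    H = np.zeros(X.shape[0])
    stumps = []
    for _ in range(n_rounds):
        w = np.exp(-z * H)                    # difficult samples get large weights
        f, theta, a, b = fit_stump(X, z, w)   # weighted least-squares weak learner
        H += a * (X[:, f] > theta) + b        # greedily add one weak learner
        stumps.append((f, theta, a, b))
    return stumps
```

For C classes, a one-vs-all version simply runs this loop once per class with labels z_i^c; Joint Boost, described next, instead couples the classes by making them share weak learners.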
Joint Boost: multiclass
- A similar loss can be minimized with a one-vs-all strategy: J = sum_c sum_i exp(-z_i^c H(v_i, c)).
- By itself this does not work very well, since the objective is separable in the class c: every class is trained independently.
- So put constraints across classes -> shared features!

Joint Boost (2)
- In each iteration, choose one common feature and a subset of classes that use this feature, so that the objective decreases the most.
- Sharing diagram: a table of boosting iterations versus classes, showing which numbered feature each class uses in each round; a shared feature appears in several classes at the same iteration.

Key insight
- Each class may have its own favorite feature, and the best common feature may be none of them; however, it simultaneously decreases the errors of many classes. (Illustrated in the original slides.)

Computational issue
- Choosing the best subset of classes is prohibitive: there are O(2^C) subsets.
- Greedy approach: choose the single class and feature that decrease the objective most, then iteratively add more classes until the objective would increase again; note that the best common feature may change as classes are added.
- This reduces the search from O(2^C) to O(C^2).
- With the greedy search, #features grows roughly as O(log #classes); ROC about 0.95 for 29 objects, averaged over 20 training sets.
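As a rough illustration of this greedy search, the sketch below grows the class subset one class at a time and re-picks the shared stump after every addition. It assumes the boosting loop already maintains one-vs-all labels Z and per-class weights W; the array shapes, the threshold grid, and the function names are mine, and a real implementation would cache per-class statistics so each evaluation stays cheap.

```python
import numpy as np

def shared_stump_cost(X, Z, W, subset, f, theta):
    """Weighted squared error of one shared regression stump.

    X: (N, D) features, Z: (C, N) one-vs-all labels in {-1, +1},
    W: (C, N) per-class boosting weights, subset: classes sharing the stump.
    Classes in `subset` share the same a, b in h(v, c) = a*[v_f > theta] + b;
    classes outside it only receive a class-specific constant k_c.
    """
    above = X[:, f] > theta
    cost = 0.0
    # Shared a, b: weighted least squares, pooling the samples of every class
    # in the subset (this pooling is exactly what "sharing the feature" means).
    for side in (above, ~above):
        w = np.concatenate([W[c][side] for c in subset])
        z = np.concatenate([Z[c][side] for c in subset])
        if w.sum() > 0:
            mean = np.dot(w, z) / w.sum()        # optimal shared constant on this side
            cost += np.dot(w, (z - mean) ** 2)
    # Classes outside the subset: best constant k_c only.
    for c in range(Z.shape[0]):
        if c not in subset:
            k = np.dot(W[c], Z[c]) / W[c].sum()
            cost += np.dot(W[c], (Z[c] - k) ** 2)
    return cost

def pick_shared_stump(X, Z, W, thresholds):
    """Greedy subset growth: O(C^2) subset evaluations instead of O(2^C)."""
    C, D = Z.shape[0], X.shape[1]
    subset, best, best_cost = set(), None, np.inf
    while len(subset) < C:
        improved = None
        for c in set(range(C)) - subset:          # try adding each remaining class
            trial = subset | {c}
            for f in range(D):                    # re-pick the feature and threshold
                for theta in thresholds:
                    cost = shared_stump_cost(X, Z, W, trial, f, theta)
                    if cost < best_cost:
                        best_cost, improved = cost, (trial, f, theta)
        if improved is None:                      # adding any class only raises the cost
            break
        subset, f_best, theta_best = improved
        best = (sorted(subset), f_best, theta_best)
    return best
```

The outer loop adds at most C classes and each addition tries the O(C) remaining ones, which is where the O(C^2) count of subset evaluations quoted above comes from.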
Features used in the paper
- Dictionary: 2000 randomly sampled patches, of sizes from 4x4 to 14x14, with no clustering.
- Each patch is associated with a spatial mask; a candidate feature is a (template, position) pair.
- The dictionary of 2000 candidate patches and position masks is randomly sampled from the training images.

Building feature vectors
- Compute the normalized correlation of the image with each patch to get a response map.
- Raise the response to some power, so that large values get even larger and dominate the response (approximating a max operation).
- Use the spatial mask to align the response to the object center (voting).
- Extract the response vector at the object center. (A rough sketch of this pipeline is given in the backup section at the end.)

Results
- Multiclass object recognition: LabelMe dataset, 21 objects, 50 samples per object, 500 rounds.
- Multiview car recognition: train on LabelMe, test on PASCAL; 12 views, 50 samples per view, 300 rounds.
- Other reported settings: 70 rounds with 20 training samples per class for the 21 objects; 12 views, 50 samples per class, 300 features.

Simple experiment: my results on face recognition
- Main point of the paper: shared features help when there are many categories with only a few samples each. Test it!
- Dataset: face recognition on the "Faces in the Wild" dataset, which contains many famous figures.

Experiment configuration
- Use a Gist-like feature, but keep only the Gabor responses and use a finer grid to gather the histograms.
- Faces are already aligned in the dataset.

Feature statistics
- 8 orientations, 2 scales, 8x8 grid: 8 x 2 x 8 x 8 = 1024 dimensions.

Training and testing
- Take the 50 identities with the most images; for each identity, randomly select 3 images for training and use the rest for testing.
- Baseline: nearest neighbor (50 classes, 3 samples per class); chance rate = 0.02.
- Nearest-neighbor accuracy under different feature settings (Chisqr = chi-square histogram distance):

  Orientation  Scale  Block  K  Blur  L1      L2      Chisqr
  8            2      8      3  no    0.1338  0.1033  0.13
  8            2      8      1  no    0.1868  0.1350  0.1681
  8            2      6      1  no    0.1651  0.1285  0.1544
  8            2      8      1  1.0   0.1822  0.1407  0.1754
  8            2      8      1  2.0   0.1677  0.1365  0.1616

- Joint Boost is about 80% better than nearest neighbor in this setting.

Result on more images
- 50 people, 7 images each; chance rate = 2%.
- Nearest neighbor: L1 = 0.2856 (vs. 0.1868 in the 50-identity / 3-image setting), L2 = 0.2022, Chisqr = 0.2596.
- Joint Boost roughly doubles the accuracy of nearest neighbor.
- More sharing helps: features become shared going from single to pairwise boosting, and pairwise to joint gains roughly another 7%.

Result on more identities
- 100 people, 3 images each; chance rate = 1%.
- Nearest neighbor: L1 = 0.1656 (vs. 0.1868 in the 50/3 setting), L2 = 0.1235, Chisqr = 0.1623.
- Joint Boost is still better than nearest neighbor, but the improvement (about 60%) is smaller than in the previous cases.
- Single (unshared) boosting performs the same as nearest neighbor.

Conclusion
- Joint Boosting indeed works, especially when the number of images per class is not too small (otherwise nearest neighbor is competitive).
- It gives better performance when there are many classes with only a few samples each, by introducing regularization that reduces overfitting.
- Disadvantage: training is slow, O(C^2) in the number of classes per round.

Thanks! Any questions?
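Backup: patch-feature computation (sketch)

Referenced from "Building feature vectors" above: a minimal sketch of the patch-response pipeline, assuming grayscale numpy images; the function names, the default exponent, and the use of skimage.feature.match_template and scipy.signal.fftconvolve for the normalized correlation and the spatial-mask voting are my choices, not the authors' code.

```python
import numpy as np
from scipy.signal import fftconvolve
from skimage.feature import match_template

def patch_feature(image, patch, mask, center, k=10):
    """One entry of the feature vector for `image`.

    image:  2-D grayscale array
    patch:  small template (4x4 .. 14x14) sampled from a training image
    mask:   spatial mask that votes the patch response toward the object center
    center: (row, col) of the object center in `image`
    k:      exponent; raising the response to a power lets the strongest
            match dominate, approximating a local max operation
    """
    # Normalized correlation of the image with the patch (values in [-1, 1]).
    response = match_template(image, patch, pad_input=True)
    # Emphasize strong matches.
    response = np.abs(response) ** k
    # "Vote" the response toward the object center via the spatial mask.
    voted = fftconvolve(response, mask, mode="same")
    # The feature is the voted response read off at the object center.
    return voted[center]

def feature_vector(image, dictionary, center, k=10):
    """Stack responses for every (patch, mask) pair in the dictionary."""
    return np.array([patch_feature(image, p, m, center, k) for p, m in dictionary])
```

Stacking these responses over all 2000 dictionary entries gives the candidate feature vector from which Joint Boost selects its shared features.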