A Brief Introduction to Adaboost
   Hongbo Deng
   6 Feb, 2007

Some of the slides are borrowed from Derek Hoiem & Jan Šochman.
                                                           1
Outline
   Background

   Adaboost Algorithm

   Theory/Interpretations



                             2
What’s So Good About Adaboost
   Can be used with many different classifiers

   Improves classification accuracy

   Commonly used in many areas

   Simple to implement

   Relatively resistant to overfitting in practice

                                                  3
A Brief History

   Bootstrapping (resampling for estimating a statistic)

   Bagging (resampling for classifier design)

   Boosting (Schapire 1989)

   Adaboost (Schapire 1995)

                                                 4
Bootstrap Estimation
 Repeatedly draw n samples from D
 For each set of samples, estimate a
  statistic
 The bootstrap estimate is the mean of the
  individual estimates
 Used to estimate a statistic (parameter)
  and its variance

                                              5
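As a concrete illustration of the bullets above, here is a minimal Python sketch of a bootstrap estimate; the dataset, the statistic (the median), and the number of replicates B are illustrative assumptions, not taken from the slides.

import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=100)   # the original sample D (n points)

B = 1000                                      # number of bootstrap replicates
estimates = np.array([
    np.median(rng.choice(data, size=len(data), replace=True))  # statistic on one resample of n points
    for _ in range(B)
])

bootstrap_estimate = estimates.mean()         # bootstrap estimate: mean of the individual estimates
bootstrap_variance = estimates.var(ddof=1)    # estimated variance of the statistic
print(bootstrap_estimate, bootstrap_variance)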
Bagging - Aggregate Bootstrapping

   For i = 1 .. M
     Draw n* < n samples from D with replacement
     Learn classifier Ci

 Final classifier is a vote of C1 .. CM
 Increases classifier stability / reduces variance

 [Figure: bootstrap resamples D1, D2, D3 drawn from the dataset D]

                                                      6
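A minimal Python sketch of the bagging loop above, using scikit-learn decision trees as the base classifier; the dataset, M, n*, and the tree depth are illustrative assumptions, not from the slides.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
rng = np.random.default_rng(0)

M, n_star = 25, 200                        # number of classifiers M and resample size n* < n
classifiers = []
for _ in range(M):
    idx = rng.choice(len(X), size=n_star, replace=True)   # draw n* samples with replacement
    clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X[idx], y[idx])
    classifiers.append(clf)

# Final classifier: majority vote of C1 .. CM
votes = np.stack([clf.predict(X) for clf in classifiers])  # shape (M, n)
y_hat = (votes.mean(axis=0) > 0.5).astype(int)
print("training accuracy:", np.mean(y_hat == y))

Because each tree sees a different resample, the vote averages out much of the variance of the individual trees.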
Boosting (Schapire 1989)
   Consider creating three component classifiers for a two-category problem
    through boosting.
   Randomly select n1 < n samples from D without replacement to obtain D1
        Train weak learner C1

   Select n2 < n samples from D with half of the samples misclassified by C1 to
    obtain D2
        Train weak learner C2

   Select all remaining samples from D that C1 and C2 disagree on
        Train weak learner C3
   Final classifier is vote of weak learners

   [Figure: subsets D1, D2, D3 drawn from the dataset D]

                                                                                   7
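A rough Python sketch of the three-classifier scheme described on this slide; the stump learner, subset sizes, and dataset are illustrative assumptions, and a real implementation would need more care with the sampling when C1 is already very accurate.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=1)
rng = np.random.default_rng(1)

# D1: n1 < n samples drawn without replacement; train weak learner C1
idx1 = rng.choice(len(X), size=200, replace=False)
C1 = DecisionTreeClassifier(max_depth=1).fit(X[idx1], y[idx1])

# D2: half the samples misclassified by C1, half classified correctly; train C2
wrong = np.where(C1.predict(X) != y)[0]
right = np.where(C1.predict(X) == y)[0]
k = min(len(wrong), len(right), 100)
idx2 = np.concatenate([rng.choice(wrong, size=k, replace=False),
                       rng.choice(right, size=k, replace=False)])
C2 = DecisionTreeClassifier(max_depth=1).fit(X[idx2], y[idx2])

# D3: the samples on which C1 and C2 disagree; train C3
idx3 = np.where(C1.predict(X) != C2.predict(X))[0]
if len(idx3) == 0:                 # degenerate case: C1 and C2 agree everywhere
    idx3 = np.arange(len(X))
C3 = DecisionTreeClassifier(max_depth=1).fit(X[idx3], y[idx3])

# Final classifier: majority vote of C1, C2, C3
votes = C1.predict(X) + C2.predict(X) + C3.predict(X)
y_hat = (votes >= 2).astype(int)
print("training accuracy:", np.mean(y_hat == y))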
Adaboost - Adaptive Boosting
   Instead of resampling, uses training set re-weighting
       Each training sample uses a weight to determine the probability
        of being selected for a training set.

   AdaBoost is an algorithm for constructing a “strong”
    classifier as a linear combination of “simple” “weak”
    classifiers


   Final classification based on weighted vote of weak
    classifiers


                                                                          8
Adaboost Terminology
   ht(x) … “weak” or basis classifier (Classifier =
    Learner = Hypothesis)
   H(x) = sign( Σt αt ht(x) )  … “strong” or final classifier

   Weak Classifier: < 50% error over any
    distribution
   Strong Classifier: thresholded linear combination
    of weak classifier outputs

                                                       9
Discrete Adaboost Algorithm
            Each training sample has a
           weight, which determines the
          probability of being selected for
         training the component classifier




                                              10
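The algorithm box on this slide did not survive extraction. The following is a minimal, self-contained Python sketch of standard Discrete AdaBoost with decision stumps, in the spirit of Freund and Schapire's formulation; the function names and the toy dataset are illustrative assumptions, not taken from the slides.

import numpy as np

def fit_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, polarity) stump with minimal weighted error."""
    best = None
    for j in range(X.shape[1]):
        for thresh in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(X[:, j] <= thresh, polarity, -polarity)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, thresh, polarity)
    return best

def stump_predict(stump, X):
    _, j, thresh, polarity = stump
    return np.where(X[:, j] <= thresh, polarity, -polarity)

def adaboost_train(X, y, T=20):
    """Discrete AdaBoost; labels y must take values in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)              # uniform initial weights over training samples
    stumps, alphas = [], []
    for _ in range(T):
        stump = fit_stump(X, y, w)
        eps = max(stump[0], 1e-10)       # weighted error of the chosen weak classifier
        if eps >= 0.5:                   # no weak classifier better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        pred = stump_predict(stump, X)
        w *= np.exp(-alpha * y * pred)   # re-weight: up-weight mistakes, down-weight correct
        w /= w.sum()                     # renormalize to a probability distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Strong classifier: sign of the alpha-weighted vote of the weak classifiers."""
    score = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return np.sign(score)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0.0, 1, -1)
    stumps, alphas = adaboost_train(X, y, T=10)
    print("training accuracy:", np.mean(adaboost_predict(stumps, alphas, X) == y))

The re-weighting line is the “adaptive” part: after each round the distribution over training samples shifts toward the examples the current weak classifier gets wrong.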
Find the Weak Classifier




                           11
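The figures on this slide were lost in extraction; as a hedged restatement of the standard step, at round t the weak classifier is chosen to minimize the weighted training error under the current distribution D_t:

\[
h_t = \arg\min_{h \in \mathcal{H}} \epsilon_t(h),
\qquad
\epsilon_t(h) = \sum_{i=1}^{N} D_t(i)\,\mathbb{1}\!\left[h(x_i) \neq y_i\right],
\]

and the round is only useful if \(\epsilon_t < 1/2\), i.e. the weak classifier does better than chance on the weighted data.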
The algorithm core




                     13
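The formulas on this slide were lost in extraction; in the standard formulation, the core of each round is the confidence assigned to the weak classifier and the final weighted vote:

\[
\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t},
\qquad
H(x) = \operatorname{sign}\!\Big(\sum_{t=1}^{T}\alpha_t\,h_t(x)\Big),
\]

so a weak classifier with smaller weighted error \(\epsilon_t\) receives a larger vote \(\alpha_t\).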
Reweighting



   Correctly classified example: y · h(x) = +1, so its weight is decreased

   Misclassified example: y · h(x) = -1, so its weight is increased




                               14
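A hedged restatement of the standard re-weighting rule that the lost figure illustrated:

\[
D_{t+1}(i) = \frac{D_t(i)\,\exp\!\big(-\alpha_t\,y_i\,h_t(x_i)\big)}{Z_t},
\]

where \(Z_t\) normalizes \(D_{t+1}\) to a distribution: correctly classified examples (\(y_i h_t(x_i) = +1\)) are scaled down by \(e^{-\alpha_t}\), while misclassified ones (\(y_i h_t(x_i) = -1\)) are scaled up by \(e^{+\alpha_t}\).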
Reweighting




In this way, AdaBoost “focuses on” the
informative or “difficult” examples.
                                         15
Algorithm recapitulation   t=1




                                 17
Pros and cons of AdaBoost
Advantages
   Very simple to implement
   Does feature selection, resulting in a relatively
    simple classifier
   Fairly good generalization
Disadvantages
   Suboptimal solution
   Sensitive to noisy data and outliers


                                                     25
References
   Duda, Hart, et al. – Pattern Classification

   Freund – “An Adaptive Version of the Boost by Majority Algorithm”

   Freund – “Experiments with a New Boosting Algorithm”

   Freund, Schapire – “A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting”

   Friedman, Hastie, et al. – “Additive Logistic Regression: A Statistical View of Boosting”

   Jin, Liu, et al. (CMU) – “A New Boosting Algorithm Using Input-Dependent Regularizer”

   Li, Zhang, et al. – “FloatBoost Learning for Classification”

   Opitz, Maclin – “Popular Ensemble Methods: An Empirical Study”

   Rätsch, Warmuth – “Efficient Margin Maximization with Boosting”

   Schapire, Freund, et al. – “Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods”

   Schapire, Singer – “Improved Boosting Algorithms Using Confidence-rated Predictions”

   Schapire – “The Boosting Approach to Machine Learning: An Overview”

   Zhang, Li, et al. – “Multi-view Face Detection with FloatBoost”

                                                                                                                  26
Appendix
 Bound on training error
 Adaboost Variants




                            27
Bound on Training Error (Schapire)




                                 28
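The derivation on this slide did not survive extraction; the standard statement of Schapire's bound (see Freund & Schapire in the references) is:

\[
\frac{1}{N}\sum_{i=1}^{N}\mathbb{1}\!\left[H(x_i)\neq y_i\right]
\;\le\; \prod_{t=1}^{T} Z_t
\;=\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t\,(1-\epsilon_t)}
\;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^{2}\Big),
\qquad \gamma_t = \tfrac{1}{2}-\epsilon_t,
\]

so as long as every weak classifier is slightly better than chance (\(\gamma_t \ge \gamma > 0\)), the training error drops exponentially fast in the number of rounds T.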
Discrete Adaboost (DiscreteAB)
(Friedman’s wording)




                                 29
Discrete Adaboost (DiscreteAB)
(Freund and Schapire’s wording)




                                  30
Adaboost with Confidence
Weighted Predictions (RealAB)




                                31
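The algorithm box was lost in extraction; in the standard confidence-weighted (Real AdaBoost) formulation, each round outputs a real-valued hypothesis rather than a ±1 label:

\[
f_t(x) = \frac{1}{2}\ln\frac{P_{w}(y=+1\mid x)}{P_{w}(y=-1\mid x)},
\qquad
w_i \leftarrow w_i\,e^{-y_i f_t(x_i)},
\qquad
H(x) = \operatorname{sign}\!\Big(\sum_{t=1}^{T} f_t(x)\Big),
\]

where \(P_w(\cdot\mid x)\) is a class-probability estimate fitted under the current weights; the magnitude of \(f_t(x)\) acts as the confidence of the prediction.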
Adaboost Variants Proposed By
Friedman
   LogitBoost
     Solves an additive logistic regression by stage-wise optimization of the binomial log-likelihood
     Requires care to avoid numerical problems


   GentleBoost
       Update is fm(x) = Pw(y=1 | x) – Pw(y=-1 | x) instead of half the
        log-ratio used by Real AdaBoost
       Bounded in [-1, 1], so numerically more stable




                                                               32
Adaboost Variants Proposed By
Friedman
   LogitBoost




                                33
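The algorithm on this slide was lost in extraction; a compact restatement of two-class LogitBoost as given by Friedman, Hastie, et al. (see references), with labels \(y^{*}\in\{0,1\}\), initial weights \(w_i = 1/N\), \(F(x)=0\), and \(p(x_i)=1/2\): at each round compute

\[
z_i = \frac{y_i^{*}-p(x_i)}{p(x_i)\big(1-p(x_i)\big)},
\qquad
w_i = p(x_i)\big(1-p(x_i)\big),
\]

fit \(f_m(x)\) by weighted least-squares regression of \(z_i\) on \(x_i\), then update

\[
F(x) \leftarrow F(x) + \tfrac{1}{2} f_m(x),
\qquad
p(x) = \frac{e^{F(x)}}{e^{F(x)}+e^{-F(x)}},
\]

and output \(\operatorname{sign}(F(x))\). The division by \(p(1-p)\) is what makes the "care to avoid numerical problems" on the previous slide necessary when \(p\) approaches 0 or 1.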
Adaboost Variants Proposed By
Friedman
   GentleBoost




                                34
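Likewise, a compact restatement of GentleBoost from the same paper, with \(y_i \in \{-1,+1\}\), initial weights \(w_i = 1/N\), and \(F(x)=0\): at each round fit \(f_m(x)\) by weighted least-squares regression of \(y_i\) on \(x_i\) (so \(f_m\) estimates \(E_w[y\mid x] = P_w(y=1\mid x)-P_w(y=-1\mid x)\)), then update

\[
F(x) \leftarrow F(x) + f_m(x),
\qquad
w_i \leftarrow w_i\,e^{-y_i f_m(x_i)} \ \ (\text{renormalized}),
\]

and output \(\operatorname{sign}(F(x))\). Because \(f_m\) is a bounded difference of probabilities rather than a log-ratio, no special numerical safeguards are needed.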
Thanks!!!
Any comments or questions?




                         35

								