AdaBoost for building robust classifiers

    KH Wong
Overview

   Objective of AdaBoost
   2-class problems
       Training
       Detection
       Examples




Objective

   Automatically classify inputs into different
    categories of similar features
   Example
       Face detection:
           find the faces in the input image


       Vision based gesture recognition [Chen 2007]




Different detection problems

   Two-class problems (will be discussed here)
       E.g., face detection:
           in a picture, are there any faces or not?
   Multi-class problems (not discussed here)
       AdaBoost can be extended to handle multi-class
        problems
           In a picture, are there any faces of men, women,
            children, etc.? (Still an unsolved problem)
Define a 2-class classifier:
its methods and procedures
   Supervised training
       Show many positive samples (faces) to the system
       Show many negative samples (non-faces) to the
        system
       Learn the parameters and construct the final
        strong classifier
   Detection
       Given an unknown input image, the system can
        tell whether it contains positive samples (faces) or
        not
We will learn

   Training procedures
       Give +ve and -ve examples to the system; the
        system will then learn to classify an unknown
        input.
       E.g., give pictures of faces (+ve examples) and
        non-faces (-ve examples) to train the system.
   Detection procedures
       Input an unknown sample (e.g., an image); the
        system will tell you whether it is a face or not.
First, let us learn what a weak classifier h( ) is

    Case 1:
    If a point x = (u, v) is in the "gray" area, then h(x) = +1;
    otherwise h(x) = -1. It can be written as:

        h(x) = \begin{cases} +1 & \text{if } mu - v > -c \\ -1 & \text{otherwise} \end{cases}
        where m, c are given constants.

    Case 2:
    If a point x = (u, v) is in the "white" area, then h(x) = +1;
    otherwise h(x) = -1. It can be written as:

        h(x) = \begin{cases} +1 & \text{if } -(mu - v) > c \\ -1 & \text{otherwise} \end{cases}

    At time t, combine cases 1 and 2 together to become equation (i),
    and use the polarity p_t to control which case you want to use:

        h_t(x) = \begin{cases} +1 & \text{if } p_t f_t(x) < p_t \theta_t \\ -1 & \text{otherwise} \end{cases}    (i)

    where p_t = polarity \in \{+1, -1\},
    f is the function f(x = [u, v]) = mu - v, and \theta_t = -c,
    where m, c are constants and u, v are variables.
    Since p_t (mu - v) < -p_t c, equation (i) becomes:

        h_t(x) = \begin{cases} +1 & \text{if } p_t (mu - v) < -p_t c \\ -1 & \text{otherwise} \end{cases}    (ib)

    [Figure: the decision line v = mu + c in the (u, v) plane, with gradient m
    and intercept c. m and c define the line; any point in the gray area
    satisfies v < mu + c; any point in the white area satisfies v > mu + c.]
The weak classifier (a summary)

   By definition, a weak classifier
    should be slightly better than a
    random choice (probability
    = 0.5). Otherwise you should
    use a dice!
   In (u, v) space, the function f is a
    straight line defined by m and c.

    [Figure: the straight line v = mu + c with gradient m and intercept c;
    points above it satisfy v > mu + c, points below it satisfy v < mu + c.]
Learn what h( ) is: a weak classifier.
Decision stump
   Decision stump definition:
    a decision stump is a machine
    learning model consisting of a one-
    level decision tree.[1] That is, it is a
    decision tree with one internal node
    (the root) which is immediately
    connected to the terminal nodes. A
    decision stump makes a prediction
    based on the value of just a single
    input feature. Sometimes they are also
    called 1-rules.[2]
    From http://en.wikipedia.org/wiki/Decision_stump

   Example: a decision stump on temperature T:
    T <= 10°C -> cold; 10°C < T < 28°C -> mild; T >= 28°C -> hot
A weak learner (classifier) is a decision stump
   Define weak learners based on rectangle
    features; the function f_t acts as a decision line in space:

        h_t(x) = \begin{cases} +1 & \text{if } p_t f_t(x) < p_t \theta_t \\ -1 & \text{otherwise} \end{cases}

    where x is the window, \theta_t is the threshold, and the polarity
    p_t \in \{+1, -1\} selects which side of the decision line you prefer.
Weak classifier we use here: the axis-parallel
weak classifier
   We will use a special
    type: the axis-parallel weak
    classifier.
   The decision line is
    parallel to either the
    horizontal or the vertical
    axis.
   For a horizontal decision line h_t(x) at height v_0,
    use \theta_t = v_0 = threshold and f_t(x = (u, v)) = v:

        h_t(x = (u, v)) = \begin{cases} +1 & \text{if } p_t v < p_t v_0 \\ -1 & \text{otherwise} \end{cases}

   If polarity p_t = +1, the region below the line (v < v_0) is +1 and the
    region above it (v > v_0) is -1; if p_t = -1, the two labels swap.
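As a concrete sketch (a minimal illustration; the function name
axis_parallel_h and its argument layout are assumptions, not part of the
demo code used later), the horizontal-line classifier above can be written
in MATLAB as:

    % Minimal sketch of the axis-parallel weak classifier above.
    % x = [u v] is a 2-D point, v0 is the threshold theta_t, p the polarity.
    function h = axis_parallel_h(x, v0, p)
    if p * x(2) < p * v0
        h = +1;    % e.g. for p = +1: the region below the line v = v0
    else
        h = -1;    % the other side of the decision line
    end
    end

For a vertical decision line, the same form applies with x(1) and a
threshold u0.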
An example to show how AdaBoost works

   Training:
       Present ten samples to the system:
        [x_i = {u_i, v_i}, y_i = {'+' or '-'}]
           5 +ve (blue, diamond) samples
           5 -ve (red, circle) samples
       Train up the system
   Detection:
       Give an input x_j = (1.5, 3.4)
       The system will tell you whether it is '+' or '-',
        e.g. face or non-face
   Example:
       u = weight, v = height
       Classification: suitability to play on
        the basketball team

   [Figure: the ten samples plotted on the (u, v) axes; two labelled
    points are x_i = {-0.48, 0}, y_i = '+' and x_i = {-0.2, -0.5}, y_i = '+'.]
AdaBoost concept

   Training data: 6 squares,
    5 circles.
   Using this training data, how do we make a
    classifier?
   Objective: train a classifier to
    classify an unknown input to see
    whether it is a circle or a square.
   One axis-parallel weak
    classifier cannot achieve 100%
    classification. h1( ), h2( ), h3( ) all
    fail. You may try it yourself!
   The solution is a complex strong
    classifier H_complex( ). Such a strong classifier should
    work, but how can we find it?
   ANSWER: combine many weak classifiers
    to achieve it.
How?

   Apply a set of weak classifiers h1( ), h2( ), ..., h7( ),
    each producing its own classification result.
   Give each weak classifier a weight
    \alpha_i, i = 1, 2, .., 7.
   Combine them to form the
    final strong classifier:

        H(x) = \text{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)
AdaBoost algorithm

Initialization:
    Given: (x_1, y_1), .., (x_n, y_n), where x_i \in X, y_i \in Y = \{-1, +1\}.
    Initialize the distribution D_{t=1}(i) = 1/n, such that n = M + L,
    M = number of positive (+1) examples, L = number of negative (-1) examples.

Main training loop:
    For t = 1, ..., T
    {
      Step 1a: Find the classifier h_t : X -> \{-1, +1\} that minimizes the
               error with respect to D_t, that means: h_t = \arg\min_q \varepsilon_q.

      Step 1b: error \varepsilon_t = \sum_{i=1}^{n} D_t(i) \, I[h_t(x_i) \neq y_i],
               where I[h_t(x_i) \neq y_i] = \begin{cases} 1 & \text{if } h_t(x_i) \neq y_i \\ 0 & \text{otherwise} \end{cases}
               Checking step, prerequisite: \varepsilon_t < 0.5, otherwise stop.

      Step 2:  \alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t},
               where \alpha_t = weight (or confidence value).

      Step 3:  D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}
               (see the next slide for an explanation).

      Step 4:  Current total cascaded-classifier error
               CE_t = \frac{1}{n} \sum_{i=1}^{n} I(t, \alpha, h(x_i)),
               where I( ) is defined as follows:
               if x_i is correctly classified by the current cascaded classifier, i.e.
               y_i = \text{sign}\left( \sum_{\tau=1}^{t} \alpha_\tau h_\tau(x_i) \right), then I(t, \alpha, h(x_i)) = 0;
               if x_i is incorrectly classified by the current cascaded classifier, i.e.
               y_i \neq \text{sign}\left( \sum_{\tau=1}^{t} \alpha_\tau h_\tau(x_i) \right), then I(t, \alpha, h(x_i)) = 1.
               If CE_t = 0 then T = t, break.
    }
    The output is o_t(x_i) = \sum_{\tau=1}^{t} \alpha_\tau h_\tau(x_i), and
    S(t, \alpha, h(x_i)) = \begin{cases} 1 & \text{if } y_i \neq \text{sign}(o_t(x_i)) \\ 0 & \text{otherwise} \end{cases}

The final strong classifier: H(x) = \text{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)
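To make the loop concrete, here is a minimal, self-contained MATLAB sketch
of the algorithm above, using the axis-parallel stumps from the earlier
slides as the weak-learner family (save as adaboost_train.m). All names
(adaboost_train, best_stump, stump_out) are illustrative, not a toolbox API
or the exact demo code used in the exercises below.

    function [alpha, stump] = adaboost_train(X, y, T)
    % X: n-by-d features, y: n-by-1 labels in {-1,+1}, T: max rounds
    n = size(X, 1);
    D = ones(n, 1) / n;                            % initialize D_1(i) = 1/n
    alpha = zeros(T, 1);
    stump = repmat(struct('dim', 1, 'thr', 0, 'pol', 1), T, 1);
    O = zeros(n, 1);                               % cascaded sum o_t(x_i)
    for t = 1:T
        [stump(t), err] = best_stump(X, y, D);     % Step 1a: h_t = argmin error
        if err >= 0.5, T = t - 1; break; end       % Step 1b: must beat chance
        alpha(t) = 0.5 * log((1 - err) / err);     % Step 2: confidence value
        h = stump_out(stump(t), X);
        D = D .* exp(-alpha(t) * y .* h);          % Step 3: reweight samples
        D = D / sum(D);                            % ... and divide by Z_t
        O = O + alpha(t) * h;
        if all(sign(O) == y), T = t; break; end    % Step 4: CE_t = 0, stop
    end
    alpha = alpha(1:T); stump = stump(1:T);
    end

    function h = stump_out(s, X)
    % Axis-parallel weak classifier: +1 if pol*x(dim) < pol*thr, else -1.
    h = -ones(size(X, 1), 1);
    h(s.pol * X(:, s.dim) < s.pol * s.thr) = 1;
    end

    function [best, best_err] = best_stump(X, y, D)
    % Brute-force Step 1a: try every dimension, data threshold and polarity.
    best_err = inf; best = struct('dim', 1, 'thr', 0, 'pol', 1);
    for dim = 1:size(X, 2)
        for thr = unique(X(:, dim))'
            for pol = [1, -1]
                s = struct('dim', dim, 'thr', thr, 'pol', pol);
                err = sum(D .* (stump_out(s, X) ~= y));  % weighted error
                if err < best_err, best_err = err; best = s; end
            end
        end
    end
    end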
Note: the normalization factor Z_t in Step 3
  AdaBoost chooses this weight update function deliberately:

      D_{t+1}(i) \propto D_t(i) \exp(-\alpha_t y_i h_t(x_i))

  Because:
  - when a training sample is correctly classified, its weight decreases;
  - when a training sample is incorrectly classified, its weight increases.

  Recall Step 3:

      D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t},

  where Z_t is the normalization factor, so that D_{t+1} becomes a probability
  distribution:

      Z_t = \sum_{i:\, y_i = h_t(x_i)} D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}
          + \sum_{i:\, y_i \neq h_t(x_i)} D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}
          = \sum_{\text{correct}} D_t(i)\, e^{-\alpha_t} + \sum_{\text{incorrect}} D_t(i)\, e^{+\alpha_t}
Note: stopping criterion of the main loop
   The main loop stops when all training data are correctly
    classified by the cascaded classifier up to stage t:

    Step 4: Current total cascaded-classifier error
    CE_t = \frac{1}{n} \sum_{i=1}^{n} I(t, \alpha, h(x_i)),
    where I( ) is defined as follows:
      if x_i is correctly classified by the current cascaded classifier, i.e.
      y_i = \text{sign}\left( \sum_{\tau=1}^{t} \alpha_\tau h_\tau(x_i) \right), then I(t, \alpha, h(x_i)) = 0;
      if x_i is incorrectly classified by the current cascaded classifier, i.e.
      y_i \neq \text{sign}\left( \sum_{\tau=1}^{t} \alpha_\tau h_\tau(x_i) \right), then I(t, \alpha, h(x_i)) = 1.
    If CE_t = 0 then T = t, break.
D_t(i) = weight

   D_t(i) = probability distribution over the i-th
    training sample at time t, i = 1, 2, ..., n.
   It shows how much you trust this sample.
   At t = 1, all samples are treated the same, with equal
    weight: D_{t=1}(i) = 1/n for all i.
   At t > 1, D_t(i) will be modified, as we will see
    later.
Initialization

    Given: (x_1, y_1), .., (x_n, y_n), where x_i \in X, y_i \in Y = \{-1, +1\}.
    Initialize D_{t=1}(i) = 1/n, such that n = M + L,
    M = number of positive examples, L = number of negative examples.

   M = 5 +ve (blue, diamond) samples
   L = 5 -ve (red, circle) samples
   n = M + L = 10
   Initialize the weights D_{t=1}(i) = 1/10 for all
    i = 1, 2, .., 10:
       so D_1(1) = 0.1, D_1(2) = 0.1, ..., D_1(10) = 0.1
Main training loop


    Step 1a, 1b




Select h( ): for simplicity in implementation we
use the axis-parallel weak classifier

    Recall:

        h_t(x) = \begin{cases} +1 & \text{if } p_t f_t(x) < p_t \theta_t \\ -1 & \text{otherwise} \end{cases}    (i)

    where p_t = polarity \in \{+1, -1\} and \theta_t = the threshold;
    f is the line function (f = mu + c), with m, c constants and u, v variables.

    Axis-parallel weak classifier:
    either f is a line of gradient m = 0 (a horizontal line, h_a(x)),
    whose position is controlled by the threshold v_0,
    or f is a line of gradient m = \infty (a vertical line, h_b(x)),
    whose position is controlled by the threshold u_0.

    [Figure: a horizontal decision line h_a(x) at height v_0 and a vertical
     decision line h_b(x) at u_0.]
Step 1a, 1b

    Step 1a: Find the classifier h_t : X -> \{-1, +1\} that minimizes the error
    with respect to D_t; that means: h_t = \arg\min_q \varepsilon_q.
    Step 1b: checking step, prerequisite: \varepsilon_t < 0.5, otherwise stop.

   Assume h( ) can only be a
    horizontal or a vertical
    separator (an axis-parallel
    weak classifier).
   There are still many ways
    to set h( ); here, if this h_q( )
    is selected, there will be 3
    incorrectly classified
    training samples.
   See the 3 circled training
    samples.
   We can go through all h( )s
    and select the best, with the
    least misclassification (see
    the following 2 slides).

   [Figure: a vertical separator h_q( ), with the 3 samples it classifies
    incorrectly circled.]
Example: training example slides from [Smyth 2007].
Classify the ten red (circle) / blue (diamond) dots.
Step 1a:

   There are 9 x 2 choices here:
    h_i, i = 1, 2, 3, .., 9 (polarity +1), and
    h'_i, i = 1, 2, 3, .., 9 (polarity -1).
   You may choose one of the following
    axis-parallel (vertical-line) classifiers:

        h_i(x) = \begin{cases} +1 & \text{if } pu < pu_i \\ -1 & \text{otherwise} \end{cases}

    x = (u, v); v is not used because h_i(x) is parallel to
    the vertical axis; polarity p \in \{+1, -1\}.

   Initialize: D_{t=1}(i) = 1/10.

   [Figure: the ten dots on the (u, v) axes; the vertical dotted lines at
    u_1, u_2, .., u_9 are the possible choices h_{i=1}(x), .., h_{i=9}(x).]
Example: training example slides from [Smyth 2007].
Classify the ten red (circle) / blue (diamond) dots.
Step 1a:

   There are 9 x 2 choices here:
    h_j, j = 1, 2, 3, .., 9 (polarity +1), and
    h'_j, j = 1, 2, 3, .., 9 (polarity -1).
   All together, including the last slide, there are
    36 choices.
   You may choose one of the following
    axis-parallel (horizontal-line) classifiers:

        h_j(x) = \begin{cases} +1 & \text{if } pv < pv_j \\ -1 & \text{otherwise} \end{cases}

    x = (u, v); u is not used because h_j(x) is parallel to
    the horizontal axis; polarity p \in \{+1, -1\}.

   Initialize: D_{t=1}(i) = 1/10.

   [Figure: the ten dots on the (u, v) axes; the horizontal dotted lines at
    v_1, v_2, .., v_9 are the possible choices h_{j=1}(x), .., h_{j=9}(x).]
Step 1b:
Find and check the error of the weak classifier h( )
   To evaluate how successful your selected weak classifier h( ) is,
    we can evaluate its error rate:

        \varepsilon_t = \sum_{i=1}^{n} D_t(i) \, I[h_t(x_i) \neq y_i],
        where I[h_t(x_i) \neq y_i] = \begin{cases} 1 & \text{if } h_t(x_i) \neq y_i \\ 0 & \text{otherwise} \end{cases}

        Step 1b checking step, prerequisite: \varepsilon_t < 0.5, otherwise stop.

   \varepsilon_t = misclassification probability of h( ).
   Checking: if \varepsilon_t >= 0.5 (something is wrong), stop the training.
       Because, by definition, a weak classifier should be slightly
        better than a random choice (probability = 0.5).
       So if \varepsilon_t >= 0.5, your h( ) is a bad choice; redesign
        another h( ) and do the training based on the new h( ).
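A quick numeric sketch of Step 1b in MATLAB (the weights and the 3-miss
pattern are the ones used on the following slides):

    D     = 0.1 * ones(10, 1);        % D_{t=1}(i) = 1/10
    miss  = [1; 1; 1; zeros(7, 1)];   % I(h_t(x_i) ~= y_i) for each sample
    eps_t = sum(D .* miss)            % = 0.3 < 0.5, so this h( ) is usable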
Exercise0 for Steps 1a, 1b
    Step 1a: Find the classifier h_t : X -> \{-1, +1\} that minimizes the error with respect to D_t.
    Step 1b: checking step, prerequisite: \varepsilon_t < 0.5, otherwise stop.

   Assume h( ) can only be a
    horizontal or a vertical
    separator.
   How many different
    classifiers are available?
   If this h_j( ) is selected, circle the
    misclassified training
    samples. Find \varepsilon( ) to see the
    misclassification probability if
    the probability distribution (D)
    for each sample is the same.
   Find the h( ) with minimum error.

   [Figure: the training samples with one candidate separator h_j( ).]
Result of Step 2 at t=1

   [Figure: the selected classifier h_{t=1}(x), with the 3 samples it
    classifies incorrectly circled.]
Step 2 at t=1 (refer to the previous slide)

   Using \varepsilon_{t=1} = 0.3,
    because 3 samples are
    incorrectly classified:

        \varepsilon_t = \sum_{i=1}^{n} D_t(i) \, I[h_t(x_i) \neq y_i],
        where I[h_t(x_i) \neq y_i] = \begin{cases} 1 & \text{if } h_t(x_i) \neq y_i \\ 0 & \text{otherwise} \end{cases}

        \varepsilon_{t=1} = 0.1 + 0.1 + 0.1 = 0.3

    Step 2: \alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t},
    where \varepsilon_t is the weighted error rate of classifier h_t, so

        \alpha_{t=1} = \frac{1}{2} \ln \frac{1 - 0.3}{0.3} = 0.424

The proof can be found at http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf
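The value is easy to verify numerically (a two-line MATLAB sketch):

    eps_t   = 0.3;                             % weighted error from Step 1b
    alpha_t = 0.5 * log((1 - eps_t) / eps_t)   % = 0.4236, i.e. 0.424 above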
Step 3 at t=1: update D_t to D_{t+1}

    Step 3: D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t},
    where Z_t is a normalization factor chosen
    so that D_{t+1} is a distribution (a probability function).

The proof can be found at http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf
Step 3: first find Z (the normalization
factor). Note that D_{t=1} = 0.1 and \alpha_{t=1} = 0.424.

    D_{t=1} = 0.1, \alpha_{t=1} = 0.424;
    7 correct and 3 incorrect samples.

        Z_t = \sum_{y_i = h_t(x_i)} \text{correct weight} + \sum_{y_i \neq h_t(x_i)} \text{incorrect weight}

        Z_t = \sum_{y_i = h_t(x_i)} D_t(i)\, e^{-\alpha_t y_i h_t(x_i)} + \sum_{y_i \neq h_t(x_i)} D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}    (i)

    Correctly classified: y_i = h_t(x_i), so y_i h_t(x_i) = +1; put it in (i).
    Incorrectly classified: y_i \neq h_t(x_i), so y_i h_t(x_i) = -1; put it in (i).

        Z_t = \sum_{y_i = h_t(x_i)} D_t(i)\, e^{-\alpha_t} + \sum_{y_i \neq h_t(x_i)} D_t(i)\, e^{+\alpha_t}
            = (total correct weight) + (total incorrect weight)
            = 0.1 * 7 * 0.65 + 0.1 * 3 * 1.52
            = 0.455 + 0.456
        Z_{t=1} = 0.911
Step 3, example:
update D_t to D_{t+1}

        D_{t+1}(i)_{\text{correct}} = \frac{D_t(i)}{Z_t} e^{-\alpha} = \frac{0.1}{Z_t} e^{-0.424} = \frac{0.1}{Z_t} \times 0.65

        D_{t+1}(i)_{\text{incorrect}} = \frac{D_t(i)}{Z_t} e^{+\alpha} = \frac{0.1}{Z_t} e^{0.424} = \frac{0.1}{Z_t} \times 1.52

    Since Z_t = 0.911:

        (decrease) D_{t+1}(i)_{\text{correct}} = \frac{0.1}{0.911} e^{-\alpha} = \frac{0.1}{0.911} \times 0.65 = 0.0714

        (increase) D_{t+1}(i)_{\text{incorrect}} = \frac{0.1}{0.911} e^{+\alpha} = \frac{0.1}{0.911} \times 1.52 = 0.167
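A short numeric check of Step 3 for t=1 (using exact exponentials; the
slides round e^{-0.424} to 0.65 and e^{0.424} to 1.52):

    D0 = 0.1; alpha1 = 0.424;                          % 7 correct, 3 incorrect
    Z  = 7 * D0 * exp(-alpha1) + 3 * D0 * exp(alpha1)  % ~0.917 (slide: 0.911)
    D_correct   = D0 * exp(-alpha1) / Z                % ~0.0714, decreased
    D_incorrect = D0 * exp( alpha1) / Z                % ~0.167,  increased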
Now run the main training loop a second time, t=2

        D_{t+1}(i)_{\text{correct}} = \frac{0.1}{0.911} e^{-\alpha} = \frac{0.1}{0.911} \times 0.65 = 0.0714

        D_{t+1}(i)_{\text{incorrect}} = \frac{0.1}{0.911} e^{+\alpha} = \frac{0.1}{0.911} \times 1.52 = 0.167
Final classifier
Exercise: work out \alpha_2 and \alpha_3

   [Figure: three weak classifiers h1( ), h2( ), h3( ) placed over the
    training samples.]

   Combine to form the
    final classifier:

        H(x) = \text{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)

        H(x) = \text{sign}\left( 0.424\, h_1(x) + \alpha_{t=2}\, h_2(x) + \alpha_{t=3}\, h_3(x) \right)

    Exercise: work out \alpha_{t=2} and \alpha_{t=3}.
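Once the alpha values are known, evaluating the strong classifier is just a
weighted vote. A sketch, reusing the stump representation and the stump_out
helper from the adaboost_train sketch above (illustrative names, not
library calls):

    function H = strong_classify(alpha, stump, X)
    % H(x) = sign( sum_t alpha_t * h_t(x) ), evaluated for every row of X.
    O = zeros(size(X, 1), 1);
    for t = 1:numel(alpha)
        O = O + alpha(t) * stump_out(stump(t), X);   % weighted vote of h_t
    end
    H = sign(O);                                     % final {-1,+1} decision
    end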
Now run the main training loop a third time, t=3

    Final classifier by
    combining three weak
    classifiers
Exercise1

   if example == 1
       blue_star = [-26  38
                      3  34
                     32   3
                     42  10];
       red_O     = [ 23  38
                     -4 -33
                    -22 -25
                    -37 -31];
       datafeatures = [blue_star; red_O];
       dataclass    = [-1 -1 -1 -1 1 1 1 1];
   end
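A hedged usage sketch: with the adaboost_train and strong_classify
functions sketched earlier (illustrative code, not the original demo), the
Exercise1 data can be run as:

    [alpha, stump] = adaboost_train(datafeatures, dataclass', 10);
    H = strong_classify(alpha, stump, [0 40])   % predicted {-1,+1} label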
Exercise1, initialized, t=1

   [Figure: the 8 training samples, each starting with equal weight
    D_{t=1}(i) = 1/8 = 0.125.]
Exercise1, t=1
Weak classifier h1 (upper half = *, lower = o)

    Step 1: error \varepsilon_t = \sum_{i=1}^{n} D_t(i)\, I[h_t(x_i) \neq y_i],
    where I[h_t(x_i) \neq y_i] = 1 if h_t(x_i) \neq y_i, 0 otherwise.
    Step 1b: checking step, prerequisite: \varepsilon_t < 0.5, otherwise stop.
    Step 2: \alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}

   We see that Feature(5) is wrongly
    classified: 1 sample is wrong.
   err = \varepsilon(t) = D(t) * 1,
    \varepsilon(t) = 0.125
   Alpha = \alpha = 0.5 * log[(1 - \varepsilon(t)) / \varepsilon(t)]
    = 0.973
   Find the next D(t+1)
    = D(t) * exp(\alpha * (h(x) \neq y)):
   incorrect: D_{t+1}(i) = D_t(i) * exp(\alpha),
    D(5) = 0.125 * exp(0.973)
    = 0.3307 (not normalized yet)
   correct: D_{t+1}(i) = D_t(i) * exp(-\alpha),
    D(1) = 0.125 * exp(-0.973) = 0.0472
    (not normalized yet)
   ------------
   Z = (7 * 0.0472 + 0.3307) = 0.6611
   After normalization, D at t+1:
    D(5) = 0.3307 / Z = 0.5002
    D(1) = D(2) = ... = 0.0472 / Z = 0.0714

   [Figure: the 8 samples and the horizontal separator h1( ).]
Example 1,
result at t=1

    Step 4: Current total cascaded-classifier error
    CE_t = \frac{1}{n} \sum_{i=1}^{n} S(t, \alpha, h(x_i)),
    output o_t(x_i) = \sum_{\tau=1}^{t} \alpha_\tau h_\tau(x_i), and
    S(t, \alpha, h(x_i)) = \begin{cases} 1 & \text{if } y_i \neq \text{sign}(o_t(x_i)) \\ 0 & \text{otherwise} \end{cases}

    ##display result t_step=1 ## O_=cascaded_sum, S_=sign(O_), Y=train_class, CE=classification error##
    >i=1, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=-1, CE_=0
    >i=2, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=-1, CE_=0
    >i=3, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=-1, CE_=0
    >i=4, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=-1, CE_=0
    >i=5, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=1, CE_=1
    >i=6, a1*h1(xi)=0.973, O_=0.973, S_=1.000, Y_=1, CE_=0
    >i=7, a1*h1(xi)=0.973, O_=0.973, S_=1.000, Y_=1, CE_=0
    >i=8, a1*h1(xi)=0.973, O_=0.973, S_=1.000, Y_=1, CE_=0
    >weak classifier specifications:
    -dimension: 1=vertical: direction: 1=(left="blue_*", right="red_O"); -1=(reverse direction of 1)
    -dimension: 2=horizontal: direction: 1=(up="red_O", down="blue_*"); -1=(reverse direction of 1)
    >#-new weak classifier at stage(1): dimension=2, threshold=-25.00, direction=-1
    >Cascaded classifier error up to stage(t=1) for (N=8 training samples) = [sum(CE_)/N] = 0.125
Exercise1, t=2
   Weak classifier h2 (left = o, right = *):
    Features (1) and (2) are wrongly classified;
    2 samples are wrong.

    Step 1: error \varepsilon_t = \sum_{i=1}^{n} D_t(i)\, I[h_t(x_i) \neq y_i],
    where I[h_t(x_i) \neq y_i] = 1 if h_t(x_i) \neq y_i, 0 otherwise.
    Step 1b: checking step, prerequisite: \varepsilon_t < 0.5, otherwise stop.
    Step 2: \alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}

   err = \varepsilon(t) = D_t(1) + D_t(2) = 0.0714 + 0.0714,
    \varepsilon(t) = 0.1428
   Alpha = \alpha = 0.5 * log[(1 - \varepsilon(t)) / \varepsilon(t)] = 0.8961
   Find the next D(t+1) = D(t) * exp(\alpha * (h(x) \neq y)),
    i.e.:
   incorrect: D_{t+1}(i) = D_t(i) * exp(\alpha),
    D(1) = D(2) = 0.0714 * exp(0.8961)
    = 0.1749 (not normalized yet)
   correct: D_{t+1}(i) = D_t(i) * exp(-\alpha),
    D(3) = D(4) = D(6) = D(7) = D(8) = 0.0714 * exp(-0.8961) = 0.029,
    but D(5) = 0.5002 * exp(-0.8961) = 0.2041
   Z = (2 * 0.1749 + 5 * 0.029 + 0.2041) = 0.6989
   After normalization, D at t+1:
    D(1) = D(2) = 0.1749 / 0.6989 = 0.2503
    D(5) = 0.2041 / 0.6989 = 0.292
    D(8) = 0.029 / 0.6989 = 0.0415
Example 1,
result at t=2

    Step 4: Current total cascaded-classifier error
    CE_t = \frac{1}{n} \sum_{i=1}^{n} S(t, \alpha, h(x_i)),
    output o_t(x_i) = \sum_{\tau=1}^{t} \alpha_\tau h_\tau(x_i), and
    S(t, \alpha, h(x_i)) = \begin{cases} 1 & \text{if } y_i \neq \text{sign}(o_t(x_i)) \\ 0 & \text{otherwise} \end{cases}

    ##display result t_step=2 ## O_=cascaded_sum, S_=sign(O_), Y=train_class, CE=classification error##
    >i=1, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, O_=-0.077, S_=-1.000, Y_=-1, CE_=0
    >i=2, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, O_=-0.077, S_=-1.000, Y_=-1, CE_=0
    >i=3, a1*h1(xi)=-0.973, a2*h2(xi)=-0.896, O_=-1.869, S_=-1.000, Y_=-1, CE_=0
    >i=4, a1*h1(xi)=-0.973, a2*h2(xi)=-0.896, O_=-1.869, S_=-1.000, Y_=-1, CE_=0
    >i=5, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, O_=-0.077, S_=-1.000, Y_=1, CE_=1
    >i=6, a1*h1(xi)=0.973, a2*h2(xi)=0.896, O_=1.869, S_=1.000, Y_=1, CE_=0
    >i=7, a1*h1(xi)=0.973, a2*h2(xi)=0.896, O_=1.869, S_=1.000, Y_=1, CE_=0
    >i=8, a1*h1(xi)=0.973, a2*h2(xi)=0.896, O_=1.869, S_=1.000, Y_=1, CE_=0
    >weak classifier specifications:
    -dimension: 1=vertical: direction: 1=(left="blue_*", right="red_O"); -1=(reverse direction of 1)
    -dimension: 2=horizontal: direction: 1=(up="red_O", down="blue_*"); -1=(reverse direction of 1)
    >#-new weak classifier at stage(2): dimension=1, threshold=23.00, direction=-1
    >Cascaded classifier error up to stage(t=2) for (N=8 training samples) = [sum(CE_)/N] = 0.125
Exercise1, t=3

   [Figure: the third weak classifier h3( ) selected at t=3.]
Example 1,
result at t=3

    Step 4: Current total cascaded-classifier error
    CE_t = \frac{1}{n} \sum_{i=1}^{n} S(t, \alpha, h(x_i)),
    output o_t(x_i) = \sum_{\tau=1}^{t} \alpha_\tau h_\tau(x_i), and
    S(t, \alpha, h(x_i)) = \begin{cases} 1 & \text{if } y_i \neq \text{sign}(o_t(x_i)) \\ 0 & \text{otherwise} \end{cases}

    ##display result t_step=3 ## O_=cascaded_sum, S_=sign(O_), Y=train_class, CE=classification error##
    >i=1, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=-0.745, S_=-1.000, Y_=-1, CE_=0
    >i=2, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=-0.745, S_=-1.000, Y_=-1, CE_=0
    >i=3, a1*h1(xi)=-0.973, a2*h2(xi)=-0.896, a3*h3(xi)=0.668, O_=-1.201, S_=-1.000, Y_=-1, CE_=0
    >i=4, a1*h1(xi)=-0.973, a2*h2(xi)=-0.896, a3*h3(xi)=0.668, O_=-1.201, S_=-1.000, Y_=-1, CE_=0
    >i=5, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, a3*h3(xi)=0.668, O_=0.590, S_=1.000, Y_=1, CE_=0
    >i=6, a1*h1(xi)=0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=1.201, S_=1.000, Y_=1, CE_=0
    >i=7, a1*h1(xi)=0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=1.201, S_=1.000, Y_=1, CE_=0
    >i=8, a1*h1(xi)=0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=1.201, S_=1.000, Y_=1, CE_=0
    >weak classifier specifications:
    -dimension: 1=vertical: direction: 1=(left="blue_*", right="red_O"); -1=(reverse direction of 1)
    -dimension: 2=horizontal: direction: 1=(up="red_O", down="blue_*"); -1=(reverse direction of 1)
    >#-new weak classifier at stage(3): dimension=1, threshold=3.00, direction=1
    >Cascaded classifier error up to stage(t=3) for (N=8 training samples) = [sum(CE_)/N] = 0.000
Exercise1, the strong classifier

   [Figure: the decision regions of the final strong classifier formed
    by the three weak classifiers.]
Test result, example 1

    The final strong classifier:

        H(x) = \text{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)

    CE_t: Step 4, current total cascaded-classifier error:

        CE_t = \frac{1}{n} \sum_{i=1}^{n} S(t, \alpha, h(x_i)),

        S(t, \alpha, h(x_i)) = \begin{cases} 1 & \text{if } y_i \neq \text{sign}\left( \sum_{\tau=1}^{t} \alpha_\tau h_\tau(x_i) \right) \\ 0 & \text{otherwise} \end{cases}

    [Figure: the test classification result and the cascaded-classifier
     error CE_t per stage.]
Appendix




Theory
   We first make up a measurement function, called the "exponential
    loss function", to measure the strength of a strong classifier.
       Exponential loss function L(H) = a measurement of the
        misclassification rate of a strong classifier H.
   y_i H(x_i) = +1 (correctly classified)
   y_i H(x_i) = -1 (incorrectly classified)
   A good strong classifier should have a low L(H).

    For a strong classifier H(x) = \text{sign}\left( \sum_{k=1}^{T} \alpha_k h_k(x) \right),
    the exponential loss function L(H) is defined as

        L(H) = \sum_{i=1}^{n} \frac{1}{e^{y_i H(x_i)}} = \sum_{i=1}^{n} e^{-y_i H(x_i)}
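A tiny numeric sketch of this definition (toy values, for illustration only):

    y = [ 1; -1;  1];        % true labels
    O = [ 2.1; -0.3; -0.5];  % real-valued cascaded sums, before sign( )
    L = sum(exp(-y .* O))    % the misclassified 3rd sample dominates the loss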
Theory:
by definition, the weight update rule is
chosen to achieve adaptive boosting
   AdaBoost chooses this weight update function
    deliberately:

        D_{t+1}(i) \propto D_t(i) \exp(-\alpha_t y_i h_t(x_i))

   Because:
   when a training sample is correctly classified, its weight decreases;
   when a training sample is incorrectly classified, its weight increases.
   Some other systems may use different weight update formulas but
    with the same spirit (correctly classified samples will result in
    decreased weight, and vice versa).
Theory: part 1a

Given:
\varepsilon_t = probability of incorrect classification by the weak classifier h_t(x)
selected at stage t. We want to prove: \alpha_t = \frac{1}{2} \log \frac{1 - \varepsilon_t}{\varepsilon_t}

Define:
the loss function of H at stage t is L(H_t);
the loss function of H at stage t+1 is L(H_t + \alpha h);
for simplification, \alpha = \alpha_t, h = h_t.

Proof:
The objective is to find the \alpha that minimizes L(H_t + \alpha h).

    L(H_t + \alpha h) = \sum_{i=1}^{n} e^{-y_i [H_t(x_i) + \alpha h(x_i)]}
    (see AdaBoost Step 3: D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t})

    L(H_t + \alpha h) = \sum_{i=1}^{n} e^{-y_i H_t(x_i)}\, e^{-\alpha y_i h(x_i)}    (i)

Split the n samples into
correct cases \{ y_i = h(x_i), hence y_i h(x_i) = +1 \} and
incorrect cases \{ y_i \neq h(x_i), hence y_i h(x_i) = -1 \}.
Theory: part 2a (for simplification, \alpha = \alpha_t, h = h_t)

Putting the correct cases \{ y_i = h(x_i), hence y_i h(x_i) = +1 \} and the
incorrect cases \{ y_i \neq h(x_i), hence y_i h(x_i) = -1 \} into (i):

    L(H_t + \alpha h) = \sum_{y_i = h(x_i)} e^{-y_i H_t(x_i)}\, e^{-\alpha(+1)} + \sum_{y_i \neq h(x_i)} e^{-y_i H_t(x_i)}\, e^{-\alpha(-1)}

    L(H_t + \alpha h) = e^{-\alpha} \sum_{y_i = h(x_i)} e^{-y_i H_t(x_i)} + e^{\alpha} \sum_{y_i \neq h(x_i)} e^{-y_i H_t(x_i)}    (ii)

In (ii), \sum_{y_i = h(x_i)} e^{-y_i H_t(x_i)} and \sum_{y_i \neq h(x_i)} e^{-y_i H_t(x_i)} are
independent of \alpha, so they are constants.

To minimize L(H_t + \alpha h) in (ii), set \frac{d L(H_t + \alpha h)}{d \alpha} = 0:

    \frac{d L(H_t + \alpha h)}{d \alpha} = -e^{-\alpha} \sum_{y_i = h(x_i)} e^{-y_i H_t(x_i)} + e^{\alpha} \sum_{y_i \neq h(x_i)} e^{-y_i H_t(x_i)} = 0

    e^{-\alpha} \sum_{y_i = h(x_i)} e^{-y_i H_t(x_i)} = e^{\alpha} \sum_{y_i \neq h(x_i)} e^{-y_i H_t(x_i)}    (iii)
Theory: part 3a (for simplification, \alpha = \alpha_t, h = h_t)

Take the log of both sides of (iii):

    \log\left( e^{-\alpha} \right) + \log\left( \sum_{y_i = h(x_i)} e^{-y_i H_t(x_i)} \right) = \log\left( e^{\alpha} \right) + \log\left( \sum_{y_i \neq h(x_i)} e^{-y_i H_t(x_i)} \right)

    \log\left( e^{2\alpha} \right) = \log \frac{ \sum_{y_i = h(x_i)} e^{-y_i H_t(x_i)} }{ \sum_{y_i \neq h(x_i)} e^{-y_i H_t(x_i)} }

    \alpha = \frac{1}{2} \log \frac{ \sum_{y_i = h(x_i)} e^{-y_i H_t(x_i)} }{ \sum_{y_i \neq h(x_i)} e^{-y_i H_t(x_i)} }    (iv)

Because of Step 3 of the previous stage (t-1) and Step 1b of the current
stage t:

    1 - \varepsilon = \frac{ \sum_{y_i = h(x_i)} e^{-y_i H_t(x_i)} }{ \sum_{y_i = h(x_i)} e^{-y_i H_t(x_i)} + \sum_{y_i \neq h(x_i)} e^{-y_i H_t(x_i)} } = \sum_{y_i = h(x_i)} e^{-y_i H_t(x_i)} \Big/ Z_t

    \varepsilon = \frac{ \sum_{y_i \neq h(x_i)} e^{-y_i H_t(x_i)} }{ \sum_{y_i = h(x_i)} e^{-y_i H_t(x_i)} + \sum_{y_i \neq h(x_i)} e^{-y_i H_t(x_i)} } = \sum_{y_i \neq h(x_i)} e^{-y_i H_t(x_i)} \Big/ Z_t

where Z_t is the normalization factor,

    Z_t = \sum_{y_i = h(x_i)} e^{-y_i H_t(x_i)} + \sum_{y_i \neq h(x_i)} e^{-y_i H_t(x_i)}.

Recall that for the weak classifier h_t(x):
\varepsilon = incorrect classification probability, hence
1 - \varepsilon = correct classification probability.

Put the above into (iv):

    \alpha = \frac{1}{2} \log \frac{1 - \varepsilon}{\varepsilon}

Since we set \alpha = \alpha_t, h = h_t earlier, this gives

    \alpha_t = \frac{1}{2} \log \frac{1 - \varepsilon_t}{\varepsilon_t}

Proved!
Advanced topic: Viola-Jones' implementation, compared with the original AdaBoost

Note also that in Viola-Jones the label range is {1, 0} rather than {1, -1}.

Viola-Jones AdaBoost. Update the weights as

$$ w_{t+1,i} = w_{t,i}\,\beta_t^{\,1-e_i} \qquad (10) $$

where $e_i = 0$ if example $x_i$ is classified correctly and $e_i = 1$ otherwise, and

$$ \beta_t = \frac{\varepsilon_t}{1-\varepsilon_t} \qquad (11) $$

Note (assuming $w_t$ is already normalized):

correct ($e_i = 0$): $w_{t+1,i} = w_{t,i}\,\beta_t = w_{t,i}\,\frac{\varepsilon_t}{1-\varepsilon_t}$, so the weight is decreased (since $\varepsilon_t < 0.5$ gives $\beta_t < 1$);

incorrect ($e_i = 1$): $w_{t+1,i} = w_{t,i}$, no change.

Original AdaBoost. Recall:

Step 3: $D_{t+1}(i) = \frac{D_t(i)\,\exp(-\alpha_t\, y_i\, h_t(x_i))}{Z_t}$

where $Z_t$ is the normalization factor, so that $D_{t+1}$ is a probability distribution:

$$ Z_t = \sum_{\text{correct}} D_t(i)\,e^{-\alpha_t} \;+\; \sum_{\text{incorrect}} D_t(i)\,e^{+\alpha_t} $$

Note:

correct ($y_i h_t(x_i) = +1$): $D_{t+1}(i) = D_t(i)\,e^{-\alpha_t}/Z_t$, the weight decreases;

incorrect ($y_i h_t(x_i) = -1$): $D_{t+1}(i) = D_t(i)\,e^{+\alpha_t}/Z_t$, the weight increases.

                                                                    Adaboost v.2a                                                                                   51
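Although the slide presents the two update rules side by side, it may help to verify that, after normalization, they produce the same distribution: under both rules the weight of an incorrect example grows relative to a correct one by the same factor $\frac{1-\varepsilon_t}{\varepsilon_t}$. A minimal MATLAB sketch checking this on made-up values (w, ei, and hence eps_t are all assumed for illustration):

    % Compare Viola-Jones and original-AdaBoost weight updates numerically.
    w     = [0.3 0.2 0.1 0.4];            % assumed normalized weights
    ei    = [0 0 1 0];                    % assumed: sample 3 misclassified
    eps_t = sum(w(ei==1));                % weighted error = 0.1
    beta_t  = eps_t/(1-eps_t);            % Viola-Jones beta
    alpha_t = 0.5*log((1-eps_t)/eps_t);   % original-AdaBoost alpha

    % Viola-Jones: correct weights shrink by beta, incorrect unchanged.
    w_vj = w .* beta_t.^(1-ei);   w_vj = w_vj / sum(w_vj);

    % Original AdaBoost: y_i*h_t(x_i) = +1 if correct, -1 if incorrect.
    yh   = 1 - 2*ei;
    w_ab = w .* exp(-alpha_t*yh); w_ab = w_ab / sum(w_ab);

    disp([w_vj; w_ab])    % the two rows should match

With these numbers both rules yield the distribution [1/6 1/9 1/2 2/9].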
Exercise 2

    % Training data for Exercise 2:
    if example == 2
        blue = [ -46  18
                 -30 -30
                 -31 -19
                  -8  15
                   8 -45
                 -22   2];
        red  = [  33  38
                  30  10
                  21  35
                   1  19
                  14  23
                  37 -41];
        datafeatures = [blue; red];                        % 12 samples, 2 features (u,v)
        dataclass    = [-1 -1 -1 -1 -1 -1  1 1 1 1 1 1];   % blue = -1, red = +1
    end

                                         Adaboost v.2a   52
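One way to run this exercise is via the classic-AdaBoost package referenced in [Kroon 2010]. The sketch below assumes that package's adaboost.m is on the MATLAB path and exposes 'train'/'apply' modes; the exact call signature is an assumption based on that File Exchange entry and may differ between versions.

    % Sketch: train and apply boosting on the Exercise 2 data.
    % ASSUMPTION: adaboost.m from [Kroon 2010] with 'train'/'apply' modes.
    itt = 3;                                               % number of boosting rounds
    [trainclass, model] = adaboost('train', datafeatures, dataclass, itt);
    trainerror = mean(trainclass ~= dataclass);            % resubstitution error
    testpoint  = [10 5];                                   % a hypothetical query point
    testclass  = adaboost('apply', testpoint, model);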
Exercise2, t=0

(Figure not preserved in this extraction.)

                        Adaboost v.2a   53
Face detection idea

1) The AdaBoost examples here use axis-parallel (decision-stump) weak classifiers.
2) In Viola-Jones, the weak classifier is the specially designed Haar-feature classifier described in their paper.

                              Adaboost v.2a                54
Useful Features Learned by Boosting

(Figure not preserved in this extraction.)

                       Adaboost v.2a         55
A Cascade of Classifiers

This will be discussed in the next chapter.

                          Adaboost v.2a        56
Reference

•   [Chen 2007] Qing Chen, Nicolas D. Georganas and Emil M. Petriu, "Real-Time Vision-Based Gesture Recognition Using Haar-like Features", IMTC 2007, Warsaw, Poland, May 1-3, 2007
•   [Smyth 2007] slides: Smyth, "Face Detection using the Viola-Jones Method", http://www.ics.uci.edu/~smyth/courses/cs175/slides12_viola_jones_face_detection.ppt
•   [Deng 2007] slides: Hongbo Deng, "A Brief Introduction to Adaboost", 6 Feb 2007, http://dtpapers.googlecode.com/files/Hongbo.ppt
•   [Freund] slides: "A Tutorial on Boosting", www.cs.toronto.edu/~hinton/csc321/notes/boosting.pdf
•   [Hoiem 2004] slides: Derek Hoiem, "Adaboost", March 31, 2004, http://www.cs.uiuc.edu/~dhoiem/presentations/Adaboost_Tutorial.ppt
•   [Jensen 2008] Jensen, "Implementing the Viola-Jones Face Detection Algorithm", http://orbit.dtu.dk/getResource?recordId=223656&objectId=1&versionId=1
•   http://informatik.unibas.ch/lehre/ws05/cs232/_Folien/08_AdaBoost.pdf
•   [Boris Babenko] Boris Babenko, "Note: A Derivation of Discrete AdaBoost", Department of Computer Science and Engineering, University of California, San Diego, http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf
•   [Kroon 2010] http://www.mathworks.com/matlabcentral/fileexchange/27813-classic-adaboost-classifier

                                               Adaboost v.2a                                    57
Matlab demo

•   [Kroon 2010] http://www.mathworks.com/matlabcentral/fileexchange/27813-classic-adaboost-classifier
•   http://people.csail.mit.edu/torralba/shortCourseRLOC/boosting/boosting.html

                              Adaboost v.2a              58
Answer0: Exercise for Step1a, 1b

Step1a: find the classifier $h_t: X \to \{-1, +1\}$ that minimizes the error with respect to $D_t$.
Step1b: checking step. Prerequisite: $\varepsilon_t < 0.5$, otherwise stop.

•   Assume h( ) can only be a horizontal or vertical separator. How many different classifiers are available?
    •   Answer: because there are 12 training samples, there are 11×2 vertical + 11×2 horizontal classifiers (11 gaps per axis, 2 polarities each), total = 44.
•   If a classifier hj( ) is selected, circle the misclassified training samples and compute ε, the misclassification probability, given that the probability distribution D is the same for each sample. (A sketch of this stump search follows this slide.)
    •   Answer: each sample has weight 1/12; with 4 misclassified samples, ε = 4 × (1/12) = 1/3.
•   Find the h( ) with minimum error.
    •   Answer: repeat the above for all 44 classifiers, compare, and pick the one with the smallest ε.

                                                 Adaboost v.2a                       59
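As referenced above, the 44-classifier search can be made concrete. The following MATLAB sketch (not from the original slides) enumerates axis-parallel stumps with both polarities and reports the minimum weighted error; it assumes datafeatures and dataclass are set up as in Exercise 2, and the variable names are illustrative:

    % Enumerate axis-parallel decision stumps and pick the minimum
    % weighted error with respect to the distribution D.
    D = ones(1,12)/12;                    % uniform initial distribution
    best.err = inf;
    for dim = 1:2                         % u-axis, then v-axis
        vals = sort(datafeatures(:,dim));
        thr  = (vals(1:end-1) + vals(2:end))/2;   % 11 mid-point thresholds
        for k = 1:numel(thr)
            for p = [1 -1]                % two polarities
                pred = p * sign(datafeatures(:,dim)' - thr(k));
                err  = sum(D(pred ~= dataclass));
                if err < best.err
                    best = struct('err',err, 'dim',dim, 'thr',thr(k), 'p',p);
                end
            end
        end
    end
    fprintf('best stump: dim=%d thr=%.1f p=%d eps=%.3f\n', ...
            best.dim, best.thr, best.p, best.err);

The loop visits 2 dims × 11 thresholds × 2 polarities = 44 candidates, matching the count on the slide.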
Answer, Exercise 2, t=1

(Figure not preserved in this extraction.)

                       Adaboost v.2a   60
Answer, Exercise 2, t=2

(Figure not preserved in this extraction.)

                       Adaboost v.2a   61
Answer, Exercise 2, t=3

(Figure not preserved in this extraction.)

                       Adaboost v.2a   62
Answer, Exercise 2, strong classifier

(Figure not preserved in this extraction.)

                        Adaboost v.2a         63
Testing example 2

The final strong classifier:

$$ H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right) $$

(Figure not preserved in this extraction.)

                       Adaboost v.2a                                 64
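A minimal MATLAB sketch (not on the original slide) of evaluating this strong classifier, assuming the T selected stumps are stored in a struct array with the illustrative fields dim, thr, p and alpha:

    % Evaluate H(x) = sign(sum_t alpha_t * h_t(x)) for query points x
    % (one point per row). ASSUMPTION: stumps(t) holds fields
    % .dim, .thr, .p, .alpha chosen during training (names are illustrative).
    function class = strong_classify(x, stumps)
        score = 0;
        for t = 1:numel(stumps)
            h = stumps(t).p * sign(x(:,stumps(t).dim) - stumps(t).thr);
            score = score + stumps(t).alpha * h;    % weighted weak votes
        end
        class = sign(score);                        % +1 (red) or -1 (blue)
    end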
Assignment

Find the strong classifier from this training data set. Write clearly the type of h( ) at each stage t (e.g. left = blue, right = red, threshold at u or v, etc.) and the values of ε and α of each stage.

The final strong classifier:

$$ H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right) $$

    % Training data for the assignment:
    if example == 3
        blue = [ -26 -18
                 -30 -30
                  31 -19
                  -8 -15
                 -22   2];
        red  = [  33  38
                  30  10
                  21  35
                   1  19
                  37 -41];
        datafeatures = [blue; red];                 % 10 samples, 2 features (u,v)
        dataclass    = [-1 -1 -1 -1 -1  1 1 1 1 1]; % blue = -1, red = +1
    end

                                   Adaboost v.2a                                 65
AdaBoost Assignment, t=0

(Figure not preserved in this extraction.)

                      Adaboost v.2a   66