Study on Ensemble Learning


By Feng Zhou
Content
• Introduction
• A Statistical View of M3 Network
• Future Works
Introduction
• Ensemble learning:
– To combine a group of classifiers rather than to design a new one.
– The decisions of multiple hypotheses are combined to produce more accurate results.
• Problems in traditional learning algorithms
–   Statistical Problem
–   Computational Problem
–   Representation Problem
• Related Works
– Resampling techniques: Bagging, Boosting
– Approaches for extending to multi-class problem:
One-vs-One, One-vs-All.
Min-Max-Modular (M3) Network
(Lu, IEEE TNN 1999)

• Steps
– Dividing training sets. (Chen, IJCNN 2006; Wen, ICONIP 2005)
– Training pair-wise classifiers
– Integrating the outcomes (Zhao, IJCNN 2005)
• Min process
• Max process
[Figure: Min-Max integration of module outputs]
Module outputs        Min
0.1 0.5 0.7 0.2   →   0.1
0.4 0.3 0.5 0.6   →   0.3
0.8 0.5 0.4 0.2   →   0.2
0.5 0.9 0.7 0.3   →   0.3
Max of the row minima: 0.3
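The min-max integration above can be sketched in a few lines (the module outputs are the example values from the figure, not outputs of a trained network):

```python
# Min-Max module combination: each row holds the outputs of the pairwise
# modules sharing one negative subset; take the MIN within each row,
# then the MAX across the row minima.
module_outputs = [
    [0.1, 0.5, 0.7, 0.2],
    [0.4, 0.3, 0.5, 0.6],
    [0.8, 0.5, 0.4, 0.2],
    [0.5, 0.9, 0.7, 0.3],
]

row_minima = [min(row) for row in module_outputs]  # Min process
combined = max(row_minima)                         # Max process
print(combined)                                    # 0.3, as in the figure
```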
A Statistical View

• Assumption
– The pair-wise classifier outputs a probabilistic value.
Sigmoid function (J.C. Platt, ALMC 1999):
P(w+ | x) = 1 / (1 + e^(A·x + B))

• Bayesian decision theory
ŵ = argmax over w ∈ {w+, w−} of P(w | x),
where P(w+ | x) = P(x | w+) P(w+) / [ P(x | w+) P(w+) + P(x | w−) P(w−) ]
A Simple Discrete Example
P(x | w)      w+      w−

x1            1/2      –

x2            1/2     2/5

x3             –      2/5

x4             –      1/5
A Simple Discrete Example (II)
Pc0(w+|x=x2) = 1/3
Pc1(w+|x=x2) = 1/2
Pc2(w+|x=x2) = 1/2

Pc0 < min(Pc1,Pc2)
Classifier 0 (w+:w-)

Classifier 1 (w+:w1-)                   Classifier 2 (w+:w2-)
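The three posteriors can be checked with joint counts. The counts below assume w+ holds samples {x1, x2} and w− holds {x2, x2, x3, x3, x4}, with w1− and w2− each receiving one of the x2 samples — an assumption consistent with the stated values:

```python
from fractions import Fraction

def posterior(pos_count, neg_count):
    """P(w+ | x) from joint counts of x in the positive / negative set."""
    return Fraction(pos_count, pos_count + neg_count)

# Occurrences of x2 in each training set (assumed from the table):
pc0 = posterior(1, 2)   # classifier 0: w+ vs the whole w-  -> 1/3
pc1 = posterior(1, 1)   # classifier 1: w+ vs w1-           -> 1/2
pc2 = posterior(1, 1)   # classifier 2: w+ vs w2-           -> 1/2

assert pc0 < min(pc1, pc2)   # the global posterior lies below the minimum
```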
A More Complicated Example
•   When one more classifier is considered, the evidence that x belongs to w+ shrinks, while the information about w− increases.

•   Pglobal(w+) < min(Ppartial(w+))

•   The classifier reporting the minimum value is the one whose training set contains w− (Minimization principle).

•   If Ppartial(w+) = 1, no w− is contained.

Classifier 1 (w+:w1-)        Classifier 2 (w+:w2-)
Analysis
• For each classifier cij:
P(wi+ | x, wi+ ∪ wj−) = P(x, wi+) / [ P(x, wi+) + P(x, wj−) ] ≜ Mij

• For each sub-positive class wi+:
P(wi+ | x, wi+ ∪ w−) = 1 / [ Σj 1/Mij − (n− − 1) ] ≜ qi

• For the positive class w+:
P(w+ | x) = 1 / [ 1 + ( Σi qi / (1 − qi) )^(−1) ]
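The two combination formulas can be verified against exact ground truth. The joint probabilities below are made up for illustration; the check confirms that the qi recovered from the Mij, and the posterior recovered from the qi, match direct Bayes computation:

```python
def q(M_row):
    """q_i = P(w_i+ | x, w_i+ u w-) recovered from pairwise outputs M_ij."""
    n_neg = len(M_row)
    return 1.0 / (sum(1.0 / m for m in M_row) - (n_neg - 1))

def positive_posterior(qs):
    """P(w+ | x) recovered from the per-subset posteriors q_i."""
    odds = sum(qi / (1.0 - qi) for qi in qs)   # = P(x, w+) / P(x, w-)
    return odds / (1.0 + odds)

# Ground-truth joint probabilities for two positive and two negative subsets
p_pos = [0.1, 0.2]    # P(x, w_i+)
p_neg = [0.05, 0.15]  # P(x, w_j-)

M = [[pp / (pp + pn) for pn in p_neg] for pp in p_pos]
qs = [q(row) for row in M]           # 0.1/0.3 and 0.2/0.4
post = positive_posterior(qs)        # direct answer: 0.3 / 0.5 = 0.6
```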
Analysis (II)
• Decomposition of a complex problem

• Restoration to the original resolution
Composition of Training Sets
[Figure: grid of training-set pairs over the subsets w1+ … wn++ and w1− … wn−−. The w+ vs. w− pairs have been used; the same-class pairs are not used yet; the diagonal pairs form trivial, useless sets.]
Another Way of Combination
[Figure: the same grid of subset pairs, now also using the same-class combinations.]

q'k = 1 / [ Σi 1/M'ki + Σj 1/Mkj − (n+ + n− − 2) ]

Training and testing time: O(n+ · n−) → O(n+ + n−)
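The claimed saving in module count can be made concrete by evaluating the two expressions (the subset counts below are arbitrary example values):

```python
def module_counts(n_pos, n_neg):
    """Pairwise scheme (n+ x n- modules) vs. the alternative combination
    above (n+ + n- modules)."""
    return n_pos * n_neg, n_pos + n_neg

pairwise, alternative = module_counts(10, 10)   # 100 modules vs. 20
```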
Experiments – Synthetic Data
Experiments – Text Categorization
(20 Newsgroups corpus)
Experiments Setup
• Removing words: stemming; stop words; words occurring < 30 times

• Using Naïve Bayes as the elementary classifier

• Estimating the probability with a sigmoid function
Future Work
• Situation with consideration of noise
– The crux of the problem: to access the underlying distribution
– Independent parameters for the model
– Constraints we get: C(n+ + n−, 2)
– To obtain the best estimation: Kullback–Leibler distance (T. Hastie, Ann. Statist. 1998)
References
[1] T. Hastie & R. Tibshirani, "Classification by pairwise coupling," Ann. Statist., 1998.
[2] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," ALMC, 1999.
[3] B. Lu & M. Ito, "Task decomposition and module combination based on class relations: a modular neural network for pattern classification," IEEE Trans. Neural Networks, 1999.
[4] Y. M. Wen & B. Lu, "Equal clustering makes min-max modular support vector machines more efficient," ICONIP, 2005.
[5] H. Zhao & B. Lu, "On efficient selection of binary classifiers for min-max modular classifier," IJCNN, 2005.
[6] K. Chen & B. Lu, "Efficient classification of multi-label and imbalanced data using min-max modular classifiers," IJCNN, 2006.
