Ensemble Learning


Ensemble Learning
• what is an ensemble?
• why use an ensemble?
• selecting component classifiers
• selecting combining mechanism
• some results

A Classifier Ensemble
[Figure: Input Features feed Classifier 1, Classifier 2, ..., Classifier N; their Class Predictions go to a Combiner, which outputs the final Class Prediction]

CS 5751 Machine Learning, Ensemble Learning

Key Ensemble Questions
Which components to combine?
• different learning algorithms
• same learning algorithm trained in different ways
• same learning algorithm trained the same way
How to combine classifications?
• majority vote
• weighted (confidence of classifier) vote
• weighted (confidence in classifier) vote
• learned combiner
What makes a good (accurate) ensemble?

Why Do Ensembles Work?
Hansen and Salamon, 1990
If we can assume classifiers are random in their predictions and accuracy > 50%, we can push accuracy arbitrarily high by combining more classifiers
Key assumption: classifiers are independent in their predictions
• not a very reasonable assumption
• more realistic: for data points where classifiers predict with > 50% accuracy, accuracy can be pushed arbitrarily high (some data points are just too hard)
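Hansen and Salamon's argument can be checked numerically: for n independent classifiers, each correct with probability p > 0.5, the majority vote's accuracy grows toward 1 as n increases. A minimal sketch (the function name is illustrative, not from the slides):

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent classifiers,
    each correct with probability p, picks the right class
    (binary task; use odd n so there are no ties)."""
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

for n in (1, 11, 101):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

With p = 0.6 the voted accuracy rises well above 0.95 by n = 101; with p = 0.5 it stays at 0.5, which is why the >50% assumption matters.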

What Makes a Good Ensemble?
Krogh and Vedelsby, 1995
Can show that the error of an ensemble is mathematically related to the error and diversity of its components:

    Ê = Ē − D̄

Ê is the error of the entire ensemble
Ē is the average error of the component classifiers
D̄ is a term measuring the diversity of the components
Effective ensembles have accurate and diverse components

Ensemble Mechanisms - Components
• Separate learning methods
  – not often used
  – very effective in certain problems (e.g., protein folding; Rost and Sander, Zhang)
• Same learning method
  – generally still need to vary something externally
    • exception: some good results with neural networks
  – most often, the data set used for training is varied:
    • Bagging (Bootstrap and Aggregate), Breiman
    • Boosting, Freund & Schapire
      – AdaBoost, Freund & Schapire
      – Arcing, Breiman
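For a regression ensemble combined by simple averaging under squared error, the decomposition Ê = Ē − D̄ holds exactly, with D̄ the mean squared deviation of the components from the ensemble prediction (the "ambiguity"). A numeric sketch on synthetic data (all names and the noise model are illustrative):

```python
import random

random.seed(0)
N, M = 200, 5                                  # examples, component models
y = [random.gauss(0, 1) for _ in range(N)]     # regression targets
# M noisy component predictors (synthetic, for the check only)
preds = [[yi + random.gauss(0, 0.5) for yi in y] for _ in range(M)]

ens = [sum(p[n] for p in preds) / M for n in range(N)]   # averaging combiner
E_hat = sum((ens[n] - y[n]) ** 2 for n in range(N)) / N
E_bar = sum((p[n] - y[n]) ** 2 for p in preds for n in range(N)) / (M * N)
D_bar = sum((p[n] - ens[n]) ** 2 for p in preds for n in range(N)) / (M * N)

assert abs(E_hat - (E_bar - D_bar)) < 1e-9     # Ê = Ē − D̄
```

Since D̄ ≥ 0, the ensemble is never worse than its average component, and it improves exactly as much as the components disagree.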
Ensemble Mechanisms - Combiners
• Voting
• Averaging (if predictions are not 0/1)
• Weighted Averaging
  – base weights on confidence in component
• Learned combiner
  – Stacking, Wolpert
    • general combiner
  – RegionBoost, Maclin
    • piecewise combiner

Bagging
Varies the data set
Each training set is a bootstrap sample
  bootstrap sample - select a set of examples (with replacement) from the original sample
Algorithm:
  for k = 1 to #classifiers
    train′ = bootstrap sample of train set
    create classifier using train′ as training set
  combine classifications using simple voting
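The bagging loop above can be sketched directly; `learn` below is a hypothetical stand-in for any base learning method (a function from a training set to a callable classifier), not something defined on the slides:

```python
import random
from collections import Counter

def bagging(train, n_classifiers, learn):
    """Train an ensemble on bootstrap samples of `train`."""
    classifiers = []
    for _ in range(n_classifiers):
        # bootstrap sample: draw |train| examples with replacement
        sample = [random.choice(train) for _ in range(len(train))]
        classifiers.append(learn(sample))
    return classifiers

def vote(classifiers, x):
    # combine classifications using simple (majority) voting
    counts = Counter(clf(x) for clf in classifiers)
    return counts.most_common(1)[0][0]
```

Each bootstrap sample leaves out roughly a third of the original examples, which is what makes the components differ even though the learning method is the same.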


Boosting
Schapire showed that a set of weak learners (learners with > 50% accuracy, but not much greater) can be combined into a strong learner
Idea: weight the data set based on how well we have predicted the data points so far
  – data points predicted accurately - low weight
  – data points mispredicted - high weight
Result: focuses components on the portion of the data space not previously well predicted

Boosting - AdaBoost
Varies weights on the training data
Algorithm:
  for each data point i: set weight wi to 1/#datapoints
  for k = 1 to #classifiers
    generate classifierk with the current weighted train set
    εk = sum of the wi's of the misclassified points
    βk = (1 − εk) / εk
    multiply the weights of all misclassified points by βk
    normalize the weights to sum to 1
  combine: weighted vote, the weight for classifierk is log(βk)
Q: what to do if εk = 0.0 or εk > 0.5?
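The reweighting loop can be sketched as below. `learn_weighted` is a hypothetical interface, (points, labels, weights) → classifier, since the slides don't specify one; the sketch answers the slide's final question by simply stopping when εk = 0 or εk ≥ 0.5:

```python
import math

def boost(points, labels, n_rounds, learn_weighted):
    """Boosting loop as on the slide (a sketch, not a full AdaBoost)."""
    n = len(points)
    w = [1.0 / n] * n                        # wi = 1/#datapoints
    ensemble = []                            # (classifierk, vote weight) pairs
    for _ in range(n_rounds):
        clf = learn_weighted(points, labels, w)
        miss = [i for i in range(n) if clf(points[i]) != labels[i]]
        eps = sum(w[i] for i in miss)        # εk
        if eps == 0.0 or eps >= 0.5:         # the slide's open question
            break
        beta = (1 - eps) / eps               # βk = (1 − εk)/εk > 1
        for i in miss:                       # up-weight the mistakes
            w[i] *= beta
        total = sum(w)
        w = [wi / total for wi in w]         # normalize weights to sum to 1
        ensemble.append((clf, math.log(beta)))  # vote weight log(βk)
    return ensemble
```

Because βk > 1 whenever εk < 0.5, misclassified points grow in weight and the next component concentrates on them.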

Boosting - Arcing
Sample the data set (like Bagging), but the probability of a data point being chosen is weighted (like Boosting)
mi = number of mistakes made on point i by previous classifiers
Probability of selecting point i:

    probi = (1 + mi⁴) / Σj (1 + mj⁴)

The value 4 was chosen empirically
Combine using voting

Some Results - BP, C4.5 Components
Dataset    C4.5  BP    BagC4  BagBP  AdaC4  AdaBP  ArcC4  ArcBP
letter     14.0  18.0   7.0   10.5    4.1    5.7    3.9    4.6
segment     3.7   6.6   3.0    5.4    1.7    3.5    1.5    3.3
promoter   12.8   5.3  10.6    4.0    6.8    4.5    6.4    4.6
kr-vs-kp    0.6   2.3   0.6    0.8    0.3    0.4    0.4    0.3
splice      5.9   4.7   5.4    3.9    5.1    4.0    5.3    4.2
breastc     5.0   3.4   3.7    3.4-   3.5    3.8-   3.5    4.0-
housev      3.6   4.9   3.6    4.1    5.0-   5.1-   4.8-   5.3-
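The arcing selection rule, probi = (1 + mi⁴) / Σj (1 + mj⁴), is a one-liner to compute (function name illustrative):

```python
def arcing_probs(mistakes):
    """Selection probabilities from per-point mistake counts mi;
    the exponent 4 was chosen empirically (per the slide)."""
    weights = [1 + m ** 4 for m in mistakes]
    total = sum(weights)
    return [wt / total for wt in weights]
```

A point with no mistakes keeps weight 1, while a point missed twice gets weight 1 + 2⁴ = 17, so hard points quickly dominate the sample while every point retains nonzero probability.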
Some Theories on Bagging/Boosting
Error = Bayes Optimal Error + Bias + Variance
Bayes Optimal Error = noise error
Theories:
  Bagging can reduce the variance part of the error
  Boosting can reduce the variance AND the bias parts of the error
  Bagging will hardly ever increase error
  Boosting may increase error
  Boosting is susceptible to noise
  Boosting increases margins

Combiner - Stacking
Idea:
  generate component (level 0) classifiers with part of the data (half, three quarters)
  train a combiner (level 1) classifier to combine the predictions of the components using the remaining data
  retrain the component classifiers with all of the training data
In practice, often equivalent to voting
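The three stacking steps can be sketched as below; `learn0` and `learn1` are hypothetical interfaces (each maps a list of (input, label) or (predictions, label) pairs to a callable), since the slides don't fix a particular learner:

```python
def stack(train, learn0, learn1, n_components, frac=0.5):
    """Stacking sketch: level-0 components plus a learned level-1 combiner."""
    cut = int(len(train) * frac)
    part, rest = train[:cut], train[cut:]
    # 1) level-0 components trained on part of the data
    level0 = [learn0(part) for _ in range(n_components)]
    # 2) level-1 combiner trained on component predictions for held-out data
    meta = [([clf(x) for clf in level0], y) for x, y in rest]
    combiner = learn1(meta)
    # 3) retrain the components with all of the training data
    level0 = [learn0(train) for _ in range(n_components)]
    return level0, combiner
```

Holding out data for the combiner matters: trained on the components' training-set predictions, it would just learn to trust overfit outputs.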

Combiner - RegionBoost
• Train a “weight” classifier for each component classifier
• The “weight” classifier predicts how likely a point is to be predicted correctly by its component
• “Weight” classifiers: k-Nearest Neighbor, Backprop
• To combine: generate each component classifier’s prediction and weight it using the corresponding “weight” classifier
• Small gains in accuracy
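The combination step can be sketched as a per-point weighted vote; the component and weighter callables here are hypothetical stand-ins (the slide uses k-NN or backprop models as the weighters):

```python
def regionboost_predict(components, weighters, x):
    """RegionBoost-style combiner: each component's vote is scaled by
    its paired "weight" model's estimate that it is correct at x."""
    votes = {}
    for clf, weigh in zip(components, weighters):
        label = clf(x)
        votes[label] = votes.get(label, 0.0) + weigh(x)
    return max(votes, key=votes.get)
```

Unlike a fixed weighted vote, the weights here depend on x, so a component can dominate in the region of the input space where it is reliable and be ignored elsewhere, which is why the combiner is called piecewise.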

