# Support Vector Neural Training

Włodzisław Duch

Department of Informatics
Nicolaus Copernicus University, Toruń, Poland
School of Computer Engineering,
Nanyang Technological University, Singapore

ICANN Warsaw, Sept. 2005
## Plan

• Main idea.
• Support Vector Machines and active learning.
• Neural Networks and Support Vectors.
• Pedagogical example.
• Results on real data.
## Main idea

• What data should be used for training?

Given conditional distributions P(X|C) for dengue fever for:
• the world population,
• ASEAN countries,
• Singapore only,
• Choa Chu Kang only?

Which distributions should we use?
If we know that X is from Choa Chu Kang and P(X|C) is reliable,
the local knowledge should be used.

If X comes from a region close to the decision borders, why use data
from regions far away?
## Learning

• MLP/RBF: fast MSE reduction at first, very slow later.

Typical MSE(t) learning curve: after 10 iterations almost all of the work is
done, but the final convergence is achieved only after a very long time.
What is going on?
## Learning trajectories

• Take the weights Wi from iterations i = 1..K; PCA on the Wi covariance
matrix captures about 95% of the variance for most data, so the error
function plotted in 2D shows realistic learning trajectories.

(Papers by M. Kordos & W. Duch.)
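The trajectory analysis described above can be sketched as follows. This is a minimal illustration with a synthetic weight trajectory standing in for a real training run; in an actual experiment `W[i]` would be the MLP weight vector recorded at iteration i, and all names here are mine.

```python
import numpy as np

# Sketch: collect the weight vector after each iteration, run PCA on the
# centered trajectory, and project it onto the top-2 component plane.
# The trajectory below is synthetic (a smooth drift plus small noise).
rng = np.random.default_rng(0)
K, d = 50, 20                         # iterations, number of weights
drift = rng.normal(size=(1, d))       # dominant direction of weight growth
t = np.linspace(0.0, 1.0, K)[:, None]
W = t @ drift + 0.01 * rng.normal(size=(K, d))   # synthetic trajectory

Wc = W - W.mean(axis=0)               # center before PCA
U, S, Vt = np.linalg.svd(Wc, full_matrices=False)
traj2d = Wc @ Vt[:2].T                # 2-D trajectory, ready for plotting
explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(f"top-2 components explain {explained:.1%} of the variance")
```

Because weight growth is dominated by one direction, the top two components capture nearly all of the variance, which is why a 2-D projection of a real trajectory is meaningful.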

Instead of local minima, large flat valleys are seen – why?
Data far from the decision borders has almost no influence; the main
reduction of MSE is achieved by increasing ||W||, i.e. by sharpening the
sigmoidal functions.
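A tiny numerical illustration of this effect (the values are hypothetical, for a single sigmoid output): rescaling the weights leaves the decision border W·X = 0 unchanged but sharpens the sigmoid, so the error of far-away patterns shrinks toward zero.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A pattern at activation W.X = 2 with target 1: scaling W by a factor c
# keeps the border (W.X = 0) fixed but drives the MSE contribution of this
# far-away pattern toward zero - large MSE drops without moving the border.
wx, target = 2.0, 1.0
scales = (1.0, 2.0, 4.0, 8.0)
errors = [(target - sigmoid(c * wx)) ** 2 for c in scales]
for c, err in zip(scales, errors):
    print(f"scale {c}: squared error {err:.2e}")
```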
## Support Vectors

SVM gradually focuses on the training vectors near the decision
hyperplane – can we do the same with an MLP?
## Selecting Support Vectors

Active learning: if the contribution to the parameter change is
negligible, remove the vector from the training set.

$$\frac{\partial E(\mathbf{W})}{\partial W_{ij}} = -\sum_{k=1}^{K}\left(Y_k - M_k(\mathbf{X};\mathbf{W})\right)\frac{\partial M_k(\mathbf{X};\mathbf{W})}{\partial W_{ij}}$$

If the difference

$$e_{\mathbf{W}}(\mathbf{X}) = \sum_{k=1}^{K}\left|Y_k - M_k(\mathbf{X};\mathbf{W})\right|$$

is sufficiently small, the pattern X will have negligible influence on
the training process and may be removed from the training.
Conclusion: select vectors with e_W(X) > e_min for training.

Two problems: possible oscillations, and the strong influence of outliers.
Solution: adjust e_min dynamically to avoid oscillations;
also remove vectors with e_W(X) > 1 - e_min = e_max.
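A minimal sketch of this selection rule (function and variable names are mine, not from the paper): keep only patterns whose summed output error falls strictly between e_min and 1 - e_min.

```python
import numpy as np

def select_support_vectors(Y, M, e_min):
    """Keep pattern X only when e_min < e_W(X) < 1 - e_min.

    Y, M: (n_samples, K) arrays of targets and network outputs in [0, 1].
    Patterns with tiny error contribute almost nothing to the gradient;
    patterns with error near 1 are likely outliers. Both are dropped.
    """
    e = np.abs(Y - M).sum(axis=1)          # e_W(X) for every pattern
    return (e > e_min) & (e < 1.0 - e_min)

Y = np.array([[1.0], [1.0], [0.0], [0.0]])
M = np.array([[0.99], [0.60], [0.30], [0.97]])  # illustrative net outputs
keep = select_support_vectors(Y, M, e_min=0.05)
print(keep)   # first pattern already learned, last one a likely outlier
```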
## SVNT algorithm

Initialize the network parameters W;
set e = 0.01, e_min = 0; set SV = T (the whole training set).

Until no improvement is found in the last N_last iterations do:
• Optimize the network parameters for N_opt steps on the SV data.
• Run a feedforward step on T to determine the overall accuracy and
errors; take SV = {X | e(X) ∈ [e_min, 1 - e_min]}.
• If the accuracy increases:
compare the current network with the previous best one and
keep the better one as the current best;
increase e_min = e_min + e and make a forward step selecting SVs.
• If the number of support vectors |SV| increases:
decrease e_min = e_min - e;
decrease e = e/1.2 to avoid large changes.
## XOR solution
## Satellite image data

Multi-spectral values of pixels in 3x3 neighborhoods in an 82x100
section of an image taken by the Landsat Multi-Spectral Scanner;
intensities are 0-255; the training set has 4435 samples, the test set 2000.

The central pixel in each neighborhood is red soil (1072), cotton crop
(479), grey soil (961), damp grey soil (415), soil with vegetation
stubble (470), or very damp grey soil (1038 training samples).
There are strong overlaps between some classes.

| System and parameters           | Train accuracy | Test accuracy |
|---------------------------------|----------------|---------------|
| SVNT MLP, 36 nodes, a=0.5       | 96.5           | 91.3          |
| kNN, k=3, Manhattan             | --             | 90.9          |
| SVM Gaussian kernel (optimized) | 91.6           | 88.4          |
| RBF, Statlog result             | 88.9           | 87.9          |
| MLP, Statlog result             | 88.8           | 86.1          |
| C4.5 tree                       | 96.0           | 85.0          |
## Satellite image data – MDS outputs
## Hypothyroid data

Two years of real medical screening tests for thyroid diseases: 3772 cases,
with 93 primary hypothyroid and 191 compensated hypothyroid; the
remaining 3488 cases are healthy. The test set has 3428 cases with a
similar class distribution.
21 attributes (15 binary, 6 continuous) are given, but only two of the
binary attributes (on thyroxine, and thyroid surgery) contain useful
information, so the number of attributes has been reduced to 8.

| Method                      | % train | % test |
|-----------------------------|---------|--------|
| C-MLP2LN rules              | 99.89   | 99.36  |
| MLP+SCG, 4 neurons          | 99.81   | 99.24  |
| SVM Minkovsky opt. kernel   | 100.0   | 99.18  |
| MLP+SCG, 4 neurons, 67 SV   | 99.95   | 99.01  |
| MLP+SCG, 4 neurons, 45 SV   | 100.0   | 98.92  |
| MLP+SCG, 12 neurons         | 100.0   | 98.83  |
| MLP+backprop                | 99.60   | 98.5   |
| SVM Gaussian kernel         | 99.76   | 98.4   |
## Discussion

SVNT is very easy to implement; here only the batch version
with SCG training was used.
This is a first step only, but the results are promising.
It found smaller support vector sets than SVM;
it may be useful in one-class learning;
it speeds up training.

Problems:
• possible oscillations; the selection requires more careful analysis –
but oscillations help to explore the MSE landscape;
• additional parameters – but they are rather easy to set.

More empirical tests are needed.
Thank you for lending your ears...
