A Practical Guide to SVM

Yihua Liao
Dept. of Computer Science
2/3/03
Outline
• Support vector machine basics
• GIST
• LIBSVM (SVMLight)
Classification problems
• Given: n training pairs (xi, yi), where
  xi = (xi1, xi2, …, xil) is an input vector
  and yi = +1/−1 is the corresponding
  class label (H+ / H−)
• Output: a label y for a new vector x
Support vector machines

[Figure: separating hyperplane with margins]

Goal: find the discriminating hyperplane
that maximizes the margin
A little math
• Primal problem
• Decision function
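(The equations on this slide were images and did not survive conversion; below is the standard soft-margin formulation from the SVM literature, matching the slide's two labels.)

Primal problem:

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \quad \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}\xi_i
\qquad \text{s.t.} \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0
```

Decision function (kernelized, in terms of the dual variables $\alpha_i$):

```latex
f(\mathbf{x}) = \operatorname{sgn}\!\Big(\sum_{i=1}^{n} \alpha_i\, y_i\, K(\mathbf{x}_i,\mathbf{x}) + b\Big)
```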
Example
• Functional classification of yeast
  genes based on DNA microarray
  expression data.
• Training dataset
  – genes known to have the same
    function f (positive examples)
  – genes known to have a different
    function than f (negative examples)
Gist
• http://microarray.cpmc.columbia.edu/gist/
• Developed by William Stafford Noble
  et al.
• Contains tools for SVM classification,
  feature selection and kernel principal
  components analysis.
• Linux/Solaris. Installation is
  straightforward.
Data files
• Sample.mtx (tab-delimited; test files use the same format)
gene     alpha_0X  alpha_7X  alpha_14X  alpha_21X  …
YMR300C  -0.1       0.82      0.25      -0.51      …
YAL003W   0.01     -0.56      0.25      -0.17      …
YAL010C  -0.2      -0.01     -0.01      -0.36      …
…
• Sample.labels
gene      Respiration_chain_complexes.mipsfc
YMR300C   -1
YAL003W   1
YAL010C   -1
Usage of Gist
• $ compute-weights -train sample.mtx -class sample.labels > sample.weights
• $ classify -train sample.mtx -learned sample.weights -test test.mtx > test.predict
• $ score-svm-results -test test.labels test.predict sample.weights
Test.predict
# Generated by classify
# Gist, version 2.0
…
gene      classification  discriminant
YKL197C   -1              -3.349
YGL022W   -1              -4.682
YLR069C   -1              -2.799
YJR121W    1               0.7072
Output of score-svm-results
Number of training examples: 1644 (24 positive, 1620 negative)
Number of support vectors: 60 (14 positive, 46 negative) 3.65%
Training results: FP=0 FN=3 TP=21 TN=1620
Training ROC: 0.99874
Test results: FP=12 FN=1 TP=9 TN=801
Test ROC: 0.99397
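The test counts above can be turned into the usual summary rates. A minimal sketch (the helper name is ours, not part of Gist), fed with the FP/FN/TP/TN values reported by score-svm-results:

```python
# Sketch: derive summary rates from the score-svm-results test counts
# shown above (FP=12, FN=1, TP=9, TN=801).

def rates(tp, tn, fp, fn):
    """Return (accuracy, precision, recall/TPR, FPR) as floats."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)    # true positive rate
    fpr = fp / (fp + tn)       # false positive rate
    return accuracy, precision, recall, fpr

acc, prec, rec, fpr = rates(tp=9, tn=801, fp=12, fn=1)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} FPR={fpr:.4f}")
```

Note how the high ROC score coexists with low precision here: with only 10 positives among 823 test examples, even 12 false positives leave accuracy near 98%.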
Parameters
• compute-weights
  – -power <value>
  – -radial -widthfactor <value>
  – -posconstraint <value>
  – -negconstraint <value>
  …
Rules of thumb
• Radial basis kernel usually performs
  better.
• Scale your data: scale each attribute
  to [0,1] or [-1,+1] to avoid over-fitting.
• Try different penalty parameters C
  for the two classes in case of
  unbalanced data.
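The per-attribute scaling rule above can be sketched in a few lines of Python (the function name and sample values are ours for illustration; in practice svm-scale, shown later, does this for LIBSVM-format files):

```python
# Sketch: linearly rescale each attribute (column) to [-1, +1],
# following the "scale your data" rule of thumb.

def scale_columns(rows, lo=-1.0, hi=1.0):
    """Rescale each column of a list-of-lists so its min maps to lo
    and its max maps to hi."""
    cols = list(zip(*rows))
    scaled = []
    for col in cols:
        cmin, cmax = min(col), max(col)
        if cmax == cmin:  # constant column: map to the midpoint
            scaled.append([(lo + hi) / 2] * len(col))
        else:
            scaled.append([lo + (hi - lo) * (v - cmin) / (cmax - cmin)
                           for v in col])
    return [list(r) for r in zip(*scaled)]

data = [[-0.1, 0.82], [0.01, -0.56], [-0.2, -0.01]]
print(scale_columns(data))
```

The same scale factors computed on the training set must be reused on the test set, so that both are mapped consistently.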
LIBSVM
• http://www.csie.ntu.edu.tw/~cjlin/libsvm/
• Developed by Chih-Jen Lin et al.
• Tools for (multi-class) SV
  classification and regression.
• C++/Java/Python/Matlab/Perl
• Linux/UNIX/Windows
• SMO implementation, fast!!!
Data files for LIBSVM
• Training.dat
+1 1:0.708333 2:1 3:1 4:-0.320755
-1 1:0.583333 2:-1 4:-0.603774 5:1
+1 1:0.166667 2:1 3:-0.333333 4:-0.433962
-1 1:0.458333 2:1 3:1 4:-0.358491 5:0.374429
…
• Testing.dat
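The Training.dat lines above use LIBSVM's sparse format: a label followed by index:value pairs, with 1-based indices and absent indices implicitly zero. A minimal parser sketch (the function name is ours, not part of LIBSVM):

```python
def parse_libsvm_line(line, n_features):
    """Parse one 'label idx:val idx:val ...' line into (label, dense x).
    Indices are 1-based; missing indices default to 0.0 (the sparse
    convention used by LIBSVM data files)."""
    parts = line.split()
    label = int(parts[0])
    x = [0.0] * n_features
    for pair in parts[1:]:
        idx, val = pair.split(":")
        x[int(idx) - 1] = float(val)
    return label, x

label, x = parse_libsvm_line("-1 1:0.583333 2:-1 4:-0.603774 5:1", n_features=5)
print(label, x)  # feature 3 was omitted, so it stays 0.0
```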
Usage of LIBSVM
• $ svm-train -c 10 -w1 1 -w-1 5 Train.dat My.model
  – trains a classifier with penalty 10 for class +1 and penalty 50 for class −1, using the (default) RBF kernel

• $svm-predict Test.dat My.model My.out
• $svm-scale Train_Test.dat > Scaled.dat
Output of LIBSVM
• Svm-train
optimization finished, #iter = 219
nu = 0.431030
obj = -100.877286, rho = 0.424632
nSV = 132, nBSV = 107
Total nSV = 132
Output of LIBSVM
• Svm-predict
Accuracy = 86.6667% (234/270) (classification)
Mean squared error = 0.533333 (regression)
Squared correlation coefficient = 0.532639 (regression)
• Calculate FP, FN, TP, TN from My.out
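The FP/FN/TP/TN tally from My.out can be sketched as follows. This assumes My.out holds one predicted label (+1/−1) per line and that the true labels are the first token of each Test.dat line (the LIBSVM format shown earlier); the helper name is ours:

```python
# Sketch: tally TP, TN, FP, FN by comparing svm-predict output
# against the true labels from the test file.

def confusion(true_labels, predicted_labels):
    """Count (TP, TN, FP, FN) for +1/-1 labels."""
    tp = tn = fp = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == 1 and p == 1:
            tp += 1
        elif t == -1 and p == -1:
            tn += 1
        elif t == -1 and p == 1:
            fp += 1
        else:  # t == 1, p == -1
            fn += 1
    return tp, tn, fp, fn

# Typical driver, using the file names from the slides:
# truth = [int(line.split()[0]) for line in open("Test.dat")]
# preds = [int(line.split()[0]) for line in open("My.out")]
# print(confusion(truth, preds))
print(confusion([1, -1, 1, -1], [1, 1, -1, -1]))  # (1, 1, 1, 1)
```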