
A Practical Guide to SVM

Yihua Liao
Dept. of Computer Science
2/3/03

Outline
Support vector machine basics
GIST
LIBSVM (SVMlight)

Classification problems
Given: n training pairs (x_i, y_i), where
x_i = (x_i1, x_i2, ..., x_il) is an input vector, and
y_i = +1/-1 is the corresponding class label (H+ / H-)
Out: a label y for a new vector x

Support vector machines

Goal: to find the discriminator
that maximizes the margin

A little math
Primal problem

Decision function
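
For reference, the standard soft-margin primal problem and decision function, in common textbook notation (a sketch of what the slide states, not necessarily its exact symbols):

\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i
\quad \text{s.t.} \quad y_i(w \cdot x_i + b) \ \ge\ 1 - \xi_i, \qquad \xi_i \ge 0

f(x) = \operatorname{sign}\Big(\sum_{i=1}^{n} \alpha_i\, y_i\, K(x_i, x) + b\Big)

where the \alpha_i are the support-vector coefficients from the dual and K is the kernel. Maximizing the margin corresponds to minimizing \|w\|, since the margin equals 2/\|w\|.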

Example
Functional classification of yeast genes
based on DNA microarray expression data.
Training dataset:
genes known to have the same function f (positive examples)
genes known to have a different function than f (negative examples)

Gist
http://microarray.cpmc.columbia.edu/gist/
Developed by William Stafford Noble et al.
Contains tools for SVM classification,
feature selection, and kernel principal
components analysis.
Runs on Linux/Solaris; installation is
straightforward.

Data files
Sample.mtx (tab-delimited; test files use the same format)

gene     alpha_0X  alpha_7X  alpha_14X  alpha_21X
YMR300C  -0.1      0.82      0.25       -0.51
YAL003W  0.01      -0.56     0.25       -0.17
YAL010C  -0.2      -0.01     -0.01      -0.36

Sample.labels

gene     Respiration_chain_complexes.mipsfc
YMR300C  -1
YAL003W  1
YAL010C  -1
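
Both files are plain tab-delimited text with a header row, so they are easy to generate or inspect programmatically. A minimal Python sketch for loading a .mtx file (the helper name is mine, not part of Gist):

# Load a Gist-style tab-delimited matrix: a header row of
# condition names, then one gene per line.
def load_mtx(path):
    with open(path) as f:
        conditions = f.readline().rstrip("\n").split("\t")[1:]
        data = {}
        for line in f:
            fields = line.rstrip("\n").split("\t")
            data[fields[0]] = [float(v) for v in fields[1:]]
    return conditions, data

conditions, data = load_mtx("sample.mtx")
print(data["YMR300C"])   # [-0.1, 0.82, 0.25, -0.51]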

Usage of Gist
$compute-weights -train sample.mtx -class sample.labels > sample.weights
$classify -train sample.mtx -learned sample.weights -test test.mtx > test.predict
$score-svm-results -test test.labels test.predict sample.weights

Test.predict
# Generated by classify
# Gist, version 2.0
...

gene     classification  discriminant
YKL197C  -1              -3.349
YGL022W  -1              -4.682
YLR069C  -1              -2.799
YJR121W  1               0.7072

Output of score-svm-results
Number of training examples: 1644 (24 positive, 1620 negative)
Number of support vectors: 60 (14 positive, 46 negative) 3.65%
Training results: FP=0 FN=3 TP=21 TN=1620
Training ROC: 0.99874
Test results: FP=12 FN=1 TP=9 TN=801
Test ROC: 0.99397
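
These counts give the usual summary metrics directly. For the test results above (my arithmetic, not part of the score-svm-results output):

\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{9 + 801}{823} \approx 0.984
\qquad
\text{FPR} = \frac{FP}{FP + TN} = \frac{12}{813} \approx 0.015
\qquad
\text{recall} = \frac{TP}{TP + FN} = \frac{9}{10} = 0.9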

Parameters
compute-weights

-power <value>
-radial
-widthfactor <value>
-posconstraint <value>
-negconstraint <value>
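
Loosely, -power and -radial/-widthfactor select the kernel, in the spirit of the usual definitions below; the exact formulas Gist uses (including any normalization) should be taken from its documentation, so treat these as standard forms rather than Gist's precise ones:

K_{\text{poly}}(x, y) = (x \cdot y + 1)^{p}
\qquad
K_{\text{rbf}}(x, y) = \exp\!\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)

-posconstraint and -negconstraint set separate soft-margin penalties for positive and negative examples, which is how the unbalanced-data advice on the next slide is put into practice.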

Rules of thumb
The radial basis kernel usually performs better.
Scale your data: map each attribute to [0,1] or [-1,+1]
so that attributes with large numeric ranges do not
dominate (see the sketch below).
Try different penalty parameters C for the two classes
in case of unbalanced data.
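
A minimal sketch of the [-1,+1] scaling in Python (hypothetical helpers, not from either package; LIBSVM's svm-scale, shown later, does the same from the command line). Fit the ranges on the training data only and reuse them for the test data:

# Per-attribute (min, max) computed on the training data.
def fit_minmax(rows):
    cols = list(zip(*rows))
    return [(min(c), max(c)) for c in cols]

# Map each attribute from [lo, hi] to [-1, +1];
# constant attributes are left at 0.
def scale(rows, ranges):
    return [[0.0 if hi == lo else 2.0 * (v - lo) / (hi - lo) - 1.0
             for v, (lo, hi) in zip(row, ranges)]
            for row in rows]

train = [[-0.1, 0.82], [0.01, -0.56], [-0.2, -0.01]]
ranges = fit_minmax(train)        # fit on training data only
print(scale(train, ranges))       # training set scaled to [-1, +1]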

LIBSVM
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Developed by Chih-Jen Lin et al.
Tools for (multi-class) SVM
classification and regression.
Interfaces for C++/Java/Python/Matlab/Perl
Runs on Linux/UNIX/Windows
SMO-type implementation, very fast!

Data files for LIBSVM
Each line: <label> <index>:<value> ... (sparse format;
indices are 1-based, and zero-valued features may be omitted)

Training.dat
+1 1:0.708333 2:1 3:1 4:-0.320755
-1 1:0.583333 2:-1 4:-0.603774 5:1
+1 1:0.166667 2:1 3:-0.333333 4:-0.433962
-1 1:0.458333 2:1 3:1 4:-0.358491 5:0.374429

Testing.dat (same format)
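
A small sketch of producing this sparse format from dense rows (hypothetical helper, not part of LIBSVM; zero entries are dropped, matching the sparse convention):

# Write (label, features) pairs in LIBSVM's sparse text format.
def write_libsvm(path, labels, rows):
    with open(path, "w") as f:
        for y, row in zip(labels, rows):
            # indices are 1-based; skip zero-valued features
            feats = " ".join(f"{j}:{v:g}"
                             for j, v in enumerate(row, start=1) if v != 0)
            f.write(f"{y:+d} {feats}\n")

write_libsvm("Train.dat", [1, -1],
             [[0.708333, 1, 1, -0.320755],
              [0.583333, -1, 0, -0.603774]])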

Usage of LIBSVM
$svm-train -c 10 -w1 1 -w-1 5 Train.dat My.model
- trains a classifier with the default RBF kernel; the effective penalty
is 10*1 = 10 for class +1 and 10*5 = 50 for class -1

$svm-predict Test.dat My.model My.out


$svm-scale Train_Test.dat > Scaled.dat
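
To scale training and test data consistently, svm-scale can save the scaling parameters learned from the training file and re-apply them to the test file (-s saves, -r restores):

$svm-scale -s Range.txt Train.dat > Train.scaled
$svm-scale -r Range.txt Test.dat > Test.scaled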

Output of LIBSVM
svm-train
optimization finished, #iter = 219
nu = 0.431030
obj = -100.877286, rho = 0.424632
nSV = 132, nBSV = 107
Total nSV = 132
(nSV = number of support vectors; nBSV = bounded support
vectors, i.e. those with alpha_i at the upper bound C)

Output of LIBSVM
svm-predict
Accuracy = 86.6667% (234/270) (classification)
Mean squared error = 0.533333 (regression)
Squared correlation coefficient = 0.532639 (regression)

Calculate FP, FN, TP, TN from My.out
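
svm-predict writes one predicted label per line to My.out, and the true labels are the first field of each line in Test.dat, so the counts fall out of a short script. A minimal Python sketch (file names follow the slides above):

# True labels: first field of each Test.dat line.
truth = [int(line.split()[0]) for line in open("Test.dat") if line.strip()]
# Predicted labels: one per line in My.out.
pred = [int(float(line.split()[0])) for line in open("My.out") if line.strip()]

tp = sum(t == 1 and p == 1 for t, p in zip(truth, pred))
tn = sum(t == -1 and p == -1 for t, p in zip(truth, pred))
fp = sum(t == -1 and p == 1 for t, p in zip(truth, pred))
fn = sum(t == 1 and p == -1 for t, p in zip(truth, pred))
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")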
