ABSTRACT
The fuzzy twin support vector machine is an important machine learning method that reduces the impact of noise and outlier data on classification. However, it still minimizes only the empirical risk, so overfitting is easily produced during training. To address this, a novel fuzzy twin support vector machine model is presented by introducing a regularization term. The classifier is obtained by solving the model with quadratic programming and with the successive over-relaxation method. Experiments on selected UCI datasets validate the effectiveness of the proposed method.
Keywords: Twin support vector machine, structural risk, empirical risk, fuzzy membership
1. INTRODUCTION
The support vector machine (SVM) is a machine learning method proposed by Vapnik et al. [1]. Its theoretical foundation is VC-dimension theory and the structural risk minimization principle. Since then, researchers have conducted extensive research and proposed a number of different support vector machines [2]-[4]. To address the sensitivity of support vector machines to noise and outliers, researchers introduced fuzzy set theory and rough set theory into the SVM, yielding fuzzy support vector machines and rough support vector machines [5]-[10]. For all of these support vector machines, obtaining the desired classification surface usually requires solving a quadratic programming problem. When the scale of the data is very large, solving the quadratic programming problem is difficult and requires a lot of computing time. To address this, Fung et al. [11] proposed the proximal support vector machine (PSVM), which obtains the solution by solving a system of linear equations. Later, Mangasarian et al. [12] proposed the generalized eigenvalue proximal support vector machine (GEPSVM), which builds on PSVM by relaxing the requirement that the two hyperplanes be parallel. Along this line, Jayadeva et al. [13] proposed the twin support vector machine (TWSVM), which constructs two non-parallel hyperplanes such that each hyperplane is closest to one of the two classes and as far as possible from the other. An important difference between TWSVM and SVM is that TWSVM solves two smaller quadratic programming problems, whereas SVM solves one larger quadratic programming problem; as a result, the running time of TWSVM is roughly 1/4 that of the original SVM. Since the two factors c1 and c2 in TWSVM only control the ratio of empirical risks, Peng [14] proposed v-TSVM, introducing the parameters v1 and v2 to control the fractions of support vectors and margin errors. Because TWSVM takes into account only the empirical risk and ignores the structural risk, Shao et al. [15] proposed the twin bounded support vector machine (TBSVM) by introducing a regularization term; its performance is superior to TWSVM. In addition, Li et al. [16] proposed a fuzzy twin support vector machine on the basis of v-TSVM by considering the different impact of each data point on the hyperplane. Although the twin bounded support vector machine reduces the computational complexity and improves the speed compared with the traditional support vector machine, it does not consider the different roles of the data samples, so noise and outlier points still strongly affect classification. In this paper, by introducing fuzzy sample memberships and a regularization term into the twin support vector machine, an improved fuzzy twin support vector machine is presented to further reduce the impact of noise and outlier points on classification.
2. FUZZY TWIN SUPPORT VECTOR MACHINES
Suppose that $A \in R^{m_1 \times n}$ and $B \in R^{m_2 \times n}$ represent the +1 class and -1 class data samples, where each row of $A$ and $B$ represents a data sample, $m_1$ and $m_2$ are the numbers of +1 class and -1 class samples respectively, and $n$ is the dimension of the data samples. The fuzzy twin support vector machine seeks two non-parallel hyperplanes $f_1(x) = (w^{(1)})^T x + b^{(1)} = 0$ and $f_2(x) = (w^{(2)})^T x + b^{(2)} = 0$.
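As a concrete illustration, the class matrices $A$ and $B$ can be assembled from a labeled sample set as follows (a minimal sketch with synthetic data; the array names are ours, not the paper's):

```python
import numpy as np

# Hypothetical toy data: X holds samples row-wise, y holds labels in {+1, -1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=+2.0, size=(5, 3)),
               rng.normal(loc=-2.0, size=(7, 3))])
y = np.array([+1] * 5 + [-1] * 7)

# A collects the +1-class samples row-wise and B the -1-class samples,
# matching A (m1 x n) and B (m2 x n) in the text.
A = X[y == +1]
B = X[y == -1]
m1, n = A.shape
m2, _ = B.shape
```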
2.1 Fuzzy twin support vector machine FTSVM
The optimization problems of the fuzzy twin support vector machine (FTSVM) [16] are
$\min_{w^{(1)},\,b^{(1)},\,\xi}\ \frac{1}{2}(Aw^{(1)}+e_1b^{(1)})^T(Aw^{(1)}+e_1b^{(1)}) + c_1S_A^T\xi$   (1)
$\text{s.t.}\ -(Bw^{(1)}+e_2b^{(1)})+\xi \ge e_2,\ \xi \ge 0,$

$\min_{w^{(2)},\,b^{(2)},\,\xi}\ \frac{1}{2}(Bw^{(2)}+e_2b^{(2)})^T(Bw^{(2)}+e_2b^{(2)}) + c_2S_B^T\xi$   (2)
$\text{s.t.}\ (Aw^{(2)}+e_1b^{(2)})+\xi \ge e_1,\ \xi \ge 0,$
where $S_A$ and $S_B$ represent the fuzzy memberships of the two types of sample points, and $e_1$ and $e_2$ are vectors of ones of appropriate dimension. The dual problems of (1) and (2) are
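The text does not specify how the memberships $S_A$ and $S_B$ are computed. A common choice in the fuzzy-SVM literature (e.g. Lin and Wang [5]) assigns each sample a membership that decays with its distance from its class center, so outliers receive small weights; the sketch below assumes that scheme:

```python
import numpy as np

def class_center_membership(M, delta=1e-6):
    """One common fuzzy-membership scheme (Lin-and-Wang style):
    s_i = 1 - d_i / (r + delta), where d_i is the distance of sample i
    to its class center and r is the class radius (maximum distance)."""
    center = M.mean(axis=0)
    d = np.linalg.norm(M - center, axis=1)
    r = d.max()
    return 1.0 - d / (r + delta)

A = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])  # last row is an outlier
s_A = class_center_membership(A)
# The outlier gets the smallest membership, shrinking its slack penalty c1 * s_i.
```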
$\max_{\alpha}\ e_2^T\alpha - \frac{1}{2}\alpha^T G(H^TH)^{-1}G^T\alpha$   (3)
$\text{s.t.}\ 0 \le \alpha \le c_1S_A,$

$\max_{\beta}\ e_1^T\beta - \frac{1}{2}\beta^T H(G^TG)^{-1}H^T\beta$   (4)
$\text{s.t.}\ 0 \le \beta \le c_2S_B,$
where $H = [A\ \ e_1]$ and $G = [B\ \ e_2]$.
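To make the pipeline concrete, the following sketch solves duals of the form (3)-(4) on synthetic data and recovers the two hyperplanes via $[w^{(1)};b^{(1)}] = -(H^TH)^{-1}G^T\alpha$ and $[w^{(2)};b^{(2)}] = (G^TG)^{-1}H^T\beta$, as in TWSVM. A simple projected-gradient loop stands in for the MATLAB quadratic-programming solver used in the paper, and memberships are set to 1 for simplicity:

```python
import numpy as np

def box_qp_max(Q, e, ub, steps=3000):
    """Maximize e^T a - 0.5 a^T Q a subject to 0 <= a <= ub by projected
    gradient ascent -- a simple stand-in for a quadratic-programming solver."""
    a = np.zeros_like(ub)
    lr = 1.0 / (np.linalg.norm(Q, 2) + 1e-12)   # step size from the largest eigenvalue
    for _ in range(steps):
        a = np.clip(a + lr * (e - Q @ a), 0.0, ub)
    return a

rng = np.random.default_rng(1)
A = rng.normal(+2.0, 1.0, (30, 2))              # +1 class samples
B = rng.normal(-2.0, 1.0, (30, 2))              # -1 class samples
H = np.hstack([A, np.ones((30, 1))])            # H = [A e1]
G = np.hstack([B, np.ones((30, 1))])            # G = [B e2]
c1 = c2 = 1.0
sA = np.ones(30)                                # memberships set to 1 here
sB = np.ones(30)
eps = 1e-6                                      # tiny ridge so the Gram matrices invert

HtH_inv = np.linalg.inv(H.T @ H + eps * np.eye(3))
GtG_inv = np.linalg.inv(G.T @ G + eps * np.eye(3))
alpha = box_qp_max(G @ HtH_inv @ G.T, np.ones(30), c1 * sA)
beta = box_qp_max(H @ GtG_inv @ H.T, np.ones(30), c2 * sB)
z1 = -HtH_inv @ G.T @ alpha                     # [w1; b1] for the +1-class plane
z2 = GtG_inv @ H.T @ beta                       # [w2; b2] for the -1-class plane

def predict(X):
    # Assign each point to the class whose hyperplane is nearer.
    d1 = np.abs(X @ z1[:2] + z1[2]) / np.linalg.norm(z1[:2])
    d2 = np.abs(X @ z2[:2] + z2[2]) / np.linalg.norm(z2[:2])
    return np.where(d1 <= d2, +1, -1)

acc = np.mean(np.r_[predict(A) == +1, predict(B) == -1])
```

On well-separated clusters like these, the two fitted planes hug their own classes and the nearest-plane rule classifies almost all training points correctly.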
When the data are nonlinearly separable, the classification surfaces are $K(x^T,C^T)u^{(1)}+b^{(1)}=0$ and $K(x^T,C^T)u^{(2)}+b^{(2)}=0$, obtained by introducing the kernel matrix $K(x^T,C^T)=\varphi(x^T)\varphi(C^T)^T$, where $C^T=[A\ \ B]^T$ and $K(x,y)$ is the kernel function. The following optimization problems are solved for the classification surfaces:
$\min_{u^{(1)},\,b^{(1)},\,\xi}\ \frac{1}{2}\|K(A,C^T)u^{(1)}+e_1b^{(1)}\|^2 + c_1S_A^T\xi$   (5)
$\text{s.t.}\ -(K(B,C^T)u^{(1)}+e_2b^{(1)})+\xi \ge e_2,\ \xi \ge 0,$

$\min_{u^{(2)},\,b^{(2)},\,\xi}\ \frac{1}{2}\|K(B,C^T)u^{(2)}+e_2b^{(2)}\|^2 + c_2S_B^T\xi$   (6)
$\text{s.t.}\ (K(A,C^T)u^{(2)}+e_1b^{(2)})+\xi \ge e_1,\ \xi \ge 0.$
Let $S = [K(A,C^T)\ \ e_1]$ and $R = [K(B,C^T)\ \ e_2]$; the dual problems of (5) and (6) are

$\max_{\alpha}\ e_2^T\alpha - \frac{1}{2}\alpha^T R(S^TS)^{-1}R^T\alpha$   (7)
$\text{s.t.}\ 0 \le \alpha \le c_1S_A,$

$\max_{\beta}\ e_1^T\beta - \frac{1}{2}\beta^T S(R^TR)^{-1}S^T\beta$   (8)
$\text{s.t.}\ 0 \le \beta \le c_2S_B.$
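For the kernel case, $K(A,C^T)$ is simply an $m_1 \times (m_1+m_2)$ matrix of kernel evaluations. A sketch of building a Gaussian kernel matrix and the augmented matrix $S$ (the function name and the choice of $\sigma$ are ours):

```python
import numpy as np

def gaussian_kernel(X, C, sigma=1.0):
    """Gaussian kernel matrix: entry (i, j) = exp(-||x_i - c_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

A = np.array([[0.0, 0.0], [1.0, 1.0]])
B = np.array([[2.0, 2.0]])
C = np.vstack([A, B])                  # C^T = [A B]^T stacks both classes
K_A = gaussian_kernel(A, C)            # K(A, C^T), shape (m1, m1 + m2)
S = np.hstack([K_A, np.ones((2, 1))])  # S = [K(A, C^T) e1] as used in the duals
```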
2.2 Fuzzy twin support vector machine v-FTSVM
The optimization problems of v-FTSVM are

$\min_{w^{(1)},\,b^{(1)},\,\xi_2,\,\rho_1}\ \frac{1}{2}\|Aw^{(1)}+e_1b^{(1)}\|^2 - v_1\rho_1 + \frac{1}{l_2}S_B^T\xi_2$   (9)
$\text{s.t.}\ -(Bw^{(1)}+e_2b^{(1)}) \ge e_2\rho_1 - \xi_2,\ \xi_2 \ge 0,\ \rho_1 \ge 0,$
Volume 4, Issue 8, August 2016
$\min_{w^{(2)},\,b^{(2)},\,\xi_1,\,\rho_2}\ \frac{1}{2}\|Bw^{(2)}+e_2b^{(2)}\|^2 - v_2\rho_2 + \frac{1}{l_1}S_A^T\xi_1$   (10)
$\text{s.t.}\ (Aw^{(2)}+e_1b^{(2)}) \ge e_1\rho_2 - \xi_1,\ \xi_1 \ge 0,\ \rho_2 \ge 0.$
The dual problems of (9) and (10) are

$\max_{\alpha}\ -\frac{1}{2}\alpha^T G(H^TH)^{-1}G^T\alpha$   (11)
$\text{s.t.}\ 0 \le \alpha \le \frac{S_B}{l_2},\ e_2^T\alpha \ge v_1,$

$\max_{\beta}\ -\frac{1}{2}\beta^T H(G^TG)^{-1}H^T\beta$   (12)
$\text{s.t.}\ 0 \le \beta \le \frac{S_A}{l_1},\ e_1^T\beta \ge v_2.$
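The v-type duals differ from (3)-(4) only in the box bound $S/l$ and the extra constraint $e^T\alpha \ge v$. A sketch solving a problem of this form with SciPy's SLSQP solver, on toy data with memberships set to 1 (the variable names are ours):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
m1, m2 = 8, 10
H = np.hstack([rng.normal(+1.0, 1.0, (m1, 2)), np.ones((m1, 1))])  # H = [A e1]
G = np.hstack([rng.normal(-1.0, 1.0, (m2, 2)), np.ones((m2, 1))])  # G = [B e2]
Q = G @ np.linalg.inv(H.T @ H + 1e-6 * np.eye(3)) @ G.T
s_B = np.ones(m2)            # memberships, all 1 in this sketch
v1, l2 = 0.5, m2

# Maximizing -1/2 a^T Q a is the same as minimizing 1/2 a^T Q a.
res = minimize(
    lambda a: 0.5 * a @ Q @ a,
    x0=np.full(m2, v1 / m2),                                        # feasible start
    bounds=[(0.0, s_B[i] / l2) for i in range(m2)],                 # 0 <= alpha <= S_B / l2
    constraints=[{"type": "ineq", "fun": lambda a: a.sum() - v1}],  # e2^T alpha >= v1
    method="SLSQP",
)
alpha = res.x
```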
When the data are nonlinearly separable, the classification surfaces are again $K(x^T,C^T)u^{(1)}+b^{(1)}=0$ and $K(x^T,C^T)u^{(2)}+b^{(2)}=0$ with kernel matrix $K(x^T,C^T)=\varphi(x^T)\varphi(C^T)^T$ and $C^T=[A\ \ B]^T$. The following optimization problems are solved for the classification surfaces:

$\min_{u^{(1)},\,b^{(1)},\,\xi_2,\,\rho_1}\ \frac{1}{2}\|K(A,C^T)u^{(1)}+e_1b^{(1)}\|^2 - v_1\rho_1 + \frac{1}{l_2}S_B^T\xi_2$   (13)
$\text{s.t.}\ -(K(B,C^T)u^{(1)}+e_2b^{(1)}) \ge e_2\rho_1 - \xi_2,\ \xi_2 \ge 0,\ \rho_1 \ge 0,$

$\min_{u^{(2)},\,b^{(2)},\,\xi_1,\,\rho_2}\ \frac{1}{2}\|K(B,C^T)u^{(2)}+e_2b^{(2)}\|^2 - v_2\rho_2 + \frac{1}{l_1}S_A^T\xi_1$   (14)
$\text{s.t.}\ (K(A,C^T)u^{(2)}+e_1b^{(2)}) \ge e_1\rho_2 - \xi_1,\ \xi_1 \ge 0,\ \rho_2 \ge 0.$

The dual problems of (13) and (14) are

$\max_{\alpha}\ -\frac{1}{2}\alpha^T R(S^TS)^{-1}R^T\alpha$   (15)
$\text{s.t.}\ 0 \le \alpha \le \frac{S_B}{l_2},\ e_2^T\alpha \ge v_1,$

$\max_{\beta}\ -\frac{1}{2}\beta^T S(R^TR)^{-1}S^T\beta$   (16)
$\text{s.t.}\ 0 \le \beta \le \frac{S_A}{l_1},\ e_1^T\beta \ge v_2.$
3. IMPROVED FUZZY TWIN SUPPORT VECTOR MACHINES
For FTSVM, an improved fuzzy twin support vector machine FTBSVM is obtained by introducing a regularization term; its optimization problems are

$\min_{w^{(1)},\,b^{(1)},\,\xi,\,\xi^*}\ \frac{1}{2}c_3(\|w^{(1)}\|^2 + (b^{(1)})^2) + \frac{1}{2}\xi^{*T}\xi^* + c_1S_A^T\xi$   (17)
$\text{s.t.}\ Aw^{(1)}+e_1b^{(1)} = \xi^*,\ -(Bw^{(1)}+e_2b^{(1)})+\xi \ge e_2,\ \xi \ge 0,$

$\min_{w^{(2)},\,b^{(2)},\,\xi,\,\xi^*}\ \frac{1}{2}c_4(\|w^{(2)}\|^2 + (b^{(2)})^2) + \frac{1}{2}\xi^{*T}\xi^* + c_2S_B^T\xi$   (18)
$\text{s.t.}\ Bw^{(2)}+e_2b^{(2)} = \xi^*,\ (Aw^{(2)}+e_1b^{(2)})+\xi \ge e_1,\ \xi \ge 0.$

The dual problems of (17) and (18) are

$\max_{\alpha}\ e_2^T\alpha - \frac{1}{2}\alpha^T G(H^TH + c_3I)^{-1}G^T\alpha$   (19)
$\text{s.t.}\ 0 \le \alpha \le c_1S_A,$

$\max_{\beta}\ e_1^T\beta - \frac{1}{2}\beta^T H(G^TG + c_4I)^{-1}H^T\beta$   (20)
$\text{s.t.}\ 0 \le \beta \le c_2S_B.$
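The practical effect of the regularization terms is visible in the duals: $H^TH$ alone can be singular (for example when there are fewer samples than columns of $H$), while $H^TH + c_3I$ is always invertible. A small numerical check:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 4))           # fewer samples than columns of H
H = np.hstack([A, np.ones((3, 1))])   # H is 3 x 5, so H^T H (5 x 5) is rank-deficient
HtH = H.T @ H
c3 = 0.1
reg = HtH + c3 * np.eye(5)
# H^T H has rank at most 3; adding c3*I shifts every eigenvalue up by c3,
# so the regularized matrix is invertible and the dual objective strictly concave.
rank_plain = np.linalg.matrix_rank(HtH)
min_eig_reg = np.linalg.eigvalsh(reg).min()
```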
Similarly, for v-FTSVM, an improved fuzzy twin support vector machine v-FTBSVM is presented; its optimization problems are

$\min_{w^{(1)},\,b^{(1)},\,\xi,\,\xi^*,\,\rho_1}\ \frac{1}{2}c_3(\|w^{(1)}\|^2 + (b^{(1)})^2) + \frac{1}{2}\xi^{*T}\xi^* - v_1\rho_1 + \frac{1}{l_2}S_B^T\xi$   (21)
$\text{s.t.}\ Aw^{(1)}+e_1b^{(1)} = \xi^*,\ -(Bw^{(1)}+e_2b^{(1)}) \ge e_2\rho_1 - \xi,\ \xi \ge 0,\ \rho_1 \ge 0,$

$\min_{w^{(2)},\,b^{(2)},\,\xi,\,\xi^*,\,\rho_2}\ \frac{1}{2}c_4(\|w^{(2)}\|^2 + (b^{(2)})^2) + \frac{1}{2}\xi^{*T}\xi^* - v_2\rho_2 + \frac{1}{l_1}S_A^T\xi$   (22)
$\text{s.t.}\ Bw^{(2)}+e_2b^{(2)} = \xi^*,\ (Aw^{(2)}+e_1b^{(2)}) \ge e_1\rho_2 - \xi,\ \xi \ge 0,\ \rho_2 \ge 0.$

The dual problems of (21) and (22) are

$\max_{\alpha}\ -\frac{1}{2}\alpha^T G(H^TH + c_3I)^{-1}G^T\alpha$   (23)
$\text{s.t.}\ 0 \le \alpha \le \frac{S_B}{l_2},\ e_2^T\alpha \ge v_1,$

$\max_{\beta}\ -\frac{1}{2}\beta^T H(G^TG + c_4I)^{-1}H^T\beta$   (24)
$\text{s.t.}\ 0 \le \beta \le \frac{S_A}{l_1},\ e_1^T\beta \ge v_2.$
When the data are nonlinearly separable, the optimization problems of FTBSVM are

$\min_{u^{(1)},\,b^{(1)},\,\xi,\,\xi^*}\ \frac{1}{2}c_3(\|u^{(1)}\|^2 + (b^{(1)})^2) + \frac{1}{2}\xi^{*T}\xi^* + c_1S_A^T\xi$   (25)
$\text{s.t.}\ K(A,C^T)u^{(1)}+e_1b^{(1)} = \xi^*,\ -(K(B,C^T)u^{(1)}+e_2b^{(1)})+\xi \ge e_2,\ \xi \ge 0,$

$\min_{u^{(2)},\,b^{(2)},\,\xi,\,\xi^*}\ \frac{1}{2}c_4(\|u^{(2)}\|^2 + (b^{(2)})^2) + \frac{1}{2}\xi^{*T}\xi^* + c_2S_B^T\xi$   (26)
$\text{s.t.}\ K(B,C^T)u^{(2)}+e_2b^{(2)} = \xi^*,\ (K(A,C^T)u^{(2)}+e_1b^{(2)})+\xi \ge e_1,\ \xi \ge 0.$

Let $S=[K(A,C^T)\ \ e_1]$ and $R=[K(B,C^T)\ \ e_2]$; the dual problems of (25) and (26) are

$\max_{\alpha}\ e_2^T\alpha - \frac{1}{2}\alpha^T R(S^TS + c_3I)^{-1}R^T\alpha$   (27)
$\text{s.t.}\ 0 \le \alpha \le c_1S_A,$

$\max_{\beta}\ e_1^T\beta - \frac{1}{2}\beta^T S(R^TR + c_4I)^{-1}S^T\beta$   (28)
$\text{s.t.}\ 0 \le \beta \le c_2S_B.$
For v-FTBSVM, the classification surfaces are $K(x^T,C^T)u^{(1)}+b^{(1)}=0$ and $K(x^T,C^T)u^{(2)}+b^{(2)}=0$ with the same kernel matrix $K(x^T,C^T)=\varphi(x^T)\varphi(C^T)^T$, $C^T=[A\ \ B]^T$. The following optimization problems are solved for the classification surfaces:

$\min_{u^{(1)},\,b^{(1)},\,\xi,\,\xi^*,\,\rho_1}\ \frac{1}{2}c_3(\|u^{(1)}\|^2 + (b^{(1)})^2) + \frac{1}{2}\xi^{*T}\xi^* - v_1\rho_1 + \frac{1}{l_2}S_B^T\xi$   (29)
$\text{s.t.}\ K(A,C^T)u^{(1)}+e_1b^{(1)} = \xi^*,\ -(K(B,C^T)u^{(1)}+e_2b^{(1)}) \ge e_2\rho_1 - \xi,\ \xi \ge 0,\ \rho_1 \ge 0,$

$\min_{u^{(2)},\,b^{(2)},\,\xi,\,\xi^*,\,\rho_2}\ \frac{1}{2}c_4(\|u^{(2)}\|^2 + (b^{(2)})^2) + \frac{1}{2}\xi^{*T}\xi^* - v_2\rho_2 + \frac{1}{l_1}S_A^T\xi$   (30)
$\text{s.t.}\ K(B,C^T)u^{(2)}+e_2b^{(2)} = \xi^*,\ (K(A,C^T)u^{(2)}+e_1b^{(2)}) \ge e_1\rho_2 - \xi,\ \xi \ge 0,\ \rho_2 \ge 0.$

Let $S=[K(A,C^T)\ \ e_1]$ and $R=[K(B,C^T)\ \ e_2]$; the dual problems of (29) and (30) are

$\max_{\alpha}\ -\frac{1}{2}\alpha^T R(S^TS + c_3I)^{-1}R^T\alpha$   (31)
$\text{s.t.}\ 0 \le \alpha \le \frac{S_B}{l_2},\ e_2^T\alpha \ge v_1,$

$\max_{\beta}\ -\frac{1}{2}\beta^T S(R^TR + c_4I)^{-1}S^T\beta$   (32)
$\text{s.t.}\ 0 \le \beta \le \frac{S_A}{l_1},\ e_1^T\beta \ge v_2.$
For the fuzzy support vector machines described above, we have so far only given the optimization problems. Since these problems are convex quadratic programs in the Lagrange multipliers, their solutions are easily obtained in MATLAB. In the following, we also use the successive over-relaxation (SOR) method [17] to solve them.
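A minimal sketch of a projected successive over-relaxation sweep for a box-constrained dual of the form (19)-(20), in the spirit of Mangasarian and Musicant [17] (the implementation details here are ours, not the paper's):

```python
import numpy as np

def sor_box_qp(Q, e, ub, omega=1.3, sweeps=2000):
    """Projected SOR (Gauss-Seidel with relaxation factor omega) for
    max e^T a - 0.5 a^T Q a  s.t.  0 <= a <= ub.
    Each coordinate is updated in turn and clipped back into the box."""
    a = np.zeros_like(ub)
    for _ in range(sweeps):
        for i in range(len(a)):
            g = Q[i] @ a - e[i]                       # i-th gradient component
            a[i] = np.clip(a[i] - omega * g / Q[i, i], 0.0, ub[i])
    return a

rng = np.random.default_rng(4)
M = rng.normal(size=(12, 5))
Q = M @ M.T + 0.1 * np.eye(12)      # positive definite, like G(H^T H + c3 I)^{-1} G^T
e = np.ones(12)
ub = np.full(12, 0.5)
a_sor = sor_box_qp(Q, e, ub)
```

With omega = 1 this reduces to projected Gauss-Seidel; for positive definite Q, any omega in (0, 2) yields convergence [17].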
4. EXPERIMENTS
Some UCI datasets [18] are selected to conduct the experiments; their numbers of samples and attributes are given in Table 1.

Table 1: Numbers of samples and attributes of the selected UCI datasets

Data set                  Number of samples   Number of attributes
heart-statlog             270                 14
sonar                     208                 60
ionosphere                351                 35
banknote_authentication   1372                6
diabetes                  768                 9
breast_cancer             683                 11
housing                   506                 15
wholesale_customers       440                 9
vertebral_column          310                 7
parkinsons                197                 24
The experimental results are shown in Table 2. As can be seen from the table, for both the linear kernel and the Gaussian kernel, the accuracies of FTBSVM and v-FTBSVM are better than those of FTSVM and v-FTSVM. For example, on the ionosphere data set, the accuracies of FTSVM with the linear and Gaussian kernel functions are 74.93% and 92.58% respectively, while those of FTBSVM are 75.14% and 92.64% respectively. Similarly, the accuracies of v-FTSVM are 74.96% and 92.32% respectively, while those of v-FTBSVM are 75.14% and 92.37% respectively. To better illustrate the experimental results in the Gaussian-kernel case, we compute the differences of the accuracies of FTBSVM minus those of FTSVM, as shown in Figure 1(a), and likewise the differences of the accuracies of v-FTBSVM minus those of v-FTSVM, as shown in Figure 1(b).
Table 2: Accuracy (%, mean±std) for different methods

Data set                 Kernel   FTSVM        FTBSVM       v-FTSVM      v-FTBSVM
heart-statlog            Linear   85.56±2.84   85.60±3.12   85.56±4.45   85.59±4.21
heart-statlog            Gauss    85.65±3.44   85.65±3.44   84.81±4.32   84.81±4.32
sonar                    Linear   75.51±2.30   75.55±2.24   77.46±3.14   77.27±3.53
sonar                    Gauss    80.61±2.40   81.47±2.54   76.46±2.13   76.59±2.08
ionosphere               Linear   74.93±3.60   75.14±2.36   74.96±4.50   75.14±3.46
ionosphere               Gauss    92.58±3.10   92.64±2.61   92.32±1.31   92.37±1.26
banknote_authentication  Linear   97.53±0.68   97.53±0.37   97.38±0.67   97.24±0.51
banknote_authentication  Gauss    97.89±0.14   97.92±0.11   98.25±0.09   98.30±0.04
diabetes                 Linear   64.43±4.07   64.53±3.14   65.09±3.16   65.17±2.59
diabetes                 Gauss    77.22±4.51   77.24±4.49   65.10±4.05   67.28±3.47
breast_cancer            Linear   65.04±5.84   65.30±4.28   64.99±6.07   65.18±5.27
breast_cancer            Gauss    65.02±5.81   65.24±5.69   65.60±4.17   65.65±3.47
housing                  Linear   83.61±1.03   84.08±2.12   81.25±6.50   81.25±6.50
housing                  Gauss    80.75±5.23   82.42±5.51   77.28±1.74   78.19±2.38
wholesale_customers      Linear   85.91±1.60   85.88±1.32   84.31±1.14   83.61±2.17
wholesale_customers      Gauss    83.64±0.68   83.67±0.74   84.09±0.91   85.11±0.61
vertebral_column         Linear   80.64±2.00   80.64±2.21   79.35±4.21   80.04±3.76
vertebral_column         Gauss    80.00±1.29   80.23±1.07   78.06±0.97   79.32±2.61
parkinsons               Linear   79.16±1.50   79.25±1.43   72.50±1.00   72.63±0.98
parkinsons               Gauss    85.50±1.16   85.41±1.39   76.83±1.01   77.17±0.81
Figure 1 Difference diagram of accuracy: (a) FTBSVM minus FTSVM; (b) v-FTBSVM minus v-FTSVM (Gaussian kernel)
Table 3: Accuracy (%, mean±std) using different solving methods for FTSVM and FTBSVM

Data set                 Kernel   FTSVM        FTSVM-SOR    FTBSVM       FTBSVM-SOR
heart-statlog            Linear   85.56±2.84   85.92±3.60   85.60±3.12   85.60±3.12
heart-statlog            Gauss    85.65±3.44   85.72±4.32   85.65±3.44   85.75±4.29
sonar                    Linear   75.51±2.30   75.51±3.10   75.55±2.24   76.11±2.87
sonar                    Gauss    80.61±2.40   82.36±3.67   80.47±2.54   82.71±3.15
ionosphere               Linear   74.93±3.60   78.04±4.25   75.14±2.36   78.21±3.28
ionosphere               Gauss    92.58±3.10   93.16±2.12   92.64±2.61   93.97±2.06
banknote_authentication  Linear   97.53±0.68   97.74±0.78   97.53±0.37   97.74±0.63
banknote_authentication  Gauss    97.89±0.14   98.10±0.37   97.92±0.11   98.45±0.18
diabetes                 Linear   64.43±4.07   64.65±7.54   64.53±3.14   66.27±6.23
diabetes                 Gauss    77.22±4.51   77.60±4.82   77.24±4.49   77.65±0.32
breast_cancer            Linear   65.04±5.84   65.32±7.05   65.30±4.28   65.37±7.10
breast_cancer            Gauss    65.02±5.81   65.72±5.81   65.24±5.69   65.72±5.81
housing                  Linear   83.61±1.03   84.18±3.75   84.08±2.12   84.21±3.78
housing                  Gauss    80.75±5.23   85.39±1.73   82.42±5.51   85.50±1.01
wholesale_customers      Linear   85.91±1.60   86.09±0.46   85.88±1.32   86.13±0.50
wholesale_customers      Gauss    83.64±0.68   87.72±0.68   83.67±0.74   87.81±0.77
vertebral_column         Linear   80.64±2.00   80.96±1.93   80.64±2.21   80.94±1.87
vertebral_column         Gauss    80.00±1.29   80.35±1.65   80.23±1.07   80.41±1.71
parkinsons               Linear   79.16±1.50   74.16±1.96   79.25±1.43   79.93±1.05
parkinsons               Gauss    85.50±1.16   85.50±5.83   85.41±1.39   85.50±5.83
Figure 2 Difference diagram of accuracy for FTSVM and FTBSVM using different solving methods
In addition, we show the differences of the accuracies of FTSVM-SOR minus those of FTSVM in Figure 2(a), and likewise the differences of the accuracies of FTBSVM-SOR minus those of FTBSVM in Figure 2(b). As can be seen from Figure 2, the accuracies of FTSVM-SOR and FTBSVM-SOR are higher than those of FTSVM and FTBSVM, respectively. Therefore, for both the fuzzy twin support vector machine and the improved fuzzy twin support vector machine, accuracy is on the whole improved to a certain degree when the Lagrange multipliers are solved by the over-relaxation method.
Table 4: Accuracy (%, mean±std) using different solving methods for v-FTSVM and v-FTBSVM

Data set                 Kernel   v-FTSVM      v-FTSVM-SOR  v-FTBSVM     v-FTBSVM-SOR
heart-statlog            Linear   85.56±4.45   86.29±4.32   85.59±4.21   85.37±4.24
heart-statlog            Gauss    84.81±4.32   85.18±4.51   84.81±4.32   85.39±4.40
sonar                    Linear   77.46±3.14   77.46±4.50   77.27±3.53   77.46±4.50
sonar                    Gauss    76.46±2.13   78.27±3.46   76.59±2.08   78.27±3.46
ionosphere               Linear   74.96±4.50   78.38±3.42   75.14±3.46   78.41±3.38
ionosphere               Gauss    92.32±1.31   92.57±3.20   92.37±1.26   92.60±2.73
banknote_authentication  Linear   97.38±0.67   97.74±0.56   97.24±0.51   97.74±0.63
banknote_authentication  Gauss    98.25±0.09   99.05±0.10   98.30±0.04   99.05±0.10
diabetes                 Linear   65.09±3.16   76.18±5.48   65.17±2.59   76.37±5.19
diabetes                 Gauss    65.10±4.05   74.97±5.06   67.28±3.47   75.18±3.84
breast_cancer            Linear   64.99±6.07   65.06±7.64   65.18±5.27   65.32±7.05
breast_cancer            Gauss    65.60±4.17   65.71±6.25   65.65±3.47   65.77±6.19
housing                  Linear   81.25±6.50   81.24±6.73   81.25±6.50   82.81±2.33
housing                  Gauss    77.28±1.74   78.46±2.37   78.19±2.38   78.96±1.67
wholesale_customers      Linear   84.31±1.14   85.22±1.36   83.61±2.17   85.83±1.44
wholesale_customers      Gauss    84.09±0.91   87.27±0.72   85.11±0.61   87.72±0.68
vertebral_column         Linear   79.35±4.21   80.00±2.26   80.04±3.76   80.96±1.93
vertebral_column         Gauss    78.06±0.97   81.29±2.62   79.32±2.61   81.34±2.11
parkinsons               Linear   72.50±1.00   72.67±2.30   72.63±0.98   74.39±1.51
parkinsons               Gauss    76.83±1.01   77.00±6.33   77.17±0.81   77.00±6.33
According to the above method, we also obtain experimental results for v-FTSVM and v-FTBSVM using the traditional solving method and the iterative method, where v-FTSVM-SOR and v-FTBSVM-SOR denote that the Lagrange multipliers are solved by the iterative method, as shown in Table 4 and Figure 3. As seen in Figure 3, using the successive over-relaxation method to solve the Lagrange multipliers improves the classification accuracy of v-FTSVM and v-FTBSVM on most data sets.
Figure 3 Difference diagram of accuracy for v-FTSVM and v-FTBSVM using different solving methods
To better show the running time of the different algorithms, we conduct experiments for FTSVM, FTSVM-SOR, v-FTSVM, v-FTSVM-SOR, FTBSVM, FTBSVM-SOR, v-FTBSVM and v-FTBSVM-SOR, as shown in Table 5. It can be seen that the iterative method runs faster than the traditional method. For example, on the heart-statlog data set, the running times of FTSVM-SOR, v-FTSVM-SOR, FTBSVM-SOR and v-FTBSVM-SOR are 0.19, 0.13, 0.19 and 0.12 s respectively, while those of FTSVM, v-FTSVM, FTBSVM and v-FTBSVM are 0.42, 0.23, 0.42 and 0.23 s respectively.
Table 5: Running time (s) for different methods

Data set                 v-FTBSVM  FTBSVM  v-FTBSVM-SOR  FTBSVM-SOR  v-FTSVM  FTSVM  v-FTSVM-SOR  FTSVM-SOR
heart-statlog            0.23      0.42    0.12          0.19        0.23     0.42   0.13         0.19
sonar                    0.22      0.37    0.07          0.18        0.24     0.39   0.06         0.18
ionosphere               0.70      0.68    0.25          0.25        0.72     0.71   0.25         0.28
banknote_authentication  79.26     78.27   30.72         30.39       78.43    77.96  31.43        30.65
diabetes                 11.22     11.37   5.72          6.20        11.47    12.50  5.72         6.20
breast_cancer            6.03      6.68    1.73          1.74        6.03     6.68   1.73         1.74
housing                  3.71      3.15    1.10          1.10        3.71     3.43   1.11         1.16
wholesale_customers      2.28      2.21    0.42          0.42        2.25     2.20   0.43         0.44
vertebral_column         0.92      0.92    0.19          0.18        0.90     0.90   0.19         0.18
parkinsons               0.24      0.32    0.05          0.05        0.24     0.32   0.05         0.05
Further, to evaluate the overall time performance of the different algorithms, the average running times over all experiments on the selected data sets are given in Figure 4. As can be seen, the running times of FTSVM, v-FTSVM, FTBSVM and v-FTBSVM are approximately 2.5 times those of FTSVM-SOR, v-FTSVM-SOR, FTBSVM-SOR and v-FTBSVM-SOR, respectively. This shows that solving the fuzzy twin support vector machines by the iterative method reduces the computational time complexity and further improves their time performance.
5. CONCLUSIONS
In order to achieve minimization of the structural risk, an improved fuzzy twin support vector machine model is obtained by introducing a regularization term. Its dual problems are solved both by the traditional quadratic programming method and by the over-relaxation method. Experiments on selected UCI datasets validate the performance of the proposed method, and its performance is compared with that of the fuzzy twin support vector machine.
Acknowledgement
This work is supported by the postgraduate innovation project of Hebei University under grant X2016056.
REFERENCES
[1] V. N. Vapnik. The nature of statistical learning theory, Springer, New York, 1995.
[2] O. L. Mangasarian, D. R. Musicant, Lagrangian support vector machines, Journal of Machine Learning
Research, 1, pp.161-177, 2001.
[3] B. Scholkopf, A. J. Smola, R. C. Williamson, et al, New support vector algorithms, Neural Computation, 12(5), pp.1207-1245, 2000.
[4] V. Bloom, I. Griva, B. Kwon, et al, Exterior-point method for support vector machines, IEEE Transactions on
Neural Networks and Learning Systems, 25(7),pp. 1390-1393,2014.
[5] C. F. Lin, S. D. Wang, Fuzzy support vector machines, IEEE Transactions on Neural Networks, 13(2), pp.464-471, 2002.
[6] Y. Q. Wang, S. Y. Wang and K. K. Lai, A new fuzzy support vector machine to evaluate credit risk, IEEE Transactions on Fuzzy Systems, 13(6), pp.820-831, 2005.
[7] X. W. Yang, G. Q. Zhang and J. Lu, A kernel fuzzy c-means clustering-based fuzzy support vector machine algorithm for classification problems with outliers or noises, IEEE Transactions on Fuzzy Systems, 19(1), pp.105-115, 2011.
[8] J. H. Zhang, Y. Y. Wang, A rough margin based support vector machine, Information Sciences, 178, pp.2204-2214, 2008.
[9] Y. T. Xu, A rough margin-based linear support vector regression, Statistics and Probability Letters, 82(3),
pp.528-534,2012.
[10] D. G. Chen, Q. He and X. Z. Wang, FRSVMs: Fuzzy rough set based support vector machines, Fuzzy Sets and Systems, 161, pp.596-607, 2010.
[11] G. Fung, O. L. Mangasarian, Proximal support vector machine classifiers, In Proceedings of the 7th International Conference on Knowledge Discovery and Data Mining, pp.77-86, 2001.
[12] O. L. Mangasarian, E. W. Wild, Multisurface proximal support vector classification via generalized
eigenvalues, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28,pp.69-74,2006.
[13] Jayadeva, R. Khemchandani and S. Chandra, Twin support vector machine for pattern classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, pp.905-910, 2007.
[14] X. J. Peng, A v-twin support vector machine (v-TSVM) classifier and its geometric algorithms, Information Sciences, 180, pp.3863-3875, 2010.
[15] Y. H. Shao, C. H. Zhang and X. B. Wang, Improvements on twin support vector machines, IEEE Transactions on Neural Networks, 22, pp.962-968, 2011.
[16] K. Li, H.Y. Ma, A fuzzy twin support vector machine algorithm, International Journal of Application or
Innovation in Engineering & Management, 2(3),pp.459-465,2013.
[17] O. L. Mangasarian, D. R. Musicant, Successive overrelaxation for support vector machines, IEEE Transactions
on Neural Networks, 10(5),pp.1032-1037,1999.
[18] C. Blake, C. J. Merz, UCI repository of machine learning databases [EB/OL], Irvine, CA: University of California, Department of Information and Computer Sciences. http://www.ics.uci.edu/~mlearn/MLRepository.html.
AUTHORS
Kai Li received the B.S. and M.S. degrees from the mathematics department and the electrical engineering department of Hebei University, Baoding, China, in 1982 and 1992, respectively. He received the Ph.D. degree from Beijing Jiaotong University, Beijing, China, in 2001. He is currently a Professor in the College of Computer Science and Technology, Hebei University. His current research interests include machine learning, data mining, computational intelligence, and pattern recognition.
Lifeng Gu received the bachelor's degree from the College of Information Science and Technology, Hebei Agricultural University, Baoding, Hebei, China, in 2011, and she is currently pursuing the M.E. degree in the College of Computer Science and Technology, Hebei University, Baoding, Hebei, China. Her research interests include machine learning, data mining, and pattern recognition.