\min \; e^\top \xi^{(1)},   (1)

\max \; e^\top \xi^{(2)},   (2)

s.t. (w \cdot x_i) + (\xi^{(1)}_i - \xi^{(2)}_i) = b, for \{i \mid y_i = 1\},   (3)

(w \cdot x_i) - (\xi^{(1)}_i - \xi^{(2)}_i) = b, for \{i \mid y_i = -1\},   (4)

\xi^{(1)}, \xi^{(2)} \ge 0,   (5)
where e \in R^l is the vector whose elements are all 1, w and b are unrestricted, \xi^{(1)}_i is the overlapping and \xi^{(2)}_i the distance from the training sample x_i to the discriminator (w \cdot x_i) = b (the separating hyperplane of the classification). By introducing penalty parameters C, D \in [0, +\infty), MCLP has the following version:
\min_{\xi^{(1)}, \xi^{(2)}} \; C e^\top \xi^{(1)} - D e^\top \xi^{(2)},   (6)

s.t. (w \cdot x_i) + (\xi^{(1)}_i - \xi^{(2)}_i) = b, for \{i \mid y_i = 1\},   (7)

(w \cdot x_i) - (\xi^{(1)}_i - \xi^{(2)}_i) = b, for \{i \mid y_i = -1\},   (8)

\xi^{(1)}, \xi^{(2)} \ge 0.   (9)
The geometric meaning of the model is shown in Fig. 1.
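To make the roles of the two slack vectors concrete, the following minimal numpy sketch (not the paper's code; the toy data and the candidate (w, b) are invented) computes, for a fixed hyperplane, the overlapping \xi^{(1)}_i = max{0, -y_i((w·x_i) - b)} and the distance \xi^{(2)}_i = max{0, y_i((w·x_i) - b)}, which together satisfy constraints (3)-(5):

```python
import numpy as np

# Toy training set: two classes in the plane (invented for illustration).
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])

# A candidate discriminator (w . x) = b, chosen by hand.
w = np.array([1.0, 1.0])
b = 0.0

# Signed margin r_i = y_i((w . x_i) - b); splitting it into its negative
# and positive parts yields feasible slacks for constraints (3)-(5).
r = y * (X @ w - b)
xi1 = np.maximum(0.0, -r)   # overlapping: nonzero only for misclassified points
xi2 = np.maximum(0.0, r)    # distance on the correct side of the hyperplane

# The two criteria of MCLP: minimize e'xi1 while maximizing e'xi2.
overlap, distance = xi1.sum(), xi2.sum()
```

On this separable toy set the overlapping is zero, so the two criteria do not conflict; in general the trade-off between them is what the penalized model (6)-(9) resolves.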
A lot of empirical studies have shown that MCLP is a powerful tool for classification. However, we cannot ensure that this model always has a solution under different kinds of training samples. To ensure the existence of a solution, Shi et al. recently proposed the RMCLP model by adding two regularization terms, \frac{1}{2} w^\top H w and \frac{1}{2} \xi^{(1)\top} Q \xi^{(1)}, to MCLP as follows (more theoretical explanation of this model can be found in [10]):
Fig. 1 Geometric meaning of MCLP
Neural Comput & Applic
\min_z \; \frac{1}{2} w^\top H w + \frac{1}{2} \xi^{(1)\top} Q \xi^{(1)} + C e^\top \xi^{(1)} - D e^\top \xi^{(2)},   (10)

s.t. (w \cdot x_i) + (\xi^{(1)}_i - \xi^{(2)}_i) = b, for \{i \mid y_i = 1\},   (11)

(w \cdot x_i) - (\xi^{(1)}_i - \xi^{(2)}_i) = b, for \{i \mid y_i = -1\},   (12)

\xi^{(1)}, \xi^{(2)} \ge 0,   (13)
where z = (w^\top, \xi^{(1)\top}, \xi^{(2)\top}, b)^\top \in R^{n+l+l+1}, and H \in R^{n \times n}, Q \in R^{l \times l} are symmetric positive definite matrices. Obviously, the regularized MCLP is a convex quadratic program.
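To see this, with z = (w; \xi^{(1)}; \xi^{(2)}; b), problem (10)-(13) can be written in the standard quadratic-programming form (a routine rewriting, spelled out here for concreteness):

```latex
\min_z \; \tfrac{1}{2} z^\top M z + q^\top z, \qquad
M = \begin{pmatrix} H & 0 & 0 & 0 \\ 0 & Q & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad
q = \begin{pmatrix} 0 \\ C e \\ -D e \\ 0 \end{pmatrix},
```

subject to the linear constraints (11)-(13). Since H and Q are positive definite, M is positive semidefinite, so the problem is convex; it is strictly convex only in the (w, \xi^{(1)}) block, not in \xi^{(2)} and b.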
Compared with the traditional SVM, we can find that the RMCLP model is similar to the SVM model in its formulation, in that it also minimizes the overlapping of the data. However, RMCLP tries to measure all possible distances \xi^{(2)}_i from the training samples x_i to the separating hyperplane, while SVM fixes the distance of the support vectors as 1 (through the bounding planes (w \cdot x) - b = \pm 1). Although the interpretation can vary, RMCLP has more control parameters than the SVM, which may provide more flexibility for better separation of data within the framework of mathematical programming. In addition, different from the traditional SVM, RMCLP considers all the samples when solving the classification problem and thus is insensitive to outliers.
3 Universum-regularized multiple-criteria linear
programming (U-RMCLP)
We firstly give the formal representation of the classification problem with Universum samples. Suppose that the training set \tilde{T} consists of two parts:

\tilde{T} = T \cup U,   (14)

where

T = \{(x_1, y_1), \ldots, (x_l, y_l)\} \in (R^n \times \mathcal{Y})^l,
U = \{x_1^*, \ldots, x_u^*\} \in (R^n)^u,   (15)

with x_i \in R^n, y_i \in \mathcal{Y} = \{1, -1\}, i = 1, \ldots, l, and x_j^* \in R^n, j = 1, \ldots, u. The goal is to derive the decision function

y = \mathrm{sgn}(g(x)),   (16)

to predict the label y corresponding to any sample x in the space R^n.
3.1 The primal problem of U-RMCLP
Weston et al. [23] proposed the U-SVM algorithm, which uses the \epsilon-insensitive loss for the Universum:

\min_{w, b} \; \frac{1}{2} \|w\|_2^2 + c \sum_{i=1}^{l} u\big[y_i f_{w,b}(x_i)\big] + d \sum_{j=1}^{u} q\big[f_{w,b}(x_j^*)\big],   (17)

where f_{w,b}(x_i) is the prediction function and u_\epsilon[t] = \max\{0, \epsilon - t\} is the hinge loss function; the prior knowledge embedded in the Universum is measured by the \epsilon-insensitive loss

q[t] = u_{-\epsilon}[t] + u_{-\epsilon}[-t].   (18)

In this way, the prior knowledge embedded in the Universum is reflected in the sum of the losses \sum_{j=1}^{u} q[f_{w,b}(x_j^*)]: the smaller this value, the higher the prior possibility of the classifier f_{w,b}, and vice versa [24].
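A direct numpy transcription of the two losses (a sketch; the function names are ours): u_\epsilon[t] vanishes for t \ge \epsilon, and q[t] vanishes exactly on the band |t| \le \epsilon, which is why it pulls Universum predictions toward the hyperplane.

```python
import numpy as np

def hinge(t, eps):
    """u_eps[t] = max{0, eps - t}, the hinge loss used in (17)."""
    return np.maximum(0.0, eps - t)

def eps_insensitive(t, eps):
    """q[t] = u_{-eps}[t] + u_{-eps}[-t] from (18): zero iff |t| <= eps."""
    return hinge(t, -eps) + hinge(-t, -eps)

t = np.array([-2.0, -0.05, 0.0, 0.05, 2.0])
q = eps_insensitive(t, eps=0.1)
```

Note that for |t| > \epsilon only one of the two hinge terms is active, so q[t] reduces to max{0, |t| - \epsilon}.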
For U-RMCLP, in order to exploit the prior knowledge, besides finding the hyperplane by (10)-(13), we also hope the Universum data will lie as close to the hyperplane as possible, based on (18). Now, choosing H, Q to be identity matrices, (10)-(13) can be turned into the following optimization problem:
\min_z \; \frac{1}{2} \|w\|_2^2 + \frac{1}{2} \|\xi^{(1)}\|_2^2 + C e^\top \xi^{(1)} - D e^\top \xi^{(2)}   (19)

+ E \sum_{s=1}^{u} (\psi_s + \psi_s^*),   (20)

s.t. y_i((w \cdot x_i) - b) = \xi^{(2)}_i - \xi^{(1)}_i,   (21)

\xi^{(1)}_i, \xi^{(2)}_i \ge 0, \quad i = 1, \ldots, l,   (22)

-\epsilon - \psi_s^* \le (w \cdot x_s^*) - b \le \epsilon + \psi_s, \quad s = 1, \ldots, u,   (23)

\psi_s, \psi_s^* \ge 0, \quad s = 1, \ldots, u,   (24)

where \psi^{(*)} = (\psi_1, \psi_1^*, \ldots, \psi_u, \psi_u^*)^\top, z = (w; b; \xi^{(1)}; \xi^{(2)}; \psi^{(*)}), and C, D, E \in [0, +\infty) are prior parameters.
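For fixed (w, b), the smallest slacks satisfying (23)-(24) are \psi_s = max{0, (w·x*_s - b) - \epsilon} and \psi*_s = max{0, -(w·x*_s - b) - \epsilon}, so the penalty E \sum_s (\psi_s + \psi_s^*) is exactly an \epsilon-insensitive loss on the Universum. A small numpy check (the hyperplane and Universum points are invented):

```python
import numpy as np

w = np.array([1.0, -1.0])
b = 0.5
eps = 0.1
Xu = np.array([[0.5, 0.0], [2.0, 0.0], [0.0, 2.0]])  # invented Universum points

g = Xu @ w - b                       # the quantity (w . x*_s) - b in (23)
psi  = np.maximum(0.0, g - eps)      # violation of the upper bound in (23)
psis = np.maximum(0.0, -g - eps)     # violation of the lower bound in (23)
penalty = psi + psis                 # equals max{0, |g| - eps} per point
```

At most one of the two slacks is nonzero for each Universum point, since g cannot violate both bounds at once.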
By introducing its Lagrange function

L(\Theta) = \frac{1}{2} \|w\|_2^2 + \frac{1}{2} \|\xi^{(1)}\|_2^2 + C e^\top \xi^{(1)} - D e^\top \xi^{(2)} + E \sum_{s=1}^{u} (\psi_s + \psi_s^*)
- \sum_{i=1}^{l} \alpha_i \big[ y_i((w \cdot x_i) - b) + \xi^{(1)}_i - \xi^{(2)}_i \big]
- \sum_{i=1}^{l} \eta_i \xi^{(1)}_i - \sum_{i=1}^{l} \beta_i \xi^{(2)}_i
- \sum_{s=1}^{u} \nu_s \big[ \epsilon + \psi_s - ((w \cdot x_s^*) - b) \big] - \sum_{s=1}^{u} \gamma_s \psi_s
- \sum_{s=1}^{u} \mu_s \big[ ((w \cdot x_s^*) - b) + \epsilon + \psi_s^* \big] - \sum_{s=1}^{u} \gamma_s^* \psi_s^*,   (25)

where \Theta = (w; b; \xi^{(1)}; \xi^{(2)}; \psi^{(*)}; \alpha; \beta; \eta; \mu; \nu; \gamma; \gamma^*), \alpha = (\alpha_1, \ldots, \alpha_l)^\top, \beta = (\beta_1, \ldots, \beta_l)^\top, \eta = (\eta_1, \ldots, \eta_l)^\top, \mu = (\mu_1, \ldots, \mu_u)^\top, \nu = (\nu_1, \ldots, \nu_u)^\top, \gamma = (\gamma_1, \ldots, \gamma_u)^\top and \gamma^* = (\gamma_1^*, \ldots, \gamma_u^*)^\top, the dual problem of (19)-(24) follows from the conditions

\max_{\Theta} \; L(\Theta),   (26)

\nabla_{w, \xi^{(1)}, \xi^{(2)}, b, \psi^{(*)}} L(\Theta) = 0,   (27)

\alpha, \beta, \eta, \mu, \nu, \gamma, \gamma^* \ge 0.   (28)
From Eq. (27), we get

\nabla_w L = w - \sum_{i=1}^{l} y_i \alpha_i x_i + \sum_{s=1}^{u} \nu_s x_s^* - \sum_{s=1}^{u} \mu_s x_s^* = 0,   (29)

\nabla_{\xi^{(1)}_i} L = \xi^{(1)}_i + C - \alpha_i - \eta_i = 0, \quad i = 1, \ldots, l,   (30)

\nabla_{\xi^{(2)}_i} L = -D + \alpha_i - \beta_i = 0, \quad i = 1, \ldots, l,   (31)

\nabla_b L = \sum_{i=1}^{l} y_i \alpha_i - \sum_{s=1}^{u} \nu_s + \sum_{s=1}^{u} \mu_s = 0,   (32)

\nabla_{\psi_s} L = E - \nu_s - \gamma_s = 0, \quad s = 1, \ldots, u,   (33)

\nabla_{\psi_s^*} L = E - \mu_s - \gamma_s^* = 0, \quad s = 1, \ldots, u.   (34)
Substituting the above equations into problem (26)-(28), we get

\max_{\alpha, \mu, \nu, \eta} \; -\frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \frac{1}{2} \sum_{s,t=1}^{u} (\mu_s - \nu_s)(\mu_t - \nu_t)(x_s^* \cdot x_t^*)
- \sum_{i=1}^{l} \sum_{s=1}^{u} \alpha_i y_i (\mu_s - \nu_s)(x_i \cdot x_s^*) - \frac{1}{2} \sum_{i=1}^{l} (\alpha_i + \eta_i - C)^2 - \epsilon \sum_{s=1}^{u} (\mu_s + \nu_s),   (35)

s.t. \sum_{i=1}^{l} y_i \alpha_i + \sum_{s=1}^{u} (\mu_s - \nu_s) = 0,   (36)

\alpha_i \ge D, \; \eta_i \ge 0, \quad i = 1, \ldots, l,   (37)

0 \le \nu_s \le E, \quad s = 1, \ldots, u,   (38)

0 \le \mu_s \le E, \quad s = 1, \ldots, u.   (39)
According to the KKT conditions of the dual problem (35)-(39), we have the following conclusion.

Theorem 1 Suppose that \hat\alpha = (\hat\alpha_1, \ldots, \hat\alpha_l)^\top, \hat\mu = (\hat\mu_1, \ldots, \hat\mu_u)^\top, \hat\nu = (\hat\nu_1, \ldots, \hat\nu_u)^\top, \hat\eta = (\hat\eta_1, \ldots, \hat\eta_l)^\top is a solution to the dual problem (35)-(39). If there is a component \hat\alpha_j \in (D, +\infty), j \in \{1, \ldots, l\}, or \hat\mu_m \in (0, E), m \in \{1, \ldots, u\}, or \hat\nu_t \in (0, E), t \in \{1, \ldots, u\}, then we obtain the solution (\hat w, \hat b) to the primal problem (19)-(24):

\hat w = \sum_{i=1}^{l} y_i \hat\alpha_i x_i + \sum_{s=1}^{u} (\hat\mu_s - \hat\nu_s) x_s^*,   (40)

and, in the respective cases,

\hat b = \sum_{i=1}^{l} y_i \hat\alpha_i (x_i \cdot x_j) + \sum_{s=1}^{u} (\hat\mu_s - \hat\nu_s)(x_s^* \cdot x_j),   (41)

or

\hat b = \epsilon + \sum_{i=1}^{l} \hat\alpha_i y_i (x_i \cdot x_m^*) + \sum_{s=1}^{u} (\hat\mu_s - \hat\nu_s)(x_s^* \cdot x_m^*),   (42)

or

\hat b = -\epsilon + \sum_{i=1}^{l} \hat\alpha_i y_i (x_i \cdot x_t^*) + \sum_{s=1}^{u} (\hat\mu_s - \hat\nu_s)(x_s^* \cdot x_t^*).   (43)
Now we are in a position to establish the following
algorithm:
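As an illustrative sketch of such a linear U-RMCLP trainer (not the paper's MATLAB/quadprog implementation, which solves the dual QP (35)-(39)), one can minimize the primal directly: for fixed (w, b) the optimal slacks can be eliminated in closed form, with \xi^{(1)}_i = max{0, -r_i} and \xi^{(2)}_i = max{0, r_i} for r_i = y_i((w·x_i) - b), valid when C \ge D, and \psi_s + \psi_s^* = max{0, |w·x*_s - b| - \epsilon}. This leaves an unconstrained convex objective amenable to subgradient descent. All data, parameters, and step sizes below are invented:

```python
import numpy as np

def train_urmclp(X, y, Xu, C=1.0, D=0.1, E=0.1, eps=0.1, lr=0.01, iters=2000):
    """Subgradient descent on the slack-eliminated U-RMCLP primal (needs C >= D)."""
    l, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        r = y * (X @ w - b)                  # signed margins of labeled data
        # d/dr of 0.5*xi1^2 + C*xi1 - D*xi2 with xi1 = max(0,-r), xi2 = max(0,r)
        dr = np.where(r < 0, r - C, -D)
        g = Xu @ w - b                       # Universum responses
        dg = E * np.sign(g) * (np.abs(g) > eps)   # eps-insensitive subgradient
        grad_w = w + (dr * y) @ X + dg @ Xu
        grad_b = -((dr * y).sum() + dg.sum())
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b

# Invented, linearly separable toy data; Universum points sit between the classes.
X = np.array([[2.0, 2.0], [3.0, 2.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -2.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
Xu = np.array([[0.0, 0.0], [0.5, -0.5]])

w, b = train_urmclp(X, y, Xu)
pred = np.sign(X @ w - b)
acc = (pred == y).mean()
```

On this toy set the iteration settles near w proportional to (1, 1) with b near 0, separating the two classes while leaving the Universum points inside the \epsilon-band.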
The above discussion is restricted to the linear case. Here, we analyze the nonlinear U-RMCLP by introducing the Gaussian RBF kernel function

K(x_1, x_2) = \exp\big(-\|x_1 - x_2\|^2 / (2\sigma^2)\big),   (44)

where \sigma is a real parameter, and the corresponding transformation

\bar{x} = \Phi(x),   (45)

where \bar{x} \in \mathcal{H} and \mathcal{H} represents the Hilbert space. So, the training set (14) turns into

\tilde{T} \cup \tilde{U} = \{(\Phi(x_1), y_1), \ldots, (\Phi(x_l), y_l)\} \cup \{\Phi(x_1^*), \ldots, \Phi(x_u^*)\}.   (46)

This leads to the following algorithm:
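In the kernelized algorithm, every inner product (x_i · x_j) appearing in (35)-(43) is replaced by K(x_i, x_j). A numpy sketch of the Gaussian RBF kernel (44) as a Gram-matrix routine (the function name is ours):

```python
import numpy as np

def rbf_gram(A, B, sigma=1.0):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows of A, B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    # clip tiny negative values caused by floating-point cancellation
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_gram(A, A, sigma=1.0)
```

The resulting Gram matrix is symmetric with unit diagonal, as required of a normalized kernel.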
4 Numerical experiment
Our algorithm code was programmed in MATLAB 2010. The experiment environment: Intel Core i5 CPU, 2 GB memory. The quadprog function in MATLAB is employed to solve the quadratic programming problems related to this paper.
To demonstrate the capabilities of our algorithm, we report results on four datasets: the MNIST, UCI (Iris and Wine), and TFDS datasets. In all experiments, our method is compared with the standard RMCLP and U-SVM.
There are many methods to collect Universum examples in practice. However, the results of [23] show that the U_Mean method¹ is better than the others. In this section, we will use it to construct Universum examples.
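Following footnote 1, U_Mean builds each Universum example as the mean of one sample drawn from each of the two classes. A numpy sketch (the equal 0.5/0.5 weighting is our reading of "a mean coefficient"; the function name and toy data are invented):

```python
import numpy as np

def u_mean(X_pos, X_neg, num, seed=0):
    """Average a random positive with a random negative sample, `num` times."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X_pos), size=num)
    j = rng.integers(0, len(X_neg), size=num)
    return 0.5 * (X_pos[i] + X_neg[j])

X_pos = np.array([[1.0, 1.0], [2.0, 0.0]])
X_neg = np.array([[-1.0, -1.0], [0.0, -2.0]])
U = u_mean(X_pos, X_neg, num=50)
```

By construction every generated point lies midway between the two class regions, which is exactly the "neither class" role Universum data are meant to play.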
The testing accuracies are computed using standard tenfold cross-validation. The linear kernel parameter C and the RBF kernel parameter \sigma are selected from the set \{2^i \mid i = -7, \ldots, 7\} ((C, D) in the RMCLP and U-RMCLP models are also selected in the same range) by tenfold cross-validation on a tuning set comprising a random 10% of the training data. Once the parameters are selected, the tuning set is returned to the training set to learn the final decision function.
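The selection protocol above can be sketched as follows (only the candidate grid and the fold bookkeeping, not the full tuning loop; names are invented):

```python
import numpy as np

# Candidate values 2^i, i = -7, ..., 7, used for C (and likewise for D, sigma).
grid = [2.0**i for i in range(-7, 8)]

def ten_folds(n, seed=0):
    """Shuffle indices 0..n-1 and cut them into ten disjoint test folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, 10)

folds = ten_folds(100)
```

Each fold serves once as the held-out part while the remaining nine tenths are used for training, and the accuracies over the ten runs are averaged.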
4.1 MNIST dataset
The MNIST dataset is a handwritten digit dataset with samples from 0 to 9. The size of each sample is 16 × 16 pixels, the same as in the literature [23]. We test on the 5 versus 8 classification problem in the case of the linear kernel. The results are shown in Tables 1 and 2.
4.2 UCI datasets
The Iris and Wine datasets are from the UCI machine learning repository². Table 3 gives a description of the two datasets.
Our experiment is on the binary classification problem. For the Iris dataset, class 1 (50 instances) and class 2 (50 instances) are selected, and 50 Universum examples are randomly generated. For the Wine dataset, we use class 1 (59 instances) and class 3 (48 instances) for classification, and the number of Universum examples is 60. Tables 4 and 5 show the experiment results in the case of the RBF kernel.
4.3 Application to the trouble of moving freight car detection system (TFDS)
TFDS is an intelligent system that integrates high-speed digital image collection, real-time processing of massive image data, and recognition of train faults. It plays an important role in the transportation-safety field. In this section, we apply our method to recognize the brake shoe fault (see Fig. 2). The brake shoe is a key component of the train braking system, and the loss of a brake shoe will probably result in a serious accident. The TFDS datasets are collected in Changsha City, Hunan Province, China.
Table 1 U-MCLP's percentage of tenfold testing accuracy for the 5 and 8 datasets (the number of Universum examples is 350)

Method  | Training subset size
        | 400          | 800          | 1,500        | 2,000        | 2,500
RMCLP   | 96.24 ± 1.75 | 96.67 ± 1.24 | 97.34 ± 0.96 | 97.86 ± 0.82 | 98.01 ± 0.73
U-SVM   | 96.45 ± 1.81 | 96.91 ± 1.12 | 97.66 ± 0.80 | 98.02 ± 0.73 | 98.31 ± 0.65
U-MCLP  | 96.51 ± 1.67 | 96.87 ± 1.21 | 97.82 ± 0.79 | 98.44 ± 0.76 | 98.67 ± 0.54
Table 2 The percentage of tenfold testing accuracy for the 5 and 8 datasets for different amounts of Universum data

Train examples | Number of Universum examples
               | 500          | 1,000        | 2,000        | 4,000        | 6,000
2,500          | 98.21 ± 0.87 | 98.67 ± 0.62 | 98.81 ± 0.43 | 98.98 ± 0.41 | 99.21 ± 0.33
Table 3 Description of the Iris and Wine datasets

Name | Dimension (N) | Number of classes (K) | Number of examples (L)
Iris | 4             | 3                     | 150
Wine | 13            | 3                     | 178
¹ Each Universum example is generated by selecting two data points from two different categories and combining them with a mean coefficient.
² UCI repository of machine learning databases, University of California. http://www.ics.uci.edu/~mlearn/MLRepository.html
Fig. 3 The results of Adaboost
detection. a and b Normal brake
shoes; c and d Universum brake
shoes; e and f fault brake shoes
Table 4 The percentage of tenfold testing accuracy for the Iris dataset

Method  | Training subset size
        | 20           | 40           | 60           | 80           | 100
RMCLP   | 87.12 ± 3.51 | 88.08 ± 2.78 | 92.46 ± 1.68 | 93.96 ± 1.42 | 94.41 ± 1.11
U-SVM   | 88.21 ± 3.24 | 89.71 ± 2.34 | 93.31 ± 1.63 | 94.12 ± 1.32 | 95.21 ± 0.85
U-MCLP  | 89.24 ± 3.67 | 90.77 ± 2.12 | 93.56 ± 1.21 | 95.24 ± 1.08 | 96.49 ± 0.74
Table 5 The percentage of tenfold testing accuracy for the Wine dataset

Method  | Training subset size
        | 30           | 50           | 70           | 90           | 107
RMCLP   | 78.12 ± 5.54 | 82.28 ± 3.88 | 87.66 ± 2.86 | 93.69 ± 2.22 | 95.11 ± 1.61
U-SVM   | 81.25 ± 4.27 | 84.51 ± 3.54 | 91.21 ± 2.36 | 95.42 ± 2.62 | 97.31 ± 0.75
U-MCLP  | 79.25 ± 4.68 | 83.87 ± 3.63 | 90.16 ± 2.54 | 95.24 ± 2.78 | 96.79 ± 0.81
Fig. 2 Different states of brake
shoe. a Normal brake shoe,
b uncertain brake shoe, c fault
brake shoe
Figure 2 shows the brake shoe images: (a) a normal brake shoe; (c) a brake shoe that has been lost. The brake shoe shown in (b) is in a special middle state; we can take such examples as Universum data. Adaboost [28] is employed to detect the brake shoe's position in an image. Figure 3 shows the detection results of the Adaboost method. We use these results as the training samples for recognizing brake shoes; they are also randomly rotated within [-10°, +10°] and shifted within [-2, +2] pixels to generate five virtual samples each. Their size is 20 × 20 pixels and each dimension value is normalized to [0, 1].
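The virtual-sample generation can be sketched with plain numpy: a nearest-neighbour rotation by a random angle in [-10°, +10°] plus an integer shift in [-2, +2] pixels (a rough stand-in for whatever interpolation the authors used; the function name and the random patch are invented):

```python
import numpy as np

def virtual_sample(img, angle_deg, dx, dy):
    """Rotate a patch about its centre (nearest neighbour) and shift it."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = np.deg2rad(angle_deg)
    # inverse rotation: for each target pixel, sample the source coordinate
    sx = np.cos(a) * (xx - cx) + np.sin(a) * (yy - cy) + cx
    sy = -np.sin(a) * (xx - cx) + np.cos(a) * (yy - cy) + cy
    sx = np.clip(np.rint(sx), 0, w - 1).astype(int)
    sy = np.clip(np.rint(sy), 0, h - 1).astype(int)
    out = img[sy, sx]
    return np.roll(np.roll(out, dy, axis=0), dx, axis=1)

rng = np.random.default_rng(0)
img = rng.random((20, 20))          # stand-in for a normalized brake-shoe patch
virtual = [virtual_sample(img, rng.uniform(-10, 10),
                          rng.integers(-2, 3), rng.integers(-2, 3))
           for _ in range(5)]
```

Since the transform only rearranges existing pixel values, the generated samples stay within the original [0, 1] normalization.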
4.4 Discussion
From the experiment results, we can find that U-RMCLP, which adds Universum data to an existing training set, outperforms the normal RMCLP. Almost all of these results show that our method is superior to U-SVM. As the number of Universum samples increases, U-RMCLP achieves better performance (see Table 2 and the top of Fig. 4). The performance of U-RMCLP is better than that of U-SVM on the Iris dataset (see Table 4), and slightly better on the MNIST dataset (see Table 1). For the TFDS datasets (see the bottom of Fig. 4), the error rate of U-RMCLP is lower than that of U-SVM when the number of Universum data is below 5,000. However, when the number of Universum data is more than 5,000, the two methods have almost the same performance. This shows that Universum data play a leading role as their number increases. Remarkably, the final result shows that the accuracy of U-RMCLP reaches 91% when the number of brake-shoe training samples is 2,500 and the number of Universum examples is 10,000, which satisfies the practical use.
5 Conclusion
In this paper, a new Universum-regularized multiple-criteria linear programming (U-RMCLP) was proposed and applied, for the first time, to the failure-detection field in TFDS. With the help of Universum examples, the performance of U-RMCLP on public datasets is better than that of the original model. For the trouble of brake shoes recognition, since there are a lot of real Universum data in the TFDS datasets, we do not need extra preparation for them. In this sense, methods which use Universum data are more suitable than others. Remarkably, the final result shows that the accuracy of U-RMCLP reaches 91% when the number of brake-shoe training samples is 2,500 and the number of Universum examples is 10,000, which satisfies the practical use. In future work, we will consider the semi-supervised learning problem for U-RMCLP.
Acknowledgments This work has been partially supported by grants from the National Natural Science Foundation of China (Nos. 70921061, 11271361), the CAS/SAFEA International Partnership Program for Creative Research Teams, the Major International (Regional) Joint Research Project (No. 71110107026), and the President Fund of GUCAS.
References
1. Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
2. Qi Z, Tian Y, Shi Y (2013) Robust twin support vector machine for pattern classification. Pattern Recognit 46(1):305–316
3. Qi Z, Tian Y, Shi Y (2012) Laplacian twin support vector machine for semi-supervised classification. Neural Netw 35:46–53. doi:10.1016/j.neunet.2012.07.011
4. Qi Z, Tian Y, Shi Y (2012) Twin support vector machine with Universum data. Neural Netw 36C:112–119. doi:10.1016/j.neunet.2012.09.004
5. Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188
6. Mangasarian OL (2000) Generalized support vector machines. In: Advances in large margin classifiers. MIT Press, Cambridge, MA, pp 135–146
7. Freed N, Glover F (1981) Simple but powerful goal programming models for discriminant problems. Eur J Oper Res 7(1):44–60
8. Freed N, Glover F (1986) Evaluating alternative linear programming models to solve the two-group discriminant problem. Decis Sci 17:151–162
9. Olson D, Shi Y (2006) Introduction to business data mining. Irwin/McGraw-Hill series: operations and decision sciences. McGraw-Hill, New York
10. Shi Y, Tian Y, Chen X, Zhang P (2009) Regularized multiple criteria linear programs for classification. Sci China Ser F Inf Sci 52(10):1812–1820
11. Qi Z, Shi Y (2012) Structural regular multiple criteria linear programming for classification problem. Int J Comput Commun Control 7(4):732–742
Fig. 4 The results of brake shoes recognition (mean errors). Top: mean errors versus training size (curves: RMCLP, U-SVM, U-RMCLP); the size of the Universum data is fixed to 600. Bottom: mean errors versus the number of Universum examples (curves: U-SVM, U-RMCLP); the number of training samples is fixed to 100. The numbers of positive and negative samples are equal in this experiment
12. Qi Z, Tian Y, Shi Y (2012) Regular multiple criteria linear programming for semi-supervised classification. OEDM (ICDM) (to appear)
13. Qi Z, Tian Y, Shi Y (2012) Regularized multiple criteria linear programming via linear programming. Proc Comput Sci 9:1234–1239
14. Qi Z, Tian Y, Shi Y (2012) Regularized multiple criteria second order cone programming formulations. In: Proceedings of KDD, DMIKM
15. Qi Z, Tian Y, Shi Y (2012) Multi-instance classification based on regularized multiple criteria linear programming. Neural Comput Appl. doi:10.1007/s00521-012-1008-0
16. Kou G, Peng Y, Shi Y, Wise M, Xu W (2005) Discovering credit cardholders' behavior by multiple criteria linear programming. Ann Oper Res 135(1):261–274
17. Shi Y, Peng Y, Xu W, Tang X (2002) Data mining via multiple criteria linear programming: applications in credit card portfolio management. In: Proceedings of the International Journal of Information Technology and Decision Making, pp 131–151
18. Shi Y, Peng Y, Xu W, Tang X (2002) Data mining via multiple criteria linear programming: applications in credit card portfolio management. Int J Inf Technol Decis Mak 1(1):131–152
19. Zhang J, Zhuang W, Yan N (2004) Classification of HIV-1-mediated neuronal dendritic and synaptic damage using multiple criteria linear programming. Neuroinformatics 2:303–326
20. Kou G, Peng Y, Shi Y, Chen Z, Chen X (2004) A multiple-criteria quadratic programming approach to network intrusion detection. In: CASDMKM, pp 145–153
21. Kwak W, Shi Y, Eldridge SW, Kou G (2006) Bankruptcy prediction for Japanese firms: using multiple criteria linear programming data mining approach. Int J Bus Intell Data Min 1(4):401–416
22. He J, Zhang Y, Shi Y, Huang G (2010) Domain-driven classification based on multiple criteria and multiple constraint-level programming for intelligent credit scoring. IEEE Trans Knowl Data Eng 22(6):826–838
23. Weston J, Collobert R, Sinz F, Bottou L, Vapnik VN (2006) Inference with the Universum. In: Proceedings of the 23rd international conference on machine learning. ACM, pp 1009–1016
24. Zhang D, Wang J, Wang F, Zhang C (2008) Semi-supervised classification with Universum. In: SIAM international conference on data mining (SDM), pp 323–333
25. Chen S, Zhang C (2009) Selecting informative Universum sample for semi-supervised learning. IJCAI 6:1016–1021
26. Cherkassky V, Dhar S, Dai W (2011) Practical conditions for effectiveness of the Universum learning. IEEE Trans Neural Netw 22(8):1241–1255
27. Shen C, Wang P, Shen F, Wang H (2011) UBoost: boosting with the Universum. IEEE Trans Pattern Anal Mach Intell 34(4):825–832
28. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. CVPR 1:511–518