Abstract—The sequential minimal optimization (SMO) algorithm has been extensively employed to train the support vector machine (SVM). This work presents an efficient application-specific integrated circuit chip design for sequential minimal optimization. The chip is implemented as an intellectual property core, suitable for use in an SVM-based recognition system on a chip. The proposed SMO chip was tested and found to be fully functional, using a prototype system based on the Altera DE2 board with a Cyclone II 2C70 field-programmable gate array.

Index Terms—Field-programmable gate array (FPGA), sequential minimal optimization (SMO), support vector machine (SVM), VLSI design.

Manuscript received August 08, 2010; revised November 16, 2010; accepted December 30, 2010. Date of publication February 17, 2011; date of current version March 12, 2012. This work was supported by the National Science Council under Grant NSC99-2218-E-006-001.
T.-W. Kuan and J.-F. Wang are with the Department of Electrical Engineering, National Cheng-Kung University, Tainan 70101, Taiwan (e-mail: gwam.davin@gmail.com; wangjf@csie.ncku.edu.tw).
J.-C. Wang is with the Department of Computer Science and Information Engineering, National Central University, Jhongli 32001, Taiwan (e-mail: jcw@csie.ncu.edu.tw).
P.-C. Lin is with the Multimedia and Embedded System Design Laboratory, Department of Electronics Engineering and Computer Science, Tung-Fang Institute of Technology, Kaohsiung 82941, Taiwan (e-mail: tony178.lin@gmail.com).
G.-H. Gu is with the Industrial Technology Research Institute, Hsinchu 30011, Taiwan (e-mail: xquall@ms71.url.com.tw).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2011.2107533

I. INTRODUCTION

THE support vector machine (SVM) is a new statistical approach that has recently attracted substantial interest in various fields, including pattern recognition, machine learning, and bioinformatics [1]–[5]. An SVM learns by solving a constrained quadratic programming (QP) problem [6]–[8] whose size is equivalent to the number of training samples. Conventional methods [9] cannot be used for training an SVM with a large number of training samples, as the available memory cannot store all elements of a kernel-value matrix [10].

The sequential minimal optimization (SMO) algorithm proposed by Platt [11] is an extensively utilized decomposition method [12], [13] for solving the QP problem. The basic concept of the SMO algorithm is the repetition of the following two processes until an optimal solution is found: 1) selecting a fixed number of variables and 2) solving the QP problem associated with the selected variables. The SMO algorithm searches through the feasible region of the dual problem and maximizes the objective function by selecting two Lagrange multipliers and jointly optimizing them (with all others fixed) in each iteration. The power of this approach resides in the fact that the optimization problem with two Lagrange multipliers admits an analytical solution, eliminating the need for an iterative quadratic programming optimizer. Although the SMO algorithm makes SVM learning feasible for a large number of training samples, the required iterative computations still impose a heavy computational burden, especially for standalone devices. This fact motivates the development herein of a specific VLSI design for the SMO algorithm to accelerate SVM learning.

Various VLSI designs for SVM-based recognition systems have been presented [14]–[17]. Genov et al. [18] designed a mixed-signal VLSI chip that is dedicated to the most intensive of SVM operations, namely the kernel evaluation over large numbers of high-dimensional vectors. Kucher et al. [19] adopted the margin propagation principle to design an analog VLSI support vector machine whose energy consumption scales without significant degradation of performance. Manikandan et al. [20] proposed two schemes for implementing a multi-class SVM-based recognition system in a field-programmable gate array (FPGA): one exploits only logic elements, while the other uses both a soft-core processor and logic elements. The main disadvantage of the aforementioned investigations is that they perform only kernel function evaluation or decision function evaluation in the recognition phase. Anguita et al. proposed an FPGA design for SVM learning [21]. The learning algorithm has two parts: in the first part, a recurrent network is adopted to find the SVM parameters; a bisection process in the second part computes the threshold. In [22], Jiménez et al. used an analog circuit and a recurrent network to solve the QP problem in SVM learning. In [23], Genov et al. extended their previous work [18] to design a hardware implementation for performing online sequential SVM training. However, the main components in the above two investigations are based on analog circuits.

Papadonikolakis and Bouganis presented an FPGA design for nonlinear SVM training [34]. However, that investigation utilized the conventional Gilbert's algorithm rather than the SMO, the state-of-the-art algorithm for SVM training. Cao et al. [37] developed a digital design of the SMO algorithm. The synthesizable Verilog code was generated automatically from Simulink Stateflow using the Simulink HDL Coder. This high-level design flow lacked dedicated register-transfer-level or circuit-level architectures. Catanzaro et al. [38] adopted a
674 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 4, APRIL 2012
A. SVM Learning

Support vector classification [24]–[26] is a computationally efficient way of learning good separating hyperplanes in a high-dimensional feature space. Equation (1) indicates the dual presentation of the SVM primal optimization problem

$$\max_{\alpha}\; W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j)$$

subject to

$$0 \le \alpha_i \le C, \qquad \sum_{i=1}^{l} y_i \alpha_i = 0. \qquad (1)$$

TABLE I
PSEUDO-CODE PRESENTATION OF THE SMO ALGORITHM
B. SMO Algorithm

The SMO algorithm initially calculates the constraints on all multipliers, and then determines the constrained maximum. When only two multipliers are to be optimized, the constraints can be regarded as defining a 2-D square with a diagonal line segment bounded on the box border.

Without loss of generality, assume two Lagrange multipliers $\alpha_1$ and $\alpha_2$ from an old set of feasible solutions are to be optimized, starting from the initialization settings. From the constraint in (1),

$$y_1\alpha_1 + y_2\alpha_2 = y_1\alpha_1^{\mathrm{old}} + y_2\alpha_2^{\mathrm{old}}, \quad \text{i.e.,} \quad \alpha_1 + s\,\alpha_2 = \gamma$$

where $s = y_1 y_2$ and $\gamma = \alpha_1^{\mathrm{old}} + s\,\alpha_2^{\mathrm{old}}$. This equation bounds the optimization on a line, as shown in Fig. 2.

Given $\alpha_1 = \gamma - s\,\alpha_2$, $\alpha_1$ is substituted into the objective function (1). By setting the first derivative with respect to $\alpha_2$ to zero (and checking the second derivative), $\alpha_2^{\mathrm{new}}$ is obtained in (5), where $E_i$ is the prediction error

$$\alpha_2^{\mathrm{new}} = \alpha_2^{\mathrm{old}} + \frac{y_2\,(E_1 - E_2)}{\eta} \qquad (5)$$

where $\eta = K(x_1, x_1) + K(x_2, x_2) - 2K(x_1, x_2)$ and $E_i = f(x_i) - y_i$, $i = 1, 2$.

For $\alpha_2^{\mathrm{new}}$, the unconstrained maximum point must be checked to determine whether it lies in the feasible range. Accordingly, the constrained maximum is determined by clipping the unconstrained maximum to the ends of the line segment, whose lower and upper bounds are denoted $L$ and $H$. Equation (6) is the corresponding clipping function. Eventually, $\alpha_1^{\mathrm{new}}$ can be computed from $\alpha_2^{\mathrm{new,clipped}}$ using (6) and (7)

$$\alpha_2^{\mathrm{new,clipped}} = \begin{cases} H & \text{if } \alpha_2^{\mathrm{new}} \ge H \\ \alpha_2^{\mathrm{new}} & \text{if } L < \alpha_2^{\mathrm{new}} < H \\ L & \text{if } \alpha_2^{\mathrm{new}} \le L \end{cases} \qquad (6)$$

$$\alpha_1^{\mathrm{new}} = \alpha_1^{\mathrm{old}} + s\left(\alpha_2^{\mathrm{old}} - \alpha_2^{\mathrm{new,clipped}}\right). \qquad (7)$$

Platt comprehensively discussed the SMO algorithm [27].

III. VLSI DESIGN OF SMO ALGORITHM

A. Overview

This section discusses the VLSI hardware implementation of the SMO algorithm. Table I summarizes the SMO algorithm using a pseudo-code presentation. The first step is to initialize the parameters. The loop procedure comprises three main processes: selecting two Lagrange multipliers, jointly optimizing them, and updating the SVM parameters. In accordance with Table I, the SMO algorithm is partitioned into three main processes, and their corresponding VLSI modules are designed [28].

In the presented design, the SMO algorithm is implemented using fixed-point arithmetic rather than floating-point arithmetic. Distortion is caused by transforming the floating-point format into the fixed-point format. To reduce this effect, the required finite-word-length accuracy is first analyzed by simulation. In this analysis, an 8.15 fixed-point format is used, with 1 sign bit, 8 integer bits, and 15 fraction bits.

To increase the convergence speed, five interrupt signals are designed in the proposed VLSI design. Table II presents the five interrupt signals and their triggering conditions. If an interrupt-triggering condition is satisfied, the interrupt signal is thereafter reset to zero, and the Lagrange multiplier indices will be changed for the next iteration. The detailed usage of each interrupt signal is described in later sections.

The following subsections discuss the circuit design of the three processing modules.

B. Step 1: Preprocess (PreP) Module

The PreP module initializes most of the parameters in the SMO algorithm. The circuit design of the PreP module is given in Fig. 3. This module consists of several sub-blocks, including a linear kernel function, an address generator, a learned function, gamma, and a controller.
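As a software cross-check of the analytical update in Section II-B, the following minimal Python sketch performs one joint optimization of a multiplier pair. It is an illustrative reference model only, not the chip's fixed-point datapath; the endpoints L and H follow the standard box-constraint formulas of Platt's SMO, which this excerpt does not spell out.

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, K11, K22, K12, C):
    """One analytical SMO step for two Lagrange multipliers (Platt, 1998)."""
    s = y1 * y2
    # Ends of the feasible line segment inside the [0, C] x [0, C] box.
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    eta = K11 + K22 - 2.0 * K12        # curvature along the constraint line
    if eta <= 0.0 or L == H:
        return a1, a2                  # degenerate case: skip this pair
    a2_new = a2 + y2 * (E1 - E2) / eta # unconstrained maximum, Eq. (5)
    a2_new = min(max(a2_new, L), H)    # clipping to [L, H], Eq. (6)
    a1_new = a1 + s * (a2 - a2_new)    # restores the equality constraint, Eq. (7)
    return a1_new, a2_new
```

Note that the update preserves the linear constraint of (1): the weighted sum of the two multipliers is unchanged after the step.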
TABLE II
FIVE INTERRUPT SIGNALS AND THEIR TRIGGERING CONDITIONS

Fig. 4. Circuit design of the linear kernel function (step 1.1 in Table I).

Fig. 9. Circuit design for detecting R3 and performing the enabled function (step 2.2 in Table I).

Fig. 6. PreP FSM, which has two states, S0 and S1.

small, then this optimization process will take a very long time. In this investigation, the threshold is set to .

Fig. 10. First-step processing of the SPU module design (step 3.1 in Table I).

Fig. 11. Design of the second-step processing of the SPU module (step 3.2 in Table I).

In Fig. 14, an FSM with four states, S0, S1, S2, and S3, is designed in the SPU. The FSM iteratively checks whether either of two conditions holds; if so, the prediction error is set to zero. Otherwise, a further condition is checked to decide whether the prediction error is fetched from the error cache. If that condition is true, the kernel cache is fetched to acquire the corresponding kernel value. If a cache miss occurs, the required kernel value is recalculated.

Fig. 12. Example of the kernel cache design corresponding to five training samples.

To improve the kernel cache design, the symmetric property $K(x_i, x_j) = K(x_j, x_i)$ is utilized to reduce the required memory size by
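The symmetry-based saving can be sketched in software: since K(x_i, x_j) = K(x_j, x_i), only entries with a canonical index order need to be stored, and a miss triggers recomputation, mirroring the recalculation path described above. The dictionary-based cache and the linear kernel here are illustrative assumptions, not the chip's memory layout.

```python
class SymmetricKernelCache:
    """Cache kernel values, exploiting K(xi, xj) == K(xj, xi)."""

    def __init__(self, samples):
        self.samples = samples
        self.cache = {}              # keyed on (min(i, j), max(i, j))

    def _linear_kernel(self, i, j):  # recomputation path on a cache miss
        return sum(a * b for a, b in zip(self.samples[i], self.samples[j]))

    def get(self, i, j):
        key = (i, j) if i <= j else (j, i)  # canonical order halves storage
        if key not in self.cache:           # cache miss: recalculate
            self.cache[key] = self._linear_kernel(i, j)
        return self.cache[key]
```

For five training samples, at most 5·6/2 = 15 distinct entries are ever stored instead of 25, since (i, j) and (j, i) share one slot.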
Fig. 14. The FSM design in the SPU, controlled by the SMOC. Four states, S0, S1, S2, and S3, are designed in the SPU.
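The 8.15 fixed-point format adopted in Section III-A (1 sign bit, 8 integer bits, 15 fraction bits) can be illustrated with a small quantization helper. This is a behavioral sketch only: the rounding mode and saturation behavior are assumptions, since the paper does not state how the hardware rounds or handles overflow.

```python
FRAC_BITS = 15          # 15 fraction bits -> resolution of 2**-15
INT_BITS = 8            # 8 integer bits plus 1 sign bit (8.15 format)

def to_fixed(x):
    """Quantize a float to the assumed 8.15 two's-complement range."""
    scaled = round(x * (1 << FRAC_BITS))           # assumed round-to-nearest
    lo = -(1 << (INT_BITS + FRAC_BITS))            # most negative code
    hi = (1 << (INT_BITS + FRAC_BITS)) - 1         # most positive code
    return max(lo, min(hi, scaled))                # assumed saturation

def to_float(q):
    """Convert the integer code back to a real value."""
    return q / (1 << FRAC_BITS)
```

Such a model lets the finite-word-length distortion mentioned in Section III-A be measured in simulation by comparing a floating-point run against its quantized counterpart.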
E. SMO Controller

The SMOC is a finite state machine that controls the SMO processes in the multiple modules presented in Fig. 1 and manages the five interrupt signals R0 to R4. If one of the interrupt signals is triggered, then the Lagrange multiplier indices change, and the SMO returns to the initial step. Next, a new joint SMO optimization process starts. The four states in the SMOC are named S0 to S3. S0 is for initialization; S1 is for processing PreP; S2 and S3 are for processing LMU and SPU, respectively.

IV. EXPERIMENTAL RESULTS

A. Software Simulation

Without loss of generality, SVM recognition and training are realized and verified in a speaker recognition application. Speaker identification and speaker verification are the two types of speaker recognition task. In our experiments, a text-independent speaker identification system is examined. The SMD and FMMD speech datasets [30] are used for training and testing. A total of ten speakers in SMD and FMMD are selected for evaluation. The length of the training utterance of each speaker is 10 s. Each training vector is a 19-dimensional linear predictive cepstral coefficient (LPCC) vector [31]. To implement the one-versus-one recognition approach [33], [35], [36], a total of 45 hyperplanes are learned from the utterances of the ten speakers. The penalty cost in (1) is set to 1. In the recognition phase, two-class classification is performed by applying an SVM classifier to each LPCC frame and then combining the outputs for target evaluation. Multi-class classification is performed based on all the two-class classification results using the one-versus-one approach.

Table I includes an SMO optimization loop, which ends when all multipliers satisfy the KKT conditions. From the experiments, a certain number of iterations sufficed to provide the required recognition accuracy. Accordingly, the proposed system did not require that all multipliers satisfy the KKT conditions as the loop-ending condition, but rather that one of the following two conditions be satisfied, i.e.,
KUAN et al.: VLSI DESIGN OF AN SVM LEARNING CORE ON SMO ALGORITHM 681
TABLE V
RESOURCE UTILIZATION OF CYCLONE II DE2-70
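The one-versus-one setup of Section IV-A, where 45 hyperplanes cover ten speakers, follows from choosing 2 of 10 classes. A hedged sketch of the pairwise-voting decision is shown below; the `decide` callback is a placeholder for a trained binary SVM, which this excerpt does not define.

```python
from itertools import combinations

def n_pairwise_classifiers(n_classes):
    """Number of one-versus-one hyperplanes: n * (n - 1) / 2."""
    return n_classes * (n_classes - 1) // 2

def one_vs_one_predict(x, classes, decide):
    """Majority vote over all pairwise classifiers.

    `decide(a, b, x)` is a placeholder: it must return the winning class
    (a or b) of the binary SVM trained on classes a and b.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[decide(a, b, x)] += 1
    return max(votes, key=votes.get)
```

With ten speakers this enumerates exactly 45 pairwise decisions, matching the hyperplane count reported in the experiments.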
[24] D. Fradkin and I. Muchnik, "Support vector machines for classification," DIMACS Series Discrete Math. Theoretical Comput. Sci., vol. 70, pp. 13–20, 2006.
[25] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy, "Improvements to Platt's SMO algorithm for SVM classifier design," Neural Comput., vol. 13, pp. 637–649, 2001.
[26] E. Osuna, R. Freund, and F. Girosi, "Support vector machines: Training and applications," Massachusetts Inst. Technol., Cambridge, AI Memo 1602, 1997.
[27] J. C. Platt, "Sequential minimal optimization: A fast algorithm for training support vector machines," Microsoft Research, Tech. Rep. MSR-TR-98-14, 1998.
[28] T. W. Kuan, J. F. Wang, J. C. Wang, and G. H. Gu, "VLSI design of sequential minimal optimization algorithm for SVM learning," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2009, vol. 5, pp. 2509–2512.
[29] K. R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, "An introduction to kernel-based learning algorithms," IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 181–201, 2001.
[30] J. F. Wang, T. W. Kuan, J. C. Wang, and G. H. Gu, "Ubiquitous and robust text-independent speaker recognition for home automation digital life," in Proc. 5th Int. Conf. Ubiquitous Intell. Comput. (UIC), Jun. 2008, vol. 5061, pp. 297–310.
[31] X. Huang, A. Acero, and H. W. Hon, Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Englewood Cliffs, NJ: Prentice-Hall, 2001.
[32] Introduction to the Quartus II Software Handbook, Altera, San Jose, CA, 2004.
[33] C. M. Bishop, Pattern Recognition and Machine Learning. London, U.K.: Springer, 2006.
[34] M. Papadonikolakis and C. S. Bouganis, "A scalable FPGA architecture for non-linear SVM training," in Proc. Int. Conf. ICECE Technol., Dec. 2008, pp. 337–340.
[35] K. S. Goh, E. Y. Chang, and B. Li, "Using one-class and two-class SVMs for multi-class image annotation," IEEE Trans. Knowl. Data Eng., vol. 17, no. 10, pp. 1333–1346, 2005.
[36] K. Duan and S. S. Keerthi, "Which is the best multi-class SVM method? An empirical study," Control Div., Dept. Mech. Eng., Nat. Univ. Singapore, Singapore, Tech. Rep. CD-03-12, 2003.
[37] K. K. Cao, H. B. Shen, and H. F. Chen, "A parallel and scalable digital architecture for training support vector machines," J. Zhejiang Univ.-Sci. C, vol. 11, no. 8, pp. 620–628, 2010.
[38] B. C. Catanzaro, N. Sundaram, and K. Keutzer, "Fast support vector machine training and classification on graphics processors," in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 104–111.

Ta-Wen Kuan (S'10) received the B.S. degree in weapon system engineering from National Defense University, Taoyuan, Taiwan, in 1991 and the M.S. degree in computer science and information engineering from I-Shou University, Kaohsiung, Taiwan, in 2003. He is currently pursuing the Ph.D. degree in the Department of Electrical Engineering, National Cheng-Kung University, Tainan, Taiwan.
His research interests include VLSI architecture design, speech signal processing, pattern recognition, and machine learning. His working career includes MIS, DBMS, NMS, and C4ISR over ten years in the MND and Navy.

Software Co-design on Speech Signal Processing". His research interests include multimedia signal processing, including speech signal processing, image processing, and VLSI system design. He has published about 130 journal papers in IEEE, SIAM, IEICE, and IEE journals and about 200 international conference papers.
Dr. Wang was a recipient of the Outstanding Research Awards and the Outstanding Researcher Award from the National Science Council in 1990, 1995, 1997, and 2006, respectively. He also received Outstanding Industrial Awards from ACER and the Institute of Information Industry and the Outstanding Professor Award from the Chinese Engineer Association, Taiwan, in 1991 and 1996, respectively. He also received the Culture Service Award from the Ministry of Education, Taiwan, in 2008 and the Distinguished Scholar Award of KT Li from NCKU in 2009. He was invited to give keynote speeches at PACLIC 12, Singapore, in 1998; UWN 2005, Taipei; WirelessCom 2005, HI; IIH-MSP 2006, Pasadena, CA; ISM 2007, Taichung; and PCM 2008, Tainan. He also served as an Associate Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS and the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS and as Editor-in-Chief of the International Journal of Chinese Engineering from 1995 to 2000.

Jia-Ching Wang (SM'09) received the M.S. and Ph.D. degrees from National Cheng Kung University (NCKU), Tainan, Taiwan, in 1997 and 2002, respectively, both in electrical engineering.
He is currently with the Department of Computer Science and Information Engineering, National Central University, Jhongli, Taiwan, as an Assistant Professor. His research interests include signal processing and associated VLSI architecture design. He has published over 70 technical papers since 1997. He has also obtained 3 U.S. and 4 R.O.C. invention patents.
Dr. Wang is an Honor Member of Phi Tau Phi.

Po-Chuan Lin received the M.S. and Ph.D. degrees in electrical engineering from National Cheng Kung University, Tainan, Taiwan, in 2000 and 2007, respectively.
He is currently engaged in the research and development of an automatic minute/transcription generation system with the Multimedia and Embedded System Design Laboratory, Department of Electronics Engineering and Computer Science, Tung-Fang Institute of Technology, Kaohsiung, Taiwan. His research interests include speech signal processing, VLSI architecture design, interactive robot design, and embedded system design.
Dr. Lin was a recipient of the Golden Silicon Award from Macronix International Co., Ltd., in 2002. In 2008, he was selected as an invigilator for the field test of skills certification (skill category: Communication Techniques, Class C certified technician) by the Council of Labor Affairs, Executive Yuan. He is also a member of ACLCLP.