Abstract—The sequential minimal optimization (SMO) algorithm has been extensively employed to train the support vector machine (SVM). This work presents an efficient application-specific integrated circuit chip design for sequential minimal optimization. The chip is implemented as an intellectual property core, suitable for use in an SVM-based recognition system on a chip. The proposed SMO chip was tested and found to be fully functional, using a prototype system based on the Altera DE2 board with a Cyclone II 2C70 field-programmable gate array.

Index Terms—Field-programmable gate array (FPGA), sequential minimal optimization (SMO), support vector machine (SVM), VLSI design.

Manuscript received August 08, 2010; revised November 16, 2010; accepted December 30, 2010. Date of publication February 17, 2011; date of current version March 12, 2012. This work was supported by the National Science Council under Grant NSC99-2218-E-006-001.
T.-W. Kuan and J.-F. Wang are with the Department of Electrical Engineering, National Cheng-Kung University, Tainan 70101, Taiwan (e-mail: gwam.davin@gmail.com; wangjf@csie.ncku.edu.tw).
J.-C. Wang is with the Department of Computer Science and Information Engineering, National Central University, Jhongli 32001, Taiwan (e-mail: jcw@csie.ncu.edu.tw).
P.-C. Lin is with the Multimedia and Embedded System Design Laboratory, Department of Electronics Engineering and Computer Science, Tung-Fang Institute of Technology, Kaohsiung 82941, Taiwan (e-mail: tony178.lin@gmail.com).
G.-H. Gu is with the Industrial Technology Research Institute, Hsinchu 30011, Taiwan (e-mail: xquall@ms71.url.com.tw).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2011.2107533

I. INTRODUCTION

THE support vector machine (SVM) is a new statistical approach that has recently attracted substantial interest in various fields, including pattern recognition, machine learning, and bioinformatics [1]–[5]. An SVM learns by solving a constrained quadratic programming (QP) problem [6]–[8] whose size is equivalent to the number of training samples. Conventional methods [9] cannot be used for training an SVM with a large number of training samples, as the available memory cannot store all elements of a kernel-value matrix [10].

The sequential minimal optimization (SMO) algorithm proposed by Platt [11] is an extensively utilized decomposition method [12], [13] for solving the QP problem. The basic concept of the SMO algorithm is the repetition of the following two processes until an optimal solution is found: 1) selecting a fixed number of variables and 2) solving the QP problem associated with the selected variables. The SMO algorithm searches through the feasible region of the dual problem and maximizes the objective function by selecting two Lagrange multipliers and jointly optimizing them (with all others fixed) in each iteration. The power of this approach resides in the fact that the optimization problem with two Lagrange multipliers admits an analytical solution, eliminating the need for an iterative quadratic programming optimizer. Although the SMO algorithm makes SVM learning feasible for a large number of training samples, the required iterative computations still impose a heavy computational burden, especially for standalone devices. This fact motivates the development herein of a specific VLSI design for the SMO algorithm to accelerate SVM learning.

Various VLSI designs for SVM-based recognition systems have been presented [14]–[17]. Genov et al. [18] designed a mixed-signal VLSI chip that is dedicated to the most intensive of SVM operations, namely the kernel evaluation over large numbers of high-dimensional vectors. Kucher et al. [19] adopted the margin propagation principle to design an analog VLSI support vector machine whose energy consumption scales without significant degradation of performance. Manikandan et al. [20] proposed two schemes for implementing a multi-class SVM-based recognition system in a field-programmable gate array (FPGA): one exploits only logic elements, while the other uses both a soft-core processor and logic elements. The main disadvantage of the aforementioned investigations is that they perform only kernel function evaluation or decision function evaluation in the recognition phase. Anguita et al. proposed an FPGA design for SVM learning [21]. The learning algorithm has two parts: in the first part, a recurrent network is adopted to find the SVM parameters; a bisection process in the second part computes the threshold. In [22], Jiménez et al. used an analog circuit and a recurrent network to solve the QP problem in SVM learning. In [23], Genov et al. extended their previous work [18] to design a hardware implementation for performing online sequential SVM training. However, the main components in the above two investigations are based on analog circuits.

Papadonikolakis and Bouganis presented an FPGA design for nonlinear SVM training [34]. However, that investigation utilized the conventional Gilbert's algorithm rather than the SMO, the state-of-the-art algorithm for SVM training. Cao et al. [37] developed a digital design of the SMO algorithm. The synthesizable Verilog code was generated automatically from Simulink Stateflow using the Simulink HDL Coder. This high-level design flow lacked dedicated register-transfer-level or circuit-level architectures. Catanzaro et al. [38] adopted a
674 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 4, APRIL 2012
A. SVM Learning

Support vector classification [24]–[26] is a computationally efficient way of learning good separating hyperplanes in a high-dimensional feature space. Equation (1) indicates the dual presentation of the SVM primal optimization problem

$$\max_{\alpha}\; W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j)$$

subject to

$$0 \le \alpha_i \le C, \qquad \sum_{i=1}^{l} y_i \alpha_i = 0. \qquad (1)$$

TABLE I
PSEUDO-CODE PRESENTATION OF THE SMO ALGORITHM
B. SMO Algorithm

The SMO algorithm initially calculates the constraints on all multipliers, and then determines the constrained maximum. When only two multipliers are to be optimized, the constraints can be regarded as defining a 2-D square with a diagonal line segment bounded on the box border.

Without loss of generality, assume two Lagrange multipliers $\alpha_1$ and $\alpha_2$ from an old set of feasible solutions are to be optimized, starting from the initialization settings. From the constraint in (1),

$$y_1\alpha_1 + y_2\alpha_2 = y_1\alpha_1^{\mathrm{old}} + y_2\alpha_2^{\mathrm{old}}, \quad \text{i.e.,} \quad \alpha_1 + s\,\alpha_2 = \gamma$$

where $s = y_1 y_2$ and $\gamma = \alpha_1^{\mathrm{old}} + s\,\alpha_2^{\mathrm{old}}$. This equation bounds the optimization on a line, as shown in Fig. 2.

Given $\alpha_1 = \gamma - s\,\alpha_2$, $\alpha_1$ is substituted into the objective function (1). By setting the first derivative with respect to $\alpha_2$ to zero (and checking the second derivative), $\alpha_2^{\mathrm{new}}$ is obtained in (5), where $E_i$ is the prediction error

$$\alpha_2^{\mathrm{new}} = \alpha_2^{\mathrm{old}} + \frac{y_2\,(E_1 - E_2)}{\eta} \qquad (5)$$

where $\eta = K(x_1, x_1) + K(x_2, x_2) - 2K(x_1, x_2)$ and $E_i = f(x_i) - y_i$, $i = 1, 2$.

For $\alpha_2^{\mathrm{new}}$, the unconstrained maximum point must be checked to determine whether it lies in the feasible range. Accordingly, the constrained maximum is determined by clipping the unconstrained maximum to the ends of the line segment, whose lower and upper bounds are denoted $L$ and $H$. Equation (6) is the corresponding clipping function. Eventually, $\alpha_1^{\mathrm{new}}$ can be computed from $\alpha_2^{\mathrm{new,clipped}}$ using (6) and (7)

$$\alpha_2^{\mathrm{new,clipped}} = \begin{cases} H & \text{if } \alpha_2^{\mathrm{new}} \ge H \\ \alpha_2^{\mathrm{new}} & \text{if } L < \alpha_2^{\mathrm{new}} < H \\ L & \text{if } \alpha_2^{\mathrm{new}} \le L \end{cases} \qquad (6)$$

$$\alpha_1^{\mathrm{new}} = \alpha_1^{\mathrm{old}} + s\left(\alpha_2^{\mathrm{old}} - \alpha_2^{\mathrm{new,clipped}}\right). \qquad (7)$$

Platt comprehensively discussed the SMO algorithm [27].

III. VLSI DESIGN OF SMO ALGORITHM

A. Overview

This section discusses the VLSI hardware implementation of the SMO algorithm. Table I summarizes the SMO algorithm using a pseudo-code presentation. The first step is to initialize the parameters. The loop procedure comprises three main processes: selecting two Lagrange multipliers, jointly optimizing them, and updating the SVM parameters. In accordance with Table I, the SMO algorithm is partitioned into three main processes, and their corresponding VLSI modules are designed [28].

In the presented design, the SMO algorithm is implemented using fixed-point arithmetic rather than floating-point arithmetic. Distortion is caused by transforming the floating-point format into the fixed-point format. To reduce this effect, the required finite-word-length accuracy is first analyzed by simulation. In this analysis, an 8.15 fixed-point format is used, with 1 sign bit, 8 integer bits, and 15 fraction bits.

To increase the convergence speed, five interrupt signals are designed in the proposed VLSI design. Table II presents the five interrupt signals and their triggering conditions. If an interrupt-triggering condition is satisfied, the interrupt signal is thereafter reset to zero, and the Lagrange multiplier indices will be changed for the next iteration. The detailed usage of each interrupt signal is described in later sections.

The following subsections discuss the circuit design of the three processing modules.

B. Step 1: Preprocess (PreP) Module

The PreP module initializes most of the parameters in the SMO algorithm. The circuit design of the PreP module is given in Fig. 3. This module consists of several sub-blocks, including a linear kernel function, an address generator, a learned function, gamma, and a controller.
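As a software cross-check of the analytical update in Section II-B, the following minimal Python sketch performs one joint optimization of a multiplier pair. It is an illustrative reference model only, not the chip's fixed-point datapath; the endpoints L and H follow the standard box-constraint formulas of Platt's SMO, which this excerpt does not spell out.

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, K11, K22, K12, C):
    """One analytical SMO step for two Lagrange multipliers (Platt, 1998)."""
    s = y1 * y2
    # Ends of the feasible line segment inside the [0, C] x [0, C] box.
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    eta = K11 + K22 - 2.0 * K12        # curvature along the constraint line
    if eta <= 0.0 or L == H:
        return a1, a2                  # degenerate case: skip this pair
    a2_new = a2 + y2 * (E1 - E2) / eta # unconstrained maximum, Eq. (5)
    a2_new = min(max(a2_new, L), H)    # clipping to [L, H], Eq. (6)
    a1_new = a1 + s * (a2 - a2_new)    # restores the equality constraint, Eq. (7)
    return a1_new, a2_new
```

Note that the update preserves the linear constraint of (1): the weighted sum of the two multipliers is unchanged after the step.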
TABLE II
FIVE INTERRUPT SIGNALS AND THEIR TRIGGERING CONDITIONS

Fig. 4. Circuit design of the linear kernel function (step 1.1 in Table I).

Fig. 9. Circuit design for detecting R3 and performing the enabled function (step 2.2 in Table I).

Fig. 6. PreP FSM, which has two states, S0 and S1.

small, then this optimization process will take a very long time. In this investigation, the threshold is set to .

Fig. 10. First-step processing of the SPU module design (step 3.1 in Table I).

Fig. 11. Design of the second-step processing of the SPU module (step 3.2 in Table I).

In Fig. 14, an FSM with four states, S0, S1, S2, and S3, is designed in the SPU. The FSM iteratively checks whether either of two conditions holds; if so, the prediction error is set to zero. Otherwise, a further condition is checked to decide whether the prediction error is fetched from the error cache. If that condition is true, the kernel cache is fetched to acquire the corresponding kernel value. If a cache miss occurs, the required kernel value is recalculated.

Fig. 12. Example of the kernel cache design corresponding to five training samples.

To improve the kernel cache design, the symmetric property $K(x_i, x_j) = K(x_j, x_i)$ is utilized to reduce the required memory size by
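The symmetry-based saving can be sketched in software: since K(x_i, x_j) = K(x_j, x_i), only entries with a canonical index order need to be stored, and a miss triggers recomputation, mirroring the recalculation path described above. The dictionary-based cache and the linear kernel here are illustrative assumptions, not the chip's memory layout.

```python
class SymmetricKernelCache:
    """Cache kernel values, exploiting K(xi, xj) == K(xj, xi)."""

    def __init__(self, samples):
        self.samples = samples
        self.cache = {}              # keyed on (min(i, j), max(i, j))

    def _linear_kernel(self, i, j):  # recomputation path on a cache miss
        return sum(a * b for a, b in zip(self.samples[i], self.samples[j]))

    def get(self, i, j):
        key = (i, j) if i <= j else (j, i)  # canonical order halves storage
        if key not in self.cache:           # cache miss: recalculate
            self.cache[key] = self._linear_kernel(i, j)
        return self.cache[key]
```

For five training samples, at most 5·6/2 = 15 distinct entries are ever stored instead of 25, since (i, j) and (j, i) share one slot.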
Fig. 14. The FSM design in the SPU, controlled by the SMOC. Four states, S0, S1, S2, and S3, are designed in the SPU.
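The 8.15 fixed-point format adopted in Section III-A (1 sign bit, 8 integer bits, 15 fraction bits) can be illustrated with a small quantization helper. This is a behavioral sketch only: the rounding mode and saturation behavior are assumptions, since the paper does not state how the hardware rounds or handles overflow.

```python
FRAC_BITS = 15          # 15 fraction bits -> resolution of 2**-15
INT_BITS = 8            # 8 integer bits plus 1 sign bit (8.15 format)

def to_fixed(x):
    """Quantize a float to the assumed 8.15 two's-complement range."""
    scaled = round(x * (1 << FRAC_BITS))           # assumed round-to-nearest
    lo = -(1 << (INT_BITS + FRAC_BITS))            # most negative code
    hi = (1 << (INT_BITS + FRAC_BITS)) - 1         # most positive code
    return max(lo, min(hi, scaled))                # assumed saturation

def to_float(q):
    """Convert the integer code back to a real value."""
    return q / (1 << FRAC_BITS)
```

Such a model lets the finite-word-length distortion mentioned in Section III-A be measured in simulation by comparing a floating-point run against its quantized counterpart.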
E. SMO Controller

The SMOC is a finite state machine that controls the SMO processes in the multiple modules presented in Fig. 1 and manages the five interrupt signals R0 to R4. If one of the interrupt signals is triggered, then the Lagrange multiplier indices change, and the SMO returns to the initial step. Next, a new joint SMO optimization process starts. The four states in the SMOC are named S0 to S3. S0 is for initialization; S1 is for processing PreP; S2 and S3 are for processing LMU and SPU, respectively.

IV. EXPERIMENTAL RESULTS

A. Software Simulation

Without loss of generality, SVM recognition and training are realized and verified in a speaker recognition application. Speaker identification and speaker verification are the two types of speaker recognition task. In our experiments, a text-independent speaker identification system is examined. The SMD and FMMD speech datasets [30] are used for training and testing. A total of ten speakers in SMD and FMMD are selected for evaluation. The length of the training utterance of each speaker is 10 s. Each training vector is a 19-dimensional linear predictive cepstral coefficient (LPCC) vector [31]. To implement the one-versus-one recognition approach [33], [35], [36], a total of 45 hyperplanes are learned from the utterances of the ten speakers. The penalty cost in (1) is set to 1. In the recognition phase, two-class classification is performed by applying an SVM classifier to each LPCC frame and then combining the outputs for target evaluation. Multi-class classification is performed based on all the two-class classification results using the one-versus-one approach.

Table I includes an SMO optimization loop, which ends when all multipliers satisfy the KKT conditions. From the experiments, a certain number of iterations sufficed to provide the required recognition accuracy. Accordingly, the proposed system did not require that all multipliers satisfy the KKT conditions as the loop-ending condition, but rather that one of the following two conditions be satisfied, i.e.,
KUAN et al.: VLSI DESIGN OF AN SVM LEARNING CORE ON SMO ALGORITHM 681
TABLE V
RESOURCE UTILIZATION OF CYCLONE II DE2-70
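The one-versus-one setup of Section IV-A, where 45 hyperplanes cover ten speakers, follows from choosing 2 of 10 classes. A hedged sketch of the pairwise-voting decision is shown below; the `decide` callback is a placeholder for a trained binary SVM, which this excerpt does not define.

```python
from itertools import combinations

def n_pairwise_classifiers(n_classes):
    """Number of one-versus-one hyperplanes: n * (n - 1) / 2."""
    return n_classes * (n_classes - 1) // 2

def one_vs_one_predict(x, classes, decide):
    """Majority vote over all pairwise classifiers.

    `decide(a, b, x)` is a placeholder: it must return the winning class
    (a or b) of the binary SVM trained on classes a and b.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[decide(a, b, x)] += 1
    return max(votes, key=votes.get)
```

With ten speakers this enumerates exactly 45 pairwise decisions, matching the hyperplane count reported in the experiments.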
[24] D. Fradkin and I. Muchnik, "Support vector machines for classification," DIMACS Series Discrete Math. Theoretical Comput. Sci., vol. 70, pp. 13–20, 2006.
[25] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy, "Improvements to Platt's SMO algorithm for SVM classifier design," Neural Comput., vol. 13, pp. 637–649, 2001.
[26] E. Osuna, R. Freund, and F. Girosi, "Support vector machines: Training and applications," Massachusetts Inst. Technol., Cambridge, AI Memo 1602, 1997.
[27] J. C. Platt, "Sequential minimal optimization: A fast algorithm for training support vector machines," Microsoft Research, Tech. Rep. MSR-TR-98-14, 1998.
[28] T. W. Kuan, J. F. Wang, J. C. Wang, and G. H. Gu, "VLSI design of sequential minimal optimization algorithm for SVM learning," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2009, vol. 5, pp. 2509–2512.
[29] K. R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, "An introduction to kernel-based learning algorithms," IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 181–201, 2001.
[30] J. F. Wang, T. W. Kuan, J. C. Wang, and G. H. Gu, "Ubiquitous and robust text-independent speaker recognition for home automation digital life," in Proc. 5th Int. Conf. Ubiquitous Intell. Comput. (UIC), Jun. 2008, vol. 5061, pp. 297–310.
[31] X. Huang, A. Acero, and H. W. Hon, Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Englewood Cliffs, NJ: Prentice-Hall, 2001.
[32] Introduction to the Quartus II Software Handbook, Altera, San Jose, CA, 2004.
[33] C. M. Bishop, Pattern Recognition and Machine Learning. London, U.K.: Springer, 2006.
[34] M. Papadonikolakis and C. S. Bouganis, "A scalable FPGA architecture for non-linear SVM training," in Proc. Int. Conf. ICECE Technol., Dec. 2008, pp. 337–340.
[35] K. S. Goh, E. Y. Chang, and B. Li, "Using one-class and two-class SVMs for multi-class image annotation," IEEE Trans. Knowl. Data Eng., vol. 17, no. 10, pp. 1333–1346, 2005.
[36] K. Duan and S. S. Keerthi, "Which is the best multi-class SVM method? An empirical study," Control Div., Dept. Mech. Eng., Nat. Univ. Singapore, Singapore, Tech. Rep. CD-03-12, 2003.
[37] K. K. Cao, H. B. Shen, and H. F. Chen, "A parallel and scalable digital architecture for training support vector machines," J. Zhejiang Univ.-Sci. C, vol. 11, no. 8, pp. 620–628, 2010.
[38] B. C. Catanzaro, N. Sundaram, and K. Keutzer, "Fast support vector machine training and classification on graphics processors," in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 104–111.

Ta-Wen Kuan (S'10) received the B.S. degree in weapon system engineering from National Defense University, Taoyuan, Taiwan, in 1991 and the M.S. degree in computer science and information engineering from I-Shou University, Kaohsiung, Taiwan, in 2003. He is currently pursuing the Ph.D. degree in the Department of Electrical Engineering, National Cheng-Kung University, Tainan, Taiwan.
His research interests include VLSI architecture design, speech signal processing, pattern recognition, and machine learning. His working career includes MIS, DBMS, NMS, and C4ISR over ten years in the MND and Navy.

Software Co-design on Speech Signal Processing". His research interests include multimedia signal processing, including speech signal processing, image processing, and VLSI system design. He has published about 130 journal papers in IEEE, SIAM, IEICE, and IEE journals and about 200 international conference papers.
Dr. Wang was a recipient of the Outstanding Research Awards and the Outstanding Researcher Award from the National Science Council in 1990, 1995, 1997, and 2006, respectively. He also received Outstanding Industrial Awards from ACER and the Institute of Information Industry and the Outstanding Professor Award from the Chinese Engineer Association, Taiwan, in 1991 and 1996, respectively. He also received the Culture Service Award from the Ministry of Education, Taiwan, in 2008 and the Distinguished Scholar Award of KT Li from NCKU in 2009. He was invited to give keynote speeches at PACLIC 12, Singapore, in 1998; UWN 2005, Taipei; WirelessCom 2005, HI; IIH-MSP 2006, Pasadena, CA; ISM 2007, Taichung; and PCM 2008, Tainan. He also served as an Associate Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS and the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS and as Editor-in-Chief of the International Journal of Chinese Engineering from 1995 to 2000.

Jia-Ching Wang (SM'09) received the M.S. and Ph.D. degrees from National Cheng Kung University (NCKU), Tainan, Taiwan, in 1997 and 2002, respectively, both in electrical engineering.
He is currently with the Department of Computer Science and Information Engineering, National Central University, Jhongli, Taiwan, as an Assistant Professor. His research interests include signal processing and associated VLSI architecture design. He has published over 70 technical papers since 1997. He has also obtained 3 U.S. and 4 R.O.C. invention patents.
Dr. Wang is an Honor Member of Phi Tau Phi.

Po-Chuan Lin received the M.S. and Ph.D. degrees in electrical engineering from National Cheng Kung University, Tainan, Taiwan, in 2000 and 2007, respectively.
He is currently engaged in the research and development of an automatic minute/transcription generation system with the Multimedia and Embedded System Design Laboratory, Department of Electronics Engineering and Computer Science, Tung-Fang Institute of Technology, Kaohsiung, Taiwan. His research interests include speech signal processing, VLSI architecture design, interactive robot design, and embedded system design.
Dr. Lin was a recipient of the Golden Silicon Award from Macronix International Co., Ltd., in 2002. In 2008, he was selected as an invigilator for the field test of skills certification (skill category: Communication Techniques, Class C certified technician) by the Council of Labor Affairs, Executive Yuan. He is also a member of ACLCLP.