$y_i \in \{-1, 1\}$   (2.1)

with the goal of finding the function $f(x, \alpha_0)$, from the set of functions $f(x, \alpha)$, that can best approximate the function $y = f(x)$. The training data is considered linearly separable if the data can be separated with a linear hyperplane,

$f(x, \alpha) = (w \cdot x) + b$   (2.2)
There may be many different hyperplanes that linearly separate the training data, but only one has the largest gap between the two classes: the maximum margin hyperplane. The maximum margin hyperplane must satisfy the following condition:

$y_i \left[ (w \cdot x_i) + b \right] \geq 1$   (2.3)
while having the minimal norm:
$\|w\|^2 = (w \cdot w)$   (2.4)
Figure 7. Maximum margin hyperplane
In Figure 7 the two classes are separated by the hyperplane H; H1 and H2 are the margin hyperplanes through the closest points of each class. The distance from H to H1 is:

$\frac{|w \cdot x + b|}{\|w\|} = \frac{1}{\|w\|}$   (2.5)
and from H1 to H2 is:

$\frac{2}{\|w\|}$   (2.6)
If the training data cannot be linearly separated, we can use a soft margin hyperplane, which allows misclassification through the slack variables $\xi_i$:

$y_i \left[ (w \cdot x_i) + b \right] \geq 1 - \xi_i, \quad \xi_i \geq 0$   (2.8)
The optimal separating hyperplane for both the separable and the non-separable case is found at the saddle point of the Lagrangian:

$L(w, b, \xi, \alpha) = \frac{1}{2}(w \cdot w) + C \sum_{i} \xi_i - \sum_{i} \alpha_i \left( y_i \left[ (w \cdot x_i) + b \right] - 1 + \xi_i \right)$   (2.9)

where the Lagrange multipliers satisfy $\alpha_i \geq 0$ and the slack variables $\xi_i \geq 0$.
Since $w = \sum_{i} \alpha_i y_i x_i$, where $\alpha$ is the maximum point of:

$W(\alpha) = \sum_{i} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$   (2.10)

subject to $\sum_{i} \alpha_i y_i = 0$ and $0 \leq \alpha_i \leq C$.
The separating hyperplane is:
$f(x) = \sum_{i} \alpha_i y_i (x_i \cdot x) + b$   (2.11)
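As a concrete illustration, here is a minimal Matlab sketch of evaluating Eq. 2.11 for a new point, assuming hypothetical variables alpha, y, X, and b obtained from training (the toy values below are for illustration only and are not from the thesis experiments):

% Minimal sketch of Eq. 2.11: evaluate the decision function f(x).
% Hypothetical inputs: alpha (n-by-1 multipliers), y (n-by-1 labels,
% +1/-1), X (n-by-d training points as rows), b (bias), xq (1-by-d query).
alpha = [0.5; 0.5]; y = [1; -1];        % toy values for illustration
X = [1 1; -1 -1]; b = 0; xq = [0.5 2];
f = sum(alpha .* y .* (X * xq')) + b;   % f(x) = sum_i alpha_i y_i (x_i . x) + b
label = sign(f);                        % predicted class (+1 or -1)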
In SVM, we sometimes need to map input vectors non-linearly into a high-dimensional feature space. Having many inputs increases the complexity of classification, and mapping into a high-dimensional feature space allows for a better decision surface. Using a polynomial kernel allows for better classification:
$K(x_i, x_j) = \left( (x_i \cdot x_j) + 1 \right)^d, \quad i, j = 1, \ldots, \ell$   (2.12)
where d is the degree of the polynomial.
We can perform the kernel trick of replacing every dot product by the nonlinear polynomial kernel function to create our nonlinear classifier. The kernel trick lets us map observations from a general set D into an inner product space S while requiring only dot products between the vectors in S. In other words, the kernel trick permits high-dimensional dot products to be computed within the original space.
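As a small sketch of this idea (toy data, not from the thesis experiments), the Gram matrix for the polynomial kernel of Eq. 2.12 can be computed entirely from dot products in the original space:

% Kernel trick sketch: the Gram matrix for the polynomial kernel of
% Eq. 2.12 uses only dot products between input vectors; the
% high-dimensional feature vectors are never formed explicitly.
X = [0 1; 1 0; 1 1];        % toy set of three 2-D input vectors (rows)
d = 2;                      % degree of the polynomial
K = (X * X' + 1) .^ d;      % K(i,j) = ((x_i . x_j) + 1)^d
% Replacing every dot product (x_i . x_j) in Eqs. 2.10 and 2.11 with
% K(x_i, x_j) yields the nonlinear classifier.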
To build a multi-class classifier, we perform multiple binary classifications to reduce the single multiclass problem (7). There are two general approaches to multiclass classification: one-versus-all and one-versus-one. In a one-versus-all strategy, a single classifier is trained per class to separate it from all the other classes; after running every binary classifier, the prediction with the highest confidence is chosen. In a one-versus-one strategy, a classifier is trained for every pair of classes and assigns each instance to one of its two classes; the assigned class gets one vote, and in the end the class with the most votes determines the classification, as sketched below.
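A minimal Matlab sketch of the one-versus-one vote count, with a hypothetical matrix pred of pairwise predictions (libsvm performs this voting internally):

% One-versus-one voting sketch. pred(i, p) holds the class chosen by
% pairwise classifier p for test point i; the class with the most
% votes wins. The toy values below are for illustration only.
k = 5;                                   % classes: APC, PVC, LBBB, RBBB, Normal
pred = [1 1 3; 2 5 2];                   % 2 test points, 3 pairwise classifiers
votes = zeros(size(pred, 1), k);
for p = 1:size(pred, 2)
    for i = 1:size(pred, 1)
        c = pred(i, p);                  % class assigned by classifier p
        votes(i, c) = votes(i, c) + 1;   % the assigned class gets one vote
    end
end
[~, winner] = max(votes, [], 2);         % most votes determines the class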
FEATURE EXTRACTION
In the discrete wavelet transform, the signal $x$ is passed through a low-pass filter with impulse response $b(n)$, resulting in the convolution (8):

$y[n] = (x * b)[n] = \sum_{k=-\infty}^{\infty} x[k] \, b[n-k]$   (3.1)
The signal is also decomposed simultaneously using a high-pass filter. The outputs are the approximation and detail coefficients from the low-pass and high-pass filters, respectively (9):

$y_{\text{low}}[n] = \sum_{k=-\infty}^{\infty} x[k] \, b[2n-k]$   (low-pass filter output)   (3.2)

$y_{\text{high}}[n] = \sum_{k=-\infty}^{\infty} x[k] \, g[2n-k]$   (high-pass filter output)   (3.3)
Because each output is downsampled by two, the result has one half the time resolution of the input.
Figure 8. Filter analysis
The decomposition is repeated and cascaded to create a filter bank.
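A one-level sketch of this analysis step with the Matlab Wavelet Toolbox (toy signal; the thesis code in the appendix uses the multisignal functions mdwtdec/mdwtrec instead):

% One level of the analysis filter bank of Eqs. 3.2-3.3: dwt filters
% the signal with the low- and high-pass decomposition filters and
% downsamples by two.
x = sin(2*pi*(0:300)/60);       % toy 301-sample signal
[cA, cD] = dwt(x, 'sym4');      % approximation (cA) and detail (cD)
% Cascading: feeding the approximation back into dwt produces the
% next level of the filter bank, as in Figure 9.
[cA2, cD2] = dwt(cA, 'sym4');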
Figure 9. 3 level filter bank of decomposition
Running the decomposition in reverse is called reconstruction. At every level, the detail and approximation coefficients are upsampled by two and passed through a low-pass and a high-pass filter.
Figure 10. 3 level filter bank of reconstruction
The process of decomposition and reconstruction allows for de-noising and compression of the signal while maintaining its unique features, without affecting the accuracy rate of classification (10).
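As a sketch of the de-noising idea (one level only, with a toy signal and a hand-picked threshold; this is not the thesis procedure):

% De-noising sketch: threshold the detail coefficients, then invert
% the decomposition with idwt, which upsamples by two and applies the
% synthesis filters.
x = sin(2*pi*(0:300)/60) + 0.1*randn(1, 301);  % toy noisy signal
[cA, cD] = dwt(x, 'sym4');
cD = wthresh(cD, 's', 0.2);                    % soft threshold the details
xr = idwt(cA, cD, 'sym4', numel(x));           % reconstruct at original length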
NUMERICAL EXPERIMENTS
The ECG files were acquired from PhysioNet.org. PhysioNet offers free web access to large collections of recorded physiologic signals (PhysioBank) and related open-source software (PhysioToolkit) (11). From the website, twenty records were downloaded from twenty different patients. Each record was one hour long, from patients ranging in age from 24 to 87 years. The files were in .mat format and came with an annotation file for each ECG. The annotation files marked each heart beat with a label on the R peak of the QRS complex wave, and based on the category of the arrhythmia each beat was given a letter; the letter identified each arrhythmia within the ECG signal. Matlab was the software of choice for its numerical computing environment (12). The files were loaded into Matlab together with a software extension package called libsvm, integrated software for support vector classification that supports multi-class classification (13). Feature extraction was done in Matlab via wavelet decomposition.
The annotation files were manually imported one by one into Matlab and the datasets entered into matrices. The beats were searched by the single letter indicating the arrhythmia (A = APC, V = PVC, L = LBBB, R = RBBB, N = Normal), and all the categories were grouped and saved as separate .mat files. The procedure was repeated for each patient file. Next, each hour-long ECG file was loaded into Matlab, and using the sampling rate of 360 Hz and the annotation time stamps, each labeled heart beat was extracted. The result was a 301-sample heart beat that included the P wave, QRS complex, and T wave from start to finish. A second Matlab script combined all the groups into one training file containing 5400 heart beats in a 5400-by-301 matrix. The appropriate training labels were assigned and SVM classification was produced with the libsvm package. The test file was created in similar fashion to the training dataset, but with only 2036 samples. Libsvm offers multiple types of kernel functions; in this paper we look only at the linear kernel and polynomial kernels of 2nd, 3rd, and 4th order. The cost parameter was varied from 10^-3 to 10^3, and the best value was selected using 10-fold cross validation. The gamma parameter in the kernel function was also tuned with 10-fold cross validation. After the first classification on the original signal files, feature extraction was performed to reduce the number of samples in the ECG. Having fewer samples lets us run faster analyses while maintaining a small memory size. The samples in the ECG signal were reduced from 301 to 154, 80, 43, 25, 16, 11, and 9 in a seven-level reconstruction.
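A Matlab sketch of the beat extraction and the cross-validated cost search (the variable names ecg and annSamples are hypothetical; libsvm's svmtrain returns the cross-validation accuracy when the -v option is given):

% Beat extraction sketch: take 150 samples on each side of every
% annotated R peak, assuming each peak lies at least 150 samples from
% the record edges. ecg and annSamples are hypothetical variables.
half = 150;
beats = zeros(numel(annSamples), 2*half + 1);   % one 301-sample beat per row
for i = 1:numel(annSamples)
    r = annSamples(i);                          % R-peak index from annotation
    beats(i, :) = ecg(r - half : r + half);     % P wave, QRS complex, T wave
end
% Cost grid with 10-fold cross validation on a linear kernel.
for logC = -3:3                                 % C from 10^-3 to 10^3
    opts = sprintf('-t 0 -c %g -v 10', 10^logC);
    acc = svmtrain(labels, beats, opts);        % returns CV accuracy with -v
end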
Figure 11. ECG graph of the original 301 samples vs. level 5 feature extraction with 25 samples
Using the reduced sample sizes, SVM classification was redone at each level, and my algorithm maintained a high accuracy.
To evaluate the performance of the classifier, we first set up the kernel function for linear classification by setting the kernel option flag to -t 0 and the cost parameter to C = 10^-3. After training and testing, our accuracy was 99.26%. Next we changed the cost to C = 10^-2, then C = 10^-1, and so on up to C = 10^3, and the accuracy remained the same at 99.26%.
To properly visualize the classification performance and compare the accuracy, a confusion matrix was computed. The results confirm that the high accuracy is valid and that the classifier is efficient at providing true positive classifications of the test data.
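A sketch of how such a confusion matrix can be tallied from libsvm's output, using tlabels and predicted_label as in the appendix code (confusionmat from the Statistics and Machine Learning Toolbox is an equivalent one-liner):

% Confusion matrix sketch: rows are actual classes, columns predicted.
% tlabels and predicted_label are the vectors around svmpredict.
k = 5;
cm = zeros(k);
for i = 1:numel(tlabels)
    a = tlabels(i); p = predicted_label(i);
    cm(a, p) = cm(a, p) + 1;
end
% cm = confusionmat(tlabels, predicted_label);  % toolbox alternative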
The next step in classification was to change the kernel function to a second-order polynomial. This is done by setting the option flags -t 1 -d 2, where -d sets the degree, or order, of the classifier. We tested by changing the cost uniformly from C = 10^-3 to C = 10^3. After training and testing the dataset, our results were the same as for linear classification: 99.26%. Changing the order of classification from 2nd to 3rd and 4th order had no effect on accuracy, which remained at 99.26%. Computing a confusion matrix yielded the same results as for the linear kernel, so Tables 2 and 3 apply to the polynomial kernels as well.
Table 1. Accuracy on testing results
C 10^-3 10^-2 10^-1 10^0 10^1 10^2 10^3
Linear 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
2nd order 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
3rd order 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
4th order 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
Table 2. Confusion Matrix
Actual/Predicted APC PVC LBBB RBBB Normal
APC 74 1 0 0 1
PVC 0 106 0 0 1
LBBB 0 10 481 0 0
RBBB 0 0 0 625 0
Normal 1 1 0 0 735
Table 3. Table of confusion
True Positive False Negative False Positive True Negative
APC 74 2 1 1959
PVC 106 1 12 1917
LBBB 481 10 0 1545
RBBB 625 0 0 1411
Normal 735 2 2 1297
Our next focus was to evaluate how well feature extraction reduced the signal file while maintaining its important characteristics. In the seven-level feature extraction, the 301 samples were reduced, then trained and tested at each level, and the accuracy was recorded. First, linear classification was performed. For the first three levels, the features were reduced from 301 samples to 154, 80, and 43, and the accuracy remained unchanged at 99.26%; the results are listed in Table 4. Varying the cost produced no change. On the fourth level, where the features were reduced to 25 samples, the accuracy went down to 99.17%, and adjusting the cost from C = 10^-3 to 10^3 had no effect. Changing the cost to C = 10^-4, however, gave an accuracy of 99.12%, and C = 10^-5 increased it to 99.26%, the same accuracy rate as our first evaluation with 301 samples. The accuracy at level 6, with only 11 features, was 97.84%, and 95.04% at level 7 with only 9 samples; here changing the cost does have a slight effect on accuracy. The rest of the feature extraction results can be viewed in Tables 5 through 7.
Table 4. Feature extraction on linear kernel
Linear kernel
C 10^-3 10^-2 10^-1 10^0 10^1 10^2 10^3
Level 1 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
Level 2 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
Level 3 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
Level 4 99.17% 99.17% 99.17% 99.17% 99.17% 99.17% 99.17%
Level 5 98.62% 98.67% 98.58% 98.72% 98.67% 98.67% 98.53%
Level 6 97.94% 98.08% 97.69% 97.84% 97.59% 97.79% 97.69%
Level 7 96.02% 95.78% 95.87% 95.04% 95.24% 95.73% 95.14%
Table 5. Feature extraction on 2nd order kernel
2nd order polynomial kernel
C 10^-3 10^-2 10^-1 10^0 10^1 10^2 10^3
Level 1 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
Level 2 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
Level 3 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
Level 4 99.07% 99.07% 99.07% 99.07% 99.07% 99.07% 99.07%
Level 5 98.82% 98.82% 98.82% 98.82% 98.82% 98.82% 98.82%
Level 6 98.43% 98.33% 98.33% 98.38% 97.54% 98.28% 97.54%
Level 7 97.3% 97.05% 93.27% 96.76% 95.92% 95.92% 96.41%
Table 6. Feature extraction on 3rd order kernel
3rd order polynomial kernel
C 10^-3 10^-2 10^-1 10^0 10^1 10^2 10^3
Level 1 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
Level 2 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
Level 3 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
Level 4 99.07% 99.07% 99.07% 99.07% 99.07% 99.07% 99.07%
Level 5 98.77% 98.77% 98.77% 98.77% 98.77% 98.77% 98.77%
Level 6 98.08% 98.08% 98.08% 98.08% 98.08% 98.08% 98.08%
Level 7 97.05% 96.66% 97.45% 97% 96.56% 97.25% 96.76%
Table 7. Feature extraction on 4th order kernel
4th order polynomial kernel
C 10^-3 10^-2 10^-1 10^0 10^1 10^2 10^3
Level 1 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
Level 2 99.26% 99.26% 99.26% 99.26% 99.26% 99.26% 99.26%
Level 3 99.31% 99.31% 99.31% 99.31% 99.31% 99.31% 99.31%
Level 4 99.07% 99.07% 99.07% 99.07% 99.07% 99.07% 99.07%
Level 5 98.92% 98.92% 98.92% 98.92% 98.92% 98.92% 98.92%
Level 6 97.99% 97.99% 97.99% 97.99% 97.99% 97.99% 97.99%
Level 7 96.91% 96.66% 96.86% 96.71% 96.94% 96.66% 96.51%
The results from all four tables show only a small variance in accuracy when changing the cost or the kernel type. The best result came from the 4th order polynomial after level 3 feature extraction, at a rate of 99.31%, and the worst was the 2nd order polynomial at level 7, at a rate of 93.27%.
DISCUSSION
In my paper I describe the effects of support vector machine classification on ECG signals for arrhythmia classification, together with seven-level feature extraction. Classification was done with Matlab and the libsvm software package: twenty hour-long ECG signal files were imported and analyzed, and each heart beat was collected into its appropriate category. Once categorized and labeled, the heart beats were used for training, classification testing was done on a second set of arrhythmias, and the accuracy rate was recorded. Several kernel functions were used, including linear and higher-order polynomials, to provide better classification. The results show the high accuracy and effectiveness of SVM on ECG signals. Through proper feature extraction, we were able to reduce the sample set of each heart beat from 301 samples to 9 samples while maintaining a high accuracy rate. Such a reduction of samples reduces the memory size and processing time of classification without the cost of accuracy.
ECG signals are very important because they offer a noninvasive approach to diagnosing various heart conditions. Through early detection we can help treat heart disease and prevent problems from forming. My paper brought forth the ease of arrhythmia detection through SVM classification. The feature reduction keeps memory usage low, which will allow such SVM algorithms to be incorporated into mobile devices and become readily available to more patients, who can self-monitor their own heartbeat.
BIBLIOGRAPHY
1. Hoyert, Donna L. Deaths: Preliminary Data for 2011. 2012, Vol. 61, No. 6.
2. American Heart Association. [Online] [Cited: April 1, 2013.] www.americanheart.org/.
3. ECGpedia.org. [Online] [Cited: March 15, 2012.] http://en.ecgpedia.org/wiki/Atrial_Premature_Complexes.
4. Dave, Jatin, MD, MPH. Ventricular Premature Complexes. http://emedicine.medscape.com/. [Online] Nov. 12, 2012.
5. Yanowitz, Frank G., M.D. RBBB. ECG Learning Center. [Online] 2006. http://ecg.utah.edu/.
6. Syeda-Mahmood, T. Shape-based Matching. Engineering in Medicine and Biology Society, 2007.
7. Duan, Kai-Bo and Keerthi, S. Sathiya. Which Is the Best Multiclass SVM Method? An Empirical Study. Proceedings of the Sixth International Workshop on Multiple Classifier Systems, 2005.
8. Akansu, Ali N. and Haddad, Richard A. Multiresolution Signal Decomposition: Transforms, Subbands, and Wavelets. Boston, MA: Academic Press, 1992.
9. Mallat, S. A Wavelet Tour of Signal Processing, 2nd ed. 1999.
10. Ho, Tatt Wei and Jeoti, V. A Wavelet Footprints-Based Compression Scheme for ECG Signals. 2004.
11. PhysioNet. [Online] [Cited: September 5, 2012.] www.physionet.org.
12. Matlab. [Online] [Cited: October 2, 2011.] http://www.mathworks.com/products/matlab/.
13. Chang, Chih-Chung and Lin, Chih-Jen. Libsvm. [Online] [Cited: September 2012.] http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
14. Chang, Chih-Chung and Lin, Chih-Jen. LIBSVM: A Library for Support Vector Machines. Taipei, Taiwan, 2001.
APPENDIX
Matlab code
%%
clc
close
clear
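% Load the per-arrhythmia beat matrices; each .mat file stores its
% beats in the variable poi2 (a* = APC, p* = PVC, l = LBBB, r = RBBB,
% n* = Normal).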
load('sapc1.mat');
a1 =poi2;
load('sapc2.mat');
a2 =poi2;
load('spvc1.mat');
p1 =poi2;
load('spvc2.mat');
p2 =poi2;
load('spvc3.mat');
p3 =poi2;
load('slbbb1.mat');
l =poi2;
load('srbbb1.mat');
r =poi2;
load('snor1.mat');
n1 =poi2;
load('snor2.mat');
n2 =poi2;
load('snor3.mat');
n3 =poi2;
load('snor4.mat');
n4 =poi2;
%%
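% Build the 5400-by-301 training matrix and its class labels
% (1 = APC, 2 = PVC, 3 = LBBB, 4 = RBBB, 5 = Normal), then train a
% degree-4 polynomial SVM (-t 1 -d 4) with cost C = 0.001 and predict
% on the 2036-beat test set.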
train =[a2(1:300,:); p2(1:400,:); l(1:2000,:); r(1:1200,:); n2(1:1500,:)];
labels =[ones(300,1); 2*ones(400,1); 3*ones(2000,1); 4*ones(1200,1); 5*ones(1500,1)];
model =svmtrain(labels, train, '-t 1 -d 4 -c .001 ');
test =[a2(301:376,:); p2(401:507,:); l(2001:2491,:); r(1201:1825,:); n2(1501:2237,:)];
tlabels =[ones(76,1); 2*ones(107,1); 3*ones(491,1); 4*ones(625,1); 5*ones(737,1)];
%test =[a1(1:76,:); p2(401:428,:); p1; p3; l(2001:2491,:); r(1201:1825,:); n1(1:737,:)];
[predicted_label, accuracy, dec] =svmpredict(tlabels, test , model);
%% Coefficients of approximations at level 1
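% mdwtdec performs a 7-level rowwise ('r') multisignal wavelet
% decomposition with the sym4 wavelet; mdwtrec(...,'ca',k) returns the
% level-k approximation coefficients used as the reduced feature set.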
ca1 =mdwtdec('r',train,7,'sym4');
ca1t =mdwtdec('r',test,7,'sym4');
train2 =mdwtrec(ca1,'ca',1);
test2 =mdwtrec(ca1t,'ca',1);
model2 =svmtrain(labels, train2, '-t 1 -d 4 -c .001');
[predicted_label, accuracy, dec] =svmpredict(tlabels, test2 , model2);
%% Coefficients of approximations at level 2
train3 =mdwtrec(ca1,'ca',2);
test3 =mdwtrec(ca1t,'ca',2);
model3 =svmtrain(labels, train3, '-t 1 -d 4 -c 1000 ');
[predicted_label, accuracy, dec] =svmpredict(tlabels, test3 , model3);
%% Coefficients of approximations at level 3
train4 =mdwtrec(ca1,'ca',3);
test4 =mdwtrec(ca1t,'ca',3);
model4 =svmtrain(labels, train4, '-t 1 -d 4 -c .001 ');
[predicted_label, accuracy, dec] =svmpredict(tlabels, test4 , model4);
%% Coefficients of approximations at level 4
train5 =mdwtrec(ca1,'ca',4);
test5 =mdwtrec(ca1t,'ca',4);
model5 =svmtrain(labels, train5, '-t 1 -d 4 -c 1000 ');
[predicted_label, accuracy, dec] =svmpredict(tlabels, test5 , model5);
%% Coefficients of approximations at level 5
train6 =mdwtrec(ca1,'ca',5);
test6 =mdwtrec(ca1t,'ca',5);
model6 =svmtrain(labels, train6, '-t 1 -d 4 -c 1 ');
[predicted_label, accuracy, dec] =svmpredict(tlabels, test6 , model6);
%% Coefficients of approximations at level 6
train7 =mdwtrec(ca1,'ca',6);
test7 =mdwtrec(ca1t,'ca',6);
model7 =svmtrain(labels, train7, '-t 1 -d 4 -c 1000 ');
[predicted_label, accuracy, dec] =svmpredict(tlabels, test7 , model7);
%% Coefficients of approximations at level 7
train8 =mdwtrec(ca1,'ca',7);
test8 =mdwtrec(ca1t,'ca',7);
model8 =svmtrain(labels, train8, '-t 1 -d 4 -c .01 ');
[predicted_label, accuracy, dec] =svmpredict(tlabels, test8 , model8);
Figure 12. QRS complex wave with P and T wave.
Figure 13. ECG of a heart beat with APC
Figure 14. ECG of a heart beat with VPC
Figure 15. ECG of a heart beat with LBBB
Figure 16. ECG graph showing a heartbeat with RBBB
Figure 17. ECG of a normal heart beat.