SUPPORT VECTOR MACHINES

ANEEK ANWAR
2012-MS-EE-067

INTRODUCTION

SVM is one of the most popular machine learning techniques for classifying data into two classes. SVM finds the best possible hyperplane between the two classes using a subset of the data points, called support vectors. The margin is the maximal width of the slab parallel to the hyperplane that has no interior data points. This report examines the position of the hyperplane, and the support vectors that determine it, for various configurations of data points. All data points were 2-dimensional so that the results could be shown graphically. In total, 5 experiments were performed: 2 for separable classes and 3 for non-separable classes. All experiments were performed in MATLAB R2013a using its built-in functions.
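A minimal sketch of how a linear SVM uses its hyperplane once training is done, assuming the hyperplane is written as $\langle \mathbf{w}, \mathbf{x} \rangle + b = 0$ with weight vector $\mathbf{w}$ and bias $b$ (the bias is not named elsewhere in this report). A point is assigned to a class by the sign of $\langle \mathbf{w}, \mathbf{x} \rangle + b$, and when the margin boundaries are the lines $\langle \mathbf{w}, \mathbf{x} \rangle + b = \pm 1$, the margin width is $2/\lVert\mathbf{w}\rVert$. The hyperplane below is hypothetical, not taken from any of the experiments.

```python
import math

def decision_value(w, b, x):
    """Signed score <w, x> + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the lines <w, x> + b = +1 and <w, x> + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane x1 = 0, i.e. a vertical line along the y-axis.
w, b = (0.5, 0.0), 0.0
print(classify(w, b, (3.0, 1.0)))    # 1
print(classify(w, b, (-2.5, 0.0)))   # -1
print(margin_width(w))               # 4.0
```

Only support vectors influence $\mathbf{w}$ and $b$, which is why the remaining points can move freely outside the margin without changing the classifier.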

SEPARABLE CLASSES CASE I

In the first experiment, 6 data points were taken, 3 from each class, and all of them were placed outside the margin. The plot generated by MATLAB is shown in Fig. 1.

Figure 1

As the plot shows, 4 of the 6 points were used as support vectors to determine the best hyperplane. Their values of lambda (alpha in MATLAB) are given in Table 1. Green points belong to class 1 and red points belong to class 2. Support vectors are marked by a circle around the respective points, and the hyperplane is shown by the solid line along the y-axis. The results are intuitive and match the values expected from theoretical calculations.

Table 1

Support vector      1      2      3      4
lambda (alpha)     -1   -0.2      1    0.2
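The values in Table 1 can be sanity-checked against the dual problem. Assuming, as the mixed signs in the table suggest, that MATLAB's alpha stores each multiplier already multiplied by its class label ($\lambda_i y_i$), the dual equality constraint $\sum_i \lambda_i y_i = 0$ means the entries of the table should sum to zero. With a linear kernel and the common $\tfrac{1}{2}\langle \mathbf{w}, \mathbf{w} \rangle$ form of the objective, the same signed values also give the weight vector $\mathbf{w} = \sum_i \lambda_i y_i \mathbf{x}_i$. The report does not list the support-vector coordinates, so that second part uses a hypothetical pair of points.

```python
def dual_constraint_residual(signed_lambdas):
    """Return sum_i lambda_i * y_i, which is zero at the SVM optimum.

    Assumes each entry is already the signed product lambda_i * y_i.
    """
    return sum(signed_lambdas)

def weight_vector(signed_lambdas, support_vectors):
    """Linear-kernel weight vector w = sum_i (lambda_i * y_i) * x_i."""
    dim = len(support_vectors[0])
    return [sum(a * x[k] for a, x in zip(signed_lambdas, support_vectors))
            for k in range(dim)]

# Values from Table 1 (support vectors 1 to 4).
table_1 = [-1.0, -0.2, 1.0, 0.2]
print(abs(dual_constraint_residual(table_1)) < 1e-9)   # True

# Hypothetical pair: (-2, 0) in one class and (2, 0) in the other; their
# maximal-margin multipliers under the 1/2 <w, w> convention are 0.125 each.
print(weight_vector([-0.125, 0.125], [(-2.0, 0.0), (2.0, 0.0)]))   # [0.5, 0.0]
```

The same check applies to any set of signed multipliers reported later, for example the pair -3.5714 and 3.5714.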

SEPARABLE CLASSES CASE II

In the second experiment, two more data points were added. They were chosen so that they cross the hard margin but still lie on the correct side of the hyperplane for their respective classes. The plot generated by MATLAB is shown in Fig. 2.

Figure 2

The hyperplane remains the same, as it should, but the number of support vectors has increased from 4 to 6. Although the hyperplane looks smooth, close inspection of its upper half shows that it is slightly jittery, possibly because of the points introduced inside the margin. The values of lambda are given in Table 2.

Table 2

Support vector         1         2         3         4         5         6
lambda (alpha)   -0.1965   -0.1965    0.1965    0.1965   -1.0000    1.0000
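With MATLAB's default $C = 1$, entries 5 and 6 of Table 2 sit exactly at the bound $|\lambda_i| = C$. Under the Karush-Kuhn-Tucker conditions of the soft-margin problem, with $f(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b$, a support vector with $0 < \lambda_i < C$ lies exactly on the margin ($y_i f(\mathbf{x}_i) = 1$), while one with $\lambda_i = C$ lies on or inside it ($y_i f(\mathbf{x}_i) \le 1$). This is consistent with two points having been placed inside the margin, although the report does not state which table entries they are. The sketch below sorts the Table 2 values into these two groups, again assuming the signed-alpha convention; the tolerance is an assumption that absorbs the four-decimal rounding.

```python
def split_support_vectors(signed_lambdas, C, tol=1e-3):
    """Split support vectors into free and bounded ones by |lambda_i|.

    0 < |lambda_i| < C : free, the point lies exactly on the margin.
    |lambda_i| == C    : bounded, the point lies on or inside the margin.
    tol is an assumed allowance for the rounding of the tabulated values.
    """
    free, bounded = [], []
    for index, value in enumerate(signed_lambdas, start=1):
        if abs(abs(value) - C) <= tol:
            bounded.append(index)
        else:
            free.append(index)
    return free, bounded

# Values from Table 2, obtained with MATLAB's default C = 1.
table_2 = [-0.1965, -0.1965, 0.1965, 0.1965, -1.0000, 1.0000]
free, bounded = split_support_vectors(table_2, C=1.0)
print("free:", free)         # free: [1, 2, 3, 4]
print("bounded:", bounded)   # bounded: [5, 6]
```

Applied to the two multipliers reported for C = 10 (-3.5714 and 3.5714), both come out free, i.e. both support vectors lie exactly on the margin.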

The cost function for the case when points lie inside the margin (and also for the case of non-separable classes) is:

$$J(\mathbf{w}, \mathbf{s}) = \langle \mathbf{w}, \mathbf{w} \rangle + C \sum_{i} s_i \qquad (1)$$

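where $s_i \ge 0$ are the slack variables and $C$ is the penalty parameter. The slacks relax the margin constraints to $y_i(\langle \mathbf{w}, \mathbf{x}_i \rangle + b) \ge 1 - s_i$, where $y_i = \pm 1$ is the class label of point $\mathbf{x}_i$ and $b$ is the bias of the hyperplane. The default value of C in MATLAB is 1. Increasing C from 1 to 10 makes each unit of slack more expensive, so the margin should shrink. Fig. 3 shows that this is indeed the case: only 2 support vectors are left, with lambda values of -3.5714 and 3.5714.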
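Eq. 1 can be evaluated directly for any candidate hyperplane $\langle \mathbf{w}, \mathbf{x} \rangle + b = 0$. For fixed $\mathbf{w}$ and $b$, the smallest slack satisfying $y_i(\langle \mathbf{w}, \mathbf{x}_i \rangle + b) \ge 1 - s_i$ and $s_i \ge 0$ is the hinge loss $s_i = \max(0, 1 - y_i(\langle \mathbf{w}, \mathbf{x}_i \rangle + b))$. The minimal sketch below uses hypothetical points and a hypothetical hyperplane (none of them come from the experiments) to show that a point inside the margin gets a slack between 0 and 1, a point on the wrong side of the hyperplane gets a slack above 1 (so each misclassified point costs more than C), and raising C from 1 to 10 scales the slack term tenfold while $\langle \mathbf{w}, \mathbf{w} \rangle$ is unchanged.

```python
def slacks(w, b, points, labels):
    """Smallest s_i >= 0 with y_i(<w, x_i> + b) >= 1 - s_i (the hinge loss)."""
    result = []
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        result.append(max(0.0, 1.0 - y * score))
    return result

def cost(w, b, points, labels, C):
    """J(w, s) = <w, w> + C * sum_i s_i, as in Eq. 1."""
    return sum(wi * wi for wi in w) + C * sum(slacks(w, b, points, labels))

# Hypothetical data: the third point lies inside the margin, the fourth
# lies on the wrong side of the hyperplane x1 = 0.
points = [(2.0, 0.0), (-2.0, 0.0), (1.0, 0.5), (-1.0, 1.0)]
labels = [1, -1, 1, 1]
w, b = (0.5, 0.0), 0.0

print(slacks(w, b, points, labels))        # [0.0, 0.0, 0.5, 1.5]
print(cost(w, b, points, labels, C=1))     # 2.25
print(cost(w, b, points, labels, C=10))    # 20.25
```

Some texts write the first term as $\tfrac{1}{2}\langle \mathbf{w}, \mathbf{w} \rangle$; that only rescales C by a factor of 2 and does not change which hyperplane is optimal for a correspondingly adjusted C.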

Figure 3

NON-SEPARABLE CLASSES CASE I

In the case of non-separable data, SVM tries to find a hyperplane that separates many, but not all, data points. The margin in this case is called a soft margin, and the cost function is the same as in Eq. 1. To simulate this situation, two more points were added inside the hard margin, but this time on the wrong side of the hyperplane. Again the hyperplane and the margin remained the same, and the number of support vectors increased from 6 (Fig. 2) to 8. The scenario is illustrated in Fig. 4. The penalty parameter was then changed from 1 to 50 and even to 100, but the hyperplane remained the same because of the configuration of the data points: no hyperplane can separate all of them, so increasing the penalty parameter had no effect.

Figure 4

NON-SEPARABLE CLASSES CASE II

To get a feel for how the penalty changes the hyperplane, the data set was changed slightly so that only one data point was on the wrong side of the hyperplane. Initially the penalty parameter was kept at 1. The hyperplane shifted slightly, but its orientation, the width of the margin and the number of support vectors stayed the same. Fig. 5 illustrates the scenario. Because of the data point on the left, the hyperplane moved slightly left to accommodate it, but that point remained on the wrong side of the hyperplane. The penalty parameter C was then increased from 1 to 10. Now the hyperplane changed its orientation and tilted slightly to the left, the number of support vectors dropped from 6 to 4, and the width of the margin was halved. Increasing C thus forced the margin to shrink, which is conceptually understandable: a larger C makes slack more costly relative to the $\langle \mathbf{w}, \mathbf{w} \rangle$ term, whose minimization is what keeps the margin wide.

Figure 5

Figure 6

Lastly, C was further increased to 30, which forced the SVM to classify all the data points correctly; the margin shrank a bit more and the number of support vectors dropped to 3.

Figure 7
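The trend in Figs. 5 to 7, a larger C giving a narrower margin, can be reproduced with a very small solver. The sketch below is a swapped-in technique, not the solver behind MATLAB's built-in functions: plain subgradient descent on the cost function of Eq. 1, with the slacks replaced by their optimal values $s_i = \max(0, 1 - y_i(\langle \mathbf{w}, \mathbf{x}_i \rangle + b))$ and the best iterate kept. The data set, the step size `step / sqrt(t)` and the iteration count are assumptions chosen for this toy example. The hypothetical data are separable, but one class 1 point, $(0.5, 0)$, sits close to class 2: a small C lets the solver leave that point inside the margin, while a large C forces the margin to clear it.

```python
import math

def score(w, b, x):
    """Decision value <w, x> + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def objective(w, b, points, labels, C):
    """Eq. 1 with the optimal slacks s_i = max(0, 1 - y_i * score)."""
    hinge = sum(max(0.0, 1.0 - y * score(w, b, x))
                for x, y in zip(points, labels))
    return sum(wi * wi for wi in w) + C * hinge

def train_linear_svm(points, labels, C, iterations=20000, step=0.005):
    """Subgradient descent on Eq. 1; returns the best (w, b) seen.

    The step schedule step / sqrt(t) and the iteration count are assumptions
    tuned for this tiny example, not general-purpose settings.
    """
    dim = len(points[0])
    w, b = [0.0] * dim, 0.0
    best, best_value = (list(w), b), math.inf
    for t in range(1, iterations + 1):
        # Subgradient of <w, w> is 2w; each margin violator adds -C*y*x to
        # the w-gradient and -C*y to the b-gradient.
        grad_w = [2.0 * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            if y * score(w, b, x) < 1.0:
                for k in range(dim):
                    grad_w[k] -= C * y * x[k]
                grad_b -= C * y
        eta = step / math.sqrt(t)
        w = [wi - eta * gi for wi, gi in zip(w, grad_w)]
        b -= eta * grad_b
        value = objective(w, b, points, labels, C)
        if value < best_value:
            best, best_value = (list(w), b), value
    return best

def margin_width(w):
    """Distance between the lines <w, x> + b = +1 and <w, x> + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical separable data: class 1 (label +1) has one point, (0.5, 0),
# close to class 2 (label -1).
points = [(2.0, 0.0), (3.0, 1.0), (3.0, -1.0), (0.5, 0.0),
          (-2.0, 0.0), (-3.0, 1.0), (-3.0, -1.0)]
labels = [1, 1, 1, 1, -1, -1, -1]

for C in (0.05, 10.0):
    w, b = train_linear_svm(points, labels, C)
    print(f"C = {C}: w = {w}, b = {b:.3f}, margin width = {margin_width(w):.2f}")
```

By a hand calculation, the exact optima for this data have margin widths of 6 for C = 0.05 and 2.5 for C = 10, so the printed widths should approach these values; keeping the best iterate rather than the last one guards against the oscillation that subgradient steps show near the kinks of the hinge loss.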

NON-SEPARABLE CLASSES CASE III

Finally, to see the behavior of rogue data points, separable case I was simulated again, but this time a data point from class 1 was added such that it lies not only on the left side of the hyperplane but also to the left of the class 2 data. Fig. 8 illustrates this situation, and it is evident that the hyperplane does not change. But if the point is moved too far to the left, the hyperplane shifts left, and now only that point from class 1 is on the correct side of the hyperplane. This situation is illustrated in Fig. 9. Increasing the penalty parameter C to any value makes no difference: once the point is beyond a certain distance from the class 2 data points, no value of C brings the hyperplane back to its original position.

Figure 8

Figure 9

In the last experiment, another data point, this time from class 2, was added not very far from the actual data, and the result was surprising: MATLAB filled the whole plot with the hyperplane. Fig. 10 illustrates the scenario. In this plot, the green point on the extreme left belongs to class 1 (whose other points are on the right), and the red point on the extreme right belongs to class 2 (whose other points are on the left).

Figure 10

The penalty parameter C was then increased, but the effect was small. C was then increased in large steps, and at about C = 1000 the hyperplane became almost a straight line, as shown in Fig. 11.

Figure 11

CONCLUSION

The most important conclusion from the above experiments is that the choice of C is crucial for the classifier to work as desired. C indirectly controls the width of the margin and may even change the orientation of the hyperplane. Large values of C force the hyperplane to fit more of the training data, but such a hyperplane may fail to generalize and perform poorly on unseen data. Figs. 5, 6 and 7 clearly illustrate the control of C over the hyperplane. The last couple of figures may be surprising, but it must be kept in mind that there were very few data points, and SVM may fail in such cases. Where data is plentiful and an appropriate value of C is set, SVM will try to fit many, but not all, data points while keeping the margin wide enough to generalize well to unseen data.