
2012 Fifth International Symposium on Computational Intelligence and Design

Remote Sensing Image Classification with Multiple Classifiers based on Support Vector Machines
Wei WU
Computer Science Department, Inner Mongolia University, Huhhot, China cswuwei@imu.edu.cn

Guanglai GAO
Computer Science Department, Inner Mongolia University, Huhhot, China csggl@imu.edu.cn

Abstract—Classification accuracy is one of the major factors influencing the application of classified imagery. This paper proposes an SVM-based multiple-classifier fusion method for remote sensing image classification. We use the spatial Gabor wavelet texture feature and the spectral feature to construct separate SVM classifiers. We then exploit a characteristic of SVMs: for a given sample, the larger the distance to the separating hyperplane, the more reliable the class label. The most reliable classification result is therefore the one that gives the largest distance, and this is our decision fusion rule. Using a Landsat ETM+ satellite image as test data, the experimental results indicate that all classes, including water, mountain, gobi, vegetation, desert and residential area, are well classified, and that the overall accuracy reaches 86.5%, higher than that of each separate SVM classifier.

Keywords—remote sensing image; SVM; classification; multiple classifiers

I. INTRODUCTION

Remote sensing data are used in a wide range of applications, including monitoring of the environment, management of major disasters, urban planning, precision agriculture, and strategic defense. In most of these applications, the classification of remote sensing images is one of the most important research topics in remote sensing image processing. Considering the complexity of the data and the variety of available algorithms, multiple classifier systems (MCS) [1] have proved to be of great interest in numerous remote sensing applications, significantly improving classification performance. Recently, particular attention has been dedicated to support vector machines (SVM) for the classification of remote sensing images [2] [3] [4]. SVMs have often been found to provide higher classification accuracies than other widely used pattern recognition techniques, such as the maximum likelihood and multilayer perceptron neural network classifiers. Furthermore, SVMs appear to be especially advantageous in the presence of heterogeneous classes for which only few training samples are available. We therefore take full advantage of a characteristic of SVMs: the larger the distance to the hyperplane, the more reliable the class label. Rather than commonly used multiclassifier approaches such as bagging, boosting and consensus theory, we use the largest distance to the SVM hyperplane as the multiple-classifier decision rule; the most reliable classification result is thus the one that gives the largest distance. In this paper, we propose to fuse the results obtained by separate use of the spatial Gabor wavelet texture feature and the initial spectral feature. Each feature set is processed by an SVM classifier, and the results from the classifiers are aggregated according to our multiple-classifier decision rule.

The paper is organized as follows. Section II describes the spatial Gabor feature extraction, Section III describes SVM theory and the multiple classifier system, Section IV presents the experimental results, and Section V concludes the paper.

This work is supported by the Inner Mongolia Nature Science Foundation under Grant Number 2009MS0902.

II. SPATIAL FEATURE EXTRACTION

Texture analysis has been extensively used to classify remote sensing images. The use of Gabor filters for extracting texture features is motivated by several factors. The Gabor representation has been shown to be optimal in the sense of minimizing the joint two-dimensional uncertainty in space and frequency, and these filters can be considered orientation- and scale-tunable edge and line detectors, with the statistics of these micro-features in a given region often used to characterize the underlying texture. Given an image I(x, y), its Gabor wavelet transform is defined as

W_{mn}(x, y) = \iint I(x_1, y_1)\, g_{mn}^{*}(x - x_1,\, y - y_1)\, dx_1\, dy_1   (1)

where * indicates the complex conjugate. The generating function g_{mn} is derived from the mother Gabor wavelet g(x, y), which can be written as [7]:

g(x, y) = \frac{1}{2\pi \sigma_x \sigma_y} \exp\!\left[ -\frac{1}{2}\left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} \right) + 2\pi j W x \right]   (2)
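As an illustration of this texture feature extraction, the following is a minimal sketch on a single-band NumPy image. The kernel size, isotropic σ, and frequency spacing per scale are illustrative assumptions, not values taken from the paper:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(freq, theta, sigma=2.0, size=15):
    """Mother Gabor wavelet of equation (2), rotated by theta.
    Simplification: a single isotropic sigma instead of (sigma_x, sigma_y)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return envelope * np.exp(2j * np.pi * freq * xr)

def gabor_texture_features(image, n_scales=4, n_orientations=6):
    """Mean/std of Gabor magnitude responses over S=4 scales, K=6 orientations."""
    feats = []
    for m in range(n_scales):
        freq = 0.25 / (2 ** m)                 # assumed octave spacing per scale
        for n in range(n_orientations):
            g = gabor_kernel(freq, n * np.pi / n_orientations)
            w = fftconvolve(image, np.conj(g), mode="same")  # equation (1)
            mag = np.abs(w)                    # |W_mn(x, y)|
            feats.extend([mag.mean(), mag.std()])
    return np.array(feats)                     # length 2 * 4 * 6 = 48
```

On a homogeneous texture region, the resulting 48-dimensional vector plays the role of the feature vector built from the per-filter means and standard deviations described in this section.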

978-0-7695-4811-1/12 $26.00 © 2012 IEEE    DOI 10.1109/ISCID.2012.55


It is assumed that the local texture regions are spatially homogeneous, and the mean \mu_{mn} and the standard deviation \sigma_{mn} of the magnitude of the transform coefficients are used to represent the region for classification purposes:

\mu_{mn} = \iint |W_{mn}(x, y)|\, dx\, dy, \qquad \sigma_{mn} = \sqrt{ \iint \left( |W_{mn}(x, y)| - \mu_{mn} \right)^2 dx\, dy }   (3)

A feature vector is then constructed using \mu_{mn} and \sigma_{mn} as feature components. In the experiments, we use four scales (S = 4) and six orientations (K = 6), resulting in the feature vector

f = (\mu_{00}, \sigma_{00}, \mu_{01}, \sigma_{01}, \ldots, \mu_{35}, \sigma_{35}).   (4)

We then make use of this feature for classification using Support Vector Machines.

III. SUPPORT VECTOR MACHINES AND MULTIPLE CLASSIFIERS

A. Support Vector Machines

We first briefly recall the general formulation of SVM classifiers [5] [6]. Consider a two-class problem in an n-dimensional space R^n. We assume that l training samples x_i \in R^n are available with their corresponding class labels y_i = \pm 1, S = \{(x_i, y_i) \mid i \in [1, l]\}. The SVM method consists in finding the hyperplane that maximizes the margin (see Figure 1). The hyperplane H_p is defined as

w^T x + b = 0, \quad x \in H_p   (5)

where w^T x is the inner product between w and x, and the margin is delimited by the hyperplanes w^T x_a + b = 1 and w^T x_b + b = -1.

Figure 1. Two-class linearly separable SVM.

For the non-linearly separable case, the original feature space can always be mapped, via \Phi : x \to \Phi(x), to some higher-dimensional feature space in which the training set is separable (Figure 2). The so-called kernel trick is commonly used to solve this non-separable problem [6].

Figure 2. Non-linearly separable case: mapping the data to a high-dimensional feature space.

The most common kernels are presented below:
- Polynomial: the inner product is computed in the space of all monomials up to degree d, k(x, z) = (1 + x^T z)^d. The parameter d tunes the weight of the higher-order terms.
- Gaussian radial basis function: this kernel is given by k(x, z) = \exp(-\gamma \|x - z\|^2). The parameter \gamma tunes the flexibility of the kernel.

SVMs are designed to solve binary classification problems. Two main approaches have been proposed to address multiclass (N-class) problems:
- One versus the rest: N binary classifiers are trained, each separating one class from the others. Each sample is assigned to the class with the maximum output.
- One versus one: N(N-1)/2 binary classifiers are trained, one for each pair of classes. Each sample is assigned to the class receiving the highest number of votes.

B. Multiple Classifier System

Let us consider that m SVM classifiers are used (in our case, two classifiers: one based on the spatial Gabor texture features and one based on the initial spectral information). The SVM decision function returns the sign of the distance to the hyperplane. For the fusion scheme, it is more useful to have access to the belief of the classifier rather than the final decision, and for SVMs it is possible to obtain the distance to the hyperplane: for a given sample, the larger the distance to the hyperplane, the more reliable the classified label. For the combination process, we choose to fuse this distance, and we consider the most reliable source to be the one that gives the largest distance.

In this paper, we first use the absolute maximum decision rule. For an m-source problem \{S_1, S_2, \ldots, S_m\}, where S_1 = d_{ij}^1 is the distance provided by the first SVM classifier separating class i from class j, this decision rule is defined as

S_f = HMax(S_1, S_2, \ldots, S_m)   (6)

where HMax is the set of logical rules:


if (|S_1| > |S_2|, \ldots, |S_m|) then S_1
else if (|S_2| > |S_1|, \ldots, |S_m|) then S_2
\ldots   (7)
else if (|S_m| > |S_1|, \ldots, |S_{m-1}|) then S_m.

Secondly, we also consider the agreement of the classifiers. Each distance is multiplied by the maximum probability associated with the two considered classes [8]:

p_i = \frac{2}{m(m-1)} \sum_{j=0,\, j \neq i} I(d_{ij})   (8)

where I(d_{ij}) is the indicator function. The maximum rule is then applied to these weighted results:

S_{wf} = HMax(\max(p_i^1, p_j^1) S_1, \ldots, \max(p_i^m, p_j^m) S_m).   (9)

The last step is the one used to combine classifiers in the one-versus-one strategy. If we have two SVM classifiers and apply each of them to a dataset with the same number of classes, each classifier builds m(m-1)/2 binary classifiers and uses majority voting. We thus build a new multiple-classifier system containing m(m-1) binary classifiers and apply a classical majority voting scheme; the winning class is the one receiving the highest number of votes.

IV. EXPERIMENTATION

This paper takes a Landsat-7 ETM+ remote sensing image as source data. The image region mainly covers Bayannaoer city, Inner Mongolia Autonomous Region, China. Before analysis, we preprocess the image and then subset it; the subset image is used as input to derive the feature characteristics. According to prior knowledge about the study area, we divide the selected areas into six categories (N = 6 classes): mountain, water, vegetation, residential areas, desert and gobi. These classes are used to evaluate the performance of the classification method.

The images are first processed to extract the spatial Gabor features, as in the feature vector expression (4), and this feature is used to construct a Support Vector Machine (SVM) classifier. Another SVM classifier is applied to the initial spectral values, with no spatial information. Due to the large size of the study area and the effectiveness of the data for analysis, only bands 1 (0.45-0.52 μm), 3 (0.63-0.69 μm), 4 (0.74-0.90 μm), 5 (1.55-1.75 μm) and 7 (2.08-2.35 μm) were used, so the spectral feature vector is f = (f_1, f_3, f_4, f_5, f_7). Indeed, it has been demonstrated that both spatial and spectral information are required to achieve good classification performance.

The land-cover classes appearing in this study region are water, urban, sand, vegetation, mountain and gobi. We selected a total of 3000 training samples based on ground truth and aerial photographs, and 30000 test samples. Detailed information is given in Table I.

TABLE I. TRAINING AND TEST SAMPLES INFORMATION

Class | Name       | Train samples | Test samples
1     | Water      | 500   | 6245
2     | Urban      | 650   | 5670
3     | Sand       | 400   | 3014
4     | Vegetation | 650   | 9892
5     | Mountain   | 500   | 3123
6     | Gobi       | 300   | 2056
Total |            | 3000  | 30000

Gaussian kernels were used for each SVM classifier, and the parameters (C, γ) of the SVMs were tuned using five-fold cross-validation; we selected the best values (40, 0.2) in our experimentation. Using the one-versus-one classification strategy, 15 binary classifiers are used for each SVM classifier. Applying our multiclassifier system, we calculated and stored, for each result, the actual distance to the hyperplane. Finally, using the maximum rule (equation (9)), we obtain the classification results. The results are summarized in Table II: the overall and average accuracies, as well as the Kappa coefficient, are clearly improved by the decision fusion compared with each separate SVM classifier and the commonly used multiple-classifier voting scheme.

TABLE II. CLASSIFICATION ACCURACIES (%)

                 | Spectral Feature | Spatial Feature | HMax | Voting
Overall Accuracy | 79.2 | 84.5 | 86.5 | 84.8
Average Accuracy | 82.7 | 89.6 | 92.1 | 90.1
Kappa            | 78.2 | 82.3 | 86.7 | 85.1
Class 1          | 92.2 | 94.4 | 96.8 | 97.0
Class 2          | 70.1 | 82.7 | 87.1 | 80.1
Class 3          | 89.6 | 99.2 | 99.4 | 99.4
Class 4          | 90.4 | 99.5 | 99.5 | 99.5
Class 5          | 75.5 | 80.1 | 86.5 | 79.1
Class 6          | 78.2 | 81.8 | 83.4 | 85.6
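The absolute-maximum fusion rule of Section III can be sketched as follows. This is a minimal sketch: in practice the per-classifier signed distances would come from something like scikit-learn's `SVC.decision_function`, which is an assumption about tooling, not the paper's implementation:

```python
import numpy as np

def hmax_fuse(distances):
    """Absolute-maximum decision rule: for each sample, keep the signed
    distance from the classifier that is farthest from its hyperplane
    (i.e., the most confident source wins)."""
    d = np.asarray(distances, dtype=float)  # shape (n_samples, m sources)
    winner = np.argmax(np.abs(d), axis=1)   # index of most confident source
    fused = d[np.arange(d.shape[0]), winner]
    return fused, winner

# Two sources (e.g. a spectral-based and a texture-based SVM): the second
# classifier is more confident on sample 0, the first on sample 1.
fused, winner = hmax_fuse([[0.4, -1.2], [2.0, 0.3]])
# fused -> [-1.2, 2.0], winner -> [1, 0]
```

The weighted variant of equation (9) would simply scale each column of `distances` by the corresponding pairwise class probability before taking the maximum.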

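The five-fold cross-validation used to tune (C, γ) can be sketched with scikit-learn's grid search. This runs on synthetic stand-in data (random features and labels), and the grid values are illustrative choices that merely happen to include the paper's selected (40, 0.2):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the real training set (3000 labeled ETM+ samples,
# 6 land-cover classes); here: 5 "spectral bands", 6 classes.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 5))
y_train = rng.integers(0, 6, size=300)

param_grid = {"C": [1, 10, 40, 100], "gamma": [0.05, 0.1, 0.2, 0.5]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # five-fold CV
search.fit(X_train, y_train)
best = search.best_params_  # on the real data, the paper reports C=40, gamma=0.2
```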
As can be seen from Table II, the fusion step using the maximum rule improves the classification accuracies. The highest overall accuracy, average accuracy and Kappa value were achieved when the weighted maximum was used. Comparing the global accuracies (OA 86.5%, AA 92.1%, Kappa 84.7%), it is clear that this decision scheme helps considerably in the fusion process of these experiments. The majority voting rule does not improve the accuracy further compared with our fusion method.

V. CONCLUSION

In this paper, we discussed decision fusion for SVM classifiers and applied it to remote sensing image classification. In the experiments, the proposed approach outperformed each of the individual classifiers in terms of overall accuracy, and the use of the weighted maximum distance led to a significant improvement in classification accuracy. The good performance of the proposed combination scheme is interesting because it uses no information about the reliability of the sources. A topic for future research is the use of more advanced fusion schemes to further improve classifier performance.

REFERENCES
[1] G. J. Briem, J. A. Benediktsson, and J. R. Sveinsson, "Multiple classifiers applied to multisource remote sensing data," IEEE Trans. on Geoscience and Remote Sensing, vol. 40, no. 10, pp. 2291-2299, 2002.
[2] L. Hermes, D. Frieauff, J. Puzicha, and J. M. Buhmann, "Support vector machines for land usage classification in Landsat TM imagery," in Proc. IGARSS, Hamburg, Germany, 1999, pp. 348-350.
[3] F. Roli and G. Fumera, "Support vector machines for remote-sensing image classification," Proc. SPIE, vol. 4170, pp. 160-166, 2001.
[4] C. Huang, L. S. Davis, and J. R. G. Townshend, "An assessment of support vector machines for land cover classification," Int. J. Remote Sens., vol. 23, pp. 725-749, 2002.
[5] F. Melgani and L. Bruzzone, "Classification of hyperspectral remote sensing images with support vector machines," IEEE Trans. on Geoscience and Remote Sensing, vol. 42, no. 8, pp. 1778-1790, Aug. 2004.
[6] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. Cambridge, MA: MIT Press, 2002.
[7] B. S. Manjunath and W. Y. Ma, "Texture features for browsing and retrieval of image data," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 837-842, Aug. 1996.
[8] T. Wu, C. Lin, and R. Weng, "Probability estimates for multi-class classification by pairwise coupling," Journal of Machine Learning Research, vol. 5, pp. 975-1005, Aug. 2004.

