
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 12, DECEMBER 2006

Feature Extraction Using Recursive Cluster-Based Linear Discriminant With Application to Face Recognition
C. Xiang, Member, IEEE, and D. Huang
Abstract: A novel recursive procedure for extracting discriminant features, termed recursive cluster-based linear discriminant (RCLD), is proposed in this paper. Compared to the traditional Fisher linear discriminant (FLD) and its variations, RCLD has a number of advantages. First of all, it relaxes the constraint on the total number of features that can be extracted. Second, it fully exploits all information available for discrimination. In addition, RCLD is able to cope with multimodal distributions, thereby overcoming an inherent problem of conventional FLD, which assumes unimodal class distributions. Extensive experiments have been carried out on various types of face recognition problems for the Yale, Olivetti Research Laboratory, and JAFFE databases to evaluate and compare the performance of the proposed algorithm with other feature extraction methods. The resulting improvement in performance by the new feature extraction scheme is significant.

Index Terms: Cluster-based linear discriminant (CLD), face recognition, feature extraction, Fisher linear discriminant (FLD), principal component analysis (PCA), recursive cluster-based linear discriminant (RCLD), recursive Fisher linear discriminant (RFLD).

I. INTRODUCTION

EXTRACTING proper features is crucial for the satisfactory design of any pattern classifier. Usually, it is problem dependent and requires specialized knowledge of the specific problem itself. However, some of the principles of statistical analysis may still be used in the design of a feature extractor, and how to develop a general procedure for effective feature extraction remains an interesting, and also challenging, problem. Traditionally, principal component analysis (PCA) has been the standard approach to reduce the high-dimensional original pattern vector space into a low-dimensional feature vector space. An alternative approach using the Fisher linear discriminant (FLD) has gained popularity recently following a number of successful applications of FLD to face recognition problems since the late 1990s [1]-[8]. The main advantage of FLD over PCA for pattern classification problems may be attributed to the simple fact that FLD extracts features that are most efficient for discrimination, while
Manuscript received September 23, 2005; revised May 20, 2006. This work was supported by a NUS Academic Research Fund R-263-000-362-112. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Manuel Samuelides. The authors are with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576 (e-mail: elexc@nus.edu.sg; huangdong@nus.edu.sg). Digital Object Identifier 10.1109/TIP.2006.884932

PCA extracts features that are most efficient for representation, which may not be very useful at all from the point of view of classification. However, FLD has a serious limitation: the total number of features available from FLD is limited to $c - 1$, where $c$ is the number of classes. To overcome this limitation, orthonormal FLD was proposed by Okada and Tomita [9] and Duchene and Leclercq [10] in the 1980s. Another feature extraction method, locality preserving projection (LPP), which was suggested by He and his colleagues [11] to find an embedding that preserves local information, is also free of the constraint on the total number of features. Recently, a recursive procedure termed RFLD was proposed by Xiang et al. [12] and demonstrated to improve the face recognition rate substantially compared to traditional FLD and some of its variations. However, RFLD is computationally intensive since only one feature is extracted at each iteration, and its recognition performance for facial expressions was also far from satisfactory compared to human ability. Another shortcoming associated with FLD is that it may not fully utilize all information available if the training sample size is smaller than the dimension of the pattern vector. This problem was investigated in the past, and some variations of FLD, such as LDA based on the null space of the within-class scatter [4] and MFLD [13], were proposed in order to address this issue. However, none of these methods actually succeeded in utilizing all of the discriminant information available, as will be elaborated in Section II. As FLD calculates the between-class scatter matrix from the means of each class, it implicitly assumes that the underlying distribution of each class is unimodal, which is often not the case for real-world problems.
To address this issue, a nonparametric approach, named nonparametric discriminant analysis (NDA), was first proposed by Fukunaga [14] for the case of two-class problems, and was later generalized for multiclass problems by Bressan and Vitria [15], and Li and his colleagues [16]. It is worth mentioning that NDA does not have the constraint on the total number of features available. An alternative solution for the multimodal class distribution problem is to use a cluster-based approach, termed CLD, which was first proposed by Chen and Huang [17], and which is also adopted here. In this paper, we propose a new feature extraction method, termed RCLD, which combines the ideas of the recursive procedure, from RFLD, and the cluster-based approach, from CLD, such that it is able to relax the constraint on the total number of features, fully exploit all information available for discrimination, and cope with multimodal distributions. To evaluate the

1057-7149/$20.00 © 2006 IEEE

XIANG AND HUANG: FEATURE EXTRACTION USING RECURSIVE CLUSTER-BASED LINEAR DISCRIMINANT


performance of the proposed scheme, we have carried out extensive experiments on a number of public databases, including the Yale, Olivetti Research Laboratory (ORL), and JAFFE [18] databases, for identity, facial expression, and glasses-wearing recognition problems. All of the experimental results demonstrate that RCLD improves the performance significantly compared to the other methods. The paper is organized as follows. Section II first provides a brief introduction to FLD and some of its extensions. Following this, a new algorithm termed recursive modified linear discriminant (RMLD), which is a modified version of RFLD, is described in Section III, and then the main contribution of the paper, RCLD, which is the integration of RMLD and CLD, is presented. The experimental results on face recognition problems are discussed in Section IV.

II. FLD AND SOME VARIATIONS

The variations of FLD discussed in this paper include LDA based on the null space of $S_W$, MFLD, and RFLD, which will be presented after FLD is introduced first.

A. Fisher Linear Discriminant (FLD)

Suppose that we have a set of $n$ $d$-dimensional samples $\mathbf{x}_1, \ldots, \mathbf{x}_n$ belonging to $c$ different classes, with $n_i$ samples in the subset $D_i$ labeled $\omega_i$, $i = 1, \ldots, c$. Then, the objective of FLD is to seek the direction $\mathbf{w}$ that not only maximizes the between-class scatter of the projected samples, but also minimizes the within-class scatter, such that the following criterion function:

$$J(\mathbf{w}) = \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_W \mathbf{w}} \tag{1}$$

is maximized, where the between-class scatter matrix $S_B$, summed over the $c$ classes, is defined by

$$S_B = \sum_{i=1}^{c} n_i (\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^T \tag{2}$$

in which $\mathbf{m}$ is the $d$-dimensional sample mean for the whole set

$$\mathbf{m} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_k \tag{3}$$

and $\mathbf{m}_i$ is the sample mean for the class labeled $\omega_i$, whose $n_i$ samples form the subset $D_i$, given by

$$\mathbf{m}_i = \frac{1}{n_i} \sum_{\mathbf{x} \in D_i} \mathbf{x} \tag{4}$$

and the within-class scatter matrix $S_W$ is defined by

$$S_W = \sum_{i=1}^{c} S_i \tag{5}$$

where the scatter matrix $S_i$ corresponding to class $\omega_i$ is defined by

$$S_i = \sum_{\mathbf{x} \in D_i} (\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^T. \tag{6}$$

It can be readily shown that (2) is equivalent to

$$S_B = \frac{1}{n} \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} n_i n_j (\mathbf{m}_i - \mathbf{m}_j)(\mathbf{m}_i - \mathbf{m}_j)^T \tag{7}$$

which implies that maximization of criterion (1) would make the means (i.e., the prototypes) for different classes as far away from each other as possible, while keeping the members within each class as close as possible. It is easy to show that a vector $\mathbf{w}$ that maximizes $J(\mathbf{w})$ must satisfy

$$S_B \mathbf{w} = \lambda S_W \mathbf{w}. \tag{8}$$

If $S_W$ is nonsingular, we can obtain a conventional eigenvalue problem by writing

$$S_W^{-1} S_B \mathbf{w} = \lambda \mathbf{w}. \tag{9}$$

At most $c - 1$ features can be extracted by the above procedure, simply because the rank of $S_B$ is at most $c - 1$.

B. LDA Based on Null Space of $S_W$

If the number of samples is less than the dimension of the samples, $S_W$ would be singular, which implies that the problem of maximizing Fisher's criterion (1) is not well defined and FLD cannot be directly applied. This problem is called the small sample size problem and is very common for pattern recognition problems, especially for face recognition. To address this issue, a typical approach [1] is to employ PCA to factorize the whole space into two subspaces: one subspace is spanned by the principal components with the largest eigenvalues such that its dimension is equal to the rank of $S_W$; the other subspace is orthogonal and complementary to the first one. The former subspace is termed the principal subspace of $S_W$, while the latter is called the null space of $S_W$. The within-class scatter matrix is then projected onto the principal subspace of $S_W$ to make it nonsingular, which implies that information from the null space of $S_W$ is completely discarded by FLD. However, it is possible to find some projection vectors $\mathbf{w}$ inside the null space of $S_W$ such that $\mathbf{w}^T S_W \mathbf{w} = 0$ and $\mathbf{w}^T S_B \mathbf{w} \neq 0$, which implies that all sample vectors from one class are projected to the same point and different classes are separated after projection. These projection vectors are useful for discriminating one class from the other. Because these projection vectors make Fisher's criterion (1) tend to infinity, they are the most discriminative ones among all projection vectors useful for classification.
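As an illustration of (1)-(9) and of the small sample size problem, the following NumPy sketch builds $S_B$ and $S_W$ from labeled samples and solves the generalized eigenproblem (8). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the synthetic Gaussian data, and the choice of whitening $S_W$ to obtain a symmetric eigenproblem are all introduced here for illustration.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (S_B, S_W) following (2) and (5); X is (n, d), y holds class labels."""
    m = X.mean(axis=0)                      # overall mean, as in (3)
    d = X.shape[1]
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for label in np.unique(y):
        Xi = X[y == label]
        mi = Xi.mean(axis=0)                # class mean, as in (4)
        diff = (mi - m)[:, None]
        S_B += len(Xi) * (diff @ diff.T)
        centered = Xi - mi
        S_W += centered.T @ centered        # class scatter, as in (6)
    return S_B, S_W

def fld_directions(S_B, S_W, n_features):
    """Solve S_B w = lambda S_W w for nonsingular S_W, largest lambda first."""
    # Whitening S_W turns the generalized problem into a symmetric one
    # (an implementation choice made for this sketch).
    evals, evecs = np.linalg.eigh(S_W)
    whitener = evecs / np.sqrt(evals)       # column j scaled by 1/sqrt(evals[j])
    lam, V = np.linalg.eigh(whitener.T @ S_B @ whitener)
    order = np.argsort(lam)[::-1][:n_features]
    return whitener @ V[:, order], lam[order]

# Hypothetical data: c = 3 Gaussian classes in d = 5 dimensions.
rng = np.random.default_rng(0)
c, per_class, d = 3, 20, 5
X = np.vstack([rng.normal(loc=3.0 * k, size=(per_class, d)) for k in range(c)])
y = np.repeat(np.arange(c), per_class)

S_B, S_W = scatter_matrices(X, y)
W, lam = fld_directions(S_B, S_W, n_features=c - 1)
rank_S_B = np.linalg.matrix_rank(S_B)       # bounded by c - 1

# Small sample size case: 6 samples in 10 dimensions, so S_W is singular.
X_small = rng.normal(size=(6, 10))
y_small = np.repeat(np.arange(3), 2)
_, S_W_small = scatter_matrices(X_small, y_small)
rank_S_W_small = np.linalg.matrix_rank(S_W_small)
```

In the second case `fld_directions` would divide by (near-)zero eigenvalues of $S_W$, which is the practical face of the small sample size problem: some dimension reduction or a null-space treatment is needed before (9) can be used.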


LDA based on the null space of $S_W$ [4] first projects all samples into the null space of $S_W$, and then seeks projection vectors in this null space that maximize the between-class scatter $\mathbf{w}^T S_B \mathbf{w}$. The shortcoming of this method is that it can only utilize information from the null space of $S_W$, and it also suffers from the same limitation on the number of features as FLD.

C. Modified Fisher Linear Discriminant (MFLD)

Realizing that information from the null space of $S_W$ is not utilized by FLD, Fisher's criterion was modified so that all information, both inside and outside the null space of $S_W$, can be utilized [13]. The modified criterion is

$$J_M(\mathbf{w}) = \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_T \mathbf{w}} \tag{10}$$

where $S_T$ is called the total scatter matrix, which is defined as

$$S_T = S_B + S_W. \tag{11}$$

It is easy to prove that the modified criterion (10) is equivalent to the original criterion (1) if $S_W$ is nonsingular, since $J_M = J/(1 + J)$ is then a monotonically increasing function of $J$. However, if $S_W$ is singular, then all the vectors in the null space of $S_W$ would maximize criterion (10) with the maximal value of one. If the number of samples is less than the dimension of the pattern vector, PCA is first performed so that $S_T$ is nonsingular, with its rank being equal to the number of nontrivial principal components, which implies that all information, from both inside and outside the null space of $S_W$, may possibly be utilized by MFLD. Unfortunately, the maximal number of features that can be extracted by MFLD is also $c - 1$, just like FLD. As the dimension of the null space of $S_W$ is $c - 1$ and features from the null space of $S_W$ are the most discriminant, the features extracted by MFLD actually span the null space of $S_W$. Therefore, MFLD is only able to utilize information from inside the null space of $S_W$ while losing information from the principal subspace of $S_W$.

D. Recursive Fisher Linear Discriminant (RFLD)

RFLD was recently proposed by Xiang et al. [12] to overcome the feature number constraint using a recursive procedure. It extracts only one feature vector at every iteration. The first iteration is the same as FLD. After the first iteration, it discards all information already extracted by previous iterations from all samples before going to the next iteration. It was shown in [12] that RFLD achieved significant improvement for face recognition problems compared to other traditional methods. In particular, a perfect recognition result was obtained for the identity recognition problem. However, the lowest recognition error rate for the facial expression recognition problem is around 30%, which is too high compared to the capacities of human beings. Another shortcoming of RFLD is that it is much more time consuming than traditional approaches because only one feature is extracted at every iteration.

It is to overcome the weaknesses associated with FLD, MFLD, and RFLD discussed above that we develop improved algorithms for linear discriminant analysis, which are presented in the following section.

III. RECURSIVE MODIFIED LINEAR DISCRIMINANT (RMLD) AND RECURSIVE CLUSTER-BASED LINEAR DISCRIMINANT (RCLD)

A. Recursive Modified Linear Discriminant (RMLD)

To utilize all discriminant information available and remove the feature number constraint, RMLD employs a recursive strategy similar to that of RFLD. However, RMLD differs from RFLD in the following two points. First of all, RMLD is truly able to extract discriminant information from both the null space and the principal subspace of $S_W$, since it uses the modified Fisher's criterion as defined in (10) while adopting the recursive procedure. Second, RMLD extracts $c - 1$ features per iteration rather than just one feature as in RFLD, thus reducing the computational load significantly.

For a training set of $n$ $d$-dimensional samples ($n < d$), the intrinsic dimensionality or degree of freedom is $n - 1$ after the mean is subtracted. Dimensionality reduction techniques like PCA can be used to save computational load and memory while remaining lossless if all nontrivial principal components are retained. As RMLD aims to utilize all the information contained in the training sample set, it first uses PCA to reduce the dimension of the samples from $d$ to $n - 1$, so that no information is lost and the intrinsic structure of the training samples is not changed. Notice that in the case of RFLD, the dimension of the samples is reduced to $n - c$ instead of $n - 1$ in order to make $S_W$ nonsingular, which implies that some information is lost and the structure of the training samples may be modified.

After the dimension reduction, $S_T$ is nonsingular, and RMLD can extract the first set of $c - 1$ discriminant features in the same way as MFLD. After the first iteration, the information already extracted, which constitutes the null space of $S_W$, is discarded, and then another set of $c - 1$ features is extracted. For subsequent iterations, all information extracted by previous iterations is eliminated before going to the next iteration, just as in the procedure of RFLD.

Because $c - 1$ feature vectors are extracted at each iteration and all the extracted information is removed before going to the next iteration, the rank of $S_T$ is reduced by $c - 1$ after every iteration. So, PCA is employed to reduce the dimension of the sample space by $c - 1$ at each iteration so that the $S_T$ recalculated based on the reduced subspace is nonsingular. The algorithm for RMLD is outlined as follows.

1) Use PCA to reduce the dimension of the original sample space to $n - 1$ so that $S_T$ is nonsingular.
2) For the first iteration, use MFLD to extract the first $c - 1$ discriminative feature vectors.
3) Discard the extracted information from all samples. Then, use PCA to reduce the dimension of the sample space by $c - 1$. Recalculate $S_B$ and $S_T$.
4) Use MFLD to extract another set of $c - 1$ discriminative feature vectors.
5) If needed, go through the iteration from step 3) again to extract more feature vectors.
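The RMLD steps listed above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the function names and the synthetic data are hypothetical, $S_T$ is taken to be the total scatter $S_B + S_W$ of the centered samples, and step 3) is carried out by projecting the scatter matrices onto the orthogonal complement of the extracted vectors instead of re-running PCA, which is assumed here to serve the same purpose of keeping $S_T$ nonsingular.

```python
import numpy as np

def between_scatter(Z, y):
    """Between-class scatter of the rows of Z, which are assumed centered."""
    m = Z.mean(axis=0)
    S_B = np.zeros((Z.shape[1], Z.shape[1]))
    for k in np.unique(y):
        diff = (Z[y == k].mean(axis=0) - m)[:, None]
        S_B += np.sum(y == k) * (diff @ diff.T)
    return S_B

def mfld_step(S_B, S_T, k):
    """Top-k maximizers of w^T S_B w / w^T S_T w for nonsingular S_T."""
    evals, evecs = np.linalg.eigh(S_T)
    P = evecs / np.sqrt(evals)              # whitening: P^T S_T P = I
    lam, V = np.linalg.eigh(P.T @ S_B @ P)
    order = np.argsort(lam)[::-1][:k]
    return P @ V[:, order]

def null_space_basis(W):
    """Orthonormal basis of {v : W^T v = 0} for W with full column rank."""
    Q, _ = np.linalg.qr(W, mode="complete")
    return Q[:, W.shape[1]:]

def rmld(X, y, n_iter):
    """Return a (d, n_iter * (c - 1)) matrix of feature vectors (fewer if exhausted)."""
    c = len(np.unique(y))
    Xc = X - X.mean(axis=0)
    # Step 1: PCA keeping every nontrivial component, so S_T is nonsingular.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[s > 1e-10 * s[0]].T          # maps the current subspace to R^d
    Z = Xc @ basis
    S_T = Z.T @ Z
    S_B = between_scatter(Z, y)
    features = []
    for _ in range(n_iter):
        if S_T.shape[0] < c - 1:            # not enough dimensions left
            break
        W = mfld_step(S_B, S_T, c - 1)      # steps 2) and 4)
        features.append(basis @ W)          # express vectors in the original space
        N = null_space_basis(W)             # step 3): discard extracted information
        S_B = N.T @ S_B @ N
        S_T = N.T @ S_T @ N
        basis = basis @ N
    return np.hstack(features)

# Hypothetical small-sample data: n = 12 samples, d = 30, c = 3 classes.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 4)
X = rng.normal(size=(12, 30)) + 2.0 * y[:, None]
F = rmld(X, y, n_iter=3)
```

With these sizes the PCA stage keeps $n - 1 = 11$ components, and each of the three iterations contributes $c - 1 = 2$ feature vectors, so the feature count is no longer capped at $c - 1$.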


The dimension reduction by PCA and the recalculation of $S_B$ and $S_T$ in step 3) are computationally intensive. A much more efficient way is to use the null space of the extracted feature vectors, and step 3) may be revised as follows. Get the null space of the extracted feature vectors, denoted as $N$. Each column of $N$ is a basis vector of the null space. Calculate the new $S_B$ and $S_T$ from the old ones by

$$S_B^{(2)} = N^T S_B^{(1)} N \tag{12}$$

$$S_T^{(2)} = N^T S_T^{(1)} N \tag{13}$$

where $N$ is the null-space basis matrix and the superscripts 1 and 2 on the two scatter matrices represent old and new, respectively.

B. Recursive Cluster-Based Linear Discriminant (RCLD)

One major problem with traditional FLD is that it makes an implicit assumption that the underlying distribution for each class is unimodal. However, this assumption is often violated in real-world problems. For example, in the case of identity recognition, the variations of a person's image may be caused by illumination, pose, expression, etc., and the distribution for one person probably contains multiple clusters, with each cluster corresponding to one particular variation. Multiple clusters in each class are especially likely for other face recognition tasks such as facial expression recognition and glasses-wearing recognition, where each class contains images from different persons and the images from the same person are very likely to cluster together. In order to tackle the possible multimodal distributions, RCLD is proposed and discussed as follows.

The general idea of RCLD is the same as that of RMLD except that it adopts a cluster-based approach, which was first proposed in [17] and termed CLD. The objective of CLD is to separate clusters belonging to different classes without putting any constraints on clusters belonging to the same class. It also minimizes each cluster scatter so as to keep each cluster compact. Therefore, $S_B$ and $S_W$ are redefined as follows:

$$S_B = \sum_{i=1}^{c-1} \sum_{l=i+1}^{c} \sum_{j=1}^{k_i} \sum_{h=1}^{k_l} \frac{n_{ij} n_{lh}}{n} (\mathbf{m}_{ij} - \mathbf{m}_{lh})(\mathbf{m}_{ij} - \mathbf{m}_{lh})^T \tag{14}$$

$$S_W = \sum_{i=1}^{c} \sum_{j=1}^{k_i} \sum_{\mathbf{x} \in D_{ij}} (\mathbf{x} - \mathbf{m}_{ij})(\mathbf{x} - \mathbf{m}_{ij})^T \tag{15}$$

where $\mathbf{m}_{ij}$ is the mean of the $j$th cluster in the $i$th class, $D_{ij}$ is the set of samples in that cluster, $n_{ij}$ is the number of samples in the $j$th cluster of the $i$th class, $k_i$ is the number of clusters in the $i$th class, and $n$ is the total number of training samples. One point to note is that the definition for $S_B$ above is not the same as the original one in [17], which was defined as

$$S_B = \sum_{i=1}^{c-1} \sum_{l=i+1}^{c} \sum_{j=1}^{k_i} \sum_{h=1}^{k_l} (\mathbf{m}_{ij} - \mathbf{m}_{lh})(\mathbf{m}_{ij} - \mathbf{m}_{lh})^T. \tag{16}$$

The reason for adding the weighting element $n_{ij} n_{lh} / n$, as shown in (14), is to take into account the different sizes of the clusters. A necessary procedure in CLD is clustering. While fuzzy k-means clustering was used in [17], k-means clustering was used in our experiments, as we found that k-means gave slightly better recognition accuracy.

Notice that the maximum number of features that can be extracted by CLD is still $K - 1$, where $K$ is the total number of clusters, which is at least $c$, the number of classes. To relax this constraint on the total number of features, we apply the idea of RMLD and term this new algorithm RCLD, which incorporates the advantages of both RMLD and CLD. RCLD adopts the redefined formulas (14) and (15) for $S_B$ and $S_W$ by doing clustering analysis first, and obtains $S_T = S_B + S_W$. Then, it follows the same procedure as that of RMLD.

IV. EXPERIMENTS ON FACE RECOGNITION PROBLEMS

Extensive experiments have been carried out with different databases to test the effectiveness of the suggested RMLD and RCLD against other methods.

A. Databases

We experimented on three databases: Yale, ORL, and JAFFE. The Yale database was used for identity, facial expression, and glasses-wearing recognition problems. The ORL database was used for identity and glasses-wearing recognition problems. The JAFFE database was used only for facial expression recognition. So, for each type of face recognition problem, two different databases are used to test the performance of the various algorithms presented in this paper.

There are 165 images in the Yale database, which is made up of 15 different individuals with 11 images for each individual. The 11 images of each individual are labeled by facial expression, lighting condition, or whether glasses are worn: normal, happy, sad, sleepy, surprise, wink, left light, central light, right light, without glasses, and with glasses. Images from the Yale database are cropped manually to eliminate most of the background and some part of the hair and chin. The size of the images changes from 320 × 243 to 124 × 147. The ORL database consists of 40 different individuals with 10 images for each individual. The images from the ORL database were also cropped, from 112 × 92 to 81 × 72. The JAFFE database comprises 10 Japanese females. Each person has seven facial expressions: happy, sad, surprise, angry, disgust, fearful, and neutral. There are three or four images for each facial expression of each person. The resolution of each image is 256 × 256.

B. Classification Methods for Comparison

Nine different classification approaches have been tested and compared. While the same nearest neighbor classifier based on Euclidean distance is applied for all of them, they differ in the feature extraction process. The first method uses PCA to reduce the high-dimensional images into lower dimensional ones, but no discriminant analysis is performed afterwards. The second method is FLD. To solve the small sample size problem, PCA is used first to reduce


the sample dimension so that the within-class scatter matrix is nonsingular, as discussed previously in Section II. The third method, the enhanced FLD model (EFM) [7], is the same as FLD except that EFM selects a different eigen-subspace that is better suited to the subsequent FLD process. EFM aims to find a number of PCA features that balances the need to keep enough spectral energy of the raw data against the requirement that the eigenvalues of the within-class scatter in the reduced PCA space are not too small, since tiny eigenvalues are associated with noise that makes FLD overfit when exposed to new data. Unfortunately, no quantitative criterion for measuring the adequacy of the energy and the smallness of the eigenvalues of the within-class scatter is currently available and, hence, the cutoff point for the number of PCA components to retain has to be obtained through trial and error. In our experiments, the optimal number of PCA features is the one that leads to the lowest error rate, and it is found through a simple exhaustive search rather than by analyzing the spectrum of the eigenvalues as suggested in [7]. The fourth method is RFLD. Like FLD, RFLD also employs PCA to reduce the sample dimension so that $S_W$ is nonsingular. The fifth method is RMLD, which uses the full eigenspace extracted by PCA as discussed before. The sixth and seventh methods are CLD and RCLD. Analogously to FLD and RMLD, CLD selects only a subset of PCA components such that $S_W$ is nonsingular, while RCLD uses the full eigenspace and adopts a recursive procedure. The eighth feature extraction method compared is nonparametric discriminant analysis (NDA), which is also free of the feature number limitation and is designed to deal with multimodal class distributions. NDA is very similar to FLD except that it adopts a different definition for $S_B$. Notice that FLD calculates the between-class scatter matrix $S_B$ using the mean of each class as a representative of that class, which is only suitable for unimodal class distributions.
To tackle the multimodal distribution problem, NDA uses a nonparametric definition, originally proposed by Fukunaga [14] for two-class problems and later generalized for multiclass problems by Li and his colleagues [16] and by Bressan and Vitria [15]. The algorithm suggested in [16] is more straightforward and, hence, is adopted in our comparative studies. The last method is locality preserving projection (LPP), which, in fact, is not an extension of FLD. Instead, it is an unsupervised learning algorithm that aims to find a linear subspace that best preserves local structure and detects the essential face manifold structure. The complete derivation and theoretical justifications of LPP were given in [11]. Like NDA, LPP does not have the feature number limitation of FLD. To compare the effectiveness of these methods, the simplest classifier, the nearest neighbor classifier, is used for all nine methods. The similarity measure adopted is the simple Euclidean norm. There are two types of nearest neighbor classifiers. One compares the distance of the test sample to each sample in the training set and assigns it to the class that contains the closest training sample. The other computes the similarity between the test sample and the mean of each class and assigns the test sample to the class whose mean is closest to the test sample. In the case of CLD and RCLD, the similarity is

TABLE I IDENTITY RECOGNITION RESULT

between the test sample and the mean of each cluster, instead of the mean of each class. In this section, due to space limitations, only the results using the second type of nearest neighbor classifier are presented, since it is computationally more economical while giving roughly the same performance as the first type.

C. Identity Recognition

The identity recognition error rate is determined by the leave-one-out strategy [1]: to classify one particular image, all the rest of the images are pooled together to form the training data set, which is used to compute the projection directions by PCA, FLD, EFM, RFLD, NDA, LPP, RMLD, CLD, and RCLD. We selected two clusters per class in the experiment because the performance was found to be roughly the same for slightly different numbers of clusters, and it is computationally more efficient to use fewer clusters per class. The lowest recognition error rates achieved by the nine methods are shown in Table I. Fig. 1 plots the error rate versus the number of features used for classification. As plotting the error rates of all nine methods on a single graph looks overcrowded, we leave the error rates for PCA and RMLD out of the plotted graphs. The following important facts can be observed from a comparison of the performances of these different methods. RCLD achieves the best recognition performance among all nine methods. In particular, only RCLD can achieve perfect recognition for both the Yale and ORL databases. RMLD and RCLD can improve the recognition performance compared to FLD and CLD, respectively. NDA achieves the same performance as FLD, and LPP performs worse than most other methods, probably due to its nature as an unsupervised learning method. In some situations, speed might be the critical factor in choosing an algorithm.
To get a measure of the computational cost of each algorithm, the running time for each method is obtained and listed in Table II for the case of identity recognition applied to Yale face database. All the experiments


Fig. 1. Comparative identity recognition performance. (a) Yale database. (b) ORL database.

TABLE II COMPARISON OF RUNNING TIME FOR DIFFERENT ALGORITHMS

TABLE III FACIAL EXPRESSION RECOGNITION RESULT

are conducted using a Pentium-4 Dell PC with a CPU speed of 2 GHz and 1 GB of RAM. The time listed in Table II is the total running time taken to get the recognition performance of each algorithm as plotted in Fig. 1(a), which includes all the time for data loading, feature extraction, and classification. The total number of features obtained by each method for evaluating the performance [which clearly depends upon the number of features used in the classification stage as shown in Fig. 1(a)] is also tabulated in the third row. PCA and FLD are the fastest, while RFLD is the slowest. EFM is also extremely time consuming, since an exhaustive search is involved to find the optimal number of PCA features for the best recognition. Both CLD and RCLD are slow, largely due to the slow process of clustering analysis. LPP and RMLD are much faster than RCLD, and NDA is also faster than RCLD but much slower than LPP. It is worth mentioning that the running time compared in Table II only covers the classifier design process. Once the features are extracted and implemented by the classifier, there is little difference in the time the different algorithms take to recognize a new face image (all shorter than 0.1 s for identity recognition on the Yale database).

D. Facial Expression Recognition

The facial expression recognition error rate is determined by a cross-validation strategy rather than leave-one-out, i.e., all the images belonging to one particular person are pulled out and used as test images while the rest of the images are all included in the training data set.
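The evaluation protocol just described, with all images of one person held out per fold and classification by the nearest class or cluster mean in Euclidean norm, can be sketched as follows. This is a hedged illustration only: the function names and the synthetic features are hypothetical, and for brevity the features `F` are computed once, whereas a faithful experiment would refit the feature extractor on each training fold.

```python
import numpy as np

def nearest_prototype(prototypes, prototype_classes, f):
    """Class of the prototype (class mean or cluster mean) closest to f."""
    distances = np.linalg.norm(prototypes - f, axis=1)
    return prototype_classes[np.argmin(distances)]

def class_means(F, y):
    """One prototype per class; a CLD-style variant would use one per cluster."""
    classes = np.unique(y)
    return np.stack([F[y == k].mean(axis=0) for k in classes]), classes

def leave_one_subject_out(subject_ids):
    """Yield (train_idx, test_idx), holding out all images of one subject."""
    for s in np.unique(subject_ids):
        held_out = subject_ids == s
        yield np.flatnonzero(~held_out), np.flatnonzero(held_out)

def cross_validated_error(F, y, subject_ids):
    """Fraction of misclassified images over all held-out subjects."""
    errors = 0
    for train, test in leave_one_subject_out(subject_ids):
        prototypes, prototype_classes = class_means(F[train], y[train])
        for i in test:
            errors += nearest_prototype(prototypes, prototype_classes, F[i]) != y[i]
    return errors / len(y)

# Synthetic stand-in with the Yale layout: 15 subjects, 11 images each.
rng = np.random.default_rng(0)
subject_ids = np.repeat(np.arange(15), 11)
y = np.tile(np.arange(11) % 6, 15)          # six hypothetical expression labels
centers = 10.0 * np.eye(6)                  # well-separated class centers
F = centers[y] + 0.1 * rng.normal(size=(165, 6))

folds = list(leave_one_subject_out(subject_ids))
error_rate = cross_validated_error(F, y, subject_ids)
```

Holding out whole subjects keeps images of the test person out of the training set, which matters when images of the same person tend to cluster together.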

There are six facial expressions for the Yale database: normal, happy, sad, sleepy, surprise, and wink. As mentioned before, there are 11 images for each person in the Yale database. They are labeled by facial expression, lighting condition, or whether glasses are worn: normal, happy, sad, sleepy, surprise, wink, left light, central light, right light, without glasses, and with glasses. For those images not labeled by expression, the expressions are usually normal. As there are more samples in the normal expression class, the number of clusters used for this class in our experiment is also larger than those of the other classes. We used eight clusters for the normal expression, and two clusters per class for all the other expressions. So, there are 18 clusters in total and the maximum number of features CLD can extract is 17. For the JAFFE database, there are seven expressions: happy, sad, surprise, angry, disgust, fearful, and neutral. The number of clusters for each class is chosen to be 2. So the total


Fig. 2. Comparative facial expression recognition performance. (a) Yale database. (b) JAFFE database.

number of clusters is 14 and CLD can extract, at most, 13 feature vectors. The lowest error rates achieved by the nine methods are tabulated in Table III, and the error rates versus the number of features for the Yale and JAFFE databases are plotted in Fig. 2. By comparing the performance of the different methods, we may conclude the following. RCLD can achieve perfect recognition for the Yale database and an error rate of 6.2% for JAFFE, which is a dramatic improvement over all other methods. CLD can improve the recognition accuracy substantially compared to the methods without clustering, which implies that clustering analysis is crucial. RMLD and RCLD can improve the recognition performance compared to FLD and CLD, respectively, by using more than one iteration to extract more discriminant features. NDA is only slightly better than FLD, and LPP still performs very poorly.

E. Glasses-Wearing Recognition

The glasses-wearing recognition problem is a special case of recognition problems: the two-class recognition problem. In the glasses-wearing recognition problem, each test image is classified as either with glasses or without glasses. The number of samples in the two classes is different for both the Yale database and the ORL database, and we assigned a different number of clusters to each class: five clusters for the class without glasses and three clusters for the class with glasses for the Yale database; eight clusters for the class without glasses and four clusters for the class with glasses for the ORL database. As for facial expression recognition, cross validation is adopted for glasses-wearing recognition. The experimental results are shown in Table IV and plotted in Fig. 3. We can observe several interesting points by comparing the performances of the different methods and the previous results for identity and facial expression recognition.
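The cluster counts chosen in these experiments bound the number of features available from a single CLD pass, because the cluster-based $S_B$ is built from differences of cluster means. The NumPy sketch below illustrates this under stated assumptions: the function name and synthetic data are hypothetical, cluster labels are taken as given (for instance from a k-means run within each class), and the weight $n_{ij} n_{lh} / n$ follows the weighted pairwise form described in Section III-B as understood here.

```python
import numpy as np

def cluster_scatter(X, class_ids, cluster_ids):
    """Cluster-based (S_B, S_W, K); a cluster is a (class id, cluster id) pair."""
    n, d = X.shape
    keys = sorted(set(zip(class_ids.tolist(), cluster_ids.tolist())))
    means, sizes, owners = [], [], []
    S_W = np.zeros((d, d))
    for cls, clu in keys:
        members = X[(class_ids == cls) & (cluster_ids == clu)]
        mean = members.mean(axis=0)
        centered = members - mean
        S_W += centered.T @ centered        # scatter around the cluster mean
        means.append(mean)
        sizes.append(len(members))
        owners.append(cls)
    S_B = np.zeros((d, d))
    for a in range(len(keys)):
        for b in range(a + 1, len(keys)):
            if owners[a] == owners[b]:
                continue                    # same class: no separation constraint
            diff = (means[a] - means[b])[:, None]
            # Assumed weighting n_ij * n_lh / n to account for cluster sizes.
            S_B += sizes[a] * sizes[b] / n * (diff @ diff.T)
    return S_B, S_W, len(keys)

# Hypothetical data: 3 classes, 2 clusters per class, 8 samples per cluster.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(6, 10))
X = np.repeat(centers, 8, axis=0) + rng.normal(size=(48, 10))
class_ids = np.repeat(np.array([0, 0, 1, 1, 2, 2]), 8)
cluster_ids = np.repeat(np.array([0, 1, 0, 1, 0, 1]), 8)

S_B, S_W, K = cluster_scatter(X, class_ids, cluster_ids)
rank_S_B = np.linalg.matrix_rank(S_B)       # bounded by K - 1

# With a single cluster per class the weighted form reduces to the FLD S_B.
S_B_single, _, _ = cluster_scatter(X, class_ids, np.zeros_like(cluster_ids))
m = X.mean(axis=0)
S_B_fld = sum(
    np.sum(class_ids == k) * np.outer(X[class_ids == k].mean(axis=0) - m,
                                      X[class_ids == k].mean(axis=0) - m)
    for k in np.unique(class_ids)
)
```

The same bound gives 17 features for the 18 clusters used on the Yale expression problem and 13 for the 14 clusters used on JAFFE; a recursive procedure such as RCLD is what lifts it.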

TABLE IV GLASSES-WEARING RECOGNITION RESULT

RCLD can improve the recognition performance by using more than one iteration to extract more discriminant features. However, RMLD only does so for the Yale database. The amount of improvement by RCLD with both databases is substantial compared to all the others. NDA is slightly better than FLD, which is consistent with the results reported in [16]; however, LPP still performs very badly, being only better than PCA and much worse than the others. From the experimental results of the three face recognition problems, it is apparent that RCLD gives the best recognition performance among all nine methods for all three face recognition problems. CLD is the second best method, which suggests that clustering is essential for improving the recognition performance. Among the methods without clustering, EFM appears to be the best, and RFLD improves over FLD, which agrees with the results in [12]. The performance of NDA is only slightly better than FLD, which is consistent with

Fig. 3. Comparative glasses-wearing recognition performance. (a) Yale database. (b) ORL database.

that of [16], but worse than EFM. LPP does not fare well at all: it is better only than PCA and much worse than all the supervised learning schemes, which may not be surprising since it is essentially an unsupervised learning algorithm that does not exploit the class labeling information at all.

V. CONCLUSION

This paper deals with the important problem of extracting discriminant features for pattern classification. Two novel methods, termed RMLD and RCLD, are proposed to overcome the shortcomings of FLD, and they constitute the main contribution of this paper. RMLD may be considered the special case of RCLD in which only one cluster exists within each class. The proposed method has the following main advantages over FLD and its variations:

better exploitation of data structure by considering the existence of clusters within each class and, hence, the capability of handling multimodal distributions;

full utilization of all discriminant information available by replacing … with … in the criterion function;

elimination of the feature number constraint by adopting the recursive procedure;

lower computational cost than RFLD, by calculating at each iteration one fewer feature than the total number of clusters, instead of only one;

no need to choose the optimal number of PCA vectors for discriminant analysis, as EFM requires, and therefore greater computational efficiency.

The performance of the proposed method is much better than that of the other methods for all the face recognition problems, including identity, facial expression, and glasses-wearing recognition, as observed from the experimental results on the Yale, ORL, and JAFFE databases. In particular, the perfect recognition performance achieved by our proposed methods for identity recognition on the ORL database and for facial expression recognition on the Yale database has, to our knowledge, never been reported in the literature. These new feature

extraction tools can be easily applied to other pattern classification problems such as iris recognition, and it is our strong belief that substantial improvement can also be achieved there. Selecting a proper number of clusters for each class is essential for achieving good performance with RCLD. In this paper it has been done by trial and error, and work is currently in progress to develop a more systematic and efficient way of choosing the optimal number of clusters for pattern classification problems.

ACKNOWLEDGMENT

The authors would like to thank the associate editor and the anonymous reviewers for their critical comments, which helped to improve the quality of this paper.

REFERENCES
[1] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.
[2] K. Etemad and R. Chellappa, "Discriminant analysis for recognition of human face images," J. Opt. Soc. Amer. A, vol. 14, pp. 1724–1733, 1997.
[3] D. L. Swets and J. Weng, "Using discriminant eigenfeatures for image retrieval," IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 8, pp. 831–836, Aug. 1996.
[4] L. Chen, H. M. Liao, M. Ko, J. Lin, and G. Yu, "A new LDA-based face recognition system which can solve the small sample size problem," Pattern Recognit., vol. 33, pp. 1713–1726, Oct. 2000.
[5] A. M. Martínez and A. C. Kak, "PCA versus LDA," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 228–233, Feb. 2001.
[6] H. Yu and J. Yang, "A direct LDA algorithm for high-dimensional data with application to face recognition," Pattern Recognit., vol. 34, pp. 2067–2070, 2001.
[7] C. Liu and H. Wechsler, "Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition," IEEE Trans. Image Process., vol. 11, no. 4, pp. 467–476, Apr. 2002.
[8] X. Wang and X. Tang, "A unified framework for subspace face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 9, pp. 1222–1228, Sep. 2004.
[9] T. Okada and S. Tomita, "An optimal orthonormal system for discriminant analysis," Pattern Recognit., vol. 18, pp. 139–144, 1985.

[10] J. Duchene and S. Leclercq, "An optimal transformation for discriminant and principal component analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 10, no. 6, pp. 978–983, Jun. 1988.
[11] X. He, S. Yan, Y. Hu, and H. Zhang, "Learning a locality preserving subspace for visual recognition," in Proc. IEEE Int. Conf. Computer Vision, 2003, vol. 1, pp. 385–392.
[12] C. Xiang, X. A. Fan, and T. H. Lee, "Face recognition using recursive Fisher linear discriminant," IEEE Trans. Image Process., vol. 15, no. 8, pp. 2097–2105, Aug. 2006.
[13] X. Y. Jing, D. Zhang, and Y. F. Yao, "Improvements on the linear discrimination technique with application to face recognition," Pattern Recognit. Lett., vol. 24, pp. 2695–2701, 2003.
[14] K. Fukunaga, Statistical Pattern Recognition. New York: Academic, 1990.
[15] M. Bressan and J. Vitria, "Nonparametric discriminant analysis and nearest neighbor classification," Pattern Recognit. Lett., vol. 24, pp. 2743–2749, 2003.
[16] Z. Li, W. Liu, D. Lin, and X. Tang, "Nonparametric subspace analysis for face recognition," in Proc. IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition, 2005, pp. 961–966.
[17] X. W. Chen and T. Huang, "Facial expression recognition: A clustering-based approach," Pattern Recognit. Lett., vol. 24, pp. 1295–1302, 2003.
[18] M. J. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding facial expressions with Gabor wavelets," in Proc. 3rd IEEE Int. Conf. Automatic Face and Gesture Recognition, 1998, pp. 200–205.

C. Xiang (M'01) received the B.S. degree in mechanical engineering from Fudan University, China, in 1991, the M.S. degree in mechanical engineering from the Institute of Mechanics, Chinese Academy of Sciences, in 1994, and the M.S. and Ph.D. degrees in electrical engineering from Yale University, New Haven, CT, in 1995 and 2000, respectively. From 2000 to 2001, he was a Financial Engineer with Fannie Mae, Washington, DC. At present, he is an Assistant Professor in the Department of Electrical and Computer Engineering, National University of Singapore. His research interests include computational intelligence, adaptive systems, and pattern recognition.

D. Huang received the B.Eng. degree in electrical and computer engineering from the National University of Singapore in 2005, where he is currently pursuing the M.Eng. degree under the supervision of Dr. C. Xiang.