I. INTRODUCTION
Identification of a person can be effectively done by making use of ordinal measures of biometric patterns.
A Multi-lobe Ordinal Filter (MOF) with a number of tunable parameters is proposed to analyze the
ordinal measures of biometric images (Fig. 1). A MOF has a number of positive and negative lobes
which are specially designed in terms of distance, scale, orientation, number, and location, so that
filtering a biometric image with the MOF measures the ordinal relationship between the image
regions covered by the positive and negative lobes. From Fig. 1 we can see that
M.Tech in Signal Processing, SIT, Tumkur
min_f ||g − Af||₂² + λ|f|₁        (1)
where g denotes the intra- or inter-class label (+1 or −1), the components of A indicate the intra- or
inter-class matching results based on individual features in the training database, f denotes the
feature weight vector, and λ is a parameter controlling the balance between regression errors and
the sparsity of the selected features. The objective function includes two parts. The first part, ||g − Af||₂²,
aims to minimize the regression errors, and the second part, λ|f|₁, uses L1 regularization to
enforce sparsity of the selected features. L1-regularized sparse representation was shown to
outperform Boosting for face detection and authentication on small training datasets.
However, this approach also has some drawbacks. Firstly, although the optimization
recognition.
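The L1-regularized regression of Eqn. 1 can be solved numerically by proximal gradient descent. Below is a minimal sketch (not the authors' implementation): ISTA with soft-thresholding, run on synthetic data in which only two of twenty candidate features carry identity information, so the recovered f should be sparse.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * |f|_1: shrinks each component toward zero.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_regression(A, g, lam, n_iter=500):
    """Minimize ||g - A f||_2^2 + lam * |f|_1 by ISTA (proximal gradient)."""
    f = np.zeros(A.shape[1])
    # Step size 1/L, where L is the Lipschitz constant of the gradient 2 * A^T A.
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ f - g)
        f = soft_threshold(f - grad / L, lam / L)
    return f

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
f_true = np.zeros(20)
f_true[[2, 7]] = [1.5, -2.0]            # only two informative features
g = A @ f_true + 0.01 * rng.standard_normal(100)
f_hat = l1_regression(A, g, lam=5.0)
print(np.flatnonzero(np.abs(f_hat) > 0.1))   # indices of the surviving features
```

The L1 penalty drives the uninformative coefficients exactly to zero, which is the sparsity behavior the text describes.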
In this paper, ordinal feature selection is formulated as a constrained optimization problem as
follows
min_{w,ξ} (C+/N+) Σ_{j=1..N+} ξ+_j + (C−/N−) Σ_{k=1..N−} ξ−_k + Σ_{i=1..D} P_i w_i
subject to Σ_{i=1..D} w_i x+_{ij} ≤ ε+ + ξ+_j,  j = 1, ..., N+
           Σ_{i=1..D} w_i x−_{ik} ≥ ε− − ξ−_k,  k = 1, ..., N−
           w_i ≥ 0,  ξ+_j ≥ 0,  ξ−_k ≥ 0
where D is the number of ordinal features available for feature selection, N+ and N− denote the
numbers of intra- and inter-class biometric matching pairs in the training database respectively,
w_i is the weight of the i-th ordinal feature in the biometric recognition system, P_i measures the
recognition accuracy of the i-th ordinal feature on the training database, x+_{ij} denotes the Hamming
distance of the i-th ordinal feature for the j-th intra-class biometric image pair in the training
database, x−_{ik} denotes the Hamming distance of the i-th ordinal feature for the k-th inter-class
image pair in the training database, ε+ and ε− are two fixed parameters indicating the expected
intra- and inter-class biometric matching results respectively, ξ+_j and ξ−_k are slack variables for
intra- and inter-class biometric matching respectively, and C+ and C− are constant parameters
tuning the importance of the intra- and inter-class misclassification penalties.
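This constrained formulation is an ordinary linear program over the stacked variables (w, intra-class slacks, inter-class slacks). Below is a minimal sketch using SciPy's general-purpose LP solver rather than the CPLEX setup used later in the paper; the symbol names (Cp, Cm, eps_p, eps_m) and the toy data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def select_features(Xp, Xm, P, eps_p, eps_m, Cp=1.0, Cm=1.0):
    """LP feature selection sketch.
    Xp: (Np, D) Hamming distances of intra-class pairs per feature.
    Xm: (Nm, D) Hamming distances of inter-class pairs per feature.
    P : (D,) prior weight of each feature; eps_p/eps_m: expected thresholds."""
    Np, D = Xp.shape
    Nm = Xm.shape[0]
    # Objective: sum_i P_i w_i + (Cp/Np) sum_j xi+_j + (Cm/Nm) sum_k xi-_k
    c = np.concatenate([P, np.full(Np, Cp / Np), np.full(Nm, Cm / Nm)])
    # Intra-class: sum_i w_i x+_ij <= eps_p + xi+_j
    A_intra = np.hstack([Xp, -np.eye(Np), np.zeros((Np, Nm))])
    b_intra = np.full(Np, eps_p)
    # Inter-class: sum_i w_i x-_ik >= eps_m - xi-_k, negated into A_ub z <= b_ub form
    A_inter = np.hstack([-Xm, np.zeros((Nm, Np)), -np.eye(Nm)])
    b_inter = np.full(Nm, -eps_m)
    res = linprog(c, A_ub=np.vstack([A_intra, A_inter]),
                  b_ub=np.concatenate([b_intra, b_inter]))  # default bounds: z >= 0
    return res.x[:D]

# Toy data: feature 0 separates the classes (intra 0.1 vs inter 0.5);
# feature 1 does not (0.3 vs 0.3).
Xp = np.tile([0.1, 0.3], (4, 1))
Xm = np.tile([0.5, 0.3], (4, 1))
w = select_features(Xp, Xm, P=np.array([0.05, 0.05]), eps_p=0.3, eps_m=0.4)
```

On this toy problem the LP puts all weight on the discriminative feature and drives the other weight to zero, which is exactly the sparse-selection behavior the formulation is designed for.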
The first part of the objective function,

(C+/N+) Σ_{j=1..N+} ξ+_j + (C−/N−) Σ_{k=1..N−} ξ−_k,
aims to minimize the misclassification errors of intra- and inter-class matching samples according
to the expected thresholds ε+ and ε−. Since ε+ and ε− are defined as the mean intra- and inter-class
Hamming distances of well-performing ordinal features, a large margin principle is
actually incorporated into the optimization problem. The biometric matching samples failing to
meet the large margin requirement suffer a penalty, determined by the distance from the
dissimilarity measure to the expected thresholds ε+ and ε−. Here a soft margin technique is
adopted by introducing the slack variables ξ+_j and ξ−_k to guarantee that all intra-class and
inter-class matching results follow the large margin principle. So the first part of the objective
function,
(C+/N+) Σ_{j=1..N+} ξ+_j + (C−/N−) Σ_{k=1..N−} ξ−_k,
defines the overall penalty term over the training samples according to the large margin principle. The
constant parameters C+ and C− set the penalty weights for misclassifications of intra- and
inter-class matching samples respectively, and their values can be tuned according to the
application requirements. For example, FRR (False Reject Rate) sensitive applications such
as watch-list monitoring can set a larger C+, and FAR (False Accept Rate) sensitive
applications such as banking can set a larger C−. In normal applications, we can set C+ = C−.
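The FAR/FRR trade-off behind these penalty weights can be made concrete: both rates are empirical frequencies over intra- and inter-class distance samples at a chosen decision threshold. A small illustrative sketch with toy numbers (not from the paper):

```python
import numpy as np

def far_frr(intra, inter, t):
    """Error rates at decision threshold t on dissimilarity scores:
    a pair is accepted as the same identity if its distance <= t."""
    frr = np.mean(np.asarray(intra) > t)   # genuine pairs wrongly rejected
    far = np.mean(np.asarray(inter) <= t)  # impostor pairs wrongly accepted
    return far, frr

intra = np.array([0.10, 0.15, 0.20, 0.35])   # toy genuine-pair distances
inter = np.array([0.40, 0.45, 0.30, 0.55])   # toy impostor-pair distances
far, frr = far_frr(intra, inter, t=0.25)
# FRR-sensitive deployments (watch-lists) weight genuine errors more heavily;
# FAR-sensitive deployments (banking) weight impostor errors more heavily.
```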
The second part of the objective function, Σ_{i=1..D} P_i w_i,
enforces weighted sparsity of the ordinal feature units. Sparsity of the ordinal feature units is very
important for effective and efficient biometric recognition. Firstly, the objective of biometric
recognition is to find a mapping between the most characterizing features and the identity label;
sparse learning serves exactly this purpose and makes it possible to discover the intrinsic
features of biometric patterns. Secondly, sparsity means that a compact feature set can be used
for biometric recognition, i.e., efficient encoding, storage, transmission and comparison of
biometric feature templates. The weighted sparsity proposed in this paper is a novel idea in
sparse representation. It differs from existing sparse representation methods in that
well-performing individual features in the training database are given priority in sparse
learning. Here the weight P_i represents the prior information of an individual ordinal feature in
terms of recognition performance; it may be defined as the Equal Error Rate (EER), the Area
Under the ROC Curve (AUC) or the inverse of the Discriminating Index (1/D-index). Since the
weight w_i of each ordinal measure is constrained to be non-negative, the second part of the
objective function approximates L1 regularization, which is beneficial for generating a sparse
ordinal feature set after feature selection. The L1 regularization term in sparse representation
(Eqn. 1) can be regarded as a special case of Σ_{i=1..D} P_i w_i in which P_i = 1 for all ordinal
features. The prior information of each feature is not taken into account in the Lasso method,
where all features are treated evenly to enforce sparsity. In our feature selection method,
better-performing ordinal features are assigned smaller penalty weights P_i, so that
a more compact and effective feature set can be selected.
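One possible way to compute the prior weight P_i per feature from training Hamming distances, assuming the convention that a smaller P_i penalizes a feature less in the minimization: a well-separating feature has a large discriminating index, so P_i = 1/d-index is small. A sketch with toy distance samples:

```python
import numpy as np

def d_index(intra, inter):
    """Discriminating index (d-prime) of one feature's Hamming distances."""
    mu1, mu2 = np.mean(intra), np.mean(inter)
    v1, v2 = np.var(intra), np.var(inter)
    return abs(mu2 - mu1) / np.sqrt((v1 + v2) / 2.0)

def prior_weight(intra, inter):
    # Better-separating features get a smaller penalty weight P_i = 1/d-index,
    # so the minimization prefers to keep them in the selected set.
    return 1.0 / d_index(intra, inter)

# Toy samples: a well-separated feature vs. an overlapping one.
good_intra = np.array([0.10, 0.12, 0.08, 0.10])
good_inter = np.array([0.50, 0.48, 0.52, 0.50])
bad_intra = np.array([0.30, 0.35, 0.25, 0.30])
bad_inter = np.array([0.35, 0.30, 0.40, 0.32])
```

EER or AUC could be substituted for 1/d-index in `prior_weight` with the same convention.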
inter-class matching
results usually cannot be linearly separated. Therefore the slack variables ξ+_j and ξ−_k are
introduced into the inequality constraints, which makes our model more flexible and robust. Our LP
formulation is actually a soft margin model which can adaptively remove the influence of noisy
samples or outliers and also generate a larger margin, improving the accuracy and generalization
performance with the help of the slack variables. Eqn. 7 indicates a non-negative constraint on the
feature weights w = {w_i}. We argue that the non-negative constraint on w is both reasonable and
beneficial. Firstly, the target of feature selection is to find the optimal solution of w, which is a
very important variable with physical meaning: each element of w denotes the contribution of the
corresponding ordinal feature to the success of biometric recognition. Since we are discussing a feature
selection method, each feature should only make a positive contribution to the resulting large-margin
classification. Secondly, the second part of the objective function
may be introduced to control the number of selected ordinal feature components (N) according to
practical requirements. Other application-specific requirements can also be added to the
objective function and constraints; as long as these newly added terms can be expressed as linear
functions, our feature selection can still be solved efficiently by linear programming.
Because our feature selection method can be transformed into a standard linear programming
model, it can be solved conveniently and efficiently by the Simplex algorithm, which has a
well-established theory and obtains a globally optimal solution. We sort the weights of the
features to get the desired number of features. To correct truncation errors, extra
classifiers such as Nearest Neighbor (NN) or SVM can then be used for recognition. Another advantage of
LP is that a number of software tools are available to solve linear programming problems, such as
CPLEX and LINDO, and state-of-the-art commercial mathematical toolboxes can efficiently
solve large-scale linear programs with millions of variables. The LP formulation of
this paper only involves thousands of variables, so we choose the CPLEX software package
provided by IBM, which is free of charge for academic research.
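The post-processing described above (sort the LP weights, keep the top d features, then classify with a simple NN matcher) might be sketched as follows; the feature codes and weights here are toy values:

```python
import numpy as np

def top_d_features(w, d):
    """Indices of the d ordinal features with the largest LP weights."""
    return np.argsort(w)[::-1][:d]

def nn_identify(probe, gallery, labels, idx):
    """1-NN over the selected feature subset, using Hamming distance
    between binary ordinal codes (a simple stand-in classifier)."""
    dists = [np.mean(probe[idx] != g[idx]) for g in gallery]
    return labels[int(np.argmin(dists))]

w = np.array([0.9, 0.0, 0.5, 0.1])        # toy LP weights after feature selection
idx = top_d_features(w, d=2)              # keep the two strongest features
gallery = [np.array([0, 1, 0, 1]), np.array([1, 0, 1, 0])]
labels = ["A", "B"]
probe = np.array([1, 1, 1, 1])            # agrees with "B" on features 0 and 2
```

An SVM could replace the 1-NN step without changing the selection stage.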
There are a number of fast implementations of linear programming. The computational
complexity of linear programming based on the interior-point method is O(DN²),
where N is the number of training samples and D is the initial dimension of the feature pool. In
contrast, the complexity of GentleBoost is O(dD² log D), where d is the number of selected
features, and the complexity of Lasso is O(TND), where T is the number of iterations.
GentleBoost is therefore efficient in biometric feature selection,
because a small number (d) of effective features is accurate enough for personal identification.
The Lasso algorithm is more time-consuming because it involves matrix operations. The complexity
of LP-based feature selection is low for small training databases.
IV. ORDINAL FEATURE SELECTION FOR IRIS RECOGNITION
Previous works have demonstrated the effectiveness of ordinal measures for iris recognition, and
there are a large number of stable ordinal measures in iris images. However, how to choose the
most effective set of ordinal measures for reliable iris recognition is still an unsolved
problem. In earlier methods, a di-lobe and a tri-lobe ordinal filter were jointly used for iris feature
extraction. The parameter settings of these ordinal filters are hand-crafted and they are applied
to all iris image regions. However, texture characteristics such as the scale, orientation and
salient texture primitives of iris patterns vary from region to region, so it is a better solution to
employ region-specific ordinal filters for iris feature analysis.
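A multi-lobe ordinal filter of the kind discussed here can be sketched as a zero-sum combination of Gaussian lobes; the exact lobe parameterization below is an assumption for illustration, not the authors' design:

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_lobe(shape, center, sigma):
    """One isotropic 2-D Gaussian lobe centered at (row, col), normalized to sum 1."""
    r, c = np.indices(shape)
    g = np.exp(-((r - center[0]) ** 2 + (c - center[1]) ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def trilobe_filter(shape, centers, sigmas):
    """Tri-lobe ordinal filter: one positive lobe balanced by two half-weight
    negative lobes, so the kernel sums to zero (robust to illumination offset)."""
    pos = gaussian_lobe(shape, centers[0], sigmas[0])
    neg = 0.5 * (gaussian_lobe(shape, centers[1], sigmas[1])
                 + gaussian_lobe(shape, centers[2], sigmas[2]))
    return pos - neg

def ordinal_code(image, kernel):
    """One-bit ordinal measure per pixel: the sign of the filter response."""
    return (convolve2d(image, kernel, mode="same") >= 0).astype(np.uint8)

# Toy usage: three horizontally arranged lobes applied to a step image.
k = trilobe_filter((15, 15), centers=[(7, 3), (7, 7), (7, 11)],
                   sigmas=[1.5, 1.5, 1.5])
img = np.zeros((15, 15))
img[:, :7] = 1.0
code = ordinal_code(img, k)
```

Varying the lobe centers, scales and arrangement per region is what a region-specific filter bank would amount to.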
It should be noted that the process of ordinal feature selection does not consider the prior mask
information of eyelids, eyelashes and specular reflections. There are mainly two strategies to
deal with the occlusion problem in iris recognition. The first is to segment and exclude occluded
regions in iris images and label them with a mask during iris matching. But this needs accurate and
efficient iris segmentation; in addition, the size of the iris template doubles, and, more
importantly, the computational cost of both iris image preprocessing and iris matching is
significantly increased by the mask strategy. So it is more realistic to identify and
exclude heavily occluded iris images in the quality assessment stage. The remaining iris images
used for feature extraction and matching are less occluded by eyelids and eyelashes, which
benefits both the accuracy and efficiency of iris recognition. This paper aims to learn a common
ordinal feature set applicable to the less occluded iris images of all subjects. The feature
selection process is independent of any individual or image-specific prior information such as an iris
segmentation mask. We believe the commonly selected feature set should be accurate enough to
recognize almost all subjects because individual and sample-specific variations have already
been taken into consideration during feature selection. We have also tried to integrate the occlusion
mask into feature selection and feature matching, but observed no improvement of accuracy on
state-of-the-art iris image databases, which usually exclude heavily occluded iris images. We believe the
common ordinal features discovered in this paper are valuable for practical iris recognition
systems.
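The mask-based strategy discussed above is commonly implemented as a fractional Hamming distance restricted to bits valid in both templates; a minimal sketch (the doubling of template size comes from storing one mask bit per code bit):

```python
import numpy as np

def masked_hamming(code_a, mask_a, code_b, mask_b):
    """Fractional Hamming distance over bits valid in both templates.
    Each template carries a mask of the same length as its code, which is
    why the stored template size doubles under this strategy."""
    valid = mask_a & mask_b
    n = valid.sum()
    if n == 0:
        return 1.0  # no comparable bits: treat as maximally dissimilar
    return np.count_nonzero((code_a ^ code_b) & valid) / n
```

Masking out an occluded bit changes both the numerator and the denominator, so the distance stays a fraction of the bits actually compared.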
Iris texture varies from region to region in terms of scale, orientation, shape of texture
is equal to the L1 regularization term in the Lasso algorithm. We argue that it is better to incorporate
the prior information of each ordinal feature unit into the objective function to enforce the priority
of well-performing ordinal feature units in the training dataset. In the experiment on CASIA-Iris-Thousand,
four options for P (i.e., P_i = 1/D, P_i = 1/D-index(OM_i), P_i = AUC(OM_i), P_i =
EER(OM_i)) are tried to learn different ordinal feature sets for iris recognition. The testing
results of these four settings of the parameter P_i are shown in Fig. 5. It is obvious that the best iris
recognition result is achieved when P_i = 1/D-index(OM_i), which indicates that the discriminating
index is the most important prior information for each ordinal feature unit. The results also
demonstrate that incorporating discriminative penalty terms such as EER and AUC into the feature
learning module can significantly improve biometric recognition accuracy.
Comparison results of the five feature selection methods and state-of-the-art iris recognition
methods on the testing dataset of CASIA-Iris-Thousand are shown in Fig. 6 and Table I. The
baseline performance based on Random-OM is also listed in Table I.
It is interesting to investigate the sparsity property of Lasso and LP. The results show that
linear programming achieves a much sparser training result, i.e., 26 non-zero
components (LP) vs. 500 non-zero components (Lasso). Therefore LP is advantageous over
Lasso in achieving a much more compact feature representation for iris biometrics. Some
typical ordinal feature units selected by mRMR, LP, Lasso and Boosting are
illustrated in Fig. 8 (the results of ReliefF are not shown here because it performs much
worse than the other feature selection methods). A number of conclusions can be drawn from
the visualization of the feature selection results.
1) The lower part of the iris region adjacent to the pupil is the most effective for iris
recognition, because these regions are rich in iris texture information and have a much
smaller probability of being occluded by eyelids and eyelashes.
2) Both di-lobe and tri-lobe filters are selected, so they are complementary for iris
recognition. The orientation of most ordinal filters is horizontal because iris
texture is mainly distributed along the circular direction in iris images, i.e., the horizontal
orientation in the normalized format.
3) There exist some differences among the four feature selection methods (mRMR, LP,
It should be noted that the palmprint images of PolyU 1.0 are derived from a small part of the
images in PolyU 2.0, so there may be correlation or overlap between PolyU 1.0 and PolyU 2.0.
It is usually suggested to use independent training and testing datasets in pattern recognition
experiments. However, this paper still uses PolyU 1.0 for training and PolyU 2.0 for testing for
the following reasons.
Almost no public palmprint database, including PolyU and CASIA, provides a division into
training and testing sets as in face biometrics, so most palmprint recognition researchers
report the best results tuned on the whole database. We think it is fair to
compare our methods with state-of-the-art palmprint recognition methods considering that PolyU
1.0 is related to only 7.7% of the palmprint images of PolyU 2.0, and it is better to report palmprint
recognition accuracy on the full PolyU 2.0 for performance evaluation against the existing
methods.
Our previous work has demonstrated that it is easy to achieve 100% accuracy on PolyU 1.0 for
both the competitive code and the ordinal code. So the performance of state-of-the-art palmprint
recognition methods on the independent version of PolyU 2.0 (excluding all images related to
PolyU 1.0) can be measured and compared with the testing results on PolyU 2.0.
The generalization capability of LP-OM will be demonstrated on the CASIA database using
the ordinal features trained on PolyU 1.0, so it is unnecessary to emphasize the independence
Firstly, 5,000 tri-lobe ordinal filters are generated with random parameter settings of location,
scale, and orientation, and tested on the training dataset. The top 500 tri-lobe ordinal filters
with the smallest EER are selected as the candidate feature pool; some of them are shown in
Fig. 9b. We can see that these ordinal filters are significantly different from the filters used
for iris recognition. The proposed linear programming method is then used to select the top 5
ordinal filters, as shown in Fig. 11a. The experimental results on the testing dataset show that
the first two tri-lobe ordinal filters alone achieve state-of-the-art palmprint recognition
performance. It is a grand challenge to search the huge parameter space for the optimal
parameter setting of tri-lobe ordinal filters for palmprint recognition, because the design of a
tri-lobe ordinal filter involves 15 variables in total. Although the top 2 tri-lobe ordinal filters
selected from the random filter pool are good enough for palmprint recognition, the candidate
feature pool only has 500 tri-lobe ordinal filters, and it is possible that better tri-lobe ordinal
filters exist outside the pool. Therefore we further generate more tri-lobe ordinal filters based
on the basic profiles of the top 2 filters, by varying the scale and location parameters of their
lobes. The newly generated tri-lobe ordinal filters are used to train a better palmprint recognition
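The random search and pruning procedure of this section might be sketched as follows; the 15-variable parameterization (three lobes, each with a location, two scales and an orientation) and the EER evaluator are stand-in assumptions:

```python
import random

def random_trilobe_params(height, width, rng=random):
    """Sample the 15 design variables of one tri-lobe ordinal filter:
    for each of the 3 lobes, a location (row, col), two scales and an
    orientation. The exact parameterization here is an assumption."""
    return [
        dict(row=rng.uniform(0, height), col=rng.uniform(0, width),
             sx=rng.uniform(1.0, 8.0), sy=rng.uniform(1.0, 8.0),
             theta=rng.uniform(0.0, 3.141592653589793))
        for _ in range(3)
    ]

def build_pool(eer_of, n_random=5000, n_keep=500, seed=0):
    """Generate n_random random filters and keep the n_keep with the
    smallest EER, forming the candidate pool for the LP selection stage."""
    rng = random.Random(seed)
    pool = [random_trilobe_params(64, 512, rng) for _ in range(n_random)]
    return sorted(pool, key=eer_of)[:n_keep]
```

In practice `eer_of` would evaluate each filter's Equal Error Rate on the training matching pairs; the subsequent local refinement step would re-sample only around the best filters' parameters.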
VI. CONCLUSIONS
The authors have proposed a novel feature selection method based on linear programming to learn
the most effective ordinal features for iris and palmprint recognition. Due to the
incorporation of the large margin principle and weighted sparsity into the LP formulation,
the LP feature selection method is highly effective. The LP-based feature selection model is
flexible enough to integrate prior information about each feature unit, such as DI, EER and AUC,
into the optimization procedure. The experimental results have
demonstrated that the proposed LP feature selection method outperforms mRMR, ReliefF,
Boosting and Lasso.
A number of conclusions can be drawn from the study.
The identity information of visual biometric patterns comes from the unique structure of
ordinal measures. The optimal parameter settings of local ordinal descriptors vary from
biometric modality to modality, subject to subject and even region to region. So it is
impossible to develop a common set of ordinal filters that achieves the best performance for all
visual biometric patterns. Ideally, it would be better to select the optimal ordinal filters to encode
individually specific ordinal measures via machine learning. However, such a personalized
solution is inefficient in large-scale personal identification applications. So this paper turns to
a suboptimal solution: learning a common ordinal feature set for each biometric
modality, which is expected to work well for most subjects.
A main contribution of this paper is a novel optimization formulation for feature selection
based on linear programming (LP). Our expectations of the feature selection
results, i.e., an accurate and sparse ordinal feature set, can be described by a linear objective
function. Such a linear learning model has three advantages. Firstly, the feature selection model
is simple to build, understand, learn and explain. Secondly, the linear penalty term is
robust against outliers. Thirdly, a linear model needs only a small number of training samples to
achieve a globally optimal result with good generalization ability.
Weighted sparsity is proposed in this paper, and the results show that it performs better than
traditional sparse representation methods. So it is better to incorporate the prior information of
candidate features into the optimization model in sparse learning.
REFERENCES
[1] T. Tan and Z. Sun, "Ordinal representations for biometrics recognition," in Proc. 15th Eur.
Signal Process. Conf., 2007, pp. 35-39.
[2] Z. Sun and T. Tan, "Ordinal measures for iris recognition," IEEE Trans. Pattern Anal. Mach.
Intell., vol. 31, no. 12, pp. 2211-2226, Dec. 2009.
[3] Z. Sun, T. Tan, Y. Wang, and S. Z. Li, "Ordinal palmprint representation for personal
identification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), vol. 1, 2005,
pp. 279-284.
[4] P. Viola and M. Jones, "Robust real-time face detection," Int. J. Comput. Vis., vol. 57, no. 2,
pp. 137-154, May 2004.
[5] PolyU Palmprint Database [Online]. Available: http://www.comp.polyu.edu.hk/~biometrics/
[6] S. Z. Li, R. Chu, S. Liao, and L. Zhang, "Illumination invariant face recognition using
near-infrared images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 4, pp. 627-639,
Apr. 2007.
[7] CASIA Iris Image Database [Online]. Available: http://biometrics.idealtest.org