I. INTRODUCTION
Identification of a person can be effectively done by making use of ordinal measures of biometric patterns.
A Multi-lobe Ordinal Filter (MOF) with a number of tunable parameters is proposed to analyze the
ordinal measures of biometric images (Fig. 1). A MOF has a number of positive and negative lobes
which are specially designed in terms of distance, scale, orientation, number, and location, so that
filtering a biometric image with the MOF measures the ordinal relationship between the image
regions covered by the positive and negative lobes. From Fig. 1 we can see that
M.Tech in Signal Processing, SIT, Tumkur
min_f ||g − Af||₂² + λ|f|₁        (1)
where g denotes the intra- or inter-class label (+1 or −1), the components of A indicate the intra- or
inter-class matching results based on individual features in the training database, f denotes the
feature weight vector, and λ is a parameter controlling the balance between regression errors and
the sparsity of the selected features. The objective function includes two parts. The first part, ||g − Af||₂²,
aims to minimize the regression errors, and the second part, λ|f|₁, uses L1 regularization to
enforce sparsity of the selected features. L1-regularized sparse representation was shown to
outperform Boosting for face detection and authentication on small training datasets.
However, this approach also has some drawbacks. Firstly, although the optimization
recognition.
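The L1-regularized regression of Eqn. 1 can be solved numerically by proximal gradient descent. Below is a minimal sketch (not the authors' implementation): ISTA with soft-thresholding, run on synthetic data in which only two of twenty candidate features carry identity information, so the recovered f should be sparse.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * |f|_1: shrinks each component toward zero.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_regression(A, g, lam, n_iter=500):
    """Minimize ||g - A f||_2^2 + lam * |f|_1 by ISTA (proximal gradient)."""
    f = np.zeros(A.shape[1])
    # Step size 1/L, where L is the Lipschitz constant of the gradient 2 * A^T A.
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ f - g)
        f = soft_threshold(f - grad / L, lam / L)
    return f

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
f_true = np.zeros(20)
f_true[[2, 7]] = [1.5, -2.0]            # only two informative features
g = A @ f_true + 0.01 * rng.standard_normal(100)
f_hat = l1_regression(A, g, lam=5.0)
print(np.flatnonzero(np.abs(f_hat) > 0.1))   # indices of the surviving features
```

The L1 penalty drives the uninformative coefficients exactly to zero, which is the sparsity behavior the text describes.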
In this paper, ordinal feature selection is formulated as a constrained optimization problem as
follows
min_{w,ξ} (C+/N+) Σ_{j=1..N+} ξ+_j + (C−/N−) Σ_{k=1..N−} ξ−_k + Σ_{i=1..D} P_i w_i
subject to Σ_{i=1..D} w_i x+_{ij} ≤ ε+ + ξ+_j,  j = 1, ..., N+
           Σ_{i=1..D} w_i x−_{ik} ≥ ε− − ξ−_k,  k = 1, ..., N−
           w_i ≥ 0,  ξ+_j ≥ 0,  ξ−_k ≥ 0
where D is the number of ordinal features available for feature selection, N+ and N− denote the
numbers of intra- and inter-class biometric matching pairs in the training database respectively,
w_i is the weight of the i-th ordinal feature in the biometric recognition system, P_i measures the
recognition accuracy of the i-th ordinal feature on the training database, x+_{ij} denotes the Hamming
distance of the i-th ordinal feature for the j-th intra-class biometric image pair in the training
database, x−_{ik} denotes the Hamming distance of the i-th ordinal feature for the k-th inter-class
image pair in the training database, ε+ and ε− are two fixed parameters indicating the expected
intra- and inter-class biometric matching results respectively, ξ+_j and ξ−_k are slack variables for
intra- and inter-class biometric matching respectively, and C+ and C− are constant parameters
tuning the importance of the intra- and inter-class misclassification penalties.
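This constrained formulation is an ordinary linear program over the stacked variables (w, intra-class slacks, inter-class slacks). Below is a minimal sketch using SciPy's general-purpose LP solver rather than the CPLEX setup used later in the paper; the symbol names (Cp, Cm, eps_p, eps_m) and the toy data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def select_features(Xp, Xm, P, eps_p, eps_m, Cp=1.0, Cm=1.0):
    """LP feature selection sketch.
    Xp: (Np, D) Hamming distances of intra-class pairs per feature.
    Xm: (Nm, D) Hamming distances of inter-class pairs per feature.
    P : (D,) prior weight of each feature; eps_p/eps_m: expected thresholds."""
    Np, D = Xp.shape
    Nm = Xm.shape[0]
    # Objective: sum_i P_i w_i + (Cp/Np) sum_j xi+_j + (Cm/Nm) sum_k xi-_k
    c = np.concatenate([P, np.full(Np, Cp / Np), np.full(Nm, Cm / Nm)])
    # Intra-class: sum_i w_i x+_ij <= eps_p + xi+_j
    A_intra = np.hstack([Xp, -np.eye(Np), np.zeros((Np, Nm))])
    b_intra = np.full(Np, eps_p)
    # Inter-class: sum_i w_i x-_ik >= eps_m - xi-_k, negated into A_ub z <= b_ub form
    A_inter = np.hstack([-Xm, np.zeros((Nm, Np)), -np.eye(Nm)])
    b_inter = np.full(Nm, -eps_m)
    res = linprog(c, A_ub=np.vstack([A_intra, A_inter]),
                  b_ub=np.concatenate([b_intra, b_inter]))  # default bounds: z >= 0
    return res.x[:D]

# Toy data: feature 0 separates the classes (intra 0.1 vs inter 0.5);
# feature 1 does not (0.3 vs 0.3).
Xp = np.tile([0.1, 0.3], (4, 1))
Xm = np.tile([0.5, 0.3], (4, 1))
w = select_features(Xp, Xm, P=np.array([0.05, 0.05]), eps_p=0.3, eps_m=0.4)
```

On this toy problem the LP puts all weight on the discriminative feature and drives the other weight to zero, which is exactly the sparse-selection behavior the formulation is designed for.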
The first part of the objective function,

(C+/N+) Σ_{j=1..N+} ξ+_j + (C−/N−) Σ_{k=1..N−} ξ−_k,
aims to minimize the misclassification errors of intra- and inter-class matching samples according
to the expected thresholds ε+ and ε−. Since ε+ and ε− are defined as the mean intra- and inter-class
Hamming distances of well-performing ordinal features, a large margin principle is
actually incorporated into the optimization problem. The biometric matching samples failing to
meet the large margin requirement suffer a penalty, determined by the distance from the
dissimilarity measure to the expected thresholds ε+ and ε−. Here a soft margin technique is
adopted by introducing the slack variables ξ+_j and ξ−_k to guarantee that all intra-class and
inter-class matching results follow the large margin principle. So the first part of the objective
function,
(C+/N+) Σ_{j=1..N+} ξ+_j + (C−/N−) Σ_{k=1..N−} ξ−_k,
defines the overall penalty term over the training samples according to the large margin principle. The
constant parameters C+ and C− set the penalty weights for misclassifications of intra- and
inter-class matching samples respectively, and their values can be tuned according to the
application requirements. For example, FRR (False Reject Rate) sensitive applications such
as watch-list monitoring can set a larger C+, and FAR (False Accept Rate) sensitive
applications such as banking can set a larger C−. In normal applications, we can set C+ = C−.
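The FAR/FRR trade-off behind these penalty weights can be made concrete: both rates are empirical frequencies over intra- and inter-class distance samples at a chosen decision threshold. A small illustrative sketch with toy numbers (not from the paper):

```python
import numpy as np

def far_frr(intra, inter, t):
    """Error rates at decision threshold t on dissimilarity scores:
    a pair is accepted as the same identity if its distance <= t."""
    frr = np.mean(np.asarray(intra) > t)   # genuine pairs wrongly rejected
    far = np.mean(np.asarray(inter) <= t)  # impostor pairs wrongly accepted
    return far, frr

intra = np.array([0.10, 0.15, 0.20, 0.35])   # toy genuine-pair distances
inter = np.array([0.40, 0.45, 0.30, 0.55])   # toy impostor-pair distances
far, frr = far_frr(intra, inter, t=0.25)
# FRR-sensitive deployments (watch-lists) weight genuine errors more heavily;
# FAR-sensitive deployments (banking) weight impostor errors more heavily.
```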
The second part of the objective function, Σ_{i=1..D} P_i w_i,
enforces weighted sparsity of the ordinal feature units. Sparsity of the ordinal feature units is very
important for effective and efficient biometric recognition. Firstly, the objective of biometric
recognition is to find a mapping between the most characterizing features and the identity label;
sparse learning serves exactly this purpose and makes it possible to discover the intrinsic
features of biometric patterns. Secondly, sparsity means that a compact feature set can be used
for biometric recognition, i.e., efficient encoding, storage, transmission and comparison of
biometric feature templates. The weighted sparsity proposed in this paper is a novel idea in
sparse representation. It differs from existing sparse representation methods in that
well-performing individual features in the training database are given priority in sparse
learning. Here the weight P_i represents the prior information of an individual ordinal feature in
terms of recognition performance; it may be defined as the Equal Error Rate (EER), the Area
Under the ROC Curve (AUC) or the inverse of the Discriminating Index (1/D-index). Since the
weight w_i of each ordinal measure is constrained to be non-negative, the second part of the
objective function approximates L1 regularization, which is beneficial for generating a sparse
ordinal feature set after feature selection. The L1 regularization term in sparse representation
(Eqn. 1) can be regarded as a special case of Σ_{i=1..D} P_i w_i in which P_i = 1 for all ordinal
features. The prior information of each feature is not taken into account in the Lasso method,
where all features are treated evenly to enforce sparsity. In our feature selection method,
better-performing ordinal features are assigned smaller penalty weights P_i, so that
a more compact and effective feature set can be selected.
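One possible way to compute the prior weight P_i per feature from training Hamming distances, assuming the convention that a smaller P_i penalizes a feature less in the minimization: a well-separating feature has a large discriminating index, so P_i = 1/d-index is small. A sketch with toy distance samples:

```python
import numpy as np

def d_index(intra, inter):
    """Discriminating index (d-prime) of one feature's Hamming distances."""
    mu1, mu2 = np.mean(intra), np.mean(inter)
    v1, v2 = np.var(intra), np.var(inter)
    return abs(mu2 - mu1) / np.sqrt((v1 + v2) / 2.0)

def prior_weight(intra, inter):
    # Better-separating features get a smaller penalty weight P_i = 1/d-index,
    # so the minimization prefers to keep them in the selected set.
    return 1.0 / d_index(intra, inter)

# Toy samples: a well-separated feature vs. an overlapping one.
good_intra = np.array([0.10, 0.12, 0.08, 0.10])
good_inter = np.array([0.50, 0.48, 0.52, 0.50])
bad_intra = np.array([0.30, 0.35, 0.25, 0.30])
bad_inter = np.array([0.35, 0.30, 0.40, 0.32])
```

EER or AUC could be substituted for 1/d-index in `prior_weight` with the same convention.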
inter-class matching
results usually cannot be linearly separated. Therefore the slack variables ξ+_j and ξ−_k are
introduced into the inequality constraints, which makes our model more flexible and robust. Our LP
formulation is actually a soft margin model which can adaptively remove the influence of noisy
samples or outliers and also generate a larger margin, improving the accuracy and generalization
performance with the help of the slack variables. Eqn. 7 indicates a non-negative constraint on the
feature weights w = {w_i}. We argue that the non-negative constraint on w is both reasonable and
beneficial. Firstly, the target of feature selection is to find the optimal solution of w, which is a
very important variable with physical meaning: each element of w denotes the contribution of the
corresponding ordinal feature to the success of biometric recognition. Since we are discussing a feature
selection method, each feature should only make a positive contribution to the resulting large-margin
classification. Secondly, the second part of the objective function
may be introduced to control the number of selected ordinal feature components (N) according to
practical requirements. Other application-specific requirements can also be added to the
objective function and constraints; as long as these newly added terms can be expressed as linear
functions, our feature selection can still be solved efficiently by linear programming.
Because our feature selection method can be transformed into a standard linear programming
model, it can be solved conveniently and efficiently by the Simplex algorithm, which has a
well-established theory and obtains a globally optimal solution. We sort the weights of the
features to get the desired number of features. To correct truncation errors, extra
classifiers such as Nearest Neighbor (NN) or SVM can then be used for recognition. Another advantage of
LP is that a number of software tools are available to solve linear programming problems, such as
CPLEX and LINDO, and state-of-the-art commercial mathematical toolboxes can efficiently
solve large-scale linear programs with millions of variables. The LP formulation of
this paper only involves thousands of variables, so we choose the CPLEX software package
provided by IBM, which is free of charge for academic research.
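The post-processing described above (sort the LP weights, keep the top d features, then classify with a simple NN matcher) might be sketched as follows; the feature codes and weights here are toy values:

```python
import numpy as np

def top_d_features(w, d):
    """Indices of the d ordinal features with the largest LP weights."""
    return np.argsort(w)[::-1][:d]

def nn_identify(probe, gallery, labels, idx):
    """1-NN over the selected feature subset, using Hamming distance
    between binary ordinal codes (a simple stand-in classifier)."""
    dists = [np.mean(probe[idx] != g[idx]) for g in gallery]
    return labels[int(np.argmin(dists))]

w = np.array([0.9, 0.0, 0.5, 0.1])        # toy LP weights after feature selection
idx = top_d_features(w, d=2)              # keep the two strongest features
gallery = [np.array([0, 1, 0, 1]), np.array([1, 0, 1, 0])]
labels = ["A", "B"]
probe = np.array([1, 1, 1, 1])            # agrees with "B" on features 0 and 2
```

An SVM could replace the 1-NN step without changing the selection stage.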
There are a number of fast implementations of linear programming. The computational
complexity of linear programming based on the interior-point method is O(DN²),
where N is the number of training samples and D is the initial dimension of the feature pool. In
contrast, the complexity of GentleBoost is O(dD² log D), where d is the number of selected
features, and the complexity of Lasso is O(TND), where T is the number of iterations.
GentleBoost is therefore efficient in biometric feature selection,
because a small number (d) of effective features is accurate enough for personal identification.
The Lasso algorithm is more time-consuming because it involves matrix operations. The complexity
of LP-based feature selection is low for small training databases.
IV. ORDINAL FEATURE SELECTION FOR IRIS RECOGNITION
Previous works have demonstrated the effectiveness of ordinal measures for iris recognition, and
there are a large number of stable ordinal measures in iris images. However, how to choose the
most effective set of ordinal measures for reliable iris recognition is still an unsolved
problem. In earlier methods, a di-lobe and a tri-lobe ordinal filter were jointly used for iris feature
extraction. The parameter settings of these ordinal filters are hand-crafted and they are applied
to all iris image regions. However, texture characteristics such as the scale, orientation and
salient texture primitives of iris patterns vary from region to region, so it is a better solution to
employ region-specific ordinal filters for iris feature analysis.
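A multi-lobe ordinal filter of the kind discussed here can be sketched as a zero-sum combination of Gaussian lobes; the exact lobe parameterization below is an assumption for illustration, not the authors' design:

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_lobe(shape, center, sigma):
    """One isotropic 2-D Gaussian lobe centered at (row, col), normalized to sum 1."""
    r, c = np.indices(shape)
    g = np.exp(-((r - center[0]) ** 2 + (c - center[1]) ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def trilobe_filter(shape, centers, sigmas):
    """Tri-lobe ordinal filter: one positive lobe balanced by two half-weight
    negative lobes, so the kernel sums to zero (robust to illumination offset)."""
    pos = gaussian_lobe(shape, centers[0], sigmas[0])
    neg = 0.5 * (gaussian_lobe(shape, centers[1], sigmas[1])
                 + gaussian_lobe(shape, centers[2], sigmas[2]))
    return pos - neg

def ordinal_code(image, kernel):
    """One-bit ordinal measure per pixel: the sign of the filter response."""
    return (convolve2d(image, kernel, mode="same") >= 0).astype(np.uint8)

# Toy usage: three horizontally arranged lobes applied to a step image.
k = trilobe_filter((15, 15), centers=[(7, 3), (7, 7), (7, 11)],
                   sigmas=[1.5, 1.5, 1.5])
img = np.zeros((15, 15))
img[:, :7] = 1.0
code = ordinal_code(img, k)
```

Varying the lobe centers, scales and arrangement per region is what a region-specific filter bank would amount to.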
It should be noted that the process of ordinal feature selection does not consider the prior mask
information of eyelids, eyelashes and specular reflections. There are mainly two strategies to
deal with the occlusion problem in iris recognition. The first is to segment and exclude occluded
regions in iris images and label them with a mask during iris matching. But this needs accurate and
efficient iris segmentation; in addition, the size of the iris template doubles, and, more
importantly, the computational cost of both iris image preprocessing and iris matching is
significantly increased by the mask strategy. So it is more realistic to identify and
exclude heavily occluded iris images in the quality assessment stage. The remaining iris images
used for feature extraction and matching are less occluded by eyelids and eyelashes, which
benefits both the accuracy and efficiency of iris recognition. This paper aims to learn a common
ordinal feature set applicable to the less occluded iris images of all subjects. The feature
selection process is independent of any individual or image-specific prior information such as an iris
segmentation mask. We believe the commonly selected feature set should be accurate enough to
recognize almost all subjects because individual and sample-specific variations have already
been taken into consideration during feature selection. We have also tried to integrate the occlusion
mask into feature selection and feature matching, but observed no improvement of accuracy on
state-of-the-art iris image databases, which usually exclude heavily occluded iris images. We believe the
common ordinal features discovered in this paper are valuable for practical iris recognition
systems.
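The mask-based strategy discussed above is commonly implemented as a fractional Hamming distance restricted to bits valid in both templates; a minimal sketch (the doubling of template size comes from storing one mask bit per code bit):

```python
import numpy as np

def masked_hamming(code_a, mask_a, code_b, mask_b):
    """Fractional Hamming distance over bits valid in both templates.
    Each template carries a mask of the same length as its code, which is
    why the stored template size doubles under this strategy."""
    valid = mask_a & mask_b
    n = valid.sum()
    if n == 0:
        return 1.0  # no comparable bits: treat as maximally dissimilar
    return np.count_nonzero((code_a ^ code_b) & valid) / n
```

Masking out an occluded bit changes both the numerator and the denominator, so the distance stays a fraction of the bits actually compared.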
Iris texture varies from region to region in terms of scale, orientation, shape of texture
is equal to the L1 regularization term in the Lasso algorithm. We argue that it is better to incorporate
the prior information of each ordinal feature unit into the objective function to enforce the priority
of well-performing ordinal feature units in the training dataset. In the experiment on CASIA-Iris-Thousand,
four options for P (i.e., P_i = 1/D, P_i = 1/D-index(OM_i), P_i = AUC(OM_i), P_i =
EER(OM_i)) are tried to learn different ordinal feature sets for iris recognition. The testing
results of these four settings of the parameter P_i are shown in Fig. 5. It is obvious that the best iris
recognition result is achieved when P_i = 1/D-index(OM_i), which indicates that the discriminating
index is the most important prior information for each ordinal feature unit. The results also
demonstrate that incorporating discriminative penalty terms such as EER and AUC into the feature
learning module can significantly improve biometric recognition accuracy.
Comparison results of the five feature selection methods and state-of-the-art iris recognition
methods on the testing dataset of CASIA-Iris-Thousand are shown in Fig. 6 and Table I. The
baseline performance based on Random-OM is also listed in Table I.
It is interesting to investigate the sparsity property of Lasso and LP. The results show that
linear programming achieves a much sparser training result, i.e., 26 non-zero
components (LP) vs. 500 non-zero components (Lasso). Therefore LP is advantageous over
Lasso in achieving a much more compact feature representation for iris biometrics. Some
typical ordinal feature units selected by mRMR, LP, Lasso and Boosting are
illustrated in Fig. 8 (the results of ReliefF are not shown here because it performs much
worse than the other feature selection methods). A number of conclusions can be drawn from
the visualization of the feature selection results.
1) The lower part of the iris region adjacent to the pupil is the most effective for iris
recognition, because these regions are rich in iris texture information and have a much
smaller probability of being occluded by eyelids and eyelashes.
2) Both di-lobe and tri-lobe filters are selected, so they are complementary for iris
recognition. The orientation of most ordinal filters is horizontal because iris
texture is mainly distributed along the circular direction in iris images, i.e., the horizontal
orientation in the normalized format.
3) There exist some differences among the four feature selection methods (mRMR, LP,
It should be noted that the palmprint images of PolyU 1.0 are derived from a small part of the
images in PolyU 2.0, so there may be correlation or overlap between PolyU 1.0 and PolyU 2.0.
It is usually suggested to use independent training and testing datasets in pattern recognition
experiments. However, this paper still uses PolyU 1.0 for training and PolyU 2.0 for testing for
the following reasons.
Almost no public palmprint database, including PolyU and CASIA, provides a division into
training and testing sets as in face biometrics, so most palmprint recognition researchers
report the best results tuned on the whole database. We think it is fair to
compare our methods with state-of-the-art palmprint recognition methods considering that PolyU
1.0 is related to only 7.7% of the palmprint images of PolyU 2.0, and it is better to report palmprint
recognition accuracy on the full PolyU 2.0 for performance evaluation against the existing
methods.
Our previous work has demonstrated that it is easy to achieve 100% accuracy on PolyU 1.0 for
both the competitive code and the ordinal code. So the performance of state-of-the-art palmprint
recognition methods on the independent version of PolyU 2.0 (excluding all images related to
PolyU 1.0) can be measured and compared with the testing results on PolyU 2.0.
The generalization capability of LP-OM will be demonstrated on the CASIA database using
the ordinal features trained on PolyU 1.0, so it is unnecessary to emphasize the independence
Firstly, 5,000 tri-lobe ordinal filters are generated with random parameter settings of location,
scale, and orientation, and tested on the training dataset. The top 500 tri-lobe ordinal filters
with the smallest EER are selected as the candidate feature pool; some of them are shown in
Fig. 9b. We can see that these ordinal filters are significantly different from the filters used
for iris recognition. The proposed linear programming method is then used to select the top 5
ordinal filters, as shown in Fig. 11a. The experimental results on the testing dataset show that
the first two tri-lobe ordinal filters alone achieve state-of-the-art palmprint recognition
performance. It is a grand challenge to search the huge parameter space for the optimal
parameter setting of tri-lobe ordinal filters for palmprint recognition, because the design of a
tri-lobe ordinal filter involves 15 variables in total. Although the top 2 tri-lobe ordinal filters
selected from the random filter pool are good enough for palmprint recognition, the candidate
feature pool only has 500 tri-lobe ordinal filters, and it is possible that better tri-lobe ordinal
filters exist outside the pool. Therefore we further generate more tri-lobe ordinal filters based
on the basic profiles of the top 2 filters, by varying the scale and location parameters of their
lobes. The newly generated tri-lobe ordinal filters are used to train a better palmprint recognition
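The random search and pruning procedure of this section might be sketched as follows; the 15-variable parameterization (three lobes, each with a location, two scales and an orientation) and the EER evaluator are stand-in assumptions:

```python
import random

def random_trilobe_params(height, width, rng=random):
    """Sample the 15 design variables of one tri-lobe ordinal filter:
    for each of the 3 lobes, a location (row, col), two scales and an
    orientation. The exact parameterization here is an assumption."""
    return [
        dict(row=rng.uniform(0, height), col=rng.uniform(0, width),
             sx=rng.uniform(1.0, 8.0), sy=rng.uniform(1.0, 8.0),
             theta=rng.uniform(0.0, 3.141592653589793))
        for _ in range(3)
    ]

def build_pool(eer_of, n_random=5000, n_keep=500, seed=0):
    """Generate n_random random filters and keep the n_keep with the
    smallest EER, forming the candidate pool for the LP selection stage."""
    rng = random.Random(seed)
    pool = [random_trilobe_params(64, 512, rng) for _ in range(n_random)]
    return sorted(pool, key=eer_of)[:n_keep]
```

In practice `eer_of` would evaluate each filter's Equal Error Rate on the training matching pairs; the subsequent local refinement step would re-sample only around the best filters' parameters.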
VI. CONCLUSIONS
The authors have proposed a novel feature selection method based on linear programming to learn
the most effective ordinal features for iris and palmprint recognition. Due to the
incorporation of the large margin principle and weighted sparsity into the LP formulation,
the LP feature selection method is highly effective. The LP-based feature selection model is
flexible enough to integrate prior information about each feature unit, such as DI, EER and AUC,
into the optimization procedure. The experimental results have
demonstrated that the proposed LP feature selection method outperforms mRMR, ReliefF,
Boosting and Lasso.
A number of conclusions can be drawn from the study.
The identity information of visual biometric patterns comes from the unique structure of
ordinal measures. The optimal parameter settings of local ordinal descriptors vary from
biometric modality to modality, subject to subject and even region to region. So it is
impossible to develop a common set of ordinal filters that achieves the best performance for all
visual biometric patterns. Ideally, it would be better to select the optimal ordinal filters to encode
individually specific ordinal measures via machine learning. However, such a personalized
solution is inefficient in large-scale personal identification applications. So this paper turns to
a suboptimal solution: learning a common ordinal feature set for each biometric
modality, which is expected to work well for most subjects.
A main contribution of this paper is a novel optimization formulation for feature selection
based on linear programming (LP). Our expectations of the feature selection
results, i.e., an accurate and sparse ordinal feature set, can be described by a linear objective
function. Such a linear learning model has three advantages. Firstly, the feature selection model
is simple to build, understand, learn and explain. Secondly, the linear penalty term is
robust against outliers. Thirdly, a linear model needs only a small number of training samples to
achieve a globally optimal result with good generalization ability.
Weighted sparsity is proposed in this paper, and the results show that it performs better than
traditional sparse representation methods. So it is better to incorporate the prior information of
candidate features into the optimization model in sparse learning.
REFERENCES
[1] T. Tan and Z. Sun, "Ordinal representations for biometrics recognition," in Proc. 15th Eur.
Signal Process. Conf., 2007, pp. 35-39.
[2] Z. Sun and T. Tan, "Ordinal measures for iris recognition," IEEE Trans. Pattern Anal. Mach.
Intell., vol. 31, no. 12, pp. 2211-2226, Dec. 2009.
[3] Z. Sun, T. Tan, Y. Wang, and S. Z. Li, "Ordinal palmprint representation for personal
identification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), vol. 1, 2005,
pp. 279-284.
[4] P. Viola and M. Jones, "Robust real-time face detection," Int. J. Comput. Vis., vol. 57, no. 2,
pp. 137-154, May 2004.
[5] PolyU Palmprint Database [Online]. Available: http://www.comp.polyu.edu.hk/~biometrics/
[6] S. Z. Li, R. Chu, S. Liao, and L. Zhang, "Illumination invariant face recognition using
near-infrared images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 4, pp. 627-639,
Apr. 2007.
[7] CASIA Iris Image Database [Online]. Available: http://biometrics.idealtest.org