
Chapter 1 Introduction to Face Recognition

1.1 Image Processing

An image is a single picture which represents something. It may be a picture of a person, of people or animals, of an outdoor scene, a microphotograph of an electronic component, or the result of medical imaging. In electrical engineering and computer science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame, and the output may be either an image or a set of characteristics (or parameters) related to the image. Image processing involves changing the nature of an image in order to improve its pictorial information for human interpretation and further application, or to render it more suitable for autonomous machine perception. Digital image processing is the use of computer algorithms to perform processing on digital images. A digital image can be considered as a large array of points sampled from the continuous image, each of which has a particular quantized brightness; these points are called pixels. The pixels surrounding a given pixel constitute its neighborhood. Digital image processing allows the use of much more complex algorithms and the implementation of methods that are almost impossible by analog means, and its techniques are now used to solve a variety of problems by analyzing and manipulating images with a computer. Image processing generally involves three major steps:

(i) Import an image with an optical scanner or directly through digital photography.

(ii) Manipulate or analyze the image in some way. This stage can include image enhancement and data compression, or the image may be analyzed to find patterns that are not visible to the human eye. For example, meteorologists use image processing to analyze satellite photographs.

(iii) Output the result. The result might be the image altered in some way, or it might be a report based on analysis of the image.

The technique of digital image processing can be described in terms of some basic steps. The fundamental steps of digital image processing are: (1) acquisition, (2) preprocessing, (3) segmentation, (4) representation and description, (5) recognition, and (6) knowledge about the

problem domain. A pictorial view of digital image processing can be given with the help of a block diagram; this diagram is shown in Fig. 1 [4].
[Fig. 1 block diagram: the problem domain feeds image acquisition, followed by preprocessing, segmentation, and representation and description, ending in recognition and interpretation, which produces the result; the knowledge base supports all stages.]
Fig. 1. Block Diagram of Digital Image Processing [4]

Brief overview of the fundamental steps of digital image processing: In image acquisition, the digital image is acquired by one method or another. Preprocessing improves the digital image through noise reduction, contrast enhancement, etc. The next stage, segmentation, deals with partitioning an input digital image into its constituent parts or objects. Representation transforms the raw data into a form suitable for further processing. Description, which is also called the feature selection stage, deals with extracting different features from a digital image. Recognition and interpretation is the process that assigns a label/name to all the features extracted by the descriptors in the previous stage and then assembles all the features together. Knowledge about the problem domain is coded in the form of a knowledge database.

1.2 Digital Image Fundamentals

There are various essential points regarding digital images; some of them are described below.

1.2.1 Neighbors of a pixel

A pixel p at coordinates (x, y) has four horizontal and vertical neighbors whose coordinates are given by (x+1, y), (x-1, y), (x, y+1), (x, y-1). This set of pixels, called the

4-neighbors of p, is denoted by N4(p). Each pixel is a unit distance from (x, y), and some of the neighbors of p lie outside the digital image if (x, y) is on the border of the image. The four diagonal neighbors of p have coordinates (x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1) and are denoted by ND(p). These points, together with the 4-neighbors, are called the 8-neighbors of p, denoted by N8(p).

1.2.2 Connectivity and Adjacency

Connectivity between pixels is a fundamental concept that simplifies the definition of numerous digital image concepts, such as regions and boundaries. Pixels are said to be connected if they are neighbors and their gray levels satisfy a specified criterion of similarity. Adjacency can be considered to be of three types:

a) 4-adjacency: Two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).

b) 8-adjacency: Two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).

c) m-adjacency: Two pixels p and q with values from V are m-adjacent if q is in N4(p), or if q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values are from V.

1.3 Image Segmentation

For some applications, such as image recognition or compression, we cannot process the whole image directly, because doing so is inefficient and impractical. Image segmentation is the initial stage of any recognition process, in which we subdivide the image into meaningful segments. Segmentation should stop when the objects of interest in an application have been isolated. Segmentation approaches are broadly divided into two main categories, namely thresholding and edge-based methods.

1.3.1 Thresholding

During the thresholding process, individual pixels in an image are marked as "object" pixels if their value is greater than some threshold value (assuming an object to be brighter than the background) and as "background" pixels otherwise. This convention is known as threshold above. Variants include threshold below, which is the opposite of threshold above; threshold inside, where a pixel is labeled "object" if its value is between two thresholds; and threshold outside, which is the opposite of threshold inside. Typically, an object pixel is given a value of 1 while a background pixel is given a value of 0. Finally, a binary image is created by coloring each pixel white or black, depending on the pixel's label.
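As a minimal sketch of the threshold-above convention just described, the snippet below marks pixels brighter than a threshold as object (1) and the rest as background (0); the random test image and the choice of the image mean as the threshold are assumptions of this example, not something the text prescribes.

```python
import numpy as np

def threshold_above(img, t):
    """'Threshold above' convention: pixels brighter than t become object (1),
    all others become background (0), yielding a binary image."""
    return (img > t).astype(np.uint8)

# Hypothetical 8-bit image; the mean is one simple global threshold choice
# (threshold selection is discussed further below).
img = np.random.randint(0, 256, size=(64, 64))
binary = threshold_above(img, img.mean())
```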

Thresholding techniques can be classified as employing either global or local methods. A global thresholding technique thresholds the entire image with a single threshold value, whereas a local thresholding technique partitions a given image into sub-images and determines a threshold for each of these sub-images. The key parameter in the thresholding process is the choice of the threshold value (or values, as mentioned earlier). Different methods for choosing a threshold exist: users can manually choose a threshold value, or a thresholding algorithm can compute a value automatically, which is known as automatic thresholding (Shapiro et al. 2001:83). A simple method is to choose the mean or median, the rationale being that if the object pixels are brighter than the background, they should also be brighter than the average. For a noiseless image with uniform background and object values, the mean or median works well as the threshold; however, this will generally not be the case. A more sophisticated approach is to create a histogram of the image pixel intensities and use the valley point as the threshold. The histogram approach assumes that the background and object pixels cluster around average values, with the actual pixel values varying around these averages. The histogram approach is computationally expensive, and image histograms may not have clearly defined valley points, often making the selection of an accurate threshold difficult.

1.3.2 Edge-based segmentation

Edge detection in particular is a staple of segmentation algorithms for detecting meaningful discontinuities in a gray-level image. An edge is a set of connected pixels that lie on the boundary between two regions. The term edge segment is generally used if the edge is short in relation to the dimensions of the image. The classical approach to edge-based segmentation begins with edge enhancement, which makes use of digital versions of standard finite-difference operators, as in first-order gradient operators or the second-order Laplacian operator.
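To make the edge-enhancement step concrete, here is a small sketch of the first-order (Sobel gradient) and second-order (Laplacian) operators mentioned above, assuming SciPy's ndimage module is available.

```python
import numpy as np
from scipy import ndimage

def gradient_magnitude(img):
    """First-order edge enhancement: Sobel finite-difference approximation
    of the gray-level gradient, combined across both axes."""
    gx = ndimage.sobel(img, axis=1, output=float)
    gy = ndimage.sobel(img, axis=0, output=float)
    return np.hypot(gx, gy)

def laplacian(img):
    """Second-order edge enhancement: discrete Laplacian operator."""
    return ndimage.laplace(img, output=float)
```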

1.4 Image Representation

After segmenting an image into regions, the objective is to represent and describe the resulting aggregate of segmented pixels in a form suitable for further computer processing. There are two choices for representing a region: external characteristics (its boundary) and internal characteristics (the pixels comprising the region). For example, a region may be represented by (a) its boundary, with the boundary described by features such as its length, (b) the orientation of the straight line joining its extreme points, and (c) the number of concavities in the boundary. An external representation is chosen when the primary focus is on shape characteristics; an internal representation is selected when the primary focus is on reflectivity properties, such as color and texture. Segmentation techniques yield raw data in the form of pixels along a boundary or pixels contained in a region. Although these data are sometimes used directly to obtain descriptors (as in determining the texture of a region), standard practice is to use schemes that compact the data into representations that are considerably more useful in the computation of descriptors. This section introduces some basic representation schemes for this purpose.

1.4.1 Chain codes

Chain codes represent a boundary by a connected sequence of straight-line segments of specified length and direction, where the direction of each segment is coded using a numbering scheme. Applying chain codes directly to pixels is generally unacceptable: the resulting chain of codes is usually quite long, and it is sensitive to noise, since any small disturbance along the boundary owing to noise or imperfect segmentation causes changes in the code that may not be related to the shape of the boundary. A frequently used method to solve this problem is to resample the boundary by selecting a larger grid spacing.
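A minimal sketch of an 8-direction Freeman chain code in the spirit of Section 1.4.1 follows; the direction numbering (0 = east, increasing counter-clockwise) is one common convention and is an assumption of this example.

```python
# 8-direction Freeman chain code: code k points from a boundary pixel
# to its neighbor at offset OFFSETS[k] (0 = east, counter-clockwise).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def chain_code(boundary):
    """boundary: ordered list of (row, col) pixels along a closed boundary."""
    code = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:] + boundary[:1]):
        code.append(OFFSETS.index((r1 - r0, c1 - c0)))
    return code

# A 2x2 square traversed clockwise starting at the top-left pixel
print(chain_code([(0, 0), (0, 1), (1, 1), (1, 0)]))  # [0, 6, 4, 2]
```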

1.4.2 Polygonal approximation

The objective is to capture the essence of the boundary shape with the fewest possible polygonal segments. This problem is in general not trivial and can quickly turn into a time-consuming iterative search.

(i) Minimum-perimeter polygons: A given boundary is enclosed by cells. We can visualize this enclosure as consisting of two walls corresponding to the inside and outside boundaries of the cells. If the boundary is treated as a rubber band, it will shrink to the shape of the minimum-perimeter polygon. The error in each cell would be at most 2d, where d is the pixel distance.

(ii) Merging technique: Merging techniques based on error or other criteria have been applied to the problem of polygonal approximation. One approach is to merge points along a boundary until the least-squares error of the line fit to the points merged so far exceeds a preset threshold. A drawback is that the resulting vertices do not always correspond to corners of the boundary.

1.5 Image Analysis

Image analysis is the process of discovering, identifying, and understanding patterns that are relevant to the performance of an image-based task. One of the principal goals of image analysis by computer is to provide a machine with a capability that approximates human visual analysis; thus an automated image analysis system should be capable of exhibiting various degrees of intelligence. It is conceptually useful to divide the spectrum of techniques in image analysis into three basic areas: (1) low-level processing, (2) intermediate-level processing, and (3) high-level processing. Although these subdivisions have no definite boundaries, they provide a useful framework for categorizing the various processes that are inherent components of an automated image analysis system. Low-level processing deals with functions that may be viewed as automatic reactions, requiring no intelligence on the part of the image analysis system; image acquisition and pre-processing are treated as low-level functions. Intermediate-level processing deals with the task of extracting and characterizing components in an image resulting from a low-level process; intermediate-level processes encompass segmentation and description (feature extraction). Finally, high-level processing involves recognition and interpretation. From 1964 until the present, the field of image processing has grown vigorously, and digital image processing techniques are now used to solve a variety of problems. In medicine, for instance, computer procedures enhance the contrast of, or code the intensity levels into color in, X-rays and other biomedical images for easier interpretation. Similarly successful applications of image processing concepts can be found in astronomy, biology, nuclear medicine, law enforcement, defense, and industrial applications. Typical problems in machine perception that routinely utilize image processing techniques are automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, screening of X-rays and blood samples, and machine processing of aerial and satellite imagery for weather prediction and crop assessment [1], [2], [3].


Chapter 2 Pattern Recognition and Face Recognition


2.1 Introduction to Pattern Recognition

Informally, a pattern is defined by the common denominator among the multiple instances of an entity. For example, the commonality in all fingerprint images defines the fingerprint pattern. Figure 2 shows several fingerprint impressions of the same finger.

Fig. 2. Six fingerprint patterns of the same finger of the same person [4]

A pattern is an entity, vaguely defined, that could be given a name, e.g. a fingerprint image, a handwritten word, a human face, a speech signal. Pattern recognition aims to make the process of learning and detecting patterns explicit, such that it can partially or entirely be implemented on computers. Automatic (machine) recognition, description, and

classification (grouping of patterns into pattern classes) have become important problems in a variety of engineering and scientific disciplines such as biology, psychology, medicine, marketing, computer vision, artificial intelligence, and remote sensing. In almost any area of science in which observations are studied but the underlying mathematical or statistical models are not available, pattern recognition can be used to support human concept acquisition or decision making. Hence, it can be said that pattern recognition is the study of how machines (in place of humans) can observe the environment, learn to distinguish patterns of interest, and make sound and reasonable decisions about the categories of the patterns [19]. The design of a pattern recognition system essentially involves four major aspects. A block diagram representing the various stages of pattern recognition is shown in Fig. 3.
[Fig. 3 block diagram: data acquisition and preprocessing → feature extraction → classification, with a training stage.]

Fig. 3. Block diagram of pattern classification


According to the block diagram, the various steps of pattern recognition are described briefly as follows:

(i) Data acquisition and pre-processing: Different tasks can be performed at this stage, such as segmentation of the object under study, noise removal, normalization, etc., to pre-process the pattern and obtain usable data from it.

(ii) Feature extraction/selection: In this step, the data is reduced and represented with different qualitative or quantitative features or characteristics. Although feature selection and feature extraction are most of the time considered equivalent, extraction refers to creating new features using combinations or transformations of the existing feature set, whereas selection refers to selecting a subset of the available features without transformations.

(iii) Classification and training: At this stage, the features are analyzed to determine to what class or category the pattern belongs. The features should discriminate among the different classes while being similar for patterns of the same class. Given a group of objects, there are two ways to build a classification or recognition system (Watanabe 1985): supervised, i.e., with a teacher, or unsupervised, without the help of a teacher; see Figure 4(a, b).


Fig. 4. (a) Supervised pattern recognition deals with classifying objects with known labels. (b) In unsupervised pattern recognition, classes or subclasses have to be derived from the data.

Interest in pattern recognition has been renewed recently due to emerging applications which are not only challenging but also computationally more demanding, such as those in the manufacturing industry, health care, and the military. Examples include optical character recognition, automatic speech recognition, recognition of objects on earth from the sky (by satellites), and distinguishing normal cells from cancerous cells. Personal identification systems that use biometrics are very important for security applications in airports, ATMs, shops, hotels, and secure computer access. Recognition can be based on the


face, fingerprint, iris, or voice, and can be combined with the automatic verification of signatures and PIN codes [4].

2.2 Face Recognition

Face Recognition is an application of pattern recognition performed specifically on faces, and it is one of the most successful applications of image analysis and understanding. A Face Recognition system automatically identifies or verifies a person's face from a given database of digital images or video frames. One way to do this is to select facial features from a given image and then compare them against a facial database [5]. Face Recognition has received substantial attention from researchers in the past several years because of its wide application in the commercial, security, and forensic fields. Facial recognition technology (FRT) has emerged as an attractive solution to address many contemporary needs for identification and for the verification of identity claims of unknown faces. FRT has proven effective, with relatively small populations in controlled environments, for the verification of identity claims, in which an image of an individual's face is matched to a pre-existing image on file associated with the claimed identity (the verification task). Face Recognition is an effective biometric approach that employs automated methods to verify or recognize the identity of a living person based on his/her face, and it is regarded as a better approach than other biometric approaches [6]. Face Recognition systems can be classified into two types: (i) face verification systems and (ii) face identification systems.

(i) Face verification ("Am I who I say I am?") is a one-to-one match that compares a query face image against a template face image whose identity is being claimed. To evaluate verification performance, the verification rate (the rate at which legitimate users are granted access) is plotted against the false accept rate (the rate at which imposters are granted access).

(ii) Face identification ("Who am I?") is a one-to-many matching process that compares a query face image against all the template images in a face database to determine the identity of the query face. The identification of the test image is done by locating the image in the database that has the highest similarity with the test image. The identification process is a closed test, which means the sensor takes an observation of an individual that is known to be in the database. The test subject's (normalized) features are compared to the other features in the system's database and a similarity score is found for each comparison. These

similarity scores are then numerically ranked in descending order. The percentage of times that the highest similarity score is the correct match for all individuals is referred to as the top match score.

2.2.1 A Typical Face Recognition System

A typical Face Recognition system has six main functional blocks which explain its working strategy. Fig. 5 represents the outline of a typical Face Recognition system. This outline carries the main characteristics of a typical pattern recognition system.
[Fig. 5 block diagram: a face image enters the acquisition module and is normalized by the pre-processing module; the normalized face image passes to the feature extractor, and the classifier compares the extracted features against the face database and its training sets, classifying the face as known or unknown.]

Fig. 5. Outline of a Typical Face Recognition System

A brief explanation of the six main functional blocks is given below:

1) Acquisition module: The first step in the facial recognition process is the acquisition of a face image. The acquisition module obtains a face image from any of several different sources: this is normally done using a still or video camera, or from a file located on a magnetic disk. Images can be captured by a frame grabber, or scanned from paper with the help of a scanner.

2) Pre-processing module: In this module, by means of early vision techniques, face images are normalized and, if desired, enhanced to improve the recognition performance of the system. Some or all of the following pre-processing steps may be implemented:

Image size normalization. This is usually done to change the acquired image size to the size on which the particular Face Recognition system operates.

Histogram equalization. Images that are too dark or too bright undergo this process in order to enhance image quality, which in turn improves Face Recognition performance. It modifies the dynamic range (contrast range) of the image and, as a result, some important facial features become more apparent.

Median filtering. For noisy images, especially those obtained from a camera, median filtering can clean the image without losing information.

High-pass filtering. Feature extractors that are based on facial outlines may benefit from the results obtained from an edge detection scheme. High-pass filtering emphasizes the details of an image, such as contours, which can dramatically improve edge detection performance.
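As a concrete illustration of the steps above, here is a minimal pre-processing sketch covering size normalization, histogram equalization, and median filtering, assuming OpenCV (cv2) is available and the input is an 8-bit grayscale image; the 92x112 target size is an assumption of this example (ORL-style), not something the text prescribes.

```python
import cv2  # assumption: OpenCV is available

def preprocess(gray, target_size=(92, 112)):
    """Pre-processing sketch: size normalization, histogram equalization,
    and median filtering. target_size is (width, height); use whatever
    size the recognition system operates on."""
    face = cv2.resize(gray, target_size)   # image size normalization
    face = cv2.equalizeHist(face)          # spread the contrast range
    face = cv2.medianBlur(face, 3)         # remove impulsive noise
    return face
```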

3) Feature extraction module: After performing some pre-processing (if necessary), the normalized face image is presented to the feature extraction module in order to find the key features that are going to be used for classification. In other words, this module is responsible for composing a feature vector that represents the face image well enough.

4) Classification module: With the help of a face classifier, the extracted features of the face are compared with the features stored in a face library (or face database). After this, known faces can be recognized from the database.

5) Training set: Training sets are used during the "learning phase" of the Face Recognition process in supervised face classifiers. The feature extraction and classification modules adjust their parameters in order to achieve optimum

recognition performance by making use of training sets.

6) Face library or face database: After being classified as "unknown", face images can be added to the library (or database) together with their feature vectors for later comparisons. The classification module makes direct use of the face library.

2.3 Feature Extraction

Features are the basic elements of any pattern recognition task, and feature extraction is one of the challenging parts of a Face Recognition system. The goal is to find the set of numerical features that best represents the data: if the features are representative and discriminative among the classes, the work of the classifier becomes easier. Furthermore, it is desirable that these features be invariant to translation, rotation,

and size changes of the objects to be classified. Hence, feature selection is a critical point, as it depends entirely on the characteristics of the patterns to recognize, whereas the classifiers are standardized tools. Another objective of feature extraction is dimensionality reduction of the data, which consequently makes the work of the classifier easier. As reviewed from the literature, facial feature extraction can be classified into the following categories [7]:

(i) Geometry-based techniques: Features are extracted using geometric information such as the relative positions and sizes of the face components. In the technique proposed by Kanade [8], the eyes, the mouth, and the nose base are localized using the vertical edge map. These techniques require thresholds which, given the prevailing sensitivity, may adversely affect the achieved performance.

(ii) Template-based techniques: These techniques match facial components to previously designed templates using an appropriate energy functional. In the approach proposed by Yuille et al. [9], the best match of a template in the facial image yields the minimum energy. These algorithms require a priori template modeling, in addition to their computational costs, which clearly affects their performance. Genetic algorithms have been proposed to achieve more efficient searching times in template matching.

(iii) Color segmentation techniques: Color segmentation techniques make use of skin color to isolate the face. Any non-skin-color region within the face is viewed as a candidate for the eyes and/or the mouth. The performance of such techniques on facial image databases is rather limited, due to the diversity of ethnic backgrounds [10].

(iv) Appearance-based approaches: The concept of a feature in these approaches differs from simple facial features such as the eyes and mouth: any characteristic extracted from the image is referred to as a feature. This approach is further classified as linear or non-linear. Methods such as principal component analysis (PCA), independent component analysis, and LDA fall under the linear category, while KPCA and others fall under the non-linear category; these methods are used to extract the feature vector [11].


2.4 Methods of Face Recognition

Under the four feature extraction techniques mentioned above there are different methods of feature extraction. According to the literature survey, some of the frequently used feature extraction methods are: (i) PCA, (ii) LDA, (iii) SOM, and (iv) ICA. Brief theory explaining these techniques is given below.

2.4.1 Principal Component Analysis (PCA)

PCA was invented in 1901 by Karl Pearson. It is a standard technique used in statistical pattern recognition and signal processing for data reduction and feature extraction. PCA is a dimensionality reduction technique based on extracting the desired number of principal components of multi-dimensional data. The purpose of PCA is to reduce the large dimensionality of the data space (observed variables) to the smaller intrinsic dimensionality of the feature space (independent variables) needed to describe the data economically. This is the case when there is a strong correlation between the observed variables. PCA was successfully used in Face Recognition systems in 1988 by L. Sirovich and M. Kirby [12]. A PCA-based Face Recognition system seeks to capture the variation in a collection of face images and uses this information to encode and compare images of individual faces in a holistic manner. In mathematical terms, the eigenface method finds the principal components of the distribution of faces. First, the eigenvectors of the covariance matrix of the set of face images (which can be thought of as a set of features that together characterize the variation between face images) are found, and they are sorted according to their corresponding eigenvalues. A threshold eigenvalue is then chosen, and eigenvectors with eigenvalues less than that threshold are discarded, so that ultimately only the eigenvectors having the most significant eigenvalues are retained. The set of face images is then projected onto the significant eigenvectors to obtain a set called eigenfaces; every face has a contribution to the eigenfaces obtained. The best M eigenfaces span an M-dimensional subspace called the face space. Each individual face can be represented exactly as a linear combination of the eigenfaces, or approximated using only the significant eigenfaces having the largest eigenvalues. The number of possible eigenfaces is equal to the number of face images in the training set; representing the faces by the best eigenfaces, whose eigenvalues account for the most variance within the set of face images, increases computational efficiency. The following steps are involved in the recognition process [13]:

(i) Initialization: The training set of face images is acquired and the eigenfaces, which define the face space, are calculated.

(ii) When a new face is encountered, a set of weights based on the input image and the M eigenfaces is calculated by projecting the input image onto each of the eigenfaces.

(iii) The image is determined to be a face or not by checking whether it is sufficiently close to the face space.

(iv) If it is a face, the weight pattern is classified as belonging to either a known person or an unknown one.
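The following sketch outlines the eigenface computation and a simple nearest-neighbor matching step in the spirit of the procedure just described, using the SVD of the mean-centered data to obtain the principal components; the array shapes and the minimum-distance rule are assumptions of this example.

```python
import numpy as np

def train_eigenfaces(faces, num_components):
    """faces: (n_images, n_pixels) array of flattened, same-size face images."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # SVD of the centered data: rows of Vt are the principal directions (eigenfaces)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = Vt[:num_components]      # keep the M most significant components
    weights = centered @ eigenfaces.T     # project training faces into face space
    return mean_face, eigenfaces, weights

def recognize(query, mean_face, eigenfaces, weights, labels):
    """Project a query face into face space; return the nearest training label."""
    w = (query - mean_face) @ eigenfaces.T
    distances = np.linalg.norm(weights - w, axis=1)
    return labels[int(np.argmin(distances))], distances.min()
```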

2.4.2 Linear Discriminant Analysis (LDA)

LDA is used in statistics, pattern recognition, and machine learning to find a linear combination of features which characterizes or separates two or more classes of objects or patterns. The resulting combination may be further used for dimensionality reduction before later classification. LDA applies when the measurements made on the independent variables for each observation are continuous quantities. LDA is closely related to principal component analysis (PCA) and factor analysis in that they all look for linear combinations of variables which best explain the data. LDA explicitly attempts to model the difference between the classes of data; PCA, on the other hand, does not take into account any difference in class, and factor analysis builds the feature combinations based on differences rather than similarities. Discriminant analysis also differs from factor analysis in that it is not an interdependence technique: a distinction between independent variables and dependent variables (also called criterion variables) must be made [14].
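As a small illustration of the discriminant idea behind LDA, here is a two-class Fisher discriminant sketch: the projection direction maximizes between-class separation relative to within-class scatter, w = Sw^(-1)(m1 - m2). The two-class restriction and the input array shapes are assumptions of this example.

```python
import numpy as np

def fisher_direction(x1, x2):
    """Two-class Fisher discriminant direction.
    x1, x2: (n_samples, dim) arrays for each class."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    # within-class scatter matrix: sum of per-class scatter
    sw = (np.cov(x1, rowvar=False) * (len(x1) - 1)
          + np.cov(x2, rowvar=False) * (len(x2) - 1))
    w = np.linalg.solve(sw, m1 - m2)   # w = Sw^{-1} (m1 - m2)
    return w / np.linalg.norm(w)
```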

2.4.3 Independent Component Analysis (ICA)

Independent Component Analysis aims to transform the data into linear combinations of statistically independent data points. Its goal is therefore to provide an independent rather than merely uncorrelated image representation. ICA is an alternative to PCA which provides a more powerful data representation [15]. It is a discriminant analysis criterion which can be used to enhance PCA.

2.4.4 Neural Networks

Neural networks can be viewed as massively parallel computing systems consisting of an extremely large number of simple processors with many interconnections. Neural network models attempt to use some organizational principles (such as learning, generalization, adaptivity, fault tolerance, distributed representation, and computation) in a network of weighted directed graphs, in which the nodes are artificial neurons and the directed edges (with weights) are connections between neuron outputs and neuron inputs [16]. The main characteristics of neural networks are that they have the ability to learn complex nonlinear input-output relationships, use sequential training procedures, and adapt themselves to the data. The increasing popularity of neural network models for solving pattern recognition problems has been primarily due to their seemingly low dependence on domain-specific knowledge (relative to model-based and rule-based approaches) and to the availability of efficient learning algorithms for practitioners to use. Neural networks provide a new suite of nonlinear algorithms for feature extraction (using hidden layers) and classification.

2.4.5 Self-Organizing Map (SOM)

A self-organizing map (SOM) or self-organizing feature map (SOFM) is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional) representation of the input space of the training samples. Self-organizing maps differ from other artificial neural networks in that they use a neighborhood function to preserve the topological properties of the input space. This makes SOMs useful for visualizing low-dimensional views of high-dimensional data. Like most artificial neural networks, SOMs operate in two modes: training and mapping. Training builds the map using input examples; it is a competitive process, also called vector quantization. Mapping automatically classifies a new input vector. A self-organizing map consists of components called nodes or neurons. Associated with each node is a weight vector of the same dimension as the input data vectors, and a position in the map space. The usual arrangement of nodes is a regular spacing in a hexagonal or rectangular grid. The self-organizing map thus describes a mapping from a higher-dimensional input space to a lower-dimensional map space. The procedure for placing a vector from data space onto the map is to first find the node with the weight vector closest to the vector taken from data space. Once the closest node is located, it is assigned the values from the vector taken from the data space [17].
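A minimal SOM training loop in the spirit of the description above is sketched below: for each input, the best-matching unit is found and a Gaussian neighborhood of nodes is pulled toward the input, with the learning rate and neighborhood radius decaying over time. The rectangular grid, linear decay schedules, and random initialization are assumptions of this sketch.

```python
import numpy as np

def train_som(data, grid_h, grid_w, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Minimal rectangular-grid SOM; data is (n_samples, dim)."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    sigma0 = sigma0 or max(grid_h, grid_w) / 2.0
    weights = rng.random((grid_h, grid_w, dim))   # one weight vector per node
    coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                                  indexing="ij"), axis=-1)
    t, t_max = 0, epochs * n
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = t / t_max
            lr = lr0 * (1 - frac)                  # decaying learning rate
            sigma = sigma0 * (1 - frac) + 1e-9     # shrinking neighborhood
            # best-matching unit: node whose weight vector is closest to x
            d = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # Gaussian neighborhood pulls nearby nodes toward the input
            g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1)
                       / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
            t += 1
    return weights
```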

2.5 Face Classification

Once the features are extracted and selected, the next step is to classify the image. Appearance-based Face Recognition algorithms use a wide variety of classification methods, and sometimes two or more classifiers are combined to achieve better results. Most model-based algorithms, on the other hand, match the samples against a model or template; a learning method can then be used to improve the algorithm. One way or another, classifiers have a big impact on Face Recognition. Classification methods are used in many areas like data mining, finance, signal decoding, voice recognition, natural language processing, and medicine. Classification algorithms usually involve some form of learning: supervised, unsupervised, or semi-supervised. Unsupervised learning is the most difficult approach, as there are no tagged examples. However, many Face Recognition applications include a tagged set of subjects; consequently, most Face Recognition systems implement supervised learning methods. There are also cases where the labeled data set is small, and sometimes the acquisition of new tagged samples can be infeasible; in such cases semi-supervised learning is required.

2.5.1 Classifiers

According to Jain, Duin and Mao [16], three concepts are key in building a classifier: similarity, probability, and decision boundaries. Classifiers are explained below from these points of view.

(i) Similarity

This approach is intuitive and simple: patterns that are similar should belong to the same class. It has been used in the Face Recognition algorithms implemented later. The idea is to establish a metric that defines similarity and a representation of the same-class samples. For example, the metric can be the Euclidean distance, and the representation of a class can be the mean vector of all the patterns belonging to that class. The 1-NN decision rule can be used with these parameters, and its classification performance is usually good. This approach is similar to a k-means clustering algorithm in unsupervised learning. Other techniques can also be used, for example Vector Quantization, Learning Vector Quantization, or Self-Organizing Maps (see Section 2.4.5). Another example of this approach is template matching. Researchers classify Face Recognition algorithms based on different criteria, and some publications define template matching as a category of Face Recognition algorithms [18]. However, template matching can be seen as just another classification method, where unlabeled samples are compared to stored patterns.
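The similarity-based rules just mentioned can be sketched in a few lines; the Euclidean metric and the array layout are assumptions of this example.

```python
import numpy as np

def nearest_mean_classifier(train_x, train_y):
    """Represent each class by the mean of its training vectors."""
    classes = np.unique(train_y)
    means = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    def predict(x):
        # assign to the class whose mean vector is closest in Euclidean distance
        return classes[np.argmin(np.linalg.norm(means - x, axis=1))]
    return predict

def one_nn(train_x, train_y, x):
    """1-NN rule: assign to the class of the single nearest training pattern."""
    return train_y[np.argmin(np.linalg.norm(train_x - x, axis=1))]
```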

(ii) Probability

Some classifiers are constructed based upon a probabilistic approach. The Bayes decision rule is often used, and the rule can be modified to take into account different factors that could lead to misclassification. Bayesian decision rules can give an optimal classifier, and the Bayes error can be the best criterion to evaluate features; therefore, a posteriori probability functions can be optimal. There are different Bayesian approaches. One is to define a Maximum a Posteriori (MAP) decision rule. Moghaddam et al. proposed in [19] an alternative to MAP: the maximum likelihood (ML) approach. They proposed a non-Euclidean similarity measure and two classes of facial image variations: differences between images of the same individual (intrapersonal variations) and variations between images of different individuals (extrapersonal variations). These algorithms estimate the densities instead of using the true densities, and the density estimates can be either parametric or non-parametric. Commonly used parametric models in Face Recognition are multivariate Gaussian distributions, as in [19]; two well-known non-parametric estimates are the k-NN rule and the Parzen classifier.

METHOD                          NOTES
Template matching               Assign sample to most similar template. Templates must be normalized.
Nearest mean                    Assign pattern to nearest class mean.
1-NN                            Assign pattern to the class of the nearest pattern.
k-NN                            Like 1-NN, but assign to the majority class of the k nearest patterns.
Self-Organizing Maps (SOM)      Assign pattern to nearest node, then update the nodes, pulling them closer to the input pattern.

Table 1.1: Similarity-based classifiers

(iii) Decision boundaries

This approach can become equivalent to a Bayesian classifier, depending on the chosen metric. The main idea behind this approach is to minimize a criterion (a measurement of error) between the candidate pattern and the testing patterns. One example is Fisher's Linear Discriminant (FLD and LDA are often used interchangeably), which is closely related to PCA. FLD attempts to model the difference between the classes of data and can be used to minimize the mean square error or the mean absolute error. Other algorithms use neural networks; the multilayer perceptron is one of them. They allow nonlinear decision boundaries. However, neural networks can be trained in many different ways, so they can lead to diverse classifiers. They can also provide a confidence in the classification, which can give an approximation of the posterior probabilities. Assuming the use of a Euclidean

distance criterion, the classifier could make use of all three classification concepts explained here. A special type of classifier is the decision tree. It is trained by an iterative selection of the individual features that are most salient at each node of the tree. During classification, only the features needed for classification are evaluated, so feature selection is implicitly built in. The decision boundary is built iteratively. See Table 1.2 for some decision-boundary-based methods, including the ones proposed in [19]:
METHOD                              NOTES
Fisher Linear Discriminant (FLD)    Linear classifier. Can use MSE optimization.
Binary Decision Tree                Nodes are features. Can use FLD. Could need pruning.
Perceptron                          Iterative optimization of a classifier (e.g. FLD).
Multi-layer Perceptron              Two or more layers. Uses sigmoid transfer functions.
Radial Basis Network                Optimization of a multi-layer perceptron. At least one layer uses Gaussian transfer functions.

Table 1.2: Classifiers using decision boundaries

2.5.2 Classifier combination

The classifier combination problem can be defined as the problem of finding a combination function that accepts N-dimensional score vectors from M classifiers and outputs N final classification scores [20]. There can be several reasons to combine classifiers in Face Recognition. The designer may have several classifiers, each developed with a different approach; for example, a classifier designed to recognize faces using eyebrow templates could be combined with another classifier that uses a different recognition approach, which could lead to better recognition performance. There can be different training sets, collected under different conditions and representing different features; each training set could be well suited to a certain classifier, and those classifiers could be combined. One single training set can show different results when different classifiers are used, and a combination of classifiers can be used to achieve the best results. Some classifiers also differ in their performance depending on certain initializations; instead of choosing one classifier, we can combine several of them. There are different combination schemes, which may differ from each other in their architectures and in the selection of the combiner. Combiners in pattern recognition usually use a fixed number of classifiers. This allows taking advantage of the strengths of each

classifier. The common approach is to design a certain function that weights each classifier's output score; there must then be a decision boundary to take a decision based on that function. Combination methods can also be grouped based on the stage at which they operate. A combiner could operate at the feature level: the features of all classifiers are combined to form a new feature vector, and then a new classification is made. The other option is to operate at the score level, as stated before; this approach separates the classification knowledge from the combiner, and these types of combiners are popular due to that abstraction level. However, combiners can differ depending on the nature of the classifiers' output. The output can be a simple class or group of classes (abstract information level), or a classifier could produce a more informative output by attaching a weight or confidence measure to each class (measurement level). If the combination involves very specialized classifiers, each of them usually has a different output, and combining different output scales and confidence measures can be a tricky problem; however, the outputs will be similar if all the classifiers use the same architecture. Combiners can be grouped into three categories according to their architecture:

Parallel: All classifiers are executed independently; the combiner is then applied.

Serial: Classifiers run one after another, each classifier refining the previous results.

Hierarchical: Classifiers are combined into a tree-like structure.

Combiner functions can be very simple or complex. A low-complexity combination could require only one function to be trained, whose input is the scores of a single class. The highest complexity can be achieved by defining multiple functions, one for each class, which take all scores as parameters, so that more information is used for the combination. Higher-complexity combiners can potentially produce better results, but the complexity level is limited by the amount of training samples and the computing time. Thus it is very important to choose a complexity level that best complies with these requirements and restrictions.
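A sketch of a simple parallel, score-level combiner in the sense described above follows: each classifier's N class scores are normalized to a common scale and fused with per-classifier weights. The min-max normalization and the weighting scheme are assumptions of this example.

```python
import numpy as np

def combine_scores(score_vectors, weights):
    """Score-level parallel combiner. score_vectors: (M classifiers, N classes);
    weights: one weight per classifier. Returns N fused scores."""
    score_vectors = np.asarray(score_vectors, dtype=float)
    weights = np.asarray(weights, dtype=float)[:, None]
    # normalize each classifier's scores to [0, 1] so different scales compare
    mins = score_vectors.min(axis=1, keepdims=True)
    maxs = score_vectors.max(axis=1, keepdims=True)
    normed = (score_vectors - mins) / (maxs - mins + 1e-12)
    return (weights * normed).sum(axis=0)   # decide with np.argmax(...)
```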

2.6 Face Recognition: Different Approaches

Face Recognition is an evolving area, changing and improving constantly. Many research areas affect Face Recognition: computer vision, optics, pattern recognition, neural networks, machine learning, and psychology. The previous sections explain the different steps of a Face Recognition process; however, these steps can overlap or change depending on the bibliography consulted, and there is no consensus in that regard. All these factors hinder the development of a unified Face Recognition algorithm classification scheme. This section explains the most cited classification criteria of Face Recognition [7]:

Geometric or feature-based approaches

Holistic approaches

Template/statistical/neural network approaches

2.6.1 Geometric or Feature-Based Approaches

Geometric feature-based methods analyze local facial features and their geometric relationships. This approach is sometimes called the feature-based approach [18]. Examples of this approach are some Elastic Bunch Graph Matching algorithms [21]. There are also algorithms developed using both approaches; for instance, a 3D morphable model approach can use feature points or texture, as well as PCA, to build a recognition system.

2.6.2 Holistic Approaches

Faces can often be identified from little information. Some algorithms follow this idea, processing facial features independently; in other words, the relation between the features or the relation of a feature with the whole face is not taken into account. Many early researchers followed this approach, trying to deduce the most relevant features; some approaches tried to use the eyes, a combination of features [18], and so on. Some methods of this approach are eigenfaces (the most widely used method for Face Recognition), Fisherfaces, support vector machines, nearest feature lines (NFL), and independent component analysis approaches. They are all based on principal component analysis (PCA) techniques, which can be used to simplify a dataset into a lower dimension while retaining its characteristics. In these methods, facial features are processed holistically, which is why nowadays most algorithms follow a holistic approach.

2.6.3 Template/Statistical/Neural Network Approaches

A similar separation of pattern recognition algorithms into groups is proposed by Jain and colleagues in [19]. We can group Face Recognition methods into three main groups. The following approaches are proposed:

i. Template matching: A template matching process uses pixels, samples, models, or textures as the pattern. The recognition function computes the differences between these features and the stored templates, using correlation or distance measures.

Although the matching of 2D images was the early trend, nowadays 3D templates are more common.

ii. Statistical approach: Patterns are represented as features, and the recognition function is a discriminant function. Various methods within the statistical approach are PCA, LDA, ICA, etc. Traditional statistical classification procedures such as discriminant analysis are built on Bayesian decision theory [21]. In these procedures, an underlying probability model must be assumed in order to calculate the posterior probability upon which the classification decision is made. One major limitation of statistical models is that they work well only when the underlying assumptions are satisfied. The effectiveness of these methods depends to a large extent on the various assumptions or conditions under which the models are developed; users must have a good knowledge of both the data properties and the model capabilities before the models can be successfully applied.

iii. Neural networks: Neural networks are nonlinear information processing devices built from interconnected elementary processing units called neurons. The representation may vary. Note that many algorithms, particularly current complex algorithms, may fall into more than one of these categories.

2.7 Neural Networks

Artificial neural networks are a popular tool in Face Recognition. Kohonen [17] was the first to demonstrate that a neural network could be used to recognize aligned and normalized faces. Many different methods based on neural networks have been proposed since then; some of these methods use neural networks just for classification. Neural networks have emerged as an important tool for classification, and the recent vast research activity in neural classification has established that neural networks are a promising alternative to various conventional classification methods. The advantages of neural networks lie in the following theoretical aspects:

(i) Neural networks are data-driven, self-adaptive methods in that they can adjust themselves to the data without any explicit specification of a functional or distributional form for the underlying model.

(ii) Neural networks are universal functional approximators in that they can approximate any function with arbitrary accuracy [23], [24], [25]. Since any classification procedure seeks a functional relationship between the group

membership and the attributes of the object, accurate identification of this underlying function is doubtlessly important.

(iii) Neural networks are nonlinear models, which makes them flexible in modeling real-world complex relationships.

(iv) Neural networks are able to estimate the posterior probabilities, which provide the basis for establishing classification rules and performing statistical analysis [26].

(v) Neural networks have been successfully applied to a variety of real-world classification tasks in industry, business, and science.

Although many types of neural networks can be used for classification purposes [27], the feed-forward multilayer networks or multilayer perceptrons (MLPs) and the radial basis function (RBF) networks are the most widely studied and used neural network classifiers. A multi-layer neural network consists of a large number of units (neurons) joined together in a pattern of connections. Units in a net are usually segregated into three classes: input units, which receive information to be processed; output units, where the results of the processing are found; and units in between, known as hidden units. Feed-forward ANNs allow signals to travel one way only, from input to output. An example of a multilayer perceptron neural network is shown in Fig. 6.

Fig. 6. The Architecture of a Typical MLP Network

A Radial Basis Function network (RBFN) contains one input layer and one output layer, with a single hidden layer, as shown in Fig. 7. The radial basis functions are used within the hidden layer. Training is done by adjusting the center parameters of the radial basis functions, which are used to calculate the connection strengths between the hidden layer and the output layer. An RBF neural network structure is similar to a traditional three-layer feed-forward neural network: its construction involves three different layers with a feed-forward architecture. The input layer of this network is a set of n units, which accept the elements of an n-dimensional input feature vector. The input units are fully connected to the hidden layer with r hidden units. Connections between the input and hidden layers have unit weights and, as a result, do not have to be trained. The goal of the hidden layer is to cluster the data and reduce its dimensionality; in this structure, the hidden-layer units are called RBF units. The RBF units are also fully connected to the output layer. The output layer supplies the response of the neural network to the activation pattern applied to the input layer. The transformation from the input space to the RBF-unit space is nonlinear, whereas the transformation from the RBF-unit space to the output space is linear [28]. The RBF network, with its input layer, RBF layer, and output layer, is shown in Fig. 7.
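The forward pass just described can be sketched as follows: a nonlinear (Gaussian) transformation from the input to the RBF-unit space, followed by a linear transformation to the output space. The Gaussian basis and the parameter shapes are assumptions of this example; training of the centers, as described above, is not shown.

```python
import numpy as np

def rbf_forward(x, centers, widths, out_weights):
    """RBF network forward pass. x: (n,) input vector; centers: (r, n)
    RBF-unit centers; widths: (r,); out_weights: (r, n_out)."""
    # nonlinear stage: Gaussian activation of each of the r RBF units
    d2 = ((x - centers) ** 2).sum(axis=1)
    phi = np.exp(-d2 / (2 * widths ** 2))
    # linear stage: weighted combination at the output layer
    return phi @ out_weights
```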


Fig. 7. The Architecture of a Radial Basis Function Network

Neural networks with Gabor filters

Bhuiyan et al. proposed in 2007 a neural network method combined with Gabor filters [29]. Their algorithm achieves Face Recognition by implementing a multilayer perceptron with the back-propagation algorithm. First, there is a pre-processing step: every image is normalized in terms of contrast and illumination, and noise is reduced by a fuzzily skewed filter. This filter works by applying fuzzy membership values to the neighbor pixels of the target pixel; it uses the median value as the membership value 1 and reduces the extreme values, taking advantage of both the median filter and the average filter. Then, each image is processed through a Gabor filter. The filter is represented as a complex sinusoidal signal modulated by a Gaussian kernel function. The Gabor filter has five orientation parameters and three spatial frequencies, so there are 15 Gabor wavelets. The architecture of the neural network is illustrated in Fig. 8.


Fig. 8. The architecture of the neural network with Gabor filters

For each face image, the outputs are 15 Gabor images which record the variations measured by the Gabor filters. The first layer receives the Gabor features; the number of nodes is equal to the dimension of the feature vector containing the Gabor features. The number of outputs of the network is the number of images the system must recognize. The training of the network, via the back-propagation algorithm, follows this procedure:

1. Initialize the weights and threshold values.

2. Iterate until the termination condition is fulfilled: activate the network by applying the inputs and desired outputs; calculate the actual outputs of the neurons in the hidden and output layers, using the sigmoid activation function; update the weights, propagating the errors backwards; and increase the iteration value.

Although the algorithm's main purpose is to cope with illumination variations, it shows a useful neural network application for Face Recognition. With some improvements it could also deal with pose and occlusion problems.
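A minimal sketch of one back-propagation iteration for a two-layer perceptron with sigmoid activations, mirroring the procedure above, is given below. This is a generic illustration rather than the exact network of [29]; the single-sample update with squared-error deltas is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, w1, w2, lr=0.1):
    """One back-propagation iteration. x: (n_in,); target: (n_out,);
    w1: (n_in, n_hid); w2: (n_hid, n_out). Returns the actual output."""
    h = sigmoid(x @ w1)                    # activate hidden layer
    y = sigmoid(h @ w2)                    # activate output layer
    err_out = (target - y) * y * (1 - y)   # output delta (sigmoid derivative)
    err_hid = (err_out @ w2.T) * h * (1 - h)
    w2 += lr * np.outer(h, err_out)        # update weights, propagating
    w1 += lr * np.outer(x, err_hid)        # the errors backwards
    return y
```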

Fuzzy neural networks

The introduction of fuzzy mathematics into neural networks for Face Recognition is another approach. Bhattacharjee et al. developed in 2009 a Face Recognition system using a fuzzy multilayer perceptron (MLP) [30]. The idea behind this approach is to capture decision surfaces in non-linear manifolds, a task that a simple MLP can hardly complete. The feature vectors are obtained using Gabor wavelet transforms, and the output vectors obtained from that step are then fuzzified. This process is simple: the more a feature vector approaches the class mean vector, the higher is its fuzzy value; as the difference between both vectors increases, the fuzzy value approaches 0. The selected neural network is an MLP using back-propagation. There is a network for each class: the examples of that class form class one, and the examples of the other classes form class two. Thus, it is a two-class classification problem. The results of the algorithm show an error rate of 2.125 on the ORL database.

2.8 Genetic Algorithm

The genetic algorithm was developed by John Holland at the University of Michigan in the 1970s to provide efficient techniques for optimization and machine learning applications through the application of the principles of evolutionary biology to computer science. Genetic algorithms are a family of computational models inspired by evolution. They belong to the class of stochastic algorithms whose search methods model natural phenomena. GAs use a direct analogy of natural behavior. The algorithm starts with an initial set of random solutions called the population. Each individual in the population, known as a chromosome, represents a particular solution to the problem, and each chromosome is assigned a fitness value depending on how good its solution to the problem is. After the fitness allotment, natural selection is executed and the fittest chromosomes are prepared to breed for the next generation. A new population is then generated by means of the genetic operations crossover and mutation. This evolution process is iterated until a near-optimal solution is obtained or a given number of generations is reached. In the face detection application described here, the fitness of a chromosome is defined as a function of the difference between the intensity values of the input image and of the template image, measured at the expected location of the chromosome. The effectiveness and robustness of the algorithm have been demonstrated using different images with various kinds of expressions. When a complex image is given as input, the face detection result highlights the facial part of the image. The system can also cope with partial occlusions of the mouth and with sunglasses; images of different persons were taken at their own places and in different environments, in both sunny and gloomy weather. The algorithm is capable of detecting a single face in an image. The simplest form of GA involves three types of operators:

1) Selection: This operator selects chromosomes in the population for reproduction. The fitter the chromosome, the more times it is likely to be selected to reproduce.

2) Crossover: This operator exchanges subsequences of two chromosomes to create two offspring. For example, the strings 10000100 and 11111111

could be crossed over after the third locus in each to produce the two offspring 10011111 and 11100100. This operator roughly mimics biological recombination between two single-chromosome organisms. (Higher organisms have chromosomes in pairs and are thus "diploid".)
[Fig. 9 diagram: the genetic algorithm cycle. A population of chromosomes (phenotypes) undergoes selection into a mating pool of parents; genetic operations produce a sub-population of offspring, whose fitness is evaluated via the objective function; replacement then feeds the offspring back into the population.]

Fig. 8 Genetic Algorithm Cycle [81] 3) Mutation: this operator randomly flips some bits in a chromosome. For example, the string 00000100 might be mutated in its second position to yield 01000100. Mutation can occur at each bit position in a string with some probability, usually very small (e.g., 0.001) [82]. Characteristics of GAs: they do not depend on functional derivatives; they are parallel search procedures that can be implemented on parallel processing machines to massively speed up their operation; they are applicable to both continuous and discrete optimization problems; and, being stochastic, they are less likely to get trapped in the local minima that are inevitably present in any practical optimization application.
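To make the interplay of the three operators concrete, here is a minimal bit-string GA in Python. The fitness function is a toy stand-in that merely counts 1-bits; in the face-detection use described above it would instead score the intensity difference between the input image and the template at the location a chromosome encodes. All parameter values are illustrative assumptions:

```python
import random

random.seed(1)
L, POP, GENS, P_MUT = 8, 20, 50, 0.001   # string length, population, generations

def fitness(chrom):
    # Toy objective: the number of 1-bits. A face-detection GA would compare
    # image intensities against the template here instead.
    return sum(chrom)

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
for _ in range(GENS):
    # Selection: fitter chromosomes are more likely to join the mating pool.
    weights = [fitness(c) + 1 for c in pop]
    parents = random.choices(pop, weights=weights, k=POP)
    nxt = []
    for a, b in zip(parents[::2], parents[1::2]):
        # Crossover: exchange the tails of two parents after a random locus.
        point = random.randrange(1, L)
        for child in (a[:point] + b[point:], b[:point] + a[point:]):
            # Mutation: flip each bit with a small probability.
            nxt.append([bit ^ 1 if random.random() < P_MUT else bit
                        for bit in child])
    pop = nxt

print(max(pop, key=fitness))             # best chromosome found
```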
The flexibility of GAs facilitates both structure and parameter identification in complex models such as neural networks. Genetic algorithms offer several advantages, some of which are listed here: (i) they can solve any optimization problem that can be described with a chromosome encoding; (ii) they can solve problems with multiple solutions; (iii) the method is very easy to understand and demands practically no mathematical background; and (iv) genetic algorithms are easily transferred to existing simulation models. 2.9 Applications of Face Recognition Biometrics Photographs of faces are widely used in mug-shot identification (e.g., for passports and driver's licenses), where the possession-based authentication protocol is strengthened with a photo for manual inspection; there is wide public acceptance of this biometric identifier. Face Recognition systems are used for driver's licenses, immigration, national ID, and passports, and face recognition is a fairly good biometric identifier for small-scale verification applications.

Law enforcement and surveillance These applications include automated crowd surveillance, access control, face reconstruction, design of human-computer interfaces (HCI), multimedia communication, and content-based image database management. Face recognition is used in post-event analysis, in shoplifting and suspect tracking and investigation, and in CCTV control and advanced video surveillance; at least in theory, it can be used to screen for unwanted individuals in a crowd in real time. Smart cards Stored-value security and user authentication.

Access control This is used in facility access and vehicular access.

Information security Used for desktop logon (Windows NT, Windows 95), application security, intranet security, database security, and file encryption.

Chapter 3 Literature Review


Face Recognition is one of the most relevant applications of image analysis. It is a true challenge to build an automated system that equals the human ability to recognize faces: although humans are quite good at identifying known faces, they are not very skilled at dealing with a large number of faces, whereas computers, with their almost limitless memory and computational speed, can overcome this limitation to some extent. This literature survey describes the work of various researchers in the field of face recognition; the papers used in it are arranged in chronological order. Bledsoe et al. [31] researched programming computers to recognize human faces (Bledsoe 1966a, 1966b; Bledsoe and Chan 1965). Because the funding was provided by an unnamed intelligence agency, little of the work was published. The project was labeled man-machine because a human extracted the coordinates of a set of features from the photographs, which were then used by the computer for recognition. The operator would extract the coordinates of features such as the centers of the pupils, the inside and outside corners of the eyes, the point of the widow's peak, and so on. From these coordinates a list of 20 distances, such as the width of the mouth, the width of the eyes, and the pupil-to-pupil distance, was computed; an operator could process about 40 pictures an hour. When building the database, the name of the person in the photograph was associated with the list of computed distances and stored in the computer. In the recognition phase, the set of distances was compared with the corresponding distances for each photograph, yielding a distance between the photograph and the database record, and the closest records were returned. This brief description is an oversimplification that fails in general, because it is unlikely that any two pictures would match in head rotation, lean, tilt, and scale (distance from the camera). Each set of distances is therefore normalized to represent the face in a frontal orientation: the program first estimates the tilt, the lean, and the rotation, and then, using these angles, undoes the effect of these transformations on the computed distances. To compute the angles, the computer must know the three-dimensional geometry of the head; because actual heads were unavailable, Bledsoe (1964) used a standard head derived from measurements on seven heads.
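Setting the normalization aside, the core matching step of such a semi-automatic system is easy to picture in code. The sketch below assumes, purely for illustration, a database mapping names to vectors of 20 normalized distances and returns the closest records by Euclidean distance; none of the values come from Bledsoe's actual system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical records: name -> 20 normalised inter-feature distances
# (width of mouth, width of eyes, pupil to pupil, ...).
database = {f"person_{i}": rng.random(20) for i in range(5)}

def closest_records(probe, database, k=3):
    # Compare the probe's distance vector with every stored record and
    # return the k names whose records are nearest in Euclidean distance.
    return sorted(database,
                  key=lambda name: np.linalg.norm(probe - database[name]))[:k]

probe = database["person_2"] + rng.normal(0.0, 0.01, 20)   # noisy re-measurement
print(closest_records(probe, database, k=1))               # -> ['person_2']
```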

The paper entitled Identification of human faces by A. Jay Goldstein et al. [32] described a vector for recognizing faces with pattern classification techniques, containing 21 subjective features such as ear protrusion, eyebrow weight, and nose length. Three classes of experiments were carried out for this vector: 1) gathering, analysis, and assessment of face-feature data for 256 faces; 2) computer identification studies; and 3) human identification studies. Photographs of 256 persons were taken, with a carefully arranged technique, in three different views: full face, 3/4 view, and profile. From the main set of 34 features, only 22 were retained and the rest excluded, so as to provide significant, independent, self-sufficient measures that judges could assess reliably. In the computer studies, a mathematical model of a person's behavior in a face-identification task established limits on the performance of a person attempting to isolate a face from a population using feature descriptions. The model predicts that, under certain conditions, approximately 6 of an individual's features are required to isolate him from a population of 255. The human identification studies showed that, in the search for a target, the original population could be reduced to 2 percent in over 52 percent of the trials and to 10 percent in over 66 percent of the trials; human experiments under similar conditions showed that unique identification occurred with an average of about 7 features. The model predicts that for a population of 4x10^6, only 14 feature descriptions are required. These studies form a foundation for continuing research on real-time man-machine interaction for the computer classification and identification of multidimensional vectors specified by noisy components. The problem with both of these early solutions, by Bledsoe [31] and Goldstein [32], was that the locations and measurements were computed manually. T. Kanade [33] was the first to develop a fully automated Face Recognition system, in 1973, described in the paper entitled Picture Processing System by Computer Complex and Recognition of Human Faces. All steps were automated: the algorithm detects 16 facial parameters from a single face image and uses a pattern classification technique to match the face against a known set. Kanade worked solely with grayscale images. In his algorithm, the grayscale image was filtered with a Laplacian operator to extract edges; then the head top, face sides, nose, mouth, chin, chin contour, and eyes were detected, in that order, by analyzing horizontal and vertical integral projections. This technique involves finding the integral projection of a slice of the image and comparing it with a database of stored characteristic projection patterns.
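Integral projections are simple to compute: summing a binary edge map along its rows gives a horizontal profile whose peaks mark horizontal structures such as the eye line or mouth, and summing along columns locates the face sides. A small illustrative sketch (the toy image is an assumption, not Kanade's data):

```python
import numpy as np

def integral_projections(edge_map):
    # Sum the edge map along each axis: one value per row (horizontal
    # projection) and one per column (vertical projection). Peaks in these
    # 1-D profiles indicate rows/columns rich in edges.
    return edge_map.sum(axis=1), edge_map.sum(axis=0)

# Toy edge map with a horizontal band of edges (e.g. an eye line) on row 3.
img = np.zeros((8, 8))
img[3, 1:7] = 1
horizontal, vertical = integral_projections(img)
print(horizontal.argmax())   # -> 3: the row where the band sits
```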
The centers of the eyes, nose, and mouth are located by fusing and shrinking the edge map. There are a few problems with this approach. Edge detection can be sensitive to noise, and a noisy edge map makes it very difficult to isolate facial features from background clutter; recognizing a feature from its edge map is a problem in itself. Kanade's algorithm does not account for facial hair or a complex background: edges detected in a moustache, a beard, or a non-blank background would interfere with the search for facial features. Nor can it account for different orientations or scales of the face within the image. Fischler and Elschlager [13] attempted to measure features automatically; they described a linear embedding algorithm that used local feature template matching and a global measure of fit to find and measure facial features. At the time, image matching, the process of finding an object in an actual photograph on the basis of a given description of a visual object, was becoming an important requirement in applications such as scene analysis and description and image change detection. As a solution, the authors used a combined descriptive scheme and decision metric. First, an embedding metric was presented, which sets the framework for evaluating how well any composition of primitive picture pieces (parts of the decomposed picture) matches the desired composite picture. Second, a sequential optimization (dynamic-programming-type) algorithm was developed which takes advantage of the decomposition to reduce the computational requirements drastically: they grow linearly with the size of the picture, rather than exponentially. The authors believed this to be a general approach for which no new programming system need be written for every new description; instead, one simply specifies descriptions in terms of a certain set of primitives and parameters. The paper was a continuation of Fischler's earlier work on sequential optimization for matching two-dimensional scenes; it introduced the generic form of the embedding metric, presented the concepts of coherent segmentation, arbitrary serialization, and sequential constraints, and discussed the relation of the heuristic embedding problem to formal decision theory. The template-matching technique of Fischler and Elschlager [13] was continued and improved by Yuille et al. [9] in their paper entitled Facial feature extraction from faces using deformable templates, which proposed a method for detecting and describing facial features using deformable templates.
The feature of interest, an eye for example, is described by a parameterized template. Parameterized templates are flexible and allow a priori knowledge to guide the detection process, and the final values of the parameters can be used to describe the feature. The method should work despite variations in scale, in tilt and rotation of the head, and in lighting conditions: varying the parameters should allow the template to fit any normal instance of the feature. The authors described an energy function which links edges, peaks, and valleys in the image intensity to corresponding properties of the template. The template then interacts dynamically with the image by altering its parameter values to minimize the energy function, thereby deforming itself to find the best fit. Moreover, such templates can not only detect a feature but also provide a description of it for classification and matching against a database. In the 1980s a diversity of approaches was actively followed, most of them continuing previous tendencies, and various techniques arose, such as eigenfaces, Fisherfaces, and graph matching. Some works tried to improve the methods used for measuring subjective features; others tried to define a face as a set of geometric parameters and then perform pattern recognition based on those parameters. For instance, Mark Nixon [34] applied eye-spacing measurement for automated recognition, because the eyes offer a distinct feature within a face and the geometric measurement of features within a face has been identified as a major part of the human facial recognition process. The eye spacing was measured using the Hough transform to detect instances of a circular shape and of an ellipsoidal shape, including both gradient strength and gradient direction in order to handle noise. The results of the paper show that it was possible to derive a measurement of the spacing by detecting the position of the iris, or the shapes describing the perimeter of the sclera and the eyebrows. Measurement by detection of the iris position is the most accurate technique; detection of the perimeter of the sclera is the most noise-sensitive but can provide precision almost equal to that of iris detection; detection of the eyebrow shape yields an eye-spacing measurement larger than that of the other techniques. The first mention of eigenfaces in image processing, a technique that would become the dominant approach in the following years, was made by L. Sirovich and M. Kirby in 1986 [12, 35]. Their method was based on Principal Component Analysis (PCA); the goal was to represent an image in a lower dimension without losing much information, and then to reconstruct it. Their work would later become the foundation of many new face recognition algorithms.
The PCA approach represents a picture of a face in terms of an optimal coordinate system. The set of basis vectors which make up this coordinate system are referred to as eigenpictures; they are simply the eigenfunctions of the covariance matrix of the ensemble of faces. Rather than using this approach directly, the authors first extend the ensemble by including reflections about the midline of the faces, i.e., the mirror-imaged faces. Using this extended ensemble in the computation of the covariance matrix imposes even and odd symmetry on the eigenfunctions, yet the complexity of the calculation does not increase, because the matrix in the eigenvector calculation is not doubled in size. The 1990s saw the broad recognition of the eigenface approach as the basis of the state of the art, and the first industrial applications. In 1992 Matthew Turk and Alex Pentland of MIT presented a work which used eigenfaces for recognition [36]: an approach to the detection and identification of human faces, and a working, near-real-time Face Recognition system that was fast, reasonably simple, and accurate in constrained environments such as households and offices. The system tracks a subject's head and then recognizes the person by comparing characteristics of the face with those of known individuals. The approach treats Face Recognition as a two-dimensional recognition problem, taking advantage of the fact that faces are normally upright and can thus be described by a small set of 2-D characteristic views. Face images are transformed into a small set of characteristic feature images, called eigenfaces, which are the principal components of the initial training set of face images. Recognition is performed by projecting a new image into the subspace spanned by the eigenfaces and then classifying the face by comparing its position in face space with the positions of known individuals. The advantages of this approach over other Face Recognition schemes are its simplicity, speed, and learning capacity.
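The eigenface pipeline just described, compute the principal components of the training faces, project every face into the resulting "face space", and classify a new image by its nearest neighbour there, fits in a short NumPy sketch. The image sizes and toy data below are assumptions for illustration:

```python
import numpy as np

def train_eigenfaces(faces, k):
    # faces: (n_images, n_pixels) matrix of vectorised training faces.
    mean = faces.mean(axis=0)
    A = faces - mean
    # Eigenvectors of the small n x n matrix A A^T yield the eigenfaces after
    # mapping back through A^T (the trick popularised by Turk and Pentland,
    # which avoids the huge n_pixels x n_pixels covariance matrix).
    eigvals, eigvecs = np.linalg.eigh(A @ A.T)
    top = np.argsort(eigvals)[::-1][:k]
    U = A.T @ eigvecs[:, top]
    return mean, U / np.linalg.norm(U, axis=0)

def project(face, mean, U):
    # Coordinates of a face in the k-dimensional face space.
    return U.T @ (face - mean)

rng = np.random.default_rng(0)
train = rng.random((10, 64 * 64))                 # 10 toy "face images"
mean, U = train_eigenfaces(train, k=5)
gallery = np.array([project(f, mean, U) for f in train])

# Recognition: nearest known individual in face space.
probe = project(train[3] + 0.01 * rng.random(64 * 64), mean, U)
print(np.argmin(np.linalg.norm(gallery - probe, axis=1)))   # -> 3
```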
In 1997 the eigenface technique of Turk and Pentland [36] was investigated on much larger recognition problems, and the authors generalized it to handle variable viewing geometry, using a view-based approach that describes faces with a set of 2-D aspects. They described experiments with eigenfaces for recognition and interactive search in a large-scale face database of 1000 faces, and also examined the problem of recognition under general viewing orientation. A view-based, multiple-observer eigenspace technique was proposed for Face Recognition under variable pose; the key to the success of such an approach is its ability to localize the object (or features on an object) and identify the correct aspect. In this regard, the authors showed that the distance-from-feature-space measure in a view-based eigenspace formulation is an effective tool. Finally, they extended the approach to a modular representation by incorporating information from different levels of description; using this modular approach, they demonstrated robustness to localized variations in object appearance. Since the 1990s the face recognition area has received a great deal of attention, with a noticeable increase in the number of publications, and many approaches have been considered. Elham Bagherian et al. [7] provide an up-to-date review of major human facial recognition research. Over the last decade facial feature extraction has been actively researched for Face Recognition, and this paper presents an overview of Face Recognition, its applications, and a literature review of the most recent techniques. The recognition techniques are divided into three basic categories: 1) holistic, 2) geometry- or feature-based, and 3) hybrid. The most prominent feature extraction techniques are also given, divided into four basic categories: 1) geometry-based, 2) template-based, 3) color-based, and 4) appearance-based. The geometry- or feature-based methods analyze local facial features and their geometric relationships; this approach is sometimes called the feature-based approach, and some Elastic Bunch Graph Matching algorithms are examples of it. A recognition system developed by Laurenz Wiskott et al. [38, 39] represents a face as an image graph extracted by an elastic graph matching process; the system retrieves human faces from a large database containing one image per person. In order to handle larger galleries and larger variations in pose, and to increase the matching accuracy, three extensions were made. First, they use the phase of the complex Gabor wavelet coefficients to achieve a more accurate location of the nodes and to disambiguate patterns that would be similar in their coefficient magnitudes. Second, they employ object-adapted graphs, so that nodes refer to specific facial landmarks, called fiducial points; the correct correspondences between two faces can then be found across large viewpoint changes. Third, they introduced a new data structure, called the bunch graph, which serves as a generalized representation of faces by combining the jets of a small set of individual faces. This allows the system to find the fiducial points in a single matching process, eliminating the need to match each model graph individually and reducing the computational effort significantly. After the geometry-based approach comes the holistic approach, which a number of researchers have followed to extract the most relevant features.
Some approaches tried to use the eyes, a combination of features, and so on; some Hidden Markov Model (HMM) methods also fall into this category. A Face Recognition system using Hidden Markov Models with 2D-DCT coefficients as feature vectors was developed by Nefian and Hayes [40]. HMM-based approaches show recognition rates equal to or better than the eigenface method. Owing to the compression properties of the DCT, the size of the observation vector in this approach is reduced from L x W (L = 10 and W = 92) to 39 while preserving the same recognition rate over a large database. The use of a lower-dimensional feature vector significantly reduces the computational complexity of the method, and consequently the Face Recognition time. HMM modeling of human faces appears to be an encouraging method for Face Recognition under a wider range of image orientations and facial expressions.
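To make the dimensionality reduction concrete, the sketch below extracts a 39-coefficient observation vector from one sampling window by keeping the low-frequency 2-D DCT coefficients, which carry most of the energy after the transform. This is an illustrative sketch only: the window size and coefficient ordering are assumptions, not the exact scheme of [40]:

```python
import numpy as np
from scipy.fftpack import dct

def dct2(block):
    # 2-D type-II DCT as two 1-D DCTs (first along rows, then columns).
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def observation_vector(window, n_coeffs=39):
    coeffs = dct2(window)
    # Order coefficients roughly by frequency (top-left first) and keep the
    # first n_coeffs, compressing the window from L*W values down to 39.
    idx = sorted(((i, j) for i in range(window.shape[0])
                         for j in range(window.shape[1])),
                 key=lambda p: (p[0] + p[1], p[0]))
    return np.array([coeffs[i, j] for i, j in idx[:n_coeffs]])

window = np.random.default_rng(0).random((10, 16))   # toy pixel strip
print(observation_vector(window).shape)              # -> (39,)
```

Scanning the face image top to bottom with overlapping windows then yields the observation sequence fed to the HMM.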
Since facial features are processed holistically, most algorithms follow a holistic approach. Pawan Sinha et al. [41] presented 19 basic results regarding Face Recognition, each described briefly, giving researchers an overview of how humans recognize faces. One of the results shows that facial features are processed holistically: just one feature (such as the eyes or, notably, the eyebrows) can be enough to recognize many famous faces, yet when features from the top half of one face are combined with the bottom half of another, the two distinct identities become very difficult to recognize. The holistic context seems to affect how individual features are processed: when the two halves of the composite face are misaligned, presumably disrupting normal holistic processing, the two identities are easily recognized. These results suggest that, taken alone, features are sometimes sufficient for facial recognition; in the context of a face, however, the geometric relationship between each feature and the rest of the face can override the diagnosticity of that feature. Although feature processing is important for facial recognition, this pattern of results suggests that configural processing is at least as important, and that facial recognition depends on holistic processes involving an interdependency between featural and configural information. In the statistical approach, each image is represented in terms of features and is thus viewed as a point (vector) in a d-dimensional space, where the dimensionality, the number of coordinates needed to specify a data point, is typically too high. The goal, therefore, is to choose and apply the right statistical tool for the extraction and analysis of the underlying manifold. Many of these statistical tools are not used alone: they are modified or extended by researchers to get better results, embedded into bigger systems, or used as just one part of a recognition algorithm. Many of them appear alongside classification methods, like a DCT embedded in a Bayesian network [42] or Gabor wavelets used with a fuzzy multilayer perceptron [30]. The former paper describes a family of Embedded Bayesian Networks (EBN), a hierarchical statistical model consisting of several layers, of which only two are required for the Face Recognition problem. An EBN has several applications in the analysis and modeling of data with N-dimensional dependencies: one can model complex N-dimensional data while avoiding the complexity of an N-dimensional Bayesian network and still preserving its flexibility and partial scale invariance. The paper describes a training and recognition algorithm for the EBN derived from the optimal state segmentation. Nefian's [42] results showed that the members of the EBN family outperform some existing approaches such as the eigenface method and the embedded HMM method; the EBN for faces inherits the flexibility of the HMM and CHMM with respect to natural face variations, scaling, and rotations, while significantly reducing the complexity of the fully connected 2-D HMM, and its optimal state segmentation can be efficiently implemented on parallel machines. Also under the statistical approach, D. Bhattacharjee et al. [30] developed a parallel framework for the training algorithm of a perceptron, motivated by the problem that artificial neural networks are not suitable for real-time complex problems. Two general architectures for a Multilayer Perceptron (MLP) are demonstrated: All-Classes-in-One-Network (ACON), where all the classes are placed in a single network, and One-Class-in-One-Network (OCON), where an individual network is responsible for each class. The capabilities of the two architectures were compared and verified, and the experimental results show that the OCON structure outperforms the commonly used ACON structure in terms of the training convergence speed of the network. Unlike the conventional sequential approach to training neural networks, the OCON technique may be implemented by training all the classes of face images simultaneously; moreover, the inherently non-parallel nature of ACON compelled the authors to use OCON for a complex pattern recognition task like human Face Recognition. One major limitation of statistical models is that they work well only when the underlying assumptions are satisfied.
The effectiveness of these methods depends to a large extent on the various assumptions or conditions under which the models are developed; users must have a good knowledge of both the data properties and the model capabilities before the models can be applied successfully. In the neural network approach, artificial neural networks are a popular tool for face recognition, having long been used for pattern recognition and classification. Kohonen was the first to demonstrate that a neural network could be used to recognize aligned and normalized faces, and many different neural-network-based methods have been proposed since. Some of these methods use neural networks just for classification. One approach uses decision-based neural networks (DBNN), which classify pre-processed and subsampled face images; it was explored in the paper entitled Decision-based neural networks with signal/image classification applications, which presents supervised learning networks based on a decision-based formulation. The DBNN combines three main attributes: a perceptron-like learning rule, a hierarchical nonlinear network structure, and the unification of static and temporal models. It is a useful formulation when the teacher only indicates the correctness of the classification for each training pattern. DBNNs have a modular and hierarchical architecture, which suits a broad application domain; more importantly, they adopt a competitive credit-assignment scheme that decides which subnets and/or subnodes should be trained or used. The hierarchical structure provides a unified framework for other better-known models (e.g., the perceptron and LVQ) and offers a better understanding of the structural richness of decision-based neural networks. Based on simulated performance comparisons, DBNNs appear to be very effective for many signal- and image-classification applications. Other methods perform feature extraction using neural networks. For example, Steve Lawrence et al. proposed a hybrid, semi-supervised method [44] combining unsupervised methods for extracting features with supervised methods for finding features able to reduce the classification error. They presented a fast, automatic, hybrid neural-network solution which combines local image sampling, a self-organizing map (SOM) neural network, and a convolutional neural network to identify particular people in real time despite varying facial detail, expression, pose, etc. The SOM provides a quantization of the image samples into a topological space in which inputs that are nearby in the original space are also nearby in the output space, which results in invariance to minor changes in the image samples, while the convolutional neural network provides partial invariance to translation, rotation, scale, and deformation.
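The topology-preserving quantization performed by the SOM is the distinctive ingredient here, and its training rule is compact enough to sketch: find the grid node whose weight vector best matches the input, then pull that node and its grid neighbours toward the input. The grid size, learning rate, neighbourhood radius, and toy samples below are illustrative assumptions, not the configuration used in [44]:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 5x5 map of weight vectors quantising flattened 5x5 image samples.
grid_h, grid_w, dim = 5, 5, 25
W = rng.random((grid_h, grid_w, dim))
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1)

lr, radius = 0.2, 1.5
for x in rng.random((500, dim)):                 # toy image samples
    # Best-matching unit: the node whose weights are closest to the input.
    bmu = np.unravel_index(np.linalg.norm(W - x, axis=2).argmin(),
                           (grid_h, grid_w))
    # Gaussian neighbourhood on the grid: nearby nodes move more, which is
    # what makes nearby inputs land on nearby nodes (topology preservation).
    g = np.exp(-((coords - bmu) ** 2).sum(axis=2) / (2 * radius ** 2))
    W += lr * g[..., None] * (x - W)
```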
The method is capable of rapid classification, requires only fast, approximate normalization and preprocessing, and consistently exhibits better classification performance than the eigenfaces approach on the database considered, as the number of images per person in the training database is varied from one to five; with five images per person, the proposed method and eigenfaces yield 3.8% and 10.5% error, respectively. Other authors used probabilistic decision-based neural networks (PDBNN). Lin et al. developed face detection and recognition algorithms using this kind of network [45]. Their system performs human face detection, eye localization, and Face Recognition at close-to-real-time speed. The PDBNN Face Recognition system consists of three modules: first, a face detector finds the location of a human face in the image; then an eye localizer determines the positions of both eyes in order to generate meaningful feature vectors; finally, the third module is a face recognizer. The PDBNN can be effectively applied to all three modules; it adopts a hierarchical network structure with nonlinear basis functions and a competitive credit-assignment scheme. Experimental results on two public databases (FERET and ORL) and one in-house database (SCR) elaborate the processing speed, the whole recognition process consuming approximately one second on a SPARC 10 without any hardware accelerator or co-processor, as well as the recognition accuracies and the false rejection and false acceptance rates. Recent work addresses real-life smart interaction, because face recognition provides passive identification: the person to be identified does not need to cooperate or take any specific action. Such work is found in the paper entitled Face Recognition for Smart Interactions [46] by H. K. Ekenel et al., which presents an overview of Face Recognition research activities at the interACT Research Center. These efforts consist of the development of a fast and robust Face Recognition algorithm and of fully automatic Face Recognition systems that could be deployed for real-life smart interaction applications. Person identification is one of the most crucial building blocks for smart interactions. Among person identification methods, Face Recognition and speaker identification are known to be the most natural, since face and voice are the modalities we use to identify people in our everyday lives. Although other methods, such as fingerprint identification, can provide better performance, they are not appropriate for natural smart interactions due to their intrusive nature. The advantage of speaker identification is its capability to perform identification over telephone lines, where the person may not be visible to the identification system.
In contrast, Face Recognition provides passive identification: the person to be identified does not need to cooperate or take any specific action. For example, a smart store can recognize its regular customers as they enter; the customers do not need to talk or look directly at the camera to be recognized. The Face Recognition algorithm developed in this paper is based on the appearances of local facial regions represented with discrete cosine transform coefficients, and three fully automatic Face Recognition systems have been built on it. The first is the "door monitoring system", which observes the entrance of a room and identifies subjects as they enter. The second is the "portable Face Recognition system", which aims at environment-free Face Recognition and recognizes the user of a machine. The third, a 3D Face Recognition system, performs fully automatic Face Recognition on 3D range data. 3.1 Feature extraction and classification The importance of facial features for face recognition cannot be overstated. Many face recognition systems need facial features in addition to the holistic face, as suggested by studies in psychology. It is well known that even holistic matching methods, for example the eigenfaces proposed by Turk and Pentland [36] and the Fisherfaces proposed by Belhumeur et al. [47], need accurate locations of key facial features such as the eyes, nose, and mouth to normalize the detected face. P. N. Belhumeur et al. [47], in their paper entitled Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection, developed a Face Recognition algorithm which is insensitive to large variations in lighting direction and facial expression; the lighting variability includes not only intensity but also the direction and number of light sources. As is known, both PCA and ICA construct the face space without using the face class (category) information: the whole face training set is taken as a whole. In LDA the goal is likewise to find an efficient or interesting way to represent the face vector space, but exploiting the class information can be helpful to identification tasks. The approach to Face Recognition in this paper exploits two observations: (i) all images of a Lambertian surface, taken from a fixed viewpoint under varying illumination, lie in a low-dimensional linear subspace of the image space; and (ii) because of regions of shadowing, specularities, and facial expressions, this observation does not hold exactly; in practice, certain regions of the face may exhibit variability from image to image that deviates significantly from the linear subspace and are consequently less reliable for recognition.
Using these observations, they found a linear projection of the faces from the high-dimensional image space to a significantly lower-dimensional feature space which is insensitive to variation in both lighting direction and facial expression. The authors choose projection directions that are nearly orthogonal to the within-class scatter, projecting away variations in lighting and facial expression while maintaining discriminability. The resulting method, Fisherfaces, a derivative of Fisher's Linear Discriminant, maximizes the ratio of between-class scatter to within-class scatter. The Eigenface technique, another method based on linearly projecting the image space onto a low-dimensional subspace, has similar computational requirements. Tests on the Harvard and Yale face databases showed that the Fisherface method has a lower error rate than the Eigenface technique.
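In symbols, using the standard notation for Fisher's Linear Discriminant rather than anything copied from [47]: with c classes, N_i samples x_k in class X_i, class means mu_i, and overall mean mu, Fisherfaces seek the projection W that maximizes the ratio of the projected between-class scatter to the projected within-class scatter,

```latex
W_{\mathrm{opt}} = \arg\max_{W}
  \frac{\left| W^{T} S_{B} W \right|}{\left| W^{T} S_{W} W \right|},
\qquad
S_{B} = \sum_{i=1}^{c} N_{i}\,(\mu_{i} - \mu)(\mu_{i} - \mu)^{T},
\qquad
S_{W} = \sum_{i=1}^{c} \sum_{x_{k} \in X_{i}} (x_{k} - \mu_{i})(x_{k} - \mu_{i})^{T}
```

whose columns are the generalized eigenvectors of S_B w = lambda S_W w with the largest eigenvalues; this is why the method discounts within-class variation such as lighting and expression.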
The 1990s attracted a great deal of attention; many approaches were taken, leading to different techniques for face recognition: i) geometry-based [34], ii) template-based [13], iii) color-based [10], and iv) appearance-based. Under the color-based technique, the paper of T. C. Chang et al. [10] proposed a robust facial feature extraction algorithm that takes advantage of color in two different but equally important ways, color segmentation and color thresholding, to isolate and pinpoint the eyes, nostrils, and mouth in a color image. There had been much work on other types of facial data, but no previous facial feature extraction work had been attempted on color images; the most notable attempts to extract facial features thus far had been by Kanade and Gordon. The approach taken was to use color segmentation on the color images as the initial step in isolating the eyes, nostrils, and mouth: the skin color in particular is segmented out and removed, leaving the non-skin-colored areas of the face untouched (i.e., hair, eyes, nostrils, mouth, and background). With further processing the hair and background can be removed, and with color thresholding the facial features can be pinpointed. The location of the face is given by the color segmentation, but both the scale (the size of the head within the image) and the orientation of the head must be determined so that a proper search for the features can be performed. Highlights and shadows can cause problems during the intensity and chromaticity thresholding stages; the algorithm would perform even more accurately if these artifacts were eliminated. Among the various solutions to the problem, the most successful seem to be the appearance-based approaches, which generally operate directly on images or appearances of face objects and process the image as a two-dimensional pattern. The main trend in feature extraction has been to represent the data in a lower-dimensional space computed through a linear or non-linear transformation satisfying certain properties. The success of the appearance-based approach is presented in the paper entitled Eigenspace-Based Face Recognition: A Comparative Study of Different Approaches by Javier Ruiz-del-Solar [48]. Eigenspace-based Face Recognition is one of the most successful methodologies for the computational recognition of faces in digital images, and different eigenspace-based approaches have been proposed which differ in the kind of projection method (standard, differential, or kernel eigenspace), in the projection algorithm, and in the similarity matching criterion. The study considered standard, differential, and kernel eigenspace methods. For the standard methods, three different projection algorithms (PCA, FLD, and EP) and eight different similarity measures (among them cosine distance, whitening cosine distance, SOM clustering, whitening SOM clustering, FFC, and whitening FFC) were considered. For the differential methods, two approaches were used, the pre-differential and the post-differential, and in both cases Bayesian and SVM classification were employed. Finally, regarding kernel methods, KPCA and KFD were used together with the eight similarity measures employed for the standard approaches. The study considers theoretical aspects as well as simulations performed on the Yale Face Database, a database with few classes and several images per class, and FERET, a database with many classes and few images per class. Appearance-based approaches are further classified into linear and non-linear approaches. Among the linear techniques, the best known and most used are principal component analysis (PCA), linear discriminant analysis (LDA), and independent component analysis (ICA) [49].
The concept of Independent Component Analysis for faces was given by Marian Stewart Bartlett et al. [49]. They used the idea that, in a task such as Face Recognition, much of the important information is contained in the high-order statistics of the images, so a representational basis in which the high-order statistics are decorrelated may be more powerful for Face Recognition than one in which only the second-order statistics are decorrelated, as in PCA representations. Independent Component Analysis (ICA) is a generalization of principal component analysis (PCA) which decorrelates the higher-order moments of the input (Comon, 1994). The face images are considered to be a linear mixture of an unknown set of statistically independent source images; the sources are recovered by a matrix of learned filters which produce statistically independent outputs. The independent components were found through an unsupervised learning algorithm that maximized the mutual information between the input and the output of a nonlinear transformation, and the recovered source images comprised the kernels of the representation. The independent-component kernels gave superior class discriminability to the principal-component kernels, because they are optimally matched to the high-order statistics of the ensemble as well as to the second-order statistics. The ICA representation gave 93% and 100% correct recognition of faces across changes in pose and changes in lighting, respectively, compared with 86.5% and 88.6% for a principal-component-based representation.
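As a concrete illustration, scikit-learn's FastICA can play the role of the unsupervised ICA step, after which recognition proceeds by nearest neighbour in the coefficient space, just as with eigenfaces. This is a sketch over assumed toy data, not the InfoMax algorithm used in [49]:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
faces = rng.random((50, 32 * 32))      # toy stand-in for vectorised faces

ica = FastICA(n_components=10, random_state=0)
codes = ica.fit_transform(faces)       # independent-component coefficients
kernels = ica.components_              # the source-image "kernels"

# Nearest neighbour in ICA coefficient space: here the basis decorrelates
# higher-order statistics, not just the second-order ones as in PCA.
probe = ica.transform(faces[7:8])
print(np.argmin(np.linalg.norm(codes - probe, axis=1)))   # -> 7
```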
A survey of contrast functions and algorithms for ICA was given by Aapo Hyvärinen [50]. ICA is a general concept with a wide range of applications in neural computing, signal processing, and statistics; it gives a representation, or transformation, of multidimensional data that is well suited to subsequent information processing, because the components of the representation are 'as independent as possible' from each other and at the same time 'as non-Gaussian as possible'. The transformation may also be interesting in its own right, as in blind source separation. In discussing the many methods proposed for ICA, the survey shows that the basic choice of ICA method reduces to two questions. First, there is the choice between estimating all the independent components at the same time and estimating only a subset of them, possibly one by one; most ICA research has concentrated on the first option, but in practice the second is very often more interesting, for computational and other reasons. Second, one has the choice between adaptive algorithms and batch-mode (or block) algorithms. Daniel L. Swets and John Weng [51] used the observation that increasing or decreasing the number of eigenfeatures used from a training set does not necessarily improve the success rate. Taking this observation as a basis, a feature space was produced and used to tessellate the space covered by the samples. To generate this optimal subspace, two projections are used: a Karhunen-Loève projection to produce a set of Most Expressive Features (MEFs), and a subsequent discriminant analysis projection to produce a set of Most Discriminating Features (MDFs). The Most Expressive Features are so called because they best express the population in the sense of a linear transform, as evidenced in reconstruction; although the MEF projection is well suited to object representation, the features produced are not necessarily good for discriminating among the classes defined by the set of samples. The features used to effect this clear separation are called the Most Discriminating Features because, in a linear projection sense, they optimally discriminate among the classes represented in the training set. The Most Discriminating Features described in the paper provide an effective feature space for classification: the MDF space discounts factors unrelated to classification, such as lighting direction and facial expression, when such variations are present in the training data. Respectable recognition results were obtained on a large database of images; in the experiments described in the paper, a comparison must be made between a test probe and every image in the database. Kamran Etemad and Rama Chellappa [52] described a third approach, which keeps the PCA/KLT step as an optional stage that can be employed depending on the complexity of the specific task. This approach is a holistic linear discriminant analysis (LDA)-based feature extraction for human faces, followed by an evidential soft-decision integration for multisource data analysis. The method is based on a projection scheme of low complexity that avoids any iterative search or computation; both off-line feature extraction and on-line feature computation can be done at high speed, and recognition can be done in almost real time. The experimental results show that high levels of recognition performance can be achieved with low complexity and a small number of features. The paper presents an application of LDA to study the discriminatory power of various facial features in the spatial and wavelet domains, and proposes and tests an LDA-based feature extraction for Face Recognition. A holistic, projection-based approach to face feature extraction is taken, in which the eigentemplates are the most discriminant vectors derived from the LDA of face images in a sufficiently rich database. The effectiveness of the proposed LDA-based features is compared with that of PCA-based eigenfaces. For classification, a variant of evidential reasoning is used, in which each projection becomes a source of discriminating information with reliability proportional to its discrimination power; the weighted combination of the similarity or dissimilarity scores suggested by all the projection coefficients is the basis for the membership values. Several results on Face Recognition and gender classification are presented, in which highly competitive recognition accuracies are achieved with a small number of features. The feature extraction can also be applied to a wavelet-transform representation of the images to provide a multiscale discriminant framework; in such cases the system becomes more complex at the expense of improved performance. The proposed feature extraction combined with soft classification seems to be a promising alternative to other face recognition systems.
Among these linear techniques, PCA can perform better than the others, as shown in the paper entitled PCA versus LDA by Martinez and Kak [54], which compares Principal Component Analysis with Linear Discriminant Analysis, both appearance-based methods widely used in object recognition systems. Within this paradigm, PCA and LDA have been demonstrated to be useful for many applications such as Face Recognition. One might think that LDA should always outperform PCA, since it deals directly with class discrimination, but in this paper the authors showed that this is not always true: their experiments show that PCA outperforms LDA when the training data set is small or when the training data sample the underlying distribution non-uniformly. In many practical domains, especially Face Recognition, one never knows in advance the underlying distributions of the different classes, and it can be argued that in practice it would be difficult to ascertain whether or not the available training data are adequate for the job. The better performance of PCA over LDA is also shown in the paper of Önsen Toygar [15], in which the performance of appearance-based statistical methods such as Principal Component Analysis, Linear Discriminant Analysis, and Independent Component Analysis was evaluated for Face Recognition on colored images. The performances were obtained using different numbers of training images, and three sets of experiments were employed for the relative evaluation of the PCA, LDA, and ICA methods. In the first set of experiments, the recognition performances of PCA, LDA, and ICA were demonstrated on the original colored images; the effect of illumination variations was demonstrated in the second set by increasing the R, G, and B color component values; and the input images were partially occluded in the third set. The results show that PCA was better than LDA and ICA in general and under different illumination variations. It is also demonstrated that LDA is more sensitive than PCA and ICA to partial occlusions, while PCA is less sensitive to partial occlusions than ICA; that is, PCA's success rates are better than those of LDA and ICA under partial occlusion, and ICA is better than LDA. On the other hand, increasing the number of training images does not have a great impact on PCA and LDA in general, but for ICA the performance decreases; for illumination changes and partial occlusions, increasing the number of training images decreases the performance rates. The reason for this decrease may be the abundance of training images or the difference between the samples of the training and test images.
In the paper entitled Kernel Eigenfaces vs. Kernel Fisherfaces: Face Recognition Using Kernel Methods, Ming-Hsuan Yang [53] seeks a method that not only extracts higher-order statistical dependencies of the samples as features, but also maximizes the class separation when these features are projected to a lower-dimensional space for efficient recognition. Much of the important information may be contained in the high-order dependencies among the pixels of a face image. Principal Component Analysis and Fisher Linear Discriminant methods have demonstrated their success in face detection and recognition, but the representations in these subspace methods are based on the second-order statistics of the image set and do not address higher-order statistical dependencies, such as the relationships among three or more pixels. Recently, Higher-Order Statistics and Independent Component Analysis (ICA) have been used as informative representations for visual recognition. The author investigates the use of Kernel Principal Component Analysis and Kernel Fisher Linear Discriminant for learning low-dimensional representations for Face Recognition, called the Kernel Eigenface and Kernel Fisherface methods, and demonstrates that they provide a more effective representation: while the Eigenface and Fisherface methods aim to find projection directions based on the second-order correlations of the samples, the Kernel Eigenface and Kernel Fisherface methods provide generalizations that take higher-order correlations into account. After feature extraction, the next step is feature classification. Classification algorithms usually involve some learning, supervised or unsupervised. The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features; the resulting classifier is then used to assign class labels to test instances in which the values of the predictor features are known. In unsupervised learning, the value of the class label is unknown. All classification algorithms fall under these two kinds of learning. According to Jain, Duin, and Mao [19], three concepts are key in building a classifier: similarity, probability, and decision boundaries. The similarity approach is intuitive and simple: patterns that are similar should belong to the same class. The algorithms in this category include Nearest Mean [55], k-NN [56], and Self-Organizing Maps (SOM) [60]. Cover and Hart [55] gave the concept of the nearest neighbor classifier: the nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points. This rule is independent of the underlying joint distribution of the sample points and their classifications, and hence the probability of error R of such a rule must be at least as great as the Bayes probability of error R*, the minimum probability of error over all decision rules that take the underlying probability structure into account.
However, in a large-sample analysis the authors showed, for the M-category case, that R* <= R <= R*(2 - MR*/(M - 1)), where these bounds are the tightest possible for all suitably smooth underlying distributions; thus, for any number of categories, the probability of error of the nearest neighbor rule is bounded. After the 1-NN rule, Yang Song et al. [56] took up the KNN decision rule, introduced a new metric, and proposed two novel procedures. The K-nearest neighbor (KNN) decision rule has been a ubiquitous classification tool with good scalability, but past experience has shown that the optimal choice of K depends upon the data, making it laborious to tune the parameter for different applications. The KNN classifier has been a workhorse and benchmark classifier; without prior knowledge, it usually applies the Euclidean distance as its distance metric, and this simple, easy-to-implement method can still yield results competitive with the most sophisticated machine learning methods. The authors introduced a new metric that measures the informativeness of the objects to be classified; applying it as a query-based distance metric to measure the closeness between objects, they proposed two novel KNN procedures, Locally Informative KNN (LI-KNN) and Globally Informative KNN (GI-KNN). By selecting a subset of the most informative objects from the neighborhoods, these methods exhibit stability to changes in the input parameters, the number of neighbors (K) and the number of informative points (I).
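For reference, the baseline rule these refinements build upon fits in a few lines. The sketch below uses Euclidean distances and majority voting over toy data (all values are illustrative); setting k = 1 recovers Cover and Hart's nearest neighbor rule:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Rank training points by Euclidean distance to the probe and take a
    # majority vote among the labels of the k nearest ones.
    nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(knn_predict(X, y, np.array([3.5, 3.5]), k=5))   # -> 1
```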
Probability approach: some classifiers are constructed on a probabilistic basis; the Bayes decision rule [57] and naive Bayes [59] are often used. Judea Pearl's paper [57] defines the properties of Bayesian networks, the best-known representatives of statistical learning algorithms. Pearl first introduced the Bayesian network: a directed acyclic graph in which the nodes represent propositions and the arcs signify the existence of direct causal dependencies. Bayesian networks are needed to guarantee completeness and consistency, and they show how dependencies and conditional-independence relationships can be tested using simple line-tracing operations. The paper deals with the task of fusing and propagating the impact of new evidence and beliefs through a Bayesian network in such a way that, when equilibrium is reached, each proposition is assigned a belief measure consistent with the available data; finally, it discusses several approaches to achieving belief propagation in more general networks. In spite of their remarkable power, Bayesian networks have an inherent limitation: the computational difficulty of exploring a previously unknown network. Biao Qin et al. [58] proposed a new rule-based classification and prediction algorithm, called uRule, for classifying uncertain data. Data uncertainty is common in real-world applications, including sensor databases, spatio-temporal databases, and medical or biological information systems, and arises from various causes, including imprecise measurement, network latency, outdated sources, and sampling errors. These kinds of uncertainty have to be handled cautiously, or else the mining results could be unreliable or even wrong. The uRule algorithm introduces new measures for generating, pruning, and optimizing rules; these measures are computed taking into account the uncertain data interval and the probability distribution function. Based on the new measures, the optimal splitting attribute and splitting value can be identified and used for classification and prediction. The algorithm follows the new paradigm of directly mining uncertain data sets and can process uncertainty in both numerical and categorical data; the experimental results show that uRule performs excellently even when the data are highly uncertain. In the paper entitled The Optimality of Naive Bayes, Harry Zhang [59] proposed a novel explanation of the superb classification performance of naive Bayes, one of the most efficient and effective inductive learning algorithms for machine learning and data mining. Its competitive performance in classification is surprising, because the conditional independence assumption on which it is based is rarely true in real-world applications. The author shows that, essentially, the dependence distribution plays a crucial role: how the local dependence of a node distributes in each class, evenly or unevenly, and how the local dependencies of all the nodes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out). Therefore, no matter how strong the dependences among attributes are, naive Bayes can still be optimal if the dependences distribute evenly across the classes, or if they cancel each other out. In addition, the optimality of naive Bayes under the Gaussian distribution was investigated, and an explicit sufficient condition was presented under which naive Bayes is optimal even though the conditional independence assumption is violated.
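The conditional independence assumption discussed above is exactly what makes naive Bayes so cheap: the joint likelihood factorizes into per-feature terms. A minimal Gaussian naive Bayes sketch on toy data (modelling each feature with a per-class univariate Gaussian is the illustrative assumption here):

```python
import numpy as np

class GaussianNaiveBayes:
    def fit(self, X, y):
        self.classes = np.unique(y)
        # Independence assumption: each feature gets its own per-class
        # univariate Gaussian instead of a full joint distribution.
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.prior = np.array([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, x):
        # log P(c) + sum_j log N(x_j; mu_cj, var_cj), maximized over classes.
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)
                          + (x - self.mu) ** 2 / self.var).sum(axis=1)
        return self.classes[np.argmax(np.log(self.prior) + log_lik)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(2, 1, (30, 4))])
y = np.array([0] * 30 + [1] * 30)
print(GaussianNaiveBayes().fit(X, y).predict(np.full(4, 1.8)))   # -> 1
```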
Decision boundaries: this approach can become equivalent to a Bayesian classifier, depending on the chosen metric. The main idea is to minimize a criterion (a measure of error) between the candidate pattern and the test patterns. Methods in this category include Fisher Linear Discriminant (FLD), binary decision trees [60], the perceptron, the multi-layer perceptron, and radial basis networks. A survey of the existing work on decision tree construction, attempting to identify the important issues involved, the directions the work has taken, and the current state of the art, was given in the paper of Sreerama K. Murthy [60]. Advances in data collection, storage, and processing technology provide a unique challenge and opportunity for automated data exploration techniques: as the quantity and variety of data available increases, there is a commensurate need for robust, efficient, and versatile exploration methods. Decision trees are a way to represent the rules underlying data with hierarchical, sequential structures that recursively partition the data. The survey attempts to provide a concise description of the directions decision tree work has taken over the years; its goal is to give an overview of existing work on decision trees, and a taste of their usefulness, to newcomers as well as practitioners in the field of data mining and knowledge discovery. The hierarchical, recursive tree construction methodology is very powerful, has repeatedly been shown to be useful for diverse real-world problems, and is simple and intuitively appealing. Decision trees are usually univariate, since they use splits based on a single feature at each internal node, and most decision tree algorithms cannot perform well on problems that require diagonal partitioning. The papers of Johannes Fürnkranz [61] gave an overview of a large family of symbolic rule learning algorithms, the so-called separate-and-conquer or covering algorithms. Decision trees can be translated into a set of rules by creating a separate rule for each path from the root to a leaf, but rules can also be induced directly from training data using a variety of rule-based algorithms, and Fürnkranz provided an excellent overview of existing work in rule-based methods. This strategy can be traced back to the AQ learning system and still enjoys popularity, as can be seen from its frequent use in inductive logic programming systems. The author puts this wide variety of algorithms into a single framework and analyzes them along three different dimensions, namely their search, language, and overfitting-avoidance biases. All members of this family share the same top-level loop: a separate-and-conquer algorithm searches for a rule that explains a part of its training instances, separates these examples, and recursively conquers the remaining examples by learning more rules until no examples remain; this ensures that each instance of the original training set is covered by at least one rule. The genetic algorithm has also become an emerging approach in face recognition; much research has been done with it, and various authors have provided algorithms based on this approach.
Genetic algorithms have also become an emerging approach in face recognition, and various authors have proposed algorithms based on them. Venkatesan and Madane [62] proposed a novel face recognition system that detects faces in images and video, tracks them, and recognizes faces from galleries of known people using genetic and ant colony optimization algorithms. The system comprises three steps: first, pre-processing methods are applied to the input image; next, face features are extracted from the processed image by Ant Colony Optimization (ACO); finally, recognition is done by a Genetic Algorithm (GA). The proposed method was tested on a number of test images, and the experimental results show that it is robust and suitable for low resolution, variable lighting and different facial expressions in real-time video processing, with both single- and multi-threaded processing. Its competence could be increased further by using a better face scanner, a better scaling technique and well-organized techniques for edge detection and feature extraction of the face image.

A face recognition approach using SIFT features was proposed in the paper entitled "Feature Selection for Face Recognition Using a Genetic Algorithm" by Derya Ozkan [63]. The belief is that finding the subset of features that are most useful for face recognition will lead to better results, while also reducing computation time because unnecessary features are removed. The aim of the paper was therefore to select the most useful features for face recognition; a genetic algorithm is used to learn which of the SIFT features, originally used in object recognition, best describe an interest point of the face. After giving definitions of face recognition and the genetic algorithm approach, the paper suggests a genetic algorithm to select the most useful features of the face. The tests showed, however, that some test images (faces) are wrongly classified due to wrong matches between the interest points of the test and training images; the underlying reason for such wrong matches is that two interest points can have a relatively small distance even though we would expect them to be far apart.

Sarawat Anam and Md. Shohidul Islam [64] proposed a face recognition system for personal identification and verification using a genetic algorithm and a back-propagation neural network. The system consists of three steps: at the very outset, some pre-processing is applied to the input image; secondly, face features are extracted, which are taken as the input of the Back-Propagation Neural Network (BPN) and Genetic Algorithm (GA) in the third step, where classification is carried out using the BPN and GA. The proposed approaches were tested on a number of face images, and the experimental results demonstrate a high degree of performance: the maximum efficiency is 82.61% for the face recognition system using the genetic algorithm, and 91.30% for the system using SET-BPN. The efficiency can be increased by using a better face scanner, a better scaling technique, and efficient techniques for edge detection, such as advanced edge detection, and for feature extraction of the face image.
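In the spirit of [63], and not as the paper's actual implementation, the following sketch treats a chromosome as a boolean mask over feature dimensions and scores it by the accuracy of a 1-nearest-neighbour matcher on a validation split; the population size, generation count and mutation rate are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, train_X, train_y, val_X, val_y):
    # 1-NN accuracy using only the feature dimensions selected by `mask`.
    if not mask.any():
        return 0.0
    correct = 0
    for x, y in zip(val_X, val_y):
        dists = np.linalg.norm(train_X[:, mask] - x[mask], axis=1)
        correct += int(train_y[np.argmin(dists)] == y)
    return correct / len(val_y)

def select_features(train_X, train_y, val_X, val_y, pop=20, gens=30, p_mut=0.05):
    n = train_X.shape[1]
    population = rng.random((pop, n)) < 0.5                    # random boolean masks
    for _ in range(gens):
        scores = [fitness(m, train_X, train_y, val_X, val_y) for m in population]
        parents = population[np.argsort(scores)[-(pop // 2):]] # fitter half survives
        children = []
        while len(parents) + len(children) < pop:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = int(rng.integers(1, n))                      # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut                     # bit-flip mutation
            children.append(child)
        population = np.vstack([parents] + children)
    scores = [fitness(m, train_X, train_y, val_X, val_y) for m in population]
    return population[int(np.argmax(scores))]

Dropping dimensions that contribute wrong matches is exactly the effect hoped for in [63]: fewer features mean less computation and, ideally, fewer of the close-but-wrong interest point matches described above.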
3.2 Inferences drawn from the literature survey

Throughout this literature survey we have been able to draw some important conclusions, summarized below:

(i) A face recognition system may be divided into two modules: feature extraction and feature recognition.

(ii) Feature extraction techniques fall into four categories: geometry-based, template-based, color-based and appearance-based.

(iii) Among these, the appearance-based approach is the most widely used because it works with a small number of features and its recognition rate is high.

(iv) The appearance-based approach consists of many methods, some of which are: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and Independent Component Analysis (ICA).

(v) PCA performs better because it reduces the dimension of the image space needed to describe the data economically (see the sketch after this list).

(vi) Feature classification can be done in two ways: supervised and unsupervised.

(vii) Among these, supervised learning methods provide better results for the face recognition task.

(viii) There are various algorithms based on supervised learning, such as logic-based methods, neural networks and statistical methods.

(ix) The ANN approach performs better because of its ability to handle non-linear data, as has been shown by many researchers in comparison with the earlier linear techniques.

(x) An ANN also does not require any prior information about the system being modelled.

(xi) Although ANN methods are better than the earlier linear methods, further techniques, such as hybrid methods, can be formulated to improve recognition accuracy.
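To illustrate point (v), here is a minimal eigenfaces-style PCA sketch; it is a generic illustration of the dimensionality reduction involved rather than the procedure of any one surveyed paper, and the number of retained components k is an illustrative choice.

import numpy as np

def pca_faces(face_vectors, k=50):
    # face_vectors: one flattened grayscale face image per row.
    mean_face = face_vectors.mean(axis=0)
    centered = face_vectors - mean_face
    # Rows of Vt are the principal axes ("eigenfaces") of the image space.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = Vt[:k]
    # Each face is now described economically by k coefficients
    # instead of by its full pixel count.
    weights = centered @ eigenfaces.T
    return mean_face, eigenfaces, weights

A probe face x is projected as (x - mean_face) @ eigenfaces.T and matched against the stored weight vectors, for example by nearest neighbour.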

Chapter 4 Conclusions and Future Work


Throughout this work we have obtained an overview of the face recognition problem and its application in a controlled environment. We began by considering face recognition as a particular case of pattern recognition, and then examined the different parts of a generic pattern recognition system; of these stages we have mainly focused on two, feature selection and classification, without detailing the others here. After that we focused on face recognition itself, which is a non-intrusive and easy-to-use method, and gave a brief discussion of the various feature extraction and recognition techniques. Concerning feature extraction, we have seen the different parameters that influence the PCA representation; after going through the results of several papers comparing PCA to LDA, we have seen that the results with PCA are quite good and slightly better than those with LDA, so the suitability of LDA is left as an aspect for future study. About the classifiers, we have seen that classification can be done in two ways, supervised and unsupervised, and that among these, supervised learning is better for face recognition. Within supervised learning, the artificial neural network is simple and provides good results. We are not saying that other classifiers cannot provide good results; but any classification procedure seeks a functional relationship between group membership and the attributes of the object, accurate identification of this underlying function is doubtlessly important, and neural networks are universal function approximators that can approximate any function with arbitrary accuracy, as the results in the surveyed papers also show. Finally, we should consider some future lines of work, some of which we have already studied and will try to implement with improvements. After extracting features with PCA, we will implement various neural network models and compare their results. Even though we will build on these basic techniques for feature extraction and classification, there are many other available options.


For example, concerning feature extraction we have mainly focused on PCA, but other techniques, such as the DCT (discrete cosine transform) or Gabor wavelets, are also available, while for the classifiers we could use tools such as the SVM (support vector machine).
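As a minimal sketch of this planned experiment, assuming scikit-learn is available, PCA features can be fed to a neural network classifier as follows; the file names, component count and network size are placeholder assumptions.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# faces: (n_samples, n_pixels) flattened images; labels: subject identities.
faces = np.load("faces.npy")       # placeholder dataset
labels = np.load("labels.npy")

X_tr, X_te, y_tr, y_te = train_test_split(faces, labels, test_size=0.25,
                                          random_state=0)

pca = PCA(n_components=50).fit(X_tr)              # eigenface projection
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                    random_state=0).fit(pca.transform(X_tr), y_tr)

print("recognition accuracy:", clf.score(pca.transform(X_te), y_te))

Swapping in other extractors (DCT coefficients, Gabor responses) or classifiers (an SVM) only changes the two fitted objects, which is what makes such comparisons convenient to run.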


Bibliography
[1] Jensen, John R., Introductory Digital Image Processing: A Remote Sensing Perspective, Prentice-Hall, 1986.
[2] Bernd Jähne, Digital Image Processing: Concepts, Algorithms, and Scientific Applications, Springer-Verlag, 1999.
[3] Gonzalez, R. C. and Woods, R. E., Digital Image Processing, Addison-Wesley, 1993.
[4] Sanjeev Dhawan and Rakesh Kumar, "Benchmarking: An Interactive Tool for Vectorization of Raster Images".
[5] http://en.wikipedia.org/wiki/Facial_recognition_system.
[6] R. Hietmeyer, "Biometric identification promises fast and secure processing of airline passengers", The Int'l Civil Aviation Organization Journal, vol. 55, no. 9, pp. 10-11, 2000.
[7] Elham Bagherian and Rahmita Wirza O.K. Rahmat, "Facial feature extraction for face recognition: a review", vol. 2, pp. 1-9, 2005.
[8] T. Kanade, Computer Recognition of Human Faces, Basel and Stuttgart: Birkhäuser, interdisciplinary research 47, 1977.
[9] A. Yuille, D. Cohen, and P. Hallinan, "Facial feature extraction from faces using deformable templates", Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition, pp. 104-109, 1989.
[10] T. C. Chang, T. S. Huang, and C. Novak, "Facial feature extraction from colour images", Proc. of the 12th IAPR International Conference on Pattern Recognition, vol. 2, pp. 39-43, October 1994.
[11] Y. Tian, T. Kanade, and J. F. Cohn, "Evaluation of Gabor-wavelet-based facial action unit recognition in image sequences of increasing complexity", Proc. of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 218-223, May 2000.
[12] M. Kirby and L. Sirovich, "Application of the Karhunen-Loève Procedure for the Characterization of Human Faces", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, January 1990.
[13] Martin A. Fischler and Robert A. Elschlager, "The Representation and Matching of Pictorial Structures", IEEE Transactions on Computers, vol. C-22, no. 1, January 1973.
[14] Elham Bagherian and Rahmita Wirza O.K. Rahmat, "Facial feature extraction for face recognition: a review", 2005.
[15] Önsen Toygar and Adnan Acan, "Face Recognition using PCA, LDA and ICA Approaches on Colored Images", Journal of Electrical & Electronics Engineering, pp. 735-743, 2003.
[16] Anil K. Jain, "Statistical Pattern Recognition: A Review", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, January 2000.
[17] T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, Berlin, 1989.
[18] R. Brunelli and T. Poggio, "Face Recognition: Features versus Templates", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1042-1052, October 1993.
[19] B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition", Pattern Recognition, pp. 1771-1782, November 2000.
[20] S. Tulyakov, "Review of Classifier Combination Methods", Studies in Computational Intelligence (SCI), pp. 361-386, 2008.
[21] L. Wiskott and J.-M. Fellous, "Face recognition by elastic bunch graph matching", IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997.
[22] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, New York: Wiley, 1973.
[23] G. Cybenko, "Approximation by Superpositions of a Sigmoidal Function", Mathematics of Control, Signals and Systems, vol. 2, pp. 303-314, 1989.
[24] K. Hornik, "Approximation Capabilities of Multilayer Feedforward Networks", Neural Networks, vol. 4, pp. 251-257, 1991.
[25] K. Hornik, M. Stinchcombe, and H. White, "Multilayer Feedforward Networks are Universal Approximators", Neural Networks, vol. 2, pp. 359-366, 1989.
[26] M. D. Richard and R. Lippmann, "Neural Network Classifiers Estimate Bayesian A Posteriori Probabilities", Neural Computation, vol. 3, pp. 461-483, 1991.
[27] R. P. Lippmann, "Pattern Classification Using Neural Networks", IEEE Communications Magazine, pp. 47-64, November 1989.
[28] Antu Annam Thomas and M. Wilscy, "Face Recognition Using Simplified Fuzzy Artmap", Signal & Image Processing: An International Journal, December 2010.
[29] A.-A. Bhuiyan and C. H. Liu, "On Face Recognition Using Gabor Filters", Proc. of World Academy of Science, Engineering and Technology, vol. 22, pp. 51-56, 2007.
[30] D. Bhattacharjee, D. K. Basu, M. Nasipuri, and M. Kundu, "Human Face Recognition Using Fuzzy Multilayer Perceptron", Soft Computing - A Fusion of Foundations, Methodologies and Applications, vol. 14, no. 6, pp. 559-570, April 2009.
[31] W. W. Bledsoe, "The Model Method in Facial Recognition", Panoramic Research Inc., Palo Alto, CA, Rep. PRI:15, August 1966.
[32] A. Goldstein, L. Harmon, and A. Lesk, "Identification of human faces", Proceedings of the IEEE, vol. 59, pp. 748-760, 1971.
[33] T. Kanade, Picture Processing System by Computer Complex and Recognition of Human Faces, PhD thesis, Kyoto University, November 1973.
[34] M. Nixon, "Eye Spacing Measurement for Facial Recognition", Proc. of the Society of Photo-Optical Instrumentation Engineers, SPIE, vol. 575(37), pp. 279-285, August 1985.
[35] L. Sirovich and M. Kirby, "Low-Dimensional Procedure for the Characterization of Human Faces", Journal of the Optical Society of America A - Optics, Image Science and Vision, pp. 519-524, March 1987.
[36] M. Turk and A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[37] Alex Pentland, Baback Moghaddam, and Thad Starner, "View-Based and Modular Eigenspaces for Face Recognition", IEEE, 1994.
[38] L. Wiskott, J.-M. Fellous, N. Krüger, and C. von der Malsburg, "Face Recognition by Elastic Bunch Graph Matching", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 775-779, 1997.
[39] L. Wiskott, J.-M. Fellous, N. Krüger, and C. von der Malsburg, Intelligent Biometric Techniques in Fingerprint and Face Recognition, chapter "Face Recognition by Elastic Bunch Graph Matching", pp. 355-396, CRC Press, 1999.
[40] A. Nefian and M. Hayes, "Hidden Markov Models for Face Recognition", Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, USA, May 1998.
[41] P. Sinha, B. Balas, Y. Ostrovsky, and R. Russell, "Face Recognition by Humans: 19 Results All Computer Vision Researchers Should Know About", Proceedings of the IEEE, vol. 94, no. 11, pp. 1948-1962, November 2006.
[42] A. Nefian, "Embedded Bayesian Networks for Face Recognition", Proc. of the IEEE International Conference on Multimedia and Expo, vol. 2, pp. 133-136, Lausanne, Switzerland, August 2002.
[43] S. Kung and J. Taur, "Decision-Based Neural Networks with Signal/Image Classification Applications", IEEE Transactions on Neural Networks, vol. 6, no. 1, pp. 170-181, 1995.
[44] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, "Face Recognition: A Convolutional Neural Network Approach", IEEE Transactions on Neural Networks, vol. 8, pp. 98-113, 1997.
[45] S.-H. Lin, S.-Y. Kung, and L.-J. Lin, "Face Recognition/Detection by Probabilistic Decision-Based Neural Network", IEEE Transactions on Neural Networks, vol. 8, pp. 114-132, 1997.
[46] H. K. Ekenel, J. Stallkamp, H. Gao, M. Fischer, and R. Stiefelhagen, "Face Recognition for Smart Interactions", IEEE, 2007.
[47] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 711-720, May 1997.
[48] Javier Ruiz-del-Solar, "Eigenspace-Based Face Recognition: A Comparative Study of Different Approaches", IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews, vol. 35, no. 3, August 2005.
[49] Marian Stewart Bartlett and Terrence J. Sejnowski, "Independent Components of Face Images: A Representation for Face Recognition".
[50] Aapo Hyvärinen, "Survey on Independent Component Analysis", Neural Computing Surveys, vol. 2, pp. 94-128, 1999.
[51] D. L. Swets and J. J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, August 1996.
[52] Kamran Etemad and Rama Chellappa, "Discriminant analysis for recognition of human face images", Optical Society of America, 1997.
[53] Ming-Hsuan Yang, "Kernel Eigenfaces vs. Kernel Fisherfaces: Face Recognition Using Kernel Methods", IEEE International Conference on Automatic Face and Gesture Recognition, 2002.
[54] Alex M. Martinez and Avinash C. Kak, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001.
[55] T. M. Cover and P. E. Hart, "Nearest Neighbor Pattern Classification", IEEE Transactions on Information Theory, vol. IT-13, no. 1, 1967.
[56] Yang Song, Jian Huang, Ding Zhou, Hongyuan Zha, and C. Lee Giles, "Informative K-Nearest Neighbor Pattern Classification", Springer-Verlag Berlin Heidelberg, 2007.
[57] Judea Pearl.
[58] Biao Qin, Yuni Xia, Sunil Prabhakar, and Yicheng Tu, "A Rule-Based Classification Algorithm for Uncertain Data", IEEE International Conference, vol. 8, no. 1, January 1997.
[59] Harry Zhang, "The Optimality of Naive Bayes", American Association for Artificial Intelligence, 2004.
[60] Sreerama K. Murthy, "Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey", Kluwer Academic Publishers, 1998.
[61] Johannes Fürnkranz.
[62] S. Venkatesan and S. Srinivasa Rao Madane, "Face Recognition System with Genetic Algorithm and ANT Colony Optimization", International Journal of Innovation, Management and Technology, vol. 1, no. 5, December 2010.
[63] David G. Lowe, "Feature Selection for Face Recognition Using a Genetic Algorithm", International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[64] Sarawat Anam and Md. Shohidul Islam, "Face Recognition Using Genetic Algorithm", Proc. of the International MultiConference of Engineers and Computer Scientists 2009, Vol. I, IMECS 2009, March 18-20, 2009.
