
Face Detection

1. INTRODUCTION
Images containing faces are essential to intelligent vision-based human-computer interaction, and research efforts in face processing include face recognition, face tracking, pose estimation, and expression recognition. However, many reported methods assume that the faces in an image or an image sequence have already been identified and localized. To build fully automated systems that analyze the information contained in face images, robust and efficient face detection algorithms are required. Given a single image, the goal of face detection is to identify all image regions that contain a face, regardless of the face's three-dimensional position, orientation, and the lighting conditions. The problem is challenging because faces are nonrigid and have a high degree of variability in size, shape, color, and texture.

Face detection is the essential front end of any face recognition system: it locates and segregates face regions in cluttered images obtained either from video or from still photographs. It also has numerous applications in areas such as surveillance and security control systems, content-based image retrieval, video conferencing, and intelligent human-computer interfaces. Most current face recognition systems presume that faces are readily available for processing. In reality, however, we rarely get images containing only faces; we need a system that will detect, locate, and segregate faces in cluttered images, so that the segregated faces can be given as input to face recognition systems. Given an arbitrary image, the goal of face detection is therefore to determine whether or not there are any faces in the image and, if present, return the image location and extent of each face.

The purpose of this project is to try to replicate on a computer what human beings do effortlessly every moment of their lives: detect the presence or absence of faces in their field of vision. While this appears trivial to a layman, implementing the steps leading to its successful execution in an algorithm is difficult and still an unsolved problem in computer vision.


2. LITERATURE SURVEY
Peer et al. [10] constructed a set of rules to describe the skin cluster in RGB space, while Garcia and Tziritas [11] used a set of bounding rules to classify skin regions in both the YCbCr and HSV spaces. In this paper, we present a novel skin color model (RGB) for human skin detection. This model utilizes standard RGB properties to improve the discriminability between skin pixels and non-skin pixels. Numerous techniques for skin color modeling and recognition have been proposed during the past several years, and a few papers comparing different approaches have been published [Zarit et al. 1999], [Terrillon et al. 2000], [Brand and Mason 2000]. These papers discuss pixel-based skin detection methods that classify each pixel as skin or non-skin individually, independently of its neighbors, with the help of fuzzy rules. In contrast, region-based methods [Kruppa et al. 2002], [Yang and Ahuja 1998], [Jedynak et al. 2002] try to take the spatial arrangement of skin pixels into account during the detection stage to enhance performance. Pixel-based skin detection has a long history, but surprisingly few papers that provide surveys or comparisons of different techniques have been published. [Zarit et al. 1999] provided a comparison of five color spaces (actually their chrominance planes) and two non-parametric skin modeling methods (lookup table and Bayes skin probability map). [Terrillon et al. 2000] compared nine chrominance spaces and two parametric techniques (Gaussian and mixture-of-Gaussians models). [Brand and Mason 2000] evaluated three different skin color modeling strategies. [Lee and Yoo 2002] compared the two most popular parametric skin models in different chrominance spaces and proposed a model of their own.


3. METHODS OF FACE DETECTION


In this section, we describe existing techniques to detect faces from a single intensity or color image. We classify single-image detection methods into five categories:

1. Knowledge-based methods. These rule-based methods encode human knowledge of what constitutes a typical face. Usually, the rules capture the relationships between facial features. These methods are designed mainly for face localization.

2. Feature invariant approaches. These algorithms aim to find structural features that exist even when the pose, viewpoint, or lighting conditions vary, and then use these to locate faces. These methods are designed mainly for face localization.

3. Template matching methods. Several standard patterns of a face are stored to describe the face as a whole or the facial features separately. The correlations between an input image and the stored patterns are computed for detection. These methods have been used for both face localization and detection.

4. Appearance-based methods. In contrast to template matching, the models (or templates) are learned from a set of training images which should capture the representative variability of facial appearance. These learned models are then used for detection. These methods are designed mainly for face detection.

5. Color-based approach. The approach studied and applied in this thesis is the skin-colour-based approach. The algorithm is fairly robust, as the faces of many people can be detected at once in an image of a group of people. The model used here to detect skin colour is the YCbCr model.

3.1 Knowledge-Based Top-Down Methods

In this approach, face detection methods are developed based on rules derived from the researchers' knowledge of human faces. It is easy to come up with simple rules to describe the features of a face and their relationships. For example, a face often appears in an image with two eyes that are symmetric to each other, a nose, and a mouth. The relationships between features can be represented by their relative distances and positions. Facial features in an input image are extracted first, and face candidates are identified based on the coded rules. A verification process is usually applied to reduce false detections.

One problem with this approach is the difficulty in translating human knowledge into well-defined rules. If the rules are detailed (i.e., strict), they may fail to detect faces that do not pass all the rules. If the rules are too general, they may give many false positives. Moreover, it is difficult to extend this approach to detect faces in different poses, since it is challenging to enumerate all possible cases. On the other hand, heuristics about faces work well in detecting frontal faces in uncluttered scenes.

3.2 Bottom-Up Feature-Based Methods

In contrast to the knowledge-based top-down approach, researchers have been trying to find invariant features of faces for detection. The underlying assumption is based on the observation that humans can effortlessly detect faces and objects in different poses and lighting conditions, so there must exist properties or features which are invariant over these variabilities. Numerous methods have been proposed to first detect facial features and then to infer the presence of a face. Facial features such as eyebrows, eyes, nose, mouth, and hair-line are commonly extracted using edge detectors. Based on the extracted features, a statistical model is built to describe their relationships and to verify the existence of a face. One problem with these feature-based algorithms is that the image features can be severely corrupted due to illumination, noise, and occlusion. Feature boundaries can be weakened for faces, while shadows can cause numerous strong edges which together render perceptual grouping algorithms useless.

3.2.1 Facial Features

Sirohey [4] proposed a localization method to segment a face from a cluttered background for face identification. It uses an edge map and heuristics to remove and group edges so that only the ones on the face contour are preserved. An ellipse is then fit to the boundary between the head region and the background. This algorithm achieves 80 percent accuracy on a database of 48 images with cluttered backgrounds. Instead of using edges, Chetverikov and Lerch [5] presented a simple face detection method using blobs and streaks (linear sequences of similarly oriented edges). Their face model consists of two dark blobs and three light blobs to represent the eyes, cheekbones, and nose. The model uses streaks to represent the outlines of the faces, eyebrows, and lips. Two triangular configurations are utilized to encode the spatial relationship among the blobs. A low resolution Laplacian image is generated to facilitate blob detection. Next, the image is scanned to find specific triangular occurrences as candidates. A face is detected if streaks are identified around a candidate.

3.2.2 Texture

Human faces have a distinct texture that can be used to separate them from other objects. Augusteijn and Skujca [6] developed a method that infers the presence of a face through the identification of face-like textures. The textures are computed using second-order statistical features (SGLD) on sub-images of 16 x 16 pixels. Three types of features are considered: skin, hair, and others. They used a cascade correlation neural network for supervised classification of textures and a Kohonen self-organizing feature map to form clusters for different texture classes. To infer the presence of a face from the texture labels, they suggest using votes of the occurrence of hair and skin textures. However, only the result of texture classification is reported, not face localization or detection.

3.2.3 Skin Color

Human skin color has been used and proven to be an effective feature in many applications, from face detection to hand tracking. Although different people have different skin color, several studies have shown that the major difference lies largely in intensity rather than in chrominance. Several color spaces have been utilized to label pixels as skin, including RGB, normalized RGB, HSV (or HSI), YCrCb, and YIQ. Many methods have been proposed to build a skin color model. The simplest model is to define a region of skin tone pixels using Cr, Cb values, i.e., R[Cr, Cb], from samples of skin color pixels. With carefully chosen thresholds [Cr1, Cr2] and [Cb1, Cb2], a pixel is classified as having skin tone if its values (Cr, Cb) fall within the ranges, i.e., Cr1 <= Cr <= Cr2 and Cb1 <= Cb <= Cb2 (a short MATLAB sketch of this rule is given after Section 3.2.4 below).

3.2.4 Multiple Features

Recently, numerous methods that combine several facial features have been proposed to locate or detect faces. Most of them utilize global features such as skin color, size, and shape to find face candidates, and then verify these candidates using local, detailed features such as eyebrows, nose, and hair. A typical approach begins with the detection of skin-like regions. Next, skin-like pixels are grouped together using connected component analysis or clustering algorithms. If a connected region has an elliptic or oval shape, it becomes a face candidate. Finally, local features are used for verification. However, others have used different sets of features. Experimental results show that this method is able to detect faces at different orientations and with facial features such as beards and glasses.
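The fixed-range CbCr rule described in Section 3.2.3 can be written directly as a per-pixel test. The following is a minimal MATLAB sketch; the file name and the numeric thresholds Cr1, Cr2, Cb1, Cb2 are illustrative assumptions, not values taken from this report.

% Minimal sketch of fixed-range skin-tone classification in the CbCr plane.
% The file name and numeric thresholds below are illustrative assumptions only.
img   = imread('test.jpg');          % RGB input image (hypothetical file)
ycbcr = rgb2ycbcr(img);              % convert to YCbCr
Cb    = double(ycbcr(:,:,2));
Cr    = double(ycbcr(:,:,3));

Cr1 = 133; Cr2 = 173;                % assumed Cr range for skin tone
Cb1 = 77;  Cb2 = 127;                % assumed Cb range for skin tone

% A pixel is labeled skin if (Cr, Cb) falls inside both ranges.
skinMask = (Cr >= Cr1) & (Cr <= Cr2) & (Cb >= Cb1) & (Cb <= Cb2);
imshow(skinMask);                    % binary skin map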


3.3 Template Matching

In template matching, a standard face pattern (usually frontal) is manually predefined or parameterized by a function. Given an input image, the correlation values with the standard patterns are computed for the face contour, eyes, nose, and mouth independently. The existence of a face is determined based on the correlation values. This approach has the advantage of being simple to implement. However, it has proven to be inadequate for face detection since it cannot effectively deal with variation in scale, pose, and shape. Multiresolution, multiscale, subtemplates, and deformable templates have subsequently been proposed to achieve scale and shape invariance.

3.3.1 Predefined Templates

An early attempt to detect frontal faces in photographs was reported by Sakai et al. [7]. They used several subtemplates for the eyes, nose, mouth, and face contour to model a face. Each subtemplate is defined in terms of line segments. Lines in the input image are extracted based on greatest gradient change and then matched against the subtemplates. The correlations between subimages and contour templates are computed first to detect candidate locations of faces. Then, matching with the other subtemplates is performed at the candidate positions. In other words, the first phase determines the focus of attention or region of interest, and the second phase examines the details to determine the existence of a face. The idea of focus of attention and subtemplates has been adopted by later works on face detection.

Craw et al. presented a localization method based on a shape template of a frontal-view face (i.e., the outline shape of a face). A Sobel filter is first used to extract edges. These edges are grouped together to search for the template of a face based on several constraints. After the head contour has been located, the same process is repeated at different scales to locate features such as eyes, eyebrows, and lips. Later, Craw et al. described a localization method using a set of 40 templates to search for facial features and a control strategy to guide and assess the results from the template-based feature detectors.

Tsukamoto et al. presented a qualitative model for face pattern (QMF). In QMF, each sample image is divided into a number of blocks, and qualitative features are estimated for each block. To parameterize a face pattern, lightness and edgeness are defined as the features in this model. This blocked template is then used to calculate "faceness" at every position of an input image. A face is detected if the faceness measure is above a predefined threshold.


3.3.2 Deformable Templates

Yuille et al. used deformable templates to model facial features, fitting an a priori elastic model to facial features (e.g., eyes). In this approach, facial features are described by parameterized templates. An energy function is defined to link edges, peaks, and valleys in the input image to corresponding parameters in the template. The best fit of the elastic model is found by minimizing an energy function of the parameters. Although their experimental results demonstrate good performance in tracking non-rigid features, one drawback of this approach is that the deformable template must be initialized in the proximity of the object of interest.

3.4 Appearance-Based Methods

In contrast to template matching methods, where templates are predefined by experts, the templates in appearance-based methods are learned from examples in images. In general, appearance-based methods rely on techniques from statistical analysis and machine learning to find the relevant characteristics of face and nonface images. The learned characteristics are in the form of distribution models or discriminant functions that are subsequently used for face detection. Meanwhile, dimensionality reduction is usually carried out for the sake of computational efficiency and detection efficacy.

Many appearance-based methods can be understood in a probabilistic framework. An image or feature vector derived from an image is viewed as a random variable x, and this random variable is characterized for faces and nonfaces by the class-conditional density functions p(x|face) and p(x|nonface). Bayesian classification or maximum likelihood can be used to classify a candidate image location as face or nonface. Unfortunately, a straightforward implementation of Bayesian classification is infeasible because of the high dimensionality of x, because p(x|face) and p(x|nonface) are multimodal, and because it is not yet understood whether there are natural parameterized forms for p(x|face) and p(x|nonface). Hence, much of the work in appearance-based methods concerns empirically validated parametric and nonparametric approximations to p(x|face) and p(x|nonface). Another approach in appearance-based methods is to find a discriminant function (i.e., decision surface, separating hyperplane, threshold function) between the face and nonface classes. Conventionally, image patterns are projected to a lower dimensional space and then a discriminant function is formed (usually based on distance metrics) for classification, or a nonlinear decision surface can be formed using multilayer neural networks. Recently, support vector machines and other kernel methods have been proposed. These methods implicitly project patterns to a higher dimensional space and then form a decision surface between the projected face and nonface patterns.

3.4.1 Eigenfaces

An early example of employing eigenvectors in face recognition was the work of Kohonen, in which a simple neural network was demonstrated to perform face recognition for aligned and normalized face images. The neural network computes a face description by approximating the eigenvectors of the image's autocorrelation matrix. These eigenvectors were later known as Eigenfaces.

3.4.2 Neural Networks

Neural networks have been applied successfully in many pattern recognition problems, such as optical character recognition, object recognition, and autonomous robot driving. Since face detection can be treated as a two-class pattern recognition problem, various neural network architectures have been proposed. The advantage of using neural networks for face detection is the feasibility of training a system to capture the complex class-conditional density of face patterns. However, one drawback is that the network architecture has to be extensively tuned (number of layers, number of nodes, learning rates, etc.) to get exceptional performance. Feraud and Bernier [8] presented a detection method using autoassociative neural networks. The idea is that an autoassociative network with five layers is able to perform a nonlinear principal component analysis. One autoassociative network is used to detect frontal-view faces and another one is used to detect faces turned up to 60 degrees to the left and right of the frontal view. A gating network is also utilized to assign weights to frontal and turned face detectors in an ensemble of autoassociative networks.

3.4.3 Support Vector Machines

Support Vector Machines (SVMs) were first applied to face detection by Osuna et al. SVMs can be considered a new paradigm to train polynomial function, neural network, or radial basis function (RBF) classifiers. While most methods for training a classifier (e.g., Bayesian, neural networks, and RBF) are based on minimizing the training error, i.e., the empirical risk, SVMs operate on another induction principle, called structural risk minimization, which aims to minimize an upper bound on the expected generalization error. An SVM classifier is a linear classifier where the separating hyperplane is chosen to minimize the expected classification error of the unseen test patterns. This optimal hyperplane is defined by a weighted combination of a small subset of the training vectors, called support vectors. Estimating the optimal hyperplane is equivalent to solving a linearly constrained quadratic programming problem. However, the computation is both time and memory intensive. Osuna et al. developed an efficient method to train an SVM for large scale problems, and applied it to face detection. Based on two test sets of 10,000,000 test patterns of 19 x 19 pixels, their system has slightly lower error rates and runs approximately 30 times faster than the system by Sung and Poggio.

3.4.4 Naive Bayes Classifier

Schneiderman and Kanade [9] described a naive Bayes classifier to estimate the joint probability of local appearance and position of face patterns (subregions of the face) at multiple resolutions. They emphasize local appearance because some local patterns of an object are more unique than others; the intensity patterns around the eyes are much more distinctive than the pattern found around the cheeks. There are two reasons for using a naive Bayes classifier (i.e., assuming no statistical dependency between the subregions). First, it provides better estimation of the conditional density functions of these subregions. Second, a naive Bayes classifier provides a functional form of the posterior probability to capture the joint statistics of local appearance and position on the object. At each scale, a face image is decomposed into four rectangular subregions. These subregions are then projected to a lower dimensional space using PCA and quantized into a finite set of patterns, and the statistics of each projected subregion are estimated from the projected samples to encode local appearance. Under this formulation, their method decides that a face is present when the likelihood ratio is larger than the ratio of prior probabilities. The proposed Bayesian approach shows comparable performance and is able to detect some rotated and profile faces. Schneiderman and Kanade later extended this method with wavelet representations to detect profile faces and cars.

3.4.5 Information-Theoretical Approach

The spatial property of face patterns can be modeled through different aspects. The contextual constraint, among others, is a powerful one and has often been applied to texture segmentation. The contextual constraints in a face pattern are usually specified by a small neighborhood of pixels. Markov random field (MRF) theory provides a convenient and consistent way to model context-dependent entities such as image pixels and correlated features. This is achieved by characterizing mutual influences among such entities using conditional MRF distributions. According to the Hammersley-Clifford theorem, an MRF can be equivalently characterized by a Gibbs distribution, and the parameters are usually maximum a posteriori (MAP) estimates. Alternatively, the face and nonface distributions can be estimated using histograms.


4. COLOR MODELS
4.1 RGB

The RGB color model is an additive color model in which red, green, and blue light are added together in various ways to reproduce a broad array of colors. The name of the model comes from the initials of the three additive primary colors: red, green, and blue. The main purpose of the RGB color model is the sensing, representation, and display of images in electronic systems, such as televisions and computers, though it has also been used in conventional photography. Before the electronic age, the RGB color model already had a solid theory behind it, based on human perception of colors.

RGB is a device-dependent color space: different devices detect or reproduce a given RGB value differently, since the color elements (such as phosphors or dyes) and their response to the individual R, G, and B levels vary from manufacturer to manufacturer, or even in the same device over time. Thus an RGB value does not define the same color across devices without some kind of color management. Typical RGB input devices are color TV and video cameras, image scanners, and digital cameras. Typical RGB output devices are TV sets of various technologies (CRT, LCD, plasma, etc.), computer and mobile phone displays, video projectors, multicolor LED displays, and large screens such as the JumboTron. Color printers, on the other hand, are usually not RGB devices, but subtractive color devices (typically using the CMYK color model).


Figure 1: A representation of additive color mixing. Projection of primary color lights on a screen shows secondary colors where two overlap; the combination of all three colors in appropriate intensities makes white.

An RGB color space is any additive color space based on the RGB color model. A particular RGB color space is defined by the three chromaticities of the red, green, and blue additive primaries, and can produce any chromaticity that lies within the triangle defined by those primary colors. The complete specification of an RGB color space also requires a white point chromaticity and a gamma correction curve. An RGB color space can be easily understood by thinking of it as "all possible colors" that can be made from three colorants for red, green and blue. Imagine, for example, shining three lights together onto a white wall: one red light, one green light, and one blue light, each with dimmer switches. If only the red light is on, the wall will look red. If only the green light is on, the wall will look green. If the red and green lights are on together, the wall will look yellow. Dim the red light some and the wall will become more of a yellow-green. Dim the green light instead, and the wall will become more orange. Bringing up the blue light a bit will cause the orange to become less saturated and more white-ish. In all, each setting of the three dimmer switches will produce a different result, either in color or in brightness or both. The set of all possible results is the gamut defined by those particular color light bulbs. Swap out the red light bulb for one of a different brand that is slightly more orange, and there will be a slightly different gamut, since the set of all colors that can be produced with the three lights has changed.

An LCD display can be thought of as a grid of thousands of little red, green, and blue light bulbs, each with its own dimmer switch. The gamut of the display will depend on the three colors used for the red, green and blue lights. A wide-gamut display will have very saturated, "pure" light colors, and thus be able to display very saturated, deep colors.


Geometric representation

Figure 2: The RGB color model mapped to a cube. The horizontal x-axis shows red values increasing to the left, the y-axis shows blue increasing to the lower right, and the vertical z-axis shows green increasing towards the top. The origin, black, is hidden behind the cube.

Since colors are usually defined by three components, not only in the RGB model but also in other color models such as CIELAB and Y'UV, a three-dimensional volume is described by treating the component values as ordinary Cartesian coordinates in a Euclidean space. For the RGB model, this is represented by a cube using non-negative values within a 0-1 range, assigning black to the origin at the vertex (0, 0, 0), with intensity values increasing along the three axes up to white at the vertex (1, 1, 1), diagonally opposite black. An RGB triplet (r, g, b) represents the three-dimensional coordinate of the point of the given color within the cube, on its faces, or along its edges. This approach allows computation of the color similarity of two given RGB colors simply by calculating the distance between them: the shorter the distance, the higher the similarity.
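For illustration, here is a minimal MATLAB sketch of this distance-based similarity check; the two example colors are arbitrary assumptions:

% Sketch: Euclidean distance between two RGB colors as a similarity measure.
% The two example triplets below are arbitrary assumptions.
c1 = [0.90 0.20 0.15];                 % a reddish color, components in [0,1]
c2 = [0.85 0.30 0.10];                 % a similar reddish color
d  = norm(c1 - c2);                    % Euclidean distance in the RGB cube
fprintf('RGB distance = %.3f (smaller means more similar)\n', d);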

4.2 HSV
HSV (Hue, Saturation and Value) defines a type of color space. It is similar to the modern RGB and CMYK models. The HSV color space has three components: hue, saturation and value. 'Value' is sometimes substituted with 'brightness', and the model is then known as HSB. The HSV model was created by Alvy Ray Smith in 1978. HSV is also known as the hex-cone color model.

True color pictures (i.e., 24-bit bitmaps, or 24/32-bit SVGA screen modes) are based on RGB tri-pixels. Each pixel has three components (Red, Green and Blue) that each have 256 possible values, giving a total of 16,777,216 possible colors. Controlling this huge number of possibilities is almost impossible without the HSV codification, which provides an intuitive method for color selection:

1. The blend of the three components is defined by a single parameter called "Hue".
2. The "Saturation" parameter selects how grey or pure the color will be.
3. The "Value" parameter defines the brightness of the color.

Saturation: Saturation indicates the range of gray in the color space. It ranges from 0 to 100%. Sometimes the value is calculated from 0 to 1. When the value is 0, the color is gray, and when the value is 1, the color is a primary color. A faded color is due to a lower saturation level, which means the color contains more gray.

Value: Value is the brightness of the color and varies with color saturation. It ranges from 0 to 100%. When the value is 0, the color space is totally black. As the value increases, the brightness of the color space rises and shows various colors.

The HSV color wheel is used to pick the desired color. Hue is represented by the circle in the wheel. A separate triangle is used to represent saturation and value. The horizontal axis of the triangle indicates value and the vertical axis represents saturation. When you need a particular color for your picture, first you pick a color from the hue (the circular region), and then from the vertical axis of the triangle you select the desired saturation. For brightness, you select the desired value from the horizontal axis of the triangle.

Sometimes the HSV model is illustrated as a cylindrical or conical object. When it is represented as a conical object, hue is represented by the circular part of the cone. The cone is usually represented in three-dimensional form. Saturation is calculated using the radius of the cone and value is the height of the cone. A hexagonal cone can also be used to represent the HSV model. The advantage of the conical model is that it is able to represent the HSV color space in a single object. Due to the two-dimensional nature of computer interfaces, the conical model of HSV is best suited for selecting colors for computer graphics.


The application of the cylindrical model of the HSV color space is similar to the conical model, and calculations are done in a similar way. Theoretically, the cylindrical model is the most accurate form of HSV color space calculation. In practical use, however, it is not possible to distinguish between saturation and hue when the value is lowered. The cylindrical model has lost its relevance due to this, and the cone shape is preferred over it.

Transformation between HSV and RGB

From RGB to HSV: Let MAX equal the maximum of the (R, G, B) values, and MIN equal the minimum of those values.
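The conversion formulas themselves did not survive in this copy of the report. The following MATLAB sketch shows one common max/min-based formulation of the RGB-to-HSV conversion, written as an assumption standing in for the missing equations, with R, G, B and the resulting S, V in [0, 1] and H in degrees:

% Sketch of a common RGB-to-HSV conversion (one standard formulation, assumed).
% Inputs r, g, b are scalars in [0,1]; outputs: h in degrees, s and v in [0,1].
function [h, s, v] = rgb_to_hsv_sketch(r, g, b)
    MAX = max([r g b]);
    MIN = min([r g b]);
    v = MAX;                                  % value is the largest component
    if MAX == 0
        s = 0;                                % black: saturation undefined, set to 0
    else
        s = (MAX - MIN) / MAX;                % saturation
    end
    if MAX == MIN
        h = 0;                                % gray: hue undefined, set to 0
    elseif MAX == r
        h = mod(60 * (g - b) / (MAX - MIN), 360);
    elseif MAX == g
        h = 60 * (b - r) / (MAX - MIN) + 120;
    else % MAX == b
        h = 60 * (r - g) / (MAX - MIN) + 240;
    end
end

In practice, MATLAB's built-in rgb2hsv performs an equivalent conversion, with H scaled to [0, 1] rather than degrees.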


HSV to RGB:

Given H in [0, 360), S in [0, 1] and V in [0, 1], let

Hi = floor(H / 60) mod 6
f  = H / 60 - floor(H / 60)
p  = V * (1 - S)
q  = V * (1 - f * S)
t  = V * (1 - (1 - f) * S)

Then:

Hi = 0:  R = V, G = t, B = p
Hi = 1:  R = q, G = V, B = p
Hi = 2:  R = p, G = V, B = t
Hi = 3:  R = p, G = q, B = V
Hi = 4:  R = t, G = p, B = V
Hi = 5:  R = V, G = p, B = q

In computer graphics, it is typical to represent each channel as an integer from 0 to 255 instead of a real number from 0 to 1. It is worth noting that when encoded in this way, every possible HSV color has an RGB equivalent. However, the inverse is not true: certain RGB colors have no integer HSV representation. In fact, only 1/256th of the RGB colors are 'available' in HSV, effectively eliminating a single channel of control from the graphics artist.

4.3 YCbCr:
YCbCr or Y'CbCr is a family of color spaces used as a part of the color image pipeline in video and digital photography systems. Y' is the luma component and Cb and Cr are the blue-difference and red-difference chroma components. The prime (') on the Y distinguishes the luma from luminance, meaning that light intensity is non-linearly encoded using gamma. Y'CbCr is not an absolute color space; it is a way of encoding RGB information. The actual color displayed depends on the RGB colorants used to display the signal. Therefore a value expressed as Y'CbCr is only predictable if standard RGB colorants are used.


Figure 3: YCbCr color representation

Cathode ray tube displays are driven by red, green, and blue voltage signals, but these RGB signals are not efficient as a representation for storage and transmission, since they have a lot of mutual redundancy. YCbCr and Y'CbCr are a practical approximation to color processing and perceptual uniformity, where the primary colors corresponding roughly to red, green and blue are processed into perceptually meaningful information. By doing this, subsequent image/video processing, transmission and storage can do operations and introduce errors in perceptually meaningful ways. Y'CbCr is used to separate out a luma signal (Y') that can be stored with high resolution or transmitted at high bandwidth, and two chroma components (Cb and Cr) that can be bandwidth-reduced, subsampled, compressed, or otherwise treated separately for improved system efficiency. One practical example is decreasing the bandwidth or resolution allocated to "color" compared to "black and white", since humans are more sensitive to the black-and-white information.

Consider a color image and its Y, Cb and Cr components. The Y image is essentially a grayscale copy of the main image; white snow is represented as a middle value in both Cr and Cb; a brown barn is represented by weak Cb and strong Cr; green grass is represented by weak Cb and weak Cr; and blue sky is represented by strong Cb and weak Cr. YCbCr is sometimes abbreviated to YCC. Y'CbCr is often called YPbPr when used for analog component video, although the term Y'CbCr is commonly used for both systems, with or without the prime. Y'CbCr is often confused with the YUV color space, and typically the terms YCbCr and YUV are used interchangeably, leading to some confusion; when referring to signals in video or digital form, the term "YUV" mostly means "Y'CbCr".

Y'CbCr signals (prior to the scaling and offsets that place the signals into digital form) are called YPbPr, and are created from the corresponding gamma-adjusted RGB (red, green and blue) source using two defined constants Kb and Kr as follows:

YPbPr (analog version of Y'CbCr) from R'G'B':

Y' = Kr * R' + (1 - Kr - Kb) * G' + Kb * B'
Pb = 0.5 * (B' - Y') / (1 - Kb)
Pr = 0.5 * (R' - Y') / (1 - Kr)

with R', G', B' in [0, 1], Y' in [0, 1], Pb in [-0.5, 0.5], and Pr in [-0.5, 0.5],

where Kb and Kr are ordinarily derived from the definition of the corresponding RGB space. (The equivalent matrix manipulation is often referred to as the "color matrix.") Here, the prime (') symbols mean gamma correction is being used; thus R', G' and B' nominally range from 0 to 1, with 0 representing the minimum intensity (e.g., for display of the color black) and 1 the maximum (e.g., for display of the color white). The resulting luma (Y) value will then have a nominal range from 0 to 1, and the chroma (Cb and Cr) values will have a nominal range from -0.5 to +0.5. The reverse conversion process can be readily derived by inverting the above equations.
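As an illustration of the equations above, here is a minimal MATLAB sketch that converts a gamma-corrected R'G'B' triplet to Y'PbPr and then to 8-bit Y'CbCr. The BT.601 constants Kr = 0.299 and Kb = 0.114 and the example pixel are assumptions (the report does not fix particular values); the 16/128 offsets with 219/224 scaling follow the 8-bit convention discussed in the next paragraph.

% Sketch: gamma-corrected R'G'B' (each in [0,1]) -> analog Y'PbPr -> 8-bit Y'CbCr.
% Kr and Kb are assumed BT.601 values; other standards use different constants.
Kr = 0.299;  Kb = 0.114;
Rp = 0.6; Gp = 0.4; Bp = 0.2;              % example R', G', B' values (assumed)

Yp = Kr*Rp + (1 - Kr - Kb)*Gp + Kb*Bp;     % luma Y' in [0,1]
Pb = 0.5 * (Bp - Yp) / (1 - Kb);           % Pb in [-0.5, 0.5]
Pr = 0.5 * (Rp - Yp) / (1 - Kr);           % Pr in [-0.5, 0.5]

% 8-bit digital form: Y' spans 16..235, Cb and Cr span 16..240 (see next paragraph).
Y8  = round(16  + 219 * Yp);
Cb8 = round(128 + 224 * Pb);
Cr8 = round(128 + 224 * Pr);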


When representing the signals in digital form, the results are scaled and rounded, and offsets are typically added. For example, the scaling and offset applied to the Y' component per specification result in the value 16 for black and the value 235 for white when using an 8-bit representation. The standard has 8-bit digitized versions of Cb and Cr scaled to a different range of 16 to 240. Consequently, rescaling by the fraction (235-16)/(240-16) = 219/224 is sometimes required when doing color matrixing or processing in YCbCr space, resulting in quantization distortion when the subsequent processing is not performed using higher bit depths. The scaling that results in the use of a smaller range of digital values than what might appear desirable for representation of the nominal range of the input data allows for some "overshoot" and "undershoot" during processing without necessitating undesirable clipping. This "head-room" and "toe-room" has also been proposed to be used for extension of the nominal color gamut. The form of Y'CbCr that was defined for standard-definition television use in the ITU-R BT.601 (formerly CCIR 601) standard for use with digital component video is derived from the corresponding RGB space.

Derivation of Conversion Equations

To generate the luminance (Y, or gray value) component, biometric experiments were employed to measure how the human eye perceives the intensities of the red, green and blue colors. Based on these experiments, optimal values for coefficients CA and CB were determined, such that:

Y = CA*R + (1 - CA - CB)*G + CB*B        ...Equation 1

Actual values for CA and CB differ slightly in different standards. Conversion from the RGB color space to luminance and the chrominance (differential color) components R - Y and B - Y could be described with Equation 2.


Coefficients CA and CB are chosen between 0 and 1, which guarantees that the range of Y is constrained between the maximum and minimum RGB values permitted, RGBmax and RGBmin respectively. The minimum and maximum values of R-Y are:

min(R-Y) = RGBmin - (CA*RGBmin + (1 - CA - CB)*RGBmax + CB*RGBmax) = (CA - 1)*(RGBmax - RGBmin)
max(R-Y) = RGBmax - (CA*RGBmax + (1 - CA - CB)*RGBmin + CB*RGBmin) = (1 - CA)*(RGBmax - RGBmin)

Thus, the range of R-Y is:

(CA - 1)*(RGBmax - RGBmin) <= R-Y <= (1 - CA)*(RGBmax - RGBmin)        ...Equation 3

Similarly, the minimum and maximum values of B-Y are:

min(B-Y) = RGBmin - (CA*RGBmax + (1 - CA - CB)*RGBmax + CB*RGBmin) = (CB - 1)*(RGBmax - RGBmin)
max(B-Y) = RGBmax - (CA*RGBmin + (1 - CA - CB)*RGBmin + CB*RGBmax) = (1 - CB)*(RGBmax - RGBmin)

Thus, the range of B-Y is:

(CB - 1)*(RGBmax - RGBmin) <= B-Y <= (1 - CB)*(RGBmax - RGBmin)        ...Equation 4

In most practical implementations, the ranges of the luminance and chrominance components should be equal. There are two ways to accomplish this: the chrominance components (B-Y and R-Y) can be normalized (compressed and offset compensated), or values above and below the luminance range can be clipped. Both clipping and dynamic range compression result in loss of information; however, the introduced artifacts are different. To leverage differences in the input (RGB) range, different standards choose different trade-offs between clipping and normalization. The RGB to YCrCb color space conversion core facilitates both range compression and optional clipping and clamping. Range, offset, clipping and clamping levels are parameterizable. The core supports conversions that fit the following general form:


STEPS FOR COLOR BASED APPROACH :

Eigenimage Generation: A set of eigenimages was generated using 106 face images which were manually cropped from 7 test images and edited in Photoshop to catch the exact location of the faces with a square shape. The cropped images were converted to gray scale, and the eigenimages were then computed using those 106 images. In order to get a generalized shape of a face, the 10 largest eigenimages, in terms of their energy densities, were obtained.

Building Eigenimage Database: In order to save the time needed to magnify or shrink an eigenimage to meet the size of the test image, a group of eigenimages was stored in a database so that an appropriately sized eigenimage can be retrieved with ease, without going through an image enlarging or shrinking process. The eigenimages were stored in 20 files, from a 30-pixel-wide square image to a 220-pixel-wide square image, in 10-pixel increments.

Test Image Selection: After the color-based segmentation process, the skin-colored areas can be separated out, giving a binary image; from this, a set of small test images needs to be selected and passed to the image matching algorithm for further processing. The result of image selection is based solely on the color information. A square box is applied on each segment, with the window size quantified to meet the size of a face.

Unwanted box rejection: Many non-face areas are also selected, such as hands and areas having colours similar to the skin colour.

Thresholding: As faces generally do not appear near the very bottom of the image, lower boxes (around 100 pixels from the bottom of the image) were discarded.


5. SKIN SEGMENTATION
Beginning with a color image, the first stage is to transform it into a skin-likelihood image. This involves transforming every pixel from the RGB representation to a chroma representation and determining the likelihood value based on the skin model used. The skin-likelihood image is a gray-scale image whose gray values represent the likelihood of the pixel belonging to skin. In the resulting skin-likelihood image, the skin regions (like the face, the hands and the arms) are observed to be brighter than the non-skin regions. It is important to note that the detected regions may not necessarily correspond to skin; it is only reliable to conclude that the detected regions have the same color as skin. The adaptive segmentation process can reliably point out regions that do not have the color of skin, and such regions need not be considered any further in the face finding process.

Since the skin regions are brighter than the other parts of the image, the skin regions can be segmented from the rest of the image through a thresholding process. Because different people have different skin and therefore different likelihood values, a fixed threshold that works for every image cannot be found; an adaptive thresholding process is required to achieve the optimal threshold value for each run. The adaptive thresholding is based on the observation that stepping the threshold value down increases the segmented region. However, the increase in segmented region gradually decreases (as the percentage of skin regions detected approaches 100%), but then increases sharply when the threshold value becomes so small that other non-skin regions get included. The threshold value at which the minimum increase in region size is observed while stepping down the threshold is taken as the optimal threshold. Using this technique of adaptive thresholding, many images yield good results: the skin-colored regions are effectively segmented from the non-skin colored regions. Not all the detected skin regions contain faces, however; some correspond to the hands, arms and other exposed parts of the body, while some correspond to objects with colors similar to those of skin. Hence further steps, such as template matching, are adopted to detect the facial regions. A small sketch of the adaptive threshold search appears below.
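The following MATLAB fragment is a minimal sketch of the adaptive threshold search described above, assuming a skin-likelihood image likely_skin with values normalized to [0, 1]; the step size and search range are illustrative assumptions rather than the exact values used by the report's segment_adaptive() function.

% Sketch of adaptive threshold selection on a normalized skin-likelihood image.
% likely_skin is assumed to contain values in [0,1]; range and step are assumptions.
likely_skin = rand(200);                    % placeholder standing in for a real likelihood image
thresholds = 0.55:-0.05:0.05;               % candidate thresholds, stepped downward
counts = zeros(size(thresholds));
for k = 1:numel(thresholds)
    counts(k) = sum(likely_skin(:) >= thresholds(k));   % segmented pixel count
end
increase = diff(counts);                    % growth of the segmented region at each step
[~, idx] = min(increase);                   % step with the minimum increase
opt_th = thresholds(idx + 1);               % optimal threshold
binary_skin = likely_skin >= opt_th;        % final binary skin map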


6. MORPHOLOGICAL OPERATIONS
Mathematical morphology (MM) is a theory and technique for the analysis and processing of geometrical structures, based on set theory, lattice theory, topology, and random functions. MM is most commonly applied to digital images, but it can be employed as well on graphs, surface meshes, solids, and many other spatial structures. Topological and geometrical continuous-space concepts such as size, shape, convexity, connectivity, and geodesic distance can be characterized by MM on both continuous and discrete spaces. MM is also the foundation of morphological image processing, which consists of a set of operators that transform images according to the above characterizations. MM was originally developed for binary images, and was later extended to grayscale functions and images. The subsequent generalization to complete lattices is widely accepted today as MM's theoretical foundation.

Binary morphology

In binary morphology, an image is viewed as a subset of a Euclidean space R^d or the integer grid Z^d, for some dimension d.

Structuring Element

The basic idea in binary morphology is to probe an image with a simple, pre-defined shape, drawing conclusions on how this shape fits or misses the shapes in the image. This simple "probe" is called the structuring element, and is itself a binary image (i.e., a subset of the space or grid). Here are some examples of widely used structuring elements (denoted by B):

Let E = R^2; B is an open disk of radius r, centered at the origin.
Let E = Z^2; B is a 3x3 square, that is, B = {(-1,-1), (-1,0), (-1,1), (0,-1), (0,0), (0,1), (1,-1), (1,0), (1,1)}.
Let E = Z^2; B is the "cross" given by B = {(-1,0), (0,-1), (0,0), (0,1), (1,0)}.


Basic operators

The basic operations are shift-invariant (translation-invariant) operators strongly related to Minkowski addition. Let E be a Euclidean space or an integer grid, and A a binary image in E.

Erosion:

figure 4: The erosion of the dark-blue square by a disk, resulting in the light-blue square

The erosion of the binary image A by the structuring element B is defined by

A ⊖ B = { z ∈ E : Bz ⊆ A },

where Bz is the translation of B by the vector z, i.e., Bz = { b + z : b ∈ B }, for z ∈ E.

When the structuring element B has a center (e.g., B is a disk or a square), and this center is located on the origin of E, then the erosion of A by B can be understood as the locus of points reached by the center of B when B moves inside A. For example, the erosion of a square of side 10, centered at the origin, by a disc of radius 2, also centered at the origin, is a square of side 6 centered at the origin. The erosion of A by B is also given by the expression

A ⊖ B = ∩ { A−b : b ∈ B },

where A−b denotes the translation of A by the vector −b.


Dilation:

figure 5: The dilation of the dark-blue square by a disk, resulting in the light-blue square with rounded corners.

The dilation of A by the structuring element B is defined by

A ⊕ B = ∪ { Bz : z ∈ A }.

The dilation is commutative, and is also given by A ⊕ B = B ⊕ A = ∪ { Az : z ∈ B }.

If B has a center on the origin, as before, then the dilation of A by B can be understood as the locus of the points covered by B when the center of B moves inside A. In the above example, the dilation of the square of side 10 by the disk of radius 2 is a square of side 14, with rounded corners, centered at the origin. The radius of the rounded corners is 2.

The dilation can also be obtained by A ⊕ B = { z ∈ E : (Bs)z ∩ A ≠ ∅ }, where Bs denotes the symmetric of B, that is, Bs = { x ∈ E : −x ∈ B }.


Opening:

figure 6: The opening of the dark-blue square by a disk, resulting in the light-blue square with round corners.

The opening of A by B is obtained by the erosion of A by B, followed by dilation of the resulting image by B:

A ∘ B = (A ⊖ B) ⊕ B.

The opening is also given by A ∘ B = ∪ { Bz : Bz ⊆ A }, which means that it is the locus of translations of the structuring element B inside the image A. In the case of the square of side 10, and a disc of radius 2 as the structuring element, the opening is a square of side 10 with rounded corners, where the corner radius is 2.

Closing:

figure 7: The closing of the dark-blue shape (union of two squares) by a disk, resulting in the union of the dark-blue shape and the light-blue areas.


The closing of A by B is obtained by the dilation of A by B, followed by erosion of the resulting structure by B:

A • B = (A ⊕ B) ⊖ B.

The closing can also be obtained by A • B = (Ac ∘ Bs)c, where Xc denotes the complement of X relative to E (that is, Xc = { x ∈ E : x ∉ X }). The above means that the closing is the complement of the locus of translations of the symmetric of the structuring element outside the image A.

Properties of the basic operators

Here are some properties of the basic binary morphological operators (dilation, erosion, opening and closing):

They are translation invariant.
They are increasing, that is, if A ⊆ C, then A ⊖ B ⊆ C ⊖ B, A ⊕ B ⊆ C ⊕ B, etc.
The dilation is commutative.
If the origin of E belongs to the structuring element B, then A ⊖ B ⊆ A ⊆ A ⊕ B.
The dilation is associative, i.e., (A ⊕ B) ⊕ C = A ⊕ (B ⊕ C). Moreover, the erosion satisfies (A ⊖ B) ⊖ C = A ⊖ (B ⊕ C).
Erosion and dilation satisfy the duality (A ⊕ B)c = Ac ⊖ Bs.
Opening and closing satisfy the duality (A ∘ B)c = Ac • Bs.
The dilation is distributive over set union. The erosion is distributive over set intersection.
The dilation is a pseudo-inverse of the erosion, and vice versa, in the following sense: A ⊆ C ⊖ B if and only if A ⊕ B ⊆ C.
Opening and closing are idempotent.
The opening is anti-extensive, i.e., A ∘ B ⊆ A, whereas the closing is extensive, i.e., A ⊆ A • B.
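A minimal MATLAB sketch of how these operators are typically applied to a binary mask (this illustrates the operators themselves; the structuring-element size and the placeholder mask are arbitrary assumptions, not the values used later in the procedure):

% Sketch: basic morphological operators applied to a binary mask BW.
% The mask and the disk radius below are arbitrary illustrative values.
BW      = rand(200) > 0.7;           % placeholder standing in for a binary skin mask
se      = strel('disk', 5);          % disk-shaped structuring element
eroded  = imerode(BW, se);           % erosion: shrinks regions, removes small specks
dilated = imdilate(BW, se);          % dilation: grows regions, fills small gaps
opened  = imopen(BW, se);            % opening  = erosion followed by dilation
closed  = imclose(BW, se);           % closing  = dilation followed by erosion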


7. PROCEDURE
The following are the steps used for face detection using template matching.

1. Building a skin model: The first step in making the skin model is to take 50 skin sample images, making sure that all the samples are cropped to the same size of 15x15 and that only pure skin areas are included (they should not include the nose, eyes or lips). These sample images are converted from RGB to the YCbCr color space and only the Cb and Cr components are considered, while the luminance Y is discarded. The sample mean for both the Cb and Cr components and the covariance matrix are computed.

2. Skin Segmentation: After the skin model is produced, the test images have to be segmented. Before the image is skin-segmented, a skin likelihood value (varying from 0 to 1) is computed for each pixel in the test image by computing the Mahalanobis distance from the mean. The skin likelihood values are then normalized and a gray scale image is obtained, where whiter areas have a higher probability of being skin areas than the darker areas. The next step is to place a threshold so that pixels having skin likelihood values less than the threshold are discarded. The adaptive threshold method is used to keep as many pixels as possible, since this stage is the beginning of the whole system. After the optimal threshold value has been set, all the pixels which have likelihood values higher than the threshold are set to 1 and the rest of the pixels are set to 0, thus resulting in a binary image.

3. Morphological Operation: After obtaining a binary image with 1s representing skin pixels and 0s representing non-skin pixels, morphological operations such as filtering, erosion and dilation are applied to separate skin areas which are loosely connected. Morphological erosion is applied using a disk-shaped structuring element of size 10.

4. Region Labeling: The binary image obtained as a result of the morphological operation needs to be labeled so that each clustered group of pixels can be identified as a single region, and each region can then be analyzed further to determine whether it is a face region or not; that is, instead of 1 and 0, each region is labeled as 1, 2, 3, and so on. Pixels with zero values remain unchanged.

5. Euler Test: This stage processes one region at a time. The objective of this stage is to reject regions which have no holes, under the assumption that a face region will contain at least one hole:

Number of holes = 1 - Euler number

After this stage, only those regions that have at least one hole are left for consideration in the next stage.

6. Aspect Ratio Test: In this step, the aspect ratio of each region is computed. Any region with an aspect ratio found unlikely to be a face is rejected. Considering that the face and neck regions may be connected, the aspect ratio is assumed to lie between 1 and 4.5.

7. Template Matching: This is the final stage of face detection, where the cross-correlation between the template face and a grayscale region is obtained by multiplying the binary image region with the grayscale original image. The template face is an average frontal face of 30 people. In this stage, the width, height, orientation and centroid of the binary region under consideration have to be computed. Then the template face image is resized, rotated and its centroid is placed on the centroid of the region in the original grayscale image with only one region in it. The rotated template needs to be cropped properly and its size needs to be the same as that of the region. Then the cross-correlation between the region and the template is calculated. The threshold value is initially selected to be a lower value so that small faces in the image can be detected; in order to detect larger faces in the image, the threshold value is then reset to an upper value. After again going through the above steps of detection, once the upper threshold value is reached and the larger faces have been detected, the procedure terminates. A sketch of the skin-likelihood computation used in steps 1 and 2 is given below.
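As an illustration of steps 1 and 2, here is a minimal MATLAB sketch of a Gaussian (Mahalanobis-distance-based) skin-likelihood computation in the CbCr plane. It is an assumed form of the model, written independently of the report's make_model() and get_likelyhood() functions in the appendix; the sample data and file name are placeholders.

% Sketch: Gaussian skin-likelihood in the CbCr plane (assumed model form).
skin_samples = [120 150] + 5*randn(500,2);   % placeholder for [Cb Cr] values from the 50 skin samples
m = mean(skin_samples);                      % sample mean [Cb Cr]
C = cov(skin_samples);                       % 2x2 covariance matrix

img   = imread('test.jpg');                  % hypothetical test image
ycbcr = double(rgb2ycbcr(img));
Cb = ycbcr(:,:,2);  Cr = ycbcr(:,:,3);

d1 = Cb - m(1);  d2 = Cr - m(2);             % deviations from the mean
Ci = inv(C);
% Squared Mahalanobis distance, evaluated per pixel.
md2 = Ci(1,1)*d1.^2 + 2*Ci(1,2)*d1.*d2 + Ci(2,2)*d2.^2;
likely_skin = exp(-0.5 * md2);               % likelihood in (0, 1]
likely_skin = likely_skin / max(likely_skin(:));   % normalize to [0, 1]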


8. ALGORITHM:
Step 1: The first step is to build a skin model. The function named make_model() is called in the main function face_detection. This function calls another function, get_crcb(). get_crcb() takes small skin sample images as input. Each skin sample consists of pure skin area, excluding the nose, eyes and lips. The images are converted from RGB to YCbCr, and the Cb and Cr components of all 50 such skin samples are considered. The mean of all Cb values is stored in the variable bmean and the mean of the Cr values in rmean. The matrix rcov stores the covariance of the Cr and Cb values.

Step 2: The second step is to compute the skin likelihood for the test image. The function get_likelyhood() is called, which takes the test image, rmean, bmean and rcov as input arguments. The image is converted to YCbCr, and the Cb and Cr values for each pixel are obtained. The matrix likely_skin, of the same size as the test image and initially containing only zeros, stores the skin likelihood values that are computed. The skin likelihood values are normalized, and a grayscale image is obtained where whiter regions have a higher probability of being skin areas than the darker regions.

Step 3: After the skin likelihood values have been calculated, the next step is to segment the test image based on a threshold range. The function segment_adaptive() is called, which takes the likely_skin matrix as its input. This matrix is the output of the function called in the second step. A matrix temp of the same size as the input matrix is created. The threshold range is set from 0.55 down to 0.01, with 0.1 as the step size. After the optimal threshold value has been set, the pixels having skin likelihood values greater than the optimal value are set to 1 and the others are set to 0. The values are stored in the matrix binary_skin.

Step 4: After segmenting the test image into skin areas, we obtain the binary_skin matrix as our output image. The next function called is label_regions(), which takes the above matrix as its input.
A matrix filledBW, created with the same size as binary_skin, stores binary_skin after its holes have been filled using imfill(). Morphological erosion is applied on filledBW using a disk structuring element of size 8, followed by dilation using a disk of size 6. To retain the holes, the dilated image is multiplied with the skin-segmented image binary_skin and stored as dilateBW. Labeling of the connected components in this binary image is done using the predefined function bwlabel(). labelBW is the matrix containing the labels for all connected regions, and num stores the number of such connected regions. The output image then shows the various regions in different colors, with the background in black.

Step 5: In this step, the function euler_test() is called. The input for this step is erodedBW. Using regionprops, the Euler number is calculated for each labeled region as eulers. The number of holes for each region is calculated as holes = 1 - eulers. Then region_index returns the indices of those regions having at least one hole. The output of this step is eulerBW.

Step 6: In the main program, the function aspect_test() is called, which takes as input eulerBW, the output of the previous step. This step is used to determine whether the aspect ratio of each region lies within the range expected for a face region. The input image is filled with holes and dilated. Then the function get_aspect() is called, where the major-axis length and minor-axis length of each region are computed as major_length and minor_length respectively. The aspect ratio, aspect_ratio, is determined as the ratio of major_length to minor_length. The regions having this ratio between 1 and 4.5 are taken into consideration as face regions. The output of this step is aspectBW.

Step 7: Here, the function template_test() is called. The inputs are aspectBW, the original image and the template. The original image is converted to grayscale. The orientation angle and centroid are computed for each region using regionprops. In this step, the template image is resized, rotated and cropped according to the region properties. First, the original grayscale image is generated with only one face region on it. The width and height of the region, regw and regh, are determined using the region property boundingbox. The ratio of height to width is computed; if this ratio is too large, it is set to a reasonable height. A new image is generated having the same size as the original one, but with the face of the model on it, rotated accordingly. Lastly, cross-correlation between the model and the region is performed. If the cross-correlation value is greater than 0.6, the region is deemed to pass this step, represented as template passed. The detected faces on the original image are finally marked with *. A sketch of the resize/rotate/cross-correlation step is given below.
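The following is a minimal sketch of how the cross-correlation in step 7 can be computed with standard MATLAB functions. The use of imresize/imrotate/corr2 and the 0.6 threshold follow the description above, but the template file name, the region variable, and the simplified cropping logic are assumptions rather than the report's exact implementation.

% Sketch: match an average-face template against one candidate region.
% region_gray is assumed to be the grayscale image of one candidate region,
% and 'template.jpg' is a hypothetical file containing the average frontal face.
template = im2double(rgb2gray(imread('template.jpg')));
region   = im2double(region_gray);

stats = regionprops(region > 0, 'Orientation');        % orientation of the region
t = imrotate(template, -stats(1).Orientation, 'bilinear', 'crop');
t = imresize(t, size(region));                         % match the region's size

c = corr2(t, region);                                  % 2-D correlation coefficient
if c > 0.6
    disp('Region passes the template matching test');
end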


9. EXPERIMENTAL RESULTS:

Figure 8: Test image for face detection

Figure 9: Black isolated hole rejection


Figure 10: Small regions eliminated image

Figure 11: Edges detected by the Roberts cross operator


Figure 12: Integrated binary image


10. EXPERIMENTAL RESULTS OF TEMPLATE MATCHING:


Example 1:

Example 2:


Example 3:


11. CONCLUSION
A first step of any face processing system is detecting the locations in images where faces are present. However, face detection from a single image is a challenging task because of variability in scale, location, orientation (upright, rotated), and pose (frontal, profile). Our program is able to detect frontal faces with greater accuracy than side faces. Facial expression, occlusion, and lighting conditions also change the overall appearance of faces. The challenges observed in our face detection technique can be attributed to the following factors:

Pose: The images of a face vary with the relative camera-face pose (frontal, 45 degree, profile, upside down), and some facial features such as an eye or the nose may become partially or wholly occluded.

Presence or absence of structural components: Facial features such as beards, mustaches, and glasses may or may not be present, and there is a great deal of variability among these components, including shape, color, and size.

Facial expression: The appearance of faces is directly affected by a person's facial expression.

Occlusion: Faces may be partially occluded by other objects. In an image with a group of people, some faces may partially occlude other faces.

Image orientation: Face images vary directly for different rotations about the camera's optical axis.

Imaging conditions: When the image is formed, factors such as lighting (spectra, source distribution and intensity) and camera characteristics (sensor response, lenses) affect the appearance of a face.

We tested our program on about 20 sample images and found it to be reasonably fast, taking 80 to 120 seconds on average, depending on the internal downsampling rate applied to the input images and various other parameters. For example, in a test image containing 17 faces, the program detected 15 of them. All frontal faces with no beards, spectacles, etc. were detected with 100% accuracy. The overall detection accuracy obtained was in the range of 85% to 100%.


12. REFERENCES:
1. R. Gonzalez and R. Woods, Digital Image Processing, Second Edition, Prentice Hall, 2002.
2. R. Gonzalez and R. Woods, Digital Image Processing Using MATLAB, Prentice Hall.
3. R. L. Hsu, M. A. Mottaleb, and A. K. Jain, "Face Detection in Color Images," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, pp. 696-706, 2002.
4. S. A. Sirohey, "Human Face Segmentation and Identification," Technical Report CS-TR3176, Univ. of Maryland, 1993.
5. D. Chetverikov and A. Lerch, "Multiresolution Face Detection," Theoretical Foundations of Computer Vision, vol. 69, pp. 131-140, 1993.
6. M. F. Augusteijn and T. L. Skujca, "Identification of Human Faces through Texture-Based Feature Recognition and Neural Network Technology," Proc. IEEE Conf. Neural Networks, pp. 392-398, 1993.
7. T. Sakai, M. Nagao, and S. Fujibayashi, "Line Extraction and Pattern Detection in a Photograph," Pattern Recognition, vol. 1, pp. 233-248, 1969.
8. R. Feraud and O. Bernier, "Ensemble and Modular Approaches for Face Detection: A Comparison," Advances in Neural Information Processing Systems 10, M. I. Jordan, M. J. Kearns, and S. A. Solla, eds., pp. 472-478, MIT Press, 1998.
9. H. Schneiderman and T. Kanade, "Probabilistic Modeling of Local Appearance and Spatial Relationships for Object Recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 45-51, 1998.
10. Peer et al., "Segmentation and Tracking of Faces in Color Images," Proc. Second Int'l Conf. Automatic Face and Gesture Recognition, pp. 236-241, 1996.
11. Garcia and Tziritas, "Frontal-View Face Detection and Facial Feature Extraction Using Color, Shape and Symmetry Based Cost Functions," Pattern Recognition Letters, vol. 17, no. 8, pp. 669-680, 1998.
12. http://www.enggstudentsproject.com
13. http://www.mathworks.com
14. http://www.wikipedia.com


13. APPENDIX
SOURCE CODE FOR TEMPLATE MATCHING METHOD:

clc;
clear all;
close all;
inputfile=imread('79.jpg');
[rmean,bmean,rbcov]=make_model();
%convert input image into skin likelihood image
[likely_skin]=get_likelyhood(inputfile,rmean,bmean,rbcov);
%segment skin by adaptive threshold method
[skinBW,opt_th] = segment_adaptive(likely_skin);
%apply erosion, dilation and label the regions
[erodedBW]=label_regions(skinBW);
%apply Euler test, i.e. number of holes >= 1
[eulerBW]=euler_test(erodedBW);
%apply aspect ratio test
[aspectBW]=aspect_test(eulerBW);
%apply template matching test
[templateBW]=template_test(aspectBW,'79.jpg','2.jpg');
[K,P]=bwlabel(templateBW,8);

%compute centroid of each region and plot on original image
s = regionprops(bwlabel(templateBW), 'centroid');
centroids = cat(1, s.Centroid);
subplot(4,3,12);
imshow(imread('79.jpg'))
if(P>0)
    hold on
    plot(centroids(:,1), centroids(:,2),'r*')
    hold off
end
title('Final Detection')

%% BUILDING A SKIN MODEL TO OBTAIN rmean, bmean, rbcov
function [rmean,bmean,rbcov]=make_model()
%get chrominance values of skin sample images
[cr1, cb1] = get_crcb('sampleset/1.jpg');
[cr2, cb2] = get_crcb('sampleset/2.jpg');
[cr3, cb3] = get_crcb('sampleset/3.jpg');
[cr4, cb4] = get_crcb('sampleset/4.jpg');
[cr5, cb5] = get_crcb('sampleset/5.jpg');
[cr6, cb6] = get_crcb('sampleset/6.jpg');
[cr7, cb7] = get_crcb('sampleset/7.jpg');
[cr8, cb8] = get_crcb('sampleset/8.jpg');
[cr9, cb9] = get_crcb('sampleset/9.jpg');
[cr10, cb10] = get_crcb('sampleset/10.jpg');
[cr11, cb11] = get_crcb('sampleset/11.jpg');
[cr12, cb12] = get_crcb('sampleset/12.jpg');
[cr13, cb13] = get_crcb('sampleset/13.jpg');
...
[cr50, cb50] = get_crcb('sampleset/50.jpg');
%concatenate all values
cr = [cr1 cr2 cr3 cr4 cr5 cr6 cr7 cr8 cr9 cr10 cr11 cr12 cr13 ... cr50];
cb = [cb1 cb2 cb3 cb4 cb5 cb6 cb7 cb8 cb9 cb10 cb11 cb12 cb13 ... cb50];
%compute statistics of sample values
rmean = mean(cr);
bmean = mean(cb);
rbcov = cov(cr,cb);

%% Following function returns the chrominance values of an input image
function [cr, cb] = get_crcb(filename)
im = imread(filename);
% convert RGB to YCbCr
imycc = rgb2ycbcr(im);
% low pass filter matrix
lpf = 1/9 * ones(3);
% take Cr and Cb channels
cr = imycc(:,:,3);
cb = imycc(:,:,2);
% pass through low pass filter
cr = filter2(lpf, cr);
cb = filter2(lpf, cb);
%concatenate all rows
cr = reshape(cr, 1, prod(size(cr)));
cb = reshape(cb, 1, prod(size(cb)));

%% CALCULATING SKIN LIKELIHOOD REGION OF AN IMAGE
function [likely_skin]=get_likelyhood(filename,rmean,bmean,rbcov)
%read input file
img = filename;
% convert RGB to YCbCr color space
imycbcr = rgb2ycbcr(img);
[m,n,l] = size(img);
%create a 2D matrix with the same dimensions as the image
likely_skin = zeros(m,n);
for i = 1:m
    for j = 1:n
        %get chrominance values for each pixel
        cr = double(imycbcr(i,j,3));
        cb = double(imycbcr(i,j,2));
        % compute the likelihood of each pixel
        x = [(cr-rmean);(cb-bmean)];
        likely_skin(i,j) = power(2*pi*power(det(rbcov),0.5),-1)*exp(-0.5* x'*inv(rbcov)* x);
    end
end
%pass through low pass filter
lpf = 1/9*ones(3);
likely_skin = filter2(lpf,likely_skin);
%normalize the likelihood values by the maximum value
likely_skin = likely_skin./max(max(likely_skin));
%show skin likelihood grayscale image
subplot(4,3,3);
imshow(img, [0 1])
title('Original RGB Image')
subplot(4,3,4);
imshow(likely_skin, [0 1])
title('Skin Likelihood Image')
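
A note on the likelihood expression above (added for clarity, using the same variable names as the code): the value assigned to likely_skin(i,j) is the bivariate Gaussian density of the chrominance vector,

likely_skin(i,j) = (1 / (2*pi*sqrt(det(rbcov)))) * exp(-0.5 * x' * inv(rbcov) * x),   with x = [cr - rmean; cb - bmean],

so pixels whose (Cr, Cb) values lie close to the means of the skin samples receive a high likelihood, and the normalization by the maximum value maps the result into the range [0, 1].
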
%% ADAPTIVE SEGMENTATION TECHNIQUE
function [binary_skin,opt_th] = segment_adaptive(likely_skin)
%initialize
[m,n] = size(likely_skin);
temp = zeros(m,n);
diff_list = [];
%set threshold range and step size by experiment
high = 0.55;
low = 0.01;
step_size = -0.1;
bias_factor = 1;
indx_count = ((high-low)/abs(step_size)) + 2;
% finding optimal threshold
for threshold = high:step_size:low
    binary_skin = zeros(m,n);
    binary_skin(find(likely_skin>threshold)) = 1;
    diff = sum(sum(binary_skin - temp));
    diff_list = [diff_list diff];
    temp = binary_skin;
end
% optimal threshold is the threshold where the minimum diff occurs
[C, indx] = min(diff_list);
opt_th = (indx_count-indx)*abs(step_size)*bias_factor;
binary_skin = zeros(m,n);
binary_skin(find(likely_skin>opt_th)) = 1;
%show skin segmented binary image
subplot(4,3,5);
imshow(binary_skin, [0 1])
title('Skin Segmented Image')

%% LABELLING OF THE REGIONS
function [labelBW]=label_regions(binary_skin)
[m,n] = size(binary_skin);
%Fill the regions
filledBW = zeros(m,n);
filledBW = imfill(binary_skin,'holes');
%Apply Erosion
se2 = strel('disk',8);
erodedBW = zeros(m,n);
erodedBW = imerode(filledBW,se2);
subplot(4,3,6);
imshow(erodedBW)
title({'After Erosion';'(disk size: 8)'})
%Apply Dilation
se1 = strel('disk',6);
dilateBW = zeros(m,n);
dilateBW = imdilate(erodedBW,se1);
%multiply dilated image with skin segmented image to retain holes
dilateBW = immultiply(dilateBW,binary_skin);
subplot(4,3,7);
imshow(dilateBW)
title({'After Dilation';'(disk size: 6)'})
%Label skin regions
labelBW = zeros(m,n);
[labelBW,num] = bwlabel(dilateBW,8);
%color the labeled regions - background is black
color_regions = zeros(m,n);
color_regions = label2rgb(labelBW, 'hsv', 'black', 'shuffle');
%show colored labeled regions of image
subplot(4,3,8);
imshow(color_regions)
title({'Labeled Regions';['(',num2str(num),' regions)']})

%% PROGRAM TO CALCULATE EULER NO. TO OBTAIN FACE REGION
function [eulerBW]=euler_test(labelBW)
%compute Euler number of each region
e = regionprops(labelBW,'EulerNumber');
eulers = cat(1,e.EulerNumber);
% compute number of holes for each region
holes = 1-eulers;
%take regions which have at least 1 hole
region_index = find(holes>=1);
[m,n] = size(labelBW);
eulerBW = zeros(m,n);
% make new binary image only with regions which pass the euler test
for i=1:length(region_index)
    % Compute the coordinates for this region.
    [x,y] = find(bwlabel(labelBW) == region_index(i));
    % Get an image that only has this region, the rest is black
    bwsegment = bwselect(labelBW,y,x,8);
    eulerBW = eulerBW + bwsegment;
end
subplot(4,3,9);
imshow(eulerBW)
title({'After Euler Test';['(',num2str(length(region_index)),' regions)']})

%% Following function determines whether the region aspect ratio is within range of being a face region
function [aspectBW]=aspect_test(eulerBW)
[m,n] = size(eulerBW);
%fill holes in image
filledBW = imfill(eulerBW,'holes');
%Apply Dilation
se1 = strel('disk',3);
growBW = zeros(m,n);
growBW = imdilate(filledBW,se1);
%label as binary
[labels,num] = bwlabel(growBW,8);
[aspect_ratio] = get_aspect(labels);
%take regions which have aspect ratio within range
region_index = find(aspect_ratio<=4.5 & aspect_ratio>=1);
aspectBW = zeros(m,n);
% Make new binary image only with regions which pass the aspect ratio test
for i=1:length(region_index)
    % Compute the coordinates for this region.
    [x,y] = find(bwlabel(filledBW) == region_index(i));
    % Get an image that only has this region, the rest is black
    bwsegment = bwselect(filledBW,y,x,8);
    aspectBW = aspectBW + bwsegment;
end

subplot(4,3,10);
imshow(aspectBW)
title({'After Aspect Ratio Test';['(',num2str(length(region_index)),' regions)']})

%% COMPUTING THE ASPECT RATIO OF THE REGION TO BE CONSIDERED FACE
function [ratiolist] = get_aspect(inputBW)
%compute length of major axis (of ellipse) for each region
major = regionprops(inputBW,'MajorAxisLength');
major_length = cat(1,major.MajorAxisLength);
%compute length of minor axis (of ellipse) for each region
minor = regionprops(inputBW,'MinorAxisLength');
minor_length = cat(1,minor.MinorAxisLength);
%compute aspect ratio
ratiolist = major_length./minor_length;

%% TEMPLATE DESIGN
a = imread('1.bmp');
a = rgb2gray(a);
a = histeq(a);
a = imcrop(a,[0 0 100 100]);
a = imdivide(a,30);
file = dir('*.bmp');
for k = 2:30   % average over 30 sample face images (each image is divided by 30 above)
    r = imread(file(k).name);
    p = rgb2gray(r);
    h = histeq(p);
    h = imcrop(h,[0 0 100 100]);
    h = imdivide(h,30);
    a = imadd(h,a);

end
temp = imresize(a,[30 30]);
imshow(temp);

%% Following function returns image regions which pass template matching test
function [template_passed]=template_test(aspectBW,originalRGB,template)
%convert original image into grayscale
imgray = rgb2gray(imread(originalRGB));
imtemplate = imread(template);
%label the binary image with regions
[labels,num] = bwlabel(aspectBW,8);
[m,n] = size(aspectBW);
%compute orientation angle for each region
orient = regionprops(labels,'Orientation');
angles = cat(1,orient.Orientation);
%compute centroid for each region
c = regionprops(labels,'Centroid');
centroids = cat(1,c.Centroid);
%image with regions which pass template matching test
template_passed = zeros(m,n);
gray_matched = zeros(m,n);
%resize, rotate and crop the template image according to region properties
for j=1:num,
    % Compute the coordinates for this region.
    [x,y] = find(labels == j);
    % Get an image that only has this region, the rest is black
    bwsegment = bwselect(aspectBW,y,x,8);
    % Generate original gray scale with only one face region
    oneface = immultiply(bwsegment,imgray);
    %centroid of the region
    cx1 = centroids(j,1);
    cy1 = centroids(j,2);

    %width and height of the region
    p = regionprops(bwlabel(bwsegment),'BoundingBox');
    boxdim = cat(1,p.BoundingBox);
    regw = boxdim(3);
    regh = boxdim(4);
    ratio = regh/regw;
    %if region is too long, set to reasonable new height and shift centroid up
    if(ratio>1.6)
        regh = 1.5*regw;
        cy1 = cy1-(0.1*regh);
    end
    %resize the model with same scale as the region
    gmodel_resize = imresize(imtemplate,[regh regw],'bilinear');
    %rotate the resized model by the angle of orientation of the region
    if(angles(j)>0)
        gmodel_rotate = imrotate(gmodel_resize,angles(j)-90,'bilinear','loose');
    else
        gmodel_rotate = imrotate(gmodel_resize,90+angles(j),'bilinear','loose');
    end
    %computing the centroid and size of the model
    bwmodel = im2bw(gmodel_rotate,0);
    %size of the model before crop
    [g,h] = size(bwmodel);
    %ensure that the bw model has only one region
    bwmorphed = bwmorph(bwmodel,'clean');
    [L,no] = bwlabel(bwmorphed,8);
    if(no==1)
        bwsingle = bwmorphed;
    else
        ar = regionprops(bwlabel(bwmorphed),'Area');
        areas = cat(1,ar.Area);
        [C,I] = max(areas);
        % Compute the coordinates for this region.
        [x1,y1] = find(bwlabel(bwmorphed) == I);
        % Get an image that only has this region, the rest is black
        bwsingle = bwselect(bwmorphed,y1,x1,8);
    end
    %fill the model and crop - this option of regionprops crops automatically
    filledmodel = regionprops(bwlabel(bwsingle),'FilledImage');
    bwcrop = filledmodel.FilledImage;

    %size of the scaled and rotated model after crop
    [modh,modw] = size(bwcrop);
    %crop the grayscale model to same size as that of bwlabel model
    gmodel_crop = imresize(gmodel_rotate,[modh modw],'bilinear');
    %centroid of scaled, rotated and cropped model
    cenmod = regionprops(bwlabel(bwcrop),'Centroid');
    central = cat(1,cenmod.Centroid);
    cx2 = central(1,1);
    cy2 = central(1,2);
    mfit = zeros(size(oneface));
    mfitbw = zeros(size(oneface));
    [limy, limx] = size(mfit);

    % Compute the coordinates of where the face model is going to be in the main image
    startx = cx1-cx2;
    starty = cy1-cy2;
    endx = startx + modw-1;
    endy = starty + modh-1;
    % Check for boundaries of the image
    startx = checklimit(startx,limx);
    starty = checklimit(starty,limy);
    endx = checklimit(endx,limx);
    endy = checklimit(endy,limy);
    % The following generates a new image having the same size as the original one,
    % but with the face of the model on it, rotated accordingly.
    for i=starty:endy,
        for j=startx:endx,
            mfit(round(i),round(j)) = gmodel_crop(round(i-starty+1),round(j-startx+1));
        end;
    end;
    % Get the cross-correlation value between model and region
    gray_matched = gray_matched + mfit;
    crosscorr = corr2(mfit,oneface);
    %if cross-correlation value is higher than threshold, add the region
    if(crosscorr>=0.6)

        template_passed = template_passed + bwsegment;
    end;
    subplot(4,3,11);
    imshow(gray_matched,[0 255])
    title('Template Matching')
end;

%% Verifies that the coordinate is inside the image region.
function newcoord = checklimit(coord,maxval)
newcoord = coord;
if (newcoord<1)
    newcoord = 1;
end;
if (newcoord>maxval)
    newcoord = maxval;
end;

SOURCE CODE FOR COLOR BASED APPROACH:

function outFaces = faceDetection(img)
% Function faceDetection returns the matrix with the information of
% face locations and gender.
% outFaces = faceDetection(img)
% img: double formatted image matrix
% coefficients
effect_num = 3;
min_face = 170;
small_area = 15;
imgSize = size(img);
uint8Img = uint8(img);
gray_img = rgb2gray(uint8Img);
% get the image transformed through the YCbCr filter
filtered = YCbCrbin(img,161.9964,-11.1051,22.9265,25.9997,4.3568,3.9479,2);
% black isolated holes rejection
filtered = bwfill(filtered,'holes');
% white isolated holes less than small_area rejection
filtered = bwareaopen(filtered,small_area*10);
% first erosion
filtered = imerode(filtered,ones(2*effect_num));
% edge detection with the Roberts method with sensitivity 0.1
edge_img = edge(gray_img,'roberts',0.1);
% final binary edge image
edge_img = ~edge_img;
% integration of the two images, edge + filtered image
filtered = 255*(double(filtered) & double(edge_img)); % double
% second erosion
filtered = imerode(filtered,ones(effect_num));
% black isolated holes rejection
filtered = bwfill(filtered,'hole');
% small areas less than the minimum area of a face rejected
filtered = bwareaopen(filtered,min_face);
% group labeling in the filtered image
[segments, num_segments] = bwlabel(filtered);

YCbCrseg.m:

function result = YCbCrbin(RGBimage,meanY,meanCb,meanCr,stdY,stdCb,stdCr,factor)
% YCbCrbin returns a binary image with the skin-colored area white.
% Example:
% result = YCbCrbin(RGBimage,meanY,meanCb,meanCr,stdY,stdCb,stdCr,factor)
% RGBimage: double formatted RGB image
% meanY: mean value of Y of skin color
% meanCb: mean value of Cb of skin color
% meanCr: mean value of Cr of skin color
% stdY: standard deviation of Y of skin color
% stdCb: standard deviation of Cb of skin color
% stdCr: standard deviation of Cr of skin color
% factor: factor determines the width of the Gaussian envelope
% All the parameters are based on the training facial segments taken from 7 training images
YCbCrimage = rgb2ycbcr(RGBimage);
% set the range of Y, Cb, Cr
min_Cb = meanCb-stdCb*factor;
max_Cb = meanCb+stdCb*factor;
min_Cr = meanCr-stdCr*factor;
max_Cr = meanCr+stdCr*factor;
% min_Y = meanY-stdY*factor*2;
% get a desirable binary image with the acquired range
imag_row = size(YCbCrimage,1);
imag_col = size(YCbCrimage,2);
binImage = zeros(imag_row,imag_col);
Cb = zeros(imag_row,imag_col);
Cr = zeros(imag_row,imag_col);
Cb(find((YCbCrimage(:,:,2) > min_Cb) & (YCbCrimage(:,:,2) < max_Cb))) = 1;
Cr(find((YCbCrimage(:,:,3) > min_Cr) & (YCbCrimage(:,:,3) < max_Cr))) = 1;
binImage = 255*(Cb.*Cr);
result = binImage;
