AbstractFace detection is already incorporated in many YUV, and YCbCr [8], or a combination of them [9]. Also, the
biometrics and surveillance applications. Therefore, the reduction use of morphological ltering is considered in many cases for
of false detections is a priority in those systems. However, face noise-removal and to preserve the largest connected compo-
detection is still challenging. Many factors, such as pose variation
and complex backgrounds, contribute to false detections. Besides, nent which commonly represents the face candidates [10].
the delity of a true detection, measured by precision rate, In this paper, we propose an improvement on face detection
is a concern in content-based information retrieval. Following performance regarding the precision rate for weak face detec-
those issues, combinations of methods are developed focusing on tors by a skin detection post-processing. We considered here
balancing the trade-off between hit-rate and miss-rate. In this weak face detectors the ones that reach maximum detection
paper, we present an approach that improves face detection based
on a post-processing of skin features. Our method enhanced rate with a high false detection rate, for example, the Viola-
the performance of weak detectors using a straightforward and Jones face detector [11]. We adopted the post-processing
low complex skin percentage threshold constraint. Furthermore, combined approach which we threshold the ratio of skin pixels
we also present a statistical analysis comparing our approach over the non-skin pixels under an experimental constraint.
and two face detectors, under two different conditions for Furthemore, we chose to use the conventional RGB color
skin detection training, using a robust dataset for testing. The
experimental results showed a signicant drop in the number space, and we did not use any morphological ltering, which
of false positives, reducing in 53%, while the precision rate was usually have to ne-tune many parameters. Also, our approach
elevated in almost 5% when the Viola-Jones approach was used differs from what is used in the literature that is based on the
as face detector. number of pixels in the ROI and a ne-tuned parameter [12].
Index TermsFace detection, Skin detection, Performance Our main contributions in this work are: 1) an improvement
Improvement, Post-processing
of the precision rate for weak detectors using post-processing
skin features analysis with a simple constraint; 2) a thorough
I. I NTRODUCTION
experimental evaluation of our combination method for two
Among many popular topics in object detection, detecting different face detectors, using two different datasets.
faces is one of the most studied subjects of research in This paper is organized as follows. In Section II, we
computer vision. Many daily applications have face analysis review previous works that motivated our study. Next, we
as an important step, for instance: video surveillance, medical describe in details our method in Section III. Sequentially,
assistance, and human-computer interaction [1]. experiments and results to validate the method are reported in
Optimal hit rates with low missing detections is a hard Section IV. Finally, conclusions and future work are presented
task in face detection [1] [2]. Image rotation, pose change, in Section V.
complicated backgrounds, and other factors contribute to raise
the number of false positives [3]. To this end, combined II. R ELATED WORK
approaches have been widely applied, and the incorporation There is an extensive literature on face detection exploring
of skin features in face detection have achieved optimal skin features that is not covered in this section. We will men-
performance rates [4][6]. tion here the relevant papers that most inuenced our work [9],
In general, the combination of skin features with face [10], [12][19] which are classied into two major groups:
detection consists of two major approaches [7]: 1) pre-ltering pre-ltering and post-ltering approaches. More details can
or pre-processing and 2) post-ltering or post-processing skin be found in surveys such as Zafeiriou et al. [1].
analysis. In the rst method, for an input image, the face
detector is applied in regions which the human skin is already A. Pre-ltering approaches
segmented. The second approach uses skin information to Kovac et al [13] proposed a human skin color for face detec-
classify face candidates after they skin have been detected. tion. They used some heuristic rules for the skin classication.
In the last method, the skin detector works as a false positive Subsequently, regions that are not likely to be face candidates
corrector. are removed based on the geometric properties of a human
Usually, combined approaches to face detection are com- face. This work was the rst to explicitly dene boundary
monly done in many different color spaces, such as RGB, rules for the skin segmentation.
301
Fig. 1. Flowchart diagram of the proposed algorithm, the blank square indicates that no skin pixels were identied in the ROI.
detector is responsible for one feature, and the AdaBoost it represents a 3D histogram where the pixel intensity varies
algorithm is used to combine all weak detectors into an from black (RGB = 0, 0, 0) to white (RGB = 255, 255, 255).
effective detector. According to Kawulok et al. [31], the reduction of bins per
2) PICO Face Detector: This face detector is a modica- channel improves the algorithm speed and promotes better
tion of Viola-Jones, being responsible for scanning an image histogram models. Therefore, in this work, we adopted 64 bins
with a cascade of binary detectors at all reasonable positions per channel.
and scales [27]. Its code implementation is provided by the As previously stated, the skin and non-skin color models
authors in C/C++ languages4 . Each binary detector consists of are built using an image dataset. In our case, we used Jones
an ensemble of decision trees with pixel intensity comparisons & Rehg dataset [24] and ECU dataset [20]. Those models are
as binary tests in their internal nodes. The learning process computed once, and then they are used to detect skin for a
consists of a greedy regression tree construction procedure new input. Afterwards, a comparison between the image and
and a boosting algorithm. the corresponding mask is performed as follows: for each skin
color that appears in the image, an increment of one is done for
B. Skin Detection the correspondent bin at the skin histogram. The same process
For each input image, the skin detection module applies is repeated to non-skin colors in the non-skin histogram.
a pixel-wise classication. In this module, we built a skin The skin detection problem is formalized as following:
detector based on statistical color models proposed by Jones et given a set of images I = {i1 , i2 , . . . , im } and a function
al. [24]. The algorithm performs skin classication comparing , such that
the skin and non-skin color models under a threshold
constraint. Basically, given a value for , the skin detector :IH (1)
checks for every pixel if the amount of skin over the amount
of non-skin is greater or equal to . If so, it classies as where ij = {vk |0 vk 255}, and H = {x|0 x 1}.
human skin, if not it classies as non-human skin. The function computes the relative frequency distribution
Statistical color models are RGB histograms. The skin and (RFD), RF D = {fl |fl H, 0 l 63}, for all pixel values,
non-skin color models consist in RGB histograms calculated vk ij . Such that,
from a dataset of images, which contain labeled masks for skin
f
l 0
and non-skin pixels. Each histogram has 256 bins per channel,
resulting in a 16.7 million degrees of freedom (2563 ), thus,
l fl = 1
In order to classify a pixel value, vk , as being skin (or not
4 https://github.com/nenadmarkus/pico skin), one may apply the function , for each vk ij .
302
Fig. 2. Example of proposed algorithm application.
P ((vk ) = 1) 0 , otherwise.
303
where K is the quantity of image pixels, 1{.} is the indicator Finally, each experiment was conducted by the following
function, and is a small real number (e.g., 0.0000001) to procedures: 1) select a face detector, 2) select a dataset for
guarantee that the ratio will be a real number even when there skin detection training and a threshold value for , and 3)
is no pixel classied as non-skin. perform the skin percentage evaluation for all values. Then,
we compare the results of our face detector algorithm with the
IV. E XPERIMENTS AND R ESULTS original Viola-Jones and PICO face detectors. All the analysis
were done using a workstation equipped with an Intel Core i7-
In this section, we describe in subsection IV-A the details
3632QM 2.2 GHz (8 GB RAM) processor, Windows 10 Home
of how our experiments were conducted. In subsection IV-B,
(64 bits), and the codes were implemented in C++ language
we detail the experimental results with a discussion of the
using OpenCV5 library.
proposed method performance, and its statistical analysis.
A. Experimental Setup
We used the FDDB dataset [25] to evaluate the performance
of our method. This dataset contains 2,845 images with a total
of 5,171 faces, both grayscale and color images. The ground
truth is composed of manually annotated faces localized using
rectangular and ellipse regions. Furthermore, since our skin
detection uses RGB color space to generate the color models,
we modied the FDDB dataset to t our approach. Therefore,
we semi-automatically removed all the grayscale images in
the dataset and their correspondent annotated faces, resulting
in 2,793 images with a total of 5,060 faces.
The experiments were conducted by ne-tuning the parame-
ters of the modules from our algorithm. The tuned parameters Fig. 3. Precision comparison among our method, Viola-Jones, and PICO Face
detectors.
were: face detector, skin train dataset and its optimal , and
the values. For the Face Detection module, we used the
two following detectors: Viola-Jones and PICO, which were B. Results
described in Section III. We computed the ROC curve, TPR, FP, Precision or Positive
For the Skin Detection module, in the training phase, the Predictive Value (PPV), F1-score, and AUC of our experi-
experiments were carried out for two datasets, namely: Jones ments. Initially, we measured those rates and plot the curves
& Rehg dataset and ECU dataset. Jones & Rehg dataset for our method (OM) using the Viola-Jones, also shown as
originally contains 13,640 images with a total of 8,965 non- Haar or H in the Figure 3, and PICO as face detectors showed
skin images and 4,675 skin images, and the ECU dataset as P in Figure 3. Then, we compared the results without OM.
contains 4,000 images. In our work, we used 6,839 images The AUCs were measured for the best FP behaviors from OM
from Jones & Rehg dataset with a total of 4,511 non-skin in all cases were for equals to 40%. Thus, all curves were
images and 2,328 skin images and 2,000 images from the ECU limited to that number of FPs for this metric.
dataset to generate the models. We also had to experimentally We present the statistical results of our experiments in
compute the optimal thresholds for both datasets. The Table I. The rst row shows the reference results for the Viola-
threshold is based on the trade-off of the correct and false Jones and PICO face as face detectors without using OM.
detections from the ROC curve, which is done using different The other rows present the results for a specic skin dataset
images for the training and the testing phases. In other words, training and a specic value. For example, ECU-OM-5 is
to excerpt the optimal thresholds it is necessary to train an experiment conducted with ECU as a skin dataset training
and test the skin detector. Later, we ended up with = using our method with equals to 5%.
0.9 using Jones & Rehg dataset and = 1.3 using ECU The application of OM improved the PPV and reduced the
dataset. Afterwards, the changes in this module were to select number of FPs in all cases. The best values are in bold on the
a training dataset and a threshold value for the . Table I, and the best case scenario improved the Precision rate
For the Skin Percentage module, we used the following in 5% reducing in 201 the number of FPs, which represents
values for : 5%, 10%, 15%, 20% and 40%. Empirically, 53% of FPs that the original detector hit. The drawback of
we veried that the values above 40% drastically reduced our approach is that the improvement of the PPV reduced the
the number of positive detections. That occurred because, in TPR in the experiments. This happened because in the FDDB
general, the skin pixels from the detected face occupies less dataset there are very small an occluded annotated faces with
than 50% in the rectangular detection from the the Viola-Jones only a small part of the face visible in the image [18]. Due that,
and PICO face detectors. Thus, values above 40% were not
5 http://opencv.org/
considered.
304
TABLE I
E XPERIMENTAL S TATISTICAL R ESULTS .
when detected, these face candidates were too small compared PPV perspective, our method allows using a weak face detector
with the rectangular ROI drew around the face, resulting in with the same hit delity to a more robust face detector as the
skin percentage levels for much lower than the used in this PICO.
work. Therefore, they were rejected by the skin percentage
V. C ONCLUSION
evaluation.
Our method had the best AUC and F1-score along the In this paper, we introduced an improvement for face
experiments with the Viola-Jones, being very close to the detection exploring skin features post-processing. We remark
original detector when the PICO was used as face detector. that a simple constraint was used to analyze the inuence of
Thus, that conrms we had improved the performance of weak a threshold obtained by the ratio among quantities of skin and
detectors. non-skin pixels. Our experimental results showed a large drop
in the number of false positives, reducing in 53%, while the
The ROC curves are shown in Figures 4 and 5. These
precision rate was elevated in almost 5% when the Viola-Jones
curves show the performance of the detectors for different
was used as face detector. Furthermore, we noted that weak
face condence thresholds and the limited curves for the AUC
face detector algorithms, i.e. Viola-Jones, when submitted to
calculation, respectively. It is possible to see that when the
our method, had their precision rate greater than more robust
number of FP increases, the performance of OM is worse than
algorithms.
the regular face detectors. However, in the best case scenarios,
Additionally, we brought a signicant statistical analysis
when the Precision of OM is the greatest one, the behavior of
under different conditions for face detector and skin detectors,
the ROC curves looks like the same or have better performance
using complex datasets, such as the FDDB. For future works,
than the regular face detectors.
we suggest employing an illumination correction algorithm
In Figure 3 we present a bar plot comparing the Precision and the investigation of skin detection approach based on
of the face detectors with and without OM. Each block in robust features, such as Local Binary Patterns (LBP) [33].
the graph represents the comparison of ve rows for one face
detector from Table I, as follows: H-Jones-0.9 indicates that ACKNOWLEDGMENT
was used Viola-Jones as face detector with Jones & Rehg skin The authors would like to thanks the Laboratory of Com-
dataset training, with equals to 0.9. Also, each bar represents puter Perception (LPC) for supporting the implementation
a precision value for one using our algorithm. of this work in their facilities. Also, we thank the Federal
From Figure 3 we conclude that the PPV from the exper- University of Campina Grande (UFCG).
iments using OM with Viola-Jones, when compared with the R EFERENCES
reference values, were more effective than the experiments
[1] S. Zafeiriou, C. Zhang, and Z. Zhang, A survey on face detection
using OM with PICO. That happened because the Viola-Jones in the wild: Past, present and future, Computer Vision and Image
algorithm is less robust than the PICO, and therefore it is more Understanding, vol. 138, pp. 1 24, 2015.
sensitive to adjustments. Hence, the weak detectors had better [2] J. Galbally, S. Marcel, and J. Fierrez, Biometric antispoong methods:
A survey in face recognition, IEEE Access, vol. 2, pp. 15301552,
performance using OM again. 2014.
Another point to consider from the bar plot graph in Figure 3 [3] Q. Zhang, L. F. Zhou, W. S. Li, K. Ricanek, and X. Y. Li, Face detection
method based on histogram of sparse code in tree deformable model,
is the PPV value using Viola-Jones and equals to 40% is in 2016 International Conference on Machine Learning and Cybernetics
greater than the reference PICO Precision. Thereby, using the (ICMLC), vol. 2, July 2016, pp. 9961002.
305
HJones09 HJones13
0.65 0.65
True Positive Rate TPR
0.55 0.55
Haar Haar
0.5 OM5 0.5 OM5
OM10 OM10
OM15 OM15
0.45 0.45
OM20 OM20
OM40 OM40
0.4 0.4
0 20 40 60 80 100 0 20 40 60 80 100
False Postives FP False Postives FP
(a) H-Jones-09 (b) H-ECU-13
PJones09 PJones13
0.65 0.65
True Positive Rate TPR
0.55 0.55
PICO PICO
0.5 OM5 0.5 OM5
OM10 OM10
OM15 OM15
0.45 0.45
OM20 OM20
OM40 OM40
0.4 0.4
0 20 40 60 80 100 0 20 40 60 80 100
False Postives FP False Postives FP
(c) P-Jones-09 (d) P-ECU-13
Fig. 4. ROC curves comparing our method with the references. As it is possible to see, our method for different values of (SPT in the gure) shows
similar performance to the reference, except for = 40%
306
HJones09 HECU13
0.4 0.4
0.3 0.3
Haar Haar
OM5 OM5
0.2 0.2
OM10 OM10
OM15 OM15
0.1 OM20 0.1 OM20
OM40 OM40
0 0
0 50 100 150 0 50 100 150
False Postives FP False Postives FP
(a) H-Jones-09 (b) H-ECU-13
PJones09 PECU13
0.6 0.6
True Positive Rate TPR
0.4 0.4
Theory, Applications and Systems (BTAS), 2012 IEEE Fifth International Localizing parts of faces using a consensus of exemplars, IEEE
Conference on, Sept 2012, pp. 365370. Transactions on Pattern Analysis and Machine Intelligence, vol. 35,
[19] Y. Ban, S.-K. Kim, S. Kim, K.-A. Toh, and S. Lee, Face detection no. 12, pp. 29302940, Dec 2013.
based on skin color likelihood, Pattern Recognition, vol. 47, no. 4, pp. [27] N. Markus, M. Frljak, I. S. Pandzic, J. Ahlberg, and R. Forchheimer,
1573 1585, 2014. A method for object detection based on pixel intensity comparisons,
[20] S. L. Phung, D. Chai, and A. Bouzerdoum, Adaptive skin segmentation Computing Research Repository, vol. abs/1305.4537, march 2013.
in color images, in Multimedia and Expo, 2003. ICME 03. Proceed- [28] S. Kang, B. Choi, and D. Jo, Faces detection method based on skin
ings. 2003 International Conference on, vol. 3, July 2003, pp. III1736 color modeling, Journal of Systems Architecture, vol. 64, pp. 100
vol.3. 109, 2016, real-Time Signal Processing in Embedded Systems.
[21] W. R. Tan, C. S. Chan, P. Yogarajah, and J. Condell, A fusion approach [29] M. H. Yang, D. J. Kriegman, and N. Ahuja, Detecting faces in
for efcient human skin detection, IEEE Transactions on Industrial images: A survey, IEEE Transactions on Pattern Analysis and Machine
Informatics, vol. 8, no. 1, pp. 138147, Feb 2012. Intelligence, vol. 24, pp. 3458, January 2002.
[22] X. Wang, H. Xu, H. Wang, and H. Li, Robust real-time face detection [30] C. Papageorgiou, M. Oren, and T. Poggio, A general framework for
with skin color detection and the modied census transform, in Infor- object detection, International Conference on Computer Vision, pp.
mation and Automation, 2008. ICIA 2008. International Conference on, 555562, January 1998.
June 2008, pp. 590595. [31] M. Kawulok, J. Kawulok, and J. Nalepa, Spatial-based skin detection
[23] H. T. Ho and R. Chellappa, Pose-invariant face recognition using using discriminative skin-presence features, Pattern Recognition Let-
markov random elds, IEEE Transactions on Image Processing, vol. 22, ters, vol. 41, pp. 313, May 2014.
no. 4, pp. 15731584, April 2013. [32] T. Fawcett, An introduction to roc analysis, In: Pattern Recognition
[24] M. J. Jones and J. M. Rehg, Statistical color models with application Letters, vol. 27, pp. 861874, December 2005.
to skin detection, International Journal of Computer Vision, vol. 46, [33] T. Ojala, M. Pietikainen, and T. Maenpaa, Multiresolution gray-scale
pp. 8196, January 2002. and rotation invariant texture classication with local binary patterns,
[25] V. Jain and E. Learned-Miller, Fddb: A benchmark for face detection IEEE Transactions on Pattern Analysis and Machine Intelligence,
in unconstrained settings, University of Massachusetts, Amherst, Tech. vol. 24, no. 7, pp. 971987, 2002.
Rep. UM-CS-2010-009, 2010.
[26] P. N. Belhumeur, D. W. Jacobs, D. J. Kriegman, and N. Kumar,
307