Journal of Pattern Recognition Research 2 (2011) 166-174 Received November 3, 2010. Accepted May 24, 2011.

Perceptual Color Representation of the Face: Extracting the Color of Skin, Hair, and Eyes

Levente Sajó

[email protected]

Kornél Bertók

[email protected]

Attila Fazekas

[email protected]

University of Debrecen, Debrecen, Hungary

Abstract

A color representation of the face is presented to detect the color of the skin, eyes, and hair in high-resolution color images. A coarse segmentation of facial features is followed by color reduction, in which the high number of colors in the original image is replaced by a smaller color set. The color categories were defined based on human cognition using psychophysical experiments. Experimental results of dominant color assignment were in excellent agreement with test subjects who were asked to give their subjective categorization of facial feature colors.


Keywords: perceptual color model, face color representation, skin, eyes, hair, color detection

1. Introduction

In human-human communication the face conveys a lot of information. People are identified by their face, and it also has a strong effect on first impressions: we can recognize gender, estimate age, or deduce some cultural characteristics. Analyzing faces is becoming increasingly important in human-computer communication as well, and efficient face representation is key to any further analysis. From face detection, through face and facial feature tracking, to face classification problems (face recognition and gender, age, race, and facial expression detection), various face representations have been used, each with its advantages in its specific domain [10, 11, 1, 12].

In this paper we present a novel face representation for determining the color of facial features such as the skin, hair, and eyes. To cope with the numerous complicating external factors, such as varying lighting conditions and camera settings, the full color range of the segmented face image is reduced to color categories based on human cognition principles [3]. Such a representation of colors in face images makes it easier to extract the color of a given facial feature, and the extracted information can also be used to support further face analysis.

The colors of the facial features are determined in two steps. First, the skin, eyes, and hair are segmented in the image using only structural information (Section 2). Then, within the segmented regions, the huge number of colors in real color images is substituted by a smaller color set, which is used to determine the color of a given feature. The way the color model was defined to resemble human perception, and the color extraction method based on the model, are presented in Section 3. The solution was tested with automatic categorization of facial images and a color-based image retrieval application; the results are presented in Section 4.

© 2011 JPRR. All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or to republish, requires a fee and/or special permission from JPRR.


2. Facial Feature Segmentation

Color assignment to the skin, eyes, and hair starts with localizing the face in the image. Face detection in images has an extensive literature, the most relevant surveys being [10, 11]. Various approaches have been published, but probably the most successful is the one introduced by Viola and Jones in 2001 [13], which matches the accuracy of the other techniques but exceeds them in speed (approximately 15 times faster). A further advantage of the Viola and Jones algorithm is that the resulting face regions are centered on the eyes, so we obtain the approximate locations of the wanted features. These locations are used as starting points for the segmentation processes, which use only structural information. The result of this coarse segmentation is then refined by passing the output to the color extractor. The exact segmentation steps for each feature are presented in the following.

2.1 Skin segmentation

The aim here is to segment the skin from the other parts of the head and to determine the dominant color of the segmented pixels. We started from the assumption that the skin is the largest connected region of similar color in the face (the distance between the colors should be below an empirically chosen threshold). To determine the skin region we applied a region growing algorithm (a code sketch follows Figure 1). The algorithm collects the pixels belonging to the skin region and discards those parts where the skin is occluded (e.g. by a beard) or its color is changed by various lighting effects (shadows, gleams).

• Because of the various lighting effects and local 'valleys' on the skin, region growing uses several different starting points (see Figure 1(a)).
• In every iteration the color values of the outer border points are compared to the mean color of the region. Points whose distance from the mean is below 10 percent are added to the region.
• The resulting regions are merged, and the merged region is considered to be the skin region (see Figure 1(b)).

Fig. 1: Localizing the skin region: (a) the starting points, (b) the resulting mask of the region growing.
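The paper gives no implementation, so the following is a minimal sketch in Python with OpenCV and NumPy, not the authors' code. It grows pixel by pixel with a running mean rather than ring by ring as the text describes, the seed positions in `skin_mask` are guesses at the layout of Figure 1(a), and the 10 percent rule is interpreted as 10 percent of the maximum possible RGB distance. The face rectangle is assumed to come from OpenCV's Haar cascade detector, in the spirit of the Viola and Jones detector used here.

```python
import cv2
import numpy as np
from collections import deque

MAX_RGB_DIST = np.linalg.norm([255.0, 255.0, 255.0])

def grow_region(img, seed, rel_thresh=0.10, limit=None):
    """Grow a region of similar color from `seed` (x, y).

    A border pixel joins the region when its Euclidean color distance
    from the running mean color is below 10 percent of the largest
    possible RGB distance (our reading of the paper's threshold).
    `limit`, if given, is a mask the region may not leave.
    """
    h, w = img.shape[:2]
    mask = np.zeros((h, w), np.uint8)
    mask[seed[1], seed[0]] = 255
    mean = img[seed[1], seed[0]].astype(np.float64)
    count = 1
    frontier = deque([seed])
    while frontier:
        x, y = frontier.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nx < w and 0 <= ny < h) or mask[ny, nx]:
                continue
            if limit is not None and not limit[ny, nx]:
                continue
            color = img[ny, nx].astype(np.float64)
            if np.linalg.norm(color - mean) < rel_thresh * MAX_RGB_DIST:
                mask[ny, nx] = 255
                mean = (mean * count + color) / (count + 1)  # running mean
                count += 1
                frontier.append((nx, ny))
    return mask

def skin_mask(img, face_rect):
    """Merge regions grown from a few starting points spread over the
    face box; the exact seed layout is illustrative."""
    x, y, w, h = face_rect  # e.g. from cv2.CascadeClassifier.detectMultiScale
    merged = np.zeros(img.shape[:2], np.uint8)
    for fx, fy in ((0.35, 0.45), (0.65, 0.45), (0.5, 0.65), (0.5, 0.85)):
        merged |= grow_region(img, (x + int(w * fx), y + int(h * fy)))
    return merged
```

Growing pixel by pixel with a running mean approximates the paper's iteration over whole border rings; the two differ only near region boundaries.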

2.2 Eye segmentation

For determining the color of the eyes, the exact locations of the iris and the pupil must be found.


In the first step the approximate location of the eyes is determined: in the upper part of the head region, Viola and Jones detectors specially trained for eyes are used to localize the eye region (Figure 2(a)). Within this eye region, the following processing steps are performed (see the sketch after Figure 2):

• To eliminate noise (eyelashes, makeup), median filtering with an 11×11 mask is applied.
• Using the Z channel of the CIE XYZ color space representation [6] of the image gives a higher-contrast image than simply converting the image to grayscale (Figure 2(b)).
• Then contrast enhancement is applied (Figure 2(c)).
• In the grayscale image the iris and the pupil are usually darker than the other parts of the eye. First, histogram equalization is performed (Figure 2(d)).
• Then the result is thresholded to segment the iris and the pupil from the other parts of the eye (the threshold value we used was 20). Smaller noise is eliminated with erosion, using an 11×11 rectangular mask with the anchor in the center (Figure 2(e)).
• For separating the iris from the pupil, a circular mask is used (Figure 2(f)). Assuming that the dimensions of the rectangle retrieved by the Viola and Jones eye detector are W × H, the radius of the mask is calculated as R_mask = 0.8·H. The mask is placed at the center of the patch resulting from the previous step (Figure 2(g)).

Fig. 2: Localizing the iris: (a) Haar-detected image, (b) XYZ transform, (c) contrast-enhanced image, (d) histogram-equalized image, (e) thresholded image, (f) mask, (g) masked image.
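As an illustration, here is a minimal Python/OpenCV sketch of this pipeline. The parameter values (11×11 median filter, threshold 20, 11×11 erosion, circle of radius 0.8·H) are taken from the text; the contrast enhancement step is not specified in the paper, so a simple min-max stretch stands in for it.

```python
import cv2
import numpy as np

def locate_iris(eye_patch):
    """Rough iris/pupil mask following Section 2.2.

    `eye_patch` is the BGR sub-image returned by an eye cascade
    (e.g. cv2.CascadeClassifier with a Haar eye model).
    """
    H, W = eye_patch.shape[:2]
    # 1. suppress eyelashes / makeup
    smooth = cv2.medianBlur(eye_patch, 11)
    # 2. the Z channel of CIE XYZ gives better iris contrast than gray
    z = cv2.cvtColor(smooth, cv2.COLOR_BGR2XYZ)[:, :, 2]
    # 3. contrast enhancement (min-max stretch as a stand-in)
    z = cv2.normalize(z, None, 0, 255, cv2.NORM_MINMAX)
    # 4. histogram equalization
    z = cv2.equalizeHist(z)
    # 5. iris and pupil are the darkest parts: threshold at 20
    _, dark = cv2.threshold(z, 20, 255, cv2.THRESH_BINARY_INV)
    # 6. remove small noise with an 11x11 rectangular erosion
    dark = cv2.erode(dark, np.ones((11, 11), np.uint8))
    # 7. keep only pixels inside a circle of radius 0.8*H placed at
    #    the patch center
    circle = np.zeros_like(dark)
    cv2.circle(circle, (W // 2, H // 2), int(0.8 * H), 255, -1)
    return cv2.bitwise_and(dark, circle)
```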

2.3 Hair segmentation

Hair segmentation is the most challenging of the tasks mentioned above: hair is a non-rigid object, and the many different hair styles make localization more difficult.

• First, median filtering with a 7×7 mask is applied.


• Then, a half-moon shaped mask is placed on top of the skin region (Figure 3(a)). To determine its dimensions, the scale of the head is taken into account: assuming that W and H are the width and height of the rectangle retrieved by the Viola and Jones face detector, we use W_mask = 0.8·W and H_mask = 0.4·H.
• The region growing algorithm presented for skin segmentation is used to determine the hair region (see the sketch after Figure 3). In the first phase, region growing is limited to the mask; of 5 randomly selected starting points, the largest resulting region is selected. In the second phase, starting from the selected region, region growing is repeated over the whole image (Figure 3(b)).

Fig. 3: Localizing the hair region: (a) the mask, (b) the result of the region growing.
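A sketch of the two-phase procedure, reusing `grow_region` from the skin-segmentation sketch above. The paper only gives the mask dimensions (0.8·W by 0.4·H), so the ellipse-based half-moon shape, its exact placement, and the exclusion of already-detected skin pixels from the search zone are assumptions.

```python
import cv2
import numpy as np

def halfmoon_mask(shape, face_rect):
    """Half-moon shaped search zone sitting on top of the face box."""
    x, y, w, h = face_rect
    mask = np.zeros(shape[:2], np.uint8)
    wm, hm = int(0.8 * w), int(0.4 * h)
    cx, cy = x + w // 2, y  # midpoint of the top edge of the face box
    # upper half of an outer ellipse, minus a smaller inner one
    cv2.ellipse(mask, (cx, cy), (wm, hm), 0, 180, 360, 255, -1)
    cv2.ellipse(mask, (cx, cy), (wm // 2, hm // 2), 0, 180, 360, 0, -1)
    return mask

def hair_mask(img, face_rect, skin):
    """Phase 1: grow inside the half-moon from 5 random seeds and keep
    the largest region.  Phase 2: regrow it over the whole image."""
    img = cv2.medianBlur(img, 7)               # 7x7 median, as in the text
    zone = halfmoon_mask(img.shape, face_rect) & cv2.bitwise_not(skin)
    ys, xs = np.nonzero(zone)
    rng = np.random.default_rng()
    picks = rng.choice(len(xs), size=min(5, len(xs)), replace=False)
    best = max((grow_region(img, (xs[i], ys[i]), limit=zone) for i in picks),
               key=lambda m: int(m.sum()))
    ys2, xs2 = np.nonzero(best)                # restart from the winner
    return grow_region(img, (xs2[0], ys2[0]))
```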

3. Color Extraction

It is clear that in a search system used by humans, it is enough to use only a subset of the 16 million colors in the queries. A set of color categories has to be defined, and every color of the color space should be mapped to one of these categories. How this mapping is done is specified by the color model. For our color model we used Broek's protocol [3]. Its main idea is that color quantization should be done from the perspective of human cognition. It is a general protocol that describes how to select the color categories and how to determine the colors belonging to every category. Broek defined a general color categorization; in our case, three different categorizations were needed, one for each feature.

3.1 Defining the color categories

Following Broek's protocol, the first step is to define the color categories with a psychophysical experiment. The experiment had two phases. In the first phase the color categories had to be fixed: based on the ideas from [8, 2], 5 categories were selected for every feature. Due to the specific character of the categorization, the chosen colors do not cover the whole color space, so a sixth category was defined for invalid colors (see Table 1 for details). In the second phase, 40 subjects were asked to fill in a web questionnaire. The questionnaire had 40 pages; every page contained a photo of a frontal face (at 800×600 pixel resolution), and the subjects had to classify the color of the skin, eyes, and hair into one of the categories.


Fig. 4: The segmentation steps of the HI plane in the case of hair colors: (a) color marker points, (b) convex hull, (c) distance transform, and (d) the segmented plane.

Skin                    Eyes          Hair
very light              blue          blonde
light                   grey          auburn
intermediate            green         brown
dark or "brown"         brown         black or dark brown
very dark or "black"    dark brown    gray or white
invalid                 invalid       invalid

Table 1: Color categories

The photos were collected from various face databases [4, 5], and some of them were gathered from the internet using Google image search. The answers to the questionnaire were evaluated, which resulted in a few color examples for every category. Since the specific appearance of some people led to contradictory votes, the answers had to be divided into two groups: questions where at least 50% of the subjects voted the same way were considered significant examples and were used in the further processing steps (a filtering sketch follows below), while the rest, with ambiguous votes, were dropped from the database. The data obtained were used to define a color-marker database in which color values are mapped to categories.
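A small sketch of the majority-vote filter in Python. The data layout (`responses` mapping photo/feature pairs to vote lists, `colors` mapping the same keys to measured RGB values) is hypothetical; only the 50 percent rule comes from the text.

```python
from collections import Counter

def build_color_markers(responses, colors, threshold=0.5):
    """Keep photos where at least half of the subjects agree and map
    the measured color to the winning category."""
    markers = []  # (rgb, feature, category) triples
    for key, votes in responses.items():
        category, n = Counter(votes).most_common(1)[0]
        if n / len(votes) >= threshold:   # significant example
            markers.append((colors[key], key[1], category))
        # ambiguous votes: the photo is dropped from the database
    return markers
```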


3.2 Segmenting the color space

Since the color-marker table contains only a few value-category pairs, for the complete model the whole color space has to be segmented using the color-marker table as base points. For this segmentation the idea from [3] is used. Instead of doing the segmentation in the RGB color space, it is done in two 2D projections of the HSI color space, because HSI is perceptually more intuitive and gives a simpler segmentation. The RGB values of the items in the color-marker table are transformed into the HSI space. The HSI color space can be seen as a cylinder where intensity is the central axis, hue is the angle, and saturation is the distance from the axis. In this space, colors can be separated into two main groups: colors around the intensity axis with low saturation values are the achromatic colors, and the rest are the chromatic colors. Using this partition, color segmentation can be done in two steps. First, projecting the 3D color space onto the SI plane (ignoring the hue axis), the achromatic colors can be separated from the chromatic colors, and the segmentation of the achromatic colors can be done in this plane, too. Then the segmentation of the chromatic colors is done in the HI plane (ignoring the saturation axis). The two 2D planes are segmented in the same way; the only difference is that in the first case the S and I coordinates, and in the second case the H and I coordinates, of the color values are taken from the color-marker table. Plane segmentation is done in three steps:

• First, taking the elements from the color-marker table, grouped clouds of data points are formed in these planes.
• Second, the convex hull of each group is calculated, and the inner points of the hull are added to the corresponding group. As a representation, a binary image is used in which an element is 1 if it belongs to any of the groups and 0 otherwise.
• Finally, a distance mapping is applied to classify all the points of the plane into one of the groups. For distance mapping, the chamfer distance transformation was used [7]. The binary image is filtered in two passes with a mask (see Figure 5), which gives a grayscale image where the intensity valleys represent the borders between the groups (a code sketch follows Figure 5).

Fig. 5: The mask used in the chamfer distance transformation.
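The plane segmentation can be sketched as follows, assuming each plane is discretized to a 256×256 grid and each category's markers are given as 2D integer points. The 3-4 chamfer weights implement the two-pass filtering with the mask of Figure 5; assigning every cell to the group with the smallest chamfer distance to its filled hull reproduces the borders at the intensity valleys. This is an illustrative reading, not the authors' code.

```python
import cv2
import numpy as np

def chamfer_distance(binary):
    """Two-pass 3-4 chamfer distance to the nonzero pixels of `binary`
    (weight 3 for edge neighbours, 4 for diagonal ones)."""
    INF = 10**6
    d = np.where(binary > 0, 0, INF).astype(np.int64)
    h, w = d.shape
    for y in range(h):                       # forward pass
        for x in range(w):
            if y > 0:
                d[y, x] = min(d[y, x], d[y-1, x] + 3)
                if x > 0:     d[y, x] = min(d[y, x], d[y-1, x-1] + 4)
                if x < w - 1: d[y, x] = min(d[y, x], d[y-1, x+1] + 4)
            if x > 0:
                d[y, x] = min(d[y, x], d[y, x-1] + 3)
    for y in range(h - 1, -1, -1):           # backward pass
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y, x] = min(d[y, x], d[y+1, x] + 3)
                if x > 0:     d[y, x] = min(d[y, x], d[y+1, x-1] + 4)
                if x < w - 1: d[y, x] = min(d[y, x], d[y+1, x+1] + 4)
            if x < w - 1:
                d[y, x] = min(d[y, x], d[y, x+1] + 3)
    return d

def segment_plane(points_by_cat, size=256):
    """Assign every cell of a size x size plane (e.g. the HI plane,
    coordinates scaled to 0..size-1) to the nearest category."""
    dists = []
    for pts in points_by_cat:                   # pts: (N, 2) int points
        canvas = np.zeros((size, size), np.uint8)
        hull = cv2.convexHull(np.asarray(pts, np.int32))
        cv2.fillConvexPoly(canvas, hull, 255)   # hull interior joins group
        dists.append(chamfer_distance(canvas))
    return np.argmin(np.stack(dists), axis=0)   # label map of the plane
```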

The various steps of the plane segmentation process are demonstrated in Figure 4.


3.3 Color Extraction

The color of a feature is defined to be the dominant color, i.e. the color with the highest frequency within the segmented region, given by the maximum of the color histogram. Since the original image contains three color channels, this would mean a 3D color histogram with 16M bins, in which finding the global maximum is expensive. Instead of searching for the dominant color in the whole color space and then finding the corresponding color in the reduced category set, it is better to go the other way round: the colors of the original image are replaced by their corresponding category indices taken from the color lookup table. This gives a much smaller color histogram with only five elements. The color category with the highest frequency is considered to be the color of the feature (see Figure 6 and the sketch below).

Fig. 6: Reducing the number of colors and extracting the dominant color in the case of skin
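In code, the reduction and histogram lookup are only a few lines. The sketch below assumes the lookup table has already been built from the segmented planes; the container names are illustrative.

```python
import numpy as np

def dominant_category(pixels, lookup, n_categories=6):
    """`pixels`: (N, 3) array of RGB values inside the feature mask.
    `lookup`: dict mapping an RGB triple to its category index.
    Replacing each pixel by its category index turns the 16M-bin 3D
    histogram into a tiny 1D one, whose argmax is the feature color."""
    labels = [lookup[tuple(int(c) for c in p)] for p in pixels]
    counts = np.bincount(labels, minlength=n_categories)
    return int(np.argmax(counts))
```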

4. Results

                          1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
very light              .04  .08  .43  .34  .23  .11  .35    0  .20    0  .04  .23    0    0    0  .14  .03    0
light                   .47  .56  .28  .52  .26  .34  .19  .33  .61  .05  .54  .22    0    0    0  .36  .37    0
intermediate            .45  .34    0  .02  .08  .27    0  .50  .12  .25  .20    0  .01    0  .02  .24  .32    0
dark or "brown"         .01    0    0  .02  .02  .10    0  .15    0  .53    0    0  .27  .24  .29  .02  .23  .51
very dark or "black"      0    0    0  .02    0    0    0    0    0    0    0    0  .25  .67  .41    0    0  .46
blue                      0    0  .30  .02  .01  .01  .53  .01  .01    0    0    0    0    0    0    0  .02    0
grey                      0    0  .10  .01  .04  .01  .06    0  .01    0    0    0    0  .01    0  .01  .20    0
green                   .21  .06    0  .10  .04  .06    0  .24  .51  .09  .22  .66  .02  .23  .07  .58  .18  .06
brown                   .21  .77  .08  .59  .53  .38    0  .34  .39  .40  .30  .11  .12  .18  .22  .06  .46  .28
dark brown              .44  .09    0  .09  .19  .18    0  .10  .02  .33  .08    0  .18  .40  .37    0  .12  .58
blonde                    0    0  .10    0    0    0  .51  .23  .08    0  .05  .76    0  .40    0  .07    0    0
auburn                    0  .03  .76  .19  .05  .02  .02    0  .58    0    0    0    0  .03  .01    0    0    0
brown                     0  .48  .10  .53  .31  .17    0    0  .16  .02    0    0  .16    0  .04    0    0  .01
black or dark brown     .39  .46    0  .22  .17  .58    0    0    0  .58    0    0  .33    0  .87    0    0  .79
gray or white             0    0    0    0    0    0  .43  .65    0    0  .70  .23    0    0    0  .31 1.00    0

Table 2: Comparing the votes of the subjects and the output of the color extractor for the test images (columns; rows 1-5 are the skin categories, rows 6-10 the eye categories, and rows 11-15 the hair categories): green fields represent the category selected in the experiment, orange fields mark the category with the highest value returned by the color extractor, and blue fields mark where the two categories match.

The representation of colors presented above gives reliable color extraction, and based on the extracted colors, faces can be categorized. To test the validity of the automatic color extraction, the experiment used for defining the color categories was repeated, this time with 20 images. The subjects had to select the best-fitting color category for every facial feature. The same images were passed to the color extractor application, which calculated a value between 0 and 1 for every class. The votes of the subjects were evaluated and compared to the output of the color extractor; the relationship between the two sets of results is shown in Table 2. 75% of the color categories predicted by the color extractor match the categories selected by the subjects; if matches with neighboring categories are also accepted, the rate is 95%. The correlation between the two sets is 0.73 on average: 0.79 for skin, 0.61 for eyes, and 0.78 for hair. A sketch of this comparison follows below.
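The comparison itself is straightforward to reproduce. A possible sketch, assuming the subject votes and extractor outputs for one feature are stored as per-image score vectors (the array layout is an assumption):

```python
import numpy as np

def compare(votes, extractor):
    """`votes` and `extractor` are (n_images, n_classes) arrays holding
    the subjects' vote fractions and the extractor's 0..1 scores.
    Returns the exact match rate and the mean per-image correlation
    between the two score vectors."""
    exact = np.mean(np.argmax(votes, 1) == np.argmax(extractor, 1))
    corr = np.mean([np.corrcoef(v, e)[0, 1]
                    for v, e in zip(votes, extractor)])
    return exact, corr
```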


A color extractor based on the face-color representation presented in this paper can be used for automatic indexing in a content-based image retrieval system. For demonstration purposes, the color extraction method was attached to a web-based image retrieval system. The application has two pages. On the back page, users can upload images to the database; these are processed by the color extractor, and the colors of the facial features are automatically added to a meta-description file. On the front page, images from the database are shown according to queries: the system takes the color categories of the facial features in the search query and, based on the meta-descriptor, lists those faces which have the requested skin, eye, and hair color. The system can also be found on the web at http://www.inf.unideb.hu/ipgd/FaceColorSegmentation.

5. Conclusion

In this paper we have presented a face representation method for detecting the color of the skin, eyes, and hair. First, using structural information about the face, the features of interest are segmented. Then, within each segmented region, the dominant color is determined. For better applicability, a reduced number of colors is used: a color model defined from the perspective of human cognition describes the mapping between the whole color space and the feature-specific color categories. Such a representation of colors in face images makes it easier to extract the color of a given facial feature. Using the color extraction methods described above, an image retrieval system was constructed for demonstration purposes, which searches a face database using the colors of the skin, eyes, and hair as queries. Beyond the image search system, we intend to use the presented low-level feature extraction methods for race classification. We saw that color information alone is not robust enough to recognize race, but combining it with other feature extraction methods (shape of the head, distance of the eyes) and holistic learning-based methods could give good results.

Acknowledgments

We would like to express our gratitude to Mr. Tapio Seppänen, whose support and useful suggestions were a valuable help in completing this work. This research was done as part of the project titled "Human-Computer Communication Technologies" (id: TÁMOP 4.2.2-08/1/2008-0009).

References

[1] A.F. Abate, M. Nappi, D. Riccio, and G. Sabatino, 2D and 3D face recognition: A survey. Pattern Recognition Letters, vol. 28 (14), pp. 1885–1906, 2007.
[2] L. Bickford, The EyeCare Reports. URL: http://www.eyecarecontacts.com/eyecolor.html (accessed December 2009).
[3] E.L. van den Broek, Th.E. Schouten, and P.M.F. Kisters, Modeling human color categorization. Pattern Recognition Letters, vol. 29 (8), pp. 1136–1144, 2008.
[4] Caltech Face Database. URL: http://www.vision.caltech.edu/html-files/archive.html (accessed December 2009).
[5] CBCL Face Database, MIT Center for Biological and Computational Learning. URL: http://www.ai.mit.edu/projects/cbcl (accessed December 2009).
[6] CIE, Commission Internationale de l'Eclairage Proceedings, 1931. Cambridge University Press, Cambridge, 1931.
[7] R. Fabbri, L. Da F. Costa, J.C. Torelli, and O.M. Bruno, 2D Euclidean distance transform algorithms: A comparative survey. ACM Computing Surveys, vol. 40, no. 1, article 2, pp. 1–14, 2008.
[8] P. Frost, European hair and eye color - A case of frequency-dependent sexual selection? Evolution and Human Behavior, vol. 27, pp. 85–103, 2006.
[9] M. Grundland and N.A. Dodgson, Decolorize: Fast, contrast enhancing, color to grayscale conversion. Pattern Recognition, vol. 40 (11), pp. 2891–2896, 2007.
[10] E. Hjelmås and B.K. Low, Face detection: A survey. Computer Vision and Image Understanding, vol. 83, pp. 236–274, 2001.
[11] A. King, A Survey of Methods for Face Detection. 2003.
[12] C. Shan, S. Gong, and P.W. McOwan, Facial expression recognition based on Local Binary Patterns: A comprehensive study. Image and Vision Computing, vol. 27 (6), pp. 803–816, 2009.



[13] P. Viola and M. Jones, Robust real-time object detection. Technical Report CRL 2001/01, Cambridge Research Laboratory, 2001.
