Automated Hair Color Determination

Daniel S. Rosen¹ and Cambron Carter²
¹ Director, Imaging Technology, GumGum Inc., Santa Monica, CA, US
² Image Scientist, GumGum Inc., Santa Monica, CA, US

Abstract— The detection of human features utilizing computer vision techniques can provide significant information for exploitation of image content. Identification of human hair and its color is known to be of use for a variety of endeavors, including targeting advertisements for hair care products. The daily volume of imagery which must be processed for advertising, as well as the uncontrolled environments in which such images are typically captured, negates the use of semi-automated techniques. A method of automated hair color determination which achieves high accuracy is presented.

Keywords- segmentation; parameter estimation; expectation maximization; hair detection; heuristics

1 Introduction

There are two main pigments found in human hair (both being melanins): eumelanin, which gives color to brown or black hair, and pheomelanin, which produces the color in blonde or red hair. Thus, natural hair color will only occur as shades/combinations of blonde, red, brown and black, with shades of gray (all the way to white) occurring as melanin production slows and/or stops. While hair color may also be influenced by the optical effects of light reflecting off the surfaces of the different hair layers, natural hair will never contain blue or green. However, since white hair is devoid of pigment, white or gray hair may appear to have a bluish color due to the refraction of light.

The GumGum hair color determination algorithm (GHCD) was created for the purpose of intelligently targeting hair care products in advertisements. As such, "ground truth" is defined as what the average observer would consider the hair color to be, not necessarily what the person in the image states about their hair color. In addition, "unnatural" hair colors, such as blue, purple and green, are classified as unknown.

Hair detection has been performed based on analysis of a hair mask obtained by segmentation [1], and by segmentation using frequential and color analysis followed by matting [2]. While producing excellent results, the former technique lacks robustness, suffering reduced accuracy when the hair and background are non-uniform. The primary disadvantage of the latter technique is that it requires a manual seeding step. A technique has been developed utilizing Markov random fields [3], but this also requires seeding and further relies on careful training of a location prior model. Finally, a technique has been developed using active shapes based on training a hair shape model [4]. While yielding good results, this technique is susceptible to error in cases of large shape variation, which may be caused by lighting effects or image geometry.

2 Approach

The GHCD algorithm logic is shown in Figure 1.

Hair Color Identification Algorithm

1. Identify candidate facial region
2. Expand facial region and change area aspect ratio to be 1.0:1.2
3. Run Expectation Maximization (EM) algorithm to segment facial region
4. Filter segmented regions which have a low likelihood of containing hair, based on relative location to the detected face, area of the segmented region, etc.
5. Detect interest points and match against a series of hair template interest points
6. Create hair mask using skeletonization of segmented regions and interest points
7. Convert masked area to Hunter LAB color space
8. Exclude any pixels with values outside "natural" LAB triplets
9. Generate mean of remaining pixels in each channel: Pµ = (µL, µA, µB)
10. Assign hair color based on lookup of Pµ in a table of measured photometric hair color values

Figure 1. GHCD pipeline

First, a facial classifier such as that defined by Viola and Jones [5] is utilized to detect all faces within the image. In particular, a frontal face classifier [6] and a profile face classifier [7] are used to detect face candidates. It is understood that other techniques for locating a face, such as eye detection, may be substituted. Once the face has been located, an expanded area R (Figure 2) is created, centered at the center (Xc, Yc) of the face detection rectangle (white square in Figure 2). This increases the probability of including significant amounts of hair, given that human hair is attached to the human head. With the Viola/Jones detection rectangle as a baseline, we create the expanded search area R with height H and width W given by

H = Hv * 2.0    (1)

where Hv = height of the Viola/Jones rectangle, and

W = Wv * 1.67    (2)

where Wv = width of the Viola/Jones rectangle.
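As a minimal sketch, equations (1) and (2) amount to a rectangle expansion about the face center. The function name and coordinate convention below are illustrative assumptions, not part of GHCD:

```python
# Sketch of the search-area expansion in equations (1) and (2).
# The (x, y, w, h) rectangle is assumed to come from a Viola/Jones-style
# face detector; the function name and coordinate convention are illustrative.

def expand_face_region(x, y, w, h):
    """Return the expanded rectangle R, centered on the face center (Xc, Yc)."""
    xc, yc = x + w / 2.0, y + h / 2.0  # center of the detection rectangle
    H = h * 2.0    # equation (1)
    W = w * 1.67   # equation (2)
    return xc - W / 2.0, yc - H / 2.0, W, H

# A 100x100 face detected at (50, 40) expands to a ~167x200 search area
# centered on the same point; clamping to image bounds is omitted here.
rx, ry, rw, rh = expand_face_region(50, 40, 100, 100)
```

In practice the expanded rectangle would also be clipped against the image borders before segmentation.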

Figure 2. Input image with the detected face outlined.

The expanded search area R is then sent to an Expectation Maximization (EM) algorithm for segmentation. The EM algorithm is an iterative procedure for finding maximum likelihood estimates of parameters describing statistical processes in cases where the process depends on hidden random variables, i.e. missing/sparse data. Assuming a data point Z, we wish to ascertain the likelihood Z ∈ z, where z is representative of a class present within the given data. The EM algorithm iteratively alternates between an expectation step and a maximization step. The expectation step finds the expectation of the log-likelihood under the current parameter estimates, while the maximization step maximizes the expected log-likelihood produced in the expectation step. This process leapfrogs back and forth until converging to stable parameter estimates, which describe the statistical process. This procedure is straightforward provided p(X, Z | θ) takes on a closed form. For our purposes, the form of p is assumed to be a Gaussian Mixture Model (GMM), with θ representing the mean, μ, covariance, Σ, and mixing coefficient, α. Given the GMM form, each Expectation step determines the posterior probability based on the current μ, Σ, and α, and serves to estimate the probability that Z = z best represents the hidden (or missing) data given what we have observed. Each Maximization step updates μ, Σ, and α using the update equations. As is to be expected, the algorithm requires an initialization of θ (μ, Σ, and α) and will increase the log-likelihood of the observed data until a maximum is reached. The EM algorithm logic is shown in Figure 3.

EM Algorithm for Gaussian Mixture Model

1. Initialize parameter vector θ0 = {α1, …, αk, μ1, …, μk, Σ1, …, Σk}.
2. Begin EM: t = 0
3. While the log-likelihood of the observed data is increasing:
   a. E-step: compute the posterior probabilities given the current θt
   b. M-step: re-estimate parameters using the update equations
   c. Determine θt+1 based on the re-estimation for each Gaussian in the mixture model
4. θt = θt+1; t = t + 1; jump to 3a.

Figure 3. Summary of the EM Algorithm

The EM algorithm is slightly sensitive to initialization, as it may converge to a local maximum rather than the true, global maximum of the log-likelihood of the observed data. Figure 4 illustrates iteratively fitting GMMs with 1, 2, 3, 4 and 5 Gaussian profiles respectively to a raw, 1-channel histogram (representative of our observed data). By iteratively employing the EM algorithm, an image may be adequately partitioned into regions of interest, which

can be treated as a reliable, initial blind segmentation. EM generates a set of candidate memberships based on the number of Gaussians fitted to the parameter space. Experimentation has led to the choice of a GMM utilizing up to 7 Gaussians. Given subsequent segmentations from this procedure, we now wish to logically obtain those segmentations which best represent regions containing hair.
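As a rough illustration of the loop in Figure 3, the following is a minimal 1-D EM fit of a Gaussian mixture (numpy only). The quantile initialization, fixed iteration count, and k = 2 on synthetic data are simplifying assumptions for this sketch; GHCD itself uses up to 7 components:

```python
import numpy as np

# Minimal 1-D EM loop for a Gaussian mixture, mirroring Figure 3.
# Quantile initialization and a fixed iteration count are simplifications.

def em_gmm_1d(x, k, iters=100):
    n = x.size
    alpha = np.full(k, 1.0 / k)                    # mixing coefficients
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread initial means over the data
    var = np.full(k, x.var())
    for _ in range(iters):
        # E-step: posterior responsibility of each Gaussian for each point
        pdf = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = alpha * pdf
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate alpha, mu, var from the responsibilities
        nk = resp.sum(axis=0)
        alpha, mu = nk / n, (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return alpha, mu, np.sqrt(var)

# Two well-separated intensity clusters, e.g. "hair" vs "background" pixels;
# a pixel's segment label is the index of its most responsible Gaussian.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(40, 5, 500), rng.normal(180, 10, 500)])
alpha, mu, sigma = em_gmm_1d(x, k=2)
```

For image segmentation the same loop runs over pixel values, and each pixel is assigned to the component with the highest posterior responsibility.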

Figure 4. Iteratively fitting 1, 2, 3, 4, and 5 Gaussians to some observed data using EM.

Accepting a segmented region as a possible hair candidate begins with narrow-banding around an approximated ellipse, which is assumed to outline the facial region. This follows as a direct result of face detection. Using this narrow band in combination with a connected-component labeling of the segmented region, candidates are taken as those connected components which intersect the narrow band. These candidates must then be further processed to determine the likelihood that they, in fact, represent hair. To address this, the pipeline in Figure 5 is followed for each candidate hair region:

Figure 5. Pipeline to filter segmented regions deemed to be "non-hair" regions.

Each input image undergoes feature extraction, where those features may come from any state-of-the-art method including HoG, SURF, SIFT, Daisy, etc. These features are then matched with a set of descriptors generated offline using a database of known hair templates. These templates were chosen to exploit the extreme variation in the appearance of human hair, as well as the extreme variations in the imaging conditions under which images containing human hair are captured. A subset of the template database is shown in Figure 6:

Figure 6. Subset of hair templates database used to generate feature descriptors offline for the purpose of feature matching with an input image.
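The matching step against the offline template database can be sketched as a minimal nearest-neighbour matcher over binary descriptors using Hamming distance (numpy only). Descriptor extraction itself (HoG, SURF, SIFT, Daisy) is assumed to come from an off-the-shelf library, and the distance threshold below is an arbitrary assumption:

```python
import numpy as np

# Illustrative nearest-neighbour matching of binary feature descriptors
# against the offline hair-template descriptor database. The threshold
# value is an assumption for this sketch.

def match_descriptors(query, templates, max_dist=20):
    """Return (query_idx, template_idx) pairs with Hamming distance <= max_dist."""
    q = np.unpackbits(query, axis=1)[:, None, :]      # (nq, 1, bits)
    t = np.unpackbits(templates, axis=1)[None, :, :]  # (1, nt, bits)
    dist = (q != t).sum(axis=2)                       # pairwise Hamming distances
    nearest = dist.argmin(axis=1)
    keep = dist[np.arange(len(query)), nearest] <= max_dist
    return [(int(i), int(nearest[i])) for i in np.flatnonzero(keep)]

# Toy 32-byte (256-bit) descriptors: query 0 is an exact copy of template 1.
templates = np.random.default_rng(0).integers(0, 256, (5, 32), dtype=np.uint8)
query = templates[[1]].copy()
matches = match_descriptors(query, templates)  # [(0, 1)]
```

A real pipeline would use the matcher bundled with the chosen feature library; the point here is only that each input descriptor is paired with its closest template descriptor.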

In parallel, each candidate region obtained from the EM algorithm is skeletonized using basic morphological operations. Merging information from the feature matching and the skeletonization results in a score. This score represents the number of feature points at or within a rigid threshold distance of the nearest index along the candidate skeleton. This process is illustrated visually below:

Figure 7. Top left: segmented regions obtained from EM. Bottom left: interest point detection. Right: localization of interest points based on average distance to skeletonized region.

The process of localizing interest points acts as a scoring system for each segmented region. Those skeletons with a large average distance to the interest points will be considered non-hair regions. This technique, combined with heuristic selections based on human physiology, yields a final hair candidate mask as in Figure 8.
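A minimal sketch of this scoring, assuming interest points and skeleton pixels are given as coordinate arrays (the threshold value is illustrative):

```python
import numpy as np

# Count interest points lying at or within a rigid threshold distance of
# the nearest index along a candidate skeleton, and report the average
# point-to-skeleton distance used to reject non-hair regions.

def score_candidate(points, skeleton, threshold=5.0):
    """points, skeleton: (n, 2) arrays of (row, col) pixel coordinates."""
    d = np.linalg.norm(points[:, None, :] - skeleton[None, :, :], axis=2)
    nearest = d.min(axis=1)  # distance to closest skeleton pixel per point
    return int((nearest <= threshold).sum()), float(nearest.mean())

# A vertical skeleton at column 10; two interest points lie near it and
# one lies far away, so the score is 2.
skeleton = np.stack([np.arange(50), np.full(50, 10)], axis=1)
points = np.array([[5, 12], [30, 9], [40, 80]])
score, avg_dist = score_candidate(points, skeleton)
```

Regions whose average distance stays large (here driven up by the outlier point) are the ones discarded as non-hair.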

Figure 8. Left: original hair mask (brighter equals higher hair likelihood). Right: masking the raw input image.

Figure 9. Hunter Lab color space.

In order to negate the effects of illumination, the CIELab color space was originally chosen for the final hair color detection and identification. However, it was found that the cosmetic industry, along with the paint and pigment industries, performs its research and QA colorimetry in the Hunter Lab (HLab) space. It was therefore decided to convert RGB to HLab instead of CIELab for all further processing.

The Hunter Lab space, shown in Figure 9, was developed in 1948 by R. S. Hunter as a uniform color space which could be read directly from a photoelectric colorimeter (tristimulus method). Values in this space are defined by the following formulas:

L = 100 · √(Y/Y0)    (3)

a = Ka · (X/X0 − Y/Y0) / √(Y/Y0)    (4)

b = Kb · (Y/Y0 − Z/Z0) / √(Y/Y0)    (5)

where:
X, Y, Z = tristimulus values of the specimen
X0, Y0, Z0 = tristimulus values of the perfect reflecting diffuser
Ka = 175 · (X0 + Y0) / 198.04 and Kb = 70 · (Y0 + Z0) / 218.11

For the 2˚ Standard Observer and Standard Illuminant C, the above equations become:

L = 10 · √Y    (6)

a = 17.5 · (1.02X − Y) / √Y    (7)

b = 7.0 · (Y − 0.847Z) / √Y    (8)

It can be seen from Figure 9 that the range of values for natural hair color is constrained to positive and slightly negative values of a and b, e.g. -10.0 ≤ a and -10.0 ≤ b. These constraints on a and b are used to detect and tag pixels P in the masked image which are deemed invalid by the simple test:

Given P = {L, a, b}, P is valid ⇔ -10.0 ≤ a ∧ -10.0 ≤ b    (T1)

The results of applying (T1) to the masked image are shown in Figure 10, with the convex hull of candidate pixels also being shown. Once the candidate region has been formed, the mean values of L, a, b (L̄, ā, b̄) over the N remaining pixels are determined by

L̄ = (1/N) · Σ L(i,j)    (9)

ā = (1/N) · Σ a(i,j)    (10)

b̄ = (1/N) · Σ b(i,j)    (11)
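Equations (6)-(8) and the test (T1) translate directly into code. A minimal sketch, assuming tristimulus input already scaled so that Y = 100 for the perfect diffuser:

```python
import math

# Hunter Lab for the 2-degree observer and Illuminant C, equations (6)-(8),
# plus the validity test (T1). Tristimulus values are on a 0-100 scale.

def xyz_to_hunter_lab(X, Y, Z):
    s = math.sqrt(Y)
    L = 10.0 * s                    # equation (6)
    a = 17.5 * (1.02 * X - Y) / s   # equation (7)
    b = 7.0 * (Y - 0.847 * Z) / s   # equation (8)
    return L, a, b

def is_natural(L, a, b):
    # (T1): P = {L, a, b} is valid iff -10.0 <= a and -10.0 <= b
    return a >= -10.0 and b >= -10.0

# The perfect white diffuser (X0, Y0, Z0) maps to L = 100 with a and b
# near zero (small residuals come from the rounded constants 1.02, 0.847).
L, a, b = xyz_to_hunter_lab(98.04, 100.0, 118.11)
```

Pixels failing `is_natural`, such as strongly bluish values with b far below -10, are the ones excluded before the channel means are taken.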

Figure 10. Application of hair mask to input image. Black pixels represent non-hair and white contour represents the convex hull of hair region.

The hair care industry has devoted many decades of research towards developing its products. In the course of these efforts, many controlled measurements of natural and dyed hair color have been made. As previously noted, these measurements are performed in the Hunter Lab color space. A detailed table of color measurements was obtained [8] and consolidated slightly to reduce the number of possible hair color assignments from 68 to 62. A small portion of the hair color table is shown in Figure 11.

Color                              L min  L max  a min  a max  b min  b max
Black                                0.0   14.0  -10.0    3.0  -10.0    5.0
Very Dark Brown – cool overtones    14.0   16.0  -10.0    3.0  -10.0    1.0
Very Dark Brown                     14.0   16.0  -10.0    3.0    1.0   1.25
Very Dark Brown – warm overtones    14.0   16.0  -10.0    3.0   1.25    3.0
Dark Auburn – cool overtones        16.0   19.0    2.0    3.0  -10.0    2.7
Lightest Blonde                     40.0   50.0    1.8    5.0    9.0   10.0
Red Blonde                          27.0   40.0    7.0   30.0    6.0   30.0
Red                                 19.0   22.0    2.0   30.0    3.5    4.0

Figure 11. Hair color table used for identifying hair color. Values are represented in HLab.

There is significant variation in the actual description of various shades of hair color. For example, "chestnut" is usually meant as a "browner auburn", with no objective definition of "browner". Because of this, GHCD identifies hair color as a triplet (C, S, O) where C = color, S = shading notation, and O = overtone. These are defined as:

(C ∈ n): n = {black, brown, blonde, red, auburn, gray}
(S ∈ l): l = {darkest, very dark, dark, medium, lightest, very light, light}
(O ∈ t): t = {cool, medium, warm}

The triplet (L̄, ā, b̄) is compared to the measurement table and the hair triplet (C, S, O) is thus determined. Figure 12 shows a final hair color determination.

Figure 12. Final hair color determination.

3 Results and Further Work

A set of 675 random images (containing faces) was chosen from the Internet, and hair color was identified in each by GHCD. 602 hair color identifications by GHCD agreed with the manual identifications, an accuracy of 89.2%. 13 of the 602 images classified by GHCD were disputed, i.e. the hair color identification could not be agreed upon by the manual observers. GHCD incorrectly identified the hair color in 73 images versus the manual identification, an error rate of 10.8%. GHCD has been shown to be an effective algorithm for the automated determination of hair color. Our research has shown that there are three significant sources of error: first, the segmentation of hair versus non-hair regions; second, the effect of extreme illumination variations in the image; and third, the natural or artificial variations which can occur in a person's hair, e.g. "highlights" added to hair. Moving forward, work remains to improve the hair segmentation procedure, and we are investigating techniques which can detect and identify significant variations of hair color in a single individual.

4 References

[1] Y. Yacoob and L. S. Davis, "Detection and Analysis of Hair", IEEE Trans. Pattern Analysis and Machine Intelligence, pp. 1164-1169, 2006.

[2] C. Rousset and P. Y. Coulon, "Frequential and Color Analysis for Hair Mask Segmentation", 2008.

[3] K. Lee, D. Anguelov, B. Sumengen and S. Gokturk, "Markov Random Field Models for Hair and Face Segmentation", Proceedings of the 8th IEEE International Conference on Automatic Face & Gesture Recognition, 2008.

[4] P. Julian, C. Dehais, F. Lauze, V. Charvillat, A. Bartoli and A. Choukroun, "Automatic Hair Detection in the Wild", ICPR 2010 - 20th International Conference on Pattern Recognition, pp. 4617-4620, 2010.

[5] P. Viola and M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features", IEEE Conference on CVPR, vol. 1, pp. 511-518, 2001.

[6] R. Lienhart, "20x20 gentle adaboost frontal face detector", OpenCV, Intel, 2003.

[7] D. Bradley, "20x20 profile face detector", OpenCV, Princeton University / Intel, 2003.

[8] D. MacFarlane, D. MacFarlane and F. Billmeyer, "Method and Apparatus for Hair Color Characterization and Treatment", WO 96/41139.
