Distinguishing Facial Features for Ethnicity-Based 3D Face Recognition

STEFANO BERRETTI, ALBERTO DEL BIMBO, and PIETRO PALA, University of Firenze

Among the different approaches to 3D face recognition, solutions based on local facial characteristics are very promising, mainly because they can manage facial expression variations by assigning different weights to different parts of the face. However, so far, few works have investigated the relevance that individual local features play in 3D face recognition, and very simple solutions have been applied in practice. In this article, a local approach to 3D face recognition is combined with a feature selection model to study the relative relevance of different regions of the face for the purpose of discriminating between different subjects. The proposed solution is evaluated on facial scans of the Face Recognition Grand Challenge dataset. The results of the experimentation are twofold: they quantitatively demonstrate the assumption that different regions of the face have different relevance for face discrimination, and they also show that the relevance of facial regions changes across ethnic groups.

Categories and Subject Descriptors: I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling—Curve, surface, solid, and object representations; I.5.2 [Pattern Recognition]: Design Methodology—Classifier design and evaluation; feature evaluation and selection

General Terms: Algorithms, Experimentation, Performance, Theory

Additional Key Words and Phrases: Feature selection, 3D face recognition, ethnicity-based learning, iso-geodesic stripes

ACM Reference Format: Berretti, S., Del Bimbo, A., and Pala, P. 2012. Distinguishing facial features for ethnicity-based 3D face recognition. ACM Trans. Intell. Syst. Technol. 3, 3, Article 45 (May 2012), 20 pages. DOI = 10.1145/2168752.2168759 http://doi.acm.org/10.1145/2168752.2168759

1. INTRODUCTION

Face representation and matching for the purpose of verification or identification has been an active research area in recent years, with a major emphasis on the detection and recognition of faces in still images and videos (see Zhao et al. [2003] for a survey). More recently, the increasing availability of 3D data has paved the way for the use of 3D face scans to improve the effectiveness of face recognition systems [Bowyer et al. 2006]. In fact, face recognition based on 3D face scans is less sensitive to pose variations and lighting conditions and allows facial deformations induced by expression changes to be better analyzed using the 3D geometry of the face, rather than its 2D appearance. In general, approaches to 3D face recognition can be grouped in two broad categories [Zhao et al. 2003]: global (or holistic), that is, performing face matching based on representations extracted from the whole face [Pan et al. 2005; Wang et al. 2005, 2006];


and local (or region-based), that is, partitioning the face surface into regions and extracting and matching appropriate descriptors for each of them [Berretti et al. 2010; Cook et al. 2006; Faltemier et al. 2008b; Queirolo et al. 2010]. There are also a few hybrid solutions that combine both global and local descriptors, as well as multimodal solutions that fuse 2D and 3D information to improve the recognition accuracy [Chang et al. 2005; Mian et al. 2007, 2008]. A thorough review of the state-of-the-art in 3D face recognition is out of the scope of this work and can be found in the survey of Bowyer et al. [2006] and in the literature reviews of Kakadiaris et al. [2007], Mian et al. [2007], and Berretti et al. [2010].

In general, global representations are extracted from the whole facial surface, which usually makes them compact and, therefore, computationally efficient. Global representations are robust to noise, although they are sensitive to face alignment and to occlusions, and their recognition accuracy is severely affected if the 3D model of the face includes elements such as the hair, ears, and neck. Moreover, their accuracy decreases significantly in the presence of nonneutral facial expressions. In contrast, local representations are extracted from small patches of the facial surface and, in some cases, can reduce to very small regions around detected key-points. The characterizing trait of local approaches is their potential to cope with occlusions and facial expressions. Since the description of the face derives from the combination of many local descriptors, the deformation of a few parts due to occlusions or nonneutral facial expressions does not compromise the overall match.

Following these considerations, many recent approaches to 3D face recognition have used local features of the face, reporting very high accuracies on benchmark databases like the Face Recognition Grand Challenge version 2.0 dataset (FRGC v2.0). Some of these approaches [Chang et al. 2006; Drira et al. 2009; Faltemier et al. 2008a] explicitly define a matching scheme that provides for the use of a subset of the identified facial parts to perform matching. In Drira et al. [2009], nasal surfaces are represented using indexed collections of iso-curves so as to enable the comparison of nose shapes by way of their iso-curves. In Chang et al. [2006], multiple overlapping regions around the nose are segmented, and the scores of iterative closest point (ICP) matching on these regions are combined together. This idea is extended in Faltemier et al. [2008a] by using a set of 38 regions that densely cover the face and selecting the best-performing subset of 28 regions to perform matching using the ICP algorithm. However, in all these approaches, the use of multiple independent facial regions to perform matching of 3D face models is formalized in terms of a fusion process that does not make explicit the relative relevance of the different regions for the purpose of matching. By contrast, a viable approach to increasing the descriptiveness of local facial representations and defining optimized local solutions is to investigate the relevance that individual facial regions play in determining the overall recognition accuracy. This could enable different weighting of the regions and permit dynamically selecting and adapting the subset of local descriptors that are best suited to represent the face in different contexts.
Just as some local characteristics of the face can be used to distinguish between different individuals, other characteristics exist that can be used to distinguish between different ethnic groups. In fact, it is a common intuition that the facial traits/regions that are most discriminative change across different ethnic groups, and this knowledge can be used to boost the accuracy of recognition within each group. This has been supported by anthropometric statistics showing that discriminative facial traits change across different ethnic groups and that a close relationship exists between the 3D shape of the human face and ethnicity [Enlow and Hans 1990; Farkas 1994]. In particular, Farkas used 25 measurements between accurately identified landmarks on the head and face to show the ethnic craniofacial morphometric differences between North-American Caucasians, African-Americans, and Chinese.


Several differences were identified in these three groups. For example, the Chinese have the widest faces, and the main characteristic of the orbits of the Chinese group is the largest intercanthal width. Further, the nose is less protruding and wider in the Chinese, and this group also has the highest upper lip in relation to mouth width. This suggests using knowledge of the most discriminative characteristics of each ethnic group to improve the accuracy of recognition by training different classifiers for different ethnic groups.

In the literature, the potential of interethnic differences measured on 3D face models has been investigated to address the problem of ethnicity classification. In Lu et al. [2006], a multimodal approach for ethnicity identification using a support vector machine is presented. Characterizing facial traits are captured by combining depth and texture facial data, which are extracted by sampling the rectangular area enclosing the mouth and the eyes on a 10×18 grid. The solution addresses a simplified classification task, reducing the number of possible classes to two: Asian and non-Asian (the same approach is adopted in Shakhnarovich et al. [2002]). The experiments demonstrated that the depth modality provides more discriminative power for ethnicity identification than the intensity modality. As expected, the integration of both modalities improves the accuracy of ethnicity identification. In Zhong et al. [2009], a fuzzy 3D face ethnicity categorization algorithm is proposed to discriminate between Eastern and Western people. Intrinsic discriminative information embedded in 3D faces is extracted using Gabor filters, and k-means clustering is adopted to learn the centers of the filter response vectors. Then, a visual codebook is constructed from these learned centers [Zhong et al. 2007], and merging and mapping distances are learned from the visual codes. These distances are used to learn, respectively, the Eastern and Western human codes and to compute the probabilities of 3D faces mapping to Eastern and Western individuals. More qualitative studies have been performed by Hu et al. [2010], where subjective experiments are conducted to test the human capability in gender/ethnicity recognition on different face representations (including 3D face models), thus providing baselines for designing computer-based face gender/ethnicity recognition algorithms.

In this work, we propose an original approach to study the discriminative relevance of facial regions for different ethnic groups in the presence of facial expressions. The approach relies on the use of feature selection [Kohavi and John 1997] as the methodology for evaluating the informative content of 3D facial regions. Given a multivariate classification task, feature selection determines whether a subset of the original set of variables exists that supports the classification task without compromising classification accuracy (indeed, in some cases, the classification accuracy is improved). In so doing, feature selection provides a number of advantages: dimensionality reduction to decrease the computational cost; reduction of noise to improve the classification accuracy; and more interpretable features or characteristics. In the proposed solution, the minimal-Redundancy Maximal-Relevance [Peng et al. 2005] feature selection model is combined with the approach to 3D face recognition based on iso-geodesic stripes [Berretti et al. 2006]
to study how the relative relevance of the facial stripes changes due to interethnic group differences and facial expressions. The iso-geodesic stripes approach, by its nature, can be easily combined with a feature selection model. In fact, iso-geodesic stripes are identified by measuring the distances of surface points to a fiducial (reference) point located on the nose tip, and the point-wise spatial relationships between stripes are used to capture the relevant traits of the face. The facial information captured by iso-geodesic stripes is then represented in a compact form by evaluating the spatial relationships between every pair of stripes. Mutual arrangements between pairs of iso-geodesic stripes are encoded in a graph structure where nodes correspond to the iso-geodesic stripes and arcs between two nodes are labeled with the descriptor of the spatial arrangement between the corresponding stripes.


In this way, the similarity between two 3D face scans can be estimated by extracting their graph representations and combining the distances between the arc labels of the graphs of the two scans. The proposed framework has been evaluated on the FRGC dataset [Phillips et al. 2005] so as to quantitatively evaluate the relative relevance of pairs of stripes for supporting face recognition within heterogeneous and homogeneous ethnic groups and the extent to which facial expressions change the relevance of pairs of stripes.

The article is organized as follows. In Section 2, the main features of the face recognition approach based on iso-geodesic stripes are summarized, thus providing the necessary information for the application of feature selection. In Section 3, the minimal-Redundancy Maximal-Relevance feature selection model is described. The application of this feature selection model for measuring the relevance of pairs of iso-geodesic stripes for the purpose of 3D face recognition is described in Section 4. Experimental results are provided in Section 5, which reports qualitatively and quantitatively how the measure of stripe relevance changes for different ethnic groups, how ethnicity conditions the stripes that are less sensitive to facial expressions, and how these distinctive traits of each ethnic group can be exploited to boost face recognition accuracy. Finally, conclusions and future research directions are drawn in Section 6.

2. ISO-GEODESIC STRIPES OF THE FACE

In the approach of Berretti et al. [2010], the structural information of a face scan is captured through the 3D shape and relative arrangement of iso-geodesic stripes identified on the 3D surface. Iso-geodesic stripes are defined by computing, for every surface point, the normalized geodesic distance between the point and a reference point located at the nose tip of the face. Normalized values of the geodesic distance are obtained by dividing the geodesic distance by the Euclidean eye-to-nose distance, that is, the sum of the distances between the nose tip and the two points located at the inner commissure of the left and right eye fissure, and the distance between the two points at the inner eyes. The algorithm reported in Mian et al. [2007] is used for the identification of the nose tip and of the two inner eye points. This normalization guarantees invariance of the distance values with respect to scaling of the face scan. Furthermore, since the Euclidean eye-to-nose distance is invariant to facial expressions, this normalization factor does not bias the values of the distance under expression changes. Computation of the geodesic distance on the piecewise planar mesh is accomplished through Dijkstra's algorithm, which approximates the actual geodesic distance between two surface points with the length of the shortest piecewise linear path on mesh edges.

Once the values of the normalized geodesic distance are computed for every surface point, iso-geodesic stripes can be identified. For this purpose, the range of the normalized geodesic distance values is quantized into n intervals $c_1, \ldots, c_n$. Accordingly, n stripes concentric with respect to the nose tip are identified on the 3D surface, with the ith stripe corresponding to the set of surface points for which the value of the normalized geodesic distance falls within the limits of interval $c_i$. Figure 1(b) shows the projection on the XY plane of the pairs of iso-geodesic stripes of the three subjects in Figure 1(a), thus highlighting the shape variations of the stripes. As an example, Figure 2 shows the first nine iso-geodesic stripes identified on the face scans of two individuals.

The analysis of the deformation that nonneutral facial expressions induce in the shape of the iso-geodesic stripes, as detailed in Berretti et al. [2010], motivates the decomposition of the facial stripes into three parts: upper-left (UL), upper-right (UR), and lower (L), with respect to the coordinates of the nose tip (see Figure 1(b)). In general, under the effect of nonneutral facial expressions, the region around the mouth is subject to larger deformations than the other regions of the face.
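As a concrete illustration of the stripe extraction just described, the following Python sketch computes normalized geodesic distances with Dijkstra's algorithm on the mesh edge graph and quantizes them into stripes. It is a minimal sketch assuming the mesh is given as vertex and edge arrays; the function name, the stripe_width quantization step, and the index-based landmark arguments are illustrative, not taken from the original implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def iso_geodesic_stripes(verts, edges, nose_tip, eye_left, eye_right,
                         n_stripes=9, stripe_width=0.1):
    """Assign each vertex a stripe index in 1..n_stripes (0 = beyond the last stripe)."""
    # Undirected graph weighted by Euclidean edge length: Dijkstra on it
    # approximates geodesics with shortest piecewise linear paths on mesh edges.
    w = np.linalg.norm(verts[edges[:, 0]] - verts[edges[:, 1]], axis=1)
    n = len(verts)
    graph = csr_matrix((w, (edges[:, 0], edges[:, 1])), shape=(n, n))
    geo = dijkstra(graph, directed=False, indices=nose_tip)

    # Normalize by the Euclidean eye-to-nose distance: the two inner-eye-to-nose
    # distances plus the distance between the two inner-eye points.
    d = lambda a, b: np.linalg.norm(verts[a] - verts[b])
    eye_to_nose = d(eye_left, nose_tip) + d(eye_right, nose_tip) + d(eye_left, eye_right)
    geo /= eye_to_nose

    # Quantize the normalized distance into n contiguous intervals c_1..c_n.
    stripes = np.floor(geo / stripe_width).astype(int) + 1
    stripes[stripes > n_stripes] = 0   # points beyond the outermost stripe
    return stripes
```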


Fig. 1. (a) Sample face models, where the 4th and 7th iso-geodesic stripes are highlighted. The 3DWW relationship descriptors are computed for the UL, UR, and L parts of the stripe pair. (b) Projection of the pairs of iso-geodesic stripes in (a) on the XY plane, with the partitioning of the iso-geodesic stripes into three parts.

Fig. 2. The first nine iso-geodesic stripes for two sample face scans. The graphs constructed on a part of the stripes and their matching are also shown.

Furthermore, the decomposition of the upper part into upper-left and upper-right allows the face representation model to better deal with slight asymmetries of the face that constitute a characterizing trait of some individuals. This subdivision proved necessary to improve the performance of the approach in the case of faces with expression variations (results of the iso-geodesic stripes approach without face partitioning were first reported in Berretti et al. [2006]).

Once facial stripes are extracted, distinctive structural features of 3D face scans are captured by describing the pointwise 3D spatial relationships between homologous parts of pairs of iso-geodesic stripes. To this end, the 3D Weighted Walkthroughs (3DWWs) descriptor is used. 3DWWs were first introduced in Berretti et al. [2006], and their use and properties in the context of 3D face recognition have been extensively discussed in Berretti et al. [2010]. 3DWWs define a set of integral measures over the points of two regions A and B in the 3D domain. These measures are captured through weights $w_{i,j,k}(A, B)$ that encode the number of pairs of points belonging respectively to A and B, whose displacement is captured by the walkthrough $\langle i, j, k \rangle$ (with i, j, k taking values in {−1, 0, +1}):

$$w_{i,j,k}(A, B) = \frac{1}{K_{i,j,k}} \int_A \int_B C_i(x_b - x_a)\, C_j(y_b - y_a)\, C_k(z_b - z_a)\; db\; da, \qquad (1)$$

where $db = dx_b\, dy_b\, dz_b$ and $da = dx_a\, dy_a\, dz_a$; $K_{i,j,k}$ acts as a normalization factor guaranteeing that $w_{i,j,k}$ takes values in [0, 1]; $C_{\pm 1}(\cdot)$ are the characteristic functions of the positive and negative real semi-axes $(0, +\infty)$ and $(-\infty, 0)$, respectively; and $C_0(\cdot)$ denotes the Dirac delta function, which is used to reduce the dimensionality of the integration domain so as to obtain a finite nonnull measure.

3DWWs are capable of quantitatively measuring the relative spatial arrangement of two extended sets of 3D points by computing 27 weights and organizing them in a 3 × 3 × 3 matrix. As a particular case, the 3DWWs computed between an extended 3D entity and itself also account for intrinsic shape information. The properties of 3DWWs, combined with the geodesic distance computation and face partitioning, provide the method with robustness to expression variations. In fact, geodesic distances between two facial points remain sufficiently stable under expression changes, so that the large majority of the points of each stripe remain within the same stripe even under facial expression changes. In addition, due to the constrained elasticity of the skin tissue, neighboring points can be assumed to feature very similar motion for moderate facial expressions in most parts of the face. For all these points, the mutual displacement between two points is mainly determined by the geometry of the neutral face. This property is preserved by 3DWWs, which provide an integral measure of displacements between pairs of points.

2.1. Face Representation and Matching

A generic face model F is represented through a set of $N_F$ stripes. Since 3DWWs are computed for every pair of iso-geodesic stripes (including the pair composed of a stripe and itself), a face is represented by a set of $N_F(N_F + 1)/2$ relationship matrices. According to the proposed representation, iso-geodesic stripes and the 3DWWs computed between pairs of stripes (interstripe 3DWWs) and between each stripe and itself (intrastripe 3DWWs) are cast to a graph representation where intrastripe 3DWWs label the graph nodes and interstripe 3DWWs label the graph edges (see Figure 2). In order to compare graph representations, distance measures for node labels and for edge labels have been defined. Both of them rely on the $L_1$ distance measure D defined between 3DWWs [Berretti et al. 2006]. The similarity measure between two face models represented through the graphs P and G, with nodes $p_k$ and $g_k$, is then derived as

$$\mu(P, G) = \frac{\alpha}{N_P} \sum_{k=1}^{N_P} D\big(w(p_k, p_k), w(g_k, g_k)\big) + \frac{2(1-\alpha)}{N_P(N_P-1)} \sum_{k=1}^{N_P} \sum_{h=1}^{k-1} D\big(w(p_k, p_h), w(g_k, g_h)\big), \qquad (2)$$

where the first summation in Equation (2) accounts for the intrastripe 3DWWs' similarity, and the second summation evaluates the interstripe 3DWWs' similarity. The α parameter permits weighting the two distance components differently, and its value is set to 0.3 in the experiments. This value was tuned in a preliminary set of experiments on the FRGC v1.0 database and shows that, in order to support face recognition, interstripe spatial relationships are more discriminant than intrastripe spatial relationships. Implicitly, Equation (2) assumes that the number of nodes $N_P$ in graph P is not greater than the number of nodes $N_G$ of graph G. This can be assumed with no loss of generality in that, if $N_P > N_G$, graphs P and G can be exchanged. Following these considerations, distances between the faces of two individuals are measured by computing the 3DWWs for each pair of iso-geodesic stripes separately in the three face parts and then comparing the 3DWWs of homologous pairs of the two faces.


The final dissimilarity measure is obtained by averaging the distances in the three parts. According to Equation (2), the overall runtime complexity can be estimated as $O(N_P^2 T_D)$, $T_D$ being the complexity of computing D, which is estimated to be a constant value [Berretti et al. 2010]. This permits an efficient implementation for face identification in large datasets, also with the use of appropriate index structures, with great savings in performance. More details on the index structure and its performance are discussed in Berretti et al. [2001].
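The following sketch shows how the similarity of Equation (2) can be assembled from intrastripe and interstripe 3DWW distances. The dict-of-pairs representation of a face graph and the helper dist_3dww are hypothetical names introduced here for illustration only.

```python
import numpy as np

def dist_3dww(wa, wb):
    """L1 distance D between two 3DWW weight arrays."""
    return np.abs(wa - wb).sum()

def face_distance(P, G, alpha=0.3):
    """P, G: dicts mapping stripe pairs (h, k), h <= k, to 3DWW arrays; N_P <= N_G."""
    n = max(k for _, k in P)                      # number of stripes N_P
    intra = sum(dist_3dww(P[(k, k)], G[(k, k)]) for k in range(1, n + 1))
    inter = sum(dist_3dww(P[(h, k)], G[(h, k)])
                for k in range(2, n + 1) for h in range(1, k))
    return (alpha / n) * intra + (2 * (1 - alpha) / (n * (n - 1))) * inter
```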

3. FEATURE SELECTION WITH MRMR

Given a multivariate classification task, feature selection is a methodology for searching for a compact subset of the components of the multivariate feature vector that enables the same (if not superior) classification accuracy. Feature selection is mainly motivated by the dimensionality curse, which states that, in the presence of a limited number of training samples (affected by noise), each represented as a feature vector in $\mathbb{R}^n$, the mean accuracy does not always increase with the vector dimension n. Rather, the classification accuracy increases up to a certain dimension of the feature vector and then decreases. In other words, the higher the dimensionality of the feature space, the higher the number of training samples required to achieve the same classification accuracy. Therefore, the challenge is to identify m out of the n features which yield similar (if not better) accuracy compared to the case in which all n features are used in the classification task.

Approaches to feature selection can be grouped into three main categories according to their dependence on the classifier: wrappers, filters, and embedded methods [Guyon and Elisseeff 2003]. Filter approaches perform feature selection by considering a performance metric (i.e., an evaluation function) based entirely on the training data, without reference to the classifier for which the features are to be selected [Almuallim and Dietterich 2004; Arauzo-Azofra et al. 2004]. The performance metrics are based on statistical separability measures that provide an estimate of how separable the training data classes are, hence giving an indication of how easily the data may be correctly classified. In wrapper-type methods [Kohavi and John 1997; Langley 1994], feature selection is wrapped around a learning method: the usefulness of a feature is directly judged by the estimated accuracy of the learning method. Wrappers use a search algorithm to search through the space of possible features and evaluate each subset by running a classifier on it and optimizing the classifier's performance. This requires the identification of the subset of features that yields the best classification accuracy. Assuming n is the dimensionality of the original feature vectors, the number of possible subsets of m features is $\binom{n}{m}$. Since, in general, all values of $m \in \{1, \ldots, n\}$ have to be tried out, these approaches can be computationally inefficient and run the risk of overfitting to the classifier. However, one can often obtain a set with a very small number of nonredundant features which gives high accuracy, because the characteristics of the features match well with those of the learning method. By using the learning machine as a black box, wrappers are remarkably universal and simple. Embedded methods [Fu et al. 2009; Rivals and Personnaz 2003], which incorporate variable selection as part of the training process, may be more efficient in several respects: they make better use of the available data by not needing to split the training data into training and validation sets, and they reach a solution faster by avoiding retraining a predictor from scratch for every variable subset investigated.


The minimal-Redundancy Maximal-Relevance (mRMR) method is a filter approach to feature selection [Peng et al. 2005]. For a given classification task, the aim of mRMR is to select only those features that are maximally relevant for correct classification and, at the same time, minimally redundant with each other. In particular, with this approach, relevance and redundancy are defined in terms of the mutual information between the features.

Given two discrete random variables x and y, taking values in $\{s_i\}_{i=1}^{N}$, their joint probability $P(x, y)$, and the respective marginal probabilities $P(x)$ and $P(y)$, the mutual information between x and y is defined as the difference between the Shannon entropy of x and the conditional entropy of x given y, that is, $I(x, y) = H(x) - H(x|y)$, where the entropy is used as a measure of the uncertainty of a random variable. In practice, this expression states that by subtracting from the uncertainty of x the residual uncertainty of x once y is known, one obtains the information that the variable y provides about x. According to this, mutual information provides a measure of the dependency of variables and can also be computed as

$$I(x, y) = \sum_{i=1}^{N} \sum_{j=1}^{N} P(s_i, s_j) \log \frac{P(s_i, s_j)}{P(s_i)\, P(s_j)}. \qquad (3)$$
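For reference, Equation (3) can be estimated directly from the co-occurrence counts of two discretized variables, as in the short sketch below; this is a generic estimator, not tied to the authors' implementation.

```python
import numpy as np

def mutual_information(x, y):
    """I(x, y) of Equation (3) for two integer-coded 1-D arrays of equal length."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)                 # co-occurrence counts
    p = joint / joint.sum()                       # P(s_i, s_j)
    px, py = p.sum(axis=1), p.sum(axis=0)         # marginals P(s_i), P(s_j)
    nz = p > 0                                    # skip empty cells (0 log 0 = 0)
    return float((p[nz] * np.log(p[nz] / (px[:, None] * py[None, :])[nz])).sum())
```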

Peng et al. [2005] propose to jointly maximize the dependency between a feature variable $x_i$ and the classification variable l and minimize the dependency between pairs of feature variables $x_i$, $x_j$. According to this, feature selection is performed by selecting from the complete set of n features $S_n$ a subset $S_m$ of $m < n$ features that maximizes

$$\frac{1}{m} \sum_{x_i \in S_m} I(x_i, l) - \frac{1}{m^2} \sum_{x_i, x_j \in S_m} I(x_i, x_j). \qquad (4)$$

This expression takes into account the relevance of features with respect to the class label l, while penalizing redundancy among the features. Since the exhaustive search for the optimum subset $S_m$ is intractable, $S_m$ is determined incrementally by means of a forward search algorithm. Given a subset $S_{m-1}$ of $m - 1$ features, the feature $x_i \in \{S_n - S_{m-1}\}$ that determines a subset $\{x_i, S_{m-1}\}$ maximizing Equation (4) is added. It can be shown that this nested subset strategy is equivalent to iteratively optimizing the following condition:

$$\max_{x_i \in S_n - S_{m-1}} \left[ I(x_i, l) - \frac{1}{m-1} \sum_{x_j \in S_{m-1}} I(x_j, x_i) \right]. \qquad (5)$$

Experiments in Peng et al. [2005] show that, for subsets of more than 20 features, the $S_m$ obtained with this method achieves more accurate classification performance than the subset obtained by maximizing the $I(S_m, l)$ value (i.e., the mutual information between the whole subset of variables and the classification label l), while the required computation cost is significantly lower. In summary, the benefits of the mRMR approach can be realized in two ways: (i) with the same number of features, the mRMR feature set is expected to be more representative of the targeted characteristics, therefore leading to better generalization; and (ii) a smaller mRMR feature set can be used to effectively cover the same space as a larger conventional feature set.

4. RELEVANCE OF PAIRS OF ISO-GEODESIC STRIPES

In our approach, 3D face scans are represented by using the 3DWWs between every pair of iso-geodesic stripes as labels of the nodes and edges of the face graphs. Nine iso-geodesic stripes of 1 cm width are considered, thus resulting in 45 3DWW relationship matrices representing each of the three parts of the face (i.e., UL, UR, and L).


Given two face scans and their associated face graphs, matching the node and edge labels of these graphs results in 45 × 3 = 135 distance components, each corresponding to an individual distance term of Equation (2). These distance components are arranged into a distance vector $V_{135} = (d_1, \ldots, d_{135})$. Every distance vector is also associated with a binary classification label l whose value is 1 if the distance vector measures the dissimilarity between two facial scans of the same person, and 0 otherwise. Following this protocol, pairwise matches are performed between every pair of face scans in a training database, so that every scan is matched against every other. In this way, a large set of distance vectors is generated and used as a reference set for selecting the most relevant features (the features being the individual distance components of the vector $V_{135}$). According to this, the overall set of distance vectors is used to feed the mRMR algorithm so as to determine the pairs of iso-geodesic stripes whose distances are more relevant in discriminating between 3D face scans of different subjects.

In our case, the feature variables are continuous, which implies expressions equivalent to those reported in Section 3, replacing summations with integrals and probabilities with density functions. This would require the adoption of density estimation methods, like the Parzen window model [Parzen 1962], to approximate I(x, y) [Kwak and Choi 2002]. As an alternative, a more efficient yet effective solution is to incorporate data discretization as a preprocessing step. Following this latter approach, first the mean value $\mu_i$ and the standard deviation $\sigma_i$ of every feature $d_i$ are computed; then discretized values $\hat{d}_i$ are obtained according to the following rule [Peng et al. 2005]:

$$\hat{d}_i = \begin{cases} 2 & \text{if } d_i < \mu_i - \beta \cdot \sigma_i \\ 3 & \text{if } \mu_i - \beta \cdot \sigma_i \le d_i \le \mu_i + \beta \cdot \sigma_i \\ 4 & \text{if } d_i > \mu_i + \beta \cdot \sigma_i, \end{cases} \qquad (6)$$

where β is a parameter that regulates the width of the discretization interval. The value of β has been set to 0.5 through a pilot set of experiments carried out on the FRGC v1.0.

Summarizing, with respect to the formalism used in the previous section, the individual distance components $\hat{d}_i$, taking values in {2, 3, 4} (N = 3), represent the discrete feature variables $x_i$, whereas $\hat{V}_{135}$ is the array of normalized and discretized distance components $\hat{d}_i$ and corresponds to the set of features $S_n$ (n = 135, in our case). The result of mRMR provides us with a subset $\hat{V}_m$ (with m < 135) of maximally salient and minimally redundant distance components with respect to the binary classification label l.

As a final step, ordered sets of the most relevant features obtained by the feature selection are used to perform face verification experiments. In particular, these features are used to train one-vs.-all SVMs capable of discriminating intrasubject distance vectors from intersubject distance vectors. In this way, SVM classifiers are trained so as to decide, given the distance vector computed between two generic facial scans, whether the two scans represent the same subject or not.
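As a sketch, the discretization of Equation (6), applied column-wise to a matrix of distance vectors before feeding mRMR, might look as follows; the array shapes and the function name are assumptions for illustration, with β = 0.5 as reported above.

```python
import numpy as np

def discretize(D, beta=0.5):
    """Map each column of D (one distance component per column) to {2, 3, 4}."""
    mu, sigma = D.mean(axis=0), D.std(axis=0)
    Dhat = np.full(D.shape, 3, dtype=int)         # within mu +/- beta*sigma
    Dhat[D < mu - beta * sigma] = 2
    Dhat[D > mu + beta * sigma] = 4
    return Dhat
```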

5. EXPERIMENTAL RESULTS

In the following, we present the results obtained by applying the feature selection analysis to the iso-geodesic stripes (IGS) of the face. The results show qualitatively and quantitatively how the measure of stripe relevance changes for different ethnic groups, how ethnicity conditions the stripes that are less sensitive to facial expressions, and how these distinctive traits of each ethnic group can be exploited to boost face recognition accuracy.


The IGS approach was among the best-performing methods at the last SHREC 3D face recognition contest [Daoudi et al. 2008]. An extensive performance analysis of the IGS approach on the FRGC v2.0 database, results of the last SHREC competition, and comparisons with state-of-the-art solutions for 3D face recognition have been reported in Berretti et al. [2010].

We performed experiments on the FRGC dataset, which is composed of a training set (subfolder Spring2003), referred to as FRGC v1.0, that includes 943 3D face scans of 275 individuals showing a neutral facial expression, and a test set known as FRGC v2.0 (the union of subfolders Fall2003 and Spring2004) including 4,007 3D face scans of 466 individuals acquired with different facial expressions (about 60% of the faces have a neutral expression, and the others show expressions of disgust, happiness, sadness, and surprise). The fact that no model in the FRGC v1.0 shows expression variations makes the generalization of training performed on FRGC v1.0 quite challenging. Face scans are given as 480 × 640 matrices of 3D points with a binary mask indicating the valid points of the scan (i.e., the foreground points, typically corresponding to the head and shoulders). Due to the different distances of the subjects from the sensor during acquisition, the actual number of valid points in a scan can vary. Individuals have been acquired with frontal views from the shoulder level with very small pose variations. Some scans include occlusions of the face due to hair. More details on the FRGC dataset can be found in Phillips et al. [2005].

All the experiments reported in this section have been performed after some preprocessing of the face scans in the datasets. A sphere of radius 100mm centered on the nose tip was used to crop the 3D face. Then, spikes in the 3D face were removed using median filtering in the z-coordinate. Holes were filled using cubic interpolation, and 3D scans were resampled on a uniform square grid at 1mm resolution. Since the scans are not provided with the locations of the nose tip and the points at the inner eyes, these fiducial points were located according to the method reported in Mian et al. [2007]. Finally, scans were also processed by iteratively performing PCA alignment and resampling of the cropped portion of the face in order to obtain pose normalization [Mian et al. 2007].

In our experiments, following the protocol suggested in the FRGC, we used the FRGC v1.0 as a training set and the FRGC v2.0 as a test set. Accordingly, two different experimental evaluations have been performed. In the training phase, the relevance of stripe pairs according to mRMR has been evaluated so as to identify the most representative features for face recognition using the IGS representation (Section 5.1). This analysis is carried out on the whole FRGC v1.0 dataset as well as on partitions of this dataset obtained by grouping people based on their ethnicity. In so doing, the most relevant features for different ethnic groups have been identified. In the testing phase, 3D face verification is performed by applying SVM classification to the most relevant mRMR features, and the effect of facial expressions for different ethnic groups is investigated (Section 5.2). The performance of the proposed solution in terms of recognition accuracy is also reported in comparison with state-of-the-art solutions.
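A rough sketch of such a preprocessing pipeline is shown below. The use of SciPy's griddata for cubic resampling and a median filter applied to the resampled depth map (rather than to the raw scan, as in the described pipeline) are simplifying assumptions; landmark detection and PCA pose normalization are omitted.

```python
import numpy as np
from scipy.interpolate import griddata
from scipy.ndimage import median_filter

def preprocess_scan(points, nose_tip, radius=100.0, res=1.0):
    """points: (n, 3) valid scan points in mm; returns a regular depth grid."""
    # Crop the face with a sphere of radius 100 mm centered on the nose tip.
    pts = points[np.linalg.norm(points - nose_tip, axis=1) <= radius]
    # Resample z on a uniform 1 mm grid; cubic interpolation also fills holes.
    gx = np.arange(pts[:, 0].min(), pts[:, 0].max(), res)
    gy = np.arange(pts[:, 1].min(), pts[:, 1].max(), res)
    GX, GY = np.meshgrid(gx, gy)
    GZ = griddata(pts[:, :2], pts[:, 2], (GX, GY), method='cubic')
    # Remove residual spikes with a small median filter on the depth values.
    GZ = median_filter(np.nan_to_num(GZ), size=3)
    return GX, GY, GZ
```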

5.1. Relevance of Pairs of Iso-Geodesic Stripes Across Ethnic Groups

To investigate how the performance of the IGS approach is affected by the variability of subjects across different ethnic groups, the face scans of the FRGC have been partitioned by distinguishing between Asian and Caucasian individuals. For training, the 275 subjects in the FRGC v1.0 have been classified as 65 Asian (236 scans) and 198 Caucasian (666 scans), whereas the remaining 12 subjects (41 scans) have been left out as belonging to other ethnic groups.1

1 Lists of the subjects in the two classes, for both FRGC v1.0 and FRGC v2.0, are available at http://www.dsi.unifi.it/~berretti/download/frgc/.


Table I. Training Datasets of Faces and Distance Vectors Derived from the FRGC v1.0

                        Faces                    Distance Vectors
             #subjects    #scans        #examples    #pos. examples
All             275         943          444,143         1,780
Asian            65         236           27,730           487
Caucasian       198         666          221,445         1,208

Table II. FRGC v1.0: Most Relevant Stripe Pairs Ordered by Decreasing Relevance Scores for the Distance Vectors Datasets

        (a) All                  (b) Asian                (c) Caucasian
Region pair  Relevance    Region pair  Relevance    Region pair  Relevance
   (4,5)      90.07%         (3,7)      93.84%         (4,5)      92.97%
   (6,9)      88.58%         (6,8)      90.83%         (5,8)      92.19%
   (6,8)      87.74%         (6,9)      87.87%         (6,9)      91.78%
   (5,8)      87.67%         (3,6)      87.21%         (4,8)      91.31%
   (2,4)      87.32%         (5,9)      87.03%         (2,4)      91.15%
   (3,5)      86.27%         (5,8)      86.05%         (6,8)      90.97%
   (5,9)      85.98%         (6,7)      86.03%         (5,7)      90.30%
   (3,6)      85.95%         (4,5)      84.44%         (3,5)      89.92%
   (4,8)      85.36%         (7,9)      84.08%         (4,7)      89.10%
   (3,7)      84.34%         (2,9)      82.14%         (5,9)      89.03%

Based on this division, three different Faces datasets have been defined: All includes the 943 scans of the FRGC v1.0; Asian includes the 236 scans of individuals of Asian ethnicity; Caucasian includes the 666 scans of individuals of Caucasian ethnicity. Each Faces dataset originates a Distance Vectors dataset obtained by computing the 135-dimensional distance vector for every pair of scans in the dataset. Table I reports the number of subjects and the number of scans for the three Faces datasets, as well as the overall number of distance vectors and the number of positive distance vectors (i.e., measuring the distance between two models of the same subject) for the corresponding Distance Vectors datasets.

Applying the mRMR algorithm to the Distance Vectors datasets, the 135 distances between iso-geodesic stripe pairs in the UL, UR, and L parts of the face are ranked according to their mutual information value (relevance) returned by mRMR and normalized to the mutual information of the most important pair. To simplify the analysis of the results, one value of relevance is reported for each pair of stripes by averaging the values of relevance for the UL, UR, and L parts of the same pair of stripes. The results for the ten most relevant stripe pairs are given in Table II and Figure 3. In particular, Table II(a) reports, for the All dataset, the most important pairs of iso-geodesic stripes, ordered by decreasing values of relevance. It turns out that the most relevant information for the purpose of discriminating between generic faces (regardless of the ethnic group) is captured by the spatial arrangement between the 4th and the 5th stripe, followed by the (6,9) pair. Tables II(b) and (c) report the most relevant stripe pairs for the Asian and Caucasian datasets, respectively. It can be observed that the ten most relevant pairs of stripes for the two datasets are quite different (e.g., just the pair (6,9) appears among the top-five ranked pairs in both Tables II(b) and (c)). Furthermore, the relevance of stripe pairs for the Caucasian dataset decreases smoothly from the most relevant, whereas the same is not true for the Asian dataset, where a relevance gap of about 6% can be observed between the 1st and 3rd ranked stripe pairs.


Fig. 3. The five most relevant iso-geodesic stripe pairs for two sample subjects, respectively, of the Asian and Caucasian datasets.

Fig. 4. Fraction of the n most relevant features shared between the Asian and the Caucasian datasets at varying number of features.

In Figure 3, the five most relevant stripe pairs for the Asian and Caucasian datasets are highlighted on two sample 3D face scans. It can be observed that the area surrounding the nose (roughly corresponding to the 4th and 5th stripes) represents a discriminating trait for the Caucasian group, whereas it is of less importance for the Asian one. To highlight the difference in the relevance of facial features, the fraction of features that are in common among the n most relevant features of the Asian and Caucasian datasets is reported in Figure 4, with n ranging from 1 to the overall number of features (135) in steps of 15. For instance, for n = 45 this fraction is 0.55, meaning that among the first 45 most relevant features, just 25 features are common to both the Asian and Caucasian datasets. These results suggest that the morphological traits of the face that are more relevant for recognition vary across different ethnic groups. This characteristic has not been previously investigated or reported in studies on 3D face recognition, whereas it represents a viable approach for improving recognition accuracy by enabling the training of more accurate classifiers on specific ethnic groups, as reported in the next section.


Table III. Three Test Datasets Derived from the FRGC v2.0

                           Faces                    Distance Vectors
                #subjects    #scans        #examples    #pos. examples
All-v2             466        4,007        8,026,021        23,456
Asian-v2           103        1,133          641,278         7,761
Caucasian-v2       349        2,739        3,749,691        15,018

Note: 135 scans of 14 subjects have been classified as belonging to other ethnic groups.

Fig. 5. FRGC v2.0: TAR at 0.001 FAR at varying number of features. Plots of the Asian-v2, Caucasian-v2, and All/Asian-v2, All/Caucasian-v2 tests are reported.

5.2. 3D Face Verification Using the Most Relevant mRMR Features

The results of the feature selection analysis reported in the previous section have been used to train a set of one-vs.-all SVM classifiers. For every dataset reported in Table I, the n top relevant features have been selected, with n ranging from 15 to 135 in steps of 15. In this way, the original datasets originate a collection of training datasets All_n, Asian_n, Caucasian_n that have been used to train the SVM classifiers. Following the FRGC protocol, we used the FRGC v2.0 dataset to test the accuracy of these classifiers. For this purpose, we subdivided the FRGC v2.0 subjects into Asians and Caucasians, thus obtaining the datasets reported in Table III (these datasets are referred to with the v2 postfix in order to differentiate them from the datasets of Table I). These datasets have been used to perform verification experiments estimating the classification accuracy of the classifiers trained on the datasets All_n, Asian_n, and Caucasian_n.

The results of these experiments are measured using the true acceptance rate (TAR) with respect to the false acceptance rate (FAR). According to the evaluation protocol defined in the FRGC [Phillips et al. 2005], in Figure 5 the value of TAR at 0.001 FAR is used as a performance index and plotted against the number of features used in the classification. In these plots, the number of features used for the classification is the same for all the datasets, but the features themselves differ, due to the different results of the feature selection applied to the datasets. In particular, the plots denoted Asian-v2 and Caucasian-v2 are obtained by processing the corresponding datasets of Table III through the classifiers trained on Asian and Caucasian, respectively.
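In outline, the per-group verification step can be sketched as below: distance vectors are restricted to the n most relevant mRMR features of one training group and fed to an SVM that separates same-subject from different-subject matches. The use of scikit-learn, the RBF kernel, and the probability threshold are assumptions of this sketch, not details specified by the original experiments.

```python
import numpy as np
from sklearn.svm import SVC

def train_verifier(D_train, labels, ranking, n=75):
    """D_train: distance vectors; labels: 1 for same-subject matches, 0 otherwise."""
    feats = ranking[:n]                    # e.g., the mRMR ranking of one ethnic group
    clf = SVC(kernel='rbf', probability=True)
    clf.fit(D_train[:, feats], labels)
    return clf, feats

def verify(clf, feats, d, threshold=0.5):
    """Accept or reject the claimed identity for one probe-gallery distance vector d."""
    score = clf.predict_proba(d[feats].reshape(1, -1))[0, 1]
    return score >= threshold
```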


Fig. 6. FRGC v2.0: ROC curves of the Asian-v2 and Caucasian-v2 datasets obtained using (a) the 75 most relevant features and (b) all the 135 features.

These results are compared in the same figure with those obtained by processing the same test datasets through a classifier trained on the All dataset. This corresponds to the case in which a unique classifier is trained for all the subjects independently of their ethnicity; the datasets Asian-v2 and Caucasian-v2 are then tested separately with this classifier, providing the curves indicated with All/Asian-v2 and All/Caucasian-v2 in Figure 5. In this way, we can measure the extent to which the verification accuracy is improved by the use of a classifier that is trained on a homogeneous ethnic group, compared to the use of a classifier trained on a nonhomogeneous group.

In general, for all the datasets, the values of TAR at 0.001 FAR increase with the number of features. However, it turns out that using approximately the 75 most relevant features is sufficient to provide a reasonably high value of TAR. More importantly, it should be observed that, using all 135 features, the average accuracy on the All/Asian-v2 and All/Caucasian-v2 datasets is considerably lower than the accuracy on the corresponding experiments Asian-v2 and Caucasian-v2. In other words, training one classifier for each ethnic group can yield better accuracy than training one classifier for all the ethnic groups. This paves the way to the design of more accurate 3D face recognition systems that combine two classification modules: a first one for classification of the ethnic group and a second one for face recognition within a specific ethnic group. Finally, it should be noticed that the recognition accuracy reported in Figure 5 is measured assuming the availability of a perfect ethnicity classifier. In practice, incorporating a real ethnicity classifier is expected to decrease the recognition accuracy by about 2%.2 However, this does not alter the trend evidenced in Figure 5, which clearly shows the advantage of using different classifiers trained for every ethnic group. Figures 6(a) and (b) report the complete receiver operating characteristic (ROC) curves obtained using, respectively, the 75 most relevant features and all the 135 features for the classification of the Asian-v2 and Caucasian-v2 datasets.

2 The development of an ethnicity classifier is out of the scope of this work. Existing ethnicity classifiers can be used for this purpose. For example, in the work of Lu et al. [2006] a multimodal approach for ethnicity identification is proposed and tested on a dataset that also includes the FRGC v1.0. Reported results show an error rate of 0.7% and 5.5%, respectively, for non-Asian and Asian, with an overall error of 2.0%.
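For completeness, the TAR-at-FAR figure of merit used in these plots can be computed from genuine and impostor match scores as in the following generic sketch (assuming higher scores indicate a better match; with distance-based scores the inequalities flip).

```python
import numpy as np

def tar_at_far(genuine, impostor, far=0.001):
    """TAR at a given FAR, assuming higher scores mean a better match."""
    thr = np.quantile(impostor, 1.0 - far)   # threshold letting `far` impostors through
    return float(np.mean(genuine > thr))
```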

5.2.1. Variability to Facial Expressions

Facial expressions can induce large variations in the morphology and topology of regions of the face and, consequently, have a major impact on the performance of face recognition approaches. In fact, in typical face recognition applications, reference (gallery) face scans acquired with a neutral expression are used to verify the identity of target scans (probes) in which subjects can exhibit nonneutral facial expressions.


Table IV. Number of Scans Categorized as Neutral and Expression Comprised in the Asian-v2 and Caucasian-v2 Datasets of FRGC v2.0

                  #scans    #neutral    #expression    #matches
Asian-v2           1,133       761           372          2,497
Caucasian-v2       2,739     1,625         1,114          5,597

Note: The #matches column indicates the number of matches between expression and neutral scans of the same subjects.

Considering this particular scenario, we have designed an experiment to investigate how expression changes affect the face distances of the IGS approach across different ethnic groups. For this experiment, the FRGC v2.0 scans were clustered into two distinct sets3, respectively, with neutral expression (2,469 scans) and with small cooperative expression (796 scans) plus large noncooperative expression (742 scans), for a total of 1,538 expression scans. With this subdivision, the face scans in Asian-v2 and Caucasian-v2 are partitioned as summarized in Table IV. Then, we performed matches between expression and neutral scans of the same subjects, separately for Asian-v2 and Caucasian-v2. This resulted in 2,497 and 5,597 matches for the two sets, respectively.

Using the IGS approach, every match is obtained by Equation (2) applied to the UL, UR, and L parts of the face. For the matching distances between every pair of stripes, we computed statistical measures and visualized them with a box-plot representation, as shown in Figures 7(a) and (b) for Asian-v2 and Caucasian-v2, respectively. Statistics for the first ten stripe pairs according to the mRMR ranking are reported. In this visualization, the horizontal line through the middle of the box is the median of the performance range, and the top and bottom of the box mark the 3rd quartile (75th percentile) and 1st quartile (25th percentile) values of the observations, respectively. Thus, 50% of the performance range is contained in the box. Above and below the box are vertical dashed lines, the "whiskers," each ending with a short horizontal line. The ends of the whiskers correspond to the minimum and maximum data values not classified as outliers. The small horizontal lines above or below the whiskers represent outliers (more precisely, points are drawn as outliers if they are larger than $q_3 + w(q_3 - q_1)$ or smaller than $q_1 - w(q_3 - q_1)$, where $q_1$ and $q_3$ are the 25th and 75th percentiles, respectively, and w = 1.5 is the whisker length).

In general, from Figures 7(a) and (b), it emerges that the dynamics of variation of the box-plots (including the whiskers) are smaller for Caucasians than for Asians. Instead, considering just the height of the box-plots, the converse behavior emerges (i.e., on average the height of the box is greater for Caucasians than for Asians). This is in agreement with the fact that the verification rates are better for Caucasians than for Asians, as reported in Figure 5.

To further investigate the effects of nonneutral facial expressions on the relevance of facial stripes, we applied the feature selection analysis to the scans of the FRGC v2.0 datasets. In particular, we applied the same methodology described in Section 5.1 to the analysis of the Asian-v2 and Caucasian-v2 datasets and compared the measures of stripe relevance to those obtained from the analysis of the Asian and Caucasian datasets. The results are reported in Table V. Comparison of these results with those reported in Table II for neutral expressions evidences that the presence of facial expressions only marginally alters the relevance of the top ten most relevant stripe pairs.

3 This classification was originally performed at Geometrix and used for the experiments reported in Maurer et al. [2005] and suggested by the FRGC organizers.
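The box-plot statistics and the stated whisker rule can be reproduced with the short sketch below, applied to the vector of match distances of one stripe pair; it follows the w = 1.5 convention described above.

```python
import numpy as np

def boxplot_stats(dists, w=1.5):
    """Median, quartiles, whiskers, and outliers for one stripe pair's distances."""
    q1, median, q3 = np.percentile(dists, [25, 50, 75])
    lo, hi = q1 - w * (q3 - q1), q3 + w * (q3 - q1)   # outlier fences
    inliers = dists[(dists >= lo) & (dists <= hi)]
    return {'median': median, 'q1': q1, 'q3': q3,
            'whisker_low': inliers.min(), 'whisker_high': inliers.max(),
            'outliers': dists[(dists < lo) | (dists > hi)]}
```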


Fig. 7. FRGC v2.0: Every boxplot reports, for a pair of stripes, the median and the 1st and 3rd quartiles of the distances between expression and neutral scans of the same subjects. Values are reported for the ten most relevant stripe pairs (as ranked by mRMR) of the Asian-v2 and Caucasian-v2 datasets in (a) and (b), respectively.

In particular, for the Asian-v2 dataset, among the top ten relevant stripe pairs, the (6,8) and (6,7) pairs were the only two with a significant reduction in their relevance, being ranked out of the first ten with respect to the Asian dataset. These are replaced by the (3,5) and (3,9) pairs (the pair (3,5) is ranked in the top five and is reported in Figure 8(a)). This indicates that pairs of stripes that are modified by expression changes become less significant in discriminating between subjects; conversely, pairs including parts of the face that are more rigid with respect to expression variations (or at least whose descriptors change less) acquire more importance. For the Caucasian-v2 dataset, only the pair (6,8) among the top ten relevant stripe pairs on the Caucasian dataset is now ranked out of the first ten. It is replaced by the (3,4) pair, which is shown in Figure 8(b). Similar considerations to those reported for the Asian dataset apply to this case. In terms of verification accuracy, passing from the training FRGC v1.0 dataset to the test FRGC v2.0 dataset, in the all-vs.-all experiment the TAR at 0.001 FAR drops from 98.7% to 90.3% using all the 135 features.

In summary, the fact that the most relevant stripe pairs selected on the Asian-v2 and Caucasian-v2 datasets do not change substantially with respect to those selected on the Asian and Caucasian datasets demonstrates the robustness of the iso-geodesic stripes approach to nonneutral facial expressions. In addition, the fact that the variations with the FRGC v2.0 datasets are more evident for Asians than for Caucasians is in accordance with the fact that the verification rates on the Caucasian-v2 dataset outperform those on the Asian-v2 dataset.


Table V. FRGC v2.0: Most Relevant Stripe Pairs Ordered by Decreasing Relevance Scores for the Distance Vectors Datasets

       (a) Asian-v2                (b) Caucasian-v2
Region pair  Relevance       Region pair  Relevance
   (3,7)      91.22%            (4,5)      93.23%
   (4,5)      88.71%            (2,4)      91.68%
   (5,9)      87.13%            (3,5)      90.92%
   (3,5)      86.51%            (3,4)      90.17%
   (3,6)      85.72%            (5,8)      90.03%
   (7,9)      85.14%            (4,8)      89.76%
   (5,8)      84.01%            (5,7)      89.33%
   (3,9)      83.66%            (6,9)      88.84%
   (6,9)      83.01%            (4,7)      88.55%
   (2,9)      81.24%            (5,9)      88.20%

Fig. 8. The iso-geodesic stripe pairs that enter the top five most relevant pairs when performing feature selection on the Asian-v2 and Caucasian-v2 datasets and that were not shown in Figure 3.

5.2.2. Comparative Evaluation

In Table VI, we report the values of TAR at FAR = 0.001 on the FRGC v2.0 dataset for the all-vs.-all experiment (i.e., the verification experiment matching every FRGC v2.0 scan against every other) in comparison with the performance figures of state-of-the-art methods, as reported in Mian et al. [2007], Faltemier et al. [2008a], and Queirolo et al. [2010]. For our approach, the best value obtained for the All-v2 dataset is reported, showing an overall performance close to the best methods. In particular, a gap of about 6% and 3% is observed with respect to the approaches in Queirolo et al. [2010] and Faltemier et al. [2008a], respectively, and a performance increase of about 3% with respect to the solution in Mian et al. [2007]. However, compared to these solutions, our approach has a major advantage in computational efficiency. In fact, in our approach, the main computational cost is moved to the SVM training phase, which is performed offline, whereas face verification can be efficiently performed online (about 12 ms are needed to classify a probe model using nonoptimized Matlab code on an Intel Core 2 Duo 2.2GHz processor with 2GB of memory). Instead, the best performing method [Queirolo et al. 2010] reports an average time of about 4 seconds4 for matching the regions of two faces using simulated annealing for range image registration and the surface interpenetration measure as the similarity measure.


Table VI. FRGC v2.0: TAR at 0.001 FAR with the Overall FRGC v2.0 Dataset

Approach                     all-vs.-all
Berretti et al.              90.3%
Faltemier et al. [2008a]     93.2%
Mian et al. [2007]           86.6%
Queirolo et al. [2010]       96.5%

By contrast, the best performing method [Queirolo et al. 2010] reports an average time of about 4 seconds (on an Intel Pentium D 3.4 GHz processor with 1 GB of memory) to match regions of two faces using simulated annealing for range image registration and the surface interpenetration measure as the similarity measure. Similarly, for the approach in Faltemier et al. [2008a], an average time of 7.5 seconds is reported for preprocessing and of 2.5 seconds for matching facial regions using ICP-based matching, for a total of 10 seconds (on an Intel Pentium IV 2.4GHz processor with 1GB of memory). These times can be acceptable for verification, but far less so for identification: at 4 seconds per match, for example, a single probe compared against a gallery of 500 scans would take over half an hour. Finally, the approach in Mian et al. [2007] uses a rejection classifier that quickly eliminates a large number of candidate faces at an early stage of recognition, but then verifies the remaining faces using an ICP-based region matching that is known to be time consuming. In particular, 8 ms are reported (on an Intel Pentium IV 2.3GHz processor with 1GB of memory) as the time spent by the rejection classifier to compare a probe with a gallery scan; no time is reported for the ICP matching of a probe with a nonrejected gallery scan.

6. DISCUSSION AND FUTURE WORK

In this article, two main original contributions have been presented. First, a feature selection approach has been combined with an effective and efficient approach to 3D face recognition, thus providing evidence of the regions of the face that are most relevant for recognition. Then, the feature selection results have been used to identify the facial features that are involved in the recognition of different ethnic groups and in recognition under expression variations. These issues had not been previously addressed in the 3D face recognition literature, and they can provide useful information for designing more accurate recognition approaches. Future work will address the use of the region selection results to identify and classify different ethnic groups, and then use this information to optimize recognition results.

REFERENCES

Almuallim, H. and Dietterich, T. 2004. Learning with many irrelevant features. In Proceedings of the National Conference on Artificial Intelligence. 547–552.
Arauzo-Azofra, A., Benitez, J., and Castro, J. 2004. A feature set measure based on Relief. In Proceedings of the International Conference on Recent Advances in Soft Computing. 104–109.
Berretti, S., Del Bimbo, A., and Vicario, E. 2001. Efficient matching and indexing of graph models in content-based retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 23, 10, 1089–1105.
Berretti, S., Del Bimbo, A., and Pala, P. 2006. Description and retrieval of 3D face models using iso-geodesic stripes. In Proceedings of the ACM International Workshop on Multimedia Information Retrieval. 13–22.
Berretti, S., Del Bimbo, A., and Pala, P. 2010. 3D face recognition using iso-geodesic stripes. IEEE Trans. Pattern Anal. Mach. Intell. 32, 12, 2162–2177.
Bowyer, K. W., Chang, K. I., and Flynn, P. J. 2006. A survey of approaches and challenges in 3D and multimodal 3D+2D face recognition. Comput. Vision Image Understand. 101, 1, 1–15.
Chang, K. I., Bowyer, K. W., and Flynn, P. J. 2005. An evaluation of multimodal 2D+3D face biometrics. IEEE Trans. Pattern Anal. Mach. Intell. 27, 4, 619–624.



Chang, K. I., Bowyer, K. W., and Flynn, P. J. 2006. Multiple nose region matching for 3D face recognition under varying facial expression. IEEE Trans. Pattern Anal. Mach. Intell. 28, 6, 1695–1700.
Cook, J., Chandran, V., and Fookes, C. 2006. 3D face recognition using Log-Gabor templates. In Proceedings of the British Machine Vision Conference. Vol. 2, 769–778.
Daoudi, M., ter Haar, F., and Veltkamp, R. 2008. SHREC contest session on retrieval of 3D face scans. In Proceedings of the Shape Modeling International Conference.
Drira, H., Ben Amor, B., Srivastava, A., and Daoudi, M. 2009. A Riemannian analysis of 3D nose shapes for partial human biometrics. In Proceedings of the International Conference on Computer Vision. IEEE.
Enlow, D. and Hans, M. G. 1990. Facial Growth, 3rd Ed. Saunders.
Faltemier, T. C., Bowyer, K. W., and Flynn, P. J. 2008a. A region ensemble for 3D face recognition. IEEE Trans. Inform. Forensics Secur. 3, 1, 62–73.
Faltemier, T. C., Bowyer, K. W., and Flynn, P. J. 2008b. Using multi-instance enrollment to improve performance of 3D face recognition. Comput. Vision Image Understand. 112, 2, 114–125.
Farkas, L. G. 1994. Anthropometry of the Head and Face. Raven Press, New York, NY.
Fu, H., Xiao, Z., Dellandrea, E., Dou, W., and Chen, L. 2009. Image categorization using ESFS: A new embedded feature selection method based on SFS. In Proceedings of the 11th International Conference on Advanced Concepts for Intelligent Vision Systems. Lecture Notes in Computer Science, vol. 5807, 288–299.
Guyon, I. and Elisseeff, A. 2003. An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182.
Hu, Y., Fu, Y., Tariq, U., and Huang, T. S. 2010. Subjective experiments on gender and ethnicity recognition from different face representations. In Proceedings of the International Multimedia Modeling Conference. Vol. 5916/2010, 66–75.
Kakadiaris, I. A., Passalis, G., Toderici, G., Murtuza, N., Lu, Y., Karampatziakis, N., and Theoharis, T. 2007. Three-dimensional face recognition in the presence of facial expressions: An annotated deformable approach. IEEE Trans. Pattern Anal. Mach. Intell. 29, 4, 640–649.
Kohavi, R. and John, G. 1997. Wrappers for feature subset selection. Artif. Intell. 97, 1–2, 273–324.
Langley, P. 1994. Selection of relevant features in machine learning. In Proceedings of the AAAI Fall Symposium on Relevance. 140–144.
Lu, X., Chen, H., and Jain, A. K. 2006. Multimodal facial gender and ethnicity identification. In Proceedings of the International Conference on Advances in Biometrics. Vol. 3832, 554–561.
Maurer, T., Guigonis, D., Maslov, I., Pesenti, B., Tsaregorodtsev, A., West, D., and Medioni, G. 2005. Performance of Geometrix ActiveID 3D face recognition engine on the FRGC data. In Proceedings of the IEEE Workshop on Face Recognition Grand Challenge Experiments.
Mian, A. S., Bennamoun, M., and Owens, R. 2007. An efficient multimodal 2D-3D hybrid approach to automatic face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 29, 11, 1927–1943.
Mian, A. S., Bennamoun, M., and Owens, R. 2008. Keypoint detection and local feature matching for textured 3D face recognition. Int. J. Comput. Vision 79, 1, 1–12.
Pan, G., Han, S., Wu, Z., and Wang, Y. 2005. 3D face recognition using mapped depth images. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. Vol. 3, 175–181.
Peng, H., Long, F., and Ding, C. 2005. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27, 8, 1226–1238.
Phillips, P. J., Flynn, P. J., Scruggs, T., Bowyer, K. W., Chang, J., Hoffman, K., Marques, J., Min, J., and Worek, W. 2005. Overview of the face recognition grand challenge. In Proceedings of the IEEE Workshop on Face Recognition Grand Challenge Experiments. 947–954.
Queirolo, C. C., Silva, L., Bellon, O. R., and Segundo, M. P. 2010. 3D face recognition using simulated annealing and the surface interpenetration measure. IEEE Trans. Pattern Anal. Mach. Intell. 32, 2, 206–219.
Rivals, I. and Personnaz, L. 2003. MLPs (mono-layer polynomials and multi-layer perceptrons) for nonlinear modeling. J. Mach. Learn. Res. 3, 1383–1398.
Shakhnarovich, G., Viola, P. A., and Moghaddam, B. 2002. A unified learning framework for real time face detection and classification. In Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition. 14–21.


Wang, S., Wang, Y., Jin, M., Gu, X., and Samaras, D. 2006. 3D surface matching and recognition using conformal geometry. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. Vol. 2, 2453–2460.
Wang, Y., Chiang, M.-C., and Thompson, P. M. 2005. Mutual information-based 3D surface matching with applications to face recognition and brain mapping. In Proceedings of the IEEE International Conference on Computer Vision. 527–534.
Zhao, W., Chellappa, R., Phillips, P. J., and Rosenfeld, A. 2003. Face recognition: A literature survey. ACM Comput. Surv. 35, 4, 399–458.
Zhong, C., Sun, Z., and Tan, T. 2007. Robust 3D face recognition using learned visual codebook. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. 1–6.
Zhong, C., Sun, Z., and Tan, T. 2009. Fuzzy 3D face ethnicity categorization. In Proceedings of the International Conference on Advances in Biometrics. Vol. 5558/2009, 386–393.

Received April 2010; revised August 2010; accepted October 2010
