Facial Expression Space for Smooth Tele-Communications Kohtaro Ohba, Takehito Tsukada, Tetsuo Kotoku and Kazuo Tanie Mechanical Engineering Laboratory, AIST, 1-2 Namiki, Tsukuba, 305, JAPAN

{kohba, tsukada, toku, tanie}@mel.go.jp

Abstract

In this paper, we focus on facial expression as a means of achieving smooth tele-communications. Facial expression has been considered one of the most significant factors in tele-communications, such as tele-services. To realize a real time facial expression transportation system, we propose the facial expression space (FES) and a correspondence technique between personal facial expression spaces. A real time facial expression transportation system is then developed, which transports the facial expression but not the image itself. The final system is able to display the same facial expressions on another person's face, and furthermore on cartoon characters. The experimental results show the validity of these criteria.


1 Introduction

Recently, network-based tele-communications have come into wide use, as shown in Fig. 1. The original form of tele-communication may be the telephone, and more recently the TV telephone has been developed. To operate a robot over a long distance, tele-operation techniques have been investigated, and tele-services are widely required for information and guiding systems. On the other hand, E-mail makes it possible for two persons sitting in different countries to communicate with text data, and the WWW can distribute text, image, and sound data to many persons at the same time. But these are basically one connection between two points, with no simultaneous feedback. Furthermore, sharing a virtual world environment among several persons, not only two, can nowadays be achieved with VRML technology [1] to establish more realistic tele-communication with real time response, such as for tele-meetings, chatting, tele-shopping, tele-museums, and other amusement purposes. In this virtual world, people can act as avatars, and communicate and interact with other persons. But several problems that have to be solved have been discussed; most of them relate to the fact that communication in this virtual world does not provide the same feeling as real communication. Generally speaking, humans communicate with each other very fluently in everyday life, with no conscious consideration of how the communication is made. This communication is made up of many channels, verbal and non-verbal, such as "language", "facial expression", and "gesture", as shown in Fig. 2.


Figure 1: Tele-communications: (a) telephone and TV telephone (human to human); (b) tele-operation (human to robot); (c) tele-service (human to unknown persons).

Basically, the five organs of sense, "taste", "smell", "tactile", "auditory", and "vision", are used freely in this fluent communication between humans. In fact, 70-80% of all emotional information for humans depends upon "vision". "Taste" and "smell" are negligible in most situations, and "tactile" feedback is necessary only where there is physical contact between human and human, or between human and objects. We therefore concentrate only on non-contact, non-verbal communication in this paper. An image includes a great deal of information, but what we need for fluent communication is the human's movements of face and body.


Figure 2: Communication.

Actually, there has been much research on facial expression with several algorithms [3]-[10]. The most important image emblems for communication seem to be "facial expression" and "gesture", not the image itself [2]. In other words, image transportation systems have been developed for TV telephones and monitoring systems, but the system proposed here focuses not on image transportation but on facial expression transportation. This system can therefore display the same facial expression on other persons, and even on cartoon characters.

Recently, eigen space analysis has become familiar to computer vision researchers for object recognition [11]-[17]. Pentland used this criterion for face recognition, referred to as the "eigen-face" [11][12][13][14]. This method is one of the principal components analysis (PCA) methods; it can reduce the dimension of images without any consideration of image features, and can also extract the differences between images.

In this paper, a real time tele-communication system based on facial expression is proposed using the eigen space technique. First, the facial expression space is proposed with principal component analysis, and several facial expressions are classified. Secondly, the correspondence technique between two image databases is discussed. Finally, the facial expression transportation system shows the validity of these criteria.

2 Facial Expression

Facial expression arises from human emotion. Research on categorizing facial expressions has been discussed in [2]: expression is mostly composed of six main emotions, "happiness", "sadness", "surprise", "disgust", "anger" and "fear". In human society there are micro-expressions and macro-expressions of the emotions, and furthermore there are fake expressions. But these kinds of expression appear only for a moment, and may be negligible.

Expression analysis methods have been widely proposed by many researchers [3]-[10]. Some of these techniques are related to extracting the motion of the nose, mouth, eyebrows, and eyes with tracking algorithms [7][10], optical flow [4], motion energy [6], network criteria [3], 3D face geometric modeling with a range finder [9], and color image analysis techniques. But most of these methods have limitations for real time applications.


Figure 3: Eigen Values (weight of each component versus dimension of PCA).

3 Facial Expression Space (FES)

In face recognition research, the eigen space technique is widely used [11]-[14]. Basically, this technique has been used in the object recognition field [15]. Its main advantage is that the original dimension of the images can be drastically reduced without considering image features such as lines or points. We therefore briefly review the eigen space technique in this section, and classify facial expression images with it.

3.1 Principal Components Analysis (PCA)

Let $M$ be the number of images in a training set. Each image is converted into a column vector $z_i$ of length $N$:

$[z_1, z_2, \cdots, z_M]$.  (1)

By subtracting the average image from all the images, we obtain the training matrix

$Z = [z_1 - c, z_2 - c, \cdots, z_M - c]$,  (2)

where $c$ is the average image, and the size of the matrix $Z$ is $N \times M$. The sample covariance matrix $Q$, of size $N \times N$, is obtained as

$Q = ZZ^T$.  (3)

This sample covariance matrix provides a series of eigenvalues $\lambda_i$ and eigenvectors $e_i$ ($i = 1, \cdots, N$), where each corresponding eigenvalue and eigenvector pair satisfies

$\lambda_i e_i = Q e_i$.  (4)

That is, the matrix $Q$ can be decomposed into $N$ orthonormal components, whose eigenvalues are $\lambda_i$. Thus, each image set can be described by a set of eigenvectors with associated weight factors, i.e. eigenvalues. If the number of images $M$ is much smaller than the number of pixels $N$, the implicit sample covariance matrix $\tilde{Q} = Z^T Z$ can be used instead of the sample covariance matrix $Q$ to calculate the first $M$ eigenvectors [17]. For the sake of memory efficiency, we ignore small eigenvalues and their corresponding eigenvectors using a threshold value $T_s$:

$W_k = \sum_{i=1}^{k} \lambda_i \,/\, \sum_{i=1}^{N} \lambda_i \geq T_s$,  (5)

where $k$ is sufficiently smaller than the original dimension $N$. From this reduced set of eigenvectors, the matrix $E = [e_1, e_2, \cdots, e_k]$ is constructed to project an image $z_i$ (dimension $N$) into the eigen space as an eigen point $\zeta_i$ (dimension $k$):

$\zeta_i = E^T (z_i - c)$.  (6)

This eigen-space analysis can drastically reduce the dimension of the images ($N$) to the eigen-space dimension ($k$) while keeping several of the most effective features that summarize the original images.
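To make the construction of the FES concrete, the following sketch (ours, not from the paper) implements equations (1)-(6) with NumPy, including the implicit covariance trick of [17] and the cumulative-eigenvalue threshold of equation (5); the function and variable names are our own.

```python
import numpy as np

def build_fes(images, Ts=0.95):
    """Build a facial expression space (FES) from a training set.

    images: array of shape (M, N), one flattened face image per row.
    Ts:     threshold on the cumulative eigenvalue ratio, equation (5).
    Returns the average image c, the projection matrix E (N x k), and
    the eigen points of the training images (M x k), equation (6).
    """
    c = images.mean(axis=0)                # average image c
    Z = (images - c).T                     # N x M training matrix, eq. (2)

    # Implicit covariance matrix Q~ = Z^T Z (M x M) instead of
    # Q = Z Z^T (N x N), valid when M << N [17].
    lam, v = np.linalg.eigh(Z.T @ Z)
    lam, v = lam[::-1], v[:, ::-1]         # sort eigenvalues descending

    # Keep the smallest k with cumulative weight W_k >= Ts, eq. (5).
    W = np.cumsum(lam) / lam.sum()
    k = int(np.searchsorted(W, Ts)) + 1

    E = Z @ v[:, :k]                       # eigenvectors of Q, eq. (4)
    E /= np.linalg.norm(E, axis=0)         # normalize each column

    return c, E, (images - c) @ E          # eigen points, eq. (6)
```

With the setting reported in Section 3.2, a threshold of $T_s = 0.95$ would leave $k = 3$ dimensions.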

3.2 Classification of Facial Expressions

In [2], facial expression is traditionally categorized into six emotions. But only the main four, "anger", "normal", "surprise" and "smile", are used in this section to verify the validity of eigen space analysis for facial expression. The original image data are projected into the facial expression space with equation (6), taking into account the weights of the principal components. Fig. 3 shows the weights of each component; only three dimensions are enough to reconstruct 95% of the original images. Fig. 4 shows the results of classifying typical facial expressions of one particular person. In these experiments, we used an SGI Indigo2 workstation to digitize images at a resolution of 243 x 320 pixels, 8-bit black/white, and project them into the "facial expression space (FES)", i.e. the eigen space. To reduce the effect of the human's head movements in the image, we used a SONY CCD camera EVI-D30, which provides auto-tracking, auto-zooming, and auto-iris on a particular object, with camera head servoing and color object tracking in real time.
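As an illustration of how the classification in Fig. 4 can be reproduced, the following nearest-centroid sketch (our construction; the paper does not specify its classifier) projects an image with equation (6) and assigns it to the expression whose training points lie closest in the FES:

```python
import numpy as np

def classify_expression(z, c, E, class_points):
    """Label a flattened face image z by its nearest class centroid in the FES.

    class_points: dict mapping an expression name ("anger", "normal",
    "surprise", "smile") to its training eigen points, an (n, k) array.
    """
    zeta = E.T @ (z - c)                   # projection, equation (6)
    centroids = {name: pts.mean(axis=0) for name, pts in class_points.items()}
    return min(centroids, key=lambda name: np.linalg.norm(zeta - centroids[name]))
```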


Figure 4: Classification of Facial Expressions with PCA.

In these experiments, we requested subjects to, for example, "make your maximum anger face". The degree of each facial expression should also be considered, but obvious expressions are needed to classify the facial expressions. Actually, in the transportation of facial expressions described later in Section 5, any facial expression can be treated with this FES.

4 Correspondence between Two Personal Facial Expression Spaces

In the previous section, the facial expression space was proposed with eigen space analysis. But it is easy to see that this space must be personal, because each facial expression space more or less includes information made up of personal facial characteristics. Again, the main purpose of this paper is not image transportation but facial expression transportation. In this section, we therefore make a correspondence between the same facial expressions in two FESs with a geometric (affine) transformation,

$\zeta_i^B = R_{AB}\, \zeta_i^A + T_{AB}$,  (7)

where $\zeta^A$ and $\zeta^B$ denote the personal facial expression points in FES-A and FES-B, respectively, and $R_{AB}$ and $T_{AB}$ are the affine parameters from the personal facial expression points $\zeta^A$ to $\zeta^B$, as shown in Fig. 5.

The same facial expression points show some dispersion in each FES. To take this dispersion into account, the least squares method is used to calculate the affine parameters with equation (8). The twelve parameters of the affine transform can be obtained from several pairs of sampled facial expression points, $(Z^A_{anger}, Z^B_{anger})$, $(Z^A_{normal}, Z^B_{normal})$, $(Z^A_{smile}, Z^B_{smile})$, and $(Z^A_{surprise}, Z^B_{surprise})$, where each person's $Z_{expression}$ is composed of 20 FES points:

$\begin{bmatrix} Z^B_{anger} \\ Z^B_{normal} \\ Z^B_{smile} \\ Z^B_{surprise} \end{bmatrix} = \begin{bmatrix} R_{AB} & T_{AB} \end{bmatrix} \begin{bmatrix} Z^A_{anger} \\ Z^A_{normal} \\ Z^A_{smile} \\ Z^A_{surprise} \\ 1 \end{bmatrix}$.  (8)

Figure 6 shows a typical transformation from one personal facial expression space to another; the affine parameters were obtained as

$R_{AB} = \begin{bmatrix} -0.8464 & -3.7396 & 0.4535 \\ -0.0964 & -0.8140 & 0.0568 \\ 1.3313 & 7.1058 & 0.0482 \end{bmatrix}$,  (9)

$T_{AB} = \begin{bmatrix} 1.9726 \times 10^4 \\ 3.7767 \times 10^4 \\ -7.1745 \times 10^4 \end{bmatrix}$.  (10)
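The twelve parameters of equations (9) and (10) follow from a standard linear least squares fit over the paired sample points. A minimal sketch of such a fit (ours; the paper gives no code, and the names are hypothetical):

```python
import numpy as np

def fit_affine(points_A, points_B):
    """Least squares fit of zeta_B = R_AB zeta_A + T_AB, equation (8).

    points_A, points_B: arrays of shape (n, 3) of corresponding FES
    points (n = 80 in the paper: 20 points x 4 expressions).
    Returns R_AB (3 x 3) and T_AB (3,).
    """
    n = points_A.shape[0]
    X = np.hstack([points_A, np.ones((n, 1))])  # append 1 for the translation
    # Solve X @ [R^T; T^T] ~= points_B in the least squares sense.
    M, *_ = np.linalg.lstsq(X, points_B, rcond=None)
    return M[:3].T, M[3]
```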


Figure 5: Projection from FES-A to FES-B.


Figure 6: Relation between Two Facial Images.


5 Real Time Facial Expression Transportation System

In this section, the real time facial expression transportation system is proposed. To show the validity of facial expression transportation, we realize a system that displays the same facial expression on other persons. Up to this section, the classification and correspondence of the major components of facial expression have been discussed. In this section, the name of the facial expression, such as "happiness" or "sadness", is not important: once the transformation from one personal FES to another has been obtained as in the previous section, any facial expression can be projected into the other space.

The real time facial expression transportation system is made up of the following three components, as shown in Fig. 7:

[Projection]: One image in an image sequence is projected into the private FES-A with equation (6), and transformed into the other private FES-B with equation (7).

[Transportation]: This low-dimensional data is then delivered over the network to the other site.

[Correspondence]: To make a corresponding face image in the personal FES-B, the transported data is matched to a facial expression point in the personal FES-B, which has already been projected; a corresponding image is then displayed.

In this correspondence process, the quasi-inverse matrix of $E^T$ can produce a quasi-image $\tilde{z}$ from equation (6) as follows:

$\tilde{z}_i = (E^T)^{-1} \zeta_i + c$.  (11)

But because of the low resolution of the quasi-image, we instead prepare the image database to be displayed, and project all of its images into FES-B before the transportation. At run time, all we have to do is project the input image into FES-A, transform it into FES-B, and find the closest point in the FES database. The construction of this system is still in progress, but its validity has already been evaluated on an off-line system, as shown in Fig. 8.
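Taken together, the run-time loop reduces to a projection, an affine map, and a nearest-neighbor search in the pre-projected FES-B database. A minimal sketch of one frame on the receiving side (our illustration; the database and variable names are hypothetical):

```python
import numpy as np

def transport_expression(z_A, c_A, E_A, R_AB, T_AB, fes_db_B, image_db_B):
    """Map one input frame from person A to a stored image of person B.

    fes_db_B:   (m, k) FES-B points of the pre-projected image database.
    image_db_B: the m corresponding face images of person B.
    """
    zeta_A = E_A.T @ (z_A - c_A)           # [Projection], equation (6)
    zeta_B = R_AB @ zeta_A + T_AB          # transform into FES-B, equation (7)
    # Only zeta_B, a k-vector, is sent over the network: [Transportation].
    i = np.argmin(np.linalg.norm(fes_db_B - zeta_B, axis=1))  # [Correspondence]
    return image_db_B[i]
```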

6 Conclusion

In this paper, the facial expression space (FES) has been proposed based on the principal component analysis criteria, and a correspondence technique between personal facial expression spaces has also been presented.


Figure 7: Transportation of Facial Expressions.


Experimental results show the validity of these criteria. In future work, we would like to finish the construction of this system soon, and execute and evaluate it in real time. In this process, we will have to evaluate the smoothness of the constructed image sequence, and also evaluate how people feel about it.

Figure 8: A Sample of Correspondence of Facial Expressions (input and output sequences).

References

[1] "VRML 2.0 Specification, Appendix C: Java Scripting Reference", VRML Architecture Group, 1996.
[2] Paul Ekman and Wallace V. Friesen, "Unmasking the Face", Prentice-Hall, 1975.
[3] K. Matsuno, Chil-Woo Lee, Satoshi Kimura, and Saburo Tsuji, "Automatic Recognition of Human Facial Expressions", ICCV'95, pp. 352-359, 1995.
[4] K. Mase, "Recognition of facial expressions from optical flow", IEICE Trans., Special Issue on Computer Vision and its Applications, E74(10), 1991.
[5] I. Essa, T. Darrell and A. Pentland, "Tracking facial motion", Proc. Workshop on Motion of Non-rigid and Articulated Objects, pp. 36-42, 1994.
[6] Irfan A. Essa and Alex P. Pentland, "Facial Expression Recognition using a Dynamic Model and Motion Energy", ICCV'95, pp. 360-367, 1995.
[7] A. Lanitis, C. J. Taylor and T. F. Cootes, "A Unified Approach to Coding and Interpreting Face Images", ICCV'95, pp. 368-373, 1995.

[8] M. Rosenblum, Y. Yacoob and L. Davis, "Human emotion recognition from motion using a radial basis function network architecture", Proc. Workshop on Motion of Non-rigid and Articulated Objects, pp. 43-49, IEEE Computer Society, 1994.
[9] Y. Yacoob and L. S. Davis, "Labeling of human face components from range data", CVGIP: Image Understanding, 60(2), pp. 168-178, 1994.
[10] Michael J. Black and Yaser Yacoob, "Tracking and Recognizing Rigid and Non-Rigid Facial Motions using Local Parametric Models of Image Motion", ICCV'95, pp. 374-381, 1995.
[11] M. Turk and A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, 3(1), pp. 71-86, 1991.
[12] Matthew A. Turk and Alex P. Pentland, "Face Recognition Using Eigenfaces", Proc. CVPR 1991, pp. 586-591, 1991.
[13] Baback Moghaddam and Alex P. Pentland, "Face Recognition using View-Based and Modular Eigenspaces", Automatic Systems for the Identification and Inspection of Humans, SPIE Vol. 2277, 1994.
[14] Stan Sclaroff and Alex P. Pentland, "Modal Matching for Correspondence and Recognition", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 17, No. 6, pp. 545-561, 1995.
[15] Hiroshi Murase and Shree K. Nayar, "Visual Learning and Recognition of 3-D Objects from Appearance", International Journal of Computer Vision, Vol. 14, No. 1, pp. 5-24, 1995.
[16] Erkki Oja, "Subspace Methods of Pattern Recognition", Research Studies Press Ltd., 1983.
[17] H. Murakami and V. Kumar, "Efficient Calculation of Primary Images from a Set of Images", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 4, No. 5, pp. 511-515, 1982.
