PRODUCTION AND PERCEPTION OF VOWEL CATEGORIES

Author: Adela Williams
Carre et al. Vowel categories

René Carré 1, Willy Serniclaes 2 & Egidio Marsico 3
1 ENST-CNRS, Paris, France
2 LEAPLE CNRS, Villejuif, France
3 DDL-CNRS, Lyon, France

ABSTRACT
It is well known that the perception of isolated vowels is non-categorical: “if we conceive of a continuum between the ideals of categorical perception and continuous perception, we find that stop consonants are closer to the categorical ideal and vowels are closer to the continuum ideal” (Repp, et al., 1979). However, this statement was based on perceptual tests with stable vowels, mainly for vowel continua located on the [ai] trajectory. In a preceding study (Carré, 2004), the area function space and the corresponding acoustic space were shown to be dynamically structured, following the predictions of the DRM model of speech production. The privileged formant trajectories were [ai], [ay] and [au] for the [a, ɛ, e, i], [a, œ, ø, y] and [ɑ, ɔ, o, u] vowel sets respectively. The present study examines further evidence in support of the DRM model, based on data on the production and perception of French vowels. Results are discussed in terms of vowel representation as sub-products of vowel-to-vowel trajectories.

INTRODUCTION
In a preceding paper (Carré, 2004), formant trajectories were automatically deduced from acoustic theory applied to an 18 cm long tube. Two main criteria were used: maximum acoustic contrast, and maximum formant variation for minimum area function deformation. This deductive approach yields places of articulation and phonological systems that both correspond quite well to those observed in speech production. The results obtained by the deductive approach can be modeled by the Distinctive Region Model (DRM) (Mrayati, et al., 1988). This model structures the acoustic and area function spaces; it displays privileged vowel trajectories in the F1-F2 plane, on which vowels can be represented. This means that, according to the deductive approach, vowel trajectories appear first, and isolated vowels second.
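As a rough illustration of the acoustic starting point mentioned above, the sketch below computes the resonances of an undeformed 18 cm tube using the standard quarter-wavelength formula for a uniform tube closed at one end and open at the other. It covers only the uniform (neutral) case, not the deformations used in the deductive approach; the function name and the speed of sound (350 m/s) are assumptions, not values taken from the paper.

```python
def uniform_tube_resonances(length_cm, n_formants=3, sound_speed_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength formula: F_n = (2n - 1) * c / (4 * L).
    The default speed of sound (35000 cm/s) is an assumed value.
    """
    return [(2 * n - 1) * sound_speed_cm_s / (4.0 * length_cm)
            for n in range(1, n_formants + 1)]


# 18 cm tube, as in the text
formants = uniform_tube_resonances(18.0)
print([round(f) for f in formants])  # [486, 1458, 2431]
```

Under these assumptions the first three resonances fall close to the neutral vowel (F1 = 500, F2 = 1500, F3 = 2500 Hz) used later in Experiment 2.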
This result deserves study because it can be interpreted at both the production and the perception level: vowel trajectories could be observed directly or indirectly in production and represented as such in perception. Vowels might not be independent static exemplars but might instead lie on these trajectories. In this paper, measurements and perceptual experiments are proposed to test these hypotheses for French. The nasal vowels are not taken into account.

VOWEL TRAJECTORIES
The vowel trajectories intrinsic to an acoustic tube structured by the DRM model are represented in Figure 1. They structure the acoustic space in a simple way and result from the deformation gestures of the model. It should be noted that these gestures are closely correlated with the muscular effects of the genioglossus and styloglossus. The figure represents the phonological capabilities of the acoustic tube when this tube is efficiently deformed, i.e. when minimum area deformation produces maximum acoustic variation. The phonological capacities are expressed in terms of trajectories, not in terms of static privileged locations. Vowel categories lie on vowel trajectories in the F1-F2 plane and are thus the consequences of vocalic trajectories. The phonological trajectories are obtained by one or two phonological gestural deformations of the model: one constriction gesture (with consequences in terms of cavities) and one lip gesture. In French, three trajectories are selected among the set of possible trajectories: [ai], obtained with a single deformation gesture controlling the main constriction from back to front by transversal control; [au] (or [ɑu]), also obtained with a single deformation gesture, controlling both the place of the constriction and the degree of lip opening; and [ay], which can be described either as the [ai] gesture plus a labial gesture or as a single deformation gesture controlling both the place of the constriction and the degree of lip opening.

From Sound to Sense: June 11 – June 13, 2004 at MIT

[Figure 1 plot: F1-F2 plane; axes F1 (Hz), 0 to 1000, and F2 (Hz), 0 to 3000; vowel labels placed along the trajectories.]

Figure 1. Vocalic trajectories obtained with the DRM model, with the corresponding vowels. Dotted lines are labialised trajectories. The sets of vowels obtained with these three trajectories are [a, ɛ, e, i], [ɑ, ɔ, o, u] and [a, œ, ø, y].

VOWEL PRODUCTION
Two sets of results are examined here in relation to trajectory components: one on the formant characteristics of vowels pronounced in isolation, the other on vowel sequences in spontaneous speech.

Isolated vowel characteristics
The 11 French oral vowels were pronounced in random order, under computer control, following prompts such as: “As in the sentence: Voilà le lapin, say: lapin, la, a”. Four French subjects (three males and one female, sm) took part in the experiment. The first two formants of the isolated vowels were measured throughout the production of these quasi-stable vowels. The formant variations of each vowel are represented separately for each subject in Figure 2. The formant variations are fairly similar, and privileged directions are observed, generally pointing to [a] as hypothesized. Individual differences can be observed: for (rc), the trajectories are more stable than for the other three speakers, and the trajectory [y, ø, œ] points to [ɑ] instead of [a]; (em & sm) do not produce distinct [a, ɑ]; (sm) does not produce distinct [ø, œ].
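One way to quantify whether a formant drift "points to" a given vowel is to compare the direction of the drift with the direction from its starting point to that vowel. The sketch below does this with a cosine measure in the F1-F2 plane. It is only an illustration of the idea, not the measurement procedure used in the paper; the function name and all formant values are hypothetical.

```python
import math


def drift_alignment(start, end, target):
    """Cosine of the angle between the observed formant drift (start -> end)
    and the direction from the start point toward a target vowel.

    All points are (F1, F2) pairs. A value near 1 means the drift points
    toward the target; near 0, perpendicular; near -1, away from it.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    tx, ty = target[0] - start[0], target[1] - start[1]
    norm = math.hypot(dx, dy) * math.hypot(tx, ty)
    if norm == 0.0:
        raise ValueError("drift or target direction has zero length")
    return (dx * tx + dy * ty) / norm


# Hypothetical values in Hz, for illustration only
A_TARGET = (750.0, 1300.0)   # assumed location of [a]
e_start = (400.0, 2100.0)    # assumed onset of an isolated [e]
e_end = (430.0, 2040.0)      # assumed offset of the same production

alignment = drift_alignment(e_start, e_end, A_TARGET)
print(round(alignment, 3))
```

Computing the same measure after converting the formants to an auditory scale would be an equally reasonable choice; Hz is used here only to keep the sketch short.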

[Figure 2 plots: four panels, Isolated Vowels (rc), (em), (ws) and (sm); axes F1 (Hz) and F2 (Hz).]

Figure 2. Representation in the F1/F2 plane of the vowel formant frequencies for four subjects. The formant variations correspond to the evolution from the beginning to the end of a single production.

Table 1. Number of V1V2 pairs extracted from V1CV2 successions in spontaneous French production.
[a]: 8515    [ea]: 5525   [e]: 5068    [ii]: 4978   [ee]: 4686
[ia]: 4667   [ae]: 4612   [aa]: 4584   [ie]: 4459   [ai]: 4283
[e]: 4120    [a]: 3833    [ei]: 3766   []: 3361     [i]: 3296
[e]: 3119    [e]: 3018    [i]: 3011    [a]: 2907    [e]: 2892

Vowel successions
The structuring in terms of vocalic gestures can also be observed in speech production: assuming syllabic coproduction (Kozhevnikov and Chistovich, 1965; Öhman, 1966; Fowler, 1992), V1 and V2 should be related in some way in V1CV2 utterances. Table 1 shows that the most frequent V1V2 successions in spontaneous French production lie on the [ai] trajectory (Carré, et al., 1995).

PERCEPTUAL EXPERIMENTS
Experiment 1. Categorical vowel perception
Categorical perception on the [ie] and [iy] continua was compared. In the F1-F2 plane, 8 equal intervals on the Bark scale were defined: [ie] (3 intervals) and [iy] (5 intervals). The corresponding synthetic signals were used for identification (25 presentations of each sound) and discrimination (25 presentations; maximum distance between two items: plus or minus one interval). The four preceding subjects took part in the experiment. Figure 3 shows the results: for (em & sm), the observed discrimination scores are close to those predicted from the labeling data, indicating that their perception is almost categorical; the other two subjects (rc & ws) exhibit different degrees of over-discrimination. Notice that the amount of over-discrimination tends to be larger between /i/ and /y/ than between /i/ and /e/. Another perceptual difference between the two continua is that the /i-e/ phonemic boundary is fairly stable across individuals (as indicated by the location of the predicted discrimination peaks, around S9 in Fig. 3), whereas the /y-i/ boundary varies between S4-S5 and S6-S7 (Fig. 3).
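The paper does not state how discrimination scores were predicted from the labeling data. A minimal sketch, assuming the classical two-category ABX prediction, P = 0.5 * (1 + (pA - pB)^2), where pA and pB are the proportions of one label given to the two stimuli of a pair, is given below. The function names and the labeling proportions are hypothetical, and the three-category /y, i, e/ case would need a generalization of this formula.

```python
def predicted_discrimination(p_a, p_b):
    """Predicted proportion correct for a pair of stimuli, assuming the
    classical two-category ABX formula: 0.5 * (1 + (p_a - p_b) ** 2).

    p_a, p_b: proportion of identical labels (e.g. /i/) given to each stimulus.
    """
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


def predicted_curve(labeling):
    """Predicted scores for all adjacent pairs (one interval apart)."""
    return [predicted_discrimination(p, q)
            for p, q in zip(labeling, labeling[1:])]


# Hypothetical proportions of /i/ responses along a 5-step continuum
labeling_i = [1.0, 1.0, 0.9, 0.2, 0.0]
curve = predicted_curve(labeling_i)
peak_pair = max(range(len(curve)), key=curve.__getitem__)
print(curve, peak_pair)
```

With this formula the predicted score stays at chance (0.5) inside a category and peaks at the pair that straddles the labeling boundary; observed scores above the prediction correspond to the over-discrimination discussed in the text.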

[Figure 3 plots: four panels, /yie/ Categorical Perception (em), (ws), (rc) and (sm); observed (Obs.) and predicted (Pred.) scores from 0% to 100%, with stimulus labels S1 to S10.]

Figure 3. Predicted and observed categorical perception between isolated /y, i, e/.

Experiment 2. Perceptual vowel representation
The four preceding subjects took part in a perceptual experiment in which they used a formant synthesizer to find the best exemplars of the 11 French oral vowels by adjusting F1 and F2 in a graphical representation of the F1-F2 plane (Johnson, et al., 1993). These vowels were produced with a preceding neutral vowel (F1 = 500 Hz, F2 = 1500 Hz, F3 = 2500 Hz). The durations of the neutral vowel, the transition and the vowel to be adjusted were 100, 100 and 150 ms respectively. Twenty-five sets of values were obtained and are represented in Figure 4 (on the Bark scale). Most of the vowels lie on the two main trajectories [ai] and [au], but this is not the case for (sm). The labialised vowels [y, ø, œ], however, do not lie on the [ay] trajectory. This result can be explained by the fact that two phonological gestures contribute to the [ay] gesture, the [ai] constriction gesture and the labial gesture, thereby generating two dimensions of variability.
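The stimulus timing described above (100 ms neutral vowel, 100 ms transition, 150 ms adjustable vowel) can be sketched as a piecewise formant track. The sketch assumes a linear transition and a 5 ms frame step, neither of which is stated in the paper; the function name and the target values are hypothetical. Only F1 and F2 are tracked, since those are the two parameters adjusted by the subjects.

```python
def formant_track(neutral, target, durations_ms=(100, 100, 150), step_ms=5):
    """(F1, F2) track sampled every step_ms: steady neutral vowel,
    linear transition (an assumption), then steady target vowel."""
    d_neutral, d_trans, d_target = durations_ms
    total = d_neutral + d_trans + d_target
    track = []
    for t in range(0, total, step_ms):
        if t < d_neutral:
            w = 0.0                                  # neutral vowel
        elif t < d_neutral + d_trans:
            w = (t - d_neutral) / d_trans            # transition
        else:
            w = 1.0                                  # vowel to be adjusted
        track.append(tuple(n + w * (g - n) for n, g in zip(neutral, target)))
    return track


NEUTRAL = (500.0, 1500.0)    # F1, F2 of the neutral vowel, from the text
target_i = (280.0, 2200.0)   # hypothetical setting chosen by a listener

track = formant_track(NEUTRAL, target_i)
print(len(track), track[0], track[30], track[-1])
```

Each listener adjustment would then amount to changing `target` and resynthesizing the same 350 ms pattern.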

[Figure 4 plots: four panels, Best /V/ (rc), (em), (ws) and (sm); axes F1 (Bk) and F2 (Bk).]

Figure 4. Representation in the F1-F2 plane of the best vowel exemplars for 4 subjects.
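Figure 4 and the continua of Experiment 1 are expressed on the Bark scale, but the paper does not say which Hz-to-Bark conversion was used. The sketch below assumes Traunmüller's (1990) approximation, z = 26.81 f / (1960 + f) - 0.53, without its corrections at the extremes of the scale, and shows how a continuum with equal Bark intervals between two formant values could be built. The function names and the endpoint frequencies are hypothetical.

```python
def hz_to_bark(f_hz):
    """Hz to Bark, Traunmueller (1990) approximation (assumed here)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z):
    """Inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def equal_bark_steps(f_start, f_end, n_intervals):
    """Frequencies (Hz) dividing [f_start, f_end] into equal Bark intervals."""
    z0, z1 = hz_to_bark(f_start), hz_to_bark(f_end)
    return [bark_to_hz(z0 + (z1 - z0) * k / n_intervals)
            for k in range(n_intervals + 1)]


# Hypothetical F2 endpoints for a 5-interval continuum
steps = equal_bark_steps(1800.0, 2200.0, 5)
print([round(f) for f in steps])
print(round(hz_to_bark(500.0), 2), round(hz_to_bark(1500.0), 2))
```

The same interpolation would be applied to each formant separately to place the stimuli of a continuum in the F1-F2 plane.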


DISCUSSION
Overall, our results show definite trends for vowel categories, in both production and perception, to follow the trajectories deduced from acoustic theory, and for vowel boundaries in perception to be more stable when located along these trajectories. The production data (Fig. 2) show that, for the four subjects, the isolated vowels vary along directions pointing to either [a] or [ɑ]. At the perception level, Experiment 1 shows that perception is more categorical and the phonemic boundary more stable for a continuum following a gestural trajectory, namely [i-e], located along the [i-a] trajectory, than for one lying off such a trajectory, namely the [i-y] continuum, which is located outside the [i-u] trajectory. Experiment 2 indicates that, for three of the four subjects, the vowel categories vary along the predicted [ai] and [au] trajectories. The results of the fourth subject (sm) are much less clear-cut, which might be explained by the fact that she is a bilingual speaker (Arabic-French). Notice, however, that this deviant behavior is found only in perception. Another difference between production and perception is that, for all subjects taken together, the labial vowels follow the [ay] trajectory at the production level but are more dispersed at the perception level. Finally, a further discrepancy between the production and perception representations of vowel categories lies in the size of the vowel triangle, which is larger in the perceptual representation (compare Figs. 4 and 2), as already observed by Johnson et al. (1993). To summarize, these preliminary results show that, at both the production and the perception level, the [ai] and [au] trajectories have a specific role, which now has to be further characterized. The number of subjects must be increased in order to separate individual characteristics from the general behavior.

ACKNOWLEDGEMENTS
This work is partly financed by a grant from the "Action Concertée Incitative: Complexité" of the French Ministry of Research.

REFERENCES
Carré, R. (2004) From acoustic tube to speech production, Speech Communication, 42, 227-240.
Carré, R., Bourdeau, M. & Tubach, J. P. (1995) Vowel-vowel production: the distinctive region model (DRM) and vocalic harmony, Phonetica, 52, 205-214.
Fowler, C. A. (1992) Phonological and articulatory characteristics of spoken language, Haskins Laboratories Status Report on Speech Research, SR-109/110, 1-12.
Johnson, K., Flemming, E. & Wright, R. (1993) The hyperspace effect: Phonetic targets are hyperarticulated, Language, 69, 505-528.
Kozhevnikov, V. A. & Chistovich, L. A. (1965) Speech, articulation, and perception, JPRS-30543. NTIS, US Dept. of Commerce.
Mrayati, M., Carré, R. & Guérin, B. (1988) Distinctive region and modes: A new theory of speech production, Speech Communication, 7, 257-286.
Öhman, S. (1966) Coarticulation in VCV utterances: spectrographic measurements, J. Acoust. Soc. Am., 39, 151-168.
Repp, B., Healy, A. F. & Crowder, R. G. (1979) Categories and context in the perception of isolated steady-state vowels, Journal of Experimental Psychology: Human Perception and Performance, 5, 129-145.
