Lip-pellet positions during vowels and labial consonants

Journal of Phonetics (1997) 25, 405-419

John R. Westbury and Michiko Hashi

Waisman Center and Department of Communicative Disorders, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, WI 53705-2280, U.S.A.

Received 15th July 1996, and in revised form 8th April 1997

Sagittal-plane movements of small markers attached to the upper and lower lips were analyzed for ten speakers of American English, and seven speakers of Japanese. Each speaker produced simple utterances containing vowels and labial consonants. The data were analyzed to better understand: (1) patterns of pellet motions associated with labial consonant production; (2) pellet positions at discrete, acoustically defined moments during selected speech sounds; and (3) the relationship between midline separation between the lip surfaces and inter-lip-pellet distance. Results from the study provide qualitative information about the dynamics of labial gestures for consonants involving lip closure. The data also indicate that the English and Japanese speakers positioned and moved their lips in generally similar ways during the test sounds analyzed. Finally, results suggest that plausible estimates of mid-line inter-lip separation can be derived from the trajectories of two pellets, one on each lip, as long as the possibility of lip-body deformation is taken into account.

© 1997 Academic Press Limited

1. Introduction

Houde (1968) introduced the phrase “point-parameterized” to refer to records of speech movement that take the form of trajectories of discrete, point-like “markers” (e.g., pellets, coils, light-emitting diodes, reflecting disks, or even bony landmarks) on an articulator. Several modern techniques for studying speech movement provide data of this type for accessible articulators such as the tongue, lips, jaw, and soft palate. For example, the X-ray microbeam (e.g., Macchi, 1988), electro-magnetic articulometry (e.g., Perkell, Matthies, Svirsky & Jordan, 1993), and certain opto-electrical techniques (e.g., Boyce, 1990) have all been used to examine and describe speech-related motions of the lips in terms of sagittal-plane coordinates of the centers of small markers, one firmly attached in the mid-line at the vermillion border of each lip.¹

¹ A recent report by Ramsey, Munhall, Gracco & Ostry (1996) describes three-dimensional fleshpoint-kinematic data, recorded from a single talker, using multiple lip markers tracked by means of an opto-electrical system. Those data provide a more complete view of the actions of the lips than the more conventional two-marker, sagittal-plane view considered in this report. However, inferences about size and shape of the lip opening, from both types of data, will probably be constrained in similar ways.

0095-4470/97/040405+15 $25.00/0/jp970050

The lips pucker and spread and rise and fall during speech, as talkers shape the inter-lip cavity to modify the spectrum of sound radiating from the mouth. In the best of all worlds, a complete understanding of these actions might be inferred from a sagittal-plane representation of the motions of two fleshpoints. However, more must be known about the relationship between lip-marker positions, and the nature and degree of labial constrictions, before broad inferences about lip action can be drawn.

A handful of direct-imaging studies of lip function (e.g., Fujimura, 1961; Fromkin, 1964; Linker, 1982; Abry & Boë, 1986) provide information about time changes in size and shape of the lip opening during speech. However, these studies do not tell us what discrete point motions reveal about labial articulation. An indirect way to begin to address this question is to examine trajectories of lip markers during speech events that limit lip opening. The labial consonants /p b m/ are events of this type. During their closures, no air exits the mouth, because the lips are fully together and form a tight seal. For this reason, we might expect the positions of lip markers, and the distances between them, to vary little during consonantal closure intervals.

The analysis of lip-marker kinematics described in this report was designed, in part, to address these simple expectations, and in general, to provide an improved understanding of two broader topics. The first relates to how midline labial fleshpoints move as lip closures are formed and released for the labial consonants /p b m/. The second relates to the question of whether reliable conclusions about labial constrictions are possible from something so simple as the sagittal-plane trajectories of one marker on each lip.

A practical benefit associated with a better understanding of the articulatory significance of lip-marker positions has to do with representing vocal tract postures for vowels. It is common to think of the vocal tract as a flexible tube, and to define the articulatory posture at any moment during speech in terms of an area function which represents the cross-sectional area of the tube as a function of its length, from the glottis to the lips. A plausible labial termination can easily be “attached” to a tube approximation of the vocal tract if length and degree of the lip constriction are simply related to the measured positions of upper and lower-lip markers. A sketch shown in Fig. 1 suggests a scheme for doing this. To a first approximation, constriction length might be represented by the length of a line segment such as A, drawn tangent to the lower-most edges of the maxillary incisors, and perpendicular to the segment connecting upper and lower-lip marker positions. Constriction degree reflected by the midsagittal separation between the lip surfaces, analogous to B, might then be represented by the distance C between the two markers, minus some reference measure of the lips’ combined thickness. A “candidate” reference measure might be some distance between markers when the opposing lip surfaces are in contact (e.g., during closure for a labial stop).

Figure 1. A stylized tracing of a midsagittal section of the vocal tract. Hypothetical pellets are represented by small solid circles “attached” to the outlines of both lips. Letter symbols and lines are defined in the text.
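The constriction-degree scheme suggested for Fig. 1 can be written out as a short calculation. The sketch below is only an illustration under stated assumptions: the function names, the pellet coordinates, and the 17 mm closure reference are hypothetical values chosen for the example and are not taken from the study, and clamping the estimate at zero is a convenience of this sketch rather than part of the scheme described above.

```python
import math

def pellet_distance(ul, ll):
    """Distance C (mm) between upper-lip and lower-lip pellet positions (x, y)."""
    return math.hypot(ul[0] - ll[0], ul[1] - ll[1])

def estimated_lip_separation(ul, ll, closure_reference_mm):
    """Estimate the midsagittal lip separation B as C minus a reference distance.

    closure_reference_mm stands for some inter-pellet distance measured while
    the lips are in contact (e.g., during closure for a labial stop).
    The result is clamped at zero (an assumption of this sketch): a negative
    difference would only mean the lips are more compressed than at the
    reference moment, not that the opening is negative.
    """
    return max(0.0, pellet_distance(ul, ll) - closure_reference_mm)

# Hypothetical pellet positions (mm) during a vowel; not data from the study.
ul_vowel = (10.0, 12.0)
ll_vowel = (3.0, -12.0)
reference = 17.0  # hypothetical closure reference distance (mm)

c = pellet_distance(ul_vowel, ll_vowel)                      # distance C
b = estimated_lip_separation(ul_vowel, ll_vowel, reference)  # estimate of B
print(f"C = {c:.1f} mm, estimated separation B = {b:.1f} mm")
```

The estimate is only as good as the reference distance, so any lip deformation during the reference closure carries over directly into the estimate.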

2. Methods and materials

Acoustic and labial fleshpoint kinematic data from a sample of 10 speakers of American English, and seven speakers of Japanese, were analyzed. Data from these speakers were not collected specifically for this analysis. Instead, they were available from existing corpora, in which the sound pressure wave had been sampled 21,739 times/s, while each of the speakers produced single examples of /p b m/ in isolated /ɑ_ɑ/ frames, with primary stress on the second vowel; and, single, isolated examples of the five vowels /i ɛ ɑ o u/.² During each brief speech task, sagittal-plane positions of upper-lip (UL) and lower-lip (LL) markers (gold pellets, 2.5 mm in diameter, attached in the midline at the vermillion border), were recorded at 40 and 80 times/s, respectively, using the X-ray microbeam (XRMB) system at the University of Wisconsin.³ Pellet positions were expressed relative to cranial axes, defined relative to each speaker’s maxillary occlusal plane (MaxOP) and central maxillary incisors (CMI), according to conventions described elsewhere (Westbury, 1994a).

² Throughout the text of this report, the mid-front, low-back, and high-back vowels of Japanese are transcribed phonemically as /ɛ ɑ u/, respectively. These symbols are correct for the English vowels, but possibly misleading for Japanese. Sources cited by Vance (1987) suggest that the mid-front vowel in Japanese may be phonetically “midway” between [e] and [ɛ]; the low-back vowel midway between [a] and [ɑ]; and the high-back vowel closer to [ɯ]. In Fig. 5 the symbols e and a are used to represent the mid-front and low-back vowels produced by both sets of speakers. This usage is due to graphical limitations imposed by plotting software.

³ The XRMB system sometimes fails to track one or more pellets for some or all time samples spanning a speech task. Examples of /m b/ were lost to mistracking for one Japanese speaker (J5). Consequently, only 49 VCV utterances were available for analysis, rather than the 51 that would be expected given one example each of /p b m/ in an [ɑ_ɑ] frame, for seventeen speakers. Two time samples of UL pellet position were also lost to mistracking for one English speaker (E61), during the release gesture for /m/. However, sufficient residual information was available in the latter case to approximate the two missing time samples in the trajectory, using a simple linear interpolation scheme.

Materials for the English speakers were drawn from the publicly available XRMB Speech Production Database (Westbury, 1994b) [speakers E29 (female), E31 (f), E34 (f), E35 (f), E41 (male), E44 (m), E53 (m), E54 (f), E59 (m), E61 (m)].⁴ Materials for the Japanese speakers (three males and four females) were drawn from a smaller, independently collected data set. For the Japanese dataset, utterances containing /p b m/ embedded in /ɑ_ɑ/ frames were written in katakana, and read aloud as meaningless words with primary accent on the second /ɑ/. Each isolated vowel production, also read aloud, was prompted by a single katakana letter corresponding to the sound.

⁴ Identification numbers for English speakers are the same as in the publicly available XRMB Speech Production Database, while the speech tasks correspond to vcv and vowel records from that corpus. Waveforms analyzed in the current study were processed and filtered in a slightly different way from those in the public release of the XRMB corpus, though these differences have no material effect on relative magnitudes of positional measures. Materials from these XRMB Database speakers, tasks, and/or waveforms may be used for similar or other purposes, in future work by other investigators, and it may become important to know which of the relevant materials were included in the current analysis. The 17 English and Japanese speakers included in this study are the same group also described in an independent analysis of vowel configurations (Hashi, Westbury & Honda, 1994).

Speakers in both samples were healthy and neurologically normal, with no evidence of speech pathology. Moreover, none was younger than 18 years, and all were native speakers of their language group. All English speakers had spent their linguistically formative years as residents of Wisconsin, and were essentially monolingual. Six of the seven Japanese speakers spoke a Tokyo dialect, while one (J5) spoke an Ibaragi dialect. J5 was included in the Japanese sample because his vowel formant frequencies were essentially like those of the Tokyo-dialect speakers. All Japanese speakers were nominally bilingual, but variably proficient in English. In general, data from the two groups were elicited, recorded, and processed according to procedures described in a handbook for the XRMB Database (Westbury, 1994b).

Several measurements were made from each VCV token, and from each isolated vowel, produced by each speaker. Measurements from representative VCV tokens are illustrated in Figs. 2 and 3. For each consonant in VCV tokens (e.g., Fig. 2), abrupt spectral changes in the acoustic wave associated with closure and release moments were identified and marked as $t_c$ and $t_r$, respectively (corresponding to vertical lines in Fig. 2, headed by unfilled circles 1 and 2). A time of minimum distance ($t_{\min}$) was marked at the moment when the separation between pellets was smallest during the [$t_c$, $t_r$] interval. Moments corresponding to local maxima in pellet speed⁵ preceding $t_c$ and following $t_r$ (e.g., Fig. 3), were identified for both pellets as $t_{cs}$ and $t_{rs}$, respectively. In each isolated vowel, an acoustically-defined “steady-state” moment ($t_v$) (Hashi, Westbury & Honda, 1994) was identified as the midpoint of the interval within which neither $F_1$ nor $F_2$ changed more than 40 Hz over any 20 ms interval. Sagittal-plane coordinates for both lip pellets, and the distance between them (hereafter, D), were determined at the four moments $\{t_c, t_r, t_{\min}, t_v\}$. Speed values for both pellets were determined at each of their respective moments $\{t_{cs}, t_{rs}\}$.

⁵ Following conventions that are customary in mechanics, the term speed is used to refer to the scalar magnitude of the velocity vector (dx/dt, dy/dt) that can be defined at each position sample for each pellet.

Figure 2. Representative data from /ɑmɑ/, produced by one English speaker (E29). Time histories are shown for lip-pellet separation D (top panel); lip-pellet coordinates (e.g., ULx, lower right); and the sound pressure wave. Sagittal-plane trajectories of upper and lower-lip pellets are shown in the lower left panel. Vertical lines intersecting time histories represent “event” times associated with judged moments of consonantal closure and release ($t_c$ and $t_r$, respectively), and the time of minimum distance between labial pellets ($t_{\min}$).

Figure 3. The sound pressure wave, and corresponding time histories of upper and lower-lip pellet speed, during /ɑpɑ/ produced by one English speaker (E31). All histories are intersected by vertical lines corresponding to the moments $t_c$ and $t_r$. Shorter vertical lines, labeled $t_{cs}$ and $t_{rs}$, intersect UL and LL speed histories, to indicate the times of occurrence of local maxima in pellet speed, just before $t_c$ and just after $t_r$.

For reference purposes, first and second formant frequencies ($F_1$ and $F_2$) were measured at $t_v$ during each isolated vowel, using an LPC-based formant tracking algorithm in Cspeech (Milenkovic & Read, 1992), with the number of LPC coefficients set to 24. The formant-tracking measurements were supplemented by independent spectrographic measurements, involving bandwidths of 300 and 500 Hz for male and female speakers, respectively. Measurements based upon formant tracks were accepted when differences between measurements were within 30 Hz for $F_1$, and 60 Hz for $F_2$. Spectrographic measurements of $F_1$ and/or $F_2$ were accepted when measurements differed by more than the relevant criterion. Scatterplots of $F_1$ and $F_2$ frequencies at $t_v$ are shown in Fig. 4, for both speaker samples.
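The two acoustic rules in this section, the 40 Hz / 20 ms steady-state criterion and the 30 Hz / 60 Hz acceptance criterion for formant measurements, can be sketched as follows. This is one possible reading of the text, not the authors' procedure: the 5 ms frame step, the interpretation of "changed" as the range of values within a window, the choice of the longest qualifying stretch, and all function names and formant values are assumptions made for the illustration.

```python
def steady_state_midpoint(f1, f2, frame_ms=5.0, window_ms=20.0, max_change_hz=40.0):
    """Midpoint (ms) of the longest stretch in which neither formant track
    varies by more than max_change_hz inside any window_ms-long window.

    f1, f2: formant tracks in Hz, one value per frame (frame_ms is assumed).
    Returns None when no window satisfies the criterion.
    """
    w = int(round(window_ms / frame_ms))   # frame steps spanned by one window
    n = min(len(f1), len(f2))
    n_windows = max(0, n - w)              # window i covers frames i .. i + w

    def stable(i):
        return all(max(f[i:i + w + 1]) - min(f[i:i + w + 1]) <= max_change_hz
                   for f in (f1, f2))

    best_start, best_len, run_start = None, 0, None
    for i in range(n_windows + 1):         # the extra pass closes a trailing run
        ok = i < n_windows and stable(i)
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    if best_start is None:
        return None
    first_frame = best_start
    last_frame = best_start + best_len - 1 + w
    return 0.5 * (first_frame + last_frame) * frame_ms

def accept_formant(tracked_hz, spectrographic_hz, criterion_hz):
    """Keep the tracked value if both measurements agree within the criterion,
    otherwise fall back on the spectrographic value."""
    if abs(tracked_hz - spectrographic_hz) <= criterion_hz:
        return tracked_hz
    return spectrographic_hz

# Hypothetical formant tracks (Hz), 5 ms per frame; not data from the study.
f1_track = [300, 400, 500, 600, 700, 700, 700, 700, 700, 700, 700, 600, 500]
f2_track = [1200] * 13
t_v = steady_state_midpoint(f1_track, f2_track)   # stable over frames 4 to 10
f1_final = accept_formant(710, 700, criterion_hz=30)     # agree: tracked value kept
f2_final = accept_formant(1200, 1290, criterion_hz=60)   # disagree: spectrographic value
print(t_v, f1_final, f2_final)
```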

3. Results

3.1. Lip-pellet motions accompanying labial closure and release for /p b m/

Close inspection of data from all speakers revealed no systematic differences in pellet movements associated with the three different labial consonants, and relatively few differences in lip-pellet motions produced by English and Japanese speakers. During all utterances containing /p b m/, both pellets moved toward one another before $t_c$, and away from one another after $t_r$. No speaker moved only one lip when making these sounds. At least one lip pellet also continued to move throughout each closure produced by each speaker.

Movements of the UL pellet during the interval surrounding [$t_c$, $t_r$] usually covered distances no larger than 5 mm, and tended to be about twice as large in the y-direction (normal to the maxillary occlusal plane) as in the x-direction (parallel to the occlusal plane). UL trajectories were sometimes looping and partly elliptical, though on the whole there was no stereotyped movement pattern.

Movements of the LL pellet during the interval surrounding [$t_c$, $t_r$] followed paths that were initially upward and slightly forward, and then downward and rearward covering distances of roughly 20 mm from or toward positional extrema associated with the adjacent, low back vowel /ɑ/. In general, LL trajectories were either line-like, in which the approach and retreat segments, toward and away from the local positional maximum at $t_{\min}$, fell along roughly the same path; or, they were shaped something like an inverted letter Y, V, or U, in which the approach path was often steeper than, and always forward of, the retreat path. As a rule, the LL pellet began moving toward closure, synchronously in the x and y directions, about 75-100 ms before $t_c$. In the majority of cases, both coordinate histories⁶ tended to reach their local maxima at about the same time, rarely more than 75 ms after $t_c$. Then, the pellet moved smoothly down and back, through the release moment.

⁶ The term history is used to refer to any time-series record of a measured or derived quantity (e.g., the location of a pellet in either the x or y direction of the sagittal plane; or, the speed of a pellet, as it moves along its respective sagittal-plane trajectory). The term trajectory is reserved to refer to the path traced by a pellet moving in a plane.

Figure 4. Scatterplots of frequencies of first and second formants ($F_1$ and $F_2$, respectively) at $t_v$, for English and Japanese speaker samples. The vowels /i ɛ ɑ o u/ are represented by unfilled circles, upright triangles, squares, inverted triangles, and diamonds, respectively. Identification numbers for speakers are shown inside their respective symbols.

An inflection occurred at about $t_c$ in the majority of UL and LL trajectories. This inflection is too small to be seen clearly in the UL trajectory shown in the lower left panel of Fig. 2, but corresponds to subtle “direction” changes in the UL x- and y-coordinate histories in the vicinity of $t_c$, in the lower right panel of the same figure. The closure-related inflection in the LL trajectory most often corresponded to a local change in the slope of the pellet’s y-coordinate history immediately before and after $t_c$ (cf. the lower right panel of Fig. 2). The “bend” or “knee” near the top of the ascending leg of the LL trajectory, beginning at about the unfilled circle 1 marking $t_c$, illustrates the phenomenon.

Both pellets speed up as they approach $t_c$; slow down during the [$t_c$, $t_{\min}$] interval; and then, speed up again as they move toward and through the moment $t_r$. This stereotyped pattern is illustrated in Fig. 3, in which local maxima in speed histories derived from both the UL and LL trajectories occur shortly before $t_c$, and shortly after $t_r$. Descriptive statistics for times of occurrence of these maxima, relative to $t_c$ and $t_r$ for both pellets, and computed across sounds, talkers and language groups, are shown in Table I. On average, the local maximum in UL speed prior to $t_c$ occurred slightly before the local maximum in LL speed also preceding that moment. Conversely, the local maximum in UL speed after $t_r$ occurred slightly later than the local maximum in LL speed also occurring after that moment. The UL and LL pellet speeds at their respective $t_{cs}$ and $t_{rs}$ moments, averaged across sounds and talkers but within each language group, are shown in Table II. Several generalizations within and across groups are possible from these data, though it is important not to overstate their significance in view of the small sample sizes.
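The speed measures summarized in Tables I and II can be illustrated with a small sketch that derives a speed history from pellet coordinates and expresses the closing and opening speed peaks relative to $t_c$ and $t_r$. Several parts are assumptions: the source does not state how derivatives were computed (central differences are used here), nor the exact peak-picking rule (here, the largest local maximum on either side of the closure midpoint, which allows a peak to fall slightly inside the closure interval). The function names and sample values are hypothetical.

```python
import math

def speed_history(xs, ys, rate_hz):
    """Speed (mm/s) at each sample: magnitude of (dx/dt, dy/dt), estimated
    with central differences (one-sided at the ends). Needs >= 2 samples."""
    n = len(xs)
    speeds = []
    for i in range(n):
        a, b = max(i - 1, 0), min(i + 1, n - 1)
        dt = (b - a) / rate_hz
        speeds.append(math.hypot(xs[b] - xs[a], ys[b] - ys[a]) / dt)
    return speeds

def local_maxima(values):
    """Indices of interior samples that exceed the previous sample and are
    not exceeded by the next one."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]

def speed_peak_times(speeds, rate_hz, t_c_ms, t_r_ms):
    """Return (t_cs - t_c, t_rs - t_r) in ms, or None where no peak exists.

    Assumed rule: t_cs is the largest local maximum before the midpoint of
    the closure interval, t_rs the largest one at or after that midpoint.
    """
    mid_ms = 0.5 * (t_c_ms + t_r_ms)
    peaks = [(1000.0 * i / rate_hz, speeds[i]) for i in local_maxima(speeds)]
    closing = [p for p in peaks if p[0] < mid_ms]
    opening = [p for p in peaks if p[0] >= mid_ms]
    t_cs = max(closing, key=lambda p: p[1])[0] if closing else None
    t_rs = max(opening, key=lambda p: p[1])[0] if opening else None
    return (None if t_cs is None else t_cs - t_c_ms,
            None if t_rs is None else t_rs - t_r_ms)

# Hypothetical speed history (mm/s) sampled at 100 Hz; not data from the study.
speeds = [10, 50, 120, 80, 20, 5, 15, 90, 150, 60]
rel_cs, rel_rs = speed_peak_times(speeds, rate_hz=100, t_c_ms=35, t_r_ms=65)
print(rel_cs, rel_rs)  # negative: peak precedes the event; positive: follows it
```

A negative first value and a positive second value correspond to the typical pattern reported in Table I.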
The strongest of these generalizations relates to the fact that speeds for the UL pellet were noticeably lower than for the LL pellet, at either $t_{cs}$ or $t_{rs}$. Other generalizations include the fact that: (1) pellet speeds were greater among Japanese than English talkers, at each measurement time except $t_{rs}$; (2) among English talkers, the maximum LL speed prior to $t_c$ was systematically lower than the maximum LL speed after $t_r$ (cf. Sussman, MacNeilage & Hanson, 1973), while the UL pellet speeds at the two moments were about the same; and (3) among the Japanese talkers, maximum pellet speeds were higher before $t_c$ than after $t_r$, though the relevant distributions were overlapping.

TABLE I. Statistics for relative times of occurrence of local maxima in UL and LL speeds. Event times associated with $t_{cs}$ and $t_{rs}$ are expressed relative to $t_c$ and $t_r$, respectively. Thus, a negative number indicates that $t_{cs}$ (or $t_{rs}$) occurred before $t_c$ (or $t_r$). Conversely, a positive number indicates that $t_{cs}$ (or $t_{rs}$) followed $t_c$ (or $t_r$). Data are pooled across consonants, speakers, and language groups. Measures are in ms

              t_cs re t_c                  t_rs re t_r
              x̄      σ     Range          x̄      σ     Range
UL (n=49)     −28    13    (−62, −7)      26     26    (−28, 72)
LL (n=49)     −21    12    (−56, 0)       14     9     (−3, 36)

TABLE II. Statistics for the magnitudes of (maximum) pellet speed at $t_{cs}$, and at $t_{rs}$. Measures are in mm/s

                Closing speed (at t_cs)       Opening speed (at t_rs)
                x̄      σ     Range            x̄      σ     Range
UL (n=19) J     61     21    (25, 96)         45     16    (21, 77)
UL (n=30) E     34     14    (15, 74)         33     12    (15, 75)
LL (n=19) J     194    47    (106, 300)       178    29    (118, 238)
LL (n=30) E     140    44    (61, 246)        180    51    (103, 281)

3.2. Lip pellet positions at $t_c$, $t_r$, $t_{\min}$, and $t_v$

Scatterplots of average lip-pellet positions (computed across speakers, within each language sample)⁷, at closure-related moments for consonants ($t_c$, $t_r$, and $t_{\min}$), and “steady-state” moments for vowels ($t_v$) are shown in Fig. 5. Up, toward the top of the head (along lines perpendicular to the maxillary occlusal plane, MaxOP), is toward the top of each panel in the figure, while forward, in the direction toward the face (expressed relative to the central maxillary incisors, CMI), is toward the right. Average sagittal-plane pellet positions for vowels are indicated by filled, labeled circles, while those for consonants are indicated by labeled, unfilled symbols (squares and circles).

⁷ Speakers’ lips differ in size and shape. Morphological differences may account for certain between-group differences in data described in this report. For example, apparent group differences in linear slopes and y intercepts relating LL pellet-position coordinates, shown in Fig. 5, might just as easily represent a “morphological artifact” associated with the specific speakers included in either group, as a global distinction between the two languages. Similarly, the greater pellet separations for Japanese than English speakers, at $t_v$ for matched vowels (cf. Table III), were probably due to a general group difference in lip thickness. When morphological differences between talkers are extreme, and/or the directions and relative magnitudes of variable effects covary with morphology, some procedure for speaker normalization may be necessary before sample statistics can be calculated across speakers. Nothing about data summarized in this report suggested any strong argument for normalization. Consequently, data from individual speakers were merely “added together” before statistical calculations.

Several generalizations can be drawn from data illustrated in Fig. 5. One simple rule of thumb is that average UL position was about the same for all three consonants, at $t_c$, $t_r$, and $t_{\min}$, within each speaker sample. This result is illustrated by the cluster of overlapping positions, and (appropriately) illegible labels, for the unfilled squares and circles in the upper (UL) portions of the right and left panels of the figure. Within each language sample, average LL position was also about the same for different consonants measured at the same moment, but differed according to the moment at which measurements are made, being highest at $t_{\min}$ (indicated by a single unfilled circle, labeled mn); somewhat lower at $t_c$ (the higher triple of labeled, unfilled squares) and lower still at $t_r$ (in each panel, the lower triple of labeled, unfilled squares). Because the lips were separated during vowels, average UL positions were higher for vowels than for consonants and average LL positions were generally lower for vowels than consonants, though average LL position for /u/, within both language samples, fell within the lower portion of the range of LL positions observed during the consonants.

Data plotted in Fig. 5 also show that the UL pellet was systematically further forward at $t_v$ for /o u/ than for /i ɛ ɑ/, in both language samples.
For the LL pellet, within both speaker samples, sounds and moments associated with the highest positions (e.g., /u/) also exhibited relatively anterior positions. The highest LL position, at $t_{\min}$, was also the most forward. The lowest LL position, at $t_v$ for /ɑ/, was also the one furthest back. The high, positive, correlation coefficients, shown in the same illustration, indicate that average x and y coordinates for the LL pellet were strongly related within language groups.

Figure 5. Average pellet positions, computed across speakers within groups, at $\{t_c, t_r, t_{\min}, t_v\}$. Open squares (e.g., labeled pc, bc, mc, br and mr) indicate pellet positions at closure and release for /p b m/. Open circles (labeled mn) indicate average position at $t_{\min}$, computed across all three consonants. Average positions at that moment differed so little among the three consonants that it is impractical to use separate labeled symbols. Solid circles (labeled i, e, a, o, and u, for /i ɛ ɑ o u/, respectively) indicate average positions at $t_v$.

3.3. Lip-pellet separations during consonants and vowels

Fig. 2 includes the derived history of inter-lip-pellet distance D, shown above the speech wave. The intersections between vertical lines indicating closure and release moments $t_c$ and $t_r$, and D(t), are indicated by short horizontal lines, to emphasize lip pellet separation at both moments. Two facts about D during /p b m/ were notable. The first is that the distance between upper and lower-lip pellet positions was never constant during the closure interval for any of these consonants. Instead, D always decreased for a period after $t_c$, and then began to increase again as $t_r$ was approached. Across speakers and consonants, the decrease in pellet separation, between $t_c$ and $t_{\min}$, averaged 3.1 mm ($\sigma$ = 1.3 mm; and ranged between 0.7 and 6.6 mm). The decrease in D after $t_c$ was about the same for different consonants, but somewhat larger among Japanese than English speakers. Across speakers and consonants, the time to minimum D (i.e., [$t_{\min} - t_c$]) averaged 51 ms ($\sigma$ = 20 ms; and ranged between 13 and 106 ms), and was about the same for different consonants and speaker samples. The amount by which D decreased after closure, and the time to minimum D, were moderately strongly correlated ($r_{xy}$ = 0.55) across sounds, talkers, and language samples.

A second notable fact about D is that it was usually greater at $t_r$ than at $t_c$ (by 1.5 mm on average, across speakers and consonants). The difference $D(t_r) - D(t_c)$ was positive in 43 of 49 comparisons, ranged between −0.8 and 4.2 mm, and “belonged” largely to a difference in LL pellet position.

Lip-pellet separations at $t_v$ are shown for each speaker in Table III. D was smallest at $t_v$ for /u/ produced by 15 of 17 talkers, and largest for /ɑ/ for 14 of 17 talkers. In general, D was about 5 mm greater among the Japanese than English speakers, at $t_v$ for each of the five vowels. A composite measure of pellet separation at $t_c$ (hereafter, $AvD_c$), averaged across /p b m/ for each speaker, is indicated in the rightmost column. The significance of this measure will be discussed in a subsequent section.

TABLE III. Pellet separations (in mm) by speaker, for isolated vowels at $t_v$. The column labeled $AvD_c$, described in the text, refers to average pellet separation at $t_c$, computed across /p b m/, for each speaker

Speaker   /i/     /ɛ/     /ɑ/     /o/     /u/     AvD_c
E29       25.2    34      36.2    22.6    18.6    16.9
E31       26.5    30.5    38.2    22.1    20.5    15.6
E34       30.6    33.6    35.5    22.9    21.4    18.4
E35       22.4    23.9    31.4    22.2    20.6    15.5
E41       31.8    36.2    40.1    30.2    30.1    19.4
E44       24.9    28.2    27.9    21.2    21.9    17.5
E53       30.6    38.1    42.2    34.5    25.3    20.2
E54       20.6    23.1    24.8    20.5    17.8    15.5
E59       25.5    28      27      20.9    19.7    18.5
E61       23.2    26      30.9    23.2    20.3    20
J1        35.1    35.5    37.9    33.2    27.4    22.8
J2        33.8    36.6    41      29.3    25.5    20.8
J3        26.3    32.3    33.4    30      25.9    22.2
J4        38.8    39.6    48.8    43      29.6    20.2
J5        31.7    36.4    35.7    26      26.5    23.8³
J6        24.7    27.5    30.3    29.9    23      18.5
J7        24.7    34      36.6    34      32.8    25
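The closure-interval measures reported in this section (the decrease in D after closure, the time to minimum D, and the difference between D at release and at closure) can be computed from a D history as sketched below. This is an illustration only: it assumes both pellets have already been brought onto a common time base (the UL and LL pellets were sampled at different rates), it takes the first and last samples inside the closure interval as D at $t_c$ and $t_r$, and the function names and sample values are hypothetical rather than data from the study.

```python
import math

def distance_history(ul_xy, ll_xy):
    """Inter-pellet distance D (mm) for paired (x, y) samples of UL and LL."""
    return [math.hypot(u[0] - lo[0], u[1] - lo[1]) for u, lo in zip(ul_xy, ll_xy)]

def closure_measures(times_ms, d_mm, t_c_ms, t_r_ms):
    """Summarize D within the closure interval [t_c, t_r].

    D at t_c and t_r is taken from the first and last samples inside the
    interval (a simplification of this sketch).
    """
    inside = [(t, d) for t, d in zip(times_ms, d_mm) if t_c_ms <= t <= t_r_ms]
    if len(inside) < 2:
        raise ValueError("need at least two samples inside the closure interval")
    d_c, d_r = inside[0][1], inside[-1][1]
    t_min, d_min = min(inside, key=lambda p: p[1])   # moment of smallest D
    return {
        "t_min_ms": t_min,
        "decrease_mm": d_c - d_min,             # drop in D between t_c and t_min
        "time_to_min_ms": t_min - t_c_ms,       # t_min - t_c
        "release_minus_closure_mm": d_r - d_c,  # D(t_r) - D(t_c)
    }

# Hypothetical D history (mm) at 10 ms steps; not data from the study.
times = [0, 10, 20, 30, 40, 50, 60]
d = [22.0, 18.0, 16.5, 15.0, 15.5, 19.5, 24.0]
m = closure_measures(times, d, t_c_ms=10, t_r_ms=50)
print(m)
```

Collecting `decrease_mm` and `time_to_min_ms` over many tokens would give the two variables whose correlation is reported above.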

4. Discussion

4.1. Inferences about articulatory dynamics of labial gestures

Patterns among the positions and motions of lip pellets, accompanying vowels and consonants described in this report, prompt qualitative insights and interesting speculations about the dynamics of labial gestures. For example, the small but systematic inflections in pellet trajectories at about $t_c$ for most instances of intervocalic /p b m/ suggest that the forces bringing the lips together are large enough, and timed in such a way as to deform the lips’ shape when they contact one another (cf. Fujimura, 1990). This straightforward inference accounts for two facts established by the data.

The first relates to changes in the inter-lip-pellet distance D that always occur after $t_c$. D will decrease between $t_c$ and $t_{\min}$ if the lips compress as their own inertial forces, and diminishing muscular (closing) forces, dissipate into the soft tissue (cf. Folkins & Abbs, 1975). Conversely, D will increase between $t_{\min}$ and $t_r$ as “opening” forces accelerate the lips apart from a compressed state. The fact that D is systematically larger at $t_r$ than $t_c$ may be due to a vertical stretch of one or both lips as they are drawn apart, deformed by opening forces and/or “held together” by tension generated between their contacting surfaces (cf. Schulman, 1989). It is probably significant that the UL pellet often seemed to follow the LL pellet downward during the release gesture after $t_{\min}$ (cf. the ULy history in Fig. 2), as the LL pellet moved down from its own local, mid-closure extremum.

A second fact likely due to deformation of the lips’ shapes relates to changes in the direction of pellet motion associated with inflections in lip-pellet trajectories just after $t_c$. Total lip volume must remain constant. Pressing the lips together should cause them to deform either laterally (normal to the sagittal plane, which cannot be seen in microbeam data), and/or horizontally, in the x dimension of the sagittal plane.
In this horizontal dimension, the lips cannot distend rearward, since both are bounded behind by buccal surfaces of the teeth. Thus, any horizontal deformation should occur in the space in front of the lips, and would appear as small “protrusive” movements, like those that can be seen during the [t_b, t_min] interval in the x-coordinate histories for UL and LL in Fig. 2. The fact that inflections in UL and LL trajectories routinely occurred at about t_b, where this moment was judged independently from spectral changes in the acoustic wave (e.g., changes involving rapid reductions in energy across the spectrum), supports the idea that the spectral changes themselves reflect discontinuities in lip kinematics.

It is curious that compression during the [t_b, t_min] interval seems to be largely restricted to the lower lip. The position of the UL pellet at t_min (cf. Figs. 2 and 5) is not much different from its position at other moments during consonantal closures. In contrast, the position of the LL pellet at t_min is usually upward and forward of its position at t_b. This difference in position of the two pellets suggests that the intrinsic mechanical properties of the lower lip, and/or the control forces applied to its position, may be significantly different from those affecting the upper lip. Recent comments by Honda, Kurita, Kakita & Maeda (1995) suggest a basis for such a difference.

The timing of local maxima in UL and LL speed histories, relative to t_b and t_r (cf. Fig. 3), provides information about the timing of forces affecting the lips. The data show, for example, that local maxima in pellet speeds before t_b occurred within a narrow temporal window, at most 50 ms wide, and centered about −20 ms with respect to t_b. In short, lip pellets do not begin decelerating until very shortly before labial closure. The relatively tight “temporal coupling” between deceleration and t_b suggests that deceleration may be driven more by increasing contact than by any muscular forces which could slow the lips’ approach. From this point of view, lip movements toward closure for /p b m/ are probably not delicate acts. Instead, speakers can accelerate the lips toward one another relatively coarsely, and take advantage of the fact that their collision and compression will create closure. Restoring forces arising from compression may even contribute toward release. At some time after t_b, lip pellet speeds would be expected to increase, as restoring forces arising from compression, assisted by muscular “opening” forces, accelerate the lips apart. It is interesting to learn that the release acceleration ends, on average, no more than 25 ms after t_r.

A negative result in the current data that may refine some views about labial gesture dynamics relates to the fact that no systematic differences were observed among pellet movements for /p b m/. Across talkers, pellet trajectories for the three consonants did not differ in extent or shape. Moreover, the timing and magnitude of local maxima in approach and release speeds, averaged across talkers, were indistinguishable by consonant. The strong similarity in lip movements for /p b m/ is consistent with observations by Browman & Goldstein (1986, p. 233), but contrary to a report by Fujimura (1961), based upon an analysis of high-speed (240 frames/s) stroboscopic data recorded from one talker, and to a report by Sussman et al. (1973), based upon strain-gauge data from five talkers. Results from Fujimura’s study, for example, revealed differences in lip configurations just before the release moment, and damped oscillations after release, for /p b/ but not /m/. Fujimura (op cit., p.
236) attributed these differences to an “overpressure behind the obstruction” for /p b/, and suggested that this pressure represents a significant mechanism for effecting consonantal release. Sussman et al. (1973) also proposed an aerodynamic argument to account for sound-related differences in their electromyographic and kinematic data. The fact that no differences in fleshpoint kinematics among /p b m/ were found in this study casts doubt on the influence of intra-oral pressure, for the current sample of talkers and speech tasks. It is important to remember, however, that comparisons of the current results with those from other studies are complicated by differences in transduction and sampling methods.

4.2. Implications of pellet positions and separations for lip openings during vowels

Speakers use the lips in specific and distinctive ways to create differences between some speech sounds. In English, the rounded vowels /o u/ are accompanied by protrusion of the lips (Perkell, 1969; Linker, 1982), and narrowing of the height and width of the opening between them (Fromkin, 1964). The relatively narrow average separations and protruded positions of lip pellets at t_v for /o u/, shown in the left half of Fig. 5 for English speakers, are expected. Among Japanese speakers, however, comparable pellet positions for /u/ and /o/, and especially the protruded position of the UL pellet for /u/, shown in the right half of Fig. 5, are surprising. In descriptive phonetic accounts (cf. Vance, 1987, p. 10), Japanese /o/ is often said to be the only vowel of the language “that involves active lip rounding,” while /u/ “is commonly described as unrounded.” Hattori (1950), for example, generally considered the labial feature of Japanese Tokyo-dialect /u/ to be “spread” (with corners of the mouth retracted). The fact that Japanese speakers described in this report were bilingual in English may account for their unexpected lip protrusions during /u/. An alternate view is that their /u/ protrusion reflects hyper-articulation associated with laboratory speech.

Considered together, position data for the two pellets and speaker groups imply that rounding/protruding gestures for vowels, and closing gestures for consonants, may differ in kind for the upper lip, but not for the lower lip. Data summarized graphically in Fig. 5 show, for example, that the UL pellet was lower for labial consonant closures in a way that was not simply pushed more forward, and forward for rounded vowels in a way that was not simply lowered. In contrast, forward and up (and rearward and down) generally fell along a line for the LL pellet. Thus, these data suggest that the kinematic working space of the middle section of the lower lip, assisted by any contribution from the lower jaw on which it rides, may be chiefly one-dimensional. Macchi (1988) reached a similar conclusion about the lower lip, from additional data from two other speakers.

The distances separating UL and LL positions at different measurement times provide useful hints about the size of the lip opening, though data summarized in Fig. 5 suggest caution in using inter-pellet distances to infer lip surface separation. Note, for example, that there is no one inter-lip-pellet distance that corresponds to labial closure. Any of several inter-pellet distances can be taken to indicate that the lips are closed. One implication of this fact is that there is no unambiguous way to estimate mid-line surface separation between the lips when they are apart (e.g., during a vowel), merely by subtracting some closure-related reference value from the distance between the pellets at some other moment of interest. The composite measure of pellet separation at t_b, AvD_c, included in the rightmost column of Table III, is one of several “candidate” reference values for representing the combined mid-line thickness of the lips when the distance between them is zero.
Other candidate values that could be considered include some index of inter-pellet separation at t_min (probably too small), or at t_r (probably too large).

Working estimates of mid-line lip separation for /i 2 " o u/, derived by subtracting AvD_c from the observed pellet separation at t_v for each talker and vowel, averaged 9.0, 12.5, 15.7, 7.7 and 4.5 mm, respectively, across the sample of 17 talkers described in Table III. These estimates are broadly in line with direct mid-line surface-separation measurements reported by Fromkin (1964). This fact suggests that plausible separation estimates from lip pellet data are possible, provided that a reference value corresponding to labial closure is carefully selected.

Interpreting pellet-position data in terms that are “larger” than the pellets themselves (e.g., in terms that refer either (1) to positions of the lips as a whole, the tongue, soft palate, or jaw, or (2) to the geometry of constrictions formed between these mobile articulators and vocal tract boundaries) is a problem common to all point-parameterized speech kinematic data. Solving even the first of these interpretative problems, for deformable bodies like the tongue, soft palate, and lips, whose anatomic partitions are not completely understood, and which may in fact not be discontinuous, is a difficult problem (cf. Fujimura, 1990). It is possible that very many measurement points, supplemented by high-resolution images, are necessary for general descriptions of soft body motions. However, multiple, closely-spaced measurement points are impractical for techniques such as the X-ray microbeam method, or electromagnetic articulometry. Highly accurate data supplied by these techniques must be supplemented by other information if we wish to extrapolate from fleshpoints to articulators, or to time-variations in vocal tract constrictions.
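
The reference-subtraction estimate of mid-line lip separation discussed earlier in this section can be illustrated with a minimal sketch. It assumes that the inter-lip-pellet distance D is the Euclidean distance between the UL and LL pellets in the sagittal plane, and that a closure reference such as AvD_c is already available for the talker. The function names, the clamp at zero, and the numerical values are hypothetical choices made for illustration only; they are not measurements or procedures taken from this study.

```python
import math


def inter_pellet_distance(ul, ll):
    """Euclidean distance (mm) between UL and LL pellet positions given as (x, y)."""
    return math.hypot(ul[0] - ll[0], ul[1] - ll[1])


def estimate_lip_separation(ul_at_vowel, ll_at_vowel, closure_reference):
    """Pellet distance at the vowel measurement time minus a closure reference.

    The result is clamped at zero (an added assumption), because a negative
    surface separation has no physical meaning.
    """
    d = inter_pellet_distance(ul_at_vowel, ll_at_vowel)
    return max(0.0, d - closure_reference)


# Made-up illustrative values in mm; NOT data from this study.
ul = (5.0, 12.0)   # hypothetical UL pellet position at the vowel measurement time
ll = (0.0, 0.0)    # hypothetical LL pellet position at the same moment
avd_c = 8.5        # hypothetical closure reference for one talker

separation = estimate_lip_separation(ul, ll, avd_c)
print(separation)
```

Substituting a different closure reference (for example, a pellet separation taken at t_min or at t_r) changes every estimate by a constant offset for that talker, which is why the choice of reference value matters.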
Certain fine details about labial fleshpoint motions that accompany intervocalic consonants and isolated vowels, produced by different speakers, enhance an appreciation of the interpretative problems posed by point-parameterized data, and expose some of the limitations we must bear in mind when we try to use such data to understand the actions and control of the lips during speech.

Research support was provided by USPHS Grant DC00820, and a collaborative research agreement between the University of Wisconsin-Madison and ATR Human Information Processing Research Laboratories, Kyoto, Japan. Constructive comments from Jim Dembowski, Kiyoshi Honda, Osamu Fujimura, Patrice Beddor, and an anonymous reviewer are gratefully acknowledged.

References

Abry, C. & Boë, L. (1986) “Laws” for lips, Speech Communications, 5, 97–104
Browman, C. P. & Goldstein, L. M. (1986) Towards an articulatory phonology, Phonology Yearbook, 3, 219–252
Boyce, S. (1990) Coarticulatory organization for lip rounding in Turkish and English, Journal of the Acoustical Society of America, 88, 2584–2595
Folkins, J. & Abbs, J. (1975) Lip and jaw motor control during speech: responses to resistive loading of the lower jaw, Journal of Speech and Hearing Research, 18, 207–220
Fromkin, V. (1964) Lip positions in American English vowels, Language and Speech, 7, 215–225
Fujimura, O. (1961) Bilabial stop and nasal consonants: a motion picture study and its acoustical implications, Journal of Speech and Hearing Research, 4, 233–247
Fujimura, O. (1990) Articulatory perspectives of speech organization. In Speech Production and Speech Modeling (W. J. Hardcastle & A. Marchal, eds) Kluwer Academic Publishers: Dordrecht, The Netherlands, 324–342
Hattori, S. (1950) Onseigaku. Tokyo, Japan, Iwanami Shoten
Hashi, M., Westbury, J. & Honda, K. (1994) Articulatory and acoustic variability of vowels in Japanese and English, Journal of the Acoustical Society of America, 95, 2820
Honda, K., Kurita, T., Kakita, Y. & Maeda, S. (1995) Physiology of the lips and modeling of lip gestures, Journal of Phonetics, 23, 243–254
Houde, R. A. (1968) A study of tongue body motion during selected speech sounds. Speech Communications Research Laboratory Monograph No. 2: Santa Barbara, California
Linker, W. (1982) Articulatory and acoustic correlates of labial activity in vowels: a cross-linguistic study, UCLA Working Papers in Phonetics, 56
Macchi, M. (1988) Labial articulation patterns associated with segmental features and syllable structure in English, Phonetica, 45, 109–121
Milenkovic, P. H. & Read, C. (1992) Cspeech Version 4 User’s Manual. Madison, WI
Perkell, J. S. (1969) Physiology of speech production. MIT Press: Cambridge, MA
Perkell, J., Matthies, M., Svirsky, M. & Jordan, M. (1993) Trading relations between tongue-body raising and lip rounding in production of the vowel /u/: a pilot “motor-equivalence” study, Journal of the Acoustical Society of America, 93, 2948–2961
Ramsey, J. O., Munhall, K. G., Gracco, V. L. & Ostry, D. J. (1996) Functional analysis of lip motion, Journal of the Acoustical Society of America, 99, 3718–3727
Schulman, R. (1989) Articulatory dynamics of loud and normal speech, Journal of the Acoustical Society of America, 85, 295–312
Sussman, H., MacNeilage, P. & Hanson, R. (1973) Labial and mandibular dynamics during the production of bilabial consonants: preliminary observations, Journal of Speech and Hearing Research, 16, 397–420
Vance, T. (1987) An introduction to Japanese phonology. Albany, State University of New York
Westbury, J. (1994a) On coordinate systems and the representation of articulatory movements, Journal of the Acoustical Society of America, 95, 2271–2273
Westbury, J. (1994b) X-ray microbeam speech production database user’s handbook. Madison, WI
