WORD ISSN: 0043-7956 (Print) 2373-5112 (Online)
Auditory Phonetics
Herbert Pilch

To cite this article: Herbert Pilch (1978) Auditory Phonetics, WORD, 29:2, 148-160, DOI: 10.1080/00437956.1978.11435657

Published online: 16 Jun 2015.


HERBERT PILCH

Auditory Phonetics*

The most elementary judgments in phonetics are auditory judgments. They are judgments of auditory sameness and difference. Such judgments precede even the most sophisticated experimental research. The initial sections of the English words pill and pin sound the same, but they sound different from the initial sections of Bill and bin. The difference between pin and bin sounds the same as that between tin and din. It is on the basis of such elementary auditory judgments that we set up the English stops and their distinctive features, and that the Haskins group researches voice onset time.

Elementary phonetic information is passed on, necessarily, through oral transmission. Other kinds of knowledge may be gleaned from books, but did a phonetician ever learn from a textbook things like glottal fry (creaky voice), the lateral fricative of Welsh, or the tonelag of Norwegian?

We like to think of phonetics as a communication-oriented discipline.¹ Now how do we communicate? We communicate by listening. And yet, when we describe phonetic events, do we talk about the way we hear them? No, we talk about the output of our sophisticated machinery, and about high-level abstractions such as distinctive features and phonemes. For this purpose we have built up comprehensive networks of articulatory parameters and comprehensive networks of acoustic parameters. But there is nothing comparable in the domain of auditory parameters.

The very question I am posing is not generally put in this way; rather, auditory phonetics is understood as the study of test responses to acoustically defined stimuli. "Auditory analysis and the perception of speech" (in this sense) was the title of a symposium recently held in Leningrad.² "Perceptual phonetics" (in this sense) is the research of the Instituut voor Perceptieonderzoek in Eindhoven. The ear is modelled (for this purpose) as a frequency analyzer, an inbuilt spectrograph. Consequently, auditory phonetics appears to be a special branch of acoustic phonetics. I certainly see no reason to belittle this work, but I do wish to use the term auditory phonetics for something wider, perhaps something different: not just for the study of acoustically predefined stimuli, but generally for the auditory perception of linguistic stimuli, which are not necessarily predefined acoustically. This puts auditory phonetics on a par with articulatory and acoustic phonetics. Auditory phonetics (in this sense) should be closer to the way we communicate: not through acoustically predefined noises, but simply through (linguistically structured) noises.

The type of auditory category I have in mind does occur in the established canon of phonetic learning, but only in a few isolated instances, not as a comprehensive network. What I have in mind are categories like the HISS and the HUSH, the difference between /s/ and /ʃ/. The hiss and the hush are auditory terms; they describe what we hear. But we often prefer the articulatory labels, talking about apico-alveolar vs. lamino-alveolar fricatives, and we like to believe the articulatory specification is more "objective" than our "subjective hearing" can ever be. Little do we realize that the articulatory parameters are, in such cases, mere imitation labels: not objective, but speculative.³

To clinch the argument, let us consider one of the well-established facts about aphasia. I affirm, with confidence, that in paradigmatic aphasia the hiss-hush distinction is one of those lost early,⁴ because I have heard this happen with enough aphasic patients. Can I say as much for the apico-alveolar and the lamino-alveolar fricatives? I cannot; though I have heard many aphasic patients, I have seen very little of their tongue movements. So I should conclude that the auditory specification is more reliable, more "objective", than the articulatory imitation labels. The job of auditory phonetics is, then, not only to make auditory parameters available for isolated events such as the hiss and the hush, but to make available a comprehensive network of auditory parameters which will ideally cover all phonetic events. Once this has been achieved with the necessary precision, we will be able to dispense with a great deal of speculation, as in the current distinctive feature theories, whose feature specifications usually prove a little fanciful once they are subjected to serious articulatory and acoustic investigation.⁵

* This is the slightly revised text of a plenary session paper at the International Congress of Phonetic Sciences, Miami Beach, December 19th, 1977.

1 G. Ungeheuer, "Kommunikative und Extrakommunikative Gesichtspunkte in der Phonetik," Proc. 6th Int. Cong. Phonetic Sciences (1967), ed. B. Hála, M. Romportl, and P. Janota, Prague: 1970, pp. 73-85.

2 Ed. G. Fant and M. A. A. Tatham, London 1975.

3 I wholeheartedly agree with Bertil Malmberg: "Les faits perceptifs sont aussi 'objectifs', aussi 'réels', aussi 'mesurables', si les méthodes de mesure sont adéquates, que les faits physiques" [Perceptual facts are as "objective", as "real", as "measurable", if the methods of measurement are adequate, as physical facts] ("Changement de perspective en phonétique," in Nouvelles perspectives en phonétique, Brussels: 1970, p. 12).

4 As discovered by R. Jakobson, Kindersprache, Aphasie und allgemeine Lautgesetze, Uppsala: 1943, rpt. Collected Writings, I, The Hague: 1962, pp. 328-401.

I. AUDITORY CATEGORIES

I wish to report on my attempt to invent a network of auditory parameters. It is based on the noise vocabulary of English. The lexicon of any language embodies a great deal of auditory categorization. Like all colloquial vocabularies, noise vocabularies are fuzzy at many points, and they have plenty of gaps. For the purposes of a technical vocabulary, the fuzzy meanings need straightening out, and the gaps need filling. This is done by laying down precise technical meanings and coining new terms ad hoc. All this is standard procedure in the creation of technical vocabularies.

As one dimension of auditory space, let us consider noise words with different time characteristics:

(i) INSTANTANEOUS NOISE: bang, burst, click, crack, pop, snap, tap, thud, thunk.
(ii) BRIEF NOISE: beep, creak, peep, squeak, swish, zoom.
(iii) CONTINUOUS NOISE: buzz, drone, hiss, hum, hush, rasp, rustle.
(iv) FLUTTERY NOISE:⁶ clatter, gurgle, patter, pocketa-pocketa, rattle, rumble, sizzle, sputter.

The words of the first class refer to SINGLE, MOMENTARY noises, such as the crack of a whip, the pop of a cork, the thud of a book falling on a carpeted floor, or the thunk of a car door slamming shut. The words of the second class refer to SINGLE, BRIEF noises. They are brief, but not so brief as to be momentary: the beep of a radio transmitter, the creak of a wooden floor, the swish of a scythe cutting the air, the zoom of a jet plane passing overhead. If the swish of a whip is too short to be anything but momentary, it is no longer called a swish, but a crack. The words of the third class mean EVEN, CONTINUOUS noise. It is homogeneous and of indefinite duration. The buzz of a bluebottle may be heard briefly or for a long time; the swish of a scythe is necessarily brief. Otherwise it would no longer be called a swish but perhaps a whistle, like the whistle of an approaching artillery shell. The words of the fourth class mean UNEVEN, CONTINUOUS noise. It swells up and down, like the fluttery tone of a flute. The pattering of feet on the ground consists of many individual "patters" (sit venia verbo), but no individual "patter" is called this; it is called a thump or tap (or by some other instantaneous-noise word).

These four classes of noise words are, in relation to each other, ANTONYMOUS and EXHAUSTIVE. They are antonymous in the sense that each class differs in meaning from the other three. True, any one noise word of the colloquial vocabulary may have multiple meanings (witness swish, with its current slang meaning of "homosexual"). If this bothers us, we can make the word unambiguous by allowing only one of its meanings in the technical vocabulary. The four classes are EXHAUSTIVE in the sense that every noise word of English (hopefully) belongs to at least one of them. They thus do for us what we have asked auditory phonetics to do: they are clear, and they are comprehensive (as far as time characteristics are concerned).

They apply to the noise classes of phonetics fairly smoothly. The single instantaneous noises are our old friends the stops. Within this class the BURSTS (explosives), SNAPS (implosives), FLAPS and DOUBLE STOPS form well-known subclasses. A less well-known subclass has friction noise between the onset (snap) and the release (burst), as in the diaphones of /θ/ and /ð/ used in Ireland and Newfoundland. The single brief noises are the glides. The even, continuous noises are the continuants; the fluttery noises are the trills.

Is this more than just a set of new terms? Does auditory phonetics, in other words, teach us anything that is not handled at least equally well by acoustic and articulatory phonetics? Yes, it does, in that the auditory categories cut right across the articulatory categories in quite a few instances. Consider the phonemes of English. Auditorily, not only are the semivowels /h y w/ glides, but so are the so-called "lax vowels", the retroflex vowels and the "vocalized" diaphone of /l/ widely heard in American English. All these are glides in the sense specified above: they are of more than momentary duration, but not of indefinite duration. For a quick cross-check, try to draw out a retroflex vowel indefinitely. It soon no longer sounds "r-colored", even though it may still be retroflex in terms of tongue position. Try the same with a "vocalized /l/". Or draw out a lax vowel indefinitely, say the stressed /ɪ/ of the German word bitter. It soon sounds like the tense vowel /e/ of Beter, whatever its articulatory laxness may be.⁷

5 This has been admitted even by some M.I.T. phoneticians; cf. C. M. Bush: "The acoustic specification as presently detailed in the distinctive feature analysis is either inadequate or inappropriate for the four English fricatives under investigation [i.e. /f v θ ð/]." (Phonetic Variation and Acoustic Distinctive Features, The Hague: 1964, Janua linguarum series practica 12, p. 136). It has been known for a long time to readers of E. Zwirner, M. Joos and H. M. Truby. Mario Rossi has highlighted the inbuilt contradiction of a distinctive feature theory which claims to be adequate both to the abstract linguistic units (phonemes) and to the acoustic observation. He advocates distinctive features on the perceptual level in the psychoacoustic sense ("Les faits acoustiques," La linguistique 13 (1977), 63-82).

6 This term is taken from the technical vocabulary of flute playing, where it is a loan translation of German Flatterzunge.



The auditory specifications provide, at the same time, the phonetic motivation for why "lax" vowels must always be followed by a final consonant (in the stressed syllables of those Germanic languages in which lax and tense vowels contrast). As they are glides, they are not of indefinite duration, and so they cannot by themselves occupy the nucleus of the intonation contour (which demands lengthening). Thus auditory phonetics, far from being a mere terminological innovation, enables us to solve at one stroke two long-standing riddles (which have proven recalcitrant to even the most advanced methods of acoustic and physiological investigation): the phonetic specification of the tense-lax distinction, and the phonetic link between the lax vowels and the obligatory presence of final consonants. Why should these answers be found in the auditory domain more readily than in the acoustic or articulatory domain? Because the tense-lax distinction functions in communication not by virtue of its acoustic properties, but by virtue of the way we hear it.

The two major auditory dimensions other than time are RESONANCE and TIMBRE. In the colloquial vocabulary of English, timbre is described in terms of the antonymous pairs bright-dark, dull-clear, thin-full and soft-hard. These parameters apply, inter alia, to voice quality, a field which articulatory and acoustic phonetics have not clarified so far. Changes of voice quality can be HEARD in connection with certain pitch patterns. For instance, in English a change from hard to soft voice is associated with the extra-high pitch on a concave-fall nucleus. I cannot, at this point, go into further detail.⁸

II. THE AUDITORY TEXT

The cross-check on the validity of the auditory network is provided by the well-known confusion matrices. The lax vowels of English, for instance, are more readily confused with each other than with the tense vowels.⁹ The confusion matrices are customarily taken to indicate that the listener is in error: a given lax vowel (say /ɪ/, as in pit) was "factually present", but the listener "mistook it" for another lax vowel (say /ɛ/, as in pet). Now could it not be that it was not the listener who was mistaken, but the investigator? Not necessarily in the simple, complementary sense that the investigator mistook a "factual /ɛ/" for /ɪ/, but in the sense that he was mistaken in his belief that there IS such a thing as a "factual /ɪ/" versus a "factual /ɛ/". Things like /ɪ/ and /ɛ/ have, after all, been abstracted from a continuously changing sound flow. What is so "factual" about them as to leave no room for legitimate disagreement? Sure, when a properly trained listener listens to a properly trained speaker producing a minimal pair like pit-pet, no uncertainty is to be expected. But this is a highly specialized laboratory situation. I do not think we can safely generalize from it and assume that ordinary communication, too, works with unambiguous /ɪ/'s and /ɛ/'s all the time.

In fact, all of us have probably experienced the ambiguity involved in transcribing a spontaneous conversation recorded on tape. Let us call such a tape an AUDITORY TEXT,¹⁰ in contrast with the TEST UTTERANCE produced ad hoc. In an auditory text, it is by no means always easy to determine which phoneme is "factually there". The reason is, I submit, that the different phonemes of a given language do not, in auditory texts, sound as neatly different from each other as we would like to believe. On the contrary, there is a great deal of PHONEMIC INDETERMINACY. Sometimes this can be resolved by the context. For instance, I hear someone talk about "the Publishing House of Macmillan", with the /ʃ/ of publishing sounding much more like a [kʰ]. But of course I know the word is publishing house, not publicking house, so I conclude that the speaker did not pronounce properly. But do speakers ordinarily pronounce "properly"? True, the speaker concerned could probably be induced to pronounce the minimal pair finishing-finicking distinctly. Does he carry this distinction over from the test situation into ordinary communication? Obviously he does not. The fact is that some realizations of /ʃ/ sound so much like [k] that the two cannot be distinguished.

7 Witness the experiments conducted by Eli Fischer-Jørgensen, op. cit. (fn. 2 above), pp. 153-176.

8 The auditory network is spelled out in depth, and applied to English, in my Manual of English Phonetics, ch. 7 (Munich: Fink, in press).

9 See Richard C. Berry, "A three-feature system for English vowels," Proc. Seventh Int. Cong. Phonetic Sciences, ed. A. Rigault and R. Charbonneau, The Hague 1972, pp. 452-459.
Some instances of phonemic indeterminacy cannot be resolved even through prior morphemic interpretation. I hear an utterance which is, presumably, either and you wouldn't feel confident or I just wouldn't feel confident, with the "main stress" on wouldn't. The proclitic section preceding wouldn't sounds like [aiJi]. If this represents and you, then [i] is the denasalized vocoid allophone of the phoneme /n/. In fact, both nasalized and denasalized vocoids are heard, commonly enough, for the phoneme /n/ in English, as finally in government j'g;:wm;)t;::::: 'g~vm~t/. On the other hand, /Ji/ could perhaps be a proclitic allomorph of just, and this is more plausible in the present context, as it carries on a rhetorical anaphora. Which phonemes are "factually present"? /ænJi/? Or /aiJi(st)/? We cannot answer this question uniquely. It is, I conclude, an improper question in this instance. The transcripts can, in the nature of things, be no more than rough approximations.

How do people communicate with so much indeterminacy? The fact is they do, and they couldn't care less. In the present instances (both are from Canadian radio programs), no listener has been puzzled except the phonetician. And even the phonetician was not puzzled until he had done a good deal of re-listening to the tape. Dialectologists often call on a native speaker to help them transcribe an auditory text. Even though they know the phonemic system involved, they are puzzled by many passages.¹¹ This has been explained by the native-speaker myth, which attributes to the native speaker a sixth sense enabling her or him to recognize phonemes which the non-native cannot. The rational explanation lies, I submit, in the phonemic indeterminacy of the auditory text. As the indeterminacy is phonemic, it cannot be resolved by phonemic criteria; it requires the skill of an editor who, by virtue of cultural experience, knows not only how the community concerned usually expresses itself, but what it usually talks about. The dialectologist, for all his learning, cannot rival the cultural experience of the untutored native speaker. This is also why automatic speech recognition is so difficult. Though we may program all our acoustic knowledge into the computer, we cannot yet do the same with the cultural knowledge on which the human listener also relies for recognition.

10 This is E. Zwirner's Abhörtext, cf. his Grundfragen der Phonometrie, 2nd rev. ed., Bibliotheca phonetica 3, Basel 1966, pp. 173-178; H. Bluhme, in his English translation, uses phonetic text (Principles of Phonometrics, The University of Alabama Press 1970, pp. 130-135).

III. PHONETIC HEARING

The assumption that there IS such a thing as a "factual [ɛ]" vs. a "factual [ɪ]" presupposes a UNIQUE PARTITION of auditory space and, conversely, a single, standard list of different "sounds".
A crude version of this idea is what we were taught in grade school: there ARE five (or six, or seven, or eight ...) vowels: a, e, i, o, u, ...¹² The more sophisticated versions of the standard list claim to be "universal phonetic alphabets". They differ from the crude version in the number of vowels, but the basic idea remains the same: the unique partition of auditory space, not in terms of just five or six vowels, but (say) of twenty, fifty or two hundred, or in terms of a single list of distinctive features (twelve, fourteen, twenty, or suchlike). The phonemic system (or sound pattern) of any particular language is then taken to constitute a selection from the big standard list, in such a way that certain specific "sounds" are selected from the list and all others deleted: "In the beginning God created the heavens and the earth and the International Phonetic Alphabet."¹³

Now this is, I submit, a bad theory. Its justification is not empirical, but cultural. It is extrapolated from the Latin alphabet, which our culture has been using and adapting to new languages for many centuries. So the Latin alphabet appears "natural" to us, whether in the crude version with five vowels or in one of the more sophisticated versions.¹⁴ Other cultures, say the Chinese, use for their version of phonemic analysis a different partition of auditory space, one in which the question "How many vowels?" simply does not apply.¹⁵

11 As discussed by E. Zwirner, Grundfragen, pp. 169-173 (Phonometrics, pp. 133-135); F. Hedblom, "Recording in Dialect Investigation in Sweden," Phonetica, 3 (1959), 95-108.

12 In fact, one of my (flunking) students at the University of Massachusetts argued thus the other day. Noticing my disapproval, he explained: "The teacher told us so in High School."

13 Note Charles Hockett's caustic comment on "the assumption that there is a large, but strictly finite, stock of distinct humanly possible articulations or speech sounds, from which each language selects a small subset as its phonemes." As Hockett remarks, "In fact, of course, the set of all humanly possible articulations forms a multidimensional continuum, from which discretely contrasting ranges are quarried by quantization, and the ways in which this is done in two different languages need show no congruence." (Amer. Speech, 47 [for 1972; belatedly published 1975], 243n.) In the same vein, this writer "rejects transcriptions which allege some particular topological structuring of phonetic space as universal" (Phonemtheorie, 3rd ed., Basel 1974, p. xi).

14 This was first pointed out by H. Lüdtke: "Der Gedanke, daß man mit den modernen Transkriptionssystemen die Gesamtheit lautlicher Variation aller Sprachen und Mundarten erfassen könne, sofern man nur das Symbolrepertoire angemessen erweitert, ist letztlich eine Extrapolation der Entwicklungsgeschichte der Lateinschrift" [The idea that modern transcription systems can capture the totality of phonic variation in all languages and dialects, provided only that the symbol repertoire is suitably extended, is ultimately an extrapolation from the developmental history of the Latin script] (Folia Linguistica, 5 (1972), 336). Conversely, alphabetic writing has been taken to confirm the validity of our transcripts: "The segmentability of the speech chain into discrete and global phonemes is firmly established by alphabetic writing systems" (O. S. Achmanova, Proceedings of the Seventh International Congress of Phonetic Sciences, p. 170). I question not the segmentability as such (see section (4) below), but the general validity of particular segmentations such as the International Phonetic Alphabet.

15 A convenient summary statement of the Chinese way of phonemic analysis is offered by Tung-Ho Tung, "Bipartite Division of Syllables in Chinese Phonology," Proc. Ninth Int. Cong. of Linguists, p. 203. The reason for the different cultural modes of phonemic analysis does not necessarily lie in differences of linguistic structure. Chinese can be analyzed with vowels, as has been done by C. F. Hockett, "Peiping Phonology," J. Amer. Oriental Soc., 67 (1947), 253-267; "Peiping Morphophonemics," Language, 26 (1950), 63-85. Conversely, some European languages, such as Swedish and Norwegian, have pitch patterns that are distinctive on the level of lexical meaning, but most dictionaries omit them.

The idea that phonemic systems constitute subsets of some universal list of sounds is, I submit, irreconcilable with the empirical evidence:

1. If every phonemic system were, in fact, a subset of a standard "universal phonetic alphabet", then we should be able to learn how to pronounce an unknown language from a sufficiently narrow transcription. We all know from experience that this cannot be done. The learner cannot, in fact, get away with a selection from a known partition of auditory space; he must learn a new partition, a new AUDITORY CATEGORIZATION, with every new language he learns.

2. Conversely, if the universal phonetic alphabet did, in fact, apply to every language, then phoneticians should be able to transcribe every unknown language properly at the first attempt. The expert who knows "all the sounds of the human species" must know every sound in any particular subset. Experience shows that the phonetician can, at the first attempt, achieve no more than a rough approximation, because she or he has not yet learned the specific auditory categorization which is peculiar to the new language. We categorize what we hear in terms of what we already know (such as some phonetic alphabet). As the two categorizations are incommensurate, we often find that the native speaker to whom we read our transcripts does not understand them.

3. Certain aphasic patients can hear, in the sense that they have normal hearing under audiological examination, but they do not recognize phonemes.¹⁶ For instance, I had a Welsh patient who had lost all but the most elementary pitch patterns of Welsh, yet he could sing.¹⁷ What he had lost was not the sense of hearing, but the auditory categorization of what he heard in terms of the pitch patterns of Welsh.

It appears, then, that auditory perception is not simply a matter of hearing something the way it "is"; it necessarily involves different auditory categorizations. These auditory categorizations are, in principle, at the discretion of the listener. It is the listener who chooses to treat a particular bit of noise as just noise or as an auditory text, and who decides whether it can plausibly be treated as a text. Analytically speaking, this decision is what constitutes phonemic analysis. The particular phonological model (A, B, C, or D) in which this analysis is couched, whether Chinese or European, is a matter of detail.¹⁸

For the purposes of auditory phonetics, I propose to distinguish between three modes of auditory perception:¹⁹

1. AUDIOLOGICAL hearing, which is a biological property of the normal human ear as such: for instance, the pitch ranges and loudness thresholds within which our ears perceive noise, feel pain, etc.

2. PHONEMIC listening, which is an acquired faculty of those people who have learned a given language. It involves, for example, the recognition not just of pitch, but of the pitch patterns of a given language. It is lost under phonemic deafness.

3. EDITORIAL understanding, which involves hunches and guesses and is best practised by people who have a great deal of cultural experience with a given language. It is lost (or partially lost) in sensory aphasia.

16 This is the syndrome known as PHONEMIC DEAFNESS (Lurija) or LAUTTAUBHEIT (Kleist).

17 See my account in "Aphasische Intonationsstörungen," Saggi, 2 (1976), 33-42; "Aphasia in Welsh," Word, 28 (1976), 207-229.

18 Cf. H. Pilch, Phonemtheorie, 3rd ed., Basel: 1974, p. 157.

19 This is close to Bertil Malmberg's views, Introduktion till fonetiken som vetenskap, Stockholm: 1969. Malmberg distinguishes between the same three modes of perception, but he recognizes the third not as crucial to intelligibility, only as a cross-check on signals already recognized phonemically: "Faktor 2 är avgörande för mottagarens reaktion. Om denne brukat den riktiga koden, identifierar han ljudvågen som en utsaga på ett känt språk" [Factor 2 is decisive for the receiver's reaction. If he has used the right code, he identifies the sound wave as an utterance in a known language]. I have presented more extensive data in support of my alternative view in "La langue et la compréhension: expérience d'un aphasique," La linguistique, 10 (1974), 79-90. Most phoneticians appear to attach even less importance to the editorial mode of perception than does Malmberg. They assume a "mini-interpreter" or "precategorical auditory store" which recognizes messages by a "matching procedure", comparing the input signal, on a purely phonetic basis, with the phonemic units somewhere in the brain. J. C. Lafon distinguishes between the audiological mode (audition) and the phonemic mode (integration) of perception (Message et phonétique, Paris: 1966).

IV. AUDITORY ANALYSIS

What, then, is the job of auditory phonetics? The answer I proposed above (at the end of the introduction) appears oversimplified from our present vantage point. Let us revise it: auditory phonetics should specify, in a coherent manner, the partitions of auditory space imposed by different phonemic systems. As I do not believe in "the natural partition" of auditory space, I adapt culturally pre-established partitions to my purposes, such as noise vocabularies, alphabets, audiological test scales, etc. This is my justification for drawing on the noise vocabulary of English in section (i) above. The empirical motivation for adopting any particular category is provided by phonemic experience with different languages. It will determine whether or not we need auditory parameters to describe (say) different voice qualities (see the end of section (i) above).²⁰

Consider, as an example, the category of pitch. This is a pre-established category familiar to musicians and audiologists. We use it in order to specify the pitch patterns of languages, even though pitch in languages sounds remarkably different from musical pitch. Phonemic experience, however, convinces us that we do not need the musician's elaborate classification of pitches in terms of the diapason.²¹ All we need is a simple high-low dichotomy, or at most four or five different (relative) pitches. When a pitch changes, it either rises or falls, or it drifts indeterminately. Thus we have the rising, falling and level pitches. Phonemic experience suggests a further subdivision by the speed of the pitch movement, either accelerating or slowing down. Thus we have (besides the level pitch) the accelerating fall, the accelerating rise, the decelerating fall, and the decelerating rise. This auditory specification is motivated by the pitch patterns of languages such as English, French, Welsh and German.²²

20 Cf. D. Fry: "there seems to be no record of a language in which voice quality differences operate independently as a prosodic feature." (Manual of Phonetics, 2nd ed., ed. B. Malmberg, Amsterdam: 1968, p. 370.)

21 It is true there have been linguistic investigations using the diapason framework, such as J. E. Buning and C. H. van Schooneveld, The Sentence Intonation of Contemporary Standard Russian as a Linguistic Structure, The Hague: 1961. Such investigations are unrealistic in the sense that they contain more detail than can be distinguished by ear in terms of the recurrent differences (see below).

Working with pre-established categories and modifying them to suit our needs is what we do in articulatory and acoustic phonetics, too. For instance, the acoustic measure of gravity as f3-F2_F 1 2 is surely motivated not by some natural partition of acoustic space, but by phonemic experience.²³
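As a rough illustration of the pitch categories just described (level pitch, and accelerating or decelerating rises and falls), the following sketch assigns one of these labels to a pitch track. It is only an acoustic stand-in for what the text treats as an auditory judgment. The input format (pitch values in semitones at equal time steps), the hypothetical `level_tolerance` threshold, and the comparison of movement in the first half against the second half as a proxy for speed are all assumptions of this sketch, not part of the author's specification.

```python
from typing import Sequence


def classify_pitch_movement(f0: Sequence[float], level_tolerance: float = 1.0) -> str:
    """Label a pitch movement as level, or as an accelerating/decelerating rise/fall.

    f0: pitch values (assumed to be in semitones) sampled at equal time steps.
    level_tolerance: hypothetical threshold; a net change smaller than this
        (in the same units) is treated as level pitch or indeterminate drift.
    """
    if len(f0) < 3:
        raise ValueError("need at least three samples to judge the speed of movement")
    net = f0[-1] - f0[0]
    if abs(net) < level_tolerance:
        return "level"
    direction = "rise" if net > 0 else "fall"
    # Compare how far the pitch moves in the first half with the second half.
    mid = len(f0) // 2
    early = abs(f0[mid] - f0[0])
    late = abs(f0[-1] - f0[mid])
    # Ties (constant speed) are labelled decelerating here, an arbitrary choice,
    # since the scheme in the text has no constant-speed category.
    speed = "accelerating" if late > early else "decelerating"
    return f"{speed} {direction}"
```

Under these assumptions, `classify_pitch_movement([0, 0.5, 2, 6])` returns `"accelerating rise"`, because two semitones of the rise fall in the first half and four in the second. A real auditory analysis would, as the text argues, decide the categories by listening first and only then look for such measurable correlates.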

Auditory specifications are valid if they characterize at least the phonemic distinctions within the language concerned. They usually go beyond the level of distinctive features, but they stop short of a full specification of our auditory impressions. This restraint is necessary in order to ensure both the coherence of our framework and the typological classifiability of languages on the basis of their auditory properties. Finnish and Lettish, for instance, both have the same inventory of five pitch patterns: [1] accelerating fall, [2] accelerating rise, [3] rise-fall, [4] fall-rise, and [5] two levels with a pitch break between them. 24 Yet Finnish and Lettish SOUND remarkably different. This, however, does not detract from the validity of the specification. Conversely, the impressionistic incompleteness of the specification implies that a given auditory specification does not necessarily enable us to recognize the phonemic units concerned (unless we know them beforehand). Recently, this author struggled to learn the tones of Mandarin Chinese. They are auditorily specified as [1] level, [2] rise, [3] fall-rise, [4] fall. 25 But to recognize them in an auditory text by this specification alone is hopeless. The pitches keep drifting in all directions. Still, the specification is good.

22 On French cf. F. Carton, Introduction à la phonétique du français, Paris 1974. On English cf. fn. 8 above; on Welsh cf. my article "Advanced Welsh Phonemics," Zeitschrift für celtische Philologie 34 (1975), pp. 60-102; on German cf. my article "Baseldeutsche Phonologie auf Grundlage der Intonation," Phonetica 34 (1977), 165-190.
23 G. Fant, Den akustiska fonetikens grunder, Stockholm 1957.
24 The inventory of Lettish has been described by L. K. Ceplitis, Analiz rečevoj intonaciji, Riga 1974, p. 113; the inventory of Finnish by A. Sovijärvi, Olli-Matti Ronimuksen eräistä runoista laadittuja lausunto-analyysi-harjoituksia (Fonetiikan laitos, Helsinki 1963).
25 For an acoustic specification see J. M. Howie, Acoustical Studies of Mandarin Vowels and Tones, Cambridge and New York 1976.

How does one ever learn to recognize these tones? More generally, how can we ever analyze a phonemic system which we do not already know? The way to tackle the job is, first, to forget about the auditory specification. What we do instead is, in principle, listen to several sequences of at least two tones (the segmentation presents no particular difficulty most of the time) and decide not whether they rise or fall, but whether they sound the same or different. This is the most elementary phonetic judgment (as we saw above). We do, of course, hear subtle differences everywhere. We may thus be tempted never to judge any two given tones (or other phonetic events) to be the same. If we yield to this temptation, we will fail to arrive at any partition of the auditory space, that is, fail at phonemic analysis. The way to escape this pitfall is to take into account only such differences as we can hear again and again consistently (not just once or twice or any limited number of times). Thus we listen ultimately not just for given phonetic events, but for recurrent auditory differences between classes of phonetic events. In this way one eventually learns to recognize the tones of Chinese, or the pitch patterns of intonational languages, 26 and even phonetic parameters other than pitch. This is at least one way to learn, by trial and error, those "new partitions of auditory space" which characterize unknown languages. In every case it is essential not to project one's auditory impressions onto a "general phonetics chart," as was done by the old "Ohrenphonetik" ("ear phonetics"), but to compare them with each other, listening for recurrent differences. The specification of these differences in terms of auditory parameters is a subsequent step. First recognize them, then characterize them.
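The procedure of listening for recurrent differences can likewise be sketched, under stated assumptions, as a partition of tokens into classes. In this hypothetical setup each compared pair of tokens carries a record of repeated same/different judgments; a difference is honoured only if it was heard on at least a given share of the listenings (the invented parameter `min_consistency`, by default every single time), and pairs whose difference does not recur are merged into one class with a union-find structure. Token labels and the data format are illustrative inventions, not part of the original procedure.

```python
from typing import Dict, FrozenSet, List, Sequence


def partition_by_recurrent_differences(
    tokens: Sequence[str],
    judgments: Dict[FrozenSet[str], List[bool]],
    min_consistency: float = 1.0,
) -> List[List[str]]:
    """Group tokens whose differences are not heard consistently.

    judgments: hypothetical record mapping a pair of two distinct tokens to
    repeated listenings (True = heard as different, False = heard as same).
    min_consistency: share of listenings in which a difference must be heard
    before it is taken into account (1.0 = every time).
    """
    parent = {t: t for t in tokens}

    def find(t: str) -> str:
        # Follow parent links to the class representative (with path halving).
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for pair, heard_different in judgments.items():
        if not heard_different:
            continue  # pair never actually judged: no evidence either way
        a, b = sorted(pair)
        share = sum(heard_different) / len(heard_different)
        if share < min_consistency:
            # The difference does not recur, so treat the two tokens as the same.
            parent[find(a)] = find(b)

    classes: Dict[str, List[str]] = {}
    for t in tokens:
        classes.setdefault(find(t), []).append(t)
    return sorted(sorted(c) for c in classes.values())
```

Transitive merging is a simplification: if A and B, and B and C, are judged the same, A and C end up in one class even when a difference between them was heard consistently; a fuller treatment would have to flag such conflicts for renewed listening.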
It is after we recognize the set of four tones that we can apply to them such parameters as level, rise, fall-rise and fall, and we can then argue whether some fall-rises might be better described as rise-falls, low-levels, and so on. This procedure reflects the two basic assumptions of phonetics, 27 namely that phonetic events are (i) distinguishable, and (ii) classifiable. Both assumptions imply the discrete character of phonetic elements. If they were not distinguishable, we would be unable to recognize them. If they were not classifiable, we would be unable to classify them as same or different. These assumptions constitute the essential difference between audiology and auditory phonetics, acoustics and acoustic phonetics, physiology and articulatory phonetics. Without these assumptions (or their equivalents) no phonetic work would be possible; we would go on forever being baffled by the infinitely variable soundflow.

26 Lexical and semantic criteria are, of course, helpful in the case of tones, but not in the case of intonations which, by definition, do not involve lexical and semantic differences; see my article "Intonation in Discourse Analysis," Phonetica 34 (1977), 81-92.
27 As formulated by E. Zwirner, Grundfragen, p. 111.

Has anybody ever successfully proceeded the other way, the way that is laid down in many of our textbooks: first fully analyze the soundflow in terms of a very narrow transcription, then single out the distinctive transcription signs and throw away the redundant ones? Has not everybody, in fact, always used so-called shortcuts, meaning procedures less unrealistic than those of the textbook? Let us now recognize those "shortcuts" for what they are and accord them their proper epistemological status. First of all, let us recognize the primacy of auditory phonetics as our most convenient and, indeed, inevitable gateway to phonetic investigation.

Albert-Ludwigs-Universität Freiburg i.Br. and University of Massachusetts at Amherst