CHAPTER 4.3

Selective perception and recognition of vocal signals Günter Ehret* and Simone Kurt Institute of Neurobiology, University of Ulm, Ulm, Germany

Abstract: Perception and recognition of vocal signals in mammals is based on hearing abilities, which have a species-specific profile but share common mechanisms of sound processing in the auditory system. Common perceptual abilities in mammals refer to audiogram values, just noticeable differences in the perception of frequencies, intensities and duration of sounds, and to the mechanisms and limits of spectral and temporal resolution in hearing. After the discussion of these key features, we will show how perceptual limits and borders between perceptual classes in the auditory systems of mammals become effective in assessing the basic biological meaning of vocal messages. Finally, we will briefly address sound recognition as a consequence of the ability to learn subtle differences in vocalizations in order to identify the vocalizing animal as a member of a certain species and/or as an individual.

Keywords: audiogram; limits in perception; categorical perception; difference limens in audition; hearing abilities; perception of biological meaning; perceptual borders; spectral-temporal resolution

Abbreviations: CB: critical band; d: sound duration; Δd: difference in sound duration; Δf: difference in tone frequency; ΔL: difference in sound level; f: tone frequency; JMD: just meaningful difference; JND: just noticeable difference; L: sound level; VOT: voice onset time

I. Introduction

Mammals vocalize in many behavioral contexts, often starting in their lives with the first cries after birth. The vocalizations convey information about the sender. They may specify the sender's location, its membership of a species, and its characteristics as an individual, such as age, sex, body size, health, and experienced emotions and motivations. Other animals can take advantage of these vocalizations by perceiving them, decoding their informational content and adjusting their own behavior according to the recognized biological meaning (semiotic value). In this way, vocal communication is established. Since most social interactions that are accompanied by vocalizations happen among members of a given species, we can assume that the acoustical features of emitted sounds and the auditory mechanisms for perceiving the sounds are mutually adapted by natural selection in the evolution of that species. Owing to similarities in the morphology of the vocal tract of many mammalian species, communication sounds consist of basic elements such as tones, tonal complexes (a number of harmonically or non-harmonically related frequencies), or noise, all of constant or variable frequency, intensity and duration. In addition, the same or varying acoustic elements are often emitted in sequences of a certain temporal structure. These elements, with their acoustic parameters, determine the requirements of the ear and the auditory system for their perception and recognition. Thus, we can expect a basic set of mechanisms and abilities common to all mammals providing the "auditory tools" for communication sound perception. Based on the hypothesis of mutual adaptation of vocalizing and perceiving, however, we can also expect specializations in hearing and perception that are adapted to communication in special environments or in special

* Corresponding author. E-mail: [email protected]

Stefan M. Brudzynski (Ed.) Handbook of Mammalian Vocalization ISBN 978-0-12-374593-4


DOI: 10.1016/B978-0-12-374593-4.00013-9 Copyright 2010 Elsevier B.V. All rights reserved.


Vocalizations as Specific Stimuli: Selective Perception of Vocalization

cases, such as dual communication (self and social) in bats, or highly-developed semantic communication with speech in humans. In the following sections, we will first discuss common perceptual abilities and related auditory mechanisms, and then provide an outlook on how these abilities might be used in general and in species-specific vocal communication.

II. Perceptual abilities and their physiological bases

II.A. Audiogram

The basis of auditory perception is sensation, i.e., sounds must be audible to the individual. By definition, sounds are audible if their frequency components are within the animal's audiogram, which is the curve illustrating the minimal sound pressure levels of just audible tones as a function of the tone frequency (Fig. 1). A compilation of audiograms of many species can be found in Fay's psychophysics databook (Fay, 1988). Audiograms describe the frequency range of hearing of a species, together with frequency ranges of increased or reduced sensitivity. The species-specific shape of the audiogram is generated by the filter and amplifier characteristics of the outer, middle and inner ear (cochlea) (e.g., Ehret, 1989), i.e., the basis of hearing reflects very peripheral properties of processing in the auditory pathways. Further, audiograms set the limits for sound communication in the frequency-intensity space, i.e., frequencies of communication sounds must be within the frequency range of the audiogram, and at or above the minimal audible sound pressure levels represented by the audiogram. For many mammals the

Fig. 1. Relationship between the audiograms of humans and house mice and the frequency ranges of their vocalizations. The audiograms represent the auditory threshold curves, i.e., the minimum sound pressure levels in dB (y-axis) as a function of the frequency of a just audible tone (x-axis). Two frequency scales are shown. One applies to human hearing (in kHz); the other is expressed in octaves and calibrated to the frequency with the lowest hearing threshold in the audiogram. The octave scale is common to mammals. The audiogram of the mouse shows the frequency of best hearing (fopt) at 15 kHz. The main frequency ranges of vowels of human speech, of cries of human babies and of calls of adult mice all lie in a frequency range around and up to about four octaves below fopt (the normal frequency range of vocalizations), which is also found in many other mammals. Mice have special adaptations to hear and vocalize in the high ultrasonic range up to three octaves above fopt. Other rodents and many bat species have similar specializations. The audiograms are from Ehret (1974).

frequency range of the audiogram can be divided into three parts with regard to the frequency ranges of communication sounds (Fig. 1): (1) a central "normal" part where the main energy of most communication sounds is located, as illustrated for humans and mice; (2) a specialized part above the central part serving communication in the high ultrasonic range (e.g., rodents, bats; see Brudzynski and Fletcher, Chapter 3.3 in this volume); and (3) a specialized part below the central part for communicating over long distances with low-frequency sounds (e.g., elephants; see Garstang, Chapter 3.2 in this volume). Knowing the sound pressure levels of frequencies in communication sounds and their specific attenuation by the medium in which the sound spreads out, one can calculate the communication space of a sender (e.g., Haack et al., 1983). Since high frequencies, especially in the high ultrasonic range, are heavily damped in air, communication with ultrasounds is restricted to short distances around the sender.

Two other conditions have to be considered if one would like to take the audiogram of a species as a frame for estimating the audibility of communication sounds. First, audiograms usually describe the sensation of sounds of rather long duration (100 ms and longer, depending on the frequency spectrum). The perceptual thresholds of shorter sounds increase by about 10 dB for a decrease in sound duration to about one tenth, for example from 100 ms to 10 ms (Fay, 1988; Ehret, 1989). Thus, for optimal detection, communication sounds should either be long, or consist of a train of short-duration pulses with a high repetition rate, so that their energy is integrated over time to reach a low detection threshold. Second, the shape of the audiogram changes with the age of the animal.
Young animals start hearing in a restricted frequency range (often close to the best frequency range of hearing in adults) at rather high detection thresholds. With increasing age during development, the audible frequency range increases towards lower and higher frequencies, and the thresholds decrease towards those of adults (Ehret, 1983, 1988). Old animals may suffer from hearing loss, especially at high frequencies (Ehret, 1974). Thus, newborns of cats, dogs, many rodent and bat species, and marsupials start hearing only several days after birth and may reach adult auditory sensitivity within about 3–12 weeks; in humans not before the age of two years (Ehret, 1983, 1988). This means that very young mammals and also very old ones may be active in producing sounds, but may not be able to perceive sounds as young adults do.
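The roughly 10 dB threshold increase per tenfold shortening of a sound, described above, amounts to constant-energy integration below the integration limit. A minimal sketch; the sharp 100 ms limit and the exact 10 dB-per-decade slope are idealizations of the approximate values given in the text:

```python
import math

def threshold_shift_db(duration_ms, ref_ms=100.0):
    """Approximate rise of the detection threshold for sounds shorter
    than the integration limit (~100 ms): about 10 dB per tenfold
    decrease in duration, i.e., constant sound energy at threshold."""
    if duration_ms >= ref_ms:
        return 0.0  # long sounds: threshold equals the audiogram value
    return 10.0 * math.log10(ref_ms / duration_ms)
```

For example, under these assumptions a 10 ms call must be about 10 dB more intense than a 100 ms call to be just detectable.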


This requires special strategies for acoustic communication with young infants. If infants are the receivers of communication sounds, adult bats (Gould, 1971; Brown, 1976), cats (Haskins, 1977), wolves (Shalter et al., 1977) and humans (Stern et al., 1982) use sounds of a simple frequency and time structure, with frequency components in the optimal frequency range of the young and with rhythmic repetitions of elements.

II.B. Just noticeable differences in frequency, intensity and duration

Mammals can detect minimal frequency differences (Δf) between tones in a series (f, f+Δf, f, f+Δf, …) in the order of 0.1–10%, except for very low frequencies, where Δf is constant at a small value (Fay, 1988; Ehret, 1989). Similarly, just noticeable differences (JNDs) in the sound level (ΔL) between tones in a series (L, L+ΔL, L, L+ΔL, …) are in the order of 0.5–4 dB (Fay, 1988; Ehret, 1989). These JNDs (Δf, ΔL) decrease with increasing sound pressure level (Δf up to a level of about 60 dB above the audiogram curve; Ehret, 1989). This means that communication sounds have to be vocalized loud enough to make small fluctuations in frequency and intensity perceptible. JNDs in frequency and intensity can be derived from cochlear excitation patterns and hair cell densities in the organ of Corti of the cochlea (Maiwald, 1967; Ehret, 1989), and thus are peripheral in origin. Since the cochleae of mammals, except for some bats with a cochlear fovea region (Bruns, 1976; Suga and Jen, 1977), seem to be variants derived from a common scale (Greenwood, 1990), the mentioned frequency and sound level dependencies of JNDs may apply to most mammals. These JNDs and their properties refer only to sounds in series, which allows comparisons without the involvement of memory. If differences in frequency and/or intensity between sounds have to be detected on the basis of stored and remembered sounds, central auditory processing and learning, with the additional variables of emotion, motivation, memory formation and recall, and the issue of "meaning," come into play, which will be discussed below. JNDs in duration (Δd) in a series of sounds (d, d+Δd, d, d+Δd, …) are in the order of 4–100% and depend very much on the species and the method of measurement (Fay, 1988; Klink and Klump, 2004). The variability of JNDs for duration between species is much larger compared to that for frequencies and intensities.
The probable reason is that duration discrimination cannot be directly related to common peripheral (cochlear) properties. Specific coding for sound duration (duration-sensitive neurons) seems to occur first at the level of the auditory midbrain in the central nucleus of the inferior colliculus (Covey and Casseday, 1999; Brand et al., 2000), so that species differences in auditory processing in the brainstem can contribute to the large species-dependent variation of JNDs in duration.
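The frequency and level JNDs above can be read as simple discrimination rules. A toy check of whether two tones in a series differ perceptibly, assuming an illustrative 0.5% frequency Weber fraction and a 1 dB level JND (both values chosen from within the ranges quoted above, not species-specific measurements):

```python
def freq_discriminable(f1_hz, f2_hz, weber=0.005):
    """Tones in a series differ perceptibly if |Δf| exceeds the JND at
    the lower frequency (the JND grows proportionally with frequency)."""
    return abs(f1_hz - f2_hz) >= weber * min(f1_hz, f2_hz)

def level_discriminable(l1_db, l2_db, jnd_db=1.0):
    """Levels differ perceptibly if |ΔL| exceeds a fixed dB JND."""
    return abs(l1_db - l2_db) >= jnd_db
```

Under these assumptions, a 10 Hz step at 1 kHz (1%) is discriminable while a 2 Hz step (0.2%) is not, illustrating why loud, stable vocalizations make small frequency fluctuations perceptible.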

II.C. Spectral resolution and the perception of the frequency composition in vocalizations

Due to the vibrations of the vocal folds, most vocalizations consist of a fundamental frequency and harmonically-related overtones, several of which may be amplified by resonances in the vocal tract. These resonance frequencies determine the frequency ranges of formants in vocalizations. The analysis of the spectral structure of sounds requires frequency filters and spectral integrators. The characteristics of such filters – bandwidth as a function of center frequency, sound pressure level and type of sound processed – have been determined in various auditory perception tests in humans and, to a lesser degree, in other mammals and animals (Fay, 1988; Ehret, 1995). Depending on the method of measurement, the filters have different names, such as critical band (CB), critical ratio (CR), or equivalent rectangular bandwidth (ERB) (Scharf, 1970; Moore, 1989). Perceptually important characteristics of CBs – the increase of bandwidth with increasing center frequency of the filter (Fig. 2), and the independence of the filter bandwidth and of the spectral integration within the filter from the sound level – are derived from their cochlear origin and further processing up to the level of the inferior colliculus. One CB covers a frequency range represented by about 0.7–1 mm length of the cochlear tonotopy (Greenwood, 1961, 1990; Ehret, 1977, 1989), which leads to an increase of the CB bandwidth by almost one octave for a one-octave increase in center frequency in the frequency range at and above the best hearing range (Fig. 1). In this frequency range, the cochlear frequency representation follows a logarithmic function. A consequence

Fig. 2. Spectrograms of vocalizations of a male and a female mouse during sexual interaction. The sexually-interested male emits a series of ultrasounds when approaching the female. The non-receptive female emits a series of defensive calls while trying to keep the male off. The male’s ultrasounds are pure tones with frequency modulations; the female’s defensive calls consist of many harmonics and broadband noise. The durations of the defensive calls and of the inter-call intervals are indicated on top of the figure. The right-side y-axis scale shows the border frequencies of critical bands of the mouse as they are established to resolve the first two or three harmonics of the defensive calls and the male ultrasounds. Higher harmonics and the noisy parts of the defensive calls cannot be resolved. The spectrograms were kindly provided by Simone Gaub. The critical band scale is based on Ehret (1976).

is that mammals with a comparably short cochlea (about 6.5 mm length) and high-frequency hearing, such as the mouse, have a poor frequency resolution, i.e., large CBs of several kHz (compare Fig. 2), while humans with a long cochlea (about 32 mm) and low-frequency hearing have a high frequency resolution, i.e., small CBs of less than 100 Hz up to several hundred Hz. Intensity-independent spectral filtering and integration is reached in neurons of the inferior colliculus (Ehret and Merzenich, 1985, 1988; Egorova and Ehret, 2008), which is a prerequisite for the experienced perceptual constancy of sounds in the spectral domain (e.g., timbre) over a wide range of intensities.
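The relationship between cochlear length and critical-band width sketched above can be made concrete with Greenwood's frequency-position function, F(x) = A·(10^(a·x) − k). The constants below are the commonly cited human values and are this sketch's assumptions, not data from the chapter; the critical band is modeled as the frequency span of about 1 mm of basilar membrane:

```python
# Greenwood frequency-position map; x is the fractional distance from
# apex (0) to base (1). Human constants (assumed here): A = 165.4 Hz,
# a = 2.1, k = 0.88.
A, a, k = 165.4, 2.1, 0.88
COCHLEA_MM = 32.0  # approximate human cochlear length, as in the text

def place_to_freq(x):
    return A * (10.0 ** (a * x) - k)

def cb_bandwidth_hz(center_mm, cb_extent_mm=1.0):
    """Width of a critical band modeled as the frequency span of about
    1 mm of basilar membrane centered at center_mm from the apex."""
    lo = place_to_freq((center_mm - cb_extent_mm / 2) / COCHLEA_MM)
    hi = place_to_freq((center_mm + cb_extent_mm / 2) / COCHLEA_MM)
    return hi - lo
```

Because the constant k cancels in the difference, shifting the band basally by the distance corresponding to one octave (about 4.6 mm with these constants) exactly doubles its width, mirroring the near-octave scaling of CB bandwidth described above.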

II.D. Pitch perception

Every harmonic in a complex sound, such as a vowel in human speech or a complex animal vocalization, leads, if heard in isolation, to a pitch percept according to its frequency. The harmonic complex, however, also gives rise to a global pitch percept, which corresponds to the fundamental frequency of the complex, even if it is physically absent (Terhardt, 1974a, 1978; Moore, 1989). Further, when a tone is sinusoidally amplitude-modulated with a frequency between about 50 Hz and several hundred Hz, a pitch (periodicity pitch) is heard, often corresponding to the amplitude modulation frequency (Langner, 1992). The perception of these forms of pitch may be a general property of hearing not only in humans, but also in other mammals (Heffner and Whitfield, 1976; Tomlinson and Schwarz, 1988; Deutscher et al., 2006), even if their main frequency range of hearing is in the high ultrasonic range (Preisler and Schmidt, 1995). Since pitch perception requires either harmonically-structured or continuously amplitude-modulated sounds, as found in many animal vocalizations, it can be useful for discriminating such sounds, i.e., animal sounds with a rather regular structure in the frequency and/or time domain, from other sounds with non-harmonically-related frequency components, which often occur as noise in the environment. Pitch perception relies on the temporally precise coding of sound frequencies and amplitude modulations in the auditory nerve up to the inferior colliculus, from where it may be represented by a spike-rate code in maps or special areas of the auditory cortex (Schulze et al., 2002; Bendor and Wang, 2005; Kurt et al., 2008).
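The missing-fundamental percept can be illustrated numerically: a complex containing only the 2nd to 4th harmonics of 200 Hz still has its strongest periodicity at the 200 Hz period, which a simple autocorrelation picks out. This is a toy sketch; the sample rate and frequencies are arbitrary choices, and autocorrelation stands in for the temporally precise neural periodicity coding mentioned above:

```python
import math

SR = 8000  # sample rate in Hz (arbitrary choice)
F0 = 200   # fundamental that is physically absent from the sound

# Harmonic complex containing only the 2nd, 3rd and 4th harmonics
# (400, 600 and 800 Hz); half a second of signal.
N = 4000
signal = [sum(math.sin(2 * math.pi * h * F0 * t / SR) for h in (2, 3, 4))
          for t in range(N)]

def autocorr(x, lag):
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

# The best period between 1 and 10 ms corresponds to the global pitch.
best_lag = max(range(SR // 1000, SR // 100 + 1),
               key=lambda lag: autocorr(signal, lag))
global_pitch_hz = SR / best_lag  # near 200 Hz, although 200 Hz is absent
```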


II.E. Perception of rhythms

When communication sounds are vocalized in series, the single elements of the series and the silent intervals between them may be rather regular in duration and, thus, create the percept of rhythm. Rhythms are heard if the intervals between short elements (clicks, tone or noise bursts) are of about 100–2,000 ms length, which equals a repetition rate of 10–0.5 Hz. If the silent intervals are longer than about 2,000 ms, every element is heard as a single event; if the intervals are shorter than about 100 ms, the rhythm percept changes to roughness; and if the intervals are shorter than about 20 ms, the percept is a pitch (Miller and Taylor, 1948; Besser, 1967; Terhardt, 1974b; Roederer, 1975; Krumbholz et al., 2000; Zanto et al., 2006). Thus, there is a continuum from the perception of single sounds, to series of sounds in a rhythm, to roughness of a sound (which can be regarded as the percept of a temporally unresolved rhythm), to the pitch percept. The series of defensive calls of a non-receptive female mouse shown in Fig. 2 has inter-call intervals of mostly less than 100 ms, suggesting that this series produces some roughness percept. If one calculates the repetition rate of the calls, however, by adding the call durations and the durations of the following inter-call intervals, then an average repetition rate of about 3.6 Hz is the result, suggesting that this series (Fig. 2) is perceived as a rhythm. Rhythm is encoded in the centers of the auditory system by the rhythmic occurrence of discharge peaks in response to sound elements. Although rhythm perception is possible without auditory cortex (Deutscher et al., 2006), neurons in the auditory cortex detect frequency and intensity changes in the sound elements, and changes in the interval lengths between the elements of a rhythmic series (Weinberger and McKenna, 1988; Ulanovsky et al., 2003), by increases of discharge rates, so that changes in rhythmic vocalization series should be easily perceptible.
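The interval borders above can be written down directly, and the same few lines reproduce the dual reading of the defensive-call series. The borders are the approximate values from the text; the example durations below are made-up round numbers, not measurements from Fig. 2:

```python
def interval_percept(interval_ms):
    """Map the silent interval between sound elements onto the percepts
    described in the text (approximate borders at 20, 100 and 2000 ms)."""
    if interval_ms > 2000:
        return "single events"
    if interval_ms >= 100:
        return "rhythm"
    if interval_ms >= 20:
        return "roughness"
    return "pitch"

def repetition_rate_hz(durations_ms, intervals_ms):
    """Repetition rate from the mean of each call duration plus the
    following inter-call interval."""
    periods = [d + i for d, i in zip(durations_ms, intervals_ms)]
    return 1000.0 / (sum(periods) / len(periods))
```

With, say, 200 ms calls separated by 78 ms silent intervals, the intervals alone suggest roughness, while the overall repetition rate is about 3.6 Hz, i.e., a rhythm — the two readings discussed for the series in Fig. 2.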

III. From perceptual abilities to the perception of biological meaning

III.A. General perceptual borders and categories in the time domain

In the above-mentioned perceptual shift from rhythm to roughness to pitch, a border lies near a 50 Hz repetition rate or a 20 ms silent interval between sound elements. This 20 ms border in the perception of a temporal pattern seems to exist as a general time-domain border for mammals. It separates, for example, as the shortest phonetic border on the voice-onset-time (VOT) continuum, the perception of the stop consonants /ba/ and /pa/ in human speech (perception of /ba/ at VOT < 25 ms, /pa/ at VOT > 25 ms; Pisoni and Lazarus, 1974). The same border has been found for /ba/ vs. /pa/ discrimination by VOT in the chinchilla (Kuhl and Miller, 1978), for the perception of temporal order (Hirsh, 1959) or gaps in sounds (Stevens and Klatt, 1974; Penner, 1975) by humans, for the categorization of mouse pup ultrasounds into biologically relevant and irrelevant ones by their mothers (Ehret, 1992), and for separating pursuable objects (potential prey) from non-pursuable objects in the evaluation of echo delays in the auditory cortex of moustached bats (Suga and O'Neill, 1979; Suga et al., 1983). In addition, the spectral integration of frequency components for the perception of a series of auditory objects in a single stream by humans or mice is possible only if the frequency components start and end together within about 20–30 ms (Pisoni, 1977; Darwin and Sutherland, 1984; Geissler and Ehret, 2002). There are two further perceptual borders in the time domain which may be general to mammals. One border is near a 100 ms inter-sound interval. In humans, a series of tones of alternating sound frequency is heard as a single stream of tones with alternating frequency if the inter-tone intervals are longer than about 100 ms. At shorter inter-tone intervals, the tones are grouped into two streams, one consisting of the tones of the higher frequency and the other of the tones of the lower frequency (Anstis and Saida, 1985). Similarly, the time window for integrating sounds picked up by one or the other ear into one single perceptual stream is close to 100 ms (Culling and Summerfield, 1998). The other border is near a 400 ms inter-sound interval.
Loudness summation of click sounds across silent intervals occurs only if the intervals are shorter than about 400 ms (Zwislocki et al., 1962; Scharf, 1978), and masking of the perception of a sound burst by a previous one loses its effectiveness if the interval between both is longer than about 400 ms (Zwicker and Feldtkeller, 1967). These perceptual borders near 100 ms and 400 ms interval durations between sound bursts are also found in the perception of a series of mouse pup wriggling calls by their mothers (Gaub and Ehret, 2005). Both perceptual borders (100 ms, 400 ms) become evident in event-related potentials measured in humans (Budd and Michie, 1994; Yabe et al., 1997) and in neuronal responses recorded from animals (Calford and Semple, 1995; Finlayson, 1999; Fishman et al., 2001). The general perceptual borders in the time domain near 20 ms, 100 ms and 400 ms seem to represent inherent constants of the mammalian auditory system, which can be, and actually are, exploited for categorizing communication sounds (vocalizations). In ethological terms, we may speak of an "inborn releasing mechanism," which is equivalent to a filter for separating communication sounds into different classes of biological significance or biological meaning.

III.B. General perceptual borders and categories in the spectral domain

Above, we introduced critical bands (CBs), or equivalent psychoacoustical measures, as important "tools" of the auditory system for the analysis of sounds in the spectral domain. One can go a step further and state that whatever we perceive in the spectral domain is the direct consequence of CB processing of sounds (Scharf, 1970; Moore, 1989). Since most animal vocalizations, including the vowels of human speech, are spectrally rich, CB processing is the basis for perception in the spectral domain. In order to perceive a spectrally-complex type of sound against background noise, and independently of the individual voice characteristics of the sender, the spectral peaks (main harmonics and/or formants) have to be resolved by CB filters. In this way, the fundamental or first formant and other harmonics/formants can be detected, so that the sound type may often be derived from the frequency ratios of the first three spectral peaks. This happens in the analysis and categorization of vowels of human speech independently of the pitch and intensity of the speaker's voice (Flanagan, 1972; Kuhl, 1979; Grieser and Kuhl, 1989), and in the perception of wriggling calls of mouse pups by their mothers (Ehret and Riecke, 2002). Also, ultrasounds of mouse pups are perceived categorically as sound energy in one single CB with a center frequency above about 35 kHz (Ehret and Haack, 1982). Fig. 2 shows that ultrasounds of a male mouse, emitted immediately before trying to mount a female, are analyzed in only one single CB with a bandwidth of about 23 kHz. Each of the first three harmonics of the defensive calls of a non-receptive female mouse can also be analyzed in one CB, while higher harmonics and the noisy parts of the calls cannot be resolved (Fig. 2). Similarly, spectrally rich monkey calls may be perceived after a CB analysis in their auditory system (Fishman et al., 2000). As with the perceptual borders in the time domain, the CB mechanism serves the general function of an "inborn releasing mechanism" for the separation of biological meaning (semiotic value) in the spectral domain.

III.C. The perception of three basic biological meanings

Morton (1977) developed, and August and Anderson (1987) complemented, motivation–structural rules describing common acoustic properties of animal vocalizations in their relationship to the sender's emotional/motivational state at the moment of vocalizing (derived from 78 mammalian species). In summary, they divided the vocalizations into three groups expressing different emotions/motivations: (1) high-frequency sounds of rather tonal character express fear; (2) harsh, noisy, loud, broadband or low-frequency sounds express aggressiveness or hostility; (3) soft, rhythmic, low-frequency sounds express friendliness. Assuming a co-evolution of vocalizing and perception, the three categories of vocalizations suggest the perception of three categories of biological meaning. On the basis of the acoustic properties of vocalizations, hearing abilities, and the behavioral responses to species-specific calls of house mice and other mammals, Ehret (2006) proposed six rules for the perception of communication sounds. Rule four deals with the perception of meaning (biological significance), which is expressed by the receiver's innate response behavior: (1) the relatively high-frequency tonal sounds expressing appeasing or fearful emotions/motivations are attractive and thus are perceived as "attraction"; (2) the soft, low-frequency rhythmic sounds expressing "friendly" emotions are often accompanied by peaceful interactions of animals in a group and thus are perceived as "cohesion" (staying together in a common context); (3) the harsh, loud, noisy and broadband sounds expressing aggressiveness may cause avoidance behavior, thus indicating "aversion." The theory (Ehret, 2006) is that the acoustic properties of vocalizations determine the three biological meanings which, in turn, release an adequate response behavior in the receivers after being integrated with information from other sensory modalities and processed in the context of the perceiver's own state of

arousal and emotion/motivation. The above-mentioned general perceptual mechanisms in the temporal and spectral domains are very well-suited for sorting out the three basic biological meanings of communication sounds. Non-resolved frequencies and temporal variations in a loud sound over a broad frequency range of the audiogram (see defensive calls, Fig. 2) lead to a percept of roughness and noisiness, being associated with aversion. Well-resolved frequency components in sounds with major energy at or above the best frequency range of hearing (see male ultrasounds, Fig. 2) lead to the percept of a high pitch, being associated with attraction. Series of rather soft, low-frequency sounds lead to the percept of a rhythm, being associated with cohesion. In this way, all types of sounds varying in many acoustic parameters are perceptually classified for basic behavioral consequences, such as approaching, avoiding, or staying with the sound source, which have to be compatible, however, with other sensory information and the perceiver's state of arousal and emotion/motivation. It is important to note that the described predetermined perception of the semiotic value/biological meaning does not build on JNDs in sounds and, therefore, does not operate at the perceptual limits of the auditory system. The perception of the three basic biological meanings of vocalizations signaling basic affective states, and also of other meanings, relies on just meaningful differences (JMDs), allowing even acoustically-distinct call types, which can easily be discriminated via JNDs, to be classified in the same category of meaning/semiotic category (Ehret, 2006; rule two). A characteristic feature of classifying sounds for the perception of meaning is to ignore perceivable variability in some acoustic parameters of the sounds.
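The sorting of sounds into the three basic meanings could be caricatured as a small rule table. This is entirely illustrative: the feature names and their boolean coding are this sketch's assumptions, not part of the motivation–structural rules themselves:

```python
def basic_meaning(tonal, loud, high_pitched, rhythmic):
    """Toy classifier for the three basic meanings described in the text.
    tonal: resolved, narrowband harmonics vs. harsh broadband noise;
    loud, high_pitched, rhythmic: booleans for the remaining features."""
    if not tonal and loud:
        return "aversion"      # harsh, loud, noisy, broadband sounds
    if tonal and high_pitched:
        return "attraction"    # relatively high-frequency tonal sounds
    if not loud and not high_pitched and rhythmic:
        return "cohesion"      # soft, rhythmic, low-frequency sounds
    return "ambiguous"         # left to context and other modalities
```

Under this coding, the loud, noisy defensive calls of Fig. 2 land in "aversion" and the male's high-frequency tonal ultrasounds in "attraction," matching the associations described above.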

IV. Recognition of vocal signals and conclusions

We used the term "perception" of basic meanings/semiotic values, although the neural processing behind this perception is cognitive and emotive, in the sense that auditory information is integrated with the emotional state, motivational tendencies and memory traces in the brain of the receiver before a certain response can occur. This integration, which leads to a modulation of the responsiveness, may also cause a modulation of the semiotic value/meaning of a vocalization. This becomes very obvious when the responses of animals to the vocalizations of others change with changing hormonal states, for example during the reproductive or estrous cycle. The following experiment may illustrate such a change: the perception of wriggling calls of mouse pups was modulated during the estrous cycle of pup-naïve females (Ehret and Schmid, 2009). During diestrus and proestrus, the females discriminated the acoustic quality of the sounds; during estrus, they were highly responsive and did not discriminate; during metestrus, their responsiveness was low and they did not discriminate. This example shows that acoustic properties of sounds become relevant in perception only against a certain emotional/motivational background. Considering this, it is conceivable that the non-receptive female (Fig. 2) was not attracted by the ultrasounds of the male and defended herself against his mating intentions, while the sexually highly-motivated male was not repelled by the defensive calls of the female. Differing from such situations, in which the biological meaning of vocalizations is modulated by changes in the state of the organism, learning certain acoustic properties of vocalizations may become important for adjusting behavioral responses. This type of vocal recognition opens and expands the perception of basic biological meanings to the recognition of individuals and/or kin, sex and age in many species, especially those living in structured groups, such as vervet monkeys (Cheney and Seyfarth, 1982), rhesus monkeys (Rendall et al., 1996), spotted hyenas (Holekamp et al., 1999), elephants (McComb et al., 2000) and pigs (Illmann et al., 2002). In these situations of learning to associate sound properties with a certain biological meaning, which is derived from the consequences of the interaction of the vocalizing with the perceiving animal, JNDs may become important for assessing the caller's behavioral state and identity, and for discriminating these from the identities of other individuals by perceiving small acoustic differences in the voices.
By learning to pay attention to subtle differences in vocalizations, the perceptual abilities of the species' auditory systems may be fully exploited.

References Anstis, S., Saida, S., 1985. Adaptation to auditory streaming of frequency-modulated tones. J. Exp. Psychol. Hum. Percept. Perform. 11, 257–271. August, P.V., Anderson, J.G.T., 1987. Mammal sounds and motivation-structural rules: a test to the hypothesis. J. Mammal. 68, 1–9.

Bendor, D., Wang, X., 2005. The neuronal representation of pitch in primate auditory cortex. Nature 436, 1161–1165.
Besser, G.M., 1967. Some physiological characteristics of auditory flutter fusion in man. Nature 214, 17–19.
Brand, A., Urban, A., Grothe, B., 2000. Duration tuning in the mouse auditory midbrain. J. Neurophysiol. 84, 1790–1799.
Brown, P.E., 1976. Vocal communication in the pallid bat, Antrozous pallidus. Z. Tierpsychol. 41, 34–54.
Bruns, V., 1976. Peripheral auditory tuning for fine frequency analysis by the CF-FM bat, Rhinolophus ferrumequinum. II. Frequency mapping in the cochlea. J. Comp. Physiol. 106, 87–97.
Budd, T.W., Michie, P.T., 1994. Facilitation of the N1 peak of the auditory ERP at short stimulus intervals. NeuroReport 5, 2513–2516.
Calford, M.B., Semple, M.N., 1995. Monaural inhibition in cat auditory cortex. J. Neurophysiol. 73, 1876–1891.
Cheney, D.L., Seyfarth, R.M., 1982. Recognition of individuals within and between groups of free-ranging vervet monkeys. Am. Zool. 22, 519–529.
Covey, E., Casseday, J.H., 1999. Timing in the auditory system of the bat. Annu. Rev. Physiol. 61, 457–476.
Culling, J.F., Summerfield, Q., 1998. Measurements of the binaural temporal window using a detection task. J. Acoust. Soc. Am. 103, 3540–3553.
Darwin, C.J., Sutherland, N.S., 1984. Grouping frequency components of vowels: when is a harmonic not a harmonic? Quart. J. Exp. Psychol. 36A, 193–208.
Deutscher, A., Kurt, S., Scheich, H., Schulze, H., 2006. Cortical and subcortical sides of auditory rhythms and pitches. NeuroReport 17, 853–856.
Egorova, M., Ehret, G., 2008. Tonotopy and inhibition in the midbrain inferior colliculus shape spectral resolution of sounds in neural critical bands. Eur. J. Neurosci. 28, 675–692.
Ehret, G., 1974. Age-dependent hearing loss in normal hearing mice. Naturwissenschaften 61, 506–507.
Ehret, G., 1976. Critical bands and filter characteristics in the ear of the house mouse (Mus musculus). Biol. Cybernetics 24, 35–42.
Ehret, G., 1977. Comparative psychoacoustics: perspectives of peripheral sound analysis in mammals. Naturwissenschaften 64, 461–470.
Ehret, G., 1983. Development of hearing and response behavior to sound stimuli: behavioural studies. In: Romand, R. (Ed.), Development of Auditory and Vestibular Systems. Academic Press, New York, NY, pp. 211–237.
Ehret, G., 1988. Auditory development: psychophysical and behavioral aspects. In: Meisami, E., Timiras, P.S. (Eds.), Handbook of Human Growth and Developmental Biology. CRC Press, Boca Raton, FL, pp. 141–154.
Ehret, G., 1989. Hearing in the mouse. In: Dooling, R.J., Hulse, S.H. (Eds.), The Comparative Psychology of Audition: Perceiving Complex Sounds. Lawrence Erlbaum, Hillsdale, NJ, pp. 3–32.
Ehret, G., 1992. Categorical perception of mouse-pup ultrasounds in the temporal domain. Anim. Behav. 43, 409–416.
Ehret, G., 1995. Auditory frequency resolution in mammals: from neuronal representation to perception. In: Manley, G.A., Klump, G.M., Köppl, C., Fastl, H., Oeckinghaus, H. (Eds.), Advances in Hearing Research. World Scientific, Singapore, pp. 387–397.
Ehret, G., 2006. Common rules of communication sound perception. In: Kanwal, J., Ehret, G. (Eds.), Behavior and Neurodynamics for Auditory Communication. Cambridge University Press, Cambridge, UK, pp. 85–114.
Ehret, G., Haack, B., 1982. Ultrasound recognition in house mice: key-stimulus configuration and recognition mechanisms. J. Comp. Physiol. A 148, 245–251.
Ehret, G., Merzenich, M.M., 1985. Auditory midbrain responses parallel spectral integration phenomena. Science 227, 1245–1247.
Ehret, G., Merzenich, M.M., 1988. Complex sound analysis (frequency resolution, filtering and spectral integration) by single units of the inferior colliculus of the cat. Brain Res. Rev. 13, 139–163.
Ehret, G., Riecke, S., 2002. Mice and humans perceive multiharmonic communication sounds in the same way. Proc. Natl. Acad. Sci. USA 99, 479–482.
Ehret, G., Schmid, C., 2009. Reproductive cycle-dependent plasticity of perception of acoustic meaning in mice. Physiol. Behav. 96, 428–433.
Fay, R.R., 1988. Hearing in Vertebrates: A Psychophysics Databook. Hill-Fay Associates, Winnetka, IL.
Finlayson, P.G., 1999. Post-stimulatory suppression, facilitation and tuning for delays shape responses of inferior colliculus neurons to sequential pure tones. Hear. Res. 131, 177–194.
Fishman, Y.I., Reser, D.H., Arezzo, J.C., Steinschneider, M., 2000. Complex tone processing in primary auditory cortex of the awake monkey. II. Pitch versus critical band representation. J. Acoust. Soc. Am. 108, 247–262.
Fishman, Y.I., Reser, D.H., Arezzo, J.C., Steinschneider, M., 2001. Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hear. Res. 151, 167–187.
Flanagan, J.L., 1972. Speech Analysis, Synthesis, and Perception. Springer-Verlag, Berlin, Germany.
Gaub, S., Ehret, G., 2005. Grouping in auditory temporal perception and vocal production is mutually adapted: the case of wriggling calls of mice. J. Comp. Physiol. A 191, 1131–1135.
Geissler, D.B., Ehret, G., 2002. Time-critical integration of formants for perception of communication calls in mice. Proc. Natl. Acad. Sci. USA 99, 9021–9025.
Gould, E., 1971. Studies of maternal-infant communication and development of vocalization in the bats Myotis and Eptesicus. Commun. Behav. Biol. 5, 262–313.
Greenwood, D.D., 1961. Auditory masking and the critical band. J. Acoust. Soc. Am. 33, 484–502.
Greenwood, D.D., 1990. A cochlear frequency-position function for several species – 29 years later. J. Acoust. Soc. Am. 87, 2592–2605.
Grieser, D.A., Kuhl, P.K., 1989. Categorization of speech by infants: support for speech-sound prototypes. Dev. Psychol. 25, 577–588.
Haack, B., Markl, H., Ehret, G., 1983. Sound communication between parents and offspring. In: Willott, J.F. (Ed.), The Auditory Psychobiology of the Mouse. Charles C. Thomas, Springfield, IL, pp. 57–97.


Haskins, R., 1977. Effect of kitten vocalizations on maternal behavior. J. Comp. Physiol. Psychol. 91, 830–838.
Heffner, H., Whitfield, I.C., 1976. Perception of the missing fundamental by cats. J. Acoust. Soc. Am. 59, 915–919.
Hirsh, I.J., 1959. Auditory perception of temporal order. J. Acoust. Soc. Am. 31, 759–767.
Holekamp, K.E., Boydston, E.E., Szykman, M., Graham, I., Nutt, K.J., Birch, S., Piskiel, A., Singh, M., 1999. Vocal recognition in the spotted hyena and its possible implications regarding the evolution of intelligence. Anim. Behav. 58, 383–395.
Illmann, G., Schrader, L., Špinka, M., Šustr, P., 2002. Acoustical mother–offspring recognition in pigs (Sus scrofa domestica). Behaviour 139, 487–505.
Klink, K.B., Klump, G.M., 2004. Duration discrimination in the mouse (Mus musculus). J. Comp. Physiol. A 190, 1039–1046.
Krumbholz, K., Patterson, R.D., Pressnitzer, D., 2000. The lower limit of pitch as determined by rate discrimination. J. Acoust. Soc. Am. 108, 1170–1180.
Kuhl, P.K., 1979. Models and mechanisms in speech perception. Species comparisons provide further contributions. Brain Behav. Evol. 16, 374–408.
Kuhl, P.K., Miller, J.D., 1978. Speech perception by the chinchilla: identification function for synthetic VOT stimuli. J. Acoust. Soc. Am. 63, 905–917.
Kurt, S., Deutscher, A., Crook, J.M., Ohl, F.W., Budinger, E., Moeller, C.K., Scheich, H., Schulze, H., 2008. Auditory cortical contrast enhancing by global winner-take-all inhibitory interactions. PLoS ONE 3, e1735.
Langner, G., 1992. Periodicity coding in the auditory system. Hear. Res. 60, 115–142.
Maiwald, D., 1967. Ein Funktionsschema des Gehörs zur Beschreibung der Erkennbarkeit kleiner Frequenz- und Amplitudenänderungen. Acustica 18, 81–92.
McComb, K., Moss, C., Sayialel, S., Baker, L., 2000. Unusually extensive networks of vocal recognition in African elephants. Anim. Behav. 59, 1103–1109.
Miller, G., Taylor, W., 1948. The perception of repeated bursts of noise. J. Acoust. Soc. Am. 20, 171–182.
Moore, B.C.J., 1989. An Introduction to the Psychology of Hearing, 3rd edn. Academic Press, London, UK.
Morton, E.S., 1977. On the occurrence and significance of motivation-structural rules in some bird and mammal sounds. Amer. Nat. 111, 855–869.
Penner, M.J., 1975. Persistence and integration: two consequences of a sliding integrator. Percept. Psychophys. 18, 114–120.
Pisoni, D.B., 1977. Identification and discrimination of the relative onset time of two component tones: implications for voicing perception in stops. J. Acoust. Soc. Am. 61, 1352–1361.
Pisoni, D.B., Lazarus, J.H., 1974. Categorical and noncategorical modes of speech perception along the voicing continuum. J. Acoust. Soc. Am. 55, 328–333.
Preisler, A., Schmidt, S., 1995. Virtual pitch formation in the ultrasonic range. Naturwissenschaften 82, 45–47.
Rendall, D., Rodman, P.S., Emond, R.E., 1996. Vocal recognition of individuals and kin in free-ranging rhesus monkeys. Anim. Behav. 51, 1007–1015.


Roederer, J.G., 1975. Introduction to the Physics and Psychophysics of Music. Springer, New York, NY.
Scharf, B., 1970. Critical band. In: Tobias, J.V. (Ed.), Foundations of Modern Auditory Theory, Vol. 1. Academic Press, New York, NY, pp. 159–202.
Scharf, B., 1978. Loudness. In: Carterette, E.C., Friedman, M.P. (Eds.), Handbook of Perception, Vol. IV. Hearing. Academic Press, New York, NY, pp. 187–242.
Schulze, H., Hess, A., Ohl, F.W., Scheich, H., 2002. Superposition of horseshoe-like periodicity and linear tonotopic maps in auditory cortex of the Mongolian gerbil. Eur. J. Neurosci. 15, 1077–1084.
Shalter, M.D., Fentress, J.C., Young, G.W., 1977. Determinants of response of wolf pups to auditory signals. Behaviour 60, 98–114.
Stern, D.N., Spieker, S., MacKain, K., 1982. Intonation contours as signals in maternal speech to prelinguistic infants. Dev. Psychol. 18, 727–735.
Stevens, K.N., Klatt, D.H., 1974. Role of formant transitions in the voiced-voiceless distinction for stops. J. Acoust. Soc. Am. 55, 653–659.
Suga, N., Jen, P.H.S., 1977. Further studies on the peripheral auditory system of “CF-FM” bats specialized for fine frequency analysis of Doppler-shifted echoes. J. Exp. Biol. 69, 207–232.
Suga, N., O’Neill, W.E., 1979. Neural axis representing target range in the auditory cortex of the mustache bat. Science 206, 351–353.
Suga, N., O’Neill, W.E., Kujirai, K., Manabe, T., 1983. Specialization of “combination-sensitive” neurons for processing of complex biosonar signals in the auditory cortex of the mustached bat. J. Neurophysiol. 49, 1573–1626.
Terhardt, E., 1974a. Pitch, consonance and harmony. J. Acoust. Soc. Am. 55, 1061–1069.
Terhardt, E., 1974b. On the perception of periodic sound fluctuations (roughness). Acustica 30, 201–213.
Terhardt, E., 1978. Psychoacoustic evaluation of musical sounds. Percept. Psychophys. 23, 483–492.
Tomlinson, R.W., Schwarz, D.W., 1988. Perception of the missing fundamental in nonhuman primates. J. Acoust. Soc. Am. 84, 560–565.
Ulanovsky, N., Las, L., Nelken, I., 2003. Processing of low-probability sounds by cortical neurons. Nature Neurosci. 6, 391–398.
Weinberger, N.M., McKenna, T.M., 1988. Sensitivity of single neurons in auditory cortex to contour: toward a neurophysiology of music perception. Music Percept. 5, 355–390.
Yabe, H., Tervaniemi, M., Näätänen, R., 1997. Temporal window of integration revealed by MMN to sound omission. NeuroReport 8, 1971–1974.
Zanto, T.P., Snyder, J.S., Large, E.W., 2006. Neural correlates of rhythmic expectancy. Adv. Cogn. Psychol. 2, 221–231.
Zwicker, E., Feldtkeller, R., 1967. Das Ohr als Nachrichtenempfänger. Hirzel, Stuttgart, Germany.
Zwislocki, J., Hellmann, R.P., Verillo, R.T., 1962. Threshold of audibility for short pulses. J. Acoust. Soc. Am. 34, 1648–1652.
