Communication by unvoiced speech: the role of whispering. (17/576)

Most studies on whispering deal with its production and perception, neglecting its communicative role. I have focused on this role, especially its social and psychobiological aspects, combining a general inquiry into the use of unvoiced speech with stimulus-response experiments on particular signal properties. (1) Analyses of answers to queries revealed that judgments about whispering depend on the social context. In the private domain it plays a clearly positive role, but in the public domain it is more problematic. Two factors were identified as relevant: (a) an 'ingroup' function of whispering, which could induce negative 'outgroup' effects in co-listeners, and (b) a psychobiological component of whispering, which could affect the auditory vigilance of co-listeners who were not personally addressed by the signaling but often wanted to understand the whispered message. (2) Analyses of experimental data confirmed the relevance of these factors. Additionally, they showed that unvoiced speech has a limited transmission range and is easily masked by background noise. Taken together, the results suggest that whispering is best explained as a close-distance signal adapted for private use among partners.

Compensation for pitch-shifted auditory feedback during the production of Mandarin tone sequences. (18/576)

Recent research has found that, while speaking, subjects react to perturbations in the pitch of their voice auditory feedback by changing their voice fundamental frequency (F0) to compensate for the perceived pitch shift. The long response latencies (150-200 ms) suggest that these responses may be too slow to assist in on-line control of the local pitch contour patterns associated with lexical tones on a syllable-to-syllable basis. In the present study, we introduced pitch-shifted auditory feedback to native speakers of Mandarin Chinese while they produced disyllabic sequences /ma ma/ with different tonal combinations at a natural speaking rate. Voice F0 response latencies (100-150 ms) to the pitch perturbations were shorter than syllable durations reported elsewhere. Response magnitudes increased from 50 cents during static-tone productions to 85 cents during dynamic-tone productions. Response latencies and peak times decreased in phrases involving a dynamic change in F0. The larger response magnitudes and shorter latency and peak times in tasks requiring accurate, dynamic control of F0 indicate that this automatic system for regulation of voice F0 may be task-dependent. These findings suggest that auditory feedback may help regulate voice F0 during production of bi-tonal Mandarin phrases.
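The response magnitudes above are reported in cents, the standard logarithmic unit for pitch intervals (100 cents per equal-tempered semitone, 1200 cents per octave). As background on that convention, a minimal Python sketch of the conversion; the 200 Hz reference F0 is an illustrative value, not taken from the study:

```python
import math

def cents(f_hz, f_ref_hz):
    """Pitch interval in cents between f_hz and a reference frequency.
    100 cents = one equal-tempered semitone; 1200 cents = one octave."""
    return 1200.0 * math.log2(f_hz / f_ref_hz)

def shift_by_cents(f_ref_hz, n_cents):
    """Frequency reached by shifting f_ref_hz up by n_cents."""
    return f_ref_hz * 2.0 ** (n_cents / 1200.0)

# Example: an 85-cent compensation on a hypothetical 200 Hz voice F0
print(round(shift_by_cents(200.0, 85), 1))   # ~210.1 Hz
print(round(cents(210.1, 200.0)))            # ~85 cents
```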

The role of spectral and temporal cues in voice gender discrimination by normal-hearing listeners and cochlear implant users. (19/576)

The present study investigated the relative importance of temporal and spectral cues in voice gender discrimination and vowel recognition by normal-hearing subjects listening to an acoustic simulation of cochlear implant speech processing and by cochlear implant users. In the simulation, the number of speech processing channels ranged from 4 to 32, thereby varying the spectral resolution; the cutoff frequencies of the channels' envelope filters ranged from 20 to 320 Hz, thereby manipulating the available temporal cues. For normal-hearing subjects, results showed that both voice gender discrimination and vowel recognition scores improved as the number of spectral channels was increased. When only 4 spectral channels were available, voice gender discrimination significantly improved as the envelope filter cutoff frequency was increased from 20 to 320 Hz. For all spectral conditions, increasing the amount of temporal information had no significant effect on vowel recognition. Both voice gender discrimination and vowel recognition scores were highly variable among implant users. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to comparable speech processing (4-8 spectral channels). The results suggest that both spectral and temporal cues contribute to voice gender discrimination and that temporal cues are especially important for cochlear implant users to identify voice gender when spectral resolution is reduced.
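Acoustic simulations of this kind are commonly implemented as a noise-excited channel vocoder: the signal is split into spectral bands, each band's temporal envelope is extracted with a lowpass filter (the 20-320 Hz cutoffs above), and the envelopes modulate band-limited noise carriers. A generic Python sketch of that approach follows; the band spacing, filter orders, and 100-7000 Hz analysis range are illustrative assumptions, not the study's exact processor:

```python
import numpy as np
from scipy.signal import butter, sosfilt, sosfiltfilt

def noise_vocoder(x, fs, n_channels=4, env_cutoff_hz=160.0,
                  f_lo=100.0, f_hi=7000.0):
    """Noise-excited channel vocoder, a common acoustic simulation of
    cochlear-implant processing. f_hi must stay below fs / 2."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced bands
    env_sos = butter(2, env_cutoff_hz, btype='low', fs=fs, output='sos')
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype='band', fs=fs, output='sos')
        band = sosfilt(band_sos, x)                          # analysis band
        env = np.clip(sosfiltfilt(env_sos, np.abs(band)), 0.0, None)
        carrier = sosfilt(band_sos, np.random.randn(len(x)))  # band noise
        out += env * carrier                                 # modulated band
    return out
```

Sweeping n_channels from 4 to 32 and env_cutoff_hz from 20 to 320 Hz in this sketch would reproduce the kind of spectral and temporal manipulations described above.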

Speech recognition with amplitude and frequency modulations. (20/576)

Amplitude modulation (AM) and frequency modulation (FM) are commonly used in communication, but their relative contributions to speech recognition have not been fully explored. To bridge this gap, we derived slowly varying AM and FM from speech sounds and conducted listening tests using stimuli with different modulations in normal-hearing and cochlear-implant subjects. We found that although AM from a limited number of spectral bands may be sufficient for speech recognition in quiet, FM significantly enhances speech recognition in noise, as well as speaker and tone recognition. Additional speech reception threshold measures revealed that FM is particularly critical for speech recognition with a competing voice and is independent of spectral resolution and similarity. These results suggest that AM and FM provide independent yet complementary contributions to support robust speech recognition under realistic listening situations. Encoding FM may improve auditory scene analysis and cochlear-implant and audio-coding performance.
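Slowly varying AM and FM of a sub-band are commonly obtained from the analytic signal: the Hilbert envelope gives the AM, and the derivative of the unwrapped instantaneous phase gives the FM. A minimal Python sketch of that standard decomposition; the paper's exact extraction and band layout may differ:

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def am_fm_band(x, fs, lo_hz, hi_hz):
    """Split one sub-band into its amplitude envelope (AM) and
    instantaneous frequency (FM) via the analytic signal."""
    sos = butter(4, [lo_hz, hi_hz], btype='band', fs=fs, output='sos')
    band = sosfilt(sos, x)
    analytic = hilbert(band)
    am = np.abs(analytic)                       # Hilbert envelope (AM)
    phase = np.unwrap(np.angle(analytic))
    fm = np.diff(phase) * fs / (2 * np.pi)      # inst. frequency in Hz
    return am, fm                               # fm is one sample shorter
```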

Effects of talker variability on perceptual learning of dialects. (21/576)

Two groups of listeners learned to categorize a set of unfamiliar talkers by dialect region using sentences selected from the TIMIT speech corpus. One group learned to categorize a single talker from each of six American English dialect regions. A second group learned to categorize three talkers from each dialect region. Following training, both groups were asked to categorize new talkers using the same categorization task. While the single-talker group was more accurate during the initial training and test phases, in which familiar talkers produced the sentences, the three-talker group performed better on the generalization task with unfamiliar talkers. This cross-over effect in dialect categorization suggests that although talker variation during initial perceptual learning makes learning specific exemplars more difficult, exposure to intertalker variability facilitates robust perceptual learning and better categorization of unfamiliar talkers. The results suggest that listeners encode and use acoustic-phonetic variability in speech to reliably perceive the dialect of unfamiliar talkers.

Effects of reverberation time on the cognitive load in speech communication: theoretical considerations. (22/576)

The paper presents a theoretical analysis of possible effects of reverberation time on the cognitive load in speech communication. Speech comprehension requires not only phonological processing of the spoken words; this information must simultaneously be processed further and stored. All of this processing takes place in working memory, which has a limited capacity. The more resources are allocated to word identification, the fewer are left for further processing and storage of the information. Reverberation conditions that allow the identification of almost all words may therefore still interfere with speech comprehension and memory storage. These problems are likely to be especially serious in situations where speech has to be followed continuously for a long time. An unfavourable reverberation time (RT) could then contribute to the development of cognitive fatigue, in which working memory resources are gradually depleted. RT may also act on cognitive load in two other ways: by changing the distracting effect of a sound and by changing a person's mood; both effects could influence a listener's cognitive load. It is argued that we need studies of RT effects in realistic, long-lasting listening situations to better understand the effect of RT on speech communication. Furthermore, the effects of RT on distraction and mood need to be better understood.
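For context, reverberation time is conventionally quantified as RT60, the time for sound energy in a room to decay by 60 dB after the source stops, and is often estimated with Sabine's formula. This is standard room acoustics rather than material from the paper itself; a minimal sketch with illustrative numbers:

```python
def rt60_sabine(volume_m3, absorption_m2):
    """Sabine estimate of reverberation time RT60 in seconds.
    absorption_m2 is the equivalent absorption area: the sum over all
    surfaces of area times absorption coefficient (metric sabins)."""
    return 0.161 * volume_m3 / absorption_m2

# Example (hypothetical classroom): 200 m^3 volume, 40 m^2 absorption
print(round(rt60_sabine(200.0, 40.0), 2))  # ~0.8 s
```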

The emergence of mature gestural patterns in the production of voiceless and voiced word-final stops. (23/576)

The organization of gestures was examined in children's and adults' productions of consonant-vowel-stop words differing in stop voicing. Children (5 and 7 years old) and adults produced words from five voiceless/voiced pairs, five times each in isolation and in sentences. Acoustic measurements were made of vocalic duration and of the first and second formants at syllable center and voicing offset. The predicted acoustic correlates of syllable-final voicing were observed across speakers: vocalic segments were shorter, and first formants were higher, in words with voiceless rather than voiced final stops. In addition, the second formant was found to differ depending on the voicing of the final stop for all speakers. It was concluded that by 5 years of age children produce words ending in stops with the same overall gestural organization as adults. However, some age-related differences were observed for jaw gestures, and variability on all measures was greater for children than for adults. These results suggest that children are still refining their organization of articulatory gestures past the age of 7 years. Finally, context effects (isolation versus sentence) showed that the acoustic correlates of syllable-final voicing are attenuated when words are produced in sentences rather than in isolation.
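Measurements like these (vocalic duration from the waveform, F1/F2 at chosen time points) are routinely made with dedicated speech-analysis software. As a rough illustration only of how a single-frame formant estimate can be obtained from LPC roots; this is a generic sketch with rule-of-thumb frame length and LPC order, not the authors' procedure:

```python
import numpy as np
import librosa

def formants_at(y, sr, t, frame_ms=25):
    """Rough F1/F2 estimate (Hz) at time t (s) from the roots of an LPC
    polynomial fit to a short Hamming-windowed frame. No bandwidth
    filtering or tracking, as real formant tools would apply."""
    order = int(2 + sr / 1000)                  # rule-of-thumb LPC order
    n = int(sr * frame_ms / 1000)
    i = max(0, int(t * sr) - n // 2)            # frame centered on t
    frame = y[i:i + n] * np.hamming(n)
    a = librosa.lpc(frame, order=order)
    roots = [r for r in np.roots(a) if np.imag(r) > 0]
    freqs = sorted(np.angle(r) * sr / (2 * np.pi) for r in roots)
    freqs = [f for f in freqs if f > 90]        # drop near-DC roots
    return freqs[:2]                            # ~F1, F2
```

Usage would be, for example, y, sr = librosa.load("token.wav", sr=None) followed by formants_at(y, sr, t=0.12); a real analysis would rely on a formant tracker with bandwidth criteria rather than a single-frame estimate.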

The control of orofacial movements in speech. (24/576)

Rapid, complex movements of orofacial structures are essential for producing the sounds of speech. A central problem in speech production research is to discover the neural sources that generate the control signals supplied to motoneurons during speaking. Speech movement production appears to share organizational principles with other motor behaviors; thus speech movements probably arise from an interaction of centrally generated command signals with sensory information. The fact that speech movements are ultimately linked to the perception of language, however, has led many investigators to suggest that speech movement control involves unique features that may be linked to abstract linguistic units.