Currently Skimming:

Speech Communication -- An Overview
Pages 76-104

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 76...
... This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing, along with ways to coalesce the fundamental issues of recognition, synthesis, and coding.
From page 77...
... Invention of the telephone, and the beginning of telecommunications as a business to serve society, stimulated work in network theory, transducer research, filter design, spectral analysis, psychoacoustics, modulation methods, and radio and cable transmission techniques. Early on, the acoustics and physiology of speech generation were identified as critical issues for understanding.
From page 78...
... But computing technology soon grew beyond data sorting for business and algorithm simulation for science. Inexpensive arithmetic and economical storage, along with expanding knowledge of information signals, permitted computers to take on functions more related to decision making: understanding subtle intents of the user and initiating ways to meet user needs.
From page 80...
... The resonators were activated by vibrating reeds analogous to the vocal cords. The disparity with natural articulatory shapes points up the nonuniqueness between sound spectrum and resonator shape (i.e., job security for the ventriloquist)
From page 81...
... As we approach the threshold of the twenty-first century, fledgling systems are being demonstrated for translating telephony. These systems require automatic recognition of large fluent vocabularies in one language by a great variety of talkers; transmission of the inherent speech information; and natural-quality synthesis in a foreign language, preferably with the exact voice quality of the original talker.
From page 82...
... Casual informal conversational speech, with all its vagaries and nongrammatical structure, poses special challenges in devising tractable models of grammar, syntax, and semantics.
TECHNOLOGY STATUS
A fundamental challenge in speech processing is how to represent, quantify, and interpret information in the speech signal.
From page 83...
... Signal quality typically diminishes with coding rate, with a notable "knee" at about 8k bits/second. Nevertheless, vocoder rates of 4k and 2k bits/second are finding use for digital encryption over voice bandwidth channels.
From page 84...
... In coding wideband audio signals the overt use of auditory perception factors within the coding algorithm ("hearing-specific" coders) has been remarkably successful, allowing wideband signal representation with an average of less than two bits per sample.
From page 85...
... Vocabularies of a couple hundred words and a grammar that permits billions of sentences about a specific task (say, obtaining airline flight information) are
From page 86...
... Autodirective microphone arrays. In many speech communication environments, particularly in teleconferencing and in the use of voice
From page 87...
... New research on three-dimensional arrays and multiple beam forming is leading to high-quality signal capture from designated spatial volumes.
CRITICAL DIRECTIONS IN SPEECH RESEARCH
Physics of Speech Generation; Fluid-Dynamic Principles
The aforementioned lack of naturalness in speech generated from compact specifications stems possibly from two sources.
From page 88...
... Both of these aspects affect speech quality and certainly affect the ability to duplicate individual voice characteristics. Traditional synthesis takes as its point of departure a source-filter approximation to the vocal system, wherein source and filter do not interact.
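The source-filter approximation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not taken from the chapter: the function names, the impulse-train source, the two-pole resonators standing in for formants, and the numeric values for pitch, formant frequencies, and bandwidths. The point is only that the filter coefficients are computed without reference to the source, which is what "source and filter do not interact" means in practice.

```python
import math

def impulse_train(n_samples, period):
    """Idealized voiced source (assumption): one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole digital resonator standing in for a single formant (assumption)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius from bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    a2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + a1 * y1 + a2 * y2   # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        y.append(out)
        y2, y1 = y1, out
    return y

def synthesize(n_samples=800, sample_rate=8000, pitch_hz=100.0,
               formants=((500.0, 60.0), (1500.0, 90.0))):
    """Pass the source through a cascade of formant resonators.

    The filter never looks at the source parameters and the source never
    looks at the filter: the two stages are independent by construction.
    """
    signal = impulse_train(n_samples, int(sample_rate / pitch_hz))
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal

if __name__ == "__main__":
    wave = synthesize()
    print(len(wave), "samples synthesized")
```

Changing `pitch_hz` alters only the source and changing `formants` alters only the filter; a model with source-filter interaction, as the chapter goes on to discuss, would not separate this cleanly.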
From page 89...
... Given the three-dimensional, time-varying, soft-walled vocal tract, excited by periodically valved flow at the vocal cords and by turbulent flow at constrictions, the Navier-Stokes equation can be solved numerically on a fine space-time grid to produce a remarkably realistic description of radiated sound pressure. Nonlinearities of excitation, generation of turbulence, cross-modes of the system, and acoustic interaction between sources and resonators are taken into account.
From page 90...
... Computational Models of Language
Already mentioned is the criticality of language models for fluent, large-vocabulary speech recognition. Tractable models that account for grammatical behavior (in spoken language)
From page 91...
... Statistical constraints in spoken language are as powerful as those in text and can be used to complement substantially the traditional approaches to parsing and determining parts of speech.
Information Processing in the Auditory System; Auditory Behavior
Mechanics and operation of the peripheral ear are relatively well understood.
From page 92...
... The estimate is based on the likelihood ratio (ratio of the probability that the name string belongs to language i, to the average probability of the name string across all languages)
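The likelihood ratio described in this excerpt can be written out as a short sketch. The function name and the example probabilities are hypothetical; the excerpt does not say how the per-language probabilities of a name string are obtained, so they are simply taken as given here.

```python
def likelihood_ratios(probabilities):
    """Ratio of each language's probability to the average over all languages.

    `probabilities` maps a language label to the (assumed, externally supplied)
    probability that the name string belongs to that language.
    """
    if not probabilities:
        raise ValueError("at least one language is required")
    average = sum(probabilities.values()) / len(probabilities)
    if average == 0.0:
        raise ValueError("average probability is zero; ratio is undefined")
    return {lang: p / average for lang, p in probabilities.items()}

if __name__ == "__main__":
    # Hypothetical scores for one name string under three language models.
    scores = {"language_1": 0.6, "language_2": 0.3, "language_3": 0.3}
    for lang, ratio in likelihood_ratios(scores).items():
        print(lang, round(ratio, 3))
```

A ratio above 1 means the name string is more probable under that language than under the average language; a ratio below 1 means it is less probable.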
From page 94...
... Relatively untouched, so far, is the esoteric behavior of binaural release from masking, wherein interaural phase markedly controls perceptibility.
Coalescing Speech Coding, Synthesis, and Recognition
The issues of coding, recognition, and synthesis are not disjoint; they are facets of the same underlying process of speech and hearing.
From page 95...
... Spectral differences between real and synthetic signals are perceptually weighted and used in a closed loop to adjust iteratively the parameters of the synthesis, driving the difference to a minimum. The voice mimic attempts to generate a synthetic speech signal that, within perceptual accuracy, duplicates an input of arbitrary natural speech.
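The closed loop described here (compare spectra, weight the difference, adjust the synthesis parameters, repeat) can be sketched in a toy form. This is not the voice mimic itself: the two-parameter "synthesizer", the uniform weights, and the coordinate-search update rule are all assumptions chosen to keep the example small, and every name below is hypothetical.

```python
import math

def weighted_error(target, candidate, weights):
    """Weighted squared spectral difference between target and candidate."""
    return sum(w * (t - c) ** 2 for t, c, w in zip(target, candidate, weights))

def closed_loop_fit(target, synthesize, params, weights, step=0.5, iterations=40):
    """Iteratively adjust synthesis parameters to reduce the weighted error.

    Uses a simple coordinate search (an assumption): try nudging each parameter
    up and down, keep any change that lowers the error, and halve the step
    when nothing helps.
    """
    params = list(params)
    best = weighted_error(target, synthesize(params), weights)
    for _ in range(iterations):
        improved = False
        for k in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[k] += delta
                err = weighted_error(target, synthesize(trial), weights)
                if err < best:
                    params, best, improved = trial, err, True
        if not improved:
            step /= 2.0
    return params, best

def toy_spectrum(params, n_bins=8):
    """Hypothetical synthesizer: a spectrum set by an overall gain and a tilt."""
    gain, tilt = params
    return [gain * math.exp(-tilt * i) for i in range(n_bins)]

if __name__ == "__main__":
    target = toy_spectrum([2.0, 0.3])    # stands in for the "real" signal spectrum
    weights = [1.0] * len(target)        # placeholder for perceptual weighting
    start = [1.0, 0.0]
    fitted, err = closed_loop_fit(target, toy_spectrum, start, weights)
    print("fitted parameters:", fitted, "remaining weighted error:", err)
```

In a real system the weights would come from a perceptual model, so that differences the ear cannot detect contribute little to the error being minimized.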
From page 96...
... [Plot labels: SNR of a single microphone; unsteered; number of beams.] FIGURE 13b Signal-to-noise ratios measured on two octaves of speech for a 7 x 7 x 7 rectangular microphone array positioned at the ceiling center in a computer-simulated hard-walled room of dimensions 7 x 5 x 3 meters. Source images through third order are computed, and multiple beams are steered to the source and its images.
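Steering a beam to a source position can be illustrated with a basic delay-and-sum sketch. The chapter excerpt does not specify the beamforming algorithm used for the figure, so delay-and-sum is an assumption here, as are the function names, the whole-sample delays, and the speed-of-sound constant. Steering to source images would reuse the same functions with the image positions in place of the source position.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second; assumed room-temperature value

def steering_delays(mic_positions, source_position, sample_rate):
    """Whole-sample delays that align every microphone with the farthest one.

    Microphones closer to the source hear it earlier, so they are delayed more.
    """
    distances = [math.dist(m, source_position) for m in mic_positions]
    farthest = max(distances)
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate) for d in distances]

def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay, then average across channels."""
    n = len(channels[0])
    out = [0.0] * n
    for signal, delay in zip(channels, delays):
        for i in range(delay, n):
            out[i] += signal[i - delay]
    return [v / len(channels) for v in out]

if __name__ == "__main__":
    # Toy setup: a 343 Hz sample rate makes one meter equal one sample of travel.
    mics = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    source = (5.0, 0.0, 0.0)
    delays = steering_delays(mics, source, sample_rate=343)
    # An impulse emitted at time 0 arrives after 5 and 2 samples, respectively.
    channels = [[0.0] * 10, [0.0] * 10]
    channels[0][5] = 1.0
    channels[1][2] = 1.0
    print(delays, delay_and_sum(channels, delays))
```

Signals arriving from the steered position add coherently, while sound from other directions is misaligned after the delays and is attenuated by the averaging.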
From page 97...
... Three Dimensional Sound Capture and Projection
High-quality, low-cost electret microphones and economical digital signal processors permit the use of large microphone arrays for hands-free sound capture in hostile acoustic environments. Moreover, three-dimensional arrays with beam steering to the sound source and
From page 98...
... Using the force feedback glove, the wearer can compute a virtual object and sense tactilely the relative position of the object and its programmed compliance. Alternatively, the force feedback device can be programmed for force output sequences for medical rehabilitation and exercise of injured hands.
From page 99...
... The system includes an autodirective beam-steering microphone array, speech recognizer control of call setup and video conferencing display, text-to-speech voice response, image compression for digital transmission, and an interface to the AT&T Bell Laboratories experimental high-speed packet data network, XUNET (Fraser et al., 1992).
From page 100...
... Auto-directive microphone arrays permit hands-free sound pickup. System features are controlled by automatic recognition of spoken commands.
From page 101...
... On the speech technology side, this means integration into the information system of the piece parts for speech recognition, synthesis, verification, low bit-rate coding, and hands-free sound pickup. Initial efforts in this direction are designed for conferencing over digital telephone channels (Berkley and Flanagan, 1990).
From page 102...
... This is among the more problematic estimates, but improved models of hearing and nonlinear signal processing for automatic recognition will narrow the gap between human and machine performance on noisy signals. Comparable recognition performance by human and machine seems achievable for limited vocabularies and noisy inputs.
From page 103...
... Furui, S., and Sondhi, M., eds., Advances in Speech Signal Processing, Marcel Dekker, New York, 1992. Furui, S., Digital Speech Processing, Synthesis, and Recognition, Marcel Dekker, New York, 1989.
From page 104...
... R., and B-H. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, N.J., 1993.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.