
Scientific Bases of Human-Machine Communication by Voice
Pages 15-33

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 15...
... speech recognition and understanding for voice input, and (4) usability factors related to how humans interact with machines.
From page 16...
... Clearly, this reliance on the modest intelligence of the animal source of power was severely limiting, and even that limited voice-control capability disappeared as animal power was replaced by fossil-fuel power.
From page 17...
... With a microphone to pick up the human voice and a speaker or headphones to deliver a synthetic voice from the system to the human ear, the human can communicate with the system, which in turn can command other machines or cause desired actions to occur. In order to do this, the voice communication system must take in the human voice input, determine what action is called for, and pass information to other systems or machines.
From page 18...
... concerning the goals, the value, and the potential for success of research in speech recognition stimulated much valuable discussion and thought in the late 1960s and may even have dampened the enthusiasm of engineers and scientists for a while, but ultimately the research community answered with optimistic vigor. Although it is certainly true that the ambitious goal of providing a machine with the speaking and understanding capability of a native speaker is still far away, the past 25 years have seen significant progress in both speech synthesis and recognition, so that effective systems for human-machine communication by voice are now being deployed in many important applications, and there is little doubt that applications will increase as the technology matures.
From page 19...
... Computer-based laboratory facilities quickly became indispensable tools for speech research, and it is not an exaggeration to say that one of the strongest motivating forces in the modern field of digital signal processing was the need to develop digital filtering, spectrum analysis, and signal modeling techniques for simulating and implementing speech analysis and synthesis systems (Gold and Rader, 1969; Oppenheim and Schafer, 1975; Rabiner and Gold, 1975; Rabiner and Schafer, 1978). In addition to its capability to do the numerical computations called for in analysis and synthesis of speech, the digital computer can provide the intelligence necessary for human-machine communication by voice.
From page 20...
... This high performance is not limited to special-purpose microcomputers. Currently available workstations and personal computers also are becoming fast enough to do the real-time operations required for human-machine voice communication without any coprocessor support.
From page 21...
... SPEECH ANALYSIS AND SYNTHESIS
In human-machine communication by voice, the basic information-carrying medium is speech. Therefore, fundamental knowledge of the speech signal (how it is produced, how information is encoded in it, and how it is perceived) is critically important.
From page 22...
... The vocal tract system response also changes with time to shape the spectrum of the signal to produce appropriate resonances or formants. With such a model as a basis, the problem of speech analysis is concerned with finding the parameters of the model given a speech signal.
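The source-filter model described above can be illustrated with a minimal synthesis sketch: a periodic impulse train stands in for voiced excitation, and a cascade of second-order all-pole resonators stands in for the vocal tract, each resonator contributing one formant. The function names, the 8 kHz sampling rate, the 100 Hz pitch, and the formant frequencies and bandwidths are illustrative assumptions, not values from the chapter.

```python
import math


def impulse_train(n_samples, period):
    """Voiced excitation: a unit impulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator_coeffs(formant_hz, bandwidth_hz, fs):
    """Coefficients of a 2nd-order all-pole section resonating at formant_hz."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius set by bandwidth (r < 1)
    theta = 2.0 * math.pi * formant_hz / fs      # pole angle set by center frequency
    return 2.0 * r * math.cos(theta), -r * r     # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]


def all_pole_filter(x, a1, a2):
    """Run the recursive resonator over the input sequence."""
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2
        y.append(yn)
        y2, y1 = y1, yn
    return y


def synthesize_vowel(formants, fs=8000, pitch_hz=100, duration_s=0.1):
    """Excitation passed through a cascade of formant resonators (the 'vocal tract')."""
    signal = impulse_train(int(fs * duration_s), int(fs / pitch_hz))
    for freq, bw in formants:
        a1, a2 = resonator_coeffs(freq, bw, fs)
        signal = all_pole_filter(signal, a1, a2)
    return signal


# Hypothetical formant (frequency, bandwidth) pairs in Hz, roughly open-vowel-like.
vowel = synthesize_vowel([(730, 90), (1090, 110), (2440, 170)])
```

Speech analysis, in the sense used in the text, runs this in reverse: given `vowel`, estimate the pitch period and the resonator parameters that produced it.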
From page 23...
... of such a digital waveform representation is simply the number of samples per second times the number of bits per sample. Since the bit rate determines the channel capacity required for digital transmission or the memory capacity required for storage of the speech signal, the major concern in digital speech coding is to minimize the bit rate while maintaining an acceptable perceived fidelity to the original speech signal.
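The bit-rate relation stated above is simple arithmetic, sketched here. The sampling rates and sample sizes are common illustrative choices, not figures from the chapter (8 kHz with 8-bit samples corresponds to standard telephone-band PCM).

```python
def pcm_bit_rate(samples_per_second, bits_per_sample):
    """Bit rate of a sampled-and-quantized (PCM) waveform, in bits per second."""
    return samples_per_second * bits_per_sample


def storage_bytes(duration_s, samples_per_second, bits_per_sample):
    """Memory needed to store duration_s seconds of PCM speech, in bytes."""
    return pcm_bit_rate(samples_per_second, bits_per_sample) * duration_s / 8


telephone = pcm_bit_rate(8000, 8)     # 8 kHz sampling, 8-bit samples
wideband = pcm_bit_rate(16000, 16)    # 16 kHz sampling, 16-bit samples
print(telephone, wideband)            # 64000 256000
print(storage_bytes(60, 8000, 8))     # 480000.0 (bytes for one minute)
```

Halving either factor halves the channel or memory requirement, which is why coders try to reduce the number of bits per sample (or replace the waveform with model parameters) while keeping perceived fidelity acceptable.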
From page 24...
... Linear predictive analysis is used to estimate parameters of the vocal tract system model in Figure 3, and, either directly or indirectly, this model serves as the basis for a digital representation of the speech signal. Variations on the LPC theme include adaptive differential PCM (ADPCM)
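One standard way to carry out linear predictive analysis is the autocorrelation method solved with the Levinson-Durbin recursion; the sketch below assumes that method (the excerpt does not say which variant the chapter uses). The predictor convention is $\hat{x}[n] = \sum_{k=1}^{p} \alpha_k x[n-k]$, and the function names and the Hamming window are illustrative choices.

```python
import math


def autocorrelation(x, max_lag):
    """Short-time autocorrelation r[k] = sum_n x[n] * x[n + k], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for alpha_1..alpha_p.

    Returns (alphas, final prediction-error energy)."""
    alpha = [0.0] * (order + 1)   # alpha[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:          # silent frame or perfectly predicted: stop early
            break
        # reflection (PARCOR) coefficient for stage i
        k = (r[i] - sum(alpha[j] * r[i - j] for j in range(1, i))) / error
        updated = alpha[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = alpha[j] - k * alpha[i - j]
        alpha = updated
        error *= 1.0 - k * k
    return alpha[1:], error


def lpc(frame, order):
    """Autocorrelation-method LPC on a Hamming-windowed analysis frame."""
    n = len(frame)
    windowed = [frame[t] * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t in range(n)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

For example, the impulse response of a one-pole system, `x = [0.9 ** n for n in range(200)]`, gives `levinson_durbin(autocorrelation(x, 2), 2)` coefficients very close to `[0.9, 0.0]`, recovering the single pole.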
From page 25...
... Speech analysis and synthesis have received much attention from researchers for over 60 years, with great strides occurring in the 25 years since digital computers became available for speech research. Synthesis research has drawn support from many fields, including acoustics, digital signal processing, linguistics, and psychology.
From page 26...
... Learning how to represent phonetic elements, syllables, stress, emphasis, etc., in a form that can be effectively coupled to speech modeling, analysis, and synthesis techniques should continue to have high priority in speech research. Increased knowledge in this area is obviously essential for text-to-speech synthesis, where the goal is to ensure that linguistic structure is correctly introduced into the synthetic waveform, but more effective application of this knowledge in speech analysis techniques could lead to much improved analysis/synthesis coders as well.
From page 27...
... is a major part of the general problem of human-machine communication by voice. As in the case of speech synthesis, it is critical to build on fundamental knowledge of speech production and perception and to understand how linguistic structure of language is expressed and manifested in the speech signal.
From page 28...
... The "front-end" processing extracts a parametric representation or input pattern from the digitized input speech signal using the same types of techniques (e.g., linear predictive analysis or filter banks) that are used in speech analysis/synthesis systems.
From page 29...
... As in the case of speech synthesis, a continuing goal must be to understand how linguistic structure is encoded in the acoustic speech waveform, and, in the case of speech recognition, to learn how to incorporate such models into both the pattern analysis and pattern matching phases of the problem.
Robustness.
From page 30...
... The analysis-by-synthesis paradigm of Figure 6 may also be useful for speech recognition applications. Indeed, if the block labeled "Model Parameter Generator" were a speech recognizer producing text or some symbolic representation as output, the block labeled "Speech Synthesis Model" could be a text-to-speech synthesizer.
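The analysis-by-synthesis loop can be illustrated with a toy sketch: candidate parameters are proposed, each is passed through a synthesis model, and the candidate whose synthetic output best matches the input is kept. Figure 6 is not reproduced in this excerpt, so the structure below is an assumption based only on the two block names; the damped-sinusoid "synthesis model," the squared-error criterion, and the exhaustive grid of candidates are all hypothetical stand-ins.

```python
import math


def synthesis_model(freq_hz, fs=8000, n=160, decay=0.995):
    """Hypothetical stand-in for the 'Speech Synthesis Model' block:
    a single damped resonance at freq_hz."""
    return [decay ** t * math.cos(2 * math.pi * freq_hz * t / fs) for t in range(n)]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def analysis_by_synthesis(target, candidates, model):
    """Try each candidate parameter (the 'Model Parameter Generator'),
    synthesize, and keep the one whose output best matches the target."""
    best, best_err = None, float("inf")
    for params in candidates:
        err = squared_error(target, model(params))
        if err < best_err:
            best, best_err = params, err
    return best, best_err


target = synthesis_model(730)
best, err = analysis_by_synthesis(target, range(200, 3000, 10), synthesis_model)
print(best, err)   # 730 0.0
```

A practical system would replace the exhaustive grid with a more efficient search over a much richer parameter set; the point here is only the closed loop of generate, synthesize, and compare.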
From page 31...
... The paradigm of the voice-controlled team and wagon has features that are very similar to those found in some computer-based systems in use today: that is, a limited vocabulary of acoustically distinct words, spoken in isolation, with an alternate communication/control mechanism conveniently accessible to the human in case it is necessary to override the voice control system. Given a computer system with such constrained capabilities, we could certainly go looking for applications for it.
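A limited-vocabulary, isolated-word recognizer of the kind described above can be sketched as whole-word template matching. The chapter excerpt does not name a matching method; dynamic time warping (DTW) is used here as one common choice. The one-dimensional feature vectors, the command words ("gee", "haw", "whoa"), and the rejection threshold are hypothetical. Returning `None` when no template is close enough stands in for handing control to the alternate override mechanism.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """DTW alignment cost between two feature-vector sequences,
    normalized by n + m so long and short words are comparable."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(seq_a[i - 1], seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


def recognize(utterance, templates, reject_threshold):
    """Return (word, distance) for the closest template, or (None, distance)
    if even the best match is too poor, so control can fall back to an override."""
    best_word, best_dist = None, float("inf")
    for word, template in templates.items():
        dist = dtw_distance(utterance, template)
        if dist < best_dist:
            best_word, best_dist = word, dist
    if best_dist > reject_threshold:
        return None, best_dist
    return best_word, best_dist


templates = {"gee": [[0.0], [1.0], [2.0]],
             "haw": [[2.0], [1.0], [0.0]],
             "whoa": [[1.0], [1.0], [1.0]]}
print(recognize([[0.0], [0.0], [1.0], [2.0], [2.0]], templates, 0.5))   # ('gee', 0.0)
```

The time-stretched input still matches "gee" exactly because DTW absorbs differences in speaking rate, which is one reason template matching works reasonably well for small vocabularies of isolated words.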
From page 32...
... CONCLUSION
Along the way to giving machines a human-like capability to speak and understand speech, there remains much to be learned about how structure and meaning in language are encoded in the speech signal and about how this knowledge can be incorporated into usable systems. Continuing improvement in the effectiveness and naturalness of human-machine voice communication systems will depend on cre
From page 33...
... Maragos, P., "Fractal Aspects of Speech Signals: Dimension and Interpolation," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 417-420, Toronto, May 1991.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.