Voice Communication Between Humans and Machines (1994)
National Academy of Sciences (NAS)

"Toward the Ultimate Synthesis/Recognition System." Voice Communication Between Humans and Machines. Washington, DC: The National Academies Press, 1994.


Page 451

VISION OF THE FUTURE

For the majority of humankind, speech production and understanding are natural, unconsciously acquired processes performed quickly and effectively throughout daily life. By the year 2001, speech synthesis and recognition systems are expected to play important roles in advanced user-friendly human-machine interfaces (Wilpon, in this volume). Speech recognition systems include not only those that recognize messages but also those that recognize the identity of the speaker. Services using these systems will include database access and management, various made-to-order services, dictation and editing, electronic secretarial assistance, robots (e.g., the computer HAL in 2001: A Space Odyssey), automatic interpreting (translating) telephony, security control, and aids for the handicapped (e.g., reading aids for the blind and speaking aids for the vocally handicapped) (Levitt, in this volume). Today, many people in developed countries are employed to sit at computer terminals wearing telephone headsets and transfer information from callers to computer systems (databases) and vice versa (information and transaction services). Following the basic idea that boring and repetitive tasks done by human beings should be taken over by machines, the work of these information-transfer operators should be handed over to speech recognition and synthesis machines. Dictation, or voice typewriting, is expected to increase the speed of input to computers and to allow many operations to be carried out without hand or eye movements that distract attention from the task on the display.

Figure 1 shows a typical structure for task-specific voice control and dialogue systems. Although the speech recognizer, which converts spoken input into text, and the language analyzer, which extracts meaning from text, are drawn as two separate boxes in the figure, it is desirable that they operate in tight coupling, because the recognizer must use semantic information efficiently to produce correct text. How to combine these two functions is one of the most important issues, especially in conversational speech recognition (understanding). The meanings extracted by the language analyzer are then used to drive an expert system, which selects the desired action, issues commands to various systems, and receives data from these systems. Replies from the expert system are transferred to a text generator that constructs reply texts. Finally, the text replies are converted into speech by a text-to-speech synthesizer. "Synthesis from concepts" is performed by the combination of the text generator and the text-to-speech synthesizer.
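The five stages described for Figure 1 can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in rather than anything taken from the chapter: the function names, the `Meaning` record, the toy `DATABASE`, and the "lookup" command are all assumptions, the "audio" input is simply a transcript string, and the "speech" output is a tagged string. The stages are also chained strictly in sequence, which is a simplification; the text argues that the recognizer and language analyzer should share semantic information more tightly than this.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Meaning:
    """Hypothetical meaning representation: an action plus its argument."""
    action: str
    argument: str


# Toy back-end data standing in for the "various systems" the expert system queries.
DATABASE: Dict[str, str] = {"flight 12": "departs at 9:30"}


def recognize(audio: str) -> str:
    """Stand-in speech recognizer: the 'audio' is already a transcript."""
    return audio.lower().strip()


def analyze(text: str) -> Meaning:
    """Stand-in language analyzer: extracts a meaning from recognized text."""
    words = text.split()
    if len(words) >= 2 and words[0] == "lookup":
        return Meaning("lookup", " ".join(words[1:]))
    return Meaning("unknown", text)


def expert_system(meaning: Meaning) -> Dict[str, str]:
    """Selects an action, queries the back end, and returns a reply concept."""
    if meaning.action == "lookup":
        value = DATABASE.get(meaning.argument)
        if value is not None:
            return {"status": "found", "item": meaning.argument, "value": value}
        return {"status": "missing", "item": meaning.argument}
    return {"status": "not_understood"}


def generate_text(reply: Dict[str, str]) -> str:
    """Text generator: turns a reply concept into a reply sentence."""
    if reply["status"] == "found":
        return f"{reply['item'].capitalize()} {reply['value']}."
    if reply["status"] == "missing":
        return f"I have no information about {reply['item']}."
    return "Sorry, I did not understand."


def synthesize(text: str) -> str:
    """Stand-in text-to-speech synthesizer: tags the text instead of producing audio."""
    return f"<speech>{text}</speech>"


def dialogue_turn(audio: str) -> str:
    """One pass through the pipeline: recognize, analyze, act, generate, synthesize."""
    reply = expert_system(analyze(recognize(audio)))
    return synthesize(generate_text(reply))  # "synthesis from concepts"


if __name__ == "__main__":
    print(dialogue_turn("Lookup Flight 12"))
```

In this sketch, the last line of `dialogue_turn` corresponds to what the text calls "synthesis from concepts": the text generator and the synthesizer together turn the expert system's reply into speech.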

Figure 2 shows hierarchical relationships among the various types
