Voice Communication Between Humans and Machines (1994)
National Academy of Sciences (NAS)

"Computer Speech Synthesis: Its Status and Prospects." Voice Communication Between Humans and Machines. Washington, DC: The National Academies Press, 1994.
Page 115

human subjects can then be used to provide validation at strategically chosen points.

Researchers at NTT and ATR in Japan have been especially prominent in these explorations, and their initial results look very promising. As such methods gain wider application, and especially as we see general availability of the large-scale single-speaker databases that will be required to support them, we can hope to see an increased rate of improvement in segmental speech synthesis quality. Thus, increased investment in speech synthesis research is warranted, both because there is an opportunity created by advances in microelectronics and because there are significant new ideas and new methods waiting to be applied.

As this research goes forward, it faces some pointed questions. What will it take to make synthetic speech that sounds entirely natural, or at least better than word concatenation voice response systems for restricted phrase types such as name and address sequences? Will progress come by a scientific route, through better modeling of human speech production, or by an engineering route, through larger inventories of prerecorded elements with optimal automatic selection and combination methods? How far can we push current ideas about text analysis algorithms? How can we produce more natural-sounding modulation of pitch, amplitude, and timing, and how important are such prosodic improvements relative to segmental improvements?

What will it take to put speech synthesis into true mass market applications? What will those applications be? Will the key development be cheaper hardware, a particular "killer" application, or better-quality synthesis? Will there be a gradual spread of the existing niche markets or a single breakthrough?

How should we quantify progress in synthesis quality? What is the proper place for subjective testing relative to objective distortion metrics?

The papers by Carlson and Allen in this volume present a solid foundation of fact for evaluating these questions, and a wide variety of opinions were aired in the symposium discussion, from which an individual point of view has been distilled in this introduction. The next decade will be a lively and interesting time in the field of speech synthesis research, and there is little doubt that the situation will look very different 10 years from now.
