
Voice Communication Between Humans and Machines (1994)
National Academy of Sciences (NAS)


"Training and Search Methods for Speech Recognition." Voice Communication Between Humans and Machines. Washington, DC: The National Academies Press, 1994.


INTRODUCTION

It was pointed out by Makhoul and Schwartz (this volume) that the problem of speech recognition can be formulated most effectively as follows:

Given observed acoustic data A, find the word sequence that was the most likely cause of A.

The corresponding mathematical formula is:

$$\hat{W} = \arg\max_{W} P(A \mid W)\,P(W) \qquad (1)$$
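Equation (1) follows from Bayes' rule. The most likely word sequence maximizes the posterior probability $P(W \mid A)$, and the denominator $P(A)$ is the same for every candidate $W$, so dropping it does not change which $W$ attains the maximum:

$$P(W \mid A) = \frac{P(A \mid W)\,P(W)}{P(A)} \quad\Longrightarrow\quad \hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\,P(W).$$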

$P(W)$ is the a priori probability that the user will wish to utter the word sequence $W = w_1, w_2, \ldots, w_n$ ($w_i$ denotes the individual words belonging to some vocabulary $V$). $P(A \mid W)$ is the probability that, if $W$ is uttered, the data $A = a_1, a_2, \ldots, a_k$ will be observed (Bahl et al., 1983).
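The decision rule of equation (1) can be illustrated on a toy problem in which the candidate word sequences are listed explicitly. This is a minimal sketch, assuming both terms are already available as log probabilities; the function name `map_decode`, the candidate sentences, and every score are hypothetical. In practice the set of possible $W$ is far too large to enumerate, which is why a search procedure is needed.

```python
import math
from typing import Dict, Sequence, Tuple

WordSeq = Tuple[str, ...]


def map_decode(candidates: Sequence[WordSeq],
               acoustic_logprob: Dict[WordSeq, float],
               lm_logprob: Dict[WordSeq, float]) -> WordSeq:
    """Pick the W that maximizes log P(A|W) + log P(W), i.e. the rule of equation (1).

    Summing log probabilities replaces the product and avoids numerical
    underflow; log is monotone, so the arg max is unchanged.
    """
    return max(candidates, key=lambda w: acoustic_logprob[w] + lm_logprob[w])


# Hypothetical toy scores for one fixed observation A (invented, not measured).
candidates = [("recognize", "speech"), ("wreck", "a", "nice", "beach")]
acoustic = {candidates[0]: math.log(0.02), candidates[1]: math.log(0.03)}  # log P(A|W)
prior = {candidates[0]: math.log(1e-2), candidates[1]: math.log(1e-4)}     # log P(W)

best = map_decode(candidates, acoustic, prior)
print(best)  # ('recognize', 'speech')
```

Here the acoustic term alone slightly favors the second sequence, but the a priori probability $P(W)$ outweighs it; this trade-off between the two factors is exactly what equation (1) expresses.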

In this simplified presentation the elements $a_i$ are assumed to be symbols from some finite alphabet $\mathcal{A}$ of size $|\mathcal{A}|$. Methods of transforming the air pressure process (speech) into the sequence $A$ are of fundamental interest to speech recognition but lie outside the scope of this paper. From my point of view, the transformation is given, and we live with its consequences.

It has been pointed out elsewhere that the probabilities $P(A \mid W)$ are computed on the basis of a hidden Markov model (HMM) of speech production that, in principle, operates as follows: to each word $v$ of the vocabulary $V$ there corresponds an HMM of speech production. A concrete example of its structure is given in Figure 1. The model of speech production of a word sequence $W$ is the concatenation of the models of the individual words $w_i$ making up $W$ (see Figure 2).
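The concatenation of word models, and the computation of $P(A \mid W)$ on the resulting sentence model with the standard forward algorithm, can be sketched in Python. Figure 1 is not reproduced here, so the word-model structure is an assumption: outputs are attached to transitions, every transition emits exactly one symbol (no null transitions), state 0 plays the role of the initial state and the highest-numbered state the final state. The names `WordHMM`, `concatenate`, and `likelihood`, the two-word vocabulary, and all parameter values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# (source state, destination state, transition probability, output distribution).
Arc = Tuple[int, int, float, Dict[str, float]]


@dataclass
class WordHMM:
    num_states: int  # state 0 is the initial state, num_states - 1 the final state
    arcs: List[Arc]  # assumption: every arc emits exactly one symbol (no null arcs)


def concatenate(models: Sequence[WordHMM]) -> WordHMM:
    """Chain word models, identifying each word's final state with the next word's initial state."""
    arcs: List[Arc] = []
    offset = 0
    for m in models:
        for src, dst, p, out in m.arcs:
            arcs.append((src + offset, dst + offset, p, out))
        offset += m.num_states - 1  # the next word starts where this one ends
    return WordHMM(num_states=offset + 1, arcs=arcs)


def likelihood(model: WordHMM, symbols: Sequence[str]) -> float:
    """Forward pass: P(A | W) summed over all paths from the initial to the final state."""
    alpha: Dict[int, float] = {0: 1.0}  # probability of being in each state after t symbols
    for a in symbols:
        nxt: Dict[int, float] = defaultdict(float)
        for src, dst, p, out in model.arcs:
            if src in alpha:
                nxt[dst] += alpha[src] * p * out.get(a, 0.0)
        alpha = nxt
    return alpha.get(model.num_states - 1, 0.0)


# Hypothetical two-word vocabulary over the alphabet {"x", "y"}; parameters are invented.
vocab = {
    "yes": WordHMM(3, [(0, 0, 0.4, {"x": 0.8, "y": 0.2}),
                       (0, 1, 0.6, {"x": 0.3, "y": 0.7}),
                       (1, 1, 0.5, {"y": 1.0}),
                       (1, 2, 0.5, {"x": 0.9, "y": 0.1})]),
    "no": WordHMM(2, [(0, 1, 1.0, {"x": 0.1, "y": 0.9})]),
}

sentence = concatenate([vocab[w] for w in ("yes", "no")])
print(likelihood(sentence, ["y", "x", "y"]))  # approximately 0.1701
```

Merging each word's final state with the next word's initial state means no extra linking transition is needed, so a path through the sentence model is simply a path through each word model in turn. Word models that can pass between states without emitting would require null transitions, which this sketch omits.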

We recall that the HMM of Figure 1 starts its operation in the initial state $s_I$ and ends it when the final state $s_F$ is reached. A transi-


FIGURE 1 Structure of a hidden Markov model for a word.
