plications use derivatives of some of the advanced techniques discussed here, they are not as ambitious as the purely experimental systems.

In keeping with the theme of advanced technology, J. Makhoul and R. Schwartz report on the "State of the Art in Continuous Speech Recognition." They give a phonetic and phonological description of speech and show how that structure is captured by a mathematical object called a hidden Markov model (HMM). This discussion includes a brief account of the history of the HMM and its application in speech recognition. The paper also discusses extracting features from the speech waveform, measuring the performance of the system, and the possibility of using the newer methods based on artificial neural networks.

Makhoul and Schwartz conclude that, as a result of the advances made in model accuracy, algorithms, and the power of computers, a "paradigm shift" has occurred in the sense that high-accuracy, real-time, speaker-independent, continuous speech recognition for medium-sized vocabularies can be implemented in software running on commercially available workstations. This assertion provoked an important and lively debate that I shall recount later in this paper.

The HMM methodology allows us to cast the speech recognition problem as that of searching for the best path through a weighted, directed graph. The paper by F. Jelinek addresses two central and specific technical issues arising from this representation. First, how does one estimate the parameters of the model (i.e., the weights of the graph) from data? This is usually referred to as the training problem. Second, given an optimal model, how does one use it in the recognition task? This second problem can be cast as a combinatorial search problem, for which Jelinek outlines several solutions, with emphasis on a dynamic programming approach known as the Viterbi algorithm.
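To make the best-path formulation concrete, the following is a minimal sketch of a Viterbi search over a small discrete HMM. It is an illustration only, not the procedure from Jelinek's paper: the two states (`s1`, `s2`), the three observation symbols (`a`, `b`, `c`), and all probabilities are invented toy values, and a real recognizer would work with acoustic feature vectors and far larger graphs. Scores are kept as log probabilities so that the path weights add rather than multiply.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_score, best_state_path) for an observation sequence."""
    if not observations:
        return 0.0, []
    # best[t][s]: log score of the best path that ends in state s at time t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Dynamic programming step: extend the best path into each
            # predecessor p by the edge weight log_trans[p][s].
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state to recover the whole path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log probabilities."""
    return {
        k: to_log(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


# Hypothetical toy model; every number below is an assumption for illustration.
STATES = ["s1", "s2"]
LOG_START = to_log({"s1": 0.6, "s2": 0.4})
LOG_TRANS = to_log({
    "s1": {"s1": 0.7, "s2": 0.3},
    "s2": {"s1": 0.4, "s2": 0.6},
})
LOG_EMIT = to_log({
    "s1": {"a": 0.5, "b": 0.4, "c": 0.1},
    "s2": {"a": 0.1, "b": 0.3, "c": 0.6},
})

score, path = viterbi(["a", "b", "c"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(path, math.exp(score))
```

The work grows linearly with the length of the observation sequence and quadratically with the number of states, which is why a dynamic programming search of this kind is practical where enumerating every path through the graph is not.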

There is no need to review these papers in more detail here since they appear in their entirety in this volume. Their scientific, technological, and commercial implications, however, do deserve discussion. These issues formed the core of the debate that ensued at the colloquium after these two excellent and comprehensive papers were presented.

I opened the discussion at the colloquium by asking the speakers to evaluate the state of the art of their most advanced laboratory prototype systems with respect to human performance in communication by spoken language. I raised this question because I think the ultimate goal of research in speech recognition is to provide a means whereby people can carry on spoken conversations with machines in the same effortless manner in which they speak to each other. As I

Copyright © National Academy of Sciences. All rights reserved.