case of continuous speech recognition, the following advances have converged to make the new technology possible:

  • Higher-accuracy, continuous speech recognition, based on hidden Markov modeling techniques,

  • Better recognition search strategies that reduce the time needed for high-accuracy recognition, and

  • Increased power of off-the-shelf workstations.

A paradigm shift is taking place in the way we view and use speech recognition. Rather than being mostly a laboratory endeavor, speech recognition is fast becoming a pervasive technology that will have a profound influence on the way humans communicate with machines and with each other. For a recent survey of the state of the art in continuous speech recognition, see Makhoul and Schwartz (1994).

Using HMMs, the word error rate for continuous speech recognition has been dropping steadily over the last decade, with a factor of two drop in error rate about every two years. Research systems are now able to tackle problems with large vocabularies. For example, in a test using the ARPA Wall Street Journal continuous speech recognition corpus, word error rates of 11 percent have been achieved for speaker-independent performance on read speech (Pallett et al., 1994). Although this performance level may not be sufficient for a practical system today, continuing improvements in performance are likely to make such systems of practical use in a few years.
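The halving trend described above can be written as a simple exponential decay. The following Python sketch is only an illustration of that arithmetic: it assumes the "factor of two every two years" trend continues unchanged, which the text does not claim, and the function names and the `halving_period` parameter are hypothetical. The 11 percent starting value is the Wall Street Journal figure cited above.

```python
import math


def projected_error_rate(initial_rate, years, halving_period=2.0):
    """Error rate after `years`, ASSUMING it keeps halving every `halving_period` years."""
    return initial_rate * 0.5 ** (years / halving_period)


def years_to_reach(initial_rate, target_rate, halving_period=2.0):
    """Years needed to fall from `initial_rate` to `target_rate` under the same assumption."""
    return halving_period * math.log2(initial_rate / target_rate)


if __name__ == "__main__":
    start = 11.0  # word error rate in percent, from the cited WSJ test
    for elapsed in (0, 2, 4, 6):
        print(f"after {elapsed} years: {projected_error_rate(start, elapsed):.2f} percent")
    print(f"years to reach 2.75 percent: {years_to_reach(start, 2.75):.1f}")
```

Under this assumed extrapolation, an 11 percent word error rate would fall to 5.5 percent after two years and to 2.75 percent after four. The sketch is a way to reason about the stated trend, not a forecast.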

Because large amounts of training speech data are available from large numbers of speakers (hundreds of hours of speech), speaker-independent performance has reached such levels that it rarely makes sense to train systems on the speech of specific speakers. However, there will always be outlier speakers for whom, for one reason or another, the system does not perform well. For such speakers, it is possible to collect a relatively small amount of speech (on the order of minutes) and then adapt the system's models to the outlier speaker, which improves performance significantly for that speaker.
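The text does not say which adaptation method is used. As one illustration of how a few minutes of speech can shift a speaker-independent model, the Python sketch below applies maximum a posteriori (MAP) style interpolation to the mean of a single Gaussian. This is an assumed technique chosen for simplicity, not necessarily the authors' method; the function name, the prior weight `tau`, and the toy feature vectors are all hypothetical.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """Interpolate a speaker-independent mean with a speaker's own data.

    prior_mean: speaker-independent Gaussian mean (list of floats).
    frames: feature vectors from the new speaker assigned to this Gaussian.
    tau: hypothetical prior weight; larger values trust the prior more.
    """
    n = len(frames)
    dim = len(prior_mean)
    sums = [0.0] * dim
    for frame in frames:
        for d in range(dim):
            sums[d] += frame[d]
    # With no data the result is the prior; with much data it approaches the sample mean.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


if __name__ == "__main__":
    speaker_independent_mean = [0.0, 0.0]
    speaker_frames = [[2.0, 4.0]] * 10  # toy adaptation data
    print(map_adapt_mean(speaker_independent_mean, speaker_frames))
```

The design point is that a small amount of adaptation data moves the model only part of the way toward the new speaker, so the large speaker-independent training set still dominates until more speaker-specific data is collected.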

For information retrieval applications, it is important to understand the user's query and give an appropriate response. Speech understanding systems have reached the stage at which it is possible to develop a practical system for specialized applications. The understanding component must be tuned to the specific application; the work requires significant amounts of data collection from potential users and months of labor-intensive work to develop the language understanding component for that application. In the ARPA Airline Travel Information Service (ATIS) domain, users access flight information using verbal queries. Speech understanding systems in the ATIS domain have achieved understanding


