Rapid advances in very-large-scale integrated (VLSI) circuit capabilities are creating a revolution in the world of computers and communications. These advances are driving an increasing demand for sophisticated products and services that are easy to use. Automatic speech recognition and synthesis are considered to be the key technologies that will provide the easy-to-use interface to machines.
The past two decades of research have produced a stream of increasingly sophisticated solutions in speech recognition and synthesis (Rabiner and Juang, 1993). Despite this progress, the perception remains that the current technology is not flexible enough to allow easy voice communication with machines. This chapter reviews the present status of this important technology, including its limitations, and discusses the range of applications that can be supported by our present knowledge. As we look into the future and ask which speech recognition and synthesis capabilities will be available about 10 years from now, it is also important to discuss the technical challenges we face in realizing our vision of the future and the directions in which new research should proceed to meet these challenges. We will examine these issues in this chapter and take a critical look at the shortcomings of current speech recognition and synthesis algorithms.
Much of the technical knowledge that supports the current speech-processing technology was created in a period when our ability to implement technical solutions on real-time hardware was limited. These limitations are quickly disappearing, and we look to a future at the end of this decade when a single VLSI chip will have a billion transistors to support much higher processing speeds and more ample storage than is now available.
The speech recognition and synthesis algorithms available at present work only in limited scenarios. With the availability of fast processors and large memory, a tremendous opportunity exists to push speech recognition technology to a level where it can support a much wider range of applications. Speech databases with utterances recorded from many speakers in a variety of environments have been important in achieving the progress that has been realized so far. On the negative side, however, these databases have encouraged speech researchers to rely on trial-and-error methods, leading to narrow solutions that work for specific applications but do not generalize to other situations. These methods, although fruitful in the early development of the technology, are now a hindrance as we become much more ambitious in seeking solutions to bigger problems. The time has come to set the next stage for the development of speech technology, and it is important to realize that a solid base of scientific understanding is