The goal of integrating speech recognition with natural language understanding is to produce spoken-language-understanding systems, that is, systems that take spoken language as their input and respond in an appropriate way depending on the meaning of the input. Since speech recognition (Makhoul and Schwartz, in this volume) aims to transform speech into text, and natural-language-understanding systems (Bates, in this volume) aim to understand text, it might seem that spoken-language-understanding systems could be created by the simple serial connection of a speech recognizer and a natural-language-understanding system. This naive approach is less than ideal for a number of reasons, the most important being the following:
• Spontaneous spoken language differs in a number of ways from standard written language, so that even if a speech recognizer were able to deliver a perfect transcription to a natural-language-understanding system, performance would still suffer if the natural language system were not adapted to the characteristics of spoken language.
• Current speech recognition systems are far from perfect transcribers of spoken language, which raises questions about how to make natural-language-understanding systems robust to recognition errors and whether higher overall performance can be achieved by a tighter integration of speech recognition and natural language understanding.
• Spoken language contains information that is not necessarily represented in written language, such as the distinctions between words that are pronounced differently but spelled the same, or syntactic and semantic information that is encoded prosodically in speech. In principle it should be possible to extract this information to solve certain understanding problems more easily using spoken input than using a simple textual transcription of that input.
This paper looks at how these issues are being addressed in current research in the ARPA Spoken Language Program.
The participants in the ARPA Spoken Language Program have adopted the interpretation of requests for air travel information as a common task to measure progress in research on spoken language