This paper provides an overview of the natural language understanding session at the Colloquium on Human-Machine Communication by Voice held by the National Academy of Sciences (NAS). The aim of the paper is to review the role that language understanding plays in spoken language systems and to summarize the discussion that followed the two presentations by Bates and Moore. A number of questions were raised during the discussion, including whether a single system could provide both understanding and constraint, what the future role of discourse should be, how to evaluate performance on interactive systems, and whether we are moving in the right direction toward realizing the goal of interactive human-machine communication.1
Much of the research discussed at the natural language understanding session was done in connection with the Advanced Research Projects Agency's (ARPA) Spoken Language Systems program. This program, which started in 1989, brought together speech and language technologies to provide speech interfaces for interactive problem solving. The goal was to permit the user to speak to the system, which would respond appropriately, providing (intelligent) assistance. This kind of interaction requires the system to have both input and output capabilities: recognition and synthesis for speech, and understanding and generation for language. In addition, the system must be able to understand user input in context and carry on a coherent conversation. We still know relatively little about this complex process of interaction, although we have made significant progress in one aspect, namely spoken language understanding.2
In the ARPA Spoken Language Systems program, multiple contractors are encouraged to develop independent approaches to the core problem of spoken language interaction. To focus the research,
1 I am indebted to the many contributors during the colloquium's discussion who raised interesting questions or provided important material. For the sake of the flow of this paper, I have folded these questions or comments into appropriate sections, rather than summarizing the discussion separately.
2 Spoken language understanding focuses on understanding user input, as opposed to communicating with the user, which is a bidirectional process that requires synthesis and generation technologies.