The Role of Voice in Human-Machine Communication
Pages 34-75

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 34...
... However, before this technology will be broadly useful, a substantial knowledge base about human spoken language and performance during computer-based interaction needs to be gathered and applied. This paper reviews application areas in which spoken interaction may play a significant role, assesses potential benefits of spoken interaction with machines, and attempts to compare voice with alternative and complementary modalities of human-computer interaction.
From page 35...
... Although far from human-level conversation, this initial capability is generating considerable interest and optimism for the future of human-computer interaction using voice. This paper aims to identify applications for which spoken interaction may be advantageous, to situate voice with respect to alternative and complementary modalities of human-computer interaction, and to discuss obstacles that exist to the successful deployment of spoken language systems because of the nature of spoken language interaction.
From page 36...
... To develop initial algorithms, researchers typically first use read speech as data, in which speakers read random sentences drawn from some corpus, such as the Wall Street Journal. Subsequent to this stage of algorithm development, speech recognition research attempts to handle spontaneous speech, in which speakers construct new utterances in the chosen domain of discourse.
From page 37...
... Word recognition accuracy has been found, in general, to be inversely proportional to perplexity. Most commercial speech recognition systems claim word recognition accuracy above 95 percent at a perplexity on the order of 10.
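The excerpt uses perplexity without defining it. As a rough illustration, assuming the common information-theoretic definition (2 raised to the average negative log2 probability the language model assigns to each word), a minimal sketch follows. The function name and the example probabilities are hypothetical and do not come from the chapter.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence, given the probability the language
    model assigned to each word in context (assumed definition:
    2 ** average negative log2 probability)."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    if any(p <= 0.0 or p > 1.0 for p in word_probs):
        raise ValueError("probabilities must be in (0, 1]")
    avg_neg_log2 = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** avg_neg_log2


# Hypothetical case: every word is one of 10 equally likely choices,
# so the recognizer faces an average branching factor of about 10.
uniform_ten = perplexity([0.1] * 5)
```

Under this definition, a perplexity on the order of 10 means that the recognizer is, on average, choosing among roughly ten equally likely words at each point in the utterance.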
From page 38...
... Here the combination of speech recognition and language understanding will be termed speech understanding, and the systems that use such input will be termed spoken language systems.
From page 39...
... Then, we discuss spoken language interaction, comparing it both to keyboard-based interaction and to the currently dominant graphical user-interface paradigm. After identifying circumstances that favor spoken language interaction, gaps in the scientific knowledge base of spoken communication are identified that present obstacles to the development of spoken language-based systems.
From page 40...
... In such circumstances, by using voice to communicate with the machine, people are free to pay attention to their task, rather than breaking away to use a keyboard. Field studies suggest that, for example, F-16 pilots who can attain a high speech recognition rate can perform missions, such as formation flying or low-level navigation, faster and more accurately when using spoken control over various avionics subsystems, as compared with keyboard and multifunction-button data entry (Howard, 1987; Rosenhoover et al., 1987; Williamson, 1987).
From page 41...
... A major factor determining success for speech input applications is speech recognition accuracy. For example, the best task performance reported during F-16 test flights was obtained once pilots attained isolated word recognition rates greater than 95 percent.
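The excerpt does not say how the recognition rates were scored. One widely used convention, assumed here, counts substitutions, deletions, and insertions via a word-level edit distance between the reference transcript and the recognizer output, and reports accuracy as one minus the error rate. The function name and sample commands below are hypothetical.

```python
def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (substitutions + deletions + insertions) / N,
    where N is the number of reference words. Errors are counted with a
    word-level Levenshtein distance (an assumed scoring convention)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 1.0 - prev[-1] / len(ref)


# Hypothetical spoken command with one misrecognized word out of four.
score = word_accuracy("select radio channel two", "select radio channel too")
```

With this scoring, a 95 percent rate corresponds to about one word error in every twenty reference words. Note that many insertions can push the value below zero.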
From page 42...
... in which two callers speaking different languages can engage in a dialogue mediated by a spoken language translation system (Kitano, 1991; Yato et al., 1992). Such systems are currently designed to incorporate speech recognition, machine translation, and speech synthesis subsystems and to interpret one sentence at a time.
From page 43...
... Text-to-speech synthesis can assist users with speech and motor impediments; can assist blind users with computer interaction; and, when coupled with optical character recognition technology, can read printed materials to blind users. Finally, given sufficiently capable speech recognition systems, spoken input may become a prescribed therapy for repetitive stress injuries, such as carpal tunnel syndrome, which are estimated to afflict approximately 1.5 percent of office workers in occupations that typically involve the use of keyboards (Tanaka et al., 1993)
From page 44...
... Substantial research will also be needed to develop and field test new educational software that can take advantage of speech recognition and synthesis for teaching reading. This is perhaps one of the most important potential applications of speech technology because the societal implications of raising literacy levels on a broad scale are enormous.
From page 45...
... Summary There are numerous existing applications of voice-based human-computer interaction, and new opportunities are developing rapidly. In many applications for which the user's input can be constrained sufficiently to allow for high recognition accuracy, voice input has been found to lead to faster task performance with fewer errors than keyboard entry.
From page 46...
... COMPARISON OF SPOKEN LANGUAGE WITH OTHER COMMUNICATION MODALITIES A user who will be speaking to a machine may expect to be able to speak in a natural language, that is, to use ordinary linguistic constructs such as noun and verb phrases. Conversely, if natural language interaction is chosen as a modality of human-computer communication, users may prefer to speak rather than type.
From page 47...
... and Japan (Yato et al., 1992). Much of the language processing technology used for spoken language understanding has been based on techniques for keyboard-based natural language systems. However, spoken input presents qualitatively different problems for language understanding that have no analog in keyboard interaction.
From page 48...
... The speed of the "system" is thus governed by the assistant's knowledge and reaction time, as well as the task at hand, but not by speech recognition, language understanding, and speech synthesis. However, because people speak differently to a computer than they do to a person (Fraser and Gilbert, 1991)
From page 49...
... Similar findings of a fine-grained approach during spoken interaction versus a more syntactically integrated approach for keyboard interaction have been found in a study
From page 50...
... That is, unlike keyboard interaction, spoken utterances did not literally convey the speaker's intention that the listener perform an action (Cohen, 1984). Future research needs to address the extent to which such results generalize to spoken human-computer interaction for comparable tasks.
From page 51...
... Also, more studies comparing the structure and content of spoken human-computer language with typed human-computer language need to be conducted in order to understand how to adapt technology developed for keyboard interaction to spoken language systems. Common to many successful applications of voice-based technology is the lack of an adequate alternative to voice, given the task and environment of computer use.
From page 52...
... These graphical user interfaces (GUIs), popularized by the Apple Macintosh and by Microsoft Windows, use techniques pioneered at SRI International and at Xerox's Palo Alto Research Center in the late 1960s and 1970s (Engelbart, 1973; Kay and Goldberg, 1977).
From page 53...
... It is no exaggeration to say that graphical user interfaces supporting direct manipulation interaction have been so successful that no serious computer company would attempt to sell a machine without one. Weaknesses Direct manipulation interfaces do not suffice for all needs.
From page 54...
... So far, however, there is little research comparing graphical user interfaces with speech. Early laboratory results of a direct-manipulation VLSI design system augmented with speaker-dependent speech recognition indicate that users were as fast at speaking single-word commands as they were at invoking the same commands with mouse-button clicks or by typing a single-letter command abbreviation (Martin, 1989).
From page 55...
... This study suggests that, for simple risk-free tasks, user preference may be based on time to input rather than overall task completion times or overall task accuracy. Natural Language Interaction Strengths Natural language is the paradigmatic case of an expressive mode of communication.
From page 56...
... , lead to frustration and disillusionment. One way to overcome these problems was suggested in a menu-based language processing system in which users composed queries in a quasi-natural language by selecting phrases from a menu (Tennant et al., 1983).
From page 57...
... Summary: Circumstances Favoring Spoken Language Interaction with Machines Theoretically, direct manipulation should be beneficial when the objects to be manipulated are on the screen, their identity is known, and there are not too many objects from which to select. In addition, graphical user interfaces limit users' options, preventing them from
From page 58...
... Because of the recency of usable spoken language systems, there are very few studies comparing spoken language interaction with direct manipulation for accomplishing real tasks. So far, we have contrasted spoken interaction with other modalities.
From page 59...
... , and robust processing techniques have been developed that enable language analysis routines to recover the meaning of an utterance despite recognition errors (Dowding et al., 1993; Huang et al., 1993; Jackson et al., 1991; Stallard and Bobrow, 1992). Assessment of different types of human-human and human-computer spoken language has revealed that people's rate of spontaneous disfluencies and self-repairs is substantially lower when they speak to a system than when they speak to another person (Oviatt, 1993).
From page 60...
... However, it is not known if the modeling of syntactic structures occurs in spoken human-computer interaction. If users of spoken language systems do learn to adopt the grammatical structures they observe, then new forms of user training may be possible by having system designers adhere to the principle that any messages supplied to a user must be analyzable by the system's parser.
From page 61...
... One way to mitigate the inherent difficulty of referent determination when using a multimodal system may be to couple spoken pronouns and definite noun phrases with pointing actions (Cohen, 1991; Cohen et al., 1989). Present spoken language systems have supported dialogues in which the user asks multiple questions, some of which request fur
From page 62...
... , and that telephone communications are especially sensitive to delays. The need for timely confirmations will challenge most applications of spoken language processing, particularly those involving telephony.
From page 63...
... Some analog of planning is thus also likely to be required. Dialogue research is currently the weakest link in the research program for developing spoken language systems.
From page 64...
... Users frequently respond to speech recognition errors by hyperarticulating. But since recognizers are typically not trained on hyperarticulated speech, this repair strategy leads to a lower likelihood of successful recognition for that content (Shriberg et al., 1992).
From page 65...
... SCIENTIFIC RESEARCH ON COMMUNICATION MODALITIES The present research and development climate for speech-based technology is more active than it was at the time of the 1984 National Research Council report on speech recognition in severe environments (National Research Council, 1984). Significant amounts of research and development funding are now being devoted to building speech-understanding systems, and the first speaker-independent, continuous, real-time spoken language systems have been developed.
From page 66...
... · longitudinal studies of users' linguistic and problem-solving behavior that would explore how users adapt to a given system;
· studies of users' understanding of system limitations, and of their performance in observing the system's bounds;
· studies of different techniques for revealing a system's coverage, and for channeling user input;
· studies comparing the effectiveness of spoken language technology with alternatives, such as the use of keyboard-based natural language systems, query languages, or existing direct manipulation interfaces; and
· studies analyzing users' language, task performance, and preferences to use different modalities, individually and within an integrated multimodal interface.
From page 67...
... Static and dynamic predictions: A method to improve speech understanding in cooperative dialogues. In Proceedings of the International Conference on Spoken Language Processing, Banff, Alberta, Canada, Oct.
From page 68...
... In Proceedings of the 1990 International Conference on Spoken Language Processing, pp. 1185-1188, The Acoustical Society of Japan, Kobe, Japan, 1990.
From page 69...
... Zue. NSF Workshop on Spoken Language Understanding.
From page 70...
... Continuous speech recognition by statistical methods. Proceedings of the IEEE, 64:532-536, April 1976.
From page 71...
... Mariani, J Spoken language processing in the framework of human-machine commu
From page 72...
... National Research Council. Automatic Speech Recognition in Severe Environments.
From page 73...
... Benchmark tests for the DARPA spoken language program. In Proceedings of the ARPA Workshop on Human Language Technology, San Mateo, Calif., Morgan Kaufmann Publishers, Inc., 1993.
From page 74...
... Human-machine problem-solving using spoken language systems (SLS): Factors affecting performance and user satisfaction.
From page 75...
... Iida. Dialogue interpretation model and its application to next utterance prediction for spoken language processing.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.