User Interfaces for Voice Applications
Pages 422-442


From page 422...
... This paper discusses how these factors influence user interface design and then describes components of user interfaces that can be used to facilitate efficient and effective human-machine voice-based interactions. Voice interfaces provide an additional input and output modality for human-computer interactions, either as a component of a multimodal, multimedia system or when other input and output modali
From page 423...
... Where technological limitations prohibit the use of natural conversational speech, the primary role of the interface is to induce the user to modify his/her behavior to fit the requirements of the technology. As voice technologies become capable of dealing with more natural input, the user interface will still be critical for facilitating the smooth flow of information between the user and the system by providing appropriate conversational cues and feedback.
From page 424...
... This paper discusses some of the aspects of task requirements, user expectations, and technological capabilities that influence the design of a voice interface and then identifies several components of user interfaces that are particularly critical in successful voice applications. Examples from several applications are provided to demonstrate how these components are used to produce effective voice interfaces.
From page 425...
... For example, an automated bank-by-phone application using input from the telephone keypad may use an audio menu to direct the user to "For checking (account balance), press 1; for savings, press 2." Early implementations of voice interfaces for such systems directly translated this request into "For checking, say 1; for savings, say 2," rather than eliciting the more natural command words (e.g., "Do you want your account balance for checking or savings?
From page 426...
... In cases where voice is the only modality available for output (as in most current telephone applications), the interface designer must take into account the temporal nature of speech output and its concomitant demands on user memory.
From page 427...
... Technological Capabilities and Limitations
Voice Input
Current speech applications demonstrate a wide range of voice input capabilities. The requirements of the application typically dictate whether a system uses (a)
From page 428...
... As the technology allows more natural conversational input, the primary role of the user interface could shift from directing the user about how to speak toward providing more graceful resolution of ambiguities and errors.
Voice Output
Two kinds of voice output technologies are available for voice interfaces: prerecorded (typically digitized)
From page 429...
... System Capabilities
Several system capabilities not directly related to speech technology per se have been shown to affect the success of voice interfaces. Some systems are incapable of "listening" to speech input while they simultaneously produce speech.
From page 430...
... Other system constraints that can influence the design of the user interface include whether a human agent is available in the event of recognition failure and whether relevant information and databases are available throughout the interaction or only during fixed portions.
User Expectations and Expertise
Conversational Speech Behaviors
Because users have so much experience with human-human speech interactions, it is not surprising that new users may expect a human-computer voice interface to allow the same conversational speech style that is used between humans.
From page 431...
... The inability to reliably coax humans to speak in isolation, without any nonvocabulary words or sounds, has been a driving force for the development of word spotting and continuous recognition algorithms. Word spotting allows accurate recognition of a small vocabulary in the presence of nonvocabulary utterances, even when they are modified by coarticulation effects introduced by continuous speech.
From page 432...
... For example, falling intonation can imply acknowledgment, while rising intonation of the confirmatory phrase signals uncertainty and potential error (Karis and Dobroth, 1991). Currently, such intonational cues are not recognized reliably by automated systems, but advances in understanding the contribution of intonation patterns to dialogue flow will provide a powerful enhancement to user interfaces, not only for confirmation and error recovery but also for detecting user hesitations and repairs (i.e., self-corrections a user makes within an utterance)
From page 433...
... In response to the user's repetition, the system spoke the number "four," presumably as a confirmation request for an erroneously recognized digit. When it became apparent to the system that the continuous input was not likely to be a digit, the system asked for another repetition.
From page 434...
... Even though the system may not be certain about the source of the problem, it might be possible to make a reasonable guess based on the length of the input utterance. If the length of the utterance that is detected is significantly longer than would be probable if the input utterance was a single digit, the error message might instruct the user to speak the digits one at a time, or even explicitly say "Please say the next digit of the authorization number now." Third, the confirmation messages themselves were confusing, particularly when the recognized digit was not a number in the user's input utterance.
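The length-based guess described above can be sketched in a few lines. This is only an illustration, not the logic of the system discussed in the chapter: the function name, the duration threshold, the margin factor, and the generic fallback prompt are all hypothetical values chosen for the example. Only the directive prompt text is taken from the passage.

```python
# Hypothetical: longest duration (in seconds) expected for one isolated digit.
MAX_SINGLE_DIGIT_SECONDS = 0.8
# Hypothetical: how much longer than that counts as "significantly longer".
MARGIN_FACTOR = 1.5

DIRECTIVE_PROMPT = "Please say the next digit of the authorization number now."
# Hypothetical fallback wording; the source does not give one.
GENERIC_PROMPT = "Sorry, that was not understood. Please repeat the digit."


def choose_error_prompt(utterance_seconds: float) -> str:
    """Pick an error message after a failed single-digit recognition.

    If the detected utterance is much longer than a single digit would
    plausibly be, assume the user spoke several digits at once and give
    an explicit instruction; otherwise ask for a simple repetition.
    """
    if utterance_seconds > MAX_SINGLE_DIGIT_SECONDS * MARGIN_FACTOR:
        return DIRECTIVE_PROMPT
    return GENERIC_PROMPT
```

In practice the threshold would have to be tuned on recordings from representative users, since speaking rates vary widely.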
From page 435...
... The importance of these elements is discussed in more detail in the following sections, using examples that demonstrate their use in more successful voice interfaces.
Dialogue Flow
Underlying every user interface is a plan for obtaining the information elements necessary to complete the task.
From page 436...
... useful for resolving ambiguities in systems where more natural input is permitted. Directive prompts can significantly increase user compliance with system restrictions.
From page 437...
... As mentioned above, if the cost of errors is minimal, and if the user is provided sufficient information to ascertain that the system's response was appropriate, it may be reasonable to forgo many of the confirmation interchanges for some applications. For example, in a research prototype stock quotation service developed at Bell Northern Research (Lennig et al., 1992)
From page 438...
... The goal of error recovery procedures is to prevent the complete breakdown of the system into an unstable or repetitive state that precludes making progress toward task completion. Obviously, error recovery requires the cooperation of the user, and both the system and the user must be able to
From page 439...
... By requiring the user to indicate the correct alternative by providing its item number on the list or by spelling it, the error recovery module uses a much more restricted vocabulary than the general dictation task to increase the likelihood of correct recognition of the user's corrective response. When provided with error messages, users do modify their subsequent behavior.
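The restricted-vocabulary correction step can be illustrated with a small sketch. It assumes the recognizer has already turned the user's corrective response into text, either an item number or a sequence of spelled letters; the function name and the input format are hypothetical and do not come from the system described in the chapter.

```python
from typing import Optional, Sequence


def resolve_correction(alternatives: Sequence[str], response: str) -> Optional[str]:
    """Map a corrective response onto one of the listed alternatives.

    The response is expected to be either an item number ("2") or a
    spelling with letters separated by spaces ("r i t e"). Anything
    else returns None so the dialogue can re-prompt.
    """
    text = response.strip().lower()

    # Case 1: the user gave the 1-based position of the item on the list.
    if text.isdecimal():
        index = int(text) - 1
        if 0 <= index < len(alternatives):
            return alternatives[index]
        return None

    # Case 2: the user spelled the word; join the letters and compare.
    spelled = "".join(text.split())
    for candidate in alternatives:
        if candidate.lower() == spelled:
            return candidate
    return None
```

For example, with the alternatives `["write", "right", "rite"]`, the response `"2"` selects `"right"` and the response `"r i t e"` selects `"rite"`. Because only digits and letter names have to be recognized at this point, the recognizer's task is far smaller than open dictation.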
From page 440...
... EVALUATING TECHNOLOGY READINESS
This paper has stressed the interdependencies among application requirements, user needs, and system capabilities in designing a successful voice interface. Often, voice interface developers are asked whether speech technology is "good enough" or "ready" to accomplish a particular task.
From page 441...
... Iterative testing with representative users is necessary in developing a robust voice interface. With improved understanding of how prosody and message content cue conversational dynamics for confirmation, error recovery, hesitations, and repair, and with the development of reliable techniques for automatically recognizing and providing these cues, voice interfaces can begin to realize the promise of a "more natural" interface for human-computer interactions.
From page 442...
... and the Artificial Intelligence Speech Technology Group. "Collection and analysis of data from real users: Implications for speech recognition/understanding systems." Proceedings of the DARPA Speech and Natural Language Workshop, P


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.