Currently Skimming:

Deployment of Human-Machine Dialogue Systems
Pages 373-389

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 373...
... The degree of difficulty depends not only on the demands placed on the speech recognition and speech synthesis technologies but also on the expectations of the user of the system. Experience has shown that deployment of effective speech communication systems requires an iterative process.
From page 374...
... , much of the discussion focused on practical difficulties in building and deploying systems for carrying on voice dialogues between humans and machines. Deployment of systems for human-to-machine communication by voice requires solutions to many types of problems
From page 375...
... DEGREE OF DIFFICULTY OF A VOICE DIALOGUE APPLICATION
Whether a voice dialogue system is successful depends on the difficulty of each of the four steps in Figure 1 for the particular application, as well as the technical capabilities of the computer system. There are several factors that can make each of these four steps difficult.
From page 376...
... Air Travel Information System; and "natural spoken language" refers to conversational speech on any and every topic. Clearly, a task that is difficult in both the dimensions of vocabulary size and speaking style would be harder (and would have lower accuracy)
From page 377...
... The problems for voice dialogue systems can be separated into those of speech recognition, language understanding, and speech synthesis, as in Figure 1. (For the database access stage, a conventional computer is adequate for most voice dialogue tasks.
From page 378...
... As Richard Schwartz remarked at the NAS colloquium, "You are a first time user only once." · Vocabulary confusability.
From page 379...
... Noise can include background speech and other acoustic noise as well as noise in the transmission channel. Variability in transmission bandwidth and in microphone characteristics also affects speech recognition accuracy.
From page 380...
... More powerful statistical techniques now being developed for text understanding (Marcus, in this volume) hold the promise of significantly improving the language understanding capabilities of advanced voice dialogue systems.
From page 381...
... Dimensions of the Speech Synthesis Task
There are two families of computer speech technologies today: digitized human speech and text-to-speech synthesis. Text-to-speech synthesis is flexible enough to pronounce any sentence but lacks the naturalness of recorded human speech.
From page 382...
... In subjective tests of speech quality, text-to-speech synthesizers are judged significantly worse than digitized human speech (van Santen, 1993). This remains true even when the text-to-speech synthesizer was provided with the pitch contour and the phoneme durations used by the original human speaker.
From page 383...
... In order of decreasing cost and computation power, recognizers have been programmed on parallel processors, RISC chips, floating point digital signal processors, general-purpose microprocessors, and integer digital signal processors. For speech synthesis, waveform coding systems require little processing power (a small fraction of the processing power of a digital signal processor chip)
From page 386...
... Human-machine dia ... FIGURE 3 Deployment process for a voice dialogue system. (Flowchart of an iterative development cycle: set objectives for system performance, develop the dialogue, create speech prompts, build speech recognition models, integrate the hardware/software system, run a trial, and check whether system performance is satisfactory.)
From page 387...
... The preferred solution, word spotting in speech recognition, took several years to develop and deploy. Second, it is difficult to gather speech for training speech recognizers unless you have a working system that will capture speech in exactly the environment encountered in the real service.
From page 388...
... Better speech enhancement algorithms and models of background noise make speech recognizers more accurate in noisy or changing environments, such as automobiles. · Speaker adaptation.
From page 389...
... Ramesh, P., et al., "Speaker independent recognition of spontaneously spoken connected digits," Speech Communication, Vol.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.