
Speech Technology in 2001: New Research Directions
Pages 467-481



From page 467...
... The focus of speech research is now on producing systems that are accurate and robust but that do not impose unnecessary constraints on the user. This chapter takes a critical look at the shortcomings of the current speech recognition and synthesis algorithms, discusses the technical challenges facing research, and examines the new directions that research in speech recognition and synthesis must take in order to form the basis of new solutions suitable for supporting a wide range of applications.
From page 468...
... The speech recognition and synthesis algorithms available at present work in limited scenarios. With the availability of fast processors and a large memory, tremendous opportunity exists to push speech recognition technology to a level where it can support a much wider range of applications.
From page 469...
... with isolated words or words spoken in grammatical sentences, and the performance is continuing to improve. Figure 1 shows the word error rate for various test materials and the steady decrease in the error rate achieved from 1980 to 1992.
From page 470...
... [Figure 1 plot residue removed; x axis: YEAR (1980–1995); curves labeled CONNECTED DIGITS, ISOLATED WORDS, ALPHABETS, and CONTINUOUS SPEECH (1,000 and 5,000 words).] FIGURE 1 Reduction in the word error rate for different automatic speech recognition tasks between 1980 and 1992.
From page 471...
... These systems can synthesize only a few voices reading grammatical sentences but cannot capture the nuances of natural speech.

CHALLENGING ISSUES IN SPEECH RESEARCH

For speech technology to be used widely, the major roadblocks facing the current technology must be removed.
From page 472...
... A number of methods have been proposed to deal with the problem of robustness. The proposed methods include signal enhancement, noise compensation, spectral equalization, robust distortion measures, and novel speech representations.
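One of the signal-enhancement methods in this family is classical spectral subtraction, sketched below. The chapter names the method families but not this specific algorithm, so the parameter values and spectral-floor choice here are illustrative assumptions.

```python
# Sketch of spectral subtraction, a classic signal-enhancement method.
# Parameter values and the spectral floor are illustrative assumptions.
import numpy as np

def spectral_subtraction(noisy, noise_est, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from a noisy one.

    noisy, noise_est: magnitude spectra (1-D arrays of equal length).
    floor: fraction of the noisy magnitude kept as a spectral floor,
           so no bin goes negative (this reduces, but does not remove,
           the well-known "musical noise" artifact).
    """
    cleaned = noisy - noise_est
    return np.maximum(cleaned, floor * noisy)

# Toy usage: a flat noise estimate subtracted from a 4-bin spectrum.
noisy = np.array([5.0, 2.0, 0.5, 3.0])
noise = np.array([1.0, 1.0, 1.0, 1.0])
print(spectral_subtraction(noisy, noise))
```

In practice the noise estimate is updated from non-speech frames, and the enhanced magnitude is recombined with the noisy phase before resynthesis.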
From page 473...
... Current speech recognition algorithms are designed to maximize performance for the speech data in the training set, and this does not automatically translate to robust performance on speech coming from different user environments. Figure 4 shows the principal functions of an automatic speech recognition system.
From page 474...
... A speech recognizer can be regarded as a method for compressing speech from the high rate needed to represent individual samples of the waveform to a low phonemic rate needed to represent speech sounds. Let us look at the information rate (bit rate)
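The compression view above can be made concrete with rough arithmetic. The specific numbers below (telephone-band sampling, a conversational phoneme rate, an inventory of about 40 English phonemes) are common textbook figures assumed for illustration, not values taken from the chapter.

```python
# Rough bit-rate arithmetic for the "recognizer as compressor" view.
# All numeric values are common textbook figures, assumed for illustration.
import math

sample_rate = 8000            # samples/s, telephone-band speech
bits_per_sample = 16
waveform_rate = sample_rate * bits_per_sample        # bits/s at the waveform

phonemes_per_second = 12      # typical conversational speaking rate
bits_per_phoneme = math.ceil(math.log2(40))          # ~40 phonemes -> 6 bits
phonemic_rate = phonemes_per_second * bits_per_phoneme

print(waveform_rate, phonemic_rate, waveform_rate / phonemic_rate)
```

The gap is three orders of magnitude or more, which is one way to see why recognition is hard: almost all of the waveform's bits encode speaker, channel, and style variation rather than the phonemic message.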
From page 475...
... To achieve robust performance, it is important to develop methods that can efficiently represent speech segments extending over a time interval of several hundred milliseconds. An example of a method for representing large speech segments is described in the next section.
From page 476...
... Although auditory models have not yet made a significant impact on automatic speech recognition technology, they exhibit considerable promise. What we need is a better understanding of the principles of signal processing in the auditory periphery that could lead to more robust performance in automatic systems.
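The chapter does not specify a particular auditory model. As a hedged illustration of the general idea, the mel frequency warp below is a simple auditory-inspired scale: like the cochlea, it allocates finer resolution to low frequencies than to high ones.

```python
# The mel frequency warp, a simple auditory-inspired scale: roughly
# linear below ~1 kHz and logarithmic above, mimicking cochlear
# frequency resolution. This is an illustration, not the chapter's model.
import math

def hz_to_mel(f_hz):
    """Map frequency in Hz to the mel scale (O'Shaughnessy formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

for f in (100, 1000, 4000):
    print(f, round(hz_to_mel(f), 1))
```

Representations built on such warps (e.g. mel filterbank energies) are one modest way auditory knowledge has already filtered into recognition front ends.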
From page 477...
... In the temporal decomposition model the continuous variations of acoustic parameters are represented as the output of a linear time-varying filter excited by a sequence of vector-valued delta functions located at nonuniformly spaced time intervals (Atal, 1989). This is illustrated in Figure 6, where the linear filter with its impulse response specified by h(t, τ)
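The effect of that excitation is that each parameter track becomes a weighted sum of smooth, overlapping event functions centred at nonuniform instants. The sketch below is a hypothetical rendering of this idea with Gaussian event functions; the function shapes, centres, and widths are assumptions for illustration, not the forms used in the cited work.

```python
# Hypothetical sketch of temporal decomposition: a vector parameter track
# reconstructed as a sum of event target vectors a_k, each spread in time
# by a smooth event function phi_k centred at a nonuniform instant.
# Gaussian phi_k and the centres/widths below are illustrative assumptions.
import numpy as np

def reconstruct(targets, centres, widths, t):
    """targets: (K, D) event target vectors; centres, widths: (K,) event
    locations and spreads; t: (T,) time axis. Returns a (T, D) track."""
    track = np.zeros((len(t), targets.shape[1]))
    for a_k, c, w in zip(targets, centres, widths):
        phi = np.exp(-0.5 * ((t - c) / w) ** 2)  # smooth event function
        track += np.outer(phi, a_k)              # spread target over time
    return track

t = np.linspace(0.0, 1.0, 101)
targets = np.array([[1.0, 0.0],   # event 1 target (2-D parameters)
                    [0.0, 2.0]])  # event 2 target
track = reconstruct(targets,
                    centres=np.array([0.3, 0.7]),
                    widths=np.array([0.08, 0.08]), t=t)
print(track.shape)
```

Because each event spans hundreds of milliseconds, a handful of targets and event functions can describe a segment far more compactly than frame-by-frame parameters.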
From page 478...
... LPC line spectral parameters, (b) filter impulse responses for the different speech events, and (c)
From page 479...
... However, there are important differences in the way the two technologies have evolved. Speech synthesis algorithms generate continuous speech by concatenating segments of stored speech patterns, which are selected to minimize discontinuities in the synthesized speech.
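Minimizing join discontinuities, as described above, can be sketched as a tiny unit-selection search. The features, the Euclidean join cost, and the exhaustive search below are illustrative assumptions; practical systems score richer features and use dynamic programming rather than brute force.

```python
# Minimal sketch of segment selection for concatenative synthesis: among
# candidate stored units for each position, pick the sequence whose
# spectral discontinuities at the joins are smallest. The features and
# cost are illustrative assumptions, not a specific system's design.
import itertools
import numpy as np

def join_cost(seq):
    """Sum of Euclidean distances between the end frame of one unit and
    the start frame of the next (smaller = smoother concatenation)."""
    return sum(float(np.linalg.norm(a["end"] - b["start"]))
               for a, b in zip(seq, seq[1:]))

def select_units(candidates):
    """candidates: one list of candidate units per position. Exhaustive
    search is fine at toy sizes; real systems use Viterbi search."""
    return min(itertools.product(*candidates), key=join_cost)

def unit(start, end):
    return {"start": np.asarray(start, float), "end": np.asarray(end, float)}

candidates = [
    [unit([0, 0], [1, 1]), unit([0, 0], [3, 3])],  # two versions of segment 1
    [unit([1, 1], [2, 2]), unit([2, 2], [2, 2])],  # two versions of segment 2
]
best = select_units(candidates)
print(join_cost(best))
```

Here the first version of each segment is chosen because their boundary frames match exactly, giving a zero join cost.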
From page 480...
... Lack of accurate models for representing the coarticulation in speech and the dynamics of parameters at the acoustic or the articulatory level has been the major obstacle in developing automatic methods to carry out the segmentation task. Without automatic methods, it is difficult to process large speech databases and to develop models that represent the enormous variability present in speech due to differences in dialects, prosody, pronunciation, and speaking style.
From page 481...
... 1986. Ghitza, O., "Auditory nerve representation as a basis for speech processing," Advances in Speech Signal Processing, S

