

Models of Speech Synthesis
Pages 116-134

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 116...
... INTRODUCTION The term "speech synthesis" has been used for diverse technical approaches. Unfortunately, any speech output from computers has been claimed to be speech synthesis, perhaps with the exception of the playback of recorded speech.[1] Some of the approaches used to gen ...
[1] The foundations for speech synthesis based on acoustical or articulatory modeling can be found in Fant (1960)
From page 117...
... The last group includes both predictive coding and concatenative synthesis using speech waveforms. Acoustic and articulatory models have had a long history of development, while natural speech models represent a somewhat newer field.
From page 118...
... A rule-based system using waveform coding is a perfectly possible combination, as is speech coding using a terminal analog or a rule-based diphone system using an articulatory model. In the following pages, synthesis models will be described from two different perspectives: the sound-generating part and the control part of the system.
From page 119...
... Considerable success has been achieved by systems that base sound generation on concatenation of natural speech units (Moulines et al., 1990). Sophisticated techniques have been developed to manipulate these units, especially with respect to duration and fundamental frequency.
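One widely used family of such unit-manipulation techniques is pitch-synchronous overlap-add (PSOLA). The sketch below is an illustration, not the cited authors' method, and shows only the bookkeeping at the core of a TD-PSOLA-style modification: each synthesis pitch mark is paired with the nearest analysis pitch mark, and synthesis marks are spaced by the local analysis period divided by a pitch factor. The function name psola_mapping and its parameters are hypothetical; the windowing and overlap-add of two-period segments around each selected mark are omitted.

```python
from bisect import bisect_left


def psola_mapping(marks, pitch_factor=1.0, time_factor=1.0):
    """Pair synthesis pitch marks with analysis pitch marks (TD-PSOLA-style sketch).

    marks        -- strictly increasing analysis pitch-mark positions in samples
    pitch_factor -- > 1 raises F0 (synthesis periods get shorter)
    time_factor  -- > 1 lengthens the unit (analysis periods get reused)
    Returns a list of (synthesis_position, analysis_mark_index) pairs.
    """
    if len(marks) < 2:
        raise ValueError("need at least two analysis pitch marks")
    pairs = []
    t = marks[0] * time_factor      # current synthesis time
    end = marks[-1] * time_factor   # synthesis time of the last analysis mark
    while t <= end:
        a = t / time_factor         # map synthesis time back onto the analysis axis
        j = bisect_left(marks, a)
        # Pick the nearer of the two neighbouring analysis marks.
        if j == len(marks) or (j > 0 and a - marks[j - 1] <= marks[j] - a):
            j -= 1
        pairs.append((round(t), j))
        # Local pitch period at the chosen analysis mark.
        if j + 1 < len(marks):
            period = marks[j + 1] - marks[j]
        else:
            period = marks[j] - marks[j - 1]
        t += period / pitch_factor
    return pairs
```

With marks spaced 100 samples apart, time_factor=2.0 reuses each analysis period twice (duration doubled, F0 unchanged), while pitch_factor=2.0 places synthesis marks twice as densely (F0 raised by an octave, duration unchanged).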
From page 120...
... While the male voice has sometimes been regarded as generally acceptable, an improved glottal source will open the way to more realistic synthesis of child and female voices and also to greater naturalness and variation in male voices. Most source models work in the time domain, with different controls to manipulate the pulse shape (Ananthapadmanabha, 1984; Hedelin, 1984; Holmes, 1973; Klatt and Klatt, 1990; Rosenberg, 1971).
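As an illustration of a time-domain source model with pulse-shape controls, the sketch below generates one period of a trigonometric glottal flow pulse of the kind associated with Rosenberg (1971): a raised-cosine opening phase, a quarter-cosine closing phase, and a closed phase with zero flow. The parameter names open_quotient and speed_quotient, and their default values, are illustrative assumptions rather than values taken from the chapter.

```python
import math


def rosenberg_pulse(fs, f0, open_quotient=0.6, speed_quotient=2.0):
    """One period of a Rosenberg-style trigonometric glottal flow pulse.

    fs             -- sample rate in Hz
    f0             -- fundamental frequency in Hz
    open_quotient  -- fraction of the period during which the glottis is open
    speed_quotient -- ratio of opening-phase to closing-phase duration
    """
    period = int(round(fs / f0))                  # samples per period
    open_len = open_quotient * period             # open phase, in samples
    tp = open_len * speed_quotient / (1.0 + speed_quotient)  # opening phase
    tn = open_len - tp                            # closing phase
    pulse = []
    for n in range(period):
        if n <= tp:
            # Rising raised-cosine segment: 0 -> 1.
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / tp)))
        elif n <= tp + tn:
            # Falling quarter-cosine segment: 1 -> 0 (abrupt closure).
            pulse.append(math.cos(math.pi * (n - tp) / (2.0 * tn)))
        else:
            # Closed phase: no flow.
            pulse.append(0.0)
    return pulse
```

Repeating the returned period gives a pulse train at a fixed F0, and taking its first difference gives a differentiated flow waveform. Changing open_quotient and speed_quotient is one simple way to vary the pulse shape.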
From page 121...
... [Figure 2 plots: differentiated glottal flow pulses and their spectra, 0-3 kHz; fundamental frequency 100 Hz.] FIGURE 2 Influence of the parameters Rg, Rk, and Ra on the differentiated glottal flow pulse shape and spectrum (from Gobl and Karlsson, 1991).
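Rg, Rk, and Ra are shape parameters of the LF glottal flow model. In the LF-model literature they are commonly defined from the timing points of one glottal cycle: the period T0, the instant of peak flow Tp, the instant of maximum excitation Te, and the effective return-phase duration Ta. The definitions are Rg = T0/(2Tp), Rk = (Te - Tp)/Tp, and Ra = Ta/T0. The small helper below, whose name is hypothetical, computes them and also derives the open quotient Te/T0 = (1 + Rk)/(2Rg) as a consistency check.

```python
def lf_shape_parameters(t0, tp, te, ta):
    """Return (Rg, Rk, Ra, Oq) from LF-model timing points (any common time unit)."""
    if not 0 < tp < te <= t0:
        raise ValueError("expected 0 < tp < te <= t0")
    rg = t0 / (2.0 * tp)           # glottal frequency 1/(2*tp) relative to F0
    rk = (te - tp) / tp            # asymmetry: closing branch vs. opening branch
    ra = ta / t0                   # return-phase duration relative to the period
    oq = (1.0 + rk) / (2.0 * rg)   # open quotient, algebraically equal to te / t0
    return rg, rk, ra, oq
```

For a 10 ms period with Tp = 4 ms, Te = 5 ms, and Ta = 0.3 ms, this gives Rg = 1.25, Rk = 0.25, Ra = 0.03, and an open quotient of 0.5.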
From page 122...
... Formant-Based Terminal Analog Traditional text-to-speech systems use a terminal analog based on formant filters. The vocal tract is simulated by a sequence of second-order filters in cascade, while a parallel structure is used mostly for the synthesis of consonants.
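A common way to realize each second-order formant filter is the digital resonator used in Klatt-style synthesizers. The sketch below implements such resonators and chains them in cascade. It assumes the difference equation y[n] = A x[n] + B y[n-1] + C y[n-2], with coefficients set from a formant frequency and bandwidth so that each resonator has unity gain at 0 Hz. The function names and the formant values in the usage note are illustrative, and the parallel branch used for consonants is not shown.

```python
import math


def resonator_coefficients(freq, bandwidth, fs):
    """Coefficients (A, B, C) of a second-order digital resonator."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bandwidth * t)
    b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c          # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(x, freq, bandwidth, fs):
    """Filter x through one resonator: y[n] = A x[n] + B y[n-1] + C y[n-2]."""
    a, b, c = resonator_coefficients(freq, bandwidth, fs)
    y1 = y2 = 0.0            # previous two output samples
    y = []
    for sample in x:
        out = a * sample + b * y1 + c * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def cascade(x, formants, fs):
    """Pass x through resonators in series, one per (frequency, bandwidth) pair."""
    for freq, bandwidth in formants:
        x = resonate(x, freq, bandwidth, fs)
    return x
```

For example, cascade(source, [(500, 60), (1500, 90), (2500, 150)], fs=10000) filters a source signal through three formants roughly typical of a neutral vowel. Because each stage has unity gain at 0 Hz, the cascade as a whole also passes a constant input unchanged once its transient has decayed.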
From page 123...
... It should be noted that the transfer of knowledge from phonetics to speech technology has not been an easy process. Another reason is that the efforts using formant synthesis have not explored control methods other than the explicit rule-based de ...
From page 124...
... Articulatory models, which are currently being refined, stem from basic work carried out at laboratories such as AT&T Bell Labs, MIT, and KTH. At each time interval, an approximation of the vocal tract is used either to calculate the corresponding transfer function or to filter a source waveform directly.
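To illustrate the first option, computing a transfer function from a vocal tract approximation, the sketch below treats the tract as a chain of lossless cylindrical sections described by an area function from glottis to lips. Each section contributes a standard acoustic transmission-line (chain) matrix. With the pressure at the lips assumed to be zero, the volume-velocity transfer function is 1/D, where D is the lower-right element of the product matrix. Wall losses, lip radiation, and the nasal tract are ignored, and the function names, the sound speed of 350 m/s, and the air density are assumptions made for this example.

```python
import math


def _matmul(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[p[0][0] * q[0][0] + p[0][1] * q[1][0], p[0][0] * q[0][1] + p[0][1] * q[1][1]],
            [p[1][0] * q[0][0] + p[1][1] * q[1][0], p[1][0] * q[0][1] + p[1][1] * q[1][1]]]


def tract_gain(areas, length, freq, c=350.0, rho=1.14):
    """|U_lips / U_glottis| for a lossless tube with areas (m^2) listed glottis to lips."""
    k = 2.0 * math.pi * freq / c          # wavenumber (rad/m)
    seg = length / len(areas)             # length of each section (m)
    total = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    for area in areas:
        z = rho * c / area                # characteristic impedance of the section
        cs, sn = math.cos(k * seg), math.sin(k * seg)
        total = _matmul(total, [[cs, 1j * z * sn], [1j * sn / z, cs]])
    d = abs(total[1][1])                  # U_glottis = D * U_lips when P_lips = 0
    return 1.0 / max(d, 1e-12)            # guard against division by ~0 at a resonance


def formant_peaks(areas, length, fmax=4000, step=1.0):
    """Frequencies (Hz) of local maxima of the gain on a uniform grid."""
    freqs = [step * i for i in range(1, int(fmax / step) + 1)]
    gains = [tract_gain(areas, length, f) for f in freqs]
    return [freqs[i] for i in range(1, len(gains) - 1)
            if gains[i] > gains[i - 1] and gains[i] >= gains[i + 1]]
```

For a uniform 17.5 cm tube, for example formant_peaks([5e-4] * 10, 0.175), the peaks fall at the odd multiples of c/(4L): 500, 1500, 2500, and 3500 Hz. Replacing the uniform area list with a constricted one moves the peaks away from these values, which is how an articulatory configuration maps onto formant frequencies.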
From page 125...
... This section has not dealt with the important work carried out to describe speech production in terms of physical models. The inclusion of such models still lies in the future, beyond the next generation of text-to-speech systems, but the results of these experiments will improve the current articulatory and terminal analog models.
From page 126...
... The context-oriented clustering approach is a good illustration of a current trend in speech synthesis: automatic methods based on databases. These studies take into account much wider phonetic contexts than earlier work did.
From page 127...
... These automatic procedures will, in the future, make it possible to gather large amounts of data. The lack of glottal source data is currently a major obstacle to the development of speech synthesis with improved naturalness.
From page 128...
... The basic concept is to preserve speaker characteristics in interpreting systems (Abe et al., 1990). The proposed voice conversion technique consists of two steps: generation of a mapping codebook for LPC parameters, and conversion synthesis using that mapping codebook.
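The two steps can be sketched in the spirit of codebook-mapping voice conversion. Vectors of the source and target speakers are quantized with separate codebooks. Time-aligned pairs from parallel utterances are then used to count which target codewords co-occur with each source codeword, and each mapping-codebook entry is the count-weighted average of those target codewords. Conversion replaces each source frame with the mapping entry of its nearest source codeword. This is a simplified sketch, not the published system. The codebooks (for example, trained by vector quantization of LPC-derived vectors) and the alignment (for example, by dynamic time warping) are assumed to be given, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def nearest(codebook, vec):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], vec)))


def build_mapping_codebook(src_cb, tgt_cb, aligned_pairs):
    """Step 1: mapping codebook from time-aligned (source_vec, target_vec) pairs."""
    hist = defaultdict(Counter)           # source code -> counts of target codes
    for src_vec, tgt_vec in aligned_pairs:
        hist[nearest(src_cb, src_vec)][nearest(tgt_cb, tgt_vec)] += 1
    dim = len(tgt_cb[0])
    mapping = []
    for i, src_word in enumerate(src_cb):
        counts = hist.get(i)
        if not counts:
            mapping.append(list(src_word))  # no training data: keep the source codeword
            continue
        total = sum(counts.values())
        # Weighted average of the target codewords that co-occurred with code i.
        mapping.append([sum(n * tgt_cb[j][d] for j, n in counts.items()) / total
                        for d in range(dim)])
    return mapping


def convert(frames, src_cb, mapping):
    """Step 2: replace each source frame by the mapping entry of its nearest code."""
    return [mapping[nearest(src_cb, f)] for f in frames]
```

Averaging over co-occurring target codewords, rather than picking only the most frequent one, smooths over alignment errors in the training pairs.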
From page 129...
... The ultimate test of our descriptions is our ability to successfully synthesize not only different voices and accents but also different speaking styles (Bladon et al., 1987). Appropriate modeling of these factors will increase both the naturalness and intelligibility of synthetic speech.
From page 130...
... The recent work done in the ESPRIT/SAM projects, the COCOSDA group, and special workshops will set new standards for the future. CONCLUDING REMARKS This paper has touched on a number of different synthesis methods and research goals for improving current text-to-speech systems.
From page 131...
... Stevens (1986), "Effects of the vocal tract constriction on the glottal source: Experimental and modeling studies."
From page 132...
... , "Male and Female Voice Source Dynamics," Proceedings of the Vocal Fold Physiology Conference, Gauffin and Hammarberg, eds. Singular Publishing Group, San Diego.
From page 133...
... Fant (1992), "An articulatory speech synthesizer based on a frequency domain simulation of the vocal tract," Proc.
From page 134...
... Pols (1989), "Comparing formant movements in fast and normal rate speech," Proceedings of the European Conference on Speech Communication and Technology 89.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.