What Does Voice-Processing Technology Support Today?
Pages 390-421



From page 390...
... are discussed. The software development environment, a key technology for developing application software ranging from DSP software to support software, is also described.
From page 391...
... There are two recent improvements in algorithm evaluation. First, algorithm evaluation using large-scale speech databases, which are developed and shared by many research institutions, means that various types of algorithms can be more easily and extensively compared.
From page 392...
... Hardware Technology

Microprocessors

Whether a speech-processing system utilizes dedicated hardware, a personal computer, or a workstation, a microprocessor is necessary to control and implement the application software. Thus, microprocessor technology is an important factor in speech applications.
From page 393...
... To process signals efficiently, the DSP chip uses the following mechanisms: a high-speed floating-point processing unit, a pipelined multiplier-accumulator, and a parallel architecture of arithmetic-processing units and address-calculation units. Specifications of typical current DSPs are shown in Table 1.
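The pipelined multiplier-accumulator can be illustrated in plain C as the inner loop of an FIR filter; this is a sketch of the arithmetic a DSP executes in hardware, not any particular chip's instruction set:

```c
#include <stddef.h>

/* One output sample of an N-tap FIR filter: the repeated
 * multiply-accumulate operation that a DSP's pipelined
 * multiplier-accumulator executes in one cycle per tap. */
double fir_sample(const double *coeff, const double *history, size_t taps)
{
    double acc = 0.0;                 /* accumulator register */
    for (size_t i = 0; i < taps; i++)
        acc += coeff[i] * history[i]; /* multiply-accumulate */
    return acc;
}
```

A general-purpose processor issues a load, a multiply, an add, and an address update per tap; a DSP overlaps them, which is why this loop dominates DSP benchmark figures.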
From page 394...
... The amount of memory needed for a dictionary for speech recognition or speech synthesis is too large to implement on-chip, however; several hundred kilobytes or several megabytes of external memory are required to implement such a system. Most traditional methods of speech analysis, as well as speech
From page 395...
... [illegible table or figure residue from the scanned page]
From page 396...
... , can be carried out in real time by a single DSP chip and some amount of external memory. On the other hand, recent speech analysis, recognition, and synthesis algorithms have become so complex and time consuming that a single DSP cannot always complete the processing in real time.
From page 397...
... , recent improvements in digital device technology, such as those shown in Figures 1 and 2 and Table 1, have made it possible to install a speech-processing board in the chassis of the increasingly popular personal computers and workstations. This board-type implementation has three advantages: speech processing can be shared between the board and the host computer or workstation, reducing the cost of speech processing compared with self-contained equipment; speech application software developed for a personal computer or workstation equipped with the board operates within the MS-DOS or UNIX environment, making application programs easier and simpler to develop; and connecting the board directly to the personal computer or workstation bus greatly widens the data bandwidth, permitting the quick response that is crucial for a service entailing frequent interaction with a user.
From page 398...
... In Japan a voice response and speech recognition system has been offered for use in public banking services since 1981. The system is called ANSER (Automatic answer Network System for Electrical Request)
From page 399...
... Therefore, the development environment for a DSP system is a critical factor that affects the turnaround time of system development and, eventually, the system cost. In the early days only assembly language was available for DSPs, so software development required considerable skill.
From page 400...
... A more efficient and usable DSP development environment is needed to make DSP programming easier and more reliable.

Application Development Environment

Environments in application development take various forms, as exemplified by the varieties of DSP below:
From page 401...
... . Speech recognition and synthesis boards are commonly plugged into personal computers, and interface subroutines with these boards can be called from an application program written in C
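A call sequence of the kind described might look like the following; the subroutine names and their stubbed behavior are assumptions for illustration, not any actual board vendor's API:

```c
#include <stdio.h>
#include <string.h>

/* Stubbed interface subroutines of the kind a recognition/synthesis
 * board's driver library would provide; names and behavior here are
 * illustrative only. */
int rec_open(void) { return 0; }            /* initialize the board */

int rec_word(char *buf, size_t n)           /* blocking word recognition */
{
    strncpy(buf, "transfer", n);            /* stubbed recognition result */
    buf[n - 1] = '\0';
    return 0;
}

void syn_speak(const char *text)            /* speech synthesis output */
{
    printf("speaking: %s\n", text);
}

/* Typical call pattern from a C application program: */
int voice_dialog(char *word, size_t n)
{
    if (rec_open() != 0)
        return -1;
    if (rec_word(word, n) != 0)
        return -1;
    syn_speak("Please enter the account number.");
    return 0;
}
```

In a real system the application would link against the board's driver library instead of these stubs, but the control flow, opening the device, blocking on a recognition result, and replying with synthesized speech, is the same.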
From page 402...
... best utilizes speech recognition and synthesis capabilities. With this method a personal computer or workstation system with a speech input/output function can be optimally designed.
From page 403...
... there is no need to develop a new operating system. For example, a Japanese kana-kanji preprocessor, which converts keyboard inputs into 2-byte Japanese character codes, is widely used for Japanese character input.
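The first stage of such a preprocessor can be sketched as a lookup from keyboard (romaji) input to 2-byte codes; here Unicode code points stand in for the actual JIS codes, and the three-entry table is a toy assumption:

```c
#include <stddef.h>
#include <string.h>

/* Toy illustration of a kana-kanji preprocessor's first stage:
 * mapping keyboard input to 2-byte character codes.  Unicode code
 * points stand in for the actual JIS codes. */
unsigned short kana_code(const char *romaji)
{
    static const struct { const char *r; unsigned short code; } map[] = {
        { "ka", 0x304B },  /* ka */
        { "na", 0x306A },  /* na */
        { "wa", 0x308F },  /* wa */
    };
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
        if (strcmp(romaji, map[i].r) == 0)
            return map[i].code;
    return 0;              /* not found */
}
```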
From page 404...
... Key Word Extraction

Even in applications based on word recognition, humans tend to utter extra words or vocal sounds and sometimes even entire sentences. When the recognition vocabulary is limited, a simple key word extraction or spotting technique is used.
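Symbol-level key word spotting over a stream of recognized tokens can be sketched as follows; this is a minimal illustration of the idea, not a signal-level word-spotting algorithm:

```c
#include <string.h>

/* Minimal key-word spotting over a recognized token stream:
 * return the index of the first token found in the limited
 * recognition vocabulary, or -1 if no key word occurs.  Extra
 * words and vocal sounds are simply skipped. */
int spot_keyword(const char *const *tokens, int ntokens,
                 const char *const *vocab, int nvocab)
{
    for (int i = 0; i < ntokens; i++)       /* scan the utterance */
        for (int j = 0; j < nvocab; j++)    /* against each key word */
            if (strcmp(tokens[i], vocab[j]) == 0)
                return i;
    return -1;
}
```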
From page 405...
... ALGORITHMS

Recognition of large spoken vocabularies and understanding of spontaneous spoken language are being actively studied by many speech recognition researchers. Recent speech synthesis research focuses on improving naturalness and the treatment of prosodic information.
From page 406...
... Also, in Japan large speech databases are under construction by many researchers at various institutes (Itahashi, 1990). For speech synthesis, on the other hand, concatenation of context-dependent speech units has recently proven efficient for producing high-quality synthesized speech.
From page 407...
... Some examples of speech database collection through telephone lines are summarized in Table 2. In the United States, Texas Instruments is trying to construct a large telephone speech database that is designed to provide a statistically significant model of the demographics of the U.S.
From page 408...
... Several comments on these issues, based on the experiences of developing the ANSER system and participating in the operation of the banking service, are given below.

Simulated Telephone Lines

Because of the difficulties underlying the collection of a large speech corpus through telephone lines, it has been suggested that a synthetic telephone database, constructed by passing an existing speech database through a simulated telephone line, could be used as an alternative (Rosenbeck, 1992)
From page 409...
... The speech recognizer was retrained using all of the utterances collected in all three areas, and recognition performance stabilized.

Assessment of Algorithms

Assessment of Speech Recognition Algorithms

In the early days of speech recognition, when word recognition was the research target, the word recognition rate was the criterion for assessment of recognition algorithms.
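The word recognition rate criterion is simply the fraction of correctly labeled test tokens, e.g.:

```c
#include <string.h>

/* Word recognition rate, the early assessment criterion: the
 * fraction of test utterances whose recognized label matches
 * the reference label. */
double word_recognition_rate(const char *const *ref,
                             const char *const *hyp, int n)
{
    int correct = 0;
    for (int i = 0; i < n; i++)
        if (strcmp(ref[i], hyp[i]) == 0)
            correct++;
    return n > 0 ? (double)correct / n : 0.0;
}
```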
From page 410...
... The bottlenecks could be the recognition rate, rejection reliability, dialogue design, or other factors. It is nearly impossible to create a sys-

TABLE 5 Assessment Criteria of Speech Recognition from the User's Side

Objective Criteria: task completion rate; task completion time; number of interactions; number of error-correction sequences
Subjective Criteria: satisfaction rating; fatigue rating; preference
From page 411...
... However, as applications of speech synthesis technology rapidly diversify, new criteria for assessment by users arise: a. In some information services, such as news announcements, customers have to listen to synthesized speech for lengthy periods,

TABLE 6 Assessment Criteria of Speech Synthesis

Intelligibility: phoneme intelligibility; syllable intelligibility; word intelligibility; sentence intelligibility
Naturalness: preference score; MOS
From page 412...
... Speech recognition algorithms usually include training and evaluation procedures in which speech samples from a database are used. Of course, different speech samples are used for training procedures than for evaluation procedures, but this process still contains serious flaws from the application standpoint.
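The disjoint-split requirement can be checked mechanically; a minimal sketch, assuming samples are identified by integer speaker or utterance IDs:

```c
/* Check that training and evaluation sample lists are disjoint --
 * the condition a proper split must satisfy so that no sample is
 * both trained on and evaluated against. */
int sets_disjoint(const int *train, int ntrain,
                  const int *evalset, int neval)
{
    for (int i = 0; i < ntrain; i++)
        for (int j = 0; j < neval; j++)
            if (train[i] == evalset[j])
                return 0;   /* overlap: same sample in both sets */
    return 1;
}
```

The text's point stands even when this check passes: a split drawn from one database still shares recording conditions and speaker demographics across both sets, which real applications do not.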
From page 413...
... Environmental Variation

Environmental variations that affect recognition performance include the distance between the speech source and the microphone, variations in transmission characteristics caused by reflection and reverberation, and microphone characteristics. In research where only existing databases are used, these issues have not been dealt with.
From page 414...
... Use of a directional microphone, adoption of an appropriate distance measure, or introduction of adaptive filtering are reported to be effective methods for preventing performance degradation (Tobita et al., 1990a; Yamada et al., 1991). It has been known that the interference

Microphone Characteristics

Each microphone performs optimally under certain conditions.
From page 415...
... One typical intraspeaker variation is known as the "Lombard effect," which is speech variation caused by speaking under very noisy conditions (Roe, 1987). Also, in several studies utterances representing various kinds of speaking mannerisms were collected, and the HMM
From page 416...
... The current size of the speech recognition market is only around $100 million, although most market research in the 1980s forecasted that the market would soon reach $1 billion (Nakatsu, 1989). And the situation is similar for speech synthesis.
From page 417...
... Based on the above results and also on our experiences with developing and operating the ANSER system and service, the following is an outline of a strategy for widening the speech recognition market. Service Trials Because practical application of speech recognition to real services is currently limited to word recognition, which is so different from how humans communicate orally, it is difficult for both users and vendors to discover appropriate new applications.
From page 418...
... Then, several issues relating to the practical application of speech recognition and synthesis technologies were discussed. Speech databases for application and evaluation of these technologies were described.
From page 419...
... E., et al., "Performance Trials of the Spain and United Kingdom Intelligent Network Automatic Speech Recognition Systems," Proceedings of the 1st IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, IEEE (1992)
From page 420...
... F., et al., "Development of Telephone-Speech Databases," Proceedings of the 1st IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, IEEE (1992)
From page 421...
... Yang, K. M., "A Network Simulator Design for Telephone Speech," Proceedings of the 1st IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, IEEE (1992)

