A Perspective on Early Commercial Applications of Voice-Processing Technology for Telecommunications and Aids for the Handicapped

Chris Seelbach

SUMMARY

The Colloquium on Human-Machine Communication by Voice highlighted the global technical community's focus on the problems and promise of voice-processing technology, particularly speech recognition and speech synthesis. Clearly, there are many areas in both the research and development of these technologies that can be advanced significantly. However, it is also true that many applications of these technologies are capable of commercialization now. Early successful commercialization of new technology is vital to ensure continuing interest in its development. This paper addresses efforts to commercialize speech technologies in two markets: telecommunications and aids for the handicapped.

INTRODUCTION

Two voice-processing technologies, speech recognition and speech synthesis, have reached the point that they are ready for commercial application. Speech recognition applications using small vocabularies deliver significant cost reduction for the service providers and also expand markets to rotary telephone users. Speech synthesis provides cost reduction and expanded services for its users, despite its "nonhuman" sound. For handicapped markets these technologies provide users with increased mobility and control of devices, such as computers and telephones, not otherwise available to them.

The industry must be able to work in this environment, where significant technological advances are possible yet growing numbers of commercial applications deliver real benefits. To date, too many potential users of these technologies have been frightened away by examples of the improvements necessary to make the technology "really work." In this environment the challenge for systems integrators is to learn how to apply developing technology in ways that deliver results rather than the disappointment that follows when the job to be done and the available technology are poorly matched. In addition, the technical community must continue its efforts to use the successes and failures in a way that leads to successful identification of the "right" applications and commercialization of the "right" technology.

Early successful commercialization of the right level of technology in the right applications will benefit users, service providers, and the research community. In today's world of cost cutting, reengineering, and impatience with long lead time projects, investors are looking for a near-term payback for their interest. The research community does itself a disservice by not understanding this and providing for near-term commercial demonstrations.

Examples are plentiful. It took 20 years for speech recognition to be deployed widely in a telecommunications application. Speech synthesis is only now being deployed on a large scale for reverse directory applications. The initial large-scale, telephone-based commercial speech recognition application was a simple "yes" or "no" recognition. Because it was so "simple," the research community was not interested; researchers wanted to solve the really big problems. In addition, the application required working with users, systems integrators, and human factors professionals.
And live, messy, real-world trials were essential to explore uncharted areas. These were not the normal directions for the research community, causing researchers to revert to ivory tower "real research." But the payout for this simple application was understood to be large from the earliest days. The payout was recognized, but the direction was different and the technical research was not as challenging as other problems. As a result, the speech recognition community lost out on an opportunity to demonstrate the viability of this technology for at least 5 to 10 years. This is not an isolated case, but these times call for recognizing the need to get new technology into the marketplace early in order to:

• demonstrate its viability, even in simple uses;
• build credibility for more research;
• use systems integrators, human factors professionals, and others to broaden the research base; and
• make end users part of the process and use all features of the application to make the technology work better.

CURRENT COMMERCIAL APPLICATIONS: TELEPHONE BASED

Successful application experience on a large scale is occurring in applications of speech recognition that deliver large cost reductions. Automation of operator services is the largest ongoing commercial application, at first using "yes" and "no" to save hundreds of millions of dollars a year for telephone companies, initially in the United States and Canada. The vocabularies have been expanded to include selection of payment choice (e.g., collect, bill-to-third-party) as well as help commands such as "operator." Early deployment is focusing attention on user interface issues and spurring advances to meet additional early user challenges. For example, the initial applications of the early 1990s are being expanded to handle larger vocabularies, "out-of-vocabulary words," and the ability to speak over prompts (called "barge in").

In fact, deployment of "simple" technology is aiding the research community's efforts by providing large-scale use of the technology, which highlights areas for priority research that might not have been as high a priority before. In addition, deployment of "simple" technology gets systems integrators involved with the technology earlier. In this case knowledge of the applications is vital to successful technology commercialization. Use of unique aspects of the application's vocabulary, the work process, and other aspects of the application can enhance the success of the technology. None of this would happen without the early involvement of application-knowledgeable systems integrators.
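The small-vocabulary operator-services dialog described above (a "yes"/"no" answer, a help command such as "operator," and a fallback for out-of-vocabulary words) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the action labels, the confidence threshold, and the retry limit are hypothetical and are not taken from any deployed system. Barge-in is not modeled.

```python
# Hypothetical sketch of a small-vocabulary dialog step. All names, action
# labels, and thresholds are illustrative assumptions, not a real system's API.

YES_NO = {"yes": "accept_charges", "no": "decline_charges"}
HELP_WORDS = {"operator"}


def handle_utterance(word, confidence, vocabulary, min_confidence=0.8):
    """Map one recognized word to an action, a reprompt, or a human operator."""
    word = word.strip().lower()
    # A help command always reaches a human, even at low confidence
    # (an assumed design choice that errs toward the human operator).
    if word in HELP_WORDS:
        return "transfer_to_operator"
    # Out-of-vocabulary words and low-confidence results trigger a reprompt.
    if confidence < min_confidence or word not in vocabulary:
        return "reprompt"
    return vocabulary[word]


def run_prompt(attempts, vocabulary, max_attempts=2, min_confidence=0.8):
    """Try up to max_attempts recognitions, then fall back to an operator."""
    for word, confidence in attempts[:max_attempts]:
        action = handle_utterance(word, confidence, vocabulary, min_confidence)
        if action != "reprompt":
            return action
    return "transfer_to_operator"


if __name__ == "__main__":
    # Caller first says something outside the vocabulary, then answers "no".
    print(run_prompt([("um", 0.9), ("no", 0.9)], YES_NO))
```

The point of the sketch is the match between job and technology: the recognizer only has to separate a handful of words, and every uncertain case is routed back to a prompt or to a human operator rather than guessed.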
Speech recognition and synthesis technologies are affected more than other recent new technologies by specific applications factors and user interface issues. Successful commercialization of these technologies will not happen unless systems integrators and human factors professionals are involved at early stages. The technical research community is recognizing this, although later than it should have.
Other applications providing call routing, directory assistance, and speaker identification are being deployed by telephone companies worldwide. Initial deployment is in Canada and the United States, led by Northern Telecom and AT&T. In addition, the use of speech recognition for access to information services is growing with both telephone and independent information service providers. The lack of Touch-Tone dialing is a big incentive to deploying speech recognition in this applications area. In the United States 30 percent of phones are rotary, while in Europe the share varies from 25 percent in Scandinavia to 80 percent in Germany. A few early applications in Japan, the United States, and Europe have been deployed for 5 to 10 years, and service bureaus report that on services where speech recognition is advertised in the United States, 30 percent of the callers use it. Reported use of speech recognition in Japan and Europe exceeds even these results.

Speech synthesis is less broadly deployed, primarily because of dissatisfaction with the "nonhuman" sound that is produced. Applications for internal company use such as dispatch are spreading, but where interaction with customers or the public is involved most organizations have been reluctant to use it. However, some believe that a "nonhuman" sound is preferable in situations where people become confused about what they are being asked to do on telephone systems because they do not know they are speaking to a computer and thus must be more precise. Applications for reverse directory assistance are on the horizon as a potential large-scale commercialization effort by a number of telephone companies. This will help the entire industry as experience is gained on what is acceptable and what is not in a well-chosen application done with care.

CURRENT COMMERCIAL APPLICATIONS: AIDS TO THE HANDICAPPED

This market area uses signal-processing technologies to enhance hearing-aid performance.
Hearing loss affects more people than any other disability (over 3 million people in the United States). In addition, voice-processing technologies are used to provide speech output for the blind and control by voice for devices, computers, and telephones for blind and physically handicapped people. For disabled people, even limited speech recognition increases their control over such things as beds and wheelchairs and allows some to use computers and telephones. Speech synthesis also provides much benefit to blind people in hearing the output of computers and other devices.

While this market is much more forgiving about imperfect technology because of the benefits offered, other attributes of the market have limited technology deployment. In applications of voice-processing technology for the disabled, many users have special, specific needs. These needs often require customized systems that are expensive to develop and do not lead to large enough markets for "generic" products to encourage widespread use. Thus, the costs to deliver benefits are often very high. In addition, the incorporation of voice-processing technologies in large-scale applications to date has been expensive relative to the underlying cost of the system or device. So hospital beds and wheelchairs with speech control are still small specialized markets. However, this market shares with the telephone market the need to involve human factors professionals and systems integrators early in the commercialization process. With a broader market, lower costs, and more adaptable systems, the use of voice-processing technology will grow.

CONCLUSION

While it is recognized that many improvements in voice-processing technologies are possible, the commercialization of current technologies is under way. Greater involvement of human factors professionals and systems integrators is enhancing the possibility of commercial success. The global research community needs to continue its impressive efforts at expanding the capability of the technologies while encouraging and learning from the commercialization efforts.