Page 311

Speech Processing for Physical and Sensory Disabilities

Harry Levitt

SUMMARY

Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities.

A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition.

Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




INTRODUCTION

Assistive technology is concerned with "devices or other solutions that assist people with deficits in physical, mental or emotional function" (LaPlante et al., 1992). This technology can be as simple as a walking stick or as sophisticated as a cochlear implant with advanced microelectronics embedded surgically in the ear. Recent advances in computers and biomedical engineering have greatly increased the capabilities of assistive technology for a wide range of disabilities. This paper is concerned with those forms of assistive technology that involve voice communication.

It is important in any discussion of assistive technology to distinguish between impairment, disability, and handicap. According to the International Classification of Impairments, Disabilities, and Handicaps (World Health Organization, 1980), an impairment is "any loss or abnormality of psychological, physiological or anatomical structure or function"; a disability is "a restriction in the ability to perform essential components of everyday living"; and a handicap is a "limitation on the fulfillment of a role that is normal for that individual." Whereas a handicap may be the result of a disability that, in turn, may be the result of an impairment, these consequences are not necessarily contingent on one another. The fundamental aim of assistive technology is to eliminate or minimize any disability that may result from an impairment and, concomitantly, to eliminate or minimize any handicap resulting from a disability.

Figure 1 shows the extent to which different forms of assistive technology are being used in the United States, as measured by the 1990 Health Interview Survey on Assistive Devices (LaPlante et al., 1992). Of particular interest are those forms of assistive technology that involve voice communication. Assistive devices for hearing loss are the second most widely used form of assistive technology (4.0 million Americans, as compared with the 6.4 million Americans using assistive mobility technology). It is interesting to note that in each of these two widely used forms of assistive technology one specific device dominates in terms of its relative use: the cane or walking stick in the case of assistive mobility technology (4.4 million users) and the hearing aid in the case of assistive hearing technology (3.8 million users). It should also be noted that only a small proportion of the people with hearing loss who could benefit from acoustic amplification actually use hearing aids. Estimates of the number of people in the United States who should wear hearing aids range from 12 million to 20 million, or three to five times the number who actually do (Schein and Delk, 1974).

FIGURE 1 Users of assistive technology. The numbers of users (in millions) of various types of assistive technology are shown. The three most widely used types of assistive technology (mobility, hearing, and anatomical) all include a specific device that is used substantially more than any other (walking stick, hearing aid, and back brace, respectively). The labeled dashed lines within the bars show the numbers of users for each of these specific devices. Note that eyeglasses were not considered in this survey. The diagram is based on data reported in LaPlante et al. (1992).

A less widely used form of assistive technology involving both speech and language processing is that of assistive devices for people with speech and/or language disabilities. Devices of this type typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. A relatively new form of assistive technology is that of voice control of wheelchairs, hospital beds, home appliances, and other such devices by people with mobility disabilities. This form of assistive technology is not widely used at present, but it is likely to grow rapidly once its advantages are realized.

The demand for a given form of assistive technology has a significant effect on the research and development effort in advancing
that technology. Hearing aids and related devices represent the most widely used form of assistive technology involving voice communication. Not surprisingly, the body of research on assistive hearing technology and on hearing aids, in particular, is large, and much of the present chapter is concerned with this branch of assistive technology.

ASSISTIVE HEARING TECHNOLOGY

Background

Assistive technology for hearing loss includes, in addition to the hearing aid, assistive listening devices (e.g., infrared sound transmission systems for theaters), visual and tactile sensory aids, special alarms, and text telephones (also known as TTYs, as in TeleTYpewriters, or TDDs, as in Telecommunication Devices for the Deaf). The use of this technology increases substantially with age, as shown in Table 1. Whereas only 3.9 percent of hearing-aid users are 24 years of age or younger, over 40 percent are 75 or older. The use of TDDs/TTYs also increases with age, but the growth pattern is not as systematic (the highest percentage of users, 32.1, being in the 45- to 64-year age group). Note that this table does not specifically identify tactile sensory aids, visual displays of various kinds (e.g., captioning of television programs), or cochlear implants. Whereas the number of users of tactile aids or cochlear implants is measured in the thousands (as compared to the millions and hundreds of thousands of hearing-aid and TTY/TDD users, respectively), the use of these two less well-known forms of assistive hearing technology is growing steadily.

TABLE 1 Use of Hearing Technology by Age Group

                                 Percentage of Users as a Function of Age
                Total No. of     24 years   25-44   45-64   65-74   75 years
Device          Users (millions) and under  years   years   years   and over
Hearing Aid     3.78              3.9        6.0    19.7    29.1    41.3
TTY/TDD         0.17             12.7       13.3    32.1    13.8    27.5
Special Alarms  0.08              9.2       22.3    31.5     6.6    30.2
Other           0.56              4.3       10.0    24.2    25.2    36.4

NOTE: Percentages (summed across columns) may not total 100 because of rounding.
SOURCE: LaPlante et al. (1992).

Visual displays
for deaf people, on the other hand, are very widely used in the form of captioned television.

An important statistic to note is the high proportion of older people who use hearing aids (about two in five). There is, in addition, a large proportion of older people who have a hearing loss and who should but do not use a hearing aid. Not only is this combined proportion very high (over three in five), but the total number of older people with significant hearing loss is growing rapidly as the proportion of older people in the population increases.

A related factor to bear in mind is that people with visual disabilities rely heavily on their hearing. Older people with visual disabilities will have special needs if hearing loss becomes a factor in their later years. This is an important concern given the prevalence of hearing loss in the older population. Similarly, people with hearing loss whose vision is beginning to deteriorate with age have special needs. The development of assistive technology for people with more than one impairment presents a compelling challenge that is beginning to receive greater attention.

Assistive devices for hearing loss have traditionally been categorized by the modality of stimulation—that is, auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural). Another more subtle form of categorization is in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech; (b) that take the average characteristics of speech into account, such as the long-term speech spectrum; (c) that process articulatory or phonetic characteristics of speech; and (d) that embody some degree of automatic speech recognition. The two methods of categorization, by modality and by degree of speech processing, are independent of each other. The many different assistive devices that have been developed over the years can thus be specified in terms of a two-way matrix, as shown in Table 2. This matrix provides some revealing insights with respect to the development of, and the potential for future advances in, assistive hearing technology.

TABLE 2 Categories of Sensory Aids

                       Modality
Type of Processing     1 Auditory                    2 Visual                   3 Tactile                  4 Direct Electrical
1 Nonspeech specific   Early hearing aids            Envelope displays          Single-channel vibrator    Single-channel implant
2 Spectrum analysis    Modern hearing aids           Spectrographic displays    Tactile vocoder            Spectrum-based multi-channel implants
3 Feature extraction   Speech-feature hearing aids   Speech-feature displays    Speech-feature displays    Speech-feature implants
4 Speech recognition   Speech recognition-synthesis  Automated relay service    Speech-to-Braille          —
                                                     and captioning             conversion

Hearing Aids and Assistive Listening Devices

Column 1 of Table 2 identifies assistive devices using the auditory channel. Hearing aids of various kinds are covered here, as well as assistive listening devices (ALDs). These include high-gain telephones and listening systems for rooms and auditoria in which the signal is transmitted by electromagnetic means to a body-worn receiver. Low-power FM radio or infrared transmissions are typically used for this purpose. The primary function of most assistive listening devices is to avoid the environmental noise and reverberation that are typically picked up and amplified by a conventional hearing aid. FM transmission is typically used in classrooms and infrared transmission in theaters, auditoria, houses of worship, and other public places. Another application of this technology is to allow a hard-of-hearing person to listen to the radio or television at a relatively high level without disturbing others.

Most conventional hearing aids and assistive listening devices do not make use of advanced signal-processing technology, and thus a detailed discussion of such devices is beyond the scope of this chapter. There is a substantial body of research literature in this area, and the reader is referred to Braida et al. (1979), Levitt et al. (1980), Skinner (1988), Studebaker et al. (1991), and Studebaker and Hochberg (1993). It should be noted, however, that major changes are currently taking place in the hearing-aid industry and that these changes are
likely to result in greater use of advanced speech-processing technology in future hearing instruments. Until now the major thrust in hearing-aid development has been toward instruments of smaller and smaller size because most hearing-aid users do not wish to be seen wearing these devices. Miniature hearing aids that fit entirely in the ear canal and are barely visible are extremely popular. Even smaller hearing aids have recently been developed that occupy only the innermost section of the ear canal and are not visible unless one peers directly into the ear canal. The development of miniature hearing aids that are virtually invisible represents an important turning point in that there is little to be gained cosmetically from further reductions in size. The process of miniaturizing hearing-aid circuits will undoubtedly continue, but future emphasis is likely to be on increasing signal-processing capability within the space available. There is evidence that this change in emphasis has already begun to take place in that several major manufacturers have recently introduced programmable hearing aids—that is, hearing aids that are controlled by a digital controller that, in turn, can be programmed by a computer.

The level of speech processing in modern hearing aids is still fairly elementary in comparison with that used in other forms of human-machine communication, but the initial steps have already been taken toward computerization of hearing aids. The potential for incorporating advanced speech-processing techniques into the coming generation of hearing instruments (digital hearing aids, new types of assistive listening devices) opens up new possibilities for the application of technologies developed for human-machine communication.

Hearing instruments that process specific speech features (row 3, column 1 of Table 2) are primarily experimental at this stage. The most successful instruments, as determined in laboratory investigations, involve extraction of the voice fundamental frequency (F0), frequency lowering of the fricative components of speech, or both. It has been shown by Breeuwer and Plomp (1986) that the aural presentation of F0 cues as a supplement to lipreading produces a significant improvement in speech intelligibility. It has also been shown by Rosen et al. (1987) that severely hearing-impaired individuals who receive no benefit from conventional hearing aids can improve their lipreading capabilities by using a special-purpose hearing aid (the SiVo) that presents F0 cues in a form that is perceptible to the user by means of frequency translation and expansion of variations in F0.

A common characteristic of sensorineural hearing impairment is that the degree of impairment increases with increasing frequency. It has been suggested that speech intelligibility could be improved for
this form of hearing loss by transposing the high-frequency components of speech to the low-frequency region, where the hearing impairment is not as severe (see Levitt et al., 1980). Whereas small improvements in intelligibility (5 to 10 percent) have been obtained for moderate amounts of frequency lowering (10 to 20 percent) for moderate hearing losses (Mazor et al., 1977), the application of greatest interest is that of substantial amounts of frequency lowering for severely hearing-impaired individuals with little or no hearing above about 1000 Hz. A series of investigations on frequency-lowering schemes for hearing impairments of this type showed no significant improvements in speech intelligibility (Ling, 1969). These studies, however, did not take into account the phonetic structure of the speech signal. When frequency lowering is limited to only those speech sounds with substantially more high-frequency energy than low-frequency energy (as is the case with voiceless fricative consonants), significant improvements in speech intelligibility can be obtained (Johansson, 1966; Posen et al., 1993). Frequency lowering of fricative sounds has also proven to be useful in speech training (Guttman et al., 1970). A hearing aid with feature-dependent frequency lowering has recently been introduced for clinical use.

Hearing aids that involve speech recognition processing (row 4, column 1 of Table 2) are still highly experimental. In one such application of speech recognition technology, the output of a speech recognizer was used to drive a special-purpose speech synthesizer in which important phonetic cues in the speech signal are exaggerated (Levitt et al., 1993). It has been shown that the perception of certain speech sounds can be improved significantly for people with severe hearing impairments by exaggerating relevant acoustic phonetic cues. For example, perception of voicing in consonantal sounds can be improved by increasing the duration of the preceding vowel (Revoile et al., 1986). This type of processing is extremely difficult to implement automatically using traditional methods of signal processing. In contrast, automatic implementation of vowel lengthening is a relatively simple matter for a speech recognition/synthesis system. The major limitations of the experimental speech recognition/synthesis hearing aid evaluated by Levitt et al. (1993) were the accuracy of the speech recognition unit, the machine-like quality of the synthetic speech, the time the system took to recognize the speech, and the physical size of the system. The last-mentioned limitation does not apply if the system is to be used as a desk-mounted assistive listening device.
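As an illustration of how simple such processing becomes once a recognizer has located the relevant segments, the following sketch lengthens a known vowel segment by repeating its tail with a short crossfade. The segment boundaries, stretch factor, and crossfade length are assumed inputs here; this is a minimal illustration of the idea, not the Levitt et al. implementation.

```python
import numpy as np

def lengthen_segment(signal, start, end, factor, crossfade=64):
    """Stretch signal[start:end] by the given factor, repeating the
    segment's tail with a short crossfade to avoid clicks.

    A crude stand-in for the vowel lengthening a recognition/synthesis
    aid could perform; segment boundaries are assumed already known
    (e.g., supplied by a speech recognizer).
    """
    seg = signal[start:end]
    n_extra = int(len(seg) * (factor - 1.0))
    if n_extra <= 0:
        return signal.copy()
    # Repeat the tail of the segment, crossfading at the seam.
    tail = seg[-n_extra - crossfade:]
    ramp = np.linspace(0.0, 1.0, crossfade)
    joined = np.concatenate([
        seg[:-crossfade],
        seg[-crossfade:] * (1 - ramp) + tail[:crossfade] * ramp,
        tail[crossfade:],
    ])
    return np.concatenate([signal[:start], joined, signal[end:]])
```

The untouched portions of the signal pass through unchanged, so only the targeted vowel is prolonged.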

Visual Sensory Aids

The development of visual sensory aids for hearing impairment follows much the same sequence as that for auditory aids. The earliest visual sensory aids were concerned primarily with making sounds visible to a deaf person (Bell, 1876). The limitations of these devices for representing speech were soon recognized, and attention then focused on more advanced methods of signal processing that took the average spectral characteristics of speech into account. The sound spectrograph, for example, is designed to make the spectral components of speech visible. (A high-frequency emphasis tailored to the shape of the average speech spectrum is needed because of the limited dynamic range of the visual display.) This device also takes into account the spectral and temporal structure of speech in the choice of bandwidths for the analyzing filters.

The visible speech translator (VST), which is, in essence, a real-time version of the sound spectrograph, belongs in the category of sensory aids that take the average characteristics of the speech signal into account (Table 2, row 2). The VST was introduced in the belief that it would provide a means of communication for deaf people as well as being a useful research tool (Potter et al., 1947). Early experimental evaluations of the device supported this view.

FIGURE 2 Early results obtained with the visible speech translator. The diagram shows the number of words learned by each of five subjects as a function of the amount of training (in hours). Curve A relates to a deaf engineer; curves B and C relate to two groups of two young adults each with no engineering background. Adapted from Potter et al. (1947).

Figure 2 shows data on the number of words that two groups of hearing subjects (B and C) and one deaf subject (A) learned to recognize using the VST. The deaf subject reached a vocabulary of 800 words after 220 hours of training. The curve for this subject also shows no evidence of flattening out; that is, extrapolation of the curve suggests that this subject would have continued to increase his vocabulary at the rate of about 3.6 words for each hour of training. Subsequent evaluations of the VST have not been as impressive. These studies showed that the device was of limited value as an aid to speech recognition (House et al., 1968), although several subsequent studies have shown that speech spectrum displays can be of value in speech training (Stark, 1972). In the speech-training application the user need only focus on one feature of the display at a time, whereas in the case of a visual display for speech recognition several features must be processed rapidly. Liberman et al. (1968) have argued that speech is a complex code; that the ear is uniquely suited to interpret this code; and that, as a consequence, perception of speech by modalities other than hearing will be extremely difficult. It has since been demonstrated that it is possible for a human being to read a spectrogram without any additional information (Cole et al., 1980). The process, however, is both difficult and time consuming, and it has yet to be demonstrated that spectrogram reading is a practical means of speech communication.

A much easier task than spectrogram reading is that of interpreting visual displays in which articulatory or phonetic cues are presented in a simplified form. Visual aids of this type are categorized by row 3, column 2 of Table 2. Evidence of the importance of visual articulatory cues is provided by experiments in speechreading (lipreading). Normal-hearing listeners with no previous training in speechreading have been shown to make substantial use of visual cues in face-to-face communication when the acoustic signal is masked by noise (Sumby and Pollack, 1954). Speechreading cues, unfortunately, are ambiguous, and even the most skilled speechreaders require some additional information to disambiguate these cues. A good speechreader is able to combine the limited information received auditorily with the cues obtained visually in order to understand what was said. Many of the auditory cues that are lost as a result of hearing loss are available from speechreading. The two sets of cues are complementary to a large extent, thereby making speechreading with acoustic amplification a viable means of communication for many hearing-impaired individuals.

A technique designed to eliminate the ambiguities in speechreading is that of "cued speech" (Cornett, 1967). In this technique, hand symbols are used to disambiguate the speech cues in speechreading. Nearly perfect reception of conversational speech is possible with highly trained receivers of cued speech (Nicholls and Ling, 1982; Uchanski et al., 1994).

There have been several attempts to develop sensory aids that provide visual supplements to speechreading. These include eyeglasses with tiny lights that are illuminated when specific speech features are produced (Upton, 1968). A more advanced form of these eyeglasses is shown in Figure 3.

FIGURE 3 Eyeglasses conveying supplementary speech-feature cues. The speech signal is picked up by a microphone mounted on the frame of a pair of eyeglasses. The signal is analyzed, and coded bars of light indicating the occurrence of certain speech features, such as frication or the noise burst in a stop consonant, are projected onto one of the eyeglass lenses. The light beam is reflected into the eye so as to create a virtual image that is superimposed on the face of the speaker. This device, known as the eyeglass speech reader, is a more advanced version of the eyeglass speechreading aid developed by Upton (1968). (Reproduced from Pickett et al., 1974.)

A mirror image is used to place these visual cues at the location of the speaker, so that the wearer of the sensory aid does not have to change his/her focal length while looking at the speaker and simultaneously attempting to read the supplementary visual cues (Gengel, 1976). Experimental evaluations of visual supplements to speechreading have shown significant improvements in speech recognition, provided the supplemental cues are extracted from the speech signal without error (Goldberg, 1972). In practice, the supplemental cues need to be extracted automatically from the speech signal, usually in real time and often in the presence of background noise and reverberation. Practical systems of this type have yet to show the improvements demonstrated by the error-free systems that have been evaluated in the laboratory. Erroneous information under difficult listening conditions can be particularly damaging to speech understanding.
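The kind of automatic cue extraction such displays depend on can be illustrated with a crude frication detector: voiceless fricatives concentrate their energy at high frequencies, so comparing high-band with low-band energy frame by frame yields a rough frication flag. The frame size, cutoff frequency, and threshold below are illustrative assumptions, not values taken from any of the systems cited.

```python
import numpy as np

def frication_track(signal, sample_rate, frame_len=256, cutoff_hz=2500.0,
                    ratio_threshold=2.0):
    """Flag frames whose high-band energy dominates the low band.

    Voiceless fricatives (e.g., /s/, /sh/) concentrate energy above a
    few kilohertz; a speech-feature display could light a "frication"
    cue for the frames flagged here. All thresholds are placeholders.
    """
    n_frames = len(signal) // frame_len
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    high = freqs >= cutoff_hz
    flags = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len] * np.hanning(frame_len)
        power = np.abs(np.fft.rfft(frame)) ** 2
        low_energy = power[~high].sum() + 1e-12   # guard against divide-by-zero
        flags.append(power[high].sum() / low_energy > ratio_threshold)
    return np.array(flags)
```

A detector this naive will misfire in noise, which is precisely the gap between laboratory error-free cue presentation and practical systems noted above.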

Recent advances in compact disc technology and computer-interactive media have eliminated much of the inconvenience in scanning recorded texts. A second application of assistive speech technology is the use of machine-generated speech for voice output devices. These applications include reading machines for the blind (Cooper et al., 1984; Kurzweil, 1981); talking clocks and other devices with voice output; and, in particular, voice output systems for computers. The use of this technology is growing rapidly, and, with this growth, new problems are emerging.

One such problem is linked to the growing complexity of computer displays. Pictorial symbols and graphical displays are being used increasingly in modern computers. Whereas computer voice output systems are very effective in conveying alphanumeric information to sightless computer users, conveying graphical information is a far more difficult problem. Innovative methods of representing graphical information using machine-generated audio signals combined with tactile displays are being experimented with and may provide a practical solution to this problem (Fels et al., 1992).

A second emerging problem is that many sightless people who are heavily dependent on the use of machine-generated speech for both employment and leisure are gradually losing their hearing as they grow older. This is a common occurrence in the general population, as indicated by the large proportion of people who need to wear hearing aids in their later years (see Table 1). Machine-generated speech is usually more difficult to understand than natural speech. For older people with some hearing loss, the difficulty in understanding machine-generated speech can be significant. For the sightless computer user whose hearing is deteriorating with age, the increased difficulty experienced in understanding machine-generated speech is a particularly troublesome problem. Good progress is currently being made in improving both the quality and the intelligibility of machine-generated speech (Bennett et al., 1993). For the special case of a sightless person with hearing loss, a possible approach for improving intelligibility is to enhance the acoustic characteristics of those information-bearing components of speech that are not easily perceived as a result of the hearing loss. The thrust of this research is similar to that underlying the development of the speech recognition/speech synthesis hearing aid (Levitt et al., 1993).
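One concrete form such enhancement might take, sketched here under assumed parameter choices rather than drawn from the cited work, is to raise weak, consonant-like frames relative to strong vowel frames, since the weaker segments often carry the cues a hearing-impaired listener misses.

```python
import numpy as np

def boost_weak_frames(signal, frame_len=160, gain_db=6.0):
    """Amplify low-energy frames relative to high-energy ones.

    Weak stretches of speech often carry consonant cues that listeners
    with high-frequency hearing loss miss; raising them relative to
    vowels is one simple enhancement strategy. The threshold (median
    frame energy) and the gain are illustrative choices.
    """
    gain = 10 ** (gain_db / 20.0)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).sum(axis=1)
    weak = energy < np.median(energy)          # crude consonant/vowel split
    out = frames * np.where(weak, gain, 1.0)[:, None]
    return np.concatenate([out.ravel(), signal[n_frames * frame_len:]])
```

A real system would classify segments phonetically rather than by raw energy, but the sketch shows how little machinery the basic idea requires.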

Augmentative and Alternative Communication

People with a range of different disabilities depend on the use of augmentative and alternative methods of communication (AAC). Machine-generated speech is widely used in AAC, although the way in which this technology is employed depends on the nature of the disability. A nonvocal person with normal motor function may wish to use a speech synthesizer with a conventional keyboard, whereas a person with limited manual dexterity in addition to a severe speech impairment would probably use a keyboard with a limited set of keys or a non-keyboard device that can be controlled in various ways other than by manual keypressing. It is also possible for a person with virtually no motor function to use eye movements as a means of identifying letters or symbols to be used as input to the speech synthesizer.

A common problem with the use of synthetic speech in AAC is the time and effort required to provide the necessary input to the speech synthesizer. Few people can type at speeds corresponding to that of normal speech. It is possible for a motivated person with good motor function to learn how to use either a Stenograph or a Palantype keyboard at speeds comparable to normal speech (Arnott, 1987). The output of either of these keyboards can be processed by computer so as to drive a speech synthesizer in real time. A high-speed keyboard, however, is not practical for a nonvocal person with poor motor function. It is likely that for this application a keyboard with a limited number of keys would be used. In a keyboard of this type, more than one letter is assigned to each key, and computer techniques are needed to disambiguate the typed message. Methods of grouping letters efficiently for reduced-set keyboards have been investigated so as to allow for effective disambiguation of the typed message (Levine et al., 1987).
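The reduced-keyboard disambiguation problem can be shown with a toy sketch: each key carries a group of letters, and a frequency-ranked dictionary resolves the ambiguity. The letter grouping and vocabulary below are invented for illustration and are not the groupings studied by Levine et al.

```python
# Each key carries a group of letters; a dictionary ranks candidate words.
# Hypothetical 5-key grouping, for illustration only.
KEY_GROUPS = {1: "abcde", 2: "fghij", 3: "klmno", 4: "pqrst", 5: "uvwxyz"}
LETTER_TO_KEY = {ch: k for k, group in KEY_GROUPS.items() for ch in group}

def key_sequence(word):
    """Map a word onto the sequence of keys that would type it."""
    return tuple(LETTER_TO_KEY[ch] for ch in word)

def disambiguate(keys, dictionary):
    """Return dictionary words matching a key sequence, most frequent first."""
    matches = [w for w in dictionary if key_sequence(w) == tuple(keys)]
    return sorted(matches, key=dictionary.get, reverse=True)

# Illustrative frequency-weighted vocabulary: "no", "on", and "ok" all
# collapse onto the same two-key sequence under this grouping.
vocab = {"no": 50, "on": 30, "ok": 5}
print(disambiguate(key_sequence("no"), vocab))
```

Even this toy example shows why frequency ranking matters: several common words can share a single key sequence, and the system must offer the most likely candidate first.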
Techniques of this type have also been used with a Touch-Tone® keypad so that a hearing person using a Touch-Tone telephone can communicate with a deaf person using a text telephone (Harkins et al., 1992). Even with these innovations, this technique has not met with much success among people who are not obliged to use a reduced-set keyboard. Another method of speeding up the communication process is to use the redundancy of language in order to predict which letter or word should come next in a typed message (Bentrup, 1987; Damper, 1986; Hunnicutt, 1986, 1993). A variation of this approach is to use a dictionary based on the user's own vocabulary in order to improve the efficiency of the prediction process (Swiffin et al., 1987). The
dictionary adapts continuously to the user's vocabulary as the communication process proceeds. For nonvocal people with limited language skills, methods of generating well-formed sentences from limited or ill-formed input are being investigated (McCoy et al., 1989). In some applications, symbols rather than words are used to generate messages that are then synthesized (Hunnicutt, 1993).

Computer-assisted instruction using voice communication can be particularly useful for people with disabilities. The use of computers for speech training has already been discussed briefly in the section on visual sensory aids. Similarly, computer techniques for improving human-machine communication can be of great value in developing instructional systems for people who depend on augmentative or alternative means of communication. A relatively new application of this technology is the use of computerized speech for remedial reading instruction (Wise and Olsen, 1993).

A common problem with text-to-speech systems is the quality of the synthetic voice output. Evaluations of modern speech synthesizers indicate that these systems are still far from perfect (Bennett et al., 1993). Much of the effort in the development of text-to-speech systems has been directed toward making the synthesized speech sound natural as well as intelligible. In some applications, such as computer voice output for a hearing-impaired sightless person, naturalness of the synthesized speech may need to be sacrificed in order to improve intelligibility by exaggerating specific features of the speech signal.

The artificial larynx represents another area in which advances in speech technology have helped people with speech disabilities. The incidence of laryngeal cancer has grown over the years, resulting in a relatively large number of people who have had a laryngectomy. Many laryngectomees use an artificial larynx in order to produce intelligible speech.
Recent advances in speech technology, coupled with an increased understanding of the nature of speech production, have resulted in significant improvements in artificial larynxes (Barney et al., 1959; Sekey, 1982). These advances include improved control of speech prosody and the use of microprocessor-generated glottal waveforms, based on recent theories of vocal cord vibration, to produce more natural-sounding speech (Alzamora et al., 1993).

Unlike the applications of assistive technology discussed in the previous sections, the use of voice communication technology in AAC is highly individualized. Nevertheless, two general trends are apparent. The first is the limitation imposed by the relatively low speed
at which control information can be entered into the speech synthesizer by the majority of candidates for this technology. The second is the high incidence of multiple disabling conditions and the importance of taking these conditions into account in the initial design of assistive devices for this population. In view of the above, an important consideration for the further development of assistive technology for AAC is the design of flexible, generalized systems that can serve a variety of applications involving different combinations of disabling conditions (Hunnicutt, 1993).

Assistive Voice Control: Miscellaneous Applications

Assistive voice control is a powerful enabling technology for people with mobility disabilities. Examples of this new technology include voice control of telephones, home appliances, powered hospital beds, motorized wheelchairs, and other such devices (Amori, 1992; Miller, 1992). In each case the set of commands needed to control the device is small, and reliable control using a standardized set of voice commands can be achieved with existing speech recognition technology. Several of these applications of voice control technology may also find a market among the general population (e.g., voice control of a cellular telephone while driving a car, remote control of a television set, or programming a VCR by voice). A large general market would result in mass production of the technology, thereby reducing its cost for people who need it because of a mobility disability.

An additional consideration in developing assistive voice control technology is that a person with a neuromotor disability may also have dysarthria (i.e., unclear speech). For disabilities of this type, the speech recognition system must be capable of recognizing various forms of dysarthric speech (Goodenough-Trepagnier et al., 1992).
This is not too difficult a problem for speaker-dependent automatic speech recognition when the user's speech production is consistent, although not normal. Machine recognition of dysarthric speech can also be of value in helping clinicians obtain an objective assessment of a speech impairment.

Voice control of a computer offers intriguing possibilities for people with motor disabilities. Relatively simple methods of voice control have already been implemented, such as voice operation of a computer mouse (Miller, 1992). A more exciting possibility, which may appeal to the general population of computer users (and, concomitantly, reduce costs if widely adopted), is that of controlling or programming a computer by voice. The future in this area of assistive voice control technology appears most promising.
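The small-vocabulary, speaker-dependent recognition on which these voice control applications rely can be reduced to template matching: an incoming feature sequence is compared against stored templates of the user's own utterances. The sketch below uses dynamic time warping (DTW) over one-dimensional feature tracks; a practical system would use multidimensional spectral features, and the command templates shown are invented for illustration. Because templates are recorded from the user's own speech, consistent dysarthric productions are accommodated naturally:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences;
    the warping path tolerates differences in speaking rate."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

def recognize(utterance, templates):
    """Return the command whose stored template is closest to the input."""
    return min(templates, key=lambda cmd: dtw_distance(utterance, templates[cmd]))

# Hypothetical templates recorded from the user's own (possibly dysarthric) speech.
templates = {
    "stop":    [0.9, 0.8, 0.2, 0.1],
    "forward": [0.1, 0.3, 0.7, 0.9, 0.9],
}
print(recognize([0.85, 0.75, 0.7, 0.15, 0.1], templates))  # -> stop
```

Enrolling each user's own productions as templates is what makes the approach speaker dependent: the system need not model normal speech at all, only the user's consistent patterns.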
ACKNOWLEDGMENT

Preparation of this paper was supported by Grant No. 5P50DC00178 from the National Institute on Deafness and Other Communication Disorders.

REFERENCES

Alzamora, D., D. Silage, and R. Yantorna (1993). Implementation of a software model of the human glottis on a TMS32010 DSP to drive an artificial larynx. Proceedings of the 19th Northeast Bioengineering Conference, pp. 212-214. New York: Institute of Electrical and Electronic Engineers.
Amori, R. D. (1992). Vocomotion: An intelligent voice-control system for powered wheelchairs. Proceedings of the RESNA International '92 Conference, pp. 421-423. Washington, D.C.: RESNA Press.
Arnott, J. L. (1987). A comparison of Palantype and Stenograph keyboards in high-speed speech output systems. RESNA '87, Proceedings of the 10th Annual Conference on Rehabilitation Technology, pp. 106-108. Washington, D.C.: RESNA Press.
Barney, H., F. Haworth, and H. Dunn (1959). Experimental transistorized artificial larynx. Bell Syst. Tech. J., 38:1337-1359.
Bell, A. G. (1876). Researches in telephony. Proceedings of the American Academy of Arts and Sciences, XII, pp. 1-10. Reprinted in Turning Points in American Electrical History, J. E. Brittain (Ed.). New York: IEEE Press, 1976.
Bennett, R. W., A. K. Syrdal, and S. L. Greenspan (Eds.) (1993). Behavioral Aspects of Speech Technology. Amsterdam: Elsevier Science Publishing Co.
Bentrup, J. A. (1987). Exploiting word frequencies and their sequential dependencies. RESNA '87, Proceedings of the 10th Annual Conference on Rehabilitation Technology, pp. 121-123. Washington, D.C.: RESNA Press.
Bilsen, F. A., W. Soede, and J. Berkhout (1993). Development and assessment of two fixed-array microphones for use with hearing aids. J. Rehabil. Res. Dev., 30(1):73-81.
Boothroyd, A., and T. Hnath (1986). Lipreading with tactile supplements. J. Rehabil. Res. Dev., 23(1):139-146.
Boothroyd, A., T. Hnath-Chisolm, L. Hanin, and L. Kishon-Rabin (1988). Voice fundamental frequency as an auditory supplement to the speechreading of sentences. Ear Hear., 9:306-312.
Braida, L. D., N. I. Durlach, R. P. Lippmann, B. L. Hicks, W. M. Rabinowitz, and C. M. Reed (1979). Hearing Aids: A Review of Past Research on Linear Amplification, Amplitude Compression and Frequency Lowering. ASHA Monograph No. 19. Rockville, Md.: American Speech-Language-Hearing Association.
Breeuwer, M., and R. Plomp (1986). Speechreading supplemented with auditorily presented speech parameters. J. Acoust. Soc. Am., 79:481-499.
Brooks, P. L., B. J. Frost, J. L. Mason, and D. M. Gibson (1986). Continuing evaluation of the Queen's University Tactile Vocoder, Parts I and II. J. Rehabil. Res. Dev., 23(1):119-128, 129-138.
Chabries, D. M., R. W. Christiansen, R. H. Brey, M. S. Robinette, and R. W. Harris (1987). Application of adaptive digital signal processing to speech enhancement for the hearing impaired. J. Rehabil. Res. Dev., 24(4):65-74.
Cholewiak, R. W., and C. E. Sherrick (1986). Tracking skill of a deaf person with long-term tactile aid experience: A case study. J. Rehabil. Res. Dev., 23(2):20-26.
Clark, G. M., R. K. Shepherd, J. F. Patrick, R. C. Black, and Y. C. Tong (1983). Design and fabrication of the banded electrode array. Ann. N.Y. Acad. Sci., 405:191-201.
Clark, G. M., Y. C. Tong, and J. F. Patrick (1990). Cochlear Prostheses. Edinburgh, London, Melbourne, and New York: Churchill Livingstone.
Cohen, N. L., S. B. Waltzman, and S. G. Fisher (1993). A prospective randomized cooperative study of advanced cochlear implants. N. Engl. J. Med., 328:233-237.
Cole, R. A., A. I. Rudnicky, V. W. Zue, and D. R. Reddy (1980). Speech as patterns on paper. In Perception and Production of Fluent Speech, R. A. Cole (Ed.). Hillsdale, N.J.: Lawrence Erlbaum Associates.
Cooper, F. S., J. H. Gaitenby, and P. W. Nye (1984). Evolution of reading machines for the blind: Haskins Laboratories' research as a case history. J. Rehabil. Res. Dev., 21:51-87.
Cornett, R. O. (1967). Cued speech. Am. Ann. Deaf, 112:3-13.
Damper, R. I. (1986). Rapid message composition for large vocabulary speech output aids: A review of possibilities. J. Augment. Altern. Commun., 2(4).
Dillon, H., and R. Lovegrove (1993). Single-microphone noise reduction systems for hearing aids: A review and an evaluation. In Acoustical Factors Affecting Hearing Aid Performance, 2nd Ed., G. A. Studebaker and I. Hochberg (Eds.), pp. 353-370. Needham Heights, Mass.: Allyn and Bacon.
Douek, E., A. J. Fourcin, B. C. J. Moore, and G. P. Clarke (1977). A new approach to the cochlear implant. Proc. R. Soc. Med., 70:379-383.
Eddington, D. K. (1983). Speech recognition in deaf subjects with multichannel intracochlear electrodes. Ann. N.Y. Acad. Sci., 405:241-258.
Fabry, D. A. (1991). Programmable and automatic noise reduction in existing hearing aids. In The Vanderbilt Hearing Aid Report II, G. A. Studebaker, F. H. Bess, and L. B. Beck (Eds.), pp. 65-78. Parkton, Md.: York Press.
Fabry, D. A., M. R. Leek, B. E. Walden, and M. Cord (1993). Do adaptive frequency response (AFR) hearing aids reduce upward spread of masking? J. Rehabil. Res. Dev., 30(3):318-323.
Fels, D. I., G. F. Shein, M. H. Chignell, and M. Milner (1992). See, hear and touch the GUI: Computer feedback through multiple modalities. Proceedings of the RESNA International '92 Conference, pp. 55-57. Washington, D.C.: RESNA Press.
Gault, R. H. (1926). Touch as a substitute for hearing in the interpretation and control of speech. Arch. Otolaryngol., 3:121-135.
Gault, R. H., and G. W. Crane (1928). Tactual patterns from certain vowel qualities instrumentally communicated from a speaker to a subject's fingers. J. Gen. Psychol., 1:353-359.
Gengel, R. W. (1976). Upton's wearable eyeglass speechreading aid: History and current status. In Hearing and Davis: Essays Honoring Hallowell Davis, S. K. Hirsh, D. H. Eldredge, I. J. Hirsh, and R. S. Silverman (Eds.). St. Louis, Mo.: Washington University Press.
Goldberg, A. J. (1972). A visible feature indicator for the severely hard of hearing. IEEE Trans. Audio Electroacoust., AU-20:16-23.
Goodenough-Trepagnier, C., H. S. Hochheiser, M. J. Rosen, and H.-P. Chang (1992). Assessment of dysarthric speech for computer control using speech recognition: Preliminary results. Proceedings of the RESNA International '92 Conference, pp. 159-161. Washington, D.C.: RESNA Press.
Graupe, D., J. K. Grosspietsch, and S. P. Basseas (1987). A single-microphone-based self-adaptive filter of noise from speech and its performance evaluation. J. Rehabil. Res. Dev., 24(4):119-126.
Guttman, N., H. Levitt, and P. A. Bellefleur (1970). Articulation training of the deaf using low-frequency surrogate fricatives. J. Speech Hear. Res., 13:19-29.
Harkins, J. E., and B. M. Virvan (1989). Speech to Text: Today and Tomorrow. Proceedings of a Conference at Gallaudet University. GRI Monograph Series B, No. 2. Washington, D.C.: Gallaudet Research Institute, Gallaudet University.
Harkins, J. E., H. Levitt, and K. Peltz-Strauss (1992). Technology for Relay Service, A Report to the Iowa Utilities Board. Washington, D.C.: Technology Assessment Program, Gallaudet Research Institute, Gallaudet University.
Hochberg, I. H., A. S. Boothroyd, M. Weiss, and S. Hellman (1992). Effects of noise suppression on speech perception by cochlear implant users. Ear Hear., 13(4):263-271.
House, W., and J. Urban (1973). Long-term results of electrical implantation and electronic stimulation of the cochlea in man. Ann. Otol. Rhinol. Laryngol., 82:504-510.
House, A. S., D. P. Goldstein, and G. W. Hughes (1968). Perception of visual transforms of speech stimuli: Learning simple syllables. Am. Ann. Deaf, 113:215-221.
Hunnicutt, S. (1986). Lexical prediction for a text-to-speech system. In Communication and Handicap: Aspects of Psychological Compensation and Technical Aids, E. Hjelmquist and L.-G. Nilsson (Eds.). Amsterdam: Elsevier Science Publishing Co.
Hunnicutt, S. (1993). Development of synthetic speech technology for use in communication aids. In Behavioral Aspects of Speech Technology, R. W. Bennett, A. K. Syrdal, and S. L. Greenspan (Eds.). Amsterdam: Elsevier Science Publishing Co.
Johansson, B. (1966). The use of the transposer for the management of the deaf child. Int. Audiol., 5:362-373.
Kanevsky, D., C. M. Danis, P. S. Gopalakrishnan, R. Hodgson, D. Jameson, and D. Nahamoo (1990). A communication aid for the hearing impaired based on an automatic speech recognizer. In Signal Processing V: Theories and Applications, L. Torres, E. Masgrau, and M. A. Lagunas (Eds.). Amsterdam: Elsevier Science Publishing Co.
Karis, D., and K. M. Dobroth (1991). Automating services with speech recognition over the public switched telephone network: Human factors considerations. IEEE J. Select. Areas Commun., 9(4).
Kewley-Port, D., C. S. Watson, D. Maki, and D. Reed (1987). Speaker-dependent speech recognition as the basis for a speech training aid. Proceedings of the 1987 IEEE Conference on Acoustics, Speech, and Signal Processing, pp. 372-375. Dallas, Tex.: Institute of Electrical and Electronic Engineers.
Knudsen, V. O. (1928). Hearing with the sense of touch. J. Gen. Psychol., 1:320-352.
Kurzweil, R. C. (1981). Kurzweil reading machine for the blind. Proceedings of the Johns Hopkins First National Search for Applications of Personal Computing to Aid the Handicapped, pp. 236-241. New York: IEEE Computer Society Press.
LaPlante, M. P., G. E. Hendershot, and A. J. Moss (1992). Assistive technology devices and home accessibility features: Prevalence, payment, need, and trends. Advance Data, Number 127. Atlanta: Vital and Health Statistics of the Centers for Disease Control/National Center for Health Statistics.
Levine, S. H., C. Goodenough-Trepagnier, C. O. Getschow, and S. L. Minneman (1987). Multi-character key text entry using computer disambiguation. RESNA '87, Proceedings of the 10th Annual Conference on Rehabilitation Technology, pp. 177-179. Washington, D.C.: RESNA Press.
Levitt, H. (1993). Future directions in hearing aid research. J. Speech-Lang.-Pathol. Audiol., Monograph Suppl. 1, pp. 107-124.
Levitt, H., J. M. Pickett, and R. A. Houde (Eds.) (1980). Sensory Aids for the Hearing Impaired. New York: IEEE Press.
Levitt, H., M. Bakke, J. Kates, A. Neuman, and M. Weiss (1993). Advanced signal processing hearing aids. In Recent Developments in Hearing Instrument Technology, Proceedings of the 15th Danavox Symposium, J. Beilen and G. R. Jensen (Eds.), pp. 247-254. Copenhagen: Stougaard Jensen.
Liberman, A. M., F. S. Cooper, D. P. Shankweiler, and M. Studdert-Kennedy (1968). Why are spectrograms hard to read? Am. Ann. Deaf, 113:127-133.
Ling, D. (1969). Speech discrimination by profoundly deaf children using linear and coding amplifiers. IEEE Trans. Audio Electroacoust., AU-17:298-303.
Mazor, M., H. Simon, J. Scheinberg, and H. Levitt (1977). Moderate frequency compression for the moderately hearing impaired. J. Acoust. Soc. Am., 62:1273-1278.
McCoy, K., P. Demasco, Y. Gong, C. Pennington, and C. Rowe (1989). A semantic parser for understanding ill-formed input. RESNA '89, Proceedings of the 12th Annual Conference, pp. 145-146. Washington, D.C.: RESNA Press.
McGarr, N. S., K. Youdelman, and J. Head (1992). Guidebook for Voice Pitch Remediation in Hearing-Impaired Speakers. Englewood, Colo.: Resource Point.
Miller, G. (1992). Voice recognition as an alternative computer mouse for the disabled. Proceedings of the RESNA International '92 Conference, pp. 55-57. Washington, D.C.: RESNA Press.
Miller, J. D., A. M. Engebretsen, and C. L. DeFilippo (1974). Preliminary research with a three-channel vibrotactile speech-reception aid for the deaf. In Speech Communication, Vol. 4, Proceedings of the Speech Communication Seminar, G. Fant (Ed.). Stockholm: Almqvist and Wiksell.
Nicholls, G., and D. Ling (1982). Cued speech and the reception of spoken language. J. Speech Hear. Res., 25:262-269.
Osberger, M. J., and H. Levitt (1979). The effect of timing errors on the intelligibility of deaf children's speech. J. Acoust. Soc. Am., 66:1316-1324.
Peterson, P. M., N. I. Durlach, W. M. Rabinowitz, and P. M. Zurek (1987). Multimicrophone adaptive beamforming for interference reduction in hearing aids. J. Rehabil. Res. Dev., 24(4):103-110.
Pickett, J. M., and B. M. Pickett (1963). Communication of speech sounds by a tactual vocoder. J. Speech Hear. Res., 6:207-222.
Pickett, J. M., R. W. Gengel, and R. Quinn (1974). Research with the Upton eyeglass speechreader. In Speech Communication, Vol. 4, Proceedings of the Speech Communication Seminar, G. Fant (Ed.). Stockholm: Almqvist and Wiksell.
Posen, M. P., C. M. Reed, and L. D. Braida (1993). The intelligibility of frequency-lowered speech produced by a channel vocoder. J. Rehabil. Res. Dev., 30(1):26-38.
Potter, R. K., A. G. Kopp, and H. C. Green (1947). Visible Speech. New York: Van Nostrand Co.
Revoile, S. G., L. Holden-Pitt, J. Pickett, and F. Brandt (1986). Speech cue enhancement for the hearing impaired: I. Altered vowel durations for perception of final fricative voicing. J. Speech Hear. Res., 29:240-255.
Risberg, A. (1968). Visual aids for speech correction. Am. Ann. Deaf, 113:178-194.
Robbins, A. M., S. L. Todd, and M. J. Osberger (1992). Speech perception performance of pediatric multichannel tactile aid or cochlear implant users. In Proceedings of the Second International Conference on Tactile Aids, Hearing Aids, and Cochlear Implants, A. Risberg, S. Felicetti, G. Plant, and K. E. Spens (Eds.), pp. 247-254. Stockholm: Royal Institute of Technology (KTH).
Rosen, S., J. R. Walliker, A. Fourcin, and V. Ball (1987). A microprocessor-based acoustic hearing aid for the profoundly impaired listener. J. Rehabil. Res. Dev., 24(4):239-260.
Ryalls, J., M. Cloutier, and D. Cloutier (1991). Two clinical applications of IBM's SpeechViewer: Therapy and its evaluation on the same machine. J. Comput. User's Speech Hear., 7(1):22-27.
Schein, J. D., and M. T. Delk (1974). The Deaf Population of the United States. Silver Spring, Md.: National Association of the Deaf.
Schwander, T. J., and H. Levitt (1987). Effect of two-microphone noise reduction on speech recognition by normal-hearing listeners. J. Rehabil. Res. Dev., 24(4):87-92.
Sekey, A. (1982). Electroacoustic Analysis and Enhancement of Alaryngeal Speech. Springfield, Ill.: Charles C. Thomas.
Sherrick, C. E. (1984). Basic and applied research on tactile aids for deaf people: Progress and prospects. J. Acoust. Soc. Am., 75:1325-1342.
Shimizu, Y. (1989). Microprocessor-based hearing aid for the deaf. J. Rehabil. Res. Dev., 26(2):25-36.
Simmons, F. B. (1966). Electrical stimulation of the auditory nerve in man. Arch. Otolaryngol., 84:2-54.
Skinner, M. W. (1988). Hearing Aid Evaluation. Englewood Cliffs, N.J.: Prentice-Hall.
Soede, W. (1990). Improvement of speech intelligibility in noise: Development and evaluation of a new directional hearing instrument based on array technology. Ph.D. thesis, Delft University of Technology, The Netherlands.
Stark, R. E. (1972). Teaching /ba/ and /pa/ to deaf children using real-time spectral displays. Lang. Speech, 15:14-29.
Stuckless, E. R. (1989). Real-time captioning in education. In Speech to Text: Today and Tomorrow, Proceedings of a Conference at Gallaudet University, J. E. Harkins and B. M. Virvan (Eds.). GRI Monograph Series B, No. 2. Washington, D.C.: Gallaudet Research Institute, Gallaudet University.
Studebaker, G. A., and I. Hochberg (Eds.) (1993). Acoustical Factors Affecting Hearing Aid Performance, 2nd Ed. Needham Heights, Mass.: Allyn and Bacon.
Studebaker, G. A., F. H. Bess, and L. B. Beck (Eds.) (1991). The Vanderbilt Hearing Aid Report II. Parkton, Md.: York Press.
Sumby, W. H., and I. Pollack (1954). Visual contribution to speech intelligibility in noise. J. Acoust. Soc. Am., 26:212-215.
Swiffin, A. L., J. Arnott, J. Pickering, and A. Newell (1987). Adaptive and predictive techniques in a communication prosthesis. Augment. Altern. Commun., 3:181-191.
Uchanski, R. M., L. A. Delhorne, A. K. Dix, L. D. Braida, C. M. Reed, and N. I. Durlach (1994). Automatic speech recognition to aid the hearing impaired: Prospects for the automatic generation of cued speech. J. Rehabil. Res. Dev., 31(1):20-41.
Upton, H. (1968). Wearable eyeglass speech reading aid. Am. Ann. Deaf, 113:222-229.
Van Tasell, D. J., S. Y. Larsen, and D. A. Fabry (1988). Effects of an adaptive filter hearing aid on speech recognition in noise by hearing-impaired subjects. Ear Hear., 9:15-21.
Watson, C. S., D. Reed, D. Kewley-Port, and D. Maki (1989). The Indiana Speech Training Aid (ISTRA) I: Comparisons between human and computer-based evaluation of speech quality. J. Speech Hear. Res., 32:245-251.
Weiss, M. (1993). Effects of noise and noise reduction processing on the operation of the Nucleus-22 cochlear implant processor. J. Rehabil. Res. Dev., 30(1):117-128.
Weiss, M., and E. Aschkenasy (1981). Wideband Speech Enhancement, Final Technical Report, RADC-TR-81-53. Griffiss Air Force Base, N.Y.: Rome Air Development Center, Air Force Systems Command.
Wilson, B. S., C. C. Finley, D. T. Lawson, R. D. Wolford, and M. Zerbi (1993). Design and evaluation of a continuous interleaved sampling (CIS) processing strategy for multichannel cochlear implants. J. Rehabil. Res. Dev., 30(1):110-116.
Wise, R., and R. Olson (1993). What computerized speech can add to remedial reading. In Behavioral Aspects of Speech Technology, R. W. Bennett, A. K. Syrdal, and S. L. Greenspan (Eds.). Amsterdam: Elsevier Science Publishing Co.
World Health Organization (1980). The International Classification of Impairments, Disabilities, and Handicaps. Geneva: World Health Organization.
Yamada, Y., N. Murata, and T. Oka (1988). A new speech training system for profoundly deaf children. J. Acoust. Soc. Am., 84(1):43.