Page 525

Index

A

Abbreviations, pronunciation of, 142-143

Acoustic

interactions, 122

inventory elements, 126

models/modeling, 26, 36, 85, 95, 117, 122, 182-183, 476

Kratzenstein's resonators, 78, 80

phonetics, 85, 95

speech recognition, 64, 182-183

speech synthesis, 137

terminal analog synthesizer, 117

Advanced Research Projects Agency. See also Airline Travel Information System

Benchmark Evaluation summaries, 224-225

common speech corpora, 181-182

continuous speech recognition program, 175-176, 181-182

Human Language Technology Program, 108

research funding, 349

Speech and Natural Language Workshop, 359

Speech Language Understanding Program, 262-263

Spoken Language Systems program, 218-219, 220, 230, 232-233, 250, 254, 255-256, 262-263, 265, 405

Resource Management corpus, 181-182, 184, 185, 188, 376, 377

Wall Street Journal corpus, 184-185, 186, 187

Air traffic control, 365-366

Airline Travel Information System (ATIS), 376

context-dependent utterances, 61

corpus, 61, 184-185, 234, 219, 250, 256, 257-258, 491

degree of difficulty, 383-385

error rates, 252, 486

human performance on, 162

interactive dialogue, 227, 228, 233



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



Page 526 Airline Travel Information System (cont'd) language understanding methods, 258, 268 N-best filtering in, 227 and Online Airline Guide, 219 order in problem solving, 229 overview of, 46 recognition errors, 261, 262 spontaneous input, 234 template-based approach, 259 understanding errors, 262 Algorithms ambiguity-handling, 56 assessment of, 391-392, 405, 409-412 Baum-Welch training (forward-backward), 178-179, 199, 202-207, 489 beam search, 202, 210, 212, 214 compression, 83, 381 databases, 405-409 inside/outside, 263-264, 489-490, 491 intonation contours, 45 large vocabularies, 307 learning, 249, 250, 263-264 nonlinear interpolation, 97 part-of-speech assignment, 143 probabilistic parsing, 56 prosodic phrase generation, 146, 147, 151 reference resolution, 57 robustness, 391, 392, 405, 412-416 search, 180-181, 189, 199, 202, 208, 209, 264-265 speech processing, 21, 393 speech recognition, 28, 409-411, 412, 417-418, 431, 468, 469 speech synthesis, 468 stack, 202, 208 standardization, 7 text-to-speech, 25 Viterbi, 173, 180, 199, 202, 208-209, 210, 213 voice coding, 7 wordspotting, 404, 431 Allophone models, 182 American Automobile Association, 354 Ameritech, 291, 292, 293, 302 Analog-to-digital converter, 22-23, 189, 350 Analysis-by-synthesis systems articulatory data, 125 and automatic learning, 127 bit-rate reduction and, 23 and "break index" data, 148-149 defined, 118 linear predictive coding, 24, 26-27, 119 PSOLA methods, 119-120 source-filter technique, 119 in speech analysis, 26-27 in speech recognition, 30 text-to-speech conversion as, 136 Apple Macintosh, 52 Applications of voice communications. See Assistive technology for disabled persons; Deployment of applications; Military and government applications; Telecommunications; Telephony air travel information systems, 46, 85-86, 162 aircraft pilots, 40, 41, 44, 45, 359, 365, 509 assessment criteria, 409-410 automatic teller machines, 86 computer-aided instruction, 151 databases for, 406-408 baggage handlers, 40 consumer electronics programming, 43, 353 development environment, 400-401 driving instructions, 354 economic impact of, 280 expectations for, 505-506 force feedback glove, 98, 101 foreign language learning, 44

Page 527 Applications of voice communications (cont'd) hands/eyes-busy tasks and, 39-41 in information society, 506-508 limited keyboard/screen option and, 41-43 medical report generation, 351 motor vehicle navigation, 44 multimodal systems, 63-64 parcel sorters, 40, 509 portable computers, 43 reading lessons, 44 real-time support, 403-405 smart interfaces, 508-509 speech interface technology, 347-356 success factors, 289-290 security, 7 speech recognition, 28-29, 30-32, 81, 275-282, 283-284, 318, 377-379, 451, 457, 458, 471, 508-510 stock quotation service, 283, 292, 293, 299, 354, 437, 438, 439 technology trends, 399-405 text-to-speech synthesis, 43, 109, 280, 282, 302, 354, 451 user-friendly, 508-510 video/audio conferencing system, 99-100 and VLSI technology, 40, 54, 510-511 voice input, 39-44 voice output, 44-45 wire installers, 40 Army. See also Military and government applications Avionic Research and Development Activity (AVRADA), 362 Communications and Electronics Command (CECOM) program, 361-362 Articulatory models, 88, 95, 117, 118, 120, 122, 124-125, 152-153, 461-463, 476 Artificial intelligence, 484 Artificial neural networks, 2, 21, 124, 190, 191-193, 381, 479 Assembler language, 399-400, 401 Assistive technology for disabled persons assistive listening devices, 315-316 augmentative and alternative communication, 130, 335-337 captioning, 314-315, 322-323 carpal tunnel syndrome, 43 categories of sensory aids, 316 cochlear implants, 314, 328-331, 332-333 computer-assisted instruction, 336 deaf-blind, 327 direct stimulation of auditory system, 328-331 dysarthric speech, 337 extracochlear implant, 329-330 eyeglass speechreader, 320-322 hearing aids and assistive listening devices, 278, 311, 312, 315-318, 328-331, 332 hearing impaired, 43, 292, 302-304, 312, 314-333 limitations of, 318 mobility control, 312 noise reduction, 331-333 reading machines for blind, 349 research and development efforts, 313-314 sound/speech spectrograph, 319, 325, 349 with speech/language disabilities, 311, 313, 325 speech recognition, 275-279 speech processing for sightless people, 279, 313, 329, 333-335, 349 speechreading cues, 320-321, 325, 327, 328 tactile sensory aids, 314, 324-328 talking books, 333 Telephone Relay Services (TRS), 292, 302-303, 322 teletypewriters, 323

Page 528 Assistive technology for disabled persons (cont'd) Terminal Device for the Deaf (TDD), 302, 314, 322 text telephone, 322, 323, 324 use trends, 312-313 visible speech translator, 319-320 visual sensory aids, 319-324 voice control, 278-279, 313, 337 voice output devices, 334, 336 ATR International, 9, 10, 42, 83, 108-109, 115, 119, 128, 130, 176, 513 AT&T 800 Speech Recognition service, 298 articulatory models, 124-125 Directory Assistance Call Completion, 292, 301-302 control of network fraud, 291 government funding, 349 hidden Markov models, 175 Hobbit chip, 511-512 HuMaNet, 454 Intelligent Network, 292, 298 operator services automation, 291, 292, 293 packet data network [XUNET], 99 speech synthesis technology, 107-108, 112, 124-125 spoken language translation, 9, 10, 130 Telephone Relay Services (TRS), 292, 302-303 telephone speech database, 407 Terminal Device for the Deaf (TDD), 302 text-to-speech system, 348-349 voice dialing system, 300, 383-385 Voice English-Spanish Translation (VEST), 10 Voice Interactive Phone, 292, 300-301 voice processing vision, 285-286 Voice Prompter, 292 Voice Recognition Call Processing (VRCP), 292, 293-295, 383-385 voice response systems market, 281, 282 Who's Calling service, 282 wordspotting techniques, 305 Auditory modeling, 24, 26, 91, 92, 94, 97 B Bandwidth compression, 81 Basilar membrane filtering, 97 Bell, Alexander Graham, 77-78 Bell Atlantic, 291 Bell Mobility (BM), 302 Bell Northern Research (BNR), 176, 282, 283, 292, 293, 294, 295, 299, 383-386, 437, 438, 439 Bell System, 6 Bellcore, 291-293 Bigram models, 201, 209, 211, 213, 214, 222 Bit rates and image processing, 101 speech coding and, 23, 24, 81, 83-84 text-to-speech synthesis, 29, 77 Bolt, Beranek, and Newman (BBN) Systems and Technologies ATIS, 46, 261 Delphi system, 259 directory service, 438 hidden Markov models, 175 N-best filtering and rescoring, 267 word lattice parsing, 265 "Break index" data, 147, 148 C C cross compiler, 399-400 Cambridge University, 176 Carnegie-Mellon University (CMU). See also Airline Travel Information System ATIS, 46, 261 dialogue state information, 229

Page 529 Carnegie-Mellon University (cont'd) HMM applications in speech recognition, 175 multilingual systems, 42, 83 Phoenix system, 258, 259, 260 recursive transition networks, 222 spoken language translation, 9 Cepstrum techniques, 28, 86, 178, 182-183, 476 Chaos, 21, 26 Classification and decision tree techniques, 152 Classification and regression tree techniques, 147 CNET, 130 COCOSDA group, 130 Coding. See Linear predictive coding; Music coding; Speech coding Compact disc technology, 334 Compound words, 142, 147 Compression algorithms, 83, 381 bandwidth, 81 image, 99 speech, 23, 83, 474 two-channel amplification, 332 Computation models of language, 78, 81, 86, 90-91 of pronunciation, 139 research needs, 30 speech recognition systems, 30 speech synthesis, 137 speed, 19-20, 97 teraflop capability, 97 Viterbi algorithm, 173 Computer-aided tools, 21, 510 Computer Speech and Language, 2-3 Consonants alveolar flapped, 142 clusters, 138, 140 modeling, 123 Consortium for Lexical Research, 241 Context-oriented clustering, 126 Corpora Airline Travel Information System, 61, 184-185, 219, 250, 256, 257-258, 491 American English, 489, 495 annotated, 493, 494-495 Brown, 489, 495, 499 common speech, 181-182 connected digit, 184-185 IBM/Lancaster Treebank, 495 large linguistic, 447 Penn Treebank, 241, 491, 495 Resource Management, 181-182, 184, 185, 188, 376, 377 optimization, 113 telephone speech, 408-409 Wall Street Journal, 184-185, 186, 187 Creak (vocal), 122 CRIM, 176 Cross-word effects, 182 CSELT, 176 CSTR, 130 Currency, pronunciation of, 143 Cybernetics, 445-446, 448-449 D Databases.
See also Corpora algorithms, 405-409 for applications, 406-408 dialect considerations, 409 interfaces, 240, 252 large tagged, 152 natural language interfaces, 240 NTIMIT, 409 Official Airline Guide, 46, 219 relational, 53-54 for research, 405-406 remote access to, 42, 44, 278, 296-299, 348, 349, 351 retrieval system, product quality, 57 simulated telephone lines, 408-409 speech, 387, 405, 407, 468, 472 StockTalk, 383-386, 437, 438, 439 WordNet, 499 DEC, 130 Decision criteria, 305

Page 530 Defense Advanced Research Projects Agency. See Advanced Research Projects Agency Deployment of applications degree of difficulty and, 375-386 hardware considerations, 381, 382-383 language understanding task dimensions and, 379-381 military technology transfer, 367-369 obstacles to, 374-375 procedure for, 386-388 speech recognition task dimension and, 377-379 speech synthesis task dimensions and, 381-382 system integration requirements, 383 technical challenges in, 280-281 Desert Storm, 360 Dialogue capabilities, 85, 403-405 clarification/confirmation, 56, 62-63 continuous speech, 431-432 convergence of styles, 60 conversational dynamics, 431-432 engineering constraints, 387-388 feedback and confirmation, 437-438 finite state transition network, 63, 85 flow, 435-436 grammars, 62, 63 interaction and, 61-63 models, 62-63 natural language, 17, 56, 61-63 quantity of text and, 381 real-time processing function, 403-404 research, 63, 66 robustness of, 66 speech recognition, 63 spoken language systems, 47, 60, 61-63, 66, 229 talk-over, 431 task-specific voice control, 452 transcript, 433-434 Dictation devices, automatic, 50, 77, 81, 426, 428, 437-438. See also Text, typewriters Digital encryption, 83 speech coding, 25, 82-83, 85 filtering, 19 telephone answering machines, 7-8 Digital computers. 
See also Digital signal processors and speech signal processing, 19, 78, 81, 189, 393-396 and microelectronics, 19-21, 81 Digital-to-analog converter, 23, 398 Digital signal processors/processing applications, 350, 400-401 capabilities, 391, 393-394 development environment, 399-405 distributed control of, 404-405 floating-point, 383, 394-396 growth of, 19, 78, 81 integer, 383 for LSP synthesis, 398 mechanisms, 393 microphone arrays, 97 technology status, 393-396 transputer architecture, 396, 397 workstation requirements, 189 Digitizing pens, 52 Diplophonia, 122 Discourse natural language processing, 246 and prosodic marking, 149-151 speech analysis, 145, 149-151 in spoken language systems, 227-230 in text-to-speech systems, 145 Dragon Systems, Inc., 176, 380, 401, 402 Dynamic grammar networks, 265-266 Dynamic time warping (DTW), 28

Page 531 E Electronic mail (e-mail), 8, 306, 381 ESPRIT/Polyglot project, 123, 129, 130, 406 Etymology proper name estimates, 92 trigram statistics, 141 Experiments capabilities, 32 real-time, 32 research cycle, 183-184 Extralinguistic sounds, 122 F Fallside, Frank, 1-3, 445-446 Fast Fourier Transform (FFT), 28, 84, 475 FAX machines, 5 Feature extraction, 177-178 delta, 182-183 vectors, 182-183 Federal Aviation Administration, 365-366, 509 Federal Bureau of Investigation, 367 Fiber optics, 6 Filter bank outputs, 28, 475 Filters/filtering adaptive, 332, 414, 456-457 basilar membrane, 97 digital, 19 high-pass, 332 language understanding component for, 22 linear time-varying, 477-478 N-best, 227, 267 transverse, 415 Flex-Word, 292 Fluid dynamics, principles in speech production, 87-90 Force feedback glove, 98, 101 Foreign language. See also Multilingual systems; Spoken language translation learning, 44 word incorporation in text-to-speech systems, 138 Formants, 122-123, 125 Fractals, 21, 26 Frequency-domain representation, 24, 476 G Gestural inputs, 65 Government. See Military and government applications Grammars ambiguity, 380 bigram, 179 combinatory categorical, 490 context-free, 264, 461, 490, 491-494 covering, 493 dialogue, 62, 63 dynamic grammar networks, 265-266 feature-value structures in, 264 finite-state, 266, 379-380 formalisms, 490 hand-coded linguistic, 483 lexicalized, 490 lexicalized tree-adjoining, 490 Markov, 179-180 modeling, 28, 63 natural language understanding and, 37-38, 264, 380, 491-494 perplexity, 180, 185, 229, 378 probabilistic context-free, 491-494 size, 37-38 speech analysis and, 28, 36-38 speech recognition, 36-37, 41-42, 63, 81, 85-86, 179-180, 185-186, 265-266 statistical n-gram, 183, 224 training speech, 179-180, 185-186 trigram, 141, 179-180, 183 unification, 461 Graphical user-interface. See also User interfaces

Page 532 Graphical user-interface (cont'd) growth of, 108 guidelines, 66-67 hierarchical menu structure, 54 speech compared with, 54-55 strengths, 52-53 weaknesses, 53, 57-58 H Handwriting recognition, 402-403 screen-based channel, 64 Hardware technology. See also Digital signal processors; Microcomputers; Personal computers; Workstations advances in, 391 CISC architecture, 392, 392 Hobbit chip, 511-512 Intel x86 series, 392 microprocessors, 383, 391, 392-393, 396 Motorola 68000 series, 392 RISC chips, 383, 392-393 speech-processing equipment and systems, 383, 396-405, 510-511 V810 multimedia processing chip, 511-512 Health Interview Survey on Assistive Devices, 312 Hidden Markov models (HMM) bigram, 201, 211, 213, 214 defined, 171-173 estimation of statistical parameters of, 199, 202-208 feature extraction, 177-178 fenonic case, 207 grammar-state-transition table, 266 limitations of, 189-190 Markov chains, 170-171, 172 and mel-frequency cepstral coefficients, 178 neural nets combined with, 193-194 part-of-speech tagging, 487-488, 490 phonetic, 166, 173-175, 178-179, 182, 188 and semantics, 221 speaker recognition systems, 30, 85 speech recognition, 28, 30, 85, 170-175, 177-178, 199, 200-208, 377, 394, 396, 397, 478-479 speech variability and, 28, 415-416 and talker verification, 86 three-state, 172 theory development, 175 training and analysis, 30, 178-179, 181-182, 478-479 trellis representation, 203, 208, 212 trigram, 201-202, 212, 213-214 unigram, 210 Viterbi algorithm and, 210 word models, 179 wordspotting, 397 Human-human communication conversational dynamics, 431-432 language imitation, 60 repair rates, 260 studies, 50-51 I IBM, 9, 175, 349, 380, 495 Image compression, 99 Image processing, 78, 101 Information processing in auditory systems, 91, 94 speech technologies, 453 Information retrieval, 54-55, 57 INFOVOX, 130 Institute for Defense Analyses, 175, 234-235 Institute for Perception Research, 127 INTELLECT, 57 Integrated Services Digital Network (ISDN), 84

Page 533 Interaction. See also User interfaces acoustic, 122 and dialogue, 61-63 failures, cost of, 426-427 large-vocabulary conversational, 101-102 natural language, 51-57, 58 speech recognition, 36 spoken language systems, 51-57, 60, 61-63 system requirements for voice communications applications, 383 Intonation contours, 45 cues, 432 parts-of-speech distinctions and, 151 structures, 129 models, 127 J Joysticks, 52 K Karlsruhe University, 83 Klatt, Dennis, 111, 123 Kratzenstein's acoustic resonators, 78, 80 Kurzweil Applied Intelligence, 380 L Language acquisition, theory of, 2 generation, 38, 241 imitation, 60 processing, 239; see also Natural language processing variability, 380 Language modeling. See also Natural language bigram, 201, 209, 211, 213, 214, 222, 461 computational, 78, 81, 86, 90-91 etymology estimates for proper names, 92 future of, 307 research needs, 26, 29 speech recognition, 29, 81-82, 90-91, 168-169, 183, 263, 307 speech synthesis, 128 statistical, 263-264, 461, 472-473 trigram, 92, 183, 209-210, 212, 213-214, 461 by users, 60 Laryngalization, 122 Law enforcement, 367 Lexicons, 138, 140, 141-142, 178-179, 188, 296, 499 LIMSI, 176 Linear predictive coding analysis by synthesis, 24, 26-27, 119 mapping code book, 128 code-excited (CELP), 24, 26, 83, 101 mixed-excitation (MELP), 24 multipulse excited (MPLPC), 24, 26 pitch-excited, 24 robustness of, 97 self-excited (SEV), 24, 26 and speech analysis, 575 Linguistic analysis, 59-60, 259, 263, 382, 461, 484 Linguistics. See also Parsing; Semantics; Syntax after-thoughts, 256 consonant cluster, 138 discourse-level effects, 149-151 English lexical stress system, 141-142 letter-to-sound relationships, 138, 140-141 metonymy, 256-257, 257-258 morphonemics, 141-142 orthographic conventions, 142-143

Page 534 Linguistics (cont'd) parts-of-speech assignment, 143, 151 prosodic marking, 145-149 spontaneous speech, 255-258 vocalic suffixes, 139 word-level analysis, 138-139 M Machine translation, 240 Markov chains, 170-171, 172 Masking, time and frequency, 84, 93-94, 177-178 Massachusetts Institute of Technology articulatory models, 124 ATIS, 46, 227, 261 HMMs for speech recognition, 176 MITalk, 123 multilingual synthesis, 130 speech synthesis, 111, 123, 124 TINA language understanding system, 222, 223, 259 Matsushita, 130 MCI, 300 Mel-frequency cepstral coefficients (MFCC), 178, 182-183 Message processing, 241, 251 Microcomputers. See also Personal computers computation speed, 19-20, 97 device density, 20 digital signal processing, 19 projected advances in, 102-103 speech processing and, 19-20, 81, 396-399 Microelectronics chip densities, 102 digital computation and, 19-21 research, 21 revolution, 108 speech signal processing, 19-20 Microphones applications, 86-87, 102 autodirective arrays, 86-89, 96, 97, 99-100, 102 beamforming systems, 87, 88, 99 characteristics, 414 digital signal processors, 97 directional, 333, 414-415 electret, 87, 88, 97, 102 environmental variation in speech input, 412-413, 460 in hearing aids, 331-332, 333 noise reduction, 331-332, 414-415 reflection and reverberation, 414 speaker distance from, 414 and speech recognition, 379, 414 technology projections, 102 three-dimensional, 96, 97, 99-100 track-while-scan mode, 87, 89 Microsoft Windows, 52 Military and government applications.
See also Advanced Research Projects Agency; other government agencies Agent's Computer, 367 Air Force, 359, 365 air traffic control, 365-366 aircraft carrier flight deck control and information management, 363 Army, 359, 360-363 combat team tactical training, 364-365, 366 Command and Control on the Move (C2OTM), 360-361 law enforcement, 367 Multi-Role Fighter, 365 Navy, 363-365 Pilot's Associate system, 365 Soldier's Computer, 360, 361-362, 367 SONAR supervisor command and control, 363-364 technology transfer issues, 367-369 voice control of systems, 360, 362, 365

Page 535 Mixed-mode communication. See Multimodal systems Models/modeling. See also Hidden Markov models; Language modeling acoustic, 26, 36, 64, 85, 95, 117, 122, 182-183, 476 allophone, 182 articulation, 88, 95, 117, 118, 120, 122, 124-125, 152-153 auditory, 24, 26, 91, 92, 94, 97 bigram, 201, 209, 211, 213, 214 computational, 78, 81, 86, 90-91 consonants, 123 context-dependent, 182, 246 cross-word effects, 182 dialogue, 62-63 grammar, 28, 63, 380 intonation, 127 Klatt, 123 left-to-right, 175 natural language understanding, 238-253, 262-264 noise excitation, 122 phonetic, 173-174, 190-191, 193 prosody, 117 segmental, 125, 173-174, 190-191, 193 signal, 19, 101 sinusoidal, 24 sound source, 462 source/system, 22, 118, 120-122 speech perception, 26 speech production, 22 speech recognition requirements, 168-169 speech synthesis, 109, 116-130 speech variability, 176 spoken language systems, 48 stochastic segment, 190-191 trigram, 201-202 vocal tract, 95, 118, 122, 124, 125 wave propagation, 26 word, 179, 207 Modulation theory, 26 Morphemes, 137, 139, 140 Morphology, speech synthesis, 110, 111, 112, 113, 137, 141-142, 489 Morphs, 138-139, 140 Motorola, 383, 392 Mouse, 52, 350-351, 402-403 Multilingual systems. See also Foreign language; Spoken language translation; Telephony future of, 513-514 INTERTALKER, 513-514 Japanese kana-kanji preprocessor, 403 MITalk, 130 PIVOT, 512-513 speech synthesis, 42, 101, 117, 129-130, 151-152 Multimodal systems. See also User interfaces advantages of, 426 error avoidance, 64 error correction, 64 HuMaNet, 454 referent determination difficulties, 61 robustness, 64 situational and user variation, 64-65 synergistic integration of sensory modalities, 100-101, 102 user interfaces, 32, 56, 63-65 Multiprocessing, 21 Music coding, 84 N N-Best interface, 217, 221, 226, 233 N-Best Paradigm, 191, 193 National Institute of Standards and Technology, 377 Natural language. See also Speech recognition; Spoken language anaphora, 55 dialogue, 17, 56, 61-63 interaction, 51-57, 58

Page 538 Phoneme conversion to acoustic events, 429 intelligibility, 411 recognition systems, 182 sequences, 138, 175, 461-462 Phonetics acoustic, 85, 95 hidden Markov models, 166, 173-175, 178-179, 182, 188 segmental models, 125, 173-174, 190-191, 193 and speech recognition, 167, 169-170, 188 in training speech, 30, 178-179, 182-183 text-to-speech synthesis, 85, 125, 174 typewriter, 511 Pierce, John, 283 Pitch-synchronous overlap-add approach (PSOLA), 114, 119-120, 128-129 Pitch-synchronous analysis, 127 Pragmatic structure, 144, 246, 150, 246, 250 Pronunciation abbreviations and symbols, 142-143 computational, 139 numbers and currency, 143, 288 part of speech and, 144 speech recognition, 44 surnames/proper names, 140-141, 288 symbols, 142-143 Proper names, 92, 288, 458, 484 Prosodic phenomena algorithm, 146, 147, 151 articulation as a basis for, 152-153 and conversational dynamics, 431 defined, 144, 145-146 discourse-level effects, 149-151 marking, 144, 145-149 modeling, 117 multiword compounds, 147 in natural language processing, 268-269 parsing and, 56, 144, 146-147 pauses, 431 phrasing, 146-147, 151 PSOLA technique for modifying, 128-129 in speech synthesis, 88, 117, 119, 124-125, 128-129, 145-149, 288-289 and speech quality, 88, 118, 288-289 Psychoacoustic behavior, 78, 91, 94 Pulse Code Modulation, adaptive differential (ADPCM), 24, 82-83, 101 Q Quasi-frequency analysis, 177 Query language, artificial, 57 R Rabiner, Lawrence, 111, 113 Recursive transition networks (RTNs), 222 Repeaters, electromechanical, 81 Resonators, 78, 80 Research methodology, spoken language vs. typed language, 47-48 Robust processing techniques, 259-260, 263 Robustness algorithms, 391, 392, 405, 412-416 ATIS system, 262 case frames and, 258 classification of factors in, 412-413 dialogue systems, 66 environmental variation in speech input and, 412-414 lexical stress system, 142

Page 539 Robustness (cont'd) linear predictive coding, 97 multimodal systems, 64 natural language systems, 56, 59, 262 noise considerations, 413 research, 417-418 speaker variation and, 415-416 speech analysis, 97 speech recognition systems, 29-30, 44, 184, 261-262, 459-460 speech synthesis, 139 speech variation and, 413 spoken-language understanding systems, 66, 258-259 templates and, 258-259 user interfaces, 56 word error rates, 182-183, 184, 185-186 Royal Institute of Technology (KTH), 122, 123, 124, 125, 129 Rutgers University, CAIP Center, 98, 99 S Sampled-data theory, 78, 81 Security applications speaker verification, 9, 30, 86, 300, 305 low bit-rate coding for transmission, 7 Semantics ambiguity, 380 compositional, 486 First-Order Predicate Logic, 245-246 lexical, 486, 495-500 natural language, 245-246, 247, 250 pragmatics and, 144 propositional logic, 245 and speech recognition, 305-306 and spoken-language understanding, 220-221 Sensimetrics Corporation, 123 Siemens A. G., 9, 42, 83 Signal modeling techniques, 19, 101 Signal processing digital, 19, 97 enhancement, 102 research, 21 Sinusoidal models, 24 Software technologies, 391 Sound generation, 118, 119, 124 source model, 462 Sound Pattern of English, 126 Sound/speech spectrograph, 319, 325, 349 Source-filter decomposition, 128 Speak 'N Spell, 110 Speaker adaptation, 459, 460 atypical, 187-188 dependence, 36 recognition/identification, 9, 30, 85, 348 style shifting, 460, 461 variation, 415-416 verification, 9, 30, 86, 300, 305 Speaking characteristics and styles, 128-129, 378-379 Spectrum analysis, 19 Speech behaviors, conversational, 430-432 casual informal conversational, 82 compression, 23, 83, 474 connected, 97 continuous, 36, 78, 95, 323, 427-428, 430-431 constraints on, 77, 268-269 databases, 405, 407-409, 468 dialect, 409 digitized, 38, 45, 189, 428 dysarthric speech, 337 gender differences, 129 information processing technologies, 453 interactive, 36 intonation, 45, 127, 129, 432 knowledge about, 117 machine-generated, 335

Page 540 Speech (cont'd) noninteractive, 48 pause insertion strategies, 129 perception models, 26 preprocessor, 403 production, 21-22, 26, 77, 87-90, 137-138 prolongation of sounds, 322 psychological and physiological research, 462 self-correction, 256, 432 signal processing systems, 19 slips of the tongue, 257 spontaneous, 58-59, 185, 255-260, 303, 460, 461, 469-471 standard model of, 267 synthetic, 428-428; see also Speech synthesis; Speech synthesizers toll quality, 23, 24 training, 322, 325 type, 36 ungrammatical, 257 units of, 168-170, 462-463 variability, 28, 176, 378, 413, 459-460, 480 waveforms, 24, 136, 137 Speech analysis acoustic modeling, 26 analysis-by-synthesis method, 26-27 auditory modeling, 26 defined, 22 dimensions, 36-38 importance, 21 interactivity, 36 language modeling, 26 linear predictive coding, 24 robustness, 97 speech continuity, 36 speech type, 36 vocabulary and grammar, 28, 36-38 vocal tract representation in, 90, 91 Speech coding, 26 applications, 82-83 articulatory-model-based, 125 audio perception factors in, 84, 85 in cochlear implants, 331 concatenation using speech waveforms, 117 bit rates and, 23, 24, 81, 83-84 digital, 25, 82-83, 85 and masking, 84, 93 predictive, 117 psychoacoustic factors in, 101 research challenges in, 76 rule-based diphone system, 118 stereo coding, 84-85 technology status, 82-85, 281 terminal analog, 118 wideband audio signals, 84 Speech processing algorithms, 21, 393 articulatory and perceptual constraints in, 461-463 digital, 22-23, 76 equipment and systems, 19-20, 81, 396-399 evaluation methods, 463-464 in hearing aids, 317 and natural language processing, 460-461 obstacles to, 373 research challenges, 76-77 psychoacoustic behavior and, 94 for sightless people, 333-335 and speech technology development, 76, 78 Speech recognition accuracy, 28, 37, 41, 46-47, 86, 159, 181-189, 377, 378, 470, 473 acoustic modeling, 64, 182-183 adverse conditions, 459-460 algorithms, 28, 409-411, 412, 417-418, 469 alternative 
models, 189-193 analysis-by-synthesis, 30 applications, 28-29, 30-32, 81, 275-282, 283-284, 318, 377-379, 451, 457, 458, 471, 508-510

Page 541 Speech recognition (cont'd) articulation and, 152-153 assessment techniques, 410-411, 463-464 "barge in" (interruption of conversation) and, 277, 287, 292, 295, 298-299, 388, 404 common speech corpora, 181-182 complexity, 17 connected digit corpus, 184-185 continuous speech, 78, 165-194, 323, 471, 506 defined, 7, 239, 348 decision criteria, 305 decoding, 209-214 dialogue grammar approach models, 63 dimensions of task difficulty, 376, 377-379 domain independent (DI), 187 dynamic grammar networks, 265-266 dynamic programming matching, 509 environmental factors, 413-414 error correction, 64, 261-262, 388 feature extraction, 177-178, 180 Flexible Vocabulary Recognition, 295 future, 307-309, 456-459 generalization, 479 Hidden Markov models and, 28, 30, 85, 170-175, 177-178, 199, 200-208, 377, 397, 478 historical overview, 175-176 improvements in performance, 181-184, 388 interactivity, 36 language modeling, 29, 81-82, 90-91, 168-169, 183, 263 large-vocabulary systems, 183, 193, 277, 292, 506 lip reading, 64 linguistic rules, 82 market for technology, 350-351, 416-417 microphones and, 305, 414 most likely path, 208-209 most likely word sequence, 209-214 N-best filtering or rescoring, 267 natural language and, 17, 262-267, 388 naturalness, 45, 153 neural networks, 191-193 new words, 188-189 noise immunity and channel equalization, 288, 305, 379, 388, 414-415, 469, 473 normalization of speakers in, 30, 456-457, 459, 460 pattern matching, 474, 478-479 perplexity of language model and, 37, 180, 185, 229, 378, 463 phonetics and, 167, 169-170, 188, 410 processes, 167-168, 180-181, 199, 451, 453-454, 473-474 prototype systems, 34 real-time, 189 rejection of irrelevant input, 287, 388 and repetitive stress injuries, 43 research challenges, 29-30, 44, 76, 108, 183-184, 304-306, 417-418 robustness, 29-30, 44, 184, 261-262, 459-460, 473, 474 sample performance figures, 184-185 search algorithms, 180-181, 248, 264-265 segmental models, 190-191, 473-474 sheep and goats phenomenon, 456 speaker-adaptive, 36, 187-188, 288, 388, 479 speaking characteristics and styles and, 128, 377, 378-379, 415-416, 460

Page 542

Speech recognition (cont'd)

speaker-dependent, 28, 36, 54, 186-187, 292, 509-510

speaker expertise and, 378

speaker-independent, 28, 36, 37, 46, 184, 186-187, 188, 362-363, 378, 397, 425, 433-434, 506, 507

spontaneous speech and, 58-59, 185, 460, 461, 469, 471

SR-1000 system, 507

SR-3200 system, 507

subword units, 287-288, 299, 388

successful systems, 239

system structure, 27-28, 398, 401, 402

talker verification, 86

task completion rate, 410

technology status, 8-9, 18, 81, 85-86, 112-113, 159-164, 165-166, 181-189, 286-288, 428, 468

templates, 258-259, 425

terminal-type, 508-510

training data, 178-180, 185-186, 457, 459, 473, 478-479

transputer-based, 397

trials, 417

units of speech and, 168-170

user tolerance of errors and, 379

vocabulary and grammar and, 36-37, 41-42, 81, 85-86, 185-186, 265-266, 277, 378, 457

Wizard of Oz assessment technique, 410-411, 439

word lattice parsing, 265

wordspotting, 286-287, 292, 295, 298-299, 305, 387, 388, 397, 404

Speech research

computational models of language, 90-91

critical directions in, 87-101

historical background, 78-82

language modeling, 26

physics of speech generation, 87-90

unification of coding, synthesis, and recognition, 94-95, 97

Speech synthesis. See also Text-to-speech synthesis

acoustic models, 85, 95, 117, 122, 476

analysis-synthesis systems, 117, 118, 119, 125

applications, 30-32, 108, 109, 110, 278, 381-382

articulatory models, 88, 117, 118, 120, 124-125, 152-153, 476, 480

assessment of, 411-412

automatic learning, 127

concatenative, 110, 114, 117, 118-119, 126, 168, 406

concept-to-speech systems, 38-39

content, 45

control, 124, 118, 125-127

corpus-based optimization, 113

defined, 22, 109, 110, 116, 348

digitized speech, 22-23, 25, 38

dimensions of task difficulty, 381-382

discourse-level effects, 149-151

error rates, 112

evaluation of, 130

expectations of listeners, 382

flexibility needs, 117-118

fluid dynamics in, 89-90

formant-based terminal analog, 117, 118, 122-123, 125

forms, 38-39

frequency domain approach, 119

future of, 152-153, 455-456

higher-level parameters, 123-124

history of development, 111-115

individual voices, speaking styles, and accents and, 117-118

input, 109

intelligibility, 44-45, 129, 130, 149, 382, 429

large-vocabulary systems, 101-102, 351

Page 543

Speech synthesis (cont'd)

letter-to-sound rules, 140-141

linguistic aspects of, 135-153

market for, 351

microelectronics revolution and, 108

models, 109, 116-130

morphophonemics and lexical stress, 110, 111, 112, 113, 137, 141-142

multilingual, 42, 101, 117, 129-130, 151-152

natural speech coding and, 117, 128

naturalness, 129, 149, 381, 429, 456

noise sources, 122

and objective distortion metrics, 114-115

obstacles to, 117

orthographic conventions, 142-143

output, 118

parsing, 137, 139, 144-145

part-of-speech assignment, 143

phonetic HMM functions and, 174, 429

predictive coding, 117

process, 167-168, 135, 428-429, 453, 454, 479

prosody, 88, 117, 119, 124-125, 128-129, 145-149, 288-289

PSOLA (pitch-synchronous overlap-add approach), 114, 119-120, 128-129

quantity of text and, 381

real-time, 108

research, 25-26, 29-30, 44-45, 76, 108, 113-114, 128

rule-based, 111, 118, 125, 126-127, 140-145, 429

segmental, 113-114, 115, 125, 145, 479-480

sentence length and grammatical complexity, 382

sound generation, 118

source/system models, 22, 118, 120-121

speech quality, 130

structures and processes, 109-110

systematic optimization methods, 114

techniques, 118

text analysis, 110, 112, 113

technology status, 18, 29, 81, 85-86, 107-115, 411-412, 468

testing, 114-115

time functions, 111, 113, 118, 119, 476-478

variability of text and, 381-382

vocabulary, 119

vocal tract model, 95, 118, 122, 125

waveform concatenation (simple), 118-119, 383, 476

word-level analysis, 138-139

Speech synthesizers

acoustic terminal analog, 117

cartridge-type, 510

cascade, 122-123

future, 455-456

large-vocabulary, 349

neural network controller, 124

OVE, 123

parallel, 123, 125

terminal analog, 510

voice quality, 456

Speech technology. See also Deployment of applications

capabilities and limitations, 427-430

challenges in, 284, 471-475

commercial developments, 352-354

foundations, 77-78

growth of, 2

information processing, 453

market, 350-352, 416-418

projections, 101-102, 355-356

readiness evaluation, 440

research on, 65-67, 417-418

service trials, 417

status, 82-87

trends, 117

Page 544

Speech technology (cont'd)

voice input, 427-428

voice output, 428-429

Speech Technology Laboratory, 123

Speech transmission, low-bit-rate, 23, 24, 29, 77, 81, 83-84, 97, 474

Speech understanding, 17, 34, 37-38, 307, 379

Spoken language systems (SLS)

ARPA, 218-220

comparison of modalities, 46-58

constraints on, 227-230

defined, 38, 241

dialogue, 47, 60, 61-63, 66, 229

discourse in, 227-230

efficiency of language-based modalities, 48-51

error metrics, 224-225, 259

error recovery, 439

evaluation of, 230-233, 251

human factors obstacles to, 58-63

interaction, 51-57, 60, 61-63

interfacing speech and language, 221-224

linguistic analysis, 59-60, 259

mixed initiative, 228-229

N-best interface, 217, 221, 233

natural language, 51-57, 59-61

order in problem solving, 229

prototypes, 46-47, 438

reference, 227-228

robustness, 66, 259

simulation methods, 66

speaker-independent, 65

spontaneous speech and, 58-59, 234, 255-260, 427-428

SUNDIAL, 228-229

research methodology, 47-48

technology development, 81

training, 60, 260

typed language contrasted with, 47-51, 60

user adaptation to, 60

Spoken language translation

current capabilities, 9-10, 42

defined, 9-10

directory assistance, 295-296

laboratory systems, 9-10

projections, 102

VEST (Voice English-Spanish Translator), 10, 42

voice output, 29

Spoken language understanding, 47

approaches to, 220-221

defined, 255

error repair, 260

limits on, 379

process, 452, 453

progress in, 224-226

spontaneous speech and, 258-260

Sprint, 300

SQL, 57

SRI International, 52, 176, 213

ATIS, 46, 261

Gemini system, 259, 260

Template Matcher, 258, 259

Stenograph, 322, 335

Stereo coding, 84-85

StockTalk, 383-386, 437, 438, 439

Stored voice, 110

Subband coders, 24, 83, 101

SUNDIAL spoken language systems, 229

Surnames, pronunciation of, 140-141, 288

Symbols, pronunciation of, 142-143

Symbolic learning techniques, 501

Syntax, 137. See also Parsing

natural language processing system, 244-245, 247, 269

speech recognition systems, 305-306

and spoken language understanding, 220-221

Syntactico-semantic theory, 447

System technologies. See Hardware technology; Workstations

T

Tactile technology, 101, 324-328

Talker. See Speaker

Talking statues, 78, 79

Page 545

Technology transfer issues, 367-369

Telecommunications. See also Telephony

banking services, 292, 507

Baudot code, 323

conferencing, 101

cost-reduction applications, 290-291

digital speech coding, 82-83

information access from remote databases, 42, 44, 278, 296-299, 348, 349

interfaces, 397

market for speech technology, 290-304

personal communication networks and services, 306

predictions, 307-308

revenue opportunities in, 291-293

shaping user language, 60-61

speaker verification, 305

speech technology and, 7, 41-42, 285-286

technical challenges, 304-306

Telefónica, 9, 10, 298

Telegraph, 80-81

Telephony. See also Telecommunications

Automated Alternate Billing Services, 292, 293, 431

Automated Customer Name and Address, 302

automatic interpreting, 513-514

bandwidth conservation, 19

banking by phone, 283, 291, 398-399, 407-408, 425

cellular, 6, 7, 81, 83, 374, 383-385, 507-508

deaf user aids, 43, 302-304

digital channels, 101

directory assistance, 41, 278, 282, 283, 291, 292, 295-296, 301-302, 355-356, 438, 458

history, 81

language translation, 10, 42, 77, 81, 82, 83, 108-109, 513-514

operator services, 8-9, 277, 282, 284, 291, 292, 293-296, 351, 353-354, 374, 380, 383-385, 387

simulated telephone lines, 278, 408-409

speech databases, 407

speech recognition technology, 428

teleconferencing, 454-455

telephone relay service, 302-304, 322

text telephone, 322, 323

voice-controlled automated attendant, 356

voice-based dialers, 40, 292, 299-300, 355, 374, 376, 383-386, 436, 507-508

voice-interactive phone service, 292, 300-301, 351

Voice Recognition Call Processing (VRCP), 292, 293-295, 376, 383-385

TELECOM, 510, 513

Telephone answering machines, digital, 7-8

Texas Instruments (TI), 110, 176, 184-185, 291, 300, 349, 377, 407

Text analysis, 110, 112, 113

Text-to-speech synthesis. See also Speech analysis

acoustic phonetics and, 85

address, date, and number processing, 288

advances in, 288-289

algorithms, 25

applications, 43, 109, 280, 282, 302, 354, 451

articulatory synthesis in, 124-125

cartridge-type device, 510

components of, 38

constraints on speech production, 137-137

development tools, 126-127

discourse analysis in, 145

Page 546

Text-to-speech synthesis (cont'd)

error rate, 262

formant-based terminal analog, 122-123

future of, 152-153, 308

hardware requirement, 383

language modeling and, 26, 78, 90-91

linguistic analysis in, 382

multilingual, 42, 129, 397-398

naturalness, 381

output, 29

parsing, 144-145

part-of-speech assignment, 143

phonemic-based, 348

phonetic factors, 125

problems, 120, 303-304, 471

proper name pronunciation, 288

prosody, 288-289, 306

research challenges, 26, 304, 306, 324

rule system, 125

sound generation, 124

source models and, 120

speaker identity and normalization, 30

speaking characteristics and styles and, 128-129

structural framework, 136-137, 398

waveform approach, 24-25

word-level analysis, 138-139

Text preprocessors, 381-382

Time Assignment Speech Interpolation, 81

Tools. See Computer-aided tools

Touch screens, 50

Touch-Tone keypad, 335

Trackballs, 52

Training

natural language interactive systems, 56, 57, 58

neural nets, 193

shaping user language, 60-61

speech, 322

tactical, combat team, 364-365

Training speech [learning]

automatic, 263-264

databases for, 387, 405, 407, 468, 472

discriminative, 479

effects of, 185-186, 473

grammar, 179-180, 185-186

natural language processing, 56, 57, 58, 249, 250, 252, 263-264

phonetic HMMs and lexicon, 30, 178-179, 182-183

speech recognition, 178-180, 185-186, 457, 459, 473, 478-479

syntactico-semantic theory and, 447

Transatlantic radio telephone, 81

Transatlantic telegraph cables, 81

Transform coders, 24

Treebank Project, 241, 491, 495

Trigrams, 92, 183, 201-202, 209-210, 212, 213-214, 229

Triphones, 182

Turing's test, 35

Tuttle, Jerry O., 363

U

United Kingdom, Defense Research Agency, 365

University of Indiana, 130

University of Pennsylvania, 181, 241, 252, 491, 495

US West, 300-301

Usability/usefulness. See also Applications of voice communications

determinants of, 31-32

issues, 18, 30-32

pronunciation and, 44

voice input, 39-44

voice output, 44-45

User interfaces. See also Graphical user-interface

artificial query language, 57

capabilities and limitations, 51-52, 387, 427-430, 434

cost of interaction failures, 426-427

databases, 240, 252

Page 547

User interfaces (cont'd)

design strategies, 387, 423-424, 426, 433-440

dialogue flow, 435-436

direct manipulation, 51, 52-55, 57-58

error recovery, 438-440

evaluation of, 440

feedback and confirmation, 434, 437-438, 445

hierarchical, 454

information requirements of, 425-426

instructions, 438

keyboard dialogs, 49-50

metaphor, 54

multimodal systems, 32, 56, 63-65, 505, 508-510

N-best, 217, 221, 226, 233

natural language interaction, 55-57

personal computer, 511-512

prompts, 435-436, 471

research directions, 56, 511-512

revisions suggested, 435

robustness, 56

smart, 512-513

system capabilities, 429-430

task modalities, 426

task requirement considerations, 424-427

telecommunications, 397

training issues, 58

user expectations and expertise and, 430-432

voice-actuated, 360

voice input, 427-428

Users

conversational speech behaviors, 430-432

expectations and expertise, 430-432

language modeling by, 60

novices vs. experts, 432

satisfaction, 429-430

tolerance of speech recognition errors, 379

USS Ranger, 363

V

Vector quantization, 28

Verbal repair, 269

Videophones, 5-6

Virtual reality technology, 454-455

Visual sensory aids, 319-324

Vocabulary

algorithms, 307

confusability, 378

conversational, 101-102

Flexible Vocabulary Recognition, 295

large, 101-102, 183, 193, 277, 292, 307, 349, 351, 506

and natural language understanding, 37-38

operator services, 277

speech analysis and, 28, 36-38

speech recognition and, 36-37, 41-42, 81, 85-86, 183, 185-186, 193, 265-266, 277, 292, 378, 457, 506

speech synthesis, 101-102, 119, 349, 351

user-specific dictionaries, 335-336

wordspotting techniques, 292, 305

Vocal tract modeling, 95, 118, 122, 124, 125

Vocoder, 48, 81, 83, 119, 325

Voice

control, assistive, 278-279, 313, 337, 360, 452

conversion system, 128-129

dialog applications, 375-377

fundamental frequency, tactile display, 326-327

input, 39-44, 50, 427-428

mail, 7, 81, 83, 101, 110

messaging systems, 281

mimic, 94-95

output, 44-45, 428-429

response, 25

task-specific control, 452

typewriters, 97, 376, 380, 451

Page 548

Voice coding

algorithm standardization, 7

current capabilities, 7-8

defined, 7-8

research challenges, 306

security applications, 7

source models, 120-122

storage applications, 7-8

Voice communication, human-machine

advantages, 16, 48-51

art of, 387-388

current capabilities, 469

degree-of-difficulty considerations, 375-386

expectations for, 505-506

implementation issues, 18

natural language interaction, man-farm animal analogy, 16

process, 374

research and development issues, 511-513

research methodology, 47-48

role of, 34-67; see also Applications

scientific bases, 15-33

scientific research on, 65-67

simulations, 47-48, 50, 51

successful, 423

system elements, 17-18

and task efficiency, 48-49

transcript, 433-434

voice control, 337

VLSI technology and, 510-511

Voice processing

network-based, 292

market share, 281

research, 6

technology elements, 6-7

technology status, 467-468

telecommunications industry vision, 285-286

Voice synthesis

current capabilities, 8

defined, 8

output, 23, 17-18, 29

text-to-speech, 99

von Kempelen's talking machine, 78, 80

Vowel

clusters, 140

digraphs, 140

reduction, 129

VLSI technology, 468, 510-511

W

Wave propagation, 26

Waveform coding techniques. See also Speech coding

adaptive differential PCM (ADPCM), 24

speech synthesis, 118, 119, 136, 137, 381, 474

Wavelets, 21

Wideband audio signals, 84

Windows, 52, 350, 353

Wizard of Oz (WOZ) assessment technique, 410-411, 439

Word-level analysis, 138-139

Word models, 179, 207

Word processors, speech-only, 50

Word recognition systems, 182, 188

Workstations

Hewlett-Packard 735 RISC chips in, 393

Silicon Graphics Indigo R3000, 189

speech input/output operating systems, 401-403

speech processing board, 397

Sun SparcStation 2, 189

Wheatstone, Charles, 80

X

Xerox, 52

Z

Zipf's law, 489