11. Markov Models for Speech Recognition
Pages 217-234


From page 217...
... Sustained research efforts have resulted in systems that place substantially fewer restraints on the speaker. Earlier recognition systems typically required that the words spoken belong to a fixed small vocabulary, that the speaker pause between each word, and that the speaker participate in a training period during which the system would automatically adjust to that particular speaker (and no other)
From page 218...
... Even finding the boundaries between two words can be a difficult task, since in the speech signal, phonemes as well as words may merge together and no simple signal processing can separate them. A spoken phrase like "this is" can easily be interpreted as "the sis" (or even by some graduate students as "thesis").
From page 219...
... The speech signal associated with a word or a phoneme is extremely variable, and can vary greatly depending on both the speaker's identity and the manner in which the word was spoken. Variability is caused by anatomical differences between speakers, such as sex or vocal tract length, as well as by differences in style, health (presence or absence of a cold)
From page 220...
... Automated systems that allow large vocabularies and employ a grammar whose perplexity is close to the perplexity of spoken English can fairly be said to handle natural tasks. 11.3 Some Recent Systems All speech recognition systems require restricted input to achieve good accuracy (around one word in twenty wrong)
From page 221...
... For the remainder of this discussion, it is assumed that the speech signal is already described by a series of acoustic labels, each of which belongs to a small, fixed set. 11.5 Probabilistic Recognition The most successful approaches to the speech recognition problem use probabilistic modeling.
From page 222...
... similar to the SPHINX baseline system [14], which formed the basis for the SPHINX system, a successful large-vocabulary, continuous speech, speaker-independent recognition system.
From page 223...
... , the class of such models is highly restricted and may not be particularly useful. 11.7 Modeling of Speech The recognition system we describe uses phoneme models as an intermediate stage between acoustic labels and words.
From page 224...
... Combining this with the value for P(W = w) yields P(W = w, Y = y). 11.8 Hidden Markov Modeling First, we introduce hidden Markov models (HMMs), and then describe their use in speech recognition.
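The combination step mentioned above can be written out as a short worked equation. This is a sketch under the assumption that the quantity being combined with P(W = w) is the conditional probability of the acoustic labels given the word string, which the excerpt does not show; the estimate symbol \hat{w} is introduced here for illustration only.

```latex
% Joint probability of word string w and acoustic label string y
% (assumes the omitted term is the conditional P(Y = y | W = w)):
P(W = w,\, Y = y) \;=\; P(W = w)\, P(Y = y \mid W = w)

% A recognizer can then pick the word string with the largest joint
% probability, since P(Y = y) does not depend on w:
\hat{w} \;=\; \arg\max_{w} P(W = w \mid Y = y)
        \;=\; \arg\max_{w} P(W = w,\, Y = y)
```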
From page 225...
... Hidden Markov models possess desirable properties. The observed R.V.s (Y)
From page 226...
... is formed by using the values of the probabilities [...] and [...] as specified by appropriate phoneme models. The manner in which this is done is called "instantiation," and is described below.
From page 227...
... It is a distribution on strings of acoustic labels and hidden states. As indicated previously, any string of hidden states x is associated with a unique word string.
From page 228...
... cannot be done in a manner consistent with the assumptions implicit in the phoneme models. Most current systems estimate parameters using training data consisting of utterances of whole known sentences.
From page 229...
... be > .8 because a string of length 22 would otherwise be quite unlikely. One of the reasons for the success of HMMs is the existence of a computationally efficient method for approximate maximum likelihood parameter estimation (as opposed to the completely ad hoc estimation above)
From page 230...
... One iteration will be conducted for each HMM sentence model, then the estimated parameters
From page 231...
... The most likely string of hidden states, for small vocabulary and simple grammar systems, can be found by a simple dynamic programming [6] scheme called the Viterbi algorithm [2].
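The dynamic programming idea behind the Viterbi algorithm can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes a textbook HMM in which each hidden state emits one acoustic label, and the two states, two labels, and all probabilities below are invented toy values.

```python
import math

def viterbi(labels, states, log_init, log_trans, log_emit):
    """Return (log-probability, path) of the most likely hidden state string."""
    # best[s]: log-probability of the best partial path that ends in state s
    best = {s: log_init[s] + log_emit[s][labels[0]] for s in states}
    backptrs = []  # one dict per later time step: state -> best predecessor
    for label in labels[1:]:
        prev_best, best, ptr = best, {}, {}
        for s in states:
            # choose the predecessor that maximizes the score of arriving in s
            p = max(states, key=lambda r: prev_best[r] + log_trans[r][s])
            best[s] = prev_best[p] + log_trans[p][s] + log_emit[s][label]
            ptr[s] = p
        backptrs.append(ptr)
    # trace back from the best final state to recover the full path
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return best[last], path

def logs(table):
    """Convert a dict of probabilities to a dict of log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}

# Hypothetical toy model (all values are illustrative assumptions).
states = ["A", "B"]
log_init = logs({"A": 0.6, "B": 0.4})
log_trans = {"A": logs({"A": 0.7, "B": 0.3}), "B": logs({"A": 0.4, "B": 0.6})}
log_emit = {"A": logs({"x": 0.9, "y": 0.1}), "B": logs({"x": 0.2, "y": 0.8})}

log_prob, path = viterbi(["x", "y", "y"], states, log_init, log_trans, log_emit)
print(path, math.exp(log_prob))
```

Working in log-probabilities avoids numerical underflow on long label strings. The cost grows with the number of labels times the square of the number of states, which is consistent with the excerpt's remark that the scheme suits small vocabulary and simple grammar systems.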
From page 232...
... As a result, various fascinating and extremely challenging subproblems can be approached by single researchers on current generation workstations. One such problem is the speaker-independent recognition of phonemes in continuous speech; another is the recognition of connected digits.
From page 233...
... thesis, Computer Science Department, Carnegie Mellon University, Pittsburgh, 1988.
From page 234...
... K Soong, High performance connected digit recognition using hidden Markov models, IEEE Int.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.