Voice Communication Between Humans and Machines (1994)

New Trends in Natural Language Processing: Statistical Natural Language Processing

Mitchell Marcus

SUMMARY

The field of natural language processing (NLP) has seen a dramatic shift in both research direction and methodology in the past several years. In the past, most work in computational linguistics tended to focus on purely symbolic methods. Recently, more and more work is shifting toward hybrid methods that combine new empirical corpus-based methods, including the use of probabilistic and information-theoretic techniques, with traditional symbolic methods. This work is made possible by the recent availability of linguistic databases that add rich linguistic annotation to corpora of natural language text. Already, these methods have led to a dramatic improvement in the performance of a variety of NLP systems with similar improvement likely in the coming years. This paper focuses on these trends, surveying in particular three areas of recent progress: part-of-speech tagging, stochastic parsing, and lexical semantics.

SOME LIMITATIONS OF RULE-BASED NLP

Until about 3 or 4 years ago, all natural language processing (NLP) systems were entirely hand constructed, with grammars and semantic components made up of many carefully handcrafted rules. Often the target coverage of such systems was based on a small set of exemplar sentences; many such systems were originally developed on fewer than several hundred examples. While these systems were able to provide adequate performance in interactive tasks with typed input, their success was heavily dependent on the almost magical ability of users to quickly adapt to the limitations of the interface.

The situation is quite different, however, when these rule sets are applied open loop to naturally occurring language sources such as newspaper texts, maintenance manuals, or even transcribed naturally occurring speech. It now appears unlikely that hand-coded linguistic grammars capable of accurately parsing unconstrained texts can be developed in the near term. In an informal study conducted during 1990 (Black et al. 1992b), short sentences of 13 words or less taken from the Associated Press (AP) newswire were submitted to a range of the very best parsers in the United States, parsers expressly developed to handle text from natural sources. None of these parsers did very well; the majority failed on more than 60 percent of the test sentences, where the task was to find the one correct parse for each sentence in the test set. Another well-known system was tested by its developer using the same materials in 1992, with a failure rate of 70 percent.

This failure rate actually conflates two different, and almost contradictory, problems of this generation of parsers. The first is that the very large handcrafted grammars used by parsers that aim at broad coverage often generate very, very large numbers of possible parses for a given input sentence. These parsers usually fail to incorporate some source of knowledge that will accurately rank the syntactic and semantic plausibility of parses that are syntactically possible, particularly if the parser is intended to be domain independent. The second problem, somewhat paradoxically, is that these parsers often fail to actually provide the correct analysis of a given sentence; the grammar of a natural language like English appears to be quite vast and quite complex.

Why can't traditional approaches to building large software systems, using techniques like divide and conquer, solve this last problem? The problem is not that the grammar developers are not competent or that there is a lack of effort; a number of superb computational linguists have spent years trying to write grammars with broad enough coverage to parse unconstrained text. One hypothesis is that the development of a large grammar for a natural language leads into a complexity barrier similar to that faced in the development of very large software systems. While the human grammatical system appears to be largely modular, the interaction of the subcomponents is still sufficient to cause the entire system to be unmanageably complex. The net result is that the grammatical system does not appear to decompose easily into units that a team can develop and then join together. In support of this view is the fact that almost all of the large grammars extant are the result of a single grammar developer working over a long period of time. If this conclusion is correct, an approach to developing NLP systems must be found other than careful handcrafting.

STATISTICAL TECHNIQUES: FIRST APPEARANCE

One of the first demonstrations that stochastic modeling techniques, well known in the speech-processing community, might provide a way to cut through this impasse in NLP was the effective application of a simple letter trigram model to the problem of determining the national origin of proper names for use in text-to-speech systems (Church, 1985). Determining the etymology of names is crucial in this application because the pronunciation of identical letter strings differs greatly from language to language; the string GH, for example, is pronounced as a hard G in Italian, as in Aldrighetti, while it is most often pronounced as F or is simply silent in English, as in laugh or sigh. This system estimates the probability that a name W comes from language L as the product of the probabilities, estimated across a set of known names from L, of all the contiguous three-letter sequences in W. It then assigns W to the language L that maximizes this probability. The success of this program came as a surprise to most of the NLP community, at the time completely wedded to the symbolic techniques of traditional artificial intelligence (AI). Many people in NLP thought this application was a fluke, that the task solved by the program was somehow special. In fact, this technique led the way toward application of statistical techniques to problems that one would have thought required an "AI-complete" solution, a full solution to the problem of modeling human understanding.
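To make the technique concrete, here is a minimal sketch (not Church's actual system) of scoring a name against per-language letter trigram models estimated from lists of known names; the name lists and the smoothing constant are invented for illustration.

```python
import math
from collections import defaultdict

def trigram_counts(names):
    """Count contiguous three-letter sequences over a list of known names."""
    counts = defaultdict(int)
    for name in names:
        s = name.lower()
        for i in range(len(s) - 2):
            counts[s[i:i + 3]] += 1
    return counts

def log_prob(name, counts, alpha=0.5):
    """Sum of log trigram probabilities for the name, with add-alpha smoothing
    so that an unseen trigram does not zero out the whole product."""
    total = sum(counts.values())
    vocab = len(counts) + 1
    s = name.lower()
    return sum(math.log((counts[s[i:i + 3]] + alpha) / (total + alpha * vocab))
               for i in range(len(s) - 2))

# Invented training lists; a real system would use large name lists per language.
models = {
    "Italian": trigram_counts(["Aldrighetti", "Rossi", "Bianchi", "Esposito"]),
    "English": trigram_counts(["Smith", "Wright", "Johnson", "Clarke"]),
}
name = "Brighetti"
print(max(models, key=lambda lang: log_prob(name, models[lang])))   # -> Italian
```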

In another sense this work was an application of an approach to linguistic analysis called distributional analysis (Harris, 1951), which reached its zenith in the 1950s. This work suggested that the structure of language could be discovered by looking at distributional patterns of linguistic entities. While the work of Chomsky in the late 1950s showed that distributional analysis could not be the whole story, most linguists assumed that Chomsky's work implied that distributional techniques should be abandoned entirely. This application showed that simple distributional techniques were useful for solving hard engineering problems that looked resistant to the application of a priori knowledge.


THE ARCHITECTURE OF AN NLU SYSTEM

Figure 1a gives an overview of a few of the crucial steps in the process of decoding a sentence in a conventional NLU system, given that the words that make up the sentence have been determined either by a speech recognition system or by tokenization of an ASCII source. When a new sentence comes in, it is analyzed by a parser that both determines what part of speech to assign to each of the words and combines these part-of-speech tagged words into larger and larger grammatical fragments, using some kind of grammar that tells what combinations are possible and/or likely. The output of this grammatical analysis, either a single-rooted tree or a string of tree fragments, then goes through semantic analysis, which determines the literal meaning of a sentence in isolation. This phase of analysis decides both what the individual words mean and how to combine the individual word meanings into larger semantical structures. Often, this last step is done using some form of compositional semantics, where the meaning of each larger unit is constrained to be a relatively simple function of the meaning of its parts. This meaning representation is then further analyzed by pragmatic and discourse components to determine what the sentence means given its particular context of use and to place this representation into a multisentence representation useful for such tasks as determining the referent of pronouns and other noun phrases.

FIGURE 1 Two decompositions of the architecture of an NLU system. (a) Standard decomposition. (b) An alternate decomposition.

For the purposes of the rest of this chapter, the problem will be subdivided into somewhat smaller functional units than given by the conventional model. This subdivision, given in Figure 1b, reflects the development of statistical NLP techniques over the past several years, with rate of progress roughly proportional to height in the figure. The first success of statistical modeling techniques for NLU was in the area of part-of-speech determination—deciding, for example, whether the word saw functioned in a given linguistic context as a singular noun or a past tense verb. A variety of techniques now tag previously unseen material with 95 to 97 percent accuracy. Recently, purely context-free probabilistic parsing methods have been supplanted by parsing algorithms that utilize probabilities of context-free rules conditionalized on aspects of surrounding linguistic structure. Such parsers provide the correct parse as the first parse output between 60 and 80 percent of the time, by sentence, on naturally occurring texts from rich, but not entirely unconstrained, domains such as the Wall Street Journal. They have performed with up to 91 percent accuracy on spoken language tasks from limited domains like the Advanced Research Projects Agency's (ARPA) Air Travel Information Service (ATIS) domain. In the area of lexical semantics, a range of promising techniques for performing word-sense disambiguation have emerged in just the last year, as well as some preliminary work in automatically determining the selectional restrictions of verbs, that is, what kind of objects can serve as the subject or object of a given verb.

Finally, all of these methods depend crucially on the availability of training materials annotated with the appropriate linguistic structure. These advances were made possible by the development of corpora appropriately annotated with part-of-speech and syntactic structure. This paper will also touch on the development of such corpora.

PART-OF-SPEECH TAGGING

The task of a part-of-speech tagger is to map an input stream of word tokens into the correct part of speech for each word token in context. To do this, it must disambiguate words that have the potential to be many different parts of speech. A part-of-speech tagger, for example, should map the sentence Can we can the can? into the string of parts of speech shown in Figure 2. This problem of lexical disambiguation is a central problem in building any NLP system; given a realistically large lexicon of English, many common words are used in multiple parts of speech. Determining what function each word plays in context is a crucial part of either assigning correct grammatical structure for purposes of later semantic analysis or of providing a partial heuristic chunking of the input into phrases for purposes of assigning intonation in a text-to-speech synthesizer.

The problem of lexical disambiguation was thought to be completely intractable 10 years ago by most of the NLP community, and yet now  a wide range of very different techniques can solve this problem with 95 to 97.5 percent word accuracy, for part-of-speech tag sets of between 40 and 80 tags, depending on the task and how accuracy is measured (see, e.g., Black et al., 1992a; Brill, 1992; Church, 1988; Cutting et al., 1992; Hindle, 1989; Merialdo, 1991; Weischedel et al., 1993; and others). It is worth noting that many of these errors are not very harmful; a significant fraction of the errors consist of cases where one kind of verb is confused with another kind of verb or one kind of noun with another. Many of the parsers for which these taggers serve as preprocessors are forgiving enough that the errors do not actually throw the parser off track.

Words in:                   Can     we        can     the    can?
Part-of-speech stream out:  modal   pronoun   verb    det    noun

FIGURE 2 Part-of-speech taggers assign tags in context.

Most part-of-speech taggers are implemented as hidden Markov models (HMMs). For an input sentence S = w1, w2, ... wn, these taggers predict a tag ti for each word wi given two sets of probabilities: First, P(w | t) (the probability of w given t), the probability for each possible word w and each part-of-speech tag t that if a given word is tagged with t, it is in fact the word w. Second, P(ti+1 | ti), the transition probability that the next tag is ti+1, given that the current tag is ti. These taggers use a linear time search utilizing the dynamic programming algorithm, often called the Viterbi algorithm, to find the string of tags T = t1, t2, ..., tn that maximizes the product Πi P(wi | ti) P(ti+1 | ti).
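A minimal sketch of the Viterbi search just described, assuming the tables P(w | t) and P(ti+1 | ti) have already been estimated; the tiny probability tables below are invented for illustration, and unseen events are simply floored to a small constant rather than smoothed properly.

```python
import math

def logp(p):
    # Floor zero probabilities so log() is defined; a real tagger would smooth.
    return math.log(p) if p > 0 else math.log(1e-12)

def viterbi(words, tags, emit, trans, start):
    """Find the tag string maximizing prod_i P(w_i | t_i) * P(t_i | t_{i-1})."""
    # best[t] = (log score of best path ending in tag t, that path)
    best = {t: (logp(start.get(t, 0)) + logp(emit.get((words[0], t), 0)), [t]) for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            score, path = max(
                (best[p][0] + logp(trans.get((p, t), 0)), best[p][1]) for p in tags)
            new[t] = (score + logp(emit.get((w, t), 0)), path + [t])
        best = new
    return max(best.values())[1]

tags = ["modal", "pronoun", "verb", "det", "noun"]
emit = {("can", "modal"): 0.3, ("can", "verb"): 0.1, ("can", "noun"): 0.1,
        ("we", "pronoun"): 0.5, ("the", "det"): 0.6}
trans = {("modal", "pronoun"): 0.4, ("pronoun", "verb"): 0.4,
         ("verb", "det"): 0.5, ("det", "noun"): 0.6}
start = {"modal": 0.2, "pronoun": 0.3}
print(viterbi(["can", "we", "can", "the", "can"], tags, emit, trans, start))
# -> ['modal', 'pronoun', 'verb', 'det', 'noun'], as in Figure 2
```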

The question here is how to estimate the value of the parameters of the HMM. The standard approach for HMMs is to use the forward/backward algorithm to automatically estimate the parameters, as described by Jelinek elsewhere in this volume. However, systems that use the forward/backward algorithm do not perform quite as well as those that estimate parameters, at least initially, by simple counting, using a corpus of text that has been pretagged with part-of-speech information (Merialdo, 1991). In practice, such systems must use some technique to smooth1 very small counts. One could then use the forward/backward algorithm  to smooth these direct estimates, but there is little evidence that this helps. Currently, then, the best way to estimate the parameters of an HMM  for part-of-speech tagging is to hand tag a corpus and simply count.
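A sketch of this counting approach, assuming the training data is a list of (word, tag) sequences; the add-alpha smoothing used here is just one simple way to keep unseen events from receiving zero probability.

```python
from collections import defaultdict

def estimate_hmm(tagged_sentences, alpha=0.1):
    """Estimate P(word | tag) and P(next_tag | tag) by relative frequency,
    with add-alpha smoothing over the observed word and tag vocabularies."""
    emit_c, trans_c, tag_c = defaultdict(int), defaultdict(int), defaultdict(int)
    for sent in tagged_sentences:
        for i, (word, tag) in enumerate(sent):
            tag_c[tag] += 1
            emit_c[(word, tag)] += 1
            if i + 1 < len(sent):
                trans_c[(tag, sent[i + 1][1])] += 1
    words = {w for w, _ in emit_c}
    tags = list(tag_c)
    emit = {(w, t): (emit_c[(w, t)] + alpha) / (tag_c[t] + alpha * len(words))
            for w in words for t in tags}
    trans = {(t1, t2): (trans_c[(t1, t2)] + alpha) / (tag_c[t1] + alpha * len(tags))
             for t1 in tags for t2 in tags}
    return emit, trans

# Hypothetical two-sentence pretagged corpus.
corpus = [[("can", "modal"), ("we", "pronoun"), ("can", "verb"), ("the", "det"), ("can", "noun")],
          [("we", "pronoun"), ("can", "verb"), ("the", "det"), ("can", "noun")]]
emit, trans = estimate_hmm(corpus)
```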

The theme that emerges here is true of most statistical NLP applications and will be a leitmotif in what follows below. What works best both for part-of-speech tagging using HMMs and for the entire range of statistical NLP applications considered in this paper, is some appropriate combination of stochastic techniques and linguistic knowledge. While earlier work provides evidence that handcrafted symbolic representations of linguistic knowledge are insufficient to provide industrial-strength NLP, it also appears that the use of statistical methods without some incorporation of linguistic knowledge is insufficient as well. This linguistic knowledge may either be represented in implicit form, as in the use of a pretagged corpus here, or encoded explicitly in the form of a grammar.2 In the next few years, I believe we are going to see stochastic techniques and linguistic knowledge more and more deeply interleaved.

1Since sentence probabilities are estimated by multiplying together many estimates of local probabilities, any probability estimate of zero leads to a zero probability for the entire string. Since any direct estimate is based on only finite data, it is important to assume that any event not observed at all has some very small, but nonzero probability. How to best perform this smoothing of probability estimates is a central technical issue in applying any of the methods discussed in this chapter.

2For readers familiar with logic, this is the distinction between knowledge represented extensionally and knowledge represented intensionally.

The Problem of Unknown Words

In conjunction with this observation it is important to realize that if one simply implemented an HMM for part-of-speech tagging as discussed above, the performance of the resulting system on new material could well be no better than 70 or 80 percent correct. Without exception, input is preprocessed before parts of speech are assigned by an HMM; this preprocessing is often only partially discussed in technical descriptions of part-of-speech taggers. The preprocessing copes with "unseen words," words that were never seen in the training data and for which the system therefore has no prior knowledge of possible parts of speech. It turns out that about half of the word types in the Brown corpus (Francis, 1964; Francis and Kucera, 1982), a carefully balanced representative corpus of American English, appear exactly once (about 32,000 out of 67,000 word types). This is consistent with Zipf's law, the empirical law that the frequency of a word type is inversely proportional to its rank. Nor can this problem be circumvented by some appropriately huge lexicon; a very large number of proper names appear on any newswire for the first time each day.

How can this unseen word problem be handled? One simple but quite effective technique is to tag each unknown word with the most likely tag given its last three letters—an empirical approximation to simple morphological analysis (Brill, 1992). A useful heuristic for proper nouns in most English text is to use capitalization, often combined with some other heuristics to correct for unknown words used at the beginning of sentences (Weischedel et al., 1993). The key point here is that these techniques for unseen words go beyond using purely stochastic techniques to using implicit and explicit linguistic knowledge, although in a trivial way, to get the job done.
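A sketch of these two heuristics; the suffix table and tag names are invented, and a real tagger would return a distribution over possible tags rather than a single guess.

```python
def guess_tag(word, suffix_tags, default="noun"):
    """Guess a part of speech for an unseen word from its last three letters,
    with a capitalization heuristic for likely proper nouns (sentence-initial
    words would need the extra correction mentioned above)."""
    if word[0].isupper():
        return "proper_noun"
    return suffix_tags.get(word[-3:].lower(), default)

# suffix_tags would be built by counting, over the training corpus, the most
# frequent tag for each word-final three-letter sequence; a few invented entries:
suffix_tags = {"ing": "verb_gerund", "ion": "noun", "ous": "adjective", "bly": "adverb"}
print(guess_tag("recalibrating", suffix_tags))   # verb_gerund
print(guess_tag("Marrakesh", suffix_tags))       # proper_noun
```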

STOCHASTIC PARSING

All work on stochastic parsing begins with the development of the inside/outside algorithm (Baker, 1979), which generalizes the Baum-Welch algorithm for estimating HMMs to the estimation of parameters of stochastic context-free grammars.3 Just as each iteration of the Baum-Welch algorithm over some training corpus improves estimates of the parameters of the underlying HMM, as judged by the criterion of maximal likelihood, so the inside/outside algorithm improves parameter estimates of an underlying probabilistic context-free grammar, judged by this same criterion.
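The inside pass of the algorithm can be sketched as follows for a grammar in Chomsky normal form; the outside pass and the reestimation step are omitted, and the toy grammar is invented for illustration.

```python
from collections import defaultdict

def inside(words, lexical, binary):
    """Inside probabilities beta[(i, j, A)] = P(A derives words[i:j]) for a PCFG
    in Chomsky normal form. lexical maps (A, word) -> P(A -> word);
    binary maps (A, B, C) -> P(A -> B C)."""
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1, A)] += p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for (A, B, C), p in binary.items():
                for k in range(i + 1, j):
                    beta[(i, j, A)] += p * beta[(i, k, B)] * beta[(k, j, C)]
    return beta

# Invented toy grammar and sentence.
lexical = {("NP", "we"): 1.0, ("V", "can"): 0.5, ("V", "see"): 0.5, ("NP", "birds"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
print(inside(["we", "can", "birds"], lexical, binary)[(0, 3, "S")])  # P(whole string is an S)
```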

3For a tutorial introduction to probabilistic context-free grammars and the inside/outside algorithm, see Jelinek et al. (1991) and Lari and Young (1990).

However, straightforward application of the inside/outside algorithm does not appear to produce effective parsers; the best results to date have resulted in parsers with about 35 percent correct parses on fully reserved test material in simple parsing tasks (Fujisaki et al., 1989; Sharman et al., 1990). Two problems appear to lie behind this failure. First, for realistic probabilistic context-free grammars (PCFGs) the number of parameters that must be estimated is very large; unless some a priori constraint is provided, n³ parameters must be estimated for a grammar with n nonterminal categories, categories that label not words but structures, like noun phrase, verb phrase, or sentence.

But a worse problem is that the objective function that the inside/outside procedure maximizes, namely the probability of the training corpus given the grammar, is in fact not the objective function that one wants to maximize to train effective parsers. For parsing the goal is to maximize assignment of the correct grammatical structure, to recursively subdivide the sentence correctly into its constituent grammatical parts, determined, say, by examining similar sentences in a treebank of hand-parsed sentences. Unfortunately, there is no reason to expect that a PCFG whose parameters are estimated by the inside/ outside algorithm will assign structures that have the desired constituent structure.

In recent years a range of new grammatical formalisms have been proposed that some suggest have the potential to solve a major part of this problem. These formalisms, called lexicalized grammar formalisms, express grammars in which the entire grammar consists of complex structures associated with individual words, plus some very simple general rules for combining these structures. Such grammar formalisms include combinatory categorial grammars (CCGs), lexicalized tree-adjoining grammars (LTAGs), and link grammars (Joshi and Schabes, 1992; Steedman, 1993). In these lexicalized formalisms each word can be thought of as a tree fragment; the full grammatical analysis of a sentence is formed by specifying how and in what order the fragments associated with each word in a sentence combine. Words may themselves be ambiguous between different "parts of speech," here differing tree fragments. In these grammar formalisms the bulk of parsing a sentence is just deciding on which part of speech to assign to each word. Given this property of these grammar formalisms, perhaps some way can be found to extend the inside/outside algorithm appropriately so that its objective function maximizes the probabilities of strings of part-of-speech tagged words. If so, it is just a matter of extending the search space to handle the large number of complex part-of-speech structures of lexicalized grammars.4

4 I thank Aravind Joshi for the above observation.


Constraining the Inside/Outside Algorithm

Recently, a number of experiments have been performed that combine the inside/outside algorithm with some form of linguistic knowledge. In a recent experiment by Pereira and Schabes (1992), a modified version of the inside/outside algorithm was applied to a corpus that was manually annotated with a skeletal syntactic bracketing by the Penn Treebank Project (Brill et al., 1990; Marcus et al., 1993). In this experiment the I/O algorithm was modified to consider only PCFG rules that did not violate the skeletal bracketing of the corpus, zeroing out many of the n³ parameters a priori. The algorithm was then trained on a corpus of only 770 sentences collected in the Air Travel Information System (ATIS) domain (Hemphill et al., 1990). The evaluation was based on the "crossing brackets" parser evaluation metric of Black et al. (1991). This crossing-brackets measure counts the number of brackets inserted during parsing that are consistent with the correct bracketing of the sentence.5 Without constraint, the algorithm achieved 35 percent bracketing accuracy on reserved test materials but achieved 90 percent bracketing accuracy when constrained by the annotated corpus.
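A sketch of the consistency check that underlies this kind of bracketing evaluation: a proposed bracket counts as correct unless it crosses (partially overlaps) some bracket in the hand-assigned analysis. Spans here are (start, end) word indices, and the example bracketings are invented.

```python
def consistent(span, gold_spans):
    """A proposed span is consistent with the gold bracketing unless it crosses
    (partially overlaps) some gold span."""
    s, e = span
    return not any(gs < s < ge < e or s < gs < e < ge for gs, ge in gold_spans)

def bracketing_accuracy(proposed, gold):
    """Fraction of proposed brackets consistent with the hand-assigned bracketing."""
    return sum(consistent(sp, gold) for sp in proposed) / len(proposed)

gold = [(0, 7), (0, 2), (2, 7), (3, 7)]        # invented treebank bracketing
proposed = [(0, 7), (0, 3), (3, 7), (5, 7)]    # invented parser output
print(bracketing_accuracy(proposed, gold))     # (0, 3) crosses (2, 7) -> 0.75
```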

5Notice that this is a much easier measure than the percentage of sentences parsed correctly; if one of, say, 33 brackets is inconsistent in a given sentence, the sentence is 97 percent correct by the bracket measure and simply wrong by the sentences-correct measure.

Conditioning PCFG Rules on Linguistic Context

One new class of models uses linguistic knowledge to condition the probabilities of standard probabilistic context-free grammars. These new models, which in essence augment PCFG grammar rules with probabilistic applicability constraints, are based on the hypothesis that the inability of PCFGs to parse with high accuracy is due to the failure of PCFGs to model crucial aspects of linguistic structure relevant to the appropriate selection of the next grammar rule at each point within a context-free derivation. Probabilities in the standard stochastic context-free model are conditioned only on the type of nonterminal that the grammar is about to expand; the key idea of these new models is that this provides insufficient linguistic context to adequately model the probabilities of rule expansion. One such parser, that of Magerman and Marcus (1991a, 1991b), assumes that expansion of any nonterminal is conditioned on the type of nonterminal, the most likely part-of-speech assignments for the next several words in the parser's input stream, and the rule that has generated the particular nonterminal that the parser is trying to expand. For example, the rule "NP → pronoun" might have a different probability when it expands the NP in the rule "S → NP VP" than when it expands the NP in the rule "VP → NP NP". Tested on a corpus of sentences from the Massachusetts Institute of Technology's Voyager domain (Zue et al., 1990), this parser correctly parsed 89 percent of a reserved test set. A sample list of sentences from this corpus, with length distribution typical of the corpus as a whole, is given in Figure 3. Although the performance of this algorithm is quite impressive in isolation, the sentences in this corpus are somewhat simpler in structure than those in other spoken language domains and are certainly much simpler than sentences from newswire services that were the target of the parser evaluation discussed in the introduction to this chapter.

I'm currently at MIT
Forty-five Pearl Street
What kind of food does LaGroceria serve
Is there a Baybank in Central Square
Where is the closest library to MIT
What's the address of the Baybank near Hong Kong
What's the closest ice cream parlor to Harvard University
How far is Bel Canto's from Cambridge Street in Cambridge
Is there a subway stop by the Mount Auburn Hospital
Can I have the phone number of the Cambridge City Hall
Can you show me the intersection of Cambridge Street and Hampshire Street
How do I get to the closest post office from Harvard University
Which subway stop is closest to the library at forty-five Pearl Street

FIGURE 3 Sample sentences from the Massachusetts Institute of Technology's Voyager corpus.

On the other hand, a simple PCFG for this corpus parses a reserved test set with only about 35 percent accuracy, comparable to PCFG performance in other domains. If the probability of each rule is conditioned on both the current nonterminal and on the rule immediately above that gave rise to the current nonterminal, then performance improves to about 50 percent accuracy. Conditioning each rule on the expected part of speech of the next several words in addition increases performance to 87.5 percent accuracy. The key point here again is that combining a very simple stochastic framework with a little bit of linguistic knowledge greatly increases performance over each alone.
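The conditioning step can be sketched as relative-frequency estimation over (context, rule) pairs collected from a treebank, with the parent rule as the only context; this is a simplification for illustration, not the actual Magerman and Marcus model, and the counts are invented.

```python
from collections import defaultdict

def conditional_rule_probs(rule_context_pairs):
    """Estimate P(rule | parent_rule) by relative frequency from
    (parent_rule, rule) pairs collected off a treebank."""
    joint, marginal = defaultdict(int), defaultdict(int)
    for parent, rule in rule_context_pairs:
        joint[(parent, rule)] += 1
        marginal[parent] += 1
    return {(parent, rule): c / marginal[parent] for (parent, rule), c in joint.items()}

# Invented counts: pronouns expand subject NPs more often than object NPs.
pairs = ([("S -> NP VP", "NP -> pronoun")] * 30 + [("S -> NP VP", "NP -> det noun")] * 70
         + [("VP -> V NP", "NP -> pronoun")] * 10 + [("VP -> V NP", "NP -> det noun")] * 90)
probs = conditional_rule_probs(pairs)
print(probs[("S -> NP VP", "NP -> pronoun")])   # 0.3
print(probs[("VP -> V NP", "NP -> pronoun")])   # 0.1
```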

Many parsing techniques are now emerging that combine stochastic techniques with linguistic knowledge in a number of different ways. Again, as discussed briefly above, linguistic knowledge can be encoded explicitly, perhaps in the form of a grammar, or implicitly within the annotations of an annotated corpus.

For combination with stochastic methods, so-called covering grammars can be used, grammars that provide at least one correct parse for sentences of interest but that may also produce spurious impossible parses. While these spurious parses would be a problem if the grammar were used with a purely symbolic parser, the hope is that when used within a stochastic framework, spurious parses will be of much lower probability than the desired analyses. One simple method for combining explicit linguistic knowledge with stochastic techniques is to use a stochastic technique to estimate the probability distribution for all and only the rules within the grammar, drastically limiting the number of parameters that need to be estimated within the stochastic model. While advocated by many researchers, this method suffers from the potential defect that it cannot model grammar rules that the grammar writer overlooked or that occur rarely enough that they were unseen in training materials. A somewhat more powerful method is to (1) use the grammar to generate all potential parses of a set of example sentences, (2) create a training set of trees by either hand picking the correct parse for each sentence or simply using all potential parses (which works far better than might be expected), and then (3) use the usage count of each grammar rule within this training set to provide an initial estimate of the parameters of the associated stochastic grammar, which might then be smoothed using the inside/outside algorithm.

If the annotation of a corpus takes the form of syntactic bracketing, the implicit knowledge encoded in the annotations of a corpus can be used in exactly the same way as explicit knowledge, to a first approximation. The key idea is that a grammar can be simply extracted from the corpus, along with counts for each rule, and then the methods discussed immediately above are applicable. In fact, most grammatical annotation provided in corpora to date is skeletal, providing only partial structures, so the use of smoothing techniques is crucial.
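A sketch of extracting a grammar, with rule counts, from a collection of labeled bracketings; the nested-list tree encoding is an assumption, and the resulting counts would then be normalized and smoothed as discussed above.

```python
from collections import defaultdict

def extract_rules(tree, counts):
    """Walk a nested-list tree like ["S", ["NP", "we"], ["VP", ...]] and count
    the context-free rule used at each internal node."""
    if isinstance(tree, str):           # a leaf word contributes no rule
        return
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        extract_rules(c, counts)

def rule_probs(trees):
    """Relative-frequency PCFG estimates from a collection of trees."""
    counts = defaultdict(int)
    for t in trees:
        extract_rules(t, counts)
    totals = defaultdict(int)
    for (lhs, _), c in counts.items():
        totals[lhs] += c
    return {rule: c / totals[rule[0]] for rule, c in counts.items()}

# One invented tree; a real treebank would supply many thousands.
trees = [["S", ["NP", "we"], ["VP", ["V", "can"], ["NP", ["DET", "the"], ["N", "can"]]]]]
print(rule_probs(trees))
```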

To what extent might combinations of linguistic knowledge with stochastic techniques improve the performance of parsers in the near-term future? Two experiments, both on a corpus of short sentences from computer manuals, cast some light here. In the first experiment (Black et al., 1992b), both an explicit grammar and an annotated corpus were used to build a stochastic parser that parsed 75 percent of the sentences in a reserved test set completely consistently with a hand-assigned bracketing. The second experiment (Black et al., 1993) is attempting to leave explicit grammars behind, using instead a very rich set of linguistically relevant questions in combination with decision tree techniques. These questions examine not only syntactic properties, but lexical and class-based information as well, thus combining a much richer set of linguistic knowledge sources than any other model to date. The decision tree uses this set of questions to search for the grammar implicit in a very large hand-annotated corpus. Published reports of early stages of this work indicate that this technique is 70 percent correct on computer manual sentences of length 7 to 17, where, to count as correct, each parse must exactly match the prior hand analysis of that sentence in the test corpus, a more stringent test criterion than any other result mentioned here.

While this last experiment uses one uniform statistical technique, decision trees, to make all parsing decisions, some recent work suggests that effective parsing might be done by a suite of interacting parsing experts, each handling a particular grammatical phenomenon. Perhaps the clearest example of this is a recent technique to resolve the ambiguous attachment of prepositional phrases. Consider the sentence I saw the man with the telescope; here the prepositional phrase with the telescope might modify the man, meaning I saw the man who had a telescope, or it might modify the main verb saw, meaning I used the telescope to see the man. If the sentence were instead I saw the planet with the telescope, the prepositional phrase would certainly modify the main verb, but if it were I saw the man with the hat the prepositional phrase would clearly modify the man. Here, as in many other cases, it becomes clear that a decision about grammatical structure depends crucially on the properties of the lexical items themselves. A technique that uses likelihood ratios to compare the strength of association between the preposition and the main verb with the strength of association between the preposition and the preceding noun correctly assigns about 80 percent of prepositional phrases in sentences from the AP newswire with structure identical to the examples here (Hindle and Rooth, 1993). It is interesting to note that human judges, given the same information, do this task at about 85 to 87 percent accuracy. This experiment also points out the key role of lexical properties in deciding grammatical structure. Its success suggests that the crucial role of grammar is just to mediate the properties of lexical items themselves. This would suggest, as does the recent work on lexicalized grammars discussed above, that the words themselves are primary.
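A sketch in the spirit of, though much simpler than, the Hindle and Rooth procedure: compare smoothed estimates of how often the preposition occurs with the verb versus with the preceding noun, and attach to whichever association is stronger. All counts below are invented.

```python
import math

def attach(verb, noun, prep, verb_prep, verb_any, noun_prep, noun_any, alpha=0.5):
    """Compare how strongly the preposition associates with the verb versus the
    preceding noun, using smoothed relative frequencies; positive -> verb attachment."""
    p_v = (verb_prep.get((verb, prep), 0) + alpha) / (verb_any.get(verb, 0) + 2 * alpha)
    p_n = (noun_prep.get((noun, prep), 0) + alpha) / (noun_any.get(noun, 0) + 2 * alpha)
    return math.log(p_v / p_n)

# Invented corpus counts of verb-preposition and noun-preposition co-occurrence.
verb_prep = {("saw", "with"): 30}
verb_any = {"saw": 1000}
noun_prep = {("planet", "with"): 1, ("man", "with"): 60}
noun_any = {"planet": 200, "man": 900}

for noun in ["planet", "man"]:
    score = attach("saw", noun, "with", verb_prep, verb_any, noun_prep, noun_any)
    print(noun, "verb attachment" if score > 0 else "noun attachment")
# -> planet: verb attachment; man: noun attachment
```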

Annotated Corpora

Before leaving the question of syntax, I would like to say a word about the production of annotated corpora themselves. There are, at the moment, two large grammatically annotated corpora for English—the IBM/Lancaster Treebank (Garside et al., 1987) and the Penn Treebank (Marcus et al., 1993). As of this time, only materials from the second are generally available; they are distributed through the Linguistic Data Consortium. Because of this, and because of my familiarity with this corpus, the Penn Treebank is the focus here.

The Penn Treebank now consists of 4.5 million words of text tagged for part of speech, with about two-thirds of this material also annotated with a skeletal syntactic bracketing. All of this material has been hand corrected after processing by automatic tools. The two largest components of the corpus consist of over 1.6 million words of material from the Dow-Jones News Service, hand parsed, with an additional 1 million words tagged for part of speech and a skeletally parsed version of the Brown corpus (Francis, 1964; Francis and Kucera, 1982), the classic 1-million-word balanced corpus of American English. This material has already been used for purposes ranging from serving as a gold standard for parser testing to serving as a basis for the induction of stochastic grammars to serving as a basis for quick lexicon induction.

There is now much interest in very large corpora that have quite detailed annotation, assuming that such corpora can be efficiently produced. The group at Penn is now working toward providing a 3-million-word bank of predicate-argument structures. This will be done by first producing a corpus annotated with an appropriately rich syntactic structure and then automatically extracting predicate-argument structure, at the level of distinguishing logical subjects and objects, and distinguishing a small range of particular adjunct classes. This corpus will be annotated by automatically transforming the current treebank into a level of structure close to the intended target and then completing the conversion by hand.

LEXICAL SEMANTICS AND BEYOND

We now turn to an area of very recent progress: lexical semantics. At initial inspection, it would appear most unlikely that statistical techniques would be of much use for either the discovery or representation of the meanings of words. Surprisingly, some preliminary work over the past several years indicates that many aspects of lexical semantics can be derived from existing resources using statistical techniques.

Several years ago it was discovered that methods from statistics and information theory could be used to "tease out" distinctions between words, as an aid to lexicographers developing new dictionaries (Church et al., 1991). As an example, consider the following: How could one distinguish the meaning of food and water? Figure 4 shows the mutual information score,6 an information theoretic measure, between various verbs and the nouns food and water in an automatically parsed corpus, where either food or water is the object of that verb or, more precisely, where one or the other is the head of the noun phrase which is the object of the verb. The corpus used in this experiment consists of 25 million subject-verb-object triples automatically extracted from the AP newswire by the use of a parser for unrestricted text. The mutual information score is high if the verb and noun tend to occur together and will tend toward 0 if the verb and noun occur together no more often than expected by chance. Because this measure is the log of a ratio, scores such as those shown in the table are quite high. What kinds of things can you do with food? Well, according to the AP newswire, you can hoard it, go without it, eat it, consume it, etc. With water, you can conserve it, boil it, ration it, pollute it, etc.7 This indeed begins to reveal something about the meaning of these verbs, based on distributional properties of these words, that is, what other words they cooccur with.

6The mutual information statistic is a measure of the interdependence of two signals. It is defined as MI(x,y) = log [P(x,y) / P(x)P(y)].

7Because English contains a large number of so-called phrasal verbs, whenever the parser encounters a verb followed immediately by a prepositional phrase, as in go without food, the parser creates a potential phrasal verb, for example, go_without, and a triple where the object of the preposition (here food) is taken as the object of the putative phrasal verb (here go_without).

FIGURE 4 What do you typically do with food and water? Computed over parsed AP corpus (N = 24.7 million SVO triples). (Adapted from Church et al., 1991.)

FIGURE 5 What do you do more with food than with water? Computed over parsed AP corpus (N = 24.7 million SVO triples). (Adapted from Church et al., 1991.)

To differentiate these two words somewhat more sharply, one might ask what might be done more or less to one than the other. Here, an appropriate metric is the t-score, which contrasts the conditional probability of seeing food as object given a particular verb with the conditional probability of seeing water as object given that verb.8 Figure 5 shows that one eats or buys food far more than water and that one pumps or drinks water far more than food. Perhaps surprisingly, these descriptions get quite close to the heart of the difference between food and water.
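Both association scores can be sketched directly from object counts; the counts below are invented, and the t-score is given in its usual count-based approximation rather than the exact form in the footnote.

```python
import math

def mutual_information(c_vn, c_v, c_n, n_pairs):
    """MI(verb, noun) = log2 [ P(verb, noun) / (P(verb) P(noun)) ] from pair counts."""
    return math.log2((c_vn / n_pairs) / ((c_v / n_pairs) * (c_n / n_pairs)))

def t_score(c_v_noun1, c_v_noun2):
    """Count-based approximation of the t-score contrasting how often a verb takes
    noun1 versus noun2 as its object: (c1 - c2) / sqrt(c1 + c2)."""
    return (c_v_noun1 - c_v_noun2) / math.sqrt(c_v_noun1 + c_v_noun2)

# Invented counts over hypothetical verb-object pairs (n_pairs = total triples).
N = 1_000_000
print(mutual_information(c_vn=60, c_v=500, c_n=800, n_pairs=N))   # e.g., eat-food
print(t_score(c_v_noun1=60, c_v_noun2=2))                         # eat: food vs. water
```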

8The formula computed is t = [P(food|verb) − P(water|verb)] / {σ²[P(food|verb)] + σ²[P(water|verb)]}^1/2.

These experiments show that statistical techniques can be used to tease out aspects of lexical semantics in such a way that a human lexicographer could easily take advantage of this information. However, for computers to utilize such information, some kind of representation must be found to encode semantical information in the machine. What kinds of representations might be appropriate for automatic discovery procedures? Much research on automatic machine translation is now being done using the parallel French-English transcripts of the proceedings of the Canadian parliament. This corpus has been used to test a statistical technique that finds the most reliable local clue in context to tease apart different senses of the same word in the source language, representing that meaning as the translation in the target language (Brown et al., 1991). An example of this technique is shown in Figure 6. Here, the most useful single clue for the translation of an instance of the French word prendre is the particular word that occurs as the first noun to its right. As shown in Figure 6(a), the identity of this word provides, on average, 0.381 bits of information as to how to distinguish between two different senses of prendre. Figure 6(b) shows some of the French words that distinguish between one sense of prendre (part, mesure, note, exemple, etc.) and another (decision, parole, connaissance, etc.). As shown in Figure 6(c), the most likely translation into English for sense 1 is the verb take; for sense 2, the most likely translation is the English verb make. This technique has shown itself to be quite useful in current work in machine translation. As a technique for word sense disambiguation, it has the clear drawback that often a word with multiple senses in language 1 has many of the same senses in language 2, so it can only be used when the target language does split the senses of interest.

(a) For prendre the noun to the right is maximally informative.
    Word: prendre    Informant: right noun    Information: .381 bits

(b) Some French words that are informant values for each sense.
    Sense 1: part, mesure, note, exemple, temps, initiative
    Sense 2: decision, parole, connaissance, engagement, fin, retraite

(c) Sense 1 translates as take, sense 2 as make.
    Pr(English | Sense 1): to_take .433, to_make .061, to_do .051, to_be .045
    Pr(English | Sense 2): to_make .186, to_speak .105, to_rise .066, to_take .066, to_be .058, decision .036, to_get .025, to_have .021

FIGURE 6 The two senses of prendre translate as take or make. (Adapted from Brown et al., 1991.)
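The ".381 bits" figure can be read as the mutual information between the informant's value and the chosen translation; a sketch, with invented aligned data, is given below.

```python
import math
from collections import defaultdict

def informant_information(pairs):
    """Mutual information, in bits, between an informant value (e.g., the noun to the
    right of prendre) and the English translation, from (value, translation) pairs."""
    joint, p_val, p_tr = defaultdict(int), defaultdict(int), defaultdict(int)
    n = len(pairs)
    for v, t in pairs:
        joint[(v, t)] += 1
        p_val[v] += 1
        p_tr[t] += 1
    bits = 0.0
    for (v, t), c in joint.items():
        p_vt = c / n
        bits += p_vt * math.log2(p_vt / ((p_val[v] / n) * (p_tr[t] / n)))
    return bits

# Invented aligned data: (noun to the right of prendre, English verb chosen).
pairs = ([("part", "take")] * 40 + [("mesure", "take")] * 20
         + [("decision", "make")] * 30 + [("parole", "speak")] * 10)
print(informant_information(pairs))
```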

Other researchers have found a richer and more robust representation for words in the internal representation used within WordNet, a large hand-built computer-based lexicon (Beckwith et al., 1991). Because WordNet is fundamentally computer based, it can be organized as a large graph of different kinds of relations between words. These relations include not only relatively standard ones such as X is a synonym of Y, or X is an antonym of Y but also many other relations such as X is a kind of Y and X is a part of Y. Concepts within WordNet are represented by "synonym sets," sets of words that all share a core meaning. One simple representation of a meaning of a word, then, is just the synonym set, or synset, of the words that share that meaning.

This hierarchy has been used to investigate the automatic classification of verbs by the kinds of objects that they take, a first step toward determining the selectional restrictions of verbs automatically (Resnik, 1992). In this work, synonym sets are used to represent classes of objects in both the input and output of a program that computes a variant of the mutual information statistic discussed above. Using synonym sets for the output provides the general classification one seeks, of course. Using synonym sets for input as well has an important added advantage: it provides a solution to the sparse data problem that plagues work in lexical statistics. Many of the counts of verb-object pairs that make up the input to this program are very small and therefore unreliable, in particular given a corpus as small as the million-word Brown corpus (Francis, 1964; Francis and Kucera, 1982) used in this experiment. By pooling data for particular nouns into the synonym sets they fall into, much of this sparse data problem can be solved.
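A sketch of the pooling step: verb-object counts for individual nouns are summed over the classes those nouns belong to before an association score is computed. The noun-to-class map and counts are invented, and the score used here is plain pointwise mutual information rather than Resnik's exact statistic.

```python
import math
from collections import defaultdict

def class_association(verb_obj_counts, noun_classes):
    """Pool verb-object counts into object classes, then score each (verb, class)
    pair with pointwise mutual information over the pooled counts."""
    vc = defaultdict(int)                               # (verb, class) counts
    v_tot, c_tot, n = defaultdict(int), defaultdict(int), 0
    for (verb, noun), count in verb_obj_counts.items():
        for cls in noun_classes.get(noun, []):
            vc[(verb, cls)] += count
            v_tot[verb] += count
            c_tot[cls] += count
            n += count
    return {pair: math.log2((c / n) / ((v_tot[pair[0]] / n) * (c_tot[pair[1]] / n)))
            for pair, c in vc.items()}

# Invented counts and a tiny stand-in for WordNet's noun classes.
counts = {("open", "door"): 20, ("open", "box"): 10, ("eat", "bread"): 15, ("eat", "box"): 1}
classes = {"door": ["entrance"], "box": ["container"], "bread": ["nutrient"]}
print(class_association(counts, classes))
```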

Figure 7 gives one example of the performance of Resnik's statistic. These are the highest-ranking synonym sets for objects of the verb open. The names of the synonym sets were hand assigned within WordNet. Figure 8 gives the single highest ranking synonym set for a list of common verbs. These two experiments show that a statistical approach can do surprisingly well in extracting major aspects of the meaning of verbs, given the hand encoding of noun meanings within WordNet. These experiments suggest that it might be possible to combine the explicit linguistic knowledge in large hand-built computational lexicons, the implicit knowledge in a skeletally parsed corpus, and some novel statistical and information theoretic methods to automatically determine a wide variety of aspects of lexical semantics.

SynSet Name          Typical Members
entrance             door
mouth                mouth
repository           store, closet, locker, trunk
container            bag, trunk, locker, can, box, hamper
time_period          tour, round, season, spring, session, week, evening, morning, saturday
oral_communication   discourse, engagement, relation, reply, mouth, program, conference, session
writing              scene, book, program, statement, bible, paragraph, chapter

FIGURE 7 Classes of things that are opened. (Adapted from Resnik, 1992.)

Verb     Most Highly Associated Object SynSet
ask      question
call     someone
climb    stair
cook     repast
draw     cord
drink    beverage
eat      nutrient
lose     sensory_faculty
play     part
pour     liquid
pull     cover
push     button
read     written_material
sing     music

FIGURE 8 "Prototypical" classes of objects for common verbs. (Adapted from Resnik, 1992.)

The work described above is typical of much recent work in the area of lexical discovery. Other recent work has focused on, for example, the use of distributional techniques for discovery of collocations in text (Smadja, 1993) and of subcategorization frames for verbs (Brent, 1993), and on uncovering lexical semantic properties (Pustojovsky et al., 1993). Much other work has been done in this area; the references given here are typical rather than exhaustive.

A QUESTION FOR TOMORROW

While the focus above has been on the effectiveness of recent stochastic and statistical techniques, there is some evidence that this effectiveness is due in large measure to the empirical corpus-based nature of these techniques rather than to the power of stochastic modeling. Surprisingly, symbolic learning techniques have performed as well as stochastic methods on two tasks considered above, despite the fact that they learn only simple symbolic rules, with only simple counting used during training, and then only to choose one potential rule over another. This raises the question of whether the effectiveness of the stochastic techniques above is essentially due to the fact that they extract linguistic structure from a large collection of natural data or is the result of their statistical nature. This issue, I believe, will be resolved in the next several years.

In one experiment a very simple symbolic learner integrated with a parser for free text produced a set of symbolic lexical disambiguation rules for that parser. The parser, running with the new augmented grammar, if viewed only as a part-of-speech tagger, operates at about 95 percent word accuracy (Hindle, 1989). What makes this result all the more surprising is that this parser works strictly left to right in a fully deterministic fashion. Recently, a very simple symbolic learning technique called error-based transformation learning was applied to the tagging problem (Brill, 1992). The resultant tagger operates with a set of only 160 simple rules, plus a table of the most likely tag in isolation for each word. The tagger begins by tagging each word with the most likely tag in isolation and then applies each of the 160 rules in order to the entire corpus. These rules are of the form In a context X, change tag A to tag B. This tagger operates at about 96 percent word accuracy.
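A sketch of how such rules are applied (the error-driven search that learns the rules is not shown); the initial-tag table and the two rules are invented, not Brill's learned rule list.

```python
def initial_tags(words, most_likely):
    """Start by giving every word its most likely tag in isolation."""
    return [most_likely.get(w, "noun") for w in words]

def apply_rules(words, tags, rules):
    """Apply each rule, in order, over the whole sequence. A rule fires when the
    current tag is from_tag and the previous tag equals prev_tag."""
    for prev_tag, from_tag, to_tag in rules:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags

most_likely = {"can": "modal", "we": "pronoun", "the": "det"}
rules = [("pronoun", "modal", "verb"),   # after a pronoun, change modal to verb
         ("det", "modal", "noun")]       # after a determiner, change modal to noun
words = ["can", "we", "can", "the", "can"]
print(apply_rules(words, initial_tags(words, most_likely), rules))
# -> ['modal', 'pronoun', 'verb', 'det', 'noun']
```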

This learning algorithm has also been applied to the problem of bracketing English text, with very surprising results (Brill, 1993). The learner begins by assuming that English is strictly right branching and then learns a set of rules using exactly the same learning technique as used in the tagger discussed above, except that here the potential environments for rule application are very simple configurations in the bracketed string, for example, if some category X is preceded by a right paren then . . . or if a left paren falls between category X and category Y then. . . . There are only two possible rule operations that simply transform the binary branched trees with bracketings ((A B) C) and (A (B C)) into each other. The parser was trained on a small 500-sentence bracketed subset of material from the Wall Street Journal, of sentences less than 15 words in length, and acquired about 160 rules. Tested on a reserved test set of sentences of the same length, the parser bracketed 54 percent of the sentences completely consistently with the original bracketing; 72 percent of the sentences were bracketed with one bracketing error or less. Trained on only 250 sentences of length n, n < 25, the parser again acquired about 160 rules and parsed a similar reserved test set at 30 percent of sentences bracketed correctly. In recent unpublished experiments this same technique was applied to the problem of labeling these bracketed but unlabeled structures, achieving about 95 percent correct labeling, by node.

Perhaps this learning technique will lead to an even more powerful stochastic method of some kind. What is unique about this learner is that each rule applies to the output of the previous rule. But perhaps it will turn out that the power of these methods comes from use of a corpus itself. Time will tell.

ACKNOWLEDGMENTS

This work was partially supported by Defense Advanced Research Projects Agency (DARPA) Grant No. N0014-85-K0018, by DARPA and the Air Force Office of Scientific Research (AFOSR) jointly under Grant No. AFOSR-90-0066, and by Grant No. DAAL 03-89-C0031 PRI. Thanks to Eric Brill, Ken Church, Aravind Joshi, Mark Liberman, David Magerman, Yves Schabes, and David Yarowsky for helpful discussions. Thanks also to two anonymous reviewers for many excellent comments and suggestions.

REFERENCES

Baker, J. K. 1979. Trainable grammars for speech recognition. Proceedings of the Spring Conference of the Acoustical Society of America.

Beckwith, R., C. Fellbaum, G. Gross, and G. Miller. 1991. WordNet: A lexical database organized on psycholinguistic principles. Lexical Acquisition: Exploiting On-line Resources to Build a Lexicon, U. Zernik (ed.), Lawrence Erlbaum, Hillsdale, N.J., pp. 211-232.

Black, E., S. Abney, F. Flickenger, R. Grishman, P. Harrison, D. Hindle, R. Ingria, F. Jelinek, J. Klavans, M. Liberman, M. Marcus, S. Roukos, B. Santorini, and T. Strzalkowski. 1991. A procedure for quantitatively comparing the syntactic coverage of English grammars. Proceedings of the Fourth DARPA Speech and Natural Language Workshop, February.

Black, E., F. Jelinek, J. Lafferty, R. Mercer, and S. Roukos. 1992a. Decision tree models applied to the labeling of text with parts-of-speech. Proceedings of the DARPA Speech and Natural Language Workshop, February, pp. 117-121.

Black, E., F. Jelinek, J. Lafferty, D. M. Magerman, R. Mercer, and S. Roukos. 1992b. Towards history-based grammars: Using richer models for probabilistic parsing. Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics.

Black, E., J. Lafferty, and S. Roukos. 1993. Development and evaluation of a broad-coverage probabilistic grammar of English-language computer manuals. Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics.

Brent, M. 1993. From grammar to lexicon: Unsupervised learning of lexical syntax. Computational Linguistics, 19:243-262.


Brill, E. 1992. A simple rule-based part of speech tagger. Proceedings of the Third Conference on Applied Natural Language Processing, Trento, Italy.

Brill, E. 1993. Automatic grammar induction and parsing free text: A transformation-based approach. Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics.

Brill, E., D. Magerman, M. Marcus, and B. Santorini. 1990. Deducing linguistic structure from the statistics of large corpora. Proceedings of the DARPA Speech and Natural Language Workshop, June, pp. 275-282.

Brown, P., S. Della Pietra, V. Della Pietra, and R. Mercer. 1991. A statistical approach to sense disambiguation in machine translation. Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, Calif.

Church, K. 1985. Stress assignment in letter to sound rules for speech synthesis. Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics, pp. 246-253.

Church, K. 1988. A stochastic parts program and noun phrase parser for unrestricted text. Proceedings of the Second Conference on Applied Natural Language Processing. 26th Annual Meeting of the Association for Computational Linguistics, pp. 136-143.

Church, K., W. Gale, P. Hanks, and D. Hindle. 1991. Using Statistics in Lexical Analysis. AT&T Bell Laboratories Technical Memorandum.

Cutting, D., J. Kupiec, J. Pederson, and P. Sibun. 1992. A practical part-of-speech tagger. Proceedings of the Third Conference on Applied Natural Language Processing, ACL.

Francis, W. N. 1964. A Standard Sample of Present-Day English for Use with Digital Computers. Report to the U.S. Office of Education on Cooperative Research Project No. E-007. Brown University, Providence, R.I.

Francis, W. N., and H. Kucera. 1982. Frequency Analysis of English Usage: Lexicon and Grammar. Houghton Mifflin, Boston.

Fujisaki, T., F. Jelinek, J. Cocke, E. Black, and T. Nishino. 1989. A probabilistic method for sentence disambiguation. Proceedings of the 1st International Workshop on Parsing Technologies, Carnegie-Mellon University, Pittsburgh, Pa.

Garside, R., G. Leech, and G. Sampson. 1987. The Computational Analysis of English: A Corpus-Based Approach. Longman, London.

Harris, Z. 1951. Methods in Structural Linguistics. University of Chicago Press, Chicago.

Hemphill, C., J. Godfrey, and G. Doddington. 1990. The ATIS spoken language systems pilot corpus. Proceedings of the Third DARPA Speech and Natural Language Workshop, February.

Hindle, D. 1989. Acquiring disambiguation rules from text. Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics.

Hindle, D., and M. Rooth. 1993. Structural ambiguity and lexical relations. Computational Linguistics, 19:103-120.

Jelinek, F., J. D. Lafferty, and R. L. Mercer. 1991. Basic methods of probabilistic context-free grammars. Continuous Speech Recognition Group, IBM T. J. Watson Research Center.

Joshi, A., and Y. Schabes. 1992. Tree adjoining grammars and lexicalized grammars. Tree Automata and Languages, M. Nivat and A. Podelski (eds.). Elsevier, New York.

Lari, K., and S. J. Young. 1990. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4:35-56.

Magerman, D., and M. Marcus. 1991a. Parsing the Voyager domain using Pearl. Proceedings of the Fourth DARPA Speech and Natural Language Workshop, February.

Magerman, D., and M. Marcus. 1991b. PEARL—A probabilistic chart parser. Proceedings, Fifth Conference of the European Chapter of the Association for Computational Linguistics (EACL), Berlin, April.

Marcus, M., B. Santorini, and M. A. Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19:313-330.

Merialdo, B. 1991. Tagging text with a probabilistic model. Proceedings of ICASSP-91, pp. 809-812.

Pereira, F., and Y. Schabes. 1992. Inside-outside reestimation from partially bracketed corpora. Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics.

Pustejovsky, J., S. Bergler, and P. Anick. 1993. Lexical semantic techniques for corpus analysis. Computational Linguistics, 19:331-358.

Resnik, P. 1992. WordNet and distributional analysis: A class-based approach to lexical discovery. Workshop Notes, AAAI-92 Workshop on Statistically-Based NLP Techniques, July.

Schabes, Y., M. Roth, and R. Osborne. 1993. Parsing the Wall Street Journal with the inside-outside algorithm. Proceedings, Sixth Conference of the European Chapter of the Association for Computational Linguistics (EACL), Utrecht, April.

Sharman, R.A., F. Jelinek, and R. Mercer. 1990. Generating a grammar for statistical training. Proceedings of the Third DARPA Speech and Natural Language Workshop, February.

Smadja, F. 1993. Retrieving collocations from text: Xtract. Computational Linguistics, 19:143-178.

Steedman, M. 1993. Categorial grammar. Lingua, 90:221-258.

Weischedel, R., M. Meteer, R. Schwartz, L. Ramshaw, and J. Palmucci. 1993. Coping with ambiguity and unknown words through probabilistic models. Computational Linguistics, 19:359-382.

Zue, V., J. Glass, D. Goodine, H. Leung, M. McCandless, M. Phillips, J. Polifroni, and S. Seneff. 1990. Recent progress on the VOYAGER system. Proceedings of the Third DARPA Speech and Natural Language Workshop, June.
