New Trends in Natural Language Processing: Statistical Natural Language Processing
Pages 482-504

The Chapter Skim view presents the single chunk of text algorithmically identified as most significant on each page of the chapter.


From page 482...
... In the past, most work in computational linguistics tended to focus on purely symbolic methods. Recently, more and more work is shifting toward hybrid methods that combine new empirical corpus-based methods, including the use of probabilistic and information-theoretic techniques, with traditional symbolic methods.
From page 483...
... newswire were submitted to a range of the very best parsers in the United States, parsers expressly developed to handle text from natural sources. None of these parsers did very well; the majority failed on more than 60 percent of the test sentences, where the task was to find the one correct parse for each sentence in the test set.
From page 484...
... STATISTICAL TECHNIQUES: FIRST APPEARANCE One of the first demonstrations that stochastic modeling techniques, well known in the speech-processing community, might provide a way to cut through this impasse in NLP was the effective application of a simple letter trigram model to the problem of determining the national origin of proper names for use in text-to-speech systems (Church, 1985). Determining the etymology of names is crucial in this application because the pronunciation of identical letter strings differs greatly from language to language; the string GH, for example, is pronounced as a hard G in Italian, as in Aldrighetti, while most often pronounced as F or simply silent in English, as in laugh or sigh.
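To make the idea concrete, here is a minimal sketch (not Church's actual implementation) of the kind of letter trigram model this application relied on: one trigram model per language, with a new name assigned to whichever model gives it the highest probability. The toy name lists, add-one smoothing, and 27-symbol alphabet are illustrative assumptions.

```python
# A minimal sketch of per-language letter trigram models for classifying the
# likely language of origin of a proper name. Toy data, illustrative only.
import math
from collections import Counter

def trigram_counts(names):
    """Count letter trigrams over a list of names, padding with '#'."""
    counts = Counter()
    for name in names:
        padded = "##" + name.lower() + "#"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts

def log_likelihood(name, counts, vocab_size=27 ** 3):
    """Add-one-smoothed log probability of a name under a trigram model."""
    total = sum(counts.values())
    padded = "##" + name.lower() + "#"
    score = 0.0
    for i in range(len(padded) - 2):
        score += math.log((counts[padded[i:i + 3]] + 1) / (total + vocab_size))
    return score

# Toy training data; a real system would use large name lists per language.
models = {
    "Italian": trigram_counts(["Aldrighetti", "Borghese", "Ghirlanda"]),
    "English": trigram_counts(["Laughton", "Wright", "Haugh"]),
}

def classify(name):
    return max(models, key=lambda lang: log_likelihood(name, models[lang]))

print(classify("Ghiberti"))   # "Italian" under these toy counts
```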
From page 485...
... THE ARCHITECTURE OF AN NLU SYSTEM Figure 1a gives an overview of a few of the crucial steps in the process of decoding a sentence in a conversational NLU system, given that the words that make up the sentence have been determined either by a speech recognition system or by tokenization of an ASCII source. When a new sentence comes in, it is analyzed by a parser that both determines what part of speech to assign to each of the words
From page 486...
... In the area of lexical semantics, a range of promising techniques for performing word-sense disambiguation have emerged in just the last year, as well as some preliminary work in automatically determining the selectional restrictions of verbs, that is, what kinds of things can serve as the subject or object of a given verb. Finally, all of these methods depend crucially on the availability of training materials annotated with the appropriate linguistic structure.
From page 487...
... This problem of lexical disambiguation is a central problem in building any NLP system; given a realistically large lexicon of English, many common words are used in multiple parts of speech. Determining what function each word plays in context is a crucial part of either assigning correct grammatical structure for purposes of later semantic analysis or providing a partial heuristic chunking of the input into phrases for purposes of assigning intonation in a text-to-speech synthesizer.
From page 488...
... While earlier work provides evidence that handcrafted symbolic representations of linguistic knowledge are insufficient to provide industrial-strength NLP, it also appears that the use of statistical methods without some incorporation of linguistic knowledge is insufficient as well. This linguistic knowledge may either be represented in implicit form, as in the use of a pretagged corpus here, or encoded explicitly in the form of a grammar. In the next few years, I believe we are going to see stochastic techniques and linguistic knowledge more and more deeply interleaved.
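As an illustration of how the implicit linguistic knowledge in a pretagged corpus can drive a stochastic technique, the sketch below trains a bigram HMM part-of-speech tagger by relative frequency and decodes with Viterbi search. The toy corpus, tag set, and add-one smoothing are assumptions for the example, not the systems discussed in the chapter.

```python
# A minimal bigram HMM part-of-speech tagger trained on a (toy) pretagged corpus.
import math
from collections import Counter, defaultdict

tagged_corpus = [
    [("the", "DET"), ("can", "NOUN"), ("rusted", "VERB")],
    [("I", "PRON"), ("can", "VERB"), ("run", "VERB")],
    [("we", "PRON"), ("can", "VERB"), ("go", "VERB")],
]

emit = defaultdict(Counter)   # emit[tag][word]
trans = defaultdict(Counter)  # trans[previous_tag][tag]
for sent in tagged_corpus:
    prev = "<s>"
    for word, tag in sent:
        emit[tag][word.lower()] += 1
        trans[prev][tag] += 1
        prev = tag

def logp(counter, key):
    """Add-one-smoothed log probability from a count table."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + len(counter) + 1))

def viterbi(words):
    """Best tag sequence under the bigram model (exhaustive over the toy tag set)."""
    tags = list(emit)
    best = {t: (logp(trans["<s>"], t) + logp(emit[t], words[0].lower()), [t]) for t in tags}
    for w in words[1:]:
        best = {
            t: max(
                (score + logp(trans[p], t) + logp(emit[t], w.lower()), path + [t])
                for p, (score, path) in best.items()
            )
            for t in tags
        }
    return max(best.values())[1]

print(viterbi(["I", "can", "run"]))  # -> ['PRON', 'VERB', 'VERB']
```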
From page 489...
... The key point here is that these techniques for unseen words go beyond using purely stochastic techniques to using implicit and explicit linguistic knowledge, although in a trivial way, to get the job done. STOCHASTIC PARSING All work on stochastic parsing begins with the development of the inside/outside algorithm (Baker, 1979)
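The "inside" half of the inside/outside algorithm can be sketched compactly for a PCFG in Chomsky normal form: inside(i, j, A) is the probability that nonterminal A derives the words from position i to position j, built up over increasingly long spans. The toy grammar below is an assumption for illustration, not a grammar from the chapter.

```python
# Sketch of the "inside" pass at the core of the inside/outside algorithm:
# inside[(i, j, A)] = probability that nonterminal A derives words[i:j].
from collections import defaultdict

binary = {("S", "NP", "VP"): 1.0,    # A -> B C with probability p
          ("VP", "V", "NP"): 1.0}
lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5,   # A -> word
           ("V", "eats"): 1.0}

def inside_probs(words):
    n = len(words)
    inside = defaultdict(float)
    for i, w in enumerate(words):                     # length-1 spans
        for (A, word), p in lexical.items():
            if word == w:
                inside[(i, i + 1, A)] += p
    for span in range(2, n + 1):                      # longer spans, bottom up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                 # split point
                for (A, B, C), p in binary.items():
                    inside[(i, j, A)] += p * inside[(i, k, B)] * inside[(k, j, C)]
    return inside

words = ["she", "eats", "fish"]
print(inside_probs(words)[(0, len(words), "S")])      # 0.25 with these toy rules
```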
From page 490...
... In recent years a range of new grammatical formalisms have been proposed that some suggest have the potential to solve a major part of this problem. These formalisms, called lexicalized grammar formalisms, express grammars in which the entire grammar consists of complex structures associated with individual words, plus some very simple general rules for combining these structures.
From page 491...
... Conditioning PCFG Rules on Linguistic Context One new class of models uses linguistic knowledge to condition the probabilities of standard probabilistic context-free grammars. These new models, which in essence augment PCFG grammar rules with probabilistic applicability constraints, are based on the hypothesis that the inability of PCFGs to parse with high accuracy is due to the failure of PCFGs to model crucial aspects of linguistic structure relevant to the appropriate selection of the next grammar rule at each point within a context-free derivation.
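One very simple instance of this idea is to condition each rule on the parent of the expanding nonterminal; the sketch below estimates such parent-conditioned rule probabilities by relative frequency from a toy treebank. The models described in the chapter condition on richer linguistic context; this example only illustrates the general mechanism.

```python
# Illustrative sketch: PCFG rule probabilities conditioned on the parent of the
# expanding nonterminal. Trees are (label, children...) tuples; toy data only.
from collections import Counter, defaultdict

def count_rules(tree, parent, rule_counts, lhs_counts):
    label, *children = tree
    if children and isinstance(children[0], tuple):    # internal node
        rhs = tuple(child[0] for child in children)
        rule_counts[(label, parent)][rhs] += 1
        lhs_counts[(label, parent)] += 1
        for child in children:
            count_rules(child, label, rule_counts, lhs_counts)

treebank = [
    ("S", ("NP", ("PRON", "she")), ("VP", ("V", "eats"), ("NP", ("N", "fish")))),
]

rule_counts = defaultdict(Counter)
lhs_counts = Counter()
for tree in treebank:
    count_rules(tree, "TOP", rule_counts, lhs_counts)

def rule_prob(lhs, parent, rhs):
    """P(lhs -> rhs | lhs, parent): relative frequency in the treebank."""
    total = lhs_counts[(lhs, parent)]
    return rule_counts[(lhs, parent)][rhs] / total if total else 0.0

print(rule_prob("NP", "S", ("PRON",)))   # 1.0: subject NPs expand to PRON here
print(rule_prob("NP", "VP", ("PRON",)))  # 0.0: object NPs never do in this toy data
```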
From page 492...
... Although the performance of this algorithm is quite impressive in isolation, the sentences in this corpus are somewhat simpler in structure than those in other spoken language domains and are certainly much simpler than sentences from newswire services that were the target of the parser evaluation discussed in the introduction to this chapter. On the other hand, a simple PCFG for this corpus parses a reserved test set with only about 35 percent accuracy, comparable to PCFG performance in other domains.
From page 493...
... While these spurious parses would be a problem if the grammar were used with a purely symbolic parser, the hope is that when used within a stochastic framework, spurious parses will be of much lower probability than the desired analyses. One simple method for combining explicit linguistic knowledge with stochastic techniques is to use a stochastic technique to estimate the probability distribution for all and only the rules within the grammar, drastically limiting the number of parameters that need to be estimated within the stochastic model.
From page 494...
... The decision tree uses this set of questions to search for the grammar implicit in a very large hand-annotated corpus. Published reports of early stages of this work indicate that this technique is 70 percent correct on computer manual sentences of length 7 to 17, where, to count as correct, each parse must exactly match the prior hand analysis of that sentence in the test corpus, a more stringent test criterion than any other result mentioned here.
From page 495...
... Surprisingly, some preliminary work over the past several years indicates that many aspects of lexical semantics can be derived from existing resources using statistical techniques. Several years ago it was discovered that methods from statistics and information theory could be used to "tease out" distinctions between words, as an aid to lexicographers developing new dictionaries
From page 496...
... Figure 4 shows the mutual information score, an information-theoretic measure, between various verbs and the nouns food and water in an automatically parsed corpus, where either food or water is the object of that verb or, more precisely, where one or the other is the head of the noun phrase that is the object of the verb. The corpus used in this experiment consists of 25 million subject-verb-object triples automatically extracted from the AP newswire by the use of a parser for unrestricted text.
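A sketch of the underlying computation: the (pointwise) mutual information between a verb and the head noun of its object, estimated from verb-object pairs. The pairs below are toy stand-ins for the 25 million parsed triples described above.

```python
# Pointwise mutual information between verbs and object head nouns,
# contrasting the verbs that take "food" vs. "water". Toy counts only.
import math
from collections import Counter

verb_object_pairs = [
    ("eat", "food"), ("eat", "food"), ("cook", "food"),
    ("drink", "water"), ("drink", "water"), ("pour", "water"),
    ("eat", "water"),   # noise
]

pair_counts = Counter(verb_object_pairs)
verb_counts = Counter(v for v, _ in verb_object_pairs)
noun_counts = Counter(n for _, n in verb_object_pairs)
total = len(verb_object_pairs)

def pmi(verb, noun):
    """log2 [ P(verb, noun) / (P(verb) P(noun)) ]; None if the pair is unseen."""
    if pair_counts[(verb, noun)] == 0:
        return None
    p_joint = pair_counts[(verb, noun)] / total
    return math.log2(p_joint / ((verb_counts[verb] / total) * (noun_counts[noun] / total)))

for verb in ("eat", "drink"):
    for noun in ("food", "water"):
        print(f"I({verb}; {noun}) = {pmi(verb, noun)}")
```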
From page 497...
... Perhaps surprisingly, these descriptions get quite close to the heart of the difference between food and water. These experiments show that statistical techniques can be used to tease out aspects of lexical semantics in such a way that a human lexicographer could easily take advantage of this information.
From page 498...
... Sense one translates as take, sense two as make. FIGURE 6: The two senses of prendre translate as take or make.
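In the spirit of using aligned bilingual text as sense-tagged training data, the sketch below picks take or make for prendre from a single context word with a naive-Bayes-style score. The counts are invented for illustration; a real experiment would derive them from aligned French-English sentence pairs.

```python
# Toy sketch: choosing the translation (sense) of "prendre" from a context word.
import math
from collections import Counter

# counts[(sense, context_word)] would come from aligned sentence pairs;
# these numbers are invented for illustration.
counts = Counter({
    ("take", "train"): 40, ("take", "livre"): 25, ("take", "décision"): 5,
    ("make", "décision"): 50, ("make", "mesure"): 30, ("make", "train"): 1,
})
sense_totals = Counter()
for (sense, _), c in counts.items():
    sense_totals[sense] += c
vocab = {word for _, word in counts}

def choose_sense(context_word, senses=("take", "make")):
    """Pick the translation with the higher naive-Bayes-style score."""
    def score(sense):
        prior = sense_totals[sense] / sum(sense_totals.values())
        likelihood = (counts[(sense, context_word)] + 1) / (sense_totals[sense] + len(vocab))
        return math.log(prior) + math.log(likelihood)
    return max(senses, key=score)

print(choose_sense("décision"))  # "make"  (prendre une décision -> make a decision)
print(choose_sense("train"))     # "take"  (prendre le train -> take the train)
```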
From page 499...
... These two experiments show that a statistical approach can do surprisingly well in extracting major aspects of the meaning of verbs, given the hand encoding of noun meanings within WordNet. These experiments suggest that it might be possible to combine the explicit linguistic knowledge in large hand-built computational lexicons, the implicit knowledge in a skeletally parsed corpus, and some novel statistical and information theoretic methods to automatically determine a wide variety of aspects of lexical semantics.
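A minimal sketch of that combination, with a hand-coded noun-class table standing in for WordNet and a handful of verb-object pairs standing in for the skeletally parsed corpus; the data and class names are assumptions for illustration only.

```python
# Sketch: characterize what a verb "selects for" by mapping its object nouns
# into hand-coded classes (a stand-in for WordNet). Toy data only.
from collections import Counter, defaultdict

noun_class = {"food": "SUBSTANCE", "water": "SUBSTANCE",
              "idea": "ABSTRACTION", "plan": "ABSTRACTION"}

verb_object_pairs = [("drink", "water"), ("eat", "food"),
                     ("propose", "idea"), ("propose", "plan"), ("drink", "water")]

class_counts = defaultdict(Counter)
for verb, noun in verb_object_pairs:
    class_counts[verb][noun_class[noun]] += 1

def preferred_class(verb):
    """The noun class this verb's objects most often fall into."""
    return class_counts[verb].most_common(1)[0][0]

print(preferred_class("drink"))    # SUBSTANCE
print(preferred_class("propose"))  # ABSTRACTION
```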
From page 500...
... and of subcategorization frames for verbs (Brent, 1993) and to uncover lexical semantic properties (Pustejovsky et al., 1993).
From page 501...
... In one experiment a very simple symbolic learner integrated with a parser for free text produced a set of symbolic lexical disambiguation rules for that parser. The parser, running with the new augmented grammar, if viewed only as a part-of-speech tagger, operates at about 95 percent word accuracy (Hindle, 1989).
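The following toy sketch shows the flavor of extracting symbolic lexical disambiguation rules from tagged text: for each ambiguous word, one rule per observed left context. It is a decision-list-style illustration, not the learner actually used with Hindle's parser.

```python
# Toy sketch of extracting symbolic lexical disambiguation rules from tagged text.
from collections import Counter, defaultdict

tagged_sentences = [
    [("the", "DET"), ("can", "NOUN"), ("leaks", "VERB")],
    [("we", "PRON"), ("can", "VERB"), ("leave", "VERB")],
    [("a", "DET"), ("can", "NOUN"), ("opener", "NOUN")],
]

# evidence[word][previous_tag] counts the tags observed in that context
evidence = defaultdict(lambda: defaultdict(Counter))
for sent in tagged_sentences:
    prev = "<s>"
    for word, tag in sent:
        evidence[word.lower()][prev][tag] += 1
        prev = tag

def extract_rules(word):
    """One rule per observed left context: 'after tag X, tag word as Y'."""
    rules = []
    for prev, tag_counts in evidence[word.lower()].items():
        tag, _ = tag_counts.most_common(1)[0]
        rules.append((prev, word, tag))
    return rules

for prev, word, tag in extract_rules("can"):
    print(f"if previous tag is {prev}: tag '{word}' as {tag}")
# if previous tag is DET: tag 'can' as NOUN
# if previous tag is PRON: tag 'can' as VERB
```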
From page 502...
... Proceedings of the Fourth DARPA Speech and Natural Language Workshop, February. Black, E., F
From page 503...
... Proceedings of the Second Conference on Applied Natural Language Processing. 26th Annual Meeting of the Association for Computational Linguistics, pp.
From page 504...
... Proceedings, Sixth Conference of the European Chapter of the Association for Computational Linguistics (EACL), Utrecht, April.

