National Academy of Sciences | 150 Year Anniversary

Questions? Call 800-624-6242

| Items in cart [0]

The National Academies Press

Rights & Permissions

topleft topright

Computing, Communication, and the Information Age (1994)
Commission on Physical Sciences, Mathematics, and Applications (CPSMA)

Citation Manager

. "Information Capture and Access." Computing, Communication, and the Information Age. Washington, DC: The National Academies Press, 1994.

Please select a format:

BibTeX EndNote RefMan


Page
15
bottomleft bottomright

The following HTML text is provided to enhance online readability. Many aspects of typography translate only awkwardly to HTML. Please use the page image as the authoritative form to ensure accuracy.


COMPUTING, COMMUNICATION, AND THE INFORMATION AGE

FIGURE 11 Blank document.

characters. For a mathematical equation, the presence of subscripts and superscripts generates additional small peaks. A theory of blank space in lines and documents would aid in building tools for accessing text.

Why is it important to analyze text automatically? There are many reasons. As has already been mentioned, allowing standard database queries of English text documents in the case where the document has tables is one reason. Another reason involves automatic indexing. Consider WAIS, a tool created by Apple Computer, Dow Jones, Thinking Machines, and Dunn and Bradstreet for locating information from wire services and other text sources (Kahle, 1989). Intuitively, WAIS operates by taking a document, sorting the words, and creating a vector whose components indicate the number of occurrences of various words in the document. One retrieves a document by providing a set of keywords. The keywords are used to form a search vector. The dot product of the search vector is computed with each vector representing a document in the collection. (This is basically the SMART technology (Salton, 1991). A short list of documents with the highest dot products is produced. The documents can then be displayed.

Page
15