
Mapping Knowledge Domains (2004) / Chapter Skim
Currently Skimming:

From paragraph to graph: Latent semantic analysis for information visualization
Pages 32-37

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 32...
... provides an effective dimension reduction method for the purpose that reflects synonymy and the sense of arbitrary word combinations (2, 3). Latent Semantic Analysis (LSA) is one of a growing number of corpus-based techniques that employ statistical machine learning in text analysis.
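The dimension-reduction idea in this excerpt can be sketched on a toy example. The block below is a minimal illustration, not the authors' implementation: the five terms, the count matrix, and the helper names (`truncated_svd`, `term_vectors`) are all hypothetical, raw counts are used without any term weighting, and the SVD is approximated by simple power iteration with deflation so that only the Python standard library is needed.

```python
import math
import random

def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def cosine(x, y):
    return sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y))

def truncated_svd(a, k, iters=500, seed=0):
    """Approximate the top-k singular triplets by power iteration with deflation."""
    rng = random.Random(seed)
    residual = [row[:] for row in a]
    triplets = []
    for _ in range(k):
        residual_t = transpose(residual)
        v = [rng.gauss(0.0, 1.0) for _ in range(len(residual_t))]
        for _ in range(iters):
            # One step of power iteration on (residual^T residual).
            v = matvec(residual_t, matvec(residual, v))
            n = norm(v)
            v = [x / n for x in v]
        u = matvec(residual, v)
        sigma = norm(u)
        u = [x / sigma for x in u]
        triplets.append((sigma, u, v))
        # Remove the component just found before looking for the next one.
        residual = [[residual[i][j] - sigma * u[i] * v[j]
                     for j in range(len(v))] for i in range(len(u))]
    return triplets

# Hypothetical term-by-passage counts: rows are terms, columns are passages.
TERMS = ["car", "automobile", "engine", "flower", "petal"]
COUNTS = [
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
]

def term_vectors(counts, k):
    """Term coordinates in the k-dimensional space (left vectors scaled by sigma)."""
    triplets = truncated_svd(counts, k)
    return [[sigma * u[i] for sigma, u, _ in triplets] for i in range(len(counts))]

raw = dict(zip(TERMS, COUNTS))
vectors = dict(zip(TERMS, term_vectors(COUNTS, k=2)))

print("raw cosine car/automobile:    ", cosine(raw["car"], raw["automobile"]))
print("reduced cosine car/automobile:", round(cosine(vectors["car"], vectors["automobile"]), 3))
```

In this toy matrix "car" and "automobile" never share a passage, so their raw cosine is 0, yet both co-occur with "engine" and end up pointing the same way in the two-dimensional space. That is the sense in which the reduced space can reflect synonymy.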
From page 33...
... By matching documents with similar meanings but different words, LSA improved recall in information retrieval, usually achieving 10-30% better performance ceteris paribus by standard metrics, again doing best with ~300 dimensions (9). LSA has been found to measure coherence of text in such a way as to predict human comprehension as well as sophisticated psycholinguistic analysis, whereas measures of surface word overlap fail badly (10).
From page 34...
... In an expedience-dictated procedure differing from the optimal process described above, we first divided the corpus into 317,115 paragraph-like passages containing an average of 212 word tokens, and applied SVD to the resulting matrix of 240,718 terms by 317,115 passages.
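A quick back-of-the-envelope calculation shows why a matrix of this size is normally held in sparse form. The sketch below uses only the three figures quoted in the excerpt. The average of 212 tokens is presumably rounded, so the totals are approximate, and `sparse_counts` is a hypothetical helper with naive whitespace tokenization, not the authors' preprocessing.

```python
from collections import Counter

N_TERMS = 240_718      # rows, from the excerpt
N_PASSAGES = 317_115   # columns, from the excerpt
AVG_TOKENS = 212       # average word tokens per passage, from the excerpt

total_tokens = N_PASSAGES * AVG_TOKENS
total_cells = N_TERMS * N_PASSAGES
# Each token fills at most one cell, and repeated words in a passage share
# a cell, so this ratio is an upper bound on the fraction of nonzero cells.
max_density = total_tokens / total_cells

def sparse_counts(passages):
    """Map (term, passage_index) -> count, storing only nonzero cells."""
    cells = Counter()
    for j, text in enumerate(passages):
        for term in text.lower().split():
            cells[(term, j)] += 1
    return cells

print(f"{total_tokens:,} tokens across {total_cells:,} cells")
print(f"at most {max_density:.4%} of cells are nonzero")
```

Under these assumptions fewer than one cell in a thousand can be nonzero, so storing only the occupied cells is far cheaper than a dense array.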
From page 35...
... Table 1. Top two and bottom two titles in the amount of topic overlap, as determined by cos product
Title | cos product
In vitro properties of the first ORF protein from mouse LINE-1 support its role in ribonucleoprotein particle formation during retrotransposition | 0.664
Prospero is a panneural transcription factor that modulates homeodomain protein activity | 0.656
Chondrocytes as a specific target of ectopic Fos expression in early development | 0.492
c-Myc transactivation of LDH-A: Implications for tumor metabolism and growth | 0.490
PNAS | April 6, 2004 | vol.
From page 36...
... in 6 years of PNAS articles is shown in SVD dimensions 3 and 4. In all of these figures, the article title "Primordial nucleosynthesis" was used as the query (shown as the black circle).
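Viewing articles "in SVD dimensions 3 and 4" amounts to keeping two coordinates of each article's reduced-space vector and marking the query point. The sketch below assumes that reading. The coordinates, the neighbor names, and the helpers `project` and `view` are hypothetical, and the excerpt does not say how the displayed articles were selected.

```python
# Hypothetical reduced-space coordinates (five SVD dimensions per article).
ARTICLES = {
    "Primordial nucleosynthesis": [0.9, 0.1, 0.4, -0.2, 0.0],
    "neighbor A": [0.8, 0.2, 0.5, -0.1, 0.1],
    "neighbor B": [0.7, -0.1, -0.3, 0.6, 0.2],
}
QUERY = "Primordial nucleosynthesis"

def project(vectors, dims=(3, 4)):
    """Keep two coordinates per article; dims are 1-indexed, as in the text."""
    first, second = dims[0] - 1, dims[1] - 1
    return {title: (vec[first], vec[second]) for title, vec in vectors.items()}

def view(vectors, query, dims=(3, 4)):
    """Return (x, y, role) tuples, tagging the query so it can be drawn differently."""
    points = project(vectors, dims)
    return [(x, y, "query" if title == query else "other")
            for title, (x, y) in points.items()]

for x, y, role in view(ARTICLES, QUERY):
    print(f"{role:5s} x={x:+.1f} y={y:+.1f}")
```

Any plotting library could then draw the "query" point as a filled black circle and the rest as open markers. Changing `dims` gives a different two-dimensional view of the same vectors.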
From page 37...
... This kind of search has much the same goal as recent attempts to automatically cluster subsets of returns, but it allows and relies on visual search by the user. Such search can reveal patterns and shapes, such as clouds, gradual intermixings, and scattered islands, that hard-boundary clustering usually misses but that the human visual system has evolved to detect in still mysterious ways. Of course, it might someday be possible to devise computational procedures that automatically find views optimized according to these or other objectives.
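As a purely illustrative sketch of what such an automatic procedure might look like, the block below scores every pair of dimensions and keeps the best one. The objective (mean distance of the points from the query within the two-dimensional view), the function `best_view`, and the data are all assumptions made for illustration. The excerpt names no specific objective or algorithm.

```python
import itertools
import math

def best_view(points, query):
    """Return the 0-indexed pair of dimensions in which points spread most around the query."""
    def spread(pair):
        i, j = pair
        # Mean Euclidean distance from the query in the (i, j) plane.
        return sum(math.hypot(p[i] - query[i], p[j] - query[j])
                   for p in points) / len(points)
    return max(itertools.combinations(range(len(query)), 2), key=spread)

# Hypothetical four-dimensional vectors that vary mostly in dimensions 1 and 3.
QUERY_VEC = [0.0, 0.0, 0.0, 0.0]
POINTS = [
    [0.0, 1.0, 0.0, 2.0],
    [0.0, -1.0, 0.0, -2.0],
    [0.1, 2.0, 0.0, 1.0],
    [0.0, -2.0, 0.1, -1.0],
]

print(best_view(POINTS, QUERY_VEC))
```

An exhaustive search over pairs like this is only practical for a modest number of dimensions, and a different objective would generally select a different view.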

