Mapping Knowledge Domains (2004) / Chapter Skim
Currently Skimming:

COLLOQUIUM PAPERS: Extracting knowledge from the World Wide Web
Pages 4-9

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 4...
... Their main idea is to perform a random walk so that a page is visited by the walk with probability roughly proportional to its PageRank (2) value, and then to sample the visited pages with probability inversely proportional to their PageRank value.
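The two-stage idea in this excerpt can be illustrated with a minimal sketch. It is not the cited authors' implementation: the toy graph, the restart probability `teleport`, and the use of visit counts as a stand-in for PageRank are all assumptions made for illustration.

```python
import random
from collections import Counter

def random_walk(graph, steps, teleport=0.15, seed=0):
    """Return the pages visited by a random walk with random restarts.

    Visit frequencies of such a walk approximate PageRank
    (assumption: teleport=0.15 is an illustrative restart probability).
    """
    rng = random.Random(seed)
    pages = sorted(graph)
    current = pages[0]
    path = []
    for _ in range(steps):
        path.append(current)
        out_links = graph[current]
        if out_links and rng.random() >= teleport:
            current = rng.choice(out_links)   # follow a random out-link
        else:
            current = rng.choice(pages)       # restart at a random page
    return path

def near_uniform_sample(path, seed=1):
    """Keep each visit with probability inversely proportional to the
    page's visit count (used here as a PageRank estimate)."""
    rng = random.Random(seed)
    visits = Counter(path)
    rarest = min(visits.values())
    return [p for p in path if rng.random() < rarest / visits[p]]

# Hypothetical four-page graph: page -> list of pages it links to.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

path = random_walk(toy_graph, steps=20000)
sample = near_uniform_sample(path)
```

In the walk itself, heavily linked pages such as `c` dominate the visit counts; after the second stage, every page appears in `sample` about equally often.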
From page 5...
... To minimize the bias, we can use the domain name system to identify multiple IP addresses serving the same content, and consider only the lowest numbered address to be part of the publicly indexable web. Most major sites are not virtually hosted, and few public servers operate on a nonstandard port.
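A minimal sketch of the "lowest numbered address" rule follows. It assumes the DNS lookups have already been done and are available as a mapping from host name to IPv4 addresses; the host names and addresses below are hypothetical (documentation ranges), and the helper names are not from the paper.

```python
import ipaddress

def lowest_address(addresses):
    """Return the numerically lowest IPv4 address from a list of strings.

    Addresses are compared as numbers, not as text, so 192.0.2.9
    sorts below 192.0.2.10.
    """
    return str(min(ipaddress.IPv4Address(a) for a in addresses))

def canonical_servers(dns_records):
    """Keep one address per host: the lowest one among those that
    DNS reports as serving the same content."""
    return {lowest_address(addrs) for addrs in dns_records.values()}

# Hypothetical DNS results: host name -> addresses serving the same content.
records = {
    "www.example.com": ["192.0.2.10", "192.0.2.9", "192.0.2.200"],
    "example.org": ["198.51.100.7"],
}

servers = canonical_servers(records)
# servers == {"192.0.2.9", "198.51.100.7"}
```

Counting only the canonical address per host keeps a site that answers on several addresses from being counted several times when sampling the address space.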
From page 6...
... For example, among web pages of the same category, link distributions can diverge strongly from power law scaling, exhibiting a roughly log-normal distribution. In earlier models predicting a power law distribution, most members of a community fare poorly: they have few or no links pointing to them.
From page 7...
... However, if one assumes the existence of one or more seed web sites and exploits systematic regularities of the web graph (8, 30, 31), the problem can be recast into a framework that allows for efficient community identification using a polynomial time algorithm.
From page 8...
... Bipartite subgraph identification, cocitation, and bibliographic coupling are localized approaches that aim to identify well-defined graph structures existing in a narrow region of the web graph. PageRank, HITS, and spreading activation energy (SAE)
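Two of the localized measures named in this excerpt can be sketched under their standard definitions: the cocitation count of two pages is the number of pages that link to both, and their bibliographic coupling is the number of pages that both link to. The link table and function names below are hypothetical.

```python
def cocitation(links, u, v):
    """Number of pages that link to both u and v."""
    return sum(1 for targets in links.values() if u in targets and v in targets)

def bibliographic_coupling(links, u, v):
    """Number of pages that both u and v link to."""
    return len(set(links.get(u, ())) & set(links.get(v, ())))

# Hypothetical link table: page -> pages it links to.
links = {
    "p1": ["a", "b"],
    "p2": ["a", "b", "c"],
    "p3": ["a"],
    "a": ["x", "y"],
    "b": ["y", "z"],
}

print(cocitation(links, "a", "b"))              # 2 (p1 and p2 cite both)
print(bibliographic_coupling(links, "a", "b"))  # 1 (both link to y)
```

Both measures look only at the immediate neighbors of the two pages, which is the sense in which they are localized.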
From page 9...
... (1996) Spectral Graph Theory, CBMS Lecture Notes (Am. Math.

