

3. Knowledge Discovery: Data Mining and Search
Pages 12-15



From page 12...
... An overarching challenge is to develop a new level of capability in automated search and mining. For example, instead of merely retrieving Web pages that concern the safety records of automobiles, a system should be able to satisfy requests such as, "Compile a report with all available information on the safety of the 1998 Ford Taurus." This task requires the ability to retrieve information stored in tables or sentences on various Web pages, judge its relevance and reliability, and present the results in an easily understood document.
From page 13...
... Web Social Structure and Search Jon Kleinberg of Cornell University focused on the information content and social structure of the Web and on the problems of Web searching. He noted that searching is made difficult by two opposing characteristics.
From page 14...
... This effort has led to interesting interactions between computer science and the mathematical sciences. Valiant's model of PAC learning is connected to Vapnik's ideas about classification and pattern recognition; the computational algorithms minimize a cost function plus a term to restrict the search space, just as in regularization methods for ill-posed problems.
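The "cost function plus a restricting term" idea can be made concrete with Tikhonov (ridge) regularization, the classic remedy for ill-posed linear problems. The sketch below is an illustration chosen here, not a method described in the text. It minimizes ||Ax - b||² + λ||x||² by solving the normal equations (AᵀA + λI)x = Aᵀb. The function names `tikhonov` and `solve` and the penalty weight `lam` are hypothetical, and the code is pure Python so it runs without external libraries.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(m: Matrix, v: Vector) -> Vector:
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    # Augmented matrix [m | v], so row operations act on both sides at once.
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def tikhonov(a: Matrix, b: Vector, lam: float) -> Vector:
    """Minimize ||a x - b||^2 + lam * ||x||^2 (hypothetical helper).

    The first term is the data-fit cost; the lam * ||x||^2 term restricts
    the search space by penalizing large solutions.
    """
    rows, cols = len(a), len(a[0])
    # Form a^T a and a^T b for the normal equations.
    ata = [[sum(a[k][i] * a[k][j] for k in range(rows)) for j in range(cols)]
           for i in range(cols)]
    for i in range(cols):
        ata[i][i] += lam  # adding lam * I shrinks the solution toward zero
    atb = [sum(a[k][i] * b[k] for k in range(rows)) for i in range(cols)]
    return solve(ata, atb)
```

For example, with `a = [[1, 0], [0, 1], [1, 1]]` and `b = [1, 2, 3]`, `tikhonov(a, b, 0.0)` recovers the exact fit `[1.0, 2.0]`, while `tikhonov(a, b, 1.0)` returns `[0.875, 1.375]`, a solution pulled toward the origin by the penalty. Larger λ trades fit for stability, which matters most when AᵀA is nearly singular.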
From page 15...
... These connections highlight the importance of basic linear algebra constructions such as the graph Laplacian in understanding Internet structure and search. Lovász noted that "discrete versus continuous" is not the right dichotomy, since all of these graph concepts have analogs in the continuous setting, and the problems are closely related to finding good embeddings of finite metric spaces into each other with little distortion.
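As a minimal sketch of the graph Laplacian (an illustration added here, not taken from the discussion), the code below builds L = D − A for a simple undirected graph, where D is the diagonal degree matrix and A the adjacency matrix. It then checks the standard identity xᵀLx = Σ over edges (i, j) of (xᵢ − xⱼ)². That identity is what ties the Laplacian to embeddings: assigning each vertex a coordinate xᵢ, a small value of xᵀLx means adjacent vertices are placed close together. The helper names `laplacian`, `quadratic_form`, and `edge_energy` are hypothetical, and the code assumes no self-loops or repeated edges.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def laplacian(n: int, edges: List[Edge]) -> List[List[int]]:
    """Return L = D - A for a simple undirected graph on vertices 0..n-1."""
    lap = [[0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][j] -= 1  # off-diagonal: minus the adjacency entry
        lap[j][i] -= 1
        lap[i][i] += 1  # diagonal: vertex degree
        lap[j][j] += 1
    return lap


def quadratic_form(lap: List[List[int]], x: Sequence[float]) -> float:
    """Compute x^T L x directly from the matrix."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))


def edge_energy(edges: List[Edge], x: Sequence[float]) -> float:
    """Sum of squared coordinate differences across edges."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)
```

On the path graph `0-1-2-3` with coordinates `x = [0, 1, 3, 6]`, both `quadratic_form(laplacian(4, edges), x)` and `edge_energy(edges, x)` evaluate to 14. Every row of L sums to zero, so a constant embedding always gives zero; spectral methods therefore look at the eigenvectors of L beyond the constant one.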

