Problem 34. How can a search engine solicit feedback from the user about how well the retrieved documents satisfy the query, and how can the engine make subsequent use of such information? In addition, how could retrieval be tailored to an individual user, making intelligent guesses, for instance, about whether a request for “Michael Jordan ” refers to the computer scientist, the jazz musician, or the basketball player?

Problem 35. The issue of image retrieval is not well understood, unless there is accompanying text or an index. How can we retrieve images related to a given one by color profile (e.g., “mostly red ”), category (e,g., landscape), or subject (e.g., Isaac Newton)?

Summary of Data Mining and Search Session

Laszlo Lovasz of Microsoft Research noted that two interesting graph decompositions have emerged in recent decades. Robertson-Seymour graph minor theory for low-density graphs considers graphs G that do not contain Km as a minor. Such graphs are a small perturbation of two-dimensional graphs glued together. The Szemeredi lemma notes that graphs in which the number of edges is proportional to n2, where n is the number of vertices, can be decomposed into pieces that look random and have various lower dimensions. This suggests Problem 36.

Problem 36. Is there a randomized way of simulating the Internet as a graph?

These connections highlight the importance of basic linear algebra constructions such as the graph Laplacian in understanding Internet structure and search. Lovasz noted that “discrete-continuous” is not the right dichotomy, since all of these graph concepts have analogs in the continuous setting and the problems are closely related to finding good embeddings of finite metric spaces into each other with little distortion. It is essential that algorithms should work on huge problems, and thus they must run in time close to linear.

Lovasz quoted Ronald Coifman of Yale University as saying that in representing data, the most economical representation brings us halfway to the discovery of the underlying structure, and this must be a basic principle in data mining and search.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
Terms of Use and Privacy Statement