Apte predicted that breakthroughs would come through interdisciplinary approaches that draw on expertise in statistics, machine learning, and data management. In addition, results from computational learning theory, human-computer interaction, the study of high-dimensional systems, visualization, and optimization involving variational problems will be essential (Problem 26 and Problem 27).

Problem 26. Are there automated methods to select the best data transformations, dropping to an appropriate subspace? In other words, how can we identify important features and search for alternate models within the space in order to find a near-optimal one?

Problem 27. Can we extract from data robust models that are simultaneously comprehensible to end users?
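Problem 26 asks for automated ways to pick out important features. One simple and widely used heuristic, offered here only as an illustration and not as the method Apte had in mind, is greedy forward selection: starting from an empty set, repeatedly add the single feature that most improves a model-quality score, and stop when no addition helps. The sketch below assumes the caller supplies that score as a function of a feature subset; the names `forward_select` and `score` are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def forward_select(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int,
) -> Tuple[List[str], float]:
    """Greedy forward feature selection (a heuristic, not guaranteed optimal).

    `score` is a hypothetical caller-supplied function, e.g. cross-validated
    accuracy of a model trained on the given subset. Higher is better.
    """
    selected: List[str] = []
    remaining = list(features)
    best_score = score(frozenset())  # baseline: no features at all

    while remaining and len(selected) < max_features:
        # Try adding each remaining feature and keep the best single addition.
        candidate, cand_score = max(
            ((f, score(frozenset(selected + [f]))) for f in remaining),
            key=lambda pair: pair[1],
        )
        if cand_score <= best_score:
            break  # no single addition improves the score; stop early
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = cand_score

    return selected, best_score


if __name__ == "__main__":
    # Toy score: two informative features, plus a small penalty per feature.
    useful = {"x1": 0.5, "x3": 0.3}

    def toy_score(subset: FrozenSet[str]) -> float:
        return sum(useful.get(f, 0.0) for f in subset) - 0.05 * len(subset)

    chosen, value = forward_select(["x1", "x2", "x3", "x4"], toy_score, 3)
    print(chosen, round(value, 3))
```

Because each step evaluates only single additions, the search can miss feature combinations that are useful only jointly; this is one reason the problem of finding a near-optimal subspace remains open.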

Enhancing the Power of Automated Knowledge Discovery

Raúl Valdés-Pérez of Carnegie Mellon University focused on the problems of knowledge discovery: gathering and analyzing data in order to determine underlying rules and defining properties. He demonstrated some current technology and presented several challenges. The current capabilities of automated discovery are quite limited: we cannot easily extract information from multiple sources with multiple formats and reason about it (Problem 28 and Problem 29). In addition, results are presented to users in quite primitive forms, for example, as a decision tree or a cluster diagram.

Problem 28. Enlarge the range of automated discovery: for instance, enable machines to form conjectures or make predictions.

Problem 29. Make data mining so automatic that even non-experts in mining can ask questions and get useful information back that is easy to interpret.

Web Social Structure and Search

Jon Kleinberg of Cornell University focused on the information content and social structure of the Web and on the problems of Web searching. He noted that searching is made difficult by two opposing characteristics. On the one hand, for specific queries, only a few sites (if any) might contain the answer, and inferring that a given Web page actually contains it poses challenges for natural language processing. On the other hand, general queries bring back far too many responses, overwhelming the person who formulated the query. Effective filtering of responses is therefore essential. A capability for giving priority to Web pages that are authorities, in the sense discussed above, would be useful.
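One well-known way to rank authorities is Kleinberg's hubs-and-authorities (HITS) iteration: a page's authority score is the sum of the hub scores of pages linking to it, a page's hub score is the sum of the authority scores of pages it links to, and both are renormalized each round. The sketch below is a minimal illustration under simplifying assumptions (the link graph is a plain dictionary, a fixed number of iterations replaces a convergence test); the function name `hits` and the toy graph are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def hits(
    graph: Dict[str, List[str]], iterations: int = 50
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub, authority) scores for a link graph {page: [linked pages]}."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    def normalize(scores: Dict[str, float]) -> Dict[str, float]:
        # Scale to unit Euclidean length; guard against an all-zero vector.
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {n: v / norm for n, v in scores.items()}

    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that link in.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        auth = normalize(new_auth)

        # Hub: sum of authority scores of the pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_hub[src] += auth[dst]
        hub = normalize(new_hub)

    return hub, auth


if __name__ == "__main__":
    toy_web = {"a": ["c"], "b": ["c", "d"], "d": []}
    hubs, auths = hits(toy_web)
    print(sorted(auths, key=auths.get, reverse=True))
```

In practice such scores would be computed only over the pages relevant to a query, so that the ranking filters a large result set down to its most-cited sources.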

Problem 30. Find good visualization models for the Web that present information in a usable way. Thus far, the most successful tool has been the graphical browser.

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.