  • Documents that differ in language (human and programming), vocabulary (including words, product numbers, zip codes, latitudes and longitudes, links, symbols, and images), formats (such as the Hypertext Markup Language (HTML), Portable Document Format (PDF), or Joint Photographic Experts Group (JPEG) format), character sets, and source (human or machine generated).

  • Non-textual information, such as audio and video files, and interactive games. The volume of online content (in terms of the number of bytes) in image, sound, and video formats is much greater than that of most library collections and is expanding rapidly.

  • Transaction services, such as sales of products or services, auctions, tax return preparation, matchmaking, and travel reservations.

  • Dynamic information, such as weather forecasts, stock market information, and news, which can be constantly changing to incorporate the latest developments.

  • Scientific data generated by instruments such as sensor networks and satellites, which are contributing to a “data deluge.”[3] Many of these data are stored in repositories on the Internet and are available for research and educational purposes.

  • Custom information constructed from data in a database (such as product descriptions and pricing) in response to a specific query (e.g., price comparisons of a product listed for sale on multiple Web sites).

Consequently, aids or services that support Internet navigation face the daunting problem of finding and assigning descriptive terms to each of these types of resource so that it can be reliably located. Searchers face the complementary problem of selecting the aids or services that will best enable them to locate the information, entertainment, communication link, or service that they are seeking.
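
The indexing problem described above can be illustrated with a minimal sketch. It assumes that each resource has already been assigned a handful of descriptive terms, and it builds an inverted index that maps each term to the resources it describes. The resource names, the terms, and the functions build_index and locate are hypothetical, chosen only for illustration; they do not describe how any actual navigation service works.

```python
from collections import defaultdict


def build_index(resources):
    """Map each descriptive term to the set of resource identifiers tagged with it."""
    index = defaultdict(set)
    for resource_id, terms in resources.items():
        for term in terms:
            index[term.lower()].add(resource_id)
    return index


def locate(index, *terms):
    """Return the sorted identifiers of resources described by every given term."""
    if not terms:
        return []
    # index.get avoids creating empty entries for terms that were never assigned.
    matches = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*matches))


# Hypothetical resources and descriptive terms (assumptions, not from the report).
RESOURCES = {
    "forecast-page": ["weather", "forecast", "dynamic"],
    "satellite-archive": ["weather", "satellite", "data"],
    "auction-site": ["auction", "sales", "transaction"],
}

INDEX = build_index(RESOURCES)
```

Under these assumptions, locate(INDEX, "weather", "satellite") narrows the two weather-related resources down to the satellite archive. The sketch also makes the difficulty plain: a resource that was never assigned a term the searcher happens to use cannot be found through the index at all.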

6.1.2 Two-sided Process

Second, Internet navigation is two-sided: it must serve the needs both of the searchers who want to reach resources and of the providers that want their resources to be found by potential users.

From the searcher’s perspective, navigating the Internet resembles to some extent the use of the information retrieval systems that were developed


[3] See Tony Hey and Anne Trefethen, “The Data Deluge: An e-Science Perspective,” in Grid Computing: Making the Global Infrastructure a Reality, Fran Berman, Geoffrey Fox, and Anthony J.G. Hey, editors, Wiley, 2003.

Copyright © National Academy of Sciences. All rights reserved.