individual projects or disciplines. However, more widely applicable tools and infrastructure might serve as the initial components of collaboratories designed to support scientific collaboration at a distance.


Today's computing and communications infrastructure supports rudimentary collaboration at a distance, but it is typically not developed, deployed, or supported well enough to sustain the quality and scope of tools and applications envisioned for collaboratories that will facilitate scientific research.

Interconnecting Data Sources

Scientists participating in this project's three workshops discussed a number of components considered essential to an enhanced capability for sharing data:

  • Electronic libraries that would combine databases, literature, and software relevant to their research. Scientists considered electronic libraries a top priority because of the potential for rapid access to literature and the library's capacity to help locate information and data.

  • Easily accessible archives of data, particularly in the physical sciences, where large, established archives already exist for some fields, such as those at the National Space Science Data Center and the National Center for Atmospheric Research. Archives have become increasingly important as experiments and data gathering have become more complex and expensive, and as data sets have become more massive.

  • A comprehensive system that would support retrieval of data from any or all sources, regardless of the data's origin or physical location. An example is the globe data catalog contemplated in Chapter 2, which would visually relate collected and archived data to the area of investigation, the type of data collected, and the time period that the data derive from. Today, data catalogs of archived holdings do exist, but the format for their presentation, the means of searching for the desired information, and the accessibility of the catalogs to researchers all vary widely from archive to archive, especially across disciplines.
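The catalog idea in the last item can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the `CatalogEntry` fields, the archive names, and the sample entries are hypothetical and do not come from any real archive. It shows one way a catalog could relate holdings to an area of investigation, a type of data, and a time period, and then answer a single query across holdings from different archives.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    """One archived holding (hypothetical schema, for illustration only)."""
    archive: str      # which archive physically holds the data
    data_type: str    # kind of measurement
    bbox: tuple       # (south, west, north, east) in degrees
    start: date       # first day covered
    end: date         # last day covered


def boxes_overlap(a, b):
    """True if two (south, west, north, east) boxes intersect.

    Simplification: boxes that wrap across the 180-degree meridian
    are not handled.
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_catalog(entries, bbox, data_type, start, end):
    """Return entries matching the data type, area, and time period."""
    return [
        e for e in entries
        if e.data_type == data_type
        and boxes_overlap(e.bbox, bbox)
        and e.start <= end and start <= e.end   # time ranges intersect
    ]


# Made-up sample holdings from two hypothetical archives.
CATALOG = [
    CatalogEntry("archive-a", "sea_surface_temperature",
                 (-10, -50, 10, -20), date(1985, 1, 1), date(1990, 12, 31)),
    CatalogEntry("archive-b", "sea_surface_temperature",
                 (30, 0, 60, 40), date(1988, 1, 1), date(1992, 12, 31)),
    CatalogEntry("archive-b", "ozone",
                 (-10, -50, 10, -20), date(1985, 1, 1), date(1990, 12, 31)),
]

hits = search_catalog(
    CATALOG,
    bbox=(-5, -40, 5, -30),
    data_type="sea_surface_temperature",
    start=date(1989, 1, 1),
    end=date(1989, 12, 31),
)
for entry in hits:
    print(entry.archive, entry.data_type, entry.start, entry.end)
```

The design point is that the researcher states what is wanted (where, what, when) and the catalog reports which archive holds it, so the physical location of the data no longer shapes the query.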

Currently, research prototypes exist that permit users to issue single queries that search across multiple databases or archives. One example is the Worm Community System (Schatz, 1991-1992), described in Chapter 4, which represents a specific solution to a small research community's requirements for sharing data. Another useful tool is resource discovery software, which searches descriptions of databases or files to locate suitable sources to search and which can transfer files once they are discovered. Widely used examples of public-domain software include Archie,1 Gopher,2 and World-Wide Web,3 which search by name of the file, and Wide Area Information Server,4 which permits free-text searches by concept. These resource discovery tools were designed to be used on the Internet—an environment where it is expected that files will be shared. In a general-purpose file system—an environment in which the sharing of files may be an afterthought to their creation—resource discovery is much more difficult. Nevertheless, prototype resource discovery tools are now being developed for general-purpose file systems. One such prototype is Essence (Hardy and Schwartz, 1993). Despite current research efforts, more work needs to be done in this area.
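The notion of a single query that searches across multiple databases can be sketched in a few lines. The Python below is a minimal illustration, not a description of how the Worm Community System, WAIS, or Essence actually works: the source names, the sample records, and the helpers `make_keyword_source` and `federated_search` are all hypothetical. Each source is wrapped in an adapter with the same calling convention, and the query is dispatched to every adapter in turn.

```python
def make_keyword_source(records):
    """Wrap a list of record titles as a searchable source.

    The returned function matches records containing every query term
    (case-insensitive). Real sources would each need their own adapter.
    """
    def search(query):
        terms = query.lower().split()
        return [r for r in records if all(t in r.lower() for t in terms)]
    return search


def federated_search(query, sources):
    """Send one query to every source; tag each hit with its source name."""
    results = []
    for name, search in sources.items():
        for record in search(query):
            results.append((name, record))
    return results


# Made-up sources standing in for separate physical databases.
SOURCES = {
    "literature": make_keyword_source([
        "Nematode gene mapping review",
        "Cell lineage atlas",
    ]),
    "sequence_data": make_keyword_source([
        "Sample gene sequence record",
        "Gene expression table",
    ]),
}

hits = federated_search("gene", SOURCES)
for source_name, record in hits:
    print(f"{source_name}: {record}")
```

Because every adapter exposes the same interface, adding another database means writing one more adapter; the query code itself does not change.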

If all or nearly all the information sources for a given subject domain can be accessed uniformly, the resulting ensemble of all information sources becomes a powerful research tool. The entire corpus of sources can be considered a single federated database consisting of multiple physical databases and
