have it report interesting data on a real-time basis. Temperature data are available not only from traditional sources but also from cell phones, car navigation systems, portable GPS devices, and the like. These are just a few examples of the expanding range of disparate types of data of possible interest to the scientific community.

In Dr. Stonebraker’s view, this dramatic increase in data availability calls for the following four capabilities:

  • Locate data sets more effectively. Scientists must be able to discover data sets of interest much more easily than they can today.

  • Convert data sets easily to a usable format. It should be much easier for scientists to reformat data sets than is currently the norm.

  • Integrate multiple data sets. Because data on the same phenomenon often come from many sources, a scientist needs to readily discover the syntax and semantics of each data set and to convert the data sets so that they are syntactically and semantically comparable.

  • Process larger data sets. As noted above, the scale of scientific data is increasing rapidly.

The benefits of integrating large volumes of data, multiple data sets from different sources, and multiple types of data are enormous, and this integration will enable science to advance more rapidly and in areas heretofore outside the realm of possibility.

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.