(see Figure 4-1). In other areas, such as clinical data, the potential gains from data reuse are clear, even though technical and other barriers stand in the way of realizing that potential.2

This technological capability has given rise to a powerful new vision of how some areas of research can be conducted.3 Known as e-science or cyberinfrastructure, this approach to research involves decentralized collaborations of researchers who draw on remote sensors and facilities, very large data collections, and powerful computing resources. These distributed resources are interconnected so that they can be shared in a flexible, secure, and coordinated manner. Individuals and groups can build and make available services and tools that extend across research fields.4 In an interconnected grid of facilities, instruments, and computers, the collective knowledge of scientific, engineering, and medical research resides not just in published books and articles but in the grid itself.

THE LOSS AND UNDERUTILIZATION OF RESEARCH DATA

E-science has been partially implemented in a number of research fields, but in others information technology is not being used to advantage.

Today, much research data that could be of value in the future are lost because of the lack of provisions for preserving them: Research notebooks are discarded; computer hard disks crash, destroying unique data; an investigator changes fields, retires, or dies and leaves behind data that are poorly organized, haphazardly stored, or otherwise unusable.

Digital data are often stored in formats that rapidly become technologically obsolete. Data stored on paper can survive for decades or centuries before the paper breaks down and becomes unreadable. In the digital age, however, the longevity of storage media sometimes seems to conform to an inverse Moore’s law, with accelerating technological advances hastening the demise of superseded media. Many scientists have data on floppy disks, hard drives, or Zip disks that new generations of computers cannot read. One expert raises the possibility of a “digital dark age,” in which large amounts of digital data stored in a variety of proprietary file formats are permanently lost.5

Digital media also decay over time, a phenomenon known as “bit rot.” Many old magnetic tapes molder in boxes and are now essentially worthless.

2 James J. Cimino. 2007. “Collect once, use many: Enabling the reuse of clinical data through controlled terminologies.” Journal of AHIMA 78(2):24–29.

3 National Science Foundation Cyberinfrastructure Council. 2007. Cyberinfrastructure Vision for 21st Century Discovery. Arlington, VA: National Science Foundation. Available at http://www.nsf.gov/pubs/2007/nsf0728/index.jsp.

4 Ian Foster. 2005. “Service-oriented science.” Science 308:814–817.

5 Phil Ciciora. 2008. “‘Digital dark age’ may doom some data.” University of Illinois at Urbana-Champaign News Bureau. October 27. Available at news.illinois.edu/news/08/1027data.html.


