the petabyte (1 million gigabytes) range, with a current growth rate of roughly 0.2 petabytes per year. Challenges for data archiving will increase dramatically in the future. The committee’s top-ranked ground project, the Large Synoptic Survey Telescope (LSST), expects its archive to grow by a petabyte per month, or 12 petabytes per year, some 60 times that rate. A complete SKA operated in the manner of the VLA or ALMA would require computing at the exaflop (1,000 petaflops) scale, compared with the petaflop of sustained computing power used in current astronomical computing. Proper maintenance and accessibility of these archives are essential to optimizing scientific return, especially for LSST studies of transient and time-variable phenomena, for which rapid availability of validated data will be critical.

As discussed in Chapter 3, the AAAC can play a key role in providing tactical advice to DOE, NASA, and NSF on the support of data archiving and dissemination, and on data analysis software funding across the three agencies relative to the agencies’ programmatic needs as identified by Astro2010. In particular, the optimal infrastructure for the curation of archival space- and ground-based data from federally supported missions and facilities will need periodic attention.

Data Archives

Data archives are central to astronomy today, and their scientific impact is large and increasing rapidly. Papers based on archival data from the Hubble Space Telescope (HST) now outnumber those based on new observations in any given year and include some of the highest-impact science from HST, as shown in Figures 5.6 and 5.7. Data from the 2 Micron All Sky Survey (2MASS) and the Sloan Digital Sky Survey (SDSS), both of which were designed as archival projects, led to more than 1,400 and 2,650 refereed papers, respectively, in the past decade, with the scientific output continuing to increase well after the completion of these surveys.

Publicly accessible data archives can multiply the scientific impact of a facility or mission for a fraction of the capital and operating costs of that facility or mission. The data explosion and the long-term need to cross-correlate enormous data sets require archival data preservation beyond the life of projects and the development of new analysis and data-mining tools. The establishment over the past decade of the National Virtual Observatory, a top recommendation of the 2001 decadal survey and part of an International Virtual Observatory initiative, has produced widely accepted standards for data formatting, curation, and the infrastructure of a common user interface. These standards have the potential to substantially enhance the collective value of archival data sets.

NASA has regarded data handling and archiving as an integral part of space missions. It has established a network of data centers to host data from NASA mis-


