Astronomy and the Computer Revolution
Computer technologies have been central to advances in astronomy and astrophysics for the last 40 years and will play an even more important role in analyzing more complex phenomena in the next decade. In the early 1950s, roughly half the cycles of John von Neumann's pioneering MANIAC computer were devoted to the first stellar evolution codes. In the 1960s, advanced computers allowed the first detailed models of supernova explosions. In the 1970s, the Einstein Observatory x-ray telescope and the Very Large Array (VLA) of radio telescopes created images using computers as intermediaries between the telescope and the observer. In the 1980s, microcomputers came into wide use for the control of data acquisition at telescopes, and theoretical simulations were extended to a wide variety of complex astrophysical phenomena. In the 1990s, astronomers will apply powerful new computer technologies to obtain, process, and interpret the data from ground- and space-based observatories. The field of astronomy will, by virtue of its dependence on large quantities of data and its past experience and future goals, be the leader in important aspects of a national program in high-performance computing.
In this chapter, the committee discusses exciting developments in astronomy that can occur as a result of enhancements in computing strategies, techniques, or power. Since computing has a central role in astronomical research, the committee makes recommendations regarding archiving, workstations and hierarchical computing, networks, and community code development. The costs for computer support are not given explicitly in this chapter because the costs occur in various programs that are discussed in Chapter 1, Chapter 3, and Chapter 4. For example, workstations are included in the grants program, supercomputing support is provided directly to the supercomputing centers, and computers to perform the first stage of data reduction are included in the costs of individual instrumental initiatives.
A HIERARCHY OF COMPUTING POWER
A hierarchical network of ever more powerful machines will provide great computing power to the individual researcher in the next decade. Beginning in the 1980s, personal computers and workstations gave many scientists control over their own computing and observing environments; supercomputers became generally available through the creation of the NSF national supercomputer centers; and international research networks allowed researchers to communicate electronically with their colleagues or with supercomputer centers, national laboratories, observatories, and data archives. These trends will accelerate and become more tightly integrated in the next decade.
The 1990s will bring major advances at all levels in this evolving hierarchy. New processor technologies will put affordable, powerful computing in every observatory and on every astronomer's desk. These machines, which will be nearly as powerful as today's supercomputers for many tasks, will make possible the acquisition and processing of large datasets, as well as the forging of synergistic links between data analysis, theoretical computations, and visualization tools. On a slightly larger scale, departmental mini-supercomputers will offer performance enhancements over desktop machines and provide sharing of expensive peripherals. Optical recorders capable of storing more than 100 terabytes each will hold archives of important datasets from ground- and space-based telescopes. Electronic networks will facilitate scientific collaboration and provide access to the national archives of observational and laboratory data. Despite the advances in local computing, there will still be a place for large central computers: multimillion-dollar supercomputers will be bigger and faster versions of whatever is sitting on one's desk. These machines, with their huge memories, extensive disk capacity, and extremely fast processing and input and output rates, will be crucial to a minority of computer users with the most demanding programs.
DATA ACQUISITION AND PROCESSING
Computers have become essential to the acquisition of astronomical data, offering enhancements in performance comparable in some cases to improvements due to new telescope and detector designs. Software engineering has become as important to the success of a new instrument as are mechanical, electrical, or optical engineering. High-performance computing has become necessary to make full use of many observations. For example, powerful algorithms have greatly enhanced imaging telescopes that operate in the radio and in the optical domains (Figure 5.1; see also Plate 4.3). The VLA that astronomers use today is much more powerful and flexible than the one that was originally designed, without any major modification of the telescope itself, because of more powerful processing at the VLA and at national supercomputer centers.
The volume of data produced by some astronomical instruments is already large and is growing rapidly. Data rates of 10 gigabytes per day are common, and 100 gigabytes per day may soon be exceeded. Detector arrays will soon produce two-dimensional images with up to 2,048 × 2,048 pixels (elements), and some instruments will add spectral, temporal, or polarization channels to generate even larger datasets. The data flow from the VLA can exceed 72 gigabytes per day. The current VLA computers cannot handle this maximum data flow, so that subsets of the data must be selected for transmission and analysis. This under-utilization could become even more dramatic for the next generation of optical and infrared telescopes that will use arrays consisting of millions of individual detectors. Computers capable of processing 30 gigabytes per night will be as essential as improved detectors for the optical and infrared sky surveys planned for the 1990s. Without adequate computational and data storage capabilities, astronomers will not be able to push current or planned ground-based telescopes, which cost tens of millions of dollars each, to their limits.
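The figures quoted above can be put on a common footing with a little arithmetic. The sketch below is illustrative only: it assumes 16-bit (2-byte) pixels and decimal gigabytes (10^9 bytes), neither of which is stated in the text, and the function names are hypothetical.

```python
# Back-of-the-envelope data-rate arithmetic for the figures quoted in the text.
# Assumptions (not from the text): 2 bytes per pixel, 1 gigabyte = 1e9 bytes.

BYTES_PER_PIXEL = 2          # assumed 16-bit detector readout
GIGABYTE = 1e9               # assumed decimal gigabyte
SECONDS_PER_DAY = 86_400


def frame_bytes(width: int, height: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Size in bytes of one uncompressed two-dimensional image."""
    return width * height * bytes_per_pixel


def frames_per_volume(volume_gb: float, width: int, height: int) -> int:
    """Number of whole frames that fit in a given data volume."""
    return int(volume_gb * GIGABYTE // frame_bytes(width, height))


def sustained_rate_kb_per_s(volume_gb_per_day: float) -> float:
    """Average rate in kilobytes per second needed to keep up with a daily volume."""
    return volume_gb_per_day * GIGABYTE / SECONDS_PER_DAY / 1e3


if __name__ == "__main__":
    print(frame_bytes(2048, 2048))             # bytes in one 2,048 x 2,048 frame
    print(frames_per_volume(30, 2048, 2048))   # frames in a 30-gigabyte night
    print(round(sustained_rate_kb_per_s(72)))  # kB/s for 72 gigabytes per day
```

Under these assumptions a 2,048 × 2,048 frame occupies about 8.4 megabytes, 30 gigabytes per night corresponds to roughly 3,600 such frames, and 72 gigabytes per day requires a sustained average of about 830 kilobytes per second.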
DATA REDUCTION AND ANALYSIS
Intensive data processing is often required to convert observations to understanding. The type of data processing required varies widely depending on the telescope and the purpose of the observations. Large user facilities with stable instrumentation and operating conditions like the VLA lend themselves to processing in which the typical user has little involvement until the final analysis stages. Similarly, specialized survey instruments like the Infrared Astronomical Satellite (IRAS) need production software to process large volumes of raw data into scientifically useful catalogs and images. Software development for these instruments is often best handled by professional software engineers working in close conjunction with astronomers who understand the technical problems and scientific goals and who are familiar with details of the instruments.
The problem of making general-purpose data-reduction software is a difficult one. The National Optical Astronomy Observatories developed the Image Reduction and Analysis Facility (IRAF) package as a community program for reducing, calibrating, and analyzing images or two-dimensional spectra from optical and infrared telescopes. Similarly, the National Radio Astronomy Observatory created the Astronomical Image Processing System (AIPS) for processing data from radio interferometers. Many astronomers have adopted IRAF or AIPS in preference to the daunting task of developing their own codes. But researchers who want to make novel or demanding uses of state-of-the-art instruments find general-purpose software stifling; they may need a special calibration algorithm for their instrument that is not in the package. These astronomers find it difficult to integrate innovative data-reduction techniques into a large, centrally maintained package.
No perfect solution to this problem is yet in hand. To make effective use of the enormous advances in computer hardware and astronomical instrumentation expected in the next decade, astronomers will need an accompanying development of scientific software. Standard packages with modern interfaces are needed for users of optical and infrared telescopes whose needs are relatively standard. Programs consisting of 100,000 lines of code often require tens of person-years to develop, followed by a comparable and continuing effort to modernize, maintain, and document. Yet such packages are often difficult to modify for researchers interested in extracting the maximum possible information from their data. New architectures are needed that will provide both an open framework, within which an innovative user can develop and subsequently share new techniques, and a powerful set of fundamental tools for the general user. The development of such programs represents a major challenge to astronomers and to computer professionals. It is vital for both NASA and NSF to augment their efforts in the development, maintenance, and augmentation of community software.
ARCHIVING
There are compelling scientific reasons for archiving selected astronomical data from ground- and space-based telescopes. First, astronomical processes occur on time scales that are long compared to the lifetimes of individual researchers. The history of astronomy includes many instances when archival material dating back decades, and sometimes centuries, has proven essential in solving modern problems. Astronomers have an obligation to preserve contemporary data in an intelligible form for future generations of astronomers. Second, it is widely recognized that the definitive interpretations of much space astrophysics data are not found in the first papers, but rather in more extensive archival studies. Third, the large two-dimensional detectors of the 1990s will produce images of the sky taken at different wavelengths and at different times. Archival researchers can reanalyze these rich datasets to answer new questions and to make serendipitous discoveries.
The volume of data that will be gathered in the 1990s amounts to many terabytes per year. NASA's astronomy missions alone will generate about 10 terabytes of data annually. NASA is setting up the Astrophysics Data System to ensure wide and rapid access to data obtained from space observations; the system will use high-speed networks and a centrally maintained directory. NASA has a scientifically productive program for archival research, primarily to analyze data from IRAS, the Einstein Observatory, and the International Ultraviolet Explorer (IUE).
The decentralization of ground-based observatories and the mixture of private, state, and federal funding of research has made it difficult to establish archives of ground-based astronomical data. However, the recent development of high-speed networks and data format standards can simplify the archiving process. Data obtained electronically at ground-based observatories can now, in principle, be archived and made available to remote users over computer networks.
The U.S. observatories are lagging behind major foreign observatories, such as the Anglo-Australian Telescope, the European Southern Observatory, and the La Palma Observatory, in developing archiving programs. The obstacles in setting up archives in the United States are financial and cultural: few people are willing to expend scarce personal and fiscal resources on archiving their data, and most ground-based astronomical data obtained outside the national observatories are treated as the private property of the observer, with no imperative to turn the data over to the community. Overcoming these barriers will depend on developing suitable incentives for individual scientists to archive their data, on protecting the rights of the original observers, and on making the process as effortless as possible.
As a long-term goal, digital data from ground- and space-based telescopes should be archived in a scientifically useful form; the archives would include all the raw data, calibration data, and information necessary to remove instrumental signatures from the data. These data would be available to the community after an appropriate proprietary period, typically one to two years after the completion of an observing program. On-line archives of major observational datasets, catalogs, and processed data from the astronomical literature would improve productivity and enhance the return from both ground- and space-based science programs. A national archiving program would allow researchers and students from smaller colleges and universities to work with data from the best instruments in a way that is now impossible.
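The release rule described above (raw data and calibration data held together, then opened to the community after a proprietary period of typically one to two years) can be expressed as a small record type. The sketch below is purely illustrative: the ArchiveEntry class, its field names, and the example identifier and file names are hypothetical, and the default period of one year, approximated as 365 days, is an arbitrary choice within the range the text mentions.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class ArchiveEntry:
    """One archived observing program (hypothetical record layout)."""
    program_id: str
    completed: date                       # date the observing program ended
    raw_files: List[str] = field(default_factory=list)
    calibration_files: List[str] = field(default_factory=list)
    # Assumed default: one year, approximated as 365 days.
    proprietary_period: timedelta = timedelta(days=365)

    @property
    def release_date(self) -> date:
        """First day on which the data are open to the community."""
        return self.completed + self.proprietary_period

    def is_public(self, today: date) -> bool:
        """True once the proprietary period has elapsed."""
        return today >= self.release_date


if __name__ == "__main__":
    entry = ArchiveEntry(
        program_id="EXAMPLE-001",          # made-up identifier
        completed=date(1991, 3, 1),
        raw_files=["night1.raw"],          # made-up file names
        calibration_files=["flat1.cal"],
    )
    print(entry.release_date)
    print(entry.is_public(date(1991, 12, 31)))
    print(entry.is_public(date(1992, 3, 1)))
```

Keeping the period as a per-entry field, rather than a global constant, leaves room for individual observatories to set their own policies while still protecting the rights of the original observers.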
The cost of the relevant technologies is being reduced to the point that it is becoming realistic to store selected and appropriate data from major ground-based telescopes. The recent development of computer networks and the implementation of NASA's Astrophysics Data System provide both the means and the model for widespread access to archives of ground-based observations. New archives should be compatible with, and a part of, the directory service of NASA's Astrophysics Data System. The sums that NASA has invested in archiving are small in comparison to the overall cost of its missions, but large by the standards of ground-based astronomy. The NSF can take advantage of NASA's investment in this area to help set up an appropriate archiving program for ground-based astronomy. The committee notes that the NSF Subcommittee on Ground-based Optical and Infrared Astronomy, of the Advisory Committee for Astronomical Sciences, has highlighted digital archiving as a key recommendation.
COMPUTERS AND THEORETICAL ASTROPHYSICS
The light from many astronomical objects is produced by violent, complicated, and quickly evolving phenomena. Sophisticated simulations are often needed to understand, for example, shock waves around protostars and jets from the cores of active galaxies, but these simulations are frequently oversimplified due to a lack of sufficient computer resources. In the coming decade, more realistic simulations will play an essential role in understanding the underlying physics of these phenomena. The 1990s will be the decade in which a number of long-standing astrophysical problems will be solved, and computers will play an important role in these solutions.
Astrophysics depends on theory and modeling to a greater degree than most other physical sciences, because astronomers can observe only remotely. Moreover, the observed phenomena, the photons and fast particles that escape from astrophysical objects, are typically the result of complicated interactions among nonlinear processes. It is often necessary to construct elaborate models to achieve a satisfactory interpretation of the observations. More powerful computers and computer programs that incorporate realistic physics will greatly increase the ability of astrophysicists to extract physical insight from their observational data.
The committee estimates that 10 percent of practicing astronomers are engaged in theoretical simulation of astrophysical phenomena. Some of this computational astrophysics uses local workstations and mini-supercomputers. Roughly 10 percent of the time devoted to scientific computing at the NSF Supercomputer Centers is used by astrophysicists who are using supercomputers to solve problems that push the system to the limits of today's software and hardware capabilities (Plate 5.1 and Plate 5.2).
The observational community has made concerted efforts at developing community software; commendable examples include NSF's development at its national observatories of AIPS, IRAF, and the Flexible Image Transport System (FITS) image format. Comparable efforts should be made in the area of theoretical astrophysics. A wide range of theoretical techniques must be developed to meet different problems in astrophysics, but it is difficult to predict a priori which techniques will be most useful. Large programs for stellar evolution, radiative transport, magnetohydrodynamic simulations, and characterization of material properties such as equations of state and grain opacities can take tens of person-years to develop. Community modeling tools would result in more efficient use of the individual astronomer's time and computer resources.
RECOMMENDATIONS
Archiving
The committee recommends that archives of selected digital data from ground- and space-based observations should be established and made available over a high-speed national network.
The committee commends NASA's important steps in this direction for space-based data. Digital archiving is becoming increasingly important in sciences other than astronomy. Early funding for archiving could make astronomy a test bed for other disciplines. The NSF astronomy program should seek joint funding with other NSF divisions or from NSF management for a pilot archiving program but should proceed with such a program even in the absence of shared funding.
The committee recommends initiating promptly a technical study on the archiving of appropriate digital data from ground-based telescopes.
The study might identify the requirements for, and a preliminary design of, a national digital distributed astronomical archive. Topics that could be considered include what, initially, would constitute appropriate datasets for archiving; proposals to ensure the broadest acceptance of archiving plans; the identification of costs and funding sources for archiving; community comments on the archive designs; integrating ground-based archives with NASA's Astrophysics Data System and Planetary Data System; interagency and international collaboration; and establishment of a national archive center. It is important that the astronomical community determine the relative priorities between archiving particular datasets and supporting new observations.
The committee endorses NASA's goal of archiving data from all space-based telescopes and recommends that a complementary and compatible archive of selected data from ground-based telescopes be initiated by the mid-1990s.
The archiving initiative for ground- and space-based astronomy should have the following long-term goals: all new major observatories, both public and private, should incorporate an appropriate level of archiving in their design and in their standard operations; most public and private observatory surveys should be archived; and observers should obtain their raw data, whenever possible, in the archive format.
Workstations and Hierarchical Computing
The committee recommends the purchase of individual workstations and departmental mini-supercomputers to provide a distributed network for astronomical computing.
NASA has, on the one hand, pioneered a cost-effective policy of funding workstations for research. On the other hand, computers have in some cases been eliminated from the budgets of approved individual NSF (and NASA) proposals. The committee believes that requests for workstations should be encouraged within the existing individual grants program; the workstations could include “basic” or “high-performance” machines according to the legitimate needs of the investigator.
The committee urges that the NSF promote mid-range, local computing by funding computers to be used jointly by small groups of investigators. A modest annual budget could provide 50 or more machines for theoretical and observational groups, the largest of which could be funded on a cost-sharing basis with the individual institutions. This investment would bring immense improvements to the people who rely on computers for their work. The committee also urges NASA to allocate an appropriate fraction of the Mission Operations and Data Analysis funds associated with the space missions of the 1990s for purchases of mini-supercomputers by those university departments involved in space research. The committee bases this suggestion on the assumption that the NSF, NASA, and DOE supercomputer centers, which must be periodically upgraded to remain at the forefront of technology, will continue to provide this scarce resource to the astronomical community.
Networks
The committee recommends that the NSF's Astronomy Division encourage the development of high-speed national networks by funding, on a continuing basis, links between the national networks and widely used observatories, interested astronomy departments, and other research groups.
Community Code Development
The committee recommends that NASA increase its role in furthering the development of community software and software standards.