The National Academies of Sciences, Engineering, and Medicine
500 Fifth St. N.W. | Washington, D.C. 20001

Copyright © National Academy of Sciences. All rights reserved.

being funded to perform computations on a university computer. In these cases, net cost savings and performance gains may very well be realized by switching to a dedicated minicomputer-array processor system, and the users should be encouraged to submit a collaborative proposal to obtain such a system.

3. The Panel applauds the support of astrophysical computation provided by Lawrence Livermore Laboratory, NASA/Ames, NASA/Langley, Los Alamos National Laboratory, the National Center for Atmospheric Research, and others. The Panel recommends that these cooperative efforts be continued and expanded, since they work to the advantage of both parties. Astronomers have access to larger and faster machines than would otherwise be available, while the laboratories receive broader recognition and stimulation from more diverse viewpoints than would otherwise be the case. Astronomical use of these large facilities may be expanded by providing funds to support problem definition and software development and testing with university computers or dedicated minicomputers (as above) and by encouraging the large centers to make more time available for production runs. The software that has been developed locally may then be run at the large centers either via remote links or travel of the investigator to the center.

The Panel considered a number of other options but rejected them as being too costly. A National Computing Center for Astronomy with a large vector machine would cost $5 million per year. The provision of funds to buy time at existing large computers (at cost) could only be justified if the problem could not be run in a reasonable amount of time on a minicomputer-array processor system. Some problems are large and would cost $0.5 million to $1.0 million per problem.

V. IMAGE PROCESSING AND ANALYSIS

Despite the title of this section, we do not intend to ignore the processing and analysis required for other kinds of data.
It is simply that data generated by image detectors are so voluminous that if we have the capability to process and analyze these data we will certainly have the capability to handle other forms of astronomical data. It could conceivably be argued that we are collecting too many data--that we should cut back until the data collected match our capabilities to analyze and interpret
them. Such an argument is specious. We do not generate data simply for the sake of generating data. Each astronomical observation is performed in order to obtain data that will lead to new knowledge concerning specific scientific questions. Even survey observations (which may appear to some as indiscriminate collection of data) are designed to provide new knowledge that can be obtained only through statistical studies of large bodies of data. We unabashedly admit that we generate new knowledge for the sake of new knowledge.

Until a few years ago, most astronomical observing techniques produced data at very modest rates. A night's observation might typically produce tens of stellar-source brightness or radio-flux measurements. However, with the advent of solid-state array detectors and rapid scanning microphotometers, data are now produced at a prodigious rate, typically in the form of digital images. In an observing run of a few nights, optical astronomers now generate many tens of images, which in turn consist of over 10^6 picture elements (each pixel containing up to 16 bits of intensity information). A single photographic plate taken at the prime focus of a large reflector contains well over 10^9 bits. Optical astronomers are hardly alone in their ability to generate large numbers of digital images. The Very Large Array produces radio maps with spatial resolutions from 0.1 arcsec to a few arcseconds. Each radio picture may contain up to 10^7 pixels with 16 bits per pixel. For many programs, images will be made at up to 256 frequencies simultaneously in order to map the velocity field in selected objects. The Space Telescope Wide-Field/Planetary Camera will produce typically fifty 1600 × 1600 images per day. The high spatial resolution afforded by this space observatory should, by mid-decade, provide astronomers with an unprecedented view of the deep reaches of the Universe.
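The volumes quoted above can be checked with simple arithmetic. The sketch below uses only figures stated in the text (image dimensions, images per day, bits per pixel); the variable names are ours.

```python
# Back-of-envelope check of the image data volumes quoted in the text.

BITS_PER_PIXEL = 16

# Space Telescope Wide-Field/Planetary Camera: ~fifty 1600 x 1600 images/day
wfpc_image_bits = 1600 * 1600 * BITS_PER_PIXEL   # ~4.1e7 bits per image
wfpc_daily_bits = 50 * wfpc_image_bits           # ~2.0e9 bits per day

# VLA radio map: up to 10^7 pixels at 16 bits per pixel
vla_map_bits = 10**7 * BITS_PER_PIXEL            # 1.6e8 bits per map

print(f"WF/PC image: {wfpc_image_bits:.2e} bits")
print(f"WF/PC daily: {wfpc_daily_bits:.2e} bits")
print(f"VLA map:     {vla_map_bits:.2e} bits")
```

A single day of WF/PC imaging thus already exceeds the 10^9 bits quoted for one photographic plate.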
The application of these new imaging techniques, both on the ground and in space observatories, offers enormous promise in our quest to understand the geometry and structure of the Universe and the evolution of galaxies and stars. While astronomers are all beginning to take advantage of the potential for discovery afforded by application of these new tools, the community has just begun to realize that these technological opportunities present a major and immediate challenge to develop innovative techniques for the analysis of digital images.

Application of interactive image analysis at several large universities and national centers has already permitted astronomers to attack certain classes of astronomical problems. For example, large-scale surveys of galaxy surface brightness distributions have provided new insights into the effects of environment on the formation and evolution of galaxies. Statistical studies of stellar images near the limit of detectability have suggested that, in our Local Group of galaxies, there appear to be radical differences in the history of star formation and the types of stars formed in different systems. The factors that appear to influence the rate at which new stars are formed in external galaxies and the factors that determine how many stars are formed as a function of mass are starting to be sorted out. Radio maps, with arcsecond and subarcsecond resolution, have revealed structure ranging from the remnants of protostellar clouds to the detailed intensity and polarization structures of galactic jets. In all cases, progress on these problems depends in large measure on our ability to study not one or a few examples of an object or class of objects but rather a statistically significant sample. For each problem, astronomers must invent new techniques for extracting the essential data from digital image arrays. The horizon for creative analysis of these arrays is bounded solely by the imagination of individual scientists and their access to adequate image analysis facilities. While we can easily predict an explosion in the number of digital images generated on the ground and in space-based experiments, there appear at present to be few facilities capable of permitting individual astronomers to interact creatively with these data.
If we are to take advantage of the promise of new imaging technology, it is crucial that the astronomical community, in conjunction with federal funding agencies, plan to develop and disseminate adequate image data-analysis capability throughout the United States. The need for extensive image-analysis capability for the astronomical community arises from (1) the increase in the number of solid-state array detectors that will be deployed at the major ground-based facilities over the next several years; (2) the high rate of image data generation from the present-day deployment of such detectors, which already threatens to overwhelm existing data-analysis systems--data that will be produced by the Space Telescope, particularly from the Wide-Field/Planetary
Camera and the Faint-Object Camera, will further burden existing analysis systems; (3) our ability to recover, with digital microphotometers, the information contained on photographic plates (which will remain unchallenged as recorders of moderate-photometric-accuracy large-area-coverage data); and (4) the generation of high spatial and spectral resolution maps from the Very Large Array (VLA). We estimate that in the latter half of the 1980's these sources will be generating image data at a rate of approximately 5 × 10^12 to 10 × 10^12 bits per year (including calibration frames and ancillary data). Numbers in this range can be arrived at in several ways: (1) by multiplying the estimated number of telescopes equipped with image detectors by a typical night's output by the number of nights per year or (2) by scaling the output of Space Telescope (which has been estimated at 10^12 bits per year from a detailed mission simulation) to the number of ground-based telescopes that are expected to be equipped with image detectors (taking account of the fact that Wide-Field/Planetary Camera images will be 1600 × 1600 pixels but ground-based systems built around the same chip will probably produce 800 × 800 pixel images). Also, these numbers are in agreement with estimates made for the United Kingdom Starlink system (see below) if the difference in availability of telescopes is taken into account.

Astronomers must be able to extract from digital images the data relevant to the solutions of the problems they pose. Extracting these data will involve two types of operations--first, processing of the data, which involves removing from the data the signatures characteristic of an instrument. As an example, images generated from solid-state arrays must be corrected pixel by pixel for variation in sensitivity across the array.
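The pixel-by-pixel sensitivity correction just described is commonly carried out by dividing the raw frame by a normalized "flat-field" exposure of a uniformly illuminated source. The sketch below is illustrative only: the frames and values are invented, and a production system would operate on full image arrays with an array library rather than nested lists.

```python
# Illustrative flat-field correction: divide each raw pixel by the
# normalized response of that pixel, as measured from a uniformly
# illuminated calibration ("flat-field") frame.

def flat_field_correct(raw, flat):
    """Return raw / (flat / mean(flat)), applied pixel by pixel."""
    total = sum(sum(row) for row in flat)
    mean = total / (len(flat) * len(flat[0]))
    return [
        [raw_px / (flat_px / mean) for raw_px, flat_px in zip(raw_row, flat_row)]
        for raw_row, flat_row in zip(raw, flat)
    ]

# A 2 x 2 toy frame whose right-hand column is 20% less sensitive.
raw  = [[100.0,  80.0],
        [100.0,  80.0]]
flat = [[  1.0,   0.8],
        [  1.0,   0.8]]

corrected = flat_field_correct(raw, flat)
# Every pixel becomes 90.0: the sensitivity variation has been removed.
```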
In the case of intensified arrays, geometrical corrections to remove the effects of image distortion introduced by the intensifier may also have to be applied. Radio maps must be explicitly corrected for sidelobe and other distortion effects, to provide appropriate intensity versus position maps at each observed frequency.

The second function that must be performed is analysis of the data. After an array containing accurate intensity-position data is produced, the paths to problem solution begin to diverge. Each type of problem may require the development of special techniques for extracting the relevant data from the image array and providing such problem-specific measures as object
brightness, statistical "counts" of numbers of objects brighter than a certain value, and distribution of brightness with position. Operations carried out under the rubric "analysis" often demand the development of imaginative new techniques. In some cases, analysis will reveal that residual instrumental effects are still present in the data, thus forcing a return to the last stages of processing. This is particularly true in the case of VLA images, where often only analysis reveals the potential and need for improvement in dynamic range.

In developing a strategy for deployment of computing equipment required to meet the challenge of image analysis, we conclude the following: (a) Facilities capable of processing and analyzing data should be placed at or near the place of data origin. Each major ground-based observatory should be provided computing capability adequate to render data in the form of intensity-position information. Such computers would have sufficient power to permit the staff and a relatively small number of outside users to analyze their data in parallel with continuing processing activities. (b) Analysis functions, on the other hand, should be carried out as close to one's home institution as possible. The best scientific results derive from the active participation in data analysis by research astronomers and their graduate students operating on their own schedules with time to think about the intermediate results as they are obtained. For some problems, many man-months or -years are required in order to develop appropriate reduction algorithms.

These two distinct environments, with their differing constraints of user access and throughput, require different scales of investment. In the university community, local facilities are required for data analysis and theoretical computation.
Since these facilities are local, comparatively small numbers of astronomers are served, and they can make use of a ready access to the facility over an extended period of time. By contrast, at the National Astronomy Centers, including the Space Telescope Science Institute (STScI), and those centers of data acquisition where user service is contemplated, a large number of users must be served. Each user must be able to complete his or her work in a comparatively short time, since most astronomers cannot easily arrange for long periods away from their home institutions. Also, at the National Centers it is imperative that all data processing (calibration and removal of instrumental effects) be done so that all observers can, at minimum, be provided with
data as free as practical from instrumental aberrations. This data-reduction problem should not propagate beyond the data source. Thus, while it is usual for an analysis problem to take an astronomer many months to complete at his or her home institution, a few days is the typical limit that one will likely be able to stay at a National Center beyond the scheduled observing time for data collection. National Centers should therefore be able to provide processed data within a day or so of the end of the observations. Routines should be available at National Centers to allow visitors to perform standard manipulations of data at least at the same rate as the data are gathered and to provide useful output formats (maps, pictures, tables, plots). Home institution computer facilities are more useful for extensive interactive image analysis, which may include development of new techniques requiring experimentation and iteration.

This basic difference in the required user response time of the facility serves to set the scale and nature of the equipment required. National Centers require integrated computer systems that allow efficient interleaving of batch and interactive processing in large volumes. Such a system may take the form of a major batch facility (the host) interconnected to multiple, smaller interactive satellite computer systems. In some cases, rather elaborate pipelined computer structures may be needed to handle extensive, special processing problems. The multiplicity of interactive computer facilities is essential to assure the availability and response required by short-term visitors. These satellite processors could serve as models (with modest upgrading of peripherals and memory) for individual university systems. Larger university systems could more closely follow the structure of the host processor, using this host to serve both batch and multiple interactive stations.
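The interleaving of batch and interactive work described above can be sketched schematically. The following is not a real scheduler, only an illustration of the priority idea: short interactive requests are served ahead of long batch work so that visiting observers get fast response. The job names and the two-level priority scheme are invented for the example.

```python
# Schematic two-class job queue: interactive requests run before batch
# work; within a class, jobs run first-come first-served.
import heapq

INTERACTIVE, BATCH = 0, 1  # lower value = higher priority

def run(jobs):
    """Return job names in the order a priority scheduler would run them."""
    queue = []
    for seq, (priority, name) in enumerate(jobs):
        heapq.heappush(queue, (priority, seq, name))  # seq breaks ties FIFO
    order = []
    while queue:
        _, _, name = heapq.heappop(queue)
        order.append(name)
    return order

jobs = [
    (BATCH, "reduce-night-1"),
    (INTERACTIVE, "display-image"),
    (BATCH, "reduce-night-2"),
    (INTERACTIVE, "contour-plot"),
]
print(run(jobs))
# -> ['display-image', 'contour-plot', 'reduce-night-1', 'reduce-night-2']
```

Both interactive requests jump ahead of the batch reductions, which is the behavior the host-plus-satellites architecture is meant to guarantee for short-term visitors.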
For the National Centers, at present, a combined host-interactive satellite computer scheme will cost (capital) $1 million for a fully integrated system capable of serving simultaneously 4-6 interactive picture-processing functions and the concurrent batch stream. Single-processor university systems will range in cost from $100,000 to $400,000, depending on the scope of the problems to be attacked. The costs are likely to remain relatively constant through the 1980's; as costs drop, demand may be expected to increase at least as much.

We recommend that the funding agencies (NASA and NSF) provide funds for the implementation of decentralized
image processing and analysis systems equivalent to 20 "canonical" systems (see Appendix 5.A). A continuing level of funding is required so that when a "steady state" is reached, the funding provides for the replacement of systems every 6 years--the typical life of a computer system before it becomes obsolete. Appendix 5.A elaborates on the need for replacement and also discusses the maintenance required for these systems.

The British Starlink System (see below) provides a ratio of computer capacity to data volume similar to (actually somewhat higher than) that which will be supplied by the proposed systems with the data volumes estimated above.

Without array processors, the combined capacity of these 20 systems would be sufficient to perform approximately 100 computer operations per pixel if the data rates are as described above. (Here we use "operation" to mean an elementary computer operation such as an addition or multiplication of two numbers; in other usage, the term "operation" may denote a function such as addition of two images.) This may seem like a lot of capacity until one realizes that to perform a single function, many operations are required. Even simple functions, such as reading an image from a disk file and formatting it for display on a video monitor, can consume tens of operations per pixel. Complex functions, such as geometric corrections, correction for nonlinear transfer functions, or applications of spatial filters, can consume hundreds of operations per pixel. The estimate that 20 systems are required is unlikely to be too large and may, in fact, be considerably too small. However, the capability of adding array processors to these systems provides a margin of protection against this possibility.

Up to now the discussion in this section has concentrated on the hardware facilities required to provide adequate image processing and analysis capabilities. Of course, a substantial effort must be devoted to software development.
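The "100 operations per pixel" figure implies a sustained per-system throughput that can be estimated directly. The data rate, bits per pixel, operations per pixel, and system count below are from the text; the seconds-per-year conversion is the only number we add.

```python
# Rough check of the per-system throughput implied by "100 operations
# per pixel" at the estimated annual data rate.

annual_bits = 10 * 10**12        # upper estimate: 10 x 10^12 bits/year
bits_per_pixel = 16
ops_per_pixel = 100
n_systems = 20

pixels_per_year = annual_bits / bits_per_pixel        # 6.25e11 pixels/year
total_ops = pixels_per_year * ops_per_pixel           # 6.25e13 operations/year
ops_per_system_per_sec = total_ops / n_systems / (365 * 24 * 3600)

print(f"{ops_per_system_per_sec:,.0f} operations/second per system")
# Roughly 10^5 sustained elementary operations per second per system.
```

A sustained rate of order 10^5 operations per second is consistent with the minicomputer-class "canonical" systems the Panel has in mind, which is why array processors provide the safety margin.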
As described in Appendix 5.A, a typical system will require, at a minimum, the support of one full-time software person to provide for maintenance of the system software and the development and maintenance of basic applications software. Also as described in Appendix 5.A, there exists considerable flexibility in the manner in which this support is provided--part-time student or faculty support, dedicated support, shared support, or other. However, with the decentralization of facilities as proposed above, there is the potential that much of the
software development will be duplicated at each facility. This is much more likely for image processing and analysis applications than for theoretical applications. Software required for the former is based on a core of standard functions such as contrast stretching, filtering, and object identification, while software used for theoretical computations tends to be much more application-dependent. The theorist tends to put his or her knowledge of the physics of the problem into the software itself, while the image analyst uses his or her knowledge of the problem to determine the sequence of basic functions to be applied to the data. (Of course, image analysis often requires the development of unique software to supplement the core of standard functions.)

A possible solution to the potential problem of duplication of software development efforts is to insist that all facilities be built around machines with the same architecture (i.e., machines from a single family from a single vendor) and to designate one of the facilities to be responsible for software development for all facilities. Such a solution would be similar to that adopted by the British for their Starlink System. The Starlink System, currently being implemented, is a distributed system dedicated to image processing and analysis. There are to be six nodes in the system, each built around a Digital Equipment Corporation VAX 11/780 computer. The nodes will be connected by dedicated 9600-baud lines, and one node will be responsible for software development and maintenance. (A baud is a measure of information-carrying capacity--in typical applications a baud corresponds to 0.73 bit per second; thus, a 9600-baud line can carry about 875 eight-bit characters per second.) The communications links will be used primarily for distribution of software updates and documentation.
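The parenthetical baud arithmetic can be checked directly. One plausible reading of the 0.73 bits-per-baud figure (an assumption on our part, not stated in the text) is the overhead of asynchronous framing: each 8-bit character costs about 11 line bits once start, stop, and parity bits are added, and 8/11 ≈ 0.73.

```python
# Check of the baud-rate arithmetic quoted in the text, assuming the
# 0.73 bits/baud figure reflects asynchronous framing overhead.

line_rate = 9600                 # bauds (signalling events per second)
bits_per_char = 8
framing_bits = 11                # 8 data bits + start + stop + parity

useful_bits_per_baud = bits_per_char / framing_bits   # ~0.727
chars_per_second = line_rate * useful_bits_per_baud / bits_per_char

print(f"{chars_per_second:.0f} characters/second")
# ~873 characters/second, matching the "about 875" in the text.
```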
While such a system has advantages and is well matched to the needs of the British astronomical community, we believe that the Starlink model is not entirely appropriate for image processing and analysis in the United States. The U.S. astronomical community is spread over a much wider geographical area than is the British community, and, even with low-speed lines, the communication links would be a significant expense. Furthermore, funds are probably not available to purchase the complete facilities at the same time as the British are doing. This means that the facilities would have to be installed gradually and updated gradually. If, in addition, the key to the successful operation of the software is the fact that
the same computer and peripheral architectures exist at each node, then the facilities would be locked into the architecture they started with until that family of computers was discontinued by the manufacturer. At this point a tremendous spike in funding would be required to convert both hardware and software to a new architecture. Finally, it appears that a distributed system such as Starlink offers less flexibility than a decentralized system for tailoring the capabilities of individual nodes to the problems to be addressed at the node.

Our suggested approach to the software problem is not a sure-fire solution, but it does represent, we believe, the best solution available at present. The National Centers (particularly Kitt Peak National Observatory, the National Radio Astronomy Observatory, and STScI) should take an active role in software development by

1. Developing software that is as machine independent as possible;
2. Adequately documenting the software;
3. Distributing standard processing and analysis software and/or algorithms; and
4. Assisting users in the implementation and use of software developed at the National Centers.

Of these four functions, only the second two should impose any additional costs on the Centers. The first two are required in any well-managed software effort. We estimate that the second two functions can be provided at the additional expense of one man-year (plus overhead) per center per year.

In addition, it is necessary that there be some community-wide coordination in such areas as software development and data format standards. For this purpose, a permanent committee should be established, perhaps under the auspices of the AAS and perhaps with federal support. The committee would be responsible for continuing evaluation of astronomical computing needs, developing standards for data formats and software compatibility, and collecting and disseminating software and hardware news of relevance to the community.
The committee should also play a coordinating role in the archiving and data-base activities described in the next two sections. Finally, the committee should provide a focal point for liaison with Starlink and other non-U.S. image processing and analysis facilities.
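A standardized data format of the kind such a committee would define amounts to a small self-describing header followed by pixel data. The toy sketch below illustrates the idea only; the keyword names and layout are invented (in practice the astronomical community's FITS format, emerging in this period, plays this role).

```python
# Toy self-describing image format: a few header keywords, an END
# marker, then the pixel values. Keyword names are invented.

def write_image(path, width, height, bits, pixels):
    header = (f"WIDTH  = {width}\n"
              f"HEIGHT = {height}\n"
              f"BITS   = {bits}\n"
              f"END\n")
    with open(path, "w") as f:
        f.write(header)
        f.write(" ".join(str(p) for p in pixels) + "\n")

def read_image(path):
    """Parse the header into a dict, then read the pixel list."""
    with open(path) as f:
        meta = {}
        for line in f:
            if line.strip() == "END":
                break
            key, value = line.split("=")
            meta[key.strip()] = int(value)
        pixels = [int(p) for p in f.read().split()]
    return meta, pixels

write_image("toy.img", 2, 2, 16, [10, 20, 30, 40])
meta, pixels = read_image("toy.img")
# meta -> {'WIDTH': 2, 'HEIGHT': 2, 'BITS': 16}; pixels -> [10, 20, 30, 40]
```

Because every facility that agrees on the header convention can read any other facility's files, the format itself becomes the interchange standard, independent of the machine architecture at each node.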