Page 77

The High Performance Computing and Communications Initiative: Background

THE TECHNICAL-ECONOMIC IMPERATIVE FOR PARALLEL COMPUTING

The United States Needs More Powerful Computers and Communications

The High Performance Computing and Communications Initiative (HPCCI) addresses demanding applications in many diverse segments of the nation's economy and society. In information technology, government has often had to solve larger problems earlier than other sectors of society. Government and the rest of society, however, have mostly the same applications, and all find their current applications growing in size, complexity, and mission centrality. All sectors are alike in their demands for continual improvement in computer speed, memory size, communications bandwidth, and large-scale switching. As more power becomes increasingly available and economical, new high-value applications become feasible. In recent decades, for example, inexpensive computer power has enabled magnetic resonance imaging, hurricane prediction, and sophisticated materials design. Box A.1 lists additional selected examples of recent and potential applications of high-performance computing and communications technologies. (See also Appendix D for a list of applications and activities associated with the "National Challenges" and Appendix E for an outline of supercomputing applications.)

BOX A.1 Examples of Important Applications of High-Performance Computing and Communications Technologies

· Continuous, on-line processing of millions of financial transactions
· Understanding of human joint mechanics
· Modeling of blood circulation in the human heart
· Prediction and modeling of severe storms
· Oil reservoir modeling
· Design of aerospace vehicles
· Linking of researchers and science classrooms
· Digital libraries
· Improved access to government information

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



Conventional Supercomputers Face Cost Barriers

For four decades and over six computer generations, there has been a countable demand, much of it arising from defense needs, for a few score to a few hundred supercomputers, machines built to be as fast as the state of the art would allow. These machines have cost from $5 million to $25 million each (in current dollars). The small market size has always meant that a large part of the per-machine cost has been development cost, tens to hundreds of millions of dollars. Such products are peculiarly susceptible to cost-rise, market-drop spirals.

As supercomputers have become faster, they have become ever more difficult and costly to design, build, and maintain. Conventional supercomputers use exotic electronic components, many of which have few other uses. Because of the limited supercomputer market, these components are manufactured in small quantities at correspondingly high cost. Increasingly, this cost is capital cost for the special manufacturing processes required, and development cost for pushing the state of the component and circuit art. Moreover, supercomputers' large central memories require high bandwidth and fast circuits. The speed and complexity of the processors and memories demand special wiring. Supercomputers require expensive cooling systems and consume large amounts of electrical power.

Thoughtful prediction shows that supercomputers face nonlinear cost increases for designing and developing entirely new circuits, chip processes, capital equipment, specialized software, and the machines themselves. At the same time, the end of the Cold War has eliminated much of the historical market for speed at any cost. Many observers believe we are at, or within one machine generation of, the end of the specialized-technology supercomputer line.

Small Computers Are Becoming Faster, Cheaper, and More Widely Used

Meanwhile the opposite cost-volume spiral is occurring in microcomputers. Mass-production of integrated circuits yields single-chip microprocessors of surprising power, particularly in comparison to their cost. The economics of the industry mean that it is less expensive to build more transistors than to build faster transistors. The per-transistor price of each of the millions of transistors in mass-produced microprocessor chips is extremely low, even though their switching speeds are now quite respectable in comparison to those of the very fastest transistors, and a single chip will now hold a quite complex computer. While microprocessors do not have the memory bandwidth of supercomputers, the 300-megaflop performance of single-chip processors such as the MIPS 8000 is about one-third the 1-gigaflop performance of each processor in the Cray C-90, a very fast supercomputer.

Microprocessor development projects costing hundreds of millions of dollars now produce computing chips with millions of transistors each, and these chips can be sold for a few hundred dollars apiece. Moreover, because of their greater numbers, software development for small machines proves much more profitable than for large machines. Thus an enormous body of software is available for microprocessor-based computers, whereas only limited software is available for supercomputers.

Parallel Computers: High Performance for Radically Lower Cost

Mass-production economics for hardware and software argue insistently for assembling many microcomputers with their cheap memories into high-performance computers, as an alternative to developing specialized high-performance technology. The idea dates from the 1960s, but the confluence of technical and economic forces for doing so has become much more powerful now than ever before.

CHALLENGES OF PARALLEL COMPUTING

Organizing a coherent simultaneous attack on a single problem by many minds has been a major management challenge for centuries. Organizing a coherent simultaneous attack on a single problem by a large number of processors is similarly difficult. This is the fundamental challenge of parallel computing. It has several aspects.

Applications

It is not evident that every application can be subdivided for a parallel attack. Many believe there are classes of applications that are inherently sequential and can never be parallelized. For example, certain phases in the compilation of a program are by nature sequential processes.

Many applications are naturally parallel. Whenever one wants to solve a problem for a large number of independent input datasets, for example, these can be parceled out among processors very simply. Such problems can be termed intrinsically parallel.

Most applications lie somewhere in between. There are parts that are readily parallelized, and there are parts that seem sequential. The challenge is how to accomplish as much parallelization as is inherently possible. A second challenge of great importance is how to do this automatically when one starts with a sequential formulation of the problem solution, perhaps an already existing program.

Hardware Design

How best to connect lots of microprocessors together with each other and with shared resources such as memory and input/output has become a subject of considerable technical exploration and debate.
Early attempts to realize the potential performance of parallel processing revealed that too rigid a connection between machines stifles their ability to work by forcing them into lock-step. Too loose a connection makes communication between them cumbersome and slow. The section below, "Parallel Architectures," sketches some of the design approaches that have been pursued.

Numerical Algorithms

During the centuries of hand calculations, people worked one step at a time. Ever since computers were introduced, the programs run on them have been mainly sequential, taking one small step at a time and accomplishing their work rapidly because of the prodigious number of steps that they can take in a short time. The current numerical algorithms for attacking problems are mostly sequential. Even when the mathematics of solution have allowed high degrees of parallel attack, sequential methods have generally been used. In fact, most languages used to express programs, such as FORTRAN, COBOL, and C, enforce sequential organizations on operations that are not inherently sequential.

In the 30 years since parallel computers were conceived, computational scientists have been researching parallel algorithms and rethinking numerical methods for parallel application. This work proceeded slowly, however, because there were few parallel machines from which to benefit if one did come up with a good parallel algorithm, and few on which to develop and test such an algorithm. People didn't work on parallel algorithms because they had no parallel machines to motivate the work; people didn't buy parallel machines because there were few parallel algorithms to make them pay off.

The HPCCI and its predecessor initiatives broke this vicious cycle. By funding the development of machines for which little market was developed, and by providing them to computational scientists to use, the HPCCI has vastly multiplied the research efforts on parallel computation algorithms. Nevertheless, 30 years of work on parallel approaches has not yet caught up with four centuries of work on sequential calculation.

Learning New Modes of Thought

Programmers have always been trained to think sequentially. Thinking about numerous steps taken in parallel instead of sequentially may initially seem unnatural. It often requires partitioning a problem in space as well as in time. Parallel programming requires new programming languages that can accept suitable statements of the programmer's intent as well as new patterns of thought not yet understood and formalized, much less routinely taught to programmers.

A NEW PARADIGM

By responding to the technological imperative for parallel computing, the HPCCI has in a major way helped add a new paradigm to computing's quiver.
Parallel computing is an additional paradigm, not a replacement for sequential and vector computing. Large numbers of researchers have begun to understand the task of harnessing parallel machines and are debating the merits of different parallel architectures. Because the parallel paradigm is new, no one can yet say which particular approaches will prove most successful. It is clear, however, that this healthy debate and the workings of the market will identify and develop the best solutions.

Has the parallel computing paradigm really been established as the proper direction for high-performance computing? The Committee to Study High Performance Computing and Communications: Status of a Major Initiative unanimously believes that it has. It is obliged to report, however, that the issue is still being hotly debated in the technical literature. In the November 1994 special issue of the Institute of Electrical and Electronics Engineers Computer magazine, Borko Furht asserts in "Parallel Computing: Glory and Collapse" that "the market for massively parallel computers has collapsed, but [academic] researchers are still in love with parallel computing." Furht (1994) argues, "We should stop developing parallel algorithms and languages. We should stop inventing interconnection networks for massively parallel computers. And we should stop teaching courses on advanced parallel programming." An editorial by Lewis (1994) in the same issue similarly discounts highly parallel computing.

Part of the difference of opinion is semantics. Computers have had a few processors working concurrently, at least internal input/output processors, since the late 1950s. Modern vector supercomputers have typically had four or eight processors. The new paradigm concerns highly parallel computing, by which some mean hundreds of processors. The committee believes that the number of processors in "parallel" computers in the field will grow normally from a few processors, to a few dozen processors, and thence to hundreds. For the next several years, many computer systems will use moderate parallelism.

The strongest evidence, and that which convinces the committee that the parallel computing paradigm is a long-term trend and not just a bubble, comes from the surging sales of third-generation parallel processors such as the SGI Challenge, the SGI Onyx, and the IBM SP2. SGI's director of marketing reports, for example, that SGI has sold more than 3,500 Challenge and Onyx machines since September 1993; IBM's Wladawsky-Berger reports that 230 orders for the SP2 have been booked since it was announced in summer 1994 (Parker-Smith, 1994a). In fairness to Furht and Lewis, these surging sales figures have appeared only in the last few months, whereas journal lead times are long.

COMPUTER ARCHITECTURES

Overview

Sequential, Vector

The simple sequential computer fetches and executes instructions from memory one after the other. Each instruction performs a single operation, such as adding, multiplying, or storing one piece of data. Decisions are made by conditionally changing the sequence of instructions depending on the result of some comparison or other operation. Every computer includes memory to store data and results, an instruction unit that fetches and interprets the instructions, and an arithmetic unit that performs the operations (see Figure A.1).

FIGURE A.1 Sequential computer organization.

Vector computers perform the same instruction on each element of a vector or a pair of vectors. A vector is a set of elements of the same type, such as the numbers in a column of a table. So a single "add" operation in a vector computer can cause, for example, one column of 200 numbers to be added, element by element, to another column of 200 numbers.
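
The 200-element column addition described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the function names are hypothetical, Python lists stand in for machine registers and memory, and the "instructions fetched" counter is a simplification that ignores loop overhead and the limited vector lengths of real hardware.

```python
def sequential_add(xs, ys):
    """Toy sequential machine: one add instruction is fetched per element."""
    result = []
    instructions_fetched = 0
    for x, y in zip(xs, ys):
        instructions_fetched += 1  # fetch and decode one scalar "add"
        result.append(x + y)
    return result, instructions_fetched


def vector_add(xs, ys):
    """Toy vector machine: a single add instruction covers every element."""
    instructions_fetched = 1  # one vector "add" is fetched once
    result = [x + y for x, y in zip(xs, ys)]
    return result, instructions_fetched


column_a = list(range(200))
column_b = [10] * 200
seq_result, seq_fetches = sequential_add(column_a, column_b)
vec_result, vec_fetches = vector_add(column_a, column_b)
print(seq_fetches, vec_fetches)
```

Both functions produce the same sums; the difference the model is meant to expose is only in how many instructions have to be fetched to get there.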
Vector computers can be faster than sequential computers because they do not have to fetch as many instructions to process a given set of data items. Moreover, because the same operation will be done on each element, the flow of vector elements through the arithmetic unit can be pipelined and the operations overlapped on multiple arithmetic units to get higher performance.

Parallel

Parallel computers also have multiple arithmetic units, intended to operate at the same time, or in parallel, rather than in pipelined fashion. Three basic configurations are distinguished according to how many instruction units there are and according to how units communicate with each other. Within each configuration, designs also differ in the patterns, called topologies, for connecting the units to each other to share computational results. Thus applications programmed for a particular computer are not readily portable, even to other computers with the same basic configuration but different topologies.

Single Instruction Multiple Data. In a single instruction multiple data (SIMD) computer, one instruction unit governs the actions of many arithmetic units, each with its own memory. All the arithmetic units march in lock step, obeying the one instruction unit but fetching different operands from their own memories (Figure A.2). Because of the lock step, if any node has to do extra work because of the particular or exceptional values of its data, all the nodes must wait until uniform operations can proceed. Full utilization of all the processor power depends on highly uniform applications.

FIGURE A.2 A data-parallel computer organization.

Multiple Instruction Multiple Data Message Passing. In a multiple instruction multiple data (MIMD) message-passing computer, each arithmetic unit has its own memory and its own instruction unit. So each node of such a machine is a complete sequential computer, and each can operate independently. The multiple nodes are connected by communication channels, which may be ordinary computer networks or which may be faster and more efficient paths if all the nodes are in one cabinet. The several nodes coordinate their work on a common problem by passing messages back and forth to each other (Figure A.3). This message-passing takes time and instructions. Various topologies are used to accelerate message routing, which can get complex and take many cycles.

There are two quite different forms of MIMD computers, distinguished by the network interconnecting the processors. One, commonly called a massively parallel processor (MPP), has a collection of processor nodes co-located inside a common cabinet with very high performance specialized interconnections.
The other, often called a workstation farm, consists of a group of workstations connected by conventional local area or even wide area networks. Such collections have demonstrated considerable success on large computing problems that require only modest internode traffic. Between the two extremes of the MPP and the workstation farm lie a number of parallel architectures now being explored. No one can say how this exploration will come out.

FIGURE A.3 A message-passing parallel computer organization.
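
The message-passing style of coordination described above can be sketched in a few lines. This is an illustrative model only, not how any of the machines named here were programmed: threads stand in for nodes, Python queues stand in for the communication channels, and the names `node` and `message_passing_sum` are hypothetical. Each node touches only its own slice of the data and shares results solely by sending a message to node 0.

```python
import queue
import threading


def node(rank, local_data, inboxes, results):
    """One node: computes on private data, communicates only by messages."""
    partial = sum(local_data)  # purely local work, no shared variables
    if rank != 0:
        inboxes[0].put((rank, partial))  # send the partial sum to node 0
    else:
        total = partial
        for _ in range(len(inboxes) - 1):
            _sender, value = inboxes[0].get()  # blocking receive
            total += value
        results.put(total)


def message_passing_sum(data, num_nodes=4):
    """Sum `data` by parceling it out to nodes that exchange messages."""
    chunks = [data[i::num_nodes] for i in range(num_nodes)]  # private slices
    inboxes = [queue.Queue() for _ in range(num_nodes)]      # one per node
    results = queue.Queue()
    threads = [
        threading.Thread(target=node, args=(rank, chunks[rank], inboxes, results))
        for rank in range(num_nodes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.get()


print(message_passing_sum(list(range(1000))))
```

Even in this small sketch the cost noted in the text is visible: every partial result needs an explicit send and a matching receive, and node 0 cannot finish until all messages have arrived.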

Multiple Instruction Multiple Data Shared-Memory. In a multiple instruction multiple data (MIMD) shared-memory computer, the separate nodes share a large common memory. The several nodes coordinate work on a common problem by changing variables in the shared memory, which is a simple and fast operation (Figure A.4). Each node also has its own memory, generally organized as a cache that keeps local copies of things recently accessed from the shared memory. The use of individual cache memories at each processor radically reduces traffic to the shared memory.

FIGURE A.4 A shared-memory parallel computer organization.

The shared memory may be a single physical memory unit, as in the SUN SPARCCenter. This kind of computer organization cannot be scaled indefinitely upward; the shared memory and its bus become a bottleneck at some point. A more scalable distributed memory design has a single shared memory address space, but the physical memory is distributed among the nodes. This arrangement exploits microprocessors' low memory cost and gives better performance for local access. Many experts believe this will become the dominant organization for machines with more than a few processors.

Some distributed-memory machines, such as the Convex Exemplar, enforce cache coherence, so that all processors see the same memory values. Others, such as the Cray T3D, do not enforce coherence but use very fast special circuits to get very low shared-memory latency. Most machines with a shared physical memory maintain cache coherence.

Generations of Parallel Computers

First Commercial Generation: SIMD

Parallel computers with small numbers of processors have been standard commercial fare for 30 years. In some cases, the multiple processors were in duplex, triplex, or quadruplex configurations for high availability; in most advanced computers there have been processors dedicated to input-output. Most vector computers have also been modestly parallel for more than a decade.
One-of-a-kind highly parallel computers have been built now and then since the 1960s, with limited success.

The Advanced Research Projects Agency (ARPA) recognized the technical-economic imperative to develop highly parallel computers for both military and civilian applications and acted boldly to create its high-performance computing program. This stimulus combined with a ferment of new ideas and with entrepreneurial enthusiasm to encourage several manufacturers to market highly parallel machines, among them Intel, Ncube, Thinking Machines Corporation (TMC), and MasPar. Most of these first-generation machines were SIMD computers, exemplified by the CM1 (Connection Machine 1) developed by Thinking Machines.

Because SIMD execution lacks the information content of multiple instruction flows, applications have to be more uniform to run efficiently on SIMD computers than on other types of parallel computers. Compounding this inherent difficulty, the first-generation machines had only primitive software tools. No application software was available off the shelf, and existing codes could not be automatically ported, so that each application had to be rebuilt from scratch. Moreover, few of the first-generation machines used off-the-shelf microprocessors with their economic and software advantages.

The first generation of highly parallel computers had some successes but proved to be of too limited applicability to succeed in the general market. Some naturally parallel applications were reprogrammed for these machines, realizing gains in execution speed nearly in proportion to the number of processors applied to the problem, up to tens of processors. The set of applications for which this was true was quite limited, however, and most experts agree that the SIMD configuration has its units too tightly coupled to be used effectively in a wide variety of applications. Nonetheless, the creation of this generation of machines, and their provision of a platform for pioneering and experimental applications, clearly started a great deal of new thinking in academia about how to use such machines.

Second Generation: Message-Passing MIMD

Striving for the wider applicability that would be enabled by a more flexible programming style, parallel computer researchers and vendors developed MIMD configurations made up of complete microprocessors (sometimes augmented by SIMD clusters). By and large, these machines used message-passing for interprocessor communication. The Thinking Machines CM5 is a good example of this second generation. Other examples use off-the-shelf microprocessors as nodes.

Although improving somewhat in ease of use, such machines are still hard to program, and users still need to change radically how they think and the type of algorithms they use. Moreover, because these machines were different both from conventional computers and from first-generation highly parallel computers, the compilers and operating systems again had to be redone "from scratch" and were primitive when the machines were delivered.
The second-generation machines have proven to be much more widely applicable, but primitive operating systems, the continuing lack of off-the-shelf applications, and the difficulties of programming with elementary tools prevented widespread adoption by computer-using industries.

As the market registered its displeasure with these inadequacies, several of the vendors of first- and second-generation parallel computers, including TMC and Kendall Square Research, went into Chapter 11 protection or retired from the parallel computer-building field. A beneficial side effect of these collapses has been the scattering of parallel-processing talent to other vendors and users.

As parallel computers gained acceptance, existing vector computer vendors claimed their sales were being harmed by the government promotion and subsidization of a technology that they saw as not yet ready to perform. Cray Research and Convex, among others, saw their sales fall, partly due to performance/cost breakthroughs in smaller computers, partly due to the defense scale-back, and partly due to some customers switching from vector to parallel computers. The complaints of the vector computer vendors triggered studies of the HPCCI by the General Accounting Office and the Congressional Budget Office (see "Concerns Raised in Recent Studies"). Cray Research and Convex have since become important vendors of parallel computers.

Third Generation: Memory-Sharing MIMD

In the third generation, major existing computer manufacturers independently decided that the shared-memory organization, although limited in ultimate scalability, offered the most viable way to meet present market needs. Among others, SGI, Cray Research, and Convex have made such systems using off-the-shelf microprocessors from MIPS, IBM, DEC, and Hewlett-Packard, respectively. As noted above, market acceptance has been encouraging: industrial computer users have been buying the machines and putting them to work.
Many users start by using standard software and running the systems as uniprocessors on bread-and-butter jobs, and then expand the utilization of the multiple processors gradually. As parallel algorithms, compilers, languages, and tools continue to develop, these memory-shared machines are well positioned to capitalize on them.

Programming

The development of parallel computing represents a fundamental change not only in the machines themselves, but also in the way they are programmed and used. To use fully the power of a parallel machine, a program must give the machine many independent operations to do simultaneously, and it must organize the communication among the processor nodes. Developing techniques for writing such programs is difficult and is now regarded by the committee as the central challenge of parallel computing.

Computer and computational scientists are now developing new theoretical concepts and underpinnings, new programming languages, new algorithms, and new insights into the application of parallel computing. While much has been done, much remains to be done: even after knowledge about parallel programming is better developed, many existing programs will need to be rewritten for the new systems.

Algorithms

There is a commonly held belief that our ability to solve ever larger and more complex problems is due primarily to hardware improvements. However, A.G. Fraser of AT&T Bell Laboratories has observed that for many important problems the contributions to speed-ups made by algorithmic improvements exceed even the dramatic improvements due to hardware. As a long-term example, Fraser cited the solution of Poisson's equation in three dimensions on a 50 by 50 by 50 grid. This problem, which would have taken years to solve in 1950, will soon be solved in a millisecond.
Fraser has pointed out that this speed-up is owing to improvements in both hardware and algorithms, with algorithms dominating.1

During the mid-1980s, several scientists independently developed tree codes or hierarchical N-body codes to solve the equations of the gravitational forces for large multibody systems. For 1 million bodies, tree codes are typically 1,000 times faster than classic direct-sum algorithms. More recently, some of these tree-code algorithms have been modified to run on highly parallel computers. For example, Salmon and Warren have achieved a speed-up of 445 times when running their codes on a computer with 512 processors as compared with running them on a single processor (Kaufman and Smarr, 1993, pp. 73-74).

Over the half-century that modern computers have been available, vast improvements in problem solving have been achieved because of new algorithms and new computational models; a short list from among the numerous examples includes:

· Finite-element methods,
· Fast Fourier transforms,
· Monte Carlo simulations,
· Multigrid methods,
· Methods for sparse problems,
· Randomized algorithms,
· Deterministic sampling strategies, and
· Average case analysis.
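
The Salmon and Warren figure quoted above (a speed-up of 445 on 512 processors) can be put in perspective with a little arithmetic. The sketch below computes the parallel efficiency, which follows directly from the two numbers, and then, purely as an assumption not made in the text, applies Amdahl's law to estimate how small the non-parallel share of the work would have to be to allow that speed-up. The function names are hypothetical.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal (linear) speed-up actually achieved."""
    return speedup / processors


def amdahl_speedup(serial_fraction, processors):
    """Amdahl's law: S = 1 / (f + (1 - f) / N), an idealized model."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def implied_serial_fraction(speedup, processors):
    """Amdahl's law solved for f, given an observed speed-up S on N processors."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


efficiency = parallel_efficiency(445, 512)
serial_fraction = implied_serial_fraction(445, 512)
print(f"efficiency: {efficiency:.1%}")
print(f"implied serial fraction under Amdahl's law: {serial_fraction:.4%}")
```

The efficiency works out to roughly 87 percent. Under the Amdahl assumption, the sequential share of the computation would be only about 0.03 percent, which suggests how thoroughly an algorithm must be restructured before hundreds of processors can be kept busy.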

OCR for page 77
The exponential increase in the sizes of economical main memories has also enabled a host of new table-driven algorithmic techniques that were unthinkable a decade ago. Discovering and developing new algorithms for solving both generic and specific problems from science, engineering, and the financial services industry, designed and implemented on parallel architectures, will continue to be an important area for national investment.

A SKETCH OF THE HPCCI'S HISTORY

Development and Participants

To quote from the 1993 Blue Book: "The goal of the federal High Performance Computing and Communications Initiative (HPCCI) is to accelerate the development of future generations of high-performance computers and networks and the use of these resources in the federal government and throughout the American economy" (FCCSET, 1992). This goal has grown, like the HPCCI itself, from many roots and has continued to evolve as the initiative has matured. Box A.2 illustrates the evolution of the HPCCI's goals as presented by the Blue Book annual reports.

BOX A.2 HPCCI Goals As Stated in the Blue Books

FY 1992
Accelerate significantly the commercial availability and utilization of the next generation of high-performance computers and networks:
· Extend U.S. technological leadership in high-performance computing and communications.
· Widely disseminate and apply technologies to speed innovation and to serve the national economy, national security, education, and the global environment.
· Spur productivity and competitiveness.

FY 1993
Unchanged.

FY 1994
Goals remained the same, with the addition of the Information Infrastructure Technology and Applications program element and mention of the National Information Infrastructure.

FY 1995
Meta-goal ("Accelerate significantly . . . ") not mentioned. Goals consolidated as:
· Extend U.S. technological leadership in high-performance computing and communications; and
· Widely disseminate and apply technologies to speed innovation and to improve national economic competitiveness, national security, education, health care (medicine), and the global environment.

Beginning in the early 1980s, several federal agencies advanced independent programs in high-performance computing and networking.2 The National Science Foundation (NSF) built on recommendations from the National Science Board Lax report in 1982,3 as well as a set of internal reports4 that recommended dramatic action to end the 15-year supercomputer famine in U.S.
universities. NSF asked Congress in 1984 for funds to set up, by a national competition, a number of supercomputer centers to provide academic researchers access to state-of-the-art supercomputers, training, and consulting services. Very quickly this led to the creation of an NSF network backbone to connect the centers. This in turn provided a high-speed backbone for the Internet.

Several organizations, including the Office of Management and Budget and the former Federal Coordinating Council on Science, Engineering, and Technology (FCCSET) of the Office of Science and Technology Policy (OSTP), built on these activities and similar efforts in the Department of Energy (DOE),5 the National Aeronautics and Space Administration (NASA), and the Department of Defense (DOD) to develop the concept of a National Research and Education Network (NREN) program (CSTB, 1988). These explorations were linked to other concurrent efforts to support advanced scientific computing among researchers and to promote related computer and computational science talent development. The result was the High-Performance Computing Program. The program included an emphasis on communications technology development and use from the outset.

High-Performance Computing Program structure and strategy were discussed intensively within several federal agencies in 1987-1988, resulting in initial formalization and publication of a program plan in 1989 (OSTP, 1989). OSTP provided a vehicle for interagency coordination of high-performance computing and communications activities, acting through FCCSET and specific subgroups, including the Committee on Physical, Mathematical, and Engineering Sciences; its subordinate Panel on Supercomputers; its High Performance Committee (later subcommittee); its Research Committee (later subcommittee); and its High Performance Computing, Communications, and Information Technology (HPCCIT) Subcommittee.
The initial HPCCI effort was concentrated in four agencies: DOD's Advanced Research Projects Agency, DOE, NASA, and NSF. These agencies remain the dominant supporters of computing and computational science research. Although not then a formal member, the National Security Agency (NSA) has also always been an influential player in high-performance computing, due to its cryptography mission needs.

High-performance computing activities received added impetus and more formal status when Congress passed the High-Performance Computing Act of 1991 (PL 102-194), authorizing a 5-year program in high-performance computing and communications. This legislation affirmed the interagency character of the HPCCI, assigning broad research and development (R&D) emphases to the 10 federal agencies that were then participating in the program without precluding the future participation of other agencies. The group of involved agencies expanded to include the Environmental Protection Agency, National Library of Medicine (part of the National Institutes of Health), National Institute of Standards and Technology (part of the Department of Commerce (DOC)), and National Oceanic and Atmospheric Administration (part of DOC), as described in the 1992 and 1993 Blue Books. Additional agencies involved subsequently include the Education Department, NSA, Veterans Administration (now the Department of Veterans Affairs), and Agency for Health Care Policy and Research (part of the Department of Health and Human Services). These and other agencies have participated in HPCCIT meetings and selected projects either as direct members or as observers.

Since its legislative inception in 1991, the HPCCI has attained considerable visibility both within the computer research community and as an important element of the federal government's technology programs. When originally formulated, the HPCCI was aimed at meeting several "Grand Challenges" such as modeling and forecasting severe weather events. It was subsequently broadened to address "National Challenges" relating to several important sectors of the economy, such as manufacturing and health care, and then the improvement of the nation's information infrastructure. The evolution of emphasis on the Grand and National Challenges and also the nation's information infrastructure is outlined in Box A.3.

BOX A.3 From Grand Challenges to the National Information Infrastructure and National Challenges: Evolution of Emphasis as Documented in the Blue Books

FY 1992
· Grand Challenges featured in title and discussed in text
    Forecasting severe weather events
    Cancer gene research
    Predicting new superconductors
    Simulating and visualizing air pollution
    Aerospace vehicle design
    Energy conservation and turbulent combustion
    Microelectronics design and packaging
    Earth biosphere research
· National Challenges not discussed

FY 1993
· Grand Challenges featured in title and discussed in text
    Magnetic recording technology
    Rational drug design
    High-speed civil transports (aircraft)
    Catalysis
    Fuel combustion
    Ocean modeling
    Ozone depletion
    Digital anatomy
    Air pollution
    Design of protein structures
    Venus imaging
    Technology links to education
· National Challenges not discussed

FY 1994
· National Information Infrastructure (NII) featured in title
    Medical emergency and weather emergency discussed as examples of potential use of NII
· Potential National Challenge areas listed in Information Infrastructure Technology and Applications discussion
    Civil infrastructure
    Digital libraries
    Education and lifelong learning
    Energy management
    Environment
    Health care
    Manufacturing processes and products
    National security
    Public access to government information
· Grand Challenges discussed as case studies in text
    Climate modeling
    Sharing remote instruments
    Design and simulation of aerospace vehicles
    High-performance life science: molecules to magnetic resonance imaging
    Nonrenewable energy resource recovery
    Groundwater remediation
    Improving environmental decision making
    Galaxy formation
    Chaos research and applications
    Virtual reality technology
    High-performance computing and communications and education
    Guide to available mathematics software
    Process simulation and modeling
    Semiconductor manufacturing for the 21st century
    Field programmable gate arrays
    High-performance Fortran and its environment

FY 1995
· National Information Infrastructure featured in title and discussed in text
    Information infrastructure services
    Systems development and support environments
    Intelligent interfaces
· National Challenge areas discussed in text
    Digital libraries
    Crisis and emergency management
    Education and lifelong learning
    Electronic commerce
    Energy management
    Environmental monitoring and waste minimization
    Health care
    Manufacturing processes and products
    Public access to government information
· Major section devoted to "High-Performance Living" with future scenario based on the National Challenges and the National Information Infrastructure
· Grand Challenges discussed in text. More than 30 Grand Challenges illustrated by examples within the following larger areas:
    Aircraft design
    Computer science
    Energy
    Environmental monitoring and prediction
    Molecular biology and biomedical imaging
    Product design and process optimization
    Space science

Concerns Raised in Recent Studies

A 1993 General Accounting Office (GAO) study of ARPA activities related to the HPCCI and a 1993 Congressional Budget Office (CBO) study of HPCCI efforts in massively parallel computing have been regarded by some as being critical of the entire HPCCI. The committee, which received detailed briefings from the studies' authors, offers the following observations.

GAO Report

The GAO report6 did not attempt to evaluate the entire HPCCI but focused instead on research funding, computer prototype acquisition activities, and the balance between hardware and software investments by ARPA. It recommended that ARPA (1) broaden its computer placement program by including a wider range of computer types, (2) establish and maintain a public database covering the agency HPCCI program and the performance characteristics of the machines it funds, and (3) emphasize and provide increased support for high-performance software. The report's authors stated to the committee that although recommending improvements, they had found that ARPA had administered its program with propriety.

The committee notes that progress has been made on each of GAO's recommendations, and it has urged that further progress be supported. Committee recommendation 4 calls for further reduction in funding of computer development by vendors and for experimental placement of new machines. These actions should result in a wider variety of machine types as agencies select different machines to meet their mission needs. The National Coordination Office (NCO) has made more program information available, and the committee recommends that functions in this area receive added attention by an augmented NCO (recommendation 11). Likewise, the committee has called in recommendation 3 for added emphasis on the development of software and algorithms for high-performance computing.
CBO Report

The primary theme of the CBO report (1993) was that because it was aimed primarily at massively parallel machines, which currently occupy only a small part of the computer industry, the High-Performance Computing Systems component of the HPCCI would have little impact on the computer industry. (The high-performance communications and networking segment of the program is not addressed in the CBO report.) The CBO report assumed that the HPCCI was to support the U.S. computer industry, in particular the parallel-computing portion. Although this might be an unstated objective, the explicitly stated goals relate instead to developing new high-performance computer architectures, technologies, and software. The HPCCI appears to be fulfilling the stated goals.

The CBO report did not attempt to analyze the impact of the development of high-performance computing and communications technology on the larger computer industry over a longer period of time. The primary focus of the high-performance computing portion of the program is the creation of scalable parallel machines and software. It is widely believed in both the research and industrial communities that parallelism is a key technology for providing long-term growth in computing performance, as discussed in the early sections of this appendix. The HPCCI has demonstrated a number of successes in academia, in industry, and in government laboratories that provide a significant increase in our ability to build and use parallel machines. Just as reduced instruction set computer (RISC) technology, developed partly with ARPA funding, eventually became a dominant force in computing (some 10 years after the program started), the initiative's
ideas are starting to take root in a larger context across the computer industry. Since using parallel processors requires more extensive software changes than did embracing RISC concepts, we should expect that multiprocessor technology will take longer to be adopted.

NOTES

1. A.G. Fraser, AT&T Bell Laboratories, personal communication.
2. Department of Energy (DOE) officials point out that their efforts date from the mid-1970s. For example, in 1974 DOE established a nationwide program providing energy researchers with access to supercomputers and involving a high-performance communications network linking national laboratories, universities, and industrial sites, the precursor of today's Energy Sciences Network (ESNet). See Nelson (1994).
3. Report of the Panel on Large Scale Computing in Science and Engineering, Peter Lax, Chairman, sponsored by the U.S. Department of Defense and the National Science Foundation, in cooperation with the Department of Energy and the National Aeronautics and Space Administration, Washington, D.C., December 26, 1982.
4. A National Computing Environment for Academic Research. Marcel Bardon and Kent Curtis, NSF Working Group on Computers for Research. National Science Foundation, July 1983.
5. The DOE laboratories had been involved in supercomputing since World War II and were not particularly affected by the setting up of the NSF centers or FCCSET until the late 1980s or early 1990s as the HPCCI emerged.
6. See GAO (1993). Another GAO report (1994) was not released in time for the committee to receive a detailed briefing on which to base further group deliberations. However, observations from that report are drawn in the body of this report.