Executive Summary

By definition, supercomputers are those hardware and software computing systems that provide close to the best currently achievable sustained performance. The performance of supercomputers, which is normally achieved by introducing parallelism, is typically much better than that of the vast majority of installed systems. Supercomputers are used to solve complex problems, including the simulation and modeling of physical phenomena such as climate change, explosions, or the behavior of molecules; the analysis of data from sources such as national security intelligence, genome sequencing, or astronomical observations; or the intricate design of engineered products. Their use is important for national security and defense, as well as for research and development in science and engineering.

As the uses of computing have increased and broadened, supercomputing has become less dominant than it once was. Many interesting applications require only modest amounts of computing, by today's standards. Because of the increase in computer processing power, many problems whose solution once required supercomputers can now be solved on relatively inexpensive desktop systems. That change has caused the computer industry, the research and development community, and some government agencies to reduce their attention to supercomputing. Yet problems remain whose computational demands for scaling and timeliness stress even our current supercomputers. Many of those problems are fundamental to the government's ability to address important national issues. One notable example is the Department of Energy's computational requirements for nuclear stockpile stewardship.

The government has sponsored studies of a variety of supercomputing topics over the years. Some of those studies are summarized in the body of this report.
Recently, questions have been raised about the best ways for the government to ensure that its supercomputing needs will continue to be satisfied in terms of both capability and cost-effectiveness. To answer those questions, the National Research Council's Computer Science and Telecommunications Board convened the Committee on the Future of Supercomputing to conduct a 2-year study to assess the state of supercomputing in the United States and to give recommendations for government policy to meet future needs. This study is sponsored jointly by the Department of Energy's Office of Science and by its Advanced Simulation and Computing (ASC) Program. This interim report, presented approximately 6 months after the start of the study, reflects the committee's current understanding of the state of U.S. supercomputing today, the needs of the future, and the factors that contribute to meeting those needs. After such a short time, the committee is not yet ready to comment in detail on the specifics of existing supercomputing programs or to present specific findings and well-supported recommendations. Although the committee has made considerable progress in understanding the current state of supercomputing and how it got there, it still has much more work to do before it develops recommendations.
THE FUTURE OF SUPERCOMPUTING: AN INTERIM REPORT

Many technical, economic, and policy issues need to be addressed in this study. They include (1) the computational needs of present and future applications and approaches to satisfying them; (2) the balancing of commodity components and custom design in supercomputer architectures and the effects of software design improvements and industry directions on that balance; (3) the interplay of research, development, prototyping, and production in creating innovative advances; (4) the extent and nature of direct government involvement to ensure that its needs are met; and (5) the important requirement that the present not be neglected while the future is being determined. Although this report touches on each of those topics, the committee has not completed its consideration of them.

The particular technical approaches of any program that develops or uses supercomputing represent a complex compromise between conflicting requirements and an assessment of the risks and opportunities entailed in various approaches. An evaluation of these approaches requires a detailed understanding of (1) the relevant applications, (2) the algorithms used to solve those application problems, (3) the performance likely to be achieved by codes that implement these algorithms on different platforms, (4) the coding efforts required by various approaches, (5) the likely evolution of supercomputing technology over multiple years under various scenarios, and (6) the costs, probabilities, and risks associated with different approaches.

In its final report, the committee will seek to characterize broadly the requirements of different application classes and to examine the architecture, software, algorithm, and cost challenges and trade-offs associated with these application classes, keeping in mind the needs of the nuclear stockpile stewardship program, the broad science community, and the national security community.
(Note that a separate, classified report by the JASONs is expected to identify the distinct requirements of the stockpile stewardship program and its relation to the ASC acquisition strategy.)

The committee believes that it would be unwise to significantly redirect or reorient current supercomputing programs before careful scientific consideration has been given to the issues described above. Such changes might be hard to reverse, might reduce flexibility, and might increase costs in the future.

In the period ahead, the committee will continue to learn and to analyze. A workshop focused on applications, to be held in the fall of 2003, will include a number of applications experts from outside the committee. Its purpose will be to identify both the computational requirements of important applications and the opportunities to adapt and evolve current solutions so as to benefit from advances in algorithms, architectures, and software. In addition, the committee will meet with experts who are developing solutions for applications of particular importance for national defense and security within the Department of Energy (DOE) and the National Security Agency (NSA). The committee will also meet with managers of supercomputing facilities, procurement experts, industrial supercomputer suppliers, experts on computing markets and economics, and others whose expertise will help to inform it.

SUPERCOMPUTING TODAY

According to the June 2003 list of the 500 most powerful computer systems in the world, the United States leads the world in the manufacture and use of supercomputers, followed by Japan. A number of other countries make or use supercomputers, but to a much lesser degree. Virtually all supercomputers are constructed by connecting large numbers of compute nodes, each having one or more processors and a common memory, by an interconnect network (a switch). Supercomputer architectures differ in the design of their nodes, their switches, and the node-switch interfaces.
Higher node performance is achieved by using either commercial scalar microprocessors with 64-bit data paths, intended primarily for commercial servers, or nodes designed specially for supercomputing, rather than the high-volume, 32-bit scalar microprocessors used in workstations and lower-capability cluster systems. The custom nodes tend to use special mechanisms such as vectors or multithreading to reduce memory latency, rather than relying solely on the more limited latency avoidance afforded by caches. High-bandwidth, scalable interconnects are typically custom-built and more expensive than the more widespread Ethernet interconnects. Custom switch use is often augmented by custom node/switch interfaces. (The TOP500 list is available at <www.top500.org>.)

The highest-ranked system in the TOP500 list is the Japanese-built Earth Simulator, released in the spring of 2002 and designed specifically to support geoscience applications. That system has custom multiprocessor vector-based nodes and a custom interconnect. The emergence of that system has fueled recent concerns about continued U.S. leadership in supercomputing.

The system software that is used on most contemporary supercomputers is some variant of Unix, either open source or proprietary. Programs are written in FORTRAN, C, and C++ and use a few standard application libraries. All of these supercomputers use implementations of the Message Passing Interface (MPI) standard to support message-passing-style internode communication. Relatively little standardization exists for other aspects of software environments and tools.

EVOLUTION IN SUPERCOMPUTING

A major policy issue for supercomputing is the proper balance between investments that exploit and evolve current supercomputing architectures and software (the evolutionary aspect) and investments in alternative approaches that may lead to a paradigm shift (the innovative aspect). Both aspects are important.

At this stage in the study, the committee sees the following advantages for an evolutionary approach to investment and acquisition. First, much useful work is getting done using the existing systems, and their natural successors can be expected to continue that work. In addition, there are no obvious near-term architectural alternatives: the promising technology breakthroughs, such as processor-in-memory, streaming architectures, and the like, that might revolutionize supercomputing are far off in the future and less than certain. Of course, higher capability would enable better solutions to be obtained faster.
But history suggests that even when revolutionary advances come along, they do not immediately supplant existing architectures. Different problems benefit from different architectures, and no one design is universally best. The committee sees a need for evolutionary investments in all major approaches to supercomputing that are currently pursued: clusters built entirely of commodity components; scalable systems that reuse commodity microprocessors together with custom technology in the interconnect or the interconnect interface; and systems in which the microprocessors, the interconnect, and their interface are all customized.

Although some advantages also accrue from evolution in software, the committee sees a need for investment that would accelerate that evolution. The advantages of commodity architectural components might be more easily realized if more applications were redesigned, better custom software were provided, and the cost of bringing software to maturity were better appreciated. The benefit of software investment is that it tends to have continuing value as architectures evolve. At the same time, both the maintenance and the evolution of legacy applications must be anticipated and supported.

Finally, the committee observes that uncertainties in policy and inconsistencies over time can be both disruptive and expensive. Unexpected pauses in an acquisition plan or failure to maintain a diversity of suppliers and products can cause both the suppliers and the skilled workforce to divert their attention from supercomputing. It can be difficult and expensive to recover the supply of expertise.

INNOVATION FOR SUPERCOMPUTING

Innovation in supercomputing stems from application-motivated research that leads to experimentation and prototyping, to advanced development and testbeds, and to deployment and products. All the stages along that path need continuous, sustained investment in order that the needs of the future will be met.
If basic research activities are not supported, revolutionary advances are less likely; if experimentation and advanced development are not done, promising approaches never ripen into products. In supercomputing, innovation is important in architecture, in software, in algorithms, and in application strategies and solution methods. The coupling of these aspects is equally important.

Major architecture challenges stem from the uneven performance scaling of different components. In particular, as the gap between processor speeds, memory bandwidth, and memory and network latency increases, new ideas are needed to increase bandwidth and hide (tolerate) latency. Additionally, as new mechanisms are introduced to address those issues, ways are needed to supply a stable software interface that facilitates exploiting hardware performance improvements while hiding the changes in mechanism.

The need for software innovation is motivated by its role as an intermediary between the application (the problem being addressed) and the architectural platform. Innovation is needed in the ways that system software manages the use of hardware resources, such as network communication. New approaches are needed for ways in which the applications programmer can express parallelism at a level high enough to reflect the application solution, without platform-specific details. Novel tools are needed to help application-level software designers reason about their solutions at a more abstract and problem-specific level. Software technology is also needed to lessen future dependence on legacy codes. Enough must be invested in the creation of advanced tool and environment support for new language approaches so that users can more readily adopt new software technology.

Importantly, advances in algorithms can sometimes improve performance much more than architectural and software advances do.
More realistic simulations and modeling require not only increased supercomputer performance but also new methods to handle finer spatial resolution, larger time scales, and very large amounts of observational or experimental data. Additional applications challenges for which innovation is needed are the coupling of multiple physical systems, such as the ocean and the atmosphere, and the synthesizing of a physical system's design by analytic estimation of its properties. Emerging applications in areas such as bioinformatics, biological modeling, and nanoscience and technology are providing both new opportunities and new challenges.

THE ROLE OF THE GOVERNMENT

There are several important arguments for government involvement in the advancement of supercomputers and their applications. The first is that unique supercomputing technologies are needed to perform essential government missions and to ensure that critical national security requirements are met. Furthermore, without the government's involvement, market forces are unlikely to drive sufficient innovation in supercomputing, because the innovators, like innovators in many other high-technology areas, do not capture the full value of their innovations. Historically, innovations in supercomputing have played an important role in the evolution of today's mainstream computers and have provided important benefits by virtue of their use in science and engineering. These benefits seem to significantly exceed the value captured by the initial inventors.

It appears that the ability of government to affect the supercomputing industry has diminished because supercomputing is a smaller fraction of the total computer market and computer technology is increasingly a commodity. This situation requires a careful assessment of the most effective ways for government to influence the future of supercomputing.