

Suggested Citation: "4. Future Supercomputing and Research." National Research Council. 2003. The Future of Supercomputing: An Interim Report. Washington, DC: The National Academies Press. doi: 10.17226/10784.



4 Future Supercomputing and Research

As the uses of computing to address important societal problems continue to grow and the place of supercomputing within the overall computing industry continues to change, the value of innovation in supercomputer architecture, modeling, systems software, applications software, and algorithms will endure. Drawing on the recent supercomputing reports summarized in Chapter 2 and on its own insights, the committee outlines the main arguments for a vigorous research program in supercomputing. Some characteristics of successful innovation in areas such as high-performance computing are described and some key research problems are identified.

INNOVATION IN HIGH-END COMPUTING

A mature field is one characterized by small incremental improvements rather than large changes. By that measure, computing, and in particular high-end computing, is not a mature field. The underlying technology continues to evolve at a rapid pace, and there are ample opportunities for innovations in architecture, systems software, and applications software to dramatically improve performance and productivity. New architecture and software technologies are needed to maintain historical growth in performance. To ensure that new technologies are available to form the basis for supercomputers in the 5-15 year time frame (a typical interval between research innovation and commercial deployment in the computer industry), a significant and continuous investment in basic research is required. Historically, such an investment in basic research has returned large dividends in terms of new technology.

The need for basic research in supercomputing is particularly acute.
Although there has been basic research in general-purpose computing technologies with broad markets, and there has been significant expenditure in advanced development efforts such as the ASC program and the TeraGrid, there has been relatively little investment in basic research in supercomputing architecture and software over the past decade, resulting in few innovations to be incorporated into today's supercomputer systems. Because supercomputing is affected by dislocations due to nonuniform technology scaling (described subsequently) before mainstream computers are affected, the lack of that research will eventually weigh on the broader computer industry as well.

1 International Technology Roadmap for Semiconductors. 2002. Update. Available online at <http://public.itrs.net/Files/2002Update/Home.pdf>.

Successful innovation programs in areas such as high-performance computing have a number of characteristics:

· Continuous investment is needed at all stages of the technology pipeline, from initial investigation of new concepts to technology demonstrations and products. With no initial, speculative research, the pipeline dries out. With no technology demonstrations, new research ideas are much less likely to migrate to products. With no investment in products, supercomputers are not built.
· Continuous investment is needed in all contributing sectors, including universities, national laboratories, and vendors. All of these sectors, which often depend on government funding for their existence, have small communities of researchers and developers that are necessary for the continued evolution of supercomputing.
· A mix of small science projects and large efforts that create significant experimental prototypes is necessary. Large numbers of small individual projects are often the best way of studying new concepts. A smaller number of technology demonstration systems can draw on the successes of basic research in architecture, software, and applications concepts, demonstrate their interplay, and validate concepts ahead of their use in production systems. It is important that such pilot systems be built because, without real hardware platforms, systems software and applications programs will not be written, nor will the experience gained from such system building be acquired. For instance, pilot systems serve to identify research issues associated with the integration of hardware and software and to address system-level problems such as I/O performance in high-performance computing.
· Research is not a linear pipeline or a funnel, where losers are successively winnowed out until one winning product emerges. Successful research projects often incorporate the best ideas from related efforts. In addition, the experience gained from later stages often triggers reconsideration of earlier decisions. Good research should be organized to maximize the flow of ideas and people across projects and concepts.

ARCHITECTURE RESEARCH

The memory wall and the programming wall are two particularly challenging problems in supercomputing that could benefit from architecture research.

The memory wall, the growing mismatch between processor cycle time on the one hand and memory bandwidth and latency on the other, is a major factor limiting performance. On many applications, modern processors are limited by memory system performance to a small fraction of peak performance. This problem appears to be growing worse over time.

The memory wall is a special case of nonuniform scaling. As technology improves, different aspects of technology scale at different rates, leading to disparities in system parameters. When these disparities become large enough, or when a new technology is introduced, there is a discontinuity in system design that calls for innovative architecture and software.

To address the memory wall problem, innovative architectures are needed that increase memory bandwidth (or perhaps memory bandwidth per unit cost), both to local memory and across an interconnection network to remote memory. In addition, architectures must tolerate memory latency. Memory latency (local and global) is expected to increase relative to processor cycle time. Processors, however, can be designed to tolerate this latency without loss of performance by exploiting parallelism: while waiting for one result to return from memory, the processor works on a different, parallel part of the problem, perhaps using some combination of vectors and fine-grained multithreading.

Innovation is needed to better identify and exploit locality. The memory bandwidth required by an application can often be reduced by a combination of architecture and software.
The architecture provides local storage locations (registers and caches), and the software transforms the program to reduce the volume of intermediate data produced so that it fits in these local stores.

A programming wall also exists: it is becoming increasingly difficult to write complex codes for high-end computers. Moreover, considerable effort is required to port these codes to new systems with different performance parameters. Innovation in architecture must take that issue into account. Architecture research is needed to devise a stable and efficient interface to systems software and applications, thereby masking, at least to some extent, the variety of architectural strategies that enhance performance. For example, one could attempt to define an appropriate simple abstract virtual machine that reflected the architectural characteristics. All applications would then be programmed and compiled for that virtual machine. This approach could make it much easier for the software system to provide performance portability, that is, the ability to move a code from one machine to another without extensive performance tuning.2 In addition, widespread adoption of better abstractions than those we have now would enable higher-level programming languages and simpler algorithm and software development.

Sequential orientation (the one-operation-at-a-time nature of contemporary architecture and software) is another major impediment to high-performance computing. While most high-end computers are necessarily parallel, they are built using processors and programming languages that are fundamentally serial. By incorporating notions of parallelism in the virtual machine described above, some issues of sequential orientation also might be mitigated.3

SOFTWARE RESEARCH

The development of scalable scientific codes today is a laborious process. Mathematical algorithms are translated by a programmer into detailed programs and tuned to a specific architecture using programming notations that reflect the underlying architecture, a manual, error-prone process. The resulting code is hard to maintain, evolve, and port to new machines. The programmer must provide a wealth of detail that can obscure the high-level structure of the application solution (for example, the strategy to obtain parallelism). Also, the programmer may have an imperfect understanding of how low-level mechanisms are best used to achieve high performance. High-performance computing offers unique challenges because of the need for large-scale parallelism and for latency tolerance and because performance is important for large, expensive hardware platforms.
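Matrix multiplication offers a small illustration of this tuning burden and of the locality-improving program transformations mentioned under architecture research. The Python sketch below is illustrative only: the function names and the block size `bs` are our own, not taken from the report. It computes the same product twice, once directly from the definition and once with loop blocking (tiling), which splits the loops so that `bs`-by-`bs` tiles of the operands are reused while they would still reside in cache. In compiled code this reduces memory traffic. In pure Python, interpreter overhead hides the effect, so the point here is only how much architecture-specific detail the second version adds to an unchanged algorithm.

```python
def matmul_naive(A, B):
    """C = A * B written directly from C[i][j] = sum_k A[i][k] * B[k][j]."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C


def matmul_blocked(A, B, bs=32):
    """The same product with loop blocking (tiling).

    The i, k and j loops are split into tiles of size `bs` so that a tile of
    A, B and C is reused many times before the loops move on. `bs` is a
    machine-dependent tuning parameter (roughly: tiles should fit in cache).
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, m, bs):
            for jj in range(0, p, bs):
                # Work on one tile of C, one tile of A and one tile of B.
                for i in range(ii, min(ii + bs, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + bs, m)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + bs, p)):
                            Ci[j] += a * Bk[j]
    return C


if __name__ == "__main__":
    A = [[float(i + k) for k in range(64)] for i in range(64)]
    B = [[float(k - j) for j in range(64)] for k in range(64)]
    assert matmul_blocked(A, B, bs=16) == matmul_naive(A, B)
    print("blocked and direct products agree")
```

The blocked version accumulates each C[i][j] over k in the same order as the direct one, so the two return identical results. Choosing `bs` well is exactly the kind of machine-dependent decision that makes such code hard to port.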
Research is needed to find fresh approaches to expressing both data and control parallelism at the application level, so that the strategy for achieving latency tolerance, locality, and parallelism is devised and expressed by the application developer, while the low-level details that support particular platforms are separated out.

Both new compilation and operating system capabilities and new tools are needed to realize high performance on modern supercomputing architectures. Many of the languages, operating systems, and tools in current use have evolved by modifying languages and operating systems designed for sequential systems to infer opportunities for parallelism and to add explicit mechanisms to invoke parallel layouts and operations. Similarly, sequential tools have been modified for use in parallel environments. While that evolution is natural (and leverages existing knowledge and skills), it may no longer be sufficient. For example, the effort spent to maintain compatibility by changing the sequential base may limit the time available for enhancing support for parallelism and weaken the integrity of the parallel versions.

New software approaches for high-performance computing could exploit several special advantages. First, many application solutions are derived from a precise mathematical formulation of a physical problem, such as a finite difference discretization of a differential equation on a grid. Exploiting knowledge of the problem domain, in particular the necessary relationships between states of the computation and states of the mathematical system, can facilitate both the mapping of the computation onto a large-scale parallel machine and the ensuing code development and testing. Second, HPC codes are often developed by small teams of highly capable scientists, who are often willing and able to use expert-friendly tools and environments if those tools will enhance their productivity. Finally, the difficulties in using the current tools for hard problems are a strong incentive for the user community to explore more advanced software technology.

2 MPI is sometimes cited as such a target, but it is a low-level abstraction.
3 Threads provide an aspect of concurrency, but possibly not in the form most appropriate for some applications developers and some hardware architectures.

It is important to pursue the most promising software research with perseverance and a long-term view. It often takes significant investments and a long time to change widely used programming paradigms. A significant investment also may be needed in compilers, run times, tools, and libraries to induce a user community to shift paradigms, even if the new paradigm holds the promise of significant productivity enhancements. An alternative approach, which is more common, is to fund a diversity of small enhancements to current systems. That approach runs the risk that none of the enhancements will make enough of a difference and that few of them will be pushed far enough to be transferred to practice.

At the operating systems level, current HPC systems (especially cluster systems) have inherited a design that is not well tuned to the needs of large-scale parallel processing. For example, a Linux cluster is managed by a large number of autonomous kernels, each making independent decisions on memory or processor allocation, even though the entire cluster (or large parts of it) needs to run as one tightly coupled application. This discrepancy has been observed on many systems to have a negative effect on performance. A parallel application is not an entity that is recognized as a whole by the distributed operating system; there are no standardized parallel operating system services (e.g., parallel scheduling, parallel I/O, parallel memory management), although some implementations exist and are used. Communication and synchronization within a parallel application are either achieved inefficiently, using the same operating system mechanisms that are used for communication and synchronization across independent processes, or achieved by bypassing the system services. The mere need for bypass indicates that current interfaces are inadequate.
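Why independent per-node decisions hurt a tightly coupled application can be seen in a deliberately simple model. The model is our illustrative assumption, not an analysis from the report: the application proceeds in bulk-synchronous steps, each node's kernel independently interrupts its process with probability p during a step, and an interruption adds a fixed delay. Because every process must finish a step before any can start the next, the step runs at the pace of the slowest node, and the chance that a step is delayed, 1 - (1 - p)^N, approaches 1 as the node count N grows. The function names and parameter values below are hypothetical.

```python
import random


def prob_step_delayed(p: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent kernels
    interrupts its process during one bulk-synchronous step."""
    return 1.0 - (1.0 - p) ** nodes


def expected_step_time(p: float, nodes: int, work: float = 1.0, delay: float = 0.5) -> float:
    """Mean step time: the step ends when the slowest node finishes, so it
    costs work + delay whenever any node is interrupted."""
    return work + delay * prob_step_delayed(p, nodes)


def simulated_step_time(p, nodes, work=1.0, delay=0.5, trials=2000, seed=0):
    """Monte Carlo cross-check of expected_step_time (seeded, so repeatable)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Each node finishes at work (+ delay if interrupted); the step waits for the max.
        total += max(work + (delay if rng.random() < p else 0.0) for _ in range(nodes))
    return total / trials


if __name__ == "__main__":
    for n in (1, 16, 256, 4096):
        print(n, round(prob_step_delayed(0.001, n), 3), round(expected_step_time(0.001, n), 3))
```

In this model, with p = 0.001 a single node is delayed in 0.1 percent of steps, but a 1,000-node job is delayed in about 63 percent of them, even though no individual kernel behaves any worse.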
Research that addresses the performance and semantic inadequacies of operating systems could lead to significant benefits in performance and software productivity and could push high-performance computing into new realms, for example, the use of large-scale parallelism for interactive computing.

Software research in HPC is likely to be more successful if closely coupled with research on algorithms and on architecture. Innovation often comes from a redesign of the interfaces between the various layers and from a better match between functionality across layers.

A major challenge in building revolutionary architectures and software systems is dealing with the large volume of legacy code. On the one hand, innovative research should not be constrained by the compatibility needs of existing instruction sets, programming languages, operating systems, and application implementations. Advanced HPC research is likely to be more productive if it is free to explore paradigm-shifting approaches. On the other hand, a transition plan is needed to encourage adoption of new technologies. The transition plan should leverage the existing code base, because the cost of rewriting all of the legacy code from scratch is prohibitive.

RESEARCH ON APPLICATIONS AND ALGORITHMS

Supercomputing applications exist in a number of well-established and important fields, such as national security (cryptanalysis, intelligence, defense systems design, and nuclear stockpile stewardship), weather and climate forecasting, and automotive and aircraft design. New applications are emerging in the life sciences and biochemistry, among others. This interim report was written in advance of an applications workshop, to be held by the committee, that will help it formulate the supercomputing-related research needs in these areas. Following are some tentative observations.

There is an ever-increasing need for higher performance.
The limitations of present-day supercomputers prevent many applications from being run using realistic parameter ranges of spatial resolution and time integration. For such applications, a significant increase in simulation and prediction quality can be attained by applying more computer power with primarily the same algorithms; for example, mesh resolution can be increased. In other applications, new algorithms or new process models are required to substantially advance the application involved. Increased mesh resolution often requires the development of new physics or algorithms for sub-grid-scale processes. In some cases, submodels of detailed processes may be required within a coarser mesh (e.g., cloud-resolving submodels embedded within a larger climate model grid).
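A back-of-the-envelope count shows why raising mesh resolution with the same algorithm consumes computer power so quickly. The sketch assumes (these assumptions are ours, not the report's) a grid refined by a factor r in each of d spatial dimensions, work proportional to the number of unknowns times the number of time steps, and an explicit time-stepping scheme whose stable time step shrinks in proportion to the grid spacing (a CFL-type limit). Under those assumptions total work grows by r^(d+1). Implicit schemes and adaptive meshes behave differently, and the helper name is hypothetical.

```python
def refinement_cost_factor(r: int, dims: int) -> int:
    """Growth in total work when the grid spacing is divided by r in each of
    `dims` spatial dimensions.

    Assumes work ~ (number of unknowns) x (number of time steps), with an
    explicit scheme whose stable time step shrinks in proportion to the grid
    spacing (a CFL-type limit):
      unknowns grow by r**dims, and the number of time steps grows by r.
    """
    unknowns_growth = r ** dims
    time_step_growth = r
    return unknowns_growth * time_step_growth


if __name__ == "__main__":
    for r in (2, 4, 10):
        print(f"refine {r}x in 3-D: about {refinement_cost_factor(r, 3)}x more work")
```

Under these assumptions, halving the grid spacing of a 3-D simulation costs about 16 times as much computation, and a tenfold refinement about 10,000 times as much.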

As applications evolve, the workload characteristics change. Many codes evolve toward more complex, time-dependent and data-dependent control logic and more irregular data structures. This evolution taxes current architectures.

Huge amounts of model output and real data play an integral role in almost all supercomputing applications. The ways in which large datasets are prepared, stored, visualized, and analyzed highlight the need for new software, input/output, storage, and communication capabilities to go along with enhanced supercomputers and advanced methods.

Improvements in algorithms can sometimes improve performance much more than improvements in hardware and software do. For example, algorithms for solving the special linear system arising from the Poisson equation4 on a regular grid have improved over time from needing O(n^2) arithmetic operations to O(n log n) or even O(n). Such algorithmic improvements can contribute to increased supercomputer performance as much as decades of hardware evolution. While such breakthroughs are hard to predict, the rewards can be significant. Further research can lead to such breakthroughs in the many complicated domains to which supercomputers are applied.

New algorithmic demands are driven by the following needs:

· Disciplinary needs. The need for higher-resolution analyses leads to larger problems, which in turn require faster algorithms (e.g., O(n log n) instead of O(n^2)). As the resolution increases, completely different physical models may be required (e.g., particle models instead of continuum models), which in turn require different solution methods. In some problems (such as turbulence), physically unresolved processes at small length or time scales may have large effects on macroscopic phenomena, requiring approximations that differ from those for the resolved processes.
· Interdisciplinary needs. Many real-world phenomena involve two or more coupled physical processes for which individual models and algorithms may be known (clouds, winds, ocean currents, heat flow inside and between the atmosphere and the ocean, atmospheric chemistry, and so on) but where the coupled system must be solved. Vastly differing time and length scales of the different disciplinary models frequently make this coupled model much harder to solve.
· Synthesis and optimization replacing analysis. After one has a model that can be used to analyze (predict) the behavior of a physical system (such as an aircraft or weapons system), it is often desirable to use that model to try to synthesize or optimize a system so that it has certain desired properties. Such a problem can be much more challenging than analysis alone. As an example, a typical analysis computes, from the shape of an airplane wing, the lift resulting from air flow over the wing, by solving a differential equation. The related optimization problem is to choose the wing shape that maximizes lift, incorporating the constraints that ensure that the wing can be manufactured. Solving that problem requires determining the direction of change in wing shape that causes the lift to increase, either by repeating the analysis as changes to shape are tried or by analytically computing the appropriate change in shape.
· Huge data sets. Many fields (biology, for one) that previously had relatively few quantitative data to analyze now have very large quantities, often of varying type, meaning, and uncertainty. These data may be represented by a diversity of data structures, including tables of numbers, irregular graphs, adaptive meshes, relational databases, two- or three-dimensional images, text, or various combined representations. Extracting scientific meaning from these data requires coupling numerical, statistical, and logical modeling techniques in ways that are unique to each discipline.
· Changing machine models. A machine model is the set of operations and their costs presented to the programmer by the underlying hardware and software. As the machine model changes between technology generations, an algorithm will probably have to be changed to maintain performance and scalability. This could involve adjusting a few parameters in the algorithm describing data layouts, running a combinatorial optimization scheme to rebalance the load, or using a completely different algorithm that trades off computation and communication in different ways. Some success has been achieved in automating this process, but only for a few important algorithmic kernels. For example, the ATLAS5 and FFTW6 systems automatically choose implementations of matrix-matrix multiplication and the fast Fourier transform, respectively, to maximize performance on a particular architecture, depending on properties such as memory speed and number of registers.

4 The Poisson equation is a partial differential equation that describes many physical systems, including heat flow, fluid flow, diffusion, electrostatics, and gravity. Here n is the number of unknowns in the discretized linear system.
5 See <http://math-atlas.sourceforge.net>.
6 See <http://www.fftw.org>.

Emerging application areas also drive the need for new algorithms and applications. Bioinformatics, for example, is driving the need to couple equation-driven numerical computing with probabilistic and constraint-driven computing. Large volumes of data from the Human Genome Project, clinical trials, statistics, population genetics, and imaging and visualization research stress the I/O capabilities of contemporary systems.

Many simulation and optimization codes that are now evolved and maintained by independent software vendors (ISVs) originated in research labs and universities. The arguments for government funding of supercomputing that are outlined in the next section apply as well to supercomputing application software: markets are likely to underinvest in such software, the government is an early and main customer for many such packages, and ISV codes are heavily used in weapon design. Thus, there is a strong case for government investment, not only in algorithm and application research, but also in the development of robust and scalable application software.
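ATLAS and FFTW are far more sophisticated, but the core idea of the automated tuning they perform for matrix multiplication and the FFT, timing candidate implementations on the target machine and keeping the fastest, can be sketched in a few lines. This is a toy sketch under stated assumptions: the kernel is a blocked matrix transpose with a hypothetical block-size parameter `bs`, and neither `transpose_blocked` nor `autotune` reflects the actual search spaces or interfaces of ATLAS or FFTW. Python timings are dominated by interpreter overhead, so the chosen block size mainly illustrates the procedure rather than real cache effects.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple


def transpose_blocked(M: List[List[float]], bs: int) -> List[List[float]]:
    """Return the transpose of M, visiting it in bs-by-bs tiles.

    `bs` is the tunable parameter: the best value depends on the cache
    and memory characteristics of the machine."""
    rows, cols = len(M), len(M[0])
    T = [[0.0] * rows for _ in range(cols)]
    for ii in range(0, rows, bs):
        for jj in range(0, cols, bs):
            for i in range(ii, min(ii + bs, rows)):
                Mi = M[i]
                for j in range(jj, min(jj + bs, cols)):
                    T[j][i] = Mi[j]
    return T


def autotune(kernel: Callable[[int], object], params: Sequence[int],
             repeats: int = 3) -> Tuple[int, Dict[int, float]]:
    """Empirical search: time kernel(p) for each candidate p, keep the best
    of `repeats` wall-clock runs per candidate, and return the fastest p
    together with all measured times (in seconds)."""
    timings: Dict[int, float] = {}
    for p in params:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(p)
            best = min(best, time.perf_counter() - start)
        timings[p] = best
    fastest = min(timings, key=lambda q: timings[q])
    return fastest, timings


if __name__ == "__main__":
    n = 300
    M = [[float(i * n + j) for j in range(n)] for i in range(n)]
    bs, times = autotune(lambda b: transpose_blocked(M, b), params=[4, 16, 64, 256])
    print("fastest block size on this machine:", bs)
    for b, t in times.items():
        print(f"  bs={b:4d}: {t * 1e3:.1f} ms")
```

Because the search measures rather than predicts, the same code may pick different block sizes on different machines, which is the property that lets this style of tuning follow changing machine models.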
