Read "The Future of Supercomputing: An Interim Report" at NAP.edu



Suggested Citation:"3. Continuity and Predictability." National Research Council. 2003. The Future of Supercomputing: An Interim Report. Washington, DC: The National Academies Press. doi: 10.17226/10784.



3 Continuity and Predictability
The history of supercomputing in the U.S. in the last few decades is not a history of stability. Technical directions have changed (for instance, single instruction multiple data (SIMD) architectures have come and gone); tens of supercomputer manufacturing companies have gone out of business; models for government support of supercomputing R&D have changed; and levels of government support have fluctuated. Although technical fluctuations are caused, in part, by unpredictable changes due to innovation and may therefore be inevitable, more stability in supercomputing-related government policy would be advantageous.
A major policy issue for supercomputing, as well as for other technological fields, is the proper balance between investments that exploit and evolve current architectures and software and investments in alternative approaches that may lead to a paradigm shift. The interim report does not make recommendations on the crucial issue of how best to find that balance in the present context. This chapter and the next outline some of the main arguments for each type of investment. This chapter outlines the arguments for a continued, steady investment in the evolution of current architectures and software. The next chapter outlines the arguments for a sustained and vigorous research program in supercomputing. The two aspects are complementary.
IMPORTANT WORK IS GETTING DONE
In support of both basic scientific research and large missions of national importance, today's mix of supercomputing architectures and infrastructure is producing tangible results. For example, significant finding issues (SFIs) in the nuclear weapon stockpile have been closed with the help of massively parallel simulations run on ASCI platforms.1
Current missions require two kinds of supercomputer: (1) machines of high capability, on which a single very demanding problem uses the entire machine (for problems in which higher resolution in space or time is critical and for the most important and time-urgent problems, where faster times to solution are critical) and (2) workhorse-capacity machines that are used for multiple simultaneously executing jobs or embarrassingly parallel computations such as parameter studies. Capability machines stretch the scalability of current supercomputing technology to its limits; they are designed for the most demanding computational problems. Typically, the solutions to some of the computational problems within a mission organization will not exploit the full capability of such systems, either because the problem does not demand that level of capability or because current methods fail to achieve it. For those problems, systems using less aggressive technology and lower levels of parallelism will typically reach capacity at better overall cost and performance. However, the lesser systems will not provide the capability needed to solve critical problems in a timely manner. Using ASCI codes and computers, the design of an arming, fusing, and firing device for the W76 warhead was optimized over a weekend. If run on serial machines, the same analysis would have taken significantly longer to complete.
1 As an example, an ASCI code has contributed to the yield reanalysis of a particular weapon that resulted in a revised certified yield.
NO NEAR-TERM ALTERNATIVES
One reason for fluctuations in investments in supercomputing is the perpetual hope for a silver bullet that will revolutionize supercomputing technology. While there are a number of promising opportunities for technology advances (e.g., processor in memory or streaming architectures), they are far in the future and will require concentrated investments to bring to fruition. They are also less than certain. Near-term research breakthroughs in high-performance computer architecture are unlikely, perhaps because of the lack of research investment in recent years. Evolution from current architectures is the only viable approach to meet needs in the immediate future (i.e., 3 to 5 years).
OLDER ARCHITECTURES COEXIST WITH NEW ONES
Changes do occur over time. For example, the "attack of the killer micros"2 was largely successful, and supercomputers built of conventional microprocessors have largely displaced vector supercomputers. However, the current success of the Japanese Earth Simulator and the reentry of Cray in the vector market with the new X1 system also show the limitations of a short-term view: Supercomputers built of "killer micros" have not fully replaced vector architectures, even after many years, and the alternative technology still has its niche.
Similarly, current supercomputing architectures are likely to be around for many years to come. Furthermore, current architectures will survive and coexist with new innovations well after the latter are introduced. There has never been a single architecture that is best for all applications. Historically, the diverse needs of the U.S. scientific and defense missions have often led to the coexistence of major supercomputer architectures. The evolution of supercomputer architecture probably will include supercomputers built from commercial microprocessor servers (such as the ASCI Q machine), hybrid systems that use commercial microprocessors with custom interfaces and switches (such as the T3E and the planned Red Storm system being built for Sandia National Laboratories), and purpose-built supercomputers (such as the Cray X1).
Building supercomputers from commodity components (as in the ASCI machines) will continue to be an attractive approach. The high-volume commodity market produces components, from processor chips to complete machines, with price-performance ratios difficult to achieve in any low-volume product. In the future, as in the past, there will continue to be opportunities to build larger machines with these components, whether as clusters of multiprocessors with standard I/O interfaces or as more integrated and higher-bandwidth machines like Red Storm (using commodity AMD processors). Scalable machines built from commodity parts have other attractive features in the context of a national supercomputing program. Because they are built on a scalable technology, the supercomputers are merely the extreme end of a continuum of products. Thus, some aspects of software and application development can be done on smaller machines, affordable to an academic research group or department, for example, and then adapted to larger machines at national laboratories and centers.
2 Eugene Brooks. 1989. "Attack of the Killer Micros." White paper presented at Supercomputing 1989, Reno, Nev. A "killer micro" is a microprocessor-based machine that infringes on mini, mainframe, or supercomputer performance (see http://jargon.watson-net.com/jargon.asp?w=killer%20micro). The allusion is to the science fiction spoof Attack of the Killer Tomatoes (1978).
Supercomputers built from custom processors also embody an evolutionary path with an implied human and technology investment over time. For example, the Cray X1 shares an architectural heritage with the Cray T3E, and its vector processors are a natural evolution from previous vector processor designs. Similarly, the NEC Earth Simulator should be seen not as a radically new design but as the natural evolution of earlier SX processors. More customized supercomputers enlarge the set of applications that can be supported in a supercomputing universe. There is not a problem-independent, time-independent, or cost-independent dominance of custom-component supercomputers over commodity-component supercomputers, or vice versa. Each type contributes unique attributes to national-scale programs; the problem mix appropriate to each type can change as technology changes, but it is unlikely in the near future that either type will come to replace the other.
THE IMPORTANCE AND CONTINUING VALUE OF SOFTWARE RESEARCH AND ALGORITHM DEVELOPMENT
This committee notes, as many previous committees have noted, that software research and development continue to receive inadequate attention in national supercomputing programs. Hardware evolution needs to be supported by a robust investment in software development. Not only is such support necessary to ensure our ability to migrate key applications and algorithms to new architectures (indeed, if timely, it may inspire improvements in those architectures), but it is also necessary in support of the unique scaling requirements of supercomputing now and in the future.
Government support for the development of the unique portable parallel debugger TotalView, which is marketed by Etnus and essential in supercomputing but of problematic economic viability in the broader computing marketplace, is a success story that needs to be repeated many times. Neither the current limited investments of platform vendors nor open source code developed by the national laboratories, industry, and academia are likely, by themselves, to fulfill the need for standardized, high-quality programming environments that enhance programmer productivity in high-performance computing.
An interesting aspect of the Earth Simulator system is the successful use of High Performance Fortran (HPF) on realistic applications to achieve significant performance levels (e.g., 14.9 Tflop/s on a three-dimensional fluid simulation for fusion science). HPF is an extension to Fortran that was developed in the early 1990s and was viewed at the time as a promising approach for achieving high performance on scalable computers while programming at a higher level of abstraction. However, HPF did not deliver (fast enough) on its early promises and was largely abandoned in the United States. The successful use of HPF on the Earth Simulator suggests that there may have been a lack of perseverance in pursuing this and other software technologies. This is not to argue for the merits of HPF per se. Rather, it is to point out that lack of appreciation for the promise of supercomputing software technology may rob promising approaches of the time they need to mature.
A similar observation applies to application code development. Changing computing platforms require continual rethinking and redesigning of codes. Additionally, the continually increasing computing power produces new scientific targets of opportunity for which code enhancements must be made. Support for this activity has been consistently undervalued.
Investments in improvements to parallelism and in testing and validating algorithms and methods will continue to have value, even if supercomputing architectures change. For example, many software strategies for handling parallelism transcend the details of a particular parallel architecture. Additionally, over time, successors to some high-end machines tend to resemble widely available machines. Put most simply, tomorrow's Beowulf clusters may well look like today's ASCI machines. Hence, software research and development done on today's supercomputers will benefit tomorrow's smaller research machines. Opportunities for leveraging people, talent, and experience across institutional boundaries can be exploited.
LEGACY CODES CANNOT BE ABANDONED UNTIL THEY ARE REPLACED
An important aspect of software evolution is the handling of existing scientific application codes (often referred to as legacy codes). Legacy codes represent a major investment; they evolve over many years, even decades, of use. Changes in programming languages, tools, and libraries, in programming models, or in hardware platforms entail a significant cost as codes are ported or rewritten. Several software capabilities are needed to support both the use and the evolution of legacy applications. Supercomputer system software must provide continuing support for the basic operations used in the applications by keeping the legacy software running until it can be replaced, by providing tools for performance tuning and debugging on new platforms, and by providing effective methods for porting and evolution.
Ideally, software should present a stable programming model. This allows programmers to be more productive in creating new software, because they can better leverage their experience. Moreover, developing new algorithms to exploit different costs of computation takes time. To the extent that a stable programming model reduces the need to reprogram fundamental computations, it can smooth this transition. It also mitigates the training costs. Many changes in architecture have been accompanied by disruptions in programming models and software tools, impeding progress. It is imperative that new systems be developed in conjunction with development software (preferably using familiar, portable programming models) if they are to increase user productivity.
Of particular importance is ensuring that changes in low-level hardware do not create unnecessary changes in higher-level software; for example, programmers working in object-oriented languages should not have to rewrite their code for new processor instruction sets. This will happen only by structuring funding to consider both hardware and software at the research stage and not allowing one to go forward without the other.
UNCERTAINTY AND INCONSISTENT POLICIES CAN BE EXPENSIVE
Failure to maintain steady, substantial investment in supercomputing could raise the cost of developing new generations of supercomputers in a timely manner. Rational firms will substantially reduce their level of effort when there is uncertainty about the demand for certain products. Because the cost of building up and tearing down the small, highly skilled teams that develop supercomputers can be significant, a temporary reduction in supercomputer acquisitions might in the long run raise the overall cost of supercomputing procurements. It may be that the "keep the shipyard alive" argument used to ensure that Navy ships are built at a continuous, predictable rate as a matter of national interest applies to supercomputer acquisitions from multiple vendors.
While maintaining a predictable, continuous stream of investments in supercomputing is important, it appears that diversity of investment also is crucial (see Chapter 4). A rational investment program requires a diversified portfolio of investments in a variety of platforms, both hardware and software, and in basic research. In summary, it is important that the long-term benefits of maintaining multiple suppliers, and helping them to do the long-range planning that sustains their supercomputing expertise, be carefully considered by policy makers.


