Supercomputing is very important to the United States for conducting basic scientific research and for ensuring the physical and economic well-being of the country. The United States has a proud history of leadership in supercomputing, which has contributed not only to its international standing in science and engineering and to national health and security but also to the commercial strength of many industries, including the computing industry. Supercomputing has become a major contributor to the economic competitiveness of our automotive, aerospace, medical, and pharmaceutical industries. The discovery of new substances and new techniques, as well as cost reduction through simulation rather than physical prototyping, will underpin progress in a number of economically important areas. The use of supercomputing in all of these areas is growing, and it is increasingly essential to continued progress.
However, in recent years our progress in supercomputing has slowed, as attention turned to other areas of science and engineering. The advances in mainstream computing brought about by improved processor performance have enabled some former supercomputing needs to be addressed by clusters of commodity processors. Yet important applications, some vital to our nation’s security, require technology that is only available in the most advanced custom-built systems. We have been remiss in attending to the conduct of the long-term research and development we will one day need and to the sustenance of the industrial capabilities that will also be needed. The Japanese Earth Simulator has served as a wake-up call, reminding us that complacency can cause us to lose not only our competitive advantage but also, and more importantly, the national competence that we need to achieve our own goals.
To maintain our level of achievement in supercomputing and its applications, as well as to keep us from falling behind relative to other nations and to our own needs, a renewed national effort is needed. That effort must have the following components:
Government leadership in maintaining a national planning activity that is sustained, ongoing, and coordinated and that drives investment decisions.
Continuing progress in creating hardware, software, and algorithmic technologies that enable the application of supercomputing to important domain-specific problems. Such progress will require continuing government investment.
International collaborations in all aspects of supercomputing except those that would demonstrably compromise national security.
Supercomputing has always been a specialized form of computing at the cutting edge of technology. As the computing field has grown and matured, computing has become broader and more diverse. From an economic perspective, there are large new markets that are distinct from supercomputing—for example, personal computing devices of various kinds, computers invisibly embedded in many kinds of artifacts, and applications that use large amounts of computing in relatively undemanding ways. As a consequence, potential providers of supercomputing systems and software and potential creators of future supercomputing technology are fewer in number than they once were. In the face of continuing need and the competing demands that weaken supply, the committee recommends that the following actions and policies be initiated.
Overall Recommendation: To meet the current and future needs of the United States, the government agencies that depend on supercomputing, together with the U.S. Congress, need to take primary responsibility for accelerating advances in supercomputing and ensuring that there are multiple strong domestic suppliers of both hardware and software.
The government is the primary user of supercomputing. Government-funded research that relies on supercomputing is pushing the frontiers of knowledge and bringing important societal benefits. Because supercomputing is essential to maintain U.S. military superiority, to achieve the goals of stockpile stewardship, and to maintain national security, the government must ensure that the U.S. supercomputing infrastructure advances sufficiently to support our needs in the coming years.
These needs are distinct from those of the broad information technology industry. They involve platforms and technologies that are unlikely on their own to have a broad enough market in the short term to satisfy government needs.
To guide the government agencies and Congress in assuming that responsibility, the committee makes eight recommendations.
Recommendation 1. To get the maximum leverage from the national effort, the government agencies that are the major users of supercomputing should be jointly responsible for the strength and continued evolution of the supercomputing infrastructure in the United States, from basic research to suppliers and deployed platforms. The Congress should provide adequate and sustained funding.
A small number of government agencies are the primary users of supercomputing, either directly, through acquisitions, or indirectly, by awarding contracts and grants to other organizations that purchase supercomputers. At present, those agencies include the Department of Energy (DOE), including its National Nuclear Security Administration (NNSA) and its Office of Science; the Department of Defense (DoD), including its National Security Agency (NSA); the National Aeronautics and Space Administration (NASA); the National Oceanic and Atmospheric Administration (NOAA); and the National Science Foundation (NSF). (The increasing use of supercomputing in biomedical applications suggests that the National Institutes of Health (NIH) should be added to the list.) Although the agencies have different missions and different needs, they could benefit from the synergies of coordinated planning and acquisition strategies and coordinated support for R&D. For instance, many of the technologies, in particular the software technology, need to be broadly available across all platforms. Therefore, those agencies must be jointly responsible and jointly accountable. Moreover, for the agencies to meet their own mission responsibilities and also take full advantage of the investments made by other agencies, collaboration and coordination must become much more long range. The agencies that are the biggest users of supercomputing must develop and execute an integrated plan.
The committee emphasizes the need for developing an integrated plan rather than coordinating distinct supercomputing plans through a diffuse interagency structure. An integrated plan is not an integrated budget. Such a plan would not preclude agencies from individual activities, nor would it prevent them from setting their own priorities. Also, it must not be used to the exclusion of unanticipated needs and opportunities. Rather, the intent is to identify common needs at an early stage, and to leverage shared efforts for meeting those needs, while minimizing duplicative efforts. Different agencies should pick the activities that best match their missions; for example, long-term basic research best matches NSF’s mission, while industrial supercomputing R&D is more akin to the mission of the Defense Advanced Research Projects Agency (DARPA).
Recommendation 2. The government agencies that are the primary users of supercomputing should ensure domestic leadership in those technologies that are essential to meet national needs.
Current U.S. investments in supercomputing and current plans are not sufficient to provide the supercomputing capabilities that our country will need. It needs supercomputers that satisfy critical requirements in areas such as cryptography and stockpile stewardship, as well as systems that will enable breakthroughs for the broad scientific and technological progress underlying a strong and robust U.S. economy. The committee is less concerned that the top-ranked computer in the TOP500 list (as of June 2004) was located in Japan. U.S. security is not necessarily endangered if a computer in a foreign country is capable of doing some computations faster than U.S.-based computers. The committee believes that had the United States at that time made an investment similar to the Japanese investment in the Earth Simulator, it could have created a powerful and equally capable system. The committee’s concern is that the United States has not been making the investments that would guarantee its ability to create such a system in the future.
Leadership is measured by the ability to acquire and exploit effectively machines that can best reduce the time to solution of important computational problems. From this perspective, it is not the Earth Simulator system per se that is worrisome but rather the fact that the construction of this system might turn out to have been a singular event. It appears that custom high-bandwidth processors such as those used by the Earth Simulator are not viable products without significant government support. Two of the three Japanese companies that were manufacturing such processors do not do so anymore, and the third (NEC) may also bow to market realities in the not-too-distant future, since the Japanese government seems less willing now to subsidize the development of cutting-edge supercomputing technologies. Only by maintaining national leadership in these technologies can the U.S. government ensure that key supercomputing technologies, such as custom high-bandwidth processors, will be available to satisfy its needs. The U.S. industrial base must include suppliers on whom the government can rely to build custom systems to solve problems arising from the government’s unique requirements. Since only a few units of such systems are ever needed, there is no broad market for the systems and hence no commercial off-the-shelf suppliers. Domestic supercomputing vendors can become a source of both the components and the engineering talent needed for building these custom systems.
Recommendation 3. To satisfy its need for unique supercomputing technologies such as high-bandwidth systems, the government needs to ensure the viability of multiple domestic suppliers.
Supercomputers built out of commodity components satisfy a large fraction of supercomputing applications. These applications benefit from the fast evolution and low cost of commodity technology. But commodity components are designed for the needs of large markets in data processing or personal computing and are inadequate for many supercomputing applications. The use of commodity clusters results in lower sustained performance and higher programming costs for some demanding applications. This is especially true of some security-related computations where shorter time to solution is of critical importance, justifying the use of custom-built, high-bandwidth supercomputers even at a higher cost per solution.
It is important to have multiple suppliers for any key technology in order to maintain competition, to prevent technical stagnation, to provide diverse supercomputing ecosystems that will address diverse needs, and to reduce risk. However, it is unrealistic to expect that such narrow markets will attract a large number of vendors. As is true for many military technologies, there may be only a few suppliers.
To ensure their continuing existence, domestic suppliers must follow a viable business model. For a public company, that means having a predictable and steady revenue stream recognizable by the financial market. A company cannot continue to provide leadership products without R&D. At least two models have been used successfully in the past: (1) an implicit guarantee for the steady purchase of supercomputing systems, giving the companies a steady income stream with which to fund ongoing R&D and (2) explicit funding of a company’s R&D. Stability is a key issue. Suppliers of such systems or components are often small companies that can cease to be viable; additionally, uncertainty can mean the loss of skilled personnel to other sectors of the computing industry or the loss of investors. Historically, government priorities and technical directions have changed more frequently than would be justified by technology lifetimes, creating market instabilities. The chosen funding model must ensure stable funding.
Recommendation 4. The creation and long-term maintenance of the software that is key to supercomputing requires the support of those agencies that are responsible for supercomputing R&D. That software includes operating systems, libraries, compilers, software development and data analysis tools, application codes, and databases.
Supercomputer software is developed and maintained by the national laboratories, by universities, by vertically integrated hardware vendors, and by small independent companies. An increasing amount of the software used in supercomputing is developed in an open source model. Many of the supercomputing software vendors are small and can disappear from the marketplace. The open source model may suffer from having too few developers of supercomputing software with too many other demands on their time.
The successful evolution and maintenance of complex software systems are critically dependent on institutional memory—that is, on the continuous involvement of the few key developers that understand the software design. Stability and continuity are essential to preserve institutional memory. Whatever model of support is used, it should be implemented so that stable organizations with lifetimes of decades can maintain and evolve the software. At the same time, the government should not duplicate successful commercial software packages but should instead invest in new technology. When new commercial providers emerge, the government should purchase their products and redirect its own efforts toward technology that it cannot otherwise obtain.
Barriers to the replacement of application programming interfaces are very high, owing to the large sunk investments in application software. Any change that significantly enhances our nation’s ability to program very large systems will entail the radical, coordinated change of many technologies, creating a new ecosystem. To facilitate this change, the government needs long-term, coordinated investments in a large number of interlocking technologies.
Recommendation 5. The government agencies responsible for supercomputing should underwrite a community effort to develop and maintain a roadmap that identifies key obstacles and synergies in all of supercomputing.
The challenges in supercomputing are very significant, and the amount of ongoing research is limited. To make progress, it is important to identify and address the key roadblocks. Furthermore, technologies in different domains are interdependent: Progress on a new architecture may also require specific advances in packaging, interconnects, operating system structures, programming languages and compilers, and the like. Thus, investments need to be coordinated. To drive decisions, one needs a roadmap of all the technologies that affect supercomputing. The roadmap needs to have quantitative and measurable milestones. Its creation and maintenance should be an open process that involves a broad community. It is important that a supercomputing roadmap be driven both top-down by application needs and bottom-up by technology barriers and that mission needs as well as science needs be incorporated. It should focus on the evolution of each specific technology and on the interplay between technologies. It should be updated annually and undergo major revision at suitable intervals.
The roadmap should be used by agencies and by Congress to guide their long-term research and development investments. Those roadblocks that will not be addressed by industry without government intervention must be identified, and the needed research and development must be initiated. Metrics must be developed to support the quantitative aspects of the roadmap. It is important also to invest in some high-risk, high-return research ideas that are not indicated by the roadmap, to avoid being blindsided.
Recommendation 6. Government agencies responsible for supercomputing should increase their levels of stable, robust, sustained multiagency investment in basic research. More research is needed in all the key technologies required for the design and use of supercomputers (architecture, software, algorithms, and applications).
The peak performance of supercomputers has increased rapidly in recent decades, but their sustained performance and the productivity of supercomputing users have lagged. Over the last decade the advance in peak supercomputing performance was largely due to the advance in microprocessor performance driven by increased miniaturization, with some contribution from increased parallelism. Perhaps because a large fraction of supercomputing improvements resulted from these advances, few novel technologies were introduced in supercomputer systems, and supercomputing research investments decreased. However, many important applications have not benefited from these advances in mainstream computing, and it will be harder for supercomputing to benefit from increased miniaturization in the future. Fundamental breakthroughs will be needed that will require an increase in research funding.
The research investments should be informed by the supercomputing roadmap but not constrained by it. It is important to focus on technologies that have been identified as roadblocks and that are beyond the scope of industry investments in computing. It is equally important to support long-term speculative research in potentially disruptive technical advances. The research investment should also be informed by the “ecosystem” view of supercomputing—namely, that progress is often needed on a broad front of interrelated technologies rather than as individual breakthroughs.
Research on supercomputing hardware and software should include a mix of small, medium, and large projects. Many small individual projects are necessary for the development of new ideas. A smaller number of large projects that develop technology demonstrations are needed to bring these ideas to maturity and to study the interaction between various technologies in a realistic environment. Such demonstration projects (which are different from product prototyping activities) should not be expected to be stable platforms for exploitation by users, because the need to maintain a stable platform conflicts with the ability to use the platform for experiments. It is important that the development of such demonstration systems have the substantial involvement not only of academic researchers but also of students, to support the education of the new generation of researchers and to increase the supercomputing workforce. It is also important that the fruits of such projects not be proprietary. The committee estimated the necessary investments in such projects at about $140 million per year. This estimate does not include investments in the development and use of application-specific software.
In its early days, supercomputing research generated many ideas that eventually became broadly used in the computing industry. Such influences will continue in the future. Many of the technical roadblocks faced today by supercomputing are roadblocks that will affect all computing over time. There can be little doubt that solutions developed to overcome these roadblocks for supercomputers will eventually influence the broader computing industry, so that investment in basic research in supercomputing is likely to be of widespread benefit to all of information technology.
Recommendation 7. Supercomputing research is an international activity; barriers to international collaboration should be minimized.
Research has always benefited from the open exchange of ideas and the opportunity to build on the achievements of others. The national leadership advocated in these recommendations is enhanced, not compromised, by early-stage sharing of ideas and results. In light of the relatively small community of supercomputing researchers, international collaborations are particularly beneficial. The climate modeling community, for one, has long embraced that view.
Collaboration with international researchers must include giving them access to domestic supercomputing systems; they often spend time in the United States to work closely with resident scientists. Many of the best U.S. graduate students come from other countries, although they often remain as permanent residents or new citizens. Access restrictions based on citizenship hinder collaboration and are contrary to the openness that is essential to good research.
Restrictions on the import of supercomputers to the United States have not benefited the U.S. supercomputing industry and are unlikely to do so in the future. Some kinds of export controls, especially those on commodity systems, lack any clear rationale, given that such systems are built from widely available commercial components. It makes little sense to restrict sales of commodity systems built from components that are not export controlled. Because restrictions on the export of supercomputing technology may damage international collaboration, the benefit of using export controls to prevent potential adversaries or proliferators from accessing key supercomputing technology has to be carefully weighed against that damage.
Since supercomputer systems are multipurpose (nuclear simulations, climate modeling, and so on), their availability need not compromise the domestic leadership needed for national defense, so long as safeguards are in place to protect critical applications.
Recommendation 8. The U.S. government should ensure that researchers with the most demanding computational requirements have access to the most powerful supercomputing systems.
Access to the most powerful supercomputers is important for the advancement of science in many disciplines. A model in which top supercomputing capabilities are provided by different agencies with different missions is healthy. Each agency is the primary supporter of certain research or mission-driven communities; as such, each agency should have a long-term plan and budget for the acquisition of the supercomputing systems that are needed to support its users. The planning and funding process followed by each agency must ensure stability from the viewpoint of its users.
The users should be involved in the planning process and should be consulted in setting budget priorities for supercomputing. The mechanisms for allocating supercomputing resources must ensure that almost all of the computer time on capability systems is allocated to jobs for which that capability is essential. Budget priorities should be reflected in the high-end computing plan proposed in Recommendation 1. In Chapter 9, the committee estimates the cost of a healthy procurement process at about $800 million per year. Such a process would satisfy the capability supercomputing needs (but not the capacity needs) of the main agencies using supercomputing and would include the platforms primarily used for research. It would include both platforms used for mission-specific tasks and platforms used to support science.
The NSF supercomputing centers have traditionally provided open access to a broad range of academic users. However, some of these centers have increased the scope of their activities in order to support high-speed networking and grid computing and to expand their education mission. The increases in scope have not been accompanied by corresponding increases in funding, so less attention is paid to supercomputing, and support for computational scientists with capability needs has been diluted.
It is important to repair the current situation at NSF, in which the computational science users of supercomputing centers appear to have too little involvement in programmatic and budgetary planning. All the research communities in need of supercomputing capability have a shared responsibility to provide direction for the supercomputing infrastructure they use and to ensure that resources are available for sustaining the supercomputing ecosystems. Funding for the acquisition and operation of the research supercomputing infrastructure should be clearly separated from funding for computer and computational science and engineering research. It should compete on an equal basis with other infrastructure needs of the science and engineering disciplines. That is not now the case.