Suggested Citation:"1 Overview and Recommendations." National Academies of Sciences, Engineering, and Medicine. 2016. Future Directions for NSF Advanced Computing Infrastructure to Support U.S. Science and Engineering in 2017-2020. Washington, DC: The National Academies Press. doi: 10.17226/21886.

1 Overview and Recommendations

The National Science Foundation (NSF) requested that the National Academies of Sciences, Engineering, and Medicine carry out a study examining anticipated priorities and associated trade-offs for advanced computing in support of NSF-sponsored science and engineering research. In this study, advanced computing is defined as the advanced technical capabilities, including both computer systems and expert staff, that support research across the entire science and engineering spectrum and that are of a scale and cost so great that they are typically shared among multiple researchers, institutions, and applications.¹ As used here, the term encompasses support for data-driven research as well as modeling and simulation.² Data have always been an important element of advanced computing, but the emergence of “big data” has created new opportunities for research and stimulated new demand for data-intensive capabilities. The scope of the study encompasses advanced computing activities and programs throughout NSF, including, but not limited to, those of its Division of Advanced Cyberinfrastructure. The statement of task for the Committee on Future Directions for NSF Advanced Computing Infrastructure to Support U.S. Science in 2017-2020 is given in Box P.1. This final report from the study follows the committee’s interim report issued in 2014.³

___________________

¹ Also critical to NSF-supported advanced computing activities are wide-area and campus networks, which provide access and the infrastructure necessary to bring together data sources and computing resources where they cannot practically be colocated. Both types of networks have been supported by NSF programs. Understanding future networking needs would involve examination of a much wider range of activities across NSF—not just advanced computing, including many aspects of cyberinfrastructure, but also planned major experimental facilities—and is therefore not addressed in this report.

² Throughout this report, “computing” should be read broadly as encompassing data analytics and other data-intensive applications as well as modeling and simulation and other numerically intensive or symbolic computing applications.

The committee’s recommendations are aimed at achieving four broad goals: (1) position the United States for continued leadership in science and engineering, (2) ensure that resources meet community needs, (3) aid the scientific community in keeping up with the revolution in computing, and (4) sustain the infrastructure for advanced computing.

1.1 POSITION THE UNITED STATES FOR CONTINUED LEADERSHIP IN SCIENCE AND ENGINEERING

NSF’s investments in advanced computing are critical enablers of the nation’s science leadership. Advanced computing at NSF has been used to understand the formation of the first galaxies in the early universe and to analyze the impacts of cloud-aerosol-radiation on regional climate change. Advanced computing has been a key to award-winning science, including the 2011 Nobel Prize in physics and the 1998 and 2013 Nobel Prizes in chemistry (see Box 3.2). Its use has moved outside of traditional areas of science to understanding social phenomena captured in real-time video streams and the connection properties of social networks.

Large-scale simulation, the accumulation and analysis of massive amounts of data, and other forms of advanced computing are all revolutionizing many areas of science and engineering research. Modeling and simulation, the historical focus of high-performance computing systems and programs, is a well-established peer of theory and experimentation. Increased capability has historically enabled new science, and many fields increasingly rely on high-throughput computing.

Data-driven research has emerged as a complementary “fourth paradigm” for scientific discovery⁴ that needs data-intensive computing capabilities and resources configured for the transfer, search, analysis, and management of scientific data, often under real-time constraints. Even in modeling and simulation applications, data-intensive aspects are increasingly important as large data sets are produced by or incorporated into the simulations. Both data-driven and computationally driven scientific processes involve a range of algorithms and workflows that may be compute-intensive or bandwidth-intensive, making simple machine characterizations difficult, especially given that science and engineering discovery frequently integrates all of these. As a result, leadership in frontier science also requires that the United States maintain leadership in both simulation science and data-driven science.

___________________

³ National Research Council, Future Directions for NSF Advanced Computing Infrastructure to Support U.S. Science and Engineering in 2017-2020: An Interim Report, The National Academies Press, Washington, D.C., 2014.

⁴ J. Gray, T. Hey, S. Tansley, and K. Tolle, “Jim Gray on eScience: A Transformed Scientific Method,” in The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, Redmond, Wash., 2009.

NSF has been very successful in making advanced computing resources, especially in support of modeling and simulation, available to an expanding set of disciplines supported by NSF, and has an opportunity to assert similar leadership in data-driven science. NSF is a major provider of computing support for the nation’s science enterprise, not just for the research programs it directly supports. For example, about half of the computer resources allocated under the Extreme Science and Engineering Discovery Environment (XSEDE) program are to non-NSF-supported researchers, including 14 percent for work supported by the National Institutes of Health. Moreover, the science and engineering community and other federal agencies that support scientific research look to NSF to provide leadership and to play crucial roles in developing and applying advanced computing, including advancing the intellectual foundations of computation, creating practical tools, and developing the workforce.

Demand for advanced computing is now growing exponentially and is outpacing the growth in advanced computing resources. At the same time, the cost of provisioning facilities has risen because demand is rising faster than technology improvements can deliver at a fixed price. The rise in data-driven science and the increasing need for both numerically intensive and data-intensive capabilities (Recommendation 2) create further demand for resources.

Production support is needed for software (including pre-installed popular applications and libraries) as well as hardware, to include community software as well as frameworks, shared elements, and other supporting infrastructure. NSF’s Software Infrastructure for Sustained Innovation (SISI) program is a good foundation for such investments. However, SISI needs to be grown in partnership with NSF’s science directorates to a scale that matches need, and then be sustained essentially indefinitely; the United Kingdom’s Collaborative Computational Projects (CCPs) provide examples of the impact and successful operation of community-led activities that now span nearly four decades. Production support is further needed for data management. Curation, preservation, archiving, and support for sharing all need ongoing investment.

Recommendation 1. The National Science Foundation (NSF) should sustain and seek to grow its investments in advanced computing—to include hardware and services, software and algorithms, and expertise—to ensure that the nation’s researchers can continue to work at frontiers of science and engineering.

An important element of fulfilling its role of maintaining the nation’s science leadership and achieving the vision in NSF’s Cyberinfrastructure Framework for 21st Century Science is providing the research community with access to the needed advanced computing capabilities. This will include

Providing access to sufficient computing facilities and services to support NSF’s portfolio of science and engineering research, including both aggregate capacity and large-scale parallel computers and software systems;
Assuming leadership in providing access to general-use hardware and software that integrate support for data-driven science as well as large hardware and software systems focused on data-driven science; and
Assuming leadership for data-driven science, first by integrating support for data-driven science into most or all of the systems it provides support for on behalf of the research community and next by deploying advanced computing systems focused on data-driven science.

Recommendation 1.1. NSF should ensure that adequate advanced computing resources are focused on systems and services that support scientific research. In the future, these requirements will be captured in its roadmaps.

Recommendation 1.2. Within today’s limited budget envelope, this will mean, first and foremost, ensuring that a predominant share of advanced computing investments be focused on production capabilities and that this focus not be diluted by undertaking too many experimental or research activities as part of NSF’s advanced computing program.

Recommendation 1.3. NSF should explore partnerships, both strategic and financial, with federal agencies that also provide advanced computing capabilities, as well as federal agencies that rely on NSF facilities to provide computing support for their grantees.

Today’s landscape for advanced computing is far richer than in the past, both in its expanding range of needs and in the technical opportunities for meeting those needs. Key elements of this landscape include the following:

Scientists supported by NSF advanced computing increasingly include a “long tail” of users with more modest requirements for advanced computing than those with research applications that require parallel computers with a large number of tightly coupled processors. The latter applications cannot be run (or run with acceptable efficiency) on smaller systems or on current commercial cloud systems.
Increased capability has historically enabled new science (see examples in Box 3.1). Without at least some growth in capability, researchers pursuing science that requires capability computing will have difficulty making advances.
Many fields increasingly rely on high-throughput computing that requires a greater aggregate amount of computing than a typical university can be expected to provide. Such applications can be run efficiently on both large and medium-size machines. Although a large-scale system can run many smaller jobs with good efficiency, systems capable of running only smaller jobs cannot run large-scale jobs with acceptable efficiency. It is not necessary or more efficient to restrict large, tightly coupled systems to run only large, highly scalable applications. Modestly sized jobs may still require tight connections, even though at smaller scale, and the utilization of large systems is improved with a mixture of job sizes.
The rise in the volume and diversity of scientific data represents a significant disruption and opportunity for science and engineering and for advanced computing. Data-intensive advanced computing represents a significant opportunity for U.S. science and engineering leadership. Some data-intensive applications can be accommodated in more data-capable general-purpose platforms; other applications will require specifically configured systems. Supporting data-driven science also places additional demands on wide-area networking to share scientific data and raises challenges around long-term storage, preservation, and curation. It also requires diverse and hard-to-find expertise.
Large systems are more accessible to a larger group of users; both cloud technologies and science gateways lower the barriers to access applications at scale.
Cloud computing has shown that access can be “democratized”: many users can access a large system for small amounts of total time in a fashion not supported by current approaches to allocating supercomputer time. Moreover, cloud computing users can leverage extensive libraries of software tools developed by both commercial providers and individual scientists. In many ways, this ability for a far larger community to access the power of large-scale systems, whether it is a conventional supercomputer or a commercial cloud configured to support some aspects of scientific discovery, represents a qualitative change in the computing landscape. However, NSF computing centers already exploit economies of scale and load sharing, and commercial cloud providers do not currently support very large, tightly coupled parallel applications, especially for high-end simulation workloads. However, this area is under rapid development, and the price (i.e., cost to NSF) and types of services are likely to change. The cost of commercial cloud services could be greatly reduced by reducing or eliminating the overhead charged on these services, bulk purchase by NSF of cloud resources, and/or partnering with commercial cloud providers.

The greater complexity of the landscape means that it will be especially important, as recommended in Section 1.2, to derive future requirements for advanced computing platforms from an analysis of science needs, workload characteristics, and priorities.

To maximize performance, NSF could deploy systems that were optimal for each class of problem. But as a practical matter and for cost-effectiveness, NSF must secure access to capabilities that will represent compromises with respect to individual applications but reasonably support the overall research portfolio. Put another way, it will require careful resource management driven by an understanding of the science and engineering returns on investments in advanced computing. Understanding which compromises to make requires a comprehensive understanding of science requirements and priorities; see the discussion of requirements and roadmapping below.

Recommendation 2. As it supports the full range of science requirements for advanced computing in the 2017-2020 time frame, the National Science Foundation (NSF) should pay particular attention to providing support for the revolution in data-driven science along with simulation. It should ensure that it can provide unique capabilities to support large-scale simulations and/or data analytics that would otherwise be unavailable to researchers and continue to monitor the cost-effectiveness of commercial cloud services.

Recommendation 2.1. NSF should integrate support for the revolution in data-driven science into NSF’s strategy for advanced computing by (a) requiring most future systems and services and all those that are intended to be general purpose to be more data-capable in both hardware and software, (b) expanding the portfolio of facilities and services optimized for data-intensive as well as numerically intensive computing, and (c) carefully evaluating inclusion of facilities and services optimized for data-intensive computing in its portfolio of advanced computing services.

To support data-driven science, advanced computing hardware and software systems will need adequate data capabilities, in most cases more than is currently provided. Some research will need large-scale data-centric systems with data-handling capabilities that are quite different from traditional high-performance computing systems. For example, data analytics often requires that data reside on disk for extended periods. Several factors suggest that meeting these needs will require one or more large investments, rather than just multiple small projects, including the following: (1) the scale of the largest problems, (2) the opportunities for new science when disparate data sets are colocated, and (3) the cost efficiencies that come from consolidating facilities. Indeed, the growth in data-driven science suggests that investments will ultimately be needed on a scale comparable to those that support modeling and simulation. At the very least, the systems should be better balanced for data (input/output and perhaps memory size), thereby allowing the same systems to be used for different problems without needing to double the size of the resources. As data play a growing role in scientific discovery, long-term data management will become an important aspect of all planning for advanced computing. A partnership with a commercial cloud provider could provide access to larger systems than NSF could afford to deploy on its own. Of course, even as it moves to provide better support for data-driven research, NSF cannot neglect simulation and modeling research.

Recommendation 2.2. NSF should (a) provide one or more systems for applications that require a single, large, tightly coupled parallel computer and (b) broaden the accessibility and utility of these large-scale platforms by allocating high-throughput as well as high-performance workflows to them.

Simply meeting current levels of demand will require continuing to provide at least the capacity currently provided by the XSEDE program and the capability currently provided by Blue Waters. Even as NSF develops its future requirements (Recommendation 3) that can be used to develop long-term plans, the observed growth in demand suggests that some growth be included in NSF’s short-term plans.

Recommendation 2.3. NSF should (a) eliminate barriers to cost-effective academic use of the commercial cloud and (b) carefully evaluate the full cost and other attributes (e.g., productivity and match to science workflows) of all services and infrastructure models to determine whether such services can supply resources that meet the science needs of segments of the community in the most effective ways.

For 2020 and beyond, many of these recommendations may well still hold true, but NSF should rely on the requirements process outlined in the next section.

1.2 ENSURE RESOURCES MEET COMMUNITY NEEDS

At a time when resources are tight and demand for advanced computing resources continues to grow, it is especially important for NSF to maximize the return on investment in terms of science and engineering outcomes by improving the efficiency of advanced computing facility use. One part of this is ensuring that the resources provided match the requirements of the science applications, and this aspect is discussed separately below; another is to ensure that the resources are effectively used. How NSF can help the community use the computing infrastructure effectively is discussed in Sections 1.3 and 1.4.

The resources available for advanced computing are inherently limited by research budgets as compared to the potentially ever-expanding demand for advanced computing. Despite various ongoing efforts to collect and understand requirements from some science communities and occasional efforts to chart strategic directions, the overall planning process for advanced computing resources and programs is not systematic or uniform and is not visibly reflected in NSF’s strategic planning, despite its foundation-wide importance. Further, much of what quantification there is makes use of measurements related to floating-point performance; this is misleading both because the performance of many applications is not well modeled using just floating-point performance and because the sustained as opposed to peak performance of some processors (especially most highly parallel processors) is low on many of those applications.
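The gap between peak and sustained performance described above can be illustrated with a roofline-style bound, in which attainable performance is the smaller of the floating-point peak and the memory bandwidth multiplied by the application's arithmetic intensity. This is a minimal sketch and not a method used in the report; the machine figures and the per-workload intensities are hypothetical values chosen only for illustration.

```python
# Roofline-style illustration of why peak floating-point rate alone is a
# poor predictor of delivered performance. All numbers are hypothetical.

PEAK_GFLOPS = 1000.0    # assumed advertised floating-point peak
BANDWIDTH_GBS = 100.0   # assumed sustained memory bandwidth (GB/s)


def attainable_gflops(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Upper bound on performance: limited either by the floating-point
    peak or by memory bandwidth times arithmetic intensity."""
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


def sustained_fraction(flops_per_byte):
    """Fraction of the assumed peak that the bound allows."""
    bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, flops_per_byte)
    return bound / PEAK_GFLOPS


# Assumed arithmetic intensities (floating-point operations per byte moved).
workloads = {
    "dense linear algebra": 20.0,
    "stencil update": 0.5,
    "graph traversal": 0.05,
}

if __name__ == "__main__":
    for name, intensity in workloads.items():
        print(f"{name}: {sustained_fraction(intensity):.1%} of peak")
```

Under these assumed numbers, only the compute-bound workload approaches the peak, while the bandwidth-bound workloads reach a small fraction of it, which is the kind of mismatch that a purely floating-point metric hides.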

The creation of an ongoing and more regular and structured process would make it possible to collect requirements, roll them up, and prioritize advanced computing investments based on science and engineering priorities. It would reflect the visions of science communities and support evaluation of potential scientific advances, probability of success, advanced computing requirements and their costs, and their affordability. Such a process needs to be nimble enough to respond to new science opportunities and computing technologies but have a long-enough time horizon to provide continuity and predictability to both users and resource providers. The process also needs to involve the growing body of researchers from a growing number of disciplines who use NSF infrastructure.

Requirements established for future systems and services must also address trade-offs—for example, within a given budget envelope for hardware, more memory implies less compute or input/output capacity. The criteria established for future procurements should reflect scientific requirements rather than simplistic or unrepresentative benchmarks.
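The budget trade-off mentioned above can be made concrete with a small calculation: once memory and storage are paid for out of a fixed hardware budget, whatever remains determines how many compute nodes can be bought. The following is a minimal sketch; the budget, the unit prices, and the function name `affordable_nodes` are all hypothetical and do not come from the report.

```python
# Fixed-budget trade-off between memory, storage, and compute.
# Every price and the budget itself are hypothetical placeholders.

BUDGET = 10_000_000            # total hardware budget, dollars (assumed)
COST_PER_NODE = 5_000          # dollars per compute node (assumed)
COST_PER_MEMORY_TB = 4_000     # dollars per terabyte of memory (assumed)
COST_PER_STORAGE_PB = 100_000  # dollars per petabyte of storage (assumed)


def affordable_nodes(memory_tb, storage_pb, budget=BUDGET):
    """Number of compute nodes that fit in the budget after the requested
    memory and storage have been paid for."""
    remaining = (budget
                 - memory_tb * COST_PER_MEMORY_TB
                 - storage_pb * COST_PER_STORAGE_PB)
    if remaining < 0:
        raise ValueError("memory and storage alone exceed the budget")
    return remaining // COST_PER_NODE


if __name__ == "__main__":
    # Holding storage fixed, each increase in memory reduces compute.
    for memory_tb in (100, 500, 1000):
        nodes = affordable_nodes(memory_tb, storage_pb=10)
        print(f"{memory_tb} TB memory -> {nodes} compute nodes")
```

A real requirements analysis would weigh these quantities against measured workload characteristics rather than a single linear cost model; the sketch only shows why the choices cannot be made independently.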

One way to capture requirements and enable the science community to participate in the process is to establish roadmaps. Roadmaps do not suggest a single path to a destination but rather multiple routes to a variety of goals. Such roadmaps would help make science requirements concrete and relate them to future computing capabilities and facilitate planning by researchers, program directors, and facility and service operators at centers and on campuses over a longer time horizon. By capturing anticipated technology trends, the roadmaps can also provide guidance to those responsible for scientific software projects. The roadmaps can also address dependencies between investments by federal agencies through consultation with agencies that use NSF advanced computing facilities or provide computing to the NSF-supported research community.

The goal is to develop fairly brief documents that set forth the overall strategy and approach rather than high-resolution details, looking roughly 5 years ahead with a vision that extends perhaps for 10 years ahead. Roadmaps would help inform users about future facilities, guide investment, align future procurements with requirements and services, and enable more effective partnerships within NSF and with other federal agencies. If researchers are given information about the capabilities they can expect, they can make better plans for their future research and the software to support it. By describing what types of resources NSF will and will not provide, roadmaps would permit other agencies, research institutions, and individual principal investigators to make complementary plans for investments. They would also encourage reflection within individual science communities about their future needs and the challenges and opportunities that arise from future computing technologies. By establishing predictability over longer timescales, roadmaps would help those proposing or managing major facilities to rely on shared advanced computing resources, helping reduce the overall costs of advanced computing. The provision in 2015 of such a roadmap for the Department of Energy (DOE) by its Office of Advanced Scientific Computing Research has already enabled the community and science programs to direct their investments and software development efforts toward systems that, in some detail, they know will appear in 2018-2019 and, in less detail, toward a path that extends into the exascale era of around 2023 and beyond. The NSF academic community presently lacks this ability to plan.

The roadmapping process would also be an opportunity to address data curation and storage requirements and link them to individual programs developing data capabilities such as the Big Data Regional Innovation Hubs. In essence, it could provide ingredients of an NSF-wide data plan that supports the needs of NSF’s grantees and the science communities NSF supports.

Requirements-setting and roadmapping efforts could be built or modeled on activities undertaken to define requirements for large scientific facilities such as the Academies’ astronomy and astrophysics decadal surveys or DOE’s Particle Physics Project Prioritization Panel. However, the requirements will need to be aggregated at a higher level given that advanced computing facilities generally serve many scientific disciplines. In addition, because of the wide use of computing and data at all scales of resources, it is critical that any such requirements gathering include input from the whole community, including those with more modest (midrange) computing and data needs. Sometimes called “the long tail of science,” these users have more modest requirements (but still beyond that available in a group, departmental, or campus system) and make up the majority of researchers.

Recommendation 3. To inform decisions about capabilities planned for 2020 and beyond, the National Science Foundation (NSF) should collect community requirements and construct and publish roadmaps to allow it to better set priorities and make more strategic decisions about advanced computing.

Recommendation 3.1. NSF should inform its strategy and decisions about investment trade-offs using a requirements analysis that draws on community input, information on requirements contained in research proposals, allocation requests, and foundation-wide information gathering.

Recommendation 3.2. NSF should construct and periodically update roadmaps for advanced computing that reflect these requirements and anticipated technology trends to help it set priorities and make more strategic decisions about science and engineering and to enable the researchers that use advanced computing to make plans and set priorities.

Recommendation 3.3. NSF should document and publish on a regular basis the amount and types of advanced computing capabilities that are needed to respond to science and engineering research opportunities.

Recommendation 3.4. NSF should employ this requirements analysis and resulting roadmaps to explore whether there are more opportunities to use shared advanced computing facilities to support individual science programs such as Major Research Equipment and Facilities Construction projects.

The roadmapping and requirements process could be strengthened by developing a better understanding of the relationship among the cost of different approaches (roadmap choices), requirements, and science benefits. For example, cost information would inform program managers about the total cost of proposed research and help focus researchers’ attention on effective use of these valuable shared resources, encouraging more efficient software and research techniques. NSF’s XSEDE program has adopted this practice, which could be expanded to cover all aspects of NSF-supported advanced computing including campus-level resources.

Recommendation 4. The National Science Foundation (NSF) should adopt approaches that allow investments in advanced computing hardware acquisition, computing services, data services, expertise, algorithms, and software to be considered in an integrated manner.

Recommendation 4.1. NSF should consider requiring that all proposals contain an estimate of the advanced computing resources required to carry out the proposed work and creating a standardized template for collection of the information as one step of potentially many toward more efficient individual and collective use of these finite, expensive, shared resources. (This information would also inform the requirements process.)

Recommendation 4.2. NSF should inform users and program managers of the cost of advanced computing allocation requests in dollars to illuminate the total cost and value of proposed research activities.
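As an illustration of what expressing an allocation request in dollars might look like, the sketch below converts requested compute time and staff time into a single dollar figure. It is only a sketch under stated assumptions: the function name, the per-core-hour rate, and the staff rate are hypothetical, and the report does not specify any costing formula.

```python
# Converting an allocation request into an approximate dollar cost.
# The rates below are hypothetical placeholders, not NSF or XSEDE figures.


def allocation_cost_dollars(core_hours, dollars_per_core_hour,
                            staff_hours=0.0, dollars_per_staff_hour=0.0):
    """Approximate cost of a request: compute time plus any requested
    expert staff time, each priced at an assumed flat rate."""
    compute_cost = core_hours * dollars_per_core_hour
    staff_cost = staff_hours * dollars_per_staff_hour
    return compute_cost + staff_cost


if __name__ == "__main__":
    cost = allocation_cost_dollars(
        core_hours=2_000_000,
        dollars_per_core_hour=0.03,   # assumed rate
        staff_hours=100,
        dollars_per_staff_hour=80.0,  # assumed rate
    )
    print(f"Estimated cost of request: ${cost:,.0f}")
```

A flat rate is the simplest possible choice; an actual estimate would likely differ by system, by resource type, and over time.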

1.3 AID THE SCIENTIFIC COMMUNITY IN KEEPING UP WITH THE REVOLUTION IN COMPUTING

Even with a good match to the science requirements, getting the most out of modern computing systems is difficult. Better software tools and more flexible service models (ways of delivering software and computing resources) can improve the productivity of researchers.

Improvements to software and new algorithms can often significantly reduce computational and data-processing demands. One class of improvements increases performance on current computer architectures; another takes better advantage of new architectures. There is considerable uncertainty about future architectural directions for computing in general and for advanced computing for science and engineering specifically. Architectures are already changing in response to power density issues, which have limited clock speed growth since 2004, even as transistor density continued to grow. As a result, the creation and evolution of software for scientific applications have become more difficult, especially for those problems that do not readily lend themselves to massive parallelism.

The service model, application programming interfaces, and software stacks offered by cloud computing complement the existing supercomputing batch models and software stacks. Both the economics and applicability across the full range of science applications will need careful examination.

Production support is needed for software as well as hardware, to include community software as well as frameworks, shared elements, and other supporting infrastructure. NSF’s SI² program is a good foundation for such investments. Production support is further needed for data management. Curation, preservation, archiving, and support for sharing all need ongoing investment.

Recommendation 5. The National Science Foundation (NSF) should support the development and maintenance of expertise, scientific software, and software tools that are needed to make efficient use of its advanced computing resources.

Recommendation 5.1. NSF should continue to develop, sustain, and leverage expertise in all programs that supply or use advanced computing to help researchers use today’s advanced computing more effectively and prepare for future machine architectures.

Recommendation 5.2. NSF should explore ways to provision expertise in more effective and scalable ways to enable researchers to make their software more efficient; for instance, by making more pervasive the XSEDE (Extreme Science and Engineering Discovery Environment) practice that permits researchers to request an allocation of staff time along with computer time.

Recommendation 5.3. NSF should continue to invest in supporting science codes and in continuing to update them to support new systems and incorporate new algorithms, recognizing that this work is not primarily a research activity but rather is support of software infrastructure.

If NSF were to invest solely in production, it would miss some key technology shifts and its facilities would quickly become obsolete. Some innovation takes the form of fine-tuning of production systems, yet nontrivial but small investments in exploratory or experimental facilities and services are also needed to create, anticipate, and prepare for technology disruptions. NSF needs to play a leadership role in both defining future advanced computing capabilities and enabling researchers to effectively use those systems. This is especially true in the current hardware environment, where architectures are diverging in order to continue growing computing performance. Leadership by NSF will help ensure that its software and systems remain relevant to its science portfolio, that researchers are prepared to use the systems, and that investments across the foundation are aligned with this future.

It will be especially important for NSF to be not only engaged in but helping to lead the national and international activities that define and advance future software ecosystems that support simulation and data-driven science, including converging the presently distinct tools and programming paradigms, and the software required for exascale hardware technologies. NSF may be especially well positioned to collaborate internationally compared to the mission science agencies given its long track record of open science collaboration. DOE is currently investing heavily in new exascale programming tools that, through the scale of investment and buy-in from system manufacturers, could plausibly define the future of advanced programming even though the design may not reflect the needs of all NSF science because the centers and researcher communities it supports are not formally engaged in the specification process. It is also important for NSF to be engaged with the private sector and academia for insights into data analytics.

Recommendation 6. The National Science Foundation (NSF) should also invest modestly to explore next-generation hardware and software technologies to explore new ideas for delivering capabilities that can be used effectively for scientific research, tested, and transitioned into production where successful. Not all communities will be ready to adopt radically new technologies quickly, and NSF should provision advanced computing resources accordingly.

Investments by other federal agencies in new computing technologies and NSF’s own computing research programs will both be sources of advanced hardware and software architectures to consider adopting in NSF’s advanced computing programs. Achieving continued growth in NSF’s aggregate computing performance on a fixed budget will likely require new architectural models that are more energy efficient. The requirements-gathering and roadmapping process can be used to obtain long-term predictions of available capabilities and their energy requirements. That process will also provide insights into which technology advances are suitable for future production services.

1.4 SUSTAIN THE INFRASTRUCTURE FOR ADVANCED COMPUTING

A hard but essential part of managing advanced computing in a fixed budget envelope will be discontinuing activities in order to start or grow other activities. The requirements analysis will provide a rational and open basis for these decisions, and the roadmaps will enable communities to plan and adapt in advance of future investments. Even in a favorable budget environment for science and engineering generally and for advanced computing specifically, NSF will need to manage exponentially growing demand and rising costs (see Section 1.1).

One response to these challenges is to take advantage of the opportunities described in Sections 1.2 and 1.3 to increase efficiency and productivity in the use of advanced computing facilities, to use the requirements process to inform trade-offs, and to exploit new technologies. In addition, there are several possibilities for finding additional resources or better leveraging existing ones. These include the following:

Making a case for additional resources based on the requirements analysis. For example, the 2003 report A Science-Based Case for Large-Scale Simulation⁵ is widely credited with developing the rationale and science case for a major expansion of DOE’s Advanced Scientific Computing Research program. It may also be useful to look retrospectively at what computing capabilities were needed to achieve past science breakthroughs.
Seeking funding mechanisms that ensure consistent and stable investments in advanced computing.
Adopting approaches that make it easier to accommodate the costs of large facilities within annual budgets, such as leasing to smooth costs across budget years.
Exploring partnerships, both strategic and financial, with federal agencies that also provide advanced computing capabilities as well as federal agencies that rely on NSF facilities to provide computing support for their grantees. For example, NSF might enter into a financial agreement with other federal (or possibly private) providers of advanced computing services for access to a fraction of a large system, thus maintaining some ability to support research that involves a large system without incurring its full acquisition cost.

___________________

⁵ Office of Science, U.S. Department of Energy, A Science-Based Case for Large-Scale Simulation, Washington, D.C., 2003.

Chapter 7 of this report provides more details on these options and discusses several others for NSF to consider.

In recent years, NSF has adopted a strategy for acquiring computing facilities and creating centers and programs to operate and support them that relies on irregularly scheduled competition among host institutions roughly every 2 to 5 years and on equipment, facility, and operating cost sharing with those institutions. Mounting costs and budget pressures suggest that a strategy that relies on state, institutional, or vendor cost sharing may no longer be viable. Moreover, there are reasons to consider models that provide a longer funding horizon for service providers that operate facilities and the expertise needed for their effective utilization.

In particular, one key reason is to ensure the development and retention of the advanced computing expertise that is needed to effectively manage systems, support their users, address the increasing complexity of hardware and software, and manage the needed transition of software to make effective use of today’s and tomorrow’s architectures. Doing so requires sustained attention to the workforce and more viable career pathways for its members. A longer funding horizon would also better match the depreciation period for buildings, power, and cooling infrastructure.

Another reason to consider other models is that repeated competition can lead to proposals designed to win a competition rather than maximize scientific returns. For example, it can unduly favor unproven technology over more proven, production-quality technology. By contrast, a model with longer time horizons may be better positioned to deliver systems that meet the scientific requirements established by the requirements definition and roadmapping activities. Supporting at least two entities will provide healthy competition as well as stability. The acquisition of individual systems from commercial vendors would remain competitive. Such longer-term entities can take the form of distributed organizations; XSEDE, for example, has evolved in this direction in providing the scientific research community with expertise and services.

A longer funding horizon would also better match the duration of major scientific facilities and the useful lifetime of scientific data, creating new opportunities to address long-term challenges of storage, preservation, and curation. Greater continuity would also foster greater leveraging of advanced computing expertise and facilities across NSF. For instance, long-lived experimental or observational facilities could better manage the risk of standing up their own cyberinfrastructure by partnering with centers.

Recommendation 7. The National Science Foundation (NSF) should manage advanced computing investments in a more predictable and sustainable way.

Recommendation 7.1. NSF should consider funding models for advanced computing facilities that emphasize continuity of support.

Recommendation 7.2. NSF should explore and possibly pilot the use of a special account (such as that used for Major Research Equipment and Facilities Construction) to support large-scale advanced computing facilities.

Recommendation 7.3. NSF should consider longer-term commitments to center-like entities that can provide advanced computing resources and the expertise to use them effectively in the scientific community.

Recommendation 7.4. NSF should establish regular processes for rigorous review of these center-like entities and not just their individual procurements.