The National Science Foundation’s (NSF’s) current model of cyberinfrastructure, including advanced computing, is based on a mix of centralized and distributed funding, anchored by the Division of Advanced Cyberinfrastructure (ACI) within the Directorate of Computer and Information Science and Engineering (CISE). Previously, ACI was the Office of Cyberinfrastructure (OCI), reporting to the director. This central structure currently supports the Blue Waters facility (a leading-edge facility) and a set of smaller computing and storage resources via the Extreme Science and Engineering Discovery Environment (XSEDE). In addition to these centrally funded resources, the Geosciences Directorate operates advanced computing facilities at the National Center for Atmospheric Research (NCAR), and it, along with other NSF directorates, funds cyberinfrastructure through a variety of programs.
Advanced computing shares many elements of other NSF infrastructure investments, but it also differs in some profound ways. First, unlike advanced telescopes or particle accelerators, where there is no competing commercial market, a vibrant computing industry develops new technologies and products and responds to market needs and opportunities that dwarf computing expenditures in academia and by federal research sponsors. Second, computing market shifts and the well-documented, rapid evolution of computing technology mean that researcher expectations and economically viable computing technologies change every few years. Consequently, advanced computing capital assets have a very short operational lifetime, in marked contrast to many other scientific instruments. These shifts, however, do not mean that long-term planning is unnecessary or impossible. Businesses and academia regularly develop strategic information technology (IT) plans that accommodate technology shifts.
Third, advanced computing is distinguished by its universality; it is applicable to all scientific and engineering domains, spanning data capture and analysis, simulation and modeling, and communication and collaboration. Fourth, and consequently, demand for advanced computing continues to grow rapidly, placing increasing stress on the financial models and social processes used to support research cyberinfrastructure. Although states, universities, and companies have long subsidized the capital and operating costs of NSF’s leading-edge advanced computing, those costs have now reached tens to hundreds of millions of dollars. Consequently, the willingness of these parties to engage in “pay to play” (i.e., accept losses in exchange for publicity or collateral institutional advantage) has declined accordingly.
The unique attributes of advanced computing create both opportunities and challenges for any NSF strategy, requiring both nimbleness in the face of changing technologies and economics and stability to ensure sustained capabilities and research continuity. The following basic principles will help ensure the sustainability of NSF’s advanced computing strategy:
- Realistic business assessment that exposes the true costs and subsidies of cyberinfrastructure deployment and operation at all scales;
- Identification and tracking of technology trends and economics, along with the research opportunities they create;
- Long-term planning and articulated strategy (a roadmap) that allows the broad research community and service providers to plan accordingly;
- Balanced support for computing hardware, storage systems, and networks, along with professional staff, software and tools, and operating budgets; and
- NSF-wide commitment to cyberinfrastructure investment, strategic directions, and operational processes.
Three crosscutting aspects of sustainability are particularly crucial: continuity, coverage, and skills.
6.1.1 Service Continuity and Adaptability
Service continuity encompasses long-term strategic planning and sustainability on a decadal or longer timescale. NSF’s Major Research Equipment and Facilities Construction (MREFC) projects for scientific infrastructure typically involve years of planning. Today, NSF’s cyberinfrastructure facilities are rarely used to support computational modeling and data analysis for MREFC projects. Because these cyberinfrastructure facilities have lifetimes of just a few years, it is impractical for MREFC project leaders to reduce overall costs of advanced computing by including NSF’s own cyberinfrastructure facilities in the MREFC operational plan. This must change if common cyberinfrastructure is to support MREFC projects and other long-term community research.
Historically, most research data has been produced by carefully planned experiments, and it has been both expensive to capture and highly guarded by the researchers who produced it. Ubiquitous, inexpensive sensors and a new generation of large-scale scientific instruments, including MREFC infrastructure, have changed the economics of data capture and are shifting scientific expectations about data retention and community sharing.
Although NSF’s recent requirement that all NSF-funded research projects have a data management and accessibility plan is an explicit policy recognition of data’s importance, there is no NSF-wide cyberinfrastructure strategy or program to support disciplinary or cross-disciplinary data sharing and preservation. Hence, much of the data preservation responsibility and financial burden rests on individual investigators and their home institutions. Today, when the cognizant investigators no longer perceive value in retaining the data, those data are often lost. This is increasingly problematic as the longer-term research value of data often accrues to those in other disciplines.
6.1.2 Service Coverage: Breadth and Depth
In its earliest form, cyberinfrastructure was synonymous with high-performance computing and computational science. Today it encompasses not only high-performance computing but also large-scale data archiving and analytics, software codes and tools, and human expertise and computing-mediated research and discovery. Orthogonally, cyberinfrastructure spans the capabilities and needs of individual investigator laboratories, campus sites, regional and national research facilities, and commercial cloud service providers.
Any comprehensive cyberinfrastructure strategy must include the entire spectrum of services and span the entire range of organizational
scales. It cannot be simply about leading-edge supercomputing platforms or just about big data analytics; it must integrate both at multiple scales. Nor can it focus on hardware infrastructure while neglecting both software development and maintenance and training and support of technical expertise. It must balance sustainability against adaptation, recognizing that community needs evolve and technology shifts drive new solutions.
The rise of “big data” as a cyberinfrastructure challenge that rivals the scale and complexity of advanced scientific computing is indicative of this need for community adaptation. To respond appropriately to this technology shift and opportunity, NSF must adapt its investments and infrastructure. Big data will require big infrastructure, just as leading-edge computational science does, and will likely involve a mix of both centralized facilities and decentralized repositories at universities. The Australian eResearch initiative, with its Australian National Data Service, provides a relevant example.
In this context, the NSF community would benefit from a coherent, big data retention and preservation strategy and capability, one that balances investigator and disciplinary differences against communal benefit and research collaborations. Unfunded mandates for retention and preservation will not be workable. A balanced model is likely to require greater total funding, a better balance of capital and operating budgets, more focus on business practices and return on research investment, and greater coordination across NSF directorates.
6.1.3 Skills and Workforce
Sustainable and effective cyberinfrastructure depends critically on the skills and expertise of domain scientists and of committed and well-trained advanced computing professionals. Even if they are not directly responsible for code development and workflow management, scientists using advanced computing need to be generally knowledgeable about these matters. For their part, technical staff members not only deploy and operate facilities, but also support community toolkits and codes, serve as keepers of institutional knowledge and expertise, and manage and ensure data security and provenance. Unlike hardware, with a lifetime of a few years, the human infrastructure (the accumulated experience of the people who operate these systems) has a lifetime of decades. Despite their importance, these staff often lack clear academic career paths and depend on an uncertain stream of funding for support.
Given the global competition for computing and computational science talent, any cyberinfrastructure plan must include mechanisms that recognize and reward professional staff and ensure they have career opportunities that retain their talent within the academic community. One important contribution to retaining and rewarding this skilled workforce is stability in funding for centers, recognizing that developing an expert staff is a long-term process that can be wasted by even a short-term gap in staff funding.
Programs are also needed to train future computational science and data analytics experts. The report of the NSF Task Force on Cyberlearning and Workforce Development,1 which addressed this issue in depth, considers more broadly the use of computer-based approaches in learning and recognizes the need to train both the workforce that supports advanced computing and the practicing scientists who make use of it. Effective use of advanced computing systems requires specialized and advanced training. NSF computing centers and other centers of advanced computing expertise (academic departments involved in advanced computing, national laboratories, and private industry) have leveraged their in-house expertise to offer such training. Examples include training programs for users offered by XSEDE and Blue Waters and the Argonne Training Program in Extreme Scale Computing. Such programs could benefit from a more formal approach and, in particular, long-term support for training materials and resources.
The pervasive NSF-wide and nationwide nature of advanced computing presents a perhaps unique opportunity, and responsibility, to pursue NSF’s diversity and inclusion goals.2 This includes ensuring the broadest possible benefit from and access to NSF’s cyberinfrastructure, as well as translating this participation into creating and sustaining a computationally skilled workforce that reflects our nation. XSEDE has made significant progress in increasing the number of underrepresented minority and women users and, more notably, principal investigators (PIs) with allocations. The successful XSEDE campus champions program is a human network, which, while pursuing its primary mission of “empowering campus researchers, educators, and students to advance scientific discovery,”3 also serves other missions including advancing diversity through increased awareness, training, and education. Increased access to statistics and metrics, concerning not just PIs and users but also those accessing online materials or participating in events or using other services, could better inform and guide actions by NSF, XSEDE, and the community, and XSEDE is already working toward increased public access to data.

1 National Science Foundation, Advisory Committee for Cyberinfrastructure, Task Force on Cyberlearning and Workforce Development Final Report, March 2011, https://www.nsf.gov/cise/aci/taskforces/FrontCyberLearning.pdf.
2 National Science Foundation, Diversity and Inclusion Strategic Plan 2012-2016, http://www.nsf.gov/od/odi/reports/StrategicPlan.pdf, accessed January 29, 2016.
Although NSF’s current mix of centralized and distributed cyberinfrastructure has had many notable successes, it is not without problems, both for infrastructure providers and for the research community. Some of these problems are rooted in history, some are embedded in the NSF culture, and some are consequences of NSF’s organizational structure.
6.2.1 Competitive Challenges
From its origins, NSF’s advanced computing programs—the original 1980s supercomputer centers program, the 1990s Partnership for Advanced Computational Infrastructure (PACI) program, the 2000s Distributed and Extensible Terascale Facilities, and now XSEDE—have all been based on a repeated cycle of competitions to host and operate large-scale cyberinfrastructure. This cycle continues to pit putative operators—universities and national laboratories—against one another in irregularly scheduled “winner take all” competitive battles. In each case, competitors build ad hoc hardware and software vendor alliances to mount proposals. To compete, they also leverage institutional funds to cover facility, hardware, and operations costs (which are capped in the competitions as a percentage of hardware costs). Much of this difficulty is rooted in the lack of distinction between research and infrastructure funding, which have widely differing timescales and success metrics.
Not only does repeated infrastructure competition on 2- to 5-year cycles create strong disincentives for national collaboration, it convolves performance review, recompetition, and strategic planning in ways that are challenging for all. In addition, it leads to proposals designed to win a competition rather than maximize community scientific returns. For example, it places a premium on sometimes unproven, next-generation technology that can serve as a vendor-marketing showpiece, rather than on proven, production-quality infrastructure, and researchers have little input into vendor selection, configuration options, or service models. (There is a role for facilities to test novel and risky computing technologies, but it is not in production systems.)
Researchers whose work depends on access to shared facilities also face a form of “double jeopardy.” The scientific merit of their proposed work is assessed via the standard peer review process. However, if funded, they are still not assured of access to the computing and storage resources
they need to conduct their research. A separate proposal for shared cyberinfrastructure access must be submitted to either the XSEDE Resource Allocation Committee (XRAC) or the Petascale Computing Resource Allocations Committee (PRAC), which assesses the competence of the researcher and his or her team to use the cyberinfrastructure resources efficiently. However, there is little operational follow-up to ensure the resources are in fact used wisely and efficiently. This is especially problematic because the monetary value of computing resource awards continues to increase.
Finally, as discussed earlier, the current model is structured largely in support of individual investigator and small team projects, with a nominal 3-year lifetime. Larger disciplinary projects and major scientific instruments (e.g., NSF MREFC projects or cross-agency partnerships) with longer production cycles have no mechanism to plan for and request cyberinfrastructure for a 10- or 20-year horizon, because there is no guarantee that any of the extant cyberinfrastructure facilities will still be operational. This adversely affects data preservation activities in particular, because, by definition, they target long-term access.
6.2.2 Structural Challenges
Since the beginning of the NSF supercomputing centers program in the 1980s, NSF ACI and its predecessor organizations have supported computational science research across NSF and provided services to a user base that spans all federal research agencies. Despite the clear recognition that computational science and data analytics are true peers with theory and experiment in the scientific process, NSF-wide coordination and support remain somewhat informal and ad hoc, with directorate participation often a secondary responsibility of the designees.
Although researchers in all NSF directorates are critically dependent on cyberinfrastructure, at present there are no formal mechanisms for coordinated strategic planning, nor are there ready ways to pool and disburse shared resources. Concretely, there are no shared negotiations for discounted infrastructure or services, nor an accepted strategy for prioritizing the balance of individual investigator, campus, and shared infrastructure. NSF would benefit from a formal roadmapping committee for cyberinfrastructure with representatives drawn from all directorates and shared responsibility for cross-directorate resource investment and strategy. In addition, it is crucial that advanced computing be treated as an NSF asset and funded accordingly, regardless of its organizational location. The need is too great and current resources are too limited for loosely coordinated action and reactive processes.
One corollary to the need for strategic coordination is scaling and scoping to match available resources. As a decentralized organization,
with frequent rotation of program officers, NSF regularly launches new programs and initiatives. For research, this is the distinguishing characteristic of NSF; it is community driven and adaptive. For infrastructure, this is often debilitating, because it leads to a proliferation of small efforts and projects that consume critical resources. When building and operating infrastructure, it is critical to do a small number of things extremely well. Successful infrastructure is derived from a sustained strategy and driven by relentless focus. The implication for NSF is clear. Given limited cyberinfrastructure resources, it must do a very small number of things extremely well, avoiding mission creep and resource dilution at all costs.
A second and equally important corollary is an integrated strategy for high-performance computing and big data analytics and a concomitant rebalancing of investments. Big data requires strongly coordinated big infrastructure, just as leading-edge computational science requires advanced computing systems. The lessons of commercial cloud computing are clear; centralization and scale create unprecedented opportunities for innovation and discovery. Clear and unambiguous requirements for data deposit and access are also needed. Only via such a mechanism, developed in broad community consultation, can the true benefits of data analytics be realized.
As the scale and scope of advanced computing demands and associated facilities and services have grown, the irregular, winner-take-all process described above has become more problematic. First, the scale and cost of high-end or leadership-class facilities needed to meet researcher demands is a large fraction of the total currently available in the NSF budget, whether within the ACI division budget or the budgets of other directorates. (Whether NSF needs a leadership-class or high-end system should be determined by the analysis of science requirements.) NSF could afford to purchase a significantly larger system than it is currently acquiring, but only by focusing on that investment rather than a larger number of much smaller investments.
Second, uncertainty regarding the timing and capability of infrastructure upgrades makes community planning difficult, and the timing is often not well matched to vendor hardware and software upgrade cycles. Third, the timescales are incompatible with the planning and life cycle of other scientific infrastructure, making use of centrally funded cyberinfrastructure difficult at best and often impossible.
Current models of funding for advanced computing (based on periodic recompetition) and service block allocations (via committee) create substantial uncertainty regarding service continuity and research access.
There are several ways to address these shortcomings while retaining the best elements of the current approach. These include approaches as varied as public-private partnerships for access to cloud services, federally funded research and development centers (FFRDCs) for organizational sustainability, and MREFC projects for facility construction. Many of these are not mutually exclusive and could be combined to address limitations of the current model.
6.3.1 A Regular Cadence of Infrastructure Investments
The cost of leading-edge advanced computing facilities and user support, whether for computational modeling or data analytics, is no longer measured in tens of millions of dollars; it is now denominated in hundreds of millions. Indeed, large-scale commercial data centers operated by cloud providers now cost over $1 billion each. The MREFC process may be a useful point of departure. Although some aspects of MREFC projects match the needs of advanced computing infrastructure, the current MREFC mechanisms may need to be modified and adapted to its unique characteristics, including the general-purpose nature of computing and the need for regular refresh of computing equipment.
To establish a regular cadence of infrastructure investments, NSF would plan and budget an upgrade every 3 to 5 years, with planning and construction of each generation overlapping the operation of the previous generation. This would clarify and systematize the technology upgrade and refresh process, provide a community mechanism to plan and shape infrastructure transitions, elevate budget planning and prioritization to NSF-wide discussion and approval, and provide the level of funding needed to maintain leading-edge capability.
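The arithmetic of such an overlapping cadence can be sketched with purely illustrative numbers (the 4-year cadence, 2-year planning and construction period, and 5-year service life below are assumptions for illustration, not recommendations):

```python
# Sketch of an overlapping upgrade cadence: each generation is planned and
# built while the previous one is still in production, so service never lapses.
# All durations are illustrative assumptions, not NSF policy.

CADENCE = 4        # years between production starts (within the 3-5 year range)
PLAN_BUILD = 2     # years of planning/construction before production
OPERATE = 5        # years of production service

def schedule(first_production_year, generations):
    """Return (generation, plan_start, production_start, retirement) tuples."""
    rows = []
    for g in range(generations):
        prod = first_production_year + g * CADENCE
        rows.append((g + 1, prod - PLAN_BUILD, prod, prod + OPERATE))
    return rows

for gen, plan, prod, retire in schedule(2018, 3):
    print(f"Gen {gen}: planning begins {plan}, production {prod}-{retire}")
```

Because each generation operates for a year beyond its successor's production start, the schedule yields continuous capability while concentrating each construction effort into a predictable budget window.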
As with MREFC projects, NSF would be able to request new funds as a line item in its annual budget request, explicitly acknowledging that current, internal funding is inadequate to meet burgeoning need and scientific priorities. Finally, it would provide an operational instantiation of an NSF-wide advanced computing roadmap.
6.3.2 Leased Infrastructure
Historically, NSF cyberinfrastructure facilities have been operated by academic institutions on NSF’s behalf, typically via cooperative agreements. In turn, the academic institutions have purchased computing, storage, and networking hardware from computing vendors at the start of the cooperative agreement to deliver the committed services. This hardware then depreciates over its nominal 3- to 5-year lifetime until its
residual economic value is minimal and its performance and capability are no longer competitive. At that point, only another infusion of capital will ensure service continuity.
Rather than purchasing hardware at the time of an award, NSF or its awardees might choose to lease the desired hardware from a vendor or a system integrator. In the simplest variation of this model, the hardware remains the property of the vendor but is located at the operator’s facility. From an operational perspective, a simple leasing model is indistinguishable from outright purchase. Alternatively, the hardware could be hosted and maintained at a vendor facility, with a division of hardware service and user support between the partners.
Annual lease payments would smooth the punctuated budget shock of capital acquisitions, allowing amortization across multiple budget years. Lease terms at a higher level might also include periodic hardware upgrades to maintain leading-edge capability (e.g., equipment could be upgraded during the life of a cooperative agreement without competition to meet a series of performance targets) as well as quality of service and/or performance guarantees. Leases could also include exit clauses for termination, either with or without cause.
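The budget-smoothing effect can be illustrated with purely hypothetical figures (a $100 million system, a 10 percent total lease premium, and a 5-year term are assumptions; real lease terms also bundle upgrades and service guarantees not modeled here):

```python
# Hypothetical comparison of annual budget profiles ($ millions):
# a one-time capital purchase versus level lease payments over the same term.
# All figures are illustrative assumptions.

purchase_cost = 100.0   # paid entirely in year 1
lease_premium = 1.10    # assume leasing costs ~10% more in total (financing)
years = 5

purchase_profile = [purchase_cost] + [0.0] * (years - 1)
annual_lease = purchase_cost * lease_premium / years
lease_profile = [annual_lease] * years

print("purchase:", purchase_profile)
print("lease:   ", [round(x, 1) for x in lease_profile])
```

The purchase profile concentrates the entire cost in one budget year; the lease profile spreads a modestly larger total evenly, which is easier to sustain within an annual appropriation.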
This is not a new idea. For example, the Department of Energy (DOE) has used this strategy successfully for its leading-edge computing deployments. University supercomputing centers in Japan also use leasing, which permits regular and stable annual funding for each center.
6.3.3 Commercial Cloud Service Purchases
The explosive growth of commercial cloud services and their widespread adoption by both large corporations and small start-ups offers another alternative for provisioning advanced computing, but it is not a panacea (Boxes 6.1 and 6.2). Cloud computing now allows large organizations to outsource the provisioning, maintenance, and operation of computing infrastructure and commodity services, allowing them to focus resources and expertise on their core competence and differential value proposition. For smaller companies, the ability to purchase services on a pay-as-you-go basis has reduced capital start-up requirements and lowered the barrier to market entry. The same could be true for individual laboratory users whose computing use is highly episodic, with periods of low and high utilization.
The ability to scale services rapidly and dynamically across a wide range of demand is a consequence of the massive scale of cloud service deployment. All of the major cloud service vendors are investing billions of dollars annually to offer advanced computing and data analytics services. In addition, market competition is driving rapid declines in
service costs and frequent service expansions (e.g., in software tools and packages).
NSF could make cloud services available to its researchers in one of several ways. All would likely involve NSF negotiating a bulk purchase agreement for data analytics and computing services.
- Individual investigators could request cloud services as part of a standard NSF proposal. The PIs of funded proposals could spend awarded funds with the cloud service provider of their choice. This is possible today, although cloud services incur indirect costs that may be more than 50 percent at many institutions, making them significantly less attractive than they otherwise would be compared to the purchase of computing hardware. This presently seems inequitable because the cost to an institution for purchasing cloud services is more akin to that of a recurring credit card charge or a subcontract. By bulk purchasing, NSF could eliminate this additional cost as well, potentially receiving more favorable rates than single investigators could obtain. Alternatively, mechanisms to reduce the indirect cost rate charged on cloud services could be explored.
- The current computing allocation review process could be expanded to include award of cloud services. Approved users would receive a budget to be spent with their chosen cloud provider. This would ensure centralized assessment of the appropriateness and likely efficiency of the request, albeit with the double jeopardy of separate research and computing reviews.
- NSF could negotiate an agreement with one or more commercial cloud service providers (e.g., Amazon, Google, or Microsoft) and then operate a virtual facility on behalf of its users. In this model, user and application support would still rest with a noncommercial entity (e.g., via a cooperative agreement with an academic institution), and the cloud vendor would provide computing and storage services. NSF could leverage the Internet2 organization’s NET+ initiative, which has selected commercial cloud services for its members and negotiated pricing and other terms.
All of these approaches would help take advantage of the rapid evolution of cloud services, the vibrant software ecosystem for cloud data analytics, the ability to use resources at massive scale, and the presence of large, shared data sets.
To address the structural disparity in the cost of cloud services compared to hardware acquisition, NSF would need to address the facilities and administrative (F&A) costs now charged for purchase of cloud services. Today, researchers can include cloud services as direct costs in research proposals, but these services are not excluded from the modified total direct cost (MTDC) on which F&A is computed. In contrast, capital equipment costs (e.g., computing equipment exceeding $5,000) are excluded from MTDC. The result is that $1 of cloud service costs $1.XX, where XX is the F&A rate at the researcher’s institution. In contrast, the equivalent service on computing equipment purchased by an investigator on a research award costs only $1. In addition, power, cooling, and space for equipment are included in F&A, further skewing the incentive toward equipment purchase rather than service purchase. Removing this inequity would allow a more direct comparison and researcher selection based on perceived research value.
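A small calculation makes the disparity concrete (the 55 percent F&A rate below is an illustrative assumption; actual rates vary by institution):

```python
# Effective cost to a research award of $1 of cloud service versus $1 of
# capital equipment under typical indirect-cost rules.
# The 55% F&A rate is an illustrative assumption, not any specific
# institution's negotiated rate.

F_AND_A_RATE = 0.55  # hypothetical institutional F&A rate

def effective_cost(direct_cost, excluded_from_mtdc):
    """Cost charged to the award: the direct cost plus F&A, unless the item
    is excluded from the modified total direct cost (MTDC) base."""
    if excluded_from_mtdc:
        return direct_cost
    return direct_cost * (1 + F_AND_A_RATE)

cloud = effective_cost(1.00, excluded_from_mtdc=False)
equipment = effective_cost(1.00, excluded_from_mtdc=True)
print(f"$1 of cloud services costs the award ${cloud:.2f}; "
      f"$1 of capital equipment costs ${equipment:.2f}")
```

Under this assumed rate, every dollar of cloud service consumes $1.55 of the award, while every dollar of excluded capital equipment consumes only $1.00, which is the structural incentive toward equipment purchase described above.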
6.3.4 Cooperative Agreement Extension
Any funding and organizational structure must balance organizational stability and sustainability against responsiveness to technological change and customer needs. As noted earlier, NSF has long supported leading-edge cyberinfrastructure via a series of solicitations and open competitions. Although this has stimulated intellectual competition and increased NSF’s financial leverage, it has also made deep and sustainable collaboration difficult among frequent competitors. Individual awardees quite rationally often focus more on maximizing their long-term probability of continued funding, rather than adapting and responding to community needs.
Frequent competitions have also made it more difficult for NSF-funded service providers to recruit and retain talented staff when the horizon for funding is only 2 to 5 years. This is especially true when the competition for IT and computational science expertise with industry is so great. Periodic review and rigorous performance assessment need not be coupled with “life or death” proposal competition and cooperative agreement funding.
Other federal agencies regularly review the performance of their service facilities, providing strategic and tactical guidance, without coupling those reviews to a facility termination decision. For example, DOE operates its National Energy Research Scientific Computing Center (NERSC) in this model. Hardware acquisition decisions, management reviews, and service priorities are subject to stringent reviews, but NERSC itself is not subject to termination review each time a new system is acquired. This also allows more honest and forthright discussion of problems, without existential fears.
NSF could consider designating one or more cyberinfrastructure centers as a core facility with a nominal lifetime of a decade—for example, as part of an extended cooperative agreement. Working with NSF and under regular review, the center would deploy and operate cyberinfrastructure on NSF’s behalf. This would ensure organizational lifetime and planning horizons more similar to those of other NSF MREFC projects, which often last 10 to 20 years. In addition, longer horizons would also let NSF and its service providers evolve services and staffing in response to changing community needs and business partnerships. As extant examples, NSF’s National Radio Astronomy Observatory and National Optical Astronomy Observatory play these roles in the astronomy community.
6.3.5 Federally Funded Research and Development Centers
As noted above, continuity is crucial to strategic planning, staff retention, and cross-domain partnerships. Cooperative agreements, whether for MREFC projects or other initiatives, provide one mechanism for collaborative planning and management. Implicit in all such approaches is a presumption that the project has a bounded lifetime. In turn, that presumption profoundly and adversely affects strategic planning and a commitment to sustainability within NSF and the community.
The centrality of advanced computing to research suggests that NSF treat it as a long-term, indefinite commitment that more clearly delineates the distinction between performance review and accountability and organizational continuity and service capabilities. Such separation
would allow service providers to work more collaboratively with NSF on responses to community needs and would encourage interorganizational collaboration.
An FFRDC is an excellent example of this balance. FFRDCs are independent nonprofit entities sponsored and funded by the U.S. government to meet specific long-term technical needs in areas of national interest. They operate as long-term strategic partners with their sponsoring government agencies. Many FFRDCs, such as DOE laboratories, include multiple programs spanning many areas of science and engineering research. NSF already uses an FFRDC, NCAR, as an integral part of NSF’s cyberinfrastructure service strategy for the geoscience community; it can budget and plan new equipment acquisitions, and it offers staff career paths and continuity.
NSF could consider establishing one or more FFRDCs to support national cyberinfrastructure for research. Working with NSF, industry, and academia, such cyberinfrastructure FFRDCs could develop a strategic plan for cyberinfrastructure that meets evolving community needs, tracks technology developments, and provides a roadmap for NSF’s directorates. The FFRDCs would also deploy and operate general or domain-specific cyberinfrastructure for the national community.
6.3.6 Partnerships with Other Agencies
NSF could explore partnerships with other federal agencies. For example, NSF could coordinate complementary leadership-class system configurations with DOE, especially with DOE systems used to support the DOE Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. The purpose of such a partnership is not to shift responsibility for providing cycles from NSF to DOE; rather, it recognizes that the configuration space for advanced cyberinfrastructure is not one-dimensional. Such a partnership would develop mechanisms to serve the special needs of each agency's user population fairly. For example, today NSF operates a system with more memory than any DOE system; conversely, DOE operates a system with more GPUs and peak floating-point operations per second (FLOP/s) than any NSF system. Currently, computational scientists request time on a variety of resources, taking advantage of DOE, NSF, and other providers of advanced computing infrastructure to the science community. But there is no formal interagency coordination of system acquisitions, and trade-offs are made independently. Partnerships with other agencies could help ensure that the full spectrum of advanced cyberinfrastructure is available to the science community.
6.3.7 Strategic Public-Private Partnerships
As the demand for cyberinfrastructure continues to rise, the costs of deployment and operation rise commensurately. This is true both for aggregate demand (laboratory and institutional capabilities) and for leading-edge computing and data storage systems. Superficially, this may seem paradoxical, given the dramatic increases in computing and storage capability regularly delivered by the computing industry. However, those same computing advances have birthed new sensors and scientific instruments and a torrent of new digital data, as well as new simulation models and expectations for ever-larger computing capability.4
Rising demands for computing and storage (end-to-end capabilities, not just hardware) now challenge the finances and social processes of both NSF and its academic grantees. Simply put, the rising cost of leading-edge facilities (NSF Track 1 and Track 2 systems) is not sustainable under the current partnership model and may not be sustainable under any government-funded model. Put another way, the perceived return on investment for a facility costing hundreds of millions of dollars must be substantial, particularly when the equipment has a useful lifetime of only 3 to 5 years.
NSF might consider alternative public-private partnership models that create financial incentives for private-sector partners to operate large-scale cyberinfrastructure facilities on the research community’s behalf. These necessarily require more flexible approaches than traditional fee-for-service models and might include such options as access to university intellectual property in exchange for cyberinfrastructure services. Precisely how such arrangements might work would depend on the willingness of the academic community to agree on, for example, vendor exclusivity and intellectual property sharing.
6.3.8 User-Driven Acquisition and Allocation
All of the operational strategies described above are based on some variant of central planning and resource management. Alternatively, NSF could decentralize cyberinfrastructure acquisition and support and rely on social and economic forces to define and optimize community cyberinfrastructure. A first step in this process would be denominating all services in dollars, rather than the abstract, normalized service units (SUs) or storage allocations used today. SUs play an important role by enabling comparison of allocations on computers that may differ widely in both architecture (e.g., conventional processors or graphical processing units) and time of deployment. For instance, the use of SUs makes more quantitative the assessment in Figure 2.5 of resources over the past decade. Despite this merit, however, SUs obscure from users the actual costs associated with requests and allocations, and they distance both NSF programs and the user community from the processes that prioritize how the underlying funding is allocated. Moreover, each site establishes the conversion factor between actual wall time on a computational resource and SUs based on High-Performance Linpack benchmark results, a single and dated metric that does not capture the many factors determining the capability (which is more than raw performance) of individual applications mapped to different architectures. Recently, XSEDE has begun notifying both users and the associated NSF program managers of the actual dollar value of an allocation, and there appear to be significant potential benefits in making users even more cognizant of, and ultimately responsible for, the actual costs and effective use of resources.

4 The end of Dennard scaling and limits on future microprocessor performance increases mean the "free lunch" of regular performance doubling is over, bringing new and sobering economic constraints. Larger capability will require larger capital infusions.
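The arithmetic of denominating an allocation in dollars rather than SUs can be made concrete with a brief sketch. All rates below are hypothetical placeholders, not actual XSEDE or site conversion factors:

```python
# Hypothetical sketch: translating an abstract service-unit (SU) award into
# an approximate dollar figure via a site's SU conversion factor.
# The numeric rates used in the example are illustrative only.

def allocation_cost_dollars(su_awarded, su_per_core_hour, cost_per_core_hour):
    """Convert an SU award to dollars.

    su_per_core_hour  : the site's (benchmark-derived) SU conversion factor
    cost_per_core_hour: the site's fully burdened operating cost per core-hour
    """
    core_hours = su_awarded / su_per_core_hour
    return core_hours * cost_per_core_hour

# Example: a 1,000,000 SU award at 1 SU per core-hour and $0.05 per core-hour
print(allocation_cost_dollars(1_000_000, 1.0, 0.05))  # → 50000.0
```

Even this trivial calculation illustrates the report's point: a "million SU" award sounds abstract, while "a $50,000 allocation" immediately invites users to weigh costs against benefits.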
Realizing these benefits can certainly start with increasing user awareness of costs and engaging users in resource planning and acquisition. In a more extensive realization of this model, however, individual researchers or research teams would be allowed to spend awarded cyberinfrastructure dollars at their discretion. This cyberinfrastructure marketplace might include the following options:
- Purchasing local computing infrastructure, services, or staff support for use within the individual researcher’s laboratory;
- Contributing dollars to a university pool that operates a campus facility under a “campus condominium” model;5
- Pooling research dollars to purchase and operate shared regional or national facilities; and
- Purchasing commercial cloud services, exploiting the properties of elasticity and on-demand access.
All of these variants allow individual researchers and research teams to make independent decisions on how best to advance their research. They also remove researchers from double jeopardy, in which they must compete separately for research funding and for computing resources. In addition, the options expose the costs of each choice in a common currency. However, the risk is that the sum of locally optimal research decisions may not be globally optimal for the national community.

5 Under a condominium model, a university purchases a baseline computing and storage infrastructure and allows individual researchers to purchase and contribute nodes and storage to the shared pool. Researchers receive access priority in proportion to their financial contribution.
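The condominium model's allocation rule (access priority in proportion to financial contribution) amounts to a simple proportional-share computation. A minimal sketch, with hypothetical contributor names and amounts:

```python
# Illustrative sketch of condominium-model scheduling shares: each
# contributor's priority weight is its fraction of the total financial
# contribution to the shared pool. All names and amounts are hypothetical.

def fair_shares(contributions):
    """Map each contributor to its fraction of the total contribution."""
    total = sum(contributions.values())
    return {who: amount / total for who, amount in contributions.items()}

# A lab contributing 60% of the pool receives a 0.6 scheduling share.
shares = fair_shares({"lab_a": 60_000, "lab_b": 30_000, "baseline": 10_000})
print(shares["lab_a"])  # → 0.6
```

In practice a fair-share scheduler would apply these weights over a decay window rather than instantaneously, but the proportionality principle is the same.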
Moreover, some form of such a model may provide an effective mechanism to encourage and formalize the investments and responsibilities of researchers, institutions, and regions in private and shared local or national infrastructure. NSF already recognizes that there are significant computing resources "at the edges" (meaning within campuses and states) and that there is a clear need to coordinate and leverage investments. Programs such as the Campus Cyberinfrastructure—Data, Networking, and Innovation Program (CC*DNI) and Major Research Instrumentation help develop this infrastructure, and elements of XSEDE, such as campus champions, are directed toward tying both communities and cyberinfrastructure together. However, the same economic and technological forces driving decisions on national computing infrastructure are eroding the ability of campuses to purchase and operate their own cyberinfrastructure; the cost and complexity of managing research data are especially challenging. Thus, smaller institutions are now choosing to invest in infrastructure operated by larger neighbors or at national centers, which can provide cost and other advantages compared to attempting to use the commercial cloud. In the absence of a scalable national model, however, such partnerships remain ad hoc. The NSF Big Data Regional Innovation Hubs (BD Hubs) program is potentially a powerful catalyst for regional synergy, but it still needs to be tied to a national narrative that encompasses all aspects of advanced cyberinfrastructure.
Variations of this economic model have been explored in the past. Known at the time as the "green stamps" model of resource allocation, it was analyzed in the 1995 Report of the Task Force on the Future of the NSF Supercomputer Centers Program.6 The report noted:
The key concept in a green stamp mechanism is the use of the stamps to represent both the total allocation of dollars to the Centers and the allocation of those resources to individual PI’s. NSF could decide a funding level for the Centers, which based on the ability of the Centers to provide resources, would lead to a certain number of stamps, representing those resources, being available. Individual directorates could disperse the stamps to their PI’s, which could then be used by the researchers to purchase cycles. Multiple stamp colors could be used to represent different sorts of resources that could be allocated.
The major advantages raised for this proposal are the ability of the directorates to have some control over the size of the program by expressing interest in a certain number of stamps, improvements in efficiency gained by having the Centers compete for stamps, and improvements in the allocation process, which program managers could achieve by making normal awards that included a stamp allocation.

6 National Science Foundation, Report of the Task Force on the Future of the NSF Supercomputer Centers Program, NSF9646, September 15, 1995, https://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf9646.
Other than the mechanics of overall management, most of the disadvantages of such a scheme have been raised in the previous sections. In particular, such a mechanism (especially when reduced to cash rather than stamps) makes it very difficult to have a centralized high-end computing infrastructure that aggregates resources and can make long-term investments in large-scale resources.
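The mechanics of the green-stamps proposal, with stamp colors distinguishing resource classes, can be sketched as a simple ledger. All entities here (PIs, colors, amounts) are hypothetical illustrations of the mechanism described above, not part of the 1995 proposal's text:

```python
# Toy sketch of the "green stamps" mechanism: directorates disperse stamps
# of various colors (resource classes) to PIs, who redeem them at centers
# for cycles. All names and quantities are hypothetical.

from collections import defaultdict

class StampLedger:
    def __init__(self):
        # pi -> color -> stamp count
        self.balances = defaultdict(lambda: defaultdict(int))

    def disperse(self, pi, color, count):
        """A directorate grants `count` stamps of `color` to a PI."""
        self.balances[pi][color] += count

    def redeem(self, pi, color, count):
        """A PI spends stamps at a center; fails if the balance is short."""
        if self.balances[pi][color] < count:
            raise ValueError("insufficient stamps")
        self.balances[pi][color] -= count

ledger = StampLedger()
ledger.disperse("pi_1", "compute", 100)   # directorate award
ledger.redeem("pi_1", "compute", 40)      # cycles purchased at a center
print(ledger.balances["pi_1"]["compute"])  # → 60
```

The sketch also makes the report's criticism visible: nothing in the ledger aggregates demand across PIs, so no single center can plan long-term, large-scale investments against such fragmented purchasing power.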
NSF could conduct a pilot project to evaluate the power of market forces in allocating limited cyberinfrastructure support. Among the issues to evaluate is whether such an approach would exacerbate the problem of buying resources by the hour (see Section 5.5) without recognizing the fixed costs, such as the cost of retaining staff and supporting the use of new architectures.
Independently of any pilot projects, NSF will benefit by expressing in dollars the true cost of large cyberinfrastructure resource allocations (i.e., those now made by the XSEDE Resource Allocation Committee [XRAC] and Petascale Computing Resource Allocation Committees [PRAC]). First, it would allow researchers to identify the value of cyberinfrastructure awards to their institutions. Second, and equally important, it would make clear that such large allocations have true costs, encouraging wise and efficient use.