The National Academies of Sciences, Engineering, and Medicine convened a Workshop on Converging Simulation and Data-Driven Science on May 10, 2018, in Washington, D.C.
Convergence has been a key topic of discussion about the future of cyberinfrastructure for science and engineering research. Convergence refers both to the combined use of simulation and data-centric techniques in science and engineering research and to the possibility that a single type of cyberinfrastructure could support both techniques.
The workshop featured speakers from universities, national laboratories, technology companies, and federal agencies who addressed the potential benefits and limitations of convergence as they relate to scientific needs, technological capabilities, funding structures, and system design requirements.
This proceedings was created from the presenters’ slides and notes and a full transcript of the workshop to serve as a public record of the workshop presentations and discussions.
William D. Gropp, Ph.D., University of Illinois at Urbana-Champaign, set the stage with a brief summary of the issues and previous work that inspired and informed the workshop.
Along with Robert Harrison (Stony Brook University), Gropp served as co-chair of the National Academies committee that produced the 2016 consensus report Future Directions for NSF Advanced Computing Infrastructure to Support U.S. Science and Engineering in 2017-2020.[1] The workshop was designed to build upon that report by exploring how convergence might (or might not) address some of the complex challenges identified in the report.
Gropp highlighted three of the report’s major conclusions that are relevant to the workshop’s context. First, the report concludes that the demand for the National Science Foundation’s (NSF’s) advanced computing infrastructure exceeds supply, and that there is no “magic fix” that will solve that imbalance. Second, successfully transitioning this infrastructure to new, more capable computer architectures requires effectively engaging with the scientific communities this infrastructure serves. Finally, moving advanced computing forward requires focusing not only on hardware that makes computation possible but also on the software used for simulation or data analysis—as well as the infrastructure needed to store and manage the data being analyzed.
[1] National Academies of Sciences, Engineering, and Medicine, Future Directions for NSF Advanced Computing Infrastructure to Support U.S. Science and Engineering in 2017-2020, The National Academies Press, Washington, DC, 2016.
When the 2016 report was released, one of the questions raised in response was whether convergence between simulation and data-driven computation is in fact both possible and desirable. Although there is certainly a trend toward convergence at the device level, the same is not necessarily true at the system or software levels, Gropp noted. Since that report was released, in addition to continued growth in classic high-performance computing (focused on running simulations), there has been substantial growth in data-centric computing. These developments come at a time when the resources of the agencies that fund advanced computing infrastructure have been constrained.
The 2016 report recommended developing a roadmap for cyberinfrastructure so that scientists would have a better sense of the computing resources they can expect to have available in the future. Although NSF has released its roadmaps for computing infrastructure over the next decade, Gropp noted that other countries, such as Japan, have developed or are developing even stronger advanced computing portfolios, with some of their second-tier “Track-2” systems rivaling the performance of NSF’s current top-tier “leadership” class system. It is likely, Gropp acknowledged, that universities and other federal agencies will be expected to fill the gap to ensure U.S. computing resources keep pace with the needs of the U.S. scientific community. This gap raises two questions: whether limited funds could be stretched further with converged platforms that meet the needs of a larger fraction of scientists (whether they are running simulations or analyzing data) and how far such a strategy could go in meeting the needs of scientific research.
To frame the workshop, Gropp posed several questions relevant to charting a path forward for the convergence of simulation and data-driven science. First, what are the opportunities? In other words, what science does convergence make feasible that would not otherwise be feasible? Second, what are the challenges? Could a single system meet all needs, or is some specialization necessary or desirable? Finally, how do we move forward? Given resource constraints, how should funding agencies and universities address trade-offs, and how can they support innovation in computing platforms without starving current research of the computing capacity it needs?