Opportunities from the Integration of Simulation Science and Data Science
PROCEEDINGS OF A WORKSHOP
Committee on Future Directions for NSF
Advanced Computing Infrastructure to Support U.S. Science in 2017-2020
Computer Science and Telecommunications Board
Division on Engineering and Physical Sciences
THE NATIONAL ACADEMIES PRESS
Washington, DC
www.nap.edu
THE NATIONAL ACADEMIES PRESS, 500 Fifth Street, NW, Washington, DC 20001
This activity was supported by contracts between the National Academy of Sciences and the National Science Foundation under award number OCI-1344417. Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.
International Standard Book Number-13: 978-0-309-48186-1
International Standard Book Number-10: 0-309-48186-4
Digital Object Identifier: https://doi.org/10.17226/25199
Copyright 2018 by the National Academy of Sciences. All rights reserved.
Printed in the United States of America
Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2018. Opportunities from the Integration of Simulation Science and Data Science: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: https://doi.org/10.17226/25199.
The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.
The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. C. D. Mote, Jr., is president.
The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.
The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.
Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.
Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee’s deliberations. Each report has been subjected to a rigorous and independent peer-review process and it represents the position of the National Academies on the statement of task.
Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.
For information about other products and activities of the National Academies, please visit www.nationalacademies.org/about/whatwedo.
COMMITTEE ON FUTURE DIRECTIONS FOR NSF ADVANCED COMPUTING INFRASTRUCTURE TO SUPPORT U.S. SCIENCE IN 2017-2020
WILLIAM GROPP, NAE,1 University of Illinois at Urbana-Champaign, Co-Chair
ROBERT J. HARRISON, Stony Brook University, Co-Chair
MARK ABBOTT, Woods Hole Oceanographic Institution
ROBERT GROSSMAN, University of Chicago
PETER M. KOGGE, EMU Solutions, Inc.
PADMA RAGHAVAN, Vanderbilt University
DANIEL A. REED, University of Utah
VALERIE E. TAYLOR, Argonne National Laboratory
KATHERINE A. YELICK, NAE, University of California, Berkeley
Staff
JON EISENBERG, Senior Director
KATIRIA ORTIZ, Associate Program Officer
SHENAE BRADLEY, Administrative Assistant
JANKI PATEL, Senior Program Assistant
___________________
1 Member, National Academy of Engineering.
COMPUTER SCIENCE AND TELECOMMUNICATIONS BOARD
FARNAM JAHANIAN, Carnegie Mellon University, Chair
LUIZ ANDRÉ BARROSO, Google, Inc.
STEVEN M. BELLOVIN, NAE, Columbia University
ROBERT F. BRAMMER, Brammer Technology, LLC
DAVID CULLER, NAE, University of California, Berkeley
EDWARD FRANK, Cloud Parity, Inc.
LAURA HAAS, NAE, University of Massachusetts, Amherst
MARK HOROWITZ, NAE, Stanford University
ERIC HORVITZ, NAE, Microsoft Corporation
VIJAY KUMAR, NAE, University of Pennsylvania
BETH MYNATT, Georgia Institute of Technology
CRAIG PARTRIDGE, Raytheon BBN Technologies
DANIELA RUS, NAE, Massachusetts Institute of Technology
FRED B. SCHNEIDER, NAE, Cornell University
MARGO SELTZER, Harvard University
MOSHE VARDI, NAS1/NAE, Rice University
Staff
JON EISENBERG, Senior Director
LYNETTE I. MILLETT, Associate Director
SHENAE BRADLEY, Administrative Assistant
EMILY GRUMBLING, Program Officer
RENEE HAWKINS, Financial and Administrative Manager
KATIRIA ORTIZ, Associate Program Officer
JANKI PATEL, Senior Program Assistant
For more information on CSTB, see its website at http://www.cstb.org, write to CSTB, National Research Council, 500 Fifth Street, NW, Washington, DC 20001, call (202) 334-2605, or email the CSTB at cstb@nas.edu.
___________________
1 Member, National Academy of Sciences.
Preface
In 2016, the Committee on Future Directions for NSF Advanced Computing Infrastructure to Support U.S. Science in 2017-2020 issued a report making recommendations aimed at achieving four broad goals: (1) positioning the United States for continued leadership in science and engineering, (2) ensuring that resources meet community needs, (3) aiding the scientific community in keeping up with the revolution in computing, and (4) sustaining the infrastructure for advanced computing. In 2018, as the culmination of its work, the committee organized a workshop, which is summarized in this report, to explore opportunities for the integration of simulation and data-driven science.
The workshop was organized to examine current and emerging science applications that span simulation and data-driven science, their characteristics, and future approaches for cyberinfrastructure to support them, with a focus on advanced computing needs. It built on issues and themes advanced in the committee’s report and engaged representatives of scientific communities who currently work at the simulation-data intersection or may do so in the future, as well as those exploring new computing architectures for supporting this research. Examples of the questions posed to attendees during the workshop include:
- How can one characterize the range of scientific research that involves simulation and data-driven science? Is there a set of particular cases that can be used to illustrate that range?
- To what extent can converged cyberinfrastructure designed to support simulation and data-driven science meet future science needs, and what applications may require more specialized approaches?
- How much convergence can be accomplished through shared systems vs. using the same basic components and architectures but in different configurations?
- What are the implications and opportunities for science of the convergence between high-performance computing and data analytics in the commercial sector?
- What roles can cloud technologies and commercial cloud providers play in meeting the needs of future science?
- What technical barriers exist to achieving convergence, such as different software stacks for simulation and data-driven science?
- What are some next steps that the scientific community could take to better understand future applications, cyberinfrastructure requirements, and opportunities for convergence?
The first chapter provides some context, drawing on William Gropp’s overview presentation. Chapter 2 contains brief summaries of presentations made at the workshop. Chapter 3 summarizes the closing discussion session along with some observations by the committee of themes throughout the workshop. The agenda of the workshop is in Appendix A. Short biosketches of the committee members and speakers appear in Appendixes B and C, respectively.
Our sincere thanks to the committee members and National Academies staff who helped organize the workshop, as well as to the invited speakers for their thoughtful remarks and enthusiastic participation in the discussions that ensued. The workshop proved especially timely, as evidenced by the high level of interest, and the enthusiasm and attentiveness of the workshop participants.
Some speakers prepared initial drafts of the summaries of their presentations for this report, and all speakers were given an opportunity to review the summaries for accuracy. Writing support was provided by Anne Frances Johnson, Creative Science Writing. Katiria Ortiz, associate program officer at the National Academies, organized the workshop and led development of this report. Shenae Bradley, administrative assistant, handled travel arrangements and meeting logistics. Jon Eisenberg, Computer Science and Telecommunications Board director, oversaw the project. We also extend our appreciation to the National Science Foundation for its support and encouragement of this activity.
William Gropp, University of Illinois at Urbana-Champaign, Co-Chair
Robert Harrison, Stony Brook University, Co-Chair
Acknowledgment of Reviewers
This Proceedings of a Workshop was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies of Sciences, Engineering, and Medicine in making each published proceedings as sound as possible and to ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the charge. The review comments and draft manuscript remain confidential to protect the integrity of the process.
We thank the following individuals for their review of this proceedings:
Robert F. Brammer, Brammer Technology,
Rudolf Eigenmann, University of Delaware,
Thomas Furlani, University at Buffalo,
David Konerding, Google, Inc., and
Tony Hey, Science and Technology Facilities Council, Rutherford Appleton Lab.
Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the content of the proceedings nor did they see the final draft before its release. The review of this proceedings was overseen by Daniel Atkins III, NAE, University of Michigan. He was responsible for making certain that an independent examination of this proceedings was carried out in accordance with standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the rapporteurs and the National Academies.
Contents
Numerical Laboratories on Exascale
Envisioning a Cyberinfrastructure Ecosystem for an Era of Extreme Compute and Big Data
NSF’s Role in Cyberinfrastructure
Current and Future Developments
Guiding Principles Going Forward
Architectural Landscape and Trends
Google’s Tools for Convergence
Architectures at Amazon Web Services
Service, Usage Models, and Economics
Stream Processing and Simulation at Amazon Web Services
Campus-Based Systems and the National Cyberinfrastructure Ecosystem
Cloud Computing with Microsoft Azure
Data and Convergence at the Department of Energy
To Keep, or Not To Keep, That Is the Question: And Whether Convergence, Clouds, or Commons Is Better
Convergence Opportunities and Limits
Convergence Lessons: Future Infrastructure
Opportunities for Overcoming Data Bottlenecks
Modeling and Simulation (ModSim) Convergence in the Exascale Computing Project