
2 Presentation Summaries
Pages 3-17

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 3...
... support them?
Numerical Laboratories on Exascale
Alexander Szalay, Johns Hopkins University
Cosmologist and computer scientist Alexander Szalay discussed trends in advanced scientific computing and highlighted examples of approaches being employed in physics, cosmology, and oceanography.
From page 4...
... While no scientist will turn down more data, what they actually need is more data that is relevant to the science that they are doing -- not more noise.
Frontiers at the Interface of High-Performance Computing, Big Data Analytics, Deep Learning, and Multimessenger Astrophysics
Eliu Huerta, University of Illinois at Urbana-Champaign
Eliu Huerta, an astrophysicist at the National Center for Supercomputing Applications (NCSA)
From page 5...
... He spoke to the ways in which big data is generated and used in the biosciences, noted gaps in the use of tools and simulations, and highlighted examples of how deep learning and graph mining are being applied in this space.
From page 6...
... He added that GPU resources are particularly useful for deep learning, which is nascent in materials science but holds the promise of performing better than traditional architectures for data-driven materials science.
ENVISIONING A CYBERINFRASTRUCTURE ECOSYSTEM FOR AN ERA OF EXTREME COMPUTE AND BIG DATA
Manish Parashar (National Science Foundation)
From page 7...
... Responses reflect a growing need for computing as well as continued changes in how computing infrastructure is used, with an increasing emphasis on on-demand computation, rapid data processing, comparisons between simulations and observations, data management, machine learning, big data techniques, and streaming data from the Internet of Things, large instruments, and experimental facilities. Discoverability, accessibility, and reproducibility are key concerns, and software training and workforce development will be needed to keep pace with capabilities.
From page 8...
... The Big Data and Extreme-Scale Computing workshops and report articulate the need for a strategy to better align the high-performance computing community and the data science community so each can better leverage the other's work and resources. While Beckman acknowledged that bringing these divided communities together may be an uphill task, he emphasized that it is nonetheless worthwhile.
From page 9...
... To achieve the needed flexibility and make this infrastructure easier to use, Beckman proposed a three-pronged approach: replace the classic HPC file system with flexible storage services that use token-based authentication, tweak high-performance computing interconnects to allow software-defined networking, and make it possible to clear a node quickly and immediately make it available to another user. Such updates would make this infrastructure more competitive with commercial cloud services, which are currently far easier to use than research cyberinfrastructure, and also better position it to integrate AI hardware that is coming down the pipeline.
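As a loose illustration of the first prong only (the service URL, token endpoint, client identifiers, and scopes below are invented for the example, not part of Beckman's proposal), a storage service with token-based authentication lets a batch job or an external collaborator present a short-lived bearer token instead of relying on local POSIX accounts and file permissions:

    import requests

    AUTH = "https://auth.example-hpc.org/token"      # hypothetical token endpoint
    STORAGE = "https://storage.example-hpc.org"      # hypothetical storage service

    # Exchange a client credential for a short-lived, narrowly scoped token.
    token = requests.post(AUTH, data={
        "grant_type": "client_credentials",
        "client_id": "cosmo-campaign-1234",
        "client_secret": "<placeholder>",
        "scope": "read:cosmo-sim write:cosmo-sim/derived",
    }).json()["access_token"]
    headers = {"Authorization": f"Bearer {token}"}

    # Any holder of the token (a compute job, a workflow engine, a collaborator's
    # cloud function) can then read and write the named objects over HTTP.
    snap = requests.get(f"{STORAGE}/cosmo-sim/snap_042.h5", headers=headers).content
    requests.put(f"{STORAGE}/cosmo-sim/derived/halo_catalog.h5",
                 data=b"...", headers=headers)

Because access is granted by possession of a scoped token rather than by membership in a local account database, the same storage service can be reached from the HPC system, a laptop, or a commercial cloud, which is the kind of flexibility being described.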
From page 10...
... Computer architectures are evolving at a rapid pace in response to changing needs and capabilities. This evolution is driven by the use of machine learning for applications such as natural language processing, convolutional neural networks for image and video recognition, autonomous vehicles, and reinforcement learning in robotics, as well as by high-performance computing for studies in areas that include computational fluid dynamics, financial and data analytics, weather simulation, high-energy physics, and computational chemistry. Padhi highlighted ways that AWS provides capabilities that support these ever-changing needs.
From page 11...
... One reason the traditional machine learning paradigm does not work well with high-volume heterogeneous streaming data is that it can be hard to acquire suitable training data to iteratively develop decision trees. The Robust Random Cut Forest approach continually refreshes the ensemble of trees with new data as the stream evolves in time, keeping a little history as it goes forward -- essentially creating a sketching library for streaming data.
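This description maps onto the robust random cut forest technique; a minimal sketch of that streaming pattern, written here with the open-source rrcf Python package (an independent implementation, not necessarily the code behind any AWS service) and a synthetic stream with an injected anomaly, might look like this:

    import numpy as np
    import rrcf

    num_trees, tree_size, shingle_size = 40, 256, 4   # illustrative parameters

    # An ensemble of empty random cut trees.
    forest = [rrcf.RCTree() for _ in range(num_trees)]

    # Synthetic stream: a sine wave with an anomalous burst around index 600.
    stream = np.sin(2 * np.pi * np.arange(1000) / 50)
    stream[600:620] = 3.0

    # Shingling turns the scalar stream into overlapping fixed-length windows
    # (a bit of recent history attached to each point).
    scores = {}
    for index, point in enumerate(rrcf.shingle(stream, size=shingle_size)):
        for tree in forest:
            # Bound each tree's memory: once full, forget the oldest point (FIFO).
            if len(tree.leaves) > tree_size:
                tree.forget_point(index - tree_size)
            # Insert the newest point and score it by collusive displacement.
            tree.insert_point(point, index=index)
            scores[index] = scores.get(index, 0) + tree.codisp(index) / num_trees

    print(max(scores, key=scores.get))   # stream index the forest scores as most anomalous

The forest is never retrained from scratch: each tree holds a bounded, continually refreshed sample of the stream, so the ensemble adapts as the stream evolves without requiring a labeled training set up front.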
From page 12...
... He argued that the current NSF model of supporting campus-based systems, national capability-class systems, and national capacity-class systems provides a diverse cyberinfrastructure ecosystem to support academic research needs. He further argued that, given the critical role that simulation science and data-driven science play in science and engineering as well as in the U.S. economy, additional funding is needed to support acquisition at all of these levels.
From page 13...
... Microsoft also targeted the data science community as part of this effort by donating cloud services to the NSF Big Data Innovation Hubs. One particularly powerful benefit of using cloud services for research is the flexibility in the types of virtual machines that can be used.
From page 14...
... Noting that many facilities have weak systems for cataloguing data resources, Ross suggested that steps toward smarter data retention systems could include indexing capabilities or even applying a graph approach to file system metadata. Going further, some decisions about what to retain could be automated based on an understanding of the relationships between jobs and data products and among multiple data products, enabling an assessment of the implications of losing a given data product.
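As a rough, hypothetical illustration of that graph idea (the file names, job names, and costs below are invented; this is not a description of any facility's actual system), jobs and data products can be modeled as nodes in a provenance graph, and a retention decision can weigh how expensive a product would be to regenerate if it were purged:

    import networkx as nx

    # Provenance graph: edges run from inputs to the jobs that read them,
    # and from jobs to the data products they write.
    g = nx.DiGraph()
    g.add_node("obs_2024.h5", kind="data", retained=True)
    g.add_node("calibrate", kind="job", rerun_cost_hours=2)
    g.add_node("calibrated.h5", kind="data", retained=True)
    g.add_node("analyze", kind="job", rerun_cost_hours=50)
    g.add_node("spectra.h5", kind="data", retained=True)
    g.add_edges_from([
        ("obs_2024.h5", "calibrate"), ("calibrate", "calibrated.h5"),
        ("calibrated.h5", "analyze"), ("analyze", "spectra.h5"),
    ])

    def regeneration_cost(product):
        """Hours of recomputation needed to rebuild a product, or None if it
        cannot be rebuilt (e.g., raw data with no producing job)."""
        producing_jobs = list(g.predecessors(product))
        if not producing_jobs:
            return None
        total = 0
        for job in producing_jobs:
            total += g.nodes[job]["rerun_cost_hours"]
            for upstream in g.predecessors(job):
                if not g.nodes[upstream]["retained"]:
                    upstream_cost = regeneration_cost(upstream)
                    if upstream_cost is None:
                        return None
                    total += upstream_cost
        return total

    # A policy might purge intermediates that are cheap to rebuild and protect
    # anything whose loss would be expensive or unrecoverable.
    print(regeneration_cost("calibrated.h5"))   # 2: a low-risk purge candidate
    print(regeneration_cost("spectra.h5"))      # 50: probably worth keeping
    print(regeneration_cost("obs_2024.h5"))     # None: irreplaceable raw data

This is only a toy model of the relationships described; a real system would also index file system metadata at scale and account for storage cost, access patterns, and data product size.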
From page 15...
... Data commons are one of the architectures used to support data science. Data ecosystems integrate multiple data commons, cloud computing resources, and various services, applications, and resources to support data science for one or more research disciplines.
From page 16...
... Even as the underlying hardware continues to diversify due to the end of Dennard scaling and an increasing focus on accelerators and FPGAs, Reed asserted that the traditional scientific computing software ecosystem and the new data analytics and deep learning ecosystems are not converging. Today, rapid developments are concentrated at the two extremes: the very small (edge computing and sensors)
From page 17...
... Projects in these areas frequently involve both big data and simulation. To facilitate this research, ECP is incorporating and advancing technologies for AI, in situ analytics, machine learning, new approaches to steering simulations, predictive analytics, and graph analytics, among other areas.

