Meetings on Data-Intensive Science
Shifting Power: Data Analytics & the Smart Energy Grid 2020 [35]
August 23-24, 2010
Microsoft Corporate Campus
This meeting assembled an invited group of scientists and engineers from major research universities, private industry, and government to discuss the impact of large and distributed datasets, together with the associated computational challenges, on the smart energy grid. The smart grid focus was selected in part because of growing international interest in the subject and in part because the issues can be abstracted to other application domains (e.g., real-time analysis of streaming sensor data is broadly applicable). This meeting was U.S.-centric, with nominal international representation. The subject did, however, provide a valuable “use case” for discussing issues directly linked to DoD’s priority S&T investment areas, as multiple participants noted that monitoring of international smart grid initiatives may yield useful insights regarding trends and accelerators that will shape the S&T landscape in areas of interest to DoD.
Discussions throughout the two days centered on decision making from multiple stakeholder perspectives, ranging from design optimization to operational control at various levels of the grid to individual consumer decisions enabled by “smart” meters. Topics discussed included the need for real-time and predictive analytics together with better visualization techniques to inform decisions, additional grid management complexities stemming from distributed decision-making, and a spectrum of data management challenges (ownership, retention, access, privacy, etc.). In discussing research pathways to underpin smart grid initiatives, several participants noted the need to fuse data across disciplines as well as across time and space. All of these issues address research related to DoD’s “Data to Decisions” investment area.
Many participants noted the need for a ‘smart grid’ that is self-healing, i.e., able to identify and react to system disturbances and take actions to correct them with little or no human interaction. They emphasized that the smart grid would need to be resilient to human-induced and natural disasters, resisting attacks on its physical and computerized infrastructure. Achievement of these attributes would depend on further research in areas including “Engineered Resilient Systems,” “Autonomy,” “Electronic Protection,” and “Cyber Science & Technology.”
The meeting participants were selected for their expertise directly relating to some aspect of the myriad smart energy grid initiatives. Some of the BGST members who participated in the meeting observed that an unintended consequence of this selection was a tendency to focus on issues—particularly in the policy realm—that were specific to the smart energy grid application. As a result, in structuring its next meeting, BGST chose to invite participants who were working in multiple problem domains.
35 A brief summary of this meeting can be found at http://sites.nationalacademies.org/xpedio/groups/pgasite/documents/webpage/pga_062054.pdf.
Realizing the Value from Big Data [36]
February 28-March 2, 2011
Institute for Infocomm Research (I2R) of Singapore’s Agency for Science, Technology, and
This meeting convened bioinformatics scientists and environmental scientists together with computational/data scientists, who were asked to identify computational and policy roadblocks that prevent their disciplines from fully extracting value from “big data.” Bioinformatics and environmental sciences were selected not only because both are data-rich applications, but also because the underlying research challenges are inherently international. Invited participants were selected jointly with principals from Singapore’s I2R [37], which hosted the meeting, and were drawn from research organizations in Australia, China, England, Hong Kong, Japan, Korea, the Netherlands, Portugal, Singapore, and the United States. Discussions during the meeting reflected broad international interest in the subject, but also exposed difficulties inherent in communicating across problem domains as well as across cultural contexts.
A central theme was the set of challenges stemming from researchers’ need to find and use “big data” captured or generated by others; many participants agreed that improvements in this area would enhance the efficiency of their own research. Issues ranged from researchers’ inability to find and access relevant datasets to an inability to make sense of the data once access was obtained. While some barriers derived from policy (e.g., ownership, privacy), other impediments were related to the absence of standards for metadata that could enable search engines to find relevant datasets and also help researchers understand the provenance and meaning of the data. Participants working in small groups were asked to identify specific initiatives that might mitigate key barriers; suggestions ranged from the development of common abstractions that could be reused across domains, to the notion of a standardized Internet protocol that would facilitate identification and location of “big data” of interest to a research team.
Participants also discussed challenges related to the management and exploration of “big data,” e.g., the importance of common infrastructures to share the cost burden associated with “big data”; efficient processes and incentives to motivate researchers to share data; and common tools to facilitate mining and exploration of complex datasets. Participants expressed differing opinions on the definition of “big data”; some viewed it as a matter of size, while others associated it with complexity. Disciplinary differences arose, too. Most computer scientists voiced a desire to perform research at a level of abstraction above that valued by domain scientists working on specific problems (e.g., many computer scientists wanted to be seen as “enablers” rather than “plumbers”). Many participants expressed the view that a “principal investigator-centric” funding model is not well-matched to “big data” problems, as a multidisciplinary collaborative environment is needed.
Participants at this meeting identified a diverse array of issues, most of which were common to all nations represented, that today limit their ability to fully extract value from “big data.”
36 A brief summary of this meeting can be found at http://sites.nationalacademies.org/xpedio/groups/pgasite/documents/webpage/pga_062988.pdf.
Many expressed the view that a number of the barriers mentioned would require international remedies to enable researchers and practitioners to tackle the big problems that cross national borders (e.g., environment, health, and, increasingly, security). Other participants noted the relevance of “big data” issues to DoD’s “Data to Decisions” investment area, since the data required to inform decisions are increasingly heterogeneous and drawn from disparate sources that are distributed over both space and time.