Suggested Citation:"11 Session 10: Capability Technology Matrix." National Academies of Sciences, Engineering, and Medicine. 2017. Challenges in Machine Generation of Analytic Products from Multi-Source Data: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/24900.

11

Session 10: Capability Technology Matrix

MACHINE LEARNING FOR ENERGY APPLICATIONS

Devanand Shenoy, Department of Energy

Devanand Shenoy, Department of Energy (DOE), explained that although DOE is not currently supporting any artificial intelligence or machine learning projects, industry is investing heavily in these areas, and DOE is interested in relevant energy applications. Shenoy provided an overview of the offices within DOE, with particular attention to the Energy Efficiency and Renewable Energy sector, and noted that DOE oversees 17 national laboratories. He also provided an overview of the U.S. energy system.

Machine learning has interesting applications for renewable energy in particular, according to Shenoy. Trained on data from 1,600 U.S. sites, IBM’s Watt-Sun is a self-learning weather model and renewable forecasting technology that is more accurate than previous solar forecasting models. He added that machine learning also has applications for energy efficiency: applied to cooling-system data, it can reduce energy use in data centers, as Google has demonstrated. Artificial intelligence can be used to minimize wasted energy and balance the electric grid by exploiting energy demand based on analytics collected from sensors. Shenoy added that smart wells that can sense temperature, pressure, chemicals, and vibrations can be deployed to improve efficiency and mitigate failures in the oil and gas industry.

Shenoy described an occupant tracking and building energy management technology emerging from Carnegie Mellon University that utilizes floor vibration to better monitor who is in a building at a particular time. He also described structural health monitoring systems that use sensors and pattern recognition to predict future structural damage in bridges and buildings.

To learn more about opportunities for machine learning, Shenoy encouraged participants to review the Quadrennial Technology Review, which contains detailed energy technology assessments. Shenoy described some additional areas that could benefit from machine learning, many of which cross multiple industries:

Critical materials. Reduce the reliance on imported materials and increase recycling of materials.
Sustainable manufacturing. Provide the opportunity to design materials that can be reused or recycled, thus reducing energy usage.
Combined heat and power. Combine heat and electricity more efficiently.
Waste heat recovery. Recover waste heat, which in turn presents cost-saving opportunities.
Advanced sensors, controls, platforms, and modeling for manufacturing. Improve predictive maintenance and product customization. This is the topic area for one of the Manufacturing Institutes supported through the Advanced Manufacturing Office at DOE. There are currently 14 Manufacturing Institutes supported through the Department of Defense (DOD), DOE, and the Department of Commerce. These Institutes typically have $70 million of funding from the government, with at least a 50:50 cost share from industry, state and local governments, and other private entities.
Process heating. Offer new ways to save energy and increase efficiency by heating materials in different ways.
Process intensification. Scale down the size of chemical factories while increasing efficiency.
Roll-to-roll processing. Reduce the cost to print electronics and other functional materials.
Composite materials. Improve technologies in lightweighting and thermosetting.
Additive manufacturing, advanced materials manufacturing, and materials for harsh service conditions. Innovate to improve efficiency and to extend life of materials.
Wide bandgap power electronics. Increase efficiency by replacing silicon and converting power (for example, direct current to alternating current) using silicon carbide and gallium nitride materials.
Thermoelectric and direct energy. Convert heat into useful power to be more efficient in the overall system.

Shenoy turned to a discussion of challenges for microelectronics, including the concern that the electricity supply expected to be used for microelectronics will soon be unsustainable due to current transistor scaling constraints. To avoid this potential crisis, the efficiency improvement rates of the transistors and the data interconnects would need to double, according to Shenoy. Several DOE laboratories are collaborating on a multi-scale co-design framework to help address this challenge, and three DOE offices are also joining forces on it. Shenoy concluded by emphasizing that one of the best ways to address national security challenges, economic challenges, and energy efficiency challenges is through building public-private partnerships.

Jonathan Fiscus, National Institute of Standards and Technology, asked if DOE can create data sets to facilitate the kind of research discussed in Shenoy’s presentation; if so, they could be used for challenge competitions. Shenoy responded that DOE’s Advanced Manufacturing Office recently held a workshop on “Artificial Intelligence Applied to Materials Discovery and Design,” and the information gleaned from that workshop could inform a potential program to generate such data. In response to a question from Lih Young, former candidate for U.S. Senate, Maryland, Shenoy mentioned that the Defense Advanced Research Projects Agency is studying the value of public-private partnerships for microelectronics.

USING METROLOGY TO IMPROVE ACCESS TO “UNSTRUCTURED” DATA

Ellen Voorhees, National Institute of Standards and Technology

Ellen Voorhees, National Institute of Standards and Technology (NIST), shared NIST’s mission to promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve our quality of life. She noted that because it is not a regulatory body and its primary mission is not funding research, NIST is a unique federal government entity. NIST is involved in evaluation because it offers a technology-neutral site with both technical understanding and expertise in developing evaluations. NIST also keeps track of all of the artifacts via public proceedings, data sets (often publicly available), and an archive of past results.

Voorhees explained that NIST’s work in artificial intelligence spans its broad portfolio. NIST’s long-standing program in metrology for information access creates evaluation infrastructure through community evaluations, the point of which is to support research and thus improve the technology. NIST’s portfolio focuses on diverse sources (e.g., speech, text, images, video, biometrics) and diverse tasks (e.g., recognition, search, extraction, summarization), and NIST has worked in close collaboration with the intelligence community for at least the past three decades.

Voorhees emphasized that community evaluations form and/or solidify a research community, which helps to make progress on problems, establish research methodology, facilitate technology transfer, document the state of the art, and amortize the costs of infrastructure. She noted, however, that there are some drawbacks of community evaluations, including that they can take money and time from other efforts and they can have a problem with overfitting data sets.

Voorhees defined a good evaluation task as an abstraction of a real-world task that is controllable enough for the researcher to understand what is causing the performance and realistic enough that it captures the important aspects of the task. She added that evaluation metrics must predict the relative effectiveness of systems on the real task. The evaluation task must also have an adequate level of difficulty so that something can be learned. She noted that it is best if measures used to score the results of an evaluation are diagnostic. Voorhees concluded her talk with a description of the many different tasks and document types evaluated through the Text REtrieval Conference (TREC).¹
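The requirement that a metric predict relative effectiveness can be made concrete by comparing the system ranking a metric produces with the ranking observed on the real task. The sketch below is an illustration only, not a NIST or TREC procedure: the system names and scores are hypothetical, and Kendall's tau is an assumed choice of rank-agreement measure.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts keyed by system name.

    +1 means both scorings rank every pair of systems the same way,
    -1 means they disagree on every pair.
    """
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        diff_a = scores_a[s] - scores_a[t]
        diff_b = scores_b[s] - scores_b[t]
        if diff_a * diff_b > 0:
            concordant += 1
        elif diff_a * diff_b < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores: an evaluation metric versus effectiveness on the real task.
metric_scores = {"sysA": 0.42, "sysB": 0.35, "sysC": 0.51, "sysD": 0.20}
real_task_scores = {"sysA": 0.70, "sysB": 0.72, "sysC": 0.90, "sysD": 0.40}

tau = kendall_tau(metric_scores, real_task_scores)
print(f"rank agreement (Kendall tau): {tau:.3f}")
```

In this made-up data the metric swaps only sysA and sysB relative to the real task, so five of the six system pairs agree.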

Anthony Hoogs, Kitware, Inc., noted that, historically, the computer vision community has not used the data sets from TRECVid because of the associated competition constraints and data restrictions, despite the fact that it began as the largest, most formalized evaluation in the community. He highlighted a number of other competitions that do not have similar barriers. Voorhees explained that the restrictions Hoogs described do not occur with all tracks of TRECVid, and she noted that the expectation of the annualized challenge cycle is not that people will participate every year.

Peter Pirolli, Institute for Human and Machine Cognition, wondered if it would make more sense to spend time trying to understand the interaction between machines and people and to understand more about the tasks themselves than to focus on evaluating system metrics. Voorhees said that NIST’s focus on evaluating systems is informative and valuable because the community learns important information about technology, though she acknowledged that other challenges and open research questions remain.

CHALLENGE PROBLEMS FOR MULTI-SOURCE INSIGHTS

Travis W. Axtell, Office of the Under Secretary of Defense for Intelligence

Travis Axtell, Office of the Under Secretary of Defense for Intelligence, shared a sampling of comments from various DOD researchers. One researcher’s description of machine-augmented analysis of multi-source data is that multi-source apertures are intrinsically concurrent and semi-sequential: data sources are generally processed by machines using multiple execution flows that progress simultaneously (intrinsically concurrent) with a staggered offset between flows (semi-sequential). Too large an offset yields sequential execution, and the semi-sequential nature of processing adds some latency to the resulting computation. They added that while latency is often a barrier to progress, it can be beneficial if the result is improved accuracy. They noted the importance of time synchronization across sources and that components from various elements of the data-to-decision chain should weave together in an interoperable framework. They highlighted the value of data-driven insights from and across multiple scales.

Axtell shared the results from an Air Force Research Laboratory experiment to introduce a topic of value for future research: training an algorithm to recognize when change detection is working well enough to be of practical use to an operator. Another area worthy of study is the parallel development of processing frameworks across intelligence community organizations. Axtell next explained that because algorithms used today have difficulty registering moving objects, moving objects could be a focus of multi-source research. He warned that recognition and registration of moving objects can be challenging, but registration aspects will lead to interesting developments in the future. Axtell explained that it is also important to be able to share information from one source with another. Eventually, he hopes that single-source processing and multi-source processing can be automated with machine learning.

___________________

¹ For information about TREC, see http://trec.nist.gov, accessed August 29, 2017.

Axtell explained that there are a number of ways to set up and solve an optimization problem. He mentioned YALMIP,² a MATLAB toolkit that symbolically represents an optimization problem and then converts the problem to match the capabilities of the available optimization solver. According to Axtell, the intelligence community would benefit from having a symbolic representation of the neural network’s intent. With this representation, the computer could then propose an appropriate design and present a solution.
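The idea of stating a problem symbolically and then converting it to whatever form a solver accepts can be sketched in a few lines. The example below is not YALMIP and does not use its API; it is a minimal Python illustration with hypothetical class and function names, assuming a solver that only accepts linear problems of the form "minimize c·x subject to A x <= b".

```python
from dataclasses import dataclass, field


@dataclass
class Constraint:
    coeffs: dict   # variable name -> coefficient
    sense: str     # "<=" or ">="
    rhs: float


@dataclass
class Problem:
    objective: dict            # variable name -> coefficient
    maximize: bool
    constraints: list = field(default_factory=list)


def to_min_leq_form(problem):
    """Convert a symbolic linear problem to: minimize c.x subject to A x <= b."""
    names = sorted(
        set(problem.objective)
        | {v for con in problem.constraints for v in con.coeffs}
    )
    # Maximizing f is the same as minimizing -f.
    sign = -1.0 if problem.maximize else 1.0
    c = [sign * problem.objective.get(v, 0.0) for v in names]
    A, b = [], []
    for con in problem.constraints:
        if con.sense not in ("<=", ">="):
            raise ValueError(f"unsupported constraint sense: {con.sense}")
        # Multiplying a ">=" row by -1 turns it into a "<=" row.
        flip = -1.0 if con.sense == ">=" else 1.0
        A.append([flip * con.coeffs.get(v, 0.0) for v in names])
        b.append(flip * con.rhs)
    return names, c, A, b


# Hypothetical problem: maximize 3x + 2y subject to x + y <= 4 and x >= 1.
problem = Problem(
    objective={"x": 3.0, "y": 2.0},
    maximize=True,
    constraints=[
        Constraint({"x": 1.0, "y": 1.0}, "<=", 4.0),
        Constraint({"x": 1.0}, ">=", 1.0),
    ],
)
names, c, A, b = to_min_leq_form(problem)
print(names, c, A, b)
```

The user states the problem once in its natural form; the conversion layer, not the user, absorbs the restrictions of the solver.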

Hoogs mentioned that time synchronization can be a problem when multiple video sources are looking at the same outdoor scene and one wants to know the three-dimensional structure of an object in that scene, because cameras often do not have an exact notion of time to be able to synchronize down to the frame level. Axtell responded that the capability exists to develop the registration and solve for that registration between those two sources, which is more fruitful than trying to place clocks in the scenes. Hoogs added that latency (i.e., the time between capturing the image and reporting the element) and computational costs are also problematic in the computer vision community, especially with advanced methods and graphics processing units. Axtell noted that uncertainty is also a factor in these issues.
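One simple way to solve for a temporal registration between two sources, rather than relying on camera clocks, is to slide one signal against the other and keep the offset where they agree best. The sketch below is an assumption-laden illustration, not the method Axtell had in mind: the per-frame values are hypothetical (for example, mean frame brightness), the function name is invented, and plain cross-correlation is only one of several possible techniques.

```python
def estimate_offset(signal_a, signal_b, max_lag):
    """Return the lag, in frames, at which signal_b best matches signal_a.

    A positive lag means signal_b is delayed relative to signal_a.
    The score is the raw cross-correlation over the overlapping frames.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(signal_a):
            j = i + lag
            if 0 <= j < len(signal_b):
                score += a * signal_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical per-frame measurements of the same event seen by two cameras;
# camera B records the event two frames later than camera A.
camera_a = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
camera_b = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]

offset = estimate_offset(camera_a, camera_b, max_lag=4)
print(f"camera B lags camera A by {offset} frames")
```

Real footage would need normalization (for example, subtracting each signal's mean) and sub-frame interpolation; those steps are omitted here to keep the sketch short.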

In response to a question from an audience participant, Axtell explained that a time representation is needed to infer a relationship between two data files. In response to another question from the audience, Axtell emphasized the importance of understanding the conditions of each sensor type. He added that with higher speed comes more data, which makes the task of knowledge discovery even more difficult.

AN OVERVIEW OF NATIONAL SCIENCE FOUNDATION RESEARCH IN DATA ANALYTICS

James Donlon, National Science Foundation

James Donlon, National Science Foundation (NSF), summarized NSF’s mission as transforming the frontiers of science and engineering. NSF funds approximately 12,000 of the 50,000 proposals it receives each year. In FY 2017, approximately $7.5 billion was allocated to support fundamental scientific research. NSF’s Directorate for Computer and Information Science and Engineering (CISE),³ which supports machine learning, artificial intelligence, and data analytics research, received $934 million in FY 2016. Overall, NSF is responsible for 82 percent of the federal support of academic basic research in computer science. He emphasized that although basic research may seem like it is not pointed enough to meet customer needs, without it, it is impossible to make larger discoveries.

Donlon characterized NSF’s research portfolio, with particular attention to machine-augmented analysis of multi-source data. CISE has three core programs in its division on Information and Intelligent Systems, which supports research and education activities that study the interrelated roles of people, computers, and information. The Robust Intelligence program’s goal is to advance and integrate the traditions of artificial intelligence, computer vision, human language research, robotics, machine learning, computational neuroscience, cognitive science, and related areas. The Information Integration and Informatics program studies the creation, management, visualization, and understanding of diverse digital content, while the Cyber-Human Systems program studies issues of user interface, assistive technology, human–computer interaction, and the combination of social media and social networks. Donlon encouraged workshop participants to visit NSF’s website and use its publicly accessible “NSF Award Search” feature⁴ to search for the many active awards in this research space.

___________________

² The website for YALMIP is https://yalmip.github.io, accessed August 29, 2017.

³ The CISE website is https://www.nsf.gov/dir/index.jsp?org=CISE, accessed August 29, 2017.

⁴ The website for NSF’s Award Search is https://www.nsf.gov/awardsearch/, accessed August 29, 2017.

To understand the themes that play the largest role in the research supported by NSF, Donlon performed a cluster analysis of titles and abstracts of awards. The analysis showed machine learning to be the most dominant research theme of the Robust Intelligence and Information Integration and Informatics programs; however, this only means that the phrase is used often as a label, not necessarily that this theme plays a central role in research. A list of recent grants includes topics in foundational machine learning, for example adversarial machine learning, deep reinforcement learning, learning structured prediction models, uncertainty representation, and personalizing from observational data. And a select list of active awards includes multi-modal research in computer vision and human language technology, although Donlon noted from his search that, generally, problems of multi-modality seem to be of greater interest to practitioners than to NSF-supported researchers. He added that it is important for funding agencies to think about how to better encourage and fund multidisciplinary research.
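Donlon's caveat, that a dominant phrase shows how often a label is used rather than how central the theme is, can be seen in even the simplest text analysis. The sketch below is not his cluster analysis; it is a plain phrase count over hypothetical award titles (the titles and the function name are invented for illustration).

```python
from collections import Counter


def phrase_document_counts(texts, phrases):
    """Count how many texts mention each phrase at least once (case-insensitive)."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for phrase in phrases:
            if phrase in lowered:
                counts[phrase] += 1
    return counts


# Hypothetical award titles, made up for this example.
titles = [
    "Machine learning methods for multi-modal video search",
    "A machine learning approach to structured prediction",
    "Human language technology for low-resource speech",
    "Adversarial machine learning and uncertainty representation",
]
phrases = ["machine learning", "multi-modal", "computer vision"]

counts = phrase_document_counts(titles, phrases)
for phrase in phrases:
    print(f"{phrase}: {counts[phrase]} of {len(titles)} titles")
```

A count like this says only that "machine learning" appears in three of the four titles; it cannot tell whether machine learning is the research contribution or merely a tool mentioned in passing.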

CISE’s “national priorities” include big data, cybersecurity, the National Robotics Initiative, understanding the brain, the National Strategic Computing Initiative, Smart Cities, Computer Science for All, and advanced wireless research. Donlon concluded his talk with a summary of what he discovered about NSF’s funding in relation to the themes of the workshop:

Machine learning is everywhere and has been especially successful within a single modality on data-rich classification problems and where the problem space is sufficiently constrained.
Multi-modal, multi-source data was a focus of only a small number of projects, which indicated that the funding agencies may have more work to do in engaging the community.
Solving the end-to-end problem will always be an engineering endeavor that brings to bear outcomes from research across the whole spectrum.
The impact of big data is evident everywhere, as are the problems and opportunities for social networks.

Kathy McKeown, Columbia University, agreed with Donlon that NSF needs to think carefully about how it funds interdisciplinary research. She also noted that it is very difficult to secure funding for big data research if it does not immediately connect to the foundational area that is being supported. She suggested NSF consider creating another core program that would be more interdisciplinary in nature to address both of these funding issues. Donlon acknowledged McKeown’s input as very useful for NSF. He introduced NSF’s Research Advanced by Interdisciplinary Science and Engineering (RAISE) program, which invites multidisciplinary research and reduces some of the stresses on the merit review community, but he noted that NSF has more work to do to address this systemic problem.

Hoogs highlighted the new field of visual question answering (an example of the integration of vision and language), which plays a prominent role in the computer vision community. He suggested that it would be valuable to look at where the experts in this field received their funding since they were able to bridge the gap between fields successfully.
