Challenges in Machine Generation of Analytic Products from Multi-Source Data
PROCEEDINGS OF A WORKSHOP
Linda Casola, Rapporteur
Intelligence Community Studies Board
Division on Engineering and Physical Sciences
THE NATIONAL ACADEMIES PRESS
Washington, DC
www.nap.edu
THE NATIONAL ACADEMIES PRESS, 500 Fifth Street, NW, Washington, DC 20001
This activity was supported by Contract No. 2014-14041100003-012 between the National Academy of Sciences and the Office of the Director of National Intelligence. Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.
International Standard Book Number-13: 978-0-309-46573-1
International Standard Book Number-10: 0-309-46573-7
Digital Object Identifier: https://doi.org/10.17226/24900
Additional copies of this publication are available for sale from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu.
Copyright 2017 by the National Academy of Sciences. All rights reserved.
Printed in the United States of America
Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2017. Challenges in Machine Generation of Analytic Products from Multi-Source Data: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: https://doi.org/10.17226/24900.
The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.
The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. C. D. Mote, Jr., is president.
The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.
The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.
Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.
Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee’s deliberations. Each report has been subjected to a rigorous and independent peer-review process and it represents the position of the National Academies on the statement of task.
Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.
For information about other products and activities of the National Academies, please visit www.nationalacademies.org/about/whatwedo.
PLANNING COMMITTEE ON THE INTELLIGENCE MACHINE ANALYTICS WORKSHOP
RAMA CHELLAPPA, University of Maryland, College Park, Chair
THOMAS DIETTERICH, Oregon State University
ANTHONY HOOGS, Kitware, Inc.
JOHN E. KELLY III, NAE,1 International Business Machines Corporation
KATHLEEN McKEOWN, Columbia University
JOSEPH L. MUNDY, Vision Systems, Inc.
Staff
GEORGE COYLE, Senior Program Officer, Study Director
CHRIS JONES, Financial Officer
MARGUERITE SCHNEIDER, Administrative Coordinator
DIONNA ALI, Research Assistant
ADRIANNA HARGROVE, Senior Program Assistant/Financial Assistant
___________________
1 Member, National Academy of Engineering.
INTELLIGENCE COMMUNITY STUDIES BOARD
DONALD M. KERR, Independent Consultant, Chair
JULIE BRILL, Microsoft Corporation
FREDERICK CHANG, NAE,1 Southern Methodist University
TOMÁS DÍAZ DE LA RUBIA, Purdue University Discovery Park
ROBERT C. DYNES, NAS,2 University of California, San Diego
ROBERT FEIN, McLean Hospital/Harvard Medical School
MIRIAM JOHN, Independent Consultant
ANITA JONES, NAE, University of Virginia
ROBERT H. LATIFF, R. Latiff Associates
MARK LOWENTHAL, Johns Hopkins University
MICHAEL MARLETTA, NAS/NAM,3 University of California, Berkeley
L. ROGER MASON, JR., Noblis
ELIZABETH RINDSKOPF PARKER, State Bar of California
WILLIAM H. PRESS, NAS, University of Texas, Austin
DAVID A. RELMAN, NAM, Stanford University
Staff
ALAN SHAW, Director
ANDREW KREEGER, Program Officer
CHRIS JONES, Financial Officer
MARGUERITE SCHNEIDER, Administrative Coordinator
DIONNA ALI, Research Assistant
STEVEN DARBES, Research Assistant
ADRIANNA HARGROVE, Senior Program Assistant/Financial Assistant
___________________
1 Member, National Academy of Engineering.
2 Member, National Academy of Sciences.
3 Member, National Academy of Medicine.
Acknowledgment of Reviewers
This Proceedings of a Workshop was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies of Sciences, Engineering, and Medicine in making each published proceedings as sound as possible and to ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the charge. The review comments and draft manuscript remain confidential to protect the integrity of the process.
We thank the following individuals for their review of this proceedings:
Anthony Hoogs, Kitware, Inc.,
Kathleen McKeown, Columbia University,
John Montgomery, NAE,1 Naval Research Laboratory (retired),
Noah A. Smith, University of Washington, and
Peter Weinberger, Google, Inc.
Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the content of the proceedings nor did they see the final draft before its release. The review of this proceedings was overseen by Edward W. Felten, NAE, Princeton University, who was responsible for making certain that an independent examination of this proceedings was carried out in accordance with standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the rapporteur and the National Academies.
___________________
1 Member, National Academy of Engineering.
Contents
Operational Perspective: Project Maven
3 SESSION 2: MACHINE LEARNING FROM IMAGE, VIDEO, AND MAP DATA
Learning from Overhead Imagery
Deep Learning for Learning from Images and Videos: Is It Real?
Learning about Human Activities from Images and Videos
4 SESSION 3: MACHINE LEARNING FROM NATURAL LANGUAGES
Machine Learning from Text: Applications
Deep Learning for Natural Language Processing
Machine Learning from Conversational Speech
5 SESSION 4: LEARNING FROM MULTI-SOURCE DATA
Situational Awareness from Multiple Unstructured Sources
6 SESSION 5: LEARNING FROM NOISY, ADVERSARIAL INPUTS
Harnessing Machine Learning for Global Discovery at Scale
8 SESSION 7: HUMANS AND MACHINES WORKING TOGETHER WITH BIG DATA
Sensemaking Systems and Models
Crowdsourcing for Natural Language Processing
9 SESSION 8: USE OF MACHINE LEARNING FOR PRIVACY ETHICS
Toward Socio-Cultural Machine Learning
10 SESSION 9: EVALUATION OF MACHINE-GENERATED PRODUCTS
11 SESSION 10: CAPABILITY TECHNOLOGY MATRIX
Machine Learning for Energy Applications
Using Metrology to Improve Access to “Unstructured” Data
Challenge Problems for Multi-Source Insights
An Overview of National Science Foundation Research in Data Analytics
A BIOGRAPHICAL SKETCHES OF WORKSHOP PLANNING COMMITTEE