David M. Isaacson, Office of the Director of National Intelligence
Rama Chellappa, University of Maryland, College Park
George Coyle, National Academies of Sciences, Engineering, and Medicine
David Isaacson, Office of the Director of National Intelligence (ODNI), opened the workshop by providing an overview of ODNI as well as its expectations for the workshop. He explained that ODNI is developing a framework to identify emerging intelligence challenges, which could help generate investment guidance for research and development. The general information pipeline in the intelligence community includes tasking, collection, processing, exploitation, and product dissemination stages. Each stage is labor-intensive and time-consuming, with human-performed tasks introducing latency into the pipeline. New machine analysis techniques have the potential to improve decision making and quicken product deployment in the intelligence community by automating processes and reducing the number of bottlenecks that currently exist in the pipeline.
Noting ODNI’s preliminary work in this area, Isaacson discussed the Xpress Challenge,1 a joint competition sponsored by ODNI and the Office of the Under Secretary of Defense for Intelligence (OUSD(I)). Three hundred eighty-seven people from 42 countries registered for the challenge, and 15 of these registrants submitted solutions that are currently under review by ODNI. The task was to create an algorithm for an analytic product, using established intelligence community formatting and evaluation criteria, that could address an intelligence question and be used to aid policy makers and war fighters. Upon completion of the Xpress Challenge, ODNI and OUSD(I) created the Xtend Challenge,2 scheduled to open in Fall 2017. Xtend Challenge participants will be asked to develop methods for machine evaluation of analytic products, which could be used to improve the quality of traditional human-generated analytic products. Isaacson believes these challenges reveal information about the current state of the art and what is next (and “after next”) in innovation for the intelligence community.
A central topic of exploration for this workshop, Isaacson explained, is how machines and human analysts can share analytical tasks to improve efficiency and reliability. It is also important, he noted, to ensure that funding for such research is adequate but not duplicated among government sponsors. In light of these issues, Isaacson requested that the workshop participants create a capability technology matrix (see Appendix D3) that would connect current and future research challenges with current or planned investments for relevant federal agencies. Isaacson encouraged speakers and participants to be forward-thinking in their discussions of promising research opportunities and effective technologies.
2 When open, the Xtend Challenge will be posted on Challenge.gov at https://www.challenge.gov/agency/office-ofdirector-of-national-intelligence/.
Workshop chair Rama Chellappa, University of Maryland, College Park, commented that certain issues are important for research and innovation in national security, such as human and/or machine bias in decision-making tasks, adversarial input, product quality, result accuracy, multi-source data, social media data sources, and the balance of physics and geometry with data.4 George Coyle, National Academies, noted that each speaker would be asked to identify technologies (either hardware or software) enabling the artificial intelligence/machine learning capabilities addressed in their respective presentations, and federal government representatives would offer an overview of the state of artificial intelligence/machine learning investment in their respective agencies.
Tom Dietterich, Oregon State University
Tom Dietterich, Oregon State University, provided a brief overview of progress in machine learning over the past 10–20 years. Supervised learning is the process of training a computer to identify or classify an unknown input based on some previously observed training examples. For example, supervised learning can train a handwriting recognition system to identify letters and digits based on previous handwriting samples that have been labeled accordingly.
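Dietterich's handwriting example can be illustrated with the simplest possible supervised learner. The sketch below is a minimal illustration, not code from the workshop: invented 2-D points stand in for labeled handwriting features, and an unknown input is classified by finding its nearest labeled training example.

```python
# Minimal supervised-learning sketch: a 1-nearest-neighbor classifier.
# The training data are invented for illustration; real handwriting
# recognition would use labeled pixel features instead of 2-D points.

def nearest_neighbor(train, query):
    """Return the label of the training example closest to `query`."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda ex: dist2(ex[0], query))
    return label

# Previously observed, labeled training examples.
train = [((0.0, 0.0), "zero"), ((0.1, 0.2), "zero"),
         ((1.0, 1.0), "one"),  ((0.9, 1.1), "one")]

# Classify an unknown input based on the training set.
print(nearest_neighbor(train, (0.95, 1.0)))  # prints: one
```

The "training" here is merely memorizing the labeled examples; more powerful methods fit an explicit model to them, as the two paradigms below describe.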
Dietterich described two fundamental paradigms that have been explored in machine learning: the probabilistic modeling paradigm and the end-to-end function learning paradigm. The learning process used in the probabilistic modeling paradigm involves designing and fitting a probabilistic model to the data and performing probabilistic inference to make a prediction. For the end-to-end function learning paradigm, a space of parameterized functions is defined and then the best setting of the parameters is determined by solving an optimization problem.
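The end-to-end function learning paradigm can be sketched as follows (a minimal illustration, not an example from the workshop): define a space of parameterized functions, here lines f(x) = w·x + b, and find the best parameter setting by solving an optimization problem with gradient descent.

```python
# End-to-end function learning sketch: the function space is the set of
# lines f(x) = w*x + b, and "learning" is solving an optimization problem
# (least squares) over the parameters w and b by gradient descent.
# The data are synthetic: y = 2x + 1, so the optimum is near w=2, b=1.

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    # Gradient of mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```

Deep learning follows the same recipe; only the function space (stacked nonlinear layers) and the optimizer (stochastic gradient variants) are richer.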
During the past 15–20 years, Dietterich noted, there has been much progress in both paradigms. The probabilistic modeling space has expanded with the development of (1) rich probabilistic models such as complex mixture models, (2) non-parametric models that allow the number of parameters to increase or decrease based on the amount of available data, and, most recently, (3) probabilistic programming.5 Dietterich explained that there have been corresponding innovations in probabilistic inference methods, including in belief propagation, variational inference, Markov chain Monte Carlo, and Hamiltonian Monte Carlo techniques. Probabilistic modeling can help represent knowledge related to the problem at hand, reason about latent variables (which is crucial for the intelligence community), and generate calibrated assessments of uncertainty in the conclusions.
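The calibrated-uncertainty point can be illustrated with a textbook conjugate model (not one discussed at the workshop): a Beta prior over an unknown success probability is updated exactly by observed counts, and the posterior's spread quantifies the uncertainty that remains after seeing the data.

```python
# Probabilistic-modeling sketch: a Beta-Binomial model of an unknown
# success probability. The Beta prior is conjugate to the Binomial
# likelihood, so inference is an exact closed-form update, and the
# posterior's variance is a calibrated statement of remaining uncertainty.

def beta_posterior(alpha, beta, successes, failures):
    """Update a Beta(alpha, beta) prior with observed counts."""
    return alpha + successes, beta + failures

def beta_mean_var(alpha, beta):
    mean = alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, var

# Uniform prior Beta(1, 1); observe 7 successes and 3 failures.
a, b = beta_posterior(1.0, 1.0, 7, 3)
mean, var = beta_mean_var(a, b)
print(a, b)            # posterior is Beta(8, 4)
print(round(mean, 3))  # posterior mean 2/3
```

Probabilistic programming languages such as Stan generalize this idea: the modeler writes down the model, and the system performs the inference even when no closed form exists.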
Dietterich described similar successes in function learning over the past two decades. Complex and expressive function classes have been developed, including support vector machines, decision tree ensembles, multi-layer perceptrons, convolutional neural networks, long short-term memory networks, and residual networks. Dietterich added that experimentation with various models continues today at a rapid pace. He noted that creating language to describe models is only a starting point; the next step is to solve the optimization problem of fitting these models. Progress in this area has been made in large part owing to collaborations between machine learning experts and operations research and optimization theory experts. With sufficient data, Dietterich explained, function learning can be more accurate than probabilistic modeling, in part because fewer modeling assumptions are made and less effort is required for model development.
3 A diligent attempt was made to develop a full Capability Technology Matrix. This proved too aggressive to accomplish within the scope of a 2-day workshop. Nevertheless, each expert speaker was asked to present their thoughts on near- and long-term enabling technologies they believed would be useful to intelligence community analysts. These technologies are presented as a Capability Technology Table in Appendix D. Also, not all speakers responded to the request to identify enabling technologies. This caveat applies to all further references to the matrix in these workshop proceedings.
5 An example of a probabilistic programming language is Church, a Scheme-like language that allows a probability distribution to be described by writing a program. Stan is a probabilistic programming language that has been embraced by the statistics community. For more information about probabilistic programming languages, see http://probabilistic-programming.org/wiki/Home, accessed September 4, 2017.
Despite the many strengths of the probabilistic modeling paradigm and the end-to-end function learning paradigm, Dietterich explained that both have weaknesses: the assumption of stationarity; the need for an abundance of data; data collection processes that are frequently, but often unnecessarily, biased; the brittleness of end-to-end training; and the need for new methods for verification, validation, and monitoring. To address this last weakness, Dietterich suggested increased collaboration among the software engineering, reliability engineering, and machine learning communities.
Concluding his presentation, Dietterich listed the machine learning community's current capabilities: (1) applying deep learning to signals-type data if they are stationary and if there is enough training data, and (2) adapting quickly to new problems using fine tuning and, to some extent, anomaly detection. In the next 3 to 5 years, Dietterich expects to see further improvement in the following:
- Open category classification,
- Automatic detection of biased and untrusted data sources,
- Anomaly detection on time-varying and network data, and
- Initial methods for validation and system monitoring.
In the longer term, he hopes there will be advances in the following:
- Defense against adversarial examples,
- Integration of large knowledge bases with machine learning,
- Use of meta reasoning to develop and test hypotheses about data source reliability, and
- Multi-scale machine learning.
Devanand Shenoy, Department of Energy, asked how much learning is impacted by computing power. Dietterich responded that computing power is critical to processing more data. This has motivated companies to develop new specialized chips. For example, Google is developing tensor processing units. He added that hardware improvement is an area in which much progress can be made. An audience participant observed that Dietterich’s presentation primarily focused on machine learning within a particular modality of data and wondered what that implies about the readiness of machine learning to handle multi-modal data. Dietterich agreed with the participant that machine learning develops point solutions to single data sources, so the machine learning community is not the best place to look for solutions related to multi-modal data. In response to a question from an audience participant about multi-scale methods, Dietterich noted that such methods are needed to detect changes at many scales and adjust accordingly.
Josyula R. Rao, IBM Corporation
Josyula Rao, IBM Corporation, focused his presentation on how machine learning has been incorporated in the security sector as well as how it may be utilized in the future. He shared that in 2016, more than 4 billion records were breached across a variety of industries. He added that while a system can be breached within hours or minutes, detecting a breach can take weeks or months, and a breach costs, on average, $7 million in the United States. He explained that because each security tool addresses a different kind of problem, security officers must choose among too many tools, which fragments visibility across the enterprise. Machine learning techniques can reduce this fragmentation and also address the skills gap that exists in the security workforce.
Rao provided an overview of a typical cybersecurity attack. Many attacks, he explained, combine a social engineering attack with an advanced persistent threat, resulting in the implantation of malware in the enterprise. The usual attack starts by spear phishing a vulnerable user; then, malware is downloaded and a back door is installed on a vulnerable machine. Malware moves laterally within the enterprise, subverting more machines until it locates the target and exfiltrates the information. Rao remarked that a new approach toward security control that can address these many vulnerabilities is needed in order to make better cybersecurity decisions.
Today, security analysts are employed in security operation centers to monitor activities within enterprise networks, but these analysts have an increasingly difficult, time-consuming, and often mundane task. After it detects a threatening activity, the enterprise has to work backward and identify affected users and devices, which, Rao emphasized, takes too long. A process that responds to an attack in hours rather than months is needed, but industry is only just starting to explore technologies with this capability.
Rao presented the new life cycle of security defense: detect an event, distinguish between anomalous and malicious, consult threat intelligence (using natural language processing techniques), connect external threat intelligence with the internal threat to begin reasoning, connect the business operations, and provide automation. Rao added that the security industry has used advanced visualizations (e.g., vision and speech) and adversarial machine learning techniques to try to improve their approaches. He described an emerging data-driven approach (based on machine learning) that complements knowledge-driven approaches; this “man–machine symbiosis” is what will move the security field forward, according to Rao. Rao highlighted the ways in which machine learning techniques can be used to classify unstructured data and build defenses using fine-grained parameters around those determined to be high-value assets. He noted, however, that there are both organizational challenges and big data challenges associated with this process (e.g., volume, velocity, variety, veracity, scalability, and accuracy). He added that industry would benefit from more attention to offense instead of dedicating all of its efforts toward defense against security breaches.
Rao provided a few examples of useful data-driven approaches and models for security, including using rule-based systems that study generic attack patterns, viewing domain name system (DNS) traffic to identify evasive cohort behaviors of botnets, detecting beaconing, analyzing web access patterns, predicting malicious URLs, passively exploring networks, and performing device anomaly detection. He explained that adversaries use “fast-fluxing” (i.e., rapidly changing an IP address) and domain name generation to evade detection.
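One of these approaches, beaconing detection, can be sketched in a few lines: malware that "phones home" tends to connect at nearly fixed intervals, so highly regular inter-arrival times are a red flag. The timestamps and threshold below are invented for illustration, not values from the presentation.

```python
# Beaconing-detection sketch: flag a host whose outbound connection
# times are suspiciously regular. Timestamps (in seconds) and the
# threshold are invented illustrative values.

def looks_like_beacon(timestamps, max_cv=0.1):
    """Flag a timestamp series whose gaps have a low coefficient of
    variation (std/mean), i.e., suspiciously regular spacing."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        return False
    mean = sum(gaps) / len(gaps)
    std = (sum((g - mean) ** 2 for g in gaps) / len(gaps)) ** 0.5
    return std / mean <= max_cv

beacon = [0, 60, 120, 181, 240, 300]      # ~every 60 s: suspicious
browsing = [0, 5, 47, 48, 200, 210, 600]  # bursty human traffic

print(looks_like_beacon(beacon))    # True
print(looks_like_beacon(browsing))  # False
```

Real systems combine many such weak signals (as Rao's examples suggest), since adversaries can add jitter to their beacon intervals precisely to defeat simple regularity tests like this one.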
For insider threats, Rao explained that user activity monitoring is helpful. Big data approaches can be used to identify at-risk users, discover risky behavior, and uncover fraudulent activity. He added that social media monitoring, in combination with cognitive analytics and data mining, is also useful for identifying and predicting abnormal behaviors in an organization.
Rao explained that because there is an abundance of unstructured data available, it is necessary to build a security ontology in order to better understand observations as they relate to the realities of the external world. He also noted that artificial intelligence techniques can be used to secure cognitive systems. Rao concluded by reiterating that the human will never be displaced from security measures; the best approach is to combine data-driven intelligence with knowledge-driven intelligence.
Shenoy asked what can be done with bitcoin technology to eliminate its potential vulnerability. Rao noted that one idea is to remove the anonymity associated with using bitcoin, but neither academia nor industry is likely to support such a controversial solution. Anthony Hoogs, Kitware, Inc., asked if there has been any achievement in which a new attack is anticipated before it happens. Rao explained that most attackers can circumvent an enterprise’s security controls, but if an enterprise can get to attackers before they access the data, that would be considered a predictive success because the damage has been prevented. Kathleen (Kathy) McKeown, Columbia University, asked what kind of ground truth data can be used to learn in different and new security scenarios that arise. Rao noted that the action has shifted to profiling what is normal for a user and forming a loose boundary around those activities and participating communities. If there is an attack, it will traverse those different communities, so then the anomalous activity must be identified through causality tracking. An audience participant asked how Rao’s solution recognizes something that seems abnormal as normal. Rao said that this situation is called “drift detection,” and it is relatively easy to track (e.g., in the case of an employee who accesses private information because he has new privileges from a promotion, not because of an intent to breach security).
Travis W. Axtell, Office of the Under Secretary of Defense for Intelligence
Travis Axtell, Office of the Under Secretary of Defense for Intelligence (OUSD(I)), opened his presentation by encouraging workshop participants to watch Lt. Gen. Jack Shanahan’s keynote address on Project Maven from the 2017 Conference on Geospatial Intelligence.6 Axtell noted that, in collaboration with international defense partners, the science and technology community in the U.S. Department of Defense (DOD) is already (1) using machine learning in both limited and full operation systems, (2) applying machine learning concepts to help analysts downselect and upselect pieces of information, (3) planning an artificial intelligence workshop, and (4) expanding the artificial intelligence/machine learning practitioner workforce beyond the research organizations. Axtell emphasized that workforce is as important as the algorithms and the methods themselves when discussing the future of machine learning.
Quoting guidance from former Deputy Secretary of Defense Robert O. Work, Axtell explained that DOD must “do much more and move much faster . . . to take advantage of recent and future advances in [artificial intelligence, big data, and deep learning]” and has thus “establish[ed] the Algorithmic Warfare Cross-Functional Team [Project Maven] to accelerate DOD’s integration of big data and machine learning . . . to turn the enormous volume of data available to DOD into actionable intelligence and insights at speed.”7 He noted that Project Maven expands research in artificial intelligence and increases the speed at which warfighters can be supported, jobs can be created, and automation engines can be developed for difficult problems. Axtell added that Project Maven will also work with data needed for wide-area motion imagery processing. He introduced five critical areas associated with Project Maven’s plan: (1) develop a data-labeling enterprise, (2) apply neural networks for computer vision, (3) enhance computing power, (4) establish program of record integration, and (5) increase user engagement.
Axtell next moved to a brief discussion of ontology. He acknowledged that ontologies8 can be a reasonable starting point but explained that they can reflect an outmoded way of thinking that may not be useful for solving current problems. Instead, he suggested that researchers label the data based on what is in the data and allow the ontology to grow from there. He added that it is also necessary to establish a pipeline to the program of record so that updates are available often. This pipeline includes data pre-processing, data labeling, algorithm development, and algorithm integration.
Axtell highlighted the mission areas of the six central activities of Project Maven: (1) computer vision, (2) document media exploitation, (3) collection management, (4) all-source analysis, (5) targeting, and (6) indications and warning. He emphasized that the need to utilize available masses of structured, labeled data sets extends beyond the intelligence community, throughout the entire DOD, and to other national security agencies.
Opening the discussion section, Joseph Mundy, Vision Systems, Inc., asked how to best transition technology (e.g., by providing toolkits and computing infrastructure). Axtell explained that specific methods to improve transition are still being evaluated but hopefully technology integration will be more customized to each program office and to each task in the future. Axtell also suggested that interesting business relationships may emerge as a result of Project Maven, as it is crucial to partner with a company that has a long-term relationship with and understanding of the government’s processes when transitioning to new technologies.
Jay Hughes, Space and Naval Warfare Systems Command, asked if machine learning or deep learning algorithmic applications can address the time delay associated with the contract process. Axtell agreed that cycles of slow responses are barriers to progress. Insights about the latest state of the art shared at this workshop can be useful in improving DOD’s acquisition processes, Axtell continued. McKeown added that there are funding opportunities in the legal field to explore how machine learning techniques can be better utilized to analyze the contract process.
6 The address can be viewed at Vimeo, “Keynote: Lt. Gen. John N.T. “Jack” Shanahan, Director for Defense Intelligence, Warfighter Support, OUSD(I),” https://vimeo.com/220699218, accessed August 24, 2017.
7 R.O. Work, 2017, “Memorandum for Establishment of the Algorithmic Warfare Cross-Functional Team,” https://www.govexec.com/media/gbc/docs/pdfs_edit/establishment_of_the_awcft_project_maven.pdf.
8 For further discussion on ontologies, see summaries of the presentations of Josyula Rao, Joseph Mundy, Rama Chellappa, Kathy McKeown, and Dragomir Radev.
A participant asked Axtell if DOD uses crowdsourcing for data labeling. Axtell responded that DOD is evaluating various labeling strategies, including crowdsourcing. Some members of the workforce are eager to label data because they know it will help the military; the engineering science workforce will likely perform data labeling on a part-time basis depending on the type of data sets needed and the urgency of those needs. Axtell indicated that whichever strategy is adopted, a cultural change will be involved. Rao inquired about DOD’s approach to securing both the talent and the technologies needed to execute the goals of Project Maven. Axtell said that the primary focus of Project Maven is to acquire and build relationships with existing commercial capabilities. Ultimately, DOD’s objective is to maintain a competitive advantage. A participant asked if Axtell’s team plans to engage the larger research and government funding communities for guidance. Axtell emphasized that traditional relationships (e.g., between the Defense Advanced Research Projects Agency and academia) will continue while new relationships are being formed.