Kathleen Fisher moderated a panel on the workshop’s second day focused on the offensive uses of artificial intelligence (AI) and machine learning (ML). Fisher, chair of the Computer Science Department at Tufts University, previously served as a program manager at the Defense Advanced Research Projects Agency (DARPA) where she launched and managed the HACMS1 and PPAML2 programs. The session’s panelists were David Brumley, Bosch Security and Privacy Professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University and CEO and co-founder, ForAllSecure, Inc.; Tyler Moore, Tandy Associate Professor of Cyber Security and Information Assurance in the Tandy School of Computer Science, University of Tulsa; and Wyatt Hoffman, senior research analyst with the Nuclear Policy Program and the Cyber Policy Initiative at the Carnegie Endowment for International Peace.
Fisher provided opening remarks to help introduce and frame the panel’s discussion, identifying four key topics for the panelists to consider: implications of AI across the “cyber kill chain,” the importance of understanding an attacker’s motivation and intent, the evolving landscape of international conflict, and the potential for AI to introduce a cyber arms race.
Artificial Intelligence Across the Cyber Kill Chain
Fisher began by considering the Lockheed Martin cyber kill chain, a framework for understanding the structure of a cyberattack in terms of seven stages: reconnaissance, weaponization, delivery, exploitation, installation, command and control, and actions on objective.3 She noted that AI and ML could have implications for all stages of the cyber kill chain, with the potential role and utility of any particular AI technology likely varying across the different levels.4
1 Defense Advanced Research Projects Agency, “High-Assurance Cyber Military Systems Program,” https://www.darpa.mil/program/high-assurance-cyber-military-systems, last accessed March 11, 2019.
2 Defense Advanced Research Projects Agency, “Probabilistic Programming for Advancing Machine Learning Program,” https://www.darpa.mil/program/probabilistic-programming-for-advancing-machine-learning.
3 Lockheed Martin, “Cyber Kill Chain,” https://www.lockheedmartin.com/en-us/capabilities/cyber/cyber-kill-chain.html, last accessed March 11, 2019.
4 Fisher also pointed out that our shared notion of what tools are considered “AI” has shifted over time. For example, search engines used to be considered AI technologies, but they have become so commonplace today that they are not typically considered to be AI.
She offered insights on a range of examples, including the simple case of the use of search engines to gather information on a target during the reconnaissance stage. In the weaponization through installation phases, AI and ML could be effective in numerous ways. One key application is in spearphishing attacks. AI—for example, via mining of information available on social media, perhaps combined with text generation techniques—could soon make it exceedingly difficult for people to distinguish between benign email messages and spearphishing messages.
Fisher also suggested that AI could have a significant impact on authentication-based attacks; ML can generate synthetic voiceprints or fingerprints that could be used to fool an authentication system. For example, existing tools, such as Lyrebird,5 can generate fake audio of an individual’s voice using sample recordings as an input. The ability to spoof an individual’s voice to fool a voice authentication system has been recently demonstrated6 as part of a qualifying event, organized by Yan Shoshitaishvili, for the 2018 DEF CON Capture the Flag (CTF) competition—however, while the spoof completely fooled the computer system, it was not convincing to humans.
The potential for AI to assist deeper into the cyber kill chain has been explored via DARPA’s Cyber Grand Challenge (CGC) competition. Some plausibility has been demonstrated, primarily based upon “first-wave” AI tools, such as expert systems and rule-based systems, as opposed to statistical ML. Fisher noted that Mike Walker, the DARPA program manager for the CGC, has hypothesized that AI/ML will be least effective at the core of program analysis, the “hard” part of the cyber kill chain, including things like debugging, decompilation, reachability analysis, and finding vulnerabilities and patches. She noted that David Brumley, the first panelist and member of the winning CGC team, would have the opportunity to comment on this based upon his CGC experiences. For the final part of the cyber kill chain—command and control—she suggested that AI/ML could enable deployed malware to act independently, without having to “phone home” for instructions.
Understanding Motivations and Intent
Fisher emphasized the importance of understanding an attacker’s underlying intent in informing how AI and ML technologies might be used for a particular offensive task. She provided several examples of attacker profiles. An attacker with malicious intent toward a specific person might utilize fake news or videos, for example, to create revenge porn. An attacker with criminal intent might be motivated by profit and interested in AI tools that could be used to enable extortion, or to steal money or property. At the level of national security, an advanced persistent threat (APT) might desire to steal intellectual property (an approach used by China), influence elections (such as Russia’s actions in the 2016 U.S. presidential election), complement physical attacks (such as when Russia disabled the Ukrainian power grid to augment its efforts to annex Crimea), or surveil a group of people (as in China’s surveillance of the Uighur population7). Finally, an attacker intent on loss of life could be interested in lethal autonomous weapons.
Fisher raised several provocative ideas related to cybersecurity and the future of national security and peace for the panel to consider. First, she asked how the future of warfare and international conflict might change given the potential for a cyberattack to achieve physical-world outcomes, such as remotely disabling an adversary’s power grid. To what extent will physical resources like aircraft carriers still be needed?
Next, she asked whether the United States is at a disadvantage compared to its nation-state adversaries when it comes to studying and deploying AI technologies. For example, the United States has a culturally strong respect for principles such as privacy and human rights. She noted that commitment to these values could limit the ability to study and comprehend certain adversarial uses of technology, while adversaries could be uninhibited in their practice of these approaches.
7 See, for example, I. Cockerell, 2019, “Inside China’s Massive Surveillance Operation,” Wired, May 9, https://www.wired.com/story/inside-chinas-massive-surveillance-operation/.
In addition, the nature of separation between government and industry is different for the United States compared to other nations. In China, for example, the separation is less clear, and data can be shared more seamlessly between government and industry; this could provide an advantage for ML-based applications, which typically require access to large data sets.
A Cyber Arms Race
Finally, Fisher raised the potential for AI to fuel a cyber arms race, given that AI presents the potential for cyber weapons to operate much faster than—and even independent of—humans. She suggested that the established doctrines developed for nuclear arms may not apply here, for several reasons. First, identifying the source of a nuclear attack would be relatively straightforward, but attribution of a cyberattack is much harder and can take a significant amount of time. Second, a cyber actor may feel pressure to use a zero day, a previously unknown vulnerability, before it expires (i.e., before it becomes public knowledge), an incentive that might not have an analog in the nuclear context. She suggested that we will lose the race if an adversary obtains effective AI-enabled cyberweapons before us.
Fisher challenged panelists and attendees to consider these points in their discussions of the implications of AI across the broader cybersecurity landscape.
David Brumley, Carnegie Mellon University and ForAllSecure
Brumley discussed his experiences and lessons learned from hacking competitions and challenges at the annual DEF CON conference and the DARPA CGC, and provided insights on the use of AI and ML in launching—and thwarting—cyberattacks. In particular, he suggested that it is more beneficial to develop strategies for (1) achieving system autonomy (as opposed to focusing on the deployment of ML) and (2) achieving the specific end goal as defined within the constraints of the game, rather than holding up an abstract notion of security as the objective.
Lessons from DEF CON
DEF CON’s8 CTF competitions challenge the world’s best-ranked hackers to try to break into each other’s computer systems while trying to protect their own. Each team works to steal a file, called a flag, from another team, which they then submit to a scoring server that awards them points for the success. The process is carried out in rounds, meaning that a winning team must demonstrate persistence—both in the ability to fix (or “patch”) their own system’s weaknesses, and to break others’ constantly improving defenses.
With funding from the National Science Foundation (NSF), Brumley—an academic who likes to view cybersecurity through an offensive lens—started a competitive hacking team of undergraduates at Carnegie Mellon University that has won more DEF CON CTFs than any other team. In Brumley’s view, the competition—while different in some ways from real-world contexts—can serve as a useful microcosm for many questions related to cyberattacks and cybersecurity.
Brumley shared some of the lessons he learned from DEF CON. He began by noting that there are some differences between the competition and real-life scenarios. For example, all CTF competitors run the same software, and the contest is somewhat divorced from policy and moral considerations that are raised in real-world contexts. Thus, some types of attacks used effectively in the competition might not be exercised by the U.S. government.
For example, some of the best teams perform a practice known as reflection, where they observe how they are being exploited by one team and then adopt that same approach for re-use in an attack on some other team that is unlikely to already understand how it works. He suggested that the United States could adopt a similar approach: identify all zero-days used against it, and stockpile them for potential future use against a different adversary.9
8 Rapporteur’s note: DEF CON is a major hacking conference that occurs annually and features real-time team-based hacking competitions known as Capture the Flag (CTF). See the DEF CON website at https://www.defcon.org/.
In addition, the incentives of CTF center around winning, rather than around the abstract concept of how to be secure. He suggested that such economics might translate usefully into real-world cybersecurity contexts as well.
Lessons from DARPA’s Cyber Grand Challenge
Brumley recalled that Mike Walker, a past participant in DEF CON’s CTF, went to DARPA in 2014 as a program manager and asked a new question: Can we teach computers to hack? This led to the DARPA CGC,10 designed to spur researchers to build the first prototypes of reasoning cyber defense AI, to be pitted against each other in competition. The end goal was to create systems capable of automatically checking and protecting commercial, off-the-shelf (COTS) software.
In practice, all players received the same binary code (software instructions written in an alphabet of ones and zeros) and were challenged to come up with exploits and send them, again in rounds, to DARPA along with patches for problems fixed. Patches were judged on security, performance, and functionality. Nothing was static: both the attacks and the patches evolved in stages, and DARPA sent out all patches to all competitors, allowing teams to analyze their opponents’ defenses and try to circumvent them. Brumley’s team won DARPA’s first CGC with their system, called Mayhem, and received $2 million. Now, the team is working on transitioning the technology from research to practice.11
Both the CGC and DEF CON’s CTF are structured around cycles resembling the interactions between persistent and adaptive real-world adversaries. Based on his experiences with the CGC and recent CTFs, Brumley suggested that it makes sense to think about cybersecurity as not just a binary state of being (i.e., secure or insecure), but as a process, and in terms of optimizing the stability of the process. He noted that DARPA built the notion of evolution of attackers into the game, an important element of the dynamic.
The Utility of Machine Learning for Cyber Operations
In the CGC, Brumley’s team began by designing a system, called SWORD, for finding vulnerabilities in COTS software code (in binary form). The team began under the assumption that ML’s capabilities for pattern recognition could be applied to the task of identifying vulnerabilities—surprisingly, this did not work.12 Similarly, they did not use ML in any sort of defense operations. Ultimately, Brumley said the team’s approach emphasized other methods for closing the loop on system autonomy.13
However, ML—in particular, deep neural networks—did prove useful for generating chaff traffic14 that resembled the distribution of real attacks and successfully fooled other teams. In fact, the chaff traffic was so convincing that, in one case, the CGC commentator announced that Mayhem had launched an exploit that failed (when it was really just chaff, designed as a source of misdirection). Brumley reiterated a lesson from DEF CON: Bleeding off another team’s energy can be a key strategy to achieve the goal of winning. Opponents waste time chasing down chaff when their limited resources would have been better used elsewhere.
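The team’s chaff was produced with deep neural networks; as a much simpler illustration of the underlying idea, the toy generator below (the function name and the traffic representation are invented for this sketch) emits decoy payloads whose gross statistics, such as length and leading byte, match previously observed attack traffic, so simple statistical filters cannot separate decoys from the real thing:

```python
import random

def chaff_generator(observed_attacks, seed=0):
    """Toy chaff source: yield decoy payloads whose lengths and leading
    bytes follow the empirical distribution of previously observed attack
    traffic. Real chaff (as in the CGC) used learned models; this sketch
    only resamples two coarse features of the observed traffic."""
    rng = random.Random(seed)
    lengths = [len(p) for p in observed_attacks]      # empirical length distribution
    first_bytes = [p[:1] for p in observed_attacks]   # empirical leading-byte distribution
    while True:
        n = rng.choice(lengths)
        head = rng.choice(first_bytes)
        # Fill the rest of the decoy with random bytes.
        yield head + bytes(rng.getrandbits(8) for _ in range(n - 1))

# Usage: feed in captured attack traffic, then draw decoys on demand.
observed = [b"A" * 16, b"B" * 16, b"A" * 32]
gen = chaff_generator(observed, seed=1)
decoy = next(gen)
```

A defender inspecting only payload sizes or prefixes would see decoys drawn from the same distribution as genuine exploits, wasting analysis time on misdirection, which is precisely the effect Brumley described.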
9 Rapporteur’s note: It has been reported that other nations have used this approach, in particular re-using exploits developed by the U.S. government—including against U.S. targets. See, for example, N. Perlroth and S. Shane, 2019, “In Baltimore and Beyond, a Stolen N.S.A. Tool Wreaks Havoc,” New York Times, May 25, https://www.nytimes.com/2019/05/25/us/nsa-hacking-tool-baltimore.html.
10 Defense Advanced Research Projects Agency, “Cyber Grand Challenge (CGC) (Archived),” https://www.darpa.mil/program/cyber-grand-challenge, last accessed March 11, 2019.
12 Brumley noted that some successes were subsequently demonstrated, primarily in amplifying the efficacy of existing measures for vulnerability discovery.
14 Rapporteur’s note: Chaffing is the technique of generating decoy signals that provide cover for true ones, confusing an adversary and yielding some confidentiality. Its use in cybersecurity was introduced by Ron Rivest. See R. Rivest, 1998, “Chaffing and Winnowing: Confidentiality Without Encryption,” CryptoBytes (RSA Laboratories), 4(1):12-17, https://pdfs.semanticscholar.org/aaf3/7e0afa43f5b6168074dae2bc0e695a9d1d1b.pdf.
The team also found ML useful for optimizing resources and for automating decision-making—that is, enabling the computer to make decisions in places where a human normally would. Brumley noted that their strategy was similar to how Google has traditionally used ML, for example, to identify which ads to show a user in order to maximize revenue—a version of a multi-armed bandit problem. In their case, ML optimized the team’s performance through decisions about when to give up on a particular analysis that had not yet proven fruitful. Brumley believes that his team won the competition because it was better at such decision-making.
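As a rough illustration of the multi-armed bandit framing (this is not the team’s actual system; the scheduler class and its reward signal are invented for this sketch), an epsilon-greedy policy can decide where to spend the next slice of analysis time, gradually abandoning targets whose estimated payoff stays low:

```python
import random

class EpsilonGreedyScheduler:
    """Toy multi-armed bandit for allocating analysis time across targets.

    Each 'arm' is a candidate program to analyze; the reward signal is
    whether a time slice spent on that target found something new."""

    def __init__(self, targets, epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.counts = {t: 0 for t in targets}    # time slices spent per target
        self.values = {t: 0.0 for t in targets}  # running mean reward per target

    def choose(self):
        # Explore a random target with probability epsilon;
        # otherwise exploit the current best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def update(self, target, reward):
        # Incremental mean update; targets that keep paying off
        # accumulate more time, others are effectively given up on.
        self.counts[target] += 1
        n = self.counts[target]
        self.values[target] += (reward - self.values[target]) / n
```

In use, each round the scheduler picks a target, the analysis runs for one time slice, and the scheduler is told whether that slice was fruitful; over many rounds the bulk of the budget flows to the most productive analyses.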
The Role of Strategy
Since Mayhem’s success at the CGC, Brumley’s former graduate student Tiffany Bao, now at Arizona State University, has been leading new research into the notion of autonomy, and on where ML, algorithms, and game theory can be used to achieve better outcomes for a team’s specific mission—rather than working toward a notion of security in the abstract. She has taken advantage of the fact that information on the Shellphish15 CGC team’s system is publicly available and investigated what that team might have done differently to improve its final ranking at the DARPA CGC.16
Bao found that possession of the best available tools for binary analysis would not have improved Shellphish’s outcome—they still would have come in third place. However, she found that they could have come in second place with the tools they had at the time, if they had employed a better strategy for using them. Bao identified two specific strategic adjustments that would have had an impact. First was the reuse of other teams’ exploits, harnessing their competitors’ work for their own gain (i.e., reflection). Second, Bao found that Shellphish would have benefited from allocating fewer resources to defense. While their defenses were very effective, the team sometimes wasted time patching bugs that no adversaries ever exploited, costing them in their overall performance according to the scoring rules. Brumley highlighted the importance of balancing resources and prioritizing objectives—for example, in the real world, businesses may not patch every known weakness because it would be too costly in terms of lost time for achieving their business objectives. In such instances, a focus on perfecting security for its own sake could undermine an entity’s reason for existing in the first place, whereas a risk management approach enables performance optimization around multiple goals.
Brumley concluded with some final observations on how we think about and experiment with AI in the context of cybersecurity. From a practical standpoint, he noted that the DARPA CGC is conducted on an artificial platform, called DECREE, and contrasted this decision with China’s choice of launching its own subsequent CGC-like competitions on operationally relevant platforms such as Linux. He suggested that DARPA’s decision was related to ethical considerations and the values it wanted to uphold as an agency. The DARPA CGC’s rules excluded the use of methods for counter autonomy, an area that Brumley sees as under-researched. Mayhem had originally incorporated this approach into its suite of techniques, specifically by generating and submitting malicious patches that would compromise any system that either deployed or analyzed them, making use of exploits in the analysis tools used by all of the teams. Upon realizing that this was against the rules, Brumley’s team removed the code in question.
More broadly, the outcomes of DEF CON CTFs and the CGC point to a need for a greater emphasis on autonomy in cyber operations. Autonomy, rather than AI or ML methods, was critical to winning the DARPA CGC, Brumley said. In particular, it is worth using ML to improve automatic analysis, he said, as his team did with its multi-armed bandit approach. As this approach begins to yield payoffs, an organization can start to connect the dots to make one action trigger another, ultimately enabling autonomous, self-healing systems. Based on his experiences, Brumley reiterated the need to think in terms of how to win according to clear objectives against a field of multiple, evolving adversaries, rather than focusing on how to make systems more secure. He concluded by stating that we can in fact teach computers to hack.
15 Rapporteur’s note: Shellphish is a team of computer science graduate students at the University of California at Santa Barbara that has competed in the CGCs and in DEF CON’s CTFs. For more information see University of California, Santa Barbara, 2016, “Team Shellphish Nets $750,000 Win at Cyber Grand Challenge,” https://www.cs.ucsb.edu/spotlights/team-shellphish-nets-750000-win-cyber-grand-challenge.
16 T. Bao, Y. Shoshitaishvili, R. Wang, C. Kruegel, G. Vigna, and D. Brumley, 2017, “How Shall We Play a Game?: A Game-Theoretical Model for Cyber-Warfare Games,” 2017 IEEE 30th Computer Security Foundations Symposium (CSF), https://ieeexplore.ieee.org/document/8049648.
At the conclusion of his remarks, Chang asked Brumley why his team at the DARPA CGC didn’t attempt to hack the scoring server. Brumley clarified that, while the contest allowed competitors to attack their opponents’ systems in any way they choose, breaking into the scoring server or infrastructure was off limits. A second question came from Yevgeniy Vorobeychik, who pointed out that contests have a single winner, which is a key difference from the real world, where there isn’t necessarily a single winner. Despite this limitation, zero-sum activities can nonetheless be a useful way to shed light on aspects of engagement that may otherwise be overlooked, Brumley noted, suggesting that military leaders might not appreciate the value of exploit re-use. Tyler Moore commented that these scenarios can yield helpful insights, but some of the conclusions—such as whether to prioritize offense or defense—are specific to the game in question. To be able to translate the lessons to society in the real world, the utility models and frameworks of the games would need to reflect real-world costs and values.
SOME THOUGHTS ON THE USE OF ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING IN CYBERATTACKS: ECONOMIC AND PRACTICAL CONSIDERATIONS
Tyler Moore, University of Tulsa
In his talk, Moore discussed current trends and the potential implications of AI for cybersecurity. He focused on four key themes: strategic interactions between attack and defense, incentives of cyberattackers and defenders interacting with ML-based systems, an economics-based approach to understanding these dynamics, and the importance of data access for researchers.
The Current Landscape
Moore’s research emphasizes measurement of cyber crimes and the economics of security. In his work, he studies what is actually going on in the world in order to help explain the efficacy of cyberattacks and cyber defenses.
He noted that on the order of 50,000 malicious websites are being detected each week. While most attacks are automated, attackers today rarely use AI or ML, he said, largely because they do not need them to achieve their goals. On the positive side, the fact that attackers are not yet using much AI or ML suggests we might have more time to prepare for the types of AI-enabled attacks that may become more important in the future. On the negative side, the reason attackers are not using these technologies is that more pedestrian approaches continue to work, reflecting significant weaknesses in current systems. Attackers are likely to continue to guess passwords, run standard exploits, and scan for vulnerable services for as long as those actions work at scale. Moore said it is important to recognize that attackers are only as clever as they need to be. The fact that attackers do not yet seem to be focusing on AI doesn’t mean we should not prepare for new kinds of attacks that may emerge as defenses improve, he said.
Moore turned to a current and well-known vulnerability: signature-based anti-virus protections, which have been broken for 15 years. Here, attackers leverage an inherent asymmetry: by changing a few lines of code, they can make the compiled code look completely different, evading the signature match entirely. He asked whether AI could lead to new attacks that will fundamentally break today’s defenses in the same way that signature-based antivirus has been broken. Moore posited that one way to prepare for AI-based attacks is to look for analogous asymmetries.
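The asymmetry Moore described can be illustrated with a deliberately simplified example in which the “signature” is just a hash of the binary (real antivirus signatures are more sophisticated pattern matchers, but the brittleness is analogous, and the byte strings below are invented for this sketch):

```python
import hashlib

def signature(code: bytes) -> str:
    # Crude stand-in for an exact-match AV signature: a hash of the binary.
    return hashlib.sha256(code).hexdigest()

# Pretend this is a compiled malware sample an AV vendor has fingerprinted.
original = b"\x55\x48\x89\xe5\x90\x90\xc3"
blacklist = {signature(original)}

# The attacker makes a trivial, behavior-preserving change (e.g., swapping
# padding bytes); the resulting binary no longer matches any signature.
variant = original.replace(b"\x90\x90", b"\x90\x66\x90", 1)

print(signature(original) in blacklist)  # True: the known sample is caught
print(signature(variant) in blacklist)   # False: the trivial variant slips through
```

The defender must fingerprint every variant, while the attacker need only perturb a few bytes: a cheap move for offense that is expensive for defense, which is the kind of asymmetry Moore suggested looking for in future AI-enabled attacks.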
Incentives of Attackers and Defenders
On the issue of incentives, Moore noted that defenders may not see many actual attacks, meaning that there are insufficient examples on which to train an ML model, leading it to produce a high rate of false positives. Because false positives can be costly, defenders often prioritize reducing the rate of false positives. However, this often has the effect of also driving up the rate of false negatives, which in turn increases the likelihood that they will experience a successful attack.
He noted that this challenge is exacerbated by the fact that security tools work on multiple interdependent systems and are often deployed by parties who are not the resource owners themselves. For example, a Web services company that blacklists malicious websites that it does not own will only choose to do so if it is certain that a given site is compromised, because erroneously identifying a third-party site as malicious could damage its owner, leading to complaints or even legal liability. These potential costs are a strong incentive to avoid false positives. Moore noted the significant challenge of keeping false positives low without creating openings for attackers.
In terms of economics, Moore posited that ML tools for cybersecurity are not optimized to meet the needs of either defenders or attackers. Ideally, defenders of a system want to minimize the financial damage of an attack, while attackers want to maximize the damage and get the best return on their investment. However, ML-based classification tools today are generally accuracy-based, meaning that they simply decide whether something is an attack based on a computed probability. Moore noted that it could be useful to couple this approach with some sort of metric related to both the cost of the attack (if it turns out to be a real attack) and the cost of dealing with a false positive (if it turns out not to be a real attack). Such tools would be challenging to implement, because it is hard to know how to quantify these costs. Research efforts in this area could lead to more economically informed models and improved defenses. Moore suggested that attackers may already implicitly be choosing targets based on such economic considerations.
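The cost-aware coupling Moore described can be sketched as a simple expected-cost decision rule layered on top of an accuracy-based classifier’s probability output (the cost figures below are invented for illustration, and quantifying real costs is, as he noted, the hard part):

```python
def cost_sensitive_threshold(cost_false_positive: float,
                             cost_missed_attack: float) -> float:
    """Probability threshold that minimizes expected cost.

    Flag an event when p * cost_missed_attack (the expected damage of
    ignoring it) exceeds (1 - p) * cost_false_positive (the expected
    cost of a needless response). Solving for p gives this threshold."""
    return cost_false_positive / (cost_false_positive + cost_missed_attack)

def should_flag(p_attack: float, cost_fp: float, cost_miss: float) -> bool:
    # p_attack is the classifier's computed probability that the event
    # is a real attack.
    return p_attack > cost_sensitive_threshold(cost_fp, cost_miss)

# A cheap-to-handle alarm guarding a high-value asset justifies a hair trigger:
print(cost_sensitive_threshold(100, 100_000))  # about 0.001
# When false alarms cost as much as misses, the break-even point is 0.5:
print(cost_sensitive_threshold(500, 500))      # 0.5
```

The same classifier can thus behave very differently for a web services company facing liability for wrongly blacklisting third-party sites (high false-positive cost, high threshold) than for a defender of critical infrastructure (high miss cost, low threshold).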
Access to Data
In general, data is an essential input for cybersecurity research, and data will become even more necessary as we move toward AI-based cybersecurity. However, access to data is unequal, Moore said. For example, people working to solve cybersecurity problems who don’t have partnerships with a large technology company—such as those working in academia or startups—often are at a severe disadvantage in terms of access to data. While adversaries with limited resources may experience similar barriers to entry for using AI in cyberattacks—potentially keeping them at bay—the hurdle is not insurmountable and is unlikely to exist for nation-state adversaries. He suggested that broader data access for the relevant parts of the research community is thus critical for creating an environment of innovation at the interface of AI and cybersecurity.
Moore provided some insights extracted from his group’s analysis of nearly 1,000 research papers presented at top security conferences. In particular, it was found that 74 percent of the research papers used data from a publicly available data set as input for their research—a strong indicator that data is important for progress. However, of the research that produced data sets, that data was made publicly available only 15 percent of the time, suggesting that relatively few groups are actually sharing with others the new data that they are responsible for creating.17
He went on to note that the researchers who did share their data had their papers cited more often—a potential incentive for data sharing among researchers. While there may be good reasons for not sharing data—for example, privacy or confidentiality requirements—Moore suggested that the problem of data sharing will need to be addressed in order to make progress. In a follow-up question, Phil Venables of Goldman Sachs brought up the concept of data sheets as a way to encourage consistent data standards and reusability of data. Moore commented that his graduate students had a difficult time in carrying out their analysis for exactly this reason: published data is often disorganized.
17 M. Zheng, H. Robbins, Z. Chai, P. Thapa and T. Moore, 2018, “Cybersecurity Research Datasets: Taxonomy and Empirical Analysis,” presented at 11th USENIX Workshop on Cyber Security Experimentation and Test (CSET ‘18), https://www.usenix.org/system/files/conference/cset18/cset18-paper-zheng.pdf.
Wyatt Hoffman, Carnegie Endowment for International Peace
Hoffman addressed the ways in which AI or ML might affect cyber conflicts on the world stage, emphasizing an understanding of actors’ high-level strategic objectives over a focus on tactical elements of cyber engagement. His remarks aimed to create a bridge between the technical discussions about applying AI or ML in cyberattacks and the ways in which the strategic community thinks about and analyzes cyber conflicts in the real world.
Hoffman defined the concept of cyber strategy as how nation-states try to accomplish their political objectives using cyber operations—a broader perspective than one might take when considering a strategy for attacking or defending a particular network. He combined knowledge of existing technological capabilities with a current understanding of dynamics of cyber conflict, state competition, and confrontations in cyber space in order to speculate on how the landscape might evolve as AI enters the field.
Potential Uses of Artificial Intelligence and Machine Learning in Cyber Operations
Hoffman reiterated that AI or ML methods could be used to automate actions at different stages of the cyber kill chain, including hunting for vulnerabilities or introducing new attack vectors that target AI/ML algorithms or the associated training data. Adversaries’ use of AI-based methods will vary with their objectives. AI could affect both the scope and scale of an adversary’s impact, and it could enable them to carry out new kinds of operations that might otherwise have been beyond their capabilities. Hoffman provided several examples.
Researchers have already demonstrated how ML can be used to reconstruct information about industrial designs from the sounds made by manufacturing equipment such as 3D printers. Such side-channel attacks could be deployed in industrial or national security espionage. Hoffman also suggested that the substantial planning and human intelligence required for sophisticated attacks, such as the Stuxnet18 worm, could potentially be substituted, to some extent, by autonomous capabilities using AI and ML to integrate intelligence gathering and execution of an attack. This would bring such capabilities to organizations that would otherwise lack the human resources to carry out such an operation.
Another potential application of AI is in false-flag operations, in which attacks are made to look like they are coming from some actor other than the actual adversary. Hoffman noted that some actors have already been experimenting with false-flag attacks, citing reports of Russian cyberattacks at the Winter Olympics in South Korea designed to look like the work of North Korea.19 While most false-flag operations to date have been fairly crude (e.g., inserting comments in a certain language), ML-based analysis of an adversary’s known malware could provide subtler, harder-to-detect ways of spoofing an attack that a target would attribute to that adversary.
Role of Nation-State Strategic Objectives
On the global stage, there are key differences in the strategic objectives of nation-state adversaries that will influence how they employ AI technologies. Hoffman highlighted Russia and China, describing characteristic cyber postures in the contexts of disinformation and acceptance of collateral damage. In his assessment, Russia deploys disinformation to sow mass confusion rather than convince people of any particular narrative. For example, when Ukrainian separatists shot down a Malaysia Airlines flight, Russian strategy was to supply many different explanations and undermine people’s abilities to determine the truth of what happened. China, by contrast, uses disinformation in highly targeted campaigns in order to shape concrete narratives that align with state objectives. Thus, according to Hoffman, while ML capabilities might be most appealing to Russia for ramping up the scale of its disinformation campaigns, for instance, through automated generation and propagation of content, China may be more focused on employing ML to improve the sophistication, tailoring, and precision of its messaging.
18 Rapporteur’s note: Stuxnet is a computer worm that was used to disable nuclear facilities in Iran by causing malfunction leading to physical damage of industrial equipment.
19 E. Nakashima, 2018, “Russian Spies Hacked the Olympics and Tried to Make It Look Like North Korea Did It, U.S. Officials Say,” Washington Post, February 24, https://www.washingtonpost.com/2018/02/24/44b5468e-18f2-11e8-92c9-376b4fe57ff7_story.html?utm_term=.af3fccedb0ba.
Different adversaries also may have different levels of tolerance for collateral damage. Russia, for example, has, according to Hoffman, shown a lack of concern for impacts on norms and a high tolerance for collateral damage; part of the strategy may actually be to manipulate its adversaries’ perceptions of Russian tolerance for risk. Hoffman pointed to the NotPetya global cyberattack,20 which exploited a piece of Ukrainian accounting software; it was so indiscriminate that it spread to become an existential threat to corporations, including Russia’s own gas companies. He also noted Russian efforts to synchronize conventional or unconventional military operations with cyber operations designed to obfuscate their effects. According to Hoffman, Russia also engaged in a multi-pronged cyber operation targeting the Ukrainian Central Election Commission (CEC) in 2014 that caused vandalism and disruption to CEC systems, likely aimed at masking a subtler attempt to change the reported outcome of a Ukrainian presidential election. Hoffman commented that, although the sophisticated attack on the CEC was detected and thwarted, Russian state media nonetheless reported the false Ukrainian election outcome that the malware had tried to orchestrate. The Russian government’s tolerance for collateral damage and disregard for norms suggest, said Hoffman, that it may be more willing to employ ML-enabled cyber capabilities with less-predictable effects, which may confer a strategic advantage over adversaries employing such capabilities in a more restrained manner to ensure control.
Asymmetries in Cyber Conflicts
Hoffman weighed in on the concept of offensive/defensive balance in cyberspace, noting that cyberspace is commonly believed to favor attackers and is often described as an offense-dominant domain. However, while a cyberattacker clearly has certain structural advantages, these advantages do not necessarily make offensive strategic objectives easier to achieve than defensive ones.
With a state entity, what is important is not simply the tactical advantage gained by using a particular tool, but rather the ability to accomplish the state’s strategic objectives. With this in mind, Hoffman suggested that the term offense persistence21 is more accurate than offense dominance in the context of cybersecurity. This framing acknowledges that although attackers find it easy and low risk to constantly probe and test defenses and exploit opportunities, this does not mean that defenders inevitably lose. Adversaries engage in offensive cyber operations not because they always succeed in accomplishing their objectives but because offense persistence lends itself to constant, opportunistic cyber aggression.
More simply stated, the fact that it’s easier to penetrate a network than prevent all network intrusions does not necessarily mean that an attacker will always accomplish its strategic objective. Defenders do have some inherent advantages because they control the terrain, and application of techniques such as deception and honeypots at specific points along the cyber kill chain can help to overcome an adversary’s advantages.
Hoffman suggested that, to understand how AI or ML will affect the dynamics of cyber conflicts, we must consider how they might impact these asymmetries between cyber defense and cyber offense. For example, while ML will probably not enable defenders to prevent all attacks, it might make it harder for attackers to accomplish their objectives, or reduce the benefits of an attack. It is also possible that ML-based cyber defenses could raise the cost of an attack—for example, by forcing attackers to employ more sophisticated capabilities and increasing the risk that defenders might learn from the attack. Finally, ML may enable new methods of imposing costs on attackers through more effective and automated active defenses (e.g., honeypots that could enable further countermeasures). In this way, even as certain offensive activities will no doubt be augmented or automated, ML defenses could actually correct some of the asymmetries that currently favor attackers and change the payoff matrix for malicious activity, Hoffman suggested.
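Hoffman's point about ML defenses changing the payoff matrix for malicious activity can be made concrete with a toy expected-value calculation. This is purely an illustrative sketch; the function and all of its parameter values are hypothetical and do not come from the workshop discussion:

```python
# Toy model of how ML-enabled defenses might change an attacker's
# payoff calculus. All parameter values are invented for illustration.

def attacker_expected_payoff(p_success, gain, cost, p_detect, penalty):
    """Expected value to the attacker of launching an attack."""
    return p_success * gain - cost - p_detect * penalty

# Baseline: static defenses, cheap attacks, low risk of attribution.
baseline = attacker_expected_payoff(
    p_success=0.6, gain=100.0, cost=10.0, p_detect=0.1, penalty=50.0)

# With ML defenses: lower odds of success, costlier (more sophisticated)
# tooling required, and a higher chance the defender detects the attack
# and learns from it.
with_ml_defense = attacker_expected_payoff(
    p_success=0.3, gain=100.0, cost=30.0, p_detect=0.4, penalty=50.0)

print(baseline)         # positive: attacking is profitable
print(with_ml_defense)  # negative: the same attack no longer pays
```

Under these (hypothetical) numbers the expected payoff flips from positive to negative, capturing the idea that defenses need not prevent every intrusion to deter opportunistic aggression; they only need to shift the attacker's cost-benefit calculation.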
20 NotPetya is a variant of encrypting ransomware that was used in 2017 for a global cyberattack that primarily targeted Ukraine. For more on the attack, see D. Palmer, 2017, “A Massive Cyberattack Is Hitting Organizations Around the World,” ZDNet, June 27, https://www.zdnet.com/article/a-massive-cyberattack-is-hitting-organisations-around-the-world/.
21 R. Harknett and E. Goldman, 2016, “The Search for Cyber Fundamentals,” Journal of Information Warfare 15(2):81-88.
Security Dilemmas and Escalation
Hoffman reiterated the notion that offensive and defensive capabilities coevolve as attackers and defenders learn from and adapt to each other’s tactics: superior cyber operations require the best possible understanding of one’s adversaries and how they operate. He posited that in a competitive environment between peer adversaries employing ML and AI capabilities, the way to overcome an opponent will be to improve one’s own capabilities through successive interactions with the opponent. This would create incentives to constantly probe and test adversaries’ ML- and AI-enabled capabilities. Moreover, effective application of these capabilities across the cyber kill chain—for instance, a defensive system that can deceive an attacker, interfere with its lateral movement, and deny it its objective—may trump superior application at any one point in the chain. Taken together, these factors suggest that as ML and AI are incorporated and applied at higher and higher levels (i.e., operational- rather than tactical-level decisions and actions), the corresponding pressures to gather intelligence aggressively to understand adversaries will rise. States are already under significant pressure to conduct surveillance, intelligence, and reconnaissance operations in cyberspace to develop a sophisticated understanding of their adversaries and how and where to exploit them—even if only to maintain retaliatory offensive capabilities as deterrence to an attack. In the future, the widespread integration of ML into cyber operations could potentially drive a systemic escalation in such activities.
While many refer to cybersecurity as an arms race, Hoffman instead introduced the classical political science notion of a security dilemma. This term applies to the situation where two nation-states are developing capabilities that appear threatening to each other, even though their actions are not intended to be aggressive or threatening. This scenario can launch an escalatory spiral of aggressive activity, even though both states see themselves as acting defensively.
Hoffman noted that this dynamic appears to occur in the cybersecurity context: states are engaging in constant cyber operations out of fear of being attacked themselves, a so-called cybersecurity dilemma22 arising, in part, from the blurred lines between offense and defense. For example, a nation-state might have operations inside an adversary’s network in order to understand and defend against potential advanced, persistent threat (APT) attacks from that adversary and to develop offensive capabilities to deter potential attacks by them.
One significant concern about ML is that it might exacerbate the cybersecurity dilemma by increasing pressures to gather information on adversaries’ capabilities and further blurring the lines between offensive and defensive actions, spurring the escalation of cyber conflicts. Hoffman predicted that ML will result in a shift from algorithms that detect existing malware to ones that can also predict new kinds of malware and yield insights into how malware is created. Such algorithms could potentially be applied for both defensive and offensive purposes.
In the future, AI and ML capabilities are likely to be incorporated more and more into nearly every aspect of a state’s strategic interests—for example, in the economic sphere, for strategic communications, or for countering disinformation—raising the stakes for cybersecurity across the board. This pervasiveness of AI could lead to new security risks and further increase the strategic incentives for participating in persistent cyber engagement.23
On a positive note, Hoffman did see prospects for international cooperation, noting a shared interest in the explainability and predictability of deployed AI systems and in maintaining stability in strategic capabilities and institutions. Opportunities for cooperation could center around preventing systematically destabilizing developments such as attacks on the financial sector, propagation of AI capabilities to loosely controlled third parties, and the development of cyber operational capabilities that could unintentionally affect nuclear command and control systems—a key concern of the Carnegie Endowment for International Peace.
Fisher moderated an open discussion between panelists and the workshop audience. Participants tackled considerations around collateral damage from cyber engagements, different ways of viewing how AI and ML may be used to enhance offensive or defensive actions, the role of games in illuminating these issues, and the best ways to monitor adversaries’ capabilities in this area.
22 As described in B. Buchanan, 2016, The Cybersecurity Dilemma: Hacking, Trust, and Fear Between Nations, Oxford University Press, Oxford, UK.
23 Rapporteur’s note: Persistent engagement or constant contact refer to a paradigm of engagement where the line between offense and defense is blurred.
Externalities and Collateral Damage
Yevgeniy Vorobeychik, Washington University, returned to the idea of how different adversaries have different tolerances for collateral damage. While competitions such as DEF CON CTF do not consider whether third parties are damaged by an action, these considerations are important in the dynamics of real-world engagements. He wondered how redesigning a competition to incorporate such dynamics would affect the way the game is played, and whether this might provide a venue for better understanding the dynamics around collateral damage. Fisher reiterated the question of whether the United States might be at a strategic disadvantage because it cares more about collateral damage than its adversaries do and invited panelists to weigh in.
Brumley said considering collateral damage can have a big impact on actors’ decisions. He provided a hypothetical example: If the National Security Agency (NSA) were to find a new zero day for Windows 10, would NSA tell Microsoft so that the company can patch it, or would it use the vulnerability to attack Russia? In practice, intelligence benefits need to be weighed against the costs (such as business problems and cybersecurity risks) that would result from exploiting rather than reporting the vulnerability. He suggested that involving economists in cyber warfare could improve the ability to reason through such questions around collateral damage.
Moore said the question is really about externalities and realizing that cyber offenses can have negative externalities. One of the fundamental problems in cybersecurity, in his view, is that some decisions are made that have adverse consequences for others, but the consequences aren’t always brought into the decision-making process.
Hoffman pointed out that real-world adversaries don’t have a static tolerance for risk or collateral damage. An adversary perceiving that it is at a greater disadvantage or experiencing greater pressure—for example, to reverse political or military losses—will be more willing to take risks. Furthermore, the more that offenses become autonomous and employ capabilities like deep neural networks, the less an adversary may understand the risks of collateral damage. The risks are even higher if more discretion is given to an autonomous system to choose how to achieve a strategic objective—it may do so in a completely unanticipated way—for example, by taking out a civilian electric grid instead of a specific U.S. military target. Hoffman suggested that there may be inherent incentives to do so precisely because ML systems tend to perform better under looser parameters or when they’re capable of “gaming” a system’s constraints24 in order to achieve an objective. The potential for an adversary to gain a strategic advantage from a greater willingness to tolerate the risks of AI- and ML-enabled systems is of significant concern.
Extracting Value from Games and Simulations
Vinh Nguyen, NSA, asked participants to elaborate on their process for determining where AI and ML can be valuable. Brumley said the most obvious way to find value is to use observational research as a starting point for developing theoretical insights. He also observed that objective competitions such as the DARPA CGC can be immensely helpful, because the parameters can be controlled to develop a deeper understanding—and that there should be more of these.
Una-May O’Reilly, Massachusetts Institute of Technology (MIT), discussed the process of creating and learning from an artificial adversarial model.25 She felt that her insights came about incrementally, and that certain elements of the model took years to frame. To build her model, she consulted work on the CGC, and in the areas of policy and economics, but formalizing all of these parameters was challenging. She is currently working to build a so-called “game sheet”—analogous to the “data sheet” for an integrated circuit or proposed for data sets—that formalizes notions such as externalities, risk tolerance, position hierarchy, and unknowns. She suggested that standardization of game sheets—incorporating insights from multiple disciplines to identify the right structure and goals—would help researchers to gain insights from games, competitions, and simulations.
24 As an example, Hoffman noted the recent case of an algorithm for playing tic-tac-toe, which won the game by requesting a move so odd that it crashed the opposing algorithm.
When asked about the role that simulations might play, Brumley commented that he did not find the Department of Defense’s (DoD’s) simulations to be realistic, especially those that are AI-generated—and acknowledged that he might be unique in this perspective. He noted that simulations can fall short because they don’t account for the adaptive nature of competitions or the real world and rely upon known attack grammars; that is, simulations generate known attack patterns, while humans are creative. In competitions, on the other hand, humans may come up with novel strategies.
Lura Danley, an applied psychologist for human behavior and cybersecurity with MITRE Corporation, pointed to the value of objective observers of games for understanding their true dynamics. For example, behavioral scientists can help to identify realized human behaviors and intents and the extent to which they do or do not align with the expectations or objectives of the game, which can shed light on training needs. She observed that, if disciplinary silos can be broken down, experts from outside of technology disciplines can apply unique lenses to cybersecurity issues, leading to important insights that enhance understanding and efficacy of solutions.
Moore added that Brumley’s CTF strategy focused on optimizing the use of time and manpower, an approach directly related to economic models. Furthermore, he pointed out that economic modeling is also a ripe area for ML to aid in decision making. O’Reilly suggested that it is important for one to think about and understand threat dynamics as a stack of abstractions—from the technical implementation layer all the way up to the strategic goals.
John Manferdelli, Northeastern University, also commented on the topic of optimal focus of attention and asked whether there was evidence of sophisticated coordination or knowledge transfer between a CTF team’s offensive and defensive teams, which enables optimal effectiveness. Brumley noted that this was not explicitly explored in the competition environment, but that it would be useful to consider how the AI workforce might be optimized, in terms of both motivations and goals, to achieve the right balance or enhance synergies.
Chang pointed to Moore’s observation that attackers are not currently using much AI or ML because it’s simply not necessary for deploying a successful attack. He wondered whether a point in time might arrive where this is no longer true. What if we suddenly begin seeing AI-based attacks? He noted the concept of cryptographic agility,26 wondering whether we might consider a similar concept of AI agility—that is, an ability to switch to new defensive paradigms quickly and with ease in the face of the onset of AI-based attacks. Moore responded that cryptographic agility seems to involve abrupt or discontinuous changes, but that a transition to AI-based attacks would likely involve a more continuous process. Brumley pointed out that the agility of U.S. military systems is nowhere near the agility of a company like Google when it experiences a zero day. He said that DARPA challenges are important in demonstrating “the art of the possible” and asked whether the U.S. government was planning to fund such activities in the future. He suggested that research funding that enables continued generation of superior technology and scholars is an important enabler of future agility in computer security.
O’Reilly pointed out that while competitions address short-term horizons, academic researchers deal with longer-term goals, working on technologies that take 3 to 10 years to develop and may be deployed to meet future challenges, acknowledging sponsors such as DARPA who enable the freedom to be forward-looking. She noted a major need for an improved understanding of software, including how to leverage different code bases and languages. Regardless of whether the emphasis is on bug and vulnerability finding or bug fixing, such work needs to be invested in now, she suggested, because in the longer term, it will pay off. She noted that programming is not yet automatable, and she could foresee longer-term investments, for example, to study people’s brains via functional magnetic resonance imaging in order to improve our understanding of how to build software with human users in mind. Brumley added the importance of identifying the right people for a given effort and cautioned against just funding everyone to do software research, harkening back to previous comments about optimization of resource use.
26 Rapporteur’s note: “Cryptographic agility” refers to how quickly or easily we can transition away from a cryptographic system upon learning that it is vulnerable to attack. For an in-depth discussion, see National Academies of Sciences, Engineering, and Medicine, 2017, Cryptographic Agility and Interoperability: Proceedings of a Workshop, The National Academies Press, Washington, DC, https://doi.org/10.17226/24636.
Monitoring Artificial Intelligence and Machine Learning Capabilities
Another question focused on monitoring emerging AI and ML capabilities: How will we know when our adversaries have made progress in developing AI-based capabilities? Moore said it is possible to watch for new developments as they happen, but what we really want to be able to do is predict when they will happen. Toward this end, he suggested there is a need for more funding for measurement research, adding that new offensive techniques might be developed in response to major progress on the defensive side. Closing off an attack vector, he said, provides impetus for innovation. O’Reilly added that data-driven AI and ML methods could potentially help accomplish these measurement improvements. For example, natural language processing could be useful for analyzing trends in the cybersecurity research literature.
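The literature-trend idea O'Reilly raised can be sketched in a few lines: count, per year, how many paper titles mention a capability-related keyword. This is a minimal illustration only; the paper titles below are invented for the example, and a real analysis would use a bibliographic corpus and richer NLP than substring matching:

```python
# Illustrative sketch: tracking keyword trends in research-paper titles
# as a coarse indicator of emerging capabilities. Titles are invented.
from collections import Counter

papers = [
    (2016, "Deep learning for malware classification"),
    (2017, "Adversarial examples against intrusion detection"),
    (2018, "Adversarial reinforcement learning for red teaming"),
    (2018, "Automated exploit generation with neural networks"),
    (2019, "Adversarial attacks on autonomous cyber defense"),
]

def keyword_trend(papers, keyword):
    """Count papers per year whose title mentions the keyword."""
    counts = Counter()
    for year, title in papers:
        if keyword.lower() in title.lower():
            counts[year] += 1
    return dict(sorted(counts.items()))

print(keyword_trend(papers, "adversarial"))
# → {2017: 1, 2018: 1, 2019: 1}
```

Even this crude measurement surfaces when a research theme begins to appear and how quickly it grows, which is the kind of leading indicator the panelists were asking for.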
Chang asked about the potential for other unobtrusive methods for learning about new capabilities, such as by monitoring patent applications. Moore suggested looking instead at the actions of the big technology platforms, because they have the most data and are likely to apply new AI or ML technologies first. Brumley agreed, noting that patent trends could be misleading and could lag actual developments. He also asked whether we might be able to use economic indicators to determine where technological breakthroughs are likely to happen. In addition, he suggested looking at the players who are gaining a lot of experience in cyberattacks (such as Russia via its disinformation campaigns) and thus gaining a lot of data and experience on how best to carry out these attacks. Could these players be transferring the knowledge gleaned from offensive cybersecurity experience to more observable enterprises, such as marketing? If so, marketing data could be a potential indicator. Fisher added that she would look to nation-states’ actions for indicators of change, noting that Stuxnet was a benchmark indicating a significant shift forward in capabilities. She also suggested that it could be fruitful to analyze the cyber platforms in regions with ongoing conflict—the places where new approaches might be most useful for strategic gain—for evidence of game-changing technologies. Nicolas Papernot pointed out that using AI methods to detect when AI is being used for offensive purposes could be problematic. In particular, as has been seen with fake content, generative models will likely be able to avoid AI-based detection methods.
Two more questions were presented for the record—the first by Sven Krassner of CrowdStrike, the second by David Martinez of MIT Lincoln Laboratory—but were not answered for lack of time:
- How many years away are we from viable offenses and viable automated defenses?
- We tend to think of ML as monolithic, impacting only the algorithms. One class of ML would be a game-theoretic approach addressing the top of the AI stack—at the decision-support level. How do we create a data set that addresses all the aspects of ML, including unsupervised learning and supervised learning, as well as things that relate to decision-support systems?