Phil Venables, Goldman Sachs, moderated a panel exploring how artificial intelligence (AI)-enabled systems deployed in different contexts might themselves be attacked. Panelists included Nicolas Papernot, research scientist at Google Brain; Bo Li, assistant professor in the Department of Computer Science at the University of Illinois, Urbana-Champaign; and Zico Kolter, assistant professor in the Computer Science department at Carnegie Mellon University and chief scientist of the Bosch Artificial Intelligence Center.
As more systems, including military and intelligence systems, become AI enabled, thinking about how adversaries might exploit them becomes more critical. For example, Venables noted, adversaries might seek to exploit AI-enabled systems by interfering with inputs or training data, or potentially find ways to gain insights about training data by examining the output of specially tailored test inputs. He asked how one might think about putting AI systems into a “protective harness” or otherwise limit the risk or “blast radius” when something goes wrong. Another challenge is the fact that the workings of some AI systems cannot be fully explained. Venables suggested this lack of explainability may raise the perception of risk and constrain the use of AI for some applications, such as in safety-critical and highly regulated environments.
Nicolas Papernot, Google Brain
Papernot discussed some of the ways in which adversaries may seek to exploit AI systems, potential mechanisms for detecting or thwarting attacks, and how to translate existing principles for secure computer system design to designing secure AI systems.
Potential Avenues for Attack
Papernot began with the example of a simple machine learning (ML) model designed to predict whether patients have diabetes or anorexia or are healthy, using medical records as inputs. Conceptually, if test points are drawn from the same distribution as the training data, the ML system will work and make correct predictions. However, in the real world, much more ambiguous examples can arise, such that the ML system makes a low-confidence or incorrect prediction. It is also likely that a large part of the input domain hasn’t been modeled, which
would lead some input queries to result in random output. A skilled adversary could fool the ML into giving the wrong output by carefully crafting a perturbation to what otherwise appears to be a legitimate input.
Papernot explained that this type of attack can be generated using a process similar to the one used to train the ML system: that is, instead of computing the derivatives of the error of the system with respect to the training parameters (as one would to optimize a model), one can instead compute the derivatives of the error of the system with respect to the input itself. In this way, an adversary or researcher can systematically find perturbations for any input in order to trick any model they want to target, Papernot said. The general technique can work against ML in any type of application—for example, image recognition, audio transcribing, or malware detection.
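As a minimal sketch of this idea, the toy example below (the model weights, input values, and perturbation budget are all invented for illustration) computes the derivative of a logistic model's loss with respect to the input rather than the parameters, then steps each input feature in the sign of that gradient to flip the prediction:

```python
import math

# Hypothetical two-feature logistic-regression "model" (weights chosen
# purely for illustration); sigma(w.x + b) > 0.5 means class 1.
w = [2.0, -1.0]
b = 0.0

def predict(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))

# Gradient of the class-1 loss -log sigma(w.x + b) with respect to the
# INPUT (not the weights): d/dx = (p - y) * w.
def input_gradient(x, y=1):
    p = predict(x)
    return [(p - y) * wi for wi in w]

# Fast-gradient-sign-style perturbation: step each feature in the
# direction that increases the loss on the correct label.
def perturb(x, eps=0.5):
    g = input_gradient(x, y=1)
    return [xi + eps * (1 if gi > 0 else -1) for xi, gi in zip(x, g)]

x = [1.0, 0.5]                 # legitimate input, confidently class 1
x_adv = perturb(x, eps=0.8)    # small crafted perturbation
print(predict(x), predict(x_adv))  # the adversarial copy scores much lower
```

The same recipe applies unchanged whether the model classifies images, audio, or malware; only the gradient computation grows with the model.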
Attackers may also seek to compromise the confidentiality or privacy of training data. This can be done through careful observation of how a model’s predictions vary for different inputs. If a model is overfit to its training data set, it will be very sensitive to inputs similar to the training set’s outliers—these points can be inferred through careful testing. This sort of attack is known as a membership inference attack.
Papernot stressed that these examples are just two of the many ways adversaries can target ML systems and pointed out that there are opportunities for attack at every step in the ML pipeline. An adversary could poison the training data, make inferences about confidential training data based on knowledge of the model and its parameters, extract the model itself through observation of its predictions for different inputs, or learn how to perturb an input to trick the system.
Putting the Risks into Context
Given that security and privacy in ML is a major concern, Papernot posed the question: Is the security and privacy of ML systems any different from what is seen in traditional computer security, or even real-world security? In all cases, he said, security and privacy are difficult, and faster CPUs and the Internet have probably made the challenge harder rather than easier. He referenced Butler Lampson’s characterization that “[p]ractical security balances the cost of protection and the risk of loss, which is the cost of recovering from a loss times its probability.”1 In this light, ML could be seen as just another way of analyzing data—one that introduces new attack surfaces that can be exploited, potentially leading to an arms race.
However, Papernot takes an optimistic view of the potential future security of ML. He believes that ML systems are sufficiently different from traditional computer systems that they can be designed with systematic and principled approaches to security and privacy. The reason, he said, is that ML, like cryptography, can be expressed in large part in a mathematical form. He noted that progress was not made in the field of cryptography until the interaction between adversaries and defenders was formally specified, suggesting that such an opportunity could also exist for ML.
Security Requirements and Approaches
Papernot asserted that there is a need for efforts to specify ML security and privacy policies. Researchers, he argued, need to find the right abstraction or language to formalize ML security and privacy requirements with precise semantics and no ambiguity. As a useful model, he pointed to a 1975 paper by Saltzer and Schroeder that outlines 10 principles involved in protecting information in computer systems (see Box 5.1 for a complete list).2
Papernot said that all of these principles relate directly to his current research in secure ML systems. He went on to provide specific examples for three of the eight principles.3
1 B.W. Lampson, 2004, Computer security in the real world, IEEE Computer 37(6):37-46, doi:10.1109/MC.2004.17.
2 J.H. Saltzer and M.D. Schroeder, 1975, The protection of information in computer systems, Proceedings of the IEEE 63(9):1278-1308.
3 Papernot referenced a report available via his website that explains the connection between each of the eight principles and his current research: N. Papernot, 2018, “A Marauder’s Map of Security and Privacy in Machine Learning: An Overview of Current and Future Research Directions for Making Machine Learning Secure and Private,” p. 1-1 in Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, Association for Computing Machinery, New York, NY.
First, he addressed the principle of psychological acceptability, which requires having interfaces and systems that humans can understand. Papernot stressed that people will not use a defense tool that they don’t understand. He delved into this principle in the context of privacy. Since everyone has their own idea of what privacy means, security researchers have coalesced around a definition known as differential privacy, which refers to a mathematical technique used to maximize the accuracy of queries from databases while minimizing impact on the privacy of that data.4
Differential privacy algorithms have been developed to make it impossible for an adversary to tell what data from which individuals were included in a training set, so the adversary can’t learn anything about the individuals or any information about the data they contributed. The most standard algorithm for training ML algorithms is stochastic gradient descent, which takes a batch of data, computes the error, computes the gradients of the error in relation to the model parameters, and applies the gradients to update the model parameters. It can be made differentially private by (1) clipping gradients and (2) noising the clipped gradients before they are applied to update model parameters.
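The two modifications can be sketched directly. In the example below the clipping norm, noise multiplier, learning rate, and gradients are all invented for illustration, and no formal privacy accounting is performed:

```python
import math
import random

random.seed(1)

def clip(grad, C=1.0):
    # (1) Clip a per-example gradient to L2 norm at most C.
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(params, per_example_grads, C=1.0, sigma=1.0, lr=0.1):
    # (2) Average the clipped gradients, add Gaussian noise scaled to C,
    # and apply the noisy average as an ordinary SGD update.
    n = len(per_example_grads)
    clipped = [clip(g, C) for g in per_example_grads]
    noisy = [sum(g[i] for g in clipped) / n + random.gauss(0, sigma * C) / n
             for i in range(len(params))]
    return [p - lr * g for p, g in zip(params, noisy)]

params = [0.0, 0.0]
grads = [[3.0, 4.0], [0.1, -0.2]]  # first gradient has norm 5 -> clipped to 1
params = dp_sgd_step(params, grads)
print(params)
```

Clipping bounds any single example's influence on the update, so the added noise can mask whether that example was present at all.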
While this process may seem simple to researchers, those not familiar with differential privacy won’t understand it, Papernot said; as a result, the process does not achieve psychological acceptability. For that reason, Papernot’s team developed a different approach called PATE—Private Aggregation of Teacher Ensembles.5 With PATE, a user can protect sensitive data in a data set by splitting the data into partitions, where the only requirement is that any training point will be included in only one partition. One ML model (called a “teacher”) is trained on each subset of data, so that the result is numerous models trained independently to solve the same task using different subsets of the data. Each model gets a “vote” on the correct label for a given input.
To preserve privacy, the system user simply asks each teacher to vote on the label for a specific test point, but only returns the aggregated result of the vote. If all of the teachers assign the same label to the test input, the label
4 C. Dwork, F. McSherry, K. Nissim, and A. Smith, 2006, “Calibrating Noise to Sensitivity in Private Data Analysis,” pp. 265-284 in Theory of Cryptography Conference, Springer, Berlin, Heidelberg.
5 N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and Ú. Erlingsson, 2018, “Scalable Private Learning with PATE,” Sixth International Conference on Learning Representations (ICLR 2018), https://arxiv.org/abs/1802.08908v1.
is almost certainly correct, as each model arrived at the prediction independently. In addition, there is no way for the prediction to violate the privacy of any of the training sets. However, disagreements among the teachers could reveal information about the data in their sets. To reduce this risk, noise is introduced into the vote count. Overall, this approach provides a differential privacy guarantee.
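The noisy-aggregation step at the heart of PATE can be sketched as follows; the labels, vote split, and noise scale are invented for illustration:

```python
import math
import random
from collections import Counter

random.seed(2)

def laplace(scale):
    # Sample Laplace(0, scale) noise via the inverse-CDF method.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def pate_aggregate(teacher_votes, noise_scale=1.0):
    # Tally the teachers' votes, add Laplace noise to each count, and
    # reveal only the winning label -- never the raw vote counts.
    counts = Counter(teacher_votes)
    return max(counts, key=lambda lab: counts[lab] + laplace(noise_scale))

# Ten teachers trained on disjoint partitions vote on one test point;
# a near-unanimous result survives the noise with high probability.
votes = ["diabetes"] * 9 + ["healthy"]
print(pate_aggregate(votes))
```

Because only the noisy winner is released, a single training point (which can sway at most one teacher's vote) has a tightly bounded effect on the output.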
As an added benefit, the model could be explained to and understood by someone even if they don’t understand the concept of differential privacy. In addition, the introduction of differential privacy also improves performance, because the model only extracts patterns that can be found across the training data, and reduces the impact of overfitting, said Papernot. Both the accuracy and privacy of the system can be improved even further by only revealing the predictions of the teachers when they all, or almost all, agree on the prediction. This result is exciting because it runs counter to conventional wisdom that privacy always comes with a tradeoff in utility.
During the discussion, Papernot further elaborated that one potential use case for PATE that has not been tried yet would be using the partitioning of subsets of data that already exist, rather than creating the subsets artificially. He gave the example of several hospitals working together to train an ML model, in which individual predictions are aggregated to train a student model that would perform better than any of the individual models without leaking private information.
Papernot then discussed work related to the principles of complete mediation—the idea that all accesses should be controlled—and compromise recording—the idea that any improper accesses be documented if they cannot be prevented. He explained that these principles translate into two major needs for ML systems: improved model assurance and admission control. According to Papernot, the ML community tends to focus on the average-case performance as determined by accuracy tests of a given model. However, from a privacy and security standpoint, researchers may be more interested in the worst-case performance as a measure of reliability. While useful ways of measuring the privacy of ML systems have been developed, more work is needed to establish metrics for the security of ML.
At training time, model assurance is needed to establish with confidence that security requirements are satisfied—this requires a clear security policy, or formal requirement of what we aim to achieve. Intuitively, we know that we want a system to succeed perfectly at modeling the task it was designed to model. Formally defining this is not easy. One could consider whether the implementation is correct, or whether it performs with high accuracy without introducing any undesired behavior, such as creating a backdoor for adversaries.
At test time, there is a need for admission control, that is, a way of selecting whether or not a model input/output pair should be included in a pool of answers shared with the user—a question that is difficult to answer today. The decision comes down to one’s ability to estimate the certainty of a prediction—however, determining uncertainty is difficult because the actual distribution that is being modeled is unknown. Thus, uncertainty can only be estimated, preferably in a manner that is not susceptible to manipulation by adversaries.
Papernot pointed to a system being prototyped called Deep k-Nearest Neighbors,6 which makes use of information at each layer of a deep neural network. For a given test point, the system identifies the training data whose representations most closely match the representation of the test point and compares the labels at all stages. If all of the representations remain consistent, such that the labels are the same across all layers, that means the model is predicting by generalizing, which is considered accurate with high confidence. However, in cases of adversarial manipulation, at a certain layer, the test point’s representation and label diverge from those of the training data whose input representations were similar, resulting in a mislabeled test point as an output. Papernot suggested that examination of the labels at each stage of a deep neural net, and potentially imposing some constraints on the structure of the process across all layers, might help to identify or reduce the potential for adversarially manipulated inputs.
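A stripped-down version of this layer-wise consistency check might look like the following. The two "layers" and the three training points are invented for illustration; the real Deep k-Nearest Neighbors system compares nearest-neighbor representations inside a trained deep network:

```python
import math

# Hypothetical two-layer "network": each stage maps an input to a new
# representation (weights invented purely for illustration).
def layer1(x):
    return [x[0] + x[1], x[0] - x[1]]

def layer2(h):
    return [2 * h[0], h[1] + 1]

# Each entry maps an input to its representation at that depth.
layers = [layer1, lambda x: layer2(layer1(x))]

# Tiny labelled training set.
train = [([0.0, 0.0], "A"), ([1.0, 1.0], "B"), ([0.9, 1.1], "B")]

def nearest_label(rep, layer):
    # Label of the training point whose representation at this layer
    # is closest to the test point's representation.
    return min(train, key=lambda t: math.dist(rep, layer(t[0])))[1]

def deep_knn(x):
    # Consistent nearest-neighbour labels across layers suggest the
    # model is generalizing; divergence suggests manipulation.
    labels = [nearest_label(layer(x), layer) for layer in layers]
    return labels, len(set(labels)) == 1

print(deep_knn([1.0, 0.9]))  # near the "B" cluster at every layer
```

An adversarial input would typically match one cluster in early layers and a different cluster in later ones, tripping the consistency flag.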
Papernot suggested that researchers need to think about how to audit ML systems. For example, comparing mistakes made by a privacy-preserving model to those made by a non-privacy-preserving model can provide information about differences in performance and ultimately be used to improve both performance and privacy protection.
In conclusion, Papernot stressed three key points. First, more work is needed in order to identify the right abstraction or language for specifying security and privacy policies to provide assurance in ML systems. Second, he identified a need for auditing when assurance is not possible, potentially along with sandboxing, input/output validation, and compromise recording. Third, he noted that such security and privacy mechanisms should strive to align with the goals of the ML itself—such mechanisms are more likely to be adopted if they also improve a model’s performance. He noted that these complementary synergies could be explored through research on the relationship between private learning, robust learning, and generalization, or the relationship between data poisoning and learning from noisy data or in the presence of distribution drifts. As a final note, he quoted Goodhart’s law: “When a measure becomes a target, it ceases to be a good measure.”7
Bo Li, University of Illinois, Urbana-Champaign
Li discussed results from her research into physical world adversarial attacks on ML systems. These attacks are unique in that the manipulation is performed not on the digital image of the object, but on the physical object itself, meaning that the digital data collected by a sensor (e.g., a camera) is “born” as an adversarial digital object. Li illustrated specific examples of attacks generated by manipulation of two-dimensional (2D) and three-dimensional (3D) real-world objects and discussed potential mechanisms for defending against such attacks.
Li noted that ML has become more widely deployed in many types of systems and that the associated environments can be adversarial. While ML systems are often designed under the assumption that the testing data come from the same distribution as the training data, so that the model will generally be accurate, this is not the case in an adversarial environment; evasion attacks may be achieved by manipulating the distribution of either the training or testing data.
In the case of computer vision algorithms, which take digital images as inputs and provide labels for their contents as outputs, adversarial examples can be generated by manipulating the values or positions of pixels within the image, Li said. Models that make use of perception systems for obtaining data inputs—such as autonomous vehicles, which use sensors to recognize pedestrians, road signs, and other vehicles—are vulnerable to attack before the image is even digitized. Targeted manipulations could potentially cause an autonomous car to engage in dangerous behavior such as running through a stop sign.
Realizing Physical-World Attacks
While physical world adversarial attacks are possible, there are several challenges to creating the physical perturbations that would lead to an adversarial outcome. First, there are varying physical conditions, including the angle from which an image is obtained, the distance of an object from a camera, and the lighting conditions under which an image is obtained. Next, errors may be introduced in the processes of fabrication and perception of a visual perturbation—for example, an image printout might not have perfect color reproduction or might not be read correctly by a camera.
Furthermore, a physical sensing device, such as an optical camera, has limits of perceptibility and may lack the sensitivity to detect an adversarial perturbation at levels that would also be imperceptible to a human. One way
7 As articulated in M. Strathern, 1997, “‘Improving Ratings’: Audit in the British University System,” European Review 5(3):305-321.
around this is to generate perturbations that hide in plain sight—or at least in the human psyche. For example, the adversarial perturbation could be masked as graffiti, or a sticker or poster, that a human might notice but might not think to consider as a potential evasion attack.
Li showed examples of successful attacks based on physical modification of a stop sign against the sophisticated object detection systems YOLO8 and Fast R-CNN.9 For the YOLO attacks, she showed two videos, side by side, of a car approaching a stop sign. In one, the stop sign was unmodified. In the other, the stop sign was modified with black and white rectangles affixed to the surface. The modified stop sign tricked the algorithm into identifying it as a “Speed Limit 45” sign, and it was not able to correctly identify the stop sign until it was very close to it—potentially too late to actually stop and avoid an accident. She also shared another video recording made in a laboratory environment, illustrating a YOLO algorithm’s successful identification of objects such as a sofa, a TV monitor, and a chair, and its failure to identify an altered stop sign from almost every position. In this instance, the stop sign had been modified with two stickers with abstract red and green designs; the perception algorithm focused on four features on these stickers, identifying them as bottles or as people, seemingly unable to see the sign as a whole except in a few fleeting instances. Similar effects were seen against the Fast R-CNN model. These examples demonstrate that adversarial perturbations are possible in the physical world, even under a range of conditions and viewpoints, including distance and angles. They also generated significant discussion in the research community and in the public sphere about the potential security of self-driving vehicles.
While these results were significant, Li noted that a stop sign has a flattened shape and is, for the most part, a 2D object whose surface can be modified with 2D stickers of images. Some researchers wondered whether autonomous vehicles could be made more secure if they relied not only on 2D sensor input, but also on 3D sensing information obtained through the lidar systems commonly deployed on self-driving cars. However, it turns out that the lidar system is also easy to attack. In this instance, the attacker similarly wants to affect the labeling outcome of the model with the minimal perturbation possible—amounting to the same optimization problem seen in the 2D scenario. The main difficulty in designing an attack in 3D is in understanding the possible strategies that could be deployed by an attacker.
Digital 3D object data is commonly represented as either a cloud of dots, or points (known as a point cloud) whose density maps out the surface of an object, or a wire-frame representation of the object’s surface. Thus, an attacker might choose to generate perturbations that have the effect of shifting points slightly from their original position—a type of perturbation not easily detectable by a human. Alternately, additional points or adversarial objects or clusters of points could be added. Li showed that these 3D models are vulnerable to perturbation. For example, researchers were able to fool a system into perceiving a bottle as a bed or a table by adding point clusters shaped like airplanes around the original object. Successful attacks have also been realized in real-world settings for arbitrary target objects.
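The point-addition strategy can be illustrated with a deliberately crude classifier. The bounding-box rule and the point clouds below are invented for illustration; real attacks optimize the added points against a deep point-cloud model:

```python
# Toy point-cloud "classifier": call a tall, thin cloud a bottle and
# anything wider than it is tall a table. Invented for illustration.
def classify(cloud):
    xs = [p[0] for p in cloud]
    zs = [p[2] for p in cloud]
    height = max(zs) - min(zs)
    width = max(xs) - min(xs)
    return "bottle" if height > width else "table"

# A tall, thin cloud of 11 points along the z axis.
bottle = [(0.0, 0.0, z / 10.0) for z in range(11)]

# Adversarial cluster addition: two extra points widen the cloud's
# footprint without moving any original point.
adv = bottle + [(-0.6, 0.0, 0.5), (0.6, 0.0, 0.5)]

print(classify(bottle), classify(adv))  # bottle table
```

The original geometry is untouched; a handful of inserted points is enough to push the cloud across the classifier's decision boundary.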
Li then briefly addressed what the security research community has been doing to protect against adversarial attacks on lidar and vision perception systems, and against adversarial attacks more generally. She explained that many general detection and defense mechanisms have been developed to counter such attacks, although perhaps 90 percent of them are subject to further, adaptive attacks.
She pointed to multiple approaches to such models, including the k-Nearest Neighbor approach identified by Papernot, that work at multiple layers of a deep learning (DL) model. She noted that these, and other approaches based upon local intrinsic dimensionality (LID), perform quite well at separating benign and adversarial examples—and even against adaptive attacks under certain constraints. She suggested that the true solution might lie in task- or domain-specific defenses, rather than generalized approaches.
She provided an example of individual pixel classification by segmentation of images. Images should have spatial consistency, and that consistency will break when a perturbation is added. By examining the entropy for different image segments, it is possible to identify inconsistencies resulting from image alteration. According to Li, this should open opportunities to develop a detection mechanism that can find adversarial behaviors.
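One way to operationalize this check is to measure the entropy of the labels that overlapping segments assign to the same region; the label votes below are invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (in bits) of the label votes from overlapping
    # segments covering the same region.
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical votes for one region from four overlapping segmentations.
benign_votes = ["road", "road", "road", "road"]      # spatially consistent
attacked_votes = ["road", "sign", "road", "person"]  # consistency breaks

print(entropy(benign_votes), entropy(attacked_votes))
```

A benign region yields near-zero entropy, while a perturbed region produces disagreement among segments, so thresholding the entropy gives a simple detection signal.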
Similarly, the requirement of temporal consistency in video and audio should enable researchers to identify attacks embedded in corresponding files, she said.
Li briefly discussed back door attacks enabled by the poisoning of training data, specifically the case where an adversary adds a certain feature to all images of a certain labeled class, which causes any test data containing that feature to be so classified. Her current research suggests that a concept from game theory called the Shapley value can be used to detect these kinds of attacks.
In conclusion, she expressed her view that in the current adversarial environment detecting and defending against attacks can be done more effectively—and more knowledge can be gained—by focusing on specific ML tasks and data distribution properties, rather than focusing on universal defense or detection mechanisms.
Zico Kolter, Carnegie Mellon University
Kolter noted that the workshop speakers had provided many examples of how ML systems can be broken by an adversary. Deep neural networks are vulnerable to both physical and digital attacks, and many proposed defenses against adversarial attacks prove ineffective. By way of illustration, he noted that many defenses accepted to a major DL research conference (the 2018 International Conference on Learning Representations) were broken by submissions to the next big ML conference (the 2018 International Conference on Machine Learning).
Nonetheless, Kolter said researchers are beginning to work toward developing provably secure ML and DL algorithms. While the problem is far from solved, and the work is challenging, Kolter suggested that recent progress provides some hope of formal methods for creating secure ML. He went on to discuss some of his research in this area.
Robust Deep Learning Algorithms
DL works by characterizing the key features of objects; these features act like coordinates on a multidimensional map within which any object can be placed. A DL algorithm aims to identify regions of the map within which all objects have the same label, operating in an iterative fashion. The nature of this process is nonlinear; even if objects come from a well-contained region of the map (i.e., a convex boundary in representation space), their associated outputs will often end up contained within a non-convex boundary in representation space. The surface that contains the output points with a given label is called the decision boundary, as it defines the line across which the label assigned to a given input will change. It is very hard to reason about the validity of these decision boundaries, Kolter said, or to make any guarantees about the accuracy of the result.
To address this problem, Kolter’s research team uses a specific strategy: it considers a convex relaxation of the region achievable under bounded adversarial perturbations. Then, to determine the validity of the outputs, it identifies the worst-case point as the one in the region that is closest to the model’s decision boundary. If the worst-case point is determined to be safe, then it assumes that all points within the region are safe. This was done using rectified linear unit (ReLU) networks, which are composed exclusively of linear operators except for one type of nonlinearity, the ReLU. They are also similar to most DL networks and can be generated from them by constraining certain parameters to zero to remove nonlinearities. By assuming upper and lower bounds on the region of interest before applying the ReLU, it is possible to replace the ReLU with a convex region or segment. This turns a difficult, nonlinear problem into a tractable, linear optimization problem. However, Kolter said, despite being a relatively efficient computation, it would need to be carried out too many times to be practically feasible. To get around this, the researchers made use of a characteristic intrinsic to linear programs: the existence of a dual program that can be used to form an outer bound—essentially, an upper bound on the upper bound originally constructed. It turns out that the dual is actually another deep network, which, surprisingly, is just a back-propagation network through the original network with a few additional free variables. This back propagation can be run once and used to define a provable upper bound on the worst-case robustness of the network.
Practically speaking, this method enables researchers to train a dual formulation of a DL network instead of the original model to achieve some guarantee of model robustness: instead of minimizing the loss of the model, the training involves optimizing some function of the loss. Kolter’s team has tested this approach using MNIST10 data, achieving the same error rate as the unmodified model but with a proven degree of robustness—as opposed to standard models, which can easily be fooled.
This approach can easily be applied to any model by plugging it into the associated code, replacing the normal loss function with the new loss function to get a provably robust model. This code has been made publicly available, and other researchers have done additional work to develop the approach further.
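The flavor of bound propagation can be conveyed with plain interval arithmetic, which is a looser relaxation than the dual-network bound Kolter described. The network weights, input, and perturbation radius below are invented for illustration:

```python
# Propagate an eps-ball's coordinate-wise bounds through a tiny ReLU
# network and certify the worst-case output margin.

def linear_bounds(lo, hi, W, b):
    # For y = W x + b, each output's bound depends on each weight's sign.
    out_lo, out_hi = [], []
    for row, bi in zip(W, b):
        l = bi + sum(w * (lo[j] if w > 0 else hi[j]) for j, w in enumerate(row))
        h = bi + sum(w * (hi[j] if w > 0 else lo[j]) for j, w in enumerate(row))
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def relu_bounds(lo, hi):
    # ReLU is monotone, so it maps bounds to bounds directly.
    return [max(0.0, l) for l in lo], [max(0.0, h) for h in hi]

# Network: 2 inputs -> 2 hidden (ReLU) -> 2 logits (weights invented).
W1, b1 = [[1.0, 0.5], [-0.5, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, -1.0], [-1.0, 1.0]], [0.5, -0.5]

def certify(x, eps):
    # Worst-case margin of logit 0 over logit 1 across the whole eps-ball;
    # a positive result certifies that no perturbation flips the label.
    lo = [xi - eps for xi in x]
    hi = [xi + eps for xi in x]
    lo, hi = relu_bounds(*linear_bounds(lo, hi, W1, b1))
    lo, hi = linear_bounds(lo, hi, W2, b2)
    return lo[0] - hi[1]

print(certify([1.0, 0.0], 0.1))  # positive margin -> certified robust
```

Each layer loosens the bounds slightly, which is why training against the bound itself, as described above, is needed to keep it tight.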
Opportunities for Further Improvement
While the method looks promising, Kolter noted that bounds still need to be improved and the methodology needs to scale better. When a researcher trains a normal network, bounds are only non-vacuous if the researcher trains the network on the bounds themselves, which makes them tight. However, on large networks, the bounds tend to smooth the network so much that performance suffers. This is one problem Kolter’s team is working to address. A second problem involves finding new metrics for the task at hand.
Kolter said that he had presented a paper at the 2018 Conference on Neural Information Processing Systems (NeurIPS) that addressed scaling his team’s method to larger regimes.11 His team’s worst-case complexity was quadratic in its number of hidden units. The researchers used random matrices—in particular Cauchy matrices—to linearize the operations. The process requires about 50 times more computations than what is needed for a normal network, which is not a problem for big companies with significant compute resources. The research is now at a point where it can train these complex networks; however, using training data from CIFAR,12 their network had about 46 percent provable accuracy, meaning that although the computational problem has been solved, the accuracy gap on complex data sets remains large and needs to be closed, Kolter said.
Achieving a Certified Bound
Another approach Kolter’s team is testing is randomized smoothing and classification on random noise to understand the confidence that can be ascribed to a model. He noted a surprising result: under certain conditions, it can be shown that Gaussian noise is the worst case for an adversary using a linear classifier, which leads to an ability to show a certified bound on the robustness of the classifier. He also said this new work is much more scalable than past work and can be applied to something as large as ImageNet to get nontrivial bounds on the robustness of a classifier. Kolter said the work still includes only small perturbations, but the team can still acquire bounds on the scale of ImageNet, which is the first time the technique has been able to achieve this.
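The smoothing idea can be sketched with a Monte Carlo estimate. The base classifier, noise level, and sample count below are invented for illustration, and the confidence-interval correction used in practice to make the bound rigorous is omitted:

```python
import random
from statistics import NormalDist

random.seed(4)

# Base classifier: a simple linear decision rule (invented for illustration).
def base_classify(x):
    return 1 if 2.0 * x[0] - x[1] > 0 else 0

def smoothed_certify(x, sigma=0.5, n=2000):
    # Randomized smoothing: classify many Gaussian-noised copies of x,
    # then convert the top class's vote share p into a certified L2
    # radius R = sigma * Phi^{-1}(p), in the style of a certified bound.
    votes = sum(base_classify([xi + random.gauss(0, sigma) for xi in x])
                for _ in range(n))
    p = max(votes, n - votes) / n
    label = 1 if votes > n - votes else 0
    radius = sigma * NormalDist().inv_cdf(min(p, 1 - 1e-9))
    return label, radius

label, radius = smoothed_certify([1.0, 0.0])
print(label, radius)  # confident inputs earn a larger certified radius
```

The guarantee holds for the smoothed classifier rather than the base model, but because certification only requires sampling, it scales to networks as large as ImageNet classifiers.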
Alternate Physical Attacks
Kolter briefly addressed physical-world attacks, noting that his team had developed different types of attacks than those presented by Li. Rather than looking at how a stop sign could be modified, for example, they looked at what would happen if a physical sticker is placed over the camera of an autonomous car, causing a universal perturbation. The set of allowable patterns in this scenario is very restricted compared to attacks in the digital domain. To defend against such attacks, researchers are looking at different notions of distance in image space, such as signed distance. The team is developing methods to capture spatial changes such as how much mass in an image is moved, as well as image translations and rotations. This work leads to a more natural capture of different spatial changes and helps in building robust models that defend against these attacks.
10 The Modified National Institute of Standards and Technology database (MNIST) is a large database commonly used for training image processing systems and for training and testing in machine learning.
Kolter closed by noting that much work still needs to be done on scaling of robust models, randomization-based approaches for improving robustness, and strategies for moving toward human semantic understanding of perturbation regions.
Venables moderated an open question-and-answer session with the panelists and other workshop attendees.
Vorobeychik asked the panelists to elaborate further on how defenders would deal with attacks that modify physical objects, and how to create an appropriate abstraction of the constraints on such attacks. Kolter posited that researchers might consider creating generative models of the physical world. Such models have already been shown to do a good job at capturing semantics; perhaps those same tools could be used to capture the semantics of the world better—however, they are themselves subject to attack, pointing to a need for improved robustness and reliability before they are deployed.
Building on this point, Papernot added that as long as researchers lack any form of input validation, they will never be able to produce a model that responds correctly to all the possible inputs that could be presented to it. Generative models could be helpful in understanding the distribution being modeled; however, no current model can be deployed in a security- or safety-sensitive domain, because there will still be adversarial examples very close to the decision boundary that are nonetheless classified with high confidence.
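One way to picture the input-validation idea Papernot raises is a density check against the training distribution: inputs that are implausible under the data the model was trained on are rejected rather than classified. The sketch below is a minimal illustration; the diagonal Gaussian fit and the threshold value are assumptions for exposition, not a deployable defense.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for in-distribution training data (e.g., patient records).
train = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))
mu, sigma = train.mean(axis=0), train.std(axis=0)

def log_density(x):
    # Log-likelihood of x under a diagonal Gaussian fit to the training data.
    z = (x - mu) / sigma
    return -0.5 * np.sum(z ** 2 + np.log(2 * np.pi * sigma ** 2))

def validate(x, threshold=-20.0):
    # Reject inputs that are implausible under the training distribution,
    # rather than letting the classifier guess on them.
    return log_density(x) >= threshold

print(validate(np.array([0.5, -0.3])))   # typical input: accepted
print(validate(np.array([50.0, 50.0])))  # far out of distribution: rejected
```

A check like this catches only inputs far from the data manifold; as the discussion notes, adversarial examples that sit near the decision boundary but inside the modeled distribution would still pass.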
Li added that researchers need to expand the range of semantic meanings that can be explored, beyond colors and textures, for example; these would serve as soft constraints on perturbations in the physical world. Vorobeychik, playing devil's advocate, suggested that if the community had a complete picture of semantics, then adversarial ML would no longer be needed; in reality, the community has no good models for semantics in the area of perception.
Tyler Moore, University of Tulsa, suggested that adversarial evasion is not the only way to attack an ML-based system. Rather than designing undetectable attacks, an adversary could generate a large number of fake attacks designed to elicit many false positives. The success of such an approach lies in forcing the defender to expend significant resources on manual intervention. Kolter agreed that this kind of attack is different from the ones discussed by the panel. He sees it as akin to data-poisoning attacks, in which the attacker injects examples to interfere with the training of a classifier, noting that such attacks are very different from attacks on an existing trained classifier.
Li added that this sort of attack is also possible against a trained classifier: instead of having a malicious example identified as benign, the attacker's goal is for the classifier to identify benign examples as malicious. She suggested that using dynamic thresholds could reduce the efficacy of such an attack. Papernot suggested a fail-safe default in which two models are actively deployed, one of them trained on new data as it comes in, and their results compared as a means of identifying potential false positives. While this would not prevent data poisoning over time, it would at least increase the adversary's required level of effort.
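Papernot's two-model fail-safe can be sketched as a simple disagreement check: predictions from a frozen model and a continuously retrained one are compared, and inputs on which they disagree are routed to manual review rather than trusted outright. The function name and label strings below are illustrative assumptions, not from the panel.

```python
def flag_disagreements(preds_stable, preds_fresh):
    """Return indices where a frozen model and a continuously retrained
    model disagree. Disagreements are routed to manual review (the
    fail-safe default) instead of being acted on automatically."""
    return [i for i, (a, b) in enumerate(zip(preds_stable, preds_fresh))
            if a != b]

# Hypothetical verdicts from the two deployed models on four inputs.
stable = ["benign", "malicious", "benign", "benign"]
fresh = ["benign", "malicious", "malicious", "benign"]

print(flag_disagreements(stable, fresh))  # index 2 needs human review
```

A poisoning campaign would have to shift both models consistently to slip past the comparison, which is the increased adversary effort the discussion refers to.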
Robustness in Practice
Wenke Lee raised the possibility that adversarial examples may not adhere to the constraints assumed in the robustness models. For instance, an evasion attack could be very different from any of the training examples, lying in a space to which the model is blind; provable robustness is limited by its underlying assumptions. Kolter agreed that his team demonstrated provable robustness under a specific threat model, but suggested that, as with Vorobeychik's RobustML, a system can perform well even if the underlying assumptions do not exactly reflect practical conditions. Vorobeychik suggested that the jury is still out on the question of whether approximate models can be validated. They seem obviously wrong, but, he argued, they can still be useful against real attacks. He said that more work is needed in the area of model validation in order to demonstrate which models will work in practice. The assumptions underlying the models presented are easy to describe mathematically; in general, they must be tested across multiple domains with multiple variations to develop an understanding of their extensibility.
The Role of Machine Learning in the Science of Cybersecurity
Workshop chair Fred Chang returned to the notion of a science of cybersecurity, pointing to two of its established areas: cryptography and formal methods. He asked whether ML might be on a similar path. Papernot said ML is different from cryptography or formal methods because it has many different application domains; instances of ML are domain-specific and may not transfer from one domain to another. For example, a robust vision model might not prove robust for malware detection, and the differential privacy formalism he discussed has its own limitations. ML formalisms may not generalize beyond a given domain. Kolter commented that he sees his work as building formal methods for ML, although the field is still far from this goal, rather than as cybersecurity per se. Lee agreed that the domains are distinct, and expressed a hope that at some point the ML community's progress might be translatable to cybersecurity.