The Forum on Cyber Resilience of the National Academies of Sciences, Engineering, and Medicine hosted the Workshop on Recoverability as a First-Class Security Objective on February 8, 2018, in Washington, D.C.
The workshop featured presentations from several experts in industry, research, and government roles who spoke about the complex facets of recoverability—that is, the ability to restore normal operations and security in a system affected by software or hardware failure or a deliberate attack. The workshop concluded with a lively discussion of the presentations, the uncertainties and complexities involved, and ideas for future research on resilience and recovery.
The meeting was open to the public. This proceedings was created from the presenters’ slides and a full transcript of the meeting and is intended to serve as a public record of the workshop presentations and discussions.
Fred B. Schneider, Forum Chair
Fred Schneider, the Samuel B. Eckert Professor of Computer Science at Cornell University, member of the National Academy of Engineering, and workshop chair, opened the meeting with an overview of the National Academies’ Forum on Cyber Resilience.
Forum workshops are designed to bring together experts in relevant fields to share their perspectives on issues that the forum views as critical topics in cyber infrastructure and cyber resilience. The forum takes a deliberately wide view of cyber infrastructure as it relates to technology’s increasingly ubiquitous presence in our lives. Workshops typically address both technical and policy implications of selected topics.
This workshop’s focus was on recovery of technological systems after a failure or intrusion. While it may seem mundane, recoverability is much more complicated than most people realize, and a lack of recoverability can have severe consequences, Schneider said. The purpose of the workshop’s presentations and discussions is to probe the issue deeply by identifying key recovery issues and understanding their importance, both within the technology world and for the wider public.
Butler W. Lampson, Microsoft Research
Butler Lampson, technical fellow at Microsoft Research, kicked off the workshop with a keynote address framing the issue of recovery. He took a broad view of recovery, discussing its meaning and implications, its role within cybersecurity, and pathways for improvement.
Recoverability Scope and Goals
Recovery is a vital component of cybersecurity because it provides a means to move forward after breaches and successful attacks occur. Software bugs, hardware failures, and deliberate attacks can affect all types of systems, making recoverability relevant not only for cloud or server systems but also for end-user items such as personal computers, phones, and Internet of Things (IoT) devices.
Lampson described a successful recovery as one that regains a system’s availability, integrity, and confidentiality. First, normal service should be made available quickly.1 Second, the software and hardware should be returned to a state of integrity that is both safe and current, attributes that can be difficult to achieve at the same time. Finally, it is important to restore confidentiality, although once secrets have been made public, entirely preventing subsequent access to them can be very difficult, if not impossible.
1 In some contexts, returning to a state of degraded or minimum essential services may be an important interim step toward normal service.
Recasting Recovery within Cybersecurity
Lampson noted that most people consider the goal of cybersecurity efforts and practices to be preventing intrusions and their negative consequences. Although this is a worthy goal, in his view it is nearly impossible to prevent all intrusions. He posited that it may be more effective, and more realistic, to pursue two other security goals simultaneously: to recover easily from security breaches and to punish attackers. Indeed, many of the steps taken toward prevention (e.g., inventory control of hardware and software, logging and monitoring strategies) can also assist in recovery efforts. These goals, recovery and accountability, place greater emphasis on deterrence, representing what Lampson calls a “retroactive” view of security. Accountability coupled with recovery, which limits the impact of an attack and thereby makes it a less attractive course, may help deter bad actors.
Current cybersecurity approaches provide some minimal facilities for prevention and recovery, such as securing simple programs and isolating complex programs or sanitizing their inputs. However, Lampson said current approaches fall short in securing more complex systems or maintaining security after changes are made. He also observed that users cannot be expected to be skilled or informed enough to make good security decisions, which further compounds the challenge.
Lampson noted that the standard approach to cybersecurity is to isolate a system, limit access to it, and include a software guard that monitors access attempts. He observed that in theory, this setup prevents security breaches, but we know that in practice, it does not always work. These experiences, he argued, suggest a need to change how we perceive and pursue security goals.
First, instead of trying to secure everything, Lampson posited that we should prioritize what is truly important. He used the analogy of a bank vault: expensive and inconvenient, but very effective for storing the most valuable items. Second, instead of trying to prevent all possible attacks, we should react to actual attacks and focus on deterrence and recovery, which is largely how security is tackled in the non-cyber world. Burglars, he noted, are not dissuaded by complex household locks but rather by the fear of jail time.
Retroactive security may not be perfect, but it is better than the current system, Lampson argued. It does not mean that there are no bugs in the code; it means that people who take advantage of the bugs get punished. In fact, according to Lampson, the aspects of current cybersecurity approaches that work well often draw their value from retroactive measures, such as the ability to undo fraudulent financial transactions.
Background and Context
Foundations for Recoverability
Lampson argued that in order for retroactive security to work, there needs to be a solid and secure core of the given system that can reboot to a clean state after a breach, that can verify signatures of both regular distributions and patches, and that can capture an audit log to determine where problems have occurred. The core should be able to bring that clean state up to date by replaying a redo log, which is preferable to restoring from a system-wide backup (where work done since the backup could be lost). He noted that there should also be a secure configuration, with a list of acceptable software and trusted principals who have access to it. In addition, audit logs and user authentication can be used to detect unusual behavior.
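The recovery path described here, resetting to a clean state and then replaying logged work, can be sketched in a few lines. The sketch below is illustrative only; the `RedoLog` class and its operations are invented for this example and do not correspond to any particular system Lampson named:

```python
class RedoLog:
    """Append-only log of operations applied since the last known-clean state."""

    def __init__(self):
        self.entries = []

    def record(self, op, key, value=None):
        # Each completed operation is recorded before it is considered durable.
        self.entries.append({"op": op, "key": key, "value": value})


def apply_entry(state, entry):
    """Apply a single logged operation to a key-value state."""
    if entry["op"] == "set":
        state[entry["key"]] = entry["value"]
    elif entry["op"] == "delete":
        state.pop(entry["key"], None)


def recover(clean_state, log):
    """Rebuild the current state by replaying the redo log onto a clean state."""
    state = dict(clean_state)  # start from the verified clean image
    for entry in log.entries:
        apply_entry(state, entry)
    return state
```

Because the log records only completed operations, replaying it onto a verified clean image recovers current work without trusting the possibly corrupted running system, which is the advantage over restoring an old backup.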
Lampson explained that in addition to helping restore the system to a good state, these foundational elements enable accountability, and ultimately blame, so that offenders can be found and punished, whether with jail time, fines, being fired, or some other accountability measure. Lampson said that if widely implemented, a secure foundation could handle small annoyances, such as spam, as well as larger problems, such as a major security breach by a state-sponsored actor.
Today, restoring the integrity of a system as part of recovery is often very difficult and time intensive. Lampson pointed to the work of Taesoo Kim, Georgia Institute of Technology, as a promising direction in which to head. Kim and his colleagues propose an approach based on the notion of “selective redo” or “selective undo,” where only the bad acts and their consequences are undone during recovery and restoration. He said their research demonstrates that selective undo can significantly improve recoverability in a variety of scenarios.2
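The idea behind selective undo can be illustrated with a toy log in which each entry records what it depended on: entries tainted by a bad action, directly or transitively, are skipped during replay, while everything else is re-executed. This is a simplified sketch of the concept only, not the mechanism from Kim et al.'s paper:

```python
def tainted_closure(entries, bad_ids):
    """Propagate taint: an entry is tainted if it is bad or depends on a tainted entry.

    Entries are assumed to appear in causal (log) order, so one forward pass suffices.
    """
    tainted = set(bad_ids)
    for e in entries:
        if any(d in tainted for d in e.get("deps", [])):
            tainted.add(e["id"])
    return tainted


def selective_redo(clean_state, entries, bad_ids):
    """Replay only the entries outside the tainted closure onto a clean state."""
    tainted = tainted_closure(entries, bad_ids)
    state = dict(clean_state)
    for e in entries:
        if e["id"] not in tainted:
            state[e["key"]] = e["value"]
    return state
```

The benefit over a full restore is visible even in this toy: legitimate work with no dependency on the attack survives recovery untouched.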
Confidentiality also suffers when systems are breached, and secrets cannot be retrieved once publicly revealed. However, Lampson observed, secrets can be protected in ways that make them harder to find, an approach pursued in Europe under the “right to be forgotten” approach. Or, he said, confidentiality could be enhanced by stricter enforcement of rules regarding the use of data, such as by implementing identity tags and other mechanisms to build a chain of data tracking and handling within those systems that are under the purview of effective government regulation. Implementing such measures more widely would be difficult, but well worth the effort, Lampson argued.
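A chain of identity tags of the sort described here might, in its simplest form, attach a handling history to each piece of data so that uses can later be audited against rules. The `TaggedValue` type and `audit` function below are hypothetical illustrations, not a deployed mechanism:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedValue:
    """A value carrying a chain of identity tags recording who handled it and why."""

    value: object
    chain: tuple = ()

    def handled_by(self, principal, purpose):
        # Return a new tagged value rather than mutating history,
        # so the handling chain is effectively append-only.
        return TaggedValue(self.value, self.chain + ((principal, purpose),))


def audit(tagged, allowed_purposes):
    """Return the handling records whose stated purpose is not permitted."""
    return [rec for rec in tagged.chain if rec[1] not in allowed_purposes]
```

Enforcement would still depend on every handler participating honestly, which is why Lampson ties such mechanisms to systems under effective government regulation.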
Special Considerations for Internet of Things
Lampson then turned to the IoT, which he said is especially problematic with regard to cybersecurity. Unlike traditional computers, IoT devices are embedded in the real world, where they have physical impacts, and the business models behind most IoT devices result in millions of lines of code being written by inexpert programmers. In addition, update capabilities are often not as robust as in other sorts of systems, making recovery from malicious modification more challenging. The consequences of failure in these devices can be serious; malfunctions in a “smart” traffic light, for example, could cause major traffic accidents.
2 T. Kim et al., “Intrusion Recovery Using Selective Re-Execution,” Proceedings of the 9th Symposium on Operating Systems Design and Implementation, 2010.
To address such challenges, Lampson proposed several approaches IoT companies could take. First, it could be useful to increase the emphasis on building IoT devices from common components that are thoroughly tested and formally verified, rather than from separate, made-to-order components for each device. Second, Lampson urged that “safety-critical code” be identified and isolated for even stricter testing. An extension of this approach, he said, is to architecturally separate safety-critical code from non-critical code. In a smart traffic light, for example, safety-critical code would tell the lights to always be red in at least one direction and to stay yellow for 3 seconds before switching to red. The code for other, less critical functions of the light could be physically separate from these critical components and unable to override the safety-critical code. Lampson noted that while such approaches are feasible in a smaller system such as a traffic light, implementing them in more complex IoT devices, such as self-driving cars, will be more challenging.
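The traffic-light example can be made concrete with a small guard function that non-critical code must call for every light change. The state names, invariants, and function below are illustrative assumptions for this sketch, not an actual controller design:

```python
RED, YELLOW, GREEN = "red", "yellow", "green"
MIN_YELLOW_SECONDS = 3


def safe_transition(ns, ew, requested_ns, requested_ew, yellow_elapsed):
    """Safety core: reject any requested light state that violates the invariants.

    Invariants (per the traffic-light example):
      1. At least one direction is always red.
      2. A yellow light holds for at least MIN_YELLOW_SECONDS before turning red.
    Non-critical code may request transitions but cannot override this check.
    """
    # Invariant 1: at least one direction must be red.
    if requested_ns != RED and requested_ew != RED:
        return ns, ew  # refuse: keep the current, known-safe state
    # Invariant 2: yellow must persist long enough before switching to red.
    if ns == YELLOW and requested_ns == RED and yellow_elapsed < MIN_YELLOW_SECONDS:
        return ns, ew
    if ew == YELLOW and requested_ew == RED and yellow_elapsed < MIN_YELLOW_SECONDS:
        return ns, ew
    return requested_ns, requested_ew
```

The architectural point is that only this small function needs the strictest testing and verification; the bulk of the device's code, however buggy, can at worst request a transition the guard refuses.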
Questions and Discussion
Lampson concluded by posing several questions for attendees to consider throughout the workshop:
- Can the recoverability strengths of the biggest cloud services, which are thought to be the most secure, be scaled down for smaller systems?
- Can we better incentivize design for recovery and make selective undo more practical?
- Is there an easier way to prevent secrets from being published?
- Finally, how do we know whether our current foundations are strong enough to allow for good recovery?
In closing, Lampson emphasized that recovery is a vital component of cybersecurity and that answering such questions will be important to devising effective solutions.
Tadayoshi Kohno, University of Washington, asked whether a shared definition of recovery exists and where researchers should focus their efforts, given the many challenges in this space and the many dimensions of recovery, such as timing, effects on interconnected systems, and the needs of different stakeholders. Lampson said that in his view, recovery is restoring system software or hardware to a good, current state while continuing to deliver service.
With regard to the challenge of preventing breaches of confidentiality, Susan Landau, Tufts University, noted that the HTTPA protocol, developed by a team of Massachusetts Institute of Technology researchers, explored an approach to establishing data provenance for various purposes, including for tracking privacy or copyright violations.
Peter Swire, Georgia Institute of Technology, pointed out that other domains, such as counterterrorism, are moving toward emphasizing prevention rather than punishment—the opposite of the retroactive approach Lampson proposes. It is a comfortable myth to believe that our lives or our computers and devices can be made fully secure, Lampson said, but it is not possible to prevent all threats from being realized. Focusing on prevention, he argued, perpetuates the myth of security, while focusing on recovery and punishment represents a more realistic view.