The Forum on Cyber Resilience of the National Academies of Sciences, Engineering, and Medicine hosted the Workshop on Recoverability as a First-Class Security Objective on February 8, 2018, in Washington, D.C.
The workshop featured presentations from several experts in industry, research, and government roles who spoke about the complex facets of recoverability—that is, the ability to restore normal operations and security in a system affected by software or hardware failure or a deliberate attack. The workshop concluded with a lively discussion of the presentations, the uncertainties and complexities involved, and ideas for future research on resilience and recovery.
The meeting was open to the public. This proceedings was created from the presenters’ slides and a full transcript of the meeting and is intended to serve as a public record of the workshop presentations and discussions.
Fred B. Schneider, Forum Chair
Fred Schneider, the Samuel B. Eckert Professor of Computer Science at Cornell University, member of the National Academy of Engineering, and workshop chair, opened the meeting with an overview of the National Academies’ Forum on Cyber Resilience.
Forum workshops are designed to bring together experts in relevant fields to share their perspectives on issues that the forum views as critical topics in cyber infrastructure and cyber resilience. The forum takes a deliberately wide view of cyber infrastructure as it relates to technology’s increasingly ubiquitous presence in our lives. Workshops typically address both technical and policy implications of selected topics.
This workshop’s focus was on recovery of technological systems after a failure or intrusion. While it may seem mundane, recoverability is much more complicated than most people realize, and a lack of recoverability can have severe consequences, Schneider said. The purpose of the workshop’s presentations and discussions is to probe the issue deeply by identifying key recovery issues and understanding their importance, both within the technology world and for the wider public.
Butler W. Lampson, Microsoft Research
Butler Lampson, technical fellow at Microsoft Research, kicked off the workshop with a keynote address framing the issue of recovery. He took a broad view of recovery, discussing its meaning and implications, its role within cybersecurity, and pathways for improvement.
Recoverability Scope and Goals
Recovery is a vital component of cybersecurity because it provides a means to move forward after breaches and successful attacks occur. Software bugs, hardware failures, and deliberate attacks can affect all types of systems, making recoverability relevant not only for cloud or server systems but also for end-user items such as personal computers, phones, and Internet of Things (IoT) devices.
Lampson described a successful recovery as one that regains a system’s availability, integrity, and confidentiality. First, normal service should be made available quickly.1 Second, the software and hardware should be returned to a state of integrity that is both safe and current, attributes that can be difficult to achieve at the same time. Finally, it is important to restore confidentiality, although once secrets have been made public, entirely preventing subsequent access to them can be very difficult, if not impossible.
1 In some contexts, returning to a state of degraded or minimum essential services may be an important interim step toward normal service.
Recasting Recovery within Cybersecurity
Lampson noted that most people consider the goal of cybersecurity efforts and practices to be preventing intrusions and their negative consequences. Although this is a worthy goal, in his view it is nearly impossible to prevent all intrusions. He posited that it may be more effective, and more realistic, to pursue two other security goals simultaneously: to recover easily from security breaches and to punish attackers. Indeed, many of the steps taken toward prevention (e.g., inventory control of hardware and software, logging and monitoring strategies) can also assist in recovery efforts. These goals, recovery and accountability, place greater emphasis on deterrence, representing what Lampson calls a “retroactive” view of security. Accountability coupled with recovery, which limits the impact of an attack and thereby makes it a less attractive course, may help deter bad actors.
Current cybersecurity approaches provide some minimal facilities for prevention and recovery, such as securing simple programs and isolating complex programs or sanitizing their inputs. However, Lampson said current approaches fall short in securing more complex systems or maintaining security after changes are made. He also observed that users cannot be expected to be skilled or informed enough to make good security decisions, which further compounds the challenge.
Lampson noted that the standard approach to cybersecurity is to isolate a system, limit access to it, and include a software guard that monitors access attempts. He observed that in theory, this setup prevents security breaches, but we know that in practice, it does not always work. These experiences, he argued, suggest a need to change how we perceive and pursue security goals.
First, instead of trying to secure everything, Lampson posited that we should prioritize what is truly important. He used the analogy of a bank vault: expensive and inconvenient, but very effective for storing the most valuable items. Second, instead of trying to prevent all possible attacks, we should react to actual attacks and focus on deterrence and recovery, which is largely how security is tackled in the non-cyber world. Burglars, he noted, are not dissuaded by complex household locks but rather by the fear of jail time.
Retroactive security may not be perfect, but it is better than the current system, Lampson argued. It does not mean that there are no bugs in the code; it means that people who take advantage of the bugs get punished. In fact, according to Lampson, the aspects of current cybersecurity approaches that work well often draw their value from retroactive measures, such as the ability to undo fraudulent financial transactions.
Background and Context
Foundations for Recoverability
Lampson argued that in order for retroactive security to work, there needs to be a solid and secure core of the given system that can reboot to a clean state after a breach, that can verify signatures of both regular distributions and patches, and that can capture an audit log to determine where problems have occurred. The core should be able to bring that clean state up to date by replaying a redo log, which is preferable to restoring from a system-wide backup (where work done since the backup could be lost). He noted that there should also be a secure configuration, with a list of acceptable software and trusted principals who have access to it. In addition, audit logs and user authentication can be used to detect unusual behavior.
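The recovery path described here, resetting to a clean state and then replaying logged work, can be sketched in a few lines. The sketch below is illustrative only; the `RedoLog` class and its operations are invented for this example and do not correspond to any particular system Lampson named:

```python
class RedoLog:
    """Append-only log of operations applied since the last known-clean state."""

    def __init__(self):
        self.entries = []

    def record(self, op, key, value=None):
        # Each completed operation is recorded before it is considered durable.
        self.entries.append({"op": op, "key": key, "value": value})


def apply_entry(state, entry):
    """Apply a single logged operation to a key-value state."""
    if entry["op"] == "set":
        state[entry["key"]] = entry["value"]
    elif entry["op"] == "delete":
        state.pop(entry["key"], None)


def recover(clean_state, log):
    """Rebuild the current state by replaying the redo log onto a clean state."""
    state = dict(clean_state)  # start from the verified clean image
    for entry in log.entries:
        apply_entry(state, entry)
    return state
```

Because the log records only completed operations, replaying it onto a verified clean image recovers current work without trusting the possibly corrupted running system, which is the advantage over restoring an old backup.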
Lampson explained that in addition to helping restore the system to a good state, these foundational elements enable accountability, and ultimately blame, so that offenders can be found and punished, whether with jail time, fines, being fired, or some other accountability measure. Lampson said that if widely implemented, a secure foundation could handle small annoyances, such as spam, as well as larger problems, such as a major security breach by a state-sponsored actor.
Today, restoring the integrity of a system as part of recovery is often very difficult and time intensive. Lampson pointed to the work of Taesoo Kim, Georgia Institute of Technology, as a promising direction in which to head. Kim and his colleagues propose an approach based on the notion of “selective redo” or “selective undo,” where only the bad acts and their consequences are undone during recovery and restoration. He said their research demonstrates that selective undo can significantly improve recoverability in a variety of scenarios.2
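The idea behind selective undo can be illustrated with a toy log in which each entry records what it depended on: entries tainted by a bad action, directly or transitively, are skipped during replay, while everything else is re-executed. This is a simplified sketch of the concept only, not the mechanism from Kim et al.'s paper:

```python
def tainted_closure(entries, bad_ids):
    """Propagate taint: an entry is tainted if it is bad or depends on a tainted entry.

    Entries are assumed to appear in causal (log) order, so one forward pass suffices.
    """
    tainted = set(bad_ids)
    for e in entries:
        if any(d in tainted for d in e.get("deps", [])):
            tainted.add(e["id"])
    return tainted


def selective_redo(clean_state, entries, bad_ids):
    """Replay only the entries outside the tainted closure onto a clean state."""
    tainted = tainted_closure(entries, bad_ids)
    state = dict(clean_state)
    for e in entries:
        if e["id"] not in tainted:
            state[e["key"]] = e["value"]
    return state
```

The benefit over a full restore is visible even in this toy: legitimate work with no dependency on the attack survives recovery untouched.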
Confidentiality also suffers when systems are breached, and secrets cannot be retrieved once publicly revealed. However, Lampson observed, secrets can be protected in ways that make them harder to find, an approach pursued in Europe under the “right to be forgotten” approach. Or, he said, confidentiality could be enhanced by stricter enforcement of rules regarding the use of data, such as by implementing identity tags and other mechanisms to build a chain of data tracking and handling within those systems that are under the purview of effective government regulation. Implementing such measures more widely would be difficult, but well worth the effort, Lampson argued.
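A chain of identity tags of the sort described here might, in its simplest form, attach a handling history to each piece of data so that uses can later be audited against rules. The `TaggedValue` type and `audit` function below are hypothetical illustrations, not a deployed mechanism:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedValue:
    """A value carrying a chain of identity tags recording who handled it and why."""

    value: object
    chain: tuple = ()

    def handled_by(self, principal, purpose):
        # Return a new tagged value rather than mutating history,
        # so the handling chain is effectively append-only.
        return TaggedValue(self.value, self.chain + ((principal, purpose),))


def audit(tagged, allowed_purposes):
    """Return the handling records whose stated purpose is not permitted."""
    return [rec for rec in tagged.chain if rec[1] not in allowed_purposes]
```

Enforcement would still depend on every handler participating honestly, which is why Lampson ties such mechanisms to systems under effective government regulation.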
Special Considerations for Internet of Things
Lampson then turned to the IoT, which he said is especially problematic with regard to cybersecurity. Unlike traditional computers, IoT devices are embedded in the real world, where they have physical impacts, and the business models behind most IoT devices result in millions of lines of code being written by inexpert programmers. In addition, update capabilities are often not as robust as in other sorts of systems, making recovery from malicious modification more challenging. The consequences of failure in these devices can be serious; malfunctions in a “smart” traffic light, for example, could cause major traffic accidents.
2 T. Kim et al., “Intrusion Recovery Using Selective Re-Execution,” Proceedings of the 9th Symposium on Operating Systems Design and Implementation, 2010.
To address such challenges, Lampson proposed several approaches IoT companies could take. First, it could be useful to increase the emphasis on building IoT devices from common components that are thoroughly tested and formally verified, rather than from separate, made-to-order components for each device. Second, Lampson urged that “safety-critical code” be identified and isolated for even stricter testing. An extension of this approach, he said, is to architecturally separate safety-critical code from non-critical code. In a smart traffic light, for example, safety-critical code would tell the lights to always be red in at least one direction and to stay yellow for 3 seconds before switching to red. The code for other, less critical functions of the light could be physically separate from these critical components and unable to override the safety-critical code. Lampson noted that while such approaches are feasible in a smaller system such as a traffic light, implementing them in more complex IoT devices, such as self-driving cars, will be more challenging.
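The traffic-light example can be made concrete with a small guard function that non-critical code must call for every light change. The state names, invariants, and function below are illustrative assumptions for this sketch, not an actual controller design:

```python
RED, YELLOW, GREEN = "red", "yellow", "green"
MIN_YELLOW_SECONDS = 3


def safe_transition(ns, ew, requested_ns, requested_ew, yellow_elapsed):
    """Safety core: reject any requested light state that violates the invariants.

    Invariants (per the traffic-light example):
      1. At least one direction is always red.
      2. A yellow light holds for at least MIN_YELLOW_SECONDS before turning red.
    Non-critical code may request transitions but cannot override this check.
    """
    # Invariant 1: at least one direction must be red.
    if requested_ns != RED and requested_ew != RED:
        return ns, ew  # refuse: keep the current, known-safe state
    # Invariant 2: yellow must persist long enough before switching to red.
    if ns == YELLOW and requested_ns == RED and yellow_elapsed < MIN_YELLOW_SECONDS:
        return ns, ew
    if ew == YELLOW and requested_ew == RED and yellow_elapsed < MIN_YELLOW_SECONDS:
        return ns, ew
    return requested_ns, requested_ew
```

The architectural point is that only this small function needs the strictest testing and verification; the bulk of the device's code, however buggy, can at worst request a transition the guard refuses.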
Questions and Discussion
Lampson concluded by posing several questions for attendees to consider throughout the workshop:
- Can the recoverability strengths of the biggest cloud services, which are thought to be the most secure, be scaled down for smaller systems?
- Can we better incentivize design for recovery and make selective undo more practical?
- Is there an easier way to prevent secrets from being published?
- Finally, how do we know whether our current foundations are strong enough to allow for good recovery?
In closing, Lampson emphasized that recovery is a vital component of cybersecurity and that answering such questions will be important to devising effective solutions.
Tadayoshi Kohno, University of Washington, asked whether a shared definition of recovery exists and where researchers should focus their efforts, given the many challenges in this space and the many dimensions of recovery, such as timing, effects on interconnected systems, and the needs of different stakeholders. Lampson said that in his view, recovery is restoring system software or hardware to a good, current state while continuing to deliver service.
With regard to the challenge of preventing breaches of confidentiality, Susan Landau, Tufts University, noted that the HTTPA protocol, developed by a team of Massachusetts Institute of Technology researchers, explored an approach to establishing data provenance for various purposes, including for tracking privacy or copyright violations.
Peter Swire, Georgia Institute of Technology, pointed out that other domains, such as counterterrorism, are moving toward emphasizing prevention rather than punishment—the opposite of the retroactive approach Lampson proposes. It is a comfortable myth to believe that our lives or our computers and devices can be made fully secure, Lampson said, but it is not possible to prevent all threats from being realized. Focusing on prevention, he argued, perpetuates the myth of security, while focusing on recovery and punishment represents a more realistic view.