In 2017, researchers discovered a vulnerability in microprocessors used in computers and devices all over the world. The vulnerability, named Spectre, combines side effects from caching and speculative execution, which are techniques that have been used for many years to increase the speed at which computers operate.
The challenge that Spectre and similar vulnerabilities such as Meltdown present is distinctive. First, the problems lie in computer hardware rather than software. And second, the vulnerable hardware is deployed in millions of devices manufactured by multiple companies worldwide. The discovery upends a number of common assumptions about cybersecurity and draws attention to the complexities of the global supply chain and global customer base for the vast range of devices and cloud capabilities that all computer users rely on. These sorts of vulnerabilities raise important questions about the interplay between hardware and software, the security of Internet of Things (IoT) devices and cloud deployments, and how and when different stakeholders should be notified when such vulnerabilities are discovered.
To explore the implications of this development, the Forum on Cyber Resilience of the National Academies of Sciences, Engineering, and Medicine hosted the workshop Beyond Spectre:
Confronting New Technical and Policy Challenges on October 3, 2018, in Washington, D.C.
The workshop began with an overview of the Spectre vulnerability, followed by panel discussions focusing on its implications for hardware and software engineering; cloud services and isolation; and international and national security and vulnerability disclosure. A range of experts from industry and academia were invited to relate their experiences with Spectre and explore repercussions for hardware, software, and policy. A final discussion summarized the day’s conversation and pointed to future areas of research.
The workshop was open to the public. This proceedings was created from the presenters’ slides and a full transcript of the meeting; it is intended to serve as a public record of the workshop presentations and discussions.
Fred B. Schneider, Cornell University, a member of the National Academy of Engineering and chair of the Forum on Cyber Resilience, opened the meeting with a brief introduction to the Forum on Cyber Resilience and a framing discussion for the day.
Schneider suggested that the discovery of the family of vulnerabilities known as Spectre and Meltdown represented a “teachable moment” for researchers, technologists, and policy makers. This form of vulnerability, he said, is “present in virtually every computer in use, transcends manufacturers as well as country of origin, and has enormous policy ramifications.” This development, he noted, also marks the start of a troubling new era in cyber intrusion—an era in which attackers are changing focus to target vulnerabilities in a computer’s hardware. Mitigations (measures that reduce a vulnerability’s severity) and patches are far more difficult to deploy for hardware than for software.
Schneider observed that although the security community has been grappling with how best to manage disclosure of vulnerabilities for a long time, Spectre highlights some of the policy and management
complexities surrounding vulnerability disclosure. As always, if a vulnerability is disclosed before it is fixed or mitigated, bad actors can take advantage of the opening while users and systems operators are awaiting protective measures. And if a vulnerability is discovered but never publicly disclosed, governments or other actors could weaponize it. But in the case of Spectre, the pervasiveness of the problem necessitated the involvement of many more entities than usual in the disclosure and mitigation process.
Schneider described how, in the typical disclosure process, the discoverer first notifies those who can mitigate or fix the vulnerability. Once a fix has been developed, which can take significant time, a patch is disclosed to those who can validate and disseminate it, and finally to those whose data are at risk. Sometimes, it is necessary to involve governments early in the process as well—for example, when an attack could affect national infrastructures. Yet, as the circle of awareness widens, it becomes more difficult to keep the vulnerability secret from potential attackers.
Once discovered, Schneider said, vulnerabilities ideally are mitigated or fixed. With Spectre, neither is completely feasible. Mitigating the attack risk in this case could reduce computing speed, might not be 100 percent effective, and might even cause the hardware to become nonfunctional. Fixing the vulnerability would require completely redesigning processors and disseminating the fix through a highly complex supply chain that includes producers of the chips, device manufacturers who put those chips in devices, and ultimately product users.
Schneider noted that Spectre is forcing a rethinking of the entire disclosure, mitigation, and repair process and, in a broader sense, is challenging our expectations of computers. People expect successive generations of hardware to perform ever better, but eliminating the Spectre vulnerability might well require the next generation to sacrifice performance in favor of security. New systems, he said, could see decreased performance, while systems currently in use might not be fixable at all.
Schneider described how the discovery of Spectre also highlighted key misconceptions about the relationship between hardware and software. Although often considered to be separate
entities, in reality they are not. Higher-level layers in the computing stack, such as applications, interact with lower layers, such as operating systems, instruction set architectures, processors, and microarchitecture. Software developers and system architects make assumptions about how hardware will behave by relying on abstractions of what the hardware actually does. However, caching and speculative execution, the functionality at the heart of the Spectre vulnerability, temporarily violate those abstractions, treating data in ways that break the usual assumptions software developers make about hardware. In the worst case, this class of problems exposes fundamental misconceptions about the contract between hardware and software.
Schneider described the general problem. Security at a given layer is typically achieved by making assumptions about what behaviors the lower layers allow and forbid. If those assumptions are violated, especially when the violation is intermittent and quickly reversed, the violation may be essentially impossible to detect, yet the higher layers of the system are no longer guaranteed to meet their security requirements. This is, in brief, what happened with Spectre. Moreover, as has become increasingly clear, lower-level interfaces are not specified precisely enough to support security guarantees about functionality implemented at higher levels. The result is a chink in the armor, which attackers are now exploiting.
Further complicating the picture, Schneider went on to say, is that computer hardware is typically time-multiplexed so that it can be shared by sets of processes or users. And the problematic assumptions violated by Spectre and related attacks are assumptions about isolation. An attacker thus has a way to learn secrets belonging to users who happen to be sharing a processor with that attacker.
Schneider noted that although Spectre is an important and pervasive vulnerability, there are many other vulnerabilities that are
easier to exploit in practice. Nevertheless, Spectre raises still more questions about how technology and policy can address complex vulnerabilities. How do we determine how likely it is that someone will use Spectre to launch an attack? Is it worth spending resources to fix it? Does all of the affected hardware need to be replaced, and is that even possible? Where should resources be invested—in supply chains, in changes to the microarchitecture, or in finding and fixing bugs? How should we best balance function, performance, security, and convenience in our computing systems?
In closing, Schneider highlighted the workshop’s goals—to address various areas of fallout from Spectre, including improved hardware and software, the effects on cloud services, and implications for national and international security—and expressed his hope that participants not only would ask questions but also suggest areas for future policy discussions and technological research to address these challenges.
Paul Kocher, independent researcher and member of the forum, was one of the researchers who discovered the Spectre vulnerability. He delivered a keynote address describing the process of Spectre’s discovery and disclosure, and he offered insights on its implications.
The trail that led Kocher to the discovery of Spectre began when a colleague asked him whether speculative execution—an optimization technique in which a task is performed in advance of knowing with certainty whether it will be needed—had any security implications. Kocher noted that this is a question that sits at the intersection of two important features of computing technology that are often at odds with each other: security and speed. Technological innovation has created faster and better processors, enabling computers to do amazing things, but these advances raise security risks that tend to receive far less attention. Kocher observed that Spectre has revealed these risks and the inherent flaws in our way of thinking about advances in computing technology.
Optimizing for Speed
Kocher noted that increasing processor speeds makes good business sense. One way to increase speed is to increase the clock rate of the microprocessor. From 1990 to approximately 2004, processors went from 33 million clock cycles per second to over 4 billion. Improvements in clock frequency since then have been at a much slower pace, for many technical and economic reasons, so the focus has changed to reducing the number of clock cycles needed to complete programs.
Kocher described the memory subsystem as one key area of optimization. Program execution comprises a series of instructions. In a nonoptimized setup, a processor retrieves each piece of information a program needs from external memory, adding a delay of several hundred clock cycles per access. Optimized processors increase performance by reducing retrieval time in several different ways. By far the most popular method, built into nearly every device in use today, is to keep recently or frequently used code and data in a cache, where they can be retrieved quickly. A small percentage of the data in a system are used frequently, and keeping those data cached can make accesses to them 100 or more times faster. Because of this huge performance gain, most existing devices have very complex memory caches.
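The arithmetic behind that speedup can be sketched with a rough average-latency model. The latency figures and hit rate below are illustrative assumptions chosen for the sketch, not measurements of any particular processor:

```python
# Back-of-envelope model of the caching speedup described above.
# Assumed numbers (illustrative only): ~300 cycles for a main-memory
# access, ~3 cycles for a cache hit, and a 99.9% hit rate on hot data.
MEM_CYCLES, CACHE_CYCLES, HIT_RATE = 300, 3, 0.999

uncached = MEM_CYCLES  # every access pays the full memory latency
cached = HIT_RATE * CACHE_CYCLES + (1 - HIT_RATE) * MEM_CYCLES

# Average speedup per access under these assumptions.
print(round(uncached / cached, 1))  # prints 91.0: close to the factor of 100 cited above
```

Even a small miss rate dominates the average, which is why real caches are engineered so aggressively to keep hot data resident.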
Other optimizations to improve performance include multitasking, performing instructions in a more efficient order, and speculative execution, where the processor guesses the likely path and works on the next instructions before it has determined whether it will actually need to execute them. Apart from the resulting gains in processing speed, these optimizations are invisible to programmers, and they are considered “correct” because they ultimately provide the right answer—allowing a program to function properly—regardless of the particular steps (including discarded missteps) taken to arrive at that answer. However, Kocher highlighted, speculative execution opens a serious security vulnerability.
Overlooking a Weakness
Kocher illustrated how speculative execution can be exploited to reveal secret information from a cache, recounting his own experiments in uncovering Spectre. Speculative execution involves the CPU guessing what operations the program might perform and beginning work on these in advance. When the processor guesses incorrectly, the CPU can perform operations that would never occur during normal (“in-order”) program execution. Furthermore, these speculatively executed instructions can, potentially, access arbitrary information in the computer’s memory, including sensitive data. Although the CPU tries to discard the results of erroneous calculations, these calculations can still leave behind subtle effects.
In a typical Spectre attack, the attacker manipulates the processor into speculatively executing the victim program’s instructions in a manner that loads a sensitive value from the victim (e.g., by reading from an attacker-chosen location in the victim’s memory area) and then uses this value to form the address for a second memory read. The speculative memory read modifies the state of the memory cache, allowing the attacker to determine sensitive data values even after the actual read operation is unwound. This works because, if the attacker subsequently performs a memory read at each potential address for the second read, the address actually touched during speculative execution returns much more quickly, while reads from the other locations are much slower because those data are uncached.1
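The cache-timing channel described above can be illustrated with a toy, purely software simulation. No real speculative execution happens here; a Python set stands in for the CPU cache, and all names and latency numbers are illustrative inventions for the sketch:

```python
# Toy simulation of the Spectre-style cache side channel: the victim's
# speculative read leaves one probe-array line in the "cache," and the
# attacker recovers the secret by timing a read of every candidate line.
FAST, SLOW = 10, 300     # simulated access latencies, in cycles (illustrative)

cache = set()            # indices of probe-array lines currently "cached"

def probe(line):
    """Return the simulated latency of touching one probe-array line."""
    latency = FAST if line in cache else SLOW
    cache.add(line)      # touching a line brings it into the cache
    return latency

def victim_speculative_read(secret_byte):
    # The speculatively executed read uses the secret value to choose
    # which probe-array line to touch, leaving that line cached even
    # after the misprediction is unwound.
    cache.add(secret_byte)

def recover_secret():
    # The attacker times a read of each of the 256 candidate lines;
    # only the line the victim touched comes back fast.
    timings = {line: probe(line) for line in range(256)}
    return min(timings, key=timings.get)

cache.clear()                    # attacker first flushes the probe array
victim_speculative_read(0x2A)    # victim "leaks" secret value 42 into cache state
print(recover_secret())          # prints 42
```

The point of the sketch is that the secret is never read directly: it is reconstructed entirely from which memory access is fast, which is exactly the residue that speculative execution fails to clean up.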
The situation with Spectre resulted from three fundamental gaps, Kocher said. First, software programmers do not typically take into account hidden complexities such as speculative execution in hardware. There is a widespread misunderstanding of modern computing architecture: while programmers are taught to design algorithms, software, and systems under the assumption that the hardware will perform the steps sequentially, in the order they are programmed, that assumption is not always sound. There are often important differences between what the programmer understands
1 The published version of this work is P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, et al., 2019, “Spectre Attacks: Exploiting Speculative Execution,” in Proceedings of the 40th IEEE Symposium on Security and Privacy, https://spectreattack.com/spectre.pdf.
and intends, and what the architecture is capable of. Second, security functions rely on the hardware executing code properly, but speculative execution can cause a processor to do things not expected by a software program. Moreover, defining what is proper with regard to hardware execution is not simple, and chip designers do not always agree. With respect to speculative execution, much of the time the computer predicts correctly what data will be needed. When it predicts incorrectly, the data are discarded, unused. However, crucially, what Spectre makes clear is that even these unused data can still be determined by attackers. The third problem Kocher pointed to is an expertise gap: few security experts specialize in hardware details, and few hardware designers specialize in security. This has created a situation in which vulnerabilities involving hardware or the hardware-software interface can slip through unnoticed.
A Complicated Disclosure Process
Kocher asserted that Spectre is nothing like a typical vulnerability. Typical vulnerabilities affect one company, can be fixed through patches or updates, and are considered bugs.
In contrast, multiple Spectre variants have been discovered by multiple people, and they involve a tangled web of technical issues with unprecedented implications for companies’ liability and reputation. As a result, their discovery became, in Kocher’s words, “a big, complicated mess” that landed everyone involved in very unfamiliar territory. Spectre affects virtually all processor makers, operating systems, cloud services, drivers, databases, and more. Kocher noted that most experts consider it impossible to patch or fix completely.
Kocher also described how the embargo and disclosure process for Spectre was as atypical as the vulnerability itself. The goal of such an embargo is to reduce the risk of exploitation before a vulnerability is patched. Typically, over a period of a few months, the vendor is notified, a patch is created and released, and the disclosure is made public. But a problem in the architecture, such as Spectre, cannot be fixed in a matter of months—manufacturers would have to redesign a processor, test it, integrate it into new machines, sell the new machines, and retire the old, vulnerable ones. Older processor models
continue to be used (e.g., in embedded applications) long after they have become obsolete for new laptops or desktops. Thus, Kocher said, it could be up to 30 years before vulnerable implementations cease being manufactured entirely, much less being deployed, even if new designs were available today.
The hardware-based nature of Spectre also makes mitigation particularly challenging. Some microcode-level mitigations have been attempted, but they significantly reduce processor performance, only partially address the overall issue, and rest on unrealistic assumptions about the availability of software updates; initial versions, rushed out with limited time for testing, introduced bugs that caused instability. Software-only approaches have also been challenging, Kocher explained, due to issues including differences in processor architectures, lack of information about proprietary hardware designs, lack of suitable development tools, performance impact, testing and verification challenges, and the difficulty of fixing legacy code.
After discovering the vulnerability, Kocher described how he followed the standard industry practice for responsible disclosure. He notified Microsoft, Intel, and other manufacturers, worked with other researchers to define the variants, and cooperated on establishing and respecting an embargo on disseminating information to the public about the vulnerability until vendors could take steps to address it. But that process quickly became more chaotic than usual. Kocher described how miscommunication, anxiety, and competing incentives and interests among companies, researchers, and the press led to confusion and a mishandling of information. Some companies were kept informed, while others were left in the dark. Each company took a different approach to notifying customers and handling press interactions, adding to the chaos.
A New Mindset on Security Vulnerabilities
Kocher emphasized that Spectre is not actually a “bug” as the term is generally understood. Speculative execution works as designed. The problem is with the design itself. That is, today’s computing architectures—the “contract” between hardware and software—are ambiguous and do not provide sufficient guarantees for security. The nature of the problem, he said, suggests a need for some fundamental changes in how we think about security, hardware, and software.
For example, Kocher suggested that security efforts should attempt to reconcile the hardware properties assumed by software with the guarantees provided by actual hardware implementations. Part of this could be addressed by giving software designers better tools that support secure design. Spectre is also a symptom of a mindset that overvalues performance and falsely assumes that the architecture guarantees security. Although it may be possible to retrofit a few security corrections onto existing architectures, it would be more effective in the long run, Kocher argued, to build security directly into the architecture. It is also important to be mindful of metrics for success and how they are communicated; speed gains are far easier to measure and market to consumers than security gains.
Kocher expects many future hardware vulnerabilities to be discovered, and he listed specific areas he believes are underexplored. These areas present risks of exploitable vulnerabilities affecting (1) confidentiality, due to timing channels as in Spectre or incomplete memory sanitization on context switches; (2) data integrity, due to cross talk between elements on chips and clock control timing skew; (3) data disclosure, due to differential power usage and power glitches, and low-probability errors due to race conditions in distributed systems; (4) availability, as serious as complete hardware destruction, due to nonvolatile memory write exhaustion, overheating, or electromigration; (5) improper operation, due to malicious fuse blowing and malicious field-programmable gate array bitstreams; and (6) inadequate manufacturing, infrastructure, and policy implementations (such as might occur from insecure factory
key programming, mandatory backdoors, and attacks against code signing keys). Issues that can destroy hardware are of particular concern, especially if exploited in a coordinated way by state-level actors, he said.
Kocher then put Spectre in context. Despite tremendous technical evolutionary leaps, we are in many ways stuck in the past with regard to computer system design. Current approaches to chip design are a legacy of the era of single-user offline PCs, rather than being optimized for today’s cloud service models. As consumers, we settle for decent reliability in noncritical applications like video games, where processing speed is paramount, but that level of assurance is insufficient for critical infrastructure such as banking.
Although chip design used to be constrained by what is manufacturable, we are now pushing the limits of what is understandable, Kocher said. This is risky, he noted, because complexity decreases security. Emphasizing that “faster” and “safer” are very different goals, Kocher asserted that there is no “Goldilocks” solution to the tug-of-war between them. Different activities require different levels of security, speed, risk tolerance, and verification.
Kocher closed with a story about the reindeer population on St. Matthew Island, which famously underwent a population boom and crash. First, food was plentiful and conditions for growth were good. Then, when the population peaked, resources became scarce and many reindeer died. In the computing realm, the technological benefits to society rose first, and computing became essential to the functioning of the world economy. But now, insecurity is rising faster, and its costs are steadily increasing. While technology has grown rapidly, insecurity will grow even faster, especially as devices become more complex and less well understood. Increasing speed brings marginal benefits but opens us up to new, incalculable risks. Eventually, the costs of insecurity will outpace the benefits. Where is the inflection point, Kocher asked, and where can we make changes before it is too late?
In a brief question-and-answer session, Butler Lampson, Microsoft, asked how reverting to a prespeculative execution and precaching operational model would affect performance speeds. Noting that speculative execution has been around for almost 30
years (although it became widespread only 15-20 years ago), Kocher estimated that chips without these optimizations would likely be noticeably slower.
Kocher added that fixing or mitigating Spectre could create a modest performance hit while still leaving systems vulnerable to attack from new variants. Even without speculative execution, processors would not be truly secure: caches and pipelining have flaws, and there are many other open vulnerabilities that should be addressed.2
Companies must reconsider the balance between profit and security, Kocher argued, and perhaps make inroads toward better security by creating more isolation in system layers, using different security attributes for different applications, and ending the reliance on a one-size-fits-all model.
2 The research community is exploring new approaches to computer architecture to begin to address these and related challenges.