Americans’ safety, productivity, comfort, and convenience depend on the reliable supply of electric power. The electric power system is a complex “cyber-physical” system composed of a network of millions of components spread out across the continent. These components are owned, operated, and regulated by thousands of different entities. Power system operators work hard to assure safe and reliable service, but large outages occasionally happen. Given the nature of the system, there is simply no way that outages can be completely avoided, no matter how much time and money is devoted to such an effort. The system’s reliability and resilience can be improved but never made perfect. Thus, system owners, operators, and regulators must prioritize their investments based on potential benefits. Most interruptions result from physical damage in a local part of the distribution system caused by weather, accidents, or aging equipment that fails. Less frequently, major storms and other natural phenomena, operations errors, and pernicious human actions can cause outages on the bulk power system (i.e., generators and high-voltage power lines) as well as on distribution systems.
RESILIENCE IS BROADER THAN RELIABILITY
This report of the Committee on Enhancing the Resilience of the Nation’s Electric Power Transmission and Distribution System focuses on identifying, developing, and implementing strategies to increase the power system’s resilience in the face of events that can cause large-area, long-duration outages: blackouts that extend over multiple service areas or states and last several days or longer. Resilience is not just about lessening the likelihood that these outages will occur. It is also about limiting the scope and impact of outages when they do occur, restoring power rapidly afterwards, and learning from these experiences to better deal with events in the future.
The power system has been undergoing dramatic changes in technology and governance. In some parts of the United States, power is still supplied by regulated, vertically integrated utilities that generate electricity in large power plants, move that power out over high-voltage transmission systems, and distribute it to end-use customers—all under that single utility’s control. In other parts of the country, electric utilities have been restructured to promote competitive markets, particularly in wholesale power sales between generators and electricity distribution companies. In the more market-oriented parts of the country, high-voltage transmission lines that connect wholesale buyers and sellers are regulated or publicly owned, as are most distribution systems that provide the poles, wires, and equipment to serve retail customers. However, the flows over those wires and customers’ responses are increasingly determined by market forces. Efforts to improve resilience must accommodate institutional and policy heterogeneity across the country.
There has been significant growth in instrumentation and automation at the level of the high-voltage, or bulk power, system. This allows the system to operate more efficiently and provides system operators with much better situational awareness; this can improve grid reliability and resilience in the face of outages, but this added complexity can also introduce cybersecurity vulnerabilities. Analogous technological advancements on distribution systems (i.e., “smart grids”)—including improved sensing, communication, automation technologies, and advanced metering infrastructure—are occurring piecemeal across the country.
In some states, such as Hawaii and California, distributed energy resources, including distributed generation, demand response, energy efficiency, customer-owned storage, microgrids, and electric vehicles, are a rapidly growing fraction of the overall resource mix that must be planned and managed to maintain grid reliability, resilience, and security. However, despite these developments, for at least the next two decades, most U.S. customers will continue to depend on the functioning of the large-scale, interconnected, tightly organized, and hierarchically structured electric grid.
Strategies to enhance electric power resilience must accommodate both a diverse set of technical and institutional arrangements and a wide variety of hazards. There is no “one-size-fits-all” solution to avoiding, planning for, coping with, and recovering from major outages.
FRAMEWORK AND ORGANIZATION
Chapter 1 provides a brief introduction to the electricity system and motivation for this report. Chapter 2 summarizes the present state of the electricity system and the various ways it may evolve in the future, as well as metrics used to monitor grid reliability and resilience. Chapter 3 identifies, discusses, and compares a range of natural hazards and accidental and pernicious human actions that could cause major disruptions in service. Many of these, listed in Box S.1, have caused outages or impacted electricity system functions at varying scales over the past 30 years, either in the United States or globally. Others hold the potential to become major causes of disruption in the future.
Building a strategy to increase system resilience requires an understanding of a wide range of preparatory, preventative, and remedial actions, as well as how these impact planning, operation, and restoration over the entire life cycle of different kinds of grid failures. Strategies must be crafted with awareness and understanding of the temporal arc of a major outage, as well as how the needs differ from one type of event to another. It is also important to differentiate between actions designed to make the grid more robust and resilient to failure (e.g., wind-resistant steel or concrete poles rather than wood poles) and those that improve the effectiveness of recovery (e.g., preemptively powering down some pieces of the system to minimize damage). Some actions serve both strategies, some serve one but not the other, and some serve one while inhibiting the other. Similarly, the timing of repairs is different depending on the cause. For example, repairs can begin immediately after a tornado has passed, but flooding following a hurricane can delay the start of repair and impede repair efforts. Good planning and preparation are essential to mitigating, coping with, and recovering from major outages. Both human and technical systems must be designed before grid failure so that the responders can assess the extent of failure and damage, dispatch resources effectively, and draw on established component inventories, supply chains, crews, and communication channels.
Anticipating and Preparing for Disruption
While the possibility of large-area, long-duration blackouts cannot be totally eliminated, there is much that can be done to decrease their likelihood and reduce their magnitude, should they occur. Chapter 4 assesses a variety of techniques that can be employed before an event occurs in order to enhance system resilience. These include improving the health and reliability of the individual grid components (e.g., through asset health monitoring and preventive- and reliability-centered maintenance), improving system architectures to further reduce the criticality of individual components, better simulating high-impact events, and considering the criticality of the grid’s underlying cyber infrastructure. Further work can be done in the area of real-time operations to enhance resilience. This includes improving situational awareness in the control room, with a focus on severe events and an inclusion of the cyber infrastructure, adding more wide-area monitoring and control, and developing control systems that better tolerate both accidental faults and malicious attacks. Finally, there is a need to deal with myriad regulatory entities and incentives to fund resilience investments.
Mitigating the Impacts of Disruption
While large failures of the bulk power system are rare, some will occur, and restoration can take a long time. It is essential that society prepare for periods of prolonged outage, because many vital public infrastructures (such as heating and cooling, water and sewage pumping, traffic control, financial systems, and many aspects of emergency response and public security) depend on the electric power supply. These issues are explored in Chapter 5. The effects of power outages vary with the weather, with the type and location of users, and with the duration of the outage. A central theme of this report is the need to improve how different elements of society perform the difficult task of imagining the diverse consequences of prolonged power outages. It is also important to ensure that equipment that has been purchased or contracted for backup power supply will be available and reliable when needed.
Recovering from and Learning after Disruption
After the bulk power system has failed, first responders, utilities, and public agencies must work together to restore service. Recovery involves coordinated activity on the physical side—for example, repairing, replacing, and reconfiguring the hardware of the grid—as well as a variety of activities to rebuild the cyber and industrial control systems. These issues are the focus of Chapter 6. Effective restoration must begin well before the disaster through numerous preparatory activities, including drills and stockpiling of key equipment. Utilities and other electric service personnel must think about how they will assess damage, plan restoration, and marshal and deploy the necessary resources. This is complicated by the fact that restoration processes are starkly different depending on the nature of the event. The keys to restoration are to envision a broad range of threats, work through failure scenarios, plan, and rehearse. Regardless of the cause of the outage, restoration always involves agility, collaboration and communications across multiple institutions, and an understanding of the state of the grid and its supporting systems. Technical readiness is the ultimate determinant of the ability to restore, but technical readiness rests firmly on organizational readiness. A process of continual learning and improvement, informed by detailed incident investigations following large outages, is essential for enhancing the resilience of the grid.
OVERARCHING INSIGHTS AND RECOMMENDATIONS
No single entity is responsible for, or has the authority to implement, a comprehensive approach to assure the resilience of the nation’s electricity system. Because most parties are preoccupied with short-term issues, they lack the time to think systematically about what could happen in the event of a large-area, long-duration blackout, and they do not adequately consider the consequences of such blackouts in their operational and other planning or in setting research and development priorities. Hence the United States needs a process to help all parties better envision the consequences of low-probability but high-impact events precipitated by the causes outlined in Chapter 3 and the system-wide effects discussed in Chapter 5. The specific recommendations addressed to particular parties that are provided throughout the report (especially in Chapters 4 through 6) will incrementally advance the cause of resilience. However, these alone will be insufficient unless the nation is able to adopt a more integrated perspective at the same time. Hence, in addition to the report’s specific recommendations, the committee provides a series of overarching recommendations.
One of the best ways to make sure that things already in place will work when they are needed is to conduct drills with other critical infrastructure operators through large-scale, multisector exercises. Such exercises can help illuminate areas where improvements in processes and technologies can substantively enhance the resilience of the nation’s critical infrastructure.
Overarching Recommendation 1: Operators of the electricity system, including regional transmission organizations, investor-owned utilities, cooperatives, and municipally owned utilities, should work individually and collectively, in cooperation with the Electricity Subsector Coordinating Council, regional and state authorities, the Federal Energy Regulatory Commission, and the North American Electric Reliability Corporation, to conduct more regional emergency preparedness exercises that simulate accidental failures, physical and cyber attacks, and other impairments that result in large-scale loss of power and/or other critical infrastructure sectors, especially communication, water, and natural gas. Counterparts from other critical infrastructure sectors should be involved, as well as state, local, and regional emergency management offices.
The challenges that remain to achieving grid resilience are so great that they cannot be met by research- or operations-related activities alone. While new technologies and strategies can improve the resilience of the power system, many existing technologies that show promise have yet to be fully adopted or implemented. In addition, more coordination between research and implementation activities is needed, building on the specific recommendations made throughout this report. Immediate action is needed both to implement available technological and operational changes and to continue to support the development of new technologies and strategies.
Overarching Recommendation 2: Operators of the electricity system, including regional transmission organizations, investor-owned utilities, cooperatives, and municipally owned utilities, should work individually and collectively to more rapidly implement resilience-enhancing technical capabilities and operational strategies that are available today and to speed the adoption of new capabilities and strategies as they become available.
The Department of Energy (DOE) is the federal entity with a mission to focus on the longer-term issues of developing and promulgating technologies and strategies to increase the resilience and modernization of the electric grid.1 No other entity in the United States has the mission to support such work, which is critical as the electricity system goes through the transformational changes described in this report. The committee views research, development, and demonstration activities that support reliable and resilient electricity systems to constitute a public good. If funding is not provided by the federal government, the committee is concerned that this gap would not be filled either by states or by the private sector. In part this is because the challenges and solutions to ensuring grid resilience are complex, span state and even national boundaries, and occur on time scales that do not align with business models. At present, two offices within DOE have responsibility for issues directly and indirectly related to grid modernization and resilience.
Overarching Recommendation 3: However the Department of Energy chooses to organize its programs going forward, Congress and the Department of Energy leadership should sustain and expand the substantive areas of research, development, and demonstration that are now being undertaken by the Department of Energy’s Office of Electricity Delivery and Energy Reliability and Office of Energy Efficiency and Renewable Energy, with respect to grid modernization and systems integration, with the explicit intention of improving the resilience of the U.S. power grid. Field demonstrations of physical and cyber improvements that could subsequently lead to widespread deployment are critically important. The Department of Energy should collaborate with parties in the private sector and in states and localities to jointly plan for and support such demonstrations. Department of Energy efforts should include engagement with key stakeholders in emergency response to build and disseminate best practices across the industry.
The U.S. grid remains vulnerable to natural disasters, physical and cyber attacks, and other accidental failures.
Overarching Recommendation 4: Through public and private means, the United States should substantially increase the resources committed to the physical components needed to ensure that critical electric infrastructure is robust and that society is able to cope when the grid fails. Some of this investment should focus on making the existing infrastructure more resilient and easier to repair, including the following:
- The Department of Energy should launch a program to manufacture and deploy flexible and transportable three-phase recovery transformer sets that can be pre-positioned around the country.2 These recovery transformers should be easy to install and use temporarily until conventional transformer replacements are available. This effort should produce sufficient numbers (on the order of tens compared to the three produced by the Department of Homeland Security’s RecX program) to provide some practical protection in the case of an event that results in the loss of a number of high-voltage transformers. This effort should complement, instead of replace, ongoing initiatives related to spare transformers.
- State and federal regulatory commissions and regional transmission organizations should then evaluate whether grids under their supervision need additional pre-positioned replacements for critical assets that can help accelerate orderly restoration of grid service after failure.
- Public and private parties should expand efforts to improve their ability to maintain and restore critical services—such as power for hospitals, first responders, water supply and sewage systems, and communication systems.3
- The Department of Energy, the Department of Homeland Security, the Electricity Subsector Coordinating Council, and other federal organizations, such as the U.S. Army Corps of Engineers, should oversee the development of more reliable inventories of backup power needs and capabilities (e.g., the U.S. Army Corps of Engineers’ mobile generator fleet), including fuel supplies. They should also “stress test” existing supply contracts for equipment and fuel supply that are widely used in place of actual physical assets in order to be certain these arrangements will function in times of major extended outages. Although the federal government cannot provide backup power equipment to everyone affected by a large-scale outage, these resources could make significant contributions at select critical loads.
1 The Department of Homeland Security, the Federal Energy Regulatory Commission, and other organizations also provide critical support and have primacy in certain areas.
2 As noted in Chapters 6 and 7, the Department of Energy’s Office of Electricity Delivery and Energy Reliability is supporting the development of a new generation of high-voltage transformers that will use power electronics to adjust their electrical properties and hence can be deployed in a wider range of settings. The committee’s recommendation to manufacture recovery transformers is not intended to replace that longer-term effort. However, the Department of Energy’s new advanced transformer designs will not be available for some time; in the meantime, the system remains physically vulnerable. While in Chapter 6 the committee notes several government and industry-led transformer sharing and recovery programs, it recognizes that high-voltage transformers represent one of the grid’s most vulnerable components deserving of further efforts.
3 In addition to treatment, sewage systems often need to pump uphill. A loss of power can quickly lead to sewage backups. Notably, a high percentage of the hospital backup generators in New York City failed during Superstorm Sandy.
In addition to providing redundancy of critical assets, transmission and distribution system resilience demands the ability to provide rapid response to events that impair the ability of the power system to perform its function. These events include deliberate attacks on and accidental failures of the infrastructure itself, as well as other causes of grid failure, which are discussed in Chapter 3.
Overarching Recommendation 5: The Department of Energy, together with the Department of Homeland Security, academic research teams, the national laboratories, and companies in the private sector, should carry out a program of research, development, and demonstration activities to improve the security and resilience of cyber monitoring and controls systems, including the following:
- Continuous collection of diverse (cyber and physical) sensor data;
- Fusion of sensor data with other intelligence information to diagnose the cause of the impairment (cyber or physical);
- Visualization techniques needed to allow operators and engineers to maintain situational awareness;
- Analytics (including machine learning, data mining, game theory, and other artificial intelligence-based techniques) to generate real-time recommendations for actions that should be taken in response to the diagnosed attacks, failures, or other impairments;
- Restoration of control system and power delivery functionality and cyber and physical operational data in response to the impairment; and
- Creation of post-event tools for detection, analysis, and restoration to complement event prevention tools.
Because no single entity is in charge of planning the evolution of the grid, there is a risk that society may not adequately anticipate and address many elements of grid reliability and resilience and that the risks of this system-wide failure in preparedness will grow as the structure of the power industry becomes more atomized and complex. There are many opportunities for federal leadership in anticipating potential system vulnerabilities at a national level, but national solutions must then be refined in light of local and regional circumstances. Doing this requires a multistep process. The first step is to anticipate the myriad ways in which the system might be disrupted and the many social, economic, and other consequences of such disruptions. The second is to envision the range of technological and organizational innovations that are affecting the industry (e.g., distributed generation and storage) and how such developments may affect the system’s reliability and resilience. The third is to determine what upgrades should be made and how to cover their costs. For simplicity, the committee will refer to this as a “visioning process.” While the Department of Homeland Security (DHS) has overarching responsibility for infrastructure protection, DOE, as the sector-specific agency for energy infrastructure, has a legal mandate and the deep technical expertise to work on such issues.
Overarching Recommendation 6: The Department of Energy and the Department of Homeland Security should jointly establish and support a “visioning” process with the objective of systematically imagining and assessing plausible large-area, long-duration grid disruptions that could have major economic, social, and other adverse consequences, focusing on those that could have impacts related to U.S. dependence on vital public infrastructures and services provided by the grid.
Because it is inherently difficult to imagine systematically things that have not happened (Fischhoff et al., 1978; Kahneman, 2011), exercises in envisioning benefit from having multiple groups perform such work independently. For example, such a visioning process might be accomplished through the creation of two small national power system resilience assessment groups (possibly at DOE national laboratories and/or other federally funded research and development centers or research universities). However such visioning is accomplished, engagement from staff representing relevant state and federal agencies is essential in helping to frame and inform the work. These efforts can build on the detailed recommendations in this report to identify technical and organizational strategies that increase electricity system resilience in numerous threat scenarios and to assess the costs and financing mechanisms to implement the proposed strategies. Attention is needed not just to the average economy-wide costs and benefits, but also to the distribution of these across different levels of income and vulnerability. It is important that these teams work to identify common elements in terms of hazards and solutions so as to move past a hazard-by-hazard approach to a more systems-oriented strategy. Producing useful insights from this process will require mechanisms to help these groups identify areas of overlap while also characterizing the areas of disagreement. A consensus view could be much less helpful than a mapping of uncertainties that can help other actors—for example, state regulatory commissions and first responders—understand the areas of deeper unknowns.
Of course, national laboratories, other federally funded research and development centers, and research universities do not operate or regulate the power system. At the national level, the Federal Energy Regulatory Commission (FERC) and the North American Electric Reliability Corporation (NERC) both have relevant responsibilities and authorities.
Overarching Recommendation 7A: The Federal Energy Regulatory Commission and the North American Electric Reliability Corporation should establish small system resilience groups, informed by the work of the Department of Energy/Department of Homeland Security “visioning” process, to assess and, as needed, to mandate strategies designed to increase the resilience of the U.S. bulk electricity system. By focusing on the crosscutting impacts of hazards on interdependent critical infrastructures, one objective of these groups would be to complement and enhance existing efforts across relevant organizations.
As the discussions throughout this report make clear, many different organizations are involved in planning, operating, and regulating the grid at the local and regional levels. By design and of necessity in our constitutional democracy, making decisions about resilience is an inherently political process. Ultimately the choice of how much resilience our society should and will buy must be a collective social judgment. It is unrealistic to expect firms to make investments voluntarily whose benefits may not accrue to shareholders within the relevant commercial lifetime for evaluating projects. Moreover, much of the benefit from avoiding such events, should they occur, will not accrue to the individual firms that invest in these capabilities. Rather, the benefits are diffused more broadly across multiple industries and society as a whole, and many of the decisions must occur on a state-by-state basis.
Overarching Recommendation 7B: The National Association of Regulatory Utility Commissioners should work with the National Association of State Energy Officials to create a committee to provide guidance to state regulators on how best to respond to identified local and regional power system-related vulnerabilities. The work of this committee should be informed by the national “visioning” process, as well as by the work of other research organizations. The mission of this committee should be to develop guidance for, and provide technical and institutional support to, state commissions to help them to more systematically address broad issues of power system resilience, including decisions as to what upgrades are desirable and how to pay for them. Guidance developed through this process should be shared with appropriate representatives from the American Public Power Association and the National Rural Electric Cooperative Association.
Overarching Recommendation 7C: Each state public utility commission and state energy office, working with the National Association of Regulatory Utility Commissioners, the National Association of State Energy Officials, and state and regional grid operators and emergency preparedness organizations, should establish a standing capability to identify vulnerabilities, identify strategies to reduce local vulnerabilities, develop strategies to cover costs of needed upgrades, and help the public to become better prepared for extended outages. In addition, they should encourage local and regional governments to conduct assessments of their potential vulnerabilities in the event of large-area, long-duration blackouts and to develop strategies to improve their preparedness.
Throughout this report, the committee has laid out a wide range of actions that different parties might undertake to improve the resilience of the United States power system. If the approaches the committee has outlined can be implemented, they will represent a most valuable contribution. At the same time, the committee is aware that the benefits of such actions—avoiding large-scale harms that are rarely observed—are easily eclipsed by the more tangible daily challenges, pressures on budgets, public attention, and other scarce resources. Too often in the past, the United States has made progress on the issue of resilience by “muddling through” (Lindblom, 1959). Even if the broad systematic approach outlined in this report cannot be fully implemented immediately, it is important that relevant organizations develop analogous strategies so that when a policy window opens in the aftermath of a major disruption, well-conceived solutions are readily available for implementation (Kingdon, 1984).
The committee assessed potential threats to the grid, and the conditions on the grid, and provides findings and recommendations throughout the report. In Chapter 7, these specific recommendations are summarized and sorted in terms of the issues they address and the entities to which they are directed. The high-level descriptions of each are listed below. The specific actions that should be taken to implement each one are laid out in Chapter 7.
Recommendation 1 to DOE: Improve understanding of customer and societal value associated with increased resilience and review and operationalize metrics for resilience. (Recommendations 2.1 and 2.2)
Recommendation 2 to DOE: Support research, development, and demonstration activities to improve the resilience of power system operations and recovery by reducing barriers to adoption of innovative technologies and operational strategies. (Recommendations 4.1, 4.6, 6.5, and 6.7)
Recommendation 3 to DOE: Advance the safe and effective development of distributed energy resources and microgrids. (Recommendations 4.2, 5.6, 5.12, and 6.3)
Recommendation 4 to DOE: Work to improve the ability to use computers, software, and simulation to research, plan, and operate the power system to increase resilience. (Recommendations 4.3, 4.4, 4.8, 4.9, and 6.12)
Recommendation 5 to DOE: Work to improve the cybersecurity and cyber resilience of the grid. (Recommendations 4.10 and 6.8)
Recommendation 6 to the electric power sector and DOE: The owners and operators of electricity infrastructure should work closely with DOE in systematically reviewing previous outages and demonstrating technologies, operational arrangements, and exercises that increase the resilience of the grid. (Recommendations 4.5, 5.10, 6.2, 6.4, and 6.14)
Recommendation 7 to DHS and DOE: Work collaboratively to improve preparation for, emergency response to, and recovery from large-area, long-duration blackouts. (Recommendations 3.2, 5.3, 5.5, 6.1, 6.6, and 6.9)
Recommendation 8 to DHS and DOE: With growing awareness of the electricity system as a potential target for malicious attacks using both physical and cyber means, DHS and DOE should work closely with operating utilities and other relevant stakeholders to improve physical and cyber security and resilience. (Recommendations 3.1, 6.10, 6.11, and 6.13)
Recommendation 9 to state offices and regulators: Work with local utilities and relevant stakeholders to assess readiness of backup power systems and develop strategies to increase investments in resilience-enhancing technologies. (Recommendations 5.1, 5.7, 5.9, and 5.11)
Recommendation 10 to the National Association of Regulatory Utility Commissioners and federal organizations: Work with DHS and DOE to develop guidance regarding potential social equity implications of resilience investments as well as selective restoration. (Recommendations 5.2, 5.4, and 5.8)
Recommendation 11 to FERC and the North American Energy Standards Board: FERC, which has regulatory authority over both natural gas and electricity systems, should address the growing risk of interdependent infrastructure. (Recommendation 4.7)
Recommendation 12 to NERC: Review and improve incident investigation processes to better learn from outages that happen and broadly disseminate findings and best practices. (Recommendation 6.15)
REFERENCES
Fischhoff, B., P. Slovic, and S. Lichtenstein. 1978. Fault trees: Sensitivity of estimated failure probabilities to problem representation. Journal of Experimental Psychology: Human Perception and Performance 4: 342–355.
Kahneman, D. 2011. Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.
Kingdon, J.W. 1984. Agendas, Alternatives, and Public Policies. Boston: Little, Brown and Company.
Lindblom, C.E. 1959. The science of muddling through. Public Administration Review 19(2): 79–88.