This chapter discusses the post-event system restoration and the learning phases of the resilience model laid out in Figure 1.2. The committee first introduces a general model for electricity system restoration after a large-area, long-duration outage and then discusses restoration for several classes of disruptions based on the type of damage caused. This organization is based on the recognition that restoration activities proceed differently based on different types of outages—following some events, utility operators will have no situational awareness to guide their deployments; whereas other events may leave monitoring systems intact but overwhelm stockpiled resources. The chapter includes recommendations for improving the restoration process and for improving post-incident investigation to better learn from each experience to improve future performance.
Following a large-area, long-duration outage, electricity system operators set priorities and work across organizational boundaries to bring the system back online as quickly as possible through a series of restoration activities. While the exact steps and procedures for restoration vary depending on the nature of the outage and the damage incurred, electricity providers follow four general restoration steps:
- Assess the extent, locations, and severity of damage to the electricity system;
- Provide the physical and human resources required for repairs;
- Prioritize sites/components for repair based on factors including the criticality of the load and the availability of resources to complete the needed repairs; and
- Implement the needed repairs and reassess system state.
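The prioritization step above can be illustrated with a short sketch. It assumes a hypothetical scoring rule (criticality-weighted customers restored per crew-hour) and a greedy selection against a fixed crew-hour budget. The field names, weights, and example sites are invented for illustration and are not drawn from any utility's restoration plan.

```python
from dataclasses import dataclass

@dataclass
class RepairSite:
    name: str
    criticality: int    # hypothetical scale: 3 = critical load, 1 = ordinary load
    customers: int      # customers restored if this site is repaired
    crew_hours: float   # estimated effort to complete the repair

def priority_score(site: RepairSite) -> float:
    # Assumed rule: criticality-weighted customers restored per crew-hour.
    return site.criticality * site.customers / site.crew_hours

def schedule(sites, available_crew_hours):
    """Greedily pick the highest-scoring repairs that fit the crew-hour budget."""
    plan, remaining = [], available_crew_hours
    for site in sorted(sites, key=priority_score, reverse=True):
        if site.crew_hours <= remaining:
            plan.append(site.name)
            remaining -= site.crew_hours
    return plan

# Illustrative data only.
sites = [
    RepairSite("feeder to hospital", 3, 400, 8.0),
    RepairSite("suburban lateral", 1, 150, 4.0),
    RepairSite("substation breaker", 2, 5000, 40.0),
]
plan = schedule(sites, available_crew_hours=48.0)
```

In practice the fourth step (reassess system state) would feed updated damage and resource estimates back into such a ranking, so the plan is recomputed as conditions change.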
As shown in Figure 6.1, these general processes are carried out simultaneously by different organizations operating at different scales across all elements of the power system. Many of these organizations have their own restoration plans, spanning those from individual distribution cooperatives such as Cuivre River Electric Cooperative in Missouri (CREC, 2016), to large investor-owned utilities such as New York State Electric and Gas Corporation and Rochester Gas and Electric Corporation (NYSEG and RGEC, 2016), to independent system operators such as PJM (2016). Organizations frequently involved in electricity restoration include not only electricity system operators (i.e., distribution, transmission, and generation utilities and independent system operators), but also emergency management officials from city, county, state, and federal organizations, including the Federal Emergency Management Agency (FEMA), the Department of Energy (DOE), state emergency management agencies, the National Guard, and in some cases even the Department of Defense. Depending on the circumstances, organizations that operate far afield of the utility sector may be called on when they offer special capabilities—for example, the deployment of the U.S. Air Force to transport bucket trucks by air from California to New York in response to Superstorm Sandy. Effective restoration rests on the collaboration and cooperation of myriad organizations and individuals of different skills. Various mutual assistance agreements provide additional resources to extend the reach of the restoration across geographic and organizational boundaries. The restoration work itself is dependent on the skills and resources of the line and electrician crews deployed by the local utilities.
Coordination and communication among these groups is challenging, in part because each group has different responsibilities and boundaries within which it operates. Knowledge of local conditions and needs is greatest at the site level and diminishes with increasing scale, whereas understanding of systemic risks and critical needs may be greater at the regional scale. Thus, information must flow in both directions, and, while prior agreements can help considerably, communication channels specific to the actors and hazards involved are often established in an ad hoc manner.
These communications must be agile and flexible, evolving in response to changing conditions and the shifting composition of the restoration team. Communication is partly a technical issue and partly an organizational issue—for example, determining who should have access to information. In recent storms such as Superstorm Sandy, coordinating the dispatch and routing of crews through damaged and flooded areas was a challenge, and crews were sometimes delayed because they could not reach affected areas.
Beyond identifying a specific threat to the electricity system, key utility CEOs and federal decision makers meet through the Electricity Subsector Coordinating Council to plan for national-level incidents and maintain open communication channels (ESCC, 2016). This lays a good foundation for restoration activities, but an agile approach is necessary to deal with specific circumstances. Exercises are critical, although exercises alone will not address an actual event in all regards. Nonetheless, practice and associated learning will improve reactions during actual response.
During a major disaster, the states coordinate all first responder and restoration activities. For large incidents, when federal resources are warranted and mobilized, the National Response Framework provides the organizational structure, FEMA coordinates federal assets, and DOE is appointed the energy-sector lead agency (DHS, 2016). In preparation for or response to major outages, DOE will staff local and headquarters operations centers to coordinate federal actions that expedite electricity system restoration, working closely with the electricity organizations involved and other responders. Examples of DOE action include waiving federal transportation regulations on the time trucks can drive continuously so as to bring necessary equipment to the affected area more rapidly.
When a physical disruption of the power system occurs, it is important that utility repair crews be able to gain rapid access to damaged substations and other facilities so they can safely isolate and de-energize hazardous components, retain and gain access to emergency communication equipment and supplies, promptly assess damage, and start the process of restoration. In that context, the issue of working with law enforcement to gain access becomes critical, both for reasons of safety and because supplying power can be a key component of disaster recovery and avoiding further risks and damages.
One possible strategy could be to designate selected utility personnel as “first responders.” While there have been efforts to move in this direction, they have become stalled because doing so could raise potential issues of liability, perhaps placing crews under state control or even requiring crews to divert their efforts away from electricity-related activities. The Edison Electric Institute (EEI) and others have been working at high levels to reach informal agreements about achieving access. One problem with such an informal approach is that, without official credentialing, other first responders on the ground may not be aware of such arrangements and serious delays in access can occur. The situation could become even more complicated in the event of a major terrorist attack on substations or other critical grid facilities that might be designated as “crime scenes.” A similar situation could arise in the wake of a cyber attack where affected systems might be considered evidence.
Finding: When major physical damage occurs in the power grid, it is important that utility repair crews be able to gain rapid access. Due to a lack of standing arrangements with law enforcement and other first responders, this is not always possible; informal high-level agreements about access do not always result in smooth operations among key personnel on the ground.
Recommendation 6.1: The Department of Homeland Security in collaboration with the Department of Energy should redouble efforts to work with utilities and national, state, and local law enforcement to develop formal arrangements (such as designating selected utility personnel as “first responders”) that credential selected utility personnel to allow prompt utility access to damaged facilities across jurisdictional boundaries. Such agreements should address issues such as indemnity, liability, and the risk of diverting the mission and assets of utility crews to other non-power system objectives.
Utility Planning for Restoration from Major Disruptions
Utilities are well practiced at recovering from localized damage to the grid and helping to restore the system outside their service areas following large events. From line crews to executives, utilities are familiar with recovery from regional natural hazards; they have developed restoration plans and allocated resources for recovery operations. Some utilities equip bucket trucks with mobile generators and communications equipment that allow line crews to maintain contact and proceed with repairs even when the bulk grid and communications infrastructures are down. When damages to the physical system exceed the hardware or human resources of a single utility, mutual assistance agreements (MAAs) are used widely throughout the industry to expedite sharing of crews and equipment among utilities. For larger events, crews and equipment are often brought in from thousands of miles away to aid restoration efforts in affected areas. Following Superstorm Sandy, the EEI developed a National Response Event framework for coordinating regional MAAs across the United States (EEI, 2016). Although the National Response Event framework has not yet been tested, it is designed to help prioritize and expedite dispatch of line crews and resources on a national scale with a comprehensive understanding of damages and restoration efforts.
Utility restoration plans emphasize advanced planning, communication, training, and continual refinement and improvement. Restoration plans are drilled by utilities and externally reviewed by the North American Electric Reliability Corporation (NERC), the Federal Energy Regulatory Commission (FERC), and regional reliability organizations. One recent voluntary review found that participating organizations maintained system restoration plans that were thorough and highly detailed; however, opportunities for improvement remain (NERC, 2016a). For example, restoration plans may make key assumptions about the availability of certain assets (e.g., that a pre-identified black-start transmission corridor is operational) that, depending on the extent of damage, may not hold true.
Depending on the hazard, it may be possible for utilities to strategically deploy assets and for state and federal agencies to be mobilized in advance of the event. For example, utilities operating along the Gulf Coast have a long history of anticipating and recovering from large storms that cause extensive damage, and their restoration plans and activities reflect this history. In the week before Hurricane Katrina, Southern Company and its operating subsidiaries in Mississippi and Alabama spent more than $7 million pre-staging personnel and supplies, including catering and amenities for restoration workers, many of whose families were directly impacted by the storm (Ball, 2006). The arrival of Superstorm Sandy was preceded by a large mobilization of assets by utilities and the federal government (Fugate, 2012; Lacey, 2014). Vermont Electric Power Company’s Weather Analytics Center provides highly accurate weather forecasts that the utility uses to pre-position restoration crews and assets (NASEM, 2016). Developing additional technologies and strategies to improve pre-positioning of restoration assets remains an important area for additional effort.
The process of electricity system restoration begins long before a specific event or threat is identified, through extensive planning, training, drilling, and pre-positioning of assets, and continues after all service has been restored, through continual refinement of a utility’s restoration plans. Fundamental to all restoration planning is an unresolvable uncertainty: the exact nature of damage cannot be known before an event occurs, and restoration plans must simultaneously be specific and actionable for utility personnel yet general enough to accommodate diverse potential scenarios. Thus there is no uniform, repeatable process for restoration that extends beyond a single event. There are many post-action reports from major outages that describe the event, how it was addressed and by whom, and the lessons learned. By systematically evaluating previous experiences and more openly sharing information about recovery from major outages, utilities have an opportunity to identify and share best practices. While such analysis is conducted on behalf of transmission utilities at the North American Transmission Forum, these assessments do not cover distribution utilities.
Recommendation 6.2: With support and encouragement from relevant state and federal regulatory agencies, the Department of Energy and utilities should continue to work together to analyze past large-area, long-duration outages to identify common elements and processes for system restoration and define best practices that can be shared broadly throughout the electricity industry. The committee notes that progress has been made with the ongoing efforts of the Electricity Subsector Coordinating Council, which provides a good framework for expanded coordination and sharing of best practices.
Black-Start Recovery Plans
Large generation and transmission operators maintain restoration and recovery plans for energizing the high-voltage transmission system following a large-area, long-duration outage. Most generation facilities require electricity for operation, so if generators have gone off-line, these plans begin by starting selected “black-start” generators that do not require power from the larger grid to function. There are almost always functioning areas of the grid adjacent to the area experiencing an outage, and service can be most effectively restored from the edges of the blacked-out areas. If this is not the case, then black-start generators must first supply power to nuclear plants for safe shutdown before providing power to other generating stations. While black-start plans are difficult or impossible to practice (because doing so would require shutting down the grid), restoration plans provide detailed information on black-start resources in a utility’s service area, identify the priority loads and transmission corridors that the utility will bring power to first, and provide operators with key contact information. The priority loads for restoring the electricity system are other non-black-start generation plants—particularly nuclear plants that require external power—as well as natural gas pumping stations that maintain pressure in pipelines and provide fuel for natural gas generators to come online.
As generators and transmission corridors become energized, power is provided to distribution circuits—with priority given to known critical loads such as hospitals and repairs that restore service to the most customers. As restoration progresses, more generators are connected and resynchronized until service is restored to more loads. In some cases, this restoration may involve forming “islands” of electrical service: multiple smaller regions maintain balance of generation and load independent of the remaining grid and are then subsequently synchronized to the remaining system (PJM, 2016). Depending on how quickly generators are restored, some low-priority loads may need to remain off-line as the electricity providers will ration available supply to meet prioritized demand requirements. The time required to complete this process depends significantly on the damage to the infrastructure, the amount of data and information available, and the availability of restoration resources.
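The rationing of available supply among prioritized loads described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical priority ranking (lower number served first) and invented megawatt figures; it ignores network constraints, frequency and voltage control, and the switching sequence that an actual restoration plan would specify.

```python
def ration_supply(available_mw, loads):
    """Serve loads in priority order until available generation is exhausted.

    loads: list of (name, priority, demand_mw); a lower priority number is
    served first. Returns (served, deferred) lists of load names.
    """
    served, deferred = [], []
    remaining = available_mw
    for name, _priority, demand_mw in sorted(loads, key=lambda item: item[1]):
        if demand_mw <= remaining:
            served.append(name)
            remaining -= demand_mw
        else:
            deferred.append(name)  # stays off-line until more supply is available
    return served, deferred

# Illustrative loads and demands only (hypothetical values).
loads = [
    ("residential block", 3, 60.0),
    ("nuclear plant station service", 1, 30.0),
    ("hospital", 2, 10.0),
    ("gas compressor station", 1, 15.0),
]
served, deferred = ration_supply(available_mw=70.0, loads=loads)
```

Rerunning the function with a larger `available_mw` as more generators are resynchronized shows lower-priority loads moving from the deferred list to the served list.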
The Electric Power Research Institute (EPRI) has developed generic restoration milestones as well as a comprehensive methodology for power system restoration based on these milestones. It is also developing and demonstrating a prototype decision support tool for evaluating system restoration strategies (EPRI, 2010). The Optimal Black-Start Capability tool can be used by utilities to evaluate the suitability of available black-start capable units and plan optimal locations and capacity levels for new black-start units.
The restoration process is highly dependent on the topology of the transmission and distribution networks, which determine the sequence of restoration starting from the black-start generators. If in the future the generation resources are more decentralized and placed on the distribution feeders, the topology of the grid, and hence the restoration process, becomes more complex. However, the smaller generation resources closer to the loads can make the generation-load balance easier during restoration, provided that these generators (and even responsive loads) have adequate controllability. With the higher penetrations of distributed energy resources (DERs), the restoration process will need to be rethought.
Opportunities to Include Distributed Energy Resources in Restoration and Black Start
Traditionally, black-start plans have focused entirely on large, centralized utility generation assets. As the grid evolves to include larger amounts of DERs more broadly, it becomes important to consider the role these resources might play in the context of black start. The benefits and impacts of DERs will vary by geographic region because some distribution utilities have a higher penetration of DER assets than others. Additionally, some distributed generation and other assets are monitored and controlled by third-party entities other than the utility or grid operator because state policies do not allow these utilities to operate behind the meter. At low levels of penetration, DERs should simply be operated in ways that do not interfere with any needed black-start operations. As noted in Chapter 5, with appropriate system upgrades and institutional arrangements, microgrids and DERs could provide islands of power during outages; they could also provide local generation for utilities to restore from the distribution system outwards by connecting such small islands, as opposed to bringing power in from the bulk power system. It may be possible to configure such resources to speed the process of supplying power to some priority loads, which would also unburden the primary black-start restoration process. At high levels of penetration, there may be an opportunity to factor DERs into black-start restoration plans. For example, multiple islands in the system formed by microgrids could be connected to form larger islands. Doing that might give the utilities more assets and more flexibility in their black-start planning.
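The idea of connecting small islands into larger ones can be sketched with a simple generation-load balance check. The example below is a hypothetical illustration: the `Island` structure, the margin rule, and all megawatt values are assumptions. Actual resynchronization also requires matching frequency, voltage, and phase angle, along with adequate protection and control, none of which is modeled here.

```python
from dataclasses import dataclass

@dataclass
class Island:
    name: str
    gen_mw: float    # controllable generation available in the island
    load_mw: float   # load that must be served in the island

    @property
    def margin_mw(self) -> float:
        return self.gen_mw - self.load_mw

def merge(a: Island, b: Island) -> Island:
    return Island(f"{a.name}+{b.name}", a.gen_mw + b.gen_mw, a.load_mw + b.load_mw)

def grow_island(seed, candidates, min_margin_mw=0.0):
    """Attach candidates one at a time, keeping generation >= load + margin."""
    current, skipped = seed, []
    for island in candidates:
        merged = merge(current, island)
        if merged.margin_mw >= min_margin_mw:
            current = merged
        else:
            skipped.append(island.name)  # would overload the combined island
    return current, skipped

# Illustrative values only.
seed = Island("A", gen_mw=5.0, load_mw=3.0)
candidates = [
    Island("B", gen_mw=2.0, load_mw=3.0),
    Island("C", gen_mw=0.0, load_mw=4.0),
    Island("D", gen_mw=6.0, load_mw=2.0),
]
combined, skipped = grow_island(seed, candidates)
```

Here the load-only pocket C is skipped on the first pass because attaching it would leave the combined island short of generation; once D has been connected, a second pass could pick it up.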
Finding: The presence of a significant amount of DERs could provide a limited amount of local power during outages and could also be factored into black-start and emergency planning if appropriate system upgrades have been made and utility operators have visibility into their operating status and controllability of their performance.
Recommendation 6.3: The Department of Energy and utilities should evaluate the technical and contractual requirements for using distributed energy resources as part of restoration activities, even when these assets are not owned by the utility, to improve restoration and overall resilience. Emergency management and restoration plans should include the owners of distributed energy resource assets, including owners with generation, storage, or load-control capabilities.
Monitoring and Control
The monitoring and control of the power grid is accomplished through the supervisory control and data acquisition (SCADA) system and other supporting technologies, as described in previous chapters. At the control center, software tools aggregate diverse data to provide situational awareness and support operator decision making (e.g., energy management systems [EMS] on the transmission system and distribution management systems [DMS] on the distribution side). These systems gather measurement data from sensors deployed throughout the transmission and distribution systems and send out control signals. Additional sensor technologies exist for monitoring the health of circuits and components during and after restoration, which can confirm to repair crews that damage has been corrected; however, to the committee’s knowledge, these have not been licensed or developed as commercial products. SCADA systems utilize robust, low-latency communications and are extremely helpful in assessing the state of damage to the system and identifying the centralized and distributed resources available for restoration. The communication networks enabling this monitoring and control are often dedicated infrastructure under the direct jurisdiction of the operating entity but are sometimes leased or provisioned by third parties.
DERs could also be monitored and controlled using the same SCADA system, in which case it would be easier for the DER to assist with restoration activities. If the DER is dispatched through a different monitoring and control communications infrastructure, it may be more difficult to provide restoration services due to the complications of coordinating among different systems. After a major disturbance, the status of the DERs, as well as the rest of the grid components, can only be known if the sensors and communication networks are not damaged or shut down by the disturbance. Electric power operators must restore power control systems and supporting communications systems concurrently with, and as an integral part of, grid restoration. Restoration of control systems and their associated communications infrastructure must remain an integral part of resilience planning.
Recovery Depends on the Type of Damage
Beyond the generalized description of the recovery process, the details of restoration activities can be very different for different types of events and resulting damage. For example, a cascading blackout can cause a large area to lose power, but recovery may be relatively rapid and straightforward if no significant physical damage has been done to system components. Likewise, restoration, and specifically damage assessment, is considerably easier when the grid’s cyber monitoring and control systems are intact and operational than after a cyber attack that diminishes a utility’s situational awareness. In contrast, a strong, slow-moving hurricane can cause destruction and flooding over hundreds of square miles of coastal community, making post-event access very difficult. The following sections describe opportunities to improve recovery from outages with different types of damage, as categorized in Figure 3.2.
Perhaps the most difficult disruptions to recover from are those that simultaneously cause damage to the physical components of the electricity system, the cyber monitoring and control systems, and critical supporting infrastructure. Damages of this sort can result from major natural disasters such as hurricanes and tropical storms, floods, winter storms, and earthquakes. Table 6A.1 provides details for each of these hazards in terms of the six stages of the outage life cycle: plan, prepare, event, assess, restore, and recover. Table 6A.2 lists two additional events, tornadoes and geomagnetic disturbances (space weather), that can also cause widespread damage.
While all of these events involve physical damage to the power system, there can be considerable variation in the extent of damage to other supporting infrastructures and the community. For example, damage from a major hurricane is typically widespread, inflicted on transportation and other critical infrastructures, and can greatly diminish local electricity consumption. In contrast, as Table 6A.1 notes, the spatial extent of damage from flooding depends significantly on local topography: in some cases much of the community may be unaffected, whereas communities and infrastructure in flat and low-lying terrains may be entirely destroyed. Clearly these two situations result in dramatically different restoration environments. Restoring a system from nearby dry ground that has all facilities intact and working is far easier than operating in an environment where everything for miles around has been submerged. Utilities generally know what sort of circumstance they will face in the event of a disaster and plan accordingly.
In some situations, there is sufficient warning time to assess whether critical system components will be at risk and, when possible, take preventative actions. While utilities strive to maintain electrical service at all times, sometimes taking steps that will speed recovery after an inevitable outage should take precedence over keeping power on as long as possible before an outage. For example, a utility will know which substations are exposed to high flood risk and may preemptively power down certain parts of the system to prevent more substantial damage from flooding energized facilities. There are circumstances in which de-energizing vulnerable components before an event occurs could better protect them from damage and make recovery much faster.
Recommendation 6.4: Electric service providers should identify those components and corresponding events for which pre-event de-energizing of selected assets is the lowest risk strategy and develop regulatory, communication (especially with customers), and other plans that allow such protective action to be implemented.
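One way to frame the question of when pre-event de-energizing is the lowest-risk strategy is as an expected-cost comparison. The sketch below is a simplified illustration under stated assumptions: a single flood probability, a fixed outage cost per day, and invented damage and repair figures (in arbitrary monetary units). It is not a method prescribed by the committee, and a real analysis would also weigh safety, customer impacts, and regulatory considerations.

```python
def expected_cost_keep_energized(p_flood, damage_if_energized,
                                 outage_cost_per_day, repair_days):
    # Loss occurs only if the flood arrives while equipment is energized.
    return p_flood * (damage_if_energized + outage_cost_per_day * repair_days)

def expected_cost_deenergize(p_flood, damage_if_deenergized,
                             outage_cost_per_day, planned_outage_days,
                             repair_days):
    # The planned outage is a certain cost; flood damage to de-energized
    # equipment is assumed smaller and faster to repair.
    certain = outage_cost_per_day * planned_outage_days
    return certain + p_flood * (damage_if_deenergized
                                + outage_cost_per_day * repair_days)

def lower_risk_option(p_flood):
    # Hypothetical substation parameters, for illustration only.
    keep = expected_cost_keep_energized(p_flood, damage_if_energized=10.0,
                                        outage_cost_per_day=0.5, repair_days=20)
    shut = expected_cost_deenergize(p_flood, damage_if_deenergized=2.0,
                                    outage_cost_per_day=0.5,
                                    planned_outage_days=1, repair_days=4)
    return ("de-energize" if shut < keep else "keep energized"), keep, shut

high_risk = lower_risk_option(p_flood=0.6)
low_risk = lower_risk_option(p_flood=0.01)
```

With these assumed numbers, de-energizing is preferred when flooding is likely and keeping the asset energized is preferred when flooding is very unlikely, which is the kind of component-and-event screening the recommendation describes.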
Assessing System Damage
As Figure 6.1 notes, the first step in restoration is to assess the state of the system. Where the monitoring and control system is still operating, it can be used to perform a rapid assessment. More monitoring and control is available at the transmission level, but SCADA at the distribution level is also being deployed, driven in part by the increase in DERs and other advanced technologies. This monitoring is also extending to the customer level with advanced metering infrastructure (AMI) and distribution technologies. Rather than depending on customer phone calls, some outage management systems (OMSs) receive direct telemetry from AMI and other sensors to develop a comprehensive view of customer outages.
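The use of AMI telemetry to build an outage picture can be sketched as a simple aggregation. The example below assumes a hypothetical mapping of meters to distribution transformers and a set of meters that have stopped reporting; the identifiers and the 50 percent threshold are illustrative assumptions, not features of any particular OMS product.

```python
from collections import defaultdict

def suspected_outages(meter_to_transformer, silent_meters, threshold=0.5):
    """Flag transformers where the share of non-reporting meters meets the threshold."""
    totals = defaultdict(int)
    silent = defaultdict(int)
    for meter, transformer in meter_to_transformer.items():
        totals[transformer] += 1
        if meter in silent_meters:
            silent[transformer] += 1
    return sorted(t for t in totals if silent[t] / totals[t] >= threshold)

# Illustrative meter and transformer identifiers only.
meter_to_transformer = {
    "m1": "T1", "m2": "T1", "m3": "T1",
    "m4": "T2", "m5": "T2", "m6": "T2", "m7": "T2",
}
silent_meters = {"m1", "m2", "m3", "m4"}
flagged = suspected_outages(meter_to_transformer, silent_meters)
```

A single silent meter under T2 suggests a problem at one customer's service, whereas every meter under T1 going silent points to an upstream fault, which is the kind of inference that reduces dependence on customer phone calls.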
Where the communications network supporting the SCADA system or other measurement telemetry is damaged, the traditional strategy is to send crews out to do on-site inspections. At the transmission level, aircraft are often used to locate downed lines, towers, and other damage. Normally aircraft would be operating directly under the jurisdiction of the electricity utility operator, as their assets are also used for routine right-of-way patrols. If necessary, electricity operators are able to acquire additional aircraft through leasing or other arrangements. During large national-level events, other government agencies can provide aerial surveillance capabilities if they are not directly involved in search and rescue operations. The Civil Air Patrol,1 a civilian auxiliary of the U.S. Air Force, has also been leveraged to provide aerial photographic sorties following disasters.
A new option coming into serious consideration is the use of unmanned aerial vehicles (UAVs), commonly known as drones (Olearczyk, 2013; Miller et al., 2014). Such vehicles can systematically survey damage to a system using both visible light and infrared imagery. Some UAVs have a fixed-wing design, but others are more maneuverable and can hover over problem areas for a long duration. The results of UAV inspections will be most useful if a utility has previously built a geocoded baseline of its entire system. This allows new imagery to be compared with baseline imagery and combined with asset management tools and workforce management systems to establish and coordinate repair priorities and progress (Miller et al., 2014).
The operation of UAVs in the United States is under the jurisdiction of the Federal Aviation Administration (FAA), which has been adopting new rules governing the commercial application of UAVs. However, these regulations have not kept pace with the rapid technological advancement of these systems, and there remains uncertainty surrounding the viability of UAVs for this application. In July 2016, Congress passed the FAA Extension, Safety, and Security Act of 2016.2 Section 2207 of that law requires FAA, no later than 90 days after enactment, to “publish guidance for application for, and procedures for the processing of, on an emergency basis, exemptions or certificates of authorization or waiver for the use of unmanned aerial systems by civil or public operators in response to a catastrophe, disaster, or other emergency to facilitate emergency response operations, such as firefighting, search and rescue and utility and infrastructure restoration efforts.” As of this writing, that guidance has not yet been issued. A system that relies on temporary FAA authorization creates barriers to adopting this technology for electricity service restoration, since the capability to use UAVs for damage assessment needs to be developed, exercised, and refined in advance of a disaster rather than cultivated during the incident.
A continuing problem with the use of UAVs, both for post-disaster assessment as well as for routine surveillance and maintenance of transmission and distribution systems, has been the FAA restriction that such vehicles can only be used within the UAV pilot’s line of sight. In the event of a large-scale disaster, such a restriction seriously limits how useful UAVs can be. Several utilities have been experimenting with the use of UAVs and have obtained FAA 333 permits.3 Some limited use of UAVs for post-disaster surveillance has also occurred under FAA Part 107 waivers following Hurricane Matthew, which aided in damage assessment and expedited recovery. However, both Section 333 and Part 107 permits require pilots to maintain line-of-sight operations, and any operation beyond line of sight requires additional FAA authorization. At the time of this writing, very few waivers for operation beyond line of sight have been granted, and these have been primarily to specialized testing and research organizations. While FAA can grant exceptions on an ad hoc basis, this takes time. It would be far better to have standing arrangements for the use of drones in emergency situations.
1 The Civil Air Patrol (CAP) is a congressionally chartered, federally supported non-profit corporation that serves as the official civilian auxiliary of the U.S. Air Force. CAP is a volunteer organization that performs three congressionally assigned key missions: emergency services (e.g., search and rescue and disaster relief operations), aerospace education for youth and the general public, and cadet programs for teenage youth. In addition, CAP has recently been tasked with homeland security and courier service missions. CAP also performs non-auxiliary missions for various governmental and private agencies, such as local law enforcement and the American Red Cross.
2 Public Law No. 114-190 (2016).
3 FAA Section 333 “grants the Secretary of Transportation the authority to determine whether an airworthiness certificate is required for a unmanned aircraft system to operate safely in the National Airspace System.” As of 2015, FAA had granted 16 Section 333 exemption permits to Duke, 8 to San Diego Gas & Electric, 5 to Pacific Gas & Electric, 4 to Southern Company, and 4 to NextEra Energy.
Recommendation 6.5: With convening support by the Department of Energy, the electricity industry should proactively engage the Federal Aviation Administration to ensure that the rules regulating unmanned aerial vehicle operation support the rapid, safe, and effective applications of unmanned aerial vehicle technology in electricity restoration activities, including pre-disaster tests and drills.
Data Fusion to Enhance Restoration Activities
In addition to the OMS that tracks customer outages and correlates these data with geospatial feeder data to determine where repair crews should be sent, other available data from various sources such as weather forecasts and news reports are being used to aid restoration activities (Figure 6.2). An area for research is the use of additional, underutilized information such as social media. Internet resources and social media are widely used to distribute information to consumers during a disaster. It is also possible to make use of information from consumers; however, systems are not generally in place to accomplish this. For example, during and immediately after Superstorm Sandy, many individuals sent images of downed lines, trees, and damaged equipment to utilities. Had this information been automatically geotagged and time stamped, it could have provided valuable information to aid in restoration activities. Unfortunately, at the time, utilities struggled to make use of the information as it arrived in high volumes over non-traditional channels. Additionally, there was a need to ensure that public messaging was consistent, such as continuing to advise the public never to approach downed electrical equipment.
Access to Replacement Parts, Particularly Large Transformers
While line crews are able to repair downed power lines, towers, and poles, and repair or replace low- and medium-voltage distribution transformers, damage to large substation equipment can be much more problematic. These substations contain high-voltage transformers, circuit breakers, and other large equipment that, if damaged, can be difficult and expensive to replace. Extra-high-voltage transformers (i.e., 345 kV and above) are especially problematic. These are large devices that are expensive, have long manufacturing lead times, and are hard to move. In many cases, the electrical properties of high-voltage transformers have been customized to fit the specific locations in which they are installed. It has long been understood that these transformers are an especially vulnerable element of the grid (OTA, 1990; NRC, 2012; DOE, 2015; Parfomak, 2014). While spare transformers can become a major issue in outage events that cause broad physical damage, they are especially important in the context of terrorist events where they could become the focal target of intentional attack. Indeed, as far back as 1990, the Office of Technology Assessment concluded that, if a terrorist group wanted to attack the U.S. power system, the obvious target would be a carefully selected set of high-voltage power transformers. Terrorism and the Electric Power Delivery System explained the following:
The large power transformers in generating station switch yards and major substations are vulnerable to terrorist attack and could take months or years to replace. Options for bypassing damaged substations to bring power from remote generating stations to load centers are very limited because the grid is already stressed during peak demand. The result of a coordinated attack on key substations could be rolling blackouts over a wide area until the substations are repaired. Under such conditions, the availability of compact easily transported recovery transformers would be invaluable (NRC, 2012).
The report went on to recommend that the Department of Homeland Security (DHS) cooperate with DOE to “complete the development and demonstration of high-voltage recovery transformers and develop plans for manufacturer storage and installation of these recovery transformers” (NRC, 2012). In a demonstration program called RecX (for “recovery transformer”), the DHS Science and Technology Directorate teamed with ABB and the power industry to manufacture three single-phase 345 kV transformers in St. Louis, Missouri, and move them to Houston, Texas, in March 2013 (Figure 6.3), where they were installed and operated in a substation. The entire move and installation was completed in less than 1 week (DHS, 2014).
Regulators, policy makers, and utilities recognize the need to stockpile spare equipment, especially large equipment that can be difficult and expensive to replace. As summarized in a recent Congressional Research Service report (Parfomak, 2014), the industry has made some progress in constructing a catalogue of spare high-voltage transformers. DOE recently released a request for information to gather input on setting up a national transformer reserve, and eight private energy companies have launched Grid Assurance,™ an independent company that will stockpile transformers and other critical equipment.4 A central issue with respect to developing a stockpile of replacement transformers is how to cover the cost. Under the approach taken by Grid Assurance,™ participating utilities have helped finance the founding of the company, and in return the company will sell stockpiled equipment to participating utilities that need it during emergencies. This approach was recently given a boost when FERC allowed participating utilities to recover their costs associated with purchasing sparing service and spare equipment.
Given the inherent challenge of knowing in advance where the need might arise to replace multiple transformers, some argue that a modest stockpile is a collective national asset that should be covered, or at least partly subsidized, with federal tax dollars. Congress is contemplating the creation of a national strategic transformer reserve (DOE, 2017). However, if federal resources are invested in building such a stockpile, clear policy must be developed to limit its use to well-specified disaster scenarios. Without such policy, there is a risk that industry could become overly reliant on the stockpiled equipment and reduce investment in its own spare equipment stockpiles and programs. Such an outcome could result in negligible net improvement of spare equipment capability for the nation, merely shifting the burden from industry-purchased stockpiles to government-purchased stockpiles.
In its 2015 Quadrennial Energy Review (QER), DOE noted that “the use of smaller, less-efficient, temporary replacement transformers may be appropriate for emergency circumstances. In 2006, [EPRI] suggested building compact ‘restoration transformers’ that would fit on large cargo aircraft and trucks. Since then, DHS’s Recovery Transformer Program has developed and tested a flexible transformer that is transportable by truck [see Figure 6.3] and can be installed within several days of an incident. These technologies could help address logistical concerns with moving large transformers in the event of disruptions” (DOE, 2015). The QER concluded that high-voltage transformers “represent one of [the grid’s] most vulnerable components. Despite expanded efforts by industry and federal regulators, current programs to address the vulnerability may not be adequate to address the security and reliability concerns associated with simultaneous failures of multiple high-voltage transformers” (DOE, 2015). The 2017 QER also discusses this issue, noting the following:
There are currently three key industry-led, transformer-sharing programs in the United States—NERC’s Spare Equipment Database program, Edison Electric Institute’s Spare Transformer Equipment Program, and SpareConnect. Another program, Recovery Transformer, developed a rapidly deployable prototype transformer designed to replace the most common high-voltage transformers, which DHS successfully funded in partnership with Electric Power Research Institute and completed in 2014. . . . As of December 2016, three additional programs—Grid Assurance, Wattstock, and Regional Equipment Sharing for Transmission Outage Restoration (commonly referred to as RESTORE)—are in development. . . . In December 2015, Congress directed DOE to develop a plan to establish a strategic transformer reserve in consultation with various industry stakeholders in the FAST Act. To assess plan options, DOE commissioned Oak Ridge National Laboratory to perform a technical analysis that would provide data necessary to evaluate the need for and feasibility of a strategic transformer reserve. The objective of the study was to determine if, after a severe event, extensive damage to [large power transformers] and lack of adequate replacement LPTs would render the grid dysfunctional for an extended period (several months to years) until replacement LPTs could be manufactured. DOE’s recommendations will be published in the report to Congress in early 2017 (DOE, 2017).
Over the next two decades, the grid will see increasing use of solid-state transformers and other solid-state power electronics, though penetration at present is nascent. The durability and resilience of this technology will have to be established over time and restoration plans adjusted accordingly. Solid-state power electronics will offer greater operational flexibility than traditional technology, which may be useful when the grid is being operated in non-standard ways. This technology will likely see its first widespread use in lower-power distribution systems. Recently, DOE has been supporting the development of advanced designs for LPTs. Specifically, they have been working to do the following:
Stimulate innovative designs that promote greater standardization (i.e., commoditize LPTs) to increase grid resilience (i.e., faster recovery) in the event of the loss of one or more LPTs. To this end, new designs must maintain high efficiencies, have variable impedances, accommodate various high-side and low-side voltages, and be cost-effective compared to traditional LPTs. Projects would be expected to involve modeling, analyses, and exploratory research to assess the performance and economics of proposed designs (DOE, 2016). A critical value of [this] research, beyond the development of advanced designs, is increased standardization of components improving agile allocation during disasters (DOE, 2016).
The committee recommends a dual strategy: On the one hand, the nation should push forward to improving the availability of conventional and replacement transformers for use in the event of physical disruption. At the same time, DOE should continue to explore advanced LPT designs that, in the longer term, could lower cost and improve the efficacy of emergency replacements. The vulnerability to grid operation posed by accidental or intentional damage to high-voltage transformers has been understood for decades. While limited progress has been made to reduce this vulnerability, it continues to pose a serious risk to the power system.
Recommendation 6.6: The Department of Homeland Security, the Department of Energy, the U.S. Congress, and the power industry should be more aggressive in finding a way to address the issue of manufacturing and stockpiling flexible, high-voltage replacement transformers as an important component of infrastructure investment initiatives. If federal funds are used to help in doing this, policy will be needed to limit stockpile use to major disasters. Otherwise, utilities might face incentives to reduce their stockpiles for dealing with more routine events.
Finding: Development of innovative approaches for making LPTs with greater operational flexibility (e.g., variable impedances, accommodating multiple voltages) while maintaining high efficiency and cost effectiveness relative to traditional LPTs is promising. If such devices can be developed with standardized components, they could play an important role in expediting restoration of the grid when physical damage has occurred to LPTs.
Recommendation 6.7: The Department of Energy should continue to support research and development of advanced large power transformers, concentrating on moving beyond design studies to conduct several demonstration projects.
A second restoration case is recovery from damage to the cyber monitoring and control system as a result of a cyber attack that leads to a major service disruption. Restoration from such disruptions is structured around the process shown in Figure 6.4, which contextualizes active restoration within the larger process that begins with planning for cyber restoration in much the same way as utilities plan for physical restoration. Active cyber restoration begins with detecting a breach and follows the same sequence of activities introduced above: assess, provide, prioritize, and repair. This section focuses on the steps that occur to restore power after detection of a cyber breach that has resulted in a major service disruption.
One important difference between a cyber failure and a physical disaster is the tightly defined nature of the cyber attack. The impact of a hurricane is expressed in maps as well as in lists of damaged equipment and lines. In the case of a cyber event, the cause is usually more singular than that of a natural disaster. It would be a rare event that would involve the simultaneous breach by two disparate organizations in two different ways, although a well-prepared mal-actor may seek to create exactly this situation. The analysis of a cyber event typically focuses on understanding a specific bit of malware and how it affects communication, and the countermeasures are similarly focused and technical unless the impact extends to the point of requiring replacement of substantial equipment.
A breach of a utility industrial control system (ICS) can be obvious after the system it is controlling malfunctions, but this malfunction may not occur for a long time after the initial breach. For example, in the well-covered breach of the Ukrainian power system, the actual disruption occurred 9 months after the initial breach. Mandiant (2016) reported that the average time from breach to detection in a typical information technology system is 140 days. This time delay is important and pernicious as it allows attackers to locate and master critical systems, find valuable or restricted information, and develop a strategy for exploitation. Adversaries lacking detailed knowledge of a system do not know a priori how to inflict damage even if they have accessed the ICS; they need this time to learn how to damage the breached system. The first step in cyber restoration is to detect the breach quickly, so that the adversary does not have time to develop sufficient understanding of the ICS to disrupt operations. Utilities need to develop reliable mechanisms to verify that their systems are running only the expected software and, if this is not the case, to allow remote resetting of systems.
Finding: Breaches of utility industrial control systems may persist for an extended period prior to causing disruptions to operations or service. A breach alone is not sufficient to gain control of a system, to compromise its operation, or to steal or corrupt valuable information. It takes time for attackers to learn about the system they have breached.
The problem of breach detection can be addressed by anomaly detection, although this approach has not been shown to work as well in more general enterprise settings. In part, this is because complex and distributed systems of large enterprise systems are difficult to monitor, as the variety of communications is immense (e.g., from e-mail to web site configuration management and integration with multiple systems) and varies over time. However, electric utility ICS systems are different. The boundaries of the system are more clearly defined and slower to change, the network architecture is more consistent, the communications are more structured (i.e., using well-defined protocols), and the values communicated fall into definable ranges and patterns. For example, residential meters typically report every day, hour, or 15 minutes, depending on configuration; they always use a message structure defined by the brand of meter (frequently based on an open standard), and the voltages they report are almost always in the American National Standards Institute band.5 Using another protocol, reporting a value substantially outside the American National Standards Institute band, issuing a different message type, or reporting too often could indicate that the meter has been compromised or is malfunctioning. Another example of the potential for anomaly detection is reclosers, which control the connection to a lateral power line and do not open or close very often. Too-frequent cycling could indicate an attempt to damage the system.
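The meter and recloser examples above lend themselves to simple operator-specified rules. The following is a minimal sketch only, not a description of any deployed system. The voltage band (114 to 126 V, taken here as ANSI C84.1 Range A service voltage on a 120 V base), the 15-minute interval, the message type name, and the recloser thresholds are all illustrative assumptions.

```python
from dataclasses import dataclass

# Assumed values for illustration only: ANSI C84.1 Range A service voltage
# on a 120 V nominal base and a 15-minute reporting configuration.
VOLTAGE_BAND = (114.0, 126.0)
EXPECTED_INTERVAL_S = 15 * 60
EXPECTED_MESSAGE_TYPES = {"interval_read"}  # hypothetical message type name


@dataclass
class MeterReport:
    meter_id: str
    timestamp_s: float  # seconds since an arbitrary epoch
    message_type: str
    voltage: float


def check_report(report, previous_timestamp_s=None, tolerance_s=60):
    """Return the list of rule violations for a single meter report."""
    flags = []
    low, high = VOLTAGE_BAND
    if not low <= report.voltage <= high:
        flags.append("voltage_out_of_band")
    if report.message_type not in EXPECTED_MESSAGE_TYPES:
        flags.append("unexpected_message_type")
    if previous_timestamp_s is not None:
        gap = report.timestamp_s - previous_timestamp_s
        if gap < EXPECTED_INTERVAL_S - tolerance_s:
            flags.append("reporting_too_often")
    return flags


def recloser_cycling_too_often(operation_times_s, window_s=3600, max_operations=3):
    """True if more than max_operations occur within any window of window_s seconds."""
    times = sorted(operation_times_s)
    start = 0
    for end, t in enumerate(times):
        # Slide the window start forward until it spans at most window_s.
        while t - times[start] > window_s:
            start += 1
        if end - start + 1 > max_operations:
            return True
    return False
```

A flagged report does not distinguish compromise from malfunction; as the text notes, either could produce the same symptom, so a flag is a prompt for investigation.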
Beyond these patterns, the electricity system is governed by the physics of its electrical flows. Information from the numerous and diverse sensors must present a coherent model of conditions on the grid. Reported values that deviate from the physically possible can indicate either a broken sensor or a cyber issue. For these reasons, anomaly detection methods that are not effective in general enterprise systems can work well in utility control systems. Anomalies can be detected based on rules derived by various means, including those that are (1) specified by operators, (2) derived from network mapping, (3) derived through machine learning, and (4) based on physical modeling. The first two of these are based on established technology (e.g., The Bro Project6 and the Essence Project7). There is much potential for progress in the latter two. Machine learning could combine support vector machine estimation for classification with neural net methods for training. While good physics models are available (e.g., OpenDSS and GridLAB-D for distribution systems), there are challenges in making them fast enough for use in real-time anomaly detection.
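The simplest form of a physics-based check is conservation of power at a bus: the real power reported as flowing in should equal what flows out plus the local load, within measurement error. The sketch below illustrates only that idea. It ignores losses and reactive power, the tolerance is an arbitrary assumption, and it is far short of the full power flow models (such as OpenDSS or GridLAB-D) discussed above.

```python
def bus_imbalance_kw(inflows_kw, outflows_kw, load_kw):
    """Real power arriving at a bus minus what leaves it or is consumed there."""
    return sum(inflows_kw) - sum(outflows_kw) - load_kw


def flag_inconsistent_buses(telemetry, tolerance_kw=5.0):
    """Return names of buses whose reported values violate power balance.

    telemetry maps bus name -> (inflows_kw, outflows_kw, load_kw).
    The tolerance is a hypothetical allowance for metering error.
    """
    flagged = []
    for bus, (inflows, outflows, load) in sorted(telemetry.items()):
        if abs(bus_imbalance_kw(inflows, outflows, load)) > tolerance_kw:
            flagged.append(bus)
    return flagged
```

For example, a bus reporting 300 kW in, nothing out, and a 150 kW load would be flagged, which indicates a faulty sensor, a modeling error, or manipulated telemetry.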
Finding: Tools for physics models and ICS network modeling are not well adapted to use in anomaly detection or cyber testing. Any discrepancy between the physics of the grid and the telemetry can indicate a system or component problem or a cyber compromise. The challenge at present is that physics models of power flow are generally too slow for real-time monitoring, and the track record for calibration is spotty.
Recommendation 6.8: The Department of Energy should develop the ability to apply physics-based modeling to anomaly detection. There is enormous value in having real-time or better physics models in deriving optimal power flow and monitoring performance for more accurate state estimation. Such systems should also provide a powerful tool for verifying the integrity of telemetry systems—that is, verifying that observed conditions are consistent with model conditions—and if not, then there is a problem with knowledge of state, presuming the model is accurate.
5 American National Standards Institute Standard C84.1 defines the acceptable range of voltage within which a utility can deliver power to customers.
7 The website for the Essence Project is https://www.controlsystemsroadmap.net/Efforts/Pages/Essence.aspx, accessed July 11, 2017.
Once a breach of the ICS has been detected, the next step is to assess the extent of damage. At this point, power may still be flowing to part or all of the grid; however, the system has failed fundamentally because the ability to determine system state accurately and control component behavior is likely compromised. Work should begin immediately to determine what part of the system (including the ICS, all connected components, and communications in either direction with external systems) has been compromised and how. At the simplest level, this involves examination of all components for indicators of compromise. Examination can include the following:
- Inspection. Scanning the memory and storage of each device looking for malware (i.e., “blacklisting”) and checking that only approved software is running (i.e., “whitelisting”).
- Challenge. Exercising devices to verify that they are communicating and operating correctly (e.g., flip a switch electronically to verify that it can be reached, acts as directed, and can confirm its action and state).
- Diagnostic model. Network and physics-based modeling of the grid to map anomalous behavior, although currently the models that would be used for this are not yet ready to support near-real-time restoration.
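The inspection step can be illustrated with a hash comparison against approved and known-bad lists. This is a minimal sketch assuming that software images can be read from each device as bytes; the function names and data layout are hypothetical, and real inspection tools do considerably more (memory scanning, signature matching, firmware attestation).

```python
import hashlib


def sha256_of(image: bytes) -> str:
    """Hex digest used as the identity of a software image."""
    return hashlib.sha256(image).hexdigest()


def inspect_device(running_software, approved_hashes, known_malware_hashes):
    """Classify each software object found on a device.

    running_software maps an object name to its raw bytes.
    Returns names matching the malware list and names absent from the approved list.
    """
    findings = {"malware": [], "unapproved": []}
    for name, image in sorted(running_software.items()):
        digest = sha256_of(image)
        if digest in known_malware_hashes:
            findings["malware"].append(name)
        elif digest not in approved_hashes:
            findings["unapproved"].append(name)
    return findings
```

Checking against the approved list catches unknown software that a malware list alone would miss, which is why the two checks are applied together.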
The first steps in assessment are to assemble the necessary tools if they are not present, make sure that the tools and their underlying databases are up-to-date, and then systematically and completely examine every software object in the broadly defined system to determine whether and how each has been corrupted. The assessment should be undertaken with a sense of the system connectedness, first emphasizing components that are linked to and dependent on systems known to be compromised, within the same security domain, or accessed in similar ways.
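One way to order an assessment by connectedness is a breadth-first traversal outward from the components already known to be compromised. The sketch below assumes a hypothetical adjacency map of component links; it captures only network proximity, not the other criteria named above (shared security domain or similar access paths).

```python
from collections import deque


def examination_order(links, known_compromised):
    """Order components by hop distance from known-compromised ones.

    links maps a component to the components it is directly connected to.
    Components with no path to a compromised component are listed last.
    """
    visited = set(known_compromised)
    queue = deque(sorted(known_compromised))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in sorted(links.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    unreachable = sorted(set(links) - visited)
    return order + unreachable
```

Every component still appears in the result, consistent with the requirement that the examination be complete; the traversal only determines what is looked at first.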
The provisioning phase of restoration focuses on marshalling human and other resources necessary to bring the ICS back to operation, perhaps in stages. Based on the assessment, the restoration team derives a list of skills and artifacts necessary to restore each component and the integrated system. In instances where replacement is either necessary or more efficient, these lists will include hardware (e.g., servers, smart components). For example, if a server is corrupted, it may be possible to restore it to safe operation, but it may be quicker and easier to build a new server from scratch and return the original server to inventory at a less hectic time. Restoration may also require software and data: reference disks of software, often termed “gold disks,” are typically required, as are backups of the most current state data. Large transmission organizations are generally scrupulous about maintaining “gold disks,” but this practice has not been promulgated throughout the entire industry. Restoration can be slowed by something as simple as not having license information, not patching backups to current levels, or not having internet access when it is required for activation or download of current patches. The provisioning plan should take all of these activities into consideration. The provisioning plan, overlaid on the assessment, provides a map of what components and subsystems can be restored and with what effort.
Based on the assessment, a plan must be developed to restore the system. The challenge is to coordinate the activities of specialists with the available physical and digital resources in a sequence of steps. Restoration of a specific computer could range from something as simple as running a virus removal tool to something as complex as writing new code for a virus removal tool. It could involve re-flashing a build image, replacing a drive or even a whole computer, or rebuilding a software configuration step-by-step. There may be hundreds of steps, and it may be impossible to determine in detail all of the steps needed in a particular case. Initially, the plan may state only that a network engineer will look at an infected switch and determine what needs to be done to repair it. As the restoration proceeds, knowledge of state and the efficacy of restoration options improve, and the plan becomes more specific.
A critical issue is the affected utility’s ability to marshal appropriately skilled resources. The design and documentation of utility ICSs are insufficiently standardized; outside experts cannot quickly become effective in another organization. They can be assigned routine tasks like imaging a disk, but their ability to contribute more strategically requires more detailed knowledge of affected systems. Priorities to achieve cyber resilience include establishing a common design and technical lexicon, training and working across organizations, and establishing common practices and formats for supporting artifacts. These need not be accomplished across the nation in a single push; rather, they can develop in groups of related or associated organizations, such as the group of distribution cooperatives supported by the single generation and transmission cooperative North Carolina Electric Membership Corporation. This model should be broadened to include other peer groups, perhaps organized around regional transmission organizations and regional reliability coordinators.
Another major barrier is that, to date, organizations have not been transparent about cyber events, in part owing to risk of embarrassment and liability. Furthermore, mechanisms to share resources for cyber restoration and compensate for their use (that is, cyber mutual assistance agreements analogous to traditional MAAs) are nascent. Working with EEI, the Electricity Subsector Coordinating Council is developing such a cyber MAA program (ESCC, 2016); however, the configuration of local systems can differ so substantially across utilities (e.g., when comparing a small cooperative to a major independent system operator/regional transmission organization) that it may be prohibitively difficult for loaned workers to contribute significantly to cyber restoration, even if they are experts. Through a separate program, the Electricity Information Sharing and Analysis Center (E-ISAC) disseminates risk information to utilities; its further development should be encouraged, but the emphasis to date has been on sharing information rather than labor and primarily directed at protection rather than restoration.
One final issue to consider is funding; cyber restoration, like physical restoration, can be costly. Means must be made available for utilities to hire outside assistance when useful and buy new equipment as needed to restore power quickly. A utility may look at its limited resources and plan restoration over a long period, but there may be a social advantage to using resources beyond the utility to restore over a shorter period.
Finding: To date, there have been no large-scale power outages in the United States caused by cyber attacks, but there have been many instances in which components have been compromised. Utilities have experience in fixing these minor cyber problems by rebuilding components and databases. However, cyber restoration is not a routinized process, and different organizations follow different approaches based on the nature of the event.
Recommendation 6.9: The Department of Energy and the Department of Homeland Security should work with the North American Electric Reliability Corporation, independent system operators, and regional transmission organizations to develop a model for large-scale cyber restoration. This should be done in collaboration with utilities and leading utility organizations such as the Edison Electric Institute, the National Rural Electric Cooperative Association, the Electric Power Research Institute, and the American Public Power Association.
Actual repairs are accomplished in three steps: (1) containing the breach, (2) restoring components that can be saved, and (3) replacing those that cannot.
The first step after detection is to contain the malware by isolating it and preventing its spread to other internal or external systems. Taking an infected component off-line can adversely impact grid operations; thus, expert decisions must be made about how to operate without the impacted components. Operations without compromised or degraded digital control may be possible; if not, a portion of the grid may be operated instead. For example, if the problem impacts voltage control at a particular substation, the feeder may be disconnected from central control and either operated with fixed typical control points or shut down temporarily. In this case, potentially no service will be lost. It is critical to keep safety and the long-term reliability of the grid in mind; operation should not be attempted unless it can be verified that the grid and customers are not put at risk. If digital telemetry is lacking, this may require dispatch of crews to verify switch settings manually, determine voltage and current, or confirm whether a line is energized. Fortunately, protective relays and fuses provide some protection against egregious misoperation.
Another aspect of containment is to communicate with other utilities. Sharing details of the attack—particularly information on the types of components impacted, the IP addresses of the attackers if known, and any identified malware signatures—may help others identify an ongoing attack. The E-ISAC has taken on the role of intermediary in this action; nonetheless, these systems must be strengthened, extended, accelerated, and exercised. The Cybersecurity Risk Information Sharing Program, initiated by DOE with E-ISAC support, is currently monitoring the majority of transmission systems and sharing such information with automated machine-to-machine communication. This has led to substantial improvement in the situational awareness of real-time cybersecurity risks in the electricity industry.
Restore and Replace
With the spread of malware contained to the extent possible, the work shifts to restoring components to a clean state or replacing them if repair is too difficult or time consuming. As practice in cyber restoration moves beyond improvisation, restoration will eventually proceed by following a plan that is developed in advance, updated, and refined for specific circumstances. Implementing the plan requires the following: (1) executing the outlined steps, (2) adding detail as necessary and possible, (3) testing, (4) monitoring progress and failure, and (5) providing feedback to update the plan.
At each point in the restoration, the engineer must determine the correct strategy: restore or replace. The trade-offs include cost, time, and the relative risk of a repaired component still hiding malware or being otherwise compromised versus possible errors in the configuration of new components. The choice is specific to the circumstances at hand. For example, the time required for repairs depends critically on whether there is a tested and trusted tool available on hand to remove malware and whether complete and correct backup data are available.
Highly competent staff are key to effective execution of restoration and replacement plans. While a utility may have excellent general support staff, it is unlikely that they will have experience in large-scale cyber restoration. Their skills, experience, and confidence must allow them to innovate and improvise beyond their current skills. Government teams experienced in cyber restoration and similarly skilled staff from other utilities, software vendors, and cybersecurity firms can provide valuable support to the utility teams, although they are still limited by their lack of experience with the particular system being restored.
Finding: There has been a tendency among utilities and other commercial entities not to share information about cyber breaches and to look inward rather than seeking help, which limits potential for collaboration across organizations. Most utilities are not likely to have adequate internal staff directly experienced in large-scale cyber restoration. Furthermore, the ability of outside entities to help a utility with cyber restoration is limited by unfamiliarity with the configuration of the impacted system and by the lack of agreed-upon standards or shared practices. The ICS architecture at one utility may have little in common with the ICS at another utility, independent of the physical differences in the electrical system. This lack of commonality in utility ICS system designs and documentation makes rapid and efficient use of staff from other organizations very challenging, as an engineer at one utility may face a steep learning curve at another utility.
Recommendation 6.10: The Department of Energy and the Department of Homeland Security should work with the Electricity Subsector Coordinating Council and utilities to enhance the sharing of cyber restoration resources (i.e., cyber mutual assistance agreements) including personnel, focusing on peer-to-peer collaboration, as well as engagement with government, industry organizations, and commercial cybersecurity companies. Practices that allow shared personnel to more quickly come up to speed on restoration plans will increase the value of cyber mutual assistance agreements. This should include dissemination of best practices for the backup of utility industrial control systems and operational data.
Finding: Though the basic systems are in place for sharing cyber threat information, practices can be improved with more emphasis on speed. There are organizational systems in place for sharing cyber information (e.g., E-ISAC), but the lack of a common ontology and design patterns makes the shared information more difficult than necessary to put to use.
Recommendation 6.11: The Department of Energy, the Department of Homeland Security, the electricity sector, and representatives of other key affected industries and sectors should continue to strengthen the bidirectional communication between federal cybersecurity programs and commercial software companies.
Effective documentation strategies are also critical for effective cyber restoration. System documentation must be complete, accurate, and up-to-date so that the restoration teams have the information they need to proceed and additional staff can be brought up to speed quickly. Industry experience has shown that the only way to keep documentation up-to-date is to connect it to operational production systems. For example, the network should be mapped periodically or continuously using automated tools, and then the discovered reality can be compared to the documented theory. Documentation should include backup copies of every critical system, including the data and software and all critical keys, passwords, and licenses. Such backup information should be available through a secure system with an expert in the loop.
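As an illustration only, the comparison of discovered reality against documented theory can be sketched in a few lines of Python. The device records, addresses, and role names below are hypothetical, and the discovery scan is assumed to have been produced by a separate automated tool; this is a minimal sketch, not a description of any utility's actual practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """One networked device. All fields are illustrative assumptions."""
    address: str
    role: str


def compare_inventory(documented, discovered):
    """Return the drift between the documentation and an automated scan."""
    doc = {d.address: d for d in documented}
    seen = {d.address: d for d in discovered}
    return {
        # Present on the network but absent from the documentation.
        "undocumented": sorted(seen.keys() - doc.keys()),
        # Documented but not found by the scan.
        "missing": sorted(doc.keys() - seen.keys()),
        # Found at the documented address but reporting a different role.
        "changed": sorted(
            a for a in doc.keys() & seen.keys() if doc[a].role != seen[a].role
        ),
    }


# Hypothetical documentation and scan results.
documented = [
    Device("10.0.0.1", "rtu"),
    Device("10.0.0.2", "hmi"),
    Device("10.0.0.3", "historian"),
]
discovered = [
    Device("10.0.0.1", "rtu"),
    Device("10.0.0.2", "engineering-workstation"),
    Device("10.0.0.9", "unknown"),
]

drift = compare_inventory(documented, discovered)
for category, addresses in drift.items():
    print(category, addresses)
```

Each non-empty category is a prompt for follow-up: either the documentation is corrected or the unexpected device is investigated.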
Finally, cyber restoration workers need the best possible tools to facilitate their collaboration. At a minimum, telephones should be supplemented with shared drives, online screen sharing, and remote disk access. To the extent possible, cloud options should be available to provide backup if local systems are compromised, and vice versa. Such cloud systems must be as secure as possible and potentially open only to utility operators. Furthermore, these teams must practice with either real systems or high-fidelity models. (It is possible to construct virtual systems that would allow training and practice.) Strategies for this sort of simulator are being pursued by DOE, with the National Renewable Energy Laboratory in the lead, and by the National Rural Electric Cooperative Association, with its Simba project.
Restoration of the ICS culminates with energizing the grid, shown at the top of Figure 6.4. There needs to be rapid iteration and tight integration between the plan and test steps, but ultimately the real-world test in the grid cannot be achieved digitally and virtually. Utility ICSs have switches and other controls that set machines in motion and power flowing. Some of these actions can be dangerous to line crews and could cause damage to utility and customer equipment as well as to other infrastructures. Also, a compromised control system may incorrectly alter limits on a fault protection relay or send signals to a generator that crews on site in the plant know are incorrect, resulting in dangerous system operations.
The scale and importance of utility operations dictate validation in many aspects of cyber restoration. The physics of the grid must be considered in all cyber decisions. Expert judgment is needed to determine when physical contact and observation are needed and when the benefits outweigh the risks. The training of utility personnel ensures a culture of safety.
Analyze and Refine
After the grid is re-energized, the final step is to examine what was accomplished and gather lessons learned. The goal
TABLE 6.1 Summary of Selected Recommendations Made by the National Research Council in Its 2012 Report Terrorism and the Electric Power Delivery System, Together with the Committee’s Assessment of Where Things Now Stand
|National Research Council Recommendation||Assessment of Present Situation|
|6.1: The Electric Reliability Organization (ERO) [NERC] should require power companies to re-examine their critical substations to identify service vulnerabilities to terrorist attack. Where such vulnerabilities are discovered, physical and cyber protection should be applied. In addition, the design of these substations should be modified with the goal of making them more flexible to allow for efficient reconfiguration in the event of a malicious attack on the power system. The bus configurations in these substations could have a significant impact on maintaining reliability in the event of a malicious attack on the power system. Bus layout or configuration could be a significant factor if a transformer, circuit breaker, instrument transformer, or bus work is blown up, possibly damaging nearby equipment.||The industry has made progress on this issue.|
|6.2: The ERO and FERC should direct greater attention to vulnerability to multiple outages (e.g., n-2) planned by an intelligent adversary. In cases where major long-term outages are possible, reinforcements should be considered as long as costs are commensurate with the reduction of vulnerability and other possible benefits.||Some progress has been made on these issues, but additional effort is warranted.|
|7.6: State legislatures should change utility law to explicitly allow microgrids with distributed generation. [Institute of Electrical and Electronics Engineers] should revise its standards to include the appropriate use of islanded distributed generation and microgrid resources for local islanding in emergency recovery operations. Utilities should re-examine and, if necessary, revise their distribution automation plans and capabilities in light of the possible need to selectively serve critical loads during extended restoration efforts. Public utility commissions should consider the potential emergency restoration benefits of distribution automation when they review utility applications involving such investments.||There has been some progress on this. Some states are considering whether and, if so, how to support the development of microgrids as well as the role of the local distribution utilities and other entities in the process of developing such systems. But additional effort is warranted.|
|8.1: The Department of Homeland Security and/or the Department of Energy should initiate and fund several model demonstration assessments each at the level of cities, counties, and states. These assessments should examine systematically the region’s vulnerability to extended power outages and develop cost-effective strategies that can be adopted to reduce or, over time, eliminate such vulnerabilities. These model assessments should involve all relevant public and private participants including public and private parties providing law-enforcement: water, gas, sewage, healthcare, communications, transportation, fuel supply, banking, and food supply. These assessments should include a consideration of outages of long duration (≥ several weeks) and large geographic extent (over several states) since such outages could require a response different from those needed to deal with shorter-duration events (hours to a few days).||To the best of the committee’s knowledge, no such demonstrations have been undertaken.|
|8.2: Building on the results of these model assessments, DHS should develop, test, and disseminate guidelines and tools to assist cities, counties, states, and regions to conduct their own assessments and develop plans to reduce their vulnerabilities to extended power outages. DHS should also develop guidance for individuals to help them understand steps they can take to better prepare for and reduce their vulnerability in the event of extended blackouts.||To the best of the committee’s knowledge, no such activity has been undertaken.|
|8.3: State and local regions should use the tools provided by DHS as discussed in Recommendation 8.2 to undertake assessments of regional and local vulnerability to long-term outages, develop plans to collaboratively implement key strategies to reduce vulnerability, and assist private sector parties and individuals to identify steps they can take to reduce their vulnerabilities.||While not following the strategy that the committee recommended, some limited progress has been made.|
|8.4, 8.5, and 8.6: Congress, DHS, and the states should provide resources and incentives to cover incremental costs associated with private and public sector risk prevention and mitigation efforts to reduce the societal impact of an extended grid outage. Such incentives could include incremental funding for those aspects of systems that provide a public good but little private benefit, R&D support for new and emerging technology that will enhance the resiliency and restoration of the grid, and the development and implementation of building codes or ordinances that require alternate or backup sources of electric power for key facilities. . . . Federal and state agencies should identify legal barriers to data access, communications, and collaborative planning that could impede appropriate regional and local assessment and contingency planning for handling long-term outages. Political leaders of the jurisdictions involved should analyze the data security and privacy protection laws of their agencies with an eye to easing obstacles to collective planning and to facilitating smooth communication in a national or more localized emergency. . . . DHS should perform, or assist other federal agencies to perform, additional systematic assessment of the vulnerability of national infrastructure such as telecommunications and air traffic control in the face of extended and widespread loss of electric power, and then develop and implement strategies to reduce or eliminate vulnerabilities. Part of this work should include an assessment of the available surge capacity for large mobile generation sources. Such an assessment should include an examination of the feasibility of utilizing alternative sources of temporary power generation to meet emergency generation requirements (as identified by state, territorial, and local governments, the private sector, and nongovernmental organizations) in the event of a large-scale power outage of long duration. Such assessment should also include an examination of equipment availability, sources of power generation (mobile truck-mounted generators, naval and commercial ships, power barges, locomotives, and so on), transportation logistics, and system interconnection. When areas of potential shortages have been identified, plans should be developed and implemented to take corrective action and develop needed resource inventories, stockpiles, and mobilization plans.||Limited progress has been made on selected items.|
|9.1: Complete the development and demonstration of high-voltage recovery transformers and develop plans for the manufacture, storage, and installation of these recovery transformers.||A demonstration has been successfully conducted. Considerable work is still needed on developing and implementing an adequate program of funding and other support for recovery transformers.|
|9.2–9.6: Continue the development and demonstration of the advanced computational system currently funded by the Department of Homeland Security and underway at the Electric Power Research Institute. This system is intended to assist in supporting more rapid estimation of the state of the system and broader system analysis. . . . Develop a visualization system for transmission control centers which will support informed operator decision making and reduce vulnerability to human errors. R&D to this end is underway at the Electric Power Research Institute, Department of Energy, Consortium for Electric Reliability Technology Solutions, and Power System Engineering Research Center, but improved integration of these efforts is required. . . . Develop dynamic systems technology in conjunction with response demonstrations now being outlined as part of an energy efficiency initiative being formed by EPRI, the Edison Electric Institute, and DOE. These systems would allow interactive control of consumer loads. . . . Develop multilayer control strategies that include capabilities to island and self-heal the power delivery system. This program should involve close cooperation with the electric power industry, building on work in the Wide Area Management System, the Wide Area Control System, and the Eastern Interconnection Phasor Project. . . . Develop improved energy storage that can be deployed as dispersed systems. The committee thinks that improved lithium-ion batteries have the greatest potential. The development of such batteries, which might become commercially viable through use in plug-in hybrid electric vehicles, should be accelerated.||Limited progress has been made on selected items.|
NOTE: NRC (2012) was undertaken for the Department of Homeland Security. Progress has been limited on a number of the recommendations that are listed on page 6 of that report.
SOURCE: NRC (2012).
is to refine the process, further moving cyber restoration from an ad hoc exercise to an engineering process.
Recommendation 6.12: The Department of Energy should develop a high-performance utility network simulator for use in cyber configuration and testing. There is, to date, no flexible, peta-scale utility industrial control system simulator that offers sufficient fidelity for testing intrusion detection, anomaly detection, software defined network controls, and other aspects of utility operations. The closest systems to date take a “hardware-in-the-loop” approach. While this offers some apparent advantages in terms of fidelity, it is too time consuming and expensive to test a wide range of scenarios in such a system. A purely virtual system is necessary.
There are few hazards that cause only physical damage to the electricity system. Of principal concern is the threat of a well-coordinated and executed physical attack. This was the subject of a 1990 Office of Technology Assessment report (OTA, 1990) and a more recent National Research Council report, Terrorism and the Electric Power Delivery System (NRC, 2012). While distribution and transmission equipment have been the target of attacks internationally, the Metcalf incident (described in Chapter 3) is one of the few cases in the United States, although the event was modest in scale and did not disrupt electricity service.
A terrorist attack on the towers and poles of the transmission infrastructure could disrupt service over a large area. However, utilities are well practiced at rebuilding lines and replacing poles, and it is unlikely that such an outage would be of long duration. The situation is very different for an attack on substations and especially high-voltage transformers. As noted in Terrorism and the Electric Power Delivery System, a terrorist attack carried out in a carefully planned way by people who knew what they were doing could “deny large regions of the country access to bulk system power for weeks or even months. An event of this magnitude and duration could lead to turmoil, widespread public fear, and an image of helplessness that would play directly into the hands of the terrorists. If such large extended outages were to occur during times of extreme weather, they could also result in hundreds or even thousands of deaths due to heat stress or extended exposure to extreme cold” (NRC, 2012).
Table 6.1 revisits the recommendations made by that report and summarizes the present state of affairs. Unfortunately, the ubiquity of grid assets and their inherent vulnerability make it too costly to achieve a comprehensive high level of security. Resources are prioritized on those assets where improved security will yield the greatest improvement. Efforts to improve security at key assets should proceed alongside efforts to stockpile replacement equipment and develop and deploy temporary recovery assets.
Finding: The power system continues to be vulnerable to physical attack by terrorists. Some progress has been made in making the system more resilient in the face of this hazard (for example, through physical security standards such as NERC CIP-014), but much remains to be done. Several strategies (e.g., high-voltage replacement transformers) that reduce vulnerability to terrorist events also reduce the system’s vulnerability to a range of natural hazards.
Recommendation 6.13: Efforts by the Department of Energy and the Department of Homeland Security, in conjunction with the Federal Energy Regulatory Commission, the North American Electric Reliability Corporation, and the electric industry, should be redoubled to reduce the vulnerability of the power system to terrorist attacks (paying particular attention to topics in Table 6.1 that have not yet been adequately addressed).
Restoration of electric service from a system that has sustained both physical damage (e.g., a damaged transformer) and compromised monitoring and control systems (e.g., SCADA and EMS disrupted) will require greater reliance on manual inspection and operation, which can slow the pace of damage assessment and recovery. Thus, recovery from a coordinated cyber-physical attack may proceed slowly if operators suffer diminished situational awareness and have to dispatch linemen to assess damage. The principal concern across the industry is the potential for a well-informed state actor or terrorist group to execute a coordinated cyber-physical attack, the so-called structured adversary. Both cyber and physical attacks can be combined, targeted toward system components that cause the most damage or are most difficult to replace, and carried out repeatedly and perhaps with the explicit intent of hindering restoration.
EPRI has developed scenarios of coordinated cyber-physical attacks targeting generation, transmission, and distribution systems that can be used by operators and asset owners to test their readiness and improve planning and drilling (EPRI, 2012). More recently, NERC coordinated more than 100 participating organizations in the biennial distributed-play exercise GridEx III, which practiced response and recovery from a series of hypothetical cyber and physical attacks (NERC, 2016b). Such planning and drilling exercises are a valuable industry practice; however, the level of sophistication of attacks may continue to grow along with the number of vulnerable cyber and physical targets.
Recommendation 6.14: Utilities, with support from federal and state government, should continue to expand joint cyber-physical recovery exercises. These should emphasize, among other things, the maintenance of cyber protection during the chaotic period of physical restoration. The need to reconfigure electrical systems during a disaster requires changes to the industrial control system. It is frequently necessary to disable elements of the cybersecurity systems while the state of the grid is in flux. Research should be done on how to maintain a higher level of security during this period. This may involve operation in default modes or with analog controls to some extent until cybersecurity can be reestablished.
Other Technologies and Operations That Improve Restoration
Though many of the technologies discussed in Chapter 4 are intended to reduce the likelihood and extent of outages, many of them also directly aid restoration. Improvements in advanced sensing, controls, and analytics have reduced outages and quickened restoration. In particular, distribution system automation and adaptive islanding are examples of where these technologies can play a role in improving restoration. Further, while these technologies improve the resilience of the electric system, they also improve its reliability in the face of small, localized outages.
Improving Resilience by Learning from Past Events
The final step in restoration is to reflect on and analyze the experience to improve future restoration efforts. Often restoration from a large-area, long-duration outage is viewed as a unique effort. Nonetheless, even in the midst of a great disaster, it is certain that another similar outage will follow. In 2005, Katrina seemed a nonpareil event, but Superstorm Sandy followed a mere 7 years later. The industry can and must plan for disaster recovery, but only real disasters stress the plans and expose their gaps and weaknesses. Disasters provide a genuinely unique opportunity to learn.
For most large-area, long-duration outages, there is an after-action report that, for the most part, reads like a historical piece rather than a technical study aimed at process improvement. These reports accurately describe what occurred and what was done (when, where, and by whom) as well as contain a number of short narratives related to particular successes or failures. While this information is useful, even essential, the idiosyncratic approaches make it difficult to identify more general process improvements across multiple events. Outside of the electricity industry, other sectors have developed sophisticated investigation procedures and even maintain full-time, well-trained staff whose only job is to investigate major incidents. The National Transportation Safety Board Investigative Process⁸ is solely focused on improving safety, and since the Board has no regulatory or enforcement powers, its conclusions cannot be used in litigation. The committee believes that the electricity sector can improve its own investigations by learning from the National Transportation Safety Board and potentially creating a similar institutional structure.
8 The National Transportation Safety Board Investigative Process is described at https://www.ntsb.gov/investigations/process/Pages/default.aspx, accessed July 11, 2017.
Part of the problem is the lack of a general restoration model to provide a common framework for learning. A simple, initial framework was proposed earlier in this chapter, and extension and elaboration of that framework could be very useful in structuring the learning process. Two additional problems are as follows: (1) There is no national process or organization to systematize the integration of studies, and (2) there is insufficient rigor in data collection. The following sections describe a general process for collecting information on the failures and shortcomings in disaster restoration.
Step 1: Compile High-Level Facts That Describe the Event
Step 1 is performed by the study team. A summary should be prepared detailing the essential known facts, including a description of the event, high-level summary of known impacts (e.g., where power was lost and for how long), the grid-level drivers of power loss, the organizations involved with restoration and their activities, a timeline of restoration activities, notable successes and failures, and a list of questions raised. From these facts, a series of maps, organization charts, and information flow diagrams should be prepared. This will provide a guide for the research and a common understanding of the event that can be shared among all of the participants in the research.
Step 2: Conduct Interviews
Beginning with the above summary, a series of interviews with a large number of individuals from all organizations involved in the restoration should be undertaken by the study team. The interviews should focus on what the organization did, as well as its inputs and outputs.
Step 3: Perform Synthesis
The synthesis phase is conducted by the study team and supplemented by subject-matter experts as needed. The synthesis phase extends the event summary by using information from the interviews. The results are summarized in a narrative that incorporates a number of graphics. The graphics include an “entity relationship diagram” (ERD); diagrams of material flows, equipment flows, and information flows; and any other charts the study team deems necessary. The ERD is crucial, as it lists all of the entities involved in restoration, from government, utility, and other private sector groups, and documents their interactions through arrows. For example, the governor’s office (entity) may direct (relationship) the National Guard (entity). The actual flows of material, equipment, and information overlay the ERD. The reduction of the narrative to these artifacts ensures rigor in and understandability of the analysis.
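One way to make the ERD and its overlaid flows concrete is to hold them as simple records that can be checked mechanically. The following Python sketch is an assumption about how a study team might do this, not a prescribed format; the class names, the flow descriptions, and the "Distribution utility" entity are hypothetical. Only the governor's office and National Guard relationship comes from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str
    verb: str
    target: str


@dataclass(frozen=True)
class Flow:
    kind: str  # "material", "equipment", or "information"
    source: str
    target: str
    description: str


class RestorationERD:
    """Entities, their relationships, and the flows that overlay them."""

    def __init__(self):
        self.entities = set()
        self.relationships = []
        self.flows = []

    def relate(self, source, verb, target):
        self.entities.update((source, target))
        self.relationships.append(Relationship(source, verb, target))

    def add_flow(self, kind, source, target, description):
        self.entities.update((source, target))
        self.flows.append(Flow(kind, source, target, description))

    def unanchored_flows(self):
        """Flows between entities with no documented relationship in
        either direction; each is a gap for the study team to resolve."""
        linked = {frozenset((r.source, r.target)) for r in self.relationships}
        return [
            f for f in self.flows
            if frozenset((f.source, f.target)) not in linked
        ]


erd = RestorationERD()
erd.relate("Governor's office", "directs", "National Guard")
# Hypothetical flows recorded from interviews.
erd.add_flow("information", "Governor's office", "National Guard",
             "deployment order")
erd.add_flow("equipment", "National Guard", "Distribution utility",
             "portable generators")

for flow in erd.unanchored_flows():
    print(f"No documented relationship for: {flow.source} -> {flow.target}")
```

Flagging flows that lack a documented relationship is one small way in which reducing the narrative to artifacts can expose gaps in the analysis.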
Step 4: Conduct Special Engineering Studies
Special engineering studies are conducted by technical teams assembled for each study. Electrical disasters and remediation are, to a large extent, studies in organization, communication, and coordination. They are at root, however, serious exercises in engineering. Much of the process described here is directed at organizational and process improvement, which is important because it underpins the response to all disasters, but it is just as important to learn about the design and operation of the grid. These elements must be part of the learning process. Based on the recommendations that emerge from the interviews, special engineering studies should be initiated. A particularly important example is understanding the transmission grid. Despite its immense scale, it is a precision machine that requires careful harmonization. The studies may look at things like cyber and physical black start, the repair of analog versus digital components in flooded substations, repair of underground laterals in flooded areas, structure failure mode and possibly the need for redesign, and a host of other subjects. Special subjects should be defined in the study phase when they are essential to understanding the restoration or when the restoration presents an opportunity to learn about the grid and how to improve it. Superstorm Sandy provided an unparalleled opportunity to study grid physics at a large scale, and Katrina provided many examples of restoration of flooded substations.
Step 5: Review and Distribute Widely
All parties involved in grid restoration should be involved in review and socialization. This includes individuals and organizations not impacted by the disaster or involved in its restoration. The synthesis report should be widely distributed and reviewed at meetings in a process of improvement and refinement. This will likely span several months.
Step 6: Generalize and Integrate
This step is conducted by a team developed specifically for this purpose but should involve a few members of the study team. The purpose of the final step is to take the specific analysis that comes from Step 5 and use it to improve the general restoration model, asking which lessons have value beyond simply understanding what occurred.
Special Studies—Cascading Failures on the Bulk Power System
The reliability of U.S. electric power systems has been high enough that the rare occurrences of major blackouts have been prominent national and even international news items. Often, the circumstances leading up to a major system failure include multiple individual factors, each of which alone would have little or no significant impact but when combined conspire to impact the integrity of the system. In the past, such combinations have arisen through the coincident occurrence of unrelated events. For example, during the August 14, 2003, blackout, there were four root causes identified (UCPSOTF, 2004). In the future, events could also be brought together through malevolent synergy. The job of an outage investigation team is to sift through all of the evidence to determine the root causes of the larger system failure and extract lessons for future improvement.
The first step in investigating an incident is to accurately reconstruct the sequence of events, which can be a time-consuming process. This begins with gathering all of the data to support the investigation team’s evidence-building process (Dagle, 2006). Myriad data sources can provide useful information to support this phase of the investigation. Among the most valuable sources of information are operational logs, records of sequence of events, digital fault recorder output, protective relaying event information, synchrophasor data history, and other similar records of real-time information. The accuracy and precision of these event logs can be critical during cascading events, allowing investigators to sift through the initiating actions and subsequent responses. In the past, significant difficulties have arisen in gathering the data to support the investigation team (Dagle, 2004). The good news is that with the advent of modern power system measurement technology, it is becoming much easier to collect data with microsecond-class measurement accuracy, which is often of ample temporal resolution to be able to accurately determine the sequence of events.
Once the sequence of events is organized, it is valuable to separate it into the slower events leading up to the cascading failure and the faster events that occur during the cascading failure itself. Normally, the role of human operators is relevant only during the slower events, and automatic controls are involved in the faster sequences associated with the later stages of the cascading failure.
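The reconstruction and separation described above can be illustrated with a short Python sketch. It assumes that each data source supplies a time-sorted log with synchronized timestamps, and it uses a hypothetical rule for the split: the cascade is taken to be the final unbroken run of events spaced no more than 10 seconds apart. The event names, times, and the 10-second threshold are all illustrative assumptions, not values drawn from any investigation.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    time: datetime
    source: str
    description: str


def merge_logs(*logs):
    """Merge per-source logs (each already time-sorted) into one sequence."""
    return list(heapq.merge(*logs, key=lambda e: e.time))


def split_phases(events, fast_gap=timedelta(seconds=10)):
    """Split a merged sequence into (slow lead-up, fast cascade).

    Walk backward from the last event while consecutive events are no
    more than fast_gap apart; that trailing run is treated as the cascade.
    """
    if not events:
        return [], []
    start = len(events) - 1
    while start > 0 and events[start].time - events[start - 1].time <= fast_gap:
        start -= 1
    return events[:start], events[start:]


# Hypothetical logs from two sources.
t0 = datetime(2000, 1, 1, 12, 0, 0)
operator_log = [
    Event(t0, "operator log", "line A trips"),
    Event(t0 + timedelta(minutes=20), "operator log", "line B trips"),
]
relay_log = [
    Event(t0 + timedelta(minutes=41), "relay", "line C trips"),
    Event(t0 + timedelta(minutes=41, seconds=2), "relay", "line D trips"),
    Event(t0 + timedelta(minutes=41, seconds=5), "relay", "generator G trips"),
]

merged = merge_logs(operator_log, relay_log)
slow, fast = split_phases(merged)
print(len(slow), "slow events;", len(fast), "fast events")
```

In practice the dividing line would be a matter of engineering judgment rather than a fixed threshold; the sketch only shows why precise, synchronized timestamps matter for ordering events from different recorders.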
Particularly with the automated controls, it is necessary to understand the relationship among the various steps in the sequence of events. Characterizing the reason behind any automatic control action helps to develop a deeper understanding of the sequence of events and the chain of events that led up to the cascading failure sequence. This often involves a detailed assessment of protection and other control devices to determine why they operated as well as how their operation contributed to subsequent actions in the sequence of events.
Finally, after considering the sequence of events and the earlier actions that contributed to later actions, root cause determination can proceed. It is important in this process to understand that actions taken in advance of the event could be a key root cause finding. For example, inadequate vegetation management, rather than a ground fault to a tree, might be a root cause.
Another important consideration is the degree to which infrastructure damage will prevent rapid restoration of electricity service. As disruptive as widespread blackouts can be, much worse events are possible. Under several different types of circumstances, electric power systems could be damaged well beyond the level of normal design criteria for maintaining reliability (OTA, 1990). The threats of terrorism, severe storms, and other phenomena, such as geomagnetic disturbances, have increasingly become major concerns to the government and the commercial utility industry. The regulations and policies to mandate how the nation would respond to such an event, or even define who is in charge, are still evolving.
Finding: Analysis of large-area, long-duration outages requires an enormous amount of high-precision data. Provision for the collection of these data could be in place before an event. Fundamentally, it is the responsibility of each organization involved in operating the system to conduct event investigations, gather lessons learned, and apply those lessons to minimize the likelihood of subsequent similar events. NERC has jurisdiction and responsibility to conduct investigations of outages involving the bulk power system. Particularly for events that involve multiple organizations, NERC brings tremendous value to the process by assembling outside expertise that cuts across organizational boundaries.
Recommendation 6.15: The North American Electric Reliability Corporation, the Federal Energy Regulatory Commission, and relevant regional- and state-level organizations should improve the investigation process of large-scale losses of power with the objective of disseminating lessons across geographical and jurisdictional boundaries. Experiences from outside organizations such as the National Transportation Safety Board should inform this work. To further improve the investigation process, the committee recommends that organizations involved in electricity system operation improve restoration through the following:
- Better and more uniform calibration of recording instruments, including precise time synchronization.
- Pre-defined data requirements to support incident investigations using standard data formats.
- Pre-work logistical details (e.g., prior establishment of confidentiality agreements).
- Infrastructure to support centralized blackout investigations.
- Creation of a data warehouse with servers and databases to store and process the incoming data, support the investigation team, and manage data inventory.
- Defined data categories (to readily track and follow-up on data gaps).
- Automated disturbance reporting.
- Routine collection of transmission and generation events.
- Improved mechanics of data formats, exchange protocols, and confidentiality issues that can be worked out and tested on an ongoing basis.
- Blackout data that are collected in a matter of hours rather than a matter of days or weeks.
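As an illustration of how pre-defined data requirements and defined data categories could make data gaps easy to track, the following is a minimal sketch and not a description of any existing NERC or utility tool. The category names, the `Submission` record, and the `find_data_gaps` function are all hypothetical, and the sketch assumes that a submission counts as usable only if its recorder clock was synchronized to a common time reference.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical data categories that an investigation team might require
# from every participating organization (assumed for illustration only).
REQUIRED_CATEGORIES = {"relay_events", "scada_logs", "pmu_streams", "operator_logs"}


@dataclass(frozen=True)
class Submission:
    organization: str
    category: str
    received_utc: datetime
    time_synchronized: bool  # True if the recorder clock used a common reference


def find_data_gaps(organizations, submissions):
    """Return {organization: sorted list of required categories still missing}.

    A category counts as delivered only when at least one time-synchronized
    submission exists for it. Organizations with no gaps are omitted.
    """
    usable = {org: set() for org in organizations}
    for s in submissions:
        if s.organization in usable and s.time_synchronized:
            usable[s.organization].add(s.category)
    gaps = {}
    for org, delivered in usable.items():
        missing = REQUIRED_CATEGORIES - delivered
        if missing:
            gaps[org] = sorted(missing)
    return gaps


now = datetime(2003, 8, 15, 12, 0, tzinfo=timezone.utc)
example_submissions = [
    Submission("Utility A", "relay_events", now, True),
    Submission("Utility A", "scada_logs", now, True),
    Submission("Utility A", "pmu_streams", now, True),
    Submission("Utility A", "operator_logs", now, True),
    Submission("Utility B", "relay_events", now, True),
    Submission("Utility B", "scada_logs", now, False),  # clock not synchronized
]
example_gaps = find_data_gaps(["Utility A", "Utility B"], example_submissions)
print(example_gaps)
```

In this sketch, an unsynchronized submission is treated as a gap rather than discarded silently, so the investigation team can follow up with the submitting organization.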
Ball, B. 2006. Rebuilding electrical infrastructure along the Gulf Coast: A case study. The Bridge: Linking Engineering and Society 36(1): 21–26.
CREC (Cuivre River Electric Cooperative). 2016. “Power Restoration Plan.” https://www.cuivre.com/content/power-restoration-plan. Accessed July 17, 2017.
Dagle, J.E. 2004. Data management issues associated with the August 14, 2003 blackout investigation. IEEE Power Engineering Society General Meeting 2: 1680–1684.
Dagle, J.E. 2006. Postmortem analysis of power grid blackouts: The role of measurement systems. IEEE Power & Energy Magazine 4(5): 30–35.
DHS (Department of Homeland Security). 2012. Recovery Transformer (RecX) Demonstration [Video file]. https://www.dhs.gov/science-andtechnology/recx-demo-video. Accessed July 11, 2017.
DHS. 2014. Considerations for a Power Transformer Emergency Spare Strategy for the Electric Utility Industry. https://www.dhs.gov/sites/default/files/publications/RecX%20-%20Emergency%20Spare%20Transformer%20Strategy-508.pdf.
DHS. 2016. National Response Framework. 3rd Edition. https://www.fema.gov/national-response-framework. Accessed July 13, 2017.
DOE (Department of Energy). 2015. “Modernizing the Electric Grid.” Quadrennial Energy Review First Installment: Transforming U.S. Energy Infrastructures in a Time of Rapid Change. http://energy.gov/epsa/downloads/quadrennial-energy-review-first-installment. Accessed July 13, 2017.
DOE. 2016. Promoting Innovation for the Design of More Flexible Large Power Transformers. https://energy.gov/oe/articles/promotinginnovation-design-more-flexible-large-power-transformers. Accessed July 11, 2017.
DOE. 2017. Quadrennial Energy Review: Transforming the Nation’s Electricity System: The Second Installment of the QER. https://energy.gov/epsa/downloads/quadrennial-energy-review-second-installment. Accessed July 13, 2017.
EEI (Edison Electric Institute). 2016. Understanding the Electric Power Industry’s Response and Restoration Process. http://www.eei.org/issuesandpolicy/electricreliability/mutualassistance/documents/ma_101final.pdf.
EPRI (Electric Power Research Institute). 2010. “Development of Power System Restoration Tool Based on Generic Restoration Milestones.” https://www.epri.com/#/pages/product/000000000001020055/. Accessed July 13, 2017.
EPRI. 2012. “Coordinated Cyber-physical Attacks, High-Impact Low-Frequency (HILF) Events, and Risk Management in the Electric Sector.” http://www.epri.com/abstracts/Pages/ProductAbstract.aspx?ProductId=000000000001025861. Accessed July 13, 2017.
EPRI. 2013. “Enhancing Distribution Resiliency: Opportunities for Applying Innovative Technologies.” https://www.epri.com/#/pages/product/000000000001026889/. Accessed March 17, 2017.
ESCC (Electricity Subsector Coordinating Council). 2016. “Overview.” http://www.electricitysubsector.org/. Accessed December 15, 2016.
Fugate, W.C. 2012. “Hurricane Sandy: Response and Recovery Progress and Challenges.” Hearing Before a Subcommittee of the Committee on Appropriations, United States Senate, 112th Congress, December 5.
Lacey, S. 2014. “Resiliency: How Superstorm Sandy Changed America’s Grid.” GreentechMedia, June 10. https://www.greentechmedia.com/articles/featured/resiliency-how-superstorm-sandy-changed-americasgrid. Accessed July 13, 2017.
Mandiant. 2016. “M-Trends.” https://www2.fireeye.com/PPC-m-trends2016-trends-statistics-mandiant.html. Accessed July 11, 2017.
Miller, C., M. Martin, D. Pinney, and G. Walker. 2014. Achieving a Resilient and Agile Grid. http://www.electric.coop/wp-content/uploads/2016/07/Achieving_a_Resilient_and_Agile_Grid.pdf.
NASEM (National Academies of Sciences, Engineering, and Medicine). 2016. Electricity Use in Rural and Islanded Communities: Proceedings of a Workshop. Washington, D.C.: The National Academies Press.
NERC (North American Electric Reliability Corporation). 2016a. Report on the FERC-NERC-Regional Entity Joint Review of Restoration and Recovery Plans. https://www.ferc.gov/legal/staff-reports/2016/01-2916-FERC-NERC-Report.pdf.
NERC. 2016b. Grid Security Exercise. http://www.nerc.com/pa/CI/CIPOutreach/GridEX/NERC%20GridEx%20III%20Report.pdf.
NRC (National Research Council). 2012. Terrorism and the Electric Power Delivery System. Washington, D.C.: The National Academies Press.
NYSEG (New York State Electric and Gas Corporation) and RGEC (Rochester Gas and Electric Corporation). 2016. Electric Utility Emergency Plan. https://www.nyseg.com/MediaLibrary/2/5/Content%20Management/Shared/SuppliersPartners/PDFs%20and%20Docs/NYSEG%20and%20RGE%20Electric%20Utility%20Emergency%20Plan.pdf.
Olearczyk, M. 2013. Airborne Damage Assessment Module (ADAM). Electric Power Research Institute 2013 Distribution System Research Portfolio. http://mydocs.epri.com/docs/Portfolio/PDF/2013_P180.pdf.
OTA (Office of Technology Assessment). 1990. Physical Vulnerability of Electric System to Natural Disasters and Sabotage, OTA-E-453. Washington, D.C.: U.S. Government Printing Office.
Parfomak, P.W. 2014. Physical Security of the U.S. Power Grid: High-Voltage Transformer Substations. https://fas.org/sgp/crs/homesec/R43604.pdf.
PJM. 2016. “PJM Manual 36: System Restoration.” http://www.pjm.com/~/media/documents/manuals/m36.ashx. Accessed July 13, 2017.
UCPSOTF (U.S.–Canada Power System Outage Task Force). 2004. Final Report on the August 14, 2003 Blackout in the United States and Canada: Causes and Recommendations. https://energy.gov/sites/prod/files/oeprod/DocumentsandMedia/BlackoutFinal-Web.pdf.
TABLE 6A.1 Variation in Restoration Activities Across the Six Stages of the Life Cycle of an Outage Characterized by Damage to Physical Components, Monitoring and Control Systems, and Supporting Infrastructure, As Indicated in the Upper Right Corner of Figure 3.2
|Hurricanes and Tropical Storms||Floods|
|Area impacted: Typically very large||Area impacted: Typically very large|
|Damage to aboveground assets: Poles, towers, substations||Damage to aboveground assets: Poles, towers, substations|
|Damage to customer assets: Extensive||Damage to customer assets: Extensive|
|Limits to access and mobility: Major blockage||Limits to access and mobility: Major blockage|
|Event warning: Days||Event warning: Days|
|Risk assessment: Can be identified beforehand||Risk assessment: Can be identified beforehand|
|Rate of propagation: Slow||Rate of propagation: Slow|
|Plan||Individual utilities plan for hurricanes and tropical storms based on their experience and historical hurricane tracks, although these tracks may be trending more northerly in the Atlantic, placing the Mid-Atlantic states and New England at greater risk than in the past. Utilities are experts in identifying their specific vulnerable assets. During this phase, utilities should establish and refresh mutual aid agreements, create owned and shared inventory, train crews, conduct exercises, and communicate with customers regarding emergency preparedness.||More than any other disaster, floods are subject to statistical analysis, and utilities plan based on FEMA flood maps. Some adjustment should be made if there has been substantial reduction in forest cover or if there has been substantial development in the impacted watershed. Consideration should also be given to how, in light of climate change, future flood risk may be different from historical risk. To the extent possible, critical assets should not be located in identified flood plains, but there are numerous legacy assets exposed to flood risk. Floods in major river basins tend to be slow rising and slow receding, with lesser hydrostatic force. In contrast, canyon flooding (largely in western mountains) tends to be fast rising with short notice, forceful, and quick to recede. In either case, assets at risk can be identified and measures taken to reduce risk such as elevating them above the flood or building coffer dams. Plans should be made to replace assets in flood plains.|
|Prepare||Hurricane wind and rain forecasts with high uncertainty are available up to 1 week in advance, which is sufficient time to elevate or downgrade risk. When risk is elevated, staffing for the emergency can be refined, and mutual aid agreements can be activated. Flood forecasts are available only 3 to 4 days in advance, and peak flooding frequently follows the event.||River basin flood forecasts are available 3 to 4 days in advance. Many flash floods occur with effectively no warning; however, if major rain events are forecast for canyon areas, utilities may place crews on standby. When a flood is forecast in a river basin, it is possible to forecast which areas and assets are most likely to be affected. General restoration plans can be made more specific, and mutual aid agreements and emergency operations centers can be activated. Lists of materials, supplies, and equipment can be developed, procured, and staged.|
|Event||Relatively little can be done on distribution systems during the comparatively short duration of the event. Transmission systems must be adjusted as loads, generators, and transmission lines drop off the grid. Utilities develop an understanding of the extent of damage and customer outages and develop specific plans for remediation, building on the general planning. Government support organizations monitor conditions and establish and exercise lines of communications with utilities and with each other. Limited actions should be taken by utilities only when safety is an issue.||Major river floods are long-duration events that move down a river basin. Restoration can start upstream while the event is still evolving downstream, and some protective measures can be undertaken as water rises. Before restoration begins in an area, the plans can be improved and refined with emphasis on the temporal sequencing. Communications and coordination should be established and exercised.|
|Endure||The endurance phase is the period from when the storm passes to the start of restoration. Unless there is flooding, restoration can begin immediately. If there is a delay, the time should be spent moving crews into position to the extent that the condition of the roads and safety considerations allow. Effort should also be made to improve the assessment of the state of conditions, to refine plans, and to refine requests for support from and coordination with other organizations, including other utilities and government organizations. This coordination occurs at high levels, such as governors’ offices, and also among crews on the ground, for example by informing police and fire departments about the utility staff who will be working in their area. If specialized equipment is needed, arrangements should be made for acquisition and staging for deployment.||The endurance phase for a flood at a given location can be very short where the grade of a river is steep or long in low-lying flat areas. Work begins in an area as soon as the water recedes enough to allow restoration.|
|Restore||Restoration is the most visible phase of the event. Crews are on the streets working. While this is a difficult and costly phase, it is one that most utilities are familiar with and good at. If there are many trees and other obstacles in the street, they must be cleared to gain access to facilities. Utilities and the linemen know how to clear access, set poles, erect towers, string conductors, and clean and repair substations. The goal of management and support organizations (including governmental) is to ensure that the line crews are used effectively. They must be dispatched to the areas where their work will have the greatest impact, considering what is doable, and placed in a sequence of restoration activities. Management should work the supply chain to be sure that crews have the equipment, parts, and supplies (including fuel) they need to execute the necessary repairs. Crews must be provided with provisions, including food and housing, and amenities, such as electrical and phone service and access to health services for the injuries that are inevitable in this dangerous physical work. Experience has shown that taking care of the families left behind when crews are deployed is an important factor in enabling them to work effectively.||Flood restoration can take a very long time. In the absence of wind, poles and towers are not typically damaged; nonetheless, the ground can be softened and some distribution and transmission failures may occur. Manholes are flooded and must be pumped out. Underground lines and associated gear sometimes survive intact but often are damaged to the point of needing costly and time-consuming replacement. Flooded substations are difficult to restore. Analog equipment can sometimes be cleaned, dried, and returned to service, but digital devices typically need replacement. Underground vaults are problematic as they are difficult to drain and dry, can accumulate deep mud, and are more difficult to move equipment in and out of. All of this, however, is work utilities know and are well equipped to manage. The key, as noted in the discussion of hurricanes, is to provide broad support to the crews.|
|Recover||Hurricanes damage communities, not just utilities. Utilities must be part of the community restoration, perhaps lasting years. Rebuilding is an opportunity for improving.||Floods damage communities, not just utilities. Utilities must cooperate with other entities in the restoration as, for example, in repairing or replacing civil and safety infrastructure.|
|Earthquakes||Winter Storms|
|Area impacted: Limited to extensive||Area impacted: Regional|
|Damage to aboveground assets: Poles, towers, substations||Damage to aboveground assets: Lines, poles, towers|
|Damage to customer assets: Limited to extensive||Damage to customer assets: Limited|
|Limits to access and mobility: Major blockage||Limits to access and mobility: Potential blockage|
|Event warning: Seconds to minutes||Event warning: Days|
|Risk assessment: Difficult||Risk assessment: Straightforward|
|Rate of propagation: Fast||Rate of propagation: Slow|
|Plan||Earthquake risk is well mapped, and utilities routinely consider earthquake risk in siting and planning processes. Methods for earthquake-survivable construction are well researched. Major plants (e.g., North Anna Nuclear Power Station) have survived earthquakes with no damage, though safety considerations have taken them off-line for an extended period. Planning consists of maintaining adequate parts inventories.||Utilities operating in regions subject to winter storms often design systems components, such as transmission towers and lines, to be able to withstand greater amounts of precipitation and wind compared to other areas.|
|Prepare||There is work on developing a near-term warning capability for earthquakes, but presently most occur with no useful warning.||Winter storm forecasts provide several days’ warning that allows for arrangement of mutual aid.|
|Event||Earthquakes are of short duration. No action during the earthquake is practical.||Some final preparation is possible during the event as outages are mapped. Transmission system operators must rebalance to accommodate failing loads and distribution systems.|
|Endure||Restoration can begin immediately.||Delay in the start of restoration is possible if the roads are blocked or ice-covered.|
|Restore||Restoration consists of familiar utility construction but can be severely hampered by damage to supporting infrastructure. Roads and bridges can be blocked or torn away, natural gas pipelines can break, and fuel storage can rupture. Electricity system restoration is executed as part of a broader restoration effort, and coordination among federal, state, and local government, as well as utility decision makers, is essential. Shortages of materials and equipment may result in competition for scarce resources, and availability will vary geographically. Even access to food and water may be a challenge in some remote areas. There is substantial risk that the homes and families of crews may be impacted or imperiled, undermining their ability to commit to utility restoration activities. Mutual aid from unaffected areas is essential.||Restoration following winter storms is standard utility work. Mutual aid is beneficial, and due to the generally smaller geographic extent of such storms, there are fewer issues in supporting the crews or marshalling supplies than are faced during restoration from hurricanes and earthquakes. Cold temperatures do reduce effectiveness of line crews.|
|Recover||Utility restoration can be completed well in advance of the general commercial and civil infrastructure. Utility capabilities are enablers of recovery.||Winter storms do not typically inflict lasting damage on infrastructure and enablers of economic recovery.|
|Tornadoes||Geomagnetic Disturbances|
|Area impacted: Limited to clustered||Area impacted: Very large|
|Damage to aboveground assets: Poles, towers, substations||Damage to aboveground assets: Transformers, substations|
|Damage to customer assets: Serious but contained||Damage to customer assets: Limited|
|Limits to access and mobility: Minor blockage||Limits to access and mobility: None|
|Event warning: Seconds to minutes||Event warning: Minutes to days|
|Risk assessment: Regionally known||Risk assessment: Costly|
|Rate of propagation: Fast||Rate of propagation: Very fast|
|Plan||Utilities in high-risk areas are aware of the peril and have likely dealt with tornadoes in the past. The focus in planning is on inventory of aboveground assets and mutual assistance. Unlike some other causes, transmission and generation assets are at risk of damage from tornadoes.||Risk assessment is nascent and based on highly uncertain estimates of frequency and intensity, but methods to harden the grid are available. Replacement transformers and other vulnerable components can be stockpiled but may be too expensive to be forward deployed.|
|Prepare||The incidence of weather conditions likely to spawn tornadoes can be provided 1 day to several hours in advance. There is little time to prepare, except to bring crews to a state of readiness and fully man response centers.||Solar weather warning systems can provide some notice, allowing for minimal preparation, but there is generally insufficient time to move crews.|
|Event||Events are of such short duration that there is no practical action during the event, except that transmission operators may have to adjust to limit impact.||The building up of current on long lines can trigger operational changes to protection systems, particularly shedding load to desaturate transformers.|
|Endure||Restoration can generally begin immediately after the event passes.||Restoration can begin immediately.|
|Restore||Customer property may be destroyed alongside utility assets, which means that there may be no immediate need to restore power to the affected area. Nonetheless, the tornado may damage a transmission corridor or section of the distribution grid essential to providing service to unaffected areas. The work is familiar to utilities and, in the case of tornadoes, the impact is sufficiently localized that there is less difficulty in provisioning and supporting crews. There are likely to be intact facilities within a few miles or tens of miles of the worksite.||There is no precedent for a large-scale geomagnetic disturbance event. If the impact is very large, there may be shortages of major components, particularly large transformers due to the long lead time in building and acquiring these.|
|Recover||Tornadoes do very serious damage to the impacted community so that the recovery period can be extensive after the immediate restoration is completed. Utilities must participate in planning this recovery.||Recovery is not a factor. Extensive damage beyond the grid is unlikely since long lines are needed to build damaging current level.|
|Area Impacted||Feeder Level to System Level|
|Damage to aboveground assets||Cyber assets will certainly be compromised, perhaps beyond restoration. Control actions initiated by the pernicious actor may create a wide range of physical damage up to and including generators. In addition, “smart” components may be compromised in a way that they are no longer controllable. Such damage may be irreversible or compromise trust in the device so that it may not be used safely. This damage to the electronic aspects of a device is functionally equivalent to physical damage.|
|Damage to customer assets||Limited except, possibly, to smart meters. Meters are owned by the utility but are associated with a specific customer. If the meter includes a local wireless connection for home automation, there are potential attack strategies which may do damage to customer systems, but no such Internet of Things attack has been successful.|
|Limits to access and mobility||None.|
|Event warning||Potentially months.|
|Risk assessment||Cyber N-1 and N-2 analyses should become standard practice.|
|Rate of propagation||Slow from breach to first action, very fast from first action.|
|Plan||Planning for cyber attack is a routine part of utility operations. It tends to focus, however, on prevention rather than restoration. The emphasis in restoration is on reestablishing the operational capability of sensor, computational, and communications assets; reestablishing state; and gaining confidence in the integrity of the systems and the information they manage. Cyber restoration should be planned and practiced.|
|Prepare||Systems must be improved to react more effectively to new threat information. Updated threat information is provided daily, but the systems to move this information into quick action at a utility cannot make immediate use of the information. Much of it must work its way through cybersecurity software and service providers.|
|Event||A cyber event may last several months. During the period from breach to action, the utility may be able to sever access by malicious actors, preventing damage.|
|Endure||Restoration can begin immediately on detection.|
|Restore||Methods for manual operation and restoration systems should be developed in advance. Fast reaction cyber teams should be on call.|
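The "Risk assessment" row above calls for cyber N-1 and N-2 analyses without defining them. One possible reading, by analogy with N-1 contingency analysis for power systems, is to check whether the loss of any one (or any two) cyber components leaves every substation reachable from the control center. The sketch below illustrates that reading only. The network, the component names, and the functions `reachable` and `contingencies` are hypothetical, and a real analysis would need to model far more than connectivity.

```python
from collections import deque
from itertools import combinations


def reachable(links, source, removed):
    """Breadth-first search over an undirected graph, skipping removed nodes."""
    if source in removed:
        return set()
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def contingencies(links, control_center, substations, components, k):
    """List every set of k lost components that isolates at least one substation."""
    failures = []
    for lost in combinations(sorted(components), k):
        seen = reachable(links, control_center, set(lost))
        cut_off = sorted(s for s in substations if s not in seen)
        if cut_off:
            failures.append((lost, cut_off))
    return failures


# Hypothetical communications network: one control center ("cc"),
# two routers ("r1", "r2"), and two substations ("s1", "s2").
example_links = [("cc", "r1"), ("cc", "r2"), ("r1", "s1"), ("r2", "s1"), ("r1", "s2")]
example_components = {"r1", "r2"}
n_minus_1 = contingencies(example_links, "cc", ["s1", "s2"], example_components, 1)
n_minus_2 = contingencies(example_links, "cc", ["s1", "s2"], example_components, 2)
print(n_minus_1)
print(n_minus_2)
```

In this toy network, substation `s2` depends on a single router, so it appears in the N-1 results, while `s1` is lost only when both routers fail.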