**Suggested Citation:** "5 Uncertainty Quantification and Validation." National Academies of Sciences, Engineering, and Medicine. 2015. *Mathematical Sciences Research Challenges for the Next-Generation Electric Grid: Summary of a Workshop*. Washington, DC: The National Academies Press. doi: 10.17226/21808.

Uncertainty Quantification and Validation

The fourth workshop session focused on uncertainty quantification and validation. The session was chaired by Juan C. Meza (University of California, Merced), with presentations by Miriam Goldberg (DNV GL) and Alexander Eydeland (Morgan Stanley).

HOW WELL CAN WE MEASURE WHAT DIDN’T HAPPEN AND PREDICT WHAT WON’T?

Miriam Goldberg, DNV GL

Miriam Goldberg discussed measurement and verification for demand response. Demand response (DR) is the process of balancing supply and demand by reducing demand to match supply. This is contrary to the conventional approach of assuming that demand is inelastic and supply has to meet demand. DR can be used to avoid high-priced electricity or to avoid going over the edge where there is no more supply at any price. She explained that a key point of DR is compensating consumers for what they did not do. Figure 5.1 shows where DR fits in with the other resources available to the electricity industry.

Goldberg showed an example of DR for an individual direct load program where the system operator had direct control over some of its customers’ air conditioners (Figure 5.2). In this example, she explained that the actual load of the day (the solid line) is compared with the estimated reference load baseline (the dashed line). As Figure 5.2 shows, the actual load increases over time until a curtailment order or curtailment signal is issued at 1:00 p.m., at which time the load drops. The load begins to climb again after this initial drop. Eventually, this order or signal is released (dashed vertical line at 7:00 p.m.), and the air conditioner operates normally. Goldberg noted that there is then a payback or rebound period where the actual load is higher than the reference load, because in this case the house has warmed up and the air conditioner needs to work more in order to bring the house back down to its preferred temperature. Over time, the actual load returns to what it would have been without the DR.

FIGURE 5.1 Demand response is shown in the available electricity resource set. SOURCE: Miriam Goldberg, DNV GL, presentation to the workshop; from CAISO (2012), copyright 2015 DNV GL.

FIGURE 5.2 Example of a demand response for an individual resource. SOURCE: Miriam Goldberg, DNV GL, presentation to the workshop; from PG&E (2009).

The fundamental measurement and verification challenge is determining a true reference load baseline, according to Goldberg. Understanding the uncertainty of this reference load is important because the system is built on compensating consumers for what was not used. She explained that the resource delivered is the difference between the load that would have been used, which can only be estimated and not metered, and the load that was used and was metered. The “capacity”^{1} is the reduction that could be provided from the load that would otherwise be used. Goldberg noted that methods for calculating the reduction for financial settlement are negotiated in each jurisdiction and are often contentious. Different methods may be appropriate for different purposes. Exact “true” capacity cannot be known.
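The settlement quantity described above can be sketched in a few lines. This is a minimal illustration, not a method presented at the workshop: the function name `delivered_reduction` and the hourly values are hypothetical, and the baseline is simply taken as given.

```python
from typing import List


def delivered_reduction(baseline_kw: List[float], metered_kw: List[float]) -> List[float]:
    """Per-interval reduction: estimated baseline minus metered load.

    Negative values mark intervals where the metered load exceeds the
    baseline, as in the rebound period after an event is released.
    """
    if len(baseline_kw) != len(metered_kw):
        raise ValueError("baseline and metered series must have equal length")
    return [b - m for b, m in zip(baseline_kw, metered_kw)]


# Hypothetical hourly values (kW) spanning an event and its rebound.
baseline = [3.0, 3.5, 4.0, 4.0, 3.5]
metered = [3.0, 2.0, 2.5, 4.5, 3.5]
reduction = delivered_reduction(baseline, metered)
```

Only `metered` is observable in practice; any error in `baseline` passes directly into `reduction`, which is why the uncertainty of the reference load matters for settlement.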

There are different kinds of measurement and verification uncertainties, according to Goldberg:

- *Estimation/forecasting errors* are standard problems with standard solutions. Examples include estimating the load that would have occurred without the DR event (statistical estimation of an unobservable parameter) and estimating the load that will occur with and without a future DR event (statistical forecasting).
- *Policy choices and conventions* determine what is useful and practical. For example, baseline methods for financial settlement need to be reasonable and transparent, not a “best” estimate. Accuracy is a consideration in those choices, but there are various ways to describe a method’s accuracy.
- *Extrapolation* considers what the response will be to programs and conditions outside of current experience.

As Goldberg described it, various data are needed for measurement and verification purposes throughout the process. Before a DR event occurs, consumers have to enroll in the program, and suppliers need to determine operations and dispatch^{2} rules. During enrollment, individual capacity needs to be estimated. This is often done by measuring peak load or capacity of the controlled units. However, Goldberg commented, it can be difficult to rate a DR resource with a time-varying load. During operations and dispatch, the available capacity describing what the combined assets can deliver is important. This is typically done by assessing enrollment capacity using audits and evaluating historical performance. It can be challenging to predict what could be delivered if called upon.

____________________

^{1} “Capacity” should be thought of as the capacity of the consumer to reduce consumption.

^{2} For this section, “dispatch” refers to the system operator imposing limits on the customer’s usage of power in response to some event.

Once a control event occurs, Goldberg explained, a financial settlement and an evaluation of how well the event worked to curtail the load are needed. This information informs additional planning of how the system can be modified or improved. Settlement requires determining individual interval reductions by comparing the observed load with the agreed baseline. However, defining a baseline that is simple, transparent, and meaningful can be difficult, according to Goldberg. During the evaluation stage, the combined reductions are examined to see what combined assets were delivered and what will be delivered in the future under potentially different conditions. This examination compares observed aggregate load versus the baseline and evaluates the modeled load with and without the event. Some key considerations include estimating the load that did not occur and assessing the measurement accuracy. In planning for future combined reductions, Goldberg said, these data help inform what the model load and enrollment may be if conditions or rules change. An important part of the planning phase is assessing the uncertainty within the system.

Baselines can be computed in a variety of ways, Goldberg explained, but typically the approach involves computing average hour-by-hour kilowatt-hour usage over a set of recent business days. This may be done in a number of ways, and Goldberg mentioned averaging the last 10 business days; dropping the highest and lowest values of those 10 days and computing the average of the remaining 8 days; and looking at the 4 highest kilowatt-hour days among the 5 most recent days. This baseline can then be adjusted up or down (known as *day-of-flow adjustments*) to match the observed load in the hours just prior to the start of the event.
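The averaging rules mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than any jurisdiction's settlement method: the function names are hypothetical, days are ranked by total daily kilowatt-hours (the summary does not say how "highest" and "lowest" are defined), and the adjustment shown is a simple additive shift computed over the pre-event hours.

```python
from typing import List, Sequence


def hourly_mean(days: Sequence[Sequence[float]]) -> List[float]:
    """Average hour by hour across days (each day is a list of hourly kWh)."""
    n_hours = len(days[0])
    return [sum(day[h] for day in days) / len(days) for h in range(n_hours)]


def baseline_last_n(days: Sequence[Sequence[float]], n: int = 10) -> List[float]:
    """Average of the n most recent business days."""
    return hourly_mean(days[-n:])


def baseline_drop_high_low(days: Sequence[Sequence[float]], n: int = 10) -> List[float]:
    """Drop the highest and lowest of the last n days, average the rest.

    Assumption: days are ranked by total daily kWh.
    """
    ranked = sorted(days[-n:], key=sum)
    return hourly_mean(ranked[1:-1])


def baseline_high_k_of_m(days: Sequence[Sequence[float]], k: int = 4, m: int = 5) -> List[float]:
    """Average the k highest-usage days among the m most recent days."""
    ranked = sorted(days[-m:], key=sum, reverse=True)
    return hourly_mean(ranked[:k])


def additive_adjustment(baseline: Sequence[float], event_day_load: Sequence[float],
                        pre_event_hours: Sequence[int]) -> List[float]:
    """Shift the baseline so it matches the observed load, on average,
    over the hours just before the event starts."""
    offset = sum(event_day_load[h] - baseline[h] for h in pre_event_hours) / len(pre_event_hours)
    return [b + offset for b in baseline]


# Hypothetical history: ten 3-hour "days" with flat loads of 1..10 kWh per hour.
history = [[float(d)] * 3 for d in range(1, 11)]
```

With this toy history, the 10-day average and the drop-high-low average coincide, while the high-4-of-5 rule gives a higher baseline, which illustrates how the choice of rule alone changes the calculated reduction.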

Goldberg identified other baseline computing approaches, including the following:

- *Moving average:* a weighted average is calculated based on 90 percent of a previous baseline and 10 percent of the usage from the most recent day.
- *Regression model:* data on the day type, weather, daylight, and lags are all used as inputs.
- *Match day:* a prior similar non-event day for the same account is used as a comparison.
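The moving-average approach in the list above amounts to an exponentially weighted update. A minimal sketch follows; the function name and the sample values are hypothetical, and the update is assumed to be applied hour by hour.

```python
from typing import List, Sequence


def update_moving_average_baseline(prev_baseline: Sequence[float],
                                   latest_day: Sequence[float],
                                   weight_prev: float = 0.9) -> List[float]:
    """Blend the previous baseline (90 percent by default) with the most
    recent day's usage (the remaining 10 percent), hour by hour."""
    if len(prev_baseline) != len(latest_day):
        raise ValueError("series must have equal length")
    return [weight_prev * p + (1.0 - weight_prev) * d
            for p, d in zip(prev_baseline, latest_day)]


# Hypothetical two-hour example (kWh).
updated = update_moving_average_baseline([10.0, 20.0], [20.0, 10.0])
```

Because each day contributes only 10 percent, such a baseline responds slowly to a sudden change in conditions, which is one reason an adjustment on the event day may still be applied.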

Goldberg noted that taking the average of the most recent days can be misleading because they are typically milder than the day when a control event occurred, as shown in Figure 5.3. She said that shifting the load (either by an additive or a multiplicative scalar adjustment) to match at a particular point can improve the estimate in some ways, as shown in Figure 5.4. The accuracy of these shifted estimates can be assessed by looking at the error, which is the difference between the estimated baseline and the actual load. The bias and variability of the error in the system need to be assessed. Goldberg explained that bias here is the systematic error over the hours and days, and the variability is the difference in the magnitude of the error over the hours and days.
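The scalar adjustment and the two error summaries can be sketched as follows. The names are hypothetical, and two choices here are assumptions not taken from the presentation: bias is measured as the mean error and variability as the population standard deviation of the error.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def scalar_adjustment(baseline: Sequence[float], observed: Sequence[float],
                      pre_event_hours: Sequence[int]) -> List[float]:
    """Scale the baseline by the ratio of observed to baseline load over
    the hours just before the event."""
    ratio = (sum(observed[h] for h in pre_event_hours)
             / sum(baseline[h] for h in pre_event_hours))
    return [b * ratio for b in baseline]


def bias_and_variability(baseline: Sequence[float],
                         actual: Sequence[float]) -> Tuple[float, float]:
    """Return (mean error, standard deviation of error), where the error
    is baseline minus actual load on a day with no event."""
    errors = [b - a for b, a in zip(baseline, actual)]
    return mean(errors), pstdev(errors)


# Hypothetical non-event day that runs hotter than the averaged baseline.
raw_baseline = [2.0, 2.0, 4.0, 4.0]
actual_load = [2.5, 2.5, 5.0, 5.0]
adjusted = scalar_adjustment(raw_baseline, actual_load, pre_event_hours=[0, 1])
raw_bias, raw_spread = bias_and_variability(raw_baseline, actual_load)
adj_bias, adj_spread = bias_and_variability(adjusted, actual_load)
```

In this contrived case the load scales uniformly, so the scalar adjustment removes the error entirely; real loads rarely behave this neatly, which is why both bias and variability are examined.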

Baseline accuracy can be assessed only by comparing actual load with baseline load on non-event days, or for accounts that are not dispatched, or by comparing simulated load reductions from actual load with the calculated reduction from the baseline, according to Goldberg. This assumes similar behavior for non-event days or non-dispatched accounts. She believes that assumption is reasonable if two conditions hold: (1) event days are similar to non-event days, and (2) the accuracy, calculated from enrolled accounts or non-dispatched accounts, uses a large random subset of homogeneous enrolled residential accounts. However, the behavior of participants may differ depending on their participation in the program or on their anticipation of an event day (for example, precooling and rescheduling working shifts). In addition, Goldberg noted that some loads are also highly variable and cannot be predicted well just from historical data.

FIGURE 5.3 Illustration of baselines calculated by averaging. SOURCE: Miriam Goldberg, DNV GL, presentation to the workshop; copyright 2013 DNV GL.

FIGURE 5.4 Additive and scalar adjustments to the 2 hours prior to curtailment. SOURCE: Miriam Goldberg, DNV GL, presentation to the workshop.

There are two key challenges in assessing baseline accuracy, according to Goldberg. First, one needs to assess the most useful way to measure accuracy across accounts and hours, which includes computing the baseline error for an account over a specified time interval. Then, the relative error (the average hourly error divided by the average hourly load) is computed, which is useful for representing accuracy across loads of varying sizes over a population of customers and various time intervals. This can be computed over each account interval separately, or computed over longer time intervals. The median relative error across accounts and time intervals also needs to be computed. Goldberg noted that it is most useful to look at error in calculated reductions, not just in load. This requires a known, assumed, or simulated reduction quantity. She noted that if load reduction is 10 percent of load, a 10 percent baseline error is a 100 percent error in the delivered reduction.
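The arithmetic behind the last point can be made explicit. The sketch below uses hypothetical function names and values; it computes the relative error as defined above and then shows how a baseline error that is small relative to load becomes large relative to the reduction.

```python
from statistics import mean, median
from typing import Sequence


def relative_error(baseline: Sequence[float], actual: Sequence[float]) -> float:
    """Average hourly error divided by average hourly load."""
    avg_error = mean([b - a for b, a in zip(baseline, actual)])
    return avg_error / mean(list(actual))


def reduction_error_fraction(avg_load_kw: float, baseline_error_frac: float,
                             reduction_frac: float) -> float:
    """Baseline error expressed as a fraction of the delivered reduction."""
    baseline_error_kw = baseline_error_frac * avg_load_kw
    true_reduction_kw = reduction_frac * avg_load_kw
    return baseline_error_kw / true_reduction_kw


# A 10 percent baseline error on a load whose reduction is 10 percent of load.
error_in_reduction = reduction_error_fraction(100.0, 0.10, 0.10)

# Hypothetical per-account relative errors, summarized by their median.
median_relative_error = median([0.02, -0.05, 0.10])
```

Here `error_in_reduction` comes out to 1.0, that is, a 100 percent error in the delivered reduction, matching the example Goldberg gave.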

Another consideration is how to rate a resource with time-varying load. This requires determining what is meant by DR capacity for a load with time- and weather-varying reductions. The enrolled capacity will tend to be based on peak load or reduction at peak load, which makes sense if events will mostly be called at peak conditions. For example, she mentioned that the NY ISO computes an individual asset’s top 20 out of the system’s top 40 hours (by season), an individual asset’s peak load (also by season), and an individual asset’s top X of Y hours (by month).

Goldberg explained that to predict what load capacity could be delivered if needed, a program dispatch operator needs to know the available reduction at each point in time and then track whether load reduction is happening by looking at an asset’s current load level relative to where it should end up. To do this, the dispatcher needs to know any two of the following three types of information:

- The load going forward if there was no dispatch (the *upper dispatch limit*),
- The reduction that will occur if called (*commitment*), and
- The load that will occur under full dispatch (the *lower dispatch limit*).
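Any two of these quantities determine the third, since the lower dispatch limit is the upper dispatch limit minus the commitment. A minimal sketch, with a hypothetical function name and illustrative kilowatt values:

```python
from typing import Optional, Tuple


def complete_dispatch_info(upper: Optional[float] = None,
                           commitment: Optional[float] = None,
                           lower: Optional[float] = None) -> Tuple[float, float, float]:
    """Fill in the missing quantity from the identity
    lower = upper - commitment. Returns (upper, commitment, lower)."""
    known = sum(v is not None for v in (upper, commitment, lower))
    if known < 2:
        raise ValueError("at least two of the three quantities are required")
    if upper is None:
        upper = lower + commitment
    elif commitment is None:
        commitment = upper - lower
    elif lower is None:
        lower = upper - commitment
    elif abs(upper - commitment - lower) > 1e-9:
        raise ValueError("the three quantities are inconsistent")
    return upper, commitment, lower


# Hypothetical asset: 500 kW expected load, 120 kW committed reduction.
limits = complete_dispatch_info(upper=500.0, commitment=120.0)
```

The difficulty in practice lies less in this identity than in estimating its inputs, because the upper dispatch limit is itself a forecast of load that will never be observed if the event is called.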

Under some market rules, Goldberg explained, the operator does not have to dispatch the full commitment and could instead ask for only part of it, ranging from the lower to the upper dispatch limit (known as the *dispatchable range*). Estimating what the load will be if no dispatch is called can require a continually adjusted baseline. At each interval, she said, the adjustment is updated to recent intervals if not dispatched. If multiple events can be dispatched in 1 day, the operator needs to determine the available capacity after release of the prior event. In principle, the same adjustment method can work that was used in the first event, Goldberg said, but this is under study for ISO New England. Specifically, she noted that the study is considering the following questions:

- How far before first dispatch should the adjustment window go?
- How long a span should the adjustment window include?
- How much downtime or recovery is needed after the first event?

Determining the real-time baseline answers the capacity question in principle because load reduction is committed, and the real-time baseline indicates whether the load is available to reduce that much over a potential event. However, Goldberg noted that actual performance may vary. Calculating available capacity based on the baseline and committed reduction assumes the same reduction will occur no matter when or in what condition the event is called. She said this is often not realistic, depending on what the load reduction action is (e.g., weather-sensitive responses will not always provide the same reduction). In an ideal program, the load reduces by the amount dispatched (and not more), just as a generator supplies what it is told and not more. This is harder for DR participants unless their response is shutting down a fixed load.

Goldberg stated that the fundamental challenge for DR planning and operations is forecasting response. These forecasts involve determining the load reduction possible from currently enrolled customers, specifically for each participating asset and for the aggregate of dispatched assets, at each point in time from notification through rebound as felt on the system. Equally important, she said, is estimating the number of non-dispatchable customers at a given time, taking into account dynamic rates and their price responsiveness. For future planning, forecasters must know how much responsive load there will be and how that load will respond (as a function of prices, weather, time, etc.).

Forecasting what will happen in response to a price, among other factors, Goldberg said, is commonly thought of as an elasticity problem. She noted that this is really a set of intersecting elasticity problems involving long-term investments, seasonal enrollment, monthly bid or obligation, day-ahead response, hour-ahead response, and minutes-ahead response. The choices and responses at each timescale affect options and responses at the next scale, and expectations for later time points affect decisions at earlier stages.

However, Goldberg said, demand tends to be inelastic and customers are non-responsive to short-term price fluctuation, mainly because of timing. Traders are buying and selling fixed blocks in long-term contracts, with small amounts of load-following supply, and most customers are not interested in being exposed to volatile prices. She said there are a few factors that can make customers more responsive to price, including moving to more volatile real-time prices, higher costs, automating DR, a high normal load, and discretionary/deferrable load urgency. Customer expectations of price volatility, costs, and load urgency affect prior decisions that in turn affect current price exposure and equipment capability.

Overall, Goldberg said, managing DR uncertainty involves improving measurement with better retrospective and forecasting models, accommodating response uncertainty in dispatch, and making participating loads more predictable.

She observed that the predictability of enrolled DR participants can be improved by shutting out the noise of customers with highly variable loads, by screening loads out of the program based on predictability criteria, and by requiring highly variable loads to give day-ahead predictions (with penalties set for over- or under-prediction). Adding more information can also help, Goldberg explained, by requiring highly variable loads to give day-ahead notice of major changes. It is also important to limit the potential for gaming the baseline, usually by limiting participants’ ability to control or predict when they will be dispatched and by investigating load and bidding patterns that seem perverse. Operators can also help participants become more predictable by facilitating technologies that automate DR and by offering retrocommissioning^{3} and (re)training.

____________________

^{3} “Retrocommissioning” is a systematic process for identifying and implementing operational and maintenance improvements.

Goldberg said this is also an issue of improving predictability for all loads, often by incorporating supplemental customer information from other sources (such as clustering customers into industry types) and using pattern recognition to identify operating modes.

Goldberg concluded by summarizing the outstanding problems related to DR uncertainty and uncertainty reduction:

- Estimating elasticity (as interrelated response curves), including enrollment in variously configured program/product offers and determining response to prices/event dispatch when enrolled as functions of customer characteristics, calendar, hour, weather, and prior responses;
- Calculating capacity dynamically for time/weather-varying loads;
- Using pattern recognition to improve forecasts and back-casts;
- Projecting response trajectories through the duration of an event and after release with error bands;
- Relating true aggregate system reduction to the nominal reductions calculated for financial settlement, on a dynamic basis, with error bands; and
- Establishing baseline bias and variance as functions of customer characteristics, event day type, and event duration.

Overall, methods and results for DR measurement and verification affect and are affected by many aspects of program planning, design, and operations, and Goldberg emphasized that it is important that uncertainty be well understood.

MATHEMATICAL MODELS IN POWER MARKETS

Alexander Eydeland, Morgan Stanley

Alexander Eydeland discussed some mathematical models of power markets. In his presentation, two underlying assumptions were that a relatively liquid market is needed and that uncertainty is due exclusively to randomness of market prices. He explained that the objective is to acquire a commodity asset (such as a power plant, freight, oil/gas storage facilities) through a competitive auction. This requires both determining the appropriate asset price and developing a strategy to extract value from the asset. Commodity derivatives are investment tools for assessing a portfolio of financial options on assets with an associated strategy of how to extract value from the assets. This strategy is based on Black-Scholes theory, which Eydeland characterized as taking an investor’s assumed distribution of underlying prices and desired option payout and computing a formula and hedging strategy to allow the investor to lock in a value.

Eydeland gave the example of a merchant power plant that has the option of running if the market price of power is higher than the cost of fuel plus variable operating costs. Net profit from this operating strategy is given by

$$\max\left(\text{price of power} - \text{cost of fuel} - \text{variable operating costs},\ 0\right).$$

Eydeland stated that operating this merchant power plant is financially equivalent to owning a portfolio of daily options on spreads between electricity and fuel (known as *spark spread options*).

Modeling the power plant commodity asset consists of three basic steps, according to Eydeland: (1) finding an appropriate process for price evolution, (2) defining the payout function, and (3) finding expected value (using methods such as Monte Carlo simulation, partial differential equations, and fast Fourier transforms). He noted that computing the payout function is often complicated. In the example of the power plant, technical considerations need to be included, such as the price of the emissions, ramp-up rates, and number of start-ups allowed per year for a given turbine. Eydeland said defining the appropriate stochastic process for price evolution is usually even more complicated.
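The three steps can be sketched for a single day of plant operation. This is an illustration only and not Eydeland's model: it assumes driftless, correlated lognormal prices for power and fuel, omits discounting and all technical constraints, and introduces a hypothetical `heat_rate` parameter to convert the fuel price into a fuel cost per unit of power. All function names and parameter values are hypothetical.

```python
import math
import random


def spark_spread_payoff(power_price: float, fuel_price: float,
                        heat_rate: float, variable_cost: float) -> float:
    """Step 2, the payout: run only when power is worth more than
    fuel cost plus variable operating cost."""
    return max(power_price - heat_rate * fuel_price - variable_cost, 0.0)


def expected_daily_payoff(power0: float, fuel0: float, vol_power: float,
                          vol_fuel: float, rho: float, years: float,
                          heat_rate: float, variable_cost: float,
                          n_paths: int = 20000, seed: int = 0) -> float:
    """Steps 1 and 3: draw correlated lognormal prices at the delivery
    date and average the payout over the simulated draws."""
    rng = random.Random(seed)
    sqrt_t = math.sqrt(years)
    total = 0.0
    for _ in range(n_paths):
        z_power = rng.gauss(0.0, 1.0)
        # Correlate the fuel shock with the power shock.
        z_fuel = rho * z_power + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        power = power0 * math.exp(-0.5 * vol_power ** 2 * years + vol_power * sqrt_t * z_power)
        fuel = fuel0 * math.exp(-0.5 * vol_fuel ** 2 * years + vol_fuel * sqrt_t * z_fuel)
        total += spark_spread_payoff(power, fuel, heat_rate, variable_cost)
    return total / n_paths


value = expected_daily_payoff(power0=40.0, fuel0=4.0, vol_power=0.5, vol_fuel=0.4,
                              rho=0.6, years=0.5, heat_rate=7.5, variable_cost=5.0,
                              n_paths=2000)
```

The payout here is deliberately simple; adding emissions prices, ramp-up rates, or start-up limits would make the payout path dependent, which is part of why Eydeland described this step as often complicated.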

For power prices, he said, a few key attributes can be modeled, such as mean reversion, spikes, high kurtosis, regime switching, and non-stationarity. The correlation between power prices and natural gas prices also has a unique structure that can be modeled as a joint distribution. (If the model does not capture this structure, it may misprice spread options.) Eydeland explained that this correlation can be modeled through a variety of approaches based on geometric Brownian motion. However, this alone does not capture the key behavior of power prices well. Models can then incorporate mean reversion and jumps (discontinuous behavior) to attempt to capture the necessary behavior, according to Eydeland. However, these additions can increase the number of variables beyond what can be reasonably solved and managed.
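Mean reversion and jumps can be illustrated with a simple discretized process for the log price. The sketch below is one common textbook form, assumed here for illustration and not taken from the presentation: the log price is pulled toward a long-run level at rate `kappa`, perturbed by Gaussian noise, and occasionally hit by a normally distributed jump that subsequently decays through the mean reversion. All names and parameters are hypothetical.

```python
import math
import random
from typing import List


def simulate_price_path(start_price: float, long_run_price: float, kappa: float,
                        sigma: float, jump_intensity: float, jump_mean: float,
                        jump_std: float, n_steps: int, dt: float,
                        seed: int = 0) -> List[float]:
    """Euler discretization of a mean-reverting log price with jumps.

    A jump occurs in a step with probability jump_intensity * dt,
    which approximates a Poisson arrival for small dt.
    """
    rng = random.Random(seed)
    x = math.log(start_price)
    mu = math.log(long_run_price)
    path = [start_price]
    for _ in range(n_steps):
        x += kappa * (mu - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if rng.random() < jump_intensity * dt:
            x += rng.gauss(jump_mean, jump_std)
        path.append(math.exp(x))
    return path


# One year of daily steps with occasional upward spikes.
spiky = simulate_price_path(40.0, 40.0, kappa=50.0, sigma=1.0, jump_intensity=12.0,
                            jump_mean=0.8, jump_std=0.3, n_steps=365, dt=1.0 / 365)
```

Even this small model has six parameters for one commodity; extending it to jointly model fuel prices, regimes, and stochastic volatility is where the number of variables grows beyond what can be reasonably managed.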

Eydeland explained that the dilemma is that a highly complicated model (including models with stochastic convenience yield, stochastic volatility, regime switching, multiple jump processes, and various term structures) is needed to capture this complex behavior, but such a model becomes unmanageable and useless. He described a hybrid model, called “bid stack,” that combines the stochastic and fundamental modeling of price formation. Prices are formed by a generator supplying bids to the auctioneer (the ISO), and the ISO puts the bids together to find the optimal way to dispatch the power generation in the region. The final price is the lowest price needed to meet the day’s demand, Eydeland noted. The price versus the demand follows a particular bid stack function that, if estimated, allows the distribution of the power prices to be constructed.
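The price formation step can be sketched as a merit-order lookup. This is a simplified illustration with hypothetical names and numbers: bids are (capacity, offer price) pairs, transmission constraints are ignored, and the clearing price is taken to be the offer of the last block needed to cover demand.

```python
from typing import List, Tuple

Bid = Tuple[float, float]  # (capacity in MW, offer price per MWh)


def clearing_price(bids: List[Bid], demand_mw: float) -> float:
    """Stack bids from cheapest to most expensive and return the offer
    price of the block that first covers demand."""
    cumulative = 0.0
    for capacity, price in sorted(bids, key=lambda bid: bid[1]):
        cumulative += capacity
        if cumulative >= demand_mw:
            return price
    raise ValueError("demand exceeds total offered capacity")


# Hypothetical three-block stack.
stack = [(500.0, 35.0), (1000.0, 20.0), (300.0, 80.0)]
prices = [clearing_price(stack, d) for d in (900.0, 1200.0, 1700.0)]
```

Because the stack is steep at the top, a modest increase in demand near full capacity produces a large price jump, which is how a bid stack function can reproduce spikes without a separate jump process.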

Eydeland explained that the generation stack needs first to model the market fuel prices of key power sources and generation outages (usually done using government data with a standard Poisson process). Then the bid stack function is estimated by scaling the generation stack in the particular market to match the market data while preserving higher moments of price distribution (specifically, skewness and kurtosis). He said the demand can then be modeled as a function of temperature, including the evolution of the principal modes and of the daily perturbations, and used as an input for the generation stack function. The resulting output is a detailed estimate of power prices accounting for the complexities inherent in the system.

Eydeland concluded by noting that new challenges are multistack models and renewables. He also provided two references for additional information: Eydeland and Wolyniec (2003) and Eydeland and Geman (1999).

In a later breakout subgroup, a participant wondered how Black-Scholes models are applicable to electricity markets, particularly with respect to incorporating details of optimal power flow in determining the locational marginal price formation. Another participant asked what impact the increased penetration of renewables might have on the price formation process. A simple approach that was suggested would entail shifting the bid stack to account for zero-marginal-cost renewables. A participant suggested that the ability to forecast price would then depend heavily on one’s ability to forecast renewable power output, which is a challenging problem.
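The shift suggested in the breakout discussion can be illustrated by placing a zero-priced block at the bottom of a merit-order stack. The sketch is hypothetical in its names and numbers and ignores transmission constraints and any curtailment of renewables.

```python
from typing import List, Tuple

Bid = Tuple[float, float]  # (capacity in MW, offer price per MWh)


def clearing_price(bids: List[Bid], demand_mw: float) -> float:
    """Offer price of the cheapest-first block that first covers demand."""
    cumulative = 0.0
    for capacity, price in sorted(bids, key=lambda bid: bid[1]):
        cumulative += capacity
        if cumulative >= demand_mw:
            return price
    raise ValueError("demand exceeds total offered capacity")


def with_renewables(bids: List[Bid], renewable_mw: float) -> List[Bid]:
    """Shift the stack by adding renewable output as a zero-price block."""
    return [(renewable_mw, 0.0)] + list(bids)


thermal = [(1000.0, 20.0), (500.0, 35.0), (300.0, 80.0)]
price_without = clearing_price(thermal, 1200.0)
price_with = clearing_price(with_renewables(thermal, 400.0), 1200.0)
```

In this example 400 MW of renewable output moves the clearing price from the second block to the first, so an error in the renewable forecast of that size translates directly into a price forecast error.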
