Assessment of Intraseasonal to Interannual Climate Prediction and Predictability
2 Climate Prediction
This part of the report begins by reviewing the concept of predictability, starting with a summary of the historical background for climate prediction. Lorenz’s work on weather prediction in the 1960s and 1970s is a foundation for present efforts. Progress in the 1980s extended prediction timescales, exploiting improved observational awareness of ENSO variability in the tropical Pacific and its associated teleconnections. Future improvements in prediction quality depend upon the ability to identify and understand patterns of variability and specific processes that operate on intraseasonal to interannual (ISI) timescales. Various processes in the atmosphere, ocean, and land offer sources of predictability; several are introduced in the following sections. Gaps in our present understanding of predictability are summarized to lay the foundation for discussion later in the report on how future improvements are likely to be realized. Going forward, it will be necessary to assess the incremental skill gained from new sources of predictability. The methodologies used to quantitatively estimate prediction skill, validate models, and verify forecasts are discussed.
THE CONCEPT OF PREDICTABILITY
Lorenz in 1969 defined predictability as “a limit to the accuracy with which forecasting is possible” (Lorenz, 1969a). He later refined his view, providing two definitions of predictability (Lorenz, 2006): intrinsic predictability, “the extent to which the prediction is possible if an optimum procedure is used,” and practical predictability, “the extent to which we ourselves are able to predict by the best-known procedures, either currently or in the foreseeable future.” The forecasting that interested Lorenz and others during the 1960s and 1970s focused on weather and the state of the mid-latitude troposphere. It provided much of the framework for forecasting and predictability that remains applicable to longer-range forecasts of the climate system, and that framework is reviewed here.
Atmospheric Predictability
Lorenz noted that practical predictability was a function of (1) the physical system under investigation, (2) the available observations, and (3) the dynamical prediction models used to simulate the system. He noted in 2006 that the ability to predict could be limited by the lack of observations of the system and by the dynamical models’ shortcomings in their forward extrapolations. While estimates of the predictability of day-to-day weather have been made by investigating the physical system, analyzing observations, and experimenting with models (Table 2.1), no single approach provides a definitive and quantitative estimate of predictability.
TABLE 2.1 Historical methods for evaluating predictability and their advantages and disadvantages.
Method and References | Description | Analysis
Physical System: Analytic closure (Leith, 1971) | Assuming that the atmosphere is governed by the laws of two-dimensional turbulence, a predictability limit can be estimated from the rate of error growth implied by the energy spectrum. | Estimates are rough due to numerous assumptions. Assumptions are stringent (e.g., atmospheric flow is non-divergent and moist processes are not important for error). Difficult to extend to other aspects of real atmosphere.
Model: (Lorenz, 1965; Tribbia and Baumhefner, 2004; Buizza, 1997; Kalnay, 2003) | Using a dynamical model, experiments are designed to answer: How long is it expected to take for two random draws from the analysis distribution for this model and observing system to become practically indistinguishable from two random draws from the model’s climatological distribution? | Predictability results are highly dependent on the quality of the model being used. Predictability is a function of the uncertainty in analyses used as model initial conditions.
Observations: Observed Analogs (Lorenz, 1969a; Van den Dool, 1994; Van den Dool et al., 2003) | The observed divergence in time of analogs (i.e., similar observed atmospheric states) provides an estimate of forecast divergence. | Difficult to identify analogs and extrapolate the results to real atmosphere. Close analogs are not expected without a much longer observational record.
The studies listed in Table 2.1 demonstrate that for practical purposes (i.e., using available atmospheric observations and dynamical models), the limit for making skillful forecasts of mid-latitude weather systems is estimated to be approximately two weeks [5], largely due to the sensitivity of forecasts to the atmospheric initial conditions (see Box 2.1) [6].
However, their focus on weather and the state of the atmosphere excludes processes that are valuable for climate prediction. For instance, many factors external to the atmosphere were ignored, such as incoming solar radiation and the state of the ocean, land, and cryosphere. Single events, such as a volcanic eruption, that might influence predictability were not considered; nor were long-term trends in the climate system, such as global warming. In addition, the models were unable to replicate many features internal to the atmosphere, including tropical cyclones, the Quasi-Biennial Oscillation (QBO), the Madden-Julian Oscillation (MJO), atmospheric tides, and low-frequency atmospheric patterns of variability like the Arctic and Antarctic Oscillations. These additional features are important for the impacts that they may have on the estimates of weather predictability, as well as for their influence on predictability on longer climate timescales.
[5] The limit also depends on the quantitative skill metric being used.
[6] Model error also contributes to errors in weather prediction (e.g., Orrell et al., 2001).
BOX 2.1 WEATHER AND CLIMATE FORECASTS AND THE IMPORTANCE OF INITIAL CONDITIONS
Forecasts are computed as “initial value” problems: they require realistic models and accurate initial conditions of the system being simulated in order to generate accurate forecasts. Lorenz (1965) showed that even with a perfect model and essentially perfect initial conditions, the fact that the atmosphere is chaotic [7] causes forecasts to lose all predictive information after a finite time. He estimated the “limit of predictability” for weather as about two weeks, an estimate that still stands: it is generally considered not possible to make detailed weather predictions beyond two weeks based on atmospheric initialization alone. Lorenz’s discovery was initially only of academic interest since, at that time, operational forecasts had little skill beyond two days. In recent decades forecast quality has improved, especially since the introduction of ensemble forecasting. Useful forecasts now extend to the range of 5 to 10 days (see Figure 2.1).
FIGURE 2.1 Evolution of ECMWF forecast skill for varying lead times (3 days in blue; 5 days in red; 7 days in green; 10 days in yellow) as measured by 500-hPa height anomaly correlation. Top line corresponds to the Northern Hemisphere; bottom line corresponds to the Southern Hemisphere. Large improvements have been made, including a reduction in the gap in accuracy between the hemispheres. SOURCE: courtesy of ECMWF, adapted from Simmons and Hollingsworth (2002).
[7] Here, “chaotic” refers to a system that contains instabilities that grow with time.
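The sensitivity to initial conditions described in Box 2.1 can be illustrated with a toy "identical twin" experiment. The sketch below is only an illustration and is not the experiment of Lorenz (1965): it assumes the three-variable Lorenz (1963) system with its standard parameters as a stand-in for the atmosphere, a fourth-order Runge-Kutta integrator, and a hypothetical initial error of 1e-8. All function names are invented for this example.

```python
import math

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Tendencies of the Lorenz (1963) system, used here as an assumed toy model."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2))
    k3 = lorenz63(shifted(state, k2, dt / 2))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def twin_experiment(n_steps=4000, dt=0.01, perturbation=1e-8):
    """Distance between a control run and a slightly perturbed twin at each step."""
    control = (1.0, 1.0, 1.0)
    twin = (1.0 + perturbation, 1.0, 1.0)  # hypothetical tiny analysis error
    errors = []
    for _ in range(n_steps + 1):
        errors.append(math.dist(control, twin))
        control = rk4_step(control, dt)
        twin = rk4_step(twin, dt)
    return errors

errors = twin_experiment()
# Print the error every 500 steps to show growth followed by saturation.
for step in range(0, len(errors), 500):
    print(f"t = {step * 0.01:5.1f}  error = {errors[step]:.3e}")
```

In this toy setting the two runs stay close at first and then diverge until the error is comparable to the size of the attractor. Shrinking the initial error delays that divergence but does not prevent it, which is the sense in which a perfect model with nearly perfect initial conditions still has a finite predictability limit.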
The initial conditions for atmospheric forecasts are obtained through data assimilation, a way of combining short-range forecasts with observations to obtain an optimal estimate of the state of the atmosphere. Figure 2.2 shows the three factors on which the quality of the initial conditions depends: (1) good observations with good coverage, (2) a good model able to accurately reproduce the evolution of the atmosphere, and (3) an analysis scheme able to optimally combine the observations and the forecasts. The impressive improvement in 500-hPa geopotential height anomaly correlation in recent decades (Figure 2.1) has been due to improvements made in each of these three components. Since atmospheric predictability is highly dependent on the stability of the evolving atmosphere, ensemble forecasts made from slightly perturbed initial conditions have given forecasters an additional tool to estimate the reliability of the forecast. In other words, a minor error in an observation or in the model can lead to an abrupt loss of forecast quality if the atmospheric conditions are unstable. For climate prediction on ISI timescales, the initial conditions involve phenomena with much longer timescales than the dominant atmospheric instabilities. For example, the SST anomalies associated with an El Niño event need to be known when establishing the initial conditions. Essentially, the initial conditions extend beyond the atmosphere to include details on the states of the ocean and land surface. From these long-lived phenomena, predictability of atmospheric anomalies can theoretically be extended beyond approximately two weeks to at least a few seasons.
FIGURE 2.2 Schematic for data assimilation for an analysis cycle. The diagram shows the three factors that affect the initial conditions: observations, a model, and an analysis scheme.
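The idea of optimally combining a short-range forecast with an observation can be shown in its simplest scalar form. The sketch below is a textbook minimum-variance update for one variable and one observation; it assumes unbiased, uncorrelated Gaussian errors and is not the analysis scheme of any operational center. The function name and the numbers are hypothetical.

```python
def analysis_update(background, var_background, observation, var_observation):
    """Combine a background (short-range forecast) value with one observation.

    Returns the analysis value and its error variance, assuming unbiased,
    uncorrelated errors with known variances (an assumption of this sketch).
    """
    # Weight given to the observation: larger when the background is less certain.
    gain = var_background / (var_background + var_observation)
    analysis = background + gain * (observation - background)
    var_analysis = (1.0 - gain) * var_background
    return analysis, var_analysis

# Hypothetical numbers: a 500-hPa height forecast of 5600 m with error variance
# 100 m^2, and an observation of 5620 m with error variance 25 m^2.
analysis, var_analysis = analysis_update(5600.0, 100.0, 5620.0, 25.0)
print(f"analysis = {analysis:.1f} m, error variance = {var_analysis:.1f} m^2")
```

The analysis lies between the forecast and the observation, closer to whichever has the smaller error variance, and its error variance is smaller than either input. This is why better observations, a better model, and a better analysis scheme each improve the initial conditions.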
Predictability of the Ocean and Atmosphere
As better observations led to an improved understanding of the climate system in the 1970s and 1980s, predictions of the atmosphere beyond the limits of “classical” predictability proliferated. Statistical forecast systems had already demonstrated that predictions for time averages of some mid-latitude atmospheric quantities could be made at lead times well past two weeks (Charney and Shukla, 1981). Observations of ENSO made it clear that some aspects of the
tropical atmosphere could be predicted at longer lead-times as well. Observational, theoretical, and modeling studies (Horel and Wallace, 1981; Sarachik and Cane, 2010) demonstrated that there were relationships between variability observed in the tropical oceans and the variability of the extratropical atmosphere. It became clear that longer-range forecasts of atmospheric quantities could be made using predictions of the coupled ocean-atmosphere system. Although operational extended forecasts continued to focus on surface temperature and precipitation over continents, the atmospheric initial conditions were no longer considered important for making these forecasts; atmospheric ISI prediction was now considered a boundary value problem (Lorenz, 1975; Chen and Van den Dool, 1997; Shukla, 1998; Chu, 1999). Boundary forcing, initially from the ocean but later from the land and cryosphere (Brankovic et al., 1994), was used as the source of predictive information. This was appropriate because coupled models of the atmosphere, ocean, and land surface were still in their infancy and were not competitive with statistical prediction models (Anderson et al., 1999). Given this context, researchers asked: if there exists a perfect prediction of ocean or land conditions, how well could the state of the mid-latitude atmosphere be predicted (Yang et al., 2004)? This question has been addressed observationally by estimating the signal-to-noise ratio. In this case the portion of the climate variance related to the lower boundary forcing is the signal, the portion of the climate variance related to atmospheric internal dynamics is the noise, and the ratio of the two represents one possible measure of predictability (e.g., Kang and Shukla, 2005). Such studies can lead to overly optimistic estimates of predictability because they assume that the boundary conditions are predicted perfectly.
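The signal-to-noise ratio described above can be sketched as a variance decomposition. The example below is a minimal illustration under stated assumptions, not the procedure of the cited studies: it assumes an ensemble of seasonal means in which all members for a given year share the same boundary forcing, takes the variance of the ensemble mean as the signal and the average spread about the ensemble mean as the noise, and uses synthetic data. All names and numbers are hypothetical.

```python
import random
import statistics

def signal_to_noise(ensemble):
    """ensemble[y][m] is the seasonal-mean value for year y and member m."""
    ensemble_means = [statistics.fmean(members) for members in ensemble]
    # Signal: year-to-year variance of the ensemble mean (boundary-forced part).
    # With few members this also contains a small residual of the noise.
    signal_var = statistics.variance(ensemble_means)
    # Noise: average variance of members about their own ensemble mean.
    noise_var = statistics.fmean([statistics.variance(m) for m in ensemble])
    return signal_var / noise_var

def synthetic_ensemble(n_years=200, n_members=20, signal_sd=1.0, noise_sd=2.0, seed=0):
    """Synthetic data: one forced value per year plus independent member noise."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_years):
        forced = rng.gauss(0.0, signal_sd)  # shared by all members in that year
        ensemble.append([forced + rng.gauss(0.0, noise_sd) for _ in range(n_members)])
    return ensemble

ratio = signal_to_noise(synthetic_ensemble())
print(f"signal-to-noise variance ratio: {ratio:.2f}")
```

Because every member in this sketch is given the same forcing, the ratio measures potential predictability under a perfectly known boundary condition, which is the source of the optimism noted in the text.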
There is an additional problem with this boundary-forced approach. These estimates assume that feedbacks between the atmosphere and the ocean do not contribute to the predictability. However, coupling between the atmosphere and the ocean can also be important in the evolution of SST anomalies (Wang et al., 2004; Zheng et al., 2004; Wu and Kirtman, 2005; Wang et al., 2005; Kumar et al., 2005; Fu et al., 2003, 2006; Woolnough et al., 2007). Because the boundary-forced approach ignores this atmosphere-ocean co-variability (or any other climate system component couplings), these boundary-forced predictability estimates are of limited use.
Climate System Predictability
The techniques for estimating predictability shown in Table 2.1 can be applied to the coupled prediction problem (e.g., Goswami and Shukla, 1989; Kirtman and Schopf, 1998). However, each method is still subject to limitations similar to those mentioned in Table 2.1. Given the complexity of the climate system, estimates based on analytical closure are somewhat intractable (i.e., how can error growth rates from a simple system of equations relate to the real climate system?); approaches based on observations are limited by the relatively short length of the observational record, combined with the difficulty in identifying controlled analogs for a particular state of the climate. Non-stationarity in the climate system further reduces the chance that observed analogs would become useful in the foreseeable future, if ever. Model-based estimates are thus the most practical, but are still limited by the ability to measure the initial conditions for the climate and by the mathematical representation of the physical processes. As discussed in Chapter 1, most efforts to estimate prediction quality (or hindcast quality) are relatively recent, and involve analysis of numerous model-generated predictions for a similar
time period (Waliser, 2005; Waliser, 2006; Woolnough et al., 2007; Pegion and Kirtman, 2008; Kirtman and Pirani, 2008; Gottschalck et al., 2010). For example, Kirtman and Pirani (2008) reported on the WCRP Seasonal Prediction Workshop in Barcelona where the participants discussed validating and assessing the quality of seasonal predictions based on a number of international research projects on dynamical seasonal prediction (e.g., SMIP2/HFP, DEMETER, ENSEMBLES, APCC). This collection of international projects includes a variety of different experimental designs (i.e., coupled vs. uncoupled), different forecast periods, initial condition start dates, and levels of data availability. Despite these differences, there was an attempt to arrive at consensus regarding the current status of prediction quality. Several different deterministic and probabilistic skill metrics were proposed, and it was noted that no single metric is sufficiently comprehensive. This is particularly true in cases where forecasts are used for decision support. Nevertheless, the workshop report includes an evaluation of multi-model prediction for Nino3.4 SSTA, 2m-temperature and precipitation in 21 standard land regions (Giorgi and Francisco, 2000). While it was recognized that the various skill metrics used were incomplete [8] and that there were difficulties related to the different experimental designs and protocols, the consensus was clear that multi-model skill scores were on average superior to any individual model (Kirtman and Pirani, 2008). Systematic efforts along the above lines for the intraseasonal time scale have only recently begun with the development of an MJO forecast metric and a common approach to its application amongst a number of international forecast centers (Gottschalck et al., 2010) as well as the establishment of a multi-model MJO hindcast experiment (see www.ucar.edu/yotc/iso.html).
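Two of the skill metrics named in this discussion, the Brier Skill Score and the Mean Square Skill Score, can be written compactly. The sketch below uses their common textbook definitions with climatology as the reference forecast; it assumes the climatology is computed from the same sample being verified, which is a simplification and not necessarily the protocol used in the workshop report. The function names and sample values are hypothetical.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)

def brier_skill_score(probabilities, outcomes):
    """1 is perfect; 0 means no better than always forecasting the base rate."""
    base_rate = sum(outcomes) / len(outcomes)  # in-sample climatology (assumed)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probabilities, outcomes) / reference

def mean_square_skill_score(forecasts, observations):
    """1 is perfect; 0 means no better than forecasting the observed mean."""
    n = len(observations)
    mean_obs = sum(observations) / n  # in-sample climatology (assumed)
    mse = sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n
    mse_clim = sum((mean_obs - o) ** 2 for o in observations) / n
    return 1.0 - mse / mse_clim

# Hypothetical probability forecasts of an above-normal season and what occurred.
bss = brier_skill_score([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0])
# Hypothetical SST anomaly forecasts and verifying observations.
msss = mean_square_skill_score([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 2.5, 3.5])
print(f"BSS = {bss:.2f}, MSSS = {msss:.2f}")
```

Each score summarizes a different property of the forecasts (probabilistic accuracy in one case, squared error of a deterministic value in the other), which is one reason no single metric is sufficiently comprehensive.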
[8] The particular metrics used to evaluate prediction quality were the multi-model Brier Skill Score for 2m-temperature and rainfall and the Mean Square Skill Score for the Nino3.4 SSTA.
SOURCES OF PREDICTABILITY
Overview of Physical Foundations
Climate reflects a complex combination of behaviors of many interconnected physical and (often chaotic) dynamical processes operating at a variety of time scales in the atmosphere, ocean, and land. Its complexity is manifested in the varied forms of weather and climate variability and phenomena, and in turn, in their fundamental (if unmeasurable) limits of predictability, as defined above. Yet, embedded in the climate system are sources of predictability that can be utilized. Three categories can be used to characterize these sources of weather and climate predictability: inertia, patterns of variability, and external forcing. The actual predictability associated with an individual phenomenon typically involves interaction among these categories.
The first category is the “inertia” or “memory” of a climate variable when it is considered as a quantity stored in some reservoir of nonzero capacity, with fluxes (physical climate processes) that increase or decrease the amount of the variable within the reservoir over time, e.g., soil moisture near the land-atmosphere interface. Taking the top meter of soil as a control volume and the moisture within that volume as the climate variable of interest, the soil moisture increases with water infiltrated from the surface (rainfall or snowmelt), decreases with evaporation or transpiration, and changes further via within-soil fluxes of moisture through the sides and bottom of the volume. For a given soil moisture anomaly, the lifetime of the anomaly (and thus our ability to predict soil moisture with time) will depend on these fluxes relative to the size of the control volume. Soil moisture anomalies at meter depth have inherent time scales of weeks to months. As panel (a) of Figure 2.3 shows, soil moisture anomalies exist considerably longer than the precipitation events that cause them. Arguably, many variables related to the thermodynamic state of the climate system have some inertial memory that can be a source of predictability. Surface air temperature in a small regional control volume, for example, has a memory that is very short given the efficiency of the processes (winds, radiation, surface turbulent fluxes, etc.) that affect it. If the air temperature at a given location is known at noon, its value at 12:05 PM that day can be predicted with a very high degree of certainty, whereas its predicted value days later is much more uncertain. In stark contrast, the inertial memory of ocean heat content can extend out to seasons and even years, depending on averaging depth. Examples of other variables with long memories include snowpack and trace gases (e.g., methane) stored in the soil or the ocean.
The second category involves patterns of variability: not variables describing the state of the climate and their underlying inertia, but rather interactions (e.g., feedbacks) between variables in coupled systems. These modes of variability are typically composed of amplification and decay mechanisms that result in dynamically growing and receding (and in some cases oscillating) patterns with definable and predictable characteristics and lifetimes.
With modes of variability, predictability does not result from the decay of an initial anomaly associated with fluxes into and out of a reservoir, as in the first category, but rather with the prediction of the next stage(s) in the life cycle of the dynamic mode based on its current state and the equations or empirical relationships that determine its subsequent evolution. In many examples related to inertia or memory within the climate system, the atmosphere plays a “passive” and dissipative role in the evolution of the underlying anomaly. On the other hand, for the patterns of variability or feedbacks discussed here, the atmosphere plays a more active role in amplifying or maintaining an anomaly associated with processes occurring in the ocean or on land. “Teleconnections” is a term used to describe certain patterns of variability, especially when they act over relatively large geographic distances. Teleconnections illustrate how interaction among the atmosphere, ocean, and land surface can “transmit” predictability in one region to another remote region. For example, during ENSO events, features of the planetary scale circulation (e.g., the strength and location of the mid-latitude jet stream) interact with anomalous convection in the tropical Pacific. These interactions can lead to anomalous temperature and precipitation patterns across the globe (panel b of figure 2.3). Thus, predictions of tropical Pacific sea surface temperature due to ENSO can be exploited to predict air temperature anomalies in some continental regions on the time scales of months to seasons. For air temperature, this teleconnection pattern offers enhanced predictability compared to memory alone, which would only be useful for minutes to hours. 
It should be noted that the predictability of teleconnection responses (in the above example, air temperature in a location outside of the tropical Pacific) will be lower than that of the source (in the above example, tropical Pacific SST) because of dynamical chaos that limits the transmission of predictability. The third category involves the response of climatic variables to external forcing, and it includes some obvious examples. Naturally, many Earth system variables respond in very predictable ways to diurnal and annual cycles of solar forcing and even to the much longer cycles associated with orbital variations. Other examples of external forcing variations that can provide
predictability include human impacts: long-term changes in atmospheric aerosols, greenhouse gas concentrations, and land use change.
FIGURE 2.3 (a) Example of inertial memory. A positive soil moisture anomaly at the Atmospheric Radiation Measurement/Cloud and Radiation Testbed (ARM/CART) site in Oklahoma decreases with a time scale much longer than the atmospheric events that caused it. SOURCE: Greg Walker, personal communication. Soil moisture time scales measured at other sites are even longer than this (Vinnikov and Yeserkepova, 1991). (b) Example of teleconnections. Map of El Niño impacts on global climate, for December–February. SOURCE: Adapted from CPC/NCEP/NOAA. (c) Example of external forcing. Global mean temperature anomaly prior to (negative x-axis values) and following (positive x-axis values) volcanic eruptions, averaged for 6 events. Substantial cooling is observed for nearly 2 years following the date of eruption. The dark line has the ENSO events removed; the light line does not. SOURCE: Robock and Mao (1995).
Examples of Predictability Sources
Figure 2.4 provides a quick glimpse of various predictability sources in terms of their inherent time scales. This view, based on time scale, is an alternative or complement to the three-category framework (inertia, patterns of variability, and external forcing). Provided in the present section is a broad overview of predictability sources relevant to ISI time scales. Some of the examples will be discussed more comprehensively in later chapters. It is important to realize that the timescales associated with sources of predictability often arise from a combination of inertia and feedback processes. Also, it should be noted that the timescales in Figure 2.4 indicate the timescale of the variability associated with a particular process. This is distinct from the timescale associated with a prediction. For example, ENSO exhibits variability on the scale of years; however, information about the state of ENSO can be useful for making ISI predictions on weekly, monthly, and seasonal time scales. As discussed in Chapter 1 (see Committee Approach to Predictability), it can be difficult to quantify the intrinsic predictability associated with any of the individual processes depicted in Figure 2.4 (i.e., for what lead-time is an ENSO prediction viable? And to what extent would that prediction contribute to skill for predicting temperature or precipitation in a particular region?). As mentioned earlier (see Climate System Predictability), prediction experiments form the foundation of our understanding. However, these experiments are rarely definitive in quantifying such limits of predictability.
FIGURE 2.4 Processes that act as sources of ISI climate predictability extend over a wide range of timescales, and involve interactions among the atmosphere, ocean, and land. CCEW: convectively coupled equatorial waves (in the atmosphere); TIW: tropical instability wave (in the ocean); MJO/MISV: Madden-Julian Oscillation/Monsoon intraseasonal variability; NAM: Northern Hemisphere annular mode; SAM: Southern Hemisphere annular mode; AO: Arctic oscillation; NAO: North Atlantic oscillation; QBO: quasi-biennial oscillation; IOD/ZM: Indian Ocean dipole/zonal mode; AMOC: Atlantic meridional overturning circulation. For the y-axis, “A” indicates “atmosphere;” “L” indicates “land;” “I” indicates “ice;” and “O” indicates “ocean.”
For example, for ENSO, there are three competing theories (inherently nonlinear; periodic, forced by weather noise; and the damped oscillator) that underlie various models of ENSO, each with its own estimate of predictability (see Kirtman et al., 2005 for a detailed discussion). At this time we are unable to resolve which theory is correct
since all yield results that are arguably “consistent” with observational estimates. To further complicate the understanding of the limits of predictability for ENSO, there are important interactions with other sources of predictability that may enhance or inhibit the predictability associated with ENSO (see Chapter 4). ENSO is just one example of how understanding what “sets” the predictability associated with a particular process is a critical challenge for the ISI prediction community. The challenge of improving forecast quality necessitates enhancing the individual building blocks (see Chapter 3) that make up our prediction systems, but it also requires a deeper understanding of the physical mechanisms and processes that are the sources of predictability.
Inertia
Upper ocean heat content
On seasonal-to-interannual time scales upper ocean heat content is a known source of predictability. The ocean can store a tremendous amount of heat. The volumetric heat capacity of seawater is about 4.2 × 10^6 J m^-3 K^-1, roughly 3,500 times that of air and 1.8 times that of granite. Sunlight penetrates the upper ocean, and much of the energy associated with sunlight can be absorbed directly by the top few meters of the ocean. Mixing processes further distribute heat through the surface mixed layer, which can be tens to hundreds of meters thick. As Gill (1982) points out, with the difference in heat capacity and density, the upper 2.5 m of the ocean can, when cooling 1ºC, heat the entire column of air above it by that same 1ºC. The ocean can also transport warm water from one location to another, so that warm tropical water is carried by the Gulf Stream off New England, where in winter during a cold-air outbreak, the ocean can heat the atmosphere at up to 1200 W m^-2, a heating rate not that different from the solar constant.
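The heat capacity comparisons above can be checked with a few lines of arithmetic. The sketch below takes the seawater value quoted in the text and assumes round-number values for air density, the specific heat of dry air, surface pressure, and gravity; those four constants are assumptions of this example, not figures from the report.

```python
RHO_C_SEAWATER = 4.2e6        # J m^-3 K^-1, volumetric heat capacity quoted in the text
RHO_AIR = 1.2                 # kg m^-3, assumed near-surface air density
CP_AIR = 1004.0               # J kg^-1 K^-1, assumed specific heat of dry air
SURFACE_PRESSURE = 101325.0   # Pa, assumed mean sea-level pressure
GRAVITY = 9.81                # m s^-2

def volumetric_ratio():
    """Heat capacity of 1 m^3 of seawater relative to 1 m^3 of near-surface air."""
    return RHO_C_SEAWATER / (RHO_AIR * CP_AIR)

def equivalent_ocean_depth():
    """Depth of ocean with the same heat capacity as the full air column above it."""
    column_mass = SURFACE_PRESSURE / GRAVITY      # kg of air per m^2 of surface
    column_heat_capacity = column_mass * CP_AIR   # J m^-2 K^-1
    return column_heat_capacity / RHO_C_SEAWATER  # m

print(f"seawater / air volumetric heat capacity: {volumetric_ratio():.0f}")
print(f"ocean depth matching the air column: {equivalent_ocean_depth():.2f} m")
```

Under these assumptions the ratio comes out near 3,500 and the equivalent depth near 2.5 m, consistent with the figures attributed to Gill (1982).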
Stewart (2005) shows that a 100 m deep ocean mixed layer heated 10ºC seasonally stores 100 times more heat than a 1 m thick layer of rock heated that same 10ºC; as a result the release of the heat from the ocean mixed layer can have a large impact on the atmosphere. Thus, the atmosphere acts as a “receiver” of any anomalies that have been stored in the ocean, and predictions of the evolution of air temperature over the ocean can be improved by consideration of the ocean state.
Soil moisture
Soil moisture memory spans intraseasonal time scales. Memory in soil moisture is translated to the atmosphere through the impact of soil moisture on the surface energy budget, mainly through its impact on evaporation. Soil moisture initialization in forecast systems is known to affect the evolution of forecasted precipitation and air temperature in certain areas during certain times of the year on intraseasonal time scales (e.g., Koster et al., 2010). Model studies (Fischer et al., 2007) suggest that the European heat wave of summer 2003 was exacerbated by dry soil moisture anomalies in the previous spring.
Snow cover
Snow acts to raise surface albedo and decouple the atmosphere from warmer underlying soil. Large snowpack anomalies during winter also imply large surface runoff and soil moisture
anomalies during and following the snowmelt season, anomalies that are of direct relevance to water resources management and that in turn could feed back on the atmosphere, potentially providing some predictability at the seasonal time scale. The impact of October Eurasian snow cover on atmospheric dynamics may improve the prediction quality of Northern Hemisphere wintertime temperature forecasts (Cohen and Fletcher, 2007). The autumn Siberian snow cover anomalies can be used for prediction of the East Asian winter monsoon strength (Jhun and Lee, 2004; Wang et al., 2009).
Vegetation
Vegetation structure and health respond slowly to climate anomalies, and anomalous vegetation properties may persist for some time (months to perhaps years) after the long-term climate anomaly that spawned them subsides. Vegetation properties such as species type, fractional cover, and leaf area index help control evaporation, radiation exchange, and momentum exchange at the land surface; thus, long-term memory in vegetation anomalies could be translated into the larger Earth system (e.g., Zeng et al., 1999).
Water table variations
Water table properties vary on much longer timescales (years or more for deep water tables) than surface soil moisture. Some useful predictability may stem from these variations, though the investigation of the connection of these variations to the overall climate system is still in its infancy, in part due to a paucity of relevant observations in time and space.
Land heat content
Thermal energy stored in land is released by molecular diffusion and thus over all time scales, but with a rate of release that decreases with the square root of the time scale. In practice, there is strong diurnal storage (up to 100 W m^-2) of heat energy and a still significant amount over the annual cycle (up to 5 W m^-2).
This is particularly strong in relatively unvegetated regions where solar radiation is absorbed mostly by the soil, since vegetation has much less thermal inertia, or in higher latitudes where soil water seasonally freezes.
Polar sea ice
Sea ice is an active component of the climate system and is highly coupled with the atmosphere and ocean at time scales ranging from synoptic to decadal. When large anomalies are established in sea ice, they tend to persist due to inertial memory and to positive feedback in the atmosphere-ocean-sea ice system. These characteristics suggest that some aspects of sea ice may be predictable on ISI seasonal time scales. In the Southern Hemisphere, sea ice concentration anomalies can be predicted statistically by a linear Markov model on seasonal time scales (Chen and Yuan, 2004). The best cross-validated skill is at the large climate action centers in the southeast Pacific and Weddell Sea, reaching 0.5 correlation with observed estimates even at 12-month lead time, which is comparable to or even better than that for ENSO prediction. We have less understanding of how well sea ice impacts the predictability of the overlying atmosphere.
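The idea behind a linear Markov prediction of a persistent anomaly can be shown with a one-variable example. The sketch below is a scalar first-order autoregressive model fitted by least squares to synthetic data; it is a deliberately simplified stand-in and is not the model of Chen and Yuan (2004), whose details are not given here. The function names, the persistence value of 0.8, and the random seed are all hypothetical.

```python
import random

def fit_ar1(series):
    """Least-squares estimate of a in x[t+1] = a * x[t] + noise (zero-mean anomalies)."""
    numerator = sum(x0 * x1 for x0, x1 in zip(series[:-1], series[1:]))
    denominator = sum(x0 * x0 for x0 in series[:-1])
    return numerator / denominator

def forecast(x_now, a, lead):
    """Forecast `lead` steps ahead: the anomaly decays toward zero as a**lead."""
    return (a ** lead) * x_now

def synthetic_anomalies(n=2000, a=0.8, seed=1):
    """Synthetic monthly anomalies with a known persistence (assumed value)."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series

series = synthetic_anomalies()
a_hat = fit_ar1(series)
print(f"fitted persistence: {a_hat:.2f}")
for lead in (1, 3, 6, 12):
    print(f"lead {lead:2d}: forecast from an anomaly of 1.0 is {forecast(1.0, a_hat, lead):.3f}")
```

In a scalar model of this kind the forecast simply damps the current anomaly, so skill at long leads comes entirely from inertia. A multivariate version can also carry information between regions and variables.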
obvious prerequisite to a good quality forecast, the section on model validation is presented first. Model validation may incorporate some of the same metrics as used in forecast verification, but it involves more because the model contains the physical processes in addition to the desired prediction fields. Model validation is typically performed prior to using a model for real-time forecasting in order to design statistical corrections for systematic model biases and to better understand the model’s variability characteristics, such as accuracy of regional precipitation anomalies under El Niño conditions or the strength and timescale of the MJO.

Model Validation

Comparing a model environment to the observed conditions is a difficult task. Starting with the observed climate, there are numerous scales of variability that cannot be resolved, much less measured at regular intervals. As a result, any comparison would use an incomplete picture of the atmosphere or ocean. The “incompleteness” of the available observations constitutes sampling error. On a practical level, since measurement systems are not homogeneous in space and time, scientists select those variables that are considered most important. The problems become more complicated if a set of observations and model predictions are not available or valid over a common period. No numerical model is perfect, hence model errors are generated. Therefore, the observations and model predictions can only be collected to form two incomplete and imperfect probability density functions (PDFs), which provide a basis for comparison.

An example of several PDFs associated with predictions of SST in the tropical Pacific Ocean is shown in Figure 2.8. The initial prediction is shown in green. Relative to this PDF, the subsequent PDFs (red and blue) exhibit higher probabilities for warmer temperatures. This shift to warmer temperatures was in fact reflected by the verification: the observed temperature anomaly was just below 0.8°C, in between the peaks of the two later PDFs. The standard deviation of the three PDFs is a function of lead time, although the dependence is perhaps surprisingly small here. The relatively small dependence reflects the relatively large set of available tools and the fact that the forecast uncertainty with these tools is large.

A realistic model strives to capture the full variability of the climate system. In particular, such a model needs to capture the full PDF and the temporal and spatial correlations of the observations, even if the forecast information to be disseminated is only a summarized version of that PDF. Additionally, PDFs allow for identification of multimodal distributions, whereas summary statistics (e.g., means, variances, skewnesses) cannot. Several goodness-of-fit tests exist that can check for a significant agreement between the observed and simulated PDFs. When such a test is not practical, the mean and variance should be compared between the observations and model, as a minimum, and the skewness, if possible. Skewness differences can point to processes not being captured, as well as to nonlinearities.

Any statistical analysis of these PDFs, particularly ones that attempt to assess skill or significance, will hinge on specific assumptions of the tests applied. A sufficiently large sample size will show the models’ depiction of the atmosphere or ocean to differ from that of the observations. For those interested in comparing models to the observed world, this leaves us at an interesting juncture, one that may not have an answer without simplification (Oreskes et al., 1994). Can an experiment be designed to answer the question, “How good is a model or prediction?” Since we do not understand all of the processes and interactions that would lead to a perfect prediction, a strict validation is not possible.
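The minimum comparison described above (mean, variance, and, where possible, skewness), together with one possible goodness-of-fit measure, might be sketched as follows. This is an illustration only: the text does not prescribe a particular test, so the use of the two-sample Kolmogorov-Smirnov statistic, the function names `moments` and `ks_statistic`, and the synthetic "observed" and "simulated" samples are all assumptions.

```python
import numpy as np

def moments(sample):
    """Return (mean, variance, skewness) of a 1-D sample."""
    x = np.asarray(sample, dtype=float)
    mean = x.mean()
    var = x.var()  # population variance (ddof=0)
    # Skewness: third central moment scaled by variance**1.5.
    skew = np.mean((x - mean) ** 3) / var ** 1.5 if var > 0 else 0.0
    return float(mean), float(var), float(skew)

def ks_statistic(obs, sim):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical cumulative distribution functions of the two samples."""
    obs = np.sort(np.asarray(obs, dtype=float))
    sim = np.sort(np.asarray(sim, dtype=float))
    grid = np.concatenate([obs, sim])  # evaluate both CDFs at every data point
    cdf_obs = np.searchsorted(obs, grid, side="right") / obs.size
    cdf_sim = np.searchsorted(sim, grid, side="right") / sim.size
    return float(np.max(np.abs(cdf_obs - cdf_sim)))

if __name__ == "__main__":
    # Synthetic stand-ins for observed and model-simulated anomalies (hypothetical).
    rng = np.random.default_rng(0)
    observed = rng.normal(0.0, 1.0, 500)
    simulated = rng.normal(0.2, 0.8, 500)
    for name, sample in (("observed", observed), ("simulated", simulated)):
        m, v, s = moments(sample)
        print(f"{name}: mean={m:.2f} variance={v:.2f} skewness={s:.2f}")
    print(f"KS statistic = {ks_statistic(observed, simulated):.3f}")
```

A statistic near 0 indicates similar empirical distributions and a value near 1 indicates almost no overlap; turning it into a significance statement would require the sampling assumptions that the text cautions about.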
Additionally, it is possible that a model prediction can verify correctly for the incorrect reason. Although a perfect validation is not possible, obtaining a useful comparison is possible if one recognizes the level of uncertainty associated with the observations and model. Moreover, constructing specific hypothesis tests is a viable alternative. One might pose the question as “what aspect of the distribution of the observed atmosphere matches that of the atmosphere simulated by numerical models?”

FIGURE 2.8 Examples of probabilistic predictions for Nino3.4 SST anomaly, represented by probability density functions (PDFs). The green curve is the prediction with the longest lead time (9 months), followed by the red (6 months) and blue (3 months) curves. With shorter lead times, the PDF for the prediction shifts to progressively warmer temperatures. The observed value for Nino3.4 SST anomaly is indicated by the vertical black line (0.78°C). SOURCE: International Research Institute for Climate and Society (IRI).

Statistical Techniques for Identifying and Predicting Modes of Variability

The climate system is characterized by recurrent patterns of variability, sometimes referred to as modes of variability, which include ENSO, NAO, etc. Often, the identification of modes linking remote locations in the atmosphere or the ocean-atmosphere is useful for medium- to long-range prediction (Reynolds et al., 1996; Saravanan et al., 2000). Numerous methodologies have been applied to identify such modes, ranging from linear correlation to multivariate eigentechniques (Montroy et al., 1998) and nonlinear methods (Lu et al., 2009; Richman and Adrianto, 2010). Definitions of these techniques may be found in Wilks (2006) and a summary of their use for mode identification is contained in Appendix A. Often, as the time scale increases, the nonlinear contribution to the modes tends to be filtered. However, Athanasiadis and Ambaum (2009) note that the maintenance and evolution of low frequency variability arise from inherently nonlinear processes, such as transient eddies, on the intraseasonal time scale. This suggests that linear techniques may not fully capture the predictability associated with the modes, and the use of nonlinear techniques needs to be explored (e.g., Kucharski et al., 2010).

Merits of Nonlinear Techniques

To date, nearly all mode identification has been limited to linear analyses. In fact, we have defined our concept of modes through linear correlations and empirical orthogonal functions (EOFs)/principal component analysis (PCA) as those were the techniques that were computationally feasible at the time. Recently, nonlinear mode identification has begun to emerge as efficient nonlinear classification techniques have developed and as computational power has increased. To assess the degree of nonlinearity, the skill of nonlinear techniques can be compared to that derived from traditional linear methods (Tang et al., 2000). Forecasters can investigate if extracting the linear part of the signal is sufficient for prediction. On the intraseasonal time scale, when monsoon variability has been probed by a nonlinear neural network technique (Cavazos et al., 2002), a picture emerges with nonlinear modes related to the nonlinear dynamics embedded in the observed systems (Cavazos et al., 2002) and model physics (Krasnopolsky et al., 2005). Nonlinear counterparts to PCA, such as neural network PCA, have been shown to identify the nonlinear part of the ENSO structure (Monahan and Dai, 2004). By using a nonlinear dimension reduction method that draws on the thermocline structure to predict the onset of ENSO events, Lima et al. (2009) have shown increased skill at longer lead times when compared to traditional linear techniques, such as EOF and canonical correlation analysis (CCA). Techniques have also been applied to cloud classification (Lee et al., 2004), wind storm modeling (Mercer et al., 2008) and classification of tornado outbreaks (Mercer et al., 2009).

Some nonlinear techniques, such as neural networks, are sensitive to noisy data and exhibit a propensity to overfit the data that they are trained on (Manzato, 2005), which can limit their utility in forecasting. Careful quality control of data is essential prior to the application of such methods. To assess the signal that is shared between the training and testing data, some form of cross-validation is typically required (Michaelson, 1987). Techniques include various forms of bootstrapping (Efron and Tibshirani, 1993), permutation tests (Mielke et al., 1981), jackknifing (Jarvis and Stuart, 2001) and n-fold cross validation (Cannon et al., 2002). Kernel techniques, such as support vector machines, kernel principal components (Richman and Adrianto, 2010), and maximum variance unfolding, avoid the problem of finding a local minimum and overfitting. Kernel techniques have a high potential for mode identification where linear modes provide ambiguous separability (e.g., the overlapping patterns of the Arctic Oscillation and the North Atlantic Oscillation).

Forecast Verification

Finley’s tornado forecasts were more skillful than random forecasts, according to the metric he was using, which tabulated the percentage of correct forecasts. It credited forecasts of ‘no tornado’ on days with no tornadoes, and tornadoes were present on less than 2% of the days.
It turns out that if he had always predicted “no tornado” his skill would have been even greater (Jolliffe and Stephenson, 2003). This example illustrates the value in considering what aspect of forecast quality a metric should measure, and the baseline against which it is assessed. The particular assessment of forecast quality often depends on what characteristics of the forecast are of greatest interest to those who would use the information. No one verification measure can capture all aspects of forecast quality. Some measures are complementary, while others may provide redundant information. This section outlines several recommended (WMO SVS-LRF, 2002) and commonly used metrics for verifying forecasts, and which aspects of forecast quality they address (see Jolliffe and Stephenson, 2003).

WMO’s Standard Verification System (SVS) for Long Range Forecasts (LRF) (2002) outlines specifications for long-range (monthly to seasonal) forecast evaluation and for exchange of verification scores. The SVS-LRF provides recommendations on scores for both deterministic and probabilistic forecasts. For deterministic forecasts, the recommended metrics are the mean square skill score and the relative operating characteristics (ROC; curve and area under the curve; Mason and Graham, 1999). For categorical forecasts the recommended metric is the Gerrity Score (Gerrity, 1992). For probabilistic forecasts, the recommended metrics are the ROC and reliability diagrams. While these metrics are the main ones advocated by the WMO, several others are in regular use by modeling and prediction centers, and still others are being promoted as potentially more interpretable, at least for forecast users. Below, a variety of metrics are discussed and evaluated.

Deterministic Measures

Correlation addresses the question: to what extent are the forecasts varying coherently with the observed variability?
Correlation assessments are typically done with anomalies (Figure 2.9a), or deviations from the mean, and can be applied to spatial patterns of variability (pattern correlations) or to time series of variability (temporal correlations). The agreement of co-variability does not indicate if the forecast values are of the right magnitude, and so it is not strictly a measure of accuracy in the forecast.

The mean squared skill score (MSSS) addresses the question: How large are the typical errors in the forecast relative to those implied by the baseline? The baseline could be climatology, for example, assuming the next season’s temperature will be that of the average value from the previous 30 years. The MSSS is related to the mean squared error (MSE) and summarizes several contributions to forecast quality, namely correlation, bias, and variance error. The root mean squared error (RMSE; equal to the square root of MSE) is much more widely used than either MSE or MSSS. For a predicted variable whose magnitude is of particular interest, such as the SST index of ENSO, the RMSE may be a preferred metric of forecast quality given its straightforward interpretation. However, RMSE alone provides limited information to the forecast community that wishes to identify the source of the error (Figure 2.9b).

FIGURE 2.9 An example of a multi-model ensemble (MME) outperforming individual models in forecasting. (a) Anomaly correlation; the red line (MME) is above the individual models (colored lines), demonstrating that the pattern of anomalous temperature from the ensemble is a closer match to observations. (b) RMSE of NINO3.4; the red line is below the individual models, demonstrating that the magnitude of the errors associated with the ensemble is smaller. Black represents a persistence forecast. Names of the individual coupled models are shown in the legend. SOURCE: Figure 7, Jin et al. (2008).

The relative operating characteristics (ROC) addresses the question: can the forecasts discriminate an event from a non-event? The ROC curve effectively plots the hit rate, which is the ratio of correct forecasts to the number of times an event occurred, against the false alarm rate (probability of false detection), which is the ratio of false alarms to total number of non-occurrences. Therefore, one can assess the rate at which the forecast system correctly predicts the occurrence of a specific event (e.g., “above-normal temperature,” El Niño conditions, etc.), relative to the rate at which one predicts the occurrence of an event incorrectly (Figure 2.10). If the forecast system has no skill, then the hit rate and false alarm rate are similar and the curve lies along the diagonal in the graph (area = 0.5). Positive forecast skill exists when the curve lies above the diagonal (0.5 < area <= 1.0) and the skill can be measured by the “area under the ROC curve”.
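The deterministic measures in this subsection can be illustrated with a short sketch. It is a minimal example under stated assumptions rather than an operational implementation: the function names are hypothetical, anomalies are taken relative to each series' own sample mean, the MSSS baseline is passed in by the caller, the ROC curve is traced by stepping a warning threshold through the distinct forecast values, and the data are synthetic.

```python
import numpy as np

def anomaly_correlation(fcst, obs):
    """Correlation of forecast and observed anomalies (each relative to its own mean)."""
    fa = np.asarray(fcst, float) - np.mean(fcst)
    oa = np.asarray(obs, float) - np.mean(obs)
    return float(np.sum(fa * oa) / np.sqrt(np.sum(fa ** 2) * np.sum(oa ** 2)))

def rmse(fcst, obs):
    """Root mean squared error."""
    err = np.asarray(fcst, float) - np.asarray(obs, float)
    return float(np.sqrt(np.mean(err ** 2)))

def msss(fcst, obs, baseline):
    """Mean squared skill score: 1 - MSE(forecast) / MSE(baseline).
    `baseline` may be a scalar (e.g., a climatological mean) or an array."""
    o = np.asarray(obs, float)
    ref = np.broadcast_to(np.asarray(baseline, float), o.shape)
    return 1.0 - rmse(fcst, o) ** 2 / rmse(ref, o) ** 2

def roc_curve(score, event):
    """False alarm rates and hit rates as the warning threshold is lowered.
    Requires at least one event and one non-event in `event`."""
    score = np.asarray(score, float)
    event = np.asarray(event, bool)
    far, hit = [0.0], [0.0]
    for threshold in np.unique(score)[::-1]:       # highest threshold first
        warned = score >= threshold
        hit.append(np.sum(warned & event) / event.sum())
        far.append(np.sum(warned & ~event) / (~event).sum())
    return np.array(far), np.array(hit)

def roc_area(far, hit):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(far) * (hit[1:] + hit[:-1]) / 2.0))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    obs = rng.normal(0.0, 1.0, 200)                 # synthetic observed anomalies
    fcst = 0.6 * obs + rng.normal(0.0, 0.8, 200)    # synthetic imperfect forecasts
    print("correlation:", round(anomaly_correlation(fcst, obs), 3))
    print("RMSE:", round(rmse(fcst, obs), 3))
    print("MSSS vs. sample-mean baseline:", round(msss(fcst, obs, obs.mean()), 3))
    event = obs > np.percentile(obs, 66.7)          # "above-normal" tercile event
    far, hit = roc_curve(fcst, event)
    print("ROC area:", round(roc_area(far, hit), 3))
```

In this sketch a forecast identical to the baseline has an MSSS of 0, and a forecast value carrying no information about the event yields a ROC area of 0.5, matching the no-skill diagonal described above.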
FIGURE 2.10 An example of a ROC curve that plots Hit Rates vs. False-Alarm Rates. In this case, the Hit Rate is the proportion of rainfall events (either above-normal, solid line, or below-normal, dotted line) that were forecasted correctly; the False-Alarm Rate is the proportion of non-events (i.e., incidences of near normal rainfall) for which an event was forecasted. Since the curves are above the gray line, the forecast is considered skillful using this metric. The forecasts correspond to rainfall during September through November in a region in Africa for the period 1950–1994. The areas beneath the curves, A, are also indicated. SOURCE: Figure 2, Mason and Graham (1999).

Probabilistic Measures

The key aspect of probabilistic forecasts is that they proffer quantitative uncertainty associated with the forecast. Thus, if a forecast includes uncertainty, it is important to assess the meaningfulness of that uncertainty; probabilistic forecasts need to be assessed probabilistically. Providing deterministic metrics as well, such as correlation or hit rate of the most likely outcome, may give additional information of use to decision makers, but provided alone, deterministic measures undermine the richness of the forecast information. For example, a deterministic measure such as a hit score based on collapsing the probabilistic forecasts to a deterministic forecast for the category with the largest probability, for purposes of verification, cannot then distinguish between a forecast of 100% likelihood and one of 40% likelihood of above-normal (e.g., for a 3-category system, with climatologically equal odds). However, the reaction of decision makers to such differing confidence in the predicted outcome would certainly be much different.

An important aspect of a quantitative probabilistic measure is that it is equitable. The term equitable means that a forecaster is not penalized for making a forecast that has a low climatological probability (e.g., forecasting a below normal temperature when the climatological probability of a below normal temperature is less than 10 percent).

The potential value of probabilistic assessment of forecasts is large for the model development, forecasting, and decision making communities. While users of the forecast information may stress the desire for “accurate” information, the climate system is inherently probabilistic. The most likely outcome, or equivalently the probabilistic median or the deterministic forecast, may give a general sense of expectations for the seasonal climate, but that information needs to be accompanied by an estimate of the uncertainty. Commercial decisions are often made, not on the basis of events which are likely to occur, but on the basis of events which are unlikely to occur, but which if they did occur, would involve serious financial loss (Palmer, 2002).

The Heidke skill score (HSS), which is actually appropriate to binary forecasts rather than probabilistic forecasts, has been applied in the context of probabilistic categorical forecasts where they have been collapsed into binary categorical forecasts by retaining the category with highest probability. The HSS can be interpreted to address the question: did the forecast indicate the correct shift in the probability distribution more often than would be expected by chance? This score may be seen as desirable to some because it is convenient and easily interpreted (Livezey and Timofeyeva, 2008), indicating how often the forecast is “correct” or not (Figure 2.11). However, if it is applied to a probabilistic forecast the HSS degrades the information content as described above. The Heidke Skill Score is considered biased and may not be equitable. Jolliffe and Stephenson (2003) claim it is equitable for applications involving binary predictions (e.g., yes or no; event or non-event). Wilks (2006) claims it is not equitable for higher-order designs, since the correct forecasts of less likely events do not properly receive more weight (personal communication, Wilks, 2009). Thus, the forecaster may be discouraged from forecasting rare events on the basis of their low climatological probability. The reason for the bias in the Heidke skill is that the reference hit rate in the denominator is not constrained to be unbiased. This means the imagined random reference forecasts in the denominator have a marginal distribution that is not necessarily equal to that of the sample climatology.
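The difference in the denominators can be made concrete with a small sketch that computes the Heidke skill score from a forecast-observation contingency table, alongside the Peirce skill score, whose denominator uses the sample climatology alone. The formulas follow the standard definitions in Wilks (2006); the function name and the counts in the example tables are hypothetical and chosen only for illustration.

```python
import numpy as np

def heidke_and_peirce(table):
    """Heidke and Peirce skill scores from a K x K contingency table, where
    table[i, j] counts cases with forecast category i and observed category j.
    Both scores are undefined (division by zero) if only one category is observed."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                      # joint relative frequencies
    correct = np.trace(p)                # proportion of correct forecasts
    p_fcst = p.sum(axis=1)               # marginal distribution of forecasts
    p_obs = p.sum(axis=0)                # marginal distribution of observations
    chance = np.sum(p_fcst * p_obs)      # proportion correct expected by chance
    # Heidke: the reference in the denominator uses the forecast marginals too.
    hss = (correct - chance) / (1.0 - chance)
    # Peirce: the reference in the denominator uses the sample climatology only.
    pss = (correct - chance) / (1.0 - np.sum(p_obs ** 2))
    return float(hss), float(pss)

if __name__ == "__main__":
    # Hypothetical rare-event verification: rows = forecast (yes, no),
    # columns = observed (yes, no).
    issued = [[20, 60], [30, 890]]
    always_no = [[0, 0], [50, 950]]      # same observations, never forecasting the event
    for name, tab in (("issued forecasts", issued), ("always 'no'", always_no)):
        pc = np.trace(np.asarray(tab)) / np.sum(tab)
        hss, pss = heidke_and_peirce(tab)
        print(f"{name}: percent correct={100 * pc:.1f}  HSS={hss:.3f}  PSS={pss:.3f}")
```

In the hypothetical tables, never forecasting the event gives the higher percentage correct, yet both skill scores are zero for that strategy, which is the point of the Finley example earlier in this section.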
Peirce Skill (Wilks, 2006) is unbiased and can be substituted for Heidke skill.

The probabilistic ROC is a variant of the ROC described previously that considers the hit rates and false alarm rates for events forecast at varying levels of probabilistic confidence.

The Brier skill score (BSS) is a summary score of forecast quality that encapsulates both reliability and resolution measures of forecast quality. Reliability, discussed more below, addresses the question: to what extent do the probabilities mean what they say? Resolution addresses the question: can the probabilistic forecasts discern changes in the frequency of observed events relative to the underlying climatological distribution? An example of good forecast resolution would be that when forecasts were issued with high probability for an El Niño event, El Niño events were much more likely to happen than would be estimated from their observed frequency over all years.

A reliability diagram shows the complete joint distribution of forecasts and observations for a probabilistic forecast of an event or forecast category (such as the above-normal tercile) (Figure 2.12). It indicates to what degree the probabilities assigned to an event are representative of the likely occurrence of that event. In a reliable forecast system, the probability assigned to a particular outcome should be the frequency with which, given the same forecast, that outcome should be observed. The information supplied by reliability diagrams includes calibration, or what is observed given a specific forecast (e.g., under- and overforecasting), as well as resolution and refinement, which is the frequency distribution of each of the possible forecasts giving information on the degree of aggregate forecaster confidence (small inset graph in Figure 2.12). Reliability diagrams can further indicate whether there are systematic biases in the forecasts, such as not predicting enough occurrences of above-normal temperatures.
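As a rough sketch of these probabilistic measures, the following computes the Brier score, a Brier skill score relative to a constant climatological forecast, and the binned quantities that a reliability diagram plots. The function names, the equal-width probability bins, the use of the sample event frequency as the climatological reference, and the synthetic forecasts are assumptions made for illustration.

```python
import numpy as np

def brier_score(prob, event):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    p = np.asarray(prob, dtype=float)
    o = np.asarray(event, dtype=float)
    return float(np.mean((p - o) ** 2))

def brier_skill_score(prob, event):
    """1 - BS / BS_ref, with the sample event frequency as the reference forecast.
    Undefined if the event occurs always or never in the sample."""
    o = np.asarray(event, dtype=float)
    reference = np.full(o.shape, o.mean())
    return 1.0 - brier_score(prob, o) / brier_score(reference, o)

def reliability_table(prob, event, n_bins=5):
    """Rows of (mean forecast probability, observed frequency, number of forecasts)
    for each non-empty probability bin; the counts give the refinement histogram."""
    p = np.asarray(prob, dtype=float)
    o = np.asarray(event, dtype=float)
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)  # p = 1 goes in the top bin
    rows = []
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            rows.append((float(p[in_bin].mean()), float(o[in_bin].mean()), int(in_bin.sum())))
    return rows

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    prob = rng.uniform(0.0, 1.0, 2000)        # synthetic forecast probabilities
    event = rng.uniform(0.0, 1.0, 2000) < prob  # outcomes drawn so forecasts are reliable
    print("Brier score:", round(brier_score(prob, event), 3))
    print("Brier skill score:", round(brier_skill_score(prob, event), 3))
    for mean_p, freq, count in reliability_table(prob, event):
        print(f"forecast {mean_p:.2f} -> observed {freq:.2f} (n={count})")
```

For a reliable system the observed frequency in each row should be close to the mean forecast probability, which corresponds to points lying along the diagonal of the reliability diagram.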
Such probabilistic verification, such as ROC scores or reliability diagrams, also can be useful for estimating event-specific prediction skill, for example if El Niño events were better predicted than La Niña events or if drought conditions were better predicted than very wet seasons. A distinction in prediction skill between the cases of high and low variability calls for further examination of the physical causes of the discrepancy, and whether it is inherent to the climate system dynamics or a shortcoming of the model(s).

FIGURE 2.11 Seasonal differences in forecast skill (contours) for temperature and how frequently (shading) forecasts differ from climatological odds (i.e., equal chances of normal, above-normal, and below-normal). Blue (tan) corresponds to areas where forecasts are often similar to (often different from) the climatological odds. The skill metric is a Heidke skill score, and is calculated by including only those forecasts that differ from climatological odds. Areas with high-valued contours indicate where deviations from climatology have frequently been forecasted correctly. The forecasts are from CPC and are valid ½ month from issuance. SOURCE: Figure 14, O’Lenic et al. (2008).

FIGURE 2.12 An example of a reliability diagram, which indicates the skill of probabilistic forecasts. The diagram compares the forecasted probability of an event (in this case, above-normal winter rainfall in North America) to its observed frequency. A perfect forecast is represented by the dashed line, a horizontal line represents a forecast identical to climatology, and sloped lines are potentially skillful. The blue and red lines correspond to individual CGCMs and AGCMs, respectively, and are more horizontal than the black line, which represents the mean of these models. While the mean of the models is more reliable than any of the individual models, it tends to be underconfident for rare events (the black line lies above the perfect forecast line for low-probability events). Typically, a histogram accompanies a reliability diagram (inset), indicating the number of times that forecasts of various confidence levels were issued. SOURCE: Adapted from Goddard and Hoerling (2006).

Impacts of Non-Stationarity on Assessment of Skill

This section provides consideration of forecast verification in the context of a changing background climate. Many measures of prediction skill are sensitive to how much the prediction deviates from climatology; therefore, the assessment of seasonal predictions can be influenced by both changes in the drivers of climate predictability as well as trends or other slowly varying changes in the background state.

ENSO exerts the greatest influence on seasonal-to-interannual climate variability globally (e.g., Glantz, 1996). As a result, climate predictions made during ENSO events yield much higher skill than those made during ENSO-neutral conditions (e.g., Goddard and Dilley, 2005; Livezey and Timofeyeva, 2008). Although an ENSO event typically occurs every 3–7 years, decadal modulation of the frequency and intensity of ENSO events is evident over the observational record of the 20th century (Zhang et al., 1997) and over the last millennium based on proxy coral data (Cobb et al., 2003). Therefore, there will be periods with higher prediction quality than others merely because there were more or stronger ENSO events during that period.

Higher prediction skill will also appear in many metrics (e.g., correlation, Heidke skill score) when the background mean state climate is non-stationary, i.e., in the presence of trends; the non-stationarity could be due to anthropogenic climate change or natural variability on long multi-decadal timescales. Climate predictions are typically communicated as deviations from “climatology,” or the background mean-state. If the mean-state is changing over time, the magnitude of the seasonal deviation will depend on the period used to define the climatology. Equivalently, predictions of deviations in the same direction as the “trend” can be credited with relatively high quality that is derived more from the slowly evolving trend than the interannual variability. For example, under anthropogenic climate change, the temperatures over most land areas are increasing relative to the mean state, say 1971–2000 (Trenberth et al., 2007). That does not necessarily mean that each year will be warmer than the year preceding it. However, predicting temperatures to be “above-normal” will appear skillful by many measures because temperatures in this decade are very likely to be warmer than those of 30 years prior.

A relevant question is then: can the forecast system discriminate between conditions in a pair of forecasts more often than not? For example, if year X is observed to be warmer than year Y, was that predicted to be so? Discrimination tests of forecast-observation pairs of cases addressing this type of question can be applied to deterministic or probabilistic forecasts. A generalization of such discrimination tests is outlined in Mason and Weigel (2009), and in many cases the metric becomes equivalent to those described above, such as generalized ROC areas for tercile probabilistic forecasts.

FIGURE 2.13 Progress in the seasonal forecast skill of the ECMWF operational system during the last decade. The solid bar shows the relative reduction in mean absolute error of forecast of SST in the Eastern Pacific (NINO3). The brown-striped bar shows the contribution from the ocean initialization, and the white-striped bar is the contribution from model improvement. SOURCE: Balmaseda et al. (2009).

CHALLENGES TO IMPROVING PREDICTION SKILL

This chapter has provided the historical perspective on climate prediction, pointed to where there are opportunities to improve prediction quality by improving our understanding and representation in models of sources of predictability, and reviewed the methods available to quantify skill.
From the 1980s to the 1990s, seasonal prediction quality improved dramatically, but then did not improve further (Kirtman and Pirani, 2008, 2009). The challenges in going forward are not only to determine where to gain further improvements but also to assess and understand the reasons for any incremental gains in prediction quality that have occurred. In the following section we examine the building blocks of intraseasonal to interannual forecasting. Improvements may stem from better observations, better models, and improved assimilation. Recent analyses demonstrate how improvements in these components of forecast systems are the source for improvements in forecast quality (Stockdale et al., 2010; Balmaseda et al., 2009; Saha et al., 2006; Fig. 2.13), and thus predictability.

At the same time, improvements may result from changes in the way in which the community works. Kirtman and Pirani’s (2008, 2009) summary of the first World Climate Research Program Workshop on Seasonal Prediction indicates that the workshop recommended adoption of best practices in seasonal forecasting, including the adoption of common approaches to the production, use, and assessment of seasonal forecasts. Thus, the challenges to improving intraseasonal to interannual prediction skill lie not only in improvements of the building blocks but also in how the community works together. Experimental modeling and examination of the incremental skill to be gained from new sources of predictability are needed. The three case studies provide examples of physical processes being examined as sources of predictability. A further challenge is to develop the community framework to nurture ongoing improvements to dynamical models.