The scientific and programmatic challenges identified in our discussion of the basic building blocks of ISI prediction systems (i.e., Chapter 3) and the case studies of Chapter 4 have common themes that naturally lead to a discussion of “Best Practices.” Essentially, Best Practices aim to answer the following:
How can we improve prediction systems and the provision of forecasts?
Given that both qualitative and quantitative improvements in seasonal forecasts are possible, how do we begin to map a path forward?
“Best Practices” describes an optimal process for assessing forecast quality and enabling productive interactions among the various ISI forecasting communities (e.g., users, developers, providers, researchers). The discussion of Best Practices necessarily cuts across the details of how forecasts are evaluated and shared as well as more programmatic issues of how operational centers collaborate with the outside community. Specifically, four important aspects of Best Practices for the production, reproduction, evaluation, and dissemination of ISI forecasts are presented: public archives of forecast information, forecast metrics, more useful forecast products, and improved synergy between the operational and academic communities.
Transparency and reproducibility are essential for assessing and improving ISI forecast quality and enhancing communication among operational centers, researchers, and users. Currently, it can be difficult to determine or access the inputs (e.g., observations) and methods (e.g., models, data assimilation schemes, subjective input) that underlie a particular forecast. Likewise, many forecast products (e.g., hindcasts, analyses, forecasts, re-analyses, re-forecasts, verifications, outlooks) may not be archived, or the existing archives may not be publicly accessible. As noted in several instances by this report, valuable research has been performed when such data sets are available (e.g., CMIP, ENSEMBLES, DEMETER). For many of these projects, international collaborations were enabled by the availability of forecast data. Ensuring that forecast centers establish and maintain archives is important to bolster these collaborations among forecast centers and to continue the large-scale assessment and comparison of prediction systems.
Assessing current ISI forecast performance, making comparisons among forecast systems, devising strategies for improving forecasts, and understanding the impact of a change to a forecast system (e.g., incorporation of new observations, updates to model parameters) all
require the existence of publicly accessible and comprehensive archives. Documentation and archiving of data, models, methods, and products by operational centers would serve as an integral step toward assessing and ultimately improving ISI forecast quality, especially if forecast improvements are to be attributed to specific proposed or implemented changes to the system. The observing system will evolve, and studies are needed to assess and guide that evolution from the perspective of the role of observations in ISI forecasting.
Given that subjective intervention is a component of many forecast systems, it is important that the objective inputs can be easily separated from the subjective final product for independent analysis and appraisal. This separation is necessary for assessing whether the objective elements are improving and whether improvements in observations, understanding, or models, or some combination, are having a positive impact. Similarly, for forecast systems that combine statistical and dynamical prediction techniques, it is important to be able to separate the contributions from each component.
Evaluating ISI forecast quality requires a set of well-defined model performance and forecast metrics that can be applied to current and future prediction systems. Forecast metrics need to include both deterministic and probabilistic measures. Model performance metrics, which in this case are generally associated with dynamical models, need to include measures of model success in representing the mean climate, forced variability (e.g., diurnal and annual cycles), unforced variability (e.g., ENSO, MJO, PNA), and key physical processes (e.g., convection, fluxes, tropical waves). Multiple metrics are recommended, since no single variable or metric is sufficient to fully characterize model and forecast quality for multiple user communities. Forecast metrics include, but are not limited to, measures of bias (correspondence between the mean forecast and the mean observation), accuracy (the level of agreement between the forecast and the observation), reliability (the average agreement between the forecast values and the observed values when the forecasts are stratified into different categories, i.e., conditional bias), resolution (the ability of the forecast to sort or resolve the set of events into subsets with different frequency distributions), sharpness (the tendency of a forecast to predict extreme values), and discrimination (the ability of the forecast system to assign a higher forecast probability to an outcome on those occasions when that outcome actually occurs).
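Several of these probabilistic attributes are connected through the standard Murphy decomposition of the Brier score, in which reliability and resolution appear as explicit terms. The following sketch illustrates that decomposition for binary-event probability forecasts; the function name, the bin count, and the equal-width binning are illustrative choices, not prescriptions from this report.

```python
import numpy as np

def brier_decomposition(prob_forecasts, outcomes, n_bins=10):
    """Murphy decomposition of the Brier score into reliability,
    resolution, and uncertainty components.

    prob_forecasts: forecast probabilities in [0, 1]
    outcomes: observed binary outcomes (0 or 1)
    """
    p = np.asarray(prob_forecasts, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    n = len(p)
    obar = o.mean()                                  # observed base rate

    # Stratify forecasts into equal-width probability bins
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)

    reliability = 0.0
    resolution = 0.0
    for k in range(n_bins):
        mask = idx == k
        nk = mask.sum()
        if nk == 0:
            continue
        pk = p[mask].mean()                          # mean forecast prob in bin
        ok = o[mask].mean()                          # observed frequency in bin
        reliability += nk / n * (pk - ok) ** 2       # conditional bias term
        resolution += nk / n * (ok - obar) ** 2      # event-sorting term

    uncertainty = obar * (1.0 - obar)
    brier = np.mean((p - o) ** 2)
    # BS ~= reliability - resolution + uncertainty
    # (exact when forecasts take the bin-mean values)
    return brier, reliability, resolution, uncertainty
```

Lower reliability values are better (small conditional bias), while higher resolution is better; the uncertainty term depends only on the observations and provides the no-skill reference.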
Regardless of which metrics are used, the following properties are necessary for a set of metrics:
• Provide the ability to track forecast quality to determine if models are improving. This implies that the uncertainty in the skill statistics needs to be quantified.
• Provide some feedback on model strengths and weaknesses in providing an accurate forecast.
• Allow forecasts from different systems to be compared to identify which system is superior.
• Provide information on metric uncertainty. This allows for forecast consistency to be evaluated.
• Include a justifiable baseline of forecast quality for comparison.
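Quantifying the uncertainty in a skill statistic, as the first property requires, can be done in several ways; one common and simple approach is a bootstrap confidence interval. The sketch below applies a percentile bootstrap to the forecast-observation correlation; the function name and resampling choices are illustrative, and in practice serial dependence in the data would call for a block bootstrap or similar refinement.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_skill_ci(forecasts, observations, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for the
    forecast-observation correlation, a common skill statistic.

    Resampling forecast/observation pairs with replacement quantifies
    the sampling uncertainty in the skill estimate, which is needed to
    judge whether a change to the prediction system (or a difference
    between two systems) is larger than chance.
    """
    f = np.asarray(forecasts, dtype=float)
    o = np.asarray(observations, dtype=float)
    n = len(f)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        i = rng.integers(0, n, n)            # resample pairs with replacement
        stats[b] = np.corrcoef(f[i], o[i])[0, 1]
    point = np.corrcoef(f, o)[0, 1]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return point, lo, hi
```

An interval whose lower bound exceeds the skill of a justifiable baseline (e.g., climatology or persistence) gives evidence that the forecast system adds value.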
The WMO Standard Verification System (SVS) for Long Range Forecasts (LRF) and the text by Jolliffe and Stephenson (2003) are excellent starting points for developing these metrics, but additional metrics will need to be developed as forecasts and their use evolve. For example, until recently there was no well-accepted and documented forecast metric for the MJO (Gottschalck et al., 2010). Examples of model performance metrics for the MJO include those developed by U.S. CLIVAR (see the MJO case study in Chapter 4; CLIVAR MJO Working Group, 2009). A number of programmatic activities are interested in developing model performance metrics that would be applicable to models used in ISI forecasting (e.g., GEWEX Cloud System Study (GCSS), Climate Historical Forecast Project (CHFP), and Climate Process Teams (CPT)). Such consideration of metrics and their development only reinforces the need for open and easy access to forecast information, as discussed above.
MORE USEFUL FORECAST PRODUCTS
The promise that climate forecast information may benefit society through improved decisions and climate risk management motivates much of the human and fiscal investments in the research and production of these forecasts. Although it is often assumed that climate forecasts would be used more if they were of better quality, other factors are often cited as equally important, including the retrospective forecast performance, the societal and scientific relevance of the forecast variables and their specificity, and the manner in which the forecast is communicated.
The forecasts have to be probabilistic, as estimates of the state of the climate system are inherently probabilistic. Decision makers are accustomed to using uncertain information; risk is by definition probabilistic. But use of probabilistic climate forecasts requires information regarding the reliability of the probabilities. Thus, additional information is needed on the past performance of the prediction inputs and the overall forecast system, or the data have to be available for users to assess the forecast system based on their own requirements. If the reliability of the forecast probabilities is unknown, users often subjectively fold additional uncertainty into the probabilities, which reduces the usefulness of the forecast. Even when a forecast has no skill, users need to be made aware of this, whether in raw data or graphical format. It is important to document no-skill forecasts, recognizing that conditional skill may exist for a particular variable and region (i.e., areas with no-skill forecasts during one season may have useful forecasts during a different season). In addition, users may find such information helpful for tracking forecasts over time. Even if a forecast is labeled as “no-skill,” information on the historical climatology can still be provided to indicate the range of possibilities seen in past years.
Beyond providing information on the forecast quality, the forecast variables have to be relevant for the decision maker. Much of the ISI forecast verification to date has involved variables such as the Nino3.4 index, and even more recently the phase of the MJO. These variables are arguably distant from the needs of end users, who might instead require information on precipitation or air temperature over populated regions. Information on seasonal rainfall totals or average seasonal temperature may in turn be less important than the frequency and duration of dry spells or heat waves, or favorable conditions for tropical cyclone formation. Moreover, given that many decisions are triggered by risk of threshold crossings (e.g., not enough rain, overly high temperatures), the events or categories for which probabilities are
provided need to be determined by these conditions. Other decisions might be strongly tied to spatial considerations; for example, in some cases a large basin-scale pattern might be relevant, and in others the evolution of a given variable at a specific model grid-box location might be needed. Some studies have demonstrated significant associations between ENSO events and societally relevant weather variability, such as mid-Atlantic winter storms (Hirsch et al., 2001) and variability in peak wind gusts (Enloe et al., 2004), but these have not been translated into operational products. Operational forecast centers may address some of these needs in collaboration with certain decision or policy makers, but because forecast needs are typically sector-specific and even region-specific, they cannot anticipate every decision setting. If the forecast data and tools are made available as discussed above, it will be possible for users to tailor their own forecasts.
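Probabilities for such threshold-crossing events are commonly estimated from an ensemble forecast as the fraction of members beyond the decision-relevant threshold. The following minimal sketch assumes a hypothetical 10-member ensemble of seasonal rainfall totals; the values, the function name, and the 180 mm threshold are invented for illustration.

```python
import numpy as np

def exceedance_probability(ensemble, threshold, above=True):
    """Probability of a threshold crossing estimated from an ensemble
    forecast: the fraction of members beyond the threshold.

    ensemble: member values for one variable and location
    threshold: decision-relevant value (e.g., a seasonal rainfall total)
    above: True for P(value > threshold), False for P(value < threshold)
    """
    members = np.asarray(ensemble, dtype=float)
    if above:
        return np.mean(members > threshold)
    return np.mean(members < threshold)

# Hypothetical seasonal rainfall totals (mm) from a 10-member ensemble
members = [210., 185., 240., 160., 300., 195., 220., 175., 260., 205.]
p_dry = exceedance_probability(members, 180.0, above=False)  # P(total < 180 mm)
```

Such raw member-counting probabilities inherit any biases of the ensemble, which is one reason the calibration and reliability information discussed above matters to users.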
While not all end-users of the forecasts will be willing or able to tailor forecast information themselves, it is necessary to remember that users of climate prediction information encompass more than end-users or decision makers. Sectoral scientists who develop system analyses and decision models use climate prediction information as input to their models. Climate scientists who conduct research in areas including process studies, multi-model ensembles, and downscaling also use climate prediction data. These scientists add considerable value to the development and improvement of the ISI prediction process. By working together with the end-users or decision makers, such intermediaries can help society realize the value of ISI forecasts.
ACCELERATED SYNERGY WITH THE RESEARCH COMMUNITY
There are too many major science directions for possible improvements of operational systems to be examined and implemented by either the operational centers or the research community alone. Each community has its own strengths and purpose: operational centers excel in creating robust and reliable forecasting systems using state-of-the-art models, observations, and data assimilation systems; academic researchers excel in developing new ideas and approaches. Large-scale scientific challenges can often exceed the capacity of each group acting alone, whether because of extreme infrastructure demands (e.g., computational resources), ancillary but important expertise (e.g., satellite or in situ measurement development), or the need for interdisciplinary approaches and expertise. This necessitates collaboration. Efforts to improve ISI forecasting should enhance communication and interaction between these communities while drawing on their complementary strengths and differentiated roles.
In terms of accelerating this synergy, the committee noted two positive examples. First, ECMWF holds an annual targeted workshop focused on improving some particular element of the operational prediction system (e.g., the data assimilation system, estimates of forecast uncertainty). External visitors are invited not only to make presentations but, more importantly, to chair breakout groups that provide detailed and specific recommendations to the center. This activity is effective because the outside community welcomes the opportunity to affect and improve operational prediction, and because the center is committed to being responsive to the breakout group recommendations. The second example is the ongoing NOAA/NCEP Climate Test-bed seminar series. This particular seminar series has speakers from both the operational centers and the outside research community. The location of the seminars rotates through COLA (Center for Ocean-Land-Atmosphere Studies, Calverton, MD), ESSIC (Earth System Science