require the existence of publicly accessible and comprehensive archives. Documentation and archiving of data, models, methods, and products by operational centers would be an integral step toward assessing and ultimately improving ISI forecast quality. This is especially the case if forecast improvements are to be attributed to specific proposed or implemented changes to the system. The observing systems will evolve, and studies are needed to assess and guide that evolution from the perspective of the role of observations in ISI forecasting.
Given that subjective intervention is a component of many forecast systems, it is important that the objective inputs can be easily separated from the subjective final product for independent analysis and appraisal. This separation is necessary for assessing whether the objective elements are improving and whether improvements in observations, understanding, or models, or some combination, are having a positive impact. Similarly, for forecast systems that combine statistical and dynamical prediction techniques, it is important to be able to separate the contributions from each component.
Evaluating ISI forecast quality requires a set of well-defined model performance and forecast metrics that can be applied to current and future prediction systems. Forecast metrics need to include both deterministic and probabilistic measures. Model performance metrics, which in this case are generally associated with dynamical models, need to include measures of model success in representing the mean climate, forced variability (e.g., diurnal and annual cycles), unforced variability (e.g., ENSO, MJO, PNA), and key physical processes (e.g., convection, fluxes, tropical waves). Multiple metrics are recommended because no single variable or metric can fully characterize model and forecast quality for multiple user communities. The aspects of forecast quality that the metrics should address include, but are not limited to, bias (correspondence between the mean forecast and the mean observation), accuracy (the level of agreement between the forecast and the observation), reliability (the average agreement between the forecast values and the observed values when the forecasts are stratified into different categories, e.g., conditional bias), resolution (the ability of the forecast to sort or resolve the set of events into subsets with different frequency distributions), sharpness (the tendency of a forecast to predict extreme values), and discrimination (the ability of a forecast to discriminate among observations, that is, to predict an outcome more frequently when that outcome occurs than when it does not).
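As an illustration of how several of these aspects can be quantified, the following minimal Python sketch computes bias and accuracy for a deterministic forecast, and reliability and resolution for a probabilistic forecast of a binary event. The function names and sample values are hypothetical. The use of root-mean-square error as the accuracy measure and of the Murphy decomposition of the Brier score for reliability and resolution are assumptions made for this example, not metrics prescribed here.

```python
import math
from collections import defaultdict


def mean_bias(forecasts, observations):
    """Bias: mean forecast minus mean observation."""
    return sum(forecasts) / len(forecasts) - sum(observations) / len(observations)


def rmse(forecasts, observations):
    """Accuracy (one possible measure): root-mean-square error."""
    n = len(forecasts)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n)


def brier_decomposition(probs, outcomes):
    """Murphy decomposition: Brier = reliability - resolution + uncertainty.

    Forecasts are stratified by their distinct issued probability values,
    in which case the decomposition is exact. Outcomes are 0 or 1.
    """
    n = len(probs)
    base_rate = sum(outcomes) / n

    # Stratify the observed outcomes by the probability that was forecast.
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)

    reliability = 0.0  # smaller is better: forecast probability matches observed frequency
    resolution = 0.0   # larger is better: strata differ from the overall base rate
    for p, obs in groups.items():
        observed_freq = sum(obs) / len(obs)
        reliability += len(obs) * (p - observed_freq) ** 2
        resolution += len(obs) * (observed_freq - base_rate) ** 2
    reliability /= n
    resolution /= n

    uncertainty = base_rate * (1.0 - base_rate)
    brier = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / n
    return {
        "brier": brier,
        "reliability": reliability,
        "resolution": resolution,
        "uncertainty": uncertainty,
    }


# Hypothetical sample values, for illustration only.
det_forecasts = [1.0, 2.0, 3.0]
det_observations = [1.5, 2.0, 2.0]
print("bias:", round(mean_bias(det_forecasts, det_observations), 4))
print("rmse:", round(rmse(det_forecasts, det_observations), 4))

event_probs = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
event_outcomes = [0, 0, 0, 1, 1, 1, 1, 0]
for name, value in brier_decomposition(event_probs, event_outcomes).items():
    print(name + ":", round(value, 4))
```

Reporting the components separately, rather than the Brier score alone, is in keeping with the recommendation to use multiple metrics: two systems with the same overall score can differ in reliability and resolution.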
Regardless of which metrics are used, the following properties are necessary for a set of metrics:
Provide the ability to track forecast quality to determine if models are improving. This implies that the uncertainty in the skill statistics needs to be quantified.
Provide some feedback on model strengths and weaknesses in providing an accurate forecast.
Allow forecasts from different systems to be compared to identify which system is superior.
Provide information on metric uncertainty. This allows for forecast consistency to be evaluated.
Include a justifiable baseline of forecast quality for comparison.
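Two of the properties listed above, a justifiable baseline and quantified uncertainty in the skill statistics, can be sketched together. The example below is a minimal illustration under stated assumptions: it uses a mean-squared-error skill score relative to a climatological baseline and a simple percentile bootstrap over forecast cases. The function names, parameters, and sample values are hypothetical, and the bootstrap treats forecast cases as independent, which would need to be revisited (for instance with a block bootstrap) where cases are serially correlated.

```python
import random


def mse(forecasts, observations):
    """Mean squared error between paired forecasts and observations."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / len(forecasts)


def skill_score(forecasts, baseline, observations):
    """1 - MSE_forecast / MSE_baseline.

    Positive values mean the forecast outperforms the baseline; 1 is a perfect forecast.
    """
    return 1.0 - mse(forecasts, observations) / mse(baseline, observations)


def bootstrap_skill_interval(forecasts, baseline, observations,
                             n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the skill score.

    Forecast cases are resampled with replacement, keeping each forecast,
    baseline, and observation triple together.
    """
    rng = random.Random(seed)
    n = len(observations)
    samples = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        f = [forecasts[i] for i in idx]
        b = [baseline[i] for i in idx]
        o = [observations[i] for i in idx]
        baseline_mse = mse(b, o)
        if baseline_mse == 0.0:
            continue  # skill score is undefined for this resample; skip it
        samples.append(1.0 - mse(f, o) / baseline_mse)
    samples.sort()
    lower = samples[int((alpha / 2) * len(samples))]
    upper = samples[int((1 - alpha / 2) * len(samples)) - 1]
    return lower, upper


# Hypothetical sample values, for illustration only.
observations = [0.3, -1.1, 0.8, 1.5, -0.4, 0.1, -0.9, 1.2, 0.6, -0.7]
forecasts = [0.1, -0.6, 0.5, 1.0, -0.1, 0.4, -0.5, 0.7, 0.2, -0.9]
# Baseline: climatology, taken here as the in-sample mean of the observations.
climatology = [sum(observations) / len(observations)] * len(observations)

point = skill_score(forecasts, climatology, observations)
lower, upper = bootstrap_skill_interval(forecasts, climatology, observations)
print("skill score:", round(point, 3))
print("95% interval:", round(lower, 3), "to", round(upper, 3))
```

An interval that excludes zero suggests the forecast improves on the baseline for this sample; overlapping intervals for two systems caution against declaring one superior. In practice the climatology would be computed from an independent period rather than from the verification sample itself.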