6
Metrics for the Climate Change Science Program
The previous chapter shows how the committee developed metrics for specific Climate Change Science Program (CCSP) objectives. This chapter proposes a set of metrics to assess progress of any CCSP program element and guide future strategic planning.
DEVELOPMENT OF GENERAL METRICS
Comparison of all the example metrics created by the committee showed that process and input measures tend to be similar in all of the case studies, whereas output, outcome, and impact measures tend to be more specific to the case study goal. However, some of these output, outcome, and impact metrics could be rewritten more generically (see examples in Table 6.1). This observation raised the possibility that a single set of metrics with broad application to the CCSP could be devised.1 Such a set would potentially be far more useful to the CCSP than a long list of highly specific metrics.
TABLE 6.1 Examples of the Way Metrics Specific to Individual Case Studies Were Worded Generically
| Case Study Wording | Generic Wording |
| --- | --- |
| Output Metrics | |
| • Development of a suite of new measurement techniques that are capable of detecting carbon allocation patterns on time scales of (1) hours, (2) days to weeks, and (3) a growing season in response to external variables and photosynthetic rates of plants in control versus experimentally manipulated systems | • The program results in peer-reviewed and broadly accessible results, such as (1) data and information, (2) new and applicable measurement techniques, (3) scenarios and decision support tools, and (4) well-described and demonstrated relationships that improve our understanding of processes or enable forecasting and prediction |
| • Production of a facility that (1) can be put into the field for years at a time and (2) can maintain atmospheric CO2 levels at a specific set point (e.g., 50 ppm [parts per million] above ambient levels), with a precision (averaged over 1 hour) of 5 ppm • Sustainable information systems that make water resource data and information readily available to research and applications users | • Adequate community and/or infrastructure to support the program has been developed |
| Outcome Metrics | |
| • Are the aerosol measurements together with other aerosol research resulting in better understanding of the uncertainties in climate projections due to direct and indirect aerosol processes? • Are the research results leading to lower uncertainties in the historical contributions to sea-level rise and thence to better projections of future sea-level rise? | • The program has led to the identification of uncertainties, increased understanding of uncertainties, or reduced uncertainties |
| • Consistent and reliable projections of vegetation change and climate-vegetation interactions and feedbacks, with well-described sources of error and limitation • A peer-reviewed, published, broadly accepted conclusion about our ability to simulate the twentieth century climate and attribute these variations to specific causes | • The program has yielded improved understanding, such as (1) quantification of important phenomena or processes, (2) more consistent and reliable predictions or forecasts, (3) increased confidence in our ability to simulate and predict climate change and variability, and (4) peer-reviewed, published, broadly accepted conclusions about key issues or relationships |
| • Ability to predict the extent to which a change in climate will significantly affect public health, as measured by an increase in infant mortality rates, declines in human life expectancy, or other factors • Consistent and reliable estimates and forecasts of water resources quantities (e.g., volume of natural water resources, fluxes) to support adaptive management • Technology developed for rapid control of trace gas concentrations at high precision | • The measurements, analysis, and results are being used (1) to answer the high-priority climate science questions that motivated them, (2) to address objectives outside the program plan, or (3) to support beneficial applications and decision making, such as forecasting, cost-benefit analysis, or improved assessment and management of risk |
| Impact Metrics | |
| • Significantly reduced morbidity and mortality rates as a result of improved management of infectious disease • “No-build” zones established between new structures (e.g., roads, railways, houses) and the shoreline protect communities from sea-level rise | • The program has benefited society in terms of enhancing economic vitality, promoting environmental stewardship, protecting life and property, and reducing vulnerability to the impacts of climate change |
The committee tested the concept by first combining the metrics in each category into a master list (Appendix B). The metrics within each category were checked for consistency with the definitions in Box 1.3 and examined for uniqueness, similarity, or overlap. Next, generic wording was developed for process, input, output, outcome, and impact measures, which required some rearranging and grouping. The metrics were written to permit a yes-no answer or a 1-5 score, although other scoring schemes (e.g., the Army’s red, yellow, green light approach)2 could also be used.
The general metrics in Box 6.1 emerged from the iterative process described above. Note that the rankings will have to be defined for each measure, as exemplified in Chapter 2 (see the example 1-5 ranking for metrics in Table 2.3). The type of ranking (e.g., yes or no, 1-5 scale, or some combination) is left to the preference of the program leader.
2 Department of Defense, 2003, Performance and Accountability Report: Fiscal Year 2003, Washington, D.C., p. 381, <http://www.defenselink.mil/comptroller/par/fy2003/00_Entire_Document.pdf>.
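The scoring options described above (a yes-no answer, a 1-5 score, or a red, yellow, green scheme) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `MetricScore` record, the `traffic_light` helper, and the thresholds that map a 1-5 score to a color are hypothetical and are not defined by the committee or by the Department of Defense.

```python
from dataclasses import dataclass
from typing import Union

# The five metric categories used in this chapter.
CATEGORIES = ("process", "input", "output", "outcome", "impact")


@dataclass
class MetricScore:
    """One evaluated metric: a yes/no answer or a 1-5 score, plus commentary."""
    category: str
    metric: str
    score: Union[bool, int]  # True/False for yes-no metrics, 1-5 otherwise
    commentary: str          # explanation of what the score means

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        # bool is a subclass of int in Python, so accept it before range-checking.
        if not isinstance(self.score, bool):
            if not (isinstance(self.score, int) and 1 <= self.score <= 5):
                raise ValueError("score must be yes/no or an integer from 1 to 5")
        if not self.commentary.strip():
            raise ValueError("a commentary must accompany every score")


def traffic_light(score: Union[bool, int]) -> str:
    """Map a score to red/yellow/green. Thresholds are illustrative assumptions."""
    if isinstance(score, bool):
        return "green" if score else "red"
    if score >= 4:
        return "green"
    if score == 3:
        return "yellow"
    return "red"


# Hypothetical evaluation of one generic outcome metric from Table 6.1.
example = MetricScore(
    category="outcome",
    metric="The program has led to the identification of uncertainties, "
           "increased understanding of uncertainties, or reduced uncertainties",
    score=4,
    commentary="Hypothetical commentary explaining why a 4 was assigned.",
)
print(traffic_light(example.score))
```

Requiring a non-empty commentary in the record reflects the point made in this chapter that an explanation should accompany each answer or score; the actual definition of each ranking level would still have to be written for each measure.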
Box 6.1
Process Metrics (measure a course of action taken to achieve a goal)
Input Metrics (measure tangible quantities put into a process to achieve a goal)
Output Metrics (measure the products and services delivered)
(c) new and applicable measurement techniques, (d) scenarios and decision support tools, and (e) well-described and demonstrated relationships aimed at improving understanding of processes or enabling forecasting and prediction.
Outcome Metrics (measure results that stem from use of the outputs and influence stakeholders outside the program)
Impact Metrics (measure the long-term societal, economic, or environmental consequences of an outcome)
Scientific programs yield a continuum of products and activities. Therefore, distinguishing between output and outcome measures and between outcome and impact measures requires some care. In this report, output metrics are tangible products and services, including scientific results and new techniques, capabilities, or infrastructure. Outcome metrics are broader results, such as improved scientific understanding or reliable forecasts, that influence stakeholders outside the program, including scientists working in other fields, resource managers, and policy makers. The level of influence of impact metrics is even greater and includes the long-term results of actions by managers, policy makers, and science and business leaders.
ROBUSTNESS OF THE GENERAL METRICS
The applicability of the general set of metrics was tested by applying them to four CCSP program elements or related programs at different scales:
- CCSP Question 4.1: To what extent can uncertainties in model projections due to climate system feedbacks be reduced?3
- Assessment 2.4: Assessment of trends in emissions of ozone-depleting substances, ozone layer recovery, and implications for ultraviolet radiation exposure and climate change.4
- Chapter 11, Goal 2-related issue: Develop resources to support adaptive management and planning for responding to climate variability and climate change, and transition these resources from research to operational application.5
- CCSP and U.S. Global Change Research Program (USGCRP) in its entirety.6
The test cases were not intended to actually assess progress in four particular program elements. Such an assessment is beyond the committee’s charge and capability and is left to CCSP program managers and appropriate stakeholder groups. Consequently, the committee’s answers, scores, and associated explanation of the metrics of the test cases are not provided. However, some insights into the application of the general metrics to the four test cases are given below.
For each test case, the committee evaluated the use of the metrics by actually answering or scoring each question. In some cases, especially those involving complex or qualitative measures, significant explanation describing the progress and performance had to accompany the answer or ranking. These tests led to some iteration and improvement in the wording, but the committee concluded that the general set of metrics is robust and could be used to measure progress and guide strategic thinking across the entire CCSP.
Test Case 1: CCSP Question 4.1 (To what extent can uncertainties in model projections due to climate system feedbacks be reduced?). In this test case, the process metrics generally received low scores because there is little focused planning and leadership is spread across different modeling programs. Although valuable collaborations exist at the project level and through international programs, their purpose is to better understand the model results, rather than to direct modeling efforts. However, a number of the output and outcome metrics received high scores because the models are improving and are leading to improved understanding of the climate system. Many of the outcome metrics and all of the impact metrics were difficult to score because they require qualitative judgments, such as peer review or stakeholder assessment. Moreover, it may take decades to assess the impact of model improvements.
The fact that good scientific outcomes are sometimes possible without extensive institutional planning or focused leadership is not surprising. Scientists are generally capable of identifying research approaches and developing grass roots collaborations without formal direction, as long as the agencies maintain an environment that promotes discovery and innovation. However, strategic planning can speed scientific outcomes, as illustrated in the second test case below.
Test Case 2: Assessment 2.4 (Assessment of trends in emissions of ozone-depleting substances, ozone layer recovery, and implications for ultraviolet radiation exposure and climate change). The committee used the history of stratospheric ozone depletion from the mid-1970s to the mid-1980s as summarized in Chapter 2 to evaluate this test case. Once the stratospheric ozone program emerged from the discovery phase in the mid-1970s, it led to numerous scientific and policy successes. The high scores given retrospectively to metrics across the board are in agreement with this perception. This test case suggests that the combination of a strong plan, an active research community enabled to make scientific discoveries, and leadership committed to using the scientific output leads to a highly successful program.
Test Case 3: Chapter 11, Goal 2-Related Issue (Develop resources to support adaptive management and planning for responding to climate variability and climate change, and transition these resources from research to operational application). The National Oceanic and Atmospheric Administration’s (NOAA’s) Regional Integrated Sciences and Assessments (RISA) program7 was used to evaluate this test case. The RISA program supports research on climate-sensitive issues of concern to decision makers and policy planners at a regional level. The research, which is largely carried out by seven university-government-private sector consortia, focuses on fisheries, water, wildfire, agriculture, public health, and coastal restoration. Examination of the RISA program revealed the presence of a plan and appropriate leadership, but few peer-reviewed results and limited funds to promote discovery and innovation. Although the RISA program enables significant output related to stakeholder needs, weaknesses in process (peer review) and input (sufficient support) limit the ultimate outcomes.
The committee found it difficult to score a number of the metrics in this test case because the information needed to make the evaluation (e.g., use of results outside the program) has not been collected. This will likely be the case for the first few evaluations of any program. Experience will show which metrics are most important and what information is needed to evaluate them regularly.
Test Case 4: CCSP and USGCRP in Its Entirety. The analysis of the CCSP-USGCRP as a program revealed a different set of issues. Many of the process metrics reveal weaknesses. Although the CCSP has central leadership, the day-to-day leadership of its many programs and activities is distributed among different agencies. Distributed leadership also affects many of the factors that are related to other process metrics, such as priority setting and establishment of peer review systems. The success of the applied parts of the program in particular may well fall short without better coordination and leadership.
Scores on the input metrics were mixed. The comprehensive nature of CCSP goals ensures that many aspects of the program will be resource limited. However, although funding is insufficient to accomplish everything in the plan, it allows both unfettered and mission-oriented research. As a result, the program has produced significant outputs and outcomes.
Assessing the impacts of such a far-reaching and complex program presents a considerable challenge. First, impacts depend on a number of factors (e.g., politics, technological advances), many of which are not connected to the CCSP. Only a fraction of the scientific outcomes may have significant impact on policy and decision making, and those outcomes will themselves depend on the success of many other program elements. Second, it may take decades to assess the impact of the CCSP and its predecessor USGCRP. The two- to four-year time frame of the CCSP milestones, products, and payoffs limits the number of impacts that the CCSP can claim. Nevertheless, it is clear that the CCSP-USGCRP has made substantial contributions to the global debate on climate change. With the perspective of time, the magnitude of this impact will become clearer and is likely to grow substantially.
USE OF GENERAL METRICS TO SET PRIORITIES
One of the more difficult problems for agency managers and Office of Management and Budget (OMB) budget examiners to address is setting priorities among different types of programs in the absence of an overarching national strategy on environmental science issues. The general metrics may provide a useful starting point for choosing between different projects. They could be applied to each project and the results (scores plus commentary) compared. The comparison is simplest when similar program elements are being considered, such as land and ocean observing programs. In such cases, the factors needed to measure process, inputs, and outputs are similar and the comparison is straightforward. However, even when the goals (and thus the process, inputs, and outputs) are different, the general metrics facilitate identification of the strengths and weaknesses of different programs, including the readiness of a program to advance beyond the discovery stage or the effect of resource limitations on particular parts of the program. Insights gained from such a comparison could provide the basis for a more informed discussion of priorities than currently exists.
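As a rough illustration of the comparison described above, the sketch below averages hypothetical 1-5 scores by metric category for two programs and reports the difference in each shared category. The function names, program names, and all scores are invented for illustration. The approach (a simple mean per category) is only one possible assumption, since the report does not prescribe how scores should be aggregated.

```python
from statistics import mean


def category_means(scores):
    """Average 1-5 scores by category.

    `scores` is a list of (category, score) pairs; returns {category: mean}.
    """
    grouped = {}
    for category, score in scores:
        grouped.setdefault(category, []).append(score)
    return {category: mean(values) for category, values in grouped.items()}


def compare_programs(scores_a, scores_b):
    """Return the per-category difference (A minus B) for shared categories."""
    means_a = category_means(scores_a)
    means_b = category_means(scores_b)
    return {c: means_a[c] - means_b[c] for c in means_a if c in means_b}


# Hypothetical scores for two similar program elements.
land_observing = [("process", 2), ("process", 4), ("input", 3), ("output", 5)]
ocean_observing = [("process", 3), ("input", 4), ("output", 3)]

for category, diff in compare_programs(land_observing, ocean_observing).items():
    print(f"{category}: {diff:+}")
```

A numerical difference of this kind would only flag where two programs diverge. The commentary attached to each score would still be needed to interpret why, particularly when the programs have different goals.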
CONCLUSIONS
Overall, the committee found that the general metrics listed in Box 6.1 provide a useful starting point for CCSP program managers to assess program performance and identify barriers to progress. Tests performed by the committee suggest that the general metrics are also likely to be applicable to any science program that has established goals. The list appears to work for programs at all levels of granularity, although not all metrics will apply to all programs. In addition to providing a yes or no answer or a numerical score, the formal evaluation should include a commentary explaining the meaning of the score. Indeed, an explanation of the meaning of the measure is required in Program Assessment Rating Tool (PART) reports.8 This commentary is as important as the specific answer or score.
8 Office of Management and Budget, 2005, Guidance for Completing the Program Assessment Rating Tool (PART), pp. 13–14, <http://www.whitehouse.gov/omb/part/fy2005/2005_guidance.doc>.