Systematic evaluation of performance is crucial for any public program, including research programs. Political leaders and program managers want and need regular, accurate information on what programs are or are not accomplishing; well-conducted evaluations can provide information for refining or revising program design. For program managers, evaluation can be a source of organizational learning and improvement. Stakeholders care greatly about what a program produces. And formal mandates, such as the Government Performance and Results Act (GPRA), require the regular identification of program metrics and provision of information on program performance.
In this chapter we focus on internal evaluation: What the National Oceanic and Atmospheric Administration (NOAA) should do to assess how the Sectoral Applications Research Program (SARP) is performing. Because SARP is a new and small program and one focused on research, it needs evaluative methods and criteria that are appropriate and feasible for a program with these characteristics. We begin by presenting a brief look at the textbook approach to evaluation and then assess the extent to which such an approach is appropriate for SARP. The results of this consideration shape the approach that we recommend.
TEXTBOOK PROGRAM EVALUATION
The Formal Model
In theory, a program should be assessed against the stipulated outcomes it was meant to produce. A full program evaluation would include a process evaluation, which assesses the quality, consistency, and comprehensiveness of a program’s implementation, and an outcome assessment. The data for the assessment would include valid and reliable quantitative measures of the desired outcomes. For programs aimed at achieving a variety of results, metrics could be included for all of them. Ideally, outcome data are available regularly, in time series, so that routine review of progress for both formative and summative evaluations can be undertaken.
Textbook evaluations presume a fully developed causal model that includes all the factors (including other public programs) that can contribute to the outcomes of concern. Only if all these influences are taken into account is it possible to determine the extent to which the program itself independently influences the results. The most convincing demonstrations of cause and effect depend on experimental and quasi-experimental research designs (see, e.g., Campbell and Stanley, 1966). When experimentation is not feasible, evaluations can measure a broad range of influences and statistically separate the effects of the program from the effects of other variables.
All these methods require a large number of cases, with the program applied in some and not others. They also depend on having policy objectives that are clear, unambiguous, and noncontradictory and on having all the required data. When these characteristics are not present, evaluation is much more complicated.
In fact, the textbook approach to evaluation has been possible only with some medical, public health, and social programs in which well-defined interventions are used in fairly large populations with well-defined objectives. And even in some of these programs, the evaluation has been difficult because the policy is vague or has multiple, partially incompatible goals (such as prison programs aimed simultaneously at punitive and rehabilitative outcomes). Outcome measures are also likely to be prone to multiple interpretations or to be controversial among the stakeholders.
Evaluating Research Programs
Research programs are often particularly difficult to evaluate by the textbook model (see Bozeman and Melkers, 1993; National Research
Council, 2007a). One reason is that the outcomes of research are various, and the paths to those outcomes are both varied and poorly understood. Thus, successful research activities can produce different kinds of outcomes, and any individual outcome measure is likely to be an imperfect evaluation tool. Moreover, concerns are commonly raised about the validity of the more readily available quantitative measures of research outcomes, such as citation counts, reputational studies, and so forth. Another reason for the inapplicability of textbook evaluation for research programs is that the fruits of research are rarely visible in the near term. A third reason is that applied research, such as that supported by SARP, has both scientific and societal objectives, so that specifying the outcomes and determining the appropriate metrics for them is very complex.
A number of approaches for assessing the outcomes of research programs, including research programs focused on science and technology utilization, have been used with some success (see Youtie et al., 1999). These include comparisons between those who use the research results and those who do not and identification of the reasons for use and non-use; studies of the effects of the program on networks of scientists and users; and an emerging “research value mapping” approach that examines the various ways a research program can produce value and then assesses effects using both quantitative and qualitative methods (see http://www.rvm.gatech.edu/aboutrvm.htm [accessed August 2007]). The research value mapping initiative aims to evaluate both the outputs produced by such a program and the capacity—the scientific and human capital generated. Such capacity could be seen in enhanced cognitive skills, knowledge, or craft skills of those involved (Bozeman and Kingsley, 1997; Bozeman et al., 2001).
This brief review makes clear that there is no single, cookbook approach even to standard program evaluation and that such evaluation is far from a trivial undertaking. Evaluation of research programs is likely to be more complex than evaluation of large-scale operating programs. Because of these difficulties, programs are sometimes advised to spend approximately 10 percent of their annual budgets for evaluation (e.g., U.S. Government Accountability Office, 2002).
PRACTICAL CHALLENGES IN EVALUATING SARP
A number of features of SARP and its context suggest the need to carefully consider what can and should be expected in evaluating SARP. Most obviously, SARP is a research program, and as such, it is difficult to know which outcomes to expect, especially in the short term. This issue is a familiar one that scientific research programs face, including programs focused on climate change research (National Research Council,
2005c). Also, SARP is a new program. Techniques for evaluating research programs, like the value-mapping approach, often require data developed over an extended period of time—in short, they can be used only for mature research programs. An evaluation of the California Irrigation Management Information System (CIMIS), a program in the Office of Water Use Efficiency of the California Department of Water Resources (see http://www.cimis.water.ca.gov/cimis/welcome.jsp), based its conclusions largely on a comparison between conditions when users began taking advantage of the system—as far back as 1982, when the program began—and recent conditions (Parker et al., 1996, 2000). Similarly, evaluations of the Sea Grant Program have had the benefit of being able to use an extensive time horizon going back to 1967 (e.g., National Research Council, 1994). In addition to lacking a track record, SARP is not connected to a causal model that can be used to identify expected outcomes from program inputs and outputs. Moreover, because the purpose of SARP is to generate new kinds of practical outcomes from climate research in diverse decision-making settings, with different kinds of decision makers and at different levels of analysis, it is not obvious in advance who will be affected or how their decisions may be changed. In this situation, the relevant outcome measures cannot be specified.
The relevant causal model for generating expected outcomes would be a model of human decision making. However, given the highly diverse decision contexts faced by such actors as floodplain managers, farmers, urban planners, and insurers, different decision models are likely to apply in different settings. Which decision model or models are the right ones to use is simply not known. Moreover, the outcomes of a use-inspired research activity are likely to be quite different from the outcomes of a network-building workshop. Thus, assessing the outcomes of SARP will require different metrics for different elements of the program, as well as a fairly open-ended assessment process to allow for the possibility of very different kinds of benefits in different contexts.
Identifying the SARP “treatment” that is to be evaluated is also problematic. We recommend that SARP support three different kinds of activity—use-inspired research projects, workshops, and pilot projects—all of which have different objectives and therefore require different causal models as a basis for evaluation and also require assessment against different metrics. Developing these different models and the associated metrics presents significant assessment challenges. (See Box 5-1 for a summary of input, process, output, outcome, and impact metrics for assessing climate change programs generally.)
Some additional challenges for evaluation also deserve mention. One concerns the scale on which outcomes may appear. Climate change is by definition global, so that its costs, and the benefits of improved decision
BOX 5-1
General Metrics for Assessing Climate Change Programs
Input Metrics (measure tangible quantities put into a process to achieve a goal)
Process Metrics (measure a course of action taken to achieve a goal)
Output Metrics (measure the products and services delivered)
Outcome Metrics (measure results that stem from use of the outputs and influence stakeholders outside the program)
Impact Metrics (measure the long-term societal, economic, or environmental consequences of an outcome)
SOURCE: National Research Council (2005c:6-7).
support systems, may occur much more broadly than where a program activity is initially targeted. In principle, the evaluation of SARP should take international ramifications into account, although in practice this almost certainly is not feasible.
Another challenge is that part of SARP’s mission is to generate connections that involve networked links across actors and organizations. Thus, some of the benefits of SARP may be realized through changes in other agencies and organizations at the federal, state, and even local levels. Such benefits are likely to be hidden or undervalued in most kinds of evaluations. As the U.S. General Accounting Office (GAO, now the Government Accountability Office) has noted, the GPRA process does not effectively address questions about program performance under these conditions (U.S. General Accounting Office, 1999:32):
Allocating funding to outcomes presumes that inputs, outputs, and outcomes can be clearly defined and definitionally linked. For some agencies, these linkages are unclear or unknown. For example, agencies that work with state or local governments to achieve performance may have difficulty specifying how each of multiple agencies’ funding contributes to an outcome.
To the extent that SARP’s success relies on the effective collaboration of multiple actors, especially organizations that span sectors, levels, and functional specialties, the usual processes for evaluating government programs under GPRA have serious limitations (for further analysis of the broader point, see Meier and O’Toole, 2006:63-64).
In addition, SARP operates in a “crowded” policy space in which multiple agencies are players and their collaborative action may be essential in delivering desired outcomes. Distinguishing SARP-specific outcomes from those that are a result of other agencies’ initiatives may be exceedingly difficult and costly to accomplish.
Finally, we note the important difference in objectives between an evaluation carried out for assessing results for possible reprogramming of budget monies (the usual purpose of evaluations for the U.S. Office of Management and Budget [OMB]) and an evaluation conducted for organizational learning within NOAA. Given the newness of SARP and the uncertainties about the nature of its possible benefits, such learning is an important objective for any evaluation of SARP.
A MONITORING APPROACH TO EVALUATION
Because the standard evaluation approaches are not appropriate for the Sectoral Applications Research Program, we recommend that evaluation questions for the program be addressed by a monitoring program.
Monitoring requires the identification of process measures that could be recorded on a regular (for instance, annual) basis and of useful output or outcome measures that are plausibly related to the eventual effects of interest and can be feasibly and reliably recorded on a similarly regular basis. Over time, the metrics can be refined and improved on the basis of research, although it is important to maintain some consistency over extended periods with regard to at least some of the key metrics that are developed and used.
Such a monitoring emphasis would likely satisfy congressional mandates such as those of GPRA and the needs of OMB. Although it would not provide the ideal information to facilitate organizational learning for NOAA, such a monitoring system could nevertheless help to catalyze certain forms of learning: for instance, by noting apparent progress or lack of progress in developing some of the early and intermediate results anticipated by the program’s managers and thereby leading to directed searches for better project designs or decisions to redirect funding toward project types that have shown the greatest apparent payoff in outputs.
In considering a practical approach to assessing SARP and its progress, it is important to bear in mind that the overall mission SARP was created to support requires a much broader range of research activities and a much greater level of investment than is available in the current SARP budget. Thus, it is important to assess SARP against reasonable expectations for what can be achieved within its areas of activity. In terms of the metrics identified in the National Research Council (2005c) report on this topic, the inputs to SARP are seriously limited, which puts corresponding limits on expectations for outputs, outcomes, and impacts. The following discussion is therefore organized around the three lines of activity we recommend that SARP emphasize in the next several years. It also includes our ideas on how to collectively assess the progress of these activities.
As detailed in Chapters 3 and 4, we recommend three lines of activity for SARP: a limited program of use-inspired social and behavioral science research to inform climate-related decisions in sectors defined by resources or decision arenas; workshops; and, at some point following the first year of workshops, one or more multiyear pilot projects aimed at facilitating existing networks or initiating new ones, to support and study the evolution of sectoral knowledge-action networks of decision makers and scientists. All three activities have some similar long-term objectives in terms of outcomes: to induce decision makers to consider and use climate information in their decisions and to do so appropriately. Thus, relevant outcome indicators include the extent to which decision makers
in a sector seek out and then use climate information in their work. The eventual impacts of the use of climate information may be very difficult to determine and will certainly vary by sector and type of decision. However, a properly designed monitoring effort can track certain kinds of output and outcome metrics that are related to the key impacts of interest.
The three lines of activity are different, however, in how closely tied they are to the shared practical objectives. Pilot projects can reasonably be expected to change the actual information-collecting and information-using behavior of participating decision makers, and perhaps the information-collecting behaviors of participating scientists. Workshops may lead to establishing better communication between the producers and users of climate information, but other behavioral changes may occur only after effective communication has been in place for a while. The recommended research can improve understanding of these communication and behavioral processes. This understanding is an important outcome in its own right, and it should be evaluated as a contribution of SARP to basic science. In addition, the recommended research is intended to contribute indirectly to practical outcomes of importance to SARP, possibly by helping SARP do a better job of selecting promising projects or helping those who run workshops or pilot projects do so more effectively. It may also change understandings of the process of linking science to its users in ways that eventually alter some of the criteria for program evaluation. Thus, different kinds of activities require somewhat different metrics and different interpretations of the metrics. We begin by discussing pilot projects, which are most consonant with the program’s desired practical outcomes, and then discuss workshops and use-inspired research.
Metrics for Monitoring Pilot Projects
An assessment of outcomes would seem to be especially appropriate for monitoring the performance of a pilot project devoted to the development of a knowledge-action network. Two types of outcomes are likely to result from successful efforts: (1) climate-related data will increasingly reach and influence target audiences of decision makers or potential decision makers, and (2) there will be increased capacity in decision systems to create decision-relevant climate information and make it available to users, including increased linkages between and among relevant groups and decision makers who could benefit from the use of such information. These outcomes may also be useful for assessing other components of SARP, including workshops.
Reaching and Influencing Target Audiences
Climate-related information may be valid and highly relevant to the needs of decision makers, but it will not influence choices that are made unless it reaches decision makers—and reaches them in a form that can be understood and used. What sorts of metrics might be useful for tapping the extent to which target audiences are being influenced by climate-related information? In this regard, it is helpful to keep in mind several characteristics of useful metrics, as explicated in a recent report of the National Research Council (2005c): “metrics should be easily understood and broadly accepted by stakeholders. Acceptance is obtained more easily when metrics are derivable from existing sources or mechanisms for gathering information.” In addition: “Metrics should assess process as well as progress” (p. 51), and “a focus on a single measure of progress is often misguided” (p. 52).
Among the metrics that could be recorded fairly easily and regularly and that can be captured by minor modifications or additions to existing data systems, five stand out: the number of new partners receiving climate-related information; the variety of users; the number of new decision areas in which climate-related information is involved; the number of, and extent to which, existing models, maps, texts, documents, assessments, and decision routines are modified to integrate climate-change information; and the judgment of target audiences. “New partners” can be considered in terms of individual decision makers, organizational units, and types of decision-making units. For example, in coastal zone management, units could be the number of coastal management organizations that request or receive information from the pilot project. In an effective SARP pilot project, this number should increase steadily over time.
The metric of the variety of users assesses the extent to which climate-related information is reaching a broadening array of decision makers, not merely a count of users. Over time, one would expect SARP as a whole to facilitate the distribution of climate-related information to more kinds of users, especially users previously unfamiliar with the decision relevance of such information, users drawn from very different kinds of decision contexts, users with widely varying experience with such information, users with differing degrees of professionalism (including, for example, laypeople), and users in more widely varying geographic settings. A SARP pilot project should, over time, reach an increasing number of the types of users operating in the sector of the project.
For the metric of new decision areas, a decision area can refer to something as broad as coastal decision making, agricultural decision making, or health-related decision making. In the context of a pilot project in the
coastal management sector, for example, the term could refer to classes or kinds of decision settings, such as decisions about infrastructure, strategic planning, or emergency preparedness. The expectation once again is that SARP will stimulate the penetration of climate-related information into more and more types of decision areas; pilot projects should do so in the sector they target.
For the metric on documents and decision routines, modifications may be relatively easy to track. Currently, for instance, virtually all floodplain maps ignore what is known about the likely effects of climate change on vulnerability to floods. The information is widely known to insurers and other relevant stakeholders, but the key documents on which important land-use decisions are being made in places like New Orleans and Sacramento do not include the best available climate knowledge. Slowly, success in SARP should be reflected in changes in these kinds of documents and other materials to more frequently and more regularly incorporate the best information drawn from climate-change research. The extent of incorporation can be tracked for each major type of document, decision aid, or decision routine if the documents, aids, and decisions can be identified in advance. Proposals for pilot projects should identify target tools or decision routines.
Finally, the judgments of target audiences are a useful metric. Potential users of climate science information can themselves provide valuable information regarding the extent to which their decision-making context has been altered in relevant ways, the kinds of information available and used in making decisions, and the extent to which they are aware of climate information and believe it to be relevant to their decisions. Surveys can be directed to specific types of users and customized with respect to the sorts of decisions and decision settings that are relevant. Focus groups can also be used as a supplement or alternative source of audience data. Since the range of possible users and decisions may be large, surveys would have to be aimed at selected, key target groups. For a pilot project, the target user groups should be known in advance. These effects will take time to become apparent, even in the best cases. Pilot projects should be monitored over several years. Over time, a successful SARP should see increasing knowledge and utilization of climate-related information among such critical groups.
Increasing Capacity in Decision Systems
Improving decision support systems to make use of climate-related information means not simply influencing target audiences, but also expanding the capacity of varied groups of decision makers to consider and use climate information. New capacity includes the creation of networks and communication links that can make this information more readily useful. Four important metrics for assessing capacity are new links among target groups, the emergence of new kinds of organizations or functions, new products, and new investments in networks.
For target groups, one can monitor whether communication links, particularly between scientists and the users of science or groups representing the users, emerge as a result of SARP’s efforts. Are the links sustained? Do they in turn trigger other patterns of networking toward still additional groups of users? Survey data or follow-up assessments, possibly at annual intervals, can help assess the extent to which these kinds of connections have been established. As SARP moves forward, and especially to the extent that the program chooses to emphasize capacity-building approaches, one should expect some linkages to dissolve while others develop, persist, and stimulate still additional connections—and thus additional capacity building.
One way the use of climate science information becomes institutionalized is by the creation of new organizations or organizational roles to fulfill intermediary functions between climate information producers and users. For example, weather forecasters and newsletters for decision-making groups (e.g., farmers, water managers) could begin to offer seasonal forecasts and recommendations for taking advantage of expected unusual seasonal conditions. Professional associations of users (e.g., city managers, floodplain managers) could create new working groups or staff roles for making climate science results accessible to members. Such changes usually take considerable time, but they might be expected from a multiyear pilot project. With still more time for climate information to work its way into a decision arena, monitoring could search for actual use of the information disseminated by the new organizational activities.
Over time, effective knowledge-action networks such as those to be catalyzed by the pilot projects are likely to change the activities of science producers so that some of them create new kinds of outputs to meet users’ needs. In the case of regional decision makers, downscaled climate forecasts are an example of such outputs. For sectoral users, new outputs on the seasonal-to-interannual scale might include snowpack forecasts and estimates of growing season length; on the time scale of climate change, new outputs might include new estimates of the “100-year” and “500-year” flood or hurricane. Social scientists may also produce new outputs taking climate information into account, such as estimates of economic impact, population dislocation, or inequality effects due to future extreme events. Such changes in the behavior of scientists are likely to occur only after years of development of a knowledge-action network and thus might begin to emerge at the end of a multiyear pilot project.
Finally, increased capacity should result in new investment in net-
works. As the users of climate information become convinced over time of its value to them, large organizational users and well-funded associations of climate-affected decision makers may begin to invest money and staff time in developing new climate information products or in intermediary activities to make better use of climate information.
Metrics for Assessing Workshops
The main rationales for the recommended SARP workshops are to identify potential knowledge-action networks, to provide initial incentives for gathering and exchanging information among the producers and potential users of climate-related information, and to assess the feasibility of more sustained networking efforts. Workshops should therefore be assessed against those objectives and, over time and in the aggregate, the workshop activities should lead to more long-lasting, expanded, and substantive networks of the type the pilot projects are designed to help create. Some data to be used as metrics can be gathered from each workshop, and it is important that additional cross-sectional assessment be undertaken so that SARP can begin to understand why some network-building efforts are more successful than others.
We propose six candidate metrics for monitoring workshops:
Participation: Number of potential network actors (decision makers) who participate in the workshop activity, number of types of such participants, range of representation of science producers and users who have the potential to develop a sustainable network.
Partnership: Commitment of assistance or partnership from other potentially relevant organizations, such as extension organizations or professional associations of scientists and decision makers.
Participants’ Judgment: Participants’ overall assessment of the value of the activity, decision makers’ judgment that climate science information can be useful to them, level of interest in continuing to participate, participants’ desire to share information with other types of participants following the workshop, and participants’ willingness to commit resources to continuing the effort.
Changes in Knowledge: If feasible and valid, preworkshop-postworkshop comparisons of participants’ level of knowledge of the relevance of climate-based information to their decisions or of the types and variety of actors with whom they should regularly interact.
Changes in Communication: Increased efforts by scientists involved in the workshop to discuss research results with users involved in the workshop and other users in the same decision arena, increased efforts
of users to seek out scientists for climate-related information, additional meetings of science producers and users outside the workshop.
Capacity Building: Actual plans for and establishment of new forms and channels of network infrastructure (websites, listservs, newsletters, software, in-service training activities, committees or working groups, an executive group or secretariat, etc.), creation of new organizational roles or positions for linking climate science and users.
Many of these indicators are similar to those suggested for assessing pilot projects. However, workshops involve a lower level of program investment over a shorter period of time, and they occur when less is known about the relationship between available climate information and users’ needs and about how to link science producers and users most effectively. Therefore, workshops should be considered as an early phase in the social innovation process, and expectations should be set accordingly.
SARP should also sponsor research across workshops to compare their outputs and outcomes, with a view to understanding the reasons for what are likely to be considerable differences in outcomes. These comparisons can be a major source of learning for SARP. In addition to measuring outputs and outcomes, this effort should include measures of inputs, initial expectations, and process variables for each workshop and a characterization of the decision context being addressed. Thus, it could be useful to gather from participants (or principal investigators) such information as:
characteristics of the projected network (e.g., number of different kinds of actors),
number of total actors,
number of different levels of decision-making responsibility (temporal and spatial scale of decisions involved),
level of initial interest,
evaluation of workshop format,
extent of participation by network participants/invitees in workshop design,
involvement by both climate and social scientists,
balance between scientists and practitioners,
degree of perceived salience of climate-related information to decisions of invitees in advance of the workshop,
extent of prior organization of participants, and
extent of partnership or cofunding for networking initiative.
Metrics for Monitoring Use-Inspired Research
The limited research program that we recommend for SARP should be assessed both for its contributions to basic knowledge and for its contributions to NOAA mission goals.
Contributions to Basic Knowledge
As with other climate research programs, SARP’s research activities should be assessed against relevant input metrics (see National Research Council, 2005c), such as the sufficiency of intellectual and technological foundations to support the research and of resources to execute and sustain the work. The research should be assessed against such relevant process metrics as program leadership and strategic planning, strength of the peer review system, and strength of processes to facilitate the use of research results within the SARP planning process and by relevant outside audiences. The scientific contributions should also be assessed against relevant output metrics for science, such as producing peer-reviewed and accessible results, developing a research community and associated infrastructure to support continued development and dissemination of the use-inspired work generated by the program, and developing institutional and human capacities to address related research issues. The key scientific outcomes from the research are likely to be improved understanding of the processes by which climate-related information comes to be produced in a use-inspired way and the means by which such information comes to be used or not used by those it can benefit. The nature of this understanding is unlikely to be measurable by quantitative indicators because of the nature of the processes being studied. However, it can be assessed qualitatively by advisory groups, and it can be seen in the effects of the knowledge on thinking in the relevant scientific communities.
Contributions to Mission Goals
SARP’s use-inspired scientific activities can be expected to have outputs, outcomes, and impacts related to the NOAA climate science mission. The outputs can be assessed in terms of reports, journal articles, and similar writings that speak explicitly to the efforts in SARP, and in related NOAA programs such as the Regional Integrated Sciences and Assessments (RISA) program, to improve the links between climate science and its potential user communities. The outcomes of successful research efforts are likely to include changes in thinking among NOAA staff and others involved in building and maintaining knowledge-action networks about how best to catalyze the use of climate science information. These outcomes may include increased knowledge-based confidence among those audiences in their ability to organize effective programs. Research may also lead those involved in network-building activities, including those responsible for SARP-sponsored workshops and pilot projects, to organize these activities in new and more effective ways, and it may change the way workshops and pilot projects are evaluated and the way specifications are written to request proposals for such projects.
An important ultimate impact of the research would be more effective integration of climate information into decision making. However, such mission-related impact metrics are unlikely to show discernible progress in the short term, for the several reasons discussed above. On a longer timescale it will be even more difficult, if not impossible, to separate the impacts of research efforts from those of program implementation.
Textbook program evaluations can be very valuable. However, given the small size of SARP, the expectation that desired outcomes will take at least several years to achieve, the multiple types and levels of decisions that could be influenced by climate information, the variety of relevant decision makers, and the multiplicity of programmatic approaches to shaping decision support systems, such an evaluation approach is not appropriate for SARP. Instead, we recommend a monitoring approach.
A monitoring approach aims to record and analyze trends in metrics appropriate to each type of SARP activity (pilot projects, workshops, and use-inspired research). We have drawn on earlier work to identify several possible metrics for each type of activity. Multiple metrics should be sought in a regular monitoring scheme: some that record processes within SARP and some that tap outputs and outcomes. Data should be recorded at regular intervals, perhaps annually. To limit the level of effort devoted to monitoring this small program, monitoring should rely whenever possible on existing data sources and on data that can be collected reliably without substantial time and resources. Representatives of the target audiences should be asked to contribute to decisions about the details of data collection and the surveys that would be most useful for monitoring SARP performance.
We recognize that because the program is small and its context is rapidly changing, any form of evaluation will be challenging. Nevertheless, it is important for SARP to be able to learn from experience. It is therefore worthwhile to conduct careful comparative research on the results of major SARP initiatives and to seek to understand how outputs and outcomes are affected by program inputs, characteristics of the decision arenas, and other factors.