Important Points Made by the Speakers
- It is advantageous to introduce evaluative thinking at the beginning of a project.
- Theories of change are important and can evolve over time as people’s understanding of how programs work changes.
- Instead of “did it work or not,” more relevant and useful questions for evaluation are “What aspects worked, what aspects worked less well, what can be scaled up, what could be strengthened, and what can be sustained?”
- Bringing about local change in local places requires an ecology of evidence, with knowledge translated for use and learning in real time.
- Strong evaluations require an investment of resources, time, commitment, trust, and strong relationships.
On the final afternoon of the workshop, two experienced evaluators commented on their ideas for how to design a hypothetical evaluation of a fictitious global initiative that embodied many of the characteristics of the large, complex, multidisciplinary, global interventions that were the focus of the workshop. The idea behind the session, noted moderator Elliot Stern, emeritus professor of evaluation research at Lancaster University and visiting professor at Bristol University, was to think through some of the
various designs, methods, and ideas that had been examined and discussed in this workshop.
Stern began the session by providing some details from the description of the fictitious initiative given to the evaluators. The objective of the initiative is to improve safe, reliable, and sustainable access to clean water in the Pacific Andean region. Three partner countries—Chile, Ecuador, and Peru—will select priority outcomes and develop and implement a portfolio of activities and interventions to achieve those outcomes. Funding of $3.4 billion is to be provided cooperatively by USAID, the Canadian International Development Agency, and Fundación María Elena, a fictitious philanthropic foundation described as newly established by a wealthy South American banker, with 10 percent of funding from locally sourced assets in each partner country. During a 1-year planning phase, a stakeholder coalition is to set priorities among such outcomes as improved health and well-being, environmental improvements, better water systems, improved public awareness, and reduced violence and crime due to water disputes. The coalition is also to assess needs, capacity, and current efforts; select a country portfolio of infrastructure investments and interventions; develop a sustainability plan; and develop a data collection plan.
During the 3-year implementation phase, the initiative’s components could include, for example, building infrastructure for water systems; implementing technologies for water and sanitation services at the community and household level; developing and installing technologies for monitoring water quality; education campaigns; and behavior change interventions. A subsequent 4-year extension phase could involve another planning and prioritization process and an increase in the local resource matching requirement to 25 percent. A long-term sustainability phase could follow the 8-year intervention.
The premise presented to the panelists, explained Stern, was that the funders have requested an evaluation for the first 4 years of the initiative—a planning year followed by an initial 3-year implementation phase. The evaluation budget will be approximately $3 million. The main objectives of the evaluation would be to assess the effects of the initiative on the availability of and safe access to clean water and on other priority outcomes selected by partner countries, to assess the operational performance of the initiative, and to inform the planning and implementation of the extension and long-term phases. Additional evaluation aims might include assessments of each country’s process for prioritization and planning, local match requirements for funding, multisectoral participation, planning for sustainability, and the potential to adapt the model for other regions, such as Central America or East Africa.
Water issues address very specific health problems and have a major impact on health inequities, and this hypothetical initiative has “aspirations written all over it” but without specificity on how to get there, said Sanjeev Sridharan, director of the Evaluation Centre for Complex Health Interventions at the Li Ka Shing Knowledge Institute at St. Michael’s Hospital. But that is not unusual. “That is the nature of 90 percent of the interventions I evaluate,” said Sridharan. Indeed, a lack of specificity provides an opportunity to introduce evaluative thinking at the beginning of a project rather than the end, but he argued against spending excessive time thinking about the best design at the beginning of a project, when the project is still being developed. Large-scale complex programs inevitably have designs never implemented before, but they have components that are familiar. The package of familiar components and how they coalesce is what makes an intervention complex. This project, he said, “is begging for some developmental evaluation, where the evaluation team itself participates in the development of an intervention.”
Charlotte Watts, head of the Social and Mathematical Epidemiology Group and founding director of the Gender, Violence, and Health Centre in the Department for Global Health and Development at the London School of Hygiene and Tropical Medicine, agreed, suggesting that national researchers from the countries where the intervention will take place should be part of the evaluation from the very beginning to embed the element of capacity building into the evaluation. Success takes time, and engaging stakeholders on the ground is a good way to get moving in the right direction. Evaluators need to move away from the idea of a single “best design.” Interventions and designs need to be treated as portfolios from which new knowledge can emerge.
But great plans do not equal great implementation. Implementations need structures and support systems to produce improved health and well-being, Sridharan observed. He was taught, as an evaluator, to pretend that interventions are well formed from the beginning, but after 20 years of experience he has yet to find a well-formed intervention on day 1. “Interventions are complex, they’re dynamic, they change over time. In fact, they should be—that’s what learning implies.”
Sridharan also pointed out that evaluators need to think through timelines early on to be realistic with funders and others about what results evaluations can deliver and when. People in communities have been thinking about the problems they face for a very long time, yet administrators can want results from an evaluation in very short time periods. During the discussion period, Watts also pointed out that such time pressures can undermine research in many ways. Evaluators need time to think and pretest an approach if they are to deliver a rigorous evaluation, but funders can overlook this need. “The risk is that we say something doesn’t work when actually it just hasn’t had the time to kick in and have an effect.”
Watts noted that it is important to understand the outputs that the evaluation is meant to achieve, and that while in this scenario there are multiple specific objectives, there are also broader public good elements that often accompany large-scale evaluations. These can include informed, intelligent intervention delivery; increased capacity, strengthened networks, and ownership of programs through nationally led evaluation and research; and the use of monitoring and evaluation data by program staff and practitioners.
Watts was struck by the three purposes of evaluation stated earlier in the workshop by Chris Whitty: (1) assurance to the paymasters who are funding the evaluation; (2) course correction, which is oriented more toward good management, improving programs through learning by doing; and (3) impact evaluation. Watts questioned whether these evaluation purposes are mutually exclusive, because in her view most evaluations should be striving to accomplish all three. In the end, stated Watts, program effectiveness cannot be reduced to a closed-ended question about whether “it worked.” An evaluator who translates evaluation findings as a simple “this works or it does not work” can be seen as passing judgment on the life work of the implementer, which is not a good start to a working relationship. Often people are implementing combinations of programs that include some proven elements, so evaluators need to make sure that, in addition to answering the questions the donors want answered, they also think about the questions that programs really want answered, and explicitly include those in the evaluation design. Perhaps the evaluation questions should be more nuanced: Can you do it at scale? Can you do it with this population? Can you sustain it? To Watts, this provides a greater space for the framing of questions, the evaluation design, and partnership between evaluators and program staff.
Both evaluators emphasized the critical importance of understanding contextual factors for the evaluation. The relationship between context and the desired outcomes is important for intervention and evaluation designs. Sridharan noted it is best to bring the knowledge of context in at the start, but reminded the audience that we have to be evolving and adapting over time. He noted that evaluations tend to be based on the premise that the world is understood, but this is not the case. Knowledge of context therefore needs to continually inform evaluations so they can evolve over time.
Furthermore, Sridharan noted that for this intervention, paying attention to interdependences will be critical, given the variations in geography,
the presence of extractive industries, and possible disputes. The countries in which the interventions are being implemented have various tensions and problems, and because water does not necessarily follow national boundaries, context extends to surrounding countries, such as Argentina and Bolivia, that are inevitably going to be involved as well. Ongoing agreements will be a crucial factor in the planning. Sridharan said that a cooperative funding stream can produce powerful partnerships, but a mechanism is needed to develop these partnerships. The presence of a large amount of funding can actually be an impediment to developing partnerships. The planning process will also need to pay attention to needs, capacities, infrastructure, local assets, implementation, political power, and project portfolios.
In response to a question from Stern about whether evaluators should stay strictly independent throughout an evaluation, Sridharan argued for a more nuanced position. Sridharan pointed out that most evaluators do not work directly in program settings. He also pointed out that program staff are generally among the most critical observers of their programs. It does not take a faraway researcher to be objective about a program. “That’s not fair, and it’s condescending. More and more, these folks are quite self-critical.” Degrees of independence can be approached in phases. Early in a project, an evaluator may be able to provide valuable input to program staff as they design or modify an intervention. After this developmental phase, evaluators may need to achieve more independence from a program to deliver unbiased results, even if that means altering a relationship over time. Much of the time, a workshop participant pointed out, the implementer and the evaluator are the same person. As Stern added, it may be possible to have different people involved in different evaluation phases to obtain the appropriate levels of independence.
In a later discussion Stern noted that many of the words used in discussing evaluations—such as impartial, objective, bias, engagement, empowerment, subjectivity, and intersubjectivity—have strong histories and deserve much more attention and thought in evaluations than simply “are they independent or not.”
The implementation of an intervention is a journey that begins with a theory of change. This theory in turn draws on an evidence base derived from prior journeys. The idea that global health initiatives can be judged a thumbs up or thumbs down after just a few years is nonsensical, noted Sridharan. Learning frameworks and pathways of influence are essential.
The theory of change will evolve over time, and theories of change can be subjected to experiments and quasi-experiments to formalize the learning process. “For learning you need explicit learning structures, and I don’t think we often plan for that.” By embedding evaluations in complex interventions, the evaluation can help improve the intervention, which means in essence that the evaluation is itself evaluated. Sridharan said that he was not opposed to traditional design and measuring impacts. “At the end you have to be saying was this a good investment?” But people rarely go back and think about what they have learned from an evaluation about an intervention. Revisiting evaluation methods annually helps inform both programming and continued evaluation, a point on which Watts and Sridharan agreed. This process makes it easier to identify what to focus on, where to drill down on certain points, and how to collect final evaluation data. It also can contribute to the establishment of systems that encourage the routine collection and use of monitoring data by programs. Monitoring systems and small nested studies could be used to troubleshoot and support the good management of programming.
Besides clarifying a theory of change, Sridharan recommended paying attention to contextual mechanisms, referring back to the earlier presentation on realist evaluation. How are activities controlled? Can the desired interventions really solve the problems that exist? How long will it take for impacts to appear, and what metrics will be used to measure those impacts? Do the metrics provide incentives to stay true to the intervention? What unintended consequences could occur? How does the plan address heterogeneous contexts? How are the aspects of an intervention aligned?
Watts supported spending time at the beginning of a project to develop a theory of change. A theory of change makes it possible to revisit design plans, frame data collection and feedback, and replicate interventions in other settings. In contrast, she was unenthusiastic about the logical framework approach, which she judged to be difficult to use, especially with low-literacy populations or evaluation staff. She recommended trying to keep materials relatively straightforward and usable by program staff and researchers. She challenged participants to be more creative about using evaluation frameworks as a programming and monitoring tool. She was also enthusiastic about the prospects of using smartphones or other new technologies to facilitate rapid collection of output data. Looping that feedback into programs could be very powerful.
The measurement of impacts is a long-term process, said Sridharan, not a one-shot method. Also, a knowledge base should be developed for local interventions, not just for people or policy makers in faraway cities. For this reason, a single store of evidence is not enough; to bring about change in local places, an ecology of evidence is needed. Program theory is also insufficient, in part because of
the inevitable uncertainties in a program theory. Knowledge also needs to be translated for use in real time. “Far too often, our evaluation methodologies are grand reports that are not read by the people whose lives they’re trying to improve,” said Sridharan. “Capacity building is central, because it’s not the grand outsider who’s going to bring about a change.”
Achieving Multiple Evaluation Aims
Evaluations of large-scale, complex, multinational initiatives typically seek to achieve multiple objectives, said Watts. In pursuing these multiple goals, evaluators need to walk a tightrope, she continued. They need to provide the assurances specified in the terms of reference to funders regarding rigorous evaluation of intervention impacts and cost-effectiveness. They may want to derive programmatic lessons about how to scale up effective interventions in other settings and the resources required to do so. They also may want to produce shared open datasets that are amenable to further analyses. In addition, they may aim to increase capacity for locally led evaluation and for the use of monitoring and evaluation data by program staff. All of these tasks overlap, said Watts. “These are things that most evaluations should be striving to do.”
Considerations for Evaluation Design and Methods
When thinking about the design for this complex intervention, there are some things to keep in mind, said Watts. To balance the different demands and multiple evaluation aims, the proposed evaluation should be a prospective, mixed methods study conducted by a multidisciplinary team. The design should include economic evaluation among its methods, and the sampling frame deserves careful consideration: the most vulnerable groups are where change most needs to be observed, and sampling linked to that will require a lot of thought. All countries should be included, with intervention and control communities in each country and, ideally, some sort of randomization. This may not always be possible, but it deserves energy and creative thinking; many interventions can be designed to include some element of randomness from which important information can be learned. For example, a program can be rolled out in a staggered way with a random allocation of where interventions start. To this, Stern added that the appropriateness of randomized controlled trials depends on the questions being asked. For example, trials may not be appropriate for investigations of whether a particular program can be scaled up or customized for a particular setting.
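The staggered rollout with random allocation that Watts describes (a stepped-wedge-style design) can be sketched in a few lines of code. The function name, community labels, and number of periods below are hypothetical illustrations, not details from the initiative:

```python
import random

def staggered_rollout(communities, n_periods, seed=None):
    """Randomly assign each community a start period for the
    intervention: every community eventually receives it, but
    the order of rollout is randomized."""
    rng = random.Random(seed)
    order = communities[:]
    rng.shuffle(order)
    # Split the shuffled communities as evenly as possible across periods.
    schedule = {p: [] for p in range(1, n_periods + 1)}
    for i, community in enumerate(order):
        schedule[i % n_periods + 1].append(community)
    return schedule

# Example: 6 hypothetical communities rolled out over 3 periods.
communities = ["A", "B", "C", "D", "E", "F"]
schedule = staggered_rollout(communities, n_periods=3, seed=42)
for period, group in sorted(schedule.items()):
    print(f"Period {period}: start intervention in {group}")
```

Because communities not yet reached act as concurrent controls for those already receiving the intervention, a design like this preserves an element of randomness even when withholding the program entirely is not an option.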
Watts raised some questions about the best ways to use mixed methods
in order to optimize the quantitative, qualitative, and economic evaluation components. How, she asked, can we ensure that mixed methods are indeed mixed and not merely parallel? Large evaluations tend to build momentum of their own, but qualitative work can be proactively and flexibly nested into quantitative studies by embedding researchers in the program to enable course corrections and the provision of timely data. These efforts can be low cost and influential, and they are appreciated by programs on the ground. Watts challenged workshop participants to think about the best way to incorporate economic questions and questions about resource use into mixed methods models. Evaluations need to strive to achieve key design elements, Watts continued.
In addition to course correction over time, evaluations can also assess the delivery of combinations of “proven packages” at scale, where many different factors determine success or may derail or hinder it. The challenges of program delivery need to be acknowledged, she said, including similarities and differences between settings and the expertise of program staff. Interventions are complex, they have to be adaptive, and research is needed to support that adaptation. Reasonable targets make it possible to work backwards to the measures needed to detect potentially small effect sizes over limited time frames. “These sorts of elements are really important to think through at the start and to have explicit in your design,” said Watts. In this process, it is important to develop a clear sense of the planned delivery chain and potential bottlenecks. There will be variations in coverage and impact, which makes it important to get good measures of intervention exposure and proxies of success over time.
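Working backwards from a target effect size to the sample needed to detect it can be illustrated with a standard power calculation. This is a minimal sketch using the usual normal approximation for a two-arm comparison; the function name and default parameters are assumptions for illustration:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(effect_size, alpha=0.05, power=0.8):
    """Approximate sample size per arm to detect a standardized
    effect (Cohen's d) in a two-arm comparison, using
    n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# Smaller effects over limited time frames demand much larger samples:
print(n_per_arm(0.5))  # moderate effect: 63 per arm
print(n_per_arm(0.2))  # small effect: 393 per arm
```

The steep growth in required sample size as the detectable effect shrinks is one reason Watts stresses setting reasonable targets at the start rather than discovering mid-study that the design is underpowered.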
Sridharan concurred about the need to develop measures of success, noting that these will have to evolve over time. If they do not, an evaluation risks finding what it set out to find or reporting no impact when the impacts are different than expected.
Sridharan also observed that this example is, by its description, a long-term initiative, and sustainability is an explicit goal. No one would expect health in Chile or Peru to change dramatically in 1 or 2 years, he stated, yet the evaluations are going to at least begin during those years. Even $3.4 billion projects need to be scaled up and generalized if they are to have the effect that is ultimately intended. Evaluators have not done a good enough job of examining the concept of generalization in a complex and contextual world, said Sridharan. Initial evaluations therefore have to recognize that evaluations will continue into the longer term and build the capacity for those longer-term evaluations. In this way, dynamic and evolving evaluations can contribute to continual improvements. Stern added that developing capacity extends not only to skills and networks but to developing data capacity. Some parts of a program may even need to be delayed to put in place a monitoring system that will allow subsequent analysis.
Watts also noted that a process of sharing and explaining context and programmatic experience as part of the evaluation design is important for political buy-in to attempt introduction of the intervention into different settings, as well as for facilitating bidirectional learning for evaluators and researchers to generate research questions and document evidence of success or effectiveness. Based on her experiences doing a 10-country study on violence against women, Watts cited the importance of building trust and strong working relationships. For the hypothetical case, she said that she would want to have annual face-to-face meetings along with online communications and debate. These communications should bring program staff together with national and international evaluators, creating a two-way learning process that can motivate both sides. Together, this group could ask what worked well, what worked less well, how to address bottlenecks, and how to share lessons and strengthen programming. Cross-disciplinary, research-practitioner discussions could support a common agenda of making evaluations work to help interventions achieve their greatest impact and deliver programs efficiently.
Capacity building for future evaluators and researchers was raised during the discussion during this session. In thinking about how best to train students to formulate relevant, strategic, and important questions before focusing an evaluation’s design, Sridharan emphasized the importance of humility. People may be trained to use sophisticated evaluation tools, but they may limit the solution space before bringing those tools to bear on a problem. “The first lesson of solution space is to work with communities, be humble, go home and reflect on these issues.” Watts agreed with the need for humility, especially in complex interventions where evaluators need to spend a lot of time understanding the intricacies of a program, especially given that research methods can be blunt tools. Nevertheless, researchers need to be objective and do good science, even as they are invested in the programs they are evaluating. “You care, but the way that you care is by wanting to learn what actually works.” Programs need to learn, and evaluators can help them do so by asking the right questions and having strong research designs.
Rachel Nugent, University of Washington, noted that many students have become interested in what is variously termed implementation science or program science. Funding agencies such as NIH also are becoming interested and are beginning to fund this approach. Stern said that his reading of implementation science is that it remains based on randomization and controlled experiments. But many knowledge gaps exist in such areas as how programs are implemented, how stakeholders are engaged, and what kind
of preplanning needs to take place. Watts pointed out that implementation science involves embedding research to improve programs, which resonates strongly with the approach she has been advocating.
Watts observed more generally that strong evaluations pose a challenge for current public health models of evaluation training and development. “Are the models of public health evaluation that we teach our students broad and flexible enough? We teach them the essence of good evaluation design and randomized controlled trials. [But] as people go into their specializations, are we supporting them to learn how to work effectively as researchers with program[s]? Are we supporting them to be able to bring together different disciplines?… If that skill gets developed, then this sort of intervention model will become more feasible.” Watts observed that in the incentive structures evaluators face, they are rewarded for publishable results based on rigorous designs. When asked to look at issues that are complicated and murky, they may worry about the risks to their careers. The challenge is to create incentives for more difficult evaluations so researchers do not shy away from such work. Strong evaluations require resources, commitment, investments, trust, and strong relationships, Watts concluded, but they can be tremendously beneficial for public health.