OCR for page 77
4
Designing an Impact Evaluation
with Robust Methodologies
This chapter summarizes workshop discussions on methodological is-
sues related to impact evaluation design for the President’s Emergency Plan
for AIDS Relief (PEPFAR) and is divided into three sections. In the first
section, a diverse set of case studies of conceptual models and methodologi-
cal approaches are presented from previous large-scale evaluations—from
the World Bank, the Abdul Latif Jameel Poverty Action Lab at the Massa-
chusetts Institute of Technology (Poverty Action Lab), the UK Department
for International Development (DFID), the Cooperative for Assistance and
Relief Everywhere, Inc. (CARE), and The Global Fund to Fight AIDS,
Tuberculosis, and Malaria (The Global Fund). In the second section, meth-
odological challenges and opportunities of impact evaluation are described
for the measurement of outcomes and impacts specific to human immuno-
deficiency virus/acquired immunodeficiency syndrome (HIV/AIDS), for the
measurement of more general outcomes and impacts, for attribution and
accounting, and for the aggregation of impact results. The third section
summarizes themes common to the approaches.
CONCEPTUAL MODELS AND METHODOLOGICAL
APPROACHES: CASE STUDIES
Impact evaluations require the development of a conceptual model. The
model must be defined, the inputs and outcomes measured, and assump-
tions and conversion factors determined. For prevention of mother-to-child
transmission of HIV (PMTCT), noted speaker Sara Pacqué-Margolis of
EVALUATING THE IMPACT OF PEPFAR
the Elizabeth Glaser Pediatric AIDS Foundation, there is a clear, logical
pathway between access to services, counseling and testing, test results,
prophylaxis by women and infants, and aversion of infections. Assumptions
and conversion factors to be determined for PMTCT can include questions
like the following: What regimens are taken and how effective are they?
Are they actually consumed and when? What is the rate of transmission
during labor and delivery? What is the rate of prevention of infections in
HIV-negative women who come in for counseling? What is the level of in-
fection transmitted through breast milk? Speaker Carl Latkin of the Johns
Hopkins School of Public Health cautioned that although models of change
are needed to guide interventions, sometimes they don’t explain findings.
Models are practical heuristics but should not be blinders, he noted; we
should not let models narrow the way we look at change.
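The PMTCT pathway described above can be written as a chain of conversion factors. The following is a minimal sketch only: the function name, the parameter names, and every number in it are hypothetical placeholders chosen for illustration, not figures from the workshop or from any program.

```python
def infections_averted(pregnant_women, hiv_prevalence, tested_share,
                       prophylaxis_share, baseline_transmission,
                       transmission_with_prophylaxis):
    """Estimate infant infections averted along a simple PMTCT cascade.

    Every argument is an assumption the evaluator must measure or justify.
    """
    hiv_positive = pregnant_women * hiv_prevalence      # women living with HIV
    identified = hiv_positive * tested_share            # counseled, tested, got results
    treated = identified * prophylaxis_share            # actually took the regimen
    # Each treated mother-infant pair faces a lower transmission risk.
    return treated * (baseline_transmission - transmission_with_prophylaxis)


# Hypothetical inputs for illustration only.
averted = infections_averted(
    pregnant_women=10_000,
    hiv_prevalence=0.10,
    tested_share=0.80,
    prophylaxis_share=0.50,
    baseline_transmission=0.30,
    transmission_with_prophylaxis=0.15,
)
print(f"Estimated infections averted: {averted:.0f}")
```

Laying the model out this way makes each assumption explicit, so a weak link in the pathway (for example, low uptake of prophylaxis) is visible rather than hidden in a single summary number.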
Impact evaluations also require the use of methodological approaches.
These can include quantitative, qualitative, and participatory methods and
theory-based program logic. Examples of impact evaluation methods, pro-
vided by speaker Mary Lyn Field-Nguer of John Snow, Inc., include client
satisfaction interviews and surveys, exit interviews, mystery clients, targeted
intervention research, focus groups, and key informant interviews.
The following case studies describe the experiences from evaluations of
five HIV/AIDS assistance programs run by the World Bank, Poverty Action
Lab, DFID, CARE, and The Global Fund. Conceptual models and different
evaluation methodologies are described in the context of each study.
World Bank Evaluation of HIV/AIDS Assistance Programs
Workshop speaker Martha Ainsworth, lead economist and coordinator
of the Health and Education Evaluation Independent Evaluation Group at
the World Bank, described the approach and methodologies used in an in-
dependent evaluation of the World Bank’s HIV/AIDS assistance programs.
The evaluation assessed $2.5 billion of World Bank investments in HIV/
AIDS prevention, care, and mitigation programs between 1988 and 2004
in 62 developing countries. Two objectives of the evaluation were defined:
(1) to evaluate the development effectiveness—or relevance, efficiency, and
efficacy—of HIV/AIDS assistance in terms of lending, policy dialogue, and
analytic work at the country level relative to the counterfactual, or absence
of a Bank program and (2) to identify lessons to guide future activities.
Ainsworth shared the World Bank’s experience in prioritizing what to
measure in evaluation. Although the World Bank has a large portfolio of
complementary programs in education and agriculture, indicators were nar-
rowed down to only those with direct HIV/AIDS outcomes and impacts.
In addition, identifying how lessons from completed assistance were still
relevant to new approaches posed a challenge, given that three-quarters of
the HIV/AIDS assistance programs being evaluated were still in progress. In
assessing a long-term, ever-changing implementation approach over time,
therefore, the World Bank evaluation was designed to select those issues
that were common to all projects, such as political commitment, setting
strategic priorities, multisectoral responses, ministry of health role, use of
nongovernmental organizations (NGOs) in implementation, and monitor-
ing and evaluation (M&E). The World Bank evaluated the projects com-
pleted in the past and examined those issues relevant to ongoing projects.
Through this approach, the assumptions and design of the ongoing portfo-
lio were analyzed and prospectively evaluated. The World Bank was able
to consider design issues and point out where risks had been mitigated and
where problems could be addressed through midstream adjustments.
The World Bank evaluation drew on a number of methodological ap-
proaches. As Ainsworth noted, the World Bank does not rely exclusively on
a single source of information, but rather uses different types of evaluations
already occurring in the context of the work, such as midterm reviews,
completion reports, and annual reviews. Evaluation methods used include
the following:
• Results chain documentation: Inputs, outputs, outcomes, and
impact of government, the World Bank, and other donor efforts were
gathered.
• Time lines: The documentation of timing of efforts was collected,
although in many activities this type of M&E information is lacking.
• Interviews: Some information was elicited from interviews of stake-
holders, other donors, people and staff involved on the ground, and govern-
ment implementers.
• Desk work: The following were collected and analyzed: literature
reviews; archival research; interviews on the time line of World Bank re-
sponse; an inventory of analytic work; a portfolio review of health, educa-
tion, transport, and social protection sectors; and background papers on
national AIDS strategies.
• Surveys: Surveys were conducted of staff members, audiences for
analytic work, project task team leaders, and country directors.
• Field work: Project assessments and case studies—chosen to reflect
different levels of experience and where interventions worked or did not
work—were collected and reviewed. For example, a project in Indonesia,
canceled because the World Bank intervention occurred before anyone was
visibly ill, was chosen for the evaluation, as was a project in Russia, where
only policy dialogue and analytic work were conducted.
Use of Randomized Controlled Trial Methodologies to
Evaluate HIV/AIDS Programs
Rachel Glennerster, executive director of the Abdul Latif Jameel Pov-
erty Action Lab at the Massachusetts Institute of Technology, described
the application of randomized controlled trial methodology to HIV/AIDS
program evaluation. She described the advantages and disadvantages of
randomized trial methodologies and then discussed the results from two
case studies in which randomized methods were used, an evaluation of an
HIV education program in Kenya and an HIV status knowledge program
in Malawi.
Advantages and Disadvantages of Randomized Evaluations
To know the true impact of a program, one must be able to assess how
the same individual or group would have fared with and without an inter-
vention. Because it is impossible to observe the same individual in the pres-
ence or absence of an intervention simultaneously, comparison groups that
resemble the test group are commonly used. Common approaches for selecting
comparison groups include a “before and after” approach, in which
the same group of individuals is compared before and after exposure to an
intervention, and a “cross-sectional” approach, in which, at a single point
in time, a group of countries or communities in which an intervention has
occurred is compared to a “non-intervention” group. However, programs
are usually started in particular places at certain times for a reason, and
they are usually established with the countries, communities, schools, and
individuals most committed to action. Therefore, estimates of program impact
may be biased because it is difficult to find a comparison group that is
as committed as those where the program was established. This may in
part explain why projects typically work well in a few places, but fail when
scaled up. In randomized controlled trials, like medical clinical trials, those
who receive the treatment and the control group are selected randomly. By
construction, those who receive the proposed new intervention are no more
committed, no more motivated, no richer, and no more educated than those
in the control group. Randomized trials produce results that are freer from
bias than other epidemiological studies. Randomized evaluations can be
used to test the efficacy of interventions before they are eventually scaled
up to the national level.
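The selection problem described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the workshop: "commitment" is a hypothetical unobserved trait that raises outcomes on its own, and the effect sizes are invented.

```python
import random


def simulate(randomized, n=20_000, true_effect=2.0, seed=0):
    """Return the treated-minus-comparison difference in mean outcomes."""
    rng = random.Random(seed)
    treated, comparison = [], []
    for _ in range(n):
        commitment = rng.gauss(0, 1)  # unobserved; also raises outcomes
        if randomized:
            gets_program = rng.random() < 0.5   # coin flip, unrelated to commitment
        else:
            gets_program = commitment > 0       # program placed with the most committed
        outcome = 3.0 * commitment + rng.gauss(0, 1)
        if gets_program:
            outcome += true_effect
        (treated if gets_program else comparison).append(outcome)
    return sum(treated) / len(treated) - sum(comparison) / len(comparison)


naive = simulate(randomized=False)  # inflated by differences in commitment
rct = simulate(randomized=True)     # close to the true effect of 2.0
print(f"non-randomized estimate: {naive:.2f}, randomized estimate: {rct:.2f}")
```

When placement follows commitment, the comparison mixes the program effect with the effect of commitment itself; random assignment balances commitment across the two groups, so the difference in means recovers the program effect.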
Randomized trials conventionally have been used to look at drug ef-
fectiveness, but are also being applied to other areas where they are not
commonly used. For example, randomized trials can be used to investigate
social patterns, such as what messages are most effective in changing the
sexual behavior of young girls.
There is a perception that randomized evaluations are difficult both
to implement and to integrate with what is going on at the ground level,
but with innovations in randomization over the past 10 years, randomized
studies are less intrusive and less like more formalized clinical trials. Sev-
eral mechanisms exist to more naturally introduce randomization into the
way a government works or with the way an NGO works on the ground,
including the following:
• Lottery: Randomization can be introduced through a lottery if a
program is oversubscribed.
• Beta testing: Randomization can be introduced through small-scale
experimentation of methods before scaling up to the national level.
• Randomized phase-in over time and space: Capacity or financial
constraints may limit the ability to introduce interventions in all com-
munities immediately. The order in which a program is phased in can be
randomized, allowing for an assessment of effectiveness to be made during
the phase-in period.
• Encouragement design: Often, national programs that are up and
running do not have 100 percent adoption; the impact of such programs
can be evaluated by randomly encouraging some people to participate in
the program.
Several of these mechanisms simultaneously help to address some of the
ethical questions surrounding randomized design—the exclusion of people
from having access to care or programs that might save their lives. In the
randomized phase-in approach, all individuals will ultimately benefit from
the intervention; under the encouragement design, no one is denied care.
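A randomized phase-in, as described in the list above, amounts to shuffling the roll-out order. The sketch below assumes a hypothetical list of community names and a fixed number of waves; none of these names come from the workshop.

```python
import random


def randomized_phase_in(communities, n_waves, seed=0):
    """Randomly assign communities to roll-out waves of near-equal size."""
    rng = random.Random(seed)   # fixed seed so the assignment can be audited
    order = list(communities)
    rng.shuffle(order)
    # Deal the shuffled communities into waves round-robin.
    return [order[i::n_waves] for i in range(n_waves)]


# Hypothetical community labels for illustration only.
communities = [f"community_{i}" for i in range(12)]
waves = randomized_phase_in(communities, n_waves=3)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

During the phase-in period, communities in later waves serve as the comparison group for those in earlier waves, and every community eventually receives the program.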
A disadvantage of randomized evaluation is that it cannot be done
after the fact; it must be implemented with the program. Institutional con-
straints are another disadvantage to randomized evaluation that sometimes
make it more difficult to engage with partners in an intensive way. One
workshop participant noted that randomized controlled trials can be dif-
ficult to translate from the individual level to the community level, where
interventions are more complex. Glennerster acknowledged that random-
ized controlled trials can be improperly designed and can thereby generate
incorrect results.
Using Randomized Trials to Evaluate HIV/AIDS Education
Programs in Kenya
Randomized trial methodology was used to evaluate a Kenyan HIV/
AIDS education project, a collaborative effort among the government of
Kenya, a local NGO, U.S. universities, and Jomo Kenyatta University in
Kenya. The method was used in randomly chosen schools to test a range of
education strategies for their effectiveness in getting children to understand
messages about the risks of HIV. These strategies included training teach-
ers in a new HIV/AIDS education curriculum, reducing education costs to
encourage young girls to stay in school, holding debates about whether or
not to teach about condoms in primary schools, holding essay competitions
about protection from HIV, and telling children about relative infection
rates by age, including the dangers of sexual, gift-exchanging relationships
between young girls and older men (sugar daddies), the greater likelihood
of older men to be infected than younger men, and the greater likelihood of
girls to be infected than boys. Upon implementation of each program, the
evaluation tracked observed changes in behavior, including school dropout
rates, marriage, pregnancy, and childbirth, as determined by community
interviews. Follow-up studies are also tracking HIV infection rates under
each type of intervention.
Results from the trial are shown in Figure 4-1.
FIGURE 4-1 Impacts of alternative HIV/AIDS education strategies on girls’ behavioral
outcomes. [Figure not reproduced.]
NOTE: Marked values indicate that the difference with the comparison group is
significant at 10 percent.
SOURCES: Duflo et al., 2006, and J-PAL, 2007.
The teacher training in the national curriculum had little effect on
school dropout rates, marriage, and childbirth, although girls from schools
where the training was conducted were more likely to be married if they had
a child, and there was a slight effect on increasing tolerance of those with
HIV in schools that underwent the training. Reducing the cost of education
was found to be an effective strategy for reducing dropout, marriage, and
childbirth rates. Education programs about the dangers of sexual relations
with older men, or sugar daddies, led to a 65 percent drop in pregnancies
or childbirths with older men and no increase in childbearing with younger
men. Self-reported data indicated a shift from having relationships with
older men to having relationships with younger men. Self-reported data
from the boys in the group indicated increased condom use, potentially be-
cause boys had learned that girls were much more likely to be infected than
boys. Results of the debate and essay interventions remain to be tested with
outcome data; currently, only self-reported data exist, which can be very
biased. On the basis of the costs of the interventions, the evaluators were
able to calculate a cost-per-childbirth-averted rate for each intervention,
with the education program about older men being the most cost-effective
intervention, at $91 per childbirth averted, compared to $750 per childbirth
averted for interventions to reduce the cost of schooling.
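The cost-per-childbirth-averted figure is a simple ratio of program cost to outcomes averted. The sketch below shows the arithmetic; the function name and its example inputs are hypothetical, and only the two reported figures ($91 and $750) come from the text above.

```python
def cost_per_outcome_averted(total_cost, outcomes_comparison, outcomes_program):
    """Program cost divided by the number of outcomes averted."""
    averted = outcomes_comparison - outcomes_program
    if averted <= 0:
        raise ValueError("no outcomes averted; the ratio is undefined")
    return total_cost / averted


# Hypothetical inputs: a $5,000 program with 120 childbirths in the
# comparison group and 100 in the program group.
example = cost_per_outcome_averted(5_000, 120, 100)

# Relative cost of the two reported figures.
relative_cost = 750 / 91

print(f"example: ${example:.0f} per childbirth averted")
print(f"reducing school costs is about {relative_cost:.1f} times as costly per childbirth averted")
```

On the reported figures, reducing the cost of schooling costs roughly eight times as much per childbirth averted as the education program about older men, although the two interventions may differ in their other benefits.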
Using Randomized Trials to Evaluate HIV Status Knowledge
Programs in Malawi
Although half of HIV/AIDS prevention spending in Africa focuses on
HIV testing, many of those tested do not come back to pick up their results.
A study conducted in Malawi used randomized evaluation to test the im-
pact of campaigns promoting knowledge of HIV status (Thornton, 2007).
Only 40 percent of those tested for HIV returned to collect their results,
but the study showed that a small incentive—only 10–20 cents, or a small
fraction of the daily wage—was enough to increase results collection by
50 percent. The study went on to test whether or not knowledge of status
changed behavior. In follow-up interviews with those who had and had
not received encouragement to pick up their test results, people were given
the opportunity to buy subsidized condoms and the money to buy them. In
comparing the treatment group (those encouraged to and therefore more
likely to know their status) with the control group (those who were not
encouraged and thus less likely to know their status), the study found that
the knowledge of HIV status had virtually no impact on whether people
purchased subsidized condoms, even when they were given the money to
buy them. Only HIV-positive individuals in long-term partnerships were
more likely to buy condoms if they knew their status, and few bought
subsidized condoms.
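In an encouragement design of this kind, one standard way to estimate the effect of knowing one's status is to divide the difference in outcomes between encouraged and non-encouraged groups by the difference in the share who learned their status (a Wald, or instrumental-variable, estimate). The sketch below illustrates that general technique; it is not presented as the analysis used in the Malawi study, and the data are invented.

```python
def wald_estimate(encouraged, not_encouraged):
    """Each argument is a list of (knows_status, bought_condoms) pairs of 0/1.

    Returns (intention-to-treat effect, uptake gap, Wald estimate).
    """
    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    # Effect of being encouraged on the outcome.
    itt = mean(b for _, b in encouraged) - mean(b for _, b in not_encouraged)
    # Effect of being encouraged on learning one's status.
    uptake_gap = mean(k for k, _ in encouraged) - mean(k for k, _ in not_encouraged)
    if uptake_gap == 0:
        raise ValueError("encouragement did not change uptake")
    return itt, uptake_gap, itt / uptake_gap


# Hypothetical data: 10 people per group.
encouraged = [(1, 1)] * 3 + [(1, 0)] * 5 + [(0, 0)] * 2
not_encouraged = [(1, 1)] * 2 + [(1, 0)] * 2 + [(0, 0)] * 6
itt, uptake_gap, effect = wald_estimate(encouraged, not_encouraged)
print(f"ITT={itt:.2f}, uptake gap={uptake_gap:.2f}, estimated effect={effect:.2f}")
```

The estimate applies to people whose knowledge of status was changed by the encouragement, and it relies on the assumption that encouragement affects condom purchases only through knowledge of status.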
Glennerster cautioned that if randomized methodologies are not used
and if studies survey only the sample that returns for test results, it may
appear as if knowledge of status is effective in reducing HIV incidence.
A randomized methodology allows researchers to tease out proper at-
tribution for the perceived success of a program. Glennerster also noted
that the use of plausible correlation approaches—suggested by workshop
speaker Paul De Lay of the Joint United Nations Programme on HIV/AIDS
(UNAIDS) as a more practical methodology applicable to work at the
country level—without doing a full trial can also lead to the wrong policy
conclusion. With millions of dollars being invested in knowledge-of-HIV-
status programs, it is worth testing whether they are effective in reducing
incidence, she concluded.
DFID Evaluation of the National HIV/AIDS Strategy
Speaker Julia Compton, senior evaluation manager of the Evaluation
Department, DFID, described a recent evaluation of the UK national HIV/
AIDS strategy, “Taking Action,” a comprehensive and far-reaching $3 bil-
lion, 5-year effort launched in 2004, which included a substantial overseas
investment component. This national strategy cuts across the UK govern-
ment and involves six priority areas. The following four objectives were
defined for the evaluation:
• Developing recommendations for improving implementation
• Developing recommendations for how to measure success: indicators
• Developing recommendations for a future UK strategy on HIV and
AIDS
• Developing recommendations for other UK government strategies
Through an extensive consultative process, DFID identified 13 evalu-
ation questions focusing on inputs and processes specific to decisions,
for example, the usefulness of spending targets and the effectiveness of
country-led approaches.
The evaluation used several methodologies. Seven case studies of coun-
tries were conducted and three working papers were developed to gain an
understanding of spending, M&E frameworks, and challenges in reaching
women, young people, and vulnerable groups.
The evaluation was a heavily consultative process; in fact, the process
of communications and consultations during the evaluation process may
have had greater impact on changes in the strategy than the actual evalua-
tion data, remarked Compton. The process of evaluation motivated DFID
to make changes needed to achieve positive results. Compton cautioned
that concentrating too narrowly on the data, at the expense of communication
and understanding what policy makers want, may result in missed
lessons from evaluation.
A major challenge to the DFID evaluation was the declining quantity
and quality of data collected at projects in-country. Because DFID relies
heavily on country-led approaches and country systems to collect data, this
was a major constraint to the evaluation.
CARE Evaluation of Women’s Empowerment Programs
Kent Glenzer, director of the Impact Measurement and Learning Team
at CARE, described the approach and methodology of a multiyear evalua-
tion of the impact of women’s empowerment interventions. The evaluation
is a $500,000 effort assessing interventions at field sites in more than 40
countries, plus 900 other projects through secondary data. This evaluation
is being conducted to inform organizational change at CARE, a private,
international humanitarian organization with a focus on fighting global
poverty.
CARE uses a literature-based theory of social change and defines the
concept of empowerment as a process of change in women’s agencies,
social structures, and relations of power through which women negotiate
claims and rights. CARE’s approach for evaluating complex systems, such
as women’s empowerment, involves bringing together experts—internal,
external, and local—and coupling M&E with project implementation. In
CARE’s experience, local actors know and understand systemic changes
better than external experts; therefore, CARE’s role is to bring actors—
most importantly women and girls—together over the long term to discuss
systems changes, develop hypotheses, and build collective knowledge about
change.
CARE is tracking change across 23 categories of women’s empower-
ment. Indicators—including those developed by local men and women—are
developed at multiple levels for each category and include measures of
individual skills or capabilities; measures of structures such as laws, family
and kin practices, institutions, and ideologies; and measures of relational
dynamics, such as those between men and women and between the power-
ful and less powerful. Although across the sites the indicators are differ-
ent, broad patterns can be compared relating to where and how change is
happening.
The following attributes of a successful evaluation approach, from the
perspective of CARE, were outlined:
• Evaluation is a long-term learning experience that should unite
relevant actors.
• Evaluation should be flexible enough so that different dependent
variables can be specified in different contexts, but should be designed to
permit comparison of variables across contexts.
• Centrally planned, mixed-method evaluation designs work best.
The Global Fund Evaluation
Stefano Bertozzi, member of the Technical Evaluation Reference Group
of The Global Fund, described a 5-year evaluation plan for The Global
Fund, which will focus on 8 countries in depth, plus 12 others using sec-
ondary information. The evaluation is a “dose-response design,” meaning it
will look for correlations between intensity of project implementation and
changes in trends of the HIV/AIDS epidemic in terms of survival of infected
individuals and prevention of new infections.
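A dose-response design of this kind looks for an association between implementation intensity and changes in epidemic trends. As a minimal sketch, the correlation can be computed as below; the variable names and all values are hypothetical, and a real analysis would also need to account for confounding between countries.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: implementation intensity (the "dose") per country and
# the change in new infections per 1,000 people (the "response").
intensity = [1, 2, 3, 4, 5]
change_in_new_infections = [-1, -3, -2, -5, -6]
r = pearson(intensity, change_in_new_infections)
print(f"correlation between dose and response: {r:.3f}")
```

A strongly negative correlation is consistent with more intensive implementation being associated with larger declines, but correlation alone does not establish attribution.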
The plan includes evaluation of the following three major topics:
• Organizational efficiency: Operations, business model, and gover-
nance structure in The Global Fund, which are based on technical reviews
of country-generated proposals with little country presence other than au-
diting firms, will be evaluated.
• Partnership environment effectiveness: Country and grant perfor-
mance will be evaluated, including the effectiveness of mobilization of tech-
nical assistance and effectiveness of country-coordinating mechanisms.
• Health impact: The health impact of The Global Fund on the three
diseases it covers (HIV/AIDS, TB, and malaria) will be evaluated.
MACRO International Inc., Harvard University, the World Health Or-
ganization (WHO), and Johns Hopkins University are implementing the
evaluation, and data collected by MACRO through Demographic and
Health Surveys-Plus (DHS+)1 will serve as the baseline assessment. The
limited budget of the evaluation will not permit the conduct of large-scale
surveys.
METHODOLOGICAL CHALLENGES AND
OPPORTUNITIES IN EVALUATING IMPACT
Workshop participants described methodological challenges and oppor-
tunities in evaluating the impact of PEPFAR, including those in measuring
outcomes and impacts specific to HIV/AIDS, measuring broader impacts
and outcomes, attributing results, and aggregating the results of impact
evaluation. The discussions were wide-ranging and touched on many challenges
and opportunities, but were by no means an exhaustive or prioritized
list of considerations or an in-depth analysis of any one of them.
1 Demographic and Health Surveys including HIV prevalence measurement are known as
“DHS+.”
Measuring HIV/AIDS-Specific Outcomes and Impacts
HIV/AIDS-specific outcomes and impacts include the measurement of
HIV prevalence, incidence, infections averted, mortality rates, development
of drug resistance, orphanhood prevention, behavioral change, and stigma
and discrimination. Workshop participants described methodological chal-
lenges and opportunities in each of these areas.
Measuring Change in HIV Prevalence
HIV prevalence is the proportion of individuals within a population
infected by HIV during a particular time. It is a function of both the death
rate of those already infected and the rate at which new infections occur.
Repeated surveillance of pregnant women at antenatal clinic (ANC) sentinel
sites is currently the most common method for measuring changes in HIV
prevalence. Workshop speaker Theresa Diaz of the U.S. Centers for Disease
Control and Prevention (CDC) pointed out some of the challenges and limi-
tations of using this approach. Comparison with nationally representative
household-based surveys shows that the ANC surveillance method tends to
overestimate prevalence, she said, because ANCs are predominantly urban.
In addition, the ANC methodology does not take into account other fac-
tors, such as the change in use of clinics over time, increased survival, or
immigration, which can lead to a change in HIV prevalence. The method
is also unreliable for measuring prevalence in areas where epidemics are
concentrated in high-risk groups, such as Vietnam.
Diaz noted that a number of new tools are now becoming available to
analyze prevalence trends more effectively. CDC uses a suite of methods
(chi-square, linear, trend, linear regression, and nonparametric methods) for
analyzing prevalence trends using only the most consistent ANC sites and
the most recent data. In addition, a second population-based survey of HIV
testing will soon be available in some countries to allow analysis of HIV
prevalence over time. The collection of data on antiretroviral (ARV) use—
both from ANC sentinel surveillance surveys and from the population-based
surveys—would allow better prevalence data to be collected, in addition to
data on coverage. Finally, methods such as respondent-driven sampling are
being standardized for collecting HIV sero-prevalence data among high-risk
groups. When such methods use the same sampling methodology in the
same place over time, trends can be observed.
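One of the simplest trend methods mentioned above, linear regression of prevalence on time, can be sketched as follows. The yearly values are hypothetical and stand in for prevalence measured at a consistent set of sites; this is not CDC's procedure, only an illustration of fitting a linear trend.

```python
def linear_trend(years, prevalence):
    """Ordinary least-squares slope and intercept of prevalence on year."""
    n = len(years)
    mean_year = sum(years) / n
    mean_prev = sum(prevalence) / n
    cov = sum((y - mean_year) * (p - mean_prev) for y, p in zip(years, prevalence))
    var = sum((y - mean_year) ** 2 for y in years)
    slope = cov / var
    intercept = mean_prev - slope * mean_year
    return slope, intercept


# Hypothetical prevalence (percent HIV-positive) from the same sites each year.
years = [2001, 2002, 2003, 2004, 2005]
prevalence = [8.0, 7.6, 7.5, 7.1, 6.8]
slope, intercept = linear_trend(years, prevalence)
print(f"estimated change: {slope:.2f} percentage points per year")
```

Because prevalence reflects both new infections and survival, a falling slope does not by itself show that incidence is falling; the fitted trend is only as meaningful as the consistency of the sites and methods behind it.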
plies, lab services, and curative care services. Quantitative provider surveys
were used to measure impact on individual providers and facilities receiving
funds and to assess training, supervision, motivation, and job satisfaction.
In-depth qualitative interviews with important stakeholders were also con-
ducted throughout the entire health system.
Novak stressed the importance of monitoring both positive and nega-
tive impacts of interventions, which can help countries address critical
issues in the health system. For example, although the SWEF evaluation re-
sults showed positive impacts on the health system—such as greater partici-
patory engagement, decentralization, the emergence of new public–private
collaborative arrangements, creation of improved incentives and work en-
vironment for those working in HIV/AIDS, and harmonization of pricing
and cost-recovery approaches—there were also some negative impacts, such
as delivery-level constraints as HIV/AIDS drew both human resources and
services away from other health areas, and poorly functioning procurement
and distribution systems in some countries.
Challenges of using this more descriptive methodological approach
include the lack of empirical estimates of impacts, small sample size, short
time interval over which change was evaluated, and lack of ability to easily
attribute impact.
Evaluating impact of HIV/AIDS interventions on non-HIV primary health
care services. Jessica Price, Rwanda country director of Family Health
International (FHI), presented results from a study conducted in Rwanda
testing the hypothesis that HIV/AIDS interventions increased the quantity
of non-HIV primary health care services delivered. Study data were derived from
the review of monthly activity reports submitted by health centers to the
government of Rwanda. The study compared the quantity of non-HIV
health services delivered before and after the introduction of basic HIV
care, defined as services including counseling and testing, PMTCT, preven-
tive therapy, and basic upgrades to health center infrastructure. The study
assessed 30 FHI partner health centers from 4 provinces and 14 districts in
Rwanda, representing 21 faith-based centers and 9 public centers. Hospitals
that do not deliver some non-HIV services and health facilities with fewer
than 6 months’ experience delivering basic HIV care were excluded from
the study.
A set of 88 indicators of non-HIV services delivery was tracked, with
22 indicators considered to represent the best range of public health ser-
vices. These included general services (such as inpatient and outpatient ser-
vices and lab tests), reproductive health services, and services for children.
In addition to monitoring impacts of HIV/AIDS interventions, the study
also tracked impacts of two other health programs—primary health care
insurance and performance-based financing—and used regression analysis
to isolate the independent effects of HIV/AIDS interventions. The analysis
consisted of calculating mean quantities of non-HIV services delivered per
primary health center per month between the two time periods, testing for
significant differences, and conducting regression analysis to control for
experience with other health programs (insurance and performance-based
financing) to determine which program, if any, had an independent effect
on the observed change.
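The logic of this analysis can be sketched in a few lines. The example below is a minimal illustration only, assuming noise-free synthetic data and three yes/no program indicators; the service counts, effect sizes, and function names are hypothetical and are not taken from the Rwanda study, whose exact model specification is not given here. It contrasts a naive comparison of means with an ordinary least squares estimate that controls for the other two programs.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations; an intercept is added."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical center-months: (basic HIV care, insurance, performance-based financing).
programs = [(0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 0),
            (1, 1, 0), (1, 1, 1), (1, 1, 1), (1, 0, 1)]
# Invented outcome: 100 services at baseline, +30 HIV care, +20 insurance, +10 financing.
services = [100.0 + 30.0 * h + 20.0 * i + 10.0 * p for h, i, p in programs]

with_hiv = [s for (h, _, _), s in zip(programs, services) if h == 1]
without_hiv = [s for (h, _, _), s in zip(programs, services) if h == 0]
naive_difference = sum(with_hiv) / len(with_hiv) - sum(without_hiv) / len(without_hiv)

coefficients = ols(programs, services)  # [intercept, HIV care, insurance, financing]
```

In this invented data set, centers with HIV care are also more likely to have the other two programs, so the naive difference in means (45 services per month) overstates the HIV program's contribution; the regression coefficient on HIV care recovers the 30 that was built into the data.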
The HIV programs were shown to have had an independent effect
on a number of indicators across a range of areas. These areas included
improved coverage for antenatal visits and services, use of health care
facilities for maternity services by HIV-positive women, syphilis screening,
family planning services, child vaccination and growth-monitoring services,
outpatient consultations, and hospitalization services.
Limitations and challenges of the methodology were discussed. In fu-
ture analyses, evaluation of the impacts of HIV programs should also
include hospital settings. Indicators could also be tracked for impacts on
other diseases (such as malaria, TB, and sexually transmitted infections),
quality of patient care, costs of HIV-specific services (such as HIV tests)
versus non-HIV-specific services (such as infrastructural upgrades like incin-
erator construction and maintenance of electricity), and client and provider
satisfaction. Future studies should also look at larger sample sizes over
longer time periods. A random selection of sites should also be considered
in future studies, noted speaker Field-Nguer. The fact that all chosen sites
were FHI partners may have given them a competitive edge, she noted. If
being FHI sites did not confer an edge, then perhaps access to services can
be replicated at any site in Rwanda. But if FHI status did confer an edge,
then perhaps unique attributes of the partnership can tell us something
about how to replicate the impact, she noted. Workshop participant Laura
Porter of CDC added that future studies will need to ensure that service
delivery improvement is a real effect and not just an artifact of data system
improvement.
Measuring Impact of Complementary Interventions
As described in Chapter 2, PEPFAR investments include numerous
interventions in programs complementary to more narrowly focused HIV
services. These so-called wraparound programs include interventions in ar-
eas such as malaria, TB, nutrition education, food security, social security,
education, child survival, family planning, reproductive health, medical
training, health systems, and potable water.
Workshop speaker Bertozzi described methodologies from two case
studies from Mexico in which such complementary interventions were
evaluated: a human-capacity development program for children and a food
assistance program.
The Oportunidades program is a Mexican government–sponsored
human-capacity development program for Mexico’s poorest children. Fi-
nancial incentives to parents are offered through the program for ensur-
ing children’s participation in health, nutrition, and educational services.
The Programa de Apoyo Alimentario (PAL) provided food
assistance—either food or cash payments—to small rural communities in
Mexico. Impact evaluations of both Oportunidades and PAL were con-
ducted using prospective randomized evaluation, in which later program
enrollees were compared to earlier program enrollees. Both health impacts
and education impacts were monitored through the evaluations. For Oportunidades,
health indicators tracked included use of preventive services (such
as well visits and vaccinations), use of curative services, out-of-pocket ex-
penditures, and anemia prevalence. PAL health impacts monitored included
height-for-age, weight-for-height, and weight-for-age. Education indicators
monitored in the Oportunidades program included grade-level achievement,
attendance, early enrollment, and repetition of grades.
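As a rough illustration of the phased-in design described above, the sketch below randomly splits a list of communities into an early-enrollment group and a later-enrollment group that serves as the comparison in the interim. The community names, the even split, and the seed are assumptions made for illustration; the actual assignment procedures used for Oportunidades and PAL are not described here.

```python
import random


def assign_phases(communities, seed):
    """Randomly split communities into early- and late-enrollment groups."""
    rng = random.Random(seed)      # fixed seed keeps the assignment reproducible
    shuffled = list(communities)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


# Hypothetical community identifiers.
communities = [f"community_{i:02d}" for i in range(1, 11)]
early, late = assign_phases(communities, seed=2008)
```

Because every community eventually enrolls, a design of this kind withholds the program from no one; it uses only the order of enrollment to create a comparison group.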
The evaluative approach from these studies could potentially be applied
to the evaluation of complementary interventions in the PEPFAR program,
particularly to health and educational interventions targeting orphans and
vulnerable children, noted Bertozzi. Other indicators of “basic capability”
child care interventions could include zinc status, sick days, days incapaci-
tated, prevalence of risky and healthy behaviors (such as alcohol use, sexual
activity, and exercise), and educational performance.
Bertozzi emphasized the importance of controlling for secular—long-
term, noncyclical—trends in impact evaluation. Such trends can sometimes
have a large effect independent of the intervention. For example, malnutri-
tion indicators were tracked in the poorest rural communities in Mexico in
the 5 years leading up to the start of the PAL program (ENN-1999 versus
PAL-2004, the baseline for the PAL intervention). In the absence of any
intervention, noted Bertozzi, extraordinary secular trends led to a halving of
malnutrition indicators in these communities. Any intervention conducted
during this 5-year period would have given the appearance of stimulating
a large positive effect when there might have been none at all—or perhaps
even a negative effect.
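One common way to net out a secular trend, offered here as an illustration rather than as the method used in the Mexican evaluations, is a difference-in-differences comparison: the change in program communities minus the change in comparable communities without the program. The prevalence figures below are invented for the example.

```python
def difference_in_differences(treated_before, treated_after,
                              comparison_before, comparison_after):
    """Change in the treated group net of the change in the comparison group."""
    treated_change = treated_after - treated_before
    secular_change = comparison_after - comparison_before
    return treated_change - secular_change


# Hypothetical malnutrition prevalence (percent) before and after 5 years.
# Case 1: prevalence halves everywhere, with or without the program.
naive_effect = 10.0 - 20.0                      # before/after in program sites only
net_effect = difference_in_differences(20.0, 10.0, 20.0, 10.0)

# Case 2: program sites improve, but less than comparison sites do.
net_effect_harmful = difference_in_differences(20.0, 12.0, 20.0, 10.0)
```

In the first case the naive before/after comparison credits the program with a 10-point drop, while the net effect is zero. In the second case the program sites still improve by 8 points, yet the net effect is 2 points in the wrong direction.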
Measuring Impacts of Gender-Focused Activities
Workshop participants discussed some of the challenges and oppor-
tunities for evaluating the impacts of gender-focused activities, including
those interventions to promote gender equality and women’s empowerment.
Noting that gender equality and women’s empowerment are multidimensional,
open, complex, nonlinear, and adaptive systems, speaker Glenzer
observed that it is seldom clear what variables are or are not involved.
It is a challenge to define what constitutes success and what it looks like
on the ground. Glenzer said some of the difficulty in tracking change of
gender systems relates to the following characteristics: the large-scale ef-
fects of small changes over time, the separation of causes and effects over
large spatial and temporal scales, the multiple levels over which change
may occur, and the heterogeneity of systems. Speaker Julie Pulerwitz of the
Population Council acknowledged the difficulty in implementing rigorously
designed evaluations and called for more consensus building about how to
operationalize the concept of gender and how to evaluate gender-related
activities. Although gender is generally recognized as important, she added,
there have been few outcome evaluations and few tools developed on how
gender-focused activities affect HIV risk. Few good indicators exist that
are useful in understanding social dynamics, and evaluation schemes often
underrepresent the perspectives of local people, who are a source of such
knowledge, noted Glenzer.
Speaker Pulerwitz described a new method now available for studying
the impacts of gender-focused activities and how those impacts can con-
tribute to PEPFAR goals. Pulerwitz directs an operations research program
called Horizons at the Population Council that has conducted studies using
this method. Pulerwitz shared the study design and tools used for an evalu-
ation of gender-focused programs—group education, community-based be-
havioral change communication campaigns, and clinical activities—focused
on young men in Brazil. A combination of data collection approaches was
used, including the following:
• Pre- and postintervention surveys and a 6-month follow-up survey
for three groups of young men—two intervention groups and a compari-
son group, which eventually also received the interventions after a time
delay—followed over a year
• In-depth interviews with a subsample of young men and their
sexual partners
• Costing analysis and monitoring forms for different activities
An evaluation tool called the Gender Equitable Men’s (GEM) scale was
used to look at gender norm attitudes and how they changed over time
(Barker, 2000; Pulerwitz and Barker, 2008). The scale includes 24 items,
including parameters such as home and child care, sexual relationships,
health and disease prevention, violence, homophobia, and relations with
other men. Certain GEM scale domains are associated with partner vio-
lence, level of education, and contraception use. The GEM tool was used
to detect significant changes in attitude toward equitable gender norms and
in support of inequitable gender norms in the two intervention groups as
compared to the control group. HIV outcomes—condom use with primary
partners—were also tested, and one of the intervention groups showed an
increase as compared to the comparison group. The study also looked at
covariance between changes in attitudes toward norms and changes in con-
dom use; men who were more gender equitable were more likely to report
condom use. The in-depth interview component of the analysis unearthed
other changes among those in the test groups, including a delay in sexual
activity in new relationships.
The evidence generated by the evaluation is supportive of interventions
that target gender dynamics and their influence on HIV risk behavior in
Brazil, concluded Pulerwitz. She noted that there are ongoing or planned
efforts to adapt the GEM tool to other country contexts—India, Ethiopia,
Namibia, Uganda, and Tanzania—and to other demographic groups, such
as married men. Preliminary findings show that results can be highly coun-
try specific. Although a similar trend toward more equitable attitudes has
been observed in the work conducted in India, baseline attitudes in that
country are much less supportive of equitable gender norms than those in
Brazil.
Measuring Coordination and Harmonization
Workshop speaker De Lay spoke of a new opportunity for measur-
ing coordination and harmonization—the alignment of interventions with
country-level plans and coordination of efforts among other implementing
partners. A new tool, known as the Country Harmonization and Alignment
Tool (CHAT), developed by UNAIDS and the World Bank, is now avail-
able and could be applied to the standardization of alignment of interven-
tions with country-level plans and coordination of efforts among partners
(UNAIDS, 2007a).
The tool has been used to assess harmonization and alignment of the
national plan, coordinating mechanism, and M&E plan in six pilot coun-
tries, and a launch of the tool is planned in two more countries. The tool
has revealed that many national plans are still not credible, not costed
appropriately, not prioritized, and not actionable. In addition, the tool
has shown that few countries have a central funding channel or single
procurement system for the HIV/AIDS response. The tool has also shown
that “basket funding,” or joint funding by multiple donors, is not normally
used. Although donors support the notion of the development of indigenous
national M&E capacity, the tool has revealed that in practice donors usu-
ally rely on their own M&E systems to collect urgent data when needed.
Measuring Community-Level or Population-Level Service Delivery
Workshop speakers spoke of the challenges of scaling up successful
service-delivery interventions for specific populations, such as children,
families, communities, and high-risk groups. As workshop speaker Bertozzi
observed, sometimes it is difficult to distinguish between a community-
level or population-level effect and the effect of an intervention. Tools
are needed, noted speakers Kathy Marconi of OGAC and Stoneburner, to
measure the effectiveness of interventions in specific populations, including
communities, diverse populations, and at-risk or infected populations.
Speaker Field-Nguer announced that a new and important addition to
the evaluation toolbox is now available: community-level program informa-
tion reporting systems (CLPIR) (personal communication, R. Yokoyama,
John Snow, Inc., January 18, 2008). CLPIR indicators look strictly at
community-level service delivery and help answer questions such as when,
how, and where people want testing and treatment.
Attributing Impact
Given the diversity of programs and funders, attributing impact—or
relating a particular effect to the work of a specified agent—is a substantial
methodological challenge in evaluation, workshop participants said. The
World Bank experience shows that because loans or grants are made to
governments, speaker Ainsworth said, performance of activities depends
heavily on governments, and it is therefore difficult to disentangle the
efforts of government and any particular donor from the efforts of all
other donors. Even within the programs of a single donor, noted speaker
Gootnick, accounting can be complex. Some interventions can be double
counted; for example, voluntary counseling and testing is included under
both the prevention and care modalities. As PEPFAR moves increasingly
toward more harmonized approaches, noted speaker Compton, it will be
even more difficult to disentangle effects in an exclusive way.
Many workshop participants agreed that the demand for exclusive at-
tribution by donors may not be constructive. General evaluation of what is
and is not working, in contrast, may be desirable, noted workshop modera-
tor Ruth Levine of the Center for Global Development. Speaker Glennerster
emphasized that it is preferable to test what works in very specific areas and
then judge a program by whether it spends money on interventions whose
effectiveness is supported by evidence. All programs are doing many things
in-country; they are implementing many different policies. If we want to be
effective in focusing resources on what works, we need to identify which
interventions have the most impact and which are most cost-effective, she
said. Speaker Diaz reinforced this idea, stating that a worthwhile attribution
goal should be to know the effectiveness of certain programs and their
coverage in terms of impact measures. A useful attribution exercise, she sug-
gested, might be to determine what level of ART coverage decreases general
mortality and what types of prevention activities, in which populations,
decrease HIV incidence. Ainsworth added that it is nevertheless useful to
analyze the value added of the unique approaches of particular donors.
An important dimension of attribution is the concept of the counter-
factual, or the assessment of what would have happened differently had the
donor not intervened. Some speakers noted that absence of the donor does
not necessarily imply that nothing would have happened. Discussant Jim
Sherry of George Washington University observed that one consequence of
donor interventions is that the donor occupies a particular space and pre-
vents other organizations from filling it. As speaker Bertozzi pointed out, in
the case of South Africa, even if outside institutions did not intervene, given
the massive social mobilization potential in the country, dramatic change
could have been effected without outside help.
Aggregating Evaluation Results
Several speakers noted that the synthesis or aggregation of evaluation
results is a methodological frontier. Workshop participant David Dornisch
of the U.S. Government Accountability Office proposed that meta-analysis
or synthesis could be used to bring together the results of multiple studies.
From the congressional perspective, workshop participant Naomi Seiler
from the U.S. House of Representatives Oversight Committee also stated
that while prospective evaluation is useful, any type of meta-analysis or syn-
thesis of what is already known about types of interventions, contexts, and
populations would be helpful. Discussant Jimmy Kolker of OGAC echoed
the need for data synthesis to be relevant to designing or implementing a
program.
Workshop discussant Sherry observed that such methods have yet to
be developed, however. Sherry predicted that the clustering of country-level
assessments and evaluations will likely provide much more information
through meta-analysis than one definitive, globally executed impact study.
Although there is room for both kinds of evaluations, he noted, there is
substantial room for improvement on meta-analysis to look statistically at
the results of these studies. Sherry observed that there may be inadequate
separation of macro-, micro-, and meta-level evaluation processes, leading
to an evaluation either not making sense to policy makers or not being
rigorous enough for scientists. Micro-level evaluation tends to be too tech-
nical and too situation-specific to be digestible to institutions or useful for
interventions. Macro-level evaluation tends to be too soft and too subject
to evaluation spin to be digestible or credible. Durable findings are needed
about programs that allow for more sustainable dialogue and learning at
the meta-level in terms of evaluation.
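For readers unfamiliar with the statistical side of meta-analysis, the sketch below shows one standard building block, fixed-effect inverse-variance pooling, in which each study's estimate is weighted by the inverse of its squared standard error. It is offered only as an illustration of the general idea; the country-level estimates and standard errors are invented, and the approach assumes the studies measure the same underlying effect, which the discussion above suggests may not hold across contexts.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted average of study estimates and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical effect estimates from three country-level evaluations.
country_estimates = [0.10, 0.30, 0.20]
country_standard_errors = [0.10, 0.10, 0.05]
pooled, pooled_se = pool_fixed_effect(country_estimates, country_standard_errors)
```

The more precise third study receives four times the weight of each of the others, and the pooled standard error is smaller than that of any single study, which is the sense in which clustering several country-level evaluations can be more informative than any one of them alone.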
Another workshop participant raised a question about the value of
performing multiple evaluations. Speaker De Lay commented that although
it is sometimes desirable to avoid duplication where it is not needed, some-
times duplication is necessary and multiple perspectives are desirable. For
example, validation of existing data by an independent group is often a
useful alternative to redoing an entire study.
THEMES COMMON TO EVALUATION
METHODOLOGIES AND APPROACHES
This section distills some of the main messages and themes common to
the discussions about evaluation methodologies and approaches.
Prioritization
Most evaluations require some type of prioritization to narrow down
what is to be measured. Speaker Ainsworth noted that for long-term evalu-
ations, for example, one might select only those issues common to all proj-
ects. For a large portfolio of activities, she added, one might select a more
narrowly defined set of indicators.
Value of Consultation and Communication
Several speakers emphasized the value of consultation and communica-
tion in any evaluation approach. Speakers Compton and Glenzer observed
that consultation and communication through the evaluation process are
as important in effecting change and course corrections as the data from
the evaluation results. It also matters who is consulted, observed speaker
Field-Nguer.
Value of a “Learning” Evaluation
Many of the evaluation methodologies described were formative, or
“learning” evaluations, designed to help improve institutional performance.
As Glenzer noted, evaluation is a long-term learning experience that should
unite relevant actors. Speaker Ainsworth added that bringing to bear the
findings of past support can inform ongoing programs. Using evaluation
to understand the variation in outcomes, or the distribution of outcomes
within a population, can help us learn, she said. For example, a change in
average life expectancy or in average behavior is not as
interesting as knowing why behavior changed in one group of people but
not another.
Others emphasized the heuristic value of negative evaluation results.
Analysis of failures, observed speaker Field-Nguer, is sometimes more fruit-
ful than success stories. Negative evaluation results should be divulged and
shared, one workshop participant urged; if they are not shared, programs
lose credibility and waste money. Speaker Glenzer noted that all of CARE’s
research reports are published on Emory University’s website and include
some research indicating that CARE is not having long-term impacts on
women’s empowerment or underlying causes of gender inequality.
The emphasis on learning evaluations contrasts with a more typical
systemic bias in the international health community in which actors want
to see programs continue, noted workshop discussant Sherry. Therefore, in-
stead of using evaluation for learning, it is used to protect our interests and
programs. Sherry underscored the importance of sustaining the institutional
learning process. The isolation of evaluation departments in international
health systems—analogous to the isolation of smart and reflective people
in universities, organized into separate compartments so they have minimal
effect on the society around them—is one obstacle to institutional learning,
he noted. Decision-making cycles, such as 5-year cycles, reauthorizations,
or external audits, bring evaluators into prominence briefly, after which they
fade away. Also observing the existence of different consumers of evaluation,
speaker Nils Daulaire of the Global Health Council emphasized the impor-
tance of having a single M&E system that satisfies multiple sets of needs.
For example, if a customer for evaluation is Congress, then the evaluation
will emphasize putting on the best possible spin, but that must be balanced
with the use of evaluation on a daily basis to help improve program de-
velopment and results. One step in achieving a multiuse system is to give
evaluators a role in program management and development as opposed to
a peripheral role in projects.
Importance of Designing the Evaluation Early
Several speakers emphasized the importance of considering evalua-
tion design early in the implementation process so that the design will be
appropriate and so that impacts can be detected early. Speaker Compton
urged that evaluations be set up at the beginning of the process, and speaker
Bertozzi also spoke about some of the drawbacks of an ex-post evaluation.
Speaker Glennerster noted that opportunities to use powerful randomiza-
tion approaches exist, but they can be used only if the design is included
at the beginning of an intervention. Field-Nguer and Bertozzi stressed the
importance of baseline assessments, without which the wrong conclusions
may sometimes be drawn.
Understanding the Limitations of Models and Data
Workshop participants acknowledged the limitations of data and mod-
els used in evaluation. Speaker Pacqué-Margolis emphasized that empirical
data are often inadequate, lacking, or inaccurate, and speakers Ainsworth
and Compton emphasized that poor data quality at the country level is often
a serious problem. Speaker Garnett emphasized the existence of data gaps
for measuring efficacy in different epidemiological contexts. Age- and sex-
specific empirical data are also lacking, noted discussant Fowler. Ainsworth
stressed that incentives need to be created to encourage project staff and
governments to establish and maintain monitoring efforts. Not all data are
of the same quality, participants said. Speaker Glennerster noted that data
based on self-reported behavior might have issues regarding reliability.
Models are powerful tools that can help in evaluation, but they also
have limitations. Speaker Glennerster pointed out that models need to be
validated with empirical data, and variables need to be added to them to
make them more accurate predictors. Speaker Garnett also observed that
models are less reliable predictors when the spread of HIV infection be-
comes epidemic.
Value of Multiple Methodologies
Several presenters noted the value of using multiple methodological
approaches in evaluation. Speakers Compton and Ainsworth cautioned
against relying exclusively on one evaluation methodology, and speaker
Field-Nguer pointed out that multiple methods may yield richer results
than one or two methodologies. Field-Nguer also noted that lack of a base-
line assessment (as was the case in PEPFAR) may increase the importance
of using several methodologies, including qualitative measures. Speaker
Glenzer reinforced the point with his comment that centrally planned,
mixed-method evaluation designs work best.
At the same time, the use of multiple methods should be strategic,
noted workshop speaker Glennerster. She noted that currently organiza-
tions often conduct a confused mix of process/output and impact evalu-
ations in too many places. Instead, she recommended conducting good
process evaluations everywhere and a moderate number of high-quality
impact evaluations focusing on a few key questions.
Value of Randomization
Multiple presenters emphasized the value of randomization tools in
the conduct of evaluations. Glennerster pointed out that new methods
of randomization are now available that integrate with evaluation with
minimal disruption. In his presentation, Bertozzi also drew on evidence
from randomized controlled trials. Speaker Field-Nguer pointed out that
nonrandom selection of sites has the potential to limit or weaken a study.
Workshop participant De Lay discussed some of the potential problems
with the impracticality of randomization.
Comparison Across Contexts
Several workshop participants stressed the highly contextual nature
of change when comparing across contexts. Evaluations that are centrally
coordinated to permit comparison of variables across contexts, while al-
lowing some flexibility in indicator design at the local level, are optimal,
suggested speaker Glenzer. Interventions that are successful in one country
are not necessarily transferable to another country, noted workshop speaker
Stoneburner. Examples provided by Stoneburner and speakers Latkin,
Garnett, and Pulerwitz supported this statement. In some cases, factors
independent of an explicit program intervention can have an influence on
change. In other cases, change in behavior does not always lead to a change
in the pattern of the HIV/AIDS epidemic, and changes in the pattern of the
epidemic cannot always be translated to a change in behavior. Close en-
gagement of the scientific community in evaluation, urged speaker Latkin,
can help to assess the likelihood of transferability of effective programs to
other settings.