Suggested Citation: "PART II Proposed Evaluation Approach." Institute of Medicine and National Research Council. 2010. Strategic Approach to the Evaluation of Programs Implemented Under the Tom Lantos and Henry J. Hyde U.S. Global Leadership Against HIV/AIDS, Tuberculosis, and Malaria Reauthorization Act of 2008. Washington, DC: The National Academies Press. doi: 10.17226/12909.

PART II Proposed Evaluation Approach

CONGRESSIONAL CHARGE AND PLANNING PHASE APPROACH

Under the Lantos–Hyde Act of 2008,⁹ Congress mandated that the IOM conduct a study that includes an assessment of the performance of U.S.-assisted global HIV/AIDS programs and an evaluation of the impact on health of prevention, treatment, and care efforts that are supported by U.S. funding, including multilateral and bilateral programs involving joint operations (see Appendix A). Based on clarifications with congressional staff and OGAC,¹⁰ the charge is intended to focus on the performance and impact of bilaterally funded PEPFAR programs in the current partner countries (see Table 1 in Part III, Section 2 for a list of countries). This will include programs and activities that are operated jointly with both bilateral funding through PEPFAR and funding through the Global Fund. Consistent with the clarified congressional intent, U.S. contributions to the Global Fund that are not a part of activities jointly funded or implemented by PEPFAR will not be the focus of the evaluation, and the evaluation will not compare the performance of bilateral PEPFAR programs to that of Global Fund programs (Bressler, 2009; Marsh, 2009). The study will consider PEPFAR's performance and impact since funding first became available in 2004. The timing of the study, with a final report to be delivered in 2012, dictates that the evaluation will consider data that are now, or will become, available through 2011.

As the first phase of this study, the IOM was charged to form an ad hoc committee to develop a strategic plan for the assessment and evaluation of HIV/AIDS programs implemented under the Lantos–Hyde Act of 2008 and to issue a short report to the U.S. Congress describing the plan's proposed design, taking into consideration the requirements for the congressionally mandated study. These requirements and the charge for developing the evaluation plan can be found in full in the Statement of Task in Appendix A.
More information about the members of the planning committee can be found in Appendix B. This report documenting the proposed evaluation approach is the product for the first phase of the study.

To produce this report, the planning committee met three times to deliberate in person, conducted two teleconferences, and engaged in additional deliberations in smaller working groups by telephonic and electronic communications as needed. In the development of its plan, the committee consulted widely with, and remains open to receiving input from, the broad range of parties interested in and affected by PEPFAR. To solicit input and gather information from a wide range of stakeholders, public sessions were held in conjunction with the first and second committee meetings, and delegations from the committee and IOM project staff also held information-gathering meetings with a range of global stakeholders, including UNAIDS, WHO, the United Nations Children's Fund (UNICEF), and the Global Fund. The primary purpose of these meetings was to establish working relationships with these stakeholders and to discuss potential data sources and methodologies, as well as strategies and lessons learned from large-scale programmatic or organizational evaluations. The agendas for these activities can be found in Appendix C. In addition, one staff member attended the PEPFAR Annual Implementers' Meeting in 2009.

The committee also consulted the available literature on PEPFAR, global HIV/AIDS, and the state of the art in large-scale program evaluation, including the summary of a workshop convened by the IOM, "Design Considerations for Evaluating the Impact of PEPFAR," which focused on methodological, policy, and practical considerations (IOM, 2008). The committee and staff also conducted an initial scan of potential data sources for the forthcoming evaluation using a range of sources, including a preliminary review of documents from OGAC and other bilateral and multilateral agencies and of relevant published literature, as well as communications with a wide range of staff from OGAC, implementing partners, and multilateral stakeholders. The committee used this information to assess the methods that could be employed to answer evaluation questions based on the charge in the statement of task, focusing on data and methodology that will be robust, available, feasible, and appropriate to the questions. Through this information gathering and deliberation, the planning committee developed a conceptual framework for the evaluation that is based on both the committee's expertise and current standards in evaluation methodologies for large-scale programs.

This report is intended to provide Congress and OGAC with an overview of the strategic plan for the forthcoming evaluation. As agreed upon contractually with the study sponsor, the planning process for the evaluation will culminate with a transitional period for operational planning that will take place between the delivery of this report and the implementation of the evaluation itself in the fall of 2010. During this operational planning phase, IOM staff, planning committee members, and consultants will carry out activities to further develop and refine the plan described here. These activities, which will inform the implementation of the evaluation, are described as part of the work plan later in this section.

⁹ Supra., note 6 at §101(c), 22 U.S.C. 7611(c).
¹⁰ Personal communications from Congressional Staff at the U.S. House Committee on Foreign Affairs and U.S. Senate Committee on Foreign Relations and OGAC, 2009.

This structure for the study, with a report describing strategic elements of the plan delivered to Congress before detailed operational planning is complete, was intentionally designed to allow uninterrupted progress in preparation for the evaluation during the time necessary for review of the report and budget planning by the sponsor and for subsequent preparation of the contract for the evaluation. After congressional review of the plan's proposed design and budget, the final phase of the project will be to carry out the assessment/evaluation of the program. The IOM will convene a new ad hoc committee to conduct the evaluation as a consensus study. The intent is for the evaluation committee and staff to have considerable overlap from the planning committee. Standard IOM procedures will be followed to ensure that the evaluation committee and project staff have the appropriate expertise to conduct the evaluation activities described in this report.

EVALUATION GOALS AND CONCEPTUAL FRAMEWORK FOR EVALUATION DESIGN

The legislative mandate to evaluate PEPFAR is a complex challenge. As described above, PEPFAR is a large, multifaceted program with many activities carried out by many different partners in a diverse group of countries. In addition, PEPFAR activities are being implemented in the context of programs supported by other funders that have the same ultimate aim. PEPFAR is also by necessity a dynamic program; the ability to change the program over time can be beneficial, but makes evaluation difficult as it presents a "moving target." Therefore, this report not only outlines an approach for evaluating the performance of PEPFAR, but also delineates the challenges in evaluating the impact of such a complex, large-scale foreign assistance program and provides information about reasonable and appropriate expectations for an evaluation of this kind.
The committee has endeavored to present a plan that is thorough and well-defined in its approach yet maintains ample flexibility. This will allow the evaluation to be adapted in response both to the evolving goals of the program and to the additional information the evaluation committee will gather during operational planning and as the evaluation itself proceeds. The proposed conceptual framework for the evaluation and its limitations will be described briefly here, followed by a more thorough discussion of the methods and data sources that will be used. The subsequent sections in Part III of this report address specific components of the evaluation in greater detail.

Evaluation Goals and Assumptions

The planning committee understood the mandate from Congress as a charge to develop a plan to assess the program with two primary goals. The first of these is an assessment of the success of the program in meeting the performance goals and targets laid out in two sources: the reauthorization legislation and the new PEPFAR Five-Year Strategy. Although the statement of task was written before the new strategic plan was available, the committee interpreted the charge to take this document into account because it articulates the current guiding principles and the future direction for the program. Therefore, the evaluation will include a careful review and comparison of these guiding documents in order to more clearly define the targets and goals of the program.

The second goal of the committee's charge is to evaluate the health impact of PEPFAR, including impact of treatment, care, and prevention programs; effects on health systems; efforts to address gender-specific aspects of HIV/AIDS; impact of programs on child mortality; and impact of interventions on behalf of orphans and vulnerable children. The findings and conclusions of the evaluation of PEPFAR's progress toward its stated goals and the impact of the program will then be used to make recommendations for improving the USG response to global HIV/AIDS, in particular through PEPFAR programs.
It is important to note that the IOM is being charged to conduct an evaluation early in the implementation of changes to the program in response to the reauthorization legislation and the new PEPFAR Five-Year Strategy. These changes reflect a progressive transition to a new era of challenges and goals for the program, which include efforts to improve sustainability of the response over time, to enhance coordination with partner governments and other global funding partners, and to support accountable ownership of HIV program delivery by countries themselves. They also reflect efforts to give greater consideration to the relationship of PEPFAR to broader health and development needs in partner countries. The timing of this evaluation, with data collection extending through 2011 for a final report due in 2012, will make it difficult to evaluate the outcomes or impact of these most recent changes so soon after implementation. For example, it could take several years or even decades for a full effect to be realized from some efforts to strengthen health systems, such as the training and retention of new health care workers or the strengthening of health information systems to support M&E efforts. However, the evaluation will assess efforts, process, and initial results in these areas to provide insight into whether PEPFAR is making reasonable progress toward these new goals and to lead to recommendations for how the program can be improved to ensure that these evolving goals for the program can be met. As part of this, the evaluation will assess whether there is sufficient M&E capacity in place to eventually evaluate whether the program has met these goals as well as the resulting outcomes and impact. The legislative mandate calls for the assessment of PEPFAR to be delivered in 2012; this would coincide with reauthorization discussions for the program, which the current legislation extends through 2013. 
It is of course not possible to predict the future needs and priorities of Congress and OGAC with complete accuracy, but the planning committee's goal was to design an evaluation approach that, to the extent possible, looks ahead to anticipate the evolution of the program and therefore produce findings that address key issues under consideration at the time of the report release, including discussions about possible future legislative reauthorization.

Conceptual Framework for Evaluation Design

The planning committee developed an overall conceptual framework for the evaluation, which calls for the use of a program impact pathway to guide an assessment of the contribution of PEPFAR to changes in health impact within the context of multiple international and national funding streams. This program impact pathway, described in more detail below, illustrates the committee's understanding of how PEPFAR programs are currently structured and intended to ultimately translate into health impacts, laying out a plausible pathway for causal effects. It represents the theory of change that underlies the program; in other words, the rationale for how the combination of activities supported by PEPFAR is logically expected to produce intermediate outcomes, which are then expected to collectively contribute, along with programs funded by other sources, to the desired population health impact. The use of a program impact pathway, which is also referred to as a logic model or results chain, has become well established as a method for evaluating complex, large-scale development assistance programs and is becoming widely accepted as a standard in the global HIV/AIDS community (Leeuw and Vaessen, 2009; UNAIDS MERG, 2010).¹¹

Guided by the program impact pathway, the evaluation committee will use a mixed methods approach that will draw on a combination of analytical techniques and on a range of both quantitative and qualitative data sources.
By assessing whether there is convergence and consistency among different data sources and methods, the evaluation committee will seek to triangulate findings that support reasonable, plausible linkages to outcomes and impact (Greene et al., 1989; Leeuw and Vaessen, 2009). The methods and data in the mix will complement each other, and will each have different strengths and limitations. This approach helps to account for the reality that, even given access to all potential data sources and extensive evaluation resources, there still would not be direct measures to answer many of the evaluation questions posed in the charge to the IOM. However, when taken together, the totality of evidence will allow the evaluation committee to draw conclusions and make recommendations for the program as a whole.

¹¹ Many of the terms used in the program impact pathway have different meanings in different fields of research. In this report, the terms correspond to definitions that reflect the current consensus in program evaluation. Definitions can be found in the glossary (Appendix D).

Program Impact Pathway

Figure 3 shows the program impact pathway that the planning committee developed to represent a plausible causal chain of results for PEPFAR. The pathway begins with a series of investments or inputs to the program. For PEPFAR, these inputs include not only funding and other resources but also strategic planning, programmatic and policy guidance, technical assistance, and knowledge transfer and research that represent the evolving evidence base. These inputs support activities that provide services and support to children, adolescents, and adults in need. Although these services are described by PEPFAR in categories like prevention, treatment, and care and support, the conceptual framework acknowledges that they are all part of an interrelated and overlapping approach, which also includes activities around gender issues and capacity building. These activities result in outputs that are measurable proximal effects. When the program is implemented well, these outputs are expected to produce outcomes as intermediate effects on the pathway to the ultimate goal of health impact. These intermediate outcomes include, for example, the delivery of high-quality, efficient services that are available and accessible to the targeted populations and that are achieving the intended and appropriate coverage. Other target outcomes include, for example, health systems strengthening; changes in individual risk behavior; and changes in knowledge, norms, and attitudes that affect sexual behavior, stigma, and gender issues. Ultimately, the program is intended to operate through this pathway to contribute to an impact on individual and population health and well-being, including HIV incidence, HIV prevalence, morbidity, and mortality.

Data will not be available to directly measure all of the outcomes and impacts illustrated in the impact pathway, and when available may need to come from sources other than PEPFAR. In some cases, such as assessing effects on incidence, proxy measures or modeling data will have to be used. A critical advantage of the program impact pathway approach is that it identifies the intermediate steps between the inputs invested in the program and the ultimate impact on health. This allows the evaluation to consider not just the beginning and endpoints, but also to assess whether the program is performing in the way it is intended along the full range of its implementation.
Thus, even when it is not possible to assess impact directly, the evaluation committee will be able to state plausible findings about the effects of the program and draw conclusions that provide more refined and useful information about elements of the program that are functioning well or that could be improved in order to result in a greater impact on health.

For each of the programmatic areas that will be assessed in the evaluation, the committee will work from more specific program impact pathways. These are described in Part III of this report, along with illustrative evaluation questions based on the committee's interpretation of its charge to assess PEPFAR's performance and impact. All of the specific program impact pathways are oriented to describe outcomes that contribute to the HIV-related health impacts shown in Figure 3, which represent the stated overall goals of PEPFAR.

FIGURE 3 Program impact pathway for evaluation of PEPFAR's effects on HIV-related health impact for children and adults. In the case of joint PEPFAR and Global Fund programs, some inputs may be provided by the Global Fund.

Although it provides a critical guide for developing evaluation questions, assessing data sources, and selecting methodologies, the program impact pathway is of course a simplified view of PEPFAR programs and their impact. Of particular importance for this evaluation's conceptual framework is the reality that in any country that receives PEPFAR support, the program operates within the context of a wide range of other factors that affect the implementation of the program as well as health outcomes (see Figure 4). Investments from a range of other sources support programs that are aimed at the same desired outcomes, and the proportion of total HIV/AIDS support that is provided by PEPFAR varies from country to country. In some cases, multiple funding sources may be co-mingled to support the same programs. Therefore, changes in population health that can be used to reflect program impact cannot be separated by specific programs or investments. Even individual measures can be difficult to attribute directly, as an individual or household may be receiving different services from different programs funded through different sources, all of which have an impact on the health outcomes of the beneficiary. Health outcomes are also influenced by a wide range of cultural, societal, geographical, and political factors and influences that the program cannot control. In addition, as PEPFAR programs increasingly operate with an emphasis on country ownership and harmonization with national plans, the extent to which central USG guidance and authority can influence all levels of priority-setting, decision making, and implementation can be quite limited. Finally, with a foreign assistance program that is implemented as broadly and on the scale of PEPFAR, there is rarely an appropriate comparison available in order to attribute outcomes to the program based on what would have happened in the absence of the investment.
Therefore, although the ideal goal in a program impact assessment may be to determine to what extent a desired outcome can be attributed directly to the program or policy investment, the realities of a large-scale program such as PEPFAR can make it difficult to determine the extent to which successes or failures in achieving the intended effect can be attributed directly to the program. Thus, the aim of this proposed evaluation approach is not to attempt to determine the direct attribution of PEPFAR funds to health outcomes. Rather, the aim is to assess the plausible contribution of PEPFAR to changes in health impact, both globally and by country, within the landscape of broader funding, programs, and other factors that influence health. This contribution analysis approach is consistent with the guidance given to the committee by congressional staff about expectations for the evaluation (Bressler, 2009; Marsh, 2009). It is also accepted as an appropriate standard for large-scale development assistance programs (Leeuw and Vaessen, 2009). There may be some areas in which attribution can be more readily determined or approximated, as in the direct relationship at the first step of the impact pathway between inputs and the activities they support, or in the case of controlled experimental studies that assess the effects of intervention components that are distinct to PEPFAR, or in countries where PEPFAR is or has been the nearly exclusive funder of all national HIV/AIDS activities. If feasible, when these opportunities arise, the evaluation committee will consider whether a finding of attribution may be plausible.

FIGURE 4 Context for PEPFAR program implementation. NOTES: M&E = monitoring and evaluation; USG = U.S. government; NGOs = non-governmental organizations.

OVERARCHING EVALUATION CHALLENGES AND LIMITATIONS

There are a number of overarching challenges to carrying out this evaluation. These are described here, while more specific challenges and limitations are described in more detail in the following sections on evaluation methodologies and data sources as well as in the subsequent sections in Part III of this report that address specific components of the evaluation.

One of the primary challenges to the evaluation is that there are limited data to address health impact and other evaluation questions about the whole of the PEPFAR program. Therefore, many of the evaluation questions will require additional sources of data, analytical approaches, and methodologies. In the mixed methods approach described in this evaluation plan, some limitations with readily obtained impact data may be overcome by using other proposed comparison methodologies, ancillary studies, key informant interviews, and site visits. These methods are described in more detail in the subsequent sections on evaluation methodologies and data sources. The type of methodology and analytical approach used to answer specific questions requested by Congress will differ depending on the rigor and feasibility of collecting existing data or the feasibility of gathering new information, as the committee intends to do during country visits and during interviews with OGAC staff, implementers, and other key stakeholders that will occur outside of country visits. This poses limitations on the evaluation approach and on the interpretation of the findings.

There are indicators that are reported centrally to OGAC across the entire PEPFAR program; however, these provide only limited answers to the evaluation charge.
Although data from within PEPFAR that go beyond the centrally-reported indicators may be available, a preliminary scan of sources revealed that these data will have to come from disparate sources that are not currently catalogued or coordinated. These data sources, such as recommended indicators not reported to OGAC and data collected by the major USG implementing agencies and other implementation partners, are not managed through a process that allows for easy cataloguing of what is available from whom. Therefore, accessing the data to answer many of the evaluation questions will require a significant data-mapping and data-gathering effort that adds to the resource requirements of the evaluation. Requests from the IOM will also likely impose a burden of time and resources on staff at OGAC and other implementing agencies as well as on country teams and implementing partners; this introduces a dependence on the timely efforts of many different actors who, in many cases, are already overburdened. Another possible source is data that are collected through other multilateral organizations and can be made available to the IOM. However, some of these data, as well as data from OGAC and implementing partners, are not already analyzed in a way that answers the questions posed by this evaluation in response to the congressional mandate. Therefore, to make full use of these data would require new analyses. In addition to questions requiring new analyses, some questions would require developing and using new data collection tools, which may or may not be feasible during the time period of the evaluation. Each of these approaches could enhance the quality of the evaluation, but each will also require a greater investment of resources than restricting the evaluation to existing data and analyses. 
Another important limitation to note is that most of the additional data sources will not be PEPFAR-wide data or population data but will instead be country-specific, program-specific, or component-specific. When data are not collected systematically across all PEPFAR countries, this will limit the ability to generalize findings to the whole of the program. There is also considerable heterogeneity in the implementation of PEPFAR across different countries and programs, which limits the ability to generalize findings. However, country-specific, program-specific, or component-specific data can nonetheless be highly informative and, if interpreted with care, both commonalities and differences across countries can inform conclusions that contribute to an understanding of the performance and impact of PEPFAR as a whole.

In addition, it is important to note that, although substantial progress has been made in recent years to harmonize data collection, variability in the definitions of indicators and in the quality of available data will also be constraints to both the summary of findings and comparison of findings across countries, even when similar data are collected across multiple countries. This is especially true in some critical areas where consensus indicators have yet to be developed, such as health systems strengthening, integration of services, and country ownership.

Finally, there are also evaluation questions the committee will consider that will simply not be possible to answer in the forthcoming evaluation period. As part of the final evaluation report, questions of this kind may be discussed if they are found to be important for future ongoing evaluation, along with suggestions for how to develop the means to answer them.

EVALUATION METHODOLOGIES

There are a number of alternative design and analysis techniques that could, in theory, be considered for inclusion in a mixed methods approach to the evaluation of the impact of the PEPFAR program. Each method has different requirements and each has its own strengths and weaknesses in terms of the types of evidence it brings to bear on a program evaluation. The range of methodological options in two main categories, comparison approaches and modeling, is described here as background information with a discussion of their relative advantages and disadvantages.
The methods the committee found to have the potential to be feasible and relevant for this evaluation are then described in more detail, along with the potential sources of data for the evaluation.

Comparison Approaches

One basic question is of paramount importance to decision makers in evaluating the effects of a program: what would have happened if the program had not been provided? This is often referred to as the "counter-factual." Answering this question requires an experimental design or analytical method that allows a comparison to be made over time between a group that receives the program and a group that does not. Ideally, these two groups would be similar enough to each other on key parameters that any difference in outcome would be attributable to the program itself rather than to other differences between the groups. There are a number of potential methodologies that could be considered to allow for these comparisons to be made.

In medical science, the randomized controlled trial is widely accepted as the "gold standard" in determining the effects of an intervention or in comparing one intervention to another (for example, to answer the question of whether one drug is better than another for treating an illness or preventing a medical outcome). The advantage of random assignment to intervention and comparison groups is that it should randomly distribute different characteristics between the two groups, thereby reducing the concern that pre-existing differences between the two groups, rather than the effect of the intervention, might account for the difference in outcomes. Randomization typically occurs at the level of the individual. However, in what is known as a cluster-randomized trial, randomization can also occur at a broader level, such as clinics, schools, or neighborhoods.

Randomized trials can provide useful evidence to evaluate components of a large-scale program, including not only interventions but also implementation questions or service delivery models. Indeed, some components provided within PEPFAR's activities have previously been evaluated for effectiveness using randomized trials, including ARV drugs and other pharmaceutical treatments as well as other interventions. For example, male circumcision as an intervention to reduce the transmission of HIV among young adults has been evaluated in three separate studies in three sub-Saharan countries (Auvert et al., 2005; Bailey et al., 2007; Gray et al., 2007). Behavioral interventions for HIV prevention have also been evaluated using randomized trials conducted in a variety of venues with a diversity of populations in sub-Saharan Africa (Cornman et al., 2008; Jemmott et al., in press; Jewkes et al., 2008; Kalichman et al., 2008; Stanton et al., 1998).

However, random assignment methodologies are not widely used to evaluate the effects of an entire large-scale, multi-component program because they require special conditions that cannot reasonably be met. In fact, it is difficult, and sometimes unethical, to have an appropriate control group that would comply with the parameters required in a randomized trial. First, when evaluating the delivery of proven interventions, control participants must be provided with the accepted standard of care because ethical considerations prevent withholding known effective treatments. This can lessen the difference between what the intervention and control groups receive, which can make it statistically difficult to detect an intervention effect, even with large and correspondingly expensive sample sizes.
Second, it is not possible to "mask" the intervention for participants or to keep the control communities or participants "blinded" as to their experimental condition. In addition, random sampling is difficult to achieve in a large, widely implemented program, and the outcomes for any comparison group are likely to be affected by secular effects or by interventions or services provided through other programs. Finally, program interventions may also take a long time to have the desired effect, and it can be costly and impractical to maintain a trial for the required duration. Therefore, a randomized trial is often not considered feasible or appropriate for evaluating large-scale programs where the goal is to provide whole communities or entire districts with multiple, new services in the hopes of improving health outcomes.

There are alternative comparison methodologies that can be more feasible for evaluating large-scale programs, although challenges such as the timeframe of the trial and secular effects still apply. In a variation of a randomized evaluation approach, programs can be implemented in a phased rollout to different groups or areas over time, with the order of rollout randomly assigned. Although everyone eventually receives the intervention, phasing in the program in this way allows comparisons of outcome data in those who receive the program first against a control group made up of those not yet receiving the program (Hussey and Hughes, 2007). Conceivably, this design could be undertaken by countries that might, for example, choose to implement a new intervention or a new delivery mechanism in place of an established intervention district by district, allowing an evaluation of the impact on health outcomes.

Quasi-experimental studies also offer alternatives for assessing what would have happened in the absence of an intervention.
These designs do not involve random assignment to intervention and control groups, but can allow comparisons between groups using a variety of approaches to control for differences between nonequivalent groups. One approach is to compare two groups or communities that are not randomized, where one is served by the program while the other is not. These can be planned comparison groups or can sometimes occur as "natural experiments" that serve as a variation on the phased rollout design described above. These occur when programmatic or policy changes are phased in. Data collected on outcomes in the groups or areas where the changes were implemented earlier versus later can produce informative evaluation data. Ideally, these approaches would be designed to assess both the intervention and comparison groups before and after the program is delivered and then to analyze the "difference-in-differences" between the two groups.

An alternative to this design is a "post-test" only comparison group. This is a comparison of similar groups in which the group that receives the intervention is measured before and after the intervention and is then compared at the "post-test" period to a similar group that had no exposure to the intervention. This approach can provide some valuable information to determine whether differences in outcomes can plausibly be associated with exposure to the intervention.

In these comparisons, the concern in interpreting the results is that differences in outcomes may not be due to the program itself, but rather to other observed or unobserved differences between those who did and those who did not receive the intervention. To some extent, this concern can be addressed by matching the individuals or communities being compared on important characteristics that may affect the targeted outcomes, or by statistically accounting for any differences in these characteristics in the analysis of the data. However, this approach can be compromised by the possible influence of and inability to control for unobserved characteristics that may affect the outcomes.

When no appropriate comparison group is available, comparisons can also be made using before-after studies (also known as "pre-test/post-test").
These compare outcomes or indicators before a program was established to the same measures in the same group after the program was implemented. These studies are observational and not experimental in nature, but can allow the detection of significant change after the intervention is given. The disadvantage to this approach is the difficulty in controlling for other factors that may have coincided with the implementation of the program and may have caused or contributed to the observed changes. Data analysis of trends over time offers a similar method to inform an evaluation of the outcomes of a program across a range of metrics. If the trends in program inputs are associated with program activities on the ground and with later shifts in outcomes, it can be reasonable to conclude that the intervention program was plausibly important in changing the outcomes. However, this method requires that data collection be repeated reliably over time, which is often not the case for many data sources. The challenge for evaluating a program such as PEPFAR, which is widely implemented at the national level across many countries, is that it can be very difficult to identify an appropriate comparison or control to use in the kinds of comparison approaches described previously. In addition, the ideal for comparison approaches would be to use a prospective design, in which data for both intervention and comparison groups are collected from the beginning of the evaluation. However, this is an evaluation that extends back in time to the implementation of a program well before the starting point of the evaluation. 
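As a worked illustration of the difference-in-differences comparison described earlier in this section, the following minimal Python sketch computes the estimate from before and after measurements in a program group and a comparison group. The function name and all indicator values are hypothetical and chosen only for illustration; they are not PEPFAR data, and the sketch leaves out matching, covariate adjustment, and any estimate of uncertainty.

```python
from statistics import mean


def difference_in_differences(program_before, program_after,
                              comparison_before, comparison_after):
    """Return a simple difference-in-differences estimate.

    Each argument is a list of outcome values (for example, an indicator
    measured in several districts) for one group in one period.
    """
    # Change over time in the group that received the program
    program_change = mean(program_after) - mean(program_before)
    # Change over time in the comparison group (captures secular trends)
    comparison_change = mean(comparison_after) - mean(comparison_before)
    # The estimate is the excess change observed in the program group
    return program_change - comparison_change


# Hypothetical indicator values (percent coverage), for illustration only
program_before = [20.0, 22.0, 18.0]
program_after = [35.0, 37.0, 33.0]
comparison_before = [21.0, 19.0, 20.0]
comparison_after = [26.0, 24.0, 25.0]

did_estimate = difference_in_differences(
    program_before, program_after, comparison_before, comparison_after
)
print(f"Difference-in-differences estimate: {did_estimate:.1f} points")
```

An estimate of this kind is only as credible as the assumption that the two groups would have followed similar trends without the program, which is the concern about unobserved differences noted above.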
In addition, it is not feasible for the evaluation committee to mandate complex intervention and evaluation designs or new data collection in order to make prospective comparisons during the time period for this evaluation, although the committee will consider the outcomes of any such studies of PEPFAR program components if the findings are available by the end of the evaluation period. Instead, for most of the questions in this evaluation, comparisons can only be made retrospectively. This leads to the limitation that the evaluation committee will only have access to data that were already collected or are already being collected and, as a result, the available data may not be sufficient to answer all of the desired questions.

Modeling

Methods employing mathematical and statistical models are another means of assessing health interventions and evaluating outcomes when an ideal experimental design with primary data collection is not practical or feasible. Analyses using these methods synthesize data from several sources to estimate the probable impact of different strategies. They can offer the advantages of analyzing different scenarios and projecting outcomes far into the future, both of which would be difficult to assess in a trial or in the field (Bertozzi, 2006; Garnett, 2002). However, models are dependent on the availability of accurate data sources, key input parameters, and epidemiological assumptions about disease transmission (Garnett, 2002). Many countries lack the necessary data to create a baseline from which future scenarios can be projected. Additionally, models rely heavily on published literature for impact parameters, which may overestimate the actual impact by focusing on high-quality, successful interventions (Forsythe et al., 2009). Therefore the reliability and utility of current modeling approaches need to be carefully assessed.

Countries, donors, and other global stakeholders use existing mathematical models to determine resource needs for HIV/AIDS programs and to make evidence-based decisions about future programming with the aim of achieving sustainability. These models require various inputs such as demographic, epidemiological, and financing data, and make various assumptions about the progression of disease. Thus, the validity of projections is conditional on the quality and completeness of existing data and the accuracy of the assumptions inherent to the model (Garnett, 2002). Until recently, PEPFAR focused on costing and modeling for ART (Holmes, 2009).
However, PEPFAR has identified a continuum of costing and modeling activities and is transitioning from simply costing and forecasting ARV needs to modeling the cost of service delivery by measuring all inputs, including personnel, investments, and overheads. PEPFAR also plans to undertake comprehensive treatment costing and modeling of national program scenarios and is expanding costing exercises to include care and prevention. There are several models currently in use to estimate resource needs for interventions that the committee can consider as data sources, including the ART Cost Project (Levine, 2010), the HIV/AIDS Program Sustainability Analysis Tool (HAPSAT) (Dutta and Fleisher, 2008), and Spectrum Policy Modeling System (Constella Group LLC, 2008; Holmes, 2009).

In addition to modeling of costs and resource needs, mathematical modeling to estimate infections averted is of particular relevance because HIV incidence is difficult to measure directly. Three modeling approaches, coverage-based, behavior-based, and disease-modeling-based, have been proposed for estimating the number of HIV infections averted from intervention programs (Heaton et al., 2008). The coverage-based approach relies on an estimate of the efficacy of the intervention on incident HIV infection. By incorporating coverage information associated with the intervention (e.g., numbers of persons receiving the intervention), estimates are produced through models of the number of infections averted. Two critical inputs (the coverage and the relative risk) are important sources of uncertainty with this approach (Heaton et al., 2008). A second approach, the behavior-based approach, relies on a model that describes how HIV infection is mediated by behavior. Two critical inputs to this approach are evidence of the effects of behavior change on incident HIV infection and the change in prevalence of the high-risk behaviors resulting from the intervention. A key limitation of this method is the lack of reliable behavioral data in many developing countries.

The third approach, the disease modeling approach, is based on a comparison of observed HIV incidence trends with the expected or baseline HIV incidence trends. However, few countries have been able to collect true population-level incidence data and there have been difficulties with measuring incidence using measures such as BED immunoassays 12 (Hallett et al., 2009; Murphy and Parry, 2008). It has also been suggested that, as an alternative, HIV prevalence trends among young persons (ages 15–24 years) can be assessed as an approximation of incidence (UNAIDS and WHO, 2009). Likewise, some have found that serial cross-sectional prevalence data may be used to estimate general-population incidence by age (Hallett et al., 2008). Indirect strategies for estimating HIV incidence include models, such as the Estimation and Projection Package and the Spectrum software, developed at UNAIDS, that have been used by some researchers to predict HIV prevalence. Comparisons of the observed trends with the modeled or expected trends have been used to estimate infections averted.

PEPFAR's estimates of the number of infections averted in partner countries are produced by the U.S. Census Bureau (PEPFAR, 2010a). The Census Bureau model (known as RUPHIVAIDS) follows a disease-modeling approach in which expected or baseline HIV incidence estimates are developed with data prior to 2005 and compared to re-estimated trends in HIV incidence from new surveillance data available after 2004. The difference in the number of new infections, based on this comparison approach, is used as the number of infections averted.
The model incorporates estimates of HIV prevalence from the Estimation and Projection Package to project HIV incidence and applies various assumptions in relation to sex distribution of HIV infection, sex ratios of new infections, rate of mother-to-child transmission, and disease progression as recommended by the UNAIDS Reference Group on Estimates, Modelling and Projections (U.S. Census Bureau, 2009).

Modeling of infections averted can also be used to specifically measure the impact of PMTCT on vertical (mother-to-child) transmission of HIV. For example, both HAPSAT and the Spectrum Suite can be used to project the number of infections averted through ART, ARV prophylaxis, and improved breastfeeding techniques (Constella Group LLC, 2008; Dutta and Fleisher, 2008). These estimates are limited by the quality and availability of data regarding the percent of women with access to PMTCT who agree to be tested, the percent of women found to be HIV-positive who accept ARV prophylaxis and/or substitute feeding, and the percent reduction in the rate of mother-to-child transmission with prophylactic treatment and/or substitute feeding (Resch et al., 2009; UNAIDS, 2009b).

It will not be feasible for the evaluation committee to conduct new mathematical modeling. However, the committee will consider the strengths and weaknesses of existing modeling efforts, such as those described, and will assess the reliability of the estimates they provide in the two areas that are relevant to PEPFAR: (1) the cost of HIV/AIDS interventions and the resources necessary to scale up and maintain programs and (2) infections averted. Where deemed appropriate, these estimates will be used as a source of data for the committee's assessment of PEPFAR's outcomes and impact.
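To make the arithmetic behind two of the infections-averted approaches concrete, the following Python sketch shows one simplified reading of the coverage-based approach (coverage combined with a relative risk) and of the disease-modeling approach (baseline trend minus observed trend). The formulas, function names, and numbers are assumptions made for illustration; they are not the specifications used by Heaton et al. (2008) or by the Census Bureau model, both of which involve far more detailed inputs.

```python
def averted_coverage_based(persons_reached, baseline_annual_risk,
                           relative_risk, years=1.0):
    """Simplified coverage-based estimate of infections averted.

    Assumes each person reached would otherwise face a constant annual
    risk of infection, and that the intervention multiplies that risk by
    `relative_risk` (a value between 0 and 1). Hypothetical formula.
    """
    expected_without_intervention = persons_reached * baseline_annual_risk * years
    return expected_without_intervention * (1.0 - relative_risk)


def averted_disease_modeling(baseline_new_infections, observed_new_infections):
    """Simplified disease-modeling estimate of infections averted.

    Sums, year by year, the gap between the expected (baseline) number of
    new infections and the re-estimated (observed) number.
    """
    if len(baseline_new_infections) != len(observed_new_infections):
        raise ValueError("both series must cover the same years")
    return sum(
        expected - observed
        for expected, observed in zip(baseline_new_infections,
                                      observed_new_infections)
    )


# Hypothetical inputs, for illustration only
coverage_estimate = averted_coverage_based(
    persons_reached=10_000, baseline_annual_risk=0.02, relative_risk=0.4
)
trend_estimate = averted_disease_modeling(
    baseline_new_infections=[1000, 1050, 1100],
    observed_new_infections=[950, 900, 880],
)
print(coverage_estimate, trend_estimate)
```

In both cases the result moves directly with the uncertain inputs: the coverage count and relative risk in the first function, and the assumed baseline trend in the second. This is why the text treats the reliability of those inputs as the central question.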
12 The BED-CEIA (HIV-1 subtype B, CRF_01AE, and subtype D–Capture Enzyme Immunoassay) is a commercially available product designed specifically for the purpose of identifying HIV-1 infections that were recently acquired, using the three specific peptides to cover much of the extent of antigenic diversity to overcome some of the subtype differences associated with the "detuned" assays (Murphy and Parry, 2008).

Application of Methodologies in the PEPFAR Evaluation

Given these methodological design considerations, the evaluation will employ a mix of quantitative and qualitative methods, including trend analysis and retrospective comparison approaches using quantitative analysis of key indicators, document review, mapping of resources, policy analysis, benchmarking of outputs and outcomes against stated goals, site visits, and primary data collection through structured interviews and town hall meetings with key informants. Where feasible, methods will be applied and data will be gathered at the level of the whole of the program or for all of the PEPFAR partner countries. However, so that findings are not limited by the constraints on consistent data availability across the whole of the program and all of the countries, the committee will also identify countries, programmatic areas, or intervention components implemented within PEPFAR for which a methodology cannot feasibly be applied consistently for all PEPFAR countries, or for which data may not be available for all countries, but where sufficient data can be gathered to allow in-depth studies to assess effects, including outcomes and impact. These in-depth studies may be specific to individual PEPFAR countries or, in some instances, a multi-country analysis may be feasible. For example, some prevention and treatment activities have been subject to multiple evaluations that have been conducted on a smaller scale than the whole of PEPFAR, but when reviewed in depth will nonetheless serve to inform the committee's overall findings.
By applying this mix of methods and layers of investigation and analysis using a range of available primary and secondary data sources (see Boxes 2a–d in the section that follows), the committee will arrive at findings that can be triangulated to draw conclusions about the performance and impact of PEPFAR even when any one data source is not sufficient or any one methodological approach is not feasible.

PEPFAR Country Studies

Given the limitations on data that can be gathered at the level of the whole of the program, a major area of focus for the evaluation will be country studies. A country-by-country approach offers a potentially rich source of evaluation data. Country studies will be used by the evaluation committee to assess progress of the program at the country level against targets set in national AIDS strategies and in the PEPFAR COP. Country studies are also an opportunity to conduct time-series and trend analyses that compare outcomes before and after PEPFAR programs were implemented or before and after changes in PEPFAR programs were introduced within a country. In addition, although there will be limits to aggregating or generalizing country-specific findings due to the heterogeneity across PEPFAR countries, country-by-country assessments can, if interpreted carefully, contribute to conclusions about the performance and impact of PEPFAR as a whole. Country studies will also provide the necessary data collection to make the cross-country comparisons described later.

The main component of these country studies will be country data sets that will be compiled for each of the current PEPFAR partner countries using key indicators gathered from OGAC and other available data sources as well as document review (see Boxes 2a–d). To the extent possible, local experts, governments, organizations, and implementing partners at the country level will be engaged in the evaluation process.
The committee will solicit their assistance in determining the availability and quality of country-specific and program-specific data sources and will seek to collect data, including data from national health information systems, as well as data analyses developed and conducted by local experts. The planning committee strongly endorses the principle of engaging country-level partners and considers this a potentially important component of the evaluation, which would also be consistent with PEPFAR's goals to strengthen country capacity to monitor as well as manage the AIDS epidemic (OGAC, 2009g). However, the committee also recognizes that additional requests for data and analysis can place a significant burden on local partners that is not accounted for in current planning or budgeting. Therefore, the committee will seek to maximize engagement within the bounds of what will realistically be feasible. Pilot country visits during the operational planning phase will provide an opportunity to explore and assess the feasibility of engaging country experts in the evaluation process.

Country data sets

The committee will identify and request relevant documents and other data sources to build country data sets for each country and to perform content analysis. These will include key indicators from PEPFAR and other multilateral agencies, as well as national M&E systems, data extracted from country team and national planning and reporting documents, and available data from the published literature, grey literature, and prior evaluations. Sources of data that will inform these country data sets are listed in more detail in Boxes 2a–d. In addition to informing data sets across all countries, this document review will help the committee identify countries and programmatic areas for in-depth studies.

Country timelines

A country timeline will be developed for each of PEPFAR's current partner countries to provide an overview for the past 12 years of major events related to the epidemic and HIV/AIDS programs and to map the timing of the availability of key data sources (see Appendix E for example timelines).
These country timelines will serve as a multi-purpose evaluation tool. First, they will inform the analysis and interpretation of trends and longitudinal data for some of the outcome and impact indicators. For example, they will illustrate which countries have surveillance data that have been repeated before and after the implementation of PEPFAR and which countries have had policy changes or changes to their national health systems since PEPFAR activities began. In addition, they will map the presence and timing of other contextual factors that may affect the interpretation of PEPFAR's contribution to health impact. Lastly, they will provide a snapshot of the data available at the country level, which can be used to select and design different analyses (e.g., comparison studies) and in-depth studies (e.g., country visits).

The main types of information that will be gathered to build the country timelines are PEPFAR activities, the major activities and investments of other donors, and country- and global-level policy information that would be expected to have an impact on the countries' response to the HIV epidemic. This will be overlaid with information on recent and past data availability as well as new data anticipated to become available during the evaluation timeframe, especially population-based surveys and other HIV-related surveillance data. The information will be drawn from a number of sources, concurrent with the process for country data sets described above (see Boxes 2a–d). These sources will include published literature and PEPFAR-related documents from OGAC, U.S. agencies, and other PEPFAR implementing partners, as well as country-specific global stakeholder reports and other external evaluations. Additionally, global health or health policy media outlets will be used as a source of information on events related to the HIV/AIDS epidemic in each country.

Country visits

While some in-depth studies will be conducted through document review, the evaluation committee will also visit select countries for in-depth studies. These country visits will serve a number of goals. A primary purpose of the country visits will be to obtain qualitative data, including semi-structured interviews with key informants and observational data. This is particularly critical for the committee to collect information on process questions related to the implementation of PEPFAR programs, including barriers to implementation, harmonization with national plans, and indirect or unintended effects as observed by local authorities and implementing partners. A main focus will be to gather information on whether the program is implemented according to PEPFAR guidance, with an emphasis on the new PEPFAR II Five-Year Strategic Plan for transitioning to a sustainable response. This will include an assessment of progress toward goals such as country ownership, capacity building, health systems strengthening, resources, transparency and accountability, and other program characteristics deemed essential for a sustainable response. Country visits can also inform the committee's assessment of progress toward other program targets for which limited quantitative data may be available, such as implementation of strategies to address gender issues and to reach vulnerable populations. These questions will not be readily addressed through existing data sources and are dependent on observations of the program made in context.

The information gathered from these country visits will also inform the interpretation of quantitative data by providing context for baseline country characteristics and trends in the epidemic, for the heterogeneity of the implementation of PEPFAR across countries, and for assessments of data quality.
In some cases, the country visits will also allow the committee to collect additional locally-available quantitative data. Additionally, the country visits are an opportunity to obtain country-specific information on funding flows, on the costs of interventions, and on efforts and outcomes in PEPFAR's programmatic areas.

Selection of countries for country visits

Due to limited resources and time, it will not be feasible to conduct country visits to all of the current PEPFAR countries. Instead, the committee will use purposive sampling to select a subset of countries that represent a diverse range on a number of key characteristics. These will include but will not be limited to: (1) the types of interventions and activities implemented with PEPFAR funding, (2) the operational infrastructure (especially the distribution pattern of funding), (3) the size of country, (4) the size of financial inputs from PEPFAR and other sources, (5) the length of time PEPFAR programs have been in place, (6) epidemic trends and epidemic type (concentrated versus generalized), and (7) country income level. The data to be collected will be further defined based on the country case study framework and key evaluation questions that are best addressed during a country site visit.

Country visit process and standardization

The following describes the strategy that will be used for data collection and standardization for country visits. The operational planning phase will include pilot testing and refining of these methods. As an independent and neutral third party, the IOM country visit teams are expected to be reasonably able to elicit candid information from key informants. To help encourage this candor, the committee will not attribute comments, examples, or findings to specific informants without express permission.

1. Preparation for country visits:
   - Develop the country visit case study framework and data analysis plan to ensure consistency in methodology across countries to allow for comparisons linked to key evaluation questions
   - Review country-specific background information to generate hypotheses and determine questions to be probed in-country
   - Review of PEPFAR and national indicator data
   - Document review and preparation of country timelines (described previously)
   - Develop interview questions for different categories of interviewees (i.e., PEPFAR agency, PEPFAR implementer, PEPFAR beneficiary; country partner, international partner)
   - Develop framework and criteria for observational data collection
   - Develop and prepare a country visit qualitative data collection toolkit for use by IOM evaluation teams on country visits
   - Train committee members and staff on data collection tools and data reporting methods
   - Agree on a date for the country visit, identify interviewees for each country, and schedule interview and site visit appointments

2. Process for country visits (see Appendix F for details):
   - Time frame: January through August 2011
   - Duration: 7–14 days (depending on size of country, size of program, and what evaluation questions are relevant for each country study)
   - Number of countries: 12–15
   - Investigation teams: 3 IOM committee members, 2 IOM staff, and consultants and contractors as needed
   - Stakeholders to interview:
     - PEPFAR: U.S. Ambassador, CDC, USAID, PEPFAR implementing partners, program beneficiaries
     - Partner country stakeholders: National AIDS Commission; Ministries of Health, Finance, and other relevant ministries; civil society representatives; other relevant local stakeholders
     - Country-level representatives of other external programs: UNAIDS, other United Nations agencies, Global Fund Country Coordinating Mechanism members, Global Fund recipients, World Bank, other bilateral donors

3. Post country site visit follow-up activities:
   - Country visit write up (within 4 weeks after visit)
   - On-going analyses of quantitative and qualitative data

Comparisons Among Countries Within PEPFAR

In addition to country-by-country studies, the evaluation committee will consider conducting comparisons among countries within PEPFAR after weighing the feasibility of this approach based on data mapping during the operational planning period. The goal of this comparison approach would be to determine if changes in key indicators in a PEPFAR country are associated with variables such as the timing of the introduction of PEPFAR into that country; the scale of the PEPFAR presence in that country (as measured, for example, by the extent of funding and activities); or the operational infrastructure (including, for example, how PEPFAR funding is distributed among implementing partners or the extent to which PEPFAR activities are parallel to or integrated within public sector health services).

One approach to implementing such an analysis would begin by assembling a database on changes in key indicators over time in each PEPFAR country. The dependent variables would be the relative or percentage changes in key indicators over defined time intervals. Adjusted analyses could then be performed that correlate those changes in key indicators with explanatory variables such as the duration of time PEPFAR had been present in the country, the cumulative PEPFAR investment in the country up to that point in time, and differing PEPFAR implementation strategies. The committee recognizes that there are critical differences among PEPFAR countries with respect to demographics, social and economic factors, and the epidemiology of the epidemic that must be taken into account in these analyses. Accordingly, adjusted analyses must be performed that carefully consider and account for such confounding factors. To determine the feasibility of this approach, the committee will assemble a database of country-level variables thought to be related to the propagation of the epidemic and consider their use in adjusted analyses of comparisons among PEPFAR countries. A major limitation of this approach is the uncertainty in some of the key benchmark indicators, including HIV prevalence and HIV-related deaths. The committee also recognizes the limitations of these analyses for inferring causation based on associations.

An additional approach for comparison analyses that the committee will consider will be to compare sub-units within PEPFAR countries that receive different levels of PEPFAR investment or where different types of PEPFAR activities have been implemented.
This alternative may allow for comparisons among groups that are more similar in baseline characteristics and available data, although there may still be limitations due to regional differences in demographics, social and economic factors, the epidemiology of the epidemic, and availability of the appropriate data.

Comparisons Between PEPFAR and Non-PEPFAR Countries

Another approach that will be considered for the evaluation is based on comparisons of PEPFAR to non-PEPFAR countries with respect to key indicators.¹³ The operational planning for the evaluation will allow time to gather the data necessary to fully review the utility of this approach in light of the limitations described below. The evaluation committee will assess the strength of the evidence about the effectiveness of PEPFAR that can be gleaned from comparisons of PEPFAR to non-PEPFAR countries and will only conduct these comparisons if deemed appropriate.

¹³ A comparison of 12 PEPFAR focus countries with generalized epidemics in Africa to 29 control countries was recently published (Bendavid and Bhattacharya, 2009). This correlational analysis, using UNAIDS data as the source for outcomes indicators, showed a significantly more rapid decrease in the rate of deaths due to HIV/AIDS in the PEPFAR focus countries than in non-PEPFAR countries during the period of PEPFAR activities between 2004 and 2007. The authors noted the difficulties of generalizing the findings to other countries and other time periods because of the non-random sampling of the comparison groups. There were baseline differences between the intervention and control groups in variables such as population, adult HIV prevalence, gross domestic product, aid targeted to HIV/AIDS from other donor sources, and World Bank indicators of governance. The authors reported that adjusted analyses to account for these variables did not change the significance or direction of the reported findings, which were the results of unadjusted analyses.

The committee recognizes that PEPFAR focus countries were not chosen randomly. Therefore, there are important differences between PEPFAR focus countries and non-PEPFAR countries that must be accounted for if a comparison approach is to be valid. These differences relate to economic, political, and health factors; population sizes; the stage of the epidemic; and available infrastructure and capacity prior to the introduction of PEPFAR. In addition, many countries where PEPFAR has not been implemented may have implemented similar interventions to achieve the same objective through programs with support from other external or national funding sources. When this is the case, comparisons cannot evaluate the presence or absence of the intervention activities supported by PEPFAR per se, but rather the implementation and delivery strategy used by PEPFAR. As a result, the approach of comparing PEPFAR to non-PEPFAR countries would require special care in implementation, analysis, and interpretation.

For this kind of comparison approach to be useful for this evaluation, it would be critical to identify control countries that can be suitably compared to PEPFAR countries. The control countries would be selected from the same geographic regions as the PEPFAR countries (e.g., sub-Saharan Africa, Asia, the Russian Federation or Eurasia, and the Caribbean). The evaluation committee's work would begin by assembling a database of baseline country-level variables in both PEPFAR and non-PEPFAR countries that might relate to the course of the epidemic. The committee will also document investments in HIV activities from country governments and external donors. The validity of using this comparison approach to draw reliable inferences about the effects of PEPFAR will depend on whether the analyses can be adequately adjusted to make fair comparisons between PEPFAR and the candidate non-PEPFAR control countries. Adjusted analyses that statistically control for differences will be considered. The dependent variables in the adjusted analyses will be the relative or percentage changes in key indicators before and after introduction of PEPFAR.
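To make the idea of an adjusted analysis of before-after percentage changes concrete, the following is a minimal sketch, not the committee's method. It assumes ordinary least squares as the adjustment technique (the report does not name one), a single baseline covariate, and invented country records; the function names `percent_change`, `solve`, and `ols` are hypothetical.

```python
from typing import List, Sequence


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    if before == 0:
        raise ValueError("baseline value must be non-zero")
    return 100.0 * (after - before) / before


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariates are collinear; cannot fit the model")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(covariates: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations; returns [intercept, slopes...]."""
    design = [[1.0] + list(row) for row in covariates]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical records: (PEPFAR flag, baseline adult HIV prevalence in %,
# indicator before, indicator after). All values are invented for illustration.
records = [
    (1, 10.0, 100.0, 97.0),
    (1, 20.0, 100.0, 102.0),
    (0, 10.0, 100.0, 107.0),
    (0, 30.0, 100.0, 117.0),
    (1, 5.0, 100.0, 94.5),
    (0, 15.0, 100.0, 109.5),
]

# Dependent variable: before-after percentage change in the indicator.
changes = [percent_change(before, after) for _, _, before, after in records]
covariates = [(flag, prevalence) for flag, prevalence, _, _ in records]
intercept, pepfar_effect, prevalence_slope = ols(covariates, changes)
print(f"adjusted PEPFAR association: {pepfar_effect:.2f} percentage points")
```

The invented data follow an exact linear relationship, so the fitted coefficient on the PEPFAR flag comes out at -10 percentage points after adjusting for baseline prevalence. Real indicators would carry the measurement uncertainty discussed in the text, and an association of this kind would not by itself establish causation.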
The before-after percentage changes in PEPFAR countries will be compared to non-PEPFAR countries, adjusting for differences in baseline variables and taking into account HIV activities supported from other sources. As with the comparisons among countries within PEPFAR, a major limitation is that there are important uncertainties in some of the key benchmark indicators used as the dependent variables, such as HIV prevalence and numbers of HIV-related deaths. In addition, there are a number of measures of interest for this evaluation for which data are not collected across PEPFAR and non-PEPFAR countries, which would limit the scope of this approach in addressing many of the evaluation questions drawn from the statement of task.

DATA SOURCES AND ANALYSIS

The extent to which the goals of this evaluation can be met depends on the availability of relevant and timely data. As described previously, most evaluation questions will require the evaluation committee to draw on data that go beyond the indicators that are reported centrally to OGAC. These data will have to come from a range of disparate sources. The availability of these data will partly depend on the feasibility of access within the timeframe of the evaluation. There will also be challenges of sampling and interpretation due to heterogeneous data sources with different data collection systems and criteria, as well as the potential for reporting bias in the responses to data requests from the committee. The approach for collecting and assessing data that could be used for the evaluation is described here and in the subsequent section on the workplan for the evaluation.

Mapping of Data Sources

The time and resources available for the planning phase did not allow for a complete mapping of all currently available and anticipated data sources in time for this report. In the operational planning period, the IOM staff, under the guidance of the planning committee, will continue an extensive data-mapping effort, expanding on the preliminary scan of data sources conducted during this strategic planning phase. The mapping will occur through document review, informant interviews, information obtained from domestic and international data requests, and qualitative methods used during three pilot country visits. The timing of this evaluation, with a final report to be delivered in 2012, dictates that the committee will be considering only data that are or will become available to the committee through 2011.

This mapping will determine what data are available for each of the PEPFAR countries, providing the evaluation committee with a data matrix similar to the template that can be found in Appendix G. Some of these data sources will also be mapped for non-PEPFAR countries to inform the feasibility of the comparison approaches described earlier; these approaches would rely on the availability of data from these countries, and on the willingness and capacity of stakeholders in non-PEPFAR countries to participate in country visits and other data-gathering requests. This mapping of available data will also include an assessment of the feasibility of collecting data from each source, taking into consideration the burden that additional data requests would place on each source's resources and staff time. In addition, this data mapping will assess whether data from each source would require new data analysis in order to answer the evaluation questions posed by the committee.

The categories and some examples of available data sources that will be mapped and, if available, used for the evaluation are listed in Boxes 2a–d.
These include central USG data sources, data from multilateral organizations, country-level data from both PEPFAR and other sources, and data from additional sources, which may be from single countries or multiple countries. The applicability of specific data sources to address illustrative evaluation questions in some specific programmatic areas will be discussed in the subsequent sections of this report.
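As a rough illustration of what a country-by-source data matrix might look like in practice, the sketch below records, for each country and source category, the three things the mapping is meant to assess: availability, the burden of a data request, and whether new analysis would be needed. It does not reproduce the Appendix G template; the class `SourceStatus`, the burden scale, and the placeholder countries and entries are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SourceStatus:
    available: bool            # do data from this source exist for the country?
    burden: str                # hypothetical scale: "low", "medium", or "high"
    needs_new_analysis: bool   # would the data have to be re-analyzed?


# Keys are (country, source category). Countries and statuses are placeholders.
Matrix = Dict[Tuple[str, str], SourceStatus]

CATEGORIES = ["central USG", "multilateral", "country-level", "other"]

matrix: Matrix = {
    ("Country A", "central USG"): SourceStatus(True, "low", False),
    ("Country A", "multilateral"): SourceStatus(True, "low", True),
    ("Country A", "country-level"): SourceStatus(False, "high", True),
    ("Country B", "central USG"): SourceStatus(True, "low", False),
    ("Country B", "country-level"): SourceStatus(True, "medium", True),
}


def countries(m: Matrix) -> List[str]:
    """Sorted list of every country that appears in the matrix."""
    return sorted({country for country, _ in m})


def coverage(m: Matrix, category: str) -> float:
    """Share of mapped countries for which `category` data are available."""
    names = countries(m)
    hits = sum(1 for c in names if (c, category) in m and m[(c, category)].available)
    return hits / len(names)


def feasible_requests(m: Matrix) -> List[Tuple[str, str]]:
    """Cells that are available and would not impose a high burden on the source."""
    return sorted(key for key, s in m.items() if s.available and s.burden != "high")


for category in CATEGORIES:
    print(f"{category:14s} {coverage(matrix, category):.0%}")
```

A cell that is missing from the matrix is treated here as not yet mapped, which keeps "unknown" distinct from "mapped and unavailable"; a real matrix would likely need a finer-grained record per individual data source.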

BOX 2a
Central U.S. Government Data Sources

Office of the U.S. Global AIDS Coordinator (OGAC) Reports and Planning/Guidance Documents: OGAC periodically releases reports of its activities as well as programmatic, policy, and reporting guidance for field programs. Most of the reports are requested by Congress or required under federal regulations. Guidance for field programs includes both formal guidance documents and other communications from headquarters to implementing partners and country teams.
• 5-year strategic plans
• Country Operational Plan guidance
• PEPFAR indicators reference guide (including the Next Generation Indicators Reference Guide)
• Programmatic guidance
• Partnership Frameworks and Partnership Framework Implementation Plans Guidance
• Public health evaluation guidance
• Reporting guidance for the annual program results (APRs)/semi-annual progress results (SAPRs)
• PEPFAR annual reports and other reports to Congress
• PEPFAR operational plans
• Obligation and outlay reports
• PEPFAR State of the Program Area
• News to the Field

Data Reported to OGAC through the Country Operational Plan Reporting System (COPRS II): As part of PEPFAR's monitoring and evaluation (M&E) of activities, countries are required to report program data through COPRS II. Countries submit two reports to OGAC annually (APRs and SAPRs), which include data from the essential and reported PEPFAR indicators* collected from implementing partners on all technical areas.

Congressional Appropriations Bills and Conference Reports: The U.S. House of Representatives and U.S. Senate Committees on Appropriations and their appropriate subcommittees have broad responsibility for the discretionary budget for global HIV/AIDS bilateral funding and the U.S. government funding for multilateral organizations such as the Global Fund to Fight AIDS, Tuberculosis and Malaria.

Office of Management and Budget (OMB): OMB Circulars are instructions or information issued by OMB to federal agencies. These are expected to have a continuing effect of 2 years or more. PEPFAR funding for HIV/AIDS is subject to OMB Circulars.

PEPFAR Implementing Agencies Data: Program monitoring, evaluation, and research data as well as other relevant information over which these agencies have oversight (principally the Office of HIV/AIDS within the Global Health Bureau at the United States Agency for International Development [USAID] and the Global AIDS Program at the U.S. Centers for Disease Control and Prevention).

PEPFAR External Evaluations: Reports of evaluations of PEPFAR conducted by other U.S. government agencies, including the Government Accountability Office, Congressional Research Service, OMB, and the Offices of the Inspector General for the Department of State, USAID, and the Department of Health and Human Services. These include reports on topics such as program management and implementation, coordination, funding allocations and oversight, technical assistance, harmonization, and program efficiency and effectiveness.ᵃ

* PEPFAR guidance classifies indicators in three ways: by degree of importance and aggregation level (i.e., essential and reported to headquarters, essential and not reported to headquarters, or recommended indicators), by reporting level (i.e., direct program or national indicators), and by standard M&E classification (i.e., output, outcome, or impact indicators).
SOURCE: Compiled from U.S. Government publicly available information and PEPFAR's website (www.pepfar.gov).
ᵃ For example: CRS (2005, 2007, 2008a, 2008b, 2009); DoS OIG (2008, 2009); GAO (2004, 2005, 2006, 2008, 2009).

BOX 2b
Multilateral Donor and Other International Data Sources

Multilateral donors and international organizations play an active role in implementing global commitments on HIV/AIDS and supporting these through funding and technical assistance. Data available from multilateral donors and international organizations are reported by national governments, which are generally required to report on the progress of externally supported HIV/AIDS programs. The following are examples of these types of sources of data:

Joint United Nations Programme on HIV/AIDS (UNAIDS)
• Global HIV statistics and other estimates (e.g., "Report on the Global AIDS Epidemic" and "AIDS Epidemic Update" reports)
• Frameworks and Indexes (e.g., National Composite Policy Index and Stigma Index)
• National AIDS spending assessments
• United Nations General Assembly Special Session on HIV and AIDS country progress reports
• Project and indicator data collected, analyzed, and reported through the Country Response Information System or the Indicator Registry

United Nations Children's Fund (UNICEF)
• HIV Statistics and other socio-economic statistics affecting child well-being (i.e., "State of the World's Children" annual reports)
• Frameworks (e.g., Five-year global campaign on children and AIDS)
• Publications (e.g., the "Children and AIDS Stocktaking Reports")
• Technical and policy documents
• Review of status of programs (addressing focus areas: preventing mother-to-child transmission of HIV; providing pediatric treatment; preventing infection among adolescents and young people; protecting and supporting children affected by HIV and AIDS)

World Health Organization (WHO)
• Data and statistics (e.g., data on testing and counselling, mother-to-child transmission of HIV, antiretroviral therapy, and pediatric HIV)
• National Health Accounts
• Country-specific antiretroviral drug costs
• HIV drug resistance monitoring reports and literature
• WHO normative guidance and publications
• The International Health Regulations 2005

The Global Fund to Fight AIDS, Tuberculosis and Malaria (Global Fund)
• National Health Accounts
• Progress reports including technical support grants from PEPFAR
• Key performance indicators
• Global Fund five-year evaluation: Study area 3 reportᵃ
• Global Fund evaluation country case studies

The World Bank
• Public expenditure reviews
• Project documents
• Analytic work/research
• Health and HIV/AIDS project evaluations
• Evaluation of HIV/AIDS support Bank-wideᵇ
• Country assistance evaluations

Organisation for Economic Co-operation and Development (OECD)
• HIV-related funding data
• Country surveys
• Evaluation studies

Other Multilateral or International Data Sources:
• aids2031 reports and working papers
• Committee on the Rights of the Child, States reports
• Millennium Development Goals reports
• UNITAID reports
• European HIV/AIDS Funders Group
• Interagency Group for Mortality Estimation
• Funders Concerned about AIDS

SOURCE: Compiled from the Global Fund, OECD, UNAIDS, UNICEF, WHO, and World Bank publicly available information and personal communications with individuals at these organizations.
ᵃ TERG (2009).
ᵇ For example: IEG World Bank (2009) and World Bank (2007).

BOX 2c
Country Data Sources

PEPFAR Country Sources: Program data and other information generated at the country level.
• Country operational plan (fiscal years 2004–2011)
• Partnership framework and implementation plan
• Prime and sub-prime partner reports
• OGAC indicators not centrally reported*
• HIV Programs costing data
• Other communications among country teams and implementing partners

National Policy Documents and other National AIDS Response Information: Relevant national policy documents, strategies, and plans of action supporting PEPFAR activities and/or beneficiaries of PEPFAR-funded activities.
• National AIDS Coordinating Authority's strategy and framework
• Agencies or departments policy documents and plans (e.g., Ministries of Health, Finance)
• Country harmonization and alignment tool surveys

National Health Information Systems: National health information systems play an important role in ensuring that reliable and timely health information is available for operational and strategic decision making about HIV/AIDS country programs.
• Census data
• Civil registration and vital statistics
• Ministries of Health and Finance data
• Health services records
• Population surveys (e.g., Multiple Indicator Cluster Survey, Demographic and Health Survey, AIDS Indicator Survey, Behavioral Surveillance Survey, and Biologic and Behavioral Surveillance Survey)
• Antenatal care surveillance data
• Facility surveys (e.g., Service Provision Assessment and Service Availability Mapping)

* Additional essential/not reported to headquarters and recommended indicators beyond the 25 essential indicators reported to headquarters collected at the country and partner level.
SOURCE: Compiled from publicly available information and personal communications.

BOX 2d
Other Data Sources (Single or Multi-Country)

Public Health Evaluations: Concept papers, protocols, and/or progress reports for each approved PEPFAR public health evaluation (PHE). PHEs are investigator-initiated studies intended to guide the PEPFAR program and future policy development, to provide evidence to the HIV/AIDS community on programs that work, and to identify gaps in knowledge that can be filled with timely program evaluation and research.

Published Literature: Peer-reviewed journal articles, grey literature, and other reports relevant to PEPFAR's activities. These will address country-specific or program-specific studies as well as technical areas such as operations research of HIV programs, prevention of mother-to-child transmission, sexual prevention, blood safety, injection safety, intravenous and non-intravenous drug use, male circumcision, adult and pediatric care and treatment, tuberculosis and HIV co-infection, counseling and testing, health systems strengthening, and gender-related HIV issues.

Existing Modeling Data Sources for Costing: PEPFAR works with countries in estimating resource needs for interventions. At the country level, PEPFAR uses several models including the ART Cost Project, HIV/AIDS Program Sustainability Analysis Tool, and Spectrum Policy Modeling System.

Existing Modeling Data Sources for HIV Infections Averted: Since the numbers of HIV infections averted due to the implementation of a specific intervention(s) cannot be measured directly, modeling approaches provide a proxy to measure impact (e.g., models that estimate the efficacy of the intervention on incident HIV infection, models that describe how HIV infections are mediated by behavior, and models that compare incidence trends with the expected or baseline HIV incidence trends).

SOURCE: Compiled from publicly available information and personal communications.
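The last model type named in Box 2d, comparing an observed incidence trend with an expected or baseline trend, can be illustrated with a deliberately simple calculation. This is a sketch under strong assumptions rather than any model PEPFAR actually uses: the counterfactual incidence is taken as given, the population at risk is fixed, and all numbers and the function name `infections_averted` are invented for the example.

```python
from typing import Sequence


def infections_averted(
    expected_rate: Sequence[float],
    observed_rate: Sequence[float],
    population_at_risk: Sequence[float],
) -> float:
    """Sum over years of (expected - observed incidence) x population at risk.

    Rates are new infections per person-year; all inputs are aligned by year.
    """
    if not (len(expected_rate) == len(observed_rate) == len(population_at_risk)):
        raise ValueError("all series must cover the same years")
    return sum(
        (exp - obs) * pop
        for exp, obs, pop in zip(expected_rate, observed_rate, population_at_risk)
    )


# Invented values for a three-year window in one hypothetical population.
expected = [0.010, 0.010, 0.010]   # counterfactual (baseline) incidence
observed = [0.010, 0.008, 0.007]   # incidence with the intervention in place
at_risk = [100_000, 100_000, 100_000]

print(round(infections_averted(expected, observed, at_risk)))
```

The result is only as credible as the assumed baseline trend, which is why the box describes such estimates as a proxy for impact rather than a direct measurement.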
Analysis and Interpretation of Data

The evaluation committee will guide the implementation of the evaluation and data analysis, interpret the data, and deliberate to come to consensus on the findings, conclusions, and recommendations. Primary data and secondary data that require additional analysis will be analyzed, using appropriate statistical methodologies, by the members of the evaluation committee and, with the committee's guidance, by the study staff team, which will be augmented for the implementation of the evaluation with additional staff trained in statistical analysis and data management. In addition, the committee will use specific subcontractor services for areas that require specialized knowledge and a substantial time commitment beyond what the volunteer committee members can provide. For example, expert consultation will contribute to the design of the tools and methods for qualitative data collection and to oversight of the analysis of primary qualitative data collected during country visits, other structured interviews, and other qualitative methods. Expert consultation will also be used to advise and assist in designing and supervising appropriate data requests and quantitative/qualitative analysis of secondary data. The committee will oversee all analyses performed by subcontractors to ensure validity and rigor as well as integration with the overall evaluation methodology.

The committee, staff, and consultants will take steps to quantify the quality and completeness of the data used for the evaluation. For the primary data collected by the committee, the methods used to assure the quality of the data will be described in full in the final report. When existing data analyses are used, the committee will review and assess the methodology and quality of the data in the original analyses. When secondary data are requested and used for new analyses conducted by the committee, a request will also be made for a description of the data management plan, and the committee will assess the procedures in place to assure the quality of the data, including, whenever possible, parameters such as reporting rates, sampling frame, and data completeness. This information will allow the evaluation committee to assess possible reporting bias and data quality and to take these factors into account when interpreting the data in light of their likely reliability and quality.

In its final report, the committee will include an accounting of data requests as well as a summary and analysis of the data quality and completeness. This will include the number of data requests made and the extent to which these requests were completed as requested. For data requests within PEPFAR, this will also afford the committee an opportunity to assess the completeness and validity of data as a metric of program progress toward sufficient data collection capacity for M&E, a critical component of sustainability.
The assessment of data requests and data quality will be reported in the aggregate; to avoid inhibiting the reporting of data, data request outcomes will not be linked in the report to the specific organizations that receive data requests.

WORKPLAN

Operational Planning Phase

As the culmination of the planning phase for this study, a transitional operational planning period will take place between the delivery of this report and the implementation of the evaluation itself. As described previously, the operational planning activities were intentionally structured and approved by Congress and OGAC as part of an ongoing planning phase, after delivery of the report, so that work on the evaluation could continue uninterrupted and so that the evaluation committee would not be starting de novo with respect to data availability and cataloguing, pilot-testing of instruments and methods, and development of relationships with relevant stakeholders. The results of these operational planning activities will be detailed in staff-authored planning documents for the evaluation committee as part of their background information and preparation to implement the evaluation.

Activities in this period will be carried out by IOM staff and planning committee members and will be designed to further develop and refine the plan described here and to inform the implementation of the evaluation. The operational planning will focus on data mapping (sources and availability of relevant data); mapping of methods and data sources, including key indicators, to the mandated tasks and illustrative questions in order to refine and prioritize key evaluation questions and identify key indicators; developing procedures for data requests; initiating data requests; designing and initiating data quality review methods for data received; refining and testing country visit selection criteria; preparing country timelines and other background materials for PEPFAR countries; and developing country study frameworks and methods for country visits. Some initial structured interviews with key informants will also take place during the operational planning period, and a final operational planning task will be continued relationship-building with relevant stakeholders such as contacts in PEPFAR countries and at implementing partner organizations.

Initial discussions with OGAC staff about data availability and the preliminary data scan conducted during this planning phase revealed that much of the data that the committee will need are not available through the headquarters level. In addition, the committee learned that implementing partners, agencies, and countries are not necessarily required to share much of their available data with OGAC. In light of this, OGAC agreed to partner with the IOM to help facilitate, to the extent it is able, access to these data by making introductions to field, headquarters, and agency staff and by disseminating information about the purpose of the evaluation. An initial introduction was sent in a News-to-the-Field posting from OGAC on June 4, 2010. The posting explained the mandate for the study, the progress of the planning committee as of the posting date, and the proposed data-mapping activities and pilot country visits during the operational planning phase. It stated that the IOM data requests, country visits, site visits, and interviews would be entirely independent of the relationship of implementing partners and other country-level stakeholders with OGAC and other USG implementing agencies. It also assured that the evaluation is not a financial audit or evaluation of specific programs; that findings, examples, and comments would not be attributed without express permission; and that participation in the evaluation is voluntary.¹⁴

Operational planning activities will also include the use of a qualitative research and evaluation consultant.
This consultant will help to develop and refine data collection instruments and processes for country visits, in-depth studies, and other qualitative data collection; to determine what design issues, options, and qualitative methods beyond interviewing might be appropriate and feasible (e.g., content/thematic, statistical, or combination analyses; systematic triangulation; focus groups; direct observations for contextual information; town halls; photovoice); to plan logistics for field work; to test illustrative questions for refinement; to make determinations about the balance of breadth versus depth within the design options and data collection instruments; to develop audit trails to assure rigor of the fieldwork; and to train IOM staff in qualitative methods and the use of qualitative analytical software. In addition, pilot testing and refinement of field research methods and data collection instruments, with the qualitative consultant, will occur during visits to three PEPFAR countries, one bilaterally- and two multilaterally-funded.

¹⁴ Personal communication from OGAC, June 4, 2010.

Implementation Phase

The evaluation committee will produce one consensus report with its findings and recommendations. This report is targeted for delivery to Congress by fall of 2012. The overall timeline for the evaluation will be approximately 24 months. The first 18 months will be devoted to data collection and analysis, building on the activities of the operational planning phase. This will also include consultation with relevant domestic and international stakeholders, implementing partners, and others with relevant expertise. The remaining six months will include final data analysis and interpretation of findings, determination of conclusions and recommendations by consensus among the committee members, finalization of the committee's report, an institutionally overseen peer review of the report, report production, and briefings for the sponsor as requested.

Over the course of the evaluation, the full committee will meet at least four times in person, with participation of the subcontractors and consultants. Additional virtual meetings will be conducted as needed using videoconferencing, teleconferencing, and web-based conferencing tools. In addition, working groups within the committee focused on specific content areas will hold in-person and virtual meetings, as needed, for ongoing deliberations as well as data analysis and interpretation. These committee activities will be augmented by ongoing communications, by telephone and electronic mail, among the committee members, staff, and subcontractors and consultants.

A summary schematic of the proposed work plan and timeline for the evaluation can be found in Appendix F. Adjustments may be needed to the timeline and work plan due to any delay in the start time of the evaluation phase or to uncontrollable external shocks such as man-made or natural disasters (e.g., Haitian earthquake), political instability that could jeopardize the safety of members in countries that are identified for committee visits, or unforeseen scheduling problems for traveling (e.g., the Icelandic volcano eruption in 2010).
