Important Points Made by the Speakers
- When planning an evaluation, the feasibility of collecting the necessary data is an important consideration.
- Standardized data collection and analysis methods can help assure quality.
- The time frame and budget of an evaluation are critical factors in designing data collection and analysis for a complex evaluation.
- Routinely collected program data can be a rich and efficient source of information for program evaluation.
- Financial data can help assess the efficiency of a program and the return on an investment.
- A large-scale data infrastructure that includes a wide variety of data sources could be a powerful research tool.
The sine qua non of evaluation is data, but it is also the rock upon which many hopes are dashed for evaluators and evaluands. In this session, six panelists discussed the importance of and strategies for identifying and assessing potential data sources. At the beginning of the panel, session moderator and workshop planning committee member Ann Kurth, professor of nursing, medicine, and public health at New York University, noted
the importance of data both for the design and for the execution of an evaluation. Data issues include the kinds of data needed across program components, the availability of data, metrics, the disaggregation of data, routinely collected versus new data, who owns the data, what format the data are in, and what mechanisms are used to share data. In addition, questions about data quality can arise, particularly when data are being collected by people outside of the evaluation team. A major decision point in the process of designing an evaluation is determining the feasibility of collecting and accessing all necessary data.
Kurth, who was also a member of the IOM committee for the evaluation of PEPFAR, went on to describe how, in designing the evaluation, the team mapped available data sources as well as those that might need to be collected against the evaluation’s goals and questions (IOM, 2013). The objective of the data mapping was to identify complementary data sources to address the evaluation questions using the program impact framework of inputs, activities, outputs, outcomes, and impact. In each of these levels of the framework, and in each of the technical areas evaluated, questions germane to the evaluation were developed. Data sources for answering the questions then were identified, whether monitoring, financial, surveillance, interview, document review, or other types of data. Where data were not available or ideal, the feasibility of getting the necessary data was assessed.
This mapping process took into account the priority of the questions to be answered, said Kurth. Not all questions could be answered given the project’s time frame, geographic scope, and data availability. Also, some data were available only for certain time periods or subsets of the program. The initial data mapping was driven by the need to understand what data sources were actually available and developed into an iterative process of matching data sources with evaluation questions and the data needed to answer them.
Workshop planning committee member Martin Vaessen, senior vice president at ICF International, discussed data issues involved in the Global Fund’s evaluation of the impact of collective efforts to reduce the disease burden of HIV/AIDS, tuberculosis, and malaria in 18 countries. All 18 countries, he explained, had national data records on all three diseases, but an extensive set of new data was collected via surveys in 8 countries, with data collection concentrated at the district level within a country. The evaluators encountered a problem at the district level: they wanted to classify districts as high performing or low performing but found it difficult to obtain all of the district-level information needed to make that classification. As a result, district classification was not used for the analysis.
Vaessen commented that these data were not collected by the evaluators or even by the Global Fund but by local organizations with assistance from individuals tasked with carrying out the evaluation. The data collection effort was “a lot of work” and very difficult, said Vaessen. It was a challenge to go to each district in a country and figure out how many health facilities were operating, how many nongovernmental organizations (NGOs) were working in the district, and how many civil society organizations were providing particular services to HIV/AIDS patients. He emphasized this point because part of the data mapping exercise that should be done in planning an evaluation also has to include feasibility—is it really doable to collect the necessary data? “We need to define who it is that will actually access those sources and get that information,” said Vaessen. “It is key that we have strong local implementing agencies that we can work with, that listen to the people, that are working with them in terms of providing technical assistance, and that are open to working according to the guidelines established for the evaluation. This is not always the case.” In the case of the Global Fund evaluation, three countries dropped out because they did not want to participate in data collection. “Those are the realities we have to deal with,” Vaessen said.
Overall, the household surveys and facility surveys provided data that were of reasonable or good quality, but for most other information data quality was uneven across countries. For example, good financial information was almost impossible to obtain. One issue that arose was the need to pre-test some of the surveys in one country before expanding data collection to all the countries, an activity for which there was not always time given the timeline for the evaluation.
Based on his experience, Vaessen listed three steps that evaluators need to pay attention to when thinking about data collection. First, define all of the indicators that need to be measured. Preferably, he said, the indicators should be standardized and harmonized with other data collection efforts to avoid a proliferation of indicators that differ only in minor ways yet cannot be compared. Second, establish procedures for data collection, and decide which indicators to include or exclude. This step requires strengthening data collection systems based on an analysis of whether existing systems support the collection of good quality data. An evaluation or evaluation team may decide that “it may be too difficult or too cumbersome to collect certain indicators,” said Vaessen. Finally, establish procedures to carry out frequent data quality assessments to ensure that the data are accurate, complete, and timely and that they can be aggregated and analyzed.
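The third of Vaessen's steps, a frequent data quality assessment, can be sketched as a simple automated check. The field names, indicator, records, and thresholds below are hypothetical illustrations, not part of any actual evaluation system.

```python
from datetime import date

# Every reported record should be complete, plausible, and timely.
REQUIRED_FIELDS = {"district", "indicator", "value", "report_date"}

def assess_record(record, reporting_deadline):
    """Return a list of data-quality problems found in one record."""
    problems = []
    # Completeness: every required field must be present and non-empty.
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        problems.append(f"missing fields: {', '.join(sorted(missing))}")
    # Accuracy (plausibility): counts cannot be negative.
    value = record.get("value")
    if isinstance(value, (int, float)) and value < 0:
        problems.append("implausible negative value")
    # Timeliness: the report must arrive by the agreed deadline.
    report_date = record.get("report_date")
    if report_date is not None and report_date > reporting_deadline:
        problems.append("reported after deadline")
    return problems

# Example usage with made-up district records.
deadline = date(2013, 3, 31)
records = [
    {"district": "A", "indicator": "patients_on_art", "value": 412,
     "report_date": date(2013, 3, 15)},
    {"district": "B", "indicator": "patients_on_art", "value": -3,
     "report_date": date(2013, 4, 10)},
]
for r in records:
    print(r["district"], assess_record(r, deadline))
```

Checks like these can be run each reporting period, so that problems are caught while the reporting organization can still correct them rather than at the analysis stage.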
Vaessen noted there is a separate country report for each country participating in the Global Fund evaluation, and the information in the country reports is much richer than that in the synthesis report, which tried to draw general conclusions. These country reports contain more detailed information about data sources, completeness, and quality. This information is a contribution of the Global Fund that countries can use to improve data sources if they choose to do so.
One of the strengths of the AMFm evaluation was the ability to use good quality assurance procedures with regard to the primary data, said Kara Hanson, professor of health system economics at the London School of Hygiene and Tropical Medicine. The AMFm team was able to start with methods that were developed by the ACTwatch monitoring program for conducting outlet surveys, which included sampling techniques, the use of training materials, and analysis plans (Tougher et al., 2012). “We were able to develop for all eight pilots standardized questionnaires and a strong set of training materials and standard operating procedures for undertaking the outlet surveys,” she explained. In addition, she added, “The team members participated in most of the training at baseline and at endline in all eight of the pilots.”
The AMFm team also developed common data cleaning guidelines and analysis plans and gave responsibility for the analysis of the outlet survey data to the contractors who collected those data. The independent evaluation team then reviewed the results, performed the analysis of the changes over time between baseline and endline, and integrated the quantitative data with the qualitative country case study data to interpret and understand what was going on in each country.
In terms of challenges, the timing of the outlet surveys proved to be important. In Nigeria, for example, the time between baseline and when the first drugs arrived in country was 15 months, while in Kenya the time between baseline and the arrival of drugs was only 2 months. Hanson noted that while the longer time period in Nigeria could be a source of bias, “We were fairly certain that not a great deal was going on in terms of antimalarial drug supply in that intermittent period, particularly in the private sector.” There were also significant differences across the eight countries between the arrival of the first co-paid drugs and the endline survey, as well as between the scale-up of information, education, and communication (IEC) and behavior change communication (BCC) efforts and the survey, with one country never implementing those efforts and two countries suspending them before the endline surveys were conducted. “What this raises is the challenge of trying to plan large-scale survey operations and the unpredictability of the start of an intervention when you are reliant on these complex processes,” said Hanson.
Another challenge had to do with the availability and interpretation of household surveys. ACT use was one of the four outcomes that the AMFm evaluation was supposed to measure, but the collection of the primary household data was removed from the design even before the contract was issued because of cost, explained Hanson. The evaluation team knew that they were going to have to rely on existing surveys if the surveys fit the evaluation design time frame. Hanson noted that there was some ambivalence among AMFm stakeholders about whether use should even be evaluated given the short time frame between initiation of the program and the evaluation. In the end, the evaluation team relied on secondary data using some inclusion criteria. “We said that in order to be eligible as a baseline, a household survey had to be undertaken no more than 2 years before the beginning of the program and that the endline had to be at least 6 months after the arrival of the first co-paid drugs.”
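The inclusion criteria Hanson quotes can be expressed as a simple eligibility check on survey dates. The dates below are invented for illustration and do not correspond to any particular AMFm pilot country.

```python
from datetime import date, timedelta

def baseline_eligible(survey_date, program_start):
    """Baseline survey: undertaken no more than 2 years before the program began."""
    return program_start - timedelta(days=2 * 365) <= survey_date <= program_start

def endline_eligible(survey_date, first_drugs_arrived):
    """Endline survey: at least 6 months after the first co-paid drugs arrived."""
    return survey_date >= first_drugs_arrived + timedelta(days=183)

# Hypothetical dates for one country.
program_start = date(2010, 8, 1)
first_drugs = date(2010, 11, 15)
print(baseline_eligible(date(2009, 6, 1), program_start))  # within 2 years -> True
print(endline_eligible(date(2011, 3, 1), first_drugs))     # only ~3.5 months -> False
```

Writing the criteria down this explicitly also makes the evaluation's limitation transparent: a country with no survey inside either window simply drops out of the secondary analysis, as happened to several AMFm pilots.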
In the end, five countries had appropriately timed endline data, but unfortunately, neither Kenya nor Ghana, the two countries that were believed to be fast moving, strong implementers, had appropriately timed household survey data. In addition to the limited availability of endline data, Hanson noted that most of the surveys only measured ACT use among children and that there was limited control over the design of the survey and the training given to the interviewers. It was also difficult to predict when survey data were going to be available for analysis.
A third challenge was tied to the 2009 WHO recommendation that antimalarial drugs be given with a parasitological diagnosis, which led many countries to focus on expanding access to ACTs. That action, said Hanson, changed the discourse on the balance between access to ACTs and ensuring that the drugs are going to people with parasites. AMFm was launched at about the same time, which led to a primary indicator being changed during the course of the evaluation period. As a result, use results were difficult to interpret.
Summarizing the key lessons from the AMFm evaluation, Hanson said the team learned about the importance of standardizing data collection and analysis methods to assure quality. They also recognized the challenges of mounting a large primary data collection exercise that is constrained on one side by epidemiology and logistics and on the other side by being dependent on countries for data that may not be forthcoming on the necessary timeline. Finally, relying on secondary analysis for something that turned out to be a key outcome was a limitation, said Hanson. “In fact, the TERG report
on the evaluation points out the absence of evidence on use. The pieces just do not all link up.”
The PMI evaluation had five objectives, Jonathan Simon, Robert A. Knox Professor and director of the Center for Global Health and Development at Boston University, reminded the workshop:
- Review management’s use of resources and management quality.
- Evaluate the program’s practices for getting the technical package of interventions into the focused countries.
- Evaluate the partner environment to determine if the PMI was in the right niche given the importance of other efforts such as those of the Global Fund.
- Evaluate the PMI’s impact.
- Make recommendations, which was not a data-driven issue.
“Within those five objectives, we had a number of different nails that we had to hammer, and we used different approaches or techniques or different types of data for that,” said Simon.
For the first objective, a qualitative management review exercise, the primary sources of data were key stakeholder interviews with those in PMI leadership positions, as well as global, regional, and in-country stakeholders who benefit from the initiative. This was a relatively straightforward activity, Simon explained, because the stakeholders had asked for the review and were readily accessible. For the second objective, which was to try to get at what the program was doing, the evaluators used a mixed methods approach because they needed to look both at quantitative data about the key interventions and at some qualitative data about strengthening health systems or capacity strengthening within national malaria control programs. Here, program-based data from the donors, particularly the Global Fund, were useful because they provided information on how many drugs were bought and how many mosquito nets were sent into a country, though they revealed nothing about distribution or consumption, said Simon. It was more difficult, he said, to get at the issue of whether the PMI was strengthening health systems or national malaria control efforts.
Impact evaluation was the core objective of the evaluation, and Simon acknowledged that the 5 countries in which the evaluation team did in-depth studies were picked in part because they had better data than did the other 10 countries, where better data was defined as at least two and in some cases three data points on change in all-cause child mortality. “We
did not have a direct measure of malaria-associated deaths averted, so we used the all-cause child mortality as a proxy for that, malaria being a big part of that pattern of death,” Simon noted.
Much like the AMFm evaluation team, the PMI evaluation team was able to use data from other sources, such as the Demographic and Health Surveys (DHS), for primary and secondary data on all-cause mortality reduction. “We know both the strengths and weaknesses of those data and how they are constructed,” explained Simon. Access to those data, he added, was not a problem. What was an issue was the quality of the data obtained from the in-country information systems and access to those data. In addition, little data analysis had occurred in many countries, yet those nations were reluctant to let the evaluation team analyze raw data. Access to raw data was also an issue with some of the larger philanthropies.
Simon concluded his remarks by situating the challenges of pursuing multiple sources of data as part of the complexity involved in conducting large-scale program evaluation on a short time frame and with a limited budget. “You are really in the realm of can you make a believable, plausible argument that associates the investments made by the global community and the national governments to minimize the impact of malaria to the activities that we were able to show did occur in terms of commodity, in terms of training, in terms of accessibility at health systems,” said Simon. “It is a leap of faith. A lot of this evaluation requires a healthy skepticism, and then at the end of the day everybody decides just how far of a leap are they willing to make.”
Drawing on the experience of the International Center for AIDS Care and Treatment (ICAP) as a large PEPFAR implementing partner supporting the scale-up of HIV services in approximately 20 countries over the past 8 years, Batya Elul, assistant professor of clinical epidemiology at Columbia University’s Mailman School of Public Health, spoke about the nuts and bolts of using routinely collected data and publicly available data for program evaluation. In many of its evaluations and research studies on the delivery of HIV/AIDS services, ICAP regularly uses four data sources:
- Aggregate indicator data collected quarterly from 1.6 million patients at more than 3,000 facilities
- De-identified clinical data collected quarterly with institutional review board (IRB) approval from 960,000 patients at 311 facilities
- Annual clinical survey data from 1,017 care and treatment clinics and 730 laboratories
- Community-level data from between 50 and 75 regions that are mapped at the subnational level to the regions in which health facilities that ICAP supports are located.
With these data in mind, ICAP’s evaluation framework seeks to assess the variation in key HIV care and treatment program outcomes by site and determine the extent to which facility- and community-level factors are associated with outcomes, Elul explained. This framework recognizes that there is substantial variation in the way HIV programs are scaled up both within and between countries and takes advantage of this natural variation, largely through hierarchical modeling, to identify which approaches are optimal. Elul noted that, as is commonly the case when using routinely collected data, ICAP has to contend with data quality issues. It addresses these issues by conducting data quality assurance at the facility level, by checking for completeness and consistency, through automated checks built into its web-based reporting and management system, and at the analytical stage.
As an example of the challenges inherent in working with these types of data, Elul cited an evaluation of how many patients on antiretroviral therapy were retained over time. This evaluation used data from three implementing partners in a single country. Data from one partner showed that they retained all of their patients, data from a second partner showed essentially no retention, and the third showed great fluctuations between 100 percent and no retention over time. Elul noted that they are still determining whether this reflects a data quality problem or reality, which is difficult because many of the sites are small, with cohorts of as few as five patients. After discussions with the implementing partners, it was decided to remove specific data points only if the partners could provide detailed, justifiable documentation as to why those values could be considered poor data quality.
Even when implementing partners work hard to ensure data quality, monitoring and evaluation systems are often not set up to facilitate analysis. To ensure that data are accessible for analysis, ICAP uses unique site codes, geocoding of sites, and data dictionaries, and it built its monitoring and evaluation system to easily export a standardized analytic file along with a standardized data dictionary to minimize the need for data managers or programmers to create analysis files. To address potentially problematic issues involving data ownership, ICAP has established principles of collaboration with each host government for its evaluation framework. These principles include IRB approvals, the scope of the analyses, and the use of data in multicountry analyses.
In conclusion, Elul stated, “Despite all the challenges of using routinely collected program data, particularly when combined with publicly available data, they are rich, highly underutilized, and often the most generalizable
and efficient source of information for program evaluation. When merged together, these data become much more powerful and lend themselves to multilevel analyses.” That said, she added that it is essential to assess data quality at all phases of program evaluation or monitoring and that implementing partners need to do more to ensure their monitoring and evaluation systems facilitate data use for analysis.
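The kind of merge Elul alludes to, linking facility-level outcomes to community-level data through shared site and region codes, can be sketched as follows. The site codes, regions, and variables are invented for illustration; a real analytic file would draw on the site codes, geocodes, and data dictionaries described above.

```python
# Hypothetical facility-level outcomes, keyed by a unique site code.
facility_data = [
    {"site_code": "MZ-0042", "region": "R1", "retention_12mo": 0.78},
    {"site_code": "MZ-0107", "region": "R1", "retention_12mo": 0.64},
    {"site_code": "MZ-0215", "region": "R2", "retention_12mo": 0.81},
]

# Hypothetical community-level data, keyed by the region each site maps to.
community_data = {
    "R1": {"hiv_prevalence": 0.12, "urban": True},
    "R2": {"hiv_prevalence": 0.07, "urban": False},
}

# Merge: attach the community-level variables to each facility record,
# producing a flat analytic file suitable for multilevel modeling.
merged = [
    {**site, **community_data[site["region"]]}
    for site in facility_data
    if site["region"] in community_data
]

for row in merged:
    print(row["site_code"], row["hiv_prevalence"])
```

Because every facility carries a stable, unique site code, the same merge can be repeated each quarter without hand-matching records, which is what makes the routinely collected data usable for hierarchical analyses.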
As had been discussed previously, all data must be fit for purpose, and the purpose of financial data differs from that of programmatic data, said Victoria Fan, research fellow at the Center for Global Development. “While programmatic data is used to assess the effectiveness of activities, financial data helps us to assess the efficiency and the value for money of our investments. It provides a crucial denominator of just how much value or good was achieved for every pound, pula, or peso.” The goal of the Center for Global Development’s More Health for the Money evaluation was to examine the value of various global health funding agencies, with a focus on the Global Fund and its key partners. The evaluation looked at four main domains: resource allocation, contracts, cost and spending, and performance verification.
While there are many types of financial data, Fan focused on two types: budgets or planned expenditures, and actual expenditures and costs. She noted that data access and availability can be a serious challenge. In their review of Global Fund data, Fan and her colleagues found that out of approximately 20 countries that were the highest recipients of HIV/AIDS funding, 40 percent of the grants did not reveal any budget information in their country grant agreements. Of those that did, data availability and accessibility varied greatly. “We really had very little sense of where funding was going,” she said. In response, the Global Fund said that the data were available but not publicly accessible. “We argued that having such information publicly accessible was crucial for value for money given the large number of actors in this space,” said Fan.
Turning to the subject of actual expenditures and costs, she remarked, “The Global Fund should be commended for its groundbreaking price and quality reporting system, which reveals the prices and quantities of six main drugs and commodities.” To secure future funding, countries must report the prices that they obtained for their drugs to the Global Fund’s data system. But while the Global Fund has done a good job with price and quality reporting, Fan believes the organization needs to do more work on measuring unit costs. “Collecting such data and unit costs is not easy, and it involves facility surveys and multiple types of checks,” she said. By contrast, Fan observed, PEPFAR has recently done a terrific job of collecting and using data on unit costs. In a pilot program in Mozambique, for example, PEPFAR was able to use such data to drive a reduction in the range of unit costs as well as in the average unit cost paid in the country.
For the past decade, Peter Elias, a labor economist in the Institute for Employment Research at the University of Warwick, has devoted his time to developing a large-scale data infrastructure to support research efforts in the United Kingdom. This data infrastructure has enabled researchers to assemble the world’s largest household panel study, the world’s largest birth cohort study, and, most recently, a mechanism that will enable a link between a variety of administrative datasets and the United Kingdom’s national health survey data. In his presentation, he focused on an effort he carried out with the support of the Organisation for Economic Co-operation and Development (OECD) to take a science-driven approach to advance a global social science data agenda. This effort focused on digital data that were perhaps not designed for research but that would have research value if they could be made available, discoverable, useable, and fit for purpose. Such data include census data, administrative records, and records of transactions.
Elias listed several reasons for engaging in this effort, including enabling comparative work, providing the ability to consolidate data to study rare groups, and enlarging studies beyond national boundaries. Issues that need to be addressed include increasing the discoverability of data, using new forms of data such as Google Flu Trends or data mined from store loyalty cards, and developing new methods of collecting data that are more cost-effective than traditional survey methods. He noted that using new forms of data collected from the Internet or transaction histories raises ethical questions about data access, in particular whether people have given their consent for the data collected about them to be used for additional research. For example, the European Union is considering a law that would require all researchers to seek consent for all research, which for these new kinds of digital data not designed for research would be both costly and a source of strongly biased results. Nonetheless, he acknowledged that when a law is being written across 27 countries, there must be “some underlying public dissatisfaction with the way in which we, as researchers, are using data at the moment, and we have to address that issue.” Along those lines, it is important that the community address the issues of data security and data governance.
Another consideration is data compatibility across nations. “There are plenty of international classifications out there that we should use but that so often are not used,” said Elias. “We ought to have ways in which we
can collaborate to define what it is we are studying in ways that will enable us to share that information.” Equally important are having good metadata that describe how data were provided and collected and the means to preserve data and metadata on a sustainable basis. “We are beginning to wake up to the fact that a lot of our data preservation and management systems are inadequate for the modern age with the deluge of data that is now upon us,” stated Elias. He noted that many efforts are under way to address these issues. For example, the European Union funds the Council of European Social Science Data Archives that is now working to integrate large data archives in many countries.
In a report to OECD, Elias and his colleagues recommended that there needs to be more funding of research to explore the potential of new forms of data and more cooperation between official statistical providers and research communities. “In some countries, that hardly exists at all, which is quite incredible,” said Elias. “We need to have better coordination of data management plans so that we know more about data before they are created, … and we need to ensure that the international organizations are more connected.” More incentives for international data sharing are also needed, Elias remarked, as are ways in which people who take responsibility for the development of these resources are professionally rewarded for their efforts, given that few of these efforts produce publications.
In response to a question from Kristen Stelljes of the Hewlett Foundation about whether any of these efforts involve African nations, Elias said that South Africa has expressed a strong interest in joining the next stage of work. He acknowledged that it is often difficult for some of the poorer nations to join an effort that is largely being promoted by the wealthier nations of the world. It is not surprising that we end up with recommendations about data sharing that are doable for one group of countries “but remain absolutely out of reach for many other countries because of a lack of resources, a lack of knowledge, and a lack of expertise,” he said. “That is a real problem that we have to address.”
Data access and availability were prominent topics in the discussion session. Elias pointed out that politics can be a powerful influence on data access. “At one point you will find there is great access and great cooperation, and then suddenly the barrier comes down and you cannot do something or you cannot publish something or you cannot take any data away or you cannot bring in people who you want to bring in.… It just hits a brick wall.”
Simon pointed to four issues involved in data access. First, having a strong relationship with investigators in a country can enable access that would not otherwise be possible. “Level one is who do you know.” Second, scientists tend to be more willing to share data than science managers, particularly governmental science managers. Third, some countries are more sensitive about sharing data than are others. Finally, some countries are more reluctant to share certain types of data, such as biological information, than other types. Vaessen added that writing provisions for data sharing into program agreements can avoid later problems.
Simon observed that universities are among the worst institutions at making data freely available. The open data movement is putting pressure on institutions to release data and research results within given time frames, and universities can support this movement, for example, through budget line items for data publication and archiving. “It is on the academic community to push harder on these issues.”
Elias said that data archiving and accessibility are especially important with new forms of data, such as information gathered from online activities. Ethical issues as well as issues of reproducibility will surround these data types.