Peter Elias, a social scientist affiliated with the United Kingdom Economic and Social Research Council, and an early and still-active member of the Public Health Research Data Forum (PHRDF), opened the next workshop session. He began by commenting that there is no major challenge facing the world’s population that does not have social science causes or consequences or both, and that includes public health.
He also noted that it is important to focus on the practicalities and logistics of data sharing. These include ensuring that health research datasets are accessible and usable for researchers and other users, and maximizing the potential to link datasets collected for research purposes with those collected for other purposes.
MAXIMIZING ACCESS AND RE-USE OF RESEARCH DATA: LESSONS ABOUT OPPORTUNITIES AND CHALLENGES FROM THE SOCIAL SCIENCES
Myron Gutmann, professor of history and director of the Institute of Behavioral Science at the University of Colorado Boulder, and former assistant director for social, behavioral, and economic sciences at the U.S. National Science Foundation, gave a keynote talk on lessons from the social sciences.
A Bit of History
According to Gutmann, sharing of data has been going on in the social sciences for decades. The Roper Center, now located at the University of Connecticut, was founded by the Roper Organization in 1946 to enable sharing of polling data. The Inter-university Consortium for Political Research, now the Inter-university Consortium for Political and Social Research (ICPSR), was founded in 1962. In the early 1960s, census microdata and major surveys began to be shared, and these publicly available data became the basis for research in the social and behavioral sciences. Several factors, including the rise of the social science disciplines and of nongovernmental research immediately after World War II, led to increased interest in understanding social processes and to investment in social science. The availability of new computer technology in the 1960s enabled researchers to actively use and share data about society.
At the same time, Gutmann explained, networks of data repositories rapidly began to form. Initially they were national in scope and very specialized, but soon, like ICPSR, they broadened to an international scale. The Council of European Social Science Data Archives, almost as old as ICPSR, is an important European network. New technologies, including the Web, made virtual networks possible. The virtual data center at the Institute for Quantitative Social Science at Harvard University has spurred important activities in cross-archive networking. Access to data available through ICPSR doubled with the advent of the Web. Gutmann said many social science surveys operate with the assumption that data will be publicly available within a short timeframe, which makes the surveys important resources.
Data Sharing: U.S. Context and Literature
At the same time, there has been growing scientific opinion and discourse tied to issues related to data sharing, Gutmann said. Studies at the U.S. National Academy of Sciences have addressed issues such as privacy and confidentiality (see, for example, Protecting Participants and Facilitating Social and Behavioral Sciences Research). Conducting Biosocial Surveys (2010) discussed confidentiality issues in detail, while Putting People on the Map (2007) discussed the complexity of sharing data that have specific geographic locations in them. Standards for metadata have helped make it possible to understand and combine data, but doing so requires resolution of sometimes complex confidentiality and privacy issues.
One indication of the current acceptance of data sharing in the United States is a policy issued in 2013 by the U.S. Office of Science and Technology Policy, which called for all federal agencies that support research to make data and publications publicly available in a timely manner. Developments in the production and analysis of data will continue and will raise new issues. The availability of social media data, the increasing role of administrative data, and the use of commercial data in research create a broad innovation space for creative analysis of social problems, Gutmann said. They also lead to analytic issues tied to the size and complexity of integrated data and how to use them in a meaningful way, as well as how to know what inferences can be drawn from combined data that may have uneven coverage and uneven quality.
An overriding consideration is confidentiality protection. Gutmann suggested the answer to how to protect people is a graduated system that provides various means of protection, with access becoming more difficult as the risk of disclosure increases. For example, very simple data with little risk of disclosure could be accessed via the Web. For complex data with very little risk of disclosure but some risk of harm, people could sign a contract not to share the data. Where data carry a very high risk of disclosure or a very high risk of harm, he suggested, the answer may be an enclosed data center. There are all sorts of ways of dividing up the gradient and providing a wall between a potential intruder and the data, he said, as well as other measures, such as limiting the data, altering the data, providing secure access, and simulating data.
Gutmann said an approach commonly talked about in census data samples is swapping. For example, for two people in two locations who are quite similar (e.g., age, number of children, etc.) where only the income is different, the income is “swapped” to make it harder for an intruder to really know whom they had found.
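The swap Gutmann describes can be sketched in a few lines of Python. This is only an illustrative toy, not an actual statistical agency procedure; the field names, values, and pairing rule are all hypothetical:

```python
from collections import defaultdict

def swap_attribute(records, match_keys, sensitive_key):
    """Group records that agree on the quasi-identifiers in match_keys,
    then exchange the sensitive_key value within pairs of similar records."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in match_keys)].append(rec)
    for group in groups.values():
        # Pair off similar records and swap the sensitive value.
        for a, b in zip(group[::2], group[1::2]):
            a[sensitive_key], b[sensitive_key] = b[sensitive_key], a[sensitive_key]
    return records

# Two similar people (same age, same number of children) in two locations:
people = [
    {"age": 40, "children": 2, "location": "A", "income": 30000},
    {"age": 40, "children": 2, "location": "B", "income": 90000},
]
swap_attribute(people, match_keys=("age", "children"), sensitive_key="income")
# The incomes are now exchanged, so an intruder who recognizes the
# household in location A cannot be sure its recorded income is real.
```

After the swap, each record still carries a plausible value for its demographic profile, which preserves aggregate statistics while weakening the link between an identifiable record and its true sensitive value.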
Gutmann reported that in his own work looking at the relationship between agriculture, population, environment, social change, and health in the United States, he is seeing a move toward large integrated data collections. What that means, he said, is that single repositories are impossible to imagine as being the only solution.
In his view, this requires new kinds of infrastructure and strategies for confidentiality protection. Although one of the advantages of distributed data is that the linkable information is not all in the same place, it also becomes much harder to get to and the analytic requirements are large. “We have to think then about how we are going to deal with scale and how we are going to deal with human data reporting,” he said. “Yet, if we want to bring them together at a global scale, we are going to need to find a way to do this.” On top of that, he said, data sharing will require thinking about the policies, laws, and culture that vary nationally and sometimes regionally (e.g., the European Union) or locally.
Recipes for Long-Term Success
Gutmann suggested examples for long-term success exist in the social sciences. “It requires that we engage communities. It requires that we have high-quality data collection and management. It requires that we have rapid and easy data sharing. It requires that, most of all, we . . . build capacity steadily, rapidly by training data users, training data prep managers, training everyone involved, bringing together these engaged communities to talk about the important problems that we solve,” he stated.
A participant posed the question of which stakeholders should play which roles. Gutmann responded that he sees a distributed and divided responsibility in which everyone has a role to play. Institutions need to define a policy sphere in which effective research can get done. The role of communities, he said, "is both to do our jobs and to do them well, but also to keep pushing our institutions to understand that we cannot solve the problems we need to solve without their taking an active role."
In response to another participant’s question, Gutmann commented that to the extent that a culture can be created to share data, it should happen. His position is that data should be made as public as possible. But, he pointed out, the incentives for research subjects and researchers have to be in place. Related to this, he observed most of the workshop participants have very specific interests in improving health in their communities, and that may provide a different set of incentives than those of, for instance, an assistant professor of political science in the United States or in Northern Europe.
ENABLING DATA LINKAGE TO MAXIMIZE THE VALUE OF PUBLIC HEALTH RESEARCH DATA: A PHRDF REPORT
Felix Ritchie, a professor of applied economics at the University of the West of England (UWE) in Bristol, presented the PHRDF report together with his co-author, Alex Montgomery.
Ritchie began by saying that the PHRDF report endorses almost everything raised in the plenary presentation. It draws primarily from high-income countries (HICs) because there is very little in the literature about low- and middle-income countries (LMICs). To try to address this gap, the report includes case studies based on interviews with people who work in LMICs. The aim of the report was to think about how data linkage could boost public health research and the barriers to useful data linkage. He said it focuses on what is practical and useful, rather than being exhaustive.
The project team represented the business school and the public health school at UWE, DataFirst, and the Center for Injury Prevention Research in Bangladesh. It was designed to offer a range of perspectives by including a mix of people from different socioeconomic backgrounds and work perspectives, including data access, clinical work, and epidemiological work.
The project included a nonsystematic literature review, formal and informal interviews (face-to-face, telephone, and via Skype), and the subset of interviews that served as case study examples. These methods were supplemented with the team’s own experience and personal understanding of data access.
Key Findings of the PHRDF Report
Ritchie summarized the main findings of the report:
- Change the tone of the debate. According to Ritchie, "data should not be used for research or linked unless it can be done safely and securely" and "data should be available for research and linking unless it cannot be done safely and securely" functionally mean the same thing. But the default of the first statement is that data sharing is closed, and the default of the second is that it is open. He suggested that the debate needs to be shifted from a default-closed position to a default-open position.

DataFirst was set up with funding from the Mellon Foundation to share survey and administrative microdata and increase skills among African researchers. It is an ongoing repository and curates data to international standards. DataFirst efforts include versioning, quality control, and disclosure control, and it provides an online help desk to people who use the data in its repository. The group also runs workshops to train data analysts. For additional information, see https://www.datafirst.uct.ac.za/ [August 2015].

The full report can be found at http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Public-health-and-epidemiology/WTP056860.htm [August 2015].
- Policy decisions need to be more evidence based. Researchers using data for research purposes are generally viewed as low risk, but things can still go wrong. In the wider context of data protection, academic research use of data is a low-risk activity, but that message is not widely understood outside the data community. The academic literature, which guides decision makers, focuses on intruders; usually, however, when problems occur, it is because someone made a mistake.
- Narrow informed consent is not enough for good epidemiological research. When talking about linked data, much of the data that might be linked were not collected with consent for research. Administrative data and census data, for example, were not provided with consent for use in statistical research. Often, when data are collected for health research purposes, broad consent is acquired for those data. Ways are needed to link data for epidemiological research where consent does not exist or where it is not sensible to acquire it.
- Maintaining good relationships is the key. The project’s case studies showed that good relationships are the key, Ritchie said. If the stakeholders, ethics committees, and users are on board at the beginning of any data-sharing project, everything goes much more smoothly.
- Incentives to manage and share data are weak. Data funding is often tied with research funding, but the interest is in getting the research out. The incentives to work on data are weaker.
- Different things happen in different places. There may be a hierarchy of problems, from data to organization to institution, and the problems experienced differ for HICs and LMICs. In some places, the issue is getting access to data; in others, it may be linking the data, getting useful data, or being able to use the data held in a different system.
High-Income Country Experience
Ritchie reported that from the perspective of the HICs, there are problems with data and with organizational and operational issues, but institutional issues are dominant: relationships with the people who deposit the data, with ethics committees, and with the general public, which may result in unrealistic risk assessment and worst-case scenario planning. According to Ritchie, stakeholder management works well in HICs. Stakeholder management involves both early planning and education/communication. In HICs, it includes talking with ethics committees when they do not have expertise in the topic at hand.
Low- and Middle-Income Country Experience
There is less information about LMICs and data sharing, according to Ritchie, but based on the information the team had, the LMIC experience is dominated by operational and quality issues. For example, publicly funded health data are held by institutions and universities and are available only to collaborators, rather than through research facilities set up to share those data. Although international funders have data-sharing requirements, national funding bodies do not impose very strong requirements, and the requirements of international bodies are not enforced. LMIC experience tends to rest on "pools of expertise" rather than the critical mass of researchers needed to ensure robustness and enable sharing with colleagues.
Montgomery talked about the experience of several projects in South Africa, which is in transition from an LMIC to an HIC, as a case study. South Africa has linkable data, and a number of projects have been undertaken, including a project to link data from the Agincourt Health and Demographic Surveillance System with other data sources. Their data-linkage efforts ran into various operational and statistical barriers, and ethical concerns were raised. The Department of Health is trying to operationalize due diligence by setting up a preapproved database and procedures for using it, to increase researcher access to the linked data.
Changing the Conceptual Framework
The Wellcome Trust’s report Enabling Data Linkage to Maximize the Value of Public Health Research Data (2013) recommends changing the conceptual framework around data sharing. Ritchie observed commonality in the views of many participants and a lot of knowledge about what works, but the knowledge is sometimes in the wrong place and not accessible to decision makers. He suggested information needs to be made available to avoid continual reinvention.
A participant suggested the need for strong case studies where data linkage led to a finding that subsequently led to a policy change and impact. Ritchie responded with an example of a cohort study with a control group that was looking at the use of statins in the prevention of cardiovascular disease and stroke. It covered about a 4-year period and seemed to show a significant difference. The study was then extended by data linkage for a further 25 years, and the difference in mortality rates grew much larger. In this case, an expensive cohort study was extended at virtually no cost to demonstrate something quite remarkable.
Developing Practical Guidelines
The PHRDF report also recommended developing practical guidelines and measures. According to Ritchie, “everything has been solved; everything has been done somewhere.” But, the available information has to be more widely known. Ritchie highlighted what he considered to be some primary issues for LMICs:
- Establishing research data infrastructure to support health data usage and linkages, such as DataFirst's free data deposit and archiving service for all African organizations.
- Building quantitative skills, expanding from pools of experts to a broader critical mass.
- Data management, with data collection only part of the overall research process.
- Targeted funding for good data collection and curation.
Lynn Woolfrey, from DataFirst, stressed that infrastructure is very important to support data linkage in African countries. The secure data service enabled the project team to get data that "would have never been in the research domain." Woolfrey pointed out the secure data service is open to all researchers, not just African researchers.
Data Sharing/Linkage Case Studies
A participant observed that it may take a long time to address the concerns raised, but there are multiple examples of how data linkage can and has improved health or informed policy issues. He pointed to a Scottish study on a specific insulin medicine that put two datasets together, with immediate implications for practice. Another participant shared an example of linking population-based HIV seroconversion data obtained by household surveillance to antiretroviral treatment data from public clinics.
Ethical Review Boards
Participants had a lively exchange about the role and perspective of ethical review boards in reviewing and approving research involving shared data. Some felt that the review boards, also known as institutional review boards (IRBs), stand in the way of research involving data access or data linkage because of a lack of understanding, which can lead to an adversarial relationship. Others pointed out that ethics boards have their own responsibilities and requirements; they are not being intentionally obstructionist. One of the report authors shared that some researchers in the study felt that if a review board feels the need to do a full review of an approved proposal, it can be taken as the board thinking the researcher is not competent. Another opined that secondary analyses should not have to go through a full ethics approval process if the data are available through archives developed for researchers.
A participant shared an example of research he had done with hypersensitive studies designed to re-identify people. His research team went to the IRB well before submitting an application for the research and conducted a series of workshops with the IRB members. He said it made a big difference and is the kind of thing needed when “doing anything sensitive.”
Another participant echoed this sentiment, saying "you have to think about how you can help them to do their job in a way that they feel comfortable with." A participant representing an ethics review board reinforced this, suggesting an engagement strategy with the board rather than just submitting a "complicated proposal and expect[ing] everybody to get up to speed." To provide an example, a participant shared that when HIV prevention trials were starting, the World Health Organization (WHO) ran global education programs for ethics committees to help inform them of issues and empower them to competently review these kinds of proposals. Another participant reported that H3Africa does similar work. The Global Alliance for Genomics and Health is also engaging in a process to think about ethics equivalency.
The discussion closed with a comment by a participant about the importance of changing mindsets of both ethics oversight bodies and funders. To engage in data linkage requires good access to data.
BUILDING PARTNERSHIPS FOR DATA SHARING, LINKAGE, AND RE-USE
This panel session picked up on earlier points about building partnerships with statistical authorities, data controllers, data owners, those responsible for ethical review and approval of research, research communities, and funders.
The ALPHA Network
Basia Zaba, professor of medical demography at the London School of Hygiene and Tropical Medicine and head of the Analysing Longitudinal Population-based HIV/AIDS data on Africa (ALPHA) network, presented on the network. ALPHA is a network of 10 community-based HIV study sites in Eastern and Southern Africa. All the studies existed before the network was formed and have their own independent scientific programs. Some have partners in the North; others are independent organizations. They came together because they realized that they were addressing similar questions in different ways.
Initially, most of the studies ran protocols that they call informed consent without disclosure. If subjects wanted to know their test results, they could obtain them, but the studies did not insist that people learn their HIV status as a result of participating in the testing programs. This has changed over time, and gradually some of the sites are moving toward an expectation that everybody who is tested will want to know her or his status in order to access treatment.
According to Zaba, the real strength of the ALPHA network is the follow-up with HIV-positive members. They have identified nearly 50,000 people who are HIV positive and have over 150,000 person-years of follow-up with them. This enables researchers not only to do very in-depth analyses of mortality, but also to investigate risk factors for acquiring HIV and to make direct measurements of HIV incidence, which are not available from other sources.
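As a minimal sketch of what such a direct measurement involves, incidence is the number of observed seroconversions divided by the person-years of follow-up at risk. The numbers below are hypothetical, not ALPHA data:

```python
def incidence_per_100py(seroconversions, person_years):
    """Incidence rate expressed per 100 person-years of observation."""
    return 100.0 * seroconversions / person_years

# e.g. 45 seroconversions observed over 3,000 person-years at risk
rate = incidence_per_100py(45, 3000)
print(rate)  # 1.5 new infections per 100 person-years
```

This is what distinguishes prospective follow-up from one-off prevalence surveys: only repeated testing of the same individuals yields the person-time denominator the rate requires.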
Self-Interest as a Data-Sharing Imperative
The motivators for the ALPHA network’s members to share data with each other “were the concerns of the people, our clients, who needed our data,” Zaba said. The Joint United Nations Programme on HIV/AIDS (UNAIDS), WHO, and the Global Fund all wanted data from these sites, but they wanted to know that the data were generalizable. These sites were chosen because they are places of interest or places where it was feasible to do field work. If all the sites show similar results, then the people who need these results know that there is a far better chance of the results being replicable across a lot of settings.
Zaba commented that when they pool their data, the statistical power is much greater than if each site only has its own data to look at. But while all the sites may have looked at HIV and fertility “somebody has done it one way, somebody has done it another way,” she said. By solving the
technical problems of data sharing together, “we are gradually learning also how to share our data with the big wide world outside.” Trust is not really a problem for the network, she said.
Shared Data Resources
The network has built up an impressive shared data resource, Zaba said, which consists of the demographic episodes and events being experienced by the people followed; HIV test dates and results; and various linkages between the individuals observed (e.g., co-parenting, co-residence), sociodemographic data (e.g., education, marital status, sexual behavior), and data on verbal autopsies and cause of death. Africa has no death registration system or medical certification of cause of death, so the data rely on the reports of people who cared for the deceased during their final illness, describing the symptoms experienced in order to get some idea of the likely cause. The network also has self-reported data from interviews in the community about HIV care and treatment and has recently embarked on linking with the clinics that provide treatment and care. The Masaka site has its own research clinic, but most ALPHA member sites use government clinics, which serve their populations and are gradually building up data linkages, she explained. The network has a current grant application with the Wellcome Trust to make its data tables publicly sharable.
Zaba described the ALPHA approach to capacity building, as “capacity bootstrapping.” ALPHA has a scientific advisory committee that involves principal investigators from all the member study sites. It also involves people from the London School of Hygiene with statistical expertise and has other outside members from WHO, UNAIDS, and other organizations. The scientific advisory committee chooses research topics to do a literature review, looking at what different sites have done on that particular topic.
The members agree on a series of basic analyses and then very carefully specify a harmonized dataset that every site has to supply in order to do the analysis. The harmonized dataset is the minimum dataset required for the analysis. But it is also the lowest common denominator, in terms of the categories that all the sites can achieve. All the sites can recode their data to produce these harmonized datasets. They then organize workshops to discuss the public health rationale, the epidemiological theory, and the statistical theory for the analyses. Very importantly, the workshops are not just for the analysts. The ALPHA approach to capacity building is also aimed at data managers, as well as more traditional epidemiologists and others who analyze the data.
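The "lowest common denominator" recoding described above can be sketched as follows. The site codings, variable names, and categories here are invented for illustration; the point is only that each site maps its own richer coding onto the shared minimum categories every site can supply:

```python
# Shared minimum categories that every site can achieve (hypothetical)
HARMONIZED = ("none", "primary", "secondary+")

def recode_site_a(rec):
    # Hypothetical site A records years of schooling as a number
    if rec["school_years"] == 0:
        return "none"
    return "primary" if rec["school_years"] <= 7 else "secondary+"

def recode_site_b(rec):
    # Hypothetical site B records named levels with finer detail
    mapping = {"no_school": "none", "some_primary": "primary",
               "completed_primary": "primary", "secondary": "secondary+",
               "tertiary": "secondary+"}
    return mapping[rec["edu_level"]]

SITE_RECODE = {"site_a": recode_site_a, "site_b": recode_site_b}

def harmonize(site, record):
    category = SITE_RECODE[site](record)
    assert category in HARMONIZED  # every site must land in the shared coding
    return category

print(harmonize("site_a", {"school_years": 5}))        # primary
print(harmonize("site_b", {"edu_level": "tertiary"}))  # secondary+
```

The cost of this approach is visible in the example: site B's distinction between secondary and tertiary education is lost, which is exactly what "lowest common denominator" means in practice.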
They agree on what kinds of joint analyses can be done and which might use the pooled dataset, and then they follow up with publications. ALPHA findings have been showcased in several special issues of journals. They are branching out into policy analysis and health facility studies with funding from the Gates Foundation and the Wellcome Trust.
Technical Issues: Harmonization, Documentation, and Data Linkage
Data harmonization has made their data more valuable, Zaba said. The process has helped them to understand what the users want. When the ALPHA projects started, the data collected were very simple, focused on incidence and prevalence trends. The data have gotten more complicated over time and now include clinical linkages. Harmonization gets more complicated as the data do.
In Zaba’s view, “big data” constructs are more challenging in LMICs. For example, except in South Africa, there are very few indexing identity (ID) variables, such as national IDs. There are no Social Security numbers or post office codes. And while mobile phone use is common, phones in rural Africa are often shared rather than personal, so a phone number does not identify an individual the way it does in other locations. There is also much less certainty about dates of events and far more variability in the rendition of names, even name order.
According to Zaba, there is a lack of statistical theory for linkage failures. In addition to missing links, they also sometimes have real multiple links because somebody came back into one of the studies and was not identified as a returning migrant, causing additional statistical challenges.
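A toy sketch of the linkage situation Zaba describes: with no unique IDs, records are matched on approximate name and birth-date agreement, and a query can return zero candidates (a missed link) or several (for example, a returning migrant recorded twice). The names, thresholds, and similarity measure here are illustrative assumptions, not the ALPHA network's actual matching rules:

```python
from difflib import SequenceMatcher

def link_candidates(query, registry, name_threshold=0.8, max_year_gap=1):
    """Return every registry record that plausibly matches the query.
    Zero hits is a linkage failure; multiple hits is an ambiguous link."""
    hits = []
    for rec in registry:
        name_sim = SequenceMatcher(None, query["name"].lower(),
                                   rec["name"].lower()).ratio()
        if (name_sim >= name_threshold and
                abs(query["birth_year"] - rec["birth_year"]) <= max_year_gap):
            hits.append(rec)
    return hits

registry = [
    {"id": 1, "name": "Amina Hassan", "birth_year": 1980},
    {"id": 2, "name": "Aminah Hasan", "birth_year": 1981},  # same person, respelled?
    {"id": 3, "name": "John Okello", "birth_year": 1975},
]
matches = link_candidates({"name": "Amina Hasan", "birth_year": 1980}, registry)
# Two candidate records are returned, so downstream analysis must treat
# this link as uncertain rather than as a clean one-to-one match.
```

The lack of statistical theory Zaba notes concerns exactly this output: how analyses should propagate the uncertainty when a link is missing or multiple.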
The ALPHA network collaborates with other networks, including INDEPTH, IeDEA (a network of HIV clinical cohorts), and the HIV Modelling Consortium. A participant observed that 8 of the 10 ALPHA sites are also in INDEPTH and asked about the overlap and the motivation for having two networks rather than extending one of them. Zaba responded that INDEPTH is older than ALPHA, and some of the ALPHA sites joined INDEPTH after being in ALPHA and learning of its benefits. ALPHA is very specialized in that all study sites do community-based HIV surveillance. They are hoping to use the INDEPTH data archive.
Zaba closed by emphasizing funding problems. The sites that contribute the data “are not the favorite for most funders. They are sort of boring workhorses of the research world, rather than open-ended, basic observational studies,” she said, noting they are the platform for other research projects. They may have to tax the projects that build on their platforms in order to make sure that the health and demographic surveillance system studies survive. Zaba also commented on the lack of mobility of her employees: researchers from the North can easily work for years at a time in most African countries, but researchers from African countries find it difficult to get employment permits to work in each other’s countries and almost impossible to work in the North.
The WorldWide Antimalarial Resistance Network (WWARN)
Karen Barnes, professor of clinical pharmacology at the University of Cape Town and director of the pharmacology modules for the WorldWide Antimalarial Resistance Network (WWARN), presented on WWARN’s efforts to bring the antimalarial research community together to make malaria treatment more effective.
WWARN was created with the mission of providing the information necessary to prevent or slow antimalarial drug resistance, to make sure individuals have the most effective treatments, and thereby to prevent malaria morbidity and mortality. The network met for the first time in 2004, got its first planning grant from the Gates Foundation in 2007, and became firmly established in 2009. Barnes observed that much of the network’s thinking happened in parallel with, or before, the wider debates around data sharing.
Their efforts focused on the malaria community in malaria-endemic countries that had seen drug resistance to chloroquine and then to sulfadoxine-pyrimethamine. They had very clear data that it took between 4 and 8 years from clear evidence of failing treatment to implementation of a change in policy. During that time, they estimated, about 112,000 extra deaths occurred each year as a result of staying with failing drug policies.
She pointed out WWARN was created to detect resistance early and shorten the time between when resistance becomes known and the adoption of a more effective treatment policy. That sort of mandate, she said, made it easy to attract people with similar goals to work together. There are currently over 230 partner institutions across the globe, with leadership of WWARN spread across many sciences: Cape Town leads the pharmacology module, Bangkok leads quality assurance/quality control, and Oxford is the hub, but the project is very global. Barnes reported that one achievement is having over 100,000 clinical trial patients in malaria studies. Two-thirds
of the artemisinin combination therapy data published to date are already in the WWARN data repository.
WWARN works with the malaria community to collect data on the clinical efficacy of drugs, the molecular markers associated with antimalarial drug resistance, and in vitro data on drug resistance. One arm looks at pharmacology, to separate out poor-quality drugs from true resistance.
WWARN has the ability to link data in all those domains and to link metadata across all study sites. Being prescriptive about how data needed to be submitted was considered a major deterrent at first, so the approach was to take what they could get and make it WWARN’s problem to make the data compatible. Oxford helped ensure that the bioinformatics technology was secure. The project spent a lot of time on curating data and checking data quality, which automatically generated feedback to the people contributing their data. In return, the contributors got very detailed reports, often in a matter of weeks, which they could then use to publish their data much more quickly, with tables, graphs, and other depictions.
Data are standardized so that they can be reanalyzed. Publications are not the primary goal. Barnes pointed out that WWARN provides numerous free tools to help researchers with planning their data; provides templates, protocols, and tools for analyzing data; and generates automated reports.
WWARN also runs a proficiency-testing program for pharmacology and in vitro drug-quality laboratories. They send out samples and ask recipients what drug concentration they think is in each sample; the results are anonymized. She reported that with each round, participants get closer and closer to the target concentration. She also noted that the laboratories in the North did not outperform the laboratories in the South.
WWARN works on presenting data visually, such as on maps, so that policy makers, health care workers, and others can see what the data mean and recognize the need to change policies, which is the ultimate goal. Data visualization makes the findings more accessible, and an interactive format allows the data to be interrogated for specific locations. A map can illustrate, for example, that SP resistance is a problem in East and Southern Africa but less of a problem in West Africa. Data on drug quality can also be visualized, with an indication on a map of every place that has had a report of a substandard or counterfeit antimalarial.
A Success Story
WWARN’s hardest challenge is to slow resistance while the antimalarial drug pipeline develops. The next wonder drug is at least 5 years away, she said, so the task is to make the current drugs last as long as possible. They have worked to identify the factors that promote resistance, optimize regimens, and then target interventions appropriately.
For example, work on the dosing of dihydroartemisinin-piperaquine in young children shows the promise of their work and the value of their approach. Of the artemisinin combinations currently available, this drug has the best potential to last until new drugs become available. They pooled the available efficacy study data for this drug, and the main result showed almost a 98 percent cure rate.
Because they had such a large dataset linked to enough other details, they were able to identify that the youngest children, ages 1 to 4, who in endemic countries are those without immunity, had a four-fold higher risk of recrudescence. They also knew that the drug concentrations were generally lower in this population at the currently recommended dose. They were able to determine that increasing the recommended dose slightly could halve the risk of treatment failure, and achieve the WHO goal of more than 95 percent cure rates.
Barnes acknowledged concerns about the potential risks of “just pushing up a dose.” To address that, they pooled all the pharmacokinetic data on drug concentrations from every WWARN site that had measured them and modeled how to shift the dose to ensure that the minimum exposure was high enough but the maximum exposure was not too high. They provided their modeling to WHO, which has changed the treatment guidelines, due out this year, to include a higher recommended dose for children ages 1 to 4. The project hopes that the increase will enable the field to hang on to this useful drug longer.
The availability of large amounts of data in a variety of areas helps WWARN identify where more data are needed and enables them to use data to support effective interventions. For example, sulfadoxine-pyrimethamine is not generally recommended as a treatment, but it is used as preventive treatment in young children in areas with seasonal malaria transmission and in pregnant women. By mapping resistance rates, WWARN can help target studies in the right places, increasing efficiency. The project has the potential to preserve the efficacy of available antimalarials by working to optimize dosing regimens, Barnes said. At the moment, they are particularly worried about dosing for malnourished children.
Barnes concluded by saying that as a data center, WWARN has developed the scientific and ethical rationale for data sharing and has the potential to provide long-term secure data storage and to help their data contributors meet the requirements of journals and of regulatory agencies. Their work can help make drugs last longer and be used to help inform the development of new drugs. The assumption in the past has been that the same milligram-per-kilogram dose is going to work for everyone, which is not the case. WWARN provides accurate, useful intelligence to inform this process.
Barnes observed that WWARN started at a time when people were not expected to share data. In WWARN’s participatory mode, people are less concerned that their data will be misused in secondary analysis, but WWARN does provide expertise and insight to make sure that secondary analyses are valid and reliable. She reflected that there is capacity building at every level and that the feedback on data, training programs, and tools lifts up everyone’s quality of work, North and South. Taking initiatives like WWARN forward will depend on giving people support to avoid further separation of established and emerging researchers, she said.
Statistics South Africa
Dan Kibuuka, a director at Statistics South Africa (Stats SA) responsible for health statistics and for managing the acquisition, collection, and analysis of health information from household-based surveys, talked about what Stats SA does, what they are capable of doing, and the interactions that have taken place. He reflected that the issues discussed at the workshop are different in a government context; his data-collection work is mandated by law. Stats SA is a government department tasked with collecting, processing, and analyzing data. Because they have an infrastructure that reaches the remotest districts of the country, they assist other government departments that lack the capacity to collect data through household surveys. They also assist nongovernmental organizations and other government departments in the collation and analysis of administrative data.
Before a survey is conducted, Stats SA holds user consultation workshops. They consult with the National Department of Health in particular to determine the current issues in the country before going to the field with questions. Stats SA has several surveys with health data:
- General Household Survey (annual)
- Living Conditions Survey (every 5 years)
- Income and Expenditure Survey (every 5 years)
- Community Survey (every 5 years)
- Census (supposed to be every 5 years, but funding limitations have sometimes prevented that and required a large community survey instead)
Stats SA works with the National Department of Health, which has administrative health data, especially data from the District Health Information System (DHIS), to compare the data they collect through surveys with data from the DHIS. The Department of Home Affairs provides Stats SA with death certification information, which Stats SA analyzes and uses to produce an annual report. Stats SA is also coordinating with colleagues from the South African Medical Research Council and the Human Sciences Research Council to pool resources; together, they will run the South Africa demographic survey.
Stats SA collects data on such topics as disability, immunization coverage (beginning in 2016), childhood diseases, causes of death, and out-of-pocket medical expenses, with sample sizes of about 93,000 people. They have been able to respond to stakeholder issues by adding questions to existing surveys. For example, data on immunization coverage were added for the Department of Health because of concerns raised by WHO. Similarly, information on childhood diseases was added for the Department of Health, with guidance from them on how to ask the questions.
The combined data are a rich source of information, he said. He specified a few factors that enable sharing, linking, and re-use:
- Know which health data are being produced by whom, then find out who collects the data and explain how you want to use the data.
- Know the frequency of production. Researchers can plan publications or other targets based on the data-production timelines.
- Know the procedures for acquiring and accessing the health data.
- Know what can be offered through networking and collaboration.
Data linkage at Stats SA is in its infancy, but they are evolving toward a more sophisticated linkage model, working to move beyond basic descriptive reports and to do more analysis. They are exploring, for example, whether they can link their health survey data to causes of death. He invited researchers to partner with Stats SA and to offer research skills in exchange for data access. Stats SA also has a well-structured data repository to support data re-use.
Kibuuka shared his view that three things can make partnerships fail: competition, lack of trust, and funding.
A participant commented that the examples presented illustrate the power of pooling data and asked why fundraising is difficult. Zaba responded that repeated demographic surveillance is not viewed as cutting-edge research, and funders seem primarily interested in intervention trials. Zaba also commented that many sites are used to being independent and “don’t want to limit themselves just to being funded for a narrow common range of questions that are only answerable through the network.” Barnes responded that there is more opportunity for start-up funding, but most projects do not have the core funding that makes that possible. Researchers use the data largely because they are available for free, yet society and public health really benefit from data sharing, she said. A participant commented that the significant operational costs of the core elements should decrease over time as efficiencies are gained, but that is not what he sees with funding requests. There may be a tension between maintaining an established operation and continuing to grow, he posited.
A participant observed that training the next generation of researchers to sustain research such as linkage to and use of administrative data is a challenge. It is more attractive, he said, for young researchers to develop their own cohort studies, collect their own data, and publish an original paper, and he asked how to change that mindset. Zaba commented that ALPHA’s capacity building and training addresses what academic training cannot, because it extends beyond theory and focuses on how to approach an analytical solution to a specific problem. She expressed optimism that, if funding is available, the people trained will be able to continue the type of analysis they were taught.
Barnes connected the solution to the benefits possible from sharing data that are not possible through individual studies. “I think that excitement of what more you can do without having to start studies from scratch . . . will sustain the new paradigm shift,” she observed.