Key Messages Identified by Individual Speakers
• Registration of clinical trials and summary trial results has been a major step forward, but ambiguous protocols and discrepancies between protocols and results raise concerns about the integrity of clinical research data.
• Greater transparency of study protocols and amendments, statistical analysis plans, informed consent forms, clinical study reports, and adverse event reports would both improve clinical trials and facilitate sharing of trial results.
• The de-identification process can be complicated and expensive when studies are not designed with data sharing in mind.
• Collaborations need to be clear about common goals, realize the unique value each party brings to the effort, and strive for open inclusiveness.
• Companies can be fierce competitors, but still cooperate on precompetitive research to meet common needs.
• If patients provide information for a research project, they should receive information in return that can help them make meaningful health care decisions.
• Treating patients as partners in research would acknowledge their expertise in managing and understanding their conditions.
Clinical trial data are a public good, but many stakeholders in addition to the public have interests in those data, observed Jeffrey Nye, Janssen Research & Development, in his introduction to the session on models of data sharing. Participants in a trial have interests in the information a trial generates, as do the researchers conducting a trial. Pharmaceutical companies are another stakeholder, along with researchers from either the private or public sectors doing reanalyses or meta-analyses of study data. Regulators have the objective of safeguarding public health and guiding and advising companies as they develop new products, while citizen scientists may be studying the data to derive information they can apply in their own lives.
Seven speakers at the workshop described different models designed to increase the sharing of clinical research data. All of these models have strengths and limitations. Although the optimal path forward is not yet clear, all of these models offer lessons that can inform future initiatives.
Three key problems interfere with the practice of evidence-based medicine, said Deborah Zarin, director of ClinicalTrials.gov at the National Library of Medicine, National Institutes of Health (NIH). Not all trials are published. Publications do not always include all of the prespecified outcome measures. Unacknowledged changes made to trial protocols can affect the interpretation of findings.
These problems led to the establishment in 2000 of ClinicalTrials.gov, which serves as a registry of clinical trials at the trials’ inception (Zarin et al., 2011). The registry now contains key protocol details of more than 130,000 trials from around the world. In 2008 the registry added a results database, which now contains the summary results of more than 7,000 trials. ClinicalTrials.gov does not accept participant-level data, Zarin emphasized, but it has considerable experience with other kinds of data generated by clinical trials.
Clinical trials data take many forms, from uncoded, participant-level data to analyzed summary data; only the latter are posted at ClinicalTrials.gov. At each step in the process leading from the raw data to the summary data, information is lost (see Figure 4-1). Also, each vertical drop involves subjective judgments that are not transparent but can influence the reproducibility of results. The users of summary data generally assume that they reflect the underlying participant-level data, with little room for subjectivity. That assumption is not always correct, said Zarin.
FIGURE 4-1 Information loss as clinical trials data progress from raw uncoded data to summary data.
SOURCE: Zarin, 2012. Presentation at IOM Workshop on Sharing Clinical Research Data.
The results database at ClinicalTrials.gov was launched in response to the Food and Drug Administration Amendments Act of 2007 and was based on statutory language and other relevant reporting standards. It requires that the sponsors or investigators of trials report the “minimum dataset,” which is the dataset specified in the trial protocol in the registry. The data are presented in a tabular format with minimal narrative. They cover participant flows, baseline patient characteristics, outcome measures, and adverse events. The European Medicines Agency is currently developing a similar results database.
Although ClinicalTrials.gov has checks for logic and internal consistency, it has no way of ensuring the accuracy of the data reported. ClinicalTrials.gov does not dictate how data are analyzed, but it does require that the reported data make sense. For example, if the participant flow includes 400 people but results are presented for 700, it asks the trial organizers about the discrepancy. Similarly, time to event must be measured in a unit of time, and the mean age of patients cannot be a nonsensical number like 624. “That is the kind of review we do,” Zarin said.
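The kind of automated review Zarin described can be sketched as a set of simple consistency checks. The field names below are hypothetical, and ClinicalTrials.gov's actual validation rules are far more extensive; this is only an illustration of the logic.

```python
def check_results_consistency(record):
    """Simple logic and internal-consistency checks of the kind Zarin
    described. `record` is a dict with hypothetical field names; the
    function returns a list of queries to send back to trial organizers."""
    queries = []

    # Results cannot cover more people than the participant flow enrolled.
    if record["results_n_total"] > record["participant_flow_total"]:
        queries.append(
            "Results report more participants than the participant flow "
            "lists; please explain the discrepancy."
        )

    # A mean age of 624 is nonsensical; flag implausible values.
    if not 0 <= record["baseline_mean_age"] <= 120:
        queries.append("Reported mean age is not plausible.")

    # Time-to-event outcomes must be measured in a unit of time.
    if (record["outcome_type"] == "time_to_event"
            and record["outcome_unit"] not in {"days", "weeks", "months", "years"}):
        queries.append("Time to event must be reported in a unit of time.")

    return queries
```

Checks like these can catch internal contradictions, but, as Zarin noted, they cannot establish that the reported numbers are themselves accurate.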
ClinicalTrials.gov was established on the assumption that required data are generated routinely after a clinical trial based on the protocol for
the trial, so the burden of reporting to ClinicalTrials.gov would be due mainly to data entry. Instead, the experience at ClinicalTrials.gov has shown that protocols are often vague, are not always followed, or in some cases may not even exist. In addition, summary data are not always readily available even for trials that have already been published. For many trials, no one can explain the structure of the trial or the analysis of the data, said Zarin. “What we learned is there is not an objective, easy-to-describe route from the initial participant-level data to the summary data. Many people and many judgments are involved.”
Structural changes to trials are also common. A trial can start as a two-arm study and then become a four-arm study. Participants come and go, so that the number of participants changes over time. Participant flow and baseline characteristic tables describe different populations than the outcomes table. Data providers often cannot explain the “denominators” for their results, the groups from which outcomes or adverse events are collected. Zarin described a study in which a year of close work was required with statisticians to figure out who the people in the study were and where they went as a result of structural changes to the study. “These are brilliant statisticians. They were in charge of the data. [But] this trial was basically too complicated for them to figure out. They were giving outcome measures without actually knowing what the denominators were. That is one kind of problem we have seen.”
In other cases, outcome measures were changed: a quality-of-life scale was replaced with a depression scale; 1-month data were replaced with 3-month data; the number of people with an event was replaced with time to an event; and all-cause mortality was replaced with time to relapse. Sometimes discrepancies are obvious. In one study, the mean for hours of sleep per day was listed as 823.32 hours. Another study of 14 people included data on 36 eyeballs. “As a consumer of the medical literature, these are not reassuring things,” Zarin observed.
In a study of 100 matched pairs of ClinicalTrials.gov results and publication results, 82 percent had at least one important discrepancy. The inevitable conclusion is that summary data may not always be an accurate reflection of participant-level data. Although the deposition of clinical trial protocols and summary data into registries is a huge step forward in the direction of transparency, the validity and reproducibility of summary data are called into question by such inconsistencies. “This is a big problem,” Zarin asserted.
Providing more transparency about the process of converting one type of data into another type would help inspire trust, she said. Documents that may help explain this journey include the protocol and amendments, the statistical analysis plan, informed consent forms, clinical study reports, and adverse event reports. Greater transparency would also help everyone involved with clinical trials to engage in internal quality improvements.
In contrast to the declining mortality rates for heart disease (see Box 2-2), mortality rates for cancer have dropped only slightly in recent decades, noted Charles Hugh-Jones, vice president and head of Medical Affairs North America for Sanofi Oncology. Changes in risk behaviors, an increase in screening, and new therapeutics have all contributed to this decline in cancer mortality, “but we are not being as effective as we would like to be.” At the same time, the price of cancer treatment has skyrocketed, which is not sustainable in an era of fiscal austerity. We need to find better ways of reducing cancer mortality rates, said Hugh-Jones, and “one of the solutions of many that we need to address is data sharing.”
Data sharing in the field of oncology could lead to faster and more effective research through improved trial designs and statistical methodology, the development of secondary hypotheses and enhanced understanding of epidemiology, collaborative model development, and smaller trial sizing, said Hugh-Jones. For example, as oncology researchers divide cancers into smaller subgroups with particular molecular drivers, data increasingly need to be pooled to have the statistical power to determine the most effective treatments for each subgroup.
Hugh-Jones described an ideal data-sharing system as simple, systematic, publicly accessible, and respectful of privacy issues. DataSphere, which is an initiative of the CEO Roundtable on Cancer, is designed to achieve these objectives. The CEO Roundtable on Cancer consists of the chief executive officers (CEOs) of companies involved in cancer research and treatment who are seeking to accomplish what no single company can do alone. DataSphere will rely on the convening power of CEOs, together with support from patients and advocacy groups, to secure and provide data. Initially, it will seek to provide comparator arms, genomic data, protocols, case report forms, and data descriptors from industry and academia. DataSphere will include data from both positive and negative studies because a negative study is often as revealing from an epidemiological point of view as a positive study. De-identification
will be standardized, and DataSphere will then work with third-party data aggregators to pool the data in meaningful ways—a significant challenge when hundreds of cancer drugs are being developed at any given time and thousands of studies are registered in ClinicalTrials.gov.
At the outset, said Hugh-Jones, the originators of DataSphere asked three questions. Why would people want to share their data? If I wanted to share my data, how would I do it? Finally, where would I put it once it was ready to post? DataSphere has established incentives for data contributors that call attention to the increased productivity, cost savings, citations, and collaboration that can accompany sharing. It also is looking at micro-attribution software that could extend credit for sharing to the contributors of data. Similarly, incentives for patients emphasize the benefits of making data available and the security precautions that have been taken. It has even been looking into the possibility of competitions among researchers to enhance the sharing of data.
Tools to enable sharing, continued Hugh-Jones, include a standard de-identification system being developed in collaboration with Vanderbilt University that is consistent with Health Insurance Portability and Accountability Act (HIPAA) regulations, a single online data use agreement form, how-to guides for de-identification, and tools for advocacy. Finally, DataSphere has been working closely with the analytics software company SAS to produce a simple but secure, powerful, and scalable website where everything needed to share data is automated.
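The core of HIPAA-consistent de-identification can be sketched in a few lines. The field names and the identifier list below are illustrative only; the Safe Harbor standard enumerates 18 identifier categories, and a production system such as the one DataSphere is developing with Vanderbilt must handle all of them.

```python
# Illustrative subset of HIPAA Safe Harbor direct identifiers
# (the actual standard lists 18 categories).
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "email",
                      "medical_record_number", "ssn"}

def deidentify(record):
    """Drop direct identifiers and generalize quasi-identifiers in a
    single patient record (a dict with hypothetical field names)."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Safe Harbor: ages over 89 are aggregated into a single category.
    if isinstance(clean.get("age"), int) and clean["age"] > 89:
        clean["age"] = "90+"
    # Dates more specific than the year are removed; keep the year only.
    if "visit_date" in clean:
        clean["visit_date"] = clean["visit_date"][:4]  # "YYYY-MM-DD" -> "YYYY"
    return clean
```

Even this toy version shows why de-identification is cheap when records are structured consistently from the start and expensive when, as in the YODA project described below, it must be reconstructed from heterogeneous legacy records.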
Sanofi is contributing de-identified data from two recent Phase III clinical studies to start the ball rolling. The goal, said Hugh-Jones, is to have at least 30 high-quality datasets in the database by the end of 2013 and then expand beyond that. “With the sort of environment we have demonstrated here, this is something that can be successful.”
One paradigm for facilitating the dissemination of industry data and ensuring high-quality independent review of the evidence for efficacy is the Yale-Medtronic experience, described by Richard Kuntz, senior vice president and chief scientific, clinical, and regulatory officer of Medtronic, Inc., in which proprietary data were released to an external coordinating organization that contracted with other organizations to perform systematic reviews of the study results.
In 2002, according to Kuntz, the Food and Drug Administration (FDA) approved a product from Medtronic called INFUSE, which was designed to accelerate bone growth in cases of anterolateral lumbar interbody fusion. Approval was based on one pilot randomized controlled study and two pivotal randomized controlled studies. A series of subsequent peer-reviewed publications supported by Medtronic provided additional data on the use of the product.
In June 2011, Kuntz continued, a major challenge was raised regarding the validity of all the published literature on INFUSE. The principal focus was on the results presented in the peer-reviewed literature and on general study designs and endpoints. The challenge was published in a dedicated issue of a medical journal and consisted of more than 10 articles. The company quickly reviewed its data to ensure that the dossiers it had were accurate. “We are convinced that the data were good, and talked to the FDA immediately to make sure that they felt the same.” However, the issue was being discussed extensively in the media. “We had to make some quick decisions,” said Kuntz.
Within less than a month, Kuntz said, the company announced its decision to contract with Yale University as an independent review coordinator. In August, Yale announced its plan to establish an independent steering committee and contract with two systematic review organizations to carry out reviews of the research. Medtronic agreed to supply Yale with all de-identified patient-level data, including non-label studies, along with all FDA correspondence and adverse event reports. It also agreed to allow Yale to establish a public transparency policy and process for the entire INFUSE patient-level dataset. The publication of the systematic reviews was scheduled for the fall and winter of 2012, with summary manuscripts prepared and submitted for publication in the Annals of Internal Medicine at the time of the workshop.
The project has been undertaken by the Yale University Open Data Access (YODA) project, which, according to Kuntz, serves as a model for the dissemination and independent analysis of clinical trial program data. This project is based on the rationale that a substantial number of clinical trials are conducted but never published, and even among published clinical trials, only a limited portion of the collected data is available. As a result, patients and physicians often make treatment decisions with access to only a fraction of the relevant clinical research data. Clinical trials are conducted with both public and private funding, but several issues are particularly important among industry trials. Industry funds the majority of clinical trial research on drugs, devices, and other products,
both premarket and postmarket. Also, industrial research is proprietary, with no requirement for publication or dissemination, and the public perception is that industry has a financial interest in promoting “supportive” research and not publishing the rest of the data.
The YODA project has been designed to promote wider access to clinical trial program data, increase transparency, protect against industry influence, and accelerate the generation of new knowledge. The public has a compelling interest in having the entirety of the data available for independent analysis, but industry has legitimate concerns about the release of data, Kuntz said. Steps therefore are needed to align the interests of industry and the public, particularly when concerns about safety or effectiveness arise.
Yale and Medtronic spent a year working through issues involved in assembling the data and giving those data in the most unbiased way possible to reviewers so they could do a full systematic review. To maintain transparency and independence, formal documentation of communications between Yale and Medtronic was necessary along with clarity about what kinds of discussions could and could not be held. For example, Kuntz said, Medtronic did not want to send Yale previous reviews or interpretations of the data done by outside groups because the company did not want to taint the information. The query process among the reviewers, Yale, and Medtronic also had to be carefully managed.
The de-identification process was complicated and expensive. De-identifying the necessary HIPAA fields and information took several months and the efforts of about 25 people, which contributed substantially to the overall $2.5 million cost of the project. The HIPAA Privacy Rule was not designed for this kind of activity, Kuntz observed. As a result, the YODA project’s approach to de-identification was a “Rube Goldberg contraption” and clearly not scalable. Given that paper case report forms and studies going back to 1997 had to be reviewed, the project was “an outlier example of how complicated it would be to de-identify [data].”
Industry has several reasons for participating in this kind of process, according to Kuntz. It allows fair and objective assessment of product research data, as opposed to speculative analysis based on incomplete data. It supports competition on the basis of science rather than marketing. It promotes transparency and advances patient care. Although committed to transparency, Medtronic was concerned about potential misuses of the data. For example, is everyone seeking access to the data interested in the truth? Litigant firms may be interested in making money, “but
litigant firms also can find the truth,” said Kuntz. In the end, Medtronic sought to provide the data and initiate conversations about its use.
However, Kuntz raised a large number of questions that the Yale-Medtronic project has not fully answered:
• Would it be possible for an independent group to determine whether a question requiring the use of data serves the public interest or a special interest?
• Should queries be limited to single questions, and should the methods used to answer the questions be prespecified?
• Should there be an initial time period during which data remain proprietary?
• What portion and level of the dataset are necessary?
• Should there be a time limit or license for data access?
• Who controls the data distribution?
• Are there a priori questions and hypotheses to be tested, or is there an interest in data exploration?
• Is the requester competent to do the proposed analysis?
• Should a trusted third-party analysis center be contracted?
• May the requester share the data with others?
• Should there be controls on the dissemination of results, such as a requirement for peer review before dissemination?
• What methodological review is required?
• Should industry be involved in the peer review of results derived from its data?
All of these questions need better answers than exist today, said Kuntz. Nevertheless, the bottom line is that industry has a responsibility to work with regulatory agencies to conduct studies, to produce results in a faithful and trusted way, and to disseminate them as required under the law. It needs to contract or execute the required clinical studies competently and ethically and to file the data and results dossier in a timely manner. Industry makes products that “we sell to people,” said Kuntz. “We are responsible for the health of those individuals.”
The movement from keeping data concealed to sharing data will require foundational changes, Kuntz concluded. One important step will be involving patients as partners rather than “subjects,” which will help lower at least some of the barriers to the use of data.
The Biomarkers Consortium of the Foundation for the National Institutes of Health (FNIH) is a precompetitive collaboration designed to increase the efficiency of biomarkers-related research. Its goals are to facilitate the development and validation of new biomarkers; help qualify these biomarkers for specific applications in diagnosing disease, predicting therapeutic response, or improving clinical practice; generate information useful to inform regulatory decision making; and make Consortium project results broadly available to the entire scientific community.
John Wagner, vice president for clinical pharmacology at Merck & Co., Inc., described the validation of adiponectin as a biomarker as an example of the work of the Consortium. Adiponectin is a protein biomarker discovered in the 1990s that is associated with obesity and insulin sensitivity. Certain drugs can drive up adiponectin levels very quickly in healthy volunteers and in patients, and attention was focused on the use of adiponectin as a predictive biomarker to identify patients who would or would not respond to particular therapies.
Though considerable data about adiponectin existed in the files of companies and academic laboratories, relatively few data about the use of adiponectin as a biomarker were publicly available. The Biomarkers Consortium took on the task of compiling these data as a proof-of-concept project for the collaboration. A number of companies agreed to combine their data into a blind dataset derived from many trials involving more than 2,000 patients. Using these data, the consortium concluded that adiponectin is a robust predictor of glycemic response to peroxisome proliferator–activated receptor agonist drugs used in the treatment of diabetes. The results confirmed previous findings and investigators concluded that “the potential utility of adiponectin across the spectrum of glucose tolerance was well demonstrated” (Wagner et al., 2009).
Wagner drew several important lessons from this experience. The project demonstrated that cross-company collaboration was a robust and feasible method for doing this kind of research. However, the project took a relatively long time to complete, which is a real problem, according to Wagner. The Consortium has since learned how to collaborate more efficiently, but time remains a concern. The pace was set based on the amount of time team members had to dedicate to this project. The Consortium was not the first priority of everyone involved in the project. “It was the evening job for many people, myself included.” Good project
management skills have helped to address this problem, as has the development of new collaboration tools.
The Consortium struggled with data-sharing principles and standards, Wagner admitted. Negotiating a data-sharing plan with even a small number of companies was challenging, and having a single legal liaison for each company proved critical. Standard definitions were not all obvious. In some cases, people would fail to pass on crucial information before leaving for another position. However, in the end the project created a template for the Biomarkers Consortium for data-sharing plans, which should speed the work in subsequent projects. Also, FDA currently has an initiative to require uniform data submissions using standardized data fields, which would result in data that are much more amenable to sharing, Wagner observed. Health care reform is also expected to harmonize data practices, in part to reduce costs and improve care.
The existing data had many limitations, Wagner indicated. The original studies were not designed to answer the research question investigated by the Consortium. The adiponectin data also had limitations because different companies used different assays to measure the protein, which required more work to ensure that the data could be combined reliably.
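One common way to combine measurements made on different assay scales, offered here as a generic harmonization sketch rather than the Consortium's actual method, is to standardize values within each assay before pooling them:

```python
import statistics

def pool_across_assays(by_assay):
    """Convert each assay's measurements to z-scores so that values
    recorded on different assay scales can be pooled. `by_assay` maps an
    assay name to its list of raw measurements. This is a generic
    harmonization sketch, not necessarily what the Biomarkers Consortium
    did; real harmonization also requires checking that the assays
    measure the same analyte comparably."""
    pooled = []
    for assay, values in by_assay.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        pooled.extend((v - mean) / sd for v in values)
    return pooled
```

The design choice here is the usual trade-off: within-assay standardization makes pooling possible but discards absolute concentration information, which is one reason combining company datasets required the extra verification work Wagner described.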
Broader issues also arose. The clarity of the research question is very important for defining the type of collaboration. The existence of a neutral convener—in this case the FNIH—was critical in gaining the trust of all the stakeholders involved in the project. Still, motivations were an issue. Depending on the question being asked, the openness of the contribution and of the output can change. In the case of the Biomarkers Consortium, the output is completely open, which is a good model for generating new knowledge. The nature of the collaboration also depends on whether it is developing standards and tools, aggregating data, creating new knowledge, or developing a product, Wagner said. Collaborations depend on trust and openness. Being clear about common goals, realizing the unique value each party brings to the effort, and striving for open inclusiveness can greatly improve collaborations.
NEWMEDS, which is a project sponsored by the European Union, stands for Novel Methods for Development of Drugs in Depression and Schizophrenia. As discussed by Jonathan Rabinowitz, academic lead of
NEWMEDS at Bar Ilan University, the NEWMEDS consortium was established to facilitate sharing of clinical trials data—in particular, coded participant-level data—from industry and academia to examine research questions in the precompetitive domain. According to Rabinowitz, the schizophrenia database, which includes data from AstraZeneca, Eli Lilly, Janssen, Lundbeck, and Pfizer, encompasses 64 industry-sponsored studies representing more than 25,000 patients, along with studies sponsored by the National Institute of Mental Health and the European Union. The depression database, with data from several of the same companies, includes 26 placebo-controlled, industry-sponsored studies covering more than 8,000 patients.
Rabinowitz went on to describe some of the major findings and lessons learned from the schizophrenia database. When looking at patient response, analysis of the database revealed that results at 4 weeks were nearly the same as at 6 weeks, implying that studies could be shorter. Females show more pronounced differentiation between placebo and active treatment than males; thus, including more females, who have historically been underrepresented in these studies, could heighten the observed differences from placebo. Patients with a later onset of disease showed more pronounced improvements, irrespective of their allocation to active treatment or placebo groups, but differentiation from placebo was not affected by age of onset. For unknown reasons, the active-placebo differentiation varies by geographical region, with considerably more differentiation in Eastern Europe than in North America. All of this information, which is useful in its own right, can be used to design more effective and efficient clinical trials with smaller treatment groups and shorter study durations, Rabinowitz stated, which together could significantly reduce the costs of drug discovery trials.
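The link between greater placebo differentiation and smaller treatment groups follows directly from standard sample-size arithmetic. A minimal sketch, using the familiar two-sample normal approximation for a two-sided test:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Participants needed per arm to detect a standardized effect size
    (Cohen's d), using the two-sample normal approximation:
    n = 2 * ((z_alpha/2 + z_beta) / d)^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # e.g., 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # e.g., 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)
```

Because the required sample size scales with the inverse square of the effect size, even a modest increase in active-placebo differentiation, of the kind Rabinowitz attributed to enrolling more females, translates into substantially smaller trials.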
Rabinowitz described some of the lessons learned from his personal experiences with the Consortium. Just locating the data was a challenge. It might sound mundane, but it can be very complex, he said. For example, companies are bought and sold, and products are exchanged among companies. “To locate who houses data [required] almost the work of a detective.” Also, competing internal resources and priorities mean that data sharing is not necessarily the top priority. Compared with the YODA project’s experience, de-identification was much less expensive and time consuming, said Rabinowitz, requiring about 2 weeks of programming time. In the context of the amounts spent on clinical trials and the potential markets for new products, though, even rather expensive de-identification projects can be justified. The formulation of research questions and interpretation of data also need to be the result of active collaboration so that understandings are shared as well as data.
Rabinowitz talked about the increasing difficulties of drug discovery as incentive for companies to collaborate through precompetitive challenges. These companies can be fierce competitors elsewhere, but they have common needs. Companies also need to send a clear message of support for collaboration to overcome various kinds of resistance, with ongoing support from the top levels of management. Previous relationships can be very helpful because they help foster the trust that companies need to provide data to a collaborative effort. Peer pressure among companies aided data sharing, in that “if one company [provided] all their data, the others wanted to follow suit. They did not want to feel inferior in terms of their performance.”
A paradigm shift is occurring that redefines data sharing as an “ethical imperative,” Rabinowitz concluded. Studies whose investigators are willing to share data should be given extra credit; this could be taken into account by institutional review boards (IRBs), for instance, in judging the ethical validity of a study. “Allow yourselves to imagine what you might do in some therapeutic area that is near and dear to you if you had access to almost all of the data out there in your given area,” he said. “Just think about that for a second.”
PatientsLikeMe is a health information–sharing website for patients where they can form peer-to-peer relationships, establish profiles, provide and share health data, and make de-identified data available for research. Sally Okun, health data integrity manager at PatientsLikeMe, described some of the lessons learned from the website during its 7 years of operation.
A prominent mandate of the site is “give something, get something.” If patients provide information for a research project, they should receive information in return that can help them make meaningful decisions, said Okun.
Another motto is “patients first.” In a data-sharing environment, the interests of the patients need to come first, Okun said. “They have a lot more skin in this game than any of us in this room do…. They have the expertise in managing [their conditions] that as clinicians and as researchers we could never have.”
That observation leads to a third mandate: Listen well. Patients want to share their information. When patients were asked in a recent survey whether their health data should be used to help improve the care of future patients who have the same condition, 89 percent agreed (Alston et al., 2012). Yet, when they were asked whether they thought their data were being shared, the majority said they either did not know or did not think so. “We have a huge gap between what patients are telling us they want and what they perceive us to be doing.”
The data patients provide involve intimate parts of their daily lives. These patients are not simply human subjects, said Okun; they are actually members of the research team. “I would change our paradigm completely and start thinking of patients as patient researchers or citizen researchers.” Okun quoted a recent blog post to the effect that patient engagement is the blockbuster drug of the century. If this is true, she added, and if this “drug” is not currently being used, the research community is essentially engaged in malpractice.
“The system is never going to be perfect,” she said. But the biomedical research system has evolved to the point that all stakeholders can be involved in decisions. “Without patients, we would have no research. Let’s start thinking about how we can best honor them, respect them, and allow them to develop the trust that they need to participate with us.”
An alternative to widespread data sharing was described by Richard Platt, professor and chair in the Department of Population Medicine, Harvard Medical School, and executive director of Harvard Pilgrim Health Care Institute. Platt proposed that sharing information derived from the data, while minimizing the sharing of the data themselves, circumvents some of the barriers discussed previously (Chapter 3). He went on to describe the Query Health Initiative, a system for sharing clinical information that has been promulgated by the Office of the National Coordinator for Health Information Technology. It uses the approach of sending the question to the data rather than bringing the data to the question. The question, in this case, is an executable program sent from the originator to the holder of the data. The program then operates on a remote dataset and returns the answer to the sender.
An alternative approach based on the same idea, Platt indicated, is to let a user log onto a remote system and do the analyses. The user needs to be able to access the system through a firewall, which many organizations are hesitant to permit. Other protections can be built into the system as well, such as a mechanism for determining whether the research has oversight by an IRB. A steering committee or IRB could be involved in reviewing and approving queries. Network management could provide for auditing, authentication, authorization, scheduling, permissions, and other functions. Local controls at the source of the data could monitor what kind of question is being asked, who is asking the question, and whether the question is worth answering.
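The local controls Platt mentions can be pictured as a gatekeeper that a data partner places in front of its data: authenticate the sender, check for oversight approval, and audit the decision before any code executes. The approval rules, field names, and audit format below are assumptions for illustration only.

```python
# Illustrative sketch of local query governance: before an incoming
# query runs, the data partner checks who is asking and whether the
# work has oversight approval, and logs the decision for auditing.
# All rules and field names here are hypothetical.

audit_log = []
APPROVED_SENDERS = {"coordinating_center"}

def review_and_run(query, local_data):
    """Gatekeeper in front of a partner's data: authenticate the
    sender, require IRB approval, and audit every decision."""
    allowed = (query["sender"] in APPROVED_SENDERS
               and query["irb_approved"])
    audit_log.append({"query_id": query["id"], "allowed": allowed})
    if not allowed:
        return None  # question rejected without touching the data
    return query["program"](local_data)

result = review_and_run(
    {"id": "q-001", "sender": "coordinating_center",
     "irb_approved": True,
     "program": lambda data: len(data)},
    local_data=[{"patient_id": "A1"}, {"patient_id": "B2"}],
)
print(result)  # 2
```

A real deployment would layer authentication, scheduling, and permissions onto the same choke point, but the shape is the same: the question is reviewed before it ever reaches the data.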
A logical extension of such a system would be a multisite system in which research data from several different organizations are behind several different firewalls (see Figure 4-2). According to Platt, a single question could be distributed to multiple sites and the responses compiled to produce an answer. Source data, such as information from electronic health records, could flow into research systems through firewalls. The result would be a system in which remote investigators can gain the information they need to answer a question while data are protected.
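The multisite extension can be sketched as a coordinating center fanning one question out to several sites and compiling the partial answers. The site names and records below are invented; in practice each site would answer from behind its own firewall.

```python
# Minimal sketch of the multisite pattern: one executable question is
# distributed to several sites, each answers locally, and the
# coordinating center compiles the site-level answers. Sites and data
# are hypothetical.

def distribute_query(question, sites):
    """Send the same question to every site; compile the answers."""
    partial_answers = {name: question(data)
                       for name, data in sites.items()}
    return sum(partial_answers.values()), partial_answers

# Each site's records stay local; only counts cross the firewalls.
sites = {
    "site_a": [{"age": 67}, {"age": 45}],
    "site_b": [{"age": 72}],
    "site_c": [{"age": 80}, {"age": 61}, {"age": 70}],
}

question = lambda records: sum(1 for r in records if r["age"] >= 65)
total, by_site = distribute_query(question, sites)
print(total)    # 4
print(by_site)  # {'site_a': 1, 'site_b': 1, 'site_c': 2}
```

The compiled total answers the investigator's question even though no site ever discloses a patient-level record.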
FIGURE 4-2 Distributed networks can facilitate working remotely with research datasets derived from routinely collected electronic health information, often eliminating the need to transfer sensitive data.
SOURCE: Platt, 2012. Presentation at IOM Workshop on Sharing Clinical Research Data.
Platt described a system developed by his group that implements this concept. The system, called Mini-Sentinel, is being used by FDA to do postmarket medical product safety surveillance. It has a distributed database with data on more than 125 million people, 3 billion instances of drug dispensing, and 2.4 billion unique patient encounters, including 40 million acute inpatient stays. Each of the 17 data partners involved in the project uses a common data format so that remote programs can operate on the data. Data checks ensure that the data are correct. Data partners have the option of stopping and reviewing the queries that arrive before the code is executed. They also can stop and inspect every result before it is returned to the coordinating center. The amount of patient-level data that is transferred is minimized, with most of the analysis of patient-level data done behind the firewall of the organization that has the data. “Our goal is not to never share data. Our goal is to share as little data as possible.” The analysis dataset is usually a small fraction of all the data that exist, and the data can usually be de-identified.
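The role of the common data format can be sketched as a translation step: each partner maps its local records into one shared schema so that a single remote program runs unchanged at every site. The schema and field names below are assumptions for illustration, not Mini-Sentinel's actual data model.

```python
# Hedged sketch of why a common data format matters: each partner
# translates its local records into a shared schema, so one query
# program works everywhere. Schema and field names are hypothetical,
# not the Mini-Sentinel Common Data Model.

def to_common_format(local_record, field_map):
    """Translate a partner's local record into the shared schema."""
    return {common: local_record[local]
            for common, local in field_map.items()}

# Two partners with different local schemas.
partner_a_record = {"pid": "A1", "rx": "lisinopril"}
partner_b_record = {"patient": "B2", "drug_name": "lisinopril"}

common_a = to_common_format(partner_a_record,
                            {"patient_id": "pid", "drug": "rx"})
common_b = to_common_format(partner_b_record,
                            {"patient_id": "patient",
                             "drug": "drug_name"})

# The same query program now works on both partners' data.
query = lambda rec: rec["drug"] == "lisinopril"
print(query(common_a), query(common_b))  # True True
```

Because every site exposes the same schema, the coordinating center can write one program once rather than one per partner.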
As an example of the kinds of projects that can be done using this system, Platt described a study looking at comparative risks of angioedema related to treatment with drugs targeting the renin-angiotensin-aldosterone system. The results of the study had not yet been released at the time of the workshop, but Platt concluded from the experience that data from millions of people could be accessed to do the study without sharing any patient-level data. Yet, from the perspective of the investigators, “essentially everything that was interesting in those datasets that could answer this question was accessible and was used to address the questions of interest.”
Using such a system, it would be possible to address a large fraction of the questions thought to require data sharing by instead sharing programs among organizations that are prepared to collaborate on distributed analyses, Platt insisted. Organizations also could participate in multiple networks, further expanding the uses of the data they hold. At the same time, every network could control its own access and governance.
Today, only FDA can submit questions to Mini-Sentinel, but FDA believes it should be a national resource and is working on ways to make it accessible to others. Toward that end, the week before the workshop, the NIH announced the creation of the Health Care Systems Research Collaborative, which will develop a distributed research network with the capability of communicating with the Mini-Sentinel distributed dataset. Such systems, by sharing information rather than data, could make progress faster than waiting for all the issues surrounding data sharing to be resolved, said Platt.