distributed data environment also supports data accuracy, timeliness, flexibility, and sustainability.

Despite their many advantages, distributed queries also face a number of data quality challenges. One example cited was the difficulty of integrating results from several data sources in the absence of standards. But, Elmore said, pathbreaking work is under way to address this problem. Another challenge is striking a balance between clinical intuitiveness and computability when expressing a query. Moreover, once a query is formulated, the lack of semantic equivalency and of standards for expressing clinical concepts across data systems must be addressed. Additionally, there is no cultivated standard value set, clinicians in the same practice often code differently, and each organization has its own established value sets. Furthermore, within those value sets, data are often missing, so completeness also presents a challenge to distributed queries.

Despite the obstacles inherent to such queries, several efforts across many domains are under way and have achieved great success. Platt described Mini-Sentinel, an FDA-sponsored pilot initiative that has created a distributed dataset that includes data on 126 million people at 17 data partners to support active safety surveillance of medical products. The FDA now routinely uses the system.

Platt cited an example of a query dealing with drugs for smoking cessation, addressing a concern that a certain drug increased the risk of negative cardiac outcomes. Within 3 days of receiving FDA’s intent to query the network, Mini-Sentinel returned its first report on the results, including information on 300 million person-years of experience. While the speed and scope of the query result were impressive, Platt noted that it had several associated limitations. These included that it was intended to be a quick look, not a final answer; that the result did not exclude excess risk; and that recorded exposures may have been missing or included a misclassified indication. Moreover, the cohort may have been unrepresentative, outcomes may have been misclassified, and there was a potential for residual confounding due to disparate smoking intensities or comorbidities. Nonetheless, with the right clarification on the query itself, specifications on the cohort of interest, and selection of diagnosis codes, the network was able to rapidly query hundreds of millions of people’s worth of data without transferring any institution’s PHI.

Another query focused on a comparison of individuals who had experienced a stroke or transient ischemic attack (TIA) and previously received one of two different types of platelet antagonists. Treatment with one of the platelet antagonists was contraindicated for individuals who had previously had a stroke or TIA; Mini-Sentinel determined that half as many individuals received the contraindicated drug following stroke or TIA as received the comparison drug. The limitations
