"4 Issues and Opportunities in the Emergence of Large Health-Related Datasets." Digital Data Improvement Priorities for Continuous Learning in Health and Health Care: Workshop Summary. Washington, DC: The National Academies Press, 2012.
Digital Data Improvement Priorities for Continuous Learning in Health and Health Care
The other strategy to improve the quality and utility of retrospective outcomes research is to establish the operating characteristics of observational studies. Madigan argued that there is currently no good understanding of the quality and reliability of this type of research. For example, if a study arrives at a certain relative risk, how close would that relative risk be to the truth if the study were reproduced with different data? Similarly, when studies report 95 percent confidence intervals, how often do those intervals actually contain the true value? Madigan noted that OMOP researchers have found that, across databases, reported 95 percent confidence intervals often have only roughly 50 percent coverage. There is a need, he stressed, to study this science empirically in order to understand how well it actually works and how likely the results are to be true.
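The notion of coverage can be illustrated with a small simulation. The sketch below is not the OMOP methodology; it is a minimal illustration that assumes each hypothetical study estimates a known log relative risk with ordinary sampling error plus a study-specific systematic error that the reported interval ignores. The function name `empirical_coverage` and all parameter values are assumptions chosen for illustration only.

```python
import math
import random


def empirical_coverage(true_log_rr, bias_sd, se, n_studies, seed=0):
    """Fraction of nominal 95% intervals that contain the true log relative risk.

    bias_sd is the standard deviation of a hypothetical systematic error
    (for example, unmeasured confounding) that the interval does not reflect.
    """
    rng = random.Random(seed)
    z = 1.96  # normal quantile for a nominal 95% interval
    covered = 0
    for _ in range(n_studies):
        bias = rng.gauss(0.0, bias_sd)    # systematic error for this study
        noise = rng.gauss(0.0, se)        # ordinary sampling error
        estimate = true_log_rr + bias + noise
        lower = estimate - z * se         # interval built from sampling error only
        upper = estimate + z * se
        if lower <= true_log_rr <= upper:
            covered += 1
    return covered / n_studies


true_value = math.log(1.5)  # assumed true relative risk of 1.5

# No systematic error: coverage should sit near the nominal level.
unbiased = empirical_coverage(true_value, bias_sd=0.0, se=0.1, n_studies=20000)

# Systematic error larger than the sampling error: coverage drops sharply.
biased = empirical_coverage(true_value, bias_sd=0.25, se=0.1, n_studies=20000)

print(f"coverage without systematic error: {unbiased:.2f}")
print(f"coverage with systematic error:    {biased:.2f}")
```

Under these assumed values, the intervals that ignore systematic error contain the true value far less often than their nominal 95 percent level suggests, which is the kind of gap that can only be detected by studying interval performance empirically.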
Carol McCall posited that the principal challenge in health care today is creating a deep and dynamic understanding of what works best for whom. She noted that while there are currently many areas of redesign and improvement in health care (aligning business models, transforming care models, building infrastructure), all of these changes implicitly assume that there is access to evidence and an understanding of what works for whom. The sustainability of all of these efforts demands something new: a nuanced understanding of cause and effect in health care.
According to McCall, three developments have made it possible to analyze vast amounts of data to generate actionable medical evidence. The first is Moore's law, the doubling of computing capacity approximately every 2 years, which gave rise to big data and big-data technologies. The second is that health care data is becoming much more liquid. The third, which she described as the linchpin, is a revolution in mathematics, led by Judea Pearl, which has mathematized causality and opened the way to a paradigm shift in analytics. Previously, the problem with big data was that the bigger it got, the more correlations were found. McCall stressed that correlation is a truism: data is always correlated to a greater or lesser degree, but correlation is not the same as causation. This new calculus of causality, however, allows researchers to discover cause-effect relationships and generate evidence from big data (Pearl, 2009).
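The distinction between correlation and causation can be made concrete with a small simulation. The sketch below is not drawn from McCall's presentation; it uses one standard tool from the causal framework associated with Pearl, the backdoor adjustment formula, under the assumption that a single measured variable accounts for all confounding. The variable roles (illness severity, treatment, outcome), the probabilities, and the function names are hypothetical.

```python
import random


def simulate(n, seed=0):
    """Generate (severe, treated, outcome) rows where treatment has NO causal effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5                          # confounder
        treated = rng.random() < (0.8 if severe else 0.2)    # sicker patients treated more
        outcome = rng.random() < (0.6 if severe else 0.2)    # outcome depends on severity only
        rows.append((severe, treated, outcome))
    return rows


def outcome_rate(rows, treated, severe=None):
    """Observed outcome rate among rows with the given treatment (and severity, if set)."""
    selected = [y for (z, x, y) in rows
                if x == treated and (severe is None or z == severe)]
    return sum(selected) / len(selected)


def naive_difference(rows):
    """Raw association: outcome rate in treated minus outcome rate in untreated."""
    return outcome_rate(rows, True) - outcome_rate(rows, False)


def adjusted_difference(rows):
    """Backdoor adjustment: sum over z of P(z) * [P(y | x=1, z) - P(y | x=0, z)]."""
    n = len(rows)
    total = 0.0
    for severe in (False, True):
        p_z = sum(1 for r in rows if r[0] == severe) / n
        total += p_z * (outcome_rate(rows, True, severe)
                        - outcome_rate(rows, False, severe))
    return total


rows = simulate(50000)
naive = naive_difference(rows)
adjusted = adjusted_difference(rows)
print(f"naive difference:    {naive:.3f}")
print(f"adjusted difference: {adjusted:.3f}")
```

In this constructed example the raw comparison shows a sizable association between treatment and outcome even though the treatment does nothing, while the adjusted comparison is close to zero. The adjustment is valid here only because the simulation guarantees that severity is the sole confounder; with real data that assumption would have to be argued from the causal structure.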
What distinguishes this type of approach is that it focuses on insights rather than analytics. Through these types of mathematical methods, she noted, researchers are left with a causal structure rather than a static answer. This structure can be interrogated to answer a variety of important questions, such as what data is needed to resolve existing uncertainty, an insight that can guide subsequent data investments and be used to tailor research strategies. Furthermore, this type of structure allows researchers to