quantities of complex data being passed among organizations, and standardization can help overcome some of the challenges posed by the “3 Vs of big data”: variety, volume, and the velocity at which data are needed. Outside the United States, de Crescenzo has seen great progress in implementing requirements for the use of standards in national-level research projects and electronic health record (EHR) systems. These efforts over the past decade have yielded many lessons to be learned.
Cautions on Standardization
While acknowledging the value of data standards, Vicki Seyfert-Margolis, senior advisor for science innovation and policy at the Food and Drug Administration’s (FDA’s) Office of the Chief Scientist, brought up some points researchers should remember when thinking about standardizing data. “Standardization does not ensure quality,” she said. If not done well, conversion to a standard format has the potential to adversely affect data quality and analysis. For example, standardized formats for indicating patient race can still lead to inaccurate information if the categories used in questionnaires do not adequately capture the complexity of a person’s racial identity. Conversion can also result in loss of traceability from the source. Standardization does not imply that data are fit for purpose either, she warned. Standardized data may or may not answer the questions of interest and may or may not be useful for future analysis. It may not be possible to predefine all standards, and not all data must be standardized. FDA is working to identify minimum sets of data points that must be standardized for analysis. The effort devoted to standards needs to be weighed against these other considerations, she said, to determine how much time and money to invest in standardization, especially given that the data gathered will never be perfect.
Standards solve some problems, Seyfert-Margolis said, but they do not solve problems with data quality, disease definition, basic understanding, or data analysis. She emphasized the importance of defining diseases and having a clear understanding of clinical phenotypes as part of the standardization process. Especially as genomics begins to play a larger role in medicine, a taxonomy of disease will be needed to define patient subpopulations, “because we know not every type 2 diabetes patient is the same, yet we call them all that.” In that respect, case report forms should not treat all patients identically because patient characteristics need to be probed carefully to clarify patient populations.