FIGURE 4-1 Information loss as clinical trials data progresses from raw uncoded data to summary data.
SOURCE: Zarin, 2012. Presentation at IOM Workshop on Sharing Clinical Research Data.

influence the reproducibility of results. The users of summary data generally assume that they reflect the underlying participant-level data, with little room for subjectivity. That assumption is not always correct, said Zarin.

The results database at ClinicalTrials.gov was launched in response to the Food and Drug Administration Amendments Act of 2007 and was based on statutory language and other relevant reporting standards. It requires the sponsors or investigators of trials to report the “minimum dataset,” which is the dataset specified by the trial protocol in the registry. The data are presented in tabular format with minimal narrative and cover four areas: participant flow, baseline patient characteristics, outcome measures, and adverse events. The European Medicines Agency is currently developing a similar results database.
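To make the idea of a tabular, module-based results record concrete, the sketch below models a drastically simplified record and applies the kinds of logic and internal-consistency checks Zarin describes: more participants analyzed than entered the participant flow, a time-to-event outcome without a unit of time, and an implausible mean age. This is a hypothetical illustration only. The class name, field names, the set of accepted time units, and the 120-year age ceiling are assumptions, not the actual ClinicalTrials.gov schema or review rules.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified results record. Field names are illustrative
# assumptions, not the real ClinicalTrials.gov data model.
@dataclass
class ResultsRecord:
    participants_started: int       # participant-flow module
    baseline_mean_age_years: float  # baseline-characteristics module
    outcome_n_analyzed: int         # participants analyzed for an outcome measure
    outcome_unit: str               # unit of measure reported for that outcome
    is_time_to_event: bool = False  # whether the outcome is time to event

# Assumed list of acceptable units for time-to-event outcomes.
TIME_UNITS = {"hours", "days", "weeks", "months", "years"}

# Assumed plausibility ceiling for a mean age.
MAX_PLAUSIBLE_AGE = 120


def consistency_problems(r: ResultsRecord) -> List[str]:
    """Return queries to send back to the trial organizers.

    These checks only flag internal contradictions; they cannot show
    that the reported numbers are accurate.
    """
    problems = []
    if r.outcome_n_analyzed > r.participants_started:
        problems.append(
            f"{r.outcome_n_analyzed} analyzed exceeds "
            f"{r.participants_started} in participant flow"
        )
    if r.is_time_to_event and r.outcome_unit.lower() not in TIME_UNITS:
        problems.append(f"time-to-event outcome has non-time unit {r.outcome_unit!r}")
    if not 0 <= r.baseline_mean_age_years <= MAX_PLAUSIBLE_AGE:
        problems.append(f"implausible mean age {r.baseline_mean_age_years}")
    return problems


if __name__ == "__main__":
    # The examples from the text: 400 in the flow, 700 analyzed, mean age 624.
    record = ResultsRecord(400, 624.0, 700, "participants")
    for query in consistency_problems(record):
        print(query)
```

A record that passes these checks can still be wrong, which mirrors the point that the database can ask whether the data make sense but cannot confirm that they are correct.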

Although ClinicalTrials.gov has checks for logic and internal consistency, it has no way of ensuring that the reported data are accurate. It does not dictate how data are analyzed, but it does require that the reported data make sense. For example, if the participant flow includes 400 people but results are presented for 700, it asks the trial organizers about the discrepancy. Similarly, a time-to-event outcome must be measured in a unit of time, and the mean age of participants cannot be a nonsensical number such as 624. “That is the kind of review we do,” Zarin said. ClinicalTrials.gov was established on the assumption that required data are generated routinely after a clinical trial based on the protocol for

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
Terms of Use and Privacy Statement