better, he said, is to arrange upfront for the full individual-level data to be available. Issues such as coding, cleaning, and logical queries take some time to resolve, but their difficulty is probably overrated. Post-hoc efforts to standardize data can also prove challenging and costly and may limit the usefulness of such data (see Chapter 5). “If we wait to see what we can do after the fact, it is very difficult.” He was involved in one study in which the investigators sought to repeat the analyses of microarray expression studies published in Nature Genetics using the datasets deposited with the papers (Ioannidis et al., 2009). Four independent teams of microarray analysts could reproduce only 2 of the 18 tables and figures from the papers. Much of the time, key information was not available, despite a precondition to publication in the journal that data be made available to independent investigators.
Some fields have adopted strong principles of data sharing. One of the best examples is the field of human genomics, which has principles on how to share information among all investigators working in the same area and, in some cases, with other investigators and the public. Without those standards in places, said Ioannidis, the “fantastic growth” in the field of human genomics would have occurred much more slowly. “Clinical trials could learn from such paradigms and try to adopt them.”