Terms such as “participant-level data,” “individual patient data,” and “raw data” are not well defined, noted Elizabeth Loder of the BMJ. A mutual understanding of the way these data are generated and shared can help alleviate ambiguities in nomenclature. In a typical multicenter clinical trial, data originate with case report forms, which can be handwritten or electronic. Study monitors audit the data, either at individual sites or electronically, to ensure accuracy. When a form contains an entry that is difficult to interpret or obviously mistaken, the monitors send a query back to the investigator or study staff to resolve the problem. Each query has to be explained and resolved before the data are entered into the coordinating center database (Kirwan et al., 2008). At several points in this process, a portion of the data is coded or categorized, and additional checks are performed to make sure the data entry is correct. Sometimes in the process of data entry, additional queries about the data are generated that must be addressed by the original investigator and the study staff.
The term “participant-level data” generally refers to the de-identified records of individual patients generated through this process. De-identification is the process by which personal information that can be used to identify an individual is removed. However, even participant-level data may not capture all relevant information recorded in the raw dataset. For example, Loder described several challenges involved in coding adverse events. Misclassification of adverse events in clinical trials can have serious consequences, as when adverse events like suicidal behavior are coded only as emotional lability, so systems have evolved to minimize this possibility. Adverse events usually are categorized using a predefined hierarchy or organizational system. But the symptoms reported by patients do not necessarily fall into this hierarchy or system. As a result, such symptoms can be interpreted in different ways. Because of this ambiguity, some have argued for access to raw data as reported by patients or researchers on the case report forms before any coding has taken place (Gøtzsche, 2011).
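The distinction between raw and participant-level data can be illustrated with a minimal sketch. It assumes a raw case report form record is a simple dictionary. The field names, the list of direct identifiers, and the one-entry coding dictionary are all hypothetical and chosen only for illustration; real trials use far more elaborate de-identification rules and coding systems.

```python
# Hypothetical field names, for illustration only.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address"}

# Hypothetical one-entry coding dictionary (verbatim report -> coded term),
# mirroring the misclassification example described in the text.
CODING_DICTIONARY = {
    "suicidal behavior": "emotional lability",
}


def to_participant_level(raw_record):
    """Return a de-identified, coded copy of a raw case report form record."""
    # De-identification: drop fields that could identify the individual.
    record = {k: v for k, v in raw_record.items() if k not in DIRECT_IDENTIFIERS}
    # Coding: replace the verbatim report with a term from the dictionary.
    verbatim = record.pop("adverse_event_verbatim")
    record["adverse_event_coded"] = CODING_DICTIONARY.get(verbatim.lower(), "uncoded")
    return record


raw = {
    "name": "Jane Doe",
    "date_of_birth": "1970-01-01",
    "address": "1 Example Street",
    "site": "A",
    "adverse_event_verbatim": "Suicidal behavior",
}
shared = to_participant_level(raw)
print(shared)
```

In this sketch the shared record keeps only the coded term, so a reader of the participant-level data cannot tell what the patient or researcher originally wrote. That loss is the reason some argue for access to the uncoded case report forms.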
stored in computerized databases to the summary data made available through journals and registries like ClinicalTrials.gov. Data sharing can also occur at many levels. Several of the presenters at the workshop de-