C. Cleaned and Locked Analyzable Dataset. The final cleaned and locked analyzable dataset consists of several components: participant characteristics and the primary outcome, prespecified secondary and tertiary outcomes, adverse event data, and exploratory data. A statistical analysis may involve a composite outcome that draws on any of these components. In addition, when data are missing, values may be imputed using this dataset. Results are derived by statistical analysis of the data in the cleaned and locked analyzable dataset. Analyses that were prespecified in the Statistical Analysis Plan form the basis for the Clinical Study Report (CSR), a detailed analysis of the study efficacy data and the complete adverse event data. The CSR and the supporting cleaned dataset are available to regulators (e.g., the Food and Drug Administration, the European Medicines Agency) and to other data users as appropriate (e.g., ministries of health). Journal articles generally represent slices of the data that make a coherent intellectual whole. For example, the "lead article" usually describes the data on the primary efficacy outcomes, key secondary outcomes, and the relevant adverse event data. Subsequent articles often focus on different aspects of the secondary, tertiary, or exploratory outcomes. Investigators can also use parts of the analyzable dataset to prepare analyses for presentations, for data exploration, and for hypothesis generation. A biostatistics best practice is to freeze a copy of whatever data were used in an analysis so that the results can later be reproduced if necessary. It is also desirable to store the code used in the analysis (i.e., the computer program), especially for any derived data.
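
The practice of freezing the analyzed data together with the analysis code can be illustrated with a minimal sketch. It is not a procedure described in this report; the function names (freeze_analysis, verify_freeze), the MANIFEST.json file, and the choice of SHA-256 checksums are all assumptions made for illustration. The idea is to copy the data and code into a new archive directory and record a checksum for each file, so that a later reanalysis can confirm it is running on exactly the same inputs.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def freeze_analysis(data_file: Path, code_file: Path, archive_dir: Path) -> Path:
    """Copy the data and code into a new archive directory and write a manifest.

    Hypothetical helper: assumes the two files have different names.
    """
    # exist_ok=False refuses to overwrite an earlier freeze.
    archive_dir.mkdir(parents=True, exist_ok=False)
    manifest = {
        "frozen_at_utc": datetime.now(timezone.utc).isoformat(),
        "files": {},
    }
    for src in (data_file, code_file):
        dst = archive_dir / src.name
        shutil.copy2(src, dst)  # copy contents and file metadata
        manifest["files"][src.name] = sha256_of(dst)
    manifest_path = archive_dir / "MANIFEST.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path


def verify_freeze(archive_dir: Path) -> bool:
    """Return True if every archived file still matches its recorded checksum."""
    manifest = json.loads((archive_dir / "MANIFEST.json").read_text())
    return all(
        sha256_of(archive_dir / name) == digest
        for name, digest in manifest["files"].items()
    )
```

Before an analysis is repeated, a call such as verify_freeze(Path("freeze_001")) would indicate whether the archived data and code are unchanged; the directory name here is only an example.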

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.