C


Figure 1: Data Flow from Participant to Analyzed Data and Reporting

A. Raw Data. All clinical trial data originate from patients and healthy volunteers who participate in studies carried out according to a detailed research protocol. This protocol is approved by an institutional review board and explained to participants through the informed consent process. Depending on the study under consideration, demographics, clinical outcomes data, and other appropriate raw source information are entered into case report forms. Some data (e.g., imaging studies) are interpreted by study investigators, and these interpretations are entered into the database, a process referred to as abstraction. The data are then coded to meet the study guidelines (e.g., men may be coded as “1” and women as “0”), and the coded data are entered into the case report forms. Narrative data from the case reports are also transcribed into the database. The data are then reviewed (i.e., cleaned) to ensure that entries make sense and are internally consistent. In summary, the data are abstracted, coded, transcribed, and cleaned as appropriate.
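The coding and cleaning steps can be illustrated with a minimal Python sketch. Only the sex coding (men as 1, women as 0) comes from the text; the field names, the hypothetical 18 to 100 age range, and the two checks are illustrative assumptions, not any study's actual rules. One check asks whether an entry makes sense on its own (a range check); the other asks whether related fields agree with each other (a consistency check). Each failed check produces a data query to be resolved during cleaning.

```python
from datetime import date

# Hypothetical codebook: the male -> 1, female -> 0 mapping follows the
# example in the text; all field names and checks are illustrative only.
SEX_CODES = {"male": 1, "female": 0}


def code_record(raw: dict) -> dict:
    """Return a coded copy of one raw case report form (CRF) record."""
    coded = dict(raw)  # shallow copy so the raw record is left unchanged
    coded["sex"] = SEX_CODES[raw["sex"].strip().lower()]
    return coded


def clean_checks(rec: dict) -> list[str]:
    """Return data queries for a coded record (empty list = no problems found)."""
    queries = []
    # Range check: does the entry make sense on its own?
    # (18 to 100 is an assumed protocol age range, for illustration.)
    if not 18 <= rec["age"] <= 100:
        queries.append(f"{rec['id']}: age {rec['age']} outside protocol range")
    # Consistency check: do related fields agree with each other?
    if rec["visit_date"] < rec["enroll_date"]:
        queries.append(f"{rec['id']}: visit before enrollment")
    return queries


raw = [
    {"id": "P001", "sex": "Female", "age": 54,
     "enroll_date": date(2020, 1, 6), "visit_date": date(2020, 2, 3)},
    {"id": "P002", "sex": "male", "age": 7,
     "enroll_date": date(2020, 1, 9), "visit_date": date(2019, 12, 30)},
]
coded = [code_record(r) for r in raw]
queries = [q for r in coded for q in clean_checks(r)]
for q in queries:
    print(q)
```

Here the first record passes both checks, while the second generates two queries that would have to be resolved before the database could be considered clean.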

B. Cleaned Analyzable Dataset. Once the database is cleaned and all queries are resolved, the database, which consists of both individual participant data and computed/summary-level data, is analyzable. It is called “analyzable” rather than “analyzed” because a very large percentage of it is never used in any analysis. The next step is to lock the database so that no further changes can be made and the data can be unblinded. However, the cleaned analyzable dataset in its unlocked condition has the potential for subsequent use, because it could be re-analyzed at later time points as data are added (e.g., when 1-year, 3-year, and 10-year outcome measures become available).
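To make the sequence of cleaning, locking, and unblinding concrete, here is a hypothetical in-memory sketch in Python. The class `TrialDatabase` and its methods are illustrative inventions, not the interface of any real clinical data management system, which would enforce a lock through access controls and audit trails. The sketch encodes only the ordering described above: edits are allowed while the database is open, the lock is refused while any query is unresolved, and treatment assignments can be revealed only after the lock.

```python
class TrialDatabase:
    """In-memory sketch of a trial database that can be locked, then unblinded.

    Hypothetical illustration only; names and behavior are assumptions.
    """

    def __init__(self):
        self._records = {}           # record id -> participant data
        self._open_queries = set()   # unresolved data-cleaning queries
        self.locked = False

    def upsert(self, record_id, record):
        """Add or change a record; refused once the database is locked."""
        if self.locked:
            raise PermissionError("database is locked; no further changes allowed")
        self._records[record_id] = dict(record)

    def raise_query(self, query_id):
        self._open_queries.add(query_id)

    def resolve_query(self, query_id):
        self._open_queries.discard(query_id)

    def lock(self):
        """Lock the database, but only after every query has been resolved."""
        if self._open_queries:
            raise RuntimeError(f"cannot lock: {len(self._open_queries)} unresolved queries")
        self.locked = True

    def unblind(self, allocation):
        """Reveal treatment assignments for stored records; only after the lock."""
        if not self.locked:
            raise RuntimeError("cannot unblind an unlocked database")
        return {rid: allocation[rid] for rid in self._records}


db = TrialDatabase()
db.upsert("P001", {"sex": 0, "age": 54})
db.raise_query("Q-17")
try:
    db.lock()
except RuntimeError as err:
    print(err)  # cannot lock: 1 unresolved queries
db.resolve_query("Q-17")
db.lock()
try:
    db.upsert("P001", {"sex": 0, "age": 55})
except PermissionError as err:
    print(err)
arms = db.unblind({"P001": "active", "P002": "placebo"})
print(arms)  # {'P001': 'active'}
```

A real system would also keep the unlocked, cleaned copy available for later re-analysis as follow-up data arrive; this sketch models only the single lock step.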



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



DISCUSSION FRAMEWORK FOR CLINICAL TRIAL DATA SHARING

C. Cleaned and Locked Analyzable Dataset. The final cleaned and locked analyzable dataset consists of several components: participant characteristics and the primary outcome, prespecified secondary and tertiary outcomes, adverse event data, and exploratory data. A statistical analysis may involve a composite outcome built from any of these components. In addition, when data are missing, values may be imputed using this dataset. Results are derived by statistical analysis of the data in the cleaned and locked analyzable dataset. Analyses that were prespecified in the Statistical Analysis Plan form the basis for the Clinical Study Report (CSR), a detailed analysis of the study efficacy data and the complete adverse event data. The CSR and the supporting cleaned dataset are available to regulators (e.g., the Food and Drug Administration, the European Medicines Agency) and to other data users as appropriate (e.g., ministries of health). Journal articles generally represent slices of the data that make a coherent intellectual whole. For example, the “lead article” usually describes the data on the primary efficacy outcomes, key secondary outcomes, and the relevant adverse event data. Subsequent articles often focus on different aspects of the secondary, tertiary, or exploratory outcomes. Investigators can also use parts of the analyzable dataset to prepare analyses for presentations, for data exploration, and for hypothesis generation. A biostatistics best practice is to freeze a copy of whatever data were used in an analysis so that the results can later be reproduced if necessary. It would also be desirable to store the code used in the analysis (i.e., the computer program), especially for any derived data.
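One way to follow the freezing practice is to write the analysis data, and optionally the analysis code, to a dedicated directory together with SHA-256 checksums, so that anyone repeating the analysis can confirm they are working from identical files. The Python sketch below assumes the data fit in memory as a list of JSON-serializable records; the function names and file names (`analysis_data.json`, `analysis.py`, `MANIFEST.json`) are hypothetical choices for illustration, not an established convention.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def freeze_analysis_data(records, out_dir, analysis_code=""):
    """Write a frozen copy of the analysis data (and code) plus a SHA-256 manifest.

    records: list of dicts, the data actually used in one analysis.
    analysis_code: optional source text of the analysis program.
    Returns the manifest mapping file name -> SHA-256 hex digest.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Sorted keys give a canonical serialization, so identical data always
    # produce an identical checksum regardless of dictionary ordering.
    payload = json.dumps(records, sort_keys=True, indent=2).encode("utf-8")
    (out / "analysis_data.json").write_bytes(payload)
    manifest = {"analysis_data.json": hashlib.sha256(payload).hexdigest()}
    if analysis_code:
        code_bytes = analysis_code.encode("utf-8")
        (out / "analysis.py").write_bytes(code_bytes)
        manifest["analysis.py"] = hashlib.sha256(code_bytes).hexdigest()
    (out / "MANIFEST.json").write_text(json.dumps(manifest, indent=2))
    # Mark frozen files read-only; this deters accidental edits but is not
    # a security control, so verification relies on the checksums.
    for name in manifest:
        (out / name).chmod(0o444)
    return manifest


def verify_frozen(out_dir):
    """Return True if every frozen file still matches its recorded checksum."""
    out = Path(out_dir)
    manifest = json.loads((out / "MANIFEST.json").read_text())
    return all(
        hashlib.sha256((out / name).read_bytes()).hexdigest() == digest
        for name, digest in manifest.items()
    )


records = [
    {"id": "P001", "arm": "active", "primary_outcome": 1},
    {"id": "P002", "arm": "placebo", "primary_outcome": 0},
]
with tempfile.TemporaryDirectory() as frozen_dir:
    manifest = freeze_analysis_data(records, frozen_dir,
                                    analysis_code="# analysis program source\n")
    frozen_ok = verify_frozen(frozen_dir)
print(frozen_ok)  # True
```

Storing the code's checksum alongside the data's covers the text's suggestion to keep the analysis program, which matters most when derived variables (such as a composite outcome or imputed values) exist only as the output of that code.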