Randomized clinical trials are the primary tool for evaluating new medical interventions. More than $7 billion is spent every year on evaluating drugs, devices, and biologics. Randomization provides for a fair comparison between treatment and control groups, balancing, on average, the distributions of known and unknown factors among the participants. Unfortunately, a substantial percentage of the measurements of the outcome or outcomes of interest is often missing. This “missingness” reduces the benefit provided by the randomization and introduces potential biases in the comparison of the treatment groups.
In light of this problem, the Panel on the Handling of Missing Data in Clinical Trials was created at the request of the U.S. Food and Drug Administration (FDA) to prepare “a report with recommendations that would be useful for FDA’s development of a guidance for clinical trials on appropriate study designs and follow-up methods to reduce missing data and appropriate statistical methods to address missing data for analysis of results.”
The panel’s work focused primarily on Phase III confirmatory clinical trials that are the basis for the approval of drugs and devices. For these trials, the bar of scientific rigor is set high; however, many of our recommendations are applicable to all randomized trials.
Missing data can arise for a variety of reasons, including the inability or unwillingness of participants to meet appointments for evaluation. In some studies, some or all data collection ceases when participants discontinue study treatment. Existing guidelines for the design and conduct of clinical trials, and for the analysis of the resulting data, provide only limited advice on how to handle missing data. Thus, approaches to the analysis of data with an appreciable amount of missing values tend to be ad hoc and variable.
This Summary contains the highest priority recommendations. For a complete list of recommendations in the report, see Chapter 6.
The panel concludes that a more principled approach to design and analysis in the presence of missing data is both needed and possible. Such an approach needs to focus on two critical elements: (1) careful design and conduct to limit the amount and impact of missing data, and (2) analysis that makes full use of information on all randomized participants and pays careful attention to the assumptions about the nature of the missing data that underlie estimates of treatment effects. In addition to the highest priority recommendations presented here, the panel offers further recommendations in the body of the report on the conduct of clinical trials and on techniques for the analysis of trial data.
Modern statistical analysis tools—such as maximum likelihood, multiple imputation, Bayesian methods, and methods based on generalized estimating equations—can reduce the potential bias arising from missing data by making principled use of auxiliary information available for nonrespondents. The panel encourages increased use of these methods. However, all of these methods ultimately rely on untestable assumptions concerning the factors leading to the missing values and how they relate to the study outcomes. Therefore, the assumptions underlying these methods need to be clearly communicated to medical experts so that they can assess their validity. Sensitivity analyses are also important to assess the degree to which the treatment effects rely on the assumptions used.
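To make the multiple imputation idea concrete: each of m imputed data sets is analyzed as if it were complete, and the m results are combined with Rubin's rules, which add the spread between imputations to the average within-imputation variance. The Python sketch below covers only this combining step. It assumes the imputations have already been generated from a suitable imputation model, and the function name `pool_rubin` and the example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data analyses using Rubin's rules.

    estimates: treatment-effect estimates, one per imputed data set
    variances: the squared standard errors of those estimates
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(variances)            # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b    # total variance includes uncertainty from missingness
    return q_bar, math.sqrt(total)


# Hypothetical results from analyzing five imputed data sets.
est, se = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(f"pooled effect {est:.2f}, SE {se:.3f}")
```

With these illustrative inputs the pooled standard error is about 0.265, larger than the 0.2 implied by any single completed-data analysis; that inflation is how multiple imputation reflects the uncertainty due to the missing values. The pooled result is still only as credible as the imputation model's assumptions about why the values are missing.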
There is no “foolproof” way to analyze data subject to substantial amounts of missing data; that is, no method recovers the robustness and unbiasedness of estimates derived from randomized allocation of treatments. Hence, the panel’s first set of recommendations emphasizes the role of design and trial conduct to limit the amount and impact of missing data.
A prerequisite for trial design is a clear definition of the target population and of the outcomes that will form the basis for decisions about efficacy and safety. The treatment of missing data depends on how these outcomes are defined, and a lack of clarity in their definition translates into a lack of clarity in how to deal with missing data. In addition, given the difficulty of adequately addressing missing data at the analysis stage, the design process needs to pay more attention to the potential hazards arising from substantial numbers of missing values.
Recommendation 2: Investigators, sponsors, and regulators should design clinical trials consistent with the goal of maximizing the number of participants who are maintained on the protocol-specified intervention until the outcome data are collected.
A major source of missing data in clinical trials is participants’ discontinuation of their assigned treatment. The two most common reasons for participants’ dropping out are reactions to the treatment (it is ineffective, has unacceptable side effects, or is perceived as having worked) and moving to a different location where the treatment is not available. We call these participants “treatment dropouts” and distinguish them from “analysis dropouts,” whose study outcomes are not measured and who therefore cannot be included in the data analysis.
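Treatment dropout and analysis dropout are separate properties of a participant: someone can stop the assigned treatment yet still have outcomes measured, or stay on treatment and miss the final assessment. A minimal Python sketch of the classification follows; the field names `discontinued_treatment` and `outcome_measured` are hypothetical, not a standard data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    pid: str
    discontinued_treatment: bool  # stopped the protocol-specified intervention
    outcome_measured: bool        # primary outcome recorded at the scheduled assessment


def dropout_labels(p: Participant) -> frozenset:
    """Dropout categories that apply to a participant (empty set = completer)."""
    labels = set()
    if p.discontinued_treatment:
        labels.add("treatment dropout")
    if not p.outcome_measured:
        labels.add("analysis dropout")
    return frozenset(labels)


cohort = [
    Participant("A", discontinued_treatment=False, outcome_measured=True),
    Participant("B", discontinued_treatment=True, outcome_measured=True),
    Participant("C", discontinued_treatment=True, outcome_measured=False),
    Participant("D", discontinued_treatment=False, outcome_measured=False),
]
for p in cohort:
    print(p.pid, sorted(dropout_labels(p)) or ["completer"])
```

Participant B, a treatment dropout whose outcome is still recorded, is the situation that continued follow-up after treatment discontinuation aims to create. Under a protocol that stops data collection at discontinuation, B would instead look like C.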
In some trials, protocols are written so that treatment dropout leads to analysis dropout, because the sponsor may see no need to record study outcomes after a participant deviates from the study protocol. This approach can seriously undermine any inferences that can be drawn about treatment effects. The panel concludes that it is important not only to obtain information about dropouts to the extent possible, but also to anticipate and plan for missing data in trial protocols.
Recommendation 3: Trial sponsors should continue to collect information on key outcomes on participants who discontinue their protocol-specified intervention in the course of the study, except in those cases for which a compelling cost-benefit analysis argues otherwise, and this information should be recorded and used in the analysis.
The techniques we suggest to limit the amount of missing data include
choices of study sites, investigators, participants, study outcomes, time in study and times of measurement, and the nature and frequency of follow-up;
the use of rescue therapies or alternative treatment regimens to allow meaningful analysis of individuals who discontinue the assigned treatment;
limiting participant burden in other ways, such as making follow-up visits easy in terms of travel and child care;
providing frequent reminders of study visits;
training of investigators on the importance of avoiding missing data;
providing incentives to investigators and participants to limit dropouts; and
monitoring adherence and otherwise managing participants who cannot tolerate or do not adequately respond to treatment.
Recommendation 6: Study sponsors should explicitly anticipate potential problems of missing data. In particular, the trial protocol should contain a section that addresses missing data issues, including the anticipated amount of missing data, and steps taken in trial design and trial conduct to monitor and limit the impact of missing data.
Despite efforts to minimize missing data in the design and conduct of clinical trials, the statistical analysis often has to deal with a nontrivial amount of missing data. There is no single correct method for handling missing data. All methods require untestable assumptions: specifying the missingness mechanism involves assumptions about the relationships among variables with missing values, and results often vary depending on the assumptions made about these relationships. Crucially, the validity of these assumptions cannot generally be determined from the collected data. Consequently, the critical need is to understand the assumptions associated with any particular analysis and to express them as transparently as possible, so that researchers and practicing clinicians can assess their validity in any given setting.
Recommendation 9: Statistical methods for handling missing data should be specified by clinical trial sponsors in study protocols, and their associated assumptions stated in a way that can be understood by clinicians.
The panel believes that in nearly all cases there are better alternatives to last observation carried forward and baseline observation carried forward imputation; these alternatives are based on more reasonable assumptions and hence yield more reliable inferences about treatment effects.
Recommendation 10: Single imputation methods like last observation carried forward and baseline observation carried forward should not be used as the primary approach to the treatment of missing data unless the assumptions that underlie them are scientifically justified.
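For readers less familiar with these methods, here is a minimal Python sketch of what last observation carried forward (LOCF) and baseline observation carried forward (BOCF) do to one participant's sequence of visit values, with None marking a missed visit. It assumes the first entry is an observed baseline; the function names are illustrative.

```python
from typing import List, Optional

Visits = List[Optional[float]]


def locf(visits: Visits) -> Visits:
    """Last observation carried forward: each missed visit takes the latest observed value."""
    filled, last = [], None
    for v in visits:
        if v is not None:
            last = v
        filled.append(last)  # stays None only if nothing has been observed yet
    return filled


def bocf(visits: Visits) -> Visits:
    """Baseline observation carried forward: each missed visit takes the baseline value."""
    baseline = visits[0]  # assumed to be the observed baseline measurement
    return [baseline if v is None else v for v in visits]


# Hypothetical symptom scores at five visits; the participant drops out after visit 3.
scores = [6.0, 5.0, 4.0, None, None]
print("LOCF:", locf(scores))
print("BOCF:", bocf(scores))
```

Each filled-in value embodies a strong assumption: LOCF treats the outcome as frozen at its last observed level after dropout, and BOCF treats it as returning to baseline. Analyzing the filled-in values as if they had been observed also understates the uncertainty of the resulting estimate.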
Especially when the degree of missingness is appreciable and information about the characteristics of participants with missing data is limited, the sensitivity of the inference to reasonable departures from the assumptions of the missing data method needs to be assessed. This additional uncertainty in the regulatory environment should motivate manufacturers of drugs, devices, and biologics to pay much greater attention to the use of techniques for reducing the frequency of missing data (see Chapters 2 and 3).
Recommendation 15: Sensitivity analyses should be part of the primary reporting of findings from clinical trials. Examining sensitivity to the assumptions about the missing data mechanism should be a mandatory component of reporting.
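One widely used form of sensitivity analysis is a delta adjustment, also called a tipping-point analysis: imputed outcomes in one arm are shifted by an amount delta that represents a departure from the primary assumption, and delta is varied to find where the conclusion changes. The Python sketch below is deliberately simplified and hypothetical. It imputes each arm's missing outcomes with that arm's observed mean, shifts only the treatment arm, and ignores imputation uncertainty, whereas in practice delta adjustment is typically combined with multiple imputation.

```python
from statistics import mean


def effect_with_delta(treat, control, delta):
    """Difference in mean outcome (treatment minus control) after simple imputation.

    Missing outcomes (None) are filled with the observed mean of their own arm;
    imputed values in the treatment arm are then shifted by `delta`, so delta < 0
    encodes the assumption that treatment-arm dropouts fared worse than completers.
    """
    def fill(arm, shift):
        observed_mean = mean(y for y in arm if y is not None)
        return [observed_mean + shift if y is None else y for y in arm]

    return mean(fill(treat, delta)) - mean(fill(control, 0.0))


def tipping_point(treat, control, deltas):
    """First delta (in the order given) at which the estimated benefit disappears."""
    for d in deltas:
        if effect_with_delta(treat, control, d) <= 0:
            return d
    return None  # the conclusion holds across the whole range examined


# Hypothetical outcomes, higher is better; None marks a missing outcome.
treat = [5.0, 6.0, 7.0, None, None]
control = [4.0, 5.0, None, 6.0, 5.0]
deltas = [-0.5 * k for k in range(11)]   # 0.0, -0.5, ..., -5.0
for d in deltas[:6]:
    print(f"delta {d:+.1f}: effect {effect_with_delta(treat, control, d):+.2f}")
print("tipping point:", tipping_point(treat, control, deltas))
```

In this toy example the estimated benefit of 1.0 at delta = 0 vanishes at delta = -2.5, so clinicians can judge whether treatment-arm dropouts plausibly did that much worse than comparable completers. Reporting the tipping point states the key assumption in clinical units, which makes it easier for medical experts to assess.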
NEW RESEARCH AND USE OF EXISTING RESEARCH
The FDA has a very large database of clinical trials that has not been tapped for its potential to inform best practices for clinical trials. At the same time, there is a wide range of techniques, thoroughly explored both theoretically and in practice over the past 20 years, that are not being used in clinical trials. There seems to be a reticence on the part of analysts in both industry and the FDA to adopt these techniques. This reticence may be due in part to a lack of training and education.
Recommendation 16: The U.S. Food and Drug Administration and the National Institutes of Health should make use of their extensive clinical trial database to carry out a program of research, both internal and external, to identify common rates and causes of missing data in different domains and how different models perform in different settings. The results of such research can be used to inform future study designs and protocols.
Recommendation 17: The U.S. Food and Drug Administration (FDA) and drug, device, and biologic companies that sponsor clinical trials should carry out continued training of their analysts to keep abreast of up-to-date techniques for missing data analysis. FDA should also encourage continued training of its clinical reviewers to make them broadly familiar with missing data terminology and missing data methods.
Recommendation 18: The treatment of missing data in clinical trials, being a crucial issue, should have a higher priority for sponsors of statistical research, such as the National Institutes of Health and the National Science Foundation. There remain several important areas in which progress is particularly needed, namely: (1) methods for sensitivity analysis and principled decision making based on the results from sensitivity analyses, (2) analysis of data where the missingness pattern is nonmonotone, (3) sample size calculations in the presence of missing data, (4) design of clinical trials, in particular plans for follow-up after treatment discontinuation (degree of sampling, how many attempts are made, etc.), and (5) doubly robust methods, to more clearly understand their strengths and vulnerabilities in practical settings. The development of software that supports coherent missing data analyses is also a high priority.
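On the second research area: in a longitudinal trial the missingness pattern is monotone when, once a participant misses a visit, no later visit is observed (pure dropout); intermittent gaps make it nonmonotone. Many standard methods are simplest to apply under a monotone pattern, which is part of why the nonmonotone case needs further work. A minimal Python sketch for classifying patterns follows, assuming each participant's visits are recorded in time order as observed (True) or missing (False); the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def is_monotone(observed: Sequence[bool]) -> bool:
    """True if no visit is observed after the first missed visit."""
    seen_missing = False
    for obs in observed:
        if not obs:
            seen_missing = True
        elif seen_missing:
            return False  # observed again after a gap: intermittent, nonmonotone
    return True


def pattern_counts(cohort: Iterable[Sequence[bool]]) -> Counter:
    """Tally participants as complete, monotone dropout, or nonmonotone."""
    counts = Counter()
    for visits in cohort:
        if all(visits):
            counts["complete"] += 1
        elif is_monotone(visits):
            counts["monotone dropout"] += 1
        else:
            counts["nonmonotone"] += 1
    return counts


cohort = [
    [True, True, True, True],     # completer
    [True, True, False, False],   # dropped out after visit 2
    [True, False, True, False],   # missed visit 2, returned, then missed visit 4
]
print(pattern_counts(cohort))
```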