be some administrative or conduct process, wherein the discontinuation of treatment, or the failure to gather data, has nothing to do with a subject’s clinical course. Under this scenario, complete case analysis is unbiased, as complete cases constitute a representative sample of the study population. However, complete case analysis is inefficient in that it does not make use of the interim information from subjects without final outcome data. Interestingly, even in this situation, where completers are a random, representative sample of the study population, LOCF is generally biased, because it assumes that disease severity remains unchanged from its last recorded value (Molenberghs, 2004).
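The contrast between complete case analysis and LOCF under completely random dropout can be illustrated with a small simulation. The following is a minimal sketch, not an analysis from this report: the function name `simulate_mcar`, the linear improvement of 4 points per visit, the noise levels, and the 15 percent per-visit dropout probability are all illustrative assumptions.

```python
import random
import statistics


def simulate_mcar(n_subjects=20000, n_visits=5, slope=-4.0,
                  dropout_prob=0.15, seed=0):
    """Compare complete case and LOCF estimates of mean final severity
    when dropout is a coin flip unrelated to the clinical course.

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    full_data_final, complete_case, locf = [], [], []
    for _ in range(n_subjects):
        baseline = rng.gauss(60.0, 10.0)
        # Severity at visits 0..n_visits-1: steady improvement plus noise.
        scores = [baseline + slope * v + rng.gauss(0.0, 5.0)
                  for v in range(n_visits)]
        # Completely random dropout: before each follow-up visit the subject
        # leaves with a fixed probability that ignores the scores entirely.
        last_seen = 0
        for v in range(1, n_visits):
            if rng.random() < dropout_prob:
                break
            last_seen = v
        full_data_final.append(scores[-1])   # what we would see with no dropout
        locf.append(scores[last_seen])       # last observation carried forward
        if last_seen == n_visits - 1:
            complete_case.append(scores[-1])  # completers only
    return {
        "full_data": statistics.fmean(full_data_final),
        "complete_case": statistics.fmean(complete_case),
        "locf": statistics.fmean(locf),
        "n_completers": len(complete_case),
    }


if __name__ == "__main__":
    for name, value in simulate_mcar().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the complete case mean should track the full-data mean closely, though it is computed from far fewer subjects, while the LOCF mean should sit noticeably higher because dropouts are frozen at an earlier, more severe score.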

The second kind of missing data is missing at random (MAR). Data are missing at random if, conditional upon the independent variables in the analytic model, the missingness depends on the observed values of the outcome being analyzed (Yobs) but does not depend on the unobserved values of that outcome (Ymiss). MAR is thus similar to MCAR, except that a subject’s observed disease severity affects the likelihood of subsequent dropout. It assumes that the average future behavior of all individuals with the same characteristics and clinical course up to a given time will be the same, regardless of whether their outcome data are missing after that time. The best approach to this kind of missing data involves forms of data imputation or modeling that take into account all the observed data up to the point of dropout. These techniques include mixed-model repeated measures (MMRM), multiple imputation, and random regression or hierarchical regression models (Molenberghs et al., 2004; Schafer and Graham, 2002). Both complete case analysis and LOCF perform suboptimally in this situation, the former because it does not use the information from patients with incomplete data at all, and the latter because it does not use that information properly.
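A simulation can show why using the observed interim data matters under MAR. The sketch below is an illustration under stated assumptions, not one of the methods named above: it uses a single regression-based imputation as a simplified stand-in for MMRM or multiple imputation, and the function name `simulate_mar`, the two-visit design, and every numeric value are hypothetical.

```python
import math
import random
import statistics


def simulate_mar(n_subjects=20000, seed=1):
    """Compare a complete case mean with a regression-imputed mean when
    dropout depends only on an observed interim score (MAR).

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    interim, final, completed = [], [], []
    for _ in range(n_subjects):
        y1 = rng.gauss(50.0, 10.0)                   # interim severity, always observed
        y2 = 10.0 + 0.7 * y1 + rng.gauss(0.0, 5.0)   # final severity
        # MAR: the chance of dropout rises with the observed interim score
        # and does not depend on the unobserved final score.
        p_drop = 1.0 / (1.0 + math.exp(-(y1 - 50.0) / 5.0))
        interim.append(y1)
        final.append(y2)
        completed.append(rng.random() >= p_drop)

    # Ordinary least squares of final on interim severity, completers only.
    x = [a for a, done in zip(interim, completed) if done]
    y = [b for b, done in zip(final, completed) if done]
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x

    # Keep observed final scores; predict the missing ones from interim scores.
    filled = [b if done else intercept + slope * a
              for a, b, done in zip(interim, final, completed)]
    return {
        "full_data": statistics.fmean(final),
        "complete_case": mean_y,
        "regression_imputed": statistics.fmean(filled),
    }


if __name__ == "__main__":
    for name, value in simulate_mar().items():
        print(f"{name}: {value:.2f}")
```

In this setup the completers are disproportionately the subjects with milder interim scores, so the complete case mean should understate final severity, while the imputed mean should land close to the full-data mean. A single imputation of this kind understates uncertainty, which is one reason multiple imputation or a likelihood-based model would be preferred in a real analysis.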

Finally, data that are missing “not at random” (MNAR) are data whose values cannot be predicted from the observed data of other patients who completed the trial and from the data on the patient in question up to the point of dropout. Examples are a patient who drops out due to an unrecorded relapse after apparently doing well, or a patient who drops out because of side effects that are harder to tolerate when their PTSD is worse. Because the missingness is related to the value of the unobserved data, these data are called “informatively” or “nonignorably” missing. This condition by definition cannot be ascertained from the observed data, yet most missing data methods assume that it does not exist. The higher the proportion of outcome data that are missing, the more the validity of any analysis rests on this unverifiable assumption, and the less reliable the results from any method. MNAR data can be dealt with only via sensitivity analysis or, better, by learning something about the reasons for the dropouts using information external to the data in hand. If the data allow, studying the characteristics and intermediate outcomes of patients
