Trial Designs to Reduce the Frequency of Missing Data
Good design may not eliminate the problem of missing data, but . . . it can reduce it, so that the modern analytic machinery can be used to extract statistical meaning from study data. Conversely, we note that when insufficient attention is paid to missing data at the design stage, it may lead to inferential problems that are impossible to resolve in the statistical analysis phase. (Lavori et al., 2008, p. 786)
The primary benefit from randomizing clinical trial participants into treatment and control groups comes from balancing the distributions of known and unknown characteristics among these groups prior to study treatments. But baseline comparability cannot be assured by randomization when data are missing. Although there are techniques that can be applied to ameliorate the impact of missing data (see Chapters 4 and 5), avoiding or minimizing missing data is always preferred.
Any approach to statistical analysis involving missing data will involve unprovable assumptions, particularly because there is always some uncertainty about the reasons why data are missing. Consequently, the appropriate assumptions and analytic treatment—and, therefore, the appropriate inference—may be unclear. For example, in a study of weight gain, Ware (2003, pp. 2136-2137) writes that: “It is unfortunate, however, that so much effort must be devoted to evaluating the implications of missing observations when a seemingly simple effort to obtain study weights according to the follow-up protocol would probably have been successful with most participants. Complete evaluation of enrolled patients, irrespective of their adherence to study therapy, deserves wider recognition as an
important part of good clinical-trials practice.” Therefore, an important objective in the design and implementation of a clinical trial is to minimize missing outcome data.
As briefly discussed in Chapter 1, there are a variety of reasons for discontinuation of treatment, and for discontinuation of data collection, which we refer to as “analysis dropout,” in clinical trials. The frequency of missing data depends on the health condition under study, the nature of the interventions under consideration, the length of the trial, and the burden of the health evaluations and how much they are facilitated. Common reasons for dropout include (1) inability to tolerate the intervention, (2) lack of efficacy for the intervention, and (3) difficulty or inability to attend clinical appointments and complete medical evaluations. As noted in Chapter 1, in some trials, treatment dropout leads to analysis dropout because data collection is discontinued. In many studies, this is the major reason for missing data. Other reasons include subjects who withdraw their consent, move out of the area, or who otherwise experience changes in their lives that preclude or complicate further participation.
This chapter primarily concerns the sources of missing outcome information and how the frequency of missing outcome values can be reduced. However, missing values of covariates and other auxiliary variables that are predictive of the outcome of interest should also be reduced,1 and the techniques discussed here can be helpful for that purpose as well. There is clearly a need for more research on the specific reasons underlying missing data, a topic addressed in Chapter 3.
TRIAL OUTCOMES AND ESTIMANDS
A clinical trial typically measures outcomes that quantify the impact of the interventions under study for a defined period of time. Inference focuses on summaries of these measures (such as the mean) for the target population of interest. These summary quantities are often called parameters, or estimands. For example, consider a trial in which the primary outcome measure is change in blood pressure between baseline and 6 weeks after the initiation of treatment. The estimand of interest might be the difference in the mean change in blood pressure over 6 weeks for the target and control populations. An estimate of this parameter is the difference in sample means for participants in the treatment group and participants in the control group. This estimate is unbiased if the assignment to treatment is random, and there are no missing data (Little and Rubin, 2002). The goal
is to attribute the difference between the treatment and the control to the causal effect of the intervention.
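To make the blood pressure example concrete, the intention-to-treat estimate described above is simply the difference in arm-specific sample means of the change scores. The following is a minimal sketch; the function names and the data are illustrative assumptions, not taken from any actual trial.

```python
# Sketch: estimating the difference in mean 6-week blood pressure change
# between treatment and control arms, assuming complete outcome data.
# All names and numbers are illustrative.

def mean(xs):
    return sum(xs) / len(xs)

def itt_estimate(change_treatment, change_control):
    """Difference in mean change (treatment minus control).

    Unbiased for the causal estimand when assignment is randomized
    and no outcome values are missing.
    """
    return mean(change_treatment) - mean(change_control)

# Hypothetical change-from-baseline values (mmHg) for each arm
treated = [-12.0, -8.5, -10.0, -15.5]
control = [-4.0, -6.5, -2.0, -3.5]

print(itt_estimate(treated, control))  # -7.5: larger BP reduction on treatment
```

With missing outcomes, this simple difference is no longer guaranteed to be unbiased, which is the central concern of this chapter.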
Estimation of the primary (causal) estimand, with an appropriate estimate of uncertainty, is the main goal of a clinical trial. For example, estimates of estimands based on the measurement of symptom relief are required for the regulatory evaluation of treatments for many disorders, including mental illnesses, inflammatory bowel disease, and chronic pain. Estimands in trials of interventions for cancer and heart failure are often based on survival or disease recurrence. Estimands for trials of interventions for HIV and hypertension may use surrogate outcomes, such as CD4 counts or blood pressure measures, respectively.
The choice of estimand involves both the outcome measure and the population of interest. For instance, the population of interest might be unrestricted, or it might be restricted to people who can tolerate an intervention for a given period. The outcome measure also requires specification of a period of action of the intervention after assignment. It may be measured at one time or over a period of time and can reflect either short-term or longer-term effects. In addition, the outcome measure could be an absolute measure of change from the baseline or a percentage change.
In order to avoid confusion and clearly assess the potential for bias from missing data in a randomized clinical trial, it is important to be clear about the choice of estimand, particularly about the outcome measure and target population of interest. The estimand should be decided before the protocol for a clinical trial is final, since alternative choices of estimand may have important implications for trial design and implementation, the inferences that are made, and the amount of missing data that might be expected. For instance, an important consideration is whether to collect outcome data after a subject discontinues the assigned treatment or otherwise deviates from the protocol. The answer to this question depends on the choice of trial estimand.
To make this discussion more concrete, in this section we discuss some estimands. We assume for simplicity that there are just two groups, a treatment group and a control group. We discuss different trial designs for each estimand, which provide varying degrees of confidence in the trial conclusions, given the likely frequency of missing outcome values. In this discussion, “outcome” refers to the primary outcome in the trial, which might be a measure of symptoms, a surrogate outcome, or a time to an event. The “duration of protocol adherence” refers to the time after randomization for which a subject received the study intervention according to the protocol. This period may be shorter than the full study duration for a variety of reasons, including lack of tolerance, inefficacy, and study burden.
Five possible estimands are described here in the context of a symptom relief trial. A similar range of potential estimands can be identified for trials
in which the outcome is a time to event or a surrogate measure of progress. Estimands (1), (2), (4), and (5) are frequently used. Estimand (3) is rarely used, but is included to help explain the impact of the various choices of estimand on the likelihood of missing data.
(Difference in) Outcome Improvement for All Randomized Participants This estimand compares the mean outcomes for the individuals randomized to the treatment and control arms, regardless of what treatment participants actually received. Often called the “intention-to-treat” estimand, it assesses the benefits of a treatment policy or strategy relative to a control.2 Since the estimand relates to a treatment policy, the observed differences reflect the effect of the initially assigned treatment as well as subsequent treatments adopted as a result of intolerance or lack of efficacy.
A trial design that supports the use of this estimand is a parallel-group randomized trial in which outcome data are collected on all subjects, regardless of whether the study treatment is received. A trial design that does not support the use of this estimand is a parallel-group randomized trial in which outcome data are not collected on participants after they drop out or switch from the assigned treatment.
(Difference in) Outcome Improvement in Tolerators This estimand quantifies the degree of outcome improvement in subjects who tolerated and adhered to a particular treatment. This estimand concerns the subset of the population who initially began treatment and tolerated the treatment. One complication with this estimand is that it is difficult to identify the members of this subpopulation in advance in the field, and the assessed performance in a trial may therefore be an overestimate of the performance in practice.
A trial design that supports the use of this estimand is a targeted parallel-group randomized trial. An example is a design with an active treatment run-in period followed by placebo washout prior to randomization, limited to individuals who tolerated the active treatment during the run-in period. Outcome data are then collected on all randomized subjects. A trial design that does not support the use of this estimand is an untargeted parallel-group randomized trial in which outcome data are not collected on participants after they terminate or switch from the assigned treatment.
(Difference in) Outcome Improvement If All Subjects Tolerated or Adhered This estimand quantifies the degree of outcome improvement in all subjects in the trial if they had all received treatment according to the protocol for the study duration. This estimand requires an imputation of what would have been the outcome if individuals who did not comply with the protocol had complied. Given that in many settings some amount of intolerability or nonadherence is unavoidable, this estimand reflects the effects of an infeasible treatment policy.
This estimand provides insight into the magnitude of improvement in efficacy that might be achieved if one could develop therapeutic strategies that produced very high levels of adherence in real-world settings; ultimately, such therapeutic strategies would then need to be evaluated in randomized clinical trials conducted to assess their effect on the estimand “outcome improvement for all randomized participants.”
A trial design that supports the use of this estimand is a parallel-group randomized design in which all subjects are provided adjunctive or supportive therapies, assuming that such therapies are available and ensure tolerance and adherence. Outcome data are collected on all subjects. A trial design that does not support the use of this estimand is a parallel-group randomized design with no available and effective mechanism for ensuring adherence.
(Difference in) Areas Under the Outcome Curve During Adherence to Treatment This estimand compares the arm-specific means of the area under the outcome curve over the duration of protocol adherence. This estimand simultaneously quantifies the effect of treatment on both the outcome measure and the duration of tolerability or adherence in all subjects.
A trial design that supports the use of this estimand is a targeted or untargeted parallel-group randomized trial. In such a trial with this estimand, there would be no need to collect outcome data after assigned treatment is discontinued or switched, other than to address secondary analysis issues, such as delayed side effects.
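The per-subject summary underlying estimand (4) can be sketched as a trapezoidal-rule area computed over each subject's own adherence period; the arm contrast is then a difference of mean areas. This sketch uses hypothetical names and data, not values from the text.

```python
# Sketch: arm-specific mean area under the outcome curve (AUC),
# computed per subject over that subject's period of protocol adherence.
# All names and data are illustrative assumptions.

def trapezoid_auc(times, values):
    """Area under the outcome curve via the trapezoidal rule.

    `times` are visit times while the subject adhered to protocol;
    `values` are the outcome measurements at those times.
    """
    auc = 0.0
    for i in range(1, len(times)):
        auc += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return auc

def mean_auc(subjects):
    """Mean AUC across subjects; each subject is (times, values)."""
    aucs = [trapezoid_auc(t, v) for t, v in subjects]
    return sum(aucs) / len(aucs)

# Two hypothetical subjects per arm: one adheres the full 6 weeks,
# one drops out at week 2 (no outcome data needed after dropout).
treatment_arm = [
    ([0, 2, 4, 6], [0.0, 3.0, 5.0, 6.0]),
    ([0, 2], [0.0, 2.0]),
]
control_arm = [
    ([0, 2, 4, 6], [0.0, 1.0, 1.0, 2.0]),
    ([0, 2, 4, 6], [0.0, 0.0, 1.0, 1.0]),
]

estimate = mean_auc(treatment_arm) - mean_auc(control_arm)
print(estimate)  # 7.5
```

Note that the early dropout contributes a small area rather than a missing value, which is exactly why this estimand does not require outcome collection after discontinuation.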
(Difference in) Outcome Improvement During Adherence to Treatment This estimand is the difference in mean outcomes from the beginning of the trial to the end of the trial or the end of adherence to the protocol, whichever occurs earlier. This estimand reflects both the duration of tolerability or adherence and outcome improvement in all subjects. A trial design that supports the use of this estimand is a parallel-group randomized trial. Again, in such a situation, estimating the primary estimand does not require collection of outcome data after the assigned treatment is discontinued.
Because estimands (1) outcome improvement, (4) area under the outcome curve during tolerated treatment, and (5) outcome improvement during tolerated treatment may be influenced by both pharmacological efficacy and tolerance and adherence, they have the potential to be misinterpreted depending on whether the focus is on assessing intervention effectiveness or efficacy.
The advantage of (1) over (5) is that (1) has the alternative interpretation of the difference of two treatment policies. Estimand (5) is also problematic because it does not distinguish an immediately highly effective but extremely toxic treatment (i.e., one with no tolerability after a short period and a high outcome difference over the short period of tolerability) from a nontoxic treatment with gradual outcome improvement (i.e., one with full tolerability and an outcome difference over the entire trial period of the same magnitude as that of the first treatment).
The choice of causal estimand and trial design needs to take into consideration the fact that clinical trials are often part of a larger strategy of exploring various features of an intervention prior to approval. For instance, for estimand (2) outcome improvement in tolerators we have described the benefits in limiting treatment dropout of a design with an active run-in period, followed by randomization of those who tolerated the treatment. However, in evaluating the results, it must be understood that the run-in period is part of the therapeutic strategy under study: consequently, for example, if the treatment has adverse effects during initial dosing, those risks would need to be assessed in other trials prior to approval.
In summary, the choice of outcome measure and estimand are crucial for clinical trial design and regulatory decision making. As the discussion above makes clear, there are a wide range of estimands that can be considered in a given situation, and each will involve tradeoffs between the representativeness of the population of study, the ease of study design and execution, and the sensitivity to missing data.
Recommendation 1: The trial protocol should explicitly define (a) the objective(s) of the trial; (b) the associated primary outcome or outcomes; (c) how, when, and on whom the outcome or outcomes will be measured; and (d) the measures of intervention effects, that is, the causal estimands of primary interest. These measures should be meaningful for all study participants, and estimable with minimal assumptions. Concerning the latter, the protocol should address the potential impact and treatment of missing data.
Given the bias in estimated treatment effects that missing data can cause, it would be a serious concern if the actions recommended here were not routine practice. It is our strong impression, however, that these actions are not common, that rates of missing data remain high for a large fraction of trials, that protocols very often fail to devote attention to plans to combat missing data, and that protocols are also often vague about the causal estimand. The failure to include a formal discussion of missing data in trial protocols should be viewed as a serious deficiency.
MINIMIZING DROPOUTS IN TRIAL DESIGN
In this section, we describe a number of design elements for clinical trials that can help to reduce the number of participants who drop out due to lack of tolerability, lack of efficacy, or inability to provide the required measurement. The choice requires careful consideration because in some cases it will affect the generalizability of the study, that is, the population to which study conclusions are applicable. This section covers the following design elements: use of a run-in period or enrichment; flexible doses; target population selection; “add-on” studies; reduction of follow-up periods; allowing rescue medications; defining outcomes that can be ascertained in a high proportion of participants; and determining long-term efficacy in trials with randomized withdrawal.
Use of Run-In Periods or Enrichment Before Randomization to Identify Participants Who Can Tolerate or Respond to the Study Treatment For studies in which the tolerability of treatments or adherence to study protocols is a concern, a run-in period can be used to establish short-term tolerability and adherence to the study treatment, followed by randomization of only those individuals who tolerated and adhered to therapy.
Such a design may result in a more efficient study with less missing data, but it likely will not adequately estimate the rate of adverse events in the broader population. Some clinical trials have also used a run-in period to identify participants who are likely to respond positively to the study treatment. This may also come at the cost of some external validity, reducing the ability to make estimates of the effectiveness of the treatment for the broader target population that might be given the treatment.
A related idea to run-in designs is the use of enrichment designs, which exclude participants based on initial indications that the response to the study treatment for them may be weaker or may be more difficult to tolerate. Enrichment designs have the advantage of clearly identifying the target population in advance of enrollment.3
Flexible Dose (Titration) Studies Protocols that allow flexible dosing to accommodate individual differences in tolerability allow more participants to continue on the assigned treatment by reducing the frequency of dropout
because of adverse events or inadequate efficacy. Flexible-dose protocols are sometimes viewed as conflicting with the desire to assess the effects of specific doses, but giving investigators the flexibility to titrate the dose up or down (i.e., allowing clinicians to individualize a patient’s dosage) on the basis of a participant’s ability to tolerate a drug may in fact be more reflective of real-life applications.
Selection of Target Populations for Whom Treatment Is Indicated Participants who are doing well on their current treatments may not be good candidates for trial enrollment, in part because they are more likely to drop out due to lack of efficacy. Therefore, a good design approach is not to include, merely to meet an enrollment target, participants who are receiving treatments that are proving effective.
Adding the Study Treatment to an Effective Treatment Considered to Be the Standard In many cases in which drug interactions are not a concern, dropout due to lack of efficacy can be reduced through the use of “add-on” study designs. In such a design, a new treatment or placebo may be added to an optimized background regimen for study participants. These designs may decrease the likelihood of missing data due to lack of efficacy.
Reducing the Follow-Up Period Shorter follow-up periods may yield a reduction in dropouts, since fewer participants move out of the area, fewer develop intolerable adverse events, and the number and burden of clinical visits may be reduced. Modifying a trial design in this way may be preferable for assessing efficacy: essentially, it trades the loss of information from participants who respond more slowly to study treatment for the retention of participants who would otherwise drop out early. Past experience of similar trials can provide guidance for evaluating this tradeoff in specific situations.
An alternative would be to define the primary outcome assessment for a shorter period of follow-up but retain a longer period of follow-up for safety and secondary outcome assessments. Use of shorter follow-up periods may be particularly useful with placebo control groups for which there are established effective active treatments. Longer follow-up periods clearly have advantages when short-term effects do not provide reliable assessments of the performance of an intervention, but the negative consequences in the form of missing data need to be recognized and taken into account.
Allow Rescue Medication in the Event of Poor Response Dropout can be reduced by allowing alternative treatments for participants who are not responding to the study treatment. If this design option is adopted, the estimand and associated outcome measurements need to be carefully defined in the protocol. For example, time to treatment failure could include the
use of alternative therapy as an indicator of treatment failure. (In doing so, however, one needs to be careful to delineate the circumstances for switching to an alternative therapy in the trial protocol to support objective conclusions.)
Define Outcomes That Can Be Ascertained in a High Proportion of Participants To avoid missing data caused by the use of outcomes that are undefined for some participants, it is helpful to choose primary outcomes that are ascertainable for all randomized participants.4 This may require use of composite outcomes (e.g., outcomes that incorporate death as part of the outcome or incorporate use of rescue medication or surgery for initial poor response). At the analysis stage, for ordinal or continuous outcomes, such events might be given a worst outcome rank. However, it is not always useful to use composite outcomes to avoid the occurrence of missing data, since composite outcomes can be difficult to interpret if individual components of the composite provide contrasting evidence about the intervention or if a weaker component dominates. In addition, primary outcome measures that require invasive procedures (e.g., liver biopsies) are likely to result in significant missing data, and such outcome measures should be avoided whenever possible.
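The worst-rank idea mentioned above can be sketched as follows: participants with an intercurrent event (death, rescue surgery, and so on) are assigned ranks below every observed outcome, and a rank-based comparison is then applied to the resulting scores. This is an illustrative sketch under our own naming assumptions, not a prescribed analysis.

```python
# Sketch: worst-rank scoring of a continuous outcome.
# Subjects with an intercurrent event are ranked worse than every
# observed outcome; ties among event subjects could be broken by
# time to the event (not shown). Illustrative assumptions throughout.

def worst_rank_scores(outcomes):
    """Assign ranks (1 = worst) to a list of (value, had_event) pairs.

    Subjects with `had_event=True` receive the worst ranks; the rest
    are ranked by outcome value (higher value = better = higher rank).
    """
    event_idx = [i for i, (_, e) in enumerate(outcomes) if e]
    observed = sorted(
        (i for i, (_, e) in enumerate(outcomes) if not e),
        key=lambda i: outcomes[i][0],
    )
    ranks = {}
    rank = 1
    for i in event_idx:        # worst ranks go to event subjects
        ranks[i] = rank
        rank += 1
    for i in observed:         # then rank the rest by outcome value
        ranks[i] = rank
        rank += 1
    return [ranks[i] for i in range(len(outcomes))]

# (improvement score, had intercurrent event?); score is missing
# for the subject who required rescue, but the rank is still defined.
data = [(5.0, False), (None, True), (2.0, False), (8.0, False)]
print(worst_rank_scores(data))  # [3, 1, 2, 4]
```

The scores could then feed a standard rank-based test; the point of the construction is that every randomized participant has a defined outcome, so the composite introduces no missing values.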
Use of Randomized Withdrawal to Determine Long-Term Efficacy As noted above, trials with long-term follow-up may be more prone to missing data. In selected situations, this problem can be minimized by using randomized withdrawal designs. In such a design, all participants are initially treated with the intervention under study for a sufficiently long period to address the question of long-term efficacy. Only those who remain on and appear to have responded to therapy are then randomized for withdrawal or continuation. In cases in which loss of efficacy after withdrawal can be taken as evidence of drug efficacy, such a trial can generate long-term efficacy data.
Recommendation 2: Investigators, sponsors, and regulators should design clinical trials consistent with the goal of maximizing the number of participants who are maintained on the protocol-specified intervention until the outcome data are collected.
CONTINUING DATA COLLECTION FOR DROPOUTS
Even with careful attention to limiting missing data in the trial design, it is quite likely that some participants will not follow the protocol until the outcome data are collected. An important question is then what data to collect for participants who stop the assigned treatment. Sponsors and investigators may believe that the participants are no longer relevant to the study and so be reluctant to incur the costs of continued data collection. Yet continued data collection may inform statistical methods based on assumptions concerning the outcomes that participants might have had if they continued treatment. Continued data collection also allows exploration of whether the assigned therapy affects the efficacy of subsequent therapies (e.g., by improving the degree of tolerance to the treatment through exposure to a similar treatment, i.e., cross-resistance).
The correct decision on continued data collection depends on the selected estimand and study design. For example, if the primary estimand does not require the collection of the outcome after participants discontinue assigned treatment, as with estimand (4) above (area under the outcome curve during tolerated treatment), then the benefits of collecting additional outcome data after the primary outcome is reached need to be weighed against the costs and potential drawbacks of the collection.
An additional advantage of data collection after subjects have switched to other treatments (or otherwise violated the protocol) is the ability to monitor side effects that occur after discontinuation of treatment. Although the cause of such side effects may be unclear (e.g., if a subject switches to another treatment), these data, when combined with long-term follow-up of other subjects in high-quality epidemiological studies, may help to determine treatment-associated risks that are not immediately apparent. We are convinced that in the large majority of settings, as has been argued by Lavori (1992) and Rubin (1992), the benefits of collecting outcomes after subjects have discontinued treatment outweigh the costs.
Recommendation 3: Trial sponsors should continue to collect information on key outcomes on participants who discontinue their protocol-specified intervention in the course of the study, except in those cases for which a compelling cost-benefit analysis argues otherwise, and this information should be recorded and used in the analysis.
Recommendation 4: The trial design team should consider whether participants who discontinue the protocol intervention should have access to and be encouraged to use specific alternative treatments. Such treatments should be specified in the study protocol.
Recommendation 5: Data collection and information about all relevant treatments and key covariates should be recorded for all initial study participants, whether or not participants received the intervention specified in the protocol.
REFLECTING LOSS OF POWER FROM MISSING DATA
An important and relatively neglected issue in the design of clinical trials is how to account for the loss of power from missing data. (An additional impact of missing data is that the true significance level of the test of the treatment effect could be larger than the specified level.) Currently, if any accommodation is made, it is simply to inflate the sample size that was initially planned to achieve a stated power by the inverse of one minus the anticipated dropout rate, as determined from other recent trials for similar interventions. If the dropouts provide no information about the treatment effect (which would not be the case for situations in which an interim outcome measure was collected prior to participants’ dropping out) and the data from dropouts are missing completely at random, then this approach is reasonable. However, in practice, dropouts may provide partial information about the treatment effect: that is, effects (or lack of effects) of the intervention often play a role in the decision to drop out. The missing completely at random assumption is generally too optimistic; therefore, power calculations should be based on more realistic missing at random or missing not at random assumptions. Under such assumptions, the effects of missing data on power cannot be easily assessed analytically: relatively involved simulation studies would be needed. This is rarely done and is an area for research.5
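The standard inflation described above is simply n / (1 - d) for anticipated dropout rate d. A minimal sketch follows; the per-arm formula is the usual two-sample normal approximation, and the effect size, standard deviation, and dropout rate are illustrative assumptions.

```python
# Sketch: inflating a planned sample size for anticipated dropout.
# Valid only under the optimistic assumptions noted in the text:
# dropouts carry no information and are missing completely at random.
import math

def per_arm_n(effect, sd):
    """Per-arm n for a two-sided two-sample z-test,
    alpha = 0.05, power = 0.80 (normal approximation)."""
    z_alpha = 1.959964  # z for alpha/2 = 0.025
    z_beta = 0.841621   # z for power = 0.80
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)

def inflate_for_dropout(n, dropout_rate):
    """Inflate n by 1 / (1 - anticipated dropout rate)."""
    return math.ceil(n / (1.0 - dropout_rate))

n = per_arm_n(effect=5.0, sd=12.0)        # hypothetical mmHg effect and SD
print(n, inflate_for_dropout(n, 0.20))    # 91 114
```

Under missing at random or missing not at random mechanisms, no such closed-form adjustment is available, which is why the text points to simulation studies.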
It should be added that the most worrisome effect of missing values on the inference for clinical trials is not the reduction of power, though that can be problematic, but biased estimation of the treatment effect. The bias from an unaccounted association between the indicator of missing values and the outcome of interest is not addressed by simply inflating the sample size. In particular, if the potential bias from missing data is similar in size to the anticipated size of the treatment effect, then detection of this effect is unlikely, regardless of the sample size chosen for the study. If some preliminary estimate of the potential nonresponse bias can be obtained,
perhaps from a sensitivity analysis of the kind described in Chapter 5 applied to a related prior study, a simple strategy is to reduce the anticipated effect size by the anticipated size of the nonresponse bias and then power the study for this reduced effect size. If the adjusted effect size is too small to detect, that would be a strong incentive to design the study to reduce the degree of missingness.
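The simple strategy just described, reducing the anticipated effect size by the anticipated nonresponse bias before powering, can be sketched as follows; the sample-size formula is the standard two-sample normal approximation, and all numbers are assumed for illustration.

```python
# Sketch: powering a trial for an effect size reduced by the
# anticipated nonresponse bias. All numbers are illustrative.
import math

def per_arm_n(effect, sd):
    """Per-arm n, two-sided alpha = 0.05, power = 0.80 (normal approx.)."""
    return math.ceil(2 * ((1.959964 + 0.841621) * sd / effect) ** 2)

def bias_adjusted_n(effect, anticipated_bias, sd):
    """Power the study for (effect - bias) instead of the nominal effect."""
    adjusted = effect - anticipated_bias
    if adjusted <= 0:
        raise ValueError(
            "Anticipated bias swamps the effect; redesign to reduce "
            "missingness rather than enlarging the sample."
        )
    return per_arm_n(adjusted, sd)

print(per_arm_n(5.0, 12.0), bias_adjusted_n(5.0, 1.0, 12.0))  # 91 142
```

As the text notes, when the adjusted effect approaches zero no sample size rescues the design; the error branch above reflects that limit.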
DESIGN ISSUES IN THE CASE STUDIES
We now return to the three case studies introduced in Chapter 1. These examples (chronic pain management, HIV, and mechanical devices for hearts) were used to illustrate how missing data arise in clinical trials. They are used in this section to illustrate how the design recommendations in this chapter can be carried out in these situations.
Trials for Chronic Pain
Clinical trials for assessing interventions to relieve chronic pain are often subject to high rates of missing data because of inadequate efficacy and participants’ inability to tolerate treatment. Participants who discontinue study treatment usually switch to a proven (approved) effective therapy, and it is typical for investigators to stop collecting pain response data on these individuals. The last-observation-carried-forward approach is often used to impute missing outcome values.
Selection of (Causal) Estimand
As specified in Recommendation 1, a critical first step is to determine an appropriate estimand. Potential choices include
(a) (difference in) pain relief in all participants (e.g., degree of pain relief at 12 weeks [or more] in all patients in whom the treatment intervention is initiated [regardless of what is received throughout the course of the trial] [estimand 1, above]);
(b) (difference in) pain relief in tolerators (e.g., degree of pain relief in patients who tolerate and choose to receive 12 weeks of therapy [estimand 2, above]); and
(c) (difference in) treatment success rate (e.g., proportion of patients who can tolerate therapy, remain in study, and achieve adequate pain relief over 12 weeks [estimand 5, above]).
Option (a) addresses the anticipated outcomes in all patients who are randomized. Patients are managed according to a policy that is outlined in the protocol and that reflects current practice. If the treatment policy reflects common practice in the clinical setting, this estimand may predict actual clinical outcomes. However, if many subjects receive effective alternative therapies, this estimand may shed only limited light on whether the study treatment itself is effective.
Option (b) addresses a key regulatory question, long-term efficacy in patients who will take the drug, but it fails to address other key questions, especially how well and how often the drug is tolerated and its efficacy in the total population receiving it, including those who do not take it for 12 weeks.
Option (c) addresses an important regulatory question and avoids missing data by defining a composite primary outcome. However, classifying all patients as either a treatment success or not may ignore important information, such as the extent of success or cause of failure. Also, counting patients who cannot tolerate therapy as failures may strongly weigh against drugs that are excellent in patients who tolerate them, even if there are significant subsets of patients who cannot tolerate them.
For example, in recent trials of trimethoprim sulfa (TS) against pentamidine for treatment of pneumocystis pneumonia in HIV-infected subjects, those who could not tolerate TS were typically switched to pentamidine. In evaluating TS, these treatment failures were “charged” to TS according to a traditional intent-to-treat analysis, ignoring the fact that almost all the TS failures were due to intolerance of the drug rather than lack of efficacy, an important finding obscured by the composite outcome measure used. (For details, see Schneider et al., 1992.)
Suggested Study Designs Paired with Estimands
For estimand (b) (pain relief in tolerators), two study designs limit missing data. The first is a randomized withdrawal design: patients are treated with the test treatment open-label for 12 weeks, and those who tolerate and have an adequate response to the treatment are randomized either to continue it or to withdraw (e.g., to switch to placebo) and are followed for some time. The second uses an active control run-in period followed by a placebo washout, and then randomizes those patients who tolerated the active control and obtained relief. In this case, the outcome is pain control at 12 weeks.
These designs may limit missing data problems, but as previously noted they raise other issues, such as the inability to address safety and efficacy in all comers. However, it may well be best to address those questions in separate clinical trials rather than try to address all questions in one trial.
For estimand (c) (treatment success rate), one can include in the composite outcome patients who cannot tolerate therapy along with those who
have inadequate pain relief. In this approach, there will be minimal missing data for the primary outcome.
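To make the composite outcome concrete, the following is a minimal sketch in Python. The patient records, field names, and classification rule here are invented for illustration; the point is only that every randomized patient receives a definite success-or-failure outcome, so the primary analysis has no missing data, while the reasons for failure are collapsed together:

```python
# Hypothetical illustration of the composite "treatment success" outcome for
# estimand (c): a patient counts as a success only if he or she tolerated
# therapy, remained in the study, AND achieved adequate pain relief at 12 weeks.
# All field names and records below are invented for this sketch.

def is_success(patient):
    """Classify one randomized patient under the composite outcome."""
    return (patient["tolerated"]
            and patient["completed_12_weeks"]
            and patient["adequate_pain_relief"])

patients = [
    {"id": 1, "tolerated": True,  "completed_12_weeks": True,  "adequate_pain_relief": True},
    {"id": 2, "tolerated": False, "completed_12_weeks": False, "adequate_pain_relief": False},
    {"id": 3, "tolerated": True,  "completed_12_weeks": True,  "adequate_pain_relief": False},
    {"id": 4, "tolerated": True,  "completed_12_weeks": False, "adequate_pain_relief": False},
]

successes = sum(is_success(p) for p in patients)
rate = successes / len(patients)
print(f"treatment success rate: {successes}/{len(patients)} = {rate:.2f}")
# Every randomized patient contributes an outcome, so no primary-outcome data
# are missing -- but the composite does not distinguish intolerance
# (patient 2) from lack of efficacy (patient 3).
```

Note the trade-off the text describes: the composite eliminates missing primary-outcome data but discards the distinction between failure due to intolerance and failure due to inadequate relief.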
Finally, for the more traditional estimand (a) (pain relief in all comers), there are some design alternatives that may help reduce the number of dropouts. For example, allowing dose modification will likely reduce dropouts in the treatment group both from inability to tolerate, since participants may tolerate a lower dose, and from inadequate response, since participants may respond adequately to a higher dose. Although such dose adjustments reflect common clinical practice, they are sometimes avoided in the regulatory setting because of requirements that are best addressed with fixed dosing regimens. (For example, if it is required to determine the minimal dose that provides a significant response, dose adjustments may be inappropriate.) In such a case, it may nonetheless be best to allow dose adjustment to minimize missing data in trials that estimate long-term pain relief in all comers and to address fixed-dosing questions in a separate trial or trials. Finally, for this estimand, it is particularly important to continue collecting data through 12 weeks in all patients, including those who choose to switch therapies, and to use these data in the analysis.
Trials for Treatment of HIV
In many HIV trials, a noninferiority design is used to test whether a new drug is at least as safe and efficacious as the current standard of treatment. Since combination treatment is the norm for HIV, the typical design in this setting is new drug A plus background treatment compared with current drug B plus the same background treatment. A common primary outcome is often called time to loss of virologic response, but it is in fact a composite measure that includes the following components: (1) death (and sometimes progression to an AIDS event), (2) discontinuation of study drug before 48 weeks, (3) loss to follow-up, and (4) HIV RNA level of greater than or equal to 50 copies/mL at or prior to 48 weeks on study drug.6

6. This composite outcome may be an example of the hazard mentioned in Chapter 1: if outcome measures are selected partially to reduce the frequency of missing data, they may also compromise the clinical value of the resulting inference.

Suggested Estimands and Study Designs

Again, we emphasize the need to start by determining the causal estimand. Possible choices include the following:

virologic response in all participants (e.g., the percentage with an HIV RNA level of less than 50 copies/mL after 48 weeks in all participants randomized)—a “true” virologic outcome used for a comparison of treatment policies that ignores whether study treatment is discontinued;
virologic response in tolerators (e.g., the percentage with an HIV RNA level of less than 50 copies/mL among participants who were able to tolerate the treatment for 48 weeks; note that switches in antiretroviral therapy [ART] due to lack of efficacy need to be differentiated from switches due to side effects or lack of tolerability); and
treatment success rate (e.g., the proportion of all randomized participants who stay on assigned treatment, remain in the study, and achieve an HIV RNA level of less than 50 copies/mL at 48 weeks), which is often the estimand in current practice.
Estimand (a) (virologic response in all participants) addresses anticipated outcomes in all participants who started on the therapy in question and were managed according to standard practice. This approach addresses a question about a specific efficacy outcome, and it compares two treatment policies (e.g., starting with a regimen using drug A with background treatment versus starting with drug B with the same background treatment). This outcome is not often used in a regulatory setting because of concerns that estimation of the differences between drug A and drug B could be affected if more participants in one treatment group than the other were switched to a virologically more potent regimen before 48 weeks.
Estimand (b) (virologic response in tolerators) addresses one key regulatory question, which is the efficacy in participants who will take the drug. However, it fails to address other key questions, for example, efficacy of the drug in the total population receiving it, including those who do not take it for 48 weeks. A run-in period is usually not practical because of concerns about HIV drug resistance. An analysis that excludes those who do not tolerate the study treatments may lead to biased estimates of treatment efficacy.
Estimand (c) (treatment success rate at 48 weeks) addresses an important regulatory question and avoids missing data through the use of a composite outcome. However, use of the composite outcome may mask important treatment differences, and in some circumstances may result in misleading results. For example, outcomes labeled as virologic failures may in fact reflect toxicity or losses to follow-up. Furthermore, counting participants who cannot tolerate therapy as failures may overly weigh against drugs that have excellent virologic efficacy in patients who do tolerate them, even if there are significant subsets of patients who cannot tolerate them.
Study Designs That Minimize the Extent and Impact of Missing Data
The use of virologic response in all comers and the composite treatment success outcome each have advantages and disadvantages. In the regulatory setting, the latter is currently recommended. It avoids missing data by considering participants with missing data to be treatment failures. However, this choice of outcome gives equal weight to missing data, deaths, intolerance, and lack of virologic efficacy, creating difficulties in interpretation. If such an outcome is used, there may be advantages to continuing to collect data after treatment discontinuation to the end of follow-up. These data may permit assessment of the consequences of treatment failure before 48 weeks due to intolerability or lack of virologic efficacy (e.g., the development of HIV drug resistance associated with virologic failure). Continued follow-up also allows a separate assessment of each component of the composite outcome at or before 48 weeks (e.g., summaries of the numbers assigned to each treatment who failed virologically). Treatment policies, as in estimand (a), above, can also be compared.
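A component-wise summary of the kind just described can be sketched briefly in Python. The records and component labels below are hypothetical; the sketch only shows how continued follow-up lets each component of the composite be tabulated separately by arm instead of being lumped into a single failure count:

```python
from collections import Counter

# Hypothetical per-participant outcomes; "component" records which element of
# the composite outcome applied to each participant (labels invented here).
records = [
    {"arm": "A", "component": "virologic_failure"},
    {"arm": "A", "component": "discontinued_drug"},
    {"arm": "A", "component": "success"},
    {"arm": "B", "component": "lost_to_followup"},
    {"arm": "B", "component": "success"},
    {"arm": "B", "component": "success"},
]

def component_summary(records, arm):
    """Count each composite-outcome component within one treatment arm."""
    return Counter(r["component"] for r in records if r["arm"] == arm)

for arm in ("A", "B"):
    print(arm, dict(component_summary(records, arm)))
# Separating the components shows, for example, whether one arm's "failures"
# are mostly virologic versus mostly drop-out -- a distinction that a single
# composite failure count obscures.
```

Such a tabulation supports the interpretive point in the text: an arm whose failures are mostly losses to follow-up is a very different result from an arm whose failures are mostly virologic, even if the composite failure rates are equal.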
Trials for Mechanical Circulatory Devices for Severe Symptomatic Heart Failure
Device trials in patients with severe symptomatic heart failure have high rates of missing data for measures of functional status and health-related quality of life. The missing data arise because of deaths (some of which may be associated with the implantation procedure), failure to attend study follow-up examinations, and inability or unwillingness to perform functional tests or complete self-administered questionnaires. Unlike ascertainment of hospitalization events, measures of health status that include symptoms, functional status tests, and quality-of-life assessments require that patients be seen or complete a questionnaire.
In a trial of a left ventricular assist device (LVAD) as destination therapy for patients with severe symptomatic heart failure, it is critical to assess whether the device improves health status as well as survival. The ultimate approval and use of the device will depend on both outcomes. We consider possible estimands and study designs for assessing health status in a comparison of an LVAD with optimal medical management over a 2-year follow-up period. We assume that in such studies four outcomes are of interest: death; disabling stroke; criteria met for implanting an LVAD (e.g., based on previous trials in patients ineligible for transplant); and a self-administered quality-of-life assessment using a standard instrument (alternatively, or in addition, a functional measure, such as the 6-minute walk test, could be used). Some trials may also incorporate device removal or replacement in a primary composite outcome. Considerations in measuring and assessing health status in such trials have been summarized by a working group of the Heart Failure Society of America (see, e.g., Normand, 2005).
Suggested Estimands and Study Designs
Three potential choices for an estimand for evaluating health status include
(a) the difference in quality of life between treatment groups for all randomized patients,
(b) the difference in quality of life among survivors, and
(c) the area under the quality-of-life curve while alive.
For estimand (a) (difference in quality of life for all randomized patients), the quality-of-life comparison could be performed earlier than 2 years (e.g., at 6 months) to maximize the number of patients in each treatment group still under follow-up. Alternatively, patients who die or who are unable to complete the questionnaire for health reasons could be given a “worst rank” score. The latter strategy would likely reduce the power of the resulting test statistic, because some deaths would be expected to be unrelated to the treatments.
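The worst-rank idea can be sketched as follows. This is a minimal illustration in Python with hypothetical quality-of-life scores: patients who died before assessment are assigned a common value below every observed score, so they tie at the bottom when midranks are computed:

```python
def worst_rank_scores(qol, died):
    """Assign midranks to quality-of-life scores, giving patients who died
    (or could not complete the questionnaire for health reasons) a common
    value below every observed score -- the "worst rank" strategy.
    qol: list of observed scores (None where unobserved); died: parallel flags."""
    floor = min(s for s in qol if s is not None) - 1   # below all observed scores
    values = [floor if d else s for s, d in zip(qol, died)]
    # Compute midranks: tied values share the average of their ranks.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for k in order[i:j + 1]:
            ranks[k] = mid
        i = j + 1
    return ranks

# Hypothetical data: QoL scores (None where the patient died before assessment).
qol  = [70, None, 55, None, 80, 60]
died = [False, True, False, True, False, False]
print(worst_rank_scores(qol, died))   # the two deaths share the lowest midrank
```

The resulting ranks could then feed a rank-based comparison between treatment groups. As the text notes, deaths unrelated to treatment add noise at the bottom of the ranking, which is why this strategy can cost power.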
Estimand (b) (difference in quality of life among survivors) addresses the effect of the LVAD on an outcome that complements a composite outcome of death, disabling stroke, or progression to specified criteria indicating that an LVAD should be implanted. However, a complete-case analysis or mixed-model approach to the analysis of the quality-of-life data may not be appropriate, as it is unlikely that the data are missing at random. Thus, the pattern of missing data should be considered, and other methods for modeling the missing data (e.g., pattern mixture models) should be used.
Estimand (c) (area under the quality-of-life curve while alive) has the advantage of simultaneously evaluating the LVAD for quality of life and duration of survival (or an expanded event-free outcome).
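One way to picture this estimand is as the trapezoidal area over scheduled assessments, truncated at death so that time not alive contributes nothing. The following is a minimal sketch in Python; the visit times, scores, scale, and linear interpolation at death are all assumptions made for illustration:

```python
def qol_auc_while_alive(times, scores, death_time):
    """Trapezoidal area under the quality-of-life curve, truncated at death.
    Assumes assessments at increasing times and linear interpolation of the
    score up to the time of death; QoL contributes nothing after death_time.
    (Times, scale, and interpolation are conventions assumed for this sketch.)"""
    auc = 0.0
    for (t0, s0), (t1, s1) in zip(zip(times, scores), zip(times[1:], scores[1:])):
        if t0 >= death_time:
            break
        if t1 <= death_time:
            auc += (s0 + s1) / 2 * (t1 - t0)       # full segment while alive
        else:
            # interpolate the score at the moment of death, then stop
            s_d = s0 + (s1 - s0) * (death_time - t0) / (t1 - t0)
            auc += (s0 + s_d) / 2 * (death_time - t0)
            break
    return auc

# Hypothetical patient assessed at 0, 6, 12, 18, 24 months who dies at month 15.
times  = [0, 6, 12, 18, 24]
scores = [50, 60, 70, 70, 70]
print(qol_auc_while_alive(times, scores, death_time=15))
```

By construction, a patient who dies early accrues less area even at identical observed scores, which is how this estimand combines quality of life and survival in a single quantity.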
Irrespective of which estimand is used, it will be important in such trials to assess health status as objectively as possible. Since such trials cannot be blinded to patients or to the investigators caring for them, the use of independent, trained evaluators (i.e., evaluators not involved in the care of the patient), who could remain blinded to the treatment assignment, should be considered. In addition, assessment by both patients’ self-reports and clinicians’ evaluations should be considered. Prior to the initiation of a study, the importance of complete ascertainment of the health status measurements should be emphasized to patients in the informed consent process and to clinicians during protocol training. Also, plans should be developed for visiting patients to obtain the health status assessments if they cannot attend clinic examinations.