9
Interim Monitoring and Analysis of Results

Randomized clinical trials often take several years to complete subject enrollment, or accrual, and follow-up. That means that information about the risks and benefits of the intervention becomes available during the trial, sometimes from the trial itself and sometimes from external sources such as other trials.

This information provides a scientific basis for monitoring the interim results of the trial—and indeed the ethical necessity to do so—to assess whether the trial should be modified in some way, or possibly terminated, given those results. During interim reviews of the trial, as well as after it has been completed, investigators must analyze the results in valid ways that reflect the trial’s design and protocol. This chapter explores the challenges entailed in performing interim monitoring and analyzing the results of HIV prevention trials.1

ENSURING EFFECTIVE INTERIM MONITORING

The evolving interim results of phase 3 and some phase 2 randomized trials are typically monitored by a data monitoring committee (DMC) (also known as a data and safety monitoring board, or a data monitoring board). Such a committee is composed of independent experts appointed by the study investigators or sponsor to ensure that the best interests of study participants are met during the trial (Ellenberg et al., 2002).

1 For more on reporting trial results, see the CONSORT guidelines at http://www.consort-statement.org/index.aspx?o=1030.



For example, the DMC monitors whether randomly assigning participants to intervention and control groups is still ethical given interim results, and whether subjects who are already enrolled should continue to receive their assigned interventions. The DMC also tracks a study’s evolving results to determine whether the trial still has the potential to achieve its scientific goals. The DMC may also recommend modifications to the trial design based on interim results, including changing the target period for enrolling or following subjects, or modifying the criteria for enrolling subjects.

Though not the focus of this chapter, another function of the DMC is to evaluate the quality of the study conduct. In particular, the DMC usually reviews investigators’ compliance with data management and operating procedures. For example, the DMC may monitor the accuracy and completeness of the data collected, the trial’s compliance with restrictions on the eligibility of some potential participants, the adequacy of their accrual rates, and the trial’s adherence to drug distribution policies. If it detects problems, the DMC may suggest changes to procedures (Ellenberg et al., 2002).

After reviewing a trial’s interim results, a DMC could recommend terminating the trial for a number of reasons, including the following:

• The intervention and control arms are convincingly different (that is, the intervention is efficacious), or, in the case of a noninferiority trial, the study arms are convincingly similar.
• One or more of the study arms produces unacceptable side effects or toxicity.
• Accrual of participants is so slow that completion of the trial in a reasonable time period is no longer feasible.
• Information from other studies with related goals and similar intervention arms makes continuation of the trial unnecessary or unethical.
This section reviews key aspects of interim monitoring of randomized HIV prevention trials, including the composition of DMCs and the typical format of their meetings, the importance of access to complete information, challenges in monitoring trial assumptions, safety, efficacy, and futility, and the use of information from sources external to the trial.

DMC Composition and Meetings

The DMC for an HIV prevention trial typically includes statisticians and clinicians and often other scientists such as a virologist or someone with expertise in a key diagnostic test, an ethicist, and a lay participant, all appointed by the study’s investigators or sponsors (Ellenberg et al., 2002; Fleming et al., 2002). Because of the central role of behavior in biomedical HIV prevention studies, their DMCs should usually also include an individual with expertise in behavioral or social sciences.

HIV prevention trials are often designed and sponsored by investigators and organizations based outside the countries where the trials occur, such as pharmaceutical companies, governments, or nonprofit foundations. If that is the case, including representatives of local communities on the DMC is critical. For example, in the late 1990s, two mother-to-child HIV prevention trials were undertaken in Thailand, supported by the U.S. government and designed primarily by non-Thai scientists (Shaffer et al., 1999; Lallemant et al., 2000). Although both trials demonstrated significant declines in mother-to-child transmission, the trial that compared a shortened AZT (antiretroviral) regimen to no treatment had a DMC with minimal representation from the host country. That fact helped spark considerable ethical debate about the use of a placebo group when the efficacy of AZT had been established elsewhere (Angell, 1997).

Recommendation 9-1: The data monitoring committees of trials with sponsors and scientific leaders from outside the host country should include multiple representatives from the host country. These members, who should compose at least one-third of the committee, should include scientists, ethicists, and lay people familiar with the community and local norms.

DMC Meetings

A key consideration for DMCs is how often they should meet. A trial’s protocol for interim monitoring should include guidelines for determining the frequency of meetings, typically expressed as a measure of information, such as the number of observed HIV infections. For example, a trial design might call for an interim efficacy analysis when 25, 50, and 75 percent of the anticipated number of HIV infections in the control group have occurred.
The protocol should also specify how the trial should “spend” the overall type I error (say, 5 percent) among its interim and final analyses. (See more on this below, and, for example, Ellenberg et al., 2002.)

As noted, DMCs also meet to monitor participant accrual, HIV incidence rates, attrition, and adherence, and to assess safety, sometimes while also assessing efficacy. It is common, and advisable, to require that a DMC meet at least once a year to perform such monitoring.

The DMC typically holds open sessions at which it discusses the progress of a trial with key investigators, including the sponsors. This information, usually presented in an “open” report, may include the rates at which the trial is enrolling subjects, their baseline characteristics, the completeness of the data that the trial is collecting, and its ability to retain subjects, all aggregated across study arms. The committee also reviews a “closed” report, typically prepared by the study’s statistician, in a closed session, usually attended only by DMC members and the study statistician. This report usually includes summaries of safety, efficacy, adherence, and attrition by blinded study arm.

Most DMCs have the authority to unblind the study arms (that is, to find out which arm the data are from) if they feel that doing so is important to determining whether to modify or continue a trial. For example, a nonsignificant trend in trial results favoring the control arm could convince the DMC to recommend ending a trial, but a similar trend favoring the experimental arm would typically convince the committee to continue the trial. If the former situation potentially exists, DMC members should unblind themselves, to determine whether the trial should end on the grounds that the experimental arm is not helping subjects as much as the control arm.

In some instances, DMCs have operated under criteria that members will remain blinded unless interim analyses comparing efficacy among a trial’s arms demonstrate a significant result. For example, Van Damme and colleagues (2002) report that the DMC for the N-9 microbicide trial had planned to remain blinded unless results from the study arms became significantly different at the P = 0.001 level. The committee fails to see the rationale behind such criteria in trials comparing a new intervention to a control group because the threshold for stopping a trial due to a higher risk of HIV infection in the intervention group should be lower than the threshold for stopping the trial because of a lower risk of infection in the intervention group.
For example, a nonsignificant trend suggesting increased HIV risk in the intervention group would usually mean that there is a real concern that the participants are being harmed and also that the trial would be unlikely to demonstrate a significantly lower HIV infection risk in the intervention group if the trial were completed as planned. This was the motivation for the recent termination of the Merck STEP trial (http://www.avac.org/pdf/STEP_data_release.7Nov.pdf). The committee believes that DMCs should always have the option of unblinding study arms, if they believe that doing so is in the best interests of the participants.

Recommendation 9-2: The data monitoring committees for HIV prevention trials should always have the option of unblinding interim results if they believe that doing so might lead them to recommend that the trial be modified or terminated, or lead to other actions that are in the best interests of the trial participants. In particular, when the efficacy data show nonsignificant trends favoring one of the blinded arms, a DMC should unblind itself as this might reflect an intervention that may be harming patients.

Deciding Whether a Trial Remains Feasible

In most randomized clinical trials, the DMC monitors the assumptions used to determine a trial’s sample size and planned duration to ensure that it remains feasible. Ideally, a charter prepared prior to the start of a trial details the tasks the DMC will perform and the criteria it will use. In HIV prevention trials, these assumptions include the following:

• Assumed versus actual rates of subject accrual, and the demographics of enrolled subjects
• Assumed versus actual HIV infection rates
• Assumed versus actual adherence of subjects to study interventions
• Assumed versus actual retention of subjects, including rates of loss to follow-up, and missing data
• Assumed versus actual rates of pregnancy and other reasons for discontinuing the product

Enough information is usually available during a trial to estimate participant accrual, adherence, behavior, and retention rates precisely. (An important consideration is whether adherence and behavior can be measured in an unbiased fashion; see Chapter 5.) However, HIV incidence rates are typically so low that the DMC may have trouble obtaining sufficiently precise estimates to determine whether the incidence rate used to determine the sample size and study duration is accurate. And as Chapter 2 noted, an overly optimistic estimate of HIV incidence in the control group could mean that a trial is underpowered, and thus that it is unable to achieve its goals.

For example, if a study has assumed that the annual HIV incidence rate in the control arm will be 4 percent, the width of the 95 percent confidence interval (CI) for the rate estimated from trial results is about 0.8/sqrt(n × f), where n denotes the number of subjects on which the estimate is based, and f denotes their average follow-up time.
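The 0.8/sqrt(n × f) approximation can be reproduced with a short calculation. The sketch below is illustrative only. It assumes that the number of infections follows a Poisson distribution with mean rate × n × f and that a normal approximation to the estimated rate is adequate; neither assumption is stated in the text, and the function name ci_width is hypothetical.

```python
import math


def ci_width(rate, n, f, z=1.96):
    """Approximate full width of a 95 percent CI for an incidence rate.

    Assumes (our assumption, not the chapter's) that infections are
    Poisson with mean rate * n * f, so the estimated rate has standard
    error sqrt(rate / (n * f)).
    """
    person_years = n * f
    standard_error = math.sqrt(rate / person_years)
    return 2 * z * standard_error


# With an assumed annual rate of 4 percent the width reduces to
# 2 * 1.96 * sqrt(0.04) / sqrt(n * f) = 0.784 / sqrt(n * f).
for n in (250, 1000, 4000):
    print(n, round(ci_width(0.04, n, f=1.0), 4))
```

Under these assumptions the width shrinks with the square root of the person-years of follow-up, so quadrupling n × f only halves the interval.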
Thus, if investigators conduct an interim analysis after enrolling 500 subjects (250 per arm), with an average follow-up time of 1 year per subject, the width of the 95 percent CI for the true incidence rate in the control arm is about 5 percentage points (0.8/sqrt(250 × 1) ≈ 0.05). That is, an observed HIV incidence rate of 3 percent would still be consistent with the assumed rate of 4 percent used to power the study, yet it would also be consistent with a rate that would indicate insufficient power. This underscores the need to adequately justify a study’s assumed HIV incidence rate, and to be conservative in using it to determine the sample size and duration of follow-up of enrolled subjects.

Many randomized trials either do not provide guidelines and criteria that the DMC will use to recommend modifying the sample size or duration of follow-up, or do so only in vague terms. Such modifications would not change the statistical validity of a trial if they were not based on comparative analyses of the interim data. For example, a DMC could recommend increasing a trial’s sample size based on the HIV incidence rate in a placebo group or across study arms. Such recommendations should be based on specific criteria set forth in the protocol, such as the pooled HIV incidence rate versus the incidence rate in the control group. However, a recommendation to continue accruing subjects because of “interesting trends” in HIV infections across study arms could be problematic, as this will tend to inflate the false positive rate (type I error) in standard analyses of the results.

Recommendation 9-3: Investigators should clearly describe in the study protocol the basis and criteria for any recommendation by the data monitoring committee to modify a trial’s size or duration. If such changes are implemented, the protocol should also specify how investigators should evaluate the trial results.

Monitoring for Safety, Efficacy, and Futility

To determine whether to stop or modify a trial based on its interim results, the DMC monitors emerging data on the safety and efficacy of a study’s interventions. In HIV trials, this information includes

• safety data,
• differences in HIV infection rates between study arms, and
• differences in other measures of efficacy between study arms.

Trials usually include more structured rules for modifying or stopping them in two instances: when they demonstrate the efficacy or noninferiority of a new intervention, and when they demonstrate its futility.
The criteria for terminating or modifying a trial may also include unexpected side effects.

Safety

The side effects of interventions could be minor (such as rash or soreness) or more serious (greater susceptibility to other infections). Side effects of products used in HIV prevention studies could also include behavioral changes. For example, participants could be more likely to engage in risky sexual behavior, known as disinhibition or risk compensation (Cassell et al., 2006), if they believe the product being tested provides partial or complete protection against HIV infection.

Demonstrating Benefit

Conducting multiple statistical tests comparing a new and control intervention increases the rate of a false positive result (type I error) (Pocock, 1974). Because a DMC usually reviews a study’s interim results on several occasions, statistical analyses need to account for this inflated risk (Turnbull, 2006). That is, the criteria for achieving statistical significance at each interim analysis must be chosen to cap the overall chance of a false positive at some predefined level, typically 5 percent (or 0.05).

Multiple ways of “spending” this type I error among interim analyses are available. Pocock suggested using the same criteria for each analysis, selected to give the desired overall type I error (see, for example, O’Brien and Fleming, 1979). However, most trials employ a more conservative rule, such as the O’Brien-Fleming spending function (O’Brien and Fleming, 1979), which requires early analyses to reach higher thresholds (that is, smaller P values) for statistical significance, and allows the final analysis to reach a lower threshold.

For example, for a trial with three interim analyses and one final analysis, investigators could achieve an overall type I error rate of 5 percent by using a Pocock spending function requiring a P value of 0.016 or less at each analysis. Or investigators could achieve that error rate by using an O’Brien-Fleming approach requiring P values of 0.000005, 0.013, and 0.0228 at the first, second, and third interim analyses, and 0.0417 at the final analysis.
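The inflation of type I error from repeated looks at the data can be illustrated with a small simulation. The sketch below is not taken from any trial protocol. It assumes four equally spaced analyses, a standard normal test statistic built from independent increments under the null hypothesis, and an unadjusted two-sided 5 percent criterion (|Z| > 1.96) at every look. The function name, the number of simulated trials, and the random seed are all illustrative choices.

```python
import math
import random


def overall_false_positive_rate(n_looks, z_threshold, n_trials=20000, seed=1):
    """Estimate the chance that any of n_looks analyses crosses z_threshold.

    Each simulated trial has no true treatment effect. The cumulative
    statistic after k equally sized stages is the sum of k independent
    standard normal increments divided by sqrt(k).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        cumulative = 0.0
        for k in range(1, n_looks + 1):
            cumulative += rng.gauss(0.0, 1.0)
            z = cumulative / math.sqrt(k)
            if abs(z) > z_threshold:
                rejections += 1
                break  # the trial stops at the first significant look
    return rejections / n_trials


single_look = overall_false_positive_rate(n_looks=1, z_threshold=1.96)
four_looks = overall_false_positive_rate(n_looks=4, z_threshold=1.96)
print(f"one analysis:   {single_look:.3f}")
print(f"four analyses:  {four_looks:.3f}")
```

Under these assumptions the single-analysis rate stays near the nominal 5 percent, while four unadjusted analyses push the overall rate well above it. That gap is the reason the per-analysis criteria have to be tightened, whichever spending approach is chosen.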
Although both approaches would yield an overall type I error rate of 5 percent, the O’Brien-Fleming boundaries are less likely to end a trial early than the Pocock boundaries. If an interim analysis does not prompt termination, the O’Brien-Fleming boundaries also leave a less stringent threshold for demonstrating that the treatment has made a significant difference at the final analysis (0.0417 rather than 0.016 in the example above). (For an example of early stopping of an HIV treatment trial for efficacy, see Hammer et al., 1997.)

In HIV prevention studies, where subjects’ adherence and behavior are important determinants of an intervention’s effect, investigators must also consider whether the intervention sustains that effect. For example, a microbicide that reduces the risk of HIV infection for 6 months (but not thereafter, because users do not adhere to the regimen) is unlikely to have an important impact in controlling the HIV epidemic. Thus terminating an effectiveness trial of such a microbicide based on a short-term effect at an interim analysis (say, after 6 months) may be unwise. However, an efficacy trial designed specifically to assess whether the intervention has some protective ability might well use a 6-month effect on HIV infection as a primary endpoint (see Chapter 2).

One way to attempt to incorporate this consideration into the design of a monitoring plan is to use a conservative spending function, as noted. However, a better approach is to define the endpoint used in an interim analysis to reflect a sustained effect, such as the difference between intervention and control arms in cumulative HIV incidence at 2 years. One recently completed trial of the efficacy of male circumcision in preventing HIV infection used such a criterion (Bailey et al., 2007).

Recommendation 9-4: For effectiveness trials, guidelines for stopping HIV prevention trials based on positive interim results should require evidence of a sustained impact on cumulative HIV incidence.

Demonstrating Futility

Interim analyses may suggest stopping a trial because of futility, that is, because the trial is highly unlikely to show that a new intervention is superior, given current evidence and the added information that would become available if the trial continued.

In an HIV prevention trial, “futility” can refer not only to evidence of a complete lack of benefit in preventing HIV infection, but also to evidence that the protective efficacy is less than some minimal amount (such as 40 percent), or that the intervention does not produce a sustained drop in HIV infection rates. Or, an effectiveness trial might include a stopping rule for futility if the interim data rule out a short-term effect on HIV infection (say, 6 months after randomization). A trial of an intervention that reaches such a futility criterion would typically prompt the study’s investigators to pursue no further testing.

For example, Hall et al.
evaluated the value of intravenous and intrathecal cytarabine for prolonging the survival of HIV-infected people with progressive multifocal leukoencephalopathy (1998). At the time of the second interim analysis, when 57 of the scheduled 90 subjects had enrolled, 14 deaths had occurred in each of the cytarabine arms, as well as in the placebo arm, and cytarabine was associated with significant side effects. The chances that the study would show a significant survival benefit with cytarabine if the trial were completed were exceedingly small, given those results and the fact that only 33 more subjects would be enrolled. Thus the DMC recommended ending the trial for futility.

More recently, in October 2007, the DMC for an HIV vaccine trial recommended terminating the trial based on an interim analysis, concluding that the vaccine could neither prevent HIV infection nor reduce the amount of virus in people who became infected (National Institute of Allergy and Infectious Diseases, 2007). Soon afterward, the DMC for a companion vaccine trial recommended terminating that trial also. The sponsors of the STEP trial have since announced that participants would be notified whether they received placebo or vaccine (see http://www.hvtn.org/media/pr/STEPStudyOC.pdf).

Terminating a trial for futility can prevent the inefficient use of resources. However, there is also an ethical basis for stopping a trial when the likelihood is low that it will achieve a definitive result. In the cytarabine example, early termination avoided exposing study participants to a therapy that appeared unlikely to help them but that does have serious side effects. In the HIV vaccine examples, the possibility that a vaccine might increase the risk of HIV infection could not be excluded based on the interim data, providing an ethical basis for terminating the trial. However, even if a new intervention does not have side effects and does not seem to increase risk, an ethical case could be made that participants would incur an opportunity cost by remaining in a trial, if by doing so they could not seek other options. This underscores the need for a detailed informed-consent process that alerts people to both the risks and benefits of participating in a trial.

Rules that encourage investigators to terminate a study based on a low likelihood that the intervention will show adequate efficacy can play an important role in HIV prevention trials. Designing a phase 3 effectiveness trial with carefully constructed futility criteria could mimic a strategy of following a phase 2B trial with a phase 3 trial only if the 2B results were encouraging. Such a strategy would avoid the ethical problems of pursuing a phase 3 trial after finding promising results in a phase 2B trial.
In some instances, there may be advantages to continuing a trial even when the interim data suggest that an intervention is unlikely to be superior to the control regime. For example, if a trial compares a group that receives a common but unproven intervention with an untreated control group, interim evidence that the intervention is unlikely to produce a better response may not be sufficient grounds for terminating the trial owing to futility, because of the value of showing that the intervention is not very effective.

Using Information from Similar Trials or Other Sources

Information that affects the equipoise between risks and benefits of a trial’s study arms sometimes becomes available from sources outside the trial. For example, after public disclosure of interim results from the Thai PHPT trial on preventing mother-to-child transmission of HIV, the DMC for a Botswana trial recommended terminating one of four study arms that was similar to the terminated arm of the Thai trial (Talawat et al., 2002; Shapiro et al., 2006).

Other examples have occurred with other diseases. For example, a recent meta-analysis (Nissen and Wolski, 2007) that raised questions about the cardiovascular side effects of rosiglitazone in diabetics led the investigators of a randomized trial of the drug’s safety to convene an unscheduled interim analysis (Home et al., 2007). Because of inconsistencies between their results and those of the meta-analysis, the investigators chose to continue their trial.

These examples illustrate that DMCs need to be aware of emerging results from similar trials and other sources. They also suggest that investigators consider including guidelines on whether and how they might use information that becomes available from related trials in interim monitoring.

Dixon and Lagakos (2000) have cautioned against having the DMCs for similar contemporaneous trials share efficacy results, as that would raise serious questions about the appropriate publication of the findings, and detract from the long-standing desire for trials to yield reproducible results. Terminating a trial based on the unplanned pooling of efficacy data from another trial undermines the prespecified study criteria. As such, this approach represents a post hoc analysis, as the methods used to undertake such pooling and interpret the results are not part of the study design. This is a very different matter from terminating an arm of a trial, or ending a trial altogether, based on external data, as occurred when investigators terminated the African Phambili trial (HVTN 503) of a Merck HIV vaccine based on the findings of the international STEP trial (HVTN 502).

Different studies may also use different criteria for assessing or defining endpoints, and for including or excluding subjects, and set different schedules for subject visits, further complicating the interpretation of interim information based on post hoc pooling.
Finally, if two trials were stopped based on post hoc pooling of efficacy data, many researchers would insist that the results be published as a single paper, because any conclusion that the intervention had a positive effect would stem from the combined data. This would introduce other complications.

However, because DMCs use less formal criteria to assess the safety of a new product than they use to document efficacy, sharing safety information from concurrent trials is acceptable and can be informative, especially for less frequent adverse events. One recent proposal, perhaps motivated by recent safety problems with microbicides (N-9 and cellulose sulfate) but applicable to any HIV biomedical prevention, is to create a “super DMC” that would monitor several microbicide trials with one or more intervention arms in common (Nunn, 2007). The basic idea is that DMCs would agree to share a core set of safety data, and that participating investigators would be notified of any emerging safety problems. Such an idea is intriguing. However, implementing it would require careful planning to avoid arbitrary decisions on when to notify individual DMCs of overall results, and on which results each participating DMC should see. For example, if two three-arm trials had one common experimental arm and a common control arm, would the DMCs share safety information about all three arms?

Also important are details about the procedures for capturing safety data, as trials may differ in the method, frequency, and completeness with which they collect safety information. There may also be ethical or regulatory considerations, including whether the informed-consent process must be changed in such circumstances, as well as scientific issues, such as whether and when participating trials should release a single publication on the main trial results, as opposed to separate publications for each trial.

To the committee’s knowledge, very little has been written about how best to share safety information among DMCs, yet the committee sees value in doing so in an appropriate manner. Nor has anyone discussed whether DMCs should share safety information routinely, or only if a possible concern is raised.

Recommendation 9-5: Investigators, donors, and regulatory agencies should encourage research on how to combine safety information from concurrent trials of similar products, including the scientific advantages and disadvantages of sharing information, the timing and logistics of doing so, ethical concerns (such as how such information might affect the informed-consent process), and how to report the results from such trials.

ANALYZING TRIAL RESULTS

Analyzing the results of HIV prevention trials is particularly challenging, for several reasons:

• HIV infection is a “silent” event (that is, it is not directly observable), and the tests used to diagnose infection are imperfect.
• When pregnancies occur during a trial, women are often taken off the study product.
• Participants in such trials may not adhere to the study interventions.
• Investigators need to account for the impact of HIV exposure on trial outcomes (which is determined by both HIV prevalence and the behavior of participants) while also addressing the challenges of obtaining accurate information on behavior.
• Investigators need to assess the relationships among interventions, adherence, exposure, and the risk of HIV infection.

The Silence of HIV Infection and the Imperfections of Diagnostic Tests

In contrast to “time-to-event” endpoints such as mortality and progression of HIV, as measured by a biomarker such as viral load, determining HIV infection requires a diagnostic test such as EIA, RNA-PCR, or a more recently developed rapid test. Repeated testing of subjects in an HIV prevention trial leads to “interval-censored” observations of the time to HIV infection, rather than an exact date. That is, periodic testing brackets an individual’s time of infection between the last negative and first positive diagnostic test.

The situation is further complicated by the fact that the diagnostic tests used to detect HIV infection are not perfect. For example, RNA-PCR can have a low sensitivity when used within 2 weeks of HIV infection (Balasubramanian and Lagakos, 2003), leading to false negatives. Similarly, EIA does not usually detect HIV infection in persons who have not yet developed HIV antibodies (that is, those who have not yet seroconverted).

These features imply that some participants who enroll in HIV prevention trials may already be infected, and that some participants who become infected during a trial may not be diagnosed. Investigators need to take those possibilities into account when analyzing trial results.

Excluding Subjects Who Were HIV Infected When Enrolled from Analysis

HIV prevention trials have used different approaches in analyzing results from participants who are later suspected of having been HIV infected at the time of enrollment. One approach has been to use postenrollment diagnostic tests to avoid counting subjects believed to have been infected at the time of randomization.
If investigators could identify and exclude all subjects already infected when they are randomized, and no others, estimates of the relative efficacy of an intervention, and tests of the null hypothesis, would improve in several ways (Balasubramanian and Lagakos, 2004):

• Estimates of product efficacy or effectiveness would be less biased.
• The type I error of tests comparing study arms with respect to HIV infection rates would remain valid.
• The power of the trial to detect a real difference in efficacy between arms would increase.

However, despite these potential advantages of excluding subjects who are already infected at the time of randomization, the impact of doing so is often minimal because the number of such exclusions is small. On the
other hand, postrandomization exclusions can introduce biases and distort comparisons of the intervention and control arms if they do not exclude all subjects infected at the time of randomization, or if they incorrectly exclude some subjects who were uninfected at enrollment, differentially among the intervention arms. This could occur, for example, if the post hoc testing of baseline samples is triggered by a positive HIV test occurring shortly after randomization (say, at 3 months), since this could be influenced by a differential intervention effect.

Biases can also occur if the criteria for determining which participants to exclude are not identical in the intervention and control arms, or if the patterns of participants’ clinic visits are not identical in each arm. Thus, investigators must carefully weigh the potential gains from excluding individuals who may have been infected at enrollment against the possibility that doing so will introduce bias into the comparison of study arms.

Even if investigators could theoretically justify the post hoc exclusion of subjects, critics might question the face validity of results from trials that exclude more subjects from the intervention arm than from the control arm.

Recommendation 9-6: Investigators should base their primary analysis of the efficacy of an intervention on all randomized subjects. Secondary sensitivity analyses that exclude subjects believed to have been HIV infected when they were randomized can be useful. However, investigators should not substitute such analyses for the primary analysis, unless such exclusions (and nonexclusions) can confidently be made without error.

Recommendation 9-7: Investigators of trials evaluating an intervention that is believed to have a delayed impact may find it efficient to exclude people found to be HIV infected after randomization but before a given follow-up time.
If so, the trial protocol should specify and justify such an approach, and investigators should use it only if follow-up of subjects and the assessment and confirmation of HIV infection during this period are identical in all study arms. Investigators should undertake secondary analyses based on all randomized subjects.

Confirming HIV Infections

Subjects who test HIV positive typically undergo confirmatory tests. Some of the initial results turn out to be true positives, while some are false positives. In theory, some of the true positives might not be confirmed because of the imperfect sensitivity of the confirmatory test. These subjects would be considered negative when they are actually positive, so their initial results would be wrongly classed as false positives.
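
How requiring confirmation trades sensitivity for specificity can be sketched with a small calculation. This is an illustration under stated assumptions, not a description of any trial's testing algorithm: the function name `two_step_accuracy` and all four accuracy values are hypothetical, and the two tests are assumed to err independently given a subject's true infection status.

```python
def two_step_accuracy(se_screen, sp_screen, se_confirm, sp_confirm):
    """Accuracy of a rule that counts a subject as infected only if BOTH
    the initial test and the confirmatory test are positive.

    ASSUMPTION: the two tests err independently given true status.
    """
    # An infected subject is detected only if both tests are positive.
    sensitivity = se_screen * se_confirm
    # An uninfected subject is miscounted only if both tests are falsely positive.
    specificity = 1.0 - (1.0 - sp_screen) * (1.0 - sp_confirm)
    return sensitivity, specificity


# Hypothetical accuracy values, chosen only for illustration.
se, sp = two_step_accuracy(
    se_screen=0.98, sp_screen=0.99, se_confirm=0.95, sp_confirm=0.999
)
print(f"combined sensitivity: {se:.5f}")
print(f"combined specificity: {sp:.5f}")
```

Under these assumptions the combined rule is less sensitive than the initial test alone (more infected subjects are missed) but more specific (fewer uninfected subjects are counted as endpoints).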

Given that tests to detect HIV infection are imperfect, a trial protocol should set clear criteria for confirming that subjects are indeed infected. Although such confirmation could increase the number of false negatives, it would decrease the number of false positives and lead to more confidence that the observed endpoints are “real.” It is critical that investigators develop criteria for assessing endpoints that are applied equally in the intervention and control arms.

Analyzing Time to Infection

Standard methods for analyzing the amount of time that elapses between randomization and HIV infection assume that investigators know the exact time of infection. These methods include the log-rank test, Kaplan-Meier estimator, and Cox’s model. To account for the interval-censored nature of information on HIV infection in prevention trials, and the imperfection of the tests, investigators could use modified versions of these methods (Richardson and Hughes, 2000; Balasubramanian and Lagakos, 2004; Gupte et al., 2007; Zhang and Lagakos, in press).

These standard methods provide valid tests of the efficacy of an intervention if subjects are evaluated based on the same schedule of clinic visits in each study arm, and if the sensitivity and specificity of the diagnostic test do not depend on the study arm. In that case, similar periodic results would be expected to occur in the study arms under the null hypothesis of no intervention effect. Under these circumstances the following occurs:

• Standard Kaplan-Meier estimates of cumulative HIV infection rates are valid at the scheduled visit times. However, these cumulative rates cannot be estimated for times between visits, so the curves should not be displayed in the usual way as step functions.
• In practice, participants are often not evaluated according to the exact visit schedule.
In such cases, the Kaplan-Meier estimator, log-rank test, and Cox model should use the scheduled visit time rather than the actual time. That is because the tests depend on the magnitudes of the observed times only through their relative ranks (that is, they are “rank invariant”), and thus small differences in the time of visits can have a big impact on the results.
• Investigators should include information from unscheduled visits in the analyses only if they can safely assume that such visits do not depend on subjects’ infection status. Otherwise, investigators should base their analyses only on results from scheduled visits.

In some studies, such as in newborns and infants, a nonnegligible proportion of subjects may die before being detected as HIV infected. In such
settings, investigators should use methods for analyzing competing risks, or use HIV-free survival rather than HIV infection as the endpoint (see, for example, Richardson and Hughes, 2000).

Effects of Product Discontinuation and Loss to Follow-Up

Participants may stop using an intervention during a trial for several reasons, most commonly adverse treatment effects, an inability to continue the treatment or lack of interest in doing so, or, in some HIV prevention studies, pregnancy. In some trials, investigators stop tracking participants who discontinue their randomized intervention prematurely. It is well known that this can lead to distorted statistical inferences about intervention effects (Lagakos et al., 1990) because subjects who discontinue their intervention, including placebo, can have different risks of becoming infected than those who do not. For example, Hughes et al. (1994) noted that HIV patients with more rapidly declining CD4+ cell counts were more likely to discontinue treatment than other patients. Another example is the Coronary Drug Project (Canner et al., 1986). In that trial, death rates in patients randomized to receive clofibrate were 18 percent for compliers compared with 25 percent for noncompliers, suggesting that the drug might be beneficial. However, the corresponding death rates for the placebo group were 15 percent and 28 percent, indicating that something about being noncompliant was associated with a poorer outcome (Snapinn et al., 2004). For both examples, if the rates of discontinuation differed among the intervention groups and analyses were based only on outcome events that occurred prior to discontinuation (sometimes referred to as “as treated” analyses), the comparisons of outcome events among the interventions would be biased.
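
The Coronary Drug Project figures quoted above can be laid out side by side to make the reasoning explicit. The sketch below uses only the four percentages given in the text; the variable and function names are illustrative, and it is not a reproduction of the trial's own analysis.

```python
# Death rates (percent) by arm and compliance status, as quoted in the text.
death_rate_pct = {
    "clofibrate": {"compliers": 18, "noncompliers": 25},
    "placebo": {"compliers": 15, "noncompliers": 28},
}


def noncomplier_excess(arm: str) -> int:
    """Percentage-point excess in death rate among noncompliers in one arm."""
    rates = death_rate_pct[arm]
    return rates["noncompliers"] - rates["compliers"]


for arm in death_rate_pct:
    print(f"{arm}: noncomplier excess = {noncomplier_excess(arm)} points")

# Comparing compliers only, across arms, shows no advantage for the drug.
complier_gap = (
    death_rate_pct["clofibrate"]["compliers"]
    - death_rate_pct["placebo"]["compliers"]
)
print(f"clofibrate minus placebo among compliers = {complier_gap} points")
```

Because the excess among noncompliers appears in the placebo arm as well, the within-arm difference cannot be attributed to the drug. This is why comparisons restricted to compliers, or to events before discontinuation, can mislead.
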
Thus, the accepted practice is to continue to follow participants for the study endpoint regardless of whether they prematurely discontinue their randomized intervention, and to use all outcome events in the analysis of the data, not just those that precede discontinuation of the intervention. Such analyses are called intention-to-treat analyses. As noted in Chapter 2, the power of intention-to-treat analyses will, in general, be reduced by product discontinuation. Chapter 5 discusses several ways in which the effect of persistence, and more generally of adherence and behavior, on HIV incidence can be meaningfully analyzed.

Handling Pregnancies During Follow-Up

If a pregnancy that occurs during a trial does not trigger a modification of the intervention, then analyses of time to HIV infection will not change. However, if a product is temporarily or permanently discontinued when a woman is found to be pregnant, the more general discussion of
product discontinuation described above applies. Thus, it is important that investigators continue to follow pregnant women for HIV infection after they discontinue a product owing to pregnancy and, when analyzing results, use intention-to-treat analyses that utilize outcome events for the duration of follow-up, and not just those occurring prior to the pregnancy.

An alternative method of analysis is to “censor” a woman’s time of infection when she is found to be pregnant and discontinues the product. That is, investigators would regard this woman’s time of infection as being “at least x,” where x is the time from randomization until she is found to be pregnant and taken off the product. This convention is sometimes referred to as an “as-treated” analysis. As with other forms of discontinuation, such analyses could lead to biased estimates of the cumulative risk of HIV infection if pregnancy represented a type of “informative censoring,” that is, if the risk of (subsequent) HIV infection in a pregnant woman is different from that of a nonpregnant woman with equal follow-up. Although the evidence for a differential risk of HIV infection during pregnancy is limited and thus somewhat controversial, there have been reports of increased HIV risk in pregnant women (Taha et al., 1998; Gray et al., 2005; Morrison et al., 2007).

The impact of pregnancy on “as-treated” statistical tests comparing intervention groups is somewhat different. Here, if the rates of pregnancy do not differ among the intervention arms, and if the risk of HIV infection for a pregnant woman does not depend on the product she had been taking, as-treated tests that censor a woman at the time of pregnancy will lead to valid comparisons.
However, when planning a trial, investigators usually cannot be assured of either of these assumptions; thus, it is prudent to continue to follow women who become pregnant for the study’s outcome events and to analyze the resulting data using intention-to-treat methods.

Recommendation 9-8: In all trials, investigators should continue to follow women who become pregnant for HIV infection, regardless of whether they discontinue their study intervention. In addition, intention-to-treat analyses should be the primary basis for comparing intervention groups with respect to HIV infection and other efficacy endpoints. Investigators can include as-treated analyses as secondary analyses, but should interpret them cautiously, because of the possibility that such discontinuations represent a type of informative censoring.

REFERENCES

Angell, M. 1997. The ethics of clinical research in the third world. New England Journal of Medicine 337(12):847-849.
Bailey, R. C., S. Moses, C. B. Parker, K. Agot, I. Maclean, J. N. Krieger, C. F. Williams, R. T. Campbell, and J. O. Ndinya-Achola. 2007. Male circumcision for HIV prevention in young men in Kisumu, Kenya: A randomised controlled trial. Lancet 369(9562):643-656.
Balasubramanian, R., and S. W. Lagakos. 2003. Estimation of a failure time distribution based on imperfect diagnostic tests. Biometrika 90(1):171-182.
Balasubramanian, R., and S. W. Lagakos. 2004. Analyzing time-to-event data in a clinical trial when an unknown proportion of subjects has experienced the event at entry. Biometrics 60(2):335-343.
Canner, P. L., K. G. Berge, N. K. Wenger, J. Stamler, L. Friedman, R. J. Prineas, and W. Friedewald. 1986. Fifteen-year mortality in coronary drug project patients: Long-term benefit with niacin. Journal of the American College of Cardiology 8(6):1245-1255.
Cassell, M. M., D. T. Halperin, J. D. Shelton, and D. Stanton. 2006. Risk compensation: The Achilles heel of innovations in HIV prevention? BMJ 332(7541):605-607.
Dixon, D. O., and S. W. Lagakos. 2000. Should data and safety monitoring boards share confidential interim data? Controlled Clinical Trials 21(1):1-6; discussion 54-55.
Ellenberg, S., P. L. Fleming, and D. L. DeMets. 2002. Data monitoring committees in clinical trials: A practical perspective. Edited by S. Senn and V. Barnett. West Sussex, UK: John Wiley & Sons Ltd.
Fleming, T. R., S. Ellenberg, and D. L. DeMets. 2002. Monitoring clinical trials: Issues and controversies regarding confidentiality. Statistics in Medicine 21(19):2843-2851.
Gray, R. H., X. Li, G. Kigozi, D. Serwadda, H. Brahmbhatt, F. Wabwire-Mangen, F. Nalugoda, M. Kiddugavu, N. Sewankambo, T. C. Quinn, S. J. Reynolds, and M. J. Wawer. 2005. Increased risk of incident HIV during pregnancy in Rakai, Uganda: A prospective study. Lancet 366(9492):1182-1188.
Gupte, N., R. Brookmeyer, R. Bollinger, and G. Gray. 2007. Modeling maternal-infant HIV transmission in the presence of breast-feeding with an imperfect test. Biometrics 63(4):1189-1197.
Hall, C. D., U. Dafni, D. Simpson, D. Clifford, P. E. Wetherill, B. Cohen, J. McArthur, H. Hollander, C. Yainnoutsos, E. Major, L. Millar, and J. Timpone. 1998. Failure of cytarabine in progressive multifocal leukoencephalopathy associated with human immunodeficiency virus infection. AIDS Clinical Trials Group 243 Team. New England Journal of Medicine 338(19):1345-1351.
Hammer, S. M., K. E. Squires, M. D. Hughes, J. M. Grimes, L. M. Demeter, J. S. Currier, J. J. Eron, Jr., J. E. Feinberg, H. H. Balfour, Jr., L. R. Deyton, J. A. Chodakewitz, and M. A. Fischl. 1997. A controlled trial of two nucleoside analogues plus indinavir in persons with human immunodeficiency virus infection and CD4 cell counts of 200 per cubic millimeter or less. AIDS Clinical Trials Group 320 Study Team. New England Journal of Medicine 337(11):725-733.
Home, P. D., S. J. Pocock, H. Beck-Nielsen, R. Gomis, M. Hanefeld, N. P. Jones, M. Komajda, and J. J. McMurray. 2007. Rosiglitazone evaluated for cardiovascular outcomes: An interim analysis. New England Journal of Medicine 357(1):28-38.
Hughes, M. D., D. S. Stein, H. M. Gundacker, F. T. Valentine, J. P. Phair, and P. A. Volberding. 1994. Within-subject variation in CD4 lymphocyte count in asymptomatic human immunodeficiency virus infection: Implications for patient monitoring. Journal of Infectious Diseases 169(1):28-36.
Lagakos, S. W., L. L. Lim, and J. M. Robins. 1990. Adjusting for early treatment termination in comparative clinical trials. Statistics in Medicine 9(12):1417-1424.
Lallemant, M., G. Jourdain, S. Le Coeur, S. Kim, S. Koetsawang, A. M. Comeau, W. Phoolcharoen, M. Essex, K. McIntosh, and V. Vithayasai. 2000. A trial of shortened zidovudine regimens to prevent mother-to-child transmission of human immunodeficiency virus type 1. Perinatal HIV prevention trial (Thailand) investigators. New England Journal of Medicine 343(14):982-991.
Morrison, C. S., J. Wang, B. Van Der Pol, N. Padian, R. A. Salata, and B. A. Richardson. 2007. Pregnancy and the risk of HIV-1 acquisition among women in Uganda and Zimbabwe. AIDS 21(8):1027-1034.
NIAID (National Institute of Allergy and Infectious Diseases). 2007. Statement: Immunizations are discontinued in two HIV vaccine trials. Bethesda, MD: NIAID. http://www3.niaid.nih.gov/news/newsreleases/2007/step_statement.htm (accessed November 2007).
Nissen, S. E., and K. Wolski. 2007. Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. New England Journal of Medicine 356(24):2457-2471.
Nunn, A. 2007. Issues in microbicide trial design, monitoring, and analysis. Paper read at the second public meeting for the Committee on Methodological Challenges in HIV Prevention Trials, April 19, London, UK.
O’Brien, P. C., and T. R. Fleming. 1979. A multiple testing procedure for clinical trials. Biometrics 35(3):549-556.
Pocock, S. J. 1983. Clinical trials: A practical approach. Chichester, UK: John Wiley & Sons, Inc.
Richardson, B. A., and J. P. Hughes. 2000. Product limit estimation for infectious disease data when the diagnostic test for the outcome is measured with uncertainty. Biostatistics 1(3):341-354.
Shaffer, N., R. Chuachoowong, P. A. Mock, C. Bhadrakom, W. Siriwasin, N. L. Young, T. Chotpitayasunondh, S. Chearskul, A. Roongpisuthipong, P. Chinayon, J. Karon, T. D. Mastro, and R. J. Simonds. 1999. Short-course zidovudine for perinatal HIV-1 transmission in Bangkok, Thailand: A randomised controlled trial. Bangkok Collaborative Perinatal HIV Transmission Study Group. Lancet 353(9155):773-780.
Shapiro, R. L., I. Thior, S. Gilbert, C. Lockman, C. Wester, L. Smeaton, S. J. Stevens, K. Heymann, K. McIntosh, S. Ndung’u, V. Gaseitsiwe, T. Novitsky, S. Peter, E. Kim, C. Widenfelt, P. Moffat, P. Ndase, P. Arimi, P. Kebaabetswe, P. Mazonde, R. Lee, J. Marlink, J. Makhema, S. Lagakos, and M. Essex. 2006. A randomized comparison of strategies for adding single-dose nevirapine to zidovudine to prevent mother-to-child HIV transmission in Botswana. AIDS 20:1281-1288.
Snapinn, S. M., Q. Jiang, and B. Iglewicz. 2004. Informative noncompliance in endpoint trials. Current Controlled Trials in Cardiovascular Medicine 5(1):5.
Taha, T. E., G. A. Dallabetta, D. R. Hoover, J. D. Chiphangwi, L. A. Mtimavalye, G. N. Liomba, N. I. Kumwenda, and P. G. Miotti. 1998. Trends of HIV-1 and sexually transmitted diseases among pregnant and postpartum women in urban Malawi. AIDS 12(2):197-203.
Talawat, S., G. J. Dore, S. Le Coeur, and M. Lallemant. 2002. Infant feeding practices and attitudes among women with HIV infection in northern Thailand. AIDS Care 14(5):625-631.
Turnbull, B. 2006. Group sequential tests. In Encyclopedia of Statistical Science. New York: John Wiley & Sons, Inc.
Van Damme, L., G. Ramjee, M. Alary, B. Vuylsteke, V. Chandeying, H. Rees, P. Sirivongrangson, L. Mukenge-Tshibaka, V. Ettiegne-Traore, C. Uaheowitchai, S. S. Karim, B. Masse, J. Perriens, and M. Laga. 2002. Effectiveness of COL-1492, a nonoxynol-9 vaginal gel, on HIV-1 transmission in female sex workers: A randomised controlled trial. Lancet 360(9338):971-977.
Zhang, P., and S. W. Lagakos. In press. Analysis of time to a silent event whose occurrence is monitored with error, with application to mother-to-child HIV transmission. Statistics in Medicine.