2
Basic Design Features: Size, Duration, and Type of Trials, and Choice of Control Group

When planning a late-stage randomized clinical trial, investigators need to consider a number of design features, including (1) the number of subjects and duration of follow-up; (2) whether the trial will evaluate efficacy or effectiveness; (3) whether to begin with a smaller (phase 2) trial, with the understanding that a larger (phase 3) trial will follow if the results are promising (see Chapter 1, Box 1-1 for a description of clinical trial phases); and (4) how to choose a control group or groups.

A number of factors influence these choices, including the anticipated HIV incidence rate for the control group(s), the rates of product nonadherence and discontinuation owing to pregnancy and other reasons, and the rates of loss to follow-up, as well as the uncertainty surrounding these assumed rates and the resulting effect on the power of the trial. Investigators must also consider how large and long-lasting the effect of the intervention must be to be of scientific interest or public health significance. This chapter discusses these issues in the context of an HIV prevention trial involving a biomedical intervention.

TRIAL SAMPLE SIZE AND DURATION

The power of a clinical trial refers to the probability that it will detect a beneficial effect of a specific magnitude. Investigators commonly measure differences between the study arms as the relative risk, or RR, that someone will become HIV infected. For example, a recently published trial of circumcision (Bailey et al., 2007) compared the probability that participants in the circumcision and control groups would become HIV infected within 2 years of randomization. The results (rates of 2.1 versus 4.2 infections per 100 person-years of follow-up for the two groups) yielded an estimated RR of 0.47 for circumcision. Alternatively, investigators could compare the HIV infection rates, or hazards, in the study arms. In that case, researchers would commonly express RR as the ratio of hazards for the two groups.

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

The power of a trial with a "time-to-event" endpoint, such as HIV infection, is driven by the number of participants who become HIV infected during the trial, not by the sample size per se, and by how much investigators expect the number of infections to differ between the intervention and control groups. To illustrate, suppose that a placebo-controlled trial can enroll 500 participants per year and wishes to have 90 percent power to detect a halving of annual HIV incidence from 3 percent to 1.5 percent, based on a two-sided type I (false positive) error of 5 percent. Table 2-1 illustrates six designs with different combinations of accrual periods and trial durations that will give the desired power.¹ Phase 3 effectiveness trials typically last for 2 to 4 years.

TABLE 2-1 Trial Duration as a Function of Accrual Rate and Years of Accrual

Design | Accrual rate (per year) | Years of accrual | Total participants accrued | Trial duration (years) | Expected number of events
1 | 500 | 2 | 1,000 | 5.43 | 95
2 | 300 | 2 | 600 | 8.62 | 95
3 | 500 | 3 | 1,500 | 4.42 | 96
4 | 300 | 3 | 900 | 6.45 | 95
5 | 500 | 4 | 2,000 | 4.18 | 96
6 | 300 | 4 | 1,200 | 5.68 | 96

NOTE: The table assumes a 5 percent type I error, 3 percent annual HIV incidence in the control group, and 90 percent power to detect an RR between intervention and control groups of 0.5.

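The roughly 95 infections in Table 2-1 can be approximated with a standard closed-form rule. The sketch below uses Freedman's approximation for the number of events a two-sided log-rank test needs under 1:1 randomization. This is an assumption for illustration only: the chapter does not say which method EAST applies, and the function name `required_events` is hypothetical.

```python
import math
from statistics import NormalDist


def required_events(rr: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate total number of HIV infections (both arms combined) needed
    to detect a hazard ratio `rr` with a two-sided log-rank test, assuming
    1:1 randomization (Freedman's approximation, used here as an assumption)."""
    if not 0 < rr < 1:
        raise ValueError("rr must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)  # about 1.96 + 1.28 for the defaults
    return math.ceil(z_total ** 2 * ((1 + rr) / (1 - rr)) ** 2)


print(required_events(0.5))  # 95, in line with the 95-96 events in Table 2-1
print(required_events(0.7))  # a more modest effect needs several times as many
```

The count depends only on the target RR, the error rates, and the allocation ratio, not on how many participants are enrolled, which is why all six designs share nearly the same expected number of events.
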
The table assumes that time until infection follows an exponential distribution, that an equal number of participants are randomized to each group, that all participants are followed for the duration of the trial, and that there are no dropouts. An important feature is that, despite their differences in sample size and duration, each of the six designs has the same type I and type II error rates, and each would expect a total of 95–96 participants to become HIV infected during the study, given the assumed difference between treatment groups. However, the designs differ in other important ways. For example, the first design would require only 1,000 participants (500 per arm), but it would require 5.43 years of follow-up from the time the first subject is randomized. In contrast, the fifth design would require twice as many participants (2,000), but it would be completed more than a year sooner than the first design.

¹ All calculations in this and the following tables and figures were made using EAST software, version 5.1 (Cambridge, MA: Cytel Inc., 2007).

Choosing among such options when designing a trial requires considering several factors. These include the availability of trial participants, the relative costs of enrolling and following subjects, the anticipated number of subjects who become lost to follow-up (which would increase with the duration of follow-up), and the anticipated rates of adherence to the intervention, which could be affected by pregnancy rates and would vary over time. Designs with fewer subjects would last longer, and thus provide estimates of cumulative HIV incidence over longer periods of time. Such trials would provide more long-term information on the durability of any intervention effect. However, because smaller studies take more time to complete, they could delay the introduction of an effective intervention into the community.

The above designs assume that subjects would be enrolled and develop HIV infection at given rates, and therefore that the trials would be completed in the indicated durations. However, actual enrollment rates and HIV incidence rates may differ from these assumptions. Another approach for implementing these designs is therefore to follow participants until the actual number of events (that is, new HIV infections) equals the expected number. This is sometimes called an "event-driven trial." For example, in design 1, participants would be accrued for 2 years, and the trial would continue until 96 events occurred.

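To see how an event-driven trial's calendar length follows from accrual and incidence, here is a minimal sketch under the Table 2-1 assumptions (uniform accrual, 1:1 randomization, exponential time to infection, no dropout). Treating "3 percent annual incidence" as a constant hazard of 0.03 per person-year, and the function names, are assumptions of this sketch rather than the chapter's method.

```python
import math


def expected_events(duration, accrual_rate, accrual_years,
                    control_hazard=0.03, rr=0.5):
    """Expected HIV infections by calendar time `duration` (years), with
    uniform accrual over `accrual_years`, half of `accrual_rate` per arm,
    exponential infection times, and no dropout. Needs duration >= accrual_years."""
    per_arm_rate = accrual_rate / 2
    total = 0.0
    for hazard in (control_hazard, rr * control_hazard):
        # Integral over entry times u in [0, A] of P(still uninfected at T)
        uninfected = (math.exp(-hazard * (duration - accrual_years))
                      - math.exp(-hazard * duration)) / hazard
        total += per_arm_rate * (accrual_years - uninfected)
    return total


def event_driven_duration(target_events, accrual_rate, accrual_years,
                          control_hazard=0.03, rr=0.5):
    """Calendar time at which `target_events` infections are expected, found
    by bisection (expected events increase with time after accrual ends)."""
    args = (accrual_rate, accrual_years, control_hazard, rr)
    lo, hi = float(accrual_years), 100.0
    if expected_events(lo, *args) >= target_events:
        return lo  # target already reached when accrual ends; not modeled further
    if expected_events(hi, *args) < target_events:
        raise ValueError("target not reached within 100 years")
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_events(mid, *args) < target_events:
            lo = mid
        else:
            hi = mid
    return hi


for rate, years in [(500, 2), (300, 2), (500, 4)]:
    t = event_driven_duration(95, rate, years)
    print(f"{rate}/year for {years} years: about {t:.2f} years")
```

For designs 1 and 5 this sketch lands within about a tenth of a year of the durations in Table 2-1; exact agreement is not expected, since EAST's computation is not reproduced here. Passing a smaller `control_hazard` shows the same lengthening that Table 2-3 describes.
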
If the actual accrual and event rates occurred as assumed during trial planning, the trial would take about 5.43 years to complete. However, if the accrual rate were in fact only 300 per year (as in design 2 of Table 2-1) and participants were accrued for 2 years, the trial would need to last 8.62 years to achieve 96 expected endpoints, and thus the desired power. Although an events-driven approach can compensate for inaccurate assumptions about accrual or incidence rates, investigators and sponsors must also consider the cost of such a trial, the sponsor's willingness to provide longer-term support, and the relevance of the trial result if the time to completion is substantially longer than originally anticipated.

Impact of the Assumed Intervention Effect on Trial Size and Power

One guideline for selecting the magnitude of an intervention effect for the purpose of planning a trial is to use the smallest reduction in HIV incidence that is important from a clinical and public health perspective. For example, investigators might decide that a biomedical intervention would need to reduce the HIV infection rate in a community by at least 50 percent to be practically useful. If so, the trial should ideally be powered to detect an RR between treatment and control of 0.50. Another guideline is to power a study based on the difference investigators expect to see. For example, even though an RR of 0.7 (a 30 percent reduction in risk) might be important from a public health perspective, investigators might expect a new intervention to have a stronger effect, such as an RR of 0.5, and then power their study to detect this. All other things being equal, a larger intervention effect requires a smaller or shorter trial. However, if investigators are overly optimistic about the magnitude of this effect, they will end up with a smaller or shorter trial than they need for adequate power.

The power of a trial can drop quickly as the efficacy of an intervention diminishes because of nonadherence, time off the product, or other factors, so it is important to power a trial against a realistic RR. For example, Figure 2-1 shows how the power of design 5 from Table 2-1 changes when the true RR varies from 0.4 to 0.8 with the number of participants held constant. Even with an RR of 0.6, the power drops from the planned 90 percent to 66 percent. Plots such as Figure 2-1 should be part of the planning process, so that investigators appreciate how strongly the power of a trial will vary with the true RR between intervention and control groups over ranges of public health significance. Of course, having low power for small intervention effects with no public health importance is not a concern. (See Appendix C for a discussion of the effects of product efficacy on sample size.)

[FIGURE 2-1 Power of design 5 as a function of actual product efficacy, as measured by the RR between intervention and control groups. Plot title: "Power vs Effect Size (RR)"; x-axis: Relative Risk (0.4 to 0.8); y-axis: Power (0.1 to 1).]
NOTE: The figure assumes a control group incidence rate of 3 percent and a type I error of 5 percent.

Impact of HIV Incidence on Trial Size and Power

A trial's sample size also depends on the assumed HIV incidence rate in the control group. If investigators overestimate incidence rates when designing a trial, the power to detect an intervention effect could be low. Table 2-2 shows how the power of a study changes as a function of the HIV incidence rate in the control group, assuming a 50 percent reduction in risk in the intervention arm. For example, when the incidence rate in the control group drops from 3 percent to 2 percent in design 5 (2,000 participants, 4.18-year study), the power drops from 0.90 to 0.752. Clearly, when the incidence rate drops, a larger (and sometimes much larger) sample size would be needed to achieve adequate statistical power. If investigators assume an HIV incidence rate that is too high, an events-driven trial may need to last much longer than expected (see Table 2-3). Because of uncertainty in the expected HIV incidence rate in the control group, an events-driven trial can have advantages over a trial that analyzes results at a prespecified time.

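A rough sketch of why power falls with incidence: when enrollment and calendar duration are fixed (defaults below mimic design 5), fewer infections accumulate, and power depends on that count. The sketch again assumes a constant hazard equal to the annual incidence and uses Freedman's approximation for log-rank power; it is not the EAST computation, so its values only approximately track Table 2-2 (about 0.90 at 3 percent and about 0.76 at 2 percent, versus 0.752 in the table). Function names are hypothetical.

```python
import math
from statistics import NormalDist


def expected_events(control_hazard, rr, accrual_rate=500, accrual_years=4,
                    duration=4.18):
    """Expected infections for a fixed design (defaults mimic design 5),
    assuming uniform accrual, exponential infection times, and no dropout."""
    total = 0.0
    for hazard in (control_hazard, rr * control_hazard):
        uninfected = (math.exp(-hazard * (duration - accrual_years))
                      - math.exp(-hazard * duration)) / hazard
        total += (accrual_rate / 2) * (accrual_years - uninfected)
    return total


def approx_power(control_hazard, rr=0.5, alpha=0.05):
    """Approximate power of a two-sided log-rank test from the expected
    number of events (Freedman's approximation; opposite tail ignored)."""
    events = expected_events(control_hazard, rr)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(math.sqrt(events) * (1 - rr) / (1 + rr) - z_alpha)


for incidence in (0.05, 0.04, 0.03, 0.025, 0.02, 0.01, 0.005):
    print(f"control incidence {incidence:.1%}: power ~ {approx_power(incidence):.3f}")
```

Calling `approx_power(0.03, rr=0.6)` traces the same effect along the RR axis of Figure 2-1, although this crude approximation gives a somewhat higher value than the 66 percent quoted for design 5.
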
However, when actual HIV incidence is substantially lower than investigators originally assumed, the extra time needed to complete a trial with satisfactory power may not be feasible, or the results might be less relevant because of advances in the field. Thus, investigators need to monitor results during a trial to assess whether actual HIV incidence rates are close enough to the assumed rates that the trial remains feasible. (See Chapter 8 for further discussion.)

TABLE 2-2 Actual Power of Design 5 When Annual HIV Incidence in the Control Group Differs from the Assumed 3 Percent

Control group incidence rate | Power
5% | .987
4% | .963
3% | .90
2.5% | .84
2% | .752
1% | .464
0.5% | .262

NOTE: The RR between intervention and control is 0.5.

TABLE 2-3 Expected Duration Needed for Design 5 to Achieve 90 Percent Power When Annual HIV Incidence in the Control Group Differs from the Assumed 3 Percent

Control group incidence rate | Expected duration (years)
4% | 3.60
3.5% | 3.86
3% | 4.18
2.5% | 4.62
2% | 5.29
1.5% | 6.41
1% | 8.65

NOTE: The RR between intervention and control is 0.5.

The Impact of Attrition on Trial Size and Power

In many trials, investigators are not able to follow some participants for HIV infection because they become lost to follow-up (LFU); that is, they leave the study area, refuse further contact or participation in the study, or otherwise cannot be reached. If the prognosis for subjects who become LFU at a particular time is the same as that for subjects who remain in the study (called noninformative LFU, a condition that cannot be verified from the data), standard analyses that regard such losses as right-censored observations do not distort the type I (false positive) error rate, but the power of the study is diminished because of the resulting loss in person-years of observation for the study endpoint.

To avoid the potential loss of power from this type of LFU, investigators usually adjust the initial sample size to account for the anticipated amount of dropout. For example, if they anticipate that 10 percent of participants will become LFU, they can increase the sample size by roughly 10 percent to yield about the same total person-years of observation.

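The adjustment can be made slightly more precise. If a fraction f of the planned person-years is expected to be lost, dividing the required sample size by (1 - f) restores the planned information under the noninformative-LFU assumption, whereas adding 10 percent restores only about 99 percent of it (1.1 x 0.9 = 0.99). A minimal sketch, with a hypothetical function name:

```python
import math


def inflate_for_loss_to_follow_up(n_required: int, loss_fraction: float) -> int:
    """Number to enroll so that, after an expected `loss_fraction` of person-years
    is lost to noninformative LFU, the information of `n_required` participants remains."""
    if not 0 <= loss_fraction < 1:
        raise ValueError("loss_fraction must be in [0, 1)")
    return math.ceil(n_required / (1 - loss_fraction))


print(inflate_for_loss_to_follow_up(2000, 0.10))  # 2223, versus 2,200 with a flat 10 percent increase
```
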
Impact of Product Discontinuation on Trial Size and Power

As described in Chapter 9, subjects who discontinue product use prematurely should continue to be followed for the trial's outcome events, such as HIV infection, and intention-to-treat analyses should be used to compare the intervention groups with respect to these outcomes. Although such an approach avoids biases that can arise from not analyzing outcome events that occur after product discontinuation, the power of the trial to detect an intervention effect on the outcome events can be diminished if the product's effect is lost after it is discontinued, making the observed HIV infection risks for the intervention and placebo groups more alike. As Freedman et al. have shown, the impact on study power can be substantial, depending on the proportion of subjects who discontinue treatment as well as the timing of their discontinuation (Freedman, 1990). (See also Brittain et al., 1989, and Jo, 2002.)

Because noncompliance attenuates the true intervention effect, it affects sample size more than attrition does (Zelen, 1988; Freedman, 1990). For example, if pregnancies lead to a 10 percent reduction in woman-years of product use, the effect on the power of the trial, if analyzed by intention to treat, is greater than the effect of a 10 percent reduction in sample size. Although this loss of power is typically not addressed by increasing the planned size of the trial, it underscores the need to maximize adherence among study participants.

Recommendation 2-1: Investigators should take steps to develop accurate a priori estimates of rates of participant accrual, HIV incidence, product discontinuation, and participant retention, and incorporate those into the sample size calculations. As a guard against inaccurate estimates, investigators should consider using an "events-driven" approach. That is, investigators would analyze study results when a prespecified number of enrolled subjects have become HIV infected, rather than at prespecified calendar times.

Although an events-driven approach can compensate for inaccurate assumptions about participant accrual or HIV incidence rates, investigators and sponsors must consider the cost of such a trial, the sponsor's willingness to provide longer-term support, and the relevance of the trial result if the time to completion is substantially longer than originally anticipated.

EFFICACY VS. EFFECTIVENESS TRIALS

An initial consideration when designing a phase 2 or 3 clinical trial to evaluate a new HIV intervention is whether the objective is to assess efficacy or effectiveness. In this context, efficacy refers to the effect of the intervention in a tightly controlled setting, in which investigators try to minimize factors such as imperfect adherence to the product regimen, changes in risk behavior, and changes in the risk of exposure to HIV. Effectiveness, on the other hand, refers to how well the intervention would perform in the real world, where these factors and others cannot be rigorously controlled.

The quantitative connection between an intervention's efficacy and effectiveness is discussed in greater detail later in this chapter and in Appendix C. Efficacy trials usually overestimate the real-world effectiveness of an intervention for the outcome in question, and are often undertaken as a "proof of concept" to determine whether the intervention, if taken as designed, can lower the risk of becoming infected from an exposure to HIV. Because efficacy does not necessarily imply effectiveness in a real-world setting, a successful efficacy trial would commonly be followed by an effectiveness trial (see, for example, Fleming and Richardson, 2004). If an efficacy trial fails to suggest that an intervention has a positive impact, investigators may abandon further testing.

Efficacy trials can have less practical relevance in situations where adherence to a product is a challenge or has a strong behavioral component. For example, in recent trials comparing low-carbohydrate and low-fat diets among obese participants (Foster et al., 2003; Samaha et al., 2003), the most important public health question was whether such diets could cause substantial weight loss during the trial period (effectiveness), and not whether the diets would cause weight loss if fully adhered to (efficacy).

Lack of Reliable Surrogate Endpoints

Effectiveness trials have historically measured disease outcomes, such as clinical improvement or survival. In the case of HIV prevention, they have measured time to HIV infection.

Efficacy trials can use intermediate, or surrogate, endpoints if those endpoints are sufficiently predictive of HIV infection or another clinical endpoint, and if the effect of the intervention on the clinical response is fully explained by its effect on the surrogate (see, for example, Prentice, 1989). Examples of surrogate markers include viral load (HIV, hepatitis C virus), tumor response (oncology), bone mineral density (fracture prevention), and serum cholesterol levels (cardiology). In HIV treatment trials, suppression of HIV viral load is such an intermediate outcome, and treatment trials using this surrogate marker do not require a follow-up trial using a clinical endpoint. In general, studies relying on intermediate endpoints require smaller sample sizes or can be completed in less time.

In some HIV prevention trials that focus on behavioral interventions, investigators have used acute bacterial sexually transmitted diseases, such as gonorrhea, chlamydia, and syphilis, as markers of HIV risk. In biomedical prevention trials, however, these have proven to be less predictive of HIV infection and are not considered reliable proxies. In studying HIV prevention among injecting drug users, some investigators have used hepatitis B and C as markers of HIV infection (Vlahov and Junge, 1998; Dolan et al., 2003).

Some HIV vaccine trials have also used a specific immunogenic effect as a surrogate endpoint. Although investigators know that a vaccine that demonstrates such an effect does not necessarily protect against HIV infection, a lack of immunogenicity may suggest low protective ability. Investigators also use immunogenicity to prioritize candidate vaccines for further phase 3 testing.

However, surrogate endpoints in HIV prevention trials have in general not reliably predicted clinical efficacy. For example, an HIV adenovirus vaccine produced by Merck had been shown to be immunogenic, that is, capable of inducing a significant HIV-specific cell-mediated immune response. However, the company recently terminated a phase 2B trial because of lack of evidence that the vaccine prevents HIV infection or that it affects the viral set point of subjects who become infected (NIAID, 2007).

As attractive as the notion of using surrogate markers as trial endpoints is, their use is fraught with potential difficulties. For instance, a biomarker may be a good surrogate for one type of intervention and useless for another; surrogates must be validated for specific interventions and endpoints. Nonetheless, many current trials offer the opportunity to test surrogates against HIV incidence endpoints, and the committee believes that this is a worthwhile secondary goal for appropriate studies. The choice of candidate surrogates must be securely anchored in knowledge of the pathophysiology of infection and of how the surrogate marker relates biologically to the clinical endpoint that it is replacing.

Because no one has yet identified a biological or clinical marker that can reliably serve as a surrogate endpoint for HIV infection, efficacy trials of biomedical interventions must rely on HIV infection as the outcome, just as effectiveness trials do. This means that efficacy and effectiveness trials will have the same basic design, differing only in the type of study population, duration, and sample size, and perhaps in the steps study staff take to promote product adherence and counsel participants to avoid risky behavior. The next section discusses the attributes of such efficacy and effectiveness trials and their respective sizes.

Attributes of Efficacy and Effectiveness Trials

An efficacy trial would ideally enroll as homogeneous a population as possible, typically include a blinded control arm, and be of short duration (6 months, for example) to minimize nonadherence and dropout. The shorter duration also yields an answer sooner than a trial with longer-term follow-up. An effectiveness trial, in contrast, would typically enroll a more heterogeneous population and last longer, commonly 2 to 4 years. Given the differences in the length of follow-up between these two types of trials, investigators need to consider the effects of attrition and nonadherence. Because of potential nonadherence to the product and to condom use in a longer-term study, investigators might expect the magnitude of benefit of an intervention to be lower in an effectiveness trial than in an efficacy trial. Investigators would therefore normally expect the effect of a particular intervention to be as strong as or stronger in an efficacy trial than in an effectiveness trial (that is, an RR as low as or lower). However, it is not clear which design would have a larger HIV incidence rate in the control group. That is because, by selecting a population believed to be highly adherent to the product, an efficacy study might also select individuals more likely to adhere to condom use and other risk-reduction measures. Table 2-4 describes some attributes of efficacy and effectiveness trials using HIV infection as the endpoint.

TABLE 2-4 Anticipated Attributes of Efficacy and Effectiveness Trials

Attribute | Efficacy trial | Effectiveness trial
Duration | Shorter (e.g., 6–9 months) | Longer (e.g., 2–4 years)
Population | More homogeneous | Less homogeneous
Adherence/behavior education | More | Less
HIV incidence rate in control group | Lower or higher | Lower or higher
Relative efficacy of intervention | Greater | Smaller

To see how these factors affect sample size and trial duration, consider an efficacy trial (trial 1) and an effectiveness trial (trial 2), both of which use HIV infection as the endpoint. Suppose that each assigns an equal number of subjects to the intervention and control groups, and that the designs differ only in the duration (D) of time each subject is followed for HIV infection, the HIV incidence rate in the control group (I), and the relative risk (RR) of intervention versus control.

Let these values for the efficacy and effectiveness trials be denoted $(D_1, I_1, RR_1)$ and $(D_2, I_2, RR_2)$, respectively, and let $N_1$ and $N_2$ denote the corresponding sample sizes, assuming the same type I and type II errors. Then the relative sample size of the efficacy trial compared with the effectiveness trial can be approximated by

$$\text{Ratio} = \frac{N_1}{N_2} = \frac{D_2}{D_1} \times \frac{I_2}{I_1} \times \left(\frac{1 - RR_2}{1 - RR_1}\right)^2$$

All other things being equal, the relative size of the efficacy trial will increase with the duration and HIV incidence rate in the effectiveness trial, and will decrease as the RR in the efficacy trial becomes smaller relative to the RR in the effectiveness trial. Appendix C shows how nonadherence and efficacy can combine to determine effectiveness.
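The approximation for N1/N2 is easy to explore numerically. In the sketch below all inputs are hypothetical, and the effectiveness-trial RR comes from a deliberately simple dilution model that assumes no protection while a participant is off product. That model is an assumption of this example, not the one developed in Appendix C.

```python
def size_ratio(d1, i1, rr1, d2, i2, rr2):
    """Approximate N1/N2 (efficacy trial 1 vs. effectiveness trial 2):
    (D2/D1) * (I2/I1) * ((1 - RR2) / (1 - RR1))**2."""
    return (d2 / d1) * (i2 / i1) * ((1 - rr2) / (1 - rr1)) ** 2


def diluted_rr(rr_on_product, fraction_on_product):
    """Hypothetical dilution model: the product acts with `rr_on_product`
    while used and not at all otherwise, so the time-averaged hazard ratio
    is a weighted average of rr_on_product and 1."""
    return fraction_on_product * rr_on_product + (1 - fraction_on_product)


# Hypothetical inputs: 6-month efficacy trial with RR 0.5; 3-year effectiveness
# trial with the same control incidence and product used 80 percent of the time.
rr2 = diluted_rr(0.5, 0.8)                          # about 0.6
ratio = size_ratio(0.5, 0.03, 0.5, 3.0, 0.03, rr2)  # about 3.84
print(f"RR2 = {rr2:.2f}; efficacy trial needs about {ratio:.1f} times as many subjects")
```

With these inputs the six-fold longer follow-up outweighs the stronger efficacy-trial RR, so the short efficacy trial needs several times as many participants as the effectiveness trial.
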

Shorter-Term vs. Longer-Term Follow-Up

Some investigators and sponsors have suggested that phase 3 trials of HIV prevention agents should be short term, with follow-up lasting only 6 to 9 months (Nunn, 2007). The rationale is that users' adherence to the product regimen may fall over time, perhaps through fatigue. HIV incidence among trial participants could also fall because of changes in incidence rates unrelated to the trial or differential dropout of higher-risk individuals. While short-term trials may demonstrate proof of concept, they may have limited clinical or public health value, especially if the estimated intervention effect is borderline and adherence to the product regimen is likely to wane over time. Because areas with high HIV incidence have limited resources, those regions may find interventions attractive only if they are clearly effective over a longer period. In that case, investigators would need to pursue a longer-term placebo-controlled trial after a positive efficacy trial. However, such a follow-up trial may pose ethical concerns about maintaining equipoise, because the intervention would already have been shown to be beneficial over the short term.

Information on the longer-term effectiveness of an intervention offers several advantages. For ethical reasons, HIV prevention trials must offer risk-reduction counseling, including on condom use, to participants in both intervention and control arms. If condom use falls during a longer-term trial and the HIV infection rate rises, such a trial could be more capable of demonstrating an intervention effect. However, subjects who stop using condoms may also be less adherent to the product. And adherence to some products might actually increase over time as participants and their partners become more familiar with them.

Moreover, setting up and initiating a trial entails considerable cost and work, and large efficacy trials with short follow-up may cost more than smaller effectiveness trials with longer follow-up, assuming an equal number of person-years of observation. Thus the value of short-term efficacy trials for HIV prevention is sometimes unclear, given the substantial resource commitment their large sample sizes would require and the ethical concerns about undertaking a placebo-controlled effectiveness trial if an intervention turns out to be promising in an efficacy trial.

Two types of modified trial designs can provide information on both efficacy and effectiveness. The first obtains some information on longer-term effectiveness in an efficacy trial whose main goal is to assess short-term efficacy; the second allows a longer-term effectiveness trial to be terminated at an interim analysis if there is insufficient evidence of short-term efficacy. The rationale behind these designs is that a product that has efficacy might not be effective in a real-world setting, and that a product that does not have efficacy would not be expected to be effective in a real-world setting:

• Efficacy study with extended follow-up: In this design, an efficacy trial would follow all subjects for HIV infection until a specific time (e.g., 6 months) after the last subject has been enrolled. Thus, for example, if the trial takes 18 months to enroll all subjects, follow-up will range from 6 months for the last enrolled subject to 2 years for the first enrolled subject. Such a trial will have the same power as an efficacy trial comparing the 6-month cumulative HIV infection rates of the intervention and placebo groups and following all subjects for exactly 6 months. However, it will also provide some evidence on cumulative HIV incidence rates through 24 months, and therefore give some measure of the effectiveness of the intervention. This example uses a passive approach to obtaining more information about longer-term effects, in the sense that the maximal follow-up time is determined by the duration of the accrual period and thus by the accrual rate. An alternative is to intentionally control the accrual rate to achieve a desired maximal follow-up time. For example, if follow-up times of up to 3 years are desired, the accrual rate in the example could be chosen to require a total of 2.5 years of accrual, in which case subjects would be followed for up to 3 years. Cost and practicality would need to be considered in implementing such a design.

• Phase 3 trial with stopping rules for futility: Similarly, a longer-term effectiveness trial can include an interim analysis that compares the intervention and control groups with respect to cumulative HIV incidence at an early time point, such as 6 months. The trial can be terminated for futility if the interim data do not show a 6-month benefit of a prespecified magnitude.

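The follow-up arithmetic of the extended-follow-up design is simple enough to sketch directly. The function names and the 3,000-subject total below are hypothetical; the sketch assumes enrollment is spread over the accrual period and that follow-up ends at a fixed time after the last subject enrolls.

```python
def follow_up_range(accrual_years, min_follow_up_years):
    """(shortest, longest) follow-up when enrollment spans `accrual_years` and
    everyone is followed until `min_follow_up_years` after the last enrollment."""
    return min_follow_up_years, accrual_years + min_follow_up_years


def accrual_years_for(max_follow_up_years, min_follow_up_years):
    """Accrual period that yields the desired maximal follow-up time."""
    if max_follow_up_years < min_follow_up_years:
        raise ValueError("maximal follow-up must be at least the minimum")
    return max_follow_up_years - min_follow_up_years


print(follow_up_range(1.5, 0.5))     # (0.5, 2.0): 6 months to 2 years
years = accrual_years_for(3.0, 0.5)  # 2.5 years of accrual
n_total = 3000                       # hypothetical total sample size
print(f"enroll about {n_total / years:.0f} subjects per year")  # 1200
```
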
We note that the power of a study to assess futility will be determined by the number of outcome events (HIV infections) that have occurred by the time point of interest (e.g., 6 months). The issue of stopping rules is complex, however. In Chapter 9, the committee discusses in depth several examples of trials that were stopped for efficacy and for futility, and situations in which an unplanned interim analysis might arise.

Phase 2B vs. Phase 3 Trials

Rather than focusing on whether to precede an effectiveness trial with an efficacy trial, investigators may want to pursue a phase 2B trial before undertaking a phase 3 effectiveness trial. Phase 2B trials tend to involve follow-up periods similar to those of phase 3 trials but enroll only one-quarter to one-third as many subjects. Phase 2B trials might also allow for a larger type II (false negative) error, and thus have lower power. Phase 2B trials provide a smaller and less resource-intensive evaluation of new interventions that, if successful, would typically lead to a larger phase 3 trial.

Fleming and Richardson (2004) provide a detailed discussion of phase 2B trials and propose that investigators consider using them to evaluate HIV microbicides, based on a four-tier rule: (1) if a trial shows that a product has low efficacy, investigators would not study it further; (2) intermediate results would prompt investigators to consider a phase 3 trial; (3) stronger results would spur a confirmatory phase 3 trial; and (4) extremely positive results would enable investigators to submit the product for regulatory approval without further trials.

However, the use of a phase 2B design raises two concerns. First, if a phase 2B trial suggests that a microbicide provides a benefit, but the evidence falls short of that needed for regulatory approval, pursuing a placebo-controlled phase 3 trial would raise ethical concerns, as noted, especially if the trial were conducted in the same region as the phase 2B trial. Second, because of their smaller size, phase 2B trials may lack the power to assess a product's safety and participants' ability to tolerate it over the long term.

As an alternative, a phase 3 effectiveness trial that includes interim analyses could reduce these concerns yet offer some of the efficiencies of a phase 2B trial. Such a design could include guidelines for continuing the trial if early results show that the product has adequate promise of efficacy. The trial design can also include futility criteria, which would prompt investigators to terminate the trial given adequate evidence that the product is not efficacious.

Identifying reliable surrogate endpoint(s) for assessing the efficacy of biomedical HIV prevention products is a challenging yet critical scientific goal that requires further research.

At present, with HIV infection as the endpoint, efficacy and effectiveness trials differ primarily in duration, in anticipated HIV incidence rates in the control group, and in the relative risk of infection for the intervention versus the control arm. Unless an efficacy trial is designed to shed light on longer-term effectiveness, investigators would likely need to follow it with a longer trial, which could raise ethical concerns about equipoise between the intervention and control arms. Similar concerns apply to phase 2B trials.
Recommendation 2-2: Until validated surrogate endpoint(s) for HIV infection or product activity is (are) identified, investigators should use modified trial designs that can provide information on both the short- and long-term benefits of an intervention.
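With HIV infection as the endpoint, power depends mainly on the number of infections observed, which in turn depends on enrollment, control-arm incidence, follow-up duration, and the relative risk. The sketch below is not taken from this chapter: it swaps in Schoenfeld's standard approximation for the number of events needed in a 1:1 comparison, treats the relative risk as a hazard ratio, and assumes a constant infection rate with no loss to follow-up or nonadherence. The function names and all numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist


def required_infections(relative_risk: float, power: float, alpha: float = 0.05) -> float:
    """Schoenfeld's approximation: infections needed for a 1:1 comparison,
    two-sided test at level alpha, treating the relative risk as a hazard ratio."""
    z = NormalDist().inv_cdf
    return 4 * (z(1 - alpha / 2) + z(power)) ** 2 / math.log(relative_risk) ** 2


def expected_infections(n_per_arm: int, control_rate: float,
                        relative_risk: float, years: float) -> float:
    """Expected infections in both arms combined over `years` of follow-up.

    Assumes a constant annual infection rate (hazard) in the control arm, a
    proportional reduction in the intervention arm, and complete follow-up.
    """
    p_control = 1 - math.exp(-control_rate * years)
    p_intervention = 1 - math.exp(-control_rate * relative_risk * years)
    return n_per_arm * (p_control + p_intervention)


def subjects_per_arm(relative_risk: float, power: float, control_rate: float,
                     years: float, alpha: float = 0.05) -> int:
    """Smallest per-arm enrollment expected to yield the required number of infections."""
    needed = required_infections(relative_risk, power, alpha)
    per_pair = expected_infections(1, control_rate, relative_risk, years)
    return math.ceil(needed / per_pair)


if __name__ == "__main__":
    # Hypothetical inputs: 4 infections per 100 person-years in the control arm, RR = 0.5.
    # Power 0.9 stands in for a phase 3 trial, 0.8 for a phase 2B trial accepting more type II error.
    for power in (0.9, 0.8):
        events = required_infections(0.5, power)
        for years in (1.0, 2.0):
            n = subjects_per_arm(0.5, power, 0.04, years)
            print(f"power={power}, follow-up={years} y: {events:.0f} infections, {n} per arm")
```

Under these assumptions, lowering power from 90% to 80% reduces the required number of infections by about a quarter, and at this low incidence doubling follow-up roughly halves the enrollment needed to observe them.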

CHOICE OF CONTROL GROUP
Late-stage biomedical HIV prevention trials have relied almost exclusively on a superiority design, which entails comparing a new biomedical intervention with a control arm while providing counseling and education on condom use and other risk-reduction activities to participants in both arms. Such trials aim to advance the field by assessing whether the new intervention is superior to the control intervention. A natural question is how to select the control intervention for such trials.
In many clinical trials, a placebo-controlled design is highly desirable to help ensure an unbiased evaluation of the relative effects of the intervention. Although this also applies to many HIV prevention trials of biomedical interventions, there are circumstances in which a blinded control group can be a disadvantage for understanding how effective the intervention would be if used in the community. The limitations arise from the possibility that people's risk-taking behavior depends on their knowledge of the intervention they are (or are not) receiving. In contrast, an unblinded trial comparing the microbicide with no biomedical intervention might provide a more realistic assessment of the impact of a microbicide when used in the community, because women's knowledge of whether they do or do not have access to a microbicide could affect how often they engage in risky behavior and whether they use a condom.
Consider a randomized trial of a microbicide gel in which all participants receive the same level of counseling on condom use. One issue is whether the control group should receive only counseling on condom use (C), or both counseling and a placebo microbicide (P). That is, if M denotes the microbicide arm, should investigators randomize participants to M versus P, M versus C, or M versus P versus C?
The key distinction between the placebo gel arm and the “condom-only” arm is that in the latter, participants know that they are not receiving the microbicide. To address this question, investigators must consider the goals of the trial as well as the possible impact of a blinded versus unblinded control group on the validity and precision of the study's results, and on their interpretation and generalizability.
Potential Advantages of Blinding
Blinding treatment arms in a randomized trial is common practice and is aimed at preventing biases that could be caused by the caregiver's or the participant's awareness of the participant's treatment arm. Such awareness can affect the accuracy and uniformity with which the trial's outcome measures are evaluated. For example, knowledge of which treatment a subject is receiving could affect the caregiver's assessment of that subject's health status. This is especially a concern in studies with subjective endpoints, such as trials that evaluate pain or cognition, or trials evaluating self-reported risky behavior, such as unprotected sexual intercourse.
Such concerns are substantially smaller in studies with objective endpoints, such as HIV infection, which is based entirely on laboratory testing of blood samples. This is especially true if laboratory assessments are blinded to treatment arm (even when participants and caregivers are not). However, self-reported information on other study outcomes, such as sexual risk behavior or adherence to the treatment protocol, could be affected by a subject's knowledge of her or his treatment arm, and investigators cannot easily distinguish differential reporting of such outcomes from differential behavior among study arms.
Blinding can also affect study results in other ways. Specifically, when intervention arms are not blinded, caregivers might offer different levels of support, treatment, and counseling to study participants with otherwise similar risk behavior profiles, and thus affect their primary study outcomes. Failure to account for this might cause investigators to misinterpret study results if their interpretation assumes that concomitant care is identical across study arms.
The choice of whether to blind a study can also affect attendance at study visits and losses to follow-up. For example, participants who know they are not receiving a new intervention might miss more study visits, or drop out of the study at a higher rate, than participants who know they are receiving it. Such differential losses or missing data will, at a minimum, reduce the study's power and, more importantly, can bias assessments of the relative efficacy of a new intervention.
Thus, for HIV prevention trials, an important concern is the effect of blinding (or not blinding) on the completeness of study visits, on retention, and on the comparability of counseling across arms. However, numerous unblinded HIV prevention trials have achieved excellent retention rates that did not differ between arms (Guay et al., 1999; Thior et al., 2006).
Potential Disadvantages of Blinding
Blinding through the use of a placebo can also have disadvantages. One would arise if the placebo had a direct effect on the study outcome. For example, a placebo microbicide gel might biologically reduce the risk of HIV infection, or it might have lubricant properties that reduce vaginal abrasion and thus the risk of becoming infected (Nuttall et al., 2007). Such effects, while beneficial for study participants randomized to a placebo arm, can lead to biased estimates of the benefits of the microbicide gel (see, for example, Kilmarx and Paxton, 2003).
Another potential disadvantage of a blinded placebo arises because people in the community know whether they are using an intervention, and that knowledge could affect their behavior, such as the frequency or types of risky behavior. Behavior among trial participants who know their treatment arm may therefore more accurately reflect the behavior of individuals in the community. If so, effectiveness estimates from a trial with a placebo control arm may not reflect the effectiveness of the intervention if it were implemented (Jones et al., 2003; Padian, 2004). For example, a recently published unblinded diaphragm trial found that women who were randomized to receive diaphragms did not experience a lower HIV infection rate than those who did not receive diaphragms, yet the former group reported much lower condom use (Padian et al., 2007). One view of this trial's results is that it failed because the lack of a placebo diaphragm arm, which presumably would have led to condom use similar to that in the active diaphragm arm, prevented an assessment of the protective effects of the diaphragm. However, another view is that the trial results reflect what might happen if the diaphragm were introduced into the community: a possible protective effect per sexual act might not translate into a reduced overall risk of HIV infection because of lower condom use.
The following sections consider the potential advantages and limitations of three design strategies in the case of a microbicide gel trial.
Design 1: Microbicide (M) vs. Placebo (P)
A potential advantage of this design is that it is more likely than design 2 to yield similar study retention rates in the two arms. One disadvantage is that investigators will not be able to determine whether the placebo has any direct effect on the risk of becoming HIV infected.
Another disadvantage is that the resulting estimates of the benefit of the microbicide are less likely than those from design 2, which uses no placebo, to reflect the actual effectiveness of the microbicide were it introduced into the community. If a trial based on design 1 had high adherence and yielded a positive result, it would establish the biological efficacy of the microbicide gel, though not its effectiveness. If the trial provided convincing evidence of little or no benefit, investigators might conclude that the microbicide would not be effective in a community setting. However, such a result is also consistent with the hypothesis that the placebo was not inert: that is, that both the placebo gel and the microbicide have a biologically inhibitory effect on the risk of HIV infection.
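To make concrete why a null result from design 1 (M vs. P) is ambiguous, here is a small sketch with hypothetical risks. It assumes each gel multiplies a baseline no-gel risk by a fixed factor and ignores behavioral differences between arms and sampling error. If the "placebo" gel is as protective as the microbicide, the M vs. P comparison shows a relative risk of 1 even though both gels lower risk relative to no gel. The class and function names and all numbers are illustrative assumptions, not data from any trial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmRisks:
    """Hypothetical cumulative HIV infection risks over a fixed follow-up period."""
    no_gel: float       # counseling only (condom arm, C)
    placebo_gel: float  # counseling plus placebo gel (P)
    microbicide: float  # counseling plus active gel (M)


def risks_from_effects(base_risk: float, placebo_rr: float, microbicide_rr: float) -> ArmRisks:
    """Arm risks from an assumed no-gel risk and assumed gel effects relative to no gel.

    placebo_rr < 1 encodes a placebo that is not inert (e.g., a lubricant effect).
    """
    return ArmRisks(
        no_gel=base_risk,
        placebo_gel=base_risk * placebo_rr,
        microbicide=base_risk * microbicide_rr,
    )


def rr_m_vs_p(risks: ArmRisks) -> float:
    """Relative risk that design 1 (M vs. P) would estimate, absent sampling error."""
    return risks.microbicide / risks.placebo_gel


def rr_m_vs_c(risks: ArmRisks) -> float:
    """Relative risk that design 2 (M vs. C) would estimate if behavior were the same in both arms."""
    return risks.microbicide / risks.no_gel


if __name__ == "__main__":
    inert = risks_from_effects(0.05, placebo_rr=1.0, microbicide_rr=0.6)
    leaky = risks_from_effects(0.05, placebo_rr=0.6, microbicide_rr=0.6)
    print(rr_m_vs_p(inert), rr_m_vs_c(inert))  # both about 0.6
    print(rr_m_vs_p(leaky), rr_m_vs_c(leaky))  # 1.0 vs. about 0.6
```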

Design 2: Microbicide (M) vs. Condom (C)
A potential advantage of this design is that, compared with design 1, the trial results would be expected to reflect more closely the true effectiveness of the microbicide if it were introduced into the community. Another advantage is that the trial would provide information on potentially differential effects of the microbicide and condom arms on reported risky behavior. In the unblinded trial reported by Padian et al. (2007), which compared a diaphragm with lubricant gel plus condom provision against condom provision only, self-reported condom use was significantly higher in the condom-only arm. However, a potential disadvantage of this design is that a subject's knowledge of his or her treatment arm could lead to differential rates of study retention or missed visits.
Suppose that such a trial led to comparable retention rates in the M and C arms. If the trial results were positive, the study would provide more direct evidence than design 1 that the microbicide would be effective if introduced into the community. If the results were negative, introducing the microbicide into the population from which participants were drawn could not be justified, although this might not rule out evaluating the microbicide in a different population. For example, if the trial demonstrated disinhibition in the M arm, also known as risk compensation (Cassell et al., 2006), that is, participants engaging in more risky behavior because they believe they are protected by the test intervention, this may have caused the lack of an apparent benefit of the microbicide, and investigators might obtain different results in a population where disinhibition is less likely.
Design 3: Microbicide (M) vs. Placebo (P) vs. Condom (C)
This three-arm design offers the advantages of both designs 1 and 2 and avoids the disadvantages of each, except for possible differential retention rates between the M/P and C arms (Fleming and Richardson, 2004).
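The three-arm (M vs. P vs. C) design supports three pairwise comparisons, whose interpretation is discussed next. As a sketch of how they might be tabulated, the Python below computes each risk ratio with a log-scale Wald confidence interval (a standard textbook method, swapped in here for illustration rather than taken from this chapter) from hypothetical infection counts. It assumes complete follow-up and ignores interim analyses and multiplicity; the arm labels follow the text, and the function names and counts are assumptions.

```python
import math
from statistics import NormalDist
from typing import Dict, Tuple


def risk_ratio_ci(events_a: int, n_a: int, events_b: int, n_b: int,
                  level: float = 0.95) -> Tuple[float, float, float]:
    """Risk ratio of arm A vs. arm B with a Wald confidence interval on the log scale.

    Uses cumulative proportions infected, so it assumes complete follow-up;
    requires at least one infection in each arm.
    """
    if events_a == 0 or events_b == 0:
        raise ValueError("the log-scale Wald interval needs at least one infection per arm")
    rr = (events_a / n_a) / (events_b / n_b)
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)


def three_arm_comparisons(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[float, float, float]]:
    """Pairwise risk ratios for arms 'M', 'P', and 'C', given (infections, participants) per arm."""
    pairs = {
        "M:P": ("M", "P"),  # direct (biological) effect of the active gel vs. placebo gel
        "M:C": ("M", "C"),  # closer to effectiveness: active gel vs. knowingly no gel
        "P:C": ("P", "C"),  # behavioral effect of blinding plus any direct placebo effect
    }
    return {label: risk_ratio_ci(*counts[a], *counts[b]) for label, (a, b) in pairs.items()}


if __name__ == "__main__":
    # Hypothetical counts (infections, participants), for illustration only.
    counts = {"M": (30, 1000), "P": (40, 1000), "C": (50, 1000)}
    for label, (rr, lo, hi) in three_arm_comparisons(counts).items():
        print(f"{label}: RR={rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Keeping all three ratios side by side makes the efficacy (M:P) versus effectiveness (M:C) contrast explicit, while P:C flags whether the placebo arm behaved differently from the condom-only arm.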
Comparison of the P and C arms would shed light on differences in behavior between participants who know they are not receiving a microbicide and those who know they could be receiving one, and possibly on any direct effects of the placebo. Comparison of M and P would be expected to reflect the direct effects of the microbicide gel (relative to placebo gel) on susceptibility to HIV infection. If the P and C groups had similar HIV incidence rates and behaviors during follow-up, future studies in the same population might not require both control groups. A comparison of the M:P and M:C relative risks would potentially reflect the efficacy versus the effectiveness of the intervention.
The ongoing HPTN 035 microbicide trial employs this design (but with two microbicide arms) and may shed some initial light on these issues. However, the committee cautions that the results of HPTN 035 regarding both the direct efficacy of the product and its impact on behavior would not necessarily generalize to other populations, because the impact on behavior of knowing one is not receiving a microbicide, versus knowing one might be receiving it, may differ across populations. Thus this trial would not necessarily eliminate the need for other trials with multiple control groups.
Although the M versus P versus C design is more expensive because of its larger sample size and greater complexity, successfully completing one or more such trials could yield important benefits. As noted, because such a trial could assess the impact of the placebo, the potential benefit of the new intervention would be clearer. Given that so little is understood about the interplay between intervention and risk behavior, and given the strong impact of adherence and risk behavior on the ultimate effectiveness of an intervention, the committee believes that the potential economic disadvantage does not outweigh the substantial potential benefits of a dual-control design.
Recommendation 2-3: Sponsors, investigators, and regulatory agencies should consider using both blinded and unblinded control groups in future trials to more fully understand the effects of the intervention on HIV infection risk and behavior.
REFERENCES
Bailey, R. C., S. Moses, C. B. Parker, K. Agot, I. Maclean, J. N. Krieger, C. F. Williams, R. T. Campbell, and J. O. Ndinya-Achola. 2007. Male circumcision for HIV prevention in young men in Kisumu, Kenya: A randomised controlled trial. Lancet 369(9562):643-656.
Brittain, G. P., C. K. Rostron, D. B. Morton, and J. E. Rees. 1989. The use of a biological adhesive to achieve sutureless epikeratophakia. Eye 3(Pt 1):56-63.
Cassell, M. M., D. T. Halperin, J. D. Shelton, and D. Stanton. 2006. Risk compensation: The Achilles’ heel of innovations in HIV prevention? British Medical Journal 332(7541):605-607.
Dolan, K., S. Rutter, and A. D. Wodak. 2003. Prison-based syringe exchange programmes: A review of international research and development. Addiction 98(2):153-158.
Fleming, T. R., and B. A. Richardson. 2004. Some design issues in trials of microbicides for the prevention of HIV infection. Journal of Infectious Diseases 190(4):666-674.
Foster, G. D., H. R. Wyatt, J. O. Hill, B. G. McGuckin, C. Brill, B. S. Mohammed, P. O. Szapary, D. J. Rader, J. S. Edman, and S. Klein. 2003. A randomized trial of a low-carbohydrate diet for obesity. New England Journal of Medicine 348(21):2082-2090.
Freedman, L. S. 1990. The effect of partial noncompliance on the power of a clinical trial. Controlled Clinical Trials 11(3):157-168.
Guay, L. A., P. Musoke, T. Fleming, D. Bagenda, M. Allen, C. Nakabiito, J. Sherman, P. Bakaki, C. Ducar, M. Deseyve, L. Emel, M. Mirochnick, M. G. Fowler, L. Mofenson, P. Miotti, K. Dransfield, D. Bray, F. Mmiro, and J. B. Jackson. 1999. Intrapartum and neonatal single-dose nevirapine compared with zidovudine for prevention of mother-to-child transmission of HIV-1 in Kampala, Uganda: HIVNET 012 randomised trial. Lancet 354(9181):795-802.
Jo, B. 2002. Statistical power in randomized intervention studies with noncompliance. Psychological Methods 7(2):178-193.
Jones, H., J. van de Wijgert, and E. Kelvin. 2003. The need for a “condoms-only” control group in microbicide trials. Epidemiology 14(4):505; author reply 505-506.
Kilmarx, P. H., and L. Paxton. 2003. Need for a true placebo for vaginal microbicide efficacy trials. Lancet 361(9359):785-786; author reply 786.
NIAID (National Institute of Allergy and Infectious Diseases). 2007. Statement: Immunizations are discontinued in two HIV vaccine trials. http://www3.niaid.nih.gov/news/newsreleases/2007/step_statement.htm (accessed December 15, 2007).
Nunn, A. 2007. Issues in microbicide trial design, monitoring, and analysis. Paper read at the second public meeting for the Committee on Methodological Challenges in HIV Prevention Trials, April 19, London, UK.
Nuttall, J., J. Romano, K. Douville, C. Galbreath, A. Nel, W. Heyward, M. Mitchnick, S. Walker, and Z. Rosenberg. 2007. The future of HIV prevention: Prospects for an effective anti-HIV microbicide. Infectious Disease Clinics of North America 21(1):219-239.
Padian, N., A. van der Straten, G. Ramjee, T. Chipato, G. de Bruyn, K. Blanchard, S. Shiboski, E. Montgomery, H. Fancher, H. Cheng, M. Rosenblum, M. van der Laan, N. Jewell, J. McIntyre, and the MIRA Team. 2007. Diaphragm and lubricant gel for prevention of HIV acquisition in southern African women: A randomised controlled trial. Lancet 370:251-261.
Padian, N. S. 2004. Evidence-based prevention: Increasing the efficiency of HIV intervention trials. Journal of Infectious Diseases 190(4):663-665.
Prentice, R. L. 1989. Surrogate endpoints in clinical trials: Definition and operational criteria. Statistics in Medicine 8(4):431-440.
Samaha, F. F., N. Iqbal, P. Seshadri, K. L. Chicano, D. A. Daily, J. McGrory, T. Williams, M. Williams, E. J. Gracely, and L. Stern. 2003. A low-carbohydrate as compared with a low-fat diet in severe obesity. New England Journal of Medicine 348(21):2074-2081.
Thior, I., S. Lockman, L. M. Smeaton, R. L. Shapiro, C. Wester, S. J. Heymann, P. B. Gilbert, L. Stevens, T. Peter, S. Kim, E. van Widenfelt, C. Moffat, P. Ndase, P. Arimi, P. Kebaabetswe, P. Mazonde, J. Makhema, K. McIntosh, V. Novitsky, T. H. Lee, R. Marlink, S. Lagakos, and M. Essex. 2006. Breastfeeding plus infant zidovudine prophylaxis for 6 months vs. formula feeding plus infant zidovudine for 1 month to reduce mother-to-child HIV transmission in Botswana: A randomized trial. The Mashi Study. Journal of the American Medical Association 296(7):794-805.
Vlahov, D., and B. Junge. 1998. The role of needle exchange programs in HIV prevention. Public Health Reports 113(Suppl 1):75-80.
Zelen, M. 1988. Statistical issues in the planning of prevention studies. Cancer Investigation 6(5):615-620.