Suggested Citation:"2 Design of Small Clinical Trials." Institute of Medicine. 2001. Small Clinical Trials: Issues and Challenges. Washington, DC: The National Academies Press. doi: 10.17226/10078.

2 Design of Small Clinical Trials

The design and conduct of any type of clinical trial require three considerations: first, the study should examine valuable and important biomedical research questions; second, it must be based on a rigorous methodology that can answer a specific research question being asked; and third, it must be based on a set of ethical considerations, adherence to which minimizes the risks to the study participants (Sutherland, Meslin, and Till, 1994). The choice of an appropriate study design depends on a number of considerations, including:

the ability of the study design to answer the primary research question;
whether the trial is studying a potential new treatment for a condition for which an established, effective treatment already exists;
whether the disease for which a new treatment is sought is severe or life-threatening;
the probability and magnitude of risk to the participants;
the probability and magnitude of likely benefit to the participants;
the population to be studied—its size, availability, and accessibility; and
how the data will be used (e.g., to initiate treatment or as preliminary data for a larger trial).

Because the choice of a study design for any particular trial will depend on these and other factors, no general prescription can be offered for the design of clinical trials. However, certain key issues are raised when randomized clinical trials (RCTs) with adequate statistical power are not feasible and when studies with smaller populations must be considered. The utility of such studies may be diminished, but not completely lost, and in other ways may be enhanced.

To understand what is lost or gained in the design and conduct of studies with very small numbers of participants, it is important to first consider the basic tenets of clinical trial design (Box 2-1).

KEY CONCEPTS IN CLINICAL TRIAL DESIGN

Judgments about the effectiveness of a given intervention ultimately rest on an interpretation of the strength of the evidence arising from the data collected. In general, the more controlled the trial, the stronger is the evidence.

The study designs for clinical trials can take several forms, most of which are based on an assumption of accessible sample populations. Clinical trials of efficacy ask whether the experimental treatment works under ideal conditions. In contrast, clinical trials of effectiveness ask whether the experimental treatment works under ordinary circumstances. Often, trials of efficacy are not as sensitive to issues of access to care, the generalizability of the results from a study with a highly selective sample of patients and physicians, and the level of adherence to treatment regimens. Thus, when a trial of efficacy is done with a small sample of patients, it is not clear whether the experimental intervention will be effective when a broader range of providers and patients use the intervention. On the other hand, trials of effectiveness can be problematic if they produce a negative result, in which case it will be unclear whether the experimental intervention would fail under any circumstances. Thus, whether a trial of efficacy or a trial of effectiveness is preferred in a small clinical study is an important consideration.

BOX 2-1
Important Concepts in Clinical Trial Design

Does the trial measure efficacy or effectiveness?
A method of reducing bias (randomization and masking [blinding])
Inclusion of control groups
    Placebo concurrent controls
    Active treatment concurrent controls (superiority versus equivalence trial)
    No-treatment concurrent controls
    Dose-comparison concurrent controls
    External controls (historical or retrospective controls)
Use of masking (blinding) or an open-label trial
    Double-blind trial
    Single-blind trial
Randomization
    Use of randomized versus nonrandomized controls
Outcomes (endpoints) to be measured: credible, validated, and responsive to change
Sample size and statistical power
Significance tests to be used

In the United States, the Food and Drug Administration (FDA) oversees the regulation and approval of drugs, biologics, and medical devices. Its review and approval processes affect the design and conduct of most new clinical trials. Preclinical testing of an experimental intervention is performed before investigators initiate a clinical trial. These studies are carried out in the laboratory and in studies with animals to provide preliminary evidence that the experimental intervention will be safe and effective for humans. FDA requires preclinical testing before clinical trials can be started. Safety information from preclinical testing is used to support a request to FDA to begin testing the experimental intervention in studies with humans.

Clinical trials are usually classified into four phases. Phase I trials are the earliest-stage clinical trials used to study an experimental drug in humans, are typically small (fewer than 100 participants), and are often used to determine the toxicity and maximum safe dose of a new drug. They provide an initial evaluation of a drug's safety and pharmacokinetics. Such studies also usually test various doses of the drug to obtain an indication of the appropriate dose to be used in later studies. Phase I trials are commonly conducted with nondiseased individuals (healthy volunteers). Some phase I trials, for example, those of treatments for cancer, are performed with individuals with advanced disease for whom all other standard treatments have failed (Heyd and Carlin, 1999).

Phase II trials are often aimed at gathering preliminary data on whether a drug has clinical efficacy and usually involve 100 to 300 participants. Frequently, phase II trials are used to determine the efficacy and safety of an intervention in participants with the disease for which a new intervention is being developed.

Phase III trials are advanced-stage clinical trials designed to show conclusively how well a drug works. Phase III trials are usually larger, frequently multi-institutional studies, and typically involve from a hundred to thousands of participants. They are comparative in nature, with participants usually assigned by chance to at least two arms, one of which serves as a control or a reference arm and one or more of which involve new interventions. Phase III trials generally measure whether a new intervention extends survival or improves the health of participants receiving the intervention and whether it has fewer side effects.

Some phase II and phase III trials are designed as pivotal trials (sometimes also called confirmatory trials), which are adequately controlled trials in which the hypotheses are stated in advance and evaluated. The goal of a pivotal trial is to attempt to eliminate systematic biases and increase the statistical power of a trial. Pivotal trials are intended to provide firm evidence of safety and efficacy.

Occasionally, FDA requires phase IV trials, usually performed after a new drug or biologic has been approved for use. These trials are post-marketing surveillance studies aimed at obtaining additional information about the risks, benefits, and optimal use of an intervention. For example, a phase IV trial may be required by FDA to study the effects of an intervention in a new patient population or for a stage of disease different from that for which it was originally tested. Phase IV trials are also used to assess the long-term effects of an intervention and to reveal rare but serious side effects.

One criticism of the classification of clinical trials presented above is that it focuses on the requirements for the regulation of pharmaceuticals, leaving out the many other medical products that FDA regulates. For example, new heart valves are evaluated by FDA on the basis of their ability to meet predetermined operating performance characteristics. Another example is the intraocular lens, whose performance must meet the criteria of a prespecified grid. Medical device studies, however, rely on a great deal of information about the behavior of the control group that often cannot be obtained or that is very difficult to obtain in small clinical trials because of the small number or lack of control participants.

A much more inclusive and general approach that subsumes the four phases of clinical trials is put forth by Piantadosi (1997), who defines the four phases as (1) early-development studies (testing the treatment mechanism), (2) middle-development studies (treatment tolerability), (3) comparative (pivotal, confirmatory) studies, and (4) late-development studies (extended safety or postmarketing studies). This approach is more inclusive than trials of pharmaceuticals; it includes trials of vaccines, biological and gene therapies, screening devices, medical devices, and surgical interventions.

The ethical conduct of a clinical study of the benefits of an intervention requires that it begin in a state of equipoise. Equipoise is defined as the point at which a rational, informed person—whether patient, provider, or researcher—has no preference between two (or more) available treatments (Freedman, 1987; Lilford and Jackson, 1995). When used in the context of research, equipoise describes a state of genuine uncertainty about whether the experimental intervention offers greater benefit or harm than the control intervention. Equipoise is advocated as a means of achieving high scientific and ethical standards in randomized trials (Alderson, 1996). True equipoise might be more of a challenge in small clinical trials, because the degree of uncertainty might be diminished by the nature of the disorder, the lack of real choices for treatment, or insufficient data to make a judgment about the risks of one treatment arm over another.

A primary purpose of many clinical trials is evaluation of the efficacy of an experimental intervention. In a well-designed trial, the data that are collected and the observations that are made will eventually be used to overturn the equipoise. At the end of a trial, when it is determined whether an experimental intervention has efficacy, the state of clinical equipoise has been eliminated. Central principles in proving efficacy, and thereby eliminating equipoise, are avoiding bias and establishing statistical significance. This is ideally done through the use of controls, randomization, blinding of the study, credible and validated outcomes responsive to small changes, and a sufficient sample size. In some trials, including small clinical studies, the elimination of equipoise in such a straightforward manner might be difficult. Instead, estimation of a treatment effect as precisely as necessary may be sufficient to distinguish the effect from zero. It is a more nuanced approach, but one that should be considered in the study design.

Adherence to an ethical process, whereby risks are minimized and voluntary informed consent is obtained, is essential to any research involving humans. The need may be particularly acute in small clinical trials, in which the sample population might be easily identified and potentially more vulnerable. Study designs that incorporate an ethical process may help in reducing concerns about some of the problems in design and interpretation that naturally accompany small clinical trials.

Reducing Bias

Bias in clinical trials is the potential of any aspect of the design, conduct, analysis, or interpretation of the results of a trial to lead to conclusions about the effects of an intervention that are systematically different from the truth (Pocock, 1984). It is both a scientific and an ethical issue. It is relatively easy to identify potential sources of bias in clinical trials, but investigators have a limited ability to effectively remove the effects of bias. It is often difficult to even determine the net direction and effect of bias on the study results. Randomization and masking (blinding) are the two techniques generally used to minimize bias and to maximize the probability that the test intervention and control groups are similar at the start of the study and are treated similarly throughout its course (Pocock, 1984). Clinical trials with randomized controls and with blinding, when practical and appropriate, represent the standard for the evaluation of therapeutic interventions.

Improper randomization or imperfect masking may result in bias. However, bias may work in any direction (Hauck and Anderson, 1999). In addition, the data for participants who withdraw or are lost from the trial can bias the results.
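As a rough illustration of how randomization can be carried out while keeping the arms of a small trial similar in size, the sketch below uses permuted-block randomization. The report does not prescribe this scheme; the block size, arm labels, seed, and the helper name permuted_block_assignments are assumptions made only for this example.

```python
import random


def permuted_block_assignments(n_participants, arms=("treatment", "control"),
                               block_size=4, seed=None):
    """Assign participants to arms in shuffled blocks (hypothetical helper).

    Each block contains every arm equally often, so arm sizes stay balanced
    after each completed block, which matters when few participants enroll.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    assignments = []
    while len(assignments) < n_participants:
        # Build one balanced block, then shuffle the order within it.
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_participants]


# Illustrative schedule for 12 participants (seed chosen arbitrarily).
schedule = permuted_block_assignments(12, seed=2001)
```

In a masked trial, a schedule of this kind would be generated and held by someone not involved in enrolling or assessing participants, so that upcoming assignments cannot be anticipated.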

Alternative Types of Control Groups

A control group in a clinical trial is a group of individuals used as a comparison for a group of participants who receive the experimental treatment. The main purpose of a control group is to permit investigators to determine whether an observed effect is truly caused by the experimental intervention being tested or by other factors, such as the natural progression of the disease, observer or participant expectations, or other treatments (Pocock, 1996). The experience of the control group lets the investigator know what would have happened to study participants if they had not received the test intervention or what would have happened with a different treatment known to be effective. Thus, the control group serves as a baseline.

There are numerous types of control groups, some of which can be used in small clinical trials. FDA classifies clinical trial control groups into five types: placebo concurrent controls, active-treatment concurrent controls, no-treatment concurrent controls, dose-comparison concurrent controls, and external controls (Food and Drug Administration, 1999). Each type of control group has its strengths and weaknesses, depending on the scientific question being asked, the intervention being tested, and the group of participants involved.

In a trial with placebo concurrent controls, the experimental intervention is compared with a placebo. Participants are randomized to receive either the new intervention or a placebo. Most placebo-controlled trials are also double blind, so that neither the participants nor the physician, investigator, or evaluator knows who is assigned to the placebo group and who will receive the experimental intervention. Placebo-controlled trials also allow a distinction between adverse events due to the intervention and those due to the underlying disease or other potential interference, if they occur sufficiently frequently to be detected with the available sample size. It is generally accepted that a placebo-controlled trial would not be ethical if an established, effective treatment that is known to prevent serious harm, such as death or irreversible injury, is available for the condition being studied (World Medical Association, 1964). There may be some exceptions, however, such as cases in which the established, effective treatment does not work in certain populations or it has such adverse effects that patients refuse therapy. The most recent version of the Declaration of Helsinki (October 2000 [World Medical Association, 2000]) argues that use of a placebo is unethical regardless of the lack of severity of the condition and regardless of whether the best possible treatment is available in the setting or location in which the trial is being conducted. The benefits, risks, burdens, and effectiveness of a new method should be tested against those of the best current prophylactic, diagnostic, and therapeutic methods. At present, many U.S. scientists (including those at FDA) disagree with that point of view. The arguments are complex and need additional discussion and time before a consensus can be achieved if this new direction or another one similar to it is to replace the previous recommendation.

Although placebos are still the most common control used in pharmaceutical trials, it is increasingly common to compare an experimental intervention with an existing established, effective treatment. Active-treatment concurrent control trials are extremely useful in cases in which it would not be ethical to give participants a placebo because doing so would pose undue risk to their health or well-being. In an active-control study, participants are randomly assigned to the experimental intervention or to an alternative therapy, the active-control treatment. Such trials are usually double blind, but this is not always possible. For example, many oncology studies are considered impossible to blind because of different regimens, different routes of administration, and different toxicities (Heyd and Carlin, 1999). Despite the best intentions, some treatments have unintended effects that are so specific that their occurrence will inevitably identify the treatment received to both the patient and the medical staff. It is particularly important to do everything possible to have blinded interpretation of outcome variables or critical endpoints when the type of treatment is obvious. In a study in which an active control is used, it may be difficult to determine whether any of the treatments has an effect unless the effects of the treatments are obvious, a placebo control is included, or a placebo-controlled trial has previously demonstrated the efficacy of the active control.

Active treatment-controlled trials can take two forms: a superiority trial, in which the new drug is evaluated to determine if it is superior to the active control, and an equivalence trial (a noninferiority trial), in which the new drug is tested to determine if it is equivalent to but not inferior to the active control (Hauck and Anderson, 1999). Equivalence trials are designed to show that the new intervention is as effective or nearly as effective as the established effective treatment. For diseases for which an established, effective treatment is available and in use, a common design randomizes participants to receive either an experimental intervention or the established, effective treatment. It is not scientifically possible to prove that two different interventions are exactly equivalent, only that they are nearly equivalent.
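The idea that a trial can show only that two interventions are "nearly equivalent" is often made concrete by comparing a confidence interval for the difference in outcomes with a prespecified margin. The sketch below is one minimal way to do that for response rates, assuming a normal approximation and a two-sided 95 percent interval. The margin of 0.10, the counts, and the function name noninferiority_check are hypothetical and do not come from the report.

```python
import math


def noninferiority_check(success_new, n_new, success_ctrl, n_ctrl,
                         margin, z=1.96):
    """Compare the CI for (new - control) response rates with a margin.

    Assumes a normal approximation; z=1.96 gives a two-sided 95% interval.
    """
    p_new = success_new / n_new
    p_ctrl = success_ctrl / n_ctrl
    diff = p_new - p_ctrl
    se = math.sqrt(p_new * (1 - p_new) / n_new
                   + p_ctrl * (1 - p_ctrl) / n_ctrl)
    lower, upper = diff - z * se, diff + z * se
    return {
        "difference": diff,
        "ci": (lower, upper),
        # Noninferior: even the lower bound is no worse than -margin.
        "noninferior": lower > -margin,
        # Superior: the whole interval lies above zero.
        "superior": lower > 0,
    }


# Same observed rates (80% vs. 82%), two hypothetical sample sizes.
small = noninferiority_check(80, 100, 82, 100, margin=0.10)
large = noninferiority_check(320, 400, 328, 400, margin=0.10)
```

With these made-up numbers the observed difference is identical in both cases, but only the larger trial has an interval narrow enough to fall inside the margin. This is the practical difficulty that equivalence designs pose for small studies.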

In a trial with no-treatment concurrent controls, a group receiving the experimental intervention is compared with a group not receiving the treatment or placebo. The randomized no-treatment control trial is similar to the placebo-controlled trial. However, since it often cannot be fully blinded, several aspects of the trial may be affected, including retention of participants, patient management, and all aspects of observation (Food and Drug Administration, 1999). A no-treatment concurrent control trial is usually used when blinding is not feasible, such as when a sham surgery would have to be used or when the side effects of the experimental intervention are obvious. No-treatment concurrent control trials can also be used when the effects of the treatment are obvious and there is a small placebo effect. To reduce bias when a no-treatment control is used, it is desirable that those responsible for clinical assessment remain blinded.

In a dose-comparison concurrent control trial, participants are assigned to one of several dose groups so that the effects of different doses of the test drug (dose-response) can be compared. Most dose-response-controlled trials are randomized and double blind. They may include a placebo group or an active control group or both. For example, it is not uncommon to show no difference between doses in a dose-response study. Unless the action of the drug is obvious, inclusion of a placebo group is extremely useful to determine if the drug being tested has no effect at all or a constant positive effect above the minimum dose.

There are several advantages to using a dose-response control instead of a placebo control. When an experimental intervention has pharmacological effects that could break the blinding, it may be easier to preserve blinding in a dose-response study than in a placebo-controlled trial (Food and Drug Administration, 1999). Also, if the optimally safe and effective dose of an experimental intervention is not known, it may be more useful to study a range of doses than to choose a single dose that may be suboptimal or toxic (Pocock, 1996). Sometimes the optimal dose of a drug has unacceptable toxicity and a lower dose, even though it is not optimal for the treatment of the disease, is safer. In this case, a dose-response-controlled trial can be used to optimize the effective dose while minimizing the concomitant toxicity. However, the same ethical issues related to withholding an established, effective treatment from participants in placebo-controlled trials are relevant in a dose-response study (Clark and Leaverton, 1994).

In an external control trial, participants receiving the intervention being tested are compared with a group of individuals who are separate from the population tested in the trial. The most common type of external control is a historical control (sometimes called a retrospective control) (Gehan, 1982). Individuals receiving the experimental intervention are compared with a group of individuals tested at an earlier time. For example, the results of a prior clinical trial published in the medical literature may serve as a historical control. The major problem with historical controls is that one cannot ensure that the comparison is fair because of the variability in patient selection and the experimental environment. If historical controls are obtained from a previous trial conducted in the same environment or by the same investigators, there is a greater chance of reducing the potential bias (Pocock, 1984). Studies have shown that externally controlled trials tend to overestimate the efficacies of experimental treatments (Sacks, Chalmers, and Smith, 1982), although one example has found the treatment effect to be underestimated (Farewell and D'Angio, 1981). Therefore, when selecting an external control, it is extremely important to try to control for these biases by selecting the control group before testing of the experimental intervention and ensuring that the control group is similar to the experimental group in as many ways as possible.

Trials with external controls sometimes compare the group receiving the experimental intervention with a group tested during the same time period but in another setting. A variation of an externally controlled trial is a baseline-controlled trial (e.g., a before-and-after trial). In a baseline-controlled trial, the health condition of the individuals before they received the experimental intervention is compared with their condition after they have received the intervention.

It is increasingly common for studies to have more than one type of control group, for example, both an active control and a placebo control. In those trials the placebo control serves as an internal control to provide evidence that the active control had an effect. Some trials compare several doses of a test drug with several doses of an active control drug, all of which may then be compared with a placebo.

In some instances, the only practical way to design a clinical trial is as an uncontrolled trial. Uncontrolled trials are usually used to test new experimental interventions for diseases for which no established, effective treatments are available and the prognosis is universally poor without therapy. In uncontrolled trials, there is no control group for comparison, and it is not possible to use blinding and randomization to minimize bias. Uncontrolled trials are similar to externally controlled trials, in the sense that the outcomes for research participants receiving the experimental intervention are compared with the outcomes before the availability of the intervention. Therefore, the scientific grounds for the experimental intervention must be strong enough and its effects must be obvious enough for the positive results of an uncontrolled trial to be accepted. History is replete with examples of failed uncontrolled trials, such as those for the drug laetrile and the anticancer agent interferon (Pocock, 1984).

Matching and Stratification

In many cases investigators may be faced with a situation in which they have a potentially large historical control sample that they want to compare with a small experimental sample in terms of one or more endpoints. This is typically a problem in observational studies in which the individuals have not been randomized to the control and experimental groups. The question is, how does one control for the bias inherent in the observational nature of these data? Perhaps the experimental participants have in some way been self-selected for their illness or the intervention that they have received. This is not a new issue. In fact, it is closely related to statistical thinking and research on analysis of observational data and causal inference. For example, as early as 1968, William G. Cochran considered the use of stratification and subclassification as a tool for removing bias in observational studies. In a now classic example, Cochran examined the relationship between mortality and smoking using data from a large medical database (Cochran, 1968). The first row of Table 2-1 shows that cigarette smoking is unrelated to mortality, but pipe smoking appears to be quite lethal. The result of this early data-mining exercise could have easily misled researchers for some time at the early stages of scientific discovery. It turns out that, at least at the time that these data were collected, pipe smokers were on average much older than cigarette smokers, hence the false association with an increased rate of mortality in the non-stratified group. Cochran (1968) illustrated the effect that stratification (i.e., by age) has on the direction and ultimate interpretation of the results, revealing the association between cigarette smoking and mortality (Table 2-1).

TABLE 2-1 Smoking and Mortality

Mortality Rate per 1,000 Person-Years, by Number of Age Strata (Subclasses)
Number of strata	Nonsmokers	Cigarette Smokers	Pipe and Cigar Smokers
One (all ages in database)	13.5	13.5	17.4
Two	13.5	16.4	14.9
Three	13.5	17.7	14.2
Ten	13.5	21.2	13.7

SOURCE: Cochran (1968).
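The reversal produced by age stratification can be reproduced with a small worked example. The sketch below uses direct standardization: each group's stratum-specific death rates are reweighted to the nonsmokers' age distribution. All person-years and death counts are hypothetical numbers chosen to mimic the pattern in Table 2-1; they are not Cochran's data, and the two age strata and function names are assumptions for illustration.

```python
# Hypothetical (person_years, deaths) by age stratum; NOT Cochran's figures.
data = {
    "nonsmokers": {"younger": (8000, 40), "older": (2000, 60)},
    "cigarette": {"younger": (9000, 72), "older": (1000, 45)},
    "pipe_cigar": {"younger": (2000, 10), "older": (8000, 240)},
}


def crude_rate(strata):
    """Deaths per 1,000 person-years, ignoring age."""
    person_years = sum(py for py, _ in strata.values())
    deaths = sum(d for _, d in strata.values())
    return 1000 * deaths / person_years


def adjusted_rate(strata, reference):
    """Stratum-specific rates weighted by the reference age distribution."""
    ref_total = sum(py for py, _ in reference.values())
    return sum(
        (reference[age][0] / ref_total) * (1000 * deaths / py)
        for age, (py, deaths) in strata.items()
    )


reference = data["nonsmokers"]
crude = {group: round(crude_rate(s), 1) for group, s in data.items()}
adjusted = {group: round(adjusted_rate(s, reference), 1)
            for group, s in data.items()}
```

In this made-up data set the pipe and cigar smokers are mostly in the older stratum, so their crude rate is far above that of nonsmokers even though their age-specific rates are identical. After adjustment the excess disappears for pipe and cigar smokers and appears for cigarette smokers, the same qualitative shift that the table shows as the number of strata increases.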

It might be argued that a good data analyst would never have made this mistake because such an analyst would have tested for relevant interactions with important variables such as age. However, the simple statistical solution to this problem can also be misleading in an analysis of observational data. For example, nothing in the statistical output alerts the analyst to a potential nonoverlap in the marginal distributions. An investigator may be comparing 70-year-old smokers with 40-year-old nonsmokers, whereas traditional statistical approaches assume that the groups have the same covariate distributions and the statistical analyses are often limited to linear adjustments and extrapolation. Cochran illustrated that some statistical approaches (e.g., stratification or subclassification) produced more robust solutions when they were applied to naturalistic data than when they were applied to other types of data. Rosenbaum and Rubin (1983) extended the notion of subclassification to the multivariate case (i.e., more than one stratification variable) by introducing the propensity score. Propensity score matching allows the matching of cases and controls in terms of their propensities or probabilities of receiving the intervention on the basis of a number of potentially confounding variables. The result is a matched set of cases and controls that are, in terms of probability, equally likely to have received the treatment. The limitation is that the results from such a comparison will be less generalizable than the results of a randomized study, in which each individual in the total sample has the same likelihood of being a case or a control.
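The matching step described above can be sketched in a few lines. This is a deliberately simplified illustration, not the procedure of Rosenbaum and Rubin (1983): the propensity score is estimated as the observed fraction treated within each covariate cell, standing in for the fitted model (often a logistic regression) used in practice, and matching is greedy one-to-one nearest neighbor within a caliper. The records, covariates, caliper value, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (id, age_group, severe_disease, treated).
records = [
    ("t1", "old", 1, 1), ("t2", "old", 0, 1), ("t3", "young", 1, 1),
    ("c1", "old", 1, 0), ("c2", "old", 0, 0), ("c3", "young", 1, 0),
    ("c4", "young", 0, 0), ("c5", "young", 0, 0), ("c6", "old", 0, 0),
]


def estimate_propensity(records):
    """Propensity = fraction treated among records sharing the covariates."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_treated, n_total]
    for _, age, severe, treated in records:
        counts[(age, severe)][0] += treated
        counts[(age, severe)][1] += 1
    return {
        rid: counts[(age, severe)][0] / counts[(age, severe)][1]
        for rid, age, severe, _ in records
    }


def match_on_propensity(records, scores, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching of treated to control records."""
    treated = [r[0] for r in records if r[3] == 1]
    controls = [r[0] for r in records if r[3] == 0]
    pairs = []
    for t in treated:
        if not controls:
            break
        best = min(controls, key=lambda c: abs(scores[c] - scores[t]))
        if abs(scores[best] - scores[t]) <= caliper:
            pairs.append((t, best))
            controls.remove(best)  # each control is used at most once
    return pairs


scores = estimate_propensity(records)
pairs = match_on_propensity(records, scores)
```

Controls whose covariates never occur among treated participants (here the young participants without severe disease) receive a score of zero and remain unmatched. This makes the nonoverlap visible instead of leaving it to be extrapolated over by a linear adjustment.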

In randomized experiments, ignoring important covariates increases the standard errors of the estimates. By contrast, in observational studies bias can result and the standard errors can be underestimated, leading to an opportunity for a chance association and potentially misleading results. Such problems become more complex as the number of potential outcome variables increases beyond one.

Masking (Blinding)

Investigators in clinical trials use the method of masking (or blinding), in which neither the participant nor the physician, investigator, or evaluator knows who is assigned to the placebo or control group and who will receive the experimental intervention. The purpose of masking is to minimize the occurrences of conscious and unconscious biases in the conduct of a clinical trial and in the interpretation of its results (Pocock, 1984). The knowledge of whether a participant is receiving the intervention under study or is in the control group may have an effect on several aspects of a study, including the recruitment and allocation of participants, their subsequent care, the attitudes of the study participants toward the interventions, the assessment of outcomes, the handling of withdrawals, and the exclusion of data from analysis. The essential aim of masking is to prevent identification of the interventions that individuals are receiving until all opportunities for biases have passed (Pocock, 1984). Many randomized trials that have not used appropriate levels of masking show larger treatment effects than blinded studies (Day and Altman, 2000).

In a double-blind trial, neither the participants nor the research or medical staff responsible for the management or clinical evaluation of the individuals knows who is receiving the experimental intervention and who is in the control group. To achieve this, the interventions being compared during the trial must be disguised so that they cannot be distinguished in any way (e.g., by formulation, appearance, or taste) by the research participants or the investigators. Double-blind trials are thought to produce more objective results, because the expectations of the investigators and participants about the experimental intervention do not affect the outcome of the trial.

Although a double-blind study is ideal for the minimization of bias in clinical trials, use of such a study design may not always be feasible. The interventions may be so different that it is not possible to disguise one from the other, for example, surgery versus drug therapy. If sham surgery would be necessary to maintain blinding, ethical problems associated with the use of sham surgery may proscribe the use of a double-blind design. Two drugs may have different forms (e.g., an intravenously administered form versus a tablet form) that cannot be changed without changing the properties of the drugs. One way to design a double-blind trial in this instance is to use a double-dummy technique (e.g., the use of two placebos to disguise which drug the participants are receiving).

An alternative design when a double-blind trial is not feasible is the single-blind trial. In a single-blind trial the investigators and their colleagues are aware of the intervention but the research participant is not. When blinding is not feasible, an open-label trial, in which the identity of the intervention is known to both the investigator and the participants, is used. One way to reduce bias in single-blind and open-label trials is for those who conduct all clinical assessments to remain blinded to the assignment of interventions. In single-blind or open-label trials, it is important to place extra emphasis on minimizing the various known sources of bias as much as possible.

Randomization

Randomization is the process of assigning participants to intervention regimens by using a mechanism of allocation by chance. Random allocation for the comparison of different interventions has been a mainstay of experimental designs since the pioneering work of Ronald A. Fisher. Fisher conducted randomized experiments in agriculture in which the experimental units were plots of land to which various crops and fertilizers were assigned in a random arrangement (Fisher, 1935). Randomization guards against the use of judgment or systematic arrangements that would lead to biased results. Randomization introduces a deliberate element of chance into the assignment of interventions to participants and therefore is intended to provide a sound statistical basis for the evaluation of the effects of the intervention (Pocock, 1984). In clinical research, randomization protects against selection bias in treatment assignment and minimizes the differences among groups by optimizing the likelihood of equally distributing people with particular characteristics to the intervention and control arms of a trial. In randomized experiments, ignoring important covariates, which can lead to differences between the groups, simply increases the standard errors; however, in observational studies, bias can result and the standard errors are underestimated.

There are several different randomization methods (Friedman, Furberg, and DeMets, 1996). Some of these procedures are designed to ensure balance among intervention groups with respect to important prognostic factors, and thus, the probability of assignment to a particular intervention may change over the course of the trial. Thus, randomization does not always imply that an individual participant has a 50 percent chance of being assigned to a particular intervention.
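
One family of methods that keeps the arms balanced is permuted-block randomization. The sketch below is a minimal illustration rather than a procedure taken from the cited text; the function name, arm labels, and block size are assumptions. Within each block every arm appears equally often, so the chance that the next participant receives a given arm depends on the assignments already made in that block and is not always 50 percent.

```python
import random

def permuted_block_schedule(n_participants, arms=("A", "B"), block_size=4, seed=None):
    """Return a list of arm labels, balanced within every complete block."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_participants:
        # Each block holds every arm the same number of times, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_participants]

schedule = permuted_block_schedule(12, seed=42)
print(schedule)
```

With two arms and a block size of four, a participant who enters after two consecutive assignments to arm A in the same block is certain to receive arm B. In practice the block size is usually concealed from investigators for that reason.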

Clinical trials can use either randomized controls or nonrandomized controls. In a trial with nonrandomized controls, the choice of intervention group and control group is decided deliberately. For example, patients with a specific disease characteristic are assigned to the experimental intervention, whereas those with another disease characteristic are assigned to the control arm. On scientific grounds it is easy to conclude that the use of a randomized control group is always preferred. The consensus view among clinical investigators is that, in general, the use of nonrandomized controls can result in biased and unreliable results (Pocock, 1984). Randomization in combination with masking helps to avoid possible bias in the selection of participants, their assignment to an intervention or control, and the analysis of their response to the intervention.

Outcomes

The health outcomes assessed are pivotal for both the scientific and substantive credibilities of all trials—and are even more pivotal for small trials. The selection of outcomes should meet the guidelines for validity (Tugwell and Bombardier, 1982). In psychology, the concepts of validity and reliability have been developed with the view that measurement is mainly done to discriminate between states and to prognosticate from a single measurement. For example, an intelligence test can be administered to children at the end of their primary school years to suggest the needed level of secondary education. In clinical trials, however, measurement of change (e.g., to monitor the effect of treatment) is the objective. Thus, the concept of responsiveness or sensitivity to change becomes important, but its nomenclature and methodology have not been well developed. In the selection of outcome measures, validity is not the only issue—feasibility also determines which of the valid outcome measures can actually be applied. The most important criteria for selecting an endpoint include truth, discrimination and feasibility (Boers, Brooks, Strand, et al., 1998, 1999).

Truth. Truth captures issues of fact, content, construct, and criterion validity. For example, is the measure truthful? Does it measure what is intended? Is the result unbiased and relevant?
Discrimination. Discrimination captures issues of reliability and responsiveness or sensitivity to change. For example, does the measure discriminate between situations of interest? The situations can be states at one time (for classification or prognosis) or states at different times (to measure change).
Feasibility. Feasibility captures an essential element in the selection of measures, one that may be decisive in determining a measure's success. For example, can the measure be applied easily, given constraints of time, money, and interpretability?

Subject Populations

Any clinical trial design requires precision in the process by which participants are determined to be eligible for inclusion. The objective is to ensure that participants in a clinical trial are representative of some future class of patients or individuals to whom the trial's findings might be applied (Pocock, 1984). In the early phases of clinical trial development, research participants are often selected from a small subgroup of the population in which the intervention might eventually be used. This is done to maximize the chance of observing the specific clinical effects of interest. In these early stages it is sometimes necessary to compromise and study a somewhat less representative group (Pocock, 1984). Similarly, preliminary data collected from one population (e.g., from studies of bone mineral density loss in ground-based study participants) may not be generalizable to a particular target population of interest (astronauts).

Sample Size and Statistical Power

Numerous methods and statistical models, often called “power calculations,” have been developed to calculate sample sizes (Kraemer and Thiemann, 1987) (see also Chapter 3). A standard approach asks five questions:

1. What is the main purpose of the trial?

2. What is the principal method of assessing patient outcomes?

3. How will the data be analyzed to detect treatment differences?

4. What results does one anticipate with standard treatment?

5. How small a treatment difference is it important to detect, and with what degree of certainty should that treatment difference be demonstrated?

Statistical methods can then be developed around qualitative or quantitative outcomes. A critical first step in trial design is to use statistical methods to determine the sample size needed and, from that, whether the clinical trial is feasible. The number of participants in a clinical trial should always be large enough to provide a sufficiently precise answer to the question posed, but it should also be the minimum necessary to achieve this aim.

A trial with only a small number of participants carries a considerable risk of failing to demonstrate a treatment difference when one is really present (Type II error) (see the Glossary for explanations of Type I and Type II errors). In general, small studies are more prone to variability and thus are likely to be able to detect only large intervention effects with adequate statistical power.
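
The link between the size of the treatment difference and the number of participants can be made concrete with the usual normal-approximation formula for comparing two means, n per arm = 2(z_{1-α/2} + z_{power})²σ²/δ². This formula is a standard textbook approximation and is not quoted from the cited sources; the function name and the example values are assumptions.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-sided, two-arm comparison of means.

    delta: smallest treatment difference considered important to detect
    sigma: assumed standard deviation of the outcome in each arm
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # controls the Type I error rate
    z_power = z.inv_cdf(power)           # controls the Type II error rate
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

for delta in (1.0, 0.5, 0.25):
    print(f"difference of {delta} SD: {n_per_arm(delta, sigma=1.0)} per arm")
```

Halving the difference to be detected roughly quadruples the required sample size, which is why a trial with few participants can be expected to detect only large effects with adequate power.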

Components of Variance

Variance is a measure of the dispersion or variation of data within a population distribution. In the example of the effects of microgravity on bone mineral density loss during space travel (see Box 1-2), there is a tendency to assume that the astronaut is the unit of analysis and hence to focus on components of variance across astronauts. However, because astronauts make up a small group of individuals and do not represent a larger population, it is quite likely that the data will depart from a Gaussian (or “normal”) distribution. In this case, it becomes important to consider the other components of variance in addition to the among-person variance.

In a study of bone mineral density loss among astronauts, the components of variance may include:

1. variation in bone mineral density across time for a single astronaut on Earth or in microgravity;

2. differences in bone mineral density for that astronaut on Earth and after a fixed period of time in microgravity; and

3. differences in bone mineral density among astronauts both on Earth and in microgravity.

The goal would be to characterize changes for an individual astronaut or a small group of astronauts, even though they do not perfectly represent a large population. It is reasonable to focus on true trends for a particular astronaut over time, which requires careful repeated measurements over time and which makes relevant the component of variance within a person rather than the component of variance among persons.
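
The distinction between the within-person and among-person components can be illustrated with a simple moment-based estimate from repeated measurements. This is a sketch under stated assumptions: every person has the same number of measurements, the estimator is the usual one-way random-effects calculation, and the function name and the bone mineral density values are invented for illustration.

```python
from statistics import mean, variance

def variance_components(repeated):
    """Estimate (within-person, among-person) variance from repeated measures.

    repeated: dict mapping a person to a list of measurements; every list
    must have the same length m (at least 2), and at least 2 people are needed.
    """
    sizes = {len(values) for values in repeated.values()}
    if len(sizes) != 1 or min(sizes) < 2 or len(repeated) < 2:
        raise ValueError("need at least 2 people with equal numbers (>= 2) of measurements")
    m = sizes.pop()
    # Within-person: average of each person's own sample variance.
    within = mean([variance(values) for values in repeated.values()])
    # Among-person: variance of the person means, less the share of that
    # variance that is only measurement-to-measurement noise.
    person_means = [mean(values) for values in repeated.values()]
    among = max(0.0, variance(person_means) - within / m)
    return within, among

# Hypothetical bone mineral density readings (g/cm^2), three per person.
bmd = {
    "astronaut_1": [1.02, 1.00, 0.99],
    "astronaut_2": [1.15, 1.17, 1.14],
    "astronaut_3": [0.95, 0.93, 0.94],
}
within, among = variance_components(bmd)
print(f"within-person variance: {within:.5f}")
print(f"among-person variance:  {among:.5f}")
```

When the question concerns the trend for a particular person, the within-person component is the one that determines how many repeated measurements are needed.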

Significance Tests

Significance tests (e.g., chi-square and t tests) are used to determine the chances of finding a treatment difference as large as the effect observed by chance alone; that is, how strong is the evidence for a genuine superiority of one intervention over another (see also Chapter 3). However, statistical significance is not the same as clinical or societal significance. Clinical or societal significance (relevance) must be assessed in terms of whether the magnitude of the observed effect is meaningful in the context of established clinical practice or public health. An increase of risk from 1 in 10 to 2 in 10 has a clinical implication different from that of an increase of 1 in 10,000 to 2 in 10,000, even though the risk has doubled in each case.
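
The contrast between the two doublings of risk can be checked with a few lines of arithmetic. The function name and the choice to express the absolute difference per 10,000 people are assumptions made for this illustration.

```python
from fractions import Fraction

def risk_summary(baseline_risk, exposed_risk, population=10_000):
    """Return (relative risk, additional cases per `population` people)."""
    relative_risk = exposed_risk / baseline_risk
    extra_cases = (exposed_risk - baseline_risk) * population
    return relative_risk, extra_cases

common = risk_summary(Fraction(1, 10), Fraction(2, 10))
rare = risk_summary(Fraction(1, 10_000), Fraction(2, 10_000))

for label, (rr, extra) in (("1 in 10 to 2 in 10", common),
                           ("1 in 10,000 to 2 in 10,000", rare)):
    print(f"{label}: relative risk {float(rr):.0f}, "
          f"{float(extra):.0f} additional cases per 10,000 people")
```

Both scenarios give a relative risk of 2, but the first corresponds to 1,000 additional cases per 10,000 people and the second to 1 additional case.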

In hypothesis testing, the null hypothesis and one's confidence in either its validation or its refutation are the issue:

The basic overall principle is that the researcher's theory is considered false until demonstrated beyond reasonable doubt to be true... This is expressed as an assumption that the null hypothesis, the contradiction of the researcher's theory, is true... What is considered a “reasonable” doubt is called the significance level. By convention in scientific research, a “reasonable” level of remaining doubt is one below either 5% or 1%. A statistical test defines a rule that, when applied to the data, determines whether the null hypothesis can be rejected... Both the significance level and the power of the test are derived by calculating with what probability a positive verdict would be obtained (the null hypothesis rejected) if the same trial were run over and over again (Kraemer and Thiemann, 1987, pp. 22–23).

A clinical trial is often formulated as a hypothesis as to whether an experimental therapy is effective. However, confidence intervals may provide a better indication of the level of uncertainty. In the clinical trial setting, the hypothesis test is natural, because the goal is to determine whether an experimental therapy should be used. In clinical trials, confidence intervals are used in the same manner as hypothesis tests. Thus, if the interval includes the null hypothesis, one concludes that the experimental therapy has not proved to be more effective than the control.

To obtain the significance level, hypothetical repeats of trials are done when the null hypothesis is taken to be true. To obtain power, repeat tests are done when the alternative hypothesis is correct. To compute power, the researcher must have developed from preliminary data a critical-effect size, that is, a measure of how strong the theory must minimally be to be important to the individual being offered the therapy or important to society (Kraemer and Thiemann, 1987, p. 24). Changing designs or measures used or choosing one valid test over another changes the definition of effect size. Moreover, the critical-effect size is individual- or population-specific as well as measurement-specific (Kraemer and Thiemann, 1987).
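
The idea of hypothetical repeats can be mimicked directly by simulation. The sketch below repeats a two-arm trial many times and counts how often a two-sided z test rejects the null hypothesis. It assumes normally distributed outcomes with a known standard deviation; the function name, the sample size of 63 per arm, and the critical-effect size of 0.5 standard deviations are illustrative assumptions.

```python
import random
from statistics import NormalDist

def rejection_rate(true_diff, n_per_arm, sigma=1.0, alpha=0.05, n_trials=2000, seed=1):
    """Fraction of simulated trials in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n_per_arm) ** 0.5          # standard error of the difference
    rejections = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_arm)]
        treated = [rng.gauss(true_diff, sigma) for _ in range(n_per_arm)]
        diff = sum(treated) / n_per_arm - sum(control) / n_per_arm
        if abs(diff / se) > z_crit:
            rejections += 1
    return rejections / n_trials

# Repeats with the null hypothesis true estimate the significance level.
type_i_rate = rejection_rate(true_diff=0.0, n_per_arm=63)
# Repeats with the alternative true estimate the power.
power = rejection_rate(true_diff=0.5, n_per_arm=63)
print(f"estimated significance level: {type_i_rate:.3f}")
print(f"estimated power:              {power:.3f}")
```

Under these assumptions the first estimate should fall near the nominal 5 percent and the second near 80 percent, with some simulation noise. Changing the critical-effect size or the outcome measure changes the second figure but not the first.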

TRADITIONAL CLINICAL TRIAL DESIGNS

Modern clinical trials go back more than 40 years, and a wide variety of clinical trial designs have been developed and adapted over the past 25 years. To the extent possible, each of these designs uses the concepts of control and randomization to make comparisons among groups (Box 2-2). Some of these designs, which are generally used in larger studies, can also be adapted for use in some small studies. For example, crossover designs can be used in small clinical studies and in within-subject trials. Each design is described below.

Parallel-Group Design

The most common clinical trial design is the parallel-group design, in which participants are randomized to one of two or more arms (Pocock, 1984). These arms include the new intervention under investigation and one or more control arms, such as a placebo control or an active control. The randomized parallel-group design is typically used to evaluate differences in the effects of different interventions across time. Trials that use the parallel-group design are often double blinded. Because of the improved ability to control for bias through randomization and blinding, the analysis of such trials and the interpretation of their results are generally straightforward.

BOX 2-2

Traditional Designs for Clinical Trials

Parallel-group design

Crossover design

Factorial design

Add-on design

Randomized withdrawal design

Early-escape design

Crossover Design

The crossover design compares two or more interventions by randomly assigning each participant to receive the interventions being tested in a different sequence. Once one intervention is completed, participants are switched to another intervention. For example, in a two-by-two crossover design, each participant randomly receives one drug for one period of time and then another drug for a second period of time, with the administration of each drug separated by a washout period (i.e., a period of time during which the first drug is cleared from the body before the second drug is administered). With this type of study, each participant serves as his or her own control. There are several advantages to this trial design, including a reduction in the number of participants required to achieve a statistically significant result and the ability to control for patient-specific effects. This design can also be useful for studying a patient's response to short periods of therapy, particularly for chronic conditions in which the initial evaluation of treatment efficacy is concerned with the measurement of short-term relief of symptoms (Pocock, 1984).

A criticism of this design is that the effects of one intervention may carry over into the period when the next intervention is given. Crossover studies cannot be done if the effects of the interventions are irreversible (e.g., gene therapy or surgery) or the disease progression is not stable over time (e.g., advanced cancer). Additional problems with crossover studies occur if participants withdraw from the study before they receive both interventions or the outcomes are affected by the order in which the interventions are administered (Senn, 1993).

Crossover designs are occasionally used in psychological studies because of the opportunity to use each patient at least twice and because of the probability that the component of the variance within individual patients is smaller than between patients (Matthews, 1995).
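
How a two-by-two crossover removes differences between participants can be sketched with the standard within-participant analysis. It assumes no carryover after the washout period; the function name and the outcome values are hypothetical.

```python
from statistics import mean

def crossover_effect(ab_sequence, ba_sequence):
    """Estimate the effect of treatment A relative to B in a 2x2 crossover.

    Each argument is a list of (period_1_outcome, period_2_outcome) pairs:
    ab_sequence for participants given A then B, ba_sequence for B then A.
    Assumes no carryover from the first period into the second.
    """
    # Period 1 minus period 2 within each participant cancels that
    # participant's own baseline level.
    d_ab = [p1 - p2 for p1, p2 in ab_sequence]   # (A - B) plus period effect
    d_ba = [p1 - p2 for p1, p2 in ba_sequence]   # (B - A) plus period effect
    # Subtracting the sequence means cancels the period effect.
    return (mean(d_ab) - mean(d_ba)) / 2

# Hypothetical outcomes: A adds 2 units relative to B, period 2 adds 1 unit,
# and baselines differ widely between participants.
ab = [(12, 11), (52, 51)]    # baselines 10 and 50
ba = [(20, 23), (80, 83)]    # baselines 20 and 80
print(crossover_effect(ab, ba))
```

Although the participants' baselines range from 10 to 80, the estimate recovers the assumed treatment effect because only within-participant differences enter the calculation.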

Factorial Design

In a factorial design, two or more treatments are evaluated simultaneously with the same participant population through the use of various combinations of the treatments. For example, in a two-by-two factorial design, participants are randomly allocated to one of the four possible combinations of two treatments, treatments A and B: treatment A alone, treatment B alone, both treatments A and B, or neither treatment A nor treatment B. The usual intention of using this design is to make efficient use of clinical trial participants by evaluating the efficacies of the two treatments with the same number of participants that would be required to evaluate the efficacy of either one alone. The success of this approach depends on the absence of any relevant interaction between treatments A and B so that the effect of treatment A is virtually identical whether or not treatment B is administered. This design can also be used to test the interaction of treatments A and B, but then, the advantages of efficiency no longer apply because much larger trials are necessary to detect a clinically relevant interaction.
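
The way a two-by-two factorial design reuses every participant for both comparisons can be sketched from the four arm means. The function name and the example means are assumptions made for illustration.

```python
def factorial_effects(cell_means):
    """Main effects and interaction from the four arm means of a 2x2 factorial.

    cell_means maps (receives_a, receives_b) -> mean outcome in that arm.
    """
    a_without_b = cell_means[(True, False)] - cell_means[(False, False)]
    a_with_b = cell_means[(True, True)] - cell_means[(False, True)]
    b_without_a = cell_means[(False, True)] - cell_means[(False, False)]
    b_with_a = cell_means[(True, True)] - cell_means[(True, False)]
    return {
        # Each main effect averages over both levels of the other
        # treatment, so all four arms contribute to both estimates.
        "effect_a": (a_without_b + a_with_b) / 2,
        "effect_b": (b_without_a + b_with_a) / 2,
        # Nonzero when the effect of A depends on whether B is given.
        "interaction": a_with_b - a_without_b,
    }

# Hypothetical arm means with purely additive effects.
additive = {(False, False): 10, (True, False): 13,
            (False, True): 12, (True, True): 15}
print(factorial_effects(additive))
```

In the additive example the interaction is zero, so the averaged main effects describe each treatment adequately. If the interaction were not close to zero, the main effect of A would no longer summarize what A does with and without B.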

The factorial design can also be used to establish the dose-response characteristics of a combination product, for example, one that combines treatments C and D. Different doses of treatment C are selected, usually including a dose of zero (placebo), and similar doses of treatment D are also chosen. Participants in each arm of the trial receive a different combination of doses of treatments C and D. The resulting estimate of the response may then be used to help to identify an appropriate combination of doses of treatments C and D for clinical use.

Add-on Design

In an add-on design, a placebo-controlled trial of an experimental intervention is conducted with people already receiving an established, effective treatment. Thus, all participants receive the established, effective treatment. Experimental interventions for patients with acute myocardial infarctions and, increasingly, patients with rheumatoid arthritis, for example, are often tested in studies with this design. The add-on design is the only one that can be used in long-term studies of treatments for heart failure since standard therapy is lifesaving and cannot be denied (Temple, 1996). However, the add-on design is most useful for the testing of experimental interventions that have mechanisms of action different from that of the established, effective treatment.

Randomized Withdrawal Design

In a randomized withdrawal design, individuals who respond positively to an experimental intervention are randomized to continue receiving that intervention or to receive a placebo. This trial design minimizes the amount of time that individuals receive a placebo (Temple, 1996). During the trial, the return of symptoms or the ability to continue participation in the trial are study endpoints (Temple, 1996). The advantages of this study design are that individuals receiving the experimental intervention continue to do so only if they respond, whereas individuals receiving the placebo do so only until their symptoms return. Disadvantages include carryover effects, difficulties assessing whether the underlying disease process is still active, and long lag times to adverse events if the disease is in remission. This design is more appropriate in phase I and II trials involving healthy volunteers because it is less likely that effective treatments are being withdrawn from those who need it. In some studies, however, measurement of the placebo effect is essential (e.g., studies of drugs for the treatment of depression), and such studies might require the use of a randomized withdrawal design. In those cases, voluntary, informed consent is essential, as is the provision of care during the withdrawal period.

Early-Escape Design

The early-escape design is another way to minimize an individual's duration of exposure to a placebo. In the early-escape design, participants are removed from the study if symptoms reach a defined level or they fail to respond to a defined extent. The failure rate can then be used as the measure of efficacy. Thus, in a study with an early-escape design, participants are only briefly exposed to ineffective interventions (Temple, 1996).

Multicenter Trials

Multicenter trials, although not a traditional design, provide an efficient way of establishing the efficacy of a new intervention; however, certain caveats must be noted. Sometimes multicenter trials provide the only means of accruing a sample of sufficient size within a reasonable time frame. Another advantage of multicenter trials is that they provide a better basis for the subsequent generalization of findings because the participants are recruited from a wider population and the treatment is administered in a broader range of clinical settings. In this sense, the environment in which a multicenter trial is conducted might more truly represent the environment for future uses of the test intervention. On the other hand, multicenter trials may require the use of multiple standards and quality control.

SPECIAL DESIGN ISSUES FOR SMALL TRIALS

A number of trial designs especially lend themselves to studies with small numbers of participants, including single subject (n-of-1) designs, sequential designs, decision analysis-based designs, ranking and selection designs, adaptive designs, and risk-based allocation designs (Box 2-3).

Conducting Randomized Trials with Individual Patients

Clinicians are often faced with treatment decisions when they cannot rely on the results of an RCT because the results do not apply to that patient or a relevant trial might not yet have been done. In this case, the clinician might opt for a “trial of therapy”; that is, the clinician might administer more than one treatment to a patient to assess the effects (Guyatt, Sackett, Adachi, et al., 1988). Trials with this type of design (referred to as a trial with an n-of-1 design) have a long tradition in the behavioral sciences and have more recently been used in clinical medicine (Johannessen, 1991). Trials with such designs can improve the certainty of a treatment decision for a single patient; a series of trials with such designs may permit more general inferences to be drawn about a specific treatment approach (Johannessen, 1991). They also become useful when a population is believed to be heterogeneous. The central premise of trials with such designs is that the patient (e.g., an astronaut) serves as his or her own control.

The factors that can mislead physicians conducting conventional therapeutic trials (the placebo effect, the natural history of the illness, and expectations about the treatment effect) can be avoided in trials of therapy with n-of-1 designs by safeguards that permit the natural, untreated course of the disorder to be observed and by keeping the patient and the clinician blind to the timing of active treatment.

BOX 2-3

Special Design Issues for Small Clinical Trials

n-of-1 design

Sequential design

Decision analysis-based design

Ranking and selection design

Adaptive design

Risk-based allocation design

Guyatt and colleagues (1988) describe one method of conducting an RCT with an n-of-1 design:

A clinician and a patient agree to test a therapy (the “experimental therapy”) for its ability to reduce or control the symptoms, signs, or other manifestations (the “treatment targets”) of the patient's ailment.
The patient then undergoes treatment for a pair of periods; during one period of each pair the experimental therapy is applied, and during the other period either an alternative treatment or a placebo is applied. The order of the two periods within each pair is randomized by a method that ensures that each period has an equal chance of applying the experimental or the alternative therapy.
Whenever possible both the clinician and the patient are blind to the treatment being given during either period.
The treatment targets are monitored (often through the use of a patient diary) to document the effect of the treatment being applied.
Pairs of treatment periods are replicated until the clinician and the patient are convinced that the experimental therapy is effective, is harmful, or has no effect on the treatment targets.
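
The randomization step in the method above can be sketched as follows. The function name and treatment labels are hypothetical; the only property taken from the description is that, within each pair of periods, either order is equally likely.

```python
import random

def n_of_1_schedule(n_pairs, treatments=("experimental", "comparator"), seed=None):
    """Return one (first_period, second_period) tuple per pair of periods."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_pairs):
        pair = list(treatments)
        rng.shuffle(pair)          # each order has probability 1/2
        schedule.append(tuple(pair))
    return schedule

for number, (first, second) in enumerate(n_of_1_schedule(3, seed=2024), start=1):
    print(f"pair {number}: period 1 = {first}, period 2 = {second}")
```

To preserve blinding, a schedule like this would normally be generated and held by someone other than the clinician and the patient, for example, a pharmacist who prepares the medications.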

RCTs with n-of-1 designs may be indicated when an RCT has shown that some patients are unresponsive to treatment; when there is doubt about whether a treatment is really providing a benefit to a particular patient; when the patient insists on taking a treatment that the clinician thinks is useless or potentially harmful; when a patient is experiencing symptoms suspected to be medication side effects but neither the patient nor the clinician is certain; and when neither the clinician nor the patient is confident of the optimal dose of a medication or replacement therapy (Edgington, 1996). In addition, RCTs with n-of-1 designs are most useful for the study of treatments for chronic conditions for which maintenance therapy is likely to be continued for long periods of time and if the treatment effect occurs soon after the initiation of treatment and ceases soon after the withdrawal of treatment. Trials with n-of-1 designs are also attractive for the study of vaguely defined or heterogeneous conditions (Table 2-2). For patients with these conditions, studies with n-of-1 designs may generate new hypotheses for the design of subsequent conventional group trials and can bridge the gap between research and clinical practice (Johannessen, 1991).

TABLE 2-2 Considerations in Performing a Trial with an n-of-1 Design

Is the condition chronic?

Is the condition stable over time?

Is there a carryover effect?

Is there a period effect?

Do the effects of the treatments have a rapid onset or a rapid cessation?

Are good measures available for the evaluation of the response?

Is a blinded trial feasible?

Is treatment effectiveness uncertain for the individual?

Is long-term therapy being considered?

Is the optimal dose known?

Is treatment timing feasible?

Is the patient interested in participating in a trial with an n-of-1 design?

Is the trial feasible in the clinician's practice?

Is the trial ethical?

SOURCE: Zucker (2000).

One concern about trials with n-of-1 designs is whether clinically relevant targets of treatment can be measured. Outcome measures often extend beyond a set of physical signs (e.g., the rigidity and tremor of parkinsonism), laboratory tests (e.g., measurement of blood glucose levels), or a measure of patient performance (e.g., score on a 6-minute walking test). Thus, in most situations it is preferable to directly measure a patient's symptoms, well being, or quality of life. The measurement of a patient's symptoms may also include the side effects of treatment (Guyatt, Sackett, Adachi, et al., 1988).

One of the advantages to not specifying the number of pairs of treatment periods in advance is that the trial can be stopped at any time. If, on the other hand, one wishes to conduct a standard statistical analysis of data (e.g., a frequentist or a Bayesian analysis), the analysis will be strengthened considerably if the number of pairs is specified in advance. Regardless of whether the number of treatment periods is specified in advance, it is advisable to have at least two pairs of treatment periods before breaking the trial (Guyatt, 1986). Conclusions drawn after a single pair of treatments are likely to be either false positive (that the treatment is effective when it is not) or false negative (that the treatment is not effective when it is). Moreover, a positive effect of treatment in one patient is not a reliable predictor of the responses in future patients.

A preliminary treatment period with active therapy, during which both the clinician and the patient know that active therapy is being received, could save time. If there is no evidence of a response during such an open trial or if intolerable side effects occur, an RCT with an n-of-1 design may be meaningless or impossible. An open preliminary treatment period may also be used to determine the optimal dose of the medication to be used in the trial.

If requirements similar to those required for conventional group trials—strict entry criteria, uniform treatment procedures, consensus scales for outcome measures, and acceptable statistical tests—are applied to a series of trials with n-of-1 designs, conclusions may be generalizable to the target population (Johannessen, 1991; Zucker, Schmid, McIntosh, et al., 1997). This has the advantage that the patients are exposed to placebo only for as long as is needed to get an answer both for the patients and for the main population database.

A repeated-measures design is likely to be very useful in small studies. The extreme of a small repeated-measures design is the study with an n-of-1 design. At the design phase of a study with a repeated-measures design, the correlation structure of the measures is an important parameter. One would need to explore the feasibility (i.e., the statistical power) of the study under several different assumptions about the correlation structure.
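To make the last point concrete, the sketch below computes approximate power for a hypothetical two-group comparison in which each participant's outcome is the mean of m repeated measurements. It assumes a compound-symmetry structure (every pair of measurements shares a single correlation rho), a normal approximation, and illustrative values for the effect size and standard deviation. None of these choices comes from the report, and other correlation structures would need their own variance formula.

```python
from statistics import NormalDist

def power_two_group_repeated(n_per_group, m, rho, delta, sigma, alpha=0.05):
    """Approximate two-sided power for comparing two group means when each
    participant contributes the mean of m equally correlated measurements."""
    z = NormalDist()
    # Variance of one participant's mean under compound symmetry (assumed structure)
    var_subject_mean = sigma ** 2 * (1 + (m - 1) * rho) / m
    # Standard error of the difference between the two group means
    se = (2 * var_subject_mean / n_per_group) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Ignores the very small chance of rejecting in the wrong direction
    return z.cdf(delta / se - z_crit)

# Hypothetical design: 10 participants per group, 6 measurements each
for rho in (0.0, 0.3, 0.6, 0.9):
    p = power_two_group_repeated(n_per_group=10, m=6, rho=rho, delta=1.0, sigma=1.5)
    print(f"rho={rho:.1f}  approximate power={p:.2f}")
```

In this between-group setting, power falls as the assumed correlation rises, because highly correlated repeated measurements add little independent information. Within-participant comparisons, such as those in an n-of-1 design, can behave differently.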

Sequential Designs

In a study with a sequential design, participants are sequentially enrolled in the study and are assigned a treatment (assignment is usually at random). The investigator then changes the probabilities that participants will be assigned to any particular treatment on the basis of the results as they become available. The object is to improve the efficiency, safety, or efficacy of the experiment as it is in progress by changing the rules by which one determines how participants are allocated to the various treatments.

Strategies for sequential dose-response designs include up-and-down methods, stochastic approximation methods, maximum-likelihood methods, and Bayesian methods. Recently, attention has been focused on the continual reassessment method, which is a Bayesian sequential design (Durham, Flournoy, and Rosenberger, 1997). Random-walk rules are particularly attractive for use in the design of dose-response studies for several reasons: exact finite and asymptotic distribution theory is completely worked out, which allows the experimenter to choose design parameters for the most ethical allocation scheme; specific designs can be chosen that allow the chosen design points to be distributed unimodally around a quantile of interest; the designs are very simple to implement; and the designs operate on a finite lattice of dosages (Durham, Flournoy, and Rosenberger, 1997). Random-walk rules identify a class of rules for which the sample paths form random walks. Thus, if there is a fixed probability of transitioning from state A to state B and another fixed probability of transitioning from state B to state A in a two-state process (a Markov chain), then sequences of states such as A, B, B, A, B,... are random walks. A rule such as “stop the first time that the total number of A's or B's reaches a prespecified number” would be called a random-walk rule.

One example of sequential design is called the “up-and-down design” (Dixon and Mood, 1948), in which the choices of experimental treatment either go up one level (dose), down one level (dose), or stay unchanged. The design allocates treatments to pairs of participants in a way that causes the treatment distribution to cluster around the treatment with a maximum probability of success (Dixon and Mood, 1948; Kpamegan and Flournoy, 2001). An up-and-down design has some advantages in clinical trials, in that it allows more conservative movement across a range of treatments. To optimize an up-and-down design, one treats individuals in pairs, with one receiving the lower-dose treatment and the other receiving the higher-dose treatment. If the lower-dose treatment results in a treatment failure and the higher-dose treatment results in a treatment success, the doses of the treatment are increased for the next pair. Conversely, if the patient with the lower-dose treatment has a treatment success and the patient with the higher-dose treatment has a treatment failure, then the doses of the treatment are decreased for the next pair. In this simple model, if there are two treatment successes or two treatment failures, the study is stopped. This design allows early estimations of the effective dosage range to be obtained before investigators proceed with large-scale randomized trials (Flournoy, in press).
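The paired up-and-down rule described above can be sketched as a short simulation. The dose levels, the success probabilities, and the `stop_on_tie` switch are hypothetical and are not taken from the cited papers. By default the sketch stops on two successes or two failures, as in the simple model in the text; setting `stop_on_tie=False` keeps the doses unchanged instead.

```python
import random

def up_and_down_pairs(success_prob, start=0, max_pairs=20, stop_on_tie=True, rng=None):
    """Simulate a paired up-and-down rule.

    success_prob holds one (hypothetical) success probability per dose level.
    Each pair is treated at adjacent levels k (lower) and k + 1 (higher).
    Returns the list of lower levels used, one entry per pair treated.
    """
    rng = rng or random.Random()
    k = start
    path = []
    for _ in range(max_pairs):
        path.append(k)
        low_ok = rng.random() < success_prob[k]
        high_ok = rng.random() < success_prob[k + 1]
        if low_ok == high_ok:
            if stop_on_tie:
                break      # two successes or two failures: stop the study
            continue       # variant: stay at the same pair of doses
        if high_ok:
            k = min(k + 1, len(success_prob) - 2)   # lower failed, higher succeeded: go up
        else:
            k = max(k - 1, 0)                       # lower succeeded, higher failed: go down
    return path

# Hypothetical success curve that peaks at the fourth dose level
curve = [0.2, 0.4, 0.7, 0.8, 0.6, 0.3]
print(up_and_down_pairs(curve, stop_on_tie=False, rng=random.Random(3)))
```

Running the variant that stays on ties for many pairs shows the visited levels clustering near the doses with the highest success probability, which is the behavior the design is meant to produce.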

Sequential group designs are useful for the monitoring and accumulation of study data, while they preserve the Type I error probability at a desired significance level, despite the repeated application of significance tests (Kim and DeMets, 1992). Parallel groups are studied until a clear benefit is seen or it is determined that no difference in treatments exists (Lai, Levin, Robbins, et al., 1980; Whitehead, 1999). The sequential group design allows results to be monitored at specific time intervals throughout the trial so that the trial may be stopped early if there is clear evidence of efficacy. Safety monitoring can also be done, and trials can be stopped early if unacceptable adverse effects occur or if it is determined that the chance of showing a clinically valuable benefit is negligible (futility). Because there is a need in all clinical trials, as dictated by ethical requirements, to assess results throughout the course of the trial, there is a potential that the blind will be broken, depending on how the results are assessed and by whom.

The disadvantage of this approach is that in most trials patients are heterogeneous with respect to the important prognostic factors, and these methods do not protect against the introduction of bias as a result of changes in the types of patients entering into a clinical trial over time. Moreover, for patients with chronic diseases, responses are usually delayed so long that the advantages of this approach are often lost.
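The reason interim monitoring needs special boundaries can be illustrated with a small Monte Carlo sketch. It assumes five equally spaced looks at normally distributed data with no true treatment difference, and it compares testing at the usual 1.96 cutoff at every look with a single constant boundary of 2.413, the commonly tabulated Pocock value for five looks at a two-sided level of 0.05. That constant, the number of looks, and the simulation sizes are assumptions of this sketch and are not taken from the report.

```python
import random
from statistics import NormalDist

def type_one_error(n_looks, boundary, n_per_look=20, n_sims=20000, seed=1):
    """Monte Carlo estimate of the overall two-sided Type I error when a z-test
    on the accumulating data is applied at each of n_looks interim looks."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(n_looks):
            # Under the null hypothesis, a batch of n_per_look standardized
            # treatment differences sums to a N(0, n_per_look) variate
            total += rng.gauss(0.0, n_per_look ** 0.5)
            n += n_per_look
            if abs(total / n ** 0.5) > boundary:
                rejections += 1
                break   # the trial stops at the first boundary crossing
    return rejections / n_sims

naive = type_one_error(5, NormalDist().inv_cdf(0.975))   # 1.96 at every look
pocock = type_one_error(5, 2.413)                        # assumed Pocock constant
print(f"repeated 1.96 tests: {naive:.3f}   constant boundary 2.413: {pocock:.3f}")
```

Under these assumptions the unadjusted procedure rejects a true null hypothesis far more often than 5 percent of the time, whereas the wider boundary keeps the overall error close to the nominal level.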

Decision Analysis-Based Design

Decision analysis (Pauker, 2000) can be informative in the experimental design process. Modeling of a clinical situation a priori allows testing of variables, which allows determination of the potential impact of each variable on the decision. Framing the question starts the decision analysis-based design process.

One explicitly considers both decisions (e.g., intervention A or intervention B) and probabilistic events (e.g., side effect versus no side effect). A utility is assigned to each outcome. Utilities have numeric values, usually between 0 and 1, that reflect the desirability of an outcome; that is, they incorporate the weighting of the severity or importance of the possible adverse outcomes as well as the weighting of the severity or importance of the beneficial outcomes (Drummond, O'Brien, Stoddart, et al., 1997). Decision analysis combines the probability of each outcome with the utility to calculate an expected utility for each decision.

During the planning phase for a study, decision analysis is used to structure the question. One obtains (either from data or from expert opinion) best estimates for each probability and utility. One then varies potentially important values (either probabilities or utilities) over a likely range. This process, known as “sensitivity analysis,” allows the design group to determine whether the decision is sensitive to that value, that is, whether a more precise estimate of the value would change the decision. Thus, decision analysis can direct small trials to focus on these important variables. The integrity of an analysis depends both on the values and on the model's structure, and both should be made available for evaluation. It is important to recognize, however, that decision analysis is dependent on the assumptions made about parameter values and model structure. Reviews of decision analyses should include careful critique of the model structure. (See Chapter 3 for a further discussion and an example of decision analysis.)
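A minimal sketch of the expected-utility calculation and a one-way sensitivity analysis follows. The two interventions, their side-effect probabilities, and their utilities are invented for illustration, and the function names are hypothetical. A real analysis would use a fuller decision tree with estimates drawn from data or expert opinion.

```python
def expected_utility(p_side_effect, u_no_side_effect, u_side_effect):
    """Probability-weighted average of the utilities of the two outcomes."""
    return (1 - p_side_effect) * u_no_side_effect + p_side_effect * u_side_effect

# Hypothetical best estimates for two interventions
options = {
    "A": {"p_side_effect": 0.10, "u_no_side_effect": 0.90, "u_side_effect": 0.40},
    "B": {"p_side_effect": 0.30, "u_no_side_effect": 0.95, "u_side_effect": 0.60},
}

def preferred(opts):
    """Name of the option with the highest expected utility."""
    return max(opts, key=lambda name: expected_utility(**opts[name]))

def one_way_sensitivity(opts, option, parameter, values):
    """Vary one parameter of one option over a range and record the preferred option."""
    results = []
    for v in values:
        varied = {name: dict(params) for name, params in opts.items()}
        varied[option][parameter] = v
        results.append((v, preferred(varied)))
    return results

print("preferred at best estimates:", preferred(options))
for value, choice in one_way_sensitivity(options, "B", "p_side_effect", [0.1, 0.2, 0.3, 0.4]):
    print(f"P(side effect | B) = {value:.1f} -> prefer {choice}")
```

With these made-up numbers the preferred intervention switches as the side-effect probability for B moves across its plausible range, so that probability is a natural target for a small trial.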

Ranking and Selection Design

Selection problems pervade the conduct of clinical trials. Statisticians can provide rational procedures for selection of the best of several alternatives. The way in which the statistical goal of a trial is formulated influences sample size in a substantial way. The hypothesis test has been the predominant formulation used in the design of large-scale, randomized trials, but other paradigms deserve careful consideration, especially in situations with small sample sizes. One such paradigm is ranking and selection. Ranking and selection procedures are statistical techniques for comparison of the parameters for multiple (k) study populations under the assumption that these parameters are not all the same (Gibbons, Olkin, and Sobel, 1979). These methods include techniques appropriate for the achievement of many different goals, although a careful formulation of the corresponding problem is needed for each goal. Suppose there are k populations and that each population is characterized by a parameter. For example, the k populations might be normally distributed with different means and a common variance. In this context, populations for which mean values are large are preferable to populations for which mean values are small. For any given set of k populations, some of the goals that can be accomplished by these methods are

1. selection of the one best population;

2. selection of a random number of populations such that all populations better than a control population or a standard are included in the selected subset;

3. selection of the t best populations for t ≥ 2 (a) in an ordered manner or (b) in an unordered manner;

4. selection of a random number of populations, say r, which includes the t best populations;

5. selection of a fixed number of populations, say r, which includes the t best populations;

6. ordering of all the k populations from best to worst (or vice versa); or

7. ordering of a fixed-size subset of the populations from best to worst (or vice versa) (Gibbons, Olkin, and Sobel, 1979).

Ranking and selection procedures are particularly appropriate for answering questions such as the following (Gibbons, Olkin, and Sobel, 1979):

Which one of λ different drugs produces the best response?
Which subgroup of the λ drugs produces a better response than a placebo?
Which two of α types of advertising media reach the largest proportion of potential buyers of a particular product?
Which one of β different learning techniques produces the best comprehension?

Instead of formulating the goal of a trial as the definitive rejection of a null hypothesis when it is false (with a high degree of statistical power) while limiting its rejection when it is true (at a given Type I error rate), a clinician planning a selection trial might reason as explained in Box 2-4.

A related goal is to rank three or more treatments in order of preference. Methods for ranking and selection lend themselves naturally to sequentialization. Sequential selection procedures can further reduce the sample size required to select the best of two or more treatments (Levin and Robbins, 1981). One of the ways in which ranking and selection methods can be of help in a process is by ruling out poor competitors. Suppose that investigators must choose the best of five interventions. With small sample sizes the investigators may not be able to choose the best but might be able to assert that the best is among a group of three of the interventions, although they are not sure which one is the best. Subsequent studies can then focus on choosing the best of the three interventions.

BOX 2-4

Example of a Selection Trial

Over the course of the coming year, a clinician will have N patients to treat with disease D. The clinician can treat these patients with therapy A or therapy B, but it is unclear which therapy is better. One thing is clear, however: the clinician must treat all patients with one or the other therapy and is willing to conduct a trial in which the goal is, ideally, to select the truly superior treatment. If the two treatments are in truth equally effective (with other factors such as cost and side effects being equal), the clinician should be indifferent to which therapy is selected. If one treatment is sufficiently better than the other, however, the clinician wants a high degree of probability that he or she will select the superior treatment.

In other words, the traditional hypothesis test is used for the formulation of confirmatory trials, but a selection trial identifies for further use a therapy that is sufficiently superior with a guaranteed high degree of probability. On the other hand, if the therapies are essentially equivalent, the goal is to be able to select either therapy. Because it is possible to view such selection trials as equivalent to classical hypothesis testing with a Type I error rate of 0.5 (rather than a Type I error rate of 0.05), it can be seen that selection trials generally require much smaller sample sizes than those required for the usual confirmatory trial. Note that this is not a way of cheating but is an explicit decision to acknowledge the importance of the selection paradigm over the definitive or confirmatory hypothesis test paradigm.
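The sample-size remark in Box 2-4 can be illustrated with a short calculation. The sketch assumes two arms with normally distributed outcomes, a known common standard deviation, equal allocation, and a one-sided z-test. The effect size of half a standard deviation and the 0.90 target probability are hypothetical. Setting the Type I error rate to 0.5 makes the critical value zero, which amounts to selecting whichever arm has the larger sample mean.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()

def n_per_arm(delta, sigma, alpha_one_sided, power):
    """Per-arm sample size for a one-sided z-test comparing two normal means
    (known common sigma) with the stated power at a true difference delta."""
    z_alpha = Z.inv_cdf(1 - alpha_one_sided)
    z_power = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

def prob_correct_selection(n, delta, sigma):
    """Chance that the truly better arm has the larger sample mean with n per arm."""
    return Z.cdf(delta * (n / 2) ** 0.5 / sigma)

# Hypothetical planning values: difference of 0.5 standard deviations, 0.90 target
n_selection = n_per_arm(delta=0.5, sigma=1.0, alpha_one_sided=0.5, power=0.90)
n_confirmatory = n_per_arm(delta=0.5, sigma=1.0, alpha_one_sided=0.05, power=0.90)
print("selection trial, per arm:   ", n_selection)
print("confirmatory trial, per arm:", n_confirmatory)
print("P(correct selection):", round(prob_correct_selection(n_selection, 0.5, 1.0), 3))
```

Under these assumptions the selection formulation needs roughly one-fifth as many patients per arm as the confirmatory one, at the price of making no claim about statistical significance.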

A key criterion in selection trials is the probability of selection of the “correct” treatment. Even more intriguing criteria have been proposed for the selection of a superior treatment. In a review of the second edition of Peter Armitage's book Sequential Medical Trials, Frank Anscombe introduced what has been called the “ethical cost” function, which considers the number of patients given the inferior treatment and the severity of such treatment errors (Lai, Levin, Robbins, et al., 1980).

Consider again the finite patient horizon of N patients to be treated over the course of a given time period. Suppose n pairs of patients (for a total of 2n patients) are to be considered in the trial phase, with treatment A or treatment B randomly allocated within pairs. After the trial phase, the remaining N − 2n patients will all be given the apparently superior treatment identified in the trial phase. The ethical cost function is the total number of patients given the truly inferior treatment multiplied by the magnitude of the treatment efficacy difference. If (AD) denotes the absolute difference in average endpoint levels between the two treatments, then the ethical cost is (AD)n if the truly superior treatment is selected in the trial phase (only the n trial patients on the inferior arm receive it) and (AD)(N − n) if the truly superior treatment is not selected (those n patients plus the remaining N − 2n).
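The trade-off built into the ethical cost function can be explored numerically. The sketch below evaluates the expected ethical cost of a fixed trial phase of n pairs and searches for the n that minimizes it. It assumes normally distributed endpoints with a common standard deviation and selection of the arm with the larger sample mean. The horizon, effect size, and standard deviation are hypothetical.

```python
from statistics import NormalDist

def expected_ethical_cost(n_pairs, horizon, ad, sigma):
    """Expected ethical cost for a fixed trial phase of n_pairs pairs within a
    horizon of N patients, assuming normal endpoints with common sigma."""
    # Probability that the sample mean difference favors the truly superior arm
    p_correct = NormalDist().cdf(ad * (n_pairs / 2) ** 0.5 / sigma)
    cost_if_correct = ad * n_pairs               # (AD) n
    cost_if_wrong = ad * (horizon - n_pairs)     # (AD)(N - n)
    return p_correct * cost_if_correct + (1 - p_correct) * cost_if_wrong

# Hypothetical horizon of 200 patients and a difference of half a standard deviation
N, AD, SIGMA = 200, 0.5, 1.0
costs = {n: expected_ethical_cost(n, N, AD, SIGMA) for n in range(1, N // 2 + 1)}
best_n = min(costs, key=costs.get)
print(f"fixed trial phase minimizing expected ethical cost: n = {best_n} pairs, "
      f"cost = {costs[best_n]:.2f}")
```

A very short trial phase risks sending most of the horizon to the inferior treatment, whereas a very long one exposes many trial patients to it, so the expected cost is smallest at an intermediate n.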

It is simple to implement a sequential version of the trial phase; it also has the virtue of achieving a substantially lower average ethical cost than that which can be achieved with a fixed sample size in the trial phase. A surprising feature of a large class of reasonable sequential stopping rules for the trial phase is that they can reduce the average ethical cost below that of a fixed sample size, even when the fixed-sample ethical cost is optimized for a given value of (AD). For example, one such rule will reach a decision in the trial phase in which n is no more than one-sixth of N. The main point for consideration in small trials, however, is that it may not be obvious how one rationalizes the trade-off between the number of patients put at risk in the trial and an ultimately arbitrary Type I error rate in a conventional trial. On the other hand, it may be much more desirable to design a selection trial with an ethical cost function that directly incorporates the number of patients given inferior treatment.

Adaptive Design

Adaptive designs have been suggested as a way to overcome the ethical dilemmas that arise when the early results from an RCT clearly begin to favor one intervention over another. An adaptive design seeks to skew assignment probabilities to favor the better-performing treatment in a trial that is under way (Rosenberger, 1996).

Adaptive designs are attractive to mathematicians and statisticians because they impose dependencies that require the full arsenal of techniques and stochastic processes (Rosenberger, 1996). An assortment of adaptive designs has been developed over the past few decades, including a variety of urn models that govern the sampling mechanism. Adaptive design can be associated with complex analytical problems. If the sample size is small enough, an exact analysis by exhaustive enumeration of all sample paths is one way to provide an answer. If the sample size is larger but still not large, a Monte Carlo simulation can provide an accurate analysis. If the sample size is large, then standard likelihood-based methods can be used. An example of an adaptive design is described in Box 2-5.

A major advantage of adaptive design is that over time more patients will be assigned to the more successful treatment. Stopping rules and data analysis for these types of designs are complicated (Hoel, Sobel, and Weiss, 1975), and more research is needed in this area. As with sequential designs, the disadvantage of adaptive designs is that in most trials, patients are heterogeneous with respect to the important prognostic factors, and these methods do not protect against bias introduced by changes in the types of patients entering into a trial over time. Moreover, for patients with chronic diseases, responses are usually delayed so long that the advantages of this approach are often lost. Also, multiple endpoints are usually of interest, and therefore, the entire allocation process should not be based on a single response. Play-the-winner rules can be useful in certain specialized medical situations in which ethical challenges are strong and one can be reasonably certain that time trends and patient heterogeneity are unimportant. These rules can be especially beneficial when response times are short compared with the times between patient entries into a study. An example of this is the development of extracorporeal membrane oxygenation (Truog, 1992; Ware, 1989).

BOX 2-5

Play-the-Winner Rule as an Example of Adaptive Design

A simple randomized version of the play-the-winner rule follows. An urn contains two balls; one is labeled A and the other is labeled B. When a patient is available for treatment assignment, a ball is drawn at random and replaced. If the ball is type A, the patient is assigned to treatment A; if it is type B, the patient is assigned to treatment B. When the results for a patient are available, the contents of the urn are changed according to the following rule: if the result was a success, an additional ball labeled with the successful treatment is added to the urn. If the result is a failure, a ball with the opposite label is added to the urn (Zelen, 1969).
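The urn scheme in Box 2-5 can be simulated in a few lines. The sketch assumes that each patient's result is known before the next patient is assigned, which corresponds to the short-response-time situation described above. The success probabilities are hypothetical.

```python
import random

def randomized_play_the_winner(p_success, n_patients, rng=None):
    """Simulate the urn rule: start with one A ball and one B ball, draw with
    replacement to assign each patient, then add a ball for the same treatment
    after a success or for the opposite treatment after a failure."""
    rng = rng or random.Random()
    urn = {"A": 1, "B": 1}
    assignments = []
    for _ in range(n_patients):
        total = urn["A"] + urn["B"]
        arm = "A" if rng.random() < urn["A"] / total else "B"
        success = rng.random() < p_success[arm]
        other = "B" if arm == "A" else "A"
        urn[arm if success else other] += 1
        assignments.append((arm, success))
    return assignments, urn

# Hypothetical true success rates: treatment A is clearly better
assignments, urn = randomized_play_the_winner({"A": 0.8, "B": 0.3}, 50, random.Random(0))
n_a = sum(1 for arm, _ in assignments if arm == "A")
print(f"assigned to A: {n_a} of 50; final urn: {urn}")
```

Averaged over many simulated trials, the rule sends more than half of the patients to the better treatment, although any single small trial can still favor the worse arm by chance.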

Risk-Based Allocation Design

Risk-based allocation, a nonrandomized design, has a very specific purpose: to allow individuals at higher risk or with greater disease severity to benefit from a potentially superior experimental treatment. Because the design is nonrandomized, its use should be considered only in situations in which an RCT would not be possible.

For example, when a therapy is readily available outside the study protocol or when a treatment has been in use for a long time and is perceived to be efficacious, even though it has never been subjected to a randomized trial, a nonrandomized risk-based allocation approach may be useful. Bone marrow transplantation for the treatment of advanced breast cancer is an illustration. A nationwide, multicenter randomized trial was designed to test the efficacy of harvesting bone marrow before aggressive chemotherapy followed by bone marrow transplantation with the patient's own (autologous) bone marrow for women with at least 10 axillary nodes with tumor involvement. The comparison group received the standard therapy at that time, which omitted the bone marrow transplantation procedure. Bone marrow transplantation was widely available outside the clinical trial, and women were choosing that therapy in large numbers, drastically slowing patient enrollment in the trial. It took more than 7 years (between 1991 and 1998) to achieve the target sample size of 982 women, whereas more than 15,000 off-protocol bone marrow transplantation procedures were administered during that time period. If only half of the women receiving off-protocol bone marrow transplantation had been enrolled in the trial, the target sample size would have been reached in less than 2 years. The difficulty was that when participants were informed that they faced a 50 percent chance of being randomized to the comparison group, they withheld consent to obtain bone marrow transplantation elsewhere, often just across town. The final result of the trial was that there was no survival benefit to this approach. A risk-based allocation design might have reached the same conclusion much sooner, saving many women from undergoing a very painful, expensive, and, ultimately, questionable surgical procedure.

Other examples of desperately ill patients or their caregivers seeking experimental treatments and refusing to be randomized include patients with AIDS in the early days of trials of drugs for the treatment of human immunodeficiency virus infection and caregivers of premature infants with extracorporeal membrane oxygenation. Other therapies, such as pulmonary artery (Swan-Ganz) catheter placement, estrogen treatment for Alzheimer's disease, or radical surgery for prostate cancer, have been nearly impossible to test in randomized trials because participants, convinced of their therapeutic benefits, did not want to receive the placebo or the standard therapy. These therapies have been cited in the news media because of the extreme difficulty in recruiting participants into randomized trials of the therapies (Altman, 1996; Brody, 1997; Kolata, 1995, 1997; Kolata and Eichenwald, 1999).

A risk-based allocation design attempts to circumvent these problems by ensuring that all of the sickest patients will receive the experimental treatment. The design is sometimes called an “assured allocation design” (Finkelstein, Levin, and Robbins, 1996a, b). It has also been called the “regression-discontinuity design,” although that name presupposes a specific statistical analysis that is not always appropriate.

The design has three novel features. First, it requires a quantitative measure of risk, disease severity, or prognosis, which is observed at or before enrollment in the study, together with a prespecified threshold for receiving the experimental therapy. All participants above the threshold receive the experimental (new) treatment, whereas all participants below the threshold receive the standard (old) treatment. The second novel feature of the risk-based design is the goal of the trial: to estimate the difference in average outcome for high-risk individuals who received the new treatment compared with that for the same individuals if they had received the old treatment.

Thus, in the bone marrow transplantation example, women eligible for the randomized trial had to have 10 or more nodes of involvement. In a risk-based allocation trial, all of these high-risk women would have been given bone marrow transplantation, whereas women with fewer affected nodes would have been recruited and given the standard therapy. The treatment effect to be estimated in the assured allocation design would be the survival difference for women with at least 10 nodes given bone marrow transplantation compared with that for the same group of women if they had received the standard therapy.

The risk-based allocation creates a biased allocation, and the statistical analysis appropriate for estimation of the treatment effect is not a simple comparison of the mean outcomes for the two groups, as it would be in a randomized trial. One analytical method comes from the theory of general empirical Bayes estimation, originally introduced by Herbert Robbins in the 1950s in a series of landmark papers (Lai and Siegmund, 1985; Robbins, 1956, 1977). Robbins applied this approach first to estimation problems, then to prediction problems, and later to risk-based allocation (Robbins, 1993; Robbins and Zhang, 1988, 1989, 1991). If one gives up randomization (because the trial would be impossible to carry out), one needs another principle to achieve a scientifically valid estimate of treatment effect. Therefore, the third requirement of risk-based design is a model that can be used to predict what outcomes the sicker patients would have had if they had been given the standard treatment. A prototypic example of the appropriate statistical analysis required is shown in Box 2-6.

Thus, there is good rationale for using a risk-based allocation design to compare the outcomes for high-risk patients who receive the new treatment with the predicted outcome for the same patients if they had received the standard therapy. One requires a model for the standard treatment (but only the standard treatment) that relates the average or expected outcome to specific values of the baseline measure of risk used for the allocation. Only the functional form of the model, not specific values of the model parameters, is required. This is because the parameters used in the model will be estimated from the concurrent control data, and extrapolated to the high-risk patients. This is an advantage over historical controlled studies. One need not rely on historical estimates of means or proportions of the expected outcome, which are notoriously untrustworthy. All one needs to assume for the risk-based design is that the mathematical form of the model relating outcome to risk is correctly specified throughout the entire range of the risk measure. This is a strong assumption, but with sufficient experience and prior data on the standard treatment, the form of the model can be validated. In the same way that an engineer can build a bridge without being completely agnostic about the laws of gravity and the tensile strength of steel, so progress can be made without randomization if one has a model that predicts the outcomes of a standard treatment. In addition, the validity of the predictive model can always be checked against the concurrent control data in the risk-based trial.
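The logic of the analysis can be sketched with simulated data. The example assumes a straight-line model for the standard treatment, fits it by least squares to the lower-risk participants who received the standard treatment, and extrapolates the fitted line to the high-risk participants who received the new treatment. The threshold, sample size, noise level, and true effect are all hypothetical. This plain regression extrapolation stands in for, and is simpler than, the empirical Bayes methods cited in the text.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def risk_based_effect(risk, outcome, threshold):
    """Mean of (observed outcome on new treatment) minus (outcome predicted under
    standard treatment) for participants at or above the risk threshold."""
    low = [(x, y) for x, y in zip(risk, outcome) if x < threshold]
    high = [(x, y) for x, y in zip(risk, outcome) if x >= threshold]
    intercept, slope = fit_line([x for x, _ in low], [y for _, y in low])
    diffs = [y - (intercept + slope * x) for x, y in high]
    return sum(diffs) / len(diffs)

# Simulated trial: the outcome under standard treatment is linear in baseline risk
rng = random.Random(11)
TRUE_EFFECT = -10.0   # hypothetical change in outcome caused by the new treatment
risk = [rng.uniform(0.0, 1.0) for _ in range(400)]
outcome = []
for x in risk:
    y = 20.0 + 30.0 * x + rng.gauss(0.0, 3.0)
    if x >= 0.6:      # participants at or above the threshold get the new treatment
        y += TRUE_EFFECT
    outcome.append(y)

estimate = risk_based_effect(risk, outcome, threshold=0.6)
print(f"estimated treatment effect in the high-risk group: {estimate:.2f}")
```

The estimate is close to the simulated effect only because the straight-line form is correct over the whole risk range here. If the true relation curved above the threshold, the same calculation would be biased, which is why the text stresses validating the functional form.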

The usual problem of extrapolation beyond the range of data does not arise here for three reasons. First, one assumes that the mathematical form of the model relating outcome to risk is correctly specified throughout the entire range of the risk measure. If one does not know what lies beyond the range of data, then extrapolation is risky. Thus, in this situation one should assume a validated model for standard treatment that covers the whole range of the risk measure, including data for those high-risk patients that form part of the observed data. Estimation of the model parameters from a por-

Page 54 Cite

Suggested Citation:"2 Design of Small Clinical Trials." Institute of Medicine. 2001. Small Clinical Trials: Issues and Challenges. Washington, DC: The National Academies Press. doi: 10.17226/10078.

×

Page 54

BOX 2-6

Example of General Empirical Bayes Estimation

Suppose one collects data on the number of traffic accidents that each driver in a population of motorists had during a 1-year baseline period. Most drivers will have no accidents, some will have one, some will have two, and so on. If one focuses on the subgroup of drivers who had no accidents during the baseline period, one can then ask the following question: assuming that traffic conditions and driving habits remain stable, how many accidents in total would the same drivers with no accident in the baseline year be predicted to have in the next year? A model is needed to make a prediction. A reasonable statistical model is that the number of accidents that a single driver has in a 1-year period follows a Poisson distribution, the standard probability law governing the occurrence of rare events. Subject-to-subject variability requires one to assume that the mean value for a parameter according to a Poisson distribution (the number of accidents expected per year) varies from driver to driver: some drivers have very safe driving habits with a small expected number of accidents per year, whereas others have less safe driving habits.

A key feature of a general empirical Bayes analysis is that no assumption about the distribution of the Poisson mean parameters in the population of drivers needs be made. In this case, the term “general empirical Bayes” does not mean empirical Bayes generally but, rather, refers to the kind of empirical Bayes method that does not make assumptions about the prior distribution (in contrast to the parametric variety used by Robbins [1956]). Robbins proved that an unbiased and asymptotically optimal predictor of the number of accidents next year by the drivers who had no accidents in the baseline year is the number of drivers who had exactly one accident in the baseline year. The proof of this assertion is based only on the assumption of the form of the model for outcomes (Poisson distribution), without any parametric assumption about how the model parameter is distributed among participants in the population. What is amazing—and the reason that this example is presented—is that information about one group of people (the drivers with no accidents) can be consistently and asymptotically optimally predicted on the basis of information about an entirely different group of people (the drivers with one accident), which is characteristic of empirical Bayes methods. There is no question that the two groups are different: even though the groups of drivers with no accidents includes some unsafe drivers who had no accidents by good fortune, the drivers in that group are, nevertheless, safer drivers on average than the drivers in the group with one accident, even though the latter group includes some safe drivers who were unlucky. This illustrates that the complete homogeneity and comparability of two groups so avidly sought after in randomized comparisons is actually not necessary to make valid comparisons, given adequate model assumptions and an appropriate (not naïve) statistical analysis.

Finally, one can observe the number of accidents next year among those with no accidents in the baseline year and compare that number with the predicted number using a 95 percent prediction interval based on the baseline data. An approximate 95 percent prediction interval is given by the predicted number plus or minus 1.96 times the square root of twice the number of drivers with either exactly one accident or exactly two accidents (Finkelstein and Levin, 1990). If the observed number is found to differ markedly from the predicted number, there are grounds to reject the starting assumption that driving conditions and habits remained the same. See the section Statistical Analyses in Appendix A for further discussion of risk-based allocation.
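
The prediction and its approximate interval can be tried on simulated data. The following is a minimal sketch, not the authors' analysis: the function names, the gamma distribution used to generate driver-specific accident rates, and the simulated sample size are all assumptions chosen for illustration. The predictor itself never uses the gamma assumption, only the baseline counts.

```python
import math
import random
from collections import Counter


def robbins_prediction(baseline_counts, z=1.96):
    """Predict next-year accidents among drivers with zero baseline accidents.

    Returns (predicted, half_width), where predicted is the number of drivers
    with exactly one baseline accident and half_width is
    z * sqrt(2 * (drivers with exactly one or exactly two accidents)).
    """
    tally = Counter(baseline_counts)
    n1, n2 = tally[1], tally[2]
    return n1, z * math.sqrt(2 * (n1 + n2))


def poisson_draw(rng, mean):
    """Draw one Poisson variate (Knuth's method; adequate for small means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate(n_drivers=20000, seed=1):
    """Simulate two years of accidents with stable, driver-specific rates."""
    rng = random.Random(seed)
    # Hypothetical mixing distribution for the Poisson means (mean rate 0.5).
    rates = [rng.gammavariate(2.0, 0.25) for _ in range(n_drivers)]
    baseline = [poisson_draw(rng, r) for r in rates]
    next_year = [poisson_draw(rng, r) for r in rates]
    observed = sum(y for x, y in zip(baseline, next_year) if x == 0)
    predicted, half_width = robbins_prediction(baseline)
    return observed, predicted, half_width


if __name__ == "__main__":
    observed, predicted, half_width = simulate()
    print(f"observed={observed}, predicted={predicted} +/- {half_width:.1f}")
```

Because the simulated rates are held fixed across the two years, the observed total should usually fall inside the interval; changing the rates between years in the simulation is one way to see the comparison flag a departure from stable conditions.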

tion of the data and then use of the model to predict responses for high-risk patients is not equivalent to extrapolation of the data into some unknown region of the sample data. Second, the model can be validated with the observed data, which increases confidence in the model over the unobserved data. Third, the effect of extrapolation is accurately reflected by the standard errors, but the effect is not some wild inflation into unknown territory. This third assumption is an important one, and identification of the appropriate model must be accomplished before a risk-based trial can be undertaken. Once the necessary model is developed, there are no other hidden assumptions. The reliability of the available data is important to this approach.

A clinical example from Finkelstein, Levin, and Robbins (1996b) is given in Box 2-7. That example uses a simple linear model to relate the reduction in the level of total serum cholesterol from the baseline to the end of follow-up to a preliminary measurement of the cholesterol level among a group of cholesteremic, sedentary men in the placebo arm of a well-known randomized trial of the cholesterol-lowering compound cholestyramine. If the trial had been designed as a risk-based allocation trial, the actually observed lowering of the cholesterol level among the highest-risk (the most cholesteremic) men given the active drug could have been compared on the basis of a simple linear model with the lowering predicted for the same men while they were receiving a placebo. The example illustrates that the risk-based approach would have arrived at the same estimate of treatment effect for those at higher risk as the RCT did.

BOX 2-7

Potential Effectiveness of Replacing Randomized Allocation with Risk-Based (Assured) Allocation (in Which All Higher Risk Participants Receive the New Treatment)

High levels of cholesterol (at least the low-density lipoprotein component) are generally regarded as a risk factor for heart disease. A primary prevention trial was conducted in which 337 participants were randomly assigned to treatment arms to evaluate the ability of cholestyramine to lower total plasma cholesterol levels. The group with high cholesterol levels (>290 mg/dl) had an average reduction of 34.42 mg/dl with a treatment effect (the reduction in the cholestyramine-treated high-cholesterol subgroup minus the reduction in the high-cholesterol placebo controls) of 29.40 ± 3.77 mg/dl (standard error) (Lipid Research Clinical Program, 1984). The results also suggest that the drug is less effective in absolute terms for participants with lower initial total plasma cholesterol levels (<290 mg/dl). By applying a risk-based allocation model to the same data, the treatment effect is estimated for participants at higher risk (>290 mg of total plasma cholesterol/dl) to be 30.76 ± 8.02 mg/dl, which is close to the result of the RCT of 29.40 mg/dl. Thus, for the high-risk patients, the results from the trial with a risk-based allocation design are virtually identical to those of the trial with the conventional design (Finkelstein, Levin, and Robbins, 1996b).
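
The logic of the comparison can be sketched in code. This is a minimal illustration under stated assumptions, not the analysis of Finkelstein, Levin, and Robbins: the function names are hypothetical, the model is an ordinary least-squares line fitted only to lower-risk participants on the standard (placebo) regimen, the cutoff rule is simplified, and the numbers in the usage example are made up rather than taken from the trial. A real analysis would also need a standard error that reflects the extrapolation.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def risk_based_effect(baseline, reduction, cutoff):
    """Estimate the treatment effect for participants above the risk cutoff.

    Participants at or below the cutoff are assumed to have received the
    standard treatment; those above it received the new treatment. The effect
    is the mean of (observed reduction - reduction predicted by the line
    fitted to the lower-risk group).
    """
    low = [(b, r) for b, r in zip(baseline, reduction) if b <= cutoff]
    high = [(b, r) for b, r in zip(baseline, reduction) if b > cutoff]
    intercept, slope = fit_line([b for b, _ in low], [r for _, r in low])
    return mean(r - (intercept + slope * b) for b, r in high)


if __name__ == "__main__":
    # Made-up toy data: baseline cholesterol (mg/dl) and observed reduction.
    baseline = [250, 260, 270, 280, 290, 300, 310, 320]
    reduction = [5, 6, 7, 8, 9, 40, 41, 42]
    print(risk_based_effect(baseline, reduction, cutoff=290))
```

In the toy data the lower-risk reductions lie exactly on a line and the higher-risk reductions sit 30 mg/dl above its extension, so the estimate recovers that offset.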

Some cautions must be observed when risk-based allocation is used. The population of participants entering a trial with a risk-based allocation design should be the same as that for which the model was validated so that the form of the assumed model is correct. Clinicians enrolling patients into the trial need to be comfortable with the allocation rule, because protocol violations raise difficulties just as they do in RCTs. Finally, the standard error of estimates will reflect the effect of extrapolation of the model predictions for the higher-risk patients on the basis of the data for the lower-risk patients. Because of this, a randomized design with balanced arms will have smaller standard errors than a risk-based design with the same number of patients. In the example of the study of cholestyramine in Box 2-7, the standard error for the risk-based design was slightly more than twice that for the randomized design.
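
The size of that penalty can be read off the two standard errors reported in Box 2-7. The short calculation below is only a rough sketch: the variable names are illustrative, and the last step assumes that a standard error shrinks in proportion to one over the square root of the sample size, which is a textbook rule of thumb rather than something the source states for this design.

```python
# Standard errors (mg/dl) of the treatment-effect estimates quoted in Box 2-7.
se_randomized = 3.77
se_risk_based = 8.02

# How many times larger the risk-based standard error is.
se_ratio = se_risk_based / se_randomized

# Under the assumed 1/sqrt(n) scaling, the squared ratio approximates how many
# times more participants the risk-based design would need for equal precision.
sample_size_factor = se_ratio ** 2

print(f"SE ratio: {se_ratio:.2f}; implied sample-size factor: {sample_size_factor:.1f}")
```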

What do these ideas have to do with small clinical trials? Consider the example of bone mineral density loss among astronauts. An obvious risk factor that correlates with bone mineral density loss is the duration of the mission in space: the longer the mission, the greater the bone mineral density loss. What would be required in a risk-based study design is the mathematical form of this relationship for some standard countermeasures (countermeasure is the term that the National Aeronautics and Space Administration uses for a preventive or therapeutic intervention that mitigates bone mineral density loss or other physiological adaptations to long-duration space travel). Astronauts who will be on extended future missions on the International Space Station will be at higher risk than those who have shorter stays. If those on the longer missions (who are at higher risk) were to receive new experimental countermeasures, their bone mineral density losses could be compared on a case-by-case basis to a prediction of what their bone mineral density loss would have been by use of the standard countermeasures. Such comparisons of observed versus expected or predicted outcomes are familiar in other studies with small sample sizes, such as studies searching for associations of rare cancers with a variety of toxic exposures.

Finally, any trial conducted in an unblinded manner has a potential bias. In some cases a trial with a risk-based allocation design need not be conducted in an unblinded manner; for example, patients may be assured of receiving an active experimental treatment together with a placebo standard treatment if they are at high risk or a placebo experimental treatment together with an active standard treatment if they are lower risk. The trial may be conducted in a blinded manner if the risk measure is not obvious to the patient. In many cases, however, the trial, such as a surgical intervention trial, would have to be unblinded. The issue is nothing new. Solid endpoints unaffected by investigator bias and careful protocols for permitted concomitant behavior are the best safeguard in unblinded trials.

SUMMARY

Scientific research has a long history of well-established, well-documented, and validated methods for the design, conduct, and analysis of clinical trials. An appropriate study design has a sufficient sample size, adequate statistical power, and proper control of bias to allow a meaningful interpretation of the results. The committee strongly reaffirms that, whenever feasible, clinical trials should be designed and performed so that they have adequate statistical power.

When the clinical context does not provide a sufficient number of research participants for a trial with adequate statistical power but the research question has great clinical significance, the committee understands that, by necessity for the advancement of human health, research will proceed. Bearing in mind the statistical power, precision, and validity limitations of studies with small sample sizes, the committee notes that there are innovative design and analysis approaches that can improve the quality of such trials. In small clinical trials, it is more likely that the sample population will share several unique characteristics, for example, disease, exposures, or environment. Thus, it might be more practical in some small clinical trials than in large clinical trials to involve the participants in the design of the trial. By doing so, the investigator can increase the likelihood of compliance, adherence to the regimen, and willingness to participate in monitoring and follow-up activities. It is also important for investigators to consider confidentiality and privacy in disseminating the results of studies whose sample populations are easily identified. Investigators should also keep in mind opportunities for community discussion and consultation during the planning and conduct of all clinical trials.

RECOMMENDATIONS

Because of the constraints of trials with small sample sizes, for example, trials with participants with unique or rare diseases or health conditions, it is particularly important to define the research questions and select outcome measures that are going to make the best possible use of the available participants while minimizing the risks to those individuals.

RECOMMENDATION: Define the research question. Before undertaking a small clinical trial it is particularly important that the research question be well defined and that outcomes and conditions to be evaluated be selected in a manner that will most likely help clinicians make therapeutic decisions.

RECOMMENDATION: Tailor the design. Careful consideration of alternative statistical design and analysis methods should occur at all stages in the multistep process of planning a clinical trial. When designing a small clinical trial, it is particularly important that the statistical design and analysis methods be customized to address the clinical research question and study population.

Clinical researchers have proposed alternative trial designs, some of which have been applied to small clinical trials. For a smaller trial, when the anticipated effect is not great, researchers may encounter a difficult tension between scientific purity and pragmatic necessity. One approach might be to focus on a simple, streamlined hypothesis (not multiple ones) and choose one means of statistical analysis that does not rely on any complicated models and that can be widely validated. An alternative approach is to choose a model-dependent analysis, effectively surrendering any pretense of model validation, knowing that there will not be enough information to validate the model, a risk that could compromise the scientific validity of the trial.

The committee believes that the research base in this area requires further development. Alternative designs have been proposed in a variety of contexts; however, they have not been adequately examined in the context of small studies.

RECOMMENDATION: More research on alternative designs is needed. Appropriate federal agencies should increase support for expanded theoretical and empirical research on the performances of alternative study designs and analysis methods that can be applied to small studies. Areas worthy of more study may include theory development, simulated and actual testing including comparison of existing and newly developed or modified alternative designs and methods of analysis, simulation models, study of limitations of trials with different sample sizes, and modification of a trial during its conduct.

Because of the limitations of small clinical trials, it is especially important that the results be reported with accompanying details about the sample size, sample characteristics, and study design. The details necessary to combine evidence from several related studies, for example, measurement methods, main outcomes, and predictors for individual participants, should be published. There are two reasons for this: first, it allows the clinician to appropriately interpret the data within the clinical context, and second, it paves the way for future analyses of the study, for example, as part of a sequential design or a meta-analysis with other small clinical trials. In the clinical setting, the consequences might be greater if one misinterprets the results. In the research setting, insufficiently described design strategies and methods diminish the study's value for future analyses.

RECOMMENDATION: Clarify methods in reporting of results of clinical trials. In reporting the results of a small clinical trial, with its inherent limitations, it is particularly important to carefully describe all sample characteristics and methods of data collection and analysis for synthesis of the data from the research.