Need for Innovative Designs in Research on CAM and Conventional Medicine
CHARACTERISTICS OF CAM TREATMENTS AND MODALITIES
Standard randomized controlled trials (RCTs), which consist of two or three study arms, large numbers of patients in each arm, one specific, standard treatment or dose per arm, and 1 or 2 years of follow-up, may be ill-suited to answer questions about the long-term effects of complementary and alternative medicine (CAM) therapies on disease prevention and wellness. Several characteristics of CAM treatments and modalities are also difficult to incorporate into treatment effectiveness studies with shorter time lines, as well as into studies with more clearly defined symptom relief or disease state endpoints. These characteristics are not unique to CAM and are discussed further below.
CAM modalities frequently use “bundles” of therapies rather than just one therapy in isolation. Survey data show that patients who use one CAM modality frequently use other CAM modalities at the same time and use CAM modalities along with conventional medicine treatments for the same condition (Eisenberg et al., 1993, 1998; Wolsko et al., 2002). Although it may be possible to enroll patients in a study that restricts their treatments to one at a time, it is difficult, scientifically questionable, and possibly even unethical to restrict for study purposes treatments that would naturally accompany the specific therapy or modality being studied. For example, it may be difficult to conduct an RCT of a specific massage therapy technique if a large fraction of patients who receive this treatment in routine practice would also receive various combinations of herbal therapies, aromatherapy, stretching and exercise recommendations, and relaxation therapies.
It is often difficult to define the thing to be studied. Patients receiving homeopathy might also be accurately described as receiving a particular type, class, or school of homeopathic treatment; treatment from a particular type of provider or individual provider; and treatment with a particular material or combination of materials. Research could conceivably be done to establish the effectiveness of any of these things, from the most general to the most specific. The level of analysis that would be most informative for clinicians, individual patients, or health policy makers is not obvious. This problem is occasionally encountered in conventional medicine but less commonly than in CAM, as questions about effectiveness typically pertain to very precisely defined therapies rather than to whole disciplines or schools of thought (e.g., medicine, surgery, or radiation therapy). As a matter of convenience, one may speak of a study comparing surgical and medical treatments for low back pain, but a study would typically define the treatments in each domain quite specifically and not presume to be evaluating all possible treatments that might be offered under those broad labels. In CAM, however, there is a greater tendency to pose research questions about the effectiveness of whole modalities or schools of thought; for example, does chiropractic work for back or neck pain, and does acupuncture work for headache?
In CAM, treatments are individualized for each patient, and treatments may be individualized for each patient at each treatment (Park, 2002). One reason that research questions may be posed about whole CAM modalities is that in some CAM modalities (e.g., traditional Chinese medicine) there is no such thing as a “standard” treatment or dose. Individualization of therapy to a unique combination of patient characteristics is a core concept of the modality. The only common characteristics to be studied across multiple patients and generalized from a study sample to a larger universe of patients are the modality and the general approach taken by the practitioner. Everything else can and will vary from patient to patient, at least in principle.
Some treatments are presumed to depend on the unique characteristics of the healer and on features of the healer-patient relationship. In some of the energy or touch therapies, for example, qi gong, the effectiveness of the treatment is presumed to be inherently bound up in a skill or an ability of the healer that may be viewed as a gift and therefore not easily measurable or generalizable (Krieger, 1998). This is not a completely foreign concept in research on conventional medicine: studies of surgical procedures typically take the skill or experience of the surgeon into account in some way, and studies of psychotherapy may take into account some measure of the skill, empathy, or experience of the therapist. Variation in the talents of service providers is a complicating factor in any study, in conventional medicine as well as in CAM, and it can be very problematic if the skill or talent of the healer cannot be quantified in the way that experience (e.g., number of patients treated) can. The problem is more complicated still when treatment effectiveness is presumed to depend on a particular relationship, rapport, or bond between the patient and the healer. Unless that relationship or rapport can be defined and assessed at the start of a research trial, there is a risk that a poor outcome will be taken as evidence that the necessary relationship did not exist and a good outcome as evidence that it did.
For many CAM therapies, there is a need to pay explicit attention to placebo or expectation effects. In most studies in conventional medicine that include a placebo control arm, the goal of the study is to show that the treatment in question is superior to the placebo. The underlying assumption is that a placebo effect is not real biologically and that the treatment being studied can be deemed to have an effect only if the outcomes that result from the treatment are significantly better than those from the placebo. In many CAM modalities (and in some conventional medicine modalities as well), however, the placebo effect is an inherent part of the mechanism of treatment efficacy. That is, the benefit obtained by the patient is at least partially due to his or her own sense of hope, positive expectation, and activation of self-healing processes. One cannot design a study to eliminate these processes as explanations for outcomes, since they are, by definition within the CAM modality, not a source of noise or confounding but part of the essence of the treatment itself.
In evaluations of CAM therapies, end points may be difficult to measure in a standardized way. The techniques used to measure subjective experiences such as pain, fatigue, the ability to perform daily activities, and mood state have advanced significantly in the past 20 years (IOM, 1999). CAM treatments intended to produce benefits in these areas should be evaluable with existing, standardized measures that have strong scientific foundations.
Other potential outcomes of CAM treatments, however, are not as well defined or measurable. Feelings of general well-being, energy balance, harmony, or centeredness may be harder to measure in a reliable way, and perhaps hard to interpret outside the worldview or belief system of a specific CAM modality. Patients receiving an energy-based CAM therapy, for example, may very well understand questions about energy balance, and reliable and valid measures may be developed in the context of that therapy. The questions may not make as much sense to patients and the measures may not work as well, however, for patients receiving other treatment modalities. It will therefore be difficult to compare scores on such a measure across groups in comparative studies of the energy balance therapy and other CAM or non-CAM therapies. The same problem could hold in reverse, in that quantitative measures of pain intensity, for example, may not
make sense and may not have acceptable psychometric properties for patients receiving CAM modalities that do not take a quantitative approach to sensations like pain.
In both CAM and conventional medicine, there are treatments that have some defined boundaries or ranges of acceptable options, as embodied in a training manual, but the healer or provider may have immense room to use variations and his or her own judgment in individual interactions with specific patients. Many psychotherapies, for example, have a general framework and some well-defined features or boundaries, but the specific words used or issues raised at any point in time in a therapy session may differ. These decisions are up to the therapist and are based on a combination of formal training, experience, instinct, and immediate feedback from the patient. It is extremely difficult to study the effectiveness of a specific utterance or even sequence of microlevel interactions between the therapist and the patient, but it may be possible to study the effectiveness of an individual therapist or the approach to therapy taken as a whole. Similarly, in some CAM modalities, it will not be possible to study the effectiveness of a specific maneuver performed in the context of a 30-minute hands-on interaction with a patient (e.g., massage), but it may be possible to evaluate the effectiveness of the approach taken as a whole in comparison with that of some alternative approach to the same problem.
INNOVATIVE STUDY DESIGNS TO ASSESS TREATMENT EFFECTIVENESS OF CAM1
Addressing the special challenges mentioned above for research in CAM will require a broadening of thinking about the types of study designs that can produce valid evidence of treatment effectiveness. RCTs and systematic reviews of multiple RCTs will still stand as the “gold standard” of evidence when the key questions have to do with treatment efficacy and when the treatment is amenable to the narrow definition, standardization, and the use of strict controls typical of RCTs. (See Chapter 5 for a discussion of such trials.) When RCTs cannot be done, however, or when the results of RCTs may not be generalizable to the real world of CAM practice, it will be necessary to use other study designs. Some of these options are described in the following sections.
N-of-1 Trials

For some CAM therapies and some patients, it may be possible to organize a series of off-on administrations of a specific therapy. For example, baseline pulmonary function can be tested in patients receiving a homeopathic therapy for hay fever or asthma; the therapy can then be administered for a period of time, stopped for a period of time and replaced by a placebo, readministered for a period of time, and so on. As long as neither the clinician nor the patient must be an active participant in the essential therapeutic process, both can be blinded as to whether the active treatment or the placebo is being given. This may be feasible with many homeopathic or herbal treatments, but it may not be possible with manual manipulation or aromatherapy. The sequence of off and on periods may also be randomized within and across patients.
Treatments for stable, chronic conditions are best suited to this sort of study, as treatment effectiveness can be determined by the extent to which a defined outcome (seasonal allergies or asthma, in this example) varies with administration of the treatment under study. Inferences are cleanest when a short latency exists between treatment administration and effect and when the treatment has little or no long-lasting effect. When these conditions hold, an N-of-1 trial (a trial with a single subject) can provide strong evidence of the effectiveness of the treatment for that patient. Multiple N-of-1 trials of the same treatment with pooling of the results for adequate numbers of patients can provide the same kind of evidence of effectiveness that would be available through traditional RCTs, assuming that the patients are representative of some larger population to whom the results can be generalized. This approach would be particularly well suited to CAM therapies that are highly individualized. Each N-of-1 trial, if successful, would provide evidence of the effectiveness of a specific treatment in that one patient; multiple successful trials would provide evidence of the effectiveness of the general concept or approach.
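The pooling logic can be sketched in a small simulation. The effect size, number of off-on blocks, and outcome scale below are hypothetical, chosen only to illustrate how within-patient active-versus-placebo differences from a series of N-of-1 trials aggregate into population-level evidence.

```python
import random
from statistics import mean

random.seed(0)

def n_of_1_trial(true_effect, n_blocks=4, noise=1.0):
    """Simulate one N-of-1 trial with alternating placebo/active blocks.

    Returns the within-patient mean difference (active - placebo) on a
    hypothetical symptom score where higher values mean more relief.
    """
    diffs = []
    for _ in range(n_blocks):
        placebo = random.gauss(0.0, noise)
        active = random.gauss(true_effect, noise)
        diffs.append(active - placebo)
    return mean(diffs)

# Pool results across many patients; each trial can use an
# individualized treatment as long as the modality is the same.
patient_effects = [n_of_1_trial(true_effect=0.8) for _ in range(50)]
pooled = mean(patient_effects)
responders = sum(d > 0 for d in patient_effects)
print(f"pooled mean difference: {pooled:.2f}; responders: {responders}/50")
```

Each patient's own trial answers "does it work for me?"; the pooled mean and responder count answer the generalization question, provided the patients resemble the target population.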
Preference Randomized Controlled Trials

In most RCTs, patients who agree to participate in the trial also agree to accept randomization to study arms, that is, to active treatment or placebo. They receive the treatment to which they are randomized, regardless of any preferences that they may have. This kind of study may be difficult to carry out when treatments are already in widespread use, are generally presumed to be effective, or simply seem as though they should be more effective or less risky. In these situations a “preference RCT” is appropriate and may also allow the effects of patient preferences on outcomes to be tested empirically.
In a preference RCT (Brewin and Bradley, 1989; McPherson and Britton, 1999; Pocock and Elbourne, 2000), a pool of eligible patients is first asked to indicate whether they have a preference among the treatments being compared. Those who have a preference are given that treatment. Those expressing no preference are randomized to a treatment arm as in a traditional RCT. If the pool of patients is sufficiently large, the design allows three sets of comparisons to be made among the treatments: (1) the effectiveness of different treatments among the randomized patients (which is the same as that in a traditional RCT); (2) the effectiveness of different treatments in those who chose those treatments; and (3) the effectiveness of a specific treatment in those randomized to it compared with the effectiveness in those who chose it. This analysis provides a stronger base from which to make inferences about the effects of treatments in routine daily practice, when patients typically receive a particular treatment on the basis of their preferences.
Wennberg and colleagues (1993) describe a pilot preference RCT in the article “Outcomes Research, PORTs, and Health Care Reform.” The currently funded NIH Spine Patient Outcomes Research Trial (SPORT), which is in the final stages of recruiting, is another example of this design.
This type of study design may be useful for the study of many CAM modalities, for which therapies are widely presumed by practitioners and the lay public to be safe and effective and for which patients may have existing preferences either for or against a specific therapy.
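The three comparisons available in a preference RCT can be illustrated with a toy computation. The outcome scores below are invented for illustration only and do not come from any actual trial; "A" and "B" stand for any two treatments being compared.

```python
from statistics import mean

# Hypothetical outcome scores (higher = better) for the four cells of
# a preference RCT: patients who chose a treatment vs. patients with
# no preference who were randomized to it.
outcomes = {
    ("chose", "A"): [7, 8, 6, 9, 7],
    ("chose", "B"): [5, 6, 5, 7, 6],
    ("randomized", "A"): [6, 7, 6, 8, 6],
    ("randomized", "B"): [5, 5, 6, 6, 5],
}

def cell_mean(arm, treatment):
    return mean(outcomes[(arm, treatment)])

# (1) Classic RCT comparison: A vs. B among randomized patients.
rct_diff = cell_mean("randomized", "A") - cell_mean("randomized", "B")
# (2) A vs. B among patients who chose their treatment.
pref_diff = cell_mean("chose", "A") - cell_mean("chose", "B")
# (3) Preference effect: chose A vs. randomized to A.
choice_effect_A = cell_mean("chose", "A") - cell_mean("randomized", "A")

print(rct_diff, pref_diff, choice_effect_A)
```

Comparison (3) is the one unavailable in a traditional RCT: it estimates how much of the benefit observed in routine practice is attributable to patients receiving the treatment they wanted.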
Observational and Cohort Studies
Observational and cohort studies involve the identification of patients who are eligible for study and who may receive a specified treatment, but the choice of therapy is not controlled as part of the study. The problems with the inferences about effectiveness that can be drawn from observational studies are well known, but in some instances data from such studies may be the only or the best data available. One of the best-known recent examples comes from the Women’s Health Initiative (WHI). In response to observational data suggesting that hormone supplements may improve a woman’s health peri- and postmenopause, the WHI prospectively evaluated the benefits and risks to women of taking hormones during menopause and concluded that the overall health risks exceeded the benefits (Rossouw et al., 2002).
The problems with causal inferences in studies with these designs mainly have to do with the possibility that unmeasured patient characteristics, not balanced by random assignment to treatment, may be the true cause of any effects observed (Little and Rubin, 2000). Methods that control for measured characteristics (e.g., analysis of covariance, linear regression, and stratified analysis) have been available for many years, but methods that control for unmeasured characteristics are relatively new. It is now possible to control for baseline patient characteristics (measured and unmeasured) in better ways through analyses such as instrumental variable analysis (Hogan and Lancaster, 2004; Newgard et al., 2004; Leigh and Schembri, 2004; Mealli et al., 2004). A detailed discussion of these analytic methods is beyond the charge of this committee, but such methods allow valid causal inferences about treatment effectiveness to be drawn from observational studies.
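The intuition behind instrumental variable analysis can be sketched with simulated data. Everything below is invented: the confounder strength, the treatment effect, and the instrument (imagined here as something like distance to a provider, which influences whether a patient receives the treatment but has no direct effect on the outcome).

```python
import random
from statistics import mean

random.seed(1)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return mean((x - ma) * (y - mb) for x, y in zip(a, b))

# Simulated observational data: an unmeasured confounder u drives both
# treatment intensity x and outcome y; z is an instrument that affects
# x but influences y only through x.
n = 20_000
u = [random.gauss(0, 1) for _ in range(n)]  # unmeasured confounder
z = [random.gauss(0, 1) for _ in range(n)]  # instrument
x = [0.5 * zi + 1.0 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
true_effect = 0.3
y = [true_effect * xi + 1.0 * ui + random.gauss(0, 1)
     for xi, ui in zip(x, u)]

naive = cov(x, y) / cov(x, x)  # ordinary slope, biased by the confounder
iv = cov(z, y) / cov(z, x)     # instrumental-variable estimate

print(f"naive slope: {naive:.2f}, IV slope: {iv:.2f}, truth: {true_effect}")
```

The naive regression slope absorbs the confounder's influence and overstates the treatment effect, while the IV estimate recovers a value close to the truth; real analyses would use an established two-stage least squares implementation rather than this single-instrument shortcut.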
Case-Control Studies

The other study designs discussed in this chapter are prospective; that is, a pool of eligible patients is identified before treatment is given, and the patients are then monitored through the period of treatment with a series of structured and scheduled measurement instruments. For some questions about CAM treatment effectiveness, however, it may not be possible to mount a reasonable prospective study (for example, if there is no practical way of identifying patients with a defined health problem or of identifying and recruiting patients before treatment begins). In such cases it may be useful to seek evidence of effectiveness by evaluating data for large numbers of patients who have received the treatment in the past. A case-control study is one example of a study that starts with outcomes and works backward.
A case-control study involves the identification of people with good or bad health outcomes (e.g., those with a serious illness and those without an illness, those who died of an illness and those who were cured, or those who had relief of chronic pain, and those who did not), and then the assessment of a large number of variables, including the treatments received, to identify the factors correlated with a good or a bad outcome.
The case-control design has a long history in epidemiology and public health; in many instances it is the only effective way of conducting a first inquiry into a presumed cause-effect relationship. The case-control design has important limitations: no matter how detailed and thorough the data collection may be, it is still possible that unknown or unmeasured variables may be the true cause of the differences in outcomes observed and that the relationships observed in the study are not truly causal (Gordis, 1996). Despite its limitations, a case-control study may be an effective way to begin a line of inquiry about treatment effectiveness in CAM, as long as the inquiry continues by use of studies with stronger prospective designs to confirm any presumed causal relationships determined from the findings of the case-control study.
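The basic case-control computation is an odds ratio from a 2×2 table. The counts below are hypothetical, chosen only to show the arithmetic; "exposure" might be prior use of a given CAM therapy, with cases defined by a poor outcome and controls by a good one.

```python
from math import exp, log, sqrt

# Hypothetical 2x2 case-control table (counts are illustrative only).
exposed_cases, unexposed_cases = 30, 70
exposed_controls, unexposed_controls = 50, 150

# Cross-product odds ratio: odds of exposure among cases divided by
# odds of exposure among controls.
odds_ratio = (exposed_cases * unexposed_controls) / (
    unexposed_cases * exposed_controls)

# Approximate 95% confidence interval on the log-odds scale
# (Woolf's method).
se = sqrt(1 / exposed_cases + 1 / unexposed_cases
          + 1 / exposed_controls + 1 / unexposed_controls)
low = exp(log(odds_ratio) - 1.96 * se)
high = exp(log(odds_ratio) + 1.96 * se)
print(f"OR = {odds_ratio:.2f} (95% CI {low:.2f}-{high:.2f})")
```

An odds ratio near 1 (or an interval spanning 1) would argue against an association between the exposure and the outcome; as the text notes, even a clear association would need prospective confirmation before being read as causal.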
Studies of Bundles or Combinations of Therapies
As mentioned above, it is uncommon for CAM treatments to be given alone, either as CAM monotherapy or as a strict alternative to conventional medicine. Instead, most patients use a mix of CAM and conventional therapies simultaneously. Studying the effectiveness of one part of a complex mix of treatments is difficult unless it is possible, in the context of a complex study design, to vary one part of a package of therapies while the rest of the package is held constant. In most instances it will be difficult or impossible to isolate the effects of one part of a complex treatment package, but it may be possible to study the effectiveness of the bundle as a whole by using essentially any of the designs described in this section. This will not be fully satisfying to most scientists trained in Western reductionist traditions, but such studies may be adequate to help patients make informed decisions about treatment approaches or to help health policy makers and insurance companies make decisions about coverage and payment.
Some study designs and analytic methods, however, are better suited than others to unraveling the effects of specific parts of a complex treatment package. Observational studies with very large sample sizes can evaluate multiple instances of a large number of specific treatment combinations. They also allow the observation of many complex interactions between patient characteristics and treatment features. The choice of analytic method depends on the presumed underlying mathematics of the combined effects of vectors of patient, provider, treatment, and environmental factors. If these relationships are presumed to be basically linear and additive, then well-known multiple linear regression or logistic regression models can be used to achieve at least a first approximation to the causal relationships in question. A class of methods known as recursive partitioning may be appropriate if the relationships are presumed to be multiplicative or interactive (i.e., the effects of one variable depend on the presence or value of one or more other variables), a very likely assumption in many CAM studies in which the interactions among patient characteristics and treatments are presumed to be crucial. Again, a detailed discussion is beyond the scope of this report, but well-developed statistical methods specifically designed to identify the interactive effects of large numbers of causal factors acting simultaneously on a defined outcome variable are available.
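A minimal sketch of the idea behind recursive partitioning follows, using invented binary data in which the treatment helps only patients with high expectations, a purely interactive effect that an additive model would misstate. Real applications would use an established implementation (e.g., CART) rather than this toy variance-reduction splitter.

```python
from statistics import mean, pvariance

def best_split(rows, y, features):
    """Find the single split that most reduces total outcome variance."""
    base = pvariance(y) * len(y) if len(y) > 1 else 0.0
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [yi for r, yi in zip(rows, y) if r[f] <= t]
            right = [yi for r, yi in zip(rows, y) if r[f] > t]
            if len(left) < 2 or len(right) < 2:
                continue
            cost = pvariance(left) * len(left) + pvariance(right) * len(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    # Split only if it actually reduces variance.
    return best if best is not None and best[0] < base - 1e-12 else None

def grow(rows, y, features, depth=0, max_depth=2):
    """Recursively partition until no split helps or max_depth is hit."""
    split = best_split(rows, y, features) if depth < max_depth else None
    if split is None:
        return {"mean": mean(y), "n": len(y)}
    _, f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return {"split": (f, t),
            "left": grow([rows[i] for i in li], [y[i] for i in li],
                         features, depth + 1, max_depth),
            "right": grow([rows[i] for i in ri], [y[i] for i in ri],
                          features, depth + 1, max_depth)}

# Invented data: outcome improves only when the patient is treated AND
# has high expectations -- an interactive, non-additive effect.
rows = [{"treated": t, "expectation": e}
        for t in (0, 1) for e in (0, 1) for _ in range(10)]
y = [3 + 4 * (r["treated"] * r["expectation"]) for r in rows]
tree = grow(rows, y, ["treated", "expectation"])
print(tree["split"])  # the tree first separates treated from untreated
```

The fitted tree splits first on treatment and then, within the treated group, on expectation, directly exposing the subgroup (treated, high-expectation) in which the benefit is concentrated.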
Studies of “Manualized” Therapies
Many CAM therapies involve the application of general concepts, theories, or methods but allow for considerable variation in the selection of a
specific intervention for a single patient at a single point in time; acupuncture and herbal medicine are examples. In most instances this variation is an inherent part of the underlying philosophy of the CAM modality, as it allows the treatment to be tailored to the characteristics of the patient, his or her symptoms, the practitioner, and the time and place of treatment. It is not error or unwanted treatment variability; on the contrary, it may be part of the essence of the CAM approach being evaluated.
The standardization of treatment characteristic of most clinical research in conventional medicine is therefore inappropriate for studies of these “manualized” therapies that make up part of CAM. By definition and theory, these treatments cannot be standardized in the same way in which drug treatments are standardized by substance, dose, and route and timing of administration. There is precedent for effectiveness research in this domain, however, most notably in psychotherapy (Wampold et al., 1997). In psychotherapy effectiveness research, a model, theory, or general approach is defined and standardized; but the specific utterances by the therapist and the content of interactions between therapist and patient vary.
Effectiveness studies can be conducted on those aspects of manualized therapies that can be defined and standardized: one general approach versus another approach, one school of thought versus another school of thought, or one intensity or duration of treatment versus another. These studies would be examples of what Tunis et al. (2003) call “practical clinical trials.” With some CAM modalities, it may be possible to study the effectiveness of an approach, school of thought, or intensity of treatment with a no-treatment or placebo control as the comparison group. Once effectiveness has been shown relative to no-treatment controls, studies can be designed to compare more specific features of the general approach or modality.
The designs used for these kinds of studies are not necessarily any different from those used for effectiveness studies in conventional medicine. RCTs, as well as studies with less well-controlled prospective or retrospective designs, may be possible. Statistical methods, outcome measures, sample sizes, and the scope of the conclusions that are drawn may also be essentially the same, because the essence of a typical study would be the comparison of an average outcome and variability in outcomes across two or more groups defined by differences in treatment approaches.
It will not be possible, however, to draw conclusions about any of the specific aspects of treatment that vary without constraint, nor will it be possible to draw conclusions about the effectiveness of an individual provider or therapist unless well-controlled N-of-1 study designs are used in which the individual therapist is the intervention being studied.
Placebo or Expectation Effects
Many CAM modalities include patients’ hopes, expectations, emotional states, energies, and other self-healing processes as part of their core “mechanisms of action.” Studies of effectiveness of these modalities and therapies cannot consider these factors to be extraneous confounders that are separate from the mechanism(s) of action being tested, as would typically be the case in effectiveness research in conventional medicine.
If the core research question in a CAM effectiveness study involves the identification of a mechanism of action apart from or in addition to nonspecific placebo or expectation effects, then a traditional two-arm study comparing a particular treatment with a placebo control would be appropriate. Studies of herbal remedies with inert substances in the control condition or studies of acupuncture with sham-treated controls (Biella et al., 2001) are examples of this kind of study design. The only CAM therapies or modalities for which this design would not be appropriate are those that claim no mechanism of action other than the patients’ own expectations or self-healing processes.
It is also possible to design studies that specifically manipulate the nonspecific placebo or expectation effects to determine whether variation of the “dose” of this variable can influence outcomes. For example, Pollo et al. (2001) conducted a study of how different expectations can produce different analgesic effects. Three groups of patients were treated with buprenorphine, given on request for 3 consecutive days, plus a basal intravenous infusion of saline solution; however each group was given different information about the basal infusions. Group A was told nothing; Group B was told that the infusion was either a powerful painkiller or a placebo; and Group C was told that it was a powerful painkiller. The results are shown in Table 4-1.
The investigators concluded that “different verbal instructions about certain and uncertain expectations of analgesia produce different placebo analgesic effects, which in turn, trigger a dramatic change of behavior leading to a significant reduction of opioid intake” (Pollo et al., 2001).
TABLE 4-1 Effect of Expectation on Analgesic Effects

Group                                    Mean Dose (mg) of Buprenorphine Administered
Group A (told nothing)                   1.15 ± 1.14
Group B (told painkiller or placebo)     0.91 ± 0.11
Group C (told powerful painkiller)       0.76 ± 0.15

Given that expectation or placebo effects are generally presumed to work in a positive direction, it is difficult to imagine an ethically defensible study design in which expectations were specifically manipulated in negative directions (i.e., telling patients that a treatment does not work). Accrual to such a study, or the willingness of patients to accept random assignment to a study arm described in that way, would presumably be challenging. The ethical and practical limit to the manipulation of expectation effects is probably the absence of expectation. Even this limit will be difficult to reach in many studies of CAM effectiveness if the modalities or therapies are widely believed to be effective in the general population.
Even for CAM modalities whose mechanisms of action are largely or exclusively patient expectations or self-healing processes, it may be possible to design studies that compare the relative abilities of two or more modalities to activate those processes and produce measurable health benefits. For example, an ongoing study of patients with irritable bowel syndrome funded by the National Center for Complementary and Alternative Medicine is exploring whether placebo effects (via a sham acupuncture treatment) can be enhanced through variations in patient-provider contexts.
Attribute-Treatment Interaction Analyses
Attribute-treatment interaction analysis is not a study design per se but a way of analyzing data from studies with other designs. A likely result of effectiveness studies in both CAM and conventional medicine, almost regardless of study design, is variability in outcomes among patients within a study and among different studies. This variability leads to questions about its causes, which can often be explored through analysis of the subgroups in which the treatments are relatively more or less effective. These analyses are referred to as “attribute-treatment interaction analyses” (Caspi and Bell, 2004a,b).
Because most effectiveness trials are designed with sufficient power to detect differences at the level of the sample as a whole, most subgroup analyses are exploratory in nature, with the conclusions subject to confirmation in more definitive studies conducted later. A variety of statistical methods are available to perform these analyses (for example, see the earlier discussion of recursive partitioning methods); these methods would not be fundamentally different in studies of the effectiveness of CAM than in studies of the effectiveness of conventional medicine. The variables used to classify patients would probably be different, however, since diagnostic and other clinical labels identifying meaningful categories of patients would differ between CAM modalities and conventional medicine and among CAM modalities.
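The core computation in an attribute-treatment interaction analysis is a difference of differences: the treatment contrast is estimated within each stratum of a patient attribute, and the interaction is the difference between those stratum-specific contrasts. The records below are invented for illustration, with baseline expectation level standing in for the stratifying attribute.

```python
from statistics import mean

# Hypothetical patient-level records from an effectiveness study:
# (treatment, attribute stratum, outcome score). Scores are invented.
records = [
    ("A", "high", 8), ("A", "high", 7), ("A", "low", 4), ("A", "low", 5),
    ("B", "high", 5), ("B", "high", 6), ("B", "low", 5), ("B", "low", 4),
]

def effect(stratum):
    """Mean A-minus-B treatment difference within one attribute stratum."""
    a = mean(o for t, s, o in records if t == "A" and s == stratum)
    b = mean(o for t, s, o in records if t == "B" and s == stratum)
    return a - b

# Treatment A beats B only among high-expectation patients: an
# attribute-treatment interaction.
interaction = effect("high") - effect("low")
print(effect("high"), effect("low"), interaction)
```

A nonzero interaction flags a subgroup finding worth confirming; as noted above, such analyses are usually exploratory because the parent trial was powered for the whole-sample comparison, not for subgroups.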
Qualitative Studies

Qualitative methods are not an alternative design for addressing effectiveness questions but a way to make better decisions about measurement, sampling, recruitment, and other aspects of a study design. Questions about treatment effectiveness in CAM and in conventional medicine are typically quantitative in nature, involving assessments of more or less of some defined outcome characteristic among patients treated in one way versus another. Evidence for treatment effectiveness in both CAM and conventional medicine therefore typically comes from quantitative studies that use the designs and methods discussed above.
Qualitative research (ethnographic studies, focus groups, and in-depth interviews) cannot generally provide direct evidence of treatment effectiveness because of the relatively small sample sizes, the retrospective versus the prospective nature of participant recruitment and sampling, the absence of random assignment of patients to treatment conditions, and the use of open-ended versus categorical or close-ended data collection formats.
Qualitative research can, however, provide extremely valuable information to help interpret the results of effectiveness studies or to design those studies in the best possible way. Qualitative methods can be used to
•  understand the types of patients who use a particular CAM modality, their reasons for using that modality (including perceived effectiveness), and the circumstances or conditions of use;
•  understand other treatments that those patients may be using in addition to the specific modality being studied;
•  understand patients’ and practitioners’ definitions of and criteria for treatment effectiveness;
•  identify factors that may predict better or worse effectiveness (e.g., different levels of patient expectations and better or worse therapist-patient interactions); and
•  understand patients’ and providers’ models of health and illness and how those models influence CAM use and assessment of treatment effectiveness.
USE OF BOTH TRADITIONAL AND INNOVATIVE STUDY DESIGNS TO CREATE A RICH BODY OF KNOWLEDGE
The committee does not wish to recommend a single study design as invariably superior to the others or to recommend that studies of treatment effectiveness in CAM always be conducted in a specific way. Alternative study designs have different combinations of strengths and weaknesses; the richest source of information will be the combined results of studies with several
different designs if the strengths of one complement the weaknesses of another. Classic RCTs, for example, will provide strong evidence of cause-and-effect relationships in carefully controlled circumstances, but ideally, the results would be complemented by the results of outcomes or effectiveness studies if the fundamental questions have to do with treatment effectiveness in real-world practice settings (IOM, 1999; Jonas and Linde, 2002).
The use of a variety of study designs to produce a rich, complementary body of evidence for specific treatments or modalities is a desirable approach, but in practice, only limited amounts of money and time are available for effectiveness studies. Study sponsors may have to choose between traditional and innovative study designs, at least at any one point in time, if trials are expensive and budgets are limited.
In those circumstances, trade-offs need to be examined in the context of the question(s) being addressed. If the fundamental question is one of safety, then a surveillance design capable of picking up rare but serious events is indicated. If the therapy is relatively new and unknown and the key questions have to do with efficacy, then a traditional RCT design would fit. If efficacy is accepted but the questions to be addressed have to do with effectiveness across a range of providers and settings, then a large outcomes study aimed at identifying determinants of good and poor outcomes may be indicated. If the key questions have to do with cost-effectiveness, then a more tightly focused outcomes study (i.e., one with fewer patients, providers, or treatment sites) that includes explicit collection of cost data will be required.
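The sample-size implications of a safety-oriented surveillance design can be made concrete with a standard back-of-the-envelope calculation (the so-called "rule of three"). The sketch below is illustrative only and not drawn from the report; the event rates are hypothetical.

```python
import math

def n_for_detection(event_rate: float, confidence: float = 0.95) -> int:
    """Smallest cohort size n such that the probability of observing at
    least one adverse event is >= `confidence`, given a true per-patient
    event rate: P(at least one event) = 1 - (1 - rate)^n."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - event_rate))

# A serious event occurring in 1 of 10,000 patients requires a
# surveillance cohort of roughly 30,000 to be observed even once with
# 95% probability -- far larger than a typical RCT arm.
print(n_for_detection(1 / 10_000))  # 29956
```

The calculation illustrates why rare but serious harms are usually detected by large observational surveillance systems rather than by efficacy trials.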
RELATIONSHIP BETWEEN BASIC RESEARCH AND CLINICAL RESEARCH
For many treatments, the results of RCTs or other types of clinical studies are the culmination of a much larger sequence of basic research studies that grow out of, contribute to, and increase the understanding of fundamental biological mechanisms of illness. Clinical trials of newer therapies for peptic ulcer, for example, were built on years of basic research on the roles of bacteria and acids in the generation of ulcers. Clinical trials of statins for the treatment of cardiovascular disease were based on years of basic research on the role of cholesterol in cardiovascular disease, and studies of new treatments based on reducing inflammation in coronary arteries will follow basic research on the role of inflammatory processes in the progress from coronary artery disease to acute myocardial infarction.
A crucial synergy exists between basic and clinical research. Basic research seeks to expand knowledge and understanding of the biological mechanisms of illness and treatment. Much of clinical research builds on the results of basic research to determine whether treatments based on new
concepts of illness and treatment can produce measurable benefits in defined groups of patients. Findings from clinical research may reinforce the insights gained from basic research or may reveal surprising results that lead to new questions or hypotheses to be tested in laboratory studies. Federal funding agencies (primarily NIH) support a balance of basic and clinical research studies, recognizing that the synergy between the two is crucial to advancing the fundamental science base of medicine. For NIH as a whole, one-third of the funding committed to research is spent on clinical research; for the NIH National Center for Complementary and Alternative Medicine “the ratio of clinical to basic research funding over time was 4:1 in FY 2000, 3:1 in FY 2001, 2.6:1 in FY 2002, 2.5:1 in FY 2003, and will likely fall a little further in 2004” (NCCAM, 2004).
A future strategy for funding CAM research will have to address questions about an appropriate balance between basic and clinical research and related questions about the available infrastructures for both types of studies. For example,
Should reviewers of proposals for clinical studies in CAM require that there be a foundation of basic research on the underlying mechanisms for the therapy being studied? If so, what must that foundation include? How extensive should it be? Should there be evidence of new insights or breakthroughs, or would it be sufficient for there to be a widely accepted theory (within the relevant provider community) about underlying mechanisms of treatment action?
Should special requests for proposals be issued for studies of the basic biological mechanisms of specific CAM therapies? If so, for which therapies and which mechanisms should they be issued? Should there be an emphasis on therapies or modalities for which there is significant disagreement about their basic mechanisms in the relevant CAM provider community, or should there be an emphasis on therapies or modalities in which there is general consensus among CAM providers but significant skepticism or lack of understanding of the basic mechanisms among traditional biological scientists?
If support is given to basic research in CAM, would it be required that the results of the studies have some direct relevance to either current or new CAM treatments, or should support be provided to “knowledge for its own sake”?
As a condition for funding a body of clinical research on a specific CAM modality, should NIH require some minimum level of ongoing related basic research to expand knowledge of the underlying mechanisms? Or are there CAM modalities for which it would be acknowledged that such basic studies are either unnecessary or impossible to conduct but that clinical studies would be useful nonetheless? In other words, in most clinical studies there is an implicit understanding that a failure of the experimental treatment to produce the expected effect will call into question the assumptions made about the underlying mechanisms and will require the investigators to go back to the drawing board. This may not be the case for some CAM modalities.
If there is an absence or shortage of existing infrastructure (facilities, trained investigators, or a supportive academic environment) for basic research on an important CAM modality, should the funding strategy emphasize infrastructure development before specific research projects?
CONCEPTUAL MODELS TO GUIDE RESEARCH
Federal agencies supporting research on the effectiveness of CAM therapies may adopt one or more of a variety of conceptual models to guide their decision making about a research agenda and then on the subsequent task of translating research findings into practice guidelines or public policy decisions. The following sections describe several of the possible conceptual models.
Basic Science Excellence
In the basic science excellence model, the highest priority is given to projects that may provide significant breakthroughs in or enhancements of understanding of fundamental biological mechanisms. The concept can be extended to funding decisions about clinical research, in which a conscious choice would be made to fund studies that shed light on underlying mechanisms in preference to those that address only more limited efficacy or effectiveness questions.
Quality of Evidence
In the quality of evidence model, a well-designed study is more important than the ability of a study to shed new light on basic biological processes or mechanisms of treatments. The most important criteria used to make funding decisions are sample size, blinding of study participants, the use of clean methods of data collection and sophisticated methods of data analysis, statistical power, and the clarity of the inferences. An elegant, clean, powerful study addressing a relatively mundane question would be preferred over a less well-designed study addressing a more intriguing question.
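The link between sample size and statistical power in this model can be sketched with the usual normal-approximation formula for a two-arm comparison of means. This is an illustrative calculation, not one taken from the report; the effect sizes are hypothetical.

```python
import math
from statistics import NormalDist

def per_arm_n(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed per arm of a two-arm RCT comparing
    means, via the normal approximation:
        n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2
    where d is the standardized effect size (Cohen's d)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# Halving the expected effect size roughly quadruples the required
# sample size -- one reason proposals are judged on statistical power.
print(per_arm_n(0.5))   # 63 per arm
print(per_arm_n(0.25))  # 252 per arm
```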
Cost-Effectiveness
Cost-effectiveness could actually refer to two different conceptual models. On the one hand, cost-effectiveness could refer to a property of the treatment or CAM modality in question. One could preferentially study CAM modalities with known or expected relatively good cost-effectiveness. Or, one could design studies to assess the cost-effectiveness of a modality or a specific therapy and require that clinical studies include a cost-effectiveness component to be funded. On the other hand, the term could refer to a property of the studies being proposed. A relatively explicit calculation of study cost versus the value of the information to be gained would be done, and only those studies with the best balance would be funded, regardless of other considerations.
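The first sense of cost-effectiveness is conventionally summarized by the incremental cost-effectiveness ratio (ICER): the extra cost of a therapy per extra unit of health benefit gained. The sketch below uses entirely hypothetical costs and quality-adjusted life year (QALY) values for illustration.

```python
def icer(cost_new: float, cost_std: float,
         effect_new: float, effect_std: float) -> float:
    """Incremental cost-effectiveness ratio: additional cost per
    additional unit of health effect (e.g., dollars per QALY gained)."""
    return (cost_new - cost_std) / (effect_new - effect_std)

# Hypothetical example: a CAM-plus-usual-care strategy costs $2,400 and
# yields 0.75 QALYs; usual care alone costs $1,800 and yields 0.70 QALYs.
print(icer(2400, 1800, 0.75, 0.70))  # ~ $12,000 per QALY gained
```

A study designed to support this kind of calculation must collect cost data explicitly alongside clinical outcomes, which is the point of the "tightly focused outcomes study" described earlier in the chapter.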
Consumer Preference
Consumer preference also has two potential meanings. First, one could design a funding strategy based on the current or potential popularity of CAM modalities or specific treatments. Studies of the most popular or widely used therapies would receive funding preference, under the assumption that it would be more important to gain knowledge of treatment efficacy or effectiveness in those areas than elsewhere. Second, one could preferentially fund studies in which patient preferences would be specifically included. Funding agencies might solicit proposals for preference RCTs so that the results of the studies would perhaps be more generalizable to daily clinical practice, in which patient preferences and expectations are part of the milieu that affects treatment outcomes.
CONCLUSIONS AND RECOMMENDATIONS
This chapter has explored the characteristics of CAM treatments and modalities that make it difficult to apply the traditional RCTs or treatment-effectiveness studies used in conventional medicine. These characteristics include the use of multiple therapies (both CAM and conventional medicine) at the same time, individualization of therapies, the importance of the therapist to the outcome, placebo or expectation effects, the different outcomes valued, and manual treatments. The chapter has also discussed study designs that might be used to address some of these characteristics, including N-of-1 trials, preference RCTs, observational and cohort studies, case-control studies, studies of bundles or combinations of therapies, and attribute-treatment interaction analyses. Qualitative research can also help to increase understanding of such things as the types of patients who use
particular CAM therapies, their motivations for the use of such therapies, and how they understand health and illness.
The committee believes that it is desirable to use a variety of study designs in the conduct of research on CAM therapies. Given the limited amount of funding available for clinical studies of CAM therapies, decisions about what to evaluate should be made on the basis of one or more of the following criteria. Clearly, no intervention will meet all criteria, and a therapy should not be excluded from consideration because it fails to meet any one particular criterion, for example, biological plausibility. However, the absence of such a mechanism inevitably will raise the level of skepticism about the potential effectiveness of a treatment (whether conventional or CAM) and will increase both the basic research needed to justify funding for clinical studies and the level of evidence from clinical studies needed to consider a treatment as “established.”
A biologically plausible mechanism exists for the intervention but it is recognized that the science base on which plausibility is judged is a work in progress.
Research could plausibly lead to the discovery of biological mechanisms of disease or treatment effect.
The condition is highly prevalent (e.g., diabetes mellitus).
The condition causes a heavy burden of suffering.
The potential benefit is great.
Some evidence that the intervention is effective already exists.
Some evidence that there are safety concerns exists.
The research design is feasible and research will likely yield an unambiguous result.
The target condition or the intervention is important enough to have been detected by existing population surveillance mechanisms.
Should CAM be held to the same standards of evidence as conventional medicine? Regardless of the specific choices made about study design, whether traditional or innovative, a question that the committee addressed was whether CAM therapies should be held to the same standards of evidence as medications, surgical procedures, or other therapies used in conventional medicine. By the “same standards of evidence,” the committee means, for example, that an insurance company would require “A-level evidence” (that is, evidence derived from consistent findings from multiple RCTs) to include specific herbal therapies in a pharmacy benefit or formulary if it required A-level evidence for coverage of prescription drugs.
Research on treatment effectiveness is research about cause-effect relationships between the provision of particular treatments and defined patient outcomes. That is, the hypothesis being tested in effectiveness research is that Treatment A produces Health Benefit Y. Although CAM and conventional medicine may differ in terms of the nature of the treatments provided and the presumed mechanisms by which treatments produce beneficial effects, there is no fundamental difference in the basic nature of either the cause-effect relationships being tested or the major domains of patient outcomes being studied. Therefore,
The committee recommends that the same principles and standards of evidence of treatment effectiveness apply to all treatments, whether currently labeled as conventional medicine or CAM. Implementing this recommendation requires that investigators use and develop as necessary common methods, measures, and standards for the generation and interpretation of evidence necessary for making decisions about the use of CAM and conventional therapies.
Currently, CAM and conventional medicine are viewed as two separate sources of ideas to investigate for possible inclusion of therapies in the evidence-based interventions for comprehensive care. The fact that these are viewed separately implies that different principles and standards of evidence are applied. The committee believes that whether the source of an idea is CAM or whether it is conventional medicine, the same principles and standards of evidence should apply. There are unproven ideas of all kinds, both conventional and CAM, which should be studied using a variety of methods. The results of these studies then move the therapies from unproven ideas to evidence-based practice or comprehensive care.
Chapter 3 of this report discusses three different hierarchies of evidence. Hierarchies of evidence are helpful in making judgments about a body of evidence and address the public’s need for advice about how to identify better-quality studies. Not all CAM modalities are easily amenable to evaluation, however, and the committee noted that there are several considerations involved in applying levels-of-evidence concepts. These include
The importance of carefully defining the treatment or modality being studied. A given study may be designed to provide evidence on, for example, the effectiveness of a specific batch of an herbal product, a formulation of that product that is unique to a specific manufacturer but presumably consistent over time, an herb in general (e.g., St. John’s wort), or the whole concept of herbal medicine. In RCTs in conventional medicine, the treatment or modality being studied is typically very narrowly defined, for example, a specific dose, timing of administration, and route of administration of a specific compound. The application of this concept to some CAM modalities in which treatments are tailored may lead to a host of “N-of-1” RCTs.
The significance of characteristics of the provider as well as the treatment. Controlled trials of surgical procedures have been done less frequently than studies of medications because it is much more difficult to standardize the process of surgery. Surgery depends to some degree on the skills and training of the surgeon and the specific environment and support team available to the surgeon. A surgical procedure in the hands of a highly skilled, experienced surgeon is different from the same procedure in the hands of an inexperienced and unskilled surgeon (Hu et al., 2003). For many CAM modalities, it is similarly difficult to separate the effectiveness of the treatment from the effectiveness of the person providing the treatment. Indeed, the idea of conceptual separation of treatment and provider would seem foreign for those modalities. The designs of studies of CAM modalities that involve the active participation of a “healer” must incorporate the characteristics of that person as well as the characteristics of the treatment being applied by that person.
Different underlying theoretical and diagnostic systems. Concepts of levels of evidence and evidence-based medicine in conventional medicine rely on a generally accepted diagnostic classification system that is embodied in formal diagnostic systems like the International Classification of Diseases-Version 10 (ICD-10) and the Diagnostic and Statistical Manual-Version IV (DSM-IV). It will be somewhat challenging to apply similar study designs, measures of clinical endpoints, and standards of evidence to therapies that use different diagnostic systems and therefore identify different sets of patients as the group to whom the study results apply. It will be even more challenging to apply these concepts to any CAM modalities that emphasize the uniqueness of each individual patient and that patient’s complex of symptoms and that avoid diagnostic classifications entirely.
Endpoints like feelings of emotional or spiritual well-being that are difficult to measure. The most important dependent variables in many CAM modalities will be hard to define in objective terms and may vary from patient to patient (Jonas and Linde, 2002). A study of whether acupuncture is effective for patients with cancer may not be able to focus on mortality or shrinkage of tumors but instead may have to focus on whether the patients feel relief of pain and other symptoms and whether they feel more in control of their illness and are better able to manage the cancer along with their other daily tasks.
Difficult or impossible to conduct double-blind trials with some modalities. The concept of blinding, in which the patients and the treating clinicians participating in clinical trials do not know what treatment the patient is receiving, is an important way to minimize expectation effects and biases on the part of both the patient and the clinician. For most CAM modalities, however, blinding is very difficult or impossible.
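As an illustration of the “N-of-1” designs mentioned above, a single patient can be crossed over repeatedly between randomized treatment and control periods, with each pair of periods yielding one within-patient difference. The sketch below analyzes such paired differences with a simple sign test; the symptom scores are hypothetical, and this is one possible analysis among several, not a method prescribed by the report.

```python
from math import comb

def sign_test_p(diffs: list[float]) -> float:
    """Two-sided sign test: probability, under the null hypothesis of no
    treatment effect, of a split among nonzero differences at least as
    extreme as the one observed."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    k = sum(d > 0 for d in nonzero)          # pairs favoring control
    extreme = min(k, n - k)
    p = sum(comb(n, i) for i in range(extreme + 1)) * 2 / 2 ** n
    return min(p, 1.0)

# Hypothetical pain scores (0-10 scale): treatment period minus control
# period across six randomized period pairs for one patient.
pair_diffs = [-2, -3, -1, -2, 0, -2]
print(sign_test_p(pair_diffs))  # 0.0625: 5 of 5 nonzero pairs favor treatment
```

Even with all informative pairs favoring treatment, six period pairs barely approach conventional significance, which shows why N-of-1 trials usually inform decisions for that individual patient rather than general efficacy claims.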
A CAM research portfolio with a variety of types of studies will provide a great deal of knowledge about the use of CAM therapies by the American public. The next chapter discusses what is known about efficacies of some CAM therapies, identifies existing gaps, and proposes a framework that can be used to conduct research on CAM.
REFERENCES
Biella G, Sotgiu ML, Pellegata G, Paulesu E, Castiglioni I, Fazio F. 2001. Acupuncture produces central activations in pain regions. Neuroimage 14(1 Pt 1):60–66.
Brewin CR, Bradley C. 1989. Patient preferences and randomised clinical trials. BMJ 299(6694):313–315.
Caspi O, Bell IR. 2004a. One size does not fit all: Aptitude-Treatment Interaction (ATI) as a conceptual framework for CAM outcome research. Part I. What is ATI research? J Altern Complement Med 10(3).
Caspi O, Bell IR. 2004b. One size does not fit all: Aptitude-Treatment Interaction (ATI) as a conceptual framework for CAM outcome research. Part II. Research designs and their applications. J Altern Complement Med 10(4).
Eisenberg DM, Kessler RC, Foster C, Norlock FE, Calkins DR, Delbanco TL. 1993. Unconventional medicine in the United States: Prevalence, costs, and patterns of use. N Engl J Med 328(4):246–252.
Eisenberg DM, Davis RB, Ettner SL, Appel S, Wilkey S, Van Rompay M, Kessler RC. 1998. Trends in alternative medicine use in the United States, 1990-1997: Results of a follow-up national survey. JAMA 280(18):1569–1575.
Gordis L. 1996. Epidemiology. Philadelphia, PA: W.B. Saunders Company.
Hogan JW, Lancaster T. 2004. Instrumental variables and inverse probability weighting for causal inference from longitudinal observational studies. Stat Methods Med Res 13(1): 17–48.
Hu JC, Gold KF, Pashos CL, Mehta SS, Litwin MS. 2003. Role of surgeon volume in radical prostatectomy outcomes. J Clin Oncol 21(3):401–405.
IOM (Institute of Medicine). 1999. Gulf War Veterans: Measuring Health. Washington, DC: National Academy Press.
Jonas WB, Linde K. 2002. Conducting and Evaluating Clinical Research on Complementary and Alternative Medicine. In: Gallin JI, ed. Principles and Practice of Clinical Research. San Diego, CA: Academic Press. Pp. 401–426.
Krieger D. 1998. Dolores Krieger, RN, PhD healing with therapeutic touch. Interview by Bonnie Horrigan. Altern Ther Health Med 4(1):86–92.
Leigh JP, Schembri M. 2004. Instrumental variables technique: Cigarette price provided better estimate of effects of smoking on SF-12. J Clin Epidemiol 57(3):284–293.
Little RJ, Rubin DB. 2000. Causal effects in clinical and epidemiological studies via potential outcomes: Concepts and analytical approaches. Annu Rev Public Health 21:121–145.
McPherson K, Britton A. 1999. The impact of patient treatment preferences on the interpretation of randomised controlled trials. Eur J Cancer 35(11):1598–1602.
Mealli F, Imbens GW, Ferro S, Biggeri A. 2004. Analyzing a randomized trial on breast self-examination with noncompliance and missing outcomes. Biostatistics 5(2):207–222.
NCCAM (National Center for Complementary and Alternative Medicine). 2004. National Center for Complementary and Alternative Medicine: The First Five Years. Bethesda, MD: National Institutes of Health.
Newgard CD, Hedges JR, Arthur M, Mullins RJ. 2004. Advanced statistics: The propensity score—a method for estimating treatment effect in observational research. Acad Emerg Med 11(9):953–961.
Park CM. 2002. Diversity, the individual, and proof of efficacy: Complementary and alternative medicine in medical education. Am J Public Health 92(10):1568–1572.
Pocock SJ, Elbourne DR. 2000. Randomized trials or observational tribulations? N Engl J Med 342(25):1907–1909.
Pollo A, Amanzio M, Arslanian A, Casadio C, Maggi G, Benedetti F. 2001. Response expectancies in placebo analgesia and their clinical relevance. Pain 93(1):77–84.
Rossouw JE, Anderson GL, Prentice RL, LaCroix AZ, Kooperberg C, Stefanick ML, Jackson RD, Beresford SA, Howard BV, Johnson KC, Kotchen JM, Ockene J. 2002. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: Principal results from the Women’s Health Initiative randomized controlled trial. JAMA 288(3):321–333.
Tunis SR, Stryer DB, Clancy CM. 2003. Practical clinical trials: Increasing the value of clinical research for decision making in clinical and health policy. JAMA 290(12):1624–1632.
Wampold BE, Mondin GW, Moody M, Stich F, Benson K, Ahn H. 1997. A meta-analysis of outcome studies comparing bona fide psychotherapies: Empirically, “all must have prizes.” Psychol Bull 122(3):203–215.
Wennberg JE, Barry MJ, Fowler FJ, Mulley A. 1993. Outcomes research, PORTs, and health care reform. Ann NY Acad Sci 703:52–62.
Wolsko PM, Eisenberg DM, Davis RB, Ettner SL, Phillips RS. 2002. Insurance coverage, medical conditions, and visits to alternative medicine providers: Results of a national survey. Arch Intern Med 162(3):281–287.