3
Contemporary Approaches to Evidence of Treatment Effectiveness: A Context for CAM Research

Evidence of treatment effectiveness from clinical research has become integral to effective clinical care. This chapter provides a context for the committee’s recommendations about research on complementary and alternative medicine (CAM) that appear in Chapters 4 and 5. A brief account of the development of present approaches to evidence-driven clinical and public policy (which includes practice guidelines and coverage policy) is presented. This is followed by a description of the basic ideas of clinical research design, including a taxonomy of study design and a taxonomy of outcome measurements. An account of some features of contemporary data analysis follows. The chapter concludes with an overview of the applicability of contemporary clinical research methods to some CAM therapies.

A BRIEF ACCOUNT OF THE DEVELOPMENT OF TREATMENT EFFECTIVENESS RESEARCH

As noted in Chapter 1, over the past twenty years practitioners of conventional medicine have made a marked shift from a reliance on experience (directly observed or as recorded by others in medical journals) to a reliance on more rigorous research to evaluate the effectiveness of treatments. For example, the concept of formal evaluation of therapies through randomized controlled trials is certainly not new (Kaptchuk and Kerr, 2004) but has regularly been applied in Western medicine only since World War II (Byar, 1980; Cochrane, 1972).

Some notable exceptions to reliance on experience exist, however. In the middle of the nineteenth century, Florence Nightingale pioneered the
application of epidemiological and statistical methods to the study of hospital deaths, and her discoveries of a plausible causal relationship between processes of care and outcomes led to challenges to the then-prevalent ideas about mechanisms of disease, changes in clinical practice, and improvements in mortality rates. In the early twentieth century, Ernest Amory Codman, a Boston surgeon, argued strongly for the formal study of surgical outcomes in an effort to understand which surgeons, hospitals, and surgical procedures produced good versus bad outcomes (Neuhauser, 2002). This effort did not take root and grow (in fact, it provoked significant hostility among Codman’s colleagues), but it raised the question of the need for formal analysis of treatment outcomes that was picked up again more than 50 years later.

The need for formal evidence of effectiveness for common medical and surgical interventions was recognized much more broadly beginning in the 1970s. Passage of Medicare and Medicaid legislation, together with apparent advances in medical and surgical care, contributed to a surge in health care spending. Policy makers and payers asked increasingly pointed questions about the “value” of health care, questions that required more fundamental questions about the effectiveness of interventions to be answered.

Even more disquieting questions emerged from a body of work that described striking variations in the rates of common surgical procedures, such as surgery for benign prostate disease, hysterectomy, and tonsillectomy, among seemingly similar geographic regions. This work began in the late 1960s in northern New England, where isolated hospital market areas could easily be defined (Wennberg and Gittelsohn, 1982).
International differences in the rates of medical procedures were observed, and when the variations within countries were adjusted for the overall rate of variation among countries, a consistent pattern was detected: a high degree of variation was a marker for a high degree of discretion, and a high degree of discretion was often explained by professional uncertainty about effectiveness (McPherson et al., 1982). At the time, few of the procedures in question had been subjected to randomized controlled clinical trials or other credible clinical studies. Subsequently, variations in the rates of medical admission, physician visits, and diagnostic tests that could not be explained by clinical variables were also found. Taken together, these findings raised new questions about the science base of clinical practice. If decisions were based on science, how could it be that treatment depended more on where one lived than what was wrong or what one cared about? Policy makers wondered if high rates meant overuse and economic waste and if low rates meant underuse and deprivation. “Which rate was right?” became the pressing policy question, and the answer required a new investment in clinical research to better define the outcomes of common interventions for common conditions. Thus, the practice variation phenomenon
provided the motivation and the rationale for the development of “outcomes research.” The goal of outcomes research was to determine what was known and what was not known about common interventions, thereby setting research agendas for common conditions. Existing evidence was systematically reviewed by using techniques for combining data from different studies previously described in the social sciences. Claims data linked to Social Security Administration mortality and other administrative data were used to glean outcomes information to fill the gaps in knowledge that existed at the time. Patient surveys were conducted to capture patients’ subjective responses to treatments and outcomes. Variations in these responses highlighted the importance of patients’ preferences as a source of warranted variation in clinical decisions. Decision models (a model is a representation of reality) were constructed to test the relative sensitivities of decisions to key probabilities of good or bad outcomes and to patient preferences. Decision support tools were developed, and trials were designed to help patients and doctors choose among treatment options, including randomization in a trial when a well-informed patient was at equipoise, that is, finding each treatment equally acceptable.

Other investigators used consensus methods to develop appropriateness criteria that were then applied to the medical records of patients who had undergone procedures with high rates of variation by geographic region. It was common for 30 percent or more of the procedures to be deemed inappropriate for the patients’ indications. The proportion of procedures deemed inappropriate was essentially the same in high- and low-volume settings, so the low-volume providers were not simply doing a better job of selecting only appropriate cases. This work was extended with a focus on guideline development.
Professional organizations such as the American College of Physicians instituted rigorous guideline development processes, increasing recognition of the severe limitations of the evidence base for the treatment of common conditions.

Evidence of Effectiveness for Prescription Drugs

The limited evidence base for surgery and other procedures for the common conditions targeted by outcomes research contrasted sharply with the richer body of evidence for medical therapies. This difference can best be understood in the context of the regulation of medications that began in 1906 with the Pure Food and Drug Act, which made misrepresentation of ingredients illegal and which recognized the standardized drug formulae registered in a national formulary or pharmacopoeia. The 1906 act was silent on drug safety and efficacy.

The Federal Food, Drug, and Cosmetic Act of 1938 extended safeguards by introducing the distinction between prescription and over-the-counter drugs and requiring pharmaceutical manufacturers to provide proof of drug safety before a drug could be released for use. This was a direct response to the elixir sulfanilamide tragedy, in which 107 people, most of them children, died when a new sulfa preparation was distributed without safety testing. In 1951, the Durham-Humphrey Amendment (U.S. Statutes at Large, 1951) made it clear that the classification of drugs as prescription was up to the Food and Drug Administration (FDA), not manufacturers. In its initial form, the amendment authorized FDA to test drugs for efficacy as well as safety. However, the efficacy requirement was removed before passage of the amendment.

The next significant change in legislation followed the thalidomide disaster in Europe, which was narrowly averted in the United States and which prompted passage of the Kefauver-Harris Amendment (U.S. Statutes at Large, 1962) in 1962. All clinical testing procedures had to be approved by FDA and had to demonstrate efficacy as well as safety. The pharmaceutical industry resisted the efficacy requirement, especially the retroactive evaluation of drugs. However, in 1970, in the court case Upjohn Co. v. Finch (422 F.2d 944, 955 [6th Cir. 1970]), the Court of Appeals ruled that commercial success alone did not constitute substantial evidence for efficacy in the case of the Upjohn drug Panalba. Evidence of efficacy as well as of safety had become an enforceable standard for prescription drugs. Over the ensuing decades, the pharmaceutical industry and clinical research organizations (CROs¹) rapidly built the capacity to conduct those clinical trials necessary to meet the standards set by FDA.

¹ CROs provide a wide range of research and development services. CROs assist pharmaceutical, biotechnology, and medical device companies to produce new medicines and new treatments (www.acrohealth.org).

More Recent Developments: Evidence-Based Medicine

As the need for evidence became more apparent and funding for clinical research became more available, many academic settings emphasized development and use of methods for gathering clinical evidence. The multidisciplinary collaborations that formed with federally funded “patient outcomes research teams” matured as methodological expertise in clinical epidemiology, decision theory, and other domains of quantitative and qualitative research was honed. The “clinimetrics” work of Alvan Feinstein at Yale attracted more attention. David Sackett and colleagues in Canada and later England defined clinical epidemiology as the “basic science for clinicians.” Courses were devised to teach students and physicians how to critically appraise the medical literature. Fellowship and clinical scholar programs turned out clinical scientists with the skills necessary to generate and accurately interpret evidence. The Canadian Task Force on the Periodic Physical Examination and later the U.S. Preventive Services Task Force developed ratings for “levels of evidence,” which are described in detail later in this chapter. The Cochrane Collaboration was formed in Oxford, England, to systematically examine evidence for the full breadth of medical practice and quickly attracted an extensive international following. Journals with the titles Evidence-Based Medicine and Evidence-Based Health Care appeared. By the mid-1990s, the concept of formal, scientific evidence of treatment effectiveness had arrived, at least in some circles.

The goal of evidence-based medicine is to ensure that, to the extent possible, individual clinical decisions and broader health policy decisions about tests and treatments be based on the published results of rigorous studies of efficacy and effectiveness. Because not all treatments have been subjected to formal study and because some treatments cannot be studied without investment in massive clinical trials, it will not be possible to base all treatment decisions on published evidence. Nevertheless, the concept of evidence-based medicine asks that decisions be based on published scientific evidence when it is available and that investments be made to gather evidence in as many areas of medical care as possible.

In conventional medicine, there is now a general acceptance of the need to carefully study the effectiveness of tests and treatments, even those that have already become frequently used.
Just in the past 3 years, prominent studies have challenged the effectiveness of bone marrow transplantation and high-dose chemotherapy for breast cancer (Farquhar et al., 2003), arthroscopic surgery for osteoarthritis of the knee (Moseley et al., 2002), and the use of estrogen replacement therapy during menopause (Rossouw et al., 2002). There are clearly some treatments for which evidence of effectiveness is immediate and compelling. It may be unnecessary or even unethical to conduct formal effectiveness trials in a variety of situations, for example, when a treatment results in a combination of a clear reversal or elimination of a disease process, has a short latency of noticeable effect, is nearly universally effective in all patients treated, and eliminates clinical symptoms. The use of penicillin in the mid-1940s, surgery for appendicitis, and resection of localized cancers all stand as examples of this sort of undisputed effectiveness. Even in these examples, there may be value in conducting long-term surveillance studies to detect rare or late complications or side effects, and it may be appropriate to conduct formal cost-effectiveness or cost-benefit studies.

At the other end of the spectrum are those interventions that have modest effects, if any at all. It is these interventions that require studies with rigorous design and of rigorous execution to determine whether an effect does indeed exist and to estimate its size. The next section examines a variety of research methods available for use in conducting clinical effectiveness research.

BASIC FEATURES OF CONTEMPORARY CLINICAL EFFECTIVENESS RESEARCH

A Taxonomy of Clinical Research Methods

Many factors can influence the outcome of treatment. These include the treatment itself, characteristics of the patient (such as age, gender, and comorbid conditions), other treatments, access to care, adherence to treatment plans, socioeconomic status and education, and the skill of the practitioner. In treatment effectiveness research, the goal is to evaluate the contribution of one of these factors, treatment, to determine whether treatment makes a difference. Doing so can be difficult if other factors are at play, as they often are. The goal of study designs is usually to make it possible to assess the contribution of the treatment after the other influences on outcome are taken into account. In a study comparing two clinical interventions, the goal is to be sure that any difference observed is due to the differences in the two interventions rather than some other factor. The “some other factor” is a “confounder,” because it confounds one’s efforts to draw the conclusion that differences between the interventions are responsible for the differences in outcomes.

Random Assignment to Treatment or Control

The best way to be sure that one can draw a strong conclusion from a difference in outcomes is to assign subjects randomly to receive one intervention or the other.
If the randomization is successful and the number of patients is large enough, the two study populations will be essentially identical except for the different interventions. If one conducts the study so that, except for the intervention, the study populations are also identical at the end of the study, the researchers can make a very strong inference that the cause of the differences in outcomes is the difference in the interventions. Randomization is powerful because it ensures that the two populations are similar in every respect except for the intervention to which the researchers randomly assigned the patients. This claim means that if the study groups are large enough and the randomization was successful, the frequencies of
all known factors (e.g., age, gender, and comorbid conditions) are similar in both groups; in addition, and perhaps more importantly, the frequency of any unknown or unmeasured factor will be the same in both groups. Randomized trials stand at the top of the hierarchy of evidence because they make it possible to infer a cause-and-effect relationship between an intervention and an outcome. Because the study groups are identical except for the intervention, any effect on outcomes must be due to the intervention.

Observational Studies

Other methods for studying the effects of two interventions rely on data derived from the observation of care. In contrast to a randomized trial, no one intervenes in an observational study. Instead, researchers use information about the patients to try to make inferences about the relationship of clinical factors (including treatment) to clinical outcomes. Sometimes, researchers collect the information systematically (a prospective study); other times, the data represent patient care as it happened in the past (retrospective study). In either case, the crucial distinguishing feature of an observational study is that receipt of the intervention depends on clinical circumstances and preferences rather than deliberate assignment, as in a randomized clinical trial. These circumstances that influence choice of treatment may also influence the outcomes. Differences in outcomes may therefore be due to the intervention or other circumstances, or a combination of both.

Observational studies have many advantages. They are much less costly than randomized trials, they can have huge study populations, and the results are more likely to represent practice (Benson and Hartz, 2000). However, the circumstances that influence the choice of treatment often confound the interpretation of differences in outcomes.
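
The contrast between random assignment and observed treatment choice can be made concrete with a small simulation. The sketch below is illustrative only and is not drawn from the report: the function name `simulate_difference`, the confounder (disease severity), and every probability are assumptions chosen for the example. The treatment is given no true effect, so any difference in outcomes between groups reflects confounding or chance.

```python
import random


def simulate_difference(n, randomized, seed=0):
    """Return the difference in good-outcome rates (treated minus untreated).

    Toy model, all numbers hypothetical: the treatment has NO effect,
    and the outcome depends only on disease severity (the confounder).
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        severe = rng.random() < 0.5  # half of the patients are severe
        if randomized:
            # Random assignment ignores severity entirely.
            gets_treatment = rng.random() < 0.5
        else:
            # Observed care: severe patients are treated far more often.
            gets_treatment = rng.random() < (0.8 if severe else 0.2)
        # A good outcome is less likely in severe disease, treated or not.
        good = rng.random() < (0.3 if severe else 0.7)
        (treated if gets_treatment else untreated).append(1 if good else 0)
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


if __name__ == "__main__":
    print("randomized:   ", round(simulate_difference(20000, True, seed=1), 3))
    print("observational:", round(simulate_difference(20000, False, seed=1), 3))
```

Under these assumptions the randomized comparison should sit close to zero, whereas the observational comparison should be clearly negative: the treated group looks worse only because severe patients were more likely to be treated.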
The frequencies of potential confounders may differ between those who receive the intervention and those who do not. To evaluate differences in outcomes independently of the influence of possible confounders, researchers perform multivariable regression techniques on the data. The variables used in the statistical model are either the several candidate predictor variables (the treatment itself and other potential confounders, such as demographic characteristics, clinical characteristics of the patient, comorbid conditions, and socioeconomic factors) or the dependent variable (the outcome that the model is trying to predict). These techniques effectively adjust the frequencies of the confounders measured so that they occur at the same rate in both the treatment and the no-treatment groups. If the treatment is still a statistically significant predictor of an outcome, researchers can infer an association between the outcome and the treatment. However, they cannot infer that the treatment causes the outcome because the statistical techniques can
only adjust for differences in the confounders that the researchers measured. Unmeasured confounders are thus the bane of researchers who conduct observational studies. Therefore, the possibility that a confounding variable is responsible for observed differences means that one must express conclusions in terms of association rather than causation. Even then, researchers must be cautious in their conclusions because it is possible that the apparent association between two variables is actually the result of a third variable (the confounder) that is affecting the two variables at the same time so that they change in concert. In observational studies, the researchers must guard against concluding that the change in one variable is the consequence of the change in the other variable (cause and effect).

Types of Observational Studies

Observational studies come in several forms: cohort studies, case-control studies, case series, and cross-sectional and longitudinal studies. Each of these is described below.

Cohort Studies. A cohort study (in the context of treatment effectiveness research) is the formal collection and analysis of data on treatments and outcomes for a defined set of patients with similar clinical characteristics. For example, a researcher might study pain and disability levels as outcomes in a cohort of patients older than 70 years of age who received lumbar fusion surgery for severe sciatica. The distinguishing feature of cohort studies is that researchers gather data on treatment and possible confounders at one point in time and measure outcomes at a later point in time.
Cohort studies are a relatively powerful form of study design because researchers can often statistically adjust the final outcomes (e.g., levels of pain) for differences in the outcome variable at the beginning of the study (pain levels before surgery) and because they can measure the outcome variable at many points in time (e.g., from monthly pain reports for up to 2 years after surgery). The assembly of a cohort is the first step. It may take place in the present as a deliberate, planned activity in which the researchers gather data on the present state of the participants (prospective cohort), or it may rely upon data gathered in the past (retrospective cohort). In either case, the investigators use specific inclusion and exclusion criteria to define a group of people with many similarities. Even though members of the cohort are similar in terms of the inclusion criteria (in the example cited above, all patients will be older than age 70 years, all will have had fusion surgery, and all will have had severe sciatica before surgery), they will inevitably differ in many other predictors of outcome. For example, some members of the spine surgery cohort may be 70 years old and others may be 85 years old. Some may be overweight and others may be thin. Some will be engaged
in regular physical activity, and others will be sedentary. Some will have a spouse or caregiver available to help with work at home; others will be on their own. All of these factors, and countless others, may have influences on treatment outcomes. Researchers try to identify and record as many of these factors as possible, but inevitably some potentially important factors are not measured.

Outcome measurement is the second major step. The researchers measure outcomes at a future time relative to the date of cohort assembly. With a prospective cohort, measurement of outcomes occurs in the future at specific time points relative to the date of treatment. With a retrospective cohort, the outcomes may have occurred in the past relative to the date of treatment and may have to be abstracted from existing data systems or may still occur in the future if members of the cohort are still alive and available for follow-up.

Case-Control Studies. The study population in a case-control study in the domain of treatment effectiveness consists of the cases (those with the target outcome, such as complete pain relief) and the controls (those without the target outcome, for example, those with continued pain). Case-control studies are especially well-suited to studies of rare events, because cases (those experiencing the event) are oversampled relative to the controls. Case-control studies are typically retrospective, in that the researchers assemble the study population after the measurable outcome events have occurred. If adequate numbers of patients are available, researchers choose the controls by matching each control patient (or several control patients) to one case patient for variables such as age, sex, and date of entry into the population from which the researchers identify cases and controls.
The next step is to measure rates of exposure to a treatment (e.g., a surgical procedure) for the cases and the controls. The ratio of the odds of exposure to the intervention among those who experience the outcome (cases) to the odds among those who do not experience it (controls) is mathematically equivalent to the ratio of the odds of the outcome in those exposed to the intervention to the odds in those not exposed to it. Thus, the result of a case-control study is an odds ratio, which approximates the rate ratio when the outcome is rare, comparing the target condition frequency in exposed patients with that in unexposed patients. Researchers may perform regression analysis techniques to adjust the cases and controls for differences in potential confounders. The Achilles heel of a case-control study is confounders, and the researchers’ greatest challenge is assembling the control group to avoid confounders. One way to accomplish this task is to choose cases and controls from a cohort that the researchers assembled using the same inclusion and exclusion criteria (a so-called nested case-control study).

Case Series. A case series is simply a serial collection of patients with
some defining characteristic. A typical case series is a group of patients who have a rare diagnosis or who have undergone a new surgical procedure. In the context of treatment effectiveness research, a case series would be a consecutive set of patients who received a particular treatment. Case series do not have controls, so it is very difficult to make any inferences about whether an intervention (a treatment or surgical procedure) had any effect. An exception would have been the first group of patients with pneumonia who received penicillin and experienced rates of survival that were unprecedented in the era before penicillin. In surgical research, it is reasonably common to publish results of case series studies and to compare the outcomes to those for other published case series for patients with the same underlying condition. These “historical controls” provide a basis for comparison of outcomes for the new treatment, but it is even more difficult to draw inferences in a case series study than in either a prospective or a retrospective cohort study. Because the patients in the comparison group were treated at a different place and at a different time, there are confounders related to place and time, in addition to the confounders in the cohort study related to the patients’ clinical and personal characteristics.

Cross-Sectional and Longitudinal Studies. Cross-sectional studies measure the relationship between variables at a single point in time. Cross-sectional studies are a relatively weak study design for the testing of hypotheses about treatment-outcome relationships because they rely upon a single measurement of each variable. A survey is a typical cross-sectional study design. Longitudinal studies measure the relationship between variables at two or more points in time.
In effectiveness studies, longitudinal studies would typically involve the measurement of outcomes at several points after treatment. They are a relatively powerful method for the testing of hypotheses because repeating a measurement many times (or even once) for an individual reduces statistical variation and narrows confidence intervals.

Clinical Outcomes: A Taxonomy

Treatment outcomes can be objective or subjective. Objective outcomes are visible or measurable to people other than the patient, and subjective outcomes can be felt or reported only by the patient. One of the major contributions of the outcomes management movement in the late 1980s and early 1990s was to raise the status of subjective measures as valid scientific endpoints in clinical trials and other forms of research studies. Advances in the technology of subjective measurement made that change possible, so that it is now common to find a mix of subjective and objective endpoints in many clinical trials.
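
Before turning to subjective outcomes in detail, the arithmetic behind the case-control design described earlier can be checked with a few lines of code. The sketch below uses a hypothetical 2 x 2 table (the counts and the function name `odds_ratio` are assumptions for illustration, not data from any study) to show that the odds ratio is the same whether it is computed from exposure among cases and controls or from outcomes among the exposed and the unexposed.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table laid out as:

                     cases   controls
        exposed        a        b
        unexposed      c        d
    """
    return (a * d) / (b * c)


# Hypothetical counts, chosen only for illustration.
a, b, c, d = 30, 70, 10, 90

# Odds of exposure among cases divided by odds of exposure among controls.
exposure_or = (a / c) / (b / d)

# Odds of the outcome among the exposed divided by odds among the unexposed.
outcome_or = (a / b) / (c / d)

if __name__ == "__main__":
    print("exposure odds ratio:", exposure_or)
    print("outcome odds ratio: ", outcome_or)
    print("cross-product ratio:", odds_ratio(a, b, c, d))
```

All three quantities reduce algebraically to ad/bc, which is why a study that samples on the outcome can still estimate how strongly the exposure is associated with it.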

Subjective Outcomes

Subjective outcomes include those symptoms and other aspects of a patient’s experience that are not directly observable by others, but that represent the goals of treatment. Pain, sensations of nausea or dizziness, functional status, ability to perform activities of daily living, and experience of moods or emotional states are examples of subjective outcomes for which well-developed and widely used measures exist (Bowling, 1997; Frank-Stromborg and Olsen, 1997; McDowell and Newell, 1996). Because there is no direct way to validate a patient’s report of pain level or mood state, the development of valid measures requires careful attention to issues of reliability (i.e., whether measures taken at two adjacent points in time yield the same result or whether two closely related versions of the same scale yield the same result) and convergent validity (i.e., whether the results of two presumably related, but different, measures actually yield similar results). Because patients’ responses to single questions or item formats may be affected by idiosyncrasies of wording and interpretation, it is common for measures of subjective outcomes to be based on multi-item scales with different wordings and response formats. Patient reports may also be sensitive to context or contrast effects (for example, a relatively modest “absolute” level of pain may feel uncomfortable if it is new but may feel very minor if it has been preceded by a long period of excruciating, severe pain). The subjective domains for which well-established measures exist cover many of the endpoints of CAM treatments. Existing measures can be and have been used in studies of the effectiveness of treatments involving CAM.
Some subjective domains are more specific to particular CAM modalities (e.g., feelings of “centeredness” or “wholeness”), and some additional measurement work may be required for these modalities; in principle, however, virtually any subjective experience can be captured either as present or absent or as present to some degree. Because subjective experiences cannot be independently validated, and because they can be significantly affected by context, contrast, and expectation effects, it is particularly important to build in features of the study design that minimize these kinds of biases. “Blinding” the patient to the specific treatment that he or she has received, for example, is a way to minimize the effects of expectations on reports of subjective outcomes. Careful selection of patients who are all similar in terms of the level of pain or disability at the time of treatment is a way to minimize contrast effects. Having the outcome assessment done by a person other than the treating clinician is a way to minimize the bias that arises from a patient’s desire to please the clinician.
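The reliability and convergent validity checks described above for subjective measures are often summarized as correlations. The following Python sketch is illustrative only: the patient scores, the variable names, and the choice of the Pearson correlation coefficient are assumptions made for this example and are not taken from any study cited in this chapter.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)

# Hypothetical data for six patients (not from any real study).
pain_first = [2, 5, 7, 3, 8, 6]            # 0-10 pain scale, first administration
pain_second = [3, 5, 6, 3, 9, 6]           # same scale, shortly afterward
interference = [20, 45, 60, 30, 85, 50]    # a different but related 0-100 measure

# Reliability: does the same scale give similar results at adjacent times?
test_retest = pearson_r(pain_first, pain_second)
# Convergent validity: do two related but different measures agree?
convergent = pearson_r(pain_first, interference)

print(f"test-retest correlation: {test_retest:.2f}")
print(f"convergent correlation:  {convergent:.2f}")
```

Correlations near 1 would suggest that the scale behaves consistently and tracks a related measure; what counts as acceptable depends on the instrument and the study.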

TABLE 3-3 Example of a Hierarchy of Evidence from the National Health Service Centre for Evidence-Based Medicine, 2002

An A-level recommendation for therapy
• Level 1a evidence
  - Systematic review of many RCTs (with homogeneity)
• Level 1b evidence
  - A single RCT with narrow confidence intervals
• Level 1c evidence
  - Case series of a disease from which all patients died before the new treatment; now some survive
  - Case series of a disease from which many patients died before the new treatment; now all survive

A B-level recommendation for therapy
• Level 2a evidence
  - Systematic review of many cohort studies (with homogeneity)
• Level 2b evidence
  - A single-cohort study
    • Includes randomized clinical trial with >20 percent drop-outs
• Level 2c evidence
  - Ecological studies (performed with a preexisting dataset)
• Level 3a evidence
  - Systematic review of many case-control studies (with homogeneity)
• Level 3b evidence
  - A single case-control study

A C-level recommendation for therapy
• Level 4 evidence
  - Case series
  - Poor-quality cohort and case-control studies

A D-level recommendation for therapy
• Level 5 evidence
  - Expert opinion without an explicit critical appraisal of the evidence
  - Expert opinion based on
    • Physiology
    • Bench research
    • “First principles”

SOURCE: Adapted from Phillips et al. (1999).

be considered to be as compelling as the results of a single well-controlled randomized trial” (IOM, 2001) and lays out a hierarchy of evidence as shown in Table 3-4. In this report about CAM, the committee has chosen not to recommend one particular hierarchy; however, it does emphasize the following points:

TABLE 3-4 Hierarchy of Evidence

Level I
  Emphasis on Efficacy: Systematic Review (e.g., meta-analysis) of Several Well-Controlled Randomized Trials, consistent results
  Emphasis on Effectiveness: Systematic Review (e.g., meta-analysis) of Several Well-Designed Outcome Studies or “Effectiveness RCTs,” consistent results
Level II
  Emphasis on Efficacy: Single Well-Controlled Randomized Trial
  Emphasis on Effectiveness: Single Well-Designed Outcomes Study or “Effectiveness RCT”
Level III
  Consistent Findings from Multiple Cohort, Case-Control, or Observational Studies
Level IV
  Single Cohort, Case-Control, or Observational Study
Level V
  Uncontrolled Experiment, Unsystematic Observation, Expert Opinion, or Consensus Judgments

SOURCE: IOM, 2001.

• In general, an RCT is the preferred study design if the issue is establishing treatment efficacy. More studies are better than fewer studies; therefore, a meta-analysis of multiple good RCTs is better than one good RCT.
• Other study designs can provide evidence of efficacy or effectiveness. Meta-analysis of multiple non-RCT studies is better than one non-RCT study. Meta-analysis of multiple non-RCT studies may or may not be better than one good RCT; it depends on the details of the studies and the specific question being asked.
• If the question is treatment effectiveness, then some features of the typical RCT (stringent inclusion/exclusion criteria; treatment given in high-quality, high-volume clinical sites; detailed, frequent patient follow-up; etc.) create problems in generalizing findings to routine practice settings. Other study designs, including observational studies or “effectiveness RCTs,” may provide evidence that is at least as compelling as that provided by an “efficacy RCT.”
• Effect size is another consideration that must be taken into account along with features of study design when one weighs the strength of evidence for a particular therapy. Treatments with clear, dramatic, positive effects in small or less well-controlled studies may be deemed “efficacious” sooner than treatments with more modest effects.
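The point about effect size can be made concrete with a standardized mean difference (often called Cohen's d), which divides the difference in group means by the pooled standard deviation. This is only one of several possible effect-size measures and is an assumption of this sketch, not a measure prescribed by the committee; the scores below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """Difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treated), len(control)
    pooled_variance = (
        (n_t - 1) * variance(treated) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_variance)

# Hypothetical improvement scores (higher is better) from two small studies.
control = [1, 3, 5, 2, 4]
modest_treatment = [2, 4, 6, 3, 5]       # one point better on average
dramatic_treatment = [7, 9, 11, 8, 10]   # six points better on average

modest_effect = standardized_mean_difference(modest_treatment, control)
dramatic_effect = standardized_mean_difference(dramatic_treatment, control)

print(f"modest effect size:   {modest_effect:.2f}")
print(f"dramatic effect size: {dramatic_effect:.2f}")
```

With the same spread of scores in each group, the dramatic treatment yields a much larger standardized difference, which is why even a small study may be persuasive when the effect is large.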

APPLYING CONTEMPORARY RESEARCH METHODS TO CAM

The remainder of this chapter discusses the context in which researchers will apply these established research methods, including the idea that CAM users may present particular needs for research, that CAM interventions may pose particular problems in applying research methods that have worked well for conventional medicine, and that such interventions may also expose some of the weaknesses of applying contemporary research practices to conventional medicine.

Decision Makers and Sources of Evidence

Lewith and colleagues (2001) have described the different decisions that various participants in health care make about treatments and how they use different kinds of information to make those decisions. Patients, providers, insurers, government policy makers, and others typically require different types of evidence and different amounts of certainty to decide for or against a particular treatment or treatment modality. The committee recognized that a discussion of evidence of CAM treatment effectiveness must be set in the context of the differences among users of information about CAM in terms of the decisions that they make, the information that they need to make those decisions, and the way(s) in which they think about treatment effectiveness.

Researchers

Researchers are typically interested in understanding cause-and-effect relationships between underlying mechanisms of illness, treatments designed to alter those mechanisms, and patient outcomes. Researchers trained in Western cultures and scientific traditions generally think in terms of linear cause-and-effect and try to identify the simplest possible causal models (i.e., the fewest explanatory variables and the simplest relationships among those variables) that account for the observed associations (Nisbett, 2003). Scientists from other cultures, however, may be more likely to think in terms of more complex “system” models that involve multiple factors and multiple levels of relationships and highly interactive and iterative, rather than linear, relationships (Nisbett, 2003). The results of a given study are taken as evidence of cause-and-effect relationships to the extent that certain criteria are met. These criteria typically include

• Features of the study design that allow strong inferences to be made about cause-and-effect relationships:

  - a well-defined population to whom the conclusions apply;
  - a well-defined, sufficiently large, and representative sample drawn from that population;
  - a well-defined and controlled treatment(s) administration;
  - a concurrent control or comparison group(s), when possible, that receives either no treatment or some different form or dose of the study treatment;
  - well-defined study endpoints (objectively defined and measured outcome variables); and
  - statistical analysis to assess the likelihood that the findings are produced by chance.
• Plausible biological mechanisms, that is, the ability to fit the observed relationships into some larger body of theory and evidence on how the body works.
• Consistency of findings from study to study. A single study is rarely definitive, although some large, well-designed clinical trials may produce evidence that is treated by the scientific community as definitive. Confidence in the existence of cause-and-effect relationships grows with the ability to see them in multiple studies over time. Confidence diminishes when results vary from study to study.
• Dose-response relationships. In most biological processes, the introduction of a larger amount of a substance produces a larger subsequent effect. There is almost always some upper limit at which no further effect is found or some different or counterbalancing biological process begins to take over. For the most part, however, within a reasonable range of doses, more “cause” produces more “effect.” Clear dose-response relationships typically increase the confidence in the underlying causal relationships between the treatment and the outcome.

Teachers Training New Practitioners

Medical school, nursing school, and allied health school faculty require evidence of treatment effectiveness to determine how to train students.
The standards of evidence for specific treatments are not necessarily the same as those used by researchers, but they are similar. They include

• The criteria for researchers listed above. Faculty have the responsibility to stay current with the published literature and generally to apply the same criteria to published studies that researchers apply.
• Personal experience. In addition, however, clinical faculty draw heavily on their own experiences in determining which treatments are effective and which ones are not. This may be particularly true in the context of clinical rotations and residency training, in which much teaching is done on the basis of an apprenticeship model in a specific clinical environment. In this setting, both faculty and students have a chance to observe, directly and together, the effectiveness of specific treatments.
• The extent to which the treatment in question is a “standard of practice” in the medical community or is moving toward that standing. Students entering a profession become part of a professional community, and part of their learning involves knowing what the standards and typical practices of that community are. There is often a gap in time between the publication of scientific evidence of the effectiveness of a new treatment and the widespread adoption of that treatment by most or all members of a professional community, along with some appropriate caution and skepticism about new findings that seem to run counter to daily experience. Teachers train students in what the members of the professional community typically do on a daily basis as well as what the published literature says that they could or should do.

Practicing Clinicians

Clinicians treating patients have a somewhat more complex set of information requirements about treatment effectiveness, because they must know not only what has worked or what should have been effective in the abstract but also what they are actually able to do in the context of their own training and skills, their own practice settings, and their own sets of patients. Their requirements for information on treatment effectiveness include

• All of the preceding criteria, although many active clinicians will not have the same amount of time as their researcher or faculty colleagues do to monitor developments in the published literature.
• Consistency of a new practice with other aspects of current practice.
A psychotherapist may accept the published evidence about the effectiveness of a specific herb for the treatment of depression but may be unwilling to incorporate the use of the herb into his or her own practice because of a professional commitment to therapies based on a different theory and conceptual model of mental illness.
• The availability of essential equipment, trained staff, supplies, and anything else necessary to provide a treatment safely and effectively. Many treatments require specialized equipment, training, or support staff that are not readily available to all clinicians.
• Difficulty in learning new skills (e.g., for new surgical procedures).
• The acceptability of a new treatment to patients and others in the community. Health care is usually a two-way human interaction, and potentially effective treatments will not be used if they conflict with the beliefs, cultural values, or expectations of large numbers of patients in a practice.
• Opinions of professional peers. In an environment in which it is impossible to keep up with all new advances in treatment, the opinions and practices of respected colleagues are a kind of evidence of treatment effectiveness that is often dominant.
• Reimbursement policies affecting a new treatment. Even when all other criteria have been met, a new treatment may not be adopted if the provision of it will not be adequately reimbursed.
• The extent to which the patient population is similar to those studied in clinical trials or other studies of treatment effectiveness. There are always variations in published studies of treatment effectiveness, and clinicians may legitimately believe that what works for many or most patients will not necessarily work for their own patients, particularly if they share some clinically relevant characteristic (Park, 2002).

Employers or Purchasers and Insurers

Those who pay for health care through insurance care about effectiveness, but also about cost-effectiveness, since they have at least some responsibility to use the dollars available for insurance to produce the best possible health benefit for covered employees. Evidence of treatment effectiveness relevant to employers and insurers, then, includes

• The scientific evidence listed above for researchers.
• The preferences, expectations, and experiences of employees and their families. Employers are not insuring passive and uninformed people. Employees who have positive experiences with specific therapies will ask for such therapies to be covered by insurance plans and may use coverage for those therapies as the basis for choosing one plan over another at open enrollment or even changing jobs.
• Published cost-effectiveness studies (when available).
Employers and insurers may legitimately refuse to cover treatments that are effective but that are so costly that their inclusion prevents the coverage of less costly treatments that provide more health benefit to larger numbers of people.
• Internal cost-effectiveness analyses (for some larger employers). Large companies with many thousands of employees may be able to use their own databases to study relationships between treatments and work attendance, productivity, or the costs of illness. This information may be more compelling than information in published studies because there is no question about the generalizability of the findings to that employer’s population.
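The cost-effectiveness reasoning described above for employers and insurers is often expressed as an incremental cost-effectiveness ratio: the extra cost of a new treatment divided by the extra health benefit it provides. The Python sketch below is a minimal illustration under stated assumptions; the function name, the cost figures, and the benefit units (for example, quality-adjusted life years) are hypothetical and are not drawn from this chapter.

```python
def incremental_cost_effectiveness_ratio(cost_new, benefit_new,
                                         cost_usual, benefit_usual):
    """Extra cost per additional unit of health benefit for a new treatment."""
    extra_benefit = benefit_new - benefit_usual
    if extra_benefit == 0:
        raise ValueError("treatments have equal benefit; the ratio is undefined")
    return (cost_new - cost_usual) / extra_benefit

# Hypothetical per-patient figures: costs in dollars, benefits in
# illustrative health-benefit units (an assumption of this example).
ratio = incremental_cost_effectiveness_ratio(
    cost_new=12000, benefit_new=6.5,
    cost_usual=4000, benefit_usual=6.0,
)
print(f"extra cost per additional unit of benefit: {ratio:,.0f}")
```

A payer comparing several candidate treatments could rank them by this ratio, although coverage decisions in practice also reflect the other kinds of evidence listed in this section.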

Patients and Consumers

Individual patients generally do not have direct access to peer-reviewed journals, and most patients do not have the technical background to interpret the results of published treatment-effectiveness studies. This information tends to be filtered through someone else before it reaches the individual patient. In addition, patients (particularly those with chronic conditions) have their own experiences to draw on and can judge treatment effectiveness by the extent to which their own symptoms or functional status improve with treatment. Information on treatment effectiveness for individual patients, then, comes mainly from

• Information provided by a clinician(s) in one-on-one treatment encounters,
• Word of mouth from friends and relatives,
• The lay press or media,
• Direct-to-consumer advertising,
• Internet,
• Direct personal experience (particularly for patients with chronic conditions), and
• Communications from illness advocacy groups.

The Application of Contemporary Clinical Research Methods to CAM: Some Cautions

Although the concept of levels of evidence has generally been accepted and widely used in many domains of conventional medicine, some question its applicability to CAM therapies or to individual treatment decisions for specific patients. These questions particularly relate to the use of RCTs as the “gold standard” of evidence. Given the broad array of modalities that are included within the definition of CAM, it may be that some CAM therapies are more amenable to evaluation than others. Questions about the applicability of clinical research methods to CAM are described and discussed below.
Emphasis on Efficacy Rather Than Effectiveness

As noted above, the distinction between efficacy and effectiveness refers to the extent to which a treatment has a measurable positive effect in highly controlled clinical trial contexts (efficacy) versus whether the treatment has a measurable positive effect in routine daily clinical practice with unselected clinicians and patients (effectiveness). Efficacy refers to what a treatment can do under ideal circumstances; effectiveness refers to what a treatment does do in routine daily use. Because the highest level of evidence in most evidence hierarchies is the combined results of several RCTs, the resulting recommendations will inevitably be based on evidence of efficacy rather than evidence of effectiveness.

Difficult to Apply to Therapies for Which RCTs Are Difficult, Expensive, or Unethical

It may be impossible to organize RCTs in situations in which the effects to be observed occur rarely, take many years to develop, or are relatively subtle. It is also difficult to conduct RCTs in situations in which the treatment is already in wide use and is generally accepted as effective. It may also be difficult or impossible to randomize patients to CAM modalities or specific therapies that inherently depend on patients’ belief, faith, or confidence in or relationship with a particular modality or provider. (See the discussion of “preference trials” in Chapter 4 for one way to address this problem.)

Hard to Apply to Treatments That Become Popular and Widely Used Very Quickly

Study participants may not accept random assignment to a placebo or some other type of control group if the general public believes that the treatment being studied is widely effective. Likewise, institutional review boards may not be willing to approve randomization to a placebo or another control group if the professional community believes that the therapy being studied is widely effective. In addition to the problem of organizing RCTs for widely used treatments, there may also be a problem with all other study designs that involve some form of control condition in which a possibly ineffective treatment is administered.
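The earlier observation in this discussion, that RCTs become hard to organize when the effects to be observed are rare or subtle, can be illustrated with a sample-size calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions; the event rates, the 5 percent significance level, and the 80 percent power are conventional but hypothetical choices made for this example, not values from the chapter.

```python
from math import ceil
from statistics import NormalDist

def patients_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a difference in event rates.

    Uses the normal approximation for a two-sided test of two proportions.
    """
    if p_control == p_treated:
        raise ValueError("event rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    variance_sum = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    n = (z_alpha + z_power) ** 2 * variance_sum / (p_control - p_treated) ** 2
    return ceil(n)

# Hypothetical scenarios: in both, treatment halves the event rate.
common_outcome = patients_per_group(p_control=0.40, p_treated=0.20)
rare_outcome = patients_per_group(p_control=0.01, p_treated=0.005)

print(f"common outcome: {common_outcome} patients per group")
print(f"rare outcome:   {rare_outcome} patients per group")
```

Although the relative benefit is the same in both scenarios, the rare outcome requires a far larger trial, which is one reason such studies can be impractical.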
Relatively Long Delay from First Development of a Treatment to Assembly of Large Body of Evidence

The FDA has requirements for research on new drugs before they can be prescribed, but there are no similar requirements for surgical procedures and most CAM modalities. In both cases, there may be a long time lag (several years, in some instances) between the development and the first use of a treatment and the assembly of a body of scientific evidence of effectiveness. For drugs, this lag is invisible to most of the general public, and some evidence from RCTs must have been assembled before a drug is allowed on the market. For other treatments, however, the time required to organize an RCT or collect the results of other types of studies means that a large body of anecdotal experience will have been developed before more formal scientific evidence appears. For many CAM therapies based on traditional cultural beliefs, this time lag may be measured in hundreds of years.

Emphasis on What’s Best for Largest Number Rather Than Search for What’s Best for Unique, Individual Patients

A treatment is judged effective in an RCT if it is better than a placebo or an alternative form of treatment. “Better” means that the average outcome for the experimental group is superior to that for a control group, as determined by statistical tests that relate the difference in average outcomes to the variation in outcomes in the two groups. Unless the differences between the experimental group and the control group are dramatic, however, there are usually some patients in the experimental group who do worse than some patients in the control group (Park, 2002). What is best, then, for the “typical” or “average” patient is not necessarily best for every patient. This approach to identifying effective treatments is fundamentally different from the approach that emphasizes individual tailoring of treatments found in CAM modalities like homeopathy or traditional Chinese medicine.

The desire to have objective, well-defined study endpoints in RCTs can lead to a focus on health outcomes like mortality, tumor shrinkage, or change in a measurable physiological parameter like temperature or blood pressure. An exclusive focus on objective endpoints can lead researchers to miss or ignore other effects in the realm of subjective symptoms (e.g., pain, fatigue, and cognitive function) and general well-being. For many CAM therapies, the treatment goals include feelings of well-being and mastery of the illness (Jonas and Linde, 2002); these will not be captured in studies with more objectively defined primary endpoints.
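The point made above, that a treatment can be better on average while some treated patients still do worse than some control patients, can be quantified under simple assumptions. The following Python sketch assumes normally distributed outcomes with the same standard deviation in both groups and that higher scores are better; the means, standard deviation, and function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_control_patient_does_better(mean_treated, mean_control, sd):
    """Chance that a random control patient outscores a random treated patient.

    Assumes normal outcomes with a common standard deviation in both groups.
    """
    # The difference between two independent normal scores is itself normal,
    # with variance equal to the sum of the two variances.
    difference = NormalDist(mu=mean_treated - mean_control, sigma=sd * sqrt(2))
    return difference.cdf(0.0)

# Hypothetical trial: treated patients average 5 points better, SD of 10.
share = prob_control_patient_does_better(mean_treated=55, mean_control=50, sd=10)
print(f"control patient does better in {share:.0%} of random pairings")
```

With these illustrative numbers, the control patient comes out ahead in a little over a third of random pairings, even though the treatment is superior on average.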
Wellness Versus Treatment Effectiveness as a Research Objective

Recent national surveys (see Chapter 2; Astin, 1998; Astin et al., 2000) have highlighted the fact that many CAM “treatments” are not used to treat a specific current problem or disease but, rather, are used either to prevent disease or to promote a more general state of health and well-being. RCTs may still be used to assess the effects of CAM on general health or well-being, but such RCTs may be even more difficult to conduct than RCTs of the effectiveness of treatments for specific diseases. RCTs in the domain of disease prevention or wellness enhancement may require much longer time lines (e.g., 10 to 20 years or more), very large sample sizes because of the relatively low incidence of specific medical problems being prevented, or even larger sample sizes because of the potential for loss to follow-up or switching of treatment arms over the course of the study (i.e., patients randomized to the presumed active treatment quit taking or doing it, and patients randomized to the control arm begin to take or do the active treatment on their own). Some outcome variables may be hard to define and measure (e.g., “I just feel better”), and effect sizes may be small, again adding to the sample size required for a trial to have a reasonable chance of detecting an effect if it is truly present. Finally, patients will inevitably be doing several things that contribute to wellness (or lack of it) over a multiyear study period, and it will be difficult to isolate the effects of a CAM therapy or modality from the effects of a larger package of lifestyle factors.

REFERENCES

Astin JA. 1998. Why patients use alternative medicine: Results of a national study. JAMA 279(19):1548–1553.
Astin JA, Pelletier KR, Marie A, Haskell WL. 2000. Complementary and alternative medicine use among elderly persons: One-year analysis of a Blue Shield Medicare supplement. J Gerontol A Biol Sci Med Sci 55(1):M4–M9.
Benson K, Hartz AJ. 2000. A comparison of observational studies and randomized controlled trials. N Engl J Med 342(25):1878–1886.
Bowling A. 1997. Measuring Health: A Review of Quality of Life Measurement Scales. Philadelphia, PA: Open University Press.
Byar DP. 1980. Why data bases should not replace randomized clinical trials. Biometrics, (June), 36:337–342.
Cochrane AL. 1972. Effectiveness and Efficiency: Random Reflections on Health Services. London: Nuffield Provincial Hospitals Trust.
Farquhar C, Basser R, Hetrick S, Lethaby A, Marjoribanks J. 2003. High dose chemotherapy and autologous bone marrow or stem cell transplantation versus conventional chemotherapy for women with metastatic breast cancer. Cochrane Database Syst Rev (1):CD003142.
Frank-Stromborg M, Olsen SJ. 1997. Instruments for Clinical Health Care Research. Sudbury, MA: Jones and Bartlett.
Garber AM, Phelps CE. 1997. Economic foundations of cost-effectiveness analysis. J Health Econ 16:1–31.
IOM (Institute of Medicine). 2001. Gulf War Veterans: Treating Symptoms and Syndromes. Washington, DC: National Academy Press.
Jonas WB, Linde K. 2002. Conducting and Evaluating Clinical Research on Complementary and Alternative Medicine. In: Gallin JI, ed. Principles and Practice of Clinical Research. San Diego, CA: Academic Press. Pp. 401–426.
Kaptchuk TJ, Kerr CE. 2004. Commentary: Unbiased divination, unbiased evidence, and the patulin clinical trial. Int J Epidemiol 33(2):247–251.
Lewith GT, Hyland M, Gray SF. 2001. Attitudes to and use of complementary medicine among physicians in the United Kingdom. Complement Ther Med 9(3):167–172.
McDowell I, Newell C. 1996. Measuring Health: A Guide to Rating Scales and Questionnaires. New York: Oxford University Press.
McPherson K, Wennberg JE, Hovind OB, Clifford P. 1982. Small-area variations in the use of common surgical procedures: An international comparison of New England, England, and Norway. N Engl J Med 307(21):1310–1314.

Moseley JB, O’Malley K, Petersen NJ, Menke TJ, Brody BA, Kuykendall DH, Hollingsworth JC, Ashton CM, Wray NP. 2002. A controlled trial of arthroscopic surgery for osteoarthritis of the knee. N Engl J Med 347(2):81–88.
Neuhauser D. 2002. Heroes and martyrs of quality and safety: Ernest Amory Codman, MD. Qual Saf Health Care 11:104–105.
Nisbett RE. 2003. The Geography of Thought: How Asians and Westerners Think Differently and Why. New York: Free Press.
Park CM. 2002. Diversity, the individual, and proof of efficacy: Complementary and alternative medicine in medical education. Am J Public Health 92(10):1568–1572.
Phillips R, Ball C, Sackett D, Badenoch D, Straus S, Haynes B, Dawes M, McAlister FA. 2004. [Online]. Available: http://www.cebm.net/downloads/Oxford_CEBM_Levels_5.rtf [accessed May 2004].
Rossouw JE, Anderson GL, Prentice RL, LaCroix AZ, Kooperberg C, Stefanick ML, Jackson RD, Beresford SA, Howard BV, Johnson KC, Kotchen JM, Ockene J. 2002. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: Principal results from the Women’s Health Initiative randomized controlled trial. JAMA 288(3):321–333.
Torrance GW. 1986. Measurement of health state utilities for economic appraisal: A review. J Health Econ 5(1):1–30.
Upjohn Co. v. Finch. 422 F.2d 944, 955 (6th Cir. 1970).
U.S. Preventive Services Task Force. 1996. Guide to Clinical Preventive Services. Baltimore, MD: Williams & Wilkins.
U.S. Statutes at Large 65 (1951):648.
U.S. Statutes at Large 65 (1962):788–789.
Wennberg J, Gittelsohn A. 1982. Variations in medical care among small areas. Sci Am 246(4):120–134.