
OCR for page 127
11

Strengthening the Connection Between Evaluative Research and Coverage Decisionmaking

LUCIAN L. LEAPE

In late 1989, the U.S. Congress upgraded the status and expanded the scope of government activity in the quality of health care research when it created the Agency for Health Care Policy and Research (AHCPR) to replace the National Center for Health Services Research and Health Care Technology Assessment. The charge of the new agency was to "enhance the quality, appropriateness, and effectiveness of health care services." In addition to the expansion of outcomes-related research projects, the agency's mandate included the creation of new initiatives in clinical practice guidelines development and dissemination (Congressional Record, 1989).

This significant expansion of federal support for medical effectiveness research resulted in part from evidence, accumulated over the previous two decades, of significant variations in the use of a wide variety of medical and surgical technologies (Chassin et al., 1986; Lewis, 1969; Wennberg and Gittelsohn, 1973, 1982) as well as accumulating evidence that a significant fraction of technologies may be ineffective (Chassin et al., 1987; Graboys et al., 1987; Greenspan et al., 1988; Kahn et al., 1988; Leape et al., 1990; Winslow et al., 1988a,b). It was also apparent that many of the tools developed for these research activities could be used to improve the outcomes of care. Payers, policymakers, and physicians saw the potential in these developments for improving both the quality and the value of health care.

The historical significance of concerns about quality of care should not be overlooked. Prior to World War II, the effectiveness of most treatments was dubious. Physicians could do little to alter the natural course of most ailments, and patients expected little more. All of that changed with the incredible flowering of biomedical science over the past 40 years. It has been medicine's extraordinary successes that have led to rising public expectations and an intolerance of medical failures. Indeed, it seems quite legitimate today to ask why it is that patients receive ineffective treatments.

EVALUATIVE RESEARCH

The term evaluative research means different things to different people. The traditional biomedical and industrial perspective was to evaluate safety and efficacy, and these are still the primary focus of Food and Drug Administration concerns. In recent years, researchers have devoted more and more attention to evaluation of a service or product after research has demonstrated that it is of some value. Evaluative research asks the question "How beneficial is it?" in terms of outcomes, good and bad, and in comparison with alternative treatment or diagnostic options.

Fuchs and Garber (1990) have suggested that technology assessment should now embrace a wide spectrum of consequences: clinical, economic, and social, as well as ethical. Eddy states that equally important is how patients value the procedures that physicians and companies believe are effective and, most importantly, whether they think they should be provided (Eddy, 1990c).

To understand the place of evaluative research, let us first look at how medical practices in the United States incorporate new technologies. For example, consider a surgeon who develops a new operation. He or she first tries the new idea out in the animal laboratory and then on an appropriate patient. It seems to work. The patient and other physicians are impressed. The surgeon tries it on a few more patients. After doing 5 or 10, the results are reported at a national meeting and published in a scientific journal. Other surgeons are also stimulated to try it.
They get into fights with insurance companies that refuse to pay for an experimental procedure (unless, that is, they are ingenious enough to classify the new operation under an old name and fool the payers). In any case, they persist, and after a while the payers cave in, declaring that the procedure is "accepted." How soon they cave in depends on how expensive the procedure is.

The process is a bit different with a new device. It is the manufacturer who must convince the payer, as well as physicians, that the device is worth using, although often the device has been developed in collaboration with a physician, who is equally enthusiastic. The manufacturer also must convince the FDA that the device works and is not grossly hazardous, but that is relatively easy considering that the FDA division that assesses devices is underfunded and understaffed. For drugs, it is much more difficult, because drug approval requires several stages of clinical trials.

Whether it is a drug, a device, or a procedure, the end result is similar: a new technology is usually disseminated, and paid for, without it having been established that the technology is either (1) substantially better than the existing alternatives or (2) worth the additional cost over the alternatives (they almost never cost less). Not surprisingly, some are later found to be of no value, even harmful, or at best of marginal benefit.

A medical technology may also be effective for some uses but not for others. It is likely that most inappropriate use falls into this category. After all, relatively few technologies that are worthless are adopted. However, every medical technology can be used inappropriately. For example, there is good evidence that carotid endarterectomy (removal of an obstruction within one of the arteries to the brain) will help prevent stroke if the artery is obstructed by 70 percent or more (Mayberg et al., 1991). But there is no evidence that removal of a 30 percent obstruction is beneficial, and experts do not recommend the operation in that circumstance (Winslow et al., 1988b). Similarly, coronary artery bypass graft (CABG) surgery clearly improves long-term survival in patients with three-vessel or left main coronary artery disease (Alderman et al., 1990), but there is little evidence that survival is improved in patients with single-vessel disease. (CABG surgery is usually effective at relieving angina in these patients, however.)

The reasons for adopting a medical technology before there is adequate proof of effectiveness or for extending its use beyond the original purposes are neither pure self-interest nor negligence. The developer or manufacturer of a new treatment or diagnostic tool always has some evidence of efficacy; the early experience may be encouraging or the new technology may offer at least marginal benefit over current options. In a society such as ours, a number of forces combine to stimulate such use of a technology before there is adequate proof of its effectiveness. As a society, Americans place value on action; doing something is better than doing nothing.
It is better to take a chance than to lose from inaction. Then, too, Americans have a love affair with technology: patients, as well as physicians, are attracted to high-technology solutions. It is assumed that they are better than nontechnical alternatives. And, importantly, those who make the decisions to use a new technology, physicians and patients, are insulated from cost considerations; usually, neither pays the entire bill directly.

Unfortunately, once a new procedure, drug, or device has been used and is thought to be of value, it is much more difficult to carry out an unbiased evaluation of its effectiveness. Patients and physicians are reluctant to participate in a randomized trial once they are convinced that the technology works.

How can the process be improved? How can payment be more effectively linked to evidence of effectiveness? To rationalize decisions about the provision of any health care service, three questions must be answered: Does the technology work? If so, for whom is it indicated? Should it be provided (Eddy, 1990a)? Evaluative research attempts to answer these questions. In the following sections I will review the currently used evaluative research methods. Then, I will consider some of the limitations of evaluative research and barriers to its implementation. Finally, I will propose a plan for improving evaluative research and linking it to payment decisions.

DOES IT WORK?

The test of whether a medical technology works is in the outcomes: Is the patient better off? This is what the current emphasis on outcomes research is all about: Do the benefits outweigh the risks? How do health care providers tell whether the patient is better off? What are the important outcomes that should be measured? How should they be measured? What are the sources of information?

Efficacy is used to indicate that a treatment or diagnostic technology accomplishes its alleged objectives under ideal conditions (usually those in which it was developed). Does CABG surgery prevent early death from coronary artery disease? Does it prevent heart attacks? Does carotid endarterectomy prevent stroke?

Effectiveness is a measure of whether the technology works in general clinical practice, outside of the research environment. For example, CABG surgery has been shown to reduce substantially the death rate in patients with left main coronary artery disease, but not in patients with disease in a single branch vessel. Effectiveness considerations also include examination of the competence of the users.

Types of Outcomes

At least four types of outcomes can be differentiated: clinical, health status, functional capacity, and quality of life.

Clinical Outcomes

Until quite recently, the traditional outcomes that were considered to define the success, or effectiveness, of a medical technology were clinical, that is, they were defined by physicians. These may be positive, that is, the "benefits" of a medical technology, such as diagnosis or cure of disease, relief of symptoms, prevention of complications or disability, and prevention of premature death. Or they may be negative, that is, the risks or "harms" of a treatment or diagnostic tool, such as mortality, pain, anxiety, and complications. The oldest, simplest, and often still most relevant clinical measure is death.
If a treatment reduces mortality from a disease, it is effective; if the treatment causes a high rate of mortality, its risk may be unacceptable. Mortality is easy to measure. There are no quibbles about its definition, its occurrence is almost always accurately recorded, and it is information that is usually readily available from a variety of sources. The purpose of many treatments, however, is not to prevent death but to relieve symptoms. And most treatments are rarely fatal. Thus, mortality may not be the most relevant outcome to measure in many situations.

Health Status, Functional Capacity, and Quality of Life Outcomes

Although few would quibble with the importance of relieving symptoms, curing disease, or reducing mortality, in recent years it has become apparent that classic clinical outcomes alone are sometimes inadequate measures of the value of a technology to the patient. Under certain circumstances, other outcomes may be more important. After all, the ultimate test is whether the patient feels that he or she is better off. What is the patient's perception of his or her health status or ability to carry out normal daily functions? After which treatment is the overall quality of life better? The answers to these questions do not necessarily coincide with "good" outcomes that are defined only in terms of mortality or disease state.

Included in health status assessments are measures of functional status, emotional health, social interaction, cognitive function, and disability, as well as simple measures such as the ability to return to work (Epstein, 1990). Evaluations of health status and quality of life almost invariably require interviews or questionnaires, because these kinds of information are not routinely part of the hospital medical record (Cleary, 1988; Greenfield and Nelson, 1992; Lohr, 1992) and because they usually must be measured after the patient is discharged from the hospital.

Sources of Outcomes Data

Outcomes data may be collected prospectively or retrospectively. For prospective data collection, the investigators determine in advance which patients to study and what outcomes to measure. This information is then obtained for every study patient. Data sources include the patient's medical record, laboratory test results, and interviews or questionnaires administered to patients or care givers.
Prospective data collection is regarded as superior to retrospective data collection because the necessary data are identified in advance and mechanisms are established to ensure that they are collected. Retrospective data usually consist of information collected for another purpose, such as medical records, discharge abstracts, or health insurance claims, but retrospective data may also include interviews or surveys of patients or care givers.

The most useful outcomes information comes from controlled studies in which treatment effects are determined by comparing outcomes in patients who receive the treatment with those in similar patients who do not. Uncontrolled studies typically collect data only on treated patients or on unselected groups of patients. Evaluation of effectiveness from data from an uncontrolled study requires the investigator to make assumptions or to have knowledge about outcomes in untreated patients. However, outcomes data from a study without a control group can nonetheless often be very useful. For example, if it is known that the mortality from a condition is 100 percent without treatment, then a treatment that reduces mortality to 50 percent is clearly effective.
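
The comparison described above reduces to simple arithmetic on event rates. The following is a minimal sketch, not a method taken from the chapter: the function name `treatment_effect` and the two summary measures it returns are illustrative assumptions, and only the 100 percent and 50 percent mortality figures come from the text.

```python
def treatment_effect(control_mortality: float, treated_mortality: float) -> dict:
    """Compare mortality in untreated and treated patients.

    Both arguments are fractions between 0 and 1. The untreated figure may
    come from a control group or, in an uncontrolled study, from prior
    knowledge about the natural course of the condition.
    """
    for rate in (control_mortality, treated_mortality):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("mortality must be a fraction between 0 and 1")
    # Absolute reduction: how many fewer deaths per patient treated.
    absolute_reduction = control_mortality - treated_mortality
    # Relative risk: treated mortality as a share of untreated mortality.
    relative_risk = (
        treated_mortality / control_mortality if control_mortality > 0 else None
    )
    return {
        "absolute_risk_reduction": absolute_reduction,
        "relative_risk": relative_risk,
    }


# Figures from the text: 100 percent mortality untreated, 50 percent treated.
effect = treatment_effect(control_mortality=1.0, treated_mortality=0.5)
print(effect)  # {'absolute_risk_reduction': 0.5, 'relative_risk': 0.5}
```

When the untreated mortality is not known with this kind of certainty, the same calculation is only as reliable as the assumption substituted for the missing control group.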

Measuring Outcomes

The Randomized Clinical Trial

The randomized clinical trial (RCT) is considered the "gold standard" method for assessing clinical outcomes. Patients are identified as candidates by preset criteria and are then randomly assigned to receive the study treatment or an alternative treatment (the control treatment). The outcomes data to be collected (survival, symptom relief, health status, etc.) are agreed upon in advance and are collected concurrently on study and control patients. The outcomes in the two groups are then compared to determine whether the study treatment is advantageous.

If the selection criteria are rigidly adhered to, an RCT provides a valid comparison of two forms of treatment in comparable groups of patients. The random assignment of patients to study or control groups also distributes any unrecognized differences among patients randomly and evenly. Accordingly, differences in outcomes should be attributable solely to the treatment. For these reasons, many regard RCTs as the most valid way to measure efficacy.

The major disadvantage of RCTs is that they are expensive, time-consuming, and difficult to carry out. As a result, relatively few have been performed, so that rigorous comparative information on efficacy is available for only a small fraction of the indications for a small number of treatments. The other major problem with RCTs is that, to the extent that a treatment is perceived as efficacious, it may be difficult to get physicians and patients to agree to accept random assignment to treatment or no treatment groups. Subconsciously or otherwise, physicians may disqualify many of their patients who otherwise appear to be logical candidates. For example, it would now be difficult to conduct a randomized trial of laparoscopic cholecystectomy versus operative cholecystectomy because patients clearly prefer the laparoscopic approach. Finally, RCTs are typically conducted in or by academic medical centers, whose patients and doctors may not represent the population at large. Thus, results may not be fully generalizable to care in community hospitals.

Uncontrolled Clinical Series

Uncontrolled clinical series, usually from a single academic center, may provide data that are collected prospectively or retrospectively from patients receiving a single treatment. Their strength is that they demonstrate what can be accomplished in an optimal environment by skilled physicians. Their disadvantage is that they provide no comparison with alternative treatments. The results also may not be generalizable in that others may be unable to match them.
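
The claim that random assignment distributes unrecognized patient differences evenly, while selective treatment does not, can be illustrated with a small simulation. This is a sketch under stated assumptions: the hidden risk factor, its uniform distribution, the sample size, and the helper name `mean_risk_gap` are all hypothetical and are not drawn from any study cited in the chapter.

```python
import random
from statistics import mean


def mean_risk_gap(risks, assign):
    """Return mean hidden risk of treated patients minus that of controls.

    `assign` is called exactly once per patient and returns True when the
    patient is placed in the treatment group.
    """
    treated, control = [], []
    for risk in risks:
        (treated if assign(risk) else control).append(risk)
    return mean(treated) - mean(control)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical unrecognized risk factor, uniform between 0 and 1.
risks = [rng.random() for _ in range(10_000)]

# Random assignment ignores the hidden risk entirely.
random_gap = mean_risk_gap(risks, lambda risk: rng.random() < 0.5)
# Selective assignment: only patients who look like good candidates are treated.
selected_gap = mean_risk_gap(risks, lambda risk: risk < 0.5)

print(f"gap under random assignment:    {random_gap:+.3f}")
print(f"gap under selective assignment: {selected_gap:+.3f}")
```

Under random assignment the two groups end up with nearly the same average hidden risk, so a difference in outcomes can be credited to the treatment. Under selective assignment the groups differ before any treatment is given, which is the situation an uncontrolled or selectively enrolled series cannot rule out.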

Data Registries

Data registries collect data prospectively from multiple institutions on patients with certain conditions or treatments such as those who have liver transplants or who have used a thrombolytic therapy. Because participants agree on the data to be collected and send the data to a central source as patients are encountered, registry data are usually reasonably complete. By pooling data from multiple sources, registries make it possible to assess outcomes for conditions or treatment groups that no individual institution treats in large numbers.

As with clinical series, the major deficiency of most registry series is the omission of a control group. Although registry data can be helpful in estimating operative mortality, long-term survival, and complication rates, the absence of similar information for untreated patients limits the ability to determine whether the treatment is better than the alternatives.

Cohort-Controlled Studies

Cohort-controlled studies attempt to measure outcomes by retrospectively comparing outcomes in treated and untreated patients who are selected according to the similarity of their clinical characteristics and risk factors as reflected in the available data. A well-done cohort-controlled study can be an acceptable substitute for an RCT. The principal hazard of cohort-controlled studies is that the investigator may fail to adequately identify or control for all of the important nontreatment variables that determine outcomes.

Hospital Patient Records

Hospital patient records are a rich and readily available source of clinical data that can be collected and analyzed retrospectively. The major disadvantage of abstracting data from hospital charts is that it is expensive. In addition, for most patients there is little or no long-term follow-up information. Getting further information from patient records in clinics and physicians' offices may be difficult and adds further expense.

Statewide Discharge Databases

Statewide discharge databases use data from patients' hospital records which have been abstracted by personnel from the medical records office. These discharge abstracts summarize key clinical and demographic data that can be used for outcomes analysis. In 23 states, all hospitals furnish discharge abstracts to a central health data registry, which provides statewide data for all discharged patients. These databases are virtually the only source of health care data for entire populations, as opposed to data from survey samples or data on subsets of

the U.S. population, such as the Medicare population or the beneficiaries of an insurance plan. As such, they permit population-based analyses of regional variations in the use of various services and calculation of overall mortality rates as well as provide some other outcomes data, such as readmission or reoperation rates.

Like registry data, this information is uncontrolled. However, because discharge data include all patients receiving a service, they provide an accurate measure of the outcomes that are actually achieved in practice. Not surprisingly, the results are typically inferior to those reported from uncontrolled series from academic centers or RCTs. For example, the average statewide mortality rate from CABG surgery in the early 1980s reported from California (Showstack et al., 1987), New York (Hannan et al., 1990), and Massachusetts (Dalen et al., 1990) was 3.7 percent, whereas that from the Cleveland Clinic series was 0.8 percent (Cosgrove et al., 1984).

A disadvantage of using discharge abstract data is that the amount of data in the abstracts is limited. Many important clinical details are not routinely captured and critical data, such as the results of tests done outside the hospital, may not be available. Errors in abstracting information from hospital records are common. In addition, the absence of universal patient identifiers in many state hospital discharge databases may make it impossible to determine long-term outcomes by tracking a patient and linking one hospital admission to another or to a transfer.

Claims Data Analysis

The availability of a nationwide database, the Medicare claims files, has spawned a large number of studies that attempt to assess the outcomes of various treatments and to demonstrate regional variations in use of various treatments. Mortality and readmission rates are the outcomes measures most commonly studied. The availability of large quantities of data without the need to abstract charts or perform surveys is appealing to researchers. With the Medicare databases, the outcomes of huge numbers of patients may be analyzed for almost any treatment.

Because claims data are obtained for a different purpose, it is not surprising that claims data often fail to include critical information on the patient's condition. Seldom is enough information present to permit adjustments for patient risk. This can lead to unjustified comparisons. For example, using claims data, it is possible to compare mortality in the year following a heart attack between those patients who underwent bypass surgery after the heart attack and those who did not. But the outcome is more related to the patient's condition than to the treatment. For example, if surgery is reserved for those who are in greatest danger of dying, the mortality rate from CABG surgery may be higher than that for patients who did not receive CABG surgery. Conversely, if only those with a good prognosis get operated on, surgical mortality will be less than that for the

remainder. Without adjustment for patient risk, the data may lead to inaccurate or unproved conclusions.

Claims data analysis also does not deal with the selection effect, that is, the unmeasured characteristics of the patient that lead some to be selected and others not to be selected for surgery. Even RCTs have trouble dealing with this, despite their rigid and detailed entry criteria. Claims data may also distort or rearrange diagnoses or treatments so as to maximize payments. Finally, the use of multiple provider identification numbers and the separation of hospital and physician claims by Medicare has made attributing the provision of care to specific physicians difficult. This has recently been corrected.

The Uniform Clinical Data Set

The Health Care Financing Administration has investigated the practicality of abstracting much more clinical information directly from medical records. The information is incorporated into a large database, the Uniform Clinical Data Set. Although the validity of this approach is still unknown, the time and expense of record abstraction to obtain the data may limit the application of this method of data collection.

Evaluating Outcomes Data

Even when outcomes information is adequate, it must be evaluated and interpreted. It cannot be used "raw." Risk adjustment or patient selection may not have been optimal. Combining data from multiple sources may be difficult because studies may have been carried out on different populations, at different times, under different conditions, and either before or after a major technical advance. Data may be incomplete; even RCTs do not provide data on the benefits and risks of a treatment for all of its uses. Finally, data may be conflicting: two apparently excellent studies may come to opposite conclusions.

Two techniques have been developed to evaluate outcomes data: meta-analysis and appropriateness studies.
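
Before turning to those techniques, the risk-adjustment problem raised for claims data can be made concrete. The sketch below uses invented counts (they are hypothetical and come from no study cited here) in which surgery is mostly reserved for high-risk patients. It compares crude mortality, mortality within each risk stratum, and a directly standardized rate; the function names are illustrative assumptions.

```python
# Hypothetical (patients, deaths) counts, invented for illustration only.
COUNTS = {
    "high risk": {"surgery": (80, 16), "no surgery": (20, 6)},
    "low risk": {"surgery": (20, 1), "no surgery": (80, 8)},
}


def crude_rate(treatment):
    """Mortality ignoring patient risk, as raw claims data would show it."""
    patients = sum(COUNTS[stratum][treatment][0] for stratum in COUNTS)
    deaths = sum(COUNTS[stratum][treatment][1] for stratum in COUNTS)
    return deaths / patients


def stratum_rate(stratum, treatment):
    """Mortality among patients of one risk level."""
    patients, deaths = COUNTS[stratum][treatment]
    return deaths / patients


def standardized_rate(treatment):
    """Direct standardization: weight each stratum's rate by that
    stratum's share of all patients, treated or not."""
    total = sum(p for arms in COUNTS.values() for p, _ in arms.values())
    rate = 0.0
    for stratum, arms in COUNTS.items():
        stratum_size = sum(p for p, _ in arms.values())
        rate += stratum_rate(stratum, treatment) * stratum_size / total
    return rate


for treatment in ("surgery", "no surgery"):
    print(
        f"{treatment:>10}: crude {crude_rate(treatment):.3f}, "
        f"standardized {standardized_rate(treatment):.3f}"
    )
```

With these made-up counts, crude mortality is 17 percent with surgery and 14 percent without, which makes surgery look harmful. Within each risk stratum surgery has the lower mortality, and the standardized rates are 12.5 percent and 20 percent. The adjustment is only possible because the risk level is recorded, which is exactly the information claims data often lack.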
Meta-Analysis

Meta-analysis is a technique for combining the results of multiple RCTs to arrive at a conclusion that may not be justified by the results of any of the individual studies alone (Lau et al., 1992; Sacks et al., 1987; Thacker, 1988). All of the data in each study are examined, categorized, and grouped according to important subcategories. The major effect of meta-analysis is to establish statistical significance. A difference in outcomes due to the use of a particular technology that is not significant in one study may turn out to be significant when the results from

several studies are pooled. Conversely, pooling of data from a number of inconclusive studies may establish that a technology is ineffective.

Figure 11-1, from Lau et al. (1992), shows the results of cumulative meta-analyses of the efficacy of streptokinase treatment on acute myocardial infarction. The position of the dot indicates whether there were fewer deaths among treated patients. In this example, the very first trials showed that streptokinase was very effective (dot is to the left of the "1" in Figure 11-1). But the studies included low numbers of subjects, only 23 and 65 patients, so the validity of the findings was open to question. The confidence intervals were wide, as indicated by the line that crosses over the "1" in Figure 11-1. By 1974, 10 trials that included 2,544 patients had been carried out. The treatment was found to be effective in most of them, and the confidence limits were reduced so that the result was stable. Because meta-analyses were not being performed in 1974, however, questions about the efficacy of streptokinase treatment remained until several large RCTs were carried out in the late 1980s, more than 10 years later.

[Figure 11-1 not reproduced. Columns: year, number of trials, number of patients. Horizontal axis: odds ratio (Mantel-Haenszel fixed-effects method); values below 1 favor treatment, values above 1 favor control.]

FIGURE 11-1 Cumulative meta-analyses of 60 trials of intravenous thrombolytic agents. "1" indicates no difference from untreated patients. Less than "1" indicates that fewer treated patients died. SOURCE: Lau et al. (1992, p. 248).

Unfortunately, most of the available outcomes data are not from RCTs, and therefore they are not suitable for meta-analysis. Furthermore, outcomes studies vary markedly in the selection of the populations, their nature, and in the quality of the data gathering process. Although all would agree that the assessment of outcomes should be based on science, someone needs to interpret what the science is.

Appropriateness Studies

Appropriateness studies address the issue of interpreting diverse outcomes data by obtaining group judgments from expert clinicians (Chassin et al., 1987; Park et al., 1986). Although the primary purpose of appropriateness studies is to determine for whom the treatment is indicated (see below), the first step in the process is evaluation of the evidence. Outcomes data from all sources (RCTs, meta-analyses, and studies of claims data, discharge data, registries, and clinical series) are evaluated. The expert panels comprise individuals who have vast clinical experience and who often have participated in many of the studies that generated the outcomes data. Thus, they have an intimate knowledge of the strengths and weaknesses of the data as well as of the clinical realities.
From these data, their experience, and discussions they make independent subjective judgments of the benefits and harms of a treatment.

PORT Projects

Outcomes research received a major boost in 1990 when AHCPR expanded its Patient Outcomes Assessment Research Programs into the Patient Outcomes Research Teams (PORTs). These PORTs are centered in major medical schools and hospitals to conduct comprehensive outcomes studies of specific conditions, such as prostatic disease, stroke, and myocardial infarction (Health Services Research, 1990). These centers employ a variety of techniques, such as meta-analysis, decision analysis, claims data analysis, and surveys, to gather and analyze all available information concerning the effectiveness and appropriateness of use of treatments for the study conditions. Some PORT projects also evaluate

EVALUATIVE RESEARCH AND COVERAGE DECISIONMAKING The advantages of the appropriateness method is that it 141 identifies all of the key factors that enter into the clinical decision, is highly specific, is exhaustive of the clinical possibilities, and provides a quantitative representation of the extent of agreement among the panelists. Its disadvantage is that the ratings are implicit: global judgments of appro- priateness by experts who estimate probabilities and impute patient utilities. The validities of the estimates, as well as the global judgments, therefore depend in part on how experts are selected. Practice Guidelines Practice guidelines are a means by which the results of outcomes and appro- priateness studies are simplified and made accessible to clinicians (Leape, 1990~. If sufficiently detailed, they have the potential to identify, and thereby to dimin- ish, inappropriate use. Practice guidelines can also be used as review criteria for evaluating the quality of care or as criteria for payment for services. The devel- opment of practice guidelines is an active concern of a number of medical spe- cialty societies, as well as a major mission of AHCPR, which sponsored two Institute of Medicine studies (1990, 1992~. SHOULD IT BE PROVIDED? Having determined that a treatment is effective for some people, that is, it is superior to the existing alternatives, and having assessed which types of patients will benefit from it, one must then ask whether it should be made available to those patients (i.e., paid for). This requires that three questions be asked, in sequence: How great is the potential benefit? What is the cost-effectiveness of the technology? Is it worth what it costs (Eddy, 1990c, 1991a)? How Great is the Potential Benefit? The measure of benefit of a treatment is its value. A treatment is more valuable if it increases expected survival by five years than if it increases expect- ed survival by only six months. 
A life-saving treatment (e.g., liver transplantation) is more valuable than one that relieves symptoms (e.g., hernia repair). Value equals the net benefit, that is, relief of symptoms, reduction of pain, or increased survival, minus risks, pain, and psychological effects. Only the patient can do the calculus, because only the patient experiences the symptoms and the risks, fears, anxieties, and pain. And only the patient receives the benefit. Therefore, only the patient can place a value on a treatment (Eddy, 1990c).

A relatively recent innovation that helps patients assess the value of a treatment has been the development of "Patient Shared Decision-making," in which patients see videotaped interviews of other patients who have undergone alternative treatments (Barry et al., 1988). Using interactive techniques, information on clinical, functional, and satisfaction outcomes is integrated in a highly personal and effective format.

Cost-Effectiveness

Cost-effectiveness analysis expresses the benefits of a treatment in terms of dollars and years of life saved (Weinstein and Stason, 1977). In its simplest form, it asks the question, how much does a treatment cost for each life saved? For example, suppose mammography costs $50 per examination and is carried out every year on 100,000 women over the age of fifty. The annual costs would be $5 million. If this practice resulted in saving the lives of 1,000 women each year, the cost-effectiveness would be $5,000 per life saved ($5,000,000/1,000 lives saved).

Actually, it is more complicated than that, because one must add the costs of treatment for the cases that are discovered and also include the costs of all the negative biopsies that false-positive mammograms generate. One must also subtract the costs of treatment of all of the cases of breast cancer that would have occurred if these women had not undergone mammography. The information being sought is the net cost per life saved, over and above what is already being spent.

Further refinements are necessary. Saving a life at age 30 is presumably more valuable and therefore more cost-effective than saving a life at age 80, so the cost per year of life saved is probably a more useful measure than cost per life alone. Furthermore, because a treatment or disease may leave the patient with some disability, the quality of the added life may be less than optimal. To capture these aspects, costs may be measured in terms of quality-adjusted life years (QALYs; LaPuma and Lawlor, 1990). For example, a patient whose life is saved by kidney dialysis may feel that dialysis treatments reduce the quality of life to 80 percent of its value with no disease.
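The mammography arithmetic above can be written out as a short Python sketch. It is an illustration under stated assumptions, not a method taken from the chapter: the function and parameter names are hypothetical, and the treatment, biopsy, and averted-cost figures in the second call are invented placeholders, because the text gives no numbers for them. Only the $50 fee, the 100,000 women, and the 1,000 lives saved come from the text.

```python
def net_cost_per_life_saved(screening_cost, lives_saved,
                            treatment_cost=0.0,
                            false_positive_cost=0.0,
                            averted_cost=0.0):
    """Net cost per life saved: costs added by screening minus costs it avoids."""
    if lives_saved <= 0:
        raise ValueError("lives_saved must be positive")
    net_cost = (screening_cost + treatment_cost
                + false_positive_cost - averted_cost)
    return net_cost / lives_saved


def cost_per_qaly(cost_per_life_year, quality_weight):
    """Cost per quality-adjusted life year for a quality weight in (0, 1]."""
    if not 0 < quality_weight <= 1:
        raise ValueError("quality_weight must be in (0, 1]")
    return cost_per_life_year / quality_weight


# Figures from the text: $50 per examination, 100,000 women, 1,000 lives saved.
screening_cost = 50 * 100_000
simple = net_cost_per_life_saved(screening_cost, 1_000)

# Hypothetical placeholders (not from the text) for the refinements described.
refined = net_cost_per_life_saved(
    screening_cost, 1_000,
    treatment_cost=2_000_000,       # treating the cases that screening discovers
    false_positive_cost=1_000_000,  # negative biopsies after false positives
    averted_cost=1_500_000,         # later cancer treatment that is avoided
)

print(f"simple:  ${simple:,.0f} per life saved")
print(f"refined: ${refined:,.0f} per life saved")
```

The simple call reproduces the $5,000 per life saved in the text. The refined figure depends entirely on the placeholder inputs and is shown only to make the direction of each adjustment explicit. The `cost_per_qaly` helper divides a yearly cost by a quality weight such as the 0.8 in the dialysis example.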
If dialysis cost $20,000 per year, the cost per QALY would then be $25,000 ($20,000/0.8).

The major limiting factor in cost-effectiveness analysis, as with all evaluative research, is the lack of accurate outcomes information. Cost information is also not as easy to obtain as one might think. As every hospital administrator and payer knows, costs are not the same as charges. How is charge information adjusted for this? Charges also vary among similar providers in a region and among different regions of the country. How should they be averaged?

Cost-effectiveness analysis, like outcomes assessment, is not useful unless it is very specific (Eddy, 1992a,b). Annual screening mammography is not cost-effective for women in their twenties, for example, so specifying the age of the individual for whom a technology is appropriate is critical. For many high-technology therapeutic or diagnostic options, the relevant level of analysis is much more detailed: What is the cost-effectiveness of CABG surgery for a 60-year-old diabetic female with left main artery disease, a normal stress test, and good ventricular function? What is the cost-effectiveness of CABG surgery for a hypertensive 75-year-old male with two-vessel disease, a positive stress test, and 60 percent reduction in ventricular function? Probably very different.

Is It Worth It?

This is the trade-off question. It asks: For the population that is paying for health care (whether members of a health maintenance organization, employees of a company, enrollees in a Blue Cross plan, or, in an ideal system, the nation as a whole), do the value and the cost-effectiveness of a technology, compared with those of other health services, justify the provision of the treatment for all who need it (Eddy, 1990b)? If resources were not limited, then most would choose to provide all treatments that are appropriate, that is, that are "worth doing," even, perhaps, some treatments that provide only a small benefit. Resources are limited, however. Funding one service makes less money available for others. So how does one choose?

Consider a simple example. Nationwide, approximately 3,000 liver transplants are performed each year, at a cost of about $300,000 each. Thus, the country spends about $900 million (3,000 x $300,000) for liver transplants annually. Is this preferable to using the same funds to provide prenatal care for 2 million pregnant women who do not now receive it? Or to using the funds to provide $5,000,000 to each of 180 emergency rooms in major cities to enable them to stay open and adequately handle an increasing load of trauma cases?

The methods for making these decisions are underdeveloped, largely because, as a society, the United States has until now refused to recognize the need to address the resource allocation issue. However, the basic steps of the process are clear. First, community judgments about the value and the cost-effectiveness of each service (the cost per QALY) must be made. Second, services should be compared and ranked according to these judgments.
Third, we must decide what we will and will not pay for as part of a basic benefits package that is provided for all citizens. We must make the trade-offs explicit. The problem is not our lack of techniques for doing this, but our lack of political leadership that recognizes the need and has the will to set in motion the process to make these decisions.

MAJOR PROBLEMS IN THE APPLICATION OF EVALUATIVE RESEARCH

The foregoing summary, while brief, portrays an impressive armamentarium for evaluating technologies, most of which has been developed in the past 25 years. Although methodological questions remain unsolved, evaluative research already has the capacity to provide the guidance that the U.S. health care system so desperately needs. Yet, by any objective assessment, one is forced to conclude that evaluative research has so far had very little influence on the use of medical treatments. Why is this so?

There are at least three reasons. Until they are addressed, rational decisionmaking is unlikely to become much more than a fringe activity of academics. These reasons are:

• Inadequate resources have been committed to the task.
• We fail to deal with costs and trade-offs.
• We have not developed a public process for decisionmaking.

Inadequate Resources

Current expectations for the fruits of outcomes research are unrealistic. Although the recent expansion of the outcomes research effort with PORT projects and guideline development is encouraging, these are really only pilot projects, that is, feasibility studies. I think they will be successful, but the number of treatments being evaluated is a tiny fraction of what is needed for even the high-cost technologies. They must be expanded a hundredfold. In addition, it is seldom possible to use "outcomes" alone to determine whether a treatment is worth doing. Outcomes must be evaluated in the context of a host of clinical variables, and costs must be compared with those of alternatives. It is a complicated and expensive task.

The job to be done is immense. The resources applied to it so far have not been sufficient. Is it too much to ask that one cent of each U.S. health care dollar be spent to determine whether the other 99 cents is well spent? Any successful industry spends at least that much on quality assessment. One percent of current health care expenditures would be $8 billion! Even 1/10 of that, 0.1 percent, would be $800 million, more than six times the amount allocated to AHCPR for evaluative research in fiscal year 1992. Although industry, particularly pharmaceutical manufacturers, spends an additional several hundred million dollars on preapproval drug and device testing, the total of public and private funding for evaluative research is small in comparison with the need. Until we make a much greater commitment, the impact of evaluative research will be limited.
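The budget arithmetic in this section can be checked with a short Python sketch. It assumes total U.S. health care spending of about $800 billion, a figure that is not stated in the text but follows from its statement that 0.1 percent would be $800 million. The variable names are illustrative only.

```python
# Assumption: total spending is inferred from the text's statement that
# 0.1 percent would be $800 million; it is not stated directly.
TOTAL_SPENDING = 800_000_000_000  # dollars

one_percent = TOTAL_SPENDING // 100         # one cent of each health care dollar
tenth_of_percent = TOTAL_SPENDING // 1000   # 0.1 percent

# The text says 0.1 percent is "more than six times" the AHCPR allocation for
# fiscal year 1992, which bounds that allocation from above.
ahcpr_upper_bound = tenth_of_percent / 6

print(f"1 percent:   ${one_percent:,}")
print(f"0.1 percent: ${tenth_of_percent:,}")
print(f"AHCPR evaluative research allocation: under ${ahcpr_upper_bound:,.0f}")
```

On these assumptions, one percent of spending comes to $8 billion, and the AHCPR allocation implied by the "more than six times" comparison is below roughly $133 million.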
We Fail to Deal with Costs and Trade-offs

The focus of evaluative research has traditionally been on the primary outcome question of effectiveness: does the technology work in practice? In recent years, with decision analysis and appropriateness research, the inquiry has been broadened to ask under which specific circumstances a treatment is better than the alternatives. We have asked whether a treatment is worth doing. And we have finally asked patients to value the care they might receive. But we have shied away from the really important question: Is the additional benefit worth the cost (Eddy, 1990b)?

In fact, some people believe that it is unethical to ask that question. They think that physicians should offer, and payers should cover, anything that might possibly help a patient. Indeed, it is politically dangerous to raise the cost question, especially if the conclusion is that a treatment may not be worthwhile for someone who has severe disease with limited life expectancy (such as AIDS), or for somebody for whom the benefit is relatively small (such as CABG surgery is for some), or for a technology for which the cost is very high (such as bone marrow transplantation). But this is the question that must be asked. It is, in fact, unethical not to ask it.

Of course, cost counts. Would any payer approve any technology that cost $1 billion per patient? No. Would any payer bother to discuss coverage of a technology that cost $1? No. We must ask the trade-off questions. We make the trade-offs now, providing liver transplants for some and no prenatal care for others, but we make them by default and by ability to pay, not by deliberation and choice.

At some point in the not too distant future, we will get serious about reducing health care costs. When that happens, basic and necessary care (care that every payment plan must provide) and care that will not be provided as a basic benefit will need to be identified. Those decisions will turn on comparative value.

The experiment with Medicaid coverage revision in Oregon has pilot tested one way to do this. A comprehensive process was developed to elicit community preferences; this involved local and statewide meetings, a telephone poll of selected citizens, and interviews with special groups. Cost-benefit analysis was then used to prioritize treatments. The initial results were unsatisfactory because life-saving therapies were undervalued. This led to a reconsideration and revision of the list, which was finally approved by the state legislature. The necessary waiver to implement the plan for Medicaid patients was then initially rejected, but subsequently approved, by the federal Department of Health and Human Services.

Although some have objected to the Oregon plan because it rations health care only for the poor, others have lauded the fact that at least it does so explicitly and according to the specific health care service, rather than implicitly by limiting access of individuals, as in the current Medicaid system. Leaving aside the merits of either of these arguments, it is apparent that some valuable lessons about how to begin to make allocation decisions have been learned from the Oregon experience (Eddy, 1991b). First, it demonstrates how difficult it is to place a proper value on services. Researchers found that people valued life-saving treatments more highly than others, regardless of "years of life saved" or other metrics. Second, the need for a high degree of specificity in describing condition-treatment pairs was reconfirmed. This immensely complicates the allocation process, but it cannot be avoided. Third, it reemphasized the tremendous need for good data regarding outcomes and costs.

The Oregon experiment asked the right question: Should a particular treatment or diagnostic technology be provided? It has moved this issue onto center stage. Although the process has many imperfections, it is, as Eddy (1991b) has noted, at least an open process with the capacity for self-correction. The research agenda for improvement has been established.

There is another lesson from Oregon: health care probably cannot be rationed in the United States unless it is rationed for all citizens. This means that the trade-off decisions must apply to everyone to be accepted. The process must appear to be fair. If there is to be a basic benefits package, it must apply to all.

We Have Not Made Decisionmaking a Public Process

We have it all backwards. Adoption decisions are made by hospitals, payers, and plan administrators, and physicians decide which services to provide for their patients. It should be the other way around: society should decide which services it wants for its citizens. Physicians should then provide them and payers should pay for them. The value of a new treatment is determined by what it does for the recipient, that is, the patient (Eddy, 1990b,c). Only patients can decide whether that value is worth the cost. It is the patient who gets the benefits and suffers the harms, not the physician or the payer. The patient is far better qualified to decide if a treatment is worth it. It is also all of us as patients, potential or actual, who, in the aggregate, pay for health care for everyone, either directly or indirectly. The public has the right to make the decisions on how resources are to be allocated.

Not only has the public not been asked to make these decisions, but as patients we are also denied access to knowledge of the crucial ingredient needed to make even individual decisions rationally: monetary costs. Patients almost never pay directly the financial costs of a treatment they receive. In fact, in most cases, the patient does not even know the costs of a specific treatment. Similarly, the public has been shielded from any responsibility for allocating resources.
By divorcing costs from decisionmaking, we have made it impossible for ordinary citizens to assess the value of health care. By default, the task has fallen to the payers and providers. To rationalize adoption decisions, this process needs to be reversed and coverage decisions need to be made by those who will receive the benefit.

It is necessary to investigate whom to ask and how to frame the questions. What are the relative roles of those who have a disease and would benefit from an expensive treatment, versus the roles of all others who have, say, only a 1:100,000 chance that they might someday need it? What is the treatment worth to each of those groups? What metric should be used to measure worth? Clearly, if you must pay for something directly, "worth" depends in large measure on what you can afford. A treatment cannot be "worth" $100,000 to someone who could never raise that amount of money; it might be worth more than that to someone who could raise a lot of money. How can these points of view be harmonized?

As the Oregon experiment has demonstrated, assessing public values and making public decisions is not easy, but it has begun. We must now proceed to develop an effective and feasible process, one that is balanced, fair, and acceptable to all.

RECOMMENDATIONS

The Federal Government's Role

To relate adoption decisions to evaluative research, we need much more evaluative research. We must vastly expand and strengthen all aspects of technology assessment: effectiveness and outcomes research, appropriateness research, the methodology of determining patient preferences, and methods for performing the trade-off decisions among competing technologies. It seems inescapable that the federal government must play a much larger role. There are several reasons for this.

First, the multifaceted nature of the new technology assessment and, in particular, the need to develop and implement methods for incorporating patient preferences require that technology assessment be a public process. The entire process is driven by the end result, the adoption decision. These decisions about which services will be provided must be public decisions made on the basis of public information on effectiveness, costs, and appropriateness, not private decisions made on the basis of private data, feasibility, and profits. They must express community values.

Second, most of the tasks are those that industry has neither the expertise nor the interest to perform. It is reasonable to require the manufacturer of a new drug or device to demonstrate that it performs its stated function and that it does so safely. It is not reasonable to expect the manufacturer to objectively assess whether the new drug or device is superior to competitive products or, if it is, whether it is worth the extra cost.
Similarly, the surgeon is not the person best qualified to evaluate the benefit or value of an operation compared with the benefit of another treatment (a drug, for example) for the same condition.

Third, the magnitude of the task requires resources that far exceed the capacity of health-related foundations in the private sector to support, even if they were to devote all of their funds to these activities.

Fourth, the process must be coordinated. Some authority (such as the Food and Drug Administration) must ensure that a new treatment is systematically moved along the assessment pathway from demonstration of efficacy to the final decision as to whether it should be provided as part of basic health services. No private organization can do that.

Finally, centralizing responsibility for coordinating and ensuring that technology assessment is carried out at all levels is the only way to reduce the duplication and inefficiency now incurred because payers and hospitals perform their own evaluations to make needed coverage and adoption decisions. This process, however, should be insulated from the political process. Just as the U.S. Congress does not tell the Food and Drug Administration which drugs to approve, it should not certify effective services.

There are several options. Either the Office of Health Technology Assessment of the Department of Health and Human Services could be given an expanded role, or a new Health Technology Board could be established with the responsibility of carrying the products of AHCPR-funded research through the process to final assessment of effectiveness, appropriateness, and cost-effectiveness.

Although these first stages of evaluation of a service could legitimately be centralized, ascertaining public preferences and making the trade-off decisions that determine what is and what is not paid for are political decisions. They are probably best made at a regional level by those who will have to live with the consequences.

An Optimal Evaluation and Approval Process

The ideal evaluation and approval process ensures an orderly progression of a treatment through the stages of evaluation. The problems presented by new technologies and treatments are very different from those of treatments currently in use.

New Technologies

In the optimal system, the use of a new treatment prior to approval would be prohibited. After a treatment has been shown to be efficacious by the innovator, clinical trials would be carried out. This would be an extension of the current process for new drugs. These trials would be RCTs conducted by a limited number of centers and funded by the federal government under the peer review approval process. The first step, therefore, is to generate outcomes and cost information. Second, the specific indications for which use of a technology is appropriate should be defined. Uses for which a treatment is found to be ineffective would not be evaluated further. Third, the value of effective treatments will need to be assessed by patient panels.
Finally, a community-based process would be used to determine whether appropriate and valued services are worth their costs and to rank each new service in comparison with other services being provided. The end result would be a basic benefits package that all payers would be required to provide. Inappropriate services would be prohibited, but other services that were not included in the basic benefits package could be available under a self-pay or optional insurance program.

Established Technologies

For medical technologies now in use, the same judgments described above for new technologies would need to be made. Clearly, this is an immense task, because there are thousands of diagnostic and therapeutic treatments currently in use, each with many indications. Additional outcomes data will need to be obtained for many by means of RCTs or prospective data collection. For debatable technologies, evaluation of effectiveness could be facilitated by making physician and patient participation in data collection a condition of payment. Each service currently being provided will eventually need to be valued and either approved or rejected as part of a basic benefits package.

Payers' Roles

Payers would have several new roles in this coordinated system of evaluation and adoption decisionmaking. First, they would provide the detailed cost information needed for cost-effectiveness analysis and allocation decisions. Second, all payers would be required to cover all services that are defined as part of a basic benefits package. Coverage of optional services could form the basis of a variety of optional insurance packages. Finally, payers would continue to pay for established technologies while they were being evaluated for appropriateness and cost-effectiveness.

A Public Process

It is evident that no one (payers, providers, or patients) is happy with the current methods used to determine which services should be provided. A key reason is that we continue to duck the central issue: What should and what should not be paid for? Although a public process for rendering these judgments will always be a little messy and never entirely satisfactory to all concerned, I believe that it will be far more acceptable, and certainly more equitable, than our current method.

REFERENCES

Alderman, E., Bourassa, M., Cohen, L., et al. 1990. Ten-year follow-up of survival and myocardial infarction in the randomized Coronary Artery Surgery Study. Circulation 82:1-18.
Barry, M. J., Mulley, A. G., Jr., Fowler, F. J., and Wennberg, J. E. 1988. Watchful waiting vs immediate transurethral resection for symptomatic prostatism. Journal of the American Medical Association 259:3010-3017.
Chassin, M. R., Brook, R. H., Park, R. E., et al. 1986. Variations in the use of medical and surgical services by the Medicare population. New England Journal of Medicine 314:285-290.
Chassin, M. R., Kosecoff, J., Park, R. E., et al. 1987. Does inappropriate use explain geographic variations in the use of health care services? A study of three procedures. Journal of the American Medical Association 258:2533-2537.
Cleary, P. D. 1988. Patient satisfaction as an indicator of quality care. Inquiry 25:25-36.
Congressional Record, November 21, 1989, pp. H9360-H9363.

Cosgrove, D., Loop, F., Lytle, B., et al. 1984. Primary myocardial revascularization. Circulation 88:673-684.
Dalen, J. E., Goldberg, R. J., D'Arpa, D., et al. 1990. Coronary heart disease in Massachusetts: The years of change (1980-1984). American Heart Journal 119:502-512.
Eddy, D. M. 1984. Variations in physician practice: The role of uncertainty. Health Affairs 3:74-89.
Eddy, D. M. 1990a. Anatomy of a decision. Journal of the American Medical Association 263:441-443.
Eddy, D. M. 1990b. What do we do about costs? Journal of the American Medical Association 264:1161-1170.
Eddy, D. M. 1990c. Connecting value and costs. Journal of the American Medical Association 264:1737-1739.
Eddy, D. M. 1991a. Rationing by patient choice. Journal of the American Medical Association 265:105-108.
Eddy, D. M. 1991b. What's going on in Oregon? Journal of the American Medical Association 266:417-420.
Eddy, D. M. 1992a. Cost-effectiveness analysis: A conversation with my father. Journal of the American Medical Association 267:1669-1675.
Eddy, D. M. 1992b. Cost-effectiveness analysis: Is it up to the task? Journal of the American Medical Association 267:3342-3348.
Epstein, A. M. 1990. The outcome movement: Will it get us where we want to go? New England Journal of Medicine 323:266-269.
Fuchs, V. R., and Garber, A. M. 1990. The new technology assessment. New England Journal of Medicine 323:673-677.
Graboys, T. B., Headley, A., Lown, B., Lampert, S., and Blatt, C. M. 1987. Results of a second-opinion program for coronary artery bypass graft surgery. Journal of the American Medical Association 258:1611-1614.
Greenfield, S., and Nelson, E. C. 1992. Recent developments and future issues in the use of health status assessment measures in clinical settings. Medical Care 30:23-41.
Greenspan, A. M., Kay, H. R., Berger, B. C., et al. 1988. Incidence of unwarranted implantation of permanent cardiac pacemakers in a large medical population. New England Journal of Medicine 318:158-163.
Hannan, E., Kilburn, H., O'Donnell, J., et al. 1990. Adult open heart surgery in New York State: An analysis of risk factors and hospital mortality rates. Journal of the American Medical Association 264:2768-2774.
Health Services Research. 1990. Patient Outcomes Research Teams: A new strategy for health services research on medical care quality. Editorial. Health Services Research 25:691-737.
Institute of Medicine. 1990. Clinical Practice Guidelines: Directions for a New Program. M. J. Field and K. N. Lohr, eds. Washington, D.C.: National Academy Press.
Institute of Medicine. 1992. Guidelines for Clinical Practice: From Development to Use. M. J. Field and K. N. Lohr, eds. Washington, D.C.: National Academy Press.
Kahn, K. L., Kosecoff, J., Chassin, M. R., Solomon, D. H., and Brook, R. H. 1988. The use and misuse of upper gastrointestinal endoscopy. Annals of Internal Medicine 109:664-670.
Kassirer, J. P., Moskowitz, A. J., Lau, J., and Pauker, S. G. 1987. Decision analysis: A progress report. Annals of Internal Medicine 106:275-291.

LaPuma, J., and Lawlor, E. F. 1990. Quality adjusted life-years. Journal of the American Medical Association 263:2917-2921.
Lau, J., Antman, E. M., Jimenez-Silva, J., et al. 1992. Cumulative meta-analysis of therapeutic trials for myocardial infarction. New England Journal of Medicine 327:248-254.
Leape, L. L. 1990. Practice guidelines: An overview. Quality Review Bulletin 16:42-49.
Leape, L. L., Park, R. E., Solomon, D. H., et al. 1990. Does inappropriate use explain small area variations in the use of health care services? Journal of the American Medical Association 263:669-672.
Leape, L. L., Hilborne, L. H., Kahan, J. P., et al. 1991. Coronary Artery Bypass Graft: A Literature Review and Ratings of Appropriateness and Necessity. Publication JRA-02. Santa Monica, Calif.: The RAND Corporation.
Lewis, C. E. 1969. Variations in the incidence of surgery. New England Journal of Medicine 281:880-884.
Lohr, K. N. 1992. Applications of health status assessment measures in clinical practice. Medical Care 30:1-14.
Lusted, L. B. 1968. Introduction to Medical Decision Making. Springfield, Ill.: Charles C Thomas.
Mayberg, M. R., Wilson, E. W., Yatsu, F., et al. 1991. Carotid endarterectomy and prevention of cerebral ischemia in symptomatic carotid stenosis. Journal of the American Medical Association 266:3289-3294.
McNeil, B. J., Keeler, E., and Adelstein, S. J. 1975. Primer on certain elements of medical decision making. New England Journal of Medicine 293:211-215.
Park, R. E., Fink, A., Brook, R. H., et al. 1986. Physician ratings of appropriate indications for six medical and surgical procedures. American Journal of Public Health 76:766-772.
Sacks, H. S., Berrier, J., Reitman, D., Ancona-Berk, V. A., and Chalmers, T. C. 1987. Meta-analyses of randomized controlled trials. New England Journal of Medicine 316:450-455.
Showstack, J., Rosenfeld, K., Garnick, D., et al. 1987. Association of volume with outcome of coronary artery bypass graft surgery. Scheduled vs nonscheduled operations. Journal of the American Medical Association 257:785-789.
Thacker, S. B. 1988. Meta-analysis: A quantitative approach to research integration. Journal of the American Medical Association 259:1685-1689.
Weinstein, M. C., and Stason, W. B. 1977. Foundations of cost-effectiveness analysis for health and medical practices. New England Journal of Medicine 296:716-721.
Weinstein, M. C., Fineberg, H. V., Elstein, A. S., et al. 1980. Clinical Decision Analysis. New York, N.Y.: W. B. Saunders.
Wennberg, J. E., and Gittelsohn, A. 1973. Small area variations in health care delivery. Science 182:1102-1108.
Wennberg, J. E., and Gittelsohn, A. 1982. Variations in medical care among small areas. Scientific American 246:120-134.
Winslow, C. M., Kosecoff, J. B., Chassin, M. R., Kanouse, D. E., and Brook, R. H. 1988a. The appropriateness of performing coronary artery bypass surgery. Journal of the American Medical Association 260:505-509.
Winslow, C. M., Solomon, D. H., Chassin, M. R., et al. 1988b. The appropriateness of carotid endarterectomy. New England Journal of Medicine 318:721-727.