Approaches to Assessing Value: Illustrative Examples
The rising healthcare costs in the United States in the face of global economic turmoil underscore the necessity for a health system that identifies and eliminates low-value services, minimizes inappropriate use of medical services, and responds to the explosion of costly new technologies, thus positioning value as a key cornerstone to improving the quality of care delivered in this country (Clancy, 2008; Leavitt, 2008; Paulus et al., 2008). In workshop discussions, participants repeatedly suggested that creating a system that encourages and incentivizes the delivery of high-value services relies first on creating a common approach to defining and assessing value in health care.
Emerging from the presentations and dialogue at the workshop session on approaches to assessing value was the importance of perceptions and perspectives—the meaning of value changes as the stakeholders change. L. Gregory Pawlson discusses methods of estimating the value of physicians on both individual and group levels and the importance of measuring quality, resource use, and cost. He discusses the strides made in value measurement of providers and outlines the steps necessary to expand on prior work in a manner that will allow accurate, informative, and comparative assessments of efficiency and value in health care.
Since surgical care accounts for more than 40 percent of overall spending for inpatient care (National Center for Health Statistics, 2006), developing approaches to assess and estimate the value of individual surgical and interventional procedures is paramount. Justin B. Dimick highlights two domains of surgical value: the value of surgical interventions and the value of individual providers, including both surgeons and hospitals. He discusses methods of measuring costs and outcomes in both of these domains and additionally surveys public policy options for improving value in surgery.
Howard P. Forman explores the challenges to determining the cost-effectiveness of diagnostic imaging and argues that better, more widely available, cost-effectiveness information could be an important component of stemming the growth of unnecessary imaging. David O. Meltzer focuses on the medical cost-effectiveness of preventive services and wellness approaches, concluding that prevention can be, but is not invariably, a short- or long-term cost-effective approach to improving health. Newell E. McElwee examines the issue of determining the value of pharmaceuticals, specifically discussing decision points along the pharmaceutical life cycle. He also emphasizes that the value of pharmaceuticals varies depending on the specific decision considered and the preferences of the stakeholder making that decision.
Presenters also focus on assessing the value of diagnostic tools and devices. Ronald E. Aubert proposes a framework for evaluating the potential value of pharmacogenetic diagnostics, providing a case study of how applying pharmacogenetic data to the dosing of warfarin, a blood thinner, could reduce adverse events and yield cost savings to the healthcare system. Parashar B. Patel concludes the chapter by discussing the impact of evidence requirements for medical devices on innovation and assessment of value from a device manufacturer’s perspective and the need for cross-stakeholder collaborative efforts in order to preserve incentives for innovation and discovery.
MEASURING VALUE OF AMBULATORY CARE SERVICES
L. Gregory Pawlson, M.D., M.P.H., National Committee for Quality Assurance
Measurement of value in health care is an increasingly important goal, given assessments that indicate less benefit from and higher cost for services provided in the United States versus countries of comparable wealth, as well as multiple studies pointing out apparent waste and less than desirable quality of care (Fisher et al., 2003; McGlynn et al., 2003). However, defining value, let alone measuring it, is very challenging in health care, where neither benefits provided nor resources used to create the benefits are straightforward. Although there have been a considerable number of research studies using various econometric approaches to cost and benefit determination in health care, there is as yet no standard practice for measuring value or even an agreed-upon definition of value.
Regardless of the challenge, accurate, valid, and reliable formulations of both the benefit and the cost portions of the value equation are absolutely critical to any hope of creating a “value-based” or value-driven healthcare system.
Limitations of Current Approaches
The most widely available and relatively easily accessible data sources for determination of quality and cost are so-called claims data: data on services (visits, procedures, laboratory services, and medications dispensed) provided by clinicians or others and submitted to insurers for payment. Claims data are intended to document the minimal data required for payment (most often under fee for service) and in many instances do not accurately reflect the actual services provided, the diagnoses to which the services were actually linked, or in some instances, which clinician actually provided the services. Moreover, major gaps in the completeness of claims data can seriously affect their utility in either quality or resource use-cost determinations (Pawlson et al., 2007). Careful audit procedures that look at such areas as sampling framework, completeness of data extraction, and other oversight are critical to using claims data for resource use-cost purposes. To provide valid and reliable information for most quality measures, intensive effort is required to abstract information from existing paper medical records. While the increasing use of electronic medical records (EMRs) may ameliorate this issue to some degree, many current EMRs lack the documentation and search capabilities that are crucial for their potential use in quality measurement. This gap is largely due to the fact that many EMRs were designed to mirror billing systems or paper records, not to facilitate systematic data collection on clinical care. Surveys, although a critical source of information on some aspects of patient experience of care, are of necessity based on patient recall and interpretation of events and, because of this, provide limited information in some instances.
Even where reliable and valid measures exist, limitations of the data are also reflected in the narrow breadth of available quality measurements. Until recently, except for a short-lived effort by the Centers for Medicare and Medicaid Services (CMS; the Health Care Financing Administration at the time) to generate national standardized comparison data from hospitals on coronary artery bypass graft surgeries, there were very few widely available standardized comparison data at any level (physician, group, hospital, or health plan) beyond regional or national comparisons. The Healthcare Effectiveness Data and Information Set (HEDIS) is one example of a widely available standardized set of comparison data, but it is available only at the health plan level. Moreover, since the development of HEDIS was driven primarily by consumers and purchasers concerned about the potential negative impact of health plans and capitation on quality through underuse of services (e.g., not providing screening for breast cancer), HEDIS measures, until recently, were focused almost exclusively on problems of underuse at the plan level.
There has been substantial recent effort by CMS and others to extend publicly available quality measurement to other levels of the system. In some instances these efforts have been accompanied by calls (and, in some cases, funding) for the development of a broader range of clinical structure, process, and outcome measures of quality and measures related to overuse, misuse, resource use-cost, and patient experiences of care. However, we are still far behind where we need to be to assess value broadly in the healthcare system. Moreover, creating measures in areas such as overuse, appropriate use, misuse, and resource use-cost is proving to be very challenging. Consider that Brook and colleagues demonstrated in the 1980s that much care cannot be categorized definitively as appropriate or inappropriate and little correlation exists between rates of inappropriate care and service utilization in a given region (Chassin et al., 1987; Park et al., 1986). David Eddy has noted that structural issues related to the nature of clinical medicine, such as relative rarity of key outcomes, remote times between interventions and outcomes, the heterogeneity of practice populations of different physicians, and inherent uncertainty in disease outcomes, pose major barriers to measurement, especially at more granular levels of the system such as hospitals or physician office practices (Eddy, 1998). While risk adjustment offers some hope of adjusting for some of the differences created by these factors, there is broad consensus that current risk adjustment approaches are far from ideal or adequate. Finally, research looking at the relationship between relative quality achieved and relative resources used has shown that the relationship is complex. 
Fisher and colleagues found that despite the use of 60 percent more care for hospitalizations, specialist care, and major tests in the last six months of life of Medicare patients in high-cost regions of the United States, the quality of care in high-cost regions appears actually to be lower (Fisher et al., 2003). Our own research has suggested small but significant negative correlations between higher quality and lower resource use for inpatient hospitalizations and positive correlations between higher quality and higher resource use for medications at the health plan level (O’Connor et al., 2008).
Measuring Value at What Level of Care?
The National Academy of Engineering and Institute of Medicine (2005) report Building a Better Delivery System: A New Engineering/Health Care Partnership described multiple levels of the healthcare system, ranging from the patient to the environment (defined as entities such as insurers or regulators that do not deliver health care directly but influence the care delivered). In ambulatory care, quality could be examined at the individual physician, group, integrated delivery system, regional, or national level. Measurement at the individual physician level is appealing from the standpoint of accountability and “actionability.” Moreover, if information generated from a given physician’s patient chart is used, there is no problem with relating a given action to a specific patient and physician. However, both quality and costs are often “generated” at a higher level of the system. For example, many, if not most, patients with multiple chronic conditions interact with a substantial number of clinicians over the course of a single year. These multiple interactions represent a web of health care that cannot be captured by examining individual physician-patient interactions in a group of patients. Attribution of clinical measurement and cost to a single clinician is also problematic because much of the variance in costs or quality does not appear to reside at the level of the individual physician (O’Connor et al., 2008). This, coupled with the inherently wide variation in resource use-cost, especially where inpatient or surgical-procedure use is involved, the aforementioned heterogeneity of patients among practices, and the relatively small numbers of patients with a given condition in an individual physician practice, places severe limits on measurement, especially for public reporting or accountability at the individual physician or even the small-group level.
The National Committee for Quality Assurance (NCQA), as promulgated in its Physician-Hospital Quality reporting standards, has indicated, based on a number of studies within and external to NCQA, that for quality measures, at least 30 patients are needed to obtain a reasonably low probability of misclassification, but adherence to the more stringent criterion of a 90 percent confidence interval or a reliability coefficient of 0.7 is highly desirable (Scholle et al., 2008). For resource use-cost measures, given widely variable confidence intervals depending on the specific resource use category and disease, there is no defined minimal sample size; thus, only a calculated confidence interval (CI) of 0.9 or greater or a reliability coefficient of 0.7 would be acceptable. Indeed, to achieve a calculated CI of 0.9 appears to require a sample size of more than 100 patients for even the most reliable resource use measures, and in most instances the number required exceeds 500. Thus, while physician-level measurement may provide important feedback for individual practitioners, data derived from small sample sizes cannot reliably be generalized to the practice of medical care at the broader system level.
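To illustrate why the required sample size differs so much from one measure to another, the following minimal sketch uses the Spearman-Brown relationship, under which the reliability of a provider-level average based on n patients is n × ICC / (1 + (n − 1) × ICC). Here ICC denotes the share of total variance that reflects true differences between providers. This is an illustrative assumption, not NCQA's published method, and the function names and ICC values are hypothetical.

```python
import math

def reliability(n_patients: int, icc: float) -> float:
    """Spearman-Brown reliability of a provider-level mean based on n patients.

    icc is the (hypothetical) share of total variance that reflects true
    differences between providers rather than patient-level noise.
    """
    return n_patients * icc / (1 + (n_patients - 1) * icc)

def min_patients(target: float, icc: float) -> int:
    """Smallest number of patients whose reliability reaches the target."""
    return max(1, math.ceil(target * (1 - icc) / (icc * (1 - target))))

# Hypothetical ICC values, chosen only for illustration.
for label, icc in [("stronger provider signal", 0.08),
                   ("weaker provider signal", 0.02),
                   ("very weak provider signal", 0.005)]:
    n = min_patients(0.7, icc)
    print(f"{label}: ICC={icc}, patients needed for reliability 0.7: {n}")
```

Under these assumed values, the threshold of 0.7 is reached with 27, 115, and 465 patients, respectively. The pattern, though not the specific numbers, mirrors the point made above: the noisier the measure relative to true provider differences, the larger the panel needed.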
To overcome these problems with sample size requirements and misclassification, both quality and resource use related to accountability should most often be measured at some level higher than the individual physician, such as the group or integrated delivery system (contractual or virtual) level. By examining clinical care patterns and use from the organizational level of physician practice groups, much richer information about the relationships between quality and care emerges, especially for patients with multiple chronic conditions. System-level measurements also promote a sense of shared accountability for healthcare costs and outcomes. Within a system, data on individual physician performance, although not sufficiently robust for public reporting, can serve as the basis for feedback and discussion of performance. Because there are relatively few functionally integrated health delivery systems that can facilitate these system-level assessments of value, research is critically needed to explore how to create virtually determined delivery systems and assign individual clinicians to them (on the basis of hospital use, referrals to other physicians, etc.).
Moving to Measurable Clinical Efficiency
The concept of “measurable clinical efficiency” addresses using a set of quality measures as a proxy for benefit and a set of resource use measures as a proxy for the cost function (Table 3-1).
As illustrated in Table 3-1, such value assessments would include measurements of misuse, overuse, and underuse in evaluating the quality function and use of various types of resources for the resource use-cost function. Resource use in this respect can be measured either by using disease- or condition-specific claims or defined episodes delineated by “clean claims periods” and sorting costs exclusively into those episode groups, or by looking at total costs for all services for a defined group of patients for a defined period of time. Either actual (defined by claims paid or allowable charges) or standardized prices can be used, since both have their advantages and disadvantages. All of these approaches imply looking at both quality and cost over time and across different entities, rather than in a single place at a single time as with most current quality measurement.
TABLE 3-1 Measurable Clinical Efficiency: Measures of Quality and Their Associated Outcomes
Measures of quality of care | Measures of resource use-cost
Underuse: needed services not provided |
Appropriate use: provision of needed services | Excess cost-use for appropriate care
Overuse: provision of unnecessary services | Cost of overuse
Misuse: provision of potentially harmful services | Cost of misuse
Clinical outcomes and patient experience | Aggregate cost-relative resource use
Measurable clinical efficiency can be reported for improvement or accountability purposes by combining composites of quality with resource use-cost measures in the same population of patients. The composites can be displayed in various combinations (ratios, scatter plots, relative “star” ratings, etc.). As noted before, the choice of the level of the healthcare system (e.g., individual clinicians, sites, groups, integrated delivery systems, health plans) at which to attribute measures of quality and resource use involves important trade-offs. Finally, further research to explore the relationships between quality and cost and what elements of the system have an impact on these measures is critical, as is continuing to set reasonable rules and standards for fairness and accuracy of measurement.
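As a rough illustration of how a quality composite and a resource use measure might be combined for the same patient population, the sketch below averages a few quality measure rates, expresses cost as a ratio of observed to expected spending, and sorts the entity into one of four categories. The equal weighting, the benchmark, the function names, and all figures are hypothetical assumptions made for this example. They are not a description of any organization's actual method.

```python
from statistics import mean

def quality_composite(measure_rates):
    """Equal-weighted average of quality measure rates (each between 0 and 1)."""
    return mean(measure_rates)

def relative_resource_use(observed_cost, expected_cost):
    """Observed cost divided by expected cost; 1.0 means spending as expected."""
    return observed_cost / expected_cost

def efficiency_category(quality, quality_benchmark, resource_ratio):
    """Place an entity in one of four quality/resource-use categories."""
    quality_label = "higher quality" if quality >= quality_benchmark else "lower quality"
    cost_label = "lower resource use" if resource_ratio <= 1.0 else "higher resource use"
    return f"{quality_label}, {cost_label}"

# Hypothetical medical group: three quality measure rates and total costs.
q = quality_composite([0.82, 0.74, 0.90])
r = relative_resource_use(observed_cost=4_600_000, expected_cost=5_000_000)
print(efficiency_category(q, quality_benchmark=0.80, resource_ratio=r))
```

The same two numbers could equally be plotted as a point on a scatter plot or reduced to a ratio. The four-category display is only one of the options mentioned above.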
Limited transparency and problems with reliability of measurement hinder resource use-cost and quality measurements, and current tools provide only a starting point for combining these areas to determine value. Further research and development is critical to produce reliable and valid measures of appropriateness of care, additional measures of overuse and misuse of clinical care, and measures of resource use. Consideration must also be given to the development of measures of clinical outcomes at group, network, and plan levels. “Composite” measures incorporating clinical performance and intermediate outcomes in quality and resource use measures at multiple system levels need to be developed to allow comparative assessment of efficiency and value. As electronic medical records evolve and their capacity expands, attention should be paid to the types of data needed to assess the aspects of care related to value. Only with concerted and sustained attention to these interim steps can actual value in health care be measured and used to improve quality and reduce waste in our healthcare system.
ASSESSING THE VALUE OF SURGICAL CARE
Justin B. Dimick, M.D., M.P.H., and John D. Birkmeyer, M.D., University of Michigan
Surgery accounts for a large proportion of healthcare services in the United States. The number of patients undergoing inpatient surgery doubled from 2000 through 2006 (from 23 million to 46 million) (National Center for Health Statistics, 2006). Surgical care also comprises a major component of healthcare expenditures, exceeding 40 percent of overall spending for inpatient care (National Center for Health Statistics, 2006). With healthcare costs skyrocketing, any effort to curtail their growth will have to include surgical care. Payers and purchasers also increasingly recognize that costs must be controlled without sacrificing quality. Consequently, their focus has shifted to optimizing value, rather than considering quality or costs in isolation.
When assessing the value of surgical care, there are two perspectives to consider. The first perspective, the value of surgical interventions, considers the value of surgery, relative to other approaches, for treating specific conditions. Often referred to as “technology assessment,” this perspective uses the tools of evidence-based medicine to evaluate the effectiveness and cost-effectiveness of new interventions. Identifying and eliminating surgical services of no value (waste) or low value will reduce healthcare spending without impacting quality.
Motivated by the widespread variations in outcomes and costs across providers, the second perspective assesses the value of specific surgical providers relative to others. Value assessment in this context, provider profiling, is particularly timely and is the focus of several public reporting and value-based purchasing efforts. Value can be optimized by directing patients to the highest-value hospitals and surgeons—those that provide high-quality, efficient health care.
Assessing the value of surgical care is challenging. This paper surveys existing tools—what we have—and tools on the immediate horizon—what we need—for value assessment in surgery. Within each perspective, we consider the evaluation of two key domains: outcomes and cost. We close by considering policy approaches for using the tools discussed to improve value in the context of surgical care.
Assessing the Value of New Surgical Interventions
The last decade has seen explosive growth in new medical technology. While this trend is pervasive in medicine, it is disproportionately focused in procedural specialties, especially surgery. There are new surgical procedures for conditions that were previously not treated. For example, bariatric surgery for morbid obesity has increased tenfold over the past decade and is now the second most common abdominal operation in the United States (Santry et al., 2005). There are also new, less invasive procedures that replace existing surgical procedures. For example, endovascular repair of aortic aneurysms has largely replaced the conventional, open surgical procedure (Schermerhorn et al., 2008).
New technology is an important driver of healthcare cost growth (Baker et al., 2003; Fuchs, 1999). There is general consensus among economists, policy makers, and healthcare purchasers that the introduction of new surgical procedures, pharmaceuticals, and diagnostic imaging increases healthcare spending. However, there is extensive debate regarding the value of this new technology: whether the benefits are worth the costs (Cutler and McClellan, 2001). Understanding the value of new surgical interventions requires an evaluation of both outcomes and costs.
Comparing the effectiveness of new surgical interventions is traditionally the domain of evidence-based medicine. Principles of evidence-based medicine are central to the assessment of the value of novel therapies, including pharmaceuticals, medical devices, and surgical procedures. The goal of this assessment is to understand the comparative effectiveness and cost-effectiveness of new interventions. Because these tools are no different for surgery than for other new technologies or interventions and are considered elsewhere in this report, we consider them only briefly here.
Comparative effectiveness is evaluated by critical examination of randomized clinical trials and observational studies. The goal of these studies is to quantify the net benefit, in terms of healthcare outcomes, of the new surgical intervention compared to the next best alternative. Randomized trials, which minimize baseline differences in comparison groups, are widely considered the gold standard for evaluating new interventions. However, observational studies are important for two reasons. First, observational studies are sometimes the only option. In many situations, randomized trials are not feasible due to expense or a lack of clinical equipoise. Second, observational studies, particularly if they are population based, provide an estimate of the effectiveness of an intervention in the “real world.” In contrast, randomized trials provide evidence of efficacy in a narrow, carefully selected subpopulation.
Carotid endarterectomy, a procedure to prevent stroke, is one example in which observational studies made a contribution beyond the randomized trials. For this procedure, randomized clinical trials and population-based studies yielded very different estimates of surgical risk. Using the national Medicare population, Wennberg and colleagues demonstrated that outcomes after carotid endarterectomy were much better in hospitals that participated in the randomized trials compared to other, lower-volume facilities (Wennberg et al., 1998). Because surgical decisions are made by weighing the risks versus the benefits of the procedure, these population-based estimates of surgical outcomes are necessary to guide decision making and to understand the value of a surgical intervention in the real world.
Although the basic tools for evaluating comparative effectiveness exist, several challenges must be overcome. First, we need to address the paucity of evidence evaluating new interventions (Phillips, 2008). In the United States, we have an undeveloped infrastructure for evaluating evidence. For primary evidence of benefit, we often rely on trials initiated by investigators or industry. For synthesis studies, such as meta-analysis, we rely on networks of volunteers, such as the Cochrane Collaboration. A national infrastructure for setting priorities and funding studies is a necessary first step in filling the evidence void. The second challenge we need to address is the rapid uptake of unproven surgery. New surgical techniques often become widespread prior to good evidence of their benefit. This premature diffusion may be due to the lack of regulatory oversight of surgical techniques and devices. There is currently no gatekeeper, analogous to the Food and Drug Administration (FDA), to prevent new surgical technologies from being adopted prior to good evidence of their benefit.
Strengthening the link between evidence and insurance coverage would also help slow the premature adoption of new technology. Currently, we rely on individual payers to evaluate and make coverage decisions on most interventions. The Medicare Evidence Development & Coverage Advisory Committee (MEDCAC) was recently created to advise the Centers for Medicare and Medicaid Services on national coverage decisions (Holloway et al., 1999). While this effort is no doubt a good start and provides a framework on which to build, it currently evaluates a small fraction of new interventions.
The costs of new interventions must also be considered, and always in the context of their clinical benefit. While some new interventions are actually cost-saving, most result in an incremental increase in healthcare costs. Cost-effectiveness analysis is a formal method for integrating evidence of benefit with information on costs. The cost-effectiveness of a new intervention is evaluated as the incremental cost divided by the incremental benefit, relative to the next-best alternative. Most often, cost-effectiveness is evaluated using decision analytic techniques and reported as the cost (in dollars) per quality-adjusted life-year (QALY). The best evidence regarding effectiveness, as described in the section above, is used in the denominator. The incremental cost (the numerator) is often the most challenging estimate to obtain. Good estimates of intervention costs must be made from the societal perspective; these often include the costs of the intervention itself, other healthcare costs, and indirect costs to society (e.g., time lost from work).
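Because the result is reported in dollars per QALY, the ratio places the incremental cost over the incremental benefit. The following minimal sketch shows the arithmetic for a single comparison. The function name and all cost and QALY figures are hypothetical and are not drawn from any study cited here.

```python
def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost-effectiveness ratio in dollars per QALY gained.

    Compares a new intervention with the next-best alternative.
    """
    delta_cost = cost_new - cost_old   # incremental cost (numerator)
    delta_qaly = qaly_new - qaly_old   # incremental benefit (denominator)
    if delta_qaly == 0:
        raise ValueError("no incremental benefit; the ratio is undefined")
    return delta_cost / delta_qaly

# Hypothetical example: the new procedure costs $10,000 more per patient
# and yields 0.25 additional quality-adjusted life-years.
ratio = icer(cost_new=30_000, cost_old=20_000, qaly_new=8.5, qaly_old=8.25)
print(f"${ratio:,.0f} per QALY")
```

With these assumed inputs the ratio is $40,000 per QALY. A negative ratio needs separate interpretation, since it can mean either that the new intervention saves money while improving outcomes or that it costs more while doing worse.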
Although the tools of cost-effectiveness are also well developed, there are still important challenges to overcome. First, we must address the inconsistent application of cost-effectiveness methods. For example, a recent review of studies focusing on the cost-effectiveness of carotid endarterectomy found tremendous differences across studies. Divergent conclusions of cost-effectiveness were reported from studies that addressed the same questions and used similar inputs (Holloway et al., 1999). For an asymptomatic patient, the reported results varied from 1.8 months at a cost of $52,700 per QALY to 3 months at a cost of $8,004 per QALY. Until this problem is addressed, critics will continually point to the inconsistent results of cost-effectiveness studies.
Although we have the necessary tools to evaluate effectiveness and cost-effectiveness, we clearly need a central organization for applying them in a uniform manner. There are obvious precedents. For example, the National Institute for Health and Clinical Excellence (NICE) was established as a part of the British National Health Service in 1999 (Pearson and Rawlins, 2005). NICE was created to set standards for the adoption of new healthcare technologies and explicitly take into account both clinical effectiveness and cost-effectiveness. Some advocate the creation of a similar organization in the United States. With the creation of such an organization, we would make the necessary first step toward improving the value of surgery by identifying and potentially reducing the use of surgical services with small (or expensive) marginal benefits.
Assessing the Value of Hospitals and Surgeons
The second perspective to consider is the value of surgical providers: surgeons and hospitals. Motivated by the widespread variations in use, quality, and costs across surgical providers, this perspective is particularly timely and is the focus of several public reporting and value-based purchasing efforts.
Empirical data from numerous sources reveal widespread variations in morbidity and mortality after surgery. Recent data from the 123 hospitals participating in the National Surgical Quality Improvement Program show that morbidity rates after colon surgery range from 3 to 23 percent, even after adjusting for differences in patients’ baseline risk (Figure 3-1). Knowledge of these variations has led to an unprecedented number of efforts aimed at measuring surgical quality. Unfortunately, these efforts are hindered by a lack of good measures.
The measures we currently have—individual quality indicators—are severely limited (Birkmeyer et al., 2004). Hospital morbidity and mortality rates are often too “noisy” due to the small number of cases performed at individual hospitals (Dimick et al., 2004). Hospital volume, widely used in surgery, is an imperfect proxy for individual provider performance. Process measures, widely used for measuring the quality of medical diagnoses, are not as useful in surgery. Unfortunately, processes that are strongly related to outcomes (i.e., high leverage) are not known for most surgical procedures (Hawn et al., 2008). Finally, with the growing number of measures currently used, it is difficult to know how to interpret multiple, conflicting quality indicators (O’Brien et al., 2007).
With these limitations of individual measures, we need a better approach for assessing surgical quality. Composite measures, which combine multiple individual indicators, can overcome many of these limitations (AHRQ, 2006; O’Brien et al., 2007; Staiger et al., 2009). By pooling multiple measures, they become less “noisy” and provide more reliable estimates of hospital performance. Composite measures also address the problem of multiple competing or conflicting quality indicators. They provide a single, easy-to-interpret assessment of global quality. One challenge with composite measures is to optimally weight the input measures. The most common approach to weighting measures is to provide equal weight or rely on expert opinion.
However, there is a growing trend toward the empirical weighting of input measures. With this technique, each of the inputs is weighted according to how reliably it is measured and how closely it relates to a gold standard quality measure. Staiger and colleagues recently published the methods for creating these measures using aortic valve replacement (Staiger et al., 2009). They found that a composite measure of risk-adjusted mortality and hospital volume with aortic valve replacement combined with risk-adjusted mortality for other cardiac procedures explained 70 percent of the hospital-level variation in mortality and was better at predicting future performance than any individual measure (Figure 3-2).
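The idea of weighting an input by how reliably it is measured can be sketched with a simple shrinkage estimator: a hospital's observed mortality rate is blended with a rate predicted from other information (for example, from hospitals of similar volume), and the blend leans on the observed rate only to the extent that the caseload makes it reliable. This is a much simplified illustration under stated assumptions, not the published method of Staiger and colleagues. The function name, the assumed variance of true hospital rates, and all inputs are hypothetical.

```python
def composite_mortality(deaths, cases, predicted_rate, signal_var):
    """Blend observed and predicted mortality using a reliability weight.

    predicted_rate: rate expected from other inputs (hypothetical here).
    signal_var: assumed variance of true mortality rates across hospitals.
    Returns (composite_rate, weight_on_observed).
    """
    observed = deaths / cases
    # Approximate sampling (noise) variance of an observed proportion.
    noise_var = predicted_rate * (1 - predicted_rate) / cases
    weight = signal_var / (signal_var + noise_var)
    return weight * observed + (1 - weight) * predicted_rate, weight

# Hypothetical hospitals: a small caseload and a large caseload.
small, w_small = composite_mortality(deaths=2, cases=20,
                                     predicted_rate=0.04, signal_var=0.0004)
large, w_large = composite_mortality(deaths=20, cases=400,
                                     predicted_rate=0.03, signal_var=0.0004)
print(f"small hospital: weight on observed rate {w_small:.2f}, composite {small:.3f}")
print(f"large hospital: weight on observed rate {w_large:.2f}, composite {large:.3f}")
```

In this sketch the low-volume hospital's own rate receives little weight because it is noisy, while the high-volume hospital's rate dominates its composite.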
Assuming the perspective of a healthcare payer, such as Medicare, the costs of surgical care are a function of price per case and the number of procedures performed. Price, the payment for the episode, varies to some extent. Unfortunately, the tools we have to measure hospital resource use (length of stay and charges) are not useful for profiling providers. For most operations, the efficient use of resources is already incentivized due to bundled payments for physicians and hospitals (e.g., prospective hospital payment for Medicare and most private payers). For example, Medicare payments for coronary artery bypass surgery vary only 13 percent (from $31,554 to $35,656) from hospitals in the lowest quartile to the highest quartile of resource use (Hackbarth et al., 2008). In contrast, there is much more variation in payment from the top to bottom quartile for readmissions (200 percent) and postdischarge care (110 percent). This is not surprising given the potential sources of increased costs across different phases of the surgical episode (Table 3-2). Payments for each phase of care depend on both practice style and the quality of care.
What we need to profile provider efficiency adequately are measures that estimate resource use for the entire episode of surgical care: preoperative, perioperative, and postdischarge. The data and methods for creating such measures already exist. As a starting point, payment data from Medicare could be used. This would require using inpatient, physician, and outpatient files. The first step, and perhaps the most challenging, would be to use claims data to empirically define the surgical episode, either using a defined interval (30, 60, or 90 days) or identifying a natural cutoff where claims drop back to the presurgical “baseline.” Once these episodes are defined, hospitals could be profiled on the total payments during all phases of care.
TABLE 3-2 Sources of Variation in the Cost of Surgery for Each Phase of the Surgical Episode
Examples of Practice Style-Related Excess Costs | Phases of Surgical Episodes and Payment Types | Examples of Quality-Related Excess Costs
Excessive rates of discretionary procedures | Initial decision making (decision to operate) |
Unnecessary consultations, testing or imaging (higher unbundled payments) | Perioperative period (preoperative testing, procedure, and immediate postoperative care) | Complications resulting in “bumping” of diagnosis-related group level or outlier status
Excessive cost shifting (from inpatient stay to ancillary services) | Postdischarge period (medical services after initial discharge) | Complications resulting in more hospital, physician, and ancillary services after discharge
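The two ways of bounding a surgical episode described above, a fixed interval or a natural cutoff, can be sketched as follows. This is a minimal illustration under stated assumptions: payments are already summed per day relative to the date of surgery, "back to baseline" is taken to mean a trailing average within a tolerance of the presurgical daily mean, and the function names, window, and tolerance are hypothetical choices rather than an established method.

```python
def episode_end_day(daily_payments, baseline, window=7, tolerance=1.1):
    """Return the first postoperative day at which the trailing `window`-day
    mean payment is at or below `tolerance` times the presurgical baseline.

    daily_payments[i] is the total payment on day i (day 0 = day of surgery).
    Returns None if payments never return to baseline in the observed data.
    """
    for day in range(window - 1, len(daily_payments)):
        recent = daily_payments[day - window + 1 : day + 1]
        if sum(recent) / window <= tolerance * baseline:
            return day
    return None


def episode_payment(daily_payments, end_day):
    """Total payments from the day of surgery through end_day, inclusive."""
    return sum(daily_payments[: end_day + 1])


# Hypothetical daily payments for one patient (not real claims data).
daily = [1000, 500, 500, 200, 100, 50, 10, 10, 10, 10, 10, 10, 10]

natural_end = episode_end_day(daily, baseline=10, window=3)
natural_total = episode_payment(daily, natural_end)
fixed_total = episode_payment(daily, 5)  # fixed-interval alternative
```

A fixed interval is easier to apply uniformly across procedures, whereas the empirical cutoff adapts to procedures with long recovery periods; profiling would then compare hospitals on the resulting episode totals.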
While the price per case is important, the number of procedures performed is likely a much more important driver of total spending on surgical services. As with outcomes, empirical data show wide variations in the use of surgery. Although decades of research show geographic variations in the use of surgery, this body of work has only recently moved into the mainstream. For example, a New York Times interactive feature (currently available on its website) provides data on the use of heart bypass, knee replacement, and mastectomy across the United States (data are provided by the Dartmouth Atlas of Health Care) (New York Times, 2007). For all three procedures, the use of surgery varies dramatically across regions; with heart bypass, rates of surgery vary fivefold, from 1.9 to 9.5 per 1,000 Medicare beneficiaries.
Despite the growing awareness of these variations in the use of surgery, very little has been done to address them. Unfortunately, the existing tools for measuring utilization have problems that limit their widespread use. One approach, used in the New York Times feature, was pioneered by John Wennberg and the Dartmouth Atlas working group. The Dartmouth Atlas reports regional rates of utilization for each Hospital Referral Region (HRR) in the United States (Dartmouth University). These regions are determined empirically, based on where patients receive complex surgery (i.e., cardiac surgery, neurosurgery), and often include multiple large hospitals within each HRR. As a result, this unit of measurement is much too broad to foster accountability. Simply put, individual hospitals or healthcare systems cannot be held responsible for the use of surgery in the entire region (Fisher et al., 2007).
The appropriate level of analysis—one that could be held accountable— would include only one hospital system. Recently, Fisher and colleagues have developed a novel unit of analysis for this purpose, the physician-hospital network (PHN) (Fisher et al., 2007). Each PHN is made up of a hospital and its extended physician medical staff. PHNs are created by assigning each patient to a primary physician and then assigning each physician to a hospital. Thus, each PHN is a virtual network of physicians clustered around a central hospital. Preliminary data reveal large variations in the use of surgery across PHNs. For example, the use of hip replacement surgery varies threefold across the largest 20 PHNs in the Medicare population (Figure 3-3).
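The two-step assignment that defines a PHN can be sketched in a few lines. The sketch assumes a plurality rule at each step (a patient goes to the physician seen most often, a physician to the hospital used most often); that rule, the function names, and the toy records are assumptions for illustration, and the actual assignment rules of Fisher and colleagues may differ.

```python
from collections import Counter, defaultdict


def plurality(pairs):
    """Map each key to the value it is paired with most often."""
    counts = defaultdict(Counter)
    for key, value in pairs:
        counts[key][value] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def build_phns(visits, admissions):
    """Group patients into physician-hospital networks.

    visits: iterable of (patient, physician) pairs, one per visit.
    admissions: iterable of (physician, hospital) pairs, one per admission.
    Returns a dict mapping each hospital to its set of assigned patients.
    """
    patient_to_physician = plurality(visits)
    physician_to_hospital = plurality(admissions)
    phns = defaultdict(set)
    for patient, physician in patient_to_physician.items():
        hospital = physician_to_hospital.get(physician)
        if hospital is not None:  # skip physicians with no hospital activity
            phns[hospital].add(patient)
    return dict(phns)


# Hypothetical records (not real data).
visits = [("p1", "drA"), ("p1", "drA"), ("p1", "drB"), ("p2", "drB")]
admissions = [("drA", "H1"), ("drB", "H2"), ("drB", "H2"), ("drB", "H1")]

phns = build_phns(visits, admissions)
```

Once patients are grouped this way, a procedure rate for each PHN is the number of procedures among its assigned patients divided by the size of that patient set.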
Rates of surgery in each PHN could be used to improve the value of surgery in several ways. First, public reporting of PHN rates of surgery would allow patients to understand the “aggressiveness” profile of their hospital. Patients who are offered surgery in an aggressive system could seek a second opinion in a neighboring PHN that is more conservative.
Second, PHNs with high rates could be audited for appropriateness, ensuring that surgery is not being overused in these systems. This approach would incorporate both tools—appropriateness criteria and regional rates of utilization—in an efficient and meaningful way. Finally, hospitals and PHNs with high rates of specific procedures would think twice about hiring another practitioner, which would limit capacity and reduce supply-induced demand.
Unfortunately, even if the right unit of measurement is used, there remain considerable challenges in understanding existing variations in the use of surgery. Specifically, it is hard to know how much surgery should be performed in a given population, that is, which rate is right. One approach, measuring appropriateness of care, is based on the assumption that variations are driven by the inappropriate use of surgery. Pioneered by Robert Brook, measuring appropriateness involves identifying a set of criteria that include all possible clinical indications for a procedure (Brook et al., 1990). There are several reasons why this approach will not help explain existing variations. First, creating appropriateness criteria for every procedure is a daunting task. Second, even if resources could be marshaled, many clinicians disagree about clinical appropriateness, especially physicians from different specialties (Kahan et al., 1996). Finally, empirical data suggest that regions with high rates of surgery do not necessarily provide more inappropriate care (Leape et al., 1990).
Dealing with this problem is no simple task. Most decisions to perform surgery are in the middle of the spectrum, somewhere between frankly inappropriate and clearly indicated. Empirical evidence indicates that shared decision making between patients and physicians results in lower rates of discretionary surgery (Dartmouth University, 2009). To address these variations, we therefore need effective strategies for incorporating patient preferences and evidence into decision making. Ensuring that patients, rather than surgeons, make the decision to proceed with discretionary surgery will clearly improve value.
Policy Approaches to Improving Value
The measurement tools discussed above will have to be translated into policy to improve the value of surgical care. The policy remedy for eliminating or minimizing low-value surgical services depends on the perspective. When considering surgical interventions, the leading policy remedy is value-based insurance design (Chernew et al., 2007). This approach is considered in detail elsewhere in this report. In brief, value-based insurance design makes the patient pay more out of pocket for less valuable services. For example, healthcare interventions considered “high value” are free (e.g., diabetes medications and supplies), whereas “low-value” interventions would require a high copayment. This type of benefit design has been shown to encourage the use of high-value services. Although mostly applied to pharmaceuticals thus far, this approach could also be applied to surgical interventions. Of course, addressing the challenges of assessing the value of new surgical interventions, discussed earlier, will be necessary before value-based insurance design can be applied to surgery.
When considering low-value surgical providers, the most promising policy solution is value-based purchasing. Value-based purchasing is a general term encompassing several different mechanisms for realigning provider incentives to reward higher-quality and/or lower-cost health care, including pay-for-performance, tiered copayments, and others. These payment mechanisms are considered in detail elsewhere in this report. With pay-for-performance, physicians are given a bonus payment for meeting certain quality benchmarks, usually adherence to evidence-based processes of care (Rosenthal et al., 2007). With tiered copayments, patients would pay less to obtain care from high-quality providers and pay more to obtain care from low-quality providers. While these efforts are gaining momentum and are applied by both private and public payers, better measures of outcomes, cost, and utilization are needed before they can reach their full potential.
INFORMATION FLOW IN DIAGNOSTIC IMAGING: CONSUMER, CLINICIAN, FACILITY, PAYER? WHY IMAGING VALUE IS DIFFICULT TO MEASURE
Howard P. Forman, M.D., M.B.A., Yale University, and Frank Levy, Ph.D., Massachusetts Institute of Technology
With national healthcare expenditures at an all-time high, public and private payers are increasingly looking at component spending to quantify relative value in order to improve the efficiency of spending and, ultimately, to improve health at any given spending level. Spending for diagnostic imaging (DI)1 is already a substantial component of total spending and is growing rapidly, thus consuming ever-larger pieces of the total (Baker et al., 2008; Government Accountability Office, 2008). In an environment where insurance design has taken center stage in efforts to “rationalize” the spending on health care from public and private sources, the limited information and various incentives that underlie the decision to order an image are important areas for study. Cost-effectiveness and comparative effectiveness analysis have been suggested as a necessary first step in this direction.
Before even considering such evaluation, one must be cognizant that such analyses can have impact only if payers are directly or indirectly capable of using such information. In the private insurance market, this is certainly within the realm of possibility, as radiology benefit management (RBM) companies have come to play a major role. In the Medicare market, such direct intervention has only recently been contemplated, but not implemented on any scale (Government Accountability Office, 2008). In the long run, however, these provider-based solutions will be most effective if they can also use cost-effectiveness information gleaned to reshape patient expectations. If quality and value are in fact measurable, the consumer may rationally be expected to play a role in reducing low-value spending and, perhaps, reducing overall spending.
In this paper, we begin by exploring the current state of, and challenges to, imaging cost-effectiveness analysis. In the second section, we explore the factors that underlie a decision to order an image, emphasizing the reasons why a rational decision model may fail in the real world. Finally, we suggest avenues that might be pursued to ameliorate these market failures.
Technology Assessments in Radiology
Some of the earliest seminal work in cost-effectiveness analysis focused on imaging and screening, with early studies of lung cancer and breast cancer screening dominating the dramatic increase in available technologies (Blackmore and Magid, 1997; Fineberg et al., 1977; Shapiro et al., 1966; Taylor et al., 1981). During the 1980s and 1990s, formal trials evaluating the effectiveness of imaging in screening and diagnostic utility were begun, some with public sector funding. Subsequently, studies of clinical utility have flourished with some notable limitations, generally related to the inability to connect imaging directly to outcome (Blackmore and Magid, 1997; Hollingworth, 2005; Singer and Applegate, 2001).
In the past few years, several papers and a book have been published summarizing and evaluating the existing cost-effectiveness and value-based imaging literature (Eddy, 2006; Hollingworth, 2005; Hunink, 2008; Otero et al., 2008). Their findings suggest limitations in the existing literature as well as practical explanations for why imaging may be less amenable to traditional studies. In the foreword to the book Evidence-Based Imaging, Hillman states, “Despite our best intentions, most of what constitutes modern medical imaging practice is based on habit, anecdotes, and scientific writings that are too often fraught with biases,” a point we return to below. Even in Blackmore and Medina’s book, the majority of clinical applications appear to have limited or insufficient evidence to truly inform decision making, and it is rare to find an indication for which “strong evidence” is present.
In their review and meta-analysis of cost-effectiveness analysis in medical imaging, Otero and colleagues (2008) note that there has been an increase in the number of analyses over the last decade but not in analytical quality. They go on to describe and reference (Singer and Applegate, 2001) the reasons why cost-effectiveness analysis in radiology may be more difficult: (1) imaging technologies evolve more rapidly than the ability to gather clinical evidence supporting their use and (2) sufficient data cannot be accumulated prior to widespread adoption.
In her accompanying editorial to Otero and colleagues, Hunink (2008) reviews the history of imaging cost-effectiveness research and raises additional cautions. She points out the variation between different experts in assumptions (including the dramatic variation in discount rates as used by UK and U.S. policy boards). Further, she notes the exclusion of increasing longevity from total costs, despite the impact this may have from the societal perspective. She amplifies the concern that while DI cost-effectiveness analyses have increased in number, they have not kept pace with other disciplines in methodological improvements in quality.
Ultimately, the evidence for imaging cost-effectiveness (and, presumably, value) is best in the category of breast imaging and generally poorer in other areas (Eddy, 2006). While numerous investigators have performed studies targeted at neuroimaging, musculoskeletal imaging, and cardiac imaging (among others), one major limitation has been the relatively narrow indications that are studied versus the application in clinical practice.
Ideal Versus the Reality—Why an Image Is Ordered
In a rational choice framework, the image ordering decision would be based on a social cost-effectiveness analysis that compares the cost of the image to the expected value of improvements in patient health that the image produces. The actual ordering decision falls short of this ideal for at least four reasons.
As noted above, the necessary relevant cost-effectiveness information is often unavailable.
Patients may exert pressure to receive an image based on their overestimate of the image’s benefits.
The ordering physician may face financial and psychological incentives to order the image.
The doctor-patient relationship, a principal-agent relationship, militates against correcting overestimated benefits and misaligned incentives.
With respect to patients, research demonstrates a statistically significant improvement in “well-being” and a reduction in anxiety after receiving a diagnostic workup, irrespective of positive or negative findings (Lucock et al., 1997; Mushlin et al., 1994). Thus, while there may be no meaningful impact on outcome or even a long-term impact on perceived well-being, information may, under certain circumstances, provide benefit in and of itself that is difficult to measure in traditional survey instruments.
It is possible that some of this benefit is based on patient misperception— for example, an underestimate of the risks associated with a false positive or an overestimate of the costs of two weeks of watchful waiting. Nonetheless, the part of the benefit that remains after exposure to this information should, in theory, be included in a social cost-effectiveness calculation.
With respect to physician incentives, various clinical settings may offer the physician financial and/or nonfinancial incentives (potential “payoffs”) to order the image. In the category of financial payoffs, we generally think of areas where a true economic rent is recovered (Winter and Ray, 2008). If the physician is ordering a study where the payment exceeds the cost, there is a true profit potential. Even in the presence of strong ethical adherence to the Hippocratic oath and similar constructs, the physician may have incentives to over-order imaging studies. This fits under the rubric of supplier-induced demand.
In the absence of direct financial gain, there may be additional, non-financial payoffs including the following:
The ordering physician may be able to reduce effort by having a briefer or less intense physical examination.
The ordering physician may be able to have a shorter operating room commitment.
The ordering physician may avoid potential malpractice costs (real or perceived)—this being the defensive medicine argument.
The payoff may have no fungible equivalent but may be reflected in decreased physician concern or uncertainty regarding the patient in question.
From a cost-effectiveness perspective, these different payoffs carry different weights. The ordering physician’s profit (if any) should carry no weight. The reduced effort may or may not carry weight depending on the use of the saved time (see the example below).
The physician’s desire for reduced uncertainty is potentially important. Behavioral economists emphasize the way in which decisions can be driven by a desire to avoid “regret”—the guilt and responsibility one feels upon recognizing that one has made the wrong choice in an uncertain situation (Thaler, 1994). In a clinical setting, regret would arise from a misdiagnosis that could have been avoided by ordering an image.2 The psychic value of regret avoidance may be insufficient to justify the image’s cost, but it remains a benefit that should be included in a social cost-effectiveness analysis.
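The weighting of payoffs described above can be made concrete with a minimal sketch of a social cost-effectiveness tally for a single image. It assumes every term has been expressed in the same monetary units, which is itself a strong assumption given how hard some of these benefits are to measure. The function and parameter names are hypothetical, and the numbers in the example are invented.

```python
def social_net_benefit(cost, expected_health_gain, reassurance=0.0,
                       regret_avoidance=0.0, effort_saved=0.0,
                       physician_profit=0.0):
    """Net social benefit of ordering one image, in common monetary units.

    expected_health_gain: expected value of improved patient health.
    reassurance: patient benefit from information that remains after
        misperceptions have been corrected.
    regret_avoidance: value to the physician of reduced uncertainty.
    effort_saved: only the part of saved physician time that is put to
        socially valuable use.
    physician_profit: accepted for bookkeeping but given zero weight.
    """
    benefits = expected_health_gain + reassurance + regret_avoidance + effort_saved
    return benefits - cost


# Invented example: the image is privately profitable for the physician
# but does not pass the social test.
net = social_net_benefit(cost=500, expected_health_gain=300, reassurance=100,
                         regret_avoidance=50, physician_profit=1000)
order_is_socially_justified = net > 0
```

The example shows the gap the text describes: an image can be attractive to the ordering physician while its net social benefit is negative, because the physician's profit is excluded from the calculation.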
Problems in the doctor-patient relationship begin from the position that a fundamental element of an economically competitive market is full information for buyers and sellers. In such a market, information about price, value, and quality is necessary and symmetric. This implies that the buyer and the seller each have sufficient information about their product and/or service to enter into an exchange.
While there are few perfect markets in health care, the prescription drug industry offers a relevant comparison and basis for discussion. Prescription drugs, although protected for a period of time by a government-sanctioned monopoly (patents and FDA exclusivity, both of which confer the ability to collect monopoly economic rents), exist in a market where good (not great) information about efficacy, price, and outcome exists. In such a market, an empowered consumer may make rational decisions about purchasing drugs directly. In the presence of insurance (the most frequent situation), information may be used to steer patients to individual branded and generic drugs, using economic incentives and value-based approaches. In this market, pricing and spending growth has been muted and dramatic gains in market share have been seen for generic drugs, in particular. As a further consideration, the pharmacy benefit management (PBM) industry has risen up to incorporate economic and informational incentives targeted at steering patients to lower-cost options.
In imaging, value is best represented by our traditional metrics of effectiveness (limited at best and noted above); quality can vary considerably across practices; and pricing information is often limited or completely opaque. As Blackmore and Medina have tried to do, efforts at organizing information for clinicians are emerging. This brings us to the infrequently discussed topic of the principal-agency problem.
The principal-agent framework applies whenever one party (the agent) is hired by another (the principal) to take actions or make decisions that affect the payoff3 to the principal (Besanko et al., 2003). In health care, this paradigm is further complicated by third-party payers. However, for all intents and purposes, it fits the physician-patient relationship: the physician is most often the agent, with the patient being the principal. While the Hippocratic oath may seem to sterilize the pecuniary risk in the relationship, there is ample evidence of genuine conflict. For example, providing patients with information on physician incentives, the risk of false positives, and so forth, might reduce patient pressure for images that are medically unnecessary. The principal-agent problem reduces the likelihood that the physician will provide such information.
While not often described in this manner, the findings of the group at Dartmouth and their well-known presentation at the www.dartmouthatlas.org site support the informational and principal-agency issues in healthcare. Their group suggests that the variation in healthcare use fits into three categories: (1) systematic underuse of effective care such as beta-blockers after heart attack, or diabetic eye care; (2) misuse of preference-sensitive care such as discretionary surgery (as documented by striking variations among neighboring communities in rates of surgery); and (3) overuse of supply-sensitive care such as physician visits and hospitalization rates among chronically ill patients (Dartmouth University, 2009). There are no necessary conflicts between this typology and the suggestion that principal-agency issues may, in fact, be an overriding concern with regard to imaging. Further, the addition of this issue to the usual description begins to explain why growth in imaging may be greater than would be expected merely by direct financial gains.
The Radiology Benefit Management Industry: A Solution or an Interim Palliation?
The radiology benefit management industry has risen up in response to the rising cost of imaging and the difficulty of applying traditional managed care mechanisms for controlling utilization (Appleby, 2008). Using a combination of network control and monitoring, as well as more traditional means of pre-authorization and pre-certification, RBM companies attempt to control the principal-agent problem while relying on evidence. In the absence of strong direct evidence, they are forced to use consensus approaches to decision making.
The industry is an interesting parallel to the pharmacy benefit industry in that it generally takes no risk in its contracting and is paid, mostly, on a transactional and performance basis. By judging the growth of the industry and its penetration in the presence of low switching costs, one would assume that RBM companies perform well; but in the absence of statistically valid data, one can only infer this.
The RBM industry’s greatest strength lies in its ability to validate and disseminate information as well as oversee the “agents.” This last point is, perhaps, most difficult to directly address and verify because the relative payoff (as described above) to the referring clinician is difficult to measure. Still, it does imply an additional check on the ordering practices of referring clinicians and, theoretically at least, patient care.
The Road Forward
As we have argued, better cost-effectiveness information, distributed more widely, could be an important element in slowing the growth of unnecessary imaging. At the outset, such information would improve the “value added” by RBMs. The information would also make physicians more aware of both the costs of an image and the chances that it would, in fact, reduce diagnostic uncertainty. Similarly, access to this information would allow patients to take a more active role in the image ordering decision.
A realistic goal involves producing cost-effectiveness information that can serve as general decision guidelines. The more ambitious goal—cost-effectiveness analyses leading to detailed rules—is impractical because of hard-to-measure benefits (e.g., increase in the patient’s feeling of well-being). In addition, detailed rules are difficult because cost-effectiveness calculations for a given image critically depend on context. A few explanatory (and not infrequent) scenarios illustrate the point.
Three trauma patients simultaneously arrive at the hospital, all with moderate risk of nonpenetrating traumatic injury. A dedicated physical examination may reduce the need for further imaging. However, given resource constraints, computed tomography of the brain, cervical spine, chest, abdomen, and pelvis is performed on each. While this approach has minimal risk and should improve the outcome for the patient, it also will reduce the need for dedicated primary and secondary evaluation by the trauma surgeons and clinical staff. In this case, nursing resources (in addition to those of the trauma surgical team) may also benefit from reduced demand. Thus, the value flow (payoff to agent in the parlance of the principal-agent situation) includes the hospital (which can reduce staffing in this scenario) and the physician staff (which may reliably manage more such patients in the setting of additional imaging).
A sexually active patient is admitted to the emergency room with fever and lower abdominal pain. The emergency room physician is concerned that the pain is pelvic and perhaps related to pelvic inflammatory disease. The physical examination suggests, but is not definitive for, cervical motion tenderness. Consultation with the obstetrics-gynecology service is sought. The physician consult requests a transvaginal ultrasound prior to physically examining the patient. In this situation, the new consult can avoid making multiple trips to see the patient and may, ultimately, be able to make a remote decision to admit the patient without necessarily seeing the patient on an emergent basis. Again, while the outcome for the patient is not harmed, the payoff is to the consulting service.
A patient presents to the orthopedic surgeon with signs and symptoms of internal derangement of the knee. Arthroscopy is indicated. The orthopedic surgeon obtains a magnetic resonance imaging study in order to facilitate the procedure and provide a roadmap to the injuries. There may be a true advantage to the patient in limiting the intervention and also detecting rare complications. However, the orthopedic surgeon additionally gains from a shorter procedure.
In all scenarios, there is no violation of the Hippocratic oath. There is, however, a lack of perfect information and a principal-agent conflict. In each case, the payoff to the provider and the cost to the payer are seemingly disconnected in the absence of additional monitoring.
Once the goal is agreed to, cost-effectiveness information for the most frequent imaging settings can be obtained in a number of ways: studies that record the physician’s decision-making process to see in which circumstances (and with what likelihood) an image changes a decision; comparisons of the use of imaging in the United States and Canada (where highly restricted capacity limits imaging in certain situations); and efforts at discerning the incremental non-patient care payoff to the referring clinician, perhaps in settings where fee-for-service reimbursement is discouraged or absent. Current efforts by public and private payers to utilize the concept of the “patient-centered medical home” (Deloitte Center for Health Solutions, 2008) would fit in this last category.
In summary, there is nothing truly extraordinary about diagnostic imaging that can explain its outsized growth in spending. Rather than the nature of the technology, it is the nature of the relationship between the flow of value to the patient, the referring clinician, other providers, and the presence of third-party payers that may increase the use of low-value (to the patient) imaging. Efforts to measure and, perhaps, capture the flow of value to the responsible clinician may allow for improved overall patient outcomes at lower costs. At present, the RBM industry plays a temporizing role in attempting to achieve this goal.
ASSESSING THE VALUE OF PREVENTION
David O. Meltzer, M.D., Ph.D., The University of Chicago
Prevention is widely recognized as a critical part of good health care and is a foundational principle of numerous aspects of the health professions, including the fields of preventive medicine and public health. Indeed, prevention can produce important health benefits in both length and quality of life and may have favorable effects on healthcare costs in some instances. However, prevention is not always beneficial or a desirable use of limited resources. As a result, there is a strong case for the application of principles of cost-effectiveness to the analysis of prevention. Medical cost-effectiveness analysis can provide a systematic framework for determining whether the benefit of a medical intervention of any type—whether preventive or not— is worth its cost. For this reason, medical cost-effectiveness analysis is an important tool to apply to all healthcare spending. Nevertheless, several aspects of prevention make it especially important that preventive services be analyzed through the lens of cost-effectiveness.
First, benefits that accrue in the future, such as those that come from prevention, are less valuable than similar benefits that could occur in the present. Cost-effectiveness analysis provides a well-developed framework in which to balance benefits and costs occurring over varying periods of time. Indeed, because future benefits of prevention may be high and some prevention efforts may be associated with risks in the present, it is important to have a tool such as cost-effectiveness analysis to create aggregate measures that combine potential current harms and potential future benefits into a single measure of net benefit. Similar issues arise on the cost side because prevention may often generate costs in the short run yet have the potential to reduce costs over the long run, although the latter is by no means guaranteed or even the norm. Second, because benefits and costs may be uncertain in both the short and the long run, at individual and population levels, the framework of cost-effectiveness analysis can be particularly important for determining the value of prevention, especially since it is well suited to integrating uncertain outcomes into decision making.
Third, since the benefits and costs of prevention can also be borne by multiple parties, issues of perspective are important in the assessment of preventive services.
All of these factors can be captured within the context of medical cost-effectiveness analysis and make its application to the assessment of preventive services especially useful and important.
The application of medical cost-effectiveness analysis to preventive services has a long and distinguished history, including most notably Weinstein and Stason’s (1976) pioneering work on the cost-effectiveness of the treatment of hypertension and Louise Russell’s Is Prevention Better Than Cure? (Russell, 1986). This paper does not seek to synthesize or summarize that immense body of work but instead to briefly introduce the key concepts of medical cost-effectiveness analysis for users unfamiliar with it and to highlight four key points about the cost-effectiveness of prevention that may be less familiar even to readers highly familiar with the field.
A Brief Introduction to Medical Cost-Effectiveness Analysis
Medical cost-effectiveness analysis seeks to provide a logically coherent framework in which to maximize the health benefits of spending on health care subject to resource constraints. Medical cost-effectiveness analysis has roots in decision science, economics, and psychology but dates in its current form most clearly to the work of Weinstein and Stason (1976) on the cost-effectiveness of the treatment of hypertension.
Calculating Health Benefits
Following the most commonly used approach, health benefits are measured in terms of their effects on the quality and length of life, as combined into quality-adjusted life-years (QALYs). QALYs are a weighted form of life expectancy, in which each year of life in any given health state (h) is weighted by a quality-of-life weight (Q(h)) between 0 and 1, where 0 is equivalent to death and 1 is equivalent to perfect health. These quality-of-life weights (also known as utilities) can be derived by a number of psychometric techniques that generally ask patients to rate their health in various health states relative to each other or relative to perfect health. Quite often the quality of life associated with a health state of interest for a given medical intervention has already been studied, and there are published libraries of such quality-of-life weights, such as the one included in the Tufts registry of published cost-effectiveness studies (Tufts University, 2009).
The uncertain nature of health in QALYs is captured by the probability that persons survive in health state h at time t, which can be written as S(h,t). The probabilities of various health states at various times in the future can either be measured directly through the use of clinical trials or estimated based on the analysis of existing data and then modeled mathematically.
A final element in the calculations of QALYs is that people may value outcomes at future times (t) less than outcomes in the present. To account for this mathematically, outcomes in the future may be weighted by a term $\beta^{t}$, where β < 1 and t is the time from the present; therefore as time into the future increases, $\beta^{t}$ decreases and the future receives less weight in the present. This is known as discounting. For example, with future benefits discounted at 3 percent so β = 0.97, a benefit worth 1 unit today would be 0.97 unit if received one year in the future and 0.94 (0.97 × 0.97) if received two years in the future.
Combining all these elements, QALYs can then be calculated as the sum of future years lived in various health states weighted by their quality of life, probability, and time into the future. Thus QALYs can be calculated as
$$\mathrm{QALYs} = \sum_{t,h} \beta^{t}\, S(h,t)\, Q(h)$$
where $\sum_{t,h}$ is the sum over all possible times and future health states.
To illustrate with a simple example, consider a person in fair health, with Q = 0.6 this year. Assume that this person has a 70 percent chance of surviving until next year with an associated quality of life of 0.1 and a 30 percent chance of dying by next year, with an associated quality of life of 0. Finally, assume that the person has a discount rate of 3 percent and therefore β = 0.97. Given these values, this person would have 0.6 + 0.97 × (0.7 × 0.1 + 0.3 × 0) ≈ 0.67 QALYs.
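The same calculation can be written as a short program. The following Python sketch is illustrative only: the function name `qalys` and the tuple layout are assumptions introduced here, not part of the original analysis. It multiplies the discount factor β raised to the power t, the probability S(h,t), and the quality weight Q(h), and sums the products over every time and health state.

```python
def qalys(states, beta):
    """Sum beta**t * S(h, t) * Q(h) over all (time, health state) entries.

    states: iterable of (t, probability, quality_weight) tuples, where t is
    the number of years from the present.
    """
    return sum(beta ** t * prob * q for t, prob, q in states)


# The simple example from the text.
example = [
    (0, 1.0, 0.6),  # this year: alive with Q = 0.6
    (1, 0.7, 0.1),  # next year: 70 percent chance of survival with Q = 0.1
    (1, 0.3, 0.0),  # next year: 30 percent chance of death, Q = 0
]

total = qalys(example, beta=0.97)
print(round(total, 2))
```

Longer time horizons or additional health states only add rows to the list; the summation itself does not change.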
To calculate cost-effectiveness, a measure of cost is needed, and this may be derived by a variety of mechanisms, including direct collection of data as part of a clinical trial or use of published data on utilization and/or costs. A critical idea is the concept of incremental costs: the extra costs that occur because of one intervention compared to another. For example, if the choice being made is between a newer, more expensive treatment that costs $10,000 and an older one that costs $8,000, the relevant incremental cost is $2,000. Another critical idea in cost-effectiveness is the idea of perspective, that is, the question of costs and benefits to whom.
Most experts suggest that for most purposes, a societal perspective is appropriate, including all costs and benefits regardless of to whom they accrue. This has a variety of implications for measuring costs. One obvious implication is a preference for using direct measures of cost rather than price or charges for a service since the markup of the latter two over cost is simply a transfer payment from the entity buying the service to the entity that produced it, not a real social cost. More subtle examples of this issue that are relevant for prevention relate to future medical and nonmedical costs, which are discussed further below.
The case for taking a societal perspective is most fully articulated in a 1996 volume entitled Cost-Effectiveness in Health and Medicine edited by Marthe Gold and others, which represents the work of a Public Health Service panel asked to develop consensus on core methods in cost-effectiveness analysis. Although the field has advanced since the publication of that book more than a decade ago, it remains a very valuable reference for anyone wishing to learn more about this area.
Calculating and Utilizing Cost-Effectiveness Ratios
Having assembled data on the health effects of an intervention in QALYs and its costs, the next step is typically to calculate the cost-effectiveness ratio by dividing the cost by the number of QALYs gained. Such ratios are then often put into what is called a league table, which lists these interventions in order of increasing cost per QALY so that the most cost-effective interventions are at the top of the table and the least are at the bottom. Table 3-3 is a league table that reports the cost per life-year saved for a number of preventive services.
Reviewing the table, one sees that some interventions, such as screening neonates for phenylketonuria, may both produce health benefits and save money and are therefore certainly desirable from the perspective of cost-effectiveness. Other interventions may not produce health benefits and are costly. Those interventions are dominated and should not be pursued.
TABLE 3-3 Cost per Life-Year Saved for Preventive Services
Cost per Life-Year Saved
Neonatal screening for phenylketonuria
Secondary prevention for hypercholesterolemia in men ages 55-64
Secondary prevention for hypercholesterolemia in men ages 75-84
Primary prevention for hypercholesterolemia in men ages 55-64
Screening exercise test for coronary disease in men age 40
Screening ultrasound every 5 years for abdominal aortic aneurism
Most interventions, however, will have positive costs and benefits and will therefore be like the remaining interventions in the table, with the decision about whether they are cost-effective determined by the threshold one uses in terms of the cost per QALY (or cost per life-year if quality of life is not accounted for). While there is no specific agreement about what cost per QALY should define the threshold for cost-effectiveness, estimates in developed countries often range from about $50,000 per QALY to $200,000 per QALY and are justified by comparisons to implicit values that people place on risks to health in other contexts. One example of such an approach involves examination of the wage premiums that people have to be paid to accept jobs that have increased risk of death. Another examines the cost-effectiveness of medical interventions considered to be of borderline cost-effectiveness, such as dialysis among older adults, and compares other interventions to that point of reference. Yet another approach starts at the top of the table and funds interventions up until the point at which available funds for health care are exhausted. However, this approach is not useful when there is no explicit budget for health care or when one takes the perspective that nonmedical costs that would accrue outside such a budget are also appropriate considerations for cost-effectiveness analysis.
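The league-table logic described above can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the `Intervention` class, the function names, and the program figures are assumptions made for illustration, and none of the numbers come from Table 3-3. The sketch orders programs by cost per QALY, labels each against a threshold, and applies the budget-based approach of funding from the top of the table until funds run out. It assumes every program yields positive QALYs, so dominated interventions are excluded beforehand.

```python
from dataclasses import dataclass


@dataclass
class Intervention:
    name: str
    cost: float   # incremental program cost in dollars (negative = saves money)
    qalys: float  # incremental QALYs gained (assumed positive in this sketch)


def cost_per_qaly(iv):
    return iv.cost / iv.qalys


def league_table(interventions):
    """Order interventions by increasing cost per QALY."""
    return sorted(interventions, key=cost_per_qaly)


def classify(iv, threshold):
    """Label one intervention relative to a cost-per-QALY threshold."""
    if iv.cost <= 0:
        return "cost-saving"
    return "cost-effective" if cost_per_qaly(iv) <= threshold else "not cost-effective"


def fund_until_exhausted(interventions, budget):
    """Fund from the top of the table until the next item is unaffordable."""
    funded = []
    for iv in league_table(interventions):
        if iv.cost > budget:
            break
        budget -= iv.cost
        funded.append(iv.name)
    return funded


# Hypothetical programs; the figures are invented for illustration.
programs = [
    Intervention("C", cost=4_000_000, qalys=20),   # $200,000 per QALY
    Intervention("A", cost=1_000_000, qalys=100),  # $10,000 per QALY
    Intervention("D", cost=-500_000, qalys=10),    # saves money, improves health
    Intervention("B", cost=3_000_000, qalys=60),   # $50,000 per QALY
]

for iv in league_table(programs):
    print(iv.name, classify(iv, threshold=100_000))
print(fund_until_exhausted(programs, budget=4_000_000))
```

The budget rule needs an explicit budget as input, which is why it is of little use when no such budget exists; the threshold rule needs only an agreed cost per QALY.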
In practice, there is also often uncertainty about the cost-effectiveness of an intervention, so that precisely defining a threshold may not be as relevant as looking for extreme results on either end of the spectrum that provide opportunities for more effective resource allocation. For example, in Table 3-3, both increasing the use of cholesterol-lowering drugs for secondary prevention in men aged 55-64 years with hypercholesterolemia and reducing the use of screening ultrasound exams to search for abdominal aortic aneurism in the general population are clearly outside on one side or the other of the cost-effectiveness threshold.
While this general approach of promoting the use of interventions that are highly cost-effective and discouraging the use of those that are clearly not cost-effective is the correct one, it is also worth noting that the scale of the intervention considered is a very important concern if there is limited ability to promote the use of cost-effective interventions over ones that are not cost-effective. For example, if the cost-effectiveness threshold were $100,000 per QALY, it might be far more important to promote the use of an intervention with a cost-effectiveness ratio of $90,000 per QALY that could apply to many people than to promote an intervention with a cost-effectiveness ratio of $10,000 per QALY that could apply to a much smaller population. One relatively recently developed approach to address this problem is to emphasize the “net health benefits” of an intervention, which calculates the benefits produced by an intervention across the population net of the potential health benefits that could otherwise be produced by reallocating the costs of the intervention to pay for interventions that are at the threshold that defines cost-effectiveness (Stinnett and Mullahy, 1998).
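A small numerical sketch may make the net health benefit idea concrete. The Python function below is an illustration under stated assumptions, not a reproduction of the published method: the function name, the per-person QALY gain of 0.1, and the population sizes are all hypothetical. It subtracts from each person's QALY gain the QALYs that the same money would buy at the threshold, then scales by the number of people treated.

```python
def net_health_benefit(qalys_per_person, cost_per_person, threshold, population):
    """Population QALYs gained, net of the QALYs the same spending would
    produce if used on interventions priced exactly at the threshold."""
    forgone = cost_per_person / threshold
    return population * (qalys_per_person - forgone)


THRESHOLD = 100_000  # dollars per QALY, as in the example in the text

# Hypothetical inputs: both interventions add 0.1 QALY per person treated.
# $9,000 per person is $90,000 per QALY; $1,000 per person is $10,000 per QALY.
broad = net_health_benefit(0.1, 9_000, THRESHOLD, population=1_000_000)
narrow = net_health_benefit(0.1, 1_000, THRESHOLD, population=10_000)

print(round(broad), round(narrow))
```

Under these assumed inputs the broadly applicable intervention produces roughly 10,000 net QALYs and the narrowly applicable one roughly 900, even though the latter has the far lower cost-effectiveness ratio.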
Cost-effectiveness analysis can be criticized on a large number of methodological bases, ranging from how benefits and costs are defined, to how distributional issues are addressed. There is no question that many of these concerns about the approach are substantive. Nevertheless, the value of the approach is suggested by the more than 1,000 applications that have now been published (Tufts University, 2009) and the number of specific examples that have helped inform public policies. One favorite example is the use of Pap smears at varying frequencies, which cost-effectiveness analysis has suggested is highly cost-effective if done every three years, but less so when done every other year or annually. Annual testing costs almost $1,000,000 per life-year saved while adding only hours to life expectancy. Evidence such as this has been important in shaping national recommendations about the frequency of Pap smears, as evidenced by the move away from annual screening and increased emphasis on increasing the fraction of women having Pap smears performed at three-year intervals, if appropriate. As discussed below, the Pap smear example is bittersweet with respect to the value of cost-effectiveness analysis because much of the use of Pap smears in the United States remains at frequencies that are not cost-effective. Still, the small cost of performing cost-effectiveness analyses relative to the large cost of health care itself means that it does not take many examples of even partial success in better targeting or reducing spending to justify the use of cost-effectiveness in policy making.
It should also be pointed out that cost-effectiveness analysis should not be the only criterion for decision making. There are a wide variety of other concerns that policy makers, clinicians, and others who might use cost-effectiveness analysis should also consider in making decisions. Thus, the limitations of cost-effectiveness analysis can be compensated for to some extent by treating it as one factor among several. Indeed, some have argued that one of the most valuable contributions of cost-effectiveness analysis is forcing examination of the factors the analysis can, and cannot, account for. While the United States has used cost-effectiveness analysis in policy making to a relatively small extent, the experience of many countries around the world, perhaps most notably the United Kingdom’s National Institute for Health and Clinical Excellence, suggests that incorporating cost-effectiveness analysis into the policy-making process can promote discussion of the benefits and costs of medical interventions.
Four Key Points About Prevention and Cost-Effectiveness Analysis
As noted above, there is a long and distinguished history of the application of cost-effectiveness analysis to prevention. Rather than attempting to replicate that literature, this paper aims to highlight a few key parts of it and extend it in the context of recent discussions of the potential of prevention to address key concerns around the control of healthcare costs in the United States.
Point 1: If prevention produces health benefits then it should be worth paying for. Therefore, prevention need not—and generally will not—save money.
During the 2008 presidential primaries and general election, many of the candidates suggested that prevention might be an important source of cost control. Certainly it is true that if future healthcare costs can be averted, it is possible that prevention could reduce healthcare costs, and there are, indeed, examples of disease management programs that have saved costs. Nevertheless, a quick review of the sampling of healthcare interventions in Table 3-3 and the much broader list of preventive measures in the Tufts registry suggests that most preventive health care costs money (Cohen et al., 2008). However, this is not to say these are not worthwhile expenditures. Indeed, many preventive healthcare interventions are highly cost-effective. By and large, rather than focusing on the cost savings of preventive health care, we have to take a comprehensive approach that generally will begin with the magnitude of its benefits rather than the magnitude of any reductions in downstream healthcare costs. The key point is that the idea that preventive health care saves money, while perhaps politically attractive, is a very incomplete perspective on the benefits of, and hence the priorities for, prevention. This point is by no means new (Russell, 1993). Nevertheless, it seems to require repeated reinforcement. Perhaps this is a reflection of the difficulty in controlling healthcare costs by other means.
Point 2: If prevention extends life, it can affect future costs, both medical and nonmedical. This produces economic advantages of emphasizing prevention that improves quality of life rather than length of life.
When prevention extends life, it can often produce both medical and nonmedical costs in future years, both of which can significantly change the cost-effectiveness of the intervention. Such costs have often been neglected in studies of preventive health care, but my own work (Meltzer, 1997) and that of others has shown that including those costs can significantly change the cost-effectiveness ratio, often improving the cost-effectiveness of interventions that improve quality of life compared to interventions that increase length of life. This suggests that if one wanted to strengthen the economic case for prevention, focusing on interventions that primarily improve quality of life might be preferred to focusing on those that primarily increase length of life. However, this is less true for younger persons who are still in the workforce, and it may become less true even for older persons if working lives extend as people live longer, although the trend in retirement ages in the United States over the past decades has been the opposite. It should be noted that this is not to say that the goal of prevention should be to save money, but rather that accounting for future costs may make interventions that improve quality of life likely to be more cost-effective compared to those that increase length of life.
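The direction of this effect can be illustrated with a deliberately simplified Python sketch. It is not the model used in the cited work: the function name, the omission of discounting, and all dollar figures are assumptions chosen only to show the mechanism. Two hypothetical interventions have the same direct cost and the same QALY gain, but only one of them adds years of life during which further net costs are incurred.

```python
def ratio_with_future_costs(intervention_cost, qalys_gained,
                            added_life_years=0.0, net_future_cost_per_year=0.0):
    """Cost per QALY once costs incurred in added years of life are counted.

    Discounting is ignored here to keep the arithmetic visible.
    """
    future_costs = added_life_years * net_future_cost_per_year
    return (intervention_cost + future_costs) / qalys_gained


# Assumed yearly medical plus nonmedical costs, net of earnings (hypothetical).
NET_FUTURE_COST = 15_000

# Both interventions cost $20,000 and yield 1 QALY. The first adds 2 life-years
# at a quality weight of 0.5; the second adds no life-years but raises quality.
extends_life = ratio_with_future_costs(20_000, 1.0, added_life_years=2.0,
                                       net_future_cost_per_year=NET_FUTURE_COST)
improves_quality = ratio_with_future_costs(20_000, 1.0)

print(extends_life, improves_quality)
```

If future costs are ignored, the two interventions look identical at $20,000 per QALY. A negative value for `net_future_cost_per_year`, as might apply to someone whose earnings exceed their consumption, would lower the ratio for the life-extending intervention instead of raising it.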
Point 3: The value of prevention depends on how we use it—not just which approaches but in whom.
Preventive services are a diverse set of interventions, some highly cost-effective and others not so, but many interventions vary in their cost-effectiveness depending on the context in which we use them. This makes general claims about “prevention’s” effect on costs, health, or the cost-effectiveness of health care overall inherently misleading. Policy discussions require a much more nuanced conversation of the specific approaches to prevention being advocated and the specific population and context in which they will be used. The earlier Pap smear case study provides an excellent example for illustrating the importance of context, since Pap smears are highly cost-effective if received once every three years but have almost no incremental value if done more frequently. Because the majority of Pap smears in the United States are given at frequencies that are not cost-effective, most of the money we spend on Pap smears would be better spent in other ways (Meltzer and Alexander, 2009). This said, providing Pap smears every three years produces benefits that are so substantial and so cost-effective that, even though we waste most of the money we spend on Pap smears, their overall cost-effectiveness remains very high. While eliminating these more frequent Pap smears that are not cost-effective could produce a more efficient healthcare system overall, one must be cautious in eliminating inefficient use if there is risk that efficient use might be reduced as well.
In cases where the benefits of targeted use are more modest, non-selective use can turn a potentially cost-effective intervention into a non-cost-effective one. This has been seen most strikingly in studies of intensive therapy for diabetes, where great heterogeneity in patient preferences about the value of intensive treatment exists (Meltzer et al., 2003). Interestingly, these variations are driven heavily by patients’ feelings about the quality of life associated with the therapy itself. Patients who feel that intensive therapy (with its more frequent fingersticks, injections of insulin, and risk of hypoglycemic events) reduces quality of life are much less likely to experience a net benefit from intensive therapy. Interventions such as this whose benefits depend heavily on patient preferences are often known as “preference sensitive.” A study of the cost-effectiveness of physical exercise found similar results in that exercise was found to be cost-effective only as long as the person exercising considered the time spent to be of reasonably good quality. Because prevention very often involves applying a service to a large number of people in order to prevent illness in a much smaller number, it is likely that many preventive services are highly sensitive to preferences about receipt of the service itself. It may be for reasons such as this that interventions that can be relatively unpleasant, such as colonoscopy, Pap smears, and mammograms, are far less than universally utilized despite strong evidence of their benefits.
Another important implication of the importance of patient preferences in prevention is a behavioral one. Patients whose preferences do not favor an intervention may indeed reject it, potentially improving both the net effectiveness and the cost-effectiveness of the intervention as it is used in practice. We have found that this is true for intensive therapy for diabetes, with patients who find the therapy itself more unpleasant rejecting the intervention. This effect is so dramatic, in fact, that if intensive therapy were used by all older patients, it would actually be harmful. However, as it is used in practice, we find intensive therapy both beneficial and cost-effective (Meltzer et al., 2003).
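A toy calculation shows how self-selection can reverse the sign of the average effect. The Python sketch below is hypothetical throughout: the clinical benefit, the individual disutility values, and the assumption that patients know their own preferences perfectly are all introduced here for illustration and are not taken from the diabetes analysis.

```python
def mean_net_benefit(clinical_benefit, disutilities, self_select):
    """Average net QALY gain per treated patient.

    Each patient's net benefit is the clinical benefit minus that patient's
    disutility from the therapy itself. With self_select=True, only patients
    whose net benefit is positive accept treatment.
    """
    nets = [clinical_benefit - d for d in disutilities]
    if self_select:
        nets = [n for n in nets if n > 0]
    return sum(nets) / len(nets) if nets else 0.0


# Hypothetical cohort: same clinical benefit, varying dislike of the therapy.
BENEFIT = 0.3
DISUTILITIES = [0.0, 0.1, 0.2, 0.5, 0.9]

universal = mean_net_benefit(BENEFIT, DISUTILITIES, self_select=False)
in_practice = mean_net_benefit(BENEFIT, DISUTILITIES, self_select=True)

print(round(universal, 3), round(in_practice, 3))
```

With these invented numbers, treating everyone yields a negative average net benefit, while treatment limited to those who accept it yields a positive one. This mirrors the qualitative pattern described above without reproducing its magnitudes.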
This is not to say that current patterns of use are ideal. We find that benefits would be great even if intensive therapy were adopted only by people whose preferences suggest they are expected to benefit from it. This makes the case for approaches, such as decision aids, that may help patients make better decisions. Some of our analyses suggest that the value of information that can result in better decision making at the individual level (i.e., the value of individualized or “personalized” care) may be much greater than the value of information that seeks to inform decision making only at the population level (i.e., the single treatment), which has been the focus of most cost-effectiveness analyses (Basu and Meltzer, 2007).
Point 4: If the value of prevention depends on how and in whom we use it, we must evaluate technologies as they are used in practice and seek to improve their use if it is not ideal.
The idea that the value of prevention depends on how we use it, whether this is misuse or overuse or the failure to individualize care, suggests that we need to think carefully about how to use cost-effectiveness analysis in policy making. If a cost-effectiveness analysis suggests that an intervention could be cost-effective if used in one way but it is not used that way in practice, should the intervention be considered cost-effective or not? This may be one area where the judgment of policy makers could be especially important, particularly if there are available policy options that can alter patterns of use in ways that could change the cost-effectiveness of the intervention. Examining these questions of how to think about cost-effectiveness in the context of how technologies are used is a very new area of inquiry and one worthy of substantial attention by cost-effectiveness researchers and policy makers.
If one accepts the idea that the value of interventions should be assessed in the context in which they are used, one is immediately drawn to consider approaches that may alter the use of technologies. There are myriad possibilities available to influence behavior, ranging from patient-focused methods (e.g., copayments, patient decision aids) to provider-focused methods (e.g., payment incentives, practice guidelines, health information systems, opinion leaders). The extensive discussion of value-based pricing at this meeting can readily be understood in this context as an effort to try to influence the cost-effectiveness of therapies by better targeting them to the populations in which they will be most beneficial and cost-effective.
Prevention is a critical part of modern health care and has great potential to influence health and perhaps even help control certain aspects of healthcare costs. However, the value of prevention varies tremendously depending on the approach considered, and on how and in whom it is used. Although cost-effectiveness analysis must be only one consideration in the policy-making process, the tools of cost-effectiveness analysis can provide insight into efforts to maximize the value of prevention. The United States, to date, has used the tools of cost-effectiveness analysis less than other countries. Yet the United States may have more to gain than any other nation because of its exceptionally high level of healthcare spending, the increasing pressures to control that spending, and the harm that ill-informed controls on spending could cause. While the tools of cost-effectiveness analysis will continue to be refined and will never be perfect, it will be critical to utilize the insights gained from cost-effectiveness analysis and apply them to prevention and to the entire healthcare system as we seek to maximize the value of health care in this country.
EVIDENCE-BASED DECISION MAKING OR DECISION-BASED EVIDENCE MAKING? EVIDENCE AND DECISIONS ALONG THE LIFE CYCLE OF PHARMACEUTICAL PRODUCTS
Newell E. McElwee, Pharm.D., M.S.P.H., Pfizer, Inc.
The IOM’s Roundtable on Value & Science-Driven Health Care established the concept of “value” as an early priority (IOM, 2008). Value in health care was characterized by the Sectoral Strategies Working Group as “the right care to the right patient at the right time for the right price” and expressed as “the physical health and sense of well-being achieved relative to the cost” of healthcare interventions. The costs of these interventions are related to the total resources used, whether expressed in economic or monetary terms or otherwise.
In theory, value is a relatively simple concept. In practice, measuring value, especially in health care, is difficult. The Sectoral Strategies Working Group alluded to this, noting that measuring health benefits and healthcare costs is particularly challenging and that there may be substantial variability between the perspectives of individuals and those of the general population. Indeed, value is often in the eye of the beholder. This paper is intended to supplement the previous IOM Roundtable work on value and focuses on stakeholder perspectives of key decisions that must be made during the life cycle of a pharmaceutical product.
Overview of Decision Making and Key Decisions
The conceptual framework of making decisions under uncertainty and with incomplete information has been studied in the business world since the 1950s (Grayson, 1960). Eddy (1990) has described a similar framework for medical decisions and suggested that they consist of two components: scientific judgments and preference judgments (Figure 3-4).
Scientific judgments involve analysis of the scientific evidence on benefits and costs for each decision option. To the extent possible, scientific judgments are objective, analytical processes—a left-brain activity. Analysis of evidence is done by scientists who rely on established rules of evidence and who can generally reach consensus. In contrast, preference judgments involve personal values and preferences and are more of a subjective process—a right-brain activity. The stakeholders who ultimately make decisions about pharmaceutical products may not always be scientists and may not always have a goal of reaching consensus with others. Their decision reflects a combination of their interpretation of the scientific evidence and their own personal preferences.
Teutsch and Berger (2005) have developed a similar framework for medical decisions and added other variables such as budget constraints, equity, and acceptability to the preference judgment component. They refer to the scientific component as evidence synthesis and the preference component more generically as evidence-based decision making. Health technology assessment (HTA) agencies refer to the scientific component as the assessment phase and the preference component as the appraisal phase. By and large, however, these different descriptions refer to the same overall framework. While much of the debate about the quality of medical decisions has been focused on improving the scientific evidence base, there has been relatively little debate about how decisions are made, what evidence is necessary for specific decisions, and what role individual and societal preferences play in those decisions.
There are many decisions in the life cycle of a pharmaceutical product, but this paper focuses on only four: (1) the investment decision to advance a product in development from Phase 2 to Phase 3; (2) the regulatory decision to approve a product for marketing; (3) the decision to adopt and subsequently allow use of a product in a patient population; and (4) the treatment decision to prescribe a product for an individual patient (Figure 3-5). The stakeholders for each of these decisions are the product developers, the regulatory agency, the payers and their intermediaries, and the patients and their physicians, respectively.
Pharmaceutical companies make many decisions during drug development. Examples include “go/no-go” decisions for first advancing products into humans (Phase 1 studies, usually in normal volunteers), for determining the dose range and early indicators of efficacy (Phase 2 studies in selected patients), and for determining safety and effectiveness in large groups of patients (Phase 3 studies). Investment costs and complexity increase with each subsequent phase, with the greatest increase in costs occurring in Phase 3. Therefore one of the key development decisions for a pharmaceutical product is the decision to advance the product from Phase 2 to Phase 3. While many factors are taken into account, advancement decisions are based on opportunity costs for the development portfolio and are often informed by financial calculations such as expected net present value (eNPV), which is a metric that represents how much value will result from an investment. The calculations for eNPV are based on forecasts for revenue and expenses over the lifetime of the product. Revenue and expenses that occur in the future are discounted back to the present according to standard accounting practices.
The decision to advance a product from Phase 2 to Phase 3 also depends on estimates of the other key decisions already mentioned, that is, the probability of technical and regulatory success, the probability of adoption and subsequent use by payers, and the probability that if the product is made available by payers, physicians will utilize it. Historically, financial calculations have been driven mostly by the probability of technical and regulatory success. In recent years there has been an effort to provide more granular input on adoption and diffusion. Indeed, we have recently done simulation modeling on the impact of policies that might restrict adoption and diffusion (such as coverage with evidence development) on eNPV. Coverage with evidence development (CED) is a federal program administered by the Centers for Medicare and Medicaid Services that requires additional data collection as a condition of coverage for national coverage decisions. CED restricts coverage to patients enrolled in the study; the decision for covering other patients is delayed until the new evidence is available. Our unpublished results suggest that this type of policy may significantly lower the eNPV of Phase 2 products, therefore emphasizing the importance of understanding the evidence required for adoption decisions at the time of marketing approval.
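To show how these probabilities might enter an eNPV calculation, the following Python sketch offers one simplified formulation. It is not the simulation model referred to above: the function names, the treatment of the three probabilities as independent multipliers on post-launch cash flows, the discount rate, and every figure are assumptions made purely for illustration.

```python
def npv(cash_flows, rate):
    """Present value of yearly cash flows, with the first entry at year 0."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))


def expected_npv(phase3_costs, post_launch_flows, p_approval, p_adoption,
                 p_use, rate):
    """Expected NPV of advancing a product into Phase 3.

    Phase 3 costs are treated as certain. Post-launch net cash flows are
    weighted by the joint probability of approval, payer adoption, and
    physician use (assumed independent in this sketch).
    """
    p_revenue = p_approval * p_adoption * p_use
    certain = [-cost for cost in phase3_costs]
    risky = [0.0] * len(phase3_costs) + [p_revenue * cf for cf in post_launch_flows]
    return npv(certain, rate) + npv(risky, rate)


# Hypothetical figures in millions of dollars: two years of Phase 3 spending
# followed by five years of net revenue.
baseline = expected_npv([100, 100], [150] * 5, p_approval=0.6,
                        p_adoption=0.9, p_use=0.8, rate=0.10)
restricted = expected_npv([100, 100], [150] * 5, p_approval=0.6,
                          p_adoption=0.5, p_use=0.8, rate=0.10)

print(round(baseline, 1), round(restricted, 1))
```

In this formulation, any policy that lowers the assumed probability of adoption lowers eNPV proportionally on the revenue side while leaving Phase 3 costs unchanged, which is the qualitative effect described for restrictive coverage policies.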
The overall goal of incorporating more granularity in investment decision inputs is to provide more accurate eNPV estimates and minimize the risk of either developing a product that companies cannot sell (false positive) or stopping development of a product that is beneficial for society (false negative). I think all pharmaceutical companies are attempting to adapt to this new evidence-based environment by making smarter development decisions earlier. However, they will require better eNPV estimates, which in turn will require better forecasting ability for marketing approval and, importantly, for adoption and diffusion by payers.
The evidentiary requirements for a given product label and the subsequent marketing approval decision by a regulatory agency are relatively predictable. Regulatory agencies put a lot of emphasis on Phase 3 study results in their marketing approval decisions. To better ensure that the evidence from Phase 3 studies will meet regulatory requirements, pharmaceutical companies hold meetings with regulatory agencies near the end of Phase 2 to discuss the type and strength of evidence needed in Phase 3. When the studies have been completed and the results are known, the agency may use an external expert advisory committee to provide advice if the agency anticipates questions surrounding interpretation of the results. While the regulatory approval process is not perfect, it is generally predictable.
The evidence requirements for regulatory agencies often differ from those of payers, patients, and physicians. Regulatory agencies typically focus on clinical value and not economic value. They are usually willing to trade off external validity for internal validity; hence their focus on data from randomized clinical trials. Their research question is often focused on whether a product is safe and effective, which may not require an active comparator to answer.
Adoption decisions are not as predictable as regulatory decisions, primarily because payers do not have clearly defined evidence requirements. Historically, pharmaceutical companies have proactively received informal input from payer advisory boards on developmental compounds for the purpose of Phase 3 study planning, but there is often considerable variability in the input both within a given health plan and between health plans. Recently the National Institute for Health and Clinical Excellence (NICE) in the United Kingdom has begun a formal consultation service for pharmaceutical companies that is modeled on the end-of-Phase 2 meetings with the UK regulatory approval agency. This program is in its infancy, and it is too early to know whether this process will result in better predictability for adoption decisions, but it is a step in the right direction. In part, the uncertainty about the predictability of the process arises because the evidentiary requirements identified in the process are nonbinding but done in “good faith.” Some of the larger payers and health plans in the United States are just beginning to think about a more formal process for determining the evidence necessary for adoption decisions and various benefit designs, but they have not yet come as far as NICE.
Better predictability of adoption decisions will depend in part on better-defined evidence requirements, that is, the left side of Figure 3-4, but there remains significant variability in the preference (or appraisal) component of adoption decisions, that is, the right side of Figure 3-4. Indeed, the preference component of adoption decisions may have more of an impact on predictability than the evidence itself. Few studies have formally addressed this issue, but we know that there are areas for potential improvement. For example, there are no generally accepted guidelines for how pharmacy and therapeutics (P&T) committees should approach adoption decisions, and very few plans have formal orientation and training for committee members. This issue has implications for patients choosing a health plan: they will want to know not only whether the medicines they need are on the formulary but also whether a new product they may need in the future is likely to be available based on their understanding of the P&T committee’s decision-making process. We have a project in progress that uses a modified RAND Appropriateness Criteria approach to assess how a group of adoption decision experts rate hypothetical scenarios where the quality and strength of scientific evidence on benefits, harms, and cost are varied. The lack of studies in the preference or appraisal component of adoption decisions makes it ripe for additional research.
One specific issue related to adoption decisions has to do with “specialty pharmaceuticals.” These are typically injection and infusion therapies with a high cost (>$5,000 per year). The evidence requirements for the scientific assessment component of these adoption decisions are no different than for other pharmaceuticals, but the preferences and values of decision makers may differ. Health plans have responded to specialty pharmaceuticals by shifting a percentage of the cost directly to the patient in the form of a co-insurance copayment (Tier 4). This class of drugs is growing (3 percent Tier 4 in 2004 [Kaiser Family Foundation and Health Research and Family Trust, 2004] versus 7 percent in 2007 [Kaiser Family Foundation and Health Research and Family Trust, 2007]) and may create a situation in which many Americans face a choice of no medication or possible financial ruin (Kolata, 2008). This situation is an affordability issue that is independent of value.
Sackett and colleagues (2001) have defined evidence-based medicine as the integration of best research evidence with clinical expertise and patient values. Ideally, this should be standard for treatment decisions. Formal incorporation of patient values and preferences is rarely done but can be important. Fraenkel (2008) has shown that preferences may be important in selecting treatments for rheumatoid arthritis. Treatment decisions made by African-American patients were more likely to be based on preferences regarding adverse events, particularly rare, catastrophic adverse events, whereas treatment decisions made by Caucasians were more likely to be based on preferences regarding benefits. Preferences may also impact incremental cost-effectiveness ratios. Meltzer and colleagues (2003) have shown that patients’ self-selection, based on their own treatment preferences, changed the incremental cost-effectiveness ratios for aggressive glucose control among diabetic patients from above the threshold for “good value” to coming within the range of “good value.”
One challenge for individual patient treatment decisions is the application of population averages from study results when heterogeneity of treatment effects exists (Fraenkel, 2008). In Figure 3-6, photographs of 24 individuals are represented by the 12 pictures in the periphery of the figure; each picture is a digital composite of two people. The picture in the center is a digital composite of all 24 individuals and is analogous to the average results from a clinical trial where important differences existed between patients. The study results would be applicable to a given patient only to the extent that the patient was like the “average patient.” This problem of averages makes it difficult in practice to get to the ideal of “best research evidence” as proposed by Sackett.
There are generally two approaches to reducing uncertainty around the heterogeneity of treatment effects. One is the use of genotyping and bioassays to reduce uncertainty at the individual patient level. The second is the use of subgroup analysis or actuarial diagnostics to reduce uncertainty at the subgroup level. Both have important roles, and it is expected that their use will increase.
The value of pharmaceuticals may be assessed and appraised differently depending on the type of decision being made and the preferences of the stakeholder making the decision. The type of evidence required for the decision and the stakeholder’s tolerance for uncertainty surrounding the evidence may also vary according to the decision being made. This implies that, ideally, evidence generation should be specific to the decision and to stakeholder requirements; that is, decision-based evidence making should precede evidence-based decision making. In practice, this approach works relatively well for some types of decisions (regulatory), but there is still work to be done in understanding the evidence needs of payers making adoption decisions and the evidence needed to better inform individual treatment decisions. Investment decisions by pharmaceutical companies depend, in part, on their ability to predict regulatory, adoption, and treatment decisions. As we develop policies that balance the need for cost control with society’s desire for broad access to new, innovative medical treatments, it will be important for pharmaceutical companies to be able to better predict these other key decisions so that they can make smarter investment decisions earlier in the drug development process. This will require pharmaceutical companies to work closely with health plans and payers during early development and to better understand their requirements for evidence. Finally, individual treatment decisions could be improved by better incorporating individual preferences and heterogeneity of treatment effects into the decision.
APPROACHES TO ASSESSING VALUE: PERSONALIZED DIAGNOSTICS
Ronald E. Aubert, Ph.D., M.S.P.H., and Robert S. Epstein, M.D., M.S., Medco Health Solutions, Inc.
The mapping of the human genome in 2003 and the dissemination of more efficient and less costly technology to detect DNA sequences led to a rapid series of completed genome-wide association studies (GWAS) and pharmacogenomic evaluations. The GWAS not only brought potential new targets for drug development, but also brought new diagnostics to more quickly and easily determine genetic predisposition to both disease and drug response. Because many of these new diagnostics were not deterministic (i.e., neither 100 percent associated with a given condition nor 100 percent predictive) but probabilistic in nature, their uptake in the clinic was not immediate. Additionally, studies to determine their impact on the natural history of disease or even treatment outcome, frequently referred to as clinical utility studies, were rarely conducted, leaving clinicians unclear about the relative value of these diagnostics.
Equally absent from the dialogue has been the perspective of payers, who bring their own determination of value. Along with the need for clinical utility, payers are anxious to evaluate value, balancing both cost and outcome differences in a trade-off. Because many of these diagnostics range in retail price between $200 and $3,000 (Human Genome Project, 2008), payers need to understand what downstream value they are receiving for coverage, in terms of either clinical improvement or cost avoidance or both.
The purpose of this paper is to outline some of the key methodological considerations in determining the value of personalized diagnostics. Although these are not necessarily different methods from those used to evaluate other healthcare technologies, there are nuances in personalized medicine that make some of these key considerations more or less challenging, and these are highlighted and explored. Also, while no single method necessarily determines value, transparency around study approach and careful consideration of key methodological questions would make these value determinations more relevant to decision makers.
Perspective of the Evaluation
The first consideration in conducting these studies is to determine in advance the perspective of the decision maker who will use the results. The perspective drives which end points are considered and over what time period the evaluation is conducted. For example, if the perspective is to be that of self-insured employers and the candidate diagnostic is to be used among their actively working population, the employers may be interested in trading off the diagnostic-associated costs with an understanding of total healthcare costs avoided (e.g., doctor visits, drug costs avoided or increased, ER visits, hospitalizations); differences in absenteeism, “presenteeism,” and short- and long-term disability; and any other metric that influences their bottom line. If, however, the decision maker is the caregiver of an elderly patient, the caregiver may be interested in examining not only the relative clinical benefits that accrue by virtue of testing, but also the impact of those benefits on caregiver burden and the accompanying savings in terms of cost and human burden that can accrue. A listing of decision-maker perspectives and candidate end points is provided in Table 3-4.
TABLE 3-4 Listing of Perspectives and Potential Value End Points (column headings recovered: Direct Medical Costs; Indirect Medical Costs)
The most challenging area for this issue within personalized diagnostics is the value assessment of those tests that determine the relative risks of developing a chronic illness in the far future (i.e., predisposition testing). The determination of ultimate value would have to explore the long-term natural history impact of testing versus not testing, but the perspective for many decision makers may not be consonant with this type of long-term outcome avoidance. For example, there are now personalized tests that can provide the long-term probability of developing Alzheimer’s disease in patients with mild cognitive impairment (Shaw et al., 2009). The value for an employer to cover the costs of these tests would have to trade off the incremental cost with the benefit of providing the probability to the employee. If the hypothetical value were to include a hypothetical treatment in 20 years, the value might be the avoidance of all the current symptomatology and ensuing costs associated with Alzheimer’s care as we experience it today. However, this could be a 40-year net benefit on a test conducted and reimbursed today, with uncertainty around the probability from the test itself, the value of the hypothetical treatment, its costs, and the time horizon. All of this might make the value equation, even if conducted properly, irrelevant to the employer who may not necessarily bear the financial risk for this employee in 40 years. On the other hand, for the consumer, the net benefits could be improved peace of mind, improved quality of life, or even an influence on life planning. All of these considerations could impact the study design.
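The employer's likely indifference to a benefit that arrives 40 years out can be made concrete with standard discounting. The sketch below is only an illustration and is not drawn from the workshop paper: the discount rate, the annual probability that the employee remains with the employer, and the dollar value of the future benefit are all hypothetical assumptions.

```python
def present_value(benefit, years, discount_rate):
    """Discount a future benefit back to today's dollars."""
    return benefit / (1.0 + discount_rate) ** years


def employer_expected_value(benefit, years, discount_rate, annual_retention):
    """Present value weighted by the chance that the employer still bears
    the financial risk for this employee when the benefit arrives."""
    still_at_risk = annual_retention ** years
    return present_value(benefit, years, discount_rate) * still_at_risk


# Hypothetical inputs, chosen only for illustration.
FUTURE_BENEFIT = 100_000.0  # avoided care costs, in dollars (assumed)
HORIZON_YEARS = 40          # the 40-year horizon mentioned in the text
DISCOUNT_RATE = 0.03        # assumed annual discount rate
ANNUAL_RETENTION = 0.90     # assumed chance the employee stays each year

societal_pv = present_value(FUTURE_BENEFIT, HORIZON_YEARS, DISCOUNT_RATE)
employer_pv = employer_expected_value(
    FUTURE_BENEFIT, HORIZON_YEARS, DISCOUNT_RATE, ANNUAL_RETENTION
)
print(f"Present value, long-horizon perspective: ${societal_pv:,.0f}")
print(f"Present value, employer perspective:     ${employer_pv:,.0f}")
```

Under these assumed inputs the same future benefit is worth far less to the employer than to a decision maker who bears the risk for the full horizon, which is the mismatch of perspectives described above.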
Study Designs and Sensitivity Analyses
The value of personalized diagnostics can be determined from a variety of study designs commonly considered in healthcare economic evaluations (Table 3-5).
Most decision makers in the United States are interested in cost-benefit analysis, where both the costs and the benefits are stated in economic terms. For example, the American Enterprise Institute (AEI) and the Brookings Institution published a study of the value of warfarin testing from the perspective of national healthcare costs (i.e., not a particular payer and excluding indirect and intangible costs) (McWilliam et al., 2006). Warfarin, a blood thinner, can cause significant adverse effects, including strokes and hemorrhaging. The authors estimated the costs associated with genotyping and forecasted an expected savings from avoiding 18,000 strokes and 85,000 serious bleeds, with assumptions and sensitivity analysis provided. This led to a base case estimate of $1.1 billion saved after netting out the projected costs of testing, with the sensitivity analysis ranging between $100 million and $2 billion in savings.
TABLE 3-5 Study Designs of Value in Personalized Diagnostics
Cost-benefit analysis: Weighs the total expected costs of the intervention against the total expected benefits (both costs and benefits are estimated in dollars)
Cost-effectiveness analysis: Costs are estimated in dollars, but benefits are estimated in terms of outcomes such as years of life gained or premature deaths averted
Cost-utility analysis: A special case of cost-effectiveness in which costs are estimated in dollars but benefits are measured in full health lived and expressed in quality-adjusted life-years or disability-adjusted life-years
What is important about this paper is that all the estimates are made explicit, and the sensitivity analyses allow for modifying these assumptions. For example, their base case assumes a 50 percent reduction in strokes with genotyping. However, one of their sensitivity analyses provides a revised net benefit estimate if strokes are reduced by only 5 percent ($487 million). Likewise, they provide a projected net benefit if the assumed bleeding rates are reduced by only 5 percent ($387 million in projected savings). The transparency around study design and aspects therein allows for interpretation by the particular decision maker. Interestingly, even with this relatively straightforward type of study design, Zarnke and colleagues found that 68 percent of published cost-benefit papers did not use standard methods of assessment in their research and more than 50 percent were incomplete (Zarnke et al., 1997).
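The mechanics of a one-way sensitivity analysis of this kind can be sketched briefly. The code below is an illustration under stated assumptions, not the AEI-Brookings model: the counts of strokes and bleeds avoided come from the text, but the per-event costs, the number of patients tested, and the cost per test are hypothetical, so the output does not reproduce the published estimates. The function names are likewise invented for this example. Scaling the strokes avoided from 18,000 to 1,800 assumes, for simplicity, that events avoided fall in proportion when the assumed reduction drops from 50 percent to 5 percent.

```python
def net_savings(strokes_avoided, bleeds_avoided, cost_per_stroke,
                cost_per_bleed, patients_tested, cost_per_test):
    """Cost-benefit net savings: avoided event costs minus testing costs."""
    avoided_costs = (strokes_avoided * cost_per_stroke
                     + bleeds_avoided * cost_per_bleed)
    testing_costs = patients_tested * cost_per_test
    return avoided_costs - testing_costs


BASE_CASE = {
    "strokes_avoided": 18_000,     # from the text (base case)
    "bleeds_avoided": 85_000,      # from the text (base case)
    "cost_per_stroke": 40_000.0,   # hypothetical
    "cost_per_bleed": 10_000.0,    # hypothetical
    "patients_tested": 2_000_000,  # hypothetical
    "cost_per_test": 300.0,        # hypothetical
}


def one_way_sensitivity(parameter, values, base_case=BASE_CASE):
    """Vary one input at a time while holding the others at base case."""
    return {value: net_savings(**{**base_case, parameter: value})
            for value in values}


# Strokes avoided under a 50, 25, and 5 percent assumed reduction.
stroke_scenarios = one_way_sensitivity("strokes_avoided",
                                       [18_000, 9_000, 1_800])
for strokes, savings in stroke_scenarios.items():
    print(f"{strokes:>6,} strokes avoided -> net savings ${savings:,.0f}")
```

Because every input is named and can be overridden, a decision maker can rerun the calculation with local values, which is the practical benefit of the transparency described above.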
What can be confusing are differing study designs that provide apparently opposite answers to the same research question. For example, a recent cost-utility assessment of warfarin testing came to the conclusion that warfarin genotyping was not cost-effective (Eckman et al., 2009). In this study, the authors took the perspective of society in that they valued the benefits not only in terms of costs avoided, as the AEI-Brookings Institution had done, but also the quality impact of having had these events as expressed by quality-adjusted life-years. Since they too were transparent about their assumptions, the opposite conclusion from this study can be readily examined. For instance, they assumed in their base case a one-month benefit from genotyping. Thus, they did not consider avoided bleeds or strokes that may occur after a month of therapy. Had they considered even three months as a reasonable time frame for assessing net benefit, their paper shows that warfarin genotyping would be cost-effective. Additionally, they estimated costs accrued to genotyping for delaying initiation of therapy until the genotype test results were completed. Most clinicians would start the loading dose when the need occurred and not wait for genotyping. This assumption biased their model against genotyping. Last, the expected incidence of bleed or stroke was highly conservative, so the avoidance of events was minimal even if they decreased by 50 percent. Varying any of those assumptions would have made genotyping cost-effective.
Since base case assumptions in both studies drove their overall conclusions, this is a worrisome issue for personalized diagnostics. The study design in its execution can provide only an estimate of net value; what is most important is transparency about the study design itself, all assumptions, and all sensitivity analyses. The decision maker should be allowed to make a judgment based on his or her particular inputs or perspective; thus, overall conclusions could vary. So for evaluating the value of warfarin genotyping, academic medical centers that enforce frequent international normalized ratio (INR) testing, which monitors the effects of warfarin, may ultimately not find genotyping of net value since their avoidable event rates may already be quite low, whereas rural healthcare centers where frequent INR testing is not practical may find it extremely valuable.
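How the time horizon and the local baseline event rate can flip a cost-utility verdict is easy to show with a toy calculation. The sketch below is not the Eckman model: every number in it is hypothetical, the $50,000 per quality-adjusted life-year threshold is an assumed convention, and it treats the monthly event rate as constant over the horizon, which is a simplification.

```python
def genotyping_verdict(test_cost, monthly_event_rate, relative_reduction,
                       horizon_months, cost_per_event, qaly_loss_per_event,
                       threshold=50_000.0):
    """Return (verdict, cost per QALY) for genotyping one patient."""
    events_avoided = monthly_event_rate * horizon_months * relative_reduction
    incremental_cost = test_cost - events_avoided * cost_per_event
    qalys_gained = events_avoided * qaly_loss_per_event
    if incremental_cost <= 0:
        return "cost-saving", None
    if qalys_gained <= 0:
        return "not cost-effective", None
    icer = incremental_cost / qalys_gained
    verdict = "cost-effective" if icer <= threshold else "not cost-effective"
    return verdict, icer


# Hypothetical inputs shared by all scenarios.
COMMON = dict(test_cost=300.0, relative_reduction=0.3,
              cost_per_event=20_000.0, qaly_loss_per_event=0.5)

# Hypothetical settings: (monthly event rate, horizon in months).
SCENARIOS = {
    "higher event rate, 1-month horizon": (0.010, 1),
    "higher event rate, 3-month horizon": (0.010, 3),
    "lower event rate, 3-month horizon": (0.002, 3),
}

for label, (rate, months) in SCENARIOS.items():
    verdict, icer = genotyping_verdict(monthly_event_rate=rate,
                                       horizon_months=months, **COMMON)
    detail = "" if icer is None else f" (${icer:,.0f} per QALY)"
    print(f"{label}: {verdict}{detail}")
```

With these assumed inputs, lengthening the horizon moves the same test from above to below the threshold, while a setting with a low avoidable event rate stays above it; the conclusion depends on the inputs a given decision maker considers realistic.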
Scenarios, Populations, and Subgroups
There is generally a need to evaluate at least two scenarios: the treatment patterns and outcomes as they exist today and the changed environment in which a new personalized diagnostic would be used. This entails understanding the natural history of the condition or situation that exists today without personalized diagnostics, on which the future presumed state is layered.
For personalized diagnostics, this can be particularly challenging. Take, for example, the current diagnosis and management of Type 2 diabetes. Let us assume the base case examines a population over age 40 which undergoes routine screening for diabetes and then manages it once diagnosed. There is a predictable natural history that has been well elucidated on which to model the costs and routine interventions from pre-diabetic all the way to frank diabetes as well as the probabilities of various ensuing complications. If we were to examine the role of a new gene marker that is highly predictive of developing Type 2 diabetes, the scenario under which it is used might consider incremental costs for broader population screening at an earlier age than usual (since predictive markers might signal an even earlier scheme of lifestyle change), earlier interventions that could be diet or medication related, some estimated number of diabetic cases avoided through this genetic screening, at what age, and so forth. The comparison of these scenarios can be upended with personalized diagnostics, because personalized diagnostics can change our view of when to intervene with “typical” patients. Standard comparative evaluations today generally start at the same point—when the person is diagnosed.
Subgroups pose another challenge for personalized diagnostics. Take the example of K-ras mutations and the drug cetuximab. The value of the drug in providing two extra months of life was originally determined from clinical trials of metastatic colon cancer patients but without respect to a biomarker (Jonker et al., 2007). As genomic information was gleaned that the drug may not work for patients with K-ras mutations, post hoc analyses were conducted on the original clinical trial participants (Karapetis et al., 2008). The overall finding was that there was no benefit in the 30 percent of patients with mutations, leaving the results with the wild-type patients even better than those for the original overall cohort (Karapetis et al., 2008). However, to date, the Food and Drug Administration has not relabeled the drug, presumably because the patients were not stratified by K-ras status before randomization. Thus, the value of the drug is presumed to be lower overall than perhaps it should be if used in a targeted manner. Both the National Comprehensive Cancer Network and the American Society of Clinical Oncology guidelines suggest genomic testing despite the unchanged label, but this provides an example of value as determined in subgroups and the controversies in determining causation.
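The subgroup arithmetic behind this finding can be written out. As a deliberately simplified sketch, suppose the average benefit in the full cohort is a weighted average of the subgroup benefits. This assumption holds for mean effects but not exactly for the median survival figures that trials usually report, and the symbols below are introduced only for illustration.

```latex
\bar{\Delta} \;=\; p_{\mathrm{wt}}\,\Delta_{\mathrm{wt}} \;+\; p_{\mathrm{mut}}\,\Delta_{\mathrm{mut}},
\qquad p_{\mathrm{wt}} + p_{\mathrm{mut}} = 1 .

% With no benefit in the 30 percent of patients who carry a mutation:
p_{\mathrm{mut}} = 0.30,\quad \Delta_{\mathrm{mut}} = 0
\;\;\Longrightarrow\;\;
\Delta_{\mathrm{wt}} \;=\; \frac{\bar{\Delta}}{0.70} \;\approx\; 1.43\,\bar{\Delta}.
```

Under this simplification, all of the benefit observed in the overall cohort is concentrated in the wild-type patients, so their benefit is roughly 40 percent larger than the cohort average, and a value estimate based on the unstratified average understates the value of targeted use.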
The impact of the time horizon on the value proposition is important and again challenging, particularly for personalized diagnostics. As described earlier, for those diagnostics that are predisposition tests, the value would have to acknowledge the downstream effects, which may occur so far away in the future that the decision maker is essentially indifferent. Alternatively, as also illustrated earlier in the study of the cost utility of warfarin, the time horizon of benefit could be truncated into such a short time period that there would not be enough events witnessed for any benefit to accrue.
For pharmacogenomics, the time horizon for value estimation relates to the downstream influence of the biomarker on the selection, dose, or duration of therapy and its ensuing outcomes. If, for example, value were to be determined for tamoxifen users with breast cancer, the time horizon from the 10-year outcomes trials where cytochrome P-450 2D6 metabolism status was related to breast cancer recurrence could be used (Goetz et al., 2007). If the 10 percent of women who were poor metabolizers of tamoxifen were assumed to be tested by year 1 and switched to an aromatase inhibitor, their outcomes could be estimated to be what has been shown with aromatase inhibitors. The 10-year calculated costs of recurrent cancers avoided would be compared to the increased costs associated with testing and incremental drug costs (since branded aromatase inhibitors are more expensive than generic tamoxifen).
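The comparison described for tamoxifen can be laid out as a simple ledger. The sketch below is a minimal illustration, assuming undiscounted costs over the 10-year horizon; only the 10 percent share of poor metabolizers comes from the text, and the cohort size, test price, drug cost difference, years on the alternative drug, recurrence risks, and cost per recurrence are all hypothetical.

```python
def tamoxifen_testing_ledger(cohort_size, poor_metabolizer_share, test_cost,
                             extra_drug_cost_per_year, years_on_inhibitor,
                             recurrence_risk_tamoxifen,
                             recurrence_risk_inhibitor, cost_per_recurrence):
    """Compare 10-year costs added by testing with recurrence costs avoided.

    The recurrence risks apply to poor metabolizers only; everyone is
    tested, but only poor metabolizers switch to an aromatase inhibitor.
    """
    switched = cohort_size * poor_metabolizer_share
    testing = cohort_size * test_cost
    extra_drug = switched * extra_drug_cost_per_year * years_on_inhibitor
    recurrences_avoided = switched * (recurrence_risk_tamoxifen
                                      - recurrence_risk_inhibitor)
    savings = recurrences_avoided * cost_per_recurrence
    return {
        "testing": testing,
        "extra_drug": extra_drug,
        "recurrences_avoided": recurrences_avoided,
        "savings": savings,
        "net_cost": testing + extra_drug - savings,  # negative means net savings
    }


ledger = tamoxifen_testing_ledger(
    cohort_size=1_000,               # hypothetical
    poor_metabolizer_share=0.10,     # from the text
    test_cost=500.0,                 # hypothetical
    extra_drug_cost_per_year=3_000.0,  # hypothetical brand-versus-generic gap
    years_on_inhibitor=4,            # hypothetical
    recurrence_risk_tamoxifen=0.30,  # hypothetical 10-year risk
    recurrence_risk_inhibitor=0.15,  # hypothetical 10-year risk
    cost_per_recurrence=100_000.0,   # hypothetical
)
for item, amount in ledger.items():
    print(f"{item:>20}: {amount:,.1f}")
```

With these assumed inputs the added testing and drug costs slightly exceed the recurrence costs avoided; a different test price or recurrence cost would reverse the sign, which is why the inputs need to be stated explicitly.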
The imputation of cost is probably one of the most important aspects of value estimation for most decision makers. Which costs matter is tied to the perspective (e.g., health plans generally focus on direct medical care costs, employers may also value work loss, consumers may also value quality of life). This does not differ for evaluations of personalized diagnostics. All of the usual methodological considerations for estimating direct, indirect, and intangible costs would apply in the usual fashion here.
What is challenging specifically for this field is to estimate a fixed cost of particular tests, given the accelerated pace of improvement in technology with associated reductions in prices for many of the tests. So while whole genome sequencing was $300 million just six years ago, it is predicted to decrease to $1,000 in the next five years (Next Big Future, 2008; Wade, 2006). Also, given that patients may someday have inexpensive whole-genome scans conducted as a matter of public health in childhood (like vaccinations), there would be no incremental cost due to testing later in life because genes do not change. The remainder of costing for these value equations would focus on the costs and benefits associated with changing the natural history of the condition under consideration, not the testing fees.
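The pace of price decline implied by these two figures can be computed directly. The sketch below assumes a constant year-over-year price multiplier across roughly 11 years (six years back plus five years forward); that smooth trajectory is an illustrative assumption, not a forecast from the sources cited.

```python
def annual_price_factor(start_price, end_price, years):
    """Constant yearly multiplier that takes start_price to end_price."""
    return (end_price / start_price) ** (1.0 / years)


def projected_price(start_price, factor, years_elapsed):
    """Price after a number of years at a constant yearly multiplier."""
    return start_price * factor ** years_elapsed


START_PRICE = 300_000_000.0  # whole-genome sequencing cost cited in the text
END_PRICE = 1_000.0          # predicted cost cited in the text
SPAN_YEARS = 11              # six years elapsed plus five years ahead (assumed)

factor = annual_price_factor(START_PRICE, END_PRICE, SPAN_YEARS)
print(f"Implied yearly price multiplier: {factor:.3f}")
print(f"Implied yearly price decline:    {1.0 - factor:.1%}")
for year in range(0, SPAN_YEARS + 1):
    price = projected_price(START_PRICE, factor, year)
    print(f"year {year:>2}: ${price:,.0f}")
```

A decline this steep means that a value assessment using today's test price can be out of date within a year or two, which supports treating the test price as a variable in sensitivity analysis instead of a fixed input.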
Determining the value of personalized diagnostics requires the same methodological considerations as determining the value of other healthcare interventions. However, there are nuances for personalized diagnostics. These nuances include upending the timing and determination of when someone is ill; the implied treatment course; estimating cost for tests in a changing environment with cheaper and cheaper tests; the need for assuming value even if information is not derived from pristine, randomized controlled trials (as in the cetuximab example above); and special attention to the time horizon for estimated benefits and costs. The term “evidence-based” should not be limited to the evidence derived from randomized trials alone. More studies are needed that use alternative research designs, are conducted in more typical practice settings, and enable the measurement of outcomes such as provider adoption and time to optimal therapy. This is particularly true for personalized diagnostics where the knowledge of the technology and benefits among many providers is still minimal and the interest among the consumer and payer is quickly increasing.
MEDICAL DEVICE VALUE AND INNOVATION
John Hernandez, Ph.D., M.P.P., and Parashar B. Patel, M.P.A., Boston Scientific
Modern society places a high value on the advancement of medical technology, and stories of medical device innovations extending and improving patients’ lives are celebrated as modern miracles. Yet such devices, including implantable defibrillators, drug-eluting stents, and cochlear implants, are also visible embodiments of medical technology advancements that are criticized for driving the high costs of medical care (Newhouse, 1992).
Other criticisms levied at medical device technologies relate to gaps in clinical evidence and regulatory requirements, clinical uses for unproven indications, the need for comparative effectiveness research trials, and questions about the affordability and value of new technologies, among others (Deyo and Patrick, 2005; Kessler et al., 2004).
Despite, or perhaps arising from, these criticisms, profound changes have taken place in the medical device arena over the past decade, including increased emphasis on evidence-based medicine. Device innovators have stepped up to the challenges raised by regulators, payers, professional societies, and technology assessment organizations to rigorously demonstrate the clinical and economic value of their therapies. While additional reforms remain ahead, many proposed solutions are in place or rapidly emerging.
Close examination of key medical device inventions reveals that in many respects, their development has represented a “vindication of the scientific method” (Mueller and Sanborn, 1995). Early serendipitous discoveries and the synthesis of advancements across diverse disciplines often proceed in a nonlinear and discontinuous pattern, eventually resulting in the development of beneficial new technologies and procedures. Despite the perception of rapid device development and proliferation, innovation frequently requires decades of research and development before devices are first made available to patients. In many cases, physician pioneers and device inventors must overcome conventional wisdom and resistance by the medical establishment before new approaches are even considered. Only after new devices are introduced into clinical practice does their use spawn the iterative process of technological, procedural, and other clinical practice improvements that continuously improve quality and outcomes.
A large body of rigorous research evidence using a wide range of designs (appropriate to the stage of technology evolution and the nature of research questions at the time) has demonstrated that many of these device innovations provide both clinical and health economic benefits. Increasingly, randomized controlled device trials form the evidentiary standard for regulatory approval, reimbursement, and professional adoption.
That said, there are examples, such as cochlear implants, where experts widely agree that randomized controlled trials to demonstrate efficacy would have been unethical. Large-scale registries and surveillance studies are becoming the norm to identify safety concerns and track real-world patient outcomes. While it is inherently challenging to gauge the economic value of device technologies at the earliest stages of their life cycle, many studies have demonstrated cost-effectiveness and sometimes cost savings.
Despite progress in ensuring that medical technology innovations are grounded in solid evidence, the research agenda will always remain unfinished and health policy makers should recognize that there are limits to the clinical evaluative process (Gelijns et al., 2005). We encourage attention to the potential impact of additional reforms on medical innovation. Ultimately, device innovators need a predictable framework to foster new innovations that benefit patient care.
Clinical Benefits of Medical Devices
Sometimes lost amidst criticisms of medical device industry shortcomings are the many proven, evidence-based, and often dramatic benefits of medical devices in extending and improving the lives of millions of patients. The following are examples of medical device inventions that have revolutionized the treatment of a variety of deadly or debilitating conditions.
Cardiac pacemakers: Patients developing complete heart block experience repeated syncopal episodes leading, in many cases, to cardiac arrest. Although isolated reports of artificial pacemakers being used to successfully resuscitate patients from cardiac arrest date back to the 1920s, they were ignored as completely impractical until advances in electronics coincided with the new era of open heart surgery. The introduction of the first cardiac pacemakers by Paul Zoll, Wilson Greatbatch, and others in the 1950s and 1960s built on decades of earlier research, enabling the pacemakers to sustain the lives of patients experiencing acute heart block episodes that would otherwise have been fatal. Huge advances in pacemaker technology, including fully implantable devices, interactive sensing and pacing algorithms, and remote device monitoring, have since taken place that vastly improved patient outcomes (Jeffrey, 2001).
Implantable defibrillators: Sudden cardiac death represents a serious national health problem, accounting for more than 300,000 deaths annually (or 13 percent of all natural deaths) because of deadly ventricular arrhythmias (Myerburg et al., 1993). The development of the implantable defibrillator by Michel Mirowski and his research team from 1969 through 1980 built on earlier success with cardiac pacemakers, but required major research advancements to create implantable devices that could shock and restart the heart’s normal rhythm after episodes of ventricular fibrillation. Implantable cardioverter defibrillators (ICDs) save one life for every 3 to 11 patients implanted by reducing sudden cardiac death and mortality among implanted patients (Camm et al., 2007). More than a dozen landmark randomized controlled trials involving more than 8,000 patients have shown reductions in overall mortality of 23 to 55 percent with ICD therapy compared to optimal medication therapy (Ezekowitz et al., 2007). ICDs have also benefited from technology advancements including pacing capabilities to prevent deadly arrhythmias, algorithms to resynchronize multiple chambers of the heart, endocardial defibrillation leads, remote device and patient management capabilities, and smaller implants with extended battery longevity (Jeffrey, 2001).
Cochlear implants: Severe, profound deafness imposes a tremendous burden on both the hearing impaired and society, and afflicts an estimated 500,000 Americans. The deaf require specialized schooling and costly social welfare services, and they are the lowest-wage earners of all disabled patients. Even with specialized support, most never graduate high school, and graduates on average attain only a third grade reading level (Parisier, 2003). Beginning in the mid-1950s, William House and others undertook research and development that, over three decades, finally culminated in the FDA’s approval of the first cochlear implant in 1984 (Foote, 1992). Substantial obstacles needed to be overcome, including lack of research funding by the National Institutes of Health (NIH) and professional society refusal to publish research results due to concerns over ethics and long-term effects of permanent implants. While the first-generation single-channel implants provided very limited hearing benefits, technological advancements over the next two decades (including multichannel and bilateral implants with sophisticated speech processing algorithms) have been so dramatic that cochlear implants are now widely described as a miracle treatment. Prelingually deaf children can now be implanted at 12 months of age, enabling them to participate fully in the hearing world. Studies have demonstrated that patients with cochlear implants can hear and understand speech in challenging listening environments that approach the levels of their normal-hearing counterparts (Cheng and Niparko, 1999; Cheng et al., 2000a).
Percutaneous coronary interventions, including drug-eluting stents: Percutaneous coronary interventions (PCIs) subsume a number of catheter-based procedures, including coronary balloon angioplasty and stenting, that are used to clear narrowed or blocked coronary arteries. Coronary artery disease remains the leading cause of death in both the United States and Europe, and it imposes major costs on society. When percutaneous transluminal coronary angioplasty (PTCA) was introduced by Dr. Andreas Gruentzig in 1976, its viability was met with widespread skepticism in the physician community even though cardiac catheterization procedures had already become common after their introduction in the 1940s. Yet since commercialization in 1980, balloon angioplasty and coronary stenting have revolutionized cardiology by enabling effective, minimally invasive treatments and becoming the dominant form of coronary revascularization (nearly 3 to 1 over coronary bypass grafting) for patients with coronary artery disease (Mueller and Sanborn, 1995; Smith et al., 2001). Multiple generations of technology improvements have led to greatly improved results, and PCI devices have now been studied in more rigorous clinical studies than any other medical intervention in history. Patient outcomes have been carefully tracked since the inception of the procedure in NIH-sponsored clinical registries (Detre et al., 1988; Williams et al., 2000), and a large number of randomized controlled trials rigorously demonstrated benefits of PTCA, coronary stents, and drug-eluting stents. For example, 39 randomized trials including more than 16,000 patients showed clinical superiority of stents over PTCA (Hill et al., 2004), and 19 randomized trials involving 9,000 patients showed clinical superiority of drug-eluting stents over traditional stents (Roiron et al., 2006).
Neurovascular coiling: Ruptured brain aneurysms are devastating events that have an extremely poor prognosis, with a one-year mortality rate of 50 percent and an additional 30 percent of patients suffering permanent neurological and cognitive deficits (Lindberg et al., 1992; Sacco et al., 1984). Neurovascular coils for occluding intracranial aneurysms were developed by Guido Guglielmi in the 1980s and commercially introduced in 1995 as a minimally invasive alternative to open neurosurgery (Guglielmi, 1997). Originally developed to provide a treatment option for patients at high risk for surgery, coiling has now largely supplanted surgical clipping as first-line treatment for these patients. The randomized International Subarachnoid Aneurysm Trial found that the minimally invasive coiling treatment reduced mortality and significant disability by 23 percent compared to surgery, leading to major changes in clinical practice (Derdeyn et al., 2003; Molyneux et al., 2002). Since its inception, coiling embolization has evolved through clinical experience, with improved patient selection and introduction of technological improvements. Technological advancements include introduction of new coil sizes and shapes, microcatheters, bioactive coils, and new detachment mechanisms that have improved outcomes.
Cardiac ablation: Numerous heart arrhythmias can now be cured with minimally invasive catheters that ablate damaged heart tissue using radio-frequency energy, with success rates of around 90 percent versus less than 40 percent for treatment with medical therapy (Blomstrom-Lundqvist et al., 2003; Center for Devices and Radiologic Health, 2002). When introduced in the early 1990s for specific atrial arrhythmias, radio-frequency catheter ablation was shown to be so much more effective than alternative treatments that randomized controlled trials were determined to be unethical. When catheter ablation was later studied for broader application to treat atrial fibrillation, multiple randomized trials showed large benefits of ablation over drugs, improving the treatment prognoses for these patients (Nair et al., 2008; Noheria et al., 2008). A series of additional head-to-head randomized trials of different ablation approaches has subsequently refined evidence and practice consistent with FDA recommendations (U.S. Food and Drug Administration, 2008b).
These and many other highly efficacious device-based treatments (including prosthetic heart valves, artificial joint replacements, advanced imaging technologies, wound management devices, and implantable neurostimulators for a variety of conditions) form the basis for the often-described explosion of medical technology innovation that has extended and improved quality of life for millions of Americans.
Medical Device Regulatory Trends
Regulation of medical devices represents an inherently complex challenge that has evolved dramatically over the past three decades since Congress provided the FDA with broad authority to do so in 1976. Congress and experts have explicitly concluded, after years of study and oversight, that no single regulatory evidentiary standard is appropriate to encompass the vast diversity of devices ranging from simple and ancillary devices (e.g., bandages, splints, surgical drapes) to extremely complex permanent implants (e.g., cardiac pacemakers, defibrillators, cochlear implants). Instead, Congress consciously adopted a flexible, tiered standard that provided the FDA with substantial regulatory discretion to develop valid evidence requirements to ensure that devices are safe and effective and to adjust requirements for specific devices based on expert input reflecting current scientific standards and knowledge (Advanced Medical Technology Association, 2008; Feigal et al., 2003; Merrill, 1994; Munsey, 1995).
Congress has closely overseen the FDA regulatory process for medical devices and enacted major legislative reforms over time to expand and modernize the agency’s regulatory framework in a manner that protects public health while enabling access for patients to beneficial new device technologies. The FDA has developed processes that require evidentiary development both before and after regulatory approval.
Although some critics have questioned the degree of clinical evidence required by the FDA to establish the safety and effectiveness of medical devices, Congress has repeatedly and deliberately declined to adopt the same regulatory standards that apply to drugs. Instead, recognizing the important differences between drugs and devices, Congress provided the FDA with discretionary authority to develop and adjust requirements, based on input from well-qualified experts, well-controlled clinical trials, and other valid scientific evidence for devices (Advanced Medical Technology Association, 2008). Over time, the FDA has moved toward requiring randomized controlled trials for many high-risk devices as well as expanding clinical trial requirements for some 510(k) devices before approval. This has contributed to a large number of randomized device trials. Issuance of a series of FDA Guidance Documents has further documented the evolution of randomized controlled trial requirements for approval of various device types, including drug-eluting stents (U.S. Food and Drug Administration, 2008c), cardiac ablation devices (U.S. Food and Drug Administration, 2004a, 2008b), vertebroplasty devices (U.S. Food and Drug Administration, 2004b), and total artificial disks (U.S. Food and Drug Administration, 2008d).
As device interventions and technologies evolve, regulatory standards have been adapted for specific devices in recognition that different types of clinical evidence have been appropriate during different stages in the development of these technologies. For example, after randomized controlled trials demonstrated benefits of coronary stents versus alternative treatments in 1994, the FDA began accepting randomized trials demonstrating equivalency to approved stents rather than comparisons to angioplasty or placebo. As technology matured, the FDA required later-generation products to document performance consistent with objective performance criteria (OPC) based on clinical evidence from single-arm studies while requiring randomized trials for new technologies such as drug-eluting stents (U.S. Food and Drug Administration, 2008a). The FDA has adopted similar OPC standards for other mature device technologies including prosthetic heart valves and joint replacements, and evaluations have concluded that this approach is safer and more efficient for patients than randomized controlled trials (Grunkemeier et al., 2006). In other cases, such as cochlear implants, it has been widely accepted that clinical evidence from single-arm (rather than randomized controlled) trials was appropriate to demonstrate safety and effectiveness from the outset since the natural history of treatment was well understood.
Post-approval clinical studies are increasingly required by the FDA to demonstrate safety and long-term outcomes of devices in large real-world treatment populations. While the FDA has required more than 80 post-market surveillance studies of different devices since 2005, some have called for a significant expansion of post-approval studies to evaluate real-world treatment outcomes in the recognition that pre-market trials have limitations and the medical device reporting (MDR) system for adverse events has major deficiencies and provides inconsistent data. Expanding on past experiences with FDA-mandated surveillance studies in addition to post-approval studies voluntarily developed under the sponsorship of the NIH, professional societies, foreign governments, and manufacturers could provide more high-quality data to evaluate safety and effectiveness (Mehran et al., 2004).
Cross-Stakeholder Collaborative Efforts
There are many instances in which landmark randomized controlled trials have been sponsored by the NIH and other government agencies to strengthen evidence regarding the comparative effectiveness of devices. Examples include NIH-sponsored randomized trials of implantable defibrillators (MUSTT, AVID, SCD-HeFT) (Camm et al., 2007), left-ventricular assist devices (REMATCH) (Rose et al., 2001), and deep-brain stimulators (Weaver et al., 2009), among others. In some instances, device manufacturers and independent researchers have proactively sponsored randomized trials to strengthen the evidence basis for approved indications even after widespread coverage and adoption are in place. Examples include randomized trials of spinal cord stimulation that have confirmed its efficacy and cost-effectiveness for chronic neuropathic pain (Kemler et al., 2004; Kumar et al., 2007; North et al., 2005) and trials of vertebroplasty for osteoporotic vertebral compression fractures (Gray et al., 2007; Voormolen et al., 2007). Many of these examples serve as excellent models of “comparative effectiveness” research trials that should be expanded into other areas of medicine.
In addition, the NIH and others have sponsored a variety of real-world device registries (e.g., National Heart, Lung, and Blood Institute PCI Registries [Detre et al., 1988; Hill et al., 2004; Williams et al., 2000], Swedish National Hip Replacement Registry [Malchau et al., 2002]) to systematically track real-world clinical outcomes and support adoption of evidence-based improvements as device technology and practice evolve over time. These efforts complement registries sponsored by manufacturers and professional societies to track patient outcomes for a variety of devices. In the case of the recent NIH Wingspan Intracranial Stent Registry, data are supporting development of a definitive randomized controlled trial to rigorously evaluate clinical efficacy (Zaidat et al., 2008).
Perhaps the most profound trend impacting medical devices has been the escalation of both the evidence standards required by payers to provide coverage and the adequate funding levels needed to support clinical adoption.
As the largest payer in the world, the Medicare program exerts a huge influence on medical technology innovation. Numerous coverage decisions and other developments in the Medicare coverage process have made clear that the evidence bar has risen substantially over the past decade for reimbursement of new technologies. Since 1998, the Medicare Coverage Advisory Committee (MEDCAC) has provided expert reviews of scientific and clinical evidence in a public forum that increased the visibility of several major Medicare coverage decisions. The strength of the evidence used by the Centers for Medicare and Medicaid Services (CMS) in making decisions, while perhaps not yet at levels desired by many critics, is improving. Importantly, CMS declined to provide coverage in one-third of national coverage decisions from 1999 to 2007 and, when granting coverage, issued conditions in almost 60 percent of cases (Neumann et al., 2008).
One example of the increased strength of evidence required is the 2003 CMS decision to approve coverage for defibrillators only for a subgroup of the patient population studied in the Multicenter Automatic Defibrillator Implantation Trial II (MADIT II). CMS made this decision despite FDA approval, unanimous approval from MEDCAC, and practice guidelines written jointly by three physician societies supporting the clinical benefits of defibrillators for the entire MADIT II population. While CMS recognized that MADIT II was a well-designed randomized trial, it was hesitant to provide broad coverage based on results from a single trial (Centers for Medicare and Medicaid Services, 2003). CMS did not cover the entire MADIT II population until results were confirmed by another large, well-designed trial (Sudden Cardiac Death in Heart Failure Trial [SCD-HeFT]).
Two other major examples of a higher evidentiary standard are CMS coverage decisions for left ventricular assist devices (LVADs) and carotid artery stenting (CAS), both of which impose narrow coverage criteria. For example, CMS agreed to extend coverage for LVADs as “destination therapy” for end-stage heart failure patients meeting the REMATCH study criteria. However, CMS authorized coverage only at designated heart transplant facilities that have performed a threshold volume of LVAD procedures and met other criteria established by CMS (Ursula et al., 2007). For CAS, CMS restricted coverage to 24 percent of patients within the FDA-approved patient population.4 Coverage for CAS is further restricted to Medicare-certified sites and only if the site collects data on all CAS procedures performed at the site (Centers for Medicare and Medicaid Services, 2008a).
CMS has issued several other coverage decisions mandating participation in CMS-approved clinical trials or registries for devices as a condition of Medicare coverage (Tunis et al., 2007). The most highly publicized was the 2005 National Coverage Determination for implantable defibrillators, based on the SCD-HeFT trial, which led to the creation of a national ICD registry that tracks real-world outcomes and is managed jointly by the American College of Cardiology and the Heart Rhythm Society. More recently, CMS has issued guidance documents explaining the rationale and conditions for requiring study participation as a condition of Medicare coverage. While coverage in the context of post-approval registries may be a desirable means to track outcomes and ensure efficient use of technology, it is important that such studies be designed to efficiently answer the important research questions that exist (Gillick, 2004).
CMS is raising the evidentiary standard through payment policy as well. Since 2001, Medicare has made available special payments under the hospital inpatient prospective payment system for new technology that demonstrates “substantial clinical improvement” and meets certain cost and other criteria. In 2002, Medicare added a similar clinical criterion for the new technology payment mechanism under the hospital outpatient prospective payment system. Transitional pass-through payments in the outpatient setting and new-technology add-on payments in the inpatient setting represent a type of pay-for-performance for new technologies. To be eligible for these payments, new technologies must be FDA approved, meet stringent cost criteria, and demonstrate a substantial clinical improvement for Medicare beneficiaries, among other requirements (Centers for Medicare and Medicaid Services, 2001a,b). Technologies can meet the clinical criterion by demonstrating reduced mortality, lower rates of therapeutic interventions, reduced hospitalizations, and similar clinical outcome improvements. A recent study found that since 2001, CMS has determined that only 8 of 18 new technology add-on payment applications satisfied the substantial clinical improvement criterion. Seven of the 18 did not meet the substantial clinical improvement criterion, and three are still pending FDA approval (Clyde et al., 2008).
Even when new technologies meet CMS criteria, hospitals do not automatically receive “full” payments that cover the incremental cost of the new technology. In the inpatient setting, new technology payments are designed to cover, at most, only 50 percent of the incremental cost associated with the technology. In the outpatient setting, hospital pass-through payments are designed to cover the full incremental cost of a new technology. However, in many instances, hospitals have not received pass-through payments covering their actual incremental costs due to a variety of coding and billing problems, including charge compression, where hospitals typically have lower mark-ups for higher-cost devices, and lag in updating hospital billing systems.
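The payment gap described above can be illustrated with a little arithmetic. The following is a simplified sketch only: it applies the 50 percent inpatient cap and the full-cost outpatient design exactly as stated in the text, and it is not the actual CMS payment formula. The function names and the $20,000 incremental cost are hypothetical and chosen purely for illustration.

```python
def inpatient_addon_payment(incremental_cost, max_share=0.5):
    """Upper bound on the inpatient new-technology add-on payment.

    Assumption: the payment covers at most `max_share` (50 percent, per the
    text) of the incremental cost. Real CMS rules include further conditions
    that are not modeled here.
    """
    if incremental_cost < 0:
        raise ValueError("incremental cost must be non-negative")
    return max_share * incremental_cost


def outpatient_passthrough_payment(incremental_cost):
    """Outpatient pass-through payment as designed: the full incremental cost.

    Coding and billing problems such as charge compression, which can reduce
    the payment in practice, are not modeled.
    """
    if incremental_cost < 0:
        raise ValueError("incremental cost must be non-negative")
    return float(incremental_cost)


def hospital_shortfall(incremental_cost, payment):
    """Portion of the incremental cost the hospital must absorb."""
    return max(incremental_cost - payment, 0.0)


# Hypothetical device that adds $20,000 to the cost of a hospital stay.
cost = 20_000
inpatient = inpatient_addon_payment(cost)
outpatient = outpatient_passthrough_payment(cost)
print(inpatient, hospital_shortfall(cost, inpatient))    # 10000.0 10000.0
print(outpatient, hospital_shortfall(cost, outpatient))  # 20000.0 0.0
```

Under these assumptions, a hospital adopting the hypothetical device in the inpatient setting absorbs at least half of its incremental cost even when the technology qualifies for the add-on payment.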
In sum, through the use of coverage and payment policy, Medicare is raising evidentiary standards for hospitals and other providers to receive payment for using new technologies. We expect this trend to continue. For example, in July 2008, CMS issued a list of potential future coverage decision topics that demonstrates its interest in revising existing coverage decisions for established treatments, including off-label use of drug-eluting coronary stents, vertebroplasty and kyphoplasty, lumbar fusion, and artificial cervical disks (Centers for Medicare and Medicaid Services, 2008b). These reimbursement trends are likely to accelerate in the future as Congress considers creation of a comparative effectiveness research entity and a host of other healthcare reforms.
Health Economic Evaluations
Considerable attention has been paid to the high costs of advanced medical technologies as a driver of medical expenditures. Yet these analyses typically do not take into account the value of technology and therefore do not answer the question of whether the expenditures are worthwhile. Research examining the overall health economic value of advanced medical technologies has concluded that they are generally worthwhile to society (Cutler, 2004).
Over the past decade, formal health economic evaluations of specific device interventions have become commonplace, with most attention being paid to assessments conducted by the National Institute for Health and Clinical Excellence in the United Kingdom. Device sponsors recognize that planning for economic assessments early in a new device’s life cycle can be a critical factor to commercial viability. While many economic studies of devices have found they are cost-effective for typical patient populations within commonly referenced thresholds of $50,000 or $100,000 per quality-adjusted life-year (QALY), the long-term value of devices may be underestimated, because that value can improve over time as real-world experience leads to technological and other advancements. Further, economic assessments are extremely complex, with studies producing widely varying findings that are highly dependent on technical modeling decisions, including the analysis time horizon, effectiveness parameters, cost inputs, and specific patient subgroups chosen for analysis. Even rigorous economic studies have important weaknesses, and they are often difficult to compare.
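To make the threshold comparison and the sensitivity to modeling decisions concrete, the following minimal sketch computes an incremental cost-effectiveness ratio (incremental cost divided by incremental quality-adjusted life-years gained) and compares it with a per-QALY threshold. All inputs are hypothetical and are not drawn from the studies cited here; the function names are illustrative, and the sketch assumes a one-time extra cost, a constant yearly benefit, and no discounting.

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio: extra dollars per QALY gained."""
    if delta_qaly <= 0:
        raise ValueError("ratio is only meaningful for a positive QALY gain")
    return delta_cost / delta_qaly


def classify(delta_cost, delta_qaly, threshold=50_000):
    """Label an intervention relative to a per-QALY threshold.

    An intervention that improves outcomes without adding cost is labeled
    'cost-saving' and needs no ratio.
    """
    if delta_qaly > 0 and delta_cost <= 0:
        return "cost-saving"
    ratio = icer(delta_cost, delta_qaly)
    if ratio <= threshold:
        return "cost-effective"
    return "not cost-effective at this threshold"


def icer_by_horizon(upfront_extra_cost, qaly_gain_per_year, years):
    """ICER for a one-time extra cost and a constant yearly QALY gain
    (simplifying assumptions: no discounting, no later costs)."""
    return icer(upfront_extra_cost, qaly_gain_per_year * years)


# Hypothetical device: $30,000 extra up front, 0.25 QALYs gained per year.
print(icer_by_horizon(30_000, 0.25, years=2))  # 60000.0
print(icer_by_horizon(30_000, 0.25, years=5))  # 24000.0
print(classify(30_000, 0.5, threshold=50_000))   # not cost-effective at this threshold
print(classify(30_000, 0.5, threshold=100_000))  # cost-effective
```

In this toy case the same device falls above or below the $50,000 threshold depending only on the time horizon chosen, which is one reason findings from different studies can be hard to compare.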
Following are examples of economic evaluations demonstrating the cost-effectiveness of common therapeutic device treatments (see also Table 3-6):
Implantable defibrillators: Based on the analysis of eight landmark ICD trials on the primary prevention of sudden cardiac death, ICDs have been shown to be cost-effective (with a range of $25,000 to $50,000 per QALY) in populations where significant reductions in mortality have been demonstrated (Sanders et al., 2005). A more recent iteration of the defibrillator that includes cardiac resynchronization, otherwise known as a CRT-D, not only has been shown to be cost-effective in the COMPANION patient population, but has also been shown to reduce two-year follow-up hospitalization costs by 29 percent (Feldman et al., 2005).
Cochlear implants: Several independent studies have found that cochlear implants are cost-effective in both children and adults ($5,000 to $13,000 per QALY) (Cheng and Niparko, 1999; Cheng et al., 2000a). When indirect cost savings are taken into account, such as the reduced need for special education services when children are mainstreamed into regular classrooms, overall cost savings of more than $50,000 accrue per child (Cheng et al., 2000a).
Percutaneous coronary interventions: Numerous economic studies have assessed the cost-effectiveness of PCI technologies compared to clinical alternatives (Bakhai et al., 2003; Firth et al., 2008; Kupersmith et al., 1995). Studies of coronary stenting versus PTCA found that the higher initial costs of stents were almost completely offset by savings due to the reduced need for revascularization. More recent economic evaluations of drug-eluting stents have been marked by controversy because they demonstrate wide ranges of cost-effectiveness depending on technical modeling decisions (Firth et al., 2008).
Cardiac ablation: Several studies have demonstrated that cardiac ablation produces overall cost savings compared to chronic medical therapy for supraventricular tachycardias (SVTs), and recent economic evaluations of randomized studies have found high cost-effectiveness for atrial fibrillation (Cheng et al., 2000b; Hogenhuis et al., 1993; McKenna et al., 2008). Despite the higher initial costs of treatment, the elimination of symptoms and the averted need for chronic medications and associated medical care utilization lead to a reduction or neutrality in treatment costs over time.
TABLE 3-6 Examples of Medical Device Cost-Effectiveness
Neurovascular coiling: Coiling of brain aneurysms has been found to be cost-saving for subarachnoid hemorrhage and cost-effective for large and symptomatic unruptured aneurysms (range of $5,000 to $12,000 per QALY) (Bairstow et al., 2002; Johnston et al., 1999).
Spinal cord stimulation: Multiple economic studies based on randomized controlled trials have found that spinal cord stimulation produces cost savings compared to conventional medical management or surgical reoperation for chronic neuropathic pain conditions (Bala et al., 2008; Kemler and Furnee, 2002).
Prosthetic hip replacements: Prosthetic hip replacements have been found to be cost-saving in younger adults and essentially cost-neutral in older adults (O’Shea et al., 2002).
Given the reality of current budgetary and cost constraints, it appears likely that rigorous economic evaluations of new medical technologies will become a permanent fixture in the healthcare arena. However, the many limitations of economic assessment methods should preclude their use as a mechanistic tool to guide reimbursement and funding decisions. Real dangers can result from the use of cost-effectiveness modeling as a blunt instrument for coverage and adoption of new technologies that could ultimately undervalue the benefit of these innovations and restrict the development of breakthrough technologies that are highly beneficial to society and patients.
Evidence and Innovation in Medical Devices
Evidence-based medicine trends have had a profound impact in the medical device arena over the past decade. Device innovators recognize the need to rigorously demonstrate clinical and economic value, and compelling evidence has demonstrated beneficial outcomes in many areas. Calls for additional high-quality evidence from randomized controlled trials have been answered by a dramatic increase in pre-approval and post-approval randomized studies for therapeutic devices. Many of these studies represent real-world examples of comparative effectiveness research that can serve as a model for future studies (Tunis et al., 2003; Wilensky, 2006).
Expanding the use of post-market clinical registries can also provide the additional evidence needed for safety surveillance and tracking patient outcomes in real-world treatment environments and patient populations. Recent actions by Medicare indicate that reimbursement may be increasingly tied to participation in such studies, and this appears to be a promising concept in certain instances if research can be efficiently and adequately designed to answer critical questions regarding clinical value.
As pressure has grown for rational prioritization in health care as a means to control spending, economic evaluations have greatly increased for high-cost device technologies. While there is broad understanding that interventions should be worth their costs to society, methods for assessing economic value remain immature and we caution against simplistic use of blunt instruments such as cost-effectiveness in reimbursement and funding decisions.
Methods to assess the clinical and economic value of device interventions must take into consideration the nature of innovation in the medical device arena. For example, newly developed procedures may not be ripe for a fair assessment since the procedural technique may still be undergoing refinement. Similarly, there may be only a small cadre of skilled and experienced physicians performing the intervention. On the other hand, waiting until the technology matures may result in faster dissemination than desired by policy makers, particularly among populations that may not receive the greatest clinical benefits.
We recognize that policy makers must assess and refine methods to determine the value of all types of treatment modalities, including device interventions. The goal is to provide comparative information to clinicians, payers, and patients. However, ultimately, medical innovators need a predictable and reasonable framework in order to support development and commercialization of new medical devices in a society that still values technology advancement. While the bar continues to be raised in terms of clinical and economic evidence, we caution that the desire for additional evidence from clinical trials will always outpace our ability to perform them (Gelijns et al., 2005). Unless policy makers tailor evidence requirements to different devices, there will be longer development timelines, reduced innovation, and fewer treatment options for patients.
References
Advanced Medical Technology Association. 2008. The 510(k) process: The key to effective device regulation. P. 25.
AHRQ (Agency for Healthcare Research and Quality). 2006. Inpatient quality indicators composite measure. Draft technical report. http://qualityindicators.ahrq.gov/news/AHRQ_IQI_Composite_Draft.pdf (accessed June 27, 2007).
Appleby, J. 2008. The case of CT angiography: How Americans view and embrace new technology. Health Aff (Millwood) 27(6):1515-1521.
Bairstow, P., A. Dodgson, J. Linto, and M. Khangure. 2002. Comparison of cost and outcome of endovascular and neurosurgical procedures in the treatment of ruptured intracranial aneurysms. Australas Radiol 46(3):249-251.
Baker, L., H. Birnbaum, J. Geppert, D. Mishol, and E. Moyneur. 2003. The relationship between technology availability and health care spending. Health Aff (Millwood) Suppl Web Exclusives:W3-537-551.
Baker, L. C., S. W. Atlas, and C. C. Afendulis. 2008. Expanded use of imaging technology and the challenge of measuring value. Health Aff (Millwood) 27(6):1467-1478.
Bakhai, A., G. W. Stone, C. L. Grines, S. A. Murphy, L. Githiora, R. H. Berezin, D. A. Cox, T. Stuckey, J. J. Griffin, J. E. Tcheng, and D. J. Cohen. 2003. Cost-effectiveness of coronary stenting and abciximab for patients with acute myocardial infarction: Results from the CADILLAC (Controlled Abciximab and Device Investigation to Lower Late Angioplasty Complications) trial. Circulation 108(23):2857-2863.
Bala, M. M., R. P. Riemsma, J. Nixon, and J. Kleijnen. 2008. Systematic review of the (cost-) effectiveness of spinal cord stimulation for people with failed back surgery syndrome. Clin J Pain 24(9):741-756.
Basu, A., and D. Meltzer. 2007. Value of information on preference heterogeneity and individualized care. Med Decis Making 27(2):112-127.
Besanko, D., D. Dranove, M. Shanley, and S. Schaefer. Economics of strategy. Third ed. Malden: John Wiley and Sons. Pp. 477-478.
Birkmeyer, J. D., J. B. Dimick, and N. J. Birkmeyer. 2004. Measuring the quality of surgical care: Structure, process, or outcomes? J Am Coll Surg 198(4):626-632.
Blackmore, C. C., and D. J. Magid. 1997. Methodologic evaluation of the radiology cost-effectiveness literature. Radiology 203(1):87-91.
Blomstrom-Lundqvist, C., M. M. Scheinman, E. M. Aliot, J. S. Alpert, H. Calkins, A. J. Camm, W. B. Campbell, D. E. Haines, K. H. Kuck, B. B. Lerman, D. D. Miller, C. W. Shaeffer, W. G. Stevenson, G. F. Tomaselli, E. M. Antman, S. C. Smith, Jr., D. P. Faxon, V. Fuster, R. J. Gibbons, G. Gregoratos, L. F. Hiratzka, S. A. Hunt, A. K. Jacobs, R. O. Russell, Jr., S. G. Priori, J. J. Blanc, A. Budaj, E. F. Burgos, M. Cowie, J. W. Deckers, M. A. Garcia, W. W. Klein, J. Lekakis, B. Lindahl, G. Mazzotta, J. C. Morais, A. Oto, O. Smiseth, and H. J. Trappe. 2003. ACC/AHA/ESC guidelines for the management of patients with supraventricular arrhythmias—Executive summary. A report of the American College of Cardiology/American Heart Association task force on practice guidelines and the European Society of Cardiology Committee for Practice Guidelines (Writing Committee to Develop Guidelines for the Management of Patients with Supraventricular Arrhythmias) developed in collaboration with NASPE-heart rhythm society. J Am Coll Cardiol 42(8):1493-1531.
Brook, R. H., R. E. Park, M. R. Chassin, D. H. Solomon, J. Keesey, and J. Kosecoff. 1990. Predicting the appropriate use of carotid endarterectomy, upper gastrointestinal endoscopy, and coronary angiography. N Engl J Med 323(17):1173-1177.
Camm, J., H. Klein, and S. Nisam. 2007. The cost of implantable defibrillators: Perceptions and reality. Eur Heart J 28(4):392-397.
Center for Devices and Radiologic Health FDA. 2002. Cardiac ablation catheters generic arrhythmia indications for use; guidance for industry. Rockville, MD.
Centers for Medicare and Medicaid Services. 2001a. Medicare program; payments for new medical services and new technologies under the acute care hospital inpatient prospective payment system; final rule. Department of Health and Human Services.
———. 2001b. Medicare program; prospective payment system for hospital outpatient services: Criteria for establishing additional pass-through categories for medical devices: Interim final rule with comment period. Department of Health and Human Services.
———. 2003. National coverage determination (NCD) on implantable defibrillators. Department of Health and Human Services. Washington, DC: U.S. Government Printing Office.
———. 2008a. NCD for percutaneous transluminal angioplasty (PTA). Medicare coverage manual. 100-3. Department of Health and Human Services. Washington, DC: U.S. Government Printing Office.
———. 2008b. Posting of potential NCD topics: Proposed topic list for first quarterly release. Department of Health and Human Services.
Chassin, M. R., J. Kosecoff, R. E. Park, C. M. Winslow, K. L. Kahn, N. J. Merrick, J. Keesey, A. Fink, D. H. Solomon, and R. H. Brook. 1987. Does inappropriate use explain geographic variations in the use of health care services? A study of three procedures. JAMA 258(18):2533-2537.
Cheng, A. K., and J. K. Niparko. 1999. Cost-utility of the cochlear implant in adults: A meta-analysis. Arch Otolaryngol Head Neck Surg 125(11):1214-1218.
Cheng, A. K., H. R. Rubin, N. R. Powe, N. K. Mellon, H. W. Francis, and J. K. Niparko. 2000a. Cost-utility analysis of the cochlear implant in children. JAMA 284(7):850-856.
Cheng, C. H., G. D. Sanders, M. A. Hlatky, P. Heidenreich, K. M. McDonald, B. K. Lee, M. S. Larson, and D. K. Owens. 2000b. Cost-effectiveness of radiofrequency ablation for supraventricular tachycardia. Ann Intern Med 133(11):864-876.
Chernew, M. E., A. B. Rosen, and A. M. Fendrick. 2007. Value-based insurance design. Health Aff (Millwood) 26(2):w195-w203.
Clancy, C. 2008 (September 27). Value-based purchasing, transparency and transformation. Keynote address for the Third Annual Health Information Technology Summit. Washington, DC.
Clyde, A. T., L. Bockstedt, J. A. Farkas, and C. Jackson. 2008. Experience with Medicare’s new technology add-on payment program. Health Aff (Millwood) 27(6):1632-1641.
Cohen, J. T., P. J. Neumann, and M. C. Weinstein. 2008. Does preventive care save money? Health economics and the presidential candidates. N Engl J Med 358(7):661-663.
Cutler, D. M. 2004. Your money or your life: Strong medicine for America’s health care system. xiv:158.
Cutler, D. M., and M. McClellan. 2001. Is technological change in medicine worth it? Health Aff (Millwood) 20(5):11-29.
Dartmouth University. 2008. Research agenda and findings. http://www.dartmouthatlas.org/agenda.shtm (accessed January 11, 2009).
———. 2009. The Dartmouth atlas of healthcare. http://www.dartmouthatlas.org/ (accessed January 9, 2009).
Deloitte Center for Health Solutions. 2008. The medical home: A solution to chronic care management? http://www.deloitte.com/dtt/article/0%2C1002%2Csid%253d80772%2526cid%253d186574%2C00.html?wt.mc_id=indianapca (accessed January 11, 2009).
Derdeyn, C. P., J. D. Barr, A. Berenstein, J. J. Connors, J. E. Dion, G. R. Duckwiler, R. T. Higashida, C. M. Strother, T. A. Tomsick, and P. Turski. 2003. The International Subarachnoid Aneurysm Trial (ISAT): A position statement from the executive committee of the American Society of Interventional and Therapeutic Neuroradiology and the American Society of Neuroradiology. Am J Neuroradiol 24(7):1404-1408.
Detre, K., R. Holubkov, S. Kelsey, M. Cowley, K. Kent, D. Williams, R. Myler, D. Faxon, D. Holmes, Jr., M. Bourassa, et al. 1988. Percutaneous transluminal coronary angioplasty in 1985-1986 and 1977-1981. The National Heart, Lung, and Blood Institute Registry. N Engl J Med 318(5):265-270.
Deyo, R., and D. L. Patrick. 2005. Hope or hype: The obsession with medical advances and the high cost of false promises. New York: American Management Association.
Dimick, J. B., H. G. Welch, and J. D. Birkmeyer. 2004. Surgical mortality as an indicator of hospital quality: The problem with small sample size. JAMA 292(7):847-851.
Eckman, M. H., J. Rosand, S. M. Greenberg, and B. F. Gage. 2009. Cost-effectiveness of using pharmacogenetic information in warfarin dosing for patients with nonvalvular atrial fibrillation. Ann Intern Med 150(2):73-83.
Eddy, D. M. 1990. Clinical decision making: From theory to practice. Anatomy of a decision. JAMA 263(3):441-443.
———. 1998. Performance measurement: Problems and solutions. Health Aff (Millwood) 17(4):7-25.
———. 2006. Evidence-based imaging: Optimizing imaging in patient care. Edited by C. C. Blackmore and L. S. Medina. New York: Springer.
Ezekowitz, J. A., B. H. Rowe, D. M. Dryden, N. Hooton, B. Vandermeer, C. Spooner, and F. A. McAlister. 2007. Systematic review: Implantable cardioverter defibrillators for adults with left ventricular systolic dysfunction. Ann Intern Med 147(4):251-262.
Feigal, D. W., S. N. Gardner, and M. McClellan. 2003. Ensuring safe and effective medical devices. N Engl J Med 348(3):191-192.
Feldman, A. M., G. de Lissovoy, M. R. Bristow, L. A. Saxon, T. De Marco, D. A. Kass, J. Boehmer, S. Singh, D. J. Whellan, P. Carson, A. Boscoe, T. M. Baker, and M. R. Gunderman. 2005. Cost effectiveness of cardiac resynchronization therapy in the Comparison of Medical Therapy, Pacing, and Defibrillation in Heart Failure (COMPANION) Trial. J Am Coll Cardiol 46(12):2311-2321.
Fineberg, H. V., R. Bauman, and M. Sosman. 1977. Computerized cranial tomography. Effect on diagnostic and therapeutic plans. JAMA 238(3):224-227.
Firth, B. G., L. M. Cooper, and S. Fearn. 2008. The appropriate role of cost-effectiveness in determining device coverage: A case study of drug-eluting stents. Health Aff (Millwood) 27(6):1577-1586.
Fisher, E. S., D. E. Wennberg, T. A. Stukel, D. J. Gottlieb, F. L. Lucas, and E. L. Pinder. 2003. The implications of regional variations in Medicare spending. Part 1: The content, quality, and accessibility of care. Ann Intern Med 138(4):273-287.
Fisher, E. S., D. O. Staiger, J. P. Bynum, and D. J. Gottlieb. 2007. Creating accountable care organizations: The extended hospital medical staff. Health Aff (Millwood) 26(1):w44-w57.
Foote, S. B. 1992. Managing the medical arms race: Public policy and medical device innovation. Berkeley: University of California Press.
Fraenkel, L. 2008. Understanding why treatment preferences differ by race. Paper presented at Society for Medical Decision Making 30th Annual Meeting, Philadelphia, PA.
Fuchs, V. R. 1999. Health care for the elderly: How much? Who will pay for it? Health Aff (Millwood) 18(1):11-21.
Gelijns, A. C., L. D. Brown, C. Magnell, E. Ronchi, and A. J. Moskowitz. 2005. Evidence, politics, and technological change. Health Aff (Millwood) 24(1):29-40.
Gillick, M. R. 2004. Medicare coverage for technological innovations—Time for new criteria? N Engl J Med 350(21):2199-2203.
Goetz, M. P., S. K. Knox, V. J. Suman, J. M. Rae, S. L. Safgren, M. M. Ames, D. W. Visscher, C. Reynolds, F. J. Couch, W. L. Lingle, R. M. Weinshilboum, E. G. Fritcher, A. M. Nibbe, Z. Desta, A. Nguyen, D. A. Flockhart, E. A. Perez, and J. N. Ingle. 2007. The impact of cytochrome P450 2D6 metabolism in women receiving adjuvant tamoxifen. Breast Cancer Res Treat 101(1):113-121.
Government Accountability Office. 2008. Medicare Part B imaging services: Rapid spending growth and shift to physician offices indicate need for CMS to consider additional management practices. Washington, DC.
Gray, L. A., J. G. Jarvik, P. J. Heagerty, W. Hollingworth, L. Stout, B. A. Comstock, J. A. Turner, and D. F. Kallmes. 2007. Investigational Vertebroplasty Efficacy and Safety Trial (INVEST): A randomized controlled trial of percutaneous vertebroplasty. BMC Musculoskelet Disord 8:126.
Grayson, C. J. 1960. Decisions under uncertainty: Drilling decisions by oil and gas operators. Cambridge, MA: Plimpton Press.
Grunkemeier, G. L., R. Jin, and A. Starr. 2006. Prosthetic heart valves: Objective performance criteria versus randomized clinical trial. Ann Thorac Surg 82(3):776-780.
Guglielmi, G. 1997. Endovascular treatment of aneurysms. History, development, and application of current techniques. J Stroke Cerebrovasc Dis 6(4):246-248.
Hackbarth, G., R. Reischauer, and A. Mutti. 2008. Collective accountability for medical care—Toward bundled Medicare payments. N Engl J Med 359(1):3-5.
Hawn, M. T., K. M. Itani, S. H. Gray, C. C. Vick, W. Henderson, and T. K. Houston. 2008. Association of timely administration of prophylactic antibiotics for major surgical procedures and surgical site infection. J Am Coll Surg 206(5):814-819; discussion 819-821.
Hill, R., A. Bagust, A. Bakhai, R. Dickson, Y. Dundar, A. Haycox, R. Mujica Mota, A. Reaney, D. Roberts, P. Williamson, and T. Walley. 2004. Coronary artery stents: A rapid systematic review and economic evaluation. Health Technol Assess 8(35):iii-iv, 1-242.
Hogenhuis, W., S. K. Stevens, P. Wang, J. B. Wong, A. S. Manolis, N. A. Estes 3rd, and S. G. Pauker. 1993. Cost-effectiveness of radiofrequency ablation compared with other strategies in Wolff-Parkinson-White syndrome. Circulation 88(5 Pt 2):II437-II446.
Hollingworth, W. 2005. Radiology cost and outcomes studies: Standard practice and emerging methods. Am J Roentgenol 185(4):833-839.
Holloway, R. G., C. G. Benesch, C. R. Rahilly, and C. E. Courtright. 1999. A systematic review of cost-effectiveness research of stroke evaluation and treatment. Stroke 30(7):1340-1349.
Human Genome Project. 2008. Gene testing. http://www.ornl.gov/sci/techresources/Human_Genome/medicine/genetest.shtml (accessed February 2009).
Hunink, M. G. 2008. Cost-effectiveness analysis: Some clarifications. Radiology 249(3):753-755.
IOM (Institute of Medicine). 2008. Value. In Learning healthcare system concepts v. 2008: The Roundtable on Evidence-Based Medicine annual report. Washington, DC: The National Academies Press. Pp. 17-20.
Jeffrey, K. 2001. Machines in our hearts: The cardiac pacemaker, the implantable defibrillator, and American health care. Baltimore: Johns Hopkins University Press.
Johnston, S. C., D. R. Gress, and J. G. Kahn. 1999. Which unruptured cerebral aneurysms should be treated? A cost-utility analysis. Neurology 52(9):1806-1815.
Jonker, D. J., C. J. O’Callaghan, C. S. Karapetis, J. R. Zalcberg, D. Tu, H. J. Au, S. R. Berry, M. Krahn, T. Price, R. J. Simes, N. C. Tebbutt, G. van Hazel, R. Wierzbicki, C. Langer, and M. J. Moore. 2007. Cetuximab for the treatment of colorectal cancer. N Engl J Med 357(20):2040-2048.
Kahan, J. P., R. E. Park, L. L. Leape, S. J. Bernstein, L. H. Hilborne, L. Parker, C. J. Kamberg, D. J. Ballard, and R. H. Brook. 1996. Variations by specialty in physician ratings of the appropriateness and necessity of indications for procedures. Med Care 34(6):512-523.
Kaiser Family Foundation and Health Research and Educational Trust. 2004. Employer health benefits 2004 annual survey.
———. 2007. Employer health benefits 2007 annual survey.
Karapetis, C. S., S. Khambata-Ford, D. J. Jonker, C. J. O’Callaghan, D. Tu, N. C. Tebbutt, R. J. Simes, H. Chalchal, J. D. Shapiro, S. Robitaille, T. J. Price, L. Shepherd, H. J. Au, C. Langer, M. J. Moore, and J. R. Zalcberg. 2008. K-ras mutations and benefit from cetuximab in advanced colorectal cancer. N Engl J Med 359(17):1757-1765.
Kemler, M. A., and C. A. Furnee. 2002. Economic evaluation of spinal cord stimulation for chronic reflex sympathetic dystrophy. Neurology 59(8):1203-1209.
Kemler, M. A., H. C. De Vet, G. A. Barendse, F. A. Van Den Wildenberg, and M. Van Kleef. 2004. The effect of spinal cord stimulation in patients with chronic reflex sympathetic dystrophy: Two years’ follow-up of the randomized controlled trial. Ann Neurol 55(1):13-18.
Kessler, L., S. D. Ramsey, S. Tunis, and S. D. Sullivan. 2004. Clinical use of medical devices in the “Bermuda triangle.” Health Aff (Millwood) 23(1):200-207.
Kolata, G. 2008. Co-payments go way up for drugs with high prices. New York Times.
Kumar, K., R. S. Taylor, L. Jacques, S. Eldabe, M. Meglio, J. Molet, S. Thomson, J. O’Callaghan, E. Eisenberg, G. Milbouw, E. Buchser, G. Fortini, J. Richardson, and R. B. North. 2007. Spinal cord stimulation versus conventional medical management for neuropathic pain: A multicentre randomised controlled trial in patients with failed back surgery syndrome. Pain 132(1-2):179-188.
Kupersmith, J., M. Holmes-Rovner, A. Hogan, D. Rovner, and J. Gardiner. 1995. Cost-effectiveness analysis in heart disease, Part III: Ischemia, congestive heart failure, and arrhythmias. Prog Cardiovasc Dis 37(5):307-346.
Leape, L. L., R. E. Park, D. H. Solomon, M. R. Chassin, J. Kosecoff, and R. H. Brook. 1990. Does inappropriate use explain small-area variations in the use of health care services? JAMA 263(5):669-672.
Leavitt, M. 2008. Building a value-based health care system. Washington, DC, April 23.
Lindberg, M., K. A. Angquist, H. Fodstad, K. Fugl-Meyer, and A. R. Fugl-Meyer. 1992. Self-reported prevalence of disability after subarachnoid haemorrhage, with special emphasis on return to leisure and work. Br J Neurosurg 6(4):297-304.
Lucock, M. P., S. Morley, C. White, and M. D. Peake. 1997. Responses of consecutive patients to reassurance after gastroscopy: Results of self administered questionnaire survey. BMJ 315(7108):572-575.
Malchau, H., P. Herberts, T. Eisler, G. Garellick, and P. Soderman. 2002. The Swedish total hip replacement register. J Bone Joint Surg Am 84:S2-S20.
McGlynn, E. A., S. M. Asch, J. Adams, J. Keesey, J. Hicks, A. DeCristofaro, and E. A. Kerr. 2003. The quality of health care delivered to adults in the United States. N Engl J Med 348(26):2635-2645.
McKenna, C., S. Palmer, M. Rodgers, D. Chambers, N. Hawkins, S. Golder, S. Van Hout, C. Pepper, D. Todd, and N. Woolacott. 2009. Cost-effectiveness of radiofrequency catheter ablation for the treatment of atrial fibrillation in the UK. Heart 95:542-549.
McWilliam, A., R. Lutter, and C. Nardinelli. 2006. Health care savings from personalized medicine using genetic testing: The case of warfarin. Washington, DC: AEI-Brookings Joint Center for Regulatory Studies.
Mehran, R., M. B. Leon, D. A. Feigal, D. Jefferys, M. Simons, N. Chronos, T. J. Fogarty, R. E. Kuntz, D. S. Baim, and A. V. Kaplan. 2004. Post-market approval surveillance: A call for a more integrated and comprehensive approach. Circulation 109(25):3073-3077.
Meltzer, D. 1997. Accounting for future costs in medical cost-effectiveness analysis. J Health Econ 16(1):33-64.
Meltzer, D. O., and C. A. Alexander. 2009. The empirical cost-effectiveness of medical interventions [unpublished manuscript]. Chicago, IL: University of Chicago.
Meltzer, D., E. Huang, L. Jin, M. Shook, and M. Chin. 2003. Major bias in cost-effectiveness analysis due to failure to account for self-selection: Impact in intensive therapy for type 2 diabetes among the elderly. Med Decis Making 23(6):576.
Merrill, R. A. 1994. Regulation of drugs and devices: An evolution. Health Aff (Millwood) 13(3):47-69.
Molyneux, A., R. Kerr, I. Stratton, P. Sandercock, M. Clarke, J. Shrimpton, and R. Holman. 2002. International Subarachnoid Aneurysm Trial (ISAT) of neurosurgical clipping versus endovascular coiling in 2143 patients with ruptured intracranial aneurysms: A randomised trial. Lancet 360(9342):1267-1274.
Mueller, R. L., and T. A. Sanborn. 1995. The history of interventional cardiology: Cardiac catheterization, angioplasty, and related interventions. Am Heart J 129(1):146-172.
Munsey, R. R. 1995. Trends and events in FDA regulation of medical devices over the last fifty years. Food Drug Law J 50 Spec:163-177.
Mushlin, A. I., C. Mooney, V. Grow, and C. E. Phelps. 1994. The value of diagnostic information to patients with suspected multiple sclerosis. Rochester-Toronto MRI study group. Arch Neurol 51(1):67-72.
Myerburg, R. J., K. M. Kessler, and A. Castellanos. 1993. Sudden cardiac death: Epidemiology, transient risk, and intervention assessment. Ann Intern Med 119(12):1187-1197.
NAE (National Academy of Engineering) and IOM (Institute of Medicine). 2005. Building a better delivery system: A new engineering/health care partnership. Washington, DC: The National Academies Press.
Nair, G. M., P. B. Nery, S. Diwakaramenon, J. S. Healey, S. J. Connolly, and C. A. Morillo. 2008. A systematic review of randomized trials comparing radiofrequency ablation with antiarrhythmic medications in patients with atrial fibrillation. J Cardiovasc Electrophysiol.
National Center for Health Statistics. 2006. National Hospital Discharge Survey. http://www.cdc.gov/nchs/ (accessed March 30, 2009).
Neumann, P. J., M. S. Kamae, and J. A. Palmer. 2008. Medicare’s national coverage decisions for technologies, 1999-2007. Health Aff (Millwood) 27(6):1620-1631.
New York Times. 2007. http://www.nytimes.com/ref/business/20070611_GAP_GRAPHIC.html# (accessed January 9, 2009).
Newhouse, J. P. 1992. Medical care costs: How much welfare loss? J Econ Perspect 6(3):3-21.
Next Big Future. 2008. Whole genome sequencing costs continue to fall: $300 million in 2003, $1 million 2007, $60,000 now, $5000 by year end. http://nextbigfuture.com/2008/03/genome-sequencing-costs-continue-to.html (accessed February 2009).
Nisbett, R. E., and L. Ross. 1980. Human inference: Strategies and shortcomings of social judgment. Englewood Cliffs, NJ: Prentice Hall.
Noheria, A., A. Kumar, J. V. Wylie, Jr., and M. E. Josephson. 2008. Catheter ablation vs antiarrhythmic drug therapy for atrial fibrillation: A systematic review. Arch Intern Med 168(6):581-586.
North, R. B., D. H. Kidd, F. Farrokhi, and S. A. Piantadosi. 2005. Spinal cord stimulation versus repeated lumbosacral spine surgery for chronic pain: A randomized, controlled trial. Neurosurgery 56(1):98-106; discussion 106-107.
O’Brien, S. M., D. M. Shahian, E. R. DeLong, S. L. Normand, F. H. Edwards, V. A. Ferraris, C. K. Haan, J. B. Rich, C. M. Shewan, R. S. Dokholyan, R. P. Anderson, and E. D. Peterson. 2007. Quality measurement in adult cardiac surgery: Part 2—Statistical considerations in composite measure scoring and provider rating. Ann Thorac Surg 83(4 Suppl):S13-S26.
O’Connor, P. J., W. A. Rush, G. Davidson, T. A. Louis, L. I. Solberg, L. Crain, P. E. Johnson, and R. R. Whitebird. 2008. Variation in quality of diabetes care at the levels of patient, physician, and clinic. Prev Chronic Dis 5(1):A15.
O’Shea, K., E. Bale, and P. Murray. 2002. Cost analysis of primary total hip replacement. Ir Med J 95(6):177-180.
Otero, H. J., F. J. Rybicki, D. Greenberg, and P. J. Neumann. 2008. Twenty years of cost-effectiveness analysis in medical imaging: Are we improving? Radiology 249(3):917-925.
Parisier, S. C. 2003. Cochlear implants: Growing pains. Laryngoscope 113(9):1470-1472.
Park, R. E., A. Fink, R. H. Brook, M. R. Chassin, K. L. Kahn, N. J. Merrick, J. Kosecoff, and D. H. Solomon. 1986. Physician ratings of appropriate indications for six medical and surgical procedures. Am J Public Health 76(7):766-772.
Paulus, R. A., K. Davis, and G. D. Steele. 2008. Continuous innovation in health care: Implications of the Geisinger experience. Health Aff (Millwood) 27(5):1235-1245.
Pawlson, L. G., S. H. Scholle, and A. Powers. 2007. Comparison of administrative-only versus administrative plus chart review data for reporting HEDIS hybrid measures. Am J Manag Care 13(10):553-558.
Pearson, S. D., and M. D. Rawlins. 2005. Quality, innovation, and value for money: NICE and the British National Health Service. JAMA 294(20):2618-2622.
Phillips, K. A. 2008. Closing the evidence gap in the use of emerging testing technologies in clinical practice. JAMA 300(21):2542-2544.
Roiron, C., P. Sanchez, A. Bouzamondo, P. Lechat, and G. Montalescot. 2006. Drug eluting stents: An updated meta-analysis of randomised controlled trials. Heart 92(5):641-649.
Rose, E. A., A. C. Gelijns, A. J. Moskowitz, D. F. Heitjan, L. W. Stevenson, W. Dembitsky, J. W. Long, D. D. Ascheim, A. R. Tierney, R. G. Levitan, J. T. Watson, P. Meier, N. S. Ronan, P. A. Shapiro, R. M. Lazar, L. W. Miller, L. Gupta, O. H. Frazier, P. Desvigne-Nickens, M. C. Oz, and V. L. Poirier. 2001. Long-term mechanical left ventricular assistance for end-stage heart failure. N Engl J Med 345(20):1435-1443.
Rosenthal, M. B., B. E. Landon, S. L. Normand, R. G. Frank, T. S. Ahmad, and A. M. Epstein. 2007. Employers’ use of value-based purchasing strategies. JAMA 298(19):2281-2288.
Russell, L. B. 1986. Is prevention better than cure? Washington, DC: Brookings Institution.
———. 1993. The role of prevention in health reform. N Engl J Med 329(5):352-354.
Sacco, R. L., P. A. Wolf, N. E. Bharucha, S. L. Meeks, W. B. Kannel, L. J. Charette, P. M. McNamara, E. P. Palmer, and R. D’Agostino. 1984. Subarachnoid and intracerebral hemorrhage: Natural history, prognosis, and precursive factors in the Framingham study. Neurology 34(7):847-854.
Sackett, D. L., S. E. Straus, W. S. Richardson, W. Rosenberg, and R. B. Haynes. 2001. Evidence based medicine. How to practice and teach EBM, second ed. Churchill Livingstone.
Sanders, G. D., M. A. Hlatky, and D. K. Owens. 2005. Cost-effectiveness of implantable cardioverter-defibrillators. N Engl J Med 353(14):1471-1480.
Santry, H. P., D. L. Gillen, and D. S. Lauderdale. 2005. Trends in bariatric surgical procedures. JAMA 294(15):1909-1917.
Schermerhorn, M. L., A. J. O’Malley, A. Jhaveri, P. Cotterill, F. Pomposelli, and B. E. Landon. 2008. Endovascular vs. open repair of abdominal aortic aneurysms in the Medicare population. N Engl J Med 358(5):464-474.
Scholle, S. H., J. Roski, J. L. Adams, D. L. Dunn, E. A. Kerr, D. P. Dugan, and R. E. Jensen. 2008. Benchmarking physician performance: Reliability of individual and composite measures. Am J Manag Care 14(12):833-838.
Shapiro, S., P. Strax, and L. Venet. 1966. Evaluation of periodic breast cancer screening with mammography. Methodology and early observations. JAMA 195(9):731-738.
Shaw, L. M., H. Vanderstichele, M. Knapik-Czajka, C. M. Clark, P. S. Aisen, R. C. Petersen, K. Blennow, H. Soares, A. Simon, P. Lewczuk, R. Dean, E. Siemers, W. Potter, V. M. Lee, and J. Q. Trojanowski. 2009. Cerebrospinal fluid biomarker signature in Alzheimer’s disease neuroimaging initiative subjects. Ann Neurol 65(4):403-413.
Singer, M. E., and K. E. Applegate. 2001. Cost-effectiveness analysis in radiology. Radiology 219(3):611-620.
Smith, S. C., Jr., J. T. Dove, A. K. Jacobs, J. W. Kennedy, D. Kereiakes, M. J. Kern, R. E. Kuntz, J. J. Popma, H. V. Schaff, D. O. Williams, R. J. Gibbons, J. P. Alpert, K. A. Eagle, D. P. Faxon, V. Fuster, T. J. Gardner, G. Gregoratos, and R. O. Russell. 2001. ACC/AHA guidelines of percutaneous coronary interventions (revision of the 1993 PTCA guidelines)—Executive summary. A report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Committee to Revise the 1993 Guidelines for Percutaneous Transluminal Coronary Angioplasty). J Am Coll Cardiol 37(8):2215-2239.
Staiger, D. O., J. B. Dimick, O. Baser, Z. Fan, and J. D. Birkmeyer. 2009. Empirically derived composite measures of surgical performance. Med Care 47(2):226-233.
Stinnett, A. A., and J. Mullahy. 1998. Net health benefits: A new framework for the analysis of uncertainty in cost-effectiveness analysis. Med Decis Making 18(2 Suppl):S68-S80.
Taylor, W. F., R. S. Fontana, M. A. Uhlenhopp, and C. S. Davis. 1981. Some results of screening for early lung cancer. Cancer 47(5 Suppl):1114-1120.
Teutsch, S. M., and M. L. Berger. 2005. Evidence synthesis and evidence-based decision making: Related but distinct processes. Med Decis Making 25(5):487-489.
Thaler, R. H. 1994. Toward a positive theory of consumer choice. In Quasi rational economics. Journal of Economic Behavior and Organization 1(1):39-60.
Tufts University. 2009. CEA registry. https://research.tufts-nemc.org/cear/default.aspx (accessed March 29, 2009).
Tunis, S. R., D. B. Stryer, and C. M. Clancy. 2003. Practical clinical trials: Increasing the value of clinical research for decision making in clinical and health policy. JAMA 290(12):1624-1632.
Tunis, S. R., T. V. Carino, R. D. Williams 2nd, and P. B. Bach. 2007. Federal initiatives to support rapid learning about new technologies. Health Aff (Millwood) 26(2):w140-w149.
Ursula, M., A. J. Moskowitz, and A. C. Gelijns. 2007. Clinical evaluation of medical devices. Challenges in conducting implantable device trials: Left ventricular assist devices in destination therapy. P. 346.
U.S. Food and Drug Administration. 2004a. Guidance for industry and FDA staff: Clinical study designs for percutaneous catheter ablation for treatment of atrial fibrillation. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Devices and Radiological Health.
———. 2004b. Guidance for industry and FDA staff: Clinical trial considerations: Vertebral augmentation devices to treat spinal insufficiency fractures. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Devices and Radiological Health.
———. 2008a. Draft guidance for industry and FDA staff—Class II special controls guidance document for certain percutaneous transluminal coronary angioplasty (PTCA) catheters. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Devices and Radiological Health.
———. 2008b. Guidance for industry and FDA staff: Clinical study designs for catheter ablation devices for treatment of atrial flutter. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Devices and Radiological Health.
———. 2008c. Guidance for industry and FDA staff: Coronary drug-eluting stents—Nonclinical and clinical studies. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Devices and Radiological Health.
———. 2008d. Guidance for industry and FDA staff: Preparation and review of investigational device exemption applications (IDEs) for total artificial discs. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Devices and Radiological Health.
Voormolen, M. H., W. P. Mali, P. N. Lohle, H. Fransen, L. E. Lampmann, Y. van der Graaf, J. R. Juttmann, X. Janssens, and H. J. Verhaar. 2007. Percutaneous vertebroplasty compared with optimal pain medication treatment: Short-term clinical outcome of patients with subacute or chronic painful osteoporotic vertebral compression fractures. The VERTOS study. AJNR Am J Neuroradiol 28(3):555-560.
Wade, N. 2006. The quest for the $1000 human genome. New York Times, July 18.
Weaver, F. M., K. Follett, M. Stern, K. Hur, C. Harris, W. J. Marks, Jr., J. Rothlind, O. Sagher, D. Reda, C. S. Moy, R. Pahwa, K. Burchiel, P. Hogarth, E. C. Lai, J. E. Duda, K. Holloway, A. Samii, S. Horn, J. Bronstein, G. Stoner, J. Heemskerk, and G. D. Huang. 2009. Bilateral deep brain stimulation vs best medical therapy for patients with advanced Parkinson disease: A randomized controlled trial. JAMA 301(1):63-73.
Weinstein, M. C., and W. B. Stason. 1976. Hypertension: A policy perspective. Cambridge, MA: Harvard University Press.
Wennberg, D. E., F. L. Lucas, J. D. Birkmeyer, C. E. Bredenberg, and E. S. Fisher. 1998. Variation in carotid endarterectomy mortality in the Medicare population: Trial hospitals, volume, and patient characteristics. JAMA 279(16):1278-1281.
Wilensky, G. R. 2006. Developing a center for comparative effectiveness information. Health Aff (Millwood) 25(6):w572-w585.
Williams, D. O., R. Holubkov, W. Yeh, M. G. Bourassa, M. Al-Bassam, P. C. Block, P. Coady, H. Cohen, M. Cowley, G. Dorros, D. Faxon, D. R. Holmes, A. Jacobs, S. F. Kelsey, S. B. King 3rd, R. Myler, J. Slater, V. Stanek, H. A. Vlachos, and K. M. Detre. 2000. Percutaneous coronary intervention in the current era compared with 1985-1986: The National Heart, Lung, and Blood Institute registries. Circulation 102(24):2945-2951.
Winter, A., and N. Ray. 2008. Paying accurately for imaging services in Medicare. Health Aff (Millwood) 27(6):1479-1490.
Zaidat, O. O., R. Klucznik, M. J. Alexander, J. Chaloupka, H. Lutsep, S. Barnwell, M. Mawad, B. Lane, M. J. Lynn, and M. Chimowitz. 2008. The NIH registry on use of the wingspan stent for symptomatic 70-99% intracranial arterial stenosis. Neurology 70(17):1518-1524.
Zarnke, K. B., M. A. Levine, and B. J. O’Brien. 1997. Cost-benefit analyses in the health-care literature: Don’t judge a study by its label. J Clin Epidemiol 50(7):813-822.