Methods and Process Needed for Clinical Adoption and Evaluation of Biomarker-Based Diagnostics
In order for cancer biomarker tests to be used effectively in a clinical setting, their clinical risks and benefits must be assessed. Even for diagnostic tests that have received Food and Drug Administration (FDA) approval (which are few in number), the clinical utility has not been assessed. Clinical algorithms need to be developed that specify the target patient populations for the diagnostic test and the changes in patient management that follow from test results. Well-designed, prospective clinical studies are needed to demonstrate that the test results influence the patient’s management such that clinical outcomes are improved. However, the studies necessary to develop evidence of the value of these tests may be costly and lengthy, especially for tests used for cancer screening. This deters diagnostic companies from conducting such studies. Instead they usually introduce biomarker-based products into the market via a 510(k) review process or by developing homebrew tests for in-house use (FDA, 2001; IOM, 2005b, see also Chapter 3).
These pathways to the market bypass the need to supply evidence of a test’s clinical benefits and risks. They also do not require manufacturers to specify the patient population(s) for which the test should be used, or how the test fits into the clinical care pathway of a patient. Off-label use of FDA-approved biomarker tests also fosters clinical applications for purposes other than those for which the tests have been clinically validated. For example, most tests enter the market as cancer diagnostics, but they can then be used for cancer screening without adequate evaluation. The extension of the use of the prostate-specific antigen (PSA) test for prostate cancer from diagnosis to screening, described below, is a prime example of this scenario. Once a test has been adopted in such a fashion, it may be difficult or impossible to adequately assess its risks, benefits, and value as a screening test. Postmarket surveillance of diagnostic tests is minimal, and once insurers provide coverage for something, coverage is rarely withdrawn unless the item is removed from the market because of safety concerns (reviewed by IOM, 2001).
Ultimately, the value of a test to society also depends on its cost-effectiveness and economic impact. Although these factors have not generally been considered in coverage and adoption decisions for health care in the United States, interest in such assessments is increasing as the cost of medical care continues to rise. Few economic evaluations of diagnostics have been undertaken thus far, perhaps because of their relatively low cost compared with many drugs and other medical interventions (Rogowski, in press). However, the newer class of pharmacogenomics-based molecular diagnostics that will enable personalized medicine may come under closer scrutiny because of their potentially significant budget impact due to high drug costs and the high cost of adverse drug reactions. Nonetheless, economic evidence for this class of diagnostics is presently still quite limited (Rogowski, 2007).
This chapter provides an overview of the challenges and needs of technology assessment and adoption, with the goal of identifying possible ways to facilitate data collection and analysis to monitor and improve the value of biomarker tests. Examples described below, as well as in Chapter 3, illustrate the complexity of this topic. Problematic cases include instances in which markers approved for one purpose were widely diffused and adopted for another purpose without sufficient evidence; instances in which the use of markers for vital treatment guidance is based on small, poor-quality studies likely to be less than definitive; and many instances in which evidence for the value of markers to improve patient outcomes is flawed or insufficient.
THE CHALLENGE OF ASSESSING CLINICAL VALUE
As a result of the limited scope of FDA oversight of laboratory tests, biomarker tests often are applied in clinical settings with little assessment of their clinical utility for specific medical situations (Reid et al., 1995; Feinstein, 2002; Weinstein et al., 2005). This does not seem to hinder widespread clinical adoption, however.
Tests that are introduced for one indication may find much wider application in other settings. The blood test for prostate-specific antigen provides a telling example of rapid adoption of a test for a use for which it was not approved. The FDA approved the PSA test in 1985 for the detection of prostate cancer recurrence, but it is now widely used for prostate cancer screening. Most studies show that 50–60 percent of men aged 50 years and older undergo prostate cancer screening, and some advocates report that up to 75 percent of that target population is screened regularly, despite the fact that the U.S. Preventive Services Task Force (USPSTF) gave PSA screening an “I” rating (Swan et al., 2003; Carlos et al., 2005). This rating indicates that the task force found insufficient evidence to give the PSA test its backing for prostate screening purposes, primarily because there was inconclusive evidence that early detection by screening improves health outcomes and substantial evidence of screening-related harms. The task force reviewed studies suggesting that preventing one death from prostate cancer in eight years would require annual PSA screening of about 1,000 men (Research Triangle Institute, 2002; CDC, 2006). But substantial proportions of these men would be subject to such potential harms as false-positive tests, anxiety, and treatment-linked erectile dysfunction, incontinence, and bowel dysfunction.
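The screening figure cited above can be read as a "number needed to screen," which is the reciprocal of the absolute reduction in risk of death attributable to screening. The short Python sketch below illustrates that arithmetic. The function names are hypothetical, and the false-positive rate per screening round is an assumed placeholder chosen only for illustration; it is not taken from the task force review.

```python
def number_needed_to_screen(absolute_risk_reduction: float) -> float:
    """People who must be screened to prevent one death (1 / ARR)."""
    if absolute_risk_reduction <= 0:
        raise ValueError("absolute risk reduction must be positive")
    return 1.0 / absolute_risk_reduction


def expected_false_positive_results(n_screened: int, rounds: int,
                                    false_positive_rate: float) -> float:
    """Expected count of false-positive results (not distinct people)
    over all screening rounds, assuming a constant rate per round."""
    return n_screened * rounds * false_positive_rate


# "About 1,000 men screened annually for eight years per death prevented"
# implies an absolute risk reduction of roughly 1 in 1,000 over that period.
arr = 1 / 1000
nns = number_needed_to_screen(arr)

# ASSUMPTION: a 10 percent false-positive rate per round is hypothetical.
false_positives = expected_false_positive_results(1000, 8, 0.10)

print(f"Number needed to screen: {nns:.0f}")
print(f"Expected false-positive results: {false_positives:.0f}")
```

Under these assumed inputs, the harms accrue to many of the men screened while the mortality benefit accrues to one, which is the imbalance the task force weighed.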
The rapid adoption of the PSA test for prostate cancer screening illustrates the potential costs to both society and individuals of adopting a biomarker test before its clinical risks and benefits have been adequately assessed. One way to foster the key prospective studies needed for such assessments is to support them via government funds, public–private collaborations, or nonprofit consortia.
Such support was recently provided to fund prospective clinical trials of OncotypeDX and MammaPrint, two genomic tests for predicting the risk of breast cancer recurrence. Both tests use the gene expression signatures of breast tumors to determine which women with node-negative invasive breast cancers would be most likely to benefit from chemotherapy (Box 4-1). But there is a lack of studies that can firmly establish the clinical outcomes of using cancer biomarker tests. Well-designed prospective clinical studies of diagnostic tests are often lacking (IOM, 2006). In addition, clearly defined patient populations, relevant comparators, and intention-to-treat analyses of all participants by initial group assignment are often missing, nor do studies always have the long-term follow-up needed to adequately assess the health outcomes of a medical intervention. The end result is a lack of robust evidence of the effects of an intervention and how those effects compare with other interventions (IOM, 2006).
Two major disincentives for undertaking such studies are the cost and length of time needed to complete the study. As more biomarker tests are developed, it may become increasingly difficult to fund and undertake adequate studies to assess them all. For example, the cost of the trial to assess the OncotypeDX test (the Trial Assigning Individualized Options for Treatment, TAILORx) to NCI alone is estimated at $27 million for 5 years of the trial.1 It could be argued that, in the long run, the cost of the trial will be small compared with unnecessarily treating many women with chemotherapy. However, technology is continually evolving as new discoveries are made, so by the time this study is finished, new data may indicate that a slightly different set of genes is even better at predicting outcomes. But without an infinite source of funding, the ability to launch additional studies will be limited.
EVIDENCE FOR COVERAGE
The lack of direct evidence for the value of diagnostics makes it difficult for insurers to make informed decisions about coverage for new tests. Performance characteristics of the test are often used to fill in a model of how the technology can detect a condition or change its management to give an improved health outcome (IOM, 2006). But such an approach can be too simplistic. For example, there is a test for variations in the gene that codes for the drug-metabolizing enzyme cytochrome P450 (CYP450). These variants can reduce or increase the enzyme’s ability to metabolize certain drugs, including the anticoagulant warfarin. Therefore, a person who has a variant gene might benefit by lowering or raising the doses of those drugs. But other factors also can affect drug metabolism. These factors include other enzymes, coexisting disease, age, diet, and interactions with other drugs (reviewed by Takahashi and Echizen, 2003). Given this complex scenario, it is not yet clear how useful a test for just one influence on drug metabolism will be for patients who take warfarin. To address this question, the Critical Path Institute, a public–private partnership (see Chapter 2), has begun a randomized study of an individualized, genotype-based warfarin-dosing regimen versus standard care (Feigal, 2006).
Assessing the Value of OncotypeDX and MammaPrint
Chemotherapy is currently recommended for most women with node-negative breast cancer that is greater than 1 cm or has unfavorable pathology. But studies show that chemotherapy offers only a modest improvement in the 10-year survival rate, especially for women with estrogen receptor (ER)-positive disease treated with hormonal therapy. Many women could be spared the significant side effects of chemotherapy if there was a way to discern whether they have tumors not likely to be significantly affected by such toxic therapies, either because they are relatively indolent tumors not likely to recur and spread, and/or because they are relatively insensitive to the effects of chemotherapy.
Initial findings from studies on OncotypeDX suggest that this 21-gene test can predict the risk of recurrence for node-negative, ER-positive breast tumors. The studies identified a large subset of patients (about 50 percent) who were at very low risk of dying from breast cancer within 10 years. In one study, chemotherapy lowered the risk of recurrence by nearly 30 percent in women with a high recurrence score, but it reduced this risk by only about 1 percent in women with a low recurrence score.
These findings suggest that combining the oncotype recurrence score with tumor grade and size, or using it instead of these traditional prognostic factors, might help physicians better determine which women are at high risk of having a breast cancer recurrence and therefore might benefit from having more aggressive chemotherapy in addition to hormonal therapy. But none of these studies on OncotypeDX was a large, prospective, randomized clinical study that would be most likely to accurately assess the utility of the diagnostic in a clinical setting. The National Cancer Institute recently launched such a study, which is expected to last 10 years (with an additional follow-up of 20 years after initial therapies), and it will enroll over 10,000 women at 900 sites in the United States and Canada. The Trial Assigning Individualized Options for Treatment (TAILORx) is designed mainly to evaluate the effect of chemotherapy (in addition to hormonal therapy) on women with ER-positive, node-negative breast cancers with recurrence scores in the intermediate range. These women will all receive hormonal therapy, but then they will be randomly assigned to receive chemotherapy or not in addition.
With support from the European Organisation for Research and Treatment of Cancer and an estimated 10M from Agendia, researchers are conducting another large, prospective, randomized clinical study of MammaPrint, a microarray test for a 70-gene expression signature that initial studies suggest is linked to breast cancer prognosis in women 60 years or younger with either ER-positive or ER-negative tumors. One study found that this gene signature outperformed traditional prognostic factors, such as tumor size and grade, in predicting recurrence within 10 years. To more fully assess this, the Microarray In Node-negative Disease may Avoid ChemoTherapy (MINDACT) study will randomly assign chemotherapy to half the women with breast cancers that appear to be at low risk of recurrence from their MammaPrint results, yet at high risk of recurrence based on traditional prognostic factors. Over 6,000 women will be followed for 6 years. The researchers will use the recurrence and survival rates for each treatment strategy to assess whether MammaPrint is more effective than standard prognostic factors in determining who will benefit the most from chemotherapy.
Although there is very little overlap in the genes assessed by these two tests, a recent study of 295 patient samples found highly concordant outcome predictions (about 80 percent) between the test results. This concordance likely occurs because the different gene sets reflect common cellular phenotypes and biological characteristics that are present in different groups of breast cancer patients, but the results do raise the question of whether biomarker tests should target genes that are at the origin of pathophysiological pathways, or the final genes that encode proteins that delineate the tumor phenotype.
SOURCES: Eifel et al., 2001; Goldhirsch et al., 2001; van de Vijver et al., 2002; Fisher et al., 2004; Paik et al., 2004, 2006; European Organisation for Research and Treatment of Cancer, 2005; Frantz, 2005; Fan et al., 2006; Habel et al., 2006; NCI, 2006.
Without evidence of clinical utility, many insurers are reluctant to cover the costs of innovative tests. Lack of coverage, in turn, often impedes their widespread adoption in clinical settings. But this poses a dilemma that was described in the Institute of Medicine report Saving Women’s Lives (IOM, 2005b):
… insurance coverage of the new technology would increase its use, providing both some of the resources needed for its developers to study its clinical value and more clinical experience with the new technology. Yet, once coverage is granted, there is little incentive (and more likely a disincentive) for companies to gather data and formally evaluate the clinical effectiveness of their new technology (p. 230).
Conditional coverage is one way to get around this dilemma. Conditional coverage by the Centers for Medicare & Medicaid Services (CMS) and other insurers could provide a means of collecting important data on the use, effectiveness, and value of biomarker tests before they are broadly adopted. Payors would agree to provisionally cover new tests with the proviso that, in the interim, data would be collected in conjunction with use of the test, to assess its clinical utility and value.
CMS has already used Coverage with Evidence Development (CED) for innovative diagnostic biomarker technologies, such as fluorodeoxyglucose positron emission tomography (FDG-PET) scanning for cancer diagnosis, staging, and monitoring. CMS first determined that the evidence was not persuasive that FDG-PET scanning was a useful technology in all cancers. But based on some studies suggesting the usefulness of FDG-PET in certain cancers as a biomarker for cancer staging, diagnosis, and monitoring, the federal agency decided to provide coverage for such use of the imaging technology, dependent on the mandatory collection of clinical data (CMS, 2005a).
According to draft guidance put out by CMS (CMS, 2005a, 2006), the agency considers CED to be particularly useful in the following situations relevant to cancer biomarkers:
To clarify the risks and benefits for off-label uses.
To clarify the risks and benefits of a diagnostic or treatment in specific patient subgroups that other clinical trials have not addressed but that comprise a sizable portion of Medicare beneficiaries.
To assess important outcomes, such as long-term risks and benefits, quality of life, costs, and other real-world outcomes that clinical studies have not addressed.
To assess the comparative effectiveness of new items and services compared with existing alternatives, if not already addressed in clinical studies.
To determine the clinical significance of statistically significant benefits documented in other studies.
But this draft document did not specify who will be required to collect or analyze the data needed for CED, or what funds will be used to support such efforts, probably because that will vary according to specific situations. For example, CED was used for coverage of off-label, unlisted uses of four drugs approved for colorectal cancer. CMS would cover such uses of these drugs only if patients enrolled in one of nine NCI-sponsored clinical trials. In this situation, NCI was responsible for gathering and analyzing the data. In contrast, CMS’s CED for implantable cardiodefibrillators for primary prevention of sudden cardiac death required the implanting physician to collect the data and enter them into an existing electronic data submission system present in all hospitals. The draft guidance notes that
Existing data systems should be used when available to avoid expending resources on creating new data systems. In addition, wherever possible, efforts should be made to use existing health information technology to support implementation of these studies. In many cases, it will be possible to link administrative data to data gathered for registries and practical trials, significantly expanding the value of the aggregate information collected and reducing the burden of data collection.
CMS will rely on the data collected to determine whether a given intervention is “reasonable and necessary for each patient who is the recipient of the item or service,” a fact sheet on the guidance states (CMS, 2005b). Once collected, both CMS and the public can use the data for research purposes.
CED will be applied only in the context of a national coverage determination. However, about 90 percent of Medicare’s coverage decisions are left to local carrier discretion; thus the agency expects that CED will be used infrequently (CMS, 2005b; for a review of local versus national coverage decisions, see IOM, 2001). CMS bases its legal authority for CED on its congressional mandate to provide payment only for items and services that are “reasonable and necessary” for the treatment of illness or injury. The agency claims that it needs to require CED when there is insufficient evidence to determine if a given intervention is both reasonable and necessary (CMS, 2005a, 2006).
Private insurers, however, are required to administer benefits according to the terms of the benefit plan. These plans typically exclude items or services that are deemed experimental and investigational, and they include no provision for the coverage of promising experimental interventions as evidence is developed. Some insurers have limited provision for coverage of promising experimental treatments for certain conditions (e.g., cancer, terminal illnesses) in clinical trials that meet certain qualifications. Although insurers may design benefit plans that provide CED, they may have difficulty justifying CED to plan sponsors and members. Some maintain that development of evidence of the effectiveness of an intervention is a public good that is not appropriate to the mission of private insurers, and others may question the affordability of benefit plans that provide CED (IOM, 2006). Although many insurers provide coverage of routine care costs of persons in clinical trials, they do not consider the cost of the experimental intervention itself or protocol-induced costs (costs of data collection and analysis solely for purposes of the clinical trial) as established, medically necessary treatment of the member’s disease. On the other hand, some contend that investing in CED can ultimately provide payoffs to health plans both in terms of cost savings stemming from reduced use of unnecessary technologies and better outcomes for patients. Whether the knowledge gained from CED studies is proprietary or should be a public good (all health plan members, providers, and the public can benefit from the knowledge) is an unresolved issue.
The committee recommends that CMS and other health care payors, including private insurers, develop criteria for temporary, conditional coverage, similar to the CED approach, of new biomarker tests in certain circumstances to facilitate controlled and limited use of a diagnostic with a therapeutic, and even more importantly, a screening biomarker test, until sufficient evidence can be gathered to make an informed decision about standard (permanent, nonprovisional) coverage. That is, a risk-sharing approach should be implemented in which payors would agree to preliminarily cover new tests in specified circumstances contingent on data collection, to assess the clinical utility and value of the test. This would mimic the cost and risk sharing of evidence development that occurs between technology sponsors and several national health care plans overseas. Such cost and risk sharing has enabled these plans to have high standards for evidence in their coverage decisions. For example, the United Kingdom’s National Health Service pays for a new drug at an agreed-on price, with the requirement that data on the drug’s effectiveness be collected in a patient registry. If the drug does not show effectiveness at the expected level, the drug’s price is reduced so that the total reimbursement over time reflects the actual quality of life gain observed (UK Department of Health, 2002). As noted above, private insurers may experience difficulty with CED based on their mandate and legal limitations, but it would be beneficial to examine and overcome these challenges. Because Medicare primarily covers patients who are older than 65, private insurers could make a very important contribution by collecting data on younger patient populations for whom cancer screening tests may yield the greatest gains in survival and reduced morbidity.
The committee also recommends that when conditional coverage is applied, the cost-effectiveness of biomarkers should be studied by independent research entities, in conjunction with the assessment of technology accuracy and clinical effectiveness. An independent, publicly funded information infrastructure to study and disseminate results on pharmaceutical cost-effectiveness has similarly been proposed recently (Reinhardt, 2004). Cost-effectiveness analysis (CEA) provides a framework for comparing the economic efficiencies of health care interventions. CEA reports the cost per quality-adjusted life-year gained. This is particularly important for screening biomarkers because of the costs and potential morbidity of false-positive results (reviewed by IOM, 2001).
Although CEA is generally not used explicitly in making coverage decisions in the United States and CMS is prohibited from using CEA in making coverage decisions, there is increasing demand for cost-effectiveness analyses of medical interventions because of the rapidly increasing costs of medical care. Some experts call cost-effectiveness the “fourth hurdle” in health care, after safety, efficacy, and quality, and in some countries it is used explicitly to make coverage decisions (IOM, 2006). CEA is becoming more relevant to health policy makers because of the increasing number of options for medical interventions combined with limited financial resources and the high costs of many new medical technologies and treatments, such as the new targeted therapies for cancers.
For example, the United Kingdom’s National Institute for Health and Clinical Excellence (NICE) recently decided against making two new targeted therapies, bevacizumab and cetuximab, available for the treatment of colorectal cancer on the National Health Service (NHS), arguing that neither drug is cost-effective. NICE reported that use of the drugs was not “compatible with the best use of NHS resources” because, although the treatments may extend life expectancy of some patients with advanced colorectal cancer by a few months, the average cost of treating a patient with the drugs was more than the NICE cost-effectiveness threshold of about £30,000 ($56,500) per life-year saved (NICE, 2006a). This figure is not an absolute ceiling, however. Taking into account the nature of the disease and quality of life provided by the drug, particularly the frequency and duration of remissions, NICE last year approved use of the drug imatinib, which targets certain types of leukemia and gastrointestinal tumors and can cost as much as £35,000 ($66,000) per year (NICE, 2006b).
Biomarker tests present another health care expense that could be a cost challenge for insurers. But their additional cost might be offset by the opportunity to better direct appropriate treatment and derive greater patient benefit for each health care dollar spent. Most cancer therapies benefit only a fraction of the patients for which they may be indicated (Spear et al., 2001). Appropriate patient selection via accurate diagnostic biomarker tests that predict responsiveness could substantially improve patient outcomes and thus increase the cost-effectiveness of treatment. Similarly, if biomarker-based screening tests could be developed to detect cancers at an earlier, more easily treated stage, these new biomarker technologies could have a substantial impact on the economic burden of cancer by reducing the cost of treatment, as well as the overall burden and consequence of disease.
But assessing the value of a biomarker diagnostic or screening test is difficult, given that such tests are intermediate steps in the patient care pathway. Because they usually trigger a cascade of decisions regarding further testing, prevention, or treatment, medical tests can have enormous influence on the ultimate costs and benefits of medical therapies. Although diagnostics account for only 1.6 percent of total Medicare costs, they influence 60 to 70 percent of downstream treatment decisions, one study found (The Lewin Group, Inc., 2005). Analytical modeling techniques may be necessary to evaluate the cost-effectiveness of new tests. Such techniques are frequently used by countries with government-funded medicine to determine how best to prioritize the health care services they provide (IOM, 2006). Modeling methods rely on the information available about the biology of disease and the effectiveness of possible interventions (IOM, 2005a). Studying the cost-effectiveness of new biomarker tests in the context of conditional coverage would facilitate methods development and help to ensure that CEA is done appropriately in the future.
Cost-effectiveness analyses assess the value of a medical treatment by noting its costs relative to its health benefits. In that way, one can choose an intervention for which the cost relative to the benefit is less than a threshold value. Health benefits are measured with an index called QALY, for quality-adjusted life years. This index combines measures of quality of life with length of life. The cost-effectiveness threshold commonly cited for medical interventions in the United States is between $50,000 and $100,000 per QALY (Meltzer, 2006). A cost-effectiveness analysis can be done from multiple perspectives (societal, patient, insurer, government, provider). The societal perspective is always preferred, but the committee also recommends analyzing cost-effectiveness from the insurer perspective because if the insurer and societal perspectives are in conflict (e.g., an intervention is deemed cost-effective from the societal perspective but not from the insurer perspective), there may be a role for policy makers to intervene so that the incentives align better.
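The comparison of cost to benefit against a threshold can be made concrete with a short Python sketch. It computes an incremental cost-effectiveness ratio (extra cost divided by extra QALYs) for a new strategy relative to a comparator, and checks it against a threshold using the equivalent net monetary benefit form, which also behaves sensibly when the new strategy is cheaper or less effective. The class and function names are hypothetical, and the costs and QALYs are invented for illustration; only the $50,000 to $100,000 range comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    cost: float   # mean cost per patient, in dollars
    qalys: float  # mean quality-adjusted life years per patient


def icer(new: Strategy, old: Strategy) -> float:
    """Incremental cost per QALY gained by `new` relative to `old`."""
    d_qalys = new.qalys - old.qalys
    if d_qalys == 0:
        raise ValueError("no difference in QALYs; ratio is undefined")
    return (new.cost - old.cost) / d_qalys


def is_cost_effective(new: Strategy, old: Strategy, threshold: float) -> bool:
    """True if the net monetary benefit at `threshold` is non-negative."""
    d_cost = new.cost - old.cost
    d_qalys = new.qalys - old.qalys
    return threshold * d_qalys - d_cost >= 0


# ASSUMPTION: all costs and QALYs below are hypothetical.
standard = Strategy("standard care", cost=20_000, qalys=8.00)
guided = Strategy("biomarker-guided care", cost=23_000, qalys=8.05)

print(f"ICER: ${icer(guided, standard):,.0f} per QALY")
for threshold in (50_000, 100_000):
    verdict = is_cost_effective(guided, standard, threshold)
    print(f"Cost-effective at ${threshold:,} per QALY: {verdict}")
```

With these invented inputs the same test falls on either side of the decision depending on which end of the threshold range is applied. The perspective of the analysis enters through which costs are counted in `cost`: an insurer perspective would include only costs the insurer bears, whereas a societal perspective would include all of them.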
Cost-effectiveness appraisals have many methodological limitations that can affect their accuracy, as several speakers at the IOM workshop on biomarkers pointed out (IOM, 2006). How valid they are depends in part on how accurately health outcomes and other relevant metrics can be measured. The analyses can be adversely affected by basing them on inadequately controlled studies, studies that don’t consider the most useful comparators, or studies that are not of sufficient duration to truly assess the health outcome of interest. The use of surrogate markers that do not adequately correlate with relevant health outcomes can also be a problem. In addition, quality-of-life measures can vary according to subpopulation, and cost assessments may not be sufficiently comprehensive. Despite their limitations, cost-effectiveness analyses are increasingly being used in the biomedical arena.
For example, to distinguish a treatable subgroup of brain cancer patients in the United Kingdom, NICE conducted a cost-effectiveness analysis to evaluate the use of cancer biomarker O6-methylguanine-DNA methyltransferase (MGMT) methylation status2 in glioma patients. Treatment with temozolomide, in addition to radiotherapy, surpasses NICE’s cost-effectiveness threshold only in the subgroup likely to respond, as indicated by MGMT methylation status. MGMT methylation status and other response-predicting biomarkers thus have the potential to refine disease and therapy and improve cost-effectiveness (Stevens, 2006).
The value of a diagnostic test, including a biomarker test, depends on how it is used. The cost-effectiveness of the Pap test falls substantially, for example, when it is used annually as opposed to every two or three years, because costs rise incrementally but benefits (years of life saved) rapidly plateau as screening frequencies increase (Eddy, 1990). Statistical analyses also reveal that self-selection of a medical treatment by patients occurs because they tend to opt out of a therapy when it is not effective. This self-selection can substantially improve the cost-effectiveness of the treatment (Meltzer, 2005). But most cost-effectiveness analyses consider only the costs and benefits of a diagnostic or treatment for the entire general population and do not consider self-selection.
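The effect of screening frequency can be illustrated by comparing each schedule with the next less intensive one, rather than with no screening at all. The Python sketch below does this for a set of schedules in which cost grows steadily with frequency while life-years gained level off. All figures are hypothetical and are not drawn from Eddy (1990); the function name is likewise an assumption, and the sketch omits refinements such as discounting and the exclusion of dominated strategies.

```python
# (schedule, cost per person, life-years gained per person)
# ASSUMPTION: every number here is invented for illustration only.
STRATEGIES = [
    ("no screening", 0.0, 0.0),
    ("every 3 years", 500.0, 0.0250),
    ("every 2 years", 750.0, 0.0265),
    ("annual", 1500.0, 0.0270),
]


def incremental_ratios(strategies):
    """Cost per life-year gained for each schedule versus the next
    cheaper one. Returns a list of (schedule, comparator, ratio)."""
    ordered = sorted(strategies, key=lambda s: s[1])  # sort by cost
    results = []
    for prev, cur in zip(ordered, ordered[1:]):
        d_cost = cur[1] - prev[1]
        d_life_years = cur[2] - prev[2]
        # No added benefit means the extra spending buys nothing.
        ratio = d_cost / d_life_years if d_life_years > 0 else float("inf")
        results.append((cur[0], prev[0], ratio))
    return results


for schedule, comparator, ratio in incremental_ratios(STRATEGIES):
    print(f"{schedule} vs {comparator}: ${ratio:,.0f} per life-year gained")
```

In this illustration each step up in frequency costs far more per life-year gained than the one before it, which is the pattern of rising costs and plateauing benefits described above.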
The cost-effectiveness of medical tests or treatments also can substantially drop if they are used incorrectly in the wrong populations. An example of this is the use of COX-2 inhibitors. Prior to the release of data showing their cardiovascular side effects, COX-2 inhibitors were shown to be relatively cost-effective drugs for patients at high risk of gastrointestinal bleeding. But the drugs were not cost-effective in people at low risk of such bleeding. However, most COX-2 inhibitors were used in the United States by people at low risk of bleeding, so the actual cost-effectiveness was poor because of how they were used (Meltzer, 2006). The cost-effectiveness of tests and interventions consequently needs to be evaluated not as they would be used under ideal circumstances, but as they are used in practice.
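The gap between cost-effectiveness under ideal use and in practice can be shown by weighting each patient group by its share of actual users. In the Python sketch below, an intervention looks favorable when given only to a high-risk group but much less so when most users are low risk. The function name, the costs, the QALY gains, and the 20/80 mix are all hypothetical and are not taken from Meltzer (2006).

```python
def cost_per_qaly(groups):
    """Aggregate cost per QALY across patient groups.

    `groups` is a list of (share_of_users, extra_cost, extra_qalys),
    with costs and QALYs expressed per patient in that group.
    """
    total_cost = sum(share * cost for share, cost, _ in groups)
    total_qalys = sum(share * qalys for share, _, qalys in groups)
    if total_qalys <= 0:
        raise ValueError("no net QALY gain across groups")
    return total_cost / total_qalys


# ASSUMPTION: hypothetical per-patient figures for two risk groups.
HIGH_RISK = (1000.0, 0.040)  # (extra cost, extra QALYs)
LOW_RISK = (1000.0, 0.002)

ideal_use = cost_per_qaly([(1.0, *HIGH_RISK)])
actual_use = cost_per_qaly([(0.2, *HIGH_RISK), (0.8, *LOW_RISK)])

print(f"High-risk patients only: ${ideal_use:,.0f} per QALY")
print(f"Observed mix of users:   ${actual_use:,.0f} per QALY")
```

The same intervention, at the same price, yields very different values depending on who receives it, which is why evaluation as used in practice matters.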
In addition to making coverage decisions, health insurers also need to set reimbursement rates for diagnostic tests that adequately reflect their value so that they are appropriately adopted in clinical settings. On one hand, when pricing is set too low, it discourages manufacturers from developing new and innovative tests. On the other hand, generous pricing encourages rapid uptake of the test, even if widespread clinical adoption may not be justified on the basis of the evidence of a test’s clinical validity or utility. When pricing is set too high, in contrast, it can impede the clinical adoption of a test. As noted in a recent Institute of Medicine (IOM) report on Medicare laboratory payment policy, “Theoretically, when prices do not reflect costs, they have the potential to inappropriately influence clinical decision making, inhibit innovation, waste taxpayer dollars, and limit beneficiary access to care” (IOM, 2000).
Medicare payment determinations for diagnostics not only affect the clinical care of its beneficiaries (one in seven patients in America) (Raab and Logue, 2001), but also influence state Medicaid and private insurers’ payment rates (IOM, 2000). However, many experts argue that the reimbursement levels for diagnostics set by Medicare do not adequately reflect their cost and clinical value, with some reimbursement rates set too high relative to value, while others are too low (reviewed by The Lewin Group, Inc., 2005; IOM, 2000). The IOM assessment on Medicare’s payment policy concluded (IOM, 2000):
Existing mechanisms for keeping payments up to date are inadequate . . . The process for integrating new technologies into the payment system, including determinations of coverage, assignment of billing codes, and development of appropriate prices, is slow, administratively inefficient, and closed to stakeholder participation…. Payments for some individual tests likely do not reflect the cost of providing services, and anticipated advances in laboratory technology will exacerbate the flaws in the current system. Problems with the outdated payment system could threaten beneficiary access to care and the use of enhanced testing methodologies in the future (pp. 7, 17).
These criticisms of Medicare’s payment policy are best understood in the historical context of how Medicare determines reimbursement rates for diagnostics. When Congress enacted a Medicare clinical laboratory fee schedule in 1984, it instituted rules that served to set these reimbursement rates below market value (SSA, 1984). For example, it set the reimbursement rates offered by each of its state-wide carriers for diagnostic tests to only 60 percent of the laboratory charge current at the time. It also specified that this limit be increased each year by the consumer price index (CPI), whose rate of growth is below the rate of medical inflation. Additional legislation created national limitation amounts (NLAs), which put a cap on payment for laboratory fees that is 74 percent of the median charges and froze pricing to 1997 levels until 2009 (ignoring CPI-indicated increases) as a means of balancing the budget (Raab and Logue, 2001; AdvaMed, 2006b).
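The arithmetic behind the claim that these rules hold reimbursement below market value can be sketched as follows. The 60 percent starting fraction and the CPI-based update come from the text above; the growth rates, the ten-year horizon, and the function name `fee_schedule_path` are assumptions made only for illustration. A pricing freeze can be modeled as a CPI rate of zero for the frozen years.

```python
def fee_schedule_path(initial_charge, cpi_rates, medical_rates, fee_fraction=0.60):
    """Track a fee-schedule amount and the underlying charge year by year.

    The fee starts at fee_fraction of the initial charge and grows with CPI;
    the charge is assumed to grow with medical inflation.
    Returns a list of (year_index, fee, charge).
    """
    fee = fee_fraction * initial_charge
    charge = initial_charge
    path = [(0, fee, charge)]
    for year, (cpi, med) in enumerate(zip(cpi_rates, medical_rates), start=1):
        fee *= 1.0 + cpi
        charge *= 1.0 + med
        path.append((year, fee, charge))
    return path


# Hypothetical rates: 3 percent CPI growth vs. 6 percent medical inflation.
path = fee_schedule_path(100.0, [0.03] * 10, [0.06] * 10)

for year, fee, charge in path[::5]:
    print(f"year {year:2d}: fee {fee:7.2f}, charge {charge:7.2f}, "
          f"fee/charge {fee / charge:.2f}")
```

Whenever the update index grows more slowly than the charges it is meant to track, the fee covers a shrinking share of the charge each year, even before any freeze is applied.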
Because congressional guidance was lacking on how to determine the reimbursement rate for new tests, Medicare developed its own administrative techniques for this, without public participation. One technique, called cross-walking, creates a reimbursement rate for new tests based on how clinically or technologically similar they are to older tests with established reimbursement rates. For example, its price determination for the new iron stain test for peripheral blood is equal to its price determination for the older iron stain test for bone marrow smears (Raab and Logue, 2001). The other technique is called gap filling, for which each state carrier uses its own rules to determine the appropriate price for new tests that cannot be cross-walked. These local carrier rates are used by Medicare to determine an NLA for the new test. Local carrier payment rates that are greater than the NLA are lowered to the NLA level. But those carrier rates that are less than the NLA are not raised to the higher NLA level (Raab and Logue, 2001).
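The asymmetry in gap filling, where the NLA acts as a ceiling but never as a floor, can be sketched in a few lines. The text does not spell out how the NLA is derived from local carrier rates for a new test; the sketch below assumes, purely for illustration, that it is a fixed fraction (74 percent, borrowing the figure cited earlier) of the median carrier rate. The carrier names, rates, and function names are hypothetical.

```python
from statistics import median


def national_limitation_amount(carrier_rates, cap_fraction=0.74):
    """Assumed rule: the NLA is cap_fraction of the median local carrier rate."""
    return cap_fraction * median(carrier_rates.values())


def apply_nla(carrier_rates, nla):
    """Lower rates above the NLA to the NLA; leave lower rates unchanged."""
    return {carrier: min(rate, nla) for carrier, rate in carrier_rates.items()}


# Hypothetical gap-filled rates set independently by three carriers.
local_rates = {"carrier_a": 20.0, "carrier_b": 50.0, "carrier_c": 80.0}

nla = national_limitation_amount(local_rates)
payments = apply_nla(local_rates, nla)

for carrier, rate in local_rates.items():
    print(f"{carrier}: local rate {rate:.2f} -> payment {payments[carrier]:.2f}")
```

Because the cap only ever lowers payments, the average payment across carriers can end up below the NLA itself.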
Critics cite problems with both techniques used to set prices for new tests. Both cross-walking and gap filling are inherently subjective and dependent on the technical expertise of CMS staff, which often lacks the ability to adequately judge the similarity of a new test to a test with pricing already established or to set a new fair price, some experts claim (Raab and Logue, 2001). This is compounded by another inherent problem, which is that cross-walking and gap filling are done internally, without consultation with outside experts or industry and without public commentary to correct any perceived arbitrariness or inaccuracy of final reimbursement rate determinations. Medicare and most other health plans lack test evaluation groups similar to the pharmacy and therapeutics committees they maintain to evaluate drugs (Ramsey et al., 2006). These committees are composed of quasi-independent experts who often are not health plan employees (Ramsey et al., 2006).
Another problem inherent in Medicare’s reimbursement system for laboratory tests is a coding process for such tests that is not sufficiently specific. For payors to more adequately influence the adoption of biomarker tests, those tests need to have their own Current Procedural Terminology (CPT) codes. These identifying codes are used to report medical procedures and services to health insurers and reimbursement rates are specified for each code. CPT codes are also used for developing guidelines for medical care review. Many biomarker tests do not have specific CPT codes but instead are defined by process steps, so that insurers, even if they are willing to scrutinize the clinical utility of biomarkers, often find it difficult to know what type of biomarkers are being used (IOM, 2006). This process enables biomarkers to be incorporated into clinical practice without much scrutiny.
This is especially true for homebrew tests, which are always defined by process steps. To be reimbursed, laboratories break down a homebrew test into the specific methods and analytes used, each with its own CPT code. A single test could entail 10 to 15 different existing codes, making it difficult for the payor to discern exactly what is being tested, and eliminating the risk of seeking a new CPT code and reimbursement rate for the test. Homebrew tests thereby bypass scrutiny by both regulators and reimbursers (IOM, 2006). Even when a test has been approved by the FDA, there is no guarantee that laboratories will use that test. Instead, they may offer their own homebrew version of the test, which may not be as accurate (IOM, 2006). Homebrew versions of the HercepTest help explain the high degree of variability in accuracy between laboratories. For example, studies suggest that the false-positive rate for the HercepTest is as much as 48 percent greater in small laboratories that use their own homebrew versions compared with large centralized reference laboratories that use the FDA-approved version of the test (Paik et al., 2002; Perez et al., 2006; Reddy et al., 2006).
In general, there is no standardized format for the information that insurers should consider when determining what diagnostic tests to code and reimburse and what the reimbursement rate will be for those tests. This is in contrast to the format developed by the Academy of Managed Care Pharmacy (AMCP) for the evidence-based evaluation of drugs (AMCP, 2005). This format specifies the types of information that insurers should request from industry about the drugs they manufacture when making policy determinations. This information includes the drug’s effectiveness and safety, its economic value relative to alternative treatments, and data on off-label indications. The AMCP format has been adopted by more than 50 health plans, hospitals, pharmacy benefit management programs, Medicaid programs, and other public agencies (Neumann, 2004). Because these guidelines are new, and because it is hard to define and measure their impact, data on the guidelines’ actual impact on patient outcomes are lacking (Neumann, 2004). However, AHRQ has provided some funding to evaluate the impact of the guidelines, focusing primarily on process issues (i.e., quality of submitted dossiers).3
A similar format for diagnostic evaluation would offer diagnostic companies an opportunity to participate in payor decision making, providing structure and transparency for the flow of information between these two entities regarding new laboratory tests. Researchers at the Fred Hutchinson Cancer Research Center and the University of Washington in Seattle created a template, based on the AMCP format, for manufacturers to report clinical and economic information about laboratory tests (Ramsey et al., 2006). Other standards for evaluating diagnostic tests have also been published (Fryback and Thornbury, 1991; Reid et al., 1995). Test manufacturers or providers and health insurers could all benefit from standardizing the way evidence about new diagnostics is presented to payors (Ramsey et al., 2006). This evidence could be presented to insurers’ standing committees for evaluating diagnostics akin to the pharmacy and therapeutics committees, or the test evaluation process could become an additional responsibility of already established pharmacy and therapeutics committees with the addition of appropriate expertise.
The committee recommends that CMS modernize the process for evaluating, coding, and pricing diagnostic tests. Reimbursement policies should be clarified and the decision-making process should be made more uniform and transparent. CMS should convene stakeholders to develop consensus guidance on how to assess diagnostics to make coverage/reimbursement decisions. As previously recommended by an IOM report (IOM, 2000), Medicare ought to have “a single, national, rational fee schedule” for clinical laboratory tests that is based on the review of the tests by expert panels.
3 See http://www.ahrq.gov/rice/ceproj.htm#Evaluation. Accessed November 2006.
Similar reforms are called for in the Advanced Laboratory Diagnostics Act of 2006. This bill aims to improve the current process for determining reimbursement levels for new clinical diagnostics, correct historic payment determinations, and provide more transparency and opportunities for dialogue regarding Medicare reimbursement decisions. The Medicare Prescription Drug, Improvement, and Modernization Act, enacted in 2003, also calls for some yet-to-be-implemented improvements in the coding and payment processes for new tests (AdvaMed, 2006a).
The committee also recommends that CMS use the power of its longitudinal data to assess the value of tests. Although CMS is prohibited from using clinical value as a criterion for reimbursement, assessing the clinical value of tests would aid clinical decision making.
SUMMARY AND CONCLUSIONS
Major impediments to achieving personalized medicine by implementing innovative biomarker-based cancer diagnostics in clinical settings are a lack of information about their clinical validity and utility, the inability of many diagnostic companies to expend the major resources necessary to provide this information, and inappropriate reimbursement for such diagnostics by health care payors because of an antiquated system for setting reimbursement rates.
To overcome these impediments, the committee recommends that insurers develop criteria for conditional coverage of new biomarker tests in certain circumstances, in order to allow controlled use of the tests while collecting additional information to inform final coverage decisions. The approach to conditional coverage should include the development of methods for high-quality population-based assessments of the efficacy and cost-effectiveness of biomarker tests. In addition, the committee recommends that the CMS coding and pricing system for diagnostic tests be modernized so that it more adequately fosters the appropriate reimbursement for and use of diagnostic tests.
REFERENCES
AdvaMed. 2006a. AdvaMed Hails Bipartisan Legislation to Ensure Medicare Patient Access to New Advanced Diagnostic Laboratory Tests. [Online]. Available: http://www.advamed.org/publicdocs/PR-332.htm [accessed September 2006].
——. 2006b. Clinical Laboratory Diagnostic Tests Policy Milestones, 1984-2006. [Online]. Available: http://www.advamed.org/publicdocs/clinical_lab_test_1984-2006.pdf [accessed June 5, 2006].
AMCP (Academy of Managed Care Pharmacy). 2005. The AMCP Format for Formulary Submissions.
Carlos RC, Underwood W 3rd, Fendrick AM, Bernstein SJ. 2005. Behavioral associations between prostate and colon cancer screening. Journal of the American College of Surgeons 200(2):216-223.
CDC (Centers for Disease Control and Prevention). 2006. Risk of Mortality from Prostate Cancer Among Men in a Randomized Trial. [Online]. Available: http://www.cdc.gov/cancer/prostate/screening/slides/slide20.htm [accessed July 10, 2006].
CMS (Centers for Medicare & Medicaid Services). 2005a. Draft guidance for the public, industry, and CMS staff. Factors CMS Considers in Making Determination of Coverage with Evidence Development.
——. 2005b. Fact Sheet: CMS responds to stakeholder feedback regarding coverage with evidence development.
——. 2006. Guidance for the Public, Industry, and CMS Staff, National Coverage Determinations with Data Collection as a Condition of Coverage: Coverage with Evidence Development. [Online]. Available: http://www.cms.hhs.gov/mcd/ncpc_view_document.asp?id=8 [accessed September 27, 2006].
Eddy D. 1990. Screening for cervical cancer. Annals of Internal Medicine 113(3):214-226.
Eifel P, Axelson JA, Costa J, Crowley J, Curran WJ Jr, Deshler A, Fulton S, Hendricks CB, Kemeny M, Kornblith AB, Louis TA, Markman M, Mayer R, Roter D. 2001. National Institutes of Health Consensus Development Conference Statement: Adjuvant therapy for breast cancer, November 1–3, 2000. Journal of the National Cancer Institute 93(13):979-989.
European Organisation for Research and Treatment of Cancer. July 7, 2005. Microarray In Node Negative Disease may Avoid ChemoTherapy (MINDACT).
Fan C, Oh DS, Wessels L, Weigelt B, Nuyten DS, Nobel AB, van’t Veer LJ, Perou CM. 2006. Concordance among gene-expression-based predictors for breast cancer. New England Journal of Medicine 355(6):560-569.
FDA (Food and Drug Administration). 2001. Agency information collection activities; submission for OMB review; comment request; medical devices; classification/ reclassification; restricted devices: Analyte specific reagents. Notice. Federal Register 66:1140-1141.
Feigal E. 2006. Partnerships to Accelerate Innovation. Presentation at the meeting of the National Cancer Policy Forum. Washington, DC.
Feinstein AR. 2002. Misguided efforts and future challenges for research on “diagnostic tests.” Journal of Epidemiology and Community Health 56(5):330-332.
Fisher B, Jeong JH, Bryant J, Anderson S, Dignam J, Fisher ER, Wolmark N. 2004. Treatment of lymph-node-negative, oestrogen-receptor-positive breast cancer: Long-term findings from National Surgical Adjuvant Breast and Bowel Project randomised clinical trials. Lancet 364(9437):858-868.
Frantz S. 2005. An array of problems. Nature Reviews Drug Discovery 4:362-363.
Fryback DG, Thornbury JR. 1991. The efficacy of diagnostic imaging. Medical Decision Making 11(2):88-94.
Goldhirsch A, Glick JH, Gelber RD, Coates AS, Senn HJ. 2001. Meeting highlights: International Consensus Panel on the Treatment of Primary Breast Cancer. Seventh International Conference on Adjuvant Therapy of Primary Breast Cancer. Journal of Clinical Oncology 19(18):3817-3827.
Habel LA, Shak S, Jacobs MK, Capra A, Alexander C, Pho M, Baker J, Walker M, Watson D, Hackett J, Blick NT, Greenberg D, Fehrenbacher L, Langholz B, Quesenberry CP. 2006. A population-based study of tumor gene expression and risk of breast cancer death among lymph node-negative patients. Breast Cancer Research 8(3):R25.
IOM (Institute of Medicine). 2000. Medicare Laboratory Payment Policy: Now and in the Future. Wolman DM, Kalfoglou AL, LeRoy L, eds. Washington, DC: National Academy Press.
——. 2001. Mammography and Beyond: Developing Technologies for the Early Detection of Breast Cancer. Nass SJ, Henderson IC, Lashof JC, eds. Washington, DC: National Academy Press.
——. 2005a. Economic Models of Colorectal Cancer Screening in Average-Risk Adults: Workshop Summary. Pignone M, Russell L, Wagner J, eds. Washington DC: The National Academies Press.
——. 2005b. Saving Women’s Lives. Joy JE, Penhoet EE, Petitti DB, eds. Washington, DC: The National Academies Press.
——. 2006. Developing Biomarker-Based Tools for Cancer Screening, Diagnosis, and Treatment: The State of the Science, Evaluation, Implementation, and Economics. A Workshop. Patlak M, Nass S, rapporteurs. Washington DC: The National Academies Press.
The Lewin Group, Inc. July 2005. The Value of Diagnostics Innovation, Adoption and Diffusion into Health Care. AdvaMed. [Online]. Available: http://www.advamed.org/publicdocs/thevalueofdiagnostics.pdf [accessed July 2006].
Meltzer D. 2005. Effects of patient self-selection on cost-effectiveness: Implications for intensive therapy for diabetes. Society for Medical Decision Making.
——. 2006. Cost effectiveness analysis and the value of research. Presentation at the IOM workshop on Developing Biomarker-based Tools for Cancer Screening, Diagnosis, and Treatment: The State of the Science, Evaluation, Implementation, and Economics. Washington, DC.
NCI (National Cancer Institute). 2006. Personalized treatment trial for breast cancer launched. Washington DC.
Neumann PJ. 2004. Evidence-based and value-based formulary guidelines. Health Affairs 23(1):124-134.
NICE (U.K. National Institute for Clinical Excellence). 2006a. Final Appraisal Determination: Bevacizumab and Cetuximab for Metastatic Colorectal Cancer. [Online]. Available: http://pharmalive.com/news/download.cfm?articleid=366733&attachmentid=54419 [accessed August 2006].
Paik S, Bryant J, Tan-Chiu E, Romond E, Hiller W, Park K, Brown A, Yothers G, Anderson S, Smith R, Wickerham DL, Wolmark N. 2002. Real-world performance of HER2 testing—National Surgical Adjuvant Breast and Bowel Project experience. Journal of the National Cancer Institute 94(11):852-854.
Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Walker MG, Watson D, Park T, Hiller W, Fisher ER, Wickerham DL, Bryant J, Wolmark N. 2004. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. New England Journal of Medicine 351(27):2817-2826.
Paik S, Tang G, Shak S, Kim C, Baker J, Kim W, Cronin M, Baehner FL, Watson D, Bryant J, Costantino JP, Geyer CE Jr, Wickerham DL, Wolmark N. 2006. Gene expression and benefit of chemotherapy in women with node-negative, estrogen receptor-positive breast cancer. Journal of Clinical Oncology 24(23):3726-3734.
Perez EA, Suman VJ, Davidson NE, Martino S, Kaufman PA, Lingle WL, Flynn PJ, Ingle JN, Visscher D, Jenkins RB. 2006. HER2 testing by local, central, and reference laboratories in specimens from the North Central Cancer Treatment Group N9831 intergroup adjuvant trial. Journal of Clinical Oncology 24(19):3032-3038.
Raab GG, Logue LJ. 2001. Medicare coverage of new clinical diagnostic laboratory tests: The need for coding and payment reforms. Clinical Leadership & Management Review 15(6):376-387.
Ramsey S, Veenstra D, Garrison L, Carlson R, Billings P, Carlson J, Sullivan S. 2006. Toward evidence-based assessment for coverage and reimbursement of laboratory-based diagnostic and genetic tests. The American Journal of Managed Care 12(4):21-27.
Reddy JC, Reimann JD, Anderson SM, Klein PM. 2006. Concordance between central and local laboratory HER2 testing from a community-based clinical study. Clinical Breast Cancer 7(2):153-157.
Reid MC, Lachs MS, Feinstein AR. 1995. Use of methodological standards in diagnostic test research. Getting better but still not good. Journal of the American Medical Association 274(8):645-651.
Reinhardt UE. 2004. An information infrastructure for the pharmaceutical market. Health Affairs 23(1):107-112.
Research Triangle Institute. 2002. Guide to Clinical Preventive Services, Evidence Syntheses. 3rd edition. AHRQ (Agency for Healthcare Research and Quality) (16):3-8.
Rogowski W. 2007. Current impact of gene technology on health care: A map of economic assessments. Health Policy 80(2):340-357.
Spear BB, Heath-Chiozzi M, Huff J. 2001. Clinical application of pharmacogenetics. Trends in Molecular Medicine 7:201-204.
SSA (Social Security Administration). 1984. Deficit Reduction Act of 1984: Provisions related to the Medicare and Medicaid programs. Social Security Bulletin 47(11):11-25.
Stevens A. 2006. Cost effectiveness analysis and technology adoption in the United Kingdom. Presentation at the IOM workshop on Developing Biomarker-based Tools for Cancer Screening, Diagnosis, and Treatment: The State of the Science, Evaluation, Implementation, and Economics. Washington, DC.
Swan J, Breen N, Coates RJ, Rimer BK, Lee NC. 2003. Progress in cancer screening practices in the United States: Results from the 2000 National Health Interview Survey. Cancer 97(6):1528-1540.
Takahashi H, Echizen H. 2003. Pharmacogenetics of CYP2C9 and interindividual variability in anticoagulant response to warfarin. The Pharmacogenomics Journal 3(4):202-214.
UK Department of Health. 2002. Drug Treatment for Multiple Sclerosis (Beta-Interferons and Glatiramer Acetate)—Risk Sharing Scheme. [Online]. Available: http://www.dh.gov.uk/PolicyAndGuidance/OrganisationPolicy/PrimaryCare/PrimaryCareTrusts/PrimaryCareTrustsArticle/fs/en?CONTENT_ID=4000556&chk=nvlrjJ [accessed September 2006].
van de Vijver MJ, He YD, van’t Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, Parrish M, Atsma D, Witteveen A, Glas A, Delahaye L, van der Velde T, Bartelink H, Rodenhuis S, Rutgers ET, Friend SH, Bernards R. 2002. A gene-expression signature as a predictor of survival in breast cancer. New England Journal of Medicine 347(25):1999-2009.
Weinstein S, Obuchowski NA, Lieber ML. 2005. Clinical evaluation of diagnostic tests. American Journal of Roentgenology 184(1):14-19.