Important Points Emphasized by Individual Speakers
- The increasing number of molecular diagnostic tests calls for the integration of clinical practice guidelines into routine oncology practice.
- Incentives for diagnostics that focus on outcomes, rather than on per-unit reimbursement, will enable providers to improve patient responses and reduce the high costs associated with cancer care.
- A provisional period of several years during which payers cover part of the costs of a molecular diagnostic’s use would allow additional evidence to be gathered, after which the test could be accepted only if it produces substantial improvements in health outcomes.
- Situations may arise when a randomized controlled clinical trial is not needed for the approval, use, and reimbursement of a biomarker, but these situations must be chosen with great care.
- The term “clinical utility” does not mean much to most patients; a more relevant concept is “personal utility” or “personal guidance.”
Multiple stakeholders are involved in the generation, analysis, and use of evidence for clinical utility of molecular diagnostics in oncology. Representatives of five stakeholder groups—guideline developers, health care providers, payers, academic health systems, and patients—offered perspectives on the challenges that need to be overcome to assess the value of molecular diagnostics.
Clinical guidelines are not prescriptions for care, said Al Benson, professor of medicine and associate director for clinical investigations at the Robert H. Lurie Comprehensive Cancer Center, Northwestern University. They are tools to inform decision making between the individual patient and clinician. In oncology, in particular, physicians are confronted with a growing list of biomarkers and treatment options across multiple diseases. As the number of biomarkers grows and understanding evolves, the integration of guidelines into routine oncology practice will be increasingly important.
Benson discussed three concepts related to the evaluation of medical technology, drawing on the analysis of Archie Cochrane (1972):
- Efficacy is the extent to which an intervention does more good than harm under ideal circumstances—that is, in circumstances designed to maximize the effect of the intervention and eliminate confounding factors. Considerations of efficacy address the question, will it work?
- Effectiveness is the extent to which an intervention does more good than harm when provided to real-world patients by physicians practicing in ordinary clinical settings. The relevant question here is, does it work in practice?
- Efficiency measures the effect of an intervention in relation to the resources it consumes. In other words, is it worth it?
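Health economists often put a number on the efficiency question ("is it worth it?") with an incremental cost-effectiveness ratio (ICER). The ICER is the extra cost of a new strategy divided by the extra health effect it produces. Benson did not discuss the ICER specifically, and the strategies and figures below are hypothetical. This is a minimal sketch of the arithmetic only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """A hypothetical care strategy summarized by per-patient cost and effect."""
    name: str
    cost: float    # mean cost per patient, in dollars
    effect: float  # mean health effect per patient, e.g. quality-adjusted life-years


def icer(new: Strategy, old: Strategy) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_effect = new.effect - old.effect
    if delta_effect == 0:
        raise ValueError("equal effects: the ICER is undefined")
    return (new.cost - old.cost) / delta_effect


# Illustrative numbers only, not data from the workshop.
usual = Strategy("usual care", cost=40_000, effect=1.50)
guided = Strategy("test-guided care", cost=52_000, effect=1.75)
print(f"{icer(guided, usual):,.0f} dollars per QALY gained")  # 48,000 dollars per QALY gained
```

The calculation only reports how much health a strategy buys per dollar. Whether a given ratio is "worth it" is still a judgment made by patients, payers, and society.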
Benson also distinguished two principal types of evidence-based guidelines. The first category consists of integrated interventions over time, sometimes called a continuum-of-care approach. This category, which Benson covered in his presentation, includes the many hundreds of decision points reached in guiding treatment decisions. The second category consists of systematic reviews of single issues, which are described later in this chapter. The two are not mutually exclusive. Rather, the continuum-of-care guidelines help identify important areas for both systematic reviews and additional clinical research. Likewise, the results of systematic reviews of single issues can be integrated into continuum-of-care guidelines.
For a biomarker to progress to a clear clinical test, it should have significant and independent value and be validated by clinical testing, Benson said. Its use also needs to be feasible, reproducible, and widely available with quality control. Finally, use of the biomarker should benefit the patient. “Too often,” said Benson, “tests are ordered without clear benefit or understanding of how the tests will be used to inform a decision-making process.”
For an assay to have clinical utility, it must improve clinical decision making and patient outcomes. Measures of effectiveness include the probability of achieving a cure, the impact on survival, the impact on disease control, the impact on improving performance status, and the impact on disease-related symptom control. These outcomes often depend on the clinical situation, the availability of effective therapies, and the magnitude of the clinical benefit (or lack thereof) in one group versus another. They also depend on the relative values that patients, caregivers, and society place on the differences in the benefits and risks, including the benefits and risks that occur during continued surveillance of patients over time. These perceptions of benefits and risks can vary greatly and are often marked by a lack of understanding on the part of both patients and clinicians, Benson said.
Modern marker-generated clinical trial designs seek to answer complex questions regarding utility. For example, does the presence of a marker imply one kind of treatment while its absence implies another? If so, the marker has potential clinical utility. Or does marker status make no difference to the effects of a given treatment, in which case the marker may not have clinical utility in that setting? A common situation, said Benson, is that a marker is prognostic and identifies risk but does not predict that an intervention will benefit the patient; in that case it has limited clinical utility.
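Benson's distinction between prognostic and predictive markers comes down to comparing the treatment effect within each marker stratum. The sketch below uses hypothetical response rates and a fixed tolerance. Real trials would instead test a treatment-by-marker interaction statistically, so this only illustrates the logic.

```python
def treatment_effects(rates: dict[tuple[str, str], float]) -> tuple[float, float]:
    """Treatment effect (treated minus control response rate) in
    marker-positive and marker-negative patients."""
    effect_pos = rates[("pos", "treated")] - rates[("pos", "control")]
    effect_neg = rates[("neg", "treated")] - rates[("neg", "control")]
    return effect_pos, effect_neg


def classify_marker(rates: dict[tuple[str, str], float], tol: float = 0.05) -> str:
    effect_pos, effect_neg = treatment_effects(rates)
    # Predictive: the benefit of treatment depends on marker status.
    if abs(effect_pos - effect_neg) > tol:
        return "predictive"
    # Prognostic only: outcomes differ by marker even without treatment,
    # but treatment helps both groups about equally.
    if abs(rates[("pos", "control")] - rates[("neg", "control")]) > tol:
        return "prognostic only"
    return "no apparent utility"


# Hypothetical response rates keyed by (marker status, trial arm).
prognostic_example = {("pos", "control"): 0.20, ("pos", "treated"): 0.35,
                      ("neg", "control"): 0.50, ("neg", "treated"): 0.65}
predictive_example = {("pos", "control"): 0.20, ("pos", "treated"): 0.60,
                      ("neg", "control"): 0.20, ("neg", "treated"): 0.22}
print(classify_marker(prognostic_example))  # prognostic only
print(classify_marker(predictive_example))  # predictive
```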
Challenges in demonstrating clinical utility are already evident in oncology, noted Benson. As common cancers are broken down into smaller subsets, trials with smaller numbers of patients will become more common, and evidence is likely to be limited. Decisions regarding evidentiary standards will become paramount in situations where large randomized controlled trials (RCTs) are no longer realistic. When the number of eligible patients is limited, researchers will have to rely on national databases to gain access to larger numbers of patients. Larger patient numbers may yield stronger evidence for an intervention, but they will also increase the expense of screening eligible populations. Reliance on tumor banks of appropriate tissues, whether metastatic or primary, will also grow, perhaps with serial biopsies and the use of multiple markers over time to deal with tumor heterogeneity.
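The screening cost Benson mentions follows from simple arithmetic. To enroll a fixed number of marker-positive patients, the expected number screened is that target divided by the marker's prevalence. The sketch assumes every screened sample yields a usable result, which real trials would not achieve. The enrollment target and prevalence values are hypothetical.

```python
def expected_screening(target_enrollment: int, marker_prevalence: float) -> tuple[float, float]:
    """Expected patients screened, and expected marker-negative patients
    screened out, to enroll target_enrollment marker-positive patients."""
    if not 0 < marker_prevalence <= 1:
        raise ValueError("marker_prevalence must be in (0, 1]")
    screened = target_enrollment / marker_prevalence
    return screened, screened - target_enrollment


# Hypothetical target of 200 marker-positive patients as subsets shrink.
for prevalence in (0.50, 0.20, 0.05):
    screened, negatives = expected_screening(200, prevalence)
    print(f"prevalence {prevalence:.0%}: screen ~{screened:,.0f} "
          f"(~{negatives:,.0f} marker-negative)")
```

With a 5 percent marker prevalence, roughly 4,000 patients must be screened to enroll 200. About 3,800 of those screens go to marker-negative patients who cannot join the trial.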
TABLE 3-1 Specialties Represented on NCCN Multidisciplinary Panels
| • Medical oncology | • Interventional radiology |
| • Surgery/surgical oncology | • Nursing |
| • Radiation oncology | • Cancer genetics |
| • Hematology/hematology oncology | • Psychiatry, psychology |
| • Bone marrow transplantation | • Pulmonary medicine |
| • Urology | • Pharmacology/pharmacy |
| • Neurology/neuro-oncology | • Infectious diseases |
| • Gynecologic oncology | • Allergy/immunology |
| • Otolaryngology | • Anesthesiology |
| • Orthopedics/orthopedic oncology | • Cardiology |
| • Pathology | • Geriatric medicine |
| • Dermatology | • Epidemiology |
| • Internal medicine | • Patient advocacy |
| • Gastroenterology | • Palliative, pain management |
| • Endocrinology | • Pastoral care |
| • Diagnostic radiology | • Oncology social work |
SOURCE: Al Benson, workshop presentation, May 24, 2012. Derived from National Comprehensive Cancer Network (http://www.nccn.org/clinical.asp; accessed August 11, 2012).
Guideline Development in the National Comprehensive Cancer Network
The National Comprehensive Cancer Network (NCCN) seeks evidence-based consensus to develop comprehensive guidelines for care ranging from prevention and screening to survivorship and hospice care, Benson stated. In many areas of treatment, high-level evidence exists, but in other areas gaps in evidence must be filled by expert consensus. Achieving this consensus requires the use of multidisciplinary panels representing a broad range of specialties (see Table 3-1). In examining the use of biomarkers, these panels evaluate the data demonstrating that the biomarker affects treatment decisions, the evidence that the biomarker can divide patients into clinically relevant subgroups, and the availability of reliable testing. They determine the levels of evidence using the results of tumor marker studies, taking into consideration whether the studies were prospective or retrospective, whether they used archived samples or were observational, and whether validation studies were available. The NCCN then classifies the test into one of the following categories on the basis of the level of evidence and the degree of consensus:
- Category 1: On the basis of high-level evidence, there is uniform NCCN consensus that the intervention is appropriate.
- Category 2A: On the basis of lower-level evidence, there is uniform NCCN consensus that the intervention is appropriate.
- Category 2B: On the basis of lower-level evidence, there is NCCN consensus that the intervention is appropriate.
- Category 3: On the basis of any level of evidence, there is a major NCCN disagreement that the intervention is appropriate.
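The four category definitions above behave like a lookup on two inputs: the level of evidence and the degree of panel consensus. The sketch below encodes them that way. The enum names are hypothetical. Returning None for combinations the definitions do not mention, such as high-level evidence without uniform consensus, is an assumption and not an NCCN rule.

```python
from enum import Enum
from typing import Optional


class Evidence(Enum):
    HIGH = "high-level"
    LOWER = "lower-level"


class Consensus(Enum):
    UNIFORM = "uniform consensus"
    CONSENSUS = "consensus, but not uniform"
    MAJOR_DISAGREEMENT = "major disagreement"


def nccn_category(evidence: Evidence, consensus: Consensus) -> Optional[str]:
    """Map evidence level and panel consensus to a category using the four
    listed definitions; None for combinations they do not cover."""
    if consensus is Consensus.MAJOR_DISAGREEMENT:
        return "3"  # applies at any level of evidence
    if evidence is Evidence.HIGH and consensus is Consensus.UNIFORM:
        return "1"
    if evidence is Evidence.LOWER and consensus is Consensus.UNIFORM:
        return "2A"
    if evidence is Evidence.LOWER and consensus is Consensus.CONSENSUS:
        return "2B"
    return None  # e.g., high-level evidence without uniform consensus


print(nccn_category(Evidence.LOWER, Consensus.UNIFORM))  # 2A
```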
“Unfortunately, in oncology the minority of decisions are based on a category 1 level of evidence,” said Benson. “The majority of the guidelines represent category 2A, which is based on a lower level of evidence, but uniform consensus.”
Another NCCN tool evaluates evidence derived from the use of archived tissues to determine the clinical validity of tumor markers. The tool considers such factors as the clinical trial design; the patients studied; specimen collection, processing, and archival; statistical design and analysis; and validation.
Using these and other tools, the NCCN can integrate markers into guidelines. For example, for colon cancer, KRAS or BRAF mutation testing has been integrated and linked with pathologic review, so there is guidance about the testing and the most appropriate methodology. Similarly, for metastatic melanoma, recent targeted therapies have reached the category 1 level of evidence, with a specific recommendation for use based on BRAF mutation testing.
The NCCN is also working on a biomarkers compendium. This collection is intended to ensure access to appropriate testing as recommended by NCCN guidelines. It seeks to identify the utility of a biomarker to screen, diagnose, monitor, or provide predictive or prognostic information. It is also meant to discriminate between clinically useful biomarkers and those that are not clinically indicated. More than 800 biomarkers are currently included in NCCN guidelines, and all will be integrated into the compendium, said Benson. The reference document will include the indication, molecular abnormality, test purpose, methodology, NCCN level of evidence, specimen types, and NCCN recommendation.
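One way to picture a compendium entry is as a record with one field for each item Benson listed. The sketch below is hypothetical: the field names, the `Purpose` values, and the query helper are illustrative. Apart from the colon cancer KRAS example mentioned above, the entry values are placeholders, not actual NCCN content.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    SCREENING = "screen"
    DIAGNOSIS = "diagnose"
    MONITORING = "monitor"
    PREDICTIVE = "predictive"
    PROGNOSTIC = "prognostic"


@dataclass(frozen=True)
class CompendiumEntry:
    """One hypothetical compendium record, one field per listed item."""
    indication: str
    molecular_abnormality: str
    test_purpose: Purpose
    methodology: str
    nccn_category: str               # "1", "2A", "2B", or "3"
    specimen_types: tuple[str, ...]
    recommendation: str              # e.g. "recommended" or "not indicated"


def clinically_indicated(entries, indication: str, purpose: Purpose):
    """Entries for an indication and purpose that the guideline recommends."""
    return [e for e in entries
            if e.indication == indication
            and e.test_purpose is purpose
            and e.recommendation == "recommended"]


# Placeholder values, not real NCCN compendium content.
compendium = [
    CompendiumEntry("colon cancer", "KRAS mutation", Purpose.PREDICTIVE,
                    "<method>", "<category>", ("<specimen>",), "recommended"),
    CompendiumEntry("colon cancer", "<marker X>", Purpose.PROGNOSTIC,
                    "<method>", "<category>", ("<specimen>",), "not indicated"),
]
print([e.molecular_abnormality
       for e in clinically_indicated(compendium, "colon cancer", Purpose.PREDICTIVE)])
```

Keeping "not indicated" entries in the same structure, rather than omitting them, mirrors the compendium's stated aim of distinguishing useful biomarkers from those that are not clinically indicated.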
Today, people are making decisions on the basis of incomplete datasets, Benson said.1 Many medical devices, not just molecular tests, enter the market with insufficient information, yet testing these devices adequately is an enormous challenge. Clinical trials become complicated as populations are continuously subdivided, adding such expenses as screening for marker-positive and marker-negative individuals and evaluating markers over time to judge whether tumor biology is changing. “That comes at enormous cost. Who is going to pay for that?” asked Benson. No one group can cover this work. A major commitment of patients, insurers, government, public and private institutions, and clinicians will be needed to foster partnerships aimed at innovation and technology development, Benson concluded.
1 Although not discussed in this workshop, prior Roundtable workshops (IOM, 2011a, 2012a) have examined the significant challenges facing guideline development, including the inherent tension that exists between the need for greater certainty regarding benefits and risks and providing early access to innovative technologies.
Development of Guidelines by the American Society of Clinical Oncology
Like Benson, Gary Lyman, professor of medicine and director of comparative effectiveness and outcomes research–oncology at the Duke University School of Medicine and the Duke Cancer Institute, observed that clinical practice guideline recommendations face particular challenges with molecular diagnostics. Many tests are already in existence; new tests and data are emerging rapidly; and analytic validity, clinical validity, and clinical utility all have to be established.
The American Society of Clinical Oncology (ASCO) has the goal of producing valid, reliable, and useful clinical practice guideline recommendations, Lyman said. In deciding whether to take on a guideline topic, ASCO asks several questions:
- Is the burden or the importance of the condition or intervention large enough to warrant guideline development?
- Is there uncertainty or controversy about the effectiveness or safety of available clinical strategies for the condition?
- Is there sufficient variation in practice in the management of a given condition or use of an intervention?
- Is there sufficient scientific evidence of good quality to allow guideline development?
- Is there potential for an impact on clinical decision making, clinical outcomes, or practice variation?
Once guideline development is initiated, ASCO bases its recommendations on exhaustive, systematic reviews overseen by a steering committee. Using well-defined inclusion and exclusion criteria for studies, it conducts quality appraisals and undertakes a formal data-abstraction process. It then places all the data before a guideline panel of content and methodology experts, patient representatives, and sometimes members of industry to generate its guidance. Draft guidance undergoes multiple internal and external reviews prior to being finalized. The recommendations are then disseminated through publication in the society’s Journal of Clinical Oncology and the Journal of Oncology Practice and by various other means. Recommendations do not go out for public review before publication.
This process meets most of the recommendations for the development of clinical practice guidelines established by the Institute of Medicine (IOM) (2011b,c), said Lyman. The process is transparent, conflicts of interest are disclosed and managed, expert panels are multidisciplinary, reviews are rigorous and systematic, the format for recommendations is standardized and clear, and external review takes place. The one major area where the process falls short of the IOM’s recommendations is in the development of a formal rating of the strength of the evidence and the strength of the recommendation, which is “a controversial area within the field,” according to Lyman.
Biomarkers pose particular challenges to the guideline development process, said Lyman. Biomarkers are complex, as are the data describing them. The types of prognostic and predictive biomarker studies that have been done vary widely, and most biomarker studies are retrospective rather than prospective. Demonstrating clinical validity or clinical utility becomes difficult. All these factors create major obstacles for developing and updating evidence-based guidelines for biomarkers.
To date, ASCO’s recommendations around biomarkers have been limited, Lyman noted. The focus has tended to be on the analytic validity of a number of tests that are currently used in practice, such as HER2 testing and immunohistochemical testing of estrogen/progesterone receptors in breast cancer. Other tumor biomarkers have been discussed by ASCO, but not recommended because panels have concluded that their clinical validity or utility was insufficient, said Lyman.
A major challenge, said Lyman, is learning how to appraise and update oncology biomarkers within the evidence-based structure established by the IOM and ASCO, so as to enhance their trustworthiness and their impact on clinical practice. As an example of how impact on practice can be measured, Lyman briefly described the Quality Oncology Practice Initiative (QOPI) (Neuss et al., 2005). QOPI is a program that ASCO offers its members to help evaluate the quality of care that hematology-oncology practices provide to their patients. By sharing limited data on more than 150 quality measures, practices can use QOPI to identify gaps in care and the resources needed to improve. The ASCO panel that develops the guideline recommendations defines the quality measures, which are then put into the QOPI library. Ultimately, many of these measures are incorporated into the QOPI measurement process, enabling practices at particular sites to be benchmarked against similar practices. Though QOPI is currently voluntary, “there may come a time when this type of process, or something like it, will be fairly mandatory for oncology practices,” said Lyman.
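Benchmarking of the kind QOPI enables reduces to two steps. First, compute a practice's performance rate on a measure (eligible patients whose care met it). Second, place that rate among the rates of similar practices. The sketch below is a hypothetical illustration. The source does not describe QOPI's actual scoring method, and the data are invented.

```python
from statistics import median


def performance_rate(numerator: int, denominator: int) -> float:
    """Share of eligible patients whose care met a quality measure."""
    if denominator <= 0:
        raise ValueError("no eligible patients for this measure")
    return numerator / denominator


def benchmark(practice_rate: float, peer_rates: list[float]) -> dict[str, float]:
    """Compare one practice's rate with rates from similar practices."""
    if not peer_rates:
        raise ValueError("need at least one peer practice")
    below = sum(rate < practice_rate for rate in peer_rates)
    return {
        "practice_rate": practice_rate,
        "peer_median": median(peer_rates),
        "percentile": 100 * below / len(peer_rates),  # share of peers scoring lower
    }


# Hypothetical data: 78 of 100 eligible charts met the measure.
peers = [0.62, 0.70, 0.75, 0.81, 0.88, 0.93]
print(benchmark(performance_rate(78, 100), peers))
```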
ASCO is also developing a decision-support system to provide real-time, point-of-care data and understanding that can be used in clinical decision making. Known as CancerLinQ, the system was being piloted at the time of the workshop. ASCO is also looking at quality measures, rapid systematic reviews for guideline panels, and a point-of-care guide on regimen benefits, toxicities, and costs. The goal, said Lyman, is “to bring the membership and practicing oncologists real-time data—updated, current, yet properly validated and assessed by an expert group—as they care for their patients.”
Additional barriers include lack of awareness, slow dissemination of new recommendations into clinical practice, inadequate access to the guidelines, reluctance to accept guidelines, and lack of accountability. “It’s a work in progress as far as biomarkers are concerned,” Lyman concluded. However, he added, “we cannot afford not to demonstrate clinical utility.” The biggest challenge will be setting a bar that stakeholders agree represents sufficient demonstration of clinical utility, he said. Well-defined outcome measures need to be accepted, and then the magnitude of impact on those outcomes that would justify adopting a test into guidelines or granting regulatory approval needs to be set.
In 2005, CMS stated: “Clinicians armed with appropriate assessments and the best evidence-based practice guidelines can reduce some of the unpleasant and frequent side-effects that often accompany cancer and chemotherapy treatment, obtain the best possible clinical outcomes, and avoid unnecessary costs” (CMS, 2005). It is an optimistic vision, said Lyman, and it is a goal that everyone working in the field shares.
As Lloyd Everson, vice chairman and founder of The U.S. Oncology Network, said, all of the great technological advances currently under way create a more promising situation for patients now than in the past. But providers are struggling with what he termed a “wild west” environment. Which tests will make a difference in the clinical care of patients? How will the use of tests influence costs? When should tests be moved into practice? Moreover, as cancers are divided into ever smaller subsets, how can evidence be developed to make such decisions? The process of validating outcomes according to marker status is in an “embryonic stage,” he said. And with venture capital fleeing the field, the development of molecular tests is likely to be hindered.
The situation is even more dire with cancer treatments, whose costs are rising much faster than health care costs in general (see Figure 3-1). The tension between the ongoing technological explosion and constrained resources will not go away, said Everson. To address this tension, the provider community needs incentives that focus on outcomes, not on per-unit reimbursements.
FIGURE 3-1 The costs of chemotherapy are rising faster than the costs of cancer medicine and health care in general.
NOTE: U.S. GDP, U.S. gross domestic product.
SOURCE: Lloyd Everson, workshop presentation, May 24, 2012.
Like other stakeholders in the system, The U.S. Oncology Network has embraced the evidence-based approach to treatment. The network encompasses more than 1,000 affiliated physicians and almost 2,000 affiliated nurses at more than 350 sites of care. According to Everson, about 30 percent of all cancer cases in the United States come through an affiliated practice or treatment facility.
The U.S. Oncology Network’s pathways approach is similar to the approaches taken by the NCCN and ASCO. It develops evidence-based treatment guidelines that provide a precise, clinically proven approach to cancer care. In particular, level 1 pathways support physicians in making treatment decisions, providing a consistent platform for delivering, documenting, and reporting high-quality, evidence-based care. The goals of the evidence reviews during guideline development are to permit flexibility of choice, find the balance point that maximizes patient benefit while maintaining accountability for health care expenditures, ensure that patients can participate in clinical trials, integrate cancer care with physicians’ workloads, and remain current. Flexibility is particularly important, Everson emphasized. Because their cancers are uncommon or because of comorbidities, 20 to 30 percent of patients do not fit into the established pathways. As a result, physicians often have to be flexible in interpreting pathways in a clinical context.
Analyses have shown that the pathways can save money without adversely affecting outcomes. According to Everson, level 1 pathways can reduce variation in patient care, improve the predictability of costs for health plans, offer up-to-date clinical tools for documentation and reporting, prepare oncologists to succeed in pay-for-performance relationships, and demonstrate fiscal responsibility to patients and payers. In particular, the patient perspective is critical because this level is where change has to happen.
The approach developed by The U.S. Oncology Network has met with resistance in the past, said Everson, but it is now being embraced by physicians. They see the benefits it brings to their patients in terms of better outcomes and less toxicity. “The evidence-based approach is something that can work,” concluded Everson. “It all depends on whether or not you can demonstrate in these smaller and smaller subsets of patients clinical utility. We have an enormous challenge, but if we can’t do it, I don’t know who is going to do it.”
If health care costs continue to rise at the current rate, today’s preschoolers will have to earn the average U.S. salary as soon as they graduate from high school just to pay their health care premiums, observed Lee Newcomer, senior vice president, oncology, at United HealthCare Corporation. At the same time, the mapping of the human genome has created phenomenal potential to better understand disease biology, target medicines to specific diseases, improve health, and advance the field of medicine dramatically. “The potential for the next decade is huge,” Newcomer said. The problem, he added, is that “if you can’t afford it, what difference does it make?”
Few people have come to terms with the unsustainable trajectory of rising health care costs. But unless new understandings from biomedical research dramatically improve the outcomes of care or markedly lower the cost, biomedical research will have relatively little impact, said Newcomer. The challenge, therefore, is learning how to pick from the genome the things that will make a difference.
A “Blue Sky” Proposal
Newcomer made what he called a “blue sky” proposal at the workshop wherein a new diagnostic or drug would need to lower the cost of care by 10 percent or improve an outcome by 10 percent to demonstrate its value. That bar is high, Newcomer noted, because most advances produce an improvement on the order of only 1 percent or so. To help facilitate the development of evidence to meet this goal, he proposed that a laboratory that has developed an analytically and clinically valid test could have the test covered by all payers for a 3-year period at a price that would cover some of the costs of using and continuing to develop the test. If the test achieves the 10 percent hurdle by the end of that 3-year period, it will be accepted. If it does not, it will not be accepted.
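Newcomer's proposal can be written as a simple decision rule: provisional coverage for 3 years, then acceptance only if either hurdle is met. The sketch below is hypothetical. In particular, it reads "10 percent" as a relative change from baseline, although an absolute reading is also possible. The cost and outcome figures are invented.

```python
from dataclasses import dataclass

HURDLE = 0.10            # 10 percent improvement in cost or outcome
PROVISIONAL_YEARS = 3    # length of provisional coverage


@dataclass(frozen=True)
class ProvisionalResult:
    """Hypothetical evidence gathered during provisional coverage."""
    years_elapsed: float
    baseline_cost: float      # cost of care without the test
    cost_with_test: float
    baseline_outcome: float   # outcome without the test; higher is better
    outcome_with_test: float


def coverage_decision(r: ProvisionalResult) -> str:
    if r.years_elapsed < PROVISIONAL_YEARS:
        return "provisional coverage continues"
    # Relative changes; reading "10 percent" as relative is an assumption.
    cost_reduction = (r.baseline_cost - r.cost_with_test) / r.baseline_cost
    outcome_gain = (r.outcome_with_test - r.baseline_outcome) / r.baseline_outcome
    if cost_reduction >= HURDLE or outcome_gain >= HURDLE:
        return "accepted"
    return "not accepted"


# Hypothetical cases: too early, clears the outcome hurdle, ~1 percent gains only.
early = ProvisionalResult(2, 100_000, 99_000, 0.50, 0.505)
better = ProvisionalResult(3, 100_000, 99_000, 0.50, 0.56)
marginal = ProvisionalResult(3, 100_000, 99_000, 0.50, 0.505)
for case in (early, better, marginal):
    print(coverage_decision(case))
```

The "marginal" case reflects Newcomer's point that most advances improve things by about 1 percent, which would fall short of this bar.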
Newcomer also emphasized that the manufacturer would still need to provide and analyze the necessary data. The payers should not be trying to determine whether a test is useful. Payers could work with physicians, for example, to identify patients who have had particular responses, and the analysis could be conducted by a neutral third party, with protections for privacy. The manufacturer would work with that group to direct the study and bring out an unidentified or de-identified result.
This type of system would represent a major departure from current procedures. It would require that payers collaborate to offer provisional coverage, which would probably require an antitrust exemption. In the past, such collaboration has not been allowed, “but this may be a new world,” said Newcomer. Also, the customers of the payers, most of which are self-funded businesses, would need to agree to such a system, because they would be the ultimate funders of such an approach. Finally, current health care legislation limits payers to spending no more than 15 percent of premium revenues on administrative costs,2 and if the provisional funding were considered an administrative cost rather than a medical cost, it probably would not be a viable option.
These obstacles are substantial, said Newcomer, but they are all surmountable. And such a program would manage budgetary constraints while allowing biomedical advances to proceed. “We need to collaborate. We need to think about new models. I also think it is entirely possible,” he said.
One of the reasons Newcomer made his proposal, he noted, is that “it’s going to happen no matter what.” Payers, providers, and patients are going to have to find the advances of highest value if health care is to continue to progress. “The more we can begin to find those things of highest value,” he said, “the better off we will all be.”
Newcomer concluded by stating that he is trying to bend the cost curve. “There are an awful lot of coded technologies whose value is quite uncertain, yet we pay for them,” he said.
2 45 CFR Part 158.
The Evaluation of Evidence
Molecular diagnostics have raised particularly difficult challenges for Palmetto GBA, which administers Medicare health insurance for CMS, said Elaine Jeter, Palmetto’s medical director. The coding system for diagnostics has been inadequate, and no process has been put in place to evaluate evidence. CMS requires that tests be “reasonable and necessary,” which the agency interprets as requiring that a test be safe and effective and that it improve health outcomes. But assessing whether these standards have been met is difficult, and reimbursement issues are complex and contentious.
Jeter focused on the evaluation of evidence, which has been hindered by the inadequate coding system. The Current Procedural Terminology (CPT) codes developed by the American Medical Association are insufficient, she said, and their descriptions are inadequate. Furthermore, there is no system to predetermine which tests qualify for payment. According to Medicare, a demonstration of clinical utility requires that a test or intervention improve patient outcomes by such measures as better functional status, improved quality of life, reduced disability, or changes in the physician’s management of a patient. But published evidence for clinical utility or evidence-based decisions is lacking.
Few molecular assays are going through the FDA regulatory process, and thus many have not been evaluated for analytic or clinical validity. Most are LDTs for which no hard look has been taken at the science, Jeter said. Furthermore, many assays are so new that they have not undergone the reviews conducted by professional organizations such as ASCO or the NCCN. “I’m usually seeing [an assay] 3 years before it comes to any of the professional societies. I’m the one who is having to make a determination: Are we going to cover it or not?”
Deciding on Claims
To determine which claims to cover, Palmetto has created a system known as MolDx Solution, which it is implementing in CMS’s J1 jurisdiction of California, Hawaii, and Nevada.3 The system assigns a unique identifier known as either a Z code or a PTI (for Palmetto Test Identifier) code to a test. This code needs to be submitted on the claim in the comment narrative field; without this unique identifier, claims will be rejected. This is not a replacement for the CPT codes but a more specific way of identifying tests, Jeter said.
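The claim rule described here is essentially a front-end check: a claim that lacks a registered test identifier in its comment narrative is rejected, and CPT codes are still required alongside it. The sketch below is hypothetical throughout. The class and function names, the identifier "ZX001", and the placeholder CPT value are invented, and Palmetto's actual identifier format and claim-edit logic are not described in the source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Claim:
    cpt_codes: list[str] = field(default_factory=list)  # still required
    comment_narrative: str = ""  # free-text field that must carry the test identifier


def find_test_identifier(claim: Claim, registered_ids: set[str]) -> Optional[str]:
    """Return the first registered test identifier found in the comment narrative."""
    for token in claim.comment_narrative.replace(",", " ").split():
        if token.upper() in registered_ids:
            return token.upper()
    return None


def screen_claim(claim: Claim, registered_ids: set[str]) -> str:
    if not claim.cpt_codes:
        return "rejected: no CPT code"
    test_id = find_test_identifier(claim, registered_ids)
    if test_id is None:
        return "rejected: no registered test identifier"
    return f"forwarded for coverage review as {test_id}"


registered = {"ZX001"}  # hypothetical identifier, not a real Z code
print(screen_claim(Claim(["00000"], "test ZX001"), registered))
print(screen_claim(Claim(["00000"]), registered))
```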
3 More information about the MolDx system can be accessed at: http://www.palmettogba.com/palmetto/MolDX.nsf/DocsCatHome/MolDx (accessed August 11, 2012).
Once a code has been assigned, Palmetto requires a technical assessment by teams of experts in the subject matter; exceptions are made in cases where enough evidence already exists to make such an assessment unnecessary. Palmetto has identified approximately 50 of the 2,316 submitted applications that require technical assessments. These have either been completed already or are in the process of being done, noted Jeter. The Palmetto website has a short summary of the assessments. Once the assessment is completed, Palmetto makes a decision about coverage—to cover a test, cover it under certain situations, or not cover it at all. If an assay does not have evidence of clinical utility, the developer is notified in writing that the test will not be covered.
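The assessment-to-coverage flow Jeter outlined has three possible outcomes, plus a written notice when clinical utility is not shown. The sketch below maps those steps to functions. The function names and the Boolean inputs are hypothetical simplifications, because an actual technical assessment weighs evidence rather than flipping flags.

```python
from enum import Enum


class Coverage(Enum):
    COVERED = "cover"
    CONDITIONAL = "cover under certain situations"
    NOT_COVERED = "do not cover"


def needs_technical_assessment(existing_evidence_sufficient: bool) -> bool:
    """Assessment is skipped only when enough evidence already exists."""
    return not existing_evidence_sufficient


def coverage_from_assessment(shows_clinical_utility: bool,
                             utility_limited_to_some_situations: bool) -> Coverage:
    if not shows_clinical_utility:
        return Coverage.NOT_COVERED
    if utility_limited_to_some_situations:
        return Coverage.CONDITIONAL
    return Coverage.COVERED


def notify(test_id: str, decision: Coverage) -> str:
    """Developers of non-covered tests are told in writing; decisions are published."""
    if decision is Coverage.NOT_COVERED:
        return f"Written notice to developer: {test_id} will not be covered."
    return f"{test_id}: decision '{decision.value}' published on the website."


decision = coverage_from_assessment(shows_clinical_utility=True,
                                    utility_limited_to_some_situations=True)
print(notify("hypothetical-test-1", decision))
```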
Some laboratories have resisted this system, saying that they have not had enough time to implement changes in their computer systems. As a result, implementation of the system was delayed until June 2012. The data in the system will be open to clinicians, patients, and the public, said Jeter, and coverage decisions will be published on the website.
Robert Bast, vice president for translational research and Harry Carothers Wiess Distinguished University Chair for Cancer Research at the University of Texas MD Anderson Cancer Center, took a different approach to the analysis of cancer biomarkers. He proceeded from insights into a particular disease to more general observations about where randomized controlled trials of clinical utility are needed.
About 22,280 new cases of epithelial ovarian cancer4 occur annually in the United States, with about 15,500 deaths despite advances in surgery and chemotherapy. The disease is often diagnosed late, after it has spread throughout the abdominal cavity, and frequently first appears as a pelvic mass that requires surgical removal. Surgery for ovarian cancer is complex and requires specific training.
Decades of experience indicate that even when ovarian cancer cannot be completely removed, prognosis improves when residual metastases are reduced to less than 1 centimeter. Whether surgical expertise or tumor biology is the more important factor in this observation is unknown, Bast said, and a prospective randomized trial of previously untreated patients is not feasible. Nevertheless, a retrospective meta-analysis of more than 50 nonrandomized studies involving almost 7,000 patients indicated that optimal versus nonoptimal cytoreduction is associated with 11 months of improved survival, which represents a 50 percent improvement (Bristow et al., 2002). On average, each 10 percent increase in cytoreduction corresponds to a 5.5 percent increase in survival. Thus, Bast said, referral to gynecologic oncologists who are specifically trained in this kind of surgery improves outcomes for ovarian cancer patients.
4 As of January 1, 2010, a reported 186,138 individuals were living with an ovarian cancer diagnosis (Howlader et al., 2013). The incidence rate for the disease is 12.5 per 100,000 women (ACS, 2012). Screening tests have limited accuracy for early detection, and pelvic examination can generally detect only advanced ovarian cancer. Women at high risk may be referred for pelvic exam, transvaginal ultrasound, and testing for the tumor marker CA125. Treatment routinely includes surgery followed by chemotherapy. Bevacizumab and cediranib are currently being evaluated in clinical trials as targeted therapeutics for ovarian cancer treatment (ACS, 2012).
Currently, however, only 30 to 50 percent of American women with ovarian cancer are referred to gynecologic oncologists. Those who are not referred tend to be poor, rural, and elderly, said Bast. The decision to refer is generally made not by oncologists but by general gynecologists, family practitioners, and internists (Goff et al., 2011).
Biomarkers of Malignancy
More than 200,000 exploratory operations for pelvic masses take place each year in the United States, and 13 to 22 percent of them lead to a diagnosis of cancer, said Bast. Biomarkers can help distinguish malignant from benign pelvic masses. A risk-of-malignancy index for ovarian cancer developed in 1990 incorporates the biomarker CA125, ultrasound findings, and menopausal status, providing a sensitivity of 71 to 88 percent and a specificity of 74 to 97 percent (Jacobs et al., 1990). A more recently developed biomarker panel improves on CA125 alone, with better than 90 percent sensitivity and 75 percent specificity, and does not depend on ultrasound, whose interpretation is observer dependent (Moore et al., 2010). A follow-up trial of this panel found 100 percent sensitivity in premenopausal patients and a negative predictive value of 99 percent (Moore et al., 2011a). Within the last year, this finding has prompted use of the newer panel to triage patients for referral, said Bast.
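To illustrate how a composite index can combine the three inputs named above, the sketch below assumes the formulation commonly reported for the 1990 risk-of-malignancy index: RMI = U × M × CA125. Here U is 0, 1, or 3 depending on whether 0, 1, or 2 or more suspicious ultrasound features are present, M is 1 for premenopausal and 3 for postmenopausal women, and CA125 is the serum level in U/mL. The feature list, the referral cutoff of 200, and all function names are assumptions drawn from common descriptions of the index, not details given by Bast.

```python
# Hypothetical sketch of a risk-of-malignancy index (RMI) calculation,
# assuming the commonly reported form RMI = U x M x CA125.

# Suspicious ultrasound features typically scored (assumed list).
ULTRASOUND_FEATURES = (
    "multilocular cyst",
    "solid areas",
    "bilateral lesions",
    "ascites",
    "intra-abdominal metastases",
)

REFERRAL_CUTOFF = 200  # assumed, commonly cited cutoff; not from the source


def ultrasound_score(n_features: int) -> int:
    """Map the number of suspicious ultrasound features to U (0, 1, or 3)."""
    if n_features < 0 or n_features > len(ULTRASOUND_FEATURES):
        raise ValueError("feature count out of range")
    if n_features == 0:
        return 0
    return 1 if n_features == 1 else 3


def risk_of_malignancy_index(ca125_u_per_ml: float, n_features: int,
                             postmenopausal: bool) -> float:
    """Combine ultrasound score, menopausal status, and CA125 into one index."""
    u = ultrasound_score(n_features)
    m = 3 if postmenopausal else 1
    return u * m * ca125_u_per_ml


def refer_to_gynecologic_oncologist(ca125_u_per_ml: float, n_features: int,
                                    postmenopausal: bool) -> bool:
    """Hypothetical triage rule: refer when the index exceeds the cutoff."""
    return risk_of_malignancy_index(ca125_u_per_ml, n_features,
                                    postmenopausal) > REFERRAL_CUTOFF


# A postmenopausal woman with CA125 of 50 U/mL and two suspicious features:
print(risk_of_malignancy_index(50, 2, True))          # 450
print(refer_to_gynecologic_oncologist(50, 2, True))   # True
```

Because U is 0 when no suspicious feature is seen, the index in this form hinges on the ultrasound reading, which is one reason a panel that does not rely on ultrasound is attractive.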
A second assay, developed by Vermillion, examines a panel of five biomarkers. It had better than 90 percent sensitivity, but its specificity was only 42 percent (Ueland et al., 2011). The difference in specificity should not affect patient outcomes, because a gynecologic oncologist can operate on benign as well as malignant tumors, but it could affect the distribution of medical resources. Neither assay is a screening test, Bast said; both should be used only for patients who are undergoing exploratory surgery. The real challenge, said Bast, is to encourage the use of either test.
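The resource question can be made concrete with expected counts. The sketch below assumes a cohort of 1,000 women undergoing surgery for a pelvic mass with a 20 percent cancer prevalence, which falls within the 13 to 22 percent range Bast cited. It treats "better than 90 percent" sensitivity as exactly 0.90 for both panels and uses the reported specificities of 75 and 42 percent. The labels "panel A" and "panel B" and the function name are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageTest:
    name: str
    sensitivity: float  # fraction of cancers the test flags
    specificity: float  # fraction of benign masses the test clears


def triage_outcomes(test: TriageTest, n_patients: int, prevalence: float) -> dict:
    """Expected referrals and predictive values for a cohort (hypothetical helper)."""
    cancers = n_patients * prevalence
    benign = n_patients - cancers
    true_pos = test.sensitivity * cancers
    false_pos = (1 - test.specificity) * benign
    true_neg = test.specificity * benign
    false_neg = cancers - true_pos
    return {
        "referred": true_pos + false_pos,
        "benign_referred": false_pos,
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


panel_a = TriageTest("panel A", sensitivity=0.90, specificity=0.75)
panel_b = TriageTest("panel B", sensitivity=0.90, specificity=0.42)

for test in (panel_a, panel_b):
    r = triage_outcomes(test, n_patients=1000, prevalence=0.20)
    print(f"{test.name}: referred {r['referred']:.0f}, "
          f"benign referred {r['benign_referred']:.0f}, "
          f"PPV {r['ppv']:.2f}, NPV {r['npv']:.2f}")
# panel A: referred 380, benign referred 200, PPV 0.47, NPV 0.97
# panel B: referred 644, benign referred 464, PPV 0.28, NPV 0.94
```

Under these assumptions both panels send the same 180 cancers to a specialist, so outcomes for those patients would be similar. The lower-specificity panel, however, also refers roughly 264 additional benign cases per 1,000, which is the resource effect Bast described.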
Biomarkers can personalize the care of patients with epithelial ovarian cancer. When ovarian cancer is limited to the ovaries and has not metastasized, up to 90 percent of patients can be cured with currently available surgery and chemotherapy, Bast said. Disease that has spread beyond the pelvis is curable in fewer than 20 percent of patients. Currently, only a quarter of women with ovarian cancer are diagnosed in stage I. Detecting preclinical disease at an earlier stage could improve survival from 10 percent to 30 percent, Bast predicted.
Screening has stringent epidemiological requirements, however. The prevalence of the disease in the postmenopausal population, which is at greatest risk, is 1 in 2,500. Screening therefore requires high sensitivity to detect early stage disease or, ideally, asymptomatic preclinical disease. It also requires extraordinarily high specificity to avoid false positives: on the order of 99.6 percent to achieve a positive predictive value of 10 percent. A positive predictive value of 10 percent implies 10 operations for each case of ovarian cancer detected.
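The 99.6 percent figure follows from the definition of positive predictive value. With prevalence p, sensitivity s, and specificity spec, PPV = s·p / (s·p + (1 − spec)(1 − p)). Solving for spec gives the specificity needed to reach a target PPV. The sketch below assumes perfect sensitivity as a best case, which is an assumption and not a figure from the source. It also shows that a less sensitive test would need even higher specificity. The function names are illustrative.

```python
def required_specificity(prevalence: float, sensitivity: float,
                         target_ppv: float) -> float:
    """Solve PPV = s*p / (s*p + (1 - spec)*(1 - p)) for spec."""
    false_positive_rate = (sensitivity * prevalence * (1 - target_ppv)) / (
        target_ppv * (1 - prevalence))
    return 1 - false_positive_rate


def operations_per_case(ppv: float) -> float:
    """If every positive screen leads to surgery, 1/PPV operations per cancer found."""
    return 1 / ppv


prevalence = 1 / 2500  # postmenopausal prevalence cited by Bast

# Best case: assume every cancer is detected (sensitivity = 1.0).
print(f"{required_specificity(prevalence, 1.0, 0.10):.2%}")  # 99.64%
# A less sensitive screen (0.8, assumed) needs even higher specificity.
print(f"{required_specificity(prevalence, 0.8, 0.10):.2%}")  # 99.71%
print(f"{operations_per_case(0.10):.0f} operations per case")  # 10 operations per case
```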
Used alone, neither CA125 nor transvaginal ultrasound has adequate specificity. Ovarian cancer, however, is associated with rising CA125, whereas benign disease is not. Very high specificity and sensitivity can be attained with a two-stage strategy in which a rising CA125 triggers transvaginal ultrasound. A risk-of-ovarian-cancer algorithm developed as a screening mechanism uses each woman’s own CA125 baseline to determine whether a significant increase has occurred (Skates, 2012). A randomized trial of 200,000 women in the United Kingdom, scheduled to conclude in 2015, has reported that 48 percent of the cancers found by screening with this algorithm were in stage I or II, doubling the detection of early stage disease, and that up to 89 percent of all cancers were detected (Menon et al., 2009). Only about three operations were required per case of ovarian cancer when CA125 was followed by ultrasound, compared with 36 operations per case with annual ultrasound alone. These results are consistent with earlier data indicating that ovarian cancers appear to develop 2 years before they are detected by conventional means, suggesting that annual screening might be effective (Menon et al., 2009).
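The published risk-of-ovarian-cancer algorithm uses a more elaborate statistical model of each woman's CA125 values over time. The sketch below is a deliberately simplified stand-in for the two-stage idea and is not that algorithm. It compares the latest CA125 value with the mean and spread of the woman's own prior log-transformed values and sends her to ultrasound only when the rise exceeds a hypothetical z-score threshold of 3. The 35 U/mL fallback is the conventional single-threshold CA125 reference value and is used only when too little personal history exists. All names, thresholds, and the standard-deviation floor are assumptions.

```python
import math
import statistics

POPULATION_CUTOFF_U_PER_ML = 35.0  # conventional fixed CA125 cutoff (fallback only)


def rising_ca125(history: list[float], latest: float,
                 z_threshold: float = 3.0, min_sd: float = 0.1) -> bool:
    """Hypothetical rule: flag a rise relative to this woman's own log-CA125 baseline."""
    if len(history) < 2:
        # Too little personal history to estimate a baseline; use the fixed cutoff.
        return latest > POPULATION_CUTOFF_U_PER_ML
    logs = [math.log(v) for v in history]
    baseline = statistics.mean(logs)
    spread = max(statistics.stdev(logs), min_sd)  # floor avoids a near-zero SD
    z = (math.log(latest) - baseline) / spread
    return z > z_threshold


def next_step(history: list[float], latest: float) -> str:
    """Stage 1: CA125 versus personal baseline; stage 2: ultrasound only if flagged."""
    return "transvaginal ultrasound" if rising_ca125(history, latest) else "annual CA125"


print(next_step([10, 12, 11], 30))  # transvaginal ultrasound (rise from about 11 U/mL)
print(next_step([40, 42, 41], 36))  # annual CA125 (stable, though above 35 U/mL)
```

The two example calls show why a personal baseline can help on both fronts. A rise to 30 U/mL that stays under the fixed cutoff is still flagged, while a stable value of 36 U/mL that a fixed cutoff would flag is not.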
With Karen Lu at MD Anderson and in collaboration with seven sites, Bast has participated in a smaller trial evaluating the use of CA125 and transvaginal ultrasound in postmenopausal women at average risk for developing ovarian cancer. The study is powered to test the specificity and positive predictive value of the screen and to explore the feasibility of using this methodology for screening in the United States. During the past 10 years, it has obtained 15,000 samples from more than 4,000 postmenopausal women. Fewer than 1 percent of participants have been referred for ultrasound in any single year, and fewer than 3 percent over multiple years. The risk-of-ovarian-cancer algorithm screen has prompted 10 operations, which detected 6 cases of ovarian cancer: “a very small number,” said Bast, “but encouraging.” Two were borderline tumors and four were invasive cancers; all were in stage I or II. With a positive predictive value of 60 percent for all cancers and 40 percent for invasive cancers, no more than three operations would be required to detect each case of ovarian cancer using this strategy.
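The arithmetic that links operations, cancers detected, and positive predictive value is simple. PPV is cancers found divided by operations, and operations per case is its reciprocal. The sketch below applies this to the counts reported above and to the UK operations-per-case figures. The function name is illustrative.

```python
import math


def screening_yield(operations: int, cancers_found: int) -> tuple[float, float]:
    """Return (positive predictive value, operations per cancer detected)."""
    ppv = cancers_found / operations
    return ppv, operations / cancers_found


all_ppv, all_ops = screening_yield(10, 6)  # all cancers, including borderline
inv_ppv, inv_ops = screening_yield(10, 4)  # invasive cancers only
print(f"all cancers: PPV {all_ppv:.0%}, {all_ops:.2f} operations per case")  # 60%, 1.67
print(f"invasive:    PPV {inv_ppv:.0%}, {inv_ops:.2f} operations per case")  # 40%, 2.50
print("operations per invasive case, rounded up:", math.ceil(inv_ops))     # 3

# The UK figures of about 3 and 36 operations per case correspond to these PPVs:
for ops in (3, 36):
    print(f"{ops} operations per case -> PPV {1 / ops:.1%}")  # 33.3%, then 2.8%
```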
One question raised by this research is whether multinational trials are acceptable evidence for national decisions. “We’re living in a global medical environment,” said Bast, “but I’m not sure that there is complete comfort with that.” In this case, a trial in the United States has shown the feasibility of the approach, but Bast questioned whether it would be considered adequate evidence.
Bast also raised the issue of laboratory-developed tests (LDTs), several of which have previously been applied to the detection of ovarian cancer. He concluded that FDA guidance needs to be applied to such tests. In particular, where significant risk is involved, the relevant question is whether LDTs should be held to the same standard as tests submitted for FDA approval.
Finally, Bast discussed current therapies for ovarian cancer. Most patients are treated routinely with both carboplatin and paclitaxel. However, only 70 percent of patients respond to platinum-based therapy and 42 percent respond to paclitaxel as a single agent, and the two drugs show no synergy (Muggia et al., 2000). Because 58 percent of patients do not respond to paclitaxel and the combination adds no synergistic benefit, more than half of patients are treated with a drug that produces significant neurotoxicity without any obvious benefit. Biomarkers with high negative predictive value are clearly needed, Bast said.
Predicting responses to targeted therapy is an important issue for ovarian cancer, as it is for other diseases. Several candidate biomarkers exist, but given the potential toxicity and cost of treatment, a test with high negative predictive value would be very useful, Bast noted.
These treatment issues raise a more general question: Is accurate prediction of failure to respond to a toxic drug adequate evidence of clinical utility, or are prospective, randomized controlled trials required to validate biomarkers or panels of biomarkers? Is a 90 percent negative predictive value an adequate benchmark? Is there a place for a test with high positive predictive value? Would such a test simply have to demonstrate statistical significance, or would it need to demonstrate clinical utility? Although stakeholders’ evidentiary requirements have converged somewhat (IOM, 2011a, 2012a), there is no general agreement on the evidence a test needs to gain approval, use, and reimbursement.
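Whether a 90 percent negative predictive value is adequate depends partly on the underlying response rate. The sketch below assumes the 42 percent single-agent paclitaxel response rate cited above. It also assumes a hypothetical predictive test that flags likely responders with 95 percent sensitivity and 50 percent specificity. These test characteristics are invented for illustration and are not from the source. Paclitaxel is withheld from patients the test calls negative.

```python
def withholding_outcomes(n: int, response_rate: float,
                         sensitivity: float, specificity: float) -> dict:
    """Expected results when a drug is withheld from test-negative patients.

    sensitivity: fraction of true responders the test calls positive.
    specificity: fraction of true non-responders the test calls negative.
    """
    responders = n * response_rate
    nonresponders = n - responders
    spared = specificity * nonresponders     # non-responders correctly spared toxicity
    denied = (1 - sensitivity) * responders  # responders wrongly denied the drug
    return {
        "withheld": spared + denied,
        "spared": spared,
        "responders_denied": denied,
        "npv": spared / (spared + denied),
    }


result = withholding_outcomes(1000, 0.42, sensitivity=0.95, specificity=0.50)
print({k: round(v, 3) for k, v in result.items()})
# withheld 311, spared 290, responders_denied 21, npv 0.932

# With no test at all, calling everyone a non-responder is correct 58% of the time.
print(f"baseline NPV with no test: {1 - 0.42:.0%}")  # 58%
```

Under these assumptions the test clears a 90 percent NPV benchmark and spares 290 of every 1,000 patients from an ineffective, neurotoxic drug. However, 21 patients who would have responded lose access to it, and weighing that tradeoff is the evidentiary question Bast raised.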
In general, Bast concluded, situations may arise where prospective RCTs for utility are not needed to approve a biomarker for widespread use and reimbursement. But such situations need to be chosen carefully, he said, and early detection is not one of those situations.
The field of molecular diagnostics has had some successes but has also raised many problems for patients, said Deborah Collyar, president of Patient Advocates in Research. The research environment is not set up for the collaborative efforts that are critical to progress in the molecular age, she said. False positives and false negatives can have serious effects on individuals. High costs are also a substantial problem and limit patient support for cancer research, she noted. Collyar said, “I regularly get the question, ‘Why should we support cancer research if they are just going to produce drugs that nobody can afford?’” Finally, little information is available about the clinical utility of molecular tests, even after a decade or more of use (Sparano and Solin, 2010).
Patients and people in general want true prevention of disease, but not at all costs, Collyar said. They want to reduce their risk of cancer or recurrence of cancer, but they also want to maintain their lifestyle as much as possible. They want cures as the word is customarily used and resist discussions of improved 5-year survival rates. They want a safe system, and those at high risk of developing a disease want to lower that risk for themselves, she said.
The word “diagnostics” has a variety of connotations for people, including hope; fear; anticipation of costs; vulnerability; and potential loss of self, family, culture, community, or privacy, Collyar said. The term most people associate with it, however, is “risk,” and the medical and lay understandings of risk are quite different. Individuals focus more on their own absolute risk than on relative risk in a population. For example, tamoxifen may produce a 50 percent relative reduction in risk (Vogel et al., 2006), but absolute risk declines only from 2.6 to 1.3 percent (Howlader et al., 2012) for a drug with serious side effects. That is one reason why women have not rushed to their doctors to obtain the drug, said Collyar.
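Standard measures express the gap between relative and absolute risk. The sketch below computes relative risk reduction, absolute risk reduction, and number needed to treat from the 2.6 and 1.3 percent figures above. Number needed to treat (NNT) is the reciprocal of the absolute reduction, rounded up. NNT does not appear in the source and is added here only as a conventional way to express absolute benefit. It applies over whatever time period the cited risks cover.

```python
import math


def risk_summary(baseline_risk: float, treated_risk: float) -> dict:
    """Relative and absolute risk reduction, plus number needed to treat."""
    arr = baseline_risk - treated_risk  # absolute risk reduction
    rrr = arr / baseline_risk           # relative risk reduction
    nnt = math.ceil(1 / arr)            # people treated per event prevented
    return {"relative_reduction": rrr, "absolute_reduction": arr, "nnt": nnt}


summary = risk_summary(0.026, 0.013)
print(f"relative reduction: {summary['relative_reduction']:.0%}")  # 50%
print(f"absolute reduction: {summary['absolute_reduction']:.1%}")  # 1.3%
print(f"number needed to treat: {summary['nnt']}")                 # 77
```

Framed this way, roughly 77 women would need to take a drug with serious side effects for one case to be prevented. A "50 percent reduction" headline does not convey that perspective.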
People’s lives can change dramatically after they are given the results of a diagnostic test, said Collyar. Yet discussions of risk are complicated by the public’s general lack of knowledge about the “world of the sick.” People may seek information from the Internet, get multiple opinions from physicians, rely on their gut instincts, or turn to their families for advice. They may have questions about procedures, pain, and suffering; what the test results mean; how to lower their risks; the implications for relatives; the options that remain open or are closed; their work; their family and social life; and protection against misuse of their personal information. “We have to have knowledge before we can actually create results for people,” Collyar said.
What does “clinical utility” mean to the common individual? Collyar asked. Most often, it means reliability—is a test going to predict how someone will react to a treatment, and what are the ramifications from the decisions that are made on the basis of the information gained? Most tests focus on a single marker, but the human body is an integrated circuit with many pathways that interact. Tests of single markers therefore need to fit into a diagnostic whole that includes imaging, clinical examinations, exploratory surgery, and so on. In general, however, “clinical utility” does not mean much to most people. A more relevant concept is “personal utility” or “personal guidance.”
From this perspective, people want to receive test results in a reasonable time frame and have them explained in clear language, Collyar said. Health care providers should be comfortable interpreting test information and conveying which results matter most. In addition, test results need to be updated quickly as the test or a person’s condition changes, and medical personnel need adequate training to keep up with rapidly changing technologies. Patients also need a choice about whether to receive test results, because some people will want to know a result even if no intervention is available, but others will not. Finally, if a test has implications for family members, people want counseling because of the immense consequences the results can have for families.
Patients and advocates want the health care and research worlds to honor the true meaning of “patient-centered,” according to Collyar. “Nothing about us without us” is a key message: “we need to be involved in the dialogue.” In addition, when groups are asked, they consistently say they want to be included in research, Collyar added, and they need to be included so that they are not excluded from its benefits.
From the perspective of the public and of patients, failures in developing molecular diagnostics can waste time and money, erode trust, and cost lives. “We have to get this right,” Collyar said.