Session 3: Data Collection Standards and Monitoring

In order for a new drug therapy to obtain FDA approval, clinical trials must provide evidence of safety and efficacy. In addition, the FDA and the NCI frequently audit sites and cooperative groups to ensure the quality of the data collected and the safety of the experimental treatment being given. A number of regulations and guidance documents provide guidelines for the frequency and extensiveness of audits and for the amount and type of data required to support claims of safety and efficacy. These guidelines are continually revised by the FDA and the NCI to reflect lessons learned and, when appropriate, to respond to the concerns of a variety of stakeholders, including government, industry, patients, and clinicians. However, some clinical trials investigators have suggested that the NCI and the FDA reduce the number of audits and the amount of data required, thereby streamlining the clinical trials process and saving both time and money. At the same time, additional data and resources are needed for today's trials, such as those for targeted therapies, to identify patient candidates for new therapies, for biomarker analysis, and more.

The first session of the second day of the conference explored which data are essential for demonstrating safety and efficacy and what changes could improve current auditing and data requirements.

The NCI perspective

Dr. James Doroshow, director of the Division of Cancer Treatment and Diagnosis (DCTD) of the NCI, began this session by giving the NCI
perspective on data collection and monitoring. He suggested that, given the expense of complex trials and the limited budgets of government and industry, one cannot collect all the data that might potentially be needed for a future review. Instead, he suggested that the best course is to collect only the amount of data that is absolutely necessary, but he noted that it is debatable just how much data are required to ensure safety and efficacy in any given situation. For example, new investigational drugs might have novel toxicities that require the monitoring of numerous signs and symptoms. But once a novel toxicity, such as hypertension, has been documented among the first 1,000 patients tested with the experimental drug, he asked, "do we need to collect blood pressures on the next 10,000 patients?"

The amount of data needed for audits is also a debatable issue, he said. He noted that the NCI regularly audits 10 percent of the patient cases at cooperative group sites and determines major deficiencies related to a number of factors listed in Box 5. But he noted that there are no standards for these auditing data and no comparative data against which to measure them. "I personally happen to think that if there were a major deficiency in adverse-event reporting of only 2.3 percent of the cases, that's pretty good," Dr. Doroshow said. "But should we give the cooperative groups kudos or tell them that this is unacceptable? We really do not know."

Dr. Doroshow also said he has data to suggest that reviewing 10 percent of patient cases is sufficient for on-site audits and that reviewing additional cases does not improve the information gained. "Monitoring every patient at every clinical visit is not necessarily going to provide you with additional information that tells you whether or not a trial or a site is doing well or is doing badly," he said.
BOX 5
NCI Cooperative Group Program: Patient Case Review Categories

• Informed consent
• Eligibility
• Treatment
• Adverse events
• Disease outcome/response
• General data timeliness

SOURCE: Doroshow presentation (July 1, 2008).
Another debate in data collection and monitoring concerns which kinds of endpoints are valid to use and how much verification is required for those endpoints. Many clinical trials are moving away from using survival data as endpoints and instead are using time-to-progression or tumor response as endpoints. These endpoints are determined by radiologic criteria that can be subjective, because they are not easily quantified, but one study of three different trials found a close correlation between investigator assessments of radiologic endpoints and those of a radiologic review (Dodd et al., 2008). "We may think it intuitively obvious that these radiologic reviews should be done," Dr. Doroshow commented, "but where are the data that make it clear that adding a procedure is unequivocally going to be beneficial in improving the quality of the data that leads to an indication?" He added that, for technical and biostatistical reasons, an additional blinded central review may introduce as many errors as it corrects.

In a later talk, Dr. Fyfe agreed with this assessment, noting that because of the existence of different methods, different lesions selected, and measurement inconsistencies, a blinded radiologic review should not look for a concordance of findings that is then used to assess the quality of the reviewed site but should verify benefit and look for bias between the treatment arms in a study. She pointed out that there often is not a great degree of concordance in radiological assessments of small-volume disease, because such assessments can be highly subjective. "As we start to use independent review facilities, there is a real danger, because more data are not better data," Dr. Fyfe said. "They are simply confusing, whether it's an industry trial or a cooperative group trial."

Dr.
Doroshow ended his presentation by saying, "It's neither appropriate nor desirable for the NCI to perform clinical trials in a manner that is identical to the model that industry has used and is continuing to use. One hundred percent source verification is neither reasonable nor necessary, nor does it necessarily get us a higher quality of data that will lead to improving treatment for our patients."

In defense of industry, Dr. Canetta commented, "Sometimes our approach has been that if we believe we might need data eventually, we had better collect them. We do not relate only to the FDA, but also to a plethora of other regulatory agencies that are equally powerful in regulating the use of experimental agents in their countries."

Industry perspective

In the following presentation, Dr. Gwendolyn Fyfe, senior staff scientist in clinical hematology and oncology at Genentech, gave the industry perspective on data collection and monitoring. She said that much of
the data her company collects is never used. This includes data on vital signs, concomitant medications, laboratory values, medical histories, and low-grade toxicities; secondary information about adverse events; and independent reviews of efficacy endpoints. Furthermore, some data are collected excessively or in an inefficient manner. Dr. Fyfe noted that by the time a drug enters Phase III trials, much is known about its toxicity and the time course of its toxicity. Investigators should use that information to narrow the collection of data, she said. By the time Avastin (bevacizumab) reached Phase III trials for colorectal cancer, for example, its effects on blood pressure and likely effects on bleeding were known and should have focused the collection of data on adverse effects.

Dr. Fyfe suggested collecting grade 3 or 4 toxicities on a cycle-specific basis. "I know, for the most part, when I go into Phase III, whether something is going to happen at day 7 or 14. I do not learn anything by collecting that exact date, and I put a burden on the sites if I ask for a precise date," she said. Most of the data on adverse events are "rolled up into a worst grade, so for all the data you collect, you end up with just one number," Dr. Fyfe said. "I'm not sure that all the specific stop and start dates helped us understand the safety profile of bevacizumab."

Dr. Fyfe noted that much of the data collected on adverse events is superfluous, because it merely improves confidence intervals without changing clinical decisions. As Table 2 shows, increasing the number of patients analyzed for adverse events changes the confidence intervals in ways that are not meaningful in the clinic. "A physician will not manage patients differently if there is a 40 percent or a 60 percent adverse event rate," she said. "When we think about collecting more data, there is this inference that it's better that we know more.
TABLE 2 Does More Safety Data Provide Greater Certainty About the Safety Profile?

Expected Rate of     100 Patients   200 Patients   400 Patients   800 Patients
Adverse Event        Analyzed       Analyzed       Analyzed       Analyzed
(percentage)
 5                   4.3            3.0            2.1            1.5
10                   5.9            4.2            2.9            2.1
20                   7.8            5.5            3.9            2.8
30                   9.0            6.5            4.5            3.2
40                   9.6            6.8            4.8            3.4
50                   9.8            6.9            4.9            3.5

NOTE: Confidence intervals as a function of patient number.
SOURCE: Fyfe presentation (July 2, 2008).
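The entries in Table 2 appear consistent with the half-width, in percentage points, of a 95 percent normal-approximation (Wald) confidence interval for a proportion, 1.96 × sqrt(p(1 − p)/n). The presentation's actual method is not stated, so that formula is an assumption here, and the function and variable names below are illustrative only. A minimal Python sketch under that assumption:

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile (assumed confidence level)


def ci_half_width_pct(rate_pct, n_patients, z=Z_95):
    """Half-width, in percentage points, of a Wald interval for a proportion.

    rate_pct   -- expected adverse event rate, as a percentage
    n_patients -- number of patients analyzed
    """
    p = rate_pct / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n_patients)


# Rows and columns mirror the layout of Table 2.
rates = [5, 10, 20, 30, 40, 50]
sizes = [100, 200, 400, 800]

for rate in rates:
    row = "  ".join(f"{ci_half_width_pct(rate, n):4.1f}" for n in sizes)
    print(f"{rate:>3}%  {row}")
```

Under this assumption the half-width shrinks with the square root of the sample size, so quadrupling the number of patients only halves the interval. The sketch reproduces nearly all of the table's cells to one decimal place; any small difference may reflect rounding or a different method in the original presentation.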
It's simply not true, in terms of being helpful. It probably just slows things down at an extra cost without providing much value." Later, during discussion, Dr. Ralph deVere White suggested that many of the data points collected to show drug efficacy may also be superfluous since they, too, might merely improve confidence intervals without having any relevance to clinical decisions. Dr. Fyfe agreed that this may be so and that it should be considered when developing minimum data standards.

Not only are Genentech and other companies collecting information that they are not using, Dr. Fyfe said, but they are missing important data. In particular, she expressed dissatisfaction about the lack of placebo-controlled trials, which are needed to ensure unbiased reporting of adverse events. For example, Genentech's randomized Phase II trial of Avastin did not include an arm that received a placebo, and the rate of thrombosis (blood clots) was much higher in the Avastin arm than in the control arm receiving standard treatment (26 or 13 percent, depending on which Avastin arm, versus 9 percent in the control arm). However, the rates of thrombosis in the control and treatment arms of the placebo-controlled Phase III trial of Avastin were more similar (16 percent in the placebo arm versus 19 percent in the Avastin arm) (Hurwitz et al., 2004; Kabbinavar et al., 2003). "With the best possible intentions, people under-report adverse events on the control arm," Dr. Fyfe said.

Dr. Fyfe also noted the importance of understanding why physicians or patients stop treatment, as that can provide information about the tolerability of a drug or its efficacy. Often, however, these data are not collected. "Sometimes patients stop a drug because it's toxic, but sometimes they stop the drug because they are progressing but have not reached that magical 'progressive disease' endpoint that we collect data on," she said.
Collecting data on subsequent treatment is also helpful, as multiple treatment lines are often pursued, and such data can help assess optimal treatment paradigms, she added.

Dr. Fyfe suggested collecting data on the deaths, discontinuations, and serious adverse events (SAEs) in all patients at all sites, and she commented that collecting data on targeted adverse events is also appropriate in some cases. She added, however, that questions about detailed adverse-event profiles in subpopulations are usually inadequately answered in Phase III trials and are probably better addressed in Phase II or Phase IV trials or through post-marketing registries.

She suggested that a set of data standards could ensure that the data collected are adequate to reliably assess whether an unapproved agent has a good risk-to-benefit ratio, including the issue of whether the drug significantly improves outcome when added to or in contrast to a known standard, and also what the effect on safety is when that drug is added to or substituted for a known treatment standard. Ideally, data should also
be collected that might identify subsets of patients in whom the risk/benefit ratio is different from other patients. Dr. Fyfe said that data standards should be similar for all licensure trials and that minor tweaking of current cooperative group standards may be all that is needed.

Dr. Fyfe also stressed the importance of verifying study data in licensure trials to ensure accuracy and completeness, but she noted that this need be done only in a subset of patients. She suggested that stakeholders work together to quickly determine the most appropriate and consistent standards for data collection and monitoring. "We simply need to standardize so that we can assess the risk and benefit of drugs in the medical milieu of the United States, rather than offshore, as is increasingly happening," she said. She added that there should be a funding mechanism so that cooperative groups can meet the data standards created. She suggested creating a foundation to which industry contributes when doing Phase III trials with cooperative groups, and she added that there should also be a surcharge so that the cooperative groups can do trials that help define the standard of care for patients in the United States.

Cooperative groups perspective

The next speaker, Dr. Robert Comis, president and chairman of the Coalition of Cancer Cooperative Groups and group chair of ECOG, discussed the increasing tension within cooperative groups between the need to reduce data collection in order to save money and time and the need to provide the data required for licensure of drugs. The pressure to save money is real: Dr. Comis highlighted the flat budget for the Cooperative Group Program over the past three or four years, which represents a substantial decrease when adjusted for inflation.

He began his talk by describing the 1997 recommendation of the Armitage Committee, which reviewed the NCI Cooperative Group Program (NCI, 1997). The committee recommended that in designing clinical trials, data collection should be reduced and that investigators should collect only data pertinent to studying endpoints and safety. At the same time, the FDA released a guidance for industry stating that cooperative group data could be used for FDA filings (FDA, 1998). This has, in part, fueled an increase in licensure Phase III trials run by cooperative groups, Dr. Comis said.

But the NCI and the FDA differ in important ways in the data that they require and how it is reported, he said. They differ in how adverse events are reported, in eligibility and dosing checks, in how data are collected in the laboratory and audited and monitored, in what locks are placed on databases, and in what endpoints are verified. The additional data or procedures that the FDA requires for licensure add substantial
costs to a clinical trial. Those costs are not necessarily reimbursed by industry sponsors. For example, the FDA may require:

• cardiac safety monitoring tests and procedures that are not the standard of care;
• a central review of imaging findings;
• revisions of case report forms; and
• supplemental data management efforts, such as reconciling NCI and FDA databases of adverse event reports.

"When we do a study that has registration implications," Dr. Comis said, "there are tremendous additional workloads that are imposed on the central offices and sites that are well beyond NCI funding levels." To meet those additional requirements, many cooperative groups scramble in an ad hoc manner to acquire industry or other funding to support their efforts, but this can be unreliable and difficult given that "the system is so underfunded that there is no elasticity," Dr. Comis said. "What if industry funding goes away?" he asked.

Dr. Comis suggested that cooperative groups, government agencies, and industry develop evidence-based standards for Phase III trials, including standards for data collection, data and site monitoring, and the content of case report forms. He also suggested that there be an independent and thorough analysis of the value of independent reviews of imaging findings and data, since many experts question their value and added expense. During the discussion that followed the presentation, Dr. Canetta agreed that such an analysis would be beneficial. Dr. Comis's final suggestion was that there be a cooperative group-wide support structure to provide services beyond the capacity of the central offices.

FDA perspective

Dr. Richard Pazdur, director of the Office of Oncology Products in the Center for Drug Evaluation and Research at the FDA, joined the speaker panel in the discussion that followed Dr.
Comis's presentation, and in an impromptu presentation he agreed with many of the points and suggestions made by the previous speakers. He stressed the importance of independent reviews of the data and imaging findings for trials that are not placebo-controlled, but he added that independent review does not have to be extensive. It is not necessary, for instance, to review every patient case or solicit the opinions of three different radiologists. Dr. Pazdur suggested exploring alternative mechanisms, including requiring blinded trials, in order to ensure that there is no systematic bias in studies. The FDA does not require independent review for blinded trials, he pointed
out. He also agreed with Dr. Fyfe that it is not feasible to require that there be concurrence between reviewer and investigator in the assessment of radiologic findings.

On the subject of FDA requirements related to assessing the safety of a tested drug, Dr. Pazdur pointed out that the amount of data needed to support a safety claim for an oncology drug is much smaller than that required in other therapeutic areas. Because of this, for subsequent development of the drug in supplemental New Drug Applications (sNDAs), the FDA may require more safety information involving larger numbers of patients. That has been especially true in recent years, since the FDA has become more safety conscious in the post-Vioxx era, Dr. Pazdur said. He pointed out, however, that when oncology drugs fail to get approved it is not because investigators failed to demonstrate their safety to the FDA but rather because they failed to demonstrate their efficacy. He concluded that it would be helpful if the FDA defined more clearly what an optimal safety database is, and he suggested that a public hearing and workshop be held on this topic.

Dr. Pazdur also noted that the FDA accepts the NCI's auditing procedure but that industry often supersedes that auditing, not because of requirements of the FDA but rather in order to meet its own needs. He suggested developing uniform auditing standards that the NCI, the FDA, and industry would all follow.

In the general discussion that ensued, Dr. Bruce Hillman, professor of radiology at the University of Virginia School of Medicine, brought up the subject of innovative techniques in imaging, such as analysis software that can provide more precision and less variability in the analysis of images. "I find it really hard to understand why we are still talking about linear anatomic measurement criteria in this day and age of this extraordinary software, especially as we start talking about targeted treatments," he said.
He suggested considering these innovative analyses of radiologic findings when developing data and review standards.

A few attendees raised the issue of industry funding of cooperative group trials and the effect that this might have on the data collected and reported. "The evidence necessary to sell something under a monopoly structure at a high price is very different than the evidence needed to influence the practice of medicine," Mr. Robert Erwin said. "What is the impact on the data that are collected, and even the questions that are asked, by industry stepping into the breach to fill the funding gap left by NCI?" Dr. Doroshow countered that any effects that industry might have on clinical trial data could be kept in check by having an independent auditing system. Dr. Abrams agreed that such an independent audit is critical, as conflicts of interest can arise whenever cooperative groups receive industry support. "If we do not have a robust independent review of these trials," he said, "the criticism will be raised quite quickly that
these trials are being done by industry and that public dollars should not pay for them. What will protect these trials is that they have a very robust independent review, not just a cooperative group-only review." Dr. Pazdur added that most of the cooperative group trials receiving industry support are for supplemental indications of drugs whose safety and effectiveness are already well established and that clinical trials for primary indications generally are scrutinized more by the agency.

Dr. Schilsky suggested that ASCO would be more than willing to oversee the development of minimum data standards for oncology clinical trials that are acceptable to all stakeholders. He also suggested that ASCO take the lead in determining the appropriate minimal eligibility requirements among stakeholders; this is important because a lack of eligibility is a major deterrent to patient participation in clinical trials, as had been pointed out by Dr. Grubbs on the first day of the workshop. Such eligibility criteria should ensure that the patient population of the study is well defined and that the proposed treatment is likely to be safe in the population to be studied, Dr. Schilsky said. Dr. Pazdur noted that lowering the threshold of eligibility might increase the need for more safety data. If, for example, people with compromised kidney or liver function are allowed to participate in a clinical trial and it is not known how an experimental drug is excreted, safety is more of an issue, he said.

Dr. John Wagner, executive director of clinical pharmacology at Merck, suggested considering "fitness for purpose" when developing minimal data standards so that the design of an experimental protocol can adequately validate and qualify a particular use of the drug or biomarker being tested. "There can be a minimal set of data-collection standards, but for particular uses that may need to be augmented in one way or another," he said.
Dr. Canetta expanded on Dr. Fyfe's comment that new toxicities are rarely found during Phase III trials, because they are already documented in Phase II trials. Industry and investigators "have been cutting off a lot of the Phase II activities," he said, "and therefore we have lost a lot of learning opportunities." He suggested a few remedies, such as doing more randomized Phase II studies before proceeding to Phase III studies or having an independent data and safety monitoring committee that is program-wide for Phase II studies. The latter has helped his company acquire better safety information before proceeding to Phase III studies, he said.

Dr. Mendelsohn asked Dr. Comis if, in his analysis of cooperative group trials, he found any key factors that foster adequate patient accruals. "The most important characteristics of a highly successful trial are that it answers an interesting question and involves a new approach," Dr.
Comis responded. "So I think we all need to focus on those things that are the most cutting-edge."

Dr. deVere White noted that money is often a driver of change, and he suggested changing the funding structure of cooperative groups. In particular, he suggested that instead of one-third of the money that the NCI gives to a cooperative group going towards patient reimbursements with the rest going to the infrastructure of the cooperative group, NCI funding be split more equally between patient reimbursements and cooperative group infrastructure support. If this were done, more money would be allocated to patients and to the physicians who put them in clinical trials.

Dr. Buckner noted that the NCI typically does not require data collection on attribution of adverse events, whereas the FDA does. His clinical trial findings suggest that attribution appears to be an unreliable endpoint and perhaps could be immediately removed. Dr. Canetta concurred, noting that he recently received a letter from the Japanese regulatory agency asking that such attributions not be done.