The emergence of high-throughput omics technologies beginning around the mid-1990s led to the development of new approaches for studying the dynamics of biological systems. Multidisciplinary collaborations were formed among molecular biologists, bioinformatics experts, and statisticians at many institutions to devise experimental strategies and statistical methods for the analysis and interpretation of these rich new sources of data. Researchers at Duke University were among those pursuing these new avenues. In 2000, Joseph Nevins and Mike West founded the Computational and Applied Genomics Program (CAGP), a multidisciplinary research program (Kornbluth and Dzau, 2010). The CAGP formed the basis for what later became the Center for Applied Genomics and Technology (CAGT). Within the CAGT, one of the initial centers of the Duke Institute for Genome Science and Policy (IGSP), which was formed in 2003 (Kornbluth and Dzau, 2010), researchers used various types of genomic analyses to elucidate potential mechanisms of oncogenesis and to understand the complexity of cancer phenotypes. DNA microarray analysis became a powerful tool in the CAGP/CAGT for the study of regulatory pathways essential for cancer initiation and tumor growth, and researchers developed and published several gene expression–based tests to predict patient responses to chemotherapeutic agents. At a very early stage in the discovery research, these tests were taken into clinical trials. The primary publications were criticized for major problems in data presentation and statistical analysis, and statisticians eventually raised concerns about the validity of the tests and about potential harm to patients enrolled in the trials.
The Institute of Medicine (IOM) committee’s statement of task refers to three trials that were conducted at Duke University. Table B-1 outlines some information related to those trials.
This appendix provides a concise summary of the research objectives and the approaches taken in developing several of the gene expression–based chemosensitivity tests implemented in the three clinical trials in Table B-1. It also presents findings that provide important insights about the processes that were in place at Duke University; these findings informed and motivated many of the IOM committee's recommendations intended to enhance the integrity of future omics-related research. Many of the findings concern key areas, including the responsibilities of investigators and institutions, conflict of interest issues, and the roles of funders, regulatory authorities, journals, and biostatistical collaborators.
DEVELOPMENT AND EVALUATION PROCESS
Investigators are responsible for systematic and rigorous development of omics-based tests. Chapters 2, 3, and 4 explain the IOM committee’s recommendations on omics-based test discovery, development, and evaluation for clinical use. These recommendations are meant to help establish a process, agreed on by all collaborating disciplines, for the discovery and development of omics-based tests with the goal of improving patient care and outcomes.
Discovery and Test Validation Phases
Chapter 2 explains the technologies and the statistical, computational, and bioinformatics methods that should be used in the discovery and confirmation of omics-based tests. Recommendation 1 defines critical steps in the discovery and confirmation of new candidate omics-based tests. Recommendation 2 (Chapter 3) focuses on omics-based test development and validation within a clinical laboratory certified under the Clinical Laboratory Improvement Amendments of 1988 (CLIA), in preparation for use in patient management decisions in clinical trials or for eventual use in patient management decisions in medical care. These steps include the design, optimization, validation, and implementation of the locked-down test in single or multiple CLIA-certified laboratories. Recommendation 2 also emphasizes discussion of a candidate test with the Food and Drug Administration (FDA) prior to validation.
The sections below present facts from the discovery and validation phases of the gene expression–based tests developed at Duke University and used in the three clinical trials the committee was tasked to evaluate:
TABLE B-1 Clinical Trials Related to Duke University Gene Expression– Based Tests Listed in the Institute of Medicine Committee’s Statement of Task
| | NCT00636441 | NCT00509366 | NCT00545948 |
|---|---|---|---|
| Official title | A Randomized Phase II Trial Evaluating the Performance of Genomic Expression Profiles to Direct the Use of Preoperative Chemotherapy for Early Stage Breast Cancer | Phase II Prospective Study Evaluating the Role of Personalized Chemotherapy Regimens for Chemo-Naive Select Stage IIIB and IV Non-Small Cell Lung Cancer (NSCLC) in Patients Using a Genomic Predictor of Platinum Resistance to Guide Therapy | Phase II Prospective Study Evaluating the Role of Directed Cisplatin-Based Chemo With Either Vinorelbine or Pemetrexed for the Adj[uvant] T[herapy] of Early Stage NSCLC in Patients Using Genomic Expression Profiles of Chemo Sensitivity to Guide Therapy |
| Disease | Breast cancer | Lung cancer | Lung cancer |
| Start date | April 2008 | February 2007 | October 2007 |
| Trial listed in ClinicalTrials.gov | March 2008 | July 2007 | October 2007 |
| Sponsor | DOD | Eli Lilly/Duke/NCI | Eli Lilly/Duke |
| Principal investigator(s) | Paul K. Marcom, M.D., Duke University | Gordana Vlahovic, M.D., M.H.S., Duke University | Neal Ready, Ph.D., M.D., Duke University Medical Center, Hematology/Oncology, Duke Comprehensive Cancer Center |
| Chemosensitivity test | Doxorubicin (Adriamycin) and docetaxel (prospective) | Cisplatin (prospective) | Pemetrexed and vinorelbine (prospective) |
| Citations in ClinicalTrials.gov | Potti et al. (2006a) | Bild et al. (2006); Potti et al. (2006a) | Potti et al. (2006a,b, 2007b) |

NOTE: DOD = Department of Defense; NCI = National Cancer Institute; NSCLC = non-small cell lung cancer.

a Personal communication from Michael Cuffe, Duke University School of Medicine, July 23, 2010.
(1) tests for docetaxel and doxorubicin (Adriamycin) sensitivity were used in the trial NCT00636441; (2) a test for cisplatin sensitivity was used in the trial NCT00509366; and (3) tests for pemetrexed and vinorelbine sensitivity were used in the trial NCT00545948. For each, a brief explanation of test discovery and validation is provided, including information on the confirmation of the gene expression–based computational models; the availability of the data, metadata, computer code, and fully specified computational procedures used in the discovery and confirmation of the test; and whether the tests were locked down prior to progression to subsequent phases of test development.
Information regarding the CLIA laboratory and FDA aspects of test validation is general rather than specific for each of the tests discussed below. Communication with FDA is discussed later in this appendix. The committee had little information relating to the design, optimization, validation, and implementation of the tests in the CLIA-certified laboratory. At the March 2011 meeting, Nevins informed the committee that, at the time of performance testing, the laboratory was CLIA registered. (A certificate of registration does not indicate CLIA compliance but only that a CLIA application was submitted to the Centers for Medicare & Medicaid Services [CMS]; however, it does allow a laboratory to perform moderate- and high-complexity testing until an onsite survey is performed, leading to CLIA certification if compliance with the regulatory standards is demonstrated.) The laboratory became CLIA certified during the course of the trials. Nevins stated that the investigators had implemented data quality control and security systems as well as an automated system for running the computational procedures that would ensure high-quality, reliable data (Nevins, 2011). The clinical trial protocols indicate that patient sample processing and microarray analyses were conducted in a CLIA-certified laboratory setting (Marcom, 2008; Ready, 2010; Vlahovic, 2010). It is not clear from the trial protocols where the computational procedures were run on the data, but the Duke Clinical Genomics Studies Unit defined operational standards for “array data analysis through an automated system designed and controlled by a Duke Faculty biostatistician” (Kornbluth and Dzau, 2010, p. 5). Two of the protocols note that the data were available for quality assessment and analysis by a computational biologist (Ready, 2010; Vlahovic, 2010).
As noted by Baggerly and by Lisa McShane, the computational models were not locked down when their performance was evaluated prior to use in the clinical trials (Baggerly, 2011; McShane, 2010a). As described in Chapter 3 and reflected in the committee’s recommendations, this constitutes a serious flaw in the test development process.
Docetaxel and Doxorubicin Chemosensitivity Tests (Potti et al., 2006a) Used in Breast Cancer Trial NCT00636441
The Duke researchers first published gene expression–based chemosensitivity tests for docetaxel and doxorubicin in the 2006 Nature Medicine paper (Potti et al., 2006a). This paper also presented chemosensitivity tests for five other chemotherapeutic drugs: paclitaxel, topotecan, 5-FU, cyclophosphamide, and etoposide. The drugs were chosen based on the availability of gene expression microarray data and in vitro drug response (sensitivity) measures from the NCI-60 cell line panel from the National Cancer Institute (NCI) (Potti et al., 2006a).
A subsequent study conducted to evaluate the ability of the docetaxel and doxorubicin tests to predict patient response to a combination taxane chemotherapy regimen (docetaxel and epirubicin; abbreviated TET) or a non-taxane chemotherapy regimen (fluorouracil, epirubicin, and cyclophosphamide; abbreviated FEC), respectively, was published in 2007 (Bonnefoi et al., 2007). Both papers have now been retracted (Bonnefoi et al., 2011; Potti et al., 2011a).
The Duke researchers’ general approach for identifying signatures for each of the drugs was to first identify cell lines from the NCI-60 panel that were the most sensitive and resistant to the drugs. Then, they used statistical methods to develop the gene expression–based signatures that would form the basis of the computational models in the tests. However, conflicting and confusing information in the papers and the cited references regarding the data and the statistical methods contributed to the inability of colleagues in the scientific community to understand and replicate the generation of the computational models (Baggerly, 2011; McShane, 2010a; Review of Genomic Predictors for Clinical Trials from Nevins, Potti, and Barry, 2009). For example, the authors describe using Bayesian binary regression analysis, but the paper cited for this analysis (Pittman et al., 2004) presents a different statistical methodology for Bayesian binary prediction tree models. In addition, there were simple linear regression analyses reported in which p-values were stated to have been obtained by use of a log-rank test. The log-rank test is a statistical testing method applied for analysis of survival (time-to-event) data; its citation in the simple linear regression setting should have signaled a need for statistical review. The committee does not know if the paper was reviewed by a statistician either internally at Duke or during the Nature Medicine review process, but whatever statistical review occurred for this paper was inadequate. These instances point to the risks of relying on journal publication as the sole basis for judging the soundness of science, particularly when the results are poised for translation into the clinic.
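To illustrate the methodological point only (this is not the Duke analysis): in simple linear regression, the p-value for the slope comes from a t-test on the fitted coefficient, whereas the log-rank test compares survival curves estimated from censored time-to-event data. A minimal sketch on synthetic data, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic predictor/response pair (purely illustrative)
x = rng.normal(size=30)
y = 2.0 * x + rng.normal(size=30)

# linregress tests H0: slope == 0 with a t-test on the coefficient;
# this, not a log-rank test, is the standard p-value in this setting.
result = stats.linregress(x, y)
print(f"slope = {result.slope:.2f}, p-value = {result.pvalue:.3g}")
```

A log-rank test, by contrast, takes grouped survival times with censoring indicators as input; citing it for a simple regression analysis should itself prompt statistical review.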
Several datasets were used to confirm the gene expression–based computational
models generated. Potti et al. (2006a) reported using leave-one-out cross-validation to confirm the docetaxel computational model developed from drug sensitivity data derived from the NCI-60 breast cancer cell lines. The docetaxel test was reported to have been validated on several independent sets of data from ovarian and lung cancer cell lines and from clinical samples of breast and ovarian tumors; some of these data had been previously published and others were generated at Duke. The doxorubicin test also was reported to have been confirmed using leave-one-out cross-validation and then validated on independent gene expression datasets from breast, ovarian, and leukemia studies (Bonnefoi et al., 2007; Potti et al., 2006a).
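Leave-one-out cross-validation in general terms (this generic sketch uses an ordinary classifier, not the Bayesian regression models in the papers): each sample is held out in turn, the model is refit on the remaining samples, and the held-out sample is predicted, so every sample is scored by a model that never saw it. A sketch assuming scikit-learn is available:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut

# Synthetic stand-in for expression profiles labeled sensitive/resistant
X, y = make_classification(n_samples=30, n_features=50, n_informative=5,
                           random_state=0)

loo = LeaveOneOut()
correct = 0
for train_idx, test_idx in loo.split(X):
    # Refit on all samples except one; predict the held-out sample
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    correct += int(model.predict(X[test_idx])[0] == y[test_idx][0])

print(f"LOO accuracy: {correct / len(y):.2f}")
```

The procedure is honest only if every step that used the labels (gene selection, model fitting, threshold choice) is repeated inside each fold; performing any of those steps on the full dataset first leaks information into the estimate.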
Both the docetaxel and doxorubicin tests were used as part of computational models developed to predict response to multidrug chemotherapy regimens. Potti et al. (2006a) reported that when a compuational model for predicting sensitivity to combined TFAC (paclitaxel, 5-FU, Adriamycin, and cyclophosphamide) was applied to gene expression data from 51 patients in a breast neoadjuvant treatment trial, there was a statistically significant association between the predicted multiregimen response probability and response outcome. Similar statistically significant results were reported from a second collection of breast cancer specimens from patients who had received FAC (5-FU, Adriamycin, cyclophosphamide). Bonnefoi et al. (2007) reported good performance of multidrug sensitivity tests when applied to samples from the intergroup neoadjuvant therapy trial EORTC-10994/BIG-00-01, which randomized patients with estrogen-receptor-negative breast tumors between treatment arms for TET (docetaxel for three cycles followed by epirubicin plus docetaxel) and FEC (fluorouracil, epi-rubicin, and cyclophosphamide). The doxorubicin computational model described in Potti et al. (2006a) was used in lieu of an epirubicin computational model. The reported successful extension of the computational model methodology to multidrug regimens was seen as important because many cancer patients receive multidrug chemotherapy regimens.
Several aspects of the validations reported in Bonnefoi et al. (2007) and Potti et al. (2006a) raise questions about the rigor with which those validations were conducted. There was a lack of information about how the thresholds applied to the response probabilities generated by the computational models were selected for the validations involving clinical samples in these studies (Bonnefoi et al., 2007; Potti et al., 2006a), and the reported use of different thresholds for the two tumor types (breast and ovarian) indicates that these two validation studies on clinical samples could not have been based on an appropriately locked-down computational model (which must include locking down any threshold). In addition, neither paper states that the investigators were blinded to the response outcome data when they calculated the predicted response probabilities (Bonnefoi et al., 2007; Potti et al., 2006a). The Bonnefoi et al. (2007) paper states that several authors had full access to all of the raw data, but it is not known when in the course of the study they may have used that access.
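The lock-down requirement can be made concrete: any threshold that converts a continuous response probability into a sensitive/resistant call must be fixed on training data and then applied unchanged to every validation set. A schematic sketch (the function names and numbers below are hypothetical, not drawn from the Duke code):

```python
import numpy as np

rng = np.random.default_rng(1)

def choose_threshold(probs, labels):
    """Pick the cutoff that maximizes accuracy on *training* data only."""
    candidates = np.unique(probs)
    return max(candidates, key=lambda t: np.mean((probs >= t) == labels))

# Training-set response probabilities and observed responses (synthetic)
train_probs = rng.uniform(size=100)
train_labels = rng.uniform(size=100) < train_probs

LOCKED_THRESHOLD = choose_threshold(train_probs, train_labels)

# A locked-down test applies the SAME threshold to every validation set;
# re-tuning it per cohort (e.g., per tumor type) invalidates the validation.
def predict_sensitive(probs, threshold=LOCKED_THRESHOLD):
    return probs >= threshold

val_probs = rng.uniform(size=40)
print(int(predict_sensitive(val_probs).sum()), "of 40 called sensitive")
```

Reporting different thresholds for different tumor types, as described above, is direct evidence that the cutoff was still being adjusted after the model was supposedly fixed.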
The drug sensitivity measures and gene expression microarray data used to develop the docetaxel and doxorubicin tests were publicly available in the database from the NCI-60 website.1 Computer code used to generate the gene expression-based computational models in Potti et al. (2006a) was available on a Duke website (Baggerly and Coombes, 2009). However, when statisticians Keith Baggerly and Kevin Coombes attempted to assess the validity of the tests at the request of colleagues at MD Anderson Cancer Center who were interested in using the tests or the same approach to develop new tests, they found insufficient information to reproduce the published results, using the available data and the methods published in the Nature Medicine paper (Baggerly, 2011). Therefore, Baggerly and Coombes began corresponding with the principal authors at Duke to better understand the data and methodology. At first there was an exchange of questions and answers regarding the data, cell line labels, and gene lists. However, after multiple exchanges between November 2006 and June 2007, Baggerly and Coombes were still unable to reproduce the results and communications between the groups broke off (Baggerly, 2011). The statisticians submitted correspondence to Nature Medicine outlining their unresolved concerns and questions. Their correspondence was published along with a reply (Coombes et al., 2007; Potti and Nevins, 2007). The concerns included an inability to reproduce the selection of cell lines from sensitivity measures, errors in gene lists, incorrect figures, combining of training and test sets in developing the computational models, and an inability to produce the reported test performance results. Further communication between Baggerly and Coombes and the authors and journals is described in the section on journals later in this appendix. 
When the Nature Medicine paper was eventually retracted on January 7, 2011, corruption of additional validation datasets was noted, with an explicit statement that the authors had been “unable to reproduce certain crucial experiments showing validation of signatures for predicting response to chemotherapies, including docetaxel and topotecan” (Potti et al., 2011a, p. 135).
The clinical trial using these tests, NCT00636441, titled A Randomized Phase II Trial Evaluating the Performance of Genomic Expression Profiles to Direct the Use of Preoperative Chemotherapy for Early Stage Breast Cancer, was listed in ClinicalTrials.gov on March 9, 2008. This trial was temporarily suspended from October 19, 2009, to February 12, 2010. The trial was suspended again on July 23, 2010, and terminated on November 4, 2010. This trial and the following two clinical trials named in the IOM statement of task are discussed in more detail later in this appendix (see section on evaluation for clinical use).

1 See http://dtp.nci.nih.gov/docs/cancer/cancer_data.html (Potti et al., 2006a).
Cisplatin Chemosensitivity Test (Hsu et al., 2007) Used in Lung Cancer Patients in NCT00509366
The gene expression–based chemosensitivity test for cisplatin was published in the Journal of Clinical Oncology (Hsu et al., 2007), along with a chemosensitivity test for pemetrexed; this paper has now been retracted because of the “inability to reproduce the experiments demonstrating a capacity of a cisplatin response signature to validate in either a collection of ovarian cancer cell lines or ovarian tumor samples” (Hsu et al., 2010, p. 5229). The general statistical approach used to develop the computational models was similar to the one reported in Potti et al. in Nature Medicine (2006a); the authors had made computer code available on a Duke website. The cisplatin test was developed using publicly available gene expression microarray data and drug sensitivity data from a study published in the International Journal of Cancer (Gyorffy et al., 2006). Hsu et al. (2007) reported that the cisplatin test had been validated in two experiments. The first experiment used data from ovarian cancer cell lines on which the Duke investigators had performed drug sensitivity experiments and gene expression microarray profiling. A second experiment used clinical specimens from patients with ovarian cancer. There were no reported validation attempts using clinical tumor samples from patients with lung cancer, but the first trial in which the cisplatin test was used to guide therapy was the NCT00509366 trial for advanced lung cancer. As described in Chapter 3 and indicated in Figure S-1, the omission of such a validation step constitutes a critical flaw in the test development process.
Problems with posted data and figures were identified by Baggerly and Coombes for both the cisplatin and pemetrexed tests (Baggerly and Coombes, 2009). For example, they identified off-by-one errors in gene lists for both tests, “outlier” genes reported for the cisplatin test that could not be reproduced from the data (even after accounting for the off-by-one error), and a reversal of sensitive/resistant labels in a data figure for the pemetrexed test. Baggerly and Coombes (2009) noted in their analysis: “one theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the most simple errors are common.” The statisticians were particularly concerned that the four outlier genes (probesets) mistakenly reported for the cisplatin test were exactly those cited in Hsu et al. (2007) as providing biological plausibility for the model. Even with access to the publicly available primary data and code posted by the authors on a Duke website, Baggerly and Coombes were unable to reproduce the published results. Further information on Baggerly
and Coombes’s examination of the cisplatin and several other tests is provided later in this appendix.
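The row-offset errors described above are mechanically simple, which is also what makes them easy to screen for: if a gene-label column is shifted by one row relative to its data matrix, every reported gene inherits its neighbor's values. A minimal, hypothetical sketch of an alignment check (the gene names are placeholders):

```python
# Correctly aligned gene list for a small expression matrix
genes = ["TP53", "BRCA1", "MYC", "EGFR", "KRAS"]

# An off-by-one export drops the header row and shifts every label up one
shifted = genes[1:] + ["<blank>"]

# Compare a reported list against the reference at offsets -1, 0, and +1
def best_offset(reported, reference):
    def matches(offset):
        aligned = (reference[offset:] if offset >= 0
                   else ["<pad>"] * -offset + reference)
        return sum(a == b for a, b in zip(reported, aligned))
    return max((-1, 0, 1), key=matches)

print(best_offset(shifted, genes))  # prints 1: a nonzero offset flags misalignment
```

Such a check, run routinely before analysis, would catch exactly the "simple errors" that Baggerly and Coombes found to be common.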
The clinical trial, NCT00509366, titled Phase II Prospective Study Evaluating the Role of Personalized Chemotherapy Regimens for Chemo-Naive Select Stage IIIB and IV Non-Small Cell Lung Cancer (NSCLC) in Patients Using a Genomic Predictor of Platinum Resistance to Guide Therapy, began accruing patients in June 2007 (McShane, 2010b) and was listed in ClinicalTrials.gov on July 30, 2007. The trial was temporarily suspended from October 6, 2009, to February 12, 2010, resuspended on July 23, 2010, and terminated on November 4, 2010.
Pemetrexed (Hsu et al., 2007) and Vinorelbine Chemosensitivity Tests Used in Clinical Trial of Lung Cancer Patients NCT00545948
The gene expression–based chemosensitivity test for pemetrexed was published in the Journal of Clinical Oncology (Hsu et al., 2007); as mentioned in the previous section, this paper has now been retracted (Hsu et al., 2010). The gene expression–based chemosensitivity test for vinorelbine does not appear to have been published; the protocol for NCT00545948 cites Potti et al., Nature Medicine (2006a) as the relevant reference (Ready, 2010). The general statistical approach used to develop the computational model for pemetrexed was similar to that in Potti et al. (2006a). The pemetrexed test was developed using methods similar to those used to develop the cisplatin test, but the data source was different. This test was developed using the publicly available gene expression data and drug sensitivity data derived from the NCI-60 cell lines. Hsu et al. (2007) reported that the pemetrexed test had been validated using in vitro drug sensitivity data from an independent set of 17 NSCLC cell lines. This appears to have been the only validation study conducted before the pemetrexed test was used to direct patient therapy in the NCT00545948 clinical trial. In this trial, the pemetrexed test was used along with a similar gene expression–based test for vinorelbine sensitivity to determine which of those drugs should be coupled with cisplatin for adjuvant therapy.
As mentioned in the previous section, problems with posted data and figures were identified by Baggerly and Coombes for both the cisplatin and pemetrexed tests. They were able to detect these problems using the data that were available from the NCI-60 website and the same computer code mentioned in the previous two sections that was also used for this test (Baggerly and Coombes, 2009). Further information on their examination of the pemetrexed and several other tests is provided later in this appendix. No information is available relating to the vinorelbine test.
The clinical trial NCT00545948, titled Phase II Prospective Study Evaluating the Role of Directed Cisplatin Based Chemo With Either Vinorelbine
or Pemetrexed for the Adj[uvant] T[herapy] of Early Stage NSCLC in Patients Using Genomic Expression Profiles of Chemo Sensitivity to Guide Therapy, was listed in ClinicalTrials.gov on October 17, 2007, temporarily suspended from October 6, 2009, to February 11, 2010, suspended again on July 23, 2010, and terminated on February 3, 2011.
Evaluation for Clinical Utility and Use Stage
Chapter 4 presents the committee’s third recommendation, regarding steps important for taking a validated omics-based test into clinical trials. The decisions to move the tests into clinical trials and subsequent decisions about use of the tests to guide therapy in the clinical trials are described in greater detail in the next section on Roles and Responsibilities. The series of events following publication of the Baggerly and Coombes paper in the Annals of Applied Statistics (2009), as described below, applies to all three clinical trials and related tests (docetaxel and doxorubicin chemosensitivity tests used in NCT00636441, cisplatin chemosensitivity test used in NCT00509366, pemetrexed chemosensitivity test used in NCT00545948). No information is available about the vinorelbine test.
In September 2009, NCI was in the process of reviewing a revised clinical trial protocol from the Cancer and Leukemia Group B cooperative group (CALGB-30702), which was proposing to use six of the Duke chemosensitivity tests in a clinical trial for patients with advanced lung cancer. The reviewers had noted serious discrepancies in the information presented in the protocol and a lack of validation of the tests on human lung tumor samples, and NCI disapproved that protocol. However, the protocol also mentioned several Duke trials already under way using several of the tests. The concerns generated by this protocol, along with the publication of the Baggerly and Coombes paper (2009), led NCI to contact leadership at Duke University, and ultimately resulted in suspension of the trials and launch of the external review in early October 2009.
These events prompted NCI to further scrutinize another test developed by Nevins and Potti (but not one of the tests being studied in the three clinical trials listed in the committee’s statement of task), the Lung Metagene Score (LMS), for which a clinical trial had already opened. In that trial, CALGB-30506, the LMS test was being used as a stratification factor for randomization of trial participants. During the protocol review process for CALGB-30506, NCI decided that, while the LMS test appeared to have some promise, there were concerns that laboratory batch effects might influence its performance. Therefore, NCI insisted on a change in the originally proposed design of the trial so that the test would not be used to direct therapy in the trial. Although results of the test were kept blinded and were not being used to guide therapy in the trial, evaluation of the
test was a co-primary aim of the trial. In November 2009, NCI’s Cancer Therapy Evaluation Program (CTEP) made a request to CALGB for data and computer code to reevaluate that test and information that had been provided to CTEP during its original protocol review process for that trial 2 years earlier, when NCI did not have access to the data and computer code. With data and computer code in hand, NCI’s reevaluation was able to identify a number of problems with the version of the LMS test that had been the basis for the trial approval and a supporting publication (Potti et al., 2006b). The problems included an unstable computational model and an inability to reproduce findings from a prevalidation exercise that had taken place during the trial approval process (McShane, 2010a). Eventually, the New England Journal of Medicine article was retracted because of “failure to reproduce results supporting the validation of the lung metagene model described in the article using a sample set from a study by the American College of Surgeons Oncology Group (ACOSOG) and a collection of samples from a study by CALGB” (Potti et al., 2011b, p. 1176).
In contrast to NCI’s reviews, oversight committees at Duke did not recognize significant problems with the other Duke chemosensitivity tests, and allowed them to be used to direct therapy selection in clinical trials. It is not known if the Cancer Protocol Review Committee (CPRC) and Duke Institutional Review Board (IRB), which were responsible for approving and overseeing the Duke trials, were fully aware of the extent of problems with the published papers or aware of contradictory statements being made about the validation status of some of the tests. For example, the IOM committee received conflicting information about validation of the pemetrexed test. Information supporting the lack of validation included correspondence between Potti and NCI. In his March 2008 submission of R01-CA131049-01A1, Potti stated: “we have only been able to validate the accuracy of the cisplatin test in independent patient samples …, not the pemetrexed test … it is probably a little bit premature to employ the pemetrexed test to stratify patients” (NCI, 2010a). Potti also mentioned the “premature” status of the pemetrexed test in his April 14, 2010, response to NCI’s letter of April 13, 2010, requesting information about his grant.2 Information suggesting that the tests had been validated was included in the protocol for the TOP0703 trial, which was using the pemetrexed and vinorelbine tests. Section 1.4.2 of the April 21, 2008, version of the trial protocol states, “Using Affymetrix gene expression data with corresponding in vitro drug response data for vinorelbine and pemetrexed, our group has developed robust gene expression based models predictive of vinorelbine and pemetrexed sensitivity. These multigene models were validated with an accuracy of greater than 85% in independent in vitro studies of lung cell lines treated with vinorelbine and pemetrexed respectively.” There is no mention of validation using clinical samples. It is possible this represents confusion between in vitro validation (i.e., cell lines) and validation on human tumor samples. Despite reservations expressed by Potti about use of the pemetrexed test for directing patient therapy in 2008-2009, and an apparent absence of published validation results for the vinorelbine test, TOP0703 was opened to accrual and was listed in ClinicalTrials.gov in October 2007. Both the pemetrexed and vinorelbine tests were being used to select therapy in that trial.

2 Communication from Anil Potti, Duke University, to William Timmer, National Cancer Institute, RE: R01CA131049-01A1 Information Request, April 14, 2010.
ROLES AND RESPONSIBILITIES
This section explores the actions of the principal investigators (PIs), university, funders, and journals involved in the Duke case. It begins with a discussion of investigator responsibility, of Duke University’s existing infrastructure and oversight during the launch and conduct of the three clinical trials mentioned in the IOM committee’s statement of task (referred to as “the three clinical trials” hereinafter), and of the University’s response to the scientific controversy. Topics include a discussion on oversight of research, the need for an investigational device exemption (IDE), conflict of interest (COI) management, the whistleblowing system, the investigation into the controversy, and the nature of biostatistical collaboration. Subsequent sections address the role of funders in responding to scientific controversies and the role of journals in responding to credible concerns about published manuscripts.
First and foremost, investigators are responsible for the accuracy of their data, for the fairness of their conclusions, and for responding appropriately to criticism. Reproducibility, grounded in transparency, is a central component of the system of science. In the Duke case, there were not only inaccuracies in the data, but also a lack of transparency by the investigators in their dealings with journals, other investigators, and the university’s external review committee (discussed further below).
Second, investigators have responsibility to ensure that clinical studies being conducted have appropriate scientific justification and approval of relevant review bodies. At Duke, it appeared that in some instances, gene expression–based tests were being used for patient management in clinical trials, while they simultaneously were being tested in other “preliminary” studies for their ability to predict results. This kind of problem arguably should have been apparent to—and avoided by—the PIs and their clinician colleagues.
The lead Duke investigators were not responsive to the queries of external investigators who wanted to learn from these methods and apply them to their own data, particularly after serious questions were raised in the medical literature. In addition, none of the coinvestigators in this series of publications originating from Duke raised concerns about the tests. As reported by Robert Califf in August 2011, Duke eventually surveyed 162 investigators involved in 40 papers coauthored by Potti, half of whom were by then at other institutions. Two-thirds of these papers, he testified, would be partially or fully retracted, with others pending evaluation. Yet in no instance did anyone make inquiries or call for retractions until contacted by Duke. This experience suggests the need for coauthors to have more shared responsibility for the integrity of the published research.
When the Duke leadership was interviewed by the IOM committee on August 22, 2011, they stated that it is essential to be able to trust the PI because no audit system can totally overcome a fundamental lack of trust. In retrospect, they said that PIs must develop an appropriate culture with an accountability plan that includes (1) trust, (2) a system in which dissent is encouraged, (3) appropriate data management systems, and (4) appropriate biostatistical collaboration.
Universities arguably have some of the most important responsibilities, as a “responsible party,” for assuring the soundness of science. Universities evaluate and hire researchers; set high standards for faculty appointment, promotion, and tenure decisions; and their names are inevitably associated with the work of their faculty. Universities are responsible for establishing oversight structures, such as IRBs, COI management, and other review committees, and for providing “safe environments” for reporting irregularities, to help ensure the soundness of science and the protection of patient-participants. Last, universities are directly charged with serving as the “oversight” bodies when specific questions or challenges arise, for example, in investigating allegations of misconduct, or simply in investigating questions regarding the “soundness of science,” as Duke University was asked to do by NCI in fall 2009. For scientists at companies or stand-alone research institutions, the same institutional responsibilities apply.
Institutional “culture” includes expectations of behavior, achievement, and integrity that are transmitted by the institution and modeled by its leadership. Institutional culture starts with the dean, senior leaders, and members of their team stating how research is to be conducted, with integrity and transparency, and with clarity that shortcuts will not be tolerated and that dishonesty is grounds for dismissal.
Role of Institutional Structure
In his opening remarks to the IOM panel on August 22, 2011, Califf outlined the organizational context in which the research was undertaken. He specified that the three clinical trials were conducted at Duke University, under general supervision by its Board of Trustees, President, Vice President, and Provost. The Chancellor for Health Affairs is responsible for the Duke University Health System, the Duke University School of Medicine, and the Duke University School of Nursing (integrated as “Duke Medicine”). The dean of the school of medicine and several vice chancellors—for science and technology, clinical research, and global health—report to the Chancellor. There are 20 departments and about 13 major centers and institutes, whose chairs and directors report to the Dean of the School of Medicine. Directors of several campus-wide institutes, including the IGSP, report jointly to the Provost and the Dean. Califf leads the Duke Translational Medicine Institute (DTMI), which has six major components, including the Duke Clinical Research Institute (DCRI). DCRI conducts multisite clinical research; it does not have accountability or authority for research done on patients in the Duke University Health System, unless part of a multisite trial. Dr. John Falletta is the senior chair of the Duke University Health System IRB and Dr. Michael Kelley chairs the Cancer Protocol Committee.
Institute for Genome Science and Policy
Duke University launched the IGSP in 2003 to bring together multidisciplinary teams of researchers. The CAGT, in which Nevins and Potti worked, was embedded within the IGSP. At the time of the three clinical trials, Duke had an extensive clinical trials infrastructure within the Duke Cancer Center, which normally would have been responsible for oversight and data stewardship in oncology trials conducted at Duke. These three oncology trials were not subject to Cancer Center oversight because of the collaboration with the IGSP to manage the implementation of the gene expression–based tests. This led to IGSP staff becoming involved in data entry and data management.
The genomics work was permitted to operate outside the established structures for review and supervision of clinical research, such as the Duke Cancer Center or the DTMI. This was explained by the fact that the work spanned both basic science, including research in animals or banked specimens, and clinical research in people. The consequence was that a “separate pathway” had been created within the university that ultimately did not provide the normal “checks and balances” of clinical research: data storage, blinding where appropriate, locked-down protocols and analysis plans, and openness to critical review of protocols and publications. The IGSP created another separate infrastructure for gene expression–based clinical trials without the experience or expertise of the Cancer Center or the DTMI. It established the Duke Clinical Genomics Studies Unit (CGSU) to develop standards for genomics research. It also created a Data and Safety Monitoring Board “Plus” to monitor the safety of human subjects and the validity and integrity of data in ongoing genomic trials (Kornbluth and Dzau, 2011). However, this monitoring committee was not totally independent of the investigator team.
The IOM committee concluded that many of the problems that occurred in the Duke case would have been detected early or prevented entirely if routine structures and checks and balances had been in place or if the full infrastructure of the cancer center had been used. According to Califf, “there were numerous missed signals,” any one of which might have helped to avert the problems that emerged. Moreover, “there was ambiguity” in the lines of authority and oversight in the IGSP during the conduct of the three clinical trials. As Califf stated, the IGSP was supposed to be consultative with other research groups within the health sciences, but things got jumbled up. Geoff Ginsburg, a leader in the IGSP, added that in 2005, the IGSP set about to create a core function, which was to provide expertise fundamental to conducting clinical research in genomics: for example, bio-specimen procurement and quality assessment, sample management and processing, biobanking, and statistical analysis of genomic data. Ginsburg indicated that as time went on and the trials evolved, the sheer workload of the trials increased, and some of the genomics research coordinators began assisting in data entry and data management for these trials. The institution has since gained an appreciation for the need to clearly separate clinical trials activities from the genomics aspects of these studies and has created the systems to maintain this clarity; thus, the IGSP’s Clinical Genomics Studies Unit no longer participates in activities directly related to the performance of clinical trials. Instead, it consults, provides a resource, or collaborates with anyone doing genomics-based research, including investigators from other units who may be leading clinical trials.
Duke Cancer Center
For most NCI-sponsored cancer centers, such as the Duke Cancer Center, there is a requirement for a highly structured process for conducting clinical trials. Components of this system include a Clinical Protocol Review Committee (CPRC), which provides a thorough scientific review of the protocol’s justification and trial design, including sample size and an analysis plan; a clinical trials management system; and an independent data monitoring committee (IDMC), which watches for early evidence of harm or benefit and monitors the quality of trial conduct. In this case, the Cancer Center CPRC reviewed the protocols for their scientific merit, apparently relying on the journals’ decisions to accept the relevant scientific publications. Moreover, as noted above, the IGSP rather than the Cancer Center had taken responsibility for conducting and overseeing the trials. The protocols were reviewed by the Duke IRB to assess whether human safety, privacy, and autonomy were protected and whether appropriate informed consent procedures were in place. After these initial reviews, the trials were executed by the investigators and overseen by the IGSP (Kornbluth and Dzau, 2011). Ultimately, the IRB is responsible, but it must rely on the CPRC for scientific review and the IDMC for expertise on the conduct of trials.
The University did not institute extra oversight or launch formal investigations of the three trials during the first 3 years after the original publications triggered widely known controversy about the scientific claims, and after concerns began to develop about the possibly premature initiation of the clinical trials. The Cancer Letter began covering the story in 2009, and scrutiny from NCI and the external community intensified during 2010. In 2010, the University formed a Translational Medicine Quality Framework (TMQF) committee to make recommendations to University leadership on appropriate oversight policies for future omics research being tested in clinical trials (TMQF Committee, 2011a,b). Its recommendations address lines of authority, oversight, and accountability.
Need for an Investigational Device Exemption

There was significant ambiguity about whether an IDE was required at the onset of the three clinical trials. FDA has an oversight role in late-stage research before a test (or drug) is applied to patients. At the time that the Duke investigators were developing their clinical trial protocols, FDA was in the process of clarifying the requirements for when an application for an IDE must be made for omics-based tests. The Duke IRB’s understanding was that computational models were not considered devices by FDA and, even if a computational model were a device, the omics-based tests were not a significant risk to the patient-participants’ health, safety, and welfare because the tests were being used to direct choices among standard therapies (Falletta, 2011; FDA, 2011). However, FDA sent a letter to the
investigators in 2009 stating that the omics-based tests being studied in the three clinical trials needed to go through the IDE process (Chan, 2009). The investigators made some changes to the protocol of the studies in response to this letter and contacted FDA for further clarification about whether an IDE was still required (FDA, 2011; Potti, 2009). When it received no response from FDA, the Duke IRB determined that an IDE was not needed.3 In retrospect, and with FDA guidance in 2010, the Duke IRB chair recognized that an IDE should have been obtained for the omics-based tests used in the trials because the tests were used to direct patient management in the clinical trials (Falletta, 2011).
If an IDE had been sought at the initial stages, the FDA review process would likely have required validation data as well as locked-down data files and computational procedures. Although this cannot be known with certainty, an FDA review likely would have uncovered some of the validation issues and thereby prevented the use of these tests in clinical trials. Duke has since instituted new controls to ensure compliance with FDA’s IDE requirement (Califf, 2011).
Conflict of Interest Management
The Duke COI committee was charged with overseeing and managing investigators’ COIs while the three clinical trials were being conducted. The Duke IRB routinely asks investigators to answer questions about their COIs when submitting protocols for review (Ginsburg, 2011). If the answers trigger concerns about COIs, the IRB notifies the COI committee. The COI committee also requires all investigators to complete an annual reporting form intended to identify financial COIs. If a COI is identified through either of these channels, the COI committee works with the individuals to create management plans to address their COI. McKinney informed the IOM committee that individual investigators could be required to disclose their COIs on all consent forms for a clinical trial or be prohibited from acting as PIs in a clinical trial in which they had a substantial financial stake (McKinney, 2011).
The IOM committee reviewed information regarding the potential for COI for investigators, members of IRBs, the Data Safety Monitoring Board (called the DSMB-plus), and other oversight bodies for the trials at Duke. There is evidence that some of those involved in the design, conduct, analysis, and reporting of the three clinical trials and related trials involving the gene expression–based tests had either financial or intellectual/professional COIs that were not disclosed. Specifically, some investigators involved in the three clinical trials were evaluating omics-based tests for which they held a patent, or had a financial relationship with Expression Analysis Inc. and/or CancerGuideDx Inc.,4 laboratory and bioinformatics companies that were established quite early in the development process to market the Duke omics-based tests. According to Califf, there was a great deal of confusion within the University at this time about when a patent or intellectual property interest qualified as a conflict. Some investigators believed a conflict developed when a patent application was filed or a patent was issued; others believed it was when a relationship was formed with a commercial company or when a marketable product was produced (Califf, 2011). At the meeting with the IOM committee on August 22, 2011, Califf acknowledged that the COI process had not identified an important COI of a member of the DSMB-plus for the three clinical trials, which resulted from the member’s previous substantive collaboration with some lead investigators for the three trials on research closely related to the research in those trials.

3 According to the FDA website, the Center for Drug Evaluation and Research (CDER) has no record of receiving the December 2009 letter from Dr. Anil Potti discussing an exemption for the trial that received pre-IDE review. The letter was brought to FDA’s attention during its 2011 inspection of the Duke IRB and clinical investigators (see http://www.fda.gov/MedicalDevices/ProductsandMedicalProcedures/InVitroDiagnostics/ucm289100.htm).
In addition to individual COI, the potential for institutional COIs is important. Such COI not only is financial (i.e., the universities get a portion of profits from patent licenses and from spin-off companies), but also arises from interest to protect the reputations of the institution and respected colleagues. At the IOM committee meeting in March 2011, Califf acknowledged the university’s concern for its reputation in its handling of controversial issues of all kinds and the need to be especially careful in assessing the work of an esteemed senior faculty member. Managing situations in which both an investigator and the institution have potential COIs is particularly challenging.
Some investigators at Duke had intellectual property (IP) and equity interests in Expression Analysis and CancerGuideDx. Duke had no institutional interest in Expression Analysis, but did have a license agreement in place with CancerGuideDx as of January 2010; license negotiations began in early 2009. Duke’s vice dean for research indicated she was not aware of these conflicts at the time, and that better communication is needed because all parties should be aware of such issues (Kornbluth, 2011).
Duke leadership indicated that, whenever IP is filed, the institution, the COI committees, and the IRBs should be informed, but this was not routine procedure during the time of the design and conduct of the three clinical trials. Duke leadership acknowledged the possibility that COIs would have been considered private information, so that people working in the same groups or as coauthors would not necessarily know about some potentially important COIs of their colleagues. It was also reported that there was no institutional process providing insight into how to address conflicts across the continuum of IP generation and development, from planning to file IP, to filing IP, to forming relationships with a company. However, the senior chair for the Duke IRB process confirmed that PIs now are expected to disclose IP in their IRB submissions (Falletta, 2011).

4 This company no longer exists.
Whistleblowing System

Duke University follows a “just culture” model, which does not hold individuals accountable for system failings and errors over which they have no control; it holds individuals accountable only for mistakes that disregard patient safety or involve gross misconduct (Zuiker, 2008). Under a just culture model, it is expected that anyone at any level can criticize the scientific methods of a study in a protected environment. At the time of the three clinical trials, Duke University used both anonymous and non-anonymous reporting systems. It also had a compliance hotline through which individuals could report breaches of the rules and regulations governing clinical research (Cuffe, 2011). However, the problems with the three clinical trials were not brought to the attention of the appropriate individuals within the university leadership through any of these whistleblowing channels. According to Vice Dean for Research Sally Kornbluth, a number of people came forward after the university undertook its investigation and said they “were glad [the university was] reviewing things carefully” (Kornbluth, 2011). Why no one came forward earlier, or whether concerns were raised but not forwarded appropriately, is not known; the fact that these problems did not surface sooner may indicate discomfort with, or lack of confidence in, these systems among faculty and staff.
Duke has taken steps to improve its whistleblowing system in response to what occurred in this case. The TMQF plan requires that every site-based research unit’s accountability plan must include a strategy to encourage individuals to discuss concerns about methods of research and to report suspected breaches in appropriate research practices. The goal is to create a culture of “dissent and discussion” (Califf, 2011). The university has made its Post-Doctoral Office more robust with the intent to make it a place within the university where postdoctoral students can report their concerns. It has also added an ombudsperson for faculty and students (i.e., an individual whom faculty and students can approach with concerns, if they feel uncomfortable confronting a superior) (Kornbluth, 2011). Everyone acknowledges that raising such challenges leads to anxiety.
Responding to Scientific Controversies
University leadership originally believed that the controversy surrounding the use of the omics-based tests in the three clinical trials involved disagreement about arcane scientific methodology. According to Kornbluth, “it was not presented or recognized as a criticism or implication of underlying data corruption” (Kornbluth, 2011). This position changed in 2009 when Baggerly and Coombes published the article in the Annals of Applied Statistics (Baggerly and Coombes, 2009), which stated that the omics-based tests did not work and were potentially endangering patient safety by incorrectly directing therapy (Kornbluth and Dzau, 2011). The potential for the tests to incorrectly direct therapy existed, among other reasons, because discrepancies identified in the data included reversal of some sensitive/resistant labels.
At this time, NCI also had begun reviewing protocol CALGB-30702, which had been submitted to NCI’s Cancer Therapy Evaluation Program (CTEP). The proposed trial would have used six of the chemosensitivity tests to guide therapy in an advanced lung cancer trial. The NCI reviewers noticed substantial differences between the trial protocol’s descriptions of the tests and the way the tests were described in the validation studies. NCI disapproved the CALGB-30702 protocol, but the protocol mentioned that several gene expression–based tests were already guiding therapy in some Duke University trials. NCI staff conducted a search of ClinicalTrials.gov and became concerned when they identified several clinical trials at Duke using omics-based tests with similar methodologies developed by the Nevins/Potti group. NCI contacted Duke in September 2009 regarding these trials (McShane, 2010a).
In response, Duke suspended the three trials, and the Duke IRB initiated an investigation. The IRB was designated as the appropriate university entity to conduct the investigation because the focus was on patient-participant safety concerns5 (Kornbluth, 2011). The Duke IRB formed an external peer review committee composed of two statisticians to conduct an independent evaluation of the data and three clinical trials. The reviewers’ identities were protected under a confidentiality agreement to encourage the reviewers to act objectively and without fear of reprisal (TMQF Committee, 2011b). The original intent was to give reviewers “unfettered access to all the data, software, and analyses. They also could request any other information needed from Nevins and Potti” (Kornbluth and Dzau, 2011, p. 16). The charge to the external reviewers consisted of two questions:
- “Have the methodology errors originally communicated by the MD Anderson Cancer Center researchers, Baggerly and Coombes, been adequately addressed by the Duke researchers?
- Do the methods as originally developed and as applied in the context of these trials remain valid?” (Kornbluth and Dzau, 2011)

5 The Office of Research Integrity’s policy on responding to scientific controversies is limited to addressing scientific misconduct.
In concluding their review, the external statisticians stated they were “able to show with an independent analysis that the approaches used in the Duke clinical predictors are viable and likely to succeed” (Review of Genomic Predictors for Clinical Trials from Nevins, Potti, and Barry, 2009, p. 1). On the basis of this report, the university resumed the three trials (Kornbluth and Dzau, 2011).
However, several revelations raise substantive concerns about this review process. The external statistical reviewers explicitly noted, given the data they had to review, that they were “unable to identify a place where the statistical methods were described in sufficient detail to independently replicate the findings of the papers.” They stated, “The one area [in which] they [the Duke investigators] have not been fully responsive and really need to do so is in clearly explaining and laying out [sic] the specific statistical steps used in developing the predictors and the prospective sample assignments” (Review of Genomic Predictors for Clinical Trials from Nevins, Potti, and Barry, 2009).
The integrity of the external review may also have been influenced by the involvement of Nevins. In the December 22, 2009, report of the external statistical reviewers, a reference is made to the pemetrexed test: “In addition, we agree with Nevins and Potti that since the profile is not used in any of the clinical trials patients are not being endangered.” This statement contradicts the fact that the pemetrexed test was being used to guide the choice of treatment in trial NCT00545948, which opened in October 2007, even though, as noted previously, Potti had stated in his March 2008 submission of R01-CA131049-01A1 that the accuracy of the pemetrexed test had not been validated in independent patient samples. This suggests that the independent review process permitted the PIs to be in direct contact with the external independent statistical reviewers, allowing the PIs to provide misleading information to these reviewers.
The external committee’s review also was influenced by lack of access to important relevant information. In the first week of November 2009, while the external review was in progress, new data about the cisplatin and pemetrexed tests (the subject of Hsu et al., 2007) were posted to a Duke website. Baggerly examined the new data and found additional errors, noting in particular that all of the samples used for validation were mislabeled. He forwarded a report and raw data to Duke officials on November 9, 2009. That material, however, was never forwarded to the external statistical reviewers
because of the university leadership’s concerns that it might “bias” the committee’s review (Kornbluth and Dzau, 2011).
In summary, concerns have emerged in three areas: (1) whether the external statistical reviewers were encouraged to delve deeply, given that 4 to 6 weeks of intensive work likely would have been needed to do so; (2) the comprehensiveness of the information provided to the external reviewers; and (3) whether the information provided to the reviewers was substantially influenced by Nevins and Potti. These issues speak to the challenge of oversight by responsible parties when institutional “conflicts” could compromise the integrity of that oversight. In retrospect, Duke’s leadership recognized that if the reviewers “had been explicitly sent to McShane or sent to Baggerly and [instructed to consult with them during their analysis], there would have been a different outcome” (Kornbluth, 2011). According to Kornbluth, “One of the chief lessons learned is that there’s a balance between trusting investigators [who] have a very long track record with an institution and also thinking about what is necessary to ensure an adequate review” (Kornbluth, 2011).
Biostatistical Collaboration and Data Provenance Issues
As mentioned above, the IGSP was set up to conduct multidisciplinary research and included staff with diverse expertise, such as biostatistics, bioinformatics, clinical trials, pathology, and laboratory science. Various individuals with biostatistical expertise were involved in the development of the omics-based tests used in the three clinical trials, but there was a lack of continuity in personnel. Numerous errors identified in the statistical methodology and analyses (Baggerly and Coombes, 2009; McShane, 2010a,b) suggest there was insufficient statistical expertise involved in the studies for which published papers have now been retracted. The Duke TMQF committee recommended involving biostatistical expertise in all translational research projects (TMQF Committee, 2011b) and recognized the need to increase education and training of statisticians (TMQF Committee, 2011a). These experiences at Duke also emphasize the importance of involving senior-level biostatisticians who are coinvestigators and co-owners of responsibility, and who are intellectually independent, preferably reporting to an independent mentor or department chair.
Duke also identified the lack of “sustained statistical collaboration” as a contributing factor in the research team’s failure to follow proper data management practices (Kornbluth and Dzau, 2011). In its discussion of data provenance, the TMQF committee recognized the importance to the integrity of confirmatory trials of maintaining confidentiality of interim data, stating, “secure database management systems are used to store and interrogate data for quality assurance; persons with vested interests (such
as clinical investigators) are blinded to, and independent from, data and analyses” (TMQF Committee, 2011b). This principle for having secure databases with a firewall between the interim data and the trial investigators is, for many reasons, of key importance to the integrity of confirmatory trials, including ensuring that the main hypotheses being addressed by the trial are not influenced by the data from the trial (see Chapters 2, 3, and 4).
There are indications that clinical databases for the three clinical trials were not adequately secure, in contrast to the principles later articulated in the Duke TMQF document. McShane stated that she had been contacted by someone who alleged that there were problems with how the data were being handled in some of the prospective Duke trials. In the August 22, 2011, meeting with the IOM committee, Califf indicated that some of the data from those trials could have been accessed some of the time because the endpoint information was not going into secure databases. Thus, in essence, the investigators could know case by case what the outcomes were and potentially could have reconstructed the data. He also agreed that such access to emerging data could potentially lead to inappropriate actions such as reformulation of hypotheses.
THE ROLE OF FUNDERS IN RESPONDING TO SCIENTIFIC CONTROVERSIES6
NCI’s involvement in Duke’s external peer review of the three clinical trials was limited. It provided Duke with the names of statisticians who could potentially serve as peer reviewers and assisted Duke with the initial contact of the reviewers. However, according to McShane, “due to the incomplete information in the [Duke external reviewers’] report and the fact that NCI had no access to the data provided to the reviewers, NCI could not make a judgment on whether the concerns about the [tests] used in the Duke trials had been adequately addressed” (McShane, 2010a).
When NCI staff determined in April 2010 that NCI was providing partial funding through an R01 grant for the Duke trial using the cisplatin test (NCT00509366), it requested the data and computer code for the cisplatin and pemetrexed tests that were the basis for the primary aims of the grant work (McShane, 2010a). The investigators then provided NCI with the data and computer code for the cisplatin test, but not for the pemetrexed test. NCI staff evaluated the cisplatin test and were unable to reproduce the results. The analyses by both NCI and Duke’s external reviewers relied on the data provided by the Duke investigators. NCI did not believe the external review was adequate. On June 29, 2010, NCI met with the Duke investigators and the Duke leadership. NCI asked Duke to produce the original raw data that would reproduce the findings in the papers. On October 22, 2010, Duke notified NCI that multiple validation datasets associated with the cisplatin test were corrupted. For example, William Barry explained in his August 22, 2011, testimony to the IOM committee that when he found the original source data for the ovarian cancer cell line drug sensitivity experiments, he was able to determine that the drug sensitivity measurements supplied to NCI for its evaluation of the cisplatin test differed from the true source data. As a result of the data corruption, the sensitivity predictions produced by the test showed a significant association with the incorrect sensitivity measures, but the association disappeared completely when the correct sensitivity data were used. Thus, the PIs and the Duke leadership agreed to terminate the trials (Kornbluth and Dzau, 2011). The process was also initiated to retract the paper by Hsu et al. that had been published in 2007 in the Journal of Clinical Oncology. This retraction (Hsu et al., 2010) was the first of several retractions from the investigators.

6 The Department of Defense (DOD) Breast Cancer Program funded the NCT00636441 trial, and Eli Lilly was a sponsor of the NCT00509366 and NCT00545948 trials. However, the committee did not interact with DOD or Eli Lilly and does not have information on any steps they took to investigate the scientific controversies surrounding the trials.
Nevins reported during his March 30, 2011, testimony that findings of data corruption had been observed for multiple datasets compiled by his team for purposes of validating the various chemotherapy sensitivity tests (Kornbluth and Dzau, 2011). These included not only data derived from Duke sources but also publicly available data. As an example, a dataset of 133 samples from a neoadjuvant breast cancer trial at MD Anderson involving patients treated with the combined regimen TFAC was used for validation of a doxorubicin sensitivity test. The clinical annotation assumed to have been used by Potti et al. included 34 responders and 99 non-responders, the same distribution as reported by MD Anderson. However, a detailed comparison of the two datasets revealed that the response information was reversed for 24 cases, with 12 labeled incorrectly in each direction. In this case, the corrupted data yielded positive validation results, whereas the accurate data did not provide evidence for validation. Similar findings of data corruption in key validation datasets were observed in other instances.
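The kind of forensic cross-check that uncovered these reversed labels can be sketched in a few lines of code: compare the response labels in the clinical annotation actually used against the labels in the original source data, sample by sample, and count the discrepancies in each direction. This is an illustrative sketch only; the sample IDs and labels below are invented, not the actual MD Anderson data.

```python
def compare_labels(source, supplied):
    """Compare response labels ("R"/"NR") keyed by sample ID across two
    copies of a clinical annotation; return the flipped IDs by direction."""
    flips = {"R->NR": [], "NR->R": []}
    for sample_id, true_label in source.items():
        used_label = supplied.get(sample_id)
        if used_label is not None and used_label != true_label:
            key = "R->NR" if true_label == "R" else "NR->R"
            flips[key].append(sample_id)
    return flips

# Invented example: s1 and s4 carry reversed labels, one in each direction.
source   = {"s1": "R",  "s2": "NR", "s3": "R", "s4": "NR"}
supplied = {"s1": "NR", "s2": "NR", "s3": "R", "s4": "R"}

flips = compare_labels(source, supplied)
print(flips)  # {'R->NR': ['s1'], 'NR->R': ['s4']}
```

Note that the marginal counts (numbers of responders and non-responders) match in such a scenario, which is why only a sample-by-sample comparison, not a summary table, can detect the reversal.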
There was a lack of substantive interaction between Duke and NCI about details of the charge to the external review committee and about details of the conduct of the investigation (e.g., regarding what material the committee had access to). Duke did not ask for any detailed help or comment from NCI, and NCI seemed to think it was not appropriate to try to provide specific direction unless it was “invited” by the university. The IOM committee recommended that federal funders of omics-based translational research should have the authority to investigate any research being conducted by a funding recipient after requesting an investigation by the institution. For example, in the future, NCI might consider having a more active supervisory role regarding the adequacy of the “charge” or of the work of an independent review committee. Indeed, Kornbluth stated that Duke leadership wished it had understood early on that part of NCI’s concern stemmed from its inability to reproduce the exact published data. This became clear when she accompanied Nevins, Potti, and colleagues to a meeting at NCI after the external review (Kornbluth, 2011).
In this case, NCI, through McShane, who had invested many months in pursuing the process and statistical issues, had great insight into the problems, independent of the Baggerly and Coombes efforts. But NCI did not pursue this until later, when the Duke external review appeared inadequate and after NCI determined that it was supplying partial funding for one of the three Duke clinical trials through an R01 grant to Potti. It was apparently only after discovering the funding tie through the grant that NCI believed it was justified in taking a more active role in the investigation and in requesting data and computer code to evaluate the tests. When NCI asked for data and computer code for both the cisplatin and pemetrexed tests that were being studied in the grant, the Duke investigators declined to provide the necessary data and computer code for the pemetrexed test on the grounds that it was not being used in the trial linked to the grant. However, NCI funding supported much of the relevant foundational research for the trials. A more active role by NCI earlier on might have avoided some mistakes in the external review process. In June 2010, however, Duke officials came to NCI, and McShane laid out the issues in great detail. As a result, Kornbluth said, Duke leadership clearly understood the seriousness of the concerns. At the August 2011 meeting with the IOM committee, both Califf and Kornbluth clearly stated that they viewed McShane’s work as critical. Moving forward, it is important to ensure that universities (presuming that universities continue to be the major “responsible party” for this kind of work) can get the detailed expertise and advice necessary to conduct a proper “evaluation process.” Kornbluth suggested during the August 22, 2011, meeting that for some problems a university might want to obtain “outside help,” perhaps from a sister university in a consortium, either because of a lack of expertise in-house or because of institutional conflict.
THE ROLE OF JOURNALS IN RESPONDING TO CREDIBLE CONCERNS ABOUT PUBLISHED MANUSCRIPTS
As described earlier in this appendix, Baggerly and Coombes pursued correspondence with the authors of the Nature Medicine paper, both directly and through letters to the editor. Baggerly and colleagues pursued communication directly with Nevins and colleagues from November 2006 to June 2007. Shortly after, communications between the groups broke off. In June 2007, Baggerly and Coombes submitted correspondence to Nature Medicine outlining their unresolved concerns and questions about the omics-based tests (Baggerly, 2011). In November 2007, the correspondence was published along with a reply (Coombes et al., 2007; Potti and Nevins, 2007). As noted earlier in the section “Development and Evaluation Process,” the letter from Coombes et al. expressed five major concerns about the paper, including (1) their inability to reproduce the selection of cell lines from the sensitivity measures, (2) errors in the gene lists, (3) incorrect figures, (4) the use of combined training and test sets in the development process, and (5) their inability to reproduce the reported test performance results. In their reply, Potti and Nevins acknowledged some errors in the posted data, figures, and gene lists. However, they countered that Baggerly and Coombes had used different analytic methods, and they disagreed with Baggerly and Coombes’ objection to combining the training and test sets to develop the computational model. Furthermore, the authors stated that they had been able to successfully validate the tests in independent datasets, as reported in the papers by Hsu et al. (2007) and Bonnefoi et al. (2007). Two corrections to the Potti et al. Nature Medicine paper were published in November 2007 and August 2008, and the authors indicated that corrections had been made to the supplementary information posted online (Potti et al., 2007a, 2008). The Duke investigators said that, with only a few exceptions, the errors in posted data, figures, and gene lists were clerical errors that had no impact on the actual tests developed or the reported test performance results.
Baggerly and Coombes also corresponded with the authors and journal editors regarding the papers by Dressman et al. and Hsu et al., published in the Journal of Clinical Oncology (JCO). JCO published their letter and a reply from the authors regarding the Dressman et al. article (Baggerly et al., 2008; Dressman et al., 2008), but declined to publish the letter regarding the Hsu et al. article. According to Baggerly, when he tried to correspond with Potti et al. regarding the Bonnefoi et al. article published in Lancet Oncology (Bonnefoi et al., 2007), Potti was no longer willing to engage in a discussion. Lancet Oncology rejected their letter (Baggerly, 2011). Meanwhile, the papers were used and cited by hundreds of other investigators.7
Ultimately, the Nature Medicine paper was retracted on January 7, 2011, based on the NCI’s recommendation for a full review of all data associated with all of the predictors in the key papers that had been questioned or that had been used in clinical trials. The retraction cites the corruption of the validation datasets and explicitly states that the authors were “unable to reproduce certain crucial experiments showing validation of signatures for predicting response to chemotherapies, including docetaxel and topotecan” (Potti et al., 2011a, p. 135). The papers by Hsu et al. (2007) and Bonnefoi et al. (2007) were retracted in November 2010 and February 2011, respectively. The Potti et al. (2006b) New England Journal of Medicine paper was likewise retracted in March 2011. The Dressman et al. (2007) Journal of Clinical Oncology paper was retracted in January 2012 (JCO, 2012). In addition, Duke leadership has identified 40 papers in which Potti was a coauthor and the study involved original data analysis. Duke has contacted all 162 of his coauthors and asked whether they support the veracity of their work. Based on this dialogue, two-thirds of the papers are being partially or fully retracted; one-third were still considered valid by Duke leadership as of August 2011 (Califf, 2011).

7 The Potti et al. (2006a) article was cited 306 times, the Hsu et al. (2007) article 60 times, the Dressman et al. (2008) article 111 times, the Bonnefoi et al. (2007) article 95 times, and the Potti et al. (2006b) article 350 times in Scopus (all as of October 28, 2011).
USE OF CHEMOSENSITIVITY TESTS AT OTHER INSTITUTIONS
One of the motivations for encouraging transparency and open scientific discourse is that scientific progress is built on the foundations of past work. The retracted papers were cited dozens or hundreds of times before they were retracted, and many grants were awarded based on such work. Thus, the committee sought information on whether clinical trials had been initiated at other institutions, based on the now-retracted work from Duke University investigators. The committee’s concerns focused on the following questions:
- Have gene expression–based tests either developed or validated on the Nevins/Potti data been used for patient management decisions in clinical trials other than those named in this committee’s statement of task?
- Are investigators at other universities or cancer centers involved in the design and conduct of trials using gene expression–based tests linked to the work of Nevins and Potti and colleagues?
- If so, who sponsored these trials? Are these sponsors fully informed about possible integrity issues with the tests used in these trials?
- Have NCI and other sponsors conducted appropriately comprehensive investigations into any suspected scientific integrity issues?
Through a search on ClinicalTrials.gov and on the NIH RePORTER, the committee identified a clinical trial at the Moffitt Cancer Center & Research Institute (MCC), NCT00720096, A Pilot Prospective Trial of Genomic Directed Salvage Chemotherapy with Either Liposomal Doxorubicin or Topotecan in Recurrent or Persistent Ovarian Cancer Within 12 Months of Platinum-Based Chemotherapy (ClinicalTrials.gov, 2011d), and an NIH R33 grant, 5R33CA110499-05, Molecular Profiling to Predict Response to Chemotherapy (Lancaster, 2008). According to its ClinicalTrials.gov entry, the trial was initiated in July 2008 and was terminated in October 2009, with four patients accrued. The R33 grant turned out to be a continuation of the grant 1R21CA110499-01A2, acknowledged in the 2007 Journal of Clinical Oncology paper authored by Dressman et al. (2007).
The committee requested copies of the MCC clinical trial protocol and informed consent documents (H. Lee Moffitt Cancer Center & Research Institute, 2007, 2008a,b, 2009), and sent a letter to Moffitt Cancer Center & Research Institute Director William Dalton requesting further information about the trial and grant. Moffitt provided copies of the protocol and informed consent documents. Dalton responded to the letter from the committee,8 providing important additional insights into the history of the trial and grant. The letter also described some interactions with NCI, which prompted a response to the IOM committee from NCI in the form of a letter from NCI statistician McShane.9
The Moffitt ovarian cancer clinical trial was assessing omics-based tests developed to predict sensitivity to liposomal doxorubicin and to topotecan. Treatment in the trial was to be directed by the test results. Regarding the origin of the tests, the protocol states, “The predictive models for Doxil [liposomal doxorubicin] and topotecan as defined in our previous work [Potti et al., 2006a] will be implemented to assess the predictive response of a clinical trial sample.” Sponsors of the trial were Moffitt and DOD, and the PI of the trial was MCC’s Robert M. Wenham. Jonathan Lancaster, currently an MCC investigator, was identified by Dalton as a coinvestigator on the trial. Lancaster was formerly at Duke University and a coauthor with Nevins and Potti on three of the papers retracted to date: the 2006 Nature Medicine paper and two 2007 Journal of Clinical Oncology papers (Dressman et al., 2007; Hsu et al., 2007; Potti et al., 2006a).
Dalton’s letter, which he describes as representing the inputs of Wenham and Lancaster, states that the gene expression–based tests were developed at Moffitt through an NCI-funded R21 grant and were undergoing prospective validation in a study supported by an R33 grant, in which the tests were not being used to guide therapy. The Moffitt trial, which was using the tests to guide therapy, was running concurrently with the R33 validation study. McShane’s response indicates that NCI did not view the tests as being ready for use in directing therapy, whereas Moffitt viewed it as acceptable for the tests to be used to guide therapy in the context of a feasibility trial. The Moffitt trial had undergone institutional scientific and IRB review as well as DOD review, where it was deemed acceptable, just as in the case of the Duke tests and trials. This experience may point to inconsistencies in the standards required by different funding agencies and institutions for when an omics-based test is ready for use in directing therapy, or it may reflect insufficient information provided to the funding agency. The committee makes several recommendations (in Chapter 5 on Responsible Parties) to guide these determinations and promote more consistent standards to protect patients and avoid the waste of resources that could result when tests are put into clinical studies prematurely.

8 Personal communication from William Dalton, H. Lee Moffitt Cancer Center & Research Institute, to Gilbert S. Omenn, University of Michigan, RE: Response to questions - Genomic-directed salvage chemotherapy with either liposomal doxorubicin or topotecan, September 28, 2011.

9 Personal communication from Lisa McShane, National Cancer Institute, to Gilbert S. Omenn, University of Michigan, RE: Moffitt response to questions on the trial Genomics-directed salvage chemotherapy with either liposomal doxorubicin or topotecan, October 23, 2011.
NCI stated that it had concerns about the information initially presented on the tests during the R21 grant transition review. Problems with the clarity and consistency of information presented about the omics-based tests were identified by NCI transition reviewers, and NCI questioned whether the tests were appropriately locked down and ready for validation in the R33. This caused NCI to conduct a more extensive review than usual, involving direct interaction in early 2008 between NCI statistician McShane and Moffitt statistician Steven Eschrich and the provision of some example data and computer code to NCI to allow it to assure lock-down of one of the five gene expression–based tests reported to have been developed in the R21. However, the test examined was not one of those used in the Moffitt trial. NCI states that it was not aware until October 2009 of the Moffitt trial running concurrently with the R33 grant. Whether the trial cosponsor DOD or Moffitt’s review bodies were aware of the concerns NCI had about readiness of the omics-based tests for use in directing therapy at the time they approved the trial is unknown. When NCI did find out about the Moffitt trial, it contacted Lancaster with its concerns, and the Moffitt trial was terminated shortly afterward. Termination of the Moffitt trial was also mentioned by the October 23, 2009, Cancer Letter, where it was reported that a Moffitt spokesperson indicated the closure was unrelated to the controversy concerning the three Duke University clinical trials (Goldberg, 2009b). This suggests that DOD and the Moffitt IRB might not have been aware of NCI’s concerns even when the trial was terminated. The IOM committee’s recommendations encourage better communications among funders so that important information about omics research and omics-based tests developed in the course of that research is shared to more fully inform decisions about the readiness of tests for use in clinical trials or clinical care.
Ambiguity remains about exactly what predictors were used in the Moffitt trial and whether IDEs had been obtained. The Dalton letter states that the gene expression–based tests used in the trial were derived at Moffitt and not at Duke. NCI expressed its uncertainty about the source of the tests because the Moffitt trial protocol identifies Potti et al. (2006a) as that source. The answer to this question is important because the retraction notice for the Potti et al. (2006a) paper specifically states an inability to reproduce the validation results for the topotecan test as one of the reasons for the retraction. FDA oversight, such as through an IDE review process, might avoid this type of confusion and ensure that omics-based tests are locked down and identifiable. Understanding of CLIA and FDA requirements has evolved over recent years, and the recommendations from the IOM committee should be helpful in promoting better understanding and appreciation of these federal regulations. The Dalton letter supports the committee’s emerging recommendation that institutions seek FDA guidance and meet FDA IDE requirements.
The events that occurred at Moffitt support the notion that many of the problems identified at Duke University are probably not unique to Duke. Learning about these multiple situations informed the development of the IOM committee’s recommendations.
A time line summarizing events related to the material in this appendix is presented in Table B-2.
The committee identified several overarching themes in design, conduct, and oversight of omics research from the Duke case study. Among these themes, transparency and open communication remain important principles for the conduct of science, whether reporting data and computer code, disclosing conflicts of interest, or reporting potential breaches in scientific procedures. Institutions play an important role in establishing a culture that includes expectations of behavior, achievement, and integrity, and providing safe environments for reporting irregularities. Oversight processes that will maintain integrity even in the presence of institutional conflicts of interest may be especially important in achieving this goal.
Regarding development of gene expression–based chemosensitivity tests, validation requires steps to lock down important features, including hypotheses, computational models, and analysis plans. In order to protect patients from harm due to use of a faulty predictor, it is essential to follow the kind of scheme presented in Figure S-1 to confirm and then validate omics-based tests before launching clinical trials or offering them commercially for clinical use. It is important to involve the appropriate expertise of biostatisticians and bioinformatics scientists in design, analysis, and oversight. It is also important for all members of a research team to understand the aims and many details of the collaborative study, and for coauthors of a publication to keep each other informed about constructive criticism of the work and ways to improve the publications and ongoing research. Even before a test is considered for use to direct patient care, these good science practices should be followed to avoid wasted effort and resources. These themes are reflected in the detailed recommendations presented in Chapters 2-5.
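One concrete mechanism that supports lock-down, sketched here under illustrative assumptions (the model structure below is invented, not any of the actual Duke or Moffitt predictors), is to record a cryptographic fingerprint of the model's parameters before validation begins and verify that the fingerprint is unchanged whenever the model is later applied. Any silent modification to gene lists or weights then becomes detectable:

```python
import hashlib
import json

def fingerprint(obj):
    """Return a SHA-256 digest of a JSON-serializable object, using a
    canonical serialization so logically identical models hash identically."""
    canonical = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# Invented toy predictor: a gene list with associated weights.
model = {"genes": ["G1", "G2", "G3"], "weights": [0.4, -1.2, 0.7]}
locked_digest = fingerprint(model)          # recorded at lock-down time

# Later, at validation time, the digest is checked before any analysis runs.
assert fingerprint(model) == locked_digest  # unchanged model passes

model["weights"][0] = 0.5                   # a silent post-hoc change...
assert fingerprint(model) != locked_digest  # ...is caught by the check
```

The same fingerprinting can be applied to validation datasets, which would also flag the kind of dataset corruption described earlier in this appendix.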
While further pursuit of the questions raised about the clinical trials and omics-based tests discussed in this appendix may be undertaken separately from the work of the IOM committee, the need for availability of data and computer code, the need to follow a rigorous test development and evaluation process prior to use of an omics-based test in clinical trials, and the responsibilities of investigators, institutions, journals, and funding agencies are clear lessons.
TABLE B-2 Time Line of Events Surrounding the Duke Gene Expression–Based Tests
|2000||Computational and Applied Genomics Program (CAGP) at Duke University founded by Joseph Nevins and Mike West (Kornbluth and Dzau, 2011).|
|2003||Creation of Duke University Institute for Genome Sciences and Policy (IGSP). CAGP becomes the new IGSP Center for Applied Genomics and Technology (CAGT) (Kornbluth and Dzau, 2011). Anil Potti completes medical residency in North Dakota and begins fellowship at Duke in lab of Thomas Ortel; in 2004, he joins Nevins laboratory (Kornbluth and Dzau, 2011).|
|2006||Potti hired by CAGT to establish an independent lab focused on gene expression-based research (Kornbluth and Dzau, 2011).|
|August 2006||New England Journal of Medicine (NEJM) publishes “A genomic strategy to refine prognosis in early-stage non-small cell lung cancer” (Potti et al., 2006b).|
|October 2006||Nature Medicine publishes online “Genomic signatures to guide the use of chemotherapeutics” (Potti et al., 2006a).|
|November 2006||Keith Baggerly and colleagues begin correspondence about the Nature Medicine paper and subsequent publications with Potti and colleagues (Baggerly, 2011). Communication continues through June 2007.|
|2007||Clinical Genomics Studies Unit established at Duke University (Kornbluth and Dzau, 2011).|
|January 2007||Letters to the editor and author reply related to the NEJM paper (Potti et al., 2006b) published (Larsen et al., 2007; Potti et al., 2007b; Singh and Dhindsa, 2007; Sun and Yang, 2007). Correction to Potti et al. (2006b) published in NEJM (Correction, 2007).|
|February 2007||Journal of Clinical Oncology publishes “An integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer” (Dressman et al., 2007).|
|April 2007||William Barry joins IGSP (Kornbluth and Dzau, 2011).|
|July 2007||Study Using a Genomic Predictor of Platinum Resistance to Guide Therapy in Stage IIIB/IV Non-Small Cell Lung Cancer (TOP0602) entered on ClinicalTrials.gov (Identifier NCT00509366).|
|October 2007||Journal of Clinical Oncology publishes "Pharmacogenomic strategies provide a rational approach to the treatment of cisplatin-resistant patients with advanced cancer" (Hsu et al., 2007). Adjuvant Cisplatin With Either Genomic-Guided Vinorelbine or Pemetrexed for Early Stage Non-Small Cell Lung Cancer (TOP0703) entered on ClinicalTrials.gov (Identifier NCT00545948).|
|November 2007||Publication of a letter by Coombes et al. in Nature Medicine critiquing the Potti et al. (2006a) paper, together with a rebuttal (Coombes et al., 2007; Potti and Nevins, 2007). Baggerly et al. submit a letter, “Pharmacogenomic strategies may not provide a rational approach to the treatment of cisplatin-resistant patients with advanced lung cancer,” to Journal of Clinical Oncology. It is rejected (Baggerly, 2011).|
|December 2007||Lancet Oncology publishes “Validation of gene signatures that predict the response of breast cancer to neoadjuvant chemotherapy: A substudy of the EORTC 10994/BIG 00-01 clinical trial” (Bonnefoi et al., 2007).|
|March 2008||Trial to Evaluate Genomic Expression Profiles to Direct Preoperative Chemotherapy in Early Stage Breast Cancer entered on ClinicalTrials.gov (Identifier NCT00636441). Potti et al. submit revised R01 grant proposal, "Prospective Validation of Genomic Signatures of Chemosensitivity in NSCLC" (CA131049-01A1), which is linked to a Phase II trial using the cisplatin chemosensitivity test to direct therapy for advanced-stage lung cancer patients. The trial was later identified as NCT00509366, which began enrolling patients in June 2007 (McShane, 2010b). Publication of a letter to the editor by Baggerly et al., "Run batch effects potentially compromise the usefulness of genomic signatures for ovarian cancer" (Baggerly et al., 2008), a comment on Dressman et al. (2007), and an author reply in the Journal of Clinical Oncology (Dressman et al., 2008).|
|May 2008||Baggerly and Coombes submit a letter to the editor of Nature Medicine, “Microarrays: Retracing steps (again).”a Baggerly and Coombes submit a letter to the editor of Lancet Oncology, "Have gene signatures that predict the response of breast cancer to neoadjuvant chemotherapy been validated?" (Baggerly, 2011).|
|June 2008||Nature Medicine requests that Baggerly and Coombes’ 5/08 letter be sent to Potti and coauthors.b Nature Medicine rejects letter.c Lancet Oncology rejects letter.d|
|July 2008||Genomic Directed Salvage Chemotherapy with Either Liposomal Doxorubicin or Topotecan entered on ClinicalTrials.gov (Identifier NCT00720096).|
|July 2009||Cancer and Leukemia Group B (CALGB) submits revised CALGB-30702 protocol (Genome-Guided Chemotherapy for Untreated and Treated Advanced Stage Non-Small Cell Lung Cancer: A Limited Institution, Randomized Phase II Study).e Current Oncology Reports publishes “Translating genomics into clinical practice: Applications in lung cancer” (Jolly Graham and Potti, 2009).|
|September 2009||Annals of Applied Statistics publishes online: "Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology" (Baggerly and Coombes, 2009). The National Cancer Institute (NCI) contacts Duke to ask that the university carefully consider the validity of the work and its extrapolation to the clinic (McShane, 2010a).|
|October 2009||10/2 — The Cancer Letter first covers the story; Nevins asserts that the approach has been shown to work in a blinded validation by Bonnefoi et al. (2007) (Goldberg, 2009a). The Data Safety Monitoring Board and Duke Cancer Protocol Review Committee conclude that issues raised by Baggerly and Coombes (2009) presented no immediate increased risks to study patients already on therapy (Kornbluth and Dzau, 2011). Enrollment in the three trials is suspended (Duke University, 2007a,b, 2008). Patients already enrolled in the trials are informed of the controversy and reconsented (Kornbluth and Cuffe, 2010). Duke IRB commissions an independent, external two-person review of the scientific methodology in question; NCI provides assistance in identifying potential external experts (Kornbluth and Dzau, 2011). Baggerly and Coombes’ data analysis and questions from the Annals of Applied Statistics paper were shared with the Duke IRB and principal investigators of the three clinical trials (Kornbluth and Dzau, 2011). 10/23 — The Cancer Letter reports statements from coauthors of the Lancet Oncology study that the validation was never blinded (Goldberg, 2009b).|
|November 2009||11/9 — Baggerly sends a report highlighting problems with data posted on a webpage on the cisplatin and pemetrexed tests to Kornbluth at Duke. This report was shared with Nevins, who asked that it be withheld from the external reviewers; Duke leadership decided to honor Nevins’ request (Kornbluth and Dzau, 2011). 11/9 — Claudio Dansky Ullmann of NCI submits the review of revised CALGB-30702 protocol (Genome-Guided Chemotherapy for Untreated and Treated Advanced Stage Non-Small Cell Lung Cancer: A Limited Institution, Randomized Phase II Study) to NCI’s Cancer Therapy Evaluation Program (CTEP) Protocol and Information Office and forwards the review and disapproval letter to CALGB.f,g 11/16 — Lisa McShane and Jeffrey Abrams of NCI contact CALGB requesting re-evaluation of the Lung Metagene Score (LMS) test for CALGB-30506.h Ullmann and McShane contribute to an erratum to Jolly Graham and Potti (2009) published in Current Oncology Reports.|
|December 2009||External reviewers find that "In summary we believe the predictors are scientifically valid and with a few additions can be fully responsive to the comments of Baggerly and Coombes" (Review of genomic predictors for clinical trials from Nevins, Potti, and Barry, 2009).|
|January 2010||Letter submitted to NCI on 1/7/2010, accompanied by the report from the external reviewers (Kornbluth and Dzau, 2011; McShane, 2010a; Review of genomic predictors for clinical trials from Nevins, Potti, and Barry, 2009). Duke restarts the three trials (NCT00545948, NCT00509366, and NCT00636441) (ClinicalTrials.gov, 2011a,b,c).|
|February 2010||NCI completes reevaluation of supporting data for the CALGB-30506 trial (NCI, 2010b).|
|March 2010||Nevins et al. send a letter to McShane in response to some of her concerns about the LMS used in CALGB-30506.i McShane and Abrams reply with the conclusions of their analysis of the LMS in the CALGB-30506 clinical trial: The test should not remain as a stratification factor, and the coprimary aim to evaluate its performance should be removed from the study.j|
|April 2010||CTEP requests data and computer code from Potti regarding R01 grant CA131049-01A1 titled "Prospective validation of genomic signatures of chemosensitivity in NSCLC" (cisplatin and pemetrexed tests).k Potti responds to CTEP.l The Cancer Letter obtains a copy of Duke University’s external review report from NCI via a Freedom of Information Act request and publishes the document (Goldberg, 2010a).|
|May 2010||CTEP sends follow-up questions to Potti regarding their response to the April 2010 request regarding the cisplatin and pemetrexed tests. Potti responds.m|
|June 2010||NCI completes its reevaluation of the cisplatin chemosensitivity test (McShane, 2010c). NCI hosts Duke researchers to discuss the gene expression-based tests developed at Duke. NCI states that it is not satisfied, and directs Potti and Nevins to conduct a search of their labs to supply the data and code reproducing the results in Hsu et al. (2007) and justifying the trials under way. Duke statistician William Barry is tasked with checking the cisplatin/pemetrexed tests and verifying the data (Kornbluth and Dzau, 2011; NCI, 2010a; TMQF Committee, 2011b).|
|July 2010||7/16 — The Cancer Letter reports that Anil Potti incorrectly stated his credentials. Duke places Potti on administrative leave while the University investigates allegations of inaccuracies in his curriculum vitae and in the research with Nevins (Goldberg, 2010b). 7/19 — Thirty-one biostatisticians and bioinformatics experts from around the world send a letter, “Concerns about prediction models used in Duke clinical trials,” to NCI director Harold Varmus. This letter is later signed by two additional statisticians (Baron et al., 2010). 7/23 — Lancet Oncology issues an expression of concern for “Validation of gene signatures that predict the response of breast cancer to neoadjuvant chemotherapy: A substudy of the EORTC 10994/BIG 00-01 clinical trial” (Bonnefoi et al., 2007). NCT00545948, NCT00509366, and NCT00636441 trials suspended a second time (ClinicalTrials.gov, 2011a,b,c). 7/30 — NCI and Duke request assistance from the Institute of Medicine (IOM) in assessing the scientific foundation of the three clinical trials and identifying appropriate evaluation criteria for future tests based on omics technologies.|
|August 2010||8/27 — Duke completes its review of Potti’s credentials; identifies issues of substantial concern resulting in corresponding sanctions. Potti remains on administrative leave (Duke Today, 2010).|
|October 2010||10/22 — Duke officials inform NCI that they have determined that several datasets reported to have been used to validate the cisplatin test were found to be flawed. The Hsu et al. (2007) paper would be retracted. Investigation into other datasets was ongoing (McShane, 2010a).|
|November 2010||NCT00545948, NCT00509366, and NCT00636441 trials terminated in ClinicalTrials.gov (ClinicalTrials.gov, 2011a,b,c). 11/16 — Journal of Clinical Oncology retracts "Pharmacogenomic strategies provide a rational approach to the treatment of cisplatin-resistant patients with advanced cancer" (Hsu et al., 2007, 2010). 11/19 — Anil Potti resigns from his position at Duke (DukeHealth.org, 2010), later taking a position as an oncologist in South Carolina (Cancer Letter, 2010) with strong endorsement from some Duke faculty members (Duke.Fact.Checker, 2011).|
December 2010
12/20 — McShane describes to the IOM committee the NCI interactions with the Duke investigators pertaining to the gene expression–based tests and supplies documentation to the committee. This is the first public explanation of why NCI thought problems with the LMS were severe enough to warrant pulling it from CALGB 30506, and it publicly calls the NEJM paper into question. In addition, she reveals that NCI had discovered that it had been providing partial funding to the trial NCT00509366 through an R01 grant awarded to Anil Potti. She also describes her unsuccessful attempts to reproduce the results reported in the Hsu et al. (2007) paper for the cisplatin test and how that eventually led to the discovery of several corrupted datasets (McShane, 2010a).
January 2011
IGSP Center for Applied Genomics and Technology is dissolved (Goldberg, 2011; Havele, 2011).
Nature Medicine retraction (Potti et al., 2011a).
1/31 — The Food and Drug Administration (FDA) conducts an inspection at Duke University to determine the rationale for the IRB’s initial non-significant risk decision regarding an investigational device exemption (IDE) (FDA, 2011).
February 2011
Lancet Oncology retraction (Bonnefoi et al., 2011).
March 2011
NEJM retraction (Potti et al., 2011b).
Draft document, A Framework for the Quality of Translational Medicine with a Focus on Human Genomic Studies: Principles from the Duke Medicine Translational Medicine Quality Framework [TMQF] Committee, is released. The final draft is released in May 2011.
July 2011
Duke sends the IOM committee a list of identified problems, missed signals, and proposed solutions based on the work of the TMQF committee (TMQF Committee, 2011b).
August 2011
8/22 — Duke representatives meet with the IOM committee: Robert Califf, Sally Kornbluth, Michael Cuffe, Ross McKinney, John Falletta, Geoff Ginsburg, Michael Kelley, and William Barry.
January 2012
1/25 — FDA posts documents on its website indicating that it had informed Duke in 2009 that an IDE should have been obtained for the three trials (Chan, 2009; FDA, 2011; Potti, 2009).
Journal of Clinical Oncology retracts “An integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer” (Dressman et al., 2007; JCO, 2012).
a Communication from Michael Burns, Nature Medicine, to Keith Baggerly, MD Anderson Cancer Center. Receipt of NMED-LE40837, May 30, 2008.
b Communication from Alison Farrell, Nature Medicine, to Keith Baggerly, MD Anderson Cancer Center. NMED-LE40837, June 2, 2008.
c Communication from Alison Farrell, Nature Medicine, to Keith Baggerly, MD Anderson Cancer Center. Decision on NMED-LE40837, June 11, 2008.
d Communication from David Collingridge, Lancet Oncology, to Keith Baggerly, MD Anderson Cancer Center. Your submission to the Lancet Oncology, September 6, 2008.
e Communication from Olwen Hahn, CALGB, to Michael Montello, National Cancer Institute. RE: CALGB 30702, July 28, 2009.
f Communication from Claudio Dansky Ullmann, National Cancer Institute, to CTEP Protocol and Information Office. Consensus review of revised protocol CALGB 30702: Genome-guided chemotherapy for untreated and treated advanced stage non-small cell lung cancer: A limited institution, randomized phase II study, November 9, 2009.
g Communication from Claudio Dansky Ullmann, National Cancer Institute, to Richard Schilsky, CALGB. Reference number PCALBG-30702#R01PDISAPP01, November 9, 2009.
h Communication from Jeffrey Abrams and Lisa McShane, National Cancer Institute, to Richard Schilsky, CALGB. Important computer code and data request for CALGB-30506, November 16, 2009.
i Communication from Joseph R. Nevins, Anil Potti, William Barry, and David Harpole, Duke University. Response to the NCI re-evaluation of supporting data for the CALGB-30506 trial, March 8, 2010.
j Communication from Lisa McShane and Jeffrey Abrams, National Cancer Institute, to Joseph R. Nevins, Anil Potti, William Barry, and David Harpole, Duke University. RE: Nevins, Potti, Barry, and Harpole response to the NCI re-evaluation of supporting data for the CALGB-30506 trial, March 26, 2010.
k Communication from William C. Timmer, National Cancer Institute, to Anil Potti, Duke University. RE: R01CA131049-01A1 information request, April 13, 2010.
l Communication from Anil Potti, Duke University, to William C. Timmer, National Cancer Institute. RE: R01CA131049-01A1 information request, April 29, 2010.
m Communication from Lisa McShane, National Cancer Institute, to Anil Potti, Duke University. RE: R01CA131049-01A1 information request, May 17, 2010.
Baggerly, K. A. 2011. Forensic Bioinformatics. Presentation to the Workshop of the IOM Committee on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials, Washington, DC, March 30-31.
Baggerly, K. A., and K. R. Coombes. 2009. Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology. Annals of Applied Statistics 3(4):1309-1334.
Baggerly, K. A., K. R. Coombes, and E. S. Neeley. 2008. Run batch effects potentially compromise the usefulness of genomic signatures of ovarian cancer. Journal of Clinical Oncology 26(7):1186-1187.
Baron, A. E., K. Bandeen-Roche, D. A. Berry, J. Bryan, V. J. Carey, K. Chaloner, M. Delorenzi, B. Efron, R. C. Elston, D. Ghosh, J. D. Goldberg, S. Goodman, F. E. Harrell, S. Galloway Hilsenbeck, W. Huber, R. A. Irizarry, C. Kendziorski, M. R. Kosorok, T. A. Louis, J. S. Marron, M. Newton, M. Ochs, J. Quackenbush, G. L. Rosner, I. Ruczinski, S. Skates, T. P. Speed, J. D. Storey, Z. Szallasi, R. Tibshirani, and S. Zeger. 2010. Letter to Harold Varmus: Concerns about Prediction Models Used in Duke Clinical Trials. Bethesda, MD, July 19.
Bild, A. H., G. Yao, J. T. Chang, Q. Wang, A. Potti, D. Chasse, M. B. Joshi, D. Harpole, J. M. Lancaster, A. Berchuck, J. A. Olson, Jr., J. R. Marks, H. K. Dressman, M. West, and J. R. Nevins. 2006. Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439(7074):353-357.
Bonnefoi, H., A. Potti, M. Delorenzi, L. Mauriac, M. Campone, M. Tubiana-Hulin, T. Petit, P. Rouanet, J. Jassem, E. Blot, V. Becette, P. Farmer, S. Andre, C. R. Acharya, S. Mukherjee, D. Cameron, J. Bergh, J. R. Nevins, and R. D. Iggo. 2007. Validation of gene signatures that predict the response of breast cancer to neoadjuvant chemotherapy: A substudy of the EORTC 10994/BIG 00-01 clinical trial. Lancet Oncology 8(12):1071-1078.
Bonnefoi, H., A. Potti, M. Delorenzi, L. Mauriac, M. Campone, M. Tubiana-Hulin, T. Petit, P. Rouanet, J. Jassem, E. Blot, V. Becette, P. Farmer, S. Andre, C. Acharya, S. Mukherjee, D. Cameron, J. Bergh, J. R. Nevins, and R. D. Iggo. 2011. Retraction: Validation of gene signatures that predict the response of breast cancer to neoadjuvant chemotherapy: A substudy of the EORTC 10994/BIG 00-01 clinical trial. Lancet Oncology 12(2):116.
Califf, R. M. 2011. Discussion at Discovery of Process Working Group Meeting with Representatives of Duke Faculty and Administration, Washington, DC, August 22.
Cancer Letter. 2011. In the cancer centers. 37(22):1.
Chan, M. M. 2009. Letter to Division of Medical Oncology, Duke University Medical Center. http://www.fda.gov/downloads/MedicalDevices/ProductsandMedicalProcedures/InVitroDiagnostics/UCM289102.pdf (accessed February 9, 2012).
ClinicalTrials.gov. 2011d. Genomic Directed Salvage Chemotherapy with Either Liposomal Doxorubicin or Topotecan. http://clinicaltrials.gov/ct2/show/NCT00720096?term=NCT00720096&rank=1 (accessed October 11, 2011).
Coombes, K. R., J. Wang, and K. A. Baggerly. 2007. Microarrays: Retracing steps. Nature Medicine 13(11):1276-1277.
Correction. 2007. New England Journal of Medicine 356(2):201-202.
Cuffe, M. 2011. Discussion at Discovery of Process Working Group Meeting with Representatives of Duke Faculty and Administration, Washington, DC, August 22.
Dressman, H. K., A. Berchuck, G. Chan, J. Zhai, A. Bild, R. Sayer, J. Cragun, J. Clarke, R. S. Whitaker, L. Li, G. Gray, J. Marks, G. S. Ginsburg, A. Potti, M. West, J. R. Nevins, and J. M. Lancaster. 2007. An integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer. Journal of Clinical Oncology 25(5):517-525.
Dressman, H. K., A. Potti, J. R. Nevins, and J. M. Lancaster. 2008. In reply. Journal of Clinical Oncology 26(7):1187-1188.
Duke Today. 2010. Duke Updates Response to Potti Allegations. http://today.duke.edu/2010/08/pottiresponse.html (accessed December 12, 2011).
Duke University. 2007a. Adjuvant Cisplatin with Either Genomic-Guided Vinorelbine or Pemetrexed for Early Stage Non-Small-Cell Lung Cancer (TOP0703). http://clinicaltrials.gov/ct2/show/NCT00545948?term=nct00545948&rank=1 (accessed November 23, 2011).
Duke University. 2007b. Study Using a Genomic Predictor of Platinum Resistance to Guide Therapy in Stage IIIB/IV Non-Small Cell Lung Cancer (TOP0602). http://clinicaltrials.gov/ct2/show/NCT00509366?term=nct00509366&rank=1 (accessed November 23, 2011).
Duke University. 2008. Trial to Evaluate Genomic Expression Profiles to Direct Preoperative Chemotherapy in Early Stage Breast Cancer. http://clinicaltrials.gov/show/NCT00636441 (accessed November 22, 2011).
Duke.Fact.Checker. 2011. Texts of Letters of Recommendation for Dr. Anil Potti. http://dukefactchecker.blogspot.com/2011/06/texts-of-letters-of-recommendation-for.html (accessed December 12, 2011).
DukeHealth.org. 2010. Duke Accepts Potti Resignation; Retraction Process Initiated with Nature Medicine. http://www.dukehealth.org/health_library/news/duke-accepts-potti-resignation-retraction-process-initiated-with-nature-medicine (accessed December 12, 2011).
Falletta, J. 2011. Discussion at Discovery of Process Working Group Meeting with Representatives of Duke Faculty and Administration, Washington, DC, August 22.
Food and Drug Administration (FDA). 2011. FDA Establishment Inspection Report, Duke University Medical Center. http://www.fda.gov/downloads/MedicalDevices/ProductsandMedicalProcedures/InVitroDiagnostics/UCM289106.pdf (accessed February 9, 2012).
Ginsburg, G. S. 2011. Discussion at Discovery of Process Working Group Meeting with Representatives of Duke Faculty and Administration, Washington, DC, August 22.
Goldberg, P. 2009a. A biostatistic paper alleges potential harm to patients in two Duke clinical studies. Cancer Letter 35(36):1-5.
Goldberg, P. 2009b. Duke halts third trial; coauthor disputes claim that data validation was blinded. Cancer Letter 35(39):1-4.
Goldberg, P. 2010a. NCI raises new questions about Duke genomics research, cuts assay from trial. Cancer Letter 36(18):1-7.
Goldberg, P. 2010b. Prominent Duke scientist claimed prizes he didn’t win, including Rhodes Scholarship. Cancer Letter 36(27):1-7.
Goldberg, P. 2011. FDA auditors spend two weeks at Duke; Nevins loses position in reorganization. Cancer Letter 37(4):1-2, 4-5.
Gyorffy, B., P. Surowiak, O. Kiesslich, C. Denkert, R. Schafer, M. Dietel, and H. Lage. 2006. Gene expression profiling of 30 cancer cell lines predicts resistance towards 11 anticancer drugs at clinically achieved concentrations. International Journal of Cancer 118(7):1699-1712.
H. Lee Moffitt Cancer Center & Research Institute. 2007. NCT00720096 Protocol version 10, October 19.
H. Lee Moffitt Cancer Center & Research Institute. 2008a. NCT00720096 Protocol version 13, January 10.
H. Lee Moffitt Cancer Center & Research Institute. 2008b. NCT00720096 Protocol version 14, July 14.
H. Lee Moffitt Cancer Center & Research Institute. 2009. NCT00720096 Protocol version 15, July 26.
Havele, S. 2011. IGSP reviews organization, future plans. Chronicle, January 21. http://duke-chronicle.com/article/igsp-reviews-organization-future-plans (accessed January 13, 2012).
Hsu, D. S., B. S. Balakumaran, C. R. Acharya, V. Vlahovic, K. S. Walters, K. Garman, C. Anders, R. F. Riedel, J. Lancaster, D. Harpole, H. K. Dressman, J. R. Nevins, P. G. Febbo, and A. Potti. 2007. Pharmacogenomic strategies provide a rational approach to the treatment of cisplatin-resistant patients with advanced cancer. Journal of Clinical Oncology 25(28):4350-4357.
Hsu, D. S., B. S. Balakumaran, C. R. Acharya, V. Vlahovic, K. S. Walters, K. Garman, C. Anders, R. F. Riedel, J. Lancaster, D. Harpole, H. K. Dressman, J. R. Nevins, P. G. Febbo, and A. Potti. 2010. Retraction: Pharmacogenomic strategies provide a rational approach to the treatment of cisplatin-resistant patients with advanced cancer. Journal of Clinical Oncology 28(35):5229.
JCO (Journal of Clinical Oncology). 2012. An Integrated Genomic-Based Approach to Individualized Treatment of Patients with Advanced-Stage Ovarian Cancer: Retraction. http://jco.ascopubs.org/content/25/5/517/suppl/DC1 (accessed January 30, 2012).
Jolly Graham, A., and A. Potti. 2009. Translating genomics into clinical practice: Applications in lung cancer. Current Oncology Reports 11(4):263-268.
Kornbluth, S. A. 2011. Discussion at Discovery of Process Working Group Meeting with Representatives of Duke Faculty and Administration, Washington, DC, August 22.
Kornbluth, S. A., and M. Cuffe. 2010. Preliminary Accounting of Events at Duke University. Durham, NC: Duke University.
Kornbluth, S. A., and V. Dzau. 2011. Predictors of Chemotherapy Response: Background Information: Draft. Duke University.
Lancaster, J. M. 2008. Molecular Profiling to Predict Response to Chemotherapy, 5R33CA110499-05. http://projectreporter.nih.gov/project_info_description.cfm?aid=8101020&icde=10060731 (accessed October 11, 2011).
Larsen, J. E., K. M. Fong, and N. K. Hayward. 2007. To the editor: Refining prognosis in non-small-cell lung cancer. New England Journal of Medicine 356(2):190.
Marcom, P. K. 2008. A Randomized Phase II Trial Evaluating the Performance of Genomic Expression Profiles to Direct the Use of Preoperative Chemotherapy for Early Stage Breast Cancer. Durham, NC: Duke Institute for Genome Sciences and Policy.
McKinney, R. 2011. Discussion at Discovery of Process Working Group Meeting with Representatives of Duke Faculty and Administration, Washington, DC, August 22.
McShane, L. M. 2010a. NCI Address to Institute of Medicine Committee on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials. Presentation at Meeting 1. Washington, DC, December 20.
McShane, L. M. 2010b. Notes from June 29 meeting with Duke investigators.
McShane, L. M. 2010c. Re-analysis Report for Cisplatin Chemosensitivity Predictor. Bethesda, MD: National Cancer Institute.
NCI (National Cancer Institute). 2010a. Discussion of Genomic Predictors Developed at Duke University. Presented at the National Cancer Institute, Rockville, MD June 29.
NCI. 2010b. Executive Summary: NCI Re-evaluation of Supporting Data for the CALGB-30506 Trial. Bethesda, MD: National Cancer Institute.
Nevins, J. 2011. Genomic Strategies to Address the Challenge of Personalizing Cancer Therapy. Presentation at the Workshop of the IOM Committee on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials, Washington, DC, March 30-31.
Pittman, J., E. Huang, J. Nevins, Q. Wang, and M. West. 2004. Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes. Biostatistics 5(4):587-601.
Potti, A. 2009. Letter to FDA’s CDER from Division of Medical Oncology, Duke University Medical Center. http://www.fda.gov/downloads/MedicalDevices/ProductsandMedicalProcedures/InVitroDiagnostics/UCM289103.pdf (accessed February 9, 2012).
Potti, A., and J. R. Nevins. 2007. Potti et al. reply. Nature Medicine 13(11):1277-1278.
Potti, A., H. K. Dressman, A. Bild, R. F. Riedel, G. Chan, R. Sayer, J. Cragun, H. Cottrill, M. J. Kelley, R. Petersen, D. Harpole, J. Marks, A. Berchuck, G. S. Ginsburg, P. Febbo, J. Lancaster, and J. R. Nevins. 2006a. Genomic signatures to guide the use of chemotherapeutics. Nature Medicine 12(11):1294-1300.
Potti, A., S. Mukherjee, R. Petersen, H. K. Dressman, A. Bild, J. Koontz, R. Kratzke, M. A. Watson, M. Kelley, G. S. Ginsburg, M. West, D. H. Harpole, and J. R. Nevins. 2006b. A genomic strategy to refine prognosis in early-stage non-small-cell lung cancer. New England Journal of Medicine 355(6):570-580.
Potti, A., H. K. Dressman, A. Bild, R. F. Riedel, G. Chan, R. Sayer, J. Cragun, H. Cottrill, M. J. Kelley, R. Petersen, D. Harpole, J. Marks, A. Berchuck, G. S. Ginsburg, P. Febbo, J. Lancaster, and J. R. Nevins. 2007a. Corrigendum: Genomic signatures to guide the use of chemotherapeutics. Nature Medicine 13(11):1388.
Potti, A., D. Harpole, and J. R. Nevins. 2007b. The authors reply: Refining prognosis in non-small-cell lung cancer. New England Journal of Medicine 356(2):190-191.
Potti, A., H. K. Dressman, A. Bild, R. F. Riedel, G. Chan, R. Sayer, J. Cragun, H. Cottrill, M. J. Kelley, R. Petersen, D. Harpole, J. Marks, A. Berchuck, G. S. Ginsburg, P. Febbo, J. Lancaster, and J. R. Nevins. 2008. Corrigendum: Genomic signatures to guide the use of chemotherapeutics. Nature Medicine 14(8):889.
Potti, A., H. K. Dressman, A. Bild, G. Chan, R. Sayer, J. Cragun, H. Cottrill, M. J. Kelley, R. Petersen, D. Harpole, J. Marks, A. Berchuck, G. S. Ginsburg, P. Febbo, J. Lancaster, and J. R. Nevins. 2011a. Retraction: Genomic signatures to guide the use of chemotherapeutics. Nature Medicine 17(1):135.
Potti, A., S. Mukherjee, R. Petersen, H. K. Dressman, A. Bild, J. Koontz, R. Kratzke, M. A. Watson, M. Kelley, G. S. Ginsburg, M. West, D. H. Harpole, Jr., and J. R. Nevins. 2011b. Retraction: A genomic strategy to refine prognosis in early-stage non-small-cell lung cancer. New England Journal of Medicine 364(12):1176.
Ready, N. 2010. Phase II Prospective Study Evaluating the Role of Directed Cisplatin Based Chemotherapy with Either Vinorelbine or Pemetrexed for the Adjuvant Treatment of Early Stage Non-Small Cell Lung Cancer (NSCLC) in Patients Using Genomic Expression Profiles of Chemotherapy Sensitivity to Guide Therapy. Durham, NC: Duke University Medical Center.
Review of Genomic Predictors for Clinical Trials from Nevins, Potti, and Barry. 2009. Durham, NC: Duke University.
Singh, T., and J. Dhindsa. 2007. To the editor: Refining prognosis in non-small-cell lung cancer. New England Journal of Medicine 356(2):190.
Sun, Z., and P. Yang. 2007. To the editor: Refining prognosis in non-small-cell lung cancer. New England Journal of Medicine 356(2):189-190.
TMQF (Translational Medicine Quality Framework) Committee. 2011a. Draft Table of Categories and Areas of Improvement Related to TMQF. Durham, NC: Duke University.
TMQF Committee. 2011b. A Framework for the Quality of Translational Medicine with a Focus on Human Genomic Studies: Principles from the Duke Medicine Translational Medicine Quality Framework Committee. Durham, NC: Duke University.
Vlahovic, V. 2010. Phase II Prospective Study Evaluating the Role of Personalized Chemotherapy Regimens for Chemo-Naive Select Stage IIIB and IV Non-Small Cell Lung Cancer (NSCLC) in Patients Using a Genomic Predictor of Platinum-Resistance to Guide Therapy. Durham, NC: Duke University Medical Center.
Zuiker, A. 2008. Building a Just Culture. http://inside.duke.edu/article.php?IssueID=183&ParentID=17859 (accessed November 22, 2011).