2

Overview of the Cancer Informatics Landscape

DISCUSSION POINTS HIGHLIGHTED BY INDIVIDUAL PRESENTERS

• Cancer researchers and care providers are facing an overwhelming volume of data from a multitude of sources and are hampered by the inability to merge those data or to communicate effectively across disciplines and stakeholders because of divergent standards, lack of interoperability, and other barriers.

• The most successful informatics tools will be those that integrate research and clinical data in an organized and efficient manner.

• A research information exchange system integrates data from multiple sources (extracting, transforming, harmonizing, and profiling for quality and accuracy) and then makes them available to diverse stakeholders per their queries. The information is provided based on the same data elements, but the presentation depends on who is asking for it (researchers, patients, clinicians, or administrators).

• Guiding principles for an integrated data warehouse include relevant standards for data entry, deep annotation, a good query interface, and sharing (via entry back into the database) of any new data derived from the analysis of data, specimens, or images stored in the data warehouse.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
Terms of Use and Privacy Statement



Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.
Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

Do not use for reproduction, copying, pasting, or reading; exclusively for search engines.

OCR for page 7
2 Overview of the Cancer Informatics Landscape DISCUSSION POINTS HIGHLIGHTED BY INDIVIDUAL PRESENTERS ancer researchers and care providers are facing an over- C whelming volume of data from a multitude of sources and are hampered by the inability to merge those data or to communi- cate effectively across disciplines and stakeholders because of divergent standards, lack of interoperability, and other barriers. The most successful informatics tools will be those that inte- grate research and clinical data in an organized and efficient manner. A research information exchange system integrates data from multiple sources (extracting, transforming, harmonizing, and profiling for quality and accuracy) and then makes them avail- able to diverse stakeholders per their queries. The information is provided based on the same data elements, but the presen- tation depends on who is asking for it (researchers, patients, clinicians, or administrators). Guiding principles for an integrated data warehouse include relevant standards for data entry, deep annotation, a good query interface, and sharing (via entry back into the database) of any new data derived from the analysis of data, specimens, or images stored in the data warehouse. continued 7

OCR for page 7
8 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH esearch data sources span the spectrum from electronic R health records (EHRs) to disease registries to clinical research protocol repositories, with varying degrees of completeness, quality, and research utility. In silico research depends on hav- ing complete and valid information. caBIG is undergoing renovations and new informatics project review criteria are being implemented; NCI is open and recep- tive to communications from interested parties. In the first session, an overview of the current status of cancer infor matics was provided from the perspectives of cancer centers, cancer coopera- tive groups, and clinical translational researchers. Panelists also discussed the lessons that could be learned from the ongoing evolution of NCI's caBIG. STRUCTURED, INTEROPERABLE RESEARCH AND CLINICAL INFORMATION SYSTEMS--OR THE LACK THEREOF Data Overload Rapid advances in technology have led to a dramatic increase in the output of genomic and molecular data related to cancer biology, said Lawrence Shulman, chief medical officer and chief of the Division of Gen- eral Oncology at the Dana-Farber Cancer Institute. These emerging data can inform our understanding of basic cancer biology, epidemiology, and behavior, as well as response to therapies, toxicity of therapies, and optimal care for an individual patient or cohort. However, the sheer volume of information presents significant data management and analysis challenges and is becoming overwhelming from a clinical decision-making standpoint. To be optimally useful, data should be structured in a database, and we are still in the learning stages of how best to structure genomic and molecu- lar data, Shulman noted. For clinical data to be useful, they should contain certain critical elements. From an oncology perspective, examples of key data elements include patient demographics; tumor type and anatomic and non-anatomic staging; treatment plan, treatment intent (e.g., curative or palliative), and actual treatment; tumor response; toxicity; patient-reported outcomes; and disease-free and overall survival. The nation is moving, albeit

OCR for page 7
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 9 slowly, toward the adoption of electronic health records (EHRs) to facilitate efficient clinical practice and decision making. However, many of the data included in EHRs are not in a structured format (i.e., are entered as free text). One must often read through the notes of clinicians, and it can be challenging to discern exactly what has happened to the patient, Shulman said. Databases That Foster Learning Shulman stressed that the most successful informatics tools will be those that interconnect research, clinical activities, and data in an organized and efficient manner, with as broad a database as possible. Citing a 2010 IOM workshop on evidence-driven practice in cancer care, he explained that the patient is the center of the system around which there is a cycle of aggregating information (including routinely collected real-time clinical data), analyzing that information, making new discoveries, and applying those discoveries to improve the care of individual patients (Abernethy et al., 2010; IOM, 2010) (Figure 2-1). The American Society of Clinical Oncology has recently launched CancerLinQ,1 a rapid-learning system based on this model that is being pilot-tested first for breast cancer. Shulman described an example in place at Dana-Farber, called the Synergistic Patient and Research Knowledge Systems (SPARKS) (Box 2-1; Figure 2-2). This system links clinical data (e.g., EHR, surgical, and pathology data), tissue sample information, cancer registry information, patient-reported out- comes, translational research data (e.g., gene expression), and operations data (e.g., billing, scheduling, and other visit information). Making Connections Shulman offered the "life cycle" of a gene mutation as a practical example of the value of an integrated data system. In this scenario, a basic researcher discovers a gene mutation in the laboratory, but its clinical signifi- cance is unclear. An association with a clinical syndrome is then determined. Translational research ties that gene mutation to a clinical outcome (e.g., prognosis, response to therapy). The clinical significance of the mutation is validated, and testing for the mutation may have therapeutic implications. 1 See http://www.asco.org/ASCOv2/Practice+%26+Guidelines/Quality+Care/ CancerLinQ+-+Building+a+Transformation+in+Cancer+Care.

OCR for page 7
10 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH Clinical trials, comparative effectiveness research, molecular and biologic data Information-rich, patient-focused Evaluation of data outcomes Data Transformation of aggregation, subsequent care evidence delivery generation FIGURE 2-1 Rapid-learning health care system for cancer care. SOURCE: Abernethy et al., 2010; IOM, 2010. At Dana-Farber, researchers can query both a clinical data repository and a consented research database. A "transient data mart" houses data involved in a current query, which is then purged when the query is com- pleted. Data are de-identified, and queries are covered under an umbrella protocol so that additional institutional review board (IRB) approval is not required. Investigators could be seeking an actionable mutation in multiple tumor types and could query, for example, the aggregate number of patients who have human epidermal growth factor receptor 2 (HER2) amplification in breast, gastric, and salivary gland tumors and have responded to trastu- zumab. Queries can be very specific. For example, an investigator interested in hormone resistance might query the frequency of a particular mutation in women who have metastatic breast cancer that is estrogen receptor (ER)-positive and HER2-positive, who are between the ages of 50 and 65, and who had progressive disease while on tamoxifen. It is also possible to access identified data with IRB approval, and investigators recruiting for a

OCR for page 7
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 11 BOX 2-1 Dana-Farber Synergistic Patient and Research Knowledge Systems (SPARKS) Vision To provide a cutting-edge, collaborative institutional infor- matics framework to accelerate scientific discoveries and their translation into clinical practice to enable early diagnosis, personal- ized treatment, cure, and prevention of cancer and related diseases. Objective Implement policies, standards, systems, and tools that facilitate collection, integration, mining, analysis, and interpretation of biomedical data to accelerate scientific discoveries and their translation into personalized medicine and clinical practice. Long-term goal Establish an integrated, patient-centric clini- cal genomic data model and systems for enabling translational research and personalized medicine. NOTE: See also Figure 2-2. SOURCE: Shulman presentation (February 27, 2012). particular investigational protocol could query for actual patients who meet specific eligibility criteria. Robust EHR Systems and Research Databases In closing, Shulman stressed the need for robust EHR systems and robust research databases. All clinical data in EHRs should be codified or structured, he said. EHRs should include detailed data on patient demographics, tumor characteristics and staging, and treatment histories, as well as codified treatment responses and treatment resistance develop- ment and codified genomic and molecular data. Ideally, EHR systems would be interoperable and have standards for data entry. Similarly, he said there is need for robust, interoperable research databases containing structured genomic and molecular data, entered according to defined standards, and these research databases should link with relevant clinical databases. S hulman said that we are not even close to these aspirations. Such databases often exist at the laboratory level and, in many cases,

OCR for page 7
12 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH Surgical/Path Data Outcomes Procedures (CRIS, Survival (CRIS) TSI) Clinical Data Path (CRIS, CDR) QOL (CRIS) Patient-Reported Treatment (CRIS, LMR) Staging (CRIS, Recurrence (CRIS) Outcomes Medications (CRIS, CA Reg) Demographics LMR, DFCI Pharmacy) Lab results ( MISYS, History CRIS, RPDR) Risk Factors Orders (COE) Drug Adherence QOL Satisfaction with care Patient Cancer Registry (DFCI/BWH CA Reg) Translational Diagnosis Research Data Vital Status Genomic variations Follow-up information Gene expression Operational Data Tissue Samples Tissue microarrays Billing (IDX) (CA tissue) Scheduling (IDX) Type Visit information Location Handling methodology FIGURE 2-2 Dana-Farber Synergistic Patient and Research Knowledge Sys- tems (SPARKS). NOTE: BWH = Brigham and Women'sFigure 2-2.eps Cancer Center; CA = cancer; CDR = Clinical Data Repository; COE = computer order entry for chemotherapy and all medications; CRIS = Clinical Research Information System; DFCI = Dana-Farber Cancer Institute; IDX = IDX operating system; LMR = longitudinal electronic medical record; Path = pathological; QOL = quality of life; Reg = registry; RPDR = Research Patient Data Registry. SOURCE: Shulman presentation (February 27, 2012). at the institutional (e.g., cancer center) level. To be maximally effective, however, Shulman said, there is a need for database efforts at a national or even international level, incorporating data from academic centers as well as community practices (where about 80 percent of patients in the United States receive their cancer care). CANCER CENTER INFORMATICS: CONNECTING WITH PATIENTS To integrate new technologies into the standard of care there must be demonstrated value, explained William Dalton, president, CEO, and center director of the H. Lee Moffitt Cancer Center & Research Institute.2 A given treatment or technology, however, may not provide the same value 2 In July 2012, Dr. Dalton assumed the position of CEO for the newly formed M2Gen Personalized Medicine Institute at the Moffitt Cancer Center.

OCR for page 7
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 13 to all patients. Evidence must be generated regarding what works for which individuals or cohorts. Dalton outlined four elements of a personalized medicine approach that can lead to overall improved health care: 1.Addresses health care as a public issue and seeks to improve access, affordability, and quality of care by developing an information system to assist in making clinical decisions based on outcomes and com- parative effectiveness; 2. Integrates new technologies into the standard of care in an evidence- based fashion to identify populations at risk, personalize treatment, and improve individual outcomes; 3.Provides an approach to identify the best treatment for individual patients based on clinical and biological characteristics of patients and their disease; and 4.Creates a network of health care providers, patients, and researchers who contribute and share information from individual patients to ultimately improve the care of all patients by learning from the experience of others (Dalton et al., 2010). Research Information Exchange Improved medical care begins with data that provide information, from which we derive knowledge and develop wisdom, and we are still in the data phase of this journey, Dalton said. In implementing a research information exchange system that will serve the key stakeholders, cancers centers face technical, cultural (e.g., academic versus industry), regulatory, and financial challenges. From a technical and regulatory standpoint, data sharing raises many issues that must be addressed, for example, technical architecture, intellectual property concerns, privacy and security, and human subjects' protections. Another, perhaps underappreciated, technical aspect is data gover- nance, Dalton said. How are the validity and quality of the data entered into the system ensured? Semantic interoperability or harmonization is needed; the system must serve users who are looking for the same data but in a different context or with different semantics and syntax, depending on who is asking. Multiple data sources can actually help ensure data quality, Dalton suggested. The challenge is how to develop an integrated network information system that can manage the ever-increasing amount of information being

OCR for page 7
14 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH generated in basic, translational, and clinical research that is needed to sup- port a personalized medicine approach. Dalton pointed out that a single information system can serve multiple stakeholders, including researchers, patients, administrators, and clinicians. Information is provided based on the same data or "truth," but the presentation will depend on who is ask- ing for it. As noted by Shulman, a database can be used by researchers to identify a patient cohort for data analysis or for clinical trial recruitment. Researchers might also seek data on comparative effectiveness or to do molecular profiling. The same information system, Dalton said, should allow patients to have their own personalized health record, which they can interact with and contribute to. The same platform would also be able to support evidence-based decision making by clinicians and administrators. Dalton offered an illustration showing how an information exchange system could be designed (Figure 2-3). The goal was to develop a system FIGURE 2-3 Example of a research information exchange system at the Moffitt Cancer Center, integrating data from multiple sources and providing them to diverse stakeholders. Figure 2-3.eps NOTE: EMR = electronic medical bitmap record; TCCP = Total Cancer Care Protocol. SOURCE: Fenstermacher et al., 2011. Reprinted with permission from The Cancer Journal.

OCR for page 7
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 15 that could identify the same patient in multiple source data systems that may use different identification numbers or ways of describing the patient. The data are loaded into a "data factory" where they are extracted, trans- formed, harmonized, and profiled for quality and accuracy. Data are then stored in the warehouse to be delivered to stakeholders per their queries. In response to a question about the broad integrity of the informatics enter- prise, Dalton stressed the emphasis that the data factory places on ensur- ing the quality of data before they enter the data warehouse. In addition, multiple sources of data on the same individual allow for algorithm checks for quality. It was noted by a participant that a challenge in obtaining patient consent for inclusion in such systems is ensuring that patients understand what they are consenting to, specifically that their data may be used for studies that are not yet defined to answer questions that have not yet been thought of. A multiphased consent process, where patients can opt in or opt out of different parts of the program, may be most appropriate, but it is a complicated and lengthy process. Dalton shared the example of a large, ongoing observational study in which patients consented to be studied throughout their lifetime, including collecting tumor samples and being recontacted should the inves- tigators find something that might benefit them. An analysis of patient com- prehension found that over time, many had forgotten that they had consented to this ongoing interaction. To help address this, patients receive a card within 2 weeks of consenting that thanks them for enrolling in the study, including a phone number to call if they have questions or do not recall consenting. A patient portal has also been developed, Dalton said, which again thanks patients for consenting and explains that they can obtain information about how their data are being used and how they can opt out at any time. CANCER COOPERATIVE GROUP INFORMATICS: CONNECTING RESEARCHERS Robert Comis, president and chair of the Coalition of Cancer Coopera- tive Groups and group chair of the Eastern Cooperative Oncology Group (ECOG), provided an overview of the NCI Clinical Trials Cooperative Group Program. As a result of an ongoing reconfiguration of the coopera- tive group structure, the program now comprises five major groups: 1. Alliance for Clinical Trials in Oncology--merging the Cancer and Leukemia Group B (CALGB), the North Central Cancer Treatment

OCR for page 7
16 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH Group (NCCTG), and the American College of Surgeons Oncology Group (ACOSOG); 2. Children's Oncology Group; 3. ECOG-ACRIN Cancer Research Group--merging the American College of Radiology Imaging Network (ACRIN) and ECOG; 4. NRG Oncology Group--merging the National Surgical Adjuvant Breast and Bowel Project (NSABP), the Radiation Therapy Oncol- ogy Group (RTOG), and the Gynecologic Oncology Group; and 5.SWOG (formerly the Southwest Oncology Group). Comis explained that the cooperative group environment includes aca- demic centers, large community practices, smaller community practices, and biomedical researchers spanning the physical sciences through the clinical sciences. It is geographically dispersed and now includes international sites. Historically, cooperative group clinical trial results have been crucial to setting the standards of cancer care. The primary mode of data collection has been case report forms, which are stored in individual group databases. There are some ad hoc interfaces for reporting and local information tech- nology systems for managing tissue samples. In 1999, NCI established the Cancer Trials Support Unit (CTSU), which, Comis said, now serves around 1,700 sites and thousands of inves- tigators, supporting patient enrollment and randomization, data collection using common data elements, and regulatory and administrative activities. Informatics Tools Used by the NCI Cooperative Group Program Comis described the development and implementation of the M edidata Rave Clinical Data Management System software now used by the Coop- erative Group Program. In 2005, the cooperative groups recognized that there was a need for a unified data system across the program. Specifications were developed; a request for proposals was sent to several vendors; and in 2008, a contract was awarded to Medidata. The unsuccessful vendors then filed formal protests, stalling the start of the contracted work. Following resolution of the disputes, the program was officially initiated in April 2011, resulting in Rave, a web-based system for capturing, managing, and report- ing clinical research data that enables the user to record patient informa- tion using forms customized per study (visit, lab, and adverse event data). The program is now managed by the NCI Cancer Therapy and Evaluation Program (CTEP), with the assistance of the CTSU and Medidata, and

OCR for page 7
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 17 involves all of the cooperative groups. Hundreds of group representatives are currently engaged in classroom, webinar, and e-learning training on how to use the Rave system. Another effort Comis described focuses on coordinating the tissue banking activities across cooperative groups. The Group Banking Commit- tee, sponsored by NCI, is developing processes and standards for a national groupwide tissue bank virtual repository. Comis noted that each tissue bank has developed its own biospecimen information management system. Some systems integrate multiple banks, and some systems are integrated with institutional information technology (IT) systems. In addition, some groups have specimen tracking systems, which connect data entry at clinical sites with trial operations systems and bank inventory systems. The goal is to connect the various group IT structures to a single database that can be queried and is available to all researchers throughout the country. Opportunities for an Innovative Informatics Structure Mitchell Schnall, group chair of ACRIN, described some of the opportunities for cooperative groups in building an innovative informatics structure. Cooperative groups are a rich source of diverse biosamples and images that are associated with structured clinical information, often with long-term follow-up. In addition, there is a large array of other informa- tion, ranging from gross medical and histological images to molecular data profiling genes and proteins, that needs to be integrated with the clinical information. The ECOG-ACRIN vision, Schnall explained, is to generate an inte- grated data warehouse incorporating the individual case report form, such as that from the Medidata Rave system, with imaging data, laboratory data, tissue and specimen repository inventory, digitized pathology, and -omics data (e.g., genomics, metabolomics, proteomics) as well as patient-reported outcomes and claims data. Schnall offered several guiding principles for moving forward with such a system. First, it will be necessary for the cooperative groups to embrace rel- evant standards. Deep annotation should be encouraged (e.g., spatial annota- tion), a good query interface is important, and any new data that are derived from analysis of ECOG-ACRIN specimens or images should be entered into the data warehouse to further the value of the warehouse as a community resource. With regard to deep annotation, Schnall said that cancer control moves along a pathway from prevention to detection and characterization

OCR for page 7
20 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH erature. Unfortunately, the value of the literature for informing study design is negatively affected by publication bias and the lag time between study findings and their final publication. The trial registry, ClinicalTrials.gov, is a useful resource but is limited to clinical trials, and entries do not provide a level of detail, rigor, or standardization necessary for scientific analysis. There is a real need, Pollock said, for accessible meta-study data for all types of study designs (not just clinical trials). In this regard, Pollock drew attention to the Human Studies Database Project, a database of past and ongoing human studies, both interventional and observational. The primary project participants are CTSA institutions. Pollock said that the goal is to enable computational reuse of human stud- ies data for activities such as systematic reviews, planning future studies, scientific portfolio analysis, and research networking. A subcomponent of the project is the Ontology of Clinical Research (OCRe), focused on developing an ontology to deal with issues of study design, interventions and exposures, participants, outcomes, and statistical analysis. Informatics Challenges for Translational Research Studies Using Existing, Non-Research Data Research data sources span the spectrum from EHRs to disease reg- istries to clinical research protocol repositories, with varying degrees of completeness, quality, and research utility. Pollock concurred with Shulman regarding the need for structured data elements. In EHRs, for example, use- ful information about drug exposure would include dose, schedule, inten- sity, area under the curve, dose modifications, reasons for stopping a drug, and patient pharmacokinetics, pharmacodynamics, and pharmaco genomic characteristics. These data are not generally included in the EHR, however. The major challenge when using existing information, Pollock said, is the inability to go back and fill in the missing data needed for a particular research investigation. Another concern with mining existing data is the potential for systematic biases in large clinical data repositories. More data is not necessarily better, Pollock noted, and biases can be amplified. In silico research depends on having complete and valid information. Lack of Harmonization From the perspective of a clinical translational researcher, Pollock con- curred with the challenges highlighted by the cancer centers and cooperative

OCR for page 7
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 21 groups. Lack of harmonization is a key concern, and he noted that at his own institution, as in many academic health centers, there is more than one EHR system in place. Clinical data systems and research data systems do not routinely interoperate. Another issue for clinical research is the choice of clinical trial management system (CTMS) and whether to use a commercially available system or an open source system. There are sustain- ability and cost issues to consider. Open source is technically free, but not without cost because a lot more development time goes into implementing an open source platform; however, it may be more sustainable when mov- ing forward. Regulatory Barriers There are also regulatory barriers for public use datasets. As chair of the advisory committee for the Texas Cancer Registry, Pollock expressed concerns about the heterogeneity of across-state permissions to combine data from multiple state cancer registries (in some cases, four levels of approval are required before researchers can utilize the data). There are also within-state restrictions; for example, until very recently, link- ing hospital discharge data and Texas Cancer Registry data was not permitted. Tools, Technology, and Big Datasets Imaging data are highly dimensional, and technical advances, such as the ability to collect real-time functional imaging data, further increase data dimensionality. There also has been an explosion in -omics data and analysis tools. Decision support tools are available, but they require large- scale validation and constant updating. Limitations to data storage and networking are also a major issue. In one genome sequencing laboratory Pollock had visited, for example, it was possible to keep active datasets on the server for only about 1 week, after which the data had to be pulled off to create space for new data. Despite the challenges, big datasets can facilitate hypothesis generation and study planning. Data can be used to assess the feasibility of a study. Big datasets could also lower the cost of conducting clinical translational research, Pollock said, by offering more precollected data, more automation, and more interconnectivity.

OCR for page 7
22 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH Moving Clinical Translational Informatics Forward Pollock reiterated that research is a process, and more data do not ecessarily lead to more discovery. He offered several suggestions to help n evolve the clinical translational research process: Bring clinical data up to the same standards as high-quality research data. Devise statistical methodologies and study designs for use with clini- cal data. Develop better data mining and filtering approaches to sort through massive datasets. Connect genomic and molecular data with clinical data. Structure clinical data appropriately to support research. Ensure that these processes are guided in a way that is compatible with a research framework. caBIG--THE VISION AND THE REALITY Daniel Masys, affiliate professor of biomedical and health informat- ics at the University of Washington, Seattle, and chair of the new caBIG oversight subcommittee of the NCI Board of Scientific Advisors, shared the history and vision of caBIG and some of the lessons learned since its launch in 2004. The Cancer Biomedical Informatics Grid was launched by NCI to help address the growing problem of the overwhelming volume of data from a multitude of sources and the inability to merge those data or to communicate effectively across disciplines and stakeholders because of divergent standards and terms and other barriers. To define the priority areas for caBIG, a very extensive market deter- mination exercise was undertaken to define the unmet needs of the NCI- supported cancer centers (Box 2-2). As a result, caBIG was envisioned as a common, widely distributed infrastructure that would permit the research community to focus on innovation (rather than on the details of manag- ing information), with the intent that raw and published cancer research data would be available for data mining and integration into reanalyses and meta-analyses. It would be built on shared vocabulary, data elements, and data models that would facilitate information exchange and would have a collection of interoperable applications and tools developed with common standards.

OCR for page 7
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 23 BOX 2-2 Common Needs to Catalyze Effectiveness in Cancer Research That Helped to Shape the Priority Areas for the caBIG Activities Clinical data management tools Distributed data sharing/analysis tools Translational research tools Access to data Tissue and pathology tools Cancer center integration and program management Common data elements and standards Meta-data analysis Shared vocabulary and ontology tools and databases Statistical data analysis tools Visualization and imaging tools Proteomics Microarray and gene expression tools Licensing and intellectual property issues Staff resources High-performance computing Integration and interoperability SOURCE: Masys presentation (February 27, 2012). In 2010, NCI director Harold Varmus called for a high-level review of caBIG. In its report released in March 2011, the working group concluded that the need for caBIG is greater now then when it was conceived (NCI, 2011). In addition, there was very strong community support from cancer centers for the original caBIG vision and goals of interoperability and standards-based exchange of data. The working group also found, however, that the many successes of the program have been offset by several problems and that the overall impact of caBIG in transforming cancer research had not been commensurate with the level of the investment (about $300 million). The report highlighted findings in three main areas: 1. creation and management of standards for data exchange, and sup- port for community-based software already in place;

OCR for page 7
24 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH 2. impact and track record of caBIG initiatives and tools with regard to life science or integrative cancer research, clinical data management, infrastructure, and community engagement; and 3. program administration, contracts management, and budget. The working group concluded that the greatest impact of caBIG thus far had been in the first area. The review found that the program had been very effective in catalyzing progress in the development of community- driven standards for data exchange and interoperability; development, maintenance, enhancement, and dissemination of tools developed by academic researchers; and community dialogue on the interoperability of clinical and research software tools. The group found that the main problems with the caBIG approach that limited its uptake and impact included a "cart-before-the-horse grand vision"; a technology-centric approach to data sharing; unfocused expansion; a one-size-fits-all architectural approach; an unsustainable business model for both NCI and users; and a lack of independent scientific oversight. As a result, the working group issued five immediate tactical recom- mendations. They are, as summarized by Masys: 1.Institute an immediate moratorium on all ongoing internal and commercial contractor-based software development projects while initiating a mitigation plan to lessen the impact of this moratorium on the cancer research community. 2. Institute a 1-year moratorium on new projects, contracts, and sub- contracts by caBIG. 3.Provide a 1-year extension on current caBIG-supported academic efforts for development, dissemination, and maintenance of new and existing community-developed software tools. 4.Establish an independent oversight committee, representing aca- demic, industrial, and government (NCI, National Institutes of Health [NIH]) perspectives to review planned initiatives for scien- tific merit and to recommend effective transition options for current users of caBIG tools. 5.Conduct a thorough audit of all aspects of the caBIG budget and expenditures. The independent oversight committee that was called for in recom- mendation 4 (chaired by Masys) met for first time in July 2011. To begin to address the criticisms in the March report, the committee developed review

OCR for page 7
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 25 criteria for informatics projects (similar to the evaluation criteria used by study sections), outlining how it would assess each of the ongoing caBIG activities, Masys noted (Box 2-3). Looking Forward: A Three-Step Approach to Success in Informatics Innovation In response to a request from the NCI director, Masys consulted with other experts to try to define the "recipe for success," that is, are there com- parable programs that have been successful and what can we learn from them? Masys found that two applications had "gone viral": the Research Electronic Data Capture (REDCap) developed through a CTSA, and the Informatics for Integrating Biology and the Bedside (i2b2), developed with funding from NIH's National Center for Biomedical Computing. In considering how these applications were different from caBIG, Masys and colleagues drafted the following general steps to success in informatics innovation. 1. Do not repeat the mistakes of the past. Do not try to solve all clinical and translational research informa- tion technology problems in one framework. Do not worship standards over functionality. Do not try to have enterprise software adopted by fiat from above. Do not try to buy adoption software products, because for the sponsor, those costs grow ever larger. Recognize that organizations that cannot afford ongoing staffing and help desk functions for software should not be expected to adopt software even if it is free or provides some income to the adopter. The acquisition cost is dwarfed by the support, mainte- nance, and integration costs. 2. Understand the basic truth about IT complexity. Increased functionality that is built at the expense of increased com- plexity is always at risk of delays in development, inability of local implementers and users to understand what has been built and how to use it, and being overtaken by other approaches that have a better price-to- performance ratio (e.g., grid computing versus web services).

OCR for page 7
26 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH BOX 2-3 NCI Informatics Project Review Criteria 1. D oes the activity, application, or resource meet a well- articulated and attainable need of basic, translational, or clinical research, or cancer care (i.e., is there a "driving biological or clinical project" and are the intended users members of the project team)? 2. How will success or failure be evaluated? Analogous to stop- ping rules for clinical protocols, what will be the stopping rules for ending the project if it either fails to meet its technical objec- tives or fails to be adopted even if technically successful? 3. Will the activity, resource, or application, if successful, make some objectively measurable incremental progress toward an overall vision of interoperability of data and systems? Will it enable data sharing and make use of and/or enhance open international standards for research? 4. Is the activity, resource, or application designed to anticipate change in a rapidly expanding knowledge base of science and practice? Flexibility and generalizability are important charac- teristics for longevity in an era of agile science. 5. Is the intended deliverable of the project achievable in the time frame and budget proposed? 6. Will the output of the project be broadly implementable by organi- zations of varying size and sophistication? Will it be used broadly by organizations and institutions outside of NCI cancer centers (e.g., other NIH centers or academic research organizations)? 7. Is there a documented plan for long-term maintenance, enhancement, and fiscal sustainability of the activity, applica- tion, or resource and its user base? 8. What is the user base and has there been a stakeholder assessment to ensure that the activity, application, or resource will indeed meet a currently unmet need or a reasonably antici- pated future need? 9. Is the project generalizable and likely to create value or address broad needs across the community of cancer centers and investigators? Alternatively, would this activity, resource, or application be perceived as a "pet project" of an "in" group? 10. Does the activity, resource, or application have enough market value to gain adoption without incentives? Or if financial or policy incentives are required, are they justified? SOURCE: Masys presentation (February 27, 2012).

OCR for page 7
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 27 3. Observe the informatics research and development "do's." Solve one significant challenge at a time. Use small, nimble development teams led by domain experts. Keep development-to-implementation intervals short. Deploy software that can solve at least one problem that users or adopters care about within 12 months of adoption. Demonstrate success first with a smaller group of the most advanced sites and then let others follow. Create software that makes adoption of standards easier (not harder) than nonstandardized alternatives. Let the market prioritize and vet the standards. Invest in simple interfaces between applications, not architectures. Make interested health care organizations demonstrate willing- ness to invest their own assets and time for enterprise software. Allow intraorganizational and interorganizational needs and technologies to diverge as needed, to maximize productivity. With regard to increasing the probability of successful adoption of informatics innovation in cancer research specifically, Masys recommended focusing data sharing efforts (both the standards for sharing and the applica- tions to do it) on those data for which there is a preexisting motivation to share. Areas in which researchers really need one another's data and scientific problems that simply cannot be solved within one laboratory or institution are prime areas on which to focus sharing efforts (e.g., those studying rare alleles who are trying to assemble a cohort of interest for a study). Coopera- tive groups are a good example of this, he noted. The current challenges are less technological and more policy oriented (e.g., IRB issues, privacy concerns). Increasing the uptake of innovative informatics in cancer care is a more difficult task, given the low penetrance of EHRs in U.S. health care and current economic pressures. Therefore, Masys advised aligning NCI efforts with the EHR adoption incentives of the Office of the National Coordina- tor for Health Information Technology (discussed further by Mostashari in Chapter 5). A plausible transition goal for NCI, he suggested, would be to make it easier for every oncology practice in America to care for a patient on a clinical trial protocol. Building "public libraries" of decision support tools to guide providers and patients through clinical protocols would be an important contribution in this regard.

OCR for page 7
28 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH Community Participation in Moving Informatics Forward During the discussion, there was much interest in involving patients as well as the research community in efforts to advance cancer informatics. Questions remain regarding how best to interact with NCI on caBIG activi- ties, the potential role of publicly developed apps, open source software, the development of institutional systems, and the importance of patient- reported outcomes data. NCI is very open and receptive to communications from interested parties, Masys said. George Komatsoulis, interim director for the Center for Biomedical Informatics and Information Technology at NCI, added that caBIG is continuing to move forward, and there are "workspaces" in inte- grative basic biology, clinical trials, data standards, imaging, biospecimens, and other areas where individuals can bring their ideas to the attention of the scientific advisory group and the caBIG program staff. Lynn Etheredge of George Washington University asked if a national apps strategy for cancer informatics would be appropriate, that is, whether individuals could begin to develop the needed functionalities. Masys noted that apps, as they are commonly understood, tend to be fairly small-scale computer programs. They have value as lightweight applications that handle very common tasks on a personal level, often on a personal device such as a smartphone or tablet. However, their level of complexity is usually around two orders of magnitude less than enterprise-level software (e.g., that used by researchers for regular data inputs required for supporting sponsors), with lower levels of data manipulation, storage, and security. Enterprise- level software requires a more in-depth approach to implementation and, in health care and clinical research, may also involve workflow modification. Steven Piantadosi asked about open source software relative to caBIG. Masys responded that open source software is part of the caBIG infra- structure: i2b2 is open source, while REDCap is an academic consortium model where a limited number of people have the ability to contribute to the code base. Masys cautioned that in a full open source software model (where anybody in the community can contribute to source), modules can be introduced that have unintended transitive consequences (i.e., interfere with other functions). As such, some open source models use quality control mechanisms such as curators or stewards. Mendelsohn noted that many organizations are trying to develop their own internal standardized record of clinical trials and asked whether these organizations should continue to spend scarce resources, time, and effort

OCR for page 7
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 29 developing these independent information systems in light of the national effort. Masys responded that any major cancer center is going to have two overlapping spheres of development, one that is outwardly focused and one that is local and specific to the organization. There will then be transi- tion applications that facilitate sharing sets of data by formatting them and ensuring that they meet the standard for external collaborations. The cooperative groups, for example, are well served by tools such as Medidata Rave, which, while not perfect, is implemented on a large scale and provides a conceptual basis for sharing information, he said. Mark Gorman, cancer survivor and patient advocate, expressed support for the patient-centered elements of the information systems discussed, for example, those designed to collect patient-reported outcome information. There is also great potential for informatics to support patients in their decision making and in managing their own care. Dalton noted that patient portals are very popular, with more than 70 percent of new cancer patients at Moffitt creating their portal before their first visit. Shulman added that the power of the systems discussed is the multiple sources of complementary data obtained by different methodologies, and this includes the patient portals being implemented by many cancer centers. REFERENCES Abernethy, A. P., L. M. Etheredge, P. A. Ganz, P. Wallace, R. R. German, C. Neti, P. B. Bach, and S. B. Murphy. 2010. Rapid-learning system for cancer care. Journal of Clinical Oncology 28(27):4268-4274. Dalton, W. S., D. M. Sullivan, T. J. Yeatman, and D. A. Fenstermacher. 2010. The 2010 Health Care Reform Act: A potential opportunity to advance cancer research by taking cancer personally. Clinical Cancer Research 16(24):5987-5996. Danaei, G., M. Tavakkoli, and M. A. Hernn. 2012. Bias in observational studies of prevalent users: Lessons for comparative effectiveness research from a meta-analysis of statins. American Journal of Epidemiology 175(4):250-262. Fenstermacher, D. A., R. M. Wenham, D. E. Rollison, and W. S. Dalton. 2011. Implementing personalized medicine in a cancer center. Cancer Journal 17(6):528-536. IOM (Institute of Medicine). 2010. A foundation for evidence-driven practice: A rapid learning system for cancer care: Workshop summary. Washington, DC: The National Academies Press. NCI (National Cancer Institute). 2011. An assessment of the impact of the NCI Cancer Biomedical Informatics Grid (caBIG). http://deainfo.nci.nih.gov/advisory/bsa/ bsa0311/caBIGfinalReport.pdf (accessed July 11, 2012).

OCR for page 7