5

Potential Pathways and Models for Moving Forward

DISCUSSION POINTS HIGHLIGHTED BY INDIVIDUAL PRESENTERS

• Volumes of molecular, clinical, and epidemiological data already exist in thousands of data repositories. Integrating data that are already in the public domain to generate new hypotheses for testing can help to identify new diagnostics, therapeutics, and disease mechanisms.

• An integrated knowledge ecosystem that supports moving data from discovery to actionable intelligence could drive better decision making and a learning health care enterprise.

• The overarching biomedical informatics challenges are systems issues; scale, standards, and sharing; software, storage, and security; sustainability; and social issues (changing mindsets and behaviors).

• Convenience and personal empowerment drive disruptive innovation in service industries, and this will be the case for health care as well, with the patient or consumer as a primary disrupter.

• Numerous end-user applications can be developed on a core enterprise analytics platform, allowing researchers, clinicians, administrators, and others to analyze high-quality data. A common infrastructure can also support secondary uses of data.

To set the stage for discussion in the third panel session, John Mendelsohn, Forum chair, highlighted some of the key needs identified thus far in the speaker presentations. One main area of concern was the collection, structure, storage, and analysis of big datasets, including the need for interoperability and standardization of systems. In addition, observational research requires that the data be de-identified and pooled. Mendelsohn noted that the more technical issues of computer power, software, and interconnectivity did not seem to be major concerns.

Another main area of discussion was data scrutiny and use, which are affected by ethical and social issues more than scientific issues. Key issues raised by individual participants were patient privacy and trust. Many stakeholders need or want access to the data, including patients themselves, investigators, universities, pharmaceutical companies, the government, and others. There are questions of whether there is ownership of the data and, if so, by whom.

With these needs and concerns in mind, panelists discussed a variety of approaches for moving the field of cancer informatics forward.

PUBLIC DATA-DRIVEN SYSTEMS AND PERSONALIZED MEDICINE

Atul Butte, chief of the Division of Systems Medicine at Stanford University and Lucile Packard Children's Hospital, shared examples of how public data can drive science and enable personalized medicine. There are tremendous volumes of data already in the public domain. For example, a DNA microarray or "gene chip" can quantitate every gene in the genome. This high-throughput genome technology is now widely used in research laboratories and has led to massive volumes of microarray data. One of the repositories tasked with holding these data is the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO). This United States-based repository currently contains more than 625,000 publicly available microarray datasets. Together with a comparable European repository, there are more than 900,000 microarray datasets in the public domain. At the current pace, the content doubles every 2 years (Butte, 2008).
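
The scale of these repositories can be checked programmatically. Below is a minimal sketch, assuming Biopython is installed, that counts GEO datasets matching a query through NCBI's public E-utilities; the e-mail address and search term are placeholders, not details from the workshop.

    # Query the NCBI GEO DataSets database through the E-utilities via Biopython.
    from Bio import Entrez

    Entrez.email = "researcher@example.org"  # NCBI asks callers to identify themselves

    # Search GEO DataSets (db="gds"); the term is illustrative only.
    handle = Entrez.esearch(db="gds", term="type 2 diabetes AND muscle", retmax=20)
    record = Entrez.read(handle)
    handle.close()

    print(record["Count"])   # total number of matching GEO records
    print(record["IdList"])  # identifiers for the first 20 hits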

Commoditization of Data

Data generation has been commoditized, Butte said. He offered the example of Assay Depot, an online marketplace for scientific research services. One can search for anything from assays to animal models and can purchase services from vendors "as easily as finding a song on iTunes." These are laboratories around the world with excess capacity that are ready to do business with "dry bench" researchers, as well as laboratories in need of a specific service or laboratories that find they get faster results, or higher-quality results, by outsourcing these types of services via the Internet than by using the local university facility.

At this point, most of the steps along the translational pipeline can be commoditized, including clinical and molecular measurements, statistical and computational methods, and validation. The step that cannot be commoditized or outsourced, Butte said, is asking good questions. Given all of the public data, what are the new kinds of questions we should be asking? Given this commoditization of data, one no longer needs a wet lab to conduct academic research or launch commercial research ventures. All one needs is a place to formulate questions (and the means to pay for the studies).

Integrative Genomics to Identify Novel Targets

The first example Butte described involves using integrative genomics on public data to find causal factors for complex diseases, factors that can be targets for new drugs. Butte argued that the only way forward for some complex diseases will be to develop algorithms to integrate genetic, genomic, proteomic, preclinical model, and clinical data.

A decade ago, Butte and colleagues conducted a microarray study on type 2 diabetes and identified 187 genes that were differentially expressed in diabetes and non-diabetes muscle tissue (Patti et al., 2003). Now, intersecting those data with the results of 130 similar independent microarray experiments looking at muscle, fat, beta cells, and liver from rat, mouse, and human, Butte sought to identify common genes across all of these tissues and species (Kodama et al., 2012). Most of the genes in the genome were positive in just one of those studies. One gene, which Butte referred to as Gene A, was differently expressed between diabetes and control samples in 78 of the diabetes microarray experiments but, remarkably, had never been pursued for elucidation of the pathophysiology of type 2 diabetes.
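
The cross-study intersection Butte described is, at its core, a vote count: for each gene, tally how many independent experiments flagged it as differentially expressed. A minimal sketch of that logic, with invented study results standing in for the roughly 130 real experiments:

    import pandas as pd

    # Each study contributes the set of genes it called differentially
    # expressed between diabetes and control samples (toy data).
    study_hits = {
        "study_01": {"GENE_A", "GENE_B", "GENE_C"},
        "study_02": {"GENE_A", "GENE_C"},
        "study_03": {"GENE_A", "GENE_D"},
    }

    # Count, for each gene, the number of studies that implicate it.
    votes = pd.Series(
        [gene for hits in study_hits.values() for gene in hits]
    ).value_counts()

    print(votes)  # genes recurring across many independent studies rise to the top

Genes positive in only one study sink to the bottom of such a ranking, while a gene like Butte's Gene A, positive in 78 experiments, stands out immediately.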

Further study by Butte and his group showed that Gene A codes for a functioning cell-surface receptor. Another common differentially expressed gene, which he referred to as Gene C, codes for the ligand for that receptor. Subsequent studies in mice (done through collaboration or purchased services, Butte noted) showed that the receptor is upregulated in mice fed a high-fat diet and is expressed on inflammatory cells in adipose tissue. Gene A turns out to be a well-known receptor, and a knockout mouse is available from Jackson Laboratories. Further testing showed that the knockout mouse had increased insulin sensitivity (i.e., it does not die from diabetes and fares better than the wild type), which Butte suggested is why this gene was never studied further for diabetes. Knocking out this receptor now makes this an interesting therapeutic target, Butte said. Because a soluble form of Gene A protein can be detected in the blood, serum from patients was assayed for the diabetes marker hemoglobin A1c (HbA1c) and for Gene A protein. The data show that the lower the level of the receptor, the lower the level of HbA1c. Treatment for 7 days with a therapeutic antibody to Gene A protein lowered the blood sugar of mice fed a high-fat diet.

In summary, Butte said, by using publicly available data that anyone can access, a protodrug, an antireceptor antibody, and a serum companion diagnostic have been developed, all in about 18 months. There are also human pathology data, mouse models, and human genetics data. Butte and colleagues have used the same approach for type 1 diabetes, small-cell lung cancer, and other diseases.

Genomic Nosology and Drug or Diagnostic Discovery

In another approach, Butte sought to find every microarray experiment that has looked at normal and disease samples in the same experiment. Many cancer researchers study metastatic versus nonmetastatic disease, but very few studies actually include normal controls. The first challenge was discovering that there were 200 words for "normal" in the repository (e.g., normal, vehicle, wild type, control, time zero, margins). From the searches, a systematic classification of disease based on similarities in gene expression was assembled. Butte highlighted the fact that colon cancer and colon polyps clustered together based on molecular profiles (as would be expected, since certain polyps are associated with cancer); however, cervical cancer was most similar to type 1 autoimmune polyglandular syndrome. In other words, cervical cancer was more similar to a very rare pediatric genetic disease that is not a cancer than it was to another cancer, colon cancer.

Around the same time, the Broad Institute released the Connectivity Map, a repository of genomewide transcriptional expression data from human cell lines treated with more than 1,500 different drugs at varying doses. Matching the disease gene expression data with the drug gene expression data, Butte identified hundreds of correlations and is currently pursuing two: a seizure drug, topiramate, that may be effective for inflammatory bowel disease, and an ulcer drug, cimetidine (Tagamet), that may be effective on lung adenocarcinoma (Sirota et al., 2011). Both have shown efficacy in animal models.
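
The matching step pairs each drug's expression signature against the disease signature and looks for opposition, in the spirit of the Sirota et al. (2011) approach. A toy sketch, with invented gene names and fold-change values:

    import pandas as pd

    # Rows are genes; values are illustrative fold changes.
    disease = pd.Series({"G1": 2.1, "G2": -1.3, "G3": 0.8, "G4": -2.0})
    drugs = pd.DataFrame({
        "drug_x":     {"G1": 1.9, "G2": -1.0, "G3": 0.7, "G4": -1.5},
        "cimetidine": {"G1": -1.8, "G2": 1.1, "G3": -0.5, "G4": 1.6},
    })

    # Rank-correlate each drug signature with the disease signature; a strongly
    # negative score marks a drug that may oppose the disease expression state.
    scores = drugs.corrwith(disease, method="spearman").sort_values()
    print(scores)  # the most anti-correlated candidates appear first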

Again, Butte stressed, this entire work was done using publicly available data, and he urged investment in building these types of repositories, keeping them updated, and facilitating access to them.

In conclusion, Butte said, bioinformatics is more than just building tools. There are plenty of tools published every month, and the molecular, clinical, and epidemiological data already exist. There are thousands, perhaps tens of thousands, of repositories today. We can identify new diagnostics, therapeutics, and disease mechanisms by integrating datasets. We just need to demonstrate what can be done, he said. Finally, there is a need for investigators who can imagine the basic questions to ask of these clinical and genomic repositories and are willing to make a career of studying publicly available data. Investigators need to move beyond the mindset that "if it's not your own data you can't trust it." The data are just sitting there, Butte said, waiting for people to use them.

ADAPTING TO DATA-INTENSIVE, DATA-ENABLED BIOMEDICINE

Data represent the fastest-growing resource on Earth, said George Poste, chief scientist for the Complex Adaptive Systems Initiative at Arizona State University. The volume, variety, and speed with which new data are being generated in biomedicine are staggering, and the current computational power may not be sufficient. Data are global and range across multiple users and scales, from the molecular level to the patient in the clinic. The central challenge is how to integrate these data.

Data Production, Analysis, and Utilization in Biomedicine

Poste summarized the overarching themes in meeting the biomedical informatics challenges as the following:

• systems;
• scale, standards, and sharing;
• software, storage, and security;
• sustainability; and
• social issues (changing mindsets and behaviors).

With regard to data production in biomedicine, Poste said, there are more data, but many reported findings have not been validated, and there are issues with replication, suitability for a specified purpose (e.g., regulatory), and authenticity (e.g., information on the Internet). There are more powerful, high-throughput analytic tools, but we deploy them against small sets of samples, resulting in inadequate analytic and statistical rigor. Technology convergence and the creation of multidisciplinary datasets are handicapped by single-specialty silos. There are more participants, locations, and distributed data, but there is a pervasive lack of interoperable exchange formats and standards for data annotation, analysis, and curation. There is also a poor record of sharing.

Data analysis and utilization in biomedicine, Poste said, require more rapid, real-time data access, but data are often trapped in isolated and hierarchical databanks. There is a need for more quantification and precision analytics, but insufficient numbers of personnel are trained for large-scale data analysis. More complexity and uncertainty exist, but there are escalating gaps in institutional and individual cognitive and analytic capabilities to handle them. Finally, the rate of change in data is increasing, as is the rate at which our knowledge and competencies depreciate.

Most of the current approaches to bioinformatics and health care informatics lack the agility and extensibility to meet projected needs, whether in basic or clinical research. We need much more sophisticated approaches for end-to-end system design, Poste said. Systems must meet the needs of a multiplicity of end-user communities without creating new silos. Ours is a data-driven, data-enabled society. Most data are now fundamentally networked, and an increasing fraction of data is digital from the outset. But as datasets become ever larger, they become increasingly unmovable with the existing infrastructure. Sophisticated simulations and meta-analytics can amplify the data streams.

Poste referred to the "fourth paradigm" of scientific discovery espoused by computer scientist Jim Gray, which states that we are now in a period of data-driven knowledge, intelligence, and actionable decisions (following the earlier paradigms of experiment, theory, and simulation). The nature of discovery has moved from hypothesis driven to hypothesis generating, based upon the analysis of large datasets, and explanation involves complex statistical probabilities instead of simplistic, unitary (particularly binary) values.

Having reviewed the gaps and challenges, Poste asked whether we are still building systems and infrastructure that merely support the collection of data or are working toward an integrated knowledge ecosystem that supports moving the data from discovery to actionable intelligence that can drive a learning health care enterprise.

Importance of Having the "Right" Data in the System

Poste stressed the importance of pre-analytical variables such as rigorous selection of specimen donors, standardized specimen collection, and annotated health records. Most researchers, however, do not have access to highly standardized, stringently collected, and phenotyped patient samples. He quoted Carolyn Compton, former director of the NIH Office of Biorepositories and Biospecimen Research and now president and CEO of C-Path, who has stated in several venues that "the technological capacity exists to produce low-quality data from low-quality analytes with unprecedented efficiency . . . we now have the ability to get the wrong answers with unprecedented speed." This is a pervasive problem in biomarker identification and validation, which suffers from a "small N" problem that leads to bias and overfitting. It has been suggested that more than 50 percent of the data from academic labs cannot be replicated by industry for new drug target validation and submission to the FDA (Ioannidis and Panagiotou, 2011; Mullard, 2011).

The ability of each individual to have his or her genome sequenced inexpensively will add to the complexity. Poste noted that modulation of gene expression can be transmitted transgenerationally, altering the epigenome in the progeny, so a static sequence is not the whole story. The 95 percent of the genome that is noncoding is also turning out to be profoundly important in regulating the other 5 percent, Poste said.

What is going to represent a complete and accurate analysis of genome sequence, architecture, and regulation, Poste asked, for the purposes of informing regulatory and clinical decisions? He cited a recent publication describing only 88 percent concordance of single-nucleotide variants when the same samples were analyzed on two different sequencing platforms (Lam et al., 2011). The FDA is currently reviewing validation issues for the clinical use of genome sequencing.
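
Concordance between two call sets reduces to set arithmetic once variants are expressed as comparable tuples. A toy illustration (the call sets are invented, and the simple overlap metrics below are not necessarily those used by Lam et al.):

    # Variants as (chromosome, position, substitution) tuples.
    platform_a = {("chr1", 1012, "A>G"), ("chr1", 5077, "C>T"), ("chr2", 884, "G>A")}
    platform_b = {("chr1", 1012, "A>G"), ("chr2", 884, "G>A"), ("chr3", 42, "T>C")}

    shared = platform_a & platform_b
    union = platform_a | platform_b
    print(f"Jaccard concordance: {len(shared) / len(union):.0%}")
    print(f"Platform A calls confirmed by B: {len(shared) / len(platform_a):.0%}")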

Researchers also can access numerous protein-protein interaction and pathway databases, based on the critical assumption that the databases are accurate, he said, citing Schnoes and colleagues (2009), who describe inaccuracies and misannotations in large primary protein databases (including GenBank NR and TrEMBL).

As discussed by Leroy Hood (Chapter 3), mapping the dysregulation of biological networks in disease is a rational foundation for targeted drug discovery. The goal is to target an intervention where the network is being perturbed. However, these are complex, adaptive systems, and a unifocal point intervention in a network will almost certainly be compensated for or have a bypass circuit available. One approach is to try to define network choke points as targets, subverting the alternate compensatory pathways as well. He said a challenging question in cancer drug development is, "At what point does the level of network dysregulation eclipse any feasible approach to achieve a homeostatic reset with drugs?" In this regard, Poste suggested that diagnostic technologies for early detection may be the more prudent option for research investments.

In summary, Poste said that a huge amount of the data being put into the public domain is not accurate. Moving forward, Poste listed the need for controlled vocabularies and ontologies; minimal information checklists and open source repositories; algorithms and source code for analytic tools; exchange formats and semantic interoperability; and cross-domain harmonization, integration, migration, and sharing. Ultimately, the only valuable data are validated, actionable data.

Computational Capabilities for Large Datasets

Other disciplines are skilled at handling big datasets, but Poste suggested that insularity makes us reluctant to look at, learn from, and import these approaches into biomedicine.

Most of us, he said, have been trained in "static world" biomedicine that involves conventional collaborations, traditional social and professional preferences and hierarchies, minimal patient input, and a general reluctance to share data. The world we live in, however, is increasingly dynamic. This includes web-based collaborations, fluid populations of diverse participants with many unanticipated productive inputs, the ability to mine huge amounts of public data, open source networks, and extended communities.

Despite this overall movement toward increased access, a review of the top 500 papers published in 2009 in the 50 journals with the highest impact factor found that only 9 percent had deposited the primary raw data into a public database (Alsheikh-Ali et al., 2011).

Of the portion of the 500 papers that were covered by a journal or funding agency data access policy, 59 percent were not compliant. Access to the raw data and computer code is absolutely essential, Poste stressed, and new incentives are needed to encourage people to share data. There is a need for ways to ensure due credit, attribution, and citation of the original dataset when it is used by others.

He said the greatest challenge, however, is how to drive molecular medicine and IT-centric capabilities in routine clinical medicine. Designing next-generation health IT systems that will comprehensively capture the genetic, biological, behavioral, social, environmental, and ecological factors relevant to disease risk, progression, and outcomes is extremely challenging. Electronic health records need to be thought of in a dynamic sense, rather than simply as a digital version of the original fixed paper format. Most EHRs, however, are not designed to support secondary use of data. A comprehensive clinical data integration system would include, for example, current and planned clinical trials, observational data from the provider as well as patient-reported information, SEER data, mobile health or remote sensor data, and payer datasets.

The final reckoning for actionable data is regulatory science, Poste said. Yet while there are many references to personalized medicine in FDA strategic planning documents, Poste noted that there is scant reference to how the challenges of personalized medicine will be met, including the informatics needs.

We have the capability to create a new health care ecosystem from the convergence of technologies and markets, Poste said (Figure 5-1), but this depends upon cyberinfrastructure for both e-science and e-medicine. He referred participants to the recently released report from the National Science Foundation (NSF, 2012) on cyberinfrastructure needs for 21st-century science and engineering. If we do have standardized, validated data, how do we move them? Most academic data remain isolated in laboratories or centers. Contemporary academia does not have the necessary connectivity (e.g., optical networks with 10,000 megabit per second transfer) or the computational capacity to gain access to the data it needs.
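
A back-of-the-envelope calculation shows why large datasets become effectively unmovable. Assuming a 1-petabyte dataset (an illustrative size, not a figure from the workshop) and a fully utilized link at the 10,000 megabit per second rate cited above:

    # Time to move a large dataset over a fast research network.
    dataset_bytes = 1e15         # 1 petabyte, chosen for illustration
    link_bits_per_second = 1e10  # 10,000 Mbit/s = 10 Gbit/s

    seconds = dataset_bytes * 8 / link_bits_per_second
    print(f"{seconds / 86400:.1f} days")  # roughly 9 days, ignoring all overhead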

There is a growing imbalance between the ability of the end-user population to access data and embrace their complexity and the ability (or lack thereof) of institutions to access and analyze large sets of data. Poste suggested that institutions that cannot harness big datasets will suffer "cognitive starvation" and relegation to competitive irrelevance in the scientific and engineering domains.

FIGURE 5-1 A new health care ecosystem arising from convergence of technologies and markets. NOTE: Dx = diagnostics; EMR = electronic medical record; HIx = health information; PMR = personal medical record; Rx = (bio)pharmaceuticals. SOURCE: Poste presentation (February 28, 2012).

Whether one has access to a high-performance computing center internally or participates with others in a consortium to create a cluster that provides high-performance computing capability, the cloud is a ubiquitous option. There are numerous commercial cloud computing services, and there is no single business model for cloud computing adoption at this stage. Poste noted that although the cloud provides on-demand access to large-scale, economically competitive computing capacity and flexibility, there are concerns about security, reliability, intellectual property, and regulatory compliance.

Moving from Silos to Systems

Moving forward begins with changing minds and changing behaviors to transition from informational silos to integrated systems. Technology is only the enabler, Poste said. We must embrace new organizational structures and must engage and educate multiple constituencies. The health care space will become increasingly decentralized with regard to how data are generated and increasingly centralized for data analytics and decision support. Data flows will increase as patient encounters with the health care system evolve from being episodic to more continuous, real-time monitoring.

In developing a new framework that can adapt to the scale and logistical complexity of modern biomedicine, Poste suggested that research sponsors (e.g., NIH) focus less on single-investigator awards that offer incremental progress and instead seek to fund high-risk and high-reward projects with the potential for radical, disruptive innovation; that a single-discipline career focus be replaced with obligate assembly of diverse expertise for multidimensional engagement; that new study sections with broader expertise, including industry, be assembled; and that siloed datasets be abandoned for large-scale, standardized, interoperable open source databases with professional annotation, analytics, and curation.

Government has a vital role to play, Poste said, in the promulgation of standards, centralized coordination of resources, enforcement of data sharing, and proactive design of regulatory frameworks to address new technologies. Industry plays an important role by participating in pre-competitive private-public partnerships and by taking a proactive role in shaping new transdisciplinary education, training, and employment opportunities.

Ours is a world of massive data, Poste summarized, and to manage these data we need disruptive change and new products, services, and partnership models. He emphasized that moving forward will require courage to declare that radical change is needed; resilience to combat denial and deflection by entrenched constituencies; competitiveness and new participants who drive disruption at the margins or at convergence points (the voice of patients, payers, and new industrial participants will drive e-science and health IT); and accountability and responsibility, providing improved return on investment of public and private funding and addressing urgent societal and economic imperatives.

BIG DATA AND DISRUPTIVE INNOVATION: MODELS FOR DEMOCRATIZING CANCER RESEARCH AND CARE

More so than in any other area of health care, cancer research and cancer care are especially overloaded by data, said Jason Hwang, executive director of Health Care at the Innosight Institute. Before the advent of next-generation sequencing techniques, the doubling time of DNA sequencing data was about 19 months, slightly slower than the doubling time of hard-disk storage capacity (about 14 months). After the uptake of new sequencing technologies, however, the doubling time of sequencing data decreased dramatically to around 5 months (Stein, 2010). Sequence data are just one component of biomedical data. Combined with all of the data being generated in the clinic and in research, the volume of biomedical data, especially cancer-related data, is growing exponentially.
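
The consequence of those doubling times compounds quickly. A short calculation using the figures cited above shows the widening gap between data growth and storage growth over five years:

    # Growth after a given period for a given doubling time.
    def growth_factor(months, doubling_time_months):
        return 2 ** (months / doubling_time_months)

    months = 5 * 12
    print(f"Sequencing data (5-month doubling): x{growth_factor(months, 5):,.0f}")
    print(f"Disk capacity (14-month doubling):  x{growth_factor(months, 14):,.0f}")
    # Over 5 years, data grow roughly 4,000-fold while disk capacity grows
    # roughly 20-fold, so storage alone cannot absorb the difference.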

The fundamental problem is turning the growing mass of data into transformational insights. Because biases in the data can lead different individuals to reach different conclusions, transparency in how one derived the insight from the data is critical, Joshi stressed. These challenges are general informatics problems (i.e., not specific to cancer or even specific to research), and they should be solved like general informatics problems, Joshi said. Also, as noted by others, much can be learned from other industries. He added that in trying to solve one problem, you often find that some of the solutions also solve another problem elsewhere in the health care ecosystem (e.g., addressing a research informatics issue also ends up solving a payer issue).

Many decisions are now made using analytics, which must be reproducible if one is to be able to justify the decision. Analytics, Joshi said, is a niche industry where only those who know how to deal with the data can derive value from them, and this keeps analytics from being a highly used tool.

One ongoing challenge in analytics has been integrating and normalizing data from the different source systems (e.g., clinical, financial, administrative, research). A lot of data quality problems start at this point, Joshi noted. The informatics requirements for data integration, validation, and normalization are not trivial and require an enterprise approach. However, once a core enterprise analytics platform is in place, numerous applications can be developed that allow researchers, clinicians, administrators, and others to analyze those high-quality data. A common infrastructure can also support multiple secondary uses of data and thereby lower costs. For example, data can be used for clinical trial optimization, decreasing timelines and enhancing efficiency and accuracy.

Joshi suggested that in addition to thinking about secondary uses of data, we should consider secondary uses of IT infrastructure as another way to reduce overall costs. Specifically, creating an entire system focused on cancer runs the risk of spending too much time, effort, and money; not focusing on innovation; and most likely duplicating existing systems to some extent.

In closing, Joshi mentioned one example of the regional initiatives that are gearing up to enable collaboration across the health care ecosystem: the Partnership to Advance Clinical electronic Research (PACeR) initiative, a public-private partnership between New York State hospitals and life sciences companies.

Consumers as Disruptive Innovators

The Patient Protection and Affordable Care Act (ACA) put in place incentives for coordination of care and for innovation around patient engagement, explained Farzad Mostashari, national coordinator for health information technology in the Office of the National Coordinator for Health Information Technology (ONC). Insurers no longer want to pay for care as piecework, he said. Per the ACA, if hospitals can provide care for Medicare patients that is more coordinated, they can share in any resulting savings.

Mostashari listed three elements that will aid in the successful coordination of care: (1) the technology infrastructure for care coordination exists, (2) there is a strong business case for care coordination and patient engagement, and (3) there is movement toward the democratization of information and information tools. The consumer is the ultimate disrupter here, he said.

As a case example, Mostashari offered his thoughts on some of the reasons Google Health failed. While certain elements are specific to Google, others are more generalizable, and we can learn from them. First, people had to spend hours typing in their own information. Per HIPAA, patients have a right to get a copy of their own medical information (and certain legislative provisions may require that it be provided electronically and within a specified time period). It is legally possible for people to get access to their own records, but it should also be easy to do. People are often uncomfortable asking for their records, worrying that asking is in some way challenging the doctor, Mostashari added. This attitude needs to change, he said. Knowing your information is part of what being a good patient is about.

Another problem with Google Health, Mostashari said, was that once people typed all of their information in, they could not do much more with it except read or print it. People want to be able to find a clinical trial, learn what the side effects of their medications are, get a second opinion, see their images, share their data, or keep abreast of the latest research on their cancer. They want to find resources and find other people like themselves.

The progress in health IT for doctors and hospitals is exciting, Mostashari said, but what is more exciting is the disruptive possibilities of consumers and their caregivers as the nexus where information comes together and is used to generate new knowledge.

THE EHR AND CANCER RESEARCH AND CARE

Enhancing Uptake of EHRs

Mostashari said that when he joined ONC in 2009, about 10 percent of hospitals and 20 percent of primary care providers used a basic EHR system. The Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 put in place financial incentives for doctors and hospitals to adopt and meaningfully use EHRs and created some of the digital infrastructure to help small practices. By 2011, 37 percent of hospitals were using an EHR system. Mostashari anticipated that by 2013, it would be more than 50 percent.

The EHR is not just an office system, he said. ONC has developed interoperability standards to facilitate sharing and has a program to certify EHR systems that conform to these standards. Per the standards, information on clinical care, medications, procedures, and other data will be in a standard XML format with tags to tell users what the data elements are and where they go. Data will be shared when patients transition from care setting to care setting. Data will also be shared with the patient, because this is one of the requirements for meaningful use of an EHR, Mostashari noted.

The EpicCare System as a Model for the Uses of EHRs in Cancer Research and Care

Sam Butler, a physician and member of the clinical informatics team at software developer Epic, described the features of the EpicCare system as a model for the uses of EHRs in cancer research and care. He estimated that EpicCare has been used for somewhere between 108 million and 142 million patients in the United States, across 270 health care clients.

An available add-on module to the main EHR is Beacon, Epic's oncology information system. Currently, Butler explained, Beacon is primarily for chemotherapy management. Functions include staging, problem lists, protocols, treatment planning, review and release, pharmacy verification, the electronic medication administration record (eMAR), flow sheets, and reporting. Of the 270 current EpicCare EHR users, 140 are actively installing Beacon.

The system is extensible, and it is easy for an institution to add information to a set of problems. The model system includes 250 protocols, both standardized regimens and research protocols. These are not meant to be a review of the literature, Butler noted, but are representative protocols using standard protocol language to serve as starting points.

Customers are required to review the protocols, validate them, and make them their own protocols. A protocol can be adapted as a treatment plan for a particular patient. Once a treatment plan is created, nurses review the information when a patient arrives, and the nurse or pharmacist releases the orders, which then go through a process of pharmacy verification. Administration of treatment is instantly documented in the eMAR, often by using bar coding. Everything is captured in the flow sheet, which informs and supports physician decision making regarding the next round of treatment. Then the cycle of treatment plan, pharmacy verification, eMAR documentation, and flow sheet begins again. Beacon can also capture discrete data, such as reasons for changing a treatment plan or discontinuing it.

This cyclical system works well in a large institution where the oncologists, pharmacists, and nurses all are in the same organization. However, if physicians are in a community practice and have another EHR, or another program to create their treatment plan, there is no standard for communicating that treatment plan to a hospital, infusion center, or pharmacist with a different system.

Application and workbench data are exported nightly or more frequently and deposited into a reporting database that is relational and that can be accessed by analytic tools. The data can also be combined with other data warehouses or biorepositories in other systems. The database can be used to create custom data marts that are customer developed and owned.

In the future, the system will allow more patient-entered data. Epic is working on ways to engage patients, such as sending out surveys, aggregating the data that are reported, and displaying that information in graphs and reports to help patients make an informed decision about their treatment by showing them what other patients have experienced. The challenge, Butler noted, will be getting enough patient data to make those graphs meaningful and significant. Epic is also working on incorporating genomics and adding the ability to document the care a patient received before entering this system, for a complete oncological history.

Butler noted several challenges in moving forward, including encouraging physicians to see the value of using an EHR (he said physicians are very concerned that using an EHR is going to slow them down), interoperability, and management of big datasets.

CANCER CENTER-BASED NETWORKS FOR HEALTH RESEARCH INFORMATION EXCHANGE

Following on his discussion of the informatics challenges facing cancer centers (Chapter 2), William Dalton, of the Moffitt Cancer Center, offered an aspirational model for a research and health care information exchange network that would allow many different partners and stakeholders to participate, contribute, and benefit (Figure 5-4). Expanding on the research information exchange he described earlier (see Figure 2-3), the data warehouse would become a federated network information system.

Personalized Cancer Care

The development of personalized cancer care relies on the efforts of many people contributing to the continuous cycle of discovery, translation, and delivery of health care. Data networks allow for the discovery of associations between specific molecular profiles and clinical information from individual patients, leading to new knowledge that can be translated into more personalized cancer care.

FIGURE 5-4 Designing a new federated research and health care network model. SOURCE: Dalton et al., 2010. Reprinted with permission from the American Association for Cancer Research.

One approach to facilitating personalized care is to ask patients to participate as partners in the care journey. As an example, Dalton described the Moffitt "Total Cancer Care Protocol." This IRB-approved observational study protocol includes three critical questions for the patient:

1. Can we follow you throughout your lifetime? (With the goal of entering any health care data into a central data warehouse)
2. Can we study your tumor using molecular technology? (With the goal of entering genetic and genomic data into a central data warehouse)
3. Can we recontact you? (For purposes of sharing information that might be of importance to you, such as a clinical trial designed for patients like you)

As part of a public-private partnership with Merck, the protocol is currently open in a consortium of 18 sites in 10 states, all using the same protocol and consent and following standard operating procedures for tumor collection and data aggregation, and many using the same central IRB. Now finishing its sixth year, the study has enrolled more than 85,000 people to date. More than 32,000 tumors have been collected, all clinically annotated according to standard operating procedures, and almost half have been profiled. Dalton noted that in its early stages, the database was more of a repository than a warehouse, with some of the queries taking weeks, if not months, and Moffitt sought expert assistance from Oracle, TransMed, and Deloitte.

Through this strategic partnership, an integrated health research information platform was developed that creates the real-time relationships and associations from disparate data sources that are needed to create new knowledge for improved patient treatments, outcomes, and prevention. Critical to this endeavor was harmonization of the data through creation of a data dictionary. In addition to defined elements, the dictionary also incorporated a means of measuring the quality and veracity of the data. Because patient information resides in many sources, often with different identification numbers, the first challenge was to create a means of cohort identity. Harmonization of the data takes place in the data factory before the data are placed in a warehouse, where they can then be queried for different uses by different partners and stakeholders. The data warehouse is a robust, scalable dataset of oncology patients, Dalton said, and queries are done in real time.
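
A minimal sketch of that harmonization idea, mapping fields from two hypothetical source systems onto a shared dictionary before loading; all field names and records are invented for illustration, not Moffitt's actual schema:

    import pandas as pd

    DATA_DICTIONARY = {  # source column -> harmonized data element
        "pt_sex": "sex",
        "gender": "sex",
        "dx_code": "diagnosis",
        "icd_code": "diagnosis",
    }

    clinic = pd.DataFrame({"mrn": [101], "pt_sex": ["F"], "dx_code": ["C50.9"]})
    registry = pd.DataFrame({"reg_id": [9001], "gender": ["F"], "icd_code": ["174.9"]})

    harmonized = pd.concat(
        [clinic.rename(columns=DATA_DICTIONARY),
         registry.rename(columns=DATA_DICTIONARY)],
        ignore_index=True,
    )
    print(harmonized)  # next step: resolve mrn/reg_id to a single cohort identity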

As one example, Dalton described how the data warehouse could be queried for cohort identification. One could, for example, query for females diagnosed and/or treated at Moffitt between 2005 and 2009 with breast cancer, with the histology of infiltrating ductal carcinoma, not otherwise specified, who were stages III to IV at diagnosis and who were estrogen receptor positive, with a documented family history of breast cancer. Out of more than 212,000 female patients in the database, the successive real-time queries identified a cohort of 25.
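
Successive filters of this kind map directly onto ordinary query operations. A sketch against a toy stand-in for the warehouse's patient table (all column names and records are invented):

    import pandas as pd

    patients = pd.DataFrame({
        "sex": ["F", "F", "M"],
        "diagnosis_year": [2006, 2011, 2007],
        "histology": ["infiltrating ductal carcinoma, NOS"] * 3,
        "stage_at_dx": ["III", "IV", "II"],
        "er_status": ["positive", "positive", "negative"],
        "family_hx_breast_ca": [True, False, False],
    })

    cohort = patients[
        (patients["sex"] == "F")
        & patients["diagnosis_year"].between(2005, 2009)
        & (patients["histology"] == "infiltrating ductal carcinoma, NOS")
        & patients["stage_at_dx"].isin(["III", "IV"])
        & (patients["er_status"] == "positive")
        & patients["family_hx_breast_ca"]
    ]
    print(len(cohort))  # each filter narrows the cohort, here to a single patient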

In a second example, Dalton demonstrated how queries could identify patients meeting select clinical trial criteria who were eligible and available at a particular consortium site and who also had banked tissue samples that could be assayed.

Proposed Federated Data Model

It is one thing to be able to do health research information exchange at a single institution or within a defined consortium, but it is quite another, Dalton said, to manage this on a national scale. He proposed a national health and research information exchange, incorporating regional "hub and spoke" platforms, with cancer centers as the hubs and their individual colleagues within the community contributing data and having access to those data. This federated framework could facilitate many aspects of cancer research, including basic science, translational research, drug discovery and development, clinical trials, companion diagnostics, comparative effectiveness and outcomes research, and postmarketing surveillance, among others. The goal, as with the other models discussed earlier, is a rapid-learning information system: each patient added iteratively improves the learning process.

In summary, Dalton said that the guiding principles for developing cancer center collaboration are inclusiveness, accessibility of data (especially real-time access through a research information exchange), and public-private partnerships to achieve long-term sustainability.

OTHER MODELS AND PATHWAYS

During the discussion, panelists offered additional comments on pathways forward and other examples of instructive models in other domains.

Patients Helping Patients

Brandon Hayes-Lattin, cancer survivor, cancer researcher, and senior medical adviser for the Lance Armstrong Foundation, said that the foundation's Share Your Story campaign has transformed cancer survivorship through individual patients posting personal stories so that others might be helped by them. Along these same lines, he supported developing tools by which patients could contribute their clinical data to a shared resource and also access aggregated clinical data to guide their personal decision making. He noted that the Livestrong constituency was surveyed regarding sharing their data, and 87 percent of the 8,500 respondents agreed that researchers should have the ability to review their information as long as it is not directly linked to them. Further, 71 percent felt that their data were safer when stored electronically than on paper.

Patients are looking for a range of things, Hayes-Lattin said. They want to trust their physician, but they want to double-check, too. Getting a second opinion requires their raw data, not just the interpretations. Patients also say that they want to be able to find resources. There are many resources available for cancer patients, but it can be hard to find them, he said. It is important for patients to be able to put their situation into context in the larger world of cancer, to learn from the experiences of others.

Bradford Hesse, chief of the NCI Health Communication and Informatics Research Branch, added that a significant portion of traffic to government health websites such as those of NCI and the Centers for Disease Control and Prevention (CDC) involves patients. Patients are engaged and activated, and they need to have a health system that is prepared to help them. He also referred to the SHARP (Strategic Health IT Advanced Research Projects) initiatives, some of which focus on the security and privacy of health IT.

Providing a Substrate for Innovation

Hwang mentioned the role of government in creating a substrate for innovation in the private sector. For example, no private-sector company or start-up was likely to invent the Internet, but a government agency with the will and the resources did create such a network and opened it up to the private sector for use, essentially launching a new economy. Another example was the launch of global positioning system (GPS) satellites and allowing the private sector to use those data to generate whatever products and services it could imagine.

Most companies would not try to launch their own satellites, but many have created innovative products using the data. Poste added that the Internet predecessor, the Advanced Research Projects Agency Network (ARPANET), and GPS technology were both products of the military sector. A primary factor driving military technology is a perception of an existential threat. There are similar threats in biomedicine, Poste said, including the economics and viability of the health care system at large and the threat at the level of the individual facing a terrible disease. In these and other instructive precedents, government has taken the lead in recognizing the threat, recognizing the need for a coherent systems-based approach, and allowing the mobilization of the creative community (i.e., innovation from the bottom up). Poste argued that what is needed is a combination of national leadership, the courage to acknowledge that much of the system is broken, and the willingness to allow individual, bottom-up contribution and participation in the larger infrastructure.

Mining Data to Assess the Quality of Cancer Care

Allen Lichter, chief executive officer of the American Society of Clinical Oncology (ASCO), noted that ASCO is interested in informatics from a quality-of-care perspective. Quality monitoring is especially important in oncology. As mentioned earlier, most cancer patients are treated in community settings where the vast majority of oncologists are generalists. In addition, diagnostic and therapeutic options are increasing rapidly, and physicians need to make sure that they are up to date. The ASCO Quality Oncology Practice Initiative (QOPI) was designed to monitor the quality of cancer care and has been in place for about 10 years. One concern, Lichter said, is that it assesses quality retrospectively (reviewing cases from 6 months or more prior), uses sample cases, has a limited number of measures, and is manual (taking close to an hour per chart). ASCO is looking to evolve this into a real-time electronic system, reviewing consecutive cases, monitoring the full spectrum of care, and providing decision support.

Fostering Sharing

Butler noted that researchers make their living from their data, and they are understandably hesitant to give them up. However, over the course of the workshop there was wide support for the concept that the data should be used to benefit society and should be made available for broad use.

Butler suggested starting with data sharing efforts for just a few diagnoses. For example, in pediatric gastroenterology, 30 institutions provide all of their data on inflammatory bowel disease patients to one institution, which then aggregates the anonymized data. They did not try to start too big. He added that the United Network for Organ Sharing (UNOS) has done the same with transplant data.

Murphy offered pediatric oncology as a model for collaborating. The field of pediatric oncology has a long history of collecting data and using the information in a virtuous cycle to inform and improve the next generation of care. The field is less competitive and more geared toward sharing.

Education, Training, and Funding

Mia Levy, director of cancer clinical informatics at the Vanderbilt-Ingram Cancer Center, said that there is a real need for more people who are computationally oriented in medical schools. There is also a need for career paths and leadership positions for people who are trained in both medicine and informatics. In funding these researchers, Levy said that review committees need to be receptive to the potential of informatics and the value of observational data.

If Data Are Available, Users Will Come

Levy referred to how Butte and others have made use of publicly available data repositories of cancer information. If data are available, people will start using them for a variety of purposes. To facilitate more use of the data sources that are already available, she said, there has to be increased accessibility, more sharing of data, and an invitation to the broader informatics community to come to the table and start working on problems that are important to cancer.

REFERENCES

Alsheikh-Ali, A. A., W. Qureshi, M. H. Al-Mallah, and J. P. Ioannidis. 2011. Public availability of published research data in high-impact journals. PLoS ONE 6(9):e24357.
Butte, A. J. 2008. Translational bioinformatics: Coming of age. Journal of the American Medical Informatics Association 15(6):709-714.
Dalton, W. S., D. M. Sullivan, T. J. Yeatman, and D. A. Fenstermacher. 2010. The 2010 Health Care Reform Act: A potential opportunity to advance cancer research by taking cancer personally. Clinical Cancer Research 16(24):5987-5996.

Ioannidis, J. P., and O. A. Panagiotou. 2011. Comparison of effect sizes associated with biomarkers reported in highly cited individual articles and in subsequent meta-analyses. Journal of the American Medical Association 305(21):2200-2210.
Kodama, K., M. Horikoshi, K. Toda, S. Yamada, K. Hara, J. Irie, M. Sirota, A. A. Morgan, R. Chen, H. Ohtsu, S. Maeda, T. Kadowaki, and A. J. Butte. 2012. Expression-based genome-wide association study links the receptor CD44 in adipose tissue with type 2 diabetes. Proceedings of the National Academy of Sciences 109(18):7049-7054.
Lam, H. Y., M. J. Clark, R. Chen, R. Chen, G. Natsoulis, M. O'Huallachain, F. E. Dewey, L. Habegger, E. A. Ashley, M. B. Gerstein, A. J. Butte, H. P. Ji, and M. Snyder. 2011. Performance comparison of whole-genome sequencing platforms. Nature Biotechnology 30(1):78-82.
Mullard, A. 2011. Reliability of "new drug target" claims called into question. Nature Reviews Drug Discovery 10(9):643-644.
NSF (National Science Foundation). 2012. Advanced computing infrastructure: Vision and strategic plan. NSF Document nsf12051. http://www.nsf.gov/pubs/2012/nsf12051/nsf12051.pdf (accessed April 27, 2012).
Patti, M. E., A. J. Butte, S. Crunkhorn, K. Cusi, R. Berria, S. Kashyap, Y. Miyazaki, I. Kohane, M. Costello, R. Saccone, E. J. Landaker, A. B. Goldfine, E. Mun, R. DeFronzo, J. Finlayson, C. R. Kahn, and L. J. Mandarino. 2003. Coordinated reduction of genes of oxidative metabolism in humans with insulin resistance and diabetes: Potential role of PGC1 and NRF1. Proceedings of the National Academy of Sciences 100(14):8466-8471.
Schnoes, A. M., S. D. Brown, I. Dodevski, and P. C. Babbitt. 2009. Annotation error in public databases: Misannotation of molecular function in enzyme superfamilies. PLoS Computational Biology 5(12):e1000605.
Sirota, M., J. T. Dudley, J. Kim, A. P. Chiang, A. A. Morgan, A. Sweet-Cordero, J. Sage, and A. J. Butte. 2011. Discovery and preclinical validation of drug indications using compendia of public gene expression data. Science Translational Medicine 3(96):96ra77.
Stein, L. D. 2010. The case for cloud computing in genome informatics. Genome Biology 11(5):207.