Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.
Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter.
Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.
Do not use for reproduction, copying, pasting, or reading; exclusively for search engines.
OCR for page 7
2
Overview of the
Cancer Informatics Landscape
DISCUSSION POINTS HIGHLIGHTED
BY INDIVIDUAL PRESENTERS
· ancer researchers and care providers are facing an over-
C
whelming volume of data from a multitude of sources and are
hampered by the inability to merge those data or to communi-
cate effectively across disciplines and stakeholders because of
divergent standards, lack of interoperability, and other barriers.
· The most successful informatics tools will be those that inte-
grate research and clinical data in an organized and efficient
manner.
· A research information exchange system integrates data from
multiple sources (extracting, transforming, harmonizing, and
profiling for quality and accuracy) and then makes them avail-
able to diverse stakeholders per their queries. The information
is provided based on the same data elements, but the presen-
tation depends on who is asking for it (researchers, patients,
clinicians, or administrators).
· Guiding principles for an integrated data warehouse include
relevant standards for data entry, deep annotation, a good
query interface, and sharing (via entry back into the database)
of any new data derived from the analysis of data, specimens,
or images stored in the data warehouse.
continued
7
OCR for page 8
8 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH
· esearch data sources span the spectrum from electronic
R
health records (EHRs) to disease registries to clinical research
protocol repositories, with varying degrees of completeness,
quality, and research utility. In silico research depends on hav-
ing complete and valid information.
· caBIG is undergoing renovations and new informatics project
review criteria are being implemented; NCI is open and recep-
tive to communications from interested parties.
In the first session, an overview of the current status of cancer infor
matics was provided from the perspectives of cancer centers, cancer coopera-
tive groups, and clinical translational researchers. Panelists also discussed the
lessons that could be learned from the ongoing evolution of NCI's caBIG.
STRUCTURED, INTEROPERABLE RESEARCH AND CLINICAL
INFORMATION SYSTEMS--OR THE LACK THEREOF
Data Overload
Rapid advances in technology have led to a dramatic increase in the
output of genomic and molecular data related to cancer biology, said
Lawrence Shulman, chief medical officer and chief of the Division of Gen-
eral Oncology at the Dana-Farber Cancer Institute. These emerging data
can inform our understanding of basic cancer biology, epidemiology, and
behavior, as well as response to therapies, toxicity of therapies, and optimal
care for an individual patient or cohort. However, the sheer volume of
information presents significant data management and analysis challenges
and is becoming overwhelming from a clinical decision-making standpoint.
To be optimally useful, data should be structured in a database, and we
are still in the learning stages of how best to structure genomic and molecu-
lar data, Shulman noted. For clinical data to be useful, they should contain
certain critical elements. From an oncology perspective, examples of key
data elements include patient demographics; tumor type and anatomic and
non-anatomic staging; treatment plan, treatment intent (e.g., curative or
palliative), and actual treatment; tumor response; toxicity; patient-reported
outcomes; and disease-free and overall survival. The nation is moving, albeit
OCR for page 9
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 9
slowly, toward the adoption of electronic health records (EHRs) to facilitate
efficient clinical practice and decision making. However, many of the data
included in EHRs are not in a structured format (i.e., are entered as free
text). One must often read through the notes of clinicians, and it can be
challenging to discern exactly what has happened to the patient, Shulman
said.
Databases That Foster Learning
Shulman stressed that the most successful informatics tools will be
those that interconnect research, clinical activities, and data in an organized
and efficient manner, with as broad a database as possible. Citing a 2010
IOM workshop on evidence-driven practice in cancer care, he explained
that the patient is the center of the system around which there is a cycle
of aggregating information (including routinely collected real-time clinical
data), analyzing that information, making new discoveries, and applying
those discoveries to improve the care of individual patients (Abernethy
et al., 2010; IOM, 2010) (Figure 2-1). The American Society of Clinical
Oncology has recently launched CancerLinQ,1 a rapid-learning system
based on this model that is being pilot-tested first for breast cancer. Shulman
described an example in place at Dana-Farber, called the Synergistic Patient
and Research Knowledge Systems (SPARKS) (Box 2-1; Figure 2-2). This
system links clinical data (e.g., EHR, surgical, and pathology data), tissue
sample information, cancer registry information, patient-reported out-
comes, translational research data (e.g., gene expression), and operations
data (e.g., billing, scheduling, and other visit information).
Making Connections
Shulman offered the "life cycle" of a gene mutation as a practical
example of the value of an integrated data system. In this scenario, a basic
researcher discovers a gene mutation in the laboratory, but its clinical signifi-
cance is unclear. An association with a clinical syndrome is then determined.
Translational research ties that gene mutation to a clinical outcome (e.g.,
prognosis, response to therapy). The clinical significance of the mutation is
validated, and testing for the mutation may have therapeutic implications.
1 See http://www.asco.org/ASCOv2/Practice+%26+Guidelines/Quality+Care/
CancerLinQ+-+Building+a+Transformation+in+Cancer+Care.
OCR for page 10
10 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH
Clinical trials, comparative
effectiveness research, molecular
and biologic data
Information-rich,
patient-focused
Evaluation of data
outcomes
Data
Transformation of aggregation,
subsequent care evidence
delivery generation
FIGURE 2-1 Rapid-learning health care system for cancer care.
SOURCE: Abernethy et al., 2010; IOM, 2010.
At Dana-Farber, researchers can query both a clinical data repository
and a consented research database. A "transient data mart" houses data
involved in a current query, which is then purged when the query is com-
pleted. Data are de-identified, and queries are covered under an umbrella
protocol so that additional institutional review board (IRB) approval is not
required. Investigators could be seeking an actionable mutation in multiple
tumor types and could query, for example, the aggregate number of patients
who have human epidermal growth factor receptor 2 (HER2) amplification
in breast, gastric, and salivary gland tumors and have responded to trastu-
zumab. Queries can be very specific. For example, an investigator interested
in hormone resistance might query the frequency of a particular mutation
in women who have metastatic breast cancer that is estrogen receptor
(ER)-positive and HER2-positive, who are between the ages of 50 and 65,
and who had progressive disease while on tamoxifen. It is also possible to
access identified data with IRB approval, and investigators recruiting for a
OCR for page 11
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 11
BOX 2-1
Dana-Farber Synergistic Patient and Research
Knowledge Systems (SPARKS)
Vision To provide a cutting-edge, collaborative institutional infor-
matics framework to accelerate scientific discoveries and their
translation into clinical practice to enable early diagnosis, personal-
ized treatment, cure, and prevention of cancer and related diseases.
Objective Implement policies, standards, systems, and tools that
facilitate collection, integration, mining, analysis, and interpretation
of biomedical data to accelerate scientific discoveries and their
translation into personalized medicine and clinical practice.
Long-term goal Establish an integrated, patient-centric clini-
cal genomic data model and systems for enabling translational
research and personalized medicine.
NOTE: See also Figure 2-2.
SOURCE: Shulman presentation (February 27, 2012).
particular investigational protocol could query for actual patients who meet
specific eligibility criteria.
Robust EHR Systems and Research Databases
In closing, Shulman stressed the need for robust EHR systems and
robust research databases. All clinical data in EHRs should be codified
or structured, he said. EHRs should include detailed data on patient
demographics, tumor characteristics and staging, and treatment histories,
as well as codified treatment responses and treatment resistance develop-
ment and codified genomic and molecular data. Ideally, EHR systems
would be interoperable and have standards for data entry. Similarly, he
said there is need for robust, interoperable research databases containing
structured genomic and molecular data, entered according to defined
standards, and these research databases should link with relevant clinical
databases. S hulman said that we are not even close to these aspirations.
Such databases often exist at the laboratory level and, in many cases,
OCR for page 12
12 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH
Surgical/Path Data
Outcomes
· Procedures (CRIS,
· Survival (CRIS)
TSI)
Clinical Data · Path (CRIS, CDR) · QOL (CRIS) Patient-Reported
· Treatment (CRIS, LMR) · Staging (CRIS, · Recurrence (CRIS) Outcomes
· Medications (CRIS, CA Reg) · Demographics
LMR, DFCI Pharmacy)
· Lab results ( MISYS, · History
CRIS, RPDR) · Risk Factors
· Orders (COE) · Drug Adherence
· QOL
· Satisfaction with care
Patient
Cancer Registry
(DFCI/BWH CA Reg) Translational
· Diagnosis Research Data
· Vital Status · Genomic variations
· Follow-up information · Gene expression
Operational Data Tissue Samples · Tissue microarrays
· Billing (IDX) (CA tissue)
· Scheduling (IDX) · Type
· Visit information · Location
· Handling methodology
FIGURE 2-2 Dana-Farber Synergistic Patient and Research Knowledge Sys-
tems (SPARKS).
NOTE: BWH = Brigham and Women'sFigure 2-2.eps
Cancer Center; CA = cancer; CDR = Clinical
Data Repository; COE = computer order entry for chemotherapy and all medications;
CRIS = Clinical Research Information System; DFCI = Dana-Farber Cancer Institute;
IDX = IDX operating system; LMR = longitudinal electronic medical record; Path =
pathological; QOL = quality of life; Reg = registry; RPDR = Research Patient Data
Registry.
SOURCE: Shulman presentation (February 27, 2012).
at the institutional (e.g., cancer center) level. To be maximally effective,
however, Shulman said, there is a need for database efforts at a national or
even international level, incorporating data from academic centers as well
as community practices (where about 80 percent of patients in the United
States receive their cancer care).
CANCER CENTER INFORMATICS:
CONNECTING WITH PATIENTS
To integrate new technologies into the standard of care there must
be demonstrated value, explained William Dalton, president, CEO, and
center director of the H. Lee Moffitt Cancer Center & Research Institute.2
A given treatment or technology, however, may not provide the same value
2 In July 2012, Dr. Dalton assumed the position of CEO for the newly formed M2Gen
Personalized Medicine Institute at the Moffitt Cancer Center.
OCR for page 13
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 13
to all patients. Evidence must be generated regarding what works for which
individuals or cohorts. Dalton outlined four elements of a personalized
medicine approach that can lead to overall improved health care:
1.Addresses health care as a public issue and seeks to improve access,
affordability, and quality of care by developing an information system
to assist in making clinical decisions based on outcomes and com-
parative effectiveness;
2. Integrates new technologies into the standard of care in an evidence-
based fashion to identify populations at risk, personalize treatment,
and improve individual outcomes;
3.Provides an approach to identify the best treatment for individual
patients based on clinical and biological characteristics of patients
and their disease; and
4.Creates a network of health care providers, patients, and researchers
who contribute and share information from individual patients
to ultimately improve the care of all patients by learning from the
experience of others (Dalton et al., 2010).
Research Information Exchange
Improved medical care begins with data that provide information, from
which we derive knowledge and develop wisdom, and we are still in the data
phase of this journey, Dalton said. In implementing a research information
exchange system that will serve the key stakeholders, cancers centers face
technical, cultural (e.g., academic versus industry), regulatory, and financial
challenges. From a technical and regulatory standpoint, data sharing raises
many issues that must be addressed, for example, technical architecture,
intellectual property concerns, privacy and security, and human subjects'
protections.
Another, perhaps underappreciated, technical aspect is data gover-
nance, Dalton said. How are the validity and quality of the data entered
into the system ensured? Semantic interoperability or harmonization is
needed; the system must serve users who are looking for the same data but
in a different context or with different semantics and syntax, depending on
who is asking. Multiple data sources can actually help ensure data quality,
Dalton suggested.
The challenge is how to develop an integrated network information
system that can manage the ever-increasing amount of information being
OCR for page 14
14 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH
generated in basic, translational, and clinical research that is needed to sup-
port a personalized medicine approach. Dalton pointed out that a single
information system can serve multiple stakeholders, including researchers,
patients, administrators, and clinicians. Information is provided based on
the same data or "truth," but the presentation will depend on who is ask-
ing for it. As noted by Shulman, a database can be used by researchers to
identify a patient cohort for data analysis or for clinical trial recruitment.
Researchers might also seek data on comparative effectiveness or to do
molecular profiling. The same information system, Dalton said, should
allow patients to have their own personalized health record, which they can
interact with and contribute to. The same platform would also be able to
support evidence-based decision making by clinicians and administrators.
Dalton offered an illustration showing how an information exchange
system could be designed (Figure 2-3). The goal was to develop a system
FIGURE 2-3 Example of a research information exchange system at the Moffitt
Cancer Center, integrating data from multiple sources and providing them to
diverse stakeholders. Figure 2-3.eps
NOTE: EMR = electronic medical bitmap
record; TCCP = Total Cancer Care Protocol.
SOURCE: Fenstermacher et al., 2011. Reprinted with permission from The Cancer
Journal.
OCR for page 15
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 15
that could identify the same patient in multiple source data systems that
may use different identification numbers or ways of describing the patient.
The data are loaded into a "data factory" where they are extracted, trans-
formed, harmonized, and profiled for quality and accuracy. Data are then
stored in the warehouse to be delivered to stakeholders per their queries. In
response to a question about the broad integrity of the informatics enter-
prise, Dalton stressed the emphasis that the data factory places on ensur-
ing the quality of data before they enter the data warehouse. In addition,
multiple sources of data on the same individual allow for algorithm checks
for quality.
It was noted by a participant that a challenge in obtaining patient consent
for inclusion in such systems is ensuring that patients understand what they
are consenting to, specifically that their data may be used for studies that
are not yet defined to answer questions that have not yet been thought of. A
multiphased consent process, where patients can opt in or opt out of different
parts of the program, may be most appropriate, but it is a complicated and
lengthy process. Dalton shared the example of a large, ongoing observational
study in which patients consented to be studied throughout their lifetime,
including collecting tumor samples and being recontacted should the inves-
tigators find something that might benefit them. An analysis of patient com-
prehension found that over time, many had forgotten that they had consented
to this ongoing interaction. To help address this, patients receive a card within
2 weeks of consenting that thanks them for enrolling in the study, including
a phone number to call if they have questions or do not recall consenting.
A patient portal has also been developed, Dalton said, which again thanks
patients for consenting and explains that they can obtain information about
how their data are being used and how they can opt out at any time.
CANCER COOPERATIVE GROUP INFORMATICS:
CONNECTING RESEARCHERS
Robert Comis, president and chair of the Coalition of Cancer Coopera-
tive Groups and group chair of the Eastern Cooperative Oncology Group
(ECOG), provided an overview of the NCI Clinical Trials Cooperative
Group Program. As a result of an ongoing reconfiguration of the coopera-
tive group structure, the program now comprises five major groups:
1. Alliance for Clinical Trials in Oncology--merging the Cancer and
Leukemia Group B (CALGB), the North Central Cancer Treatment
OCR for page 16
16 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH
Group (NCCTG), and the American College of Surgeons Oncology
Group (ACOSOG);
2. Children's Oncology Group;
3. ECOG-ACRIN Cancer Research Group--merging the American
College of Radiology Imaging Network (ACRIN) and ECOG;
4. NRG Oncology Group--merging the National Surgical Adjuvant
Breast and Bowel Project (NSABP), the Radiation Therapy Oncol-
ogy Group (RTOG), and the Gynecologic Oncology Group; and
5.SWOG (formerly the Southwest Oncology Group).
Comis explained that the cooperative group environment includes aca-
demic centers, large community practices, smaller community practices, and
biomedical researchers spanning the physical sciences through the clinical
sciences. It is geographically dispersed and now includes international sites.
Historically, cooperative group clinical trial results have been crucial to
setting the standards of cancer care. The primary mode of data collection
has been case report forms, which are stored in individual group databases.
There are some ad hoc interfaces for reporting and local information tech-
nology systems for managing tissue samples.
In 1999, NCI established the Cancer Trials Support Unit (CTSU),
which, Comis said, now serves around 1,700 sites and thousands of inves-
tigators, supporting patient enrollment and randomization, data collection
using common data elements, and regulatory and administrative activities.
Informatics Tools Used by the NCI Cooperative Group Program
Comis described the development and implementation of the M edidata
Rave Clinical Data Management System software now used by the Coop-
erative Group Program. In 2005, the cooperative groups recognized that
there was a need for a unified data system across the program. Specifications
were developed; a request for proposals was sent to several vendors; and in
2008, a contract was awarded to Medidata. The unsuccessful vendors then
filed formal protests, stalling the start of the contracted work. Following
resolution of the disputes, the program was officially initiated in April 2011,
resulting in Rave, a web-based system for capturing, managing, and report-
ing clinical research data that enables the user to record patient informa-
tion using forms customized per study (visit, lab, and adverse event data).
The program is now managed by the NCI Cancer Therapy and Evaluation
Program (CTEP), with the assistance of the CTSU and Medidata, and
OCR for page 17
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 17
involves all of the cooperative groups. Hundreds of group representatives
are currently engaged in classroom, webinar, and e-learning training on how
to use the Rave system.
Another effort Comis described focuses on coordinating the tissue
banking activities across cooperative groups. The Group Banking Commit-
tee, sponsored by NCI, is developing processes and standards for a national
groupwide tissue bank virtual repository. Comis noted that each tissue
bank has developed its own biospecimen information management system.
Some systems integrate multiple banks, and some systems are integrated
with institutional information technology (IT) systems. In addition, some
groups have specimen tracking systems, which connect data entry at clinical
sites with trial operations systems and bank inventory systems. The goal is
to connect the various group IT structures to a single database that can be
queried and is available to all researchers throughout the country.
Opportunities for an Innovative Informatics Structure
Mitchell Schnall, group chair of ACRIN, described some of the
opportunities for cooperative groups in building an innovative informatics
structure. Cooperative groups are a rich source of diverse biosamples and
images that are associated with structured clinical information, often with
long-term follow-up. In addition, there is a large array of other informa-
tion, ranging from gross medical and histological images to molecular data
profiling genes and proteins, that needs to be integrated with the clinical
information.
The ECOG-ACRIN vision, Schnall explained, is to generate an inte-
grated data warehouse incorporating the individual case report form, such
as that from the Medidata Rave system, with imaging data, laboratory data,
tissue and specimen repository inventory, digitized pathology, and -omics
data (e.g., genomics, metabolomics, proteomics) as well as patient-reported
outcomes and claims data.
Schnall offered several guiding principles for moving forward with such
a system. First, it will be necessary for the cooperative groups to embrace rel-
evant standards. Deep annotation should be encouraged (e.g., spatial annota-
tion), a good query interface is important, and any new data that are derived
from analysis of ECOG-ACRIN specimens or images should be entered into
the data warehouse to further the value of the warehouse as a community
resource. With regard to deep annotation, Schnall said that cancer control
moves along a pathway from prevention to detection and characterization
OCR for page 20
20 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH
erature. Unfortunately, the value of the literature for informing study design
is negatively affected by publication bias and the lag time between study
findings and their final publication. The trial registry, ClinicalTrials.gov, is
a useful resource but is limited to clinical trials, and entries do not provide
a level of detail, rigor, or standardization necessary for scientific analysis.
There is a real need, Pollock said, for accessible meta-study data for all types
of study designs (not just clinical trials).
In this regard, Pollock drew attention to the Human Studies Database
Project, a database of past and ongoing human studies, both interventional
and observational. The primary project participants are CTSA institutions.
Pollock said that the goal is to enable computational reuse of human stud-
ies data for activities such as systematic reviews, planning future studies,
scientific portfolio analysis, and research networking. A subcomponent
of the project is the Ontology of Clinical Research (OCRe), focused on
developing an ontology to deal with issues of study design, interventions
and exposures, participants, outcomes, and statistical analysis.
Informatics Challenges for Translational Research
Studies Using Existing, Non-Research Data
Research data sources span the spectrum from EHRs to disease reg-
istries to clinical research protocol repositories, with varying degrees of
completeness, quality, and research utility. Pollock concurred with Shulman
regarding the need for structured data elements. In EHRs, for example, use-
ful information about drug exposure would include dose, schedule, inten-
sity, area under the curve, dose modifications, reasons for stopping a drug,
and patient pharmacokinetics, pharmacodynamics, and pharmaco genomic
characteristics. These data are not generally included in the EHR, however.
The major challenge when using existing information, Pollock said, is the
inability to go back and fill in the missing data needed for a particular
research investigation. Another concern with mining existing data is the
potential for systematic biases in large clinical data repositories. More data
is not necessarily better, Pollock noted, and biases can be amplified. In silico
research depends on having complete and valid information.
Lack of Harmonization
From the perspective of a clinical translational researcher, Pollock con-
curred with the challenges highlighted by the cancer centers and cooperative
OCR for page 21
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 21
groups. Lack of harmonization is a key concern, and he noted that at his
own institution, as in many academic health centers, there is more than
one EHR system in place. Clinical data systems and research data systems
do not routinely interoperate. Another issue for clinical research is the
choice of clinical trial management system (CTMS) and whether to use a
commercially available system or an open source system. There are sustain-
ability and cost issues to consider. Open source is technically free, but not
without cost because a lot more development time goes into implementing
an open source platform; however, it may be more sustainable when mov-
ing forward.
Regulatory Barriers
There are also regulatory barriers for public use datasets. As chair of
the advisory committee for the Texas Cancer Registry, Pollock expressed
concerns about the heterogeneity of across-state permissions to combine
data from multiple state cancer registries (in some cases, four levels of
approval are required before researchers can utilize the data). There
are also within-state restrictions; for example, until very recently, link-
ing hospital discharge data and Texas Cancer Registry data was not
permitted.
Tools, Technology, and Big Datasets
Imaging data are highly dimensional, and technical advances, such
as the ability to collect real-time functional imaging data, further increase
data dimensionality. There also has been an explosion in -omics data and
analysis tools. Decision support tools are available, but they require large-
scale validation and constant updating. Limitations to data storage and
networking are also a major issue. In one genome sequencing laboratory
Pollock had visited, for example, it was possible to keep active datasets on
the server for only about 1 week, after which the data had to be pulled off
to create space for new data.
Despite the challenges, big datasets can facilitate hypothesis generation
and study planning. Data can be used to assess the feasibility of a study.
Big datasets could also lower the cost of conducting clinical translational
research, Pollock said, by offering more precollected data, more automation,
and more interconnectivity.
OCR for page 22
22 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH
Moving Clinical Translational Informatics Forward
Pollock reiterated that research is a process, and more data do not
ecessarily lead to more discovery. He offered several suggestions to help
n
evolve the clinical translational research process:
· Bring clinical data up to the same standards as high-quality research
data.
· Devise statistical methodologies and study designs for use with clini-
cal data.
· Develop better data mining and filtering approaches to sort through
massive datasets.
· Connect genomic and molecular data with clinical data.
· Structure clinical data appropriately to support research.
· Ensure that these processes are guided in a way that is compatible
with a research framework.
caBIG--THE VISION AND THE REALITY
Daniel Masys, affiliate professor of biomedical and health informat-
ics at the University of Washington, Seattle, and chair of the new caBIG
oversight subcommittee of the NCI Board of Scientific Advisors, shared
the history and vision of caBIG and some of the lessons learned since its
launch in 2004. The Cancer Biomedical Informatics Grid was launched by
NCI to help address the growing problem of the overwhelming volume of
data from a multitude of sources and the inability to merge those data or
to communicate effectively across disciplines and stakeholders because of
divergent standards and terms and other barriers.
To define the priority areas for caBIG, a very extensive market deter-
mination exercise was undertaken to define the unmet needs of the NCI-
supported cancer centers (Box 2-2). As a result, caBIG was envisioned as a
common, widely distributed infrastructure that would permit the research
community to focus on innovation (rather than on the details of manag-
ing information), with the intent that raw and published cancer research
data would be available for data mining and integration into reanalyses and
meta-analyses. It would be built on shared vocabulary, data elements,
and data models that would facilitate information exchange and would
have a collection of interoperable applications and tools developed with
common standards.
OCR for page 23
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 23
BOX 2-2
Common Needs to Catalyze Effectiveness in
Cancer Research That Helped to Shape the
Priority Areas for the caBIG Activities
· Clinical data management tools
· Distributed data sharing/analysis tools
· Translational research tools
· Access to data
· Tissue and pathology tools
· Cancer center integration and program management
· Common data elements and standards
· Meta-data analysis
· Shared vocabulary and ontology tools and databases
· Statistical data analysis tools
· Visualization and imaging tools
· Proteomics
· Microarray and gene expression tools
· Licensing and intellectual property issues
· Staff resources
· High-performance computing
· Integration and interoperability
SOURCE: Masys presentation (February 27, 2012).
In 2010, NCI director Harold Varmus called for a high-level review of
caBIG. In its report released in March 2011, the working group concluded
that the need for caBIG is greater now then when it was conceived (NCI,
2011). In addition, there was very strong community support from cancer
centers for the original caBIG vision and goals of interoperability and
standards-based exchange of data. The working group also found, however,
that the many successes of the program have been offset by several problems
and that the overall impact of caBIG in transforming cancer research had not
been commensurate with the level of the investment (about $300 million).
The report highlighted findings in three main areas:
1. creation and management of standards for data exchange, and sup-
port for community-based software already in place;
OCR for page 24
24 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH
2. impact and track record of caBIG initiatives and tools with regard to
life science or integrative cancer research, clinical data management,
infrastructure, and community engagement; and
3. program administration, contracts management, and budget.
The working group concluded that the greatest impact of caBIG thus
far had been in the first area. The review found that the program had been
very effective in catalyzing progress in the development of community-
driven standards for data exchange and interoperability; development,
maintenance, enhancement, and dissemination of tools developed by
academic researchers; and community dialogue on the interoperability of
clinical and research software tools.
The group found that the main problems with the caBIG approach
that limited its uptake and impact included a "cart-before-the-horse grand
vision"; a technology-centric approach to data sharing; unfocused expansion;
a one-size-fits-all architectural approach; an unsustainable business model for
both NCI and users; and a lack of independent scientific oversight.
As a result, the working group issued five immediate tactical recom-
mendations. They are, as summarized by Masys:
1.Institute an immediate moratorium on all ongoing internal and
commercial contractor-based software development projects while
initiating a mitigation plan to lessen the impact of this moratorium
on the cancer research community.
2. Institute a 1-year moratorium on new projects, contracts, and sub-
contracts by caBIG.
3.Provide a 1-year extension on current caBIG-supported academic
efforts for development, dissemination, and maintenance of new
and existing community-developed software tools.
4.Establish an independent oversight committee, representing aca-
demic, industrial, and government (NCI, National Institutes of
Health [NIH]) perspectives to review planned initiatives for scien-
tific merit and to recommend effective transition options for current
users of caBIG tools.
5.Conduct a thorough audit of all aspects of the caBIG budget and
expenditures.
The independent oversight committee that was called for in recom-
mendation 4 (chaired by Masys) met for first time in July 2011. To begin to
address the criticisms in the March report, the committee developed review
OCR for page 25
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 25
criteria for informatics projects (similar to the evaluation criteria used by
study sections), outlining how it would assess each of the ongoing caBIG
activities, Masys noted (Box 2-3).
Looking Forward: A Three-Step Approach to
Success in Informatics Innovation
In response to a request from the NCI director, Masys consulted with
other experts to try to define the "recipe for success," that is, are there com-
parable programs that have been successful and what can we learn from
them? Masys found that two applications had "gone viral": the Research
Electronic Data Capture (REDCap) developed through a CTSA, and the
Informatics for Integrating Biology and the Bedside (i2b2), developed
with funding from NIH's National Center for Biomedical Computing.
In considering how these applications were different from caBIG, Masys
and colleagues drafted the following general steps to success in informatics
innovation.
1. Do not repeat the mistakes of the past.
· Do not try to solve all clinical and translational research informa-
tion technology problems in one framework.
· Do not worship standards over functionality.
· Do not try to have enterprise software adopted by fiat from
above.
· Do not try to buy adoption software products, because for the
sponsor, those costs grow ever larger.
· Recognize that organizations that cannot afford ongoing staffing
and help desk functions for software should not be expected to
adopt software even if it is free or provides some income to the
adopter. The acquisition cost is dwarfed by the support, mainte-
nance, and integration costs.
2. Understand the basic truth about IT complexity.
Increased functionality that is built at the expense of increased com-
plexity is always at risk of
· delays in development,
· inability of local implementers and users to understand what has
been built and how to use it, and
· being overtaken by other approaches that have a better price-to-
performance ratio (e.g., grid computing versus web services).
OCR for page 26
26 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH
BOX 2-3
NCI Informatics Project Review Criteria
1. D
oes the activity, application, or resource meet a well- articulated
and attainable need of basic, translational, or clinical research, or
cancer care (i.e., is there a "driving biological or clinical project"
and are the intended users members of the project team)?
2. How will success or failure be evaluated? Analogous to stop-
ping rules for clinical protocols, what will be the stopping rules
for ending the project if it either fails to meet its technical objec-
tives or fails to be adopted even if technically successful?
3. Will the activity, resource, or application, if successful, make
some objectively measurable incremental progress toward
an overall vision of interoperability of data and systems? Will
it enable data sharing and make use of and/or enhance open
international standards for research?
4. Is the activity, resource, or application designed to anticipate
change in a rapidly expanding knowledge base of science and
practice? Flexibility and generalizability are important charac-
teristics for longevity in an era of agile science.
5. Is the intended deliverable of the project achievable in the time
frame and budget proposed?
6. Will the output of the project be broadly implementable by organi-
zations of varying size and sophistication? Will it be used broadly
by organizations and institutions outside of NCI cancer centers
(e.g., other NIH centers or academic research organizations)?
7. Is there a documented plan for long-term maintenance,
enhancement, and fiscal sustainability of the activity, applica-
tion, or resource and its user base?
8. What is the user base and has there been a stakeholder
assessment to ensure that the activity, application, or resource
will indeed meet a currently unmet need or a reasonably antici-
pated future need?
9. Is the project generalizable and likely to create value or address
broad needs across the community of cancer centers and
investigators? Alternatively, would this activity, resource, or
application be perceived as a "pet project" of an "in" group?
10. Does the activity, resource, or application have enough market
value to gain adoption without incentives? Or if financial or
policy incentives are required, are they justified?
SOURCE: Masys presentation (February 27, 2012).
OCR for page 27
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 27
3. Observe the informatics research and development "do's."
· Solve one significant challenge at a time.
· Use small, nimble development teams led by domain experts.
· Keep development-to-implementation intervals short.
· Deploy software that can solve at least one problem that users or
adopters care about within 12 months of adoption.
· Demonstrate success first with a smaller group of the most
advanced sites and then let others follow.
· Create software that makes adoption of standards easier (not
harder) than nonstandardized alternatives.
· Let the market prioritize and vet the standards.
· Invest in simple interfaces between applications, not architectures.
· Make interested health care organizations demonstrate willing-
ness to invest their own assets and time for enterprise software.
· Allow intraorganizational and interorganizational needs and
technologies to diverge as needed, to maximize productivity.
With regard to increasing the probability of successful adoption of
informatics innovation in cancer research specifically, Masys recommended
focusing data sharing efforts (both the standards for sharing and the applica-
tions to do it) on those data for which there is a preexisting motivation to
share. Areas in which researchers really need one another's data and scientific
problems that simply cannot be solved within one laboratory or institution
are prime areas on which to focus sharing efforts (e.g., those studying rare
alleles who are trying to assemble a cohort of interest for a study). Coopera-
tive groups are a good example of this, he noted. The current challenges
are less technological and more policy oriented (e.g., IRB issues, privacy
concerns).
Increasing the uptake of innovative informatics in cancer care is a more
difficult task, given the low penetrance of EHRs in U.S. health care and
current economic pressures. Therefore, Masys advised aligning NCI efforts
with the EHR adoption incentives of the Office of the National Coordina-
tor for Health Information Technology (discussed further by Mostashari in
Chapter 5). A plausible transition goal for NCI, he suggested, would be to
make it easier for every oncology practice in America to care for a patient
on a clinical trial protocol. Building "public libraries" of decision support
tools to guide providers and patients through clinical protocols would be
an important contribution in this regard.
OCR for page 28
28 INFORMATICS NEEDS AND CHALLENGES IN CANCER RESEARCH
Community Participation in Moving Informatics Forward
During the discussion, there was much interest in involving patients
as well as the research community in efforts to advance cancer informatics.
Questions remain regarding how best to interact with NCI on caBIG activi-
ties, the potential role of publicly developed apps, open source software,
the development of institutional systems, and the importance of patient-
reported outcomes data.
NCI is very open and receptive to communications from interested
parties, Masys said. George Komatsoulis, interim director for the Center for
Biomedical Informatics and Information Technology at NCI, added that
caBIG is continuing to move forward, and there are "workspaces" in inte-
grative basic biology, clinical trials, data standards, imaging, biospecimens,
and other areas where individuals can bring their ideas to the attention of
the scientific advisory group and the caBIG program staff.
Lynn Etheredge of George Washington University asked if a national
apps strategy for cancer informatics would be appropriate, that is, whether
individuals could begin to develop the needed functionalities. Masys noted
that apps, as they are commonly understood, tend to be fairly small-scale
computer programs. They have value as lightweight applications that handle
very common tasks on a personal level, often on a personal device such as a
smartphone or tablet. However, their level of complexity is usually around
two orders of magnitude less than enterprise-level software (e.g., that used
by researchers for regular data inputs required for supporting sponsors),
with lower levels of data manipulation, storage, and security. Enterprise-
level software requires a more in-depth approach to implementation and, in
health care and clinical research, may also involve workflow modification.
Steven Piantadosi asked about open source software relative to caBIG.
Masys responded that open source software is part of the caBIG infra-
structure: i2b2 is open source, while REDCap is an academic consortium
model where a limited number of people have the ability to contribute to
the code base. Masys cautioned that in a full open source software model
(where anybody in the community can contribute to source), modules can
be introduced that have unintended transitive consequences (i.e., interfere
with other functions). As such, some open source models use quality control
mechanisms such as curators or stewards.
Mendelsohn noted that many organizations are trying to develop their
own internal standardized record of clinical trials and asked whether these
organizations should continue to spend scarce resources, time, and effort
OCR for page 29
OVERVIEW OF THE CANCER INFORMATICS LANDSCAPE 29
developing these independent information systems in light of the national
effort.
Masys responded that any major cancer center is going to have two
overlapping spheres of development, one that is outwardly focused and
one that is local and specific to the organization. There will then be transi-
tion applications that facilitate sharing sets of data by formatting them
and ensuring that they meet the standard for external collaborations. The
cooperative groups, for example, are well served by tools such as Medidata
Rave, which, while not perfect, is implemented on a large scale and provides
a conceptual basis for sharing information, he said.
Mark Gorman, cancer survivor and patient advocate, expressed support
for the patient-centered elements of the information systems discussed, for
example, those designed to collect patient-reported outcome information.
There is also great potential for informatics to support patients in their
decision making and in managing their own care. Dalton noted that patient
portals are very popular, with more than 70 percent of new cancer patients
at Moffitt creating their portal before their first visit. Shulman added that
the power of the systems discussed is the multiple sources of complementary
data obtained by different methodologies, and this includes the patient
portals being implemented by many cancer centers.
REFERENCES
Abernethy, A. P., L. M. Etheredge, P. A. Ganz, P. Wallace, R. R. German, C. Neti, P. B. Bach,
and S. B. Murphy. 2010. Rapid-learning system for cancer care. Journal of Clinical
Oncology 28(27):4268-4274.
Dalton, W. S., D. M. Sullivan, T. J. Yeatman, and D. A. Fenstermacher. 2010. The 2010
Health Care Reform Act: A potential opportunity to advance cancer research by taking
cancer personally. Clinical Cancer Research 16(24):5987-5996.
Danaei, G., M. Tavakkoli, and M. A. Hernán. 2012. Bias in observational studies of prevalent
users: Lessons for comparative effectiveness research from a meta-analysis of statins.
American Journal of Epidemiology 175(4):250-262.
Fenstermacher, D. A., R. M. Wenham, D. E. Rollison, and W. S. Dalton. 2011. Implementing
personalized medicine in a cancer center. Cancer Journal 17(6):528-536.
IOM (Institute of Medicine). 2010. A foundation for evidence-driven practice: A rapid learning
system for cancer care: Workshop summary. Washington, DC: The National Academies
Press.
NCI (National Cancer Institute). 2011. An assessment of the impact of the NCI Cancer
Biomedical Informatics Grid (caBIG®). http://deainfo.nci.nih.gov/advisory/bsa/
bsa0311/caBIGfinalReport.pdf (accessed July 11, 2012).
OCR for page 30