3 What Would a Knowledge Network and New Taxonomy Look Like?

In the previous chapter, the Committee outlined the reasons it concluded that the time is right to develop a Knowledge Network of Disease and New Taxonomy. But what would these resources look like and what implications would they have for disease classification, basic research, clinical care, and the health-care system? This chapter describes the Committee’s vision of a comprehensive Knowledge Network of Disease and New Taxonomy that would unite the biomedical-research, public-health, and health-care-delivery communities around the related goals of advancing our understanding of disease pathogenesis and improving health. The Committee envisions that the proposed resources would have several key features:

• They would drive development of a disease taxonomy that describes and defines diseases based on their intrinsic biology in addition to traditional physical “signs and symptoms”.
• They would go beyond description and be directly linked to a deeper understanding of disease mechanisms, pathogenesis, and treatments.
• They would be highly dynamic, continuously incorporating newly emerging disease information.
• They would be based on an Information Commons that draws upon as much disease-related information, from as large a number of individual patients, as possible.
• Much of the data that would populate the Information Commons would be generated during the ordinary course of clinical care.
TOWARD PRECISION MEDICINE

THE KNOWLEDGE NETWORK OF DISEASE WOULD INCORPORATE MULTIPLE PARAMETERS AND ENABLE A TAXONOMY HEAVILY ROOTED IN THE INTRINSIC BIOLOGY OF DISEASE

Physical signs and symptoms are the overt manifestations of disease observed by physicians and patients. However, symptoms are not the best descriptors of disease. Symptoms are often non-specific and rarely identify a disease unambiguously. Physical signs and symptoms are generally also difficult to measure quantitatively. Furthermore, numerous diseases—including some of the most common ones such as cancer, cardiovascular disease, and HIV infection—are asymptomatic in early stages. Indeed, in a strict sense, all diseases are presumably asymptomatic for some “latent period” following the initiation of pathological processes. As a consequence, diagnosis based on traditional “signs and symptoms” alone carries the risk of missing opportunities for prevention or early intervention, and can readily misdiagnose patients altogether. Even when histological analysis is performed, typically on tissue obtained after diseases become clinically evident, obtaining optimal diagnostic results can depend on supplementing standard histology with ancillary genetic or immunohistochemical testing to identify specific mutations or marker proteins.

Biology-based indicators of disease such as genetic mutations, marker-protein molecules, and other metabolites have the potential to be precise descriptors of disease. They can be measured accurately and precisely—be it in the form of a standardized biochemical assay or a genetic sequence—thus enabling comparison across datasets obtained from independent studies. Particularly when multiple molecular indicators are used in combination with conventional clinical, histological, and laboratory findings, they offer the opportunity for a more accurate and precise description and classification of disease.
Numerous molecularly-based disease markers are already available, and the number will grow rapidly in the future. Among the most prominent parameters of disease are an individual’s:

• Genome
• Transcriptome
• Proteome
• Metabolome
• Lipidome
• Epigenome

As discussed in Chapter 2, it is increasingly feasible to obtain substantial information about these biological features for each individual patient. The cost of sequencing an individual’s genome is rapidly dropping, and significant advances in the ability to globally and affordably characterize proteomes, metabolomes, lipidomes, epigenomes, and microbiomes of individual subjects will continue, creating the potential for an increasingly rich molecular characterization of individuals in the future. Eventually, it is likely that extensive molecular characterization of individuals will occur routinely as a normal part of health care—even prior to appearance of disease, thereby allowing the collection of data on both sick and healthy individuals on a scale vastly exceeding current practice. In addition to providing a new resource for research on disease processes, these data would provide a far more flexible and useful definition of the “normal” state, in all its diversity, than now exists. The ability to make such measurements on both non-affected tissues and in sites altered by disease would allow monitoring of the development and natural history of many disorders about which even the most basic information is presently unavailable.

THE INFORMATION COMMONS ON WHICH THE KNOWLEDGE NETWORK AND NEW TAXONOMY WOULD BE BASED WOULD INCORPORATE MUCH INFORMATION THAT CANNOT PRESENTLY BE DESCRIBED IN MOLECULAR TERMS

It is well recognized that health outcomes, disease phenotypes, and treatment response are determined by the individual and combined effects of various factors ranging from the molecular to the environmental (Collins 2004; IOM 2006; HealthyPeople.gov 2011). Gene-environment interactions have been implicated in a diverse group of diseases and pathological processes, including some psychological illnesses (Caspi et al. 2010), hypertension (Franks et al. 2004), tumor growth (J.B. Williams et al. 2009), HIV (Nunez et al. 2010), asthma (Chen et al. 2009), and cardiovascular reactivity (Williams et al. 2001; Snieder et al. 2002).
Furthermore, the fact that numerous genome-wide association studies (GWASs) have revealed rather modest, albeit highly statistically significant, hazard ratios of disease risk highlights the need to investigate interactions among genetic and non-genetic factors to identify specific disease risk factors not found in conventional GWAS studies (Khoury and Wacholder 2009; Murcray et al. 2009; Cornelis et al. 2010). Therefore, data added to the Information Commons should not be limited to molecular parameters as they are currently understood: patient-related data on environmental, behavioral, and socioeconomic factors will need to be considered as well in a thorough description of disease features¹ (see Box 3-1).

¹ As with all patient-related data in electronic medical records and contributed to the Information Commons, information in the exposome layer requires that attention be paid to data sharing, informed consent, and privacy issues; see discussion in Chapter 4.

Box 3-1 The Exposome

The exposome is a characterization of both exogenous and endogenous exposures that can have differential effects on disease predisposition at various stages during a person’s lifetime (Wild 2005; Rappaport 2011). The emerging science of exposomics is concerned with the application of innovative approaches to comprehensively measure a person’s exposure events, from conception to death, and determine how those exposures relate to health and disease (CDC 2010; NAS 2010; Rappaport 2011). A long-range goal is to ascertain the combined effects of these exposures by assessing the biomarkers and diseases they influence.

In its broadest definition, the exposome encompasses all exposures—internal (such as the microbiome, described elsewhere in this report) and external—across the lifespan. Physical environment (e.g., occupational hazards, exposure to industrial and household pollutants, water quality, climate, altitude, air pollution, and living conditions [Smith et al. 2008; Klecka et al. 2010; Alexeeff et al. 2011; Brookhart et al. 2011; Cutts et al. 2011; Yorifuji et al. 2011; Zanobetti et al. 2011; McMichael and Lindgren 2011]) and lifestyle and behavior (e.g., diet, physical activity, cultural practices, and use of addictive substances [DHHS 2010; Hu and Malik 2010; Arem et al. 2011]) are some of the more apparent exogenous exposures. However, the concept of the exposome extends beyond these factors to include social factors, such as socioeconomic status, quality of housing, neighborhood, social relationships, access to services, and experience of discrimination that can contribute to psychological stress, poor health, and health inequities (Epel et al. 2004; Krieger et al. 2005; IOM 2006; Cole et al. 2007; Unnatural Causes 2008; Bruce et al. 2009; Gravlee 2009; Williams and Mohammed 2009; Cardarelli et al. 2010; Kim et al. 2010; Pollack et al. 2010; CDC 2011; Karelina and DeVries 2011; Sternthal et al. 2011; WHO 2011).

Despite the many practical and methodological challenges in characterizing and measuring these variables, rigorous evaluation of human exposures is needed. By incorporating data derived from multi-level assessments, a Knowledge Network of Disease could lead to better understanding of the variables and mechanisms underlying disease and health disparities, thereby helping to reveal a truer picture of the ecology of human health and facilitating a more holistic approach to health promotion and disease prevention.

Despite the focus on the individual patient in the creation of the Information Commons, the Committee expects that the inclusion of patients from diverse populations coupled with the incorporation of various types of information contained in the exposome will result in a Knowledge Network that could also inform the identification of population-level interventions and the improvement of population health. For example, a better understanding of the impact of a sedentary lifestyle at the molecular level could conceivably facilitate the development of new approaches to physical education in early childhood. In addition, findings from the Knowledge Network and the New Taxonomy could reveal yet unidentified behavioral, social, and environmental factors that
are associated with particular diseases or subclassifications of diseases in certain populations and are amenable to public health interventions.

The Healthy People 2020 Initiative (HealthyPeople.gov 2011) emphasizes an ecological approach to disease prevention and health promotion that focuses on both individual-level and population-level determinants of health and interventions. While molecular variables are often more easily measured and more directly tied to disease outcomes, if the modifiable factors that have contributed to the signature are known, we will be better able to prevent disease and to phenotype, genotype, and treat patients.

Asthma illustrates the interplay of social, behavioral, environmental, and genetic factors in disease classification. It is estimated that various types of asthma affect more than 300 million people worldwide. The term “asthma” is now used to refer to a set of “signs and symptoms” including reversible airway narrowing (“wheezing”), airway inflammation and remodeling, and airway hyper-reactivity. These various signs and symptoms likely reflect distinct etiologies in different patients. Many subjects with asthma have an allergic component, while in other cases, no clear allergic contributor can be defined (Hill et al. 2011; Lee et al. 2011). In some patients, asthma attacks are precipitated by exercise or aspirin (Cheong et al. 2011). Some patients, particularly those with severe asthma, may be resistant to treatment with corticosteroids (Searing et al. 2010). This phenomenological approach to asthma diagnosis has led to a plethora of asthma subtypes such as “allergic asthma,” “exercise-induced asthma,” and “steroid-resistant asthma” that may be clinically useful but provide little insight into underlying etiologies.
Over the years, linkage-analysis, candidate-gene, and genome-wide-association approaches have been applied to the study of the genetic underpinnings of asthma, leading to the identification of several associated genes and subphenotypes (Lee et al. 2011). However, these findings still leave most of the genetic influences of asthma unexplained (Li et al. 2010; Moffatt et al. 2010). Moreover, pediatric asthma research, in particular, has focused on a broad range of social and environmental, as well as genetic, contributors to the increased prevalence and severity of illness (Hill et al. 2011). Since the burden of asthma disproportionately affects children living in socioeconomically disadvantaged neighborhoods (D.R. Williams et al. 2009; Quinn et al. 2010), asthma may prove useful as a model for testing the Knowledge Network’s value in attaining a broader and deeper understanding of disease and health, in both the clinical and public-health policy domains. A Knowledge-Network-derived taxonomy based on the biology of disease may help to divide patients with asthma—as well as many other diseases—into subtypes in which the different etiologies of the disorder can be better understood, and for which appropriate, subtype-specific approaches to treatment and prevention can be devised and tested.
THE PROPOSED KNOWLEDGE NETWORK OF DISEASE WOULD INCLUDE INFORMATION ABOUT PATHOGENS AND OTHER MICROBES

Particularly because of advances in genomics, the proposed Knowledge Network of Disease has unprecedented potential to incorporate information about disease-causing and disease-associated microbial agents. Thousands of microbial genomes have been sequenced, providing a wealth of data on pathogenic and non-pathogenic organisms, and there has been an associated renaissance in studies of the molecular mechanisms of host-pathogen interactions. In parallel with these advances in microbiology, the analysis of human-genome sequences is enhancing the understanding of host responses and variation in individual susceptibility to microbial pathogens and infectious diseases. Today, sequence data, combined with other biochemical and microbiological information, are being used to understand microbial contribution to health, improve detection of pathogens, diagnose infectious diseases, and identify potential new targets for novel drugs and vaccines. In addition, comparing the sequences of different strains, species, and clinical isolates is crucial for identifying genetic polymorphisms that correlate with phenotypes such as drug resistance, morbidity, and infectivity. Combining this information with the molecular signature of the host will provide a more complete picture of an individual’s diseases, allowing custom-tailoring of therapeutic interventions.

THE PROPOSED KNOWLEDGE NETWORK OF DISEASE WOULD GO BEYOND DESCRIPTION

A Knowledge Network of Disease would aspire to go far beyond disease description. It would seek to provide a unifying framework within which basic biology, clinical research, and patient care could co-evolve. The scope of the Knowledge Network’s influence would encompass:

Disease classification. The use of multiple molecular-based parameters to characterize disease may lead to more accurate and finer-grained classification of disease (see Box 3-2). Disease classification is not merely an academic exercise: more nuanced diagnostic accuracy and ability to recognize disease subtypes would undoubtedly have important therapeutic consequences, allowing treatment regimes to be customized based on the precise molecular features of a patient’s disease.

Disease-mechanism discovery. A Knowledge Network in which diseases are increasingly understood and defined in terms of molecular pathways has the potential to accelerate discovery of underlying disease mechanisms. In a molecularly-based Knowledge Network, a researcher could readily compare the molecular fingerprint (such as one defined by the transcriptome or proteome) of a disease with an unknown pathogenic mechanism to the information available for better understood diseases. Similarities between the molecular profiles of diseases with known and unknown pathogenic mechanisms might point directly to shared disease mechanisms, or at least serve as a starting point for directed molecular interrogation of cellular pathways likely to be involved in the pathogenesis of both diseases.

Box 3-2 Distinguishing Disease Types

Recent progress in the classification of lymphomas illustrates how a Knowledge Network could help distinguish diseases or disease states with similar symptoms and clinical presentations. Gene-expression profiling led to the discovery that B-cell lymphomas comprise two distinct subtypes of disease with different driver mutations and different prognoses (Alizadeh et al. 2000; Sweetenham 2011). One subtype bears a gene-expression profile similar to germinal center B-cells and has a good prognosis, while a second subtype bears a gene-expression profile similar to activated B-cells and has a poor prognosis. Recognition of these biological and clinical differences between subtypes of B-cell lymphomas makes it possible to predict patient prognosis more accurately and guide treatment decisions.

Similarly, leukemias are also now categorized based on differences in driver mutations, revealing subtypes with different prognoses and responses to particular treatment approaches. Acute myeloid leukemias with FLT3/ITD mutations have a poorer prognosis than acute myeloid leukemias with a normal FLT3 gene (Kiyoi et al. 1999; Kottaridis et al. 2001). As a consequence, patients bearing FLT3/ITD mutations are more likely to receive allogeneic bone-marrow transplants or be offered experimental therapy with FLT3 kinase inhibitors, while patients who do not have FLT3/ITD mutations are more likely to be treated only with chemotherapy.

These are two of many known examples in which molecular data have been used to distinguish subtypes of malignancies with different prognoses and that benefit from different treatments. The proposed Knowledge Network of Disease could be expected to lead to many more insights of this type. By allowing any researcher to carry out analyses of this type on large numbers of patients, tracked over long periods of time, it is likely that insights such as the clinical relevance of FLT3 mutations in leukemia could be achieved for many other cancers and in situations where tumor behavior depends on a more complex interplay of influences.

Disease detection and diagnosis. A Knowledge Network that integrates data from many different levels of disease determinants collected from individual subjects over time may reveal new opportunities for detection and early diagnosis. The availability of information on a multitude of diverse diseases should facilitate epidemiological research to identify novel diagnostic markers based
on correlations among diverse datasets (including clinical, social, economic, environmental, and lifestyle factors) and disease incidence, treatment decisions, and outcomes. In some instances, these advances would follow from the new insights into pathogenic mechanisms discussed above. The most robust early-detection tests—for example, assessment of an asymptomatic patient’s HIV status—are based on a clear understanding of pathogenic mechanism. In other cases, however, molecular profiles may prove sufficiently predictive of a patient’s future health to have substantial clinical utility long before the mechanistic rationale of the correlation is understood.

Disease predisposition. A Knowledge Network of Disease that links information from many levels of disease determinants, from genetic to environment and lifestyle, will improve our ability to predict and survey for diseases. Following outcomes in individual patients over time will allow the prognostic value of molecular-based classifications to be tested and, ideally, verified. Multi-parameter data across the entire spectrum of disease will become available. Obviously, the clinical utility of identifying disease predispositions depends on the availability of interventions that would either prevent or delay onset of disease or perhaps ameliorate disease severity.

Disease treatment. The ultimate goal of most clinical research is to improve disease treatments and health outcomes. There are many ways in which a Knowledge Network of Disease and its derived taxonomy may be expected to impact disease treatment and to contribute to improved health outcomes for patients. Accurate diagnosis is the foundation of all medical interventions. As many of the examples already discussed illustrate, finer-grained diagnoses often are the key to choosing optimal treatments.
In some instances, a molecularly informed disease classification offers improved options for disease prevention or management even when different disease subtypes are treated identically (see Box 3-3). A Knowledge Network that integrates data from multiple levels of disease determinants will also facilitate the development of new therapies by identifying new therapeutic targets and may suggest off-label use of existing drugs. In other cases, the identification of links between environmental factors or lifestyle choices and disease incidence may make it possible to reduce disease incidence by lifestyle interventions.

Importantly, as discussed below, the Committee believes the Knowledge Network and its underlying Information Commons would enable the discovery of improved treatments by providing a powerful new research resource that would bring together researchers with diverse skills and integrate knowledge about disease processes in an unprecedented way. Indeed, it is quite possible that the transition to a modernized “discovery model,” in which disease data are generated during the course of normal health care and analyzed by a diverse set of researchers, would ultimately prove to be a Knowledge Network of Disease’s greatest legacy for biomedical research.

Box 3-3 Information to Guide Treatment Decisions

The example of a patient such as Patient 1 with breast cancer, described in the Introduction, illustrates the potential of a Knowledge Network of Disease to provide patients with valuable information even when there is no difference in treatment for different disease subtypes (e.g., sporadic vs. BRCA1/2-associated breast cancer). While mutations in the tumor-suppressor genes BRCA1 and BRCA2 strongly predispose women to breast and ovarian cancer, the extent to which particular germline mutations in these genes increase cancer risk often remains uncertain (Gayther et al. 1995). Consequently, patients and physicians must currently make decisions about whether to undertake more intensive cancer surveillance (for example, by breast magnetic resonance imaging or vaginal ultrasound) without being able clearly to assess the risks and benefits of such increased screening and the anxiety and potential morbidity that arises from inevitable false positives. Furthermore, some patients elect to undergo prophylactic mastectomies or oophorectomies without definitive information about the extent to which these drastic procedures actually would reduce their cancer risk.

Studies attempting to quantify these risks have largely focused on particular ethnic groups in which a limited set of mutations occur at high enough frequencies to allow reliable conclusions from analyses carried out on a practical scale. If BRCA1/2 genotypes and health histories could be compared across the large datasets currently segregated among different health-care organizations, it would become possible to assess accurately cancer risks for people with different mutations and genetic backgrounds. Such data would allow more rational recommendations regarding risk-reduction strategies, thereby creating enormous value for individual patients, health-care providers, and payers, by making it possible to avoid unnecessary screening and treatment while reducing cancer incidence and promoting early detection.

Drug development. Molecular similarities among seemingly unrelated diseases would also be of direct relevance to drug discovery, as they would lead to targeted investigation of disease-relevant pathways that are shared between molecularly related diseases. In addition, ongoing access to molecular profiles and health histories of large numbers of patients taking already-approved drugs would undoubtedly lead to improved drug safety by allowing identification of individuals at higher-than-normal risk of adverse drug reactions. Indeed, our limited understanding of—and lack of a robust system for studying—rare
adverse reactions is a major barrier to the introduction of new drugs in our increasingly risk-averse and litigious society.

Health disparities. Major disparities in the health profiles of different “racial”, ethnic, and socio-economic groups within our diverse society have proven discouragingly refractory to amelioration. As discussed above, it is quite likely that key contributors to these disparities can be most effectively addressed through public-health measures and other public policies that have little to do with the molecular basis of disease, at least as we presently understand it. However, the Committee regards the Information Commons and Knowledge Network of Disease as potentially powerful tools for understanding and addressing health disparities because they would be informed by data on the environmental and social factors that influence the health of individual patients. For the first time, these resources would bring together, in the same place, molecular profiles, health histories, and data on the many determinants of health and disease, thereby optimizing the ability to decipher the mechanisms through which exogenous factors give rise to endogenous, biological inputs, directly affecting health. Researchers and policy makers would then be better able to sort out the full diversity of possible reasons for observed individual and group differences in health and to devise effective strategies to prevent and combat them.

A HIERARCHY OF LARGE DATASETS WOULD BE THE FOUNDATION OF THE KNOWLEDGE NETWORK OF DISEASE AND ITS PRACTICAL APPLICATIONS

The establishment of a Knowledge Network, and its research and clinical applications, would depend on the availability of a hierarchy of large, well-integrated datasets describing what we know about human disease. These datasets would establish the foundation for the New Taxonomy and many other basic and applied activities throughout the health-care system.
The Information Commons would contain the raw information about individual patients from which meaningful links and relationships could be derived. Recognizing that the Knowledge Network would need to be informed by vast amounts of information external to the network itself, the Committee envisions the need for substantial research in medical informatics directed at all steps of the creation and curation of the network, and, equally importantly, its use by individuals with diverse backgrounds and goals. The creation of the Knowledge Network and its underlying Information Commons would enable the continuous compilation and analysis of molecular, environmental, behavioral, social, and clinical data in a dynamic, shared platform. Such an information platform would need to be accessible by users across the entire spectrum of research and clinical care, including payers. Data would be continuously deposited by the research
community and extracted directly from the medical records of participating patients. The roles of the different datasets in this information resource are schematized in Figure 3-1.

The precise structures of both the Information Commons and Knowledge Network of Disease remain to be determined and would be informed by pilot studies, as discussed in Chapter 4. However, given its purpose, the Committee envisions the Information Commons as (see also Figure 1-2):

Multilayered. Given the inclusion of multiple parameters ranging from genomic to environmentally modulated disease factors, the Information Commons would likely have a multi-layered structure with each layer containing the information for one disease parameter, such as “signs and symptoms”, genetic mutations, epigenetic patterns, metabolic characteristics, or other risk factors (including social, behavioral, and environmental influences).

Individual-centric. The Information Commons should register all measurements with respect to individuals so that the multitude of influences on pathophysiological states can be viewed at scales that span all the way from the molecular to the social level. Only in this way could, for example, individual environmental exposures be matched to individual changes in molecular profiles. These data would need to be stored in an escrowed, encrypted depository that allows graded release of data depending on the questions asked, the access level of the individual making the inquiry, and other parameters that would undoubtedly emerge in the course of pilot studies. The Committee realizes that this is a radical approach and intense public education and outreach about the value of the Information Commons to the progress of medicine would be essential to harness informed volunteerism, the support of disease-specific advocacy groups, and the engagement of other stakeholders.
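The idea of graded release can be made concrete with a small sketch. The report does not specify access tiers, record formats, or release rules, so everything below is an assumption made for illustration: the `AccessLevel` tiers, the `Measurement` record, the `release` function, and the sample data are all hypothetical, and encryption and escrow are deliberately left out.

```python
from dataclasses import dataclass
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical access tiers; the report does not define any."""
    PUBLIC = 0
    RESEARCHER = 1
    CLINICIAN = 2


@dataclass(frozen=True)
class Measurement:
    """One individual-centric record in one layer (illustrative only)."""
    individual_id: str       # pseudonymous identifier, not a real name
    layer: str               # e.g. "genome", "exposome", "signs_and_symptoms"
    value: str
    min_level: AccessLevel   # lowest tier allowed to see this record


def release(records, requester_level, layers=None):
    """Return the records the requester may see.

    If `layers` is given, the result is further limited to those layers,
    which stands in for "depending on the questions asked".
    """
    allowed = []
    for rec in records:
        if layers is not None and rec.layer not in layers:
            continue
        if requester_level >= rec.min_level:
            allowed.append(rec)
    return allowed


# Invented sample data for a single pseudonymous individual.
RECORDS = [
    Measurement("p001", "signs_and_symptoms", "wheezing", AccessLevel.RESEARCHER),
    Measurement("p001", "genome", "variant-A", AccessLevel.CLINICIAN),
    Measurement("p001", "exposome", "urban air pollution", AccessLevel.RESEARCHER),
]

if __name__ == "__main__":
    for level in AccessLevel:
        visible = [r.layer for r in release(RECORDS, level)]
        print(level.name, visible)
```

In this sketch a single ordered tier decides visibility; a real depository would presumably combine several criteria (consent, purpose of the query, aggregation level), which pilot studies of the kind the Committee describes would have to work out.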
The Committee regards careful handling of policies to ensure privacy as the central issue in its entire vision of the Information Commons, the Knowledge Network of Disease, and the New Taxonomy. Hence, this topic is discussed in more detail in Chapter 4. The Knowledge Network of Disease, created by integrating data in the Information Commons with fundamental biological knowledge, drawn from the biomedical literature and existing community databases such as GenBank, would be the centerpiece of the informational resources underlying the New Taxonomy. The Committee envisions this network as: Highly inter-connected. In order to extract relationship information between multiple parameters—for example, the transciptome and the exposome—the multiple data layers must be inter-connected (see Figure 3-1). Ideally, each in- formation layer would be connected to every other layer: thus, “signs and symp- toms” would be linked to mutations, mutations to metabolic defects, exposome
OCR for page 52
52 TOWARD PRECISION MEDICINE FIGURE 3-1 Building a biomedical Knowledge Network for basic discovery and Medicine. At the center of a comprehensive biomedical information network is an Information Figure 3-1 Commons that contains current disease information linked to individual patients and is continuously updated by a wide set Bitmapped of new data emerging though observational stud - ies during the course of normal health care. The data in the Information Commons and Knowledge Network serve three purposes: (1) they provide the basis to generate a dynamic, adaptive system that informs taxonomic classification of disease; (2) they provide the foundation for novel clinical approaches (diagnostics, treatments, strate - gies), and (3) they provide a resource for basic discovery. Validated findings that emerge from the Knowledge Network, such as those which define new diseases or subtypes of diseases that are clinically relevant (e.g., which have implications for patient prognosis or therapy) would be incorporated into the New Taxonomy to improve diagnosis (i.e., disease classification) and treatment. The fine-grained nature of the taxonomic classifica- tion would aid in clinical decision-making by more accurately defining disease. SOURCE: Committee on A Framework for Developing a New Taxonomy of Disease.
to the epigenome, and so forth. The links could be one-to-one but most commonly would be many-to-one and one-to-many (e.g., particular signs and symptoms arise when other parameters fall into many otherwise unrelated clusters). The interrelationships among such features within and between the layers could be characterized through a variety of representations that attempt to extract meaning from the Information Commons. For example, distinct transcriptomes may define several types of B-cell lymphomas. Meanwhile, different types of lymphomas, defined by transcriptome analysis, may have distinct metabolomic profiles. The similarities of multiple diseases could be discerned either from relationships among the networks of individual parameters (e.g., transcriptomes of multiple B-cell lymphomas) or from common patterns that emerge once multiple parameters are combined.

Flexible. A highly inter-connected Knowledge Network would link multiple individual networks of parameters in a flexible way. A user could choose to interrogate only a small part of the network by limiting his or her analysis to a single information layer, or even a small portion of this layer; alternatively, a user could interrogate the complex interrelationship of multiple parameters. High flexibility ensures easy cross-comparison and cross-correlation of any desired dataset, making the network a versatile tool for a wide spectrum of applications ranging from basic research to clinical studies and health system administration.

Widely accessible. The Knowledge Network would need to be accessible and usable by a wide range of stakeholders, from basic scientists to clinicians, health-care workers, and the public. Furthermore, the available information would need to be mineable in ways that are custom-tailored to the needs of different users, possibly by implementation of purpose-specific user interfaces.
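As a rough illustration of the "highly inter-connected" and "flexible" properties described above, the following Python sketch models a toy multi-layer network in which links may cross layers and a query may be restricted to a single layer. It is only a sketch under stated assumptions: the class `LayeredNetwork`, its methods, and all node and layer names are hypothetical and are not drawn from the Committee's report, which deliberately leaves implementation details open.

```python
from collections import defaultdict


class LayeredNetwork:
    """Toy multi-layer network: nodes live in named layers, links may cross layers."""

    def __init__(self):
        self.layer_of = {}                 # node name -> layer name
        self.neighbors = defaultdict(set)  # node name -> set of linked node names

    def add_node(self, node, layer):
        self.layer_of[node] = layer

    def link(self, a, b):
        # Links are undirected; a node may be linked to many others,
        # which allows one-to-many and many-to-one relationships.
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def linked(self, node, layers=None):
        """Return nodes linked to `node`, optionally restricted to some layers."""
        hits = self.neighbors[node]
        if layers is not None:
            hits = {n for n in hits if self.layer_of[n] in layers}
        return sorted(hits)


net = LayeredNetwork()
# Hypothetical placeholder nodes, for illustration only.
net.add_node("symptom_X", "signs_and_symptoms")
net.add_node("transcriptome_cluster_A", "transcriptome")
net.add_node("transcriptome_cluster_B", "transcriptome")
net.add_node("metabolome_profile_1", "metabolome")

# Many-to-one: two otherwise unrelated clusters are linked to the same symptom.
net.link("transcriptome_cluster_A", "symptom_X")
net.link("transcriptome_cluster_B", "symptom_X")
net.link("transcriptome_cluster_A", "metabolome_profile_1")

# Interrogate a single information layer ...
single_layer = net.linked("symptom_X", layers={"transcriptome"})
# ... or follow links across every layer.
across_layers = net.linked("transcriptome_cluster_A")
print(single_layer)
print(across_layers)
```

The optional `layers` argument is one simple way to express the choice between interrogating a single information layer and exploring relationships across several; a real system would presumably need far richer query and access-control mechanisms.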
While the Committee agreed upon the generalities listed above and illustrated in Figure 3-1 about the Information Commons and Knowledge Network, and their relationship to a New Taxonomy, specifics of implementation were considered beyond the scope of the Committee’s charge in a framework study. These specifics include the detailed design of the Information Commons, the information technology platforms used to create it, and questions about where key infrastructure should be physically housed, who would oversee it, and how the Information Commons would be financed. Nonetheless, dramatic developments in the fields of medical information technology, and other developments discussed in Chapter 2, give the Committee confidence that the creation and implementation of this ambitious and novel infrastructure is a feasible goal.
THE PROPOSED KNOWLEDGE NETWORK WOULD FUNDAMENTALLY DIFFER FROM CURRENT BIOMEDICAL INFORMATION SYSTEMS

Immense progress has been made during the past 25 years in organizing our knowledge of basic biology, health, and disease, even as many components of this knowledge base have grown super-exponentially. The National Library of Medicine and its National Center for Biotechnology Information division (NCBI), created in 1988, maintain the closest current counterpart to the information infrastructure that the Committee envisions. The NCBI maintains a vast array of information about basic biology, health, and disease, ranging from the PubMed system for indexing the biomedical literature to GenBank, the primary depository for DNA-sequence data, and its databases are queried daily by nearly anyone involved in biomedical research. So, what is the difference between the Committee’s vision of the Information Commons and Knowledge Network of Disease and reasonable extrapolations of what the NCBI has already accomplished?

The key difference is that the Information Commons, which would underlie the other databases, would be “individual-centric.” The various databanks curated by NCBI generally contain only a single disease parameter each. Even if multiple pieces of information from an individual make it into multiple databanks (say, a breast cancer patient’s transcriptome stored in the Gene Expression Omnibus (GEO) database of published microarray data and information about her chromosome translocations in the Cancer Chromosome databank), they are not linked between databases. An independent researcher who was not involved in the study that contributed these entries has no way of knowing that they are from the same individual. As a consequence, relationships between multiple parameters that determine disease status in a given individual are impossible to extract.
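The contrast drawn above can be made concrete with a small Python sketch. It assumes two siloed databanks whose entries carry only databank-local accession numbers, and an "individual-centric" store in which every record carries a shared, de-identified individual key. All record layouts, accession strings, and function names here are invented for illustration; they do not reflect the actual schemas of any NCBI resource.

```python
from collections import defaultdict

# Siloed databanks: each entry carries only a databank-local accession,
# so nothing ties the two entries to the same person.
expression_bank = [{"accession": "EXP-001", "transcriptome": "profile_1"}]
cytogenetics_bank = [{"accession": "CYT-417", "translocation": "t_example"}]


def shared_keys(bank_a, bank_b, key):
    """Return the key values that appear in both databanks."""
    return {r[key] for r in bank_a} & {r[key] for r in bank_b}


# Individual-centric store: every record carries a de-identified
# individual key (hypothetical), whatever layer it belongs to.
commons = [
    {"individual": "P1", "layer": "transcriptome", "value": "profile_1"},
    {"individual": "P1", "layer": "translocation", "value": "t_example"},
    {"individual": "P2", "layer": "transcriptome", "value": "profile_2"},
]


def profile_by_individual(records):
    """Group records from all layers under the individual they came from."""
    profiles = defaultdict(dict)
    for rec in records:
        profiles[rec["individual"]][rec["layer"]] = rec["value"]
    return dict(profiles)


siloed_links = shared_keys(expression_bank, cytogenetics_bank, "accession")
profiles = profile_by_individual(commons)

print(siloed_links)    # no common key, so the entries cannot be joined
print(profiles["P1"])  # both parameters recovered for one individual
```

In the siloed case the join key simply does not exist, so no amount of querying recovers the relationship; in the individual-centric case the multi-parameter profile falls out of a simple grouping step.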
However, motivated by the recent proliferation of genome-wide association studies (GWAS), NCBI has developed an individual-centric database, dbGaP (the database of Genotypes and Phenotypes). This database was “developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype” (NCBI 2011b). The Committee considers NCBI’s success in doing so, despite severe current constraints on the sharing of phenotypic information about individuals, as evidence that the obstacles to creating an Information Commons can be overcome. This issue is discussed in more detail in Chapter 4. However, the important point is that little of the NCBI’s vast current store of information could, even in principle, be organized along the lines suggested for the Information Commons. This information was not collected in a way that allows the individual to be the central organizing principle, and no amount of redesign of the inter-connections between different entries in the current system could achieve the goals the Committee has outlined. The Committee would like to emphasize the novelty and power of an
Information Commons that is “individual-centric.” As discussed in Chapter 2, a useful analogy is geographical information systems (GISs) such as Google Maps (see Figure 1-2). Following public access to the Global Positioning System (GPS) and dramatic improvements in database technology, in many ways analogous to the forces driving current advances in data generation and handling in biomedicine, it became apparent to many users of geographically indexed information that a surprisingly high portion of the world’s information could be organized around GPS coordinates. Like the proposed Information Commons, GISs are layered data structures that inter-connect vast amounts of information and can be mined for information that is not readily apparent in the primary GPS coordinates of an object. For example, given the coordinates of a large number of, say, backyard barbecue grills, one can suddenly overlay a vast amount of socio-economic, ethnic, climatological, and other data on what, at the start of the investigation, appeared a peculiar, anecdotal inquiry. In some respects, this approach is counter-intuitive. The GPS coordinates of someone’s backyard barbecue grill may appear to take one away from useful generalizations about grills: they reveal more detail than one might want to know about an individual grill without laying any obvious foundations for developing an integrated perspective on the cultural practice of backyard barbecuing. However, it is the precise GPS coordinates of an individual grill that are the key to inter-connecting whatever has been learned about this particular grill to a larger world of information.

Despite significant challenges to constructing an individual-centric Information Commons, the Committee concluded that this is a realistic undertaking and would be essential to the success of the Knowledge-Network/New Taxonomy initiative.
The Committee is of the opinion that “precision medicine,” designed to provide the best accessible care for each individual, is not achievable without a massive reorientation of the information systems on which researchers and health-care providers depend: these systems, like the medicine they aspire to support, must be individualized. Generalizations must be built up from information on large numbers of individuals. Efforts to reverse this process will fail since indispensable information is lost when molecular profiles, data on other aspects of an individual’s circumstances, and health histories are abstracted away from the individual at the very beginning of investigations into the determinants of health and disease.

A KNOWLEDGE NETWORK OF DISEASE WOULD CONTINUOUSLY EVOLVE

Although knowledge of disease, and particularly molecular mechanisms of pathogenesis, is still limited, the pace of progress has never been greater. New insights into the biology of disease are emerging rapidly from a wealth of molecular approaches, as well as from new insights into the importance
of environmental factors. However, the system for updating current disease taxonomies at intervals of many years does not permit the rapid incorporation of new information, thereby contributing to the delayed introduction of advances that have the potential, over time, to guide mainstream practice. The individual-centric nature of an Information Commons is an important means of ensuring that the data underlying the Knowledge Network, and its derived taxonomy, would be constantly updated. As participating patients undergo new tests and treatments, associated information would enter the Information Commons and, on the basis of these data, taxonomies such as the ICD could be updated continuously. Such a dynamic system would not only accept new inputs for established disease parameters; it would also accommodate new types of information generated by newly developed technologies to identify, acquire, measure, and analyze new biological features of disease.

THE NEW TAXONOMY WOULD REQUIRE CONTINUOUS VALIDATION

Bad information is worse than no information. A key feature of a clinically useful taxonomy is the requirement for a validation system. The logic of the classification scheme, and especially its utility for practical applications, needs to be carefully and continuously tested. This is particularly important when patients and clinicians use the New Taxonomy to inform clinical decisions. The New Taxonomy should be routinely tested to provide all stakeholders with data indicating the extent to which decisions guided by it can be made with confidence. Clearly, some patients and clinicians will be more comfortable than others with making decisions that are based on clinical intuition rather than proven evidence. However, a physician should be able to interrogate the Knowledge Network that underlies the New Taxonomy to learn whether others have had to make a similar decision and, if so, what the consequences were.
For example, if a drug has been introduced to target a particular driver mutation in a cancer, a physician needs to know whether or not rigorous clinical testing has determined that the drug is safe and effective. Is the drug effective only in some patients who can be identified in some way, such as by analyzing variants of genes that affect cell growth or drug metabolism? Similarly, if a laboratory test is considered to be a candidate predictor for the later development of disease, has that hypothesis been rigorously validated? Is the candidate test valid in some patient groups but not others? Whether a given test is used to identify predictors of disease or the existence of disease, the test result must be interpreted in the context of knowledge about the “normal range” of results. This requirement is not a trivial consideration, especially for tests based on integration of vast amounts of data, such as the genome, transcriptome, and metabolome of the patient. Even with a conventional sequencing test, it is often difficult to ascertain with certainty whether a sequence change is disease-causing or insignificant. This dilemma is multiplied many times over for genome-level testing. Some initial results from human whole-genome sequencing data indicate the scale of this problem: most individuals have dozens to hundreds of sequence variants that are readily recognizable, on biochemical grounds, as potentially pathogenic, such as variants that cause premature protein truncation or loss of normal stop codons (Ge et al. 2009; Pelak et al. 2010), yet the clinical significance of nearly all such variants remains obscure. Defining and continuously refining our understanding of the normal “reference range” for such tests would require being able to access and effectively analyze biological and other relevant clinical data derived from large and ethnically diverse populations. Ultimately, the Knowledge Network that underlies the New Taxonomy will make it possible to develop decision-support tools that synthesize information and alert health-care providers to all validated insights that emerge from the Knowledge Network and that are relevant to clinical decisions under consideration.

THE NEW TAXONOMY WOULD DEVELOP IN PARALLEL WITH THE CONTINUED USE OF CURRENT TAXONOMIES

Existing disease taxonomies, such as ICD, clearly have utility and are likely to continue to be employed throughout the health-care system far into the future. The organizational and financial costs of systematically replacing these systems would be substantial. Moreover, as noted above, those responsible for revision of the ICD taxonomy are actively engaged in incorporating molecular characteristics of disease into that system. Hence, it is quite possible that the New Taxonomy could ultimately subsume the ICD system, with the latter comprising the most rigorously validated subset of disease classifications.
Such issues must be addressed but, given the magnitude of the tasks associated with launching the creation of the Information Commons and the Knowledge Network of Disease, and seeing it through its formative stages, their consideration can safely be postponed for many years.

THE PROPOSED INFORMATIONAL INFRASTRUCTURE WOULD HAVE GLOBAL HEALTH IMPACT

A Knowledge Network of Disease should ultimately provide global benefits. Inevitably, the Knowledge Network initially would be devised primarily through data acquired, placed in the Information Commons, and analyzed by researchers and medical institutions in developed countries. However, a comprehensive and fully developed Knowledge Network of Disease must include the many diseases, including infectious diseases and disorders linked to geographically limited environmental exposures, that are endemic in low- and middle-income settings throughout the world. Therefore, the Knowledge Network effort should be extended to include analysis of data derived in these settings.

Improved precision in defining disease is of particular importance in regions of the world with under-developed health-care systems. Disease misdiagnosis in such settings has contributed to the improper use of therapy and the establishment of widespread drug resistance among disease-causing infectious agents. Malaria is one disease where misdiagnosis is common, with dramatic consequences and costs (D’Acremont et al. 2009). In general, patients presenting with fever in regions where malaria is endemic are administered anti-malarial treatment without direct evidence that the patient actually has malaria. In part, this practice is due to limited resources: the state-of-the-art diagnostic test in most areas is a microscopy-based blood-smear diagnosis, which requires expert training. The lack of adequate point-of-care diagnostic tests to ascertain whether the patient has malaria represents a significant impediment to the selection of appropriate targeted therapy. As a consequence, major efforts are under way to develop molecular diagnostics for malaria and other major killers such as tuberculosis (Boehme et al. 2010; Small and Pai 2010). Ultimately, such diagnostics will need to include tests that differentiate among various disease agents and also take into account genetic or molecular markers in the host that influence host responses to the infection or potential treatments. A globally relevant Information Commons and Knowledge Network could be useful in facilitating this process, for example, to distinguish between malaria caused by Plasmodium falciparum and malaria caused by Plasmodium vivax, which are susceptible to different anti-malarial drugs (malERA Consultative Group on Diagnoses and Diagnostics 2011). The Knowledge Network and its associated taxonomy should not be designed exclusively to meet the needs of countries with advanced medical systems.
Indeed, the individual-centric character of the Information Commons—and the inclusion of available data about contributing individuals, including information about where and in what circumstances they live—offers an unprecedented path toward a Knowledge Network of Disease that can meet global needs for health care and disease prevention.