Summary¹

The 21st century has already seen the emergence of four pandemic viruses (chikungunya virus, Zika virus, 2009 H1N1 influenza virus, and severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]), several viral epidemics (e.g., 2003 SARS, 2012 Middle East respiratory syndrome [MERS-CoV], 2014 Ebola virus in West Africa, and 2018 Ebola virus in the Democratic Republic of the Congo), and intermittent sporadic outbreaks of other viruses such as H7N9 influenza. At the time of this writing, SARS-CoV-2 had spread worldwide, infecting at least 10 million people and causing an estimated 500,000 deaths within 6 months. This succession of outbreaks suggests that preparedness and response strategies need modernization. New advances in metagenomics, epidemiology, and big data analyses provide new paradigms for tracing symptomatic and asymptomatic transmission networks, thereby strengthening our capacity to break or delay virus transmission and reduce morbidity and mortality.

Recognizing this need, and at the request of the Department of Health and Human Services Office of the Assistant Secretary for Preparedness and Response (HHS/ASPR) and the Office of Science and Technology Policy (OSTP), the National Academies of Sciences, Engineering, and Medicine convened an ad hoc committee to lay out a framework that defines and describes the data needs for a system to track and correlate viral genome sequences with clinical and epidemiological data. Such a system would help ensure the integration of data on viral evolution with detection, diagnostic, and countermeasure efforts.

Previous efforts to integrate genomic, clinical, and epidemiological data have led to new insights into the transmission and pathogenesis of disease, including for previous outbreaks of SARS-CoV, Ebola virus, Zika virus, seasonal influenza, mumps, foodborne illnesses, and antibiotic-resistant bacteria. The most successful efforts to date have been multipronged and have relied on the timely collaboration of public and private stakeholders.
¹ This summary does not include references. Citations for the discussion presented in the Summary appear in the subsequent report chapters.

CURRENT GENOMIC EPIDEMIOLOGY EFFORTS FOR SARS-COV-2

Several ongoing efforts are leveraging the power of genomic epidemiology in response to the COVID-19 pandemic. In the United States, the Centers for Disease Control and Prevention's (CDC's) SARS-CoV-2 Sequencing for Public Health Emergency Response, Epidemiology, and Surveillance (SPHERES) consortium is working to coordinate a nationwide genomic sequencing effort. The National Institutes of Health (NIH) supports the National COVID Cohort Collaborative (N3C), a secure portal for patient-level COVID-19 clinical data, and the National Center for Biotechnology Information's (NCBI's) reference sequence database. Several regional initiatives have emerged as well, integrating data sharing through existing global efforts like the Global Initiative on Sharing All Influenza Data (GISAID) and Nextstrain.

Even as new efforts are being established, the committee found that several limitations blunt their effectiveness: insufficient funding, poor coordination, limited capacity for data integration, unrepresentative data, and the lack of an adequately trained workforce with the multifaceted expertise needed to conduct this work. Fundamental governance and collaboration issues extending from the top down have led to fragmented approaches and varying capacities at local and national levels.

Conclusion: Current sources of SARS-CoV-2 genome sequence data, and current efforts to integrate these data with relevant epidemiological and clinical data, are patchy, typically passive, reactive, uncoordinated, and underfunded in the United States. As a result, currently available data are unrepresentative of many important population features, biased, and inadequate to answer many of the pressing questions about the evolution and transmission of the virus and about the relationships of genome sequence variants with virulence, pathogenesis, clinical outcomes, and the effectiveness of countermeasures. Thus, the viral sequence data and associated data needed are not being collected.

RECOMMENDATION 1. The U.S. Department of Health and Human Services should ensure the generation of representative, high-quality full genome sequences of SARS-CoV-2 across the United States and, in the future, of emerging epidemic or pandemic pathogens, so that these data can be used to meet key needs for genomic surveillance.
• Pathogen samples must be obtained from individuals who represent a broad diversity of factors such as race and ethnicity, gender, age, geography, and other demographic features such as housing type, clinical manifestations and outcomes, and transmissibility.
• Capacity for genomic sequencing should be developed and supported at many geographically distributed sites performing testing, including public health laboratories and academic and medical centers.
• Representative SARS-CoV-2 clinical samples from across the United States should be collected and sequenced on an ongoing basis to provide baseline data and facilitate near-real-time transmission tracking.
• Genome sequences should be shared openly on publicly accessible databases, such as those of the National Center for Biotechnology Information, linked to the Global Initiative on Sharing All Influenza Data.

BUILDING A FRAMEWORK TO TRACK AND CORRELATE VIRAL GENOME SEQUENCES WITH CLINICAL AND EPIDEMIOLOGICAL DATA

Understanding the evolution of SARS-CoV-2 and its implications for transmission and clinical manifestations requires interpreting genomic data (see Recommendation 1) alongside linked clinical and epidemiological data. Table S-1 briefly outlines how viral genome sequence
data, when combined with other types of data, can be used to inform questions related to transmission, evolution, and clinical disease.

TABLE S-1 Summary Table of Considerations for Transmission, Evolution, and Clinical Disease

| Goal | Question | Viral Genomic Sequence Data Needs | Clinical and/or Epidemiologic Data Needs (a) |
|---|---|---|---|
| Transmission patterns | Is outbreak due to multiple introductions? Where is the virus coming from? | Pathogen samples from individuals who represent broad diversity from outbreaks and many regions/countries | Time and place of virus isolation and travel history of cases |
| Transmission patterns | Is outbreak due to local spread? How and/or where is the virus being transmitted? | Sequences from local groups/areas with increased incidence rates | Local population-based information on sites of exposure, gatherings, isolated communities, and congregate living (long-term care facilities, hospitals, prisons) |
| Transmission patterns | Is there evidence of super-spreading events and how important are they? | Sequences of virus from groups of people infected in the same setting | Information on sites of exposure, gatherings |
| Evolution/influence of selective pressures | Is the virus changing in transmissibility? | Changes in viral genome sequence associated with increased spread | Calculations of R0 (contact tracing data: number of people infected) |
| Evolution/influence of selective pressures | Is resistance to antiviral drugs or other treatments changing? | Changes in viral genome associated with failure to respond to treatment | Hospital or health care center data on patients who do not respond to therapy or show failure of treatment |
| Evolution/influence of selective pressures | Is there altered escape from the host immune response/within host evolution? | Changes in viral genome associated with persistence | Hospital data on patients who show prolonged shedding |
| Evolution/influence of selective pressures | Is there changed protection from vaccine-induced immunity? | Changes in virus that affect epitopes important for protective immunity and sequences of viruses associated with vaccine failure | Vaccine trial databases and post-marketing vaccine failures |
| Clinical disease | Are there strains/mutations associated with changes in disease severity? | Sequences of viruses from patients with different disease severity | Severity of symptoms, ICU, ventilation, mortality, length of hospitalization, co-infections |
| Clinical disease | Are there strains/mutations that affect virus loads or clearance? | Sequences of viruses from patients with viral load data | RT-PCR data to measure viral load of respiratory secretions, blood, feces over time |
| Clinical disease | Are there strains/mutations that affect response to different treatments? | Sequences of viruses from before and after treatment | Treatment type, duration, and outcome |
| Clinical disease | Are there strains/mutations that are associated with response to different treatments? | Sequences of viruses from different body sites and patients with and without specific complications | Clinical data on complications related to different organ systems (e.g., kidney, liver, nervous system) |
| Clinical disease | Are there strains/mutations that predispose to multisystem inflammatory syndrome in children (MIS-C)? | Sequences of viruses from children in the same community/family with and without MIS-C | Clinical data over time on immune response, viral load, treatment, and response |

(a) The committee recognizes that clinical and epidemiological data often come from very different data collection sources and efforts, but for the purposes of this table these data needs have been incorporated into one column.
NOTE: ICU = Intensive Care Unit; MIS-C = Multisystem Inflammatory Syndrome in Children; RT-PCR = Reverse Transcription Polymerase Chain Reaction; R0 = Basic Reproduction Number
SOURCE: Committee

Answering the questions outlined in Table S-1 will require integrating data from multiple sources. Currently, no central repository exists for the collection and curation of infectious disease outbreak data from sources such as federal, state, and local health agencies, health care networks, and public health and clinical laboratories. In order to create a more integrated data system, insights can be gleaned from existing efforts to integrate data.
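As a purely illustrative aside, the kind of linkage discussed here can be sketched in a few lines of Python. The sketch is not drawn from the report or from any existing system: the record fields, the `pseudonymize` and `link_records` helpers, the secret key, and the sample values are all hypothetical. It assumes a trusted party applies a keyed hash to a shared patient or specimen identifier so that sequence records and clinical records can be joined without the raw identifier being exchanged.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical secret held only by a trusted linkage authority (assumption).
LINKAGE_KEY = b"example-key-for-illustration-only"


def pseudonymize(raw_id: str, key: bytes = LINKAGE_KEY) -> str:
    """Return a keyed SHA-256 hash of an identifier for use as a join key."""
    return hmac.new(key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass(frozen=True)
class SequenceRecord:
    link_id: str          # pseudonymous key, not the raw identifier
    lineage: str          # hypothetical label for the viral genome
    collection_date: str  # ISO date string


@dataclass(frozen=True)
class ClinicalRecord:
    link_id: str
    icu_admission: bool
    county: str


def link_records(sequences, clinical):
    """Inner-join sequence and clinical records on the pseudonymous key."""
    clinical_by_id = {c.link_id: c for c in clinical}
    return [
        (s, clinical_by_id[s.link_id])
        for s in sequences
        if s.link_id in clinical_by_id
    ]


# Made-up records for illustration only.
sequences = [
    SequenceRecord(pseudonymize("patient-001"), "lineage-X", "2020-06-01"),
    SequenceRecord(pseudonymize("patient-002"), "lineage-Y", "2020-06-03"),
]
clinical = [ClinicalRecord(pseudonymize("patient-001"), True, "County A")]

linked = link_records(sequences, clinical)
```

Only the first made-up sequence has a matching clinical record, so `linked` holds a single pair. A keyed hash is one possible design choice among several; a real system would also need key management, governance, and protection against re-identification from the linked attributes themselves.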
Leveraging and expanding existing infrastructure and planning, through programs such as N3C, will be crucial to addressing the data infrastructure challenge in a way that is strategic, innovative, and iterative.

RECOMMENDATION 2. The U.S. Department of Health and Human Services should develop and invest in a national data infrastructure system that constructively builds on existing programmatic infrastructure and can accurately, efficiently, and safely link genomic data, clinical data, epidemiological data, and other relevant data across the multiple sources critical to a public health response such as the current SARS-CoV-2 outbreak. Such a system should:
• Allow for the linkage of genomic data, clinical data, epidemiological data, and other relevant data in a way that is not overly burdensome to laboratories that collect data regularly.
• Create and foster safe data sharing practices to ensure that individuals' personal identifying information remains unexposed when data are being used and shared across the system.
• Be grounded in the pursuit of standardization, interoperability, flexibility, and the practical linkage of data, including consideration of a potential national patient identifier.
• Consider not only the data required to create such a system, but also investment in mechanisms supporting the collection and analysis of such data, including promoting formal education in "data wrangling" at the intersection of data science and infectious disease epidemiology.
• Conduct annual reviews, including scenario-based simulations, to identify capacity gaps, promote process improvement (based on existing U.S. infrastructure to assess the annual risk of seasonal influenza, work could improve usability and coverage of health information exchanges, and other initiatives), and ensure inclusion of entities with supporting functions across scales (including private health care systems that provide data and state and local public health laboratories that collect data) in ongoing system development and evaluation.

GOVERNANCE AND LEADERSHIP

In the United States, neither federal nor state laws protect or mandate the sharing of samples or viral sequence data. As such, any sharing of such data and samples is done voluntarily and generally without concerns about possible regulatory barriers. Conversely, federal and state laws protect clinical and epidemiological data, including through the Health Insurance Portability and Accountability Act (HIPAA) and the Common Rule at the federal level. The sharing of viral sequence data and associated information should be guided by national-level leadership to create supportive legal or strategic frameworks that instill principles of good governance.
These data sharing and reporting processes should be clearly established and resourced as an urgent matter, and prior to an emergency. Without a clear and urgent public health rationale, changing reporting processes during an emergency should be avoided, and an emergency should not justify departing from principles of good governance, including data transparency. Principles and elements of good governance include accountability processes that clarify authorities and responsibilities, as well as maintenance of transparency, equity, participation, and clear and certain legal protections for public health agencies, researchers, and individuals' rights.

RECOMMENDATION 3. The U.S. Department of Health and Human Services should establish an effective and sustainable science-driven leadership and governance structure for the use of SARS-CoV-2 genome sequences in addressing critical national public health and basic science issues, develop a national strategy, and ensure the funding needed for successful execution of the strategy.
• Leaders of this effort must have sufficient authorities and responsibilities to ensure that key issues are identified and prioritized, representative data are generated, and barriers to data sharing are diminished.
• A national strategy for SARS-CoV-2 genome sequences linked to clinical and epidemiological data should be developed that articulates goals, priorities, and a path for achieving them.
• A board with diverse relevant expertise should be established with broad authority to oversee and advise the national strategy for SARS-CoV-2 genome sequences linked to clinical and epidemiological data, and the delivery of actionable data for related investigations.