National Academies Press: OpenBook

Genomic Epidemiology Data Infrastructure Needs for SARS-CoV-2: Modernizing Pandemic Response Strategies (2020)

Chapter: 4 Framework to Track and Correlate Viral Genome Sequences with Clinical and Epidemiological Data

Suggested Citation:"4 Framework to Track and Correlate Viral Genome Sequences with Clinical and Epidemiological Data." National Academies of Sciences, Engineering, and Medicine. 2020. Genomic Epidemiology Data Infrastructure Needs for SARS-CoV-2: Modernizing Pandemic Response Strategies. Washington, DC: The National Academies Press. doi: 10.17226/25879.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

4 Framework to Track and Correlate Viral Genome Sequences with Clinical and Epidemiological Data

To inform public health analysis of an infectious disease outbreak, the genomic sequence of the pathogen obtained from an infected person must be accurate and be linked with sufficient metadata for context. In this chapter, the committee lays out a framework to describe the types of clinical and epidemiological data that need to be linked to viral genome sequence data to answer specific questions related to transmission, evolution, treatment, and prevention of SARS-CoV-2, and in the future, new emerging epidemic or pandemic pathogens. It concludes with a discussion about the data integration and infrastructure considerations for a system to track and correlate genomic, clinical, and epidemiological data. Demographic factors, such as age or occupation, are also important components to understand disease transmission and data collection needs for specific populations.

CONSIDERATIONS FOR TRANSMISSION, EVOLUTION, AND CLINICAL DISEASE

Overarching Data Collection Considerations

Acquisition of genomic data is one piece (see Recommendation 1 in Chapter 3), but understanding the evolution of SARS-CoV-2 and its implications for transmission and clinical manifestations also relies on clinical and epidemiological data. The collection of clinical data is exceedingly important but also one of the biggest hurdles. Temporal and geographic information (date and location of specimen collection) is essential for assessing spread of the pathogen in time and space throughout the epidemic, establishing transmission chains, developing predictions, and identifying clusters of similar sequences as an indication of a super-spreading event, for example. Similarly, any recent travel to places, gatherings, or events that might currently or subsequently be recognized as areas of high disease activity is fundamentally important for mitigation.
Residence in a long-term care facility, recent (especially inpatient) clinical encounters, and close contact with a person known to have COVID-19 would be key variables for downstream use. Comorbid disease, immunosuppression, and disease severity may reveal associations with viral evolution that would otherwise be undecipherable. Preceding receipt of antiviral treatment, episode(s) of COVID-19, and any prior SARS-CoV-2 vaccination are likely to become increasingly relevant in the future, to contextualize SARS-CoV-2 evolution in response to selective pressure such as escape from antiviral responses. PREPUBLICATION COPY: UNCORRECTED PROOFS 39

A critical overarching consideration will be ensuring representation through participatory parties (Gould et al., 2017). To ensure that epidemiological sampling is representative of populations at risk, basic demographics should be linkable with the genomic sequence. A mixture of public health, health care, tribal leaders, bioethics, community health leaders, and those working in genomic epidemiology would be beneficial to help determine how best to represent all critical parties. In fact, it may be helpful to establish a proactive “push” team that helps resource-challenged areas, such as tribal territories and critical access hospitals, ensure they are afforded representation. Adequate representation should go beyond geographical considerations and should also include gender, race, ethnicity, living situation, and occupation.

Table 4-1 briefly outlines how viral genome sequence data, when combined with other types of data, can be used to inform questions related to transmission, evolution, and clinical disease.

TABLE 4-1 Summary Table of Considerations for Transmission, Evolution, and Clinical Disease

| Goal | Question | Viral Genomic Sequence Data Needs | Clinical and/or Epidemiological Data Needs(a) |
|---|---|---|---|
| Transmission patterns | Is outbreak due to multiple introductions? Where is the virus coming from? | Pathogen samples from individuals who represent broad diversity from outbreaks and many regions/countries | Time and place of virus isolation and travel history of cases |
| | Is outbreak due to local spread? How and/or where is the virus being transmitted? | Sequences from local groups/areas with increased incidence rates | Local population-based information on sites of exposure, gatherings, isolated communities and congregate living (long-term care facilities, hospitals, prisons) |
| | Is there evidence of super-spreading events and how important are they? | Sequences of virus from groups of people infected in the same setting | Information on sites of exposure, gatherings |
| Evolution/influence of selective pressures | Is the virus changing in transmissibility? | Changes in viral genome sequence associated with increased spread | Calculations of R0 (contact tracing data: number of people infected) |
| | Is resistance to antiviral drugs or other treatments changing? | Changes in viral genome associated with failure to respond to treatment | Hospital or health care center data on patients who do not respond to therapy or show failure of treatment |
| | Is there altered escape from the host immune response/within host evolution? | Changes in viral genome associated with persistence | Hospital data on patients who show prolonged shedding |
| | Is there changed protection from vaccine-induced immunity? | Changes in virus that affect epitopes important for protective immunity and sequences of viruses associated with vaccine failure | Vaccine trial databases and post-marketing vaccine failures |
| Clinical disease | Are there strains/mutations associated with changes in disease severity? | Sequences of viruses from patients with different disease severity | Severity of symptoms, ICU, ventilation, mortality, length of hospitalization, co-infections |
| | Are there strains/mutations that affect virus loads or clearance? | Sequences of viruses from patients with viral load data | RT-PCR data to measure viral load of respiratory secretions, blood, feces over time |
| | Are there strains/mutations that affect response to different treatments? | Sequences of viruses from before and after treatment | Treatment type, duration and outcome |
| | Are there strains/mutations that are associated with response to different treatments? | Sequences of viruses from different body sites and patients with and without specific complications | Clinical data on complications related to different organ systems (e.g., kidney, liver, nervous system) |
| | Are there strains/mutations that predispose to multisystem inflammatory syndrome in children (MIS-C)? | Sequences of viruses from children in the same community/family with and without MIS-C | Clinical data over time on immune response, viral load, treatment and response |

(a) The committee recognizes that clinical and epidemiological data often come from very different data collection sources and efforts, but for the purposes of this table these data needs have been incorporated into one column.
NOTE: ICU = Intensive Care Unit; MIS-C = Multisystem Inflammatory Syndrome in Children; RT-PCR = Reverse Transcription Polymerase Chain Reaction; R0 = Basic Reproductive Number
SOURCE: Committee

Transmission

Data on viral genomic sequences can answer questions related to the source(s) of the virus causing an outbreak. The “simplest use of genomic data” is to show how viral spread happens; combined with phylogeographic approaches, it can be used to detect transmission hot spots and help direct interventions (Holmes et al., 2016). For the current situation with SARS-CoV-2, such data can determine how the virus is spreading between individuals and within a community. Once a vaccine is available, these data can determine whether new cases are due to virus importation or to local spread. For instance, genomic epidemiology with knowledge of viral sequences from different regions is regularly used to determine whether cases of measles virus infection are due to introduction from countries with continued endemic measles or to chains of transmission within the community due to inadequate population immunity (Harvala et al., 2015; Penedos et al., 2015).

Where Is the Virus Coming From?

As described in Chapter 3, sequencing 87 SARS-CoV-2 genomes from infected patients early in the spread of COVID-19 in New York City demonstrated multiple independent introductions of dominant strains circulating in Europe followed by undetected local transmissions (Gonzalez-Reiche et al., 2020). In a hypothetical scenario, the reader should imagine the first group of college students arriving to a college campus in August 2020. If cases of COVID-19 begin to be detected in the days and weeks that follow, administrators and health care providers will need to respond in near real time. Important to their mitigation strategy will be distinguishing multiple independent introductions from local transmission. Understanding what proportion of students came to campus carrying SARS-CoV-2 strains from their home regions will require national and international baseline data. Moreover, to know which events to discourage, students will need to provide accurate data on their activities and contacts, many of whom they will not know. Genomic data linked to time, place, and exposure history will help to cluster cases, delineate local transmissions, and illuminate which epidemiological links need not be investigated further.

Where Is the Virus Being Transmitted?

Of particular epidemiological importance for SARS-CoV-2 is identification of route of transmission, asymptomatic spread, and super-spreading events.
Virus sequence data can help identify transmission via different pathways, both expected and unexpected (Holmes et al., 2017). For instance, SARS-CoV-2 RNA is frequently found in stool samples as well as respiratory secretions, with more persistent shedding from the gastrointestinal tract (Xu et al., 2020). Viral RNA in stool and as aerosols in the toilet areas of communal living facilities (Liu et al., 2020) may or may not represent infectious virus (Wölfel et al., 2020). Identification of fecal-oral transmission will require epidemiological information on exposures linked to virus sequence information and could have a substantial effect on public health interventions. Likewise, sites of virus persistence (particularly semen, as now recognized for Zika and Ebola viruses) provide opportunities for late transmission to reignite outbreaks after apparent control, and knowledge of such transmission affects public health interventions.

Super-spreading events and identification of the settings where they occur are of particular epidemiological importance. These events can only be identified with viral sequence data from multiple individuals involved in an outbreak linked to information on participant activities, such as religious services, sporting events, or concerts (Holmes et al., 2016).
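To make the idea of linking sequences to exposure information concrete, the following is a minimal sketch, not a method described by the committee. It groups cases by a shared exposure setting and flags settings where every pair of viral sequences differs at only a few positions. The record fields (`case_id`, `exposure_setting`, `sequence`), the thresholds `max_snps` and `min_size`, and the all-pairs rule are illustrative assumptions; real investigations rely on phylogenetic methods and full-length aligned genomes.

```python
from collections import defaultdict
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count differing positions between two aligned, equal-length sequences.

    Positions where either sequence has an ambiguous base (N) or a gap (-)
    are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in skip and b not in skip
    )


def candidate_clusters(records, max_snps=2, min_size=3):
    """Flag exposure settings whose cases carry near-identical sequences.

    Hypothetical rule: a setting is flagged when it has at least `min_size`
    cases and every pair of cases is within `max_snps` differences.
    """
    by_setting = defaultdict(list)
    for rec in records:
        # Cases without a recorded exposure setting cannot be linked.
        if rec.get("exposure_setting"):
            by_setting[rec["exposure_setting"]].append(rec)

    flagged = {}
    for setting, cases in by_setting.items():
        if len(cases) < min_size:
            continue
        if all(
            snp_distance(a["sequence"], b["sequence"]) <= max_snps
            for a, b in combinations(cases, 2)
        ):
            flagged[setting] = sorted(c["case_id"] for c in cases)
    return flagged


# Toy records with made-up 8-base "genomes", for illustration only.
example_records = [
    {"case_id": "A", "exposure_setting": "concert", "sequence": "ACGTACGT"},
    {"case_id": "B", "exposure_setting": "concert", "sequence": "ACGTACGA"},
    {"case_id": "C", "exposure_setting": "concert", "sequence": "ACGTNCGT"},
    {"case_id": "D", "exposure_setting": "gym", "sequence": "TTGTACGA"},
    {"case_id": "E", "exposure_setting": None, "sequence": "ACGTACGT"},
]
print(candidate_clusters(example_records))
```

The sketch also shows why the metadata matter: case E carries the same sequence as case A but has no recorded exposure setting, so it cannot be assigned to any event.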

Evolution and Influence of Selective Pressures

To better understand the evolution of SARS-CoV-2 in the United States or elsewhere, it would be ideal to integrate patient clinical data and genomic sequence data, with representation of both abundant and rare viral genotypes, and representative of geographic, gender, racial, ethnic, and other demographics. Of course, the difficulty of this goal is the challenge in obtaining such data, given the current lack of an efficient and reliable network to connect data drawn from local regions across the United States. Thus, it remains challenging to elucidate how the virus is currently evolving, which limits the ability to predict its future potential for evolution in the face of ongoing and novel selection pressures, such as vaccine development.

A brief comparison of the evolution patterns of SARS-CoV, Middle East respiratory syndrome (MERS)-CoV, and SARS-CoV-2 reveals interesting similarities and differences. Although SARS-CoV-2 studies suggest an emergence event involving a single lineage, it is clear that multiple introductions of SARS-CoV and MERS-CoV occurred early in the expanding epidemic (Liya et al., 2020). During the SARS-CoV epidemic, distinct mutations in the receptor-binding domain were critically associated with the emergence of middle- and late-phase isolates that spread geographically, but transiently, throughout the world (Hu et al., 2017). Other interesting differences include the high transmissibility of SARS-CoV-2 prior to disease symptom onset, while both SARS-CoV and MERS-CoV are primarily transmitted after clinical disease onset. Mortality rates among the three emerging coronaviruses are estimated at 1, 10, and 35 percent for SARS-CoV-2, SARS-CoV, and MERS-CoV, respectively.
While asymptomatic infections were rare in the 2003 SARS-CoV epidemic and are rare in the ongoing MERS-CoV outbreak, asymptomatic infections are common in SARS-CoV-2 infections, recently estimated to represent 40-50 percent of all cases (Feaster and Goh, 2020).

What are the genetic differences between SARS-CoV and SARS-CoV-2 that regulate these fundamental differences in transmissibility, virulence, and pathogenesis? Could highly virulent, highly transmissible coronavirus strains emerge from zoonotic sources or during an expanding epidemic or pandemic? How does virulence evolve after a zoonotic emergence event? What is the relationship between the evolution of virulence and coronavirus transmissibility? Based on studies in model organisms, the evolutionary relationships between virulence and transmissibility are thought to be complex and include examples of both synergistic and antagonistic relationships (Geoghegan and Holmes, 2018). Given the large diversity of novel coronaviruses harbored in bats and other animals, it is therefore conceivable that many worse, highly transmissible and highly virulent zoonotic coronaviruses may exist in nature that threaten human populations in the future. Consequently, fundamental insights into the evolutionary trade-offs and genetic relationships between SARS-CoV-2 evolution, virulence, and transmissibility may better inform global preparedness efforts designed to minimize the impact of consequential coronavirus disease outbreaks of the future (Messenger et al., 1999).

For example, if the mutation rate (replication fidelity) changes such that more allele substitutions occur per round of genome replication, it would indicate that greater variation and adaptive potential is available to the virus as raw fuel for evolution by natural selection (Duffy et al., 2008; Elena and Sanjuán, 2007).
In turn, this scenario could lead to adaptive change whereby the major virus variants become either more or less dangerous (virulent), such that increased or decreased mortality risk becomes associated with COVID-19. Thus, evolution of a higher mutation rate in the virus may not necessarily be problematic from a clinical or public health perspective, because viral adaptation may coincide with greater or lesser host morbidity and mortality. The adaptive potential of any biological system relies on a positive correlation between increased mutation rate and a larger number of useful (beneficial) mutations occurring in the population (Orr, 2000). That is, mutation rate can create more changes per unit time, but there is no guarantee that this will also create a larger fraction of beneficial mutations, because the latter is determined by how well spontaneous mutations provide an adaptive match to the selective challenges faced by the population. Nevertheless, even if the mutation rate of SARS-CoV-2 remains unchanged, the short generation times of the virus, coupled with the very large number of infected human hosts, create ample opportunity for rare spontaneous mutations to arise and spread over short periods of time, indicating enormous virus evolutionary potential.

Is Virus Transmissibility Changing?

To date, about a dozen mutations in the gene encoding the spike protein are accumulating and being evaluated for positive selection. A prominent D614G mutation identified both in China and Europe in January 2020 is expanding in geographic range and frequency across the world (Korber et al., 2020). The mutation is located on the interface between spike protomers, where it may alter stability in a way that enhances infectivity. Identification of this mutation is leading to more detailed studies aimed at unraveling its importance in the biology of SARS-CoV-2 and its relationship to other mutations in the genome that may contribute to the selective sweep of this genotype across the globe. In addition to the in vitro data on competitive cell entry and growth rates that have been released recently (Grubaugh et al., 2020; Hu et al., 2020; Korber et al., 2020; Zhang et al., 2020), it will be important to examine the time course and natural history of mutant and isogenic parental strain experimental infections in relevant whole animal models, as well as in naturally infected humans and households.
In addition, several other spike mutations have been observed in smaller clusters of cases. However, none have risen to global prominence. For example, signal peptide mutations (L5F and L8V) could potentially affect posttranslational modifications, folding, abundance, and glycosylation, while residue changes V367F, G476S, and V483A are found within the receptor-binding domain (RBD). However, only G476S is located at the RBD binding interface. The functional significance of these mutations in mammalian angiotensin-converting enzyme 2 (ACE2) interaction networks (primate and animal) remains unknown and warrants additional study. Finally, several other mutations occur in regions of unknown function (H49Y, Y145H/del, Q239K) and appear to be diminishing, or are remaining stable in the population and are located in and about the fusion machinery (A831V and D839Y/N/E) or the C-terminal end (P1263L). Other mutations have been recorded in ORF1ab and ORF8 regions, although their functional significance remains unknown (Chang et al., 2020).

Is the Virus Evolving in Response to Selective Pressures?

Many selective pressures could lead to evolved changes in viral traits that improve the success and spread of SARS-CoV-2 infection. For instance, increased virus particle stability in aerosols or on surfaces could promote transmission opportunities (van Doremalen et al., 2020), and methods exist to interrogate how spontaneous mutations can improve virus stability against environmental degradation (Ogbunugafor et al., 2013). Relatedly, increased time between transmission events should select for infectious viruses with increased particle stability, although evolution of increased particle stability tends to trade off with rate of genome replication, such that more-durable viruses suffer slower reproduction (Goldhill and Turner, 2014). However, it is unclear whether the stability–reproduction trade-off would reduce viral load in an infected human as examined in influenza virus (Handel et al., 2014), a highly relevant clinical concern for COVID-19. Therefore, prolonged social distancing could select for SARS-CoV-2 variants with increased particle stability that may or may not affect viral load during infection. If genomic epidemiology studies point to virus evolution at loci that affect SARS-CoV-2 particle stability, it would provide motivation to closely study whether clinical symptoms in infected patients are changing as well.

Additional selective pressures (e.g., antiviral and immune modulating treatments and eventually vaccination) are being introduced with unknown consequences for virus evolution. Antibody responses and drug activities are dependent on specific regions of the viral proteins. For instance, the drug remdesivir requires certain regions of the viral RdRp and ExoN (Agostini et al., 2018), and the receptor-binding domain of the spike protein is the main target for neutralizing antibody and most vaccines in development (Premkumar et al., 2020). Mutation in these proteins could affect treatment and vaccine efficacy. More recently identified host susceptibility loci on human chromosomes 3 and 9 may be unevenly distributed globally, potentially selecting for new variants of SARS-CoV-2 that interact specifically well with this human genotype (Ellinghaus et al., 2020). Identification of these viral changes and their importance requires linkage of patient metadata to genome sequencing, to determine response to treatments and identify instances of vaccine failure. It will be critical to the control of this pandemic to recognize both types of evolution in real time to be able to institute corrective action. Together, these data reveal a critical need for reverse genetic strategies and well-developed models to investigate the role of these mutations in SARS-CoV-2 biology and disease.
Moreover, these data reveal the critical need for linkage of patient metadata to genome sequencing, without which definitive causal epidemiological associations between genotype and phenotype cannot be determined.

Clinical Disease

There has been relatively little sequence variability among human SARS-CoV-2 isolates so far, so compelling associations between sequence variants/mutations and specific clinical outcomes or features have not yet been identified. Nonetheless, identification of virus strains with different clinical features would provide insights into disease pathogenesis and potentially identify patients requiring specific interventions. Linking virus sequence data with data on patient demographics, hospitalization, duration of hospitalization, clinical complications, intensive care unit (ICU) stay, co-infections, ventilation/duration, use of extracorporeal membrane oxygenation (ECMO) (MacLaren et al., 2020), duration of positive RT-PCR tests for SARS-CoV-2 RNA, and exposure/response to experimental treatments (e.g., remdesivir, convalescent plasma, dexamethasone, immune modulators) would facilitate identification of:

• Strains/mutations associated with changes in disease severity, viral loads, and viral shedding periods
• Strains/mutations associated with more co-infections or response to certain medical interventions, such as convalescent plasma
• Strains/mutations associated with specific complications, e.g., neurologic manifestations (Wood, 2020), vascular complications (Ackermann et al., 2020; Lang et al., 2020), hypercoagulable states, gastrointestinal manifestations (Ong et al., 2020), or fecal shedding
  o A question that underlies all of these organ system complications: Are there SARS-CoV-2 mutations that affect organ tropism in humans?
• Strains or mutations associated with the pediatric multisystem inflammatory syndrome (Feldstein et al., 2020)

OPPORTUNITIES TO SUPPORT DATA INTEGRATION

An essential component of all of this involves the timely integration of data. Information flows comprising viral genomic data, as well as associated clinical and epidemiological aspects, will be useless unless there are proactive efforts to distill the data into the most useful components and then to organize the data into an integrated format that can be used by researchers, health care providers, public health practitioners, and policy makers. Given the potentially large scale of the universe of data that might be available, it will be important to determine what types of information are thought to be the most relevant. Given the uncertainty as to how the current or future pandemics might progress, however, it will also be important to build flexibility and expansion capability into the resultant data management system in order to accommodate additional sources and types of data.

A national system for integrating genomic, clinical, and epidemiological data collected during an infectious disease outbreak would receive large volumes of data coming in from multiple sources, including federal agencies, state and local public health agencies, health care networks, and clinical laboratories. Currently, no central repository exists for these different types of data, and the entities contributing the information do not have dedicated staff to curate the data. Efforts to build an infrastructure to facilitate integration will likely face multiple challenges related to coordination, interoperability, flexibility, and privacy.
For instance, interoperability may be a challenge because incoming data will likely be shared in a range of different formats. Constraints across existing databases, such as differences in the content fields for inputting information, can preclude the ability to record and share the full range of relevant data. Laws and regulations that govern data sharing and privacy, as well as their local-level interpretations, can also potentially impact the data in variable ways, depending on the data source, relevant regulatory or legal restrictions, and concerns related to protected health information. Regulatory and governance considerations are discussed further in Chapter 5.

In terms of encouraging participation, tying participation into existing Medicare and Medicaid financial incentives, similar to the efforts of the BioSense Platform (Gould et al., 2017), can ensure a wide group of participants, including those in ambulatory settings. This effort would link into hospital data and help answer several important clinical questions. Hospital grants could also be made available for data agreements. With competing data reporting requirements, and the fact that a new system may create new expectations for laboratories that would normally dispose of samples, data collection must find a middle ground with a return on the investment. Providing real-time data pushes to participating parties, such as infection prevention programs or hospitals, is also an important consideration.

A prime opportunity to address these barriers is to develop agreed-upon standard data packages for submission to the system. Importantly, these packages should allow for some degree of variation for different sources and types of data. For example, a hospital laboratory would submit a comprehensive package of clinical and diagnostic data, while a commercial clinical laboratory would submit a larger population-based data package lacking clinical details. State and local public health agencies would likely have a variety of data types. To support this work, scoping of those data packages should be factored into any analytical plans.

An actual data repository, along with the requisite support and analytical staff, also needs to be established. A key component of building this repository is to establish countrywide reporting relationships across all levels to ensure that comprehensive data are being submitted. This data repository should be flexible (for example, it should ensure ease of adding new data types and fields) as well as accessible for advanced analytical methods, such as machine learning and artificial intelligence analyses to inform disease and epidemiology models.

INFRASTRUCTURE NEEDS

Most previous efforts to integrate genomic, clinical, and epidemiological data in response to viral or microbial outbreaks have been conducted on a small scale. To optimize the application of integrated data to inform the response to SARS-CoV-2 and future outbreaks, these efforts will need to be scaled up to nationwide infrastructure through which data can be shared and reported. A primary role of the U.S. Centers for Disease Control and Prevention (CDC) is epidemiological surveillance. The agency has links to each state-level health department as well as a global network of other national agencies, in which CDC serves as the country’s representative in international cooperation to fight emerging infectious diseases. The fields of clinical microbiology and epidemiology have now largely embraced genomic sequencing.
Although there have been successful efforts in applying genomic epidemiology to influenza and outbreaks of foodborne bacteria (see the case studies in Chapter 2), CDC has lagged behind in incorporating genomics to its full potential. CDC is responsible for funding public health laboratories nationwide to facilitate the integration of data; however, most of those laboratories remain substantially under-resourced.

To enable larger-scale collaboration and coordination of data in a national system, insights can be gleaned from the innovative elements and constraints of CDC’s ongoing efforts and from other existing regional networks of data integration. CDC’s PulseNet,34 established in 1996, allows members of the network to compare whole-genome sequencing of bacterial DNA to help detect and mitigate foodborne outbreaks. The National Action Plan for Combating Antibiotic-Resistant Bacteria35 (CARB), a national strategy (PCAST, 2014) to track antibiotic-resistant bacteria, led to the establishment of CDC’s Antibiotic Resistance Laboratory Network.36 The network strengthens national laboratory capacity to rapidly perform genomic epidemiological studies, as well as providing a mechanism for coordination and reporting. This served as the impetus for the coordination of all reporting across New York State that was leveraged for SARS-CoV-2.

34 See https://www.cdc.gov/pulsenet/index.html.
35 See https://aspe.hhs.gov/pdf-report/national-action-plan-combating-antibiotic-resistant-bacteria-progress-report-year-3.
36 See https://www.cdc.gov/drugresistance/solutions-initiative/ar-lab-network.html.
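
As a rough illustration of the standard data package idea raised under Opportunities to Support Data Integration, the Python sketch below models a package with a small required core plus optional clinical and extension sections, so that a hospital laboratory and a commercial laboratory could submit different levels of detail through the same structure. Every name in it (DataPackage, the field names, the submitter types, and the validation rule) is a hypothetical assumption made for illustration; this report does not define a schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical required core shared by every submitter (an assumption, not a standard).
CORE_FIELDS = ("specimen_id", "collection_date", "submitter_type")


@dataclass
class DataPackage:
    specimen_id: str
    collection_date: str                      # ISO 8601 date, e.g. "2020-06-24"
    submitter_type: str                       # e.g. "hospital_lab", "commercial_lab"
    sequence_accession: Optional[str] = None  # link to a deposited viral genome, if any
    clinical: dict = field(default_factory=dict)    # optional clinical details
    extensions: dict = field(default_factory=dict)  # room for new data types and fields


def validate(pkg: DataPackage) -> list:
    """Return a list of problems; an empty list means the package is acceptable."""
    problems = [f"missing {name}" for name in CORE_FIELDS if not getattr(pkg, name)]
    # Illustrative rule: hospital laboratories are expected to include clinical data,
    # while other submitter types may omit it.
    if pkg.submitter_type == "hospital_lab" and not pkg.clinical:
        problems.append("hospital_lab packages are expected to carry clinical data")
    return problems


# Made-up example submissions.
hospital = DataPackage("S-001", "2020-06-24", "hospital_lab",
                       sequence_accession="EXAMPLE-0001",
                       clinical={"hospitalized": True})
commercial = DataPackage("S-002", "2020-06-25", "commercial_lab",
                         extensions={"region": "Northeast"})
incomplete = DataPackage("", "2020-06-24", "hospital_lab")
```

Keeping the required core small and pushing everything else into optional sections is one way to allow the variation across sources that the text calls for; a real system would more likely express this in an agreed interchange standard than in ad hoc code.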

Enclave Model in the National COVID Cohort Collaborative (N3C) to Enable Linkage of Detailed Clinical Metadata

The National COVID Cohort Collaborative (N3C) embodies a massive, scalable collection of medical record data from people infected with SARS-CoV-2 in a centralized, secure enclave (see Chapter 3).37 N3C uses a project-specific hashed identifier constructed using data security standards to support linking data from disparate sources without revealing the personal identifiers used to generate the hashed ID (N3C, 2020). To support linking SARS-CoV-2 genome sequences to clinical metadata in N3C, viral genome sequences, or links (e.g., accession numbers) to their records in GenBank or the Global Initiative on Sharing All Influenza Data (GISAID), would need to be deposited into N3C. N3C is expected to contain data from 2 to 3 million people with confirmed SARS-CoV-2 infection by the end of 2020, and it is designed with the potential to accommodate data from the U.S. population. Data in N3C are converted to the OMOP standard (currently version 5.3.1) after ingestion, mapping, and harmonization from multiple supported data standards. Because data accessible in N3C are a limited data set under terms preventing re-identification, important epidemiological activities such as contact tracing are not supported; nonetheless, inclusion of SARS-CoV-2 genomic data in N3C would represent a clinically phenotyped collection of viral genomic sequences that could scale to the U.S. population.

Using Influenza Infrastructure to Integrate SARS-CoV-2 Data

Linking genomic data for SARS-CoV-2 with clinical and epidemiological data might be possible by utilizing pre-existing systems for tracking changes in the genomic structure of the influenza virus.
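
Returning briefly to the hashed identifier used in the N3C enclave model described above, the general idea of linking records without exposing personal identifiers can be sketched in a few lines of Python. The source does not describe N3C's actual algorithm, so this is a generic keyed-hash technique (HMAC with SHA-256) offered under stated assumptions, not N3C's implementation; the choice of identifying fields, the normalization rules, the secret key, and all record values are hypothetical.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Canonicalize a field so trivial formatting differences do not change the token."""
    folded = unicodedata.normalize("NFKD", value).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def linkage_token(project_key: bytes, first: str, last: str, dob_iso: str) -> str:
    """Return a keyed SHA-256 digest (hex) of the normalized identifiers."""
    message = "|".join([normalize(first), normalize(last), normalize(dob_iso)])
    return hmac.new(project_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: a project-specific secret held only by a trusted party.
PROJECT_KEY = b"example-project-secret"

# Made-up records from two sources that never exchange names or birth dates.
clinical = {
    linkage_token(PROJECT_KEY, "Ana", "O'Neil", "1980-02-03"): {"hospitalized": True},
}
sequences = {
    linkage_token(PROJECT_KEY, " ana ", "ONEIL", "1980-02-03"): {"accession": "EXAMPLE-0001"},
}

# Records are joined on the token alone.
linked = {
    tok: {**clinical[tok], **sequences[tok]}
    for tok in clinical.keys() & sequences.keys()
}
```

A keyed hash is used here rather than a plain hash because names and birth dates are guessable, and an unkeyed digest could be reversed by enumeration; how any real system manages keys and governs access is a policy question beyond this sketch.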
CDC collaborates with many partners in state, local, and territorial health departments and laboratories, offices of vital statistics, health care providers, clinics, and emergency departments to monitor influenza on an annual basis (CDC, 2020). The U.S. influenza surveillance system is designed to find out when and where influenza activity is occurring; determine what influenza viruses are circulating; detect changes in influenza viruses; and measure the impact influenza is having on outpatient illness, hospitalizations, and deaths (CDC, 2020). These goals are in line with what the committee proposes for the use of genomic data on SARS-CoV-2.

Approximately 100 public health and 300 clinical laboratories in all 50 states, Puerto Rico, Guam, and the District of Columbia participate in surveillance for influenza viruses through either the U.S.–World Health Organization (WHO) Collaborating Laboratories System or the National Respiratory and Enteric Virus Surveillance System (NREVSS). Data from clinical laboratories provide useful information on the timing and intensity of influenza activity from respiratory specimens largely obtained for diagnostic purposes. Public health laboratories provide data useful for understanding what influenza virus types, subtypes, and lineages are circulating and which age groups are being affected, as test specimens are collected primarily for the purposes of surveillance.

For genetic characterization, all influenza-positive surveillance samples are submitted for genomic sequencing by CDC to determine the genetic characteristics of circulating influenza viruses and to monitor the course of evolution of viruses circulating in the population under surveillance. Phylogenetic analysis classifies virus gene segments into genetic clades or subclades. CDC also tests a sample of the influenza viruses collected by public health laboratories for susceptibility to antivirals, such as neuraminidase inhibitors, using genomic sequencing analysis and/or a functional assay.

37 See https://ncats.nih.gov/news/releases/2020/NIH-launches-analytics-platform-to-harness-nationwide-COVID-19-patient-data-to-speed-treatments.

The U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet) collects information on outpatient visits to health care providers in all 50 states, Puerto Rico, the District of Columbia, and the U.S. Virgin Islands for influenza-like illness. More than 2,500 outpatient health care providers around the country report data to CDC every week, recording the total number of patients seen and, specifically, the number of those patients with influenza-like illness (ILI) by age group (0-4 years, 5-24 years, 25-49 years, 50-64 years, and ≥65 years).

The Influenza Hospitalization Surveillance Network (FluSurv-NET) monitors laboratory-confirmed influenza-associated hospitalizations in children younger than 18 years of age (since the 2003-2004 influenza season) and adults (since the 2005-2006 influenza season). High-risk medical conditions are extracted from patient medical charts at the time of hospitalization, including cardiovascular disease, chronic lung disease, immunocompromised condition, obesity, and pregnancy status, which match similar underlying conditions of interest in patients with COVID-19.

Health Information Exchanges

In the wake of the Health Information Technology for Economic and Clinical Health Act of 2009 (HITECH)38 and continuing financial incentives from the Centers for Medicare and Medicaid Services, there is widespread adoption of electronic medical records systems by hospitals and physician practices (CDC, 2019).
In addition, health information exchanges, built largely to facilitate the exchange of digital health information for clinical treatment purposes, exist across the country; according to one survey, 7 out of 10 hospitals in the United States belong to at least one nationwide health data sharing network (Johnson et al., 2018). In the 21st Century Cures Act, Congress required the U.S. Department of Health and Human Services (HHS) to establish a voluntary network to facilitate nationwide digital sharing of electronic health information, and a public–private partnership has been launched, led by the Sequoia Project, to create a national health information sharing “trusted exchange framework” pursuant to a common agreement (HealthIT.gov, 2020b). In addition, on May 1, 2020, the HHS Office of the National Coordinator for Health Information Technology finalized rules to prohibit health care providers, certified electronic medical record vendors, and health information exchanges from “blocking” the sharing of information, including for public health purposes; these rules will go into effect on November 2, 2020 (HealthIT.gov, 2020a). Although this national network is still in formation, certified electronic medical record vendors and health information exchanges across the country could be leveraged today to facilitate the sharing of clinical metadata that will help public health departments and researchers answer critical questions related to SARS-CoV-2 and COVID-19.

Implementation of interoperable health records systems must be cognizant of the potential for sensitive and private personal health information to be inadvertently shared en masse. This not only risks violating individuals’ rights to privacy and non-discrimination, but also undermines public trust and, as a result, the potential accuracy of the public health data gathered. Upgrading existing records systems will likely be necessary to allow for options to protect privacy, such as segmentation of data in a manner that achieves the goals of both sharing data and protecting irrelevant or sensitive individual personal health data (Rothstein and Tovino, 2019). While unrestricted access to shared data would incur privacy risks, the enclave model of N3C described above illustrates sharing of national-scale health data with strong privacy protections.

38 See https://www.govinfo.gov/content/pkg/PLAW-111publ5/pdf/PLAW-111publ5.pdf.

Participatory Surveillance

The growing field of participatory surveillance allows individuals to report symptoms of illness through crowd-sourced, voluntary systems that allow for community-level health monitoring (Smolinski et al., 2017). A number of participatory surveillance systems already exist worldwide. Most of these systems collect epidemiological data that are provided to public health authorities and research institutions and used to analyze trends and broaden surveillance beyond the traditional sentinel surveillance approach.

For instance, participatory surveillance provides a mechanism to collect information on influenza in the community at large. Because the majority of persons with influenza each year do not seek medical care (Biggerstaff et al., 2014; Van Cauteren et al., 2012; van Noort et al., 2007), a large number of self-reporting systems collect information on influenza-like illness (ILI). Boston Children’s Hospital’s Flu Near You39 is a self-reported ILI system that has combined laboratory testing for diagnosis of influenza or other respiratory pathogens with the epidemiological data collected by the open-source GoViral Study (Li, 2016). Individuals who report symptoms of illness compatible with influenza are provided with a home test kit and asked to collect a sputum sample for testing; the results of the test are then compared to the symptom data shared in the open-source system.
Such participatory surveillance systems could potentially be expanded through the use of home test kits that would allow for genomic sequencing of pathogens in the community.

Numerous community-based surveillance systems have arisen during the COVID-19 pandemic. For example, the Flu Near You system was adapted, with expanded symptoms related to SARS-CoV-2 infection, into the COVID Near You40 system. Other systems have arisen as longitudinal research projects that are collecting epidemiological data along with testing results for COVID-19. All of these systems offer opportunities to incorporate genomic sequencing, which could target data collection from specific subsets of the population or from people in specific geographic regions. Genomic sequencing could be added to existing systems for COVID-19 symptom reporting, tracking, and contact tracing in the United States.

PARTNERSHIPS, COORDINATION, AND CAPACITY CONSIDERATIONS

Fostering partnerships across laboratories in different sectors and at different levels, from state and local public health to clinical, academic, and commercial laboratories, will be critical for developing the capacity and facilitating the coordination necessary for national-level genomic data that can be integrated with clinical and epidemiological data. These efforts should seek to partner with a range of different laboratories to better represent the entire population. CDC’s SPHERES will likely cover a large proportion of the population, but better coverage could be achieved by partnering with the third-party private laboratories that are often contracted with health care systems (e.g., LabCorp or Quest Diagnostics). Hospital and clinical laboratories are also valuable sources of information, especially in rural or critical access areas. Tests are going unused in many of these settings, where facilities often lack laboratory capacity and local health systems face multiple barriers to utilizing their samples (Maxmen, 2020). In such settings, partnerships can also provide access to important data on hard-to-reach patient populations. These same partnerships should exist with academic and commercial laboratories, albeit with an awareness of capacity considerations.

39 See https://flunearyou.org.
40 See https://www.covidnearyou.org/us/en-US.

Data usability, capacity considerations, and key outputs (including the content and periodicity of reports) are important aspects of coordinating partnerships, particularly for smaller hospital and public health laboratories operating with limited resources. For example, close support, coordination, and capacity building are all valuable for mitigating the stress experienced by laboratories in the context of an infectious disease outbreak. Many of these laboratories may have valuable data on vulnerable patient populations that are not being reported into broader databases or larger systems due to a range of barriers, such as bureaucratic red tape, dependency on limited resources, and often outdated tools for communication and data sharing (e.g., faxing). Tying data collection and integration into hospitals’ meaningful use standards could be a beneficial approach for biosurveillance. Ultimately, however, the utilization of a low-cost approach that mitigates such barriers to collecting, analyzing, and sharing data is critical. Systems should be in place that enable hospitals to seamlessly push data to their key stakeholders, such as public health agencies, infection prevention programs, and clinical partners.
Furthermore, partnerships can help to ensure that these data are presented in a way that is beneficial to the end user. For instance, genomic data are of great value for many purposes, but clinical and epidemiological data may be more relevant and applicable to patient care and public health outcomes.

Workforce Capacity Development for Genomic Epidemiology

An important consideration in the development of such a system is the building of genome sequencing and analysis capabilities within public health agencies and health care systems. Even with advancing technology, multidisciplinary and highly trained teams remain the most valuable asset for combining genomic, clinical, and epidemiological data into actionable knowledge (Lesho et al., 2016). Since 2016, the Broad Institute has partnered with the Massachusetts State Public Health Laboratory and CDC to build distributed capacity for genomic sequencing through a train-the-trainer program for regional- and state-level laboratory personnel. This program, for example, could serve as a model for developing national coordination among state public health laboratories.

CONCLUDING REMARKS

There remains no central repository to house the large volume of individually identifiable data from the various actors involved in the public health response to SARS-CoV-2 in the United States, just as none existed for prior infectious disease outbreaks and none exists for future ones. While it is important to learn from several of the smaller-scale examples described in this report, scaling up the successful elements of these examples to the national level remains a major challenge. The committee recognizes that advancing beyond the current small-scale efforts to a national or even global repository is a challenging undertaking, but the current pandemic puts the lack of such a system in stark relief. Incremental efforts, such as establishing regional repositories, can be taken now and leveraged in the future for a large-scale effort.

As noted above, leveraging and expanding existing infrastructure and planning, through programs such as N3C, PulseNet, CARB, ILINet, and health information exchanges, will be crucial to addressing the data infrastructure challenge in a way that is both innovative and iterative. The creation of a system of data infrastructure built on a standard data package could cultivate a more interoperable data environment, a challenge of paramount importance when principles such as flexibility and privacy remain priorities. Ultimately, a data management and infrastructure system with investment in the proper resources, staff, and storage will be critical for the coordination of data needs in response to SARS-CoV-2 and future outbreak responses.

RECOMMENDATION 2. The U.S. Department of Health and Human Services should develop and invest in a national data infrastructure system that constructively builds on existing programmatic infrastructure with the ability to accurately, efficiently, and safely link genomic data, clinical data, epidemiological data, and other relevant data across multiple sources critical to a public health response such as the current SARS-CoV-2 outbreak. Such a system should:
• Allow for the linkage of genomic data, clinical data, epidemiological data, and other relevant data in a way that is not overly burdensome to laboratories that collect data regularly.
• Create and foster safe data sharing practices to ensure that individuals’ personal identifying information remains unexposed when data are being used and shared across the system.
• Be grounded in the pursuit of standardization, interoperability, flexibility, and the practical linkage of data, including consideration of a potential national patient identifier.
• Consider not only the data required to create such a system, but also investment in mechanisms supporting the collection and analysis of such data, including promoting formal education in “data wrangling” at the intersection of data science and infectious disease epidemiology.
• Conduct annual reviews, including scenario-based simulations, to identify capacity gaps, promote process improvement (for example, building on existing U.S. infrastructure for assessing the annual risk of seasonal influenza, such work could improve the usability and coverage of health information exchanges and other initiatives), and ensure inclusion of entities with supporting functions across scales, including private health care systems that provide data and state and local public health laboratories that collect data, in ongoing system development and evaluation.

REFERENCES

Ackermann, M., S. E. Verleden, M. Kuehnel, A. Haverich, T. Welte, F. Laenger, A. Vanstapel, C. Werlein, H. Stark, A. Tzankov, W. W. Li, V. W. Li, S. J. Mentzer, and D. Jonigk. 2020. Pulmonary

vascular endothelialitis, thrombosis, and angiogenesis in COVID-19. New England Journal of Medicine 383:120-128. Agostini, M. L., E. L. Andres, A. C. Sims, R. L. Graham, T. P. Sheahan, X. Lu, E. C. Smith, J. B. Case, J. Y. Feng, R. Jordan, A. S. Ray, T. Cihlar, D. Siegel, R. L. Mackman, M. O. Clarke, R. S. Baric, and M. R. Denison. 2018. Coronavirus susceptibility to the antiviral remdesivir (gs-5734) is mediated by the viral polymerase and the proofreading exoribonuclease. mBio 9(2):e00221-18. Biggerstaff, M., M. A. Jhung, C. Reed, A. M. Fry, L. Balluz, and L. Finelli. 2014. Influenza-like illness, the time to seek healthcare, and influenza antiviral receipt during the 2010-2011 influenza season- united states. The Journal of Infectious Diseases 210(4):535-544. CDC (U.S. Centers for Disease Control and Prevention). 2019. Public health and promoting interoperability programs: Introduction. https://www.cdc.gov/ehrmeaningfuluse/introduction.html (accessed July 7, 2020). CDC. 2020. U.S. influenza surveillance system: Purpose and methods. https://www.cdc.gov/flu/weekly/overview.htm (accessed June 24, 2020). Chang, T. J., D. M. Yang, M. L. Wang, K. H. Liang, P. H. Tsai, S. H. Chiou, T. H. Lin, and C. T. Wang. 2020. Genomic analysis and comparative multiple sequences of SARS-CoV-2. Journal of the Chinese Medical Association 83(6):537-543. Duffy, S., L. A. Shackelton, and E. C. Holmes. 2008. Rates of evolutionary change in viruses: Patterns and determinants. Nature Reviews Genetics 9(4):267-276. Elena, S. F., and R. Sanjuán. 2007. Virus evolution: Insights from an experimental approach. Annual Review of Ecology, Evolution, and Systematics 38(1):27-52. Ellinghaus, D., F. Degenhardt, L. Bujanda, M. Buti, A. Albillos, P. Invernizzi, J. Fernández, D. Prati, G. Baselli, R. Asselta, M. M. Grimsrud, C. Milani, F. Aziz, J. Kässens, S. May, M. Wendorff, L. Wienbrandt, F. Uellendahl-Werth, T. Zheng, X. Yi, R. de Pablo, A. G. Chercoles, A. Palom, A.-E. Garcia-Fernandez, F. 
Rodriguez-Frias, A. Zanella, A. Bandera, A. Protti, A. Aghemo, A. Lleo, A. Biondi, A. Caballero-Garralda, A. Gori, A. Tanck, A. Carreras Nolla, A. Latiano, A. L. Fracanzani, A. Peschuck, A. Julià, A. Pesenti, A. Voza, D. Jiménez, B. Mateos, B. Nafria Jimenez, C. Quereda, C. Paccapelo, C. Gassner, C. Angelini, C. Cea, A. Solier, D. Pestaña, E. Muñiz-Diaz, E. Sandoval, E. M. Paraboschi, E. Navas, F. García Sánchez, F. Ceriotti, F. Martinelli-Boneschi, F. Peyvandi, F. Blasi, L. Téllez, A. Blanco-Grau, G. Hemmrich-Stanisak, G. Grasselli, G. Costantino, G. Cardamone, G. Foti, S. Aneli, H. Kurihara, H. ElAbd, I. My, I. Galván-Femenia, J. Martín, J. Erdmann, J. Ferrusquía- Acosta, K. Garcia-Etxebarria, L. Izquierdo-Sanchez, L. R. Bettini, L. Sumoy, L. Terranova, L. Moreira, L. Santoro, L. Scudeller, F. Mesonero, L. Roade, M. C. Rühlemann, M. Schaefer, M. Carrabba, M. Riveiro-Barciela, M. E. Figuera Basso, M. G. Valsecchi, M. Hernandez-Tejero, M. Acosta-Herrera, M. D'Angiò, M. Baldini, M. Cazzaniga, M. Schulzky, M. Cecconi, M. Wittig, M. Ciccarelli, M. Rodríguez-Gandía, M. Bocciolone, M. Miozzo, N. Montano, N. Braun, N. Sacchi, N. Martínez, O. Özer, O. Palmieri, P. Faverio, P. Preatoni, P. Bonfanti, P. Omodei, P. Tentorio, P. Castro, P. M. Rodrigues, A. Blandino Ortiz, R. de Cid, R. Ferrer, R. Gualtierotti, R. Nieto, S. Goerg, S. Badalamenti, S. Marsal, G. Matullo, S. Pelusi, S. Juzenas, S. Aliberti, V. Monzani, V. Moreno, T. Wesse, T. L. Lenz, T. Pumarola, V. Rimoldi, S. Bosari, W. Albrecht, W. Peter, M. Romero-Gómez, M. D’Amato, S. Duga, J. M. Banales, J. R. Hov, T. Folseraas, L. Valenti, A. Franke, T. H. Karlsen. 2020. Genomewide association study of severe COVID-19 with respiratory failure. New England Journal of Medicine NEJMoa2020283. Feaster, M., and Y.-Y. Goh. 2020. High proportion of asymptomatic SARS-CoV-2 infections in 9 long- term care facilities, Pasadena, California, USA, April 2020. Emerging Infectious Diseases 26. Feldstein, L. R., E. B. Rose, S. M. 
Horwitz, J. P. Collins, M. M. Newhams, M. B. F. Son, J. W. Newburger, L. C. Kleinman, S. M. Heidemann, A. A. Martin, A. R. Singh, S. Li, K. M. Tarquinio, P. Jaggi, M. E. Oster, S. P. Zackai, J. Gillen, A. J. Ratner, R. F. Walsh, J. C. Fitzgerald, M. A. Keenaghan, H. Alharash, S. Doymaz, K. N. Clouser, J. S. Giuliano, Jr., A. Gupta, R. M. Parker, A. B. Maddux, V. Havalad, S. Ramsingh, H. Bukulmez, T. T. Bradford, L. S. Smith, M. W. Tenforde, C. L.

Carroll, B. J. Riggs, S. J. Gertz, A. Daube, A. Lansell, A. Coronado Munoz, C. V. Hobbs, K. L. Marohn, N. B. Halasa, M. M. Patel, and A. G. Randolph. 2020. Multisystem inflammatory syndrome in U.S. children and adolescents. New England Journal of Medicine 383:334-346. Geoghegan, J. L., and E. C. Holmes. 2018. The phylogenomics of evolving virus virulence. Nature Reviews Genetics 19(12):756-769. Goldhill, D. H., and P. E. Turner. 2014. The evolution of life history trade-offs in viruses. Current Opinion in Virology 8:79-84. Gonzalez-Reiche, A. S., M. M. Hernandez, M. J. Sullivan, B. Ciferri, H. Alshammary, A. Obla, S. Fabre, G. Kleiner, J. Polanco, Z. Khan, B. Alburquerque, A. van de Guchte, J. Dutta, N. Francoeur, B. S. Melo, I. Oussenko, G. Deikus, J. Soto, S. H. Sridhar, Y.-C. Wang, K. Twyman, A. Kasarskis, D. R. Altman, M. Smith, R. Sebra, J. Aberg, F. Krammer, A. García-Sastre, M. Luksza, G. Patel, A. Paniz- Mondolfi, M. Gitman, E. M. Sordillo, V. Simon, and H. van Bakel. 2020. Introductions and early spread of SARS-CoV-2 in the New York City area. Science eabc1917. Gould, D. W., D. Walker, and P. W. Yoon. 2017. The evolution of biosense: Lessons learned and future directions. Public Health Reports (Washington, DC: 1974) 132(1 Suppl):7S-11S. Grubaugh, N. D., W. P. Hanage, and A. L. Rasmussen. 2020. Making sense of mutation: What d614g means for the COVID-19 pandemic remains unclear. Cell. Handel, A., C. Lebarbenchon, D. Stallknecht, and P. Rohani. 2014. Trade-offs between and within scales: Environmental persistence and within-host fitness of avian influenza viruses. Proceedings of the Royal Society B: Biological Sciences 281(1787). Harvala, H., Å. Wiman, A. Wallensten, K. Zakikhany, H. Englund, and M. Brytting. 2015. Role of sequencing the measles virus hemagglutinin gene and hypervariable region in the measles outbreak investigations in Sweden during 2013–2014. The Journal of Infectious Diseases 213(4):592-599. HealthIT.gov. 2020a. Information blocking. 
https://www.healthit.gov/topic/information-blocking (accessed July 7, 2020). HealthIT.gov. 2020b. Trusted exchange framework and common agreement. https://www.healthit.gov/topic/interoperability/trusted-exchange-framework-and-common-agreement (accessed July 7, 2020). Holmes, E. C., G. Dudas, A. Rambaut, and K. G. Andersen. 2016. The evolution of Ebola virus: Insights from the 2013-2016 epidemic. Nature 538(7624):193-200. Hu, B., L.-P. Zeng, X.-L. Yang, X.-Y. Ge, W. Zhang, B. Li, J.-Z. Xie, X.-R. Shen, Y.-Z. Zhang, N. Wang, D.-S. Luo, X.-S. Zheng, M.-N. Wang, P. Daszak, L.-F. Wang, J. Cui, and Z.-L. Shi. 2017. Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. PLoS Pathogens 13(11):e1006698. Hu, J., C.-L. He, Q.-Z. Gao, G.-J. Zhang, X.-X. Cao, Q.-X. Long, H.-J. Deng, L.-Y. Huang, J. Chen, K. Wang, N. Tang, and A.-L. Huang. 2020. The d614g mutation of SARS-CoV-2 spike protein enhances viral infectivity and decreases neutralization sensitivity to individual convalescent sera. bioRxiv 2020.2006.2020.161323. Johnson, C., Y. Pylypchuk, and V. Patel. 2018. Methods used to enable interoperability among U.S. non- federal acute care hospitals in 2017. The Office of the National Coordinator for Health Information Technology. Korber, B., W. M. Fischer, S. Gnanakaran, H. Yoon, J. Theiler, W. Abfalterer, N. Hengartner, E. E. Giorgi, T. Bhattacharya, B. Foley, K. M. Hastie, M. D. Parker, D. G. Partridge, C. M. Evans, T. M. Freeman, T. I. de Silva, C. McDanal, L. G. Perez, H. Tang, A. Moon-Walker, S. P. Whelan, C. C. LaBranche, E. O. Saphire, D. C. Montefiori, A. Angyal, R. L. Brown, L. Carrilero, L. R. Green, D. C. Groves, K. J. Johnson, A. J. Keeley, B. B. Lindsey, P. J. Parsons, M. Raza, S. Rowland-Jones, N. Smith, R. M. Tucker, D. Wang, and M. D. Wyles. 2020. Tracking changes in SARS-CoV-2 spike: Evidence that d614g increases infectivity of the COVID-19 virus. Cell. 

Lang, M., A. Som, D. P. Mendoza, E. J. Flores, N. Reid, D. Carey, M. D. Li, A. Witkin, J. M. Rodriguez-Lopez, J. O. Shepard, and B. P. Little. 2020. Hypoxaemia related to COVID-19: Vascular and perfusion abnormalities on dual-energy CT. The Lancet Infectious Diseases.
Lesho, E., R. Clifford, F. Onmus-Leone, L. Appalla, E. Snesrud, Y. Kwak, A. Ong, R. Maybank, P. Waterman, P. Rohrbeck, M. Julius, A. Roth, J. Martinez, L. Nielsen, E. Steele, P. McGann, and M. Hinkle. 2016. The challenges of implementing next generation sequencing across a large healthcare system, and the molecular epidemiology and antibiotic susceptibilities of carbapenemase-producing bacteria in the healthcare system of the U.S. Department of Defense. PLOS ONE 11(5):e0155770.
Li, K. 2016. Dr. Rumi Chunara and Sofia Ahsanuddin: The goviral study. https://www.ghjournal.org/the-goviral-study (accessed July 6, 2020).
Liu, Y., Z. Ning, Y. Chen, M. Guo, Y. Liu, N. K. Gali, L. Sun, Y. Duan, J. Cai, D. Westerdahl, X. Liu, K. Xu, K.-f. Ho, H. Kan, Q. Fu, and K. Lan. 2020. Aerodynamic analysis of SARS-CoV-2 in two Wuhan hospitals. Nature 582(7813):557-560.
Liya, G., W. Yuguang, L. Jian, Y. Huaiping, H. Xue, H. Jianwei, M. Jiaju, L. Youran, M. Chen, and J. Yiqing. 2020. Studies on viral pneumonia related to novel coronavirus SARS-CoV-2, SARS-CoV, and MERS-CoV: A literature review. APMIS 128(6):423-432.
MacLaren, G., D. Fisher, and D. Brodie. 2020. Preparing for the most critically ill patients with COVID-19: The potential role of extracorporeal membrane oxygenation. JAMA.
Maxmen, A. 2020. Thousands of coronavirus tests are going unused in US labs. Nature 580(7803):312-313.
Messenger, S. L., I. J. Molineux, and J. J. Bull. 1999. Virulence evolution in a virus obeys a trade off. Proceedings of the Royal Society B: Biological Sciences 266(1417):397-404.
N3C (National COVID Cohort Collaborative). 2020. National COVID Cohort Collaborative (N3C): A national resource for shared analytics.
Ogbunugafor, C., B. W. Alto, T. M. Overton, A. Bhushan, N. M. Morales, and P. E. Turner. 2013. Evolution of increased survival in RNA viruses specialized on cancer-derived cells. The American Naturalist 181(5):585-595.
Ong, J., B. E. Young, and S. Ong. 2020. COVID-19 in gastroenterology: A clinical perspective. Gut 69(6):1144-1145.
Orr, H. A. 2000. The rate of adaptation in asexuals. Genetics 155(2):961-968.
PCAST (President’s Council of Advisors on Science and Technology). 2014. Report to the president on combatting antibiotic resistance. Washington, DC.
Penedos, A. R., R. Myers, B. Hadef, F. Aladin, and K. E. Brown. 2015. Assessment of the utility of whole genome sequencing of measles virus in the characterisation of outbreaks. PLOS ONE 10(11):e0143081.
Premkumar, L., B. Segovia-Chumbez, R. Jadi, D. R. Martinez, R. Raut, A. J. Markmann, C. Cornaby, L. Bartelt, S. Weiss, Y. Park, C. E. Edwards, E. Weimer, E. M. Scherer, N. Rouphael, S. Edupuganti, D. Weiskopf, L. V. Tse, Y. J. Hou, D. Margolis, A. Sette, M. H. Collins, J. Schmitz, R. S. Baric, and A. M. de Silva. 2020. The receptor-binding domain of the viral spike protein is an immunodominant and highly specific target of antibodies in SARS-CoV-2 patients. Science Immunology 5(48):eabc8413.
Rothstein, M., and S. Tovino. 2019. Privacy risks of interoperable health records: Segmentation of sensitive information will help. Journal of Law, Medicine & Ethics 47:771-777.
Smolinski, M. S., A. W. Crawley, J. M. Olsen, T. Jayaraman, and M. Libel. 2017. Participatory disease surveillance: Engaging communities directly in reporting, monitoring, and responding to health threats. JMIR Public Health and Surveillance 3(4):e62.
Van Cauteren, D., S. Vaux, H. de Valk, Y. Le Strat, V. Vaillant, and D. Lévy-Bruhl. 2012. Burden of influenza, healthcare seeking behaviour and hygiene measures during the A(H1N1)2009 pandemic in France: A population based study. BMC Public Health 12(1):947.
van Doremalen, N., T. Bushmaker, D. H. Morris, M. G. Holbrook, A. Gamble, B. N. Williamson, A. Tamin, J. L. Harcourt, N. J. Thornburg, S. I. Gerber, J. O. Lloyd-Smith, E. de Wit, and V. J. Munster. 2020. Aerosol and surface stability of SARS-CoV-2 as compared with SARS-CoV-1. New England Journal of Medicine 382(16):1564-1567.
van Noort, S. P., M. Muehlen, H. Rebelo de Andrade, C. Koppeschaar, J. M. Lima Lourenço, and M. G. Gomes. 2007. Gripenet: An Internet-based system to monitor influenza-like illness uniformly across Europe. Eurosurveillance 12(7):5-6.
Wölfel, R., V. M. Corman, W. Guggemos, M. Seilmaier, S. Zange, M. A. Müller, D. Niemeyer, T. C. Jones, P. Vollmar, C. Rothe, M. Hoelscher, T. Bleicker, S. Brünink, J. Schneider, R. Ehmann, K. Zwirglmaier, C. Drosten, and C. Wendtner. 2020. Virological assessment of hospitalized patients with COVID-2019. Nature 581(7809):465-469.
Wood, H. 2020. New insights into the neurological effects of COVID-19. Nature Reviews Neurology.
Xu, Y., X. Li, B. Zhu, H. Liang, C. Fang, Y. Gong, Q. Guo, X. Sun, D. Zhao, J. Shen, H. Zhang, H. Liu, H. Xia, J. Tang, K. Zhang, and S. Gong. 2020. Characteristics of pediatric SARS-CoV-2 infection and potential evidence for persistent fecal viral shedding. Nature Medicine 26(4):502-505.
Zhang, L., C. B. Jackson, H. Mou, A. Ojha, E. S. Rangarajan, T. Izard, M. Farzan, and H. Choe. 2020. The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity. bioRxiv 2020.06.12.148726.


In December 2019, new cases of severe pneumonia were first detected in Wuhan, China, and the cause was determined to be a novel beta coronavirus related to the severe acute respiratory syndrome (SARS) coronavirus that emerged from a bat reservoir in 2002. Within six months, this new virus—SARS coronavirus 2 (SARS-CoV-2)—has spread worldwide, infecting at least 10 million people with an estimated 500,000 deaths. COVID-19, the disease caused by SARS-CoV-2, was declared a public health emergency of international concern on January 30, 2020 by the World Health Organization (WHO) and a pandemic on March 11, 2020. To date, there is no approved effective treatment or vaccine for COVID-19, and it continues to spread in many countries.

Genomic Epidemiology Data Infrastructure Needs for SARS-CoV-2: Modernizing Pandemic Response Strategies lays out a framework to define and describe the data needs for a system to track and correlate viral genome sequences with clinical and epidemiological data. Such a system would help ensure the integration of data on viral evolution with detection, diagnostic, and countermeasure efforts. This report also explores data collection mechanisms to ensure a representative global sample set of all relevant extant sequences and considers challenges and opportunities for coordination across existing domestic, global, and regional data sources.
