National Academies Press: OpenBook

Next Steps for Functional Genomics: Proceedings of a Workshop (2020)

Chapter: 4 Understanding the Contributions of Non-Protein-Coding DNA to Phenotype

Suggested Citation:"4 Understanding the Contributions of Non-Protein-Coding DNA to Phenotype." National Academies of Sciences, Engineering, and Medicine. 2020. Next Steps for Functional Genomics: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/25780.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

4 Understanding the Contributions of Non-Protein-Coding DNA to Phenotype

The early view of the genome was that the important part was its genes, which carry the instructions for how to make proteins. The remainder of the genome, referred to as “non-coding” sequences, was thought to be less important or perhaps not important at all. However, it is now understood that this view was misguided and that non-coding elements may play crucial roles, such as regulating the transcription of genes into RNA (King and Wilson, 1975). But there is much less known about the non-protein-coding parts of the genome than about the genes themselves. Developing a clear understanding of how genotype leads to phenotype will require a greatly enhanced comprehension of the role of non-protein-coding DNA.

The three speakers in this panel explained their work to illuminate the issue of how these non-coding regions of the genome contribute to an organism’s phenotype. The first speaker was Felicity Jones of the Friedrich Miescher Laboratories of the Max Planck Society, who described her studies of the role played by non-coding portions of the genome in adaptation by stickleback fish. Scott Edwards of Harvard University spoke about his research on the evolution of non-coding regulatory sequences in various taxa of birds that have lost the ability to fly through evolutionary changes. The third speaker was Francois Spitz, who described what is known about topologically associated domains, a feature of the genome that shapes interactions among various non-coding elements of the genome. A discussion with the panel members and the audience followed the three presentations.

FUNCTIONAL GENOMICS OF ADAPTATION IN STICKLEBACKS

Felicity Jones began her presentation by noting that many of the specialized functions that organisms develop through evolution are achieved not by modifications in genes, but rather by variations in the regulatory mechanisms that control the genes.
“And in my lab,” she said, “we’re really interested in understanding how those regulatory mechanisms play a role in determining the specialized functions that enable organisms to adapt to the environment.” Her lab asks such questions as:

• What are the molecular mechanisms involved in this environmental adaptation?
• How does the genome interact with the environment?
• How does genome function change as an organism adapts to a different environment?

To address these questions, Jones works with stickleback fish. They are particularly useful for this purpose because they underwent an adaptive radiation about 10,000 to 20,000 years ago in which the ancestral marine (i.e., ocean-living) stickleback colonized a large number of newly formed freshwater habitats across the northern hemisphere. As a result, she said, “as you go across the northern hemisphere, in any freshwater body you care to look at, you’re very likely to find sticklebacks and see much phenotypic, morphological, physiological, and behavioral diversity amongst these populations.”

One useful aspect of this biological system is that many traits evolved in parallel in different populations. For example, the freshwater sticklebacks have lost many or all of their bony lateral plates, which the marine ancestor still retains, and that adaptation in the freshwater fish occurred independently many different times. This makes it possible to study the mechanisms behind these parallel evolutionary changes.

Another useful feature of sticklebacks, Jones said, is that the different ecotypes (forms of an organism that occupy a specific habitat) exist within close spatial proximity despite their different forms. In some cases these ecotypes even overlap with one another in the lower reaches of rivers throughout the world. In these areas of overlap, hybridization and gene flow are common. In other words, the different ecotypes are not reproductively isolated. “That gene flow is super useful from a genomic point of view,” she said, because it helps identify the parts of the genome that are important to the divergence between different ecotypes. Natural selection helps maintain differences at the loci that underlie the ecotypes’ adaptation to their different habitats, she explained. The resulting “divergence in the face of ongoing gene flow” provides a good way to differentiate between “signal” and “noise” in genomic differences.

Furthermore, she said, just as with zebrafish, there is an entire suite of developmental and transgenic tools available for working with sticklebacks. Researchers have been able to easily adapt many of the transgenic tools used with zebrafish to sticklebacks. Finally, sticklebacks can be bred in the lab where their environment can be manipulated.
Because it is also relatively straightforward to study them in the wild, Jones said, the species “bridges the gap between the lab and the field very well.”

Although there are many “axes of ecological divergence” for sticklebacks, Jones said, her lab focuses mostly on the marine–freshwater axis. Many years ago her lab did some whole-genome sequencing on both marine and freshwater sticklebacks, looking for regions of the genome where the freshwater fish were consistently different from their marine ancestors (Jones et al., 2012). What they found were blocks of DNA sequence that differed consistently between the marine and freshwater fish.

Her team learned a lot from this initial work. First, there are many places across the genome where freshwater and marine fish have evolved divergent blocks of DNA, such that “any freshwater fish collected in the wild will be carrying the freshwater allele in as many as 81 different locations around the genome.” Furthermore, marine versus freshwater adaptation is highly polygenic. Finally, the adaptive loci are primarily intergenic; that is, they fall between genes and into the non-coding parts of the genome, which are likely to have regulatory elements that control gene expression.

Understanding how these non-coding parts of the stickleback genome work has been a major focus of her team’s work. One of their first experiments was to determine whether the divergence in gene expression they saw between marine and freshwater sticklebacks was controlled by cis-regulatory mechanisms or trans-regulatory mechanisms. Cis-regulatory mechanisms, she explained, involve stretches of DNA near the gene being regulated. Trans-regulatory mechanisms include genes in a different part of the genome that, for example, code for proteins that bind to a promoter to activate transcription of the gene being regulated.
“We were interested in knowing whether it is the mutations in the proximal cis elements, for example, versus the trans-elements that are driving the parallel expression divergence that we see,” Jones said. What she and her team found was that in the four different pairs of marine and freshwater fish they have studied, the expression divergence in each pair is controlled primarily by cis-regulatory variation (Verta and Jones, 2019). This does not mean that the trans-regulatory factors do not play a role, but the cis-regulatory elements seem to be particularly important in explaining the divergence between the marine and freshwater sticklebacks.

Now Jones and her team are working to identify the specific regulatory elements responsible for the divergence between the two types of sticklebacks. They have identified blocks of DNA in which the regulatory elements reside, but those are large blocks (as much as 40 kb in length), and up to 4 percent of the nucleotides are divergent, so the challenge is to determine exactly where the relevant regulatory elements are. They start by cloning the entire blocks from a freshwater and a marine stickleback, combining each block with a green fluorescent protein (GFP) reporter, and inserting these DNA stretches into embryos. They then look to see if the GFP is expressed in the embryo and if the marine and freshwater constructs show different expression patterns. If GFP expression differs for a specific block, they then try to find exactly where the relevant regulatory element lies within the block.

They accomplish this by mapping potential functional elements using StickleCODE, an ENCODE-style (Encyclopedia of DNA Elements project) approach to identifying functional regulatory elements across the stickleback genome. They have used a number of different assays, including ATAC-seq (assay for transposase-accessible chromatin using sequencing) profiling, ChIP-seq (chromatin immunoprecipitation sequencing) on histone modifications, and RNA-seq (RNA sequencing) to look at expression levels, and they have done that in three different tissues, in both sexes, and in two different ecotypes.
“And,” she added, “we’ve been doing a whole bunch of transgenic assays to develop what we see.” The result has been a tremendous amount of data to analyze to identify regions that differ between the marine and freshwater ecotypes and thus to identify putatively divergent regulatory elements.

Once the candidates have been identified, Jones and her team perform functional assays to test whether the blocks of DNA actually contain regulatory elements. Jones showed one example where the GFP reporter caused the livers of the larvae to glow green when the region from the freshwater fish was used but not when the marine sequence was used, indicating that the sequence regulated gene expression in the liver. Looking more closely, Jones found that the marine sequence had a deletion that the freshwater sequence did not. When the same deletion was made to the freshwater sequence, the gene expression in the liver disappeared. Furthermore, if the relevant part of the freshwater sequence was inserted into the marine sequence, the gene was expressed.

Finally, Jones spoke briefly about work to examine the regulation of open chromatin. The various sticklebacks show differences in their chromatin. Her group has been doing an allele-specific chromatin assay to determine where the regulatory control of chromatin accessibility is located. What they found with allele-specific ATAC was a pattern that was similar to what they saw for gene expression. That is, the marine–freshwater divergence is mostly due to cis-regulatory changes.

Looking to the future, Jones said that the next step in her work will be to do similar experiments under varying environmental conditions to understand how the regulatory landscape gets rewired when sticklebacks are living in different conditions. Adding variable environmental conditions to the mix will sharply increase the amount of data that will need to be collected and analyzed. This will make the work even more challenging, she said.
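The cis versus trans distinction that recurs in this section can be made concrete with a small sketch. It assumes a commonly used design that the summary above does not spell out: expression is compared between the two pure ecotypes and, separately, between the two alleles inside an F1 hybrid, where both alleles share the same trans-acting environment. The function names, the tolerance, and the read counts below are all hypothetical and chosen only for illustration; this is not the published analysis.

```python
import math


def log2_ratio(freshwater, marine, pseudocount=1.0):
    """log2 fold difference; the pseudocount avoids division by zero."""
    return math.log2((freshwater + pseudocount) / (marine + pseudocount))


def classify_regulation(parent_lfc, hybrid_lfc, tol=0.5):
    """Label a gene from parental and allele-specific log2 ratios.

    parent_lfc: freshwater vs marine expression between pure ecotypes.
    hybrid_lfc: freshwater vs marine allele expression within an F1 hybrid.
    tol: hypothetical threshold below which a ratio counts as "no difference".
    """
    if abs(parent_lfc) < tol:
        # No divergence between ecotypes; alleles may still differ.
        return "conserved" if abs(hybrid_lfc) < tol else "compensatory"
    if abs(hybrid_lfc) < tol:
        # Divergence vanishes when alleles share a nucleus: trans effect.
        return "trans"
    if abs(parent_lfc - hybrid_lfc) < tol:
        # Divergence fully retained between alleles: cis effect.
        return "cis"
    return "cis+trans"


# Hypothetical read counts: (parent FW, parent marine, F1 FW allele, F1 marine allele)
counts = {
    "gene_a": (400, 100, 200, 50),
    "gene_b": (400, 100, 150, 150),
    "gene_c": (120, 100, 60, 55),
    "gene_d": (400, 100, 300, 150),
}

labels = {
    gene: classify_regulation(log2_ratio(pf, pm), log2_ratio(hf, hm))
    for gene, (pf, pm, hf, hm) in counts.items()
}

for gene, label in labels.items():
    print(gene, label)
```

A real analysis would replace the fixed tolerance with a statistical test on replicated counts; the fixed threshold here only keeps the logic visible.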

PHYLOGENETICS OF FLIGHTLESS BIRDS

Scott Edwards of Harvard University spoke about a project involving paleognathous birds, a group of mostly flightless birds that includes ostriches, emus, cassowaries, rheas, and kiwis. Although flightless birds are rather unwieldy as models, Edwards said, they are powerful in what they allow people to learn about the genetics of adaptive evolution and, in particular, about the convergent changes in non-coding regulatory sequences that occurred in various taxa of birds that independently lost the ability to fly.

Edwards and his team are interested in various phenotypes mainly involving changes related to loss of flight. One such phenotypic change, for instance, is the loss or reduction of skeletal elements such as the keel, an extension of the sternum to which the flight muscles attach (de Bakker et al., 2013). Flying species such as pigeons have a prominent keel, but the keels of flightless birds tend to degenerate. “We’re also looking at big differences in body size,” he said, “as well as variable loss of forelimb elements.”

The first step of the project was to develop a substantial comparative genomics dataset of the paleognathous birds, including a draft genome of an extinct member of this clade, the little bush moa. They also developed a phylogeny of these paleognathous birds. “Our first goal here was to determine whether flight was convergently lost or not,” Edwards said. The phylogeny they developed (see Figure 4-1) indicated that there were a number of separate convergent losses of flight (Liu et al., 2010; Sackton et al., 2019). “This is in contrast to the paradigm for many decades, namely that all the flightless lineages of this clade descended from a flightless common ancestor,” he noted. One of the biggest surprises was the position in the phylogeny of the tinamou, a flying bird.
Some in the field had thought that the tinamous, because they are able to fly, would fall outside the group.

FIGURE 4-1 Images of different types of paleognathous birds. SOURCE: Scott Edwards presentation, slide 5.

The fact that the ability to fly had been lost multiple times meant that Edwards could use this convergent evolution to identify loci in the genome related to loss of flight, much like the process that Jones described for her work with sticklebacks, albeit on a different scale. His group focused primarily on conserved non-coding elements, explaining that these elements are easy to identify in the genome and noting that they often act as enhancers whose role is to bring together the transcriptional machinery to drive gene expression (Janes et al., 2011). “We first used the signature of rapid evolution, either adaptive evolution or release of constraint, as an indicator of which non-coding elements might be important for loss-of-flight phenotypes,” Edwards said. As an example, he mentioned a 250-base-pair non-coding element that was highly conserved throughout the lineages of birds that maintained the ability to fly but that was evolving at a visibly accelerated rate in some of the flightless lineages.

The identification of such elements can be formalized in a statistical model, he said, explaining that two of the key things needed in this approach to functional genomics are statistical models that link genotype to phenotype and other models that detect changes in rate, and potentially changes in function, in non-coding elements (Hu et al., 2019). An element like the aforementioned 250-base-pair sequence, which is conserved in most of bird evolution but rapidly changed in flightless lineages, “is the kind of signature that we start with,” Edwards said. In particular, they look for regions with a large number of these accelerated non-coding elements and then look for nearby genes. What they have found is that nearby genes are often important in limb development.

To find candidates for genes important in the loss of flight, the group combined the rate acceleration data with two other datasets.
They generated a large amount of ATAC-seq data and also used ChIP-seq data from the literature (Sackton et al., 2019). By looking at the overlap between candidates from the three datasets, they identified 42 conserved non-exonic elements as primary candidates. These 42 loci were prioritized for further study.

To test one of those elements, they compared a version from the rhea, a flightless bird, with versions from the chicken and the tinamou, which are both able to fly or glide. They found that the versions from the two flying species were able to successfully drive gene expression in the developing forelimb of chickens, while the version from the rhea was not. This showed that at least one of the elements identified by their method has functional consequences for gene expression. “We are scaling this approach up in collaboration with Emma Farley to try to interrogate hundreds, if not thousands, of these enhancers using the chicken limb as a developmental model,” Edwards said. (For more information on Emma Farley’s work, see Chapter 7.)

In another study they compared differences in gene expression between the forelimb and hind limb in five species of birds, two of them volant (chicken, tinamou) and three flightless (emu, ostrich, rhea). They found that there were relatively few differences in the two volant birds and a substantially larger number of differences in the three flightless species.

To understand this comparative dataset in detail, Edwards said, it is necessary to interpret it in the context of phylogeny. There are some initial tools for doing this in the literature on multivariate evolution of complex phenotypic traits. Dean Adams, in particular, has done a lot of work on how to control for the many kinds of correlations not only between species in a phylogeny, but also between genes or other aspects of a complex multivariate trait (Adams and Collyer, 2018; Bolnick et al., 2018).
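The candidate-prioritization step described above, in which elements had to be supported by all three datasets, amounts to intersecting lines of evidence. The sketch below is one minimal way to express that idea, assuming each conserved element and each ATAC-seq or ChIP-seq peak is a half-open genomic interval. The element names, coordinates, and the `prioritize` helper are hypothetical and are not taken from the study.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a, b):
    """True when two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def prioritize(elements, accelerated_ids, atac_peaks, chip_peaks):
    """Return names of elements supported by all three lines of evidence.

    elements: mapping of element name to Interval.
    accelerated_ids: names of elements flagged as rate-accelerated.
    atac_peaks, chip_peaks: lists of Interval from each assay.
    """
    kept = []
    for name, interval in elements.items():
        if name not in accelerated_ids:
            continue
        in_atac = any(overlaps(interval, peak) for peak in atac_peaks)
        in_chip = any(overlaps(interval, peak) for peak in chip_peaks)
        if in_atac and in_chip:
            kept.append(name)
    return sorted(kept)


# Hypothetical toy data, for illustration only.
elements = {
    "cne_1": Interval("chr1", 100, 350),
    "cne_2": Interval("chr1", 5000, 5200),
    "cne_3": Interval("chr2", 100, 300),
}
accelerated = {"cne_1", "cne_3"}
atac = [Interval("chr1", 50, 200), Interval("chr2", 1000, 1500)]
chip = [Interval("chr1", 300, 600), Interval("chr1", 5100, 5400)]

candidates = prioritize(elements, accelerated, atac, chip)
print(candidates)
```

The nested loops are quadratic in the number of peaks, which is fine for a toy example; genome-scale data would normally use a sorted-interval or interval-tree approach instead.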

In analyzing the ATAC-seq data from the forelimb/hindlimb study, Edwards concluded that there is a clear signature of convergent evolution leading to the three flightless species he and his lab studied. There are other ways to determine what might have been the ancestral ATAC-seq profile, including the use of statistical approaches on assays of extant populations. The main takeaway, however, was the importance of treating comparative data in an appropriately phylogenetic context.

Finally, he showed a slide illustrating what the ATAC-seq data look like, with five species and about 363,000 sites of open and closed chromatin, and commented on the complexity of the dataset. “One of my hopes,” he said, “is that we can develop models that will allow us to pull out of this complex dataset, different groups of loci of open and closed chromatin, which may predict, in essence, the patterns of phenotype that we’re interested in at the tips of the phylogeny.” For that dataset, he commented, the trait of interest is binary (volant or flightless), but one could also imagine having a continuous trait such as body size.

Edwards called for a model that could one day predict the loci responsible for traits on either end of a phylogenetic tree, but noted that this is not currently possible. The existing models are simple, with most of them designed for a small number of loci or characters in the genome, and they can predict only simple traits. What is needed now, Edwards said, is to develop a series of methods that are tailored to high-dimensional genomic data. To get his paradigm to work, he said, it will be necessary to analyze large numbers of species across potentially multiple developmental stages. That sort of data is appearing in the literature now, he said, but it is data for which there are no good analytical models.
In conclusion, Edwards said, the non-coding genome seems to be important in the convergent loss of flight in the paleognathous birds, but new comparative models are needed to link genotype to phenotype in a phylogenetic context. “I do believe that phylogenetics is one of the most powerful approaches for that majority of biodiversity which isn’t amenable to laboratory analysis,” he said, “and yet the tools . . . aren’t there yet.”

ROLE OF CHROMATIN FOLDING IN GENE EXPRESSION

The final speaker in the session was Francois Spitz of the University of Chicago, who discussed the role of chromatin folding in gene expression. Spitz began by noting, as other speakers had, that gene expression is shaped to a substantial degree by cis-regulatory elements, DNA regions that are near genes but separate from the core promoter regions. What is seen in many animals, especially vertebrates, is that gene regulation is extremely modular, and is frequently related to a large series of regulatory elements around the gene. More than 80 percent of the genomic variants identified by genome-wide association studies are far from the genes they regulate. Furthermore, mutations of these distant genome regions are a frequent cause of human developmental disorders and cancer (Uslu et al., 2014).

Fortunately for those interested in understanding these issues, a large toolbox has been developed to identify regulatory elements and characterize them functionally. One piece of information learned from using these tools, Spitz said, is that the regulatory elements are not necessarily controlling the genes closest to them. Indeed, it often happens that one of these elements skips over a number of intervening genes and interacts with a gene that is 100,000 or even 1 million bases away.

Understanding the enhancer–promoter interactions that drive gene expression requires examining the three-dimensional folding of the chromatin, the protein–DNA complex in which the DNA strands are packaged. That folding determines the physical proximity between regulatory elements, such as enhancers and promoters, and shapes their function. For a long time this folding process was mysterious, he said, but the development of new technologies has made possible a much better understanding of how the genome is folded in the nucleus during interphase, the cell’s resting period between divisions. Those technologies make it possible to see which regions along a chromosome are close to one another in the folded structure.

What researchers have found is that the genome is organized in distinct and nested structures of different sizes. The structure looks different at different scales, but if one examines the structure at the scale of a few hundred thousand bases, what appears are a series of self-interacting domains called topologically associated domains, or TADs. In essence, a TAD is a stretch of DNA, typically hundreds of thousands of base pairs to 1 million or more base pairs long, in which the various sequences are much more likely to interact with one another when the chromatin is folded than with sequences that lie beyond the boundaries of the TAD (Symmons et al., 2014; Lupiáñez et al., 2015).

TADs are interesting, Spitz said, because they seem to define a space along the genome that is a regulatory domain, with a set of related genes and regulatory elements that work together. Furthermore, the boundaries seem to play an important role because deletions of the boundaries lead to “enhancer re-allocation,” in which enhancers within the TAD can act on new target genes outside of it (Symmons et al., 2014; Lupiáñez et al., 2015; Tsujimura et al., 2015; Franke et al., 2016).
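The defining property of a TAD given above, that sequences inside it contact one another more often than they contact sequences outside it, can be illustrated with a toy contact matrix. The sketch below is only an illustration under strong assumptions: the matrix values are invented, the `toy_matrix` and `tad_enrichment` helpers are hypothetical, and the decay of contact frequency with genomic distance that real chromosome-conformation data show is ignored.

```python
def toy_matrix(n_bins, boundary, within, between):
    """Build a symmetric contact matrix with one domain boundary.

    Bins below `boundary` form domain A, the rest form domain B.
    `within` and `between` are hypothetical contact frequencies.
    """
    return [
        [within if (i < boundary) == (j < boundary) else between
         for j in range(n_bins)]
        for i in range(n_bins)
    ]


def mean_contact(matrix, rows, cols):
    """Average contact frequency over bin pairs, skipping the diagonal."""
    values = [matrix[i][j] for i in rows for j in cols if i != j]
    return sum(values) / len(values)


def tad_enrichment(matrix, domain_a, domain_b):
    """Ratio of mean within-domain contacts to mean between-domain contacts."""
    within = (mean_contact(matrix, domain_a, domain_a)
              + mean_contact(matrix, domain_b, domain_b)) / 2
    between = mean_contact(matrix, domain_a, domain_b)
    return within / between


domain_a = range(0, 3)
domain_b = range(3, 6)

# Intact boundary: contacts are concentrated inside each domain.
intact = toy_matrix(6, 3, within=10.0, between=2.0)
# Hypothetical boundary loss: contacts no longer respect the domains.
merged = toy_matrix(6, 3, within=6.0, between=6.0)

intact_score = tad_enrichment(intact, domain_a, domain_b)
merged_score = tad_enrichment(merged, domain_a, domain_b)
print(intact_score, merged_score)
```

A ratio well above 1 indicates insulated domains, while a ratio near 1 indicates that contacts cross the former boundary as readily as they stay within it.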
“This enhancer relocation, or enhancer ‘hijacking,’ has been implicated in growing numbers of human diseases,” Spitz noted. In addition, abolishing a TAD will prevent efficient long-distance interactions between an enhancer and a gene even if the distance along the genome between them is unchanged. The interaction can be partially rescued by bringing the enhancer and gene closer together, he commented.

Research has shown that TADs subdivide the genome into regulatory domains that are relatively invariant from one cell type to another. TADs have two basic functions. They ensure the specificity of long-distance enhancer–promoter interactions by preventing the activation of genes in adjacent domains, and they promote the efficiency of the long-distance interactions, enabling distant elements to exert a robust influence on gene expression.

The interactions in the TAD are organized by what Spitz described as “the two major actors,” the proteins CTCF and cohesin, which both accumulate at the boundaries of the TAD. Knocking out either CTCF or cohesin eliminates the TAD. Loss of CTCF reduces contacts between the regions inside the TAD (Nora et al., 2017), while removing cohesin leads to increased contacts between the regions inside the TAD and those outside it (Schwarzer et al., 2017). Interestingly, he said, when the cohesin is removed, “you lose the TADs, but it doesn’t mean that you lose all the structures.” In particular, the compartmentalization of the genome into active and inactive chromatin is reinforced, with the compartmental signals becoming stronger and fine-scale structures appearing within the former TAD.

These experiments, Spitz said, demonstrate that two distinct processes contribute to the three-dimensional organization of the genome. The first is based on the underlying chromatin structure and segregates regions into active and inactive compartments. “That’s a system which will ensure maintenance of activities,” he said.
The second is mediated by cohesin and leads to the TAD structure. “This is a dynamic process which is enabling genes to scan their neighbors looking for partners,” he said. It results in a “mix and match system” that is important for dynamic changes of gene expression.

Much remains to be understood about how the various elements within a TAD interact and change in response to genetic variation. For example, the responsiveness to enhancers is not homogeneously distributed within a TAD, Spitz said, but depends on where a gene is located. Determining the factors that define the properties of enhancer–promoter interactions within a TAD will be essential in understanding the consequences of genomic variations.

One goal is to predict how changes in the linear genome sequence will affect the folding and enhancer–promoter contacts within a TAD. Spitz said that his group has begun examining this in mice by engineering variants and examining the results. “So far it’s just a beginning, and there is no simple explanation.” For example, their studies found that the distance between an enhancer and a target gene could be increased significantly (in one case from 1.7 to 2.7 Mb) with no change in expression, whereas significantly decreasing the distance led to a sharp drop in the expression of the gene. The complex rules of folding are not yet fully understood.

Looking to the future, Spitz said that one of the challenges will be to use tools such as Cas9-mediated genome editing to modify the genome and then assay the chromatin in ways that give insights into how the three-dimensional folding is controlled. This is particularly important, because various diseases lead to changes in the three-dimensional organization of the genome. It will also be important to characterize three-dimensional genome folding across different types of organisms. Currently few experiments have been done in a variety of organisms, he said.
Other organisms do show signs of interaction domains, and future work will uncover whether these domains share the same rules of formation or whether genomes are organized differently depending on the species.

DISCUSSION

The discussion that followed these talks touched on topics around functional gene regulation. Charles Danko of Cornell University and Emma Farley of the University of California, San Diego, had comments and questions related to the debate around the roles of topological domains in gene regulation. Danko began by explaining that there are disagreements in the literature about the degree to which interactions between enhancers and genes are observed. This makes it challenging to understand the role of many topological domains. He noted that Spitz highlighted some good examples of how topological domains can influence which enhancers interact with which genes, but Danko commented that there are many other factors that could disrupt the results or make them inconclusive.

Spitz responded by saying that there is no consensus in the community about how accurately interactions between genes and enhancers can be measured, and researchers are just starting to address this question with new tools such as live imaging, high-resolution microscopy, and functional strategies. In relation to topological domains, he pointed out that some loci provide a clear picture of what interaction is happening, while others show interactions that are diffuse and difficult to define.

Later in the discussion, Farley alluded to the varying functional significance of topological domain interactions. She asked for Spitz’s thoughts on how to take these large datasets of interactions and discover those that have functional importance. Spitz responded by first saying that because these ideas are new, he does not believe that the field has generated enough data to be able to fully explore these questions. He proposed a bifurcated approach that looks at the problem on both a global and a focused level. The global approach involves a systematic exploration of the functional contribution of different ATAC-seq peaks. The focused part of the approach would dissect the interactions of model loci to understand how they operate at the levels of both sequence and structure. The plan would be to extrapolate general principles from this work and apply them to other loci.

Steven Henikoff then directed a question to Jones and Edwards. In both of their talks, he noted, they showed examples where there were large regions of conservation, but when they actually did the mapping, they found discrete sites of cis-regulatory elements. “I’m wondering why,” he said, “because I wouldn’t expect the whole region to be conserved, just the elements. And I’m wondering if it’s something like TADs you might be looking at.”

“We’re not quite at TADs yet,” Edwards answered. Instead, he believed that in his case what he was seeing was rate acceleration. “Those accelerated elements are accelerating for a variety of reasons. Sometimes it’s simple point mutations. Other times it’s clearly gene conversion events.” So a variety of processes were leading to the sequence differences.

For sticklebacks, Jones believed that it is related to the way the animals have evolved. They tend to evolve by making use of preexisting, or “standing,” genetic variation, which repeatedly sweeps to fixation. The parts that get swept to fixation repeatedly accumulate mutations, which may be slightly beneficial.
“So we believe that these blocks that we see are highly pleiotropic and have a lot of mutations that have been repeatedly pre-screened by selection in the course of evolution.”

Sarah Kocher from Princeton University asked the panelists to compare how much enhancer variation contributes to the degradation of an existing phenotype versus the emergence of a new trait in the first place, and also what the relative contributions are of changes in master regulators versus downstream changes in non-coding sequences.

“I can’t speak exactly to that,” Jones said, “but we have some interesting data where we studied the degree of correlation in cis-regulation of gene expression in these marine–freshwater pairs.” Upregulated genes tend to be strongly cis-regulated, suggesting that most upregulation is done through cis-regulatory mechanisms. That correlation does not appear, however, when the fish lose expression. “So there might be many ways to kill gene expression, but few ways to evolve or gain gene expression.”

Edwards answered the first part of Kocher’s question by saying that for non-coding elements it is often not clear whether adaptive evolution is driving changes or whether the element is simply degrading. Concerning the second part of the question, related to master regulators, he said that it should be possible to combine something like ATAC-seq data and gene expression data to learn more about where the regulators of gene expression are. “There have been a few papers out on that using some really exciting approaches.”

Gene Robinson from the University of Illinois asked the panel members to talk specifically about the sort of resources they need to do their functional genomics work. “What level of genomes do you need? Over what kind of diversity? . . . Just give us a flavor for what you see as where we need to be.”

“For the experiments we are doing, we definitely need almost a chromosomal level assembly of the genome,” Spitz answered. The genome can have some gaps, “but I think having high-quality genomes where you can see linear or 3-D order is essential.” However, he added, the technology has now reached a point where the people in his lab can put together good genome assemblies themselves. The community can help by further annotating the genome, making sure the information is useful, and organizing it.

Jones echoed Spitz’s reply. “With the technology changes we do our own de novo assemblies, and we’ve had . . . success with the linked read sequencing,” she said. “It happens that the stickleback genome is small and well behaved.” She also agreed with Spitz’s comment about the community and the resources. “We have a number of different assembly versions kicking around for different ecotypes,” she said. “I know that my academic brothers and sisters equally have that many assemblies. As a community, what we would like to have is more ability to put this together, including all the associated metadata that [go] with it.” However, she said, as a principal investigator with limited resources, she finds it hard to attract someone to take on that job. She added that suggestions for how to get that kind of support from funding agencies would be much appreciated.

Edwards said that although he did not need particularly high-quality genomes for the research on flightless birds, better genomes could be valuable in other types of research. “I think that would be a great place for NSF [the National Science Foundation] to invest in,” he said. “That is a great way to jump-start a lot of this, to get more genomes out there, with good annotations.”

Spitz added that one thing that should be funded in parallel is the development of tools that would enable “even someone with limited knowledge of computational biology to navigate . . . genomic maps.” It would be particularly valuable to provide ways to analyze things such as ATAC-seq maps or chromatin maps for different species in a comparative manner.

Edwards agreed, saying that visualization is important. “We’ve been fortunate enough to be able to produce a whole-genome alignment with a bunch of birds and to put it up into a browser, and it’s available to the community. I think that can just lead to all kinds of discoveries. Just being able to view the data easily is a huge asset.”

Lastly, Magnuson, the moderator, asked a question of all three panelists. Given that they were looking at folding, open chromatin, histone modifications, and the like, had they examined chromatin remodeling via specific chromatin remodeling complexes? While Spitz had not, Jones reiterated that a number of trans-acting elements had been discovered in experiments in which her group performs standard quantitative trait locus (QTL) mapping with chromatin profiles as the phenotype. “We’re at the point of trying to identify whether they are these known complexes,” she noted. “Stay tuned.”

Edwards said that he really had not heard of the chromatin remodeling factors that Magnuson asked about, but he thought this illustrated an important point. Much of the ecological community gravitates toward assays that are straightforward and easily done on nonmodel systems. It would be useful to get molecular biologists and genome biologists together with people in the evolutionary community. “Just having interactions between folks for whom these factors are their daily bread versus folks like me who hadn’t really heard of them before, that’s going to catalyze a lot of important synergy,” he said.


One of the holy grails in biology is the ability to predict functional characteristics from an organism’s genetic sequence. Despite decades of research since the first sequencing of an organism in 1995, scientists still do not understand exactly how the information in genes is converted into an organism’s phenotype, its physical characteristics. Functional genomics attempts to make use of the vast wealth of data from “-omics” screens and projects to describe gene and protein functions and interactions. A February 2020 workshop was held to determine research needs to advance the field of functional genomics over the next 10-20 years. Speakers and participants discussed goals, strategies, and technical needs to allow functional genomics to contribute to the advancement of basic knowledge and its applications that would benefit society. This publication summarizes the presentations and discussions from the workshop.
