As Gene Robinson of the University of Illinois at Urbana-Champaign commented in his remarks in the workshop’s opening session, one of the keys to advancing functional genomics will be developing new sets of tools that make it possible to carry out “large-scale, discovery-oriented projects” that will derive knowledge in ways that are not yet possible and at scales that will massively accelerate researchers’ ability to explore the underpinnings of life. Thus, the workshop’s first set of talks was devoted to descriptions of some of the current tools and systems being developed and used at the cutting edge of functional genomics. In particular, as session moderator Lauren O’Connell of Stanford University said, the organizing committee wanted to hear researchers talk about their successes and failures in starting research with new organisms and about the tools they developed to map the genotype-to-phenotype landscape. O’Connell also mentioned the committee’s interest in hearing about successful examples of investigators moving tools from established research organisms to new ones.
The session’s speakers were Andrea Sweigart from the University of Georgia, who spoke about her work with monkeyflowers; Rachel Dutton of the University of California, San Diego, who described her work with microbial communities living on cheese rinds and the tools she has developed to study those communities; Zoe Donaldson of the University of Colorado Boulder, whose presentation dealt with the neurogenetics of sociality in voles; Dominique Bergmann of Stanford University, who explained how she has moved among diverse species of plants, using what is learned with one to work on another and piecing together a broader picture than would be possible using just one species; and Steven Henikoff of the Fred Hutchinson Cancer Research Center, who described a method of low-cost, high-resolution chromatin profiling that can be applied to a wide variety of organisms.
Andrea Sweigart opened her presentation by saying that she would be describing a community that is just beginning to move into functional genomics, so she would be touching on some of the considerations facing researchers getting started in this area.
With her work in monkeyflowers, she is studying genotype to phenotype, she said, and, in particular, seeking to understand how the environment changes that relationship. “And, as an evolutionary biologist focused on speciation,” she added, “I also want to understand how variation is maintained within populations and then eventually leads to divergence between populations and species.”
Sweigart first touched on the issue of how a researcher selects an organism to work on. Throughout her talk, she listed the following considerations:
- The phenotypes of interest should be present in the organism. “We want to have the full range of diverse phenotypes present,” she said, and not just whatever phenotypes are present in traditional model systems.
- The experiment should be tractable. You need organisms that can be grown in the lab or greenhouse or perhaps in experimental plots.
- Having access to natural populations is valuable. This is particularly true for those researchers studying ecology and evolution, and it is useful to be able to carry out experiments in the field.
- It is preferable to have a rich history of research in the organism, if possible, and a diverse and interactive scientific community working with it.
All of those attributes are present in monkeyflowers, Sweigart said, which is an incredibly diverse genus of wildflowers. The genus, Mimulus, includes about 150 species, with its center of divergence in western North America. Rapid adaptive divergence is a key feature of the genus, and most of its taxonomic groups contain largely interfertile taxa that are phenotypically distinct. Much of the phenotypic diversity in the group is driven by divergence in pollinator attraction and mating systems.
One highly studied group of monkeyflowers is the species complex Mimulus guttatus, which Sweigart illustrated with a slide showing the flowers from half a dozen members of this group (see Figure 3-1). They are similar in shape but have about a fivefold variation in size, which is mainly due to differences in how they breed. The smallest flowers occur in self-fertilizing plants that do not need large, showy flowers to attract pollinators.
In relation to the first requirement on Sweigart’s list of characteristics for choosing a research organism, Sweigart noted that “this genus is really famous for its ecological breadth.” M. guttatus lives in almost every imaginable habitat in the western United States. It is found on high alpine rocky outcrops, on dunes along the Pacific, in serpentine soils, in abandoned copper mines, and even in 60°C thermal soils in Yellowstone National Park. These varied habitats require significant adaptation, Sweigart said. For example, the monkeyflowers along the coast must be able to tolerate salt spray, and those in abandoned copper mines can thrive with levels of copper that are toxic to most other plants. Furthermore, monkeyflowers can adapt to these conditions with surprising speed. The copper mines, she noted, were established in the mid-19th century and abandoned by the 20th century, and so M. guttatus found a way to adapt in fewer than 150 generations in conditions where most other plants could not survive.
The genus is also experimentally tractable and has accessible natural populations, Sweigart said. They are easy to grow in the greenhouse and are highly fecund. Researchers can bring seeds from the plants collected in the wild into the greenhouse and carry out various experiments there.
After providing the background on monkeyflowers, Sweigart described two case studies from her own research to illustrate some of the challenges in functional genomics. In the first, she examined hybrid lethality between two species of Mimulus whose habitats overlapped. After bringing the two species into the lab and creating inbred lines, she crossed them to create hybrids. When these hybrids were bred to create an F2 generation, one-sixteenth of the plants died. To identify the genes responsible, she said, they used brute-force mapping and positional cloning as well as RNA-seq (RNA-sequencing), and ultimately traced the lethality to duplicates of the gene pTAC14 (Zuellig and Sweigart, 2018). In Arabidopsis thaliana, a well-characterized plant, the gene seems to be essential for proper chloroplast development.
This is what seems to be happening, Sweigart said: The gene was duplicated in one of the two species (M. guttatus), and subsequently the ancestral copy developed a one-base-pair deletion, which knocked out its function, so only one of the two once-identical genes now works. The other species, M. nasutus, never had the duplication. So when the two species are crossed, second-generation progeny that inherit no functional copy (only the non-functional ancestral copy from M. guttatus and the missing duplicate from M. nasutus) will lack chlorophyll and will die early during development.
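The one-sixteenth ratio reported for the F2 generation is what this two-locus explanation predicts. The following is a minimal sketch, assuming that the ancestral and duplicate positions are unlinked and that a seedling dies only when it carries no functional copy at either position. The locus labels `anc` and `dup` and the allele symbols are illustrative and do not come from the talk.

```python
from fractions import Fraction
from itertools import product

# Hypothetical labels: "anc" is the ancestral pTAC14 position,
# "dup" is the position of the duplicate.
# "F" = functional copy, "x" = non-functional or absent copy.
# M. guttatus line: anc = x/x (one-base-pair deletion), dup = F/F.
# M. nasutus line:  anc = F/F, dup = x/x (no duplicate).
# The F1 hybrid is therefore heterozygous at both positions.
F1 = {"anc": ("F", "x"), "dup": ("F", "x")}

def gametes(parent):
    """All equally likely gametes, assuming the two loci are unlinked."""
    return list(product(parent["anc"], parent["dup"]))

def f2_lethal_fraction(parent):
    """Fraction of F1 x F1 progeny with no functional copy at either locus."""
    pool = gametes(parent)
    lethal = 0
    total = 0
    for (a1, d1), (a2, d2) in product(pool, pool):
        total += 1
        if "F" not in (a1, a2, d1, d2):
            lethal += 1
    return Fraction(lethal, total)

print(f2_lethal_fraction(F1))  # 1/16
```

Only one of the sixteen equally likely gamete pairings lacks a functional copy, which matches the observed proportion of dead F2 plants.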
Gaining this understanding is just the first step. “What we really want to do is understand the evolutionary history of those genes,” Sweigart said. “We want to know about the evolutionary dynamics of those lethality alleles in natural populations.” Is the one-base-pair deletion neutral, for instance? Could it be evolutionarily advantageous? What is nice about this system, she said, is that it is one of the first where hybrid incompatibility genes have been identified and where their effects can be studied in populations in the wild (Sweigart et al., 2006; Sweigart and Flagel, 2015; Kerwin and Sweigart, 2020).
As it turns out, Sweigart said, the pTAC14 story might have been the best-case scenario, or at least not the worst. She has been working on another case of hybrid incompatibility where she has found it difficult to identify the causal variants. There are several reasons for this: the phenotype is more complex, there are many more functional candidates for this phenotype, and there is quite a bit of copy number and microstructural variation in the relevant regions. “So we’ve realized as a community,” she said, “that we’re only going to get so far with this kind of classic approach of brute-force positional cloning.”
Fortunately, she said, her team has been awarded a grant through the National Science Foundation’s Enabling Discovery through GEnomic Tools (EDGE) program to develop robust, repeatable transgenic techniques for use in many genotypes and species, along with mutant libraries that can be used in multiple species. Already, she said, there have been some preliminary successes by researchers using some of these transgenic approaches, but the methods have been hard to apply in individual labs that do not have all the necessary technical skills to do transgenic work.
In closing, Sweigart offered suggestions for what is needed to move forward. The field continues to need more functional tools. “We want the ability to do CRISPR homologous recombination for things like adaptation and speciation,” she said. “This is key because we want to be able to compare individual alleles from different lines.” She also mentioned reporter gene/sensor lines, as well as ATAC-seq (assay for transposase-accessible chromatin using sequencing) and ChIP-seq (chromatin immunoprecipitation sequencing). Furthermore, “we need more genomics resources, generally. We need whole genomes, we need the pan-genome.” Because there is such tremendous variation in the monkeyflower genus, one cannot work with a single reference sequence and expect to see the whole picture, she said. Finally, she named community resources, such as seed banks and distributors, as being critical for her work.
Rachel Dutton began studying microbial communities living on cheese rinds, she said, because of a conviction that a vast amount of biology was being missed by studying individual organisms in isolation. “My background is in E. coli genetics,” she explained, “but I started to get really interested in this idea that microbes don’t grow in isolation; they grow as parts of complex communities. So, what is the biology that is yet to be discovered if we put organisms into a more natural context?”
She set out to find a natural microbial community that could be studied in situ but that could also be separated and studied as individual components. A major challenge in the study of microbiomes is that researchers cannot culture most of the diversity present in a particular environment. In addition, many microbiomes consist of hundreds or even thousands of individual species living together, so it is not feasible to work with them experimentally. “I was looking for a system where we could actually take it apart into its individual components, and then ideally … put it back together in a lab in an in vitro system where we can manipulate the membership and manipulate the conditions under which the community is growing,” Dutton said. This would allow her to identify some of the mechanisms and functions operating in these communities.
She ended up focusing on the microbial communities that are found in fermented foods, such as cheese, beer, or wine. Most such foods form through the rapid growth of relatively simple microbial communities. She reasoned that she should be able to culture these communities because they grow rapidly and in well-defined systems.
In particular, she decided to study cheese rind biofilms, which are microbial communities that form on the surface of cheese during the aging process. Typically, the complexity of these communities ranges from low to medium, she said, or from about 3 to 10 member species. However, these members are phylogenetically diverse, with fungi and many different types of bacterial species from across different phyla.
“We showed that these systems are completely culturable,” she continued. “We can completely deconstruct them into their individual components,” she said, and can put them back together in vitro to create systems that represent the natural behavior found in the real world. “We have in vitro cheese in the lab” (Wolfe et al., 2014).
After spending about 5 years developing the system, Dutton set out to explore the biology of these communities. One of the advantages of working with a system where one has access to both in situ communities and in vitro communities is that it is possible to take both top-down and bottom-up approaches to studying their biology. The top-down approaches include such things as comparative genomics and metagenomics, while the bottom-up approaches include genetic screens and in vitro community manipulation. She described a high-throughput genetic screening approach in which her team manipulates in vitro communities to gain insight into what is happening in the natural system (Morin et al., 2018).
The work has been done on the simplest of cheese communities, a brie- or camembert-style community with three members: one bacterial species, Hafnia alvei, and two fungal species, a yeast, Geotrichum candidum, and a fungus, Penicillium camemberti. “Even within just the simplest community,” she noted, “we have quite a bit of phylogenetic diversity here.”
The approach they take is to use large barcoded transposon libraries (Wetmore et al., 2015). “You’re making large random mutant libraries,” she explained, “but each of the transposons in the library has a random barcode, so you can associate each insertion in the genome with a barcode sequence and follow the population changes in the library just by sequencing barcode abundances.” The library they used had a pool of about 150,000 mutants, which represented about 15 different insertions in each non-essential gene in the genome, she said.
The team uses the sequencing approach to measure the barcode abundances in the starting population, and then looks for mutants that are defective in a certain type of environment, that is, for genes required for growth in that particular environment. The strategy then is to grow the libraries under different conditions (by themselves in culture, on cheese alone, on cheese with partners, or on cheese with the entire community) and compare the different outcomes (see Figure 3-2). If a gene has a barcode that drops out of the population under growth alone and also in the interaction condition, this shows that the gene is always required. Another category consists of genes that are not required for growth in the “alone” condition but are required in the presence of interacting partners. A third category, which Dutton said they had not considered until they saw the data, is called interaction-alleviated genes, which consists of genes that are required in the growing-alone condition but not when a community is present.
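The three categories can be expressed as a simple decision rule on two yes/no observations per gene. This is only an illustrative sketch: the function name and the label "interaction-specific" for the second category are assumptions made here, and in practice the required/not-required calls would come from a statistical analysis of the barcode data.

```python
def classify_gene(required_alone: bool, required_with_partners: bool) -> str:
    """Assign a gene to one of the categories described in the text.

    required_alone: barcodes for this gene drop out when the library
        grows by itself.
    required_with_partners: barcodes drop out when the library grows
        with interacting partners.
    """
    if required_alone and required_with_partners:
        return "always required"
    if required_with_partners:
        # Label chosen here for illustration; needed only with partners.
        return "interaction-specific"
    if required_alone:
        return "interaction-alleviated"
    return "not required"

# Hypothetical calls for three made-up genes.
calls = {
    "geneA": (True, True),
    "geneB": (False, True),
    "geneC": (True, False),
}
for gene, (alone, partners) in calls.items():
    print(gene, classify_gene(alone, partners))
```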
In the first pass of their experiments, Dutton said, her team used a pre-built library that was transformed into E. coli. E. coli was used for practical reasons, because of all the knowledge and tools available for this well-studied organism. “So we grew E. coli alone in these conditions, we grew it with individual partners, in three-member communities with two pairs of cheese partners, and then in a complete community, and compared all of these results,” Dutton explained.
They calculated the gene fitness data by comparing barcode abundance in the inoculum with abundance after growth on their in vitro medium, which is a cheese curd agar. What they found was a large number of genes that were required in the grow-alone condition and a somewhat smaller set of genes that were required to grow in the presence of a community. The overlap between these two sets was characterized as “core requirements.” Some genes were not important in the alone condition but became important when they were growing in the context of a community, and there was a relatively large set of genes that were important alone but no longer needed when growing in a community.
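One common way to turn barcode counts into a per-gene fitness value is a log2 ratio of relative abundance after growth to relative abundance in the inoculum, averaged over the barcodes that map to each gene. The sketch below assumes that definition, plus a pseudocount to avoid dividing by zero. It is not the published pipeline, and all barcode names, gene names, and counts are invented.

```python
import math
from collections import defaultdict

def gene_fitness(inoculum, grown, barcode_to_gene, pseudocount=1.0):
    """Mean log2 change in relative barcode abundance for each gene."""
    total_start = sum(inoculum.values())
    total_end = sum(grown.values())
    per_gene = defaultdict(list)
    for barcode, gene in barcode_to_gene.items():
        start = (inoculum.get(barcode, 0) + pseudocount) / total_start
        end = (grown.get(barcode, 0) + pseudocount) / total_end
        per_gene[gene].append(math.log2(end / start))
    return {gene: sum(vals) / len(vals) for gene, vals in per_gene.items()}

# Invented example: two barcodes per gene.
barcode_to_gene = {"BC1": "geneA", "BC2": "geneA", "BC3": "geneB", "BC4": "geneB"}
inoculum = {"BC1": 100, "BC2": 100, "BC3": 100, "BC4": 100}
grown = {"BC1": 3, "BC2": 1, "BC3": 200, "BC4": 196}

fitness = gene_fitness(inoculum, grown, barcode_to_gene)
for gene, value in sorted(fitness.items()):
    print(f"{gene}: {value:.2f}")
```

A strongly negative value, as for the made-up `geneA` here, indicates that mutants in that gene dropped out of the population, so the gene is likely required under that growth condition.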
The latter type of gene, the community-alleviated genes, frequently map to amino acid biosynthesis pathways. “What this tells us,” Dutton said, “is that E. coli, when it’s growing alone on cheese, … has to make its own amino acids. If you knock out any genes in amino acid biosynthesis, when E. coli is growing by itself, it will die.” However, if it is growing in a community, the data imply that some other members of the community are providing it with the required amino acids. When they looked at their data in more detail and examined individual pair-wise contributions to the fitness effects, they found that it was only the fungal species that were producing this cross-feeding effect. What they believe is happening is that fungal species are secreting proteases and breaking down proteins into peptide chains and amino acids that E. coli is then able to use.
Finally, Dutton described some high-order interactions that they were able to detect in their data by looking at the patterns of genetic requirements of E. coli grown alone versus in pair-wise combinations versus in the community. Strangely, there were situations where a gene was not required when E. coli was growing alone, was required in paired conditions, but was not required when growing in a community. There were also combinations of genes required in the alone condition, not required in the paired condition, and required again in the community.
By looking at the data more closely and working with people who were more familiar with quantitative epistasis in gene patterns, Dutton was able to determine what was going on in these higher-order interactions. She described one such situation: “So there’s a multi-drug efflux pump in E. coli made up by acrA and B proteins,” she said. “When E. coli is grown alone, these genes are not required. It doesn’t need a drug efflux pump when it’s growing by itself.” However, when E. coli is grown in the presence of Geotrichum, the pump is now required, which indicates that Geotrichum is producing some antimicrobial that E. coli needs to pump out in order to survive. But when E. coli is grown in the presence of the community, again the pump is not required because Hafnia is somehow negating the effect of the antimicrobial produced by Geotrichum, allowing E. coli to survive without the drug efflux pump.
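The higher-order patterns described here share one signature: the requirement seen in a pairwise condition differs from the alone condition, but the full community returns the gene to its alone-condition status. A minimal sketch of that check follows, assuming each condition has already been reduced to a required/not-required call. The function name is hypothetical.

```python
def is_higher_order(alone: bool, paired: bool, community: bool) -> bool:
    """True when a pairwise effect is reversed in the full community.

    Each argument says whether the gene is required in that condition.
    """
    return paired != alone and community == alone

# The efflux pump example from the text: not required alone,
# required with Geotrichum, not required in the full community.
print(is_higher_order(alone=False, paired=True, community=False))  # True

# The opposite pattern also mentioned: required alone,
# not required in a pair, required again in the community.
print(is_higher_order(alone=True, paired=False, community=True))  # True
```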
In closing, Dutton listed the successes of her team and the roadblocks they still face in their work. They have successfully implemented some high-throughput screening techniques in a relatively new model system by using E. coli as a proxy for what was happening in the environment. Since performing those experiments, they have made libraries in cheese-related microbial species and have detected many types of interactions among the organisms, even in a simple community.
One of the greatest experimental challenges, she said, is that about 30 percent of the genes they identify, even in E. coli, have no known function. A second challenge is that the biological insights they obtained from their screens were mostly limited to well-characterized areas of biology, such as metabolism.
Moving forward, she said, what they really need are medium- and high-throughput ways to efficiently categorize, prioritize, and characterize the genes that come out of the screens. What, for instance, are the genes that represent new biology that is only happening in communities, and how can comparisons be made across systems?
“We are not really interested in learning about cheese,” Dutton said in conclusion. “We want to learn general principles, general mechanisms. How do we take our findings in this relatively simple and tractable system and compare it to other systems to figure out what are the generally important pathways and processes?”
The vole is a small rodent related to hamsters and lemmings, and, as Zoe Donaldson explained, it offers an example of how vast behavioral differences can exist even between closely related species. In particular, the prairie vole and the meadow vole live in the same habitats and appear to be nearly identical, but behaviorally they are distinct from one another. Prairie voles are monogamous. A male and a female will mate, share a territory, and raise their offspring together. Meadow voles, by contrast, are promiscuous. “Females will become behaviorally receptive to males, will mate sometimes with multiple males, and then go off and raise the offspring by themselves,” Donaldson said.
The question Donaldson asked was what genetic differences underlie this behavioral difference and make the brains of the prairie voles, but not the meadow voles, capable of forming long-term pair bonds. It is a question that cannot be answered in a mouse model, Donaldson noted, because “mice are not monogamous and you can’t study pair bonding if it doesn’t exist within your species.” So, it was necessary to develop the tools to study the question in voles.
Skipping over much of the history of how potentially relevant genes were discovered, Donaldson identified the gene for vasopressin receptor 1a (V1aR) as crucial to the difference in behavior (Lim et al., 2004). In particular, there are striking differences in how the gene is expressed in the brains of the two species. One brain region with a particularly large difference is the ventral pallidum, where there are high levels of gene expression in the prairie vole (the monogamous species) and lower levels in the meadow vole (the promiscuous species) (Phelps and Young, 2003). Researchers have demonstrated that these expression levels are critical for some of the differences in social behavior by using a viral vector to increase the expression of the gene in the ventral pallidum of the promiscuous species, after which the animals “show an affiliative preference for the animal they’re mating with, which is the basic hallmark behavior that you need in order to be monogamous” (Hammock and Young, 2005).
This fascinating transformation shows “that you can change the expression pattern of a single gene, and the architecture’s already in place within the brain to completely transition your mating,” Donaldson noted. She set out to find the genetic basis of the difference in expression patterns in the vasopressin receptors in these animals.
Her early work on this system took place in the pre-genome era, Donaldson said, so she had to use the tools that were available at that time. “We fished out the gene from a phage library and found that the coding region of this gene was nearly identical between these species,” she said. In contrast, there is a length of repetitive DNA upstream of the gene that is nearly absent in the promiscuous species but more than 600 base pairs long in the monogamous species.
Furthermore, there is an allele-like variation in the lengths of that repeat-containing element in the monogamous species that was shown in breeding studies to be associated with individual differences in social behavior, such as how attentive a father is toward his offspring. This information led her to hypothesize that variation in this repeat-containing element directly contributes to both species-specific and individual differences in patterns of expression of the vasopressin receptor gene.
At the time, she did not have the tools to test this hypothesis in voles, and so she turned to mice, where it was possible to manipulate the genome in a specific way. This mirrors how Dutton used E. coli as a tractable system with the eventual goal of moving to more complicated and less studied yeast and fungal communities. To start her mouse work, Donaldson took a 3.5-kb region upstream of the gene encoding the V1aR from voles (the region that contained the repeat element) and put it into mice, replacing the corresponding region of the mouse genome (Donaldson and Young, 2013). She did this for three versions of the region, one from the promiscuous species and two different versions, one with a longer repeat than the other, from the monogamous species. Because the genetic differences in these mice were limited to the repeat-containing element, Donaldson could assess the effects of the repeats cleanly, without having to worry that the actual functional variant was something else nearby in the genome that was causing the differences in expression patterns.
The expression of the V1aR gene in the transgenic mice looked much closer to the expression of the gene in normal mice than the expression in voles. However, there were particular brain regions with clear changes in the gene expression pattern, including the dentate gyrus, part of the thalamus, and the central amygdala. “This wound up being a silver lining,” she said, “because now instead of having to look at the entire brain and the complexity of gene expression within multiple brain regions, I was able to focus on what was going on within these three separate brain regions.” Also, she added, the similarity in gene expression patterns in other regions of the brain implied that there were other regulatory elements outside of that 3.5-kb region that were driving the gene expression in those other parts of the brain.
When Donaldson examined how patterns of gene expression differed among the three types of transgenic mice in those three relevant brain regions, she found that the patterns differed in the same direction as the patterns in the three voles—that is, the promiscuous species and the two versions of the monogamous species with the shorter and longer repeats. For example, there were higher levels of gene expression in the dentate gyrus in a prairie vole than in a meadow vole, and that was recapitulated in the mice carrying the prairie vole or the meadow vole versions of the repeat-containing element.
“If we put all of this together,” Donaldson said, “what we’ve essentially learned is that DNA diversity in these regulatory elements leads to species and individual differences in the expression of this gene, but in a brain region-specific way, such that we have brain regions that are essentially untouched by this manipulation.”
An important aspect of this, Donaldson said, is that repeats of the sort responsible for the differences in behavior mutate at a much faster rate than the rest of the genome. If such repeats are in the right areas of the genome, they can provide an evolutionary mechanism for generating diversity in gene expression, acting as a sort of “tuning knob” in a specific brain region.
One aspect of the work that concerned her, Donaldson said, was that it had been done in mice, not voles. How might the results have been affected by the fact that the repeat-containing elements were put into mice? What effect might the genomic milieu of the mice have had on the regulatory elements from the voles? “One of my goals has been to develop techniques that will allow us to eventually answer this question,” she said.
Her first step in that direction was to develop the first germline transgenic prairie voles. She did this by injecting a lentivirus into embryos, allowing the lentivirus to infect the embryo and place DNA into its genome (Donaldson et al., 2009). While useful, this technique only allows one to add genes, not delete them, and furthermore, the lentiviral constructs become repressed after a few generations.
She has also teamed up with Devanand Manoli to use CRISPR to do germline knockouts in voles. It is a powerful technique, Donaldson said, but it is also challenging for a number of reasons. In addition, both CRISPR and the lentivirus technique require a great deal of optimization for each species. For instance, whereas researchers have learned how to cause mice to superovulate, the ability to do so in voles is still extremely limited, which means that producing just one knockout vole requires a tremendous number of animals and is labor intensive. “We are working [on] moving from prairie voles to meadow voles,” she added, “so we’ll soon get a sense of how much of a challenge there is even moving these techniques across closely related species.”
There is also what Donaldson called a “conceptual limitation” of the CRISPR approach. She explained that when a gene is knocked out, it gets rid of the expression of that gene throughout the life course of the organism and across the entire organism. This is particularly important in the case of V1aR, which is encoded by a multi-faceted gene that mediates many different aspects of monogamous behavior but does so via its activity in different brain regions. For example, the vasopressin receptor is known from various studies to act within the pallidum to influence not only mating behavior but also parental care—in essence, whether fathers take care of their offspring. In the hypothalamus, however, it acts to influence mate guarding, a behavioral characteristic that reinforces the bond with the mate, and in the retrosplenial cortex it may be involved in use of space and, therefore, male fidelity in the species.
“The question then,” Donaldson said, “is how we begin to parse out the pleiotropic effects of these genes, and that is where my lab is currently making a lot of effort to develop ways to go in and selectively manipulate gene expression in adult animals in specific brain regions.” She described one success that her lab has had in this effort, which involved injecting short-hairpin RNAs (shRNAs) in particular areas of the brain to decrease the expression of vasopressin receptors. And at present, she said, her lab is working with a slightly modified version of that approach using CRISPR to either inhibit or activate the transcription of these genes within specific brain regions.
She closed with a look to the future. First, she said, her lab is still trying to identify the genetic elements that contribute to species and individual differences in social behavior. To do this, it would be helpful to have high-quality genomes. Having a reference genome is not enough, she said. “There’s a huge amount of SNP [single nucleotide polymorphism] variation within these voles, so even something as simple as developing a guide RNA can be incredibly sensitive if you have a SNP within the binding of your guide RNA region.” She also reemphasized the need to be able to manipulate the genetic elements in the appropriate genomic or organismal context.
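The point about SNPs and guide RNAs suggests a simple screening step: before ordering a guide, check whether any known variant in the animals being used falls inside the guide's target site. The sketch below assumes the target site is given as a half-open coordinate interval on a reference sequence and that variant positions use the same coordinates. The function name and all coordinates are hypothetical.

```python
def snps_in_target(target_start, target_end, snp_positions):
    """Return the SNP positions that fall inside [target_start, target_end)."""
    return sorted(p for p in snp_positions if target_start <= p < target_end)

# Hypothetical 23-base target site (a 20-base guide match plus a 3-base
# PAM, as for the commonly used SpCas9 enzyme) and made-up SNP positions.
target_start, target_end = 1000, 1023
known_snps = [1005, 1030, 1018]

hits = snps_in_target(target_start, target_end, known_snps)
if hits:
    print("Guide overlaps known SNPs at:", hits)
else:
    print("No known SNPs in the target site")
```

A guide with any overlapping variant could then be redesigned or checked against the genotype of the specific animals to be edited.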
Second, she said, she needs strategies for parsing pleiotropy. In particular, she needs to be able to manipulate gene expression in a regional- and temporal-specific manner.
Finally, she said, “as a neuroscientist I feel the need to point out that genes don’t encode behavior. Genes encode mRNAs that make proteins that alter your neuronal function, and ultimately, you get differences in behavior.” So, it is important to think in terms of what might be called “neural intermediate phenotypes” and to look for general principles related to how genes can affect neurons.
While much can be learned by studying the functional genomics of one species or a group of closely related species, Dominique Bergmann of Stanford University said, there are also benefits to using what is learned in one species when working on another. In her presentation she described how she studies the genetics of diverse plants to uncover rules of developmental fate, pattern, and resilience.
Her research focus is the development of plants, Bergmann said, and she seeks to answer a series of questions first in one model system and then in others:
- How do patterns emerge?
- How are specialized cells made?
- How do environmental inputs modify development?
- Is it possible to make plants that survive or mitigate their biology in response to changing climates?
The last question is a particularly practical one, she said, because people depend on plants for survival, and plants will inevitably be changed as the climate changes.
Like many of the other speakers in the workshop, Bergmann said, she is interested in how one moves from genotype to phenotype, but the usual picture of a single arrow pointing from a genotype at the level of DNA information to a phenotype of patterned cells and tissues should be thought of as having many different and non-linear steps.
To deal with that complexity, she said, researchers typically work with models, both model organisms and “models that extract features that are common to many developmental decisions, but do it in a very simple way.” Her lab works with the model organism Arabidopsis and focuses on a specialized cell type, the stomatal guard cell. A pair of guard cells forms a valve that lets carbon dioxide into the plant and water vapor and oxygen out. Stomata are arranged in patterns that are environmentally determined in some ways, but there are also some hardwired patterns.
“Based on lots of peoples’ work over the last two decades,” she said, “we have a pretty good molecular picture of what’s required to make and to pattern these cells in Arabidopsis.” In a young leaf the cells are all essentially equivalent, but then under the direction of some key regulators, the leaf cells proliferate and then differentiate. Much of the patterning of cells on the leaf is decided by cells communicating with one another with secreted peptides and cell surface receptors, but there is also environmental and systemic information that is integrated into decisions on the development of the various cells. She noted the importance of being able to capture live how different parts interact in a dynamic system. That has been possible in Arabidopsis because of all of the tools and the deep knowledge available for that one system.
The question, though, is what happens when one moves away from Arabidopsis, Bergmann said, “because, frankly, no one eats Arabidopsis, and it doesn’t contribute a whole lot to the global climate cycles.” In the case of broad-leaf crop plants that might share much of their genome with Arabidopsis, she said, it might be simple to use the knowledge from Arabidopsis about what genes might be core features of environmental resilience, and then transfer them and their regulatory systems into the crop plant. However, in plants that are distantly related, including those with the highest commercial value, there may not be such a direct path.
Bergmann’s focus during the presentation was on this second issue. Moving from Arabidopsis, Bergmann’s group decided to look at the cereal crops because they are extremely important both economically and ecologically, and they also operate differently than Arabidopsis at the level of development. The cereals are a group within the grasses, and grass stomata are patterned differently on the leaves, for example. Another important consideration, she said, was that a great deal of work had already been done on grasses and a lot of tools were already in place.
In particular, Brachypodium distachyon, whose common name is “purple false brome,” has become an important model for cereals. Its genome was published about 10 years ago, and the genome includes many homologs of genes that Bergmann had studied in Arabidopsis. But there were also key phenotypic differences between Brachypodium and Arabidopsis. In particular, the stomata on Brachypodium, like those of all grasses, were composed of four cells rather than two, as in Arabidopsis. The two extra cells line up on either side of the guard cells and are called “subsidiary cells.” The grasses are more resilient during drought, and one reason appears to be the performance of their stomata. And this is something that studies in Arabidopsis can say nothing about, she commented.
Bergmann’s team set out to understand this novel feature. The Brachypodium genome had already been worked out, but more tool optimization was still needed. They were able to optimize transformation conditions and create constructs with engineered mutations that allowed them to learn a lot about the system. However, to answer questions about novelty, they turned to genetics—in particular, forward genetics—to identify the factors required for innovations.
“We really were interested in this novelty, these great subsidiary cells that are so wonderful for making these functional complexes,” Bergmann said, so they screened many plants with a microscope and finally found one that did not have them. “Great, we have a mutant,” she said. “How do we find that gene?” As it turned out, they were able to find it fairly quickly. Multiple accessions had been sequenced, and these enabled rapid mapping. The mutation turned out to be a small deletion in a transcription factor.
Bergmann’s group was surprised to find that the transcription factor was similar to one used in Arabidopsis to make a precursor to the stomatal guard cells. However, in Brachypodium it appears to switch its function to make the subsidiary cells. There was a change in the relationship between this one gene and the phenotype—that is, which type of cell it creates.
Ultimately, they discovered what was underlying that change in the relationship between the genotype and phenotype. It turns out that while the protein was being expressed in some cells, it was not expressed in those cells where the team believed it was required to carry out the function. The explanation, Bergmann said, is that in plants transcription factors can move from one cell to another. So the innovation—the way that the gene has changed its activity—was not by being expressed in a different place because of changes in the promoter region or by having a different biochemical function, but by changing its ability to move from one cell to another.
Summing up, Bergmann said, “So what we found here by moving from one species to another was a rewiring,” and the discovery was enabled by the fact that previous work had been done on the new organism they were interested in. “We had populations that we could screen, we had fairly good genomes, and then we created a number of tools because the genotype-to-phenotype connection needed those intermediate steps filled in.” Their success was due to a number of factors, including long-term funding by the U.S. Department of Agriculture and the U.S. Department of Energy for the sequencing of many Brachypodium accessions, the building of molecular tools for transformation, and the creation of mutant libraries. Furthermore, she noted, there is an active and open Brachypodium community “that is driven to create things and share them.”
In closing, Bergmann asked which tools are needed to find and understand innovations. In the case she described, what they actually identified was a rewiring. “Plants reused a gene that we knew a fair bit about,” she said. But her goal is to look more broadly at diversity in animals and plants and see what can be learned. “We’re looking for innovation, we’re looking for novelty,” she said. “How do we actually find that?”
Since 2012, she said, her team has identified eight mutations in their Brachypodium screen. Four of them were known genes with conserved functions in Arabidopsis, and two were known genes that had been rewired to have different functions, including the one she described, but two were novel genes. The novel genes have not been published, she said, and they will not be for a while. Understanding real novelty takes time.
To find and understand developmental organization, she concluded, “Choose the question, and then choose the organism.” Once the organism is chosen, genome and transcriptome sequences and gene editing capabilities are prerequisites. There is an opportunity to revisit classical systems that may have been abandoned for a while but have already provided extensive descriptive data. Much of the foundational work to build new systems can be tedious.
Furthermore, there is a question as to whether the lengthy amount of time it takes to develop a new system is compatible with various timelines, such as grant cycles, Ph.D. program times, or postdoctoral funding. Finally, she asked, “How can we make going into these leaps attractive to young PIs [principal investigators]?” They represent the future.
Unlike the first four speakers in the session who had focused on one or a few particular organisms, the final speaker, Steven Henikoff of the Fred Hutchinson Cancer Research Center, described some epigenomics tools his lab has developed that can be applied to a wide variety of organisms. In particular, he spoke about methods to perform low-cost, high-resolution chromatin profiling.
Most people who are interested in the genotype-to-phenotype connection will want to perform chromatin profiling at some point, Henikoff said, and the most common chromatin profiling approach is chromatin immunoprecipitation (ChIP). He explained that the ChIP process involves four basic steps: (1) the DNA and associated proteins on the chromatin are cross-linked, (2) the resulting complexes are broken up into pieces about 500 base pairs long, (3) an antibody is added to precipitate out the protein of interest, and (4) the DNA is purified and the fragments are sequenced. “This part of the process really hasn’t changed for the last 35 years,” he said, but over the past decade or so, ChIP-seq has grown rapidly in popularity, and it has served as the basis for several large genome-scale projects, including the Encyclopedia of DNA Elements (ENCODE) project.
Because the ENCODE project relied so heavily on ChIP-seq, the group running the project developed a set of standards for using the technique. One of those standards was that there should be at least 10 million reads per replicate. Even with the cost of sequencing going down dramatically, Henikoff said, it can still be quite unwieldy and expensive to store and manipulate the larger and larger datasets generated with ChIP-seq, and so there is room for an alternative.
As it happens, there are other ways to perform chromatin profiling. In particular, Henikoff’s lab modified a chromatin immunocleavage method to create what they call “CUT&RUN” (cleavage under targets and release using nuclease) (Skene and Henikoff, 2017). He described the technique in this way: Live cells are mixed with Concanavalin A beads, which help the cells stick together (see Figure 3-3). In the next step, the cells are made permeable and an antibody against the desired target, such as a transcription factor, is added. The antibody diffuses into the cell and finds its target, and then a fusion protein of protein A complexed to micrococcal nuclease (MNase) enters the cell, where protein A binds to the antibody. Adding calcium activates the MNase, which cleaves the DNA on both sides of the target, at which point the cleaved DNA is extracted and sequenced. “From live cells to purified DNA, it takes about a day,” Henikoff said.
An important advantage of CUT&RUN is that the backgrounds are very low. The reason for that, he explained, is that unlike ChIP, where one solubilizes the entire cellular contents and then grabs the antibody like pulling a needle out of a haystack, “with CUT&RUN we leave the haystack behind.” The lower background means that fewer reads are needed to get a clear signal. To illustrate, he showed ChIP-seq data from ENCODE that was generated with 56 million reads and comparable data from CUT&RUN with only 7.5 million reads. When the ENCODE data were restricted to just 7.5 million reads, the background noise was much worse and obscured some of the peaks in the data (Skene et al., 2018). That background is what makes ChIP-seq so expensive, he commented. CUT&RUN is also sensitive enough that even with just 100 cells, the resulting data are of reasonable quality, Henikoff said, and it gives base-pair resolution.
Because of the sensitivity of CUT&RUN, Henikoff became interested in using it in the clinic. As a proof of concept, they set up a robot to carry out high-throughput CUT&RUN using patient-derived xenograft tissue (Janssens et al., 2018). They produced about 6 million mapped human fragments per sample for an in-house cost of about $50. This produces the entire epigenome, he said. “It’s very informative data and really doesn’t cost a whole lot.” Since that initial proof of concept, Henikoff set up the robot for a number of his clinical colleagues to do chromatin profiling, and from there it has spread to become “quite popular” and has been one of the most viewed protocols on Protocols.io, a popular open-source website for laboratory protocols.
Nonetheless, Henikoff said, he believes that there is an even better way to do chromatin profiling that was recently developed by a postdoc in his lab and is called CUT&Tag (cleavage under targets and tagmentation) (Kaya-Okur et al., 2019). It is similar to CUT&RUN but with the transposase Tn5 in place of MNase, and the activation is done with magnesium instead of calcium. A major difference is that the DNA segments acquire tags on their ends so that it is possible to create sequencing libraries from the samples. Again, the process is fast, taking about 1 day from live cells to sequencing-ready libraries.
The process also has the same advantage as CUT&RUN in terms of low background. As an example, Henikoff showed some data where CUT&Tag identified the same peaks as ChIP-seq but with 8 million reads instead of 27 million. One million probably would have been enough, he said. “The background’s down and we don’t have to sequence that deeply.” The comparison with ATAC-seq was similar, with CUT&Tag identifying the same peaks as ATAC-seq with 6 million reads instead of 34 million reads or more, depending on the lab that had done the work.
And, like CUT&RUN, CUT&Tag is very efficient. In one test, Henikoff’s lab used data from 6,000 cells (4.6 million reads), 600 cells (4 million reads), and even just 60 cells (2.8 million reads), and in each case the peaks were comparable to those from the ENCODE ChIP-based data that relied on 56 million reads. Taking this pattern to its extreme, Henikoff’s lab has even carried out the technique on single cells. With CUT&Tag, he explained, “everything basically holds together in the cells, so we go from antibody binding all the way through to the tagmentation, and the cells are intact.”
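The read-depth comparisons Henikoff cited can be put on a common footing with a little arithmetic. The sketch below is illustrative only: it uses the read counts quoted above (in millions), and the dictionary labels and function names are assumptions introduced here, not part of any published pipeline.

```python
from typing import Dict, Tuple

# (baseline reads, alternative reads), in millions, as quoted in the talk summary.
READ_DEPTHS_MILLIONS: Dict[str, Tuple[float, float]] = {
    "ENCODE ChIP-seq vs CUT&RUN": (56.0, 7.5),
    "ChIP-seq vs CUT&Tag": (27.0, 8.0),
    # 34 million is a lower bound; the summary says "34 million reads or more".
    "ATAC-seq vs CUT&Tag": (34.0, 6.0),
    "ENCODE ChIP-seq vs CUT&Tag (6,000 cells)": (56.0, 4.6),
    "ENCODE ChIP-seq vs CUT&Tag (600 cells)": (56.0, 4.0),
    "ENCODE ChIP-seq vs CUT&Tag (60 cells)": (56.0, 2.8),
}


def fold_reduction(baseline_reads: float, alternative_reads: float) -> float:
    """Return how many times fewer reads the alternative method used."""
    if alternative_reads <= 0:
        raise ValueError("alternative_reads must be positive")
    return baseline_reads / alternative_reads


def summarize(depths: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each comparison label to its fold reduction in sequencing depth."""
    return {label: fold_reduction(base, alt) for label, (base, alt) in depths.items()}


if __name__ == "__main__":
    for label, fold in summarize(READ_DEPTHS_MILLIONS).items():
        print(f"{label}: about {fold:.1f}x fewer reads")
```

These ratios describe only the specific datasets shown in the talk. They say nothing on their own about peak quality, which Henikoff addressed separately by comparing the peaks themselves.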
Henikoff offered some closing thoughts. First, he said, ChIP-seq is very wasteful; more than 90 percent of the sequence is noise. Both CUT&RUN and CUT&Tag offer more information with fewer reads. “Getting rid of the background is key,” he said. It also seems that the two methods can be used to perform routine clinical profiling that is cost-effective. Also, the two techniques are highly reproducible, while ChIP-seq is inherently not reproducible because of the use of cross-linking and sonication to break the chromatin complexes into fragments.
Finally, he said, the online protocols allow almost anyone, even undergraduates, to use CUT&Tag. “The cheaper it gets, the easier it gets, the more people are going to do it themselves,” he said, and this “do-it-yourself approach” is the future of chromatin profiling.
Following the presentations in this session, Gary Churchill from The Jackson Laboratory opened the discussion period with a comment about Donaldson’s presentation. “I admire your heroic work to create transgenic mice and transgenic voles,” he said. Researchers at The Jackson Laboratory carry out transgenesis at an industrial scale (people can order customized transgenic mice and get them quickly), but it generally only works in C57 black 6 mice or another equally well-studied strain. As soon as one moves beyond these basic strains, even staying with Mus musculus domesticus, everything falls apart. “A lot more work needs to be done to optimize transgenesis in genetically diverse mice, much less reaching out to voles and plants and everything else that we need transgenesis for,” he said, and he asked Donaldson for her thoughts.
Donaldson agreed that bespoke methods are typically required to make germline transgenesis work in diverse species. Thus, she said, her preference is often just to use a workaround. For instance, viral vectors tend to work well between species. But it seems unlikely, she said, that there will ever be germline transgenesis techniques that are not species specific to some degree.
Sweigart commented that this is also a problem in plants, “where many of the biological aspects of why transformation even works are mysterious to us.” So, it is difficult for researchers in the area to understand genotype-specific reasons for why some things work and others do not. At least there is quite a bit of funding for mouse research, she said. “How do we deal with these sorts of problems in organisms for which there’s never going to be that sort of funding?”
The same is true for bacteria, Dutton added. She and her colleagues have done much of their work in E. coli because of the availability of so many genetic tools that work well in that organism. They have been able to translate those tools easily to bacteria that are closely related to E. coli, but as soon as researchers move away from that species group to other less closely related bacteria, she said, “the implementation of preexisting genetic tools becomes really spotty.”
Bergmann agreed with these speakers and went on to summarize ideas the previous speakers had touched on related to research that does not involve well-studied “model organisms.” Although tools such as CRISPR are, in theory, available to all researchers in functional genomics, most researchers must still figure out how to adapt those tools to their own organisms, and this adaptation is the biggest bottleneck in the field.
Jason Rasgon of The Pennsylvania State University offered a more positive outlook. His lab has been “reasonably successful in developing new delivery methods working across a wide taxonomic diversity of arthropods, and we’re even moving into non-arthropods, mollusks, and potentially some vertebrates,” he said. In short, there are some novel ways to take some of these genomic technologies and move them into “non-model species.”
One audience member asked the panelists for thoughts on how to start exploring genes of unknown function. That is a central question, Donaldson responded. She noted that many papers do a quantitative trait locus (QTL) analysis and find the gene they expected to find based on the literature. Frequently, these papers ignore the gene hits with no known function because the researchers do not know how to start experiments with these hits. She proposed a funding model in which a researcher could study these genes of unknown function, provided the researcher had a clear research plan. She hoped this would keep researchers from being dismissed out of hand when they try to study a challenging but important area, such as gene hits with unknown function.
Bergmann agreed and said that there are two different issues at play. First is “the unknown that we know.” Researchers do have various approaches to identifying the unknown. It just takes time and funding. She suggested that it would be a good idea if every grant asked “What are you proposing that will not have an answer in 3 years or 5 years but that will advance the field because it is really novel?” The second issue is “Are there other ways that we can explore gene function?” It will be difficult to discover these new ways without allowing researchers to, in a way, go back 50 years to when none of the current tools were available and “everyone had to do these fairly slow, painstaking ways of interrogating gene function.” Bergmann reiterated that researchers will need time and funding to follow such a path.
Another audience member offered a different perspective on the unknown, saying, “even in model organisms, there are tens of thousands of transcripts that are expressed at high levels that also make proteins, and they’re considered either lncRNAs (long non-coding RNAs) or usually not even considered at all.” Thus, the audience member concluded, even in a model organism there are many unknowns in terms of potential genes.
Sweigart expanded on that, saying that what is known and unknown also influences the phenotypes that researchers study because it is much easier to study the phenotypes that have well-known candidate genes. On the other hand, much of the interesting unknown biology is much more complex and thus not as likely to be the subject of a research project.
Finally, an audience member asked the panelists about training. As a young principal investigator, he said, he is having difficulty convincing the graduate students he hires, who have molecular and cellular backgrounds, to tackle the difficult computational challenges the work requires. What can be done?
Dutton responded that her department has just redone its graduate curriculum so that the core now includes biostatistics and bioinformatics classes, while some of the molecular and cellular biology classes that were previously part of the core are now electives. “So, we’ve had to do a lot of hiring in those areas to teach those on a wider scale,” she said. Before that, her students were mainly self-taught through online tutorials. But there is indeed a gap between what students have traditionally been taught and what they need to know in the current functional genomics environment. Departments and universities are only now starting to address this gap, not just at the graduate level but at the undergraduate level as well. This is a shift that needs to happen, she said. (Further discussion on functional genomics education is highlighted in Chapter 9.)