The workshop’s final, two-part session was devoted to looking at functional genomics going forward: where it is going, what obstacles might be encountered, and what might be done to help ensure the field’s ability to advance. The first part was an interactive session in which the participants divided into breakout groups to discuss a series of questions, after which one person from each group reported on those discussions. The second part was a “town hall” in which everyone had the chance to address the points heard during the meeting or bring up issues that had not been discussed.
Moderator Emma Farley of the University of California, San Diego, opened the first part of the session by describing its format and providing a list of issues and questions for each breakout group to address:
- List 5 to 10 research and knowledge goals for the field of functional genomics. Categorize each as a short-, medium-, or long-term goal.
- What obstacles are preventing these research and knowledge goals from being realized?
- What specific pathways or strategies could be used to overcome these obstacles? Are there strategies that could be used to overcome more than one of the obstacles listed?
The first group reporter, Gene Robinson of the University of Illinois at Urbana-Champaign, said that under the heading of research and knowledge goals should be “networks, networks, and more networks.” As a tool, networks can be used to study gene–gene interactions, protein–protein interactions, transcriptional regulatory interactions, chromatin, and more. One goal that came up during the workshop, noted Robinson, is to develop ways to integrate networks and use them at different scales and across different species in order to derive predictive information that could guide future work.
A second goal would be to develop tools to manipulate networks in graded ways, so that minor perturbations can be applied to nodes, both for validation purposes and for prediction.
A third goal is to be able to distinguish the different forms of “functional.” The word “functional” has different meanings, Robinson noted. Over the course of the workshop, “functional” was used mostly in the context of validation, but a deeper goal of functional genomics is to understand mechanism. So “functional” in the context of validation-type analyses should be differentiated from “functional” in the context of mechanistic analysis.
A fourth goal is “to understand how networks co-evolve within species and with lineage specificity.”
A fifth would be to grapple with the issue of the right unit of analysis for different sorts of research. “Is it the gene, or is it the gene variant? Are we focused at the right level?”
Beyond that, researchers should develop standardized methods for providing phenotypic information, Robinson said. Databases are offering an increasing number of standardized ways to deposit genetic and genomic information and then to access that information. However, there are not yet comparable ways of integrating phenotypic information and making it accessible.
Finally, Robinson said, sequencing the genomes of more species would be a useful foundational effort. At present, the genomes of only 0.2 percent of eukaryotic species have been sequenced.
Turning to obstacles, Robinson first mentioned funding for genomic infrastructure. An example of the value of such spending is the success of the National Science Foundation’s (NSF’s) Plant Genome Research Program, which has transformed that community.
A second obstacle is that the historic model genetic systems, such as Arabidopsis, Escherichia coli, Drosophila, and Caenorhabditis elegans, which were previously well funded at the National Institutes of Health (NIH), are systematically being pushed out. At NSF, these model systems are considered the province of NIH, and so proposed research on these species is less welcome at NSF. This leaves researchers in those model organism communities feeling that they do not have a home.
Concerning strategies to overcome obstacles, the group pointed to successful programs in integrating math and biology, which could be applied specifically to functional genomics.
Philip Benfey of Duke University reported for the second group. Viewing the exercise as a form of strategic planning, the group took the approach of splitting their goals into specific objectives, so that one can focus on individual objectives instead of mixing them together.
In the group, Benfey said, there was a strong consensus that functional genomics is a set of tools as opposed to an end in itself, and so the group’s objectives were biological in the form of the following questions: How do genes work? How do regulatory networks function? How does multicellularity function? What makes organisms different among themselves and within a species? How do organisms adapt and speciate?
The group offered specific goals for these high-level objectives. One is the need for understanding regulatory networks, as Robinson had already emphasized. Another is the ability to perturb at scale and observe how the network, not just the individual genes, responds. A third is to be able to define all the components of a network, and not just its component genes. Finally, the group offered a number of specific goals concerning how genes work; how they are organized, both in two and three dimensions; what they do; and when and where they are expressed.
Among the obstacles the group listed, Benfey said, was “the impossibility of doing a totally comprehensive analysis of anything,” which arises from the fact that even reasonably simple systems have far too many possible combinations to completely analyze every one. There were various suggestions for how to deal with that issue. One was to take a deep dive into an area that is of interest. Otherwise, one could try, as Aviv Regev discussed in her keynote address, doing a targeted random sampling that might provide 80 to 90 percent understanding by looking at only 20 percent of the relevant parts. Another approach would be to bring together as much existing knowledge as possible in one place so that it can be easily queried.
Finally, Benfey said, the group spent some time on what is more of a philosophical issue. “As humans, we need linear narratives. We think in straight lines.” But functional genomics is anything but linear. It pulls together many different factors. So how should one think about them? How should they be written about or presented? How does one get funding for them? “The problem comes back not to the quality or type of data, but to our inability to think of them except in linear narratives.”
Switching to the topic of challenges, Benfey noted the fact that functional genomics is all about generating large datasets. What is the best way to ensure that the people who generate them get credit for their work? How should the people producing those datasets be trained?
Some of the answers might be found, Benfey said, by looking to the success of genome sequencing. When researchers started sequencing a lot of genomes three decades ago, it was done in several different ways. In the case of E. coli, the sequencing was distributed to a large number of different labs, taking them 15 years to complete. In other cases the sequencing was done in centralized locations. Examining the successes and failures of these efforts could be useful to understand how to train the next generation of functional genomics researchers.
Reporting for the third group, Lauren O’Connell of Stanford University commented that her group had covered much of the same ground as the previous ones. One challenge her group identified was that multiple timescales and feedback loops are important in biological systems, but research measurements tend to be static and linear. A second concern was the lack of focus on phenotypes. To date, genotypes have received more attention because they are easier to measure, but a change is due. “We need ways to measure phenotypic diversity that is quantitative so that we can understand how environmental inputs shape the phenotype.” What makes some species more plastic than others? What are the genetic elements that govern plasticity? Her group thought these were major questions, O’Connell said.
Other challenges that O’Connell’s group discussed included the lack of generalizable computational and genomics tools, the lack of annotation of existing genomes, and whether to embrace diversity in humans and animals. So far, she said, genomics researchers do not know a lot about the genomes that they have, so would more genomes help? The answer the group came up with was “Yes, but we need to be able to annotate them.” Just having a sequence is not enough, she said. Annotation “is quite hard and a big bottleneck in our community.”
Richard Dixon reported for the fourth group. The group came up with three goals. The first was more multi-omics work and, specifically, proteomics. “We thought that the proteins were more predictive, and we want to get more goals based around that.” A second goal was adding an ecological context to provide more details for the genotype-by-environment interaction. Finally, more diverse models could be useful, moving beyond those such as the typical inbred mouse models.
Concerning obstacles, the group pointed to the need for interoperability of technologies across organisms and to the fact that proteomics technology is not yet sufficiently mature or robust. Also, they reemphasized the importance of education being both wide and deep.
As for specific strategies to reach these goals and overcome these obstacles, the group identified better undergraduate training and, specifically, cross-training in various fields, not just in different areas of biology, but in math and computational methods as well.
For the final discussion, moderator Gene Robinson emphasized that anyone who wished to make sure that a particular point of view was included in the proceedings should take this opportunity to make a comment.
The first commenter was Eve Wurtele from Iowa State University, who made a case for the importance of what she called the “dark transcriptome.” In any organism there are genes that are unique to that organism. These are called species-specific genes or orphan genes. In Arabidopsis thaliana, for example, some of the genes that have not been seen in any other species are very young genes and protein-coding genes.
The functions of some of these species-specific genes involve interacting with external organisms. Examples include attractants and the toxins of jellyfish. Another function involves interacting with an organism’s own existing networks. An example of an orphan gene that does both is the qqs gene of Arabidopsis. Its product both mediates pathogen resistance and increases the plant’s protein content. Because it interacts with internal networks, it can be taken from Arabidopsis and transferred into corn or soybean, where it also produces pathogen resistance and an increase in protein.
While some orphan genes are known to have functions, Wurtele said, scientists “don’t know how many of them do anything because people don’t usually study them.” The problem is that when RNA-seq data and proteomics data are analyzed, the analysis generally only includes genes or proteins that align with known genes or proteins. The result is that many orphan genes, particularly those that are younger, do not get annotated. Of 1,000 or so known orphan genes in Arabidopsis, for example, only about 10 percent have been annotated, Wurtele added.
Furthermore, evidence indicates that many of these orphan genes are transcribed and translated. “We do know that about 80 percent of these, at least in yeast and some other organisms, make proteins,” Wurtele said. The bottom line, she said, is that there is a dark transcriptome that includes not only many orphan protein-coding genes but also non-coding genes, “and all this is probably intimately involved in functional genomics.”
The next speaker touched on several topics including the observation that as sample sizes become increasingly large, it is inevitable that more and more genes will be found to have a connection with a particular phenotype. On the other hand, “we know … that not every gene is involved in every phenotype.” Furthermore, of the many genes that might play some role in a particular phenotype, some will inevitably be more important than others. The question, then, is how we find those genes and gene networks that are more important.
Scott Jackson of Bayer Crop Science asked Donal Manahan from NSF if there is an interagency working group in genomics or functional genomics similar to one that had existed for plant genomes. Manahan answered that there is such a group involving NIH, the U.S. Department of Agriculture, and NSF. Its first big meeting took place in August 2019, with many of the same people at the workshop in attendance. “It isn’t a formal interagency group,” he said, “but there’s certainly been a great deal of informal conversation over the last 6 months that we want to continue.”
Gary Churchill of The Jackson Laboratory offered a comment about the importance of variation in functional genomics. Variation underlies much of functional genomics, he said. “It’s the father of evolution. It’s what makes us all unique.” Variation makes possible many of the important approaches in functional genomics, and it is certainly important to understand how natural variability is distributed throughout populations. “But,” he said, “if we’re interested in function, natural genetic variation may not be ideally distributed in natural populations.” And that is why it is important to have constructs like the Drosophila melanogaster Genetic Reference Panel and various other panels such as those involving mice, corn, and Arabidopsis “where we can bring together genetic variation and use it as a tool.”
Next, Emma Farley of the University of California, San Diego, made a comment about what people mean by “functional genomics.” “I’m wondering if maybe I don’t understand what functional genomics is,” she said. “Two people have said to me that functional genomics is a set of tools and functional genomics is generating large datasets. That’s not how I think about functional genomics.” To her, she said, functional genomics is about trying to understand how the genome encodes biological function, whether it is how development is encoded, or how changes in the genome lead to evolutionary adaptations, or how the genome interacts with the environment. The tools were developed to help answer these questions. “I think that’s an important distinction.”
Next, she addressed Robinson’s comment from the first part of the session that one should keep in mind the distinction between functionality in relation to validation and functionality in regard to mechanism. “I would disagree that they’re different functions,” she said, “because what we’re trying to do is understand this really complex problem and we can use sparse sampling to find structure, and that’s looking for these patterns, but we need to equally validate that the patterns we’re seeing are accurate and that these patterns and biological structure actually are signatures of mechanism.” Thus, the two things cannot really be separated. “It’s really important to have validation of the true functional data so that you can actually get at the mechanism.”
Donal Manahan of NSF took the microphone to make some closing comments. First, he described the attendees as a “fearless bunch” because of their willingness to work in a field where the magnitude of the data and of the possibilities is so large. “Let’s start trying to make the case that one reason the science of biology is so dominant in the 21st century is that we’re starting to get our heads around scales of numbers that were just completely unheard of 10, 20, 30 years ago,” he said. Similarly, the group has no fear in moving from one area to the next, from molecular biology to cell biology, to organismal biology, to considerations of diversity in model systems, all put in an evolutionary context as the need arises. Furthermore, he said, “you had no fear at all of considering the massive infrastructure needs that are going to be needed to address this. You were honest, I felt, in pointing out the sometimes inadequacy of the way our current university systems and others are structured to be able to take on training for the next generation.”
One of the key lessons he said he would take from the workshop was the need for a new type of training. “Most of us in this room are professors or have had teaching experience, so we have a sense of how to train, but I think we recognize that the training of the 20th century isn’t really going to work for the biology of the 21st century when we take on these ginormous numbers as we think about the future of life sciences.”
Robinson, the chair of the workshop’s planning committee, closed out the session and the workshop with some brief comments. “I began a couple of days ago by saying that I thought we were at an interesting point in time with respect to genomics and the need to think about what the next steps are,” he said. “I think our conversations the last couple of days have really borne that out. We’ve heard some amazing science and we’ve also heard some really passionate expressions of frustration and need for how to go to the next level.”
Genomics is a very young science, he noted. It is only 40 years old. Furthermore, it is different from many other types of science. It is more of an enabling science that spans all of the classic subdisciplines in biology. “So we are pioneers,” he said. “We’re making it up here—how to take the initial discoveries and turn them into deep understandings of biology.” So, hopefully the conversations throughout the workshop have planted ideas in the minds of funders, especially NSF, about how to help the field of functional genomics go to the next level.