The National Plant Genome Initiative: Objectives for 2003–2008

CHAPTER FOUR
Functional Exploitation of Genome Sequences

FUNCTIONAL GENOMICS

Because the genome sequence of Arabidopsis thaliana is complete and is the most annotated plant genome sequence, this plant continues to serve as the model for determination of plant gene function. Research on the selected reference species—and, in fact, all other plant genomes—will be conducted with an awareness of this resource. Because it is so easy to use Arabidopsis experimentally for functional genomics, it should be used for applications of whole-genome, high-throughput technologies that will establish baseline knowledge and toolkits applicable to all plants. The Arabidopsis 2010 Functional Genomics Program seeks to associate every known gene in Arabidopsis with a protein or non-protein product so that it can be known where in the cell the product is produced, what biochemical pathway it is part of, and what possible function it has in the life of the organism (NSF 1999). Just as the finished Arabidopsis genome is greatly facilitating annotation of the rice genome, so will further use of Arabidopsis greatly simplify the functional-genomics challenges presented by the other reference species—and in fact by all of plant biology. That will greatly enhance the effectiveness of all the other plant-genomics projects and substantially lower overall cost. The committee therefore advocates that additional resources be dedicated, either as increased funding for the Arabidopsis







2010 Program or from the NPGI, to accelerate the 2010 Program goals that generate new technology platforms or plant-kingdom-wide reference toolkits aimed at similar goals in the other reference species.

However, it is precisely because Arabidopsis does not do many things of importance in plant biology that we envision the development of functional-genomics toolkits in the reference species. This pertains most obviously to those genes not present in Arabidopsis or that are highly diverged from the closest Arabidopsis homologue. Nevertheless, it is also vital to develop testable hypotheses about gene function among closely related species. Conserved gene function is the key to construction of valid comparative maps and to manipulation of germplasm via breeding (introgression of traits) but can be difficult to assign by sequence alone. This is particularly true when minor amino acid changes can lead to altered function, as in the enzymes of secondary metabolism and transcriptional regulators. Hence, it is vital to develop large collections of sequence-tagged mutants, comprehensive large-insert libraries, and physical maps of a variety of important species radiating, in an evolutionary sense, from the references. Conserved function can be hypothesized on the basis of synteny (genes flanked by the same genes in two species may be related by descent). This information will drive testable hypotheses about gene function in other organisms. One can test those hypotheses readily by accessing mutant lines in Arabidopsis or the reference species from public stock centers.

EXPANDING THE FUNCTIONAL-GENOMICS TOOLKIT

Thus far, the plant-biology communities have been technology users, not creators. We endorse expenditure of funds for technology development and infrastructure that address critical questions specific to plant genomics. For example, the lack of high-throughput, robust transformation systems in many plant species and the lack of gene-replacement techniques are impediments to rapid advancement. Equally important, and equally elusive, is the development of cell cultures that maintain a differentiated state.
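
The synteny heuristic described under FUNCTIONAL GENOMICS (genes flanked by the same genes in two species may be related by descent) can be illustrated with a minimal sketch. It is not a method prescribed by the committee. The gene names, the gene-order lists, and the `orthologs` mapping are hypothetical inputs chosen for illustration, and each gene is assumed to appear once per list.

```python
from typing import Dict, List, Optional, Tuple


def flanking_pair(order: List[str], gene: str) -> Optional[Tuple[str, str]]:
    """Return the (left, right) neighbors of `gene`, or None at either end."""
    i = order.index(gene)
    if i == 0 or i == len(order) - 1:
        return None
    return order[i - 1], order[i + 1]


def syntenic_candidates(
    order_a: List[str],
    order_b: List[str],
    orthologs: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Pair genes from species A and B whose flanking genes are orthologous.

    `orthologs` maps a species-A gene to its presumed species-B ortholog
    (an assumed input, e.g. from prior sequence comparison).
    """
    candidates = []
    for gene_a in order_a:
        flank_a = flanking_pair(order_a, gene_a)
        if flank_a is None:
            continue
        left_b = orthologs.get(flank_a[0])
        right_b = orthologs.get(flank_a[1])
        if left_b is None or right_b is None:
            continue
        for gene_b in order_b:
            flank_b = flanking_pair(order_b, gene_b)
            # Accept either orientation, since a segment may be inverted.
            if flank_b in ((left_b, right_b), (right_b, left_b)):
                candidates.append((gene_a, gene_b))
    return candidates


# Hypothetical gene orders along one chromosome segment in each species.
species_a = ["a1", "a2", "a3", "a4"]
species_b = ["b9", "b1", "bX", "b3", "b7"]
known_orthologs = {"a1": "b1", "a3": "b3"}

print(syntenic_candidates(species_a, species_b, known_orthologs))
```

Each returned pair is only a hypothesis of shared descent. As the text notes, it would still need to be tested, for example with mutant lines from public stock centers.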

To achieve economies of scale, it is important to place a high priority on reaching the genome-sequencing goals for the reference species before large-scale investment in some functional-genomics tools. However, other functional-genomics tools require little or no genomic sequence. For example, forward genetics and characterization of insertion mutants or chemically induced mutants can be accomplished in the absence of genomic sequence. A very deep and robustly annotated unigene set of ESTs can be used to make informative microarrays. However, other functional-genomics tools, such as protein chips and high-throughput proteomics, require substantial cDNA or genome sequence before they can be appropriately designed and deployed in a cost-effective manner. What should be avoided are costly forays into functional-genomics technologies and projects that yield partial or ambiguous results because of incomplete sequence information and that will need to be repeated when the full genome sequence becomes available.

Our specific recommendations for tool development in the model and reference species over the 2003–2008 timeframe are based on having complete or nearly complete genome sequences and large EST collections before the beginning of large-scale investment in functional genomics. In the short term, that means that such investment may be limited to Arabidopsis, Oryza, Chlamydomonas, and Populus. Development of some functional-genomics tools in the reference species can begin (and some have) based on, for example, deep EST projects ongoing in NPGI. Other tools will require staged development as genome or EST sequences (full unigene sets) become available. For example, high-throughput proteomics as used to identify proteins in complex mixtures is effective only when a sequenced and annotated genome is available, and thus is currently limited to Arabidopsis and rice.

The technologies might come on line for each of the other proposed reference species at different times, depending on the progress of sequencing and gene annotation. Alternatively, it might not be necessary to develop each technology for each species if the biologic question is best addressed with functional-genomics tools in the model or reference species. Thus, it is critical to scientifically justify investments in functional-genomics tools.
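
The dependence of high-throughput proteomics on an annotated genome can be made concrete with a small sketch. It assumes a simplified workflow in which predicted proteins are digested in silico with trypsin (cleavage after K or R, but not before P) and observed peptides are matched by exact sequence. Real pipelines match mass spectra rather than sequences, and the protein names and sequences below are hypothetical.

```python
from typing import Dict, List, Set


def tryptic_peptides(protein: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def build_peptide_index(proteome: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every predicted peptide to the annotated proteins that contain it."""
    index: Dict[str, Set[str]] = {}
    for protein_id, sequence in proteome.items():
        for peptide in tryptic_peptides(sequence):
            index.setdefault(peptide, set()).add(protein_id)
    return index


def identify(observed: List[str], index: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Assign each observed peptide to its candidate proteins (empty if unannotated)."""
    return {peptide: sorted(index.get(peptide, ())) for peptide in observed}


# Hypothetical annotated proteome derived from predicted gene models.
annotated = {"geneA": "MAKRPLLKGG", "geneB": "MSTKLLNR"}
index = build_peptide_index(annotated)
print(identify(["RPLLK", "LLNR", "QQQK"], index))
```

A peptide from a gene that is missing from the annotation, such as the last one in the example, returns no candidates. This is why the approach yields partial results when sequence information is incomplete.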

It is important to distinguish between pilot projects in functional genomics aimed at establishing a technology and full-genome, high-throughput use of mature technology. The distinction might in some instances lead to delay in deployment of a given technology until the sequence is available. A strong case can be made that existing infrastructures in the yeast, Caenorhabditis elegans, and Arabidopsis communities should be expanded to make these tools available efficiently to the crop-plant communities.

Eventual efforts (10-year goals for all the reference species) should encompass the following:

Development of the essential genetic toolkits. These will include comprehensive sets of sequence-indexed mutants, accessible via database search and immediately available as seed stock; robust polymerase chain reaction or chip-based mapping tools; and robust conditional expression systems for sensitized and saturating genetic screens for rare alleles.

High-throughput methods for predicting and experimentally validating gene models. Validated species-specific gene models enable accurate identification of genes from genomic sequence and cross-genome comparisons. Validated models also help to identify conserved cis-acting elements. Furthermore, full-length cDNA sequencing of diverse mRNA populations enables the eventual construction of high-resolution whole-genome arrays to use in gene-expression studies, and provides the materials to generate protein chips.

Technologies for measuring gene expression. Robust, high-density arrays or chips hybridized with mRNA populations from a variety of organs and developmental stages will generate a genomewide database that contains snapshots of all the transcriptional changes during a plant’s growth (the transcriptome). We suggest further development of rapid and inexpensive ways to assess cell-specific gene expression in multiple species, preferably at the single-cell level. Determining the spatial and temporal expression of genes at several stages of development requires high-resolution, high-throughput in situ hybridization methods; single-cell mRNA population analysis; and preparation of specialized tissues and cell types.

Technologies for profiling protein dynamics. It is critical to define the temporal and spatial regulation of protein synthesis and destruction throughout a plant’s life cycle. Technologies are needed that are more comprehensive (that display more proteins with greater dynamic range and better quantification) and that detect and measure the factors that regulate these events in plants. It is important to know both the cell type and the subcellular localization of all proteins. To this end, subcellular fractionation methods that are robust and clean must be developed.

Technologies for building protein networks. Defining genome-scale protein-protein interaction networks, including spatial, temporal, and quantitative measurements, will require development of a variety of tools whose infrastructural basis has been set in yeast and C. elegans research. These include simple purification of protein complexes with affinity tags or immunoaffinity isolations and mass spectrometry for identification of the components; protein arrays; and high-throughput protein-protein interaction screens. We will also require new methods, not yet ready for general use in any species, to detect dynamic interactions in vivo.

Biochemical genomics. Many of the above aims are part of a global approach to the biochemical activities and function of each gene product. Pharmacologic approaches, such as identification of small-molecule inhibitors or activators of gene function, have not been a traditional strength of plant biology. The committee endorses development of platforms to define small-molecule ligands for known proteins, with the ultimate goal of defining an inhibitor for every protein function. Also, small-molecule substrates or inhibitors tethered to affinity probes should be used to measure and identify enzymatic activities in cells. Small-molecule analytic methods continue to evolve rapidly and become more accessible to biologists, although state-of-the-art technologies are expensive to buy and run. These technologies should be made available to plant researchers in their own laboratories, through service facilities, and in multi-investigator funded projects.

Systematic manipulation of gene-product expression and activity. The overriding goal of functional genomics is to elucidate the physiologic function of each gene product by systematic alteration of its concentration in the cell. Rapid, genome-scale systems to silence gene expression are a desirable goal, as is the capacity to target mutations, insertions, and deletions to specific genomic regions and genes via allele replacement. Those techniques will serve as research tools and enable allele replacement in crop improvement.

Natural variation as a source of functional information. The development of quantitative-trait loci (QTLs) and linkage-disequilibrium analytic tools in the model and reference species is vital for assigning function to genes. High-throughput mutant and allele detection systems are also vital, as are reliable systems to detect single-base mismatches. They require appropriate mapping populations, particularly as related to exploitation of natural variation in crop and noncrop species in which the focus is on the identification of valuable alleles, not only the elucidation of gene functions at particular loci. The study of multitrait quantitative genetics is an important way to assign function to some genes (such as those whose mutations result in lethal phenotypes) and to discover proteins that are rate-limiting for important traits. Plant biology’s historical exploitation of natural variation as the raw material of breeding provides a wealth of extractable information, as does the availability of wild accessions of many species.
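
The linkage-disequilibrium analysis mentioned under natural variation rests on a simple calculation, sketched below for two biallelic loci. The sketch uses the standard definitions D = p(AB) - p(A)p(B) and r² = D² / [p(A)(1 - p(A)) p(B)(1 - p(B))]. The phased two-locus haplotype strings are hypothetical data, and the function name is an illustrative choice rather than part of any tool named in the report.

```python
from typing import List, Tuple


def linkage_disequilibrium(
    haplotypes: List[str], allele_a: str, allele_b: str
) -> Tuple[float, float]:
    """Return (D, r_squared) for two loci from phased two-character haplotypes.

    Each haplotype is a string whose first character is the allele at
    locus 1 and whose second character is the allele at locus 2.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(1 for h in haplotypes if h[0] == allele_a) / n
    p_b = sum(1 for h in haplotypes if h[1] == allele_b) / n
    p_ab = sum(1 for h in haplotypes if h[0] == allele_a and h[1] == allele_b) / n
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic in the sample")
    return d, d * d / denominator


# Hypothetical samples: alleles fully associated, then fully independent.
linked = ["AB", "AB", "ab", "ab"]
unlinked = ["AB", "Ab", "aB", "ab"]

print(linkage_disequilibrium(linked, "A", "B"))
print(linkage_disequilibrium(unlinked, "A", "B"))
```

An r² near 1 indicates that a marker allele predicts the allele at a nearby locus, which is the property that association mapping in natural populations exploits. An r² near 0 indicates that the two loci are effectively independent in the sample.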