the sequence and location of genes in the genome will greatly facilitate further studies, not only of human genetic variability but also of functional genomics. As noted above, the latter is the comprehensive analysis of gene expression and gene-product function. In the cases of yeast and C. elegans, for which the entire genome sequence is already known, projects are under way to assess systematically the function of every gene product, for example, by knocking out yeast genes (causing a loss of function) one at a time and by associating an identified messenger RNA (mRNA) with every ORF.
Some of this functional analysis can go forward even before a genome is sequenced. In the case of humans, the study of ESTs has been an important step in such analysis. mRNAs can be isolated from the organism and converted to complementary DNAs (cDNAs) by reverse transcription (RT) and the polymerase chain reaction (PCR). The cDNAs are then cloned and sequenced to prepare a large and well-defined library of ESTs, and the information is entered in a database. These sequences represent genes expressed in the human. More than 1 million human ESTs are now available, representing more than 50,000 genes. Each EST reflects a piece of an mRNA, not a full-length sequence. The most comprehensive libraries are prepared from a wide range of tissues and times of development in an effort to include all expressed genes. (Unfortunately for developmental toxicologists, although the initial sources of RNA included placenta, the variety of early embryonic tissues was underrepresented.) These sequences are useful in the course of genome sequencing to identify DNA regions that actually encode proteins (only 5% of the human genome sequence is thought to show up in processed mRNA sequences). New methods have become available to obtain full-length cDNAs from transcripts, and these will be more useful than fragments.
A further step in the analysis of genome function is the determination of the time, place, and conditions of expression of each gene. Until recently, this analysis was done one gene at a time. DNA microarray techniques have recently made it possible to describe simultaneous changes in the expression of thousands of genes as cells and tissues undergo development or changes in environmental conditions. In the study of toxicant effects on the organism, the analysis is sometimes called “toxicogenomics” or, in the study of the effects of pharmaceuticals, “pharmacogenomics.” DNA microarray approaches are gaining widespread use (see Nuwaysir et al. 1999 for a discussion of their use in toxicology).
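As a rough illustration of what describing simultaneous expression changes across many genes can look like in practice, the sketch below compares per-gene signal intensities from two samples and lists the genes whose expression differs. It is only a minimal sketch under stated assumptions: the gene names and intensity values are hypothetical, and the log2 ratio, the pseudocount, and the two-fold cutoff are common illustrative choices, not methods taken from the work discussed here.

```python
import math


def log2_ratios(control, treated, pseudocount=1.0):
    """Return {gene: log2((treated + p) / (control + p))} for genes in both samples.

    The pseudocount is an assumption of this sketch; it avoids division by zero
    for spots with no measurable signal.
    """
    shared = control.keys() & treated.keys()
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in shared
    }


def changed_genes(ratios, min_abs_log2=1.0):
    """List genes whose absolute log2 ratio meets the cutoff (1.0 = two-fold)."""
    return sorted(gene for gene, r in ratios.items() if abs(r) >= min_abs_log2)


# Hypothetical spot intensities for a control sample and a treated sample.
control = {"geneA": 99.0, "geneB": 199.0, "geneC": 49.0}
treated = {"geneA": 399.0, "geneB": 199.0, "geneC": 9.0}

ratios = log2_ratios(control, treated)
print(changed_genes(ratios))
```

With these made-up values, geneA is flagged as increased and geneC as decreased, while geneB is unchanged. A real analysis would also need normalization between samples and some treatment of measurement noise, which this sketch omits.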
The technology is now suitable for simultaneously comparing the amounts of thousands of kinds of mRNA in two tissues or cell samples (e.g., a normal control tissue versus a tissue treated with a teratogenic agent). To do the comparison, thousands of different DNA sequences (e.g., each an oligomer of at least 25 nucleotides) are robotically spotted onto a microslide, and each sequence is placed on a known spot to make a DNA microarray. The DNA adheres to the glass, and each DNA spot is typically 20 micrometers (µm) in diameter. For example, microarrays of 6,200 cDNA sequences, representing all the genes of yeast, have been fitted on a single slide or a few slides, and 8,900 cDNA sequences