manipulate sources of information transparently across the country. However, it is still a preliminary model and must be expanded considerably before it can demonstrate a functional community information infrastructure. For example, the sets of knowledge and analysis must be expanded, a truly distributed system spanning platforms and networks must be developed, and a hardened implementation must evolve before the technology is ready to support standard archives. Given sufficient resources, however, complete collaboratories for sharing, comparing, and analyzing data will be built for genome research, since the need is there and the technology is available. The pattern discovery enabled by "dry-lab" analysis environments promises another significant revolution in the support of research in molecular biology.



Replication is the process in which existing DNA is used as a template for the synthesis of new DNA strands. Mutagenesis is the process by which DNA is mutated or modified. Transcription is the synthesis of RNA, a long-chain nucleic acid consisting of repeating nucleotide units, from a sequence of DNA. Translation is the process in which the genetic code directs the synthesis of proteins from amino acids.
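
The four processes defined above can be illustrated with a minimal Python sketch. The function names, the example sequence, and the deliberately partial codon table are assumptions made for illustration; a real analysis would use the full 64-codon genetic code and would model the cellular machinery far more carefully.

```python
# Illustrative sketch only: names and the partial codon table are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# A few entries of the standard genetic code ("*" marks a stop codon).
CODON_TABLE = {"AUG": "M", "UUU": "F", "CUU": "L", "GGC": "G", "UAA": "*"}


def replicate(template):
    """Replication: synthesize the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def mutate(dna, position, new_base):
    """Mutagenesis (point substitution): replace one base at a 0-based position."""
    return dna[:position] + new_base + dna[position + 1:]


def transcribe(coding_strand):
    """Transcription: the mRNA matches the coding strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna):
    """Translation: read codons in order until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGTTTGGCTAA"
print(replicate(gene))                             # TTAGCCAAACAT
print(translate(transcribe(gene)))                 # MFG
print(translate(transcribe(mutate(gene, 3, "C")))) # MLG
```

In this toy example, a single-base substitution changes the second codon from UUU to CUU, so the translated product changes from MFG to MLG.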


Two examples of technologies used in gene sequencing are the polymerase chain reaction (PCR) and gel electrophoresis. PCR is a method for increasing the number of copies of a specific DNA fragment so that the fragment is easier to detect and identify. Gel electrophoresis is a method of separating large molecules in an electric field; it allows DNA fragments that differ in length by a single base to be readily separated. Combined with methods such as Sanger's dideoxynucleotide chain termination procedure, gel electrophoresis can produce ladders of DNA molecules from which DNA sequences can be determined.
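
How a sequence is read from such a ladder can be sketched in Python. The sketch below is a simplification under stated assumptions: each of four lanes holds the lengths of the fragments terminated at one base, lengths are counted from the start of the newly synthesized strand (the primer is ignored), and every band is resolved cleanly. The function names `sanger_lanes` and `read_ladder` are hypothetical.

```python
# Idealized model of chain-termination sequencing; names are illustrative.

def sanger_lanes(synthesized_strand):
    """Simulate four termination reactions: record, per base, the lengths
    of the fragments that end at that base."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(synthesized_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_ladder(lanes):
    """Read the gel from the shortest fragment to the longest; the lane in
    which each successive band appears gives the next base."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_lanes("GATTACA")
print(lanes)               # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_ladder(lanes))  # GATTACA
```

The sketch depends on the property noted above: fragments differing by a single base must be separable, since otherwise the order of the bands, and hence the sequence, could not be recovered.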


"A primary goal of the Human Genome Project is to make a series of descriptive diagrams—maps—of each human chromosome at increasingly finer resolutions. Mapping involves (1) dividing the chromosomes into smaller fragments that can be propagated and characterized and (2) ordering (mapping) them to correspond to their respective locations on the chromosomes. After mapping is completed, the next step is to determine the sequence of base pairs of the ordered DNA fragments. A genome map describes the order of genes or other markers and the spacing between them on each chromosome. If the full sequence of genes were known, research emphasis could shift to determining gene function." (Cantor and Spengler, 1992, p. 198)

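The notion of a genome map in the passage quoted above, an ordering of markers together with the spacing between them, can be represented with a very small data structure. The following Python sketch assumes that each marker has already been assigned a position along one chromosome; the marker names and positions are invented for illustration, and the function name `build_map` is hypothetical.

```python
# Minimal sketch: a map as ordered markers plus the distances between neighbors.

def build_map(marker_positions):
    """Return (ordered marker names, spacing between adjacent markers)."""
    ordered = sorted(marker_positions.items(), key=lambda item: item[1])
    names = [name for name, _ in ordered]
    spacing = [
        (left_name, right_name, right_pos - left_pos)
        for (left_name, left_pos), (right_name, right_pos) in zip(ordered, ordered[1:])
    ]
    return names, spacing


# Hypothetical markers with positions in arbitrary map units.
order, gaps = build_map({"m3": 900, "m1": 100, "m2": 450})
print(order)  # ['m1', 'm2', 'm3']
print(gaps)   # [('m1', 'm2', 350), ('m2', 'm3', 450)]
```

Maps at finer resolution correspond, in this representation, to more markers with smaller gaps between neighbors.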

The common stages for publishing data in the electronic domain are similar to the steps in ordinary publishing. The raw data are recorded in laboratory notebooks that are kept private, or are kept on archival disk if generated directly by a sequencing machine. A processed form of these data, such as a map location or a sequence, is submitted for inclusion in a database. The submission is then edited for publication by a central editor; typically a single curator chooses which data will be included and in what form. Finally, the edited database is distributed for use by other biologists.
