effort has been devoted to constructing a flow chart for the various forms of molecular information. One of the fruits of this labor has been a sense that many of the main information highways are known—for example, the central dogma that describes the irreversible flow of information from DNA to RNA to proteins. The fact that reality is more complex and that surprises continually emerge, such as the recent appreciation for the variety of roles played by RNA, shows that this enormously successful field still has a long road ahead before it will be “solved” in any comprehensive sense.
One of the major achievements of molecular biology has been a nearly complete catalogue of the underlying DNA codes for a diverse set of information-bearing macromolecules: the genomes of humans and many microbes, plants, and other animals are known with considerable completeness and accuracy. There has thus emerged the feeling that biological explanation is no longer primarily to be sought in finding new molecular actors but in understanding their individual functions and the patterns of organization and interaction that collectively determine the functions of the cell.
Molecular biology has always relied heavily on mathematics. From the analysis of sequences to techniques for determining the three-dimensional structures of molecules to studies of the dynamics of entities ranging from individual molecules up to entire networks, mathematical techniques and computational algorithms are critical.
Because of rapid advances in DNA sequencing technology, DNA sequences are now easily obtained, and protein sequences can be inferred from them with reasonably high accuracy and completeness. Thus, we now have an abundance of sequences for analysis. DNA, RNA, and proteins are all linear polymers, or strings, made from a small alphabet of residues: 4 nucleotides for DNA and RNA and 20 amino acids for proteins. The specific properties of any molecule, or the functions it serves, are determined by its sequence and its structure (in the appropriate context), though the structure is itself a result of the sequence and the molecule’s environment. One of the mathematical challenges of biology, then, is determining the mapping from sequence space to function space. The set of linear sequences over a small alphabet leads quite naturally to the concept of sequence space, the universe of all possible sequences. The concept of function space is a little harder to imagine, but we could certainly categorize all of the functions we know and regard them as a subset of all possible functions.
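To give a feel for the scale of sequence space, the short Python sketch below counts and enumerates the sequences of a given length over the 4-letter and 20-letter alphabets mentioned above. It is only an illustration: the function names are hypothetical, the one-letter residue codes are the conventional ones for the standard nucleotides and amino acids, and nothing here addresses the much harder problem of mapping sequences to functions.

```python
from itertools import product

# Conventional one-letter codes (an assumption of this sketch; the text
# only states the alphabet sizes, 4 and 20).
DNA_ALPHABET = "ACGT"
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def sequence_space_size(alphabet: str, length: int) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    return len(alphabet) ** length


def enumerate_sequences(alphabet: str, length: int):
    """Yield every sequence of the given length, in lexicographic order.

    Only practical for very short lengths, since the count grows
    exponentially with length.
    """
    for residues in product(alphabet, repeat=length):
        yield "".join(residues)


if __name__ == "__main__":
    # Sequence space for 10-residue strings.
    print(sequence_space_size(DNA_ALPHABET, 10))      # 4**10 = 1048576
    print(sequence_space_size(PROTEIN_ALPHABET, 10))  # 20**10 = 10240000000000

    # All 16 dinucleotides, the full sequence space for length 2.
    print(list(enumerate_sequences(DNA_ALPHABET, 2)))
```

Even at length 10, protein sequence space holds about 10^13 members, which is why exhaustive enumeration is never an option for real proteins and why the mapping to function has to be approached through mathematical structure and not through brute force.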
For proteins, and for some RNAs, function is critically dependent on