accumulated over many years on field trials of crops. He found that the tools available were inadequate to the task. One review of his work (Yates and Mather, 1963) describes his contribution:

While at Rothamsted not only did he recast the whole theoretical basis of mathematical statistics, he also developed the modern techniques of the design and analysis of experiments, and was prolific in devising methods to deal with the many and varied problems with which he was confronted by research workers at Rothamsted and elsewhere.

His 1925 book Statistical Methods for Research Workers, which introduced analysis of variance (ANOVA) methods to statistics, was a revolutionary advance. “Fisher had by that time also established a rigorous framework for maximum-likelihood methods, which continue to play a central role in statistical inference,” according to Aldrich (1997). In 1935 Fisher published The Design of Experiments, which was the first book devoted to that subject. Fisher had a significant impact on both biology and mathematical statistics, and his contributions affected the theory and practice of both.

INFERENCE OF GENE FUNCTION BY HOMOLOGY

In the modern world of biology, where sequences of entire genomes are available and the number of such sequences is growing rapidly, one sees the enormous importance of mathematical and computer science methods in advancing biological knowledge. Algorithms are essential at many stages, from finding overlaps of short, noisy sequence strings, to assembling them into complete chromosomes, and to identifying regions that are likely to code for proteins or carry out other genetic functions.
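As an illustration of the first step mentioned above, finding overlaps between short reads, here is a minimal sketch. It assumes reads are plain strings over A/C/G/T and treats "noise" only as substitution errors (no insertions or deletions). It looks for the longest suffix of one read that matches a prefix of another. The function names `best_overlap` and `merge` and the parameters `min_len` and `max_mismatch` are hypothetical, chosen for this example. Real assemblers use far more efficient indexing and error models.

```python
def best_overlap(a: str, b: str, min_len: int = 3, max_mismatch: int = 0) -> int:
    """Return the length of the longest suffix of `a` that matches a prefix of `b`.

    Up to `max_mismatch` substituted bases are tolerated (a crude stand-in for
    sequencing noise). Returns 0 if no overlap of at least `min_len` is found.
    """
    # Try the longest candidate overlaps first so the first hit is the best one.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        mismatches = sum(1 for x, y in zip(suffix, prefix) if x != y)
        if mismatches <= max_mismatch:
            return length
    return 0


def merge(a: str, b: str, overlap: int) -> str:
    """Join two reads, keeping `a` and appending the non-overlapping tail of `b`."""
    return a + b[overlap:]


if __name__ == "__main__":
    r1, r2 = "ATTAGACCTG", "CCTGCCGGAA"
    k = best_overlap(r1, r2)
    print(k, merge(r1, r2, k))
```

With an exact match required, a single miscalled base (for example `CCAG` in place of `CCTG`) hides the overlap entirely; allowing one mismatch recovers it, which is why error tolerance matters when reads are noisy.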

One of the most important tasks is the inference of a protein’s function. There are close to 1 million different known and predicted proteins in living organisms. Two proteins are said to be homologous if their similarity is due to common ancestry—that is, if they were generated from the same gene in the genome of an ancestral species at one time in the evolutionary past and their sequences have been sufficiently conserved since that time so that they are still recognizably similar. The number of proteins that have had their functions determined experimentally is, at most, in the tens of thousands, meaning that the functions of over 90 percent of all the proteins in our databases are inferred from homology. In some cases this is easy to do. For example, if one protein has its function determined experimentally and another protein is discovered with a nearly identical sequence, then it is an easy, and quite reliable, extrapolation to assign the same function to the new protein. But, if the sequences of two proteins differ substantially, it is less clear whether they are really ho-



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.