this report—astrophysics, the atmospheric sciences, and chemical separations—evolutionary biology has no definitive reports setting forth its scientific breadth and describing future challenges. Thus in developing its evaluation of the main scientific challenges facing evolutionary biology, the committee drew on workshop reports, primarily those prepared for the National Science Foundation (NSF, 1998, 2005a, 2005b, 2006), and on a document produced by eight scientific societies (Meagher and Futuyma, 2001), as well as on discussions among committee members and invited experts at a small workshop in December 2006, the agenda of which is included in Appendix B.
This chapter identifies the main challenges of evolutionary biology and evaluates the extent to which computational methods are impacting each of them. It describes the primary mathematical models that are currently available or being developed. On this basis, the committee then assesses the potential impact of HECC on the major challenges of evolutionary biology.
The most fundamental question posed by Major Challenge 1 is this: How did life arise? Despite the large body of scientific literature, this question remains unanswered. Addressing it requires knowledge spanning the physical and biological sciences: chemistry, the Earth sciences, astrophysics, and cellular and molecular biology. A key piece of information would be knowing whether life is indigenous to Earth or exists elsewhere in our solar system. More generally, people in the field ask how the assembly of simple organic compounds led to complex macromolecules and then to self-replicating entities, and what role Earth-bound processes played.
Another unknown is how many species there are on Earth. Systematic biologists have discovered and described about 1.7 million living species. How many more exist in Earth's ecosystems has not been answered satisfactorily, even to within an order of magnitude. Without a better quantification of life's diversity, we have only a very incomplete understanding of how that diversity is distributed, and thus we cannot precisely characterize ecosystem structure and function, extinction rates, or the amount of molecular and functional biodiversity. Lack of knowledge about Earth's biodiversity also prevents us from making use of undiscovered species and their products.
Ultimately, Major Challenge 1 calls for us to develop an understanding of the tree of life and then to use it. With advances in methods of phylogenetic reconstruction and increasing amounts of new comparative data from DNA sequences, the last decade has seen an unprecedented increase in our knowledge about the phylogenetic relationships of organisms, which collectively constitute the tree of life. Since the beginning of the 1990s, the number of species represented in the gene sequence database GenBank has grown to more than 155,000. If this correlates roughly with the number of species that can be placed on phylogenetic trees using molecular data alone, then the combined number of extinct and living species currently included on phylogenetic trees may approximate 200,000. Assuming an increase in the number of researchers and continued technological advances, it seems safe to expect that between 750,000 and 1 million of Earth's estimated 10 million to 100 million species will be placed on trees within the next decade.
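Part of what makes placing this many species on trees computationally demanding is that the number of candidate trees grows extremely fast with the number of species. A standard combinatorial result (not stated in this report) is that n labeled species admit (2n − 5)!! = 1 · 3 · 5 ⋯ (2n − 5) distinct unrooted, fully bifurcating tree topologies, for n ≥ 3. The short Python sketch below computes that count; the function name `num_unrooted_trees` is purely illustrative, and the sketch counts topologies only, ignoring branch lengths and rooting.

```python
def num_unrooted_trees(n: int) -> int:
    """Count distinct unrooted, bifurcating tree topologies for n labeled leaves.

    Uses the double-factorial formula (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5),
    which is valid for n >= 3 (three leaves admit exactly one unrooted tree).
    """
    if n < 3:
        raise ValueError("n must be at least 3")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (the empty product for n = 3 is 1).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, num_unrooted_trees(n))
    # For 1,000 species the count is too large to print usefully; show its digit count.
    print("digits for n = 1000:", len(str(num_unrooted_trees(1000))))
```

Already at 10 species there are 2,027,025 possible unrooted topologies, so exhaustively evaluating every tree is infeasible for data sets of the size described here, and practical methods rely on heuristic searches of tree space.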
Data sets from individual studies are becoming larger and larger, and many contain information on thousands of species and thousands of molecular and/or morphological characters (qualities or attributes). This situation creates well-known computational challenges when one searches existing phylogenetic trees or works to resolve relationships so as to add or clarify particular branches (Felsenstein, 2004). In