structure, and knowledge of structure can provide information about function. Hence, the mapping from sequence space to structure space is part of the challenge, and may be part of the solution, of creating a map from sequence space to function space. The structure of a protein provides strong clues about its biochemical function (for example, the mechanism of action of an enzyme), but so far there have been only a few successes in predicting biological function from sequence. The structures of these macromolecules are also important for other research purposes: they are the starting point for predicting biochemical action, for modeling the dynamics of the macromolecules, for suggesting ways to inhibit the action of undesired proteins, for predicting potential chemical inhibitors or activators of a given protein, and for altering a protein’s functionality through its environment or through reengineering its sequence and, consequently, its structure. Therefore, developing the ability to map from sequence space to structure space is a critical challenge that, if met, would have a significant impact on all biological sciences and on our understanding of life. Currently, inferences about structure and function rely on the simple assumption that sequences that are “close” in sequence space (using metrics determined from studies of evolution) are likely to map to nearby points in structure and function space. That is generally true, but there are complications. Short stretches of a protein can be exceptions to this general rule, and larger proteins are composites of folded segments, or domains. In the absence of an experimentally determined x-ray crystallographic structure, we do not know in any detail how to ascertain where the domain boundaries lie. We also do not know which sequence differences are most critical, which are most indicative of exceptions, or which most effectively predict biological function.
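To make the notion of "closeness" in sequence space concrete, the following is a minimal sketch, not a method taken from this text. It assumes two protein sequences that have already been aligned to equal length, with `-` marking gaps, and it uses fractional identity as a deliberately crude stand-in for the evolution-derived metrics mentioned above. The function names `fractional_identity` and `sequence_distance` and the example sequences are hypothetical.

```python
def fractional_identity(a: str, b: str) -> float:
    """Fraction of aligned, non-gap positions at which two sequences agree.

    Assumes `a` and `b` are pre-aligned to the same length, with '-' as the
    gap character. This is a simplification: evolution-based metrics weight
    substitutions unequally, whereas every mismatch counts the same here.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def sequence_distance(a: str, b: str) -> float:
    """A simple distance in sequence space: 1 minus fractional identity."""
    return 1.0 - fractional_identity(a, b)


if __name__ == "__main__":
    # Hypothetical aligned fragments that differ at two of eight positions.
    seq1 = "MKTAYIAK"
    seq2 = "MKSAYLAK"
    print(fractional_identity(seq1, seq2))  # 0.75
    print(sequence_distance(seq1, seq2))    # 0.25
```

A small distance under such a metric is what the assumption above treats as evidence of similar structure and function. The complications noted in the text (exceptional short stretches, multidomain proteins) are exactly the cases in which a single whole-sequence score of this kind can mislead.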
Even a catalog of all the components of a cell (a complete “parts list”) detailing not only their sequences but also their structure and function would not really explain the properties of that cell, because the system is far from equilibrium and in a very dynamic state. The properties of molecules often depend on their dynamics, from the catalytic activities of enzymes to the assembly of multicomponent structures, and many of a cell’s molecules need to be transported to specific locations within or outside the cell in order to perform their functions. Cells sense their environment and respond to various stimuli by sending signals throughout the cell and to neighboring cells, modifying their behavior. Metabolic networks are subject to feedback regulation and other kinds of control, and the expression of specific genes is controlled by networks of regulatory factors and their interactions with each other and with cellular signals. Because many cellular processes are due to the actions of a small number of molecules, stochastic fluctuations cannot be ignored. In general, then, understanding