(2) provide powerful practical rules for the rational engineering of protein structure and function, (3) help predict and better understand the molecular basis for disease-causing mutations, and (4) provide the basis for beginning to understand how proteins are even possible through the random, algorithmic process of mutation and selection that we call evolution.
However, the problem is truly complex; amino acids make unequal and cooperative contributions to protein structure and function, and these contributions are generally not obvious even in high-resolution atomic structures. For example, studies of the interaction between the human growth hormone and its receptor show that the binding interface contains “hot spots” of favorable energetic interactions embedded within an overall environment of neutral interactions (Clarkson and Wells, 1995). Similarly, catalytic specificity in proteases (Hedstrom, 1996), signal transmission within G protein-coupled receptors (GPCRs) (Gether, 2000), the cooperative binding of oxygen molecules in hemoglobin (Perutz et al., 1998), catalysis in the metabolic enzyme dihydrofolate reductase (Benkovic and Hammes-Schiffer, 2003), and antigen recognition by antibody molecules (Midelfort and Wittrup, 2006; Patten et al., 1996) all depend on the concerted action of a specific set of amino acids that are distributed both near and far from the active site. Why are these energetic phenomena not obvious in atomic structures? The main problem is easily stated: we do not “see” energy in protein structures. We might observe an interaction in a crystal structure, but we cannot know the net free energy value of that interaction from the mean atomic positions alone. Since the native state of a protein represents a fine balance of opposing forces that operate with steep distance dependencies to produce marginally stable structures, complex and nonintuitive arrangements of amino acid interactions are possible.
These observations permit a clear statement of the goals that constitute this lecture. The essence of understanding the evolutionary design of protein structure and function is a global assessment of the energetic value of all amino acid interactions. Since the value of interactions is not a simple function of distance, and complex spatial arrangements of interactions between amino acids are possible, we must be open to novel strategies that go beyond structure-based inferences or high-throughput mutagenesis. In this regard we have reported a novel statistical approach (now termed “statistical coupling analysis,” or SCA) for globally estimating amino acid interactions (Lockless and Ranganathan, 1999). Treating evolution as a large-scale experiment in mutation, this method rests on a simple proposition: the energetic coupling of residues in a protein (whether for structural or functional reasons) should force the mutual evolution of those sites. That is, the conserved cooperative interactions between amino acids might be exposed through analysis of the higher-order statistics of sequence variation between positions in a large and diverse multiple sequence alignment of a protein family. Application of this method in several different protein families—PDZ (Post-synaptic density 95, Discs Large, Zona Occludens 1) domains (Lockless and Ranganathan, 1999), GPCRs (Suel et al., 2003), serine proteases