to begin to do this, as Alber explained when he put the number at a googol for ATCase. "It's not just that we don't have the computers to do that now," says Eisenberg. "We won't have them in 10 years."

Eisenberg has been taking the opposite approach to the problem of the relationship between sequence and structure. In his experiments he asks, as Stanford University engineering professor Eric Drexler first did in 1981: Given a specific structure, which amino acid sequences are compatible with it? But Eisenberg and his postdoctoral fellow Jim Bowie have simplified this approach in two ways.

First, they ask which known amino acid sequences are compatible with the structure. "That limits us to 50,000 sequences, whereas Ponder and Richards [researchers who had previously tried the same approach] had looked at every conceivable sequence. Why should we care if there are sequences we don't know? Our job is to take information from the human genome and find the structure for each protein sequence. That limits the dimensionality of the search."

"The second major simplification is that instead of working with a three-dimensional structure, we have simplified that into a one-dimensional string which we can compare to amino acid sequences." ("String" is a computing term for a sequence of symbols laid out in one dimension.)

Eisenberg replaces three-dimensionality with the details of the chemical environment of each amino acid position. For example, "What are the amino acid side chains around that position, and do they prefer an apolar or a charged environment?" He has divided the chemical environment into 18 different environmental classes, and each amino acid in a protein is assigned to one of them. "We call the string of classes the three-dimensional profile. By looking at the environment, we are looking at the footprint, rather than the foot."
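The idea of comparing a one-dimensional string of environment classes against amino acid sequences can be illustrated with a small Python sketch. Everything specific in it is an assumption made for illustration: the two class labels ("B" for a buried, apolar environment and "E" for an exposed, polar one), the score values, and the function names are invented here, and they are not the 18 classes or the scoring table Eisenberg describes. The sketch also compares position by position with no gaps, which is a simplification.

```python
# Toy illustration only. The environment classes and scores are invented,
# not the 18 classes or the compatibility table of the method in the text.

# Hypothetical compatibility scores: how well each amino acid (one-letter
# code) fits each environment class. Unlisted pairs score 0.
TOY_SCORES = {
    "B": {"L": 2, "V": 2, "I": 2, "F": 2, "A": 1, "K": -2, "D": -2, "E": -2},
    "E": {"K": 1, "D": 1, "E": 1, "S": 1, "L": -1, "V": -1, "I": -1, "F": -2},
}


def profile_score(profile, sequence):
    """Sum the per-position compatibility of a sequence with a profile string.

    Ungapped and equal-length only; this is a deliberate simplification.
    """
    if len(profile) != len(sequence):
        raise ValueError("profile and sequence must have the same length")
    return sum(TOY_SCORES[env].get(aa, 0) for env, aa in zip(profile, sequence))


def rank_sequences(profile, sequences):
    """Return sequence names ordered from most to least compatible."""
    return sorted(
        sequences,
        key=lambda name: profile_score(profile, sequences[name]),
        reverse=True,
    )


if __name__ == "__main__":
    profile = "BBEEB"  # the "footprint": one environment class per position
    known = {"seq_a": "LVKDI", "seq_b": "KDLVE"}  # made-up sequences
    for name in rank_sequences(profile, known):
        print(name, profile_score(profile, known[name]))
```

In this toy case the sequence with apolar residues at the buried positions ranks first. Restricting the search to a dictionary of known sequences mirrors the first simplification described above: only sequences already in hand are scored, not every conceivable one.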

Eisenberg predicts the job will be manageable because he believes the number of types of protein folds is limited. "When we learn the structure of a new protein, it often has the structure of a known protein. But we can't necessarily predict that from the sequence, because the sequence may have diverged too much. This suggests that there may be a finite number of folds."

In early 1992 Cyrus Chothia, of the MRC Laboratory of Molecular Biology in Cambridge, England, estimated that there may be only 1000 to 1500 distinct folds. "If that is true, then the job of the person studying protein structure will be to assign sequences to one of these folds," says Eisenberg. "That is what our method is aimed at."

Three-dimensional protein structures are being compiled in a data


