Keynote Address: Impact of Biotechnology and Environmental Research on Science and Society in the 21st Century
Leroy H. Hood
Department of Molecular Biotechnology
University of Washington
It is a pleasure to be here at this 50th anniversary and to talk about paradigm changes—paradigm changes in science and in the relationship between science and society—and the role that the Department of Energy (DOE) has played in catalyzing some of these changes.
Back in 1985, I first interacted with DOE in the context of the human-genome project. The first official meeting for the human-genome project was one that Bob Sinsheimer had organized at Santa Cruz to consider the possibility of a genomic institute there. He invited about 10 investigators to explore the possibility, and I went with enormous skepticism but within a very short period became a real convert. Of course, the issue that came up at the very beginning was how one could go from an idea and a passion to a program, and this was not a trivial consideration at all. Fortunately, at about the same time, Charles DeLisi at DOE had similar ideas, for reasons that are fascinating. One of my first interactions with DOE was my testifying in Congress. There were rumors that DOE and I were going to be testifying against Jim Watson or against David Baltimore; in the beginning, those were the enthusiastic opponents of the whole idea of the genome initiative. I give DOE enormous credit: Without the insight, energy and fortitude of DOE, and in particular Charles DeLisi, the human-genome project today in the United States would be very different.
For me, what has been most interesting about the human-genome project is the first of the paradigm changes that I will talk about: the idea that biology is an information science. I moved to the University of Washington about 4 1/2 years ago, in part to create a new department and in part to advocate paradigm changes both in science and in the scientist's responsibility to society.
I was fortunate to meet Bill Gates, who supported the department that we started. At the 4-hour dinner at the Columbia Towers, he made the statement that 2 disciplines were going to dominate the next century: information sciences and the biologic sciences. That statement fascinated me for 2 reasons. First was the idea that both were information sciences: biology deals with various kinds of biologic information, and the information sciences deal with the digitized information in the real world. Second was the idea that the 2 disciplines were on a collision course, that is, that our ability to execute biology in the future is going to be determined by our capacity to interact with computer science and applied mathematics. In the last 5 years, we have seen an enormous increase in our capacity to decipher biologic information, and I think in the 21st century we will see an increasing ability to manipulate biologic information. In fact, the manipulation of biologic information is going to be the very basis of preventive medicine as we move into the 21st century.
I would like to talk about the types of biologic information as I see them. The first type is the information of our genes and chromosomes. It is digital information, exactly like the digital information of the information sciences except that it has a 4-letter language rather than a 2-letter language.
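The comparison between the 4-letter and 2-letter languages can be made concrete with a small sketch. Because 4 letters can be distinguished with exactly 2 binary digits, any DNA string can be rewritten in binary and recovered without loss. The particular assignment of bit pairs to letters below is an arbitrary assumption chosen for illustration, and the function names are hypothetical.

```python
# Illustrative sketch: rewrite the 4-letter DNA alphabet in a 2-letter (binary) one.
# The bit assignment is an arbitrary assumption; any one-to-one mapping would work.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def dna_to_bits(sequence: str) -> str:
    """Encode a DNA string as a bit string, using 2 bits per base."""
    return "".join(BASE_TO_BITS[base] for base in sequence.upper())


def bits_to_dna(bits: str) -> str:
    """Decode a bit string produced by dna_to_bits, reading 2 bits at a time."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    encoded = dna_to_bits("GATTACA")
    print(encoded)               # 2 bits for each of the 7 bases
    print(bits_to_dna(encoded))  # round-trips to the original string
```

The round trip is the point: nothing is lost in moving between the two alphabets, which is the sense in which the two kinds of information are alike.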
Two things are particularly interesting about this digital language. The first is that variation in a digital 4-letter code can write software capable of creating something as complicated as a human organism, software that runs to many letters: perhaps 3 billion in the human genome.
The second interesting aspect of the digital language of chromosomes is that it is not a single strand, but 2 intertwining helical strands, and these strands, or strings, exhibit a molecular complementarity with one another. The pairs of letters—As and Ts, and Gs and Cs—always match up in the 2 strands. This produces one of the fundamental and most fascinating traits of this type of information: we can break our very large chromosomes into small pieces and separate the strands from one another, and they can reassociate or find one another. That molecular complementarity is at the heart of how chromosomes work in transmitting information for life and at the heart of how many of the critical diagnostic techniques of the next century will lead to preventive medicine.
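The pairing rule can be sketched in a few lines. The example below assumes the usual convention that both strands are written in the same reading direction, so the partner of a strand is its complement read in reverse; the function names are hypothetical and the test for pairing is an idealized exact match.

```python
# Illustrative sketch of molecular complementarity: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written in the same reading direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def can_pair(strand_a: str, strand_b: str) -> bool:
    """Idealized check: True only if the two strands match at every position."""
    return strand_b.upper() == reverse_complement(strand_a)


if __name__ == "__main__":
    piece = "ATGC"
    partner = reverse_complement(piece)
    print(partner)
    print(can_pair(piece, partner))  # separated strands can find one another again
    print(can_pair(piece, piece))    # a strand does not pair with a copy of itself here
```

Because each strand fully determines its partner, a separated piece carries enough information to recognize where it belongs.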
The chromosomes contain units of information called genes. Different genes can be expressed in different cells. The human-genome project has brought us an enormous capacity to assess the kind of information that is expressed in different cells, to convert this expressed information into DNA copies, and to analyze how the information varies between a normal cell and a tumor cell. The second type of biologic information is the information that ensues when we take an intermediate string of messenger RNA and make a final string from it: a protein string, written in a 20-letter alphabet, that leads to a 3-dimensional structure. The 3-dimensional structure of a protein represents a molecular machine, and it is the 100,000 or so human genes that make (to a first approximation) 100,000 or so different molecular machines; these machines are what give the body shape and form and catalyze the chemistry of life.
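The step from a messenger RNA string to a protein string can be sketched as a table lookup, 3 letters at a time. The sketch below uses the standard genetic code in its common compact layout (bases ordered T, C, A, G, with `*` marking a stop signal); the function name is hypothetical, and the simplifications (reading from the first letter, stopping at the first stop signal, ignoring everything else a cell does) are assumptions made for brevity.

```python
# Illustrative sketch: translate an mRNA string into a protein string.
from itertools import product

BASES = "TCAG"
# Standard genetic code, one amino-acid letter per codon, codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read the message 3 letters at a time and stop at the first stop codon."""
    message = mrna.upper().replace("U", "T")  # the table is keyed on DNA letters
    protein = []
    for start in range(0, len(message) - 2, 3):
        amino_acid = CODON_TABLE[message[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGGCCUGGUAA"))
    # 64 three-letter words map onto the 20 letters of the protein alphabet.
    print(len(CODON_TABLE), len(set(AMINO_ACIDS) - {"*"}))
```

The lookup yields only the linear protein string; nothing in it says how that string takes on its 3-dimensional structure.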
At the protein level, there are 2 fascinating questions: (1) given a linear protein string, can we predict the 3-dimensional structure that it will adopt? (2) given the 3-dimensional structure, can we figure out how it executes its function? Those are 2 enormous challenges.
The third type of information, which I think will be the information of the future, is the information that arises from complex systems and networks. Of the 10^12 neurons in the human brain, with their 10^15 connections, we can take a single nerve cell, study it for 20 years, and know virtually everything it does, but it won't tell us anything about the fascinating fundamental properties of this system (memory, consciousness, and the ability to learn), because they are properties of the network as a whole. How do we approach the study of systems? We can't do it by studying individual components. We need the tools, the capacity, the computational wherewithal to analyze systems in toto.
If we think about deciphering biologic information, there are 2 aspects for each of the 3 types of information. The sequence of DNA bases in our chromosomes is what the human-genome project is all about. But much deeper is the challenge of translating the information that 3.7 billion years of evolution has inscribed on our chromosomes. And so it is with proteins: It is one thing to determine structure, but it is quite another to understand how that structure executes its function. And so it is with systems as well: It is one thing to define elements and even to ascertain their linkages to one another, but it is quite another to understand how the system as a whole contributes properties (so-called emergent properties). This is the challenge and the opportunity that the human-genome project has given us—in spades.
I would like to talk about how biology and medicine are going to be transformed by the paradigm changes that we have seen. We have already noted that biology is an informational science and that there are 3 types of information. I would argue that the genome is a Rosetta stone rather than a periodic table. "Periodic table" implies a single, unambiguous language, and that is not true of the chromosomes. Our chromosomes have a multiplicity of languages, some discrete and some overlapping, so the idea of a Rosetta stone is more fitting.
To analyze the systems, we need to use high-throughput tools that have emerged from genomics, one of which is large-scale DNA sequencing. To give you an idea of how our ability to sequence DNA has changed: when the first sequencing techniques were developed in the late 1970s, it took a year to sequence 10 nucleotides. We have machines today that in a year can sequence about 25 million nucleotides!
It is such tools that we are going to use to accomplish the 3 things necessary: to define the elements, to define the linkage among them, and to define the emergent properties of complex systems. The challenge is to divide complicated systems into subsystems that we can analyze and whose properties reflect the properties of the entire system.
An idea that emerges naturally is that diseases, too, have to be approached through systems analysis, that they have to be approached as complicated systems to be studied. Two ideas come out of the systems analysis of disease: that a given disease is not a single entity, but a series of entities; and that each of those entities (whether various forms of prostatic cancer or various forms of multiple sclerosis) is encoded by, or rather predisposed to by, a multiplicity of genes. So we can write out a hypothetical stratification of, for example, prostatic cancer in which 3 subsets of predisposing genes cause 3 types of prostatic cancer. That is hypothetical, but it would lead to different clinical features and, ultimately, to different kinds of therapeutic products and so forth. All that comes from the idea that diseases are systems.
Another idea that has emerged in part from the genome project is the unity of microorganisms. The idea of the unity of all life, that the informational pathways are shared by everything from the simplest living organisms all the way up to humans, reflects their common evolutionary origin. If we define the simplest organisms and use them to define the information pathways, they will have profound implications for the information pathways present in humans. It is not just that they have given us insights into life. They have given us complete systems that we can use to analyze information pathways. A large fraction of the world of microbiology has been captivated by several of the microbial genomes that have been generated.
With the computational tools available, biology is going to be revolutionized. The abilities to acquire data, to store them, to analyze them, to model them, and to distribute them are all fundamental components of computational biology and applied mathematics, and they are essential to the biology of the future if biology is an information science.
The development of new tools has been important as new questions have come up in biology, and DOE has done a magnificent job in this regard. When more-sophisticated computational tools were needed for the human-genome project, DOE played a leading role in pioneering their development. In genomics, DOE is known for its commitment to fundamental science and the development of new tools, and this has turned out to be a very important part of biology.
A very-high-throughput tool that has been developed by a company called Affymetrix and by Alan Blanchard in our laboratory has the capacity to put tens of thousands or even hundreds of thousands of DNA fragments on a small glass chip. Each fragment could represent, for example, one of the 100,000 or so human genes. By using molecular complementarity, one could analyze whether a particular gene in a particular cell is expressed. In a single test system, one could look at tens of thousands or hundreds of thousands of units of information. We have developed a technology for making 25 of these chips a day; each chip can have up to 150,000 DNA fragments of 20 letters, so if one knew them all, one could put representatives, information units, of all the different human genes on a single chip. That gives us the capacity not only to look at tumor cells and normal cells to see what the differences are, but also to carry out genetic mapping. We can use the fragments to map out genetic features and follow particular genetic markers or, for example, disease phenotypes. We can then localize genes, and there are techniques for identifying them and eventually understanding the information pathways that they are a part of and how they have functioned in causing particular diseases.
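The logic of reading such a chip can be sketched as a toy model. Each gene is represented by one short fragment on the chip, and a gene counts as expressed if the sample contains a stretch that is exactly complementary to that fragment. Everything specific here is an assumption made for illustration: the gene names and sequences are invented, the fragments are shorter than the 20 letters mentioned above, and exact string matching stands in for real hybridization, which is a physical process that can tolerate mismatches.

```python
# Toy model of reading a DNA chip by molecular complementarity.
# Gene names, sequences, and the exact-match rule are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the strand that would pair with the given sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def expressed_genes(probes: dict[str, str], sample: list[str]) -> set[str]:
    """Return the genes whose chip fragment finds an exact partner in the sample."""
    detected = set()
    for gene, probe in probes.items():
        target = reverse_complement(probe)
        if any(target in fragment.upper() for fragment in sample):
            detected.add(gene)
    return detected


if __name__ == "__main__":
    chip = {"gene_a": "ATGCCGTT", "gene_b": "GGGTTTAA"}  # hypothetical probes
    sample = ["CCAACGGCATGG"]                             # hypothetical sample copy
    print(expressed_genes(chip, sample))
```

Running the same chip against a normal-cell sample and a tumor-cell sample and comparing the two result sets is the kind of difference analysis described above.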
Biology is going to be changed when the complete human genome has been sequenced. By the year 2005, we expect to have identified the 100,000 or so human genes and to have transformed biology in a fundamental way: biologists of the past worked by starting with the function, finding the protein, and then identifying the corresponding gene, but in the future we will have all the genes and will have a desperate need to identify the function.
How can you go from a gene to a function? A new field, called functional genomics, presents one of the enormous challenges in biology today.