Panel III: Biotechnology: Needs and Opportunities

INTRODUCTION

Edward Penhoet

University of California at Berkeley and Chiron Corporation

Dr. Penhoet convened the panel by observing that much of the day’s previous discussion had underscored the importance of the biotechnology and computing sectors to the economy, and how each field will fuel advances in the other. This panel would focus on what the computing and biotechnology sectors need to sustain the current rate of advancement, and how the government and private sector could best work together to usher the fields into the next century.

EXPLOITING THE BIOTECHNOLOGY REVOLUTION: TRAINING AND TOOLS

Marvin Cassman

National Institutes of Health

The marriage of biology and technology, said Dr. Cassman, is creating an incredible pace of change in the biotechnology field. It is also creating a glut of raw information that threatens to engulf everyone working in it. Three technologies are driving the fast rate of change:



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
Terms of Use and Privacy Statement




Capitalizing on New Needs and New Opportunities: Government-Industry Partnerships in Biotechnology and Information Technologies

Molecular genetics
Structural biology
Genomics

The first two are disciplines that have turned into tools, and the third is a tool that has turned into a discipline. All three have brought us to the post-genomic era, a phrase that connotes the plethora of information pouring in upon researchers. Much of this information comes from genomics, but not all of it.

The primary driver of modern biology, Dr. Cassman said, is the “great engine of molecular genetics.” That phrase comes from a report on synchrotron radiation, which until about 15 years ago was the exclusive province of high-energy physicists. Now, however, synchrotron radiation is an important tool in biology for conducting research on macromolecular structures at high resolution. This has been a revolutionary development, permitting sophisticated research on DNA and proteins. Before synchrotron radiation was adapted for biology, it could take years to determine the structure of a protein; such an effort might constitute an entire doctoral dissertation. Now, solving one protein structure would be just one part of a larger research program. On the whole, the recent rate of advancement in structural biology has been extraordinary. One of the most important structural biology papers of the past year, which appeared in Nature, concerned the potassium channel, and none of its authors was a “card-carrying” crystallographer. Traditionally, such research required the specialized expertise of a crystallographer, and biologists were relatively unconcerned with structure. The example illustrates biologists’ newfound interest in structure, an interest enabled by advances in information technology. Connected to biologists’ research into cell structure are advances in genomics.
Genomics provides a baseline understanding of the total complement of information in a cell. Other things are, of course, happening in the cell, but understanding this baseline is the starting point for research into it. With the tools of molecular genetics, structural biology, and genomics in hand, the question arises of the discipline’s future path. A reasonable progression in biology would be first to understand the components of cells and their functions, then how they link together, and finally how cells interact as complex systems. This is an idealized progression (advances in biology do not take place in such a neat sequence), but it provides useful guidance nonetheless.

Metaphors to Motivate Progress

A Parts List

The ultimate goal, Dr. Cassman continued, is to understand biology as a complex system. Several metaphors help in laying out a research program to attain that goal. The first is the periodic table, a “parts list” describing how the parts of living systems operate and the intrinsic function of, say, a particular protein at the molecular level. The next step is to sort the parts into the proper bins, which is not at all simple and on which only limited progress has been made. At present, only 30 to 50 percent of the coding regions of most organisms’ genomes have been identified. Even once a coding region is identified and its biochemical properties characterized, its cellular function is still not obvious.

A Wiring Diagram

Advancing from biochemical functions to cellular functions will take more than a parts list; it will take an understanding of cells’ connectivity, which is why a “wiring diagram” is the appropriate metaphor for the next stage. Knowing the genes that contribute to a phenotype is necessary but not sufficient for understanding a cell’s function; understanding connectivity is also required. The pace of advancement on the wiring diagram is good, though not as rapid as for the parts list, and much remains to be done. For example, working out the signaling pathway mediated by a hormone was once a staple of biochemistry. In the past such a pathway was conceived of as linear, but in fact it is not: it is a network, and networks of connections at the molecular level govern most regulatory processes in cells. Dealing with networks is a far more complex task than dealing with a linear path, and more sophisticated tools are needed to understand properly how these networks function. Similarly, dealing with human diseases will require understanding how multiple components interact, not just identifying a single gene defect. In fact, identifying single gene defects has been the basis of much of modern biotechnology.

Networks

The third stage in advancing the frontiers of biology involves understanding complex systems. This stage derives from the fact that living organisms and cells are not static; they exist in space and time. The network shown in a wiring diagram, for example, conveys nothing about a living system’s spatial or temporal behavior. Living systems are dynamic, and biology is still a considerable distance from fully understanding living systems in a dynamic environment. Appreciating the cell as a system requires first an understanding of cellular components and their interactions. To make progress in understanding biological systems, a quantitative understanding of biological materials will be needed, which means that truly interdisciplinary tools will be called for from chemists, mathematicians, physicists, and engineers, as well as biologists. Rather than looking at biology as loosely linked molecular devices, researchers will have to think of living organisms as systems.

Challenges to Progress

One important challenge in biotechnology is the need to increase the use of quantification and mathematical modeling, tools not commonly used by biologists. Even in molecular genetics, where dramatic strides have been made, mathematical sophistication often extends no further than determining whether or not there is a spot on a cell. Second, multidisciplinary research is more often honored in the breach: more people talk about it than engage in it. Collaboration exists in biology, but it is based more on mutual need than on a common goal. A cell biologist will readily seek out a structural biologist to address a particular problem, but this is far from bridging the chasm between mathematicians and biologists. Bridging that gap requires accommodating different professional practices and cultures, a difficult undertaking. Finally, biologists have adopted the paradigm of the single gene defect for understanding phenomena, and a certain degree of retraining will be necessary before biological phenomena are routinely thought of as parts of complex systems. Nonetheless, Dr. Cassman said, moving to an understanding of cells and organisms as complex systems in space and time is difficult but not impossible. Such research is already under way; the effort at Berkeley described by Dr. Penhoet is one example among others around the country. The goal is to understand the cell’s complete function in such a way that the phenotype can be modulated. By phenotype, Dr. Cassman meant the term in its most general sense: the expressed characteristic of a system or an organism.
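Dr. Cassman’s call for more quantification can be made concrete with one of the simplest quantitative tools applied to hormone and ligand responses: a Hill-type dose-response curve. The sketch below is a generic textbook illustration; the function name and parameter values are hypothetical, not drawn from any system discussed here.

```python
# A minimal sketch of quantitative modeling in biology: a Hill-type
# dose-response curve. K and n are illustrative values only.
def hill_response(ligand, K=1.0, n=2.0):
    """Fractional response at a given ligand concentration.

    K is the concentration giving a half-maximal response; n (the
    Hill coefficient) controls how switch-like the response is.
    """
    return ligand ** n / (K ** n + ligand ** n)

# At ligand == K the response is half-maximal by construction.
print(hill_response(1.0))                    # 0.5
# A larger Hill coefficient makes the curve more switch-like.
print(round(hill_response(2.0, n=4.0), 3))   # 0.941
```

Fitting such a curve to measured responses, rather than merely noting whether a response occurred, is the kind of modest quantitative step the discussion above describes.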
For example, a biologist wants to know what happens when a certain hormone is added to a complex system such as a cell. Understanding how the hormone affects the function of a single cell is at present difficult, but it is crucial to developing therapies that will modulate a cell’s function.

Biology and Complex Systems

The challenge is how to accelerate biologists’ understanding of organisms as complex systems. Dr. Cassman mentioned discussions under way at the National Institute of General Medical Sciences (NIGMS) over the past 3 to 4 years to address this issue. The institute has brought together over 100 researchers in these discussions, along with a set of broad operational categories that address the challenges of understanding complex systems. With respect to expanding the “parts list,” more funding is the key. Reams of data are pouring out of genomics research, and the challenge is to develop new database tools to classify and store the information. Aside from additional funding, database development does not require a new targeted initiative. With respect to the “wiring diagram,” Dr. Cassman said that steady progress is being made in understanding cell functions as networks.

The largest challenge lies in understanding cellular functions as complex systems that exist in time and space. Dr. Cassman said that we are far from meeting this challenge, and an important question is how to properly define a complex system in biology. At NIGMS, a complex system is defined as “one in which the behavior or expressed characteristics of a biological system is determined by the multiple interactions of components whose quantitative expression may vary in time and space.” Dr. Cassman noted that, though useful, this definition is not fully satisfactory; indeed, Science magazine devoted an entire issue to complex systems yet was unable to settle on a definition. Scientists do not yet have an adequate method for addressing complex systems, and there are no obvious general ways to model complex biological systems. It is important to start with simple systems and build from there. Recently, bacterial chemotaxis, a system with only four or five components but complex nonetheless, was successfully modeled. Dr. Cassman noted that the investigator who modeled bacterial chemotaxis is a physicist who turned to biology, an example of someone bringing mathematical and modeling tools into the field. More of that must occur.

NIH Initiatives to Promote Collaboration

The National Institutes of Health has made efforts to increase collaboration among biologists, physicists, and mathematicians by funding collaborative research. While a useful beginning, Dr. Cassman said, the initiative presently suffers from “too many good intentions and not enough knowledge.” There is a substantial gulf separating the disciplines: biologists are unaware of the mathematical tools physicists can offer, while physicists are unaware of which biological systems can usefully be investigated. It will take long and hard work, said Dr. Cassman, to overcome these barriers.

A second part of NIH’s effort to increase collaboration is the “glue grant.” Dr. Cassman said that for many difficult research problems in biology, the material and intellectual resources do not exist in a single laboratory; even a laboratory employing 20 top-flight post-docs on a single project could not address some problems. Glue grants do not support the underlying research, which is assumed to be funded already, but are intended to facilitate interaction among diverse sets of researchers. Dr. Cassman was surprised by the large number of applications submitted for glue grants, which suggests that the research community has significant interest in expanding collaborative activities.

The third area is the development of, and access to, research tools. This may seem obvious, but as the example of x-ray crystallography shows, today’s specialized research tools are becoming useful for a wide variety of purposes. Specialists are no longer the sole users of advanced instruments such as nuclear magnetic resonance and mass spectrometry; recent years have seen rapid growth in the use of mass spectrometry among cell biologists, who now need not only the instruments but also people trained to use them. The issue of tools also extends to data sets and material resources. In structural genomics, the research agenda is to arrive at the three-dimensional structure of every known protein. To accomplish this extremely ambitious task, researchers plan to parse the human genome into families and sub-families and to pick representative proteins; homology modeling will then be used to determine the structures of related proteins. The program will be international in scope: a meeting will be held in Europe in spring 2000 to ensure that collaborators in England, Germany, France, and Japan work in concert toward modeling 10,000 structures in 5 years. This will require not only funds for laboratories and researchers’ salaries but also the ability to analyze and share large data sets. Finally, NIH is trying to develop new training programs in computational biology and bioinformatics. Dr. Cassman cautioned against programs, at NIH or elsewhere, that train computational biologists and bioinformatics specialists without developing ways to promote true cross-fertilization.

In conclusion, Dr. Cassman said that biology and the biotech industry are today confronted with more information than they can assimilate.
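The chemotaxis result Dr. Cassman cites suggests the flavor of such small-system modeling. The sketch below is a deliberately hypothetical two-variable module with made-up rate constants, integrated with forward Euler; it is not the published bacterial chemotaxis model, only an illustration of treating a few interacting components as coupled differential equations.

```python
# Hypothetical two-component module, NOT the actual chemotaxis model:
# 'a' is a receptor activity driven by a stimulus; 'm' is a feedback
# species (loosely analogous to methylation) that shuts activity off.
def simulate(stimulus=1.0, dt=0.01, steps=5000):
    a, m = 0.0, 0.0
    for _ in range(steps):
        da = stimulus * (1.0 - a) - a * m   # activation minus feedback shutoff
        dm = a - m                          # feedback tracks activity
        a += dt * da
        m += dt * dm
    return a, m

a, m = simulate()
print(round(a, 3), round(m, 3))   # settles near 0.618 0.618
```

At the parameter values shown, the pair settles to the steady state where m = a and (1 - a) = a², so a ≈ 0.618; the point is only that even a two-variable network already calls for this kind of dynamical treatment.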
Biologists really have no alternative but to draw on tools from chemical engineering, physics, and computer science to construct a quantitative, dynamic picture of biological systems. The barriers to doing so are cultural as well as scientific, but the effort to bridge the gap separating disciplines must urgently be undertaken.

Discussants

Dr. Dahms noted that there is a continuum of training needs among computer scientists and biologists; that is, there may be different relative payoffs from training biologists in computer science versus training computer scientists in the life sciences. He asked Dr. Cassman how, from the perspective of universities, training programs should be developed in light of these potentially different payoffs. Dr. Cassman responded that whatever the relative payoffs of interdisciplinary training, the important thing is for “bodies to be exchanged” across disciplines: computer scientists must spend time in biologists’ labs, and vice versa. Dr. Cassman has not witnessed much of that, although he said there have been exchanges between physicists and biologists. There will always be only a few people sufficiently trained in physics and biology to perform experiments in both fields at a high level. It is more important, Dr. Cassman said, to have an adequate number of people conversant in the language of the two disciplines so that meaningful collaboration can occur. Training such people is feasible, although it will take time to produce enough trained individuals. In the meantime, NIH has developed one- to two-week intensive courses designed to improve understanding among biologists, physicists, and computer scientists of each other’s disciplines.

Dr. Penhoet asked why, in the age of the Internet, it is necessary to promote face-to-face interaction among scientists, as many comments during the conference had urged. He noted the paradox of vastly improved communications technology existing alongside apparently growing geographic concentration of innovation in certain regions. Dr. Cassman responded that even before the Internet it was possible, in principle, for researchers to read all the relevant journal articles and then replicate or expand upon an experiment; almost no one works that way. He added that he tells students who want to learn a particular experiment to first read the journal article and then go work in the lab of the researcher who wrote it. There are too many informal modes of communication that a journal article or the Internet cannot capture and convey.

A questioner commented that biology departments often segregate internally, so that molecular biologists may rarely communicate with ecologists, undermining the interdisciplinary goals Dr. Cassman had discussed, and asked whether NIH has considered this problem. Dr. Cassman agreed that despite all the talk of multidisciplinary research, collaboration remains difficult among sub-disciplines within a field.
To facilitate collaboration, a culture of collaboration must take hold, and developing such a culture is hard and takes time. Dr. Penhoet added that much of the educational system penalizes teamwork: in universities it is often called cheating and results in academic penalties, whereas in industry the payoffs to teamwork are well known. He noted, however, that the educational system appears to be doing a better job of promoting teamwork today.

THE NEW FRONTIER: BIOINFORMATICS AND THE UNIVERSITY

Rita Colwell

National Science Foundation

Dr. Colwell began her remarks by observing that two decades ago, computers and biology were rarely mentioned in the same sentence, let alone combined in research. As a graduate student, Dr. Colwell was considered a pioneer because she wrote a computer program, in machine language on an IBM 360, to classify bacteria. A model of the computer she used, housed at the time in the attic of the chemistry building at the University of Washington, is now on display at the Smithsonian. Today, doctors and hospitals classify and identify bacteria by computer automatically.

To convey the distance biology has traveled since Watson and Crick discovered the double helix of DNA in 1953, Dr. Colwell noted that the Arabidopsis genome is now known to be 1.8 inches long and to contain 135 megabases of information. The human genome, estimated to be 3.3 feet long, is being elucidated today. Sequencing the human genome, as is well known, will require significant improvements in biologists’ ability to manage information. Dr. Colwell said that bioinformatics is not just about genomics or managing information; it has also led to new ways of communicating information among biologists. Taking a broad view of information technology, advances in that field will yield faster, more secure, and more reliable software for biologists. In the National Science Foundation’s (NSF) FY 2000 budget, Congress granted the agency’s request for significant funding for information systems that will provide improved tools for biotechnology research. To sustain the rate of innovation predicted by Moore’s Law, Dr. Colwell pointed out, it will be necessary to develop new research tools for biotechnology.

Biocomplexity

The promise of information technology in biotech research lies in allowing scientists to study entire biological systems. “Biocomplexity,” the study of the complex interdependencies among the various systems of the environment, is the next challenge for biologists. As biologists turn to these areas of inquiry, the challenge is not in collecting data but in managing it. As our ability to manage biological data improves, scientists will be able to make significant strides in biocomplexity.
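Dr. Colwell’s genome-length figures can be checked against the standard rise of roughly 0.34 nanometers per base pair along the DNA double helix; the short script below uses only that constant and unit conversions.

```python
# Check the genome lengths quoted above from the ~0.34 nm rise
# per base pair along the DNA double helix.
BP_RISE_M = 0.34e-9      # meters per base pair (approximate)
INCH_M = 0.0254          # meters per inch

def genome_length_inches(base_pairs):
    """Stretched-out physical length of a genome, in inches."""
    return base_pairs * BP_RISE_M / INCH_M

print(round(genome_length_inches(135e6), 2))       # Arabidopsis: ~1.81 inches
print(round(genome_length_inches(3.0e9) / 12, 2))  # human (~3 Gb): ~3.35 feet
```

135 megabases works out to about 1.8 inches, and a human genome of roughly 3 gigabases to about 3.3 feet, matching the figures quoted above.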
A host of innovations has created a flood of data about the Earth’s complex biological systems and the processes that sustain them. From DNA chips to geographic information systems, biosensors, ecological monitoring devices, and satellite imaging systems, we have more data than ever before about the Earth and its inhabitants. In response to these new demands, NSF will invest $50 million in FY 2000 in a new focused research initiative on biocomplexity. As an example of NSF’s efforts to use information technology to better understand complex biological systems, Dr. Colwell described NSF-funded projects that explore the global and regional distribution of temperature, precipitation, sea level, water resources, and biological productivity. She displayed a map that, using sophisticated satellite and sensing technology, showed the distribution of the population of a particular bird in Mexico. The map was created from information held by 24 museums throughout North America; in the past these data were kept in separate databases, making research on the bird and the ecological system that supports it nearly impossible. Extending this research will mean understanding the interplay among species and how different species coexist and co-evolve over time, which will place new computational demands on biology. Biocomplexity has evolved from the integration of disciplines, finding places where biology and physics explain each other, to include chemistry and geology in understanding the environment. The interconnection of disciplines is best captured by a quotation from John Muir: “When we try to pick out anything by itself, we find it hitched to everything else in the universe.” Biocomplexity will present many research challenges, and universities will play a prominent role both in the research itself and in training the workforce that will conduct it.

Twenty-first Century Workforce

Dr. Colwell raised the larger question of U.S. leadership in innovation and information technology, and of how the nation must prepare for a future in which more workers will need to be literate in science, engineering, and mathematics. This is why NSF’s Twenty-first Century Workforce Initiative is important to the agency’s overall mission and the nation’s economic future. The program establishes partnerships with the private sector and universities to expand training programs in math, science, and engineering. One focus of the initiative is the gap between information “haves” and “have-nots,” the so-called Digital Divide. Dr. Colwell said that social scientists have identified demographic groups in which telephone, computer, and Internet access lag well behind national averages; in general, according to NSF’s Science and Engineering Indicators, there is a strong correlation between income level and computer use. Information gaps exist among nations as well, and Dr. Colwell observed that most people in the Third World have never used a telephone.
Less than 2 percent of the world’s population is on the World Wide Web, and if the United States and Canada are excluded, the share is less than 1 percent. Unequal access to cyberspace should be of concern not only for humanitarian reasons but for economic and political ones as well.

An important element of the Twenty-first Century Workforce Initiative is to better understand the nature of learning. The initiative will take a quantitative approach to behavioral science research to improve our fundamental knowledge of how individuals teach and learn. In the arena of supercomputer-based research, NSF has funded brain-imaging work at the University of California at Los Angeles that has developed a “brain template.” This research, though at a very early stage, seeks to determine which parts of the brain are most active when we are learning. People learn in a variety of ways, and those who appear to have difficulty learning may simply learn differently; if the different ways in which people learn can be isolated, teaching techniques can be adjusted to individual needs. Research into the structure of the brain, usually conducted by NIH, can therefore be usefully connected to research at NSF on the nature of learning.

Bioinformatics and NSF

To fuel the growth of bioinformatics, more must be done at the level of graduate education. As mentioned earlier, many of the research challenges in biology, from gene analysis to drug discovery, are in fact computational ones, and there is a desperate shortage of trained specialists able to meet them. To address the problem, NSF has funded a “career awards initiative” to prepare graduate students in science and engineering for careers not only in academia but in the private sector as well. Dr. Colwell said that with the NSF initiative she expected to see many more graduate students doing internships in private industry in the future.

In closing, Dr. Colwell noted that NSF celebrated its fiftieth anniversary in early October. The upcoming challenges in bioinformatics make NSF’s mission ever more urgent in terms of training graduate students to develop the new computational tools needed to exploit fully the promise of the biotechnology revolution. The scientific community must embrace the new methods and approaches in biotechnology research, and NSF stands ready to play a supporting role in encouraging this.

Discussants

Dr. Dahms pointed out that some projections put the bioinformatics industry at $2.5 billion by 2005, a 12-fold increase from today’s level. He also observed that the president of Compaq has predicted that by 2005, 40 percent of all biotech companies will be in bioinformatics; essentially, two-fifths of all firms in biotechnology will sell information. This raises questions about workforce preparation, namely whether there will be enough people trained in bioinformatics to fill the available jobs. Dr. Dahms said that the industry may need as many as 20,000 workers in bioinformatics by 2005, a substantial increase, and asked what NSF is doing to address these challenges. Dr. Colwell said that NSF’s Twenty-first Century Initiative will be critical to meeting this challenge, and mathematics education will be its important underpinning. In FY 2001 and 2002, mathematics, pure and applied, along with statistics, will be the subject of a major NSF initiative. In biology, Dr. Colwell expected to see a reemergence of mathematical biology, in addition to collaboration among physicists, biologists, and chemists. One NSF initiative encourages graduate students in math, science, and engineering to become involved in kindergarten-through-twelfth-grade (K-12) education: an NSF stipend allows graduate students to spend up to 20 hours a week in K-12 classrooms, under the supervision of a teacher, to promote science education. Dr. Colwell said she believes there is a “valley of death” between grades 4 and 12 during which many students lose interest in math and science. NSF is attempting to address this problem in the United States; the United Kingdom, she said, faces the same problem and is attempting to address it as well. Finally, Dr. Colwell said NSF would launch regional science and technology centers, modeled after engineering research centers, funded at $3 million to $5 million per year for several years and then potentially renewable.

Another initiative Dr. Colwell would like to implement is raising the salaries of outstanding instructors of introductory science courses in universities. It is especially important to encourage quality teaching of science to students whose majors may not be in the sciences, and another area to explore is how to permanently increase the salaries of teachers recognized for excellence. Dr. Colwell said it is oxymoronic to reward outstanding teachers with a plaque and a semester off from teaching; it would be better to reward them with a permanently higher salary.

Greg Reyes of Schering-Plough observed that his industry is already overwhelmed by data and asked Dr. Colwell specifically how NSF planned to help chemists and biologists better manage it. Dr. Colwell said that NSF’s information technology initiative will help in this regard, investing $105 million primarily in new software and in research into linking disparate computer systems so they can share biological data. NSF also received $36 million in funding for high-speed computing. Dr. Colwell added that advances in computing power should enable dramatic improvements in the social and behavioral sciences.
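Dr. Dahms’s market projection implies a figure for the bioinformatics industry’s size at the time of the conference; assuming the “12-fold” multiple is meant exactly, the division below recovers it.

```python
# Back-of-envelope check on the projection quoted earlier: if the
# bioinformatics market reaches $2.5 billion in 2005 and that is a
# 12-fold increase, the implied current market is about $208 million.
projected_2005 = 2.5e9          # dollars
growth_multiple = 12
implied_current = projected_2005 / growth_multiple
print(f"${implied_current / 1e6:.0f} million")   # $208 million
```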
With respect to translating some of this research into useful information for industry, Dr. Colwell said that programs such as the Small Business Innovation Research program are an excellent vehicle for NSF outreach to industry. Partnerships among industry, government, and universities will be necessary to get the most out of the new NSF initiatives in mathematics and information technology.

EMERGING OPPORTUNITIES AND EMERGING GAPS

Paula Stephan

Georgia State University

Dr. Stephan said her remarks would focus on four dimensions of the emerging gap in the supply of workers trained to handle the vast quantities of biological data being produced:

- An indication of strong demand;
- A summary of what is in the pipeline at present;
- Explanations for the sluggish response to growing demand; and
- Possible solutions.

Strong Demand

Not only is the mapping of the human genome generating a great deal of data in biology; other types of data are being created as well. There is widespread agreement that we are experiencing just the "tip of the data iceberg" today. Because of the huge amounts of data, a number of companies have begun to recognize the possibilities of computational science, in particular the potential to develop drugs from models based on biological data. As an indication of strong demand, the scientific press frequently reports on students being "grabbed" from graduate programs before they have completed their degrees. Faculty members have been lured from universities as well, creating worries that bioinformatics is "eating its own seed." Moreover, large salaries, on the order of $65,000 for master's graduates and $90,000 for new Ph.D.'s, indicate surging demand for individuals adept at manipulating biological data.

To get a better handle on the nature of the problem, Dr. Stephan and her co-author, Grant Black, examined ads for bioinformatics positions in Science magazine and surveyed institutions of higher education about their responses to the changing demand conditions. From the analysis of ads for bioinformatics and computational biology positions in Science, Dr. Stephan said that the number of ads was generally higher in 1997 than in 1996. Looking more closely, the number of distinct position announcements grew by 68.6 percent from 1996 to 1997; announcements from firms grew by 70 percent, from universities by 29 percent, and from other non-profit institutions by 133 percent. Another way to assess demand is to explore the placement of students in formal or informal bioinformatics programs at universities. Dr.
Stephan and her colleagues surveyed 21 bioinformatics programs, of which 16 responded, and found very high placement rates; more than 50 students, from undergraduates to post-doctorates, found employment. Only one student was "grabbed" by industry before completing the degree program, undercutting the notion that industry is raiding university bioinformatics programs for talent. In general, salary levels were quite impressive, with several undergraduates earning more than $50,000 and several master's, doctoral, and post-doctoral graduates earning more than $100,000. In summarizing demand, Dr. Stephan said that demand in bioinformatics is strong and growing, although still small relative to other areas of biology. A large amount of the demand comes from industry, where salaries are high relative to other areas of the life sciences. Graduates of formal or informal programs do not fill the majority of jobs in bioinformatics; indeed, graduates of those programs filled at best 15 percent of the positions advertised in 1997. Individuals not formally trained in the field fill many bioinformatics jobs, and a number of jobs remain unfilled.

The Pipeline

According to Dr. Stephan's survey, as of March 1999 formal training programs had 23 undergraduates, 35 master's students, 86 doctoral students, and approximately 25 post-doctorates enrolled. The strong demand for bioinformatics and growing enrollment come at a time of a "crisis of expectations" for young life scientists, whose career outlook is not bright. Concern is sufficiently high that the National Research Council established the Committee on Dimensions, Causes, and Implications of Recent Trends in the Careers of Life Scientists. As a member of that Committee, Dr. Stephan said it was very concerned about young life scientists' career prospects; one of its recommendations was to restrain the rate of growth in the number of graduate students in the life sciences.

An important issue is whether a "crisis of expectations" can coexist with strong demand for specialists in bioinformatics. Why, for example, are there only 9 doctoral programs in bioinformatics and computational biology, while there are 194 programs in biochemistry and molecular biology and over 100 in molecular and general genetics? Dr. Stephan offered four possible explanations for the imbalance:

- Low incentives for individual faculty to recruit students in the area;
- The educational system responds differently when demand is driven by industry;
- The interdisciplinary nature of the field creates disincentives; and
- A possible quick fix: turning life scientists into computational biologists.

Dr. Stephan discussed each possible explanation in detail.

Lack of Incentives

Dr.
Stephan said that it is a fact of academic life that the need for external grants to support research makes faculty very responsive to research funding opportunities. One way to encourage students to enter the bioinformatics field is to target research funds to that area, giving faculty members an incentive to populate their labs with graduate students doing bioinformatics research. From the evidence collected by Dr. Stephan and her colleagues, research funding agencies are only beginning to direct grants to bioinformatics. In effect, agencies have placed all their computational eggs in one basket, namely training. This may be the best solution in the long term, but it does not address the short-term need to shift the distribution of graduate students in biology programs toward computation. Training grants signal "collective bodies," Dr. Stephan said, while research grants signal individual investigators. A collective response from universities is needed to expand programs in bioinformatics and computational biology, but the individuals within universities are driven by where their next grant may come from.

Industry-Driven Demand

According to some, the fact that demand for computational biologists is driven by industry gives universities little incentive to respond by expanding programs in this field. The educational system is generally poised to respond quickly to changes in research funding portfolios but moves inherently more slowly in response to industrial demand. The tradition in the life sciences has not been to place people in industry, but rather to train them to do research in academic or non-profit laboratory settings. Moreover, industry has recently been hiring away promising faculty to conduct research, thereby "eating the seed corn" for training graduate students in bioinformatics.

Interdisciplinary Challenges

Dr. Stephan said that the interdisciplinary nature of computational biology creates disincentives to establish new training programs. Coordination across fields is intrinsically difficult, and it is exacerbated by the geographical separation of departments within universities. Allocating costs, crediting teaching hours, and finding classroom space are among the many details that make coordination difficult. Moreover, the fields involved attach very different values to their degrees: in biology, a master's degree has traditionally been seen as a "consolation prize" for some students, whereas in engineering and information technology, master's degrees are prestigious terminal degrees that successfully launch careers.

No Quick Fix

Dr.
Stephan concluded by suggesting that a "quick fix" for the supply shortfall in computational biology was unlikely. First, salaries in the computer sciences are uniformly higher than those in biology, the life sciences, and the health sciences, so there is no incentive for one potential quick fix: encouraging computer scientists to do post-docs in biology in order to become computational biologists. Computer scientists do well financially as it is and have little reason to change fields. The other possible quick fix, biologists becoming adept in computer science and math, is also unlikely. Dr. Stephan's research showed that biologists generally lack the training and aptitude for transforming themselves into computational biologists. In their survey of top life sciences programs, Dr. Stephan and Grant Black found that most had no math prerequisite for entry and that few included any math courses in the degree program. Moreover, the data indicate that students intending to enter the biological and medical sciences have lower mathematical aptitude than their counterparts planning to enter computer science and mathematics programs. Using Graduate Record Examination scores from 1993 to 1996, Dr. Stephan pointed out that students planning to enter math and computer science programs usually score substantially higher in mathematics than biology or medical science students (Table 3). A recent Science article discussed a UCLA survey finding that faculty in engineering programs used information technology most heavily, followed closely by physical scientists; biologists were in the middle, slightly ahead of scholars in the humanities.

Possible Solutions

Dr. Stephan said that one way to increase the supply of computational biologists would be to find ways for faculty at urban campuses to interact across disciplines and institutional boundaries; proximity of space really matters in promoting collaboration. It is also necessary to provide incentives for new interdisciplinary programs, and additional training and research awards would facilitate their formation. Additional research awards in particular, said Dr. Stephan, would give faculty the right incentive to undertake multidisciplinary research and thereby increase the supply of biological researchers adept at using information technology. Dr.
Stephan also recommended that faculty do more to provide students with information on career outcomes. This requires faculty to do a better job of tracking students' progress after graduation; Dr. Stephan was astounded that so few faculty members knew where their students were working and what their salary levels were. Finally, it is important to recruit the right kind of student to work in computational biology early in the educational process. The field is new and presents intellectual challenges different from those of either biology or computer science alone. Identifying students with the promise to bridge the gap between the fields will not only increase supply but also make the field more attractive to their peers.

TABLE 3 GRE Scores by Intended Field of Graduate Study, for Seniors and Nonenrolled College Graduates, 1993–1996

Intended graduate field of study     Test section   Mean score   Percent above 700   Percent scoring 800
Biological sciences                  Verbal             501            3.6                 0.1
                                     Quantitative       595           20.7                 1.1
Health and medical sciences          Verbal             449            0.7                 0.0
                                     Quantitative       515            5.8                 0.1
Computer and information sciences    Verbal             483            5.4                 0.2
                                     Quantitative       672           52.2                 5.6
Mathematical sciences                Verbal             502            6.5                 0.2
                                     Quantitative       698           60.6                 8.8

SOURCE: 1997–1998 Guide to the Use of Scores, Educational Testing Service, 1997.

Discussant

Greg Reyes of Schering-Plough drew a distinction between how drugs were discovered before 1990 and since. Prior to 1990, medicinal chemistry was the primary method: small molecules (or natural products as a source of molecules) were placed in a biological screen, a biological effect was observed, and that was the starting point for drug optimization. That method of drug discovery has limits; for example, it does not identify the mechanism by which a drug works, so one cannot know a priori what its potential side effects may be. Since 1990, however, a number of new technologies have greatly aided drug discovery, and they are all information-based, such as sequence databases. Microarray technology, gene chips, combinatorial chemistry, and other technologies all generate new drug candidates and tremendous amounts of data. Bioinformatics is crucial to maximizing the value of these technologies. Bioinformatics also enables a new set of questions to be asked in research and drug development; for example, it will be easier to identify targets for drugs and validate the targets.
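As a concrete illustration of the kind of computational target identification described here, the sketch below compares expression levels between a normal and a cancerous sample and ranks genes by fold change. All gene names and counts are invented for illustration, and real analyses use proper normalization and statistical testing; this is only a minimal toy version of the idea.

```python
# Toy sketch of a normal-vs-tumor expression comparison.
# Gene names and expression values are hypothetical.
import math

# Hypothetical normalized expression counts per gene
normal = {"GENE_A": 50, "GENE_B": 200, "GENE_C": 5, "GENE_D": 120}
tumor  = {"GENE_A": 400, "GENE_B": 210, "GENE_C": 90, "GENE_D": 10}

def log2_fold_change(t, n, pseudocount=1.0):
    """log2 ratio of tumor to normal expression; the pseudocount avoids log(0)."""
    return math.log2((t + pseudocount) / (n + pseudocount))

# Rank genes by absolute fold change: the most differentially expressed
# genes become candidate drug targets for follow-up validation.
ranked = sorted(
    ((g, log2_fold_change(tumor[g], normal[g])) for g in normal),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for gene, lfc in ranked:
    print(f"{gene}: log2 fold change = {lfc:+.2f}")
```

Genes strongly up- or down-regulated in the tumor sample rise to the top of the list, which is the computational starting point for the prioritization Dr. Reyes describes.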
A question one might pose is which genes are uniquely expressed in cancer cells; to answer it, researchers need sequence data from a normal source and a cancerous source. Researchers can then compare the differences and explore whether the normal cells have, for example, different regulatory genes. Posing such questions enables researchers to develop priority gene targets for drug development. All of this involves generating large amounts of data and therefore requires computational capability.

Given the host of new drug discovery technologies, a major challenge is integrating them to capitalize fully on their ability to discover new drugs. Each of these technologies may be the specialty of a separate biotechnology company, and the challenge for a large pharmaceutical company is either to partner with such firms to exploit the technologies or to develop the expertise in-house; a typical pharmaceutical company will generally use a combination of both strategies. At present, a limitation in drug development lies in the confirmatory biology needed to study a treatment once the target has been identified and development is fairly far along. This means, Dr. Reyes said, that informatics provides ample capability to generate compounds, libraries, and targets, but a drug's impact is probably better explored by investigating the molecule itself, so trained biologists remain a key part of the equation for the industry, not just skilled computational biologists.

Dr. Reyes said the future is best considered by asking: What if we could design a molecule that uniquely fits the active site of an enzyme and does not cross-react with related enzymes? This requires greater multidisciplinary collaboration between chemists and biologists. Using tools already in existence, it should eventually be possible to predict a drug's pharmacology in humans. Beyond 2000, in silico drug discovery will be key for the pharmaceutical industry: using silicon-based information technologies, researchers will be able to study the structure of proteins faster than ever before, and hypotheses about proteins will be testable and verifiable using informatics. It will remain important, however, for researchers to be able to produce a protein in adequate quantities to obtain crystals and conduct studies. Highly trained biologists, Dr. Reyes reiterated, will be very important. That said, Dr. Reyes added that at Schering-Plough the bioinformatics group has already generated 10 years' worth of data for biologists; in other words, Schering-Plough could shut down its bioinformatics operations today and keep its biologists busy for the next 10 years.

DISCUSSION

A questioner said that a plausible explanation for the shortfall in the supply of trained individuals in bioinformatics is simply that the field is very new, and asked Dr. Stephan how the newness of the field factored into her analysis. Dr. Stephan responded that, even though the field is new, many people contacted in the course of her research said they had found it difficult to start computational biology programs within their academic institutions.
Thus, even with very strong demand in industry, structural difficulties in universities have inhibited academia's response. In fact, some people she interviewed left universities out of frustration when they found it impossible to start a bioinformatics program. Another questioner recalled a meeting at which large companies such as Merck and IBM said that for bioinformatics jobs they do not need people who have graduated from formal degree programs, but rather people with certain skills. Dr. Stephan agreed that developing people with the proper skills was the right goal, more so than simply graduating more people from formal programs. She added that students must be made aware of job opportunities in the field; this is not done adequately now, so students who may be interested in and ideally suited for bioinformatics remain unaware of the field. That is why, Dr. Stephan said, more outreach must be done. Dr. Kathy Behrens commented on Dr. Stephan's observation that faculty members in the life sciences do not know where students are employed upon graduation; she suggested that granting agencies require universities to establish a tracking system as a condition of the grant. Dr. Stephan agreed that this would be useful, adding that institutions should also be required to track students beyond their post-doctoral employment.