Models of Large-Scale Science
To further elaborate on the concept of large-scale biomedical
science as defined in this report, this chapter provides an
overview of several examples of past and current large-scale
projects or strategies in biology and other fields. It begins with a sum-
mary of the Human Genome Project (HGP), the largest and most visible
large-scale science project in biology to date. Many examples are drawn
from NCI, in part because NCI has a longer history and more extensive
experience with directed, large-scale projects compared to other branches
of NIH, and also because a major focus of this report is on cancer re-
search. Several initiatives recently launched by other branches of NIH are
described in detail, followed by examples of National Science Founda-
tion (NSF) programs, industry consortia, public-private collaborations,
and initiatives sponsored by private foundations. The chapter concludes
with an example of a nonbiology model of large-scale science for contrast:
that of the Defense Advanced Research Projects Agency (DARPA).
The DARPA model is commonly cited as a potential strategy for under-
taking large-scale, high-risk, and goal-oriented research, but this model
has rarely been replicated in biology. A review of federally funded large-
scale research projects in nonbiology fields such as high-energy physics
is provided in the Appendix.
The common theme among the examples described in this chapter is
that they are all formal programs launched by funding agencies, founda-
tions, or industry. There is certainly no shortage of other ideas for poten-
tial large-scale biomedical research projects among scientists. Without an
initiative by a funder, however, individual scientists may find it very
difficult to obtain the funding necessary to launch an expensive, long-
term, large-scale project because of the nature of traditional funding
mechanisms (see Chapter 4).
Another common thread among these projects is their dependence on
new or developing technologies. Technical innovations drive scientific
discovery and determine what can be accomplished in the field. The pace
and variety of new innovations have increased greatly in recent years, in
turn increasing the feasibility of and opportunities for large-scale projects
in biology (see Box 3-1). For example, the advent of DNA arrays and the
development of software for analyzing the data they generate have made
it feasible to study the entire transcriptional profiles of cells in health and
disease or under various conditions. However, such projects are not only
much larger in scale, but also much more expensive to undertake.
THE HUMAN GENOME PROJECT
Ever since the discoveries of genetic inheritance and the chemical
structure of DNA, there has been interest in "unlocking the secrets of life"
by deciphering the information encoded in the genome. Initially, scien-
tists concentrated on small pieces of the puzzle because they lacked the
ability to investigate genetic material efficiently on a large scale. As tech-
nological advances were made, however,1 some molecular biologists be-
gan to discuss the feasibility and potential value of mapping and sequenc-
ing the entire human genome (see Figure 3-1). The first editorial published
in a major scientific journal advocating a large-scale approach to sequence
the human genome brought the concept to the scientific mainstream, with
an emphasis on cancer research (Dulbecco, 1986). Nobel laureate Renato
Dulbecco suggested that a project to map the human genome was the best
way to make progress in the "war on cancer," which had been launched
by the Nixon Administration in 1971. Dulbecco compared the significance
of such a project to that of the U.S. space program, arguing that a genomic
approach would facilitate a greater understanding of the genetic changes
that lead to cancer, which would be essential in eradicating the disease.
But he also noted that research on other diseases would certainly benefit
as well.
At about the same time, a number of influential scientists were pub-
licly discussing and advocating the possibility of sequencing the entire
human genome (reviewed by Sulston and Ferry, 2002; Davies, 2001; Cook-
Deegan, 1994; Kevles and Hood, 1992). In May 1985, Robert Sinsheimer,
chancellor of the University of California Santa Cruz and a well-known
molecular biologist, brought together a group of leading American and
European molecular biologists to discuss the technical prospects for a
human genome project. At this symposium on DNA sequencing, one of
the strongest advocates for a large-scale HGP was Walter Gilbert, a Nobel
laureate from Harvard, who had developed one of the first methods for
sequencing DNA.
The following year, in early March 1986, Charles DeLisi, director of
the Office of Health and Environmental Research at the U.S. Department
of Energy (DOE), held a workshop to discuss the idea of undertaking an
HGP under DOE. Although DOE may not have appeared to be the logical
choice of a federal agency to oversee such a project, it had a long-standing
research program on the effects of radiation on mutation rates, and the
Life Sciences Division at Los Alamos National Laboratory had already
established Genbank, a major database for DNA sequences, in 1983. DOE
1 These technical advances included recombinant DNA methods, DNA sequencing meth-
ods, techniques for genetic mapping, and computer analysis.
May 1985 -- Robert Sinsheimer, UCSC chancellor, hosts a meeting to discuss the technical prospects of the HGP.
March 1986 -- Editorial by Renato Dulbecco suggests that the HGP is the best way to make progress in the War
on Cancer.
March 1986 -- Charles DeLisi holds a workshop to discuss the possibility of a DOE-sponsored HGP.
May 1986 -- A molecular biology meeting at Cold Spring Harbor includes a special session to discuss the
possibility of the HGP.
February 1988 -- A report from the U.S. National Research Council endorses the HGP.
April 1988 -- The Congressional Office of Technology Assessment endorses the HGP.
September 1988 -- NIH establishes the Office of Human Genome Research, with James Watson as its head.
October 1989 -- The new NIH office becomes the National Center for Human Genome Research (NCHGR).
April 1990 -- NIH and DOE publish a 5-year mapping and sequencing plan, with a projected budget of
$200 million/year.
1991 -- NIH funds ~175 genome projects, with an average grant size of ~$300,000/year.
July 1991 -- Craig Venter, then at NIH, reveals that NIH has applied for patents on expressed sequence tags
(ESTs) identified by his laboratory.
April 1992 -- Watson resigns as head of NCHGR. Francis Collins appointed as his replacement in 1993.
June 1992 -- Venter leaves NIH to set up The Institute for Genomic Research (TIGR), a non-profit devoted to
identifying human genes using EST methods.
October 1993 -- NIH and DOE publish a revised 5-year plan, with full completion expected in 2005.
October 1993 -- The Wellcome Trust and the U.K. Medical Research Council open the Sanger Center to sequence
the human genome and model organisms.
September 1994 -- French and American researchers publish a complete genetic linkage map of the human genome,
one year ahead of schedule.
December 1995 -- Another group of American and French scientists publishes a physical map of the human genome
containing 15,000 marker sequences.
February 1996 -- International HGP partners agree to release sequence data into public databases within 24 hours.
January 1997 -- NCHGR renamed as National Human Genome Research Institute (NHGRI).
October 1997 -- Only 3 percent of the human genome is sequenced in finished form by the projected midway point
of the 15-year HGP.
1998 -- ABI PRISM 3700 automated sequencing machines enter the laboratory market.
May 1998 -- Craig Venter announces formation of a company, later named Celera, to sequence the human
genome in 3 years, using the whole genome shotgun approach.
May 1998 -- The Wellcome Trust announces that it will double its support for the HGP.
May 1998 -- Collins redirects the bulk of available NHGRI funds to three sequencing centers.
October 1998 -- NIH and DOE publish new goals for 1998-2003, expecting a working draft of the genome by
2003, and a full sequence by 2005.
March 1999 -- NIH moves the expected date for release of a working draft ahead to spring of 2000.
March 2000 -- Celera and academic collaborators release a draft sequence of the fruit fly genome, obtained using
the whole-genome shotgun method.
March 2000 -- Possibility for collaboration between Celera and the public HGP wanes. Disagreement over data
access is a major obstacle.
June 2000 -- HGP and Celera jointly announce a working draft of the human genome sequence.
FIGURE 3-1 A timeline of the human genome project.
SOURCE: Adapted from Macilwain (2000:983).
was also accustomed to big-science projects involving sophisticated tech-
nologies. It tended to oversee big, bureaucratic, goal-oriented projects, in
contrast to the smaller, hypothesis-driven research that was the standard
at NIH. DeLisi, formerly chief of mathematical biology at NIH, had been
exploring the feasibility of such a project, and in 1986 he proposed a plan
for a 5-year DOE HGP that would comprise physical mapping, develop-
ment of automated high-speed sequencing, and research into computer
analysis of sequence data.
Soon after, in May 1986, a meeting on molecular biology hosted by
James Watson at Cold Spring Harbor included a special session dedicated
to discussing the possibility of an HGP. During this session, Walter Gil-
bert estimated the cost of sequencing the human genome at $3 billion
(approximately $1 per base). Many scientists opposed the endeavor on
the basis of cost, as they assumed it would take funding away from other
projects. The project was also viewed by many as a forced transition away
from hypothesis-driven science to a directed, hierarchical mode of big
science. Many argued that sequencing efforts should focus on the genes
rather than the entire genome, which included large areas of repetitive
DNA of unknown function. Searching for and characterizing genes hy-
pothesized to be associated with human diseases was thought by oppo-
nents of the project to be a more scientifically valid approach than
"blindly sequencing the genome." However, advocates for the project
argued that a large-scale HGP would be a less risky undertaking than
big-science programs in space or physics. A failed space mission or particle
accelerator would be extremely expensive and would be unlikely to yield
partial benefits. In contrast, accomplishing even some of the goals of the
HGP (e.g., an incomplete map or a partial sequence) would likely be very
beneficial. Others suggested, however, that such a project would not ad-
vance medical science, because knowing the sequence of a gene does not
necessarily foster progress in developing new treatments. For example,
the single base-change mutation responsible for sickle cell anemia has
been known for more than 20 years, but no therapies based on this knowl-
edge have yet been developed. Many biologists also viewed DOE's efforts
as a means of expanding its influence and involvement in biological re-
search, as there were questions at the time about the future of the Na-
tional Laboratories, given the volatility of national defense and energy
policy since the 1970s (Cook-Deegan, 1994). They argued that a federally
funded large-scale HGP, if undertaken at all, should be carried out
through NIH.
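
Gilbert's estimate above is a product of two numbers: genome length and cost per base. The following is a minimal sketch of that arithmetic, assuming a genome of roughly 3 billion bases (the figure cited later in this chapter); the function name is illustrative and does not come from any source cited here.

```python
def sequencing_cost(genome_bases: int, dollars_per_base: float) -> float:
    """Return the estimated cost of sequencing every base once."""
    return genome_bases * dollars_per_base


# Assumed inputs: about 3 billion bases at about $1 per base.
estimate = sequencing_cost(3_000_000_000, 1.0)
print(f"${estimate / 1e9:.1f} billion")
```

The same function shows how sensitive the total is to the per-base cost: halving the cost per base halves the estimate, which is one way to read the later emphasis on improving sequencing technology before sequencing at scale.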
One incentive for undertaking a federally funded HGP was to main-
tain a U.S. lead in biotechnology. In the late 1980s, genome efforts were
gaining momentum in several other countries as well (reviewed by
Davies, 2001; Cook-Deegan, 1994; Kevles and Hood, 1992; Sulston and
Ferry, 2002). In 1988, the European Community proposed the launch of a
European Human Genome Project. A modified proposal was adopted in
1989, authorizing a 3-year commitment of 15 million euros, 7 percent of
which would be devoted to ethical issues. Meanwhile, human genome
programs at the national level were also prospering in Europe. For ex-
ample, in 1989 the British government committed itself to a formal human
genome program, funded at 11 million pounds per year for the first 3
years. In France, the Centre d'Etude du Polymorphisme Humain (CEPH),
a key player in developing the genetic linkage map of the human genome,
was founded by Nobelist Jean Dausset with funds from a scientific award
and gifts from a private French donor. Through additional support from
the Howard Hughes Medical Institute (HHMI), CEPH made clones of its
DNA available to dozens of researchers in Europe, North America, and
Africa. Japan, which had thus far been involved only marginally in bio-
technology research, was also pushing hard to develop new automated
sequencing technologies, with the objective of a major sequencing initia-
tive. In 1988, the international Human Genome Organization (HUGO)
was formed, primarily with funding from HHMI and the Imperial Cancer
Research Fund in Great Britain (Kevles and Hood, 1992). Its goal was to
help coordinate human genome research internationally; to foster ex-
changes of data, materials, and technologies; and to encourage genomic
studies of organisms other than human beings, such as mice.
Because of the controversies surrounding the proposed U.S. HGP,
the National Research Council (NRC) was commissioned to undertake a
study to determine a strategy for the project. The NRC study, chaired by
Bruce Alberts, generated a report (NRC, 1988) advocating an interna-
tional program led by the United States and containing the following
recommendations:
• Postponing large-scale sequencing until the necessary technology
  could be improved, thereby reducing the cost per base (estimated to be
  about a 5-year delay)
• Making technology development for sequencing a high priority
• Focusing first on mapping the human genome
• Characterizing the genomes of model organisms (e.g., mouse, fruit
  fly, yeast, bacteria)
• Providing $200 million in funding per year for up to 15 years
The report did not make a recommendation as to whether the NIH or
DOE should oversee the project. In 1988, however, NIH and DOE reached
an agreement on their working relationship for the next 5 years: NIH
would primarily map the chromosomes, while DOE would develop tech-
nologies and informatics, with collaboration occurring between the two
agencies in overlapping areas.
In 1988, DeLisi submitted a budget from DOE of $12 million. In the
same year, NIH Director James Wyngaarden offered James Watson, Nobel
laureate and codiscoverer of the helical structure of DNA, the position of
associate director of human genome research. Watson built political sup-
port for the project, and made a commitment to devote about 5 percent
of its budget to the study of the project's ethical, legal, and social impli-
cations.2 In October 1989, the unit became the National Center for Hu-
man Genome Research, with a budget of $60 million for fiscal year 1990
(Davies, 2001).
The HGP actually entailed three related endeavors: genetic mapping,
physical mapping, and sequencing. Genetic mapping is accomplished by
determining the order and approximate location of genetic markers, such
as genes and polymorphisms, on each chromosome. Physical mapping
involves breaking each chromosome into small, ordered, overlapping
fragments and placing these fragments into vectors that can easily be
stored and replicated. For the sequencing phase, fragments of each chro-
mosome are processed to determine the base pair code.3
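
The idea of recovering a longer sequence from overlapping fragments can be illustrated with a toy example. The sketch below is a simple greedy overlap merge, offered only as an illustration: it is not the algorithm used by the HGP or by any group named in this chapter, the function names and the short DNA strings are invented, and real assemblers must also handle sequencing errors, repeats, and both DNA strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if under min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return frags  # nothing left to merge
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)


# Hypothetical fragments, given out of order.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))
```

With these three made-up fragments, the merge steps reconstruct a single 14-base sequence. Repetitive DNA breaks this simple strategy, because a repeat creates overlaps between fragments that are not actually adjacent.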
The U.S. HGP was inaugurated as a formal federal program in 1991,
receiving about $135 million. Seven NIH centers were involved: five fo-
cused on human gene mapping, one focused on mouse gene mapping,
and one focused on yeast chromosome sequencing. These centers were
supported on a competitive, peer-reviewed basis. In 1991, the largest cen-
ter budget was $4 million, divided among several research groups. The
genome installations at DOE's National Laboratories were focused on
developing technologies for mapping, sequencing, and informatics. Four
additional projects, funded jointly by NIH and DOE, were engaged in
large-scale sequencing efforts and innovations. In addition, dozens of
smaller, investigator-initiated gene mapping and sequencing projects
aimed at single disease-associated genes were funded by NIH in laborato-
ries across the country. For example, in 1991 NIH funded about 175 differ-
ent genome projects, with an average grant size of $312,000 a year (about
1.5 times the average grant size for basic research, and about equal to the
average AIDS research grant). Thus, the HGP initially was characterized
more by loose coordination, local freedom, and programmatic and insti-
2 This commitment of NIH funds to ethical debate was unprecedented, as was making
bioethics an integral part of an NIH biological research program.
3 The original plan called for carefully orchestrated sequencing of the fragments derived
from physical mapping; more recently, however, a "shotgun" method has been used to
sequence random fragments from a chromosome, followed by application of computer
algorithms to determine the order of the sequence fragments.
tutional pluralism than by strong central management or external hierar-
chy (Kevles and Hood, 1992).
Criticism of the program continued, however, especially with regard
to funding priorities at NIH. During the late 1980s, the proportion of
grants funded by NIH fell from 40 percent to less than 25 percent (Davis,
1990). For example, the National Institute for General Medical Sciences
(NIGMS) awarded more than 900 new and competing renewal grants for
projects unrelated to the genome in 1988; in 1990, it awarded only 550, a
43 percent decrease. Across NIH, the total number of grants had fallen
from 6,000 to 4,600 a year (fewer than the number funded in 1981). This
drop caused great consternation among biomedical scientists, and many
assumed that it was due directly to the transfer of funds to the HGP,
though close examination of concurrent changes in NIH funding patterns
suggests that this was not the case. In the mid-1980s, the average grant
period was extended from 3.3 to 4.3 years to provide greater stability for
funded projects and reduce the frequency of grant applications; the aver-
age amount of funding per grant also increased significantly. But this in
turn reduced the funds available for new awards or renewals. During the
same period, the production of Ph.D. scientists in the field of biomedicine
greatly increased, so more people were competing for grant money. Sup-
porters of the HGP argued that the project was bringing appropriations to
biomedical research that simply would not otherwise have been received.
In any case, NIH expenditures on the project in 1991 accounted for only 1
percent of the agency's total budget of $8 billion (Kevles and Hood, 1992).
In addition, the project's deliberate emphasis on technological and
methodological innovation was contrary to the tradition and preference
of many in the biomedical research community. However, much progress
in biomedical science has been fostered and accelerated by sophisticated
tools and technologies, often those developed through work in other
fields, such as the physical sciences (Varmus, 1999). Furthermore, unlike
technologies in the field of high-energy physics, those in biology tend to
become smaller, cheaper, and more widely obtainable and dispersed as
they improve. Thus technology development in biology is more likely to
benefit a large number of scientists in the long run, rather than making
the field more exclusive.
The HGP faced a new challenge in 1992 when James Watson resigned.
Earlier that year, a controversy had arisen regarding patent applications on
gene fragments. J. Craig Venter, who was working at NIH at the time, had
used a high-throughput technique for sequencing fragments of genes from
cDNA libraries (known as expressed sequence tags, or ESTs). NIH applied
for patents on hundreds of ESTs on Venter's behalf. The patents were even-
tually rejected by the Patent Office on the grounds that they did not meet
the criteria of nonobviousness, novelty, and utility. Initial rejection of an
application is not unusual, and NIH had the option to appeal the decision,
but in 1994 a decision was made to abandon the effort. These patent appli-
cations were widely criticized by the scientific community at large, and the
issues surrounding DNA patents continue to be controversial.
Francis Collins was appointed in 1993 to be Watson's successor. Col-
lins had been among the first to identify a human disease gene (for cystic
fibrosis) through positional cloning, a technique that relies on genetic and
physical mapping. By the time of his new appointment, he had also been
involved in the discovery of several additional disease genes4 using simi-
lar methods.
The HGP soon faced new criticism. By 1997, the midpoint of the 15-
year project, only 3 percent of the human genome had been sequenced in
finished form, and there were many technical difficulties with the physi-
cal maps of the chromosomes (Rowen et al., 1997; Anderson, 1993). Al-
though the first 6 years of the project had deliberately focused on smaller
genomes and on the development of techniques that would allow for a
more efficient and cost-effective approach to large-scale sequencing of the
human genome, sequencing technologies had not yet been sufficiently
improved to either dramatically speed the sequencing process or reduce
the cost (Pennisi, 1998). As a result, there was concern about whether the
project could be completed within the projected timeframe or budget.
In 1998, the technology of DNA sequencing took a major step forward
when the Applied Biosystems Incorporated (ABI) PRISM 3700 entered
the laboratory market (Davies, 2001; Wade, 2001). While not the first auto-
mated sequencer, the ABI PRISM was still an evolutionary advance over
existing commercial automation because it provided increased capacity
and throughput. It incorporated two major modifications to the original
Sanger sequencing method: it used fluorescent dyes instead of radioactiv-
ity to label the DNA fragments, so that a laser detector and computer
could identify and record each letter in the sequence as the DNA frag-
ments were eluted; and it separated DNA fragments in ultrathin capillary
tubes filled with a polymer solution, rather than the traditional polyacry-
lamide slab gels. These improvements were the inspiration of Michael
Hunkapiller, and the machines were produced by ABI, originally an inde-
pendent company that had been purchased by the scientific instrument
maker Perkin-Elmer (PE) and now a subsidiary of Applera. As a result of
these technological advances, DNA samples could be separated much
more quickly, and several samples could be processed each day using
very small volumes of reagents. The new machines required only about
15 minutes of human intervention every 24 hours, compared with 8 hours
4 The genes for neurofibromatosis 1 and Huntington's disease.
for the traditional machine. These changes cut sequencing time by 60
percent, reduced labor costs by 90 percent, and produced sequence about
eight times faster (about 1 million bases a day) than traditional sequenc-
ing methods (Davies, 2001).
The new sequencing machines were used early on by Craig Venter,
who had left NIH in 1992 to found The Institute for Genomic Research
(TIGR), a nonprofit organization devoted initially to identifying expressed
human genes using EST methods. The organization had since branched
out into other areas of genomic research, such as sequencing the genomes
of bacteria. It was also a major player in the federally funded HGP. TIGR
was the first center to use and verify the effectiveness of the "shotgun"
method for sequencing the relatively small, simple genomes of microbes.
The advent of the new sequencing machines led Hunkapiller to consider
the possibility of rapidly sequencing the entire human genome using a
similar approach, and he brought the idea to Venter. In 1998, Venter left
TIGR to found Celera, initially an independent subsidiary of PE Corpora-
tion and now a subsidiary of Applera (the same company that produced
the ABI PRISM 3700 sequencing machines), with the goal of doing just
that.
The feasibility of such a project was widely questioned in the scien-
tific community. The PRISM sequencers were still largely untested, and
the shotgun method had never been used on anything other than bacterial
genomes. Many predicted that the final product would likely have many
more gaps and errors than would result from the methodical approach of
the public project because of the size, repetitiveness, and complexity of
mammalian genomes as compared with microbial genomes. Venter and
colleagues (1998) argued that these challenges could be overcome, and
Celera launched a test project to sequence the genome of the fruit fly
Drosophila, a complex eukaryote whose genome was about one-twentieth
the size of the human genome. It took Celera 4 months to prepare a rough
sequence draft of the Drosophila genome, suggesting that the human ge-
nome could be deciphered in this way as well (Loafer, 2000; Pennisi,
2000a).
To accomplish the goal of producing a complete rough draft of the
human genome sequence by 2001 (4 years ahead of the public project's
timetable), Celera purchased about 300 PRISM 3700 sequencers and a
supercomputer for sequence analysis. The company also recruited a large
number of people who specialized in developing algorithms and soft-
ware for sifting through and organizing the huge amounts of data to be
generated. Most notable was Gene Myers, who had already been working
on shotgun assembly algorithms at the University of Arizona. Venter
estimated that the total cost to sequence the human genome would be
about $200-500 million. By this time, $1.9 billion had already been in-
vested in the publicly funded HGP, but questions were raised as to
whether Celera's efforts would now make continuation of the public
project redundant and unnecessary. On the other hand, supporters of the
public project believed the new challenge from Celera was ample reason
to accelerate the public effort. Some of the concern stemmed from the
potential commercial exploitation of genomic data, although the com-
pany had announced that it would seek patents on only 100-300 genes.
The Celera business plan entailed selling access to sequence analysis,
such as information on gene identification, DNA variants, medical rel-
evance, and comparisons with other species. Celera still planned to re-
lease raw sequence data free of charge, but only every 3 months, as op-
posed to every 24 hours as in the public project (Davies, 2001).
Shortly after the launch of Celera, the Wellcome Trust doubled sup-
port for the Sanger Center, Great Britain's main sequencing center in the
public effort. Francis Collins also suggested producing a public rough
draft of the sequence first, by 2001, to coincide with Celera's target date.
The public consortium would then release a finished, "gold-standard"
version by the original deadline in 2004, a goal that Celera had never
established. To meet this new deadline, Collins redirected the bulk of
available NIH funds to just three centers, announcing that these three
centers would receive $80 million over 5 years. At about the same time,
the Wellcome Trust announced that it would provide another $7 million
to the Sanger Center. Thus the lion's share of the draft sequence would be
produced by five major genome centers: Sanger, three centers funded by
NIH (Whitehead Institute, Washington University, and Baylor College
of Medicine), and DOE's Joint Genome Institute. To meet the new goal,
hundreds of PRISM sequencers (or similar machines) were purchased by
the publicly funded centers (Davies, 2001).
The competition and animosity between the public and private ef-
forts to sequence the genome escalated (reviewed by Davies, 2001; Wade,
2001), but as the self-imposed deadline to finish the draft sequence ap-
proached, a compromise was brokered between the leaders of the two
projects. On June 26, 2000, Craig Venter and Francis Collins came together
for a White House press conference to formally announce completion of
the draft sequence. The first publications on the draft sequences were
published about 7 months later in the journals Nature and Science (Lander
et al., 2001; Venter et al., 2001). Science has been criticized for its decision
to publish Celera's analysis because the company was allowed to post its
data in its own database with some restrictions on its use, rather than
depositing the sequence into a public database such as Genbank, as is
usually required for publication. Leaders of the public project have also
noted that Celera's analysis was dependent upon access to the public
databases, suggesting that the company's shotgun method alone could
[...]
Recently, the SNP Consortium collaborated with the International
Human Genome Sequencing Consortium40 to publish a paper in the jour-
nal Nature describing a map of 1.42 million validated SNPs distributed
throughout the human genome (Sachidanandam et al., 2001). Using DNA
from a diversified, representative panel of anonymous volunteers, the
collaborators identified, on average, one SNP for every 1.9 kilobases of
DNA. Such collaboration further demonstrates that public-private coop-
eration can be an efficient means of developing basic research tools.
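
The two figures in this paragraph can be checked against each other. A minimal sketch of the arithmetic follows; the variable names are illustrative, and the comparison with genome size relies on the approximate figure of 3 billion base pairs given elsewhere in this chapter.

```python
snps = 1_420_000        # validated SNPs reported in the map
bases_per_snp = 1_900   # one SNP per 1.9 kilobases, on average

# Total length of DNA implied by that SNP count and average spacing.
implied_span = snps * bases_per_snp
print(f"{implied_span / 1e9:.2f} billion bases")
```

The implied span of about 2.7 billion bases is close to the roughly 3 billion base pairs of the human genome, so the reported density is consistent with SNPs distributed across most of the genome.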
In the case of SNP analysis, however, international cooperation was
perhaps not as strong as it had been for the Human Genome Project. The
SNP Consortium invited Japanese companies to participate in the project,
but they declined the offer. Instead, 40 Japanese drug firms decided to
provide a total of $10 million to university researchers in Japan to study
SNPs in that country's population. They will establish their own database
of SNPs, but these data will also be made freely available to other scien-
tists (Sciencescope, 2000).
A new public-private consortium was recently established to further
build on the work of both the SNP Consortium and the HGP. The $100
million HapMap project, with funds from six countries41 and several phar-
maceutical companies, aims to map about 300,000 haplotypes from four
populations in Africa, Asia, and the United States within 3 years (Couzin,
2002b; Adam, 2001b). Haplotypes are sets of genetic markers that are
close enough on a particular chromosome to be inherited together. Using
SNPs alone to identify disease-associated genes can be difficult and ex-
pensive, partly because it is difficult to trace individual SNPs in a genome
containing 3 billion base pairs. Haplotype analysis will reduce back-
ground noise and should make the search for genes easier and faster
because the many individual markers are consolidated into more man-
ageable clusters.
Scientists realized only recently that a haplotype map might be feasible
when they discovered that relatively large blocks of DNA are inherited
in this way. Computer simulations predicted that DNA haplotypes
would only be about 10,000 or fewer bases. To their surprise, genome
researchers have found that haplotype blocks tend to be much larger (up
to 100,000 base pairs), and that many such blocks come in just a few
different versions. For example, within some sequence stretches of 50,000
bases, only four or five patterns of SNPs, or haplotypes, might account for
80-90 percent of the population. It is not clear why this occurs, but some
chromosome regions may be less likely than others to recombine during
meiosis, leading to conservation of the DNA blocks (Helmuth, 2001).
40 This collaborative effort was funded by the National Human Genome Research
Institute and the SNP Consortium. Three academic genome research centers (the Whitehead
Institute for Biomedical Research in Cambridge, Massachusetts; Washington University
School of Medicine in St. Louis; and the Sanger Centre in Hinxton, United Kingdom)
participated directly in this collaboration. The International Human Genome Sequencing
Consortium includes scientists at 16 institutions in France, Germany, Japan, China, Great
Britain, and the United States, with funding from government agencies and public charities
in several countries.
41 Funders include NIH in the United States ($40 million) and the Wellcome Trust in the
United Kingdom ($25 million).
Haplotypes are found by analyzing genotype data, so the new col-
laboration will essentially be a high-throughput genotyping effort. The
work will be done by several biotechnology companies and public laboratories,
including the Sanger Centre and the Whitehead Institute, but decisions
are still pending on such issues as how data collection will be
standardized, how the map will be structured, and how the work will be
divided. It is hoped that the new map will provide an invaluable tool to
simplify the search for associations between DNA variations and com-
plex diseases such as cancer, diabetes, and mental illness. However, many
scientists, especially population geneticists, have questioned the value of
generating a haplotype map at this time, arguing that there is too little
information on the usefulness of such a map or how best to proceed
(Couzin, 2002a).
There is also great interest in developing more efficient, cost-effective
technologies for high-throughput analysis of SNPs (Chicurel, 2001). Without
such improvements, screening large populations to search for
disease- or therapy-associated genes could still be impractical. A number of
investigators are attempting to improve on the current technology, but to
date no coordinated effort has been made.
HUMAN PROTEOME ORGANIZATION
The Human Proteome Organization (HUPO) is an international alli-
ance of industry, academic, and government scientists aimed at determin-
ing the structure and function of all proteins made by the human body
(Kaiser, 2002; Abbott, 2001). The mission42 of HUPO is threefold: to consolidate
national and regional proteome organizations; to engage in scientific
and educational activities that encourage the spread of proteomics
technologies, as well as the free dissemination of knowledge pertaining to
the human proteome and that of model organisms; and to assist in the
coordination of public proteome initiatives. The organization's formation
was spurred by concerns that in the absence of such a coordinated effort,
individual companies would generate their own basic proteomics data
and protect them through trade secrecy. The organizers hope to include
more countries than participated in the HGP, and plan to generate funding
contributions from companies, with matching government funds.
42 http://www.hupo.org/.
HUPO participants have proposed five initial research and technol-
ogy development projects to garner interest from potential funders (see
Box 3-2). Several companies have already offered financial support, and a
number of countries are launching initiatives related to HUPO's goals.
The NIGMS Alliance for Cellular Signaling is one such initiative, but a
broader role for NIH in a global proteomics project remains unclear. Some
U.S. proteomics experts have proposed establishing a few pilot large-
scale centers to identify proteins en masse with uniform standards from
healthy and diseased tissues and blood serum (Kaiser, 2002). But many
others question the sensitivity and specificity of current mass spectrom-
eters, suggesting that such an undertaking would be premature, and that
it would be more useful to fund individual investigators to study small
parts of large, complex protein networks (Check, 2002).
HOWARD HUGHES MEDICAL INSTITUTE
The Howard Hughes Medical Institute (HHMI) provides an example
of an alternative strategy that could be used to undertake large-scale re-
search projects. HHMI is a nonprofit medical research organization that
employs more than 300 biomedical scientists across the United States at
more than 70 universities, medical centers, and other research organiza-
tions. It also maintains a grants program aimed at enhancing science edu-
cation at all levels. One of the world's largest philanthropies, HHMI had
an endowment in mid-2000 of approximately $13 billion, and $600 million
was disbursed for medical research ($466 million), science education, and
related activities.
Created by Hughes in 1953, the Institute has always been committed
to basic research, with the charge of probing "the genesis of life itself."43
The organization's charter states that "the primary purpose and objective
of the Howard Hughes Medical Institute shall be the promotion of human
knowledge within the field of the basic sciences (principally the field of
medical research and medical education) and the effective application
thereof for the benefit of mankind." The Institute draws a clear distinction
between itself and other foundations that provide money for biomedical
research in that it operates as an organization with investigators across
the country. Hughes investigators are employed by the Institute but con-
duct their research in the laboratories of their host institutions. The
Institute's work has traditionally focused on five main areas of research:
cell biology, genetics, immunology, neuroscience, and structural biology.
More recently, clinical science programs have been added, as well as a
new focus on bioinformatics. Investigators are free to pursue their own
research interests without the burden of writing detailed proposals for
each project, but their research progress is reviewed by HHMI every 5
years. Scientists who are not renewed as HHMI investigators are pro-
vided with additional phase-out funds for 2-3 years so they will have an
opportunity to seek other funds or gradually scale back their activities.
This approach also eases the strain on affected staff and trainees in the lab
who need time to seek other positions.
In what was perhaps the Institute's first foray into large-scale science
(as defined in this report), HHMI held an Informational Forum on the
Human Genome at NIH in 1986. Subsequently, HHMI played a role in the
HGP by supporting several databases, including one at Yale University;
one at the Centre d'Etude du Polymorphisme Humain in Paris; and one
at the Jackson Laboratory in Bar Harbor, Maine (Cook-Deegan, 1994).
Recently, HHMI announced a novel research endeavor for the organi-
zation. This new 10-year, $500 million project44 may be viewed as another
form of large-scale science funded by a nonprofit organization. HHMI
plans to build a permanent biomedical research center that will develop
advanced technology for biomedical scientists and provide a collabora-
tive setting for the development of new research tools. Slated to open in
2005, the new center will have an annual operating budget of about $50
million (Kaiser, 2001). Research topics have not yet been fully defined, but
are likely to focus on such areas as bioinformatics, proteomics, and imaging
tools (e.g., electron microscopy). Investigators are likely to include
computational scientists, chemists, physicists, engineers, and biomedical
scientists with cross-disciplinary expertise.
43 http://www.hhmi.org/.
44 See .
The center will provide laboratories for up to 24 investigators (who
will not have tenure), plus their research staffs, for a total of 200-300
people. In addition, laboratories and other facilities will be built for visit-
ing researchers and core scientific support resources. Visiting scientists
will be able to stay for as little as a few weeks or may take a sabbatical
year. Organizers hope this format will allow for rapid shifts into new
areas that show unusual scientific promise and for quick adaptation of
new discoveries for use in biological research and health-related sciences.
For collaborative research at the new center, HHMI will request pro-
posals from the scientific community at large, as well as from its own
investigators. The Institute will seek out proposals focused on cutting-
edge scientific and technological goals, and will give preference to projects
that bring together diverse individuals and expertise from different envi-
ronments. To be successful, proposals will have to demonstrate original-
ity, creativity, and a high degree of scientific risk taking. One goal of these
collaborations is to ensure that all HHMI investigators, regardless of their
home institution's facilities, can obtain access to expensive, high-technology
tools and the expertise needed to run them (Kaiser, 2001).
HHMI leaders have acknowledged that the kind of research they are
proposing for the center is more typically undertaken by biotechnology
companies. The Institute will encourage patenting of discoveries made at
the center, which may foster the launch of new startup companies. How-
ever, the generation of royalty revenues or new private businesses is not a
stated goal of the Institute (Kaiser, 2001). Because this project is still in the
very early stages of planning, predicting its effectiveness or impact on
the broader scientific community is impossible. Nonetheless, it provides
a novel and unique model for consideration.
SYNCHROTRON RESOURCES AT THE
NATIONAL LABORATORIES
Two institutes from NIH, the NIGMS and the NCI, are providing $23
million over three years to support the design and construction of a user
facility at Argonne National Laboratory's Advanced Photon Source (APS),
the newest and most advanced synchrotron in the country. After two
years of planning, NIGMS and NCI, which represent two-thirds of the
life-science synchrotron user community, finalized an agreement early in
2002 to increase synchrotron resources by constructing three new beam
lines at Argonne's APS that will be fully operational by 2005. The facility
is operated by the University of Chicago, but beam time will be administered
by NIH. Half of the beam time will be allocated to peer-reviewed
research. NIGMS and NCI grantees will have access to the beam through
a peer-review process for research grants. Twenty-five percent of the beam
time will be divided between NIGMS and NCI for special projects, and
the remaining beam time will be reserved for staff use and maintenance.
The NIGMS/NCI facility will be fine-tuned to focus on the aspects of X-
rays most useful for biological studies. Demand for beam time is increas-
ing because of such projects as the NIGMS PPSI. NCI is particularly inter-
ested in how the synchrotron facilities will advance the study of
cancer-related molecules, because an understanding of detailed protein
structure will help cancer researchers develop targeted drug therapies.
NIGMS and NCI anticipate that information about molecular structures
will allow scientists to help develop new medicines and diagnostic tech-
niques. Once construction is complete, operation costs for the beam line
are estimated to be $4 million a year, of which NCI has committed $1
million annually (Cancer Letter, 2001; Softcheck, 2002).
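The allocation described above can be summarized with a short calculation. This is a minimal sketch: the category labels and function names are illustrative rather than taken from the report, and the 25 percent share for staff use and maintenance is inferred as the remainder after the two stated shares.

```python
# Beam-time and cost shares implied by the figures above.
# Labels and function names are illustrative, not from the report.
from fractions import Fraction

def beam_time_shares():
    """Return each use category's share of total beam time."""
    peer_reviewed = Fraction(1, 2)      # "half of the beam time"
    special_projects = Fraction(1, 4)   # 25 percent, divided between NIGMS and NCI
    # Whatever is left is reserved for staff use and maintenance (inferred).
    staff_and_maintenance = 1 - peer_reviewed - special_projects
    return {
        "peer_reviewed": peer_reviewed,
        "special_projects": special_projects,
        "staff_and_maintenance": staff_and_maintenance,
    }

def nci_cost_share(total_musd=4, nci_musd=1):
    """Return NCI's fraction of the estimated annual operating cost."""
    return Fraction(nci_musd, total_musd)

shares = beam_time_shares()
for label, share in shares.items():
    print(f"{label}: {float(share):.0%}")
print(f"NCI share of operating cost: {float(nci_cost_share()):.0%}")
```

On these figures, one quarter of the beam time remains for staff use and maintenance, and NCI's $1 million commitment covers one quarter of the estimated $4 million annual operating cost.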
DEFENSE ADVANCED RESEARCH PROJECTS AGENCY
The Defense Advanced Research Projects Agency (DARPA) provides
another alternative strategy for undertaking large-scale research projects.
DARPA is the central research and development organization for the
Department of Defense. It manages and directs selected basic and applied
research and development projects for the department, with a focus on
projects in which the risk and potential payoff are both very high, and in
which success could provide dramatic advances for traditional military
roles and missions.
The agency was created in 1958 by President Eisenhower following
the Soviet Union's surprise launch of Sputnik (Malakoff, 1999). An investigation
blamed delays in the U.S. military satellite program on bureaucratic
infighting and an unwillingness to take risks. Intent on keeping the
United States at the forefront of technological innovations, Eisenhower
ordered Pentagon planners to create an agency that would be completely
different from the conventional military research and development struc-
ture and, in fact, would serve as a deliberate counterpoint to traditional
thinking and approaches. The new agency relied on a small group of
experts to look beyond near-term military needs and to fund areas offering
great potential to revolutionize military capabilities. Today, the emphasis
is still on seeking out and pursuing novel ideas. A list of the
agency's founding principles, which are still followed, is provided in Box
3-3.
Best known for its role in developing the Internet (Norberg and
O'Neill, 1996), DARPA has funded work focused primarily on computer
and software development, engineering, materials science, microelectron-
ics, and robotics. The agency has had only a limited and very recent
interest in basic molecular biology, and most of its biology research relates
to just one function: protecting personnel against biological weapons.
However, some of this work could potentially have broader implications
for biological research, such as novel approaches for DNA sequencing
(Alper, 1999) or sophisticated biosensors. Funding for research on this
topic began in 1997, with contracts totaling about $50 million going to
biotechnology ventures and nonprofit organizations. Although a panel of
expert advisors provided some input in launching this program, it is run
in essentially the same way as all other DARPA programs, with hands-on
oversight by carefully selected program managers (Marshall, 1997).
With an annual budget of $2 billion, DARPA's small group of about
125 program managers have extensive power to direct high-risk projects
that would not normally fare well in peer review. A DARPA program
manager will typically spend as much as $40 million on contracts to in-
dustry, academic, and government laboratories for one or more projects.
The contracts call for defined deliverables and allow less-promising work
to be canceled easily. The agency aims to complete 20 percent of ongoing
projects each year, and renewals are not made, although projects are occa-
sionally reformulated for a subsequent attempt. The funded researchers
often attend team meetings, file frequent reports, and work cooperatively
with other contractors.
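To put the figures in this paragraph in proportion, the sketch below derives two back-of-the-envelope numbers. Both rest on assumptions that are not in the report: that the budget is spread evenly across program managers, and that the project portfolio is in steady state, so that mean project duration is the reciprocal of the annual completion rate. The function names are illustrative.

```python
# Back-of-the-envelope figures derived from the numbers quoted above.
# Assumptions (not from the report): the budget is spread evenly across
# managers, and the project portfolio is in steady state.

ANNUAL_BUDGET_USD = 2_000_000_000   # "annual budget of $2 billion"
PROGRAM_MANAGERS = 125              # "about 125 program managers"
COMPLETION_RATE = 0.20              # 20 percent of ongoing projects completed each year

def average_budget_per_manager(budget_usd, managers):
    """Return the mean annual budget per program manager."""
    return budget_usd / managers

def implied_mean_project_years(completion_rate):
    """In steady state, mean project duration is 1 / annual completion rate."""
    return 1 / completion_rate

per_manager = average_budget_per_manager(ANNUAL_BUDGET_USD, PROGRAM_MANAGERS)
years = implied_mean_project_years(COMPLETION_RATE)
print(f"Average per manager: ${per_manager / 1e6:.0f} million per year")
print(f"Implied mean project duration: {years:.0f} years")
```

Under these assumptions the average works out to about $16 million per manager per year, with projects lasting about 5 years on average. The $40 million figure quoted in the text would then describe the upper end of what a manager directs rather than the mean.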
Program managers are selected on the basis of their technical exper-
tise and their aspiration to leave their mark on a field. They stay for an
average of 4 years and often return to their primary field of research when
their term is over. In addition to their technical expertise, they must dem-
onstrate bureaucratic skills, as they must lobby for their portion of the
DARPA budget, and be able to move established research communities in
a particular direction or create new collaborations in disparate fields.
Program managers identify opportunities in science or technology that
appear promising, and then make decisions about whom to fund in pur-
suing the ideas. They may make the latter decisions by probing the net-
work of experts in a field to identify the most appropriate researchers, or
by using written specifications to invite experts in the field to apply for
funds. Program managers have only two layers of supervision: an office
director and the DARPA director, who reports to the Secretary of De-
fense. These supervisors monitor the performance of the managers and
hold them accountable for advancing their fields, but a major criterion for
success is positive peer assessment of the manager's performance.
This arrangement is in stark contrast to the current model at NIH, in
which peer review is used to select proposals from a competitive pool of
grant applications, rather than to assess the performance of program
managers. NIH grant management staff generally have a comparatively
passive role in project selection. It can also be difficult to determine
whether the selected grant portfolios are actually meeting the goals of
NIH programs.
Ultimately, the strength of DARPA has been in pursuing innovative
research directions to create new fields, or in solving specific technical
problems by fostering the development of new technologies. The agency
is not responsible for sustaining fields in the long run, as is NIH. Thus,
adopting a DARPA model of funding for all NIH programs would be
unworkable. However, the addition of some DARPA-like programs to the
traditional NIH portfolio might add valuable research that would not
otherwise be undertaken.
Indeed, some leaders at NIH, including former director Harold Varmus,
have recently expressed interest in adopting some DARPA-like programs
at NIH to spark innovation (Malakoff, 1999). Under the leadership
of NCI director Richard Klausner, NCI has even launched a pilot program
modeled in part after DARPA, as well as other agencies, such as NASA.
The Unconventional Innovations Program (discussed earlier) emulates
the DARPA approach by assembling interdisciplinary research teams and
pressing them to share information, with the goal of producing break-
throughs in cancer detection technologies. NCI's traditional peer review
panels still play a major role in selecting projects, but agency managers
are more involved in program oversight than is usual. The program seeks
input from and collaboration with investigators who have not traditionally
been engaged in biomedical research.
Despite these new developments and the past successes of DARPA,
however, such programs do not come without difficult challenges and
criticism. One of the greatest challenges to undertaking DARPA-like pro-
grams may be the difficulty of recruiting effective managers. The DARPA
model works best when the manager is an intellectual peer of the scien-
tists being funded. But for biomedical scientists, a 4-year absence from the
laboratory and the resultant lack of published scientific papers during
that period could very well be disastrous from a long-range career per-
spective. In addition, university-based scientists in particular often feel
uncomfortable with aggressive supervision and team-dominated research,
and biomedical scientists have opposed most initiatives that involve
strong external control in the past. Furthermore, it is not uncommon for
DARPA-funded projects to fail in meeting their intended goals. This is to
be expected, given the high-risk nature of the work, but it may not be a
popular approach in other fields. And even when its projects have been
successful, DARPA has had difficulty in moving some findings into the
military venue or the marketplace (Malakoff, 1999). All of these issues
need to be weighed carefully in attempting to emulate the DARPA pro-
gram in other fields of research.
SUMMARY
As is clear from the examples described in this chapter, the character-
istics of large-scale biomedical research projects can vary greatly, even
when such research is defined relatively narrowly. However, the examples
presented here share many common themes, characteristics, and issues.
For example, most are dependent on technology in the sense that they
require the use of expensive technologies, the development of novel tech-
nologies, refinements to current technologies, or standardization of the
way technologies are used and how the information generated is inter-
preted and analyzed.
Another common feature of the examples described here is a great
need for planning, organizational structure, and oversight. The capacity
of a large-scale project to efficiently and effectively produce data and
other end products that are novel and valuable to the scientific community
can be determined by its design and the skill of the individuals who
oversee the work. Many of the large-scale projects described here are also
quite collaborative and interdisciplinary in nature. For example, the needs
for data assessment and technology development mandate the collabora-
tion of scientists who may not have been involved traditionally in biologi-
cal research, such as engineers, physicists, and computer scientists. This
new approach to biology creates additional challenges in communication
across disciplines, and can also lead to difficult questions regarding train-
ing and career advancement. If interdisciplinary scientists do not fit well
into the traditional models of academic science departments, it may be
difficult to assess their contribution and compensate them fairly with
promotions and tenure. These issues are also relevant to managers of
large-scale projects, who are crucial to the success of the effort, but often
do not find themselves on traditional academic career paths, and may be
given relatively little credit for the accomplishments of the project. These
topics are covered in more detail in Chapters 4 through 6.
One issue common to all large-scale biomedical research projects that
generate research tools or databases of information is that of accessibility.
Concerns are often raised regarding intellectual property rights, open
communication among researchers, and public dissemination of data and
information. Such concerns may be especially pertinent when for-profit
entities are involved in the undertaking. Most projects to date have
adopted a policy of making data publicly available, at least in raw form.
Research tools and reagents generated through large-scale projects funded
by NIH are also often made available to other scientists at cost, but doing
so requires a considerable commitment of NIH resources and infrastruc-
ture support. Clearly such matters need to be thoroughly addressed be-
fore a large-scale project is launched. Chapter 7 examines these issues in
greater detail.
The issue of peer review also appears to be extremely important for
large-scale projects in biology. Many of the early attempts by NCI to
undertake large-scale, directed projects resulted in harsh criticism be-
cause of a lack of peer review, which has been fairly standard for NIH
funding. Traditionally, NIH decisions about which projects and investi-
gators to fund have been made following peer review of project proposals
in grant applications. But peer review could also take other forms, such as
reviewing the progress and achievement of grant recipients to determine
whether funding should continue or whether the project's goals or objec-
tives should be altered. Peer review might also focus on the performance
of program managers who make decisions about which projects and
people to fund, as is done under DARPA. Recently, NIH has developed
some new large-scale programs that incorporate novel approaches to peer
review, whereby steering and advisory committees whose members include
scientists not directly involved with the project assess progress and
provide advice on future directions. It is still too early to determine how
effective these mechanisms are, but thus far they appear to be acceptable
to the scientific community. These topics are addressed in more detail in
Chapter 4.