The Polar Genome Science Initiative
Evolutionary processes have created many biological communities
that are as stunning in their beauty and complexity as they are unexpected
by and novel to biological scientists (Diamond, 2001). Some of the most
fascinating and diverse of these natural experiments have occurred among
the organisms and biological communities of the polar regions. Effective
strategies for exploring polar ecosystems using approaches based on
genome science and other technologies can rapidly advance our under-
standing of these ecosystems.
SELECTION OF ORGANISMS AND CONSORTIA
FOR GENOME ANALYSIS
The success of the publicly and privately financed human genome
initiatives (Lander et al., 2001; Venter et al., 2001) is directly attributable to
the development of high-throughput, low-cost DNA sequencing tech-
nologies and appropriate bioinformatics tools for assembling and anno-
tating the approximately 3 billion base pairs (bp) of the human genome.
Clearly, the sequencing of genomes should no longer be constrained to
"model" organisms or limited by resource considerations, and the Polar
Genome Science Initiative need not focus on technology development.
Nevertheless, the selection of organisms or consortia must be guided by
appropriate criteria. The committee proposes that selection of an organ-
ism or consortium be based on evidence that:
· analysis of its genome will address broad and significant scientific
questions;
· it is a good model for evolution in an isolated polar environment;
· it provides opportunities for comparisons with organisms of com-
parable ecotype from polar habitats and along polar-to-temperate latitu-
dinal clines; or
· its cellular processes possess characteristics of biotechnological or
clinical interest.
The committee provides examples of polar species and consortia that fit
these criteria, although other organisms may certainly fit them as well and
warrant sequencing in the near term. The knowledge gained from these organisms
will provide an invaluable framework for identifying other organisms for
future sequencing projects. Whether some or all of the organisms listed
below or other polar organisms are selected for genome analysis will
depend on the availability of funding and on changes in research
priorities.
Prokaryotes
Efforts in prokaryotic microbial genomics over the past decade have
provided a wealth of information on the nature of microbial diversity and
the forces that shape prokaryotic genomes. To date, more than 80 prokary-
otic genomes have been sequenced completely (
FRONTIERS IN POLAR BIOLOGY IN THE GENOMIC ERA
related genera, it should become apparent whether the psychrophilic
phenotype is associated with the presence of specific genes that are not
found in nonpsychrophiles, whether psychrophiles have conserved themes
in gene complements that distinguish them from their nonpsychrophilic
counterparts (for example, encoding enzymes for the synthesis of specific
lipids or fatty acid derivatives well suited for life at low temperature, the
synthesis of osmolytes that have cryoprotective properties), or whether
there are genome features (for example, number of duplicated genes,
abundance of mobile genetic elements) that might play a role in adapta-
tion to the cold. In addition, the sequence information would serve as the
basis for conducting functional genomic studies (transcriptome, proteome,
and metabolome analysis) to determine, for instance, whether growth at
low temperature involves differences in the abilities of psychrophiles and
nonpsychrophiles to express their genome complements at low tempera-
ture (for example, the synthesis of transcripts and polypeptides) or
whether it relates to the activities of certain proteins at low temperature.
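The gene-content comparison described above can be sketched in a few lines. This is a minimal illustration only: the genome labels and gene-family names are hypothetical placeholders, not annotations from any sequenced organism, and a real analysis would start from curated ortholog clusters.

```python
# Hypothetical gene-family inventories (placeholders, not real annotations).
psychrophiles = {
    "psychrophile_A": {"famA", "famB", "cold1", "cold2"},
    "psychrophile_B": {"famA", "famC", "cold1", "cold2"},
}
nonpsychrophiles = {
    "relative_A": {"famA", "famB", "famC"},
    "relative_B": {"famA", "famC", "cold2"},
}


def psychrophile_specific(cold, warm):
    """Families present in every cold genome and absent from every warm one."""
    shared_cold = set.intersection(*cold.values())
    any_warm = set.union(*warm.values())
    return shared_cold - any_warm


candidates = psychrophile_specific(psychrophiles, nonpsychrophiles)
print(sorted(candidates))
```

Families that survive this filter are only candidates; whether they contribute to growth at low temperature would still have to be tested by the functional genomic studies mentioned above.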
Within the criteria outlined at the beginning of this chapter, top can-
didates for sequencing would be representative psychrophilic bacteria,
particularly organisms that have closely related psychrotolerant isolates
for comparison. One project currently in progress involves sequencing
the genome of a representative Colwellia sp., which belongs to the gamma-
Proteobacteria. The Colwellia being sequenced is an obligate psychrophile
isolated from Arctic sediments. It would be prudent to include
psychrophiles from other phylogenetic groups for comparison. Other
representatives of the bacterioplankton community might include polar
representatives of the SAR 11 group, which are abundant in polar bacterioplankton
communities (Bang and Hollibaugh, 2002). These prokaryotes
may or may not be psychrophiles. A temperate representative of this
group has just been cultured and is currently being sequenced. Given
that SAR 11 small subunit (ssu) ribosomal ribonucleic acid (rRNA) gene
sequences from polar and temperate environments are slightly but con-
sistently different (Bang and Hollibaugh, 2002; Martinez and Valera, 2000),
it is likely that polar populations differ from temperate or tropical repre-
sentatives of this group. Other groups of polar prokaryotes that should
be considered for sequencing include those isolated from the Siberian
permafrost (Vishnivetskaya et al., 2000; Vorobyova et al., 1997) and the
low-temperature Crenarchaeota that have been shown to dominate
Antarctic plankton communities at times (DeLong et al., 1994; Massana et
al., 1998; Murray et al., 1998). Unfortunately, the latter group of organisms
does not yet have any representatives in culture. As further research
unravels the ecology and physiology of polar plankton communities,
other candidates for genome sequencing will become obvious.
Cyanobacteria. Cyanobacterial mats dominated by oscillatorians are a
feature of streams, lakes, and ponds in both Arctic and Antarctic regions
(Vincent and Neale, 2000; Priscu et al., in press) and constitute a major
component of autotrophic community biomass and productivity in these
polar deserts (Priscu et al., 1998; Vezina and Vincent, 1997; Vincent et al.,
1993). Surprisingly, although they are abundant in temperate and tropical
oceans, marine cyanobacteria have not been found in polar waters.
Although the polar freshwater ecosystems are predominantly cold, with
summer temperatures rarely exceeding 0°C, most cyanobacteria isolated
from these habitats are psychrotolerant and show optimal growth and
photosynthesis at 15°C or higher (Fritsen and Priscu, 1998; Tang et al.,
1997a). These data imply that polar cyanobacteria evolved from temperate
latitudes and later colonized polar regions (Seaburg et al., 1981; Vincent
and James, 1996). Recently, Nadeau and Castenholz (2000) described the
first true psychrophilic strains of oscillatorians (isolated from Bratina
Island, Antarctica) that have optimal growth at 8°C and cannot survive at
temperatures in excess of 20°C. Related Arctic psychrophilic strains were
also identified. Phylogenetic analyses of these polar isolates at the ssu
rDNA level showed that the few psychrophilic oscillatorians described
have arisen in one branch, whereas evolution of the psychrotolerant
phenotype has occurred several times (Nadeau et al., 2001). Nadeau et al.
(2001) also showed that psychrotolerant strains are most closely related to
organisms of temperate latitudes. The occurrence of a shared rare 11-nt
insertion in concert with phylogenetic relationships implies that psychro-
tolerant strains from both Arctic and Antarctic isolates originated from
temperate species, whereas psychrophilic strains appear to have arisen
independently. A complete genome sequence of a psychrophilic cyano-
bacterium will allow scientists to establish a database for examining issues
of biodiversity, biogeography, and community structure in these impor-
tant polar mat-forming organisms. Such analyses may also reveal the
mechanisms of temperature tolerance of the psychrotolerant species and
the mechanisms of low-temperature adaptation of the psychrophilic
species. Comparison of the psychrophilic Oscillatoria genome with the
genomes of marine cyanobacteria already available may reveal clues as to
the factors limiting distribution of the latter group. Understanding the
evolutionary relationships of polar mat-forming oscillatorians may have
important implications for the study of the origins of life on our planet
and others (see Chapter 2). Given the important role that cyanobacteria
played in the formation of atmospheric oxygen, knowledge of their phy-
logeny will also provide new information on the evolution of oxygenic
groups and planetary geochemistry.
Protists
Polar algae. Chlamydomonas subcaudata is a green psychrophilic alga
isolated from the permanently ice-covered Lake Bonney in the McMurdo
Dry Valleys in Antarctica (Lizotte and Priscu, 1992). Because of the ice
cover and subsequent lack of vertical mixing, the temperature, nutrient,
and irradiance regime experienced by this organism in situ is extremely
stable. C. subcaudata is the dominant species in the deep trophogenic zone
(17-20 m) of Lake Bonney, where the average temperature and maximum
irradiance during the austral summer are 4 to 6°C and 14 µmol
photons m⁻² s⁻¹ (Lizotte and Priscu, 1992). Light penetrating to this depth
is mostly in the blue-green wavelengths owing to differential attenuation
by the ice cover.
As a psychrophile (Morgan et al., 1998), C. subcaudata exhibits unique
physiological responses to low temperature when compared to temperate
algae or psychrotolerant cyanobacteria isolated from the poles. When
exposed to moderate irradiance (150-250 µmol m⁻² s⁻¹) and low temperatures
(5-10°C), most species of temperate algae and psychrotolerant
cyanobacteria show lower chlorophyll content per unit biomass, smaller
amounts of photosystem II harvesting proteins, and increased carotenoids
(Maxwell et al., 1994, 1995; Tang et al., 1997b), resulting in a visually
yellow or orange color. The adjustments in pigment content and light
harvesting capabilities allow the cells to maintain balance between the
light energy absorbed through photochemistry and the energy consumed
through metabolism (Huner et al., 1998), and they protect the cells from
photoinhibition. Unlike the other algae and cyanobacteria, C. subcaudata
displayed none of these physiological characteristics when grown under
moderate irradiance (150 µmol m⁻² s⁻¹) and low temperature (8°C) (T.
Pocock, 2002, University of Western Ontario, personal communication).
Compared to a mesophilic species C. reinhardtii, C. subcaudata had
rather low levels of photosystem I (Morgan et al., 1998), indicating adap-
tation to a predominantly blue-green light spectrum (Neale and Priscu,
1995). Furthermore, C. subcaudata possessed high levels of xanthophylls
and low levels of β-carotene, suggesting that this phytoplankton species
has efficient light harvesting but reduced photoprotective ability compared
to C. reinhardtii (Neale and Priscu, 1995, 1998). Despite its constant
exposure and its photoacclimation to low temperature and low irradi-
ance, C. subcaudata retains its capacity to adjust its pigment composition
via the xanthophyll cycle, thereby allowing the cell to photoacclimate to
high irradiance and to resist photoinhibition (Morgan et al., 1998).
Given its specific adaptation to a narrow spectral distribution and its
ability to photoacclimate to low and high irradiance, genomic comparison
of C. subcaudata with C. reinhardtii could further our knowledge of how
algal cells photoadapt and photoacclimate to spectral quality and
quantity. C. reinhardtii, a temperate alga commonly used as a model
system, is currently being sequenced at Duke University (
tion and function of mycosporine-like amino acids (MAAs) in marine
phytoplankton (Moisan and Mitchell, 2001; Riegger and Robinson, 1997).
Furthermore, the extracellular release of MAAs can be studied in P.
antarctica in the colonial form. MAAs are found in single cells, but they
are also excreted in the colonial matrix material (Merchant et al., 1991).
Sequencing P. antarctica will define the evolutionary position of the
Prymnesiophytes in general and the evolution of Phaeocystis as the only
Prymnesiophyte genus common in polar waters, clarify systematics of the
genus (Medlin et al., 1994), and enhance our understanding of the sulfur cycle
and responses to UV radiation. The Arctic congener, P. arctica, shares
most of the physiological traits of P. antarctica but is a distinct species as
defined by a number of characteristics, including ssu rRNA sequence
(Medlin et al., 1994). Comparing the genomes of those two organisms
would provide additional insights into the factors driving phytoplankton
speciation.
Metazoans
Antarctic fishes: Dissostichus mawsoni, Chaenocephalus aceratus, and D.
eleginoides. Among polar organisms, the phylogenetic history of the Ant-
arctic notothenioid fishes is, without doubt, the most complete (Chen et
al., 1998; Eastman, 2000; Eastman and McCune, 2000; Ritchie et al., 1996).
Living at constant extreme cold for ectothermic bony fishes required adap-
tive changes in their biochemical and physiological functions; thus, the
notothenioids are a "swimming library" of cold-adapted genes and
proteins. We have exciting glimpses of some of these changes: (1) the
paradoxical loss of vital cell types, genes, and proteins, including the
oxygen-binding protein hemoglobin and red blood cells in the icefish
family; and (2) the evolution of novel genes that encode proteins with
new functions, exemplified by the antifreeze glycoproteins (AFGPs) of
most notothenioids. Currently, laboratories throughout the world are
engaged in mechanistic studies of biochemical and physiological adapta-
tion to cold and of the gain and loss of genes, but these efforts are focused
largely on discrete traits or gene families.
Sequencing the genomes of three select species of the suborder
Notothenioidei (Gon and Heemstra, 1990) could enhance our understanding
of environmentally driven evolutionary processes. Two of the three
species are endemic to the Antarctic and the other is a cool-temperate
relative: (1) the Antarctic toothfish Dissostichus mawsoni, a member of the
oldest lineage (the family Nototheniidae); (2) the Antarctic blackfin icefish
Chaenocephalus aceratus, a member of the most derived family (the
icefishes, Channichthyidae); and (3) the Patagonian toothfish D. eleginoides
(a cool-temperate congener of D. mawsoni). Comparative analyses of these
genomes should provide major insight into the progression of evolution-
ary events that led to the explosive diversification of the notothenioid
lineage from its origin as a temperate stock. The haploid genomes of
these fishes probably measure ~2 picograms (pg.), or approximately two-
thirds the size of the human genome. Once one fish genome has pro-
gressed sufficiently, the sequencing of subsequent species will be greatly
eased by the ability to assemble onto linkage scaffolds established for the
first.
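The genome-size comparison above can be checked with a short calculation. The sketch below assumes the commonly used conversion of roughly 978 million base pairs per picogram of DNA and the approximately 3 billion bp human genome cited earlier in this chapter; both constants are approximations introduced here for illustration.

```python
BP_PER_PG = 0.978e9        # assumed conversion: 1 pg of DNA is about 978 Mbp
HUMAN_GENOME_BP = 3.0e9    # approximate figure quoted earlier in this chapter


def picograms_to_bp(picograms):
    """Convert a haploid DNA mass in picograms to an approximate base-pair count."""
    return picograms * BP_PER_PG


fish_bp = picograms_to_bp(2.0)
ratio = fish_bp / HUMAN_GENOME_BP
print(f"{fish_bp:.3g} bp, {ratio:.2f} of the human genome")
```

Under these assumptions a 2 pg haploid genome is a little under 2 billion bp, which is consistent with the "approximately two-thirds" estimate in the text.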
Mammalian hibernators: The Arctic ground squirrel and the black bear.
Several mammals overwinter in extreme conditions by entering a state of
suspended animation known as hibernation (Boyer and Barnes, 1999).
Although little is known about the molecular genetic events that underlie
the hibernating phenotype, the interspersed phylogenetic distribution of
hibernating and nonhibernating species has led to the hypothesis that
rather than requiring the creation of novel gene products, hibernation
results from the differential expression of existing genes. Therefore, it is
possible that a small number of genetic events are necessary to acquire the
ability to hibernate. The mammalian hibernator genome project would
focus on sequencing the genomes of two animals that have different strat-
egies of hibernation. The sequencing work would be complemented by
studies of the patterns of tissue-specific gene expression that enable these
animals to express the hibernation phenotype.
Arctic ground squirrels (Spermophilus parryii) and black bears (Ursus
americanus) are suitable for elucidation of the genomic and transcriptome-
level changes that support hibernation because their hibernation cycles
are extremely predictable and physiological changes are so dramatic.
They survive the winters of Alaska without eating or drinking for six to
eight months by reversibly lowering their metabolism. This metabolic
shift has profound ramifications for every mammalian physiological sys-
tem, yet there are significant differences between the hibernation charac-
teristics of squirrels and bears. Ground squirrels reduce their body tem-
perature as much as 40°C and attain core body temperatures near -2.8°C
(Barnes, 1989). Black bears reduce their body temperature by only about
5°C. Ground squirrels lose protein and bone mass, while bears maintain
both. A comparative approach to analyzing the genomes and the differ-
ences in gene expression patterns in ground squirrels and bears during
hibernation will facilitate our understanding of the underlying molecular
mechanisms that provide tolerance by molecules, cells, and organs to
these extreme changes and will provide great potential for beneficial bio-
medical applications for humans. For example, understanding the molecu-
lar mechanisms of bone mass maintenance in bears may lead to therapeutic
modalities that prevent osteoporosis in chronically hospitalized patients
(Becker et al., 2002). During hibernation in squirrels, blood flow to
several tissues is reduced by as much as 98 percent of normal for up to
three weeks, yet no tissue damage from reduced oxygen availability
occurs because the metabolic rate is similarly reduced (Boyer and Barnes,
1999). Identification of the molecular genetic mechanisms affording protection
from low blood flow and reperfusion may be applied to protection
from injury due to stroke and heart attacks in humans. Another potential
use of data obtained in the study of hibernators could be in the develop-
ment of emergency field medical protocols for inducing a state of hypo-
metabolism in gravely injured humans, for example, soldiers wounded
on the battlefield who cannot be transported rapidly to a medical center.
By inducing reductions in metabolic rate and enhancing tolerance of
reduced blood flow, mechanisms may be developed for sustaining life
until sophisticated medical attention can be given to a patient.
Polar nematodes. Nematodes in Arctic and Antarctic soils are predators
of bacteria, fungi, and other microscopic animals and can be the dominant
invertebrates in some polar soil systems. They are important in soil
foodwebs because they feed on the primary decomposers (bacteria, yeast,
fungi) and influence the rates of decomposition and nutrient cycling.
In the Antarctic Dry Valleys, the bacterial-feeding nematode species
Scottnema lindsayae lives in the extremely dry soils in water films around
soil particles. When unfavorable environmental conditions occur (such as
decreasing moisture and temperature), the animals enter into a metabolic
state, termed anhydrobiosis (life without water), enabling them to freeze
and survive (Treonis et al., 2000). Favorable soil temperature and moisture
revive the nematodes, enabling them to save energy for those periods
most favorable for activity. A gene associated with anhydrobiosis has been found in
temperate fungal-feeding nematodes (Browne et al., 2002). Elucidation of
the molecular mechanisms for survival of a nematode such as S. lindsayae,
which occurs in the most extreme soil environment on Earth, will contribute
to knowledge of developmental biology and to comparisons with the
well-known model nematode, Caenorhabditis elegans, which also feeds on
bacteria but is not found in polar systems (Freckman and Virginia, 1998;
Riddle et al., 1997). S. lindsayae thus offers an excellent polar organism for
determining and comparing genetic mechanisms of survival to those
already elucidated in temperate nematodes.
Polar insects. Insects are the most common animals on Earth. Nearly
75 percent of the known species of animals are insects. Furthermore,
insects live almost everywhere except in the oceans; they thrive in the
Arctic but are rare in the Antarctic.
The Arctic beetle, Cucujus clavipes, is extremely cold tolerant, with a
mean lower lethal temperature of - 0°C (J. Duman, unpublished observa-
tions). It occurs over a very wide latitudinal range, from Kentucky to
Wiseman, Alaska (south side of the Brooks Range). This beetle winters in
several larval stages and as an adult. Generally a freeze-avoiding species,
C. clavipes prevents its tissues from freezing through production of anti-
freeze proteins. However, the beetles sometimes winter in a freeze-tolerant
state, meaning that they can freeze and survive. Clearly, the genetic
mechanisms for avoidance and tolerance of freezing may be elucidated by
genome analysis, most likely at the level of the transcriptome. Further-
more, Alaskan populations winter in a deep diapause state, whereas those
in Indiana do not. C. clavipes, therefore, provides a model system for
genetic dissection of diapause as well as survival of metazoan tissue during
freezing.
Plants
Betula nana. The dwarf or bog birch is one of the most characteristic
plants of the low Arctic region. It is found around the world and is the
dominant plant in many areas. In other areas, such as Alaskan tussock
tundra, it remains an important secondary species. Since the extent of
shrub cover is critical in controlling snow distribution, Betula nana affects
many aspects of the biophysics and the climate dynamics of the Arctic. In
North America, it is very responsive to environmental manipulation and
changes such as increased nutrients or warming. Warming
experiments (in small, in situ greenhouses) can produce a small forest of
Betula nana. However, in Scandinavia, the same species appears much
less responsive to nutrient additions. Given the importance of Betula nana
in Arctic ecology and climatology, it is important to understand its physi-
ology and its range of responses. Furthermore, it may be important to
understand the nature of genetic variation that exists around the Arctic
world.
Other Considerations
The polar species cited above, based on their biology, represent
examples of compelling opportunities for genome science projects. We
emphasize that not all projects will require the sequencing of the com-
plete organismal genomes. Depending on the scientific questions and
objectives, many projects may be addressed more effectively, and with
greater cost efficiency, by other genome-wide methods (transcriptional
profiling, protein gel profiling). Hence, it will be necessary to develop a
framework for prioritization of polar organisms for full genome sequence
characterization versus functional genomic profiling. Given the small
sizes of most prokaryotic and many protistan genomes, they can be
sequenced to completion provided that a strong scientific justification is
advanced. The large genomes of metazoans and plants, by contrast, will
require careful assessment of the scientific benefit versus programmatic
cost. One possible scenario for initiating a Polar Genome Science Initia-
tive is outlined here:
1. Full-scale genome projects can be launched for one or two meta-
zoan or plant species whose biology is well understood.
2. Meanwhile, 8-10 other animals and plants would be selected for
functional genomic analysis via the construction of EST (expressed
sequence tag) libraries, microarray production, and proteomic profiling.
The results of the functional genomic studies should indicate whether
these genomes deserve more detailed study, and 8-10 EST/proteomic
projects could be executed for the price of one complete genome project.
Gene expression or protein turnover profiles exhibiting potentially "adap-
tive" features would argue for advanced analysis of the appropriate
genomic regions, which could be cloned out of BAC, PAC, or YAC libraries
(see below). Furthermore, the development of specific hypotheses based
on the functional genomic approach would naturally define the appropri-
ate comparative taxa while generating economy in focusing the work.
The importance of "testing" putative environmental adaptations
within the genes and genomes of polar organisms by comparison to
phylogenetically related, but temperate, species (criterion 3) cannot be overemphasized.
A Polar Genome Science Initiative, by its very nature, will
require a strong comparative genomic component; and suggestions for
appropriate species comparison are given in previous sections. Finally, as
the initially exploratory phase of these genomic projects proceeds, we
anticipate that each will transition to directed, hypothesis-driven research
based on the discoveries made in the first phase. Rigorous analysis of
adaptation using approaches such as phylogenetically independent con-
trasts (Felsenstein, 1989) will be necessary to distinguish adaptive varia-
tion from the influences of ancestry.
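The logic of phylogenetically independent contrasts can be illustrated with a small sketch. It assumes a fully resolved tree with known, positive branch lengths and a trait evolving by Brownian motion; the four-species tree, the species labels, and the trait values are hypothetical and serve only to show the calculation.

```python
import math


def independent_contrasts(node, traits, out):
    """Return (estimated value, adjusted branch length); append contrasts to out.

    A node is (content, branch_length). For a tip, content is a species name;
    for an internal node, content is a pair (left, right) of child nodes.
    """
    content, length = node
    if isinstance(content, str):                      # tip: observed trait value
        return traits[content], length
    left, right = content
    x1, v1 = independent_contrasts(left, traits, out)
    x2, v2 = independent_contrasts(right, traits, out)
    out.append((x1 - x2) / math.sqrt(v1 + v2))        # standardized contrast
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)   # weighted ancestral estimate
    return value, length + v1 * v2 / (v1 + v2)        # lengthen branch for uncertainty


# Hypothetical tree: two polar/temperate sister pairs, all branches of length 1.
pair_1 = ((("polar_1", 1.0), ("temperate_1", 1.0)), 1.0)
pair_2 = ((("polar_2", 1.0), ("temperate_2", 1.0)), 1.0)
tree = ((pair_1, pair_2), 0.0)
traits = {"polar_1": 2.0, "temperate_1": 6.0, "polar_2": 3.0, "temperate_2": 7.0}

contrasts = []
independent_contrasts(tree, traits, contrasts)
print([round(c, 3) for c in contrasts])
```

A tree with n tips yields n - 1 contrasts. In practice the contrasts for two traits (for example, a physiological trait and habitat temperature) would be computed on the same tree and then correlated, so that shared ancestry is not mistaken for repeated adaptation.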
Because the generation times of most polar organisms are so long,
none are likely to be developed into genetic "model organisms." Thus,
the functional attributes and/or biotechnological potential of a gene
obtained from a polar species must be assessed by reverse genetic strate-
gies (e.g., manipulated expression of the gene by antisense morpholino
RNA oligonucleotide "knockdown" [Nasevicius and Ekker, 2000] or by
gene transfer methods, etc.), perhaps conducted in the organism itself or,
more likely, in conventional model systems amenable to such approaches
(e.g., various bacterial species, the plant Arabidopsis thaliana, the nematode
Caenorhabditis elegans, the zebrafish Danio rerio, or the mouse Mus
musculus).
and the fragments are assembled computationally into scaffolds (cf. the
Fugu genome; Aparicio et al., 2002).
High-Throughput Sequencing of Genomic DNA and Expressed Genes
The sine qua non of a genome project is the ability to sequence rapidly
the fragments of a genome, whether large or small, with sufficient redun-
dancy (six- to tenfold coverage) to reduce the error rate to between 1 in
1,000 nucleotides (a "rough-cut" genome) and 1 in 10,000 (a "polished"
genome). Today, the most common sequencing method is based on auto-
mation of the Sanger dideoxynucleotide chain termination protocol
(Sanger et al., 1977). If large-insert clones have been used to establish the
physical map, then the clones must be subdivided to produce pieces
(~1-2 kb) amenable to sequencing. Thus, one advantage of the shotgun
mapping strategy is that libraries of short fragments are the starting point.
Once sufficient numbers of sequenced fragments have been obtained,
they are ordered into contigs and the contigs into larger "scaffolds" of
genome sequence. The sequences of expressed genes (for example,
cDNAs, or complementary copies of messenger RNAs [mRNA]) are also
incorporated into the assembly because they help to define the intron and
exon boundaries of the genes in the genome. Irrespective of effort, some
genomic regions will be refractory or "unsequenceable." Often these
regions have a biased, high guanine-cytosine (GC) content or consist of
short repetitive elements that are difficult to resolve. Whereas microbial
genomes are generally sequenced to completion, the "finished" genomes
of eukaryotes will normally contain gaps.
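The scale of the sequencing effort implied by six- to tenfold coverage can be estimated with a short calculation. The sketch below assumes a hypothetical 2-billion-bp genome and 500-bp reads, and it uses the idealized Lander-Waterman approximation that the fraction of a genome left unsampled at coverage c is about e^(-c). It addresses gaps in coverage only, not the per-base error rate discussed above, and it ignores the cloning and sequence biases that make some regions refractory.

```python
import math


def reads_needed(genome_bp, read_bp, coverage):
    """Number of reads required to reach a given average fold coverage."""
    return math.ceil(coverage * genome_bp / read_bp)


def fraction_uncovered(coverage):
    """Idealized fraction of the genome sampled by no read at this coverage."""
    return math.exp(-coverage)


GENOME_BP = 2_000_000_000   # hypothetical genome size
READ_BP = 500               # hypothetical read length

for coverage in (6, 10):
    n = reads_needed(GENOME_BP, READ_BP, coverage)
    gap = fraction_uncovered(coverage)
    print(f"{coverage}x: {n:,} reads, about {gap:.2e} of bases unsampled")
```

Even under these optimistic assumptions, sixfold coverage leaves roughly a quarter of a percent of bases unsampled, which is one reason "finished" eukaryotic genomes normally contain gaps.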
A major consideration for any genome project, such as the Polar
Genome Science Initiative contemplated here, is the cost of the sequenc-
ing itself as well as the computational power required to assemble the
genome. Fortunately, new sequencing technologies promise to reduce
costs to levels unimaginable at the start of the public and private sector
human genome projects. The National Human Genome Research Institute
has just funded GenomeVision to reduce the costs of large-scale gene
sequencing projects by five- to tenfold through miniaturization over the
next two years (GenomeWeb staff, 2002). Many alternative technologies
are being developed that should increase the speed and accuracy of
sequencing while lowering costs (Lakhman, 2002; McGowan, 2002a).
Thus, the Polar Genome Science Initiative is not only intellectually compelling
but also eminently practical and affordable.
Gene Identification and Annotation
The genome of the pufferfish, Fugu rubripes, is estimated to contain
~31,000 genes, or roughly the same number as current estimates of the
human genome (Aparicio et al., 2002). Of predicted human proteins,
~75 percent are orthologous to pufferfish proteins, whereas the remain-
ing 25 percent either are highly divergent or are not encoded by the fish
genome. This comparison emphasizes that gene prediction must be pur-
sued both by orthology and by use of ab initio gene prediction tools.
Following identification, genes must be annotated with data regarding
presumptive function, pattern of expression, and putative orthologues
found in other genomes.
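One simple way to nominate putative orthologues between two genomes is the reciprocal-best-hit heuristic, sketched below. It is offered as an illustration rather than as the method used in the studies cited: the protein identifiers and similarity scores are invented, and real scores would come from a sequence-alignment search.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) in which each protein is the other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}


# Hypothetical similarity scores between "human" (h) and "fish" (f) proteins.
human_vs_fish = {("h1", "f1"): 90, ("h1", "f2"): 40,
                 ("h2", "f2"): 80, ("h3", "f2"): 85}
fish_vs_human = {("f1", "h1"): 88, ("f2", "h3"): 86, ("f2", "h2"): 79}

pairs = reciprocal_best_hits(human_vs_fish, fish_vs_human)
print(sorted(pairs))
```

Proteins left unpaired, such as h2 in this toy example, correspond to the highly divergent or missing genes that orthology alone cannot resolve, which is why ab initio prediction is also needed.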
Population Analysis with Single-Nucleotide Polymorphisms
Generating a genome sequence based on one, or at most a few, indi-
viduals of a species represents merely a beginning for population biolo-
gists. Natural variation among genomes in a population is the "stuff" of
phenotypic variation and evolutionary speciation. It is generally assumed
that single-nucleotide polymorphisms (SNPs) and insertions/deletions (indels) are responsible for
quantitative variation in phenotypic traits. SNPs may be used to track
gene flow between separate populations of a species, and their absence
signals that the populations are stratified and perhaps in the process of
speciating due to ecological, geographic, or behavioral factors. Because
DNA-sequencing costs are declining rapidly, the identification of robust
"SNP libraries" for population analysis of multiple species is a realistic
goal.
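How SNP data can indicate restricted gene flow may be illustrated with the fixation index, F_ST, computed at a single biallelic SNP. This is a minimal sketch that assumes two populations of equal size and known allele frequencies; the frequencies used are hypothetical, and real studies would average over many loci and correct for sample size.

```python
def fst(p1, p2):
    """F_ST at one biallelic SNP for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                        # pooled heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population
    if h_total == 0:
        return 0.0                                           # allele fixed everywhere
    return (h_total - h_within) / h_total


for p1, p2 in [(0.5, 0.5), (0.9, 0.1), (1.0, 0.0)]:
    print(p1, p2, round(fst(p1, p2), 3))
```

Values near 0 are consistent with free exchange of alleles between the populations, whereas values approaching 1 indicate that the populations are strongly stratified.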
The capacity to compare distinct populations of a polar species and to
monitor community relationships between interacting species using SNP
technology promises to revolutionize polar ecology. Some potential appli-
cations (Gibson and Muse, 2002) include:
· inference of the demographic history of populations;
· analysis of mating systems;
· conservation biology, including the population forensics of com-
mercially exploited species;
· analysis of breeding structure and dispersal of soil microorganisms,
nematodes, and so forth; and
· timing the establishment of host-symbiont and host/parasite associations.
Web-Based Databases and Interfaces for Data Management and
Comparative Genomics
Genome projects produce massive amounts of sequence data and
annotated information. These data must be made available to the wider
biological research community by creation of appropriate relational data-
bases and web-based interfaces. Indeed, the ability to compare genomes
will speed our understanding of genome evolution and the phylogenetic
relationships of all organisms, whether polar or temperate. A compre-
hensive Polar Genome Science Initiative must make provision for cre-
ation, curation, validation, and management of these databases and for
the bioinformatics tools necessary for insightful genome analyses.
Transcriptome Analysis
As emphasized at several junctures in this report, the ability to quali-
tatively and quantitatively describe the transcriptome opens up a number
of new avenues for investigating polar organisms. All taxa can be exam-
ined through transcriptome analysis, and studies can involve complex
microbial consortia as well as individual animals or plants and tissues
thereof. A primary use of transcriptome analysis is to study how environ-
mental factors, both singly and in combination, influence patterns of gene
expression. The environmental factors of interest comprise natural vari-
ables such as temperature and UV radiation and anthropogenic factors
such as organic and heavy metal pollutants. Transcriptome analysis can
provide a "snapshot" of the organism's status in terms of gene expression
and makes it possible to follow the time course of organismal responses to
environmental change.
Although the use of DNA microarrays for examining organisms' tran-
scriptional responses to the environment is in its infancy, there are several
indications of how promising this approach can be for probing the effects
of environmental change. Studies of yeast have shown that a characteristic
set of stress-related genes is activated upon exposure of the cells to a
variety of stresses (anoxia, temperature, alcohols, and so on) (Causton et
al., 2001; Gasch et al., 2000). Stress-specific alterations in gene expression
were also catalogued in yeast. This technology is becoming accessible to
scientists interested in all types of organisms, from model systems to
species for which no sequencing of the genome has been done (Pennisi,
2002).
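A minimal sketch of how microarray intensities might be screened for stress-responsive genes is given below. It computes log2 ratios between a stressed and a control sample and flags genes whose expression changes at least twofold. The gene names, intensity values, pseudocount, and threshold are hypothetical and serve only to illustrate the idea.

```python
import math


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Return {gene: log2((treated + pc) / (control + pc))} for shared genes.

    The pseudocount avoids division by zero for genes with no signal.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
        if gene in treated
    }


def responsive_genes(fold_changes, threshold=1.0):
    """Split genes into up- and down-regulated lists at |log2 ratio| >= threshold."""
    up = sorted(g for g, v in fold_changes.items() if v >= threshold)
    down = sorted(g for g, v in fold_changes.items() if v <= -threshold)
    return up, down


if __name__ == "__main__":
    # Hypothetical spot intensities for three genes.
    control = {"geneA": 99.0, "geneB": 199.0, "geneC": 100.0}
    stressed = {"geneA": 399.0, "geneB": 49.0, "geneC": 110.0}
    changes = log2_fold_changes(control, stressed)
    print(responsive_genes(changes))
```

A real analysis would also normalize the arrays and use replicate measurements to assess statistical significance.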
The fabrication of DNA microarrays for transcriptome analysis can
involve a number of experimental strategies. For organisms having a
fully sequenced and well-annotated genome, DNA "microarrays" fabricated
with specific oligonucleotide probes for the genes (mRNAs) of interest
can be built. Customized "microarrays" are available from a number
of commercial sources, and this type of commercially produced technol-
ogy will certainly become increasingly available for transcriptome analy-
sis of many different organisms. In the case of species for which sequence
information is limited or even entirely lacking, the construction of DNA
microarrays must follow a different strategy. To construct microarrays
for non-sequenced species, strategies such as that described by Gracey et
al. (2001) are likely to be effective. Through construction of subtracted
and normalized cDNA libraries, thousands of different cDNAs for spot-
ting onto microarrays can be obtained. Through iterative analysis of
these microarrays, one can screen the cDNA libraries to obtain thousands
of unique cDNAs with minimal redundancy. Techniques are also well
developed for selecting for full-length cDNAs so as to increase the utility
of the cDNAs produced in microarray studies. Although DNA micro-
arrays fabricated for "nonmodel" organisms offer an effective means for
screening changes in gene expression, they have two key limitations. One
stems from the fact that these arrays contain an incomplete representation
of the genome. The second is that the absence of extensive sequence
information limits the identification of many expressed genes. Also, the
usefulness of an array constructed for one species in the study of another
remains to be determined. This is an important question for future work.
Despite their limitations, DNA microarrays for "nonmodel" species offer
a powerful tool for analyzing the effects of environmental factors on gene
expression.
Proteome Analysis
Changes in the transcriptome do not map one to one with changes in
the proteome (Fiehn, 2001; Phelps et al., 2002). Thus, depending on the
goals of a study, analysis of the transcriptome may serve as only an initial
step in the study of how environmental changes affect the phenotype.
Proteomics is a powerful approach that allows one to characterize the
suite of proteins present in a cell or tissue.
The applicability of proteomic methodologies to the study of polar
organisms, for which large amounts of DNA and protein sequence data
are not available, appears promising for several reasons. First and fore-
most, the conservation found in the sequences of orthologous proteins
facilitates identification of proteins from genetically uncharacterized spe-
cies. Second, as more and more genomes are sequenced and increasing
amounts of information are obtained on the deduced amino acid
sequences of orthologous proteins, proteomic analysis of nonsequenced
organisms will become increasingly feasible. Targeted proteomics, in
which only a minor fraction of the proteome is analyzed, for example,
using antibody methods, may be the most suitable strategy for screening
changes in the levels of proteins that are of interest in a particular physi-
ological context. Analysis of the transcriptome may point to the set of
proteins that are most important for proteomic analysis.
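The role of sequence conservation in identifying proteins from uncharacterized species can be sketched as follows. The function slides an experimentally derived peptide along the sequence of a known orthologous protein and reports the best ungapped match. Both sequences are invented for illustration, and the ungapped comparison is a deliberate simplification of what real alignment tools do.

```python
def best_window_identity(peptide: str, reference: str):
    """Return (best_identity, start_index) of the best ungapped match.

    The peptide is compared with every window of equal length in the
    reference sequence; identity is the fraction of matching residues.
    """
    n = len(peptide)
    if n == 0 or n > len(reference):
        raise ValueError("peptide must be non-empty and no longer than reference")
    best_identity, best_start = 0.0, 0
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        matches = sum(a == b for a, b in zip(peptide, window))
        identity = matches / n
        if identity > best_identity:
            best_identity, best_start = identity, start
    return best_identity, best_start


if __name__ == "__main__":
    # Hypothetical peptide from an unsequenced species and a hypothetical
    # ortholog from a sequenced relative; they differ at one residue.
    print(best_window_identity("MKTAYIAK", "GGMKTAYLAKQQ"))
```

A high identity against a known protein suggests, but does not prove, that the peptide derives from its ortholog.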
Metabolome Analysis
Characterization of the composition of metabolites in the cell
(metabolomics) carries analysis one step closer to actual physiological
activity (Phelps et al., 2002). Metabolome analysis allows the charting of
the types of substrates, end products, and biosynthetic intermediates
found in the cell. Through appropriate coupling of analytical technolo-
gies, metabolomic approaches can be quantitative as well as qualitative.
With the advent of protocols in which magnetic resonance spectroscopy
is coupled with effective separation techniques and mass spectrometry,
identification and quantification of virtually all organic molecules in the
cell is becoming possible (Fiehn, 2001).
Like the analyses of the transcriptome and the proteome, character-
ization of the metabolome offers enormous potential for discerning the
effects of environmental factors on organismal function. Similar to
transcriptome and proteome analyses, metabolomic methods can be
applied to any type of organism and to different cell types and tissues
within an individual.
Ecogenome Analysis
Ecogenomics, the use of genome science to study ecology, has great
potential to advance our understanding of microbial ecology (Stahl and
Tiedje, 2002; Torsvik and Øvreås, 2002). One promising approach is
metagenome analysis (Rondon et al., 2000). This approach is based on the
same technology that is used to sequence the whole genome of specific
organisms, but it is applied to entire microbial communities. In these
analyses, large DNA fragments are extracted directly from microbial com-
munities, large extents of sequence are determined, and the sequences are
partially analyzed. In theory, the whole genomes of members of the
community sampled can be assembled. From metagenome analysis, the
following information (at a minimum) can be gleaned: phylogenetic com-
position of the sample, variability of recognizable functional genes, asso-
ciation of functional genes with a phylotype, indications of new and
unsuspected functions, dosage of a particular gene in a chromosome or
contig of interest, and insights into the regulation of gene expression.
One important task for investigators of ecogenomics is to develop means
for studying both culturable and unculturable (at least by present
technologies) species, the latter often representing >99 percent of a microbial
community. Thus, microarrays developed for examining community
structure and function must include probes from both culturable and
unculturable species.
Although metagenome analysis is still in the development and testing
stage, it holds great promise for providing a new, integrated view of the
phylogenetic composition of microbial communities and of their functional
capabilities. To date, only a few studies have employed metagenomic
analyses (Béjà et al., 2000; Rondon et al., 2000). Ambitious plans for more
such studies have been announced (McGowan, 2002b). Perhaps the great-
est potential of such studies lies in their ability to address the critical need
to relate microbial phylogeny to function. To some extent, this has been
and can further be accomplished by determining linkages between "func-
tional genes" and indicators of phylogeny such as rRNA genes.
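The linkage described above can be sketched in a few lines: if a functional gene lies on the same assembled contig as an rRNA gene, the function can be tentatively attributed to that phylotype. The data layout, gene names, and phylotype labels below are hypothetical and are meant only to illustrate the bookkeeping.

```python
from collections import defaultdict


def link_functions_to_phylotypes(contigs):
    """Map each functional gene to the phylotypes found on the same contig.

    contigs: {contig_id: [gene, ...]}, where each gene is a dict that is
    either {"type": "rRNA", "phylotype": ...} or
    {"type": "functional", "name": ...}.
    Functional genes on contigs without an rRNA gene are left unlinked.
    """
    links = defaultdict(set)
    for genes in contigs.values():
        phylotypes = {g["phylotype"] for g in genes if g["type"] == "rRNA"}
        functions = {g["name"] for g in genes if g["type"] == "functional"}
        for name in functions:
            links[name].update(phylotypes)
    return {name: sorted(found) for name, found in links.items() if found}


if __name__ == "__main__":
    # Hypothetical annotated contigs from a community sample.
    contigs = {
        "contig_1": [
            {"type": "rRNA", "phylotype": "phylotype_A"},
            {"type": "functional", "name": "gene_X"},
        ],
        "contig_2": [{"type": "functional", "name": "gene_Y"}],
        "contig_3": [
            {"type": "rRNA", "phylotype": "phylotype_B"},
            {"type": "functional", "name": "gene_X"},
        ],
    }
    print(link_functions_to_phylotypes(contigs))
```

Here gene_Y stays unlinked because its contig carries no phylogenetic marker, which is the common case when contigs are short.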
The metagenome approach may be particularly appropriate for polar
problems. First, contig assembly is simplified if simple rather than com-
plex communities are studied. Simple communities may be expected in
some of the more extreme polar environments, for example, wintertime
sea-ice communities, Dry Valley soils (or their Arctic equivalent), lake ice
bubbles, and possibly subglacial lakes (see Chapter 2). Second, the metagenome
approach could yield information about the composition and functioning
of microbial communities that are particularly difficult to sample without
disturbance or that are not amenable to experimental manipulation, such
as subglacial lakes and sea-ice microbial communities.
By analogy to single-organism genomics, ecogenomics must make
the transition from sequencing and annotating metagenomic data to func-
tional analyses. By further analogy to single-organism genomics, several
genomic approaches appear to hold promise for functional ecogenomics.
"Environmental microarrays" have several potential applications. Mea-
surement of the dynamics of large numbers of individual populations
may be possible using probes for indicators of phylogeny. Population
dynamics can then be related to environmental data. Similarly, estima-
tion of the abundance of large numbers of genes with known function in
communities allows population studies of microbial guilds (populations
sharing a common function in a community). Furthermore, using envi-
ronmental microarrays, it may be possible to do transcriptional analysis,
permitting estimates of in situ activity of functional guilds. Complemen-
tary to transcriptional analysis, "environmental proteomics" may provide
an additional approach to estimating the in situ activity of guilds. More-
over, "environmental metabolomics" may provide a third approach for
estimating in situ activities of guilds, which would not be limited by our
genetic knowledge of the organisms in a community. These functional
genomic approaches are applications based largely upon metagenomic
analyses and should be closely coordinated with the latter analyses.
Relevant metagenomic data should be readily available to facilitate the
application of the functional genomic approaches. Finally, bioinformatic
approaches may determine relationships between phylogenetic groups
and measurable functions, particularly if a common database is established
relating ecogenomic data to phenotypic, geographic, and environmental
data (Stahl and Tiedje, 2002).
All of these functional ecogenomic approaches involve severe techni-
cal challenges, and none has yet been satisfactorily demonstrated. Several
groups are actively developing various types of environmental microarrays
(Guschin et al., 1997; Small et al., 2001; Wu et al., 2001). Progress
has also been made in environmental transcriptional analysis (Bakermans
and Madsen, 2002; Miller et al., 1999; Park et al., 2002). Environmental
proteomics and metabolomics are currently hypothetical approaches. All
of these approaches must address the extreme complexity of DNA, RNA,
and proteins in most environments, which tends to increase detection
limits and decrease specificity of analyses. Another problem common to
environmental samples is the complexity of the sample matrix, which can
limit analyte recovery and interfere with analyses. Despite the
challenges, the great potential of these ecogenomic approaches merits
exploration. Leading microbial ecologists have endorsed ecogenomic research
(Stahl and Tiedje, 2002), and application of ecogenomics to marine
microbial ecology has been recommended in a previous report (NSF, 2000).
Ecogenomics has great potential for addressing some of the research ques-
tions in polar biology outlined in the previous chapter.
Impediments to the Study of the Transcriptomes, Proteomes, and
Metabolomes of Polar Species
Implementation of the study of transcriptomes, proteomes, and
metabolomes of polar organisms faces a number of challenges, most of
which are common to all three types of "-omic" analysis. In each case, the
equipment needed to conduct this research is expensive and requires
skilled hands for its operation. In the case of transcriptome studies, the
equipment needed to fabricate and analyze DNA microarrays (for
instance, robotic apparatus for spotting DNA onto slides and for handling
large numbers of liquid samples) is very costly, and it is not likely that all
research centers will be able to acquire this equipment. Therefore, efforts
should be made to provide access to the technology needed for
transcriptome analysis for scientists working at sites where technology
shortfalls exist. Identical arguments apply in the case of the equipment
required for proteomic and metabolomic studies. When the required
equipment is present at a center, it is likely to be housed in a central
facility for use by multiple investigators. Technical support for equip-
ment operation and maintenance will likely be required at these centers.
These technological demands for metabolomic and proteomic studies
apply not only to polar science but to other bioscience disciplines as well.
Support for a central metabolomic and proteomic center by funding agen-
cies will benefit a broad community of investigators, including polar
biologists.
Large amounts of DNA sequencing accompany transcriptome analy-
sis, and facilities for this purpose must be available to investigators. For
DNA microarrays spotted with uncharacterized cDNAs, the spots that
exhibit interesting patterns of up- or down-regulation must be sequenced
to identify the genes undergoing shifts in expression. Sequencing may
also precede the construction of microarrays to enable genes of interest to
be included on the arrays. Sufficient support for DNA sequencing at
research universities may already exist in most cases, so hurdles to imple-
mentation posed by sequencing capacity may be relatively small. Further-
more, where sequencing potential is not found, investigators may be able
to farm out the needed sequencing to commercial firms or to universities
that perform sequencing on a recharge basis.
A final aspect of "-omic" research that merits emphasis is the likeli-
hood that most aspects of these studies will be difficult, if not impossible,
to carry out at remote field sites. It seems impractical, for example, to site
sophisticated robotic systems for preparing DNA microarrays or equip-
ment for mass spectrometric or magnetic resonance experiments in
proteomics and metabolomics at field sites. Instead, what should be guar-
anteed to investigators is the technology needed for sample preservation
(for example, liquid nitrogen or dry ice) and reliable transportation of
samples from the field back to the home laboratory where the sophisti-
cated "-omics" analysis will be conducted.
Bioinformatic Tools and Databases
Another common requirement of "-omics" research is expertise in
bioinformatics. The software needed to organize and to analyze the huge
sets of data generated in all types of "-omic" studies is often available on
web sites at no cost to the user. However, given the interest in looking for
mechanisms of environmental adaptation in a given polar species' genome
sequence, current tools are not likely to be appropriate for the task. An
initiative in polar environmental genomics requires the design and devel-
opment of specific bioinformatics tools that would search for sequence
data that would support or refute hypotheses regarding adaptive processes.
Therefore, collaborations between polar biologists and computational
biologists and bioinformaticists are essential and should be encouraged.
Fellowships may be set up to fund students and postdoctoral researchers
in computer science and mathematics departments to participate in polar
genome studies.
As genomics-based efforts in polar biology research expand, suffi-
cient attention must be devoted to the bioinformatics issues related to the
distribution and use of the information. Genome sequence data alone are
relatively easy to access; however, the diversity of polar organisms being
considered for whole-genome analysis (microbial species and various
eukaryotes and metazoa), together with the desire to link genome sequence
data to geographical and environmental (geochemistry and climatic) data
and temporal data, presents a much greater bioinformatics challenge. This
type of data sharing and linking requires the development of a fully inte-
grated database for which no robust model yet exists. Moreover, investi-
gators must be willing to agree on a set of standards for data format and
data sharing. Such large matrix arrays of data can only be fully analyzed
using network structure models. Maintenance of such databases and
development of network analysis tools would require long-term funding,
quite likely from multiple funding agencies. The need to develop inte-
grated databases to link genome sequence, function, ecological, climatic
and geographical, and temporal data is not unique to the polar research
community. The polar research community should become actively
involved in the ongoing database discussions that are taking place.
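To make the idea of an integrated database concrete, the sketch below uses Python's built-in sqlite3 module to link sequence records to the sample, date, temperature, and location at which they were collected. The table names, columns, and the single example record are all hypothetical; no agreed community standard is implied.

```python
import sqlite3

# Hypothetical minimal schema linking sequences to samples and sites.
SCHEMA = """
CREATE TABLE site (
    site_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    latitude   REAL NOT NULL,
    longitude  REAL NOT NULL
);
CREATE TABLE sample (
    sample_id      INTEGER PRIMARY KEY,
    site_id        INTEGER NOT NULL REFERENCES site(site_id),
    collected_on   TEXT NOT NULL,
    temperature_c  REAL
);
CREATE TABLE gene_sequence (
    sequence_id  INTEGER PRIMARY KEY,
    sample_id    INTEGER NOT NULL REFERENCES sample(sample_id),
    organism     TEXT NOT NULL,
    annotation   TEXT
);
"""


def build_demo_db():
    """Create an in-memory database holding one invented record per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO site VALUES (1, 'hypothetical sea-ice station', -77.8, 166.7)")
    conn.execute("INSERT INTO sample VALUES (1, 1, '2002-07-15', -1.9)")
    conn.execute(
        "INSERT INTO gene_sequence VALUES "
        "(1, 1, 'unclassified bacterium', 'putative cold-shock protein')"
    )
    conn.commit()
    return conn


def sequences_with_context(conn):
    """Return each annotation with its collection date, temperature, and latitude."""
    query = """
        SELECT gene_sequence.annotation, sample.collected_on,
               sample.temperature_c, site.latitude
        FROM gene_sequence
        JOIN sample USING (sample_id)
        JOIN site USING (site_id)
        ORDER BY gene_sequence.sequence_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(sequences_with_context(build_demo_db()))
```

The design choice illustrated here is that environmental and geographic context is stored once per sample or site and joined to sequences on demand, which is one way to keep shared data formats consistent across investigators.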
CREATION OF A POLAR GENOME SCIENCE INITIATIVE
Given the great potential of genomic science to address important
new research questions in polar regions, some special effort to facilitate
and guide these activities is justified. The goal of such an initiative would
be to gather talent to work on these problems in an efficient and coordi-
nated manner.
One option for a Polar Genome Science Initiative would be for the
community to form some kind of virtual steering committee or core group
to provide leadership, perhaps based on the model offered by the
International Arctic Polynya Programme (IAPP) of the Arctic Ocean Sciences
Board (AOSB). The AOSB is a nongovernmental body that includes mem-
bers and participants from research and governmental institutions. Its
long-term mission is to facilitate Arctic Ocean research by supporting
multinational and multidisciplinary natural science and engineering pro-
grams. In doing so, it encourages communication, promotes information
exchange, and facilitates discussions of needs and priorities. The IAPP
Science Coordinating Group comprises volunteer scientists who serve to
define the scientific needs and to coordinate the execution of research,
and the AOSB serves primarily to facilitate discussion and build networking
opportunities. Part of the success of AOSB is the ability to build
international cooperation on what is an inherently international topic, the
Arctic Ocean. AOSB members come from Canada, Denmark, Finland,
France, Germany, Iceland, Japan, the Netherlands, Norway, Poland,
Russia, Sweden, Switzerland, the United Kingdom, and the United States
of America. Although this international emphasis is not necessary for the
Polar Genome Science Initiative, the model of an informal collaborative
body might be useful.
The main advantage of this approach is that it is relatively inexpen-
sive (although there are still costs associated with supporting a secretariat
and quality web presence, and there are costs incurred directly by each
participant for travel). The main disadvantage is that this approach may
not necessarily be able to facilitate implementation of the steering com-
mittee or core group's thinking without the ability to leverage its ideas
and plans (with funds) into activities. It requires a significant amount of
effort from its volunteer participants, so a core group of truly interested
leaders must emerge and be active if it is to make progress. If the Polar
Genome Science Initiative is modeled after AOSB's IAPP within the
United States, concerns might be raised as to why this informal group has
the credibility to "speak" for the discipline. (For more information on the
AOSB, see .)
In the committee's opinion, a more effective approach for a Polar
Genome Science Initiative would be for the National Science Foundation
(NSF) to consider this recommendation as a priority area, providing tar-
geted funding and facilitating establishment of a Science Steering Com-
mittee to lead the planning. This approach might be modeled after the
Arctic System Science Program's (ARCSS) Ocean-Atmosphere-Ice Interaction
program (or similar ARCSS programs). The Steering Committee
of the Polar Genome Science Initiative would include representatives of
the relevant biological communities and would meet periodically to do
strategic planning, set research priorities, discuss needs and how they
might be met, solicit further input from the broad biological community,
and encourage coordination and communication.
Using this approach to implement a comprehensive, coordinated Polar
Genome Science Initiative would generate synergies of effort that would
maximize scientific output while minimizing the resources required.
Under this approach, the Scientific Steering Committee would establish
priorities and coordinate large-scale efforts for genome-enabled polar sci-
ence (for example, genome sequencing, transcriptome analysis, coordi-
nated bioinformatics databases). There would be no immediate need for
new facilities or capabilities; instead, the initiative would support "virtual
polar" genome science centers, recruited by NSF from the many extant
genome centers, to provide the equipment and expertise necessary to
support the polar biological community. Other advantages of this strategy
include:
· pooling of resources for analysis of multiple genomes, with con-
comitant economies of scale;
· division of labor to enhance the efficiency of the scientific return;
· coordination of community efforts to avoid unnecessary duplica-
tion of research; and
· provision of uniform databases that facilitate cross-organismal and
interpolar genomic comparisons.
The committee believes that this approach is the most effective way to
move forward, and the Arabidopsis Genome Initiative (see Chapter 5)
shows that a well-planned effort can actually finish ahead of schedule if
tasks are delegated effectively. This approach also makes it easy for new
scientists to participate, because there is a clearly articulated way to engage
the process, make contacts within the network, locate information, and
seek potential research partners from other fields. This approach could
be designed to encourage partnerships between universities and the pri-
vate sector. The main disadvantage of this approach is that it requires
new funding; the polar science community would not likely support it if
it meant taking funds away from existing initiatives.
The committee believes that NSF is well positioned to be the lead in
the Polar Genome Science Initiative. NSF is the nation's preeminent orga-
nization for the support of basic science and the one government agency
with the scope and expertise to foster this type of effort in polar science.
NSF is a major supporter of research in the Arctic and the key provider of
support (with minor inputs from NASA and others) for activities in
the Antarctic, so it is already the acknowledged leader in advancing polar
science. In addition, NSF has been funding genomic and integrative bio-
logical research through its current programs on Frontiers in Integrative
Biological Research (FIBR) and Genome-Enabled Environmental Sciences
and Engineering (GEN-EN). However, the suggested Polar Genome Science
Initiative is a large-scale research effort that aims to facilitate the applica-
tion of genome research in the polar regions, coordinate the sequencing
efforts of polar organisms, and encourage collaboration between polar
and nonpolar scientists. It is beyond the scope of FIBR and GEN-EN.
Together, NSF's Office of Polar Programs and Directorate for Biological Sciences
already have the expertise necessary to start and manage this kind of
initiative, and they have contact with relevant communities to facilitate
the transfer of knowledge that would be a key component of the initiative.