Sugars (see Box 1-1) are everywhere. They are the foundation of all life on Earth. The most important biochemical process on Earth is photosynthesis—plants, algae, and other similar organisms using the energy in sunlight to combine carbon dioxide and water to make sugars. Many of the resulting sugars in plants end up as either starch or cellulose, both polymers of the sugar glucose. Such polymerized sugars—called oligosaccharides, polysaccharides, carbohydrates, or, generically, glycans—are the most abundant molecules on the planet. Cellulose is a polymer of glucose that provides the structural support for all plants and trees, as well as the raw material for clothing, paper products, and wood products. While humans cannot digest cellulose—it is an important part of the indigestible “fiber” in our diets—grazing animals can, and it serves as their major source of energy. Starch is another glucose polymer. It differs only subtly from cellulose, yet humans can digest it into its component glucose molecules, the central feedstock for our metabolic pathways. Human metabolism, and the metabolism of virtually all living things, harvests energy by breaking down glucose into water and carbon dioxide, which is then ready to undergo another round of fixation by photosynthesis.
Carbohydrate, Glycan, Saccharide, or Sugar?
Carbohydrate: A generic term used interchangeably in this report with sugar, saccharide, or glycan. This term includes monosaccharides, oligosaccharides, and polysaccharides as well as derivatives of these compounds.
Glycan: A generic term for any sugar or assembly of sugars, in free form or attached to another molecule.
Saccharide: A generic term for any carbohydrate or assembly of carbohydrates, in free form or attached to another molecule.
Sugar: A generic term often used to refer to any carbohydrate, but most frequently to low molecular weight carbohydrates that are sweet in taste.
Glucose is key to life, but it is also central to disease. Diabetes, for example, results when glucose is not properly controlled by normal metabolic mechanisms. High concentrations of glucose can result in organ damage, while low concentrations can lead to loss of consciousness and sudden death due to inadequate energy. Diabetics must measure their blood sugar frequently to ensure proper glucose levels. Such measurements account for a significant share of all diagnostic tests conducted each year in developed countries.
But glucose is not the only sugar molecule of importance to human health. Our cells carry complex sugars that comprise individual sugar molecules linked to one another in a multitude of ways. These complex sugars are usually referred to as glycans. Glycans are one of the four major classes of macromolecules—nucleic acids, proteins, and lipids being the other three—that are essential for life and are involved in every aspect of biology, medicine, and a number of practical applications. These other three classes often incorporate or rely on glycans for their activity—nucleic acids contain the carbohydrates ribose or deoxyribose, whereas proteins and lipids often require appended glycans for activity (glycoproteins and glycolipids, respectively). These structures, and combinations of these structures, contain information that is used for a wide variety of biological processes. Key facts about glycans and glycoscience are given in Box 1-2.
Important Facts About Glycans
- Glycans are the most abundant family of organic molecules on the planet.
- The potential information content of glycans vastly exceeds that of any other class of macromolecules.
- Every living cell on the planet is covered with a dense and complex array of glycans. These glycans form the glycocalyx in many types of cells (such as in humans) and comprise the cell wall in others (such as plants). Some cells do not have a nucleus, but all have a glycocalyx or cell wall.
- Every molecule, cell, or organism that interacts with a cell must do so in the context of the glycocalyx or cell wall.
- The vast majority of cellular and secreted proteins are modified with glycans, which modify, alter, and/or control their functions.
- Elimination of any single major class of glycans from an organism results in death.
- Every disease that affects humans significantly involves glycans.
- A great majority of host-pathogen interactions involve glycans, via recognition, degradation, or molecular mimicry.
- Most protein therapeutics must be glycosylated properly to be functionally effective.
- Altered glycosylation is a universal feature of cancer and contributes to pathogenesis and progression.
- Many vaccines are glycan based.
- Glycoscience is one of the few fields that directly impacts both the pharmaceutical and energy industries.
- The majority of solar energy trapped as cellular energy is converted to carbohydrates.
- There are no other candidate classes of molecules that can solve our energy and materials needs.
- Petroleum resources have finite lifetimes, but polysaccharide resources are continually being created with the sun’s energy.
- Nitrogen fixation in plants depends on carbohydrate signaling between bacteria and plant roots.
For example, one result of 3 billion years of evolution is that every cell of every organism is coated with a layer of glycans: the glycocalyx in animals or the cell wall in prokaryotes, plants, and fungi (see examples in Figure 1-1). The glycocalyx or cell wall carries a high information content. On red blood cells the different sugars of the glycocalyx are responsible for the different blood groups: A, B, AB, and O (see Box 1-3). On cells of organs, these and other aspects of the glycocalyx can determine whether a particular person in need of a heart, liver, or kidney transplant can receive an organ from a particular donor.
Indeed, cell surface glycosylation (i.e., the process by which cells create and display their glycocalyx) is as important to understanding life as is the genetic code, yet our understanding of the information contained in glycosylation is rudimentary at best. In large part this lack of knowledge results from two factors: (1) the remarkable structural complexity of glycans found on cell surfaces and (2) a lack of tools for deciphering glycosylation patterns. Glycans thus got “left behind” in the initial phase of the modern revolution in molecular and cellular biology, resulting in a generation of scientists who may be largely unfamiliar with and untrained in the study of these key molecules of life.
FIGURE 1-1 Glycans are significant components on biological surfaces and as parts of biological molecules. Top, Image of a red blood cell showing the glycocalyx extending from the membrane surface. SOURCE: Voet and Voet 2010, used with permission. Bottom, Scale model of a protein showing the relative sizes of the N-linked glycans and GPI-anchors that are attached to it. SOURCE: Varki et al. 2009, used with permission.
ABO Blood Groups
One of the most familiar ways in which the glycan information of a cell influences phenotype is the ABO blood grouping, which is a significant factor in determining which blood transfusions can be carried out. With rare exceptions, human red blood cells contain on their surfaces a core carbohydrate sequence (called the “H antigen”). The familiar ABO blood types derive from further modifications to this H carbohydrate chain. In the genome, the locus that determines ABO type encodes a glycosyltransferase. Different variants of this enzyme either are non-functional and therefore do not alter the H carbohydrate (type O) or add slightly different sugars to it (type A and type B; see image). Because a person inherits one copy of this locus from each parent, the four possible blood types are O, A, B, and AB. Immune antibodies can form against the types of sugar chains that an individual does not have on his or her red blood cells. Thus, a person with type O blood may form anti-A and anti-B antibodies that prevent him or her from successfully receiving blood from anyone other than a similar type O donor. On the other hand, a person with both type A and type B carbohydrate chains will not form antibodies against either and can receive blood from any ABO source. As a caveat, it is important to recognize that the ABO system is not the only factor that determines transfusion acceptance and thus the above description is not absolute. For example, humans also have red blood cell proteins, such as Rh factor, that influence transfusion acceptance. However, the ABO system helps illustrate how small differences in glycans translate to practical, physiological differences. The possibility of modifying the surface glycans on red blood cells to avoid ABO incompatibilities is also being explored (Olsson and Clausen 2008; Liu et al. 2007).
Representation of ABO sugars on red blood cells. SOURCE: Varki et al. 2009, used with permission.
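The ABO rules described in this box can be sketched in a few lines of Python. This is an illustrative simplification, not a clinical tool: the function names (`blood_type`, `antigens_of`, `can_receive`) and the single-letter allele encoding are assumptions introduced here, and the sketch deliberately ignores Rh factor and every other blood group system mentioned in the caveat above.

```python
# Simplified ABO model based on the description in this box.
# Assumption: alleles are written "A", "B", or "O". Rh factor and all other
# blood group systems are ignored, so this is not a clinical compatibility check.

VALID_ALLELES = {"A", "B", "O"}


def blood_type(allele_1, allele_2):
    """Return the ABO type ("A", "B", "AB", or "O") for a pair of inherited alleles."""
    for allele in (allele_1, allele_2):
        if allele not in VALID_ALLELES:
            raise ValueError(f"unknown allele: {allele!r}")
    # The O variant is non-functional, so only A and B add sugars to the H antigen.
    antigens = {allele_1, allele_2} - {"O"}
    return "".join(sorted(antigens)) or "O"


def antigens_of(abo_type):
    """Return the set of A/B sugar modifications displayed by a given ABO type."""
    return set(abo_type) - {"O"}


def can_receive(recipient, donor):
    """True if the donor cells carry no A/B antigen that the recipient lacks."""
    # Antibodies can form only against sugar chains the recipient does not have.
    return antigens_of(donor) <= antigens_of(recipient)


if __name__ == "__main__":
    for pair in [("A", "O"), ("A", "B"), ("O", "O"), ("B", "B")]:
        print(pair, "->", blood_type(*pair))
    print("O receives from A:", can_receive("O", "A"))
    print("AB receives from O:", can_receive("AB", "O"))
```

Modeling each blood type as a set of displayed antigens makes the transfusion rule a simple subset test, which mirrors the statement that antibodies form against sugar chains an individual does not carry.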
The complexity and high information content of glycans result from the many ways in which they can be assembled from simple sugar building blocks. This is in contrast to the simple ways that the building blocks of proteins and nucleic acids (amino acids and nucleotides, respectively) are linked together. Protein and nucleic acid biopolymers are linear, and every building block is linked to the next through the same kind of connection. By contrast, sugar building blocks can be linked together at many different sites and in different spatial orientations (i.e., stereochemistries), creating both linear and branched polymers with a wide variety of shapes (see Figure 1-2). Because each added sugar multiplies the number of possible connection sites and orientations, the complexity of glycans increases rapidly with chain length. This diversity not only gives rise to many important and interesting biological functions and chemical properties but also creates challenges for synthesis, purification, and characterization; these structure elucidation challenges are discussed in detail later in this report.
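A rough counting exercise can make this growth in complexity concrete. The Python sketch below counts only linear chains and rests on stated assumptions: the alphabet size of 4 building blocks, and the choice of 4 attachment sites times 2 orientations per sugar-sugar link, are hypothetical round numbers picked for illustration, not values taken from this report. Branching, which glycans also allow, is ignored, so the glycan figure is an undercount.

```python
def count_linear_chains(length, building_blocks, linkage_choices=1):
    """Count distinct linear chains of a given length.

    building_blocks: number of monomer types available at each position.
    linkage_choices: number of distinct ways two neighbors can be joined
                     (1 for proteins and nucleic acids, more for sugars).
    """
    if length < 1 or building_blocks < 1 or linkage_choices < 1:
        raise ValueError("all arguments must be positive integers")
    # One monomer choice per position, one linkage choice per bond.
    return building_blocks ** length * linkage_choices ** (length - 1)


if __name__ == "__main__":
    BLOCKS = 4                # hypothetical alphabet size, same for both cases
    GLYCAN_LINKAGES = 4 * 2   # hypothetical: 4 attachment sites x 2 orientations
    for n in range(1, 7):
        uniform = count_linear_chains(n, BLOCKS)
        glycan = count_linear_chains(n, BLOCKS, GLYCAN_LINKAGES)
        print(f"length {n}: single-linkage chains = {uniform}, "
              f"multi-linkage chains = {glycan}")
```

Holding the alphabet size fixed isolates the effect of linkage diversity: the ratio between the two counts grows by a factor of `linkage_choices` with every added residue.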
The tools available today for fully characterizing the complex structures of glycans at low levels are mostly destructive, making it largely impossible to follow the changes in glycosylation that occur on a cell’s surface over time. In addition, the diversity of glycan structures makes full characterization of the cell surface glycome (i.e., the totality of glycans with which a cell is coated) an incredible challenge, one beyond the capabilities of current technology. Today, it is possible to obtain only a general idea of the composition of the glycocalyx or cell wall, rather than a detailed molecular-level description. Yet these surface glycans are essential to both understanding and treating many diseases. The pattern of sugars on a cell causes pathogens (viruses and bacteria) to attack certain cell types. Many bacteria and viruses recognize specific sugars on particular cell types. In turn, a person’s immune system generates antibodies to these invaders based largely on the glycans on these pathogens. Adding complexity, many pathogens carry out molecular mimicry of host glycans in order to evade immune responses. In addition, there is growing evidence that the glycans on cancer cells differ from those on normal cells, presenting a promising opportunity for diagnosis, imaging, and therapy. In addition to their roles on cell surfaces, glycans play important roles in biological communication and signaling (see Box 1-4).
FIGURE 1-2 Comparison of nucleic acids, proteins, and glycans. A, glycan; B, nucleic acid; C, protein.
In the area of energy, sugars play an increasingly important role as scientific innovations drive advances in developing energy sources that will be renewable and contribute less to global climate change. Complex glycans, such as the starches and cellulose in plant cell walls (referred to as biomass), are Earth’s primary storage location for the products of fixation of carbon into molecules via photosynthesis. These glycans are being exploited as renewable sources of liquid biofuels, such as ethanol. As described above, these materials ultimately can trace their energy content to the sun, so they can be thought of as a form of solar energy—and just as renewable. The challenge is to efficiently harvest the energy contained in the large amount of glycans produced by plants.
Glycoscience is uniquely poised to make significant contributions to this need. The polysaccharide components of the insoluble cell walls include cellulose, hemicelluloses, and pectins, which are polymers of sugars that are sometimes linear (cellulose) and sometimes branched (hemicelluloses and pectins). These walls have a generalized global structure, with cellulose embedded in a matrix of other molecules, although the fine details of wall structure differ across plant species, across different plant tissues and organs, and indeed across walls in single cells. A major challenge to plant glycoscientists is to understand how these cell wall components are biosynthesized and how they are put together with lignin to form insoluble plant biomass, as well as how to manipulate and break down biomass more effectively in order to release the sugars for development into fuels.
Glycans can also be used as important materials (for example, as gelling agents in foods) and as a renewable resource for high-value chemicals, plastics, and pharmaceuticals. Wood, composed of lignocelluloses, is a major building material and is used in myriad applications. Other materials, such as most plastics, are derived primarily from petroleum. Glycans can play an important role either as a starting material for the same types of feedstocks that are presently obtained from petroleum or as alternative materials that can be converted directly into plastics with similar or even superior properties to those of today’s synthetic materials. As the ability to engineer polysaccharides and tailor their chemical structures and properties advances, the capacity to design new biochemicals and materials with properties that are unachievable today also will greatly expand.
Glycan Signaling in Nitrogen Fixation
Nitrogen is an essential element in biological systems and is a key component of proteins and other molecules. To be usable by most organisms, however, the nitrogen available in the atmosphere must first be fixed or converted into ammonium. Before the development of chemical fertilizers, all nitrogen fixation occurred biologically through the action of bacteria capable of undertaking these reactions. Biological nitrogen fixation remains a significant source of bioavailable nitrogen. Although several types of bacteria can fix nitrogen, one important example is the symbiotic relationship that exists between species of Rhizobia bacteria and the roots of legumes. Chemical signals (flavonoids) released by plant roots activate Nod genes in the bacteria. Turning on these genes leads to the production and release of a glycoconjugate called Nod factor that binds to receptors on plant root cells, leading to changes such as nodule formation and the ability of the bacteria to enter the root. Inside the root nodule the bacteria carry out the nitrogen fixing reaction. The symbiotic process depends on communication between bacteria and plant root through the Nod factor, which is an acylated chitin oligosaccharide molecule that includes lipid and carbohydrate components. This familiar example highlights one of the many ways in which glycans play key roles in biological signaling.
Communication between plant and bacteria during the process of nitrogen fixation. SOURCE: http://www.glycoforum.gr.jp/science/word/saccharide/SA-A02E.html; accessed June 12, 2012.
The current view of information flow in biological systems starts with the nucleic acid genome, which codes for proteins that function as parts of networks and whose own roles are still being actively studied. After proteins have been assembled, they are nearly always modified, a process generically called posttranslational modification. The terminal stage in this information flow is often the addition of glycans to proteins (glycosylation), which modulates the proteins’ activity. One way of looking at this process is that the instructions in the genome encode the properties that will ultimately be observable in an organism (phenotype), whereas the proteome predicts the phenotype. The glycome, however, is the phenotype. The system can also be compared to a switchboard, with the sugars being the “on” and “off” switches or adjustable dials that modulate the functions of glycoproteins and other molecules and help control the activity of the network. Beyond this digital view of biology, glycans also serve major analog functions, tuning the functions of glycoproteins and other molecules, as well as of metabolic circuits and networks, across a continuous range. Working backward to understand biological systems will require starting with glycobiology, just as working forward requires starting with genomics.
Unlike those of nucleic acids and proteins, the structures of glycans are not “hard-wired” in the genome. Because of the multiple linkages that sugars can engage in that produce isomers and branching patterns, glycan structures cannot accurately be described as simple linear sequences of building blocks. Rather, a glycan’s most basic structure must be described in three dimensions. Because glycan structures are not template encoded, they are plastic, reflecting myriad factors determined by cellular metabolism, cell type, developmental stage, nutrient availability, other cues from the cell’s environment (Rudd and Dwek 1997; Varki et al. 2009), and stochastic events. As a result, the potential information content of glycosylation is far greater than for all the other types of posttranslational protein modifications combined. It is precisely this enormous diversity and plasticity that are critical to the many biological functions of glycans, particularly their modulation of glycoprotein activity or localization and their roles in mediating cell-cell or cell-matrix interactions that are key to both normal physiological development and diseases such as cancer.
Today, the glycoscience field is at a place similar to where genetics was at the conception of the Human Genome Project. At that time there was enough of an understanding of genetics to know that a concerted effort to sequence the human genome would lead to both fundamental advances in our understanding of genetics and practical applications that would benefit all fields of science. When this enormous effort began in the 1990s, many scientists questioned if it was even feasible to sequence the 3 billion bases in a human genome. Ten years and $2 billion later, the Human Genome Project not only had sequenced a single human genome but had also spawned a technological revolution that today makes it possible to sequence a human genome in only a week at a cost of $1,000. Similarly, the cost of identifying a single nucleotide polymorphism (SNP), a commonly used marker for genetic traits such as disease, fell from $1 per SNP to $0.004 per SNP, opening the door to a wide range of biological questions inconceivable even 10 years ago.
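The scale of these cost reductions is easier to appreciate as fold changes. The short Python sketch below works only from the dollar figures quoted in the paragraph above; the helper name `fold_reduction` is introduced here for illustration, and exact fractions are used so that decimal prices such as $0.004 do not introduce rounding noise.

```python
from fractions import Fraction


def fold_reduction(cost_before, cost_after):
    """Return how many times cheaper cost_after is than cost_before (exact)."""
    return Fraction(cost_before) / Fraction(cost_after)


if __name__ == "__main__":
    # Figures quoted in the text: $1 -> $0.004 per SNP, $2 billion -> $1,000 per genome.
    snp = fold_reduction("1", "0.004")
    genome = fold_reduction(2_000_000_000, 1_000)
    print(f"Cost per SNP fell {int(snp):,}-fold")
    print(f"Cost per genome fell {int(genome):,}-fold")
```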
Another impact of the Human Genome Project has been the democratization of genomics. The result is a revolution in our understanding of genetics that spans the simplest single-celled organisms to the characterization of human variation and disease. Sequencing instruments used to be huge and expensive, and, as a result, sequencing was done only at regional centers. Today, sequencing instruments can sit on a benchtop in any laboratory. Now, any laboratory can get DNA sequenced; computer programs can predict structures from sequences for DNA, RNA, and proteins; and DNA or RNA can be ordered online and delivered the next day.
How did all of this happen in such a short period of time? The transformation of genomics, and the generation of an entire new industry, started with the research community issuing a grand challenge that was a huge leap, something beyond any technical capability available at the time. In the end, the tools that were developed to meet this grand challenge now enable and drive the science. The tools of genomics have democratized the field in such a way that thousands of laboratories are now able to ask and address questions that were previously the realm of only a few specialized facilities. Any scientist interested in getting sequence information can do so. Today, because of incredible success at developing sequencing tools, the real cost of sequencing a genome is dominated by informatics, not by the physical process of sequencing. Making sense of genomic data costs far more than acquiring the data.
Glycoscience needs to similarly catalyze its transformation from the realm of a few specialists to a core science practiced by many. To accomplish this transformation, new technologies are needed to thoroughly characterize glycomolecules and synthesize them. Both genomics and proteomics have methods for automated synthesis, sequencing, and amplification. The emerging field of glycomics does not. There are large libraries of genes and proteins available for study but only small libraries of glycans and glycoconjugates. Genes and proteins are easy to manipulate genetically, whereas glycans and glycoconjugates are not. Finally, the number of enzymes available for manipulating genes and proteins is far larger than the number of glycosidases and glycosyltransferases available. Learning from the experience of genomics, glycomics will need many new and sophisticated informatics solutions to stay abreast of technological developments and avoid the bottlenecks that now limit the advances that come from modern genomics and proteomics.
To fully understand the workings of living organisms and to fully realize the promise of genomics and proteomics, it will be imperative that science now turn its efforts to deciphering the complexity of glycomics. Unless attention is paid to glycans, a major component of biology will be missed. Glycoscience cannot be overlooked. Without a better understanding of the glycome, a clear understanding of cancer, infectious diseases, and the immune response will not be possible. Glycoscience knowledge will be similarly needed in the exploration of improved biofuels and alternative sources of carbohydrate-based energy and in the development of carbohydrate-based materials with functional new properties. It will not be possible to take full advantage of the revolution in genomics and realize the full potential of the Human Genome Project unless close attention is given to glycomics and how cells make and use the myriad complex glycans that decorate their surfaces. At the same time, advances in genomics resulting from the Human Genome Project provide a major opportunity to understand how mutations alter glycan pathways with functional consequences. Indeed, the time is right for the glycoscience community to initiate an undertaking that leads those conducting biological studies to seriously consider incorporating glycoscience into their work.
Several recent advances make this the right time to examine challenges and opportunities in glycoscience and outline a possible roadmap forward. In health, for example, changes in glycosylation are common in tumor cells and specific glycans have been identified as biomarkers for a variety of cancers (Adamczyk et al. 2012). In some cases, this information is being combined with array technologies to provide a base from which to explore key questions in cancer biology. Do particular glycosylation changes play a role in cancer outcome? Which glycans can serve as the most effective biomarkers for different stages and different types of cancer?
In 2011, the U.S. Department of Energy released an update to the Billion-Ton Study, which re-emphasized the significance of biomass feedstocks from non-food crops for energy and materials (DOE 2011). Many of the energy-rich, non-food crops require the conversion of recalcitrant cellulose into useful chemical precursors. Discoveries in the biological pathways by which plant cell walls are synthesized and deconstructed are similarly providing a compelling base from which to further advance the applications of glycoscience to these fields.
Just as studies of nucleic acids and proteins rely on a suite of tools that allow a broad range of researchers to effectively investigate these molecules, so too does glycoscience rely on its own toolkit. Over the past decade, developments in synthetic and analytical methods such as glycan microarrays have enabled high-throughput analysis of the interactions of glycans with proteins, lipids, and other glycan molecules (Rillahan and Paulson 2011). These data are increasingly being combined into glycan databases, to share and aggregate research results within the glycoscience community (Frank and Schloissnig 2010).
Genomics and proteomics have advanced rapidly. Glycoscience and glycomics also have made strides in enabling scientists to understand the role that glycans play in biological systems. Glycoscience researchers have been developing a fundamental knowledge base that can be utilized to help address many of today’s major research problems. This knowledge base, when combined with the current set of available tools to probe glycan structure and function, is a powerful resource to better understand human, plant, and microbial biology.
Glycoscience has, until recently, been explored by only a small group of experts, working with more limited information and resources than are available in fields such as genomics and proteomics. What is known about glycoscience and glycomics, the study of the complete set of glycans in an organism, is still incomplete. But the knowledge now available makes it possible to integrate glycoscience broadly into the fields of human health, energy, and materials science, and the set of tools, while not perfect, provides a base to enable further development and discovery.
Recognizing that glycoscience presents a frontier for discoveries across many fields, the National Institutes of Health, Food and Drug Administration, U.S. Department of Energy, and National Science Foundation asked the National Research Council to convene a committee to explore advances in glycoscience and challenges that must be overcome to move the field forward. The committee was also tasked with articulating a roadmap and a vision for future development of the field (see Box 1-5).
Statement of Task
The National Research Council of the National Academy of Sciences will convene an ad hoc committee to assess the importance and impact of glycoscience and glycomics. Glycoscience is the confluence of scientific disciplines that study complex glycans and their relationships to other molecules. Glycans are involved in all phases of life, and an improved understanding could significantly impact diverse sectors of society, including health and energy. While genomics and proteomics have produced unparalleled discoveries that have advanced the understanding of biological processes, the picture these present is incomplete. Glycoscience and glycomics, the systematic analysis and characterization of the structure and function of glycans synthesized by a cell, tissue, or organism, could be a critical next step in building on genomics and proteomics, linking gene function to an observed phenotype, and decoding the molecular makeup of an organism.
In order to realize the potential of glycoscience and glycomics to build on genomics and proteomics and forge major new roads of discovery, the National Research Council of the National Academy of Sciences will convene an ad hoc committee to:
- Conduct an in-depth analysis of the current state of research in glycoscience and glycomics in the U.S.;
- Compare current U.S. and international research efforts in glycoscience;
- Discuss key challenges to the growth and development of the field of glycoscience and glycomics;
- Develop a roadmap with concrete research goals to significantly advance glycoscience and glycomics in the U.S., including the identification of metrics that may be used to help assess efforts to achieve these goals and objectives; and
- Articulate a unified vision for the field of glycoscience and glycomics.
The ad hoc committee will conduct workshops and other data-gathering activities to inform its findings and conclusions, which will be provided in the form of a consensus report.
The committee deliberated at three in-person meetings and held numerous teleconferences to address its charge and produce the present report. In addition, the committee convened the Workshop on the Future of Glycoscience in January 2012, which brought together approximately 75 glycoscientists and scientific thought leaders with expertise in biology, chemistry, and materials science to discuss the field and its opportunities and needs. The workshop agenda and participant list are provided in Appendix C. The committee also solicited input from the broader scientific community through its public website, which included several questions to inform the study process. These questions are provided in Appendix D, along with further information on the feedback received and the individuals who shared their thoughts with the committee. This report does not focus on the roles of carbohydrates as food sources and nutritional supplements. Although these are important areas to be explored, they were outside the scope of the committee’s study and outside the expertise of the committee’s members.
Chapter 2 discusses current glycoscience research efforts in the United States and worldwide. This general baseline helps inform the rest of the report, which lays out a vision for the future of the field. The chapter provides a brief overview of key messages arising from the committee’s data gathering, with further details and examples included in Appendix B. In Chapter 3 the committee discusses how glycoscience is embedded in the key areas of health, energy, and materials science, areas that help illustrate the breadth and impact of glycoscience as a discipline. In Chapter 4 the committee poses a set of scientific questions and opportunities designed to illustrate more concretely how new glycoscience knowledge would contribute to answering relevant scientific questions in these fields. These questions are not meant to be comprehensive but rather to provide examples of scientific challenges that, if solved, would yield important basic and applied knowledge. Chapter 5 considers the toolkit for glycoscience in such areas as synthesis, analysis, and informatics. These tools are integral to studying glycoscience and will be needed to successfully address the types of challenges described previously. Finally, Chapter 6 presents the committee’s conclusions and recommendations. In conjunction with each recommendation, the committee suggests several 5- and 10-year goals whose accomplishment would significantly advance the field. Together, these goals constitute a roadmap to help enable glycoscience to forge new roads of discovery.
The introductory and concluding chapters of this report are written with a general audience in mind. Chapters 3 and 4, which delve more deeply into the myriad ways that glycans contribute to the three focus areas of health, energy, and materials, presume a basic level of scientific familiarity, although of necessity they do not cover each topic in detail. Chapter 5, which describes the current scientific toolkit for studying glycans, is written largely for the scientific community and for those who have primary responsibility for shaping research programs and directions. The committee’s assessment of this toolkit and of the needs and gaps remaining to advance the field is encapsulated in the report’s concluding chapter, which lays out a glycoscience roadmap and research goals. Appendixes to the report contain committee member biographies (Appendix A) and additional information on the committee’s data-gathering efforts (Appendixes B, C, and D). A glossary of terms also is included (Appendix E).