As this report highlights, glycans are critical players in virtually every biochemical process that makes life possible on Earth, yet compared to the study of nucleic acids and proteins, glycoscience is in its infancy. The reasons that this leg of the four-legged stool of life (nucleic acids, proteins, lipids, and carbohydrates) is underdeveloped are many, but they largely come down to four factors: the astonishing complexity of glycan structures and biochemistry, which dwarfs that of both nucleic acids and proteins; the relative lack of tools to probe and understand glycan structure and biochemistry in light of that complexity; the lack of adequate data resources and informatics tools for their analysis; and the lack of education in glycoscience.
Today, glycoscience sits at a crossroads. It can continue to be a specialty practiced by a smattering of investigators studying specific problems. If the field stays on that path, it will surely continue to generate important insights but not at a rate that will allow it to live up to its enormous potential and, as this report has already discussed, to address major problems in human health, energy, and materials development. Or glycoscience can take advantage of advances in genomics and proteomics, as well as such fields as chemical synthesis, microbiology, microfluidics, biochemistry, and nanotechnology, to not only enable a more aggressive and comprehensive effort involving a large cadre of researchers but also to provide a blueprint for realizing the transformative leaps in technology and methods that make such an effort feasible and likely to generate enormous benefits to society.
The previous chapter discusses the importance of glycoscience as it relates to the areas of health, energy, and materials science. This chapter describes a series of scientific questions in which glycoscience plays a central role. As in previous chapters, this list is not meant to be exhaustive but is meant to stimulate discussion and highlight the scale of progress that could be made in addressing a broad set of questions embraced not only by glycoscientists but also by the larger science and technology community. Although many of the sections pose questions immediately related to human health, addressing the core issues they raise can have relevance to other areas. Included are examples from the fields of energy and materials, and the committee invites the broader community to develop and embrace other possible challenges important to those subjects.
It is well accepted that “nothing in biology makes sense, except in the light of evolution.” But when it comes to glycans, very little is known about glycan diversification during evolution. Over three billion years of evolution has failed to generate any kind of living cell that is not covered with a dense and complex array of glycans. Why and how has evolution led to this diverse array of glycans, and what are some of their roles, for example, as determinants of host-pathogen recognition? Glycans on host cells may be targets for pathogen recognition, and glycans can undergo subtle changes that may allow evasion from a pathogen, even while preserving sufficient intrinsic function. Glycans appear to remain a preferred class of molecules for the cell surface given their tolerance of such subtle changes, while proteins appear to be somewhat less tolerant of sequence changes, which more frequently result in loss of structure or function. In some cases these changes can result in glycan polymorphisms in the population or even wholesale elimination of specific types of glycans from certain taxa. Once a type of glycan diversification has occurred in a particular species, it could then be recruited for other intrinsic functions, some of which may remain noncritical and some of which may become essential. Thus, glycan diversity may involve continuous evolutionary adaptation and diversification for the generation of intrinsic functions as well as co-evolution through interactions with pathogens and symbionts. Remarkably little is known about the evolutionary diversity of glycans in nature.
One of the confounding features of glycoproteins is the incredible diversity of specific molecular species that can exist even in a single cell. For each site on a protein that can be glycosylated, and often there are multiple sites, the number of glycans that can be attached can be large. Indeed, research suggests that cells create different glycoforms for a given protein as an important means of modulating the properties of that protein and its interactions with other biomolecules (Varki 1993). Glycan diversity as expressed in glycoforms may, in fact, help explain human complexity (Varki 2006; Bishop and Gagneux 2007).
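The combinatorial argument in the preceding paragraph can be made concrete with a small sketch. It assumes, purely for illustration, that each glycosylation site is occupied independently of the others and can carry any one of a fixed number of glycan structures or remain unoccupied; under that assumption the number of distinct glycoforms is the product of the per-site options. The function name and the site counts below are hypothetical and are not taken from any measured protein.

```python
from math import prod


def count_glycoforms(glycans_per_site, allow_unoccupied=True):
    """Count the distinct glycoforms of one protein.

    glycans_per_site: number of distinct glycan structures that can occupy
        each glycosylation site (hypothetical values, one entry per site).
    allow_unoccupied: if True, each site may also carry no glycan at all.

    Assumes sites are occupied independently, which real cells need not obey.
    """
    extra = 1 if allow_unoccupied else 0
    return prod(n + extra for n in glycans_per_site)


# Hypothetical protein with three sites, ten candidate glycans at each site.
example_sites = [10, 10, 10]
print(count_glycoforms(example_sites))                          # 11 * 11 * 11 = 1331
print(count_glycoforms(example_sites, allow_unoccupied=False))  # 10 * 10 * 10 = 1000
```

Even these modest illustrative numbers give more than a thousand molecular species for a single polypeptide, which is the sense in which the diversity of glycoforms in one cell can be large.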
The factors that govern glycan diversity pose significant challenges to the isolation, structural characterization, and synthesis of single glycoforms. The structural diversity of glycans arises from the various linkage combinations between the monosaccharides that make up glycans, and those linkages are determined by an array of more than 250 enzymes in the human secretory pathway that support glycan synthesis and processing, including a suite of glycosyltransferases that add sugars using activated sugar donors and glycosidases that cleave them (Ohtsubo and Marth 2006; Varki 2006). In this network, competing or overlapping substrate and donor specificities, substrate availability, and varying levels of enzyme expression, activation, and localization along the secretory pathway all contribute to functionally significant glycan heterogeneity (Lowe and Marth 2003). While such heterogeneous mixtures serve a purpose in biological systems, they are inadequate for the important task of establishing how the molecular architectures of glycans convey specific biological properties.
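How quickly linkage combinations multiply can be illustrated by enumerating disaccharides only. The sketch below is a deliberate simplification: it assumes the donor sugar is joined through its anomeric carbon (C1) in either the α or β configuration to a hydroxyl at position 2, 3, 4, or 6 of a hexopyranose acceptor, and it ignores 1-1 linkages, ring-size variants, branching, and further modifications. The function name and the choice of three hexoses are illustrative, and the enumeration says nothing about which linkages human enzymes actually make.

```python
from itertools import product

ANOMERS = ("α", "β")               # configuration at the donor's anomeric carbon
ACCEPTOR_POSITIONS = (2, 3, 4, 6)  # acceptor hydroxyls assumed available (hexopyranose)


def disaccharide_linkages(donors, acceptors):
    """List every donor(anomer 1-position)acceptor combination as a string."""
    return [
        f"{donor}({anomer}1-{position}){acceptor}"
        for donor, acceptor, anomer, position in product(
            donors, acceptors, ANOMERS, ACCEPTOR_POSITIONS
        )
    ]


hexoses = ["Glc", "Gal", "Man"]  # glucose, galactose, mannose
isomers = disaccharide_linkages(hexoses, hexoses)
print(len(isomers))  # 3 donors * 3 acceptors * 2 anomers * 4 positions = 72
print(isomers[0])    # Glc(α1-2)Glc
```

Two amino acids, by contrast, can be joined by a peptide bond in only one way for a given order, which is part of why glycan structural complexity outpaces that of linear biopolymers.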
Access to homogeneous glycoproteins is necessary to first determine the molecular details of a glycan’s function and then to produce those glycoforms with the desired properties. Today, it is possible to isolate relatively simple homogeneous glycoforms using enzymatic trimming of glycoproteins (Schmaltz et al. 2011), but synthesizing more complex homogeneous glycoforms requires the development of more sophisticated methods for the chemoenzymatic manipulation of glycoproteins as well as new techniques for synthesizing them de novo. Establishment of single glycoform synthesis has been critical to the identification of glycoforms with distinct properties, such as enhanced glycoprotein stability (Hanson et al. 2009; Price et al. 2010; Culyba et al. 2011), altered binding or immunogenic properties (Dwek 1996), and increased therapeutic efficacy (Arnold et al. 2007; Jefferis 2009), to name a few. The identification of desired glycoforms will also spur large-scale production needs. For example, glycoproteins with increased α-2-3 sialylation of N-glycans have longer serum half-lives (Stockert 1995; Walsh and Jefferis 2006), and monoclonal antibodies lacking core α-1,6-fucosylation on an N-glycan in a conserved Fc region have higher therapeutic antibody-dependent cellular cytotoxicity (Satoh et al. 2006). Unfortunately, recombinant expression in cells is the only practical method for large-scale production of glycoproteins, and this process results in heterogeneity that does not maximize glycoform efficacy (Sethuraman and Stadheim 2006). Viable routes to targeted human glycoforms by recombinant means will require advances in pathway engineering, a field that is quickly emerging. There have been some recent advances in the use of recombinant glycoprotein synthesis in industry, but widespread use has yet to be achieved.
A challenge, then, is to develop new synthetic, chemoenzymatic, and whole-cell routes to create single glycoforms in order to facilitate the ongoing process of better understanding glycan function and producing active glycoforms (Koeller and Wong 2001). In particular, the development of new chemical technologies for de novo glycoprotein synthesis will have the advantage of affording precise chemical control of many aspects of glycoprotein structures, including site-specific incorporation of the glycan into the protein component. At the same time, novel enzymatic and recombinant methods would augment chemical approaches and perhaps be useful for creating homogeneous glycoforms by remodeling heterogeneous glycoprotein populations from natural sources.
Not long after the discovery that most cell surface and secreted proteins are modified by covalently linked glycan moieties, it became obvious that these glycan structures on glycoproteins are extremely diverse. Structural heterogeneity is the general rule even when characterizing the glycans at a single, well-defined glycosylation site on an identified mature protein produced by a uniform population of cells. Similarly, the same protein expressed in two different cell types is often modified by different ensembles of glycans. A central hypothesis of glycobiology is that cellular control over glycoprotein microheterogeneity allows for precise regulation and diversification of function, not unlike the manner in which splice variants of a transcript impart greater flexibility to gene function. However, unlike the characterization of splice diversity, the experimental techniques that would allow assignment of specific functions to identified glycans on individual proteins are still in their infancy. Robust technologies are needed to be able to determine site-specific glycosylation of proteins in complex mixtures. Furthermore, integrated analytic and biological technologies will be required that can make correlations between glycoprotein glycoforms, recognition by glycan-binding proteins, regulatory factors, and biological functions. Comparing the resulting glycointeractomes in specific physiological or disease contexts will provide an understanding of the molecular mechanisms that control the formation of glycoprotein glycan microheterogeneity, offer insight into the ways in which glycans mediate specific functions, and present unique opportunities for rational development of therapeutic strategies for a wide range of diseases.
The majority of proteins in mammals are glycosylated, and carbohydrate components play essential roles in development, in the immune response, in intercellular communications that may be defective (e.g., in cancer cells), in inflammatory responses, and in many other biological functions. Yet currently, proteins are frequently expressed without carbohydrates, and the effects of glycans on the structure and function of glycoproteins are avoided for expediency. What are the three-dimensional structures of intact glycoproteins? How does the carbohydrate affect the three-dimensional structure? Because the role of carbohydrates is often essential to their function, what techniques need to be developed to determine their three-dimensional structures, inclusive of the carbohydrate components themselves? For all glycans their overall structures ultimately determine their biological functions. When they bind to a specific protein, the three-dimensional conformation of structures known as determinants that have three to six monosaccharide units is involved, and sometimes the multivalent presentation of determinants is important in the strength and specificity of the glycan-protein interaction. It is also clear that the population of glycoforms produced by a cell is key to understanding their overall biological roles in modulating the function of the peptide. Thus, an ongoing challenge is to develop the tools necessary to enable robust and accurate three-dimensional structures of different defined glycoforms of glycoproteins to be determined at the atomic level.
In the early 1980s, researchers made the surprise discovery that nuclear and cytoplasmic proteins are dynamically modified at their serine and threonine residues by an N-acetylated amino sugar (O-GlcNAc) derived from glucose via the hexosamine biosynthetic pathway. While it is becoming clear that O-GlcNAc is essential to life and plays an important role in diabetes and neurodegenerative diseases, little is known about how it functions at the molecular or metabolic level. Fundamental knowledge about O-GlcNAcylation is essential not only to understanding chronic diseases, such as diabetes and Alzheimer’s disease, but also to understanding fundamental cellular physiology. O-GlcNAc glycosylation is nearly as abundant as protein phosphorylation, and the two have extensive cross talk between them to regulate signaling and transcription in response to nutrients and stress. O-GlcNAc also modifies cytoskeleton proteins and the contractile machinery in cells and muscles. Key unanswered questions include:
- Why is nearly every protein involved in transcription extensively O-GlcNAcylated?
- How does O-GlcNAcylation regulate gene expression?
- How is cellular physiology regulated by the interplay between phosphorylation and O-GlcNAcylation and by the cross talk with other protein modifications, such as ubiquitylation?
- Current systems biology approaches vastly underestimate the complexity of signaling. To what extent are the human kinome expression and activity regulated by O-GlcNAcylation?
- How are ribosome biogenesis and protein translation regulated by O-GlcNAcylation?
- How does O-GlcNAcylation regulate cytoskeletal and contractile protein functions?
- How are the O-GlcNAc cycling enzymes regulated?
- Currently, the lack of facile tools to study O-GlcNAc is the greatest impediment to understanding the roles of O-GlcNAcylation in cellular physiology and disease. For example, the development of a large number of O-GlcNAc-dependent site-specific antibodies (such as are now available for phosphosites), the production of better inhibitors of the enzymes that control O-GlcNAc cycling, and the invention of imaging methods to visualize O-GlcNAc dynamics in a living cell would dramatically propel this field forward.
The surface of all cells in nature is decorated with a dense, complex, cell-type and tissue-specific array of glycan structures, known as the glycocalyx or, in the case of plants or fungi, the cell wall. Many advances have been made in understanding the structure of cell surface glycans by releasing, purifying, and fractionating them and studying their structure in detail, an approach often called glycomics. While yielding extremely valuable and important information, this approach amounts to cutting down all the trees in the Amazon jungle and separating and identifying the trees by individual structure and type. This approach does limited justice to the intact forest that is the glycocalyx or cell wall.
Evidence to date indicates that the glycans present on cell surfaces are not always available for recognition by glycan-binding proteins in the same manner as they might be when isolated from one another in a glycan array. Rather, these glycans are present in complex mixtures, interacting with each other in various ways, involving glycan-glycan and glycan-protein interactions. In some instances the nature of these interactions has been recognized as functionally important. In other cases the recognition of a defined structure on a cell surface can vary enormously depending on the nature of the other glycans on the same cell surface. For example, three different glycan-binding proteins that recognize α-2-6-linked sialic acids and display identical binding patterns on glycan arrays show remarkably different patterns when binding to the surface of red blood cells. Moreover, these patterns are influenced by the ABO blood group status of the cell, which is determined by a different neutral glycan that does not even have sialic acid on it (Cohen and Varki 2010). Evidence such as this indicates that glycans present on the surface of cells can form clustered saccharide patches that are unique entities distinct from the individual glycans themselves. Current evidence also suggests that cell surface glycans regulate the lateral arrangements and associations of receptors on cell surfaces. The glycocalyx even regulates the local concentrations of cations and other small molecules. Today, evidence of such cell surface organization is mostly inferential, but in the future methods might be modified or new methods developed to try to probe such cell surface structures.
There are many cases in which the specific structures of carbohydrates on a single differentiated cell are desired, often because that cell resides among composite groups of cells in tissues made up of many cell types. Such is the case in many cancers in which tumor cells are mixed with normal cells, or in type 1 diabetes where islet beta cells are only one cell type among the islets, which are only a small component among the pancreatic exocrine cells. How can the glycans, glycoproteins, and other glycoconjugates be determined on a single cell? If techniques were available for investigators to determine the glycomes and glycoproteomes of single cells rapidly and confidently, how would this revolutionize our understanding of their roles in various developmental, immunological, pathological, and structural processes? How dramatically would this deepen our understanding of important medical conditions, some currently thought of as incurable or involving difficult treatments, such as diabetes, cancer, autoimmune diseases, or drug-resistant infections? Currently available techniques to solve the structures of glycans largely arose out of analytical chemistry, yet they fall far short of providing the sensitivity required to analyze a single cell. A key challenge is to be able to rapidly catalog all of the glycans and glycoproteins on a single cell, whether it is microbial, plant, or mammalian. Many novel technologies will be important to develop in order to sensitively separate the many different glycans and glycoproteins in a single cell, to assess the purity and isomeric heterogeneity of glycans and glycoproteins in the resulting sample, and to confidently determine the structures of the molecules in that sample.
Microbial and host interactions include microbial (pathogen, commensal) recognition of host (animal, plant) glycans, host recognition of microbial glycans, molecular mimicry of host glycans by pathogens, and microbial community interactions involving glycans.
Most pathogens gain initial access to hosts via recognition of host glycans. Conversely, glycosylation of bacteria and viruses plays multiple roles in host-pathogen interactions, including, but not limited to, shielding of immunodominant epitopes, stabilizing viral/bacterial proteins, and acting as sites of recognition for the innate immune system. Notable findings highlight the dramatic interplay between pathogen glycosylation, receptor recognition, and host immune response:
- There is now increased understanding and appreciation of the importance of glycan binding and recognition by lectins, such as the galectins, DC-SIGN, and MBL.
- Many of the pathogen-associated molecular patterns recognized by Toll-like receptors are glycoconjugates.
- There is increasing evidence that glycans themselves can be recognized by the adaptive immune response, including both B cells and killer T cells.
- Glycosylation plays a key role in receptor specificity and recognition by pathogens.
- Molecular mimicry of host glycans by pathogens provides novel virulence mechanisms.
Although there is a general appreciation that glycans play important roles in both pathogen escape and, conversely, activation of immune surveillance, a challenge is to provide a more detailed, mechanistic, and holistic understanding of the interplay between the glycosylation of pathogens and their virulence, evolution, and recognition by both innate and adaptive immune responses. This challenge can be answered by building on previous work in such fields as virology and microbiology, immunology, and structural biology and by utilizing new synthesis, sequencing, and enzymatic tools. Additionally, answering this challenge will require detailed descriptions, through a combination of analytical, structural, and data-mining approaches, of protein-glycan interactions, including those involving antibodies and lectins. Such knowledge will enable engineering of specific binding and recognition by tailored antibodies and lectins. Answering this challenge would not only dramatically increase our understanding of the evolution of infectious agents, including drug-resistant forms, but also would facilitate development of the next generation of vaccines and therapeutics.
Major roles of glycans in health and disease are mediated by glycan-binding proteins (GBPs) that decode the information content of the glycome through their recognition of glycans as ligands (Sharon and Lis 2004; Bishop and Gagneux 2007; Crocker et al. 2007; Taylor and Drickamer 2007; Varki 2007; Imberty et al. 2008; van Kooyk and Rabinovich 2008; Taylor and Drickamer 2009; Rillahan et al. 2011). Thus, key to understanding the functions of glycans is elucidation of the functions of GBPs that recognize them. For example, mammalian GBPs mediate diverse biology, including trafficking of white blood cells to sites of inflammation, regulation of cell-signaling receptors, and helping the immune system distinguish between self and nonself. Plant GBPs mediate defense against pathogens while also facilitating critical symbiotic relationships with bacteria required for essential processes, such as nitrogen fixation. GBPs of pathogenic and commensal microorganisms mediate attachment to glycan ligands as their receptors on host cells. Although there has been enormous progress to define the roles of exemplary GBPs over the past two decades, the sum of the knowledge to date represents the tip of the iceberg. With the exception of humans and other mammals, no systematic effort to identify GBPs has been made, particularly for microorganisms, including the estimated 6,000 species of commensal bacteria that comprise the gut metabiome.
Progress in elucidating the roles of known GBPs and their mechanisms of action is hampered by a lack of robust tools to establish even the most essential information. Although glycan arrays have demonstrated their utility in defining the ligand specificity of a GBP, the number of glycans elaborated on the largest arrays, comprising approximately 600 glycans, represents a tiny fraction of the human glycome. Also lacking are arrays of microorganism glycans needed to define the specificity of GBPs from animals, plants, and other microorganisms that bind them. Similarly, although analytical methods exist to profile glycans of complex biological systems, these methods provide limited information, and methods to routinely determine the complete structures of glycans from biological materials are lacking. Also needed are glycan reagents to probe the functions of GBP-ligand interactions and to produce glycan-specific antibodies. However, there is an extremely limited supply of synthetic glycan reagents, and current methods of synthesis produce them in small amounts with great effort and resources.
Addressing these needs will require better methods to synthesize glycans, libraries of natural glycans for the expansion of glycan arrays, and a reagent bank or service to produce glycans as reagents for biological studies. There should also be a concerted effort to develop analytical methodologies that will permit determination of the complete structures of glycans from biological samples.
The major obstacle to perennial cellulosic materials serving as an economical sustainable source of liquid fuels is an inability to overcome the recalcitrance of cellulosic biomass to conversion into the sugar intermediates needed to produce liquid fuel (see Sections 3.2.1 and 3.2.2). In addition, the economic and efficient release of oligosaccharides and polysaccharides for use as replacements for petroleum-based materials is of considerable importance. An outstanding question in the area of biomass should be how to modify the cell wall, the major component of biomass, to make it less recalcitrant to processing for energy and biomaterial production. Such advances would be sufficient to create a cellulosic biofuels and biomaterials industry.
Lignocellulosic plant cell walls give shape and protection to plant cells, tissues, and organs. These cell walls have evolved resistance to microbial and enzymatic deconstruction, and it is this recalcitrance that is largely responsible for the high cost and slow kinetics of lignocellulose conversion to energy and of the extraction of cellulose or other cell wall polymers. Current technologies to overcome recalcitrance have primarily been developed empirically with little knowledge of the biological and chemical properties of biomass. Increased understanding of plant cell wall biosynthesis and detailed structural information on cell wall components now provide unique insight for modifying cell wall biosynthesis and targeting the deconstruction of cell wall components for the production of bioenergy, bioproducts, and cellulose nanomaterials (see Section 3.2.1). However, knowledge of the detailed architecture of the cell wall is far from complete. Currently, partial structures of pectin and hemicellulose matrix polysaccharides, cellulose, and lignin can be described, yet the description provides little insight into how these polymers interact to form an insoluble lignocellulosic cell wall.
Glycoscience can address this issue by developing more powerful characterization and modeling techniques, which could be used to advance research into cell wall biosynthesis, architecture, and deconstruction, as well as new methods for manipulating the genetics of biomass species and the microbes and enzymes that deconstruct biomass (see Sections 5.2, 5.3, and 5.4). Key questions to be addressed are:
- What is the detailed structure and what are the interconnection points in the wall between cellulose-hemicellulose-lignin-pectin-protein?
- What are the linkages or interactions that can be modified during synthesis or broken to gain access to the cellulose or matrix polysaccharides so that they can be extracted or broken down to sugars?
- What are the control points for cellulose synthesis and the cellulose microfibril structure, particularly as they pertain to the production of cellulose nanoparticles?
- What are the morphology, the crystal structure, and the surface chemical characteristics of the extracted cellulose nanoparticles and how can these be modified?
The ability to reassemble sugar units on demand to make new polymer structures will provide limitless opportunities to design materials that have tailored properties or that are functionality specific for a given application. The foundation of what gives a polymer its material properties is its chemical structure, and having the ability to strategically design the repeat sugar unit(s), the combination of different sugars, the type of linkages between the sugars, the specific side groups that branch off the backbone structure, and properties that control self-assembly into domains or overall polymer structures gives tremendous flexibility to make new polysaccharide-based materials. Such a capability would allow scientists and engineers to develop the next generation of sugar-based polymers and nanomaterials with the desired mechanical properties, high-temperature stability, and on/off-switchable biodegradation mechanisms that the plastics industry needs to expand the numbers and types of sustainable and biodegradable products that society is increasingly demanding. One can imagine, for example, the significance of the development of a transparent plastic water bottle produced from renewable materials, with a long shelf life and the ability to switch “on” a biodegradation mechanism, when necessary, so that the bottle degrades by mechanisms abundant in nature.
Polysaccharide synthesis is currently limited to relatively small quantities of structures with relatively low degrees of polymerization and limited side-group functionalization. There are also difficulties in isolating specific polysaccharide structures, which complicates the structural characterization of these materials. These factors greatly impede the development of new polysaccharide-based materials. To achieve rapid prototyping of these materials, it will be necessary to expand the capabilities of, and interconnections between, synthesis, characterization, and modeling tools. Areas for which further developments would be useful include the synthesis of polysaccharide polymers with higher degrees of polymerization, tailored backbone structure, and targeted side-group functionalization and the synthesis of greater than milligram-scale quantities. The ability to produce up to gram-size quantities of a homogeneous polymer structure is necessary to evaluate the effect of a given derivation on the resulting material properties, such as structural configuration, thermal stability, reactivity, rheology, and mechanical characteristics. Being able to produce gram-size quantities of materials also greatly expands the number of available property characterization tools to test and evaluate them. Improvements in polysaccharide isolation procedures to produce homogeneous samples, in characterization methods that have increased structural sensitivity and speed, and in predictive modeling to give additional insight into synthesis pathways and properties of the resulting materials will be important as well.
The questions posed in this chapter address several overarching themes, many of which reflect gaps in knowledge about fundamental biological and biochemical processes:
- understanding glycan diversity—how it arose, what it does, and how to study it;
- understanding the roles of glycans as modifiers of other biological molecules, such as proteins (What are the functions of glycan modifications on glycoconjugates, and how is this process controlled?);
- understanding the role of glycans inside cells—for example, in nuclear and cytoplasmic glycosylation;
- understanding the roles of glycans on cell surfaces—for example, as the glycome; and
- understanding how to control or manipulate glycans for desired ends—for example, to reduce plant cell recalcitrance to degradation or to design polymeric materials with new properties.
Although these areas address fundamental aspects of glycans, they also form a base from which to apply glycoscience knowledge to solve practical problems, such as how to better diagnose and treat diseases, how to create new fuels, and how to design improved products. Addressing the questions posed in this chapter has relevance to glycoscientists and nonglycoscientists alike. Chapter 5 turns to a discussion of the tools and technologies needed to help address these questions.