5

The Toolkit of Glycoscience

The incredible opportunities for glycoscience in health, energy, and materials science described in the previous chapters and in the questions posed in Chapter 4 can only be realized with a set of new analytical tools. Today, glycoscience is practiced by a relatively small community of biologists, chemists, and materials scientists. This community must expand if glycoscience is to extend its impact and become pervasive. The broader scientific community must participate in the development of the necessary tools that will transform the field and empower researchers in both the glycoscience field and the larger scientific community to incorporate glycoscience into their research pursuits as a matter of course. To this end, glycoscience needs new analytical tools, including methods development for separation, purification, characterization, localization, and structure identification. The tools used today are limited in their capabilities and will not enable realization of glycoscience’s full potential. The best analytical chemists and other measurement scientists, including tools developers, need to turn their attention to glycoscience and bring their creativity to the field. They need to apply existing tools and methods that have not yet been applied to glycoscience, and they need to develop new tools to solve analytical problems that existing tools cannot address.

Similarly, the synthesis community needs to begin to embrace glycochemistry as an essential field of organic chemistry. Glycochemistry needs to be brought into the mainstream of synthetic organic chemistry rather than kept as a specialized field practiced by only a handful of glycochemists. New synthetic methods need to be brought to bear



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




on glycan synthesis, and the creativity of the entire synthesis community needs to be leveraged to solve the long-standing and vexing problems of stereoselective, regioselective syntheses with simple, high-yielding reactions. The biochemistry and genetics communities need to participate in identifying all enzymes and characterizing all pathways involved in glycan metabolism. Finally, computer scientists, modelers, and bioinformaticists need to be fully engaged. The community needs to set up glycoscience databases and integrate glycoscience into existing biological databases. Glycan and proteoglycan structure prediction and modeling tools need to be developed. Full interaction pathways must be developed to incorporate all aspects of glycobiology into systems biology.

Details of these opportunities are described in the remainder of this chapter, but the main message is clear: Glycoscience needs the full participation of the broader scientific community to help develop tools that can solve some of the most vexing problems in glycoscience and to catalyze its integration into the scientific mainstream. Tools developed for glycoscience are expected to have follow-on benefits for all fields of science.

5.1 SYNTHESIS

5.1.1 General Aspects

The development of routine procedures for automated chemical synthesis of oligonucleotide fragments (DNA and RNA) and peptides has brought significant change to modern biology. Unfortunately, no general methods are available for the preparation of complex carbohydrates (Boltje et al. 2009; Kiessling and Splain 2010). As a result, the synthesis of a target is often a research project unto itself, which may take many months and in some cases years to complete. This problem is compounded by the fact that glycoconjugates in biological samples are often found in low concentrations and in microheterogeneous forms, greatly complicating their isolation and characterization.
Glycomes of eukaryotic organisms are extremely diverse; for example, it has been estimated that the human glycome contains 10,000 to 20,000 minimal epitopes for glycan-binding proteins (Cummings 2009). Thus, robust synthetic technologies that can readily provide large collections of complex oligosaccharides are urgently needed. Furthermore, biological and analytical studies often require glycans to be modified by a tag, immobilized on surfaces, presented on a multivalent scaffold, or attached to a lipid, peptide, or protein (Seeberger and Werz 2007; Rich and Withers 2009). As a result, additional technologies are required that can readily provide such conjugates.

Current approaches for obtaining well-defined oligosaccharides and glycoconjugates include chemical synthesis, enzymatic and chemoenzymatic synthesis, and microbial production (Boltje et al. 2009; Kiessling and Splain 2010; Hsu et al. 2011; Schmaltz et al. 2011). The next sections cover the scope and limitations of these methodologies. Despite their shortcomings, these technologies have been instrumental in addressing a number of important problems in glycobiology research and in the discovery of vaccines and therapeutics. In particular, the Consortium for Functional Glycomics, funded by the National Institute of General Medical Sciences, has employed a chemoenzymatic approach to prepare a collection of approximately 600 glycans derived from N- and O-linked glycoproteins and glycolipids (Stevens et al. 2006; Rillahan and Paulson 2011). These compounds are modified with an artificial aminopropyl linker, which allows covalent attachment to N-hydroxysuccinimide-activated glass slides. The resulting microarrays have found wide utility for interrogating binding specificities of a diverse range of glycan-binding proteins, determining dissociation constants and dissecting binding energies, and analyzing multivalent and hetero-ligand binding. Because the interaction between virus and host glycans is species specific, and because the arrays can determine the ligand specificities of monoclonal antibodies, glycan arrays have been used for rapid assessment of influenza virus receptor specificity. A significant barrier to widespread use of glycan arrays, however, is the limited availability of well-defined oligosaccharides, and current arrays contain only a fraction of naturally occurring oligosaccharides. Also, closely related arrays displaying very similar glycans can nevertheless give significantly different results for glycan-binding protein (GBP) binding. There are exciting challenges ahead before glycan arrays can become a standardized method of analysis.
Development of a fully synthetic heparin fragment for treatment of deep vein thrombosis exemplifies the importance of the organic synthesis of glycans. Heparin and heparan sulfate are naturally occurring linear polysaccharides that are modified by sulfate esters. A heparin-derived pentasaccharide that can bind to antithrombin III (AT III) and that exhibits anticoagulant activity has been identified. A fully synthetic analog (fondaparinux) of this domain has been developed, which is being produced on a multikilogram scale to treat deep vein thrombosis (Petitou and van Boeckel 2004). In contrast to porcine mucosal tissue-derived heparin, the synthetic compound is easy to characterize and has a much-improved subcutaneous bioavailability. The importance of synthetic oligosaccharides for anticoagulation therapy was highlighted by the recent discovery of batches of heparin that caused hypotension and resulted in nearly 100 deaths. These reactions resulted from contamination with oversulfated chondroitin sulfate, which is a popular shellfish-derived oral supplement for the treatment of arthritis (Guerrini et al. 2008). The ability to synthesize pure, well-characterized glycans would eliminate the need to rely on poorly characterized and highly variable glycans obtained from natural sources.

Heparin and heparan sulfate are examples of glycosaminoglycans (GAGs), which have been implicated in many other biological processes and can have pronounced physiological effects on lipid transport and absorption, cell growth and migration, and development (Bishop et al. 2007). Significant changes in the structure of GAGs have been observed in the stroma surrounding tumors, which is noteworthy when considering tumor growth and invasion. GAGs are also involved in neurobiological processes and, for example, have been implicated in neuroepithelial growth and differentiation, neurite outgrowth, nerve regeneration, axonal guidance and branching, deposition of amyloidotic plaques in Alzheimer's disease, and astrocyte proliferation. Large arrays of well-defined heparan sulfate oligosaccharides are needed to identify compounds that mediate or inhibit these processes. It is possible that synthetic analogs of heparin may find application in the treatment of several neurological diseases, cancer, and infection.

Synthetic oligosaccharides have also been used in the development of vaccines against such pathogens as Haemophilus influenzae type b, HIV, Plasmodium falciparum, Vibrio cholerae, Cryptococcus neoformans, Streptococcus pneumoniae, Shigella dysenteriae, Neisseria meningitidis, Bacillus anthracis, and Candida albicans (Costantino et al. 2011; Morelli et al. 2011). Polysaccharides isolated from natural sources and conjugated to a carrier protein are used in the prevention of life-threatening bacterial infectious diseases such as meningitis and pneumonia. However, the wide utility of this approach is limited by such problems as the destruction of vital immunodominant features during the chemical conjugation to a carrier protein.
Furthermore, isolated polysaccharides often display structural heterogeneity, which may complicate reproducible production. These compounds may also contain toxic components or immunosuppressive domains that may be difficult to remove. These problems can be addressed by using chemically or enzymatically synthesized glycan epitopes. In such an approach a synthetic oligosaccharide is equipped with an artificial spacer to facilitate selective conjugation to a carrier protein. In general, antibodies recognize epitopes no larger than a hexasaccharide, and compounds of this complexity can be readily obtained by organic synthesis. The recent approval of a Haemophilus influenzae vaccine based on a synthetic glycan epitope highlights the potential use of organic synthesis for the development of glycoconjugate vaccines. The expansion of this capability to other vaccines would have a tremendous impact on both safety and efficacy and could potentially compress the timeframe for developing new vaccines, especially for new threats.

Synthetic oligosaccharides and glycopeptides are also being used in the development of cancer vaccines (Buskas et al. 2009). Oncogenic-transformed cells often overexpress oligosaccharides such as Globo-H, LewisY, and Tn antigen, and numerous preclinical and clinical studies have demonstrated that naturally acquired, passively administered, or actively induced antibodies that recognize glycan-associated tumor antigens are able to eliminate circulating tumor cells and micrometastases. The development of tumor-associated polysaccharides and glycopeptides as cancer vaccines has been complicated by the fact that they are self-antigens and therefore are tolerated by the immune system. The problem of self-tolerance is being addressed by the design, chemical synthesis, and immunological evaluation of fully synthetic vaccine candidates.

Synthetic oligosaccharides are also used for the preparation of glycopolymers, glycodendrimers, and glyconanoparticles (Garcia et al. 2010). These materials are receiving considerable attention because monovalent polysaccharides often exhibit weak affinities for their protein receptors. However, glycan-binding proteins often exist as higher-order oligomeric structures presenting multiple binding sites, acting as "polydentate" donors, and thereby circumventing the intrinsic weak binding interactions of monovalent ligands. Also, gold nanoparticles, quantum dots, and magnetic nanoparticles provide additional functionality: they allow detection by surface plasmon resonance (SPR) or fluorescence, or convenient isolation with a strong magnetic field.

An increasing number of drugs contain glycans or glycomimetics as a major component. Examples include many antibiotics, antiviral drugs such as Relenza and Tamiflu, hyaluronic acids, and selectin antagonists (Gantt et al. 2011b). Synthetic oligosaccharides are also being used in the preparation of well-defined glycoproteins.
Approximately one-quarter of new drug approvals are protein-based drugs, with a majority being glycoproteins. The glycan moiety of a glycoprotein plays an important role in its pharmacokinetic properties. Hence, it is critical to control the exact chemical composition of the oligosaccharide moieties of glycoproteins. Protein glycosylation is, however, not under direct genetic control and results in the formation of a heterogeneous range of glycoforms that possess the same peptide backbone but differ in the nature and site of glycosylation. In general, it is difficult to control glycoform formation in cell culture, which is a major obstacle for the development of therapeutic glycoproteins.

In summary, a diverse set of glycan structures can be used to discover specific inhibitors of glycosyltransferases with pharmaceutical applications, including diagnostics. Large arrays representing the diversity of glycans or focused on a specific set of glycan structures could be used for drug screening and discovery. Access to diverse glycans via synthesis for preparing such arrays would lead to new uses that have yet to be imagined.

5.1.2 Synthetic Tools

5.1.2.1 Chemical glycan synthesis

The realization that complex oligosaccharides and glycoconjugates are involved in numerous biological processes has stimulated development of chemical and enzymatic methods for the preparation of oligosaccharides. Unlike oligonucleotide and peptide synthesis, there are no general protocols for the preparation of glycans. As a result, the synthesis of specific targets is often a demanding and time-consuming task (Boltje et al. 2009; Kiessling and Splain 2010). However, recent technological advances are making it possible to streamline the process of oligosaccharide assembly and are providing opportunities to prepare collections of oligosaccharides and glycoconjugates.

The chemical synthesis of glycans involves coupling a fully protected glycosyl donor, which bears a leaving group at its anomeric center, with a suitably protected glycosyl acceptor that often contains only one free hydroxyl (Zhu and Schmidt 2009). The result of this chemical reaction is a glycoside product. The process of sequentially generating glycosyl donors and acceptors can be repeated until a complex target has been obtained. Preparation of monosaccharide building blocks requires extensive protecting-group manipulations, and typically 6 to 10 chemical steps are needed to create each building block. Because preparation of the monomer building blocks consumes the majority of effort invested in chemical glycan synthesis, rapid and inexpensive access to building blocks should greatly accelerate chemical glycan synthesis. One approach to speed up monosaccharide synthesis involves parallel, combinatorial, sequential one-pot multistep procedures for the selective protection of monosaccharides (Wang et al. 2007).
This approach can incorporate as many as seven chemical steps, obviating the need for tedious intermediate work-ups and time-consuming purifications. A complementary approach involves identification of monosaccharide building blocks that can be used repeatedly for the synthesis of a wide range of target structures. For example, it has been proposed that 88 percent of glycoside motifs found in mammalian glycoconjugates could be constructed from only 20 different monosaccharide building blocks, which could then be prepared in bulk for widespread use in automated glycan synthesis technology (Werz et al. 2007). In addition, disaccharide building blocks have been identified that resemble the saccharide motifs found in heparan sulfate and that can be used repeatedly for rapid assembly of libraries of heparan sulfate oligosaccharides. This building block approach could be extended to many other classes of oligosaccharides, although because of the enormous structural diversity of natural glycans, it will be important to focus on glycomes of particular interest.

The success of the unified building block approach relies on the premise that each glycosylation will be high yielding. In practice, this has not proven to be the case, and additional saccharide building blocks are required to address possible synthetic difficulties. Sets of orthogonal protecting groups are being developed that provide additional synthetic flexibility in that they offer the possibility to change the order of glycosylation. Stereoselective installation of glycosidic bonds in high yield is a major challenge in complex oligosaccharide synthesis. In recent years this aspect of oligosaccharide synthesis has progressed considerably, and a wide range of stable yet highly reactive anomeric leaving groups have become available, making it possible to examine several glycosylation protocols to achieve optimal results (Zhu and Schmidt 2009). By exploiting neighboring-group participation, steric and conformational effects, or direct displacement of leaving groups, glycosides can be obtained with high anomeric selectivity, even for the more challenging α-sialyl and β-mannosyl linkages. However, glycoside products are often contaminated by unwanted anomeric products, making it necessary to include time-consuming purification protocols.

Minimizing purification steps has been the focus of efforts to streamline chemical synthesis of oligosaccharides. Approaches that are being pioneered include one-pot multistep solution-phase glycan synthesis, solid-phase glycan synthesis, and fluorous tagging.
One-pot multistep procedures are based on the sequential addition of glycosyl donors with well-defined anomeric reactivity to a reaction flask to provide an oligosaccharide without the need to purify synthetic intermediates (Kaeothip and Demchenko 2011). Although many variations of the one-pot strategy have been developed, there are three major concepts: chemoselective, orthogonal, and preactivation glycosylation strategies. In chemoselective glycosylation strategies, glycosyl donors with decreasing anomeric reactivity are allowed to react sequentially. Orthogonal glycosylations use glycosyl donors and acceptors that have different anomeric groups that can be activated without affecting each other. A flexible approach that exploits the advantages of the aforementioned strategies utilizes preactivation of a glycosyl donor to generate a reactive intermediate in the absence of the acceptor. After addition of the glycosyl acceptor, a glycoside product can be formed that has an identical leaving group at the reducing end. In the same reaction flask the process of anomeric activation and glycosylation can be repeated to construct complex oligosaccharides. Successful implementation of this strategy requires that the promoter be completely consumed to prevent activation of a subsequent saccharide building block. Furthermore, the reactive intermediate should be sufficiently long lived to permit addition of a glycosyl acceptor yet sufficiently reactive for a high-yielding glycosylation. It has been difficult to design glycosylations that meet these requirements.

Encouraged by successes with polymer-supported peptide synthesis, the first attempts at solid-phase oligosaccharide synthesis were reported in the 1970s. These efforts were not successful, largely because of a lack of efficient glycosylation methods. The past decade has seen a renewed interest in polymer-supported glycan synthesis, and different polymer support materials, linkers, synthetic strategies, and glycosylating agents have been explored (Seeberger 2008). However, a general solution for routine and automated oligosaccharide synthesis remains to be established, in large part because of the need for large excesses of glycosyl donors, the lack of anomeric control when 1,2-cis-glycosides need to be installed, the unpredictability of glycosylations, and the additional steps required for linker functionalization and protecting-group removal. Nevertheless, progress is being made in these areas, bringing the promise of routine automated oligosaccharide synthesis closer to fruition. In particular, it has been shown that automated synthesis can provide complex oligosaccharides such as a branched β-glucan dodecasaccharide, blood-group oligosaccharides, and tumor-associated glycan antigens.

Solution-based strategies have been developed in which the growing oligosaccharide chain is modified by a tag that allows selective precipitation, extraction, or absorption for convenient purification. In particular, light fluorous tagging technology is attractive because it makes possible protecting-group manipulations and glycosylations under conditions typically used in solution-phase chemistry (Jaipuri and Pohl 2008).
In this approach, tagged products can be selectively captured by a fluorous solid-phase extraction column and then released by elution with methanol. Fluorous-tagged glycans can also be directly printed on fluorocarbon surfaces, providing interesting opportunities for glycan array development. Efforts are under way to automate fluorous-supported synthesis of oligosaccharides with liquid handling devices.

Despite considerable progress, chemical synthesis of glycans remains a challenging endeavor that is practiced only by expert laboratories. The lack of large collections of universal building blocks and the need to optimize glycosylation conditions complicate routine synthesis of this class of compounds. There is an urgent need for the development of more reliable glycosylation protocols, which might be accomplished with a better understanding of the mechanistic aspects of glycosylations. Furthermore, new approaches need to be developed for controlling anomeric selectivities of glycosylations. In particular, reliance on extensive protection-deprotection schemes adds to the number of steps and reduces the yields of complex glycans. High-throughput protocols for the rapid evaluation of many reaction conditions, for example by employing microfluidic devices, may provide more reliable glycosylation protocols. It is also to be expected that searchable databases of reported glycosylations can accelerate the optimization process and may lead to standardized protocols.

5.1.2.2 Enzymatic synthesis of glycans

Enzyme-catalyzed glycosylations offer an approach that is complementary to chemical synthesis for obtaining structurally well-defined oligosaccharides, polysaccharides, and glycoconjugates. Current enzymatic and chemoenzymatic approaches apply glycosyltransferases, glycosidases, glycosynthases, and trans-glycosidases to construct glycosidic linkages, whereas enzymes such as sulfotransferases, epimerases, and acetyltransferases have been used for postglycosylation modifications (Schmaltz et al. 2011). (See also Section 5.4.2.1, which discusses glycan synthesis in the context of glycoenzyme applications.)

Glycans are synthesized in an assembly-line manner by enzymes such as glycosyltransferases. In this process the product of one enzyme becomes the substrate of the next. The glycosyltransferases form a connection, termed the glycosidic linkage, between a growing glycan chain and another sugar building block. The most common building blocks for glycosyltransferases are called nucleotide sugar donors, and the structures to which those building blocks are added are generally referred to as glycosyl acceptors. In general, there is a unique glycosyltransferase for nearly every type of glycosidic linkage formed, and these enzymes are among the most specific enzymes known. They are able to distinguish the spatial orientation of a single atom, even on very large structures, within both their glycosyl donors and acceptors.
As a result, glycosyltransferases are essential enzymes for oligosaccharide biosynthesis. Their ability to transfer a sugar residue from a sugar-nucleotide mono- or diphosphate to a maturing oligosaccharide chain (Lairson et al. 2008) and their highly regio- and stereoselective nature make them ideally suited for the preparation of complex oligosaccharides. Currently, more than 50,000 genes encoding potential glycosyltransferases have been identified, although only a very small number have been characterized. Many of these enzymes are membrane bound and glycosylated, making their isolation and utilization difficult. The number of glycosyltransferases with good catalytic activity and defined substrate specificity that can be expressed in soluble form in large amounts is currently small, which hampers efforts to develop enzymatic glycoconjugate synthetic schemes. However, several activities are under way to address these problems. High-throughput assays are being developed to identify activities and substrate specificities of glycosyltransferases. Furthermore, it has been found that glycosyltransferases from bacterial sources often exhibit considerable substrate promiscuity, thereby offering unique opportunities for chemoenzymatic synthesis of glycan libraries and their derivatives. The substrate specificity of glycosyltransferases can be altered by protein crystal structure-based rational design or directed evolution. However, glycan-modifying enzymes are exceptionally underrepresented in structural databases, particularly as enzyme-substrate complexes, and the resulting incomplete and fragmentary collection of biochemical and structural data on this class of enzymes has led to an incomplete understanding of the molecular mechanisms that control oligosaccharide biosynthesis. Recently, the Repository of Glyco-enzyme Expression Constructs was created to focus on generating expression vectors that encode all human glycosyltransferases and glycoside hydrolases as well as a limited set of glycan-modifying enzymes for production in bacteria, insect cells (baculovirus), and mammalian cells. The goal of the repository is to facilitate the production of soluble forms of enzymes as catalytic domains, when possible, for use in biochemical, enzymatic, and structural studies. Many of the constructs have been designed as truncated forms devoid of the transmembrane protein domain and linked to affinity tags or other larger fusion proteins to facilitate affinity purification and quantification.

Several convenient approaches have been developed for the preparation of sugar nucleotides, key substrates for all glycosyltransferases, including in situ sugar nucleotide regeneration, fusion protein strategies, one-pot multienzyme systems, and superbead technologies.
Progress is being made in identifying and characterizing many sugar nucleotide biosynthetic enzymes, including those involved in salvage pathways. However, many uncommon sugar nucleotides for glycosyltransferase-catalyzed synthesis of glycosylated natural products are less accessible because of their instability and their much more complicated biosynthetic pathways.

The combined use of glycosyltransferases, sulfotransferases, and epimerases has been successfully implemented for the synthesis of structurally defined heparin and heparan sulfate oligosaccharides, as well as for polysaccharides with specific sulfation patterns. In particular, these developments have led to the production of ultra-low-molecular-weight heparins in 10 to 12 steps, with an overall yield of 40 to 50 percent (Xu et al. 2011). This compares well with the process now used to make the anticoagulant drug Arixtra, which involves some 50 steps and has an overall yield of less than 1 percent.
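The comparison of step counts and overall yields can be made concrete with a little arithmetic. The Python sketch below assumes, purely for illustration, that every step in a route has the same yield (a simplification the chapter does not make) and backs out the average per-step yield implied by an overall yield. The function names are hypothetical, and the 1 percent figure is treated as an upper bound for the roughly 50-step process.

```python
def average_step_yield(overall_yield: float, steps: int) -> float:
    """Return the per-step yield implied by an overall yield.

    Assumes all steps have the same yield, so that
    overall_yield = step_yield ** steps (an illustrative simplification).
    """
    if not 0.0 < overall_yield <= 1.0:
        raise ValueError("overall_yield must be a fraction in (0, 1]")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return overall_yield ** (1.0 / steps)


def overall_yield(step_yield: float, steps: int) -> float:
    """Return the overall yield of a linear route with equal per-step yields."""
    if not 0.0 <= step_yield <= 1.0:
        raise ValueError("step_yield must be a fraction in [0, 1]")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return step_yield ** steps


if __name__ == "__main__":
    # Figures quoted in the text; the route labels are descriptive only.
    routes = [
        ("chemoenzymatic, 12 steps, 40% overall", 0.40, 12),
        ("chemoenzymatic, 10 steps, 50% overall", 0.50, 10),
        ("chemical, ~50 steps, 1% overall (upper bound)", 0.01, 50),
    ]
    for label, total, n_steps in routes:
        per_step = average_step_yield(total, n_steps)
        print(f"{label}: average per-step yield {per_step:.1%}")
```

Under this equal-yield assumption, even the 50-step route implies an average per-step yield above 90 percent, which suggests that the gap in overall yield comes largely from the number of steps rather than from poor individual reactions.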

Currently, microfluidics and microarray formats are being explored to conduct enzymatic syntheses. These types of approaches can be used to create a wide range of products that can be analyzed using mass spectrometry and then assayed for biological activity. At the other extreme, work is being conducted to develop macroscale enzyme-assisted syntheses of heparin, although a major obstacle in this effort is the cost of a critical cofactor, 3'-phosphoadenyl-5'-phosphosulfate (PAPS), which donates the high-energy sulfate groups that are covalently attached to the heparin backbone to make bioactive heparan sulfate. One solution is to regenerate PAPS enzymatically in situ from the byproduct 3'-phosphoadenyl-5'-phosphate (PAP), a process similar to that used in large-scale oligosaccharide synthesis with sugar nucleotide regeneration. In contrast to chemical synthesis, enzyme-based methods struggle to produce unnatural saccharide sequences because of the strict substrate specificities of most glycan-synthesizing enzymes. To overcome this limitation, additional studies of heparan sulfate biosynthetic enzymes are necessary, especially to advance our understanding of their substrate specificities and to conduct mechanistically based mutagenesis to engineer those specificities.

Glycosyl hydrolases are a class of enzymes that degrade oligosaccharides by cleavage of glycosidic linkages. The reverse hydrolytic activity of this class of enzymes can be exploited for glycosidic bond synthesis. This approach suffers, however, from relatively low yields because of the challenge of driving reactions in a thermodynamically unfavorable direction. This problem has been addressed by the introduction of "glycosynthases," which are glycosidases rendered hydrolytically incompetent by replacement of a nucleophilic aspartic or glutamic acid with an alternative unreactive amino acid.
Glycosynthases can, however, transfer activated glycosyl substrates that have the opposite anomeric configuration of the natural substrate. For example, b(1,4)-mannans, which are major plant cell wall polysaccharides, have been prepared by using a mutant endo-b- mannanase and an a-mannobiosyl fluoride as glycosyl donor. In addition, glycosynthases have been used for the preparation of -linked glucuronic and galacturonic acid conjugates. Currently, only a small number of gly- cosynthases have been developed, which limits the scope of this technol- ogy. Recent studies have shown that the catalytic activities and substrate promiscuities of glycosynthases can be improved by directed evolution. 5.1.3 Manipulating Glycans by Pathway Engineering All cells have endogenous machinery for synthesizing glycans and glycoconjugates. The glycan-modifying infrastructure in cells includes glycosidases, glycosyltransferases, mechanisms for activated sugar syn-

lated moieties or as components of a glycoprotein or glycolipid, using one or more glycosyltransferases with defined substrate specificity and sugar nucleotide substrates whose sugar moieties are labeled with stable isotopes (Macnaughtan et al. 2008; Skrisovska et al. 2010). Glycosyltransferases could be used to assess the presence of a determinant as defined by the enzyme's substrate specificity, although it is important to note that they cannot be used on their own to solve entire structures or novel structures. Nevertheless, this is an extremely promising technology that should be further developed to help address the current shortcomings in NMR technologies for determining the tertiary structure of glycans, especially those attached to their native proteins.

5.4.2.3 Glycosyltransferases in glycan engineering of cells

Because they are major components of the cell surface, glycans represent attractive targets for imaging physiology and pathophysiology, both in vitro and in vivo. As mentioned above, many glycosyltransferases are promiscuous and can accept unnatural substituents. At least one group has used this principle to introduce "bio-orthogonal chemical reporters" into monosaccharides that are taken up by cells and incorporated into cell surface glycans. These reporters enable the detection and imaging of glycan structures of living cells in model organisms, using bio-orthogonal chemistry to attach a fluorescent label or other biological tag that makes the glycans "visible" (Laughlin and Bertozzi 2009; Sletten and Bertozzi 2011). This approach has been used to label cells for in vivo imaging in mice (Chang et al. 2010) and recently to image the sialome in zebrafish (Dehnert et al. 2012).

The same principle can be used to introduce modified sugars directly onto the surface of cells using glycosyltransferases (Ramya et al. 2010; Zheng et al. 2011). This approach also provides information on the underlying glycans on the cell, because the enzyme has a strict specificity for the acceptor sequence it uses to form the product (Khidekel et al. 2004; Boeggeman et al. 2007; Ramya et al. 2010; Zheng et al. 2011). With a large toolbox of glycosyltransferases with well-characterized specificities, this approach can be used to gain much structural information about the glycans on the cell surface.

Glycoengineering approaches are also being used to influence cellular trafficking. For example, using a platform called "glycosyltransferase-programmed stereosubstitution," scientists have modified existing cellular glycans to create the selectin ligand HCELL (hematopoietic cell E-/L-selectin ligand), which is involved in the attachment of circulating stem cells and white blood cells to endothelial cells. This technique has potential applications to the development of cell-based therapies (Sackstein 2009).

5.4.2.4 Glycosyltransferase inhibitors

Because of the central importance of glycosyltransferases to the synthesis of glycan structures, inhibitors of key enzymes would be of enormous benefit in elucidating the functions of glycans in cell communication and the roles of specific enzymes in the biosynthesis of glycans. Genetic ablation of specific glycosyltransferases in mice has already revealed important biological roles for glycans synthesized by the missing enzyme (Lowe and Marth 2003; Satoh et al. 2005; Ohtsubo et al. 2011). Many phenotypes from these mice have validated individual glycosyltransferases as targets for the development of inhibitors that would provide a therapeutic benefit. Small-molecule inhibitors of such enzymes would be invaluable to the research community as probes to uncover the biological roles of glycans and to assess their therapeutic utility. Specific inhibitors could also be used in place of or in combination with glycosyltransferase knockout mice to reveal additional novel phenotypes that provide information about the functions of glycan ligands and glycan-binding proteins.

Despite the obvious need, few glycosyltransferase inhibitors capable of blocking glycosylation in vivo have been identified to date (Lachmann 2003; Brown et al. 2009). Several recent reports describe approaches for high-throughput screening of glycosyltransferase inhibitors that demonstrate the feasibility of screening for inhibitors of these enzymes (Helm et al. 2003; Gross et al. 2005; Rillahan et al. 2011). A systematic effort to screen for inhibitors of a panel of key glycosyltransferases is sure to open a path to the development of inhibitors that will benefit the research community and help assess the potential of glycosyltransferase inhibitors in drug development.
5.4.3 Key Messages on Glycoenzymes

Enzymes have a range of uses as tools to study glycoscience, including in enzymatic synthesis of glycans, as biochemical probes, and in structural determination. Similarly, inhibitors of enzymes such as glycosyltransferases can be used as important tools in trying to better understand glycan biology and function. Despite their utility as part of the glycoscience toolkit, only limited numbers of glycan-active enzymes from both bacteria and mammalian species are available, and few three-dimensional enzyme structures, particularly from mammals, are known.

5.5 SYSTEMS GLYCOBIOLOGY

As a recent National Research Council report described:

The field of systems biology seeks to integrate . . . multiple levels of biological knowledge into descriptive, and ultimately predictive, mathematical models, combining experimental knowledge with computational tools in order to study the interactions between the components that make up a particular biological system. As a result, a primary goal of systems biology is to understand how the system being studied functions, what its properties are that arise from the interactions of its individual components (also referred to as emergent properties), and the design principles on which it operates (NRC, 2011, p. 27).

Similarly, systems glycobiology is an approach that integrates biological and chemical information about glycans with mathematical modeling and bioinformatics-enabled data analysis in an effort to understand the networks that control glycan structure and function. Informatics tools are key enablers for processing the data that arise from multiple sources: biochemical pathways (Hossler et al. 2007), multiple types of analytical structure determination techniques, and mathematical and computational modeling. By analyzing and extracting information from this sea of data, glycoscience can be studied in this systems context and ultimately understood and manipulated in controlled ways.

Such research illustrates the possibility of whole-cell simulation, but in order to perform simulations of higher organisms' cells, glycosylation and other posttranslational modifications and their kinetic reaction data will need to be incorporated. Further advances in analytical tools will aid in this. Moreover, predictions or assumptions can also be incorporated to perform simulations as necessary. With the availability and success of such simulations, perturbations to the model will enable predictions of phenotypic effects (Karr 2012).
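
As a toy illustration of what incorporating kinetic reaction data and perturbing a model can mean, the Python sketch below integrates a hypothetical two-step glycan-processing pathway (precursor A converted to intermediate B, then to product C) with first-order kinetics and then repeats the run with the second rate constant halved. The species, the rate constants, and the function name `simulate_two_step_pathway` are assumptions made here for illustration; they are not taken from the models cited in this section.

```python
def simulate_two_step_pathway(k1, k2, total_time, dt=0.01):
    """Integrate A -> B -> C with first-order rate constants k1 and k2.

    Uses the explicit Euler method and returns the final fractions (a, b, c).
    All quantities are in arbitrary, hypothetical units.
    """
    a, b, c = 1.0, 0.0, 0.0  # start with all material in the precursor form A
    n_steps = int(round(total_time / dt))
    for _ in range(n_steps):
        flux_ab = k1 * a * dt  # amount converted from A to B in this time step
        flux_bc = k2 * b * dt  # amount converted from B to C in this time step
        a -= flux_ab
        b += flux_ab - flux_bc
        c += flux_bc
    return a, b, c


# Baseline run, then a perturbed run with the second enzyme at half activity
# (for example, mimicking partial inhibition of that enzyme).
baseline = simulate_two_step_pathway(k1=1.0, k2=0.5, total_time=5.0)
perturbed = simulate_two_step_pathway(k1=1.0, k2=0.25, total_time=5.0)

print("final product fraction, baseline: ", round(baseline[2], 3))
print("final product fraction, perturbed:", round(perturbed[2], 3))
```

Real glycosylation models involve many competing enzymes, compartments, and branching products, so this sketch only conveys the basic idea: once rate data are available, changing a parameter and rerunning the model yields a prediction that can be compared with experiment.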
5.6 INFORMATICS AND DATABASES

It is becoming increasingly evident that complex relationships between genomic DNA, transcripts, proteins, and their posttranslational modifications, such as phosphorylation and glycosylation, critically govern phenotypes of whole organisms. The development of informatics to capture, analyze, mine, and disseminate sequence information and datasets associated with genes and proteins has been instrumental in advancing genomics and proteomics. One major area of glycomics deals with understanding complex glycans that are attached to proteins during posttranslational modification and the biological functions mediated by these glycan modifications.

Informatics applied to glycomics has faced unique challenges. The biosynthesis of glycans is complex, is not template driven, and involves tissue-specific isoforms of several glycan biosynthetic enzymes. As a result, it becomes challenging to decipher the entire glycome of a whole organism in the same way that it has been possible for the genome and proteome. The chemical heterogeneity of glycans also makes it challenging for any single analytical approach to provide a complete description of each glycan structure isolated from a glycoprotein or a cell type. Furthermore, glycan-protein interactions, leading to either the activation or inhibition of a biological response, are often not binary but rather involve more subtle mediation of a signaling pathway. In addition, glycan-protein interactions typically involve multivalency with regard to both the protein and the glycan. Because of these challenges, there are layers of ambiguity in determining the primary sequence or chemical structure of a glycan, and these ambiguities also impinge on understanding the specificity of the glycan-protein interactions that modulate key biological functions.

An important factor in broadening appreciation of glycomics in the larger scientific community is the urgent need to develop databases and computational and informatics tools to acquire, integrate, annotate, mine, and disseminate glycomics datasets such as analytical data, glycan array data, and glycogene expression data (Packer et al. 2008). Many earlier efforts in glycomics focused on structural characterization of glycans and on the development of glycan structure databases and computational tools to assist assignment of glycan structures from high-throughput analytical datasets. The development of these tools has advanced to a point where it is possible to obtain robust and detailed profiling of a majority of glycans isolated from cells, tissues, and individual glycoproteins.

To accelerate the development of additional databases and informatics tools, glycomics can to some extent borrow many of the tools that were developed for proteomics and genomics, but there are specific characteristics of glycans that require the development of different, and unique, tools.
The most obvious difference is that glycans, unlike proteins or nucleic acids, are branched, isomeric, and constructed using several types of linkages. A common theme that unites these challenges is that there is no template from which glycan structure originates, and thus an "ensemble" of structures is created. Representing the complexity of glycan structures and the diversity of context (the fact that expression levels for each glycan, as well as glycosylation patterns, differ across cells and tissues) presents a significant challenge for bioinformatics approaches.

5.6.1 Limited Successes in Developing Broadly Available Informatics Tools

Many notable advances in glycomics informatics and database development have focused on interpretation of analytical data, including assignment of NMR and mass spectrometry peaks. These advances have, to a certain extent, made glycan analysis more accessible to the broader research community. For example, to assist researchers in the assignment of glycan structures and features based on NMR data, the characteristic NMR chemical shifts and coupling constants of glycans reported in the literature have been compiled in accessible databases such as at the glycosciences.de portal (http://www.glycosciences.de/sweetdb/; Lütteke et al. 2006) and CASPER (http://www.casper.organ.su.se/casper/; Loss et al. 2006), thus improving the accessibility of NMR as a tool for glycoscientists. Several tools, including Glyco-Search-MS (http://www.glycosciences.de/sweetdb/start.php?action=form_ms_search; Loss et al. 2002) and GlycoWorkbench (http://www.glycoworkbench.org/; Ceroni et al. 2008), have focused on interpretation of mass spectrometry fragmentation patterns through comparison to reference datasets, thereby deducing the most likely glycan structure. Additional development in this area, including pairing with proteomics to enable analysis of glycopeptides and proteins, will further increase the usefulness of these tools.

More recent efforts have focused on developing computational tools to mine multiple high-throughput datasets associated with gene expression studies, glycan profiling, and glycan array screening. One area of application of these tools has been in correlating and predicting profiles of glycan structures in a cell based on expression of glycan biosynthesis enzymes (Kawano et al. 2005). Another area of active development has been in mining glycan array datasets to identify glycan sequence motifs recognized by various proteins, such as plant and animal lectins, pathogen proteins, and antibodies (Hizukuri et al. 2005; Aoki-Kinoshita et al. 2006; Kuboyama et al. 2006; Hashimoto et al. 2008a,b; Porter et al. 2010; Jiang et al. 2011b).
These glycan sequence motifs represent a combination of substructures that favor binding and those that are detrimental to binding. Furthermore, identification of such binding motifs facilitates using protein-glycan co-crystal structures to translate biochemical and biophysical aspects of glycan-protein interactions to the biology mediated by these interactions.

Data mining methods and tools have been developed to solve a variety of problems in glycobiology, including extraction of potential glycan biomarkers; prediction of glycan-binding patterns (Hashimoto et al. 2008a,b; Aoki-Kinoshita et al. 2006); and the analysis of glycan biosynthesis pathways (Krambeck et al. 2009), as provided by the RINGS Web resource (http://www.rings.t.soka.ac.jp). Many computational methods have been applied to glycan analysis, including pairwise (Aoki et al. 2003) and multiple alignment of glycans and the development of "score matrices" for analysis of glycosidic linkages (Aoki et al. 2005). Such applications of existing bioinformatics methods to glycobiology can be made to further elucidate glycan function (Aoki-Kinoshita 2010).

However, the bioinformatics community currently shows a severe lack of interest in glycoscience, a result of the lack of a consistent database with relevant links to major databases and of an understandable glycan representation format. Without easily available data of biological interest, bioinformatics research will not progress very far in the glycosciences, creating an ever-increasing gap between the genomics and proteomics world and glycomics. Moreover, without a consistent format for representing glycan structures, not only is there confusion regarding a "correct" representation of glycans, but the integration of various computational tools also becomes difficult.

5.6.2 Critical Need for Development of a Single Integrated Database

Clearly, developing a structural assignment database is key to a larger integrative effort to make glycomics accessible and relevant. Indeed, in the absence of a centralized database at a location such as the National Center for Biotechnology Information (NCBI), glycomics will not gain the attention and respect of the scientific community. Long-term funding and long-term stability of such an internationally supported database are absolutely critical for the future of the glycosciences.

Larger and more complete informatics efforts can then focus on development of computational tools to correlate glycan structure with expression of biosynthetic enzymes in order to link biosynthesis and end product. Also, the development of new technologies, such as glycan array platforms to characterize glycan-protein interactions, has necessitated development of novel tools and database strategies for these high-throughput sources of data. In addition, there is a wealth of data on phenotypic analysis of knockout mice that lack specific glycan biosynthesis enzymes that could benefit from a database.
Integration of gene expression, structural characterization, glycan motif recognition by various proteins, and whole-organism phenotyping data will enable a critical understanding of glycan diversity in a normal versus a perturbed cell and of how these differences correlate with the physiological state of the cell. To truly "reduce this to practice," the field will need relational databases to make sense of the huge amount of data that will come from such studies and to develop trait correlations that will ultimately lead back to candidate genes.

Unfortunately, current glycobiology databases are largely incomplete, disconnected, and inaccessible to the broader community and have a high percentage of incorrect entries that require correction. Analytical databases today provide only "sound bytes" and are missing a great deal of the complexity. In addition, other structural databases need to be made "glycan aware."
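
The kind of relational linkage called for above can be sketched with Python's built-in `sqlite3` module. The schema below is a minimal, hypothetical example, not the design of any existing glycan database: the table and column names, the placeholder gene names `GENE_A` and `GENE_B`, and the expression values are all invented for illustration. It links glycan structures to the genes assumed to be required for their biosynthesis and to tissue-level expression data, and then asks in which tissues every required gene is expressed above a threshold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE glycan (
    glycan_id INTEGER PRIMARY KEY,
    structure TEXT NOT NULL              -- linear-notation string (illustrative)
);
CREATE TABLE glycan_biosynthesis (       -- which genes a structure requires
    glycan_id INTEGER NOT NULL REFERENCES glycan(glycan_id),
    gene      TEXT NOT NULL
);
CREATE TABLE glycogene_expression (      -- expression level per gene and tissue
    gene   TEXT NOT NULL,
    tissue TEXT NOT NULL,
    level  REAL NOT NULL
);
""")

# Placeholder rows; none of these values are real measurements.
conn.executemany("INSERT INTO glycan VALUES (?, ?)", [
    (1, "Gal(b1-4)GlcNAc"),
    (2, "Neu5Ac(a2-3)Gal(b1-4)GlcNAc"),
])
conn.executemany("INSERT INTO glycan_biosynthesis VALUES (?, ?)", [
    (1, "GENE_A"), (2, "GENE_A"), (2, "GENE_B"),
])
conn.executemany("INSERT INTO glycogene_expression VALUES (?, ?, ?)", [
    ("GENE_A", "liver", 8.0), ("GENE_B", "liver", 0.5),
    ("GENE_A", "brain", 6.0), ("GENE_B", "brain", 7.0),
])


def candidate_tissues(connection, threshold):
    """List (structure, tissue) pairs where all required genes exceed threshold.

    Assumes every gene has an expression row for every tissue, so the minimum
    over the joined rows covers all genes a structure requires.
    """
    query = """
        SELECT g.structure, e.tissue
        FROM glycan AS g
        JOIN glycan_biosynthesis AS b ON b.glycan_id = g.glycan_id
        JOIN glycogene_expression AS e ON e.gene = b.gene
        GROUP BY g.glycan_id, e.tissue
        HAVING MIN(e.level) > ?
        ORDER BY g.glycan_id, e.tissue
    """
    return connection.execute(query, (threshold,)).fetchall()


for structure, tissue in candidate_tissues(conn, threshold=1.0):
    print(f"{structure} could plausibly be made in {tissue}")
```

A production resource would need far more than this (curation status, literature references, links to protein and phenotype records), but even a small relational core makes cross-dataset questions of this kind a single query.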

To circumvent these challenges, it is critical that a centralized glycan database be created in which all glycan structures that have been sequenced and published are registered. This database could then be expanded to include information on gene expression and organism phenotyping data. Also needed are reporting standards that specify the minimum information that should be reported about a dataset or an experimental process so that a user can interpret and use the data entered. This may require manual independent curation of data, although it should be possible to develop a curation system to assist in annotations to a certain extent. In addition, there needs to be a glycan equivalent of the Phred score for nucleic acid bases that can provide the user with a measure of the level of certainty of information on a given linkage in a given structure in a database (Ewing and Green 1998; Ewing et al. 1998). This will allow incomplete yet useful structural data to be included in databases.

Currently, the GlycomeDB database has incorporated many major databases in a way that consolidates unique structures and provides links so that the original database entries can be retrieved (Ranzinger et al. 2011). In contrast, the GlycoSuiteDB database is a manually curated database of structures from the literature, and thus the number of entries is small: fewer than 4,000 versus more than 36,000 in GlycomeDB (Cooper et al. 2003). The GlycomeDB total represents the sum of the several incorporated databases, and only about 1,000 of the structures are fully characterized, used in biologically known pathways, and nonredundant. Similarly, GlycoSuiteDB contains approximately 1,500 eukaryotic structures fulfilling those requirements.
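
The Phred score mentioned above is defined for nucleic acid base calls as Q = -10 log10(P), where P is the estimated probability that the call is wrong. The Python sketch below shows how the same scale could, hypothetically, be attached to individual linkage assignments in a glycan record. How the error probability of a linkage assignment would actually be estimated is an open question that the sketch does not address; the example linkages and their probabilities are invented placeholders.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Phred-scaled quality score: Q = -10 * log10(P_error)."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error_probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Inverse transform: P_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


# Hypothetical linkage assignments with invented error estimates. The "?" marks
# a linkage position that was not determined experimentally.
linkage_assignments = [
    ("Gal(b1-4)GlcNAc", 0.001),  # well-supported assignment
    ("Fuc(a1-?)GlcNAc", 0.25),   # position uncertain
]

for linkage, p_error in linkage_assignments:
    print(f"{linkage}: Q = {phred_quality(p_error):.1f}")
```

On this scale an error probability of 0.001 corresponds to Q = 30 and an error probability of 0.01 to Q = 20, so a single number per linkage would let users filter a database by certainty while still retaining partially characterized structures.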
In general, the major glycan structure databases (EuroCarbDB, KEGG Glycan, the Bacterial Carbohydrate Structure Database [BCSDB], and the Consortium for Functional Glycomics [CFG]) contain about 10,000 structures each on average, although it should be noted that, with the exception of BCSDB, most of these contain mainly eukaryotic glycans. These databases also mainly contain N- and O-linked glycan structures, whereas glycolipid and proteoglycan structures are few. Moreover, fully characterized glycan structures (including all linkage information) are limited to about 2,000 structures. Therefore, several issues must be addressed in developing a comprehensive glycan (or glycoconjugate) structure database.

5.6.2.1 Standardized representations of glycan (or glycoconjugate) structures

Because glycan structures are not linear, a simple single-letter code for monosaccharides is insufficient to represent glycan structures accurately. This is further complicated by the various naming schemes of monosaccharides. While a database of monosaccharides is currently available (MonosaccharideDB; http://www.monosaccharidedb.org), different researchers prefer to use different methods for representing glycan structures, including IUPAC, LINUCS (Bohne-Lang et al. 2001), Linear Code (Banin et al. 2002), GlycoCT (Herget et al. 2008), and KCF (Aoki-Kinoshita 2010). Although Glyde-II (Sahoo et al. 2005) has been established as the standard format for exchanging glycan structures, it is not human readable. There are also a number of ways to graphically represent glycan structures as cartoons, including the system originating from Stuart Kornfeld, expanded and optimized by Varki et al. (2009) and adopted by the CFG, and the Oxford system (Harvey 2011).

To resolve this issue, informatics methods will need to convert accurately across different formats, which involves creating a knowledge base on the chemical structures behind the nomenclature for each naming scheme so that the residues are mapped accurately. MonosaccharideDB stores monosaccharide data as chemical information and provides mappings to various database formats. Database developers will need to keep in mind the various formats that are available and allow queries using different formats; this may be possible by linking glycan structure components to MonosaccharideDB.

5.6.2.2 Comprehensive representation of glycan and glycoconjugate structures

To characterize the cellular glycome accurately, development of more sensitive analytical techniques, likely including NMR and mass spectrometry, will be vital. In turn, the development of informatics methods to aid in structure determination will also be important, including ones that can integrate data from multiple techniques. It is likely that the development of such informatics methods will require collaboration among computer scientists, analytical chemists, and others.

5.6.2.3 Standard ontology for glycan function and localization

An ontology for representing glycan structures has been proposed, called "GlycO."
However, beyond structures, a formal representation of glycans and how they were determined, their functions, and their relationship to other molecules still needs to be established. MIRAGE (the Minimum Information Required for a Glycomics Experiment standard) is currently being developed as a reporting standard for glycomics experiments, based on MIAME and MIAPE. MIRAGE aims to specify "the minimum information that should be reported about a data set or an experimental process, to allow a reader to interpret and critically evaluate the conclusions reached, and to support their experimental corroboration." Such a standard will serve as the first step toward establishing a well-documented glycan structure database that can be linked back to the original experimental data. Further ontologies for annotating glycan function may be similarly based on existing ontologies for genes and proteins.

5.6.2.4 Links to protein, lipid, and other related databases

To integrate knowledge about glycans with the broader community, glycan structures registered in any database should be linked to the proteins, lipids, cells, and other entities to which the glycan structures were bound or in which they were found. Furthermore, the proteins, viruses, and other entities that bind glycans must be documented and linked wherever possible. Currently, to the committee's knowledge, the UniProt database is the only major protein database that contains information regarding potential glycosylation sites in amino acid sequences. To get to this information, however, the user must know to look for it in UniProt, because it is not directly accessible from GenBank or InterPro. Such links to the major protein and lipid databases will facilitate more communication with other related fields, and some progress is occurring in this area. Glycan information in GlycoSuiteDB is currently linked to UniProt. There are plans to link it to the UniCarbKB database, conceived as a combination of GlycoSuiteDB and EuroCarbDB. Bioinformatics methods can also be applied more easily when linked with larger resources of data. Additionally, this effort should link with other structural biology efforts aimed at defining the conformation of glycan structures and their interaction with binding partners, because conformation has proven to be one of the driving parameters for specificity and affinity.

5.6.3 Key Messages on Glycan Bioinformatics and Databases

The current challenge for the bioinformatics field is to develop a unified, curated, stable database, with long-term funding, that encompasses glycobiology in a broader context.
Although significant efforts have been made, a range of issues remain to be addressed, and information about glycans is not accessible in a manner similar to other types of biological information. A particular challenge is standardization and annotation of glycan information for databases, including representation, level of structural certainty, and minimal information.

The development of a unified and integrated database resource not only would aid the field directly but would also help scientists from other disciplines, including clinicians, better appreciate, understand, and become involved in glycoscience. There is a need to develop bioinformatics tools that can make connections between disease and glycan structure and represent those connections in a straightforward manner. The initial efforts to create a database worthy of long-term support will require focus regarding its content and function, as defined by consensus of the community that will use it. Such a database will need to do a few things very well in a sustainable and unambiguous way that is independent of new methodologies. One first step could be the creation of a centralized structural database that can be extended by connecting it to other resources. Such a database must be based at a centralized location to assure long-term stability and continuity and cannot be dependent on any individual scientist or institution. Other supplemental databases with incomplete information may add value if made available in parallel to a fully curated and centralized database. A revolution in the development of such databases would bring other scientists into the field, demystify it, and provide a tool to educate individuals about glycoscience.

5.7 SUMMARY AND FINDINGS

As this chapter makes clear, a diverse suite of tools is available to synthesize glycans; understand glycan structures, functions, and interactions; and share and communicate glycan information across the research community. However, important limitations in the toolkit currently restrict glycoscience to a field that is actively practiced by only a relatively small group of specialists. Existing tools are useful and provide a base from which to answer glycoscience questions, but they are not adequate to advance the field to the point where it can realize its potential widely across biology, chemistry, and materials science. New energy and creative solutions, stemming not only from glycoscience specialists but also from many others in the broader scientific community, will be needed to address some of these technical challenges. As a result, the committee finds that:

Scientists and engineers need access to a broad array of chemically well-defined glycans.
Over the past 30 years, tremendous advances have been made in chemical and enzymatic synthesis of glycans, but these methods remain relegated to specialized laboratories capable of producing only small quantities of a given glycan. For glycoscience to advance, significant further progress in glycan synthesis is needed to create widely applicable methodologies that generate both large and small quantities of any glycan on demand.

A suite of widely applicable tools, analogous to those available for studying nucleic acids and proteins, is needed to detect, describe, and fully purify glycans from natural sources and then to characterize their chemical composition and structure.

Continued advances in molecular modeling, verified by advanced chemical analysis and solution characterization tools, can generate insights for understanding glycan structures and properties.

An expanded toolbox of enzymes and enzyme inhibitors for manipulating glycans would drive progress in many areas of glycoscience.

A centralized accessible database linked to other molecular databases is needed to fully realize advancements in knowledge generated by an expanded effort in glycoscience. Glycan information is not currently accessible to the research community in an integrated and centralized manner similar to other biological information.