New Approaches for Identifying Unintended Changes in Food Composition
Many advances have occurred in recent years that have extended our ability to determine the chemical composition of food and other biological material with greater depth, accuracy, and precision. Improved analytical methodology has many benefits in the study of food composition, including a far more in-depth understanding of nutrient content, relationships between chemical composition and acceptability (e.g., patterns of flavor compounds), and the safety of the food material.
This improved analytical capability should, in principle, also provide a basis for evaluating the compositional differences between particular foodstuffs as a function of genetic variables, environmental factors, and agricultural practices. It is this comparative approach that is perhaps the most important in the safety evaluation of new food items derived from all means of genetic modification, including conventional breeding techniques, mutagenesis techniques, or genetic engineering (see Operational Definitions in Chapter 1).
Important advances in analytical methodology for nucleic acids, proteins, and small molecules have occurred over the past decade as a result of concurrent advances in technology and instrumentation. Many such techniques are becoming relatively more user-friendly, and instrumentation has become available at lower cost. Consequently, more laboratories have the ability to conduct detailed analyses of food composition. This situation has highlighted the need for validating and standardizing methods, certifying laboratory performance through both time and location, and having reliable standards and certified reference materials that are broadly and uniformly available. There is a great need for improvement in all of these areas, although it is clear that analytical techniques, such as profiling, will continue to develop and improve independent of the need to apply these methods to the assessment of genetically modified (GM) and genetically engineered (GE) foods.
Advanced molecular genetic, proteomic, and metabolite profiling techniques are rapidly developing technologies that have the potential to provide an enormous amount of data for a given organism, tissue, or food product. The levels of analysis include:
Deoxyribonucleic acid (DNA) sequence analysis (i.e., the complete sequence of an organism’s genome or the targeted sequencing of a transgene insertion site to determine whether insertion into the genome is in a location likely to affect the expression of adjacent genes).
Gene expression analysis to determine alterations in the levels of messenger ribonucleic acid (mRNA) species.
Protein analysis to determine the pattern, identity, and relative abundance of specific proteins (i.e., Are proteins of catalytic, allergenic, or toxicological concern present?).
Specific organic compounds, especially small molecules and trace elements whose presence, pattern of relative concentrations, and absolute concentration provide information of nutritional, antinutritional, and toxicological relevance.
In theory, these data sets could be used, singly or in combination, in comparative studies to assess the nutritional quality and chemical composition of food in relationship to the environment, genetics, naturally occurring or induced mutations, and genetic engineering. A possible secondary benefit is the generation of a wealth of data that may ultimately contribute to a better understanding of the fundamental linkages between food composition and health.
An ideal situation for any analytical procedure would provide the following information:
The absolute structural identification of all compounds in a sample being analyzed.
The absolute quantification of all compounds, taking into account varying recovery and detection sensitivity for each compound in the sample.
The biological or biochemical impact of each compound (positive, neutral, or negative) in isolation and in a complex mixture at a given dosage.
The relative nutritional (or antinutritional) importance in the human diet of a compound from a given food, and the significance of modifying the concentration of this compound, on the overall nutrient profile of the general population.
The ability to perform predictive modeling of the changes to a target food organism’s metabolism and physiology as the result of a transgenic event and predictive modeling of the biological consequences of these changes to human health.
Although these items represent the analytical ideal, in almost all of these instances the current procedures for chemically analyzing food components and assessing the impact of food components on health fall well short of ideal. This chapter discusses various approaches to food analysis involving advanced and emerging analytical methods and their application. The discussions in this chapter are meant to apply to plants, animals, and microbes. Plants are the most frequently cited example because the introduction of transgenic plants into the food supply is much more pervasive and advanced than that of either animals or microbes. It should be recognized that improvements are occurring rapidly for both targeted and untargeted (i.e., profiling) methods.
TARGETED QUANTITATIVE ANALYSIS VERSUS PROFILING METHODS
Two basic analytical approaches exist, and each has merit in certain applications. Targeted quantitative analysis is the traditional approach in which a method is established to quantify a predefined compound or class of compounds (e.g., amino acids, lipids, vitamins, or RNAs for specific genes). In contrast, profiling methods involve the untargeted analysis of a complex mixture of compounds extracted from a biological sample with the objective of determining the pattern of detected constituents. For proteins and metabolites this is most often accomplished by chromatographic (e.g., gas chromatography-mass spectrometry [GC-MS] or liquid chromatography-mass spectrometry [LC-MS]), electrophoretic, or spectral (e.g., nuclear magnetic resonance [NMR]) means, while for nucleic acids methods based on sequence-specific hybridization are used.
The ultimate goal in profiling methods is to quantify and identify all compounds present in a sample (i.e., RNA, protein, and metabolites). This goal is closer to being realized for RNA (the expression of genes) due to advances in gene chip technology and the fact that all DNA and RNA are nucleic acids built from a small set of chemically similar nucleotides. Complete quantification and identification of all proteins and metabolites in a sample is still only a theoretical possibility for the reasons discussed in the following sections.
Profiling methods in general are intended to determine the relationship between the pattern of components and a quantitative attribute (e.g., as used widely in the sensory analysis field to evaluate compounds associated with desirable flavor or odor attributes) and to identify differences in the composition of samples by comparing chromatographic and/or spectral patterns derived from complex mixtures. A positive characteristic of the profiling methods is the fact that they allow comparison of patterns of constituents and detection of compositional differences without the requirement for identification of all of the compounds or an understanding of the functions of all genes in an organism.
The inherent difficulties, however, in identifying all of the constituents detected in profiling methods or understanding the activity and potential biological consequences of all genes in an organism severely limit the usefulness of these methods for predictive purposes, especially in extremely complex samples, such as most plant and animal tissues and products. In addition, profiling methods are limited by issues of sensitivity, constraints imposed by sample extraction and preparation methods, and possible artifacts generated in sample handling, extraction, and preparation. This directly impacts the ability to assess the biological consequences of any changes (real or artifactual) that are observed. Selected examples of both targeted analysis and profiling methods, which illustrate the advantages and current limitations, are discussed below.
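The pattern comparison on which profiling methods rely can be illustrated with a minimal sketch. It assumes that each profile has already been reduced to a table of peak intensities keyed by a peak label; the labels, the intensities, and the two-fold threshold are hypothetical and are not drawn from any study cited in this chapter. The sketch flags peaks that differ between a reference sample and a test sample without identifying the underlying compounds.

```python
import math

def compare_profiles(reference, test, fold_threshold=2.0):
    """Flag peaks that differ by at least fold_threshold or occur in only one profile.

    Both arguments map a peak label (hypothetical, e.g. a retention-time tag)
    to a detected intensity in arbitrary units.
    """
    flagged = {}
    for peak in sorted(set(reference) | set(test)):
        ref = reference.get(peak, 0.0)
        new = test.get(peak, 0.0)
        if ref == 0.0 and new == 0.0:
            continue  # not detected in either sample
        if ref == 0.0:
            flagged[peak] = "only in test"
        elif new == 0.0:
            flagged[peak] = "only in reference"
        else:
            log2_fc = math.log2(new / ref)
            if abs(log2_fc) >= math.log2(fold_threshold):
                flagged[peak] = f"log2 fold change {log2_fc:+.2f}"
    return flagged

# Hypothetical peak tables for a conventional line and a new line.
reference_profile = {"rt_5.2": 100.0, "rt_7.8": 50.0, "rt_9.1": 20.0}
test_profile = {"rt_5.2": 105.0, "rt_7.8": 200.0, "rt_12.4": 15.0}

differences = compare_profiles(reference_profile, test_profile)
for peak, note in differences.items():
    print(peak, note)
```

A flagged peak in such a comparison indicates only that the pattern differs. Whether the difference is real or artifactual, and whether it has any biological consequence, cannot be determined from the comparison itself; in practice replicate samples would also be needed to separate genuine differences from technical and biological variation.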
General Considerations for Accuracy, Reproducibility, and Artifacts in Analysis of Food and Other Biological Material
Important characteristics of any analytical technique are the precision and accuracy of the chosen method, the robustness and reproducibility of the method within and between laboratories, and the ongoing identification and appreciation of any potential sources of artifacts in the methods employed. Such issues are often broadly grouped into the categories of technical variation, biological variation, and artifacts. The impact of these different categories can vary greatly among different analytical procedures and, thus, the following guidelines are intended only for general consideration.
Technical variation arises from small differences in the reactivity and stability of a compound and the chemistry and physical properties of the analytical procedure. Technical variation is most often determined as the variation from the mean following the repeated processing and analysis of the same sample. It is generally the lowest source of variation in an analytical method, but it is not identical for all compounds and can vary several-fold for different compounds targeted in an analysis. Technical variation generally ranges from 1 to 20 percent and is due primarily to the differential stability and chemical reactivity of individual compounds during extraction, isolation, separation, and quantification procedures.
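As an illustration of how technical variation might be expressed, the following sketch computes the percent coefficient of variation (sample standard deviation divided by the mean) from repeated analyses of the same sample. The coefficient of variation is one common way to summarize variation from the mean and is used here as an assumption; the compound names and replicate values are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(replicates):
    """Return the percent coefficient of variation: sample SD / mean * 100."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicate measurements")
    m = mean(replicates)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(replicates) / m * 100.0

# Hypothetical repeated analyses of one homogenized sample (arbitrary units).
replicate_values = {
    "compound_A": [10.0, 10.2, 9.8, 10.1, 9.9],  # chemically stable analyte
    "compound_B": [4.0, 5.0, 3.5, 4.5, 3.0],     # labile analyte
}

technical_cv = {
    name: coefficient_of_variation(values)
    for name, values in replicate_values.items()
}
for name, cv in technical_cv.items():
    print(f"{name}: {cv:.1f} percent")
```

With these hypothetical values the two compounds give coefficients of variation of roughly 1.6 and 19.8 percent, which shows how technical variation can differ several-fold between compounds measured in the same analysis.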
Biological variation, in general, is often several-fold higher than technical variation, is independent of the analytical procedure used, and can vary significantly for each compound in an analysis. Biological variation most likely arises from small interindividual differences in the growth and development of organisms and from the interactions of an organism and the environment under apparently identical conditions.
Artifacts are, by their very nature, unpredictable and can arise from alternative reactions with reagents, with or among endogenous sample compounds, or from interactions with components inherent to the analysis (e.g., column matrices or buffer components). Artifacts can generate signals, peaks, or compounds that are not present in the original sample, or they can induce the disappearance or reduction of genuine peaks, signals, or compounds that are present in the original sample. The potential for artifacts in any procedure is an unavoidable consequence of extraction, isolation, chemical modification, and detection.
For targeted analyses (e.g., particular vitamins or amino acids) technical variation and artifacts can often be minimized as only a limited number of compounds of often similar chemistry are being targeted and analyzed. Extraction and manipulation procedures can therefore be optimized to favor the isolation of target compounds over other compounds, while simultaneously excluding many potentially interfering compounds. Similarly, the chemical manipulations required for separation and detection also can be tailored to favor the target compounds, and the potential for artifact generation is often reduced.
Profiling methods often utilize extraction procedures that are not selective and allow a wide range of compounds (with varying chemistries) to be isolated, including any interfering or confounding compounds that may be present. Similarly, the methods of separation and detection are often a compromise to allow for a broad range of compounds to be detected and, hence, the potential for higher degrees of technical variation and artifacts is increased. It should be recognized that biological variation is essentially independent of analytical technique and would be similar for both targeted and profiling techniques.
Considerations and Strategies in Targeted Analysis of Metabolites and Other Constituents
Advances in analytical chemistry have been applied widely to the quantitative analysis of specific compounds and classes of compounds in food and related biological material. For many nutrients, toxic compounds, and other food constituents, modern methodologies allow a more sensitive, precise, and rapid analysis than could previously be accomplished. However, complicated aspects of sample preparation often limit laboratory throughput, and many issues of calibration, standardization, and other quality control considerations remain to be satisfactorily resolved.
In spite of the overall advances in food analysis, a complete analysis of all nutrients and potentially relevant phytochemicals and other compounds of interest remains an arduous task for even the most advanced analytical laboratory because the complete identification and quantification of all compounds in a sample, plant, or product is yet to be accomplished. Thus the development of a paradigm of analytical requirements that focuses on the most important compositional questions would be the most prudent in terms of effective use of analytical resources and consumer safety. It is better to devote analytical resources to a thorough determination of the most nutritionally and toxicologically relevant compounds than to broaden the analysis unnecessarily by including analyses for compounds that have little importance to overall health and food safety. The following discussion illustrates this rationale and includes representative examples. This should not be viewed as an exhaustive list.
There does not appear to be an appropriate one-size-fits-all approach for targeted analysis when evaluating the safety of new food derived from genetic engineering or conventional breeding. However, as stated above, it is reasonable to propose a minimal set of analyses, in addition to basic proximate composition, as a starting point. As discussed below there are several variations on and extensions of current analytical practice that can achieve a more complete understanding of the pattern of nutrients, toxicants, antinutrients, and other relevant constituents. In this framework, requirements for additional specific analyses could be selected on a case-by-case basis according to the particular chemical composition (e.g., nutrient profile) and potential risks associated with a given type of food product.
A realistic goal in the analytical evaluation of GE food is measuring the content of relevant essential micronutrients (vitamins and minerals), essential macronutrients (essential amino acids and fatty acids), nonessential nutrients of importance to health (e.g., dietary fiber, total fat), antinutrients (e.g., enzyme inhibitors and lectins), and known toxicants. With respect to nutritional composition, primary analytical attention should be directed toward those compounds of greatest importance. For example, legumes (such as soybeans) do not constitute a significant source of dietary ascorbic acid, but they can be an important source of folate for some populations. Analytical requirements and priorities should be established accordingly.
The nature of the genetic change introduced also should be considered in establishing analytical requirements. For example, a product in which genetically introduced changes alter total protein content, expression of a specific protein or synthesis of one or more amino acids should trigger a requirement for analysis of the full pattern of amino acids. Similarly, alterations affecting the synthesis of any type or class of lipids (e.g., altered fatty acid profile or altered sterol content) should trigger a requirement for full analysis of a fatty acid profile, as well as other lipid classes and related compounds (e.g., fat-soluble vitamins).
The potential to improve the nutritional quality of a food by genetically upregulating the biosynthesis or storage of nutrients or other plant components already has been demonstrated (Yan and Kerr, 2002), and this continues to be an active area of research and development. Such changes can be achieved through breeding, by genetic engineering of the enzymes of an entire biosynthetic pathway or enzymes that have a high impact on product synthesis or accumulation, or by the biosynthesis of a limiting precursor. Compositional engineering of this type raises the potential for unintended changes in the chemical composition of the resulting food. This could occur through changes in the overall flux of total carbon or alterations of flux through pathways that supply multiple aspects of plant or animal metabolism (e.g., isoprenoid or methyl group synthesis).
There also is the potential that increasing the concentration of a precursor molecule could lead to greater concentrations of that chemical’s metabolites. For example, plants often have a large capacity to glycosylate (a biochemical modification to add carbohydrate structures to a molecule) many compounds, and an increase in a precursor molecule that is subject to such glycosylation could lead to an increase in its glycosylated derivatives. In addition, increasing the concentration of a chemical compound, such as a vitamin or other phytochemical, could lead to an increased concentration of catabolic products. While these examples may or may not be relevant to food safety, they illustrate several ways in which unexpected compositional changes could occur.
Vitamins. Many vitamins exist as a group of structurally related chemical compounds. In some cases, all of the various forms of a vitamin exhibit approximately the same biological activity and bioavailability for humans. However, in other cases, the various forms may have different biological properties. For example, members of the carotenoid family exhibit large differences in vitamin A activity.
In the case of vitamin E, understanding of the vitamin E activity of compounds has changed recently, and this should be reflected in the interpretation of analytical data. Although all members of the tocopherol and tocotrienol families exhibit antioxidant activity, only alpha-tocopherol and its esters appear to exhibit activity in satisfying the human nutritional need for vitamin E (IOM, 2000). Any change in the alpha-tocopherol content, regardless of the relative proportions of other tocopherols or tocotrienols, would alter the net nutritional value of the food product. This principle was demonstrated in an experimental enhancement of tocopherol synthesis that, when accompanied by an engineered increase in the conversion of gamma-tocopherol to the more nutritionally active alpha-tocopherol, led to a large increase in vitamin E activity (Shintani and DellaPenna, 1998).
Another consideration with respect to vitamins and the selection or development of targeted analytical methods is the relationship between chemical form and bioavailability. A classical example is the heterogeneous group of conjugated or “bound” forms of niacin that exist in corn and certain cereal grains. Little is known about the genetic or environmental factors that affect the conversion of niacin compounds to such unavailable forms, but they have been shown to change with maturation in corn (Wall et al., 1987). In grains, free (i.e., bioavailable) niacin should be measured using contemporary methods specific for the available forms of this vitamin, in addition to the total niacin reported in many classical analyses. It should be noted that this issue would be of lesser importance for grain products destined to be used in food or feed in which nutritional supplementation, enrichment, or fortification with niacin is practiced.
The potential for alteration in vitamin bioavailability also exists in many plants for vitamins other than niacin that can undergo glycosylation (Gregory, 1998). For example, a substantial fraction of the vitamin B6 in many foods of plant origin can exist as a beta-glucoside that exhibits only partial bioavailability in humans. A change in the proportions of free and glycosylated forms of vitamin B6 could alter the overall nutritional characteristics of a fruit, vegetable, or grain. Thus a focused analysis that included a measurement of all forms of this vitamin, glycosylated and nonglycosylated, or a full assessment of nutritional properties in plant substances that are important sources of this vitamin would be necessary.
Amino Acids. Many advances have occurred in the measurement of the amino acid content of food and other biological material, such as improved resolution, speed, and sensitivity. Genetic engineering and conventional breeding practices have the potential to alter the amino acid content of food either intentionally or unintentionally, and such changes have the potential to be nutritionally important, especially in plant-derived food. The limiting essential amino acid in legume crops is typically methionine, while cereal grains are generally limiting in lysine and/or threonine. Genetically induced changes in protein expression could either lessen or accentuate these nutritional limitations, as could changes in the biosynthesis of the essential amino acids.
Fatty Acids. Changes in the concentration of total fat and the profile of fatty acids in oilseed crops can have significant nutritional effects, and such compositional changes can be mediated through both genetic engineering and nongenetic-based engineering methods (Thelen and Ohlrogge, 2002). GC methods allow measurement of the full distribution of fatty acids and should be included in the focused analysis of oilseed crops and other organisms that contribute significantly to dietary lipid intake. Modern high-resolution GC allows for the detection of a wide range of fatty acids and their geometric and positional isomers. This analysis should include identification, by GC-MS, of novel fatty acids detected in new food being evaluated, whether the food was derived from genetic engineering or from conventional breeding.
Dietary Fiber and Related Constituents. It is not likely that major changes in total dietary fiber content would occur in new plants derived from either genetic engineering or conventional breeding. However, in view of the importance of plants as the primary sources of dietary fiber and the potential for dietary fiber constituents to affect the bioavailability of certain nutrients, measurement of total dietary fiber should be performed. Further analysis to quantify the individual classes of fiber constituents seems unnecessary unless evidence of changes in total dietary fiber is found.
A report of increased lignin in stalks of various lines of GE (Bt) corn (Saxena and Stotzky, 2001) suggests that compositional changes in dietary fiber should not be overlooked, although more specific analytical methods, including the use of a rigorous sampling regimen, would be required to allow a more comprehensive interpretation of this reported phenomenon. Measurement of total dietary fiber, as well as the various major dietary fiber constituents, would be necessary. The concentration of other potentially undesirable constituents should also be considered when appropriate. For example, has the concentration of galactosyl sucroses (flatulence factors) been increased in new lines of legumes or certain other plants? Contemporary high-performance LC (HPLC) and LC-MS methods are well suited for such a separation and quantitative analysis of these oligosaccharides.
Food Constituents that Potentially Affect Nutrient Bioavailability
Although the concept of nutrient bioavailability is very complex, several factors that affect bioavailability are sufficiently well-characterized to merit their inclusion in a targeted analysis of relevant food. For example, phytate is a common constituent of cereal grains that affects the bioavailability of many divalent cationic minerals (e.g., zinc and iron). Oxalate is commonly found in green leafy vegetables and affects mineral bioavailability by a mechanism similar to that of phytate. Another example is the digestibility of protein from oilseeds, which is known to be poor in the uncooked state. This is due to the presence of various enzyme inhibitors and lectins, as well as to the natural resistance of native oilseed proteins to digestion. The measurement of these components would be prudent in new lines of soybeans and other oilseeds.
Biologically Active Non-nutritive Compounds
As stated above, biologically active and potentially toxic compounds should be selected for analysis as appropriate on the basis of their natural occurrence. The following is a brief discussion of several plant components that should also be considered.
Mycotoxins. The importance of determining secondary toxins from other organisms (e.g., mycotoxins) should not be overlooked. It is possible that compositional and structural changes in plant tissues due to genetic engineering could make the plants either more or less susceptible to mycotoxin contamination as a result of differences in insect resistance and, hence, susceptibility to mold infestation and contamination with mycotoxins (e.g., aflatoxins and fumonisins). For example, several reports indicate that corn engineered to express the Bacillus thuringiensis toxin has lower fumonisin content, presumably due to decreased insect herbivory and, hence, decreased introduction of fungi into the plant tissues (Bakan et al., 2002; Clements et al., 2003; Dowd, 2001; Duvik, 2001). In view of the potential for such variability in susceptibility, analysis of mycotoxins, as is already routine practice, continues to be warranted. Great advances have been made in the measurement of mycotoxins using immunochemical and traditional HPLC methods.
Phytoestrogens and Other Non-nutritive Bioactive Constituents. Isoflavones and lignans are common constituents of soy and related legumes that exhibit estrogenic activity and have some potential for enhancement through genetic engineering (Liu et al., 2002). Because these compounds may have positive health effects for some consumers and adverse effects when present in excessive quantities, an understanding of phytoestrogen content and bioactivity is important. Thus phytoestrogens should be considered candidates for routine targeted analysis of relevant plants. The results of such analyses should be interpreted with caution, however, because of the potential for large variability in phytoestrogen levels among samples as a function of both genetic (i.e., variety) and environmental factors (Lee et al., 2003).
Hormones. Changes in the levels of a hormone (e.g., as a consequence of biotechnology) may produce a variety of effects, depending on the magnitude of the change. Hormone actions are complicated by several factors. Hormones cannot act unless there is an activated receptor. Thus, if the process of genetic modification or engineering does not affect expression of the receptor, the likelihood of a metabolic effect would be minimal. Additionally, hormones often have indirect actions on nontarget tissues. For example, stromal cells that express receptors for sex steroids also produce growth factors, which affect overlying epithelial cells. Identification of an unintended adverse effect would require an assessment for effects on multiple cell types in order to detect potential downstream effects and determine whether a potential risk to health may result. Testing criteria for GM, including GE, products with respect to primary hormones and potential downstream effectors may be useful to identify unintended health effects.
Alkaloids. The types of alkaloids present in plant tissue are highly species-dependent (Ashihara and Crozier, 2001; Verpoorte and Memelink, 2002). These include, but may not be limited to, gossypol, tomatine, caffeine, and solanine. Contemporary HPLC, GC, LC-MS, and GC-MS methods facilitate rapid and sensitive targeted analyses of specific alkaloids.
Targeted Analysis of DNA, RNA, and Proteins
To provide a complete picture of the genetic and compositional changes of food produced by either genetic engineering or conventional breeding, a targeted analysis using the tools of modern molecular biology should be used to provide information regarding the specific genetic changes that have occurred.
For example, a Southern blot or similar analysis could be used to confirm the introduction of one or more new genes. Sequencing upstream and downstream from the transgene would provide information regarding the site of the insertion and the possibility that the insertion has disrupted another gene or its regulatory element.
It may also be prudent to conduct a targeted analysis of the transgene transcript (and genes adjacent to the insertion site) by using gene-specific probes and methods such as Northern blot analysis or real-time polymerase chain reaction. Such an analysis would verify and quantify the extent, developmental timing, and tissue specificity of transgene expression and would ascertain that the expression of adjacent genes is not affected in the transgenic line. Perhaps most important in a food safety assessment is the measurement of the expression of the protein encoded by the transgene and, if enzymatically active, the concentration of the reaction products and their metabolites, as discussed above. Information on approaches to safety assessment of GM foods is also available in the Report of the Fourth Session of the Codex Ad Hoc Intergovernmental Task Force on Foods Derived from Biotechnology (FAO/WHO, 2003). The novel protein may also be assessed for allergenic potential and other safety considerations as discussed in Chapter 6.
NONTARGETED ANALYTICAL METHODS FOR METABOLITES
The term metabolomics has been used to describe the nontargeted analysis of small organic molecules in a complex sample. In theory the method should allow unambiguous identification and precise and reproducible quantification and detection of all chemical constituents of a sample, even those varying in concentration by several thousand-fold. In practice the chemical and physical manipulations required for the method, combined with the extreme diversity of the compounds being analyzed, do not allow all compounds to be studied in a single analysis.
The differential requirements for solubility, stability, and detection of different compounds and compound classes, coupled with the current limitations for absolute structural identification, pose significant limitations on the application of this methodology for use in detecting and determining the unintended consequences in a food. Indeed, the complete identification and quantification of all compounds in a sample, plant, or product are still far from a reality, even for the most intensively studied model biological systems (e.g., E. coli, Arabidopsis, yeast). Additionally, as discussed earlier and below, even if and when an unintended consequence is demonstrated, whether as a result of breeding, chemical mutagenesis, or genetic engineering, it is most often difficult, if not impossible, to predict, based on this information alone, the effect (if any) the change might have on human or animal nutrition, biology, and health, even for extremely well-studied compounds.
Improvements in chromatographic and spectroscopic instrumentation, innovative chromatographic and electrophoretic separation techniques, and enhanced data-processing capabilities have led to major improvements in our understanding of food composition. In particular, separation techniques (e.g., HPLC, GC, and capillary electrophoresis) have greatly extended the capability of analysts to resolve the components of complex mixtures for both small molecules and macromolecules.
Major advances in MS instrumentation have led to the widespread availability of compact, highly sensitive, relatively inexpensive, and user-friendly HPLC-MS and GC-MS equipment suitable for quantitative and qualitative analysis. Such analytical improvements also have led to a better ability to detect and quantify compositional changes associated with biological variation and with variables such as agricultural practices, climate, and genetics. In addition, the application of such techniques has led to the recognition that the composition of biological material, especially plants, is far more complex and variable than previously believed, and that the majority of plant chemical constituents have yet to be identified and structurally characterized.
The potential of these contemporary methods in the detection of compositional changes in plant tissues has been discussed in reviews by Kuiper and associates (2001, 2003). These same techniques are also applicable to the analysis of metabolites in tissues and fluids from humans and test animals, and these may have applicability in the evaluation of the metabolic impact of changes in food composition (German et al., 2003).
Advantages and Disadvantages
It is believed that plants, from a purely biochemical perspective, are among the most chemically complex organisms on the planet. The total number of different compounds produced by all plants is currently estimated to be between 100,000 and 200,000 and is likely to be several times higher as analytical methodologies improve (Fiehn, 2002). The number of compounds produced by a single plant species may vary between 5,000 and 10,000. Given the complexity of metabolites in plants, no single analytical methodology is currently available that will achieve resolution and quantification of all compounds in a plant tissue. Therefore, several different methods are often used, individually or in parallel, to attempt to resolve and quantify, and where possible, to absolutely identify compounds in a plant mixture.
Three of the more common methodologies, GC-MS, LC-MS, and NMR, are briefly described below. For all three methods and for the study of plant metabolism in general, major limitations include:
A lack of universally accepted standardized methods for extraction, separation, and quantification of metabolites
A lack of spectral libraries that would allow the unambiguous identification of a peak from a given analysis
A need for improved data management and data-mining systems in order to derive useful information from data sets generated through research.
Another limiting factor is the lack of commercial availability of many known natural products, thus making the task of generating spectral standards for identification more difficult. A consequence of recent consolidation in the preparative chemical industry has been a reduction in the catalog of commercially available chemicals. Efficient application of these analytical techniques may require setting up a resource capable of producing purified natural products either from natural sources or from chemical synthesis.
One of the most widely used and robust analytical methods for metabolite profiling is GC-MS. Typically plant tissues are extracted with various combinations of organic and aqueous solvent systems. Not all compounds can be extracted by a single solvent system, and usually several systems that differ in polarity are used. The compounds present in each extraction are then analyzed. Derivatization is a requirement for the separation of most compounds by GC, but it introduces the complication that the procedure can modify the target molecule such that absolute structural determination of the original molecule is not possible.
Despite the limitations of differing solubility and chemical derivatization, several studies have shown that 100 to 500 individual compounds can be reproducibly resolved in a single GC-MS analysis of plant tissue extracts. Generally 20 to 40 percent of the molecules can be unambiguously identified based on published mass spectra (Fiehn et al., 2000a, 2000b; Roessner et al., 2000). Although this represents only a fraction of the estimated 5,000 to 10,000 compounds in a given plant tissue, it is nonetheless an important advancement in the ability to broadly characterize metabolites in a nontargeted fashion. The technical variation of most GC-MS methodologies is generally less than 10 percent, while the biological variation encountered in several comprehensive studies averages 50 percent and is highly dependent on the particular compound in question (Fiehn et al., 2000b; Roessner et al., 2000).
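The coverage implied by these figures can be worked out directly. The following is a minimal sketch that uses only the ranges quoted above (100 to 500 compounds resolved, 20 to 40 percent identified, 5,000 to 10,000 compounds estimated per tissue); the function and variable names are illustrative and do not come from the cited studies.

```python
# Ranges quoted in the text; these are not new measurements.
RESOLVED = (100, 500)                # compounds resolved in one GC-MS analysis
IDENTIFIED_FRACTION = (0.20, 0.40)   # share of resolved compounds identified
TOTAL_ESTIMATE = (5_000, 10_000)     # estimated compounds in a plant tissue


def coverage_bounds(resolved, identified_fraction, total):
    """Return (low, high) bounds for resolved and for identified coverage."""
    # Lowest coverage: fewest compounds resolved out of the largest total.
    resolved_low = resolved[0] / total[1]
    # Highest coverage: most compounds resolved out of the smallest total.
    resolved_high = resolved[1] / total[0]
    identified_low = resolved[0] * identified_fraction[0] / total[1]
    identified_high = resolved[1] * identified_fraction[1] / total[0]
    return (resolved_low, resolved_high), (identified_low, identified_high)


resolved_cov, identified_cov = coverage_bounds(
    RESOLVED, IDENTIFIED_FRACTION, TOTAL_ESTIMATE
)
print(f"resolved:   {resolved_cov[0]:.1%} to {resolved_cov[1]:.1%}")
print(f"identified: {identified_cov[0]:.1%} to {identified_cov[1]:.1%}")
```

Under these assumptions a single analysis resolves roughly 1 to 10 percent of the estimated compounds in a tissue and identifies roughly 0.2 to 4 percent of them.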
GC-MS metabolite profiling has recently been applied to a number of experimental plant systems. In studies of potato tubers (Roessner et al., 2000, 2001), more than 150 compounds were resolved, of which 77 could be identified. Technical variation was 6 percent or lower for 29 of 33 compounds analyzed, while the biological variation for these same compounds ranged from 2 to 36 percent and exceeded technical variation by two- to tenfold. When tubers grown in soil were compared with those grown in sterile culture with an external carbon source (glucose), large differences in a range of metabolites (sugars, amino acids, organic acids, and unknowns) were observed. Similarly, potatoes genetically engineered to have significant changes in primary carbohydrate metabolism (Roessner et al., 2001) showed significant differences in a large range of metabolites (sugars, amino acids, organic acids, and unknowns).
Another series of metabolite profiling studies utilized the model plant system Arabidopsis thaliana. In one set of studies (Fiehn et al., 2000a, 2000b), 326 distinct compounds were detected, of which the structures of 164 could be unambiguously determined, while 162 were of unknown chemical structure. Technical variation in these studies was less than 10 percent, while biological variation for 11 compounds studied in detail ranged from 17 to 56 percent, with an average of approximately 40 percent.
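The distinction between technical and biological variation in these studies is commonly expressed as a coefficient of variation (standard deviation as a percentage of the mean). The sketch below illustrates the calculation with made-up peak areas; the numbers are hypothetical and are not taken from Fiehn et al. or Roessner et al.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical peak areas for a single metabolite (arbitrary units).
# Technical replicates: the same extract analyzed repeatedly.
technical = [100.0, 104.0, 97.0, 101.0, 98.0]
# Biological replicates: extracts from different plants of one genotype.
biological = [100.0, 150.0, 62.0, 128.0, 85.0]

cv_technical = coefficient_of_variation(technical)
cv_biological = coefficient_of_variation(biological)

print(f"technical CV:  {cv_technical:.1f} percent")
print(f"biological CV: {cv_biological:.1f} percent")
print(f"ratio:         {cv_biological / cv_technical:.1f}")
```

In this illustrative data set the biological variation is roughly an order of magnitude larger than the technical variation, which is the pattern the studies above report; a difference between two genotypes is meaningful only when it exceeds the biological figure.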
Two mutants derived from chemical mutagenesis were compared with their respective wild-type parental lines (Fiehn et al., 2000a). The dgd1 mutation caused a 90 percent reduction in the galactolipid digalactosyldiacylglycerol, a major component of the chloroplast membrane. In this study, 153 of 326 metabolites (known and unknown) changed significantly in the dgd1 mutant compared with the wild type. A second mutation, sdd1, affected the number of stomata on the leaf surface. (Stomata are pairs of cells that work together to regulate gas exchange between the leaf and the atmosphere.) Because stomata are a minor component of the leaf, one might expect the sdd1 mutation to impact metabolism less severely than dgd1. This was indeed found to be the case, as the sdd1 mutant had 41 metabolites altered relative to the wild type. It is important to note that neither the sdd1 nor the dgd1 mutant is a result of genetic engineering; both were obtained by chemical mutagenesis. These mutants demonstrate the potential for large and unanticipated compositional changes as a result of genetic modifications by mutagenesis, a method other than genetic engineering.
In a separate set of experiments, different wild-type ecotypes (Col0 and Col24 [analogous to different plant varieties]) were studied to determine whether they could be distinguished by their metabolite profiles (Taylor et al., 2002). These two wild-type Arabidopsis are fully cross-fertile, and progeny from crosses between the two lines were also analyzed. Four hundred forty-three compounds were detected, but only 92 of these had structures unambiguously determined. Interestingly, the compounds showing the most variation were those whose structures and identities were known. The unknown compounds had on average tenfold less variation than known compounds. Col0 was lacking 27 peaks that were present in Col24, while Col24 was lacking 14 peaks that were present in Col0. Bioinformatics approaches to data analysis were able to differentiate the two wild-type ecotypes with relatively high precision.
A final example that highlights the biological variation that exists between and even within a plant is a metabolite profile study of pumpkin phloem exudates (Fiehn, 2003). Phloem is a specialized cell type in plants that transports a variety of nutrients (e.g., carbon fixed by photosynthesis in leaves) from source tissues (leaves) to sink tissues (root and fruit). As with prior metabolite profiling work, approximately 400 compounds (the majority unknown) could be detected. A surprising result of this work was that each leaf on an individual plant had a distinct overall metabolite profile that could be distinguished from the others by bioinformatics analyses.
Identically aged leaves (e.g., leaf 2) of different plants grown under conditions as nearly identical as possible also differed significantly. Approximately 30 to 40 percent of the metabolites of a leaf were significantly different from the overall average leaf profile. These data highlight the resolution that can be obtained with metabolite profiling, but they also raise issues of the natural biological variation that exists when tissues are considered for sampling. There is thus a need for studies that quantify the natural variation of the chemical composition of varieties used for nutrition before it is possible to establish if changes in modified varieties are within the normal range of variation.
LC is an alternative to GC for separating compounds for analysis. A major advantage to LC is that compounds do not have to be derivatized and can be resolved intact, although derivatization may still be needed to resolve particular compounds. LC separation can be coupled to a variety of detectors, including spectrometers. One example is the coupling of LC to a photodiode array detector that evaluates the ultraviolet/visible absorption spectra of a compound. This methodology has been used to profile isoprenoids (carotenoids, plastoquinones, and tocopherols) in tomato fruit (Fraser et al., 2000). However, photodiode array detectors are limited in the breadth of compounds detected, as many metabolites of interest do not absorb well in the visible or near ultraviolet wavelengths.
MS coupled with LC separation has many advantages over LC photodiode array approaches, as many more classes of plant metabolic compounds (e.g., isoprenoids, alkaloids, flavonoids, and saponins) can be separated and detected with MS (Huhman and Sumner, 2002; Tolstikov and Fiehn, 2002). However, no single LC-MS procedure allows separation and determination of all classes of compounds, and the technique suffers from what are termed matrix effects, in which the presence of one compound in the spectrometer affects the ability to detect and quantify another compound. Unlike GC-MS, LC-MS methodology is still in its infancy. Standardized protocols and an understanding of factors affecting technical variation and reproducibility are still being developed by the scientific community.
A third potential approach to profiling metabolites in an extract is NMR spectroscopy. NMR, in both proton and carbon-13 modes, can provide fingerprints with good chemical specificity for compounds that are present in relatively high abundance in a tissue or extract, but it has more limited utility for lower-abundance compounds. Unlike GC- and LC-based methods, NMR can be performed on whole tissues and, as such, need not be destructive. NMR can also be performed on extracts or fractionated extracts and interfaced with LC methodologies. Unlike LC-MS and GC-MS analyses, NMR can be performed relatively rapidly and with moderate to high throughput.
Several studies have shown the potential of NMR as a screening tool. Because plant constituents differ widely in their solubility properties, meaningful NMR analysis cannot be performed on a single plant extract. Extractions with multiple buffers or solvent mixtures to obtain groups of compounds of different polarity are required to broadly analyze plant components by NMR. In addition, due to biological variation, independent extracts of several plants need to be individually obtained and compiled to derive an average spectrum for a genotype.
Noteborn and colleagues (2000) examined proton NMR spectra of a genetically transformed tomato fruit and an appropriate control. In that study, extracts were prepared and analyzed in five fractions obtained by extraction with solvent mixtures of differing polarity. It was concluded that 27 to 30 percent of the detected compounds varied in concentration between transgenic and nontransgenic lines. However, most of these compounds were not identified. Although this study showed the potential of NMR for identifying compositional differences, the role of NMR in food safety assessment has not been demonstrated. A particular need is the standardization and optimization of extraction protocols and other aspects of sample handling (Defernez and Colquhoun, 2003; Kuiper et al., 2003).
Carbon-13 and proton NMR techniques also are powerful tools in investigating metabolic pathways and their inherent fluxes (Ratcliffe and Shachar-Hill, 2001; Roberts, 2000). In this context, an appropriate metabolic precursor that is labeled with one or more carbon-13 or deuterium atoms is introduced into the organism where it can undergo further metabolism. Analysis of various extracts can identify the intermediates and products of metabolism and, if done sequentially over time, rates of reactions can be determined. While this does not provide information regarding composition, which is the primary analytical goal in a food safety assessment, such techniques do provide a means of assessing metabolic changes that might be introduced by genetic engineering, mutagenesis, or conventional breeding.
BIOINFORMATIC ISSUES IN PROFILING ANALYSIS
Metabolic fingerprinting is carried out by any method that can provide a pattern that is unique to a certain sample. Such methods do not necessarily attempt to identify any compounds (or proteins or DNA). In addition to the methods involving a chromatographic separation (e.g., GC-MS and LC-MS), methods that are often used for fingerprinting are direct spectral techniques. These include: NMR (Holmes et al., 2000; Raamsdonk et al., 2001); Fourier transform infrared spectroscopy (Johnson et al., 2003); and direct infusion mass spectrometry, where samples do not pass through a preliminary separation such as GC or LC (Goodacre et al., 2002).
In all of these cases the spectra are composed of peaks that may not necessarily correspond to individual chemical species, as they may arise from interactions among species (e.g., the phenomenon of ion suppression in direct-infusion MS). Nevertheless, the spectra are characteristic of the sample. Such metabolic fingerprinting is mostly useful for purposes of classifying samples, such as determining whether two samples are chemically different. Classification is made by applying pattern recognition algorithms to a set of training data, for which one must know the class membership of all samples.
Many algorithms exist that are suitable for this task, such as discriminant function analysis, artificial neural networks, hidden-Markov models, decision trees, genetic programming, and other statistical and machine-learning methods. Essentially, these methods are first calibrated with the training data set so as to optimally distinguish, through the use of a combination of the spectral variables, samples that belong to different classes.
After calibration, such algorithms can be used to classify whether the unknowns are similar to or different from the training set. This particular approach of fingerprinting is more robust to artifacts arising from technical variation (e.g., in sample preparation) than the targeted approaches that are based on identification of particular chemicals in the sample. While fingerprinting may be well suited to distinguishing samples containing particular sources of food by detecting compositional differences (e.g., genetically engineered versus isogenic control), it is inappropriate for identifying unintended effects of genetic modification because there is no attempt to determine the specific chemical nature of the compositional change identified, only that one sample is different from another.
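The calibrate-then-classify procedure described above can be illustrated with a very small example. The sketch below uses a nearest-centroid rule, which is a deliberately simple stand-in chosen for brevity and is not one of the algorithms named in the text; the spectra, class labels, and function names are all hypothetical.

```python
from math import dist


def centroid(spectra):
    """Mean intensity at each spectral variable across a list of spectra."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def train(training_set):
    """Calibrate: map each known class label to the centroid of its spectra."""
    return {label: centroid(spectra) for label, spectra in training_set.items()}


def classify(model, spectrum):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(model, key=lambda label: dist(model[label], spectrum))


# Hypothetical fingerprints: intensities at four spectral variables.
# Class membership of every training sample must be known in advance.
training_set = {
    "control": [[1.0, 0.2, 3.1, 0.5], [1.1, 0.3, 2.9, 0.4], [0.9, 0.2, 3.0, 0.6]],
    "modified": [[1.0, 1.4, 3.0, 0.5], [1.2, 1.6, 3.2, 0.4], [0.9, 1.5, 2.8, 0.6]],
}

model = train(training_set)
label = classify(model, [1.0, 1.3, 3.1, 0.5])
print(label)
```

The classifier reports only which class an unknown sample resembles. It says nothing about which compound is responsible for the second spectral variable being elevated, which is the limitation of fingerprinting noted above.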
One particular approach that has great potential in identifying possible unintended effects of genetic modification is based on comparison with baseline metabolic profiles. The metabolic profile (obtained by GC-MS or LC-MS) of a GE organism (GEO) would be compared with the profiles of corresponding wild-type organisms, and those peaks that differ significantly would be identified (in particular, the appearance of new peaks, but also the absence or reduction of existing peaks).
Those components that appear in significantly different amounts in the modified organism as compared with the baseline would then need to be identified chemically, bearing in mind that even in plants that have not been genetically engineered, only about 20 percent of the compounds detected by the method can be associated with a known structure. The next step would be to assess whether there is a possibility of toxicity or other negative biological effect, which is also a technically daunting task (see Chapter 6).
With the baseline metabolic profile approach, an interval profile that is representative of the wild-type organism must be created. To do so requires measuring a large number of samples to establish limits of natural variation for each component (peak) of the profile. This may include profiling several different strains, varieties, or breeds when these are common in the diet. An important detail of this approach is that the statistical significance of the differences needs to be established, which also requires several samples of the modified organism and application of appropriate statistical methods, because it is important to show that the difference between the profile of the modified organism and that of the wild type is larger than the normal range of variation of the wild-type samples measured. The number of biological replicates for wild type and the GM crop should be large (at least several dozen) to provide ample data sets to describe the biological variation and thus provide sufficient statistical power. A database management system should also be constructed specifically for this purpose, and it needs to capture as much metadata (data about the experimental protocol) as possible.
The documentation of detailed procedures is the only way to demonstrate that the comparisons were made with a minimal chance of introducing systematic errors that may have generated false levels of similarity or differences between wild-type and modified sample populations. Such a database of interval profiles for various food sources would then be a primary way to assess differences in composition and would also be an important resource to establish the nutritional value of different food sources (literally the biochemical difference between apples and oranges).
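The interval-profile comparison described in this section can be sketched in a few lines. The example below assumes, purely for illustration, that the limits of natural variation for each peak are set at the wild-type mean plus or minus three sample standard deviations; the text does not prescribe a rule, and the peak names, intensities, and function names are hypothetical. Only six wild-type samples are used to keep the listing short, far fewer than the several dozen replicates called for above.

```python
from statistics import mean, stdev


def interval_profile(wild_type_samples, k=3.0):
    """Per-peak (low, high) limits as mean +/- k sample standard deviations.

    The mean +/- k*SD rule is an assumption made for this sketch.
    wild_type_samples: list of dicts mapping peak id -> intensity.
    """
    profile = {}
    for peak in wild_type_samples[0]:
        values = [sample[peak] for sample in wild_type_samples]
        m, s = mean(values), stdev(values)
        profile[peak] = (m - k * s, m + k * s)
    return profile


def flag_peaks(profile, modified_samples):
    """Flag new peaks and peaks whose mean lies outside the baseline interval."""
    observed = set().union(*(sample.keys() for sample in modified_samples))
    flagged = {}
    for peak in sorted(observed | set(profile)):
        # A peak missing from a modified sample is treated as zero intensity.
        m = mean(sample.get(peak, 0.0) for sample in modified_samples)
        if peak not in profile:
            flagged[peak] = "new peak"
        elif not profile[peak][0] <= m <= profile[peak][1]:
            flagged[peak] = "outside baseline interval"
    return flagged


wild_type = [
    {"sucrose": 10.0, "citrate": 5.0, "unknown_17": 2.0},
    {"sucrose": 12.0, "citrate": 4.0, "unknown_17": 2.2},
    {"sucrose": 9.0, "citrate": 6.0, "unknown_17": 1.8},
    {"sucrose": 11.0, "citrate": 5.5, "unknown_17": 2.1},
    {"sucrose": 8.0, "citrate": 4.5, "unknown_17": 1.9},
    {"sucrose": 10.0, "citrate": 5.0, "unknown_17": 2.0},
]
modified = [
    {"sucrose": 11.0, "citrate": 9.0, "unknown_17": 2.1, "unknown_42": 0.8},
    {"sucrose": 9.5, "citrate": 8.5, "unknown_17": 1.9, "unknown_42": 0.7},
    {"sucrose": 10.5, "citrate": 9.5, "unknown_17": 2.0, "unknown_42": 0.9},
]

profile = interval_profile(wild_type)
flagged = flag_peaks(profile, modified)
print(flagged)
```

In this made-up data set, citrate falls outside the wild-type interval and unknown_42 appears as a new peak, while sucrose and unknown_17 stay within the baseline. Flagged peaks are only candidates: each would still need chemical identification and an assessment of biological significance, and a real comparison would use formal statistical tests on adequately replicated samples rather than this simple threshold.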
Implications for Predicting Unintended Health Effects
Increased analytical capability does not equate to an enhanced ability to predict health outcomes for several reasons. For even the best-studied plant systems (e.g., Arabidopsis), only a fraction of the compounds present can be resolved by a given method (e.g., LC-MS or GC-MS). Of those that are resolved, only a fraction (20-30 percent) can be identified with certainty as to their structure. The vast majority of chemical peaks detected by these methods remain classified as unknown compounds.
It is very difficult to interpret or predict the effects on human health of changes in the composition of a single food item and, especially, the health effects of changes in a single food item present in the total diet. This is true even for compounds for which large amounts of nutritional data are already known (e.g., specific amino acids or fatty acids; see Chapter 6). Problems in assessing the significance to human health of compositional changes in individual food or in the total diet are further amplified by the biological variation between samples, differences in analytical protocols and results between laboratories, and changes in composition that inevitably occur over time. Although advanced technologies are promising, limited knowledge of their role in mammalian systems, along with an inability to identify or functionally characterize differences, prohibits them from serving as a basis of safety assessment at this time. Their most useful present application may be the detection and quantification of known toxic compounds.
Interpretive Limitations of Metabolite Separation and Analysis Techniques
The past decade has seen incredible advances in the technologies associated with analytical separation of plant metabolites. The combined advances in applying GC-MS and, more recently, LC-MS and NMR to characterize plant metabolism have increased by a factor of 10 to 100 the number of compounds that can be resolved with modern instrumentation. Even though some analyses may yield up to 1,000 different chemical species, this is still a fraction of the total number of compounds present in a plant extract. The parallel combination of GC-MS and LC-MS holds promise for allowing the future analysis of a much larger percentage of metabolites in a plant extract. Both methodologies are highly demanding from an analytical perspective and are prone to the generation of artifacts unless conditions of extraction, preparation, and analysis are rigidly standardized and enforced on a global scale.
A serious limitation in these analyses, even if the number of compounds that can be resolved increases from 1,000 to 5,000, is that a large number of compounds will remain to be identified by any of the procedures described above. For GC-MS analyses, 50 to 80 percent of the compounds detected remain unidentified. A major international effort is required to chemically and structurally characterize the compounds in plants and build plant compound-specific spectral libraries and reference databases to address this issue. The development of internationally recognized and followed standards for extraction, derivatization, chromatographic separation, and detection is needed to allow data from different laboratories to be compared across space and time. The validation of separation and quantification methodologies using agreed-upon standards is also needed to ensure reproducibility and comparability.
Pattern Recognition Methods for Evaluation of Compositional Equivalence
The profiling methods described above are useful analytical tools for unique compounds. However, they have limitations when applied to complex mixtures, such as food. Food, particularly plant-derived food, is a mixture of thousands of different compounds. Many of these compounds will coelute in the analysis, even though they are different compounds. As the complexity of the mixture increases, there is a greater probability that unique compounds will not be identified by currently available profiling techniques. Thus additional analytical tools must be identified and applied to screen complex mixtures for unique compounds that may initiate an adverse health effect when consumed.
An additional consideration is the biological relevance of a new compound. Unless each individual compound in a new modified food is tested for adverse effects, the biologic relevance will not be identified (see Chapter 5). Determining biologic relevance requires analysis beyond profiling individual compounds. Generally animal models are used to detect adverse effects from new compounds. An adverse response to a new compound may be seen when it is tested as a pure compound, but this is not always the case when it is tested as a component in a mixture, such as a food. An example is the introduction of a nutrient, such as iron, into a food that contains chelating agents, such as certain polysaccharides. When tested as a pure compound, iron will have greater biologic activity than when tested in a mixture that contains chelating agents that will bind and thereby decrease its biologic activity.
An additional consideration is the level of a compound that is introduced into a test animal to detect adverse effects. Again, when the compound is pure, higher levels can be tested in in vivo animal systems than can be introduced as a food component. Thus adverse effects may be detected with high levels of a pure compound that would not be seen at low levels and in a complex mixture.
PROFILING METHODS FOR ANALYSIS OF INORGANIC ELEMENTS OF NUTRITIONAL AND TOXICOLOGICAL IMPORTANCE
The major focus of discussion has been the analysis of organic compounds; however, the analysis of trace elements poses equally important but distinctly different challenges. Unlike the vast array of organic compounds present in foods, the inorganic constituents of foods constitute a much smaller array of analytes to be measured. As discussed in Chapter 6, many mineral elements are nutritionally essential but have toxic potential at only slightly higher levels of intake, and interactions can occur among the nutritionally essential minerals. In addition, changes in plant genetics, especially modifications intended to alter mineral concentration (Clemens et al., 2002; Holm et al., 2002) have the potential to alter the content of multiple trace elements. Thus, it is essential that the focus of mineral analysis not be overly narrow.
The traditional targeted analytical approach for the measurement of inorganic elements is generally based on sensitive and specific methods such as atomic absorption or atomic emission spectrophotometry. Although targeted analyses are fully adequate for the determination of individual elements of nutritional and toxicological interest, analytical approaches that allow a determination of multielement profiles have conceptual and practical advantages for monitoring the composition of GM food products.
Nontargeted mineral analysis is performed typically using either inductively coupled plasma-optical emission spectrometry or inductively coupled plasma-mass spectrometry, with thermal ionization mass spectrometry as an alternative in some applications. These techniques, with proper attention to sample preparation and method calibration, are capable of providing quantitative data in profiling a wide range of elements, including aluminum, iron, potassium, magnesium, sodium, lead, zinc, arsenic, cadmium, calcium, molybdenum, cobalt, chromium, copper, mercury, manganese, nickel, tin, selenium, strontium, and vanadium in foods and many other types of samples (Almeida et al., 2002; Brescia et al., 2003; Cariati et al., 2003; Frachler et al., 1998; Losso et al., 2003; Wang et al., 2000). These applications provide quantitative data on the total content of the elements in a sample, while coupling multielement analysis with a preliminary separation of proteins yields information about the metal content of specific metal-binding proteins such as metallothioneins (Goenaga Infante et al., 2003).
As discussed in the context of organic constituents, a critical aspect of multielement analysis is the interpretation of data. Subtle differences in patterns of inorganic elements can be indicative of the geographic origin of agricultural commodities (Brescia et al., 2003) but may have little or no nutritional or toxicological significance. As with other aspects of the profiling of food constituents, the interpretation of data from multielement analysis is the critical issue in evaluating differences due to genetic modification in the context of compositional effects of geographic location, climate, and agronomic variables.
It has been proposed that differential gene expression be used as a method to determine the substantial equivalence of genetically modified organisms (GMOs), including between genetically engineered organisms (GEOs) and other GMOs (GAO, 2002; Kuiper et al., 2001, 2003; van Hal et al., 2000). Genomic technologies can measure the level of thousands of transcripts simultaneously, thereby providing a molecular phenotype that can be used to compare transcript expression between the immediate progenitor and the GM species. Differential gene expression can be measured using open and closed technologies (Green et al., 2001).
Open technologies, such as serial analysis of gene expression, do not require prior sequence knowledge of the organism, can survey all transcripts of the organism in a given tissue under study, can capture transcript sequence information (e.g., splice variations or single nucleotide polymorphisms), and are quantitative. However, open systems require extensive DNA sequencing to achieve a critical mass of data to adequately profile gene expression of the organism and are not likely to be cost-effective for routine screening. Serial analysis of gene expression has been used to identify differentially expressed genes in rice (Matsumura et al., 1999).
In contrast, closed systems, such as GeneChips or cDNA/oligonucleotide microarrays, require a priori sequence information for each gene that is to be monitored (van Hal et al., 2000). Microarrays only measure the expression of genes represented on the array and, in general, do not adequately account for differences resulting from naturally occurring differences in a gene sequence between organisms (splice variants). In addition, closed systems are specific to the organism and, to some extent, the strain. Analyses of data for both open and closed systems are still emerging, with different approaches significantly affecting outcome, interpretation, and the conclusions drawn (Quackenbush, 2002).
There are a number of challenges that must be addressed before incorporating microarray technology into the safety assessment of GM food. One major issue that limits the utility of differential gene expression technology to assess substantial equivalence is the lack of data regarding the expression level of genes in an organism under various growth conditions and developmental stages, as well as in cells and tissues (GAO, 2002). Ranges of these expression levels must be defined. Furthermore there is a questionable correlation between the level of a transcript and the gene product. Therefore, differences in gene expression between the progenitor and the GMO may not be reflected in differences in the level of expressed protein.
The term proteomics ideally refers to the analysis of the complete complement of proteins of an organism. In practical terms it is still impossible to detect all proteins of an organism but, at least for model organisms, a large proportion of the predicted proteins can be detected. Proteins are extremely important biochemical components as they are very abundant in all biological material, and they are the molecular machines that function in cells. They are made up of linear chains of any of 21 individual amino acid units (plus a few more very rare amino acids) that occur in varying quantities and patterns and that fold into characteristic, but diverse, secondary and tertiary structures. Proteins are also the major component of the human immune system and are able to recognize other proteins by binding.
In general, proteins are broken down into small peptides and amino acids in the digestive tract and so their amino acid composition is important in human and animal nutrition. However, some proteins are very stable and resist digestion, while others are detected by the immune system at extremely low levels and cause severe allergic reactions in a proportion of humans and animals. Analyzing the constituent proteins of plants and animals for human consumption is an important component of assessing the consequences of genetic or other modifications. An additional factor is that assessing changes in the composition of particular proteins may reveal changes in chemical compounds that are not detectable or identifiable with the techniques of metabolomics.
Detection and Identification of Proteins
Proteomic analysis differs in both the techniques used and the analytical intent from the targeted analyses (e.g., HPLC, enzyme-linked immunosorbent assay, and immunoblot techniques) used to detect and quantify individual proteins or groups of proteins. Protein identification is almost exclusively done by MS methods, while quantification is done through 2-dimensional gel electrophoresis or tandem-LC. In all cases, the protein separation precedes the MS identification step.
As recently reviewed by Regnier and colleagues (2002), a growing application in protein analysis is termed comparative proteomics, in which a reference sample is compared with a sample derived from an altered state. Comparative proteomics has great potential in the evaluation of the compositional effects of genetic changes in food and other biological material. However, at present the technology has not been sufficiently well developed for use in the routine assessment of GE food. As with other forms of profiling analysis, a major problem is the issue of data processing and bioinformatics (Regnier et al., 2002). For example, how effectively and reliably can any proteomic technique identify and quantify changes in a protein of potential toxicological interest among thousands of other proteins in the sample?
Comparative proteomic analysis can be performed by either of two basic approaches (Regnier et al., 2002). The traditional approach is the separation of intact proteins in a sample. Two-dimensional polyacrylamide gel electrophoresis is the original and probably remains the most common application of such a mass separation approach.
Separation of proteins by 2-dimensional gel electrophoresis has been practiced in laboratories for nearly 30 years, since its development in 1975 (O’Farrell, 1975). Proteins are separated first by their isoelectric point (charge) in one direction and then by their size in the orthogonal direction in the polyacrylamide gel. The gel is then stained with a dye that reveals where protein spots are located; the protein abundance is calculated based on the size and intensity of the spot. Gels usually resolve several hundred to a few thousand different protein spots from a biological matrix. After the protein spots are visualized, they are identified by MS using one of two methods. (A few other methods have been proposed, but their development is still at a very early stage and their use is not realistic in production environments.)
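The calculation of abundance from spot size and intensity can be illustrated with a minimal sketch. It assumes that each spot has already been segmented from the gel image and that a "spot volume" (area multiplied by mean stain intensity) is an acceptable proxy for abundance; the function name, the input layout, and the normalization to total volume are illustrative assumptions, not a description of any particular image-analysis package.

```python
def relative_spot_volumes(spots):
    """Return each spot's share of the total stained volume on one gel.

    `spots` maps a spot identifier to (area_in_pixels, mean_intensity).
    Volume = area * mean intensity is an assumed proxy for abundance.
    """
    volumes = {spot_id: area * mean for spot_id, (area, mean) in spots.items()}
    total = sum(volumes.values())
    if total == 0:
        raise ValueError("no stained material detected on the gel")
    # Normalizing to the gel total makes gels with different loading comparable.
    return {spot_id: volume / total for spot_id, volume in volumes.items()}


if __name__ == "__main__":
    # Hypothetical measurements for three spots.
    example = {"s1": (100, 2.0), "s2": (50, 4.0), "s3": (100, 6.0)}
    for spot_id, share in relative_spot_volumes(example).items():
        print(spot_id, round(share, 3))
```

Because stain response is not linear over the full range of protein abundance, values of this kind are best read as relative, semiquantitative estimates.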
The initial interpretation generally involves a comparison of patterns between reference and test samples and, thus, is similar in principle to other profiling methods in which the pattern is analyzed without knowledge of the identity of most of the components. The identity of unknown proteins (i.e., spots on a 2-D electrophoresis gel) can frequently be determined by further analysis of a partial amino acid sequence obtained by chemical or MS analysis.
MS analysis of proteins (e.g., from 2-D gel electrophoresis) is based on the assumption that an adequate database of amino acid and nucleotide sequences exists for the plant being analyzed or that sufficient homology exists with more fully characterized species. This issue is a major limitation in the analysis of many plant proteins at this time.
When sequence information does lead to a tentative identification, the identity of the unknown can be supported by additional information (mass and isoelectric point) derived from protein separation by electrophoretic mobility. This technique is, at best, semiquantitative. In addition, the identification of potentially important differences between reference and test samples is complicated by the extreme complexity of the array of expressed proteins, which is further complicated by post-translational modification. Although this traditional approach to proteomic analysis has been used to characterize many proteins in plant systems, there has been little or no application of protein patterns in comparing plants or animals that have been altered by genetic engineering or other variables that would affect protein expression.
When the genome of the organism under analysis has been fully sequenced (as is the case, for example, for baker’s yeast and rice), the method of peptide mapping can be used to identify proteins. This method consists of breaking the proteins in each gel spot with a single endoprotease enzyme that cuts the sequence of amino acids at well-known positions, resulting in a mixture of smaller peptides. The peptide mixture is then introduced into a mass spectrometer, usually a matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) instrument, which can accurately measure the mass of the peptides (down to fractions of an atomic mass unit), and the amino acid composition of each peptide is then calculated. The full genome of that organism will have been previously scanned to list all the protein sequences it encodes, and these are used to predict the products of digestion by the endoprotease used in the assay and to accurately calculate their masses.
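The prediction of digestion products and their masses can be sketched as follows. This is a simplified illustration, not the procedure of any specific software: it assumes trypsin as the endoprotease (cleavage after lysine or arginine unless the next residue is proline), ignores missed cleavages and chemical modifications, and uses standard monoisotopic residue masses. The function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide (N- and C-termini)


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P.

    Assumption: complete digestion with no missed cleavages.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    # Hypothetical protein fragment, used only for illustration.
    for pep in tryptic_digest("AKRPGKMWVTFR"):
        print(pep, round(peptide_mass(pep), 4))
```

Applied to every protein sequence predicted from a genome, a calculation of this kind yields the table of theoretical peptide masses against which measured masses are compared.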
Next, the proteins are identified by matching the accurate peptide masses measured with the theoretical masses of the peptides obtained from the genome sequence. This method is routine in a growing number of research laboratories and core facilities. When the full genome sequence of the organism has not been determined, which is true for the majority of species of nutritional importance, a different strategy for protein identification must be used. This requires using mass spectrometers that interface with LC (quadrupole-time of flight or, essentially, ion traps) and that are able to perform tandem MS so that the peptides can be broken one amino acid at a time and the accurate mass of the resulting peptides can be measured. A partial amino acid sequence is thus obtained for each of the constituent peptides of the original protein, and identification is carried out by finding other known proteins whose sequence is similar to the partial sequences obtained from the mass spectrometer. Both methods are absolutely dependent on bioinformatic methods and on complete genome-sequencing efforts.
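The way a partial sequence emerges from tandem MS can be illustrated with an idealized sketch. It assumes a clean ladder of fragment masses from a single ion series in which successive fragments differ by exactly one residue; real spectra contain mixed ion series, noise, and missing peaks. The function name and the mass tolerance are illustrative assumptions. Note that isoleucine and leucine have identical masses and cannot be distinguished in this way.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_sequence_tag(ladder, tolerance=0.01):
    """Infer residues from mass differences between successive fragments.

    `ladder` is a list of fragment masses in ascending order (idealized).
    Each step is reported as the residue(s) whose mass matches the
    difference within `tolerance` Da, or "?" if none matches.
    """
    tag = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        delta = heavier - lighter
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(delta - mass) <= tolerance]
        tag.append("/".join(matches) if matches else "?")
    return tag


if __name__ == "__main__":
    # Hypothetical ladder: arbitrary starting fragment, then +G, +A, +(I or L).
    print(read_sequence_tag([100.0, 157.02146, 228.05857, 341.14263]))
```

The resulting tag, with its ambiguities, is what is then compared with known protein sequences.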
The alternative approach in proteomic analysis is termed peptide mapping. In this approach the proteins in the sample are subjected to partial hydrolysis prior to any separation, and this is followed by GC-MS or LC-MS analysis (Regnier et al., 2002). The resulting pattern is an extremely complex mixture of peptides that may reflect compositional differences among samples. Determining the identity of the differing proteins in this method requires that a database of predicted peptides be created for each organism. This is a one-time effort and is not a limiting factor in using this technique as all equipment vendor software comes equipped with such databases. However, a limitation does exist in the current state of the art in annotation of gene function. While all of the proteins of rice may be able to be identified, more than half of these are of unknown function. Similar ratios are typical from other plants and animals of nutritional interest. The method of identification by partial protein sequencing requires a more involved bioinformatic approach. In this case the identification relies on the similarity of sequence between the protein of interest and other known proteins of any origin. Thus the reference sequence database should include all known proteins and be kept up to date.
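Identification by sequence similarity can be sketched in its simplest form as follows. The example scores a partial sequence against each database entry by the best ungapped window of identical residues; production tools rely on alignment algorithms with substitution matrices and statistical scoring, which this sketch does not attempt. The function names and the two database entries are hypothetical.

```python
def best_ungapped_identity(tag, protein):
    """Highest fraction of identical residues over any ungapped window."""
    if not tag or len(tag) > len(protein):
        return 0.0
    best = 0
    for start in range(len(protein) - len(tag) + 1):
        window = protein[start:start + len(tag)]
        identical = sum(a == b for a, b in zip(tag, window))
        best = max(best, identical)
    return best / len(tag)


def rank_database(tag, database):
    """Rank database entries (name -> sequence) by similarity to the tag."""
    scores = [(name, best_ungapped_identity(tag, seq))
              for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical reference sequences, used only for illustration.
    reference = {"protA": "MKTAYIAKQR", "protB": "GGSLVEELKR"}
    print(rank_database("AYIVK", reference))
```

A match of this kind identifies the most similar known protein, not necessarily the protein itself, which is why the completeness and currency of the reference database matter.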
Usually the databases used in proteomic analysis are the Swiss-Prot (a database with high-quality annotation) and TrEMBL (a computer-annotated supplement to Swiss-Prot) combination (Boeckmann et al., 2003; http://www.ebi.ac.uk/swissprot/) or the nonredundant version of GenBank (translated to protein sequence code). One improvement to this approach is to use a focused database of partially known sequences that derives from expressed sequence tag (EST) projects. EST projects exist for many species of nutritional interest and, in several cases, these are large bodies of expressed sequences (e.g., soybean or corn). The idea then is to attempt to match the partial peptide sequences to the partial mRNA sequences. Matches allow the protein to be related to one particular mRNA fragment, but identification is still ultimately done through a match to all known protein sequences because the EST is also annotated in the same way. Proteomics is indeed most effective with fully sequenced genomes.
An issue that is problematic in many current applications of proteomics relates to the inefficiencies of the 2-dimensional gel separation. One problem is that a number of proteins do not migrate well in these gels, membrane proteins being the most abundant of this class. A second issue is related to the limit of detection of the staining process that reveals where proteins are located in the gel. The commonly used Coomassie-blue dye has a rather high limit of detection, resulting in many protein spots in the gel never being revealed in the analysis.
Alternatives to this dye are silver staining and fluorescently labeled stains. While silver is problematic because of interference with the MS processes, the use of fluorescent dyes has lowered the limit of detection. However, the dynamic range of these dyes is still not as wide as needed. As a result, it is still not possible to accurately measure both the very abundant proteins and the very low-abundance proteins. This is perhaps the greatest obstacle to using proteomics to monitor unintended effects of modification because strongly allergenic proteins are often present in very low concentrations.
What Should Be Analyzed?
Proteomics, like metabolomics, can be used either to profile proteins, resulting in lists of proteins present in the analyte, or to fingerprint, where only a characteristic signature of the biological matrix is obtained. The latter can be effective in terms of comparing modified organisms with the wild types, but it does not necessarily identify the sources of difference. This may be a faster first-screening process, which can be followed by more detailed profiling when changes exceed a defined threshold. As stated previously, immunochemical assays for specific proteins of interest (e.g., known allergens) can be incorporated into targeted analysis independent of a proteomic analysis.
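A first-pass screen of this kind can be sketched as follows. The example compares two fingerprints feature by feature and flags those whose change exceeds a threshold. The use of a log2 ratio, the twofold default threshold, and the small floor applied to absent features are illustrative assumptions; no particular threshold is proposed here, and replicate variability would need to be considered in practice.

```python
import math


def flag_changed_features(reference, test, log2_threshold=1.0, floor=1e-9):
    """Return features whose |log2(test/reference)| meets the threshold.

    `reference` and `test` map feature identifiers (e.g., gel spots or
    peaks) to normalized abundances. Features missing from one sample are
    treated as `floor`, so they are reported as very large changes.
    """
    flagged = {}
    for feature in sorted(set(reference) | set(test)):
        ref_value = max(reference.get(feature, 0.0), floor)
        test_value = max(test.get(feature, 0.0), floor)
        log2_ratio = math.log2(test_value / ref_value)
        if abs(log2_ratio) >= log2_threshold:
            flagged[feature] = log2_ratio
    return flagged


if __name__ == "__main__":
    # Hypothetical normalized abundances for three features.
    wild_type = {"spot1": 10.0, "spot2": 5.0, "spot3": 8.0}
    modified = {"spot1": 11.0, "spot2": 20.0, "spot3": 2.0}
    print(flag_changed_features(wild_type, modified))
```

Only the flagged features would then be carried forward to the slower identification and profiling steps.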
INFORMATION OBTAINED FROM NEW ANALYTICAL TECHNIQUES
With the increased sensitivity and resolution of technology during the past decade, an analyst now has the ability to detect and quantify tens of thousands of possible changes in biological molecules (e.g., DNA, RNA, protein, and metabolites) in a given system. Complete genome sequences have been obtained for many organisms, and this allows scientists to identify nearly all the genes (protein encoding and otherwise) in an organism. However, the majority of the proteins encoded by the genes in an organism are novel to biology and their functions remain unknown. The physiological, developmental, and biochemical roles and interrelationships of genes in the growth and development of an organism are understood for only a small fraction of the genes in a given genome.
Complete genome sequences and other technological advances have driven the development of techniques that make it possible to simultaneously analyze the expression of many thousands of genes in an organism in a particular tissue or at a particular time of development and growth. However, the ability to identify which changes in gene expression are biologically significant and to place this global view of gene expression into a biological context is quite limited, even for the best-studied organisms.
The situation for the global analysis of protein expression levels and protein modifications (proteomics) is even less advanced, as the protein complement of an organism is more complex than DNA because proteins can be post-translationally modified and exist in several different forms, not all of which may have the same function in a cell. Finally, new techniques have allowed characterization of changes in the levels and types of a wide variety of biochemical compounds. Again, in this case no technique is available that can provide a complete characterization of all molecules in a cell or tissue. Even if this were possible, the vast majority of metabolites observed have not been identified chemically, and the biological significance for the organism in which the compounds are produced or for other organisms that ingest these compounds as part of their diet remains unknown.
Thus while new global technologies for profiling gene expression, proteins, and metabolites have increased the breadth and resolution of analyses that are possible in biological systems and now allow scientists to generate vast expression, protein, and metabolite datasets for a single tissue, our ability to relate these vast datasets in a predictive food safety context remains limited. Analytical capabilities have increased substantially during the past decade, but these have not been accompanied by parallel increases in the ability to understand the biological consequences of individual compounds or complex mixtures of compounds or by the ability to predict adverse health effects from exposure to new compounds in food. The chemical identity and biological relevance of a large percentage of new compounds that may be identified by the methods described in this chapter are unknown.
DNA sequencing can, however, provide the full complement of the genetic information encoded in an organism. For transgenic organisms, DNA sequencing allows the precise location of the inserted transgene in the genome and the context of the inserted gene to be determined. Thus it can readily be determined if the transgene inserted has disrupted a gene encoded in the organism.
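Once the insertion site is known, checking whether it falls within an annotated gene is a simple coordinate comparison, as the following sketch suggests. It assumes a gene annotation given as chromosome name with 1-based, inclusive start and end coordinates; the function name, the data layout, and the example genes are hypothetical. A real assessment would also consider regulatory regions, rearrangements at the insertion site, and unannotated genes.

```python
def genes_disrupted_by_insertion(chromosome, position, genes):
    """List annotated genes whose coordinates contain the insertion site.

    `genes` is a list of dicts with keys "name", "chrom", "start", "end"
    (1-based, inclusive). Only direct overlap with the annotated gene body
    is tested.
    """
    return [
        gene["name"]
        for gene in genes
        if gene["chrom"] == chromosome and gene["start"] <= position <= gene["end"]
    ]


if __name__ == "__main__":
    # Hypothetical annotation, used only for illustration.
    annotation = [
        {"name": "geneA", "chrom": "chr1", "start": 1000, "end": 2500},
        {"name": "geneB", "chrom": "chr1", "start": 4000, "end": 5200},
        {"name": "geneC", "chrom": "chr2", "start": 1000, "end": 2500},
    ]
    print(genes_disrupted_by_insertion("chr1", 2100, annotation))
    print(genes_disrupted_by_insertion("chr1", 3000, annotation))
```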
Regardless of the analytical methods used and the quality and depth of compositional data obtained, data interpretation remains the critical issue for evaluating the significance of unintended compositional changes. Questions that bear consideration include the following: How is analytical data for a new food, whether genetically engineered or produced by conventional methods, interpreted? Against what references should the new food be compared? Should the same analytical path be used for GE and non-GE food? In Chapter 7, an analytical framework is proposed for addressing these and related issues.
REFERENCES
Almeida CM, Vasconcelos MT, Barbaste M, Medina B. 2002. ICP-MS multi-element analysis of wine samples—A comparative study of the methodologies used in two laboratories. Anal Bioanal Chem 374:314–322.
Ashihara H, Crozier A. 2001. Caffeine: A well known but little mentioned compound in plant science. Trends Plant Sci 6:407–413.
Bakan B, Melcion D, Richard-Molard D, Cahagnier B. 2002. Fungal growth and fusarium mycotoxin content in isogenic traditional maize and genetically modified maize grown in France and Spain. J Agric Food Chem 50:728–731.
Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O’Donovan C, Phan I, Pilbout S, Schneider M. 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31:365–370.
Brescia MA, Kosir IJ, Caldarola V, Kidric J, Sacco A. 2003. Chemometric classification of Apulian and Slovenian wines using 1H NMR and ICP-OES together with HPICE data. J Agric Food Chem 51:21–26.
Cariati F, Fermo P, Gilardoni S. 2003. Optimization of an urban particulate matter multi-element analysis method by inductively coupled plasma-atomic emission spectrometry (ICP-AES). Ann Chim 93:539–550.
Clemens S, Palmgren MG, Krämer U. 2002. A long way ahead: Understanding and engineering plant metal accumulation. Trends Plant Sci 7:309–315.
Clements MJ, Campbell KW, Maragos CM, Pilcher C, Headrick JM, Pataky JK, White DG. 2003. Influence of Cry1Ab protein and hybrid genotype on fumonisin contamination and fusarium ear rot of corn. Crop Sci 43:1283–1293.
Defernez M, Colquhoun IJ. 2003. Factors affecting the robustness of metabolite fingerprinting using 1H NMR spectra. Phytochemistry 62:1009–1017.
Dowd PF. 2001. Biotic and abiotic factors limiting efficacy of Bt corn in indirectly reducing mycotoxin levels in commercial fields. J Econ Entomol 94:1067–1074.
Duvick J. 2001. Prospects for reducing fumonisin contamination of maize through genetic modification. Environ Health Perspect 109S:337–342.
FAO/WHO (Food and Agriculture Organization of the United Nations/World Health Organization). 2003. Report of the Fourth Session of the Codex Ad Hoc Intergovernmental Task Force on Foods Derived from Biotechnology, Yokohama, Japan. Online. Available at ftp://ftp.fao.org/docrep/fao/meeting/006/y9220e.pdf. Accessed September 21, 2003.
Fiehn O. 2002. Metabolomics: The link between genotypes and phenotypes. Plant Mol Biol 48:155–171.
Fiehn O. 2003. Metabolic networks of Cucurbita maxima phloem. Phytochemistry 62:875–886.
Fiehn O, Kopka J, Dörmann P, Altmann T, Trethewey RN, Willmitzer L. 2000a. Metabolite profiling for plant functional genomics. Nat Biotechnol 18:1157–1161.
Fiehn O, Kopka J, Trethewey RN, Willmitzer L. 2000b. Identification of uncommon plant metabolites based on calculation of elemental compositions using gas chromatography and quadrupole mass spectrometry. Anal Chem 72:3573–3580.
Frachler M, Rossipal E, Irgolic KJ. 1998. Trace elements in formulas based on cow and soy milk and in Austrian cow milk determined by inductively coupled plasma mass spectrometry. Biol Trace Elem Res 65:53–74.
Fraser PD, Pinto ES, Holloway DE, Bramley PM. 2000. Application of high-performance liquid chromatography with photodiode array detection to the metabolic profiling of plant isoprenoids. Plant J 24:551–558.
GAO (General Accounting Office). 2002. Genetically Modified Foods: Experts View Regimen of Safety Tests as Adequate, but FDA’s Evaluation Process Could Be Enhanced. GAO-02-566. Washington, DC: GAO.
German JB, Roberts MA, Watkins SM. 2003. Genomics and metabolomics as markers for the interaction of diet and health: Lessons from lipids. J Nutr 133:2078S–2083S.
Goenaga Infante H, Van Campenhout K, Schaumlöffel D, Blust R, Adams FC. 2003. Multi-element speciation of metalloproteins in fish tissue using size-exclusion chromatography coupled on-line with ICP-isotope dilution-time-of-flight-mass spectrometry. Analyst 128:651–657.
Goodacre R, Vaidyanathan S, Bianchi G, Kell DB. 2002. Metabolic profiling using direct infusion electrospray ionisation mass spectrometry for the characterisation of olive oils. Analyst 127:1457–1462.
Green CD, Simons JF, Taillon BE, Lewin DA. 2001. Open systems: Panoramic views of gene expression. J Immunol Methods 250:67–79.
Gregory JF. 1998. Nutritional properties and significance of vitamin glycosides. In: McCormick DB, ed. Annu Rev Nutr. Palo Alto, CA: Annual Reviews. Pp. 277–296.
Holm PB, Kristiansen KN, Pedersen HB. 2002. Transgenic approaches in commonly consumed cereals to improve iron and zinc content and bioavailability. J Nutr 132:514S–516S.
Holmes E, Nicholls AW, Lindon JC, Connor SC, Connelly JC, Haselden JN, Damment SJ, Spraul M, Neidig P, Nicholson JK. 2000. Chemometric models for toxicity classification based on NMR spectra of biofluids. Chem Res Toxicol 13:471–478.
Huhman DV, Sumner LW. 2002. Metabolic profiling of saponins in Medicago sativa and Medicago truncatula using HPLC coupled to an electrospray ion-trap mass spectrometer. Phytochemistry 59:347–360.
IOM (Institute of Medicine). 2000. Vitamin E. In: Dietary Reference Intakes for Vitamin C, Vitamin E, Selenium, and Carotenoids. Washington, DC: National Academy Press. Pp. 186–283.
Johnson HE, Broadhurst D, Goodacre R, Smith AR. 2003. Metabolic fingerprinting of salt-stressed tomatoes. Phytochemistry 62:919–928.
Kuiper HA, Kleter GA, Noteborn HP, Kok EJ. 2001. Assessment of the food safety issues related to genetically modified foods. Plant J 27:503–528.
Kuiper HA, Kok EJ, Engel KH. 2003. Exploitation of molecular profiling techniques for GM food safety assessment. Curr Opin Biotechnol 14:238–243.
Lee SJ, Ahn JK, Kim SH, Kim JT, Han SJ, Jung MY, Chung IM. 2003. Variation in isoflavone of soybean cultivars with location and storage duration. J Agric Food Chem 51:3382–3389.
Liu CJ, Blount JW, Steele CL, Dixon RA. 2002. Bottlenecks for metabolic engineering of isoflavone glycoconjugates in Arabidopsis. Proc Natl Acad Sci U S A 99:14578–14583.
Losso JN, Munene CN, Moody MW. 2003. Inductively coupled plasma optical emission spectrometric determination of minerals in catfish frame. Nahrung 47:309–311.
Matsumura H, Nirasawa S, Terauchi R. 1999. Technical advance: Transcript profiling in rice (Oryza sativa L.) seedlings using serial analysis of gene expression (SAGE). Plant J 20:719–726.
Noteborn HPJM, Lommen A, van der Jagt RC, Weseman JM. 2000. Chemical fingerprinting for the evaluation of unintended secondary metabolic changes in transgenic food crops. J Biotechnol 77:103–114.
O’Farrell PH. 1975. High resolution two-dimensional electrophoresis of proteins. J Biol Chem 250:4007–4021.
Quackenbush J. 2002. Microarray data normalization and transformation. Nat Genet 32S:496–501.
Raamsdonk LM, Teusink B, Broadhurst D, Zhang N, Hayes A, Walsh MC, Berden JA, Brindle KM, Kell DB, Rowland JJ, Westerhoff HV, van Dam K, Oliver SG. 2001. A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nat Biotechnol 19:45–50.
Ratcliffe RG, Shachar-Hill Y. 2001. Probing plant metabolism with NMR. Annu Rev Plant Physiol Plant Mol Biol 52:499–526.
Regnier FE, Riggs L, Zhang R, Xiong L, Liu P, Chakraborty A, Seeley E, Sioma C, Thompson RA. 2002. Comparative proteomics based on stable isotope labeling and affinity selection. J Mass Spectrom 37:133–145.
Roberts JKM. 2000. NMR adventures in the metabolic labyrinth within plants. Trends Plant Sci 5:30–34.
Roessner U, Wagner C, Kopka J, Trethewey RN, Willmitzer L. 2000. Technical advance: Simultaneous analysis of metabolites in potato tuber by gas chromatography-mass spectrometry. Plant J 23:131–142.
Roessner U, Luedemann A, Brust D, Fiehn O, Linke T, Willmitzer L, Fernie A. 2001. Metabolic profiling allows comprehensive phenotyping of genetically or environmentally modified plant systems. Plant Cell 13:11–29.
Saxena D, Stotzky G. 2001. Bt corn has a higher lignin content than non-Bt corn. Am J Bot 88:1704–1706.
Shintani D, DellaPenna D. 1998. Elevating the vitamin E content of plants through metabolic engineering. Science 282:2098–2100.
Taylor J, King RD, Altmann T, Fiehn O. 2002. Application of metabolomics to plant genotype discrimination using statistics and machine learning. Bioinformatics 18:S241–S248.
Thelen JJ, Ohlrogge JB. 2002. Metabolic engineering of fatty acid biosynthesis in plants. Metab Eng 4:12–21.
Tolstikov VV, Fiehn O. 2002. Analysis of highly polar compounds of plant origin: Combination of hydrophilic interaction chromatography and electrospray ion trap mass spectrometry. Anal Biochem 301:298–307.
van Hal NL, Vorst O, van Houwelingen AM, Kok EJ, Peijnenburg A, Aharoni A, van Tunen AJ, Keijer J. 2000. The application of DNA microarrays in gene expression analysis. J Biotechnol 78:271–280.
Verpoorte R, Memelink J. 2002. Engineering secondary metabolite production in plants. Curr Opin Biotechnol 13:181–187.
Wall JS, Young MR, Carpenter KS. 1987. Transformation of niacin-containing compounds in corn during grain development: Relationship to niacin nutritional availability. J Agric Food Chem 35:752–758.
Wang T, Wu J, Hartman R, Jia X, Egan RS. 2000. A multi-element ICP-MS survey method as an alternative to the heavy metals limit test for pharmaceutical materials. J Pharm Biomed Anal 23:867–890.
Yan L, Kerr P. 2002. Genetically engineered crops: Their potential use for improvement of human nutrition. Nutr Rev 60:135–141.