Path to Effective Recovering of DNA from Formalin-Fixed Biological Samples in Natural History Collection: Workshop Summary

Workshop Proceedings

The workshop was divided into four sessions (Appendix C). The first focused on properties of DNA after formalin fixation. The second examined ways to obtain sequence information from formalin-fixed samples. In the third session, participants discussed applications of bioinformatics for reconstructing DNA sequences from formalin-fixed samples. Each session began with brief presentations by participants with relevant expertise, followed by open discussion. The challenge to participants was to identify a path to successful recovery of DNA sequence information from formalin-fixed samples stored in either alcohol or formalin. In the final session, participants made suggestions on areas of research and experimentation needed to investigate the mechanisms and kinetics of DNA damage by formalin fixation and on how to develop ways to repair DNA that would make it useful for study.

Workshop cochair Donald M. Crothers (Yale University) acknowledged the enormous potential for advancing science that could accrue to DNA sequence information from museum specimens. Biologists in various disciplines, for example, would like to use natural history specimens collected over 100-year spans for evolutionary, molecular, and genetic studies. However, users encounter major problems in obtaining both the DNA and the sequence information from those samples because of interference from the formalin fixation. Although the proximate goal is to obtain sequence information of the cytochrome c oxidase subunit 1 (COI) gene for DNA
barcoding,1 the ultimate goal is to obtain genome sequences for other studies. Biologists have used different methods for extracting DNA from formalin-fixed samples, and some have yielded DNA sequence information, but only under narrow conditions. However, those nominal successes suggest that the problem of recovering DNA sequence information from formalin-fixed samples is solvable. This workshop brought a group of experts together to discuss potential solutions and alternative methods.

Crothers clarified that the specimens in question had been fixed and stored in 5 to 10 percent formalin solution or had been fixed in formalin solution for a few days and then preserved in ethanol. Workshop cochair Ann Bucklin (University of Connecticut) explained that in some cases the formalin for preservation or storage was unbuffered and therefore acidic. Mark Rubin (Brigham and Women’s Hospital) and David Schindel (Consortium for the Barcode of Life; CBOL) suggested that, although the workshop’s focus was on biological samples stored in aqueous solution, much could be learned from protocol development for DNA extraction from formalin-fixed and paraffin-embedded samples. Marvin Caruthers (University of Colorado) said that paraffin embedding creates a more stable environment for the formalin-fixed sample than storage in aqueous formalin or alcohol. For example, whereas the pH of the paraffin does not change, the formaldehyde in formalin can be oxidized to formic acid by exposure to atmospheric oxygen, thereby reducing its pH.

DNA IN SAMPLES EXPOSED TO FORMALIN

Reactions of DNA and Formaldehyde

To begin the discussion on the effect of formalin exposure on DNA, Crothers showed a slide that sums up the reactions that occur in formaldehyde fixation of a drug, adriamycin (Figure 1).
1 “DNA barcoding is a technique for characterizing species of organisms using a short DNA sequence from a standard and agreed-upon position in the genome. DNA barcode sequences are very short relative to the entire genome, and they can be obtained reasonably quickly and cheaply. The cytochrome c oxidase subunit 1 mitochondrial region (COI) is emerging as the standard barcode region for higher animals. It is 648 nucleotide base pairs long in most groups, a very short sequence relative to 3 billion base pairs in the human genome, for example” (“DNA barcoding,” Consortium for the Barcode of Life, http://barcoding.si.edu/DNABarCoding.htm).

During fixation,
formaldehyde reacts with amino groups of guanine (G), adenine (A), and cytosine (C), and a guanine residue is a typical reaction product. The formaldehyde forms covalent linkages with the amino groups, which then cross-link with proteins, so the drug is linked to the guanine moiety. The other guanine moiety has a strong hydrogen bond with adriamycin, which produces a tight, stable complex. However, the drug must be kept cold to maintain the stability of the fixation; heat can cause the dissociation of the entire complex. The cross-links are labile to the aromatic amines of DNA, and the cross-links or the reaction to formaldehyde are stable.

FIGURE 1 Covalent adriamycin-DNA adduct. SOURCE: Zeman et al., 1998.

The process of formaldehyde fixation alters DNA in three ways: through fragmentation, sequence modification, and cross-linking. Cross-linking is not destructive to nucleic acids, and is reversible. Using 13C-labeled formaldehyde, Crothers said, it is possible to see that the methylene carbon came from the formaldehyde (Figure 1). Nuclear magnetic resonance (NMR) spectroscopy can be used to show what happens to the formaldehyde carbon when it reacts with DNA in different circumstances so that the kinetics of
the reactions between formaldehyde and different kinds of nucleotides over time can be revealed. Using modern NMR methods to study formaldehyde’s reactions with double-helical or single-stranded oligonucleotides could reveal information on the kinetics. If formaldehyde is left unbuffered, it is oxidized to form formic acid, which has destabilizing properties. Formic acid depurinates DNA; that is, it cleaves the N-glycosidic link between purine bases and deoxyribose, resulting in the loss of purines from the DNA backbone, and the degradation is likely to be irreversible.

Crothers mentioned a paper by Quach and colleagues (2004) that assessed sequence modifications due to formalin fixation. That group reported that formalin fixation speeds sequence modification, but that the rate does not depend on the duration of formalin fixation. The ability to make a longer amplicon using polymerase chain reaction (PCR) analysis decreases dramatically with increasing duration of formalin fixation. Crothers said he suspected that the formaldehyde used for fixation is oxidized to formic acid over time, which causes denaturation of DNA and more cross-linking reactions. Storage of samples in unbuffered formalin for prolonged periods is likely to produce DNA that is so degraded it cannot be used for PCR analysis.

Crothers cautioned that, in some cases, even if PCR did not produce amplified DNA, the lack of an amplicon does not imply an absence of DNA. Rather, DNA purification reagents could contain PCR inhibitors. In response to that comment, Charles Cantor (Sequenom, Inc.) suggested the use of an internal control (that is, adding copies of a standard that is known to amplify with its primers) to ensure that PCR was not inhibited. Crothers suggested mass spectrometry or single-molecule sequencing for small DNA fragments as an alternative to PCR.
Because single-molecule sequencing can be done on multiple molecules, the resulting sequences could be compared to locate the damage in each sequenced molecule. Caruthers agreed with Crothers that sequencing DNA from formalin-fixed samples is comparable to sequencing apurinic acid with small stretches of pyrimidines. He suggested that sequence information can be recovered from those small stretches but the informatics involved would be challenging. Mitochondrial DNA (mtDNA) has many adenine-thymine base pairs (that is, it is A-T rich) so that the method that Caruthers suggested is not likely to work, said Robert DeSalle (American Museum of Natural History). Timothy O’Leary (U.S. Department of Veterans Affairs) added that mtDNA is less accessible than is nuclear DNA, probably because of the abundance of adenine-thymine base pairs.
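The internal-control logic that Cantor proposed can be written out as a small decision table. The Python sketch below is only an illustration of that reasoning; the function name `interpret_pcr` and the wording of the four outcomes are assumptions made here, not a protocol presented at the workshop.

```python
def interpret_pcr(target_amplified, control_amplified):
    """Interpret a PCR run that was spiked with an internal control.

    The control is a standard known to amplify with its own primers,
    so its failure points to inhibition rather than to missing DNA.
    """
    if control_amplified and target_amplified:
        return "target amplified; no evidence of inhibition"
    if control_amplified:
        # The reaction chemistry worked, so the problem lies with the template.
        return "no inhibition; target absent, too scarce, or too damaged"
    if target_amplified:
        return "target amplified but control failed; check the control"
    # Neither product appeared: the run says nothing about the specimen's DNA.
    return "reaction inhibited or failed; no conclusion about target DNA"


print(interpret_pcr(target_amplified=False, control_amplified=False))
```

The last branch is the case Crothers warned about: a missing amplicon alone does not show that the extract contains no DNA.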
Caruthers stated that the major problem is in the cross-linkages. First, there are cross-linkages between bases (adenine with adenine or adenine with cytosine). Second, there are cross-linkages with proteins (for example, histones with a lot of lysine). The samples would require protease treatment to yield DNA. Proteinase K is commonly used to degrade proteins, but Caruthers said that the cross-linkages would halt DNA polymerization. He suspected that the polymerase would be stopped by lysine-adenosine or adenine-adenine linkages. Aside from cross-linkages, Caruthers said it is not clear how DNA would be degraded by formalin unless a solution were acidic and acid hydrolysis were causing depurination.

Daniel Ryan (Agilent Technologies, Inc.) suggested a possible solution to the depurination problem. In DNA, purines are base paired with pyrimidines so that all that remains in depurinated DNA is the pyrimidines, absent their complementary bases. Ryan and his colleagues are working with microarrays that are functionally similar to addressable beads. They have seen a single nucleic acid bound to a microarray spot, and they have detected one molecule or one of those spots. He suggested that it might be possible to isolate depurinated DNA using the DNA’s remaining binding energy. Crothers questioned whether there is a hybridization system that can recognize depurinated DNA. Ryan suggested that it is a stringency problem.

Cantor suggested a method complementary to Ryan’s. He said that if a universal base, such as inosine, could be added to those apurinic sites, the fragments would become nucleic acids again, and working with nucleic acids is simpler. Caruthers and Timothy Harris (Helicos BioSciences Corporation) agreed that sequence information could be obtained from nucleic acids reconstructed from fragments of depurinated DNA by single-molecule sequencing.
However, Tom Evans (New England Biolabs, Inc.) said the proposed repair method would work only with double-stranded DNA.

O’Leary showed the reactions of formaldehyde with nucleotides and nucleic acids over a short period (under 24 hours) (Equations 1 and 2).

[Equation 1]

[Equation 2]

He reported that the reactions are reversible. The reaction between formaldehyde and nucleic acid carried out at 24°C could be reversed by incubation at 70°C or by dialysis. However, if the sample were transferred to alcohol after fixation, other reactions and molecular alterations would occur.
Transferring the formalin-fixed sample to ethanol triggers depurination of DNA in the sample and formation of ethanol adducts. The ethanol adduct could be cleaved at pH 4, which is not acidic enough to cause depurination. Even a short period of formalin fixation followed by dehydration can cause damage, but the nature of that reaction is not well studied. Investigating the reaction with single nucleotides could prove useful.

Reliability of Sequence Information

Some participants questioned the reliability of the sequence information obtained from formalin-fixed samples. For example, would certain DNA strands be more susceptible to sequence modification as a result of formalin fixation? If so, the DNA sequence obtained would not be a true representation of the original specimen. DeSalle suggested that multiple clones could be examined to see whether there is consensus in the DNA sequences among the clones. Harris pointed out that examination of multiple clones would be useful only if the sequence alterations that result from fixation are random. By comparing multiple clones and looking for overlap in sequences, a true sequence can be deciphered. However, if the alteration is systematically biased (that is, if sequence modification occurs in the same region of every clone), then there is no way to determine where the alteration occurs unless the formalin-fixed sample can be compared with a fresh sample.

Rubin pointed out that conducting a systematic comparison of fresh and preserved samples could lead to a better understanding of the problems associated with recovering DNA from formalin-fixed samples. The comparative study would allow documentation of alterations in formalin-fixed samples. Then, more research could determine whether any of those alterations hampers the determination of the original sequence.
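The multiple-clone comparison that DeSalle and Harris discussed amounts to a column-by-column majority vote. The following Python sketch illustrates the idea under simplifying assumptions that are not from the workshop: the clone reads are already aligned and of equal length, and the function name `consensus`, the `min_fraction` threshold, and the example sequences are all hypothetical.

```python
from collections import Counter


def consensus(clones, min_fraction=0.5):
    """Majority-vote consensus over pre-aligned, equal-length clone reads.

    A position is reported as 'N' when no base is carried by more than
    `min_fraction` of the clones.
    """
    if not clones:
        raise ValueError("need at least one clone")
    length = len(clones[0])
    if any(len(clone) != length for clone in clones):
        raise ValueError("clones must be aligned to the same length")
    called = []
    for column in zip(*clones):
        base, count = Counter(column).most_common(1)[0]
        called.append(base if count / len(column) > min_fraction else "N")
    return "".join(called)


# Random damage: each altered site appears in only one clone and is outvoted.
random_damage = ["ACGTAC", "ACGTAC", "ATGTAC", "ACGTAT"]
print(consensus(random_damage))

# Systematic damage: every clone carries the same change, so the vote
# returns the altered sequence and cannot flag it.
systematic_damage = ["ATGTAC"] * 4
print(consensus(systematic_damage))
```

The second call shows why Harris's caveat matters: without a fresh or frozen reference, a modification shared by every clone is indistinguishable from the true sequence.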
Oxidative Damage

Miral Dizdaroglu (National Institute of Standards and Technology) discussed his work on oxidative stress and damage. His laboratory uses mass spectrometric techniques to observe oxidative damage in DNA isolated from animal tissues between 10,000 and 20,000 years old. Oxidative stress causes the formation of highly reactive hydroxyl radicals, which react with DNA bases and with the sugar moiety of DNA, possibly to cause base damage, sugar damage, DNA-protein cross-links, and single- and double-strand
breaks in DNA. The damage occurs because a hydroxyl radical can add to the double bonds, forming other intermediate radicals that react further. Cytosine and purine lesions, for example, form as a result of reactions with hydroxyl radicals. Most lesions are mutagenic, which means that polymerase will not stop but will insert the wrong base opposite those lesions. Some lesions are lethal; they stop polymerase, and DNA cannot be synthesized from that point on. Dizdaroglu has found thymidine dimers in samples exposed to ultraviolet radiation, but he had not compared lesions in DNA extracted from fresh samples with that extracted from formalin-fixed samples.

Harris questioned whether a vial of DNA that sustains oxidative damage could be repaired by an enzyme cocktail. Several participants said that it could be repaired to some extent. Basic repair, however, involves many enzymes, said Dizdaroglu. Crothers added that DNA had to remain double stranded for successful enzymatic repair. Furthermore, hydroxyl damage has some minimal sequence preference, so lesions might not always occur in the same region.

Variations in Curatorial Treatments

Participants who work with natural history collections discussed the variations in the curatorial processing of biological samples. Buffered and unbuffered formalin, for example, have been used for fixation and storage. Although most zooplankton samples are fixed and stored in formalin, others often are fixed in formalin and transferred to a 70 percent ethanol solution after fixation. The duration of formalin fixation varies widely among samples stored in ethanol. Ryan asked whether anyone had determined the size of DNA extracted from formalin-fixed specimens. Specifically, he was wondering whether a specific formalin treatment or curatorial treatment of a specimen leads to increased fragmentation of DNA.
Crothers asked whether anyone had obtained PCR products from specimens preserved in aqueous formalin. Bucklin replied that she and her colleagues had examined DNA sequences for northern krill, Meganyctiphanes norvegica (Crustacea, Euphausiacea), fixed and stored in formalin for 2, 3, 15, and 18 years (Bucklin and Allen, 2004). When they amplified the DNA to determine the size of fragments, they found that the longer the specimen had been preserved in formalin, the shorter the DNA fragments. Bucklin and colleagues had not been able to obtain DNA from specimens stored in unbuffered formalin. Crothers reiterated that obtaining DNA from samples that had been stored in
pH 2 formalin is fruitless because the formic acid is likely to have irreversibly depurinated the DNA. Caruthers stressed that an attempt to restore damaged DNA by neutralizing pH 2 formalin might incur more damage, at least to the sample’s RNA.

Christoffer Schander (University of Bergen) also has compared the success of different protocols for extracting DNA from tissues fixed in formalin for different durations (Schander and Halanych, 2003). He reported that the success of DNA extraction depends not only on the protocol, but on the tissue from which the DNA was extracted. O’Leary suggested that bones and teeth could be useful alternatives to soft tissue for obtaining DNA. Bones and teeth might be better protected from damage by formalin. However, Evon Hekkala (U.S. Environmental Protection Agency) and Gonzalo Giribet (Museum of Comparative Zoology, Harvard University) pointed out that the strategy would be useless for many organisms that have neither bones nor teeth.

Some participants asked how many repeats of the sequencing process would be needed to obtain reliable information. The number, according to Ernie Mueller (Sigma-Aldrich Company), would depend on the size of the DNA fragments. The shorter the fragment, the more repeats are needed. One participant indicated that fragments of 500-600 base pairs were the longest that had been obtained from formalin-fixed samples stored in aqueous solution.

Rubin suggested that information about the curatorial history of samples is critical to developing an optimal protocol. He mentioned that he chaired a task force at the National Cancer Institute to devise an optimal protocol for obtaining DNA from archival human tissues.
That group reported that researchers in different laboratories use different protocols for sample preservation, and sometimes, even a slight variation can make a big difference in the success of DNA extraction. Determining which variation in a preservation or processing protocol has the largest effect is an important step in identifying optimal protocols for DNA extraction. Hekkala thought that Rubin’s approach might be useful for developing a matrix that could be used to predict whether particular specimens would be useful for recovering sequence information. Participants agreed that a survey of curatorial practices could be useful for determining which other factors should be considered in identifying specimens that have the potential for DNA extraction.

Giribet recalled a project by Bhadury and colleagues (2005) that examined nematodes preserved in formalin. When the nematodes were fixed for
7 days, the extracted and amplified DNA showed clear bands on a gel. But when the nematodes were fixed for 11 days, the gel showed smears instead of distinct bands, indicating that much of the usable material was lost. Bucklin said that she had obtained good DNA from samples fixed for a week or less, regardless of whether the formalin was buffered or not. Steven Hofstadler (Isis Pharmaceuticals) questioned whether Bhadury’s group quantified the extracted DNA. If not, the results could be interpreted as a lower yield in extracted DNA from samples fixed for longer duration instead of as a decrease in amplification.

Schindel asked whether there is a way to detect the DNA’s integrity without extraction. Crothers said that would be a true analytical challenge. Cantor mentioned that the DNA sequences with multiple thymines in a row (called poly-T tracts) are more stable than others. Those sequences preserve well because they do not react with formaldehyde, said Crothers. One of Cantor’s students found that poly-T tracts in closely related bacteria can be distinguished if they are measured precisely. However, Cantor does not know the variability of poly-T tracts in higher organisms.

Schindel asked about the relative importance of DNA degradation and PCR inhibition when DNA amplification has not been observed. There is a potential for small molecules to block PCR, especially if there are only a few copies of the DNA to be amplified, said Crothers. Therefore, an internal control would ensure that PCR was not inhibited. Hofstadler cautioned that the amount of internal standard could mask the DNA to be amplified if the DNA is present only in a low concentration. In addition to small molecule inhibitors, a lesion in DNA also can inhibit PCR, said Crothers.
A lesion is an inhibitor in the sense that even though a molecule is of a given length, it cannot be amplified because the enzymes cannot get through it. That kind of PCR inhibition is more difficult to control for. The larger the DNA fragment, the more likely it is to have a lesion or protein bound that blocks PCR.

The discussion turned to questions of how to assess the integrity of DNA before extraction. Alison Williams (Princeton University) suggested that capillary electrophoresis might be sensitive enough, and that perhaps a few bases could be observed from one or two kilobases. Cantor suggested ethidium bromide and 4′,6-diamidino-2-phenylindole (commonly known as DAPI), and DeSalle suggested spectroscopic analysis. Ryan suggested reversing cross-linking by hydrolysis with water vapor at 65°C. Intercalating dyes that have specific fluorescence signatures, such as ethidium bromide or the more sensitive PicoGreen, can be added (Ahn et al., 1996).
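Crothers's remark that larger fragments are more likely to carry a blocking lesion can be made concrete with a simple probability model. The Python sketch below assumes, purely for illustration, that blocking lesions fall independently at a constant per-base rate, so a template of length L is lesion-free with probability (1 - p)^L. The rate used here is a made-up placeholder, not a measured value for formalin-fixed tissue.

```python
def lesion_free_fraction(length_bp, lesion_rate_per_bp):
    """Fraction of templates of a given length that carry no blocking lesion.

    Assumes lesions occur independently at the same rate at every base,
    which is a simplification of real damage patterns.
    """
    return (1.0 - lesion_rate_per_bp) ** length_bp


# Hypothetical rate of one blocking lesion per 200 bases (illustrative only).
assumed_rate = 0.005
for length in (100, 300, 648):
    fraction = lesion_free_fraction(length, assumed_rate)
    print(f"{length:4d} bp amplicon: {fraction:.3f} of templates lesion-free")
```

Under this model the amplifiable fraction falls exponentially with amplicon length, which is consistent with the reports in this section that shorter targets are easier to recover from fixed specimens.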
Schindel asked whether heating the samples would enhance extraction. O’Leary explained that the aqueous heating might not be useful for samples from natural history collections because those samples are likely to sustain more extensive damage than are paraffin-embedded, formalin-fixed samples. Hekkala said she had followed the critical drying method proposed by Fang et al. (2002) but did not obtain any amplifiable DNA from her samples.

Crothers noted that some participants had not shared information about failed attempts at DNA extraction from formalin-fixed samples until this workshop. The user community has no means of communicating the protocols that they have tried to use to extract DNA and failed. Sharing that information is important because the collective information could shed light on why some attempts succeed. Crothers suggested a Web site be set up for that purpose.

The development of a set of standardized reference samples to test DNA extraction protocols would be useful for comparing the protocols to determine what works under which conditions, said Schander. Scott Miller (Smithsonian Institution) explained that the Smithsonian has a set of specimens (goldfish exposed to a series of different formalin treatments) that could be used as the standards. The specimen set includes samples of frozen tissue that has not been exposed to formalin. Schindel said he would like to see the goldfish standard held in reserve until an acceptable approach to extraction protocol testing is developed. For example, Schindel had hoped that chemists could elucidate the degradation processes in formalin fixation that block DNA extraction or hamper PCR analysis, and then identify or develop better protocols. “There is no gold standard method at the moment,” Crothers said.
Because of variations in curatorial processes, the chemistry of DNA degradation in formalin-fixed specimens is largely unknown.

In summation, Crothers listed possible alterations or damage to DNA exposed to formalin. They include irreversible depurination caused by acidification (if formaldehyde is unbuffered), cross-linking, oxidation from reactions with minor content of formaldehyde, cytosine deamination, and minor adducts. The chemistry of cross-linking is not well understood, and some cross-linkages are rather stable. Cytosine deamination is enzymatically reversible in samples that contain double-stranded DNA.
OPTIMIZATION OF DNA SEQUENCE INFORMATION

This section explored how sequence information can be optimized after DNA is extracted from samples. Several participants explained methods that could be used to assemble sequence information from DNA.

PCR-MASS SPECTROMETRY

Cantor explained a PCR-mass spectrometry method developed at Sequenom. The method’s advantages are in its sensitivity, precision, and cost-effectiveness. Its sensitivity is better than what is possible in conventional sequencing methods because mass spectrometry produces no background noise, and it is less expensive than real-time PCR analysis. Although the technology cannot be used to survey an entire genome, it is a cost-effective method for examining hundreds of loci in thousands of samples. PCR-mass spectrometry is fully automated, and it can process about 3000 samples per day. Some 160 entities are using the PCR-mass spectrometry technology; many are organizations that provide the service for a fee.

PCR-mass spectrometry is a multiplex method that can analyze 30 genotyping or 20 gene expression samples simultaneously in one tube. Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry requires a smaller amount of DNA as input than PCR, so that the method is optimally designed for small amplicons. The mass spectrometry method covers an unlimited dynamic range. Because mass spectrometry is expensive, it is not used for standard sequencing. Rather, nucleic acids are sequenced by base-specific cleavage reactions. Sequenom has all four single base-specific cleavage reactions working with complicated, but single-tube, technology. It is possible to quantify reactions at every locus in mixed, complicated samples, including deaminated samples. Conventional sequencing requires one continuous target.
But sequencing in the mass spectrometer by fragmentation does not require a continuous target, so a discontinuous set of sequences can be sampled for the cost of a single sequencing reaction. The PCR-mass-spectrometry method yields 98-99 percent correct typing. Cantor encouraged the users to try the method because it is a mature and available technology. Caruthers asked whether multiplexing PCR is a problem. Cantor explained it is not, because the amplicons used are short and because all amplicons are close to the same size. Sequenom had experience working with short amplicons, and the multiplex is designed by software. Generally, 28 of 30 multiplexes work well.
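To show why base-specific cleavage lets a mass spectrometer detect a sequence change without reading a continuous target, the toy Python sketch below cuts a sequence after every occurrence of one base and summarizes each fragment by its base composition, since composition (not base order) is what determines fragment mass. This is a deliberately simplified in-silico model written for this summary, not Sequenom's actual chemistry or software; the function names and the two example sequences are hypothetical.

```python
from collections import Counter


def cleave_after(seq, base):
    """Cut `seq` immediately after every occurrence of `base`."""
    fragments, current = [], ""
    for nucleotide in seq:
        current += nucleotide
        if nucleotide == base:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)  # trailing piece with no cut site
    return fragments


def fragment_signature(seq, base):
    """Multiset of fragment compositions, a stand-in for a list of peak masses."""
    return Counter("".join(sorted(frag)) for frag in cleave_after(seq, base))


reference = "ACGTTGCATGCA"
variant = "ACGTTGTATGCA"  # single C-to-T change relative to the reference

print(cleave_after(reference, "T"))
print(cleave_after(variant, "T"))
print(fragment_signature(reference, "T") != fragment_signature(variant, "T"))
```

In this example the single substitution creates a new cut site, so the set of fragments changes. A real assay would combine several base-specific reactions, because one reaction alone cannot distinguish fragments that share the same composition.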
[...]
also would be unable to determine the reliability of the sequence. However, if resequencing were done by cloning the PCR products so that the entire population of DNA in the sample were analyzed, it would be possible to identify and correct the sequence modifications by assembling those reads. Salzburg emphasized that depth of coverage is necessary to distinguish between sequencing errors, sequence modifications, and true mutations. A minimum of four-times coverage could be necessary, especially if the DNA is likely to have been damaged; the cost of sequencing 700 bases is within reason for most laboratories. Schander agreed that it would be worthwhile to increase depth of coverage to examine sequence modifications caused by formalin fixation. If sequence modification is induced by formalin, and if that is a common phenomenon, then the more that is known about it, the better its effects could be predicted, said Hall. He reiterated that error screening requires sequencing of the cloned PCR products, and not the direct sequencing of a PCR product.

Participants discussed how many replicate sequences of the COI gene would be needed to create a reference library for DNA barcoding. Schindel said the goal of the barcoding project is to create a reference library with bidirectional reads of five specimens per species, but no replicate per individual specimen. Hekkala said that sequencing several individuals of the same species is important so that sequence variation within a group can be observed. More important, if one formalin-fixed specimen is used for sequencing, it would be necessary to compare it against fresh or frozen tissue to ensure that sequence variation is not an artifact of formalin fixation. Schander questioned the likelihood of sequence error attributable to properties of formalin fixation.
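The link between depth of coverage and confidence can be estimated with a binomial model. The Python sketch below assumes, as a simplification not stated at the workshop, that each read is wrong at a given site independently with the same probability and that all wrong reads agree with one another (a pessimistic case). The function names and the 1 percent error rate are illustrative placeholders.

```python
from math import comb


def consensus_failure_prob(depth, per_read_error):
    """Probability that correct reads fail to form a strict majority at one site."""
    # Smallest number of wrong reads that denies the correct base a strict majority.
    k_min = (depth + 1) // 2
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(k_min, depth + 1)
    )


def min_depth(per_read_error, target_failure, max_depth=200):
    """Smallest depth whose per-site failure probability meets the target, or None."""
    for depth in range(1, max_depth + 1):
        if consensus_failure_prob(depth, per_read_error) <= target_failure:
            return depth
    return None


# Hypothetical per-read, per-site error of 1 percent (damage plus sequencing error).
for depth in (1, 3, 5, 7):
    print(depth, consensus_failure_prob(depth, 0.01))
print(min_depth(0.01, 1e-3))
```

Because even depths can end in a tie, the failure probability does not fall smoothly from one depth to the next; comparing odd depths gives the clearest picture. The model also inherits the caveat raised earlier in the workshop: it says nothing about systematic modifications that appear in every read.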
O’Leary stated that cloning would be more likely to introduce artifacts than would cycle sequencing. He suggested that data collection on the likelihood of sequence errors as a result of formalin fixation could help determine whether 5 replicates per 10,000 or more specimens would be necessary.

Error introduction in PCR sequences is not unheard of, said Hall. Bioinformatics could be used to reassemble the short sequences and to identify errors in sequences if many small PCR products with overlapping regions were being assembled. The overlapping regions show where the sequence information differs. However, Hall said he did not have a good sense of the magnitude of error that would be introduced to a PCR sequence as a result of formalin fixation; so it would be difficult to decide whether it is worth investigating. Turning the question around, Schindel asked how many times a formalin-fixed specimen that produces
DNA fragments would need to be resequenced to ensure that a correct sequence was assembled. Salzburg said the number of replications would depend on the confidence level sought. Bioinformaticians can quantify the likelihood of an error in a sequence if they are given a particular number of raw sequences. On the basis of that information, they can estimate the number of replicates necessary to achieve a given confidence level.

SUMMING UP

This section reviews the questions in the charge to the workshop participants and their answers to the questions as presented by the rapporteurs in their summary on the second day of the meeting. The questions are listed in boldface type.

What is the state of preservation of DNA in the presence of formalin? Are the DNA chains intact or broken? Does formalin denature DNA or is it the process of extraction that is fragmenting the DNA? Are the nucleotides at each site being preserved or altered?

The quality of DNA in a sample, the percentage of recoverable or amplifiable DNA, the length of the fragments, and whether the DNA is well preserved or nucleated in formalin-fixed samples are largely unknown. The variations in processing of formalin-fixed samples partly contribute to that lack of knowledge. For example, some samples are stored in unbuffered formalin, others are fixed in formalin for different durations, and some are transferred to ethanol after fixation. Because of those variations in curatorial treatment, the kinetics of formaldehyde and DNA reactions and the byproducts of different reactions are largely unknown. DNA damage and degradation that can occur in formalin-fixed samples include:

Cross-linking with formaldehyde.
Fragmentation.
Sequence modification.
Modifications to adenosine, including methylol adduct formation and depurination.
Formation of oxidative adducts that lead to mutagenic lesions.
Modification of bases, including adduct formation, if the sample is stored in ethanol.
It is not known whether extraction processes fragment DNA, but they could introduce PCR inhibitors that prevent amplification of the extracted product. Therefore, quantification and characterization of DNA extracted from formalin-fixed samples would reveal both whether DNA can be obtained from those samples and what the quality of the extracted DNA would be. Amplification of an internal control sequence would verify that PCR inhibitors are not the source of the problem.

How can the physical and chemical states of the DNA-formalin cross-linkages be better characterized? What additional information on these cross-linkages is needed?

The condition of the DNA obtained from formalin-fixed tissue can be characterized by NMR spectroscopy and by mass spectrometry. In addition to characterizing the cross-linkages and other damage to DNA, it is important to correlate the types of damage and degradation with different curatorial practices. Detailed knowledge of curatorial history might signal the likely damage or degradation. For example, if a specimen was kept in unbuffered formalin, its DNA is likely to have become depurinated and useless for sequencing. Additional important information includes data on the kinetic stability of formaldehyde adducts and cross-linkages, whether the stable products are read as mutations by DNA polymerase, and whether they serve to block polymerase altogether. Mass spectrometry and NMR on small, single-stranded and duplex DNA samples would aid in characterizing the structure and the stability of the formaldehyde reaction products. Additional work also could focus on the reactions that ensue when a sample is exposed to ethanol.

What new chemical and physical methods for DNA extraction should be tested, beyond those that have already been applied to formalin-fixed tissue?
Participants agreed that testing new methods for DNA extraction will not be fruitful while the condition of the DNA in formalin-fixed tissue remains largely unknown, because a failure to obtain sequence cannot be attributed unambiguously to a failure of the extraction protocol, the absence of usable DNA in formalin-fixed samples, or the presence of PCR inhibitors. Some participants, including Schander, Bucklin, and Hekkela, reported that published protocols have led to some success in obtaining sequence information from formalin-fixed tissue. Instead of testing new protocols, they said testing existing protocols with a set of standardized samples could
provide greater insight. Those samples would include several tissue samples from one organism fixed in formalin for different periods; frozen or fresh tissue would be used as a control. Testing different protocols on samples that had been fixed and preserved with standardized curatorial methods could shed light on which extraction protocol is optimal for each type of sample.

Once DNA is successfully extracted from formalin-fixed samples, different methods (mass spectrometry, single-molecule sequencing, or whole-genome amplification) could be used to obtain sequence information. As with the extraction process, the optimal sequencing method might depend on the quality of the extracted DNA.

In what ways and to what extent can fragmented DNA be repaired physically and chemically after extraction from formalin?

In some cases, fragmented DNA can be repaired with excision enzymes mixed with other enzymes and with polymerase. But without more information about the type of damage sustained as a result of formalin fixing, designing the appropriate mix of enzymes for the repair will be difficult.

Can bioinformatics techniques be used to reconstruct the original sequence in silico from the DNA fragments recovered from formalin?

Bioinformatics can be used to construct large, contiguous, consensus sequences from short fragments of DNA. One complication is that formalin fixation could cause sequence modification, and a sequence obtained from a formalin-fixed sample might not accurately represent the original sequence of the untreated sample. Whether formalin fixation introduces random or systematic error into a DNA sequence is not known, but it is worth investigating. The potential for error introduction could be studied by repeated sequencing of cloned PCR products and by repeating PCR analysis from replicated DNA preparations.
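To make the idea of repeated sequencing concrete, the following Python sketch shows one way a consensus could be called from replicate reads, and how the number of replicates might be estimated for a target confidence level. It is an illustration only, not a method presented at the workshop. The function names are hypothetical, the reads are assumed to be already aligned and of equal length, and errors are assumed to be random and independent with a known per-read error rate `p`, an assumption that has not been verified for formalin-fixed samples.

```python
from collections import Counter
from math import comb


def consensus(reads):
    """Call a consensus by per-position majority vote.

    Assumes the reads are pre-aligned and of equal length (hypothetical input).
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")
    called = []
    for column in zip(*reads):
        base, _count = Counter(column).most_common(1)[0]
        called.append(base)
    return "".join(called)


def majority_error_probability(n, p):
    """Upper bound on the chance that a majority vote of n reads fails at one site.

    Conservative: counts every case in which at least half of the reads are
    wrong, as if all wrong reads agreed on the same incorrect base.
    """
    threshold = (n + 1) // 2  # smallest k with k >= n / 2
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(threshold, n + 1)
    )


def replicates_needed(p, confidence, max_n=999):
    """Smallest odd replicate count whose per-site confidence meets the target."""
    for n in range(1, max_n + 1, 2):  # odd counts avoid tied votes
        if 1 - majority_error_probability(n, p) >= confidence:
            return n
    raise ValueError("target confidence not reachable within max_n replicates")


print(consensus(["ACGT", "ACGT", "ACTT"]))  # -> ACGT
print(replicates_needed(0.1, 0.99))         # -> 5
```

The error rate of 0.1 and the confidence of 0.99 are arbitrary inputs chosen for the demonstration, not measured values.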
The repeated sequencing and PCR from replicated DNA preparations could reveal whether there are overlapping consensus sequences. Based on the overlapping sequences, random errors could be corrected accordingly. Systematic errors would be more difficult to correct. They tend to occur consistently in the same place in the sequence, thereby appearing to be a correct base. In that case, the only way to determine whether formalin fixation alters the sample’s sequence would be to compare it with a sequence from a fresh sample.

In his summary, Cantor stressed that without knowledge of the quality of the DNA, finding a solution to the problem of obtaining sequence information from formalin-fixed samples is difficult. Rubin agreed and suggested
initiating a process to determine which formalin-fixed samples could be used for DNA extraction and how. He listed three elements: First, it would be necessary to screen the specimen before DNA extraction to assess its usability. Screening could be done with mass spectrometry to detect free purines or by testing sample pH. If the DNA damage appeared reversible, a repair could be attempted. The second phase would involve using different protocols to extract DNA from samples that have been subjected to various curatorial treatments. That test would provide a framework for predicting which specimens would be most likely to yield high-quality DNA with each protocol. The last phase would test how well the framework developed in the second phase could predict success in DNA extraction with a whole new set of specimens. To reveal practical limitations, O’Leary said, the process should be iterative and cover a spectrum of samples representing various species and curatorial treatments.

THE PATH TO EFFECTIVE RETRIEVAL OF GENOMIC INFORMATION FROM FORMALIN-FIXED SAMPLES

A better understanding of the quality of DNA in samples and of how quality relates to the success of DNA extraction will be needed to inform solutions for effective retrieval of genomic information from formalin-fixed samples. To conclude the workshop, Crothers urged participants to suggest action items for advancing the retrieval of genomic information from formalin-fixed samples. This section compiles the participants’ suggestions.

Properly characterize formalin-fixed samples for DNA extraction.

Discussion during the workshop involved the difficulty of deriving effective ways to obtain sequence information from formalin-fixed samples, especially when there is little information about the causes of the problems.
To help identify cause-and-effect relationships, participants developed a table with columns of curatorial treatments and rows of problems caused by each (Table 1). The information to be collected would include curatorial history (duration of formalin fixation and whether the formalin was in a buffered solution) and information about the quality of the DNA in the sample (presence of free purines or adducts). The quantity of DNA and its ability to be amplified would be assessed after extraction, and PCR would be conducted on highly conserved sequences. The information collected would be used to fill in the table’s rows and columns, and those data in turn would serve as a guide for determining whether sequence information could be obtained from a given sample.

TABLE 1 Curatorial Treatments of Formalin-Fixed Samples and Factors That May Impair DNA Extraction or PCR Amplification

Curatorial Treatments (columns): Excessive Fixation; Excessive Heat; Impurities in Alcohol; Low Alcohol Level; Unbuffered Formalin; Other Treatments
Factors Prohibiting DNA Extraction or PCR Amplification (rows): Cross-linking; Cytosine deamination; Denaturation; Depurination; Formalin-ethanol interaction; Oxidative damage; Point sequence modification; Presence of PCR inhibitors; Other factors

The matrix is designed to identify classes of samples in natural history collections that should not be used for DNA and sequence information on the basis of their curatorial history. The table provides some examples of curatorial treatments that could affect the quality of DNA and PCR amplification. Others that could affect the quality of samples also should be considered.

A survey could be sent to curators to obtain information on the variety of curatorial treatments in natural history collections. Establishing a network also would be useful to facilitate communication among researchers who want to obtain DNA and sequence information from formalin-fixed samples.

Table 1 presents only the curatorial treatments discussed during the workshop. Other factors could inhibit DNA extraction or PCR amplification, especially given the multiplicity of treatments used to preserve natural history specimens. Participants suggested designing a survey, in consultation with experts at institutions that have natural history collections, to gather information on curatorial history and on any successes or failures in the extraction of DNA or PCR amplification.

Participants also noted that testing of DNA extraction protocols has been done mostly by groups interested in obtaining sequence information from formalin-fixed samples. Although occasional successful attempts are reported in the literature, failed attempts are not reported. Yet comparison of successful and failed attempts could provide clues about determining factors. Thus, the establishment of a Web forum was suggested to allow researchers to pool information on their work with formalin-fixed samples.

In addition to retrospective assessment, controlled experiments on standardized, formalin-fixed samples could be used to examine the mechanisms and kinetics of chemical and physical reactions that could hamper efficient DNA extraction or PCR amplification.
Several experiments could begin immediately to reveal the mechanisms prohibiting efficient DNA extraction:

Preliminary studies in selected laboratories could be conducted to assess the effect of formalin and alcohol on the integrity of duplex DNA and RNA. The time course of DNA degradation and the efficacy of DNA repair also could be examined.

The effects of extraction versus fixation could be examined by studying the properties of freshly fixed tissue, oligonucleotides, and DNA samples mixed with protein.
The standard goldfish samples that had been subjected to different curatorial treatments can be used for testing DNA extraction protocols and for elucidating the effects of different treatments on DNA. In addition, each goldfish sample could be sent to different institutions for independent testing. Each institution could then test its own extraction method by quantifying DNA yield (using an Agilent analyzer, for example) and then using the extracted DNA for PCR amplification. Because the fixation and extraction processes could introduce PCR inhibitors, PCR analysis would be conducted with an internal control to ensure that the reaction is not blocked. Results from the multiple-institution protocol testing would be used to develop a standard protocol for DNA extraction from formalin-fixed samples. Repeating the experiment with an invertebrate standard (for example, a flatworm) could provide useful information.

Genomic library clones could be established for samples from which DNA had been successfully extracted. The DNA could then be correlated to the fixation process and to sample history and properties.

Cloning PCR products for selected formalin-fixed samples that have a known gene sequence or an equivalent frozen or fresh sample could reveal whether formalin fixation induces sequence modification. A comparison of cloned PCR products from fixed tissue with products from fresh or frozen samples, or with the known sequence, would help to quantify mutations. Replicating the experiment on different samples and different species could lead to characterization of the patterns and levels of sequence modification attributable to fixation.

A database can be established to collate the information as collected in retrospective and experimental studies.
Information in the database could guide the determination of whether particular formalin-fixed specimens could be used for DNA sequencing on the basis of the specimens’ chemical and physical properties. Participants drafted an example of how the data collected from retrospective and experimental studies could be organized (Table 2). Because institutions with natural history collections have so many formalin-fixed specimens, an assessment of curatorial history and of the chemical and physical properties of the specimens would help identify those that are still useful for DNA sequencing and help prioritize sequencing and barcoding efforts.
TABLE 2 An Example of How Data Collected from the Retrospective and Experimental Studies on DNA Extraction and Sequencing from Formalin-Fixed Biological Samples Could Be Organized. Such a Database Could Serve as a Tool for Assessing the Feasibility of Using Certain Specimens for DNA Sequencing.

Columns: Fixation Type; Tissue Type; Duration of Formalin Fixation; Quantity of DNA Obtained, by Protocols (Shedlock Protocol, Leeds, Qiagen, Chelex, Critical Drying Point, Other)
Rows (Fixation Type): Formalin-fixed, paraffin embedded; Formalin-fixed, stored; Formalin-fixed, stored in ethanol; Other fixation type
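One way a database organized along the lines of Table 2 could be represented is sketched below in Python. This is a minimal illustration under stated assumptions, not a design proposed by the participants. The record fields, the `interpret_pcr` and `yield_table` helpers, and all numeric values are hypothetical; only the protocol and fixation labels are taken from Table 2. The internal-control logic follows the reasoning given earlier: if the control fails, PCR inhibitors are a likely cause; if the control amplifies but the target does not, inhibitors are not the source of the problem.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionRecord:
    """One extraction attempt on one specimen (all field names hypothetical)."""
    fixation_type: str            # e.g. "Formalin-fixed, stored in ethanol"
    tissue_type: str
    fixation_days: Optional[int]  # None when curatorial history is unknown
    protocol: str                 # e.g. "Qiagen", "Chelex"
    dna_yield_ng: float
    target_amplified: bool
    control_amplified: bool       # internal PCR control


def interpret_pcr(record):
    """Separate PCR inhibition from a lack of amplifiable target DNA."""
    if record.target_amplified:
        return "amplifiable DNA recovered"
    if not record.control_amplified:
        return "possible PCR inhibition"
    return "no amplifiable target DNA"


def yield_table(records):
    """Mean DNA yield for each (fixation type, protocol) cell."""
    cells = defaultdict(lambda: [0.0, 0])
    for record in records:
        cell = cells[(record.fixation_type, record.protocol)]
        cell[0] += record.dna_yield_ng
        cell[1] += 1
    return {key: total / count for key, (total, count) in cells.items()}


# Made-up values for illustration only; these are not workshop data.
demo = [
    ExtractionRecord("Formalin-fixed, stored in ethanol", "muscle", 30,
                     "Qiagen", 10.0, True, True),
    ExtractionRecord("Formalin-fixed, stored in ethanol", "muscle", 30,
                     "Qiagen", 30.0, False, True),
    ExtractionRecord("Formalin-fixed, stored in ethanol", "muscle", 30,
                     "Chelex", 5.0, False, False),
]
for (fixation, protocol), mean_yield in yield_table(demo).items():
    print(fixation, "|", protocol, "|", mean_yield)
print([interpret_pcr(record) for record in demo])
```

Keeping one record per extraction attempt, rather than one row per specimen, would allow failed attempts to be stored alongside successful ones.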
After potentially useful DNA extraction protocols are identified from the preliminary experiments, random and focused sampling of PCR products could be conducted on a variety of sample types (from different taxa or fixed and preserved with different curatorial treatments) to identify the best protocol for each type. The issues to be addressed involve the recoverability of DNA, including sequences other than polypyrimidine tracts. Samples with euchromatic and heterochromatic DNA also could be considered.

In the long term, consideration of high-throughput processing of formalin-fixed samples for DNA barcoding and genomic studies would be appropriate, given the large number of samples in museum collections. Some individuals in institutions with collections are identifying specimens in their collections or taxa suitable for high-throughput processing, but a systematic and collaborative effort could facilitate and speed up the process.

Ideas and suggestions from the workshop participants could further the efficient extraction of DNA from formalin-fixed samples, and that in turn could improve access to the sequence information of many rare or difficult-to-collect species in natural history collections. Action by the Consortium for the Barcode of Life to follow up on the workshop participants’ ideas and suggestions could facilitate the effective recovery of sequence information from formalin-fixed biological samples.