ence database comparisons are limited by the fact that not all sampled sequences are covered in these databases, and species-level assignments in particular are hard to obtain; however, because the method yields very little noise, researchers can be fairly confident of the assignments that are made. The second method is de novo clustering, that is, clustering 16S rRNA sequences on the basis of sequence similarity (e.g., by allowing only up to 3 percent divergence). De novo clustering yields more granularity, that is, more species-level assignments, but it also generates more noise (e.g., sequencing and amplification artifacts). Because each method has different advantages and disadvantages, HMP investigators use both to analyze HMP data.
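To make the de novo idea concrete, the following is a minimal sketch of greedy, centroid-based clustering at a 3 percent divergence threshold. It is not the HMP pipeline: the function names (`divergence`, `cluster_de_novo`) are hypothetical, and the sketch assumes the sequences are already aligned to a common length so that divergence can be approximated as the fraction of mismatched positions. Production tools use proper alignment and abundance-sorted input.

```python
from typing import List


def divergence(a: str, b: str) -> float:
    """Fraction of mismatched positions between two pre-aligned sequences.

    Assumption: both sequences have the same (non-zero) length.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def cluster_de_novo(seqs: List[str], max_divergence: float = 0.03) -> List[List[str]]:
    """Greedy clustering: each sequence joins the first cluster whose centroid
    (the cluster's first member) is within max_divergence; otherwise it
    becomes the centroid of a new cluster."""
    clusters: List[List[str]] = []
    for seq in seqs:
        for cluster in clusters:
            if divergence(seq, cluster[0]) <= max_divergence:
                cluster.append(seq)
                break
        else:
            # No existing centroid was close enough, so start a new OTU.
            clusters.append([seq])
    return clusters


if __name__ == "__main__":
    # Toy, made-up sequences (not real 16S data).
    s1 = "A" * 100
    s2 = "A" * 98 + "CC"        # 2 percent divergent from s1
    s3 = "A" * 90 + "C" * 10    # 10 percent divergent from s1
    for i, otu in enumerate(cluster_de_novo([s1, s2, s3]), start=1):
        print(f"OTU {i}: {len(otu)} sequence(s)")
```

Because a sequence carrying a sequencing or amplification error can fall outside every existing centroid's threshold, it seeds its own cluster in this scheme, which illustrates why de novo clustering tends to produce more noise than reference-based assignment.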
As an example of how OTU classification is used to determine which microbes are present and in what abundance, HMP researchers analyzed bacterial species in stool samples from 200 subjects. While the presence of specific genera was relatively constant among individuals, the relative proportions of those genera were extremely variable. As another, non-HMP example, Wortman mentioned the Kostic et al. (2012) study of the colorectal cancer microbiome. The researchers detected a very clear signal that people with colorectal cancer have an enrichment of Fusobacteria in their tumor tissue. As a final example, HMP researchers used both reference-based and de novo OTU classification to analyze OTU data from all five major body sites among the “healthy cohort” study individuals. Reference-based OTU classification methods were used to analyze genus-level trends, while de novo classification was used to analyze species-level trends. Results of the two methods were consistent for all body sites except the vagina, where researchers found the least genus-level diversity but high levels of species diversity. According to Wortman, previous work by Jacques Ravel and colleagues (2011) has shown that the vaginal microbiome is dominated by the Lactobacillus genus but that many different Lactobacilli species are present in various abundances.
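The distinction between presence and relative abundance in the stool example can be illustrated with a short sketch. The counts below are invented for illustration and are not HMP data; the genus labels and the helper names `present_taxa` and `relative_abundance` are likewise hypothetical.

```python
from typing import Dict, Set


def present_taxa(counts: Dict[str, int], min_count: int = 1) -> Set[str]:
    """Taxa observed at least min_count times in a sample."""
    return {taxon for taxon, n in counts.items() if n >= min_count}


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Each taxon's share of the sample's total read count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


if __name__ == "__main__":
    # Made-up genus-level read counts for two hypothetical subjects.
    subject_a = {"genus_A": 80, "genus_B": 15, "genus_C": 5}
    subject_b = {"genus_A": 10, "genus_B": 30, "genus_C": 60}

    # Same genera are present in both subjects...
    print(present_taxa(subject_a) == present_taxa(subject_b))
    # ...but their proportions differ widely.
    print(relative_abundance(subject_a))
    print(relative_abundance(subject_b))
```

In this toy case the two subjects share an identical set of genera, yet the dominant genus differs between them, which mirrors the pattern described for the stool samples.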
Metabolic Reconstruction: What Are They Doing?
The goal of metabolic reconstruction is to identify putative pathways by assigning enzymatic functions to sequencing reads wherever possible, based on information in various enzymatic functional databases. As with the phylogenetic analysis, HMP researchers started with a large volume of data, in this case about 3.6 terabases of shotgun sequencing data from 690 samples. Again, they examined both presence (Which pathways are present?) and abundance (How much of each pathway is present?). They used a software program, the HMP Unified Metabolic Analysis Network