HMP investigators are collecting three types of microbiome sequencing data: 16S rRNA sequences,11 shotgun sequences,12 and reference genome sequences.13 Jennifer Russo Wortman described how HMP Consortium investigators are using this sequencing data to address three key questions: (1) What organisms are present? (2) What do they do? and (3) How do they change in health and disease?
Wortman referred workshop participants to a review article (Kuczynski et al., 2012) for more detail on some of the methodologies she covered, including the caveats of the different sequencing technologies and the associated informatics challenges.
Phylogenetic Analysis: Who Is There?
Investigators are using 16S rRNA sequencing data to address the question, What organisms are present? The initial HMP analysis yielded about 72 million 16S rRNA reads. As Wortman said, “That is a lot of sequenced data to analyze.” The goal was to use those 72 million reads to learn not only which species are present at the various body sites, but also how abundant each species is. In broad terms, after quality control, “de-noising” algorithms, and other computational steps (Caporaso et al., 2010), the 72 million reads were clustered into operational taxonomic units (OTUs), which serve as proxies for species. OTU data can be used not only to estimate which species are present and in what numbers (per-sample OTU counts), but also to infer the evolutionary relationships among them.
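The idea of clustering reads into OTUs and tallying per-sample OTU counts can be illustrated with a minimal sketch. This is not the HMP pipeline, which relied on de-noising and the tools cited above. The sketch assumes toy reads that are already aligned and of equal length, uses a simple greedy clustering rule, and takes an identity threshold chosen only for illustration. The function names (`identity`, `cluster_otus`, `relative_abundance`) and the sample labels are hypothetical.

```python
from collections import Counter, defaultdict


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def cluster_otus(reads, threshold):
    """Greedy clustering (an illustrative assumption, not the HMP method).

    Each read joins the first OTU whose representative it matches at or above
    `threshold`; otherwise it founds a new OTU. Returns the representative
    sequences and a per-sample table of OTU counts.
    """
    representatives = []            # one representative sequence per OTU
    counts = defaultdict(Counter)   # sample label -> Counter of OTU index
    for sample, seq in reads:
        for otu_id, rep in enumerate(representatives):
            if identity(seq, rep) >= threshold:
                break
        else:
            # No existing OTU was close enough: this read starts a new one.
            representatives.append(seq)
            otu_id = len(representatives) - 1
        counts[sample][otu_id] += 1
    return representatives, counts


def relative_abundance(otu_counts):
    """Convert raw OTU counts for one sample into fractions of that sample's reads."""
    total = sum(otu_counts.values())
    return {otu: n / total for otu, n in otu_counts.items()}


# Toy data: (sample label, 10-base read). Real 16S reads are far longer.
toy_reads = [
    ("gut", "ACGTACGTAC"),
    ("gut", "ACGTACGTAT"),   # one mismatch from the first read
    ("gut", "TTTTGGGGCC"),
    ("skin", "TTTTGGGGCC"),
    ("skin", "ACGTACGTAC"),
    ("skin", "GGGGGGGGGG"),
]

representatives, counts = cluster_otus(toy_reads, threshold=0.9)
for sample, otu_counts in counts.items():
    print(sample, dict(otu_counts), relative_abundance(otu_counts))
```

The per-sample counts answer "how many of which OTUs are present," and dividing by each sample's total gives relative abundance. Real pipelines must also handle reads of unequal length, sequencing error, and chimeras, all of which this sketch ignores.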
There are two ways to classify OTUs. The first is to use what is already known about 16S rRNA sequences from cultured organisms, that is, data already stored in various reference databases (e.g., the Ribosomal Database Project, or RDP). By comparing 16S rRNA sequences from HMP samples to those reference sequences, in most cases researchers can identify their samples to at least the family or genus level. Refer-
10 This section summarizes Jennifer Russo Wortman’s presentation.
11 The 16S rRNA gene encodes the RNA component of the small ribosomal subunit. HMP researchers use 16S rRNA sequencing for phylogenetic analysis because the gene has both conserved regions (which are used to develop primers for amplification) and variable regions (which are used to identify specific microbial species).
12 HMP researchers use shotgun sequencing to sequence all of the DNA that is present within a microbial community. By comparing specific sequence reads to sequences with known functions, they can infer function.
13 HMP researchers are sequencing as many microbiome reference genomes as possible as part of the “healthy cohort” study that Lita Proctor described (see previous section for a summary of her presentation).