Stool samples from 15 different individuals were randomly selected from the HMP Data Analysis and Coordination Center (http://www.hmpdacc.org; parameters defining health can be obtained from the website). Raw nucleotide read sequences were aligned (blastn) against our database, requiring a minimum alignment length of 70 bp and a sequence identity of ≥80%. Only the best-scoring alignment (lowest E value) was used for further analysis. The abundance of individual butyrate-producing pathways (Fig. 4) was calculated as follows: (i) $(\#\mathrm{reads}_{\mathrm{tot}} \times \mathrm{length}_{\mathrm{pathway}}) / (4 \times 10^{6}\ \mathrm{bp}) = \mathrm{th}_{100\%}$, and (ii) $\#\mathrm{reads}_{\mathrm{pathway}} / \mathrm{th}_{100\%}$ = result (genomes exhibiting pathway [%]), where $\#\mathrm{reads}_{\mathrm{tot}}$ is the total number of reads for a sample, $\mathrm{length}_{\mathrm{pathway}}$ is the total length (bp) of all unique pathway genes (calculated from the median length of all entries in the database for a specific gene), $4 \times 10^{6}$ bp corresponds to an average genome size, $\mathrm{th}_{100\%}$ is the theoretical number of reads if all genomes exhibit the pathway, and $\#\mathrm{reads}_{\mathrm{pathway}}$ is the number of reads matching the pathway (BLAST result). Detailed results are presented in Fig. S7 in the supplemental material.
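The read filtering and the abundance calculation described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the `Hit` record, the function names, and the example numbers are hypothetical. Applying the length and identity thresholds before picking the lowest-E-value hit is an assumption, as is the factor of 100 used to express the result as a percentage.

```python
from dataclasses import dataclass

AVG_GENOME_SIZE_BP = 4e6  # average genome size stated in the text


@dataclass
class Hit:
    """One blastn alignment of a read against the gene database (hypothetical record)."""
    read_id: str
    gene: str
    identity: float   # percent sequence identity
    aln_length: int   # alignment length in bp
    evalue: float


def best_hits(hits, min_len=70, min_identity=80.0):
    """Drop hits below the thresholds, then keep the lowest-E-value hit per read.

    Assumption: thresholds are applied before the best hit is chosen.
    """
    best = {}
    for h in hits:
        if h.aln_length < min_len or h.identity < min_identity:
            continue
        current = best.get(h.read_id)
        if current is None or h.evalue < current.evalue:
            best[h.read_id] = h
    return best


def pathway_abundance(reads_total, pathway_length_bp, reads_pathway,
                      genome_size_bp=AVG_GENOME_SIZE_BP):
    """Percentage of genomes exhibiting the pathway.

    Step (i):  th100 = reads_total * pathway_length / genome_size
    Step (ii): result = reads_pathway / th100, scaled to percent (assumed).
    """
    th100 = reads_total * pathway_length_bp / genome_size_bp
    return 100.0 * reads_pathway / th100
```

With illustrative numbers, a sample of 1,000,000 reads and a pathway length of 4,000 bp give a theoretical maximum of 1,000 pathway reads, so 250 matching reads correspond to 25% of genomes exhibiting the pathway.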
Prior to diversity analysis, individual genes from the database were subjected to multiple complete linkage clustering (using the Pyrosequencing Pipeline provided by the Ribosomal Database Project; http://rdp.cme.msu.edu) on the nucleotide level, applying a 10% cutoff. All genes of an individual pathway clustered very similarly (clusters for all individual pathway genes were usually associated with the same genomes), allowing us to group the individual clusters of all genes of a specific pathway together. Each resulting group therefore contained all genes of a specific pathway. If cluster results varied between genes (e.g., all thl genes from three candidates clustered together, whereas two clusters were generated for the hbd gene), then clusters were manually merged (e.g., all three hbd genes were merged to match the associated thl genes) to achieve consistency. The most conservative approach was always applied, i.e., clusters were only merged and never split. Genes of the same strain were always merged. For metagenomic analysis, a specific group (e.g., the group Faecalibacterium prausnitzii for the acetyl-CoA pathway consists of all pathway genes from all five strains of this taxon) was considered present only if all pathway genes could be identified for that group in the BLAST result. Thus, BLAST hits did not have to match all genes from the same strain but only from the same group; an example (sample A) is shown in Fig. S5 in the supplemental material. Results presented in Fig. 5 are median values across all individual pathway genes (see Fig. S5). The degree of explanation was calculated as the percentage of reads matching groups that were included in the diversity analysis (average from individual genes) relative to the total number of reads matching any gene in the database.
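The group presence rule can be sketched in the same way. This is a hypothetical illustration, not the authors' implementation: the function name, the nested-dictionary layout of the read counts, and the example numbers are assumptions. Taking the median over per-gene read counts is one possible reading of the "median value for all individual pathway genes".

```python
from statistics import median


def present_groups(pathway_genes, hits_by_group):
    """Return {group: median per-gene read count} for groups considered present.

    pathway_genes: gene names that make up the pathway.
    hits_by_group: {group: {gene: number of reads matching that gene}}.
    A group counts as present only if every pathway gene has at least one hit;
    hits may come from different strains within the same group.
    """
    present = {}
    for group, gene_counts in hits_by_group.items():
        if all(gene_counts.get(gene, 0) > 0 for gene in pathway_genes):
            present[group] = median(gene_counts[gene] for gene in pathway_genes)
    return present


# Illustrative counts only; not data from the study.
example = present_groups(
    ["thl", "hbd"],
    {
        "Faecalibacterium prausnitzii": {"thl": 10, "hbd": 14},
        "other group": {"thl": 5, "hbd": 0},
    },
)
```

In this example only the first group is reported, because the second group has no reads matching hbd.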