Here, we describe the methodology employed in this study in two parts: first, the computational pipeline for metagenomic metabolic reconstruction implemented in HUMAnN, and second, its application to the 741 microbial community samples of the Human Microbiome Project (HMP). HUMAnN takes metagenomic DNA sequences as input and infers community-wide gene and pathway abundances through a process of seven steps (Figure 1).
HUMAnN additionally adapts ecological diversity metrics to provide functional diversity and richness profiles for each sample. We validated its gene- and pathway-level accuracy using four synthetic communities of varying complexity.
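As an illustration of how an ecological diversity metric can be applied to functional profiles, the following minimal sketch computes richness and a Shannon index over a vector of pathway abundances. It is not HUMAnN's implementation: the choice of the Shannon index, the function names, the `threshold` parameter, and the example abundances are all assumptions made for illustration.

```python
import math


def richness(abundances, threshold=0.0):
    """Count features (e.g. pathways) whose abundance exceeds a threshold.

    The threshold is a hypothetical parameter; 0.0 counts every present feature.
    """
    return sum(1 for a in abundances if a > threshold)


def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln(p_i)) over features with nonzero abundance."""
    total = sum(abundances)
    if total <= 0:
        # An empty or all-zero profile has no diversity to measure.
        return 0.0
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total  # relative abundance of this feature
            h -= p * math.log(p)
    return h


# Hypothetical pathway abundances for one sample (illustrative values only).
sample = [4.0, 4.0, 0.0, 2.0]
print("richness:", richness(sample))
print("Shannon diversity:", round(shannon_diversity(sample), 4))
```

Applied per sample, these two numbers give a compact functional profile: richness counts how many pathways are present, while the Shannon index also reflects how evenly abundance is spread among them.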
To assess microbial community function and metabolism in the human microbiome, we applied this process to the metagenomic data generated by the HMP [9], comprising >3.5 Tbp of microbial DNA from 7 body sites spanning 102 individuals (Table 1). We identified modules over- or under-represented in individual body sites using the LEfSe [23] biomarker detection system. We also associated the resulting gene and module abundances with subject clinical metadata and with external data, including CAZy [21] abundances, using standard nonparametric Spearman correlation.
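The Spearman correlation used for these associations is the Pearson correlation of the ranked values, with tied values sharing their average rank. The sketch below shows this calculation using only the standard library. The function names and the example data are hypothetical and are not taken from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: one module's abundance across five samples and a
# numeric clinical metadata value for the same samples.
module_abundance = [0.1, 0.4, 0.2, 0.9, 0.5]
metadata_value = [1.0, 3.0, 2.0, 5.0, 4.0]
print("Spearman rho:", spearman(module_abundance, metadata_value))
```

Because only ranks enter the calculation, the result is insensitive to monotone transformations of either variable, which suits abundance data that are far from normally distributed.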