We combined prokaryotic genome sequences from two resources to create a panel of reference genomes, across which we determined the phylogenetic and taxonomic distribution of ARGs. The first data set consisted of 154,723 metagenome-assembled genomes reconstructed from the same human microbiome samples for which we obtained metagenome assemblies (ref. 26). Of these reconstructed genomes, 70,178 were labeled ‘high-quality’ in the original study, based on >90% completeness and <0.5% strain heterogeneity. The second data set consisted of 152,497 bacterial and archaeal genomes from NCBI RefSeq, accessed on 19 April 2019. These genomes included representatives of the principal phyla found in the human gut microbiome but were dominated by Proteobacteria (Proteobacteria: 83,445; Firmicutes: 44,484; Actinobacteria: 16,529; Bacteroidetes: 3,563; Others: 3,634).
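The ‘high-quality’ label can be illustrated with a minimal sketch that applies only the two thresholds stated above (>90% completeness, <0.5% strain heterogeneity). The record type `GenomeQC`, its field names, and the function `is_high_quality` are hypothetical names chosen for illustration. The original study may have applied further criteria that are not listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomeQC:
    """Hypothetical per-genome quality record; field names are assumptions."""
    genome_id: str
    completeness: float          # percent, 0-100
    strain_heterogeneity: float  # percent, 0-100


def is_high_quality(genome: GenomeQC,
                    min_completeness: float = 90.0,
                    max_strain_heterogeneity: float = 0.5) -> bool:
    """Apply the two thresholds quoted in the text (both strict inequalities)."""
    return (genome.completeness > min_completeness
            and genome.strain_heterogeneity < max_strain_heterogeneity)


# Toy records, not real data.
toy_genomes = [
    GenomeQC("bin_001", completeness=97.3, strain_heterogeneity=0.1),
    GenomeQC("bin_002", completeness=85.0, strain_heterogeneity=0.0),
    GenomeQC("bin_003", completeness=95.0, strain_heterogeneity=2.4),
]
high_quality_ids = [g.genome_id for g in toy_genomes if is_high_quality(g)]
```

In this toy input only `bin_001` passes both thresholds.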
Genome sequences from the two sources were clustered into species-level genome bins (SGBs) based on a 5% average nucleotide identity (ANI) radius, following the method described in Pasolli et al. (2019). The list of reconstructed genomes used in this study and their mapping to SGBs and full-rank taxonomy are provided in Supplementary Data 3. The list of RefSeq genome accession numbers used in this study and their mapping to SGBs and full-rank taxonomy are provided in Supplementary Data 4.
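To make the idea of a 5% ANI radius concrete, the following sketch bins genomes with a simple greedy, representative-based scheme: a genome joins the first existing bin whose representative lies within 5% ANI distance, and otherwise founds a new bin. This is a simplified stand-in for illustration only and is not the procedure of Pasolli et al. (2019). The function `cluster_by_ani`, the lookup `toy_ani`, and the ANI values are all hypothetical.

```python
from typing import Callable, Dict, List


def cluster_by_ani(genomes: List[str],
                   ani: Callable[[str, str], float],
                   radius: float = 5.0) -> Dict[str, List[str]]:
    """Greedy binning: map each representative genome to its bin members.

    `ani(a, b)` must return percent ANI between two genomes (assumed symmetric).
    A genome joins a bin if 100 - ANI to the representative is <= `radius`.
    The result depends on input order, unlike a full hierarchical clustering.
    """
    bins: Dict[str, List[str]] = {}
    for genome in genomes:
        for representative in bins:
            if 100.0 - ani(representative, genome) <= radius:
                bins[representative].append(genome)
                break
        else:
            # No representative within the radius: start a new bin.
            bins[genome] = [genome]
    return bins


# Toy pairwise ANI values in percent, not real data.
_toy_table = {
    frozenset(("A", "B")): 97.2,
    frozenset(("A", "C")): 88.0,
    frozenset(("B", "C")): 87.5,
}


def toy_ani(a: str, b: str) -> float:
    return 100.0 if a == b else _toy_table[frozenset((a, b))]


toy_bins = cluster_by_ani(["A", "B", "C"], toy_ani)
```

With these toy values, `A` and `B` fall into one bin (2.8% apart) and `C` forms its own bin.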