We also tested whether our finding that accessory gene content differed among environments was sensitive to sampling biases in the plasmid database by repeating our main analyses on subsets of the data. First, we calculated the taxonomic distribution of bacterial hosts in the data set using the R package ggsankey (https://github.com/davidsjoberg/ggsankey). Next, we investigated patterns within the most abundant phyla, as explained above. We then examined trends within E. coli, the most abundant species represented. Finally, we removed the three most abundant genera from Proteobacteria (Acinetobacter, Escherichia, and Klebsiella) and the most abundant genus from Firmicutes (Staphylococcus), and tested whether the results still held within these two dominant phyla.
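The subsetting steps described above can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' pipeline: the study tabulated host taxonomy in R with ggsankey, and the downstream accessory-gene comparison is not reproduced here. The record layout (`Plasmid` with phylum, genus, species, and environment fields), the helper names, and the toy records are all hypothetical and are not study data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Plasmid:
    # Hypothetical record layout; the study's actual table schema is not given.
    plasmid_id: str
    phylum: str
    genus: str
    species: str
    environment: str


def taxon_counts(plasmids, rank):
    """Count plasmids per host taxon at a rank ('phylum', 'genus' or 'species')."""
    return Counter(getattr(p, rank) for p in plasmids)


def top_taxa(plasmids, rank, n):
    """Return the n most abundant host taxa at a rank, most abundant first."""
    return [taxon for taxon, _ in taxon_counts(plasmids, rank).most_common(n)]


def subset_by_taxon(plasmids, rank, taxon):
    """Keep only plasmids whose host belongs to `taxon` at `rank` (e.g. one species)."""
    return [p for p in plasmids if getattr(p, rank) == taxon]


def drop_genera(plasmids, phylum, genera):
    """Keep plasmids from `phylum` whose host genus is not among `genera`."""
    excluded = set(genera)
    return [p for p in plasmids if p.phylum == phylum and p.genus not in excluded]


# Toy, made-up records used only to exercise the helpers.
records = [
    Plasmid("p1", "Proteobacteria", "Escherichia", "Escherichia coli", "human"),
    Plasmid("p2", "Proteobacteria", "Escherichia", "Escherichia coli", "animal"),
    Plasmid("p3", "Proteobacteria", "Klebsiella", "Klebsiella pneumoniae", "human"),
    Plasmid("p4", "Proteobacteria", "Pseudomonas", "Pseudomonas putida", "soil"),
    Plasmid("p5", "Firmicutes", "Staphylococcus", "Staphylococcus aureus", "human"),
    Plasmid("p6", "Firmicutes", "Bacillus", "Bacillus subtilis", "soil"),
]

dominant_phylum = top_taxa(records, "phylum", 1)[0]
dominant_species = top_taxa(records, "species", 1)[0]
ecoli_subset = subset_by_taxon(records, "species", dominant_species)
proteo_reduced = drop_genera(
    records, "Proteobacteria", ["Acinetobacter", "Escherichia", "Klebsiella"]
)
firmicutes_reduced = drop_genera(records, "Firmicutes", ["Staphylococcus"])
```

Each reduced subset would then be passed through the same environment-versus-accessory-gene analysis as the full data set. Counting plasmids per taxon is only one plausible way to rank abundance; the original analysis may have weighted hosts differently.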