The first dataset comprises 41 isolates of the Gram-positive bacterium Enterococcus faecium, for which the phenotypic resistance to vancomycin is known for each sample [23] (Table S1). This dataset, which was used to evaluate SRST2 in its initial publication [2], allowed validation of the accuracy of ARIBA when identifying the presence or absence of genes of interest in each sample, testing the sensitivity of the methods at varying depths of read coverage, and verifying MLST calling by ARIBA and SRST2.
The ARG-ANNOT sequences included with SRST2 were used as reference sequences for the benchmarking on this dataset. However, the VanS-B gene, called ‘47__VanS-B_Gly__VanS-B__1672 no;yes;VanS-B;Gly;AY655721;731–2073;1343’ by SRST2, originally from ARG-ANNOT, was missing its final nucleotide, A. This was confirmed by comparison with the GenBank record AY655721. Left uncorrected, the truncation would cause ARIBA to exclude this sequence, because its translation into amino acids does not end with a stop codon. Therefore, an ‘A’ was manually added to the end of the sequence before running ARIBA.
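The kind of check that flags such a truncated reference can be sketched as follows. This is a minimal illustration, not ARIBA's own code: the function names and the toy sequence are hypothetical, and the stop codons TAA, TAG and TGA are assumed (the standard bacterial set). Note that the 1343 nt length given in the SRST2 sequence name is not a multiple of three, whereas 1344 nt is.

```python
# Illustrative sketch only; not ARIBA's implementation.
# Assumption: stop codons are TAA, TAG and TGA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def ends_with_stop_codon(seq: str) -> bool:
    """Return True if seq is a whole number of codons and its last codon is a stop."""
    seq = seq.upper()
    return len(seq) % 3 == 0 and seq[-3:] in STOP_CODONS


def append_missing_base(seq: str, base: str = "A") -> str:
    """Append one base to a truncated coding sequence (hypothetical helper).

    Raises ValueError if the repaired sequence still lacks a terminal stop codon.
    """
    if ends_with_stop_codon(seq):
        return seq  # nothing to repair
    repaired = seq + base
    if not ends_with_stop_codon(repaired):
        raise ValueError("sequence does not end with a stop codon after repair")
    return repaired


# Toy sequence (not the real VanS-B gene): ATG GCT TA, missing its final A.
toy = "ATGGCTTA"
repaired_toy = append_missing_base(toy)
```

Any such repair should still be confirmed against the source record, as was done here with AY655721.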
To sample the E. faecium reads at a range of depths, the reads were mapped to the reference genome CP006620 using Bowtie2 version 2.2.29 with the option -fast-local. The depth for each sample was estimated across the vanB gene CP006620.1476 by running SAMtools depth with the options -a -r CP006620:774918-775946 and calculating the resulting mean depth. The reads were then randomly sampled according to this estimate using fastaq to_random_subset (implemented in the supplementary script make_read_subsets.pl), with a different random seed for each run so that the read subsets were independent.
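The arithmetic behind this step can be sketched as follows. This is a hedged illustration, not the contents of make_read_subsets.pl: it assumes the usual three-column, tab-separated output of SAMtools depth (reference name, position, depth), and the function names, example depths and target depth are hypothetical.

```python
# Illustrative sketch only; not the supplementary script.
# Assumption: each input line is "reference<TAB>position<TAB>depth".


def mean_depth(depth_lines):
    """Mean of the depth column over all non-empty lines."""
    depths = [int(line.rstrip("\n").split("\t")[2]) for line in depth_lines if line.strip()]
    if not depths:
        raise ValueError("no depth records found")
    return sum(depths) / len(depths)


def sampling_fraction(estimated_depth, target_depth):
    """Fraction of reads to keep so the expected depth is about target_depth.

    Capped at 1.0 because reads cannot be sampled above the full set.
    """
    if estimated_depth <= 0:
        raise ValueError("estimated depth must be positive")
    return min(1.0, target_depth / estimated_depth)


# Made-up depth records for two positions in the vanB region.
example_lines = ["CP006620\t774918\t40", "CP006620\t774919\t60"]
example_mean = mean_depth(example_lines)                   # (40 + 60) / 2
example_fraction = sampling_fraction(example_mean, 10.0)   # hypothetical target of 10x
```

A fraction computed this way would then be passed to the subsampling tool together with a run-specific random seed.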