The second dataset, published by Holt et al. [24], consists of 130 globally distributed genomes of Shigella sonnei (Table S2), a Gram-negative bacterium that is a causative agent of dysentery. It enabled a comparison of ARIBA, SRST2, and KmerResistance with the manual method employed in the study of Holt et al. [24], confirming the accuracy of ARIBA for identifying known resistance SNPs as well as the presence or absence of genes of interest.
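A comparison of this kind amounts to checking, for each isolate and determinant, whether a tool's call agrees with the call derived from the original study. The Python sketch below shows one way to tally that agreement. The function name, the input dictionaries keyed by hypothetical (isolate, determinant) pairs, and the toy data are illustrative assumptions, not the study's data or code.

```python
from collections import Counter


def compare_calls(tool_calls, reference_calls):
    """Tally agreement between one tool's presence/absence calls and reference calls.

    Both arguments are hypothetical dicts mapping (isolate, determinant) -> bool.
    Only keys present in the reference are compared; a determinant the tool did
    not report is treated as absent.
    """
    tally = Counter()
    for key, ref_present in reference_calls.items():
        tool_present = tool_calls.get(key, False)
        if tool_present and ref_present:
            tally["both_present"] += 1
        elif tool_present:
            tally["tool_only"] += 1
        elif ref_present:
            tally["reference_only"] += 1
        else:
            tally["both_absent"] += 1
    total = sum(tally.values())
    agreement = (tally["both_present"] + tally["both_absent"]) / total if total else 0.0
    return dict(tally), agreement


# Toy example with made-up isolates, not taken from the study.
reference = {("iso1", "sul2"): True, ("iso1", "blaTEM"): False,
             ("iso2", "sul2"): True, ("iso2", "blaTEM"): True}
tool = {("iso1", "sul2"): True, ("iso2", "sul2"): True, ("iso2", "blaTEM"): False}
counts, agreement = compare_calls(tool, reference)
print(counts, agreement)
```

Keeping "tool only" and "reference only" separate, rather than reporting a single agreement rate, shows whether disagreements are false positives or false negatives relative to the reference method.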
The phenotypic resistance profile for a number of antimicrobials is known for each isolate, and is attributable to both acquired resistance genes and SNPs. The three tools were run on all 130 samples using the reference database from CARD, version 1.1.2. To ensure our results were comparable with those originally reported in Table S1 of Holt et al. [24], we manually added to the database any AMR genes listed on page 4 of their supplementary text that it did not already contain (Table S3). The AMR determinants originally reported in the study of Holt et al. [24] were identified from mapping data, and reported as the proportion of bases in each gene sequence that were covered by reads from each isolate. From these originally reported data, we used a cut-off of >90 % coverage to indicate that a gene was present by their method.
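The coverage cut-off can be expressed as a one-line rule. The sketch below assumes, hypothetically, that the originally reported values for one isolate are available as a dictionary of gene name to percentage of bases covered (0 to 100); the function and variable names are illustrative. Because the threshold is strictly greater than 90 %, a gene covered at exactly 90 % is not called present.

```python
CUTOFF_PERCENT = 90.0  # a gene must have strictly more than 90 % of its bases covered


def mapping_calls(coverage_by_gene, cutoff=CUTOFF_PERCENT):
    """Convert per-gene coverage percentages for one isolate into presence calls.

    coverage_by_gene: hypothetical dict gene -> percentage of gene bases covered
    by reads (0-100), as in the originally reported mapping data.
    """
    return {gene: pct > cutoff for gene, pct in coverage_by_gene.items()}


# Made-up values for illustration only.
calls = mapping_calls({"sul2": 100.0, "strA": 90.0, "blaTEM-1": 42.5})
print(calls)  # {'sul2': True, 'strA': False, 'blaTEM-1': False}
```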
To interpret the output of each tool as an AMR call, we used the following rules (all relevant genes are listed in Table S4). A gene was counted as present by ARIBA if ariba summary reported yes or yes_nonunique; by KmerResistance if it appeared in its output file; and by SRST2 if it was reported without a ‘?’.
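The three presence rules can be sketched as small filters. This is a minimal illustration, not the parsing code used in the study: it assumes the tool outputs have already been read into hypothetical Python structures (a dict of gene to `ariba summary` value, and lists of gene or allele names reported by KmerResistance and SRST2), and it does not model the real file formats.

```python
ARIBA_PRESENT = {"yes", "yes_nonunique"}  # ariba summary values counted as present


def ariba_present(summary_values):
    """summary_values: hypothetical dict gene -> value reported by `ariba summary`."""
    return {gene for gene, value in summary_values.items() if value in ARIBA_PRESENT}


def kmerresistance_present(reported_genes):
    """Any gene listed in the KmerResistance output counts as present."""
    return set(reported_genes)


def srst2_present(reported_alleles):
    """An SRST2 call counts only if it was reported without a '?' flag."""
    return {allele for allele in reported_alleles if "?" not in allele}


# Made-up outputs for a single isolate.
ariba = ariba_present({"sul2": "yes", "strA": "yes_nonunique", "blaTEM": "no"})
kmer = kmerresistance_present(["sul2", "strA"])
srst2 = srst2_present(["sul2", "strB?"])
print(sorted(ariba), sorted(kmer), sorted(srst2))
```

Reducing every tool to a set of present genes makes the downstream genotype rules independent of which tool produced the calls.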
The genes of interest for each AMR call were those originally identified and reported by Holt et al. [24]. Because the discovery and classification of AMR gene variants is an ongoing process, an AMR gene was called as present if it was either the gene originally identified by Holt et al. [24] or a gene in the same CD-HIT cluster. Genes conferring resistance to antimicrobials not examined in the original paper were excluded, as were genes that confer resistance to the examined antimicrobials but fall in different CD-HIT clusters from the originally identified genes.
For each antimicrobial examined, a resistant genotype was called using the following rules. Ampicillin (Amp): the presence of any gene from a set of blaTEM, blaCTX-M and blaOXA genes. Chloramphenicol (Cmp): the presence of any gene from a set of cat genes. Nalidixic acid (Nal): the gyrA gene present, together with one of the SNPs S83L, D87G or D87Y. Streptomycin (Str): both of the strA and strB genes, or one of the aadA genes. Sulfonamides (Sul): any gene from the set of sul1 and sul2 genes. Tetracycline (Tet): both of tetA + tetR, or all of tetA, C, D, R, where the two sets of tetA and tetR genes are disjoint. Trimethoprim (Tmp): any one of a set of dfrA or dhfr genes.
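The cluster-matching step and the genotype rules above can be combined into a short sketch. It assumes, hypothetically, that a dictionary `label_of` has already been built from the CD-HIT output, mapping each gene to the originally identified gene of its cluster; genes missing from it are dropped, mirroring the exclusion rule. Each antimicrobial is encoded as alternative combinations (any one suffices), each requiring all of its groups, with a group satisfied by any of its labels. All labels are placeholders: the two tetracycline sets use made-up names such as "tetA:1", and "C" and "D" follow the text literally, since the actual genes are listed in Table S4.

```python
NAL_SNPS = {"S83L", "D87G", "D87Y"}

# drug -> list of alternative combinations; combination -> list of required groups;
# group -> set of acceptable labels. Labels are hypothetical placeholders.
GENE_RULES = {
    "Amp": [[{"blaTEM", "blaCTX-M", "blaOXA"}]],
    "Cmp": [[{"cat"}]],
    "Str": [[{"strA"}, {"strB"}], [{"aadA"}]],
    "Sul": [[{"sul1", "sul2"}]],
    "Tet": [[{"tetA:1"}, {"tetR:1"}],                         # first tetA + tetR set
            [{"tetA:2"}, {"C"}, {"D"}, {"tetR:2"}]],          # second, disjoint set
    "Tmp": [[{"dfrA", "dhfr"}]],
}


def to_labels(detected_genes, label_of):
    """Map detected genes to the originally identified gene of their CD-HIT cluster.

    label_of: hypothetical dict gene -> label; genes absent from it are dropped.
    """
    return {label_of[g] for g in detected_genes if g in label_of}


def resistant_genotypes(labels, gyrA_snps):
    """Apply the per-antimicrobial rules to a set of labels for one isolate."""
    calls = {}
    for drug, alternatives in GENE_RULES.items():
        # A combination is met when every group shares at least one label with `labels`.
        calls[drug] = any(all(group & labels for group in combo) for combo in alternatives)
    # Nalidixic acid: gyrA present together with at least one listed SNP.
    calls["Nal"] = "gyrA" in labels and bool(NAL_SNPS & set(gyrA_snps))
    return calls


# Made-up isolate: mphA has no label (antimicrobial not examined), so it is dropped.
label_of = {"blaTEM-1B": "blaTEM", "strA": "strA", "strB": "strB",
            "sul2": "sul2", "gyrA": "gyrA"}
labels = to_labels({"blaTEM-1B", "strA", "strB", "sul2", "gyrA", "mphA"}, label_of)
calls = resistant_genotypes(labels, gyrA_snps=["S83L"])
print(calls)
```

Encoding the rules as data rather than as separate if-statements keeps the "any of" and "all of" logic uniform across antimicrobials; the nalidixic acid rule is handled separately because it depends on SNPs rather than gene presence alone.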
Hunt M., Mather A.E., Sánchez-Busó L., Page A.J., Parkhill J., Keane J.A., & Harris S.R. (2017). ARIBA: rapid antimicrobial resistance genotyping directly from sequencing reads. Microbial Genomics, 3(10), e000131.