We downloaded the Pathogenwatch pairwise distance matrix and the corresponding neighbour-joining tree for the full set of assemblies. The distance matrix is available at https://figshare.com/s/6026855223031e769d8a (DOI: 10.6084/m9.figshare.19745608), and the tree is available for interactive viewing at Microreact (https://microreact.org/project/sUrpBsvXi1aiKD7ssPv9pu-nanopore-only-assemblies-for-genomic-surveillance-of-klebsiella-pneumoniae). Pathogenwatch calculates pairwise SNP distances between genomes from a concatenated alignment of the 1972 genes (2 172 367 bp) that make up its core gene library for K. pneumoniae, and infers a neighbour-joining tree from the resulting pairwise distance matrix [46].
Here, we assessed the feasibility of identifying potential nosocomial transmission clusters from these distance matrices. Several studies have proposed thresholds of 21–25 genome-wide SNPs for identifying nosocomial transmission clusters of K. pneumoniae [66–68]. However, because Pathogenwatch calls SNPs only in its 1972 core genes rather than genome-wide, an equivalent cut-off was needed for clustering analysis with Pathogenwatch distances. To determine it, we compared the SNP distances calculated by Pathogenwatch with genome-wide SNP counts obtained by mapping short reads to a reference genome. We used the genome-wide SNP alignment generated previously for n=270 K. pneumoniae isolates collected at Alfred Health, based on mapping of Illumina reads to the K. pneumoniae NTUH-K2044 reference genome with the RedDog pipeline [69] (see full details in [43]). Pairwise SNP counts were extracted with snp-dists [70]. Assemblies for these 270 genomes (assembled de novo from Illumina reads using SPAdes optimised with Unicycler v0.4.74; see full details in [43]) were uploaded to Pathogenwatch, and the resulting pairwise distance matrix was downloaded and compared against the one generated from RedDog.
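To make the two steps above concrete, the Python below is a minimal sketch (not the RedDog, snp-dists or Pathogenwatch code) that counts pairwise SNP differences in a whole-genome alignment and then groups isolates at a distance cut-off. Two assumptions are made explicit: only sites where both sequences carry A, C, G or T are counted, and clusters are formed by single linkage (two isolates share a cluster if a chain of pairs at or below the threshold connects them). The section does not state which clustering rule was used, and the names `snp_distance`, `distance_matrix` and `threshold_clusters` are hypothetical.

```python
from itertools import combinations

# Assumption: only unambiguous nucleotides are compared; gaps and N are skipped.
BASES = frozenset("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned sites where both bases are A/C/G/T and they differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in BASES and b in BASES and a != b
    )


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Symmetric pairwise SNP distances keyed by (name_a, name_b)."""
    names = sorted(alignment)
    dist = {(n, n): 0 for n in names}
    for a, b in combinations(names, 2):
        d = snp_distance(alignment[a], alignment[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


def threshold_clusters(names, dist, threshold):
    """Hypothetical single-linkage clustering at a SNP cut-off (<= threshold)."""
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[(a, b)] <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy alignment (hypothetical, not study data).
alignment = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGA",
    "iso3": "TCGAACGA",
    "iso4": "ACNTACGT",
}
names = sorted(alignment)
dist = distance_matrix(alignment)
print(threshold_clusters(names, dist, threshold=1))  # [['iso1', 'iso2', 'iso4'], ['iso3']]
```

Single linkage is the simplest rule to reason about, but it can chain distant isolates together through intermediates; a stricter rule would be a separate design decision.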
We then used R to fit a linear regression model of Pathogenwatch distance as a function of genome-wide mapping-based SNP distance (see Fig. S1). This indicated that a Pathogenwatch distance threshold of 10 SNPs would be approximately equivalent to the established genome-wide threshold of 25 SNPs. These thresholds assume accurate basecalling from Illumina data. To ascertain the corresponding threshold for ONT-only data, we compared pairwise Pathogenwatch distances calculated from ONT-only (SUP+Medaka) assemblies with those calculated from Illumina assemblies, for pairs of strains belonging to probable transmission clusters. A linear regression model fitted in R indicated that an ONT-only Pathogenwatch distance of 50 SNPs would approximate the Illumina-based Pathogenwatch distance of 10 SNPs, and hence the genome-wide distance of 25 SNPs (see Fig. 2).
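The threshold conversion above amounts to fitting a straight line to paired distances (the same isolate pair measured two ways) and reading off the fitted value at a known cut-off. The Python below sketches this under the assumption of a plain ordinary least-squares line, the kind of fit R's `lm()` produces; the helper names and the toy numbers are hypothetical and are not the study's data or results.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def equivalent_threshold(x, y, threshold_x):
    """Predicted y-scale distance corresponding to a cut-off on the x scale."""
    intercept, slope = fit_line(x, y)
    return intercept + slope * threshold_x


# Toy paired distances (hypothetical). Index i is the same isolate pair in each list.
genome_wide = [0, 10, 20, 30]  # mapping-based genome-wide SNPs
pw_illumina = [1, 5, 9, 13]    # Pathogenwatch SNPs, Illumina assemblies
pw_ont = [3, 11, 19, 27]       # Pathogenwatch SNPs, ONT-only assemblies

# Step 1: genome-wide cut-off -> Illumina Pathogenwatch cut-off.
pw_cutoff = equivalent_threshold(genome_wide, pw_illumina, 25)
# Step 2: Illumina Pathogenwatch cut-off -> ONT-only Pathogenwatch cut-off.
ont_cutoff = equivalent_threshold(pw_illumina, pw_ont, pw_cutoff)
print(round(pw_cutoff, 2), round(ont_cutoff, 2))  # 11.0 23.0
```

In practice the fitted value would be rounded to a convenient cut-off (as with the 10 and 50 SNP thresholds in the text) before being applied in the second step.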
We compared the topologies of neighbour-joining trees generated from Pathogenwatch distance matrices calculated using SUP+Medaka, HAC+Medaka or Fast+Medaka assemblies against the reference tree (calculated from hybrid SUP+Medaka+Pilon assemblies), using the tanglegram function in the R package dendextend v1.15.2 to generate comparative tree plots and calculate entanglement coefficients. We also used the R package phytools v1.0-3 to compute the Robinson–Foulds distance [71, 72] between tree topologies, defined as the number of partitions present in the first tree but not in the second, plus the number present in the second tree but not in the first.
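The Robinson–Foulds definition above is a symmetric difference between two sets of bipartitions. The Python below is a minimal sketch of that definition, not the phytools implementation. It assumes trees are given as nested tuples of leaf labels (a hypothetical representation; real trees would be read from Newick files). Because neighbour-joining trees are unrooted, each split is stored as the side that excludes a fixed anchor leaf, so an arbitrary rooting does not change the result, and trivial splits (a single leaf on one side) are ignored.

```python
def leaf_set(tree):
    """All leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Non-trivial splits, each stored as the side without the anchor leaf."""
    taxa = leaf_set(tree)
    anchor = min(taxa)
    splits = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = taxa - clade if anchor in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(side) <= len(taxa) - 2:
            splits.add(side)
        return clade

    walk(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Splits in A but not in B, plus splits in B but not in A."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have identical leaf sets")
    a, b = bipartitions(tree_a), bipartitions(tree_b)
    return len(a - b) + len(b - a)


# Toy trees (hypothetical): the query swaps B and C relative to the reference.
reference = ((("A", "B"), "C"), ("D", "E"))
query = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(reference, query))  # 2
```

For fully resolved unrooted trees with n leaves, each tree has n - 3 non-trivial splits, so this raw distance ranges from 0 to 2(n - 3); dividing by that maximum gives a normalised value that is easier to compare across tree sizes.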