The chromosome sequence of strain TW14359 was reannotated with DFAST [46], and its PP regions were predicted with PHASTER [47, 48], followed by manual curation to precisely delimit each PP region, including its attL and attR sequences. IEs not detected by PHASTER were identified by searching for genes annotated as ‘integrase genes’, followed by manual inspection. For the other closed genomes, the PP/IE integration sites (attB sites) identified in strain TW14359 were examined for the presence of PPs/IEs and, where present, for their sequences. PPs/IEs not found in strain TW14359 were identified by the integrase gene search described above. The PPs/IEs found in all closed genomes (Table S3) were annotated with DFAST, followed by manual curation. Insertion sequences (ISs) in the PP/IE sequences were detected and typed with ISfinder [49]. The genetic organization of PPs/IEs was visualized with GenomeMatcher v3.0.2. Sequence similarity of PPs/IEs located at the same loci was assessed by dot-plot analysis using GenomeMatcher v3.0.2 and by calculating pairwise Mash distances [50] with default parameters (k-mer size of 21 and sketch size of 1000). The results of the pairwise Mash distance analysis are presented as violin plots generated with RAWGraphs 2.0 beta (https://dl.acm.org/doi/10.1145/3125571.3125585).
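To illustrate what a pairwise Mash distance with a k-mer size of 21 and a sketch size of 1000 measures, the following is a minimal Python sketch of the underlying MinHash estimate. It is not the Mash program itself: the function names (`canonical_kmers`, `sketch`, `mash_distance`) are hypothetical, and BLAKE2b from the standard library is assumed here as a stand-in for the hash function used by Mash, so the values will not match Mash output exactly.

```python
import hashlib
import math


def canonical_kmers(seq, k=21):
    """Yield the canonical form (lexicographic minimum of a k-mer and its
    reverse complement) of every k-mer made up only of A, C, G and T."""
    complement = str.maketrans("ACGT", "TGCA")
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing ambiguous bases
        yield min(kmer, kmer.translate(complement)[::-1])


def sketch(seq, k=21, size=1000):
    """Return the `size` smallest distinct 64-bit hashes of the canonical k-mers.
    Assumption: BLAKE2b replaces the hash function used by the real tool."""
    hashes = {
        int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")
        for km in canonical_kmers(seq, k)
    }
    return sorted(hashes)[:size]


def mash_distance(sketch_a, sketch_b, k=21, size=1000):
    """Estimate the Jaccard index j from two bottom-`size` sketches and
    convert it to a distance: D = -(1/k) * ln(2j / (1 + j))."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    merged = sorted(set_a | set_b)[:size]  # bottom sketch of the union
    if not merged:
        raise ValueError("both sketches are empty")
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    j = shared / len(merged)
    if j == 0:
        return 1.0  # no shared hashes: maximum distance
    return max(0.0, -math.log(2 * j / (1 + j)) / k)
```

For two PP/IE sequences held as strings, `mash_distance(sketch(seq_a), sketch(seq_b))` gives 0.0 for identical sequences and 1.0 when no k-mer hashes are shared. Computing this for every pair of elements at one locus yields the kind of distance distribution that a violin plot summarizes.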