The dataset used for benchmarking consisted of 60 T7-like phage genomes from the Autographviridae family, downloaded from the GenBank RefSeq database [20]. These genomes were chosen because they are related and colinear, with an average genome size of 39.4 kbp (range: 31.5–41.7) and an average G + C content of 50.7 mol% (range: 42.6–61.8; Table S1). The testing dataset also contained the Pelagibacter phage HTVC011P genome, used as an outlier for the T7-like phages. For this dataset of 61 phages, intergenomic similarities were calculated with the following tools: Sequence Demarcation Tool (SDT) [10], pairwise sequence comparison (PASC) [8], OrthoANI [4], Gegenees [6], and VIRIDIC.
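The genome size and G + C statistics quoted for the dataset can be reproduced from the downloaded sequences with a short script. The sketch below is illustrative, not the authors' procedure: the function names are hypothetical, and it assumes that G + C mol% is computed over unambiguous bases only (A, C, G, T), excluding ambiguity codes such as N from the denominator.

```python
from statistics import mean


def gc_mol_percent(seq: str) -> float:
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T).

    Assumption: ambiguity codes (N, R, Y, ...) are ignored entirely.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


def summarize(genomes: dict[str, str]) -> dict[str, tuple[float, float, float]]:
    """Return (mean, min, max) of genome size in kbp and of G+C mol%."""
    sizes_kbp = [len(s) / 1000 for s in genomes.values()]
    gcs = [gc_mol_percent(s) for s in genomes.values()]
    return {
        "size_kbp": (mean(sizes_kbp), min(sizes_kbp), max(sizes_kbp)),
        "gc_percent": (mean(gcs), min(gcs), max(gcs)),
    }
```

In practice, `genomes` would map each RefSeq accession to its sequence string, for example as parsed from a multi-FASTA file.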
Additionally, two Salmonella phages (GE_vB_N5 and FE_vB_N8) were used to illustrate differences in genome and alignment lengths. The genome of the T7 phage strain K155 was used to test the effect of genome permutations and reverse complementarity on the intergenomic distances. Lastly, two artificial DNA sequences were generated: (i) by scrambling the T7 genome with Shuffle DNA, part of the Sequence Manipulation Suite [21], and (ii) by using Vladimír Čermák’s Random DNA Sequence Generator at http://www.molbiotools.com/randomsequencegenerator.html to generate a 39,937 bp sequence with 48.4% GC.
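The control sequences described above can be approximated locally. The sketch below is a hedged stand-in, not a reimplementation of the tools that were actually used. It assumes that a "genome permutation" means a circular rotation of the sequence start, that Shuffle DNA performs a composition-preserving random shuffle, and that the random generator should hit the target G + C fraction exactly. All function names are hypothetical.

```python
import random

# Complement table for unambiguous bases; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate(seq: str, start: int) -> str:
    """Circular permutation: make 0-based position `start` the new first base."""
    start %= len(seq)
    return seq[start:] + seq[:start]


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Randomly reorder bases; length and base composition are preserved.

    Assumption: this approximates the behavior of SMS Shuffle DNA.
    """
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def random_sequence(length: int, gc_percent: float, rng: random.Random) -> str:
    """Random DNA with exactly round(length * gc_percent / 100) G/C bases.

    Assumption: the online generator's exact sampling scheme is unknown;
    here G/C and A/T positions are drawn separately and then shuffled.
    """
    n_gc = round(length * gc_percent / 100)
    bases = [rng.choice("GC") for _ in range(n_gc)]
    bases += [rng.choice("AT") for _ in range(length - n_gc)]
    rng.shuffle(bases)
    return "".join(bases)
```

For example, `random_sequence(39937, 48.4, random.Random(1))` yields a sequence with the length and G + C content stated above. Seeding a dedicated `random.Random` instance keeps the controls reproducible between runs.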