Each of the five Trycycler testers was given the ONT rapid read set for the six genomes used in the real-read tests (all real genomes excluding Serratia marcescens 17-147-1671) and produced one Trycycler assembly (without Medaka or Pilon polishing) for each genome. The number of input assemblies and the assemblers used by each tester are available in Additional file 4: Tester assemblers. We then compared the single-tool assemblies (Flye, Raven, and Miniasm/Minipolish), the Trycycler assemblies (from the developer and the five testers), and a hybrid-assembled reference (the developer’s Trycycler+Medaka+Pilon assembly).
For each genome, we clustered the contigs from all assemblies (using Trycycler cluster) and, using the developer’s Trycycler assembly as the reference, classified the replicons of each assembly as present, present with misassemblies, or absent (Additional file 4: Matrix). Each chromosome was rotated to a consistent starting position, and a multiple sequence alignment was performed (using Trycycler MSA). We then extracted pairwise distances from the alignment (using the msa_to_distance_matrix.py script, available in Supplementary data) and built a FastME [28] tree from the distances. The distances were then normalized to genome size (using the normalise_distance_matrix_to_mbp.py script, available in Supplementary data) to quantify the differences between the assembled chromosomes for each genome.
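The two distance steps can be illustrated with a minimal Python sketch. It is not the msa_to_distance_matrix.py or normalise_distance_matrix_to_mbp.py scripts from Supplementary data, whose exact behavior is not described here. The function names, the choice to count every differing alignment column (including gap columns) as one difference, and the use of a single genome size supplied by the caller are all assumptions made for illustration.

```python
from itertools import combinations


def pairwise_differences(msa):
    """Count differing alignment columns for every pair of sequences.

    `msa` maps an assembly name to its aligned sequence. All sequences
    must have the same length. Assumption: a gap opposite a base counts
    as one difference, the same as a substitution.
    """
    lengths = {len(seq) for seq in msa.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    distances = {}
    for a, b in combinations(sorted(msa), 2):
        distances[(a, b)] = sum(1 for x, y in zip(msa[a], msa[b]) if x != y)
    return distances


def normalise_to_mbp(distances, genome_size_bp):
    """Convert raw difference counts to differences per Mbp.

    Assumption: one genome size (in bp) is used for all pairs.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return {pair: count * 1_000_000 / genome_size_bp
            for pair, count in distances.items()}


# Toy alignment of three hypothetical assemblies (10 columns each).
toy_msa = {
    "flye": "ACGT-ACGTA",
    "raven": "ACGTTACGTA",
    "trycycler": "ACGTTACCTA",
}
raw = pairwise_differences(toy_msa)
per_mbp = normalise_to_mbp(raw, genome_size_bp=10)
```

Expressing distances per Mbp puts genomes of different sizes on a common scale, which is the stated purpose of the normalization step.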