The genomes were assembled following the approach described in Chakraborty et al. [22]. For all calculations of sequence coverage, a genome size of 130 Mbp is assumed (G = 130 × 10⁶ bp). For each strain, we generated a hybrid assembly with DBG2OLC [65], using the longest PacBio reads amounting to 30× coverage, and a PacBio-only assembly with canu v1.3 [66] (Supplementary Data 3). The paired-end Illumina reads were obtained from King et al. [24]. The hybrid assemblies were merged with the PacBio-only assemblies using quickmerge v0.2 [22,67] (l = 2 Mb, ml = 20000, hco = 5.0, c = 1.5), with the hybrid assembly used as the query. Because the PacBio-only assembly sizes were closer to the genome size of D. melanogaster, we performed a second round of quickmerge [67] to add the contigs that were present in the PacBio-only assembly but absent from the hybrid assembly. For this second round (l = 5 Mb, ml = 20000, hco = 5.0, c = 1.5), the PacBio-only assembly was used as the query and the merged assembly from the first round as the reference. The resulting merged assembly was processed with finisherSC [68] to remove redundant sequences and to fill additional gaps using the raw reads. The assemblies were then polished twice with quiver (SMRTanalysis v2.3.0p5) and once with Pilon v1.16 [69], using the same Illumina reads as for the hybrid assemblies.
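The coverage arithmetic behind "the longest PacBio reads amounting to 30× coverage" can be illustrated with a minimal Python sketch. It assumes that coverage is computed as total read bases divided by G, and that reads are taken in order of decreasing length until the target is reached. The function names and the toy numbers are hypothetical illustrations, not the tooling used for the assemblies described here.

```python
from typing import Iterable, List

# G, the genome size assumed in the text for all coverage calculations.
GENOME_SIZE_BP = 130 * 10**6


def coverage(read_lengths: Iterable[int], genome_size: int = GENOME_SIZE_BP) -> float:
    """Return sequence coverage: total read bases divided by genome size."""
    return sum(read_lengths) / genome_size


def longest_reads_to_coverage(
    read_lengths: Iterable[int],
    target_coverage: float = 30.0,
    genome_size: int = GENOME_SIZE_BP,
) -> List[int]:
    """Select the longest reads until their combined length reaches the target.

    Hypothetical helper: reads are sorted by decreasing length and added
    one at a time until total bases >= target_coverage * genome_size.
    """
    target_bases = target_coverage * genome_size
    selected: List[int] = []
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if total >= target_bases:
            break
        selected.append(length)
        total += length
    return selected


# Toy example with a 1,000 bp "genome" and a 1x target (illustrative values only).
picked = longest_reads_to_coverage(
    [100, 500, 300, 400, 200], target_coverage=1.0, genome_size=1000
)
print(picked)                              # [500, 400, 300]
print(coverage(picked, genome_size=1000))  # 1.2
```

With G = 130 × 10⁶ bp, a 30× subset corresponds to roughly 3.9 × 10⁹ bases of reads. The selection may slightly overshoot the target, because the last read added is kept whole.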