To estimate the accuracy of LoMA, we compared CSs assembled from the ONT data of NA18943 with GRCh38. We randomly selected 108 positions from the human genome, excluding centromeres and gaps (Additional file 1: Table S1). We collected all reads of NA18943 that mapped within 20 kbp of each position and constructed CSs using LoMA. We aligned the generated CSs to GRCh38 using minimap2 [16] and calculated the error rates from the edit distance. We also aligned all raw reads to GRCh38 and calculated their error rates from the edit distance in the same way. For comparison, we assembled the matched regions using lamassemble [15] with the following options:
-P 8 -a -v -p 2e-3 -m 2*(number of reads) -z 1000 promethion.mat
The error rate of lamassemble was calculated as above.
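The text does not say how the edit distance was normalized into an error rate. The sketch below shows one common convention, offered only as an assumption: divide the edit distance of an alignment (for example, the value of the `NM` tag that minimap2 writes in SAM output) by the number of alignment columns derived from the CIGAR string. The function names `alignment_columns` and `error_rate` are hypothetical and are not part of LoMA, minimap2, or lamassemble.

```python
import re

# One CIGAR operation: a length followed by an operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_columns(cigar: str) -> int:
    """Count alignment columns in a CIGAR string.

    Matches/mismatches (M, =, X), insertions (I), and deletions (D) each
    occupy alignment columns. Clipping (S, H), skipped reference (N), and
    padding (P) are ignored.
    """
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "MIDX=")


def error_rate(cigar: str, edit_distance: int) -> float:
    """Error rate as edit distance per alignment column (assumed definition)."""
    columns = alignment_columns(cigar)
    if columns == 0:
        raise ValueError("CIGAR string contains no aligned columns")
    return edit_distance / columns


if __name__ == "__main__":
    # Hypothetical alignment: 101 alignment columns with an edit distance of 5.
    print(f"{error_rate('50M2I48M1D', 5):.4f}")
```

Other normalizations (for example, dividing by the query length or the reference span) give slightly different values, so the denominator should be stated whenever error rates from different tools are compared.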
We also evaluated LoMA using simulated data. We randomly selected 100 regions from GRCh38 (Additional file 1: Table S2). Simulated reads were generated using NanoSim with the error profile of NA12878 (total error rate, 10.8%) provided by the developers [23]. Various data sets were generated for each region: coverages of 10×, 20×, 30×, 40×, and 50× (with a fixed size of 20 kbp), and target sizes of 20 kbp, 40 kbp, 60 kbp, 80 kbp, and 100 kbp (with a fixed mean coverage of 30×). For each data set, we measured the error rate, CPU time, and peak memory (resident set size, RSS). Performance was measured on a computer with an Apple M1 chip. The error rate was calculated from the edit distance as described above.
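The simulation design above can be enumerated as a small grid. The following is a minimal sketch, assuming that the condition shared by both series (30× at 20 kbp) is generated only once per region and that the total number of simulated bases is approximately coverage × region size. The names `simulation_conditions` and `target_bases` are hypothetical, and the sketch does not invoke NanoSim.

```python
from itertools import chain

COVERAGES = (10, 20, 30, 40, 50)     # coverage series, region size fixed at 20 kbp
SIZES_KBP = (20, 40, 60, 80, 100)    # size series, mean coverage fixed at 30x
FIXED_SIZE_KBP = 20
FIXED_COVERAGE = 30


def simulation_conditions():
    """Return the distinct (coverage, size_kbp) pairs in a stable order."""
    coverage_series = ((c, FIXED_SIZE_KBP) for c in COVERAGES)
    size_series = ((FIXED_COVERAGE, s) for s in SIZES_KBP)
    conditions = []
    for condition in chain(coverage_series, size_series):
        if condition not in conditions:  # (30, 20) occurs in both series
            conditions.append(condition)
    return conditions


def target_bases(coverage: int, size_kbp: int) -> int:
    """Approximate total bases to simulate for one region (assumed formula)."""
    return coverage * size_kbp * 1000


if __name__ == "__main__":
    for coverage, size_kbp in simulation_conditions():
        print(coverage, size_kbp, target_bases(coverage, size_kbp))
```

Under these assumptions each region has nine distinct conditions; dividing `target_bases` by the mean read length of the chosen error profile gives a rough number of reads to request from the simulator.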