To evaluate the working efficiency and assembly success of GetOrganelle, we selected 50 datasets of vascular plants with raw reads from the Sequence Read Archive (SRA) (Additional file 2: Table S1). The 50 vascular plants represented 42 species of angiosperms (from eight major clades, 21 orders, and 29 families), four species of gymnosperms, three species of ferns, and one species of lycophyte. Notably, the raw reads of these 50 samples are associated with published plastomes [56–59], allowing comparison with the plastomes newly reassembled using GetOrganelle. Since 2018, NOVOPlasty has received more than 400 citations for chloroplast genome assembly in Google Scholar (accessed 31 Dec 2019) and has become one of the most widely used tools for plastome assembly. We therefore also reassembled the 50 samples using NOVOPlasty for comparison.
All datasets consisted of paired-end reads, with read lengths ranging from 100 to 300 bp (Additional file 2: Table S1). In all tests, if a dataset included fewer than 10,000,000 reads for each end, we used all the reads; if it included more than 10,000,000 reads for each end, we selected only the first 10,000,000 reads for each end. We set up four testing groups: three groups with different word size values (w = 0.6, 0.7, 0.8; GetOrganelle-W0.6, GetOrganelle-W0.7, and GetOrganelle-W0.8, respectively) and one group with an auto-estimated word size (GetOrganelle-auto). The number of extension rounds was set to 10 in all tests. All other options, including the seed, were left at their defaults. Because incomplete assemblies are unsuitable for comparing mapping quality in the next part, we added extra runs with customized options (GetOrganelle-customized) for the eight samples in which GetOrganelle-auto could not achieve complete plastomes. A detailed record of the commands, together with the final results and the log files recording the memory usage and time cost of all tests, is available at https://github.com/Kinggerm/GetOrganelleComparison (version 1.1.1).
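The read cap described above can be illustrated with a minimal sketch. It assumes uncompressed FASTQ files with exactly four lines per record; the function names and file paths are hypothetical and are not taken from the GetOrganelleComparison repository, which should be consulted for the commands actually used.

```python
from itertools import islice
from typing import Iterable, Iterator

MAX_READS_PER_END = 10_000_000  # cap per end, as stated in the text


def first_n_reads(fastq_lines: Iterable[str],
                  max_reads: int = MAX_READS_PER_END) -> Iterator[str]:
    """Yield the lines of the first `max_reads` records.

    Assumption: every FASTQ record occupies exactly four lines.
    If the input holds fewer records, all of them are yielded.
    """
    return islice(fastq_lines, 4 * max_reads)


def truncate_pair(in_path_1: str, in_path_2: str,
                  out_path_1: str, out_path_2: str,
                  max_reads: int = MAX_READS_PER_END) -> None:
    """Apply the same cap to both ends so the read pairs stay in sync."""
    for src, dst in ((in_path_1, out_path_1), (in_path_2, out_path_2)):
        with open(src) as fin, open(dst, "w") as fout:
            fout.writelines(first_n_reads(fin, max_reads))
```

Taking the first records from each file, rather than a random subset, keeps the two ends paired without any bookkeeping, provided both files list the reads in the same order.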
Plastomes from the same 50 datasets were also reassembled by NOVOPlasty using four k-mer values: 23, 31, 39, and 47. The NOVOPlasty config file was downloaded from the NOVOPlasty GitHub repository (https://github.com/ndierckx/NOVOPlasty/blob/master/config.txt), with “Type” set to “chloro,” “Genome Range” to 15,000–180,000, “Save assembled reads” to “yes,” “Seed Input” to the same seed used for GetOrganelle, and “Read Length” to the mean read length of each sample (see Additional file 2); all other parameters were left unchanged.
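As a rough illustration of how the four NOVOPlasty runs per sample differ, the sketch below collects the settings named above into one dictionary per k-mer value. The field names "Type", "Genome Range", "Save assembled reads", "Seed Input", and "Read Length" are those quoted in the text; the "K-mer" key, the "15000-180000" formatting, the helper names, and the example seed path are assumptions made for illustration and should be checked against the downloaded config file.

```python
from typing import Dict, List

NOVOPLASTY_KMERS = (23, 31, 39, 47)  # the four k-mer values tested


def novoplasty_overrides(seed_path: str, mean_read_length: int,
                         kmer: int) -> Dict[str, str]:
    """Return the config fields changed for one NOVOPlasty run."""
    if kmer not in NOVOPLASTY_KMERS:
        raise ValueError(f"k-mer {kmer} is not one of {NOVOPLASTY_KMERS}")
    return {
        "Type": "chloro",
        "Genome Range": "15000-180000",  # assumed formatting of the range
        "K-mer": str(kmer),              # assumed key name
        "Save assembled reads": "yes",
        "Seed Input": seed_path,         # same seed as the GetOrganelle run
        "Read Length": str(mean_read_length),
    }


def overrides_for_sample(seed_path: str,
                         mean_read_length: int) -> List[Dict[str, str]]:
    """One set of overrides per tested k-mer, for a single sample."""
    return [novoplasty_overrides(seed_path, mean_read_length, k)
            for k in NOVOPLASTY_KMERS]
```

For example, `overrides_for_sample("seed.fasta", 150)` yields four dictionaries that differ only in the k-mer value, which mirrors the design of the comparison: everything except the k-mer is held fixed within a sample.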