Sequence coverage for seven sequencing libraries used for the Larix sibirica genome assembly
Unlike the inbred, highly homozygous plants typically used for genome sequencing and assembly, such as A. thaliana, the L. sibirica tree sequenced in our study was a common forest tree with relatively high individual heterozygosity and, correspondingly, high within-individual biallelic variation. The number of ambiguous positions in the L. sibirica sequencing data was estimated at 3.0% of the genome size. Duplicate contigs were detected in the preliminary draft assembly of L. sibirica obtained in the second step, indicating higher ambiguity in the L. sibirica sequencing data than in the A. thaliana data. To resolve these ambiguities in the second step, the number of contigs resulting from the fifth set was increased 16-fold by replicating each contig 16 times. This trick allowed the CLC assembler to apply the majority rule when picking one of the alternative alleles, using the alleles selected in the fifth set in the first step of assembly. The same approach was also used for the stepwise assembly of the Arabidopsis thaliana genome by four different assemblers (Fig.
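The contig replication step described above can be illustrated with a minimal Python sketch. It is not the script used in the study: the function names (read_fasta, replicate_contigs, format_fasta) and the "_copyN" identifier suffix are hypothetical, and giving each copy a unique identifier is only a precaution assumed here, not a documented requirement of the CLC assembler. The sketch simply shows how every contig in a FASTA file could be written out 16 times, so that its alleles outnumber alternative alleles when the majority rule is applied.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def replicate_contigs(records: Iterable[Record], copies: int = 16) -> Iterator[Record]:
    """Emit each contig `copies` times (16 in the text above).

    The "_copyN" suffix is a hypothetical naming choice that keeps
    identifiers unique; any description after the first space is kept.
    """
    for header, seq in records:
        name, _, desc = header.partition(" ")
        for i in range(1, copies + 1):
            new_header = f"{name}_copy{i}" + (f" {desc}" if desc else "")
            yield new_header, seq


def format_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Serialize records as FASTA text with fixed-width sequence lines."""
    out: List[str] = []
    for header, seq in records:
        out.append(f">{header}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    demo = [">contig1", "ACGTACGT", ">contig2", "GGCCGGCC"]
    amplified = list(replicate_contigs(read_fasta(demo), copies=16))
    print(format_fasta(amplified[:2]))
```

In practice the replicated contigs would be written to a file and supplied to the assembler together with the reads of the next step; how that input is passed depends on the assembler and is not shown here.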
Results of the stepwise Arabidopsis thaliana genome assembly by four different assemblers using raw reads partitioned into five sets, following the approach used for assembling the Larix sibirica genome. The minimum contig length used for assembly was 200 bp.
Traditional and stepwise CLC Assembly Cell genome assembly parameters for peach (Prunus persica). The minimum contig length used for assembly was 200 bp.