GMAP [23] version 2014-12-06 was used to align existing RNA-seq transcriptome assemblies of eleven M. zebra tissues. The transcriptome assemblies were created using Trinity [24] as part of the cichlid genome project [13] and made available as supplementary information [25].
Three BAC clones that were previously sequenced and assembled using Sanger sequencing technology were aligned to the existing and newly produced assemblies for validation. These published BACs correspond to several opsin gene loci: SWS2A/SWS2B/LWS (GenBank accession JF262084.1, 107.6 kbp), SWS1 (GenBank accession JF262085.1, 77.6 kbp), and RH2B/RH2A (GenBank accession JF262089.1, 83.5 kbp) [26]. The BAC sequences were aligned to the corresponding M_zebra_v0 and M_zebra_UMD1 assembly sequences using Gepard [27] version 1.30 to create dotplots for comparison.
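The idea behind a dotplot comparison can be illustrated with a minimal Python sketch that records every position pair at which a BAC sequence and an assembly sequence share an identical word of length k. This is not Gepard's implementation (Gepard uses its own, more efficient algorithm); the function name `kmer_dotplot`, the default word size, and the forward-strand-only matching are assumptions made for illustration.

```python
from collections import defaultdict


def kmer_dotplot(seq_a, seq_b, k=12):
    """Return (i, j) pairs where seq_a[i:i+k] equals seq_b[j:j+k].

    Illustrative only: forward-strand exact matches, no filtering.
    """
    # Index every k-mer start position in the second sequence.
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)

    # Look up each k-mer of the first sequence in that index.
    hits = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], ()):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy sequences; a real comparison would read the BAC and scaffold FASTA files.
    for i, j in kmer_dotplot("ACGTACGT", "ACGTACGT", k=4):
        print(i, j)
```

Plotting the returned pairs as points gives the dotplot: a contiguous, collinear assembly of the BAC region appears as one unbroken diagonal, whereas gaps, inversions, or misjoins appear as breaks or off-diagonal segments.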
Completeness of the intermediate and final M_zebra_UMD1 assemblies was assessed using CEGMA [28] version 2.5 optimized for vertebrate genomes (--vrt). CEGMA relied on GeneWise version 2.4.1, HMMER version 3.1b1, and NCBI BLAST+ version 2.2.29+. The set of 248 most highly conserved core eukaryotic genes provided by CEGMA was used.
The likelihoods of the intermediate and final M_zebra_UMD1 assemblies were evaluated using ALE [29]. Each of the Illumina libraries was aligned to the assemblies using Bowtie2 [30] version 2.0.2 with the ‘--very-sensitive’ preset. The uncorrected PacBio reads were aligned to the assemblies with BLASR version 1.3.1.127046, using the same parameters as described above for PBJelly plus the ‘-sam’ option to produce a SAM file for input to ALE. ALE was then run on each of the respective alignment files to produce likelihood and mapping statistics for each library.
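As a rough sketch of the Illumina alignment step, the Python helper below assembles a Bowtie2 command line with the ‘--very-sensitive’ preset. Only that preset comes from the text; the paired-end layout, the thread count, and all file and index names are hypothetical placeholders, and the command is built but not executed.

```python
def bowtie2_command(index_prefix, reads_1, reads_2, out_sam, threads=1):
    """Build a Bowtie2 argument list for one paired-end library.

    Assumes paired-end reads and a pre-built Bowtie2 index; every
    path passed in is a placeholder chosen by the caller.
    """
    return [
        "bowtie2",
        "--very-sensitive",      # preset named in the text
        "-p", str(threads),      # number of alignment threads
        "-x", index_prefix,      # prefix of the Bowtie2 index
        "-1", reads_1,           # first mates
        "-2", reads_2,           # second mates
        "-S", out_sam,           # SAM output file
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = bowtie2_command("assembly_index", "lib_R1.fastq", "lib_R2.fastq", "lib.sam", threads=8)
    print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` once per library, giving one SAM file per library.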
Summary statistics of the assemblies were compiled using the assemblathon_stats.pl script [31].
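One of the most commonly reported assembly summary statistics is N50. The Python sketch below shows how it can be computed from sequence lengths. It is an independent illustration, not a port of assemblathon_stats.pl; the function names and the toy input are assumptions.

```python
def fasta_lengths(lines):
    """Return the length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A header starts a new record; store the previous one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the length L such that sequences of length >= L
    contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequences supplied")


if __name__ == "__main__":
    # Toy records; a real run would iterate over an open FASTA file.
    toy = [">s1", "ACGTAC", ">s2", "ACGT", ">s3", "AC"]
    print(n50(fasta_lengths(toy)))
```

Computing the statistic from lengths alone keeps the sketch independent of any FASTA parsing library; the same `n50` function applies to contigs or scaffolds.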