The completeness of the genome assembly was estimated with BUSCO v3 [89], using a reference set of 843 conserved metazoan single-copy orthologs. The resulting numbers of present, fragmented, duplicated, and missing gene models were compared with those of previous M. galloprovincialis genome assemblies [18, 19] (Additional file 1: Data Note 1.3.3). Residual artefactual duplications were assessed with the K-mer Analysis Toolkit (KAT) [90]. Consensus gene models were obtained by combining three types of evidence: transcript alignments generated with PASA v2.0.2 [91], bivalve protein alignments generated with SPALN v2.2.2 [92], and ab initio gene predictions from GeneID [93], GeneMark-ES [94], GlimmerHMM [94], and Augustus [95]. Each evidence source was assigned a different weight, and all sources were combined into consensus CDS predictions with EvidenceModeler v1.1.1. The gene models then underwent an additional round of quality control to refine the annotation of UTRs and alternatively spliced exons (Additional file 1: Data Note 2.1 and 2.2). Gene models were functionally annotated with InterPro [96], KEGG [97], Blast2GO [98], SignalP [99], and NCBI CD-Search [100] (Additional file 1: Data Note 2.3). Gene models supported by PASA but lacking a CDS were classified as non-coding genes and placed in a separate annotation track (Additional file 1: Data Note 2.5).
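The idea of weighting heterogeneous evidence before building a consensus can be illustrated with a small sketch. EvidenceModeler itself scores candidate exons and introns against the weighted evidence and assembles the best-scoring gene structures. The code below is a deliberately simplified per-base weighted vote, not EvidenceModeler's algorithm. The source weights, the threshold, and the example intervals are all hypothetical and are not the values used in this study.

```python
from collections import defaultdict

# Hypothetical per-source weights (NOT the weights used in the study):
# transcript and protein alignments are trusted more than ab initio predictors.
WEIGHTS = {
    "PASA": 10.0,
    "SPALN": 5.0,
    "GeneID": 1.0,
    "GeneMark-ES": 1.0,
    "GlimmerHMM": 1.0,
    "Augustus": 1.0,
}


def weighted_coverage(evidence, weights):
    """Sum the weights of all evidence intervals covering each base (1-based, inclusive)."""
    score = defaultdict(float)
    for source, start, end in evidence:
        w = weights.get(source, 0.0)  # unknown sources contribute nothing
        for pos in range(start, end + 1):
            score[pos] += w
    return score


def consensus_segments(evidence, weights, threshold):
    """Return maximal runs of consecutive bases whose summed weight reaches `threshold`."""
    score = weighted_coverage(evidence, weights)
    segments = []
    run_start = None
    prev = None
    for pos in sorted(p for p, s in score.items() if s >= threshold):
        if run_start is None:
            run_start = pos
        elif pos != prev + 1:  # gap: close the current run and start a new one
            segments.append((run_start, prev))
            run_start = pos
        prev = pos
    if run_start is not None:
        segments.append((run_start, prev))
    return segments


# Illustrative (made-up) evidence intervals on one scaffold: (source, start, end).
EXAMPLE_EVIDENCE = [
    ("PASA", 100, 200),
    ("Augustus", 90, 210),
    ("GeneID", 95, 205),
    ("GlimmerHMM", 300, 350),
]

if __name__ == "__main__":
    print(consensus_segments(EXAMPLE_EVIDENCE, WEIGHTS, 5.0))
```

With a threshold of 5.0, only the region backed by the heavily weighted transcript alignment survives. The isolated GlimmerHMM call at 300 to 350 is supported by a single low-weight predictor, so it is discarded.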
The completeness and integrity of the gene models, together with the genome assembly size and the number and density of gene models, were compared with those of several other recently sequenced molluscan genomes (Additional file 1: Data Note 3). Each gene model was assigned a support level (high, mild, or low) based on evidence from the gill and digestive gland transcriptomes of Lola and from several publicly available M. galloprovincialis RNA-seq datasets (Additional file 1: Data Note 4).
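Comparisons of gene density across genomes of very different sizes are easiest to read when density is normalized to a common unit. The sketch below assumes that density is expressed as gene models per megabase of assembled sequence. The definition used in Data Note 3 may differ, and the genome names and numbers are purely illustrative.

```python
def gene_density(n_genes: int, assembly_size_bp: int) -> float:
    """Gene models per megabase (Mb) of assembled sequence (assumed definition)."""
    if assembly_size_bp <= 0:
        raise ValueError("assembly size must be positive")
    return n_genes / (assembly_size_bp / 1_000_000)


def rank_by_density(genomes):
    """Sort (name, n_genes, size_bp) records by decreasing gene density."""
    return sorted(
        ((name, gene_density(n, size)) for name, n, size in genomes),
        key=lambda item: item[1],
        reverse=True,
    )


# Hypothetical genomes with made-up values, NOT data from this study.
EXAMPLE = [
    ("genome_A", 30_000, 600_000_000),
    ("genome_B", 40_000, 2_000_000_000),
    ("genome_C", 45_000, 1_500_000_000),
]

if __name__ == "__main__":
    for name, density in rank_by_density(EXAMPLE):
        print(f"{name}: {density:.1f} genes/Mb")
```

Normalizing by assembly size keeps a large but gene-poor assembly (genome_B here) from appearing richer in genes simply because it contains more gene models in absolute terms.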