Protein-coding genes in the B. tabaci genome were predicted with MAKER [75], which integrates the results of three approaches: ab initio prediction, homologous protein mapping, and transcript mapping. Augustus [77] and SNAP [78] were used for ab initio gene prediction. For homologous protein mapping, protein sequences from the SwissProt database and from the Drosophila melanogaster (fruit fly) and Acyrthosiphon pisum (pea aphid) proteomes were aligned to the B. tabaci genome using Spaln [79] with default parameters. For transcript mapping, B. tabaci mRNA sequences collected from GenBank were aligned to the genome using Spaln [79], and only mRNAs that aligned with coverage greater than 90% and sequence identity greater than 97% were retained. In addition, the alignments of the reference-guided transcript assembly from our RNA-Seq data, i.e., the GFF file generated by Cufflinks, were passed directly to MAKER. From the ab initio predicted genes, MAKER generated a set of high-confidence gene models, each supported by transcript mapping and/or homologous protein mapping. The remaining ab initio predicted genes, which lacked evidence support, were compared to the InterPro domain database [80] using InterProScan [81], and those containing InterPro domains were added to the predicted gene models. Finally, predicted gene models that overlapped with repeat sequences by 70% of their length were removed from the final predicted gene dataset.
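The two numeric filters described above (the mRNA alignment cutoffs and the repeat-overlap cutoff) can be illustrated with a minimal Python sketch. This is not the pipeline's actual code: the `Alignment` record, the function names, and the 1-based inclusive coordinate convention are all hypothetical, and the sketch assumes that "by 70% of their length" means a repeat-covered fraction of at least 0.70.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of one mRNA-to-genome alignment."""
    query_id: str
    coverage: float  # fraction of the mRNA length that aligned (0 to 1)
    identity: float  # fraction of identical aligned bases (0 to 1)


def keep_alignment(aln, min_coverage=0.90, min_identity=0.97):
    """Retain an mRNA only if both values are strictly above the cutoffs."""
    return aln.coverage > min_coverage and aln.identity > min_identity


def repeat_overlap_fraction(gene_start, gene_end, repeats):
    """Fraction of a gene span covered by repeats.

    Coordinates are assumed 1-based and inclusive. Repeat intervals may
    overlap one another; each covered base is counted only once.
    """
    gene_length = gene_end - gene_start + 1
    # Clip every repeat to the gene span and drop those outside it.
    clipped = sorted(
        (max(start, gene_start), min(end, gene_end))
        for start, end in repeats
        if end >= gene_start and start <= gene_end
    )
    covered = 0
    last_covered = gene_start - 1  # rightmost base already counted
    for start, end in clipped:
        if end <= last_covered:
            continue  # fully inside a region that was already counted
        covered += end - max(start, last_covered + 1) + 1
        last_covered = end
    return covered / gene_length


def is_repeat_derived(gene_start, gene_end, repeats, threshold=0.70):
    """Assumption: a gene model is removed when coverage >= threshold."""
    return repeat_overlap_fraction(gene_start, gene_end, repeats) >= threshold
```

For example, a gene model spanning positions 1 to 100 with repeats at 1 to 50 and 40 to 80 has 80 of its 100 bases covered, so under these assumptions it would be removed.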
The B. tabaci predicted genes were annotated by comparing their protein sequences against UniProt (TrEMBL and SwissProt), fruit fly, and pea aphid proteomes, as well as the InterPro domain database. GO annotation was performed using Blast2GO [82].