The gene models of the M. martensii genome were predicted using a
comprehensive gene prediction pipeline combining ab initio modelling,
homology-based modelling and EST-based modelling, involving several programs:
Augustus (ref. 42; version 2.5.5), Fgenesh++ (ref. 43), GENEID (ref. 44; version 1.2), SNAP (ref. 45), GlimmerHMM (ref. 46) and
Gnomon (http://www.ncbi.nlm.nih.gov/genome/guide/gnomon.shtml). First, TEs and
other sequence repeats in the genome assembly were masked before gene modelling. Then
a gene prediction training set (including 876 manually curated genes) was constructed
with the combined results from the self-trained prediction methods. The training set
was used to optimize the parameters for a second round of gene modelling, and the
results were integrated to produce a minimum gene set of 32,016 genes for M.
martensii (version 1.0 gene models: http://lifecenter.sgst.cn/main/en/Scorpion-Suppl/gene-models-v1.0.gff).
Transfer RNAs and ribosomal RNAs were identified using the programs tRNAscan-SE (ref. 47)
and RNAmmer (ref. 48), respectively. Gene annotation
and ontology assignment were performed with BLAST searches against the NCBI nr,
Swiss-Prot and TrEMBL databases using an E-value cutoff of 1E−5. The
NCBI CDD and Sanger Pfam databases were used for functional domain annotation. Genes
were mapped to metabolic pathways using KAAS based on the KEGG database (ref. 49). (Additional attributes for gene models are described in Supplementary Note 1.)
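The E-value filtering step in the annotation procedure can be illustrated with a minimal sketch. It assumes that the BLAST results are available in the standard 12-column tabular format (E-value in column 11, bit score in column 12), that hits with an E-value at or below 1E−5 are retained, and that the highest-scoring retained hit is used for each gene. The text states only the cutoff; the best-hit rule, the function name `best_hits` and the identifiers in the example are assumptions made for illustration.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # cutoff stated in the text


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue, bitscore)} for the best hit per query.

    Assumes 12-column tab-separated BLAST output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        # Discard hits weaker than the cutoff.
        if evalue > cutoff:
            continue
        # Keep the hit with the highest bit score for each query (assumed rule).
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical example records (identifiers are placeholders, not real data).
example = (
    "gene1\tsubjA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "gene1\tsubjB\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t120\n"
    "gene2\tsubjC\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30\n"
)
annotated = best_hits(example)
```

In this example, `gene1` is assigned `subjA`, its higher-scoring hit, while `gene2` remains unannotated because its only hit has an E-value above the cutoff.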