In the default configuration, functional annotation proceeds in the following order:
Orthologous assignment (optional): All-against-all pairwise protein alignments are conducted between the query and each reference genome. Orthologous genes are identified by a Reciprocal-Best-Hit approach. DFAST also conducts self-to-self alignments within the query genome, in which genes scoring higher than their corresponding orthologs are considered in-paralogs and are assigned the same protein function. This process is effective in transferring annotations from closely related organisms and in reducing running time.
Homology search against the default reference database: DFAST uses GHOSTX as the default aligner, which runs tens to hundreds of times faster than BLASTP with similar levels of sensitivity where E-values are less than 10⁻⁶ (Suzuki et al., 2014). Users can also choose BLASTP. For accurate annotation, we constructed a reference database from 124 well-curated prokaryotic genomes from public databases. See
Pseudogene detection: CDSs and their flanking regions are re-aligned to their subject protein sequences using LAST, which allows frameshift alignment (Kiełbasa et al., 2011). When stop codons or frameshifts are found in the flanking regions, the query is marked as a possible pseudogene. This step also detects translation exceptions such as selenocysteine and pyrrolysine.
Profile HMM database search against TIGRFAM (Haft et al., 2013): This step uses hmmscan from the HMMER software package.
Assignment of COG functional categories: RPS-BLAST and the rpsbproc utility are used to search against the Clusters of Orthologous Groups (COG) database provided by the NCBI Conserved Domain Database (Marchler-Bauer et al., 2017).
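The orthologous assignment step can be illustrated with a minimal Python sketch. It is not DFAST's actual implementation: the function names, the `(query, subject, score)` tuple layout and the example gene identifiers are all hypothetical, and the in-paralog rule shown here is only one reading of the description above (a query gene is treated as an in-paralog of an ortholog-bearing gene when their self-to-self alignment score exceeds the score between that gene and its ortholog).

```python
def best_hits(hits):
    """Keep the highest-scoring subject for each query.

    hits: iterable of (query_id, subject_id, score) tuples (assumed layout).
    Returns {query_id: (subject_id, score)}.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


def reciprocal_best_hits(query_vs_ref, ref_vs_query):
    """Return {query_gene: (ref_gene, score)} for reciprocal best hits."""
    forward = best_hits(query_vs_ref)
    reverse = best_hits(ref_vs_query)
    return {
        q: (r, score)
        for q, (r, score) in forward.items()
        if reverse.get(r, (None, 0.0))[0] == q
    }


def in_paralogs(orthologs, self_hits):
    """Return {in_paralog_gene: anchor_gene} from self-to-self alignments.

    A gene without an ortholog is attached to an ortholog-bearing anchor
    gene when their mutual score is higher than the anchor-to-ortholog
    score (an assumed interpretation of the rule).
    """
    strongest = {}
    for gene, other, score in self_hits:
        if gene == other or gene not in orthologs or other in orthologs:
            continue
        if score > orthologs[gene][1]:
            if other not in strongest or score > strongest[other][1]:
                strongest[other] = (gene, score)
    return {paralog: anchor for paralog, (anchor, _) in strongest.items()}


# Hypothetical alignment scores for three query genes and three reference genes.
query_vs_ref = [("q1", "r1", 500.0), ("q1", "r2", 120.0),
                ("q2", "r1", 450.0), ("q3", "r3", 300.0)]
ref_vs_query = [("r1", "q1", 500.0), ("r1", "q2", 450.0),
                ("r2", "q1", 120.0), ("r3", "q3", 300.0)]
self_hits = [("q1", "q1", 600.0), ("q1", "q2", 550.0), ("q3", "q2", 100.0)]

orthologs = reciprocal_best_hits(query_vs_ref, ref_vs_query)
paralogs = in_paralogs(orthologs, self_hits)
print(orthologs)
print(paralogs)
```

In this toy data set, q1 and q3 obtain orthologs, while q2 is attached to q1 as an in-paralog and would inherit the protein function transferred to q1.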
While the workflow described above is fully customizable in the stand-alone version, only limited features are currently available in the web version; for example, orthologous assignment is not available. An advantage of the web version is that users can curate the assigned protein names using an on-line annotation editor with easy access to the NCBI BLAST web service. We also offer optional databases for specific organism groups (Escherichia coli, lactic acid bacteria, bifidobacteria and cyanobacteria). These are downloadable from our web site and can be used in the stand-alone version. We are updating the reference databases to cover more diverse organisms.