Repeat sequences were masked, and the repeat-masked genome was used for gene set annotation with three methods: ab initio annotation, RNA‐seq‐based annotation, and homologue‐based annotation.
In the ab initio method, the software packages Augustus (v2.5.5) and SNAP (v2006-07-28) were run with default settings. Gene models with incomplete open reading frames (ORFs) or a protein-coding length of less than 300 bp were filtered out. In the homologue-based method, published gene sets from Aedes aegypti (GCF_002204515.2), Spodoptera frugiperda (GCF_011064685.1), P. japonica (GCA_013421045.1), Anoplophora glabripennis (GCF_000390285.2), Drosophila melanogaster (GCF_000001215.4), Bombyx mori (GCF_014905235.1), Photinus pyralis (GCF_008802855.1), Diabrotica virgifera (GCF_003013835.1), Apis mellifera (GCF_003254395.2), Tribolium castaneum (GCF_000002335.3), and H. axyridis (GCA_011033045.1) were downloaded from NCBI and used for homology-based annotation. The longest transcript of each protein-coding gene was aligned to the H. vigintioctomaculata genome using BLAST (tblastn, v2.6.0) with an e-value cutoff of 1 × 10⁻⁵, and gene structures were predicted using GeneWise (v2.2.0).52 In the RNA-seq-based approach, the de novo assembled transcripts were aligned to the H. vigintioctomaculata genome using BLAT (v35),45 and PASA (v2.1.0)53 was used to link the spliced alignments.
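The ORF filter applied to the ab initio predictions can be illustrated with a minimal Python sketch. The text does not say how ORF completeness was assessed, so the criteria used here (an ATG start codon, a terminal stop codon from the standard genetic code, a length divisible by three, and no internal in-frame stop) are assumptions, and the function names `has_complete_orf` and `filter_gene_models` are hypothetical. Only the 300 bp threshold comes from the text.

```python
# Illustrative sketch only: the completeness criteria below are assumptions,
# not the authors' documented procedure. Function names are hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)
MIN_CDS_LENGTH = 300                 # bp, the threshold stated in the text


def has_complete_orf(cds: str) -> bool:
    """Return True if the coding sequence starts with ATG, ends with a stop
    codon, has a length divisible by three, and has no internal in-frame stop."""
    seq = cds.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


def filter_gene_models(cds_by_gene: dict[str, str],
                       min_length: int = MIN_CDS_LENGTH) -> dict[str, str]:
    """Keep gene models whose coding sequence is at least min_length bp long
    and forms a complete ORF; all other models are dropped."""
    return {
        gene_id: cds
        for gene_id, cds in cds_by_gene.items()
        if len(cds) >= min_length and has_complete_orf(cds)
    }


if __name__ == "__main__":
    # Toy sequences invented for the example, not real gene models.
    models = {
        "gene_ok": "ATG" + "GCT" * 100 + "TAA",     # 306 bp, complete ORF
        "gene_short": "ATG" + "GCT" * 10 + "TAA",   # 36 bp, below threshold
        "gene_no_stop": "ATG" + "GCT" * 101,        # 306 bp, no stop codon
    }
    print(sorted(filter_gene_models(models)))
```

In this toy input only `gene_ok` passes both checks. In practice the coding sequences would be extracted from the Augustus and SNAP output (for example from GFF coordinates), a step this sketch omits.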
Finally, the gene prediction results were integrated into a final gene set using EVidenceModeler (v1.1.1).54
For gene function annotation, the predicted protein-coding genes were searched against the following public databases: Gene Ontology (http://geneontology.org/), the Integrated Resource of Protein Domains and Functional Sites (InterPro: https://www.ebi.ac.uk/interpro/), Kyoto Encyclopedia of Genes and Genomes (KEGG: https://www.kegg.jp/), Clusters of Orthologous Groups of proteins (COG: https://www.ncbi.nlm.nih.gov/COG/), Swiss-Prot (www.uniprot.org), TrEMBL (www.uniprot.org), and the NCBI non-redundant protein database (NR: https://ftp.ncbi.nlm.nih.gov/blast/db).