The assemblies of C. floridanus (version 3.5) and H. saltator (version 3.5) were downloaded from the Hymenoptera Genome Database [89]. Protein sequences of reported chemosensory genes were also collected from Apis mellifera, Acyrthosiphon pisum, Drosophila melanogaster, Nasonia vitripennis, L. humile, and P. barbatus [15], [26], [28], [47], [48], [50], [54]. An in-house bioinformatics pipeline was developed to identify candidate chemosensory genes in C. floridanus and H. saltator. First, all collected chemosensory gene sequences were searched against the two ant genomes using TBLASTN [90] with an e-value cutoff of 1e-5. The resulting high-scoring segment pairs (HSPs) were sorted by their BLAST bit-scores, and the average bit-score of the top 75% of HSPs was calculated. Any HSP with a bit-score below 25% of this average was discarded. Chains of HSPs were then created from the retained HSPs. Two HSPs were chained together if the following criteria were met: 1) they were derived from the same query; 2) they were located within 3 kb of each other on the same strand of a scaffold/contig; and 3) the query region of the upstream HSP was N-terminal to that of the downstream HSP. The third criterion was applied to avoid artificial concatenation of neighboring chemosensory genes. Genomic regions covered by HSP chains were considered putative chemosensory gene coding regions. For each putative gene, we then selected the query corresponding to the highest-scoring HSP in that region as the reference sequence for homology-based gene prediction using GeneWise (version 2.2.0) [91]. All predictions were sorted by ORF length, and the shortest 25% were filtered out. This pipeline was iterated, adding the results of the previous run to the input, until no additional genes were found.
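The HSP filtering and chaining steps described above can be sketched as follows. This is a minimal illustration, not the authors' in-house pipeline: the `HSP` record, the function names, and the parameter names are hypothetical. It also makes several assumptions the text does not settle: the bit-score average is taken over whatever list of HSPs is passed in, genomic coordinates are stored with `s_start < s_end` regardless of strand, the 3 kb limit is measured as the gap between adjacent HSPs, and "N-terminal" is interpreted as both query start and query end lying earlier in the protein.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class HSP:
    """One TBLASTN hit (hypothetical record; field names are assumptions)."""
    query: str       # protein query identifier
    scaffold: str    # scaffold/contig identifier
    strand: int      # +1 or -1
    s_start: int     # genomic start, always < s_end
    s_end: int       # genomic end
    q_start: int     # start on the protein query
    q_end: int       # end on the protein query
    bitscore: float


def filter_by_bitscore(hsps: List[HSP], top_fraction: float = 0.75,
                       min_ratio: float = 0.25) -> List[HSP]:
    """Drop HSPs scoring below min_ratio times the mean of the top-scoring fraction."""
    if not hsps:
        return []
    ranked = sorted(hsps, key=lambda h: h.bitscore, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    mean_top = sum(h.bitscore for h in ranked[:n_top]) / n_top
    return [h for h in ranked if h.bitscore >= min_ratio * mean_top]


def chain_hsps(hsps: List[HSP], max_gap: int = 3000) -> List[List[HSP]]:
    """Greedily chain adjacent HSPs that satisfy the three criteria."""
    # Criteria 1 and 2 (in part): same query, same scaffold, same strand.
    groups = defaultdict(list)
    for h in hsps:
        groups[(h.query, h.scaffold, h.strand)].append(h)

    chains = []
    for (_, _, strand), members in groups.items():
        # Order from upstream to downstream in the direction of translation.
        members.sort(key=lambda h: h.s_start, reverse=(strand < 0))
        current = [members[0]]
        for h in members[1:]:
            prev = current[-1]
            # Criterion 2: genomic gap between neighbouring HSPs within max_gap.
            gap = h.s_start - prev.s_end if strand > 0 else prev.s_start - h.s_end
            # Criterion 3: the upstream HSP matches a more N-terminal query region.
            colinear = prev.q_start < h.q_start and prev.q_end < h.q_end
            if gap <= max_gap and colinear:
                current.append(h)
            else:
                chains.append(current)
                current = [h]
        chains.append(current)
    return chains
```

Calling `chain_hsps(filter_by_bitscore(hsps))` returns a list of chains, each a list of `HSP` records in upstream-to-downstream order. The span from the first to the last HSP in a chain corresponds to one putative coding region in the sense used above. The colinearity check is what keeps two tandemly arranged genes apart: when the query coordinates restart, a new chain begins.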
Multiple sequence alignments (MSAs) of the predicted ORs, GRs, and IRs were constructed using MUSCLE (version 3.8) [92] and manually inspected. Annotations were revised wherever an obvious problem was identified (e.g. a missing exon or an incorrect exon-exon junction). In addition, we observed many fragmented gene models in the OR and GR families, likely due to pseudogenization and incomplete genome assembly. For the convenience of subsequent analyses, a minimum size cutoff of 300 amino acids was applied to the ORs and GRs. For the IRs, we screened all predicted protein sequences with InterProScan (version 4.8) [93] and removed those lacking the characteristic IR domains (PF10613 and PF00060) [26].
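The two family-specific filters above can be sketched as follows. This is an illustrative sketch only: the function names and input dictionaries are hypothetical, and the domain hits are assumed to have been parsed from InterProScan output beforehand. The text does not say whether an IR had to carry both Pfam domains or only one to be retained, so that choice is exposed as the `require_all` parameter, with "at least one domain" as an assumed default.

```python
from typing import Dict, FrozenSet, List, Set

# Pfam accessions named in the text as characteristic of IRs.
IR_PFAM_DOMAINS: FrozenSet[str] = frozenset({"PF10613", "PF00060"})


def filter_or_gr_by_length(proteins: Dict[str, str],
                           min_len: int = 300) -> Dict[str, str]:
    """Keep OR/GR models whose protein has at least min_len amino acids."""
    # A trailing '*' (stop symbol) is not counted as a residue.
    return {name: seq for name, seq in proteins.items()
            if len(seq.rstrip("*")) >= min_len}


def filter_irs_by_domain(domain_hits: Dict[str, Set[str]],
                         required: FrozenSet[str] = IR_PFAM_DOMAINS,
                         require_all: bool = False) -> List[str]:
    """Keep IR candidates carrying the characteristic Pfam domains.

    require_all=False (assumed default) keeps a protein with any of the
    required domains; require_all=True demands every required domain.
    """
    kept = []
    for name, domains in domain_hits.items():
        found = required & set(domains)
        passes = (found == set(required)) if require_all else bool(found)
        if passes:
            kept.append(name)
    return kept
```

Here `proteins` maps a gene model name to its amino acid sequence, and `domain_hits` maps a gene model name to the set of Pfam accessions reported for it. Candidates with no reported domains should appear with an empty set so that they are explicitly rejected.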