For the phylogenetic analysis of shared transposases, we first clustered all genes annotated as transposases by Prokka [57] into gene families using SiLiX (v1.2.11) [65]. For each gene family shared by two or more endosymbionts, we searched for homologous sequences using the blastp function of ISfinder [73] and created a multiple sequence alignment with MAFFT (v7.453; ‘--maxiterate 1000’) [50]. The alignments were then checked manually, and sequences showing clear signs of degradation at either the 3′ or 5′ end were removed. We removed only transposase sequences that appeared degraded (i.e. pseudogenized) in comparison to otherwise highly identical genes, in order to keep the dataset free of sequences that might be under different selective pressures. Finally, the alignments were trimmed using BMGE (v1.12) [74] and used for phylogenetic reconstruction with IQ-TREE 2 (v2.1.2; ‘-bnni’ ‘-alrt 1000’ ‘-m TESTNEW’ ‘-bb 1000’ ‘-mset LG’ ‘-madd LG+C10,LG+C20,LG+C30,LG+C40,LG+C50,LG+C60’ ‘-keep-ident’ ‘-wbtl’) [55]. For transposase sequences that showed a clear sister-clade relationship in the de novo trees and belonged to the same eggNOG gene family, we reconstructed phylogenetic trees using gene families assigned with eggNOG-mapper (v2.1.0) [61] against the eggNOG database (v5.0) [60]. To do this, we added the protein sequences of the respective eggNOG gene families to the endosymbiont gene families. For each gene family, we then calculated multiple sequence alignments, curated them and reconstructed phylogenies as described above.
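The family selection and the two tool invocations whose options are stated above could be sketched roughly as follows. This is a minimal illustration, not the authors' actual scripts: the function names, the dictionary-based input layout, and the file names are hypothetical, and the executable names `mafft` and `iqtree2` and the IQ-TREE alignment flag `-s` are assumptions about a typical installation. Only the options quoted in the text are reproduced; BMGE is left out because its options are not given.

```python
from collections import defaultdict


def shared_families(gene_to_family, gene_to_genome, min_genomes=2):
    """Return the gene families whose members occur in at least
    `min_genomes` distinct genomes (here: endosymbionts)."""
    genomes_per_family = defaultdict(set)
    for gene, family in gene_to_family.items():
        genomes_per_family[family].add(gene_to_genome[gene])
    return sorted(
        family
        for family, genomes in genomes_per_family.items()
        if len(genomes) >= min_genomes
    )


def mafft_command(fasta_in):
    """Build the MAFFT call with the option quoted in the text.
    MAFFT writes the alignment to stdout, so redirect it when running."""
    return ["mafft", "--maxiterate", "1000", fasta_in]


def iqtree_command(alignment):
    """Build the IQ-TREE 2 call with the options quoted in the text.
    The '-s' alignment flag is an assumption; it is not stated above."""
    return [
        "iqtree2", "-s", alignment,
        "-bnni",
        "-alrt", "1000",
        "-m", "TESTNEW",
        "-bb", "1000",
        "-mset", "LG",
        "-madd", "LG+C10,LG+C20,LG+C30,LG+C40,LG+C50,LG+C60",
        "-keep-ident",
        "-wbtl",
    ]


if __name__ == "__main__":
    # Hypothetical toy input: FAM1 spans two endosymbionts, FAM2 only one.
    gene_to_family = {
        "A_001": "FAM1", "B_007": "FAM1",
        "A_002": "FAM2", "A_003": "FAM2",
    }
    gene_to_genome = {
        "A_001": "symbiontA", "B_007": "symbiontB",
        "A_002": "symbiontA", "A_003": "symbiontA",
    }
    for family in shared_families(gene_to_family, gene_to_genome):
        print(" ".join(mafft_command(f"{family}.faa")))
        print(" ".join(iqtree_command(f"{family}.trimmed.faa")))
```

The commands are returned as argument lists so they can be inspected or logged before being passed to something like `subprocess.run`; the manual curation step between alignment and trimming is deliberately not automated here.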