Genes encoding the enzymes of the trypanothione biosynthetic pathway were considered to be present in a genome or transcriptome when the following conditions were fulfilled: (i) a protein could be identified by BLAST with an E value cut-off of 10⁻²⁰ and/or a corresponding KEGG ID was assigned to a protein, and (ii) the p-distance between a reference protein and a putative hit, calculated using MEGA v.7, did not exceed 0.7 or a different threshold specified in Additional file 13: Tables S41-S51 [204]. Additionally, for transcriptomic data, the presence of a spliced leader (SL) sequence was checked, requiring a match of at least 12 nt. When a protein of interest could not be identified among the predicted proteins, additional BLAST searches were performed against the raw transcriptome/genome sequences using an E value threshold of 10⁻¹⁰. For glutathionylspermidine (GspS) and trypanothione (TryS) synthetases, as well as trypanothione (TR), glutathione (GR), and thioredoxin (TrxR) reductases, HMM-based searches using the HMMER package v.3.1 [77] were performed in addition to the BLAST searches. An HMM for GspS was generated from the Pfam seed alignment PF03738, and HMMs for the other enzymes were built from alignments of annotated sequences from the KEGG database. Two groups of proteins, GspS + TryS and TR, comprise related proteins that share a certain degree of sequence similarity and could be aligned (Additional file 13: Tables S50 and S51).
For the identification of GspS/TryS homologues outside Euglenozoa, TryS of T. brucei was used as a query in a BLASTP search against the NCBI nr database (E value cut-off of 10⁻²⁰), and the 1000 best hits were obtained for two groups, prokaryotes (group I) and other organisms (excluding Euglenozoa; group II), and combined into one file. The sequences were then filtered using CD-HIT-EST software v.4.6.7 [181] with a 98% protein identity threshold.
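The presence criteria for pathway genes (a BLAST E value of at most 10⁻²⁰ and/or a KEGG ID, combined with a p-distance of at most 0.7) can be illustrated with a minimal Python sketch. It is not the procedure implemented in MEGA v.7: the function names `p_distance` and `passes_presence_criteria` are hypothetical, and the sketch assumes that alignment columns with a gap in either sequence are skipped (pairwise deletion), a setting the text does not specify.

```python
def p_distance(seq_a: str, seq_b: str, gap_char: str = "-") -> float:
    """Proportion of differing residues between two aligned sequences.

    Assumption: columns with a gap in either sequence are ignored
    (pairwise deletion); the gap treatment used in the study is not stated.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap_char or b == gap_char:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def passes_presence_criteria(evalue, p_dist, has_kegg_id=False,
                             max_evalue=1e-20, max_p_distance=0.7):
    """Condition (i): BLAST hit within the E value cut-off and/or a KEGG ID.
    Condition (ii): p-distance to the reference within the threshold.
    Pass evalue=None when no BLAST hit was found.
    """
    blast_ok = evalue is not None and evalue <= max_evalue
    return (blast_ok or has_kegg_id) and p_dist <= max_p_distance


# Toy aligned fragments, for illustration only (not real sequences).
d = p_distance("MKV-LA", "MKI-LG")
print(d, passes_presence_criteria(1e-25, d))
```

Where Additional file 13 specifies a different threshold for a given protein, it would be passed through `max_p_distance`.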
For the TR/GR/TrxR phylogeny, the corresponding protein sequences of Emiliania huxleyi, Homo sapiens, and the trypanosomatids Blechomonas ayalai, Endotrypanum monterogeii, and T. cruzi were used as references. Sequences were aligned using MUSCLE v.3.8.31 with default parameters [205]. The resulting alignments were trimmed using trimAl v.1.4.rev22 with the “-strict” option [206]. Maximum-likelihood trees for both protein groups were built using IQ-TREE v.1.5.3 with the LG+I+G4 model (automatically selected) and 1000 and 100 bootstrap replicates for reductases and synthetases, respectively. Bayesian trees were inferred using MrBayes v.3.2.6, with the models of rate heterogeneity across sites chosen based on the IQ-TREE results and the models of amino acid substitution assessed during the analysis (mixed amino acid model prior). The resulting model was WAG+I+G4 for both synthetases and reductases. The analysis was run for one million generations, sampling every 100th generation and discarding the first 25% of samples as burn-in.
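As a quick check of what these MCMC settings imply, the following Python sketch counts the samples retained after burn-in. The helper `burn_in_split` is hypothetical; whether the starting state (generation 0) is counted as a sample and how the burn-in count is rounded are assumptions here, not details given in the text.

```python
def burn_in_split(n_generations: int, sample_freq: int,
                  burnin_fraction: float, include_start: bool = True):
    """Return (total_samples, discarded, retained) for an MCMC run.

    Assumptions: one sample is recorded every `sample_freq` generations,
    the starting state is counted when `include_start` is True, and the
    burn-in count is rounded down.
    """
    total = n_generations // sample_freq + (1 if include_start else 0)
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


# Settings from the text: one million generations, every 100th sampled, 25% burn-in.
print(burn_in_split(1_000_000, 100, 0.25, include_start=False))
print(burn_in_split(1_000_000, 100, 0.25, include_start=True))
```

Under either assumption, roughly 7500 samples remain for summarizing the posterior tree distribution.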