Flavonoid biosynthesis proteins from A. thaliana were used as queries to search for homologs in S. cerevisiae using DELTA-BLAST with an e-value threshold of <1e-15. To search for CHI homologs in yeasts, a HMMER profile was constructed from S. cerevisiae Aim18p and Aim46p and used to search 332 annotated budding yeast genomes (15), retaining hits with score >50 and e-value <0.001. To search for CHI homologs in more divergent fungi, tBLASTn was used to identify hits in over 700 unannotated Ascomycota genomes (16) with e-value <1e-6.
The full plant and fungal CHI homolog phylogeny was built by adding the fungal sequences to a structural alignment of plant CHI homolog sequences (7) using the --add function of MAFFT (55) with default settings. The resulting alignment was truncated to the regions matching the initial structural alignment that included the CHI domain and filtered to sites with less than 50% gaps using trimAl (56). A phylogenetic tree was constructed using FastTree (57) with the settings (d -lg -gamma -spr 4 -slownni -mlacc 2), midpoint rooted, and edited to collapse clades using iTOL (58).
The absence of CHI homologs in metazoan lineages was determined by repeated searches using both blastp and DELTA-BLAST at permissive e-value thresholds, restricted to relevant taxonomic groups. To determine whether the single CHI homolog in the published genome of S. arboricola arose from a fusion of paralogs, the coding sequences of all hits from Saccharomyces species were aligned via MAFFT using default settings (55), and the S. arboricola gene was examined for signatures of recombination between AIM18 and AIM46 homologs using RDP4 (59). A single recombination event was found in this sequence (all tests of recombination were significant at p<1e-13), which supports a fusion in which AIM46 provides roughly the 5′ 60% and AIM18 the final 3′ 40% of the sequence.
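The site-filtering criterion described above (keeping alignment columns with less than 50% gaps) can be illustrated with a minimal Python sketch. This is not trimAl's implementation and does not reproduce its command-line options. The function name `filter_gappy_columns`, its parameters, and the toy sequences are hypothetical and serve only to show the criterion.

```python
def filter_gappy_columns(seqs, max_gap_fraction=0.5, gap_chars="-."):
    """Keep alignment columns whose gap fraction is strictly below the cutoff.

    seqs: list of equal-length aligned sequences (strings).
    Illustrative only; not trimAl's algorithm or interface.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    n = len(seqs)
    # Indices of columns in which fewer than max_gap_fraction of rows are gaps.
    keep = [
        col for col in range(length)
        if sum(s[col] in gap_chars for s in seqs) / n < max_gap_fraction
    ]
    # Rebuild every sequence from the retained columns only.
    return ["".join(s[col] for col in keep) for s in seqs]


# Toy alignment (hypothetical): the third column is 75% gaps and is removed.
aligned = ["MK-LV", "MK-LV", "MKA-V", "M--LV"]
filtered = filter_gappy_columns(aligned)
```

In this sketch a column with exactly 50% gaps is dropped, matching the "less than 50% gaps" wording. Whether the tool treats the boundary the same way is an assumption.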
A subset of plant and fungal CHI protein sequences was aligned via MAFFT using default settings (55). Protein sequence alignments were visualized using Jalview 2.11.2.5 and colored by sequence conservation (60). Midpoint-rooted phylogenetic trees of plant and fungal CHIs were generated through the Phylogeny.fr “one click” mode workflow (http://phylogeny.lirmm.fr/) using default settings (61, 62, 63, 64, 65). Alignments and phylogenetic trees were exported as SVG files and annotated in Adobe Illustrator 26.5.