In this study, we first used terpene synthase (TPS) protein sequences from the fully sequenced genomes of A. thaliana [100] and E. grandis [29] to classify the putative genes found in P. cattleyanum, by sequence similarity, into the previously defined subfamilies TPS-a, -b, -c, -e/f, and -g [26].
To examine the evolutionary history of TPS genes, we carried out a second analysis that included more species (E. grandis, E. globulus, A. thaliana, P. trichocarpa, V. vinifera, C. citriodora, and M. alternifolia). We generated one tree from TPS sequences related to primary metabolism (subfamilies -c, -e, and -f), with a total of 45 sequences, and a second tree from sequences related to secondary metabolism (subfamilies -a, -b, and -g), with 360 sequences [29, 32, 55].
To assess the homology of known TPS to Psidium genes, the following functionally characterized enzymes were also included in the phylogenetic analysis: pinene synthases (RtTPS1 and RtTPS2, accession numbers AXY92166 and AXY92167, respectively) and caryophyllene synthases (RtTPS3 and RtTPS4, accession numbers AXY92168 and AXY92169) from Rhodomyrtus tomentosa [52]; a pinene synthase (EpTPS1, accession number MK873024) and 1,8-cineole synthases (EpTPS2 and EpTPS3, accession numbers MK873025 and QCQ05478) from Eucalyptus polybractea [56]; a β-caryophyllene synthase (Eucgr.J01451) from E. grandis [29]; a myrcene synthase from Antirrhinum majus (AAO41727) [101]; two isoprene synthases, from E. globulus (EglobTPS106) and E. grandis (Eucgr.K00881) [29]; and five linalool synthases from Oenothera californica (AAD19841) [63], Clarkia breweri (AAD19840), Clarkia concinna (AAD19839), and Fragaria x ananassa (CAD57106) [102].
For each dataset used to construct the trees, we first aligned the amino acid sequences of the putative TPS genes using ClustalW as implemented in the MEGA v7.0 software package [103]. Because of the high levels of variation and the variable exon counts between taxa, we trimmed the alignment using Gblocks [104] with the following options: allow smaller final blocks, allow gap positions within the final blocks, and allow less strict flanking positions. We performed the phylogenetic analysis with the maximum-likelihood method implemented in PhyML v2.4.4 [105], run through its online web server [106]. JTT + G + F was selected as the best-fit substitution model by ModelGenerator for protein analyses [107]. Confidence in the tree topology was assessed by running 100 bootstrap replicates. Trees were visualized using FigTree v1.4.4 [108].
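As an illustration of the bootstrap step, the sketch below shows the general idea of nonparametric bootstrapping of an alignment: columns are resampled with replacement to build each replicate, and a tree would then be inferred from every replicate. This is not the PhyML implementation, which performs the resampling internally. The function names, the toy sequences, and the random seed are hypothetical and serve only as an example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: alignment columns resampled with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns, with replacement.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Build n_replicates resampled alignments (100 matches the count used in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy aligned sequences (hypothetical, not taken from the study).
toy = {"seqA": "MKT-LV", "seqB": "MRTALV", "seqC": "MKSALI"}
replicates = bootstrap_replicates(toy, n_replicates=100, seed=42)
```

Each replicate keeps the same taxa and alignment length as the input. The support value for a branch is then the percentage of replicate trees in which that branch is recovered.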