Protein sequences of the CCD, ALDH, and UGT family members in A. thaliana were downloaded from the TAIR database and used as queries in BLASTP searches against the G. jasminoides protein sequences to identify homologous sequences. Full-length protein sequences were corrected and aligned with ClustalW2 [61]. Phylogenetic trees were constructed using the maximum likelihood method with the Jones-Taylor-Thornton (JTT) model and 1000 bootstrap replicates [62]. Further analyses incorporated BLAST searches (using Gardenia proteins as queries) of a number of other genomes to identify more CCD, ALDH, and UGT genes. For NMTs, the Coffea canephora XMT protein was used as a query (NCBI accession ABD90685.1). Species considered were Gardenia jasminoides (CoGe genome ID 53980), Coffea canephora (CoGe genome ID 19443), Arabidopsis thaliana (CoGe genome ID 16911), Calotropis gigantea (CoGe genome ID 36623), Catharanthus roseus (CoGe genome ID 36703), Vitis vinifera (CoGe genome ID 19990), Gelsemium sempervirens (CoGe genome ID 53941), and Solanum lycopersicum (CoGe genome ID 12289). Gene model IDs from the respective CoGe-uploaded genomes were retained as leaf IDs for phylogenetic analysis, except that any “:” in a gene model ID was replaced by “_”. Several additional anchoring protein sequences from NCBI were incorporated in the NMT tree (MTL,_AFV60456.1; DXMT,_ABD90686.1; MXMT,_AFV60445.1; XMT,_ABD90685.1). Searches were run on the CoGe platform using default parameters, saving 100 BLAST HSPs per species. Unique translated sequences were then downloaded, duplicates were removed using BBEdit, and sequences with internal stop codons were excluded. Trees were then built using PASTA [63], with MAFFT [64] to align the protein sequences and FastTree [65] to create an approximately maximum likelihood tree.
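The sequence clean-up described above (leaf ID normalization, duplicate removal, and exclusion of sequences with internal stop codons) was done partly by hand in BBEdit. As an illustration only, a scripted equivalent might look like the following minimal Python sketch. The function names are hypothetical, and the sketch assumes that "duplicates" means repeated gene model IDs (for example, the same gene hit by several queries) and that stop codons appear as "*" in the translated sequences; neither detail is stated in the text.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (gene model ID, translated protein sequence)


def clean_leaf_id(gene_id: str) -> str:
    """Replace ':' with '_' so the ID can be used as a tree leaf label."""
    return gene_id.replace(":", "_")


def has_internal_stop(seq: str, stop: str = "*") -> bool:
    """True if a stop symbol occurs anywhere before the trailing stop(s).

    Assumption: stops are encoded as '*'; trailing stops are tolerated.
    """
    return stop in seq.rstrip(stop)


def filter_sequences(records: Iterable[Record]) -> List[Record]:
    """Drop duplicate IDs, then drop sequences with internal stops."""
    seen_ids = set()
    kept: List[Record] = []
    for gene_id, seq in records:
        leaf_id = clean_leaf_id(gene_id)
        if leaf_id in seen_ids:
            continue  # assumed meaning of "duplicate": same gene model ID
        seen_ids.add(leaf_id)
        seq = seq.strip().upper()
        if has_internal_stop(seq):
            continue
        kept.append((leaf_id, seq.rstrip("*")))  # strip terminal stop
    return kept


if __name__ == "__main__":
    # Made-up records for illustration; not real gene models.
    demo = [
        ("Gj1:g1", "MKT*"),
        ("Gj1:g1", "MKT*"),
        ("Gj2", "MK*T"),
        ("Gj3", "MKV"),
    ]
    for leaf_id, seq in filter_sequences(demo):
        print(f">{leaf_id}\n{seq}")
```

The filtered records could then be written to a FASTA file and passed to the alignment and tree-building step.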
Trees were visualized and edited using FigTree (http://tree.bio.ed.ac.uk/software/figtree/) (Additional file 4: Fig. S27, Additional file 5: Fig. S28, Additional file 6: Fig. S29, Additional file 7: Fig. S30). In the supplemental figures, pink branches represent gentianalean clades, green branches represent Rubiaceae clades, and orange gene model IDs represent Gardenia genes. Coffee-specific clades are shown in red. In the NMT supplemental tree (Additional file 4: Fig. S27), the anchoring protein sequences are shown in red.