CYP protein sequences from A. thaliana and O. sativa [63,64] were used as queries in BLASTP searches, with an E-value cut-off of 1e-10, against the predicted proteomes of S. moellendorffii, M. polymorpha, A. agrestis, P. patens, K. nitens, C. reinhardtii, and C. merolae. GST protein sequences were retrieved by BLASTP searches using GST proteins from A. thaliana [65,66], O. sativa [67,68], and P. patens [28] as queries against the predicted proteomes of S. moellendorffii, M. polymorpha, A. agrestis, K. nitens, C. reinhardtii, and C. merolae. The initial list of sequences for each species was then used as a query for BLASTP searches against the proteome of that same species to retrieve additional sequences belonging to species-specific clans. Each CYP sequence was checked for the presence of the cytochrome P450 domain (PF00067, IPR001128), and each GST sequence was checked for the presence of the GST N-terminal domain (IPR004045, IPR019564, PF13409, PF17172, PF13417 and PF02798) and C-terminal domain (IPR010987, PF13410, PF00043, PF14497 and PF17171) using InterProScan 84.0 [69].
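As an illustration only, the hit-filtering and domain-checking steps described above could be sketched in Python as follows. The sketch assumes BLASTP results in the standard 12-column tabular format (subject ID in column 2, E value in column 11) and assumes that the InterProScan matches for each sequence have already been collected into a set of accessions. The function names are hypothetical and do not come from the original analysis. Only the Pfam accession is used for the CYP check here; the InterPro accession given in the text can be added to the same set.

```python
# Domain accessions taken from the text; everything else is illustrative.
CYP_DOMAINS = {"PF00067"}
GST_N_DOMAINS = {"IPR004045", "IPR019564", "PF13409",
                 "PF17172", "PF13417", "PF02798"}
GST_C_DOMAINS = {"IPR010987", "PF13410", "PF00043", "PF14497", "PF17171"}


def filter_blast_hits(tabular_lines, max_evalue=1e-10):
    """Return the subject IDs of hits whose E value is <= max_evalue.

    Expects lines in standard 12-column BLAST tabular format.
    """
    hits = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue <= max_evalue:
            hits.add(subject_id)
    return hits


def has_cyp_domain(accessions):
    """True if any listed cytochrome P450 accession is present."""
    return bool(CYP_DOMAINS & set(accessions))


def has_gst_domains(accessions):
    """True if both a GST N-terminal and a GST C-terminal accession are present."""
    found = set(accessions)
    return bool(GST_N_DOMAINS & found) and bool(GST_C_DOMAINS & found)
```

The subject IDs returned by `filter_blast_hits` for one species could then be used to build the query set for the second, within-species search.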
Two enzyme families with glutathione transferase activity, kappa [70] and membrane-associated proteins in eicosanoid and glutathione metabolism (MAPEG) [71], do not possess a GST N-terminal thioredoxin-like domain or GST C-terminal domain and lack the N-terminal active site found in all other GST proteins. This analysis also identified an additional group of sequences possessing two GST N-terminal domains (2N) but lacking a C-terminal domain. Protein sequences belonging to the kappa, MAPEG, and 2N classes were therefore not included in the phylogenetic analysis but are listed in S5 Table.
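The exclusion rule above depends on how many distinct N-terminal and C-terminal domain regions a sequence carries. A minimal sketch of such a classification is given below, assuming that domain matches are available as (start, end) coordinates and that overlapping matches from different signatures describe the same region. The function names and class labels are hypothetical. Sequences without either domain are only flagged; assigning them to kappa or MAPEG would need a separate check that is not shown here.

```python
def count_regions(intervals):
    """Count distinct regions after merging overlapping (start, end) matches."""
    count = 0
    current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # No overlap with the region being tracked: a new region starts.
            count += 1
            current_end = end
        else:
            # Overlapping match: extend the current region if needed.
            current_end = max(current_end, end)
    return count


def classify_gst(n_matches, c_matches):
    """Classify a candidate by its GST domain architecture.

    n_matches, c_matches: lists of (start, end) tuples for N-terminal and
    C-terminal domain matches, respectively.
    """
    n = count_regions(n_matches)
    c = count_regions(c_matches)
    if n >= 1 and c >= 1:
        return "canonical"       # kept for phylogenetic analysis
    if n == 2 and c == 0:
        return "2N"              # two N-terminal domains, no C-terminal domain
    if n == 0 and c == 0:
        return "no GST domains"  # candidate kappa or MAPEG, to be checked separately
    return "other"
```

For example, a sequence with two overlapping N-terminal matches and one C-terminal match counts as having a single N-terminal region and is classified as canonical.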