We reannotated barley CCT genes by using BLAST to search the CCT domain of HvCMF4 (residues 301 to 345) and the CCT domain from the Pfam database (http://pfam.xfam.org/) (PF06203) against the barley Morex V2 proteome (E value cutoff: 0.005). Full-length protein sequences of the retrieved genes were domain-annotated with hmmscan against the Pfam database to identify additional domains such as the B-box–type zinc finger domain (PF00643), response regulator receiver domain (PF00072), GATA zinc finger (PF00320), or tify domain (PF06200). Alignment positions with gaps in ≥80% of the aligned sequences were removed. Phylogenetic trees were built with RAxML (64). We carried out rapid bootstrapping and the best-scoring ML tree search in the same run (-f a), with the number of bootstrap replicates determined by the extended majority-rule criterion (-# autoMRE). The resulting tree was visualized with Evolview v3 (65).
During the analysis, we noticed two CMFs (HvCMF4L1 and HvCMF4L2) with high amino acid sequence identity (~95%) to HvCMF4, which may indicate gene duplication. To test this, we aligned the genomic sequences of HvCMF4L1 and HvCMF4L2 against a ~5.5-kb genomic sequence surrounding HvCMF4 in the Sequence Alignment Viewer (https://ncbi.nlm.nih.gov/projects/msaviewer/) and found that both HvCMF4L1 and HvCMF4L2 arose from a partial duplication (~2 kb, with ~97% identity) of the HvCMF4 locus. We then used BLAST to search the ~2-kb genomic sequences of HvCMF4L1 and HvCMF4L2 against the remaining 19 barley reference genomes (16). HvCMF4L1 was present in all 19 reference genomes assayed, whereas HvCMF4L2 had a hit in the syntenic region in only 14 of them (including Golden Promise). Notably, no hits were found for either gene in the syntenic regions of the wheat and rye genomes, suggesting that these duplication events happened after the divergence of these Triticeae species and that the HvCMF4L2 duplication is younger than the HvCMF4L1 duplication.
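The gap-filtering step applied before tree building (removing alignment positions that are gaps in ≥80% of the sequences) can be illustrated with a minimal Python sketch. This is not the authors' script: the function name `drop_gappy_columns`, the dictionary input format, and the toy sequences are assumptions made here for illustration, and the sketch reads "gaps present in ≥80% of the aligned sequences" as a per-column criterion.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.8, gap_chars="-."):
    """Return a copy of the alignment without columns whose gap fraction
    is >= max_gap_fraction.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    The name and input format are hypothetical, chosen for this sketch.
    """
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(alignment)
    n_cols = lengths.pop()

    # Keep a column only if fewer than max_gap_fraction of sequences have a gap there.
    keep = []
    for col in range(n_cols):
        gaps = sum(1 for seq in alignment.values() if seq[col] in gap_chars)
        if gaps / n_seqs < max_gap_fraction:
            keep.append(col)

    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment (made-up sequences): columns 3 and 5 are gaps in 4 of 5 sequences.
toy = {
    "seq_a": "MK-A-",
    "seq_b": "MK-A-",
    "seq_c": "MK-AC",
    "seq_d": "M--A-",
    "seq_e": "MKLA-",
}
trimmed = drop_gappy_columns(toy)
```

In this toy case the two columns with 80% gaps are dropped and the column with a single gap is kept, so the remaining gap in `seq_d` is preserved.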
To construct an HvCMF4-specific phylogenetic tree, we queried its full-length protein sequence against the proteomes of 14 species (eight grasses and six eudicots) downloaded from Phytozome v12.1 (https://phytozome-next.jgi.doe.gov/). The E value cutoff was set to 1 × 10⁻¹⁰, which retrieved 195 genes. After filtering out genes that lacked a CCT domain or carried any of the additional domains mentioned above, 186 genes were retained for building the tree. We followed the same procedure described above to construct and visualize the tree. To verify the homology relationships with HvCMF4 inferred from the tree, we performed a reciprocal BLAST search of the genes from the HvCMF4 clade against the barley proteome.
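The candidate-filtering logic described here (keep hits passing the E value cutoff that contain a CCT domain and none of the additional domains) can be sketched as follows. The Pfam accessions come from the text; everything else, including the function name `select_cmf_candidates`, the input dictionaries, the gene identifiers, and the choice to treat the cutoff as inclusive, is an assumption made for illustration rather than the authors' actual pipeline.

```python
CCT_DOMAIN = "PF06203"
# Additional domains named in the text: B-box zinc finger, response regulator
# receiver, GATA zinc finger, tify.
EXTRA_DOMAINS = {"PF00643", "PF00072", "PF00320", "PF06200"}


def select_cmf_candidates(hits, domains, evalue_cutoff=1e-10):
    """Return sorted gene IDs that pass the E value cutoff, carry a CCT domain,
    and carry none of the additional domains.

    hits:    dict gene ID -> best BLAST E value (hypothetical input format)
    domains: dict gene ID -> set of Pfam accessions from a domain scan
    """
    kept = []
    for gene, evalue in hits.items():
        if evalue > evalue_cutoff:  # assumption: cutoff is inclusive
            continue
        gene_domains = domains.get(gene, set())
        has_cct = CCT_DOMAIN in gene_domains
        has_extra = bool(gene_domains & EXTRA_DOMAINS)
        if has_cct and not has_extra:
            kept.append(gene)
    return sorted(kept)


# Made-up example data; the gene IDs are placeholders, not real identifiers.
example_hits = {"geneA": 1e-50, "geneB": 1e-5, "geneC": 1e-30, "geneD": 1e-20}
example_domains = {
    "geneA": {"PF06203"},             # CCT only: retained
    "geneB": {"PF06203"},             # fails the E value cutoff
    "geneC": {"PF06203", "PF00643"},  # CCT plus B-box: removed
    "geneD": set(),                   # no CCT domain: removed
}
candidates = select_cmf_candidates(example_hits, example_domains)
```

Only `geneA` survives in this toy example, mirroring how the 195 retrieved genes were reduced to the 186 used for the tree.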