The nucleotide and amino acid sequences of tobacco (Nicotiana tabacum) KED [6] were used as the starting point for an initial search using the BLASTN and BLASTP tools against the GenBank and OneKP databases, including non-redundant nucleotide and protein sequences, whole-genome shotgun, expressed sequence tags, high-throughput genomic sequences, UniProtKB, transcriptome shotgun assembly proteins and the Protein Data Bank. The initial search, using both the coding nucleotide sequences and the amino acid sequences, identified 32 eudicots and at least one monocot (Elaeis guineensis). Subsequent searches were performed with the E. guineensis amino acid sequence against the Liliopsida (monocotyledons) database to identify matching sequences from monocots. Likewise, using retrieved sequences to systematically search databases of the same orders and families of eudicotyledons yielded more species with possible matching sequences. A similar strategy was used to identify KED sequences from gymnosperms. Further searches were done sequentially by narrowing organism groups to find matches from more closely related species. However, it must be pointed out that the database search was aimed at broadly surveying the possible taxonomic presence of the KED gene, and the retrieved sequences are by no means exhaustive, owing to limits in genomic sequence availability and the annotation quality of the public databases.
To search for possible KED-rich sequences in charophytes, bacteria and animals, KED protein and nucleotide sequences from plants were first repeatedly searched by BLAST against each of the intended organism groups in the databases. Each match was then examined further by retrieving its sequence from the database. A translation tool was used to generate open reading frames, followed by amino acid composition analysis, specifically for K+E+D content, to score the putative KED candidates. Once a KED sequence was identified from one taxonomic group (for example, charophytes), this sequence was used to search all available entries from that group. In this way, sequences predicting KED-rich open reading frames in the genomes of several charophyte, bacterial and animal species were identified.
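The translation and composition-scoring step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the tool the authors used: it assumes the standard genetic code, takes the longest methionine-initiated open reading frame across all six frames, and the function names (`longest_orf`, `ked_percent`) and the toy DNA string are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation tables are needed for this illustration).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate DNA codon by codon; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def longest_orf(dna: str) -> str:
    """Longest M-initiated reading frame over six frames (hypothetical helper).

    A frame that runs to the end of the sequence without a stop codon is
    still accepted, so partial coding sequences are not discarded.
    """
    dna = dna.upper()
    best = ""
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


def ked_percent(protein: str) -> float:
    """Percentage of residues that are lysine (K), glutamate (E) or aspartate (D)."""
    if not protein:
        return 0.0
    return 100.0 * sum(protein.count(r) for r in "KED") / len(protein)


# Toy input for illustration only; it is not a real KED sequence.
toy_dna = "ATGAAAGAAGATAAAGAAGCTTAA"
toy_protein = longest_orf(toy_dna)
toy_score = ked_percent(toy_protein)
```

Scoring the translated frame rather than the raw nucleotide hit keeps the composition measure independent of which database the match came from.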
In the course of searching for animal KED candidates, a 6,229-amino acid microtubule-associated protein futsch from honeybee (Apis cerana) was found to contain an internal KED-rich region, whereas its N- and C-terminal portions have normal K, E and D contents. To illustrate the presence of KED sequences in animal species, this 750-amino acid internal KED-rich region was arbitrarily excised for demonstration in this study.
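The study delimited this internal region arbitrarily. As a purely illustrative alternative, a sliding-window scan could flag stretches of a long protein whose local K+E+D content is elevated. The sketch below is an assumption-laden example rather than the authors' procedure: the window size of 50 residues, the reuse of a 30% cutoff at the window level, and the function name `ked_rich_regions` are all hypothetical.

```python
from typing import List, Tuple


def ked_rich_regions(protein: str, window: int = 50,
                     threshold: float = 30.0) -> List[Tuple[int, int]]:
    """Return merged half-open (start, end) intervals, 0-based, covering every
    window whose K+E+D percentage exceeds `threshold`.

    `window` and `threshold` are illustrative defaults, not values from the study.
    """
    n = len(protein)
    if n < window:
        return []
    flags = [1 if residue in "KED" else 0 for residue in protein]
    count = sum(flags[:window])  # K+E+D count in the first window
    regions: List[Tuple[int, int]] = []
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one residue: add the entering one, drop the leaving one.
            count += flags[start + window - 1] - flags[start - 1]
        if 100.0 * count / window > threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the previous interval
            else:
                regions.append((start, end))
    return regions


# Toy protein: a K/E/D-only core flanked by alanine runs (not a real sequence).
toy = "A" * 100 + "KED" * 50 + "A" * 100
toy_regions = ked_rich_regions(toy)
```

Because each flagged window is reported in full, the returned intervals extend somewhat beyond the enriched core on both sides; trimming the boundaries would need an additional rule.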
All retrieved sequences of possible matches were manually reviewed and verified for proper open reading frames and translated sequences. Wherever applicable, both genomic sequences and mRNA sequences were matched to verify the correct coding sequences. Full-length translated sequences with considerable sequence identity and a high percentage of KED (K+E+D content greater than 30%) were designated as candidate matches.
Only partial KED sequences were available for two plants: cedar (Cryptomeria japonica, a gymnosperm; lacking the C-terminus) and barley (Hordeum vulgare, a monocot angiosperm; lacking the N-terminus). Both nevertheless possessed the conserved domain (see "Results" below) and were therefore included in the sequence comparison analysis. However, because their full KED protein lengths were unknown and would distort the analysis parameters, they were excluded from the dataset for the phylogenetic analysis described below.
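The designation and exclusion rules described above amount to a simple two-stage filter, which the following sketch makes explicit. It is an illustration under stated assumptions: the `Candidate` record, its `missing_terminus` field and the `split_datasets` helper are hypothetical, the toy sequences are placeholders rather than real data, and the manual checks for sequence identity and the conserved domain are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

KED_CUTOFF = 30.0  # percent K+E+D; the cutoff stated in the text


@dataclass
class Candidate:
    """Hypothetical record for one retrieved sequence."""
    label: str
    protein: str
    missing_terminus: Optional[str] = None  # "N", "C", or None if full length

    @property
    def ked_percent(self) -> float:
        if not self.protein:
            return 0.0
        hits = sum(self.protein.count(r) for r in "KED")
        return 100.0 * hits / len(self.protein)


def split_datasets(candidates: List[Candidate]
                   ) -> Tuple[List[Candidate], List[Candidate]]:
    """Return (comparison_set, phylogeny_set).

    Every sequence above the K+E+D cutoff enters the comparison set; only
    those with both termini present enter the phylogeny set.
    """
    comparison = [c for c in candidates if c.ked_percent > KED_CUTOFF]
    phylogeny = [c for c in comparison if c.missing_terminus is None]
    return comparison, phylogeny


# Placeholder sequences for illustration only.
toy_candidates = [
    Candidate("toy_full", "MKEDKEDAAA"),
    Candidate("toy_partial", "KEDKEDAA", missing_terminus="N"),
    Candidate("toy_low", "MAAAAKAAAA"),
]
comparison_set, phylogeny_set = split_datasets(toy_candidates)
```

Keeping the two sets separate lets partial sequences contribute to domain-level comparison without their unknown lengths affecting the phylogenetic dataset.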
Zhang X.H., Swait D., Jin X.L., Vichyavichien P., Nifakos N., Kaplan N., Raymond L., & Harlin J.M. (2023). Evolutionary analysis of KED-rich proteins in plants. PLOS ONE, 18(3), e0279772.