As mentioned, the PREP suite of programs identifies potential RNA editing sites based on the evolutionary principle that editing increases protein conservation among species. This fundamental property of RNA editing in plants was noticed upon its discovery in 1989 (19–21) and has been observed repeatedly in nearly all subsequent studies. Full details of the PREP-Mt methodology have been published previously (16). Essentially, all three programs perform the same series of steps: (i) an input sequence is translated using the standard genetic code; (ii) the translated sequence is aligned to a set of homologous proteins; (iii) the alignment is examined column by column to determine whether an editing event could increase the similarity of the input sequence to the sequences in the pre-defined alignment. An edit site is predicted if a C-to-U change in a codon causes it to encode an amino acid that is found in more of the homologous proteins than the amino acid encoded by the unedited codon. If the user specifies a cutoff value (C), the score of the edited version of the codon must also be >C.
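The column-by-column test in step (iii) can be illustrated with a minimal Python sketch. It is not the PREP implementation, and it rests on several simplifying assumptions: the homologous proteins are supplied already aligned to the input with one column per input codon (so step (ii) is skipped), only single C-to-U changes within a codon are tried, and the score of an edited codon is taken to be the fraction of homologs carrying the edited amino acid. The function name `predict_edit_sites` and its parameters are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def predict_edit_sites(dna, homologs, cutoff=None):
    """Return (nucleotide_index, unedited_aa, edited_aa, score) tuples.

    Assumptions (not taken from the source): `homologs` are protein strings
    with exactly one residue per codon of `dna`, and the score is the
    fraction of homologs that carry the edited amino acid.
    """
    dna = dna.upper().replace("U", "T")
    sites = []
    for codon_index in range(len(dna) // 3):
        start = 3 * codon_index
        codon = dna[start:start + 3]
        column = [protein[codon_index] for protein in homologs]
        unedited_aa = CODON_TABLE[codon]
        unedited_count = column.count(unedited_aa)
        for offset, base in enumerate(codon):
            if base != "C":
                continue
            # A C-to-U edit in the RNA is modelled as C-to-T in the DNA.
            edited_codon = codon[:offset] + "T" + codon[offset + 1:]
            edited_aa = CODON_TABLE[edited_codon]
            edited_count = column.count(edited_aa)
            score = edited_count / len(column)
            more_conserved = edited_count > unedited_count
            passes_cutoff = cutoff is None or score > cutoff
            if more_conserved and passes_cutoff:
                sites.append((start + offset, unedited_aa, edited_aa, score))
    return sites


if __name__ == "__main__":
    query = "ATGTCAGGT"                      # translates to M S G
    aligned = ["MLG", "MLG", "MLG", "MSG"]   # toy homolog alignment
    print(predict_edit_sites(query, aligned))
    print(predict_edit_sites(query, aligned, cutoff=0.8))
```

In this toy example, editing the second position of the codon TCA gives TTA, changing serine to leucine, which matches three of the four homologs. Silent edits are never reported, because the edited and unedited amino acids then have identical counts in the column.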
The major difference among the servers is the set of homologous proteins used for comparison to the input sequence. For PREP-Aln, the protein homologs derive from the RNA-tagged sequences in the input file provided by the user. PREP-Aln extracts all of the DNA sequences from the input alignment and then builds the homologous protein alignment by translating the RNA sequences that remain. PREP-Aln then compares each of the extracted DNA sequences to the translated RNA alignment. For PREP-Mt and PREP-Cp, the set of homologous proteins is selected when the user specifies the gene name parameter. These alignments of known mitochondrial or chloroplast proteins have been pre-generated from data available in GenBank and literature sources. The mitochondrial alignments were described previously and consist predominantly of six species with widespread transcriptomic sequence data (Figure 2A) and three species (Marchantia polymorpha, Chara vulgaris, Chaetosphaeridium globosum) that lack RNA editing (16). To create the chloroplast alignments, chloroplast genomes from seed plants whose transcriptomes have been extensively examined for editing (Figure 2B) were downloaded from GenBank. The known positions of edit sites were used to reconstruct mature, edited RNA sequences, and these sequences were translated using the standard genetic code. Homologous proteins were aligned with ClustalW and manually adjusted when necessary to produce a collection of 35 alignments representing all chloroplast genes with evidence for editing in at least one of the seed plants in this study (Figure 2B).
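The way PREP-Aln partitions its input can be sketched in Python as follows. This is only an illustration under stated assumptions: the input alignment is taken to be in FASTA format, RNA sequences are assumed to be marked by a hypothetical `RNA_` prefix in the sequence name (the actual tagging convention is not described here), and gap-only codons are translated to a gap character. The helper names `parse_fasta`, `translate_aligned` and `split_alignment` are likewise hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def translate_aligned(seq):
    """Translate an aligned nucleotide sequence codon by codon.

    Gap-only codons become '-', and any other unrecognised codon becomes 'X'.
    """
    seq = seq.replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        protein.append(CODON_TABLE.get(codon, "-" if codon == "---" else "X"))
    return "".join(protein)


def split_alignment(records, rna_tag="RNA_"):
    """Separate DNA queries from RNA-tagged sequences.

    Returns (dna_queries, reference_proteins). The `rna_tag` prefix is an
    assumed naming convention, not the documented PREP-Aln format.
    """
    dna = {n: s for n, s in records.items() if not n.startswith(rna_tag)}
    rna = {n: s for n, s in records.items() if n.startswith(rna_tag)}
    reference = {n: translate_aligned(s) for n, s in rna.items()}
    return dna, reference


if __name__ == "__main__":
    example = ">RNA_a\nATGTTAGGT\n>RNA_b\nATGUUA---\n>query\nATGTCAGGT\n"
    queries, proteins = split_alignment(parse_fasta(example))
    print(queries)
    print(proteins)
```

Each DNA sequence returned in `queries` would then be compared, column by column, against the translated RNA sequences in `proteins`.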
Figure 2. Seed plants with extensive editing data for (A) mitochondrial genes and (B) chloroplast genes. For each species, the number of genes with editing information is listed, along with the number of edited (Pos) and unedited (Neg) cytidines found in those genes. The chronogram shows evolutionary relationships and approximate divergence times for the species. Divergence times are given in millions of years ago (MYA) and were taken from published analyses (22, 23). Species in black were used to generate the sets of homologous protein alignments and to optimize the cutoff value. Species in red were used for the unseen tests only.