The core functional implementation of PREPACT in PHP and MySQL has been described earlier [38, 39]. Basic functions have been revised to yield higher performance and to cope with growing query complexity. This included consistent translation between the different sequence and feature numbering schemes on a global and local scale, so that information in partial hits can be matched to globally numbered features. The internal GenBank engine has been extended to also handle remote locations (in other accessions) and partial CDS features with annotated editing sites, both locally and in the remote part. This was necessary to deal with complex genomes split across multiple accessions in combination with trans-splicing, as for example in the Amborella trichopoda mitochondrial DNA. The reference tabs of the BLASTX output (see Fig. 2) now offer an option to download the individual references in a GenBank-style flat file format, including the standardized annotation of RNA editing sites with the additional “RNA_editing” feature we had introduced previously [39].
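The handling of remote locations and of local versus global numbering can be illustrated with a minimal Python sketch. It is not the PREPACT code (which is written in PHP); it only assumes the standard GenBank location notation, in which a segment in another record is prefixed with that record's accession. The accession `XX000001.1`, the `Segment` type and both function names are hypothetical and chosen for illustration.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    accession: Optional[str]  # None means the local record
    start: int                # 1-based, inclusive
    end: int                  # 1-based, inclusive
    strand: int               # +1 forward, -1 complement
    partial_start: bool       # "<" marker present
    partial_end: bool         # ">" marker present


_SEGMENT = re.compile(
    r"(?:(?P<acc>[A-Za-z][A-Za-z0-9_]*(?:\.\d+)?):)?"
    r"(?P<ps><)?(?P<start>\d+)(?:\.\.(?P<pe>>)?(?P<end>\d+))?"
)


def _split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, ""
    for char in text:
        if char == "," and depth == 0:
            parts.append(current)
            current = ""
            continue
        depth += (char == "(") - (char == ")")
        current += char
    parts.append(current)
    return parts


def parse_location(loc: str, strand: int = 1) -> List[Segment]:
    """Return the segments of a location in transcript (5' to 3') order."""
    loc = loc.strip()
    if loc.startswith("complement(") and loc.endswith(")"):
        return parse_location(loc[len("complement("):-1], -strand)
    for operator in ("join(", "order("):
        if loc.startswith(operator) and loc.endswith(")"):
            segments: List[Segment] = []
            for part in _split_top_level(loc[len(operator):-1]):
                segments.extend(parse_location(part, strand))
            # On the complement strand the last listed segment comes first.
            return segments if strand == 1 else segments[::-1]
    match = _SEGMENT.fullmatch(loc)
    if match is None:
        raise ValueError(f"unsupported location: {loc!r}")
    start = int(match["start"])
    end = int(match["end"]) if match["end"] else start
    return [Segment(match["acc"], start, end, strand,
                    match["ps"] is not None, match["pe"] is not None)]


def local_to_global(segments: List[Segment], pos: int) -> Tuple[Optional[str], int]:
    """Map a 1-based position within the joined feature to (accession, coordinate)."""
    remaining = pos
    for seg in segments:
        length = seg.end - seg.start + 1
        if 1 <= remaining <= length:
            if seg.strand == 1:
                return seg.accession, seg.start + remaining - 1
            return seg.accession, seg.end - remaining + 1
        remaining -= length
    raise ValueError("position lies outside the feature")


# Hypothetical trans-spliced CDS: first exon in another accession.
segments = parse_location("join(XX000001.1:100..202,complement(<5..30),40..>60)")
```

In this sketch, position 1 of the feature resolves to coordinate 100 in the remote record, while position 104 falls on the complementary strand of the local record at coordinate 30.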
The user interface has been improved mainly on the sequence upload/handling side via integration of additional JavaScript features with the help of jQuery (https://jquery.com/) and jQueryUI (https://jqueryui.com/) libraries as well as additional jQuery extensions “File Upload” (https://blueimp.github.io/jQuery-File-Upload/) and “Add Clear” (https://github.com/skorecky/Add-Clear).
EdiFacts is an addition to the relational database with data collected manually from publications. New items are continually identified by routine literature searches, journal publication alerts and journal scanning services such as “PubCrawler” [79] using appropriate keywords. Literature references are downloaded, parsed and stored locally for search purposes and linked to the respective external NCBI PubMed and protein source entries. Editing sites affected by listed factors are referenced in the “RNA_editing” feature introduced in PREPACT2 [39] using a “db_xref” qualifier. This internal crosslink is used to highlight editing sites with known editing factors in the “commons” output. The EdiFacts input form is the graphical representation of the internal query builder, which translates the various combinations of selected filters and options into efficient MySQL queries combining all available data.
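The idea of a query builder that turns a form selection into a single parameterized SQL statement can be sketched as follows. This is an illustrative Python sketch only, not the PREPACT implementation: the table names (`factors`, `refs`), the column names and the filter keys are all hypothetical. The `%s` placeholders follow the parameter style used by common Python MySQL drivers.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical whitelist mapping form fields to (column, SQL operator).
# Only whitelisted names ever reach the SQL text; values are passed as parameters.
FILTERS = {
    "factor_name": ("f.name", "LIKE"),
    "organelle": ("f.organelle", "="),
    "species": ("f.species", "="),
    "min_year": ("r.year", ">="),
}

BASE_QUERY = (
    "SELECT f.name, f.species, r.pubmed_id "
    "FROM factors AS f JOIN refs AS r ON r.factor_id = f.id"
)


def build_query(selected: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Translate the selected filters into SQL text plus a parameter list."""
    clauses: List[str] = []
    params: List[Any] = []
    for key, value in selected.items():
        if value is None or value == "":
            continue  # filter left empty in the form
        if key not in FILTERS:
            raise KeyError(f"unknown filter: {key}")
        column, operator = FILTERS[key]
        clauses.append(f"{column} {operator} %s")
        params.append(f"%{value}%" if operator == "LIKE" else value)
    sql = BASE_QUERY
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY f.name", params


sql, params = build_query({"factor_name": "PPR", "organelle": "", "min_year": 2010})
```

The resulting `sql` and `params` would then be handed to the database driver's `execute` call, so that user input never becomes part of the SQL text itself.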
The TargetScan module compares the user-defined weight matrix in a sliding-window approach to the selected sequences or sequence parts extracted from the internal GenBank database. As such, TargetScan is a custom-made and easy-to-use alternative to more sophisticated motif identification algorithms such as FIMO [80] or PWMscan [81]. The score of each sub-sequence is calculated by multiplying, at every position, the base value (in percent) by the position weight and summing the products. Results are ranked by descending score down to a chosen number of results or, optionally, extended to all further results sharing the score of the last listed result, to avoid arbitrary cut-offs among equally well matching sub-sequences. In the output, individual base stretches are listed with their positions and features according to the selected mode, and single-base scores are colour coded from green (maximum score at this position, i.e. perfectly matching) to red (minimum score at this position), with mixed colours in between. Positions with no weight are excluded from colour coding to reduce clutter. Editing sites are highlighted in the sequence in blue (C-to-U) or red (U-to-C), respectively. To be in line with other sequence features, the selection of sub-sequences for searching in the different modes (“Genome”, “CDS”, “Around editing sites”) is implemented internally as an extension to the GenBank format defining “Search_range” and “Search_result” as GenBank features.
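The scoring and ranking scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the TargetScan code itself: the matrix is assumed to be a list of per-position dictionaries mapping a base to its percent value, with a separate list of position weights, and the function name `scan` and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[float, int, str]  # (score, 1-based start position, sub-sequence)


def scan(sequence: str,
         matrix: List[Dict[str, float]],
         weights: List[float],
         top_n: int = 10,
         include_ties: bool = True) -> List[Hit]:
    """Slide the matrix along the sequence and return the best-scoring windows."""
    width = len(matrix)
    hits: List[Hit] = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Base value (percent) times position weight, summed over the window.
        score = sum(column.get(base, 0) * weight
                    for base, column, weight in zip(window, matrix, weights))
        hits.append((score, start + 1, window))
    # Descending score; equal scores keep their order along the sequence.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    if top_n <= 0:
        return []
    if len(hits) <= top_n:
        return hits
    if include_ties:
        # Keep every hit that scores as well as the last one inside the limit.
        cutoff = hits[top_n - 1][0]
        return [hit for hit in hits if hit[0] >= cutoff]
    return hits[:top_n]


# Toy matrix of width 3; the third position carries no weight.
matrix = [{"A": 100}, {"C": 50, "U": 50}, {"G": 100}]
weights = [1, 2, 0]
with_ties = scan("ACGAUGACC", matrix, weights, top_n=2, include_ties=True)
without_ties = scan("ACGAUGACC", matrix, weights, top_n=2, include_ties=False)
```

In this toy example three windows share the top score, so the tie-inclusive mode reports all three although only two results were requested, whereas the strict mode cuts off after the second.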
For detection of previously overlooked RNA editing sites, individual chloroplast references were run against all other available reference editomes. Strongly predicted editing sites (i.e. with a ‘commons’ score of at least 80% or at least one edited reference species) that had not previously been reported as edited were rechecked in selected cases (Additional file 2). To that end, plant material was obtained from the Bonn University Botanic Garden and RNA was prepared by the CTAB method, the TRI Reagent Protocol (Sigma Aldrich) or with the NucleoSpin® Plant RNA II Kit (Macherey-Nagel). Subsequently, cDNA synthesis was performed with the RevertAid First Strand cDNA Synthesis Kit (Thermo Fisher) using random hexamer primers. The relevant regions were amplified by RT-PCR with gene-specific primers and the products were recovered from agarose gels with the NucleoSpin® Extract II Kit (Macherey-Nagel). PCR products were sequenced directly after gel elution or after cloning into pGEM-T Easy (Promega).
Lenz H., Hein A., & Knoop V. (2018). Plant organelle RNA editing and its specificity factors: enhancements of analyses and new database features in PREPACT 3.0. BMC Bioinformatics, 19, 255.