Automated LTR-RT Annotation Benchmark

The reference O. sativa genome [47–51 (link)] was selected for testing the software because of its high-quality assembly, small genome size (389 Mb) and the quality of its gene and TE annotations. The O. sativa genome was identical to the one used by [33 (link)], so that the results could be compared with the tools benchmarked in this study. This work used the standard library v6.9.5 created by [33 (link)], based on the O. sativa L. ssp. japonica cv. ‘Nipponbare’ v. MSU7 genome, and RepeatMasker v4.0.8 [52] with the following parameters: ‘-pa 36 -q -no_is -norna -nolow -div 40 -cutoff 225’.
Additionally, six different plant genomes (Table 1) were used to test the execution times of Inpactor2 across different genome sizes and TE compositions. The genomes were downloaded from NCBI and analyzed with Inpactor2 using the following parameters (-m 15000 -n 1000, -i no, -d no, -C 1, -c yes -a no), as suggested in [53 (link)]. Finally, EDTA was run on the same genomes to compare its execution times with those of Inpactor2. EDTA was executed using the EDTA_raw.py script with --type ltr and all other parameters left at their defaults.
Libraries of LTR-RTs for the species shown in Table 1 were then created using Inpactor2 (with and without filtering via the -c flag) and EDTA. In addition, two species that were not contained in the training data were used: Coffea humblotiana [54 (link)] and Gardenia jasminoides [55 (link)]. These libraries were then used with RepeatMasker to annotate the genomes, and the annotated LTR-RT proportion of each genome was compared with the proportion reported in the paper describing that genome. All experiments were performed on a workstation with an AMD Ryzen Threadripper 3970X 32-core processor, 128 GB of RAM and an NVIDIA RTX 2080 Super GPU.
To evaluate the performance of Inpactor2 compared with other software, a methodology similar to the one proposed in [33 (link)] was followed. First, Inpactor v.1.0 [34], TEsorter v.1.3 [45], Transposon Ultimate v.1.0 [28], LTR_retriever v.2.9 [56 (link)] and LTRharvest [57] were selected for benchmarking given their methodologies for classifying LTR-RTs to the superfamily level. A workflow was established for each tool, initially using LTR_FINDER v.1.0.7 as the LTR-RT detector. Then, the O. sativa genome was annotated with RepeatMasker and performance metrics were extracted for each workflow. The metrics evaluated were accuracy, precision, specificity, sensitivity, false discovery rate (FDR) and F1-score. Figure 1 shows the schematic representation of the benchmarking metrics. In this study, true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP) are the numbers of nucleotides belonging to each category (Figure 2).
The ‘lib-test.pl’ script, included in the EDTA toolkit [33 (link)], was used to extract the six metrics. Because this study focused only on the LTR-RT category, the script was executed with the -cat ltr parameter for the comparative evaluation.
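As an illustration of how the RepeatMasker parameters quoted above might be assembled into a command line, the following minimal Python sketch builds (but does not run) an argument list. The file names `genome.fa` and `library.fa` and the helper name `repeatmasker_command` are hypothetical, and passing the library through RepeatMasker's `-lib` option is an assumption about how the library was supplied, not something stated in the text.

```python
import shlex

# Parameter string exactly as quoted in the text.
REPEATMASKER_PARAMS = "-pa 36 -q -no_is -norna -nolow -div 40 -cutoff 225"


def repeatmasker_command(genome_fasta, library_fasta):
    """Build a RepeatMasker argument list without executing it.

    genome_fasta and library_fasta are hypothetical paths; using -lib to
    pass the custom library is an assumption of this sketch.
    """
    return [
        "RepeatMasker",
        *shlex.split(REPEATMASKER_PARAMS),
        "-lib", library_fasta,
        genome_fasta,
    ]


cmd = repeatmasker_command("genome.fa", "library.fa")
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string makes it straightforward to hand to `subprocess.run` later without shell quoting issues.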
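The six metrics named above can be sketched from nucleotide counts as follows. This is a minimal Python illustration using the standard confusion-matrix definitions; it assumes these match what the EDTA script computes, and the function name `ltr_metrics` and the example counts are hypothetical, not values from the study.

```python
def ltr_metrics(tp, fn, tn, fp):
    """Compute six benchmarking metrics from nucleotide counts.

    tp, fn, tn, fp: numbers of nucleotides classified as true positive,
    false negative, true negative and false positive.
    Standard definitions are assumed; a zero denominator yields NaN.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "fdr": ratio(fp, tp + fp),  # equals 1 - precision
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical counts for illustration only.
metrics = ltr_metrics(tp=80, fn=20, tn=890, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the counts are nucleotides rather than elements, a long, partially annotated LTR-RT contributes to both TP and FN in proportion to the bases covered.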

Orozco-Arias S., Humberto Lopez-Murillo L., Candamil-Cortés M.S., Arias M., Jaimes P.A., Rossi Paschoal A., Tabares-Soto R., Isaza G., & Guyot R. (2022). Inpactor2: a software based on deep learning to identify and classify LTR-retrotransposons in plant genomes. Briefings in Bioinformatics, 24(1), bbac511.

Publication 2022

Coffea Edta Gardenia Genes Genome Library Memory Nucleotides Plant genomes Sensitivity Transposon

Corresponding Organization : Université de Montpellier

Variable analysis

independent variables

Genome size
Transposable element (TE) composition
Software parameters for Inpactor2 (-m 15000 -n 1000, -i no, -d no, -C 1, -c yes -a no) and EDTA (--type ltr, other parameters by default)

dependent variables

Execution times of Inpactor2 and EDTA
Accuracy, precision, specificity, sensitivity, FDR and F1-score of LTR-RT annotation workflows for Inpactor v.1.0, TEsorter v.1.3, Transposon Ultimate v.1.0, LTR_retriever v.2.9 and LTRharvest

control variables

Workstation hardware: AMD Ryzen Threadripper 3970X 32-core processor, 128 GB of RAM and an NVIDIA RTX 2080 Super GPU
The O. sativa genome used in the study, which was identical to the one used by [33 (link)]
The standard library v6.9.5 created by [33 (link)]
RepeatMasker v4.0.8 [52] with the following parameters '-pa 36 -q -no_is -norna -nolow -div 40 -cutoff 225'
