Curating Transposable Elements in Rice Genome

Manual curation of transposable elements (TEs) in rice began after the release of the map-based rice genome [22]. Repetitive sequences in the rice genome were compiled with RECON [44] using a copy number cutoff of 10. Details of the manual curation of long terminal repeat (LTR) sequences were described previously in the LTR_retriever paper [40]. In brief, to curate LTR retrotransposons, we first collected known LTR elements and used them to mask LTR candidates. Unmasked candidates were manually checked for terminal motifs, target site duplication (TSD) sequences, and conserved coding sequences. Terminal repeats were aligned together with extended sequences, and candidates were discarded if the alignments extended beyond their boundaries. To curate non-LTR retrotransposons, new candidates were required to have a poly-A tail and a TSD. We also collected 13 curated SINE elements from [53] to complement our library.
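The two requirements for non-LTR retrotransposon candidates (a poly-A tail and a TSD) can be illustrated with a minimal Python sketch. The curation itself was done manually; the function names `has_poly_a_tail` and `find_tsd` and all length thresholds below are hypothetical choices for illustration and are not taken from the protocol.

```python
def has_poly_a_tail(element, min_len=6):
    """Return True if the element ends in a run of at least min_len A bases.

    min_len is an assumed threshold; the protocol does not state one.
    """
    seq = element.upper()
    tail_len = len(seq) - len(seq.rstrip("A"))
    return tail_len >= min_len


def find_tsd(left_flank, right_flank, min_len=4, max_len=20):
    """Return the longest exact direct repeat flanking the element, or None.

    The repeat must end the left flank and start the right flank.
    Exact matching and the length bounds are simplifying assumptions.
    """
    left, right = left_flank.upper(), right_flank.upper()
    longest = min(max_len, len(left), len(right))
    for size in range(longest, min_len - 1, -1):
        if left[-size:] == right[:size]:
            return left[-size:]
    return None
```

A candidate would pass this simplified check only if `has_poly_a_tail` returns `True` for the element and `find_tsd` returns a repeat for its two flanks. Real TSDs can contain mismatches, which this exact-match sketch does not tolerate.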
To curate DNA TEs with terminal inverted repeats (TIRs), flanking sequences (100 bp, or longer if necessary) were extracted and aligned using DIALIGN2 [72] to determine element boundaries. A boundary was defined as the position up to which sequence homology is conserved in more than half of the aligned sequences. Sequences with defined boundaries were then manually examined for the presence of a TSD. TEs were classified into families using features of their terminal and TSD sequences, because each transposon family is associated with distinct features in its terminal sequences and TSDs [14]. For Helitrons, each representative sequence was required to have at least two copies with intact terminal sequences, distinct flanking sequences, and insertion into “AT” target sites.
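The boundary definition above can be sketched in Python as a column-wise majority test on a multiple alignment of copies plus their flanks. This is only an illustration of the idea, not the authors' procedure: it assumes the alignment is already available as equal-length gapped strings, and the names `conserved_mask` and `element_boundaries` as well as the `min_run` parameter (a guard against chance matches in the flanks) are hypothetical.

```python
from collections import Counter


def conserved_mask(alignment):
    """Flag each column where one base is shared by more than half of the rows."""
    n_rows = len(alignment)
    mask = []
    for column in zip(*alignment):
        counts = Counter(base.upper() for base in column if base != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        mask.append(top > n_rows / 2)
    return mask


def element_boundaries(alignment, min_run=3):
    """Return (start, end) as 0-based inclusive columns, or None.

    A boundary is placed at the outermost column that begins a run of
    min_run consecutive conserved columns (min_run is an assumption).
    """
    mask = conserved_mask(alignment)

    def first_run(indices):
        run = []
        for i in indices:
            run = run + [i] if mask[i] else []
            if len(run) == min_run:
                return run[0]
        return None

    start = first_run(range(len(mask)))
    end = first_run(range(len(mask) - 1, -1, -1))
    if start is None or end is None:
        return None
    return start, end
```

Scanning inward from each end means unrelated flanking sequence is skipped until homology appears in the majority of copies, which mirrors the "more than half of the aligned sequences" criterion.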
To build our non-redundant curated library, each new TE candidate was first masked with the current library. Unmasked candidates were further checked for structural integrity and conserved domains. For candidates that were partially masked but appeared to be true elements, the “80-80-80” rule (≥ 80% of the query aligned with ≥ 80% identity over an alignment ≥ 80 bp long) was applied to determine whether the element would be retained. For elements containing detectable known nested insertions, the nested portions were removed and the remaining regions were joined into a single sequence. Finally, protein-coding sequences were removed using the ProtExcluder package [73]. Version 6.9.5 of the curated library was used in this study and is available as part of the EDTA toolkit.
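As a rough illustration of two steps in this paragraph, the Python sketch below encodes the “80-80-80” rule as a predicate and removes nested insertions by coordinate. The function names are hypothetical, the aligned query length is used for both the coverage and the length criterion as a simplification, and intervals are assumed to be 0-based and half-open. How a positive result translates into keeping or discarding a candidate is left to the curator, as in the protocol.

```python
def meets_80_80_80(query_len, aligned_bp, identity_pct):
    """Return True if a hit satisfies all three criteria of the rule.

    query_len:    length of the candidate in bp
    aligned_bp:   number of query bases in the alignment (assumed input)
    identity_pct: percent identity of the alignment
    """
    if query_len <= 0:
        raise ValueError("query_len must be positive")
    coverage_ok = aligned_bp / query_len >= 0.80
    identity_ok = identity_pct >= 80.0
    length_ok = aligned_bp >= 80
    return coverage_ok and identity_ok and length_ok


def remove_nested(sequence, nested_intervals):
    """Cut out nested insertions and join the remaining regions.

    nested_intervals holds (start, end) pairs, 0-based and half-open;
    overlapping intervals are tolerated.
    """
    kept, cursor = [], 0
    for start, end in sorted(nested_intervals):
        if start > cursor:
            kept.append(sequence[cursor:start])
        cursor = max(cursor, end)
    kept.append(sequence[cursor:])
    return "".join(kept)
```

For example, a 90 bp candidate with 79 aligned bases at 99% identity passes the coverage and identity criteria but fails the 80 bp length criterion, so `meets_80_80_80(90, 79, 99.0)` returns `False`.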

Ou S., Su W., Liao Y., Chougule K., Agda J.R., Hellinga A.J., Lugo C.S., Elliott T.A., Ware D., Peterson T., Jiang N., Hirsch C.N., & Hufford M.B. (2019). Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biology, 20, 275.

Publication 2019

Corresponding Organization : University of Minnesota

Other organizations : Iowa State University, University of California, Irvine, Cold Spring Harbor Laboratory, University of Guelph

Protocol cited in 57 other protocols

Variable analysis

independent variables

Rice genome
Repetitive sequences in the rice genome
LTR sequences
Non-LTR retrotransposons
DNA TEs with TIRs

dependent variables

Manual curation of TEs in rice
Compilation of repetitive sequences in the rice genome
Curation of LTR sequences
Curation of non-LTR retrotransposons
Curation of DNA TEs with TIRs

control variables

Copy number cutoff of 10 for repetitive sequences
Terminal motifs, TSD sequences, and conserved coding sequences for LTR retrotransposons
Poly-A tail and TSD for non-LTR retrotransposons
Flanking sequences and alignment using DIALIGN2 for DNA TEs with TIRs
Terminal sequences and TSDs for classifying TE families
Structural integrity and conserved domains for creating a non-redundant curated library
Protein-coding sequences removed using ProtExcluder package

positive controls

Known LTR elements used to mask LTR candidates
13 curated SINE elements collected from a previous study

negative controls

None mentioned
