Comprehensive Repeat Identification in Ranodon

The PiRATE pipeline was used as described in the original publication (Berthelier et al., 2018), including the following steps:
1) Contigs representing repetitive sequences were identified from the assembled contigs using similarity-based, structure-based, and repetitiveness-based approaches. The similarity-based detection programs included RepeatMasker v4.1.0 (http://repeatmasker.org/RepeatMasker/, with Repbase20.05_REPET.embl.tar.gz substituted as the library) and TE-HMMER (Eddy, 2011). The structure-based detection programs included LTRharvest (Ellinghaus et al., 2008), MGEScan non-LTR (Rho and Tang, 2009), HelSearch (Yang et al., 2009), MITE-Hunter (Han and Wessler, 2010), and SINE-finder (Wenke et al., 2011). The repetitiveness-based detection programs included TEdenovo (Flutre et al., 2011) and RepeatScout (Price et al., 2005).
2) Repeat consensus sequences (e.g., representing multiple subfamilies within a TE family) were also identified from the cleaned, filtered, and unassembled reads with dnaPipeTE (Goubert et al., 2015) and RepeatModeler (http://www.repeatmasker.org/RepeatModeler/).
3) Contigs identified by each individual program in steps 1 and 2 were filtered to remove those <100 bp in length and clustered with CD-HIT-est (Li and Godzik, 2006) at a 100% sequence identity cutoff to reduce redundancy. This yielded a total of 155,999 contigs.
4) All 155,999 contigs were then clustered together with CD-HIT-est (100% sequence identity cutoff), retaining the longest contig in each cluster and recording the program that classified it. This step filtered out 46,090 contigs.
5) The remaining 109,909 repeat contigs were annotated with PASTEC (Hoede et al., 2014) as TEs to the levels of order and superfamily in Wicker’s hierarchical classification system (Wicker et al., 2007), modified to include several recently discovered TE superfamilies, and were checked manually to filter chimeric contigs and those annotated with conflicting evidence (Supplementary File S2).
6) All classified repeats (“known TEs” hereafter), along with the unclassified repeats (“unknown repeats” hereafter) and putative multi-copy host genes, were combined to produce a Ranodon-derived repeat library.
7) For each superfamily, the contigs were collapsed to 95% and 80% sequence identity using CD-HIT-est to provide an overall view of within-superfamily diversity; 80% is the sequence identity threshold used to define TE families (Wicker et al., 2007).
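
The length filter and CD-HIT-est clustering described in steps 3, 4, and 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' scripts: the file names and the helpers `read_fasta`, `filter_min_length`, `word_size`, and `cd_hit_est_cmd` are hypothetical, and the `-n` word sizes follow the CD-HIT user guide's recommendations rather than anything stated in the protocol. Only the `-i`, `-o`, `-c`, and `-n` options are used; thread and memory settings are not given in the source and are left at their defaults.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fasta(path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_min_length(src, dst, min_len: int = 100) -> int:
    """Write contigs of at least min_len bp to dst; return how many were kept.

    Step 3 removes contigs <100 bp, so contigs of exactly 100 bp are kept.
    """
    kept = 0
    with open(dst, "w") as out:
        for header, seq in read_fasta(src):
            if len(seq) >= min_len:
                out.write(f">{header}\n{seq}\n")
                kept += 1
    return kept


def word_size(identity: float) -> int:
    """Pick a cd-hit-est word size (-n) for an identity threshold (-c).

    Assumption: values taken from the CD-HIT user guide, not from the protocol.
    """
    if identity >= 0.95:
        return 10
    if identity >= 0.90:
        return 8
    if identity >= 0.88:
        return 7
    if identity >= 0.85:
        return 6
    if identity >= 0.80:
        return 5
    return 4


def cd_hit_est_cmd(src, dst, identity: float) -> List[str]:
    """Build (but do not run) a cd-hit-est command line."""
    return [
        "cd-hit-est",
        "-i", str(src),
        "-o", str(dst),
        "-c", f"{identity:.2f}",
        "-n", str(word_size(identity)),
    ]


if __name__ == "__main__":
    # Hypothetical file names; 1.00 matches steps 3-4, 0.95 and 0.80 match step 7.
    for identity in (1.00, 0.95, 0.80):
        out_name = Path(f"repeats.c{int(identity * 100)}.fa")
        print(" ".join(cd_hit_est_cmd("repeats.min100.fa", out_name, identity)))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` on a system where `cd-hit-est` is installed. Building the command separately from running it keeps the sketch testable without the binary.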

Wang J., Yuan L., Tang J., Liu J., Sun C., Itgen M.W., Chen G., Sessions S.K., Zhang G., & Mueller R.L. (2023). Transposable element and host silencing activity in gigantic genomes. Frontiers in Cell and Developmental Biology, 11, 1124374.

Publication 2023

Corresponding Organization : Colorado State University

Other organizations : Xinjiang Normal University, Capital Normal University, Hartwick College
