Total RNA was extracted separately from testis (n = 4) and ovary (n = 4) tissues using TRIzol (Invitrogen). For each sample, RNA quality and concentration were assessed using agarose gel electrophoresis, a NanoPhotometer spectrophotometer (Implen, CA), a Qubit 2.0 Fluorometer (ThermoFisher Scientific), and an Agilent Bioanalyzer 2100 system (Agilent Technologies, CA), requiring an RNA integrity number (RIN) of 8.5 or higher; one ovary sample failed to meet these quality standards and was excluded from downstream analyses. Sequencing libraries were generated using the NEBNext Ultra RNA Library Prep Kit for Illumina following the manufacturer’s protocol. After cluster generation of the index-coded samples, the library was sequenced on one lane of an Illumina HiSeq 4000 platform (PE 150). Transcriptome sequences were filtered using Trimmomatic-0.39 with default parameters (Bolger et al., 2014). Between 30,848,170 and 39,695,323 reads were retained for each testis or ovary sample; in total, 290,925,984 reads remained, with a total length of 42,385,060,050 bp. The remaining reads from all testis and ovary samples were combined and assembled using Trinity 2.12.0 (Haas et al., 2013), yielding 573,144 contigs (i.e., putative assembled transcripts). Contigs were clustered using CD-HIT-EST (95% identity). Completeness of this final de novo transcriptome assembly was assessed using the BUSCO pipeline (Simao et al., 2015).
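As a quick arithmetic check on the read totals reported above, the mean length of a retained read can be derived by dividing the total retained bases by the total retained reads. The short Python sketch below does this; the variable and function names are illustrative only and are not part of the original pipeline. The result, about 145.7 bp, is close to the nominal 150 bp read length, as would be expected if filtering removed only a small fraction of bases per read.

```python
# Totals reported in the text for reads retained after quality filtering.
TOTAL_RETAINED_READS = 290_925_984
TOTAL_RETAINED_BP = 42_385_060_050
NOMINAL_READ_LENGTH_BP = 150  # PE 150 sequencing


def mean_read_length(total_bp: int, total_reads: int) -> float:
    """Return the mean read length in bp given total bases and total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return total_bp / total_reads


mean_len = mean_read_length(TOTAL_RETAINED_BP, TOTAL_RETAINED_READS)
# Fraction of the nominal read length that remains, on average, after filtering.
fraction_of_nominal = mean_len / NOMINAL_READ_LENGTH_BP

print(f"mean retained read length: {mean_len:.2f} bp")
print(f"fraction of nominal length: {fraction_of_nominal:.3f}")
```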
Expression levels of contigs in each sample were measured with Salmon (Patro et al., 2017), and contigs with no raw counts were removed. To annotate the remaining contigs containing autonomous TEs, BLASTp and BLASTx were used against Repbase with E-value cutoffs of 1E-5 and 1E-10, respectively. The aligned length was required to cover more than 80% of the query transcriptome contig. To annotate contigs containing non-autonomous TEs, RepeatMasker was used with our Ranodon-derived genomic repeat library of non-autonomous TEs (LARD-, TRIM-, MITE-, and SINE-annotated contigs), with the requirement that the overlap between the transcriptome and genomic contigs was >80 bp long, >80% identical in sequence, and covered >80% of the length of the genomic contig. Contigs with conflicting annotations as autonomous and non-autonomous TEs were filtered out.
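The filtering thresholds described above can be sketched as simple predicates. The Python below is a minimal illustration under stated assumptions, not the authors' code: the function and parameter names are hypothetical, "no raw counts" is interpreted as zero counts in every sample, the E-value cutoff is treated as inclusive, and coverage is computed as aligned length divided by contig length.

```python
from typing import Dict, List


def drop_unexpressed(counts: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep contigs with a nonzero raw count in at least one sample.

    Assumption: 'no raw counts' means zero counts across all samples.
    """
    return {contig: c for contig, c in counts.items() if any(x > 0 for x in c)}


def passes_repbase_filter(
    evalue: float,
    aligned_length: int,
    contig_length: int,
    max_evalue: float,
    min_coverage: float = 0.80,
) -> bool:
    """Apply the E-value cutoff (1E-5 for BLASTp, 1E-10 for BLASTx in the text)
    and require the alignment to cover more than 80% of the query contig.

    Assumption: the cutoff is inclusive and both lengths share the same units.
    """
    coverage = aligned_length / contig_length
    return evalue <= max_evalue and coverage > min_coverage


def passes_nonautonomous_filter(
    overlap_bp: int,
    percent_identity: float,
    genomic_contig_length: int,
) -> bool:
    """Apply the three criteria for non-autonomous TE matches: overlap >80 bp,
    >80% sequence identity, and >80% of the genomic contig length covered."""
    covered_fraction = overlap_bp / genomic_contig_length
    return overlap_bp > 80 and percent_identity > 80.0 and covered_fraction > 0.80


# Usage example with made-up values (not data from the study).
example_counts = {"contig_a": [0, 0, 0], "contig_b": [0, 12, 3]}
expressed = drop_unexpressed(example_counts)
blastx_ok = passes_repbase_filter(
    evalue=1e-12, aligned_length=900, contig_length=1000, max_evalue=1e-10
)
mite_ok = passes_nonautonomous_filter(
    overlap_bp=120, percent_identity=92.0, genomic_contig_length=140
)
```

Keeping each criterion in its own function makes it easy to see which threshold excluded a given contig when the filters are applied in sequence.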
To identify contigs that contained endogenous R. sibiricus genes, the Trinotate annotation suite (Bryant et al., 2017) was used with an E-value cutoff of 1E-5 for both BLASTx and BLASTp against the UniProt database, and 1E-5 for HMMER against the Pfam database (Wheeler and Eddy, 2013). To identify contigs that contained both a TE and an endogenous gene (i.e., putative cases where a TE and a gene were co-transcribed on a single transcript), all contigs that were annotated by both Repbase and Trinotate were examined, and those annotated by Trinotate to contain a TE-encoded protein (i.e., the contigs where Repbase and Trinotate annotations were in agreement) were not considered further. The remaining contigs annotated by Trinotate to contain a non-TE gene (i.e., an endogenous Ranodon gene) and also annotated either by Repbase to include a TE-encoded protein or by blastn to include a non-autonomous TE were filtered out for the expression analysis.
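The decision logic in this paragraph can be summarized as a small classifier. The following Python sketch is one possible reading of the text, with hypothetical names and labels throughout; in particular, the handling of a contig that Trinotate annotates as a TE-encoded protein without any TE annotation from the other tools is an assumption, since the text does not address that case.

```python
from enum import Enum
from typing import Optional


class ContigClass(Enum):
    UNANNOTATED = "unannotated"
    TE = "te"
    HOST_GENE = "host_gene"
    TE_GENE_CANDIDATE = "te_gene_candidate"  # putative TE/gene co-transcript


def classify_contig(
    repbase_te: bool,
    nonautonomous_te: bool,
    trinotate_hit: Optional[str],
) -> ContigClass:
    """Classify a contig from its TE and gene annotations.

    trinotate_hit is None (no annotation), "te_protein" (Trinotate reports a
    TE-encoded protein), or "host_gene" (Trinotate reports a non-TE gene).
    These string labels are illustrative, not Trinotate output fields.
    """
    has_te = repbase_te or nonautonomous_te
    if trinotate_hit == "te_protein":
        # Trinotate agrees with the TE annotation, so this is not a
        # TE/gene co-transcript. Assumption: also treated as TE when
        # no other tool annotated a TE.
        return ContigClass.TE
    if trinotate_hit == "host_gene":
        # A non-TE gene plus a TE annotation marks a co-transcript candidate.
        return ContigClass.TE_GENE_CANDIDATE if has_te else ContigClass.HOST_GENE
    return ContigClass.TE if has_te else ContigClass.UNANNOTATED


# Usage example with made-up annotation combinations.
agreeing = classify_contig(True, False, "te_protein")
candidate = classify_contig(False, True, "host_gene")
gene_only = classify_contig(False, False, "host_gene")
```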
Wang J., Yuan L., Tang J., Liu J., Sun C., Itgen M.W., Chen G., Sessions S.K., Zhang G., & Mueller R.L. (2023). Transposable element and host silencing activity in gigantic genomes. Frontiers in Cell and Developmental Biology, 11, 1124374.