Expression levels of contigs in each sample were measured with Salmon (Patro et al., 2017), and contigs with no raw counts were removed. To annotate the remaining contigs containing autonomous TEs, BLASTp and BLASTx were used against Repbase with E-value cutoffs of 1E-5 and 1E-10, respectively. Alignments were required to cover more than 80% of the length of the query transcriptome contig. To annotate contigs containing non-autonomous TEs, RepeatMasker was used with our Ranodon-derived genomic repeat library of non-autonomous TEs (LARD-, TRIM-, MITE-, and SINE-annotated contigs), with the requirement that the overlap between transcriptome and genomic contigs was >80 bp long, >80% identical in sequence, and covered >80% of the length of the genomic contig. Contigs with conflicting autonomous and non-autonomous TE annotations were filtered out.
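The thresholds described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the record types `BlastHit` and `RepeatOverlap`, their field names, and the function `annotate_te_contigs` are hypothetical, and the parsing of BLAST and RepeatMasker output is omitted. The sketch also assumes that a "conflicting" contig is one that passes both the autonomous and the non-autonomous filters, and that a hit passes when its E-value is at or below the cutoff.

```python
from dataclasses import dataclass

# Cutoffs taken from the text; all names here are illustrative.
EVALUE_CUTOFF = {"blastp": 1e-5, "blastx": 1e-10}
MIN_QUERY_COVERAGE = 0.80    # alignment must cover >80% of the query contig
MIN_OVERLAP_BP = 80          # overlap must be >80 bp
MIN_PERCENT_IDENTITY = 80.0  # overlap must be >80% identical
MIN_GENOMIC_COVERAGE = 0.80  # overlap must cover >80% of the genomic contig


@dataclass
class BlastHit:
    contig: str
    program: str            # "blastp" or "blastx"
    evalue: float
    aligned_query_len: int  # query positions covered by the alignment
    query_len: int          # query length, in the same units


@dataclass
class RepeatOverlap:
    contig: str
    overlap_bp: int
    percent_identity: float
    genomic_contig_len: int


def passes_autonomous(hit: BlastHit) -> bool:
    """Apply the E-value and query-coverage thresholds to one BLAST hit."""
    coverage = hit.aligned_query_len / hit.query_len
    return hit.evalue <= EVALUE_CUTOFF[hit.program] and coverage > MIN_QUERY_COVERAGE


def passes_nonautonomous(ov: RepeatOverlap) -> bool:
    """Apply the length, identity, and genomic-coverage thresholds."""
    genomic_coverage = ov.overlap_bp / ov.genomic_contig_len
    return (
        ov.overlap_bp > MIN_OVERLAP_BP
        and ov.percent_identity > MIN_PERCENT_IDENTITY
        and genomic_coverage > MIN_GENOMIC_COVERAGE
    )


def annotate_te_contigs(blast_hits, overlaps):
    """Return (autonomous, non_autonomous) contig sets, dropping conflicts."""
    autonomous = {h.contig for h in blast_hits if passes_autonomous(h)}
    non_autonomous = {o.contig for o in overlaps if passes_nonautonomous(o)}
    # Assumption: a contig annotated in both classes counts as conflicting.
    conflicting = autonomous & non_autonomous
    return autonomous - conflicting, non_autonomous - conflicting
```

Keeping the thresholds as named constants makes it straightforward to check how sensitive the resulting contig sets are to each cutoff.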
To identify contigs that contained endogenous R. sibiricus genes, the Trinotate annotation suite (Bryant et al., 2017) was used with an E-value cutoff of 1E-5 for both BLASTx and BLASTp against the UniProt database, and 1E-5 for HMMER against the Pfam database (Wheeler and Eddy, 2013). To identify contigs that contained both a TE and an endogenous gene (i.e., putative cases where a TE and a gene were co-transcribed on a single transcript), all contigs annotated by both Repbase and Trinotate were examined; those annotated by Trinotate to contain a TE-encoded protein (i.e., contigs where the Repbase and Trinotate annotations were in agreement) were not considered further. The remaining contigs annotated by Trinotate to contain a non-TE gene (i.e., an endogenous Ranodon gene) and also annotated either by Repbase to include a TE-encoded protein or by blastn to include a non-autonomous TE were filtered out for the expression analysis.
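The selection logic for putative TE/gene contigs can be expressed as set operations, as in the following minimal Python sketch. The function name `find_te_gene_contigs` and its four input sets are hypothetical; the sketch assumes that each annotation source has already been reduced to a set of contig identifiers, and it only identifies the candidate contigs without prescribing how they are handled downstream.

```python
def find_te_gene_contigs(repbase_te, nonautonomous_te, trinotate_te, trinotate_gene):
    """Return contigs carrying both a TE annotation and a non-TE gene annotation.

    repbase_te       -- contigs with a TE-encoded protein hit in Repbase
    nonautonomous_te -- contigs annotated as containing a non-autonomous TE
    trinotate_te     -- contigs that Trinotate annotates as a TE-encoded protein
    trinotate_gene   -- contigs that Trinotate annotates as a non-TE gene
    """
    # Contigs where Repbase and Trinotate agree on a TE-encoded protein
    # are set aside and not considered further.
    agreeing = repbase_te & trinotate_te
    te_annotated = (repbase_te | nonautonomous_te) - agreeing
    # Remaining contigs with both a TE annotation and an endogenous gene.
    return trinotate_gene & te_annotated
```

For example, with `repbase_te = {"a", "b", "c"}`, `nonautonomous_te = {"d"}`, `trinotate_te = {"a"}`, and `trinotate_gene = {"b", "d", "e"}`, contig `a` is set aside as an agreed TE annotation, and the function returns the contigs `b` and `d`.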