After gene prediction, gene functions were assigned according to the best alignment match against several protein databases, including KEGG [33], Swiss-Prot, and TrEMBL [34], using BLAST v2.2.31 (E-value cutoff of 1e-5). GO terms for each gene were obtained from the corresponding InterPro entries [35]. Overall, 44,539 genes (96.86%) were annotated based on searches against these protein databases (Supplementary Table
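The best-match assignment described above can be illustrated with a minimal Python sketch. It assumes the BLAST results were written in the standard 12-column tabular format (`-outfmt 6`) and that "best match" means the hit with the highest bit score among those passing the E-value cutoff; neither detail is stated in the text, and the function name `best_hits` and the identifiers in the usage example are hypothetical.

```python
def best_hits(lines, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} for the top hit per query.

    Assumes BLAST tabular output (-outfmt 6): column 11 is the E-value and
    column 12 is the bit score. Ranking by bit score is an assumption.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit does not pass the E-value cutoff
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Usage with made-up records (gene and protein identifiers are illustrative only)
example = [
    "geneA\tprot1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "geneA\tprot2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t300",
    "geneB\tprot3\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30",
]
hits = best_hits(example)
```

In this example, `geneA` is assigned to `prot2`, and `geneB` receives no annotation because its only hit exceeds the E-value cutoff.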
Intact LTR-RTs were identified using LTR_finder [36] and classified by predicting their RT domains using the Pfam database (version 26.0) and HMMER software [37]. Muscle [38] was then employed to perform multiple alignments of the RT sequences, and RAxML [39] was used to construct maximum likelihood (ML) trees from the alignments with 500 bootstrap replicates. Finally, the Interactive Tree of Life (iTOL) [40] was used to plot the ML trees. The analysis of LTR insertion time was performed as previously reported [4].
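The text defers the insertion-time procedure to an earlier report, so the exact method is not given here. As a hedged illustration only, a widely used approach estimates the insertion time as T = K / (2r), where K is the Kimura two-parameter distance between the aligned 5' and 3' LTRs of one element and r is a substitution rate per site per year. The sketch below assumes that approach; the function names and the rate used in the usage example are hypothetical and are not taken from the source.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transition differences
    q = transversions / compared  # proportion of transversion differences
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_time(ltr5, ltr3, rate):
    """Estimate insertion time in years as K / (2 * rate).

    `rate` (substitutions per site per year) must be supplied by the user;
    no value is given in the text.
    """
    return k2p_distance(ltr5, ltr3) / (2 * rate)


# Usage: one transition in ten sites, with a placeholder rate (assumption)
t = insertion_time("ACGTACGTAC", "GCGTACGTAC", rate=1e-8)
```

Identical LTRs give a distance of zero and therefore an insertion time of zero; older insertions accumulate more differences between the two LTRs.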
We also performed noncoding RNA annotation for our assembly. tRNAs were annotated using tRNAscan-SE (v1.3.1) [41] according to their structural characteristics. rRNAs were located by homology, by mapping known full-length plant rRNAs to the B. rapa genome v3.0. snRNAs were predicted by Infernal (v1.1) [42] using the Rfam database [43]. miRNA annotation was performed as previously described [44].