Because the repetitive elements of many species investigated in this study are poorly annotated or not publicly available, we re-annotated the repetitive elements of all sampled species except human using a single strategy. Repetitive elements of the human genome (GRCh38/hg38) are well annotated and were therefore downloaded directly from UCSC. Repetitive elements in the genomes of the remaining species were identified by homology searches against known repeat databases and by de novo prediction, as previously described (ref. 110). Briefly, we carried out homology searches for known repetitive elements in each genome assembly by screening the Repbase-derived RepeatMasker libraries with RepeatMasker (settings: -nolow -no_is -norna -engine ncbi) and the transposable element protein database with RepeatProteinMask (an application within the RepeatMasker package; settings: -noLowSimple -pvalue 0.0001 -engine ncbi). For de novo prediction, RepeatModeler was run on each genome assembly to build a species-specific de novo repeat library, and RepeatMasker was then used to align the genome sequences against this library to identify repetitive elements. We also searched each genome assembly for tandem repeats using Tandem Repeats Finder (ref. 100) with the parameters Match = 2, Mismatch = 7, Delta = 7, PM = 80, PI = 10, Minscore = 50, and MaxPeriod = 2000. To confirm the reliability of our annotations, we compared our repeat annotations for the fruit fly Drosophila melanogaster and the zebrafish Danio rerio with those downloaded from UCSC and observed good consistency (Figures S3A and S3B).
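The annotation steps described above can be sketched as a small Python helper that assembles the command lines for one genome. This is only an illustrative sketch, not the authors' pipeline: the flag values for RepeatMasker, RepeatProteinMask, and Tandem Repeats Finder are taken from the text, whereas the executable names (for example `trf`), the positional argument order for Tandem Repeats Finder, the `BuildDatabase` and `-lib` invocations for the de novo step, and all file and database names are assumptions made for illustration. The helper only builds and prints the commands; it does not run them.

```python
import shlex
from typing import List

def repeatmasker_cmd(genome: str) -> List[str]:
    # Homology search against the Repbase-derived libraries (flags from the text).
    return ["RepeatMasker", "-nolow", "-no_is", "-norna", "-engine", "ncbi", genome]

def repeatproteinmask_cmd(genome: str) -> List[str]:
    # Search against the transposable element protein database (flags from the text).
    return ["RepeatProteinMask", "-noLowSimple", "-pvalue", "0.0001",
            "-engine", "ncbi", genome]

def trf_cmd(genome: str) -> List[str]:
    # Parameter values from the text. Assumption: the executable is called "trf"
    # and takes these values positionally in the order listed here.
    params = [
        ("Match", 2), ("Mismatch", 7), ("Delta", 7), ("PM", 80),
        ("PI", 10), ("Minscore", 50), ("MaxPeriod", 2000),
    ]
    return ["trf", genome] + [str(value) for _, value in params]

def denovo_cmds(genome: str, db_name: str, library: str) -> List[List[str]]:
    # Assumption: typical RepeatModeler usage. The text gives no settings for
    # this step, and `library` is a hypothetical path to the resulting library.
    return [
        ["BuildDatabase", "-name", db_name, genome],
        ["RepeatModeler", "-database", db_name],
        ["RepeatMasker", "-lib", library, genome],
    ]

def all_cmds(genome: str, db_name: str, library: str) -> List[List[str]]:
    # Collect every command for one genome assembly, in the order described.
    cmds = [repeatmasker_cmd(genome), repeatproteinmask_cmd(genome)]
    cmds.extend(denovo_cmds(genome, db_name, library))
    cmds.append(trf_cmd(genome))
    return cmds

if __name__ == "__main__":
    # Hypothetical file names, used only to show the output format.
    for cmd in all_cmds("genome.fa", "genome_db", "denovo_library.fa"):
        print(shlex.join(cmd))
```

Keeping each command as a list of arguments rather than a single string makes it straightforward to pass to `subprocess.run` later without shell quoting problems, should the sketch be extended to execute the tools.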