We constructed libraries with a 350-bp insert size for G. thurberi and G. davidsonii according to the manufacturer’s instructions (Illumina). The libraries were sequenced on a HiSeq 2500 system using a paired-end 150-bp (PE150) strategy according to the manufacturer’s instructions (Illumina). Adaptor sequences were removed from the raw Illumina reads, and contaminating reads (viral, mitochondrial, and bacterial sequences) were identified by comparison with the NCBI-NR database using BWA v0.7.13 [37] with default parameters. Duplicate read pairs were identified using FastUniq v1.12 [38]. In total, we produced 116.9 Gb and 118.2 Gb of clean Illumina reads for G. thurberi and G. davidsonii, respectively. For CENH3 analysis, the raw sequencing data were downloaded from the Gene Expression Omnibus for G. hirsutum (accession number GSE119184) [39] and from the European Molecular Biology Laboratory-European Bioinformatics Institute (accession number PRJEB14368) [40]. The data were aligned to the reference genome with Bowtie2, and the enrichment was calculated by dividing the CENH3 read counts by the input read counts according to previously used methods [28].
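The enrichment calculation described above can be illustrated with a minimal Python sketch. It assumes that read counts have already been tallied per genomic window from the Bowtie2 alignments. The window keys, the example counts, the function name `cenh3_enrichment`, and the handling of windows with zero input reads are all illustrative assumptions and are not taken from the original methods.

```python
from typing import Dict, Optional, Tuple

Window = Tuple[str, int]  # (chromosome, window start); hypothetical key layout


def cenh3_enrichment(
    chip_counts: Dict[Window, int],
    input_counts: Dict[Window, int],
) -> Dict[Window, Optional[float]]:
    """Divide CENH3 read counts by input read counts for each window.

    Windows with no input reads get None, because the ratio is undefined
    there (this handling is an assumption, not the authors' stated rule).
    """
    enrichment: Dict[Window, Optional[float]] = {}
    for window, input_n in input_counts.items():
        chip_n = chip_counts.get(window, 0)  # missing window means zero reads
        enrichment[window] = chip_n / input_n if input_n > 0 else None
    return enrichment


# Made-up counts for three 100-kb windows, for illustration only.
chip = {("Chr01", 0): 120, ("Chr01", 100000): 30, ("Chr01", 200000): 5}
inp = {("Chr01", 0): 40, ("Chr01", 100000): 30, ("Chr01", 200000): 0}

ratios = cenh3_enrichment(chip, inp)
for window, ratio in ratios.items():
    print(window, ratio)
```

In practice, the two libraries may differ in sequencing depth, so the counts are often scaled by library size before the ratio is taken; whether that was done here is not stated in the text.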