To predict genes in the assembled genome, we used both homology-based and de novo methods. For homology-based prediction, A. thaliana, grapevine, castor, and potato proteins were mapped onto the assembled genome using Genewise [56] to define gene models. For de novo prediction, Augustus [57] and GlimmerHMM were employed using appropriate parameters. The results of these complementary analyses were merged into a non-redundant reference gene set using GLEAN [58]. In addition, RNA-Seq data from multiple tissues (young roots, leaves, flowers, developing seeds, and shoot tips) from our previous study [17] were incorporated to aid gene annotation. The RNA-Seq reads were mapped to the assembled genome using TopHat [59], and transcriptome-based gene structures were obtained with Cufflinks [60]. We then compared these transcriptome-based gene structures with the GLEAN-derived reference set to obtain the final non-redundant gene set of sesame (Tables S8 to S10 in Additional file 1). Non-coding gene prediction and gene function annotation were conducted as described in Supplementary Note 3 and Table S11 in Additional file 1.
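The final comparison step, in which transcriptome-based gene structures are reconciled with the reference set, can be illustrated with a minimal sketch. This is not the procedure used by GLEAN or Cufflinks, and the text does not state the redundancy criterion that was applied. The `GeneModel` type, the `merge_non_redundant` function, the 50% overlap threshold, and the example coordinates are all hypothetical and chosen only to illustrate the idea of keeping a candidate model when it does not substantially overlap a model already in the set.

```python
from typing import List, NamedTuple


class GeneModel(NamedTuple):
    """Hypothetical, simplified gene model (locus span only, no exon structure)."""
    scaffold: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    source: str  # which pipeline produced the model


def overlap_fraction(a: GeneModel, b: GeneModel) -> float:
    """Return the overlap length divided by the length of the shorter model.

    Models on different scaffolds or strands are treated as non-overlapping.
    """
    if a.scaffold != b.scaffold or a.strand != b.strand:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    shorter = min(a.end - a.start + 1, b.end - b.start + 1)
    return shared / shorter


def merge_non_redundant(primary: List[GeneModel],
                        secondary: List[GeneModel],
                        min_overlap: float = 0.5) -> List[GeneModel]:
    """Keep every primary model; add a secondary model only if it is novel.

    The 0.5 threshold is an assumed value, not one taken from the study.
    """
    merged = list(primary)
    for candidate in secondary:
        if all(overlap_fraction(candidate, kept) < min_overlap for kept in merged):
            merged.append(candidate)
    return sorted(merged, key=lambda g: (g.scaffold, g.start, g.end))


# Made-up coordinates for illustration only.
reference_set = [
    GeneModel("scaffold1", 100, 900, "+", "GLEAN"),
    GeneModel("scaffold1", 2000, 3500, "-", "GLEAN"),
]
transcript_set = [
    GeneModel("scaffold1", 150, 950, "+", "Cufflinks"),    # redundant with the first reference model
    GeneModel("scaffold1", 5000, 6200, "+", "Cufflinks"),  # novel locus
]
final_set = merge_non_redundant(reference_set, transcript_set)
for gene in final_set:
    print(gene.scaffold, gene.start, gene.end, gene.strand, gene.source)
```

A real comparison would normally work at the exon or coding-sequence level and weigh the supporting evidence for each model; the locus-span comparison here only conveys the general shape of a redundancy filter.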