The Trinity software [37, 38] was used with default parameters to assemble separate unigene sets from the clean reads: one for the whole plant (clean reads from all 14 tissues combined), one for each individual tissue and one for the roots of each age. Assembling separate unigene sets for individual tissues and for roots of different ages allowed us to estimate how GO term profiles vary across tissues and developmental stages, which is important for a systematic understanding of ginseng transcriptome biology. Isoforms derived from contaminating rRNA genes were identified by BLAST searches of the unigene sets against the rRNA gene sequences of all plant species available in GenBank. Isoforms derived from lncRNAs were identified by ORF analysis, because lncRNAs contain no ORF or only short ones.
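The ORF-based lncRNA screen can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it finds the longest ATG-initiated ORF across all six reading frames and flags isoforms whose longest ORF falls below a cutoff. The cutoff `MIN_ORF_CODONS = 100` and all function names are hypothetical, since the methods state neither an ORF-length threshold nor the tool used.

```python
# Minimal sketch: flag candidate lncRNA isoforms by the length of their longest ORF.
# MIN_ORF_CODONS is a hypothetical threshold; the methods do not state one.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
MIN_ORF_CODONS = 100  # hypothetical cutoff, in codons (stop codon excluded)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_codons(seq: str) -> int:
    """Length, in codons, of the longest ORF that starts with ATG and ends at a
    stop codon, searched in all six reading frames. ORFs that run off the end of
    the sequence without a stop codon are ignored in this sketch."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # position of the first ATG of the currently open ORF
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None
    return best


def is_candidate_lncrna(seq: str, min_codons: int = MIN_ORF_CODONS) -> bool:
    """True if the isoform has no ORF, or only ORFs shorter than min_codons."""
    return longest_orf_codons(seq) < min_codons
```

A production screen would also need to handle partial ORFs at transcript ends and, typically, a coding-potential or protein-homology check, which this sketch leaves out.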
The unigenes were annotated with Blast2GO [39] (https://www.blast2go.com/), a software package that retrieves GO terms so that gene functions can be determined and compared. Annotation was performed against the NCBI Nr database with an E-value cutoff of ≤ 1e-05. GO terms were assigned to the query sequences, giving a broad overview of the groups of genes catalogued in the transcriptome under each of the three ontologies: biological process, molecular function and cellular component. Within each ontology, the terms were further grouped into level-2 functional subcategories; under biological process, for example, these include metabolic process, response to stimulus, reproduction and signaling.
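The E-value filter and the level-2 tally can be sketched in a few lines of Python. The sketch assumes BLASTX results in the standard BLAST+ tabular format (`-outfmt 6`, with the E-value in column 11) and takes the unigene-to-level-2 GO assignments as a given input, because the GO mapping itself happens inside Blast2GO. The function names and the input layout are hypothetical.

```python
from collections import Counter, defaultdict

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the methods (E-value <= 1e-05)


def best_hits(blast_tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue)} keeping, for each query, the hit with the
    lowest E-value that passes the cutoff.

    Assumes BLAST+ tabular output (-outfmt 6): qseqid in column 1, sseqid in
    column 2 and evalue in column 11 (index 10 after splitting on tabs).
    """
    best = {}
    for line in blast_tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > cutoff:
            continue  # hit does not pass the E-value cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def count_level2(annotations):
    """Count unigenes per level-2 GO category within each ontology.

    annotations: {unigene: iterable of (ontology, level2_term)}, a hypothetical
    input standing in for the GO assignments produced by Blast2GO.
    Returns {ontology: Counter({level2_term: n_unigenes})}.
    """
    counts = defaultdict(Counter)
    for terms in annotations.values():
        for ontology, term in set(terms):  # each unigene counted once per term
            counts[ontology][term] += 1
    return counts
```

Because a unigene can map to several level-2 categories, the per-category counts in such a tally can sum to more than the number of annotated unigenes.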
The unigenes assigned GO terms above were also mapped to KEGG pathways (KEGG Database, http://www.genome.jp/kegg/) with Blast2GO. Enzyme Commission (EC) numbers were assigned to unigenes with BLASTX hits at E-value ≤ 1e-05, and these unigenes were then mapped to KEGG metabolic pathways according to the distribution of their EC numbers in the pathway database.
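The EC-based pathway mapping amounts to a join between per-unigene EC assignments and the EC membership of each pathway. The Python sketch below assumes a hypothetical input of `(unigene, EC number, E-value)` tuples and a toy pathway-to-EC table; real pathway membership would have to come from KEGG, and the sketch does not reproduce Blast2GO's internal implementation.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the methods (E-value <= 1e-05)


def assign_ecs(ec_hits, cutoff=E_VALUE_CUTOFF):
    """ec_hits: iterable of (unigene, ec_number, evalue), a hypothetical layout.
    Returns {unigene: set of EC numbers} for hits passing the cutoff."""
    assigned = defaultdict(set)
    for unigene, ec, evalue in ec_hits:
        if evalue <= cutoff:
            assigned[unigene].add(ec)
    return dict(assigned)


def map_to_pathways(unigene_ecs, pathway_ecs):
    """unigene_ecs: {unigene: set of EC numbers}.
    pathway_ecs: {pathway_id: set of EC numbers in that pathway}.
    Returns {pathway_id: sorted unigenes carrying at least one of its ECs};
    pathways with no matching unigene are omitted."""
    # Invert the pathway table so each EC number points to its pathways.
    ec_to_pathways = defaultdict(set)
    for pathway, ecs in pathway_ecs.items():
        for ec in ecs:
            ec_to_pathways[ec].add(pathway)
    members = defaultdict(set)
    for unigene, ecs in unigene_ecs.items():
        for ec in ecs:
            for pathway in ec_to_pathways.get(ec, ()):
                members[pathway].add(unigene)
    return {pathway: sorted(genes) for pathway, genes in members.items()}
```

Inverting the pathway table first keeps the lookup per EC number constant-time, which matters when thousands of unigenes are mapped against hundreds of pathways.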