Raw reads from the sequencing machines contained adapter sequences and low-quality bases, which would affect the downstream assembly and analysis. Clean reads were obtained by trimming adapter sequences and removing low-quality reads (those in which more than 50% of bases had a Phred quality score < 20) using the FastQC tool. De novo transcriptome assembly was performed with the short-read assembler Trinity (Grabherr et al., 2011). First, a library of short sequences of length k (k-mers) was constructed from the high-quality reads. These short sequences were then extended through overlaps of length k-1 between them to obtain preliminary contig sequences. Next, Chrysalis clustered related contigs that correspond to portions of alternatively spliced transcripts or otherwise unique portions of paralogous genes, and built a de Bruijn graph for each cluster of related contigs. Finally, these de Bruijn graphs were processed to find paths supported by the reads and paired reads in the graphs, yielding the transcripts. To obtain comprehensive gene annotation information, the genes were compared against six databases: NR (NCBI non-redundant protein sequences), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups), Swiss-Prot, and Pfam. The number of genes annotated in each database was then counted.
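The low-quality read criterion stated above (more than 50% of bases with a Phred score below 20) can be illustrated with a short Python sketch. This is not the pipeline used in the study: the function names are hypothetical, reads are represented as simple (name, sequence, quality) tuples, and Phred+33 quality encoding is assumed.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores.

    Assumption: Phred+33 encoding, the usual encoding of modern FASTQ files.
    """
    return [ord(ch) - offset for ch in quality_line]


def is_low_quality(quality_line, min_phred=20, max_fraction=0.5):
    """Return True when more than max_fraction of bases score below min_phred."""
    scores = phred_scores(quality_line)
    if not scores:
        return True  # assumption: an empty read is treated as unusable
    low = sum(1 for q in scores if q < min_phred)
    return low / len(scores) > max_fraction


def filter_reads(records):
    """Keep the (name, sequence, quality) records that pass the quality rule."""
    return [rec for rec in records if not is_low_quality(rec[2])]


if __name__ == "__main__":
    # Toy reads: 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    reads = [
        ("read1", "ACGT", "IIII"),  # no low-quality bases
        ("read2", "ACGT", "###I"),  # 75% low-quality bases
        ("read3", "ACGT", "##II"),  # exactly 50% low-quality bases
    ]
    for name, _seq, _qual in filter_reads(reads):
        print(name)
```

Because the comparison is strict, a read with exactly 50% low-quality bases is retained, which matches the ">50%" wording of the criterion.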