To obtain high-quality clean reads, the raw reads were filtered to remove reads containing adaptor sequences, low-quality reads (Phred quality score < 20), and reads with a high percentage of unidentified nucleotides, using a Perl script with the G-language Genome Analysis Environment [36]. De novo assembly of the clean reads was carried out using Trinity (version: trinity/r2014-04-13p1) with default parameters and no reference sequence. The sequences resulting from the de novo Trinity assembly are referred to as unigenes. To annotate the unigenes, a BLASTX search against the UniProt database was conducted with an E-value cut-off of 1e−5. The following genomic databases were used to determine the taxonomic distribution of the annotated components: plants (Chlamydomonas reinhardtii, http://www.ncbi.nlm.nih.gov/pubmed/17932292; Arabidopsis thaliana, https://www.arabidopsis.org/); animals (Drosophila melanogaster, http://flybase.org/; Caenorhabditis elegans, https://www.wormbase.org/); fungi (Saccharomyces cerevisiae, http://www.ensembl.org/index.html); and kinetoplastids (Trypanosoma cruzi, Trypanosoma brucei, Leishmania major, http://www.ncbi.nlm.nih.gov/). A Venn diagram was drawn using the Venny program (http://bioinfogp.cnb.csic.es/tools/venny/). The Blast2GO program [37] and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [38] (http://www.genome.jp/kegg/) were used to identify Gene Ontology (GO) annotations and biological pathways in E. gracilis, respectively. Results of pathway enrichment were visualized using Pathway Projector [39].
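The read-filtering step described above can be illustrated with a minimal Python sketch. This is not the Perl script used in the study. The adaptor sequence, the 10% cut-off for unidentified nucleotides, the use of the mean per-read Phred score, and the Phred+33 quality encoding are all assumptions introduced for illustration, since the text specifies only the Phred threshold of 20.

```python
from typing import Iterable, Iterator, Tuple

# A read is (header, sequence, quality string), as parsed from a FASTQ record.
Read = Tuple[str, str, str]

ADAPTER = "AGATCGGAAGAGC"  # assumption: example adaptor, not given in the text
MIN_MEAN_PHRED = 20        # threshold stated in the text
MAX_N_FRACTION = 0.10      # assumption: the text says only "a high percentage"
PHRED_OFFSET = 33          # assumption: Phred+33 quality encoding


def mean_phred(quality: str, offset: int = PHRED_OFFSET) -> float:
    """Return the mean Phred score of a non-empty quality string."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def is_clean(seq: str, qual: str) -> bool:
    """Apply the three filters: adaptor, unidentified bases, low quality."""
    if not seq or len(seq) != len(qual):
        return False  # malformed record
    if ADAPTER in seq:
        return False  # read contains adaptor sequence
    if seq.upper().count("N") / len(seq) > MAX_N_FRACTION:
        return False  # too many unidentified nucleotides
    return mean_phred(qual) >= MIN_MEAN_PHRED


def filter_reads(reads: Iterable[Read]) -> Iterator[Read]:
    """Yield only the reads that pass all filters."""
    for header, seq, qual in reads:
        if is_clean(seq, qual):
            yield header, seq, qual
```

Judging quality by the mean score per read is only one possible interpretation of "low-quality reads"; a filter based on the fraction of bases below the threshold would be an equally plausible reading of the text.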