CDSs were annotated by a combination of semi-automatic procedures. First, P. anserina open reading frames longer than 20 codons that are evolutionarily conserved in N. crassa were retrieved by TBLASTN analysis. Candidates with an e-value lower than 10⁻¹⁸ were retained as hypothetical exons. Exons separated by fewer than 200 nucleotides were merged into putative CDSs, and putative introns were predicted using the P. anserina consensus sequences defined in the pilot project [24]. Then, smaller 5' and 3' exons were searched for by the same procedure, except that open reading frames longer than five codons surrounding the putative CDSs were compared by BLAST with the homologous N. crassa region. Candidates with an e-value lower than 10⁻⁵ were retained and added to the putative CDSs. CDS and intron predictions were edited with Artemis [86] and manually corrected after comparison with available ESTs. Finally, ab initio prediction with GeneID [87], using the N. crassa and Chaetomium globosum parameter files, was performed on regions devoid of annotated features. Manual verification was then applied to improve the predictions. This resulted in the definition of 10,545 putative CDSs.
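The filtering and merging step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record, the `group_exons` function and the assumption that all hits lie on a single contig with 1-based inclusive coordinates are hypothetical, and restricting merges to hits on the same strand is an added assumption that the text does not state. Only the two thresholds (e-value below 10⁻¹⁸, gap below 200 nucleotides) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A conserved ORF hit on one contig (hypothetical record layout)."""
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    strand: str     # "+" or "-"
    evalue: float   # e-value of the TBLASTN hit


def group_exons(hits, max_evalue=1e-18, max_gap=200):
    """Keep hits with e-value < max_evalue, then group neighbouring hits.

    Two hits on the same strand fall into the same putative CDS when fewer
    than max_gap nucleotides separate them (overlapping hits also merge).
    """
    kept = sorted(
        (h for h in hits if h.evalue < max_evalue),
        key=lambda h: (h.strand, h.start, h.end),
    )
    groups = []
    for hit in kept:
        if groups:
            last = groups[-1]
            same_strand = last[0].strand == hit.strand
            # Number of nucleotides lying strictly between the two hits.
            gap = hit.start - max(h.end for h in last) - 1
            if same_strand and gap < max_gap:
                last.append(hit)
                continue
        groups.append([hit])
    return groups


if __name__ == "__main__":
    example = [
        Hit(100, 300, "+", 1e-30),
        Hit(450, 600, "+", 1e-20),    # 149 nt after the first hit: merged
        Hit(1000, 1200, "+", 1e-25),  # 399 nt after the second: new CDS
        Hit(1300, 1400, "+", 1e-10),  # fails the e-value cut-off
    ]
    for group in group_exons(example):
        print([(h.start, h.end) for h in group])
```

Each group is only a candidate CDS. Intron boundaries would still have to be placed with the consensus sequences and checked manually, as the text describes.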
A canonical rDNA unit was assembled. A junction sequence between the left arm of chromosome 3 and an rDNA unit was observed, confirming the position of the cluster on this chromosome, previously inferred from pulsed-field electrophoresis data [28]. At the other end of the cluster, a junction between an incomplete rDNA repeat and CCCTAA telomeric repeats [88] was detected, showing that the cluster is in a subtelomeric position. As in previously investigated filamentous fungi [89], 5S rRNAs were detected by comparison with the N. crassa 5S genes. They are encoded by a set of 87 genes, including 72 full-length copies, dispersed in the genome. tRNAs were identified with tRNAscan [90]. A total of 361 genes encode the cytosolic tRNA set, which is composed of 48 different acceptor families containing up to 22 members. This set is sufficient to decode the 61 sense codons with the classical wobble rule. Other non-coding RNAs were detected with a combination of the Erpin [91], BLAST [92] and Yass [93] programs. The homology search included all RNAs contained in the RFAM V.8 [94] and ncRNAdb [95] databases. Any hit from any of these programs with an e-value below 10⁻⁴ was retained, producing a list of 28 annotated non-coding RNA genes or elements, including 12 spliceosomal RNAs, 15 snoRNAs (mostly of the C/D box class) and one thiamine pyrophosphate riboswitch.
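The statement that the tRNA set decodes all 61 sense codons can be checked with a short Python sketch. The text says only "classical wobble rule", so the pairing table below (Crick's original wobble pairings, including inosine) is an assumption, as are the function names `decoded_codons` and `undecoded_codons`. Anticodons are written 5' to 3', and DNA-style input with T is accepted.

```python
from itertools import product

# Watson-Crick complements for anticodon positions 35 and 36.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Assumed wobble pairings: 5' anticodon base -> readable 3' codon bases.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}

STOP_CODONS = {"UAA", "UAG", "UGA"}
SENSE_CODONS = {"".join(c) for c in product("ACGU", repeat=3)} - STOP_CODONS


def decoded_codons(anticodon):
    """Return the codons read by one anticodon (written 5'->3')."""
    first, middle, last = anticodon.upper().replace("T", "U")
    # The anticodon pairs antiparallel to the codon: its 3' base reads
    # codon position 1, and its 5' (wobble) base reads codon position 3.
    return {
        COMPLEMENT[last] + COMPLEMENT[middle] + third
        for third in WOBBLE[first]
    }


def undecoded_codons(anticodons):
    """Return the sense codons that no anticodon in the set can read."""
    covered = set()
    for anticodon in anticodons:
        covered |= decoded_codons(anticodon)
    return SENSE_CODONS - covered


if __name__ == "__main__":
    print(sorted(decoded_codons("GAA")))      # the Phe anticodon
    print(len(undecoded_codons(["GAA"])))     # sense codons still unread
```

An empty result from `undecoded_codons`, applied to the anticodons of the 48 acceptor families, would correspond to the coverage reported in the text. The sketch ignores base modifications other than inosine.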