The P. ultimum genome annotations were created using the MAKER program [110]. The program was configured to use both spliced EST alignments and single-exon ESTs greater than 250 bp in length as evidence for producing hint-based gene predictions. MAKER was also set to filter out short and partial gene predictions encoding proteins of fewer than 28 amino acids. The MAKER pipeline was set to produce ab initio gene predictions from both the repeat-masked and unmasked genomic sequence using SNAP [111], FGENESH [112], and GeneMark [113]. Hint-based gene predictions were derived from SNAP and FGENESH.
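The two thresholds described above can be illustrated with a minimal Python sketch. This is not MAKER's own code: the `EstAlignment` class and both function names are hypothetical, and only the 250 bp and 28 amino acid cutoffs come from the text.

```python
from dataclasses import dataclass

# Thresholds stated in the text; everything else here is an illustrative assumption.
MIN_SINGLE_EXON_EST_BP = 250   # single-exon ESTs must be longer than this
MIN_PROTEIN_AA = 28            # predictions encoding fewer residues are dropped


@dataclass
class EstAlignment:
    """Hypothetical record for one EST aligned to the genome."""
    name: str
    exon_count: int
    length_bp: int


def est_is_usable(est: EstAlignment) -> bool:
    """Spliced ESTs are always used; single-exon ESTs only if > 250 bp."""
    if est.exon_count > 1:
        return True
    return est.length_bp > MIN_SINGLE_EXON_EST_BP


def keep_gene_model(protein_seq: str) -> bool:
    """Keep a gene model only if its protein has at least 28 amino acids."""
    residues = protein_seq.rstrip("*")  # ignore a trailing stop-codon symbol
    return len(residues) >= MIN_PROTEIN_AA
```

For example, `est_is_usable(EstAlignment("est1", 1, 250))` is false because the EST is unspliced and not longer than 250 bp, whereas any EST with two or more exons passes regardless of length.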
The EST sequences used in the annotation process were derived from Sanger- and 454-sequenced P. ultimum DAOM BR144 ESTs [31] considered together with ESTs from dbEST [114] for Aphanomyces cochlioides, Phytophthora brassicae, Phytophthora capsici, Phytophthora parasitica, Ph. sojae, Ph. infestans, and Pythium oligandrum. Protein evidence was derived from the UniProt/Swiss-Prot protein database [115,116] and from predicted proteins for Ph. infestans [28], Ph. ramorum [27], and Ph. sojae [27]. Repetitive elements were identified within the MAKER pipeline using the Repbase repeat library [117] and RepeatMasker [45] in conjunction with a MAKER internal transposable element database [118] and a P. ultimum-specific repeat library prepared for this work (created using PILER [119] with settings suggested in the PILER documentation). Ab initio gene predictions and hint-based gene predictions [110] were produced within the MAKER pipeline using FGENESH trained for Ph. infestans, GeneMark trained for P. ultimum via internal self-training, and SNAP trained for P. ultimum from a conserved gene set identified by CEGMA [110].
Following the initial MAKER run, a total of 14,967 genes encoding 14,999 transcripts were identified, each of which was supported by homology to a known protein or had at least one splice site confirmed by EST evidence. Additional ab initio gene predictions not overlapping a MAKER annotation were scanned for protein domains using InterProScan [120-122]. This process identified an additional 323 gene predictions; these were added to the annotation set, producing a total of 15,290 genes encoding 15,322 transcripts (referred to as v3). Selected genes within the MAKER-produced gene annotation set were manually annotated using the annotation-editing tool Apollo [123]. The final annotation set (v4) contained 15,297 genes encoding 15,329 transcripts, including six rRNA transcripts.
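The rescue step and the resulting v3 totals can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline actually used: the `Interval` class, the `rescue_predictions` function, and the `has_domain` callback (standing in for an InterProScan result lookup) are hypothetical, and overlap is tested on coordinates only, ignoring strand.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic span with 1-based inclusive coordinates."""
    contig: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both spans are on the same contig and share at least one base."""
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def rescue_predictions(
    annotations: List[Interval],
    ab_initio: List[Interval],
    has_domain: Callable[[Interval], bool],
) -> List[Interval]:
    """Keep ab initio predictions that overlap no annotation and carry a domain hit."""
    rescued = []
    for pred in ab_initio:
        if any(overlaps(pred, ann) for ann in annotations):
            continue  # already covered by an existing annotation
        if has_domain(pred):  # assumed stand-in for a protein-domain search result
            rescued.append(pred)
    return rescued


# Counts reported in the text: each rescued prediction adds one gene and one transcript.
INITIAL_GENES, INITIAL_TRANSCRIPTS, RESCUED = 14_967, 14_999, 323
V3_GENES = INITIAL_GENES + RESCUED              # 15,290
V3_TRANSCRIPTS = INITIAL_TRANSCRIPTS + RESCUED  # 15,322
```

The arithmetic at the end reproduces the v3 totals given in the text; the further change to v4 (15,297 genes, 15,329 transcripts) reflects manual annotation and is not modeled here.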
Putative functions were assigned to each predicted P. ultimum protein using BLASTP [124] to identify the best homologs from the UniProt/Swiss-Prot protein database and/or through manual curation. Additional functional annotations include molecular weight and isoelectric point (pI) calculated using the pepstats program from the EMBOSS package [125], subcellular localization predicted with TargetP using the non-plant network [126], prediction of transmembrane helices via TMHMM [127], and PFAM (v23.0) families using HMMER [128], in which only hits above the trusted cutoff were retained. Expert annotation of carbohydrate-related enzymes was performed using the Carbohydrate-Active Enzyme database (CAZy) annotation pipeline [68].
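The trusted-cutoff filter applied to the PFAM hits can be illustrated with a short Python sketch. The hit records, family identifiers, and cutoff values below are hypothetical placeholders rather than real PFAM data, and treating a score equal to the cutoff as passing is an assumption.

```python
from typing import Dict, List


def filter_by_trusted_cutoff(
    hits: List[dict], trusted_cutoffs: Dict[str, float]
) -> List[dict]:
    """Retain hits whose bit score reaches the family's trusted cutoff.

    Each hit is assumed to be a dict with 'protein', 'family', and 'score' keys.
    Hits to families without a known cutoff are discarded.
    """
    kept = []
    for hit in hits:
        cutoff = trusted_cutoffs.get(hit["family"])
        if cutoff is not None and hit["score"] >= cutoff:
            kept.append(hit)
    return kept


# Placeholder data for illustration only.
example_hits = [
    {"protein": "p1", "family": "FAM_A", "score": 30.0},
    {"protein": "p2", "family": "FAM_A", "score": 19.9},
    {"protein": "p3", "family": "FAM_B", "score": 50.0},
]
example_cutoffs = {"FAM_A": 20.0}
retained = filter_by_trusted_cutoff(example_hits, example_cutoffs)
```

Here only the hit for `p1` is retained: `p2` scores below the cutoff for its family, and `p3` matches a family with no cutoff on record.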