The EST sequences used in the annotation process were derived from Sanger- and 454-sequenced P. ultimum DAOM BR144 ESTs [31], combined with ESTs from dbEST [114] for Aphanomyces cochlioides, Phytophthora brassicae, Phytophthora capsici, Phytophthora parasitica, Ph. sojae, Ph. infestans, and Pythium oligandrum. Protein evidence was derived from the UniProt/Swiss-Prot protein database [115,116] and from predicted proteins for Ph. infestans [28], Ph. ramorum [27], and Ph. sojae [27]. Repetitive elements were identified within the MAKER pipeline using the Repbase repeat library [117] and RepeatMasker [45], together with a MAKER internal transposable element database [118] and a P. ultimum-specific repeat library prepared for this work (built with PILER [119] using the settings suggested in the PILER documentation). Ab initio and hint-based gene predictions [110] were produced within the MAKER pipeline using FGENESH trained for Ph. infestans, GeneMark trained for P. ultimum via internal self-training, and SNAP trained for P. ultimum on a conserved gene set identified by CEGMA [110].
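Repeat masking is carried out internally by RepeatMasker and MAKER. The sketch below only illustrates what masking does to a sequence. It assumes that repeat coordinates are already known as 0-based, half-open (start, end) intervals, which is a hypothetical input format. Masked bases are either lowercased (soft-masking) or replaced with N (hard-masking). RepeatMasker can produce either form; its `-xsmall` option selects lowercase output. The functions `merge_intervals`, `mask`, and `masked_fraction` are illustrative names and are not part of either tool.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 0-based, half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def mask(seq, repeats, soft=True):
    """Mask repeat intervals: lowercase them (soft) or replace them with N (hard)."""
    chars = list(seq)
    for start, end in merge_intervals(repeats):
        # Clip each interval to the sequence bounds before masking.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


def masked_fraction(seq, repeats):
    """Fraction of bases covered by at least one repeat interval."""
    if not seq:
        return 0.0
    n = len(seq)
    covered = sum(max(0, min(e, n) - max(s, 0)) for s, e in merge_intervals(repeats))
    return covered / n


contig = "ACGTACGTAC"
repeats = [(4, 7), (2, 5)]  # hypothetical repeat hits; they overlap at bases 4-5
print(mask(contig, repeats))              # soft-masked
print(mask(contig, repeats, soft=False))  # hard-masked
print(masked_fraction(contig, repeats))
```

Gene predictors typically treat soft-masked bases as usable but lower-priority sequence, whereas hard-masked bases are hidden entirely. This is why the choice of masking style matters for downstream prediction.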
Following the initial MAKER run, a total of 14,967 genes encoding 14,999 transcripts were identified, each of which was supported by homology to a known protein or had at least one splice site confirmed by EST evidence. Additional ab initio gene predictions that did not overlap a MAKER annotation were scanned for protein domains using InterProScan [120-122]. This step identified 323 additional gene predictions, which were added to the annotation set to give a total of 15,290 genes encoding 15,322 transcripts (referred to as v3). Selected genes within the MAKER-produced annotation set were manually annotated using the annotation-editing tool Apollo [123]. The final annotation set (v4) contained 15,297 genes encoding 15,329 transcripts, including six rRNA transcripts.
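The two selection rules described above can be sketched as simple filters. The first rule keeps MAKER models supported by protein homology or by at least one EST-confirmed splice site. The second rule rescues ab initio predictions that overlap no retained model and carry at least one protein-domain hit. The `GeneModel` record, its field names, and the decision to ignore strand when testing overlap are all hypothetical simplifications. The sketch is not MAKER's or InterProScan's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class GeneModel:
    """Hypothetical minimal gene-model record (0-based, half-open coordinates)."""
    gene_id: str
    contig: str
    start: int
    end: int
    protein_homology: bool = False   # aligned to a known protein
    est_confirmed_splices: int = 0   # splice sites confirmed by EST alignments
    domains: list[str] = field(default_factory=list)  # protein-domain hits


def is_evidence_supported(model: GeneModel) -> bool:
    """Keep a model backed by protein homology or >= 1 EST-confirmed splice site."""
    return model.protein_homology or model.est_confirmed_splices >= 1


def overlaps(a: GeneModel, b: GeneModel) -> bool:
    """True if two models share any base on the same contig (strand ignored)."""
    return a.contig == b.contig and a.start < b.end and b.start < a.end


def rescue_ab_initio(annotated, ab_initio):
    """Return ab initio models that overlap no annotated model and have a domain hit."""
    return [
        pred for pred in ab_initio
        if pred.domains and not any(overlaps(pred, m) for m in annotated)
    ]


maker = [
    GeneModel("g1", "ctg1", 100, 900, protein_homology=True),
    GeneModel("g2", "ctg1", 2000, 2600, est_confirmed_splices=2),
    GeneModel("g3", "ctg2", 50, 400),  # no evidence: dropped
]
kept = [m for m in maker if is_evidence_supported(m)]
preds = [
    GeneModel("p1", "ctg1", 850, 1200, domains=["domA"]),   # overlaps g1
    GeneModel("p2", "ctg1", 3000, 3500, domains=["domB"]),  # rescued
    GeneModel("p3", "ctg2", 500, 800),                      # no domain hit
]
final = kept + rescue_ab_initio(kept, preds)
print([m.gene_id for m in final])  # ['g1', 'g2', 'p2']
```

When applied to the reported counts, the rescue step accounts for 15,290 − 14,967 = 323 added genes. The corresponding transcript count also rises by exactly 323 (14,999 to 15,322).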
Putative functions were assigned to each predicted P. ultimum protein by using BLASTP [124] to identify the best homologs in the UniProt/Swiss-Prot protein database and/or through manual curation. Additional functional annotations included molecular weight and isoelectric point (pI), calculated with the pepstats program from the EMBOSS package [125]; subcellular localization, predicted with TargetP using the non-plant network [126]; transmembrane helices, predicted with TMHMM [127]; and PFAM (v23.0) family assignments made with HMMER [128], retaining only hits above the trusted cutoff. Expert annotation of carbohydrate-related enzymes was performed using the Carbohydrate-Active Enzymes database (CAZy) annotation pipeline [68].
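Molecular weight and pI are simple sequence-derived quantities, and the sketch below illustrates how they are typically computed. It does not reproduce pepstats. Molecular weight is taken as the sum of average residue masses plus one water. The pI is the pH at which the Henderson-Hasselbalch net charge equals zero, found here by bisection. The pKa table is an illustrative assumption. EMBOSS reads its own data file, whose values may differ, so results will not match pepstats exactly.

```python
# Average residue masses in daltons (free amino acid mass minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

# Illustrative pKa values (assumption; not necessarily those used by EMBOSS).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average mass of a linear peptide: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of the peptide at the given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and one carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH of zero net charge (charge decreases as pH rises)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


protein = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
print(round(molecular_weight(protein), 2), round(isoelectric_point(protein), 2))
```

Bisection is used because net charge decreases monotonically with pH, so a single sign change is guaranteed between pH 0 and 14 for any sequence containing both termini.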