'Positive' NB-LRR and 'negative' non-NB-LRR sequence training sets were used with the MEME Suite psp-gen script (version 4.4.0) [56] to encapsulate information about probable discriminative motifs in the positive set. Using the resulting psp file as additional input, MEME was then run on the positive training set to identify the 20 most significant motifs in the sequences (Table 1). A MAST search was conducted on a combined dataset of all (~56 k) predicted protein models (PGSC0003DMP.pep.v3.4) and the training sets (see Additional file 2, Figure S1). DMP sequences were considered candidate NB-LRRs if their reported MAST E-values were lower than the lowest E-value of any member of the negative training set. DMPs with E-values above this threshold were inspected manually to identify potential false negatives. Of these, sequences that contained at least two TIR/CC-derived motifs or three NB-ARC-specific motifs were selected for further analysis as described below.
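The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the MAST E-values are assumed to have already been parsed into a dictionary keyed by sequence ID, the function names and the motif identifiers are hypothetical, and "at least two/three motifs" is interpreted here as distinct motifs, which the text does not specify.

```python
def select_candidates(evalues, negative_ids, dmp_ids):
    """Return (threshold, candidates) for the E-value cutoff rule.

    evalues      : dict mapping sequence ID -> MAST E-value (assumed pre-parsed)
    negative_ids : set of IDs in the negative training set
    dmp_ids      : set of IDs of predicted DM protein models
    """
    # Threshold: the lowest E-value reported for any negative training sequence.
    threshold = min(evalues[s] for s in negative_ids if s in evalues)
    # A DMP is a candidate only if it scores strictly better than every negative.
    candidates = {s for s in dmp_ids if s in evalues and evalues[s] < threshold}
    return threshold, candidates


def passes_motif_rule(motif_hits, tir_cc_motifs, nbarc_motifs):
    """Rescue rule for DMPs above the threshold (manual inspection step).

    motif_hits is the set of motif IDs found in one sequence; the two motif
    groups are placeholders for the classes listed in Table 1.
    Distinct motifs are counted, which is an assumption of this sketch.
    """
    n_tir_cc = len(set(motif_hits) & set(tir_cc_motifs))
    n_nbarc = len(set(motif_hits) & set(nbarc_motifs))
    return n_tir_cc >= 2 or n_nbarc >= 3
```

Because the threshold is taken from the best-scoring negative sequence, no member of the negative set can itself pass the automatic filter; the motif rule only serves to recover sequences that the E-value cutoff misses.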
DM gene models (DMG) corresponding to the identified NB-LRR-like DMPs were extracted from 'PGSC_DM_v3.4_gene.fasta'. DMG sequences were extended by 3 kb at the 5' and 3' ends using the DM superscaffold sequences in 'PGSC0003DM.superscaffold.fa' to generate the DMG+ set of potato genes, which were translated in all six reading frames. The MAST search with the potentially discriminatory MEME models was repeated on these translations to identify potentially missing domains, and the DMG+ sequences were manually curated to produce the DMP+ set of protein sequences. DM homologues of members of the positive Solanaceous training set were identified by BLASTP [26] search.
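The extension and six-frame translation steps can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' pipeline: the function names are hypothetical, coordinates are assumed to be 0-based and half-open, and clamping the 3 kb flanks at the scaffold boundaries is an assumption the text does not state. The standard genetic code is used, and codons containing ambiguous bases are translated as 'X'.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): _AMINO[i]
               for i, c in enumerate(product(_BASES, repeat=3))}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def extend_region(start, end, scaffold_len, flank=3000):
    """Extend [start, end) by `flank` bases on both sides, clamped to the scaffold."""
    return max(0, start - flank), min(scaffold_len, end + flank)


def translate(seq):
    """Translate a nucleotide string in frame 1; trailing partial codons are dropped."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame(seq):
    """Return translations for frames +1, +2, +3 and -1, -2, -3."""
    seq = seq.upper()
    revcomp = seq.translate(_COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(revcomp[offset:])
    return frames
```

For example, `extend_region(5000, 8000, 9500)` yields `(2000, 9500)`: the 5' flank is extended by the full 3 kb, while the 3' flank is truncated at the end of the scaffold. The six translated frames of each extended region would then be written to FASTA and passed to MAST.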