First, following prediction of the protein-coding gene set for H. contortus, each inferred amino acid sequence was assessed for conserved protein domains using InterProScan [114,115], employing default settings. Second, amino acid sequences were subjected to BLASTp (e-value ≤10⁻⁵) against the following protein databases: C. elegans in WormBase [116]; Swiss-Prot and TrEMBL within UniProtKB [117]; Kinase SARfari [118] and the protein kinase database for C. elegans [119], which contains all domain information for C. elegans kinases [120]; GPCR SARfari [118]; Transporter Classification Database [121,122]; KEGG [123,124]; LGICs [125]; ChEMBL [126]; NCBI protein nr [127]; and an in-house RNAi machinery database for nematodes. Finally, the BLASTp results were used to infer key protein groups, including peptidases, kinases, phosphatases, GTPases, GPCRs, channel and transporter proteins, TFs, major sperm proteins, vitellogenins, SCP/TAPS proteins, and RNAi machinery proteins.
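The e-value filtering step can be illustrated with a minimal Python sketch. It assumes BLASTp was run with tabular output (`-outfmt 6`, default 12 columns) and that only the single best hit per query, ranked by bit score, is kept; neither the output format nor the best-hit rule is stated above, and the names `parse_outfmt6` and `best_hits` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, NamedTuple

EVALUE_CUTOFF = 1e-5  # cut-off stated in the text (e-value <= 10^-5)


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield hits from BLAST tabular output (-outfmt 6, default 12 columns)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        # 0-based columns 10 and 11 hold the e-value and bit score
        yield Hit(cols[0], cols[1], float(cols[10]), float(cols[11]))


def best_hits(lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> Dict[str, Hit]:
    """Keep, per query, the highest-scoring hit whose e-value passes the cut-off.

    Assumption: ties in bit score are broken by the lower e-value.
    """
    best: Dict[str, Hit] = {}
    for hit in parse_outfmt6(lines):
        if hit.evalue > cutoff:
            continue  # fails the e-value threshold
        current = best.get(hit.query)
        if current is None or (hit.bitscore, -hit.evalue) > (current.bitscore, -current.evalue):
            best[hit.query] = hit
    return best
```

Queries whose hits all exceed the cut-off are simply absent from the returned dictionary, which makes "no match in this database" easy to detect downstream.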
Each coding gene was assessed for BLAST hits to known KEGG Orthology (KO) terms, and these hits were assigned to known protein families according to the KEGG-BRITE hierarchy using a custom script. ES proteins were predicted using SignalP (version 4.0) [128] and TMHMM (version 2.0c) [122,129,130], and by BLASTp homology searching of the validated Signal Peptide Database [131] and of an ES database containing published proteomic data for A. suum [14], B. malayi [15], C. elegans [116], and T. spiralis [16]. In the final annotation, proteins inferred from genes were classified based on a homology match (e-value cut-off, ≤10⁻⁵) to: (i) a curated, specialist protein database, followed by (ii) the KEGG database, followed by (iii) the Swiss-Prot database, followed by (iv) the annotated gene set for a model organism, including C. elegans, followed by (v) a recognized, conserved protein domain based on InterProScan analysis. Inferred proteins without a match (e-value cut-off, ≤10⁻⁵) in any of these analyses were designated hypothetical proteins. The final annotated protein-coding gene set for H. contortus is available for download at WormBase [116] in nucleotide and amino acid formats.
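The ordered classification scheme amounts to a priority cascade. A minimal sketch follows, assuming each gene's best hits have already been filtered at e-value ≤10⁻⁵ and collected per source; the source keys (`specialist_db`, `kegg`, `swissprot`, `model_organism`, `interproscan`) and the `annotate` function are hypothetical labels, not the authors' script.

```python
from typing import Dict, Optional, Tuple

# Priority order (i)-(v) from the text; the key names are hypothetical labels.
PRIORITY = ("specialist_db", "kegg", "swissprot", "model_organism", "interproscan")


def annotate(matches: Dict[str, Optional[str]]) -> Tuple[str, str]:
    """Return (source, description) from the highest-priority source with a match.

    `matches` maps a source key to the description of its best hit that already
    passed the e-value cut-off, or to None when that source had no such hit.
    """
    for source in PRIORITY:
        description = matches.get(source)
        if description:
            return source, description
    # No qualifying match in any source: designate as hypothetical.
    return "none", "hypothetical protein"
```

Because the loop returns at the first source with a qualifying match, a protein with both a KEGG and a Swiss-Prot hit is labelled from KEGG, and only proteins with no match in any source fall through to "hypothetical protein".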