The CCTOP method has three main steps: removing cleavable parts of the target sequence, TMP filtering, and topology prediction. Transmembrane topology prediction algorithms often mistake signal peptides for TMHs, so these segments are analyzed first. CCTOP uses SignalP 4.0 (34) to cleave signal peptides; however, this step is skipped if a homologous protein in the TOPDB database (35,36) provides contradictory evidence.
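The cleavage rule can be illustrated with a minimal sketch. It assumes that the signal peptide predictor has already been run and that the TOPDB homology check has been reduced to a single boolean; the function name, the parameters and the example sequence are hypothetical and are not part of the CCTOP implementation.

```python
from typing import Optional


def remove_signal_peptide(
    sequence: str,
    cleavage_site: Optional[int],
    topdb_contradicts: bool,
) -> str:
    """Return the sequence that is passed on to the later steps.

    cleavage_site: number of N-terminal residues predicted to form the
        signal peptide, or None if no signal peptide was predicted
        (assumed to come from a predictor such as SignalP).
    topdb_contradicts: hypothetical flag, True if a homologous TOPDB
        entry provides evidence against the predicted cleavage.
    """
    # Keep the full sequence when nothing was predicted or when the
    # homology evidence overrides the prediction.
    if cleavage_site is None or topdb_contradicts:
        return sequence
    # Otherwise drop the predicted signal peptide from the N-terminus.
    return sequence[cleavage_site:]
```

With a predicted cleavage site of 5, the first five residues are removed unless the override flag is set, in which case the sequence is returned unchanged.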
After the cleavable segments are removed, the next step is to distinguish transmembrane from globular proteins. For this task, a simple voting system is applied to the results of the Phobius (37), Scampi-single (38) and TMHMM (2,39) algorithms. If at least two of these methods predict one or more transmembrane segments, the protein is classified as a TMP.
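The voting rule can be sketched as follows. The sketch assumes that the output of each filter method has already been parsed into the number of transmembrane segments it predicts; the function name, the dictionary layout and the `min_votes` parameter are illustrative assumptions rather than the CCTOP code.

```python
from typing import Mapping


def is_transmembrane(
    tm_segment_counts: Mapping[str, int],
    min_votes: int = 2,
) -> bool:
    """Classify a protein as a TMP by simple voting.

    tm_segment_counts maps a method name to the number of transmembrane
    segments that method predicted (hypothetical, pre-parsed input).
    A method votes "transmembrane" if it predicts at least one segment.
    """
    votes = sum(1 for count in tm_segment_counts.values() if count > 0)
    return votes >= min_votes


# Two of the three methods predict at least one segment, so the
# protein is classified as a TMP.
example = {"Phobius": 3, "Scampi-single": 0, "TMHMM": 1}
result = is_transmembrane(example)
```

A protein for which only one of the three methods predicts a segment would be treated as globular under this rule.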
The methods considered for the consensus topology prediction vary in both their training sets and their underlying algorithms. Ten methods were selected based on their availability and their performance on different benchmark sets: HMMTOP (28,40), MemBrain (41), MEMSAT-SVM (42), Octopus (43), Philius (44), Phobius (37), Pro- and Prodiv-TMHMM (45), Scampi-MSA (38) and TMHMM (2,39). The prediction results of these methods are used as constraints in the same hidden Markov model that is used by HMMTOP, but with different weights. The weights depend on the accuracy of each method, measured on a benchmark set collected for the Human Transmembrane Proteome database (3). To further improve the prediction accuracy for each query, three additional sources of information are collected automatically: homologous structures from PDBTM (4–6), experimental data for homologous sequences from TOPDB (35,36), and conservatively localized domains and motifs from TOPDOM (46) that are recognized in the query sequence. All of this information is incorporated into a probabilistic framework provided by a hidden Markov model, as described in Bagos et al. (47). A formalized and more detailed description of the algorithm is available in our earlier paper (3) and on the home page of the CCTOP server.
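The idea of weighting each method by its accuracy can be illustrated with a much simpler stand-in: a per-residue weighted vote. This is not the constrained hidden Markov model used by CCTOP, only a sketch of how accuracy-dependent weights let more reliable methods dominate a consensus. The label encoding (`I` inside, `M` membrane, `O` outside), the method names and the weight values are all hypothetical.

```python
from typing import Dict, Mapping


def weighted_consensus(
    predictions: Mapping[str, str],
    weights: Mapping[str, float],
) -> str:
    """Combine per-residue topology strings by weighted voting.

    predictions maps a method name to a label string of equal length,
    one character per residue. weights maps the same method names to a
    non-negative weight (assumed here to reflect benchmark accuracy).
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("predictions must be non-empty and of equal length")
    length = lengths.pop()

    consensus = []
    for i in range(length):
        # Sum the weights of all methods that agree on each label.
        scores: Dict[str, float] = {}
        for method, labels in predictions.items():
            scores[labels[i]] = scores.get(labels[i], 0.0) + weights[method]
        # Ties are broken alphabetically so the result is deterministic.
        consensus.append(max(sorted(scores), key=lambda label: scores[label]))
    return "".join(consensus)


predictions = {"A": "IIMMOO", "B": "IIMMMO", "C": "IMMMOO"}
equal_trust = weighted_consensus(predictions, {"A": 0.5, "B": 0.3, "C": 0.4})
trust_b = weighted_consensus(predictions, {"A": 0.2, "B": 0.9, "C": 0.3})
```

Raising the weight of method `B` changes the label at the fifth residue from `O` to `M`, which shows how the weights shift the consensus. A residue-wise vote of this kind can produce topologies that are not physically consistent, which is one reason to feed the predictions as constraints into a model that enforces a valid topology grammar.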