Apart from dedicated disorder predictors, many other bioinformatics tools yield implicit or explicit information about order and disorder. In the course of several other protein sequence analysis projects, we noticed a clear correlation between disorder in the target protein sequence and the presence of gaps in alignments to structurally characterized templates calculated by protein fold-recognition methods. Although implementing a method that uses this type of information may seem trivial, dealing with different types of fold-recognition methods was not straightforward: it was not obvious which method should be used or, if several were used, how to rank them. In addition, a template-matching method should take into account that matches to homologous proteins differ in reliability and that in some cases no homologous sequences can be found. To address these questions, we compared the results from arbitrarily chosen fold-recognition methods that were relatively fast and performed well in the framework of CASP: HHSEARCH, FFAS, mGenThreader, PSI-BLAST, PHYRE, and PCONS5 (see Methods for details and references). To optimize the weights assigned to individual methods depending on alignment quality, we used a genetic algorithm implemented in Pyevolve [47]. The solution optimized by the genetic algorithm was encoded as a one-dimensional vector of length 24 (8 methods mentioned above multiplied by 3 thresholds for well-, moderately-, and poorly-scored templates; see Table 4 for details of the thresholds used). In this way, weights for all methods were obtained for subsequent incorporation into a combined template-matching method. The resulting predictor was tested in CASP9 as group 421 (GSmetaDisorder3D).
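The weighting scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation. It uses only the Python standard library rather than Pyevolve, and several details are assumptions made for illustration: the per-residue score (the weighted share of methods whose alignment has a gap at that residue), the fitness (negative mean squared error against 0/1 disorder labels), and all function names. Methods are referred to by index, because the text counts eight methods while naming six, and the tier thresholds of Table 4 are not reproduced here.

```python
import random

N_METHODS = 8   # count stated in the text
N_TIERS = 3     # well-, moderately-, poorly-scored templates
GENOME_LEN = N_METHODS * N_TIERS  # 24 weights in one flat vector


def weight_index(method, tier):
    """Position of the weight for (method, tier) in the flat vector."""
    return method * N_TIERS + tier


def combine(gaps, tiers, weights):
    """Per-residue score: weighted share of methods reporting a gap.

    gaps[m][i] is 1 if method m's alignment has a gap at residue i, else 0.
    tiers[m] is the template quality tier (0, 1 or 2) for method m, or
    None if that method found no template (hypothetical convention).
    """
    scores = []
    for i in range(len(gaps[0])):
        num = den = 0.0
        for m in range(len(gaps)):
            if tiers[m] is None:
                continue  # method contributes nothing without a template
            w = weights[weight_index(m, tiers[m])]
            num += w * gaps[m][i]
            den += w
        scores.append(num / den if den > 0 else 0.5)
    return scores


def fitness(weights, examples):
    """Negative mean squared error against 0/1 labels (higher is better)."""
    err, n = 0.0, 0
    for gaps, tiers, labels in examples:
        for s, y in zip(combine(gaps, tiers, weights), labels):
            err += (s - y) ** 2
            n += 1
    return -err / n


def evolve(examples, pop_size=30, generations=40, seed=0):
    """Toy genetic algorithm: truncation selection, one-point crossover,
    and Gaussian mutation of a single weight, clamped to [0, 1]."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(GENOME_LEN)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, examples), reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, GENOME_LEN)
            child = a[:cut] + b[cut:]
            j = rng.randrange(GENOME_LEN)
            child[j] = min(1.0, max(0.0, child[j] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda w: fitness(w, examples))
```

In this layout, each method carries three independent weights, so a method can be trusted when it finds a well-scored template and down-weighted when only a poorly-scored one is available. Each training example passed to `evolve` is a tuple `(gaps, tiers, labels)` for one protein.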