Multiple sequence alignments are the only source of information used in the predictions. Predictions are best for accurate, nonredundant alignments of diverse sequences without significant gap regions. In the interface prediction tests, we used alignments from the 'Superfamily' [33] and PFAM [34] collections, as well as from the Homology-Derived Secondary Structure of Proteins database [35], and curated alignments of human protein kinases [36] from the Protein Kinase Resource [37]. As needed, the original alignments were prepared for specificity analysis in three steps: trimming deletions and insertions across the whole alignment so as to preserve the continuity of the main sequence (the sequence of a given protein); removing redundant sequences (typically at a threshold of about 95% identical residues for large alignments) using the MView program [38,39]; and removing sequences with many gaps (for example, more than about 10% to 20% gaps relative to the main sequence). Finally, the alignment must contain a large number of sequences (>100).
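The preparation steps described above can be illustrated with a minimal Python sketch. It is not the original pipeline: the redundancy filter here is a simple greedy stand-in for MView, identity is assumed to be computed over columns where both sequences have residues, trimming is assumed to mean dropping every column in which the main sequence has a gap, and all function and parameter names (`trim_to_main`, `remove_redundant`, `remove_gappy`, `prepare`, `is_large_enough`) are hypothetical.

```python
GAP_CHARS = "-."  # assumed gap symbols


def trim_to_main(alignment, main=0):
    """Keep only the columns where the main sequence has a residue."""
    keep = [i for i, c in enumerate(alignment[main]) if c not in GAP_CHARS]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def identity(a, b):
    """Fraction of identical residues over columns where both are non-gap."""
    pairs = [(x, y) for x, y in zip(a, b)
             if x not in GAP_CHARS and y not in GAP_CHARS]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def remove_redundant(alignment, max_identity=0.95):
    """Greedy filter (a stand-in for MView): keep a sequence only if it is
    less than max_identity identical to every sequence already kept."""
    kept = []
    for seq in alignment:
        if all(identity(seq, k) < max_identity for k in kept):
            kept.append(seq)
    return kept


def gap_fraction(seq):
    """Fraction of gap characters in a sequence (0.0 for an empty one)."""
    return sum(c in GAP_CHARS for c in seq) / len(seq) if seq else 0.0


def remove_gappy(alignment, max_gap_fraction=0.2):
    """Drop sequences whose gap fraction exceeds the threshold. After
    trimming, every row has the length of the main sequence, so the
    fraction is relative to the main sequence."""
    return [s for s in alignment if gap_fraction(s) <= max_gap_fraction]


def prepare(alignment, main=0, max_identity=0.95, max_gap_fraction=0.2):
    """Trim, remove redundant sequences, then remove gappy sequences."""
    trimmed = trim_to_main(alignment, main)
    # Put the main sequence first so the greedy filter always keeps it.
    ordered = [trimmed[main]] + trimmed[:main] + trimmed[main + 1:]
    nonredundant = remove_redundant(ordered, max_identity)
    return remove_gappy(nonredundant, max_gap_fraction)


def is_large_enough(alignment, min_sequences=100):
    """Check the size requirement (>100 sequences)."""
    return len(alignment) > min_sequences


# Toy alignment; the first row plays the role of the main sequence.
toy = ["MK-TAYIA", "MKLTAYIA", "MK-TAYIA", "M---A--A", "MR-SAYLA"]
prepared = prepare(toy)
```

In this toy case the column holding the insertion relative to the main sequence is dropped, the two rows that become identical to the main sequence are discarded as redundant, and the gap-rich row is removed as well. The result is far too small to pass `is_large_enough`, which a real alignment would need to satisfy.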