Three kinds of quickly identified initial alignments are exploited. The first is obtained by aligning the secondary structures (SSs) of the two proteins using dynamic programming (DP) (19). Each element of the score matrix is set to 1 if the SS elements of the aligned residues are identical and to 0 otherwise. Here, a gap-opening penalty of −1 works best. For a given residue, an SS state (α, β or coil) is assigned based on the Cα coordinates of five neighboring residues, i.e. the ith residue is assigned as α(β) when
$$\left| d_{j,j+k} - \lambda_k^{\alpha(\beta)} \right| < \delta^{\alpha(\beta)}, \quad (j = i-2,\ i-1,\ i;\ k = 2, 3, 4) \tag{1}$$
is satisfied for all $d_{j,j+k}$, where $d_{j,j+k}$ denotes the Cα distance between the jth and (j + k)th residues; otherwise, it is assigned as coil. The final assignment is further smoothed by merging and removing singlet SS states. We note that the set of eight parameters is optimized on 100 non-homologous training proteins by maximizing the similarity of the SS assignment to the DSSP definition (20), which defines protein SS elements on the basis of hydrogen bond patterns and requires the full set of backbone atomic coordinates. The optimized parameters are $\lambda_2^{\alpha} = 5.45$ Å, $\lambda_3^{\alpha} = 5.18$ Å, $\lambda_4^{\alpha} = 6.37$ Å, $\delta^{\alpha} = 2.1$ Å, $\lambda_2^{\beta} = 6.1$ Å, $\lambda_3^{\beta} = 10.4$ Å, $\lambda_4^{\beta} = 13$ Å and $\delta^{\beta} = 1.42$ Å. Using Equation 1, we achieve an average Q3 accuracy of 85% with respect to the DSSP assignment for the representative set of 1489 non-homologous test proteins used in Ref. (8).
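The distance criterion of Equation 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `assign_ss`, the one-letter labels H/E/C, and the restriction of the tested pairs to the five-residue window i−2..i+2 (six distances) are assumptions made here, and the smoothing step that merges and removes singlet states is left out.

```python
import math

# Optimized parameters quoted in the text, in Angstroms:
# lambda_k for k = 2, 3, 4, followed by the tolerance delta.
PARAMS = {
    "H": ({2: 5.45, 3: 5.18, 4: 6.37}, 2.1),   # alpha
    "E": ({2: 6.1, 3: 10.4, 4: 13.0}, 1.42),   # beta
}


def assign_ss(ca):
    """Assign H (alpha), E (beta) or C (coil) to each residue.

    ca is a list of (x, y, z) C-alpha coordinates. Residues without a
    complete five-residue window are left as coil.
    """
    n = len(ca)
    states = []
    for i in range(n):
        state = "C"
        if 2 <= i < n - 2:
            # Assumption: only pairs (j, j + k) that stay inside the
            # window i-2 .. i+2 are tested, giving six distances.
            pairs = [(j, k) for j in (i - 2, i - 1, i)
                     for k in (2, 3, 4) if j + k <= i + 2]
            for label, (lam, delta) in PARAMS.items():
                if all(abs(math.dist(ca[j], ca[j + k]) - lam[k]) < delta
                       for j, k in pairs):
                    state = label
                    break
        states.append(state)
    return "".join(states)
```

Because only Cα coordinates enter the test, the sketch works on Cα-only models, which is the practical advantage over a hydrogen-bond-based definition that needs the full backbone.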
The second type of initial alignment is based on gapless matching of the two structures. As in SAL (18), the smaller of the two proteins is threaded without gaps against the larger structure; however, instead of using RMSD as the comparison metric as in SAL, the alignment with the best TM-score is selected.
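The offset scan behind gapless threading can be sketched as follows. This is a simplified illustration rather than the published procedure: it assumes both coordinate sets are already in a common frame (no superposition is recomputed per offset), it only considers offsets at which the smaller chain lies entirely within the larger one, and it normalizes by the length of the smaller chain. The TM-score distance scale d0 = 1.24 (L − 15)^(1/3) − 1.8 is the standard definition and is not restated in this section; the floor of 0.5 Å for short chains and the names `tm_d0`, `tm_score` and `best_gapless` are assumptions.

```python
import math


def tm_d0(l_norm):
    """TM-score distance scale, floored at 0.5 A (the floor is an assumption)."""
    if l_norm <= 15:
        return 0.5
    return max(0.5, 1.24 * (l_norm - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(small, large, offset, l_norm):
    """TM-score of the gapless alignment small[i] <-> large[offset + i].

    Assumes the two coordinate lists are already superposed.
    """
    d0 = tm_d0(l_norm)
    total = sum(1.0 / (1.0 + (math.dist(a, large[offset + i]) / d0) ** 2)
                for i, a in enumerate(small))
    return total / l_norm


def best_gapless(small, large):
    """Return (best TM-score, offset) over all fully overlapping offsets."""
    l_norm = len(small)  # assumption: normalize by the smaller chain
    offsets = range(len(large) - len(small) + 1)
    return max((tm_score(small, large, o, l_norm), o) for o in offsets)
```

A fuller version would superpose the matched residues at every offset before scoring, since the TM-score of an alignment depends on the rotation and translation applied.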
The third initial alignment is also obtained by DP with a gap-opening penalty of −1, but the score matrix is an equal-weight (half/half) combination of the SS score matrix and the distance score matrix selected in the second initial alignment.
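Both DP-based initial alignments share the same machinery and differ only in the score matrix, which the sketch below tries to make concrete. It is written under several assumptions that the text does not confirm: the alignment is global, a gap costs −1 when opened and nothing when extended, and end gaps are penalized like internal ones. The distance score matrix is taken as an input supplied by the caller, and the names `ss_matrix`, `combine` and `dp_align` are hypothetical.

```python
NEG = float("-inf")


def ss_matrix(ss_a, ss_b):
    """Score 1 where the SS states are identical, 0 otherwise."""
    return [[1.0 if a == b else 0.0 for b in ss_b] for a in ss_a]


def combine(ss, dist):
    """Half/half combination of an SS matrix and a distance score matrix."""
    return [[0.5 * s + 0.5 * d for s, d in zip(row_s, row_d)]
            for row_s, row_d in zip(ss, dist)]


def dp_align(score, gap_open=-1.0):
    """Global DP with a gap-opening penalty and free gap extension.

    States: 0 = residues i and j aligned, 1 = residue i unaligned,
    2 = residue j unaligned. Returns (total score, aligned index pairs).
    """
    n, m = len(score), len(score[0])
    best = [[[NEG] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    back = [[[0] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    best[0][0][0] = 0.0
    steps = ((1, 1), (1, 0), (0, 1))
    for i in range(n + 1):
        for j in range(m + 1):
            for s, (di, dj) in enumerate(steps):
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                cands = []
                for p in range(3):
                    if s == 0:
                        gain = score[pi][pj]
                    else:
                        # Staying in the same gap state is free;
                        # entering it pays the opening penalty.
                        gain = 0.0 if p == s else gap_open
                    cands.append((best[p][pi][pj] + gain, p))
                best[s][i][j], back[s][i][j] = max(cands)
    # Trace back from the best final state to recover the aligned pairs.
    s = max(range(3), key=lambda t: best[t][n][m])
    total = best[s][n][m]
    i, j, pairs = n, m, []
    while i > 0 or j > 0:
        prev = back[s][i][j]
        if s == 0:
            pairs.append((i - 1, j - 1))
        di, dj = steps[s]
        i, j, s = i - di, j - dj, prev
    return total, pairs[::-1]
```

For the first initial alignment one would call `dp_align(ss_matrix(ss_a, ss_b))`; for the third, `dp_align(combine(ss_matrix(ss_a, ss_b), dist))`, where `dist` has the same shape as the SS matrix.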