The Viterbi algorithm, when applied to profile HMMs, is formally equivalent to global sequence alignment with position-specific gap penalties [27 (link)]. We had previously introduced a modification of the Viterbi algorithm that is formally equivalent to Smith-Waterman local sequence alignment [15 (link)]. In HH-suite we use it to compute the best-scoring local alignment between two profile HMMs.
HH-suite models MSA columns with <50% gaps (default value) by match states and all other columns as insertion states. By traversing through the states of a profile HMM, the HMM can “emit” sequences. A match state (M) emits amino acids according to the 20 probabilities of amino acids estimated from their fraction in the MSA column, plus some pseudocounts. Insert states (I) emit amino acids according to a standard amino acid background distribution, while delete states (D) do not emit any amino acids.
The alignment score between two HMMs in HH-suite is the sum over all co-emitted sequences of the log odds scores for the probability for the two aligned HMMs to co-emit this sequence divided by the probability of the sequence under the background model. Since M and I states emit amino acids and D states do not, M and I in one HMM can only be aligned with M or I states in the other HMM. Conversely, a D state can only be aligned with a D state or with a Gap G (Fig. 1). The co-emission score can be written as the sum of the similarity scores of the aligned profile columns, in other words the match-match (MM) pair states, minus the position-specific penalties for indels: delete-open, delete-extend, insert-open and insert-extend.

HMM-HMM alignment of query and target. The alignment is represented as red path through both HMMs. The corresponding pair state sequence is MM, MM, MI, MM, MM, DG, MM

We denote the alignment pair states as MM, MI, IM, II, DD, DG, and GD. Figure 1 shows an example of two aligned profile HMMs. In the third column HMM q emits a residue from its M state and HMM p emits a residue from the I state. The pair state for this alignment column is MI. In column six of the alignment HMM q does not emit anything since it passes through the D state. HMM p does not emit anything either since it has a gap in the alignment. The corresponding pair state is DG. To speed up the alignment, we exclude pair states II and DD, and we only allow transitions between a pair state and itself and between pair state MM and pair states MI, IM, DG, or GD.
To calculate the local alignment score, we need five dynamic programming matrices SXY, one for each pair state XY ∈{MM, MI, IM, DG, GD }. They contain the score of the best partial alignment which ends in column i of q and column j of p in pair state XY. These five matrices are calculated recursively.
SMMi,j=Saaqip,tjp+Sssqiss,tjss+max0(for local alignment)SMM(i1,j1)+logqi1(M,M)tj1(M,M)SMI(i1,j1)+logqi1(M,M)tj1(I,M)SII(i1,j1)+logqi1(I,M)tj1(M,M)SDG(i1,j1)+logqi1(D,M)tj1(M,M)SGD(i1,j1)+logqi1M,Mtj1(D,M)
SMIi,j=maxSMM(i1,j)+logqi1(M,M)tj(D,D)SMI(i1,j)+logqi1(M,M)tj(I,I)
SDGi,j=maxSMM(i1,j)+logqi1(D,M)SDG(i1,j)+logqi1(D,D)
Saaqip,tjp=loga=120qip(a)tjp(a)fa
Vector qip contains the 20 amino acid probabilities of q at position i, tjp are the amino acid probabilities t at j, and fa denotes the background frequency of amino acid a. The score Saa measures the similarity of amino acid distributions in the two columns i and j. Sss can optionally be added to Saa. It measures the similarity of the secondary structure states of query and target HMM at i and j [15 (link)].
Free full text: Click here