Efficient HMM-HMM Alignment with Local Sequence Scoring
The Viterbi algorithm, when applied to profile HMMs, is formally equivalent to global sequence alignment with position-specific gap penalties [27 (link)]. We had previously introduced a modification of the Viterbi algorithm that is formally equivalent to Smith-Waterman local sequence alignment [15 (link)]. In HH-suite we use it to compute the best-scoring local alignment between two profile HMMs. HH-suite models MSA columns with <50% gaps (default value) by match states and all other columns as insertion states. By traversing through the states of a profile HMM, the HMM can “emit” sequences. A match state (M) emits amino acids according to the 20 probabilities of amino acids estimated from their fraction in the MSA column, plus some pseudocounts. Insert states (I) emit amino acids according to a standard amino acid background distribution, while delete states (D) do not emit any amino acids. The alignment score between two HMMs in HH-suite is the sum over all co-emitted sequences of the log odds scores for the probability for the two aligned HMMs to co-emit this sequence divided by the probability of the sequence under the background model. Since M and I states emit amino acids and D states do not, M and I in one HMM can only be aligned with M or I states in the other HMM. Conversely, a D state can only be aligned with a D state or with a Gap G (Fig. 1). The co-emission score can be written as the sum of the similarity scores of the aligned profile columns, in other words the match-match (MM) pair states, minus the position-specific penalties for indels: delete-open, delete-extend, insert-open and insert-extend.
HMM-HMM alignment of query and target. The alignment is represented as red path through both HMMs. The corresponding pair state sequence is MM, MM, MI, MM, MM, DG, MM
We denote the alignment pair states as MM, MI, IM, II, DD, DG, and GD. Figure 1 shows an example of two aligned profile HMMs. In the third column HMM q emits a residue from its M state and HMM p emits a residue from the I state. The pair state for this alignment column is MI. In column six of the alignment HMM q does not emit anything since it passes through the D state. HMM p does not emit anything either since it has a gap in the alignment. The corresponding pair state is DG. To speed up the alignment, we exclude pair states II and DD, and we only allow transitions between a pair state and itself and between pair state MM and pair states MI, IM, DG, or GD. To calculate the local alignment score, we need five dynamic programming matrices SXY, one for each pair state XY ∈{MM, MI, IM, DG, GD }. They contain the score of the best partial alignment which ends in column i of q and column j of p in pair state XY. These five matrices are calculated recursively.
Vector contains the 20 amino acid probabilities of q at position i, are the amino acid probabilities t at j, and fa denotes the background frequency of amino acid a. The score Saa measures the similarity of amino acid distributions in the two columns i and j. Sss can optionally be added to Saa. It measures the similarity of the secondary structure states of query and target HMM at i and j [15 (link)].
Steinegger M., Meier M., Mirdita M., Vöhringer H., Haunsberger S.J, & Söding J. (2019). HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics, 20, 473.
This protocol is too long. Unable to provide accurate annotations
About PubCompare
Our mission is to provide scientists with the largest repository of trustworthy protocols and intelligent analytical tools, thereby offering them extensive information to design robust protocols aimed at minimizing the risk of failures.
We believe that the most crucial aspect is to grant scientists access to a wide range of reliable sources and new useful tools that surpass human capabilities.
However, we trust in allowing scientists to determine how to construct their own protocols based on this information, as they are the experts in their field.
Ready to
get started?
Sign up for free.
Registration takes 20 seconds.
Available from any computer
No download required