Gapped Local Alignment Motif Discovery

glam2 examines a set of sequences provided by the user, and returns an alignment of segments of these sequences. A typical alignment is shown in Figure 1. Each sequence contributes at most one segment to the alignment. Our approach assumes that a motif is defined by residue preferences at certain positions, which we call key positions. These are analogous to the “turned-on” columns of the second-generation Gibbs sampler, or to the match states of a profile HMM. In a particular motif instance, some key positions may be deleted, and residues may be inserted between key positions (Figure 1).
glam2 defines a scoring scheme for alignments such as that in Figure 1. It rewards alignment of identical or similar residues in the same key position, and penalizes deletions and insertions. However, deletions and insertions are penalized less strongly if they repeatedly occur in the same locations. This is reasonable because some locations in a motif may be more prone to deletions or insertions than others. Having defined a scoring scheme for alignments, it is straightforward to calculate the marginal score of one aligned segment: the score of the alignment including this segment minus the score of the alignment excluding this segment. These marginal scores reflect how well each segment matches the other segments.
Having defined a scoring scheme, glam2 attempts to find a motif alignment with maximum score. Even in the gapless case, the number of possible alignments is too huge to enumerate, and there is no practical algorithm to guarantee finding the optimal alignment. This problem is only exacerbated in the gapped case. Thus glam2 uses a heuristic optimisation method – simulated annealing – highly analogous to the optimisation methods of the gapless Gibbs samplers [10] (link),[27] (link).
Simulated annealing takes an initial, presumably non-optimal, alignment and repeatedly makes changes to it. These changes have an element of randomness: they generally increase the score, but sometimes decrease it, which avoids getting stuck in local optima. The process is analogous to crystallization in a cooling material. Two types of change are performed by glam2, which we call site sampling and column sampling, because they are analogous to similarly-named procedures in the original Gibbs sampler [10] (link),[24] (link). Site sampling adjusts the alignment of one sequence to the motif, using the clever stochastic traceback procedure from hmmer to efficiently sample one from all possible such alignments [22] (link). In column sampling, one key position is moved, added, or deleted. These changes are carefully designed to satisfy the reversibility and detailed balance conditions of simulated annealing (Text S1). Such changes are applied until the score fails to improve for n (e.g. 10000) changes in succession. To check that a reproducible, high-scoring motif has been found, the whole procedure is repeated r (e.g. 10) times from different random starting alignments.
glam2's behaviour can be controlled with numerous adjustable parameters. The allowed alignments can be constrained by specifying a minimum number of key positions (a), a maximum number of key positions (b), and a minimum number of segments in the alignment (z). This z parameter is a useful generalization of the OOPS (one occurrence per sequence) and ZOOPS (zero or one occurrence per sequence) modes of previous motif discovery algorithms [28] . The annealing follows a simple geometric cooling schedule with initial temperature t and cooling rate c per n changes. glam2 can find the optimal number of key positions more quickly if the initial number (w) is set to a near-optimal value. All parameters have sensible default values.
glam2scan takes a motif found by glam2, and scans it against a database of sequences. It performs short-in-long alignments of the motif against the sequences, using position-specific residue scores, deletion scores, and insertion scores, which are derived from the glam2 alignment. The highest-scoring such alignments are reported.

Free full text: Click here

Frith M.C., Saunders N.F., Kobe B, & Bailey T.L. (2008). Discovering Sequence Motifs with Arbitrary Insertions and Deletions. PLoS Computational Biology, 4(5), e1000071.

Publication 2008

Crystallization Deletion Deletions Generalization Scans Sequence alignment Simulated conditions

Corresponding Organization : National Institute of Advanced Industrial Science and Technology

Other organizations : University of Queensland

Top 5 similar protocols

Protocol cited in 23 other protocols

Variable analysis

independent variables

The number of key positions specified (a, b)
The minimum number of segments in the alignment (z)
The initial number of key positions (w)
The initial temperature (t) and cooling rate (c) of the simulated annealing process
The number of times the whole procedure is repeated from different random starting alignments (r)

dependent variables

The score of the alignment found by the simulated annealing process
The highest-scoring alignments of the motif against a sequence database found by glam2scan

control variables

The scoring scheme used to evaluate alignments
The reversibility and detailed balance conditions of the simulated annealing process

positive controls

Not mentioned

negative controls

Not mentioned

Annotations

Based on most similar protocols

Etiam vel ipsum. Morbi facilisis vestibulum nisl. Praesent cursus laoreet felis. Integer adipiscing pretium orci. Nulla facilisi. Quisque posuere bibendum purus. Nulla quam mauris, cursus eget, convallis ac, molestie non, enim. Aliquam congue. Quisque sagittis nonummy sapien. Proin molestie sem vitae urna. Maecenas lorem.

As authors may omit details in methods from publication, our AI will look for missing critical information across the 5 most similar protocols.

About PubCompare

Our mission is to provide scientists with the largest repository of trustworthy protocols and intelligent analytical tools, thereby offering them extensive information to design robust protocols aimed at minimizing the risk of failures.

We believe that the most crucial aspect is to grant scientists access to a wide range of reliable sources and new useful tools that surpass human capabilities.

However, we trust in allowing scientists to determine how to construct their own protocols based on this information, as they are the experts in their field.

Ready to get started?

Revolutionizing how scientists
search and build protocols!