Probabilistic Modeling for Accurate Gene Prediction

AUGUSTUS is based on a generalized hidden Markov model (GHMM), which defines probability distributions for the various sections of genomic sequences. Introns, exons, intergenic regions, etc. correspond to states in the model, and each state is assumed to generate DNA sequences with certain pre-defined emission probabilities. Like other HMM-based gene finders, AUGUSTUS finds an optimal parse of a given genomic sequence, i.e. a segmentation of the sequence into states that is most likely according to the underlying statistical model. We probabilistically model the sequence around the splice sites; the sequence of the branch point region; the bases before the translation start; the coding and non-coding regions; the first coding bases of a gene; the length distributions of single exons, initial exons, internal exons, terminal exons and intergenic regions; the distribution of the number of exons per gene; and the length distribution of introns.
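The idea of an optimal parse can be illustrated with a deliberately simplified sketch. The code below runs the standard Viterbi algorithm on a plain two-state HMM, not on a generalized HMM with explicit length distributions as AUGUSTUS uses. The state names, the function name `viterbi` and all probabilities are made-up assumptions for illustration and are not AUGUSTUS parameters.

```python
import math

def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return the most likely state path for seq under a plain HMM (log space)."""
    # best[i][s]: best log-probability of any path that ends in state s at position i
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for i in range(1, len(seq)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s
            prev, score = max(
                ((p, best[i - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[i][s] = score + math.log(emit_p[s][seq[i]])
            back[i][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for i in range(len(seq) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path

# Hypothetical toy parameters: "exon" favors g/c, "intergenic" favors a/t
STATES = ("intergenic", "exon")
START = {"intergenic": 0.9, "exon": 0.1}
TRANS = {
    "intergenic": {"intergenic": 0.9, "exon": 0.1},
    "exon": {"intergenic": 0.1, "exon": 0.9},
}
EMIT = {
    "intergenic": {"a": 0.4, "c": 0.1, "g": 0.1, "t": 0.4},
    "exon": {"a": 0.1, "c": 0.4, "g": 0.4, "t": 0.1},
}

if __name__ == "__main__":
    print(viterbi("aaaaaggggggaaaaa", STATES, START, TRANS, EMIT))
```

The returned list assigns one state to every base, which is the segmentation of the sequence into states. A GHMM differs in that a single state may emit a whole variable-length segment, such as an exon, with its own length distribution.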
The performance of AUGUSTUS has been extensively evaluated on sequence data from human and Drosophila (7, 8). These studies showed that, especially for long input sequences, the accuracy of our program is superior to that of existing ab initio gene finding approaches. To make our tool available to the research community, we have set up a WWW server at GOBICS (Göttingen Bioinformatics Compute Server) (9).
AUGUSTUS may be forced to predict an exon, an intron, a splice site, a translation start or a translation end point at a certain position in the sequence. An arbitrary number of such constraints is allowed; the supported types of constraints are listed in Table 1.
With the term gene structure, we refer to a segmentation of the input sequence into any meaningful sequence of exons, introns and intergenic regions. This includes the possibility of having no genes at all or of having multiple genes. AUGUSTUS tries to predict a gene structure that

(1) is (biologically) consistent in the following sense:

(a) No exon contains an in-frame stop codon.

(b) The splice sites obey the gt–ag consensus. All complete genes start with atg and end with a stop codon.

(c) Each gene ends before the next gene starts.

(d) The lengths of single exons and introns exceed a species-dependent minimal length.

(2) obeys all given constraints.
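The consistency conditions listed above can be sketched as a small checker. This is a minimal illustration under stated assumptions, not the AUGUSTUS implementation: genes are complete, lie on the forward strand, and are given as sorted lists of 0-based, end-exclusive exon intervals. The function names and the default minimal lengths are hypothetical, the length check is applied to every exon, and the minimum is treated as inclusive.

```python
STOP_CODONS = {"taa", "tag", "tga"}

def gene_is_consistent(seq, exons, min_exon_len=3, min_intron_len=4):
    """Check one complete forward-strand gene given as sorted (start, end) exons."""
    seq = seq.lower()
    introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
    # Minimal lengths (hypothetical defaults; the real values are species-dependent)
    if any(end - start < min_exon_len for start, end in exons):
        return False
    if any(end - start < min_intron_len for start, end in introns):
        return False
    # Splice sites: every intron starts with gt and ends with ag
    for start, end in introns:
        if seq[start:start + 2] != "gt" or seq[end - 2:end] != "ag":
            return False
    # Spliced coding sequence must start with atg and end with a stop codon
    cds = "".join(seq[start:end] for start, end in exons)
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "atg" or codons[-1] not in STOP_CODONS:
        return False
    # No in-frame stop codon before the final one
    return not any(codon in STOP_CODONS for codon in codons[:-1])

def structure_is_consistent(seq, genes, **limits):
    """Check a whole gene structure; an empty gene list (no genes) is allowed."""
    genes = sorted(genes, key=lambda exons: exons[0][0])
    # Each gene must end before the next gene starts
    for prev, nxt in zip(genes, genes[1:]):
        if prev[-1][1] > nxt[0][0]:
            return False
    return all(gene_is_consistent(seq, gene, **limits) for gene in genes)

if __name__ == "__main__":
    toy = "cc" + "atggcc" + "gtaaag" + "aaatga" + "cc"
    print(structure_is_consistent(toy, [[(2, 8), (14, 20)]]))
```

In the toy sequence the two exons splice to atg gcc aaa tga, and the intron between them begins with gt and ends with ag.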

Among all gene structures that are consistent and that obey all constraints, AUGUSTUS finds the most likely gene structure. A constraint may contradict the biological consistency requirements. For example, an exonpart constraint may be impossible to realize because no open reading frame with allowed exon boundaries contains it. If no consistent gene structure obeys all constraints, then some constraints are ignored. Likewise, if two or more constraints contradict each other, AUGUSTUS obeys only the constraint that fits the model better. Figure 1 illustrates the concept. Further examples are on the page .
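The selection rule described above can be sketched on a toy scale. The code below is only one possible reading of the text, under loud assumptions: it enumerates a short, explicit list of already-consistent candidate structures with given log-likelihoods, whereas AUGUSTUS works on its probabilistic model rather than on an enumerated list. The policy of ignoring as few constraints as possible is also an assumption, since the text only says that some constraints are ignored. The name `best_structure` and the representation of constraints as predicates are hypothetical.

```python
from itertools import combinations

def best_structure(candidates, constraints):
    """Pick the most likely candidate that obeys as many constraints as possible.

    candidates:  list of (log_likelihood, structure) pairs, assumed consistent
    constraints: list of predicates, each taking a structure and returning bool
    """
    # Assumption: drop as few constraints as needed, starting with none
    for n_dropped in range(len(constraints) + 1):
        best = None
        for kept in combinations(constraints, len(constraints) - n_dropped):
            for score, structure in candidates:
                if all(obeys(structure) for obeys in kept):
                    if best is None or score > best[0]:
                        best = (score, structure)
        if best is not None:
            return best
    return None  # only reached when there are no candidates at all

if __name__ == "__main__":
    # Structures are tuples of exon intervals; scores are made-up log-likelihoods
    candidates = [(-5.0, ((10, 20),)), (-3.0, ((30, 40),)), (-1.0, ())]
    needs_a = lambda s: (10, 20) in s  # hypothetical exon constraint
    needs_b = lambda s: (30, 40) in s  # hypothetical exon constraint
    print(best_structure(candidates, [needs_a, needs_b]))
```

In this toy setup no candidate contains both exons, so the two constraints contradict each other. The sketch then keeps the one whose best structure scores higher, which mirrors the statement that only the constraint fitting the model better is obeyed.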

Stanke M, & Morgenstern B. (2005). AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Research, 33(Web Server issue), W465-W467.
