Efficient Genome Segmentation Using MJSD

In building the training set, we used a method based on Markovian Jensen–Shannon divergence (MJSD) to obtain the core (native) components of all available prokaryotic genomes, ensuring the most balanced representation for our regression. We substantially reduced the runtime of the genome segmentation and clustering algorithm implemented in IslandCafe [27] by introducing a reverse-calculation step during recursive segmentation. MJSD, entropy, and statistical significance were calculated as described in [27]. Specifically, the information content of a genome sequence, quantified by the entropy function for probability distribution p_i, is obtained as

H^{m}(p_{i}) = -\sum_{w} P(w) \sum_{x \in A} P(x \mid w) \log_{2} P(x \mid w)

where P(x|w) is the probability of nucleotide x given the preceding oligonucleotide w of length m (the model order, set to 2 in IslandCafe), P(w) is the probability of oligonucleotide w, and A = {A, C, G, T} is the nucleotide alphabet. A genome is initially segmented by computing the entropy, and hence the MJSD, at each position along the genome and selecting the position of highest MJSD that meets a user-defined significance level. This process is then repeated recursively on each resulting segment.
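The sketch below illustrates the entropy formula and a recursive MJSD segmentation in code. It is a minimal illustration under stated assumptions, not the IslandCafe implementation. It assumes the common length-weighted form of MJSD, D = H^m(S) − Σ_j (n_j/n) H^m(S_j), where n_j counts the (m+1)-mer windows in segment S_j. Each window is assigned to the segment containing its final nucleotide x, which is a simplification. In place of the statistical significance test of [27], it uses a hypothetical raw MJSD cutoff `threshold`, plus a hypothetical minimum segment size `min_windows`. All function names are illustrative, and the input is assumed to be an uppercase A/C/G/T string.

```python
import math
from collections import Counter

def count_kmers(seq, m):
    """Count (m+1)-mers w+x: a context w of length m followed by nucleotide x."""
    return Counter(seq[i - m:i + 1] for i in range(m, len(seq)))

def markov_entropy(kmer_counts):
    """H^m = -sum_w P(w) sum_x P(x|w) log2 P(x|w), estimated from (m+1)-mer counts."""
    total = sum(kmer_counts.values())
    if total == 0:
        return 0.0
    context = Counter()
    for kmer, c in kmer_counts.items():
        context[kmer[:-1]] += c
    h = 0.0
    for kmer, c in kmer_counts.items():
        if c > 0:  # zero counts contribute nothing (0 log 0 = 0)
            # P(w) * P(x|w) = c / total and P(x|w) = c / N(w)
            h -= (c / total) * math.log2(c / context[kmer[:-1]])
    return h

def best_split(seq, m=2, min_windows=10):
    """Scan every split point; return (position, MJSD) of the maximal split.

    Running counts are moved from the right segment to the left one, so counts
    are never rebuilt from scratch (a generic speed-up, assumed here; not
    necessarily the authors' reverse-calculation step).
    """
    kmers = [seq[i - m:i + 1] for i in range(m, len(seq))]
    n = len(kmers)
    if n < 2 * min_windows:
        return None, 0.0
    left, right = Counter(), Counter(kmers)
    h_total = markov_entropy(right)
    best_pos, best_d = None, 0.0
    for j in range(1, n):
        k = kmers[j - 1]  # window j-1 moves from the right segment to the left
        left[k] += 1
        right[k] -= 1
        if j < min_windows or n - j < min_windows:
            continue
        d = (h_total
             - (j / n) * markov_entropy(left)
             - ((n - j) / n) * markov_entropy(right))
        if d > best_d:
            best_pos, best_d = j + m, d  # right segment starts at seq[j + m]
    return best_pos, best_d

def segment(seq, m=2, threshold=0.01, min_windows=10, offset=0):
    """Recursively split seq while the best MJSD reaches the (hypothetical) cutoff."""
    pos, d = best_split(seq, m, min_windows)
    if pos is None or d < threshold:
        return [(offset, offset + len(seq))]
    return (segment(seq[:pos], m, threshold, min_windows, offset)
            + segment(seq[pos:], m, threshold, min_windows, offset + pos))
```

For example, `segment("A" * 60 + "C" * 60)` splits the sequence at position 60 and returns `[(0, 60), (60, 120)]`. Each homogeneous half then yields an MJSD of zero, so the recursion stops there. Each scan costs O(n · 4^(m+1)), which is manageable for m = 2. A real implementation would replace the raw cutoff with the significance estimate described in [27].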


Burks D.J., Pusadkar V., & Azad R.K. (2023). POSMM: an efficient alignment-free metagenomic profiler that complements alignment-based profiling. Environmental Microbiome, 18, 16.

Publication 2023

Entropy Genomes components Genomic Nucleotide Oligonucleotide Prokaryotic

Corresponding Organization: University of North Texas


Variable analysis

independent variables

Markovian Jensen–Shannon divergence (MJSD)
Entropy
Statistical significance

dependent variables

Genome segmentation
Clustering algorithm
Runtime

control variables

Genome sequence information content quantified by the entropy function
Probability distribution p_i
Probability of nucleotide x given the preceding oligonucleotide w of length m (the model order, set to 2 in IslandCafe)
Probability of oligonucleotide w

controls

No positive or negative controls explicitly mentioned

