Phylogenetic Analysis of Bacterial Genomes

Homologs of each of the 31 phylogenetic marker genes were identified from the 578 complete bacterial genomes by BLASTP searches (using marker sequences of Escherichia coli as query sequences and a cut-off E-value of 0.1) followed by HMMer searches (cut-off E-value 1 × 10^-10). The corresponding protein sequences were retrieved, aligned, and trimmed as described above, and then concatenated by species into a mega-alignment. A maximum likelihood tree was then constructed from the mega-alignment using PHYML [35]. The model selected based on the likelihood ratio test was the WAG model of amino acid substitution with γ-distributed rate variation (five categories) and a proportion of invariable sites. The shape of the γ-distribution and the proportion of invariable sites were estimated by the program.
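The concatenation step can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `concatenate_alignments`, the dictionary-based input format, and the toy sequences are hypothetical, and padding an absent marker with gap characters is an assumption about how a 'missing' marker might be encoded.

```python
def concatenate_alignments(marker_alignments, gap_char="-"):
    """Concatenate per-marker alignments into one alignment keyed by species.

    marker_alignments: list of dicts, one per marker, each mapping a species
    name to its aligned (and trimmed) sequence. Within one marker, all
    sequences must have the same length.
    """
    # Collect every species seen in at least one marker alignment.
    species = sorted({name for aln in marker_alignments for name in aln})
    parts = {name: [] for name in species}

    for aln in marker_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a marker must have equal length")
        width = lengths.pop()
        for name in species:
            # Assumption: a species lacking this marker gets an all-gap block.
            parts[name].append(aln.get(name, gap_char * width))

    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy input: two markers, the first one absent from the third species.
markers = [
    {"E_coli": "MKV-L", "B_subtilis": "MRVAL"},
    {"E_coli": "GGT", "B_subtilis": "GAT", "M_tuberculosis": "GCT"},
]
mega = concatenate_alignments(markers)
```

Every row of `mega` has the same length, so the result can be written out in whatever format the tree-building program expects.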
To speed up bootstrapping analyses, very closely related taxa were removed from the original mega-alignment, which left us with 310 taxa. Maximum likelihood trees were made from 100 bootstrapped replicates of this reduced dataset using PHYML with the same parameters described above.
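The principle behind a bootstrapped replicate is to resample alignment columns with replacement. The sketch below shows only that resampling step, not PHYML's own implementation; the function names, the fixed seed, and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one replicate: columns resampled with replacement."""
    names = list(alignment)
    n_cols = len(alignment[names[0]])
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Return a list of n_replicates resampled alignments."""
    rng = random.Random(seed)  # Assumption: fixed seed for reproducibility.
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


toy = {"taxon_a": "MKVLG", "taxon_b": "MRVAG", "taxon_c": "MKIAG"}
replicates = bootstrap_replicates(toy, n_replicates=100, seed=42)
```

A tree would then be inferred from each replicate, and the support for a branch is the fraction of replicate trees that contain it.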
With very few exceptions, the marker genes are single-copy genes in all of the bacterial genomes analyzed. In those rare cases in which two or more homologs were identified within a single species, a tree-guided approach was used to resolve the redundancy. If the redundancy resulted from a species-specific duplication event, then one homolog was randomly chosen as the representative. In all other cases, to avoid potential complications such as lateral gene transfer, we excluded that marker and treated it as 'missing' in that particular genome. It has been shown that as long as there is sufficient data, a few 'holes' in the dataset will not compromise the resulting tree [36].
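One way to express the tree-guided rule is sketched below. The source does not say how a species-specific duplication was recognized, so the test used here is an assumption: the copies count as a species-specific duplication when the smallest clade containing all of them holds no other species. The rooted nested-tuple tree, the `species|gene` leaf labels, and all function names are hypothetical.

```python
import random


def leaves(node):
    """List the leaf labels under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def species_of(leaf):
    """Extract the species from a 'species|gene' leaf label."""
    return leaf.split("|", 1)[0]


def smallest_clade_with(node, targets):
    """Return the smallest subtree that contains every leaf in targets."""
    if isinstance(node, str):
        return node
    for child in node:
        if targets <= set(leaves(child)):
            return smallest_clade_with(child, targets)
    return node


def resolve_homologs(tree, species, rng):
    """Pick a representative homolog for a species, or None for 'missing'."""
    copies = {leaf for leaf in leaves(tree) if species_of(leaf) == species}
    if not copies:
        return None
    if len(copies) == 1:
        return next(iter(copies))
    clade = smallest_clade_with(tree, copies)
    if all(species_of(leaf) == species for leaf in leaves(clade)):
        # Species-specific duplication: choose one copy at random.
        return rng.choice(sorted(copies))
    # Copies are separated by other species: treat the marker as missing.
    return None


tree_dup = (("sp1|a", "sp1|b"), ("sp2|a", "sp3|a"))
tree_mixed = (("sp1|a", "sp2|a"), ("sp1|b", "sp3|a"))
picked = resolve_homologs(tree_dup, "sp1", random.Random(0))
excluded = resolve_homologs(tree_mixed, "sp1", random.Random(0))
```

Here `picked` is one of the two `sp1` copies and `excluded` is `None`. A real gene tree is usually unrooted, so this rooted check is a simplification.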

Wu M, & Eisen J.A. (2008). A simple, fast, and accurate method of phylogenomic inference. Genome Biology, 9(10), R151.

Publication 2008

Amino acid substitution, Bacterial genes, Bacterial genomes, Escherichia coli, Genes, Genes marker, Genome, Lateral gene transfer, Protein sequences, Tree

Other organizations: University of California, Davis

Protocol cited in 55 other protocols

Variable analysis

independent variables

BLASTP searches (using marker sequences of Escherichia coli as query sequences and a cut-off E-value of 0.1)
HMMer searches (cut-off E-value 1 × 10^-10)

dependent variables

Homologs of each of the 31 phylogenetic marker genes identified from the 578 complete bacterial genomes
Protein sequences retrieved, aligned, and trimmed
Mega-alignment constructed
Maximum likelihood tree constructed from the mega-alignment using PHYML

control variables

Proportion of invariable sites and shape of the γ-distribution estimated by the program
Very closely related taxa removed from the original mega-alignment to speed up bootstrapping analyses
