Phylogenetic Analysis of Mycobacterium tuberculosis

After the consensus genomes were combined, we used snp-sites v2.5.1 to extract the variant positions and then generated a neighbour-joining tree of all 6037 samples with IQ-TREE v2.1.4-beta [34]. The tb-profiler results were combined with the neighbour-joining tree to identify the L4 genomes. A maximum-likelihood phylogenetic tree of the L4 genomes was then derived using IQ-TREE with built-in model selection, supplying the number of invariant sites as identified using snp-sites.

TreeBreaker v1.1 [35] was used to identify internal nodes of the tree at which the distribution of the phenotype of interest changed among the tips descending from that node. The TreeBreaker command line used was ‘treeBreaker -x 5000000 -y 5000000 -z 10000 input.tree phenotype.txt output_prefix’. The phenotype of interest was geographical location. To ease interpretation, separate TreeBreaker runs were carried out for Vietnam, Indonesia, China and Thailand individually, and for all four countries combined into a single category (i.e. a single ‘phenotype’ of belonging to either Vietnam, Indonesia, China or Thailand). TreeBreaker outputs a text file whose last line is a Newick format phylogenetic tree with the results annotated onto the internal nodes. This Newick tree was extracted from the text file and saved as a tree file. It was then converted to a NEXUS format tree using FigTree (taking care to include the annotations) for reading into DendroPy v4.5.2 [36]. The NEXUS format tree was then parsed using a script (https://gist.github.com/flashton2003/50d645a60219c0e381874a1dd4355646) to produce sub-trees and summary information for nodes above the 0.5 posterior probability threshold. Example input and output files for the TreeBreaker analysis can be accessed from https://doi.org/10.6084/m9.figshare.21378312.
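The extraction of the annotated tree from the last line of the TreeBreaker output can be sketched in Python as follows. This is a minimal illustration and not the authors' script: the function names and the check that the line opens with "(" are assumptions introduced here.

```python
from pathlib import Path


def extract_final_newick(treebreaker_text: str) -> str:
    """Return the last non-empty line of a TreeBreaker output file."""
    lines = [ln.strip() for ln in treebreaker_text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("TreeBreaker output is empty")
    newick = lines[-1]
    # Light sanity check (an assumption): a multi-tip Newick string opens with "(".
    if not newick.startswith("("):
        raise ValueError("last line does not look like a Newick tree")
    return newick


def save_final_tree(output_file: str, tree_file: str) -> None:
    """Copy the annotated tree from TreeBreaker's output into its own file."""
    text = Path(output_file).read_text()
    Path(tree_file).write_text(extract_final_newick(text) + "\n")
```

A call such as save_final_tree("output_prefix", "annotated.tree") would write the tree to a separate file; both file names here are hypothetical.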
As TreeBreaker annotates its results onto the nodes of the input phylogenetic tree, and we used the same input tree for all analyses, we could combine the results from the different runs based on the identifiers of the internal nodes. As we were using TreeBreaker as a screening tool to identify nodes for further analysis using SIMMAP, we filtered for nodes using a posterior probability threshold of 0.5 and a minimum of five descendent leaves. All SIMMAP analysis [37] was carried out using the make.simmap function from phytools [38] in the R statistical language [39] using RStudio [40]. The fit of each model type (all rates different, symmetrical and equal rates) was assessed using the fitMk function, and the best-fitting model was used for the SIMMAP analysis. We ran 1000 simulations within SIMMAP. Nodes that TreeBreaker identified as being associated with changes were targeted for investigation in the SIMMAP output. Trees (Newick format) were visualized with iTOL [41], and graphs were drawn with ggplot2 [42]. The files for replicating the iTOL trees can be downloaded from https://doi.org/10.6084/m9.figshare.21378330.v1.

Phylotemporal analysis was carried out using TreeTime v0.9.0 [43] with a substitution rate and standard deviation of 0.000000061643 and 0.0000000385, respectively. These values were obtained from the estimates of the ‘BEAST constant population size, uniform prior on clock rate’ analysis of Menardo et al. [44]. The command line used was ‘treetime --clock-rate 0.000000061643 --tree input.tree --dates input_dates.csv --outdir my_analysis --sequence-length 4411532 --confidence --clock-std-dev 0.0000000385’. Input data for the TreeTime analysis can be found at https://doi.org/10.6084/m9.figshare.21401307.v1.
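The TreeBreaker screening step described above (combining the per-phenotype runs by internal node identifier, then keeping nodes that pass the posterior probability and leaf-count filters) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the NodeResult record, the function names and the strict "greater than 0.5" comparison are all introduced here, and the per-node values are assumed to have already been read from the annotated trees.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class NodeResult:
    """Hypothetical per-node summary taken from one TreeBreaker run."""
    node_id: str       # identifier of the internal node in the shared input tree
    posterior: float   # posterior probability of a change at this node
    n_leaves: int      # number of leaves descending from this node


def screen_nodes(results: Iterable[NodeResult],
                 min_posterior: float = 0.5,
                 min_leaves: int = 5) -> List[NodeResult]:
    """Keep nodes above the posterior threshold with enough descendent leaves."""
    # Assumption: "above the threshold" is read as strictly greater than.
    return [r for r in results
            if r.posterior > min_posterior and r.n_leaves >= min_leaves]


def combine_runs(runs: Dict[str, List[NodeResult]]) -> Dict[str, Dict[str, float]]:
    """Map node_id -> {phenotype: posterior} for nodes passing the screen."""
    combined: Dict[str, Dict[str, float]] = {}
    for phenotype, results in runs.items():
        for r in screen_nodes(results):
            combined.setdefault(r.node_id, {})[phenotype] = r.posterior
    return combined
```

Because every run shares the same input tree, the node identifier alone is enough to line up results from the different phenotype runs; nodes returned by combine_runs would then be the candidates carried forward to SIMMAP.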

Ashton P.M., Cha J., Anscombe C., Thuong N.T., Thwaites G.E, & Walker T.M. (2023). Distribution and origins of Mycobacterium tuberculosis L4 in Southeast Asia. Microbial Genomics, 9(2), mgen000955.

Publication 2023

Other organizations : University of Liverpool, University of Oxford, Oxford University Clinical Research Unit, Princeton University

Variable analysis

independent variables

Geographical location (Vietnam, Indonesia, China, Thailand)
Phylogenetic tree

dependent variables

Distribution of phenotypes (geographical location) at the tips of the phylogenetic tree

control variables

Phylogenetic tree (used as input for TreeBreaker analysis)
Posterior probability threshold (0.5) and minimum number of descendent leaves (5) for selecting nodes of interest
