Exploring Phylogenetic Tree Landscapes

treespace generalizes an approach used by Amenta and Klingner (Amenta & Klingner, 2002) and later by Hillis et al. (2005), implemented as the treesetviz module for mesquite (Maddison & Maddison, 2003). This method used the Robinson–Foulds metric (Robinson & Foulds, 1979, 1981) to visualize relationships between labelled trees with identical tips in a Euclidean space. Here, we generalize this approach to any tree metric, and add the use of multiple clustering approaches to formally identify “tree islands”.
The core idea underlying tree space exploration is to map variability in tree topology or branch length onto a low‐dimensional, Euclidean space, which can then be used for visualizing relationships between the phylogenies and, potentially, to define clusters of similar trees (Figure 1). First, pairwise distances between all pairs of trees in the sample are computed (Figure 1a,b). Typically, measures of distances between trees rely on mapping each phylogeny to a vector of labelled numbers corresponding to pairwise comparisons of tips or internal nodes and then computing the Euclidean distance between the resulting vectors (Figure S1). treespace implements an extensive selection of distances relying on this principle (Kendall & Colijn, 2015; Pavoine et al., 2008; Robinson & Foulds, 1979, 1981; Steel & Penny, 1993; Williams & Clifford, 1971), as well as the BHV metric (Billera, Holmes, & Vogtmann, 2001), which directly computes distances between trees without intermediate feature extraction (Table 1).
Once pairwise distances between trees are computed, they are decomposed into a low‐dimensional space using metric multidimensional scaling (MDS), also known as principal coordinate analysis (PCoA, Gower, 1966; Dray & Dufour, 2007; Legendre & Legendre, 2012). This method finds independent (uncorrelated) synthetic variables, the “principal components” (PCs), which represent as well as possible the original distances inside a lower‐dimensional space (Figure 1c). By inspecting the proportion of the total distances between trees represented by specific axes (the “eigenvalues” of the different PCs), one can assess the number of relevant PCs to examine and, ideally, separate structured phylogenetic variation from random noise (Legendre & Legendre, 2012). Importantly, MDS can only be applied to Euclidean distances (Legendre & Legendre, 2012). In the case of non‐Euclidean tree distances (Billera et al., 2001; Robinson & Foulds, 1981), we use Cailliez's transformation (Cailliez, 1983) to render these distances Euclidean before MDS.
Exploring tree spaces using MDS allows the main features of a given phylogenetic landscape to be explored and evaluated. In particular, the resulting typology may exhibit discrete clusters of related trees (the “phylogenetic islands”), indicating that several distinct phylogenies may actually be supported by the data (Figure 1c). To identify such clusters formally, we implemented various hierarchical clustering methods based on the projected distances, including the single linkage, complete linkage, Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and Ward's method (Legendre & Legendre, 2012).
This approach allows the user to seek representative trees for each cluster separately (Figure 1d). A method for selecting such representative trees is given in Kendall and Colijn (2015) and implemented in treespace as the function “medTree.” This function identifies the geometric median tree(s), which are the tree(s) closest to the mean of the Kendall–Colijn tree vectors for a given cluster. Such trees serve as alternatives to other summary tree approaches such as the consensus tree (Felsenstein, 1985) or the maximum clade credibility (MCC) tree (Drummond & Rambaut, 2007; Ronquist & Huelsenbeck, 2003), with the key advantage that they correspond to specific trees in the sample, thus avoiding implausible negative branch lengths (Heled & Bouckaert, 2013). However, given a collection of trees in a cluster, any summary approach such as MCC could be used.
All the functionalities described above are implemented in treespace as standard R functions, fully documented in a vignette tutorial, as well as in a user‐friendly web interface for interactive data analysis. This interface can be started locally (i.e. without Internet connection) from R using a simple instruction (treespaceServer()) and, therefore, demands virtually no knowledge of the R language. Alternatively, we also provide an online instance of the application at http://shiny.imperial-stats-experimental.co.uk/users/mlkendal/treespace

Free full text: Click here

Jombart T., Kendall M., Almagro‐Garcia J, & Colijn C. (2017). treespace: Statistical exploration of landscapes of phylogenetic trees. Molecular Ecology Resources, 17(6), 1385-1392.

Publication 2017

Axes Mesquite Steel Trees Vectors

Corresponding Organization : Imperial College London

Other organizations : Wellcome Centre for Human Genetics, University of Oxford

Top 5 similar protocols

Protocol cited in 37 other protocols

Variable analysis

Dependent Variables not detected.
Unable to provide accurate variables.

Annotations

Based on most similar protocols

Etiam vel ipsum. Morbi facilisis vestibulum nisl. Praesent cursus laoreet felis. Integer adipiscing pretium orci. Nulla facilisi. Quisque posuere bibendum purus. Nulla quam mauris, cursus eget, convallis ac, molestie non, enim. Aliquam congue. Quisque sagittis nonummy sapien. Proin molestie sem vitae urna. Maecenas lorem.

As authors may omit details in methods from publication, our AI will look for missing critical information across the 5 most similar protocols.

About PubCompare

Our mission is to provide scientists with the largest repository of trustworthy protocols and intelligent analytical tools, thereby offering them extensive information to design robust protocols aimed at minimizing the risk of failures.

We believe that the most crucial aspect is to grant scientists access to a wide range of reliable sources and new useful tools that surpass human capabilities.

However, we trust in allowing scientists to determine how to construct their own protocols based on this information, as they are the experts in their field.

Ready to get started?

Revolutionizing how scientists
search and build protocols!