EPA-PTP: Open Reference Species Delimitation

In the following, we describe the open reference species delimitation pipeline that combines the EPA with PTP (EPA-PTP). The EPA first places a large number of query sequences (short reads) onto the branches of a given reference phylogeny. Thereafter, we execute PTP separately and independently on the query sequences assigned to each branch. This allows us to annotate each branch of the reference tree with the number of species induced by the query sequences that were placed on it. The input of our pipeline is a reference alignment, in which each sequence represents one species, and a reference phylogeny for that alignment. The PTP method and the pipeline are implemented in Python and rely on the Python Environment for Tree Exploration package (Huerta-Cepas et al., 2010) for tree manipulation and visualization.
Our pipeline executes the following steps:

Run UCHIME (Edgar et al., 2011) against the reference alignment to remove chimeric query sequences.

Use EPA to place the query sequences onto the reference tree. Sequences that have a maximum placement likelihood weight of <0.5 (i.e. an uncertain placement, see Berger et al., 2011 for details) are discarded.
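The filtering rule in this step, together with the per-branch grouping that the next step relies on, can be sketched as follows. This is a minimal illustration rather than the pipeline's actual code: the input layout (a dictionary mapping each query name to its candidate branches and likelihood weights) and the function name `filter_and_group` are assumptions made for the example. Only the 0.5 threshold comes from the text.

```python
from collections import defaultdict

MIN_WEIGHT = 0.5  # threshold stated in the text


def filter_and_group(placements, min_weight=MIN_WEIGHT):
    """Keep confidently placed queries and group them by branch.

    placements: dict mapping a query name to a list of
    (branch_id, likelihood_weight) candidates. This layout is a
    hypothetical stand-in for real EPA output.
    """
    by_branch = defaultdict(list)
    for query, candidates in placements.items():
        if not candidates:
            continue
        # The best placement is the candidate with the highest weight.
        branch, weight = max(candidates, key=lambda c: c[1])
        if weight < min_weight:
            continue  # uncertain placement: discard the query
        by_branch[branch].append(query)
    return dict(by_branch)


example = {
    "read1": [(3, 0.92), (4, 0.08)],
    "read2": [(3, 0.45), (7, 0.40), (8, 0.15)],
    "read3": [(7, 0.50), (8, 0.50)],
}
groups = filter_and_group(example)
print(groups)
```

Here `read2` is dropped because its best weight is below 0.5, while `read3` sits exactly at the threshold and is kept, since the text discards only weights strictly below 0.5.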

For each branch in the reference tree, we extract the set of query sequences that have been placed into that branch and infer a tree on them using RAxML (Stamatakis, 2006). Because the PTP method requires a correctly rooted tree, we use one of two rooting strategies, depending on the branch type. If the branch leads to a tip, we extend the alignment of query sequences by the reference sequence at that tip and by the reference sequence that is furthest away from that tip. The most distant sequence is used as the outgroup. Keep in mind that the tree will thereby be rooted at the longest branch (see the discussion below). For query sequences placed on internal branches, we use the RAxML -g constraint tree option to obtain a rooted tree of the query sequences. The constraint tree consists of the bifurcating reference tree and a polytomy comprising the query sequences attached to the reference tree branch under consideration. The result of this constrained ML tree search is a resolved tree of query sequences that are attached to the reference tree branch. The attachment point is used as the root.

Because we assume that the reference phylogeny is a species tree that reflects our knowledge about the speciation process and rate, we initially estimate the speciation rate λs only once, on the reference phylogeny. Thereafter, we apply PTP to each query sequence tree (one for each branch of the reference phylogeny) to delimit species. Note that in this scenario we only need to estimate the coalescent rate λc, as λs remains fixed.
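The effect of fixing one rate can be shown with a simplified scoring sketch. It assumes the usual PTP modelling choice that branch lengths within each process are exponentially distributed, so that a rate's maximum likelihood estimate is the inverse of the mean branch length. The names `lambda_s` (speciation rate) and `lambda_c` (coalescent rate), the toy branch lengths, and all function names are illustrative assumptions. The sketch only scores a given split of branches into the two classes and does not search over delimitations as PTP does.

```python
import math


def exp_rate_mle(branch_lengths):
    """ML estimate of an exponential rate: 1 / mean branch length."""
    return len(branch_lengths) / sum(branch_lengths)


def exp_log_likelihood(branch_lengths, rate):
    """Log-likelihood of branch lengths under Exponential(rate)."""
    return sum(math.log(rate) - rate * x for x in branch_lengths)


def score_delimitation(speciation_branches, coalescent_branches, lambda_s):
    """Score one candidate delimitation with lambda_s held fixed.

    Only lambda_c is re-estimated, from the branches assigned to the
    coalescent (within-species) class. Returns (log_lik, lambda_c).
    """
    lambda_c = exp_rate_mle(coalescent_branches)
    log_lik = (exp_log_likelihood(speciation_branches, lambda_s)
               + exp_log_likelihood(coalescent_branches, lambda_c))
    return log_lik, lambda_c


# Estimated once, on (toy) reference tree branch lengths.
lambda_s = exp_rate_mle([0.10, 0.20, 0.30])

# Two candidate splits of the same six query-tree branches.
ll_a, lambda_c_a = score_delimitation(
    [0.25, 0.15], [0.01, 0.02, 0.01, 0.02], lambda_s)
ll_b, lambda_c_b = score_delimitation(
    [0.25, 0.15, 0.02, 0.02], [0.01, 0.01], lambda_s)
print(ll_a > ll_b)
```

With these toy values, the split that treats all four short branches as within-species branches scores higher than the one that treats two of them as speciation events.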

When PTP is applied to a placement of query sequences on a terminal branch, those queries that are delimited as one population with the reference sequence at the tip will be assigned taxonomically to the species represented by this reference sequence. Otherwise, they are identified as new species in the reference tree.

As mentioned previously, we also combined EPA with CROP (EPA-CROP). This method works like EPA-PTP, the only difference being that CROP is used instead of PTP to calculate the number of molecular operational taxonomic units (MOTUs) for each placement.


Zhang J., Kapli P., Pavlidis P., & Stamatakis A. (2013). A general species delimitation method with applications to phylogenetic placements. Bioinformatics, 29(22), 2869-2876.

Publication 2013



Other organizations : University of Lübeck, Heidelberg Institute for Theoretical Studies, University of Crete, Hella (Germany), Foundation for Research and Technology Hellas


Variable analysis

independent variables

Choice of pipeline (EPA-PTP or EPA-CROP)

dependent variables

Number of species induced by the query sequences placed into each branch of the reference tree

control variables

Reference alignment where each sequence represents one species
Reference phylogeny for the reference alignment

positive controls

None specified

negative controls

None specified
