Avian Phylogenomic Data Partitioning

For the avian phylogenomic dataset, we first removed all alignment sites that had been excluded by the original authors [43], and then defined one data block for each intron and one data block for each codon position of each exon. This yielded a total of 168 data blocks. We then performed 12,002 searches for partitioning schemes on this dataset, as described below.
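Because each exon contributes three blocks and each intron one, a dataset with n introns and m exons yields n + 3m data blocks. The sketch below writes such blocks as lines in the style of PartitionFinder's `[data_blocks]` section, where `start-end\3` selects every third column. The coordinates, block names and the helper `data_blocks` are hypothetical. The sketch assumes 1-based inclusive alignment columns, that sites have already been filtered, and that every exon starts at codon position 1.

```python
from typing import List, Tuple

# (name, first column, last column), 1-based and inclusive (hypothetical layout)
Region = Tuple[str, int, int]


def data_blocks(introns: List[Region], exons: List[Region]) -> List[str]:
    """Return one block line per intron and one per codon position of each exon."""
    lines = []
    for name, start, end in introns:
        # An intron is a single contiguous block.
        lines.append(f"{name} = {start}-{end};")
    for name, start, end in exons:
        for pos in range(3):
            # "\3" takes every third column, starting at codon position pos + 1.
            lines.append(f"{name}_pos{pos + 1} = {start + pos}-{end}\\3;")
    return lines


for line in data_blocks([("intron1", 1, 100)], [("exon1", 101, 400)]):
    print(line)
```

With one intron and one exon this produces four blocks. For the avian dataset, the text reports only the total of 168 blocks and not how they divide between introns and exons.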
We performed two searches for optimal partitioning schemes using the greedy algorithm [2]: one with the AICc and one with the BIC.
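The two information criteria have standard definitions. For log-likelihood lnL, K free parameters and n sites, AICc = -2 lnL + 2K + 2K(K+1)/(n - K - 1) and BIC = -2 lnL + K ln n. The sketch below pairs these definitions with a greedy merge search of the kind cited as [2]. The search starts with every block in its own subset. At each step it evaluates every pairwise merge and keeps the best one, stopping when no merge lowers the score. The `score` callable is a hypothetical stand-in for fitting models to each subset and computing the criterion for the whole scheme; it is not PartitionFinder's API.

```python
import math
from itertools import combinations
from typing import Callable, FrozenSet, List

Subset = FrozenSet[str]
Scheme = List[Subset]


def aicc(lnl: float, k: int, n: int) -> float:
    """Small-sample corrected AIC; requires n > k + 1."""
    return -2.0 * lnl + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def bic(lnl: float, k: int, n: int) -> float:
    """Bayesian Information Criterion."""
    return -2.0 * lnl + k * math.log(n)


def greedy_search(blocks: List[str], score: Callable[[Scheme], float]) -> Scheme:
    """Greedy merging: lower scores are better (as for AICc and BIC)."""
    scheme: Scheme = [frozenset([b]) for b in blocks]
    best = score(scheme)
    while len(scheme) > 1:
        candidates = []
        for a, b in combinations(scheme, 2):
            # Scheme obtained by merging subsets a and b.
            merged = [s for s in scheme if s not in (a, b)] + [a | b]
            candidates.append((score(merged), merged))
        cand_score, cand_scheme = min(candidates, key=lambda c: c[0])
        if cand_score >= best:
            break  # no pairwise merge improves the criterion
        best, scheme = cand_score, cand_scheme
    return scheme
```

Each step scores every pair of current subsets. This per-step cost is what the clustering algorithms below are designed to reduce.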
We performed 2000 searches for optimal partitioning schemes using the strict hierarchical clustering algorithm described above: 1000 using the BIC and 1000 using the AICc. Each search used one of 1000 distinct sets of clustering weights (the ‘--weights’ command-line option in PartitionFinder). Each set of clustering weights is a vector of four numbers that specifies the relative importance of four parameter categories: the overall subset rate, the base frequencies, the GTR model parameters, and the alpha parameter of the gamma distribution (see above). Analysing 1000 sets of weights allows us to compare the performance of different weighting schemes empirically, to determine the relative importance of the different parameter categories when searching for partitioning schemes, and to measure how much the algorithm’s performance varies across weighting schemes. The first 15 sets of weights comprise every combination in which each weight is either 1.0 or 0.0 and at least one weight is 1.0 (setting all weights to 0.0 is nonsensical, because it would make all subsets appear equally similar). These represent 15 of the 16 corners of a four-dimensional hypercube. They allow us to compare the case in which all parameter categories are given equal weight (i.e. --weights “1, 1, 1, 1”) with the 14 cases in which one or more parameter categories are given zero weight (e.g. --weights “1, 0, 0, 1”). The other 985 sets of weights were chosen by Latin Hypercube Sampling using the ‘lhs’ package, version 0.1, in R [48]. This procedure ensures that the sampled points are relatively evenly distributed in four-dimensional space, and it samples high-dimensional space more efficiently than a grid-based scheme.
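The 1000 weight vectors can be sketched as follows. The first 15 are the non-zero corners of the unit hypercube. The remaining 985 come from a basic Latin Hypercube Sample, in which each axis is divided into 985 equal strata and each stratum is used exactly once per axis. The authors used the R ‘lhs’ package. This Python version is a re-implementation of the basic technique using only the standard library, not a port of that package. The seed and helper names are hypothetical, and the axis order is assumed to follow the text (subset rate, base frequencies, GTR parameters, gamma alpha).

```python
import itertools
import random
from typing import List, Tuple

# (subset rate, base frequencies, GTR parameters, gamma alpha), assumed order
Weights = Tuple[float, ...]


def hypercube_corners(dim: int = 4) -> List[Weights]:
    """All 0/1 weight vectors except the all-zero one: 2**dim - 1 corners."""
    return [tuple(float(x) for x in c)
            for c in itertools.product((0, 1), repeat=dim) if any(c)]


def latin_hypercube(n: int, dim: int = 4, seed: int = 1) -> List[Weights]:
    """Basic Latin Hypercube Sample of n points in [0, 1)^dim."""
    rng = random.Random(seed)
    columns = []
    for _ in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across axes
        # One uniform draw inside each stratum [s/n, (s+1)/n).
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]


weights = hypercube_corners() + latin_hypercube(985)
```

Because every axis is stratified, each one-dimensional projection of the sample covers [0, 1) evenly. A grid with the same coverage would need far more points in four dimensions.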
We performed 10,000 searches for optimal partitioning schemes using the relaxed clustering algorithm described above: 5000 using the AICc and 5000 using the BIC. For each criterion, we ran one search for every combination of the 1000 sets of clustering weights and five values of the parameter P (1000 × 5 = 5000 searches). The 1000 sets of weights were identical to those used above. The parameter P defines the percentage of possible partitioning schemes that are considered at each step of the relaxed clustering algorithm, and we used values of 1%, 2%, 5%, 10%, and 20%.
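One step of relaxed clustering can be sketched as follows. First, rank all pairs of subsets by a weighted distance between their parameter estimates. Then score only the closest P% of candidate merges with the information criterion and return the best merged scheme. The choice of a weighted Euclidean distance, in which each weight multiplies the squared differences of its parameter category, is an assumption, as is the premise that the categories are already on comparable scales. The names `weighted_distance`, `candidate_pairs`, `relaxed_step` and the `score` callable are hypothetical. A full search would also re-estimate parameters for each newly merged subset and then repeat the step; neither is shown here.

```python
import math
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple

Subset = FrozenSet[str]
# Hypothetical layout: one list of estimates per parameter category, in the
# order (subset rate, base frequencies, GTR rates, gamma alpha).
Params = Sequence[Sequence[float]]


def weighted_distance(p: Params, q: Params, weights: Sequence[float]) -> float:
    """Weighted Euclidean distance between two subsets' parameter estimates."""
    total = 0.0
    for w, p_cat, q_cat in zip(weights, p, q):
        total += w * sum((a - b) ** 2 for a, b in zip(p_cat, q_cat))
    return math.sqrt(total)


def candidate_pairs(params: Dict[Subset, Params], weights: Sequence[float],
                    percent: float) -> List[Tuple[Subset, Subset]]:
    """The closest `percent`% of all subset pairs (always at least one pair)."""
    pairs = sorted(
        combinations(params, 2),
        key=lambda ab: weighted_distance(params[ab[0]], params[ab[1]], weights),
    )
    k = max(1, math.ceil(len(pairs) * percent / 100.0))
    return pairs[:k]


def relaxed_step(params: Dict[Subset, Params], weights: Sequence[float],
                 percent: float,
                 score: Callable[[List[Subset]], float]) -> Optional[List[Subset]]:
    """Score only the candidate merges; return the best scheme, or None if
    no candidate lowers the score of the current scheme."""
    scheme = list(params)
    best_score, best_scheme = score(scheme), None
    for a, b in candidate_pairs(params, weights, percent):
        merged = [s for s in scheme if s not in (a, b)] + [a | b]
        new_score = score(merged)
        if new_score < best_score:
            best_score, best_scheme = new_score, merged
    return best_scheme
```

In this sketch, P = 100 scores every pairwise merge, which makes the step equivalent to one step of a greedy search. Smaller values of P trade thoroughness for speed.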
The results of all 12,002 analyses presented here are available on figShare (http://dx.doi.org/10.6084/m9.figshare.938920).

Lanfear R., Calcott B., Kainer D., Mayer C., & Stamatakis A. (2014). Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evolutionary Biology, 14, 82.

Publication 2014

Other organizations: National Evolutionary Synthesis Center, Australian National University, Zoological Research Museum Alexander Koenig, Heidelberg Institute for Theoretical Studies, Karlsruhe Institute of Technology

Variable analysis

independent variables

Search strategies used to find partitioning schemes for the phylogenomic dataset:
- Greedy algorithm with the AICc
- Greedy algorithm with the BIC
- Strict hierarchical clustering algorithm: 1000 BIC-based and 1000 AICc-based searches, each criterion run with the same 1000 distinct sets of clustering weights
- Relaxed clustering algorithm: 5000 AICc-based and 5000 BIC-based searches, each criterion run with every combination of 1000 sets of clustering weights and five values of the parameter P (1%, 2%, 5%, 10%, and 20%)

dependent variables

Optimal partitioning schemes for the phylogenomic dataset from birds

control variables

Sites that the original authors had removed from the alignment [43] were also removed before any analysis.
