Benchmarking Gene Duplication Inference

The accuracy of gene duplication event inference was tested on the simulated “flies” and “primates” datasets from [32] and a simulated “metazoa” dataset from [34]. To model real data, the flies and primates datasets used known species trees and parameters for divergence times, duplication rates, loss rates, population sizes, and generation times. Trees were simulated with varying effective population sizes and duplication rates so as to model incomplete lineage sorting [32, 34]. The flies dataset consisted of 12,000 trees with 12 species and 12,032 gene duplication events. The primates dataset consisted of 7500 trees with 17 species and 16,066 gene duplication events. The metazoa dataset was intended to emulate the complexity of real data by using heterogeneity in rates of duplication and loss and a complex model of sequence evolution, and then inferring trees with a homogeneous, simple model [34]. It consisted of 2000 gene trees with 40 species and 4967 gene duplication events. For comparison, Forester [29], DLCpar (full), DLCpar (search) [32], and the overlap algorithm (i.e., without OrthoFinder’s tree resolution) were also tested.
All methods were provided with the input rooted gene tree and, where appropriate, the rooted species tree (Forester and DLCpar). No other parameters required specification for any of the methods. The rooted gene trees were provided as part of the simulated data for the flies and primates datasets. Multiple sequence alignment (MSA) files were provided for the metazoa dataset. For this dataset, gene trees were inferred from the MSAs using FastTree, so as to also include a potential level of tree inference error, and were rooted with reconroot [32]. The OrthoFinder rooting algorithm was not used, so as to avoid inadvertently biasing the results in favor of OrthoFinder. All methods were provided with the same input rooted gene trees. The complete set of gene duplication events identified by each method was compared against the ground truth gene duplication events. An inferred gene duplication was counted as correct if the two sets of genes observed post-duplication exactly matched the two sets of genes post-duplication in the ground truth data.
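The exact-match criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring code used in the study: the function names and the reporting of precision and recall are assumptions made for the example. The text only states that an inferred duplication counts as correct when both post-duplication gene sets match the ground truth exactly.

```python
from typing import FrozenSet, Iterable, Tuple

# A duplication event is represented as an unordered pair of gene sets
# (the two groups of genes descending from the duplication node).
Duplication = FrozenSet[FrozenSet[str]]


def make_duplication(genes_a: Iterable[str], genes_b: Iterable[str]) -> Duplication:
    """Build an order-independent representation of one duplication event."""
    return frozenset({frozenset(genes_a), frozenset(genes_b)})


def score_duplications(
    inferred: Iterable[Duplication], truth: Iterable[Duplication]
) -> Tuple[int, float, float]:
    """Count exact matches and report precision and recall.

    Precision/recall are an assumption of this sketch; the source only
    defines when a single inferred duplication counts as correct.
    """
    inferred_set = set(inferred)
    truth_set = set(truth)
    true_positives = len(inferred_set & truth_set)
    precision = true_positives / len(inferred_set) if inferred_set else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    return true_positives, precision, recall


# Hypothetical gene names, for illustration only.
truth_events = [
    make_duplication(["a1", "b1"], ["a2", "b2"]),
    make_duplication(["a1"], ["b1"]),
]
inferred_events = [
    make_duplication(["a2", "b2"], ["b1", "a1"]),  # same event, different order
    make_duplication(["a1"], ["a2"]),              # not in the ground truth
]
result = score_duplications(inferred_events, truth_events)
```

Storing each event as a frozenset of frozensets makes the comparison independent of both the order of the two gene sets and the order of genes within each set, so only the set membership decides whether an event matches.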
Performance testing of the methods for identifying gene duplication events was carried out on the orthogroup trees from the 4- to 128-species Fungi datasets, as inferred by OrthoFinder with default parameters. The Forester and DLCpar commands were run on these gene trees in parallel with GNU Parallel [42] using 16 threads. The OrthoFinder method was run via the “scripts/resolve.py” program included as part of the OrthoFinder distribution. To allow testing, the species-overlap method was also implemented in OrthoFinder and was run using the same program with the option “--no_resolve”.
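For readers unfamiliar with the species-overlap method, the following Python sketch shows the general idea: an internal node of a rooted gene tree is called a duplication when the species sets beneath its two children share at least one species. This is not OrthoFinder's implementation. The nested-tuple tree representation, the `species_of` mapping, and the function name are all assumptions made for illustration, and the sketch handles strictly binary trees only.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Assumed representation: a leaf is a gene name (str); an internal node
# is a 2-tuple of subtrees. Real gene trees would be parsed from Newick.
Tree = Union[str, Tuple["Tree", "Tree"]]
GeneSet = FrozenSet[str]


def find_overlap_duplications(
    tree: Tree, species_of: Dict[str, str]
) -> List[Tuple[GeneSet, GeneSet]]:
    """Return the (left genes, right genes) pair for every node whose
    two child subtrees contain at least one species in common."""
    duplications: List[Tuple[GeneSet, GeneSet]] = []

    def visit(node: Tree) -> GeneSet:
        if isinstance(node, str):
            return frozenset([node])
        left, right = node
        genes_left = visit(left)
        genes_right = visit(right)
        species_left = {species_of[g] for g in genes_left}
        species_right = {species_of[g] for g in genes_right}
        if species_left & species_right:
            # Shared species on both sides: call this node a duplication.
            duplications.append((genes_left, genes_right))
        return genes_left | genes_right

    visit(tree)
    return duplications


# Hypothetical example: species A and B each have two gene copies.
example_tree: Tree = ((("A1", "B1"), ("A2", "B2")), "C1")
example_species = {"A1": "A", "B1": "B", "A2": "A", "B2": "B", "C1": "C"}
found = find_overlap_duplications(example_tree, example_species)
```

In this example only the node joining (A1, B1) with (A2, B2) is reported, because both sides contain species A and B. The root is not reported, since species C appears on one side only.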

Emms D.M, & Kelly S. (2019). OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology, 20, 238.

Publication 2019

Evolution Flies Fungi Gene Gene duplication Gene tests Heterogeneity Homogenous Metazoa Primates Sequence alignment Trees

Other organizations : University of Oxford

Protocol cited in 64 other protocols

Variable analysis

independent variables

Varying effective population sizes
Varying duplication rates

dependent variables

Gene duplication event inference accuracy

control variables

Known species trees
Parameters for divergence times
Parameters for duplication rates
Parameters for loss rates
Parameters for population sizes
Parameters for generation times
Rooted gene trees
Rooted species tree (for Forester and DLCpar)
