All methods were provided with the input rooted gene tree and, where appropriate, the rooted species tree (Forester and DLCpar). No other parameters required specification for any of the other methods. The rooted gene trees were provided as part of the simulated data for the flies and primates datasets. Multiple sequence alignment (MSA) files were provided for the metazoa dataset. For this dataset, gene trees were inferred from the MSAs using FastTree so as to also include a potential level of tree inference error and were rooted with reconroot [32 (link)]. The OrthoFinder rooting algorithm was not used so as to avoid inadvertently biasing the results in favor of OrthoFinder. All methods were provided with the same input rooted gene trees. The complete set of gene duplication events identified by each of the methods was compared against the ground truth gene duplication events. An inferred gene duplication was identified as correct if the two sets of genes observed post-duplication exactly matched the two sets of genes post-duplication from the ground truth data.
The performance testing of the methods for identifying gene duplication events was performed on the orthogroup trees from the 4- to 128-species Fungi datasets as inferred by OrthoFinder with default parameters. The commands for Forester and DLCpar were run in parallel using GNU Parallel [42 ] using 16 threads on these gene trees. The OrthoFinder method was run via the “scripts/resolve.py” program included as part of the OrthoFinder distribution. To allow testing, the species-overlap method was also implemented in OrthoFinder and was run using the same program with the option “--no_resolve.”