To evaluate how well detection methods identify convergent sites, we performed two types of simulations. In the first, we simulated under convergent evolution while varying the parameters of the evolutionary model (e.g., the number of convergent transitions); this allowed us to estimate the sensitivity of the methods. In the second, we simulated without any event of convergent evolution; this allowed us to assess their specificity. In each case, we simulated 1,000 sites.

To simulate convergent evolution, we aimed to place events of convergent evolution uniformly on a species tree, irrespective of branch length. Because we were interested in how the number of convergent events affects our power to detect convergence, we placed between two and seven events. To avoid any bias in the location of these events, we always drew exactly seven potential events uniformly, constrained so that all events lay in independent clades, and then subsampled the desired number of convergent events from these seven. All branches in the clades below the retained events were labeled “convergent,” and all other branches (above these events and in the nonconvergent clades) were labeled “ancestral.” One amino acid fitness profile, cx, was used for ancestral branches and another, cy, for convergent branches; on the branch where the switch to the convergent phenotype was positioned, we applied the OneChange model with the cy profile, placing the switch at the very beginning of the branch. We drew amino acid profiles at random from the C60 model (Quang et al., 2008) (supplementary fig. S1, Supplementary Material online); to save computation time and slightly reduce our carbon footprint, we did not attempt to test all pairs of C60 profiles. We also performed additional simulations in which more than one profile was used on branches with the ancestral phenotype (supplementary figs. S8–S10, Supplementary Material online).
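The event-placement procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy tree (a balanced binary tree stored as a child-to-parent map), the function names `balanced_tree`, `draw_independent_events` and `label_branches`, and the use of rejection sampling to enforce independent clades are all hypothetical. Each branch is identified by the node it leads to and is drawn with equal probability, irrespective of branch length. The branch carrying an event is labeled "switch" (where the OneChange model with cy would apply), branches below it "convergent," and all others "ancestral."

```python
import random


def balanced_tree(depth):
    """Hypothetical toy tree: a heap-indexed balanced binary tree.

    Node 1 is the root and node i has children 2i and 2i+1. Returns a dict
    mapping every non-root node to its parent; each non-root node also
    names the branch leading to it.
    """
    n_nodes = 2 ** (depth + 1) - 1
    return {node: node // 2 for node in range(2, n_nodes + 1)}


def ancestors(node, parent):
    """Set of all nodes strictly above `node`, up to and including the root."""
    above = set()
    while node in parent:
        node = parent[node]
        above.add(node)
    return above


def draw_independent_events(parent, n_candidates=7, rng=random, max_tries=10_000):
    """Draw branches uniformly (ignoring branch length) by rejection sampling,
    keeping only draws in which no chosen branch lies below another one."""
    branches = sorted(parent)
    for _ in range(max_tries):
        picks = rng.sample(branches, n_candidates)
        chosen = set(picks)
        if all(not (ancestors(b, parent) & chosen) for b in picks):
            return picks
    raise RuntimeError("could not find enough independent clades")


def label_branches(parent, events):
    """Label each branch as 'switch', 'convergent' or 'ancestral'."""
    events = set(events)
    labels = {}
    for branch in parent:
        if branch in events:
            labels[branch] = "switch"      # OneChange with cy, switch at branch start
        elif ancestors(branch, parent) & events:
            labels[branch] = "convergent"  # profile cy below an event
        else:
            labels[branch] = "ancestral"   # profile cx everywhere else
    return labels


rng = random.Random(1)
tree = balanced_tree(5)                         # 32 leaves, 62 branches
candidates = draw_independent_events(tree, 7, rng)
events = rng.sample(candidates, 3)              # e.g. keep 3 of the 7 candidates
labels = label_branches(tree, events)
```

Rejection sampling keeps every admissible set of seven non-nested branches equally likely, which matches the goal of unbiased placement; on much larger trees a constructive sampler might be faster.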
Although C60 was built to describe amino acid sequence evolution in a time-homogeneous manner, we assume that this limited set of profiles provides a rough approximation of the set of possible amino acid profiles. In addition to the simulations with convergent events, which we used to measure the proportions of True Positives (TP) and False Negatives (FN) of the methods, we performed similar simulations (i.e., using the same trees) in which the ancestral profile was used on all branches of the phylogeny, to measure their proportions of True Negatives (TN) and False Positives (FP).
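The four counts translate into sensitivity and specificity as in the sketch below. The per-site score and the detection threshold are hypothetical placeholders for whatever statistic a given method reports, and the function name is illustrative. Every site from a convergent simulation is treated as truly convergent and every site from the matched null simulation as truly nonconvergent.

```python
def sensitivity_specificity(conv_scores, null_scores, threshold):
    """Hypothetical helper: a site is called convergent when score >= threshold.

    conv_scores: scores of sites simulated under convergence (all truly convergent)
    null_scores: scores of sites from the matched null simulation (none convergent)
    """
    tp = sum(score >= threshold for score in conv_scores)  # detected convergent sites
    fn = len(conv_scores) - tp                               # missed convergent sites
    fp = sum(score >= threshold for score in null_scores)  # spurious detections
    tn = len(null_scores) - fp                               # correctly rejected sites
    return {
        "TP": tp, "FN": fn, "TN": tn, "FP": fp,
        "sensitivity": tp / (tp + fn),   # TP / (TP + FN)
        "specificity": tn / (tn + fp),   # TN / (TN + FP)
    }


# Toy example with made-up scores (not results from the study).
res = sensitivity_specificity([0.9, 0.8, 0.3, 0.95], [0.1, 0.6, 0.2, 0.05], 0.5)
print(res["sensitivity"], res["specificity"])  # 0.75 0.75
```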
Sequence evolution was simulated along the phylogenetic tree with bppseqgen (Dutheil and Boussau, 2008), using the model associated with each branch and modeling rate heterogeneity across sites with a Gamma distribution (α = 1.0) discretized into four categories (Yang, 1994).
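For α = 1.0, a Gamma distribution with mean 1 reduces to an exponential distribution, so the four equiprobable category rates can be computed in closed form. The sketch below uses the category-mean variant described by Yang (1994); whether bppseqgen uses category means or medians is an assumption not settled here, and the function name is hypothetical. For other values of α, the category boundaries would instead require the inverse incomplete gamma function (e.g., `scipy.stats.gamma.ppf`).

```python
import math
import random


def discrete_gamma_alpha1(n_cat=4):
    """Mean rate of each of n_cat equiprobable categories of Gamma(alpha=1, mean 1),
    i.e. an Exp(1) distribution (category-mean discretization)."""
    # Category boundaries: quantiles of Exp(1) at k / n_cat, plus +infinity.
    bounds = [-math.log(1 - k / n_cat) for k in range(n_cat)] + [math.inf]

    def upper_tail(x):
        # Integral of t * exp(-t) from x to infinity equals (x + 1) * exp(-x).
        return 0.0 if math.isinf(x) else (x + 1) * math.exp(-x)

    # Each category has probability 1 / n_cat, so its mean is n_cat * integral.
    return [n_cat * (upper_tail(a) - upper_tail(b)) for a, b in zip(bounds, bounds[1:])]


rates = discrete_gamma_alpha1(4)
print([round(r, 4) for r in rates])  # [0.137, 0.4768, 1.0, 2.3863]

# Each simulated site draws one category uniformly; its branch lengths are scaled by that rate.
rng = random.Random(0)
site_rates = [rng.choice(rates) for _ in range(1000)]
```

The rates average exactly 1, so the expected number of substitutions per site along a branch is unchanged by the discretization.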