Error rates and the power of the tests reported in the previous sections were derived using sequence data simulated under the following protocol. We used the trees, base frequencies, branch lengths (assuming neutral evolution), and nucleotide substitution biases fitted to the two ZA RT samples from our study (74 sequences each) to simulate 100 replicates (200 codons in each). A neighbor-joining tree (using the Tamura–Nei distance metric [67]) was reconstructed from each data replicate and used for further inference, allowing us to investigate whether the power and error rates of the tests were unduly influenced by errors in phylogenetic reconstruction. Previous studies [22] and the simulation results presented here (Figure S7) suggest that fixed effect likelihood methods are able to infer site-specific substitution rates accurately, on average, with moderate smoothing effects for larger rates (due to a fairly small sample size). With that in mind, we set out to generate sequences under a distribution of substitution rates similar to the rates that have influenced our real samples. Having fitted the IFEL model (and thus three rates: αs,
the nonsynonymous rate on internal branches, and the nonsynonymous rate on terminal branches) to all four samples, we pooled each type of estimated rate into the following seven bins: [0,0.25), [0.25,0.5), [0.5,1), [1,1.5), [1.5,2), [2,4), and [4,∞), and represented each bin by its midpoint (except the final bin, which was represented by 8). For each codon, we drew αs,
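the internal-branch nonsynonymous rate, and the terminal-branch nonsynonymous rate from the appropriate estimated rate distribution (also shown in Figure S8). Sampling from distributions with identical supports ensured that a sufficient proportion of sites was generated under the null distribution (e.g., equal synonymous and internal-branch nonsynonymous rates for IFEL, and no difference between the two samples for the differential selection test). For the evaluation of the differential selection test, we picked successive pairs of simulations (1−2, 2−3, 3−4, …, 99−100), for a total of 99 runs of the analysis.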
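
The binning, per-codon sampling, and pairing steps described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the toy rate estimates, and the use of Python's standard `random` module are assumptions made for illustration only. The bin edges and the representative values (midpoints, with 8 for the open-ended final bin) are taken from the text.

```python
import bisect
import random
from collections import Counter

# Lower edges of the seven bins from the text: [0,0.25), [0.25,0.5), [0.5,1),
# [1,1.5), [1.5,2), [2,4), [4,inf).
BIN_EDGES = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0, 4.0]
# Each bin is represented by its midpoint, except the open-ended last bin (8).
BIN_VALUES = [0.125, 0.375, 0.75, 1.25, 1.75, 3.0, 8.0]


def bin_index(rate):
    """Return the index of the bin containing a non-negative rate."""
    if rate < 0:
        raise ValueError("rates must be non-negative")
    return bisect.bisect_right(BIN_EDGES, rate) - 1


def binned_distribution(estimates):
    """Pool rate estimates into the seven bins and return bin probabilities."""
    counts = Counter(bin_index(r) for r in estimates)
    total = len(estimates)
    return [counts.get(i, 0) / total for i in range(len(BIN_VALUES))]


def draw_rates(probabilities, n_codons, rng):
    """Draw one representative rate per codon from a binned distribution."""
    return rng.choices(BIN_VALUES, weights=probabilities, k=n_codons)


def successive_pairs(n_replicates):
    """Pairs (1, 2), (2, 3), ..., (n-1, n) of 1-based replicate indices."""
    return [(i, i + 1) for i in range(1, n_replicates)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical pooled estimates; real values would come from the fitted model.
    toy_estimates = [0.0, 0.1, 0.3, 0.6, 0.9, 1.2, 1.7, 2.5, 5.0, 12.0]
    probs = binned_distribution(toy_estimates)
    rates = draw_rates(probs, n_codons=200, rng=rng)
    print(probs)
    print(len(rates), len(successive_pairs(100)))
```

In this sketch one binned distribution would be built per rate type, and `draw_rates` would be called once per rate type for each 200-codon replicate. Because every rate type is drawn from the same set of representative values, some codons receive identical values for two rates, which is how sites under the null arise. `successive_pairs(100)` yields the 99 pairs used for the differential selection test.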