We used the program Bayesian Serial SimCoal (BSSC [22]) to simulate DNA sequence data under different structural and demographic models. We simulated a 600 bp fragment of the mitochondrial D-loop, a marker commonly used in BSP studies because of its high nucleotide diversity. The sequences evolved under an HKY model with kappa = 50, gamma-distributed rate heterogeneity (shape parameter 0.5) and a rate of 32% per million years per bp [6], equivalent to 0.001344 mutations per sequence per generation given the estimated buffalo generation time of 7 years [20]. This rate is subject to estimation uncertainty, but here it serves to provide a conversion between genetic distance and real time. We emulated the actual marker used in the buffalo case study to facilitate comparisons, and because BSPs are almost always used in the context of dated genealogies with time measured in years. For each scenario, we carried out 100 replicate simulations to incorporate coalescent stochasticity [23] and to identify general patterns across stochastic replicates of the same demographic history. In effect, this corresponds to simulating 100 unlinked genetic markers, each with the high information content of the D-loop. We were thus able to assess the performance of multi-locus inference and ensure that our conclusions were not limited by the use of a single locus. This makes our results more comparable to the multi-locus data sets that are likely to become common in the genomic era. Two example BSSC input files are supplied to show the details of our simulations (File S1 and File S2).
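The conversion from the per-site rate to the per-sequence, per-generation rate can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and its parameters are hypothetical and are not part of BSSC or its input format; the numerical values are those stated above (32% per million years per bp, a 7-year generation time, and a 600 bp fragment).

```python
def per_sequence_rate_per_generation(percent_per_myr, generation_years, seq_length_bp):
    """Convert a substitution rate in % per million years per bp into
    expected mutations per sequence per generation.

    Hypothetical helper for illustration; not part of BSSC.
    """
    # 32% per million years -> 0.32 substitutions per site per million years
    per_site_per_year = (percent_per_myr / 100.0) / 1e6
    # Scale from years to generations
    per_site_per_generation = per_site_per_year * generation_years
    # Scale from a single site to the whole fragment
    return per_site_per_generation * seq_length_bp


# Values taken from the text: 32%/Myr/bp, 7-year generations, 600 bp fragment
rate = per_sequence_rate_per_generation(32.0, 7.0, 600)
print(f"{rate:.6f}")  # prints 0.001344
```

Written out, the calculation is 0.32 / 10^6 per site per year, times 7 years per generation, times 600 sites, which gives 0.001344 mutations per sequence per generation.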