Inferring Demographic Parameters from DNA Sequences

We have tested our program ability to recover demographic parameters from DNA sequence data in four relatively plausible but distinct scenarios of population differentiation involving one to ten populations with migration (see Figure 1). In all cases, we simulated with fastsimcoal2 400,000 unlinked regions of 50 bp, thus totaling 20 Mb of DNA sequences, assuming a mutation rate of 2.5×10⁻⁸ bp⁻¹ per generation and an infinite-site model. Pseudo-observed SFS were also directly computed with fastsimcoal2. Parameters were estimated independently from ten data sets generated under each model. For each data set generated under models with one to three populations, we performed 50 parameter estimations via ECM maximization, and each time retained the parameter set with maximum likelihood. For the model with 10 populations we only performed 20 estimations per data sets, and used 50,000 simulations instead of 100,000 for the other models to estimate the expected SFS due to long computation times. We describe the four tested models in Figure 1, and the used parameter values are showed as red dots in Figures 2, 3, S4 and S5. Absolute numbers (generations, population sizes) were obtained by assuming that the mutation rate of 2.5×10⁻⁸ bp⁻¹ per generation was known.
As a benchmark, we used to infer the demographic parameters in scenarios shown in Figure 1A–1C involving up to three populations. For each generated data set, we performed 50 parameter estimations using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization method implemented in , and we retained the parameters associated with the maximum likelihood. We followed 's manual specification to set reasonable upper and lower bounds of the search ranges of the parameter. In all cases, the expected SFS was estimated by extrapolating the SFS inferred from 3 grid sizes set to 40, 50 and 60, which are in all cases larger than our maximum samples sizes (30 in the IM model case). The composite likelihood was computed using 's multinomial model, which is in fact a product of Poisson likelihoods, where the expected model entries are scaled to sum up to 1. This likelihood also ignores information about the expected and observed numbers of monomorphic and polymorphic sites used in our likelihood formulation (as well as in [20] (link)). Therefore, the ratio should be equal to showing that barring the terms, the two CLs differ by a single constant value. The difference between likelihoods computed with fastsimcoal and is illustrated in Figure S3 for the case of the bottleneck scenarios shown in Figures 1A. It shows that when monomorphic sites are not taken into account, fastsimcoal and indeed produce essentially identical likelihood profiles around true parameters. However, when monomorphic sites are used in the likelihood, the shape of the likelihood profiles differs, making it more or less peaky depending on the parameter. There is thus no clear advantage in using one or the other likelihood form for this scenario, but our use of monomorphic sites allows us to directly get absolute values of the parameters. We report in Figures 2, S4 and S5 only the results obtained for data sets for which 's best log likelihood was less than 10% lower than the largest log-likelihood obtained with the other data sets, and we considered not to have converged for the discarded data sets.

Free full text: Click here

Excoffier L., Dupanloup I., Huerta-Sánchez E., Sousa V.C, & Foll M. (2013). Robust Demographic Inference from Genomic and SNP Data. PLoS Genetics, 9(10), e1003905.

Publication 2013

A 1 5 bp Bp 400 Dna sequences Mutation Polymorphic Populations That

Corresponding Organization : École Polytechnique Fédérale de Lausanne

Top 5 similar protocols

Protocol cited in 176 other protocols

Variable analysis

independent variables

Population differentiation models (1 to 10 populations with migration)

dependent variables

Demographic parameters recovered from DNA sequence data

control variables

Mutation rate (2.5×10^-8 bp^-1 per generation)
Infinite-site model
Number of unlinked regions (400,000 of 50 bp, totaling 20 Mb of DNA sequences)
Number of parameter estimations per data set (50 for models with 1-3 populations, 20 for model with 10 populations)
Number of simulations for parameter estimation (100,000 for models with 1-3 populations, 50,000 for model with 10 populations)

positive controls

Fastsimcoal2 used to compute pseudo-observed SFS directly

negative controls

Not explicitly mentioned

Annotations

Based on most similar protocols

Etiam vel ipsum. Morbi facilisis vestibulum nisl. Praesent cursus laoreet felis. Integer adipiscing pretium orci. Nulla facilisi. Quisque posuere bibendum purus. Nulla quam mauris, cursus eget, convallis ac, molestie non, enim. Aliquam congue. Quisque sagittis nonummy sapien. Proin molestie sem vitae urna. Maecenas lorem.

As authors may omit details in methods from publication, our AI will look for missing critical information across the 5 most similar protocols.

About PubCompare

Our mission is to provide scientists with the largest repository of trustworthy protocols and intelligent analytical tools, thereby offering them extensive information to design robust protocols aimed at minimizing the risk of failures.

We believe that the most crucial aspect is to grant scientists access to a wide range of reliable sources and new useful tools that surpass human capabilities.

However, we trust in allowing scientists to determine how to construct their own protocols based on this information, as they are the experts in their field.

Ready to get started?

Revolutionizing how scientists
search and build protocols!