If we have samples from $P$ populations, the numbers of sampled sequences from each population are $n_1, n_2, \ldots, n_P$. (For diploids, $n_1$ is typically twice the number of individuals sampled from population 1.) Entry $[d_1, d_2, \ldots, d_P]$ of the AFS records the number of diallelic polymorphic sites at which the derived allele was found in $d_1$ samples from population 1, $d_2$ from population 2, and so forth. (If ancestral alleles cannot be determined, then the “folded” AFS can be considered, in which entries correspond to the frequency of the minor allele.)
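As an illustration of the bookkeeping described above, the following minimal sketch tabulates a joint AFS from per-site derived-allele counts and then folds it. The function names `joint_afs` and `fold`, the input layout, and the tie-breaking rule used when both alleles are equally common in the pooled sample are assumptions made for this example; they are not taken from any particular software.

```python
import numpy as np

def joint_afs(counts, sample_sizes):
    """Tabulate the joint AFS.

    counts: (n_sites, P) integer array; counts[s, i] is the number of
            derived alleles seen at site s in the sample from population i.
    sample_sizes: sequence (n_1, ..., n_P) of sampled sequences.
    """
    shape = tuple(n + 1 for n in sample_sizes)
    afs = np.zeros(shape, dtype=int)
    # One index array per population; add.at accumulates repeated entries.
    np.add.at(afs, tuple(np.asarray(counts).T), 1)
    return afs

def fold(afs):
    """Fold the AFS so that entries refer to the minor allele.

    Entry d is merged with its complement n - d. The minor allele is
    judged on the pooled sample; ties are sent to the lexicographically
    smaller index (an assumed convention for this sketch).
    """
    n = np.array(afs.shape) - 1
    folded = np.zeros_like(afs)
    for d in np.ndindex(afs.shape):
        comp = tuple(int(k) for k in n - np.array(d))
        if sum(d) < sum(comp):
            target = d
        elif sum(d) > sum(comp):
            target = comp
        else:
            target = min(d, comp)
        folded[target] += afs[d]
    return folded

# Hypothetical data: four sites, two populations, n_1 = n_2 = 4.
counts = np.array([[1, 0], [1, 0], [2, 3], [0, 1]])
afs = joint_afs(counts, (4, 4))
folded = fold(afs)
print(afs[1, 0], afs[2, 3], folded[2, 1])
```

In the folded spectrum, the site with counts (2, 3) is recorded at (2, 1), because the derived allele is the major allele in the pooled sample of eight sequences.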
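We model the evolution of $\phi(x_1, x_2, \ldots, x_P, t)$, the density of derived mutations at relative frequencies $x_1, x_2, \ldots, x_P$ in populations $1, 2, \ldots, P$ at time $t$. (All $x_i$ run from 0 to 1.) Given an infinitely-many-sites mutational model [31] and Wright-Fisher reproduction in each generation, the dynamics of $\phi$ for an arbitrary finite number of populations $P$ are governed by a linear diffusion equation:

$$\frac{\partial \phi}{\partial \tau} = \frac{1}{2} \sum_{i=1}^{P} \frac{\partial^2}{\partial x_i^2} \left[ \frac{x_i (1 - x_i)}{\nu_i}\, \phi \right] - \sum_{i=1}^{P} \frac{\partial}{\partial x_i} \left[ \left( \gamma_i x_i (1 - x_i) + \sum_{j=1}^{P} M_{i \leftarrow j} (x_j - x_i) \right) \phi \right] \qquad (1)$$

Here $\tau$ is time in units of $2N_{\mathrm{ref}}$ generations, $\nu_i = N_i / N_{\mathrm{ref}}$ is the effective size of population $i$ relative to the reference size $N_{\mathrm{ref}}$, $\gamma_i = 2 N_{\mathrm{ref}} s_i$ is the scaled selection coefficient in population $i$, and $M_{i \leftarrow j} = 2 N_{\mathrm{ref}} m_{i \leftarrow j}$ is the scaled migration rate from population $j$ into population $i$. The first term models genetic drift, and the second term models selection and migration.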
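To make the origin of the drift term concrete, the sketch below simulates a single generation of neutral Wright-Fisher reproduction and compares the variance of the new allele frequency with $x(1-x)/(2N)$, the per-generation variance that the diffusion approximation carries over to continuous time. The population size, starting frequency, replicate count, and variable names are arbitrary choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

two_n = 200        # number of gene copies, 2N (hypothetical value)
x = 0.3            # current derived-allele frequency (hypothetical value)
reps = 200_000     # independent replicate populations

# Wright-Fisher reproduction: each of the 2N copies in the next
# generation is drawn independently from the current generation.
next_freq = rng.binomial(two_n, x, size=reps) / two_n

simulated_var = next_freq.var()
predicted_var = x * (1 - x) / two_n
print(next_freq.mean(), simulated_var, predicted_var)
```

The mean frequency is unchanged by drift, while the variance grows in proportion to $x(1-x)$ and inversely with population size, which is why smaller populations drift faster.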
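Changes in population size and migration alter the parameters in Equation 1, while population splits and mergers alter the dimensionality of $\phi$. For example, if new population 3 is admixed with a proportion $f$ from population 1 and $1 - f$ from population 2, then

$$\phi(x_1, x_2, x_3) = \phi(x_1, x_2)\, \delta\big(x_3 - [f x_1 + (1 - f) x_2]\big),$$

where $\delta$ denotes the Dirac delta function. To remove population 2, $\phi$ is integrated over $x_2$:

$$\phi(x_1, x_3) = \int_0^1 \phi(x_1, x_2, x_3)\, dx_2.$$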
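The admixture and removal steps can be combined numerically. The following is a minimal sketch, assuming the two-population density is stored on a uniform grid with trapezoid quadrature weights: each grid cell's mass is moved to the admixed frequency (a proportion f from population 1, the rest from population 2) and shared linearly between the two nearest grid points, which performs the integration over population 2 at the same time. The grid, the helper names `trapezoid_weights` and `admix_then_drop_pop2`, and the linear-interpolation treatment of the delta function are assumptions of this example, not a description of how any specific program handles these events.

```python
import numpy as np

def trapezoid_weights(x):
    """Trapezoid-rule quadrature weights for a uniform grid x."""
    h = x[1] - x[0]
    w = np.full(len(x), h)
    w[0] = w[-1] = h / 2
    return w

def admix_then_drop_pop2(phi12, x, f):
    """Return phi(x1, x3) from phi(x1, x2) on a uniform grid x.

    Population 3 receives a proportion f from population 1 and 1 - f
    from population 2; population 2 is then integrated out.
    """
    k = len(x)
    h = x[1] - x[0]
    w = trapezoid_weights(x)
    mass = phi12 * w[:, None] * w[None, :]        # probability mass per cell
    x3 = f * x[:, None] + (1 - f) * x[None, :]    # admixed frequency per cell
    pos = (x3 - x[0]) / h                         # fractional grid position
    lo = np.clip(np.floor(pos).astype(int), 0, k - 2)
    frac = pos - lo
    rows = np.broadcast_to(np.arange(k)[:, None], (k, k))
    mass13 = np.zeros((k, k))
    # Split each cell's mass between the two x3 grid points around it.
    np.add.at(mass13, (rows, lo), mass * (1 - frac))
    np.add.at(mass13, (rows, lo + 1), mass * frac)
    return mass13 / (w[:, None] * w[None, :])     # back to a density

# Hypothetical smooth test density, used only to exercise the function.
x = np.linspace(0.0, 1.0, 41)
X1, X2 = np.meshgrid(x, x, indexing="ij")
phi12 = X1 * (1 - X1) + X2 ** 2
phi13 = admix_then_drop_pop2(phi12, x, f=0.25)
```

Linear splitting conserves both the total mass and the mean admixed frequency on a uniform grid, which makes it a convenient check on the result; sharper schemes are possible.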
Given $\phi$, the expected value of each entry of the AFS, $M[d_1, d_2, \ldots, d_P]$, is found via a $P$-dimensional integral over all possible population allele frequencies of the probability of sampling $d_i$ derived alleles times the density of sites with those population allele frequencies. For SNP data obtained by resequencing, these probabilities are binomial, so

$$M[d_1, d_2, \ldots, d_P] = \int_0^1 \!\cdots\! \int_0^1 \phi(x_1, x_2, \ldots, x_P) \prod_{i=1}^{P} \binom{n_i}{d_i} x_i^{d_i} (1 - x_i)^{n_i - d_i}\, dx_i.$$

In some cases of ascertained data [33], the resulting bias can be corrected by modifying the above equation [11],[34].
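For two populations, the binomial sampling integral reduces to a pair of matrix products once the density is discretized. The sketch below assumes a midpoint-rule grid and uses the illustrative names `binomial_matrix` and `expected_afs_2pop`; the uniform density in the example is chosen only because its integral is known in closed form, not because it is biologically meaningful.

```python
from math import comb
import numpy as np

def binomial_matrix(n, x):
    """B[d, k] = C(n, d) * x_k**d * (1 - x_k)**(n - d) for d = 0..n."""
    d = np.arange(n + 1)[:, None]
    coeff = np.array([comb(n, j) for j in range(n + 1)], dtype=float)[:, None]
    return coeff * x[None, :] ** d * (1.0 - x[None, :]) ** (n - d)

def expected_afs_2pop(phi, x1, x2, w1, w2, n1, n2):
    """Expected AFS for two populations.

    phi[i, j] is the density at (x1[i], x2[j]); w1 and w2 are the
    quadrature weights of the two grids. The result has shape
    (n1 + 1, n2 + 1), indexed by derived-allele counts (d_1, d_2).
    """
    b1 = binomial_matrix(n1, x1) * w1   # fold the weights into the kernels
    b2 = binomial_matrix(n2, x2) * w2
    return b1 @ phi @ b2.T

# Midpoint grid on (0, 1); grid size and sample sizes are hypothetical.
k = 400
x = (np.arange(k) + 0.5) / k
w = np.full(k, 1.0 / k)
n1, n2 = 4, 6

phi_uniform = np.ones((k, k))           # numerical check only
afs_expected = expected_afs_2pop(phi_uniform, x, x, w, w, n1, n2)
print(afs_expected[1, 2], 1.0 / ((n1 + 1) * (n2 + 1)))
```

For a uniform density every entry should equal 1/((n_1 + 1)(n_2 + 1)), since the integral of each binomial term over the unit interval is 1/(n + 1). Densities that diverge at the boundaries, as solutions of the diffusion equation typically do, would need a grid and quadrature chosen with more care than this midpoint rule.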