The number of possible gene flow donor-recipient combinations increases rapidly with the number of populations or species. A unified test for introgression has been developed for a five-taxon symmetric phylogeny and is implemented in the DFOIL package (Pease & Hahn 2015). However, no such framework currently exists for datasets with six or more taxa. A common approach is to perform the
D and ƒ4-ratio analyses on all four-taxon subsamples from the dataset (e.g. Green et al. 2010; Martin et al. 2013; vonHoldt et al. 2016; Kozak et al. 2018; Malinsky et al. 2018). However, the number of analyses that need to be performed grows very quickly. Even with a fixed outgroup, the number of combinations is $\binom{n}{3}$, i.e. n choose 3, where n is the number of taxa. For example, there are 1,140 different combinations of ((P1, P2), P3) in a dataset of 20 taxa, growing to 161,700 combinations in a dataset with 100 taxa. Interpreting the results of such a system of four-taxon tests is not straightforward: the different subsets are not independent as soon as the taxa share drift (that is, they share branches on the phylogeny), and therefore a single gene flow event can be responsible for many elevated
D and ƒ4-ratio results. At the same time, the correlations, especially of the ƒ4-ratio scores, can be informative about the timing of introgression events and about the specific donor-recipient combinations.
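The growth in the number of tests can be checked with a few lines of Python. This is only an illustrative sketch: the function names `count_trios` and `enumerate_trios` are hypothetical and not part of any package mentioned here, and the outgroup is assumed to be fixed and excluded from the taxon list.

```python
from itertools import combinations
from math import comb


def count_trios(n_taxa: int) -> int:
    """Number of three-taxon subsets (n choose 3) for a fixed outgroup."""
    return comb(n_taxa, 3)


def enumerate_trios(taxa):
    """List every unordered three-taxon subset of the ingroup taxa."""
    return list(combinations(taxa, 3))


print(count_trios(20))   # 1140
print(count_trios(100))  # 161700

# Enumerating the subsets explicitly gives the same count.
trios = enumerate_trios([f"taxon{i}" for i in range(20)])
print(len(trios))        # 1140
```

Each subset corresponds to one ((P1, P2), P3) test once the taxa are arranged according to the tree, which is why the counts match the figures quoted above.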
The ƒ-branch or ƒb metric was introduced in Malinsky et al. (2018) to disentangle correlated ƒ4-ratio results and to assign gene flow evidence to specific, possibly internal, branches on a phylogeny. It builds upon the logic developed by Martin et al. (2013), as illustrated in Fig. 1. Given a specific tree (with known or hypothesised relationships), the ƒb(P3) statistic reflects excess sharing of alleles between the population or species P3 and the descendants of the branch labelled b, relative to allele sharing between P3 and the descendants of the sister branch of b.
Formally:

$$f_b(P3) = \mathrm{median}_A\left[\min_B\left[f_4\text{-ratio}(A, B; P3, O)\right]\right]$$

where B refers to the populations or taxa descending from the branch b, A refers to descendants from the sister branch of b, and O is the outgroup. The calculation is over all positive ƒ4-ratio results which had A in the P1 and B in the P2 positions.
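The following minimal sketch shows how such a summary could be computed from a table of ƒ4-ratio results, taking the minimum over B for each A and then the median over A. It is not the Dsuite implementation. The function name `f_branch`, the dictionary layout keyed by (P1, P2, P3), and the toy values are all assumptions made for illustration, as is the choice to treat missing or non-positive results as zero.

```python
from statistics import median


def f_branch(f4_ratio, A, B, p3):
    """Median over A of the minimum over B of f4-ratio(A, B; P3, O).

    f4_ratio: dict mapping (P1, P2, P3) -> f4-ratio estimate (hypothetical layout).
    A: taxa descending from the sister branch of b (P1 position).
    B: taxa descending from branch b (P2 position).
    """
    minima = []
    for a in A:
        # Assumption: missing or non-positive results count as zero.
        values = [max(f4_ratio.get((a, b, p3), 0.0), 0.0) for b in B]
        minima.append(min(values))
    return median(minima)


# Toy f4-ratio values, invented purely for illustration.
toy = {
    ("A1", "B1", "C"): 0.10,
    ("A1", "B2", "C"): 0.08,
    ("A2", "B1", "C"): 0.12,
    ("A2", "B2", "C"): 0.06,
}

# Minima over B are 0.08 (A1) and 0.06 (A2); their median is about 0.07.
print(f_branch(toy, ["A1", "A2"], ["B1", "B2"], "C"))
```

Taking the minimum over B is conservative: the branch b is credited only with the signal shared by all of its descendants, so excess allele sharing limited to a single descendant is not attributed to the internal branch.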
Malinsky, M., Matschiner, M., & Svardal, H. (2020). Dsuite - Fast D-statistics and related admixture evidence from VCF files. Molecular Ecology Resources, 21(2), 584-595.