The number of possible gene flow donor-recipient combinations increases rapidly with the number of populations or species. A unified test for introgression has been developed for a five-taxon symmetric phylogeny, implemented in the DFOIL package (Pease & Hahn 2015). However, no such framework currently exists for datasets with six or more taxa. A common approach is to perform the D and ƒ4-ratio analyses on all four-taxon subsamples from the dataset [e.g. Green et al. 2010; Martin et al. 2013; vonHoldt et al. 2016; Kozak et al. 2018; Malinsky et al. 2018]. However, the number of analyses that need to be performed grows very quickly. Even with a fixed outgroup, the number of combinations is $\binom{n}{3}$, i.e. n choose 3, where n is the number of taxa. For example, there are 1,140 different combinations of ((P1, P2), P3) in a dataset of 20 taxa, growing to 161,700 combinations in a dataset with 100 taxa. Interpreting the results of such a system of four-taxon tests is not straightforward: the different subsets are not independent as soon as the taxa share drift (that is, they share branches on the phylogeny), and therefore a single gene flow event can be responsible for many elevated D and ƒ4-ratio results. At the same time, the correlations, especially of the ƒ4-ratio scores, can be informative about the timing of introgression events and about the specific donor-recipient combinations.
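The trio counts quoted above follow directly from the binomial coefficient. As a quick check, the short Python sketch below reproduces them; the function name `n_trios` is purely illustrative and is not part of any package mentioned in the text.

```python
from math import comb


def n_trios(n_taxa: int) -> int:
    """Number of unordered three-taxon subsets with a fixed outgroup: n choose 3."""
    return comb(n_taxa, 3)


# Reproduce the two figures given in the text.
for n in (20, 100):
    print(n, n_trios(n))
# 20 1140
# 100 161700
```

Because n choose 3 equals n(n-1)(n-2)/6, the count grows roughly with the cube of the number of taxa.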
The ƒ-branch or ƒb metric was introduced in Malinsky et al. (2018) to disentangle correlated ƒ4-ratio results and assign gene flow evidence to specific, possibly internal, branches on a phylogeny by building upon the logic developed by Martin et al. (2013), as illustrated in Fig. 1. Given a specific tree (with known or hypothesised relationships), the ƒb(P3) statistic reflects excess sharing of alleles between the population or species P3 and the descendants of the branch labelled b, relative to allele sharing between P3 and the descendants of the sister branch of b.
Formally:

$$f_b(P_3) = \mathrm{median}_A\left[\min_B\left[f_4\text{-ratio}(A, B; P_3, O)\right]\right]$$

where B refers to the populations or taxa descending from the branch b, and A refers to descendants from the sister branch of b. The calculation is over all positive ƒ4-ratio results that had A in the P1 and B in the P2 positions.
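To make the median-of-minima structure concrete, the following Python sketch evaluates the definition on a small made-up table of ƒ4-ratio values. It is an illustration under stated assumptions, not the implementation used in any published software: the function name `f_branch`, the dictionary layout keyed by (A, B, P3), the taxon labels, and the numbers are all hypothetical. Treating a missing (A, B, P3) entry as zero is likewise an assumption made here so that only positive results with A in the P1 and B in the P2 position contribute.

```python
from statistics import median


def f_branch(f4_ratio, sister_taxa, branch_taxa, p3):
    """Median over A (sister-branch taxa) of the minimum over B (taxa under b).

    f4_ratio maps (A, B, P3) -> f4-ratio value with A as P1 and B as P2.
    Assumption: combinations absent from the table count as 0.0.
    """
    per_a_minima = [
        min(f4_ratio.get((a, b, p3), 0.0) for b in branch_taxa)
        for a in sister_taxa
    ]
    return median(per_a_minima)


# Hypothetical values: branch b has descendants B1 and B2,
# its sister branch has descendants A1, A2 and A3, and P3 is X.
toy = {
    ("A1", "B1", "X"): 0.10, ("A1", "B2", "X"): 0.06,
    ("A2", "B1", "X"): 0.08, ("A2", "B2", "X"): 0.12,
    ("A3", "B1", "X"): 0.02, ("A3", "B2", "X"): 0.05,
}

score = f_branch(toy, ["A1", "A2", "A3"], ["B1", "B2"], "X")
print(score)  # 0.06
```

In this toy case the per-A minima are 0.06, 0.08 and 0.02, so the median is 0.06. Taking the minimum over B means that a signal must be supported by every descendant of b to be assigned to that branch, while the median over A summarises the choice of reference taxon from the sister branch.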