We used two datasets for population history analysis. “HO” consists of 592,169 SNPs, taking the intersection of the SNP targets and the Human Origins SNP array⁴; we used this dataset for co-analysis of present-day and ancient samples. “HOIll” consists of 1,055,209 SNPs and additionally includes sites from the Illumina genotype array⁴⁸; we used this dataset for analyses involving only the ancient samples.
On the HO dataset, we carried out principal components analysis in smartpca⁴⁹ using a set of 777 West Eurasian individuals⁴, and projected the ancient individuals with the option “lsqproject: YES”. We carried out ADMIXTURE analysis on a set of 2,345 present-day individuals and the ancient samples after pruning for LD in PLINK 1.9 (https://www.cog-genomics.org/plink2)⁵⁰ with parameters “--indep-pairwise 200 25 0.4”. We varied the number of ancestral populations between K=2 and K=20, and used cross-validation (--cv) to identify the value of K=17 to plot in Extended Data Fig. 2f.
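The cross-validation step can be illustrated with a short Python sketch. It assumes that K is chosen as the value with the lowest cross-validation error, which the text does not state explicitly, and that the logs contain lines of the form "CV error (K=3): 0.52305", as ADMIXTURE prints when run with --cv. The function names and the example log values are hypothetical.

```python
import re

# Line format printed by ADMIXTURE when run with --cv,
# e.g. "CV error (K=3): 0.52305".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")

def parse_cv_errors(log_text):
    """Return {K: cv_error} for every CV line found in the log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}

def best_k(cv_errors):
    """Return the K with the lowest cross-validation error (assumed criterion)."""
    return min(cv_errors, key=cv_errors.get)

# Made-up log excerpt, for illustration only (not values from the study).
example_logs = """
CV error (K=2): 0.5520
CV error (K=3): 0.5310
CV error (K=4): 0.5345
"""
errors = parse_cv_errors(example_logs)
print(best_k(errors))  # prints 3
```

In practice the logs of all runs from K=2 to K=20 would be concatenated and passed to parse_cv_errors.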
We used ADMIXTOOLS¹¹ to compute f-statistics, determining standard errors with a block jackknife and default parameters. We used the option “inbreed: YES” when computing f3-statistics of the form f3(Ancient; Ref1, Ref2), as the Ancient samples are represented by randomly sampled alleles rather than by diploid genotypes. For the same reason, we used smartpca, also with the “inbreed: YES” option, to estimate FST genetic distances between populations with at least two individuals in the HO dataset.
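As a rough illustration of an f3-statistic with a block jackknife standard error, the Python sketch below averages (c - a)(c - b) over SNPs and recomputes it with one contiguous block of SNPs removed at a time. It is a simplified stand-in, not the ADMIXTOOLS implementation: it uses equal-weight blocks of roughly equal SNP count, omits the finite-sample bias correction that the “inbreed: YES” option relates to, and all function names and toy allele frequencies are hypothetical.

```python
import math

def f3_naive(c, a, b):
    """Average of (c - a)(c - b) over SNPs; no finite-sample bias correction."""
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / len(c)

def block_jackknife_se(c, a, b, n_blocks):
    """Delete-one-block jackknife standard error of f3_naive.

    c, a, b are lists of allele frequencies in genome order.
    Assumes n_blocks <= number of SNPs so that no block is empty.
    """
    n = len(c)
    # Boundaries of contiguous blocks of near-equal size.
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    leave_one_out = []
    for i in range(n_blocks):
        lo, hi = bounds[i], bounds[i + 1]
        # Recompute the statistic with block i removed.
        leave_one_out.append(
            f3_naive(c[:lo] + c[hi:], a[:lo] + a[hi:], b[:lo] + b[hi:])
        )
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    return math.sqrt(variance)

# Toy allele frequencies, for illustration only.
c = [0.5, 0.5, 0.2, 0.2]
a = [0.0, 1.0, 0.0, 0.0]
b = [1.0, 0.0, 0.0, 0.0]
estimate = f3_naive(c, a, b)
se = block_jackknife_se(c, a, b, n_blocks=2)
print(estimate, se)
```

A Z-score can then be formed as estimate / se; a significantly negative f3 is the usual signal that the target population is admixed between groups related to the two references.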
We estimated ancestral proportions as in Supplementary Information section 9 of Ref. 7, using a method that models a Test population as a mixture of N Reference populations. It fits the mixture proportions using f4-statistics of the form f4(Test or Ref, O1; O2, O3), which exploit allele frequency correlations of the Test or Reference populations with triples of Outgroup populations. We used a set of 15 world outgroup populations⁴,⁷. In Extended Data Fig. 2, we added WHG and EHG as outgroups for those analyses in which they are not used as reference populations.
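The idea of fitting mixture proportions from f4-statistics can be sketched for the simplest case of two Reference populations. Assuming the model f4(Test, O1; O2, O3) ≈ α·f4(Ref1, O1; O2, O3) + (1 − α)·f4(Ref2, O1; O2, O3) over all outgroup triples, the least-squares estimate of α has a closed form. This Python sketch is only an illustration under that assumption; it is not the implementation of Ref. 7, which handles N references, and the function name and f4 values are hypothetical.

```python
def fit_two_way_mixture(f4_test, f4_ref1, f4_ref2):
    """Least-squares alpha for f4_test ~ alpha*f4_ref1 + (1 - alpha)*f4_ref2.

    Each argument is a list with one f4 value per outgroup triple (O1; O2, O3),
    in the same order. The proportions sum to 1 by construction; alpha is not
    clipped to the interval [0, 1].
    """
    num = sum((t - r2) * (r1 - r2) for t, r1, r2 in zip(f4_test, f4_ref1, f4_ref2))
    den = sum((r1 - r2) ** 2 for r1, r2 in zip(f4_ref1, f4_ref2))
    return num / den

# Made-up f4 values over three outgroup triples, for illustration only.
f4_ref1 = [0.010, 0.020, -0.010]
f4_ref2 = [0.030, -0.020, 0.010]
# Construct a Test population that is exactly 25% Ref1 and 75% Ref2.
f4_test = [0.25 * r1 + 0.75 * r2 for r1, r2 in zip(f4_ref1, f4_ref2)]

alpha = fit_two_way_mixture(f4_test, f4_ref1, f4_ref2)
print(round(alpha, 6), round(1 - alpha, 6))
```

With real data the f4 values are noisy, so the fit is not exact, and standard errors for the proportions would need to be obtained separately, for example by a block jackknife.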
We determined sex by examining the ratio of aligned reads to the sex chromosomes⁵¹. We assigned Y-chromosome haplogroups to males using version 9.1.129 of the nomenclature of the International Society of Genetic Genealogy (www.isogg.org), using samtools⁵² to restrict the analysis to sites with mapping quality and base quality of at least 30, and excluding 2 bases at the ends of each sequenced fragment.
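The site filter described above can be restated as a small Python predicate. This is a plain restatement of the stated thresholds (mapping quality and base quality of at least 30, 2 bases excluded at each end of a fragment), not the samtools command used in the study; the function name, argument names and the 0-based position convention are assumptions.

```python
def usable_base(read_length, pos_in_read, mapq, baseq, min_q=30, trim=2):
    """Return True if a base passes the quality and end-trimming filter.

    pos_in_read is the 0-based position of the base within the read.
    """
    if mapq < min_q or baseq < min_q:
        return False
    # Exclude the first `trim` and last `trim` bases of the fragment.
    return trim <= pos_in_read < read_length - trim

# Example: a 50-base read with mapping quality 37.
print(usable_base(50, 1, 37, 40))   # within the first 2 bases
print(usable_base(50, 25, 37, 40))  # interior base with sufficient quality
```

Excluding the terminal bases is a common precaution for ancient DNA, where damage-induced substitutions concentrate at fragment ends.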