The presence or absence of pseudo-heterozygosity at a given site (coded as 1 and 0, respectively) was used as the phenotype for GWAS. The genotype data were the SNP matrix published by the 1001 Genomes Consortium, which contains 10 million SNPs [19]. All GWAS were run with the pygwas package (https://github.com/timeu/PyGWAS; see [59]) using the amm (accelerated mixed model) option. The raw output, which contains all SNPs, was then filtered to remove every SNP with a minor allele frequency below 0.05 or a −log10(p-value) below 4.
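The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record keys (`chr`, `pos`, `pvalue`, `maf`) and the function name `filter_gwas` are hypothetical and do not reflect the actual pygwas output schema.

```python
import math

def filter_gwas(records, min_maf=0.05, min_score=4.0):
    """Keep SNPs whose MAF and -log10(p-value) both reach the thresholds.

    `records` is a list of dicts with hypothetical keys
    'chr', 'pos', 'pvalue' and 'maf' (not the real pygwas schema).
    """
    kept = []
    for rec in records:
        score = -math.log10(rec["pvalue"])  # assumes pvalue > 0
        if rec["maf"] >= min_maf and score >= min_score:
            kept.append({**rec, "score": score})
    return kept

# Toy input, invented for illustration only.
toy = [
    {"chr": 1, "pos": 1000, "pvalue": 1e-6, "maf": 0.20},  # passes both filters
    {"chr": 1, "pos": 2000, "pvalue": 1e-6, "maf": 0.01},  # MAF below 0.05
    {"chr": 1, "pos": 3000, "pvalue": 1e-2, "maf": 0.30},  # -log10(p) below 4
]
filtered = filter_gwas(toy)
```

With this toy input, only the SNP at position 1000 is retained.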
For each GWAS, peaks were called from the p-values and positions of the SNPs. The signal was filtered with the Fourier transform function (filterFFT) and peaks were then located with the peak detection function (peakDetection), both from the R package nucleR 3.13, so that the position of each peak across the genome was retrieved automatically. For each peak, the SNPs with the highest −log10(p-value) within a region of ±10 kb around the peak center were retained (see the example in Additional file 1: Fig. S18). A summary table covering all 26,647 SNPs was generated, listing each pseudo-heterozygous SNP together with each GWAS peak detected (Additional file 2). This table was then used to generate Fig. 2C, applying thresholds of 20 for −log10(p-value) and 0.1 for minor allele frequency.
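The selection of top SNPs around a peak can be sketched as follows. This sketch does not reproduce the peak calling itself (filterFFT and peakDetection are R functions); it assumes a peak center is already known. The function name `top_snps_near_peak`, the record keys, and the parameter `n` (how many SNPs to keep per peak) are hypothetical.

```python
def top_snps_near_peak(snps, chrom, center, half_window=10_000, n=1):
    """Return the n highest-scoring SNPs within +/- half_window bp of a peak.

    `snps` is a list of dicts with hypothetical keys 'chr', 'pos' and
    'score', where 'score' holds the -log10(p-value).
    """
    in_window = [
        s for s in snps
        if s["chr"] == chrom and abs(s["pos"] - center) <= half_window
    ]
    # Sort by descending score and keep the top n.
    return sorted(in_window, key=lambda s: s["score"], reverse=True)[:n]

# Toy input, invented for illustration only.
toy_snps = [
    {"chr": 1, "pos": 95_000, "score": 7.2},
    {"chr": 1, "pos": 101_500, "score": 12.4},
    {"chr": 1, "pos": 109_000, "score": 9.8},
    {"chr": 1, "pos": 125_000, "score": 15.0},  # outside the 10 kb window
    {"chr": 2, "pos": 100_000, "score": 20.0},  # different chromosome
]
best = top_snps_near_peak(toy_snps, chrom=1, center=100_000)
```

Here the SNP at position 101,500 is selected: higher-scoring SNPs exist, but they lie outside the window or on another chromosome.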