The HapMap phase 3 dataset consists of 1,184 human individuals across 11 populations (ASW: African ancestry in Southwest USA; CEU: Utah residents with Northern and Western European ancestry from the CEPH collection; CHB: Han Chinese in Beijing, China; CHD: Chinese in Metropolitan Denver, Colorado; GIH: Gujarati Indians in Houston, Texas; JPT: Japanese in Tokyo, Japan; LWK: Luhya in Webuye, Kenya; MEX: Mexican ancestry in Los Angeles, California; MKK: Maasai in Kinyawa, Kenya; TSI: Toscani in Italia; YRI: Yoruba in Ibadan, Nigeria), assayed for 1,440,616 SNPs [6]. We performed quality control (QC) on the data, removing SNPs with low minor allele frequency (MAF), high missingness, or significant deviation from Hardy-Weinberg equilibrium. We then removed non-founders and individuals with high genotyping missingness, leaving 957 individuals. Next, we removed several regions of high LD and/or known inversions (chr5:44 Mb–51.5 Mb, chr6:25 Mb–33.5 Mb, chr8:8 Mb–12 Mb, chr11:45 Mb–57 Mb) [15]. Finally, we used PLINK [16] with --indep-pairwise 1000 10 0.02 to thin the SNPs by LD (sliding window of 1000 SNPs, step of 10 SNPs, pairwise r² threshold of 0.02), leaving 14,389 SNPs.
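The last two filtering steps, removal of the high-LD/inversion regions and LD thinning, can be sketched in Python as below. This is a minimal illustration under stated assumptions. It is not the authors' pipeline and not PLINK's implementation. Genotypes are assumed to be a NumPy dosage matrix (0/1/2, no missing values) with SNPs sorted by position within a single chromosome. The region bounds are treated as inclusive and are applied to base-pair positions in whatever genome build the data use, which the text does not state. The function names `outside_excluded_regions` and `prune_pairwise` are hypothetical. The pruning loop mirrors the meaning of the --indep-pairwise arguments (window size in SNPs, step in SNPs, r² threshold). It uses a simplified rule for which SNP of a correlated pair is dropped, and PLINK's own rule differs.

```python
import numpy as np

# Excluded regions from the text, in Mb (chromosome -> (start, end)).
# Assumption: bounds are inclusive and match the data's genome build.
EXCLUDED_REGIONS = {5: (44.0, 51.5), 6: (25.0, 33.5), 8: (8.0, 12.0), 11: (45.0, 57.0)}


def outside_excluded_regions(chrom, pos_bp):
    """Boolean mask: True for SNPs lying outside every excluded region."""
    chrom = np.asarray(chrom)
    pos_mb = np.asarray(pos_bp) / 1e6
    keep = np.ones(chrom.shape, dtype=bool)
    for c, (start, end) in EXCLUDED_REGIONS.items():
        keep &= ~((chrom == c) & (pos_mb >= start) & (pos_mb <= end))
    return keep


def r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt((x @ x) * (y @ y))
    return 0.0 if denom == 0 else float((x @ y) / denom) ** 2


def prune_pairwise(G, window=1000, step=10, r2_max=0.02):
    """Greedy windowed LD pruning, a simplified analogue of --indep-pairwise.

    G: (n_individuals, n_snps) dosage matrix for ONE chromosome, SNPs sorted
    by position. Returns a boolean mask of retained SNPs.
    """
    n_snps = G.shape[1]
    keep = np.ones(n_snps, dtype=bool)
    for start in range(0, n_snps, step):
        # SNPs in the current window that have not been pruned yet.
        idx = [j for j in range(start, min(start + window, n_snps)) if keep[j]]
        for a_pos, a in enumerate(idx):
            if not keep[a]:
                continue
            for b in idx[a_pos + 1:]:
                if keep[b] and r2(G[:, a], G[:, b]) > r2_max:
                    keep[b] = False  # simplified rule: drop the later SNP
        if start + window >= n_snps:
            break  # the window has reached the last SNP
    return keep
```

In use, the region mask would be applied first and the pruning then run chromosome by chromosome on the remaining SNPs, for example with `prune_pairwise(G_chr, 1000, 10, 0.02)`, where `G_chr` is a hypothetical per-chromosome matrix. The all-pairs loop inside each window is quadratic in the window size, so the sketch trades PLINK's speed for readability.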