Saliva (≥ 2 mL) was collected from each subject using an Oragene DNA Kit (DNA Genotek, Ottawa, Canada). Genomic DNA extraction and genotyping with an Axiom Japonica array (Toshiba, Tokyo, Japan) [14] were performed by Cell Innovator Co., Ltd. (Fukuoka, Japan), yielding ~660,000 genetic variants for 94 study subjects, with a genotype missing rate of < 3%. We then filtered the genetic variants using PLINK ver. 1.90 [15], as described below. After checking sex concordance between the phenotypic records and the heterozygosity of genetic variants on chromosome X, and checking for cryptic relatedness between DNA samples (proportion of identity by descent threshold of ≥ 0.1875), we performed quality control (QC) of the genotype data and discarded the following variants: variants with a call rate of < 0.97, variants whose genotype distribution deviated significantly (p < 0.0001) from Hardy–Weinberg equilibrium, variants with a minor allele frequency (MAF) of < 0.005, and variants on the sex chromosomes or in the mitochondrial genome. Consequently, 622,446 genetic variants were used for genotype imputation.
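The per-variant QC thresholds above can be illustrated with a minimal Python sketch. It is not the PLINK implementation used in the study: the function name `variant_qc` and the input layout (one allele count per sample, `None` for a missing call) are hypothetical, and the Hardy–Weinberg check uses a 1-degree-of-freedom chi-square approximation for brevity, which may differ from the test PLINK applies.

```python
import math


def variant_qc(genotypes, min_call_rate=0.97, min_maf=0.005, hwe_p=1e-4):
    """Apply call-rate, MAF, and HWE filters to one variant.

    genotypes: list with one entry per sample, holding the count of one
    allele (0, 1, or 2) or None for a missing call (hypothetical layout).
    Returns (passes, stats).
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    call_rate = n / len(genotypes)
    if n == 0:
        return False, {"call_rate": call_rate, "maf": 0.0, "hwe_p": 1.0}

    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    p = (2 * n_bb + n_ab) / (2 * n)  # frequency of the counted allele
    q = 1.0 - p
    maf = min(p, q)

    if maf == 0.0:
        # Monomorphic variant: HWE is undefined, and the MAF filter removes it.
        hwe = 1.0
    else:
        observed = (n_aa, n_ab, n_bb)
        expected = (n * q * q, 2 * n * p * q, n * p * p)
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # Upper-tail probability of a chi-square with 1 df (approximation).
        hwe = math.erfc(math.sqrt(chi2 / 2.0))

    passes = call_rate >= min_call_rate and maf >= min_maf and hwe >= hwe_p
    return passes, {"call_rate": call_rate, "maf": maf, "hwe_p": hwe}
```

A variant is kept only if it passes all three checks; the default thresholds mirror those stated in the text (call rate ≥ 0.97, MAF ≥ 0.005, HWE p ≥ 0.0001).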
After prephasing the QC-passed genotype data with SHAPEIT v2.r904 [16, 17], untyped genotypes were imputed against the 1000 Genomes Project (1KGP) phase 3 reference panel using IMPUTE2 ver. 2.3.2 [18] (effective population size Ne = 20,000; chunk size = 5 Mb). Genetic variants with low imputation quality (info score of < 0.5) were discarded when the imputed output was converted into PLINK format using GTOOL ver. 0.7.5 (https://www.well.ox.ac.uk/~cfreeman/software/gwas/gtool.html). We then applied the same QC to the imputed genotype data, except that the MAF threshold was set to 1%.
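As a rough illustration of the chunking and info-score filtering described above, the following Python sketch splits a chromosome into 5 Mb windows and drops low-quality imputed variants. The function names and the dictionary-based variant records are hypothetical; the study's actual pipeline relied on IMPUTE2 and GTOOL, whose file formats are not reproduced here.

```python
def chunk_intervals(chrom_length_bp, chunk_size_bp=5_000_000):
    """Split positions 1..chrom_length_bp into consecutive, non-overlapping
    windows of at most chunk_size_bp base pairs (1-based, inclusive)."""
    intervals = []
    start = 1
    while start <= chrom_length_bp:
        end = min(start + chunk_size_bp - 1, chrom_length_bp)
        intervals.append((start, end))
        start = end + 1
    return intervals


def keep_by_info(variants, min_info=0.5):
    """Keep imputed variants whose info score is at least min_info.

    variants: iterable of dicts with an "info" key (hypothetical layout).
    """
    return [v for v in variants if v["info"] >= min_info]
```

Each interval would correspond to one imputation run over that region, and the info filter would be applied to the combined output before the second round of QC.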
To assess population stratification on a genome-wide scale, we performed principal component analysis (PCA) on the QC-passed Japonica array genotype data of the JPQ cohort, after removing genetic variants with a MAF of < 0.05, together with datasets of five East Asian populations [CDX (Chinese Dai in Xishuangbanna, China), CHB (Han Chinese in Beijing, China), CHS (Southern Han Chinese), JPT (Japanese in Tokyo, Japan), and KHV (Kinh in Ho Chi Minh City, Vietnam)] retrieved from the 1KGP [19] phase 3 reference panel (NCBI Build GRCh37; http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a). The 1KGP population datasets were generated from variant call format (VCF) data with PLINK after the VCF data had been converted to binary VCF (BCF) format using SAMtools/BCFtools ver. 1.9 [20]. To detect possible population outliers, PCA was also conducted using the genotype data of the JPQ cohort alone.
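The idea behind genotype-based PCA can be sketched in plain Python as follows. The text does not state which PCA implementation was used, so this is only a generic illustration under stated assumptions: genotypes are complete allele counts (0, 1, 2), each variant is centred and scaled by its allele frequency, and the first principal component is obtained by power iteration on the sample-by-sample covariance matrix. The function names are hypothetical.

```python
import math


def standardize(genotypes):
    """Centre each variant at 2p and scale by sqrt(2p(1-p)).

    genotypes: list of samples, each a list of allele counts (0, 1, 2)
    with no missing values. Monomorphic variants are dropped.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [genotypes[i][j] for i in range(n)]
        p = sum(col) / (2 * n)
        if p <= 0.0 or p >= 1.0:
            continue
        sd = math.sqrt(2 * p * (1 - p))
        columns.append([(g - 2 * p) / sd for g in col])
    # Transpose back to a samples-by-variants layout.
    return [[c[i] for c in columns] for i in range(n)]


def first_pc(genotypes, n_iter=200):
    """Return unit-length first-PC coordinates, one value per sample."""
    z = standardize(genotypes)
    n, m = len(z), len(z[0])
    # Sample-by-sample covariance (genetic relationship) matrix.
    k = [[sum(a * b for a, b in zip(z[i], z[j])) / m for j in range(n)]
         for i in range(n)]
    # Deterministic, non-constant starting vector for power iteration.
    v = [math.sin(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

Samples from genetically distinct groups tend to receive first-PC coordinates of opposite sign, which is how stratification and population outliers become visible in a PCA plot. The sign of a principal component is arbitrary, so only relative positions are meaningful.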