After quality control was completed for each individual and each sample collection (SNPs with call rates of <95% were excluded; Supplementary Note), the genotypes of all participants were imputed with IMPUTE2 (ref. 42) or MaCH/Minimac (ref. 43) (Supplementary Table 2). Imputation used haplotypes derived from samples of European ancestry in the 1000 Genomes Project (2010 interim release, based on the sequence data freeze of 4 August 2010, with phased haplotypes from December 2010). In each data set, SNPs with an imputation quality estimate below 0.3 were excluded from analyses. This estimate was the R2 reported by MaCH or the info score reported by IMPUTE2; the two measures have been described as equivalent. SNPs with a minor allele frequency (MAF) of <1% were also excluded. After these procedures, a maximum of 8,133,148 SNPs, each present in at least one data set, were retained.
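The per-data-set imputation filters can be sketched in a few lines of Python. This is a minimal illustration, not the consortium's pipeline. The record type `ImputedSnp` and the helper functions are hypothetical. The only values taken from the text are the two thresholds: quality (R2 or info, treated as equivalent) of at least 0.3 and MAF of at least 1%. The final count of retained SNPs is modeled here as the union of SNPs that survive filtering in at least one data set.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# Thresholds taken from the text; all names below are hypothetical.
QUALITY_MIN = 0.3  # MaCH R2 or IMPUTE2 info score; SNPs below this are dropped
MAF_MIN = 0.01     # SNPs with MAF < 1% are dropped


@dataclass
class ImputedSnp:
    """Hypothetical per-SNP record for a single data set."""
    snp_id: str
    quality: float  # R2 (MaCH) or info (IMPUTE2), treated as equivalent
    maf: float      # minor allele frequency in this data set


def passes_imputation_qc(snp: ImputedSnp) -> bool:
    """Keep a SNP only if quality >= 0.3 and MAF >= 1%."""
    return snp.quality >= QUALITY_MIN and snp.maf >= MAF_MIN


def filter_data_set(snps: Iterable[ImputedSnp]) -> List[ImputedSnp]:
    """Apply the imputation filters within one data set."""
    return [s for s in snps if passes_imputation_qc(s)]


def snps_in_any_data_set(data_sets: Iterable[Iterable[ImputedSnp]]) -> Set[str]:
    """Union of SNP IDs that survive filtering in at least one data set."""
    retained: Set[str] = set()
    for snps in data_sets:
        retained.update(s.snp_id for s in filter_data_set(snps))
    return retained


# Toy example with made-up values
set_a = [
    ImputedSnp("rs1", quality=0.95, maf=0.20),   # kept
    ImputedSnp("rs2", quality=0.25, maf=0.30),   # dropped: quality < 0.3
    ImputedSnp("rs3", quality=0.80, maf=0.005),  # dropped: MAF < 1%
]
set_b = [
    ImputedSnp("rs2", quality=0.60, maf=0.30),   # kept in this data set
    ImputedSnp("rs4", quality=0.30, maf=0.01),   # kept: thresholds are inclusive
]
print(sorted(snps_in_any_data_set([set_a, set_b])))  # ['rs1', 'rs2', 'rs4']
```

Because only values *below* 0.3 and *below* 1% are excluded, the comparisons use `>=`. A SNP sitting exactly on a threshold, such as `rs4`, is therefore kept.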
In each case-control data set, the association between LOAD and genotype dosage was tested with a logistic regression model. The model included age, sex and principal components as covariates, with the principal components adjusting for possible population stratification (Supplementary Table 2). For the three CHARGE cohorts with incident Alzheimer's disease data, Cox proportional hazards models were used instead. The four consortia used different but analogous software for these analyses (PLINK (ref. 44), SNPTEST (ref. 45), ProbABEL (ref. 46) or R; Supplementary Table 2). As a quality check, three of these tools were applied to the EADI data set, and they gave very similar results. SNPs with a logistic regression coefficient |β| > 5 or a P value equal to 0 or 1 were then excluded, after which the maximum number of SNPs in any data set was 8,131,643. Each consortium uploaded summary results for each SNP to an internal I-GAP website accessible to members of each consortium.
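The exclusion of implausible per-SNP results can be expressed as a simple predicate on each (β, P) pair. The sketch below is hypothetical. It assumes summary results are held in a plain dictionary mapping SNP IDs to `(beta, p_value)`. The function names are illustrative and are not part of PLINK, SNPTEST, ProbABEL or R.

```python
from typing import Dict, Tuple

BETA_MAX = 5.0  # from the text: SNPs with |beta| > 5 are excluded


def is_unreliable(beta: float, p_value: float) -> bool:
    """Flag a per-SNP result with |beta| > 5 or a P value of exactly 0 or 1."""
    return abs(beta) > BETA_MAX or p_value in (0.0, 1.0)


def drop_unreliable(
    results: Dict[str, Tuple[float, float]]
) -> Dict[str, Tuple[float, float]]:
    """Keep only SNPs whose (beta, P) pair passes the check."""
    return {
        snp: (b, p) for snp, (b, p) in results.items() if not is_unreliable(b, p)
    }


# Made-up summary statistics: snp -> (beta, P)
results = {
    "rs1": (0.12, 3.2e-4),  # kept
    "rs2": (6.80, 0.02),    # dropped: |beta| > 5
    "rs3": (-0.05, 1.0),    # dropped: P == 1
    "rs4": (0.40, 0.0),     # dropped: P == 0
    "rs5": (-5.0, 0.31),    # kept: |beta| == 5 is not > 5
}
print(sorted(drop_unreliable(results)))  # ['rs1', 'rs5']
```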
SNPs genotyped or imputed in at least 40% of Alzheimer's disease cases and 40% of control samples were included in the meta-analysis. This threshold represented the best compromise between maximizing the total number of SNPs and maximizing the number of samples in which each SNP was present. Analyzing every SNP available in at least one study could have greatly increased the risk of false positives. Conversely, restricting the analysis to SNPs present in all studies could have removed SNPs of potential interest, even though such SNPs might have reached adequate statistical power in a smaller number of data sets (false negatives). This approach also increased homogeneity between studies for some SNPs by excluding poor-quality data present only in a few small data sets. This final selection step yielded 7,055,881 SNPs for the stage 1 analysis.
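The 40% inclusion rule can be sketched as a count over data sets. The text does not specify how the denominators were defined. This sketch assumes that "40% of cases" means 40% of all cases pooled across the stage 1 data sets, and likewise for controls. The `DataSet` type and `stage1_snps` function are hypothetical, and the toy sample sizes are made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set

THRESHOLD_PERCENT = 40  # SNP must be present in >= 40% of cases AND of controls


@dataclass
class DataSet:
    """Hypothetical summary of one data set: sample sizes and available SNPs."""
    name: str
    n_cases: int
    n_controls: int
    snps: FrozenSet[str]


def stage1_snps(data_sets: List[DataSet]) -> Set[str]:
    """Return SNPs available in >= 40% of pooled cases and >= 40% of pooled controls."""
    total_cases = sum(d.n_cases for d in data_sets)
    total_controls = sum(d.n_controls for d in data_sets)
    candidates: Set[str] = set().union(*(d.snps for d in data_sets))
    selected: Set[str] = set()
    for snp in candidates:
        # Count samples in the data sets where this SNP was genotyped or imputed.
        cases = sum(d.n_cases for d in data_sets if snp in d.snps)
        controls = sum(d.n_controls for d in data_sets if snp in d.snps)
        # Integer arithmetic avoids floating-point error at the 40% boundary.
        if (100 * cases >= THRESHOLD_PERCENT * total_cases
                and 100 * controls >= THRESHOLD_PERCENT * total_controls):
            selected.add(snp)
    return selected


# Toy example: 1,000 cases and 2,000 controls in total
data = [
    DataSet("A", 600, 800, frozenset({"rs1", "rs2"})),
    DataSet("B", 300, 200, frozenset({"rs2", "rs3"})),
    DataSet("C", 100, 1000, frozenset({"rs3", "rs4"})),
]
print(sorted(stage1_snps(data)))  # ['rs1', 'rs2', 'rs3']
```

In this example, `rs1` is kept because it is present in exactly 40% of controls. `rs4` is dropped because it is present in only 10% of cases, even though it covers half of the controls. This illustrates why the rule requires both conditions.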