FlashPCA (ref. 30) was run for principal component analysis (PCA) to infer genetic ancestry from genotype data. The regression model assumed an additive genetic model and included the first three eigenvectors from FlashPCA as covariates. For imputed datasets of smaller sample size, which were added to our analysis later, we switched from the score method to the EM algorithm to accommodate the smaller sample size.
We combined imputed genotypes from 14,803 cases and 12,262 controls from the OncoArray series with those of 14,436 cases and 44,188 controls from the previous lung cancer GWAS (refs. 3, 4, 6), which included the IARC, MDACC, SLRI, ICR, Harvard, NCI, Germany and deCODE studies as described previously (refs. 3, 4, 6). Because the ATBC, EAGLE and CARET studies were included in both the previous GWAS and the current OncoArray dataset, we ensured that no participants overlapped by comparing the identity tags (IDs) of all study participants.
In addition to lung cancer overall, analyses stratified by histology (adenocarcinoma, squamous cell carcinoma and small cell carcinoma (SCLC)) and by smoking status (Ever/Never) were conducted where data were available. Results from analyses defined by Ever and Never smoking strata did not identify any novel variants.
We conducted fixed-effects meta-analysis with inverse variance weighting and random-effects meta-analysis using the DerSimonian-Laird method (ref. 31). All meta-analyses and calculations were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA). As the same reference panel was used for all studies, all SNPs showed the same forward alignment profiles. We excluded poorly imputed SNPs, defined by imputation quality R2 < 0.3 or Info < 0.4, for each meta-analysis component, as well as SNPs with a minor allele frequency (MAF) < 0.01 (except for CHEK2 rs17879961 and BRCA2 rs11571833, which we have validated extensively previously (ref. 4)). We generated the index of heterogeneity (I2) and the P-value of Cochran’s Q statistic to assess heterogeneity in the meta-analyses and considered only variants with little evidence of heterogeneity in effect between studies (P-value of Cochran’s Q statistic > 0.05). SNPs were retained provided the average imputation R2 was at least 0.4. For SNPs in the 0.4–0.8 range that reached genome-wide significance, results were evaluated for consistency with neighboring SNPs to ensure reliable inference. Owing to the smaller sample size and fewer sites contributing to the Never Smokers and SCLC strata, we additionally required variants to be present in each of the meta-analysis components to be retained for these two stratified analyses.
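The meta-analysis quantities named above can be illustrated with a minimal Python sketch. This is not the SAS code used in the study. The function names `chi2_sf` and `meta_analyze`, the dictionary keys and the example inputs are hypothetical, and the sketch assumes each study contributes a log odds ratio and its standard error for a single SNP.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: erfc term plus a finite sum with half-integer gamma values.
    terms = sum(half ** (j - 0.5) / math.gamma(j + 0.5)
                for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def meta_analyze(betas, ses):
    """Fixed-effects (inverse variance) and DerSimonian-Laird random-effects
    meta-analysis of per-study effect estimates, with Cochran's Q and I2."""
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching betas and ses from at least two studies")
    df = len(betas) - 1

    # Fixed effects: each study is weighted by the inverse of its variance.
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sw
    se_fixed = math.sqrt(1.0 / sw)

    # Cochran's Q and the I2 index of heterogeneity (in percent).
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird moment estimate of the between-study variance.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects: tau2 is added to each study's variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    swr = sum(w_re)
    beta_random = sum(wi * b for wi, b in zip(w_re, betas)) / swr
    se_random = math.sqrt(1.0 / swr)

    return {
        "beta_fixed": beta_fixed, "se_fixed": se_fixed,
        "beta_random": beta_random, "se_random": se_random,
        "q": q, "q_p": chi2_sf(q, df), "i2": i2, "tau2": tau2,
    }


# Hypothetical per-study log odds ratios and standard errors for one SNP.
result = meta_analyze([0.10, 0.20], [0.05, 0.05])
# Mirrors the heterogeneity criterion in the text (Q P-value > 0.05).
keep_variant = result["q_p"] > 0.05
print(result, keep_variant)
```

In this sketch the heterogeneity criterion is applied as a simple comparison on the Q P-value. The imputation-quality and MAF filters described above would be applied to each component before a SNP reaches this step.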
Conditional analysis was undertaken using SNPTEST where individual-level data were available and using GCTA (ref. 32) for the previous lung cancer GWAS, with LD estimates obtained from individuals of European origin for the latter. Results were combined using fixed-effects inverse variance weighted meta-analysis as described above (ref. 33).