All imputed autosomal variants with an IMPUTE information score > 0.4 (M = 39,723,562) were eligible for association testing in phenotype-specific models. An effective sample size (Neff) was calculated for each SNP in a given phenotype-specific model as Neff = 2 × MAF × (1 − MAF) × N × info, where MAF is the minor allele frequency among the set of individuals included in a phenotype-specific model, N is the total sample size for a given phenotype and info is the IMPUTE information score of the SNP. Variants with Neff < 30 (continuous phenotypes) or Neff < 50 (binary phenotypes) were excluded from the final set of phenotype-specific results. The number of variants analysed per trait ranged from 21,894,105 to 34,656,550 for continuous phenotypes and from 11,665,604 to 28,263,875 for binary phenotypes (Supplementary Table 1).
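As an illustration of the effective sample size filter described above, the following is a minimal Python sketch. The function names, the threshold dictionary and the example values are hypothetical and are not taken from the analysis pipeline; only the formula and the two cut-offs (30 and 50) come from the text.

```python
# Thresholds quoted in the text; the dictionary itself is an illustrative construct.
NEFF_MIN = {"continuous": 30.0, "binary": 50.0}


def effective_sample_size(maf: float, n: int, info: float) -> float:
    """Neff = 2 * MAF * (1 - MAF) * N * info, as defined in the text."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    if not 0.0 <= info:
        raise ValueError("info score must be non-negative")
    return 2.0 * maf * (1.0 - maf) * n * info


def passes_neff_filter(maf: float, n: int, info: float, phenotype_type: str) -> bool:
    """Return True if a variant is retained for the given phenotype type."""
    return effective_sample_size(maf, n, info) >= NEFF_MIN[phenotype_type]


# Hypothetical variant: MAF = 0.001, N = 20,000, info = 0.8.
neff = effective_sample_size(0.001, 20_000, 0.8)
print(round(neff, 3))
print(passes_neff_filter(0.001, 20_000, 0.8, "continuous"))
print(passes_neff_filter(0.001, 20_000, 0.8, "binary"))
```

With these made-up inputs the variant would be retained for a continuous phenotype but excluded for a binary phenotype, which shows how the same rare variant can appear in one set of results and not the other.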
Quantile–quantile plots and the genomic control inflation factor (λGC) were used to assess genomic inflation in all phenotypes; λGC ranged from 0.98 to 1.15. Single-variant association testing for each phenotype used an additive model that was adjusted for indicators for study, self-identified race/ethnicity, the first 10 PCs and phenotype-specific covariates. Additional information about the phenotype-specific model covariates and transformations is included in the Supplementary Information. Association testing was completed in both the SUGEN and GENESIS programs.
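The text does not state how λGC was computed. A common definition is the median of the observed 1-d.f. χ² statistics divided by the median of the χ² distribution with 1 d.f. under the null (approximately 0.455). The sketch below assumes that definition and assumes two-sided P values as input; the function name is hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median-based genomic control inflation factor (assumed definition).

    Each two-sided P value p is mapped to a 1-d.f. chi-square statistic z**2,
    where z is the standard normal quantile at p / 2. P values must lie in (0, 1].
    """
    std_normal = NormalDist()
    chi2 = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of the chi-square(1) distribution, obtained from the normal quantile.
    expected_median = std_normal.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Evenly spaced P values mimic the null distribution, so the result is close to 1.
n = 1001
null_p = [(i - 0.5) / n for i in range(1, n + 1)]
print(round(genomic_inflation(null_p), 3))
```

Values close to 1, such as the range of 0.98 to 1.15 reported above, indicate little systematic inflation of the test statistics.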
The GENESIS program (refs. 17, 18) is a Bioconductor package for R that was developed for large-scale genetic analyses in samples with complex structure,
including relatedness, population structure and ancestry admixture. The current version of GENESIS implements both linear and
logistic mixed model regression for genome-wide association testing. The software can accommodate continuous and binary
phenotypes. The GENESIS package includes the program PC-Relate, which uses a PCA-based method to infer genetic relatedness in
samples with unspecified and unknown population structure. By using individual-specific allele frequencies estimated from the
sample with PC eigenvectors, it provides robust estimates of kinship coefficients and identity-by-descent sharing probabilities in
samples with population structure, admixture and Hardy-Weinberg equilibrium departures. It does not require additional reference
population panels or prior specification of the number of ancestral subpopulations.
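To make the role of individual-specific allele frequencies concrete, the following Python sketch computes a kinship coefficient for one pair of individuals in the form of the published PC-Relate estimator as we understand it: the sum over SNPs of the product of residual genotypes, divided by four times the sum of the geometric means of the individual-specific genotype variances. The individual-specific allele frequencies are taken as given inputs here; how they are estimated from the PC eigenvectors is not shown. The function name and toy data are hypothetical, and this is not the GENESIS implementation.

```python
from math import sqrt


def pc_relate_kinship(g_i, g_j, mu_i, mu_j):
    """Kinship estimate for individuals i and j (illustrative sketch).

    g_i, g_j   : genotype dosages (0, 1 or 2) at each SNP
    mu_i, mu_j : individual-specific allele frequencies at each SNP,
                 assumed to lie strictly between 0 and 1
    """
    numerator = 0.0
    denominator = 0.0
    for gi, gj, mi, mj in zip(g_i, g_j, mu_i, mu_j):
        # Residual genotypes after removing the ancestry-specific expectation 2 * mu.
        numerator += (gi - 2.0 * mi) * (gj - 2.0 * mj)
        # Geometric mean of the two individual-specific binomial variances.
        denominator += sqrt(mi * (1.0 - mi) * mj * (1.0 - mj))
    return numerator / (4.0 * denominator)


# Toy data: four SNPs, allele frequency 0.5 for both individuals at every SNP.
mu = [0.5, 0.5, 0.5, 0.5]
print(pc_relate_kinship([2, 1, 0, 1], [2, 1, 0, 1], mu, mu))
print(pc_relate_kinship([2, 1, 0, 1], [1, 1, 1, 1], mu, mu))
```

Because each individual's expected genotype is centred on that individual's own allele frequency, shared ancestry alone does not inflate the estimate, which is why no reference panel is needed.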
SUGEN (ref. 19) is a command-line program developed for
genetic association analysis under complex survey sampling and relatedness patterns. It implements the generalized estimating
equation method, which does not require modelling of the correlation structures of complex pedigrees. It adopts a modified version
of the ‘sandwich’ variance estimator, which is accurate for low-frequency SNPs. Association testing in SUGEN requires the formation of ‘extended’ families by connecting households that share first-degree relatives, or that share either first- or second-degree relatives. Trait values are assumed to be correlated within families but independent between
families. In our experience in analysing this dataset, it is sufficient to account for first-degree relatedness. The current
version of SUGEN can accommodate continuous, binary and age-at-onset traits. A comparison of P values produced by SUGEN and GENESIS for all previously identified loci is included in Supplementary Fig. 12 and Supplementary Table 4.
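The formation of ‘extended’ families described above amounts to finding connected components among households. The sketch below shows one way this could be done with a union-find structure, assuming a list of household identifiers and a list of household pairs that share first-degree relatives. The function name, input format and identifiers are hypothetical; this is not SUGEN's own code or input format.

```python
def build_extended_families(households, relative_links):
    """Group households into extended families (illustrative sketch).

    households     : iterable of household identifiers
    relative_links : iterable of (household_a, household_b) pairs that share
                     relatives of the chosen degree (for example, first-degree)
    Returns a sorted list of sorted member lists, one per extended family.
    """
    parent = {h: h for h in households}

    def find(h):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    for a, b in relative_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families = {}
    for h in parent:
        families.setdefault(find(h), set()).add(h)
    return sorted(sorted(members) for members in families.values())


# Hypothetical example: H1-H2 and H2-H3 share first-degree relatives.
links = [("H1", "H2"), ("H2", "H3")]
print(build_extended_families(["H1", "H2", "H3", "H4", "H5"], links))
```

Households H1, H2 and H3 end up in one extended family even though H1 and H3 are not directly linked; the resulting family labels would then serve as the clusters within which trait values are allowed to be correlated.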
Wojcik G.L., Graff M., Nishimura K.K., Tao R., Haessler J., Gignoux C.R., Highland H.M., Patel Y.M., Sorokin E.P., Avery C.L., Belbin G.M., Bien S.A., Cheng I., Cullina S., Hodonsky C.J., Hu Y., Huckins L.M., Jeff J., Justice A.E., Kocarnik J.M., Lim U., Lin B.M., Lu Y., Nelson S.C., Park S.S., Poisner H., Preuss M.H., Richard M.A., Schurmann C., Setiawan V.W., Sockell A., Vahi K., Verbanck M., Vishnu A., Walker R.W., Young K.L., Zubair N., Acuña-Alonso V., Luis Ambite J., Barnes K.C., Boerwinkle E., Bottinger E.P., Bustamante C.D., Caberto C., Canizales-Quinteros S., Conomos M.P., Deelman E., Do R., Doheny K., Fernández-Rhodes L., Fornage M., Hailu B., Heiss G., Henn B.M., Hindorff L.A., Jackson R.D., Laurie C.A., Laurie C.C., Li Y., Lin D.Y., Moreno-Estrada A., Nadkarni G., Norman P.J., Pooler L.C., Reiner A.P., Romm J., Sabatti C., Sandoval K., Sheng X., Stahl E.A., Stram D.O., Thornton T.A., Wassel C.L., Wilkens L.R., Winkler C.A., Yoneyama S., Buyske S., Haiman C.A., Kooperberg C., Marchand L.L., Loos R.J., Matise T.C., North K.E., Peters U., Kenny E.E, & Carlson C.S. (2019). Genetic analyses of diverse populations improves discovery for complex traits. Nature, 570(7762), 514-518.