After phasing the UK Biobank genetic data (carried out on 81
chromosomal chunks using Eagle v.2.4), the phased data were converted from
GRCh37 to GRCh38 using LiftOver (ref. 112). Imputation was performed using
Minimac4 (ref. 111).
We assessed the correlation between genotypes in the exome-sequencing data
released by the UK Biobank (processed with their SPB pipeline; ref. 113) and
the TOPMed-imputed genotypes. The comparison assessed 49,819 individuals and
3,052,260 autosomal variants that were found in both the exome-sequencing
and TOPMed-imputed datasets (matched by chromosome, position and alleles,
and with an imputation quality of at least 0.3 in the TOPMed-imputed data).
We split the variants into minor allele frequency (MAF) bins, defined by the
MAF in the exome data, and computed Pearson correlations, which we averaged
within each bin.
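
The binning step above can be sketched as follows. This is a minimal illustration on toy data, not the pipeline used in the study: the function names, the dictionary layout keyed by (chromosome, position, ref, alt), the bin edges and the choice to skip monomorphic variants are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as 0, 1 or 2 alternate alleles."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def binned_correlation(exome, imputed, bin_edges):
    """Average per-variant correlation within MAF bins (lower, upper].

    exome:   dict variant key -> list of sequenced genotypes (0/1/2)
    imputed: dict variant key -> list of imputed dosages, same sample order
    Bins are defined from the exome MAF, as in the text.
    """
    per_bin = {i: [] for i in range(len(bin_edges) - 1)}
    for variant, genotypes in exome.items():
        if variant not in imputed:
            continue  # keep only variants present in both datasets
        r = pearson(genotypes, imputed[variant])
        if r is None:
            continue  # assumption: monomorphic variants are skipped
        maf = minor_allele_frequency(genotypes)
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] < maf <= bin_edges[i + 1]:
                per_bin[i].append(r)
                break
    return {i: (sum(rs) / len(rs) if rs else None) for i, rs in per_bin.items()}


# Toy data: two shared variants and one variant seen only in the imputed set.
exome = {
    ("1", 100, "A", "G"): [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    ("1", 200, "C", "T"): [0, 1, 2, 1, 0, 1, 2, 0, 1, 0],
}
imputed = {
    ("1", 100, "A", "G"): [0.0, 0.1, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    ("1", 200, "C", "T"): [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0],
    ("2", 300, "G", "A"): [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}
result = binned_correlation(exome, imputed, bin_edges=[0.0, 0.1, 0.5])
```

Here `result` maps each bin index to the mean correlation of the variants whose exome MAF falls in that bin. An imputation-quality filter, such as the threshold of 0.3 mentioned above, would be applied before this step.
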
We tested single pLOF, nonsense, frameshift and essential
splice-site variants (refs. 85, 86) for association with 1,419
PheCodes constructed from composites of ICD-10 (International Classification
of Diseases 10th revision) codes to define cases and controls. Construction
of the PheCodes has been previously described (ref. 114). We performed the
association analysis in ‘white British’ individuals; 408,008 individuals
remained after applying the following quality-control criteria: (1) had not
withdrawn consent from the UK Biobank study as of the end of 2019; (2)
‘submitted gender’ matched ‘inferred sex’; (3) phased autosomal data were
available; (4) not an outlier for the number of missing genotypes or
heterozygosity; (5) no putative sex chromosome aneuploidy; (6) no excess of
relatives; (7) not excluded from kinship inference; and (8) included in the
UK Biobank-defined ‘white British’ ancestry subset. To perform the association
analyses, we used a logistic mixed-model test implemented in SAIGE
(ref. 114), with birth year and the
top four principal components (computed from the white British subset) as
covariates. For the pLOF burden tests, for each autosomal gene with at least
two rare pLOF variants (n = 12,052 genes), a burden
variable was created in which dosages of rare pLOF variants were summed for
each individual. This sum of dosages was tested for association with the
1,419 traits using SAIGE. The same covariates used in the single-variant
tests were included. For both the single-variant and the burden tests, we
used 5 × 10⁻⁸ as the genome-wide significance
threshold.
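
As a rough illustration of how the burden variable described above could be constructed, the sketch below sums the dosages of rare pLOF variants per gene for each individual. It uses toy data only. The function name, the gene and variant labels, and the `rare_maf` cutoff of 0.01 are hypothetical, since the text here does not state the frequency threshold used to define "rare".

```python
def build_burden(dosages, variant_gene, variant_maf, rare_maf=0.01, min_variants=2):
    """Sum dosages of rare pLOF variants per gene for each individual.

    dosages:      dict variant id -> list of per-individual dosages
    variant_gene: dict variant id -> gene label (pLOF variants only)
    variant_maf:  dict variant id -> minor allele frequency
    rare_maf:     assumed cutoff for "rare" (not taken from the text)
    min_variants: genes with fewer rare pLOF variants are not tested
    """
    rare_by_gene = {}
    for variant, gene in variant_gene.items():
        if variant in dosages and variant_maf[variant] < rare_maf:
            rare_by_gene.setdefault(gene, []).append(variant)

    burden = {}
    for gene, variants in rare_by_gene.items():
        if len(variants) < min_variants:
            continue  # the text requires at least two rare pLOF variants
        n_individuals = len(dosages[variants[0]])
        burden[gene] = [
            sum(dosages[v][i] for v in variants) for i in range(n_individuals)
        ]
    return burden


# Toy data for four individuals; all labels are hypothetical.
dosages = {
    "v1": [0.0, 1.0, 0.0, 0.0],
    "v2": [0.0, 0.5, 1.0, 0.0],
    "v3": [1.0, 2.0, 1.0, 0.0],  # common variant, excluded from the burden
    "v4": [0.0, 0.0, 0.0, 1.0],  # only rare pLOF variant in its gene
}
variant_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_A", "v4": "GENE_B"}
variant_maf = {"v1": 0.001, "v2": 0.002, "v3": 0.2, "v4": 0.0005}

burden = build_burden(dosages, variant_gene, variant_maf)
```

In this toy case only `GENE_A` receives a burden variable, because `GENE_B` carries a single rare pLOF variant. The per-individual sums would then be passed to the association test in place of a single-variant dosage, with the same covariates.
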
Taliun D., Harris D.N., Kessler M.D., Carlson J., Szpiech Z.A., Torres R., Taliun S.A., Corvelo A., Gogarten S.M., Kang H.M., Pitsillides A.N., LeFaive J., Lee S.B., Tian X., Browning B.L., Das S., Emde A.K., Clarke W.E., Loesch D.P., Shetty A.C., Blackwell T.W., Smith A.V., Wong Q., Liu X., Conomos M.P., Bobo D.M., Aguet F., Albert C., Alonso A., Ardlie K.G., Arking D.E., Aslibekyan S., Auer P.L., Barnard J., Barr R.G., Barwick L., Becker L.C., Beer R.L., Benjamin E.J., Bielak L.F., Blangero J., Boehnke M., Bowden D.W., Brody J.A., Burchard E.G., Cade B.E., Casella J.F., Chalazan B., Chasman D.I., Chen Y.D., Cho M.H., Choi S.H., Chung M.K., Clish C.B., Correa A., Curran J.E., Custer B., Darbar D., Daya M., de Andrade M., DeMeo D.L., Dutcher S.K., Ellinor P.T., Emery L.S., Eng C., Fatkin D., Fingerlin T., Forer L., Fornage M., Franceschini N., Fuchsberger C., Fullerton S.M., Germer S., Gladwin M.T., Gottlieb D.J., Guo X., Hall M.E., He J., Heard-Costa N.L., Heckbert S.R., Irvin M.R., Johnsen J.M., Johnson A.D., Kaplan R., Kardia S.L., Kelly T., Kelly S., Kenny E.E., Kiel D.P., Klemmer R., Konkle B.A., Kooperberg C., Köttgen A., Lange L.A., Lasky-Su J., Levy D., Lin X., Lin K.H., Liu C., Loos R.J., Garman L., Gerszten R., Lubitz S.A., Lunetta K.L., Mak A.C., Manichaikul A., Manning A.K., Mathias R.A., McManus D.D., McGarvey S.T., Meigs J.B., Meyers D.A., Mikulla J.L., Minear M.A., Mitchell B.D., Mohanty S., Montasser M.E., Montgomery C., Morrison A.C., Murabito J.M., Natale A., Natarajan P., Nelson S.C., North K.E., O’Connell J.R., Palmer N.D., Pankratz N., Peloso G.M., Peyser P.A., Pleiness J., Post W.S., Psaty B.M., Rao D.C., Redline S., Reiner A.P., Roden D., Rotter J.I., Ruczinski I., Sarnowski C., Schoenherr S., Schwartz D.A., Seo J.S., Seshadri S., Sheehan V.A., Sheu W.H., Shoemaker M.B., Smith N.L., Smith J.A., Sotoodehnia N., Stilp A.M., Tang W., Taylor K.D., Telen M., Thornton T.A., Tracy R.P., Van Den Berg D.J., Vasan R.S., Viaud-Martinez K.A., Vrieze S., 
Weeks D.E., Weir B.S., Weiss S.T., Weng L.C., Willer C.J., Zhang Y., Zhao X., Arnett D.K., Ashley-Koch A.E., Barnes K.C., Boerwinkle E., Gabriel S., Gibbs R., Rice K.M., Rich S.S., Silverman E.K., Qasba P., Gan W., Papanicolaou G.J., Nickerson D.A., Browning S.R., Zody M.C., Zöllner S., Wilson J.G., Cupples L.A., Laurie C.C., Jaquish C.E., Hernandez R.D., O’Connor T.D., & Abecasis G.R. (2021). Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature, 590(7845), 290-299.