We aimed to confirm our discoveries in the MGBB, an ongoing prospective clinical research cohort of patients of Mass General Brigham (MGB), the parent organization of Massachusetts General Hospital (MGH) and Brigham and Women’s Hospital (BWH) in Boston, Massachusetts, USA. All patients aged 18 years or older who present to any MGB clinic and consent to broad research use are included. Patients are recruited in person at MGH and BWH and online through an electronic patient gateway. The MGBB provides ongoing electronic health record data with ICD codes for outcomes, imaging data, and, for a subset of individuals, genetic data. Recruitment has been ongoing since 1998; to date, more than 133,000 patients have been included in the Biobank and over 65,000 have been genotyped. Genotyping has been performed in batches on different arrays, with batches 1–9 performed by MGH and batches 10–13 by the Broad Institute. To minimize batch effects and variation across genotyping arrays, many of the individuals genotyped in batches 1–9 have been regenotyped in batch 13. For the current analyses, we considered only individuals genotyped in batches 10–13, which we combined and imputed together on the Michigan Imputation Server using the Haplotype Reference Consortium v.1.1 reference panel. We used exactly the same PRS-CS approach as in the UKB to construct the genetic score in MGBB participants.
We queried the MGB Biobank database using the same criteria that we used in the UKB. We identified patients with available genetic data and two or more oral prescriptions of valproate. Most prescriptions contained instruction text from which we could derive the daily dose by multiplying the dose per administration by the prescribed number of administrations per day. Outcomes were identified using inpatient ICD-9 and ICD-10 codes (Supplemental Table S6) and were classified as prevalent or incident events according to whether they occurred before or after the date of the first valproate prescription. We used the same statistical approaches for the replication analyses in the MGBB as in the UKB. Linear regression models were used to assess the associations of the average valproate dose and the genetic score with valproate serum levels. For the association of the genetic score with outcomes, we constructed Cox proportional hazards models. Time to event was defined as the number of days between the date of the first valproate prescription and the date of the incident event for patients in whom an outcome occurred, and as the number of days between the first valproate prescription and the last encounter for patients in whom no outcome occurred. We detected significant associations of the principal components with age and sex, most likely due to chance imbalance across genotyping batches: later batches were better powered and included older individuals and more women. As a result, adjusting for age, sex, and the principal components together introduced collinearity into our models. Because we found no association of the genetic score for valproate response with age or sex, indicating near-perfect randomization, we removed age and sex from our Cox models. The reported effect estimates are therefore from models with the genetic score as the main predictor, adjusted for principal components 1–3, which still contain age and sex information.
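The dose derivation, event classification, and time-to-event definition described above can be illustrated with a minimal Python sketch. This is not the code used in the analysis. The function names and argument names are hypothetical, and the handling of an event dated exactly on the first prescription date (treated here as prevalent) is an assumption, since the text specifies only "before or after".

```python
from datetime import date
from typing import Optional, Tuple


def daily_dose_mg(dose_mg: float, times_per_day: float) -> float:
    """Daily dose = dose per administration x administrations per day."""
    return dose_mg * times_per_day


def classify_event(event_date: date, first_rx_date: date) -> str:
    """Label an outcome relative to the first valproate prescription.

    Assumption: an event on the prescription date itself counts as prevalent.
    """
    return "incident" if event_date > first_rx_date else "prevalent"


def time_to_event(
    first_rx_date: date,
    last_encounter_date: date,
    incident_event_date: Optional[date] = None,
) -> Tuple[int, int]:
    """Return (follow-up in days, event indicator) for a Cox model.

    With an incident event: days from first prescription to the event, indicator 1.
    Without one: days from first prescription to last encounter, indicator 0.
    """
    if incident_event_date is not None:
        return (incident_event_date - first_rx_date).days, 1
    return (last_encounter_date - first_rx_date).days, 0
```

For example, a hypothetical prescription of 500 mg taken twice daily gives `daily_dose_mg(500, 2)`, that is, 1000 mg per day. The duration and indicator pairs returned by `time_to_event` are in the form that standard Cox proportional hazards implementations expect as input.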