When possible, we excluded samples with any of the following: pregnancy (at the time of the complete blood count (CBC)), acute medical or surgical illness (at the time of the CBC), blood cancer, leukemia, lymphoma, chemotherapy, myelodysplastic syndrome, bone marrow transplant, congenital or hereditary anemia (e.g., a hemoglobinopathy such as sickle cell anemia or thalassemia), HIV, end-stage kidney disease, dialysis, erythropoietin (EPO) treatment, splenectomy, or cirrhosis. We also excluded samples with any of the following extreme measurements: WBC count > 100 × 10⁹/L with > 5% immature cells or blasts, WBC count > 200 × 10⁹/L, hemoglobin > 20 g/dL, hematocrit > 60%, or platelet count > 1000 × 10⁹/L. For the WBC subtypes (e.g., basophil count), we used absolute counts, i.e., the total WBC count multiplied by the proportion of each cell type (e.g., basophil percentage).

Raw phenotypes were regressed on age, age squared, sex, principal components, and cohort-specific covariates (e.g., study center, cohort) where needed; WBC-related traits were log10-transformed before regression modeling. The residuals from these models were then inverse normal transformed and used for cohort-level association analysis (GWAS). All cohorts followed the same exclusions and phenotype modeling except UKBB and INTERVAL, whose procedures are described elsewhere (Astle et al., 2016).

Cohort-level association analyses were then conducted under an additive genetic model using linear mixed effects models to account for known or cryptic relatedness, implemented in software such as BOLT-LMM (Loh et al., 2015, 2018), EPACTS (https://github.com/statgen/EPACTS), and rvtests (Zhan et al., 2016). Linear mixed effects models have been shown to effectively account for both population structure and inter-individual relatedness within the UK Biobank cohort, while also providing greater discovery power than simple linear regression adjusted for principal components.
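As a minimal sketch, the sample exclusions and the absolute-count calculation for WBC subtypes could be expressed as below. The `Sample` record, its field names, and the condition labels in `EXCLUDED_CONDITIONS` and `CBC_TIME_CONDITIONS` are hypothetical; the sketch also assumes measurements are already in ×10⁹/L, g/dL, and %, and that each cohort has mapped its own phenotype coding onto these labels.

```python
from dataclasses import dataclass

# Hypothetical labels for conditions that exclude a sample whenever present.
EXCLUDED_CONDITIONS = frozenset({
    "blood_cancer", "leukemia", "lymphoma", "chemotherapy",
    "myelodysplastic_syndrome", "bone_marrow_transplant", "hereditary_anemia",
    "hiv", "end_stage_kidney_disease", "dialysis", "epo_treatment",
    "splenectomy", "cirrhosis",
})
# Hypothetical labels for conditions that exclude only if present when the CBC was done.
CBC_TIME_CONDITIONS = frozenset({"pregnancy", "acute_illness"})


@dataclass
class Sample:
    wbc: float            # total WBC count, x10^9/L
    immature_pct: float   # % immature cells or blasts
    hemoglobin: float     # g/dL
    hematocrit: float     # %
    platelets: float      # x10^9/L
    basophil_pct: float   # basophils as % of total WBC
    history: frozenset = frozenset()  # condition labels ever recorded
    at_cbc: frozenset = frozenset()   # condition labels present at the CBC


def has_extreme_measurement(s: Sample) -> bool:
    """True if any extreme-value exclusion threshold is exceeded."""
    return (
        (s.wbc > 100 and s.immature_pct > 5)
        or s.wbc > 200
        or s.hemoglobin > 20
        or s.hematocrit > 60
        or s.platelets > 1000
    )


def should_exclude(s: Sample) -> bool:
    """Combine the clinical-history, CBC-time, and extreme-measurement exclusions."""
    return (
        bool(s.history & EXCLUDED_CONDITIONS)
        or bool(s.at_cbc & CBC_TIME_CONDITIONS)
        or has_extreme_measurement(s)
    )


def basophil_count(s: Sample) -> float:
    """Absolute basophil count (x10^9/L): total WBC times the basophil proportion."""
    return s.wbc * s.basophil_pct / 100.0
```

Filtering a cohort would then be a single pass such as `kept = [s for s in samples if not should_exclude(s)]`, with the same multiplication applied to each WBC subtype percentage.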
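The phenotype modeling step (an optional log10 transform, regression on covariates, then inverse normal transformation of the residuals) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the cohorts' actual pipeline: the function names are hypothetical, covariates are limited to age, age squared, sex, and principal components, and because the text does not specify the exact inverse normal transform, a rank-based version with Blom's offset (c = 3/8) and average ranks for ties is assumed.

```python
import math
from statistics import NormalDist


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_residuals(y, covariates):
    """Residuals of y after ordinary least squares on an intercept plus covariates."""
    X = [[1.0] + list(row) for row in covariates]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return [yi - sum(b * xi for b, xi in zip(beta, r)) for r, yi in zip(X, y)]


def inverse_normal_transform(values, c=3.0 / 8):
    """Rank-based inverse normal transform (assumed Blom offset); ties get average ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for this tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def model_trait(raw, age, sex, pcs, is_wbc_trait):
    """log10 (WBC traits only), regress on age, age^2, sex, PCs, then inverse-normalize."""
    y = [math.log10(v) for v in raw] if is_wbc_trait else list(raw)
    cov = [[a, a * a, s] + list(pc) for a, s, pc in zip(age, sex, pcs)]
    return inverse_normal_transform(ols_residuals(y, cov))
```

In practice, cohort-specific covariates such as study center would be appended to each covariate row, and the transformed residuals would then be passed to mixed model software such as BOLT-LMM rather than analyzed further in Python.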