The effect of different background correction methods on measurement accuracy was assessed using data from methylation control mixture samples from the same study (12), in which purified human 100% methylated and unmethylated DNA (Zymo Research, Irvine, CA) were mixed in different proportions to create laboratory control samples with specific methylation levels: 0%, 5%, 10%, 20%, 40%, 50%, 60%, 80% and 100% methylated. Replicates for each methylation level (n = 10, 3, 2, 3, 3, 2, 3, 3 and 10, respectively) were independently assayed on different arrays.
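As an illustration of how accuracy could be scored against such control mixtures, the sketch below compares measured beta values with the expected methylation fraction of each sample. The metric (mean absolute deviation), the function name `mean_absolute_deviation`, the input layout and the toy values are assumptions made for this example; the text above does not state which accuracy statistic was used.

```python
from statistics import mean

# Expected methylation fractions and replicate counts, as listed in the text.
EXPECTED_LEVELS = [0.00, 0.05, 0.10, 0.20, 0.40, 0.50, 0.60, 0.80, 1.00]
REPLICATES = [10, 3, 2, 3, 3, 2, 3, 3, 10]
TOTAL_CONTROL_SAMPLES = sum(REPLICATES)


def mean_absolute_deviation(measured_by_level):
    """Average |measured beta - expected level| over all replicates.

    measured_by_level maps an expected methylation fraction to the list of
    beta values measured for its replicates (hypothetical input layout).
    """
    errors = [
        abs(beta - level)
        for level, betas in measured_by_level.items()
        for beta in betas
    ]
    return mean(errors)


# Toy values for one probe, invented for illustration only.
toy_measurements = {0.0: [0.02, 0.04], 0.5: [0.47, 0.55], 1.0: [0.93, 0.97]}
toy_mad = mean_absolute_deviation(toy_measurements)
```

A lower value indicates measurements closer to the designed methylation levels, so the same function could be applied to the output of each background correction method and the results compared.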
To avoid possible impact on the evaluations, we excluded 69 075 probes, including non-specific binding probes, probes with common (MAF > 0.05) SNPs at CpG target regions, probes on sex chromosomes and probes with multimodal methylation distributions identified using the ENmix R package. We also excluded probes with low-quality methylation values, where the number of beads was less than 3 or the detection P-value was greater than 0.05.
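The quality thresholds above can be sketched as a simple mask over a probes-by-samples matrix. This is a minimal illustration, not the ENmix implementation: the function name `mask_low_quality`, the nested-list layout and the toy numbers are hypothetical, and treating the exclusion as per-value masking (rather than dropping a whole probe) is an assumption.

```python
def mask_low_quality(beta, n_beads, det_p, min_beads=3, max_p=0.05):
    """Return a copy of beta with low-quality values replaced by None.

    A value is treated as low quality when its bead count is below
    min_beads or its detection P-value is above max_p.
    All three inputs are probes-by-samples nested lists of equal shape.
    """
    masked = []
    for beta_row, bead_row, p_row in zip(beta, n_beads, det_p):
        masked.append([
            None if (beads < min_beads or p > max_p) else value
            for value, beads, p in zip(beta_row, bead_row, p_row)
        ])
    return masked


# Toy 2 x 2 example (rows are probes, columns are samples).
toy_beta = [[0.10, 0.90], [0.50, 0.60]]
toy_beads = [[5, 2], [3, 10]]
toy_det_p = [[0.01, 0.001], [0.05, 0.20]]
toy_masked = mask_low_quality(toy_beta, toy_beads, toy_det_p)
```

Values sitting exactly on a threshold (3 beads, or P = 0.05) are kept, matching the "less than 3" and "greater than 0.05" wording.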
To demonstrate the effect of the ENmix background correction method on epigenome-wide association studies (EWAS), we re-analyzed raw blood DNA methylation data from 889 infants in relation to maternal smoking (12). We preprocessed the data with different methods or combinations of methods: raw data, Q5 background correction, ENmix_oob background correction, ENmix and dye bias correction (ENmixD), ENmix+dye bias correction+quantile normalization (ENmixDQ) and ENmix+dye bias correction+quantile normalization+BMIQ (ENmixDQB). We used a robust linear regression model to test for association between maternal smoking and infant DNA methylation level, adjusting for the following variables: cell type proportions (CD8T, CD4T, NK, Bcell, Mono and Gran) estimated using the Houseman method (13) from the minfi R package, gestational age in weeks, sex, education in two categories, birth weight, maternal age, maternal BMI, parity, experimental batch, cleft phenotype and baby birth year.
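To show why a robust model can matter for a per-CpG association test, the following sketch fits a Huber-type robust regression by iteratively reweighted least squares. It is deliberately simplified and is not the model used in the study: it has a single predictor (a 0/1 smoking indicator) and no covariates, the Huber tuning constant 1.345 and the MAD-based scale are common defaults assumed here, and all function names and data values are invented for illustration.

```python
from statistics import median


def weighted_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def huber_regression(x, y, k=1.345, n_iter=50, tol=1e-10):
    """Huber M-estimate of (intercept, slope) via reweighted least squares."""
    intercept, slope = weighted_fit(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(n_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        scale = median(abs(r) for r in resid) / 0.6745  # robust scale (MAD)
        if scale == 0:
            break
        # Huber weights: 1 inside the threshold, shrinking outside it.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        new_intercept, new_slope = weighted_fit(x, y, w)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope


# Toy data: exposure lowers methylation by about 0.05, with one outlier.
smoking = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
methylation = [0.50, 0.51, 0.49, 0.50, 0.50, 0.45, 0.46, 0.44, 0.45, 0.95]

ols_intercept, ols_slope = weighted_fit(smoking, methylation, [1.0] * 10)
robust_intercept, robust_slope = huber_regression(smoking, methylation)
```

In this toy example the single outlying value pulls the ordinary least squares slope to a positive value, whereas the reweighted fit stays negative and closer to the underlying difference between the two groups. Extending the sketch to the covariates listed above would require a multivariable solver, which is left out here to keep the example short.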