The effect of different background correction methods on reproducibility was assessed using data from 20 pairs of duplicate samples that were part of a previously published study of methylation in 891 infant whole blood samples (12). In that study, the two samples of each duplicate pair were placed on separate 96-well plates that underwent independent bisulfite conversion, hybridization and array scanning. One sample was excluded due to poor data quality, leaving 19 duplicate pairs (38 samples) for evaluation.
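As an illustration of how agreement between duplicates can be quantified, the sketch below computes the mean absolute difference in beta values for one duplicate pair. The choice of metric, the function name `duplicate_abs_diff` and the example values are assumptions made for illustration; the text above does not specify the reproducibility measure that was used.

```python
from statistics import mean

def duplicate_abs_diff(beta_a, beta_b):
    """Mean absolute beta-value difference between two duplicate samples.

    beta_a, beta_b: equal-length lists of beta values (0..1), one per probe.
    None marks a probe value that was excluded or failed quality control.
    """
    diffs = [abs(a - b) for a, b in zip(beta_a, beta_b)
             if a is not None and b is not None]
    if not diffs:
        raise ValueError("no probes with values in both duplicates")
    return mean(diffs)

# Hypothetical beta values for one duplicate pair (four probes).
sample_1 = [0.10, 0.80, None, 0.50]
sample_2 = [0.12, 0.78, 0.40, 0.50]
print(duplicate_abs_diff(sample_1, sample_2))
```

A smaller value indicates closer agreement between the duplicates; averaging this quantity over all 19 pairs would give one summary figure per preprocessing method.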
The effect of different background correction methods on measurement accuracy was assessed using data from methylation control mixture samples from the same study (12), in which purified human 100% methylated and unmethylated DNA (Zymo Research, Irvine, CA) were mixed in different proportions to create laboratory control samples with specific methylation levels: 0%, 5%, 10%, 20%, 40%, 50%, 60%, 80% and 100% methylated. Replicates for each methylation level (n = 10, 3, 2, 3, 3, 2, 3, 3 and 10, respectively) were independently assayed on different arrays.
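The control design above can be written down directly, and one simple way to score accuracy is the mean absolute deviation of measured beta values from the expected methylation fraction. The sketch below assumes that metric; the function name `mean_abs_deviation` and the example beta values are hypothetical, and only the methylation levels and replicate counts are taken from the text.

```python
# Expected methylation fraction -> number of replicate arrays (from the text).
DESIGN = {0.00: 10, 0.05: 3, 0.10: 2, 0.20: 3, 0.40: 3,
          0.50: 2, 0.60: 3, 0.80: 3, 1.00: 10}

def mean_abs_deviation(expected, measured):
    """Mean |measured - expected| over the beta values of one control sample."""
    if not measured:
        raise ValueError("no beta values supplied")
    return sum(abs(m - expected) for m in measured) / len(measured)

# Total number of control arrays implied by the design.
print(sum(DESIGN.values()))

# Hypothetical beta values from a 20% methylated control sample.
print(mean_abs_deviation(0.20, [0.25, 0.18, 0.22]))
```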
To avoid possible impact on the evaluations, we excluded 69 075 probes, including non-specific binding probes, probes with common SNPs (MAF > 0.05) at CpG target regions, probes on sex chromosomes and probes with multimodal methylation distributions identified using the ENmix R package. We also excluded low-quality methylation values, defined as those where the number of beads was less than 3 or the detection P-value was greater than 0.05.
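The two low-quality thresholds can be expressed as a simple mask. The sketch below is a minimal illustration that assumes filtering is applied per measurement (one value per probe and sample) and sets failing values to missing; the function name `mask_low_quality` and the input layout are hypothetical, not part of ENmix.

```python
MIN_BEADS = 3      # fewer beads than this marks a value as low quality
MAX_DET_P = 0.05   # a detection P-value above this marks a value as low quality

def mask_low_quality(beta, n_beads, det_p):
    """Return beta values with low-quality measurements replaced by None.

    beta, n_beads, det_p: equal-length lists for one sample, one entry per probe.
    """
    return [b if (n >= MIN_BEADS and p <= MAX_DET_P) else None
            for b, n, p in zip(beta, n_beads, det_p)]

# Hypothetical values for three probes: the second fails on bead count,
# the third fails on detection P-value.
print(mask_low_quality([0.1, 0.2, 0.3], [5, 2, 3], [0.01, 0.01, 0.2]))
```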
To demonstrate the effect of the ENmix background correction method on epigenome-wide association studies (EWAS), we re-analyzed raw blood DNA methylation data from 889 infants in relation to maternal smoking (12). We preprocessed the data with different methods or combinations of methods: raw data, Q5 background correction, ENmix_oob background correction, ENmix and dye bias correction (ENmixD), ENmix+dye bias correction+quantile normalization (ENmixDQ) and ENmix+dye bias correction+quantile normalization+BMIQ (ENmixDQB). We used a robust linear regression model to test for association between maternal smoking and infant DNA methylation level, adjusting for the following variables: cell type proportions (CD8T, CD4T, NK, Bcell, Mono and Gran) estimated using the Houseman method (13) as implemented in the minfi R package, gestational age in weeks, sex, education in two categories, birth weight, maternal age, maternal BMI, parity, experimental batch, cleft phenotype and baby birth year.
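To show what a robust linear regression does differently from ordinary least squares, the sketch below fits a single-predictor model with Huber weights by iteratively reweighted least squares. This is a deliberately simplified illustration: the Huber estimator, the tuning constant `k = 1.345`, the function name and the data are all assumptions, and the covariates listed above are omitted. An actual analysis would use an established robust-regression routine with the full covariate set.

```python
from statistics import median

def huber_simple_regression(x, y, k=1.345, n_iter=50):
    """Fit y = a + b*x with Huber weights via iteratively reweighted least squares."""
    w = [1.0] * len(x)
    a = b = 0.0
    for _ in range(n_iter):
        # Weighted least-squares fit for intercept a and slope b.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        if sxx == 0:
            raise ValueError("predictor has no variation")
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx
        a = my - b * mx
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute residual, rescaled for normal data.
        scale = median(abs(r) for r in resid) / 0.6745
        if scale == 0:
            break  # at least half the points are fitted exactly
        # Huber weights: full weight for small residuals, reduced weight for large ones.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
    return a, b

# Hypothetical data on the line y = 2 + 0.5*x, with the last point a gross outlier.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 50.0]
intercept, slope = huber_simple_regression(x, y)
print(intercept, slope)
```

Because large residuals receive reduced weight at each iteration, the single outlier has little influence on the final estimates, which is the property that motivates robust regression for methylation data with occasional extreme values.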