The DNA methylation samples were measured using the Illumina HumanMethylation EPIC BeadChip, which includes more than 850,000 CpGs. We pre-processed the samples using the SeSAMe 2 pipeline described in Welsh et al. (2023), which performed best and produced the largest percentage of reliable CpG probes in a recent comparison of pre-processing and normalization pipelines [17]. Supplementary Table 2 shows the number of CpGs at each pre-processing step.
First, we removed probes that overlap with single nucleotide polymorphisms (SNPs), non-CpG probes, cross-reactive probes [18], and probes located on the X or Y chromosomes. Samples and probes were further filtered using the iterative Greedy-cut algorithm (with a p-value threshold of 0.01) in the RnBeads R package, which removes the probe or sample with the highest fraction of unreliable measurements, one at a time [19]. Next, we removed additional probes that had missing values in more than 5% of samples or were masked by the pOOBAH (P-value with out-of-band array hybridization) algorithm in the SeSAMe R package in more than 20% of samples. Finally, we performed noob (normal-exponential using out-of-band probes) background correction and a non-linear dye-bias correction [20]. These analyses were performed using the RnBeads and SeSAMe R packages.
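The two threshold-based steps and the greedy removal step can be illustrated with a minimal sketch. This is not the RnBeads or SeSAMe implementation (the analyses were run in R); it is a simplified Python illustration written for this passage. The function names `greedy_cut` and `filter_probes`, the array layout (probes in rows, samples in columns), and the stopping rule of the greedy loop (stop once no unreliable measurement remains) are all assumptions made here, and RnBeads applies its own criterion for choosing the final cut.

```python
import numpy as np

def greedy_cut(unreliable):
    """Simplified greedy removal (hypothetical helper, not the RnBeads code).

    unreliable: probes x samples boolean array, True where the detection
    p-value exceeds the chosen threshold (0.01 in the text).
    Returns the indices of the retained probes and samples.
    """
    rows = list(range(unreliable.shape[0]))
    cols = list(range(unreliable.shape[1]))
    while rows and cols:
        sub = unreliable[np.ix_(rows, cols)]
        if not sub.any():
            break  # assumed stopping rule: nothing unreliable is left
        row_frac = sub.mean(axis=1)  # unreliable fraction per probe
        col_frac = sub.mean(axis=0)  # unreliable fraction per sample
        # Remove whichever single probe or sample is worst.
        if row_frac.max() >= col_frac.max():
            rows.pop(int(row_frac.argmax()))
        else:
            cols.pop(int(col_frac.argmax()))
    return rows, cols

def filter_probes(beta, poobah_mask, max_missing=0.05, max_masked=0.20):
    """Return a boolean array marking the probes (rows) to keep.

    beta:        probes x samples array of beta values, NaN = missing
    poobah_mask: probes x samples boolean array, True = masked measurement
    """
    missing_frac = np.isnan(beta).mean(axis=1)
    masked_frac = poobah_mask.mean(axis=1)
    # A probe is dropped if it exceeds either threshold.
    return (missing_frac <= max_missing) & (masked_frac <= max_masked)

# Toy data: 4 probes measured in 20 samples.
beta = np.full((4, 20), 0.5)
mask = np.zeros((4, 20), dtype=bool)
beta[1, :2] = np.nan   # probe 1: 10% missing
mask[2, :5] = True     # probe 2: 25% masked
mask[3, :3] = True     # probe 3: 15% masked
keep = filter_probes(beta, mask)

# Toy data for the greedy step: probe 1 is unreliable in every sample.
unreliable = np.zeros((3, 3), dtype=bool)
unreliable[1, :] = True
kept_probes, kept_samples = greedy_cut(unreliable)
```

In the toy data, probes 1 and 2 exceed the 5% and 20% thresholds respectively and are dropped, while probe 3 stays below the masking threshold and is kept; the greedy step removes only the probe that is unreliable in every sample.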