To assess noise, we calculated a composite STAMP signal from the sequence tags within random 15 kb windows for all samples. We defined the mean STAMP signal density within these windows as the noise floor (NF). Using this approach, we found that NF was uniformly 0.114. The NF estimate was independent of the sample and is approximately equal to one sequence tag per kb when the total number of sequence tags per data set is scaled to 106.
To identify DMEs, we first chose all regions with STAMP signal greater than a threshold value. Then the flanks of those regions were extended until the signal declined to 4× NF. To minimize false discovery of DMEs, the detection threshold value was chosen to ensure that the number of DMEs identified in unenriched DNA was less than 5% of that identified in an His-MBD enriched sample from the same source (
Comparison of DNA methylation across genomic regions with varying CG density was performed by classifying each region as either methylated or unmethylated (lowest 10% of sequence tags for regions with similar sequence content) prior to the generation of contingency tables. Fisher's exact test for count data was used to assess the likelihood of any genic element being methylated if any other element is methylated. This approach largely uncouples the analysis from CG density because the element class assignment is insensitive to the magnitude of the STAMP signal. We also calculated the CpG density (CGf), GC fraction (GCf) and CGoe ratio (CGoe = CGf/(Gf * Cf) in sliding windows of 200 bp tiled every 10 bp across the entire human genome. We then identified the fraction of each genic element that could be classified as HCP, ICP or LCP as defined by Weber et al [37] (link), and the fraction of the element detectable (CGf>0.1) by STAMP. Analyses performed using subsets of the genome restricted by these various sequence classes had minimal effect on the results and did not alter the interpretation. Data presented in
To generate density plots of CGoe vs GC fraction (