Analyses were performed with csaw v1.2.1. Reads were extended to 100 bp and counted into windows for each library. The window size was set to 150 bp for simulated histone mark data or 10 bp for simulated TF data. Start positions of adjacent windows were separated by 50 bp along the genome. For filtering, reads were also counted into 2 kbp bins, and the median of the average abundances across all bins was used as a global estimate of the background abundance. This estimate was scaled down to account for the difference in width between bins and windows, making it comparable to the window abundances. Only windows with an average abundance at least two-fold above the scaled background estimate were retained. These correspond to genomic regions with substantial enrichment over the non-specific background.
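The filtering rule above can be illustrated with a minimal Python sketch. This is not the csaw implementation: the function name `filter_windows` and its arguments are hypothetical, abundances are assumed to be on a log2 scale, and background coverage is assumed to scale linearly with region width. The actual package may additionally account for read extension and prior counts when rescaling.

```python
import numpy as np

def filter_windows(win_abundance, bin_abundance, win_width, bin_width, min_fold=2.0):
    """Return a boolean mask of windows passing a global background filter.

    Hypothetical helper for illustration only.
    win_abundance, bin_abundance: average abundances on a log2 scale (assumed).
    win_width, bin_width: widths in bp of the windows and background bins.
    """
    # Global background: median abundance over all large bins.
    background = np.median(np.asarray(bin_abundance, dtype=float))
    # Scale down to the window width (assumes coverage proportional to width).
    scaled_background = background - np.log2(bin_width / win_width)
    # Require at least a min_fold increase over the scaled background.
    threshold = scaled_background + np.log2(min_fold)
    return np.asarray(win_abundance, dtype=float) >= threshold

# Example call with made-up abundances, 150 bp windows and 2 kbp bins.
keep = filter_windows(
    win_abundance=[1.0, 3.2, 5.0],
    bin_abundance=[4.0, 5.0, 6.0],
    win_width=150,
    bin_width=2000,
)
```

Working on the log2 scale turns the width adjustment and the fold-change requirement into simple additive offsets on the background estimate.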
Counts from the remaining windows were tested for significant DB using edgeR v3.10.0. Briefly, an abundance-dependent trend in the NB dispersions was fitted to all windows, using the estimateDisp function. A GLM was fitted to the counts for each window using the trended NB dispersion. The quasi-likelihood (QL) dispersion was estimated from the GLM deviance. An abundance-dependent trend was robustly fitted to the QL dispersions across all windows, and the QL dispersion estimate for each window was shrunk to this trend. Finally, a P-value for DB in each window was computed using the QL F-test.
Windows were clustered into genomic regions using a nearest-neighbor approach, where adjacent windows no more than 100 bp apart were placed into the same cluster. A maximum cluster width of 5 kbp was set to avoid chaining, in which long runs of neighboring windows merge into excessively large clusters. The P-values for all windows in each cluster were combined using Simes’ method, and the Benjamini–Hochberg (BH) method was applied to the combined P-values from all clusters.
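The clustering and multiple-testing steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the csaw code: all function names are hypothetical, windows are assumed to lie on a single chromosome and to be sorted by start position, and the width limit is enforced by greedily starting a new cluster, which may differ from how csaw splits oversized clusters.

```python
import numpy as np

def cluster_windows(starts, ends, tol=100, max_width=5000):
    """Assign a cluster id to each window (sorted by start, one chromosome)."""
    ids = np.empty(len(starts), dtype=int)
    current = -1
    clust_start = clust_end = None
    for i, (s, e) in enumerate(zip(starts, ends)):
        start_new = (
            current < 0                                      # first window
            or s - clust_end > tol                           # gap larger than tol
            or max(e, clust_end) - clust_start > max_width   # would exceed width cap
        )
        if start_new:
            current += 1
            clust_start, clust_end = s, e
        else:
            clust_end = max(clust_end, e)
        ids[i] = current
    return ids

def simes(pvals):
    """Simes' combined P-value: min over i of n * p_(i) / i."""
    p = np.sort(np.asarray(pvals, dtype=float))
    n = len(p)
    return min(1.0, float(np.min(n * p / np.arange(1, n + 1))))

def benjamini_hochberg(pvals):
    """BH-adjusted P-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity by taking the running minimum from the largest rank down.
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.minimum(adj_sorted, 1.0)
    return adj

# Example with made-up 150 bp windows and window-level P-values.
starts = np.array([0, 50, 100, 400, 450])
ends = starts + 150
window_p = np.array([0.01, 0.04, 0.03, 0.5, 0.2])

ids = cluster_windows(starts, ends)
combined_p = np.array([simes(window_p[ids == k]) for k in np.unique(ids)])
cluster_fdr = benjamini_hochberg(combined_p)
```

Applying the BH correction to one combined P-value per cluster, rather than to individual windows, means that the false discovery rate is controlled at the level of regions instead of windows.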
Analyses were conducted using R 3.2.0 and Bioconductor 3.1 for UNIX.