Libraries were obtained from the NCBI Gene Expression Omnibus under accession number GSE53490 (19). Two replicate libraries were available for H3K4me3 marking in each of four biological conditions: treated wild-type, treated knockout, untreated wild-type and untreated knockout. These correspond to Sequence Read Archive runs SRR1055323 to SRR1055330, which were converted to FASTQ format with the fastq-dump utility from the SRA Toolkit. Reads were aligned in paired-end mode to the mm10 build of the mouse genome using Subread v1.4.6 (20). Only uniquely mapped reads were reported, and ties between equally good mapping locations were broken using the Hamming distance. BAM files were sorted and indexed using SAMtools v0.1.19 (http://samtools.sourceforge.net). DB detection methods were then applied to identify changes in marking between conditions.
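The tie-breaking rule can be illustrated with a simplified model. The Python sketch below is not Subread's implementation (Subread scores candidate locations internally); it assumes each candidate location already carries an alignment score and the reference sequence at that location, keeps the best-scoring location, uses the Hamming distance (the number of mismatched bases) to break ties, and discards any read that is still tied, as happens when only unique mappings are reported. The `Candidate` class and the `hamming` and `choose_unique` functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    chrom: str
    pos: int
    score: int     # assumed alignment score for this location (higher is better)
    read_seq: str  # read sequence
    ref_seq: str   # reference sequence at this location (same length as the read)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def choose_unique(cands: List[Candidate]) -> Optional[Candidate]:
    """Return the single best location, or None if the read is not uniquely mapped."""
    if not cands:
        return None
    best_score = max(c.score for c in cands)
    top = [c for c in cands if c.score == best_score]
    if len(top) > 1:
        # Tie on score: keep only the locations with the fewest mismatches.
        dists = [hamming(c.read_seq, c.ref_seq) for c in top]
        min_d = min(dists)
        top = [c for c, d in zip(top, dists) if d == min_d]
    # Unique mapping: a read that remains tied after tie-breaking is discarded.
    return top[0] if len(top) == 1 else None
```

Under this model, a read with two equally scored locations is kept only if one location has strictly fewer mismatches than the other.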
To detect DB events with DiffBind, peaks were called in each library using either MACS or HOMER in histone mode. For both peak callers, the parameters were the same as those described for the histone mark simulations, except that the fragment length was set to 200 bp based on the insert sizes of proper read pairs in each library (see Supplementary Figure S1). A consensus peak set was constructed using DiffBind, as previously described. Read pairs were counted with the summarizeOverlaps function, without any removal of duplicates. Parallelization was turned off to simplify processing. Contrasts between groups were set up with minMembers = 2, and the statistical analysis was performed with edgeR.
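The idea behind a consensus peak set can be sketched as follows. This Python sketch assumes that peaks from all libraries are pooled, overlapping peaks are merged, and a merged interval is retained only if it contains peaks from at least `min_overlap` different libraries. The threshold of 2 and the function name `consensus_peaks` are assumptions for illustration only; DiffBind's own merging rules (for example, how abutting peaks are handled) may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), half-open coordinates


def consensus_peaks(peaks_by_library: Dict[str, List[Peak]],
                    min_overlap: int = 2) -> List[Peak]:
    """Merge overlapping peaks across libraries and keep well-supported intervals."""
    # Group peaks by chromosome, remembering which library each came from.
    by_chrom: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
    for lib, peaks in peaks_by_library.items():
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end, lib))

    out: List[Peak] = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end, libs = intervals[0][0], intervals[0][1], {intervals[0][2]}
        for start, end, lib in intervals[1:]:
            if start < cur_end:
                # Overlaps the current merged interval: extend it.
                cur_end = max(cur_end, end)
                libs.add(lib)
            else:
                # Close the current interval; keep it if enough libraries support it.
                if len(libs) >= min_overlap:
                    out.append((chrom, cur_start, cur_end))
                cur_start, cur_end, libs = start, end, {lib}
        if len(libs) >= min_overlap:
            out.append((chrom, cur_start, cur_end))
    return out
```

Setting `min_overlap=1` would instead return the union of all peaks.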
For csaw, properly paired reads were defined as inward-facing intra-chromosomal pairs that were no more than 600 bp apart. The interval spanned by each proper pair represents the fragment from which the reads were sequenced. The number of fragments overlapping each 150 bp window was counted for each library. Again, the starts of adjacent windows were separated by 50 bp. Background-based filtering of windows was performed as described above. Windows in unassigned contigs or the mitochondrial genome were also discarded. To remove composition biases, fragments were counted into 10 kbp bins. Normalization factors were computed from these bin counts using the trimmed mean of M-values (TMM) method without precision weighting. These factors were used to scale the library sizes when testing for DB windows in edgeR's quasi-likelihood (QL) framework. Windows were clustered into genomic regions, P-values were combined within each region and the BH method was applied, as previously described. This analysis was repeated with 1500 bp windows for a low-resolution analysis, where the starts of adjacent windows were separated by 500 bp.
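The definition of a proper pair and the sliding-window counts can be sketched in Python as below. This is a simplified illustration, not csaw's implementation: it assumes 1-based inclusive coordinates and equal read lengths for both mates, interprets "no more than 600 bp apart" as a fragment length of at most 600 bp, and counts overlaps on a single chromosome by brute force. `ReadPair`, `fragment` and `count_windows` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ReadPair:
    chrom1: str
    pos1: int      # 1-based leftmost aligned position of mate 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int      # 1-based leftmost aligned position of mate 2
    strand2: str
    read_len: int  # assumed identical for both mates


def fragment(pair: ReadPair, max_frag: int = 600) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, end) of the sequenced fragment for a proper pair, else None."""
    if pair.chrom1 != pair.chrom2:
        return None  # inter-chromosomal pair
    # Identify the forward- and reverse-strand mates.
    if pair.strand1 == "+" and pair.strand2 == "-":
        fwd, rev = pair.pos1, pair.pos2
    elif pair.strand1 == "-" and pair.strand2 == "+":
        fwd, rev = pair.pos2, pair.pos1
    else:
        return None  # both mates on the same strand
    # The fragment runs from the forward mate's 5' end to the reverse mate's 5' end.
    start, end = fwd, rev + pair.read_len - 1
    if end < start:
        return None  # outward-facing pair
    if end - start + 1 > max_frag:
        return None  # mates too far apart
    return pair.chrom1, start, end


def count_windows(frags: List[Tuple[int, int]], chrom_len: int,
                  width: int = 150, spacing: int = 50) -> Tuple[List[int], List[int]]:
    """Count fragments (start, end) overlapping each window on one chromosome."""
    starts = list(range(1, chrom_len + 1, spacing))
    counts = []
    for ws in starts:
        we = min(ws + width - 1, chrom_len)  # clip windows at the chromosome end
        counts.append(sum(1 for s, e in frags if s <= we and e >= ws))
    return starts, counts


if __name__ == "__main__":
    pair = ReadPair("chr1", 100, "+", "chr1", 251, "-", 50)
    frag = fragment(pair)
    print(frag)  # ('chr1', 100, 300)
    print(count_windows([frag[1:]], chrom_len=400))
```

Calling `count_windows` with `width=1500` and `spacing=500` would give the windows used for the low-resolution analysis.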