We used ATAC-seq (open chromatin) and H3K27ac ChIP-seq (active enhancer) datasets from H9-hESC (Day 0), MGE-like progenitors (Day 26), and inhibitory-like interneurons (Day 39). Starting from the raw sequencing files (GEO; accession number GSE218668), we called peaks with Model-based Analysis of ChIP-Seq (MACS) (Feng et al., 2012) to identify open chromatin and H3K27ac-enriched genomic regions. We then identified active and non-active regions using REPTILE, which locates enhancers based on genome-wide DNA methylation and histone modification profiles (He et al., 2017). Because methylation data were not available in our study, we used only the H3K27ac epigenetic mark, which is associated with active enhancers. We trained REPTILE on ChIP-seq experiments conducted in mouse embryonic stem cells, which are provided as example files with the REPTILE software package. This training data included an H3K27ac ChIP-seq dataset in bigWig format and a ground-truth file annotating active and non-active enhancers. The trained model identifies active enhancers in open chromatin regions based on the H3K27ac mark alone (Figure 2A); its output is a set of predicted active enhancers among the input open chromatin regions. From the regions defined by REPTILE, we further extracted the putative active enhancers, which overlap an H3K27ac ChIP-seq peak, and the putative non-active (poised/repressed) enhancers, which do not overlap any H3K27ac ChIP-seq peak (Figure 2A). To extract the sequences corresponding to the genomic coordinates, we used BEDTools (Quinlan and Hall, 2010), an efficient tool for analyzing and processing large genomic datasets. Since deep neural networks require fixed-size samples, we set all sequence lengths to the length of the shortest sample in the set. For length N, we selected N/2 nucleotides upstream and downstream of the center of each peak (Figure 2A).
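The split into putative active and non-active enhancers described above can be illustrated with a minimal sketch. It assumes regions and peaks are held as `(chrom, start, end)` tuples in 0-based, half-open BED coordinates; the function names `overlaps` and `split_by_peak_overlap` are hypothetical and are not part of REPTILE or BEDTools. The study itself may have performed this step with different tooling.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def split_by_peak_overlap(regions, peaks):
    """Split candidate regions by whether they overlap any H3K27ac peak.

    Both arguments are lists of (chrom, start, end) tuples in BED-style
    0-based, half-open coordinates (an assumption of this sketch).
    Returns (active, non_active), each preserving the input order.
    """
    # Group peaks by chromosome so each region is only compared
    # against peaks on its own chromosome.
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))

    active, non_active = [], []
    for chrom, start, end in regions:
        hit = any(
            overlaps(start, end, p_start, p_end)
            for p_start, p_end in peaks_by_chrom.get(chrom, [])
        )
        (active if hit else non_active).append((chrom, start, end))
    return active, non_active


# Toy coordinates for illustration only; these are not data from the study.
regions = [("chr1", 100, 600), ("chr1", 1000, 1500), ("chr2", 0, 500)]
peaks = [("chr1", 550, 700), ("chr2", 500, 900)]
active, non_active = split_by_peak_overlap(regions, peaks)
```

The linear scan per region keeps the sketch short; for genome-scale inputs an interval-aware tool such as BEDTools would be the more practical choice.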
The shortest sample, and therefore the fixed sequence length used for each dataset, was 500 nt for Day 26 and Day 39 and 101 nt for Day 0.
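The fixed-length windowing can be sketched as follows, again assuming `(chrom, start, end)` tuples in 0-based, half-open coordinates. The function name `fixed_length_windows` is hypothetical. The handling of odd lengths (such as 101 nt) is also an assumption of this sketch: it places `N // 2` bases upstream of the center and the remaining bases from the center onward, which may differ from what the authors did.

```python
def fixed_length_windows(peaks, length=None):
    """Return one fixed-length window centered on each peak.

    peaks:  list of (chrom, start, end) tuples, 0-based half-open.
    length: window length N; if None, the shortest peak length is used.

    When N is derived from the data, every window lies inside its peak.
    An explicitly passed N larger than a peak can extend past the peak
    (and past the chromosome start); this sketch does not clip it.
    """
    if length is None:
        length = min(end - start for _, start, end in peaks)
    half = length // 2  # bases taken upstream of the center

    windows = []
    for chrom, start, end in peaks:
        center = (start + end) // 2
        win_start = center - half
        windows.append((chrom, win_start, win_start + length))
    return windows


# Toy coordinates for illustration only; these are not data from the study.
toy_peaks = [("chr1", 100, 700), ("chr1", 1000, 1500), ("chr2", 50, 800)]
windows_min = fixed_length_windows(toy_peaks)        # N = 500 (shortest peak)
windows_101 = fixed_length_windows(toy_peaks, 101)   # odd N, as for Day 0
```

Windows in this form can be written out as BED lines and passed to `bedtools getfasta` together with a reference genome FASTA to obtain the sequences.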