Transition of histone marks near subcompartment boundaries. Epigenomic marks can serve as indicators of the overall accuracy of predicted annotations, even though they are not perfectly predictive of subcompartment state. We compiled the ChIP-seq fold change of histone marks in genomic regions within 400 kb of subcompartment boundaries, defined as the nucleotide positions where the subcompartment annotations of adjacent 100 kb chromatin regions differ.
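The boundary definition above can be illustrated with a minimal sketch. It assumes one chromosome's annotations are given as a list of labels, one per consecutive 100 kb bin, with no missing bins. The function names `find_boundaries` and `flank_window` and the example labels are hypothetical and are not taken from the SNIPER code.

```python
from typing import List, Optional, Tuple

BIN_SIZE = 100_000  # 100 kb annotation resolution (from the text)
FLANK = 400_000     # 400 kb on each side of a boundary (from the text)


def find_boundaries(annotations: List[str],
                    bin_size: int = BIN_SIZE) -> List[Tuple[int, str, str]]:
    """Return (position, left_label, right_label) wherever adjacent bins differ.

    Assumption: annotations[k] covers [k * bin_size, (k + 1) * bin_size),
    so the boundary between bins k-1 and k sits at k * bin_size.
    """
    boundaries = []
    for k in range(1, len(annotations)):
        left, right = annotations[k - 1], annotations[k]
        if left != right:
            boundaries.append((k * bin_size, left, right))
    return boundaries


def flank_window(position: int,
                 flank: int = FLANK,
                 chrom_length: Optional[int] = None) -> Tuple[int, int]:
    """Return the (start, end) window around a boundary, clipped to the chromosome."""
    start = max(0, position - flank)
    end = position + flank
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


# Hypothetical annotations for seven consecutive 100 kb bins.
example = ["A1", "A1", "A2", "A2", "B1", "B1", "B1"]
boundaries = find_boundaries(example)
windows = [flank_window(pos) for pos, _, _ in boundaries]
```

Histone mark fold-change signal would then be aggregated over each window; how the signal is binned and averaged is not specified here and is left out of the sketch.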
Conserved and dynamic subcompartment annotations across multiple cell types. In this work, we applied SNIPER to nine cell lines (GM12878, K562, IMR90, HeLa, HUVEC, HMEC, HSPC, T cells, and HAP1) to determine regions with more conserved or more dynamic subcompartment annotations across multiple cell types. We divide subcompartment annotations into thirteen conservation states based on the entropy of the cross-cell-type annotations of each 100 kb region, as follows:

$$S_i = -\sum_{c=1}^{C} p_{i,c} \log p_{i,c} \tag{10}$$

$$p_{i,c} = \frac{\sum_{j=1}^{N} \delta(a_{i,j}, c)}{N} \tag{11}$$

where $S_i$ is the total entropy of the subcompartment annotations of region $i$, summed over all $C$ subcompartments. The fraction of subcompartment $c$ at region $i$, $p_{i,c}$, is computed by counting the number of occurrences of subcompartment $c$ over all $N$ cell types, $\sum_{j=1}^{N} \delta(a_{i,j}, c)$, and dividing by the total number of cell types $N$. Here $\delta(a_{i,j}, c) = 1$ if the annotation $a_{i,j}$ of cell type $j$ at region $i$ is equal to $c$, and 0 otherwise.
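As an illustration of the entropy computation, the following sketch evaluates the per-region fractions and entropy for a list of annotations, one per cell type. It assumes five subcompartment labels (A1, A2, B1, B2, B3) and the natural logarithm; neither the label names nor the log base is stated in this section, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence

# Assumed label set (C = 5); not given explicitly in this section.
SUBCOMPARTMENTS = ("A1", "A2", "B1", "B2", "B3")


def fractions(labels: Sequence[str]) -> Dict[str, float]:
    """Fraction p_{i,c} of cell types annotated with each subcompartment c."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts.get(c, 0) / n for c in SUBCOMPARTMENTS}


def entropy(labels: Sequence[str]) -> float:
    """Entropy S_i of one region; terms with p = 0 contribute 0 by convention."""
    return -sum(p * math.log(p) for p in fractions(labels).values() if p > 0)


# Hypothetical region: identical in all nine cell types vs. split 5/4.
fully_conserved = ["A1"] * 9
split_region = ["A1"] * 5 + ["B1"] * 4
s_conserved = entropy(fully_conserved)  # zero entropy: every cell type agrees
s_split = entropy(split_region)         # positive entropy: annotations disagree
```

A region with the same annotation in every cell type has zero entropy, and the entropy grows as the annotations spread over more subcompartments.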
Because annotations are discrete, Eqs. 10 and 11 yield 23 possible entropy values, each corresponding to a unique distribution of annotations across cell types. Of these 23 states, 11 are associated with fewer than 5 out of 9 cell types sharing the same subcompartment annotation. These 11 states without a majority subcompartment are merged into a single non-conserved (NC) state. We sort the remaining 12 states in order of entropy, with the lowest-entropy state 1 denoting the regions most conserved across cell types and the higher-numbered states denoting less conserved and more dynamic regions. Together with NC, this gives the thirteen conservation states.
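The count of possible states can be checked by enumeration. The sketch below assumes nine cell types and five subcompartments, lists every way of splitting the nine annotations into at most five groups, and ranks the distributions that have a majority subcompartment by entropy. The helper names are hypothetical and the natural logarithm is an assumption.

```python
import math
from typing import Iterator, Optional, Tuple

N_CELL_TYPES = 9       # from the text
N_SUBCOMPARTMENTS = 5  # assumed number of subcompartments
MAJORITY = 5           # at least 5 of 9 cell types share one annotation


def partitions(total: int, max_parts: int,
               largest: Optional[int] = None) -> Iterator[Tuple[int, ...]]:
    """Yield non-increasing tuples of positive ints that sum to `total`,
    using at most `max_parts` parts."""
    if largest is None:
        largest = total
    if total == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(total, largest), 0, -1):
        for rest in partitions(total - first, max_parts - 1, first):
            yield (first,) + rest


def entropy_of_counts(counts: Tuple[int, ...]) -> float:
    """Entropy of a distribution given as per-subcompartment counts."""
    n = sum(counts)
    return -sum((k / n) * math.log(k / n) for k in counts)


all_states = list(partitions(N_CELL_TYPES, N_SUBCOMPARTMENTS))
# Distributions whose largest group is below the majority threshold (merged into NC).
no_majority = [s for s in all_states if s[0] < MAJORITY]
# Distributions with a majority subcompartment, most conserved first.
conserved = sorted((s for s in all_states if s[0] >= MAJORITY),
                   key=entropy_of_counts)
```

Under these assumptions the enumeration gives 23 distributions with distinct entropies, 11 of which lack a majority subcompartment; the most conserved distribution is (9,) and the least conserved one that still has a majority is (5, 1, 1, 1, 1).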
To represent subcompartment conservation and dynamics, we computed the information content of each 100 kb region. Information content is computed similarly to entropy, but the subcompartment-specific fractions are normalized by a background probability within the logarithm term:

$$IC_{i,c} = p_{i,c} \log \frac{p_{i,c}}{q_c}$$

where $IC_{i,c}$ is the information content of subcompartment $c$ at region $i$, $p_{i,c}$ is computed as in Eq. 11, and $q_c = 0.2$ is the background probability of each subcompartment, assuming a uniform distribution over subcompartments. High information content corresponds to regions with more conserved annotations, while low information content corresponds to more dynamic regions across cell types.
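A minimal sketch of the information content calculation follows. It assumes the same five labels (A1, A2, B1, B2, B3), the natural logarithm, and the convention that a subcompartment with zero fraction contributes zero; the log base and the zero-fraction convention are not stated in this section, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence

# Assumed label set; q_c = 0.2 corresponds to a uniform background over five labels.
SUBCOMPARTMENTS = ("A1", "A2", "B1", "B2", "B3")
BACKGROUND = 0.2


def information_content(labels: Sequence[str],
                        background: float = BACKGROUND) -> Dict[str, float]:
    """Per-subcompartment IC_{i,c} = p * log(p / q) for one region.

    Assumption: a subcompartment absent from the region (p = 0) contributes 0.
    """
    counts = Counter(labels)
    n = len(labels)
    ic = {}
    for c in SUBCOMPARTMENTS:
        p = counts.get(c, 0) / n
        ic[c] = p * math.log(p / background) if p > 0 else 0.0
    return ic


# Hypothetical regions annotated in nine cell types.
conserved_ic = information_content(["A1"] * 9)
dynamic_ic = information_content(["A1", "A1", "A2", "A2", "B1", "B1", "B2", "B2", "B3"])
total_conserved = sum(conserved_ic.values())
total_dynamic = sum(dynamic_ic.values())
```

Summed over subcompartments, the information content is largest when all cell types share one annotation and approaches zero as the annotations approach the uniform background.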