Predicting Tissue-Specific Enhancer Activity

We built a deep convolutional neural network, referred to below as the deep learning model (DLM), to predict tissue-specific enhancer activity directly from the enhancer DNA sequence. The DLM comprises five convolutional layers with 320, 320, 240, 240, and 480 kernels, respectively (table S14). Higher convolutional layers receive input from larger genomic ranges and can represent more complex patterns than lower layers. The convolutional layers are followed by a fully connected layer with 180 neurons that integrates information across the full 1000-bp sequence. In total, the DLM has 3,631,401 trainable parameters. We implemented the model with the Python library Keras, version 2.4.0 (https://github.com/keras-team/keras).
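The statement that higher convolutional layers see larger genomic ranges can be illustrated with a receptive-field calculation. The sketch below is only an illustration: the kernel widths and pooling sizes are hypothetical placeholders (the actual values are in table S14), and the function name is ours, not the authors'.

```python
def receptive_fields(layers):
    """Receptive field (in bp) after each (kernel_width, pool_size) layer.

    Assumes stride-1 convolutions and non-overlapping pooling
    (pool stride equal to pool size). pool_size=1 means no pooling.
    """
    field, jump = 1, 1  # jump = distance in bp between adjacent positions
    result = []
    for kernel_width, pool_size in layers:
        field += (kernel_width - 1) * jump  # convolution widens the field
        field += (pool_size - 1) * jump     # so does the pooling window
        jump *= pool_size                   # pooled positions sit further apart
        result.append(field)
    return result


# HYPOTHETICAL kernel widths and pool sizes, chosen only for illustration.
HYPOTHETICAL_LAYERS = [(8, 1), (8, 4), (8, 1), (8, 4), (8, 1)]
fields = receptive_fields(HYPOTHETICAL_LAYERS)
print(fields)
```

With these placeholder values, each successive layer covers a wider stretch of the input sequence, which is the property the paragraph describes.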
The model was trained for each of the four temporal-spatial groups of enhancers (CS16, CS23, F2F, and F2O). The positive set for each group contains the human embryonic enhancers of that group. The negative training set consists of DNase I hypersensitive site (DHS) profiles of non-CNS-related and nonembryonic tissues from the Roadmap Epigenomics project (55) that do not overlap the positive sets. We used DHS sites not overlapping embryonic neocortex H3K27ac peaks as negative control regions because we aim to identify tissue-specific enhancers of the embryonic neocortex, and DHS is a good representation of active chromatin. Because DHS generally overlaps H3K27ac, it is a stringent control. Our choice is analogous to DeepSEA, whose negative set comprises genomic regions that do not overlap the positive set and have at least one TF binding event; such regions broadly overlap DHS regions.
Training and testing sets were split by chromosome. Chromosomes 8 and 9 were held out from training and used to test prediction performance. Chromosome 6 was used as the validation set, and the rest of the autosomes were used for training. Each training sample consists of a 1000-bp sequence (and its reverse complement) from the human GRCh37 (hg19) reference genome. A larger DLM score for a genomic sequence corresponds to a higher propensity to be an active enhancer. Genomic sequences with a DLM score ≥ 0.197 (FPR ≤ 0.1) are predicted to be active enhancers. We used the difference in DLM score induced by a human-macaque single-nucleotide mutation to estimate the mutation's impact on enhancer activity.
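The split, the reverse-complement augmentation, and the FPR-based cutoff described above can be sketched as follows. This is a minimal illustration, not the authors' code: all function names are hypothetical, the treatment of non-autosomal chromosomes (returned as None) is our assumption, and the threshold routine is one simple way to pick a cutoff from negative-set scores.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}
TEST_CHROMS = {"chr8", "chr9"}
VALIDATION_CHROMS = {"chr6"}


def reverse_complement(seq):
    """Reverse complement of a DNA string; other symbols (e.g. N) are kept."""
    return seq.translate(COMPLEMENT)[::-1]


def assign_split(chrom):
    """Map a chromosome name to 'test', 'validation', or 'train'.

    ASSUMPTION: non-autosomal chromosomes are left out (None).
    """
    if chrom in TEST_CHROMS:
        return "test"
    if chrom in VALIDATION_CHROMS:
        return "validation"
    return "train" if chrom in AUTOSOMES else None


def threshold_at_fpr(negative_scores, max_fpr=0.1):
    """Lowest cutoff t such that the share of negatives scoring >= t
    does not exceed max_fpr. Returns None if no cutoff qualifies."""
    n = len(negative_scores)
    best = None
    for t in sorted(set(negative_scores), reverse=True):
        if sum(s >= t for s in negative_scores) / n > max_fpr:
            break
        best = t
    return best


print(reverse_complement("ACGTN"), assign_split("chr8"))
```

A sequence would then be called an active enhancer when its score is at least the returned cutoff, which mirrors the "DLM score ≥ 0.197 (FPR ≤ 0.1)" rule in the text.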
Given a human (hg19) or macaque (rheMac2) enhancer, we used liftOver (56) to identify its ortholog. Only reciprocal counterparts whose lengths differ by no more than 50 bp were considered ortholog pairs. For a human sequence with n mutations relative to its macaque ortholog, we scored the impact of combinations of m (m < n) mutations on enhancer activity as follows. If the total number of combinations (n choose m) is no more than 10,000, all possible combinations of m human alleles at the human-macaque mutation sites were introduced into the macaque ortholog; otherwise, we randomly sampled 10,000 combinations of m human alleles from the human-macaque mutation sites and introduced them into the macaque ortholog. The change in DLM score caused by each set of introduced human mutations was used to estimate its impact on enhancer activity.
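The enumeration-or-sampling rule above can be sketched in a few lines. This is an illustration under stated assumptions: mutation sites are single-nucleotide substitutions given as (position, human_allele) pairs in macaque coordinates, sampled combinations are kept distinct (the text does not say whether repeats were allowed), and score_fn is a hypothetical stand-in for the trained DLM.

```python
import itertools
import math
import random

MAX_COMBINATIONS = 10_000


def mutation_subsets(n_sites, m, rng=None, limit=MAX_COMBINATIONS):
    """All size-m subsets of site indices if there are at most `limit`
    of them; otherwise `limit` distinct random subsets."""
    rng = rng or random.Random(0)
    if math.comb(n_sites, m) <= limit:
        return list(itertools.combinations(range(n_sites), m))
    chosen = set()
    while len(chosen) < limit:
        chosen.add(tuple(sorted(rng.sample(range(n_sites), m))))
    return sorted(chosen)


def introduce_alleles(macaque_seq, sites, subset):
    """Write the human alleles of the chosen sites into the macaque sequence."""
    seq = list(macaque_seq)
    for i in subset:
        position, human_allele = sites[i]
        seq[position] = human_allele
    return "".join(seq)


def score_changes(macaque_seq, sites, m, score_fn, rng=None):
    """Score change (mutated minus original) for each subset of m sites."""
    base = score_fn(macaque_seq)
    return {
        subset: score_fn(introduce_alleles(macaque_seq, sites, subset)) - base
        for subset in mutation_subsets(len(sites), m, rng)
    }


def toy_score(seq):
    """Toy scorer (fraction of G), used here only in place of the real model."""
    return seq.count("G") / len(seq)


changes = score_changes("AAAA", [(0, "G"), (2, "G"), (3, "C")], 2, toy_score)
print(changes)
```

In practice score_fn would wrap the model's prediction on the one-hot encoded sequence; the toy scorer only makes the example self-contained.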
We applied the same convolutional neural network architecture to build a HepG2 enhancer classifier, with enhancers defined as H3K27ac peaks centered on DNase peaks. We then used the HepG2 DLM to evaluate allele-specific effects on enhancer activity using raQTLs (52).

Li S, Hannenhalli S, & Ovcharenko I. (2023). De novo human brain enhancers created by single-nucleotide mutations. Science Advances, 9(7), eadd2911.

Publication 2023

Other organizations : National Institutes of Health, National Cancer Institute, Center for Cancer Research, National Center for Biotechnology Information

Variable analysis

independent variables

Enhancer DNA sequence

dependent variables

Tissue-specific enhancer activity
Change in DL (deep learning) score caused by single-nucleotide mutations

control variables

DHS (DNase I hypersensitive sites) profiles of non-CNS-related and non-embryonic tissues as the negative training set
Positive sets containing the human embryonic enhancers of each group (CS16, CS23, F2F, and F2O)
