We constructed a set of 175 published cases of human functional regulatory regions confirmed by experiments (literature set) (Supplementary Table S4). The comparison set was built via three routes:

Use of the set of Mendelian regulatory mutations in Genomiser, obtained by careful manual curation of the scientific literature (48). The set contains 453 non-coding variants that underlie Mendelian disease, along with the relevant disease-causing genes, based on OMIM information. For the analysis, we used 301 mutations annotated by this source as residing within enhancers, promoters, and 5′-UTRs, the latter including an appreciable number of suspected transcription regulatory elements (Supplementary Table S3). Redundancy was reduced by merging variants associated with the same gene and separated by ≤1 kb into a single case, yielding a total of 132 pairs of regulatory elements and genes employed in the analysis (Supplementary Table S3).

A set of 22 in vivo validated heart enhancers and target genes from the cardiac enhancer catalogue (49).

Our own literature sampling, focusing on publications that experimentally identified a human enhancer and its gene target. This effort resulted in a set of 21 curated enhancer–gene pairs. When necessary, genome coordinates were converted to hg38 using CrossMap (41) and the UCSC Genome Browser (42) chain file. All records from the curated sets are described in Supplementary Table S4.
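The redundancy reduction applied to the Genomiser variants in the first route can be illustrated with a minimal sketch. It assumes that "separated by ≤1 kb" is applied by chaining: variants of the same gene are sorted by position, and each variant joins the current case if it lies within 1 kb of the previous one. The text does not state whether chaining or another grouping rule was used, so this is one plausible reading. The function name `merge_variants`, the tuple layout, and the example coordinates are hypothetical.

```python
from collections import defaultdict

def merge_variants(variants, max_gap=1000):
    """Merge same-gene variants lying within max_gap bp of each other.

    variants: iterable of (chrom, pos, gene) tuples (hypothetical layout).
    Returns a list of (chrom, start, end, gene) cases.
    Assumption: merging chains consecutive variants (single linkage).
    """
    # Group positions by gene and chromosome so only same-gene variants merge.
    groups = defaultdict(list)
    for chrom, pos, gene in variants:
        groups[(gene, chrom)].append(pos)

    merged = []
    for (gene, chrom), positions in sorted(groups.items()):
        positions.sort()
        start = end = positions[0]
        for pos in positions[1:]:
            if pos - end <= max_gap:
                # Close enough to the previous variant: extend the current case.
                end = pos
            else:
                # Gap too large: close the current case and start a new one.
                merged.append((chrom, start, end, gene))
                start = end = pos
        merged.append((chrom, start, end, gene))
    return merged

# Made-up example coordinates, for illustration only.
example = [
    ("chr1", 100, "A"),
    ("chr1", 900, "A"),
    ("chr1", 1900, "A"),
    ("chr1", 5000, "A"),
    ("chr1", 150, "B"),
]
cases = merge_variants(example)
```

In this toy input, the first three variants of gene A collapse into one case, while the distant fourth variant and the gene B variant each remain separate cases.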

For the entire literature set, we examined: (i) whether a literature regulatory element overlaps with at least one of GeneHancer’s predicted enhancers and (ii) whether a literature target gene is identical to one of the GeneHancer targets for the overlapping enhancer. The statistical significance of the overall enhancer overlap was evaluated as described in the ‘Enhancer mining and unification’ paragraph of the Materials and methods section above.
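The two checks, (i) element overlap and (ii) target-gene identity, can be sketched as follows. This is an illustrative sketch rather than the pipeline actually used: it assumes closed intervals on a shared hg38 coordinate system, treats any shared base as an overlap, and represents each predicted enhancer as a dictionary with a set of target gene symbols. The function names, field names, and example records are hypothetical and do not reflect GeneHancer's data format.

```python
def intervals_overlap(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end

def evaluate_pair(lit, predicted):
    """Evaluate one literature element-gene pair against predicted enhancers.

    lit: dict with keys chrom, start, end, gene (hypothetical layout).
    predicted: list of dicts with keys chrom, start, end, targets (a set).
    Returns (element_overlap, gene_match):
      element_overlap: the literature element overlaps >= 1 predicted enhancer.
      gene_match: the literature gene is a target of an overlapping enhancer.
    """
    hits = [
        p for p in predicted
        if p["chrom"] == lit["chrom"]
        and intervals_overlap(lit["start"], lit["end"], p["start"], p["end"])
    ]
    element_overlap = bool(hits)
    # Gene identity is only considered for enhancers that overlap the element.
    gene_match = any(lit["gene"] in p["targets"] for p in hits)
    return element_overlap, gene_match

# Made-up records, for illustration only.
predicted = [
    {"chrom": "chr1", "start": 1000, "end": 2000, "targets": {"GENE1", "GENE2"}},
    {"chrom": "chr2", "start": 500, "end": 800, "targets": {"GENE3"}},
]
result_both = evaluate_pair(
    {"chrom": "chr1", "start": 1900, "end": 2100, "gene": "GENE1"}, predicted)
result_overlap_only = evaluate_pair(
    {"chrom": "chr1", "start": 1500, "end": 1600, "gene": "GENE3"}, predicted)
result_none = evaluate_pair(
    {"chrom": "chr3", "start": 1500, "end": 1600, "gene": "GENE1"}, predicted)
```

Because the gene check is restricted to overlapping enhancers, a literature pair cannot score a gene match without also scoring an element overlap, which mirrors the wording of criterion (ii).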