Gene–enhancer associations were generated using five methods: eQTLs (45), eRNA co-expression (22), TF co-expression, capture Hi-C (CHi-C) (30) and gene–target distance, all of which are described in the Supplementary Methods. A score, $S_{GE}$, was then calculated for each gene–enhancer link to estimate the strength of that connection. $S_{GE}$ is defined as:
$$S_{GE} = -\log_{10} p_g + S_C + c\,f$$
where $p_g$ is a combined P-value for eQTLs, eRNA co-expression and TF co-expression, computed by Fisher's combined probability test via a $\chi^2$ test statistic (46). The second term, $S_C$, is the CHi-C score as provided by the source, namely the logarithm of the ratio of observed to expected read counts (30). The third term accounts for enhancer–gene distance. Here $c$ is a normalization factor based on the average of the first two terms across all gene–enhancer connections, and $f$ is obtained from a gene–enhancer distance distribution (Supplementary Figure S8) as the fraction of enhancers in the distance bin of the specific gene–enhancer pair. Gene–enhancer distances are measured between a gene's TSS and the mid-point of an enhancer. The distribution used to compute $f$ excludes values from the CHi-C method, which lacks information in the crucial 0–20 kb range.
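The assembly of $S_{GE}$ described above can be sketched in Python. This is a minimal illustration, not the authors' code, and it rests on several assumptions the text does not state: a hypothetical 10 kb bin width (`BIN_WIDTH`), $c$ taken as the plain mean of $-\log_{10} p_g + S_C$ over all links, missing evidence contributing 0, and "excluding CHi-C values" interpreted as dropping links supported only by CHi-C (the hypothetical `chic_only` flag). All names (`Link`, `score_links`, etc.) are illustrative.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical bin width (bp); the text does not specify the binning.
BIN_WIDTH = 10_000


@dataclass
class Link:
    """One gene-enhancer link (hypothetical container)."""
    tss: int                     # gene TSS coordinate (bp)
    enh_start: int               # enhancer start (bp)
    enh_end: int                 # enhancer end (bp)
    p_g: Optional[float] = None  # combined P-value; None if no such evidence
    s_c: float = 0.0             # CHi-C log(observed/expected); 0 if absent
    chic_only: bool = False      # assumed flag: link supported only by CHi-C


def distance(link: Link) -> float:
    """Distance from the gene TSS to the enhancer mid-point."""
    return abs(link.tss - (link.enh_start + link.enh_end) / 2)


def bin_fractions(distances: Iterable[float]) -> dict:
    """Fraction of enhancers falling in each distance bin."""
    counts = Counter(int(d // BIN_WIDTH) for d in distances)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()} if total else {}


def evidence(link: Link) -> float:
    """First two terms of S_GE; missing evidence contributes 0 (assumption)."""
    t1 = -math.log10(link.p_g) if link.p_g is not None else 0.0
    return t1 + link.s_c


def score_links(links: list) -> list:
    """S_GE = -log10(p_g) + S_C + c * f for every link."""
    # Distance distribution built without CHi-C-only links.
    fracs = bin_fractions(distance(link) for link in links if not link.chic_only)
    # Assumed normalization: c = mean of the first two terms over all links.
    c = sum(evidence(link) for link in links) / len(links)
    return [evidence(link) + c * fracs.get(int(distance(link) // BIN_WIDTH), 0.0)
            for link in links]
```

For example, `score_links([Link(100_000, 104_000, 106_000, p_g=1e-4), Link(100_000, 112_000, 114_000, p_g=1e-2)])` places the two enhancers in different 10 kb bins, so each receives $f = 0.5$. Passing the distance distribution explicitly, rather than recomputing it per link, keeps $f$ consistent across all links scored together.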
Our method for computing $S_{GE}$ is largely unbiased and involves few arbitrary weighting factors: the scores for eQTLs, eRNA co-expression and CHi-C are derived from the summary statistics and significance thresholds reported in the original studies. For TF co-expression, we computed P-values as described in the Supplementary Methods ('Transcription factor co-expression analysis' paragraph). Where possible, P-values were combined in a meta-analytic fashion using the widely used Fisher's combined probability test.
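Fisher's combined probability test, as used to obtain $p_g$, can be written compactly. Under the null hypothesis, $X = -2\sum_{i=1}^{k} \ln p_i$ follows a $\chi^2$ distribution with $2k$ degrees of freedom, and for even degrees of freedom the upper tail has the closed form $e^{-X/2}\sum_{j=0}^{k-1} (X/2)^j / j!$. The stand-alone Python sketch below uses that identity, so no statistics library is needed. It is not the authors' code; `fisher_combined` and `chi2_sf_even_df` are hypothetical names, and dropping missing P-values (`None`) is an assumption about how links lacking one line of evidence are handled.

```python
import math


def chi2_sf_even_df(x: float, k: int) -> float:
    """Upper-tail probability of a chi-square variable with 2k degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j  # (x/2)^j / j!, built incrementally
        total += term
    return math.exp(-half) * total


def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square(2k) under H0.

    Missing P-values (None) are skipped (assumption); returns None if none remain.
    """
    ps = [p for p in pvalues if p is not None]
    if not ps:
        return None
    x = -2.0 * sum(math.log(p) for p in ps)
    return chi2_sf_even_df(x, len(ps))
```

With a single P-value the test returns that P-value unchanged, and two P-values of 0.1 combine to $0.01\,(1 - 2\ln 0.1) \approx 0.056$. For extremely small P-values, `math.exp` can underflow to 0, so a log-space variant (or `scipy.stats.combine_pvalues` with `method='fisher'`) would be safer before taking $-\log_{10} p_g$.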