For each predicted target, the CWCS estimated the total repression expected from multiple sites to the same miRNA. This score was calculated using the 3′-UTR profiles to weight the marginal effect of each additional site to the miRNA while also taking into account the predicted mRNA depletion resulting from any more distal sites to the same miRNA. This approach improved on the one we used previously to calculate total wContext+ scores (Nam et al., 2014), in that it did not over-estimate the aggregate effect of multiple sites in distal isoforms. For each miRNA family, 8mer, 7mer-m8, 7mer-A1, and 6mer sites were first filtered to remove overlapping sites, and for each reference 3′ UTR, nonoverlapping sites to the same miRNA were numbered from 1 to n, starting at the distal end of the 3′ UTR. For each site i, from 1 to n, the cumulative predicted repression at that site ($C_i$) was calculated as $C_i = C_{i-1} + (1 - 2^{CS_i})(AIR_i - C_{i-1})$, in which $CS_i$ and $AIR_i$ were the context++ score and AIR of site i, and the $(1 - 2^{CS_i})(AIR_i - C_{i-1})$ term predicted the marginal repression of site i, in which the predicted repression at the site ($1 - 2^{CS_i}$) was modified based on the fraction of mRNAs containing that site ($AIR_i$) as reduced by the mRNA depletion predicted to occur from the action of any more distal sites ($C_{i-1}$, assigning $C_0$ as 0). The CWCS was then calculated as $\log_2(1 - C_n)$, in which $C_n$ was the $C_i$ at the most proximal site of the reference 3′ UTR. For each reference 3′ UTR, CWCSs were calculated for each member of a miRNA family, and the score from the member with the greatest predicted repression was chosen to represent that family. The reference 3′ UTR with the most 3P-seq tags was chosen to represent the gene.
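The recurrence above can be illustrated with a minimal Python sketch. It is not the TargetScan implementation: the function names `cwcs` and `family_cwcs`, the input layout (a list of (context++ score, AIR) pairs ordered from the most distal to the most proximal site), and the assumption that overlapping sites have already been filtered out are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical input layout: one (context++ score, AIR) pair per site,
# ordered from the most distal site (i = 1) to the most proximal (i = n).
Site = Tuple[float, float]


def cwcs(sites: Sequence[Site]) -> float:
    """Cumulative weighted context++ score for one miRNA on one 3' UTR.

    Assumes overlapping sites were removed beforehand.
    """
    c = 0.0  # C_0 = 0: no depletion before the first (most distal) site
    for cs, air in sites:
        # Marginal repression of this site: (1 - 2^CS_i) * (AIR_i - C_{i-1})
        c = c + (1.0 - 2.0 ** cs) * (air - c)
    # CWCS = log2(1 - C_n), with C_n taken at the most proximal site
    return math.log2(1.0 - c)


def family_cwcs(members: Dict[str, Sequence[Site]]) -> float:
    """Pick the family member with the greatest predicted repression,
    i.e. the most negative CWCS (hypothetical helper)."""
    return min(cwcs(sites) for sites in members.values())
```

Under these assumptions, a single site present in every isoform (AIR = 1) returns its own context++ score, and two such sites combine additively in log2 space, which is the behavior expected when every transcript carries both sites. A distal site with AIR below 1 contributes proportionally less.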
When scoring features that can vary with 3′-UTR length (Min_dist, Len_3UTR, and Off6m), a weighted score was used that accounted for the abundance of each 3′-UTR tandem isoform in which the site existed, as estimated from a compendium of 3P-seq datasets from the same species (Nam et al., 2014). Although 6mer sites are used to calculate cumulative weighted context++ scores, and 6mer sites are tallied in the tables, the locations of these 6mer sites are not displayed, and targets with only 6mer sites are not listed. When calculating PCT scores, the most abundant 3′-UTR isoform as defined by 3P-seq was used to determine the conservation bin to which the 3′ UTR belonged. Sites corresponding to poorly conserved and mammalian-conserved miRNA seed families or sites overlapping annotated ORF regions were assigned PCT scores of zero. For TargetScanFish, genome-wide alignments of zebrafish 3′ UTRs were not of sufficient quality to compute PCT scores, so a PCT value of zero was assigned to all sites when computing context++ scores. All PCT parameters and parameters for tree branch lengths and regression models, along with pre-computed context++ scores for human, mouse, zebrafish, and other vertebrate species, are available for download (