When partitioning miRNA families according to their conservation level, we began with a high-confidence set of human miRNAs supported by small-RNA sequencing (T Tuschl, personal communication) that shared nucleotides 2–8 with a mouse miRNA supported by small-RNA sequencing (Chiang et al., 2010). We then extracted 100-way multiz alignments of each mature miRNA from the UCSC Genome Browser and counted the number of species for which nucleotides 2–8 of the miRNA did not change. As an initial pass, those conserved among ≥40 species were classified as mammalian conserved, and those conserved among >60 species were classified as more broadly conserved among vertebrate species. Due to poorer quality alignments for more distantly related species, this procedure misclassified several more broadly conserved miRNAs as mammalian conserved. Therefore, mammalian conserved miRNAs that aligned with >90% homology to a mature miRNA from chicken, frog, or zebrafish, as annotated in miRBase release 21 (Kozomara and Griffiths-Jones, 2014), were re-classified as more broadly conserved. In addition, miR-489 was included in the broadly conserved set of TargetScanHuman (but not TargetScanMouse) despite having a seed substitution in mouse.
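The initial classification pass described above can be illustrated with a minimal sketch. It assumes that each species' aligned mature sequence is available as an ungapped string, that the human sequence counts as one of the species, and that the ">90% homology" criterion is supplied as a precomputed fractional identity; the function names, the dictionary layout, and the "poorly conserved" fallback label for miRNAs below both thresholds are illustrative choices, not the authors' implementation.

```python
from typing import Dict

def seed(seq: str) -> str:
    """Return nucleotides 2-8 (1-based) of a mature miRNA sequence."""
    return seq[1:8]

def count_seed_conserved(human_mature: str, aligned: Dict[str, str]) -> int:
    """Count species whose aligned sequence keeps the human seed unchanged.

    `aligned` maps a species label to its aligned mature sequence
    (hypothetical layout; gapped or truncated rows simply fail to match).
    """
    ref = seed(human_mature)
    return sum(1 for s in aligned.values() if seed(s) == ref)

def classify(n_species: int, max_identity_to_nonmammal: float = 0.0) -> str:
    """Apply the species-count thresholds, then the re-classification rule.

    `max_identity_to_nonmammal` is the best fractional identity to a mature
    miRNA from chicken, frog, or zebrafish (assumed to be computed elsewhere).
    """
    if n_species > 60:
        return "broadly conserved"
    if n_species >= 40:
        # Rescue miRNAs under-counted because of poor distant alignments.
        if max_identity_to_nonmammal > 0.90:
            return "broadly conserved"
        return "mammalian conserved"
    return "poorly conserved"
```

For example, a miRNA whose seed is unchanged in 45 species would be labeled mammalian conserved by this sketch unless its best identity to a chicken, frog, or zebrafish miRNA exceeds 0.90.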
Some mammalian pri-miRNAs give rise to two or three abundant miRNA isoforms that have different seeds, either because both strands of the miRNA duplex load into Argonaute with near-equal efficiencies or because processing heterogeneity gives rise to alternative 5′ termini (Azuma-Mukai et al., 2008; Morin et al., 2008; Wu et al., 2009; Chiang et al., 2010). To annotate these abundant alternative isoforms, we identified all isoforms expressed at ≥33% of the level of the most abundant isoform, as determined from high-throughput sequencing (allowing for 3′ heterogeneity within each isoform). These isoforms were carried forward as mammalian conserved isoforms if they also satisfied this property in the mouse small-RNA sequencing data (Chiang et al., 2010), and as broadly conserved isoforms if they satisfied this property in zebrafish small-RNA sequencing data available in miRBase release 21. Adhering to the miRNA naming convention, if two isoforms mapped to the 5′ and 3′ arms of the hairpin they were named ‘–5p’ and ‘–3p’, respectively, and if two isoforms were processed from the same arm they were named ‘.1’ and ‘.2’ in decreasing order of their abundance, as detected in human.
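The isoform selection and naming rules above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: reads are given as hypothetical `(arm, start, end, count)` tuples, 3′ heterogeneity is handled by summing reads that share an arm and 5′ start, and combining an arm suffix with a rank suffix when both rules apply is an assumption of this sketch. The cross-species check against mouse or zebrafish data is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Isoform = Tuple[str, int]  # (arm, 5' start); arm is "5p" or "3p"

def collapse_by_five_prime(
    reads: Iterable[Tuple[str, int, int, int]]
) -> Dict[Isoform, int]:
    """Sum read counts over 3' ends for each (arm, 5' start) isoform."""
    totals: Dict[Isoform, int] = defaultdict(int)
    for arm, start, _end, count in reads:
        totals[(arm, start)] += count
    return dict(totals)

def abundant_isoforms(totals: Dict[Isoform, int], min_percent: int = 33) -> List[Isoform]:
    """Keep isoforms at >= min_percent of the most abundant one, most abundant first."""
    if not totals:
        return []
    top = max(totals.values())
    # Integer arithmetic avoids floating-point issues at the threshold.
    kept = [k for k, c in totals.items() if c * 100 >= min_percent * top]
    return sorted(kept, key=lambda k: -totals[k])

def name_isoforms(base: str, kept: List[Isoform], totals: Dict[Isoform, int]) -> Dict[Isoform, str]:
    """Add '-5p'/'-3p' when both arms are kept and '.1'/'.2' within an arm."""
    arms = {arm for arm, _ in kept}
    names: Dict[Isoform, str] = {}
    for arm in arms:
        same_arm = sorted((k for k in kept if k[0] == arm), key=lambda k: -totals[k])
        for rank, key in enumerate(same_arm, start=1):
            name = base
            if len(arms) > 1:
                name += f"-{arm}"
            if len(same_arm) > 1:
                name += f".{rank}"
            names[key] = name
    return names
```

With this layout, two 3′-arm isoforms that differ by one nucleotide at the 5′ end would be kept only if the less abundant one reaches 33% of the more abundant one, and would then receive the '.1' and '.2' suffixes in order of abundance.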
All mature miRNAs were downloaded from miRBase release 21 (Kozomara and Griffiths-Jones, 2014). Those that matched a conserved miRNA at nucleotides 2–8 were considered part of that miRNA family. All miRNAs and miRNA isoforms annotated in miRBase but not meeting our criteria for conservation in mammals or beyond were also grouped into families based on the identity of nucleotides 2–8 and were classified as poorly conserved miRNAs (which included many small RNAs misclassified as miRNAs). The miRNA seed families and associated conservation classifications are available for download at TargetScan (targetscan.org).
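Grouping mature miRNAs into seed families, as described above, amounts to keying each sequence by nucleotides 2–8. The sketch below assumes the mature sequences have already been parsed into a name-to-sequence dictionary and that the conservation class of each conserved seed is known; the function names, the input layout, and the toy sequences in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict

def seed(seq: str) -> str:
    """Return nucleotides 2-8 (1-based), normalized to upper-case RNA."""
    return seq.upper().replace("T", "U")[1:8]

def group_into_families(
    mature: Dict[str, str],
    conserved_seeds: Dict[str, str],
) -> Dict[str, Dict[str, object]]:
    """Group miRNAs by seed and attach a conservation class to each family.

    `mature` maps miRNA name -> mature sequence; `conserved_seeds` maps
    seed -> class for families already classified as conserved. Seeds absent
    from `conserved_seeds` are labeled "poorly conserved".
    """
    members = defaultdict(list)
    for name, seq in mature.items():
        members[seed(seq)].append(name)
    return {
        s: {
            "members": sorted(names),
            "conservation": conserved_seeds.get(s, "poorly conserved"),
        }
        for s, names in members.items()
    }

# Usage with toy sequences (not real annotations):
toy = {
    "miR-a": "UGAGGUAGUAGGUUGUAUAGUU",
    "miR-b": "UGAGGUAGUAGGUUGUGUGGUU",
    "miR-c": "UCCAAUGUAAAGAAGUAUGUAU",
}
families = group_into_families(toy, {"GAGGUAG": "broadly conserved"})
```

Here `miR-a` and `miR-b` share a seed and fall into one family, while `miR-c` forms its own family with the default "poorly conserved" label.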