Annotated mitochondrial and plastid rrn5 genes (alternatively designated rrf in plastid genomes) were retrieved from GenBank (complete mitochondrial and plastid genome sections: http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=2759&opt=organelle and http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=2759&opt=plastid, version 22 July 2014). The Chondrus crispus mitochondrial rrn5 (NCBI Gene ID 7020988) was removed from the downloaded sequences, because the authors’ gene assignment has been disputed (17). Similarly, the Bryopsis hypnoides plastid rrn5 (Gene ID 8463250) was removed due to the evidently incorrect gene annotation [both Basic Local Alignment Search Tool (BLAST) and CM searches identify a different locus in the plastid genome as bona fide rrn5]. Mitochondrial and plastid gene sequences were aligned separately with MUSCLE v3.6 (18) and incorporated into the Genetic Data Environment (GDE) sequence editor (19). Multiple alignments were then inspected by eye and manually adjusted in a few regions to improve primary sequence plus secondary structure fit, the latter assisted by minimum energy secondary structure predictions with RNAalifold (20). The verified annotated sequences include 108 mtDNA-encoded and 500 ptDNA-encoded rrn5 genes (Supplementary Table S1; marked by ‘+’ in the ‘Annotation’ column). These data sets, referred to as the mt-gene test set and the pt-gene test set, were used for developing and testing CMs. For building the models, sequence alignments of test set rrn5 sequences served as input for the Cmbuild and Cmcalibrate programs of Infernal v1.1, after masking columns that are not reliably aligned (15). The Cmbuild option ‘--hand’ ensures that only the confidently aligned sequence positions are used for building mitochondrion- and plastid-specific CMs (referred to as mt-5S and pt-5S models, respectively). 
Use of the tree weighting option ‘--wgsc’ (21) increases the chance of detecting sequences in an organismal group that is less well represented in the seed alignment. With these two basic CMs, we searched for rrn5 genes in individual organelle genome sequences by employing Cmsearch with default settings, i.e. local alignment, an inclusion (significance) E-value threshold of 0.01 and a reporting E-value threshold of 10.
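The model-building and search steps described above can be sketched as command lines assembled in Python. This is an illustrative sketch, not the scripts used in this study: the file names (mt-5S.cm, mt_rrn5.sto, genome.fasta, hits.tbl) and the helper function names are hypothetical, and the alignment is assumed to be in Stockholm format with a reference (#=GC RF) annotation line marking the confidently aligned columns, as ‘--hand’ requires. The thresholds passed to Cmsearch are the defaults quoted in the text, written out explicitly.

```python
from typing import List

# Default Cmsearch thresholds as quoted in the text.
INCLUSION_E = 0.01   # hits at or below this E-value count as significant
REPORTING_E = 10.0   # hits at or below this E-value are reported at all


def cmbuild_cmd(cm_path: str, alignment_path: str) -> List[str]:
    """Build a CM from a hand-curated alignment using GSC tree weights."""
    return ["cmbuild", "--hand", "--wgsc", cm_path, alignment_path]


def cmcalibrate_cmd(cm_path: str) -> List[str]:
    """Calibrate E-value parameters of a freshly built CM."""
    return ["cmcalibrate", cm_path]


def cmsearch_cmd(cm_path: str, genome_fasta: str, tblout_path: str) -> List[str]:
    """Search one organelle genome; thresholds are spelled out for clarity."""
    return [
        "cmsearch",
        "--incE", str(INCLUSION_E),
        "-E", str(REPORTING_E),
        "--tblout", tblout_path,
        cm_path,
        genome_fasta,
    ]


def classify_hit(evalue: float) -> str:
    """Sort a hit into the two-threshold scheme described in the text."""
    if evalue <= INCLUSION_E:
        return "included"
    if evalue <= REPORTING_E:
        return "reported"
    return "discarded"


if __name__ == "__main__":
    # Hypothetical file names; print the commands instead of running them.
    for cmd in (
        cmbuild_cmd("mt-5S.cm", "mt_rrn5.sto"),
        cmcalibrate_cmd("mt-5S.cm"),
        cmsearch_cmd("mt-5S.cm", "genome.fasta", "hits.tbl"),
    ):
        print(" ".join(cmd))
```

With Infernal installed, each list could be passed to subprocess.run(cmd, check=True). Hits classified as "reported" but not "included" fall between the two thresholds and would be candidates for the kind of manual validation described in the following paragraph.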
Organelle rrn5 sequences discovered and validated in the course of our analyses (see below) were included in an additional CM (mtAT-5S), which is based on a wide taxonomic sampling and focuses on derived and A+T-rich 5S rRNAs that are less effectively identified with the basic mt-5S model. A fourth model (mtPerm-5S) has been developed based on the permuted 5S rRNAs encoded by mtDNAs from brown algae and potentially several other stramenopiles. All models will be made available (together with the seed sequence alignments) in the Rfam database. They will also be included in our automated organelle genome annotation tool MFannot (http://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl).