As a starting point for comparative genome analyses, we integrated the predicted trout genes into vertebrate gene families based on Ensembl version 66 (February 2012)50 (link). The 46,585 predicted trout proteins were compared against 13,264 gene families from 14 representative vertebrate species comprising mammals, birds and fish (Supplementary Fig. 6). Trout genes were included in 8,739 vertebrate gene trees (Supplementary Table 7). By comparison, genes from the other vertebrate genomes are included in 7,131 (takifugu) to 9,453 (human) gene families, suggesting that annotated trout genes cover the vast majority of vertebrate gene families. A dedicated Genomicus server (http://www.genomicus.biologie.ens.fr/genomicus-trout-01.01/) provides access to trout genes and their phylogenetic trees, as well as syntenic relationships with other genomes (Supplementary Fig. 7).
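The gene family counts quoted above can be put on a common scale with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the 13,264 families are the shared denominator for every species, which the text implies but does not state explicitly, and the variable names are our own.

```python
# Gene family counts as reported in the text.
families_total = 13_264
families_with_genes = {
    "takifugu": 7_131,  # lowest count reported among the other vertebrates
    "trout": 8_739,
    "human": 9_453,     # highest count reported among the other vertebrates
}

# Fraction of vertebrate gene families containing at least one gene per species
# (assumption: the same 13,264-family denominator applies to all species).
coverage = {sp: n / families_total for sp, n in families_with_genes.items()}

for species, fraction in coverage.items():
    print(f"{species}: {fraction:.1%}")
```

Under this assumption, trout coverage falls between the lowest and highest values reported for the other vertebrate genomes.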
DCS blocks are defined as runs of genes in a non-salmonid (that is, non-duplicated by the Ss4R event) genome that are distributed on two different chromosomes (or non-anchored scaffolds) in the rainbow trout genome; the exact gene order does not need to be conserved. We systematically compared the gene locations in rainbow trout with those of medaka, stickleback, tetraodon and takifugu using ad hoc scripts to identify pairs of regions in the rainbow trout genome that are syntenic with single regions in non-salmonid species, and that correspond to DCS blocks. Pairs of paralogous trout genes on two different chromosomes (or non-anchored scaffolds) that belong to a DCS block are most likely duplicates originating from the Ss4R WGD event and are called ohnologues; there were 6,733 pairs of ohnologues. Genes that are inserted in a DCS block based on synteny with a non-salmonid species, but have no paralogous gene on the other chromosome or scaffold, are most likely former Ss4R duplicates in which one of the duplicated genes was lost, and are called singletons.
Each pair of duplicated regions within a DCS block is descended from a single ancestral region in the pre-duplication genome. The organization of these ancestral regions into an ancestral chromosome was deduced from the synteny relationships with non-salmonid genomes using a clustering method implemented in Walktrap51. The Ts3R-duplicated regions in the ancestral karyotype were obtained by orthology with the Ts3R-duplicated regions in the medaka genome, which were themselves deduced from the DCS blocks between the medaka and chicken genomes obtained as described above.
DCS blocks can be very short, as they are dependent on assembly continuity and scaffold anchoring. Fine-scale analysis of duplicated regions and genes was restricted to 915 scaffolds that could be paired into 569 DCS blocks for at least part of their lengths, and that share at least 4 ohnologous genes. The longest scaffold in these DCS blocks is 5,466,130 bp long and the shortest is 25,207 bp long. These 915 scaffolds contain a total of 171 miRNAs and 13,352 genes (29% of the trout genome), of which 8,624 are ohnologues and 4,728 are singletons. These scaffolds were aligned using LastZ52, resulting in 85,050 local alignments with a mean identity of 86.7%.
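The definitions of DCS blocks, ohnologues and singletons given above can be illustrated with a minimal sketch. This is not the ad hoc script used in the study: the function name `classify_dcs_block`, the input layout and the toy gene identifiers are all hypothetical. The sketch assumes that orthology between a reference (non-salmonid) run of genes and trout genes has already been established, and it ignores gene order, as the DCS definition allows.

```python
def classify_dcs_block(ref_run, trout_orthologs):
    """Classify trout genes in a candidate DCS block (illustrative sketch).

    ref_run: list of gene ids forming a run on one reference chromosome.
    trout_orthologs: dict mapping each reference gene id to a list of
        (trout_gene_id, trout_region) tuples, where trout_region is a
        chromosome or non-anchored scaffold.
    Returns None unless the trout orthologues fall on exactly two regions.
    """
    # Collect the trout regions hit by the orthologues of this reference run.
    regions = sorted(
        {region for g in ref_run for _, region in trout_orthologs.get(g, [])}
    )
    if len(regions) != 2:
        return None  # not a double-conserved synteny block under this definition
    region_a, region_b = regions

    ohnologue_pairs, singletons = [], []
    for g in ref_run:
        # Map region -> trout gene for this reference gene.
        copies = {region: gene for gene, region in trout_orthologs.get(g, [])}
        if region_a in copies and region_b in copies:
            # A copy on each duplicated region: a putative ohnologue pair.
            ohnologue_pairs.append((copies[region_a], copies[region_b]))
        elif copies:
            # A copy on only one region: the other duplicate was presumably lost.
            singletons.append(next(iter(copies.values())))
    return {
        "regions": (region_a, region_b),
        "ohnologue_pairs": ohnologue_pairs,
        "singletons": singletons,
    }


# Toy example with hypothetical identifiers (not real data).
ref_run = ["m1", "m2", "m3", "m4"]
trout_orthologs = {
    "m1": [("t1a", "chrA"), ("t1b", "chrB")],
    "m2": [("t2a", "chrA")],
    "m3": [("t3a", "chrA"), ("t3b", "chrB")],
    "m4": [("t4b", "chrB")],
}
result = classify_dcs_block(ref_run, trout_orthologs)
print(result)
```

In this toy block, the orthologues of `m1` and `m3` are classified as ohnologue pairs, and those of `m2` and `m4` as singletons. A real implementation would also need to handle tandem duplicates, runs that touch more than two trout regions, and minimum block sizes, none of which are modelled here.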
To better understand the fate of inactivated gene copies, protein sequences predicted from a given gene model were also aligned to their paralogous region using exonerate53 (link) with the ‘--model protein2genome’ option (Supplementary Methods). Rates of gene loss since the Ts3R WGD were calculated by linear extrapolation.
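For the exonerate step described above, the alignment of a predicted protein to its paralogous genomic region could be invoked along the following lines. Only the `--model protein2genome` option is taken from the text; the `--query` and `--target` options are standard exonerate arguments, while the helper name and the file names are hypothetical placeholders. Any further parameters used in the study are described in the Supplementary Methods and are not reproduced here.

```python
import shlex


def exonerate_protein2genome_cmd(protein_fasta, region_fasta):
    """Build an exonerate command line (illustrative sketch, hypothetical helper).

    protein_fasta: FASTA file with the protein predicted from one gene model.
    region_fasta: FASTA file with the paralogous genomic region.
    """
    return [
        "exonerate",
        "--model", "protein2genome",  # option named in the text
        "--query", protein_fasta,     # protein sequence to align
        "--target", region_fasta,     # genomic region to align against
    ]


# Placeholder file names, for illustration only.
cmd = exonerate_protein2genome_cmd("gene_model_protein.fa", "paralogous_region.fa")
print(shlex.join(cmd))

# To run the alignment, assuming exonerate is installed and on the PATH:
# import subprocess
# subprocess.run(cmd, check=True, capture_output=True, text=True)
```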
Berthelot C., Brunet F., Chalopin D., Juanchich A., Bernard M., Noël B., Bento P., Da Silva C., Labadie K., Alberti A., Aury J.M., Louis A., Dehais P., Bardou P., Montfort J., Klopp C., Cabau C., Gaspin C., Thorgaard G.H., Boussaha M., Quillet E., Guyomard R., Galiana D., Bobe J., Volff J.N., Genêt C., Wincker P., Jaillon O., Crollius H.R., & Guiguen Y. (2014). The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nature Communications, 5, 3657.