Comprehensive Annotation of Rainbow Trout Genome

Repeated regions of the assembly (37.8%) were masked against: (i) a collection of 634 motifs that we characterized, using RepeatMasker (http://www.repeatmasker.org); (ii) low-complexity sequences, using DUST [33]; (iii) tandem repeats, using Tandem Repeat Finder [34]; (iv) teleost repeats from RepBase [35]; and (v) simple repeats, using RepeatMasker. In addition, we integrated repeated-motif predictions from RepeatScout [36] into the final gene prediction models (Supplementary Methods).
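Because the five masking sources overlap (for example, RepeatMasker and Tandem Repeat Finder can flag the same bases), a masked fraction such as 37.8% only makes sense if it is computed over the union of all masks rather than summed per tool. The minimal Python sketch below illustrates that union step, assuming each tool's output has already been converted to half-open (start, end) coordinates on a single sequence; the function name and toy coordinates are hypothetical and not taken from the original pipeline.

```python
from collections.abc import Iterable


def masked_fraction(intervals: Iterable[tuple[int, int]], genome_length: int) -> float:
    """Fraction of a sequence covered by the union of half-open [start, end) intervals.

    Intervals may come from several masking tools and may overlap; each
    masked base is counted once, however many tools flagged it.
    """
    covered = 0
    block_start, block_end = None, None
    for start, end in sorted(intervals):
        if block_end is None or start > block_end:
            # Disjoint from the running block: bank the old block, open a new one.
            if block_end is not None:
                covered += block_end - block_start
            block_start, block_end = start, end
        else:
            # Overlapping or abutting interval: extend the running block.
            block_end = max(block_end, end)
    if block_end is not None:
        covered += block_end - block_start
    return covered / genome_length


# Hypothetical per-tool outputs on one 1,000 bp sequence.
repeatmasker_hits = [(0, 100), (150, 200)]
dust_hits = [(90, 120)]
trf_hits = [(300, 350)]
print(masked_fraction(repeatmasker_hits + dust_hits + trf_hits, genome_length=1000))  # 0.22
```

Summing the per-tool lengths in this toy example would give 230 bp (0.23), double-counting the 10 bp where the RepeatMasker and DUST intervals overlap; merging sorted intervals avoids that.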
To refine exon/intron junction locations, 305,000 teleost protein sequences from UniProt [37] and Ensembl [38] were first aligned to the genome sequence with the BLAT algorithm [39], keeping the best match for each protein plus any other match exceeding 0.8× the best match; each matched protein was then realigned with Genewise [40] on the same trout genomic region. In total, 93% of these teleost proteins matched at 41,300 distinct genomic loci in the rainbow trout genome assembly.
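The "best match plus matches above 0.8× the best" rule lets one protein be placed at several loci, which presumably matters in a genome carrying duplicated gene copies. The Python sketch below shows that selection step only, assuming each hit has already been reduced to a single numeric score (the original text does not say which BLAT metric was compared); the tuple layout, the `select_hits` name, and the locus labels are hypothetical, and the subsequent Genewise realignment is not modelled.

```python
from collections import defaultdict


def select_hits(hits, ratio=0.8):
    """Keep, for each protein, its best hit plus any hit scoring above ratio * best.

    hits: iterable of (protein_id, locus, score) tuples (hypothetical layout).
    Returns {protein_id: [(locus, score), ...]} sorted by decreasing score.
    """
    by_protein = defaultdict(list)
    for protein_id, locus, score in hits:
        by_protein[protein_id].append((locus, score))

    selected = {}
    for protein_id, candidates in by_protein.items():
        best = max(score for _, score in candidates)
        # The best hit is always kept; others must strictly exceed ratio * best.
        kept = [(locus, score) for locus, score in candidates
                if score == best or score > ratio * best]
        selected[protein_id] = sorted(kept, key=lambda c: c[1], reverse=True)
    return selected


hits = [
    ("p1", "chr1:100", 500),
    ("p1", "chr7:900", 450),  # 450 > 0.8 * 500, kept as a secondary locus
    ("p1", "chr3:50", 300),   # 300 <= 400, discarded
    ("p2", "chr2:10", 200),
]
print(select_hits(hits))
# {'p1': [('chr1:100', 500), ('chr7:900', 450)], 'p2': [('chr2:10', 200)]}
```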
For building gene models, rainbow trout GenBank mRNA sequences were aligned to the genome assembly using BLAT [39] and est2genome [41], which mapped 93% of these 421,414 mRNA sequences. Only best matches with at least 90% nucleotide identity were kept. On average, the similarity level was 97.8%, half of these alignments provided splicing evidence, and mRNAs had an average of 2.5 exons. We also used publicly available rainbow trout Roche 454 EST sequences from SRA (accession number SRX007396), which were assembled with Newbler and aligned using BLAT and est2genome with the same settings as for the mRNAs. A total of 97% of these cDNA contigs mapped to the rainbow trout assembly at 45,600 distinct genomic loci. In addition, we generated Illumina reads from the transcriptomes of different tissues (see below), which were also used to predict exon/intron structures on the genome assembly with gMorse [42]. Using all these resources, we predicted 69,676 transcripts with an average size of 4.8 kb (median 2.1 kb) and an average of 6.7 exons (median 4). Overall, 7.7% of the assembly is covered by a transcriptional signal.
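The mRNA filtering step (best match per sequence, then a 90% identity cut-off, then a mapping rate over all input sequences) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: it assumes each alignment carries a score and a fractional identity, and it picks the best match first and applies the identity threshold afterwards, whereas filtering by identity before choosing the best match is an equally plausible reading of the text. The `filter_alignments` name and the toy records are hypothetical.

```python
def filter_alignments(alignments, n_queries, min_identity=0.90):
    """Keep the best-scoring alignment per query if it reaches min_identity.

    alignments: iterable of (query_id, target_locus, score, identity) tuples,
        with identity expressed as a fraction in [0, 1] (hypothetical layout).
    n_queries: total number of input sequences, including unaligned ones.
    Returns (kept, mapping_rate), where kept maps query_id -> (locus, identity).
    """
    best = {}
    for query_id, locus, score, identity in alignments:
        # Assumption: "best" means highest score; ties keep the first alignment seen.
        if query_id not in best or score > best[query_id][1]:
            best[query_id] = (locus, score, identity)

    kept = {q: (locus, identity)
            for q, (locus, _score, identity) in best.items()
            if identity >= min_identity}
    return kept, len(kept) / n_queries


alignments = [
    ("m1", "chr1", 950, 0.98),
    ("m1", "chr5", 700, 0.93),  # lower-scoring secondary hit, ignored
    ("m2", "chr2", 400, 0.85),  # best hit below 90% identity, rejected
    ("m3", "chr4", 800, 0.97),
]
kept, rate = filter_alignments(alignments, n_queries=4)  # "m4" never aligned
print(kept, rate)
```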
Final gene models were built using Gaze [43], yielding 55,735 gene models with an average of 6 exons per gene (median 4). At the genome level, coding bases cover 3% of the assembly. Because 3,088 exons overlapped gaps in the assembly, we inserted in-frame introns to avoid long stretches of N letters in the corresponding protein sequences. We also tagged 585 genes that still contained transposable elements despite repeated cleaning procedures. In summary, the final gene set falls into four classes of decreasing confidence: (i) 46,585 protein-coding gene models supported by protein evidence from other vertebrates (Supplementary Table 7); (ii) 6,789 genes lacking protein evidence, without any assembly gap, and with a transcriptional signal deduced from cDNA; (iii) 1,451 genes lacking protein evidence, without any assembly gap, and without a transcriptional signal deduced from cDNA; and (iv) 890 genes lacking protein evidence that overlap assembly gaps.
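The four confidence classes amount to a small decision rule over three per-gene flags. The Python sketch below encodes that rule under one labelled assumption: the text does not state whether gap overlap affects class (i), so protein evidence is assumed to take precedence over everything else, and, as the text implies for class (iv), gap overlap takes precedence over cDNA support. The enum and function names are hypothetical.

```python
from enum import Enum


class Confidence(Enum):
    """Hypothetical labels for the four classes, in decreasing confidence."""
    PROTEIN_SUPPORTED = 1     # (i) protein evidence from other vertebrates
    TRANSCRIPT_SUPPORTED = 2  # (ii) no protein evidence, no gap, cDNA signal
    UNSUPPORTED = 3           # (iii) no protein evidence, no gap, no cDNA signal
    GAP_OVERLAPPING = 4       # (iv) no protein evidence, overlaps an assembly gap


def classify(has_protein_evidence: bool, overlaps_gap: bool,
             has_cdna_signal: bool) -> Confidence:
    # Assumption: protein evidence places a gene in class (i) regardless of gaps.
    if has_protein_evidence:
        return Confidence.PROTEIN_SUPPORTED
    # Without protein evidence, gap overlap decides class (iv) before cDNA support.
    if overlaps_gap:
        return Confidence.GAP_OVERLAPPING
    if has_cdna_signal:
        return Confidence.TRANSCRIPT_SUPPORTED
    return Confidence.UNSUPPORTED


print(classify(has_protein_evidence=False, overlaps_gap=False, has_cdna_signal=True))
```

Checking protein evidence first, then gaps, then cDNA support keeps the rule a simple priority chain, so every combination of flags lands in exactly one class.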

Berthelot C., Brunet F., Chalopin D., Juanchich A., Bernard M., Noël B., Bento P., Da Silva C., Labadie K., Alberti A., Aury J.M., Louis A., Dehais P., Bardou P., Montfort J., Klopp C., Cabau C., Gaspin C., Thorgaard G.H., Boussaha M., Quillet E., Guyomard R., Galiana D., Bobe J., Volff J.N., Genêt C., Wincker P., Jaillon O., Crollius H.R., & Guiguen Y. (2014). The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nature Communications, 5, 3657.

Publication 2014

Other organizations: Genoscope, Institut de Génomique Fonctionnelle de Lyon, École Normale Supérieure de Lyon, Fish Physiology and Genomics Institute, Génétique Animale et Biologie Intégrative, Génétique Physiologie et Systèmes d'Elevage, Université d'Évry Val-d'Essonne, Génomique Métabolique du Genoscope
