Genome assembly and annotation of Labeo rohita

Nanopore sequence data were filtered to remove the control lambda-phage sequences and reads shorter than 1,000 bases using the nanopack tool suite [v1.0.1] (De Coster et al. 2018). Trimmomatic [v0.32] (Bolger et al. 2014) was used to remove adapters, trim low-quality bases, and filter out reads shorter than 85 bp. The filtered nanopore data were assembled into contigs using wtdbg2 [v2.4] (Ruan and Li 2020). The contigs were polished using two iterations of racon [v1.4.0] (Vaser et al. 2017), with the nanopore reads mapped by minimap2 [v2.17] (Li 2018). The contigs were further polished with Illumina paired-end read data using pilon [v1.23] (Walker et al. 2014), with the Illumina paired reads mapped by bwa [v0.7.10] (Li 2013). The resulting contigs were scaffolded with Bionano Solve [Solve3.4.1_09262019] using the optical mapping data generated from the Saphyr run. SALSA [v2.3] (Ghurye et al. 2019) was used to produce super-scaffolds from the Hi-C library and the Bionano-scaffolded sequences. Scaffolds larger than 10 Mb were linked and oriented using RagTag [v1.1.1] (Alonge et al. 2022) based on the Onychostoma macrolepis genome (Sun et al. 2020), the chromosome assembly most similar to L. rohita available on NCBI.
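The minimum-length criterion applied to the nanopore reads can be illustrated with a short Python sketch. This is not how the nanopack suite implements its filter; the function names (parse_fastq, filter_min_length) and the demo reads are hypothetical, and the sketch assumes well-formed four-line FASTQ records. It covers only the length cutoff, not the lambda-phage removal.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from well-formed four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def filter_min_length(records: Iterable[FastqRecord],
                      min_len: int = 1000) -> Iterator[FastqRecord]:
    """Keep reads of at least min_len bases (reads shorter than min_len are dropped)."""
    for header, seq, qual in records:
        if len(seq) >= min_len:
            yield header, seq, qual


# Hypothetical demo reads: one of 1,200 bases and one of 500 bases.
demo_lines = [
    "@read1", "A" * 1200, "+", "I" * 1200,
    "@read2", "A" * 500, "+", "I" * 500,
]
kept = [header for header, _, _ in filter_min_length(parse_fastq(demo_lines))]
print(kept)
```

With the 1,000-base cutoff stated in the text, only the first demo read is retained.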
RepeatModeler [v2.0.1] (Flynn et al. 2020) and RepeatMasker [v4.1.1] (Smit et al. 2013) were used to create a species-specific repeat database, and this database was subsequently used by RepeatMasker to mask those repeats in the genome. All available RNA-seq libraries for L. rohita (comprising brain, pituitary, gonad, liver, pooled, and whole body tissues for both sexes; Supplementary Table 1) were downloaded from NCBI and mapped to the masked genome using hisat2 [v2.1.0] (Kim et al. 2019). These alignments were used in both the mikado [v2.0rc2] (Venturini et al. 2018) and braker2 [v2.1.5] (Brůna et al. 2021) pipelines. Mikado uses putative transcripts assembled from the RNA-seq alignments by stringtie [v2.1.2] (Kovaka et al. 2019), cufflinks [v2.2.1] (Trapnell et al. 2012), and trinity [v2.11.0] (Grabherr et al. 2011), along with the junction site predictions from portcullis [v1.2.2] (Mapleson et al. 2018), the alignments of the putative transcripts with UniprotKB Swiss-Prot [v2021.03] (The UniProt Consortium 2021), and the ORFs from prodigal [v2.6.3] (Hyatt et al. 2010), to select the best representative transcript for each locus. Braker2 uses the same RNA-seq alignments and the gene predictions from GeneMark-ES [v4.61] (Borodovsky and Lomsadze 2011) to train a species-specific Augustus [v3.3.3] (Stanke et al. 2006) model. Maker2 [v2.31.10] (Holt and Yandell 2011) was used to predict genes based on the new Augustus, GeneMark, and SNAP models derived from Braker2, with the Mikado predicted transcripts as an external ab initio source, and to modify the predictions based on the available RNA and protein evidence from the Cyprinidae family in the NCBI RefSeq database. Any predicted genes with an annotation edit distance (AED) above 0.47 were removed from further analysis. The remaining genes were functionally annotated using InterProScan [v5.47-82.0] (Jones et al. 2014) and BLAST+ [v2.9.0] (Camacho et al. 2009) alignments against the UniprotKB Swiss-Prot database. BUSCO [v5.2.2] (Manni et al. 2021) was used to verify the completeness of both the genome and the annotations against the actinopterygii_odb10 database. Lastly, genes spanning large gaps or completely contained within another gene on the opposite strand were removed using a custom Perl script (https://github.com/IGBB/rohu-genome/).
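The two gene-level filters described above (the AED cutoff and the removal of genes nested within a gene on the opposite strand) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' Perl script: the Gene record, the function names, and the demo coordinates are hypothetical, and the rule that the containing gene must be strictly longer than the contained one is an added assumption so that two genes with identical spans do not remove each other. The gap-spanning filter is omitted because the text gives no gap-size threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (coordinates 1-based, inclusive)."""
    gene_id: str
    seqid: str
    start: int
    end: int
    strand: str  # '+' or '-'
    aed: float


def filter_by_aed(genes: List[Gene], max_aed: float = 0.47) -> List[Gene]:
    """Drop genes whose AED is above max_aed (0.47 in the text)."""
    return [g for g in genes if g.aed <= max_aed]


def is_contained_opposite(inner: Gene, outer: Gene) -> bool:
    """True if inner lies entirely within outer on the opposite strand.

    Assumption: outer must span strictly more bases than inner.
    """
    return (
        inner.seqid == outer.seqid
        and inner.strand != outer.strand
        and outer.start <= inner.start
        and inner.end <= outer.end
        and (outer.end - outer.start) > (inner.end - inner.start)
    )


def drop_contained_opposite_strand(genes: List[Gene]) -> List[Gene]:
    """Remove genes nested in an opposite-strand gene (simple O(n^2) scan)."""
    return [
        g for g in genes
        if not any(is_contained_opposite(g, other) for other in genes)
    ]


# Hypothetical demo annotations.
demo_genes = [
    Gene("g1", "chr1", 100, 5000, "+", 0.10),
    Gene("g2", "chr1", 1000, 2000, "-", 0.20),  # nested in g1, opposite strand
    Gene("g3", "chr1", 6000, 7000, "-", 0.60),  # AED above the cutoff
    Gene("g4", "chr2", 100, 900, "-", 0.47),    # AED equal to the cutoff, kept
]
after_aed = filter_by_aed(demo_genes)
final_genes = drop_contained_opposite_strand(after_aed)
print([g.gene_id for g in final_genes])
```

The quadratic scan is adequate for a sketch; for a full annotation, sorting genes by sequence and start coordinate or using an interval index would avoid comparing every pair.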

Arick M.A. II, Grover C.E., Hsu C.Y., Magbanua Z., Pechanova O., Miller E.R., Thrash A., Youngblood R.C., Ezzell L., Alam M.S., Benzie J.A., Hamilton M.G., Karsi A., Lawrence M.L., & Peterson D.G. (2023). A high-quality chromosome-level genome assembly of rohu carp, Labeo rohita, and its utilization in SNP-based exploration of gene flow and sex determination. G3: Genes|Genomes|Genetics, 13(3), jkad009.

Publication 2023

Corresponding Organization : Mississippi State University

Other organizations : Iowa State University, Bangladesh Agricultural University, WorldFish

Variable analysis

independent variables

Nanopore sequence data filtering
Trimmomatic processing
Genome assembly using wtdbg2
Genome polishing using Racon and Pilon
Genome scaffolding using Bionano Solve and SALSA
Genome alignment and orientation using RagTag
Repeat masking using RepeatModeler and RepeatMasker
RNA-seq data alignment using Hisat2
Gene prediction using Mikado and Braker2
Gene annotation using Maker2, InterProScan, and BLAST+

dependent variables

Genome assembly quality and completeness
Gene prediction accuracy and completeness
Functional annotation of predicted genes

control variables

Removal of control lambda-phage sequences
Filtering out sequences shorter than 1,000 bases
Filtering out reads shorter than 85 bp
Removing genes with annotation edit distance (AED) above 0.47
Verifying genome and annotation completeness using BUSCO
