The completeness of the genome sequence assembly was assessed by aligning an assembly of independently generated genomic sequences (167× coverage) from 150 individuals to the reference genome assembly using MUMmer 3.23 (http://mummer.sourceforge.net/) with default settings. The short reads from the 150 individuals were assembled using ABySS (ref. 60). In addition to genome sequence alignments, the number of genes included in the sequence assembly was used as a measure of completeness. Channel catfish genes were compared with those of 12 teleost species whose whole genomes have been sequenced (Supplementary Table 6). Protein-coding genes of these species were retrieved from Ensembl (version 78), with the exception of the genes of Cynoglossus semilaevis (Cse_v1.0), which were retrieved from NCBI. For genes with multiple splice variants, the longest variant was used, and only genes encoding proteins longer than 30 amino acids were included in the analysis. First, all proteins from channel catfish and the 12 other species were combined in an all-versus-all BLASTP comparison with a maximum e-value of 1e−5. Clusters of orthologous groups among these 13 species were identified using SiLiX (ref. 61) with a minimum identity of 30% and a minimum sequence overlap of 50%. Gene content was then compared using BLASTP with a maximum e-value of 1e−5: the predicted protein sequences of channel catfish were queried separately against the protein sequences of each of the 12 teleost species. If none of the members of an orthologous group of catfish proteins had a match in a given species, the gene was deemed present in channel catfish (‘Catfish+’) but absent from the species under comparison. Similarly, ‘Catfish−’ genes were identified through reciprocal BLASTP comparisons of the protein sequences of each of the other 12 species against channel catfish.
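The gene-content comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the BLASTP results were written in BLAST+ tabular format (`-outfmt 6`, where the 11th column is the e-value) and that the SiLiX orthologous groups have already been parsed into a dictionary mapping a group ID to its channel catfish protein IDs. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def longest_isoforms(proteins: Iterable[Tuple[str, str, str]], min_aa: int = 30) -> Dict[str, str]:
    """Keep the longest splice variant per gene, then keep only proteins longer than min_aa residues.

    proteins: (gene_id, transcript_id, protein_sequence) tuples (hypothetical input layout).
    """
    best: Dict[str, str] = {}
    for gene_id, _transcript_id, seq in proteins:
        if gene_id not in best or len(seq) > len(best[gene_id]):
            best[gene_id] = seq
    return {gene: seq for gene, seq in best.items() if len(seq) > min_aa}


def queries_with_hits(blast_lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Collect query IDs that have at least one hit with e-value <= max_evalue.

    Assumes BLAST+ tabular output (-outfmt 6): 12 tab-separated columns, e-value in column 11.
    """
    hits: Set[str] = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 12:
            continue  # skip comment lines and malformed rows
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def catfish_plus(groups: Dict[str, List[str]], hits: Set[str]) -> List[str]:
    """Return orthologous groups in which no catfish member matched the compared species."""
    return sorted(gid for gid, members in groups.items() if not any(m in hits for m in members))
```

Run once per compared species: `catfish_plus(groups, queries_with_hits(open("catfish_vs_species.tsv")))`, where the file name is hypothetical. Swapping query and database in the BLASTP search and regrouping by the other species' orthologous groups gives the reciprocal ‘Catfish−’ set in the same way.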
Because the zebrafish genome has been considered ‘complete’, a similar analysis was conducted to generate ‘Zebrafish+’ and ‘Zebrafish−’ gene sets for comparison. The accuracy of the sequence assembly was assessed by comparing SNP marker positions on the genetic map with their positions on the genome sequence scaffolds, as determined with MUMmer as described above. In addition, mate-paired BAC-end sequences (BES) that aligned within a single scaffold were used to assess assembly accuracy.
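The two accuracy checks can be sketched as follows, assuming markers have already been placed on scaffolds (for example from MUMmer alignments) and BES pairs have already been aligned. The consistency rules used here (one linkage group per scaffold, monotonic centimorgan order in either orientation, inward-facing BES pairs whose span falls within a user-supplied range) are illustrative assumptions rather than criteria stated in the source; all names and thresholds are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Marker = Tuple[str, str, int, str, float]  # (marker_id, scaffold, bp, linkage_group, cM)
BesPair = Tuple[str, int, str, int, str]   # (scaffold, pos1, strand1, pos2, strand2)


def marker_order_status(markers: List[Marker]) -> Dict[str, str]:
    """Classify each scaffold as 'consistent', 'multi_lg' or 'order_conflict'."""
    by_scaffold = defaultdict(list)
    for _marker_id, scaffold, bp, lg, cm in markers:
        by_scaffold[scaffold].append((bp, lg, cm))
    status = {}
    for scaffold, items in by_scaffold.items():
        if len({lg for _bp, lg, _cm in items}) > 1:
            status[scaffold] = "multi_lg"  # markers on one scaffold map to different linkage groups
            continue
        cms = [cm for _bp, _lg, cm in sorted(items)]  # genetic positions in physical order
        ascending = all(a <= b for a, b in zip(cms, cms[1:]))
        descending = all(a >= b for a, b in zip(cms, cms[1:]))
        # Either direction is accepted because scaffold orientation relative to the map is arbitrary.
        status[scaffold] = "consistent" if ascending or descending else "order_conflict"
    return status


def count_consistent_bes_pairs(pairs: List[BesPair], min_span: int, max_span: int) -> Tuple[int, int]:
    """Count same-scaffold BES pairs that face inward and span an expected clone length.

    The span is approximated from the two alignment start positions.
    """
    ok = 0
    for _scaffold, pos1, strand1, pos2, strand2 in pairs:
        (left_pos, left_strand), (right_pos, right_strand) = sorted([(pos1, strand1), (pos2, strand2)])
        inward = left_strand == "+" and right_strand == "-"
        if inward and min_span <= right_pos - left_pos <= max_span:
            ok += 1
    return ok, len(pairs)
```

Accepting both ascending and descending map order avoids penalising scaffolds that are simply reverse-complemented relative to the linkage map; the expected span range would have to come from the actual BAC library insert sizes.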