Whole-genome shotgun sequences were generated on an Illumina HiSeq platform from DNA libraries of the killer whale, walrus, manatee and bottlenose dolphin. The dolphin had previously been Sanger sequenced at 2× coverage, and the library and sequencing protocols have been previously described22. The dolphin assembly was produced by assembling the ~2.5× Sanger data with ~3.5× Roche 454 FLX fragment data and ~30× Illumina HiSeq data. The Sanger and 454 data were combined with the Atlas assembler; Atlas-Link23 and Atlas-GapFill24 were then used to add the Illumina data, improve the scaffolds and fill intra-scaffold gaps.
The de novo assemblies were produced using methods similar to those used in the Assemblathon II comparison. An initial assembly was generated using ALLPATHS-LG with default parameters plus MIN_CONTIG=300, using all sequence data except the 500 bp insert data. The scaffolds from this initial assembly were further extended using Atlas-Link, based on the linking information provided by the 3 kb and 8 kb libraries. Atlas-GapFill was then used to fill gaps within scaffolds by locally assembling the reads associated with each gap. For the killer whale and walrus, the reads were assembled into draft genomes with contig N50 sizes of 70.3 kb and 90.0 kb and scaffold N50 sizes of 12.7 Mb and 2.6 Mb, respectively (Supplementary Table 1). The assemblies of 2,249 Mb and 2,300 Mb cover approximately 85% and 95% of the estimated 2,373 Mb killer whale and 2,400 Mb walrus genomes, respectively. The improved dolphin assembly has a contig N50 of 11.9 kb and a scaffold N50 of 115 kb. The total assembled size of the dolphin genome is 2.33 Gb (2.55 Gb with gaps), covering ~95.3% of the genome.
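For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of how it is commonly computed: the length of the shortest contig (or scaffold) in the smallest set of longest sequences that together make up at least half of the total assembled length. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Illustrative helper (hypothetical name): sequences are sorted from
    longest to shortest and accumulated until the running total reaches
    at least half of the summed length; the length at which that happens
    is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (made-up lengths, in kb): total is 300, so half is 150.
# 80 + 70 = 150 reaches the halfway point, so the N50 is 70.
toy_contigs_kb = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs_kb))
```

Because N50 depends only on the length distribution, the same function applies to contigs and to scaffolds; only the input lengths differ.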
Sequencing and assembly of the manatee differed slightly from those of the other marine mammals: the manatee DNA was sequenced to 90× total coverage using Illumina technology, comprising 45× coverage of 180 bp fragment libraries, 42× coverage of 3 kb sheared jumping libraries, 2× coverage of 6–14 kb sheared jumping libraries and 1× coverage of Fosill jumping libraries (PMID: 22800726). The sequence was then assembled using ALLPATHS-LG (PMID: 21187386). The draft assembly is 3.10 Gb in size and is composed of 2.77 Gb of sequence plus gaps between contigs. The manatee genome assembly has a contig N50 size of 37.8 kb, a scaffold N50 size of 14.4 Mb, and quality metrics comparable to those of other Illumina genome assemblies.
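As a quick arithmetic check on the manatee figures, the short sketch below sums the per-library coverage and derives the gap content implied by the stated assembly and sequence sizes. The variable names are illustrative; the numbers are those given in the text.

```python
# Per-library sequencing coverage (fold) for the manatee, as stated in the text.
library_coverage = {
    "180 bp fragment": 45,
    "3 kb sheared jumping": 42,
    "6-14 kb sheared jumping": 2,
    "Fosill jumping": 1,
}
total_coverage = sum(library_coverage.values())  # 45 + 42 + 2 + 1 = 90

# Assembly size including gaps versus sequence only, in Gb.
assembly_gb = 3.10
sequence_gb = 2.77
gap_gb = assembly_gb - sequence_gb       # about 0.33 Gb of gaps
gap_fraction = gap_gb / assembly_gb      # roughly 11% of the assembly

print(f"total coverage: {total_coverage}x")
print(f"gaps: {gap_gb:.2f} Gb ({gap_fraction:.1%} of assembly)")
```

The library coverages sum to the reported 90× total, and the difference between the 3.10 Gb assembly and the 2.77 Gb of sequence implies about 0.33 Gb of gaps.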