NextSeq libraries were sequenced on the Illumina NextSeq 500 with a NextSeq 500/550 High Output Reagent Kit v2 (300 cycles), following standard Illumina sequencing protocols, yielding more than 600 million read pairs. Raw reads were converted from FastQ to BAM format using Picard Tools (v. 2.7.1) and SAMtools (v. 1.5),38 and duplicate reads were removed using Picard Tools (https://github.com/broadinstitute/picard). Low-quality reads were removed using the trimBWAstyle.usingBam.pl script from the Bioinformatics Core at UC Davis Genome Center (https://github.com/genome/genome/blob/master/lib/perl/Genome/Site/TGI/Hmp/HmpSraProcess/trimBWAstyle.usingBam.pl). Specifically, bases with a quality score below Q30 were trimmed, and trimmed reads shorter than 105 bp were discarded.
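The trimming rule can be illustrated with a minimal Python sketch of BWA-style 3' quality trimming, using the Q30 threshold and 105 bp minimum length stated above. The function names `bwa_trim_length` and `trim_read` are hypothetical, and the sketch assumes trimming from the 3' end only, following the running-sum rule used by BWA; the trimBWAstyle.usingBam.pl script itself may differ in detail.

```python
def bwa_trim_length(quals, threshold=30):
    """Return how many leading bases to keep after BWA-style 3' trimming.

    Walks from the 3' end, accumulating (threshold - quality). The cut
    point is where this running sum is largest; the walk stops as soon
    as the sum turns negative. Assumption: 3'-end trimming only.
    """
    running = 0
    best = 0
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


def trim_read(seq, quals, threshold=30, min_length=105):
    """Trim one read; return (seq, quals), or None if it ends up too short."""
    keep = bwa_trim_length(quals, threshold)
    if keep < min_length:
        return None  # read is discarded
    return seq[:keep], quals[:keep]


if __name__ == "__main__":
    # Hypothetical 115-base read: 110 good bases followed by 5 poor ones.
    quals = [40] * 110 + [10] * 5
    trimmed = trim_read("A" * 115, quals)
    print(len(trimmed[0]))
```

Under this rule a short dip in quality inside an otherwise good read does not trigger a cut, because the running sum recovers less than it loses only when the low-quality tail dominates.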
PanPhlAn (v. 1.2.2.2)39 was used to build a pangenome database of all freely available fish genome sequences. Host contamination was identified by aligning the quality-trimmed whole-metagenome sequencing reads to this pangenome database with Bowtie2 (v. 2.2.9).40 The resulting alignments were converted to BAM format, and host reads were removed with SAMtools before conversion to FastQ format using BEDtools.41 FastQ files were converted to FastA using the fq2fa script packaged with IDBA-UD (v. 1.1.1).42
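The text does not state which alignment flags were used to separate host from non-host reads. One common convention, sketched below as an assumption, is to keep only read pairs in which neither mate aligned to the host database. The bit values come from the SAM format specification; the function name `is_non_host` and the decision to drop secondary and supplementary records are illustrative choices, not the authors' documented settings.

```python
# Bit values defined by the SAM format specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_non_host(flag):
    """Return True if an alignment record should be kept as non-host.

    Assumption: a paired read is kept only when both the read and its
    mate failed to align to the host pangenome; a single-end read is
    kept when it is unmapped.
    """
    if flag & (SECONDARY | SUPPLEMENTARY):
        return False  # not a primary record
    if flag & PAIRED:
        return bool(flag & UNMAPPED) and bool(flag & MATE_UNMAPPED)
    return bool(flag & UNMAPPED)


if __name__ == "__main__":
    # 77 and 141 are the flags of an unaligned pair; 99 is a properly
    # paired, mapped first mate; 73 is a mapped read with an unmapped mate.
    for flag in (77, 141, 99, 73):
        print(flag, is_non_host(flag))
```

For paired reads this matches requiring both bits 0x4 and 0x8, which SAMtools expresses as `samtools view -f 12`.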