Two sample localities, each comprising 20 individuals, were chosen randomly from unpublished RADseq data sets of three marine fish species: red snapper (Lutjanus campechanus), red drum (Sciaenops ocellatus), and silk snapper (Lutjanus vivanus). These three species are part of ongoing RADseq projects in our laboratory, and preliminary analyses indicated high levels of nucleotide polymorphism across all populations. Double-digest RAD libraries were prepared, generally following Peterson et al. (2012). Individual DNA extractions were digested with EcoRI and MspI. A barcoded adapter was ligated to the EcoRI site of each fragment and a generic adapter was ligated to the MspI site. Samples were then pooled in equimolar amounts and size-selected between 350 and 400 bp, using a Qiagen Gel Extraction Kit. Final library enrichment was completed using 12 cycles of PCR, simultaneously enriching properly ligated fragments and adding an Illumina Index for additional barcoding. Libraries were sequenced on three separate lanes of an Illumina HiSeq 2000 at the University of Texas Genomic Sequencing and Analysis Facility. Raw sequence data were archived at NCBI’s Short Read Archive (SRA) under Accession SRP041032.
Demultiplexed individual reads were analyzed with dDocent (version 1.0), using three different levels of final reference contig clustering (90%, 96%, and 99% similarity). Clustering similarity is the dDocent analysis variable most comparable to two Stacks parameters, the maximum distance between stacks and the maximum distance between stacks from different individuals, and it was varied in an attempt to match them. The coverage cut-off for assembly was 12 for red snapper, 13 for red drum, and nine for silk snapper. All dDocent runs used mapping variables of one, three, and five for match-score value, mismatch score, and gap-opening penalty, respectively. For comparisons, complex variants were decomposed into canonical SNP and indel representation from the raw VCF files, using vcfallelicprimitives from vcflib (https://github.com/ekg/vcflib).
For analysis with Stacks (version 1.08), reads were demultiplexed and cleaned using process_radtags, removing reads with ‘N’ calls and low-quality base scores. Because dDocent inherently uses both reads for SNP/indel genotyping, forward reads and reverse reads were processed separately with denovo_map.pl, using three different sets of parameters. The first set had a minimum depth of coverage of two to create a stack, a maximum distance of two between stacks, and a maximum distance of four between stacks from different individuals. The second set had a minimum depth of coverage of three to create a stack, a maximum distance of four between stacks, and a maximum distance of eight between stacks from different individuals. The third set had a minimum depth of coverage of three to create a stack, a maximum distance of four between stacks, and a maximum distance of 10 between stacks from different individuals. All three sets were run with both the deleveraging and removal algorithms enabled. SNP calls were output in VCF format.
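The three Stacks parameter sets, each applied separately to forward and reverse reads, give six denovo_map.pl runs per species. The Python sketch below only enumerates those runs; it is not the script used in this study. The mapping of parameters to the flags -m, -M, -n, and -t reflects our reading of the Stacks 1.x documentation and should be checked against the installed version. The file names and output directories are hypothetical, and other options a real run needs (for example, a batch identifier) are omitted.

```python
# Sketch only: enumerates the six Stacks runs (3 parameter sets x 2 read
# directions) described in the text. Nothing is executed.

# Values taken from the text: minimum stack depth, maximum distance between
# stacks, and maximum distance between stacks from different individuals.
PARAMETER_SETS = [
    {"min_depth": 2, "max_stack_dist": 2, "max_between_ind_dist": 4},
    {"min_depth": 3, "max_stack_dist": 4, "max_between_ind_dist": 8},
    {"min_depth": 3, "max_stack_dist": 4, "max_between_ind_dist": 10},
]

READ_DIRECTIONS = ["forward", "reverse"]


def build_denovo_map_args(params, sample_files, out_dir):
    """Assemble an argument list for denovo_map.pl.

    ASSUMPTION: flag meanings (-m, -M, -n, -t, -o, -s) follow our reading of
    the Stacks 1.x documentation; verify before use.
    """
    args = [
        "denovo_map.pl",
        "-m", str(params["min_depth"]),
        "-M", str(params["max_stack_dist"]),
        "-n", str(params["max_between_ind_dist"]),
        "-t",  # assumed to enable the deleveraging and removal algorithms
        "-o", out_dir,
    ]
    for path in sample_files:
        args += ["-s", path]
    return args


def enumerate_runs(sample_files_by_direction):
    """Return one (label, args) pair per parameter set and read direction."""
    runs = []
    for set_index, params in enumerate(PARAMETER_SETS, start=1):
        for direction in READ_DIRECTIONS:
            label = f"set{set_index}_{direction}"
            out_dir = f"stacks_out/{label}"  # hypothetical output location
            files = sample_files_by_direction[direction]
            runs.append((label, build_denovo_map_args(params, files, out_dir)))
    return runs


if __name__ == "__main__":
    # Hypothetical file names for two individuals.
    demo_files = {
        "forward": ["ind1.F.fq", "ind2.F.fq"],
        "reverse": ["ind1.R.fq", "ind2.R.fq"],
    }
    for label, args in enumerate_runs(demo_files):
        print(label, " ".join(args))
```

Keeping the parameter sets in one table makes it easy to confirm that forward and reverse runs within a set use identical settings.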
For both dDocent and Stacks runs, VCFtools was used to remove all indels, as well as SNPs that had a minor allele count of less than five. SNP calls were then evaluated at different individual-coverage levels: the total number of SNPs; the number of SNPs called in 75%, 90%, and 99% of individuals at 3X coverage; the number of SNPs called in 75% and 90% of individuals at 5X coverage; the number of SNPs called in 75% and 90% of individuals at 10X coverage; and the number of SNPs called in 75% and 90% of individuals at 20X coverage. Overall coverage levels for red snapper were lower and likely impacted by a few low-quality individuals; consequently, the number of 5X and 10X SNPs shared among 90% of individuals (after removing the bottom 10% of individuals in terms of coverage) was compared instead of SNP loci shared at 20X coverage. Results from two runs of Stacks (one using forward and one using reverse reads) were combined for comparison with dDocent, which inherently calls SNPs on both reads. All analyses and computations were performed on a 32-core Linux workstation with 128 GB of RAM.
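The coverage-level tallies can be illustrated with a short Python sketch that counts SNPs called in at least a given fraction of individuals at a minimum read depth. It is a simplified stand-in, not the VCFtools-based procedure used here. It assumes that each record carries GT and DP fields in its FORMAT column and that minor allele count filtering has already been applied. The function names and the four-individual demonstration records are hypothetical.

```python
# Sketch only: counts biallelic SNP records that meet a per-individual depth
# threshold in a minimum fraction of individuals. Assumes GT and DP are
# present in the FORMAT column of each record.

def parse_sample(format_keys, sample_field):
    """Return (genotype, depth) for one sample column of a VCF record."""
    values = dict(zip(format_keys, sample_field.split(":")))
    genotype = values.get("GT", "./.")
    raw_depth = values.get("DP", ".")
    depth = int(raw_depth) if raw_depth.isdigit() else 0
    return genotype, depth


def count_snps(vcf_lines, min_depth, min_fraction):
    """Count SNPs called in >= min_fraction of individuals at >= min_depth."""
    bases = set("ACGT")
    count = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]
        if ref not in bases or alt not in bases:
            continue  # skip indels and multi-allelic records
        format_keys = fields[8].split(":")
        samples = fields[9:]
        passing = 0
        for sample_field in samples:
            genotype, depth = parse_sample(format_keys, sample_field)
            # A genotype containing "." is treated as missing.
            if "." not in genotype and depth >= min_depth:
                passing += 1
        if passing / len(samples) >= min_fraction:
            count += 1
    return count


# Hypothetical records for four individuals: two SNPs and one indel.
DEMO_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3\tind4",
    "contig1\t10\t.\tA\tG\t50\t.\t.\tGT:DP\t0/1:12\t0/0:6\t1/1:3\t0/1:25",
    "contig1\t42\t.\tC\tT\t50\t.\t.\tGT:DP\t0/1:4\t./.:0\t0/0:2\t0/1:7",
    "contig2\t7\t.\tG\tGA\t50\t.\t.\tGT:DP\t0/1:30\t0/1:30\t0/1:30\t0/1:30",
]

if __name__ == "__main__":
    print("total SNPs:", count_snps(DEMO_VCF, 0, 0.0))
    for depth, fraction in [(3, 0.75), (5, 0.75), (5, 0.90), (10, 0.75)]:
        n = count_snps(DEMO_VCF, depth, fraction)
        print(f"SNPs in {fraction:.0%} of individuals at {depth}X: {n}")
```

With a minimum depth of zero and a minimum fraction of zero, the function returns the total number of SNP records, matching the first tally in the list above.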