Comparative RADseq Analysis in Marine Fishes

Two sample localities, each comprising 20 individuals, were chosen randomly from unpublished RADseq data sets of three different, marine fish species: red snapper (Lutjanus campechanus), red drum (Sciaenops ocellatus), and silk snapper (Lutjanus vivanus). These three species are part of ongoing RADseq projects in our laboratory, and preliminary analyses indicated high levels of nucleotide polymorphisms across all populations. Double-digest RAD libraries were prepared, generally following Peterson et al. (2012) (link). Individual DNA extractions were digested with EcoRI and MspI. A barcoded adapter was ligated to the EcoRI site of each fragment and a generic adapter was ligated to the MspI site. Samples were then equimollarly pooled and size-selected between 350 and 400 bp, using a Qiagen Gel Extraction Kit. Final library enhancement was completed using 12 cycles of PCR, simultaneously enhancing properly ligated fragments and adding an Illumina Index for additional barcoding. Libraries were sequenced on three separate lanes of an Illumina HiSeq 2000 at the University of Texas Genomic Sequencing and Analysis Facility. Raw sequence data were archived at NCBI’s Short Read Archive (SRA) under Accession SRP041032.
Demultiplexed individual reads were analyzed with dDocent (version 1.0), using three different levels of final reference contig clustering (90%, 96%, and 99% similarity) in an attempt to alter the most comparable analysis variable in dDocent to match the maximum distance between stacks parameter and the maximum distance between stacks from different individuals parameter of Stacks. The coverage cut-off for assembly was 12 for red snapper, 13 for red drum, and nine for silk snapper. All dDocent runs used mapping variables of one, three, and five for match-score value, mismatch score, and gap-opening penalty, respectively. For comparisons, complex variants were decomposed into canonical SNP and Indel representation from the raw VCF files, using vcfallelicprimitives from vcflib (https://github.com/ekg/vcflib).
For analysis with Stacks (version 1.08), reads were demultiplexed and cleaned using process_radtags, removing reads with ‘N’ calls and low-quality base scores. Because dDocent inherently uses both reads for SNP/Indel genotyping, forward reads and reverse reads were processed separately with denovo_map.pl, using three different sets of parameters. The first set had a minimum depth of coverage of two to create a stack, a maximum distance of two between stacks, and a maximum distance of four between stacks from different individuals, with both the deleveraging algorithm and removal algorithms enabled. The second set had a minimum depth of coverage of three to create a stack, a maximum distance of four between stacks, and a maximum distance of eight between stacks from different individuals, with both the deleveraging algorithm and removal algorithms enabled. The third set had a minimum depth of coverage of three to create a stack, a maximum distance of four between stacks, and a maximum distance of 10 between stacks from different individuals, with both the deleveraging algorithm and removal algorithms enabled. SNP calls were output in VCF format.
For both dDocent and Stacks runs, VCFtools was used to filter out all Indel s and SNPs that had a minor allele count of less than five. SNP calls were then evaluated at different individual-coverage levels: the total number of SNPs; the number of SNPs called in 75%, 90%, and 99% of individuals at 3X coverage; the number of SNPs called in 75% and 90% of individuals at 5X coverage; the number of SNPs called in 75% and 90% of individuals at 10X coverage; and the number of SNPS called in 75% and 90% of individuals at 20X coverage. Overall coverage levels for red snapper were lower and likely impacted by a few low-quality individuals; consequently, the number of 5X and 10X SNPs shared among 90% of individuals (after removing the bottom 10% of individuals in terms of coverage) were compared instead of SNP loci shared at 20X coverage. Results from two runs of Stacks (one using forward and one using reverse reads) were combined for comparison with dDocent, which inherently calls SNPs on both reads. All analyses and computations were performed on a 32-core Linux workstation with 128 GB of RAM.

Free full text: Click here

Puritz J.B., Hollenbeck C.M, & Gold J.R. (2014). dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms. PeerJ, 2, e431.

Publication 2014

Allele Ecori Fish Generic Genomic Indel Library Marine Nucleotide Polymorphisms Populations Silk Snps

Corresponding Organization :

Other organizations : Texas A&M University – Corpus Christi

Top 5 similar protocols

Protocol cited in 28 other protocols

Variable analysis

independent variables

Three different marine fish species: red snapper (Lutjanus campechanus), red drum (Sciaenops ocellatus), and silk snapper (Lutjanus vivanus)
Two sample localities, each comprising 20 individuals, chosen randomly from unpublished RADseq data sets
Three different levels of final reference contig clustering (90%, 96%, and 99% similarity) in dDocent
Three different sets of parameters used in Stacks: (1) minimum depth of coverage of 2 to create a stack, maximum distance of 2 between stacks, and maximum distance of 4 between stacks from different individuals; (2) minimum depth of coverage of 3 to create a stack, maximum distance of 4 between stacks, and maximum distance of 8 between stacks from different individuals; (3) minimum depth of coverage of 3 to create a stack, maximum distance of 4 between stacks, and maximum distance of 10 between stacks from different individuals

dependent variables

Number of SNPs called at different individual-coverage levels: total number of SNPs, number of SNPs called in 75%, 90%, and 99% of individuals at 3X coverage, number of SNPs called in 75% and 90% of individuals at 5X coverage, number of SNPs called in 75% and 90% of individuals at 10X coverage, and number of SNPs called in 75% and 90% of individuals at 20X coverage

control variables

Double-digest RAD libraries prepared following Peterson et al. (2012)
Individual DNA extractions digested with EcoRI and MspI
Barcoded adapter ligated to the EcoRI site of each fragment and a generic adapter ligated to the MspI site
Samples equimollarly pooled and size-selected between 350 and 400 bp
Final library enhancement completed using 12 cycles of PCR
Libraries sequenced on three separate lanes of an Illumina HiSeq 2000
Mapping variables of one, three, and five for match-score value, mismatch score, and gap-opening penalty, respectively, used in dDocent

Annotations

Based on most similar protocols

Etiam vel ipsum. Morbi facilisis vestibulum nisl. Praesent cursus laoreet felis. Integer adipiscing pretium orci. Nulla facilisi. Quisque posuere bibendum purus. Nulla quam mauris, cursus eget, convallis ac, molestie non, enim. Aliquam congue. Quisque sagittis nonummy sapien. Proin molestie sem vitae urna. Maecenas lorem.

As authors may omit details in methods from publication, our AI will look for missing critical information across the 5 most similar protocols.

About PubCompare

Our mission is to provide scientists with the largest repository of trustworthy protocols and intelligent analytical tools, thereby offering them extensive information to design robust protocols aimed at minimizing the risk of failures.

We believe that the most crucial aspect is to grant scientists access to a wide range of reliable sources and new useful tools that surpass human capabilities.

However, we trust in allowing scientists to determine how to construct their own protocols based on this information, as they are the experts in their field.

Ready to get started?

Revolutionizing how scientists
search and build protocols!