We sequenced libraries on multiple Illumina platforms in multiple core labs. We used the Illumina NextSeq 500 platform to generate PE75 data for the Rhodnius, Gambusia, Kinosternidae, and Wisteria projects and Illumina HiSeq 2500 or NextSeq 500 platforms to generate PE150 data for the Ixodidae, Sphyrna, and Eurycea projects. We planned all sequencing runs to produce approximately one million reads per sample, which facilitates comparison of the 3RAD results among species with varying genome sizes. The exception was the Ixodidae project, for which we targeted four million reads per sample.
We assembled data from each project independently using Stacks v1.42 (Catchen et al., 2013; Catchen et al., 2011; see File S6). For the Wisteria project, we used molecular ID tags to facilitate PCR duplicate removal with the module clone_filter in Stacks (Catchen et al., 2013; Hoffberg et al., 2016). We describe detailed parameters and software specifications for each project in File S6. Briefly, for most projects, we used the process_radtags program to demultiplex and/or clean and trim the sequence data. We parallel-merged the mates of paired-end reads. We used the denovo_map program to assemble reads de novo and to calculate coverage, number of loci, and number of SNPs recovered for each project; we compared these data to genome size and sequencing read length (PE75 or PE150). Finally, we used the populations program to export loci shared in at least 60–75% of localities and individuals to VCF files. Because a reference genome exists for Gambusia affinis (Hoffberg et al., 2018; NCBI: NHOQ01000000; details in File S6), we also assembled data from this project against the reference. For population-level datasets, we calculated F-statistics and performed preliminary population clustering analyses in Structure v2.3.4 (Pritchard, Stephens & Donnelly, 2000; File S6). For the Kinosternidae project, we conducted a de novo locus assembly using pyRAD v1.0.4 (Eaton, 2014; details in File S6).
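To make the locus-sharing threshold concrete, the following is a minimal Python sketch of the individual-level filter only. It is not the Stacks populations implementation, which also applies a locality-level threshold. The function name `loci_shared_by`, the presence table, and the sample labels are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Set


def loci_shared_by(
    locus_to_samples: Dict[str, Set[str]],
    n_samples: int,
    min_prop: float,
) -> List[str]:
    """Return the IDs of loci genotyped in at least `min_prop` of samples.

    `locus_to_samples` maps a locus ID to the set of samples in which
    that locus was recovered (a hypothetical input format).
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0.0 <= min_prop <= 1.0:
        raise ValueError("min_prop must lie between 0 and 1")
    return sorted(
        locus
        for locus, samples in locus_to_samples.items()
        if len(samples) / n_samples >= min_prop
    )


# Hypothetical presence table for four samples (A-D).
presence = {
    "locus_1": {"A", "B", "C", "D"},  # recovered in 4 of 4 samples
    "locus_2": {"A", "B", "C"},       # recovered in 3 of 4 samples
    "locus_3": {"A"},                 # recovered in 1 of 4 samples
}

# Keep loci present in at least 60% of samples.
kept = loci_shared_by(presence, n_samples=4, min_prop=0.60)
```

Raising `min_prop` toward the upper end of the 60–75% range retains fewer loci but reduces missing data in the exported matrix.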
Finally, we estimated the prevalence and impact of loci with third restriction enzyme cut-sites in our data. We estimated the proportion of these third restriction enzyme cut-sites relative to the first restriction enzyme cut-site (i.e., intended cut-site) for five of the projects, and we evaluated variation among adapters and projects using ANOVA in R v3.5.1 (R Core Team, 2018). To evaluate the effect of these loci in downstream analyses, we reanalyzed data from three of our projects (i.e., both Sphyrna projects and Amblyomma americanum) after removing third restriction enzyme loci from the datasets. To do this, we reassembled data in Stacks v1.44 (Catchen et al., 2011; Catchen et al., 2013), running process_radtags twice in succession: first, “rescuing barcodes”, cleaning, and trimming the raw sequence data as before, but disabling the RAD check (--disable_rad_check) to leave the cut-sites intact; and second, using the previous step’s output as input, checking only for exact, intended restriction enzyme cut-sites (i.e., XbaI and EcoRI). From this output, we assembled and analyzed data as described above, with details in File S6.
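The proportion of third-enzyme cut-sites relative to intended cut-sites can be illustrated with a short Python sketch that classifies reads by the motif at their 5' end. This is an assumption-laden illustration, not the procedure documented in File S6: the function name `third_site_proportion` is hypothetical, and the motif strings in the usage example are placeholders that should be replaced with the cut-site remnants expected for the enzymes actually used.

```python
from collections import Counter
from typing import Iterable


def third_site_proportion(
    reads: Iterable[str],
    intended: str,
    third: str,
) -> float:
    """Ratio of reads starting with the third-enzyme motif to reads
    starting with the intended-enzyme motif.

    Reads starting with neither motif are ignored. Matching is
    case-insensitive and requires an exact match at the read start.
    """
    intended = intended.upper()
    third = third.upper()
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        if seq.startswith(intended):
            counts["intended"] += 1
        elif seq.startswith(third):
            counts["third"] += 1
    if counts["intended"] == 0:
        raise ValueError("no reads begin with the intended cut-site motif")
    return counts["third"] / counts["intended"]


# Placeholder reads and motifs (assumptions for illustration only).
example_reads = [
    "CTAGAAATGC",  # intended motif
    "CTAGATTGCA",  # intended motif
    "CTAGCGGATC",  # third-enzyme motif
    "ctagaCCCGT",  # intended motif, lower-case input
    "GGGGTTTTAA",  # neither motif, ignored
]
ratio = third_site_proportion(example_reads, intended="CTAGA", third="CTAGC")
```

Computing this ratio per adapter and per project would yield the kind of values that could then be compared with ANOVA.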