We have developed a protocol that builds on the RADseq method [19] but differs in two principal respects (Figure 2). First, our method eliminates random shearing and end repair of genomic DNA (an advantage shared with a family of partially overlapping protocols such as MSG, CrOPS, and other recent RADseq derivatives [9], [20], [21]). Instead, we use a double restriction enzyme (RE) digest (i.e., a restriction digest with two enzymes simultaneously) that results in at least a five-fold reduction in library production cost: complete ddRADseq libraries cost ∼$5 per sample, while the necessary enzymatic steps following the initial restriction digest and ligation in random shearing RAD libraries alone introduce a cost of ∼$25 per library (NEB, Ipswich, MA). Furthermore, the elimination of several high-DNA-loss steps permits construction of ddRAD libraries from 100 ng or less of starting DNA. Second, we introduced a precise selection for genomic fragments by size, which allows greater fine-scale control of the fraction of regions represented in the final library (see Results). By combining precise and repeatable size selection with sequence-specific fragmentation, double digest Restriction-Site Associated DNA sequencing (ddRADseq) produces sequencing libraries consisting of only the subset of genomic restriction digest fragments that are generated by cuts with both REs (i.e., have one end from each cut) and that fall within the size-selection window (Figure 2B). This combination of requirements can be tuned to generate libraries consisting of fragments derived from hundreds to hundreds of thousands of regions genome-wide.
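The selection rule described above (keep only fragments that have one end from each enzyme and that fall within the size window) can be illustrated with a minimal in silico sketch. The recognition motifs `GAATTC` and `CCGG`, the function names, and the toy sequence are illustrative assumptions rather than details of the protocol. Cut positions are approximated by the start of each motif, and overhangs and adapter lengths are ignored.

```python
import re

def cut_sites(seq, motif, enzyme):
    """Return (position, enzyme_label) for every occurrence of a recognition motif.

    A zero-width lookahead is used so that overlapping occurrences are found.
    The cut position is approximated by the motif start (an assumption).
    """
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [(m.start(), enzyme) for m in pattern.finditer(seq)]

def double_digest(seq, motif_a, motif_b):
    """Return fragments between adjacent cut sites after digesting with both enzymes.

    Each fragment is (start, end, left_enzyme, right_enzyme); the uncut
    sequence ends are not reported as fragments.
    """
    sites = sorted(cut_sites(seq, motif_a, "A") + cut_sites(seq, motif_b, "B"))
    return [(p1, p2, e1, e2) for (p1, e1), (p2, e2) in zip(sites, sites[1:])]

def ddrad_select(fragments, min_len, max_len):
    """Keep fragments with one end from each enzyme and a length inside the window."""
    return [
        f for f in fragments
        if f[2] != f[3] and min_len <= f[1] - f[0] <= max_len
    ]

# Toy sequence (hypothetical): A-site, B-site, B-site, A-site.
toy = "GAATTC" + "A" * 10 + "CCGG" + "A" * 20 + "CCGG" + "A" * 5 + "GAATTC"
fragments = double_digest(toy, "GAATTC", "CCGG")
print(ddrad_select(fragments, 10, 20))  # [(0, 16, 'A', 'B')]
```

In this toy case the B-to-B fragment is dropped because both ends come from the same enzyme, and the 9 bp B-to-A fragment is dropped because it falls outside the window. Narrowing or widening the window changes how many regions survive, which is the tuning described in the text.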
Precise, repeatable size selection offers two further advantages. First, because only a small fraction of restriction fragments will fall in the target size-selection regime (<5% in conditions described here), the probability of sampling both directions from the same restriction site is low. This reduces “duplicate” (i.e., immediately neighboring) region sampling, which effectively halves the number of reads that are required to reach high-confidence sampling of a SNP associated with a given RE cut site. Second, shared bias in region representation favoring fragments closest to the mean of size selection, in turn, biases independent samples (e.g., from different individuals) towards recovering the same genomic regions (Figure 2B). Because of this correlated recovery, regions are “filled in” with reads in approximately the same order across all individual samples, and samples with read recovery counts below saturation will still share a significant number of well-covered regions (“Experimental ddRADseq results” below; Analysis S1 Supporting Figure 4; Analysis S1 “Region recovery: ddRADseq vs. random shearing”). Both of these properties make the ddRADseq method robust to under-sampling with respect to read counts, which is a commonly observed problem arising from unequal read representation across individual samples in pooled sequencing experiments [9], [22], [23].
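The first advantage can be made concrete with a rough simulation. The sketch below is a simplified model and not the analysis used here: it assumes cut sites fall at random, so that fragment lengths are independent and exponentially distributed, and it ignores which enzyme produced each end. The mean fragment length, the size window, and the function name are hypothetical values chosen so that a few percent of fragments are selected.

```python
import random

def simulate_neighbor_sampling(n_fragments=200_000, mean_len=1000.0,
                               window=(300, 350), seed=0):
    """Estimate how often both fragments flanking a cut site pass size selection.

    Returns (fraction of fragments selected,
             share of sampled cut sites that are sampled in both directions).
    """
    rng = random.Random(seed)
    # Independent exponential fragment lengths (assumption: random cut sites).
    lengths = [rng.expovariate(1.0 / mean_len) for _ in range(n_fragments)]
    selected = [window[0] <= x <= window[1] for x in lengths]
    frac_selected = sum(selected) / n_fragments

    # Fragments i and i+1 share the cut site between them.
    pairs = list(zip(selected, selected[1:]))
    both = sum(a and b for a, b in pairs)      # site sampled in both directions
    either = sum(a or b for a, b in pairs)     # site sampled in at least one direction
    return frac_selected, both / either

frac, share = simulate_neighbor_sampling()
print(f"fraction of fragments selected: {frac:.3f}")
print(f"sampled sites recovered in both directions: {share:.3f}")
```

Under these assumptions, if a fraction f of fragments is selected, a sampled site is recovered in both directions with probability of roughly f/2, so a window that retains under 5% of fragments leaves very few immediately neighboring regions in the library.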