Total genomic DNA was extracted using the DNA IQ Tissue and Hair Extraction Kit (Promega, Madison, WI, USA) following the manufacturer’s protocol. Each sample contained 60 hair follicles with the majority of the hair shaft removed under a dissecting microscope to reduce protein and other contamination. All DNA extractions were conducted in a separate laboratory free of concentrated PCR products. Negative controls were included in each extraction to monitor contamination. DNA quantifications were conducted using real-time PCR fluorescence measurements of double stranded DNA (Blotta et al., 2005 ) and the Quant-it kit (Life Technologies, Foster City, CA).
Genomic DNA was converted into nextRAD genotyping-by-sequencing libraries (SNPsaurus, LLC). The nextRAD method uses selective PCR primers to amplify genomic loci consistently between samples. Genomic DNA (10 ng or less depending upon extraction yield) was first fragmented with Nextera reagent (Illumina, Inc), which also ligates short adapter sequences to the ends of the fragments. Fragmented DNA was then amplified, with one of the primers matching the adapter and extending 9 arbitrary nucleotides into the genomic DNA with the selective sequence. Thus, only fragments starting with a sequence that can be hybridized by the selective sequence of the primer will be efficiently amplified. The resulting fragments are fixed at the selective end, and have random lengths depending on the initial Nextera fragmentation. Because of this, amplified DNA from a particular locus is present at many different sizes and careful size selection of the library is not needed. For this project, an arbitrary 9-mer was chosen from those previously validated in smaller genomes, which did not appear to target repeat-masked regions in the publically available American pika genomic scaffolds (Ensembl, release 74, Ochotona_princeps.74.dna_sm.toplevel.fa) and that would approximate the results of standard RAD sequencing projects using SbfI (Baird et al., 2008 (link)).
Since these samples were collected non-invasively, it was important to assess the proportion of sequence reads in each sample that originated from the target organism relative to other environmental sources prior to conducting genotyping analysis. This was done using a custom script (SNPsaurus, LLC) that randomly sampled 1,000 high-quality reads from each sample and aligned those to the publically available American pika genomic scaffolds as well as subjected them to ablastn (Altschul et al., 1997 (link)) search of all sequences in the NCBI non-redundant database. Only samples that had greater than 50% sequencing reads that mapped to Ochotona princeps were retained for genotyping analysis.
The genotyping analysis used custom scripts (SNPsaurus, LLC) that created a reference from abundant reads present between 500 and 2,000 times across the combined set of samples, mapping all of the reads to the reference allowing two mismatches. The identified variants were then filtered by removing loci that had more than the expected maximum of two alleles and those that were present in less than 10% of all samples.
Following assembly, mapping and variant detection, the data were further filtered to maximize data quality. We retained only those loci that were genotyped in ≥50% of individuals from each transect, had a minor allele frequency ≥0.05, and a minimum coverage of 6X for homozygotes (affording 95% confidence in the genotype) while heterozygotes were required to have a minimum of 2X coverage per allele for each individual. These values were chosen to minimize null alleles and sequencing errors from biasing homozygote and heterozygote genotype calls, respectively. We then removed loci that displayed significant deviation from Hardy-Weinberg equilibrium (HWE) in more than two sites per transect as assessed using the method of Guo & Thompson (1992) (link) as implemented inGenepop 4.3 (Raymond & Rousset, 1995 ; Rousset, 2008 (link)).
To ensure that only non-redundant samples were included in subsequent analyses, we conducted genotype matching across a random subset of 100 loci. We conducted the match analysis and calculated the multi-locus probability of identity (Waits, Luikart & Taberlet, 2001 (link)) for the 100 randomly chosen loci using GenAlEx (Peakall & Smouse, 2006 (link)). Only samples with unique genotypes were retained.
Genomic DNA was converted into nextRAD genotyping-by-sequencing libraries (SNPsaurus, LLC). The nextRAD method uses selective PCR primers to amplify genomic loci consistently between samples. Genomic DNA (10 ng or less depending upon extraction yield) was first fragmented with Nextera reagent (Illumina, Inc), which also ligates short adapter sequences to the ends of the fragments. Fragmented DNA was then amplified, with one of the primers matching the adapter and extending 9 arbitrary nucleotides into the genomic DNA with the selective sequence. Thus, only fragments starting with a sequence that can be hybridized by the selective sequence of the primer will be efficiently amplified. The resulting fragments are fixed at the selective end, and have random lengths depending on the initial Nextera fragmentation. Because of this, amplified DNA from a particular locus is present at many different sizes and careful size selection of the library is not needed. For this project, an arbitrary 9-mer was chosen from those previously validated in smaller genomes, which did not appear to target repeat-masked regions in the publically available American pika genomic scaffolds (Ensembl, release 74, Ochotona_princeps.74.dna_sm.toplevel.fa) and that would approximate the results of standard RAD sequencing projects using SbfI (Baird et al., 2008 (link)).
Since these samples were collected non-invasively, it was important to assess the proportion of sequence reads in each sample that originated from the target organism relative to other environmental sources prior to conducting genotyping analysis. This was done using a custom script (SNPsaurus, LLC) that randomly sampled 1,000 high-quality reads from each sample and aligned those to the publically available American pika genomic scaffolds as well as subjected them to a
The genotyping analysis used custom scripts (SNPsaurus, LLC) that created a reference from abundant reads present between 500 and 2,000 times across the combined set of samples, mapping all of the reads to the reference allowing two mismatches. The identified variants were then filtered by removing loci that had more than the expected maximum of two alleles and those that were present in less than 10% of all samples.
Following assembly, mapping and variant detection, the data were further filtered to maximize data quality. We retained only those loci that were genotyped in ≥50% of individuals from each transect, had a minor allele frequency ≥0.05, and a minimum coverage of 6X for homozygotes (affording 95% confidence in the genotype) while heterozygotes were required to have a minimum of 2X coverage per allele for each individual. These values were chosen to minimize null alleles and sequencing errors from biasing homozygote and heterozygote genotype calls, respectively. We then removed loci that displayed significant deviation from Hardy-Weinberg equilibrium (HWE) in more than two sites per transect as assessed using the method of Guo & Thompson (1992) (link) as implemented in
To ensure that only non-redundant samples were included in subsequent analyses, we conducted genotype matching across a random subset of 100 loci. We conducted the match analysis and calculated the multi-locus probability of identity (Waits, Luikart & Taberlet, 2001 (link)) for the 100 randomly chosen loci using GenAlEx (Peakall & Smouse, 2006 (link)). Only samples with unique genotypes were retained.