We investigated population structure, genetic diversity, and inferred phylogenetic relationships based on a total of 105,706 SNPs filtered using bcftools (95 (
link)) to contain no missing sites, a minimum minor allele frequency of 0.01, and to remove sites with linkage greater than r
2 = 0.2 within 10-kb windows (retaining the site in an LD pair with the greater allele frequency). We investigated population structure via identity-by-state (IBS distance) and DAPC. We calculated IBS among all individuals across all six sites using PLINK (93 ) [DST: (IBS2 + 0.5*IBS1) / (N SNP pairs)] and tested whether IBS differed across sites with ANOVA (high relatedness across sites and municipalities might suggest dispersal events;
SI Appendix, Fig. S2). We implemented DAPC with the R package “adegenet” (96 (
link),97 (
link)) implemented with the function
dapc (see
SI Appendix, Fig. S1 for PCA results). Although
k-means clustering implemented with the function
find.clusters in the R package “adegenet” (96 (
link), 97 (
link)) supports the existence of three distinct genetic clusters (equivalent to the municipality for each urban–forest pair), we used group identity based on our sampling (urban or forest from each of the three municipalities) and
k = 6. We cross-validated the number of retained principal component axes in the DAPC with the function
xvalDapc, which supported retaining 10 principal component axes. Discriminant functions 1 and 2 in the DAPC (
Fig. 1C and
SI Appendix, Fig. S1) show clear separation of genetic variation between geographic regions.
Additional methods similarly validate the existence of three independent urban–forest population pairs. The sample tree indicates that, on average, individuals from within each geographic region (but not necessarily each habitat type within a region) were more genetically similar to one another than to individuals from other geographic regions (
Fig. 1D). Sequence alignment was performed with the “SNPhylo” pipeline (98 (
link)) followed by tree model fitting and optimization with IQTree (99 (
link)) with ModelFinder (100 (
link)) and ascertainment bias for SNP data (-m TEST+ASC). The midpoint-rooted sample tree was visualized in R with “phytools” (101 ) and “phangorn” (102 (
link)). We also estimated admixture coefficients using sparse Nonnegative Matrix Factorization algorithms with the function
snmf in the R package “LEA” (103 ); three genetic clusters were most strongly supported (
SI Appendix, Fig. S1).
In addition, we calculated traditional metrics of population divergence and relatedness. We calculated nucleotide diversity for each of the six sample sites as well as
FST and
DXY between all pairs of sites using the Python scripts parseVCF.py and popgenWindows.py (
https://github.com/simonhmartin/genomics_general). We excluded indels and included both variant and invariant sites in our analysis. We calculated summary statistics in 10-kb windows excluding any windows with fewer than 100 called sites (
SI Appendix, Fig. S3). We used VCFtools (91 (
link)) to calculate observed heterozygosity (--het), relatedness (--relatedness), Tajima’s D (--TajimaD 100000), and unadjusted AJK statistic (
SI Appendix, Figs. S2 and S3).
Winchell K.M., Campbell-Staton S.C., Losos J.B., Revell L.J., Verrelli B.C, & Geneva A.J. (2023). Genome-wide parallelism underlies contemporary adaptation in urban lizards. Proceedings of the National Academy of Sciences of the United States of America, 120(3), e2216789120.