We split the genome into 5 kb windows and removed windows overlapping blacklisted regions (v2) from ENCODE (refs. 86,87). For each experiment, we created a sparse m × n matrix containing read depth for m cells passing read depth thresholds at n windows. Using scanpy (v.1.4.4.post1; ref. 88), we extracted highly variable windows using mean read depth and normalized dispersion (‘min_mean=0.01, min_disp=0.25’). After normalizing to uniform read depth and log-transforming, we regressed out, for each experiment, the log-transformed read depth within highly variable windows for each cell. We then performed principal component analysis (PCA) and extracted the top 50 principal components. We used Harmony (ref. 24) to correct the principal components and remove batch effects across experiments, using donor-of-origin as a covariate. We used the Harmony-corrected components to calculate the 30 nearest neighbors with the cosine metric; these neighbors were subsequently used for UMAP dimensionality reduction (‘min_dist=0.3’) and Leiden clustering (‘resolution=1.5’; ref. 89).
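The windowing and counting steps at the start of this paragraph can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the toy chromosome sizes, the blacklist interval, the read tuples and the `min_reads` threshold are all hypothetical, and the sketch assumes that the read depth threshold is applied to reads falling in retained windows.

```python
from collections import defaultdict

WINDOW = 5_000  # 5 kb windows, as in the text


def tile_genome(chrom_sizes, window=WINDOW):
    """Yield half-open (chrom, start, end) windows tiling each chromosome."""
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, window):
            yield chrom, start, min(start + window, size)


def overlaps_blacklist(chrom, start, end, blacklist):
    """Return True if [start, end) overlaps any blacklisted interval."""
    return any(b_start < end and start < b_end
               for b_start, b_end in blacklist.get(chrom, []))


def count_matrix(reads, windows, min_reads, window=WINDOW):
    """Build a sparse {(row, col): depth} matrix of cells by kept windows."""
    column = {(chrom, start): j for j, (chrom, start, _) in enumerate(windows)}
    depth = defaultdict(lambda: defaultdict(int))
    for cell, chrom, pos in reads:
        j = column.get((chrom, pos // window * window))
        if j is not None:  # reads in removed windows are ignored
            depth[cell][j] += 1
    # Assumed threshold: total depth over kept windows must reach min_reads.
    cells = sorted(c for c, row in depth.items()
                   if sum(row.values()) >= min_reads)
    matrix = {(i, j): n
              for i, c in enumerate(cells)
              for j, n in depth[c].items()}
    return cells, matrix


# Toy inputs (hypothetical values, for illustration only).
chrom_sizes = {"chr1": 12_000, "chr2": 7_000}
blacklist = {"chr1": [(4_500, 5_200)]}
reads = [
    ("cellA", "chr1", 10_500), ("cellA", "chr2", 100),
    ("cellA", "chr2", 200), ("cellA", "chr2", 6_500),
    ("cellB", "chr1", 4_800), ("cellB", "chr2", 6_000),
]

windows = [w for w in tile_genome(chrom_sizes)
           if not overlaps_blacklist(*w, blacklist)]
cells, matrix = count_matrix(reads, windows, min_reads=2)
```

In this toy input the blacklisted interval spans a window boundary, so both chr1 windows that touch it are dropped, and the cell with too few reads in retained windows is excluded from the matrix. A dictionary keyed by (row, column) stands in for the sparse matrix; a real analysis would more likely use a compressed sparse format.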
We performed iterative clustering to identify and remove cells with abnormal features before the final clustering (see Supplementary Note). After removing these cells, 15,298 cells remained, mapping to 12 clusters. To assign cell types to the endocrine islet clusters, we used chromatin accessibility at windows overlapping the promoters of marker hormone genes; to assign labels to the non-endocrine islet clusters, we used chromatin accessibility at windows around marker genes from scRNA-seq.
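One way to picture the marker-based annotation of endocrine clusters is the short Python sketch below. It is an illustration under stated assumptions, not the procedure used in the study: the marker table (GCG, INS, SST and PPY for alpha, beta, delta and gamma cells), the input dictionaries and the rule of taking the marker with the highest mean promoter accessibility are all hypothetical choices made for the example.

```python
# Hypothetical marker table: hormone gene -> endocrine cell type.
# Illustrative only; the text does not list the markers used.
MARKERS = {"GCG": "alpha", "INS": "beta", "SST": "delta", "PPY": "gamma"}


def cluster_means(values, clusters):
    """Average promoter accessibility per marker gene within each cluster.

    values:   {cell: {gene: accessibility at promoter windows}}
    clusters: {cell: cluster id}
    """
    sums, sizes = {}, {}
    for cell, cluster in clusters.items():
        sizes[cluster] = sizes.get(cluster, 0) + 1
        acc = sums.setdefault(cluster, {gene: 0.0 for gene in MARKERS})
        for gene in MARKERS:
            acc[gene] += values[cell].get(gene, 0.0)
    return {c: {g: s / sizes[c] for g, s in acc.items()}
            for c, acc in sums.items()}


def label_clusters(means):
    """Assumed rule: label a cluster by its most accessible marker promoter."""
    return {c: MARKERS[max(m, key=m.get)] for c, m in means.items()}


# Toy inputs (hypothetical values, for illustration only).
values = {
    "c1": {"INS": 3.0, "GCG": 0.0},
    "c2": {"INS": 1.0, "GCG": 1.0},
    "c3": {"GCG": 2.0, "SST": 0.5},
}
clusters = {"c1": 0, "c2": 0, "c3": 1}

means = cluster_means(values, clusters)
labels = label_clusters(means)
```

Averaging within clusters before comparing markers reduces the effect of the sparsity of per-cell accessibility. In practice such an automatic rule would probably serve only as a first pass before the assignments are inspected by hand.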