Initial data exploration revealed that clustering was driven by individual of origin in addition to cell type identity, which makes it difficult to analyze changes in the relative abundance or gene expression of a given cell type across disease progression or brain regions. To recover clusters defined mainly by cell type identity, data were aligned across samples from each brain region using scAlign65 (version 1.0.0), which leverages a neural network to learn a low-dimensional alignment space in which cells from different datasets group by biological function independent of technical and experimental factors. As noted by Johansen & Quon65, scAlign converges faster with little loss of performance when the input data are represented by principal components or canonical correlation vectors. Therefore, prior to running scAlign, the top 2000 genes with the highest combined biological variance were used as the feature set for canonical correlation analysis (CCA), which was implemented using Seurat::RunMultiCCA with parameter num.cc = 15. The number of canonical correlation vectors to use for scAlign was determined by the elbow method using Seurat::MetageneBicorPlot. scAlign was then run on the cell loadings along the top 10 canonical correlation vectors with the parameters options = scAlignOptions(steps = 10000, log.every = 5000, architecture = 'large', num.dim = 64), encoder.data = 'cca', supervised = 'none', run.encoder = TRUE, run.decoder = FALSE, log.results = TRUE, and device = 'CPU'. Clustering was then performed on the full dimensionality of the output from scAlign using Seurat::FindClusters with parameter resolution = 0.8 for the SFG and resolution = 0.6 for the EC. Clusters were visualized with tSNE using Seurat::RunTSNE on the full dimensionality of the output from scAlign with parameter do.fast = TRUE. Alignment using scAlign followed by clustering was also performed jointly for all samples from both brain regions.
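The elbow criterion used above to choose the number of canonical correlation vectors is usually applied by visual inspection of the plot, and the text does not describe an automated rule. As an illustration only, the following Python sketch shows one common heuristic: take the point farthest from the straight line joining the first and last points of the curve. The function name `elbow_index` and the example values are hypothetical and are not taken from the analysis described here.

```python
def elbow_index(values):
    """Return the 1-based position of the elbow in a decreasing curve.

    Heuristic (an assumption, not the method used in the text): the elbow is
    the point with the largest perpendicular distance from the chord that
    joins the first and last points of the curve.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three points to locate an elbow")

    # Chord from (1, values[0]) to (n, values[-1]).
    x1, y1 = 1, values[0]
    x2, y2 = n, values[-1]
    dx, dy = x2 - x1, y2 - y1
    chord_length = (dx * dx + dy * dy) ** 0.5

    best_position, best_distance = 1, -1.0
    for position, y in enumerate(values, start=1):
        # Perpendicular distance from (position, y) to the chord.
        distance = abs(dy * (position - x1) - dx * (y - y1)) / chord_length
        if distance > best_distance:
            best_position, best_distance = position, distance
    return best_position


# Hypothetical per-component strength values, for illustration only.
example_curve = [10, 6, 3, 1.5, 1.2, 1.0, 0.9, 0.8]
chosen = elbow_index(example_curve)
```

This heuristic depends on the relative scaling of the two axes, so in practice its result should be checked against the plot rather than used on its own.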
To assign clusters identified in the aligned subspace generated by scAlign to major brain cell types, the following marker genes were used: SLC17A7 and CAMK2A for excitatory neurons, GAD1 and GAD2 for inhibitory neurons, SLC1A2 and AQP4 for astrocytes, MBP and MOG for oligodendrocytes, PDGFRA and SOX10 for oligodendrocyte precursor cells (OPCs), CD74 and CX3CR1 for microglia/myeloid cells, and CLDN5 and FLT1 for endothelial cells. Clusters expressing markers for more than one cell type, most likely reflecting doublets, were removed from downstream analyses.
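The assignment logic described above can be sketched in Python as follows. The marker pairs are those listed in the text. Everything else is an assumption made for illustration: the function name `annotate_cluster`, the expression threshold, and the rule that both markers of a pair must exceed the threshold. The text does not state how marker expression was judged.

```python
# Marker genes for each major brain cell type, as listed in the text.
MARKERS = {
    "excitatory neuron": ("SLC17A7", "CAMK2A"),
    "inhibitory neuron": ("GAD1", "GAD2"),
    "astrocyte": ("SLC1A2", "AQP4"),
    "oligodendrocyte": ("MBP", "MOG"),
    "OPC": ("PDGFRA", "SOX10"),
    "microglia/myeloid": ("CD74", "CX3CR1"),
    "endothelial": ("CLDN5", "FLT1"),
}


def annotate_cluster(mean_expression, threshold=1.0):
    """Label one cluster from its mean expression per gene.

    mean_expression: dict mapping gene symbol to the cluster's mean expression.
    threshold: hypothetical cutoff above which a marker counts as expressed.
    A cell type matches only if both of its markers reach the cutoff
    (an assumed rule, chosen for illustration).
    """
    matches = [
        cell_type
        for cell_type, genes in MARKERS.items()
        if all(mean_expression.get(gene, 0.0) >= threshold for gene in genes)
    ]
    if len(matches) == 1:
        return matches[0]
    if len(matches) > 1:
        # Markers of several cell types: treated as a likely doublet cluster.
        return "putative doublet"
    return "unassigned"


# Hypothetical cluster profiles, for illustration only.
astro_like = {"SLC1A2": 3.2, "AQP4": 2.1, "MBP": 0.1}
mixed = {"SLC1A2": 3.0, "AQP4": 2.5, "MBP": 2.8, "MOG": 1.9}
labels = [annotate_cluster(astro_like), annotate_cluster(mixed)]
```

Clusters labeled as putative doublets in this sketch correspond to those that would be removed from downstream analyses.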