In cases where cell identity was undefined across datasets (i.e., cortical interneuron subtypes), we treated each subtype label in turn as the positive for every other subtype and assessed similarity using HVGs. For example, Int1 from the Zeisel dataset was used as the positive (training) set, and each of the other subtypes was considered the test set in turn. Mean AUROCs from both testing and training folds are plotted in the heatmap in Fig. 4. Reciprocal best matches across datasets and AUROCs ≥0.95 were used to identify putative replicate types for further assessment with our supervised framework (detailed above). New cell-type labels encompassing these replicate types (e.g., a combined Sst Chodl label containing Int1 (Zeisel), Sst Chodl (Tasic), and Sst Nos1 (Paul)) were generated for MetaNeighbor analysis across random and GO gene sets, and for meta-analysis of differential expression. Although only reciprocal top hits across laboratories were used to define putative replicate cell types, cross-validation within laboratories was also performed to fill in AUROC scores between cell types from the same laboratory.
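The selection of putative replicate types can be illustrated with a minimal sketch. It assumes a symmetric matrix of mean AUROCs, and it assumes that the two criteria are applied jointly (a pair must be a reciprocal best match between two datasets and also reach AUROC ≥0.95); the text does not state how the criteria are combined. The function name `reciprocal_top_hits`, the labels, and the toy values are hypothetical and are not taken from the analysis.

```python
import numpy as np

def reciprocal_top_hits(auroc, labels, datasets, threshold=0.95):
    """Return cross-dataset pairs that are each other's best match.

    auroc    : square, symmetric array of mean AUROCs (assumed layout)
    labels   : cell-type label for each row/column
    datasets : dataset (laboratory) of origin for each row/column
    """
    auroc = np.asarray(auroc, dtype=float)
    datasets = np.asarray(datasets)

    def best_in(i, d):
        # Index of the type in dataset d with the highest AUROC against type i.
        idx = np.flatnonzero(datasets == d)
        return int(idx[np.argmax(auroc[i, idx])])

    pairs = []
    for i in range(len(labels)):
        for d in np.unique(datasets):
            if d == datasets[i]:
                continue  # only matches across datasets define replicates
            j = best_in(i, d)
            # Keep the pair once (i < j), and only if the match is reciprocal
            # and the AUROC reaches the threshold.
            if i < j and best_in(j, datasets[i]) == i and auroc[i, j] >= threshold:
                pairs.append((labels[i], labels[j], float(auroc[i, j])))
    return pairs

# Toy example with invented values: two types from each of two laboratories.
labels = ["A1", "A2", "B1", "B2"]
datasets = ["labA", "labA", "labB", "labB"]
toy = [
    [1.00, 0.60, 0.98, 0.40],
    [0.60, 1.00, 0.50, 0.80],
    [0.98, 0.50, 1.00, 0.30],
    [0.40, 0.80, 0.30, 1.00],
]
hits = reciprocal_top_hits(toy, labels, datasets)
print(hits)  # [('A1', 'B1', 0.98)]
```

In this toy case A2 and B2 are also reciprocal best matches, but their AUROC of 0.80 falls below the threshold, so only the A1/B1 pair is retained. Best matches are evaluated separately for each pair of datasets, which allows one type to be paired with a type from each of several laboratories, as in the three-way combined label described above.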