We extracted the single-cell RNA sequencing data used in this paper from the Gene Expression Omnibus (GEO; accession GSE138826; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE138826; file GSE138826_expression_matrix.txt) (Oprescu et al., 2020). Preliminary analyses of the processed scRNA-seq data were performed using the standard workflow of the Seurat suite (version 4.0.3) in RStudio version 1.2.5042 and R version 4.0.3. First, we applied initial quality control to the Oprescu et al. (2020) dataset: we kept features (genes) expressed in at least five cells and cells with more than 200 detected genes, and filtered out all others. Second, we verified nUMIs_RNA (> 200 and < 4,000) and percent.mt (< 5%). Third, UMIs were normalized to counts per ten thousand and log-transformed (normalization.method = LogNormalize). The log-normalized data were then used to find variable genes (FindVariableFeatures) and scaled (ScaleData). Finally, principal component analysis (PCA) was run (RunPCA) on the scaled data using the highly variable features (genes). An elbow plot (ElbowPlot) was used to decide the number of principal components (PCs) to use for unsupervised graph-based clustering and for Uniform Manifold Approximation and Projection (UMAP) embedding of all cells or of further subclustering analyses (i.e., FAPs), using the standard FindNeighbors, FindClusters, and RunUMAP workflow. We used 30 PCs and a resolution of 0.6; the UMAP plot was generated on the same set of PCs used for clustering. We chose the resolution value for FindClusters on a supervised basis after considering the clustering output from a range of resolutions (0.4, 0.6, 0.8, and 1.2). Our initial clustering analysis returned 29 clusters (clusters 0–28).
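The filtering and normalization arithmetic described above can be illustrated with a minimal, library-free sketch. This is not the Seurat implementation: the function names (filter_counts, log_normalize), the toy count table, the small thresholds used in the example, and the order of the two filters are all assumptions made for illustration. The normalization follows the stated rule of scaling each cell to ten thousand counts and applying a natural-log transform of (1 + x).

```python
import math


def filter_counts(counts, min_cells=5, min_genes=200):
    """Filter a toy count table given as {gene: [count in each cell]}.

    Cells are kept when more than `min_genes` genes are detected in them;
    genes are then kept when detected in at least `min_cells` kept cells.
    Returns (filtered_counts, kept_cell_indices).
    """
    if not counts:
        return {}, []
    n_cells = len(next(iter(counts.values())))
    # Number of genes with a non-zero count in each cell.
    genes_per_cell = [
        sum(1 for row in counts.values() if row[j] > 0) for j in range(n_cells)
    ]
    kept_cells = [j for j in range(n_cells) if genes_per_cell[j] > min_genes]
    filtered = {}
    for gene, row in counts.items():
        kept_row = [row[j] for j in kept_cells]
        if sum(1 for c in kept_row if c > 0) >= min_cells:
            filtered[gene] = kept_row
    return filtered, kept_cells


def log_normalize(counts, scale_factor=10_000):
    """Per cell: divide by total counts, multiply by scale_factor, take log1p."""
    if not counts:
        return {}
    n_cells = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_cells)]
    return {
        gene: [
            math.log1p(row[j] / totals[j] * scale_factor) if totals[j] else 0.0
            for j in range(n_cells)
        ]
        for gene, row in counts.items()
    }


# Invented toy counts (4 genes x 4 cells), with small thresholds so the
# effect of each filter is visible; these are not values from the dataset.
toy_counts = {
    "Pdgfra": [3, 0, 1, 0],
    "Pax7": [0, 2, 1, 0],
    "Acta2": [1, 1, 2, 0],
    "Ttn": [0, 0, 0, 5],
}
filtered, kept_cells = filter_counts(toy_counts, min_cells=2, min_genes=1)
normalized = log_normalize(filtered)
print(kept_cells)
print(sorted(filtered))
```

In this toy example the fourth cell is dropped because only one gene is detected in it, and Ttn is then dropped because no remaining cell expresses it.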
We identified cell populations and lineage-specific marker genes for the analyzed dataset using the FindAllMarkers function with logfc.threshold = 0.25, test.use = "wilcox", and max.cells.per.ident = 1000. We then plotted the top 10 expressed genes, grouped by orig.ident and seurat_clusters, using the DoHeatmap function. We determined cell lineages and cell types based on the expression of canonical genes. We also inspected the clusters (in Figures 2, 3) for hybrid or poorly defined gene expression signatures. Clusters with similar canonical marker gene expression patterns were merged.
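As a rough illustration of how a per-cluster list of top genes can be drawn from a marker table, the sketch below filters rows by a log fold-change threshold and keeps the n highest-ranked genes in each cluster. The function name top_markers, the tuple layout, ranking by average log2 fold change, keeping only positively enriched genes, and the numeric values are all assumptions for illustration; the text above does not state the ranking criterion.

```python
from collections import defaultdict


def top_markers(marker_rows, n=10, logfc_threshold=0.25):
    """Return {cluster: [top-n gene names]} from (cluster, gene, avg_log2fc) rows.

    Rows below `logfc_threshold` are discarded; the rest are ranked by
    avg_log2fc in descending order within each cluster.
    """
    by_cluster = defaultdict(list)
    for cluster, gene, avg_log2fc in marker_rows:
        if avg_log2fc >= logfc_threshold:
            by_cluster[cluster].append((gene, avg_log2fc))
    return {
        cluster: [g for g, _ in sorted(hits, key=lambda h: h[1], reverse=True)[:n]]
        for cluster, hits in by_cluster.items()
    }


# Invented marker rows for two clusters; not results from the study.
toy_rows = [
    (0, "Pdgfra", 2.1), (0, "Dcn", 1.7), (0, "Gapdh", 0.1),
    (1, "Pax7", 3.0), (1, "Myf5", 1.2), (1, "Pdgfra", -1.5),
]
top_two = top_markers(toy_rows, n=2)
print(top_two)
```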
For the mesenchymal clusters (the group of FAPs + DiffFibroblasts + Tenocytes obtained in Figure 2), we used 20 PCs and a resolution of 20 for UMAP visualization. Our mesenchymal subclustering analysis returned 10 clusters (clusters 0–9). Cell populations and lineage-specific marker genes were identified for the analyzed dataset using the FindAllMarkers function with logfc.threshold = 0.25 and max.cells.per.ident = 1000. We then plotted the top eight expressed genes, grouped by orig.ident and seurat_clusters, using the DoHeatmap function. The identity of the returned cell clusters was then annotated based on known marker genes (see details about cell type and cell lineage definitions in the main text, Results section). Individual cell clusters were grouped to better represent cell lineages and types. Finally, figures were generated using the Seurat and ggplot2 R packages. We also used dot plots because they reveal gross differences in expression patterns across cell types and highlight moderately or highly expressed genes.
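The two quantities a dot plot typically encodes, the percentage of cells in a group expressing a gene and the average expression in that group, can be computed as in the sketch below. The function name dot_plot_summary, the input layout, the use of a plain mean of the supplied values, and the toy numbers are assumptions for illustration and do not reproduce any particular plotting function.

```python
def dot_plot_summary(expression, groups):
    """Summarise expression per (group, gene).

    expression: {gene: [value per cell]}; groups: one label per cell.
    Returns {(group, gene): (percent_expressing, mean_value)}.
    """
    summary = {}
    for gene, values in expression.items():
        per_group = {}
        for label, value in zip(groups, values):
            per_group.setdefault(label, []).append(value)
        for label, vals in per_group.items():
            pct = 100.0 * sum(1 for v in vals if v > 0) / len(vals)
            summary[(label, gene)] = (pct, sum(vals) / len(vals))
    return summary


# Invented expression values for four cells in two groups.
toy_expression = {
    "Pdgfra": [2.0, 0.0, 0.0, 0.0],
    "Pax7": [0.0, 0.0, 3.0, 1.0],
}
toy_groups = ["FAPs", "FAPs", "MuSCs", "MuSCs"]
dots = dot_plot_summary(toy_expression, toy_groups)
print(dots[("FAPs", "Pdgfra")])
```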
To validate our initial skeletal muscle single-cell analysis, we explored three publicly available scRNA-seq datasets (McKellar et al., 2021; Yang et al., 2022; Zhang et al., 2022). The Zhang et al. dataset was explored using their R/Shiny app (https://mayoxz.shinyapps.io/Muscle), the McKellar et al. (2021) dataset using their web tool (http://scmuscle.bme.cornell.edu/), and the Yang et al. dataset using their Single Cell Metab Browser (http://scmetab.mit.edu/). All the figures used were downloaded from these websites (Supplementary Figure S6).
The scRNA-seq pipeline used for MuSC subclustering was developed following previous studies (Oprescu et al., 2020; Contreras et al., 2021a). To perform unsupervised MuSC subclustering, we used Seurat's subset and FindClusters functions, followed by dimensionality reduction and UMAP visualization (DimPlot) in Seurat.