All analyses reported in this paper used the Seurat package developed by the Satija lab [13, 20]. In essence, we followed the methods recommended in their tutorial for analyzing a dataset of 2,700 peripheral blood mononuclear cells to identify and display clusters. Briefly, data were log normalized and center scaled; variable markers were identified and used for linear dimensional reduction (principal component analysis, PCA). Informative principal components were identified by plotting their standard deviations using the PCElbowPlot function of the Seurat package, as described in the Satija lab tutorial. These principal components were used for clustering of STAMPs using a smart local moving algorithm [21]; multidimensional data were displayed using a 2-dimensional tSNE representation. A first round of clustering was performed using all STAMPs that contained at least 1,000 transcripts; libraries containing fewer than 200 or more than 8,500 different genes and STAMPs containing more than 0.3% mitochondrial transcripts were excluded from analysis, leaving 6,998 single cell libraries. Clusters of STAMPs from somatosensory neurons were identified by their expression of known marker genes including Scn9a, Tubb3 and Snap25, as well as more specific transcripts like Trpv1, Trpm8 and Piezo2, and by their lack of expression of markers for other cells including Plp1, Mbp and Epcam. Somatosensory neurons were re-clustered using more stringent criteria for inclusion: 500–7,500 genes and no more than 0.2% mitochondrial transcripts; genes expressed in fewer than 6 neurons were also excluded, leaving a dataset of 3,580 neurons and more than 15,000 genes.
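The first-round inclusion criteria described above can be illustrated with a minimal sketch. This is not the Seurat code used for the analysis. It assumes a hypothetical data layout in which each library is a dictionary mapping gene symbols to transcript counts, and it assumes that mitochondrial genes can be recognized by an "mt-" prefix. The function names are ours; only the thresholds come from the text.

```python
# Thresholds from the text; the data layout and function names are assumptions.
MIN_TRANSCRIPTS = 1_000      # at least 1,000 transcripts per STAMP
MIN_GENES = 200              # fewer than 200 genes: excluded
MAX_GENES = 8_500            # more than 8,500 genes: excluded
MAX_MITO_FRACTION = 0.003    # 0.3% mitochondrial transcripts, as a fraction


def library_stats(counts, mito_prefix="mt-"):
    """Return (total transcripts, genes detected, mitochondrial fraction).

    `counts` maps gene symbol -> transcript count for one library.
    The "mt-" prefix for mitochondrial genes is an assumed naming convention.
    """
    total = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    mito_fraction = mito / total if total else 0.0
    return total, n_genes, mito_fraction


def passes_qc(counts,
              min_transcripts=MIN_TRANSCRIPTS,
              min_genes=MIN_GENES,
              max_genes=MAX_GENES,
              max_mito_fraction=MAX_MITO_FRACTION):
    """Apply the three inclusion criteria to a single library."""
    total, n_genes, mito_fraction = library_stats(counts)
    return (total >= min_transcripts
            and min_genes <= n_genes <= max_genes
            and mito_fraction <= max_mito_fraction)


def filter_libraries(libraries, **thresholds):
    """Keep only the libraries (barcode -> counts) that pass all criteria."""
    return {barcode: counts
            for barcode, counts in libraries.items()
            if passes_qc(counts, **thresholds)}
```

The more stringent second round would correspond to calling `filter_libraries` with `min_genes=500`, `max_genes=7_500` and `max_mito_fraction=0.002`.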
We examined the stability of the clustering reported here under a variety of conditions and with different clustering methods, including tSNE-based clustering [10]. Random selection of STAMPs demonstrated that the number of clusters (and their markers) did not change once 1,500 to 2,000 neurons were included in the analysis; thus, increasing sample size incrementally beyond 3,500 neurons would be unlikely to change our conclusions. Similarly, clusters did not change when different criteria were used for selection of variable genes or when the number of principal components used for analysis was varied between 15 and 25. tSNE-based clustering [10] also yielded very similar results. More stringent selection of cells, requiring 900–7,500 different genes to be present in a STAMP, selectively reduced the number of S100b-expressing neurons and resulted in the collapse of this group of three clusters into a single cluster. In contrast, eliminating genes expressed in limited numbers of cells had little effect on clustering. For example, C11 and C12 each consist of fewer than 100 STAMPs; nonetheless, these clusters remained separate when all genes present in fewer than 120 cells were excluded from the analysis. Indeed, very similar clustering was still observed even when genes expressed in fewer than 500 cells were eliminated (which removed genes such as Trpm8), with a concomitant reduction in the number of genes used from more than 15,000 to fewer than 6,000. In this analysis, itch clusters C11 and C12 merged, C7 cells were incorporated into other clusters and the Ntrk2-rich cluster C5 merged with C4. Thus, the clusters that we identified appear extremely stable and are not determined simply by the marker genes expressed in a given class of cells, by the clustering parameters chosen or by the clustering method used.
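Two of the perturbations used in these stability tests, removing genes detected in few cells and drawing a random subset of STAMPs, can be sketched as follows. This is an illustration only, not the code used for the analysis. It assumes the same hypothetical layout as before (barcode mapped to a dictionary of gene counts), and the function names and the fixed random seed are our own choices. Re-clustering of the perturbed data is not shown.

```python
import random


def filter_genes_by_min_cells(libraries, min_cells):
    """Drop genes detected (count > 0) in fewer than `min_cells` libraries."""
    cells_per_gene = {}
    for counts in libraries.values():
        for gene, count in counts.items():
            if count > 0:
                cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1
    kept_genes = {g for g, n in cells_per_gene.items() if n >= min_cells}
    return {barcode: {g: c for g, c in counts.items() if g in kept_genes}
            for barcode, counts in libraries.items()}


def subsample_libraries(libraries, n, seed=0):
    """Draw `n` libraries at random without replacement.

    Barcodes are sorted before sampling so that a given seed always
    returns the same subset.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(libraries), n)
    return {barcode: libraries[barcode] for barcode in chosen}
```

Under these assumptions, `filter_genes_by_min_cells(libraries, 120)` would correspond to the test in which genes present in fewer than 120 cells were excluded, and `subsample_libraries(libraries, 2000)` to one random draw of 2,000 neurons.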