Cell type identification and clustering analysis were performed using the Seurat R package [59, 60]. The cell-by-gene matrix for each sample was imported individually into Seurat version 3.1.1 for downstream analysis [60]. Uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE) were used to visualize cell clusters. Enriched (upregulated) genes were considered significant at a fold change >1.28 and P < 0.01. Differentially expressed genes (DEGs) were considered significant at a fold change >1.50 and P < 0.05.
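The two significance cutoffs can be applied to a marker table as in the minimal Python sketch below. It assumes that fold changes are stored on the natural-log scale, as in the `avg_logFC` column that Seurat v3's `FindMarkers` reports. On that scale the 1.28 cutoff is roughly 0.25 (e^0.25 ≈ 1.284), which is also that version's default `logfc.threshold`. This is an inference about how the threshold was set and is not stated in the text. Two further choices are assumptions: DEGs are treated as two-sided (up or down) while enriched genes are upregulated only, and the `MarkerResult` class, the gene names and the values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MarkerResult:
    gene: str
    avg_logfc: float  # natural-log fold change (assumed, as in Seurat v3 output)
    p_val: float      # P-value; raw vs. adjusted is not specified in the text


def enriched_genes(results, fc=1.28, p=0.01):
    """Upregulated genes: log fold change above ln(fc) and P below p."""
    cutoff = math.log(fc)  # ln(1.28) ~= 0.247, close to 0.25
    return [r.gene for r in results if r.avg_logfc > cutoff and r.p_val < p]


def differential_genes(results, fc=1.50, p=0.05):
    """DEGs in either direction (assumption): |log fold change| above ln(fc) and P below p."""
    cutoff = math.log(fc)  # ln(1.50) ~= 0.405
    return [r.gene for r in results if abs(r.avg_logfc) > cutoff and r.p_val < p]


# Hypothetical example rows.
results = [
    MarkerResult("GENE_A", 0.30, 0.001),
    MarkerResult("GENE_B", 0.30, 0.030),
    MarkerResult("GENE_C", -0.50, 0.020),
    MarkerResult("GENE_D", 0.20, 0.001),
]
print(enriched_genes(results))      # ['GENE_A']
print(differential_genes(results))  # ['GENE_C']
```

GENE_B fails the stricter P cutoff used for enriched genes, and GENE_D's fold change (e^0.20 ≈ 1.22) falls below 1.28.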
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed on these gene sets with the clusterProfiler R package to identify biological functions and pathways significantly associated with the specifically expressed genes [61].
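clusterProfiler's `enrichGO` and `enrichKEGG` use a one-sided hypergeometric over-representation test and then correct for multiple testing, with Benjamini-Hochberg as the default. The stdlib-only Python sketch below shows that core calculation under simplifying assumptions. The term names, the toy ten-gene background universe and the helper function names are all hypothetical. A real analysis would use GO/KEGG annotations and an explicitly chosen gene universe.

```python
from math import comb


def hypergeom_upper_tail(k, universe_size, term_size, query_size):
    """P(X >= k) for X ~ Hypergeometric: k or more term genes among query_size draws."""
    upper = min(term_size, query_size)
    hits = sum(
        comb(term_size, i) * comb(universe_size - term_size, query_size - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(universe_size, query_size)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def over_representation(query, gene_sets, universe):
    """One-sided over-representation test of `query` genes against each annotated term."""
    universe = set(universe)
    query = set(query) & universe
    rows = []
    for term, members in gene_sets.items():
        members = set(members) & universe
        k = len(query & members)
        if k == 0:
            continue  # terms with no overlap are not reported
        p = hypergeom_upper_tail(k, len(universe), len(members), len(query))
        rows.append((term, k, p))
    adj = benjamini_hochberg([p for _, _, p in rows])
    return sorted(
        ((term, k, p, q) for (term, k, p), q in zip(rows, adj)), key=lambda r: r[2]
    )


# Hypothetical universe of 10 genes and two hypothetical terms.
universe = [f"g{i}" for i in range(10)]
gene_sets = {"TERM_A": ["g0", "g1", "g2"], "TERM_B": ["g5", "g6", "g7", "g8"]}
print(over_representation(["g0", "g1", "g2"], gene_sets, universe))
```

Drawing all three TERM_A genes in three draws from ten has probability 1/C(10,3) = 1/120. That probability is the reported P-value for TERM_A.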
For correlation analysis, a score for each gene set was calculated per cell with the PercentageFeatureSet function in the Seurat package, which returns the percentage of a cell's total counts that belong to the features in the set [46]. Pearson correlations were then calculated between these gene-set scores (signatures). Gene-gene correlations were computed directly from the expression matrix, also using Pearson correlation.
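The Python sketch below reproduces, on toy data, the per-cell quantity that PercentageFeatureSet is documented to compute and then correlates two such scores with Pearson's r. It is a reimplementation for illustration and does not call Seurat. The dictionary-of-lists count layout, the gene names and the helper names `percentage_feature_set` and `pearson` are hypothetical.

```python
import math


def percentage_feature_set(counts, features):
    """Per-cell percentage of total counts contributed by `features`.

    counts: dict mapping gene -> list of raw counts, one entry per cell.
    """
    n_cells = len(next(iter(counts.values())))
    scores = []
    for c in range(n_cells):
        total = sum(gene_counts[c] for gene_counts in counts.values())
        in_set = sum(counts[g][c] for g in features if g in counts)
        scores.append(100.0 * in_set / total if total > 0 else 0.0)
    return scores


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical counts for three genes across four cells.
counts = {
    "GENE_1": [10, 20, 30, 40],
    "GENE_2": [5, 10, 15, 20],
    "GENE_3": [85, 70, 55, 40],
}
set_a = percentage_feature_set(counts, ["GENE_1", "GENE_2"])
set_b = percentage_feature_set(counts, ["GENE_3"])
print(pearson(set_a, set_b))                        # signature-level correlation
print(pearson(counts["GENE_1"], counts["GENE_3"]))  # gene-level correlation
```

In this toy example the two sets together account for every count in each cell, so their percentages are perfectly anticorrelated by construction. This compositional effect is worth keeping in mind whenever percentage-based scores are correlated.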
Single-cell trajectory analysis was performed with Monocle 2 (version 2.10.1) on the cell-by-gene expression matrix [62]. Monocle 2 reduced the expression space to two dimensions and ordered the cells along the inferred trajectory [63], which was then visualized in the reduced-dimensional space. Pseudotime analysis was used to further examine germ cell differentiation trajectories and to identify key factors and pathways associated with the novel stages of spermatogenesis.
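The Python sketch below illustrates how pseudotime ordering works once cells sit in a two-dimensional space. Monocle 2 itself learns a principal tree by reversed graph embedding (DDRTree) and measures pseudotime as distance along that tree from a root. This sketch substitutes a simpler technique: it builds a Euclidean minimum spanning tree directly over the cells and uses path length from a root cell as pseudotime, which is closer in spirit to the original Monocle. The coordinates, the choice of root cell and the function name `mst_pseudotime` are hypothetical.

```python
import math


def mst_pseudotime(points, root=0):
    """Pseudotime as path length from `root` along a Euclidean minimum spanning tree."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each cell to the tree
    parent = [-1] * n
    best[root] = 0.0
    neighbors = [[] for _ in range(n)]
    # Prim's algorithm on the complete graph of cells (O(n^2), fine for a sketch).
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            w = math.dist(points[u], points[parent[u]])
            neighbors[u].append((parent[u], w))
            neighbors[parent[u]].append((u, w))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # In a tree the path between two cells is unique, so a traversal gives path lengths.
    pseudotime = [0.0] * n
    stack, seen = [root], {root}
    while stack:
        u = stack.pop()
        for v, w in neighbors[u]:
            if v not in seen:
                seen.add(v)
                pseudotime[v] = pseudotime[u] + w
                stack.append(v)
    return pseudotime


# Hypothetical 2-D coordinates for five cells; cell 0 is assumed to be the root.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (3.0, 1.0)]
pt = mst_pseudotime(points, root=0)
order = sorted(range(len(points)), key=lambda i: pt[i])
print(pt)     # [0.0, 1.0, 2.0, 3.0, 4.0]
print(order)  # [0, 1, 2, 3, 4]
```

The choice of root determines the direction of pseudotime. In a germ cell analysis it would typically be placed among the least differentiated cells, but that is a biological decision and is not encoded here.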