Raw reads were processed with an internal pipeline to generate gene expression profiles. Briefly, after read-one sequences lacking poly-T tails were filtered out, the unique molecular identifier (UMI) was extracted for each cell barcode. Adapters and poly-A tails were trimmed (fastp V1) before read two was aligned to GRCh38 with the Ensembl version 102 annotation [58]. For reads sharing a cell barcode, UMIs were grouped by gene to count the number of UMIs per gene in each cell. The resulting UMI count tables for each cell barcode were used for further analysis. Cells with an unusually high number of UMIs (>37,000) or a high percentage of mitochondrial gene expression (>25%) were filtered out. We also excluded cells with fewer than 990 or more than 4200 detected genes.
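The UMI counting and cell filtering steps described above can be sketched as follows. This is a minimal illustration, not the internal pipeline itself: the function names, the `(barcode, umi, gene)` record layout, and the `MT-` prefix used to recognize mitochondrial genes are assumptions. Only the numeric thresholds come from the text.

```python
from collections import defaultdict

# Thresholds quoted in the text; all names below are illustrative.
MAX_UMIS = 37_000
MAX_MITO_PERCENT = 25.0
MIN_GENES = 990
MAX_GENES = 4200


def count_umis(reads):
    """Collapse (barcode, umi, gene) records into per-cell UMI counts per gene.

    Duplicate reads carrying the same barcode, UMI, and gene count once.
    """
    seen = defaultdict(set)  # (barcode, gene) -> set of distinct UMIs
    for barcode, umi, gene in reads:
        seen[(barcode, gene)].add(umi)
    counts = defaultdict(dict)
    for (barcode, gene), umis in seen.items():
        counts[barcode][gene] = len(umis)
    return dict(counts)


def passes_qc(gene_counts, mito_prefix="MT-"):
    """Return True if one cell's gene -> UMI count mapping passes all filters.

    The "MT-" prefix for mitochondrial genes is an assumption (the usual
    human gene symbol convention), not something stated in the text.
    """
    total = sum(gene_counts.values())
    if total == 0:
        return False
    n_genes = sum(1 for c in gene_counts.values() if c > 0)
    mito = sum(c for g, c in gene_counts.items() if g.startswith(mito_prefix))
    mito_percent = 100.0 * mito / total
    return (
        total <= MAX_UMIS
        and mito_percent <= MAX_MITO_PERCENT
        and MIN_GENES <= n_genes <= MAX_GENES
    )
```

Applying `passes_qc` to every entry returned by `count_umis` would give the set of cells retained for downstream analysis under these assumptions.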
Cell type identification and clustering analysis were performed with Seurat [59, 60]. The cell-by-gene matrix for each sample was imported individually into Seurat version 3.1.1 for downstream analysis [60]. Uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE) were used to visualize cell clusters. Upregulated enriched genes were considered significant at a fold change >1.28 and a P-value <0.01. Differentially expressed genes (DEGs) were considered significant at a fold change >1.50 and a P-value <0.05.
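The two significance cutoffs can be expressed as a simple filter. The sketch below is illustrative only: the function name and the `(gene, fold_change, p_value)` tuple layout are hypothetical, and it assumes fold change is reported on a linear rather than logarithmic scale, which the text does not specify.

```python
def significant_genes(results, min_fold_change=1.50, max_p_value=0.05):
    """Keep genes whose fold change and P-value meet both cutoffs.

    `results` is a list of (gene, fold_change, p_value) tuples. Fold change
    is assumed to be on a linear scale; defaults match the DEG thresholds.
    """
    return [
        gene
        for gene, fold_change, p_value in results
        if fold_change > min_fold_change and p_value < max_p_value
    ]
```

Calling `significant_genes(results)` applies the DEG thresholds, while `significant_genes(results, min_fold_change=1.28, max_p_value=0.01)` applies those stated for upregulated enriched genes.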
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were carried out on the gene set with clusterProfiler to identify biological functions or pathways significantly associated with the specifically expressed genes [61].
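Over-representation analysis of the kind clusterProfiler performs for GO and KEGG terms rests on a hypergeometric test. The sketch below shows that test for a single term. It is not clusterProfiler code: the function and parameter names are hypothetical, and multiple-testing correction is omitted.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: genes in the background universe
    n_annotated:  background genes annotated to the term
    n_selected:   genes in the query gene set
    n_overlap:    query genes annotated to the term
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A small P-value indicates that the term appears in the query set more often than expected from a random draw of the same size from the background.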
For correlation analysis, gene set scores were calculated from the average expression counts of the features belonging to each set, using the PercentageFeatureSet function in the Seurat package [46]. Pearson correlations were then calculated among these gene sets or signatures. Gene-level correlation analysis was performed directly on the data matrix, also using the Pearson correlation method.
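The scoring and correlation steps can be sketched in plain Python. Seurat's PercentageFeatureSet reports, for each cell, the percentage of all counts that fall within a feature set. The re-implementation below follows that definition but is only an approximation for illustration, and the function names are hypothetical.

```python
from math import sqrt


def percentage_feature_set(gene_counts, features):
    """Percentage of one cell's total counts that belong to `features`."""
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0
    in_set = sum(gene_counts.get(gene, 0) for gene in features)
    return 100.0 * in_set / total


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / sqrt(var_x * var_y)
```

Scoring every cell for two gene sets with `percentage_feature_set` and passing the two resulting score lists to `pearson` gives the correlation between those signatures across cells.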
Monocle 2 (version 2.10.1) was used to perform single-cell trajectory analysis on the cell-by-gene expression matrix [62]. Monocle 2 reduced the data to a two-dimensional space and ordered the cells [63]. Once the cells were ordered, the trajectory was visualized in the reduced-dimensional space. Pseudotime trajectory analysis was then used to further examine germ cell differentiation trajectories and to identify key factors or pathways required at the different novel stages of spermatogenesis.