Leveraging Pre-mRNA Annotations for Improved Single-Nucleus RNA-Seq Analysis

Starting from BCL files obtained from Illumina sequencing, we ran cellranger mkfastq to extract sequence reads in FASTQ format, followed by cellranger count to generate gene-count matrices from the FASTQ files. Since our data are from single nuclei, we built genome references with pre-mRNA annotations, which account for both exons and introns, and aligned reads to them. Pre-mRNA annotations substantially increase the number of detected genes compared to a reference with only exon annotations¹⁵. For human and mouse data, we used the GRCh38 and mm10 genome references, respectively.

To compare samples of interest (e.g., different loading concentrations), we pooled their gene-count matrices and filtered out low-quality nuclei that met any one of the following criteria: (1) a total number of expressed genes < 200; (2) a total number of expressed genes ≥ 6000; or (3) a percentage of RNA UMIs from mitochondrial genes ≥ 10%. We then normalized the filtered count matrix and transformed it to natural log space as follows: (1) selected genes that were expressed in at least 0.05% of all remaining nuclei; (2) normalized the count vector of each nucleus such that the sum of normalized counts from the selected genes equals 100,000 (transcripts per 100 K, TP100K); (3) transformed the normalized matrix into natural log space by replacing each normalized count c with log(c + 1), i.e., log(TP100K + 1).

We performed dimensionality reduction, clustering and visualization on the log-transformed matrix using a standard procedure¹⁶,¹⁷. Specifically, we selected highly variable genes¹⁸ with a z-score cutoff of 0.5, performed PCA on the standardized sub-matrix consisting of only highly variable genes, selected the top 50 principal components (PCs)¹⁹, and clustered the data based on these 50 PCs using the Louvain community detection algorithm²⁰ with a resolution of 1.3. We identified cluster-specific gene expression by differential expression analyses between nuclei within the cluster and nuclei outside of the cluster¹⁶ using Welch’s t test and Fisher’s exact test, controlled the false discovery rate (FDR) at 5% using the Benjamini–Hochberg procedure²¹, and annotated putative cell types based on legacy signatures of human and mouse brain cells. We visualized the reduced-dimensionality data using tSNE²² with a perplexity of 30.

Note that in experiments 1 and 4 (Supplementary Data 1), we identified one cluster that did not express any known cell-type markers and had the lowest median number of RNA UMIs among all clusters. We removed this cluster from further analysis and repeated the above analysis workflow, except for the low-quality nucleus filtration step.
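The filtering and normalization steps described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `filter_and_normalize`, its parameter names, and the use of the gene-name prefixes "MT-" (human) and "mt-" (mouse) to identify mitochondrial genes are all hypothetical choices that the text does not specify. The thresholds default to the values given in the text.

```python
import numpy as np

def filter_and_normalize(counts, gene_names, min_genes=200, max_genes=6000,
                         max_mito_frac=0.10, min_nuclei_frac=0.0005,
                         target_sum=1e5, mito_prefixes=("MT-", "mt-")):
    """Filter low-quality nuclei, then compute log(TP100K + 1).

    counts: nuclei x genes array of UMI counts.
    mito_prefixes is an assumption; the text does not say how
    mitochondrial genes were identified.
    """
    counts = np.asarray(counts, dtype=float)
    gene_names = np.asarray(gene_names)

    # Per-nucleus QC metrics.
    n_genes = (counts > 0).sum(axis=1)
    total_umis = counts.sum(axis=1)
    is_mito = np.array([str(g).startswith(mito_prefixes) for g in gene_names])
    mito_frac = np.divide(counts[:, is_mito].sum(axis=1), total_umis,
                          out=np.zeros_like(total_umis), where=total_umis > 0)

    # A nucleus is dropped if it has < min_genes genes, >= max_genes genes,
    # or >= max_mito_frac of its UMIs from mitochondrial genes.
    keep_nuclei = ((n_genes >= min_genes) & (n_genes < max_genes)
                   & (mito_frac < max_mito_frac))
    kept = counts[keep_nuclei]

    # Keep genes expressed in at least min_nuclei_frac of remaining nuclei.
    keep_genes = (kept > 0).mean(axis=0) >= min_nuclei_frac
    kept = kept[:, keep_genes]

    # Scale each nucleus so the selected genes sum to target_sum (TP100K).
    sums = kept.sum(axis=1, keepdims=True)
    tp100k = np.divide(kept * target_sum, sums,
                       out=np.zeros_like(kept), where=sums > 0)

    # Natural log transform: log(c + 1).
    return np.log1p(tp100k), keep_nuclei, keep_genes
```

The function returns the boolean masks alongside the transformed matrix so that nucleus barcodes and gene names can be subset consistently by the caller.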


Gaublomme J.T., Li B., McCabe C., Knecht A., Yang Y., Drokhlyansky E., Van Wittenberghe N., Waldman J., Dionne D., Nguyen L., De Jager P.L., Yeung B., Zhao X., Habib N., Rozenblatt-Rosen O., & Regev A. (2019). Nuclei multiplexing with barcoded antibodies for single-nucleus genomics. Nature Communications, 10, 2907.

Publication 2019


Corresponding organization: Massachusetts Institute of Technology

Other organizations: Massachusetts General Hospital, Harvard University, Columbia University Irving Medical Center, BioLegend (United States), Hebrew University of Jerusalem


Protocol cited in 9 other protocols

Variable analysis

independent variables

Loading concentration of single nuclei samples

dependent variables

Number of detected genes
Cell-type composition

control variables

Genome references used for read alignment (GRCh38 for human, mm10 for mouse)
Pre-mRNA annotations used for read alignment
Quality control filters for low-quality nuclei (total expressed genes < 200, total expressed genes >= 6000, mitochondrial gene percentage >= 10%)
Normalization method (transcripts per 100K, log-transformation)
Dimensionality reduction and clustering methods (highly variable gene selection, PCA, Louvain clustering)
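
As a rough illustration of the dimensionality-reduction control variables listed above, the sketch below selects highly variable genes by a z-score cutoff and runs PCA on the standardized sub-matrix. It rests on several assumptions: the z-score is computed on a simple variance-to-mean dispersion across all expressed genes, without the mean-expression binning that published procedures often use; PCA is done by a plain SVD; and the function name `select_hvg_and_pca` and its parameters are hypothetical. Louvain clustering is not shown.

```python
import numpy as np

def select_hvg_and_pca(log_matrix, z_cutoff=0.5, n_pcs=50):
    """Pick highly variable genes, standardize them, and return PC scores.

    log_matrix: nuclei x genes array of log-transformed expression.
    Simplification (assumption): dispersion z-scores are computed over
    all expressed genes at once rather than within expression bins.
    """
    X = np.asarray(log_matrix, dtype=float)
    mean = X.mean(axis=0)
    var = X.var(axis=0, ddof=1)

    # Dispersion = variance / mean, defined only for expressed genes.
    expressed = mean > 0
    disp = var[expressed] / mean[expressed]

    # z-score of dispersion; genes above the cutoff are highly variable.
    z = np.full(mean.shape, -np.inf)
    sd = disp.std(ddof=1) if disp.size > 1 else 0.0
    if sd > 0:
        z[expressed] = (disp - disp.mean()) / sd
    hvg = z > z_cutoff
    if not hvg.any():
        raise ValueError("no genes passed the z-score cutoff")

    # Standardize each selected gene to zero mean and unit variance.
    sub = X[:, hvg]
    std = sub.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    sub = (sub - sub.mean(axis=0)) / std

    # PCA via SVD: PC scores are U * S, ordered by explained variance.
    U, S, _ = np.linalg.svd(sub, full_matrices=False)
    k = min(n_pcs, S.size)
    return U[:, :k] * S[:k], hvg
```

The returned PC scores could then be passed to a graph-based clustering library for the Louvain step; which library to use is left open here.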
