Generative Model for Single-Cell RNA-Seq

Census is motivated by a generative model of single-cell RNA-Seq (scRNA-Seq) similar to the one developed by Kim et al.⁴⁷. In scRNA-Seq, each cell is lysed to recover its endogenous RNA molecules, some fraction of which may be degraded or lost; lysis is therefore characterized by an RNA recovery rate α. Spike-in transcripts are then added to the cell lysate. Because the spike-ins are added as naked RNA, they may be degraded at a different rate from the endogenous RNA, so we denote the recovery rate of the spike-in ladder separately as β. The RNA counts in the lysate can be written:

\text{Cell lysate}: \left\{ \begin{matrix} Y_{ij}^{l} & \approx & α_{i} Y_{ij}^{c} \\ S_{ij}^{l} & \approx & β_{i} S_{. j} \end{matrix} \right. ,

where Y^l, S^l and S are, respectively, the endogenous transcript counts in the cell lysate, the spike-in transcript counts in the cell lysate, and the spike-in transcript counts added to the cell lysate. In all variables (here and below), the first subscript indexes the cell and the second indexes the gene. Note that we cannot directly observe Y_{ij}^{c}, the true transcript count for gene j in cell i, and thus α is unknown.
The RNA molecules and spike-in transcripts are then reverse transcribed and amplified to make a cDNA library. The expected number of cDNA molecules generated from each RNA molecule is denoted by θ. The cDNA counts can be written:

\text{cDNA}: \left\{ \begin{matrix} Y_{ij}^{d} & = & Y_{ij}^{l} \cdot θ_{i} \\ S_{ij}^{d} & = & S_{ij}^{l} \cdot θ_{i} \end{matrix} \right. ,

where Y^d and S^d are the endogenous and spike-in cDNA counts, respectively, converted from the corresponding lysate transcript counts Y^l and S^l under a uniform capture rate θ, which for current protocols is less than 1.
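As a minimal sketch of the lysis and reverse-transcription stages described above, the expected cDNA counts for one cell can be obtained by chaining the recovery and capture rates. The function name and all numeric values are hypothetical, chosen only for illustration; they are not taken from the paper, and in practice Y^c and α are not observable.

```python
def expected_cdna_counts(y_cell, s_added, alpha, beta, theta):
    """Expected cDNA counts for a single cell under the model above.

    y_cell  : true endogenous transcript counts Y^c, one entry per gene
    s_added : spike-in transcript counts S added to the lysate
    alpha   : endogenous RNA recovery rate at lysis
    beta    : spike-in (ladder) recovery rate
    theta   : expected cDNA molecules generated per RNA molecule
    """
    # Lysate stage: Y^l ~ alpha * Y^c and S^l ~ beta * S
    y_lysate = [alpha * y for y in y_cell]
    s_lysate = [beta * s for s in s_added]
    # cDNA stage: the same capture rate theta applies to both RNA pools
    y_cdna = [theta * y for y in y_lysate]
    s_cdna = [theta * s for s in s_lysate]
    return y_cdna, s_cdna


# Illustrative values only (assumptions, not measured rates)
y_cdna, s_cdna = expected_cdna_counts(
    y_cell=[100, 20, 0], s_added=[40, 8], alpha=0.5, beta=0.25, theta=0.5
)
print(y_cdna, s_cdna)
```

Because α and β differ, the endogenous and spike-in pools shrink by different factors before the shared capture step, which is why the spike-in ladder alone does not reveal α.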
Our model generates sequencing reads from the cDNA. The relative cDNA abundances are calculated as \frac{Y_{ij}^{d}}{\sum_{j = 1}^{n} (Y_{ij}^{d} + S_{ij}^{d})} for endogenous RNA, or \frac{S_{ij}^{d}}{\sum_{j = 1}^{n} (Y_{ij}^{d} + S_{ij}^{d})} for spike-in RNA.
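The relative abundances share a single denominator: the total cDNA count of the cell, endogenous and spike-in combined. A short sketch of that normalization follows; the function name and input values are hypothetical.

```python
def relative_abundances(y_cdna, s_cdna):
    """Divide each cDNA count by the cell's total cDNA (endogenous + spike-in)."""
    total = sum(y_cdna) + sum(s_cdna)
    if total <= 0:
        raise ValueError("total cDNA count must be positive")
    y_rel = [y / total for y in y_cdna]
    s_rel = [s / total for s in s_cdna]
    return y_rel, s_rel


# Illustrative cDNA counts for one cell (assumed values)
y_rel, s_rel = relative_abundances([25.0, 5.0, 0.0], [5.0, 1.0])
print(y_rel, s_rel)
```

Taken together, the endogenous and spike-in fractions sum to 1, so they define a single probability distribution over all cDNA species in the cell.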
The model then generates γ reads per cDNA molecule on average. With sufficient sequencing depth, γ is larger than 1, so we expect each cDNA molecule to generate at least one sequencing read. This process can be regarded as multinomial sampling of R_{i} = γ \sum_{j = 1}^{n} (Y_{ij}^{d} + S_{ij}^{d}) reads from the distribution of relative cDNA abundances given above, which can be represented as:

\text{Read counts}: \left\{ \begin{matrix} Y_{ij}^{r} \sim \mathrm{multinomial} \left( \frac{Y_{ij}^{d}}{\sum_{j = 1}^{n} (Y_{ij}^{d} + S_{ij}^{d})}, R_{i}^{e} \right) \\ S_{. j}^{r} \sim \mathrm{multinomial} \left( \frac{S_{ij}^{d}}{\sum_{j = 1}^{n} (Y_{ij}^{d} + S_{ij}^{d})}, R_{i}^{s} \right) \end{matrix} \right. ,

where R_{i}^{e} and R_{i}^{s} denote the total reads sampled from endogenous and spike-in cDNA, respectively, in cell i, and Y_{ij}^{r} and S_{. j}^{r} denote the reads sampled from the cDNA of endogenous RNA j or spike-in RNA j in cell i.
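The read-sampling step can be simulated with a few lines of standard-library Python. This is a sketch under stated assumptions, not the Census implementation: it draws all R_i reads in one multinomial over the pooled endogenous and spike-in abundances, so the totals R_i^e and R_i^s emerge as the sums of the two returned lists. The function name, the seed, and the input values are hypothetical.

```python
import random
from collections import Counter


def sample_read_counts(y_cdna, s_cdna, gamma, seed=0):
    """Draw read counts for one cell by multinomial sampling.

    y_cdna, s_cdna : endogenous and spike-in cDNA counts for the cell
    gamma          : average number of reads generated per cDNA molecule
    seed           : seed for the random generator (assumption, for repeatability)
    """
    weights = list(y_cdna) + list(s_cdna)
    # R_i = gamma * total cDNA, rounded to a whole number of reads
    n_reads = round(gamma * sum(weights))
    rng = random.Random(seed)
    # Each read picks a cDNA species with probability equal to its relative
    # abundance; choices() normalizes the weights internally.
    draws = rng.choices(range(len(weights)), weights=weights, k=n_reads)
    counts = Counter(draws)
    n_genes = len(y_cdna)
    y_reads = [counts[j] for j in range(n_genes)]
    s_reads = [counts[n_genes + j] for j in range(len(s_cdna))]
    return y_reads, s_reads


# Illustrative inputs only (assumed values)
y_reads, s_reads = sample_read_counts([25.0, 5.0, 0.0], [5.0, 1.0], gamma=2.0, seed=1)
print(y_reads, s_reads, sum(y_reads), sum(s_reads))
```

A gene with zero cDNA can never receive a read, while low-abundance species may receive none by chance.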
The model described here is essentially a special case of the model in Kim et al., and differs mainly in that their model describes transcript-level capture rates and sequencing rates with beta and gamma distributions, respectively. In contrast, we simply use global constants for these rates. As Census does not make use of variance estimates from the generative model, this simpler model is sufficient for calculating key statistics (e.g. mode of the transcript counts) needed to convert relative to absolute abundances.


Qiu X., Hill A., Packer J., Lin D., Ma Y.A., & Trapnell C. (2017). Single-cell mRNA quantification and differential analysis with Census. Nature Methods, 14(3), 309-315.

Publication 2017


Other organizations : University of Washington, University of Washington Applied Physics Laboratory


Variable analysis

independent variables

Lysis recovery rate (α)
Spike-in recovery rate (β)
cDNA capture rate (θ)
Reads per cDNA molecule (γ)

dependent variables

Transcript counts of endogenous RNA in cell lysate (Y^l)
Spike-in transcript counts in cell lysate (S^l)
cDNA counts of endogenous RNA (Y^d)
cDNA counts of spike-in transcripts (S^d)
Read counts for endogenous RNA (Y^r)
Read counts for spike-in RNA (S^r)

control variables

Uniform capture rate (θ) for cDNA

positive controls

Spike-in transcripts added into the cell lysate

negative controls

Not specified
