Differential Gene Expression Analysis

We start with summarized measures of gene expression for the experiment, represented by a matrix of read or fragment counts. The rows of the matrix represent genes (g = 1, \dots, G), and the columns represent samples (i = 1, \dots, m). Let Y_gi denote the count of RNA-seq fragments assigned to gene g in sample i. We assume that Y_gi follows a negative binomial (NB) distribution with mean μ_gi and dispersion α_g, such that Var(Y_{gi}) = μ_{gi} + α_{g} μ_{gi}^{2}. The mean μ_gi is the product of a scaling factor s_gi and a quantity q_gi that is proportional to the expression level of gene g. We follow the methods of Love et al. (2014) to estimate α_g and s_gi, sharing information across the G genes, and treat these estimates as fixed in what follows. We fit a generalized linear model (GLM) to the count Y_gi for gene g and sample i,

\begin{array}{l} Y_{gi} \sim \mathrm{NB}(μ_{gi}, α_{g}) \\ μ_{gi} = s_{gi} q_{gi} \\ \log q_{gi} = X_{i,*} β_{g} \end{array}

where X is the standard design matrix and β_g is the vector of regression coefficients specific to gene g. Usually X has one intercept column plus columns for covariates, e.g. indicators of the experimental conditions other than the reference condition, continuous covariates, or interaction terms. We consider design matrices where the first element of β_g is the intercept. For clarity, we partition β_g into β_{g} = (β_{g0}, β_{g1}, \dots, β_{gK}), where β_{g0} is the intercept and β_gk, k = 1, \dots, K, is the coefficient for the kth covariate. The scaling factor s_gi accounts for differences between samples in library size, gene length (Soneson et al., 2015), or sample-specific experimental biases (Patro et al., 2017), and is used as an offset in our model.
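
The model above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the apeglm or DESeq2 implementation: the function names (nb_logpmf, nb_moments, glm_means, neg_log_lik) and all numeric values are hypothetical, the dispersion and scaling factors are taken as given, and no prior or shrinkage is applied. It writes the NB distribution in the mean/dispersion form, checks numerically that Var(Y) = μ + α μ^2, and evaluates the GLM mean μ_gi = s_gi exp(X_{i,*} β_g) and the negative log-likelihood for one gene.

```python
import math


def nb_logpmf(y, mu, alpha):
    """Log-probability of count y under an NB with mean mu and dispersion alpha.

    Uses size r = 1/alpha, so that Var(Y) = mu + alpha * mu**2.
    """
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))


def nb_moments(mu, alpha, y_max=2000):
    """Mean and variance obtained by summing the pmf over 0..y_max."""
    probs = [math.exp(nb_logpmf(y, mu, alpha)) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


def glm_means(X, beta, s):
    """mu_i = s_i * exp(X[i] . beta) for each sample i (log link, offset log s_i)."""
    return [s_i * math.exp(sum(x * b for x, b in zip(row, beta)))
            for row, s_i in zip(X, s)]


def neg_log_lik(y, X, beta, s, alpha):
    """Negative NB log-likelihood of one gene's counts, dispersion held fixed."""
    mus = glm_means(X, beta, s)
    return -sum(nb_logpmf(y_i, mu_i, alpha) for y_i, mu_i in zip(y, mus))


# Numerical check of the variance relation (illustrative values).
mean, var = nb_moments(mu=20.0, alpha=0.5)
print(mean, var, 20.0 + 0.5 * 20.0 ** 2)

# Hypothetical two-group design: intercept column plus a condition indicator.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
s = [1.0, 1.2, 0.8, 1.0]               # assumed scaling factors s_gi
beta = [math.log(100.0), math.log(2.0)]  # natural-log scale coefficients
y = [100, 120, 160, 200]               # made-up counts for one gene

print(glm_means(X, beta, s))
print(neg_log_lik(y, X, beta, s, alpha=0.1))
```

In this sketch the second coefficient is a log fold change on the natural log scale: a value of log 2 doubles q_gi in the non-reference condition, while the scaling factors enter only through the offset.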
In the GLM, we use the logarithmic link function. In the apeglm software, the estimated coefficients and the corresponding SD estimates are both reported on the natural log scale. The apeglm method can be called from DESeq2's lfcShrink function, which provides log fold change (LFC) estimates on the log_2 scale. The apeglm method and software are generic for GLMs and can be used with other likelihoods. For example, they can be used with the Beta Binomial or zero-inflated NB model, as long as estimates for the additional parameters, e.g. the dispersion or the zero component parameters, are provided. An example of apeglm applied to Beta Binomial counts, as could be used to detect differential allele-specific expression, is provided in the software package vignette.
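
The two scales mentioned above differ only by a constant factor, since log_2(x) = ln(x) / ln(2). The short Python sketch below illustrates that relationship; the function name natural_log_to_log2 and the input values are hypothetical, and this is not code taken from apeglm or DESeq2.

```python
import math

LN2 = math.log(2.0)


def natural_log_to_log2(estimate, sd):
    """Rescale a coefficient and its SD from the natural-log to the log2 scale.

    Both quantities are divided by ln(2), because the change of scale is linear.
    """
    return estimate / LN2, sd / LN2


# A natural-log coefficient of ln(2) corresponds to a doubling, i.e. an LFC of 1.
lfc, lfc_sd = natural_log_to_log2(math.log(2.0), 0.1)
print(lfc, lfc_sd)
```

Because the SD is rescaled by the same factor as the estimate, ratios such as estimate / SD are unchanged by the conversion.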

Zhu A., Ibrahim J.G., & Love M.I. (2018). Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics, 35(12), 2084-2092.

Publication 2018

Protocol cited in 374 other protocols

Variable analysis

independent variables

Indicators of the experimental conditions other than the reference condition
Continuous covariates
Interaction terms

dependent variables

Count of RNA-seq fragments assigned to gene g in sample i (Y_gi)

control variables

Gene length (Soneson et al., 2015)
Sample-specific experimental biases (Patro et al., 2017)
