We start with summarized measures of gene expression for the experiment, represented by a matrix of read or fragment counts. The rows of the matrix represent genes, $g = 1, \dots, G$, and the columns represent samples, indexed by $i$. Let $Y_{gi}$ denote the count of RNA-seq fragments assigned to gene $g$ in sample $i$. We assume that $Y_{gi}$ follows a negative binomial (NB) distribution with mean $\mu_{gi}$ and dispersion $\alpha_g$, such that $\operatorname{Var}(Y_{gi}) = \mu_{gi} + \alpha_g \mu_{gi}^2$. The mean $\mu_{gi}$ is the product of a scaling factor $s_{gi}$ and a quantity $q_{gi}$ that is proportional to the expression level of gene $g$. We follow the methods of Love et al. (2014) to estimate $\alpha_g$ and $s_{gi}$, sharing information across the $G$ genes, and treat these estimates as fixed in what follows. We fit a GLM to the count $Y_{gi}$ for gene $g$ and sample $i$,

$$
\begin{aligned}
Y_{gi} &\sim \mathrm{NB}(\mu_{gi}, \alpha_g), \\
\mu_{gi} &= s_{gi}\, q_{gi}, \\
\log q_{gi} &= X_i \beta_g,
\end{aligned}
$$

where $X$ is the standard design matrix, $X_i$ is its $i$th row, and $\beta_g$ is the vector of regression coefficients specific to gene $g$. Usually $X$ has one intercept column and columns for covariates, e.g. indicators of the experimental conditions other than the reference condition, continuous covariates, or interaction terms. We consider design matrices where the first element of $\beta_g$ is the intercept. For clarity, we partition $\beta_g$ into $(\beta_{g0}, \beta_{g1}, \dots, \beta_{gK})$, where $\beta_{g0}$ is the intercept and $\beta_{gk}$, $k = 1, \dots, K$, is the coefficient for the $k$th covariate. The scaling factor $s_{gi}$ accounts for differences between samples in library size, gene length (Soneson et al., 2015) or sample-specific experimental biases (Patro et al., 2017), and is used as an offset in our model.
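To make the model concrete, the following Python/NumPy sketch simulates counts for one gene and fits $\beta_g$ by maximum likelihood with $\alpha_g$ and $s_{gi}$ held fixed and $\log s_{gi}$ entering as an offset. It is only an illustration under stated assumptions: it computes the unshrunken MLE (no apeglm prior), it uses Fisher scoring (IRLS), a standard GLM algorithm chosen here for illustration rather than taken from the apeglm software, and the function name `nb_glm_fit`, the simulated design, size factors and parameter values are all hypothetical.

```python
import numpy as np


def nb_glm_fit(y, X, size_factors, alpha, n_iter=100, tol=1e-10):
    """Hypothetical helper: Fisher scoring (IRLS) for one gene's NB GLM.

    Model: Y_i ~ NB(mu_i, alpha), Var(Y_i) = mu_i + alpha * mu_i**2,
           log(mu_i) = log(s_i) + X_i @ beta   (log s_i is a fixed offset).
    Assumes the first column of X is the intercept.
    Returns the unshrunken maximum likelihood estimate of beta (natural-log scale).
    """
    y = np.asarray(y, dtype=float)
    offset = np.log(size_factors)
    # Start from a rough intercept-only value on the log scale.
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(np.mean(y / size_factors) + 0.1)
    for _ in range(n_iter):
        mu = np.exp(offset + X @ beta)
        # Log link: d(mu)/d(eta) = mu, so the IRLS weight mu^2 / Var(Y)
        # simplifies to mu / (1 + alpha * mu).
        w = mu / (1.0 + alpha * mu)
        # Working response on the linear-predictor scale, offset excluded.
        z = X @ beta + (y - mu) / mu
        XtW = X.T * w  # equals X^T diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Simulated example (all values hypothetical).
rng = np.random.default_rng(1)
m = 400
condition = np.repeat([0.0, 1.0], m // 2)
X = np.column_stack([np.ones(m), condition])        # intercept + condition indicator
beta_true = np.array([np.log(50.0), np.log(2.0)])   # natural-log scale
s = rng.uniform(0.5, 2.0, size=m)                   # stand-in size factors
alpha = 0.1
mu = s * np.exp(X @ beta_true)
# NumPy's NB(n, p) with n = 1/alpha and p = n / (n + mu) has mean mu
# and variance mu + alpha * mu**2, matching the parametrization above.
n = 1.0 / alpha
y = rng.negative_binomial(n, n / (n + mu))

beta_hat = nb_glm_fit(y, X, s, alpha)
print(beta_hat)                  # natural-log scale
print(beta_hat[1] / np.log(2))   # condition effect converted to log2 scale
```

Treating $\log s_{gi}$ as an offset, rather than as a column of $X$, means it shifts the linear predictor with a coefficient fixed at 1, which is how the scaling factor is intended to enter the mean.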
In the GLM, we use the logarithmic link function. In the apeglm software, the estimated coefficients and corresponding SD estimates are reported on the natural log scale, the scale of the link function. The apeglm method can be easily called from DESeq2's lfcShrink function, which provides LFC estimates on the $\log_2$ scale. The apeglm method and software are generic for GLMs and can be used with other likelihoods. For example, they can be used for the Beta Binomial or zero-inflated NB model, as long as estimates for the additional parameters, e.g. dispersion or the zero-component parameters, are provided. An example of apeglm applied to Beta Binomial counts, as could be used to detect differential allele-specific expression, is provided in the software package vignette.
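Two points in this paragraph can be sketched numerically: moving an estimate and its SD from the natural log scale to the $\log_2$ scale (both are divided by $\ln 2$, since $\log_2 x = \ln x / \ln 2$), and swapping in a different likelihood. The Python sketch below is not the apeglm (R) interface. The helper names are hypothetical, and the Beta Binomial parametrization is an assumption made for illustration: a logit link for the allelic proportion $p_i$, Beta parameters $a = p\theta$, $b = (1 - p)\theta$, and an overdispersion $\theta$ supplied as a fixed, previously estimated value, in the spirit of providing estimates for the additional parameters.

```python
import math

import numpy as np

LN2 = math.log(2.0)
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def to_log2_scale(coef, sd):
    """Hypothetical helper: convert an estimate and its SD from ln to log2 scale."""
    return coef / LN2, sd / LN2


def _log_beta(a, b):
    # log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
    return _lgamma(a) + _lgamma(b) - _lgamma(a + b)


def betabinom_loglik(y, total, X, beta, theta):
    """Hypothetical Beta Binomial log-likelihood with a logit link.

    y: alternative-allele counts, total: total counts per sample,
    p_i = expit(X_i @ beta); theta is a fixed (pre-estimated) overdispersion.
    """
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    a, b = p * theta, (1.0 - p) * theta
    log_choose = _lgamma(total + 1) - _lgamma(y + 1) - _lgamma(total - y + 1)
    terms = log_choose + _log_beta(y + a, total - y + b) - _log_beta(a, b)
    return float(np.sum(terms))


# Usage: one sample with 20 total reads and an intercept-only design.
X = np.array([[1.0]])
beta = np.array([0.3])
total = np.array([20])
pmf = [math.exp(betabinom_loglik(np.array([k]), total, X, beta, theta=5.0))
       for k in range(21)]
print(sum(pmf))  # sums to 1 up to floating-point error
```

Only the log-likelihood function changes between the NB and Beta Binomial cases; the linear predictor $X_i \beta$ and the coefficient estimates keep the same meaning on their respective link scales.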