We start with summarized measures of gene expression for the experiment, represented by a matrix of read or fragment counts. The rows of the matrix represent genes, $g = 1, \ldots, G$, and the columns represent samples, $i = 1, \ldots, m$. Let $Y_{gi}$ denote the count of RNA-seq fragments assigned to gene $g$ in sample $i$. We assume that $Y_{gi}$ follows a negative binomial (NB) distribution with mean $\mu_{gi}$ and dispersion $\alpha_g$, such that $\mathrm{Var}(Y_{gi}) = \mu_{gi} + \alpha_g \mu_{gi}^2$. The mean $\mu_{gi}$ is the product of a scaling factor $s_{gi}$ and a quantity $q_{gi}$ that is proportional to the expression level of gene $g$. We follow the methods of Love et al. (2014) to estimate $\alpha_g$ and $s_{gi}$, sharing information across the $G$ genes, and treat these estimates as fixed in what follows. We fit a GLM to the count $Y_{gi}$ for gene $g$ and sample $i$:
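The mean/dispersion parameterization can be checked numerically. This is a minimal sketch assuming SciPy's `scipy.stats.nbinom`, which parameterizes the NB distribution by `(n, p)` rather than by mean and dispersion; setting $n = 1/\alpha$ and $p = 1/(1 + \alpha\mu)$ yields mean $\mu$ and variance $\mu + \alpha\mu^2$. The helper name `nb_from_mean_dispersion` is hypothetical.

```python
from scipy.stats import nbinom


def nb_from_mean_dispersion(mu, alpha):
    """Return a frozen SciPy NB distribution with mean mu and dispersion alpha.

    SciPy uses (n, p); with n = 1/alpha and p = 1/(1 + alpha*mu),
    the mean is mu and the variance is mu + alpha * mu**2.
    """
    n = 1.0 / alpha
    p = 1.0 / (1.0 + alpha * mu)
    return nbinom(n, p)


mu, alpha = 100.0, 0.1
dist = nb_from_mean_dispersion(mu, alpha)
print(dist.mean())  # approximately 100.0
print(dist.var())   # approximately 1100.0 = 100 + 0.1 * 100**2
```

As $\alpha \to 0$ the quadratic term vanishes and the variance approaches the Poisson variance $\mu$.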
$$
Y_{gi} \sim \mathrm{NB}(\mu_{gi}, \alpha_g), \qquad \mu_{gi} = s_{gi}\, q_{gi}, \qquad \log q_{gi} = X_{i,*}\, \beta_g,
$$
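For a single gene, the model above can be fitted by maximum likelihood with the dispersion held fixed and $\log s_{gi}$ entering as an offset. The sketch below assumes NumPy and SciPy, uses simulated data with hypothetical scaling factors and coefficients, and minimizes the NB negative log-likelihood with `scipy.optimize.minimize`. It is plain maximum likelihood for the GLM, not apeglm's shrinkage estimator, and the helper names `nb_negloglik` and `nb_negloglik_grad` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def nb_negloglik(beta, y, X, log_s, alpha):
    """Negative NB log-likelihood for one gene with fixed dispersion alpha.

    log mu_i = log s_i + X[i, :] @ beta, i.e. mu_i = s_i * q_i, log q_i = X[i, :] @ beta.
    """
    log_mu = log_s + X @ beta
    mu = np.exp(log_mu)
    r = 1.0 / alpha  # NB size parameter, so Var(Y) = mu + alpha * mu**2
    log_r_mu = np.log(r + mu)
    ll = (gammaln(y + r) - gammaln(r) - gammaln(y + 1)
          + r * (np.log(r) - log_r_mu)
          + y * (log_mu - log_r_mu))
    return -np.sum(ll)


def nb_negloglik_grad(beta, y, X, log_s, alpha):
    """Gradient of nb_negloglik with respect to beta."""
    mu = np.exp(log_s + X @ beta)
    r = 1.0 / alpha
    return -X.T @ (r * (y - mu) / (r + mu))


# Simulated data for one gene: two conditions, 20 samples each.
rng = np.random.default_rng(1)
m = 40
condition = np.repeat([0.0, 1.0], m // 2)
X = np.column_stack([np.ones(m), condition])  # intercept + condition indicator
s = rng.uniform(0.5, 2.0, size=m)             # hypothetical scaling factors s_gi
alpha = 0.05                                  # dispersion, treated as known
beta_true = np.array([np.log(50.0), np.log(4.0)])
mu_true = s * np.exp(X @ beta_true)
r = 1.0 / alpha
y = rng.negative_binomial(r, r / (r + mu_true)).astype(float)

# Start from an intercept-only guess and maximize the likelihood.
beta_init = np.array([np.log(np.mean(y / s)), 0.0])
res = minimize(nb_negloglik, beta_init, args=(y, X, np.log(s), alpha),
               jac=nb_negloglik_grad, method="BFGS")
beta_hat = res.x  # natural-log scale: (intercept, log fold change)
print(beta_hat)
```

With fixed $\alpha_g$ and a log link, the NB log-likelihood is concave in $\beta_g$, so a quasi-Newton method started anywhere reasonable should reach the maximum.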
where $X$ is the standard design matrix, $X_{i,*}$ is its $i$th row, and $\beta_g$ is the vector of regression coefficients specific to gene $g$. Usually $X$ has one intercept column and columns for covariates, e.g. indicators of the experimental conditions other than the reference condition, continuous covariates, or interaction terms. We consider design matrices where the first element of $\beta_g$ is the intercept. For clarity, we partition $\beta_g$ as $\beta_g = (\beta_{g0}, \beta_{g1}, \ldots, \beta_{gK})$, where $\beta_{g0}$ is the intercept and $\beta_{gk}$, $k = 1, \ldots, K$, is the coefficient for the $k$th covariate. The scaling factor $s_{gi}$ accounts for differences between samples in library size, gene length (Soneson et al., 2015) or sample-specific experimental biases (Patro et al., 2017), and enters the model as an offset, that is, $\log \mu_{gi} = \log s_{gi} + X_{i,*}\beta_g$.
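To make the column conventions concrete, here is a minimal NumPy sketch that builds a design matrix with an intercept, indicators for the non-reference levels of a condition factor, a continuous covariate, and condition-by-covariate interactions, then forms the mean with $\log s_{gi}$ as an offset. The function `build_design`, the level names, and all numeric values are hypothetical; in practice such matrices are usually generated from a model formula (for example with R's `model.matrix`).

```python
import numpy as np


def build_design(condition, reference, covariate):
    """Build a design matrix X (rows = samples).

    Columns: intercept, one indicator per non-reference level of `condition`,
    a continuous covariate, and condition-by-covariate interactions.
    The first column is the intercept, matching beta_g = (beta_g0, ..., beta_gK).
    """
    condition = np.asarray(condition)
    covariate = np.asarray(covariate, dtype=float)
    levels = [lvl for lvl in sorted(set(condition.tolist())) if lvl != reference]
    cols, names = [np.ones(len(condition))], ["intercept"]
    for lvl in levels:
        cols.append((condition == lvl).astype(float))
        names.append(f"condition[{lvl}]")
    cols.append(covariate)
    names.append("covariate")
    for lvl in levels:
        cols.append((condition == lvl).astype(float) * covariate)
        names.append(f"condition[{lvl}]:covariate")
    return np.column_stack(cols), names


condition = ["A", "A", "B", "B", "C", "C"]
covariate = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
X, names = build_design(condition, reference="A", covariate=covariate)
print(names)
# ['intercept', 'condition[B]', 'condition[C]', 'covariate',
#  'condition[B]:covariate', 'condition[C]:covariate']

# The scaling factor is an offset with coefficient fixed at 1:
# log mu_gi = log s_gi + X[i, :] @ beta_g.
beta_g = np.array([3.0, 1.0, -0.5, 0.2, 0.0, 0.3])  # hypothetical coefficients
s_g = np.array([0.8, 1.0, 1.2, 0.9, 1.1, 1.0])      # hypothetical scaling factors
mu_g = s_g * np.exp(X @ beta_g)
print(np.allclose(np.log(mu_g), np.log(s_g) + X @ beta_g))  # True
```

Because the offset's coefficient is fixed rather than estimated, changing $s_{gi}$ rescales $\mu_{gi}$ without altering $\beta_g$.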
In the GLM, we use the logarithmic link function. The apeglm software reports the estimated coefficients and the corresponding SD estimates on this natural log scale. The apeglm method can also be called from DESeq2's lfcShrink function, which provides LFC estimates on the log2 scale. The apeglm method and software are generic for GLMs and can be used with other likelihoods, for example the Beta Binomial or the zero-inflated NB model, as long as estimates for the additional parameters (e.g. the dispersion or the zero-component parameters) are provided. The software package vignette includes an example of apeglm applied to Beta Binomial counts, as could be used to detect differential allele-specific expression.
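Because $\log_2 x = \ln x / \ln 2$, moving a coefficient from the natural log scale to the log2 scale means dividing it by $\ln 2 \approx 0.693$; since this is a linear rescaling, the SD is divided by the same constant. A minimal Python sketch of that arithmetic follows. `natural_log_to_log2` is a hypothetical helper, not part of apeglm or DESeq2, and this is not a description of how lfcShrink is implemented internally.

```python
import math

LN2 = math.log(2.0)


def natural_log_to_log2(coef, sd):
    """Convert a coefficient and its SD from the natural-log scale to log2.

    log2(x) = ln(x) / ln(2), so both the estimate and its SD are divided by ln 2.
    """
    return coef / LN2, sd / LN2


# A 4-fold change is ln(4) on the natural log scale and 2 on the log2 scale.
lfc2, sd2 = natural_log_to_log2(math.log(4.0), 0.2)
print(lfc2)  # approximately 2.0
print(sd2)   # approximately 0.2885
```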