For a two-library comparison, we use the sage.test function from the CRAN statmod package [28] to calculate a Fisher exact P-value for each gene. To apply TMM normalization, we replace the original library sizes with 'effective' library sizes. For two libraries, the effective library sizes are calculated by multiplying or dividing the original library size by the square root of the estimated normalization factor.
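The effective library size calculation and the conditional exact test can be sketched as follows. This is a minimal illustration in Python rather than the statmod code itself. The function names are hypothetical, and the direction of the scaling (library 1 multiplied, library 2 divided) is an assumption, since the text does not state which library receives which operation. The two-sided P-value sums the probabilities of all outcomes no more likely than the observed one, which may differ in detail from the definition used by sage.test.

```python
from math import exp, lgamma, log, sqrt


def effective_library_sizes(n1, n2, norm_factor):
    """Rescale two library sizes by the square root of the normalization factor.

    Assumed convention: library 1 is multiplied and library 2 is divided,
    so the product of the two sizes is unchanged.
    """
    root = sqrt(norm_factor)
    return n1 * root, n2 / root


def binomial_exact_pvalue(x1, x2, size1, size2):
    """Two-sided exact P-value for one gene, conditioning on x1 + x2.

    Under the null, x1 ~ Binomial(x1 + x2, size1 / (size1 + size2)).
    """
    total = x1 + x2
    p = size1 / (size1 + size2)

    def prob(k):
        # Binomial probability computed on the log scale for stability.
        log_choose = lgamma(total + 1) - lgamma(k + 1) - lgamma(total - k + 1)
        return exp(log_choose + k * log(p) + (total - k) * log(1 - p))

    probs = [prob(k) for k in range(total + 1)]
    observed = probs[x1]
    # Sum outcomes at most as probable as the observed one (small tolerance for ties).
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


# Hypothetical usage: gene counts 10 and 0, raw sizes 1000 and 2000, factor 4.
eff1, eff2 = effective_library_sizes(1000, 2000, 4.0)
p_value = binomial_exact_pvalue(10, 0, eff1, eff2)
```

Passing the effective sizes instead of the raw sizes is the only change needed to bring the normalization factor into the test; the counts themselves are left untouched.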
For comparisons with technical replicates, we followed the analysis procedure used in the Marioni et al. study [6]. Briefly, it is assumed that the counts mapping to a gene are Poisson-distributed, according to:
$$Y_{gk} \sim \mathrm{Poisson}(N_k \, \mu_{g z_k})$$
where $Y_{gk}$ is the count for gene g in library k, $N_k$ is the total number of reads in library k, and $\mu_{g z_k}$ represents the fraction of total reads for gene g in experimental condition $z_k$. Their analysis utilizes an offset to account for the library size and a likelihood ratio (LR) statistic to test for differences in expression between libraries (that is, $H_0: \mu_{g1} = \mu_{g2}$). In order to use TMM normalization, we augment the original offset with the estimated normalization factor. The same LR testing framework is then used to calculate P-values for DE between tissues. We also modified this analysis to use an exact Poisson test for the difference between two replicated groups. The strategy is similar in principle to Fisher's exact test: conditioning on the total count, we calculated the probability of observing group counts as extreme as, or more extreme than, those actually observed. The total count and the group total counts are all Poisson distributed.
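The LR test with an augmented offset can be sketched for a single gene as follows. This is an illustrative Python sketch, not the code used in the Marioni et al. analysis. It assumes that "augmenting the offset" means using log(library size × normalization factor) as the offset, that there are exactly two groups, and that the LR statistic is referred to a chi-square distribution with one degree of freedom. The function name and argument layout are hypothetical.

```python
from math import erfc, log, sqrt


def poisson_lr_test(counts, lib_sizes, norm_factors, groups):
    """Likelihood ratio test of equal Poisson rates between two groups.

    counts, lib_sizes, norm_factors and groups are parallel lists with one
    entry per library. The offset for each library is assumed to be
    log(lib_size * norm_factor).
    """
    eff_sizes = [n * f for n, f in zip(lib_sizes, norm_factors)]
    total_count = sum(counts)
    total_size = sum(eff_sizes)

    lr = 0.0
    for g in set(groups):
        group_count = sum(c for c, z in zip(counts, groups) if z == g)
        group_size = sum(e for e, z in zip(eff_sizes, groups) if z == g)
        if group_count > 0:
            # Group-specific rate estimate versus the pooled rate estimate.
            ratio = (group_count / group_size) / (total_count / total_size)
            lr += 2.0 * group_count * log(ratio)
    lr = max(lr, 0.0)  # guard against tiny negative values from rounding

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(lr / 2.0))
    return lr, p_value


# Hypothetical usage: two replicates per condition, no normalization adjustment.
lr_stat, p_val = poisson_lr_test(
    counts=[30, 28, 10, 12],
    lib_sizes=[1000, 1000, 1000, 1000],
    norm_factors=[1.0, 1.0, 1.0, 1.0],
    groups=[1, 1, 2, 2],
)
```

Because the maximum likelihood estimates depend on the data only through the group totals, replicate counts within a condition are effectively pooled, which is consistent with the Poisson assumption for technical replicates.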
We re-implemented the method from Cloonan et al. [12] for the analysis of simulated data using a custom R [29] script.