Genome-wide expression was measured in liver and kidney using RNA-seq on the Illumina GA I and by hybridization of the same samples to Affymetrix HG-U133 Plus 2.0 arrays. The sample preparation and data analysis were designed to maximize the similarity between the microarray and RNA-seq experiments (see Marioni et al. [21]). Differential expression (DE) between kidney and liver was determined on the microarray platform using an empirical Bayes modified t-statistic, and the P-values for DE were downloaded from those authors' website. For the RNA-seq experiment, the data were normalized using TMM normalization [27] and a negative binomial exact test was used to determine DE [16]. To test the GOseq method, we used the genes called DE in the microarray experiment to calculate the significance of over-representation of each GO category using the standard GO analysis methods. We also calculated P-values for each GO category being over-represented among genes that were DE in the RNA-seq data, using both the GOseq and hypergeometric methods. The ability of GOseq to outperform the hypergeometric method, as measured by how well each method reproduces the results of the microarray GO analysis, was quantified by calculating a P-value for the observed difference between the two methods being due to chance. To do this, a null hypothesis was chosen under which both methods were equally likely to correctly recover each microarray GO category, with this likelihood given by a binomial distribution.
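The two calculations described above can be illustrated with a minimal Python sketch. It is not the GOseq implementation. The first function is the standard hypergeometric upper-tail probability used for GO over-representation. The second is one possible reading of the binomial null, assumed here to be a sign-test-style comparison over the categories that only one method recovers; the text does not spell out that construction. All function and argument names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the GO category.

    k: DE genes in the category, N: genes assayed,
    K: genes in the category, n: DE genes in total.
    """
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 for impossible draws, so no extra bounds checks are needed.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(successes, trials + 1)
    )


def method_comparison_pvalue(only_goseq, only_hypergeom):
    """ASSUMPTION: one-sided sign-test-style comparison. Under the null, a
    category recovered by exactly one method is equally likely to have been
    recovered by either, so the GOseq-only count is Binomial(n, 0.5)."""
    return binomial_upper_tail(only_goseq, only_goseq + only_hypergeom, 0.5)


if __name__ == "__main__":
    # Toy numbers, not taken from the study.
    print(hypergeom_upper_tail(k=3, N=10, K=3, n=4))
    print(method_comparison_pvalue(only_goseq=8, only_hypergeom=2))
```

The sketch uses only the standard library. For genome-scale inputs a statistics library would be the more practical choice, but the quantities computed would be the same.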