Note that this is a challenging study design for batch effect adjustment: the control samples are balanced across batches, while each of the 3 kinds of treated cells, with different levels of biological signals, is completely nested within a single batch. A favorable adjustment would pool control samples from the three batches, while keeping all treated cells separated from the controls and from each other.
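To make the design concrete, the following minimal sketch tabulates which batches each condition appears in and flags conditions confined to a single batch. The batch and condition labels (`batch1`, `treated_A`, and so on) and the helper names are hypothetical placeholders, not identifiers from the dataset.

```python
from collections import defaultdict

# Hypothetical sample annotation: one (batch, condition) pair per sample.
samples = [
    ("batch1", "control"), ("batch1", "control"), ("batch1", "treated_A"),
    ("batch2", "control"), ("batch2", "control"), ("batch2", "treated_B"),
    ("batch3", "control"), ("batch3", "control"), ("batch3", "treated_C"),
]


def batches_per_condition(samples):
    """Map each condition to the set of batches in which it is observed."""
    seen = defaultdict(set)
    for batch, condition in samples:
        seen[condition].add(batch)
    return dict(seen)


def nested_conditions(samples):
    """Return conditions that occur in exactly one batch (nested in batch)."""
    return sorted(
        condition
        for condition, batches in batches_per_condition(samples).items()
        if len(batches) == 1
    )


print(batches_per_condition(samples))
print(nested_conditions(samples))
```

In a layout like this, only the controls link the batches, so any batch estimate has to rest on them.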
We combined the three batches and applied batch correction to the pathway signature dataset. Among the batch correction methods considered, only RUV-seq, the original ComBat (applied to log-transformed, normalized data), and ComBat-seq output adjusted data. We compared ComBat-seq with the other methods both qualitatively, through principal component analysis (PCA), and quantitatively, through the variation explained by condition and by batch.
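As an illustration of the quantitative comparison, the sketch below computes, for a single gene, the fraction of variance explained by a grouping factor as the one-way between-group sum of squares divided by the total sum of squares. This choice of statistic is an assumption made for illustration; the exact model used in the study is not specified in this passage. The function name, expression values, and labels are all hypothetical.

```python
def explained_variation(values, labels):
    """Fraction of total variance in `values` explained by `labels`
    (one-way R^2 = between-group sum of squares / total sum of squares)."""
    n = len(values)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # constant gene: nothing to explain

    # Collect the values belonging to each group.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total


# Hypothetical log-expression of one gene across six samples.
expression = [1.0, 1.0, 3.0, 3.0, 5.0, 5.0]
batch = ["b1", "b1", "b2", "b2", "b3", "b3"]
condition = ["ctl", "trt", "ctl", "trt", "ctl", "trt"]

r2_batch = explained_variation(expression, batch)
r2_condition = explained_variation(expression, condition)
print(r2_batch, r2_condition)
```

In this toy gene, all of the variation follows batch and none follows condition, which is the pattern a successful adjustment should reverse. Summarizing such per-gene values across all genes, before and after adjustment, gives one way to compare methods.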
The ‘one-step’ approach and SVA-seq were not included in the PCA because they do not generate adjusted data after batch correction. For RUV-seq, unlike in the simulation studies, we did not know which genes would be appropriate negative controls. We therefore used the RUVs method, which is more robust to the choice of negative control genes than RUVg (3). We identified the least DE genes within each batch for the 3 activated pathways (FDR > 0.95) and took the genes that overlapped across pathways as the negative controls.
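The negative control selection can be sketched as a set intersection. The example assumes that a differential expression test has already been run for each pathway and has produced an FDR per gene; the test itself is not shown. The pathway names, gene names, FDR values, and function names are hypothetical.

```python
def least_de_genes(fdr_by_gene, threshold=0.95):
    """Genes with FDR above the threshold, i.e. least evidence of DE."""
    return {gene for gene, fdr in fdr_by_gene.items() if fdr > threshold}


def negative_controls(fdr_by_pathway, threshold=0.95):
    """Genes that are among the least DE for every pathway."""
    gene_sets = [
        least_de_genes(fdr, threshold) for fdr in fdr_by_pathway.values()
    ]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical per-pathway FDR values from treated-vs-control tests,
# each run within the batch that contains that pathway's treated samples.
toy_fdr = {
    "pathway_A": {"g1": 0.99, "g2": 0.97, "g3": 0.10, "g4": 0.96},
    "pathway_B": {"g1": 0.98, "g2": 0.40, "g3": 0.99, "g4": 0.97},
    "pathway_C": {"g1": 0.96, "g2": 0.99, "g3": 0.02, "g4": 0.999},
}

controls = negative_controls(toy_fdr)
print(sorted(controls))  # ['g1', 'g4']
```

Taking the intersection rather than the union is the conservative choice: a gene is kept only if none of the three activated pathways shows evidence of affecting it.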