We illustrate our novel batch allocation algorithm (Additional file 1) in a hypothetical case–control study. To create a biologically plausible scenario, we downloaded a publicly available microarray gene expression dataset from NCBI GEO (accession GSE50397). The dataset includes gene expression data from 89 human pancreatic islet donors. The samples were obtained from the Nordic Islet Transplantation Programme, Uppsala University; see http://www.nordicislets.org for more information about islet processing and isolation. Data processing is described in greater detail in previous publications [27–29]. In brief, microarray profiling was performed using the Affymetrix GeneChip® Human Gene 1.0 ST whole transcript platform. The array data were summarized and normalized with the Robust Multi-array Average (RMA) method, as implemented in the oligo R package. Batch correction was performed using the ComBat function from the sva package [7]. The top 10,000 most variable genes from the batch-corrected dataset (downloaded from GEO) were used as the ‘true’ gene expression values in all subsequent analyses.
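The gene filtering step can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes that "most variable" means largest per-gene sample variance across donors (the variability measure is not stated here), it uses simulated values in place of the GEO matrix, and the function name `select_top_variable_genes` and the reduced matrix size are hypothetical.

```python
import numpy as np


def select_top_variable_genes(expr, n_top):
    """Return row indices of the n_top most variable genes.

    expr is a genes x samples matrix. Variability is measured here by the
    sample variance of each gene (an assumption for this sketch).
    Indices are ordered from most to least variable.
    """
    variances = np.var(expr, axis=1, ddof=1)
    order = np.argsort(-variances, kind="stable")
    return order[:n_top]


# Simulated stand-in for the batch-corrected expression matrix:
# 200 genes x 89 donors, with gene-specific spread.
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 89
scales = np.linspace(0.1, 2.0, n_genes)
expr = rng.normal(loc=8.0, scale=scales[:, None], size=(n_genes, n_samples))

# Keep the 50 most variable genes (10,000 in the study).
top_idx = select_top_variable_genes(expr, n_top=50)
filtered = expr[top_idx, :]
```

With the real dataset, `expr` would be the batch-corrected matrix downloaded from GEO and `n_top` would be 10,000; the rows of `filtered` would then play the role of the ‘true’ expression values.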