Our search for clinically annotated ovarian cancer microarray studies identified 21 published studies, which provided 23 publicly available data sets from various sources (Table 1). The search not only targeted studies of primary tumours annotated with patient survival but also included studies providing other potentially valuable clinical annotation. Other main factors of interest were drug resistance, outcome of the primary tumour debulking surgery, histology, stage and grade. We excluded studies not measuring gene expression (i.e. studies of genomic copy number), studies of cell lines, animal models or non-primary tumours, and data sets not providing clinical information. Expression and clinical data were obtained from the two major public repositories, GEO (i) and ArrayExpress (ii), or, where not deposited there, from the supplementary data of the original publications. Data from GEO were obtained using the GEOquery package (31). Clinical annotations were manually curated using one R script per data set, and the original uncurated annotations were retained as a single field. Curated annotations were checked syntactically against a template, which standardized all the known clinically relevant indicators and their allowable data values. Clinical data were curated twice, independently (authors B.G. and T.R.), and all discrepancies were resolved for the final version. The availability of clinical data varied substantially across data sets (Figure 2).
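
The template check and the comparison of the two independent curations can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the curation described above used R scripts. The field names, allowable values, and the `validate` and `discrepancies` helpers are hypothetical and do not reproduce the actual template or code.

```python
import re

# Hypothetical template: field name -> regular expression of allowable values.
# None marks a free-text field (e.g. the retained uncurated annotation).
# All names and value sets here are illustrative assumptions.
TEMPLATE = {
    "sample_type": r"tumor|healthy",
    "histological_type": r"ser|endo|clearcell|mucinous|other",
    "tumorstage": r"[1-4]",
    "grade": r"[1-4]",
    "days_to_death": r"[0-9]+",
    "vital_status": r"living|deceased",
    "uncurated": None,
}


def validate(records, template=TEMPLATE, missing="NA"):
    """Return (row_index, field, value) for every template violation."""
    problems = []
    for i, rec in enumerate(records):
        # Fields that the template does not define are reported first.
        for field in rec:
            if field not in template:
                problems.append((i, field, rec[field]))
        # Then each defined field is checked against its allowable values.
        for field, pattern in template.items():
            value = rec.get(field, missing)
            if value == missing or pattern is None:
                continue
            if not re.fullmatch(pattern, str(value)):
                problems.append((i, field, value))
    return problems


def discrepancies(curation_a, curation_b):
    """Return (row_index, field, value_a, value_b) where two curations differ."""
    diffs = []
    for i, (a, b) in enumerate(zip(curation_a, curation_b)):
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                diffs.append((i, field, a.get(field), b.get(field)))
    return diffs
```

In this sketch a value passes only if it matches the whole pattern, so a misspelling such as `tumour` is flagged rather than silently accepted, and any disagreement between the two curators is listed for manual resolution.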

Available clinical annotation. For each curated clinical characteristic (rows), this heatmap shows whether it is available in each data set (columns). Red indicates that the corresponding characteristic is available for at least one sample in the data set. See Table 2 for descriptions of these characteristics.
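
The availability rule behind such a heatmap (a characteristic counts as available if at least one sample has a value for it) can be sketched as follows. This is an illustrative Python sketch, not the code used to produce the figure. The `availability` function, the data layout, and the choice of missing-value markers are assumptions.

```python
def availability(datasets, characteristics, missing=(None, "", "NA")):
    """Map each characteristic to {data set name: bool}.

    The flag is True when at least one sample in the data set has a
    non-missing value for that characteristic.
    """
    return {
        c: {
            name: any(sample.get(c) not in missing for sample in samples)
            for name, samples in datasets.items()
        }
        for c in characteristics
    }


# Hypothetical example with two small data sets.
example = {
    "A": [{"grade": "2", "tumorstage": "NA"}, {"grade": "NA"}],
    "B": [{"tumorstage": "3"}],
}
matrix = availability(example, ["grade", "tumorstage"])
```

Each row of `matrix` corresponds to one characteristic and each entry to one data set, which is the layout a rows-by-columns heatmap would take as input.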