This dataset included 1,992 pairs of expression arrays and Affymetrix SNP 6.0 arrays profiled from tumor samples of 1,992 patients, who were divided into a discovery set (997 patients) and a validation set (995 patients) [38]. In addition, 144 expression arrays from adjacent normal tissues were provided.
We applied the DeMixT deconvolution pipeline to the batch-corrected expression arrays of the combined discovery and validation sets to estimate tumor-specific proportions, using the adjacent normal samples as the reference. Affymetrix CEL files were processed with PennCNV [87] to obtain the LogR and B allele frequency (BAF) data, which were then analyzed with both ASCAT [32] and Sequenza [49] to estimate tumor purity and ploidy for each sample. The consensus TmS strategy was applied to obtain robust TmS estimates. In total, 1,664 patient samples had a TmS estimate after these steps. We then removed a further 118 patient samples that lacked follow-up information on biochemical recurrence intervals or a PAM50 subtype. The final cohort of 1,546 patient samples, drawn from both the discovery and validation sets, was kept for downstream analyses. See Supplementary Notes 2.3.1 and 2.3.4 for further details.
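The sample-exclusion steps described above can be illustrated with a minimal sketch. It is not the authors' code: the `Sample` record, its field names, the `filter_cohort` helper, and the toy values are all hypothetical and serve only to show the order of the two filters (first require a TmS estimate, then require complete follow-up and PAM50 annotation).

```python
from typing import List, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    """Hypothetical per-patient record; field names are illustrative only."""
    sample_id: str
    tms: Optional[float]                  # consensus TmS, None if not estimable
    recurrence_interval: Optional[float]  # follow-up interval, None if missing
    pam50: Optional[str]                  # PAM50 subtype, None if missing


def filter_cohort(samples: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Apply the two exclusion steps in order.

    Returns (samples with a TmS estimate, final analysis cohort).
    """
    # Step 1: keep samples for which a TmS estimate was obtained.
    with_tms = [s for s in samples if s.tms is not None]
    # Step 2: drop samples missing follow-up or PAM50 annotation.
    final = [
        s for s in with_tms
        if s.recurrence_interval is not None and s.pam50 is not None
    ]
    return with_tms, final


# Toy data (made up for illustration, not taken from the study).
toy = [
    Sample("A", 1.2, 60.0, "LumA"),
    Sample("B", None, 48.0, "Basal"),  # no TmS estimate: dropped in step 1
    Sample("C", 0.8, 30.0, None),      # missing PAM50: dropped in step 2
    Sample("D", 2.1, None, "Her2"),    # missing follow-up: dropped in step 2
]

with_tms, final = filter_cohort(toy)
removed = len(with_tms) - len(final)
print(len(with_tms), removed, len(final))
```

Applied to the real cohort, the same bookkeeping corresponds to the counts reported in the text: 1,664 samples with TmS, 118 removed, and 1,546 retained.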