Our goal was to identify a prognostic signature for recurrence in OSCC. We hypothesized that gene expression deregulation occurring in OSCC would be an early indicator of recurrence if such expression changes are present in a subset of histologically normal surgical resection margins. We performed gene expression profiling of both resection margins and tumors in order to (1) identify genes over-expressed in tumors as potential markers of recurrence in histologically normal margins, and (2) find a subset of those genes predictive of recurrence. To generate a high-confidence gene set, we augmented the analysis of our data with a meta-analysis of five published microarray studies [24-28] to reliably identify a set of genes significantly deregulated in OSCC compared to normal oral tissues. These five public data sets were selected based on the availability of raw microarray data and on the inclusion of both oral cavity tumors and either adjacent normal tissues or oral tissues from healthy individuals. Although the data from Pyeon et al. [28] included HPV-positive and HPV-negative head and neck carcinomas from different anatomic sites, we selected oral carcinomas only, which are mainly negative for HPV infection, as shown in a recent study performed by our group [29]. The meta-analysis sample set comprised a total of 199 samples (141 OSCCs, 38 adjacent normal tissues and 20 healthy normal tissues) from 141 oral cancer patients and 20 healthy individuals without cancer (Table 3). We pre-processed data from the different array platforms with updated chip definition files, as described above, to correct outdated probe mapping information from older platforms.
We used a Rank Product analysis for the public studies, which considers only the ranking of genes by differential expression between pairs of samples within each study [20,30]. This avoids the batch- and platform-related effects that would arise from directly combining expression values across studies. We selected genes with evidence of up-regulation in tumors at a False Discovery Rate (FDR) of 0.01 and a fold-change ≥ 2. We chose to focus on over-expressed genes only, since histologically normal margins may contain only a fraction of genetically altered cells, and the presence of genetically normal cells would likely make down-regulated genes unreliable markers. By taking the intersection of the genes identified by the meta-analysis and by the in-house array training set, we retained only genes that were reproducibly over-expressed compared to normal oral tissues from healthy individuals and to histologically normal margins. These strict selection criteria for gene signature candidates, based on a prior hypothesis, helped to reduce the risk of over-fitting during Cox regression analysis.
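The rank-based selection and intersection steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names, the toy gene labels and values, the permutation count, and the use of the mean log2 fold change as the fold-change criterion are all assumptions made for illustration. The sketch takes one list of log2 fold changes per tumor/normal comparison, ranks genes within each comparison, takes the geometric mean of the ranks, and estimates a false-positive proportion by permuting values within each comparison.

```python
import math
import random
from bisect import bisect_right


def ranks_descending(values):
    """Return ranks where 1 marks the largest value (strongest up-regulation)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def rank_products(log_fc_columns):
    """Geometric mean of each gene's rank across all comparisons."""
    rank_columns = [ranks_descending(column) for column in log_fc_columns]
    n_comparisons = len(rank_columns)
    n_genes = len(rank_columns[0])
    return [
        math.exp(math.fsum(math.log(col[g]) for col in rank_columns) / n_comparisons)
        for g in range(n_genes)
    ]


def estimate_pfp(log_fc_columns, n_perm=200, seed=0):
    """Permutation estimate of the proportion of false positives per gene."""
    rng = random.Random(seed)
    observed = rank_products(log_fc_columns)
    null = []
    for _ in range(n_perm):
        # Shuffle gene labels independently within each comparison.
        shuffled = [rng.sample(column, len(column)) for column in log_fc_columns]
        null.extend(rank_products(shuffled))
    null.sort()
    by_rank_product = sorted(range(len(observed)), key=lambda g: observed[g])
    pfp = [0.0] * len(observed)
    for position, g in enumerate(by_rank_product, start=1):
        # Expected number of null genes with a rank product this small or smaller.
        expected = bisect_right(null, observed[g]) / n_perm
        pfp[g] = min(1.0, expected / position)
    return observed, pfp


def select_upregulated(genes, log_fc_columns, pfp, max_pfp=0.01, min_log2_fc=1.0):
    """Keep genes passing both the FDR-like cutoff and the fold-change cutoff."""
    selected = set()
    for g, gene in enumerate(genes):
        mean_log2_fc = sum(col[g] for col in log_fc_columns) / len(log_fc_columns)
        if pfp[g] <= max_pfp and mean_log2_fc >= min_log2_fc:
            selected.add(gene)
    return selected


# Toy data (invented): one list of log2 fold changes per tumor/normal comparison.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
log_fc_columns = [
    [3.0, 1.5, 0.2, -0.5, -1.0],
    [2.5, 1.2, -0.1, 0.3, -2.0],
    [2.8, 1.9, 0.4, -0.2, -0.7],
]
rp, pfp = estimate_pfp(log_fc_columns)

# Five genes are far too few for a 0.01 cutoff, so the toy call relaxes it.
meta_hits = select_upregulated(genes, log_fc_columns, pfp, max_pfp=1.0)
in_house_hits = {"GENE_B", "GENE_C"}  # hypothetical in-house training-set result
candidates = meta_hits & in_house_hits
```

Because only within-comparison ranks enter the calculation, expression values from different platforms are never placed on a common scale, which is the property the text relies on to avoid batch effects. A real analysis would use thousands of genes and a dedicated, validated implementation rather than this sketch.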