Tumors were classified according to site as follows: oral cavity (including tongue, buccal mucosa, gingiva, hard palate, retromolar trigone and floor of mouth) vs. oropharynx (including tonsils, soft palate, uvula, oropharynx and base of tongue).
Gene expression values for the ~54,000 probe sets were first extracted from probe intensity values (CEL files) using the gcRMA algorithm. We then eliminated probe sets that either showed essentially no variation across the samples (inter-quartile range less than 0.1 on the log2 scale) or were expressed at very low levels (maximum expression value across the samples less than 3 on the log2 scale). These exclusions limited the number of statistical tests applied when detecting differences between HPV-positive and HPV-negative tumors. After these two filtering steps, ~21,000 probe sets remained for further analysis.
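The two filtering rules can be illustrated with a minimal Python sketch. It assumes the gcRMA output is already available as a mapping from probe set ID to a list of log2 expression values, one per sample. The function names and the quartile convention (`method="inclusive"`) are illustrative assumptions and are not taken from the original analysis.

```python
from statistics import quantiles

IQR_MIN = 0.1  # minimum inter-quartile range on the log2 scale (from the text)
MAX_MIN = 3.0  # minimum value of the per-probe-set maximum, log2 scale (from the text)


def keep_probe_set(log2_values):
    """Return True if a probe set passes both the variation and the magnitude filter."""
    # Quartile convention is an assumption; the text does not specify one.
    q1, _, q3 = quantiles(log2_values, n=4, method="inclusive")
    has_variation = (q3 - q1) >= IQR_MIN
    is_expressed = max(log2_values) >= MAX_MIN
    return has_variation and is_expressed


def filter_probe_sets(expression):
    """Keep only the probe sets that pass both filters.

    `expression` is a hypothetical dict: probe set ID -> list of log2 values.
    """
    return {pid: vals for pid, vals in expression.items() if keep_probe_set(vals)}
```

A probe set is removed if it fails either criterion, which matches the "either ... or" wording of the exclusion rule.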
Statistical tests were carried out to compare HPV-positive and HPV-negative OSCC using a regression framework implemented in GenePlus software (http://www.enodar.com/). To control the type I error rate, we chose to declare a particular group of genes either “upregulated/overexpressed” or “downregulated/underexpressed” based on a pre-specified Number of False Discoveries (NFD)12. The chosen NFD, together with the number of genes under investigation (J), sets the threshold for individual gene-specific p-values at NFD/J.
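As a worked illustration of the NFD/J rule, the sketch below computes the per-gene threshold and selects the genes whose p-values fall at or below it. This is not the GenePlus implementation. The function names are hypothetical, and the use of "less than or equal to" at the threshold is an assumption, since the text does not say whether the comparison is strict.

```python
def nfd_threshold(nfd, n_genes):
    """Per-gene p-value threshold NFD / J for J genes under investigation."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return nfd / n_genes


def select_by_nfd(p_values, nfd):
    """Return the genes whose p-value is at or below NFD / J.

    `p_values` is a hypothetical dict: gene or probe set ID -> p-value.
    J is taken to be the number of entries in `p_values`.
    """
    threshold = nfd_threshold(nfd, len(p_values))
    return {gene: p for gene, p in p_values.items() if p <= threshold}
```

For example, with a hypothetical NFD of 1 and J = 21,000 probe sets, the per-gene threshold would be 1/21,000, or about 4.8 × 10^-5.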
To determine whether the probe sets identified in the above analysis were up- or downregulated relative to normal oral tissue, for each probe set we compared the mean expression value of each cancer group with that of controls using linear regression, calculating a robust estimator of variance and accounting for the fact that multiple samples were tested for some subjects. The probe sets were then ranked by ascending p-value, and a p-value cutoff of 0.05 was chosen to indicate significant differences in expression.
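A comparison of this kind can be sketched for a single probe set as an ordinary least-squares regression of expression on a binary group indicator (0 = control, 1 = cancer), with a cluster-robust "sandwich" variance that groups samples by subject. The sketch makes several assumptions that are not stated in the text: no finite-sample correction is applied, the p-value uses a normal approximation, and the function name is hypothetical. It is not the software used in the original analysis.

```python
import math
from collections import defaultdict


def cluster_robust_group_test(y, x, subject):
    """OLS of y on [1, x] with a cluster-robust variance for the slope.

    y: log2 expression values; x: 0/1 group indicator; subject: subject IDs.
    Returns (slope, robust standard error, two-sided p-value).
    """
    n = len(y)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # Inverse of X'X for the two-column design [1, x].
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy

    # Sum the score contributions X_i * residual_i within each subject.
    scores = defaultdict(lambda: [0.0, 0.0])
    for yi, xi, sid in zip(y, x, subject):
        resid = yi - b0 - b1 * xi
        scores[sid][0] += resid
        scores[sid][1] += xi * resid

    # "Meat" of the sandwich: sum over subjects of the score outer products.
    m00 = sum(s[0] * s[0] for s in scores.values())
    m01 = sum(s[0] * s[1] for s in scores.values())
    m11 = sum(s[1] * s[1] for s in scores.values())

    # Slope variance: second row of inv, times meat, times second column of inv.
    a, b = inv[1]
    se = math.sqrt(a * a * m00 + 2 * a * b * m01 + b * b * m11)
    p = math.erfc(abs(b1 / se) / math.sqrt(2))  # normal approximation (assumption)
    return b1, se, p
```

With a binary indicator, the slope equals the difference in group means, so its sign indicates whether the probe set is up- or downregulated relative to controls.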
The functional roles of the genes differentially expressed between HPV-positive and HPV-negative OSCC were assessed through the use of Ingenuity Pathways Analysis, IPA 5.0 (Ingenuity® Systems, www.ingenuity.com). The functional analysis identified the relevant biological functions by performing Fisher’s exact tests of the null hypothesis that the set of differentially expressed genes was not representative of each biological function.
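To make the enrichment test concrete, the following sketch computes a one-sided Fisher's exact p-value from the hypergeometric distribution for a single biological function. The one-sided (over-representation) form, the function name, and the argument names are assumptions made for illustration. IPA's internal implementation and reference gene set are not described in the text.

```python
from math import comb


def fisher_enrichment_p(k, n_selected, n_in_function, n_total):
    """One-sided Fisher's exact p-value for over-representation.

    k: differentially expressed genes annotated to the function
    n_selected: all differentially expressed genes
    n_in_function: all genes annotated to the function in the reference set
    n_total: all genes in the reference set
    Returns P(X >= k) for X ~ Hypergeometric(n_total, n_in_function, n_selected).
    """
    upper = min(n_in_function, n_selected)
    tail = sum(
        comb(n_in_function, i) * comb(n_total - n_in_function, n_selected - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(n_total, n_selected)
```

A small p-value indicates that the function contains more differentially expressed genes than would be expected if the genes had been drawn at random from the reference set.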