Gene expression values for the ~54,000 probe sets were first extracted from the probe intensity values (CEL files) using the gcRMA algorithm. We then eliminated probe sets that either showed essentially no variation across the samples (interquartile range less than 0.1 on the log2 scale) or were expressed at very low levels (maximum expression value across the samples less than 3 on the log2 scale). These exclusions limited the number of statistical tests applied when detecting differences between HPV-positive and HPV-negative tumors. After these two filtering steps, ~21,000 probe sets remained for further analysis.
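The two filtering criteria described above can be illustrated with a minimal sketch. It assumes that expression values are already on the log2 scale and that quartiles are computed by linear interpolation; the quartile method used in the original analysis is not stated, and the probe set names and values below are hypothetical.

```python
from statistics import quantiles

def passes_filters(log2_values, min_iqr=0.1, min_max=3.0):
    """Return True if a probe set survives both filters.

    A probe set is dropped when its interquartile range is below
    min_iqr or when its maximum expression value is below min_max.
    """
    # Quartiles by linear interpolation (an assumption; see lead-in).
    q1, _, q3 = quantiles(log2_values, n=4, method="inclusive")
    return (q3 - q1) >= min_iqr and max(log2_values) >= min_max

# Hypothetical log2 expression values, one list per probe set.
expression = {
    "probe_a": [2.0, 2.1, 2.0, 2.05],    # low variation and low expression
    "probe_b": [1.0, 2.0, 2.5, 2.9],     # varies, but maximum is below 3
    "probe_c": [4.0, 6.5, 5.0, 8.0],     # varies and is well expressed
    "probe_d": [7.0, 7.02, 7.01, 7.03],  # well expressed, but almost constant
}

kept = [name for name, values in expression.items() if passes_filters(values)]
print(kept)
```

Only probe sets that pass both criteria would be carried forward to the statistical tests.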
Statistical tests were carried out to compare HPV-positive and HPV-negative OSCC using a regression framework implemented in GenePlus software (
To determine whether the probe sets identified in the above analysis were up- or downregulated relative to normal oral tissue, we compared, for each probe set, the mean expression value of each cancer group with that of controls using linear regression, calculating a robust estimator of variance and accounting for the fact that multiple samples were tested for some subjects. The probe sets were then ranked by ascending p-value, and a cutoff of 0.05 was used to indicate significant differences in expression.
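One common way to obtain a robust variance that accounts for repeated samples per subject is a cluster-robust (sandwich) estimator with subjects as clusters. The sketch below applies that idea to a regression of expression on a 0/1 group indicator for a single probe set. Whether the original analysis used exactly this form is an assumption, as are the normal approximation for the p-value, the absence of a small-sample correction, and all names and data shown.

```python
import math
from collections import defaultdict

def cluster_robust_group_test(values, is_tumor, subject_ids):
    """Compare tumor vs. control means for one probe set.

    Fits expression ~ intercept + group by least squares, so the slope
    equals the difference in group means. Its variance is estimated with
    a cluster-robust sandwich estimator, summing per-sample contributions
    within each subject before squaring.
    Returns (slope, standard_error, two_sided_p).
    """
    n1 = sum(1 for t in is_tumor if t)
    n0 = len(values) - n1
    if n0 == 0 or n1 == 0:
        raise ValueError("both groups need at least one sample")
    mean1 = sum(v for v, t in zip(values, is_tumor) if t) / n1
    mean0 = sum(v for v, t in zip(values, is_tumor) if not t) / n0
    slope = mean1 - mean0

    # For a single binary covariate, each sample's contribution to the
    # slope reduces to residual / n1 (tumor) or -residual / n0 (control).
    per_subject = defaultdict(float)
    for v, t, s in zip(values, is_tumor, subject_ids):
        per_subject[s] += (v - mean1) / n1 if t else -(v - mean0) / n0

    se = math.sqrt(sum(c * c for c in per_subject.values()))
    if se == 0.0:
        return slope, se, (1.0 if slope == 0.0 else 0.0)
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, normal approximation
    return slope, se, p

# Hypothetical data: control subject "c1" contributed two samples.
values      = [5.0, 5.2, 4.8, 7.0, 7.4, 6.6]
is_tumor    = [False, False, False, True, True, True]
subject_ids = ["c1", "c1", "c2", "t1", "t2", "t3"]

slope, se, p = cluster_robust_group_test(values, is_tumor, subject_ids)
direction = "up" if slope > 0 else "down"
significant = p < 0.05
print(direction, significant)
```

A positive slope corresponds to upregulation in the cancer group relative to controls and a negative slope to downregulation. Repeating the test for each probe set and sorting by p-value gives the ranking described above.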
The functional roles of the genes differentially expressed between HPV-positive and HPV-negative OSCC were assessed through the use of Ingenuity Pathways Analysis, IPA 5.0 (Ingenuity® Systems,