Expression profile datasets from the Fred Hutchinson Cancer Research Center (FHCRC) and the University of Texas M.D. Anderson Cancer Center (MDACC) were used. The FHCRC dataset, used for discovery, comprised the gene expression profiles of 167 tumor samples from OSCC patients and 45 normal oral mucosa samples from patients without oral cancer, all treated at the University of Washington Medical Center, the Harborview Medical Center, or the Veterans Affairs Puget Sound Health Care System between 2003 and 2007. The FHCRC dataset also contained information on patient demographics, medical and lifestyle history, vital status, cause of death, and tumor characteristics, including tumor site, stage, and HPV status. Specimen collection, storage protocols, assays for specimen processing, and the generation and normalization of raw expression data were as previously described [8]. The MDACC dataset, used for validation, contained the gene expression profiles of 103 samples generated from residual tumor tissue taken from the MDACC Head and Neck Tumor Bank. Demographic characteristics, tumor site, stage, vital status, and cause of death were abstracted from the medical record and the MDACC Tumor Registry. The gene expression and other data from these tissue samples were analyzed under protocol DR09-0664, and de-identified data were shared with the FHCRC under MT2010-6749. Of the 103 samples in the MDACC dataset, 74 were cancer samples from OSCC patients, 24 were matched normal oral samples from OSCC patients, and 5 were normal oral samples from other cancer patients treated in the MDACC Head and Neck Center. The MDACC gene expression data were likewise extracted and normalized with the gcRMA algorithm, in this case as implemented in Partek® Genomics Suite™ software. The microarray data are available in the Gene Expression Omnibus database under accession numbers GSE41613 and GSE42743.
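As a bookkeeping aid, the sample composition of the two datasets can be tabulated and checked against the stated totals. The sketch below is illustrative only: the counts come from the text above, while the dictionary layout, group labels, and function names are assumptions made for this example and are not part of the original analysis.

```python
from collections import Counter

# Sample counts as stated in the text. The group labels are illustrative
# names chosen for this sketch, not identifiers from the original datasets.
COHORTS = {
    "FHCRC": {
        "OSCC tumor": 167,
        "normal oral mucosa (no oral cancer)": 45,
    },
    "MDACC": {
        "OSCC tumor": 74,
        "matched normal (OSCC patients)": 24,
        "normal (other cancer patients)": 5,
    },
}


def cohort_totals(cohorts):
    """Return the total number of samples in each dataset."""
    return {name: sum(groups.values()) for name, groups in cohorts.items()}


def tumor_normal_split(groups):
    """Collapse one dataset's groups into tumor and normal sample counts."""
    split = Counter()
    for label, n in groups.items():
        key = "tumor" if "tumor" in label else "normal"
        split[key] += n
    return dict(split)


totals = cohort_totals(COHORTS)

# The text reports 103 MDACC samples in total (74 + 24 + 5).
assert totals["MDACC"] == 103

for name, groups in COHORTS.items():
    print(name, totals[name], tumor_normal_split(groups))
```

The check confirms that the three MDACC groups account for all 103 samples, and it makes explicit that the validation set contains 29 normal samples alongside the 74 tumors.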