Data for the HNSCC training set were obtained from TCGA on November 13, 2016 (18). Gene expression data were generated on the IlluminaHiSeq_RNASeqV2 platform and normalized with the RSEM method (19). Quality control started from a total of 20530 genes: genes with zero values in more than half of the samples were removed, and the remaining 17711 genes were quantile normalized. Patients with complete follow-up information and gene expression values for tumor tissue were included in the study. Data for the HNSCC testing set were collected from GSE65858 (20) in GEO. Gene expression data were generated on the Illumina HumanHT-12 V4.0 expression BeadChip and normalized using the robust spline normalization (RSN) method (21). Consecutive patients with primary and metachronous secondary HNSCC of the oral cavity, larynx, oropharynx, and hypopharynx were included, while tumor cell lines and samples with low-quality assays were excluded. All gene expression values were log2-transformed and standardized for comparability between the training and testing sets.
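The preprocessing steps described for the training set (zero-value filtering, quantile normalization, log2 transformation, standardization) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names are hypothetical, and several details that the text does not specify are assumptions: a pseudocount of 1 before the log2 transformation, standardization as a per-gene z-score within each dataset, and ties in quantile normalization broken by sample order. The sketch uses only the standard library to stay self-contained; a real analysis would more likely rely on established packages.

```python
import math
import statistics


def filter_mostly_zero_genes(matrix, max_zero_fraction=0.5):
    """Drop genes (rows) with more than max_zero_fraction zero values."""
    return [
        row for row in matrix
        if sum(v == 0 for v in row) / len(row) <= max_zero_fraction
    ]


def quantile_normalize(matrix):
    """Give every sample (column) the same empirical distribution.

    The reference distribution is the mean of the sorted columns.
    Assumption: ties are broken by position, not averaged.
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_columns = [sorted(col) for col in columns]
    reference = [
        statistics.fmean(col[rank] for col in sorted_columns)
        for rank in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            normalized[g][s] = reference[rank]
    return normalized


def log2_transform(matrix, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount is an assumption, not from the text."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]


def standardize_genes(matrix):
    """Per-gene z-score across samples (assumed meaning of 'standardized')."""
    standardized = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.pstdev(row)
        standardized.append([(v - mean) / sd if sd > 0 else 0.0 for v in row])
    return standardized


def preprocess(matrix):
    """Apply the four steps in the order given in the text."""
    kept = filter_mostly_zero_genes(matrix)
    return standardize_genes(log2_transform(quantile_normalize(kept)))


# Toy genes-by-samples matrix (made-up values, for illustration only).
toy = [
    [5.0, 2.0, 3.0, 4.0],
    [0.0, 0.0, 0.0, 7.0],  # three of four values are zero: removed
    [1.0, 0.0, 0.0, 6.0],  # exactly half are zero: kept
    [8.0, 9.0, 1.0, 2.0],
]
processed = preprocess(toy)
print(len(processed), "genes remain after filtering")
```

Standardizing each dataset separately, as sketched here, puts every gene on a mean-0, unit-variance scale in both cohorts, which is one common way to make RNA-seq and microarray values comparable; whether the authors standardized per gene or per sample is not stated in the text.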