The raw data of GSE85195 [10] and GSE25099 [11] were downloaded from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) and processed using R software (version 4.0.5). GSE85195 was generated on an Agilent microarray, platform GPL6480 (Agilent-014850 Whole Human Genome Microarray 4x44K G4112F), using Homo sapiens tissue samples: 1 normal tissue, 15 OLK tissues, and 34 OSCC tissues. GSE25099 was generated on an Affymetrix microarray, platform GPL5175 (Affymetrix Human Exon 1.0 ST Array), using Homo sapiens tissue samples: 22 normal tissues and 57 OSCC tissues. Detailed information is shown in Table 1. The normalizeBetweenArrays function of the limma package [12] and the RMA method of the affy package [13] were used for data standardization and normalization. Probes were then annotated to genes, probes without annotation information were removed, expression values were averaged when the same probe appeared multiple times, and the genes common to both datasets were retained to form the combined data. The ComBat method of the sva package [14] was used to remove batch effects between the datasets, yielding the final gene expression matrix.
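The annotation and merging steps described above can be illustrated with a minimal sketch. It is not the authors' R code: it uses only the Python standard library, the function names `collapse_to_genes` and `merge_common_genes` are hypothetical, the probe IDs and expression values are toy data, and it assumes duplicates are identified by a shared gene symbol after annotation. Normalization and ComBat batch correction are not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_genes(probe_rows, probe_to_gene):
    """Average rows that map to the same gene symbol; drop unannotated probes.

    probe_rows: list of (probe_id, [expression per sample]) pairs.
    probe_to_gene: dict probe_id -> gene symbol (missing or empty = unannotated).
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_rows:
        gene = probe_to_gene.get(probe_id)
        if gene:  # skip probes without annotation
            grouped[gene].append(values)
    # Column-wise mean across all rows that share a gene symbol.
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }


def merge_common_genes(first, second):
    """Keep genes present in both datasets and concatenate their samples."""
    common = sorted(set(first) & set(second))
    return {gene: first[gene] + second[gene] for gene in common}


# Toy values, purely illustrative (not taken from GSE85195 or GSE25099).
agilent_rows = [("A1", [2.0, 4.0]), ("A2", [4.0, 6.0]), ("A3", [1.0, 1.0])]
agilent_map = {"A1": "TP53", "A2": "TP53", "A3": ""}
affy_rows = [("X1", [5.0]), ("X2", [7.0])]
affy_map = {"X1": "TP53", "X2": "EGFR"}

merged = merge_common_genes(
    collapse_to_genes(agilent_rows, agilent_map),
    collapse_to_genes(affy_rows, affy_map),
)
print(merged)
```

In this toy case only TP53 is present on both platforms, so it is the single gene carried into the combined matrix. In the actual workflow, the merged matrix would then be passed, together with a batch label per sample, to a batch-correction step such as ComBat.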