In addition to the UT Lung SPORE data, seven public NSCLC microarray datasets (10, 13, 17, 26–29) were used in this study. The National Cancer Institute Director's Challenge Consortium study (Consortium dataset) (13), which is the largest independent publicly available lung cancer microarray dataset and comprises 442 resected ADCs, was used as the training set. Six datasets were used to validate the prognostic signature: the UT Lung SPORE data, GSE3141 (ADC n=58, SCC n=53), GSE8894 (ADC n=62, SCC n=76), GSE11969 (ADC n=90, SCC n=35), GSE13213 (ADC n=117), and GSE4573 (SCC n=129). Three of these six datasets (GSE13213, GSE8894, and GSE11969) are Asian cohorts. Two datasets were used to validate the predictive signature: the UT Lung SPORE data and GSE14814, which includes 90 samples collected from the JBR.10 trial (49 patients who received vinorelbine plus cisplatin ACT and 41 patients who did not receive ACT). Table 1 provides detailed information on these datasets. Because 43 of the 133 samples in the original JBR.10 dataset (GSE14814) were also included in the Consortium data (training set), these 43 samples were excluded from the JBR.10 dataset to ensure independence between the training and validation sets, leaving the 90 samples used here.
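
The exclusion of overlapping samples can be illustrated with a minimal sketch. The sample identifiers and the function name `exclude_overlap` below are hypothetical placeholders, and the sketch assumes that shared samples can be matched by an identifier; the text does not specify how the overlap was actually identified. Only the counts (442, 133, 43, and 90) come from the description above.

```python
def exclude_overlap(validation_ids, training_ids):
    """Return the validation sample IDs that do not appear in the training set."""
    training = set(training_ids)
    return [sample_id for sample_id in validation_ids if sample_id not in training]


# Hypothetical placeholder IDs; these are NOT real GEO or trial identifiers.
jbr10_ids = [f"JBR10_{i:03d}" for i in range(1, 134)]  # 133 samples in the original JBR.10 dataset

# Assumed layout for illustration: 43 shared samples plus 399 others = 442 training samples.
consortium_ids = jbr10_ids[:43] + [f"CONSORTIUM_{i:03d}" for i in range(1, 400)]

independent_ids = exclude_overlap(jbr10_ids, consortium_ids)

print(len(consortium_ids))   # size of the training set
print(len(independent_ids))  # size of the JBR.10 validation set after exclusion
```

Removing the shared samples before validation keeps any sample seen during training from contributing to the validation estimate.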