Whole-transcriptome profiles were generated for 263 patients using TruSeq RNA Access technology (Illumina). RNA-seq reads were first aligned to ribosomal RNA sequences to remove ribosomal reads. The remaining reads were aligned to the human reference genome (NCBI Build 38) using GSNAP [53,54] version 2013-10-10, allowing a maximum of two mismatches per 75-base sequence (parameters: ‘-M 2 -n 10 -B 2 -i 1 -N 1 -w 200000 -E 1 --pairmax-rna=200000 --clip-overlap’). To quantify gene expression levels, the number of reads mapped to the exons of each RefSeq gene was calculated using the R/Bioconductor package GenomicAlignments [55].
Gene signatures were defined as follows: Angio [23]: VEGFA, KDR, ESM1, PECAM1, ANGPTL4, and CD34; Teff [24]: CD8A, EOMES, PRF1, IFNG, and CD274; myeloid inflammation [29–33]: IL6, CXCL1, CXCL2, CXCL3, CXCL8, and PTGS2. These three gene expression signatures were defined based on previously published associations with their respective biology.
To calculate scores for each of these signatures, counts were first normalized using edgeR’s normalization factors [56]. Genes with low coverage (those not reaching 0.25 counts per million (CPM) in at least one-tenth of available samples) were then filtered out, and the remaining data were log2-transformed using limma’s voom [57]. For each sample, the average expression of all genes in a given signature was computed and reported as the sample’s signature score. For each gene signature, patients were divided into two groups based on the median signature score across all tumors: high expression was defined as a score at or above the median, and low expression as a score below the median.
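The scoring steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, the edgeR normalization factors are taken as an optional input rather than computed, library sizes are not recomputed after filtering, the log2 transformation reproduces only voom's log2-CPM formula (not its precision weights), and signature genes removed by the coverage filter are simply skipped when averaging.

```python
import math
from statistics import mean, median

# Signature gene lists as given in the text (IL6 is the gene symbol for IL-6).
SIGNATURES = {
    "Angio": ["VEGFA", "KDR", "ESM1", "PECAM1", "ANGPTL4", "CD34"],
    "Teff": ["CD8A", "EOMES", "PRF1", "IFNG", "CD274"],
    "Myeloid": ["IL6", "CXCL1", "CXCL2", "CXCL3", "CXCL8", "PTGS2"],
}


def effective_lib_sizes(counts, norm_factors=None):
    """Per-sample library size, scaled by optional normalization factors.

    counts maps gene -> list of raw counts (one entry per sample).
    Assumption: norm_factors would come from edgeR; defaults to 1.0 each.
    """
    n = len(next(iter(counts.values())))
    lib = [sum(row[j] for row in counts.values()) for j in range(n)]
    factors = norm_factors if norm_factors is not None else [1.0] * n
    return [size * f for size, f in zip(lib, factors)]


def filter_low_coverage(counts, eff_lib, min_cpm=0.25, min_fraction=0.1):
    """Keep genes reaching min_cpm in at least min_fraction of samples."""
    kept = {}
    for gene, row in counts.items():
        n_pass = sum(1 for c, size in zip(row, eff_lib)
                     if c / size * 1e6 >= min_cpm)
        if n_pass / len(row) >= min_fraction:
            kept[gene] = row
    return kept


def log2_cpm(counts, eff_lib):
    """voom-style log2-CPM: log2((count + 0.5) / (lib + 1) * 1e6)."""
    return {
        gene: [math.log2((c + 0.5) / (size + 1.0) * 1e6)
               for c, size in zip(row, eff_lib)]
        for gene, row in counts.items()
    }


def signature_scores(log_expr, genes):
    """Per-sample mean log2 expression over the signature genes present."""
    rows = [log_expr[g] for g in genes if g in log_expr]
    return [mean(col) for col in zip(*rows)]


def split_by_median(scores):
    """Label each sample 'high' (at or above the median) or 'low' (below)."""
    cutoff = median(scores)
    return ["high" if s >= cutoff else "low" for s in scores]
```

Given a count table `counts`, a call sequence such as `lib = effective_lib_sizes(counts)`, `expr = log2_cpm(filter_low_coverage(counts, lib), lib)`, and `split_by_median(signature_scores(expr, SIGNATURES["Teff"]))` would yield one high/low label per sample for the Teff signature.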
For the heatmap (Fig. 2a), each patient was assigned to the high or low group for each of the three gene expression signatures (Angio, Teff, and myeloid inflammation), based on median expression as described above. Patients were then sorted by the combination of these three groups, in the following order: Teff-high/Angio-low patients, sorted by myeloid inflammation from low to high; Teff-high/Angio-high patients, sorted by myeloid inflammation from high to low; Teff-low/Angio-high patients, sorted by myeloid inflammation from low to high; and finally Teff-low/Angio-low patients, sorted by myeloid inflammation from high to low. The ordering of the genes was predetermined, based on biological function. Z-score-transformed normalized counts are shown.
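The patient ordering and the Z-score transformation can be illustrated as follows. This is a sketch only: the names `GROUP_ORDER`, `heatmap_order`, and `zscore_rows` are hypothetical, the myeloid inflammation ordering is applied to the high/low group labels, patients with identical labels keep their input order, and the Z-score uses the sample standard deviation, which the text does not specify.

```python
from statistics import mean, stdev

# (Teff, Angio) blocks in display order, each paired with the order in
# which myeloid inflammation groups appear inside that block.
GROUP_ORDER = [
    (("high", "low"), ["low", "high"]),
    (("high", "high"), ["high", "low"]),
    (("low", "high"), ["low", "high"]),
    (("low", "low"), ["high", "low"]),
]


def heatmap_order(calls):
    """Return patient IDs in heatmap column order.

    calls maps patient -> {"Teff": ..., "Angio": ..., "Myeloid": ...},
    where each value is "high" or "low".
    """
    rank = {}
    for block, ((teff, angio), myeloid_seq) in enumerate(GROUP_ORDER):
        for pos, myeloid in enumerate(myeloid_seq):
            rank[(teff, angio, myeloid)] = (block, pos)
    # sorted() is stable, so patients with equal labels keep input order.
    return sorted(
        calls,
        key=lambda p: rank[(calls[p]["Teff"], calls[p]["Angio"],
                            calls[p]["Myeloid"])],
    )


def zscore_rows(expr):
    """Z-score each gene across samples (assumes at least two samples)."""
    scaled = {}
    for gene, values in expr.items():
        mu = mean(values)
        sd = stdev(values)  # sample standard deviation (an assumption)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled
```

Encoding the alternating low/high and high/low directions in a lookup table keeps the sort key explicit and makes the four-block layout easy to check against the description.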