Whole-transcriptome profiles were generated for 263 patients using TruSeq RNA Access technology (Illumina). RNA-seq reads were first aligned to ribosomal RNA sequences to remove ribosomal reads. The remaining reads were aligned to the human reference genome (NCBI Build 38) using GSNAP (refs. 53,54) version 2013-10-10, allowing a maximum of two mismatches per 75-base sequence (parameters: ‘-M 2 -n 10 -B 2 -i 1 -N 1 -w 200000 -E 1 --pairmax-rna=200000 --clip-overlap’). To quantify gene expression levels, the number of reads mapped to the exons of each RefSeq gene was calculated using the R/Bioconductor package GenomicAlignments (ref. 55).
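The exon-level counting step can be illustrated with a minimal Python sketch. This is not the GenomicAlignments implementation: it assumes a simple rule in which a read is counted once for a gene if it overlaps any of that gene's exons, and it ignores strand, multi-mapping, and reads shared between genes. The function name, gene names, and coordinates are hypothetical.

```python
def count_reads_per_gene(exons_by_gene, reads):
    """Count reads that overlap at least one exon of each gene.

    exons_by_gene: {gene: [(chrom, start, end), ...]}, 1-based inclusive.
    reads: iterable of (chrom, start, end) aligned intervals, 1-based inclusive.
    Assumption: a read overlapping several exons of one gene counts once.
    """
    counts = {gene: 0 for gene in exons_by_gene}
    for r_chrom, r_start, r_end in reads:
        for gene, exons in exons_by_gene.items():
            overlaps = any(
                chrom == r_chrom and r_start <= end and start <= r_end
                for chrom, start, end in exons
            )
            if overlaps:
                counts[gene] += 1
    return counts


# Toy annotation and alignments (hypothetical coordinates).
exons = {
    "GENE_A": [("chr1", 100, 200), ("chr1", 300, 400)],
    "GENE_B": [("chr1", 1000, 1100)],
}
reads = [
    ("chr1", 150, 224),    # overlaps the first GENE_A exon
    ("chr1", 190, 310),    # spans both GENE_A exons, counted once
    ("chr1", 500, 574),    # overlaps no exon
    ("chr1", 1090, 1164),  # overlaps the GENE_B exon
    ("chr2", 150, 224),    # same coordinates, different chromosome
]
gene_counts = count_reads_per_gene(exons, reads)
print(gene_counts)
```

The nested loop is quadratic and only suitable for illustration; production tools use interval indexes for the overlap query.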
Gene signatures were defined as follows: Angio (ref. 23): VEGFA, KDR, ESM1, PECAM1, ANGPTL4, and CD34; Teff (ref. 24): CD8A, EOMES, PRF1, IFNG, and CD274; myeloid inflammation (refs. 29–33): IL-6, CXCL1, CXCL2, CXCL3, CXCL8, and PTGS2. These three gene expression signatures were defined based on previously published associations with their respective biology.
To calculate scores for each of these signatures, counts were first normalized using edgeR’s normalization factors (ref. 56), followed by filtering out genes with low coverage (i.e., genes not reaching 0.25 counts per million (CPM) in at least one-tenth of available samples) and log2-transformation using limma’s voom (ref. 57). Then, for each sample, the average expression of all genes in a given signature was computed and reported as that sample’s signature score. For each gene signature, patients were divided into two groups based on the median signature score across all tumors: high expression was defined as a score at or above the median, and low expression as a score below the median.
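The scoring procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' R code: the per-sample normalization factors are taken as given (edgeR's estimation is not reproduced), the CPM filter is assumed to use the effective library size, and the log2 step uses the offset form log2((count + 0.5) / (library size + 1) × 10^6), which is our understanding of voom's log-CPM transform. All function names and the toy counts are hypothetical.

```python
import math
from statistics import mean, median

# Signature gene lists from the text (IL6 is the HGNC symbol for IL-6).
SIGNATURES = {
    "Angio": ["VEGFA", "KDR", "ESM1", "PECAM1", "ANGPTL4", "CD34"],
    "Teff": ["CD8A", "EOMES", "PRF1", "IFNG", "CD274"],
    "Myeloid": ["IL6", "CXCL1", "CXCL2", "CXCL3", "CXCL8", "PTGS2"],
}


def filtered_log2_cpm(counts, norm_factors, min_cpm=0.25, min_frac=0.1):
    """counts: {gene: [count per sample]}; norm_factors: one value per sample.

    Keeps genes reaching min_cpm in at least min_frac of samples and
    returns their log2-CPM values (assumed voom-style offsets).
    """
    n = len(norm_factors)
    raw_lib = [sum(row[j] for row in counts.values()) for j in range(n)]
    eff_lib = [raw_lib[j] * norm_factors[j] for j in range(n)]
    kept = {}
    for gene, row in counts.items():
        cpm = [row[j] / eff_lib[j] * 1e6 for j in range(n)]
        if sum(v >= min_cpm for v in cpm) >= min_frac * n:
            kept[gene] = [
                math.log2((row[j] + 0.5) / (eff_lib[j] + 1) * 1e6)
                for j in range(n)
            ]
    return kept


def signature_scores(expr, genes):
    """Per-sample mean expression over the signature genes present in expr."""
    present = [g for g in genes if g in expr]
    n = len(next(iter(expr.values())))
    return [mean(expr[g][j] for g in present) for j in range(n)]


def median_split(scores):
    """'high' if the score is at or above the cohort median, else 'low'."""
    cut = median(scores)
    return ["high" if s >= cut else "low" for s in scores]


# Toy cohort of four samples (hypothetical counts).
toy_counts = {
    "CD8A": [10, 200, 50, 400],
    "PRF1": [20, 100, 40, 300],
    "BACKGROUND": [1_000_000, 1_000_000, 1_000_000, 1_000_000],
    "RARE": [0, 0, 0, 0],  # never reaches 0.25 CPM, so it is filtered out
}
expr = filtered_log2_cpm(toy_counts, norm_factors=[1.0, 1.0, 1.0, 1.0])
teff_scores = signature_scores(expr, SIGNATURES["Teff"])
teff_groups = median_split(teff_scores)
print(teff_groups)
```

In this sketch, signature genes removed by the coverage filter are simply left out of the average; the source does not say how such genes were handled.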
For the heatmap (Fig. 2a), each patient was assigned to the high or low group for each of the three gene expression signatures: Angio, Teff, and myeloid inflammation (based on median expression, as described above). Patients were then sorted by the combination of these three groups: first, Teff-High/Angio-Low patients are shown, sorted by myeloid inflammation low/high; then Teff-High/Angio-High patients, sorted by myeloid inflammation high/low; then Teff-Low/Angio-High patients, sorted by myeloid inflammation low/high; and finally Teff-Low/Angio-Low patients, sorted by myeloid inflammation high/low. The ordering of the genes was predetermined, based on biological function. Z-score-transformed normalized counts are shown.
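The patient ordering and the Z-score transformation can be sketched as below. This is an illustrative reconstruction under assumptions: Z-scores are taken per gene across patients using the sample standard deviation (the text does not specify either choice), patients within the same three-way group keep their input order, and the function names and patient labels are hypothetical.

```python
from statistics import mean, stdev

# (Teff, Angio) blocks in display order, each with its myeloid sub-order,
# following the sequence described in the text.
BLOCKS = [
    (("high", "low"), ["low", "high"]),
    (("high", "high"), ["high", "low"]),
    (("low", "high"), ["low", "high"]),
    (("low", "low"), ["high", "low"]),
]


def heatmap_order(groups):
    """groups: {patient: (teff, angio, myeloid)}, each 'high' or 'low'.

    Returns patient IDs in heatmap column order.
    """
    ordered = []
    for (teff, angio), myeloid_order in BLOCKS:
        for myeloid in myeloid_order:
            ordered += [
                p for p, g in groups.items() if g == (teff, angio, myeloid)
            ]
    return ordered


def zscore_rows(expr):
    """Z-score each gene (row) across patients; constant rows map to 0."""
    out = {}
    for gene, row in expr.items():
        mu, sd = mean(row), stdev(row)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in row]
    return out


# Toy cohort with one patient per three-way group (hypothetical labels).
toy_groups = {
    "P1": ("low", "low", "low"),
    "P2": ("high", "low", "high"),
    "P3": ("high", "high", "low"),
    "P4": ("low", "high", "low"),
    "P5": ("high", "low", "low"),
    "P6": ("high", "high", "high"),
    "P7": ("low", "low", "high"),
    "P8": ("low", "high", "high"),
}
column_order = heatmap_order(toy_groups)
print(column_order)
```

Alternating the myeloid sub-order between adjacent blocks places patients with the same myeloid status next to each other across block boundaries, which is consistent with the ordering given in the text.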