We considered four external expression data sets from enriched/purified immune cells: two microarray data sets (GEO accessions GSE28490 and GSE2849) [27], an RNA-seq data set [28], and a microarray compendium that was used to build the CIBERSORT LM22 signature matrix [17]. All data sets were preprocessed and normalized as explained in the previous paragraphs. For each gene $g$ specific for a cell type $c$ in the signature matrix, we computed the ratio $R_{gd}$ between the median expression across all libraries in data set $d$ belonging to cell type $c$ and the median expression across all libraries in data set $d$ not belonging to cell type $c$. For each cell type, the top 30 ranked signature genes (or fewer, when not available) with $\mathrm{median}_d(R_{gd}) \geq 2$ were selected for the final signature matrix. When processing the Treg signature genes, the data sets belonging to CD4+ T cells were not considered. Treg signature genes were further filtered with a similar approach, but considering RNA-seq data of circulating CD4+ T and Treg cells and selecting only the genes with $\mathrm{median}_d(R_{gd}) \geq 1$.
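The ratio-based filter described above can be illustrated with a minimal Python sketch. It is not the quanTIseq implementation: the function names, the dictionary layout of the data sets, and the toy expression values are assumptions made for illustration only. The sketch also assumes that the candidate genes are supplied already ranked, that all expression values are strictly positive, and that each data set used contains at least one library outside the cell type of interest.

```python
from statistics import median


def specificity_ratio(expr, labels, gene, cell_type):
    """R_gd for one gene in one data set: median expression in libraries of
    `cell_type` divided by median expression in all other libraries."""
    inside = [expr[gene][lib] for lib, ct in labels.items() if ct == cell_type]
    outside = [expr[gene][lib] for lib, ct in labels.items() if ct != cell_type]
    return median(inside) / median(outside)


def select_signature_genes(datasets, ranked_candidates, cell_type,
                           min_ratio=2.0, top_n=30):
    """Keep the first `top_n` candidates (in their given rank order) whose
    median ratio across data sets is at least `min_ratio`."""
    selected = []
    for gene in ranked_candidates:
        # Use only data sets that measure the gene and contain the cell type.
        ratios = [
            specificity_ratio(d["expr"], d["labels"], gene, cell_type)
            for d in datasets
            if gene in d["expr"] and cell_type in d["labels"].values()
        ]
        if ratios and median(ratios) >= min_ratio:
            selected.append(gene)
        if len(selected) == top_n:
            break
    return selected


# Toy data (hypothetical values): two data sets, two cell types, two genes.
toy_datasets = [
    {"labels": {"l1": "B", "l2": "B", "l3": "NK", "l4": "NK"},
     "expr": {"GENE_A": {"l1": 80, "l2": 120, "l3": 10, "l4": 10},
              "GENE_B": {"l1": 100, "l2": 100, "l3": 100, "l4": 110}}},
    {"labels": {"m1": "B", "m2": "NK"},
     "expr": {"GENE_A": {"m1": 60, "m2": 20},
              "GENE_B": {"m1": 90, "m2": 100}}},
]

selected = select_signature_genes(toy_datasets, ["GENE_A", "GENE_B"], "B")
print(selected)
```

In this toy example, GENE_A has ratios of 10 and 3 in the two data sets (median 6.5) and is retained, whereas GENE_B has ratios below 1 and is discarded. The Treg-specific steps would correspond to leaving the CD4+ T cell data sets out of `datasets` and, for the second filter, calling the function with `min_ratio=1.0`.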
The final signature matrix TIL10 (Additional file 1) was built considering the 170 genes satisfying all the criteria reported above. The expression profile of each cell type $c$ was computed as the median of the expression values $x_{gl}$ over all libraries $l$ belonging to that cell type:
$$x_{gc} = \mathrm{median}_{l \in c}(x_{gl})$$
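As an illustration of this per-cell-type median, the following Python sketch builds a small gene-by-cell-type profile. The function name `median_profile`, the nested-dictionary layout, and the toy values are hypothetical and serve only to show the computation; they are not taken from quanTIseq.

```python
from statistics import median


def median_profile(expr, labels, genes, cell_types):
    """Return profile[gene][cell_type]: the median of expr[gene][library]
    over all libraries annotated with that cell type."""
    profile = {}
    for gene in genes:
        profile[gene] = {}
        for cell_type in cell_types:
            values = [expr[gene][lib]
                      for lib, ct in labels.items() if ct == cell_type]
            profile[gene][cell_type] = median(values)
    return profile


# Toy data (hypothetical values): five libraries from two cell types.
toy_labels = {"l1": "B", "l2": "B", "l3": "B", "l4": "NK", "l5": "NK"}
toy_expr = {"GENE_A": {"l1": 80, "l2": 120, "l3": 100, "l4": 10, "l5": 12}}

toy_profile = median_profile(toy_expr, toy_labels, ["GENE_A"], ["B", "NK"])
print(toy_profile)
```

Here the B cell entry for GENE_A is the median of 80, 120, and 100, and the NK cell entry is the median of 10 and 12.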
For the analysis of RNA-seq data, quanTIseq further reduces this signature matrix by removing a manually curated list of genes that showed variable expression in the considered data sets: CD36, CSTA, NRGN, C5AR2, CEP19, CYP4F3, DOCK5, HAL, LRRK2, LY96, NINJ2, PPP1R3B, TECPR2, TLR1, TLR4, TMEM154, and CD248. This default signature, used by quanTIseq for the analysis of RNA-seq data, consists of 153 genes and has a lower condition number than the full TIL10 signature (6.73 compared to 7.45), confirming its higher cell specificity. We advise using the full TIL10 matrix (--rmgenes="none") for the analysis of microarray data, as they often lack some signature genes, and the reduced matrix (--rmgenes="default") for RNA-seq data. Alternatively, the "rmgenes" option allows specifying a custom list of signature genes to be disregarded (see quanTIseq manual).
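The effect of the three settings of the "rmgenes" option can be sketched in Python as follows. This is only an illustration of the behavior described above, not quanTIseq code: the helper `reduce_signature` is hypothetical, and the toy signature pads the 17 curated genes with placeholder names (`GENE_001`, ...) to reach 170 entries, since the full TIL10 gene list is given in Additional file 1 rather than here.

```python
# Curated genes removed by default for RNA-seq data (list taken from the text).
DEFAULT_REMOVED = (
    "CD36", "CSTA", "NRGN", "C5AR2", "CEP19", "CYP4F3", "DOCK5", "HAL",
    "LRRK2", "LY96", "NINJ2", "PPP1R3B", "TECPR2", "TLR1", "TLR4",
    "TMEM154", "CD248",
)


def reduce_signature(signature_genes, rmgenes="default"):
    """Hypothetical helper mimicking the described option: "none" keeps all
    genes, "default" drops the curated list, and any other value is treated
    as a custom collection of gene symbols to drop."""
    if rmgenes == "none":
        removed = set()
    elif rmgenes == "default":
        removed = set(DEFAULT_REMOVED)
    else:
        removed = set(rmgenes)
    return [gene for gene in signature_genes if gene not in removed]


# Toy signature: the 17 curated genes plus 153 placeholder names (170 total).
toy_signature = list(DEFAULT_REMOVED) + [f"GENE_{i:03d}" for i in range(1, 154)]

full = reduce_signature(toy_signature, "none")
reduced = reduce_signature(toy_signature, "default")
custom = reduce_signature(toy_signature, ["CD36", "TLR4"])
print(len(full), len(reduced), len(custom))
```

With the toy signature, the "none" setting keeps all 170 genes and the "default" setting keeps 153, matching the gene counts reported above.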