We considered four external expression data sets from enriched or purified immune cells: two microarray data sets (GEO accessions GSE28490 and GSE2849) [27], an RNA-seq data set [28], and a microarray compendium that was used to build the CIBERSORT LM22 signature matrix [17]. All data sets were preprocessed and normalized as explained in the previous paragraphs. For each gene g specific for a cell type c in the signature matrix, we computed the ratio $R_{gd}$ between the median expression across all libraries in data set d belonging to cell type c and the median expression across all libraries in data set d not belonging to cell type c. For each cell type, the top 30 ranked signature genes (or fewer, when not available) with $\mathrm{median}_d(R_{gd}) \geq 2$ were selected for the final signature matrix. When processing the Treg signature genes, the data sets belonging to CD4+ T cells were not considered. Treg signature genes were further filtered with a similar approach, but considering the RNA-seq data of circulating CD4+ T and Treg cells and selecting only the genes with $\mathrm{median}_d(R_{gd}) \geq 1$.
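The ratio-based filter described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, the handling of a zero denominator, and the choice to rank genes by their median ratio across data sets are all assumptions made for illustration (the text does not state how the top 30 genes were ranked).

```python
from statistics import median

def specificity_ratio(expr, labels, cell_type):
    """R_gd for one gene in one data set: median expression in libraries of
    `cell_type` divided by median expression in all other libraries."""
    inside = [x for x, lab in zip(expr, labels) if lab == cell_type]
    outside = [x for x, lab in zip(expr, labels) if lab != cell_type]
    denom = median(outside)
    # Assumption: a zero denominator is treated as an infinite ratio.
    return float("inf") if denom == 0 else median(inside) / denom

def select_signature_genes(datasets, candidates, cell_type,
                           min_ratio=2.0, top_n=30):
    """datasets: list of (expr_by_gene, labels) pairs, one per data set d.
    Keeps genes whose median ratio across data sets is >= min_ratio."""
    scored = []
    for gene in candidates:
        ratios = [specificity_ratio(expr[gene], labels, cell_type)
                  for expr, labels in datasets if gene in expr]
        if ratios and median(ratios) >= min_ratio:
            scored.append((median(ratios), gene))
    # Assumption: rank by median ratio, highest first.
    scored.sort(reverse=True)
    return [gene for _, gene in scored[:top_n]]

# Toy data: two data sets, two hypothetical genes "A" and "B".
datasets = [
    ({"A": [10, 12, 1, 2], "B": [3, 3, 3, 3]},
     ["B cell", "B cell", "NK", "NK"]),
    ({"A": [8, 2, 2], "B": [5, 4, 4]},
     ["B cell", "NK", "NK"]),
]
selected = select_signature_genes(datasets, ["A", "B"], "B cell")
```

In this toy example only gene "A" passes the threshold of 2; the Treg-specific variant would use the same routine with `min_ratio=1.0` on a different set of libraries.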
The final signature matrix TIL10 (Additional file 1) was built from the 170 genes satisfying all the criteria reported above. The expression profile of each cell type c was computed as the median of the expression values $x_{gl}$ over all libraries l belonging to that cell type: $x_{gc} = \mathrm{median}_{l \in c}\, x_{gl}$.
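As a small worked example of the per-cell-type median, the following Python sketch collapses library-level expression values into one profile per cell type. The function name, gene name, and values are hypothetical and serve only to illustrate the formula.

```python
from collections import defaultdict
from statistics import median

def build_signature(expr, labels):
    """expr: gene -> list of expression values, one per library l.
    labels: cell type of each library, in the same order.
    Returns gene -> {cell type c -> median expression x_gc}."""
    signature = {}
    for gene, values in expr.items():
        by_type = defaultdict(list)
        for x, cell_type in zip(values, labels):
            by_type[cell_type].append(x)
        signature[gene] = {c: median(v) for c, v in by_type.items()}
    return signature

# Toy input: one hypothetical gene measured in five libraries.
toy_expr = {"GENE_A": [10.0, 14.0, 12.0, 1.0, 3.0]}
toy_labels = ["B cell", "B cell", "B cell", "NK", "NK"]
sig = build_signature(toy_expr, toy_labels)
```

Here the B cell entry is the median of 10, 14, and 12, and the NK entry is the median of 1 and 3.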
For the analysis of RNA-seq data, quanTIseq further reduces this signature matrix by removing a manually curated list of genes that showed variable expression in the considered data sets: CD36, CSTA, NRGN, C5AR2, CEP19, CYP4F3, DOCK5, HAL, LRRK2, LY96, NINJ2, PPP1R3B, TECPR2, TLR1, TLR4, TMEM154, and CD248. This default signature, which quanTIseq uses for the analysis of RNA-seq data, consists of 153 genes and has a lower condition number than the full TIL10 signature (6.73 compared to 7.45), confirming its higher cell specificity. We advise using the full TIL10 matrix (--rmgenes="none") for the analysis of microarray data, as they often lack some signature genes, and the reduced matrix (--rmgenes="default") for RNA-seq data. Alternatively, the "rmgenes" option allows specifying a custom list of signature genes to be disregarded (see quanTIseq manual).
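To make the link between gene removal and the condition number concrete, here is a hedged Python sketch using NumPy. The helper `drop_genes`, the gene names, and the matrix values are invented for illustration and do not reproduce quanTIseq's own handling of "rmgenes"; the sketch only shows that removing rows shared between cell types can lower the 2-norm condition number of a signature matrix.

```python
import numpy as np

def drop_genes(signature, genes, remove):
    """Return the signature rows (genes x cell types) and the gene names
    left after discarding every gene listed in `remove`."""
    remove = set(remove)
    keep = [i for i, g in enumerate(genes) if g not in remove]
    return signature[keep, :], [genes[i] for i in keep]

# Toy 4-gene x 2-cell-type matrix; all values are made up.
genes = ["G1", "G2", "G3", "G4"]
full = np.array([[10.0, 0.0],   # specific for cell type 1
                 [0.0, 10.0],   # specific for cell type 2
                 [5.0, 5.0],    # shared between both cell types
                 [6.0, 4.0]])   # weakly specific
reduced, kept = drop_genes(full, genes, remove=["G3", "G4"])

# 2-norm condition number (ratio of largest to smallest singular value).
cond_full = np.linalg.cond(full)
cond_reduced = np.linalg.cond(reduced)
```

In this toy case the reduced matrix has perfectly separated profiles and therefore the smallest possible condition number, while the full matrix, which includes the shared genes, has a larger one.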