We developed a text-mining-based data parsing workflow and collected human tumor scRNA-seq datasets from GEO (16) and ArrayExpress (17). We searched the description pages of GEO and ArrayExpress for single-cell-related keywords such as ‘single cell RNA sequencing’, ‘scRNAseq’, ‘single cell’ or ‘single-cell’; technology-related keywords such as ‘microfluidics’, ‘10X Genomics’ and ‘SMARTseq’; and tumor-related keywords such as ‘tumor’, ‘cancer’ or ‘carcinoma’. Each dataset was then manually confirmed and curated. A total of 118 cancer-related scRNA-seq datasets were obtained initially and were further filtered to keep those with >1000 high-quality cells. To expand the utility of TISCH, we also included scRNA-seq datasets of mice treated with immunotherapy and three scRNA-seq datasets of human peripheral blood mononuclear cells (PBMCs) from 10X Genomics. Overall, the TISCH database contains 76 high-quality tumor datasets across 27 cancer types and three PBMC datasets (Supplementary Table S1). For each dataset, we downloaded the expression matrix of raw counts, TPM or FPKM (if available). We collected sample information from the databases or the original studies, such as patient ID, tissue origin, treatment condition, response group and original cell-type annotation. Notably, if a dataset contained multiple cancer types, we processed each cancer type separately. The source code for processing all the collected scRNA-seq datasets is deposited in the GitHub repository (https://github.com/DongqingSun96/TISCH/tree/master/code).
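The keyword screen and the cell-count filter described above can be illustrated with a minimal Python sketch. It is not the code from the TISCH repository. The keyword lists and the >1000-cell threshold come from the text, while the function names, the record fields (`id`, `description`, `n_cells`) and the way the keyword groups are combined (a single-cell or technology keyword together with a tumor keyword) are assumptions made for illustration only.

```python
import re

# Keyword groups quoted in the text.
SINGLE_CELL = ["single cell RNA sequencing", "scRNAseq", "single cell", "single-cell"]
TECHNOLOGY = ["microfluidics", "10X Genomics", "SMARTseq"]
TUMOR = ["tumor", "cancer", "carcinoma"]

# Threshold from the text: keep datasets with >1000 high-quality cells.
MIN_CELLS = 1000


def compile_keywords(keywords):
    """Build one case-insensitive pattern that matches any keyword literally."""
    return re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)


# Assumption: a single-cell OR technology keyword counts as single-cell evidence.
SC_PATTERN = compile_keywords(SINGLE_CELL + TECHNOLOGY)
TUMOR_PATTERN = compile_keywords(TUMOR)


def is_candidate(description):
    """Return True if the description mentions single-cell evidence and a tumor term."""
    return bool(SC_PATTERN.search(description)) and bool(TUMOR_PATTERN.search(description))


def passes_cell_filter(n_cells):
    """Return True if the dataset has strictly more than MIN_CELLS cells."""
    return n_cells > MIN_CELLS


def select_datasets(records):
    """Return the ids of records that pass both the keyword and cell-count filters.

    Each record is a dict with hypothetical fields: 'id', 'description', 'n_cells'.
    """
    return [
        r["id"]
        for r in records
        if is_candidate(r["description"]) and passes_cell_filter(r["n_cells"])
    ]


if __name__ == "__main__":
    # Made-up placeholder records, not real GEO or ArrayExpress entries.
    records = [
        {"id": "DATASET_A", "description": "Single-cell RNA-seq of melanoma tumor", "n_cells": 4500},
        {"id": "DATASET_B", "description": "Bulk RNA-seq of lung cancer", "n_cells": 0},
        {"id": "DATASET_C", "description": "scRNAseq of breast carcinoma", "n_cells": 800},
    ]
    print(select_datasets(records))
```

A screen of this kind only produces candidates; as stated above, each dataset was still manually confirmed and curated afterwards.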