The FPKM expression matrix for lung adenocarcinoma was obtained from the Genomic Data Commons (GDC) repository (https://portal.gdc.cancer.gov/). Gene expression profiles of lung adenocarcinoma, comprising microarray and high-throughput sequencing transcriptome data, were queried from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/): GSE42127, GSE72094, GSE26939, GSE31547, GSE19188, GSE14814, GSE37745 and GSE5828 (bulk transcriptome data for validation), together with GSE135222, PRJEB23709 and phs000452 (bulk transcriptome data for validating the impact on immunotherapy).

The GSE117570 lung adenocarcinoma single-cell dataset was collected from the TISCH2 database (http://tisch.compgenomics.org/) using a text-mining-based data parsing workflow. We retained genes expressed in at least 3 cells, and cells with at least 200 detected genes, UMI counts in the range 500-6500 (set according to the observed distribution) and a percentage of mitochondrial reads < 80%. The single-cell data, provided as UMI transcript matrices together with cell information matrices, were merged into the Seurat objects required for analysis. The data were normalised with LogNormalize, and batch effects between samples were checked by UMAP; no significant batch effects between samples were found.
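The single-cell filtering and normalisation steps described above can be illustrated with a minimal Python sketch. This is not the authors' code (the analysis itself was carried out with Seurat objects). The function names `qc_filter` and `log_normalize`, the `MT-` prefix used to flag mitochondrial genes, and the order of the filters (genes first, then cells) are assumptions made for illustration. The normalisation follows the usual definition of LogNormalize: each count is divided by the cell's total count, multiplied by a scale factor (assumed here to be the common default of 10,000) and transformed with the natural log of one plus the value.

```python
import math


def qc_filter(counts, genes, min_cells=3, min_genes=200,
              umi_range=(500, 6500), max_mito_pct=80.0, mito_prefix="MT-"):
    """Filter a cells x genes UMI count matrix (list of per-cell lists).

    Thresholds default to the values quoted in the text. The gene-then-cell
    order and the mitochondrial prefix are assumptions of this sketch.
    """
    n_genes = len(genes)
    # Keep genes detected (count > 0) in at least `min_cells` cells.
    detected_in = [sum(1 for cell in counts if cell[g] > 0)
                   for g in range(n_genes)]
    keep_genes = [g for g in range(n_genes) if detected_in[g] >= min_cells]
    # Assumed convention: mitochondrial gene symbols start with "MT-".
    mito = [g for g in keep_genes if genes[g].upper().startswith(mito_prefix)]

    kept_cells = []
    for i, cell in enumerate(counts):
        total = sum(cell[g] for g in keep_genes)
        n_detected = sum(1 for g in keep_genes if cell[g] > 0)
        mito_pct = 100.0 * sum(cell[g] for g in mito) / total if total else 0.0
        if (n_detected >= min_genes
                and umi_range[0] <= total <= umi_range[1]
                and mito_pct < max_mito_pct):
            kept_cells.append(i)

    filtered = [[counts[i][g] for g in keep_genes] for i in kept_cells]
    return filtered, [genes[g] for g in keep_genes], kept_cells


def log_normalize(counts, scale_factor=10000.0):
    """Per-cell normalisation: log1p(count / cell_total * scale_factor)."""
    normalised = []
    for cell in counts:
        total = sum(cell)
        normalised.append(
            [math.log1p(c / total * scale_factor) if total else 0.0
             for c in cell]
        )
    return normalised


if __name__ == "__main__":
    # Toy matrix with deliberately lowered thresholds so the effect is visible.
    toy_genes = ["MT-CO1", "A", "B", "C"]
    toy_counts = [[1, 5, 4, 0], [9, 1, 0, 0], [0, 3, 3, 0], [0, 1, 0, 1]]
    kept, kept_genes, kept_idx = qc_filter(
        toy_counts, toy_genes, min_cells=2, min_genes=2,
        umi_range=(5, 20), max_mito_pct=80.0)
    print(kept_genes, kept_idx)
    print(log_normalize(kept))
```

With real data the default thresholds would be used unchanged; the toy call lowers them only because a four-gene matrix cannot satisfy a 200-gene minimum.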