Bulk RNA-sequencing expression profiles and the corresponding clinical data were downloaded from the TCGA database (https://portal.gdc.com; n = 377). Raw sequencing reads were aligned using the STAR aligner, and expression was quantified as fragments per kilobase of transcript per million mapped reads (FPKM). Gene expression profiles were standardized using R (https://www.r-project.org/). Only patients with complete clinical information relevant to the analysis were retained. Patients were randomly assigned to training and testing groups at a ratio of 1:1. To establish an independent validation cohort, clinicopathological and RNA-Seq mRNA expression data were obtained for 232 samples from the ICGC portal (https://dcc.icgc.org/projects/LIRI-JP). Somatic mutation and methylation data for HCC were retrieved from the UCSC Xena server (https://xenabrowser.net/). Single-cell RNA sequencing data of primary HCC tissues were downloaded from the GEO database (GSE149614, n = 10). The single-cell RNA-Seq data were normalized with the “NormalizeData” function of the “Seurat” R package, and the top 3,000 highly variable genes were identified with the “FindVariableGenes” function. Cell types were determined from the marker genes shown in Supplementary Figure S1A (malignant cell markers: GPC3, CD24, MDK, KRT18; myeloid cell markers: CD68, AIF1, C1QA, TPSAB1; T cell markers: CD3D, CD3E, CD2; B cell markers: MZB1, MS4A1, CD79A; fibroblast markers: COL1A2, COL3A1, ACTA2; endothelial cell markers: FLT1, RAMP2, PLVAP).
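Two of the steps described above, the random 1:1 assignment and the marker-based determination of cell types, can be illustrated with a minimal Python sketch. It is not the code used in the study, which was carried out in R. The function names `split_cohort` and `label_cluster`, the `seed` parameter, and the rule of labelling a cluster by the highest average marker expression are all assumptions introduced for illustration. Only the marker genes themselves are taken from the text.

```python
import random

def split_cohort(patient_ids, seed=0):
    """Randomly split patient IDs into two groups of (near-)equal size.

    The seed is an assumption added for reproducibility; the source
    does not state how the random assignment was seeded.
    """
    ids = sorted(patient_ids)      # sort so the result depends only on the seed
    rng = random.Random(seed)      # local generator, leaves global state untouched
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]  # (training, testing)

# Marker genes as listed in the text.
MARKERS = {
    "Malignant": ["GPC3", "CD24", "MDK", "KRT18"],
    "Myeloid": ["CD68", "AIF1", "C1QA", "TPSAB1"],
    "T cell": ["CD3D", "CD3E", "CD2"],
    "B cell": ["MZB1", "MS4A1", "CD79A"],
    "Fibroblast": ["COL1A2", "COL3A1", "ACTA2"],
    "Endothelial": ["FLT1", "RAMP2", "PLVAP"],
}

def label_cluster(mean_expr):
    """Return the cell type whose markers have the highest average expression.

    mean_expr maps gene symbol -> mean normalized expression in one cluster.
    Genes absent from the mapping count as 0. This scoring rule is a
    hypothetical simplification, not the procedure reported in the source.
    """
    scores = {
        cell_type: sum(mean_expr.get(gene, 0.0) for gene in genes) / len(genes)
        for cell_type, genes in MARKERS.items()
    }
    return max(scores, key=scores.get)

if __name__ == "__main__":
    # Hypothetical patient identifiers, for demonstration only.
    training, testing = split_cohort([f"P{i:03d}" for i in range(10)], seed=42)
    print(len(training), len(testing))
    print(label_cluster({"CD3D": 2.0, "CD3E": 1.5, "CD2": 1.0, "GPC3": 0.1}))
```

When the cohort size is odd, the extra patient falls into the testing group, so the ratio is only approximately 1:1. In practice, cluster annotation is usually checked visually against marker expression plots rather than decided by a single score.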