In detail, STAR was used to map paired-end reads to the human reference genome hg38 [12], while duplicate removal, sorting, and indexing were performed with Samtools [13]. Raw gene counts were obtained with htseq-count from the Python package HTSeq [14], and gene expression was then normalized as counts per million (CPM) and transcripts per million (TPM). Starting from CPM, genes differentially expressed (FDR-corrected q-value < 0.05) between either the leiomyosarcoma or the osteosarcoma case and the group comprising angiosarcomas (CS6 and CS7) and intimal sarcomas (CS1, CS2, CS3, CS4, and CS5) were identified with the R/Bioconductor package edgeR [15]. TPM values, instead, were used to perform principal component analysis (PCA) with the prcomp function of R's stats package [16]; the PCA also included the intimal sarcomas (n = 5) and angiosarcomas (n = 2).
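The two normalizations and the FDR correction mentioned above can be illustrated with a minimal Python sketch. It assumes a per-sample dictionary of raw counts and a dictionary of gene lengths in base pairs (the text does not state which length definition was used for TPM), and it shows only the Benjamini-Hochberg step applied to p-values, which is the default adjustment in edgeR's topTags; the p-values themselves would come from edgeR's own tests. Plain CPM here divides by the raw library size, whereas edgeR's cpm() can also apply normalization factors. All gene names and numbers are hypothetical.

```python
# Minimal sketch (assumption): per-sample CPM/TPM from raw counts and
# Benjamini-Hochberg adjustment of p-values. Gene names, counts, lengths,
# and p-values are hypothetical inputs, not data from this study.
from typing import Dict, List


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: each gene's count divided by the library size, times 1e6."""
    library_size = sum(counts.values())
    return {gene: c / library_size * 1e6 for gene, c in counts.items()}


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million: length-normalize first (reads per kilobase),
    then rescale so that the values of one sample sum to 1e6."""
    rates = {gene: c / (lengths_bp[gene] / 1000) for gene, c in counts.items()}
    total_rate = sum(rates.values())
    return {gene: r / total_rate * 1e6 for gene, r in rates.items()}


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg FDR adjustment; returns q-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}     # hypothetical
    lengths = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 2000}  # hypothetical, bp
    print(cpm(counts))
    print(tpm(counts, lengths))
    q = bh_adjust([0.01, 0.04, 0.03, 0.2])
    print([round(x, 4) for x in q])  # [0.04, 0.0533, 0.0533, 0.2]
    # Genes would then be called differentially expressed when q < 0.05.
```

Note that CPM ignores gene length, which suits between-sample count comparisons, while TPM corrects for length so that every sample sums to the same total, which is why it is a common input for PCA.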
Additionally, gene fusion detection and small variant calling, combined with variant annotation via Nirvana, were performed on BaseSpace, an Illumina web-based platform (
The output VCF files were filtered to retain only the following variant classes: stop gain, splice donor, splice acceptor, splice region, frameshift indels, in-frame deletions, in-frame insertions, initiator codon/ATG loss, and missense variants predicted as deleterious or probably damaging by the SIFT predictor. Retained variants were further required to have a gnomAD allele frequency < 0.01, a quality score ≥ 30, a total read depth > 14, an alternate allele depth > 3, and a variant read frequency > 0.15. Finally, only genes belonging to Tier 1 of the Cancer Gene Census list (
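The filtering criteria above can be expressed as a minimal Python sketch. It assumes the annotated variants have already been parsed into simple records; the field names, the Sequence Ontology-style consequence labels standing in for the listed variant classes, and the example data are all hypothetical, and the exact strings in the real output may differ. It also assumes that the damaging-prediction requirement applies only when missense is the sole qualifying consequence and that variants absent from gnomAD pass the frequency filter. The Tier 1 gene set is passed in as a parameter rather than reproduced here.

```python
# Minimal sketch (assumption): the variant filters described above, applied to
# annotated variants already parsed into plain records. Field names,
# consequence labels, and example data are hypothetical.
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set

# Consequence terms assumed to correspond to the variant classes in the text.
ALLOWED_CONSEQUENCES: FrozenSet[str] = frozenset({
    "stop_gained", "splice_donor_variant", "splice_acceptor_variant",
    "splice_region_variant", "frameshift_variant", "inframe_deletion",
    "inframe_insertion", "initiator_codon_variant", "start_lost",
    "missense_variant",
})
# Prediction labels accepted for missense variants, copied from the text.
DAMAGING_PREDICTIONS: FrozenSet[str] = frozenset({"deleterious", "probably damaging"})


@dataclass
class Variant:
    gene: str
    consequences: FrozenSet[str]
    prediction: Optional[str]   # functional prediction label, if any
    gnomad_af: Optional[float]  # None = absent from gnomAD (assumed to pass)
    qual: float                 # variant quality score
    total_depth: int            # total read depth at the site
    alt_depth: int              # reads supporting the alternate allele
    vaf: float                  # variant read frequency


def passes_filters(v: Variant, tier1_genes: Set[str]) -> bool:
    hits = v.consequences & ALLOWED_CONSEQUENCES
    if not hits:
        return False
    # Assumption: a damaging prediction is required only when missense is the
    # sole qualifying consequence.
    if hits == {"missense_variant"} and v.prediction not in DAMAGING_PREDICTIONS:
        return False
    if v.gnomad_af is not None and v.gnomad_af >= 0.01:
        return False
    return (
        v.qual >= 30
        and v.total_depth > 14
        and v.alt_depth > 3
        and v.vaf > 0.15
        and v.gene in tier1_genes
    )


def filter_variants(variants: Iterable[Variant], tier1_genes: Set[str]) -> List[Variant]:
    """Keep only variants that pass every filter and fall in a Tier 1 gene."""
    return [v for v in variants if passes_filters(v, tier1_genes)]


if __name__ == "__main__":
    tier1 = {"GENE_X"}  # hypothetical stand-in for the Tier 1 gene list
    variants = [
        Variant("GENE_X", frozenset({"stop_gained"}), None, None, 50.0, 30, 10, 0.33),
        Variant("GENE_X", frozenset({"missense_variant"}), "tolerated", 0.001, 50.0, 30, 10, 0.33),
        Variant("GENE_Y", frozenset({"frameshift_variant"}), None, None, 60.0, 40, 12, 0.30),
    ]
    for v in filter_variants(variants, tier1):
        print(v.gene, sorted(v.consequences))
```

Keeping each threshold as a separate, explicit comparison makes it easy to check the code against the stated cut-offs, including the strict (>) versus inclusive (≥) boundaries.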