Our bioinformatic workflow began with a quality check using FastQC and MultiQC to ensure that read quality was acceptable and that transcriptomes were comparable across samples [10,11]. Reads were then mapped to the human genome to identify expressed genes, and transcript abundance was determined by read quantification, yielding raw counts. Finally, to ensure comparability, raw counts were normalized as counts per million (CPM) and transcripts per million (TPM), which were used for differential gene expression (DGE) analysis and principal component analysis (PCA), respectively.
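The two normalizations differ in one step. CPM scales each gene's count by the sample's total count, whereas TPM first divides each count by gene length (in kilobases) and only then rescales to one million. The sketch below illustrates both on a single sample. The gene names, counts, and lengths are hypothetical, and the CPM function uses the raw library size; edgeR's own `cpm()` can instead use effective library sizes when normalization factors have been computed.

```python
def cpm(counts):
    """Counts per million for one sample: each count divided by the library size, times 1e6."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library size is zero")
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: normalize by gene length (kb) first, then rescale to 1e6."""
    # Reads per kilobase for each gene (length in base pairs is an assumed input).
    rates = {gene: c / (lengths_bp[gene] / 1000) for gene, c in counts.items()}
    total_rate = sum(rates.values())
    if total_rate == 0:
        raise ValueError("no reads assigned to any gene")
    return {gene: r / total_rate * 1e6 for gene, r in rates.items()}


# Hypothetical raw counts (e.g. as produced by a counting step) and gene lengths.
counts = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 0}
lengths = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 2000}

print(cpm(counts))           # {'GENE_A': 250000.0, 'GENE_B': 750000.0, 'GENE_C': 0.0}
print(tpm(counts, lengths))  # {'GENE_A': 500000.0, 'GENE_B': 500000.0, 'GENE_C': 0.0}
```

In this toy sample GENE_B has three times the CPM of GENE_A, but the same TPM, because it is also three times longer. This is why TPM is the more natural input for comparing expression profiles across genes, while CPM-based counts feed the count-model DGE analysis.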
In detail, paired-end reads were mapped to the hg38 human reference genome with STAR [12], while duplicate removal, sorting, and indexing were performed with Samtools [13]. Raw gene counts were obtained with htseq-count from the Python package HTSeq [14] and then normalized as CPM and TPM. Starting from CPM, differentially expressed genes (FDR-corrected q-value < 0.05) were identified with the R/Bioconductor package edgeR [15], comparing either the leiomyosarcoma or the osteosarcoma case against a group comprising angiosarcomas (CS6 and CS7) and intimal sarcomas (CS1, CS2, CS3, CS4, and CS5). TPM values were instead used for PCA, performed with the R function prcomp [16], which also included the intimal sarcomas (n = 5) and angiosarcomas (n = 2).
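The q-value threshold refers to p-values adjusted for multiple testing. edgeR's `topTags` uses the Benjamini-Hochberg procedure by default (`adjust.method = "BH"`); the exact adjustment used here is not stated, so the following is only a minimal sketch of that default. It ranks the p-values, scales each by m/rank, and enforces monotonicity from the largest p-value down. The gene names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so that q-values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical per-gene p-values from a DGE test.
pvals = {"GENE_1": 0.01, "GENE_2": 0.04, "GENE_3": 0.03, "GENE_4": 0.20}
qvals = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))

# Keep genes passing the q-value < 0.05 threshold.
significant = [gene for gene, q in qvals.items() if q < 0.05]
print(significant)  # ['GENE_1']
```

GENE_2 and GENE_3 have raw p-values below 0.05 but adjusted values of about 0.053, so only GENE_1 survives the correction. This shows why the FDR threshold is stricter than filtering on raw p-values.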
Additionally, gene fusion detection and small variant calling, combined with variant annotation via Nirvana, were performed on BaseSpace, an Illumina web tool (http://euc1.sh.basespace.illumina.com/, accessed on 4 June 2022). For gene fusion detection, protein-coding and lncRNA genes were included to reduce false positives.
The output VCF files were filtered to retain only stop-gain, splice donor, splice acceptor, splice region, frameshift indel, in-frame deletion, in-frame insertion, initiator codon/ATG loss, and missense variants that were predicted as deleterious or probably damaging by the SIFT predictor and had a gnomAD frequency < 0.01, a quality score ≥ 30, a total read depth > 14, an alternate allele depth > 3, and a variant read frequency > 0.15. Finally, only genes belonging to Tier 1 of the Cancer Gene Census (https://cancer.sanger.ac.uk/census, accessed on 4 June 2022) were considered for further analysis. Sequencing data from both samples were analyzed with the DRAGEN RNA app (version 3.10.4).
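The filtering criteria combine a consequence class, a pathogenicity prediction, population frequency, and read-support thresholds, followed by a gene-list restriction. A minimal sketch of this logic is shown below, assuming variants have already been parsed from the annotated output into simple records. The field names, the Sequence Ontology-style consequence labels, the computation of variant read frequency as alt depth divided by total depth, and the treatment of variants absent from gnomAD as rare are all assumptions; they do not reproduce the Nirvana or DRAGEN output schema. The Tier 1 gene set is an illustrative subset, not the full Cancer Gene Census list.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Assumed consequence labels for the retained variant classes.
KEPT_CONSEQUENCES = {
    "stop_gained", "splice_donor_variant", "splice_acceptor_variant",
    "splice_region_variant", "frameshift_variant", "inframe_deletion",
    "inframe_insertion", "initiator_codon_variant", "start_lost",
    "missense_variant",
}
DAMAGING_PREDICTIONS = {"deleterious", "probably damaging"}


@dataclass
class Variant:
    gene: str
    consequence: str
    prediction: str
    gnomad_af: Optional[float]  # None if the variant is absent from gnomAD (hypothetical field)
    qual: float
    total_depth: int
    alt_depth: int


def passes_filters(v: Variant, tier1_genes: Set[str]) -> bool:
    """Apply the consequence, prediction, frequency, quality, depth, and gene-list filters."""
    vaf = v.alt_depth / v.total_depth if v.total_depth else 0.0
    return (
        v.consequence in KEPT_CONSEQUENCES
        and v.prediction in DAMAGING_PREDICTIONS
        and (v.gnomad_af is None or v.gnomad_af < 0.01)
        and v.qual >= 30
        and v.total_depth > 14
        and v.alt_depth > 3
        and vaf > 0.15
        and v.gene in tier1_genes
    )


TIER1_SUBSET = {"TP53", "RB1"}  # illustrative subset only
variants = [
    Variant("TP53", "missense_variant", "deleterious", None, 50.0, 40, 12),  # passes all filters
    Variant("TP53", "missense_variant", "deleterious", None, 50.0, 10, 5),   # total depth too low
    Variant("RB1", "synonymous_variant", "tolerated", 0.2, 60.0, 80, 40),    # excluded class
]
kept = [v for v in variants if passes_filters(v, TIER1_SUBSET)]
print(len(kept))  # 1
```

Applying the gene-list restriction inside the same predicate keeps the sketch compact; in practice the Tier 1 restriction could equally be applied as a separate step after the technical filters, as the text describes.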