Principal component analysis generally used all samples, including outliers, and was performed with SAS v9.4 PROC PRINCOMP with no additional options. The analysis was run on log-transformed signal, centered by gene.
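The preprocessing and decomposition described above can be sketched in Python. This is only an illustration, not the SAS procedure itself. The function name, the pseudocount, and the choice of log base 2 are assumptions, as is treating samples as observations and genes as variables. The sketch decomposes the gene-centered matrix directly (a covariance-style PCA), whereas PROC PRINCOMP without the COV option works from the correlation matrix, so numerical results may differ.

```python
import numpy as np

def pca_gene_centered(signal, n_components=2, pseudocount=1.0):
    """PCA sketch. `signal` is a genes x samples array of non-negative values.

    Hypothetical helper: the pseudocount and log base are assumptions.
    """
    # Log-transform the signal (pseudocount avoids log of zero).
    log_signal = np.log2(np.asarray(signal, dtype=float) + pseudocount)
    # Center by gene: subtract each gene's mean across samples.
    centered = log_signal - log_signal.mean(axis=1, keepdims=True)
    # Samples are the observations, so decompose the samples x genes matrix.
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    # Sample scores on the leading components.
    scores = u[:, :n_components] * s[:n_components]
    # Fraction of total variance carried by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]
```

Outlying samples are left in the input on purpose, matching the statement that all samples were used.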
Hierarchical clustering generally used correlation as the similarity measure and centroid linkage.
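A minimal sketch of correlation-based clustering with centroid linkage, using SciPy. The function name and the number of clusters are hypothetical. Centroid linkage is only well defined for Euclidean distances, so the sketch z-scores each profile first: for z-scored profiles the squared Euclidean distance is proportional to one minus the Pearson correlation, which preserves the correlation ordering. Whether the original software handled this the same way is not stated in the text.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def correlation_centroid_clusters(profiles, n_clusters):
    """Cluster the rows of `profiles` (items x features).

    Hypothetical helper; rows with zero variance are not handled.
    """
    x = np.asarray(profiles, dtype=float)
    # Z-score each row so Euclidean distance tracks 1 - Pearson correlation.
    z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    # Centroid linkage on Euclidean distances between standardized rows.
    tree = linkage(z, method="centroid", metric="euclidean")
    # Cut the tree into at most `n_clusters` flat clusters.
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels
```

The returned `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` for plotting.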
Simple differential analysis of groups using the RNA-Seq data was performed with two methods. In the first, the t-statistic with a mildly stringent unadjusted p-value threshold of 0.001 was combined with a fold change threshold of 1.5 to generate comparator lists. In the second, we used the RNA-Seq differential expression method DESeq2 [28] with the same fold change threshold and a multiple-testing-adjusted p-value threshold of 0.01. The linear model for the RNA-Seq analysis used 148 of the original 150 samples (specimens 4-4 and 51-4 were omitted). The model was fitted using SAS v9.4 PROC MIXED with subject as a random effect and fixed-effect terms for tissue type, collection site, and preservation protocol. Only effects with unadjusted p < 0.001 were kept for meta-analysis across genes.
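The first of the two differential methods, a t-test combined with a fold change cutoff, can be sketched as follows. The text does not say which t-test variant was used or how fold change was computed, so the sketch assumes a two-sided, equal-variance Student's t-test on log2 values and a fold change taken as the difference of group means on the log2 scale. The function and argument names are hypothetical.

```python
import numpy as np
from scipy import stats

def t_test_fold_change_filter(log2_a, log2_b, p_cutoff=0.001, fold_cutoff=1.5):
    """Select genes by unadjusted p-value and fold change.

    `log2_a`, `log2_b`: genes x samples arrays of log2 signal for two groups.
    Hypothetical helper; the test variant is an assumption.
    """
    log2_a = np.asarray(log2_a, dtype=float)
    log2_b = np.asarray(log2_b, dtype=float)
    # Per-gene two-sided Student's t-test (equal variances assumed).
    _, p_values = stats.ttest_ind(log2_a, log2_b, axis=1)
    # Log2 fold change as the difference of group means on the log2 scale.
    log2_fc = log2_a.mean(axis=1) - log2_b.mean(axis=1)
    # Keep genes passing both the p-value and the fold change thresholds.
    selected = (p_values < p_cutoff) & (np.abs(log2_fc) >= np.log2(fold_cutoff))
    return selected, log2_fc, p_values
```

Applying both thresholds together excludes genes that are statistically significant but change by less than 1.5-fold, as well as large changes that are poorly supported.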
We used Levene’s test (two-sided) for homogeneity of variance (SAS) when examining variation in miRNA expression by protocol.
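As an illustration of the homogeneity-of-variance check, the sketch below runs Levene's test with SciPy on expression values grouped by protocol. The function name, the dictionary layout, and the protocol labels in the usage note are hypothetical. `scipy.stats.levene` is based on absolute deviations from the group center, while the Levene option in SAS can use squared deviations, so the statistic may not match SAS output exactly.

```python
from scipy import stats

def levene_by_protocol(values_by_protocol, center="mean"):
    """Levene's test across protocols for one miRNA.

    `values_by_protocol`: dict mapping a protocol label to a list of values.
    Hypothetical helper; center="mean" gives the classic Levene form.
    """
    # One group of expression values per protocol.
    groups = [list(values) for values in values_by_protocol.values()]
    # Test the null hypothesis that all groups share the same variance.
    statistic, p_value = stats.levene(*groups, center=center)
    return statistic, p_value
```

For example, `levene_by_protocol({"protocol_A": [...], "protocol_B": [...]})` returns the test statistic and p-value for a single miRNA. A small p-value suggests that the variance differs between protocols.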