Overview of simulated data sets.
| | Cyclic data set | Bifurcating data set | Multifurcating data set |
|---|---|---|---|
| Simulation framework | | | |
| Number of cells | 505–508 | 500 | 750 |
| Number of genes | 312–444 | 5000 | 5000 |
| % of DE genes | 42–47% | 20% | 20% |
| Number of lineages | 1 | 2 | 3 |
| Topology | Cyclic | Bifurcating | Multifurcating |
| Number of data sets | 10 | 10 | 1 |
Each data set is simulated using one of the frameworks from the dynverse toolbox (dyngen or dyntoy), which simulate scRNA-seq data along a prescribed trajectory topology. Each data set is characterized by the topology of its trajectory and by its numbers of cells and genes. Low-dimensional representations of representative data sets can be found in Fig.
Prior to trajectory inference, the simulated counts are normalized using full-quantile normalization35,36. For trajectory inference (TI) with slingshot, we reduce the dimensionality of the normalized counts with principal component analysis (PCA) and then apply k-means clustering in PCA space. For the bifurcating and multifurcating trajectories, the start and end clusters of the true trajectory are provided to slingshot to aid it in inferring the trajectory. For the edgeR analysis, we assess differential expression (DE) between the same end clusters that are provided to slingshot. Because the BEAM method can only test one bifurcation point at a time, for the multifurcating data set we assess both branching points separately and aggregate the resulting p-values using Fisher’s method37. For the tradeSeq and edgeR analyses of the multifurcating data set, we perform global tests across all three lineages.
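Two numeric steps in this paragraph, full-quantile normalization and Fisher's combination of p-values, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the helper names `full_quantile_normalize` and `fisher_combine` are hypothetical, the reference distribution is assumed to be the mean of the sorted cells (implementations differ, for example mean vs. median reference and tie handling), and Fisher's method assumes independent, strictly positive p-values. Under the null, X = -2 Σ log p_i follows a χ² distribution with 2k degrees of freedom, whose upper tail for even degrees of freedom has the closed form exp(-X/2) Σ_{j=0}^{k-1} (X/2)^j / j!.

```python
import math
from typing import List, Sequence


def full_quantile_normalize(counts: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: quantile-normalize a genes x cells matrix (list of rows).

    Assumed variant: the reference value at rank r is the mean of the r-th
    smallest values across all cells; each entry is replaced by the reference
    value at its within-cell rank (ties are broken by gene position).
    """
    n_genes, n_cells = len(counts), len(counts[0])
    columns = [[counts[g][c] for g in range(n_genes)] for c in range(n_cells)]
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[r] for col in sorted_cols) / n_cells for r in range(n_genes)]
    normalized = [[0.0] * n_cells for _ in range(n_genes)]
    for c, col in enumerate(columns):
        # order[r] is the index of the gene holding the r-th smallest value in cell c
        order = sorted(range(n_genes), key=lambda g: col[g])
        for r, g in enumerate(order):
            normalized[g][c] = reference[r]
    return normalized


def fisher_combine(p_values: Sequence[float]) -> float:
    """Hypothetical helper: combine independent p-values (all > 0) with Fisher's method.

    X = -2 * sum(log p_i) ~ chi-squared with 2k df under the null; for even df
    the upper tail is exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!.
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    term, tail = 1.0, 0.0
    for j in range(k):
        if j > 0:
            term *= half / j  # builds (X/2)^j / j! incrementally
        tail += term
    return math.exp(-half) * tail
```

For example, `fisher_combine([0.05, 0.05])` returns roughly 0.0175, smaller than either input, since two moderately small p-values jointly carry more evidence against the null than one. With a single p-value the function returns that p-value unchanged.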
We assess performance based on scatterplots of the true positive rate (TPR) vs. the false discovery proportion (FDP), defined as

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad \mathrm{FDP} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TP}},$$

where FN, FP, and TP denote, respectively, the numbers of false negatives, false positives, and true positives. FDP-TPR curves are calculated and plotted with the Bioconductor R package iCOBRA38.
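An FDP-TPR curve can be illustrated with a short Python sketch, assuming TPR = TP/(TP + FN), FDP = FP/(FP + TP), and a vector of per-gene p-values together with the known DE status from the simulation. The function names are hypothetical and this is not a reimplementation of iCOBRA: genes are called one at a time in order of increasing p-value, so tied p-values are not grouped into a single threshold.

```python
from typing import List, Sequence, Tuple


def fdp_tpr(called: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Hypothetical helper: (FDP, TPR) for one set of DE calls vs. the simulated truth."""
    tp = sum(1 for c, t in zip(called, truth) if c and t)
    fp = sum(1 for c, t in zip(called, truth) if c and not t)
    fn = sum(1 for c, t in zip(called, truth) if t and not c)
    fdp = fp / (fp + tp) if fp + tp else 0.0  # no calls made: FDP set to 0
    tpr = tp / (tp + fn) if tp + fn else 0.0  # no truly DE genes: TPR set to 0
    return fdp, tpr


def fdp_tpr_curve(p_values: Sequence[float], truth: Sequence[bool]) -> List[Tuple[float, float]]:
    """Hypothetical helper: (FDP, TPR) after calling the 1, 2, ..., n smallest p-values."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    n_true = sum(1 for t in truth if t)
    tp = fp = 0
    points = []
    for i in order:
        # Calling gene i as DE: count it as a true or false positive
        if truth[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / (tp + fp), tp / n_true if n_true else 0.0))
    return points
```

For p-values `[0.01, 0.2, 0.03, 0.5]` with the first and third genes truly DE, `fdp_tpr_curve` yields the points (0.0, 0.5), (0.0, 1.0), (1/3, 1.0), and (0.5, 1.0); plotting TPR against FDP traces the curve from the most to the least stringent threshold.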