We compare the performance of CellChat with three other tools: SingleCellSignalR [9], iTALK [10], and CellPhoneDB [16]. We also compare our database, CellChatDB, with other existing analogous databases, including CellTalkDB [71], CellPhoneDB [16], iTALK [10], SingleCellSignalR [9], Ramilowski2015 [72], NicheNet [13], and ICELLNET [73]. SingleCellSignalR scores a given ligand-receptor interaction between two cell populations using a regularized product score based on the average expression levels of the ligand and its receptor, together with an ad hoc approach for estimating an appropriate score threshold. iTALK identifies ligands and receptors that are differentially expressed among cell populations and treats the matched ligand-receptor pairs as significant interactions. CellPhoneDB v2.0 predicts enriched signaling interactions between two cell populations by taking the minimum average expression across the members of a heteromeric complex and by performing empirical shuffling to determine which ligand–receptor pairs display significant cell-state specificity. Details of how these methods were run are given in Supplementary Note 3.
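The three scoring strategies summarized above can be contrasted in a short Python sketch. It is a simplified illustration under stated assumptions, not any tool's own code: the SingleCellSignalR-style score is written as sqrt(l·r)/(μ + sqrt(l·r)), where l and r are average ligand and receptor expression and μ is a regularization constant (as we understand the published method, the mean of the expression matrix), with the threshold step omitted; the iTALK-style matching assumes the differentially expressed (DE) gene sets are already computed; and the CellPhoneDB-style test scores a pair as the mean of ligand and receptor complex expression and omits CellPhoneDB's expression-fraction filters. All function names and the toy genes `L`, `R1`, `R2` are hypothetical.

```python
import math
import random
from statistics import mean

def regularized_product_score(l_mean, r_mean, mu):
    """SingleCellSignalR-style score sqrt(l*r) / (mu + sqrt(l*r)), in [0, 1)."""
    s = math.sqrt(l_mean * r_mean)
    return s / (mu + s)

def match_de_pairs(de_sender, de_receiver, lr_pairs):
    """iTALK-style matching (simplified): keep ligand-receptor pairs whose ligand
    is DE in the sender and whose receptor is DE in the receiver."""
    return [(l, r) for l, r in lr_pairs if l in de_sender and r in de_receiver]

def complex_expression(expr, cells, subunits):
    """Average each subunit over `cells`, then take the minimum across subunits,
    so a heteromeric complex is limited by its least-expressed member."""
    return min(mean(expr[g][c] for c in cells) for g in subunits)

def permutation_test(expr, labels, ligand, receptor, sender, receiver,
                     n_perm=1000, seed=0):
    """CellPhoneDB-style empirical shuffling (simplified): compare the observed
    score with scores obtained after randomly permuting the cluster labels."""
    def score(labs):
        s_cells = [i for i, x in enumerate(labs) if x == sender]
        r_cells = [i for i, x in enumerate(labs) if x == receiver]
        return (complex_expression(expr, s_cells, ligand)
                + complex_expression(expr, r_cells, receptor)) / 2
    observed = score(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permuting labels keeps cluster sizes fixed
        if score(shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm  # empirical p-value

# Toy data: 3 sender cells (A) and 3 receiver cells (B); the receptor is a
# heteromer of the hypothetical genes R1 and R2.
expr = {"L": [5, 5, 5, 0, 0, 0], "R1": [0, 0, 0, 4, 4, 4], "R2": [0, 0, 0, 2, 2, 2]}
labels = ["A", "A", "A", "B", "B", "B"]
print(regularized_product_score(4, 1, 2))  # 0.5
print(match_de_pairs({"Tgfb1"}, {"Tgfbr2"},
                     [("Tgfb1", "Tgfbr1"), ("Tgfb1", "Tgfbr2")]))
obs, p = permutation_test(expr, labels, ["L"], ["R1", "R2"], "A", "B")
print(obs, p)
```

In the toy example the observed score is (5 + min(4, 2)) / 2 = 3.5, and only label permutations that reproduce the original grouping reach it, so the empirical p-value is small.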
Both CellChat and CellPhoneDB, but neither SingleCellSignalR nor iTALK, consider the multi-subunit structure of ligands and receptors so as to represent heteromeric complexes accurately. To evaluate the effect of neglecting this multi-subunit structure, we compute false positive rates for the tools that use only single ligand-receptor gene pairs. False positive interactions are defined as interactions involving multi-subunit complexes that iTALK and SingleCellSignalR identify only partially. The ground truth for multi-subunit interactions is taken from our curated CellChatDB database. For example, for the Tgfb1 ligand and its heteromeric receptor Tgfbr1/Tgfbr2 curated in CellChatDB, if a method identifies only one of the two pairs (Tgfb1–Tgfbr1 or Tgfb1–Tgfbr2), we count this prediction as one false positive interaction.
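The partial-identification rule can be expressed as a small counting function. This is a hypothetical sketch: the input format (subunit tuples for curated interactions, and single-gene pairs predicted by a tool that ignores complexes) is assumed, a complex whose implied gene pairs are all recovered is assumed not to count as a false positive, and because the denominator used to turn the count into a rate is not specified here, the function returns only the count.

```python
def count_partial_false_positives(curated, predicted_pairs):
    """Count curated multi-subunit interactions that a single-gene method
    recovers only partially.

    curated: iterable of (ligand_subunits, receptor_subunits) tuples,
        e.g. (("Tgfb1",), ("Tgfbr1", "Tgfbr2")).
    predicted_pairs: set of (ligand_gene, receptor_gene) tuples.
    """
    false_positives = 0
    for ligand, receptor in curated:
        if len(ligand) == 1 and len(receptor) == 1:
            continue  # a single-gene interaction cannot be partially identified
        # Every ligand-gene/receptor-gene combination implied by the complex.
        gene_pairs = {(l, r) for l in ligand for r in receptor}
        found = gene_pairs & predicted_pairs
        if found and found != gene_pairs:
            false_positives += 1  # one false positive per partially identified complex
    return false_positives

curated = [(("Tgfb1",), ("Tgfbr1", "Tgfbr2")),
           (("Tgfb2",), ("Tgfbr1", "Tgfbr2"))]
predicted = {("Tgfb1", "Tgfbr1"),                       # only half of the receptor
             ("Tgfb2", "Tgfbr1"), ("Tgfb2", "Tgfbr2")}  # both subunits recovered
print(count_partial_false_positives(curated, predicted))  # 1
```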
We performed subsampling of scRNA-seq datasets using a ‘geometric sketching’ approach, which preserves the transcriptomic heterogeneity of a dataset within a smaller subset of cells [96]. We evaluated the robustness of interactions inferred from subsampled datasets using three measures, the true positive rate (TPR), false positive rate (FPR), and accuracy (ACC), which are defined in Supplementary Note 3. Note that this subsampling analysis evaluates the consistency of the inferred interactions rather than their accuracy.
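The subsampling and consistency measures can be illustrated with the following Python sketch. It uses a deliberately simplified stand-in for geometric sketching: rather than the covering procedure of the published algorithm, it places cells in boxes of a fixed, user-chosen width and samples round-robin across occupied boxes, so sparse regions are not swamped by dense ones. The metric function assumes, for illustration only, that interactions inferred from the full dataset serve as the reference and that all candidate interactions are enumerated in a `universe` set; the definitions actually used are those in Supplementary Note 3. Function names are hypothetical.

```python
import math
import random
from collections import defaultdict

def grid_sketch(points, n, side, seed=0):
    """Simplified geometric sketch: cover the embedding with axis-aligned boxes
    of width `side`, then draw cells round-robin across non-empty boxes."""
    rng = random.Random(seed)
    boxes = defaultdict(list)
    for i, p in enumerate(points):
        boxes[tuple(math.floor(x / side) for x in p)].append(i)
    # Shuffled copy of each box's members, consumed from the end.
    pools = [rng.sample(members, len(members)) for members in boxes.values()]
    chosen = []
    while len(chosen) < n:
        nonempty = [pool for pool in pools if pool]
        if not nonempty:
            break  # fewer cells than requested
        rng.shuffle(nonempty)
        for pool in nonempty:
            if len(chosen) == n:
                break
            chosen.append(pool.pop())
    return sorted(chosen)

def consistency_metrics(reference, predicted, universe):
    """TPR, FPR and accuracy of `predicted` against `reference`; all arguments
    are sets of interactions and `universe` holds every candidate."""
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    tn = len(universe - reference - predicted)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    acc = (tp + tn) / len(universe)
    return tpr, fpr, acc

# 100 cells in one dense region and 5 cells in a rare region (2-D embedding).
points = [(0.01 * i, 0.0) for i in range(100)] + [(10.0 + 0.1 * i, 10.0) for i in range(5)]
sketch = grid_sketch(points, n=10, side=1.0)
print(sketch[-5:])  # [100, 101, 102, 103, 104]: every rare cell is kept

tpr, fpr, acc = consistency_metrics({1, 2, 3, 4}, {1, 2, 3, 5}, set(range(1, 11)))
print(tpr, fpr, acc)
```

Uniform random sampling of 10 cells from this toy embedding would usually miss the rare region entirely; sampling per box instead is what lets a sketch retain rare cell states.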