a) Identification of differentially expressed signaling genes. To infer the cell state-specific communications, we first identified differentially expressed signaling genes across all cell groups within a given scRNA-seq dataset, using the Wilcoxon rank sum test with the significance level of 0.05.
b) Calculation of ensemble average expression. To account for the noise effects, we calculated the ensemble average expression of signaling genes in a given cell group using a statistically robust mean method: where Q1, Q2, and Q3 is the first, second and third quartile of the expression levels of a signaling gene in a cell group.
c) Calculation of intercellular communication probability. We modeled ligand-receptor mediated signaling interactions using the law of mass action. Since the physical process of ligand-receptor binding involves protein-protein interactions, we used a random walk based network propagation technique83 (link),84 (link) to project the gene expression profiles onto a high-confidence experimentally validated protein-protein network from STRINGdb83 (link),85 (link). Based on the projected ligand and receptor profiles, the communication probability Pi,j from cell groups i to j for a particular ligand-receptor pair k was modeled by: Here Li and Rj represent the expression level of ligand L and receptor R in cell group i and cell group j, respectively. The expression level of ligand L with m1 subunits (i.e., ) was approximated by their geometric mean, implying that the zero expression of any subunit leads to an inactive ligand. Similarly, we computed the expression level of receptor R with m2 subunits. In addition, co-stimulatory and co-inhibitory membrane-bound receptors are capable of modulating signaling via the control of receptor activation86 (link). For the ligand-receptor pair with multiple co-stimulatory receptors, we computed the average expression of these co-stimulatory receptors (denoted by RA) and then used a linear function to model the positive modulation of the receptor expression. For each ligand-receptor pair with multiple co-inhibitory receptors, we modeled them using the same approach. A Hill function was used to model the interactions between L and R with a parameter Kh whose default value was set to be 0.5 as the input data has a normalized range from 0 to 1. The extracellular agonists and antagonists from both sender and receiver cells are able to directly or indirectly modulate the ligand-receptor interaction86 (link). For the ligand-receptor pair with multiple soluble agonists, we computed the average expression of these agonists (denoted by AG) and then used a Hill function to model the positive modulation of the ligand-receptor interaction. For the ligand-receptor pair with multiple soluble antagonists, we modeled them using the same approach. The effect of cell proportion in each cell group was also included in the probability calculation when analyzing unsorted single-cell transcriptomes, where ni and nj are the numbers of cells in cell groups i and j, respectively, and n is the total number of cells in a given dataset. Together, the communication probabilities among all pairs of cell groups across all pairs of ligand-receptor were represented by a three-dimensional arrayP (K × K × N), where K is the number of cell groups and N is the number of ligand-receptor pairs or signaling pathways. The communication probability of a signaling pathway was computed by summarizing the probabilities of its associated ligand-receptor pairs. It should be noted that we did not perform normalization along the second dimension of P such that because the normalized data are not suitable for comparing the communication probability between different cell groups across multiple signaling pathways. The communication probability here only represents the interaction strength and is not exactly a probability.
d) Identification of statistically significant intercellular communications. The significant interactions between two cell groups are identified using a permutation test by randomly permuting the group labels of cells, and then recalculating the communication probability Pi,j between cell group i and cell group j through a pair of ligand L and receptor R. The p-value of each Pi,j is computed by: where the probability Pi,j(m) is the communication probability for the m-th permutation. M is the total number of permutations (M = 100 by default). The interactions with p-value <0.05 are considered significant.
b) Calculation of ensemble average expression. To account for the noise effects, we calculated the ensemble average expression of signaling genes in a given cell group using a statistically robust mean method: where Q1, Q2, and Q3 is the first, second and third quartile of the expression levels of a signaling gene in a cell group.
c) Calculation of intercellular communication probability. We modeled ligand-receptor mediated signaling interactions using the law of mass action. Since the physical process of ligand-receptor binding involves protein-protein interactions, we used a random walk based network propagation technique83 (link),84 (link) to project the gene expression profiles onto a high-confidence experimentally validated protein-protein network from STRINGdb83 (link),85 (link). Based on the projected ligand and receptor profiles, the communication probability Pi,j from cell groups i to j for a particular ligand-receptor pair k was modeled by: Here Li and Rj represent the expression level of ligand L and receptor R in cell group i and cell group j, respectively. The expression level of ligand L with m1 subunits (i.e., ) was approximated by their geometric mean, implying that the zero expression of any subunit leads to an inactive ligand. Similarly, we computed the expression level of receptor R with m2 subunits. In addition, co-stimulatory and co-inhibitory membrane-bound receptors are capable of modulating signaling via the control of receptor activation86 (link). For the ligand-receptor pair with multiple co-stimulatory receptors, we computed the average expression of these co-stimulatory receptors (denoted by RA) and then used a linear function to model the positive modulation of the receptor expression. For each ligand-receptor pair with multiple co-inhibitory receptors, we modeled them using the same approach. A Hill function was used to model the interactions between L and R with a parameter Kh whose default value was set to be 0.5 as the input data has a normalized range from 0 to 1. The extracellular agonists and antagonists from both sender and receiver cells are able to directly or indirectly modulate the ligand-receptor interaction86 (link). For the ligand-receptor pair with multiple soluble agonists, we computed the average expression of these agonists (denoted by AG) and then used a Hill function to model the positive modulation of the ligand-receptor interaction. For the ligand-receptor pair with multiple soluble antagonists, we modeled them using the same approach. The effect of cell proportion in each cell group was also included in the probability calculation when analyzing unsorted single-cell transcriptomes, where ni and nj are the numbers of cells in cell groups i and j, respectively, and n is the total number of cells in a given dataset. Together, the communication probabilities among all pairs of cell groups across all pairs of ligand-receptor were represented by a three-dimensional array
d) Identification of statistically significant intercellular communications. The significant interactions between two cell groups are identified using a permutation test by randomly permuting the group labels of cells, and then recalculating the communication probability Pi,j between cell group i and cell group j through a pair of ligand L and receptor R. The p-value of each Pi,j is computed by: where the probability Pi,j(m) is the communication probability for the m-th permutation. M is the total number of permutations (M = 100 by default). The interactions with p-value <0.05 are considered significant.
Full text: Click here