Monocle assigns each cell a pseudotime value and a “State” encoding the segment of the trajectory it resides upon based on the PQ-tree algorithm (see the supplemental material for Trapnell and Cacchiarelli et al for further information18 ). Transcript counts values were variance-stabilized49 via the technique described by Anders and Huber prior to tree construction.
In Monocle 2, we extended the capability to test for branch-dependent gene expression by formulating the problem as a contrast between two negative binomial GLMs.
The null model
NB(Census counts)sm.ns(Pseudotime) for the test assumes the gene being tested is not a branch specific gene, whereas the alternative model:
NB(Census counts)sm.ns(Pseudotime)+Branch+sm.ns(Pseudotime):Branch assumes that the gene is a branch specific gene where : represents an interaction term between branch and transformed pseudotime, NB means negative binomial distribution. Each model includes a natural spline (here with three degrees of freedom) describing smooth changes in mean expression as a function of pseudotime. The null model fits only a single curve, whereas the alternative will fit a distinct curve for each branch. Our current implementation of Monocle 2 relies on VGAM’s “smart” spline fitting functionality, hence the use of the sm.ns() function instead of the more widely used ns() function from the splines package in R50 (link). Likelihood ratio testing was performed with the VGAM lrtest() function, similar to Monocle’s other differential expression tests50 (link). A significant branch-dependent genes means that the gene has distinct expression dynamics along each branch, with smoothed curves that have different shapes.
To fit the full model, each cell must be assigned to the appropriate branch, which is coded through the factor “Branch” in the above model formula. Monocle’s function for testing branch dependence accepts an argument specifying which branches are to be compared. These arguments are specified using the ‘State’ attribute assigned by Monocle during trajectory reconstructions. For example, in our analysis of the Truetlein et al data 25 (link), Monocle reconstructed a trajectory with two branches (LAT1, LAT2 for AT1 and AT2 lineages, respectively), and three states (SBP, SAT1, SAT2 for progenitor, AT1, or AT2 cells). The user specifies that he or she wants to compare LAT1 and LAT2 by providing SAT1 and SAT2 as arguments to the function. Monocle then assigns all the cells with state SAT1 to branch LAT1 and similarly for the AT2 cells. However, the cells with SBP must be members of both branches, because they are on the path from each branch back to the root of the tree. In order to ensure the independence of data points required for the LRT as well as the robustness and stability of our algorithm, we implemented a strategy to partition the progenitor cells into two groups, with each branch receiving a group. The groups are computed by simply ranking the progenitor cells by pseudotime and assigning the odd-numbered cells to one group and the even numbered cells to the other. We assign the first progenitor to both branches to ensure they start at the same time which is required for downstream spline fitting and clustering. The branch plots in Figure 3d visualize the branch specific spline curves fit by this method.