To find the most parsimonious B-cell lineage tree, we model the problem as a multi-objective optimisation problem. Thus, we have two objective functions: the first one minimises the sum of edge weights, while the second function maximises the genotype abundance. There are many methods to solve a multi-objective problem [28 (link)]; we used the hierarchical optimisation criteria [29 (link)], in which two or more objective functions are ranked from the most to the least important, and are optimised in this pre-established order. The first function can be modelled by some minimum spanning tree algorithm, such as Prim’s [26 (link)] or Kruskal’s [27 (link)]. Both are greedy approaches and present low time complexity. However, Prim’s algorithm runs faster than Kruskal’s in dense graphs [30 ]. Therefore, we modified Prim’s algorithm to incorporate the second objective function. We start at the root and add all its neighbours with minimum edge weight to a priority queue. We then iteratively extract from the priority queue the node with the lowest edge weight (the first objective function) and highest genotype abundance (the second objective function). If no cycle is formed, the node and the edge are added to the tree. For each added node, all its neighbours with minimum edge weights are included in the priority queue. We keep on adding nodes and edges until we cover all nodes. To decrease the time complexity of the algorithm, we add each node only once to the priority queue.
We highlight the fact that the original Prim’s algorithm has only one objective function, which minimises the sum of edge weights (cost). Here we included a second objective function to maximise genotype abundance. If a set of edges have the same weight, we will choose the one that connects nodes with high abundance. Prim’s algorithm has a time complexity of O(|V|2) in the worst case but can be improved up to O(|E|+log|V|) when using data structures based on Fibonacci heaps [31 ]. Figure 1 shows a simple example of the tree construction process.

ClonalTree construction example. We start with a connected weighted graph (a) where nodes represent BCR genotype sequences, edge weights their distances, and node weights their abundances. The graph can be fully connected, or one can disable edges whose weight is lower than a threshold \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\delta$$\end{document}
δ
. Then, we first place the inferred ancestral sequence (the root) (b) and iteratively add nodes to the tree with the lowest edge weight and highest genotype abundance (c, d); when edges have the same weight (e), we choose that connected to the node with the highest abundance (f), we repeat until all nodes were added to the tree (g), the final tree is shown in (h)

Free full text: Click here