The full Bayesian sequence analysis with an uncorrelated relaxed-clock model allows the co-estimation of substitution parameters, relaxed-clock parameters, and the ancestral phylogeny. The posterior distribution is of the following form:
The vector Φ contains the parameters of the relaxed-clock model (e.g., μ and σ
2 in the case of lognormally distributed rates among branches). The term Pr
D|g,Φ,Ω is the standard Felsenstein likelihood, where
g is a tree with branch length measured in units of time. For the purposes of calculating this likelihood, branch lengths are converted to units of substitutions by multiplying the rates defined by Φ with the internode distance between node
i and parent node
j in tree
g. The tree prior,
f
G(
g|Θ), can either be a coalescent-based prior [
30 (
link),
65 (
link)] for within-population data or some other appropriate prior if the sequences come from multiple populations/species [
55 (
link)]. The vector Θ contains the hyperparameters of the tree prior. The vector Ω contains the parameters of the substitution model (such as transition/transversion ratio, κ; shape parameter for gamma-distributed rates among sites, α; and proportion of invariant sites,
p
inv).
We summarize the posterior density in
Equation 5 using samples (
g,Θ,Φ,Ω) ∼
f obtained via MCMC. If, for example, the divergence times are of primary interest then the other sampled parameters can be thought of as nuisance parameters, and vice versa.
The formulation in
Equation 5 implies that the branch-rates could be integrated analytically in the Felsenstein likelihood. Although this could be accomplished relatively easily by discretizing the rate distribution and averaging the likelihood over the rate categories on each branch, we elected to do the integration using MCMC. This was achieved by assigning a unique rate category
c ∈ 1,2,…,2
n−2 to each branch
j of the tree. During the calculation of the likelihood the rate category
c is converted to a rate by the following method:
The function
D−1(
x) is the inverse function of the probability distribution function,
D(
x)=
P(
X≤
x), of the relaxed-clock model specified by Equations
2 and
3. This discretization of the underlying rate distribution is illustrated in
Figure 5 for a lognormal distribution with 12 rate categories (sufficient for a tree of seven tips). To integrate the branch rates out, the assignment of rate categories
c to branches was sampled via MCMC.
Drummond A.J., Ho S.Y., Phillips M.J, & Rambaut A. (2006). Relaxed Phylogenetics and Dating with Confidence. PLoS Biology, 4(5), e88.