To validate our approach, we simulated sequences using tree topologies, branch lengths, and alignment sizes based on 1099 gene families from 36 cyanobacterial genomes available in the HOGENOM database (Penel et al. 2009). As described in detail in Appendix 1 and illustrated in Figure 2a, to generate the set of simulated alignments we first reconstructed reconciled gene trees that maximize the joint likelihood and subsequently used the reconstructed gene trees to simulate amino acid sequences. To emulate the relative complexity of real data compared with available models of sequence evolution, we simulated sequences under a complex model, an LG model (Le and Gascuel 2008) with across-site rate variation and invariant sites, and attempted to reconstruct their history with a simple model, a Poisson model (Felsenstein 1981) with no rate variation.
Data.—To construct a simulated dataset, we first reconstructed gene trees for 1099 cyanobacterial gene families with 10 or more genes in any of the 36 cyanobacteria present in version 5 of the HOGENOM database (Penel et al. 2009). Families with more than 150 genes were not considered. For each family, amino acid sequences were extracted from the database and aligned using MUSCLE (v3.8.31) (Edgar 2004) with default parameters. Each multiple alignment was subsequently cleaned using GBLOCKS (v0.91b) (Talavera and Castresana 2007) with the options:

Cleaned alignments are available from the Dryad data repository at http://datadryad.org, doi:10.5061/dryad.pv6df.
Reconstructing “real” trees.—For each cleaned alignment, an MCMC sample was obtained with PhyloBayes (v3.2e) (Lartillot et al. 2009) under an LG+Γ4+I substitution model (Le and Gascuel 2008), with a burn-in of 1000 samples followed by at least 3000 samples. Following this step, gene families were separated into two datasets: (i) dataset I, composed of 342 universal single-copy families with exactly one copy in each of the 36 cyanobacteria, and (ii) dataset II, which includes dataset I and is composed of 1099 families, each with at least 10 genes in any of the 36 cyanobacterial genomes considered. For the 342 single-copy universal gene families of dataset I, 10 000 trees were sampled.
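The partition into datasets I and II can be illustrated with a minimal sketch. It assumes each family is given as a list holding one genome identifier per gene. The function name `classify_family` and this input layout are hypothetical and are not part of the original pipeline; the thresholds are those stated in the text.

```python
from collections import Counter

N_GENOMES = 36    # cyanobacterial genomes considered
MIN_GENES = 10    # smallest family size retained
MAX_GENES = 150   # larger families were not considered


def classify_family(gene_genomes, n_genomes=N_GENOMES):
    """Return the set of dataset labels ("I", "II") a family belongs to.

    gene_genomes: list with one genome identifier per gene in the family
    (hypothetical input layout, assumed for illustration).
    """
    n_genes = len(gene_genomes)
    if n_genes < MIN_GENES or n_genes > MAX_GENES:
        return set()
    labels = {"II"}  # dataset II holds every retained family
    copies = Counter(gene_genomes)
    # Dataset I: exactly one copy in each of the genomes (universal, single-copy).
    if len(copies) == n_genomes and all(c == 1 for c in copies.values()):
        labels.add("I")
    return labels
```

Because dataset II includes dataset I, a universal single-copy family receives both labels.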
For each family, we used the species tree shown in Figure A.4 and ran ALEsample (sampling at least 5000 reconciled trees) to sample DTL rates and reconciled gene trees, and ALEml to find the ML DTL rates and the corresponding ML reconciled gene tree.
For each ALEsample sample, we computed the majority consensus tree. The fully resolved “real” tree for each gene family was obtained from the same sample by finding the tree that maximized the conditional clade probabilities (CCPs) estimated from that sample. For both real and simulated alignments, sequence-only trees were also inferred using PhyML (version 20110526) (Guindon and Gascuel 2003) under the LG+Γ4+I model with the options:

“Real” gene trees are available from the Dryad data repository at http://datadryad.org, doi:10.5061/dryad.pv6df.
Sequence simulation.—To simulate amino acid sequences, we used bppseqgen (v1.1.0) (Dutheil and Boussau 2008), keeping the branch lengths and alignment sizes and using the COMPLEX model, an LG model with across-site rate variation described by a gamma distribution with α = 0.1 and 10% invariant sites.
Simulated alignments are available from the Dryad data repository at http://datadryad.org, doi:10.5061/dryad.pv6df.
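The across-site rate heterogeneity of the COMPLEX model can be illustrated with a minimal sketch that draws one relative rate per site. It is not the bppseqgen implementation: it assumes a continuous gamma distribution with mean 1 for the variable sites (simulators often use a discretized gamma instead), and the function name `draw_site_rates` is hypothetical.

```python
import random

ALPHA = 0.1         # gamma shape parameter stated in the text
P_INVARIANT = 0.10  # proportion of invariant sites stated in the text


def draw_site_rates(n_sites, alpha=ALPHA, p_inv=P_INVARIANT, seed=None):
    """Draw one relative substitution rate per alignment site.

    Assumption: variable sites follow a continuous gamma distribution with
    shape alpha and scale 1/alpha, so that their mean rate equals 1.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_sites):
        if rng.random() < p_inv:
            rates.append(0.0)  # invariant site: no substitutions occur
        else:
            rates.append(rng.gammavariate(alpha, 1.0 / alpha))
    return rates
```

With α = 0.1 the gamma distribution is strongly L-shaped, so most variable sites evolve very slowly and a few evolve very fast. This is the kind of heterogeneity that the SIMPLE Poisson model, which has no rate variation, cannot represent.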
Inference for simulated data.—For each simulated alignment, an MCMC sample was obtained with PhyloBayes (v3.2e) under the SIMPLE model, a Poisson model (Felsenstein 1981) with no rate variation.
We sampled 10 000 trees after a burn-in of 1000 samples, with a sample taken every 10 iterations. For the simulated sequences corresponding to the 342 single-copy universal gene families of dataset I, we also sampled trees under the COMPLEX model, an LG+Γ4+I substitution model, sampling 3000 trees after a burn-in of 1000 samples.
For each family, we ran ALEsample (sampling at least 5000 reconciled trees) to sample DTL rates and reconciled gene trees, and ALEml to find the ML DTL rates and the corresponding ML reconciled gene tree.
Distances to the “real” tree for gene trees of dataset I (Fig. 2b) were computed as the distance between majority consensus trees calculated from the sequence-only PhyloBayes samples for both the SIMPLE and the COMPLEX model, as well as from the joint ALEsample samples for both. The same procedure was used for the simulated sequences corresponding to dataset II (Fig. A.1a) for the SIMPLE model. For the COMPLEX model, joint trees were not computed and PhyML trees were used for the sequence-only trees.
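The text above does not name the tree distance used. As an illustration only, the following sketch assumes the Robinson-Foulds (symmetric difference) distance, a common choice for comparing topologies. Each tree is represented by its set of non-trivial bipartitions, and each bipartition is stored as the frozenset of taxa on the side that does not contain a fixed reference taxon (here "A"). The names `rf_distance` and `split` are hypothetical.

```python
def rf_distance(splits_a, splits_b):
    """Robinson-Foulds distance between two trees on the same taxa.

    Each argument is a collection of bipartitions in the same canonical
    orientation; the distance counts bipartitions present in exactly one tree.
    """
    return len(set(splits_a) ^ set(splits_b))


def split(*taxa):
    """Helper to write a bipartition as the side not containing taxon 'A'."""
    return frozenset(taxa)


# Example on five taxa A..E (made-up topologies, for illustration only).
real_tree = {split("C", "D", "E"), split("D", "E")}      # ((A,B),C,(D,E))
inferred_tree = {split("B", "D", "E"), split("D", "E")}  # ((A,C),B,(D,E))
consensus_tree = {split("D", "E")}                       # partially resolved

print(rf_distance(real_tree, inferred_tree))
print(rf_distance(real_tree, consensus_tree))
```

Because a majority consensus tree may be only partially resolved, a bipartition missing from the consensus contributes to the distance just as a conflicting one does.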
Inference of numbers of DTL events.—The number of DTL events for joint trees was inferred with ALEml from a sample of trees obtained under the SIMPLE model. The number of DTL events for sequence trees was inferred with ALEml from fixed PhyML trees (based on the LG+Γ4+I substitution model).
ML reconciled trees are available from the Dryad data repository at http://datadryad.org, doi:10.5061/dryad.pv6df.
Statistical support.—Statistical support of bipartitions was calculated from samples of gene trees obtained either using PhyloBayes, for the sequence-only case, or using ALEsample in the joint case. The support of each observed bipartition was estimated as the fraction of all trees in which it was present.
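The support calculation described above can be sketched as follows. This is a minimal illustration and not the code used in the study: trees are assumed to be given as nested tuples of leaf names (a hypothetical input format), and the function names are illustrative.

```python
from collections import Counter


def leaves(tree):
    """Return the set of leaf names of a tree given as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions induced by the tree's branches.

    Each bipartition is stored as the side that does not contain the
    alphabetically smallest leaf, so equal splits compare equal.
    """
    all_leaves = leaves(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - clade
        if len(clade) > 1 and len(other) > 1:  # skip trivial splits
            splits.add(other if anchor in clade else clade)
        return clade

    visit(tree)
    return splits


def bipartition_support(trees):
    """Fraction of sampled trees in which each observed bipartition occurs."""
    counts = Counter()
    for tree in trees:
        counts.update(bipartitions(tree))
    return {s: c / len(trees) for s, c in counts.items()}


# Toy sample of three trees on four taxa (made-up data).
sample = [(("A", "B"), ("C", "D")),
          (("A", "B"), ("C", "D")),
          (("A", "C"), ("B", "D"))]
support = bipartition_support(sample)
```

In this toy sample the split AB|CD appears in two of the three trees and AC|BD in one. Retaining only the bipartitions with support above 0.5 gives the splits of a majority consensus tree.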