To speed up the bootstrap analyses, very closely related taxa were removed from the original mega-alignment, leaving 310 taxa. Maximum likelihood trees were then inferred from 100 bootstrap replicates of this reduced dataset using PhyML with the same parameters as described above.
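PhyML generates bootstrap pseudo-replicates internally, so no separate resampling step is needed in practice. For readers unfamiliar with the procedure, the minimal Python sketch below illustrates nonparametric bootstrap resampling of alignment columns. The function name `bootstrap_replicates` and the dictionary-of-aligned-sequences input format are assumptions for illustration only, not part of the original pipeline.

```python
import random
from typing import Dict, List


def bootstrap_replicates(
    alignment: Dict[str, str], n_replicates: int = 100, seed: int = 0
) -> List[Dict[str, str]]:
    """Generate nonparametric bootstrap pseudo-alignments (illustrative sketch).

    `alignment` maps taxon names to aligned sequences (a hypothetical input
    format). Each replicate samples alignment columns with replacement and
    keeps the original alignment length; every taxon uses the same sampled
    columns, so homologous positions stay aligned.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    n_cols = lengths.pop()
    rng = random.Random(seed)  # seeded for reproducible replicates
    replicates = []
    for _ in range(n_replicates):
        # Draw n_cols column indices with replacement.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        replicates.append(
            {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}
        )
    return replicates
```

Each pseudo-alignment would then be analyzed with the same maximum likelihood settings as the full dataset, and the support for a clade is the fraction of replicate trees that contain it.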
With very few exceptions, the marker genes are single-copy genes in all of the bacterial genomes analyzed. In the rare cases in which two or more homologs were identified within a single species, a tree-guided approach was used to resolve the redundancy. If the redundancy resulted from a species-specific duplication event, one homolog was randomly chosen as the representative. In all other cases, to avoid potential complications such as lateral gene transfer, we excluded that marker and treated it as 'missing' in that particular genome. It has been shown that, as long as sufficient data are available, a few 'holes' in the dataset do not compromise the resulting tree [36].
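The redundancy rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the gene tree is assumed to be given as nested Python tuples whose leaves are labelled `species|gene_id` (a hypothetical convention), and a species-specific duplication is assumed to mean that all of a species' homologs form one side of a split in the unrooted gene tree. The hypothetical function `resolve_marker` returns one randomly chosen homolog in that case and `None` (marker treated as missing) otherwise.

```python
import random
from typing import FrozenSet, List, Optional, Set, Tuple, Union

# Gene tree as nested tuples; leaves are "species|gene_id" strings
# (hypothetical labelling convention assumed for this sketch).
Tree = Union[str, Tuple["Tree", ...]]


def species_of(leaf: str) -> str:
    """Extract the species part of a "species|gene_id" leaf label."""
    return leaf.split("|", 1)[0]


def clade_leaf_sets(tree: Tree) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Return the leaf set of `tree` and the leaf sets of all its subtrees."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves: Set[str] = set()
    clades: List[FrozenSet[str]] = []
    for child in tree:
        child_leaves, child_clades = clade_leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    all_leaves = frozenset(leaves)
    clades.append(all_leaves)
    return all_leaves, clades


def resolve_marker(tree: Tree, species: str, rng: random.Random) -> Optional[str]:
    """Pick one homolog for `species`, or return None to treat the marker as missing."""
    all_leaves, clades = clade_leaf_sets(tree)
    homologs = frozenset(leaf for leaf in all_leaves if species_of(leaf) == species)
    if len(homologs) <= 1:
        # Single-copy: keep it; absent: already missing.
        return next(iter(homologs), None)
    # Treat the tree as unrooted: every split is a subtree or its complement.
    is_species_specific = any(
        c == homologs or all_leaves - c == homologs for c in clades
    )
    if is_species_specific:
        # Species-specific duplication: keep one homolog at random.
        return rng.choice(sorted(homologs))
    # Otherwise (e.g. possible lateral transfer): exclude the marker.
    return None
```

For example, in the tree `(("Ecoli|a", "Ecoli|b"), "Bsub|a", "Mtub|a")` the two `Ecoli` copies group together and one is kept, whereas in `(("Ecoli|a", "Bsub|a"), ("Ecoli|b", "Mtub|a"), "Paer|a")` they do not, and the marker is treated as missing for that genome.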