A genome tree incorporating 5656 trusted reference genomes (see Supplemental Methods) was inferred from a set of 43 genes with largely congruent phylogenetic histories. An initial set of 66 universal marker genes was established by taking the intersection of the bacterial and archaeal genes determined to be single copy in >90% of genomes. From this initial gene set, 18 multicopy genes with divergent phylogenetic histories in >1% of the reference genomes were removed. A multicopy gene within a genome was deemed to have a congruent phylogenetic history only if all copies of the gene were situated within a single conspecific clade (i.e., all copies were contained in a clade from a single named species) within its gene tree. Genes were aligned with HMMER v3.1b1 (http://hmmer.janelia.org), and gene trees were inferred with FastTree v2.1.3 (Price et al. 2009) under the WAG (Whelan and Goldman 2001) and GAMMA (Yang 1994) models. Trees were then rooted between archaea and bacteria with DendroPy v3.12.0 (Sukumaran and Holder 2010); when these groups were not monophyletic, midpoint rooting was used instead. A further five genes found to be incongruent with the IMG taxonomy were also removed, as these genes may be subject to lateral transfer. Taxonomic congruency was tested as described in Soo et al. (2014). The final set of 43 phylogenetically informative marker genes (Supplemental Table S6) consists primarily of ribosomal proteins and RNA polymerase domains and is similar to the universal marker set used by PhyloSift (Supplemental Table S7; Darling et al. 2014). A reference genome tree was inferred from the concatenated alignment of 6988 columns with FastTree v2.1.3 under the WAG+GAMMA model and rooted between bacteria and archaea. Internal nodes were assigned taxonomic labels using tax2tree (McDonald et al. 2012).
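The conspecific-clade criterion for multicopy genes and the >1% removal threshold described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. The nested-list tree encoding, the function names, the toy data, and the reading of "single conspecific clade" as "the smallest clade containing all copies holds leaves from only one named species" are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical encoding (an assumption, not the authors' data format):
# a leaf is a tuple (genome_id, species_name); an internal node is a list of subtrees.

def is_leaf(node):
    return isinstance(node, tuple)

def leaves(node):
    """Return every (genome_id, species) leaf below a node."""
    if is_leaf(node):
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found

def copies_in(node, genome_id):
    """Count gene copies from one genome below a node."""
    return sum(1 for g, _ in leaves(node) if g == genome_id)

def smallest_clade(tree, genome_id):
    """Descend from the root while a single child still holds all copies."""
    total = copies_in(tree, genome_id)
    current = tree
    while not is_leaf(current):
        holder = next((c for c in current if copies_in(c, genome_id) == total), None)
        if holder is None:
            break  # copies are split across children: current is the smallest clade
        current = holder
    return current

def is_congruent(tree, genome_id):
    """True if the smallest clade holding all copies contains one named species only."""
    species = {s for _, s in leaves(smallest_clade(tree, genome_id))}
    return len(species) == 1

def incongruent_fraction(tree, n_reference_genomes):
    """Fraction of reference genomes whose multicopy gene is incongruent."""
    counts = Counter(g for g, _ in leaves(tree))
    multicopy = [g for g, n in counts.items() if n > 1]
    bad = sum(1 for g in multicopy if not is_congruent(tree, g))
    return bad / n_reference_genomes

def keep_gene(tree, n_reference_genomes, threshold=0.01):
    """Retain the gene unless it is incongruent in more than 1% of genomes."""
    return incongruent_fraction(tree, n_reference_genomes) <= threshold

# Toy gene tree (invented data): g1 has two copies inside an E. coli clade,
# while the two copies of g3 are separated by a S. aureus sequence.
toy_tree = [
    [("g1", "E. coli"), ("g1", "E. coli"), ("g2", "E. coli")],
    [("g3", "B. subtilis"), [("g4", "S. aureus"), ("g3", "B. subtilis")]],
]

print(is_congruent(toy_tree, "g1"))
print(is_congruent(toy_tree, "g3"))
print(keep_gene(toy_tree, n_reference_genomes=4))
```

In this toy tree, genome g1 passes the check and g3 fails it, so with only four reference genomes the gene would be discarded. A production version would read real gene trees with a phylogenetics library rather than nested lists.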