We adapted the protocol of Wu and Eisen for phylogenomic reconstruction (Wu and Eisen, 2008). First, the individual clusters of CSCG-encoded proteins were aligned using MUSCLE (Edgar, 2004), and an HMM was built for each cluster using hmmbuild from the HMMER package (Eddy, 2011). The models were then used as queries to search the other genomes, and the resulting alignments were trimmed using scripts adapted from AMPHORA (Wu and Eisen, 2008). Next, the trimmed alignments were concatenated into a master alignment, which was further refined using Gblocks (Talavera and Castresana, 2007) to remove the less conserved columns. Finally, the refined master alignment was used as input to PhyML (Guindon et al., 2010) for phylogenetic reconstruction.
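The concatenation step can be illustrated with a minimal Python sketch. It is not the script used in this study: the function name `concatenate_alignments`, the dictionary-based representation of an alignment, and the gap-padding of genomes that lack a marker are all assumptions made for illustration.

```python
def concatenate_alignments(alignments):
    """Join per-marker alignments column-wise into one master alignment.

    `alignments` is a list of dicts mapping a genome name to its aligned
    sequence for one marker (hypothetical representation).
    """
    # Collect every genome that appears in at least one marker alignment.
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    master = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a genome missing from a marker is padded with gaps.
            master[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in master.items()}


marker_a = {"genome1": "MK-L", "genome2": "MKAL"}
marker_b = {"genome1": "TT", "genome2": "TS", "genome3": "AS"}
master_alignment = concatenate_alignments([marker_a, marker_b])
```

In this toy input, `genome3` lacks the first marker and therefore receives four gap characters in the corresponding columns of the master alignment.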
The CSCG tree of Epsilonproteobacteria (Figure 3A) included six additional draft or complete genomes that were published after our initial data collection: Sulfurospirillum barnesii SES-3, Uncultured Sulfuricurvum sp. RIFRC-1, Arcobacter butzleri ED-1 (Toh et al., 2011), Arcobacter sp. L (Toh et al., 2011), Sulfurovum sp. AR (Park et al., 2012), and the single-cell genomes of Thiovulum sp. ES (Marshall et al., 2012). To accommodate the incompleteness of the draft genomes, we selected the subset of CSCG-encoded proteins that occurred exactly once in every draft genome and used only these as markers for tree construction. As a result, 194 of the CSCG-encoded proteins were used in the above procedure to construct the local phylogeny for Epsilonproteobacteria.
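The marker selection described above amounts to intersecting, across genomes, the sets of protein families that are hit exactly once. A minimal sketch follows; the function name `single_copy_markers`, the marker identifiers, and the input layout (one list of marker hits per genome) are hypothetical and do not reflect the actual implementation.

```python
from collections import Counter


def single_copy_markers(hits_by_genome):
    """Return the markers found exactly once in every genome.

    `hits_by_genome` maps a genome name to a list of marker IDs,
    with one list entry per detected hit (hypothetical layout).
    """
    shared = None
    for hits in hits_by_genome.values():
        counts = Counter(hits)
        # Markers that are single copy in this genome.
        once = {marker for marker, n in counts.items() if n == 1}
        # Keep only markers that are single copy in all genomes seen so far.
        shared = once if shared is None else shared & once
    return sorted(shared or [])


hits = {
    "draft_A": ["m1", "m2", "m3"],
    "draft_B": ["m1", "m2", "m2", "m3"],  # m2 is duplicated here
    "draft_C": ["m1", "m3", "m4"],        # m2 is missing here
}
selected = single_copy_markers(hits)
```

Here `m2` is excluded because it is duplicated in one genome and absent from another, and `m4` is excluded because it occurs in only one genome.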
The global bacterial phylogeny was constructed with 37 globally conserved single-copy markers (Figure 4). In addition to the 31 markers applied in the AMPHORA package (Wu and Eisen, 2008), we identified six additional phylogenetic markers using the HMMs of core proteins: DNA gyrase subunit B (GyrB), tryptophanyl-tRNA synthetase (TrpRS), SSU ribosomal protein S12p (S23e), LSU ribosomal protein L17p, SSU ribosomal protein S4p (S9e), and SSU ribosomal protein S15p (S13e). Among these new markers, GyrB (Kasai et al., 2000; Holmes et al., 2004; Peeters and Willems, 2011) and TrpRS (Rajendran et al., 2008) have been used in previous studies to determine the phylogeny of selected taxonomic groups, and the remaining four are ribosomal proteins.
The global bacterial tree in Figure 4 was rooted using mid-point rooting. The 16S and CSCG trees in Figure 3 were rooted based on the relative positions of the different epsilonproteobacterial species in the global bacterial tree, using all other bacteria as an outgroup. As indicated by the black arrow in Figure 4, the root of Epsilonproteobacteria is located between Nautiliales and the other examined lineages.
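For readers unfamiliar with mid-point rooting, the idea is to place the root halfway along the longest leaf-to-leaf path in the tree. The sketch below shows the computation on a toy tree; it is not the software used to root Figure 4, and the function name `midpoint`, the edge-list input, and the node labels are assumptions. Branch lengths are assumed to be positive.

```python
from collections import defaultdict


def midpoint(edges):
    """Locate the mid-point root of an unrooted tree.

    `edges` is a list of (node_u, node_v, branch_length) tuples.
    Returns (u, v, offset): the root lies on edge u-v, `offset` away from u.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w

    def farthest(start):
        # Walk the tree from `start`, recording path distances and parents.
        dist = {start: 0.0}
        parent = {start: None}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node].items():
                if nxt not in dist:
                    dist[nxt] = dist[node] + w
                    parent[nxt] = node
                    stack.append(nxt)
        end = max(dist, key=dist.get)
        return end, dist, parent

    # Two sweeps find the endpoints (a, b) of the longest path in a tree.
    a, _, _ = farthest(next(iter(adj)))
    b, dist, parent = farthest(a)
    half = dist[b] / 2.0
    # Walk back from b towards a until the halfway point is crossed.
    node = b
    while dist[parent[node]] > half:
        node = parent[node]
    u = parent[node]
    return u, node, half - dist[u]


toy_tree = [("A", "X", 1.0), ("B", "X", 3.0), ("X", "Y", 2.0),
            ("C", "Y", 1.0), ("D", "Y", 6.0)]
root_position = midpoint(toy_tree)
```

In the toy tree the longest path runs between leaves B and D (total length 11), so the root falls on the D-Y branch, 5.5 units from D.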