Genome-specific metabolic potential was determined by (1) searching all predicted ORFs in a genome against Pfam [35], TIGRfam [34], Panther [69] and custom HMM profiles (Supplementary Data 8 and 12) of marker genes for specific pathways using hmmscan [36], and (2) assessing complete pathways for metabolic transformations using ggKbase. To generate the custom HMM profiles, reference sequences for each marker gene were aligned using MUSCLE with default parameters, and the start and end of each alignment were then trimmed manually. Each alignment was converted into Stockholm format and databases were built using hmmscan [36]. For Rubisco and hydrogenases [70], separate HMM databases were constructed for each distinct group. For HMM searches against TIGRfam, all hits above the preset noise cutoff were considered for manual inspection. Individual cutoffs for all HMMs were determined by manual inspection and are listed in Supplementary Data 14.
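As an illustration of how per-HMM cutoffs can be applied to search results, the following minimal Python sketch filters hits from an hmmscan tabular output file (HMMER 3 `--tblout` format, in which the first column is the profile name, the third is the query sequence and the sixth is the full-sequence bit score). It is not the pipeline used in this study: the function name, profile names, ORF identifiers and cutoff values are hypothetical, and the sketch assumes that the cutoffs are bit-score thresholds.

```python
from collections import defaultdict


def filter_hits(tblout_lines, cutoffs):
    """Keep hits whose full-sequence bit score meets the cutoff of their HMM.

    tblout_lines: iterable of lines from an hmmscan --tblout file.
    cutoffs: dict mapping HMM profile name -> minimum bit score (assumed).
    Returns a dict mapping ORF id -> list of (HMM name, score) tuples.
    """
    kept = defaultdict(list)
    for line in tblout_lines:
        # Skip blank lines and the '#' comment/header lines of the table.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        hmm_name = fields[0]      # target name (the HMM profile in hmmscan)
        orf_id = fields[2]        # query name (the predicted ORF)
        score = float(fields[5])  # full-sequence bit score
        cutoff = cutoffs.get(hmm_name)
        # Profiles without a listed cutoff are ignored in this sketch.
        if cutoff is not None and score >= cutoff:
            kept[orf_id].append((hmm_name, score))
    return dict(kept)


# Hypothetical example data; names and values are for illustration only.
example_lines = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "rubisco_form_I - orf_001 - 1e-120 400.5 0.1 1e-120 400.2 0.1 1.0 1 0 0 1 1 1 1 -",
    "rubisco_form_I - orf_002 - 1e-05 25.0 0.0 1e-05 24.8 0.0 1.0 1 0 0 1 1 1 1 -",
    "nifH_custom - orf_003 - 1e-80 250.0 0.2 1e-80 249.5 0.2 1.0 1 0 0 1 1 1 1 -",
]
example_cutoffs = {"rubisco_form_I": 200.0, "nifH_custom": 150.0}

hits = filter_hits(example_lines, example_cutoffs)
for orf, matches in sorted(hits.items()):
    print(orf, matches)
```

With real data, the lines would be read from the `--tblout` file and the cutoff dictionary filled from the manually curated values; hits that pass would still be candidates for manual inspection rather than final annotations.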
In ggKbase, lists for specific metabolic pathways were generated by searching for specific keywords in gene annotations. Coupling the genome abundance to metabolic traits allowed the simultaneous assessment of all 2,540 genomes assembled in this study. All custom HMM profiles used in this study are publicly available from https://github.com/banfieldlab.