A maximum-likelihood phylogenetic tree of the 1904 SGBs was constructed using PhyloPhlAn [58] (v.1.0) and visualized using Evolview [59] (v.3) and iTOL [60] (v.4.3.1). All genomes were taxonomically annotated using GTDB-Tk [61] (v.0.1.6) against the Genome Taxonomy Database (GTDB). Carbohydrate-active enzyme (CAZyme) families in the 592 high-quality genomes were annotated with HMMER [47] (v.3.2.1) using profile hidden Markov model searches. Polysaccharide utilization loci (PULs) of the high-quality SGBs were predicted following the PULpy [62] (v.1.0) pipeline. KEGG Orthology (KO) identifiers of the high-quality SGBs were assigned using DIAMOND [43] (v.0.9.22) BLASTP searches against the KEGG [45] (v.90.0) database. Protein sequences encoded by the 592 high-quality SGBs were also screened against the HydDB [63] database with BLASTP to identify the catalytic subunits of the three classes of hydrogenases ([NiFe]-, [FeFe]-, and [Fe]-hydrogenases), retaining hits with an e-value threshold of 1e-50, coverage above 90%, and identity above 50% [2].
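The hydrogenase screening criteria can be illustrated with a small filter over tabular BLASTP output. This is a minimal sketch, not the authors' code. It assumes a custom BLAST+ tabular layout (`-outfmt "6 qseqid sseqid pident qstart qend qlen evalue bitscore"`), treats "coverage" as the aligned query span divided by query length, and keeps the best-scoring passing hit per query. The column order, the coverage definition, the best-hit rule, and the example sequence IDs are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    # Assumed column order: qseqid sseqid pident qstart qend qlen evalue bitscore
    qseqid: str
    sseqid: str
    pident: float
    qstart: int
    qend: int
    qlen: int
    evalue: float
    bitscore: float


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated BLASTP rows, skipping blank and comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]),
                  int(f[5]), float(f[6]), float(f[7]))


def query_coverage(hit: Hit) -> float:
    """Assumption: coverage (%) = aligned query span / query length * 100."""
    return 100.0 * (abs(hit.qend - hit.qstart) + 1) / hit.qlen


def passes(hit: Hit, max_evalue: float = 1e-50,
           min_cov: float = 90.0, min_ident: float = 50.0) -> bool:
    """Apply the thresholds stated in the text (e-value, coverage, identity)."""
    return (hit.evalue <= max_evalue
            and query_coverage(hit) > min_cov
            and hit.pident > min_ident)


def best_passing_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep the highest-bitscore passing hit for each query protein."""
    best: Dict[str, Hit] = {}
    for hit in parse_hits(lines):
        if passes(hit) and (hit.qseqid not in best
                            or hit.bitscore > best[hit.qseqid].bitscore):
            best[hit.qseqid] = hit
    return best


# Hypothetical example rows (IDs invented for illustration only).
example_rows = [
    "prot1\tNiFe_ref_A\t62.5\t1\t480\t500\t1e-120\t800",  # passes all filters
    "prot1\tFeFe_ref_B\t55.0\t1\t470\t500\t1e-80\t600",   # passes, lower bitscore
    "prot2\tFe_ref_C\t45.0\t1\t300\t310\t1e-60\t300",     # fails identity
    "prot3\tNiFe_ref_D\t70.0\t1\t200\t500\t1e-90\t400",   # fails coverage (40%)
    "prot4\tFeFe_ref_E\t80.0\t1\t495\t500\t1e-30\t350",   # fails e-value
]
best = best_passing_hits(example_rows)
```

Filtering on all three criteria before choosing a best hit keeps weak but high-scoring alignments from displacing a valid assignment. Whether the original analysis applied a best-hit rule is not stated in the text.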