Sequence similarity networks (SSNs) [18] constructed using Cytoscape 4.1 [44] were used to visualize the distribution and diversity of the retrieved hydrogenase sequences. In this analysis, each node represents one of the 3248 hydrogenase sequences in the reference database (Dataset S1). Each edge represents the sequence similarity between a pair of sequences, as determined by E-values from all-vs-all BLAST analysis, with all self and duplicate edges removed. Three networks were constructed: one for the [NiFe]-hydrogenase large subunit sequences (Dataset S2), one for the [FeFe]-hydrogenase catalytic domain sequences (Dataset S3), and one for the [Fe]-hydrogenase sequences (Dataset S4). To control the degree of separation between nodes, logE cutoffs were incrementally decreased from −5 to −200 until no major changes in clustering were observed. The logE cutoffs used for the final classifications are shown in Fig. 1 and Figure S1.
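The edge-filtering step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names `build_edges` and `count_clusters`, the `(query, subject, evalue)` input tuples, and the toy hits are all assumptions made for illustration. The sketch drops self edges, collapses duplicate (reciprocal) hits into one undirected edge, keeps only edges at or below a given logE cutoff, and counts the resulting clusters as connected components.

```python
import math


def build_edges(hits, log_e_cutoff):
    """Return {(node_a, node_b): log10(E)} for hits passing the cutoff.

    hits is an iterable of (query, subject, evalue) tuples (assumed format).
    Self hits are dropped and reciprocal hits are merged into one edge,
    keeping the lowest (most significant) log10 E-value.
    """
    edges = {}
    for query, subject, evalue in hits:
        if query == subject:
            continue  # remove self edges
        # BLAST can report E = 0 for very strong hits; treat as -infinity
        log_e = math.log10(evalue) if evalue > 0 else float("-inf")
        if log_e > log_e_cutoff:
            continue  # edge is weaker than the cutoff
        key = tuple(sorted((query, subject)))  # undirected edge
        if key not in edges or log_e < edges[key]:
            edges[key] = log_e
    return edges


def count_clusters(nodes, edges):
    """Count connected components with a simple union-find."""
    parent = {node: node for node in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(node) for node in nodes})


# Toy data (hypothetical): four sequences and five BLAST hits
nodes = ["A", "B", "C", "D"]
hits = [
    ("A", "A", 0.0),     # self hit, removed
    ("A", "B", 1e-50),
    ("B", "A", 1e-48),   # reciprocal duplicate of A-B
    ("B", "C", 1e-10),
    ("C", "D", 1e-120),
]

# Tighten the cutoff stepwise and watch the network split into clusters
for cutoff in (-5, -20, -100):
    n_clusters = count_clusters(nodes, build_edges(hits, cutoff))
    print(f"logE cutoff {cutoff}: {n_clusters} cluster(s)")
```

With these toy hits, lowering the cutoff splits the single network into progressively more clusters. In practice the cutoff would be lowered until the cluster count and membership stop changing appreciably, which mirrors the stopping criterion described in the text.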