Klebsiella phages were first collected from the GenBank database (retrieved on 15.08.2018), and a total of 59 phages were ultimately analyzed (Supplementary Table S1). Proteins from these phages annotated as tail fibers or tail spikes were analyzed with BlastP (Altschul et al., 1990), Phyre2 (Kelley et al., 2015), SWISS-MODEL (Bordoli et al., 2009; Bordoli and Schwede, 2012), HMMER (Finn et al., 2011), and HHPred (Zimmermann et al., 2018) to identify phages that encode RBPs with putative depolymerase activity (Supplementary Table S2). If neither a tail fiber nor a tail spike gene was found in the genome, we analyzed all genes located in the vicinity of annotated structural genes. BlastP (protein–protein BLAST) was performed against the non-redundant protein sequences (nr) database using standard parameters (expect threshold: 10; word size: 6; matrix: BLOSUM62; gap costs: existence 11, extension 1; conditional compositional score matrix adjustment). HMMER was used in the quick search mode against Reference Proteomes, UniProtKB, SwissProt, and Pfam, with significance E-values of 0.01 (sequence) and 0.03 (hit). For Phyre2, the normal modeling mode was used. HHPred homology detection and structure prediction was run using the PDB_mmCIF70 database and the following parameters [MSA generation method: HHblits uniclust30_2018_08; maximal no. of MSA generation steps: 3; E-value inclusion threshold for MSA generation: 1e-3; minimal sequence identity of MSA hits with query (%): 0; minimal coverage of MSA hits (%): 20; secondary structure scoring: during alignment; alignment mode: Realign with MAC: local:norealign; MAC realignment threshold: 0.3; no. of target sequences: 250; min. probability in hit list (>10%): 20].
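The BlastP settings listed above can be written down as a command line for readers who want to rerun the search locally. This is only a sketch: it assumes the NCBI BLAST+ `blastp` executable and a locally installed `nr` database, and the text does not say whether the original search was run locally or on the web server. The function name `blastp_command` and the file names are hypothetical.

```python
def blastp_command(query_fasta: str, out_path: str, db: str = "nr") -> list[str]:
    """Build a BLAST+ blastp argument list mirroring the stated parameters.

    Assumption: a local BLAST+ installation; the paths and the database
    location are placeholders, not taken from the paper.
    """
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db,
        "-evalue", "10",            # expect threshold: 10
        "-word_size", "6",          # word size: 6
        "-matrix", "BLOSUM62",      # scoring matrix
        "-gapopen", "11",           # gap existence cost
        "-gapextend", "1",          # gap extension cost
        "-comp_based_stats", "2",   # conditional compositional score matrix adjustment
        "-out", out_path,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(blastp_command("rbp_candidates.faa", "rbp_vs_nr.txt")))
```

The list can be passed to `subprocess.run` once the database path is adjusted to the local setup.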
Criteria for the prediction of putative depolymerase activity were (Supplementary Table S2): (1) the protein must be longer than 200 residues; (2) the protein must be annotated as tail fiber/tail spike/hypothetical protein in the NCBI database; (3) the protein must show homology to domains annotated as lyase [hyaluronate lyases (hyaluronidases), pectin/pectate lyases, alginate lyases, K5 lyases] or hydrolase (sialidases, rhamnosidases, levanases, dextranases, and xylanases) with a confidence of at least 40% in Phyre2, or the enzymatic domain should be recognized by at least one of SWISS-MODEL, HMMER, or BlastP; (4) the homology with one of these enzymatic domains should span at least 100 residues; (5) a typical β-helical structure should be predicted by Phyre2. RBP depolymerases meeting these criteria are shown without additional labeling in the tables. Proteins with experimentally confirmed depolymerizing activity are marked in the tables with (a). An RBP that only partially fulfilled the above criteria is marked with (b). These lower-probability putative depolymerases fulfilled criteria 1 and 2, but the confidence of the Phyre2 prediction was below 40% or only SWISS-MODEL, HMMER, or BlastP gave a positive prediction. In addition, the homologous domain spanned only 50 to 100 amino acids and no β-helical structure could be predicted with Phyre2 (for details see Supplementary Table S2).
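The criteria can be summarized as a small decision function. This is a minimal sketch of one possible reading, not the authors' implementation: the text leaves some room for interpretation, so the code assumes that a protein is "predicted" when criteria 1 to 5 all hold, and "partial (b)" when criteria 1 and 2 hold and there is some enzymatic-domain evidence spanning at least 50 residues. The names `RbpEvidence` and `classify_rbp` and the field layout are hypothetical.

```python
from dataclasses import dataclass

ALLOWED_ANNOTATIONS = {"tail fiber", "tail spike", "hypothetical protein"}


@dataclass
class RbpEvidence:
    length: int               # protein length in residues
    annotation: str           # NCBI annotation
    phyre2_confidence: float  # Phyre2 confidence (%) for a lyase/hydrolase domain, 0 if none
    other_tool_hit: bool      # domain recognized by SWISS-MODEL, HMMER, or BlastP
    homology_span: int        # residues covered by the enzymatic-domain homology
    beta_helix: bool          # β-helical structure predicted by Phyre2
    confirmed: bool = False   # depolymerase activity shown experimentally


def classify_rbp(e: RbpEvidence) -> str:
    """Return a table label under the assumed reading of criteria 1-5."""
    if e.confirmed:
        return "confirmed (a)"
    # Criteria 1 and 2 are required for both full and partial predictions.
    if e.length <= 200 or e.annotation.lower() not in ALLOWED_ANNOTATIONS:
        return "not predicted"
    if e.phyre2_confidence <= 0 and not e.other_tool_hit:
        return "not predicted"  # no enzymatic-domain evidence at all
    crit3 = e.phyre2_confidence >= 40 or e.other_tool_hit
    crit4 = e.homology_span >= 100
    crit5 = e.beta_helix
    if crit3 and crit4 and crit5:
        return "predicted"
    # Assumption: partial predictions need a homologous span of at least 50 residues.
    if e.homology_span >= 50:
        return "partial (b)"
    return "not predicted"
```

For example, a 650-residue tail spike with a 25% Phyre2 hit spanning 70 residues and no predicted β-helix would be labeled "partial (b)" under this reading.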
All selected Klebsiella phages were then grouped, based on gene homology and conserved gene synteny, into KP32viruses, KP34viruses, and KP36viruses, and into groups containing only Klebsiella-specific phages similar to phage JD001 (belonging to Jedunavirus), phage Menlow (belonging to Ackermannviridae), or phage ΦK64-1 (belonging to Alcyoneusvirus). Within each group, further subdivisions were proposed for the purpose of this study, based on the organization of the RBP gene cluster (number of RBPs, length of the different genes, presence of anchor or branching domains).
When there was one RBP, a domain at the N-terminus of the RBP was annotated as ‘anchor’ when it showed at least 39% identity (BlastP) over at least 166 residues, starting from the N-terminus of the corresponding protein, among phages belonging to the same group. These parameters were set empirically, based on the shortest identity region found among all RBPs, specifically in the first RBP of phage IL33, belonging to KP32viruses group B (166 amino acids), and on the percent identity of the first RBP of phage Kp1. When more than one RBP was present, the anchor domain was annotated in the RBP in which a T4gp10-like domain was also detected. In the other RBP(s), the N-terminal conserved sequence was called ‘conserved peptide’; it was also generally shorter than the anchor domains. To define consensus sequences of the anchor domains and conserved peptides, multiple sequence or pairwise alignments were used, since these structures are highly conserved among phages from the same group. To identify domains involved in the branching of RBPs, the sequences were analyzed with HHPred protein structure prediction (Zimmermann et al., 2018) in search of domains homologous to T4gp10 domains 2 and 3, which are experimentally confirmed attachment sites (Prokhorov et al., 2017). WebLogos of the anchor domains and conserved peptides were created with the online WebLogo tool (Crooks et al., 2004).
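The anchor rule for phages with a single RBP reduces to a threshold check on a BlastP hit. This is a minimal sketch with assumptions: it takes "starting from the N-terminus" to mean that the alignment begins at query residue 1 (the `max_start` tolerance is an assumption), and it measures the span in query coordinates. The names `BlastHit` and `is_anchor_candidate` are hypothetical.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    query_start: int         # 1-based start of the alignment on the query RBP
    query_end: int           # 1-based end of the alignment on the query RBP
    percent_identity: float  # identity (%) reported by BlastP


def is_anchor_candidate(
    hit: BlastHit,
    min_identity: float = 39.0,  # empirical identity threshold from the text
    min_span: int = 166,         # empirical length threshold from the text
    max_start: int = 1,          # assumption: alignment must begin at residue 1
) -> bool:
    """Check whether an N-terminal hit meets the identity and span thresholds."""
    span = hit.query_end - hit.query_start + 1
    return (
        hit.query_start <= max_start
        and span >= min_span
        and hit.percent_identity >= min_identity
    )
```

When several RBPs are present, this check alone is not sufficient, because the text assigns the anchor to the RBP that also carries a T4gp10-like domain.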