Phylogenomic Analysis of Virophage Diversity

Among the 848 non-redundant virophage genomes, several categories of sequences were considered likely to represent complete or near-complete genomes: (i) reference genomes from isolates (n = 4), (ii) sequences identified as integrated, with upstream and downstream host regions ≥ 2 kb (n = 7), (iii) sequences with direct or inverted terminal repeats (n = 118 and n = 8, respectively), (iv) sequences predicted to be ≥ 90% complete by CheckV (AAI-based prediction, n = 59), and (v) linear contigs ≥ 25 kb (n = 61). The length threshold for this last category was based on the median length of predicted complete and near-complete genomes from all other categories (25,168 bp). Overall, 257 sequences were considered complete or near-complete virophage genomes.
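As a quick consistency check on the categories above, the following minimal sketch tallies the reported counts and applies the length rule for linear contigs. The dictionary keys and function names are hypothetical labels chosen for illustration, and the tally assumes each genome was counted in exactly one category, which the text does not state explicitly.

```python
# Counts per category as reported in the text; the key names are
# hypothetical labels, not identifiers used by the authors.
CATEGORY_COUNTS = {
    "isolate_reference": 4,
    "integrated_with_host_flanks": 7,
    "direct_terminal_repeats": 118,
    "inverted_terminal_repeats": 8,
    "checkv_at_least_90pct_complete": 59,
    "linear_contig_at_least_25kb": 61,
}

# Length cutoff for category (v), as stated in the text (25 kb).
MIN_LINEAR_CONTIG_BP = 25_000


def total_complete(counts):
    """Sum the per-category counts (assumes categories do not overlap)."""
    return sum(counts.values())


def passes_length_rule(length_bp, min_bp=MIN_LINEAR_CONTIG_BP):
    """Return True if a linear contig is long enough for category (v)."""
    return length_bp >= min_bp


if __name__ == "__main__":
    print(total_complete(CATEGORY_COUNTS))
    print(passes_length_rule(25_168))
```

Under that non-overlap assumption, the six counts add up to the 257 genomes reported.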
These complete and near-complete genomes were used as input for phylogenetic trees and genome-wide clustering to establish groups and potential taxa within the virophages. For phylogenetic trees, the sequences of the four morphogenesis genes detected in the 257 complete and near-complete genomes using the new HMM profiles (see above) were used, after excluding all sequences that covered <60% of the HMM profile to remove partial gene predictions. Multiple alignments were then built for each gene using an iterative clustering-alignment-phylogeny procedure specifically adapted for aligning highly divergent sequences [54]. The alignments were then automatically trimmed with clipkit v1.3.0 [55] in kpi-smart-gap mode to remove uninformative positions, and the trimmed alignments were used as input for tree building with IQ-Tree v2.2.0.3 [56], with automatic detection of the most appropriate substitution model and 1000 ultrafast bootstrap replicates. The best-fit model was Q.pfam+F+R7 for PRO, Q.yeast+F+R8 for ATPase, and Q.pfam+F+R8 for both MCP and penton. For the larger MCP phylogeny, including both complete and partial virophage genomes (Figure S6), additional sequences were aligned with MAFFT v7.490 [51] against the curated multiple alignment of MCP from complete and near-complete genomes only (options “--add” and “--keeplength”), and the phylogeny was built with IQ-Tree v2.2.0.3 [56] using parameters similar to those described above.
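The coverage filter and the trimming and tree-building steps can be sketched as follows. This is an illustration only: the authors' exact command lines are not given in the text, coverage is assumed here to be the aligned span on the profile divided by the profile length, and the function names, file names, and the `iqtree2` executable name are hypothetical. The flags shown (`-m` and `-o` for clipkit; `-s`, `-m MFP`, and `-B` for IQ-TREE 2) are standard options of those tools.

```python
MIN_HMM_COVERAGE = 0.60  # sequences covering <60% of the profile are excluded


def hmm_coverage(hmm_from, hmm_to, hmm_length):
    """Fraction of the HMM profile spanned by a hit (1-based, inclusive).

    Assumption: coverage is measured as the aligned span on the profile.
    """
    return (hmm_to - hmm_from + 1) / hmm_length


def keep_hit(hmm_from, hmm_to, hmm_length, min_cov=MIN_HMM_COVERAGE):
    """Keep a gene prediction only if it covers enough of the profile."""
    return hmm_coverage(hmm_from, hmm_to, hmm_length) >= min_cov


def tree_commands(alignment, trimmed):
    """Build (but do not run) trimming and tree-building command lines."""
    return [
        # Trim uninformative positions in kpi-smart-gap mode.
        ["clipkit", alignment, "-m", "kpi-smart-gap", "-o", trimmed],
        # ModelFinder Plus model selection and 1000 ultrafast bootstraps.
        ["iqtree2", "-s", trimmed, "-m", "MFP", "-B", "1000"],
    ]


if __name__ == "__main__":
    print(keep_hit(10, 250, 300))
    for cmd in tree_commands("mcp.aln.faa", "mcp.trimmed.faa"):
        print(" ".join(cmd))
```

Returning the commands as argument lists keeps the sketch free of side effects; they could be passed to `subprocess.run` once the tools are installed.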
Genome-wide amino acid identity (AAI) clustering was performed as in [57]. Briefly, predicted protein sequences from the 257 complete and near-complete virophages were compared all-vs-all using diamond v0.9.24.125 [58] with the following options: “--evalue 1e-5 --max-target-seqs 10000 --query-cover 50 --subject-cover 50”. The resulting file was used as input for the script “amino_acid_identity.py” to calculate the average AAI for all pairs of genomes. The script “filter_aai.py” was then used to select only pairs of genomes with a minimum normalized cumulative bit score of 0.05. Finally, these selected pairwise AAI values were used as input for clustering with MCL 14-137 (inflation parameter = 1.1) [50].
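The clustering workflow above can be outlined as a short sketch. It only builds command lines and filters an in-memory list; it does not reproduce the logic of “amino_acid_identity.py” or “filter_aai.py”, whose internals are not described here. The tuple layout `(genome_a, genome_b, aai, normalized_score)`, the function names, and all file names are assumptions made for illustration. The diamond options are those quoted in the text, and `--abc`, `-I`, and `-o` are standard MCL options.

```python
# Options quoted in the text for the all-vs-all protein comparison.
DIAMOND_OPTIONS = [
    "--evalue", "1e-5",
    "--max-target-seqs", "10000",
    "--query-cover", "50",
    "--subject-cover", "50",
]


def diamond_command(proteins, database, output):
    """Build (but do not run) a diamond blastp command with tabular output."""
    return ["diamond", "blastp", "--query", proteins, "--db", database,
            "--out", output, "--outfmt", "6", *DIAMOND_OPTIONS]


def filter_pairs(pairs, min_score=0.05):
    """Keep genome pairs whose normalized cumulative bit score is >= min_score.

    Each pair is a hypothetical tuple: (genome_a, genome_b, aai, score).
    """
    return [p for p in pairs if p[3] >= min_score]


def to_abc_lines(pairs):
    """Format pairs as 'label label weight' lines, using AAI as the weight."""
    return [f"{a}\t{b}\t{aai}" for a, b, aai, _score in pairs]


def mcl_command(abc_file, output, inflation=1.1):
    """Build (but do not run) an MCL command for label-format input."""
    return ["mcl", abc_file, "--abc", "-I", str(inflation), "-o", output]


if __name__ == "__main__":
    pairs = [("g1", "g2", 62.5, 0.31), ("g1", "g3", 41.0, 0.02)]
    for line in to_abc_lines(filter_pairs(pairs)):
        print(line)
    print(" ".join(mcl_command("aai.abc", "clusters.txt")))
```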

Roux S., Fischer M.G., Hackl T., Katz L.A., Schulz F., & Yutin N. (2023). Updated Virophage Taxonomy and Distinction from Polinton-like Viruses. Biomolecules, 13(2), 204.

Publication 2023

Corresponding Organization : Lawrence Berkeley National Laboratory

Other organizations : Max Planck Institute for Medical Research, University of Groningen, Smith College, National Center for Biotechnology Information, National Institutes of Health

Variable analysis

independent variables

None explicitly mentioned

dependent variables

Complete or near-complete virophage genomes
Phylogenetic trees of morphogenesis genes
Genome-wide amino acid identity (AAI) clustering

control variables

None explicitly mentioned

controls

No positive or negative controls were specified by the authors.
