For the newly identified virus genome, potential ORFs were predicted and annotated using the conserved signatures of the cleavage sites recognized by coronavirus proteinases, and were processed in the Lasergene software package (v.7.1, DNAstar). The viral genes were aligned using the L-INS-i algorithm implemented in MAFFT (v.7.407) (ref. 37).
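As an illustration of the alignment step, the following is a minimal sketch of how an L-INS-i run could be launched from Python. It relies on the documented MAFFT options `--localpair --maxiterate 1000`, which select L-INS-i. The helper names, file names and thread count are assumptions made for this example and do not come from the original protocol.

```python
import shutil
import subprocess
from typing import List


def mafft_linsi_command(fasta_in: str, threads: int = 1) -> List[str]:
    """Build the MAFFT command line for the L-INS-i strategy.

    L-INS-i corresponds to `--localpair --maxiterate 1000`.
    `fasta_in` is a hypothetical path to unaligned gene sequences.
    """
    return [
        "mafft",
        "--localpair",          # local pairwise alignment (L-INS-i)
        "--maxiterate", "1000",  # iterative refinement cycles
        "--thread", str(threads),
        fasta_in,
    ]


def run_mafft(fasta_in: str, fasta_out: str, threads: int = 1) -> None:
    """Run MAFFT and write the alignment (FASTA, on stdout) to `fasta_out`."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH")
    with open(fasta_out, "w") as handle:
        subprocess.run(
            mafft_linsi_command(fasta_in, threads),
            stdout=handle,
            check=True,
        )


if __name__ == "__main__":
    # Print the command only; call run_mafft(...) where MAFFT is installed.
    print(" ".join(mafft_linsi_command("spike_genes.fasta")))
```

Keeping the command builder separate from the runner makes the exact settings easy to record alongside the results.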
Phylogenetic analyses were then performed using the nucleotide sequences of various CoV gene datasets: (1) whole genome, (2) ORF1a, (3) ORF1b, (4) nsp5 (3CLpro), (5) RdRp (nsp12), (6) nsp13 (Hel), (7) nsp14 (ExoN), (8) nsp15 (NendoU), (9) nsp16 (O-MT), (10) spike (S) and (11) nucleocapsid (N). Phylogenetic trees were inferred using the maximum likelihood method implemented in the PhyML program (v.3.0) (ref. 38), with the generalized time-reversible (GTR) substitution model and subtree pruning and regrafting (SPR) branch swapping. Bootstrap support values were calculated from 1,000 pseudo-replicate trees. The best-fitting model of nucleotide substitution was determined using MEGA (v.5) (ref. 39). Amino acid identities among sequences were calculated using the MegAlign program implemented in the Lasergene software package (v.7.1, DNAstar).
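The tree inference settings and the identity calculation can be sketched as follows. The first helper assembles a PhyML command line with the documented options `-d nt`, `-m GTR`, `-s SPR` and `-b 1000`. The second computes a simple pairwise percent identity over two aligned sequences. The function names and file names are hypothetical. The convention of skipping gapped columns is also an assumption, since the text does not state how MegAlign defines identity, so the values need not match MegAlign output. PhyML reads PHYLIP-formatted alignments, so a FASTA alignment would likely need converting first.

```python
from typing import List


def phyml_command(phylip_in: str, bootstraps: int = 1000) -> List[str]:
    """Build a PhyML command: nucleotide data, GTR model, SPR search."""
    return [
        "phyml",
        "-i", phylip_in,        # alignment in PHYLIP format (hypothetical path)
        "-d", "nt",             # nucleotide data
        "-m", "GTR",            # generalized time-reversible model
        "-s", "SPR",            # subtree pruning and regrafting moves
        "-b", str(bootstraps),  # number of bootstrap pseudo-replicates
    ]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns with a gap ('-') in either sequence are skipped,
    and identity is matches / compared columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    print(" ".join(phyml_command("spike.phy")))
    # Toy aligned peptides, not real data: 4 matches over 5 compared columns.
    print(percent_identity("MKT-LV", "MKSALV"))
```

Running one such command per gene dataset (whole genome, ORF1a, ORF1b and so on) would give one tree per dataset.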