Across all sets (feature selection, training, extra-familial validation), six families of respiratory viruses were included in this study: Coronaviridae, Paramyxoviridae, Pneumoviridae, Adenoviridae, Orthomyxoviridae, and Herpesviridae. Each virus within these families has a protein responsible for viral attachment and host cell entry, referred to herein as the “spike” protein (see Fig 1A). For coronaviruses, this is the Spike S Glycoprotein, aptly named because it projects from the surface of the virion (Fig 1B), as do the other “spike” proteins. For influenza A virus, within the Orthomyxoviridae family, we selected Hemagglutinin rather than Neuraminidase as the equivalent of the “spike”: Neuraminidase primarily prevents virion aggregation and therefore acts as a helper to Hemagglutinin, which determines cell entry [25].
A total of 50 viral sequences (ranging from 4 to 12 for each virus family) encoding 360 proteins were used (see Table 1 for a list of sequences). Specifically, the feature selection set included 7 Coronaviridae sequences representing 7 viruses. The training set included 7 different Coronaviridae sequences representing 7 viruses, 4 Paramyxoviridae sequences representing 4 viruses, 12 Pneumoviridae sequences representing 2 viruses, 8 Adenoviridae sequences representing 1 virus, and 8 Orthomyxoviridae sequences representing 1 virus. Finally, the extra-familial validation set included 4 Herpesviridae sequences representing 4 viruses. See Table 2 for the number of “spike” vs. non-spike proteins for each virus family.
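As a quick consistency check, the per-set sequence counts stated above can be tallied in a few lines of Python. This is only an illustrative sketch: the set labels, the `SEQUENCE_COUNTS` table, and the `sequences_per_set` helper are hypothetical names introduced here, not part of the study's pipeline. The numbers are copied from the text.

```python
from collections import defaultdict

# (set label, virus family, number of sequences, number of viruses),
# transcribed from the text; the labels themselves are illustrative.
SEQUENCE_COUNTS = [
    ("feature_selection", "Coronaviridae", 7, 7),
    ("training", "Coronaviridae", 7, 7),
    ("training", "Paramyxoviridae", 4, 4),
    ("training", "Pneumoviridae", 12, 2),
    ("training", "Adenoviridae", 8, 1),
    ("training", "Orthomyxoviridae", 8, 1),
    ("extra_familial_validation", "Herpesviridae", 4, 4),
]


def sequences_per_set(rows):
    """Sum the number of sequences contributed to each set."""
    totals = defaultdict(int)
    for set_label, _family, n_sequences, _n_viruses in rows:
        totals[set_label] += n_sequences
    return dict(totals)


per_set = sequences_per_set(SEQUENCE_COUNTS)
total_sequences = sum(per_set.values())

# Distinct families across all sets (Coronaviridae appears in two sets).
families = {family for _set, family, _n_seq, _n_vir in SEQUENCE_COUNTS}

print(per_set)
print("total sequences:", total_sequences)
print("families:", len(families))
```

The per-set sums add up to the 50 sequences reported, and the distinct family names match the six families listed.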