The H9N2 viruses that we isolated in the live-poultry market fell into three independent branches; we therefore performed a separate spatiotemporal analysis for each of clades A, B, and C. To reduce potential sampling bias, we randomly subsampled the database in a stratified manner to obtain a more even spatiotemporal distribution of the HA gene sequences of the three branches. Specifically, the sequences in each branch were clustered using the CD-HIT program (Huang et al., 2010), and identical sequences from the same time and region were removed. The discrete sampling locations of the clade A, B, and C viruses in this study were Guangdong, Yunnan, Jiangxi, Shandong, Shanghai, Fujian, Jiangsu, Hunan, Henan, Hebei, Hubei, Xinjiang, Ningxia, Chongqing, Guizhou, Guangxi, Sichuan, Shanxi, Beijing, Tianjin, Heilongjiang, and Anhui in China; clade C also included viruses from Vietnam. Detailed information on the subsampled HA gene sequences of the clade A, B, and C H9N2 viruses used in this study is provided in Supplementary Table 1.
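The removal of identical sequences within the same time and region can be illustrated with a minimal sketch. It does not reproduce the CD-HIT clustering step; the `Record` fields, the function name, and the use of the sampling year as the time unit are assumptions made here for illustration only.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    """One HA sequence with its metadata (hypothetical field names)."""
    name: str
    region: str
    year: int        # assumption: "same time" is taken to mean same year
    sequence: str


def drop_identical(records: List[Record]) -> List[Record]:
    """Keep the first record for each (region, year, sequence) combination."""
    seen = set()
    kept = []
    for rec in records:
        # Case-insensitive comparison so 'acgt' and 'ACGT' count as identical.
        key = (rec.region, rec.year, rec.sequence.upper())
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


example = [
    Record("s1", "Guangdong", 2019, "ATGGAA"),
    Record("s2", "Guangdong", 2019, "atggaa"),  # identical, same year and region
    Record("s3", "Yunnan", 2019, "ATGGAA"),     # identical, different region
    Record("s4", "Guangdong", 2020, "ATGGAA"),  # identical, different year
]
kept_names = [rec.name for rec in drop_identical(example)]
```

In this sketch, identical sequences are retained when they come from different regions or years, so the filter reduces redundancy without erasing spatial or temporal information.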
Time-measured phylogenies were inferred using the Bayesian discrete phylogeographic approach implemented in the BEAST package (v1.10.4). We first regressed root-to-tip genetic distances on the ML tree against exact sampling dates using TempEst v1.5.3 (Rambaut et al., 2016), which showed a strong temporal signal. We then applied an uncorrelated lognormal (UCLN) relaxed molecular clock model, together with a Bayesian stochastic search variable selection (BSSVS) model with asymmetric substitution. For each independent dataset, multiple MCMC runs were combined using LogCombiner (v1.10.4), with 5,000,000,000 total steps for each set and sampling every 500,000 steps. Subsequently, we used SpreaD3 v0.9.7.1 to develop interactive visualizations of the dispersal process through time and to perform Bayes factor (BF) tests assessing the support for individual transitions between distinct geographic locations (Bielejec et al., 2016). BF > 100 indicated robust statistical support, 30 < BF ≤ 100 indicated very strong statistical support, 10 < BF ≤ 30 indicated strong statistical support, 3 < BF ≤ 10 indicated substantial statistical support, and BF < 3 indicated poor statistical support (Lemey et al., 2009). We used QGIS version 3.28 to plot the results of the BF tests.
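The BF support categories listed above can be written as a small lookup function. This is a sketch rather than part of the SpreaD3 workflow; the function name is hypothetical, and because the stated thresholds leave a BF of exactly 3 unassigned, it is grouped with "poor" here as an assumption.

```python
def bf_support(bf: float) -> str:
    """Map a Bayes factor to the support category used in the text."""
    if bf > 100:
        return "robust"
    if bf > 30:
        return "very strong"
    if bf > 10:
        return "strong"
    if bf > 3:
        return "substantial"
    # Assumption: BF == 3 is not covered by the stated ranges; treated as poor.
    return "poor"


labels = [bf_support(x) for x in (250.0, 100.0, 30.0, 10.0, 2.5)]
```

Upper bounds are inclusive, so a BF of exactly 100 falls in the "very strong" category rather than "robust", matching the ranges given in the text.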