With the development of high-throughput sequencing technology, nuclear gene sequences are becoming available for a growing number of species for phylogenomic analyses. However, phylogenetic inference from large numbers of genes concatenated into a supermatrix has been shown to be prone to systematic errors and artifacts, which can lead to inaccurate phylogenetic relationships (Philippe et al., 2017). To understand the evolutionary history of tea plant populations cultivated in Xinyang, we reconstructed their phylogeny using both a coalescent method based on low-copy nuclear genes and a maximum likelihood (ML) method based on SNPs.
For the coalescent analysis with 1785 low-copy nuclear genes, the 94 new assemblies of sampled C. sinensis and two genome datasets (CSA ‘Yunkang 10’ and CSS ‘Shuchazao’) (Xia et al., 2017; Xia et al., 2020b) were used for phylogeny construction. Amino acid sequences were aligned using MAFFT v7.487 (Katoh and Standley, 2013) with the “--auto” parameter. Poorly aligned regions were then trimmed using trimAl v1.2 (Capella-Gutiérrez et al., 2009) with the “-automated1” parameter. The multiple amino acid sequence alignments were converted to nucleotide (codon) alignments with PAL2NAL (Suyama et al., 2006). Single-gene ML trees were reconstructed using IQ-TREE v2.1.4-beta (Nguyen et al., 2015) under the GTR+G model with 1000 bootstrap replicates. The coalescent analysis was performed with ASTRAL v5.7.8 (Zhang et al., 2018).
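The conversion from an amino acid alignment to a nucleotide alignment can be illustrated with a minimal Python sketch. It is not PAL2NAL's implementation; it only shows the underlying idea of threading each unaligned coding sequence onto its aligned protein, so that every residue becomes its codon and every gap becomes three gap characters. The function name `thread_codons`, the sample names, and the toy sequences are hypothetical, and the sketch assumes that each coding sequence is in frame and matches its protein without checking the translation.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Map an unaligned coding sequence onto an aligned protein sequence.

    Each residue consumes the next codon of the CDS; each alignment gap
    is expanded to three gap characters. The translation itself is not
    verified (simplifying assumption for this sketch).
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("CDS is too short for the aligned protein")

    codons = []
    pos = 0  # current position in the unaligned CDS
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Hypothetical toy data: sample_A lacks the third residue of sample_B.
protein_alignment = {"sample_A": "MK-LV", "sample_B": "MKTLV"}
coding_sequences = {
    "sample_A": "ATGAAACTGGTT",
    "sample_B": "ATGAAAACCCTGGTT",
}

codon_alignment = {
    name: thread_codons(protein_alignment[name], coding_sequences[name])
    for name in protein_alignment
}

for name, seq in codon_alignment.items():
    print(name, seq)
```

Because gaps are always inserted in multiples of three, the resulting nucleotide alignment preserves the reading frame, which is why aligning at the protein level first and back-translating is generally preferred over aligning coding sequences directly.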
For the ML analysis of concatenated SNPs, a total of 108 samples were used: the 94 newly sequenced transcriptomes; the RNA-seq data of CSA ‘Yunkang 10’ and of CSS ‘Biyun,’ ‘Hangdan,’ ‘Tieguanyin,’ ‘Longjing43,’ and ‘Shuchazao’ (Xia et al., 2017; Wang et al., 2020; Xia et al., 2020b; Zhang et al., 2020c; Wang et al., 2021b); and eight wild tea species (Supplementary Table S2). The SNP dataset was converted to a PHYLIP file using the Python script ‘vcf2phylip’ (https://github.com/edgardomortiz/vcf2phylip/, accessed June 2022). The ML phylogeny was also inferred using IQ-TREE (Nguyen et al., 2015). The best-fit substitution model was selected according to the Bayesian Information Criterion (BIC) as estimated by ModelFinder (Kalyaanamoorthy et al., 2017) implemented in IQ-TREE. Principal component analysis (PCA) was performed using PLINK v1.90b6.25 (Purcell et al., 2007).
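How BIC-based model selection works can be sketched in a few lines of Python. Under the standard definition, BIC = -2 ln L + k ln n, where L is the maximized likelihood, k the number of free parameters, and n the sample size (taken here to be the number of alignment sites), and the model with the lowest BIC is preferred. The candidate models, log-likelihoods, parameter counts, and alignment length below are hypothetical values chosen for illustration; they are not results from this study, and the sketch is not ModelFinder's actual search procedure.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


# Hypothetical candidates: model name -> (log-likelihood, free parameters).
candidates = {
    "JC": (-10500.0, 10),
    "HKY+G": (-10200.0, 15),
    "GTR+G": (-10195.0, 19),
}
n_sites = 5000  # hypothetical alignment length

scores = {
    name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()
}

# The preferred model is the one with the smallest BIC score.
best_model = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: BIC = {score:.2f}")
print("Selected model:", best_model)
```

In this toy example the most parameter-rich model has the highest likelihood but is not selected, because the k ln n penalty outweighs its small gain in fit.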