Sequence analysis included 80 nucleotide sequences of PfGARP from Thai isolates, one clinical isolate from Guinea (isolate MDCU32) and 18 publicly available complete gene sequences whose isolate names, countries of origin and GenBank accession numbers are as follows: 3D7 (Netherlands from West Africa, AL844501), CD01 (Congo, LR129686), Dd2 (Indochina, LR131290), FC27 (Papua New Guinea, J03998), FCC1/HN (Hainan in China, AF251290), GA01 (Gambia, LR131386), GB4 (Ghana, LR131402), KH1 (Cambodia, LR131418), KH2 (Cambodia, LR131306), HB3 (Honduras, LR131338), IGH-CR14 (India, GG6656811), IT (Brazil, LR131322), KE01 (Kenya, LR131354), ML01 (Mali, LR131481), SD01 (Sudan, LR131466), SN01 (Senegal, LR131434), TG01 (Togo, LR131450), and UGT5.1 (Vietnam, KE124372). Of these, the 3D7, FC27 and FCC1/HN sequences were determined by the Sanger dideoxy chain-termination method, whereas the sequences of the remaining isolates were assembled from next-generation sequencing platforms (Supplemental Table S1). Sequences were aligned using the CLUSTAL_X program, with manual adjustment to keep codons appropriately matched in the coding region and to maintain the reading frame. The sequence from the FC27 strain was used as a reference
[6]. Nucleotide repeats were searched using the Tandem Repeats Finder version 4.0 program with default options. Nucleotide diversity (π), the rate of synonymous substitutions per synonymous site (dS) and the rate of nonsynonymous substitutions per nonsynonymous site (dN) were determined from the average sequence differences over all pairwise comparisons within each taxon, and standard errors were computed from 1000 bootstrap pseudoreplicates as implemented in the MEGA 6.0 program [41]. Haplotype diversity and its sampling variance were computed, taking into account the presence of gaps in the aligned sequences, using the DnaSP version 5.10 program
[42]. Natural selection on codon substitution was assessed using the fast unconstrained Bayesian approximation (FUBAR) method on the Datamonkey web server [43,44]. A neighbor-joining phylogenetic tree based on nucleotide sequences was constructed using the maximum composite likelihood model, whereas a maximum likelihood tree was built using the Tamura-Nei model with a rate variation model that allowed some sites to be evolutionarily invariable. The Arlequin 3.5.2.2 software was used to estimate genetic differentiation between populations, expressed as the fixation index (FST), using an analysis of molecular variance (AMOVA) approach akin to Weir and Cockerham's method but taking into account the number of mutations between haplotypes
[45]. One hundred permutations were used to determine the significance levels of the fixation indices. Linear B cell epitopes in PfGARP were predicted by sequence similarity to known, experimentally verified epitopes from the Immune Epitope Database (IEDB), as implemented in the BepiBlast web server [11]. Linear B cell epitopes were also predicted using the protein language models implemented in BepiPred-3.0 [12]. Potential HLA class II-binding peptides were analyzed using the IEDB recommended 2.22 algorithm with the default peptide length option of 12–18 amino acid residues. Peptides were considered predicted HLA class II binders when the percentile rank was < 10 and the IC50 for HLA binding affinity was ≤ 1000 nM [14]. The analysis mainly concerned the common HLA class II haplotypes among Thai populations with allele frequency > 0.1 [13].
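As an illustration of the pairwise-comparison and bootstrap steps described for nucleotide diversity, the following Python sketch computes π as the mean uncorrected proportion of differing sites over all sequence pairs and estimates its standard error by resampling alignment columns. It is not the MEGA implementation: the function names are hypothetical, the use of uncorrected p-distance and the pairwise exclusion of gapped sites are assumptions made for the example, and the toy alignment is invented.

```python
import random
from itertools import combinations

# Characters treated as missing data (an assumption for this sketch).
GAP_CHARS = {"-", "N", "?"}


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (pairwise deletion).
    """
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def nucleotide_diversity(seqs):
    """Mean p-distance over all pairwise comparisons of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def bootstrap_se(seqs, replicates=1000, seed=0):
    """Standard error of pi from column-resampled pseudoreplicates."""
    rng = random.Random(seed)
    length = len(seqs[0])
    estimates = []
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = ["".join(s[i] for i in cols) for s in seqs]
        estimates.append(nucleotide_diversity(resampled))
    mean = sum(estimates) / replicates
    variance = sum((e - mean) ** 2 for e in estimates) / (replicates - 1)
    return variance ** 0.5


# Toy aligned sequences, invented for illustration only.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(toy_alignment)
se = bootstrap_se(toy_alignment, replicates=1000, seed=1)
print(f"pi = {pi:.4f}, bootstrap SE = {se:.4f}")
```

For the toy alignment the three pairwise distances are 1/8, 1/8 and 2/8, so π is 1/6. Estimating dS and dN would additionally require codon-aware counting of synonymous and nonsynonymous sites, which this sketch does not attempt.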
Rojrung R., Kuamsab N., Putaporntip C., & Jongwutiwes S. (2023). Analysis of sequence diversity in Plasmodium falciparum glutamic acid-rich protein (PfGARP), an asexual blood stage vaccine candidate. Scientific Reports, 13, 3951.