The immunogenicity model is built on the enrichment of amino acids in immunogenic versus non-immunogenic peptides and on the importance scores of the different positions of the MHC-I presented peptide (Table 2). For each MHC-I molecule, the impact on binding affinity was determined per position of the presented peptides (as explained in [40]). The six positions with the least impact on binding affinity were defined as non-anchor positions; these six positions can differ between MHC-I molecules that use different anchor positions. Only non-anchor positions were used to study differences in immunogenicity, as anchor positions might reflect a difference in binding affinity rather than a difference in immunogenicity. Per amino acid, the enrichment is calculated as the ratio between the fraction of that amino acid in the immunogenic data set and its fraction in the non-immunogenic data set. For instance, if Tyr occurs with a frequency of 2.5% in immunogenic and 1.5% in non-immunogenic peptides, the enrichment in immunogenic peptides is about 1.7-fold, and the natural logarithm of this enrichment is 0.54. We call the natural logarithm of the enrichment the log enrichment score. To predict the immunogenicity of a new pMHC, the log enrichment score was looked up for each non-anchor residue of the presented peptide and weighted according to the importance of that position (measured as the Kullback-Leibler divergence; see Table 2). The weighted log enrichment scores of all non-anchor residues were summed, and the resulting sum was termed the immunogenicity score. The larger the immunogenicity score, the more the pMHC resembles the immunogenic peptides, and the more likely it is expected to be immunogenic. The log enrichment scores of amino acids at anchor positions are masked, i.e. not used to derive the immunogenicity score.
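The enrichment calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 0-based position indexing, the input layout (pairs of peptide and non-anchor positions) and the decision to skip amino acids absent from either data set are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def amino_acid_fractions(entries):
    """Fraction of each amino acid over the non-anchor positions.

    `entries` is an iterable of (peptide, non_anchor_positions) pairs, so the
    non-anchor positions (0-based, an assumption) may differ per MHC-I molecule.
    """
    counts = Counter(
        peptide[p] for peptide, positions in entries for p in positions
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues to count")
    return {aa: n / total for aa, n in counts.items()}


def log_enrichment_scores(immunogenic, non_immunogenic):
    """Natural log of the ratio of fractions (immunogenic / non-immunogenic)."""
    f_imm = amino_acid_fractions(immunogenic)
    f_non = amino_acid_fractions(non_immunogenic)
    # Assumption: amino acids missing from either set get no score here;
    # the source does not say how such cases were handled.
    return {aa: math.log(f_imm[aa] / f_non[aa]) for aa in f_imm if aa in f_non}


# Toy data (hypothetical peptides, all three positions treated as non-anchor).
toy_immunogenic = [("AYA", [0, 1, 2]), ("YYA", [0, 1, 2])]
toy_non_immunogenic = [("AAA", [0, 1, 2]), ("AYA", [0, 1, 2])]
toy_scores = log_enrichment_scores(toy_immunogenic, toy_non_immunogenic)
```

In the toy data Tyr makes up half of the immunogenic residues and one sixth of the non-immunogenic residues, so its log enrichment score is positive, while Ala receives a negative score.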
These assumptions resulted in the following formula to calculate the immunogenicity score, S, of a peptide ligand, L, presented on an HLA molecule, H:

$$S(L,H) = \sum_{p} E\big(A(L,p)\big) \cdot I(p) \cdot M(H,p)$$

Here, for every position p in the ligand L, the log enrichment score E of the amino acid at that position, A(L,p), is weighted by the importance of that position, denoted here as I(p), and the weighted scores are summed. Anchor positions on that HLA are masked by setting M(H,p) to 0; for non-anchor positions M(H,p) is 1.
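The masked, weighted sum described above can be sketched as a short Python function. The function name, the argument layout and the toy numbers are assumptions made for illustration; the actual log enrichment scores and position importance weights are those reported by the authors (Table 2 and Supplemental Table S1) and are not reproduced here. Treating an amino acid without a log enrichment score as contributing 0 is likewise an assumption.

```python
def immunogenicity_score(peptide, log_enrichment, importance, mask):
    """Sum over positions of: log enrichment score * position importance * mask.

    peptide        -- amino acid sequence of the presented ligand
    log_enrichment -- dict mapping amino acid -> log enrichment score
    importance     -- per-position importance weights (same length as peptide)
    mask           -- per-position mask for this HLA: 0 for anchor positions,
                      1 for non-anchor positions
    """
    if not (len(peptide) == len(importance) == len(mask)):
        raise ValueError("peptide, importance and mask must have equal length")
    return sum(
        # Assumption: amino acids without a score contribute 0.
        log_enrichment.get(aa, 0.0) * weight * m
        for aa, weight, m in zip(peptide, importance, mask)
    )


# Hypothetical toy values, for illustration only.
toy_log_enrichment = {"A": -0.5, "Y": 0.5}
toy_importance = [1.0, 2.0, 0.5]
toy_mask = [0, 1, 1]  # first position treated as an anchor and masked
toy_score = immunogenicity_score("AYA", toy_log_enrichment, toy_importance, toy_mask)
```

With these toy values the masked first position contributes nothing, and the remaining two positions contribute 0.5 × 2.0 and −0.5 × 0.5, respectively.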
The immunogenicity score model was tested in a 3-fold cross-validation experiment, in which a random two-thirds of the data was used to calculate the log enrichment scores. These log enrichment scores, together with the position importance weights (Table 2), were then used to construct the immunogenicity score model as described above, and the remaining one-third of the data was used to test its performance. The cross-validation was performed 25 times. Our final immunogenicity score model, which is used throughout this paper, is based on all non-redundant HLA class I presented peptides found in HLA-transgenic mice. As the selected non-redundant set of peptides varies slightly (explained above), the final model was constructed by repeating the non-redundancy selection and model building 100 times and taking the average log enrichment score per amino acid over these 100 models. The final log enrichment scores, the position importance weights and an explanation of how to construct the immunogenicity score model are given in Supplemental Table S1.
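The data splitting and the averaging over repeated models can be sketched as below. This is only one plausible reading of the procedure: the helper names are hypothetical, the fold assignment by random shuffling is an assumption, and averaging an amino acid only over the models that contain it is a choice made for the example rather than something stated in the text.

```python
import random


def three_fold_splits(items, rng):
    """Yield (train, test) pairs: two-thirds for training, one-third for testing."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    folds = [shuffled[i::3] for i in range(3)]
    for k in range(3):
        test = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, test


def average_scores(models):
    """Average log enrichment scores per amino acid over several models.

    `models` is a list of dicts (amino acid -> log enrichment score).
    Assumption: an amino acid is averaged over the models that contain it.
    """
    averaged = {}
    for aa in set().union(*models):
        values = [m[aa] for m in models if aa in m]
        averaged[aa] = sum(values) / len(values)
    return averaged


# Toy usage with hypothetical data: nine items split into three folds.
toy_splits = list(three_fold_splits(range(9), random.Random(0)))
toy_average = average_scores([{"A": 1.0, "Y": 0.0}, {"A": 3.0}])
```

Passing an explicit `random.Random` instance keeps each repetition reproducible; repeating the whole procedure with different seeds would correspond to the repeated cross-validations and model builds described above.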