For MHC class I epitopes, it is generally observed that a length of about 8–11 residues is optimal for T cell recognition and for use in assays. Because the class I binding groove is closed at both ends and tightly constrains peptide length, distinct class I sequences typically represent unique epitopes, even when they are nested within a longer sequence that is also recognized by T cells. Accordingly, in the present study we did not further process nested or overlapping class I epitopes.
For MHC class II epitopes, however, optimal epitopes are usually longer than the minimal 9-mer core recognized by T cells. In general, class II epitopes are optimally 13–20 residues in length [1]. Peptides of varying length that carry the same core may all be similarly active and/or recognized by the same T cell specificity. Thus, many of the class II epitope structures contained in the IEDB are redundant, nested or largely overlapping. For this reason, it is desirable to devise strategies to reduce the complexity of class II epitope sets. Here, we developed a clustering algorithm that generates consensus sequences, or clusters of epitopes; an illustration of this process is given in Table 1. Our approach first sorts the peptides by their RF scores. Then, taking the highest-ranked peptide as the starting sequence, we move down the ranked list, aligning the sequences to find epitopes that are nested within it or that overlap it by at least 9 residues. We consider only identical matches over the region of overlap and identical nested peptides; under this definition, peptides with mismatches are treated as separate epitopes. When a nested peptide is found, we keep only the larger peptide and calculate a new RF score from the sums of responded and tested subjects over all epitopes in the cluster. For overlapping epitopes, a consensus epitope, or cluster, is generated by combining the sequences, provided that the resulting cluster is no longer than 20 residues. In these cases, a new RF score is calculated as in the nested case. For the assay type scoring system, the highest-ranked assay and application among all the assays associated with the set of nested epitopes is considered.
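The clustering steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the RF score is taken here as the plain fraction responded/tested (the passage does not define it, so substitute the study's formula), `assay_score` is a hypothetical stand-in for the assay-type ranking, and the names `Epitope`, `Cluster`, `overlap_len`, `try_merge` and `cluster_epitopes` are illustrative only. The 9-residue minimum overlap and the 20-residue cap on overlap merges come from the text.

```python
from __future__ import annotations
from dataclasses import dataclass, field

MIN_OVERLAP = 9       # identical overlap required, per the text
MAX_CLUSTER_LEN = 20  # longest consensus allowed for overlap merges, per the text


@dataclass
class Epitope:
    seq: str
    responded: int    # subjects that responded
    tested: int       # subjects tested (assumed > 0)
    assay_score: int  # hypothetical rank of the assay/application; higher is better

    @property
    def rf(self) -> float:
        # Assumption: plain response frequency; replace with the study's RF formula.
        return self.responded / self.tested


@dataclass
class Cluster:
    seq: str
    members: list[Epitope] = field(default_factory=list)

    @property
    def rf(self) -> float:
        # New RF score from the summed responded and tested subjects of all members.
        return sum(m.responded for m in self.members) / sum(m.tested for m in self.members)

    @property
    def best_assay_score(self) -> int:
        # Keep the highest-ranked assay/application among all members.
        return max(m.assay_score for m in self.members)


def overlap_len(left: str, right: str) -> int:
    """Longest k >= MIN_OVERLAP where left's last k residues equal right's first k."""
    for k in range(min(len(left), len(right)), MIN_OVERLAP - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def try_merge(consensus: str, pep: str) -> str | None:
    """Return the new consensus if pep is nested in or overlaps it, else None."""
    if pep in consensus:
        return consensus  # pep nested: keep the larger sequence
    if consensus in pep:
        return pep        # consensus nested in pep: keep the larger sequence
    candidates = []
    k = overlap_len(consensus, pep)
    if k:
        candidates.append(consensus + pep[k:])  # pep extends the consensus to the right
    k = overlap_len(pep, consensus)
    if k:
        candidates.append(pep + consensus[k:])  # pep extends the consensus to the left
    valid = [m for m in candidates if len(m) <= MAX_CLUSTER_LEN]
    return min(valid, key=len) if valid else None


def cluster_epitopes(epitopes: list[Epitope]) -> list[Cluster]:
    # Rank by RF, highest first (stable sort keeps input order on ties).
    remaining = sorted(epitopes, key=lambda e: e.rf, reverse=True)
    clusters = []
    while remaining:
        seed = remaining.pop(0)  # highest-ranked peptide not yet clustered
        cl = Cluster(seq=seed.seq, members=[seed])
        leftover = []
        for pep in remaining:    # single pass down the ranked list
            merged = try_merge(cl.seq, pep.seq)
            if merged is None:
                leftover.append(pep)  # no identical nesting/overlap: separate epitope
            else:
                cl.seq = merged
                cl.members.append(pep)
        remaining = leftover
        clusters.append(cl)
    return clusters


if __name__ == "__main__":
    # Synthetic sequences for illustration only (not data from the study).
    example = [
        Epitope("ACDEFGHIKLMNPQR", responded=5, tested=10, assay_score=2),
        Epitope("DEFGHIKLMN", responded=3, tested=4, assay_score=3),
        Epitope("GHIKLMNPQRSTVWY", responded=2, tested=10, assay_score=1),
        Epitope("QQQQQQQQQQQQ", responded=1, tested=2, assay_score=1),
    ]
    for c in cluster_epitopes(example):
        print(c.seq, f"RF={c.rf:.3f}", f"assay={c.best_assay_score}", len(c.members))
```

In this synthetic example, the nested 10-mer and the two overlapping 15-mers collapse into a single 20-residue consensus whose pooled RF is (3 + 5 + 2)/(4 + 10 + 10) = 10/24, while the unrelated peptide remains a separate epitope. One design simplification to note: each seed makes a single pass down the ranked list, so a peptide skipped early is not re-checked after the consensus grows; an iterative variant would repeat the pass until no further merges occur.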