A set of known linear peptides that had been tested for immune recognition and found to be epitopes (positive assay results) or non-epitopes (negative assay results) was downloaded from the Immune Epitope Database (IEDB) (21). Peptides shorter than 5 or longer than 25 amino acids were removed, as B cell epitopes rarely fall outside these boundaries (1). Only peptides confirmed as positive in two or more separate experiments were included in the positive dataset, and only peptides reported as negative in two or more separate experiments and never observed as positive in any experiment were included in the negative dataset. This resulted in 11 834 positive and 18 722 negative peptides. Each peptide was mapped back onto its original protein sequence, and that sequence was used to calculate the output prediction. This dataset is available for download on the BepiPred web page (http://www.cbs.dtu.dk/services/BepiPred/download.php).
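The filtering rules above can be illustrated with a minimal Python sketch. It is not the original pipeline: the input format (one `(peptide, is_positive)` pair per experiment) and the names `split_peptides`, `MIN_LEN` and `MAX_LEN` are assumptions made for illustration, and the actual IEDB export has a different layout.

```python
from collections import Counter

# Length bounds taken from the text; the constant names are illustrative.
MIN_LEN, MAX_LEN = 5, 25


def split_peptides(assays, min_experiments=2):
    """Split peptides into positive and negative sets.

    `assays` is assumed to be an iterable of (peptide, is_positive) pairs,
    with one pair per separate experiment.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for peptide, is_positive in assays:
        # Discard peptides outside the 5-25 residue range.
        if not (MIN_LEN <= len(peptide) <= MAX_LEN):
            continue
        if is_positive:
            pos_counts[peptide] += 1
        else:
            neg_counts[peptide] += 1

    # Positive: confirmed positive in at least `min_experiments` experiments.
    positives = {p for p, n in pos_counts.items() if n >= min_experiments}
    # Negative: negative in at least `min_experiments` experiments
    # and never observed as positive.
    negatives = {
        p for p, n in neg_counts.items()
        if n >= min_experiments and pos_counts[p] == 0
    }
    return positives, negatives
```

Under these rules a peptide with a single positive result and several negative results ends up in neither dataset, which is one way to read the requirement that negatives are never observed as positive.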
Evaluation was restricted to the residues within the positive and negative peptides. Accordingly, a single AUC was calculated over the pooled positive and negative residues rather than per antigen sequence.
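A pooled, residue-level AUC of this kind can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code used for the paper: the helper names `peptide_residue_scores` and `pooled_auc` are hypothetical, each peptide is assumed to occur as an exact substring of its parent protein, and the AUC is computed from the rank-sum (Mann-Whitney) statistic with tied scores given average ranks.

```python
def peptide_residue_scores(protein_seq, protein_scores, peptide):
    """Return the per-residue scores covering `peptide` in its parent protein.

    Assumes one score per residue of `protein_seq` and uses the first
    exact match of the peptide.
    """
    start = protein_seq.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein sequence")
    return protein_scores[start:start + len(peptide)]


def pooled_auc(pos_scores, neg_scores):
    """AUC over pooled residues via the rank-sum statistic."""
    scored = sorted(
        [(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores]
    )
    n = len(scored)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j < n and scored[j][0] == scored[i][0]:
            j += 1
        # Ranks i+1 .. j share their average rank.
        avg_rank = (i + 1 + j) / 2
        rank_sum_pos += avg_rank * sum(label for _, label in scored[i:j])
        i = j
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy usage: scores from all positive peptides go into one pool, scores
# from all negative peptides into another, regardless of parent antigen.
protein = "ACDEFGHIKLMN"
scores = [0.9, 0.8, 0.7, 0.9, 0.6, 0.5, 0.1, 0.2, 0.3, 0.1, 0.4, 0.5]
pos_pool = peptide_residue_scores(protein, scores, "ACDEF")
neg_pool = peptide_residue_scores(protein, scores, "HIKLM")
auc = pooled_auc(pos_pool, neg_pool)
```

Because all residues are pooled before ranking, antigens that contribute many peptides carry more weight in the resulting AUC than they would in a per-antigen average.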