Benchmark Dataset for Antioxidant Peptide Prediction
The benchmark dataset (Supplementary Data S1) used in this work was established by extracting data on antioxidant peptides of length 2–30 amino acids, both derived from different protein sources (e.g., fish [40] and dairy [41]) and synthetic [42], from various published articles and from the BIOPEP-UWM [43] database. Each peptide received a binary label for each of two classes, free radical scavenger (FRS) and chelator. A class was labelled 1 (positive) if the source had measured or indicated that activity, and 0 (negative) otherwise. This extraction yielded 696 antioxidant peptides (685 FRS and 81 chelating, 70 of which have both activities) and 218 experimentally validated non-antioxidant peptides, as seen in Table 1. Furthermore, to reduce homology bias during training, sequences were removed from both the positive and negative peptides so that no pair had more than 90% identity [44]. All sequence identities in this paper were calculated using the Needleman–Wunsch algorithm [45] with the following parameters: 1 for identical residues, 0 for dissimilar residues, −10 for opening and extending gaps, and 0 for end gaps.
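The identity calculation and the 90% filter described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, identity is assumed to be the number of identical aligned positions divided by the length of the longer peptide (the text does not state the denominator), and the greedy keep-first filter is an assumed stand-in for the redundancy-reduction procedure of the cited reference.

```python
def align_identity(a, b, match=1, mismatch=0, gap=-10):
    """Needleman-Wunsch alignment with free end gaps.

    Scoring follows the text: 1 for identical residues, 0 for dissimilar
    residues, -10 for opening or extending a gap, 0 for end gaps.
    ASSUMPTION: identity = identical aligned positions / longer length.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # Each cell holds (score, identical_positions) for the best path so far.
    # Row 0 and column 0 are all zero because leading end gaps are free.
    prev = [(0, 0)] * (m + 1)
    best = (0, 0)
    for i in range(1, n + 1):
        cur = [(0, 0)]
        for j in range(1, m + 1):
            same = a[i - 1] == b[j - 1]
            diag = (prev[j - 1][0] + (match if same else mismatch),
                    prev[j - 1][1] + int(same))
            up = (prev[j][0] + gap, prev[j][1])
            left = (cur[j - 1][0] + gap, cur[j - 1][1])
            # Highest score wins; ties are broken by more identical positions.
            cur.append(max(diag, up, left))
        # Trailing end gap: sequence a may continue after b has ended.
        best = max(best, cur[m])
        prev = cur
    # Trailing end gap: sequence b may continue after a has ended.
    best = max(best, max(prev))
    return best[1] / max(n, m)


def remove_redundant(peptides, max_identity=0.90):
    """Greedy filter (ASSUMED procedure): keep a peptide only if it shares at
    most `max_identity` with every peptide already kept."""
    kept = []
    for pep in peptides:
        if all(align_identity(pep, k) <= max_identity for k in kept):
            kept.append(pep)
    return kept
```

Because opening and extending a gap both cost −10, the gap penalty is linear, so a single dynamic-programming matrix is sufficient. The result of the greedy filter depends on the input order, which is one reason it should be read as a sketch only.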
Table 1. Overview of the benchmark dataset.

             FRS   CheL   FRS/CheL   Non-AO   Random   Total
AOdb         615     11         70      218      500    1414
AOdb < 90%   606     11         70      217      500    1404
FRS, CheL, FRS/CheL and Non-AO are all experimentally validated peptides obtained from various papers. Random consists of peptides derived from the UniProt [46] database, with lengths of 2–30 amino acids. AOdb < 90% is the number of peptides remaining after sequences were removed so that no pair has more than 90% identity. FRS: free radical scavenger; CheL: chelator; FRS/CheL: both FRS and chelator; Non-AO: non-antioxidant.
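The counts in Table 1 can be cross-checked against the totals and class ratios quoted in the text. The short sketch below does this arithmetic. The variable and function names are illustrative, and the choice of denominator for the ratios (all 717 peptides termed non-antioxidant, that is, Non-AO plus Random) is an inference that happens to reproduce the reported values, not something the text states explicitly.

```python
# Counts transcribed from Table 1 (before and after the 90% identity filter).
table = {
    "AOdb":       {"FRS": 615, "CheL": 11, "FRS/CheL": 70, "Non-AO": 218, "Random": 500},
    "AOdb < 90%": {"FRS": 606, "CheL": 11, "FRS/CheL": 70, "Non-AO": 217, "Random": 500},
}


def summarize(row):
    """Derive the totals and ratios quoted in the text from one table row."""
    antioxidant = row["FRS"] + row["CheL"] + row["FRS/CheL"]
    negatives = row["Non-AO"] + row["Random"]
    frs = row["FRS"] + row["FRS/CheL"]         # every peptide with FRS activity
    chelators = row["CheL"] + row["FRS/CheL"]  # every peptide with chelating activity
    return {
        "total": antioxidant + negatives,
        "antioxidant": antioxidant,
        "negatives": negatives,
        "frs": frs,
        "chelators": chelators,
        # ASSUMPTION: ratios are taken against all non-antioxidant peptides.
        "frs_ratio": round(frs / negatives, 2),
        "chel_ratio": round(chelators / negatives, 2),
    }


summary = {name: summarize(row) for name, row in table.items()}
```

For the filtered row this gives 687 antioxidant and 717 non-antioxidant peptides (1404 in total), with ratios of 676/717 ≈ 0.94 for FRS and 81/717 ≈ 0.11 for chelators.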
Additionally, 500 random peptides with lengths of 2–30 amino acids and the same length distribution as the positive dataset were extracted from random proteins derived from the UniProt [46] database. It was ensured that none of these peptides was identical to any peptide in the positive dataset. This amounted to a final, balanced benchmark dataset of 1404 peptides, consisting of 687 FRS and chelators and 717 peptides termed non-antioxidant, with positive-to-negative ratios of 0.94 and 0.11 for FRS and chelators, respectively. To improve generalization and obtain a robust estimate of the model's accuracy on unobserved cases, a fivefold nested cross-validation approach was used [29]. The five folds were created so that all folds contained similar numbers of positives and negatives, and of FRS and chelators. Furthermore, an upper threshold for peptide identity was enforced for any two peptides in different folds. Four partitions were made, with thresholds of 60, 70, 80 or 90% identity between folds, respectively.
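One way to enforce an identity threshold between folds is to group peptides that exceed the threshold into clusters and then assign whole clusters to folds. The sketch below illustrates that idea under explicit assumptions: the text does not describe how the partitions were built, the function name make_folds is hypothetical, the pairwise identity function is supplied by the caller, and a single binary label per peptide stands in for the separate FRS and chelator labels.

```python
from collections import defaultdict


def make_folds(peptides, labels, identity, threshold=0.90, n_folds=5):
    """Return `n_folds` lists of peptide indices such that no two peptides in
    different folds have identity above `threshold`.

    ASSUMED approach: single-linkage clustering followed by greedy balancing.
    `identity(a, b)` must return a fraction between 0 and 1.
    """
    parent = list(range(len(peptides)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair whose identity exceeds the threshold.
    for i in range(len(peptides)):
        for j in range(i + 1, len(peptides)):
            if identity(peptides[i], peptides[j]) > threshold:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(len(peptides)):
        clusters[find(i)].append(i)

    folds = [[] for _ in range(n_folds)]
    positives = [0] * n_folds
    negatives = [0] * n_folds
    # Place large clusters first. A cluster with positives goes to the fold
    # with the fewest positives; otherwise to the fold with the fewest negatives.
    for members in sorted(clusters.values(), key=len, reverse=True):
        n_pos = sum(labels[i] for i in members)
        counts = positives if n_pos else negatives
        target = min(range(n_folds), key=lambda f: (counts[f], len(folds[f])))
        folds[target].extend(members)
        positives[target] += n_pos
        negatives[target] += len(members) - n_pos
    return folds
```

Calling the function once per threshold (0.60, 0.70, 0.80 and 0.90) would give four partitions analogous to those described above. Lower thresholds merge more peptides into the same cluster, which makes the folds harder to balance.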
Olsen T.H., Yesiltas B., Marin F.I., Pertseva M., García-Moreno P.J., Gregersen S., Overgaard M.T., Jacobsen C., Lund O., Hansen E.B., & Marcatili P. (2020). AnOxPePred: using deep learning for the prediction of antioxidative properties of peptides. Scientific Reports, 10, 21471.
independent variables
Source of antioxidant peptides (fish, dairy, synthetic)
Threshold for peptide identity between folds (60%, 70%, 80%, 90%)
dependent variables
Antioxidant activity (free radical scavenging and chelating)
Classification of peptides (free radical scavenger, chelator, both, non-antioxidant)
control variables
Sequence identity threshold (no pair with more than 90% identity)
Length distribution of random peptides (same as positive dataset)
Ensuring random peptides are not identical to any peptide in the positive dataset
positive controls
Experimentally-validated antioxidant peptides (free radical scavengers, chelators, and those with both activities)
negative controls
Experimentally-validated non-antioxidant peptides