The data sets were obtained from the study by Thakur et al.[23]. A total of 1,056 peptides had been validated experimentally, comprising 604 highly effective AVPs and 452 non-effective peptides; a further 604 peptides without experimental validation, taken from the study by Lata et al.[32], were treated as non-effective. All peptides in the data sets were unique.
Two training sample sets and two independent test sets were established from the data described above, following the nomenclature used by Thakur et al.[23]. We performed 10-fold cross-validation, with the training and validation sets drawn from one of the two sample sets, T544P+407N or T544P+544N*. T544P+407N consisted of 544 highly effective AVPs and 407 experimentally validated non-effective peptides; T544P+544N* contained the same 544 positive AVPs but 544 non-experimental negative peptides instead. The independent test sets V60P+45N and V60P+60N* were used for benchmarking. V60P+45N consisted of 60 highly effective AVPs and 45 non-effective peptides; V60P+60N* contained 60 positive peptides and 60 non-experimental negative peptides.
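As an illustration of the partitioning described above, the following minimal Python sketch records the four set sizes and builds a 10-fold split of T544P+407N. It is not the procedure used in the original study: the text does not say how peptides were assigned to folds, so the stratified round-robin assignment, the random seed, and the helper names `stratified_folds` and `cv_splits` are all assumptions made for the example. Integer indices stand in for the peptide sequences.

```python
import random

# Set sizes as stated in the text: (positives, negatives).
DATASETS = {
    "T544P+407N": (544, 407),   # training, experimental negatives
    "T544P+544N*": (544, 544),  # training, non-experimental negatives
    "V60P+45N": (60, 45),       # independent test, experimental negatives
    "V60P+60N*": (60, 60),      # independent test, non-experimental negatives
}


def stratified_folds(n_pos, n_neg, k=10, seed=0):
    """Assign sample indices to k folds with a roughly equal class ratio.

    Indices 0..n_pos-1 stand for positives, the remainder for negatives.
    Stratification and the seed are assumptions, not taken from the study.
    """
    rng = random.Random(seed)
    pos = list(range(n_pos))
    neg = list(range(n_pos, n_pos + n_neg))
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [[] for _ in range(k)]
    # Deal each class out round-robin so every fold gets its share.
    for i, idx in enumerate(pos):
        folds[i % k].append(idx)
    for i, idx in enumerate(neg):
        folds[i % k].append(idx)
    return folds


def cv_splits(folds):
    """Yield (train_indices, validation_indices), one pair per fold."""
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


folds = stratified_folds(*DATASETS["T544P+407N"])
fold_sizes = [len(fold) for fold in folds]
```

Each pass of `cv_splits(folds)` holds out one fold for validation and trains on the other nine; the independent test sets V60P+45N and V60P+60N* stay outside this loop and are used only for the final benchmark.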