All the newly added prediction models on the ProTox-II platform are based on machine learning algorithms. A Random Forest (RF) algorithm (26) is used to construct the classification and prediction models for hepatotoxicity, cytotoxicity, mutagenicity, and carcinogenicity. The RF-based models are constructed using 500 decision trees and the Gini index as the splitting criterion. An advantage of an RF-based classifier is that averaging over many decorrelated trees makes it less prone to overfitting than a single decision tree.
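As an illustration of the configuration described above, the following minimal sketch builds a Random Forest with 500 trees and the Gini criterion in scikit-learn. The synthetic data, the feature count, and the random seed are assumptions made for this example only; they stand in for the fingerprint matrix and toxicity labels and are not the ProTox-II training data or code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic, imbalanced stand-in for a fingerprint matrix
# (rows = compounds, columns = descriptor bits). Assumption for illustration.
X, y = make_classification(
    n_samples=200, n_features=64, weights=[0.8, 0.2], random_state=0
)

# 500 decision trees with the Gini index as splitting criterion,
# matching the settings stated in the text.
rf = RandomForestClassifier(n_estimators=500, criterion="gini", random_state=0)
rf.fit(X, y)

# Class probabilities for the first five compounds (columns: inactive, active).
proba = rf.predict_proba(X[:5])
```

All other hyperparameters are left at the scikit-learn defaults here, since the text does not specify them.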
For the construction of the Tox21-based toxicological pathway prediction models, an ensemble approach combining RF and Support Vector Machine (SVM) classifiers is used. The radial basis function (RBF) is used as the kernel function for the SVM algorithm. The immunotoxicity prediction model is based on the Bernoulli Naive Bayes algorithm, as explained in the published work (24).
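The sketch below shows one way an RF and an RBF-kernel SVM could be combined, together with a separate Bernoulli Naive Bayes model. The text does not state how the two classifiers are combined, so soft voting (averaging predicted probabilities) via scikit-learn's `VotingClassifier` is an assumption, as are the synthetic binary features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC

# Synthetic data binarised to mimic fingerprint bits (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=64, random_state=0)
X_bits = (X > 0).astype(int)

# RF + RBF-kernel SVM ensemble. Soft voting is an assumed combination rule;
# the source only says that both classifiers are part of the ensemble.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=500, criterion="gini", random_state=0)),
        ("svm", SVC(kernel="rbf", probability=True, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_bits, y)
ensemble_proba = ensemble.predict_proba(X_bits[:5])

# Bernoulli Naive Bayes, the algorithm named for the immunotoxicity model,
# operates directly on binary features.
nb = BernoulliNB()
nb.fit(X_bits, y)
nb_pred = nb.predict(X_bits[:5])
```

`probability=True` is needed so that the SVM exposes `predict_proba`, which soft voting relies on.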
Here, two different fingerprints are used: MACCS molecular fingerprints (166 bits) and Morgan circular fingerprints (2048 bits), both computed with RDKit (http://www.rdkit.org/). These two fingerprints have shown optimal performance for the prediction of chemical activity (11,24).
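Both fingerprint types can be generated with RDKit, as in the minimal sketch below. The example molecule (aspirin) and the Morgan radius of 2 are assumptions; the text specifies only the bit lengths. Note that RDKit returns MACCS keys as a 167-bit vector in which bit 0 is unused, so that bit indices match the 166 key numbers.

```python
from rdkit import Chem
from rdkit.Chem import MACCSkeys, rdFingerprintGenerator

# Example molecule (aspirin), chosen only for illustration.
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")

# MACCS keys: 166 structural keys stored in a 167-bit vector (bit 0 is unused).
maccs = MACCSkeys.GenMACCSKeys(mol)

# Morgan circular fingerprint folded to 2048 bits.
# radius=2 is an assumption; the source does not state the radius.
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
morgan = morgan_gen.GetFingerprint(mol)

# Convert to plain 0/1 lists, e.g. as input rows for scikit-learn.
maccs_bits = [int(b) for b in maccs.ToBitString()]
morgan_bits = [int(b) for b in morgan.ToBitString()]
```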
Additionally, selective oversampling of the minority class is introduced in the construction of the models. For each of the prediction endpoints, the active (positive) and inactive (negative) compounds are fragmented using the RECAP (27) and ROTBONDS (28) fragmentation methods. The propensity score (PS) (12) is computed for each uniquely occurring fragment in both sets. Only those molecules having the highest propensity scores for fragments conserved in the active class are oversampled and added to model construction. The same ratio of active to inactive compounds is maintained across all folds of cross-validation, using the fragment-based similarities between the compounds.
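The selection step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the published procedure: the score used here (the class-size-normalised share of a fragment's occurrences that fall in the active set) is a stand-in for the propensity score of reference 12, whose exact definition is not given in the text. The fragment strings, the mean-based ranking of molecules, and the function names are likewise hypothetical, and fragments are assumed to be precomputed rather than generated by RECAP or ROTBONDS.

```python
from collections import Counter

def fragment_propensity(active_frags, inactive_frags):
    """Assumed stand-in score in [0, 1]; 1.0 means the fragment occurs only in actives."""
    n_act, n_inact = len(active_frags), len(inactive_frags)
    act_counts = Counter(f for frags in active_frags for f in set(frags))
    inact_counts = Counter(f for frags in inactive_frags for f in set(frags))
    scores = {}
    for frag in set(act_counts) | set(inact_counts):
        p_act = act_counts[frag] / n_act        # fraction of actives containing frag
        p_inact = inact_counts[frag] / n_inact  # fraction of inactives containing frag
        scores[frag] = p_act / (p_act + p_inact)
    return scores

def select_for_oversampling(active_frags, scores, top_k):
    """Return indices of the top_k actives ranked by mean fragment score (assumed rule)."""
    def mol_score(frags):
        return sum(scores[f] for f in frags) / len(frags) if frags else 0.0
    ranked = sorted(range(len(active_frags)),
                    key=lambda i: mol_score(active_frags[i]), reverse=True)
    return ranked[:top_k]

# Toy fragment sets per molecule (hypothetical placeholders).
actives = [{"c1ccccc1N", "N=O"}, {"N=O", "CC"}, {"CC", "CO"}]
inactives = [{"CC", "CO"}, {"CO"}, {"CC"}, {"CC", "CCO"}]

scores = fragment_propensity(actives, inactives)
picked = select_for_oversampling(actives, scores, top_k=2)

# Duplicate the selected actives to enlarge the minority class.
oversampled_actives = actives + [actives[i] for i in picked]
```

In this toy data, the molecules carrying fragments seen only among actives rank highest and are the ones duplicated, which mirrors the idea of oversampling compounds whose fragments are conserved in the active class.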
The prediction models are implemented in the Python programming language. Machine learning packages such as scikit-learn (http://scikit-learn.org) and the cheminformatics package RDKit (http://www.rdkit.org/) are used for the model implementation. All data are standardized using KNIME (29). A template script (sample API script: http://tox.charite.de/protox_II/simple_api.py) is provided under the description ‘using the API’ in the FAQ section of the ProTox-II webserver.