For the construction of the Tox21-based toxicological pathway prediction models, an ensemble approach is used that combines RF and support vector machine (SVM) classifiers. The radial basis function (RBF) is used as the kernel function for the SVM algorithm. The immunotoxicity prediction model is based on the Bernoulli naive Bayes algorithm, as explained in the published work (24).
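The ensemble idea can be illustrated with a minimal scikit-learn sketch. The text does not state how the RF and SVM outputs are combined, so soft voting (averaging predicted class probabilities) is an assumption here, as are the hyperparameters and the synthetic 166-bit input, which only stands in for real fingerprint data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC

# Synthetic stand-in for binary fingerprints (hypothetical data, not Tox21).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 166))
# The label depends on the first five bits so there is a signal to learn.
y = (X[:, :5].sum(axis=1) >= 3).astype(int)

# Assumption: the two classifiers are combined by soft voting,
# i.e. by averaging their predicted class probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(kernel="rbf", probability=True, random_state=0)),
    ],
    voting="soft",
)

# Train on the first 150 rows and predict the remaining 50.
ensemble.fit(X[:150], y[:150])
proba = ensemble.predict_proba(X[150:])
pred = ensemble.predict(X[150:])
```

Setting `probability=True` on the SVM is what makes soft voting possible, since `SVC` does not produce class probabilities by default.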
Here, two different fingerprints are used: MACCS molecular fingerprints (166 bits) and Morgan circular fingerprints (2048 bits) (
Additionally, a selective oversampling of the minority class is introduced in the construction of the models. For each of the prediction end-points, the active (positive) and inactive (negative) data are fragmented using the RECAP (27) and ROTBONDS (28) fragmentation methods. The propensity score (PS) (12) is computed for each uniquely occurring fragment in both sets. Only the molecules with the highest propensity scores for fragments conserved in the active class are oversampled and added to model construction. The same ratio of active to inactive compounds is maintained across all cross-validation folds, using the fragment-based similarities between the compounds.
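The selection step can be sketched in plain Python. The exact propensity score formula is given in the cited work (12) and is not reproduced here; this sketch assumes a simple stand-in, namely the class-normalized frequency of a fragment among actives divided by the sum of its frequencies in both classes. The fragment labels, molecule identifiers, and the helper names `propensity_scores` and `select_for_oversampling` are hypothetical, and the fragments are taken as precomputed rather than generated by RECAP or ROTBONDS.

```python
from collections import Counter

def propensity_scores(actives, inactives):
    """Assumed score per fragment: fa / (fa + fi), where fa and fi are the
    fractions of active and inactive molecules containing the fragment."""
    act = Counter(f for frags in actives.values() for f in frags)
    ina = Counter(f for frags in inactives.values() for f in frags)
    scores = {}
    for frag in set(act) | set(ina):
        fa = act[frag] / len(actives)
        fi = ina[frag] / len(inactives)
        scores[frag] = fa / (fa + fi)
    return scores

def select_for_oversampling(actives, scores, top_k):
    """Rank active molecules by their highest-scoring fragment (an assumed
    ranking rule) and return the top_k molecule identifiers."""
    ranked = sorted(
        actives,
        key=lambda mol: max((scores[f] for f in actives[mol]), default=0.0),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical molecules mapped to their sets of unique fragments.
actives = {"a1": {"F1", "F2"}, "a2": {"F2"}, "a3": {"F3"}}
inactives = {"i1": {"F2", "F3"}, "i2": {"F3"}, "i3": {"F4"}, "i4": {"F3"}}

scores = propensity_scores(actives, inactives)
selected = select_for_oversampling(actives, scores, top_k=2)
```

In this toy data, fragment `F1` occurs only in actives and `F4` only in inactives, so they receive the extreme scores under the assumed formula, and the molecules carrying active-enriched fragments are the ones selected for duplication.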
The prediction models are implemented in the Python programming language. Machine learning packages like scikit-learn (