1. Molecular structure files: Protein-ligand complex files for the re-docking experiments were obtained from the PDBbind database. To validate the predictive models with less bias, the native ligands were first extracted from the co-crystallized complexes and converted to 2D structures using Open Babel [43]. For the subsequent docking simulations, the 2D structures were converted back to 3D using the 3D structure generator CORINA version 3.4 [44].
2. Molecular docking simulation packages: Native ligands were docked to their corresponding target proteins using eHiTS, GOLD, and AutoDock VINA (Table S7). These docking tools generate numerous binding modes of the test compound in a defined binding site, and the number of binding modes generated varies by tool. eHiTS was set to output 1000 conformations per docking run. Given the computing speed of GOLD, its maximum was set to 300. For AutoDock VINA, the maximum number of binding modes varied, subject to an energy range of 10 kcal/mol.
3. Application of machine learning systems: Binding modes generated by the three docking tools were re-scored by machine learning system A, and only the three top-scoring candidates in each set were retained. Subsequently, machine learning system B assessed the three top-scoring candidates and identified the most predictive one. Both machine learning systems were modeled using the R statistical package. Machine learning system A was built with the Random Forest algorithm as implemented in the “randomForest” package (Breiman and Cutler's random forests for classification and regression). Machine learning system B used multinomial logistic regression from the “nnet” (Feed-forward Neural Networks and Multinomial Log-Linear Models) and “MASS” (Modern Applied Statistics with S. Fourth Edition) packages.
4. Re-docking result: The Pearson correlation coefficient between the predicted docking scores and the experimental binding affinities was calculated using R to determine the predictiveness of the screening approach.
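The retention and evaluation steps above can be illustrated with a minimal sketch. It is written in plain Python rather than the R packages the authors used, and it does not reproduce machine learning systems A or B. The pose layout, the function names `top_k_poses` and `pearson_r`, and the assumption that a higher re-score is better are all hypothetical choices made for illustration.

```python
from math import sqrt
from typing import List, Sequence, Tuple

# Hypothetical layout: (pose identifier, re-score assigned by "system A").
Pose = Tuple[str, float]


def top_k_poses(poses: Sequence[Pose], k: int = 3,
                higher_is_better: bool = True) -> List[Pose]:
    """Keep the k best-scoring poses from one docking tool's output.

    Assumes higher re-scores are better unless told otherwise; the source
    does not state the direction of the score.
    """
    return sorted(poses, key=lambda p: p[1], reverse=higher_is_better)[:k]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up re-scores for four poses from a single docking tool.
    poses = [("pose_1", 0.10), ("pose_2", 0.90), ("pose_3", 0.50), ("pose_4", 0.70)]
    print(top_k_poses(poses))

    # Made-up predicted scores and experimental affinities for four complexes.
    predicted = [5.1, 6.3, 7.0, 8.2]
    experimental = [4.9, 6.0, 7.4, 8.0]
    print(round(pearson_r(predicted, experimental), 3))
```

In R, the correlation step corresponds to the built-in `cor()` function, whose default method is Pearson. The numbers in the usage example are invented and carry no experimental meaning.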