A deep learning-based method, DFCNN (dense fully connected neural network), was developed in our previous work for predicting protein-drug binding probability [15] and is used in this paper for the initial drug screening (Fig 1A). DFCNN takes as input the concatenation of the molecular vectors of the protein pocket and the ligand. These vectors are generated by Mol2vec [29], which is inspired by the word2vec model from natural language processing. The DFCNN model was trained on a dataset extracted from the PDBbind database [30]. Positive samples were protein-ligand pairs taken from experimental structures, and negative samples were generated by cross-combining proteins and ligands from PDBbind. The details of the method are described in our previous paper [15], where DFCNN achieved an AUC of about 0.9 on the independent test set. Because it does not rely on the conformation of the protein-drug complex, the model is about 100,000 times faster than AutoDock Vina at scoring a protein-ligand pair; its output is a binding probability in the range 0 to 1.
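The training-set construction and input representation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published DFCNN pipeline. The function name `build_pairs` and its arguments are hypothetical. Mol2vec vectors for pockets and ligands are assumed to be precomputed. Every cross-combination that is not an observed complex is treated as a negative. The original work may instead subsample negatives; that detail is not given here.

```python
from itertools import product
from typing import Dict, List, Tuple

Vector = List[float]


def build_pairs(complexes: List[Tuple[str, str]],
                pocket_vecs: Dict[str, Vector],
                ligand_vecs: Dict[str, Vector]) -> List[Tuple[Vector, int]]:
    """Hypothetical sketch: return (feature, label) training samples.

    complexes: (pocket_id, ligand_id) pairs observed in experimental
    structures; these are the positives (label 1).
    Assumption: every other pocket/ligand cross-combination is a negative
    (label 0), although some of them may in reality bind.
    """
    positives = set(complexes)
    pockets = sorted({p for p, _ in complexes})
    ligands = sorted({l for _, l in complexes})
    samples = []
    for p, l in product(pockets, ligands):
        label = 1 if (p, l) in positives else 0
        # Input representation: pocket vector followed by ligand vector
        # (plain list concatenation of the precomputed Mol2vec embeddings).
        feature = pocket_vecs[p] + ligand_vecs[l]
        samples.append((feature, label))
    return samples
```

With n complexes, full enumeration yields n positives and up to n² − n negatives, so a practical pipeline would likely rebalance or subsample the negatives.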
To examine the efficiency and effectiveness of DFCNN, we screened a large-scale chemical compound dataset (about 10 million compounds) against 8 representative protein targets taken from the DUD-E diverse subset. For each target, the dataset contains that target's active compounds from DUD-E (between 40 and 536, depending on the target) plus 10,402,895 drug-like compounds from the ZINC database. Effectiveness is measured by the prediction-random ratio at a score cutoff of 0.9, defined as Ratio0.9 = TPR0.9/Random0.9. Here TPR0.9 = N0.9/Active_num, where N0.9 is the number of active compounds with a DFCNN score above 0.9 and Active_num is the total number of active compounds. The random selection rate is Random0.9 = NN/Total_num, where NN is the number of compounds in the whole dataset scoring above 0.9 and Total_num is the total number of compounds screened. Random0.9 equals the TPR expected if the same number of compounds were picked at random. Ratio0.9 therefore measures how many times more actives DFCNN recovers at the 0.9 cutoff than random selection would.
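The prediction-random ratio defined above can be computed per target as in the sketch below. The function name and argument layout are hypothetical. `all_scores` is assumed to hold the DFCNN score of every screened compound for the target (actives plus ZINC compounds), and `active_scores` only those of the known actives.

```python
from typing import Sequence


def prediction_random_ratio(active_scores: Sequence[float],
                            all_scores: Sequence[float],
                            cutoff: float = 0.9) -> float:
    """Hypothetical sketch of Ratio0.9 = TPR0.9 / Random0.9 for one target."""
    active_num = len(active_scores)                       # Active_num
    total_num = len(all_scores)                           # Total_num
    n_cut = sum(1 for s in active_scores if s > cutoff)   # N0.9
    nn = sum(1 for s in all_scores if s > cutoff)         # NN
    if nn == 0:
        raise ValueError("no compound scores above the cutoff; ratio undefined")
    tpr = n_cut / active_num           # TPR0.9
    random_rate = nn / total_num       # Random0.9
    return tpr / random_rate
```

For example, if 2 of 4 actives and 5 of 100 compounds overall score above 0.9, then TPR0.9 = 0.5 and Random0.9 = 0.05, giving a ratio of 10. A ratio of 1 means the cutoff selects actives no better than chance.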