Deep Learning-based Protein-Drug Binding Prediction

A deep learning-based method, DFCNN (Dense Fully Connected Neural Network), has been developed for predicting protein-drug binding probability [15]; it is used in this paper for the initial drug screening (Fig 1A). DFCNN takes as input the concatenation of the molecular vectors of the protein pocket and the ligand. These vectors are generated by Mol2vec [29], which is inspired by the word2vec model from natural language processing. The DFCNN model was trained on a dataset extracted from the PDBbind database [30]. Positive samples in the dataset were taken from protein-ligand pairs in experimentally determined structures, and negative samples were generated by cross-combining proteins and ligands from the PDBbind database. The details of the method are described in our previous paper [15], where DFCNN achieved an AUC of around 0.9 on the independent test set. Because it does not rely on the conformation of the protein-drug complex, the model predicts protein-ligand binding probability (range 0 to 1) about 100,000 times faster than AutoDock Vina.
We screened a large-scale chemical compound dataset (about 10 million compounds) against 8 representative protein targets taken from the DUD.E diverse data set to examine the efficiency and effectiveness of the DFCNN method. For each target, the corresponding dataset contains that target's active compounds from DUD.E (between 40 and 536, depending on the target) plus 10,402,895 drug-like compounds from the ZINC database. Effectiveness is measured by the prediction-random ratio, Ratio_0.9 = TPR_0.9/Random_0.9. TPR_0.9 is the true positive rate at a score cutoff of 0.9, N_0.9/Active_num, where N_0.9 is the number of active compounds with a DFCNN score larger than 0.9 and Active_num is the number of active compounds. Random_0.9 is the random selection rate, NN/Total_num, where NN is the number of compounds (active or not) with a score above 0.9 and Total_num is the total number of compounds. Because randomly picking NN of the Total_num compounds would be expected to recover a fraction NN/Total_num of the actives, the prediction-random ratio compares the TPR achieved by DFCNN at the 0.9 cutoff with the TPR expected from random selection.
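The prediction-random ratio is an enrichment factor at a fixed score cutoff. A minimal Python sketch, assuming DFCNN scores and active/inactive labels are already available as parallel lists; the function name `prediction_random_ratio` and the toy scores are hypothetical and are not results from the paper:

```python
from typing import Sequence


def prediction_random_ratio(
    scores: Sequence[float], is_active: Sequence[bool], cutoff: float = 0.9
) -> float:
    """Ratio_0.9 = TPR_0.9 / Random_0.9 for one target's screening library."""
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    total_num = len(scores)
    active_num = sum(1 for a in is_active if a)
    # N_0.9: active compounds whose score is larger than the cutoff.
    n_cut = sum(1 for s, a in zip(scores, is_active) if a and s > cutoff)
    # NN: all compounds, active or not, whose score is above the cutoff.
    nn = sum(1 for s in scores if s > cutoff)
    if active_num == 0 or nn == 0:
        raise ValueError("need at least one active and one compound above the cutoff")
    tpr = n_cut / active_num          # TPR_0.9
    random_rate = nn / total_num      # Random_0.9
    return tpr / random_rate


if __name__ == "__main__":
    # Hypothetical toy library: 10 compounds, 2 of them active.
    toy_scores = [0.95, 0.50, 0.92, 0.10, 0.20, 0.30, 0.40, 0.05, 0.60, 0.70]
    toy_active = [True, True] + [False] * 8
    print(prediction_random_ratio(toy_scores, toy_active))
```

In the toy library, one of the two actives and one other compound score above 0.9, so TPR_0.9 = 1/2, Random_0.9 = 2/10 and Ratio_0.9 = 2.5. A ratio of 1 would mean the scoring does no better than random selection.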


Zhang H., Yang Y., Li J., Wang M., Saravanan K.M., Wei J., Tze-Yang Ng J., Tofazzal Hossain M., Liu M., Zhang H., Ren X., Pan Y., Peng Y., Shi Y., Wan X., Liu Y., & Wei Y. (2020). A novel virtual screening procedure identifies Pralatrexate as inhibitor of SARS-CoV-2 RdRp and it reduces viral replication in vitro. PLoS Computational Biology, 16(12), e1008489.

Publication 2020

Binding protein Drug Ligand Protein Protein targets Vector Zinc compounds

Corresponding Organization: Shenzhen University

Variable analysis

independent variables

DFCNN model

dependent variables

Protein-drug binding probability
Prediction-random ratio (Ratio_0.9)
Number of active compounds with DFCNN score larger than 0.9 (N_0.9)
Total number of compounds with score above 0.9 (NN)
Random selection rate (Random_0.9)

control variables

Molecular vector of protein pocket and ligand generated by Mol2vec
Dataset extracted from PDBbind database
Negative data samples generated by cross-combination of proteins and ligands from PDBbind database
Positive data samples taken from protein-ligand pairs in experimental structure
8 representative protein targets taken from the DUD.E diverse data set
10,402,895 drug-like compounds from ZINC database
Active compounds (between 40 and 536) in the DUD.E dataset