Generating DRD2-Active Molecules via ML

In one of our studies, the objective of the Agent is to generate molecules that are predicted to be active against a biological target. The dopamine type 2 receptor (DRD2) was chosen as the target, and the corresponding bioactivity data were extracted from ExCAPE-DB [33]. This dataset contains 7,218 actives (pIC50 > 5) and 343,204 inactives (pIC50 < 5). A subset of 100,000 inactive compounds was randomly selected.
To decrease the nearest-neighbour similarity between the training and testing structures [34–36], the actives were grouped into clusters based on their molecular similarity. The similarity measure was the Jaccard index [37], also known as the Tanimoto similarity when applied to binary vectors, computed on the RDKit implementation of binary Extended Connectivity Molecular Fingerprints with a diameter of 6 (ECFP6 [38]). The actives were clustered using the Butina clustering algorithm [39] in RDKit with a clustering cutoff of 0.4. In this algorithm, centroid molecules are selected, and every molecule with a similarity higher than 0.4 to a centroid is assigned to that centroid's cluster. The centroids are chosen so as to maximize the number of molecules that are assigned to any cluster.
The clusters were sorted by size and iteratively assigned to the test, validation, and training sets (assigned 4 clusters each iteration) to give a distribution of 1/6, 1/6, and 4/6 of the clusters, respectively. The inactive compounds, of which less than 0.5% were found to belong to any of the clusters formed by the actives, were split randomly into the three sets using the same ratios.
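
The similarity measure and the clustering step described above can be illustrated with a minimal pure-Python sketch. It is not the RDKit code used in the study: fingerprints are represented here as sets of on-bit indices, the function names `tanimoto` and `butina_clusters` and the toy fingerprints are hypothetical, and the 0.4 cutoff is applied as a similarity threshold as the text describes.

```python
from itertools import combinations


def tanimoto(a, b):
    """Jaccard/Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def butina_clusters(fps, sim_cutoff=0.4):
    """Butina-style clustering. Returns clusters as index lists, centroid first."""
    n = len(fps)
    # Neighbour lists: pairs whose similarity exceeds the cutoff.
    neighbours = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if tanimoto(fps[i], fps[j]) > sim_cutoff:
            neighbours[i].add(j)
            neighbours[j].add(i)
    # Candidate centroids, ordered by neighbour count (largest first).
    order = sorted(range(n), key=lambda i: (-len(neighbours[i]), i))
    assigned = set()
    clusters = []
    for centroid in order:
        if centroid in assigned:
            continue
        # The centroid claims all of its neighbours that are still unassigned.
        members = [centroid] + sorted(neighbours[centroid] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


# Toy fingerprints (assumed data, for illustration only).
toy_fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 3, 6}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 13}),
    frozenset({20, 21}),
]
clusters = butina_clusters(toy_fps)
print(clusters)
```

In practice the fingerprints would come from RDKit, whose Butina implementation takes a distance threshold; under the reading used here, a similarity cutoff of 0.4 would correspond to a Tanimoto distance threshold of 0.6, which is an assumption rather than something the text states.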
A support vector machine (SVM) classifier with a Gaussian kernel was built with scikit-learn [40] on the training set as a predictive model for DRD2 activity. The C and gamma values used in the final model were obtained from a grid search for the highest ROC-AUC performance on the validation set.
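
A grid search of this kind might look like the following sketch. The data are synthetic stand-ins generated with `make_classification`, and the grid values are assumptions chosen for illustration; the source does not report the ranges searched. The search is written as an explicit loop because the selection criterion is ROC-AUC on a fixed validation set rather than cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC

# Synthetic stand-in for fingerprint features and activity labels (assumption).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:250], y[200:250]

# Hypothetical grid; the values actually searched are not given in the text.
grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.001, 0.01, 0.1]}

best_auc, best_params = -1.0, None
for params in ParameterGrid(grid):
    # Gaussian (RBF) kernel SVM fitted on the training set only.
    model = SVC(kernel="rbf", **params).fit(X_train, y_train)
    # ROC-AUC is computed from the signed distance to the decision boundary.
    auc = roc_auc_score(y_val, model.decision_function(X_val))
    if auc > best_auc:
        best_auc, best_params = auc, params

# Refit with the selected hyperparameters to obtain the final model.
final_model = SVC(kernel="rbf", **best_params).fit(X_train, y_train)
print(best_params, round(best_auc, 3))
```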

Olivecrona M., Blaschke T., Engkvist O., & Chen H. (2017). Molecular de-novo design through deep reinforcement learning. Journal of Cheminformatics, 9, 48.

Publication 2017

Corresponding Organization: AstraZeneca (Sweden)

Protocol cited in 6 other protocols

Variable analysis

independent variables

Molecular similarity between actives, used for clustering and assigning to training/validation/test sets

dependent variables

Predicted DRD2 activity of generated molecules

control variables

Jaccard (Tanimoto) similarity index based on ECFP6 fingerprints, used for clustering actives
Butina clustering algorithm with 0.4 cutoff, used to form active molecule clusters
Ratios of 1/6, 1/6, and 4/6 for assigning active clusters to test, validation, and training sets respectively
Random split of inactive compounds into test, validation, and training sets using the same ratios as the active clusters
Support Vector Machine (SVM) classifier with Gaussian kernel, built using Scikit-learn
Grid search to optimize C and Gamma hyperparameters for highest ROC-AUC performance on the validation set
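
The set assignment listed above can be sketched in pure Python. This is one reading of the procedure, not the authors' code: it assumes each round hands the next largest cluster to the test set, the following one to the validation set, and the next four to the training set, which yields the 1/6, 1/6, and 4/6 ratios. The function names, the seed, and the toy data are hypothetical.

```python
import random


def assign_clusters(clusters, per_round=(1, 1, 4)):
    """Assign size-sorted clusters to test/validation/training sets in rounds."""
    names = ("test", "validation", "training")
    sets = {name: [] for name in names}
    ordered = sorted(clusters, key=len, reverse=True)
    pos = 0
    while pos < len(ordered):
        for name, count in zip(names, per_round):
            # Slices past the end are empty, so a partial last round is safe.
            for cluster in ordered[pos:pos + count]:
                sets[name].extend(cluster)
            pos += count
    return sets


def split_randomly(items, sixths=(1, 1, 4), seed=0):
    """Randomly split items into test/validation/training using the same ratios."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) * sixths[0] // 6
    n_val = len(shuffled) * sixths[1] // 6
    return {
        "test": shuffled[:n_test],
        "validation": shuffled[n_test:n_test + n_val],
        "training": shuffled[n_test + n_val:],
    }


# Toy data: 12 clusters of sizes 12, 11, ..., 1 and 600 inactive identifiers.
toy_clusters = [[f"a{size}_{k}" for k in range(size)] for size in range(12, 0, -1)]
active_sets = assign_clusters(toy_clusters)
inactive_sets = split_randomly(range(600))
print({name: len(members) for name, members in active_sets.items()})
print({name: len(members) for name, members in inactive_sets.items()})
```

Because whole clusters are assigned, the ratios hold for the number of clusters; the number of molecules per set depends on the cluster sizes.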
