Interactive Tutorials for Computer-Aided Drug Design

TeachOpenCADD currently consists of ten talktorials covering central topics in computer-aided drug design (CADD); see Fig. 1. Talktorials are interactive Jupyter notebooks that can be used as tutorials and also for oral presentations, e.g. in student CADD seminars (talk + tutorial = talktorial). Each talktorial starts with a topic motivation and learning goals, continues with a main part composed of theoretical background and practical code, and ends with a short discussion and quiz; see Fig. 2.
Open data resources employed are the ChEMBL [14] and PDB [15] databases for compound and protein structure data acquisition, respectively. Open source libraries utilized are RDKit [16] (cheminformatics), the ChEMBL webresource client [17] and PyPDB [18] (ChEMBL and PDB application programming interface access), BioPandas [19] (loading and manipulating molecular structures), and PyMOL [20] (structural data visualization). Additionally, basic Python computing libraries employed include numpy [21, 22] and pandas [23, 24] (high-performance data structures and analysis), scikit-learn [25] (machine learning), as well as matplotlib [26] and seaborn [27] (plotting). Furthermore, the user is instructed how to work with conda [28], a widely used package, dependency and environment management tool. A conda yml file is provided to ensure an easy and quick setup of an environment containing all required packages.
The talktorial topics include how to acquire data from ChEMBL (T1), filter compounds for drug-likeness (T2), and identify unwanted substructures (T3). Furthermore, measures for compound similarity are introduced and applied to virtual screening (VS) of the kinase inhibitor gefitinib (T4) as well as to compound clustering (T5), including the use of maximum common substructures (T6). Machine learning approaches are employed to build models for predicting active compounds (T7). Protein-ligand complexes are fetched from the PDB (T8) and used to generate ligand-based ensemble pharmacophores (T9). Lastly, a geometry-based binding site comparison of proteins that bind the kinase inhibitor imatinib is performed to analyse potential off-targets (T10). In summary, the presented talktorials build a pipeline whose starting points are (i) a query protein to study associated compound data (T1 and T8) and (ii) a query ligand to investigate associated on- and off-targets (T10); see Fig. 1. The talktorials can be studied independently of each other or as a pipeline.
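To make two of these steps concrete, the following is a minimal, standard-library-only sketch of a drug-likeness filter followed by a similarity ranking. It is not taken from the talktorials. It assumes Lipinski's rule of five as the drug-likeness criterion (tolerating at most one violation) and the Tanimoto coefficient on binary fingerprints as the similarity measure; the `Compound` record, the function names, and all descriptor values and fingerprint bits are hypothetical toy data. In practice the descriptors and fingerprints would be computed with a cheminformatics library such as RDKit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical toy record; descriptor values are assumed to be precomputed."""
    name: str
    mol_weight: float       # molecular weight in Da
    h_acceptors: int        # number of hydrogen bond acceptors
    h_donors: int           # number of hydrogen bond donors
    logp: float             # octanol-water partition coefficient
    fingerprint: frozenset  # indices of the "on" bits of a binary fingerprint


def passes_rule_of_five(c: Compound, max_violations: int = 1) -> bool:
    """Assumed drug-likeness criterion: Lipinski's rule of five."""
    violations = sum([
        c.mol_weight > 500,
        c.h_acceptors > 10,
        c.h_donors > 5,
        c.logp > 5,
    ])
    return violations <= max_violations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(query: Compound, library: list) -> list:
    """Keep drug-like compounds, then rank them by similarity to the query."""
    hits = [c for c in library if passes_rule_of_five(c)]
    scored = [(c.name, tanimoto(query.fingerprint, c.fingerprint)) for c in hits]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data only; none of these values describe real molecules.
query = Compound("query", 450.0, 7, 1, 4.0, frozenset({1, 2, 3, 4}))
library = [
    Compound("cpd_A", 430.0, 6, 2, 3.9, frozenset({1, 2, 3, 5})),
    Compound("cpd_B", 720.0, 12, 6, 6.1, frozenset({1, 2, 3, 4})),
    Compound("cpd_C", 310.0, 4, 1, 2.2, frozenset({4, 8})),
]
ranked = screen(query, library)
print(ranked)  # [('cpd_A', 0.6), ('cpd_C', 0.2)]
```

Here `cpd_B` has an identical fingerprint to the query but is removed before ranking because it violates all four rule-of-five conditions, which illustrates why the filtering step precedes the similarity search in a pipeline of this kind.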
As an example, the talktorial pipeline is used to identify novel EGFR kinase inhibitors. EGFR kinase is a transmembrane protein, which activates several signaling cascades to convert extracellular signals into cellular responses. Dysfunctional signaling of EGFR is associated with diseases such as cancer, making it a frequent target in drug development projects (the reader is referred to a review by Chen et al. [29] for more information on EGFR). Furthermore, the pipeline can easily be adapted to other examples by simply exchanging the query protein (T1 and T8: protein UniProt ID) and query ligand (T10: ligand names in the PDB).
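One way to picture this adaptation is to collect the two exchangeable inputs in a single small object, as in the sketch below. The `PipelineQuery` class and the `starting_points` helper are hypothetical and not part of TeachOpenCADD. The example values are assumptions as well: `P00533` is the UniProt accession of human EGFR, and `STI` is assumed here to be the PDB ligand name for imatinib.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PipelineQuery:
    """Hypothetical container for the two inputs that are exchanged per project."""
    uniprot_id: str                # query protein, used by T1 and T8
    ligand_names: Tuple[str, ...]  # query ligand names in the PDB, used by T10


def starting_points(query: PipelineQuery) -> Dict[str, object]:
    """Map each starting talktorial to the query input it consumes."""
    return {
        "T1": query.uniprot_id,
        "T8": query.uniprot_id,
        "T10": query.ligand_names,
    }


# Assumed example values: human EGFR and the PDB ligand name for imatinib.
egfr_query = PipelineQuery(uniprot_id="P00533", ligand_names=("STI",))
for talktorial, value in starting_points(egfr_query).items():
    print(talktorial, value)
```

Switching to another target would then mean constructing a different `PipelineQuery`, leaving the downstream steps untouched.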

Sydow D., Morger A., Driller M., & Volkamer A. (2019). TeachOpenCADD: a teaching platform for computer-aided drug design using open source packages and data. Journal of Cheminformatics, 11, 29.

Publication 2019

Binding site Cancer Compounds drug Egfr Gefitinib Imatinib Inhibitors Kinase Kinase proteins inhibitor Ligand Molecular structures Motivation Protein Python Student Talk Target drug Transmembrane protein

Other organizations : Charité - Universitätsmedizin Berlin

Protocol cited in 4 other protocols

Variable analysis

independent variables

Query protein to study associated compound data (T1 and T8)
Query ligand to investigate associated on- and off-targets (T10)

dependent variables

Compounds filtered for drug-likeness (T2)
Unwanted substructures identified (T3)
Compound similarity measures applied for virtual screening (T4) and compound clustering (T5)
Maximum common substructures used (T6)
Machine learning models built to predict active compounds (T7)
Ligand-based ensemble pharmacophores generated (T9)
Geometry-based binding site comparison of proteins that bind the kinase inhibitor imatinib, used to analyse potential off-targets (T10)

control variables

Open data resources employed: ChEMBL database for compound data, PDB database for protein structure data
Open source libraries utilized: RDKit, ChEMBL webresource client, PyPDB, BioPandas, PyMOL, numpy, pandas, scikit-learn, matplotlib, seaborn
Conda used as package, dependency and environment management tool
