The Bioavailability Radar in the first section of the One-panel-per-molecule output complements the two-dimensional image from the JChem webserver and the canonical SMILES calculated by OpenBabel. We use the JpGraph PHP library (version 3.5.0b1, 2016, http://jpgraph.net) to produce the radar plot, which has six axes for six properties important for oral bioavailability. Each property is defined by a SwissADME descriptor, and its range of optimal values is depicted as a pink area. These ranges are inspired by commonly accepted bioavailability and drug-likeness guidelines [23,24]. For saturation, the ratio of sp3-hybridized carbons over the total carbon count of the molecule (Fraction Csp3) should be at least 0.25. For size, the molecular weight (MW, calculated by OpenBabel) should be between 150 and 500 g/mol. For polarity, the TPSA [25] should be between 20 and 130 Å². For solubility, log S (calculated with the ESOL model [36]) should not exceed 6. For lipophilicity, XLOGP3 [29] should be in the range from −0.7 to +5.0. For flexibility, the molecule should not have more than 9 rotatable bonds. To be estimated as drug-like, the red line of the compound under study must be fully included in the pink area. Any deviation represents a suboptimal physicochemical property for oral bioavailability.
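The radar test described above amounts to six range checks. The following is a minimal sketch rather than SwissADME's actual code: the dictionary keys and the function name are hypothetical, the XLOGP3 upper bound is set to 5.0 as used on the SwissADME radar, and the solubility criterion is read as −log S ≤ 6 (log S values are normally negative), which is an assumption about how "should not exceed 6" is meant.

```python
from typing import Dict, List, Optional, Tuple

# (lower, upper) bound of the optimal "pink area" per axis; None = unbounded.
# Key names are hypothetical labels chosen for this sketch.
OPTIMAL_RANGES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "fraction_csp3": (0.25, None),    # saturation
    "mw": (150.0, 500.0),             # size, g/mol
    "tpsa": (20.0, 130.0),            # polarity, square angstroms
    "minus_log_s": (None, 6.0),       # solubility; assumption: -log S (ESOL) <= 6
    "xlogp3": (-0.7, 5.0),            # lipophilicity, SwissADME radar range
    "rotatable_bonds": (None, 9.0),   # flexibility
}


def radar_deviations(props: Dict[str, float]) -> List[str]:
    """Return the axes on which a molecule falls outside the optimal range."""
    deviations = []
    for name, (low, high) in OPTIMAL_RANGES.items():
        value = props[name]
        too_low = low is not None and value < low
        too_high = high is not None and value > high
        if too_low or too_high:
            deviations.append(name)
    return deviations


# Two made-up property profiles, for illustration only.
inside = {"fraction_csp3": 0.30, "mw": 320.4, "tpsa": 75.0,
          "minus_log_s": 3.2, "xlogp3": 2.1, "rotatable_bonds": 5}
outside = dict(inside, mw=612.0, rotatable_bonds=12)

print(radar_deviations(inside))   # empty list: the red line stays in the pink area
print(radar_deviations(outside))  # lists the axes that deviate
```

An empty list corresponds to a red line fully contained in the pink area. Each listed axis is one suboptimal property.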
All descriptors and molecular parameters of the Physicochemical Properties section are computed through the OpenBabel API (version 2.3.0, 2012, http://openbabel.org) [9]. Notably, the topological polar surface area (TPSA) is strictly based on the fragmental system provided by Ertl et al. [25], including polar sulfur and phosphorus atoms.
Multiple freely available computational methods to predict n-octanol/water partition coefficient (log Po/w) values are made available in the Lipophilicity section. iLOGP (for implicit log P) is an in-house physics-based method relying on the Gibbs free energy of solvation calculated by GB/SA in water and n-octanol. Generalized-Born (GB) parameters are computed through the GBMV2 method [68], and the solvent-accessible surface area (SA) is the analytical approximation generated by CHARMM (version c36b1, 2011, https://www.charmm.org) [69]. The iLOGP implemented in SwissADME corresponds to Model9 of the seminal publication [16], which was trained on 11,993 molecules (r = 0.72, MAE = 0.89, and RMSE = 1.14 against experimental log P). Five-fold cross-validation ensured robustness (q²_CV = 0.52, MAE_CV = 0.89, and RMSE_CV = 1.14), and external test benchmarks showed excellent predictive power and an extended applicability domain compared to well-established methods. XLOGP3 values are obtained through the command-line Linux program (version 3.2.2, courtesy of CCBG, Shanghai Institute of Organic Chemistry), including the knowledge-based corrections [29]. WLOGP is our own implementation of the atomistic method developed by Wildman and Crippen [30]. MLOGP values are computed through an in-house implementation of Moriguchi’s topological method [31,32]. SILICOS-IT is the log Po/w estimation returned by executing the FILTER-IT program (version 1.0.2, 2013, http://silicos-it.be.s3-website-eu-west-1.amazonaws.com/software/filter-it/1.0.2/filter-it.html). Finally, SwissADME gives a consensus log Po/w value, which is the arithmetic mean of the five predicted values mentioned above.
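The consensus value is a plain arithmetic mean over the five predictors, as the short sketch below illustrates. The function name and the input values are hypothetical, and the per-method predictions are assumed to have been computed elsewhere.

```python
from statistics import mean
from typing import Dict

# The five predictors named in the text.
METHODS = ("iLOGP", "XLOGP3", "WLOGP", "MLOGP", "SILICOS-IT")


def consensus_log_p(predictions: Dict[str, float]) -> float:
    """Arithmetic mean of the five log Po/w predictions."""
    missing = [m for m in METHODS if m not in predictions]
    if missing:
        raise ValueError(f"missing predictions: {missing}")
    return mean(predictions[m] for m in METHODS)


# Made-up predictions for a single molecule, for illustration only.
example = {"iLOGP": 2.0, "XLOGP3": 3.0, "WLOGP": 2.5,
           "MLOGP": 1.5, "SILICOS-IT": 3.5}
print(consensus_log_p(example))
```

Requiring all five values keeps the consensus comparable between molecules. How SwissADME handles a failed predictor is not stated in the text.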
Similarly to lipophilicity, the Water Solubility section includes multiple predictive methods, so that the user can either select the most accurate model for a given chemical series or rely on an averaged consensus value. The ESOL model [36] is a QSPR model establishing the linear relationship between log S and four molecular parameters, i.e. MW, the number of rotatable bonds, the fraction of aromatic heavy atoms and Daylight’s CLOGP. Because the latter lipophilicity descriptor is not freely available, the implementation of ESOL in SwissADME replaces CLOGP by XLOGP3 as a parameter in the linear equation to predict log S. XLOGP3 is known to perform well on external datasets and to return predictions similar to those of CLOGP [28]. The other three parameters were computed with OpenBabel. Likewise, Ali et al. [37] linked log S with log Po/w and TPSA. The model implemented in SwissADME corresponds to model 3 of the original publication, with XLOGP3 as lipophilicity descriptor. The third solubility method available in SwissADME is the log S estimated by the FILTER-IT program (version 1.0.2, 2013, http://silicos-it.be.s3-website-eu-west-1.amazonaws.com/software/filter-it/1.0.2/filter-it.html). This prediction is based on a system of 16 fragmental contributions modulated by the square root of MW. All three models predict log S values, which are also translated within SwissADME into solubility in mol/l and mg/ml. Finally, a qualitative estimation of the solubility class is given according to the following log S scale: insoluble < −10 …
The Pharmacokinetics section proposes one linear method for skin permeation, which relies on the simple QSPR model by Potts and Guy [39] linking the decimal logarithm of the skin permeability coefficient (log Kp in cm/s) with MW and log Po/w. The model implemented in SwissADME uses XLOGP3 as lipophilicity descriptor. Most of the other models in this section are machine-learning binary classifiers for important ADME behaviours.
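Returning to the solubility units mentioned in the Water Solubility section, translating a predicted log S into mol/l and mg/ml is a short calculation. The sketch below assumes that log S is the decimal logarithm of the molar solubility. The function name and the example values are hypothetical.

```python
from typing import Tuple


def solubility_from_log_s(log_s: float, mw: float) -> Tuple[float, float]:
    """Convert log S (log10 of solubility in mol/l) into mol/l and mg/ml.

    mw is the molecular weight in g/mol.
    """
    mol_per_l = 10.0 ** log_s
    # mol/l * g/mol = g/l, which is numerically identical to mg/ml.
    mg_per_ml = mol_per_l * mw
    return mol_per_l, mg_per_ml


# Made-up example: log S = -3.0 for a molecule of 250 g/mol.
mol_per_l, mg_per_ml = solubility_from_log_s(-3.0, 250.0)
print(f"{mol_per_l:.2e} mol/l, {mg_per_ml:.2e} mg/ml")
```

Because g/l and mg/ml are the same quantity, no further unit factor is needed after multiplying by MW.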
Passive gastro-intestinal absorption (HIA) and blood-brain barrier (BBB) permeation are predicted with the BOILED-Egg model, which defines favourable and unfavourable zones in the log Po/w versus PSA physicochemical space for passive diffusion through both physiological barriers [17]. The classification showed a 10-fold cross-validation accuracy of 92% for HIA and 88% for BBB (refer to Graphical Output).
Six other classification models are part of the Pharmacokinetics section to predict the propensity of the molecule under investigation to be a substrate or inhibitor of important pharmacokinetics-related proteins, for which large, diverse and balanced datasets were retrieved and meticulously cleansed. For P-glycoprotein 1 (P-gp), the training set consists of 521 substrates and 512 non-substrates extracted from the Metrabase database [70] (http://www-metrabase.ch.cam.ac.uk, accessed January 2016), whereas the test set was obtained from ref. 71. To ensure truly external validation, molecules overlapping with the training set were removed from the test set, which finally includes 215 substrates and 200 non-substrates. For the major CYP isoforms, all datasets were those of Veith et al. [50], downloaded from the PubChem database [72] (http://pubchem.ncbi.nlm.nih.gov, accessed February 2016). For unbalanced datasets (all except CYP1A2 and CYP2C19), the more populated class was reduced while preserving sufficient chemical diversity by clustering with the Ward method and a reciprocal nearest neighbour (RNN) algorithm [73]. The molecules of the large class (described by circular fingerprints) were grouped into clusters with the JKlustor program (version 14.9.29, 2014, http://www.chemaxon.com), and only the centre of each cluster (i.e. the molecule that has the smallest sum of dissimilarities to the other molecules in the cluster) was included in the balanced training or test set. As a result, the training sets involved respectively 4301 CYP1A2 inhibitors and 4844 CYP1A2 non-inhibitors; 4284 CYP2C19 inhibitors and 4988 CYP2C19 non-inhibitors; 2940 CYP2C9 inhibitors and 3000 CYP2C9 non-inhibitors; 1814 CYP2D6 inhibitors and 1850 CYP2D6 non-inhibitors; and 3758 CYP3A4 inhibitors and 3760 CYP3A4 non-inhibitors.
The test sets involved respectively 1412 CYP1A2 inhibitors and 1588 CYP1A2 non-inhibitors; 1386 CYP2C19 inhibitors and 1614 CYP2C19 non-inhibitors; 1020 CYP2C9 inhibitors and 1055 CYP2C9 non-inhibitors; 528 CYP2D6 inhibitors and 540 CYP2D6 non-inhibitors; and 1289 CYP3A4 inhibitors and 1290 CYP3A4 non-inhibitors.
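The cluster centre used for balancing, i.e. the molecule with the smallest sum of dissimilarities to the other members of its cluster, can be picked as sketched below. This is not the JKlustor implementation. Fingerprints are represented as sets of "on" bit indices, and the Tanimoto dissimilarity is an assumption, since the text does not name the dissimilarity measure. All molecule names are hypothetical.

```python
from typing import Dict, FrozenSet


def tanimoto_dissimilarity(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """1 - Tanimoto similarity between two sets of fingerprint bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def cluster_centre(cluster: Dict[str, FrozenSet[int]]) -> str:
    """Name of the molecule with the smallest summed dissimilarity to the rest."""
    def total_dissimilarity(name: str) -> float:
        return sum(
            tanimoto_dissimilarity(cluster[name], fingerprint)
            for other, fingerprint in cluster.items()
            if other != name
        )
    return min(cluster, key=total_dissimilarity)


# Toy cluster of three molecules with made-up fingerprint bits.
toy_cluster = {
    "mol_a": frozenset({1, 2, 3}),
    "mol_b": frozenset({1, 2, 3, 4}),
    "mol_c": frozenset({1, 2, 3, 4, 5}),
}
print(cluster_centre(toy_cluster))  # mol_b sits "between" the other two
```

Keeping one representative per cluster shrinks the larger class while still sampling every region of chemical space that the clusters cover.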
SwissADME’s backend calculations were run to generate 50 molecular and physicochemical descriptors per molecule (described in the Supplementary Table S1). For a given model, a descriptor was rejected if fewer than 20% of the molecules in the training set had a non-zero value or if its coefficient of variation was less than 3%. When the correlation between two remaining descriptors was higher than 0.9, a selection was made based on the F-score. The selected descriptors for each model are shown in Supplementary Tables S2–S7. These tables also include the minimum and maximum values of each descriptor among all molecules used in training, which makes it possible to appreciate the breadth of the physicochemical space covered and the applicability domain of the SVM models. The predictive capability of each model can be further appraised in Supplementary Table S8, where external accuracy is split into sensitivity and specificity to ensure that positive and negative molecules are predicted with the same level of robustness. The final training and test sets with the selected descriptors were normalized before the respective model was built. First, the libSVM support vector machine Python library (version 3.20, 2015, https://www.csie.ntu.edu.tw/~cjlin/libsvm/) [74] was used for multi-step grid-based optimization of the best coefficients for the above-selected descriptors as well as of the soft-margin permissivity (C) and the hyper-parameter (γ) of the Gaussian RBF kernel function. The 10-fold cross-validated accuracy (ACC_CV) of each model was thus maximized and AUC_CV was calculated. In a second step, the resulting models were applied to the external test sets (normalized according to the training set) in order to evaluate predictive power in terms of external accuracy (ACC_ext) and AUC_ext. All final SVM models were stored in separate files, which are read through the libSVM API upon SwissADME job submission.
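The descriptor pre-selection described at the start of this paragraph can be sketched as a two-stage filter. This is an illustration under stated assumptions, not the authors' code. The coefficient of variation is taken as population standard deviation over absolute mean, the correlation is Pearson's, F-scores are supplied precomputed, and correlated pairs are resolved by keeping the descriptor with the higher F-score. Descriptor names and values are made up.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def keep_descriptor(values: Sequence[float],
                    min_nonzero_fraction: float = 0.20,
                    min_cv: float = 0.03) -> bool:
    """Stage 1: reject sparse or nearly constant descriptors."""
    nonzero_fraction = sum(1 for v in values if v != 0) / len(values)
    if nonzero_fraction < min_nonzero_fraction:
        return False
    mu = mean(values)
    if mu == 0:
        return True  # assumption: CV is unbounded here, so the descriptor varies
    return pstdev(values) / abs(mu) >= min_cv


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_descriptors(table: Dict[str, Sequence[float]],
                       f_scores: Dict[str, float],
                       max_corr: float = 0.9) -> List[str]:
    """Stage 2: among correlated survivors, keep the higher F-score."""
    survivors = [name for name, values in table.items() if keep_descriptor(values)]
    selected: List[str] = []
    # Visiting in decreasing F-score means the better descriptor of any
    # highly correlated pair is seen first and therefore retained.
    for name in sorted(survivors, key=lambda n: f_scores[n], reverse=True):
        if all(abs(pearson(table[name], table[s])) <= max_corr for s in selected):
            selected.append(name)
    return selected


# Made-up training-set columns for six molecules.
table = {
    "mw": [180, 250, 320, 410, 500, 560],
    "heavy_atoms": [13, 18, 23, 29, 36, 40],      # nearly collinear with mw
    "tpsa": [20, 95, 40, 120, 60, 75],
    "n_sulfur": [0, 0, 1, 0, 0, 0],               # non-zero for fewer than 20%
    "low_var": [100, 101, 100, 101, 100, 101],    # CV below 3%
}
f_scores = {"mw": 0.8, "heavy_atoms": 0.5, "tpsa": 0.6,
            "n_sulfur": 0.9, "low_var": 0.1}
print(select_descriptors(table, f_scores))
```

In this toy table the sparse and the near-constant columns fail the first stage, and `heavy_atoms` is dropped in favour of `mw` because the two are almost perfectly correlated and `mw` has the higher F-score.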