We use Frowns (developed by Brian Kelley), a chemoinformatics toolkit written in Python and C++, to read SMILES or SDF files (for the SDF format, see Molecular Design Limited).
We have implemented an algorithm in Python that makes use of Frowns features to compute properties known to be important for filtering databases and that uses Xtool (38) to compute log P values.
Because salts and counterions are often present in compound collections, we recommend that users first apply the desalt utility, which removes most salts and counterions, prior to FAF-Drugs calculations.
Then, our program computes the following molecular properties:
(i) Molecular weight (part of Lipinski's RO5)
(ii) Hydrogen bond donors and acceptors (part of Lipinski's RO5)
Hydrogen bond acceptors are counted as the sum of N and O atoms, and hydrogen bond donors as the sum of OH and NH groups.
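The counting rule for donors and acceptors can be sketched in a few lines. This is an illustration only, not the FAF-Drugs code: the `(element, attached hydrogens)` tuple representation and both function names are hypothetical, and the sketch assumes that each N-H or O-H bond counts as one donor (so an NH2 group contributes two), which is one common reading of "sum of OH + NH".

```python
from typing import Sequence, Tuple

# Hypothetical minimal representation: (element symbol, number of attached hydrogens).
Atom = Tuple[str, int]


def count_hbond_acceptors(atoms: Sequence[Atom]) -> int:
    """Acceptors: every N or O atom counts once."""
    return sum(1 for element, _ in atoms if element in ("N", "O"))


def count_hbond_donors(atoms: Sequence[Atom]) -> int:
    """Donors: every hydrogen carried by an N or O atom counts once (assumption)."""
    return sum(n_h for element, n_h in atoms if element in ("N", "O"))


# Paracetamol, CC(=O)Nc1ccc(O)cc1, heavy atoms in SMILES order.
paracetamol = [
    ("C", 3), ("C", 0), ("O", 0), ("N", 1), ("C", 0), ("C", 1),
    ("C", 1), ("C", 0), ("O", 1), ("C", 1), ("C", 1),
]

print(count_hbond_acceptors(paracetamol), count_hbond_donors(paracetamol))
```

In a real pipeline the per-atom hydrogen counts would come from the toolkit's parsed molecule rather than being typed in by hand.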
(iii) Number of rigid bonds
(iv) Number of rings
(v) Size of the rings
(vi) Number of rotatable bonds
Defined as any single non-ring bond bound to a non-terminal heavy atom (29). Amide C-N bonds are not counted because of their high rotational energy barrier.
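As a minimal sketch of this definition (not the actual FAF-Drugs implementation), the function below counts rotatable bonds on a heavy-atom graph. The `Bond` record and the function name are hypothetical, ring membership is assumed to be known in advance, and an amide C-N bond is assumed to mean a C-N single bond whose carbon carries a double bond to oxygen.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Bond:
    """Hypothetical bond record between heavy atoms (hydrogens are implicit)."""
    a: int                 # index of the first atom
    b: int                 # index of the second atom
    order: int = 1         # 1 = single, 2 = double, 3 = triple
    in_ring: bool = False  # ring membership, assumed precomputed


def count_rotatable_bonds(elements: Sequence[str], bonds: Sequence[Bond]) -> int:
    # Heavy-atom degree: an atom with a single heavy neighbour is terminal.
    degree = [0] * len(elements)
    for bond in bonds:
        degree[bond.a] += 1
        degree[bond.b] += 1

    def is_carbonyl_carbon(idx: int) -> bool:
        if elements[idx] != "C":
            return False
        for bond in bonds:
            if bond.order == 2 and idx in (bond.a, bond.b):
                other = bond.b if bond.a == idx else bond.a
                if elements[other] == "O":
                    return True
        return False

    def is_amide_cn(bond: Bond) -> bool:
        # Try both orientations: (carbon, nitrogen) and (nitrogen, carbon).
        for c, n in ((bond.a, bond.b), (bond.b, bond.a)):
            if elements[n] == "N" and is_carbonyl_carbon(c):
                return True
        return False

    count = 0
    for bond in bonds:
        if bond.order != 1 or bond.in_ring:
            continue  # only single, non-ring bonds qualify
        if degree[bond.a] < 2 or degree[bond.b] < 2:
            continue  # bond to a terminal heavy atom
        if is_amide_cn(bond):
            continue  # high rotational barrier
        count += 1
    return count


# Paracetamol, CC(=O)Nc1ccc(O)cc1, heavy atoms in SMILES order.
paracetamol_elements = ["C", "C", "O", "N", "C", "C", "C", "C", "O", "C", "C"]
paracetamol_bonds = [
    Bond(0, 1), Bond(1, 2, order=2), Bond(1, 3), Bond(3, 4), Bond(7, 8),
    Bond(4, 5, 2, True), Bond(5, 6, 1, True), Bond(6, 7, 2, True),
    Bond(7, 9, 1, True), Bond(9, 10, 2, True), Bond(10, 4, 1, True),
]

print(count_rotatable_bonds(paracetamol_elements, paracetamol_bonds))
```

Under these assumptions only the N-aryl bond of paracetamol is counted: the methyl and hydroxyl bonds end on terminal heavy atoms, and the C(=O)-N bond is excluded as an amide.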
(vii) Number of carbon atoms, number of heteroatoms and ratio.
(viii) Number of atoms with a net charge
(ix) Sum of formal charges
(x) The Topological Polar Surface Area (TPSA)
The method described in (30) has been implemented. Briefly, the molecular polar surface area (PSA), i.e. the surface belonging to polar atoms, is a descriptor that was shown to correlate well with passive molecular transport through membranes. The calculation of PSA, however, is rather time-consuming because it requires generating a reasonable 3D molecular geometry and then computing the surface itself. A new approach for the calculation of the PSA was developed by Ertl et al. (30), based on the summation of tabulated surface contributions of polar fragments. This approach, called topological polar surface area, provides results that are practically identical to the 3D PSA while being 2–3 orders of magnitude faster to compute.
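The summation step itself is simple once each polar atom has been assigned to a fragment type. The sketch below assumes that assignment has already been done (the substructure matching, which is the harder part, is not shown). The fragment names are hypothetical labels, and the small table of contributions (in Å²) is quoted to the best of our knowledge from the published fragment table; it should be checked against reference (30) before any real use.

```python
from typing import Iterable

# Subset of fragment contributions in square angstroms.
# Keys are hypothetical labels; values should be verified against reference (30).
TPSA_CONTRIBUTIONS = {
    "hydroxyl_O": 20.23,         # O-H oxygen
    "carbonyl_O": 17.07,         # doubly bonded oxygen
    "ether_O": 9.23,             # oxygen with two single bonds
    "nh2_one_neighbour": 26.02,  # primary amine nitrogen
    "nh_two_neighbours": 12.03,  # N-H with two single bonds (amine or amide)
    "n_three_neighbours": 3.24,  # tertiary amine nitrogen
    "aromatic_n": 12.89,         # pyridine-like nitrogen
}


def topological_psa(fragments: Iterable[str]) -> float:
    """Sum the tabulated contribution of every polar fragment in the molecule."""
    return sum(TPSA_CONTRIBUTIONS[name] for name in fragments)


# Paracetamol has one amide N-H, one carbonyl oxygen and one phenolic hydroxyl.
paracetamol_fragments = ["nh_two_neighbours", "carbonyl_O", "hydroxyl_O"]
print(round(topological_psa(paracetamol_fragments), 2))
```

Because no 3D geometry is generated, the cost is a table lookup per polar atom, which is where the speed advantage over a surface-based PSA comes from. An unknown fragment label raises `KeyError` here rather than being silently ignored.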
(xi) Computation of XlogP (P = calculated octanol/water partition coefficient) (part of Lipinski's RO5)
We use the XScore package to compute XlogP as described in (38). This method gives log P values by summing the contributions of component atoms while making use of correction factors. About 90 atom types are used to classify carbon, nitrogen, oxygen, sulfur, phosphorus and halogen atoms, and 10 correction factors are used for some special substructures. The contributions of each atom type and correction factor were derived by multivariate regression analysis of about 1850 organic compounds with known experimental log P values.
In FAF-Drugs, input files must currently be in SDF, SMILES or CANSMILES format, whereas XlogP computations require the compounds in Mol2 format. We use OpenBabel for file format conversion prior to XlogP calculations. A few compounds are found to have ambiguous atom types, in which case log P is not computed.
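The conversion step could be scripted along the following lines. This is a sketch under stated assumptions, not the FAF-Drugs code: it assumes a current OpenBabel installation that provides the `obabel` command-line tool on the PATH and lets it infer formats from the file extensions. Both function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_obabel_command(source: PathLike, target: PathLike) -> List[str]:
    """Build an obabel call; input and output formats are inferred from extensions."""
    return ["obabel", str(source), "-O", str(target)]


def convert_to_mol2(source: PathLike) -> Path:
    """Convert an SDF or SMILES file to Mol2 next to the original file."""
    target = Path(source).with_suffix(".mol2")
    # check=True raises CalledProcessError if obabel exits with an error.
    subprocess.run(build_obabel_command(source, target), check=True)
    return target
```

Calling `convert_to_mol2("library.sdf")` would write `library.mol2` in the same directory, where `library.sdf` is a placeholder file name.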
(xii) Atom check
Molecules containing specific atoms can be filtered out (for instance, with default parameters, molecules containing H, C, N, O, F, S, P, Cl, Br and I atoms are kept).
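A minimal sketch of such an atom check is given below. It assumes that a molecule is kept only if every one of its atoms belongs to the allowed set, which is our reading of the default behaviour. The function names and the list-of-symbols representation are hypothetical.

```python
from typing import AbstractSet, Iterable, List

# Default allowed elements, as listed in the text.
DEFAULT_ALLOWED_ELEMENTS = frozenset(
    {"H", "C", "N", "O", "F", "S", "P", "Cl", "Br", "I"}
)


def disallowed_elements(
    elements: Iterable[str],
    allowed: AbstractSet[str] = DEFAULT_ALLOWED_ELEMENTS,
) -> List[str]:
    """Return the sorted element symbols that are not in the allowed set."""
    return sorted(set(elements) - set(allowed))


def passes_atom_check(
    elements: Iterable[str],
    allowed: AbstractSet[str] = DEFAULT_ALLOWED_ELEMENTS,
) -> bool:
    """Keep a molecule only if all of its atoms are allowed (assumption)."""
    return not disallowed_elements(elements, allowed)


print(passes_atom_check(["C", "C", "O", "N", "Cl"]))   # organic compound
print(disallowed_elements(["C", "Si", "O", "B", "C"]))  # contains Si and B
```

Returning the offending elements as well as the pass/fail flag makes it easy to report why a compound was rejected.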
Miteva M.A., Violas S., Montes M., Gomez D., Tuffery P., & Villoutreix B.O. (2006). FAF-Drugs: free ADME/tox filtering of compound collections. Nucleic Acids Research, 34(Web Server issue), W738-W744.