To predict the toxicity of an input compound, a 2D similarity search is performed on an updated version of the in-house toxicity database SuperToxic (17), and the compounds most similar to the input molecule are considered. The set used for prediction consists of approximately 38 000 unique compounds with known oral LD50 values measured in rodents. The data were gathered from public sources and the literature and standardized using Instant JChem 6.2.0 (January 2014), ChemAxon (http://www.chemaxon.com).
InChI keys were calculated from the standardized molecular structures and used to remove duplicates from the dataset. Where multiple LD50 values had been measured for one compound, the lowest dose was kept to represent the worst-case toxicity of that compound. Six toxicity classes were defined based on the GHS classification scheme using LD50 thresholds of 5, 50, 300, 2000 and 5000 mg/kg body weight. Each compound in the dataset was represented by a concatenated fingerprint consisting of the ‘FP2’ and ‘FP4’ fingerprints of Mychem (http://mychem.sourceforge.net/) and the ECFP4 fingerprint (18). The FP2 and FP4 fingerprints were calculated using Open Babel (19), and the ECFP4 fingerprint using JChem 6.1.3 (November 2013), ChemAxon (http://www.chemaxon.com). The similarity between two compounds was calculated using the Tanimoto index.
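The data preparation and similarity steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, fingerprints are modelled as plain sets of on-bit positions rather than Open Babel or JChem objects, and the class boundaries are assumed to be inclusive at the upper threshold (for example, an LD50 of exactly 5 mg/kg falls into class I), which the text does not state.

```python
from bisect import bisect_left

# LD50 thresholds in mg/kg body weight, as listed in the text.
GHS_THRESHOLDS = [5, 50, 300, 2000, 5000]


def worst_case_ld50(records):
    """Keep the lowest LD50 per InChI key.

    `records` is an iterable of (inchi_key, ld50) pairs; the input layout
    is an assumption made for this sketch.
    """
    lowest = {}
    for inchi_key, ld50 in records:
        if inchi_key not in lowest or ld50 < lowest[inchi_key]:
            lowest[inchi_key] = ld50
    return lowest


def toxicity_class(ld50):
    """Map an LD50 value to a class from 1 (most toxic) to 6.

    Assumption: upper thresholds are inclusive, so LD50 <= 5 is class 1
    and LD50 > 5000 is class 6.
    """
    return bisect_left(GHS_THRESHOLDS, ld50) + 1


def tanimoto(bits_a, bits_b):
    """Tanimoto index of two fingerprints given as sets of on-bit positions."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen for this sketch when both are empty
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)
```

Representing fingerprints as sets keeps the Tanimoto computation explicit: the number of shared on-bits divided by the number of bits set in either fingerprint. A production system would more likely use packed bit vectors.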
In addition to the similarity search, the prediction method takes into account the presence of toxic fragments. All compounds in the database were fragmented using RECAP (20) as well as the in-house method ROTBONDS (21). Each distinct fragment was represented by its SMILES string, computed with JChem 6.1.3 (November 2013), and its occurrence in the molecules of the prediction dataset was tested by a substructure search implemented with Open Babel's (19) fast search. To determine which fragments are over-represented in the most toxic classes, a propensity analysis (22) was performed. Propensity scores (PS) were calculated for every fragment and toxicity class. Toxic fragments were defined as those showing a PS above a threshold of 3 in classes I, II or III, and a PS below 1 in classes IV–VI. Based on these conditions, a total of 1591 and 1580 fragments specific to toxicity classes I–III, generated with the ROTBONDS and RECAP fragmentation methods, respectively, were considered for prediction.
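The fragment selection rule can be sketched as follows. The text does not give the propensity formula, so this example assumes a common definition: the fraction of fragment-containing compounds that fall into a class, divided by the fraction of all compounds in that class. It also assumes that a fragment qualifies if its PS exceeds 3 in at least one of classes I–III and stays below 1 in every one of classes IV–VI. The function names and input layout are hypothetical.

```python
from collections import Counter


def propensity_scores(fragment_classes, dataset_classes, n_classes=6):
    """Propensity score of one fragment for each toxicity class.

    fragment_classes: classes (1..6) of the compounds containing the fragment.
    dataset_classes:  classes (1..6) of all compounds in the dataset.
    Assumed formula: PS = (n_frag_in_class / n_frag) / (n_class / n_total).
    """
    frag_counts = Counter(fragment_classes)
    data_counts = Counter(dataset_classes)
    n_frag = len(fragment_classes)
    n_total = len(dataset_classes)
    scores = {}
    for cls in range(1, n_classes + 1):
        if n_frag == 0 or data_counts[cls] == 0:
            scores[cls] = 0.0  # score undefined; treated as 0 in this sketch
            continue
        frag_fraction = frag_counts[cls] / n_frag
        class_fraction = data_counts[cls] / n_total
        scores[cls] = frag_fraction / class_fraction
    return scores


def is_toxic_fragment(scores, high=3.0, low=1.0):
    """Apply the thresholds from the text under the stated interpretation."""
    enriched = any(scores[cls] > high for cls in (1, 2, 3))
    depleted = all(scores[cls] < low for cls in (4, 5, 6))
    return enriched and depleted
```

Under this definition, a score of 1 means the fragment occurs in a class exactly as often as expected by chance, so the two thresholds select fragments that are strongly enriched among toxic compounds and under-represented among less toxic ones.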