Comprehensive Chemical Taxonomy for Classifying Compounds
A taxonomy requires a well-defined, structured hierarchy. Following standard notation, we use the term “category” to refer to any chemical class (at any level), each of which corresponds to a set of chemicals. These categories are arranged in a tree structure (Additional file 1). The main relationship type connecting these categories is the “is_a” relationship. The rationale behind the choice of a tree structure was to provide a detailed annotation represented via a simple data structure that can be easily understood by humans. Moreover, as described in the results section, ClassyFire provides a list of all parents of a compound, which makes it easy to infer all of its ancestors.
Inspired by the original Linnaean biological taxonomy [4], we assigned the terms Kingdom, SuperClass, Class, and SubClass to denote the first, second, third, and fourth levels of the chemical taxonomy, respectively. The top level (Kingdom) partitions chemicals into two disjoint categories: organic compounds and inorganic compounds. Organic compounds are defined as chemical compounds whose structure contains one or more carbon atoms. Inorganic compounds are defined as compounds that are not organic, plus a small number of “special” carbon-containing compounds, including cyanide/isocyanide and their respective non-hydrocarbyl derivatives, carbon monoxide, carbon dioxide, carbon sulfide, and carbon disulfide. For the complete current list of exceptions, please see Additional file 1. The classification of compounds into these two kingdoms aligns with most modern views of chemistry and is easily performed on the basis of a compound’s molecular formula. The other levels in our classification schema depend on much more detailed definitions and rules that are described below.
SuperClasses (which include 26 organic and 5 inorganic categories) consist of generic categories of compounds with general structural identifiers (e.g. organic acids and derivatives, phenylpropanoids and polyketides, organometallic compounds, homogeneous metal compounds), each of which covers millions of known compounds. The next level below the SuperClass level is the Class level, which currently includes 764 nodes. Classes typically consist of more specific chemical categories with more specific and recognizable structural features (e.g. pyrimidine nucleosides, flavanols, benzazepines, actinide salts). Chemical Classes usually contain >100,000 known compounds. The level below Classes represents SubClasses, which typically consist of >10,000 known compounds. There are 1729 SubClasses in the current taxonomy. In addition, 2296 categories lie below the SubClass level, covering taxonomic levels 5–11. Altogether this extensive chemical taxonomy contains a total of 4825 chemical categories of organic (4146) and inorganic (678) compounds, in addition to the root category (Chemical entities).
As a whole, this chemical taxonomy can be represented as a tree with a maximum depth of 11 levels and an average depth of five levels per node (Fig. 2). As with any structured taxonomy, the creation of a well-defined hierarchical structure offers the possibility to focus on a sub-domain of the chemical space, or on a specific level of classification. A more complete description of this taxonomic hierarchy can be found in Additional file 1: Table S1. The chemical taxonomy and its hierarchical structure are provided in the Open Biological and Biomedical Ontologies (OBO) format [33], which may ease their integration with semantic technologies. The resulting OBO file was generated with OBO-Edit [34], and can be downloaded from the ClassyFire website.
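The tree structure and “is_a” relationship described above can be illustrated with a minimal Python sketch. This is not ClassyFire's implementation: the `Category` class, its methods, and the `LEVEL_NAMES` mapping are hypothetical names introduced here, and the example lineage is an assumption chosen for illustration rather than taken from the text. It only shows how a single parent link per category is enough to recover all ancestors and the level name of a category.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Names for depths 1-4 as given in the text; deeper levels (5-11) are numbered.
LEVEL_NAMES = {1: "Kingdom", 2: "SuperClass", 3: "Class", 4: "SubClass"}


@dataclass
class Category:
    """One node of the taxonomy tree (hypothetical helper class).

    `parent` encodes the single is_a link; the root has no parent.
    """
    name: str
    parent: Optional[Category] = None

    def ancestors(self) -> list[Category]:
        """Follow is_a links upward: nearest parent first, root last."""
        chain = []
        node = self.parent
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain

    def depth(self) -> int:
        """The root has depth 0, a Kingdom depth 1, and so on."""
        return len(self.ancestors())

    def level(self) -> str:
        """Map the depth to the level name used by the taxonomy."""
        d = self.depth()
        if d == 0:
            return "Root"
        return LEVEL_NAMES.get(d, f"Level {d}")


# Illustrative lineage (an assumption; consult the taxonomy for real parents).
root = Category("Chemical entities")
organic = Category("Organic compounds", parent=root)
superclass = Category("Nucleosides, nucleotides, and analogues", parent=organic)
cls = Category("Pyrimidine nucleosides", parent=superclass)

print(cls.level())
print([c.name for c in cls.ancestors()])
```

Because every category stores only its direct parent, the full ancestor list never has to be stored; it is derived by walking upward, which takes at most 11 steps in a tree of the depth described here.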
Djoumbou Feunang Y., Eisner R., Knox C., Chepelev L., Hastings J., Owen G., Fahy E., Steinbeck C., Subramanian S., Bolton E., Greiner R., & Wishart D.S. (2016). ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. Journal of Cheminformatics, 8, 61.
Publication 2016
Corresponding Organization: The Metabolomics Innovation Centre
Other organizations: Ottawa Hospital, University of Ottawa, European Bioinformatics Institute, Wellcome Trust, La Jolla Bioengineering Institute, National Center for Biotechnology Information, National Institutes of Health, Athabasca University, Alberta Innovates