While recognizing chemical mentions is valuable, many tasks ultimately require each mention to be identified, or normalized. We have thus paired our named entity recognition system with a straightforward lexical approach for normalization. Our lexicon of chemical entities and their names was collected from MeSH [32] and ChEBI [33]. The system converts both mentions from the literature and entity names in the lexicon to lowercase and removes all whitespace and punctuation. For example, "flavone-C-glycoside" becomes "flavonecglycoside". The system then assigns a MeSH identifier to each mention that can be found in the lexicon, or a ChEBI identifier if no matching MeSH identifier can be found. Mentions that correspond to a short form recognized by Ab3P are assigned the same identifier as the long form found by Ab3P [29]. Mentions that do not map to a specific identifier are ignored, and mentions that could be assigned both a MeSH and a ChEBI identifier are assigned only the MeSH identifier.
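The lookup described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names, the dictionary-based lexicons, the placeholder identifiers, and the representation of the Ab3P output as a plain short-form to long-form mapping are all assumptions made for illustration, and only ASCII punctuation is stripped here.

```python
import string

# Assumption: "punctuation" and "whitespace" are taken to mean the ASCII sets
# in Python's string module; Unicode punctuation is not handled in this sketch.
_STRIP = str.maketrans("", "", string.punctuation + string.whitespace)


def normalize_name(name):
    """Lowercase a name and remove all whitespace and punctuation."""
    return name.lower().translate(_STRIP)


def build_lexicon(entries):
    """Map normalized names to identifiers from (name, identifier) pairs."""
    return {normalize_name(name): identifier for name, identifier in entries}


def assign_identifier(mention, mesh, chebi, short_to_long=None):
    """Return a MeSH identifier if one matches, else a ChEBI one, else None.

    short_to_long is a hypothetical stand-in for abbreviation pairs such as
    those produced by Ab3P: a short form is looked up via its long form.
    """
    short_to_long = short_to_long or {}
    key = normalize_name(short_to_long.get(mention, mention))
    if key in mesh:
        # MeSH takes precedence when both lexicons contain the name.
        return mesh[key]
    # None means the mention has no identifier and is ignored.
    return chebi.get(key)


# Placeholder names and identifiers, not real MeSH or ChEBI records.
mesh = build_lexicon([("Flavone C-glycoside", "MESH:PLACEHOLDER_1")])
chebi = build_lexicon([
    ("flavone-C-glycoside", "CHEBI:PLACEHOLDER_2"),
    ("example compound", "CHEBI:PLACEHOLDER_3"),
])
abbreviations = {"FCG": "flavone C-glycoside"}  # hypothetical short form

print(normalize_name("flavone-C-glycoside"))
print(assign_identifier("flavone-C-glycoside", mesh, chebi))
print(assign_identifier("Example  Compound", mesh, chebi))
print(assign_identifier("FCG", mesh, chebi, abbreviations))
print(assign_identifier("unknown chemical", mesh, chebi))
```

Storing the lexicon under normalized keys means that each mention needs only one normalization and at most two dictionary lookups, and checking MeSH before ChEBI gives MeSH priority whenever a name occurs in both resources.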