Canonical SMILES Representation of Molecules

A SMILES [25 ] represents a molecule as a sequence of characters corresponding to atoms as well as special characters denoting opening and closure of rings and branches. The SMILES is, in most cases, tokenized based on a single character, except for atom types which comprise two characters such as “Cl” and “Br” and special environments denoted by square brackets (e.g [nH]), where they are considered as one token. This method of tokenization resulted in 86 tokens present in the training data. Figure 3 exemplifies how a chemical structure is translated to both the SMILES and one-hot encoded representations.Fig. 3

Three representations of 4-(chloromethyl)-1H-imidazole. Depiction of a one-hot representation derived from the SMILES of a molecule. Here a reduced vocabulary is shown, while in practice a much larger vocabulary that covers all tokens present in the training data is used

There are many different ways to represent a single molecule using SMILES. Algorithms that always represent a certain molecule with the same SMILES are referred to as canonicalization algorithms [26 (link)]. However, different implementations of the algorithms can still produce different SMILES.

Free full text: Click here

Olivecrona M., Blaschke T., Engkvist O, & Chen H. (2017). Molecular de-novo design through deep reinforcement learning. Journal of Cheminformatics, 9, 48.

Publication 2017

Characters Imidazole

Corresponding Organization : AstraZeneca (Sweden)

Top 5 similar protocols

Protocol cited in 5 other protocols

Variable analysis

independent variables

Tokenization method
Canonical SMILES representation

dependent variables

Molecular representation

control variables

Atom types comprising two characters
Special environments denoted by square brackets

controls

Positive control: Canonical SMILES representation, which is a standardized way of representing a molecule
Negative control: Not explicitly mentioned

Annotations

Based on most similar protocols

Etiam vel ipsum. Morbi facilisis vestibulum nisl. Praesent cursus laoreet felis. Integer adipiscing pretium orci. Nulla facilisi. Quisque posuere bibendum purus. Nulla quam mauris, cursus eget, convallis ac, molestie non, enim. Aliquam congue. Quisque sagittis nonummy sapien. Proin molestie sem vitae urna. Maecenas lorem.

As authors may omit details in methods from publication, our AI will look for missing critical information across the 5 most similar protocols.

About PubCompare

Our mission is to provide scientists with the largest repository of trustworthy protocols and intelligent analytical tools, thereby offering them extensive information to design robust protocols aimed at minimizing the risk of failures.

We believe that the most crucial aspect is to grant scientists access to a wide range of reliable sources and new useful tools that surpass human capabilities.

However, we trust in allowing scientists to determine how to construct their own protocols based on this information, as they are the experts in their field.

Ready to get started?

Revolutionizing how scientists
search and build protocols!