The molecular energies of the various data sets are predicted using a deep tensor neural network. The core idea is to represent atoms in the molecule as vectors depending on their type and to subsequently refine the representation by embedding the atoms in their neighbourhood. This is done in a sequence of interaction passes, where the atom representations influence each other in a pair-wise fashion. While each of these refinements depends only on the pair-wise atomic distances, multiple passes enable the architecture to also take angular information into account. Because of this decomposition of atomic interactions, an efficient representation of embedded atoms is learned following quantum-chemical principles.
In the following, we describe the deep tensor neural network step-by-step, including hyper-parameters used in our experiments.
1. Assign initial atomic descriptors
We assign an initial coefficient vector to each atom i of the molecule according to its nuclear charge $Z_i$:

$$\mathbf{c}_i^{(0)} = \mathbf{c}_{Z_i} \in \mathbb{R}^B,$$

where B is the number of basis functions. All presented models use atomic descriptors with 30 coefficients. We initialize each coefficient randomly following $\mathbf{c}_Z \sim \mathcal{N}(0, 1/\sqrt{B})$.
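As a minimal sketch of this step, the descriptors can be kept in a lookup table keyed by nuclear charge, so that atoms of the same element start from the same vector. The function names `init_descriptors` and `embed` are hypothetical, and the zero-mean Gaussian with s.d. 1/√B is an assumption made for illustration (in particular, whether 1/√B denotes the s.d. or the variance).

```python
import math
import random

B = 30  # number of basis functions (coefficients per atom), as in the text


def init_descriptors(nuclear_charges, n_basis=B, seed=0):
    """One random vector per element; zero-mean Gaussian, s.d. 1/sqrt(B) (assumed)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_basis)
    return {z: [rng.gauss(0.0, scale) for _ in range(n_basis)]
            for z in sorted(set(nuclear_charges))}


def embed(nuclear_charges, table):
    """Initial coefficient vectors c_i^(0): a copy of the table entry for Z_i."""
    return [list(table[z]) for z in nuclear_charges]


# Water as a toy input: O, H, H
charges = [8, 1, 1]
table = init_descriptors(charges)
c0 = embed(charges, table)
```

Both hydrogen atoms receive identical initial vectors here; they only become distinguishable once the interaction passes embed them in their neighbourhood.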
2. Gaussian feature expansion of the inter-atomic distances
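The inter-atomic distances $D_{ij}$ are spread across many dimensions by a uniform grid of Gaussians

$$\hat{\mathbf{d}}_{ij} = \left[ \exp\left( -\frac{\left(D_{ij} - (\mu_{\min} + k\,\Delta\mu)\right)^2}{2\sigma^2} \right) \right]_{0 \le k \le \mu_{\max}/\Delta\mu},$$

with Δμ being the gap between two Gaussians of width σ.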
In our experiments, we set both to 0.2 Å. The centre of the first Gaussian μmin was set to −1, while μmax was chosen depending on the range of distances in the data (10 Å for GDB-7 and benzene, 15 Å for toluene, malonaldehyde and salicylic acid and 20 Å for GDB-9).
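The expansion can be sketched in a few lines. The function name `gaussian_expansion` is hypothetical, the defaults correspond to the GDB-7 and benzene settings quoted above, and the number of grid points (k = 0, ..., μmax/Δμ) is an assumption about how the grid is indexed.

```python
import math


def gaussian_expansion(d, mu_min=-1.0, mu_max=10.0, delta_mu=0.2, sigma=0.2):
    """Expand a scalar distance d (in Angstrom) on a uniform grid of Gaussians."""
    # Assumed index range: k = 0, ..., mu_max / delta_mu
    n = int(round(mu_max / delta_mu)) + 1
    centres = [mu_min + k * delta_mu for k in range(n)]
    return [math.exp(-((d - mu) ** 2) / (2.0 * sigma ** 2)) for mu in centres]


# A distance of 1.0 Angstrom activates the Gaussians centred near 1.0
d_hat = gaussian_expansion(1.0)
```

Because the spacing equals the width, neighbouring Gaussians overlap, so a distance that falls between two centres still produces a smooth, non-sparse feature vector.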
3. Perform T interaction passes
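Each coefficient vector $\mathbf{c}_i^{(t)}$, corresponding to atom i after t passes, is corrected by the interactions with the other atoms of the molecule:

$$\mathbf{c}_i^{(t+1)} = \mathbf{c}_i^{(t)} + \sum_{j \neq i} \mathbf{v}_{ij}.$$

Here, we model the interaction $\mathbf{v}_{ij}$ as follows:

$$\mathbf{v}_{ij} = \tanh\left[ W^{fc} \left( \left(W^{cf} \mathbf{c}_j^{(t)} + \mathbf{b}^{f_1}\right) \circ \left(W^{df} \hat{\mathbf{d}}_{ij} + \mathbf{b}^{f_2}\right) \right) \right],$$

where $\hat{\mathbf{d}}_{ij}$ denotes the Gaussian expansion of $D_{ij}$ from step 2 and the circle (∘) represents the element-wise matrix product. The factor representation in the presented models employs 60 neurons.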
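One interaction pass can be sketched as follows, assuming the factorized form in which the neighbour's coefficients and the expanded distance are each projected into a common factor space, multiplied element-wise, and projected back through a tanh. All names (`matvec`, `interaction`, `interaction_pass`, the parameter keys) are hypothetical, and the tiny dimensions and the uniform range of the toy weights are chosen only to keep the example fast; the models in the text use 30 coefficients and 60 factors.

```python
import math
import random


def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def interaction(c_j, d_hat, p):
    """Correction v_ij that neighbour j contributes to atom i."""
    f_atom = [a + b for a, b in zip(matvec(p["W_cf"], c_j), p["b_f1"])]
    f_dist = [a + b for a, b in zip(matvec(p["W_df"], d_hat), p["b_f2"])]
    factors = [a * b for a, b in zip(f_atom, f_dist)]  # element-wise product
    return [math.tanh(x) for x in matvec(p["W_fc"], factors)]


def interaction_pass(C, D_hat, p):
    """Update every c_i by the summed corrections from all other atoms."""
    new_C = []
    for i, c_i in enumerate(C):
        c_new = list(c_i)
        for j, c_j in enumerate(C):
            if j != i:
                v = interaction(c_j, D_hat[i][j], p)
                c_new = [a + b for a, b in zip(c_new, v)]
        new_C.append(c_new)
    return new_C


def random_params(B, F, G, rng):
    """Toy parameters: B coefficients, F factors, G Gaussians (assumed ranges)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"W_cf": mat(F, B), "b_f1": [0.0] * F,
            "W_df": mat(F, G), "b_f2": [0.0] * F,
            "W_fc": mat(B, F)}


rng = random.Random(0)
B, F, G, n_atoms = 4, 6, 5, 3
params = random_params(B, F, G, rng)
C = [[rng.gauss(0.0, 0.5) for _ in range(B)] for _ in range(n_atoms)]
D_hat = [[[rng.random() for _ in range(G)] for _ in range(n_atoms)]
         for _ in range(n_atoms)]
C_next = interaction_pass(C, D_hat, params)
```

All updates in a pass read the coefficients from the previous pass (`C`), not the partially updated ones, so the result does not depend on the order in which atoms are visited.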
4. Predict energy contributions
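Finally, we predict the energy contributions $E_i$ from each atom i. Employing two fully-connected layers, for each atom a scaled energy contribution $\hat{E}_i$ is predicted:

$$\mathbf{o}_i = \tanh\left(W^{out_1} \mathbf{c}_i^{(T)} + \mathbf{b}^{out_1}\right)$$

$$\hat{E}_i = W^{out_2} \mathbf{o}_i + b^{out_2}$$

In our experiments, the hidden layer $\mathbf{o}_i$ possesses 15 neurons. To obtain the final contributions, $\hat{E}_i$ is shifted to the mean $E_\mu$ and scaled by the s.d. $E_\sigma$ of the energy per atom estimated on the training set:

$$E_i = E_\sigma \hat{E}_i + E_\mu$$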
This procedure ensures a good starting point for the training.
5. Obtain the molecular energy $E = \sum_i E_i$
The bias parameters as well as the output-layer weights $W^{out_2}$ are initially set to zero. All other weight matrices are initialized by drawing from a uniform distribution according to ref. 51. Neural network code is available.
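Steps 4 and 5 can be sketched together. The example assumes that the output-layer weights are among the zero-initialized parameters; under that assumption an untrained network predicts the training-set mean for every atom, which is one way to read the "good starting point" mentioned above. The function names, parameter keys, and the numerical values of `e_mu` and `e_sigma` are hypothetical.

```python
import math


def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def atom_energy(c, p, e_mu, e_sigma):
    """Two-layer readout followed by rescaling to the per-atom energy scale."""
    hidden = [math.tanh(a + b)
              for a, b in zip(matvec(p["W_out1"], c), p["b_out1"])]
    e_scaled = sum(w * o for w, o in zip(p["w_out2"], hidden)) + p["b_out2"]
    return e_sigma * e_scaled + e_mu


def molecular_energy(C, p, e_mu, e_sigma):
    """Molecular energy as the sum of atomic contributions."""
    return sum(atom_energy(c, p, e_mu, e_sigma) for c in C)


# Toy setting: 4 coefficients, 3 hidden neurons, 3 atoms (arbitrary units)
readout = {
    "W_out1": [[0.1, -0.2, 0.3, 0.0],
               [0.0, 0.5, -0.1, 0.2],
               [-0.3, 0.1, 0.0, 0.4]],
    "b_out1": [0.0, 0.0, 0.0],
    "w_out2": [0.0, 0.0, 0.0],  # zero-initialized output weights (assumed)
    "b_out2": 0.0,
}
C_final = [[0.2, -0.1, 0.4, 0.3], [0.0, 0.1, -0.2, 0.5], [0.3, 0.3, 0.1, -0.4]]
E0 = molecular_energy(C_final, readout, e_mu=-5.0, e_sigma=2.0)
```

With these values every atom contributes exactly `e_mu`, so `E0` equals the number of atoms times the mean per-atom energy.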
The deep tensor neural networks were trained for 3,000 epochs by minimizing the squared error, using stochastic gradient descent with a momentum of 0.9 and a constant learning rate (ref. 52). The final results are taken from the models with the best validation error in early stopping.
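The training loop described here, momentum updates with a constant learning rate while keeping the parameters with the lowest validation error, can be illustrated on a one-dimensional toy problem. This is only a sketch: the function `sgd_momentum`, the quadratic objective, and the learning rate of 0.05 are hypothetical (the text does not state the learning rate), and the toy uses the exact gradient rather than mini-batch estimates.

```python
def sgd_momentum(grad_fn, val_fn, w0, lr, momentum=0.9, epochs=3000):
    """Classical momentum with constant learning rate; returns the best iterate."""
    w, velocity = w0, 0.0
    best_w, best_val = w0, val_fn(w0)
    for _ in range(epochs):
        velocity = momentum * velocity - lr * grad_fn(w)
        w = w + velocity
        val = val_fn(w)
        if val < best_val:  # keep the parameters with the best validation error
            best_w, best_val = w, val
    return best_w, best_val


# Toy objective (w - 3)^2, used as both training and validation error
best_w, best_val = sgd_momentum(
    grad_fn=lambda w: 2.0 * (w - 3.0),
    val_fn=lambda w: (w - 3.0) ** 2,
    w0=0.0, lr=0.05, epochs=300)
```

Returning the best iterate rather than the last one mirrors the selection of the model with the best validation error.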
All DTNN models were trained and executed on an NVIDIA Tesla K40 GPU. The computational cost of the employed models depends on the number of reference calculations, the number of interaction passes as well as the number of atoms per molecule. The training times for all models and data sets are shown in Supplementary Table 2, ranging from 6 h for 5,768 reference calculations of GDB-7 with one interaction pass, to 162 h for 100,000 reference calculations of the GDB-9 data set with three interaction passes.
Prediction, in contrast, is fast: all models predict examples from the employed data sets in <1 ms. Supplementary Fig. 7 shows the scaling of the prediction time with the number of atoms and interaction layers. Even for a molecule with 100 atoms, a DTNN with three interaction layers requires <5 ms for a prediction.
The prediction as well as the training steps scale linearly with the number of interaction passes and quadratically with the number of atoms, since the pairwise atomic distances are required for the interactions. For large molecules it is reasonable to introduce a distance cutoff. In that case, the DTNN will also scale linearly with the number of atoms.
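The effect of a cutoff on the number of pairwise interactions can be checked with a small counting example. The helper names and the linear chain geometry (1.5 Å spacing, 2.0 Å cutoff) are hypothetical and chosen only to make the counts easy to verify by hand.

```python
import math


def interacting_pairs(positions, cutoff=None):
    """Ordered pairs (i, j), i != j, that interact; all pairs if cutoff is None."""
    # Note: this naive search is itself quadratic; a cell list would be
    # needed to make the neighbour search linear as well.
    pairs = []
    for i, p_i in enumerate(positions):
        for j, p_j in enumerate(positions):
            if i != j and (cutoff is None or math.dist(p_i, p_j) <= cutoff):
                pairs.append((i, j))
    return pairs


def chain(n, spacing=1.5):
    """Toy geometry: n atoms on a line, `spacing` Angstrom apart."""
    return [(k * spacing, 0.0, 0.0) for k in range(n)]


for n in (10, 20, 40):
    full = len(interacting_pairs(chain(n)))
    local = len(interacting_pairs(chain(n), cutoff=2.0))
    print(n, full, local)
```

Doubling the chain length roughly quadruples the number of pairs without a cutoff, whereas with the cutoff the count grows in proportion to the number of atoms.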