Class I additive force fields (see equation 1), which do not treat electronic polarization explicitly, are designed for use in the polar environments typically found in proteins and in solution. To this end, optimization of the nonbonded parameters in the biomolecular CHARMM force fields strongly emphasized experimental target data, supplemented by QM data, in order to ensure physically reasonable behavior in the bulk phase. However, reproducing experimental data requires molecular dynamics (MD) simulations, which must be set up carefully and repeated multiple times in the course of the parametrization, making the use of experimental target data non-trivial and time-consuming. In addition, experimental data may not be available for many of the functional groups that occur in drug-like molecules. Because of this lack of data, and since one of the main goals of CGenFF is easy and fast extensibility, a slightly different philosophy was adopted, with more emphasis on QM results as target data for parameter optimization. This is possible because a wide range of functionalities is already available whose parameters were optimized largely against experimental data, and because empirical scaling factors have been established that can be applied to QM data to make them relevant for the bulk phase.
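As an illustration of how an empirical scaling factor turns a gas-phase QM result into a bulk-phase target, the following minimal Python sketch may help. The numerical values are assumptions that this section does not state: a factor of 1.16 on the interaction energy and an offset of -0.2 Å on the interaction distance are the values conventionally quoted for HF/6-31G(d) data on neutral polar compounds in the CHARMM additive force fields. The class name, function name, and example numbers are hypothetical.

```python
from dataclasses import dataclass

# ASSUMED values, not given in this section: conventionally quoted for
# HF/6-31G(d) water-interaction targets of neutral polar compounds.
ENERGY_SCALE = 1.16       # dimensionless factor applied to the QM energy
DISTANCE_OFFSET = -0.20   # Angstrom, added to the QM minimum distance


@dataclass
class WaterInteraction:
    """One model compound-water interaction (hypothetical container)."""
    label: str
    energy: float    # kcal/mol, negative means favorable
    distance: float  # Angstrom


def to_bulk_target(qm: WaterInteraction,
                   energy_scale: float = ENERGY_SCALE,
                   distance_offset: float = DISTANCE_OFFSET) -> WaterInteraction:
    """Scale a gas-phase QM interaction into a bulk-relevant target."""
    return WaterInteraction(
        label=qm.label,
        energy=qm.energy * energy_scale,
        distance=qm.distance + distance_offset,
    )


# Made-up QM result, for illustration only.
qm = WaterInteraction("N-H...OHH", energy=-5.00, distance=2.10)
target = to_bulk_target(qm)
print(f"{target.label}: E = {target.energy:.2f} kcal/mol, "
      f"R = {target.distance:.2f} A")
```

The scaled energy and shifted distance, rather than the raw QM values, would then serve as the targets that the empirical model is fitted to.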
Experimental data would be required only when novel atom types are present for which LJ parameters are not already available in CGenFF. Such cases would require optimization of the LJ parameters based on the reproduction of bulk phase properties, typically pure solvent molecular volumes and heats of vaporization, or crystal lattice parameters and heats of sublimation, supplemented with Hartree-Fock (HF) model compound-water minimum interaction energies and distances (see step 2.a under “Generation of target data for parameter validation and optimization” and step 1 under “Parametrization procedure”). Descriptions of the optimization protocol have been published previously.7,9,25 However, it should be noted that CGenFF has been designed to cover the majority of atom types in pharmaceutical compounds, such that optimization of LJ parameters is typically not required.
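The bulk phase properties named above are obtained from simulation averages. The sketch below uses the standard estimators for the heat of vaporization and the molecular volume, assuming an ideal gas phase and mean energies taken from separate gas-phase and pure-liquid MD runs. It is not the published optimization protocol; the function names and the example numbers are hypothetical.

```python
R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def heat_of_vaporization(e_gas_mean: float,
                         e_liquid_box_mean: float,
                         n_molecules: int,
                         temperature: float) -> float:
    """Standard estimator: dHvap = <E_gas> - <E_liq>/N + RT (kcal/mol).

    e_gas_mean        : mean potential energy of one gas-phase molecule
    e_liquid_box_mean : mean potential energy of the whole liquid box
    Assumes an ideal gas and equal kinetic energies in both phases.
    """
    return e_gas_mean - e_liquid_box_mean / n_molecules + R_KCAL * temperature


def molecular_volume(box_volume_mean: float, n_molecules: int) -> float:
    """Mean box volume per molecule (cubic Angstrom)."""
    return box_volume_mean / n_molecules


def percent_error(calculated: float, experimental: float) -> float:
    """Signed deviation of a calculated property from experiment, in %."""
    return 100.0 * (calculated - experimental) / experimental


# Made-up simulation averages, for illustration only.
dhvap = heat_of_vaporization(e_gas_mean=2.0, e_liquid_box_mean=-1000.0,
                             n_molecules=125, temperature=298.15)
vmol = molecular_volume(box_volume_mean=18750.0, n_molecules=125)
print(f"dHvap = {dhvap:.2f} kcal/mol, Vm = {vmol:.1f} A^3")
```

In an LJ optimization of this kind, the signed deviations from experiment would typically be monitored while the LJ parameters are adjusted.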
The remainder of this section describes 1) the procedure for adding new model compounds and chemical groups to the force field, 2) the procedure for generating the QM target data, and 3) the procedure for applying the QM information to parametrize new molecules. To put these procedures in context, example systems are presented: pyrrolidine, the addition of substituents to pyrrolidine, and the development of a linker between pyrrolidine and benzene.