Force field-specific MATCH libraries were constructed with MATCH from the CHARMM36 topology files top_all22_prot, top_all27_na, top_all35_carb, top_all35_ethers, top_all36_cgenff and top_all36_lipid. For each force field, the molecular fragments for each atom type were constructed through an iterative optimization procedure whose goal is to assign the correct type to every atom in the force field. The main concern in this process is mistyping, which occurs when the fragments for one type also cover the chemical space of another. To avoid this, atom types were grouped by element and number of bonds, and the types in each group were developed simultaneously. That is, each time a fragment was modified, every atom with the group’s element and number of bonds was retyped, and the change was accepted if it produced fewer mistypings. This was repeated until there were no mistypings. Most aliphatic atom types occupy rather distinct chemical space and thus required only a few rounds of optimization. In contrast, it was more difficult to create an optimal set of fragments for atom types that occur exclusively in rings, and these atom types required more rounds of optimization. A further challenge in this optimization scheme is keeping the atom-type fragments as general as possible while preserving their unique chemical environment. The Perl script TestBuildTypeStrings.t that is required for this optimization is provided in the MATCH package distribution for future optimization and development of atom-type fragments for new force fields.
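The accept-if-fewer-mistypings loop described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions and is not the logic of TestBuildTypeStrings.t: a "fragment" is reduced to minimum counts of neighbor elements, the most specific fragment is tried first, and the three reference atoms, the proposed modifications, and all function names are hypothetical.

```python
from collections import Counter


def matches(required, neighbors):
    """True if the neighbor elements satisfy every minimum count in `required`."""
    counts = Counter(neighbors)
    return all(counts[element] >= n for element, n in required.items())


def assign_type(fragments, atom):
    """Return the first matching type, trying the most specific fragment first."""
    ordered = sorted(fragments.items(), key=lambda kv: (-sum(kv[1].values()), kv[0]))
    for type_name, required in ordered:
        if matches(required, atom["neighbors"]):
            return type_name
    return None


def count_mistypings(fragments, atoms):
    """Number of reference atoms whose assigned type differs from the reference."""
    return sum(1 for atom in atoms if assign_type(fragments, atom) != atom["type"])


def optimize_group(fragments, atoms, proposals):
    """Greedy refinement: keep a proposed fragment only if mistypings decrease."""
    best = count_mistypings(fragments, atoms)
    accepted = []
    for type_name, required in proposals:
        if best == 0:
            break  # every atom in the group is typed correctly
        candidate = dict(fragments)
        candidate[type_name] = required
        score = count_mistypings(candidate, atoms)
        if score < best:
            fragments, best = candidate, score
            accepted.append(type_name)
    return fragments, best, accepted


# Toy group: carbons with four bonds (hypothetical reference data).
atoms = [
    {"type": "CT3", "neighbors": ("C", "H", "H", "H")},
    {"type": "CT2", "neighbors": ("C", "C", "H", "H")},
    {"type": "CT1", "neighbors": ("C", "C", "C", "H")},
]
initial_fragments = {"CT1": {"C": 1}, "CT2": {"C": 1}, "CT3": {"C": 1}}
proposals = [
    ("CT1", {"H": 1}),          # too general: does not reduce mistypings
    ("CT3", {"C": 1, "H": 3}),
    ("CT2", {"C": 2, "H": 2}),
]

final_fragments, remaining, accepted = optimize_group(initial_fragments, atoms, proposals)
print(remaining, accepted)
```

Because every atom in the group is retyped after each proposal, a change that fixes one type at the expense of another is rejected, which is the point of developing the types in a group simultaneously.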
For each force field that contained residue patches, a patch was applied if it increased the chemical space of the set (i.e., added new atom types or bond increment rules) or was necessary to correct polymer connectivity. By default, the NTER and CTER patches were applied to the protein force field residues, and the 5TER and 3TER patches were applied to the nucleic acid force field residues. With the exception of CGENFF, all molecules in the topology files were included in constructing the force field-specific MATCH libraries. In total, 53 of the 415 molecules in the CGENFF topology file were excluded. The excluded molecules fall into three primary categories: molecules containing a fused ring, which would require all bond increments to be refined as a result of charge smearing; molecules containing a conjugated alkene chain, in which the atom type designations alternate between CG2DC1 and CG2DC2 although the chemical environment is the same; and molecules with a connectivity of two atom types A and B such that A – B – A – B – A, which would require simultaneous refinement of the A–B bond increment. The latter two categories have been incorporated into the most recent version of the CGENFF MATCH libraries, but were not used in this study.
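As an illustration of the third exclusion category, the following Python sketch flags a molecule that contains a simple path of five atoms whose types alternate A – B – A – B – A. It assumes a molecule is given as a list of atom types plus a list of bonded index pairs; the function name and the toy inputs are hypothetical and do not correspond to any MATCH routine.

```python
from collections import defaultdict


def has_alternating_chain(types, bonds, length=5):
    """True if some simple path of `length` atoms alternates between two types."""
    adjacency = defaultdict(set)
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
    adjacency = dict(adjacency)  # freeze: no keys are added during the search

    def extend(path):
        if len(path) == length:
            return True
        expected = types[path[-2]]  # alternation: same type as two atoms back
        for nxt in adjacency[path[-1]]:
            if nxt not in path and types[nxt] == expected and extend(path + [nxt]):
                return True
        return False

    # Try every bond, in both directions, as the start of an A-B pattern.
    return any(
        types[i] != types[j] and extend([i, j])
        for i, neighbors in adjacency.items()
        for j in neighbors
    )


# Toy linear chains (hypothetical type labels).
chain_bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(has_alternating_chain(["A", "B", "A", "B", "A"], chain_bonds))
print(has_alternating_chain(["A", "B", "A", "B", "C"], chain_bonds))
```

In such a chain every bond is governed by the same A–B increment, so the charges on the interior atoms cannot be adjusted bond by bond, which is consistent with the need for simultaneous refinement noted above.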
Bond increments were extracted from each force field topology file in an automated fashion, as discussed in the previous section, and can be reproduced in MATCH using GenerateBondIncrementRules.pl. Refinement bond increments were added to fix obvious exceptions to the BCIs, e.g., where the default BCIs could not reproduce the charge distributions in the molecules. These refinements were usually small in number, with the exception of CGENFF. In addition to the compounds that were excluded when constructing the CGENFF-specific MATCH libraries, several other compounds in the CGENFF topology file do not obey clear bond increment rules. With additional refinement rules, however, it was possible to reliably reproduce charges for these compounds.
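To make the role of refinement increments concrete, the sketch below applies a set of default increments to a molecule and checks whether the resulting charges reproduce the reference charges within a tolerance. It assumes the usual bond charge increment convention, in which a rule for the type pair (a, b) adds the increment to the atom of type a and subtracts it from the atom of type b, and that neutral atoms start at zero charge. The type labels, numerical values, tolerance, and function names are hypothetical and are not taken from GenerateBondIncrementRules.pl or any CHARMM topology file.

```python
def charges_from_increments(types, bonds, increments):
    """Sum bond charge increments over all bonds, starting from zero charge."""
    charges = [0.0] * len(types)
    for i, j in bonds:
        type_i, type_j = types[i], types[j]
        if type_i == type_j:
            continue  # a bond between identical types transfers no charge
        if (type_i, type_j) in increments:
            delta = increments[(type_i, type_j)]
        elif (type_j, type_i) in increments:
            delta = -increments[(type_j, type_i)]
        else:
            raise KeyError(f"no bond increment rule for {type_i}-{type_j}")
        charges[i] += delta
        charges[j] -= delta
    return charges


def needs_refinement(reference, predicted, tolerance=0.01):
    """True if any predicted charge deviates from the reference beyond tolerance."""
    return any(abs(r - p) > tolerance for r, p in zip(reference, predicted))


# Toy three-atom molecule with hypothetical types T1-T2-T1.
types = ["T1", "T2", "T1"]
bonds = [(0, 1), (1, 2)]
default_increments = {("T1", "T2"): 0.25}

predicted = charges_from_increments(types, bonds, default_increments)
print(predicted)
print(needs_refinement([0.25, -0.5, 0.25], predicted))
print(needs_refinement([0.30, -0.5, 0.20], predicted))
```

In the last call the two T1 atoms carry different reference charges, which a single T1–T2 increment cannot reproduce; a molecule of this kind is a candidate for a refinement rule.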