All molecular mechanics calculations were performed with the CHARMM program.20 ,25 The all-atom additive CHARMM force field uses the energy potential U(r) given in Equation 1 . In Equation 1 , Kb, Kθ, KUB, Kχ and Kimp are bond, valence angle, Urey-Bradley, dihedral angle, and improper dihedral angle force constants, respectively, while b, θ, S, χ and φ are the bond distance, valence angle, Urey-Bradley 1,3-distance, dihedral angle and improper dihedral angle, respectively, where the subscript zero represents the equilibrium value. In the dihedral potential energy term, n is the multiplicity and σ is the phase angle as in a Fourier series. The nonbonded interaction energy between atoms i and j is separated into two terms, the Lennard-Jones (LJ) 6-12 term and the Coulomb term. For the nonbonded terms, εij is the LJ well depth, Rmin,ij is the distance at the LJ energy minimum, qi and qj are the partial atomic charges, and rij is the distance between atoms i and j. For the LJ parameters, the Lorentz-Berthelot combining rules are applied.26
Hydrogen bonding water-solute pair interaction energies and distances were calculated using the standard additive CHARMM force field protocol, so as to maintain compatibility with existing CHARMM biomolecular force fields.19 ,21 (link) Solute geometries were obtained from optimization at the MP2/6-31G(d) level of the conformation in the respective crystals obtained from the Cambridge Structural Database.23 Using the optimized geometry of the monomer, water-monomer pairs were constructed, with the water internal geometry identical to that of the TIP3P water model.27 Examples of these pair interactions are shown inFigure 2 where the water molecule is interacting with the terminal hydroxyl of allitol. In pairs 1, 2, 3 and 4 the hydroxyl oxygen is the hydrogen bond donor and in 5 and 6 the hydroxyl hydrogen is the hydrogen bond acceptor. For pairs 1, 2, 3 and 4 the hydrogen of the water molecule is directed at the COH bisector. In pairs 1 and 2 the water molecule lies in the COH plane whereas in pairs 3 and 4 the water molecule lies at a 120 degrees angle to the COH plane. Pairs 5 and 6 are different because the water molecule in these interaction pairs acts as the hydrogen bond acceptor; therefore, the COH hydroxyl is directed along the water HOH bisector. For pair 5, the HOH plane of the water is at a 90-degree angle to the COH plane, and for pair 6 the HOH and COH atoms are coplanar.
Reference data for comparison of molecular mechanics (MM) interaction energies and distances were generated by geometry optimization of the interaction distances at the QM HF/6-31G(d) level for each of the water-solute pairs above, with all other degrees of freedom constrained. The QM data cannot be targeted directly, but are instead empirically scaled to account for the fact that the MM force field needs to be able to account for many-body effects in the condensed phase. The CHARMM additive force field empirical scaling rules are well-established15 ,28 and are such that the MM target distance is the RQM − 0.2 Å and the MM target pair interaction energy (denoted “EQM”) is given by the expression 1.16*(EQM,pair − EQM,solute − EQM,water). The energy-scaling factor of 1.16 and the offset of the QM distance by 0.2 Å account for limitations in the potential energy function and in the QM level of theory, and these empirical corrections lead to good agreement with condensed phase properties, as shown in previous work.20 ,21 (link)
All of the C-C-C-C, C-C-C-O, O-C-C-O and C-C-O-H dihedral parameters are fit to relaxed QM potential energy scans. The Gaussian03 package29 is used to optimize geometries at the MP2/6-31G(d) level of theory followed by single point calculations performed at the RIMP2/cc-pVTZ level with the QCHEM program.30 (link) This level of theory has previously been shown to be sufficiently accurate for a number of systems including carbohydrates.22 (link),31 (link) The target dihedral is scanned at 15° intervals from −180 to 165°, with the exception of inositol, which is scanned from 15 to 135° due to the constrained nature of the ring. The dihedral parameters are then fit to the QM dihedral scans using an automated Monte Carlo simulated annealing (MCSA) method.24 (link) In the MCSA method the selected dihedral parameters are fit simultaneously to minimize the root mean squared error (RMSE), where and are the QM and MM energies of conformation i, wi is a weighting factor for conformation i and c is a constant that aligns the QM and MM data to minimize the RMSE. All of the six-carbon (n=6) alditols are used in the fitting procedure (Figure 1 ) and all of the five-carbon (n=5) and four-carbon (n=4) alditols are used as the test set for the parameter validation. With inositol, the C-C-C-O, O-C-C-O and C-C-O-H dihedrals are transferred from the hexopyranose parameters and only the C-C-C-C is fit (independently from the n=6 alditols). For the dihedral parameters in the aldehyde and ketone groups in the linear carbohydrates D-allose and D-psicose (Figure 1 ), only the torsions containing non-hydrogen atoms and including the carbonyl atoms are parametrized and the other torsional parameters are transferred from the n=6 alditols.
Parameter optimization and validation of the parameters is performed via a number of condensed phase MD simulations. A cubic box containing TIP3P water molecules27 ,32 with periodic boundary conditions is used for all aqueous simulations. Particle Mesh Ewald (PME)33 with a 12 Å real space cutoff is used to treat the long-range Coulomb interactions and a force-switched smoothing function34 with a range of 10-12 Å is used for the Lennard-Jones interactions, with a long-range correction applied beyond the truncation distance.26 The SHAKE algorithm35 is used to constrain all hydrogen atom bonds to their equilibrium lengths and to maintain rigid water geometries. For the constant pressure – constant temperature (NPT) simulations the Nosé-Hoover thermostat36 ,37 (link) is used to maintain the temperature and the Langevin piston barostat38 is used to maintain the pressure. A leapfrog integrator39 is used with a 1 fs timestep for all of the simulations.
Pure solvent simulations are performed with a periodic box of 125 solvent molecules. The box of solvent molecules is minimized and then equilibrated for 50 ps followed by five production runs performed for 1 ns. The heat of vaporization ΔHvap is calculated from the pure solvent simulation using the relation Here, 〈U〉monomer is the average potential energy of the monomers calculated from five individual gas-phase simulations of all 125 molecules, with each simulation run for 500 ps. The 〈U〉box term is the average potential energy of the periodic box. N is the number of molecules in the box, R is the universal gas constant for an ideal gas and T is the temperature.
The free energy of aqueous solvation ΔGsol is calculated from the difference in free energy of a molecule in aqueous solution compared to that in the gas phase. ΔGsol is calculated from the sum of nonpolar ΔGnp and electrostatic ΔGelec free energies40 : ΔGnp is the sum of the repulsive and dispersive contribution, which are calculated using the Weeks, Chandler, Anderson decomposition of the LJ potential.41 The repulsion term in the LJ potential is treated using a soft-core potential.42 In the aqueous phase, free energy calculations are performed using 1 molecule centered in a water box of 250 TIP3P water molecules. The aqueous system at each window is equilibrated for 50 ps and then simulated for 200 ps in the NPT ensemble. In the gas phase, Langevin dynamics are used with an infinite non-bond cutoff.26 ,43 Since the gas phase energies converge much more quickly, the gas phase system is equilibrated for 10 ps and the production run is simulated for 100 ps. The simulations are performed at a temperature of 298K and a pressure of 1 atm, which is consistent with experiment. The free energy calculations are analyzed using thermodynamic integration44 and the weighted-histogram analysis method45 (WHAM). Additional details for calculating the free energy have been described previously.40 ,46 Unlike all other condensed phase simulations in the present work, due to software limitations, the long-range pressure correction is not part of the MD protocol for the free energy simulations. Thus, the long-range contribution (LRC) from the LJ potential to the free energy of solvation is calculated as the difference in LJ energy of the aqueous system with a nonbond cutoff of 12 Å and a nonbond cutoff of 30 Å. The LRC is calculated from a 5 ps NPT simulation trajectory of the molecules in solution using coordinates saved every 100 fs and averaged over all values.
From the pure solvent simulation trajectories, the self-diffusion coefficient Dsim incorporates a system-size dependent finite-size correction developed by Yeh and Hummer47 : InEquation 5 , DPBC is the diffusion coefficient calculated from a simulation with periodic boundary conditions to which the correction term is added. ζ is a constant of 2.837297, kB is the Boltzmann constant, T is the temperature, η is the viscosity and L is the length of the cubic simulation box. DPBC is calculated from the slope of the mean square displacement of the C1 atom of all solute molecules in the simulation box versus time.26 For diffusion coefficients of polyols in aqueous solutions of TIP3P water, Equation 5 is further modified to take into account the low viscosity of TIP3P water relative to experiment: Here, the scaling factor of 0.375 is applied to correct for the underestimation of the viscosity of water by the TIP3P model. The scaling factor is calculated from ηTIP3P /ηw where ηTIP3P = 0.35 cP and ηw = 0.93 cP, the experimental viscosity of water. Equation 6b is the viscosity of a solution with the presence of a solute estimated by the Einstein formula,48 where ηTIP3P is the viscosity of TIP3P (0.35 cP) and ϕ is the volume fraction of the solute. The method for calculating the simulation diffusion coefficient for a polyol-water mixture is similar to that previously used for a system of polyethylene oxide and polyethylene glycol.49 (link)
Complete crystal unit cells, obtained from the Cambridge Structural Database,23 are used as starting structures for crystal simulations, with periodic boundary conditions applied in accordance with the length and angle parameters of the respective crystals. Each crystal system is minimized initially to remove bad contacts and is then equilibrated for 100 ps. After equilibration, the simulation is run for 2 ns. For all of the polyol crystals, the reference temperature is set to room temperature 298K, the temperature at which the crystals were obtained, and constant pressure is maintained at 1 atm by allowing independent variation in the crystal cell length parameters.
For the aqueous phase MD simulations, a box containing 1100 waters and the number of solute molecules based on the experimental concentration is set up and then minimized using harmonic restraints with a force constant of 1*(particle mass)*kcal*mol−1*Å−2*amu−1 on only the solute molecules. The system is equilibrated for 500 ps and then the equilibrated conformation is used as the starting conformations for five different unrestrained 1ns runs, using different initial velocities for each of the runs to achieve improved statistics. The reference pressure of the glucitol and mannitol systems is 1 atm, and the reference pressure of the galacitol, xylitol, erythritol, ribitol, glycerol and myoinositol system is 3.5 atm, in accordance with the experimental conditions. The density of each system is calculated using the following equations: where 〈V〉 is the average volume calculated from all five runs. Nwater , Nsolute and NAvogadro are the number of water molecules, the number of polyol solute molecules and Avogadro's number respectively. 〈MW〉 is the average molecular weight of the system.Equation 7a is also used to calculate the density for neat liquids; however, in this case N is simply the number of molecules in the periodic box.
The J coupling constants for glucitol and mannitol are also calculated from the aqueous simulations described above. However, the coupling constants for arabitol, ribitol and xylitol are calculated from aqueous phase simulations at 1 atm and a molality of 0.5 mol*kg−1 using the same protocol for the aqueous simulations. The dihedral value for the proton-proton coupling is calculated every 1 ps for each of the production runs. Moreover, the dihedral value is calculated for each of the solute molecules in the respective systems; therefore, depending on the concentration of the simulation the amount of torsional data differs. The J coupling is then calculated from the dihedral values for each snapshot using the generalized Karplus equation 50 where ϕ is the H-X-X-H dihedral angle. Manipulation of the Karplus equation given inEquation 8 allows the fraction of trans conformers to be calculated:50 In Equation 9 , Jobs is the observed coupling constant.
Hydrogen bonding water-solute pair interaction energies and distances were calculated using the standard additive CHARMM force field protocol, so as to maintain compatibility with existing CHARMM biomolecular force fields.19 ,21 (link) Solute geometries were obtained from optimization at the MP2/6-31G(d) level of the conformation in the respective crystals obtained from the Cambridge Structural Database.23 Using the optimized geometry of the monomer, water-monomer pairs were constructed, with the water internal geometry identical to that of the TIP3P water model.27 Examples of these pair interactions are shown in
Reference data for comparison of molecular mechanics (MM) interaction energies and distances were generated by geometry optimization of the interaction distances at the QM HF/6-31G(d) level for each of the water-solute pairs above, with all other degrees of freedom constrained. The QM data cannot be targeted directly, but are instead empirically scaled to account for the fact that the MM force field needs to be able to account for many-body effects in the condensed phase. The CHARMM additive force field empirical scaling rules are well-established15 ,28 and are such that the MM target distance is the RQM − 0.2 Å and the MM target pair interaction energy (denoted “EQM”) is given by the expression 1.16*(EQM,pair − EQM,solute − EQM,water). The energy-scaling factor of 1.16 and the offset of the QM distance by 0.2 Å account for limitations in the potential energy function and in the QM level of theory, and these empirical corrections lead to good agreement with condensed phase properties, as shown in previous work.20 ,21 (link)
All of the C-C-C-C, C-C-C-O, O-C-C-O and C-C-O-H dihedral parameters are fit to relaxed QM potential energy scans. The Gaussian03 package29 is used to optimize geometries at the MP2/6-31G(d) level of theory followed by single point calculations performed at the RIMP2/cc-pVTZ level with the QCHEM program.30 (link) This level of theory has previously been shown to be sufficiently accurate for a number of systems including carbohydrates.22 (link),31 (link) The target dihedral is scanned at 15° intervals from −180 to 165°, with the exception of inositol, which is scanned from 15 to 135° due to the constrained nature of the ring. The dihedral parameters are then fit to the QM dihedral scans using an automated Monte Carlo simulated annealing (MCSA) method.24 (link) In the MCSA method the selected dihedral parameters are fit simultaneously to minimize the root mean squared error (RMSE), where and are the QM and MM energies of conformation i, wi is a weighting factor for conformation i and c is a constant that aligns the QM and MM data to minimize the RMSE. All of the six-carbon (n=6) alditols are used in the fitting procedure (
Parameter optimization and validation of the parameters is performed via a number of condensed phase MD simulations. A cubic box containing TIP3P water molecules27 ,32 with periodic boundary conditions is used for all aqueous simulations. Particle Mesh Ewald (PME)33 with a 12 Å real space cutoff is used to treat the long-range Coulomb interactions and a force-switched smoothing function34 with a range of 10-12 Å is used for the Lennard-Jones interactions, with a long-range correction applied beyond the truncation distance.26 The SHAKE algorithm35 is used to constrain all hydrogen atom bonds to their equilibrium lengths and to maintain rigid water geometries. For the constant pressure – constant temperature (NPT) simulations the Nosé-Hoover thermostat36 ,37 (link) is used to maintain the temperature and the Langevin piston barostat38 is used to maintain the pressure. A leapfrog integrator39 is used with a 1 fs timestep for all of the simulations.
Pure solvent simulations are performed with a periodic box of 125 solvent molecules. The box of solvent molecules is minimized and then equilibrated for 50 ps followed by five production runs performed for 1 ns. The heat of vaporization ΔHvap is calculated from the pure solvent simulation using the relation Here, 〈U〉monomer is the average potential energy of the monomers calculated from five individual gas-phase simulations of all 125 molecules, with each simulation run for 500 ps. The 〈U〉box term is the average potential energy of the periodic box. N is the number of molecules in the box, R is the universal gas constant for an ideal gas and T is the temperature.
The free energy of aqueous solvation ΔGsol is calculated from the difference in free energy of a molecule in aqueous solution compared to that in the gas phase. ΔGsol is calculated from the sum of nonpolar ΔGnp and electrostatic ΔGelec free energies40 : ΔGnp is the sum of the repulsive and dispersive contribution, which are calculated using the Weeks, Chandler, Anderson decomposition of the LJ potential.41 The repulsion term in the LJ potential is treated using a soft-core potential.42 In the aqueous phase, free energy calculations are performed using 1 molecule centered in a water box of 250 TIP3P water molecules. The aqueous system at each window is equilibrated for 50 ps and then simulated for 200 ps in the NPT ensemble. In the gas phase, Langevin dynamics are used with an infinite non-bond cutoff.26 ,43 Since the gas phase energies converge much more quickly, the gas phase system is equilibrated for 10 ps and the production run is simulated for 100 ps. The simulations are performed at a temperature of 298K and a pressure of 1 atm, which is consistent with experiment. The free energy calculations are analyzed using thermodynamic integration44 and the weighted-histogram analysis method45 (WHAM). Additional details for calculating the free energy have been described previously.40 ,46 Unlike all other condensed phase simulations in the present work, due to software limitations, the long-range pressure correction is not part of the MD protocol for the free energy simulations. Thus, the long-range contribution (LRC) from the LJ potential to the free energy of solvation is calculated as the difference in LJ energy of the aqueous system with a nonbond cutoff of 12 Å and a nonbond cutoff of 30 Å. The LRC is calculated from a 5 ps NPT simulation trajectory of the molecules in solution using coordinates saved every 100 fs and averaged over all values.
From the pure solvent simulation trajectories, the self-diffusion coefficient Dsim incorporates a system-size dependent finite-size correction developed by Yeh and Hummer47 : In
Complete crystal unit cells, obtained from the Cambridge Structural Database,23 are used as starting structures for crystal simulations, with periodic boundary conditions applied in accordance with the length and angle parameters of the respective crystals. Each crystal system is minimized initially to remove bad contacts and is then equilibrated for 100 ps. After equilibration, the simulation is run for 2 ns. For all of the polyol crystals, the reference temperature is set to room temperature 298K, the temperature at which the crystals were obtained, and constant pressure is maintained at 1 atm by allowing independent variation in the crystal cell length parameters.
For the aqueous phase MD simulations, a box containing 1100 waters and the number of solute molecules based on the experimental concentration is set up and then minimized using harmonic restraints with a force constant of 1*(particle mass)*kcal*mol−1*Å−2*amu−1 on only the solute molecules. The system is equilibrated for 500 ps and then the equilibrated conformation is used as the starting conformations for five different unrestrained 1ns runs, using different initial velocities for each of the runs to achieve improved statistics. The reference pressure of the glucitol and mannitol systems is 1 atm, and the reference pressure of the galacitol, xylitol, erythritol, ribitol, glycerol and myoinositol system is 3.5 atm, in accordance with the experimental conditions. The density of each system is calculated using the following equations: where 〈V〉 is the average volume calculated from all five runs. Nwater , Nsolute and NAvogadro are the number of water molecules, the number of polyol solute molecules and Avogadro's number respectively. 〈MW〉 is the average molecular weight of the system.
The J coupling constants for glucitol and mannitol are also calculated from the aqueous simulations described above. However, the coupling constants for arabitol, ribitol and xylitol are calculated from aqueous phase simulations at 1 atm and a molality of 0.5 mol*kg−1 using the same protocol for the aqueous simulations. The dihedral value for the proton-proton coupling is calculated every 1 ps for each of the production runs. Moreover, the dihedral value is calculated for each of the solute molecules in the respective systems; therefore, depending on the concentration of the simulation the amount of torsional data differs. The J coupling is then calculated from the dihedral values for each snapshot using the generalized Karplus equation 50 where ϕ is the H-X-X-H dihedral angle. Manipulation of the Karplus equation given in