In this work, we used the previously established S2648 dataset (15–18), derived from the ProTherm database (19). This dataset comprises 2648 different point mutations across 131 globular proteins with experimentally determined structures, each with an experimentally measured impact on protein stability (602 stabilizing and 2046 destabilizing). The DynaMut training set comprises 2297 mutations randomly selected from the original dataset. A blind test set composed of 351 non-redundant mutations derived from the S2648 set was also compiled. This blind test set has been widely used in the literature (15–18), enabling direct comparison of the performance of methods that quantify the impact of mutations on the folding free energy.
Previous studies have reported performance comparisons of different methods for predicting changes in folding free energy (ΔΔG) using these datasets (20–22). Given the unbalanced nature of the original dataset, here we have considered the hypothetical reverse mutations (22) in order to build a more robust, balanced and self-consistent predictive method. The change in folding free energy is a thermodynamic state function, and it has been proposed that the change in folding free energy of a mutation from a wild-type protein to its mutant, ΔΔG(WT→MT), should equal the negative of the change in folding free energy of the hypothetical reverse mutation from the mutant to the wild-type protein, −ΔΔG(MT→WT) (16, 22–24). Including the hypothetical reverse mutations, our predictive model was trained using 4594 mutations and our blind test set comprised 702 single-point mutations.
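The reverse-mutation augmentation described above can be illustrated with a minimal sketch. It is not the DynaMut implementation: the `Mutation` record, its field names, the `is_reverse` flag and the function names are all hypothetical, chosen only to show how each forward mutation can be paired with a reverse entry whose ΔΔG has the opposite sign.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mutation:
    """Hypothetical record for one point mutation (field names are assumptions)."""
    protein_id: str
    wild_type: str    # residue present before the mutation
    position: int
    mutant: str       # residue present after the mutation
    ddg: float        # measured ddG; sign convention follows the source data
    is_reverse: bool = False  # True for hypothetical reverse entries


def reverse_mutation(m: Mutation) -> Mutation:
    """Build the hypothetical reverse mutation (mutant -> wild type).

    Assumes ddG(MT->WT) = -ddG(WT->MT), as proposed in the text.
    """
    return Mutation(
        protein_id=m.protein_id,
        wild_type=m.mutant,
        position=m.position,
        mutant=m.wild_type,
        ddg=-m.ddg,
        is_reverse=True,
    )


def augment_with_reverse(mutations: List[Mutation]) -> List[Mutation]:
    """Return the forward mutations followed by their reverse counterparts."""
    return list(mutations) + [reverse_mutation(m) for m in mutations]
```

Applied to the 2297 training mutations and the 351 blind test mutations, this doubling gives the 4594 and 702 entries reported above, and the augmented set contains one entry of each sign for every non-zero ΔΔG value.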