One hundred proteins for which pKa values had been determined experimentally were taken from PPD, a database of protein ionization constants [36,37]. The full list of the PDB files comprising the dataset is included as an additional file [See PDB codes]. The dataset covered a wide range of protein sizes and functions. The protein structures were taken from the RCSB Protein Data Bank [38].
To run the MEAD program, the PDB files were protonated using the leap program and the AMBER 94 force field (subsequent versions of the force field proved to be incompatible) and converted to PQR format using the online PDB2PQR converter [39,40]. Separate sets of files were created based on the AMBER99 and PARSE force fields. MEAD and UHBD were run on an IBM Blade Center Cluster, which consists of 5 Blade Centers containing 67 Dual Xeon (3.06 GHz, 1 GB) Blades. The MCCE calculations were carried out on an SG Octane.
The majority of the PDB files did not need any modification. However, 1D3K, 1GU8, 1HRH and 1DRH were protonated with the leap program and the AMBER 03 force field in order to remove inconsistencies in the PDB files. Additionally, to eliminate steric clashes, 1DUK, 1NFN and 2CI2 underwent minimization with sander using a steepest descent method that continued for 20,000 time steps of 1 fs or until the root mean square deviation between successive time steps had fallen below 0.01 Å.
The PROPKA program was run online from its server [41]; no modification of the files was required. Values for all Asp, Glu, His, Tyr and Lys residues were predicted. Arg was excluded from the calculations owing to a lack of experimental data: arginine's high pKa precludes establishing a titration curve, as the protein denatures at high pH. Cys was also excluded from the calculations owing to a lack of experimental data.
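The residue selection described above can be illustrated with a short sketch. It is not the procedure used in this work; it only shows one way to list the Asp, Glu, His, Tyr and Lys residues in a PDB file while skipping Arg and Cys. The function name `titratable_residues` is hypothetical, the fixed-column layout of `ATOM` records is assumed, and only the standard three-letter residue names are recognised (force-field-specific names such as alternative histidine protonation states would need to be added).

```python
# Residue types for which pKa values were predicted (Arg and Cys excluded).
TITRATABLE = {"ASP", "GLU", "HIS", "TYR", "LYS"}


def titratable_residues(pdb_lines):
    """Return (chain, residue number, residue name) for each titratable residue.

    Assumes fixed-column PDB ATOM records: residue name in columns 18-20,
    chain identifier in column 22, residue number in columns 23-26.
    """
    ordered = []   # residues in order of first appearance
    seen = set()   # guards against one entry per atom of the same residue
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        res_name = line[17:20].strip()
        chain = line[21:22].strip()
        res_seq = int(line[22:26])
        key = (chain, res_seq, res_name)
        if res_name in TITRATABLE and key not in seen:
            seen.add(key)
            ordered.append(key)
    return ordered
```

Given the lines of a structure file, for example `titratable_residues(open("structure.pdb"))` with a placeholder file name, the function returns one entry per titratable residue rather than one per atom.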
The resultant data were also analysed using the Partial Least Squares (PLS) method. PLS is an extension of Multiple Linear Regression (MLR) in which a set of coefficients is developed from the dependent variables, in this case the predicted pKa values, by comparison with the independent variables, the experimental pKa values. The PLS analysis was performed using the program GOLPE (Generating Optimal Linear PLS Estimations) [42].
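To make the idea of PLS coefficients concrete, the following is a minimal sketch of a PLS regression with a single latent variable and a single response, written in plain Python. It is not GOLPE and does not reproduce the analysis reported here; the function name `pls1_one_component` is hypothetical, and `X` and `y` simply stand for the two blocks of pKa values being compared, with the assignment of blocks left to the analysis design.

```python
import math


def pls1_one_component(X, y):
    """Fit a one-component PLS1 model; return (coefficients, intercept).

    X is a list of rows (one row per residue, one column per variable),
    y is a list of responses of the same length.
    """
    n, p = len(X), len(X[0])
    # Mean-centre both blocks.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in X with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("X and y have zero covariance; no component to extract")
    w = [v / norm for v in w]
    # Scores: projection of each row onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress y on the scores, then express the fit in terms of X.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    coef = [q * wj for wj in w]
    intercept = y_mean - sum(coef[j] * x_mean[j] for j in range(p))
    return coef, intercept
```

With a single column in `X`, this reduces to ordinary least-squares regression; with several correlated columns, the single latent variable summarises them before the regression step, which is the feature that distinguishes PLS from MLR.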