Peptide quantification and signal-to-noise (S/N) determination were performed using Vista, an automated software suite developed in-house. The software functions as follows. The theoretical masses of both labeled and unlabeled species were determined from the sequence composition of each peptide. These masses were then used to extract ion chromatogram intensities separately for each species from high-resolution MS spectra within 20 MS survey scans and within a user-defined mass window. The mass window was defined separately for each run at ±5σ of the mass accuracy distribution estimated from all confidently assigned peptides. Spectral peaks for each species were separated from surrounding noise using an iterative mass precision (MP) algorithm described in the text. Chromatographic peak boundaries were determined by extending each peak from the location of the data-dependent MS/MS scan in which the peptide was identified to a dynamically determined noise baseline, incorporating regions where the MP z-score was greater than zero. This noise baseline was calculated from all peaks observed within a ±25 m/z window around the theoretical species masses and within the adjacent ±20 MS spectra. The S/N ratio for each species was determined as the ratio of the maximum chromatographic peak intensity observed to the noise baseline, as calculated in the text. The area under the curve was then determined separately for each species, and the areas were compared to generate a relative abundance ratio. Only the monoisotopic peak of each species was used to generate the area measurements; although multiple isotopic peaks may be used in theory, we found such an approach problematic at low signal levels with extreme quantitative ratios, owing to the limit of detection of the mass spectrometer.
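The extraction and ratio steps described above can be illustrated with a minimal Python sketch. It is not the Vista implementation: the function names, the data layout (a list of retention time and peak-list pairs), the centering of the mass window on the mean error, and the trapezoidal area rule are all assumptions made for illustration. The noise baseline is taken as an input, because its calculation is described elsewhere in the text and is not reproduced here, and the iterative MP algorithm and peak boundary detection are omitted.

```python
from statistics import mean, pstdev

def mass_window_ppm(ppm_errors, n_sigma=5.0):
    """Window of mean ± n_sigma·σ over observed mass errors (ppm).
    Assumption: the window is centered on the mean error of the run."""
    mu = mean(ppm_errors)
    sigma = pstdev(ppm_errors)
    return mu - n_sigma * sigma, mu + n_sigma * sigma

def extract_xic(scans, theoretical_mz, window_ppm):
    """Sum intensities falling inside the ppm window, scan by scan.
    scans: list of (retention_time, [(mz, intensity), ...]) (hypothetical layout)."""
    low, high = window_ppm
    xic = []
    for rt, peaks in scans:
        total = 0.0
        for mz, intensity in peaks:
            error_ppm = (mz - theoretical_mz) / theoretical_mz * 1e6
            if low <= error_ppm <= high:
                total += intensity
        xic.append((rt, total))
    return xic

def peak_area(xic):
    """Area under the chromatographic curve by the trapezoidal rule (assumed)."""
    return sum((t1 - t0) * (i0 + i1) / 2.0
               for (t0, i0), (t1, i1) in zip(xic, xic[1:]))

def signal_to_noise(xic, noise_baseline):
    """Maximum chromatographic intensity divided by a supplied noise baseline."""
    return max(intensity for _, intensity in xic) / noise_baseline

def abundance_ratio(scans, light_mz, heavy_mz, window_ppm):
    """Relative abundance as the ratio of unlabeled to labeled peak areas."""
    light = peak_area(extract_xic(scans, light_mz, window_ppm))
    heavy = peak_area(extract_xic(scans, heavy_mz, window_ppm))
    return light / heavy
```

In this sketch the labeled and unlabeled species share one mass window per run, mirroring the per-run definition given above; a real implementation would also restrict the scans to the detected chromatographic peak boundaries before integrating.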
Quantified peptide species were individually scored for quality using both the Random Forest classifier16 and a heuristic score. The heuristic score is a weighted average of a variety of empirically determined Boolean predictors, including signal-to-noise ratios, number of observations across the chromatographic peak, mass accuracy statistics, unlabeled/labeled pair coelution, distance from the tandem MS scan, split peak signature, and encapsulation of the surrounding data. Except where otherwise noted, all successfully quantified peptides within 5σ of the mean log2 ratio were included in all analyses, regardless of score. Vista also includes additional enhancements to improve quantification, including rescaling of the heavy peptide ion series, deconvolution to normalize for label impurity, and compensation for the interconversion of arginine to proline, as is sometimes seen in studies using SILAC.29
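As an illustration of the scoring and inclusion rules above, the following Python sketch computes a weighted average of Boolean predictors and applies the 5σ filter on log2 ratios. The predictor names and weights are hypothetical; the empirically determined predictors and weights used by Vista are not given in this section, and the use of the population standard deviation is likewise an assumption.

```python
import math
from statistics import mean, pstdev

def heuristic_score(predictors, weights):
    """Weighted average of Boolean predictors, in the range 0 to 1.
    predictors: {name: bool}; weights: {name: float} (hypothetical names)."""
    total_weight = sum(weights[name] for name in predictors)
    passed_weight = sum(weights[name] for name, ok in predictors.items() if ok)
    return passed_weight / total_weight

def filter_by_log2_ratio(ratios, n_sigma=5.0):
    """Keep ratios whose log2 value lies within n_sigma·σ of the mean log2 ratio."""
    logs = [math.log2(r) for r in ratios]
    mu = mean(logs)
    sigma = pstdev(logs)
    return [r for r, lg in zip(ratios, logs) if abs(lg - mu) <= n_sigma * sigma]

# Example with made-up predictors and weights
example_predictors = {"sn_above_threshold": True,
                      "pair_coelutes": False,
                      "mass_accuracy_ok": True}
example_weights = {"sn_above_threshold": 2.0,
                   "pair_coelutes": 1.0,
                   "mass_accuracy_ok": 1.0}
example_score = heuristic_score(example_predictors, example_weights)
```

Note that the filter acts on the ratio alone, so a peptide can pass it regardless of its heuristic score, consistent with the inclusion rule stated above.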
The software supports data supplied in the pepXML and mzXML formats,27 from a wide variety of instrumentation capable of high mass accuracy, analyzed using any of a variety of search algorithms. The algorithm also supports most differential labeling methods, including SILAC,6 cleavable ICAT,30 and CAR.24 It is further extensible to deuterium-based approaches.
All analyses in this study were performed on a dual-processor, dual-core 3.00 GHz Xeon system running Fedora Core 7. On average, Vista currently quantifies approximately one peptide per second; for example, a typical analysis of 3445 peptides from the 1:1 labeled sample using LTQ FT data took 57 min.
More information on the Vista software suite, including information on obtaining the software, can be found at