Shannon Entropy is a measure of the information content of a message or data set.
$$H = -\sum_{i} P(x_i) \log_b P(x_i)$$
$H$ is the Shannon entropy, so named because the equation resembles the Boltzmann equation for thermodynamic entropy. $P$ can be any data; in this case, $P(x_i)$ is the intensity at an m/z ratio of $x_i$. The equation sums the product of $P(x_i)$ and $\log_b P(x_i)$ over all $i$ points in the mass spectrum. The base of the logarithm, $b$, can be any number; in this case $b = 10$. In data not shown, there was very little discriminatory gain from using different bases. For each mass spectrum analyzed here, the ion peak lists were generated using the Data Extractor function of Spectrum Mill MS Proteomics Workbench (Agilent) and binned into vectors of length $i$, with a cumulative intensity in each bin. The intensities for each vector were normalized prior to calculating $H$.
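The binning, normalization, and entropy calculation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the equal-width bins, the m/z range, and normalization by dividing by the total intensity are all assumptions, since the text does not specify them.

```python
import math


def bin_spectrum(peaks, mz_min, mz_max, n_bins):
    """Sum peak intensities into n_bins equal-width m/z bins.

    peaks is a list of (mz, intensity) pairs. Equal-width binning and the
    m/z range are assumptions made for this sketch.
    """
    width = (mz_max - mz_min) / n_bins
    bins = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            # Clamp the index to guard against floating-point round-up.
            index = min(int((mz - mz_min) / width), n_bins - 1)
            bins[index] += intensity
    return bins


def shannon_entropy(intensities, base=10):
    """Return H = -sum(p * log_b(p)) for a binned intensity vector.

    Assumption: the vector is normalized so that its entries sum to 1.
    Empty bins contribute nothing, since p * log(p) tends to 0 as p -> 0.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no positive intensity")
    h = 0.0
    for value in intensities:
        if value > 0:
            p = value / total
            h -= p * math.log(p, base)
    return h
```

Under these assumptions, a spectrum whose intensity sits in a single bin gives H = 0, while intensity spread evenly over ten bins gives H = 1 with base 10. Spectra dominated by a few intense ions therefore have the lowest values of H.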
After sorting the values of H to identify the spectra with the lowest values, the second step of the algorithm was to check for the ion pattern typical of the Amadori products, namely that the maximum abundance ion had a neutral loss of water and a loss of HCHO. All of the ions in the top 10% of intensities were checked for two corresponding mass peaks at the 2+, 3+, or 4+ charge states for 3H2O and HCHO, with a mass tolerance of 2 Da. Both the Shannon entropy and ion pattern based filtering steps are evaluated below.
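The ion pattern check can be illustrated with a short sketch. Several details are assumptions, because the text leaves them open: "top 10% of intensities" is read as the top 10% of peaks ranked by intensity, each loss is measured independently from the candidate ion, and the 2 Da tolerance is applied directly on the m/z axis. The function name and parameters are hypothetical. A cumulative reading (3H2O followed by HCHO) could be tested by passing different values in `losses`.

```python
H2O = 18.0106   # monoisotopic mass of water, Da
HCHO = 30.0106  # monoisotopic mass of formaldehyde, Da


def has_loss_pattern(peaks, losses=(3 * H2O, HCHO), charges=(2, 3, 4),
                     tol=2.0, top_fraction=0.10):
    """Return True if any high-intensity ion shows every neutral loss.

    peaks is a list of (mz, intensity) pairs. For each candidate ion and
    each charge state z, a loss of mass m is expected at (mz - m / z).
    Assumption: tol is compared against m/z differences, not masses.
    """
    if not peaks:
        return False
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    # Assumption: keep at least one candidate, even for short peak lists.
    n_top = max(1, int(len(ranked) * top_fraction))
    all_mz = [mz for mz, _ in peaks]
    for parent_mz, _ in ranked[:n_top]:
        for z in charges:
            # Every loss must be matched at the same charge state.
            if all(
                any(abs(mz - (parent_mz - loss / z)) <= tol for mz in all_mz)
                for loss in losses
            ):
                return True
    return False
```

For example, a 2+ ion at m/z 500.0 would be expected to show companion peaks near m/z 472.98 (loss of 3H2O) and m/z 484.99 (loss of HCHO) under these assumptions.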
Johnson K.L., Williams J.G., Maleki S.J., Hurlburt B.K., London R.E., & Mueller G.A. (2016). Enhanced approaches for identifying Amadori products: application to peanut allergens. Journal of Agricultural and Food Chemistry, 64(6), 1406-1413.