To identify the bacterial species from the collected
SERS spectra, we used standard machine learning algorithms from
the open-source Python (3.8) library Scikit-learn. To read, process,
and visualize the spectral data, we used the Python packages NumPy, SciPy,
Matplotlib, and Seaborn.
To classify the five bacterial species, 1114 SERS spectra were recorded on the Ag–CuxO nanostructures: 157 for Bacillus subtilis (B. subtilis), 309 for Escherichia coli (E. coli), 155 for Enterococcus faecalis (E. faecalis), 343 for Staphylococcus aureus (S. aureus), and 150 for Streptococcus mutans (S. mutans). Specifically, the data
were first standardized using StandardScaler, and principal component
analysis (PCA) was then applied to the transformed data. Machine learning
methods were then used to distinguish the bacterial species. To keep the machine
learning-based identification suitable for real-life use, the spectral
data obtained from the bacteria were used directly, without any pre-processing
such as background subtraction or smoothing. For each bacterial species,
approximately 66.7% of the spectral data were used as training data.
This subset was selected at random with the split function of the
Scikit-learn library, with its randomization parameter set to 40.
The training data were used to train classification algorithms such as
support vector machines (SVM), k-nearest neighbors (KNN), and decision
trees. Finally, the remaining approximately 33.3% of the bacterial
spectra were used to test the accuracy of the system.
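
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: the spectra are random placeholder numbers, and only the class sizes and the seed value of 40 come from the text. It assumes that the split function is train_test_split with random_state=40, that the per-species split corresponds to a stratified split, that the classifiers run on the PCA scores with default hyperparameters, and that scaling and PCA precede the split as in the order of the description. N_POINTS and N_COMPONENTS are hypothetical values chosen for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Class sizes as reported in the text (1114 spectra in total).
CLASS_COUNTS = {
    "B. subtilis": 157,
    "E. coli": 309,
    "E. faecalis": 155,
    "S. aureus": 343,
    "S. mutans": 150,
}
N_POINTS = 200      # hypothetical number of Raman-shift points per spectrum
N_COMPONENTS = 10   # hypothetical number of principal components to keep

# Placeholder data standing in for real SERS spectra: Gaussian noise plus a
# class-dependent offset so that the classifiers have something to learn.
rng = np.random.default_rng(0)
counts = list(CLASS_COUNTS.values())
labels = np.repeat(list(CLASS_COUNTS), counts)
class_index = np.repeat(np.arange(len(counts)), counts)
spectra = rng.normal(size=(labels.size, N_POINTS)) + class_index[:, None]

# Standardize each spectral point, then project onto the principal components.
scaled = StandardScaler().fit_transform(spectra)
scores = PCA(n_components=N_COMPONENTS).fit_transform(scaled)

# Roughly 2/3 training and 1/3 test data, split within each species.
X_train, X_test, y_train, y_test = train_test_split(
    scores, labels, test_size=1 / 3, random_state=40, stratify=labels
)

# Train the three classifier families named in the text and score them.
models = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Decision tree": DecisionTreeClassifier(random_state=40),
}
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)  # fraction classified correctly

for name, accuracy in accuracies.items():
    print(f"{name}: test accuracy = {accuracy:.3f}")
```

The printed accuracies reflect only the synthetic placeholder data and say nothing about the performance obtained on real spectra.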