BinBase is a large GC-TOF MS based metabolomics database encompassing
1,561 studies with 114,795 samples for various species, organs, matrices, and
experimental conditions. Owing to the physics of GC-MS, analysis is restricted to
thermostable small molecules of up to 650 Da in size, even when derivatization
by trimethylsilylation is used to reduce boiling points. Molecules
profiled by trimethylsilylation GC-MS based metabolomics include amino acids,
di- and tripeptides, hydroxyl acids, organic phosphates, fatty acids, alcohols,
mono-, di- and trisaccharides including sugar acids and sugar
alcohols, aromatic acids, nucleosides and mononucleotides (but not di- or
trinucleotides), sterols, polyamines, and a large variety of miscellaneous
compounds.
BinBase uses a retention index and mass spectral quality filtering
system that takes GC-TOF mass spectral deconvolution results as
input [21] to store and
report unique metabolite signals that are detected in metabolomic studies.
Through the connected MiniX system [22], all studies in BinBase are associated with metadata such
as species, organs, cell types, and treatments. The BinBase algorithm has been
published previously [11, 23] and has been in use for the past 13
years. It relies on mass spectral deconvolution of GC-TOF MS data by the Leco
ChromaTOF software and utilizes a multi-tiered filter system with different
settings to annotate deconvoluted instrument peak spectra as unique database
entries (“bins”). For typical studies on mammalian plasma with
about 50–60 samples, ChromaTOF software detects about 1,000 peaks
in at least one chromatogram at signal-to-noise ratios (s/n) > 5.
BinBase removes low-abundance, inconsistent, and noisy peaks that cannot be
assigned to existing bins in BinBase and whose spectral quality is too low to
generate a new bin in BinBase, resulting in datasets that typically report
400-500 peaks for mammalian plasma samples. Compound identifications within
BinBase are managed by the administrator using spectral libraries and retention
index information from the Fiehnlib libraries [12] and NIST mass spectra. In a typical
final BinBase report, such as one on mammalian plasma, about 30-40% of the
reported bins are noted as identified metabolites, i.e. about 150 compounds,
including database identifiers such as KEGG, PubChem and InChI keys.
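
The tiered filtering described above can be illustrated with a minimal Python sketch. It is not the BinBase implementation: the class names, the retention index window, the cosine similarity cutoff, and the use of a higher s/n value as a stand-in for "spectral quality" are all assumptions made for illustration. Only the s/n > 5 detection threshold is taken from the text.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Optional


@dataclass
class Peak:
    retention_index: float
    signal_to_noise: float
    spectrum: Dict[int, float]  # m/z -> intensity


@dataclass
class Bin:
    bin_id: int
    retention_index: float
    spectrum: Dict[int, float]


def cosine_similarity(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity between two spectra stored as m/z -> intensity."""
    dot = sum(a[mz] * b[mz] for mz in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def annotate(
    peak: Peak,
    bins: List[Bin],
    *,
    min_sn: float = 5.0,          # detection threshold quoted in the text
    new_bin_sn: float = 25.0,     # hypothetical quality bar for creating a bin
    ri_window: float = 2.0,       # hypothetical retention index tolerance
    min_similarity: float = 0.8,  # hypothetical spectral match cutoff
) -> Optional[Bin]:
    """Return the bin a peak is assigned to, or None if the peak is dropped."""
    # Tier 1: discard peaks at or below the detection threshold.
    if peak.signal_to_noise <= min_sn:
        return None

    # Tier 2: try to match an existing bin by retention index and spectrum.
    scored = [
        (cosine_similarity(peak.spectrum, b.spectrum), b)
        for b in bins
        if abs(b.retention_index - peak.retention_index) <= ri_window
    ]
    if scored:
        best_score, best_bin = max(scored, key=lambda item: item[0])
        if best_score >= min_similarity:
            return best_bin

    # Tier 3: create a new bin only if the peak is of high enough quality.
    if peak.signal_to_noise >= new_bin_sn:
        new_bin = Bin(len(bins) + 1, peak.retention_index, dict(peak.spectrum))
        bins.append(new_bin)
        return new_bin

    # Unmatched and too weak to found a new bin: removed from the report.
    return None
```

In this sketch, a peak that matches no existing bin and falls below the hypothetical `new_bin_sn` bar is simply dropped, which mirrors how the reported peak count falls from about 1,000 detected peaks to 400-500 in a final report.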