The GMD uses Microsoft SQL Server 2008™ as its relational database backend. The database relates each mass spectrum and its retention behaviour to an analyte, i.e. the chemically modified compound, which in turn is mapped to the metabolite it represents (Fig. 1) (Hummel et al. 2008). Both analytes and metabolites have the properties of a chemical compound and are linked to structures archived as .mol files and InChI™ codes (http://www.iupac.org/inchi/). A typical metabolite has one or two analytes, which are generated by the chemical derivatization process inherent to the GC-MS profiling technique. Each analyte has MSTs in multiple technological versions. These replicate mass spectra and retention indices (RIs) are determined empirically using different mass spectral technologies, e.g. time-of-flight, quadrupole or ion-trap mass detectors, and variations of gas chromatographic systems (Strehmel et al. 2008).

Fig. 1 Excerpt of the GMD schema. MSTs (mass spectral tags, i.e. repeatedly observed mass spectra with retention behaviour) are linked to analytes via experiments and a supervised annotation process. Likewise, analytes are mapped to metabolites. Structural information has been added to both types of compounds, the metabolites and their respective analytes.
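The relationships described above (metabolite to analytes, analyte to MSTs, with structural information attached to both compound types) can be illustrated with a minimal Python sketch. This is not the actual GMD database schema: all class names, field names and example values below are hypothetical and chosen only to mirror the entities named in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MassSpectralTag:
    """One replicate mass spectrum together with its retention index (RI)."""
    detector: str                 # e.g. "time of flight", "quadrupole", "ion trap"
    retention_index: float
    peaks: Dict[int, float]       # m/z -> relative intensity


@dataclass
class Compound:
    """Properties shared by metabolites and analytes (hypothetical fields)."""
    name: str
    inchi: Optional[str] = None      # InChI code, if available
    mol_file: Optional[str] = None   # path to the archived .mol file, if available


@dataclass
class Analyte(Compound):
    """Chemically modified (derivatized) compound observed by GC-MS."""
    msts: List[MassSpectralTag] = field(default_factory=list)


@dataclass
class Metabolite(Compound):
    """Underlying metabolite, represented by one or more analytes."""
    analytes: List[Analyte] = field(default_factory=list)


def spectra_for(metabolite: Metabolite) -> List[MassSpectralTag]:
    """Collect every MST reachable from a metabolite via its analytes."""
    return [mst for analyte in metabolite.analytes for mst in analyte.msts]


# Made-up example: one metabolite with two derivatized analytes.
# Retention indices and peak intensities are placeholders, not GMD data.
metabolite = Metabolite(
    name="metabolite X",
    analytes=[
        Analyte(
            name="metabolite X (derivative 1)",
            msts=[
                MassSpectralTag("time of flight", 1300.0, {73: 100.0, 147: 40.0}),
                MassSpectralTag("quadrupole", 1302.0, {73: 100.0, 147: 35.0}),
            ],
        ),
        Analyte(
            name="metabolite X (derivative 2)",
            msts=[MassSpectralTag("ion trap", 1120.0, {73: 100.0})],
        ),
    ],
)

print(len(spectra_for(metabolite)))  # number of MSTs linked to the metabolite
```

In a relational backend the two lists would correspond to foreign-key relations between tables. The experiments and the supervised annotation step that link MSTs to analytes are omitted here for brevity.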

In the current GMD release, 6,187 mass spectra are available, representing 2,444 analytes and 1,535 metabolites. Note that the GMD compendium is biased towards GC-MS-accessible, stable primary metabolites. Consequently, the structural moieties of the following metabolite classes dominate: amino acids, organic acids, fatty acids, fatty alcohols, sugars, sugar alcohols and their respective conjugates. Structural annotations are in most cases stereochemically correct, even though routine GC-MS profiling (Lisec et al. 2006; Wagner et al. 2003) can differentiate only anomeric and epimeric structures and E/Z geometric isomers.
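As a rough consistency check, the release counts quoted above can be turned into average ratios. The averages below are derived here from those three counts and are not figures reported by the GMD itself; the variable names are illustrative.

```python
# Release counts as quoted in the text.
n_spectra = 6187
n_analytes = 2444
n_metabolites = 1535

# Simple averages; the real distribution per metabolite or analyte may be skewed.
analytes_per_metabolite = n_analytes / n_metabolites
spectra_per_analyte = n_spectra / n_analytes

print(f"analytes per metabolite: {analytes_per_metabolite:.2f}")  # 1.59
print(f"spectra per analyte:     {spectra_per_analyte:.2f}")      # 2.53
```

An average of about 1.6 analytes per metabolite agrees with the statement that a typical metabolite has one or two analytes.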