An important advantage of computational metabolomics lies in the use of correlations among ion signals to aid in determination of chemical identity. Metabolites are interconnected by a series of biochemical reactions, and this network of metabolites is organized in a hierarchical manner such that many small modules combine to form larger modules.56,57 Correlation-based network and modularity analysis is one approach to elucidate the association structure of metabolites. Although several mechanisms could lead to correlations between metabolites, the association structure can be used to identify ions derived from the same metabolite,58–60 identify biotransformations,61 and detect associations between environmental exposures and endogenous metabolites.15
For high abundance unidentified chemicals, multiple spectral features arising from a single chemical provide valuable structural information for characterizing that chemical. A network of ions in which a pair of ions is linked if their correlation exceeds a significance threshold, e.g., |r| > 0.8, can be generated to identify isotopes, adducts, and in-source fragments associated with a chemical (Figure 4). A similar approach can be used to identify biotransformations and other related metabolites.60 Metabolome-wide association studies (MWAS) allow identification of associations between a specific target variable, e.g., cotinine levels in individuals, and metabolic profiles.8,62–64 In an MWAS, a statistical test is performed for each m/z feature to assess the significance of its association with a parameter of interest (e.g., disease biomarker, chemical, or other measured parameter).
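As a minimal sketch of the per-feature testing described above, the following Python example correlates each feature with a target variable and corrects for multiple testing. The choice of Pearson correlation, the Fisher z normal approximation for p-values, the Benjamini-Hochberg correction, the 0.05 cutoff, and all feature names and intensities are assumptions made for illustration; they are not taken from the studies cited here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z-transform (normal approximation)."""
    if n < 4:
        raise ValueError("need at least 4 samples for the Fisher z approximation")
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def mwas(features, target, fdr=0.05):
    """Test every feature against the target; flag those passing the FDR cutoff."""
    names = list(features)
    rs = [pearson_r(features[k], target) for k in names]
    ps = [approx_p_value(r, len(target)) for r in rs]
    qs = benjamini_hochberg(ps)
    return {
        k: {"r": r, "p": p, "q": q, "significant": q <= fdr}
        for k, r, p, q in zip(names, rs, ps, qs)
    }


# Hypothetical data: a target variable (e.g., a measured exposure marker)
# and two m/z features measured in the same eight samples.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
features = {
    "feature_A": [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.1, 8.0],  # tracks the target
    "feature_B": [5.0, 3.0, 6.0, 2.0, 5.5, 3.5, 4.0, 4.5],  # unrelated
}
results = mwas(features, target)
for name, stats in results.items():
    print(name, round(stats["r"], 3), stats["significant"])
```

In practice, an established statistics library and a model that adjusts for covariates would normally replace these hand-written helpers; the sketch only shows the one-test-per-feature structure.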
Application of targeted MWAS using correlation-based criteria identified choline-related metabolites and demonstrated similarity between correlation patterns of choline in different species (Figure 5).64
Correlation-based network analysis can also facilitate identification of in-source fragments. Gas chromatography–mass spectrometry with electron ionization sources produces a large number of characteristic spectra indicative of chemical functional groups and structure.61,65 Electrospray ionization can produce in-source fragmentation (e.g., loss of NH3, H2O, CHOOH, etc.) from electrical potentials or heat applied in the ion source.66,67 Because in-source fragments can mimic accurate masses of other common metabolites, computational methods that identify adducts, isotopes, and in-source fragments (based on clustering of highly correlated coeluting ions) increase the ability to correctly assign chemical identities. An example is the in-source formation of pyroglutamate from glutamine or glutamate.68 The identification of in-source fragments requires consideration of chromatographic conditions to separate possible coeluting chemicals, as well as ion source conditions. With soft ionization techniques, in-source fragmentation is commonly observed only for highly abundant metabolites; many low abundance chemicals will generate only a single detectable signal.3,18 To ensure that detected, unannotated ions are unique chemicals, it is important to perform targeted MWAS to exclude the possibility that a signal originates from in-source fragments, adducts, and/or isotopes. To increase confidence of chemical identification, alternative detection methods with increased sensitivity for unknown chemicals and methods for defining unknown ions will be needed.
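The clustering of highly correlated coeluting ions described above can be sketched as follows. Ions are linked when |r| > 0.8 and their retention times agree within a tolerance, and connected ions are collected into groups. Pairs within a group are then labeled when their m/z difference matches a known mass difference. The retention-time and mass tolerances, the short list of mass differences, the ion labels, and all intensities are illustrative assumptions; the m/z values are chosen to mimic protonated glutamine, its 13C isotope peak, and its NH3-loss fragment (pyroglutamate).

```python
import math
from itertools import combinations

# Standard monoisotopic mass differences in Da (a deliberately short list).
KNOWN_DELTAS = {
    "13C isotope": 1.003355,
    "NH3 loss": 17.026549,
    "H2O loss": 18.010565,
    "[M+Na]+ vs [M+H]+": 21.981945,
}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def group_ions(ions, r_min=0.8, rt_tol=2.0):
    """Group ions that coelute (within rt_tol seconds) and have |r| > r_min."""
    parent = {name: name for name in ions}

    def find(name):  # union-find root lookup with path halving
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(ions, 2):
        coeluting = abs(ions[a]["rt"] - ions[b]["rt"]) <= rt_tol
        r = pearson_r(ions[a]["intensity"], ions[b]["intensity"])
        if coeluting and abs(r) > r_min:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for name in ions:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


def annotate(group, ions, mz_tol=0.005):
    """Label ion pairs whose m/z difference matches a known mass difference."""
    labels = []
    for a, b in combinations(group, 2):
        delta = abs(ions[a]["mz"] - ions[b]["mz"])
        for label, expected in KNOWN_DELTAS.items():
            if abs(delta - expected) <= mz_tol:
                labels.append((a, b, label))
    return labels


# Hypothetical ions: m/z, retention time (s), intensities across six samples.
IONS = {
    "A_147.0764": {"mz": 147.0764, "rt": 45.0,
                   "intensity": [100, 200, 150, 400, 300, 250]},
    "B_148.0798": {"mz": 148.0798, "rt": 45.1,
                   "intensity": [5.4, 11.1, 8.2, 22.3, 16.4, 13.9]},
    "C_130.0499": {"mz": 130.0499, "rt": 45.0,
                   "intensity": [21, 39, 31, 82, 58, 51]},
    # Coelutes with A but is uncorrelated, so it stays separate.
    "D_coeluting": {"mz": 203.0526, "rt": 45.2,
                    "intensity": [300, 120, 310, 150, 290, 140]},
    # Correlates with A but elutes much later, so it stays separate.
    "E_late": {"mz": 147.0764, "rt": 300.0,
               "intensity": [98, 205, 148, 395, 310, 245]},
}

groups = group_ions(IONS)
for group in groups:
    print(group, annotate(group, IONS))
```

Requiring both correlation and coelution is what keeps the late-eluting ion out of the group even though its intensities track ion A; correlation alone could instead reflect a biochemical relationship between two distinct chemicals.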
In addition to characterizing ions arising from known chemicals, MWAS using univariate and multivariate approaches can be used to generate hypotheses about biochemical roles of features with no database matches. This process uses targeted MWAS with validated metabolites or xMWAS, where “x” corresponds to other -omes (transcriptome, microbiome, genome, etc.). Krumsiek et al. used a systems-level approach in which they combined genome-wide association analysis, knowledge-based pathway information, and metabolic networks to predict the identity of unknown metabolites.69 Other studies have used integrative methods based on partial least-squares (PLS) regression to determine correlations between the metabolome and the transcriptome,70 proteome,71 and microbiome.72 These methods, combined with pathway and literature-based information, can provide alternative approaches for generating hypotheses about chemical identity, particularly for low abundance chemicals.