MetXBioDB is a manually curated database of > 2000 experimentally confirmed biotransformations derived from the literature. It was developed to support: (1) the design of biotransformation rules, (2) the training and validation of machine learning metabolism prediction models, and (3) the design of preference rules. Each biotransformation in MetXBioDB includes a starting reactant (structure and identifiers), a reaction product (structure and identifiers), the name or type of the enzyme catalyzing the biotransformation, the type of reaction, and one or more citations. For the purposes of this paper, a reactant is defined as a small molecule that binds to a specific enzyme and undergoes a metabolic transformation catalyzed by that enzyme. A biotransformation describes the chemical conversion or molecular transformation of a reactant to one or more products by a specific enzyme (or enzyme class) through a defined chemical reaction. Cytochrome P450 enzymes (CYP450s) are responsible for > 90% of phase I oxidative reactions and > 75% of drug metabolism [58], while UDP-glucuronosyltransferases (UGTs) and sulfotransferases (SULTs) are responsible for the phase II metabolism of most xenobiotics [59, 60]. In the gut microbiota, enzymatic reactions are mostly reductive and are carried out by anaerobic bacteria, owing to the very low concentration of oxygen.
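The components of a biotransformation listed above can be pictured as a simple record. The following Python sketch is illustrative only: the class names, field names, and the `is_complete` check are hypothetical and do not reflect the actual MetXBioDB schema, and the example values (including the SMILES encoding, the generic enzyme class, and the placeholder citation) are assumptions rather than content copied from the database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Compound:
    """A reactant or product: a structure plus an identifier (hypothetical fields)."""
    name: str
    smiles: str  # structure written as a SMILES string (assumed encoding)


@dataclass
class Biotransformation:
    """One record with the components named in the text (hypothetical layout)."""
    reactant: Compound
    product: Compound
    enzyme: str          # enzyme name or enzyme class
    reaction_type: str
    citations: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True when every component described in the text is present."""
        return all([
            self.reactant.smiles,
            self.product.smiles,
            self.enzyme,
            self.reaction_type,
            len(self.citations) >= 1,  # "one or more citations"
        ])


# Illustrative values only; not copied from MetXBioDB.
entry = Biotransformation(
    reactant=Compound("acetaminophen", "CC(=O)Nc1ccc(O)cc1"),
    product=Compound("NAPQI", "CC(=O)N=C1C=CC(=O)C=C1"),
    enzyme="CYP450 (enzyme class)",
    reaction_type="oxidation",
    citations=["placeholder citation"],
)
uncited = Biotransformation(
    reactant=entry.reactant,
    product=entry.product,
    enzyme=entry.enzyme,
    reaction_type=entry.reaction_type,
)

print(entry.is_complete())    # record with all components
print(uncited.is_complete())  # record lacking a citation
```

Keeping the enzyme as free text allows either a specific enzyme or an enzyme class to be recorded, mirroring the "name or type of the enzyme" wording used here.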
The “starting” reactants in the current version (version 1.0) of MetXBioDB primarily consist of xenobiotics such as drugs, pesticides, toxins and phytochemicals. The database also includes a small number of sterol lipids and a selected set of mammalian primary metabolites. In assembling MetXBioDB we gathered reaction data from the existing literature (> 100 references) along with data downloaded from publicly available databases such as DrugBank [38], PharmGKB [61], XMETDB [62], and SuperCYP [63]. These databases list over 1000 enzyme-substrate associations for the major CYP450s and UDP-glucuronosyltransferases (UGTs). Along with published scientific reports, PhenolExplorer [64] and PhytoHub [40] were also used to compile information about the metabolism of polyphenolic compounds in the gut.
The data curation process consisted of three phases: (1) the collection of biotransformation data, (2) the creation and annotation of biotransformation objects, and (3) data validation. This process was conducted collaboratively with a small team of chemistry experts. A detailed description of the data collection and curation process is provided in Additional file 2. Additional file 2: Figure S2 illustrates one entry in MetXBioDB, corresponding to the oxidation of acetaminophen to N-acetyl-p-benzoquinone imine (NAPQI). Overall, MetXBioDB contains > 2000 biotransformations, which include the cytochrome P450-catalyzed phase I reactions of ~ 800 unique starting reactants (and > 1500 reaction products), the phase II reactions of > 500 unique starting reactants (and > 600 reaction products) and the human gut microbial metabolism of > 50 unique polyphenolic compounds.
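Summary counts of this kind (unique starting reactants and reaction products per metabolism category) could be tallied along the following lines. This is a minimal sketch under stated assumptions: the `(reactant, product, category)` tuple layout, the `summarize` helper, and the toy records are all hypothetical and are not taken from MetXBioDB.

```python
from collections import defaultdict

# Hypothetical (reactant, product, category) triples; not actual MetXBioDB rows.
records = [
    ("reactant_A", "product_A1", "phase I"),
    ("reactant_A", "product_A2", "phase I"),
    ("reactant_B", "product_B1", "phase II"),
    ("reactant_C", "product_C1", "gut microbial"),
    ("reactant_A", "product_A3", "phase II"),
]


def summarize(rows):
    """Return {category: (n_unique_reactants, n_unique_products)}."""
    reactants = defaultdict(set)
    products = defaultdict(set)
    for reactant, product, category in rows:
        # Sets collapse repeated compounds, so each one is counted once.
        reactants[category].add(reactant)
        products[category].add(product)
    return {c: (len(reactants[c]), len(products[c])) for c in reactants}


summary = summarize(records)
for category, (n_reactants, n_products) in summary.items():
    print(f"{category}: {n_reactants} reactants, {n_products} products")
```

Because a single reactant can appear in more than one category (as `reactant_A` does in the toy data), per-category reactant counts need not add up to the number of distinct reactants overall.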